Wednesday, April 15, 2015

Datadog Monitoring

Datadog is a SaaS-based monitoring and analytics platform for IT infrastructure, operations and development teams. It is part of my day-to-day tools inventory.

We use Datadog extensively at my workplace, especially now that we have a fleet of live services running in AWS.

How does Datadog work?

Datadog provides an agent that runs passively in the background on each host (if configured that way!). The agent captures the key performance indicators (KPIs) of the host it is running on. Besides this, you can configure your app to send StatsD metrics (or, for Java apps, JMX metrics), which the Datadog agent collects and finally reports back to Datadog.
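To make the app-to-agent hop concrete, here is a minimal sketch of what a StatsD counter looks like on the wire. It assumes the agent's StatsD listener is on its default UDP port 8125 on localhost; the function names are my own for illustration and not part of any Datadog library.

```python
import socket


def build_counter_datagram(metric, value=1):
    """Format a StatsD counter datagram, e.g. b'sysstat:1|c'."""
    return "{}:{}|c".format(metric, value).encode("utf-8")


def send_counter(metric, value=1, host="127.0.0.1", port=8125):
    """Send the counter over UDP; fire-and-forget, no reply is expected."""
    # Assumption: a local agent is listening on host:port (8125 by default).
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(build_counter_datagram(metric, value), (host, port))
    finally:
        sock.close()

# Example usage (hypothetical metric name):
#   send_counter("sysstat")
```

Because this is UDP, the app never blocks waiting for the agent, which is why instrumenting a hot code path this way is cheap. In practice a client library does this for you.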

How to install Datadog agents?

Datadog offers a free trial account. When you sign up you will be asked to register an agent. You can do it via a simple curl command as shown below:

DD_API_KEY=APIKEY bash -c "$(curl -L https://raw.githubusercontent.com/DataDog/dd-agent/master/packaging/datadog-agent/source/install_agent.sh)"

It takes care of installing the agent, configuring it and installing any dependencies it needs. Once installed, the agent will start sending that host's stats to Datadog.

You should see metrics being reported to Datadog almost immediately.



Next, let's use one of the several integrations Datadog provides. I'll use the Python integration as an example. First install the client library:

easy_install  dogstatsd-python

Then, let's say you have a Python Flask app with a health-check endpoint:

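from flask import Flask, jsonify
from statsd import statsd  # installed by dogstatsd-python

app = Flask(__name__)

@app.route("/sysstat")
def sysstat():
    # Count each hit; moduleLoadTime is assumed to be defined elsewhere in the app.
    statsd.increment('sysstat')
    response = jsonify(loadTime=moduleLoadTime)
    response.status_code = 200
    return response

Once the app is running and the endpoint is being hit, the agent that you installed will start reporting a 'sysstat' metric to Datadog.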



How to set up monitoring?

One of the great things about Datadog is that you can not only send KPI metrics but also set up alerts on them, which you can then route to your favorite alerting system, e.g. PagerDuty.

One such alert I set up was for CPU usage on the host machine running my app. If the CPU usage went above 55%, Datadog would send an alert to the folks who are on-call.
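I set mine up through the Datadog UI, but the same kind of alert can be described as a monitor definition. The sketch below only builds such a definition and prints it. The metric name (system.cpu.user), the 5-minute averaging window, the host name and the @pagerduty notification handle are all assumptions for illustration, so check them against the Datadog monitor API docs and your own account before sending anything.

```python
import json


def build_cpu_monitor(host, threshold=55, notify="@pagerduty"):
    """Build a monitor definition that fires when average CPU exceeds threshold."""
    # Assumption: system.cpu.user averaged over the last 5 minutes.
    query = "avg(last_5m):avg:system.cpu.user{{host:{}}} > {}".format(host, threshold)
    return {
        "type": "metric alert",
        "query": query,
        "name": "High CPU on {}".format(host),
        # Mentioning a notification handle in the message routes the alert.
        "message": "CPU usage above {}% on {}. {}".format(threshold, host, notify),
        "options": {"thresholds": {"critical": threshold}},
    }


# "my-app-host" is a hypothetical host name.
payload = build_cpu_monitor("my-app-host")
print(json.dumps(payload, indent=2))
```

Keeping the definition in code like this makes it easy to review the threshold and to recreate the same alert for every host in a fleet.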



You can also use the Datadog API integration to send custom metrics via the agent to Datadog.
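As a rough illustration of what a custom metric submission looks like, here is a sketch that builds a payload in the shape Datadog's HTTP series endpoint accepts. Note that this endpoint talks to Datadog directly rather than through the agent, and the metric name, host and tags are made up for the example; treat it as a starting point, not the definitive way to do it.

```python
import json
import time


def build_series_payload(metric, value, host=None, tags=None, timestamp=None):
    """Build a one-point gauge payload for a custom metric."""
    point_time = int(timestamp if timestamp is not None else time.time())
    series = {
        "metric": metric,
        "points": [[point_time, value]],  # list of [unix_timestamp, value]
        "type": "gauge",
    }
    if host:
        series["host"] = host
    if tags:
        series["tags"] = list(tags)
    return {"series": [series]}


# Hypothetical metric, host and tag names.
example = build_series_payload(
    "myapp.queue.depth", 42, host="my-app-host", tags=["env:prod"], timestamp=1429056000
)
print(json.dumps(example))
```

The payload would be sent as JSON along with your API key; I have left the HTTP call out so the sketch stays runnable without credentials.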

There are tons of other integrations, such as those for AWS, OpenStack and many other cloud vendors, which are neatly set up to pull a ton of other metrics from these vendors.

I hope you liked my short post on Datadog and how you can use it for monitoring.
