Tuesday, December 17, 2013

Logstash Metrics Filter and Graphite Output

Not many people have published advanced Logstash metrics filter configurations. After spending a day with the examples and source code, I have a more advanced configuration to share. NOTE: I saw weird behavior with Logstash 1.2.2 (it may have been my in-progress configuration at the time), but after upgrading to 1.3.1 everything worked as expected.

The problem: we are trying to get metrics on our API usage by user. We were already logging the operation and the user to disk and picking them up in Logstash. Now we want metrics on how frequently each user makes each call.
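For context, a raw apicalls event carries fields roughly like these (the field names match our configuration; the values here are invented for illustration):

    {
      "@timestamp": "2013-12-17T19:08:42.968Z",
      "type": "apicalls",
      "host": "server1.example.com",
      "pod": "qa1",
      "operation": "Login",
      "org_name": "My Org, Inc."
    }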

After quite a bit of googling, I couldn't find an example where the metric name contains more than one field. The last message in this thread had some pieces of the puzzle, since we needed to translate some special characters to be more Graphite-friendly, but I was confused about what the metric event actually contains. The metrics filter creates a new event, and until I sent that event to a file output and looked at it, I didn't realize the new metric event has no knowledge of any field of the message that generated it. (The "meter" option can interpolate any field from the original event, but no other option can; I tried add_field inside the metrics filter and it didn't work.) So instead of mutating the metric event, I have to mutate the original event. Also, you can't gsub a newly added field in the same mutate block, so I had to move the gsub into a second mutate.


    if [type] == "apicalls" {
      mutate {
        add_field => [ "modhost", "%{host}" ]
        add_field => [ "modorgname", "%{org_name}" ]
      }
      mutate {
        gsub => [ "modhost", "\.", "_", "modorgname", "[\.\,\ ]", "_" ]
      }
      metrics {
        meter => [ "apioperations.%{pod}.%{modhost}.%{operation}.byOrg.%{modorgname}" ]
        add_tag => [ "apiopmetric" ]
      }
    }
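Those gsub rules are plain regex substitutions, replacing characters that would otherwise break up the Graphite metric path. Here is the same translation sketched in Python (the field values are invented for illustration):

```python
import re

# Mirror the two gsub rules from the mutate filter above.
host = "server1.example.com"
org_name = "My Org, Inc."

modhost = re.sub(r"\.", "_", host)             # dots -> underscores
modorgname = re.sub(r"[.,\ ]", "_", org_name)  # dots, commas, spaces -> underscores

print(modhost)      # server1_example_com
print(modorgname)   # My_Org__Inc_
```

Anything left unescaped would be interpreted by Graphite as a path separator (dots) or would be awkward in metric names (commas, spaces), which is why both fields get scrubbed before they are interpolated into the meter name.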

Now when I send those metric events to a file output, the resulting event looks like this:

    {
      "@timestamp": "2013-12-17T19:08:42.968Z",
      "@version": "1",
      "message": "server.name",
      "apioperations.qa1.host_domain.Login.byOrg.my_org_name.count": 11,
      "apioperations.qa1.host_domain.Login.byOrg.my_org_name.rate_1m": 0.0,
      "apioperations.qa1.host_domain.Login.byOrg.my_org_name.rate_5m": 0.0,
      "apioperations.qa1.host_domain.Login.byOrg.my_org_name.rate_15m": 0.0,
      "apioperations.qa1.host_domain.GetModifiedRecipients.byOrg.my_org_name.count": 2,
      "apioperations.qa1.host_domain.GetModifiedRecipients.byOrg.my_org_name.rate_1m": 0.0,
      "apioperations.qa1.host_domain.GetModifiedRecipients.byOrg.my_org_name.rate_5m": 0.0,
      "apioperations.qa1.host_domain.GetModifiedRecipients.byOrg.my_org_name.rate_15m": 0.0,
      "tags": ["apiopmetric"]
    }

Next is to figure out how to get those to Graphite. Since there is an unknown number of operations and users, I had to go to the source code to really figure out how all the options to the graphite output work. It turns out you can't use the "metrics" option, as that would require enumerating each name. The magical "fields_are_metrics" option sends all the fields in the event to Graphite; all you need to do is use "include_metrics" or "exclude_metrics" to send just what you want. Our graphite output looks like this (the file output was for debugging only and is turned off now that it works):

  if "apiopmetric" in [tags] {
    graphite {
      host => "10.1.1.1"
      include_metrics => [ "apioperations.*" ]
      fields_are_metrics => true
    }
    file {
      path => "/var/log/logstash/apidebug.log"
    }
  }
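Under the hood, the graphite output writes each included field as one line of Graphite's plaintext protocol: metric path, value, and epoch timestamp. So the event above arrives at Carbon roughly like this (timestamps illustrative):

    apioperations.qa1.host_domain.Login.byOrg.my_org_name.count 11 1387307322
    apioperations.qa1.host_domain.Login.byOrg.my_org_name.rate_1m 0.0 1387307322
    apioperations.qa1.host_domain.GetModifiedRecipients.byOrg.my_org_name.count 2 1387307322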

And Shazam! All your metrics start flowing into Graphite!