Earlier, I mentioned that I intend to start a series of short notes and tips. Today, a new tip.

Logstash tip #1: measuring logstash performance

TL;DR: use the 'metrics' filter to gather metrics about your logstash instance.

A while ago I started using logstash for, well, collecting some logs, and storing them in Elasticsearch.

Everything works like a charm most of the time, if you've got your filters right... except that logstash itself does not provide any insight into how it is performing: how many messages per second are being processed, and so on. Of course, you can get that data from Elasticsearch. And that's a good solution most of the time. If it works for you - fine, stop reading :)

Thing is, in my specific setup I'm only indexing a fraction of the logs to Elasticsearch; everything else gets dropped. So Elasticsearch can't tell me how many messages logstash actually handled.

Enter the logstash metrics filter. You can pretty much count anything in logstash using this filter.

In my logstash pipeline, I'm adding a logtype field to all my inputs, so let's say the lumberjack input is configured like this:

input {
  lumberjack {
    port => 5043
    ssl_certificate => "/etc/logstash/ssl/logstash-forwarder.crt"
    ssl_key => "/etc/logstash/ssl/logstash-forwarder.key"
    add_field => {
      "logtype" => "application"
    }
  }
}

This way, all the messages that are received by logstash have a logtype field.
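
As a sketch of what "all my inputs" could look like, a second input can set a different logtype value in the same way. The tcp input, the port number and the "syslog" value below are assumptions made up for illustration, not part of my actual setup:

```
input {
  # Hypothetical second input; the port and the "syslog" value are illustrative only
  tcp {
    port => 5514
    add_field => {
      "logtype" => "syslog"
    }
  }
}
```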

Let's say I want to know how many messages are being received, per logtype. Then I can just do:

filter {
  metrics {
    meter => [ "logtype.%{logtype}" ]
    add_tag => "logstash_metric"
    flush_interval => 60
    rates => []
  }
}

Additionally, for the application logs, I want to know how many messages are being processed, per log level (or severity).

filter {
  # Only application logs are expected to carry a "level" field
  if [logtype] == 'application' {
    metrics {
      meter => [
        "loglevel.%{level}"
      ]
      add_tag => "logstash_metric"
      flush_interval => 60
      rates => []
    }
  }
}

What this does is create a logtype.%{logtype} or loglevel.%{level} metric for every value seen, each with a few additional fields. E.g.:

logtype.application.count
logtype.application.rate_1m
logtype.application.rate_5m
logtype.application.rate_15m
...
loglevel.error.count
loglevel.error.rate_1m
...
loglevel.warn.count
loglevel.warn.rate_1m
...
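
To see what the metric events actually look like before shipping them anywhere, a minimal sketch is to print only the tagged events to the console. It assumes the "logstash_metric" tag from the filters above; stdout with the rubydebug codec is used here purely for inspection:

```
output {
  # Print only the events generated by the metrics filter
  if "logstash_metric" in [tags] {
    stdout {
      codec => rubydebug
    }
  }
}
```

Once the field names look right, this output can be removed again.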

I don't care about the rates, so I disable them with rates => [] (no rates will be collected). At the end, I can write the metrics to a file, to Elasticsearch, etc. In my specific case, I decided to store those metrics in graphite, as that's what I've set up for metrics storage:

output {
  if "logstash_metric" in [tags] {
    graphite {
      host => "localhost"
      port => "2003"
      metrics_format => "general.logstash.*"
      include_metrics => ["log.*\.count"]
      fields_are_metrics => true
    }
  }
}

If you have multiple logstash servers, you might want to replace metrics_format => "general.logstash.*" with metrics_format => "general.logstash.${hostname}.*", where ${hostname} stands for the name of each server, so that every server writes to its own metric path.

Oh, and if you use graphite, don't forget to configure your storage aggregation rules. By default, graphite treats metrics as gauges and averages them when it rolls data up, while what we have here is a counter.
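
If you'd rather report a per-interval count than an ever-growing counter, one possible approach (a sketch I'm adding as an assumption, not something the setup above uses) is the metrics filter's clear_interval option, which resets the counters periodically. Whether the flush happens before the reset when both intervals are equal may depend on your plugin version, so check the output first:

```
filter {
  metrics {
    meter => [ "logtype.%{logtype}" ]
    add_tag => "logstash_metric"
    flush_interval => 60
    # Assumption: reset the counters every 60 seconds so that each
    # reported count covers roughly one flush window
    clear_interval => 60
    rates => []
  }
}
```

With counts reset like this, a sum-based aggregation rule in graphite would be the natural fit.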
