In this article, I explain how to use Norikra to count HTTP status codes with Fluentd. Previously, it was common to use fluent-plugin-datacounter for this purpose, but today I will explain how to achieve the same thing with Norikra.
REMARK: This article is a shortened translation of the Japanese article at http://blog.livedoor.jp/sonots/archives/37921050.html
What Is Not Covered
In this article, I will not cover how to install Norikra or Fluentd.
What Is Norikra, and Why Use It
Norikra is a schema-less stream processor based on Esper, which is a CEP (Complex Event Processing) engine. With Norikra, we can use highly functional SQL-like queries to process streaming data.
We can achieve similar things with plugins like fluent-plugin-datacounter. However, a Fluentd process can use only one CPU core because Fluentd runs on CRuby, whereas a Norikra process can use multiple CPU cores because Norikra runs on the JVM (Esper is written in Java, and Norikra is written in JRuby). This is the first reason to use Norikra. It is especially attractive when we need to process heavy log data for which one CPU core is not enough to keep up in real time.
The second reason to use Norikra is that it does not need to be restarted to add or remove queries. Fluentd requires us to restart its processes whenever we change their configuration files to add or remove settings. With Fluentd, I would want a graceful restart for this.
Data Counting using fluent-plugin-datacounter
Let me first describe how to perform data counting using fluent-plugin-datacounter. We will replace this with Norikra later.
Requirement
Assume the following requirements:
- Count HTTP status codes over a specified time interval, based on the status field written in logs.
- Count the status codes for each host.
- Count the status codes as an aggregation over all hosts.
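The three requirements above can be sketched in plain Python to make the intended result concrete. This is only an illustration of the counting logic, not how fluent-plugin-datacounter or Norikra implement it. The function names, the (timestamp, host, status) record shape, and the class labels 2xx to 5xx are assumptions made for this sketch.

```python
from collections import Counter, defaultdict


def status_class(status):
    """Map a status such as "404" to a label such as "4xx" (labels are assumed)."""
    s = str(status)
    if len(s) == 3 and s.isdigit() and s[0] in "2345":
        return s[0] + "xx"
    return "unmatched"


def count_statuses(records, interval=60):
    """Count status classes per interval, both per host and over all hosts.

    records: iterable of (epoch_seconds, hostname, status) tuples (assumed shape).
    """
    per_host = defaultdict(Counter)   # key: (window_start, hostname)
    all_hosts = defaultdict(Counter)  # key: window_start
    for ts, host, status in records:
        window = ts - ts % interval   # start of the interval this record falls in
        label = status_class(status)
        per_host[(window, host)][label] += 1
        all_hosts[window][label] += 1
    return per_host, all_hosts
```

Calling count_statuses with a handful of records returns two mappings: one keyed by interval and host, and one keyed by interval only, which correspond to the per-host and all-hosts requirements.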
Specification of Input Data (Log)
Assume that log data are sent from Fluentd agents with tags like “visualizer.logname.hostname”, where logname is an arbitrary string that operations engineers choose to distinguish logs.
Also, assume that messages contain time, status, reqtime, method, and uri fields.
Example: visualizer.api_restful.host001
Configuration
The status code counting can be achieved with the following configuration using fluent-plugin-datacounter.
The output messages look as follows (unnecessary fields are omitted):
Let us consider replacing this with Norikra.
Data Counting using Norikra
Referring to the Norikra documentation and the README of fluent-plugin-norikra, we will try to replace fluent-plugin-datacounter with out_norikra, a Norikra query (more precisely, Esper EPL), and in_norikra.
out_norikra
Configure Fluentd to receive data and transfer it to Norikra.
This configuration sets the Norikra target (which is like a table in an RDBMS) to the logname part of the “visualizer.logname.hostname” tag.
It also sets the host field of each message to the hostname extracted from the tag.
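The mapping from tag to target and host field described above can be sketched as follows. This is a plain Python illustration of the idea, not the fluent-plugin-norikra implementation. The function names are hypothetical, and treating everything after the second dot as the hostname is an assumption.

```python
def split_visualizer_tag(tag, prefix="visualizer"):
    """Split "visualizer.logname.hostname" into (logname, hostname)."""
    # maxsplit=2 keeps any further dots inside the hostname part (assumption).
    parts = tag.split(".", 2)
    if len(parts) != 3 or parts[0] != prefix:
        raise ValueError("unexpected tag format: %r" % (tag,))
    _, logname, hostname = parts
    return logname, hostname


def to_norikra_event(tag, record):
    """Return (target, event): logname becomes the target, hostname the host field."""
    target, hostname = split_visualizer_tag(tag)
    event = dict(record)  # copy so the incoming record is left untouched
    event["host"] = hostname
    return target, event
```

For the example tag visualizer.api_restful.host001, the target would be api_restful and the event would carry host set to host001.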
Norikra Query
Surprisingly, we can use Java methods in a Norikra query. So, we can express the same conditions as in the fluent-plugin-datacounter case by using String#matches.
We can also count per host using a GROUP BY clause. Cool.
EDIT: I replaced COUNT(1, status.matches('^2..$')) with COUNT(1, status REGEXP '^2..$') because tagomoris, the author of Norikra, said the latter performs better. For even better performance, replacing REGEXP with LIKE, or converting the status field to an integer and using COUNT(1, status / 100 = 2), would likely help.
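The three matching styles mentioned in the EDIT note (regular expression, LIKE-style pattern, and integer arithmetic) can be compared in a small Python sketch. It only illustrates that the three conditions select the same well-formed status codes; it says nothing about their relative speed in Norikra. The function names are hypothetical, and whether status / 100 truncates like Python's // in EPL depends on how the engine handles integer division, which this sketch does not verify.

```python
import re

TWO_XX = re.compile(r"^2..$")  # same pattern as in the query


def is_2xx_regex(status):
    """Regex style: '2' followed by exactly two characters."""
    return TWO_XX.match(status) is not None


def is_2xx_like(status):
    """LIKE style, roughly equivalent to the SQL pattern '2__'."""
    return len(status) == 3 and status.startswith("2")


def is_2xx_int(status):
    """Integer style: truncating division by 100 gives the status class."""
    return int(status) // 100 == 2
```

For three-digit numeric status strings, the three predicates agree, so the choice between them comes down to performance and to how the status field is typed.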
Note that the query can also be written as follows:
However, I felt that preparing many queries makes them difficult to manage, so I preferred the former COUNT(1, status REGEXP '^2..$') approach.
in_norikra
We configure Fluentd to retrieve results from Norikra every 60 seconds.
Here, we set tag query_name so that the tag of each retrieved message becomes the query name that we specified in the query add command.
Done!
The output will be as follows:
Conclusion
I explained how to replace fluent-plugin-datacounter with Norikra. Try it out!