Nginx Tracking with MySQL
#web#nginx
Table of Contents
Nginx log files are hard to run analytics on. A MySQL server is a much better long-term solution for log files because it allows for arbitrary queries to be run against all of your data.
I'll start with a simple nginx proxy server and use rsyslog to add new log entries to MySQL.
server { server_name blog.tygr.info; listen 80; location / { proxy_pass http://127.0.0.1:8080/; include /etc/nginx/proxy_params; } }
JSON Logging
Logging as a JSON string is much more extensible. With JSON, I can
add fields to new log entries without invalidating old log
entries. Rsyslog uses "Common Event Expressions" (CEEs) to express
log entries as a list of JSON elements each separated by a new
line. Each JSON tree needs an @cee:
cookie in front of it to be
recognized as a valid CEE log entry.
I'll modify /etc/nginx/nginx.conf
to add a json
log entry format
which includes the @cee:
cookie and common metadata about each
request.
http { ## # Logging Settings ## log_format json '@cee:{' '"path":"$request_uri",' '"ip":"$remote_addr",' '"time":"$time_iso8601",' '"method":"$request_method",' '"status":"$status",' '"response_time":"$request_time",' '"bytes":"$body_bytes_sent",' '"referer":"$http_referer"' '}'; }
I'll use the json
format for the access_log
in the blog, while
using a standard error_log
format:
server { location / { ... } access_log /var/log/blog/access.log json; error_log /var/log/blog/error.log; }
access.log
example line:
@cee:{"path":"/emacs/index.html","ip":"0.0.0.0","time":"2022-03-31T15:22:20+00:00","method":"GET","status":"200","response_time":"1.491","bytes":"3634","referer":"-"}
Rsyslog script
Setup
The script requires mysql-server
and rsyslog-mysql
. Install
mysql with apt install mysql-server
and following along with the
prompts.
Then, enter mysql
to create a database for nginx:
mysql> SHOW DATABASES; mysql> CREATE DATABASE nginx; mysql> USE nginx; mysql> CREATE TABLE blog ( id BIGINT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY, path VARCHAR(255), ip VARCHAR(255), time DATETIME, method VARCHAR(255), status INT, response_time DOUBLE, bytes INT, referer VARCHAR(255) ); mysql> SHOW COLUMNS FROM blog;
Create a user for rsyslog:
mysql> CREATE USER 'rsyslog'@'localhost' IDENTIFIED BY '*******'; mysql> GRANT PRIVILEGES ON nginx.* TO 'rsyslog'@'localhost';
Keep track of the password you used.
Next, install rsyslog-mysql
with apt install
rsyslog-mysql
. When prompted, answer no
to set up default
database.
Conf file
Rsyslog uses a single file, /etc/rsyslog.conf
, to configure how
to handle log entries created by system services. This file
combines all files in /etc/rsyslog.d
in alphabetical order, so in
order to add to the rsyslog.conf
file, I should create files in
/etc/rsyslog.d
which load after the 50-default.conf
file that
already exists in that folder. Each file I create should then be
prefixed with 51
or higher.
Before I do so, I will add some modules to the actual
/etc/rsyslog.conf
file:
# loads log entries from input file module(load="imfile") # parses CEE style log entries module(load="mmjsonparse") # provides access to a MySQL database module(load="ommysql")
MySQL template
Next, I'll create a template to insert a log entry into the mysql
database, called /etc/rsyslog.d/51-mysql.conf
. Each property
in
the template will pull out a value from the JSON log entry, and
each constant
is plain sql code. The query I want to run is:
INSERT INTO $table ( path, ip, time, method, status, response_time, bytes, referer ) VALUES ( $path, $ip, $time, $method, $status, $response_time, $bytes, $referer )
And this is what it looks like as an rsyslog template:
template(name="mysql-common" type="list" option.sql="on") { constant(value="INSERT INTO ") property(name="$!table") constant(value= " (path, ip, time, method, status, response_time, bytes, referer) VALUES ('") property(name="$!path") constant(value="','") property(name="$!ip") constant(value="','") property(name="$!time") constant(value="','") property(name="$!method") constant(value="','") property(name="$!status") constant(value="','") property(name="$!response_time") constant(value="','") property(name="$!bytes") constant(value="','") property(name="$!referer") constant(value="')") }
The option.sql="on"
option escapes all sql column values.
52-blog.conf
Next, I'll load the log entries from the nginx access log into
rsyslog in a new file named /etc/rsyslog.d/52-blog.conf
, which
will load after /etc/rsyslog.d/51-mysql.conf
. Because rsyslog is
run by a single file, there's now way to distinguish between log
entries coming from one source or another, so I will add the
required 'Tag' option.
# Tag log entries from blog with 'blog:' input(type="imfile" File="/var/log/blog/access.log" Tag="blog:")
If the syslog tag is equal to blog:
, I should run the
mmjsonparse
action in order to extract information from the log
entry:
if( $syslogtag == 'blog:') then { action(type="mmjsonparse") }
I will also set the MySQL table to 'blog' to tell the mysql
template to insert data into the blog
table I created earlier.
if( $syslogtag == 'blog:') then { action(type="mmjsonparse") set $!table = "blog"; }
Finally, I will run the mysql template to insert the data into the
table. Here is /etc/rsyslog.d/52-blog.conf
in full:
input(type="imfile" File="/var/log/blog/access.log" Tag="blog:") if($syslogtag == 'blog:') then { action(type="mmjsonparse") set $!table = "blog"; action(type="ommysql" Server="localhost" db="nginx" uid="rsyslog" pwd="*********" Template="mysql-common") }
The pwd
here should be the same password as the one set up for
the rsyslog mysql user earlier.
Reload rsyslog with service rsyslog restart
and check that it is
working correctly with systemctl status rsyslog
.
Org mode analytics
Now that each request is being logged to the mysql server, running analytics on the blog is as simple as writing mysql scripts in org mode.
First, I need to load the babel language engine for sql
:
(org-babel-do-load-languages 'org-babel-load-languages '((sql . t)))
For each query, I want to disregard any requests from a set of IP addresses which I regularly use. I have a sql fragment I wrote which I can include in any of the queries:
* Excluded IPs #+begin_src sql ip NOT IN ("0.0.0.0", "1.1.1.1") #+end_src
Example query: unique visitors each day
* Unique visitors each day #+begin_src sql :engine mysql :noweb yes SELECT DATE(time) AS date, COUNT(DISTINCT(ip)) AS visitors FROM nginx.blog WHERE DATEDIFF(NOW(), time) < 31 AND <<exclude-ips>> GROUP BY date ORDER BY date DESC #+end_src
Example query: last 10 errors
* Last 'n' errors #+begin_src sql :engine mysql :noweb yes SELECT path, ip, time, method, status, response_time FROM nginx.blog WHERE <<exclude-ips>> AND status NOT IN (200, 404) ORDER BY time DESC LIMIT $num #+end_src