UP | HOME

Nginx Tracking with MySQL
#web#nginx

Table of Contents

Nginx log files are hard to run analytics on. A MySQL server is a much better long-term solution for log files because it allows for arbitrary queries to be run against all of your data.

I'll start with a simple nginx proxy server and use rsyslog to add new log entries to MySQL.

server {
    server_name blog.tygr.info;
    listen 80;

    location / {
	proxy_pass   http://127.0.0.1:8080/;
	include      /etc/nginx/proxy_params;
    }
}

JSON Logging

Logging as a JSON string is much more extensible. With JSON, I can add fields to new log entries without invalidating old log entries. Rsyslog uses "Common Event Expressions" (CEEs) to express log entries as a list of JSON elements each separated by a new line. Each JSON tree needs an @cee: cookie in front of it to be recognized as a valid CEE log entry.

I'll modify /etc/nginx/nginx.conf to add a json log entry format which includes the @cee: cookie and common metadata about each request.

http {
    ##
    # Logging Settings
    ##

    log_format json '@cee:{'
	'"path":"$request_uri",'
	'"ip":"$remote_addr",'
	'"time":"$time_iso8601",'
	'"method":"$request_method",'
	'"status":"$status",'
	'"response_time":"$request_time",'
	'"bytes":"$body_bytes_sent",'
	'"referer":"$http_referer"'
	'}';
}

I'll use the json format for the access_log in the blog, while using a standard error_log format:

server {
    location / {
	...
    }

    access_log /var/log/blog/access.log json;
    error_log /var/log/blog/error.log;
}

access.log example line:

@cee:{"path":"/emacs/index.html","ip":"0.0.0.0","time":"2022-03-31T15:22:20+00:00","method":"GET","status":"200","response_time":"1.491","bytes":"3634","referer":"-"}

Rsyslog script

Setup

The script requires mysql-server and rsyslog-mysql. Install mysql with apt install mysql-server and following along with the prompts.

Then, enter mysql to create a database for nginx:

mysql> SHOW DATABASES;
mysql> CREATE DATABASE nginx;
mysql> USE nginx;
mysql> CREATE TABLE blog (
  id BIGINT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
  path VARCHAR(255),
  ip VARCHAR(255),
  time DATETIME,
  method VARCHAR(255),
  status INT,
  response_time DOUBLE,
  bytes INT,
  referer VARCHAR(255)
);
mysql> SHOW COLUMNS FROM blog;

Create a user for rsyslog:

mysql> CREATE USER 'rsyslog'@'localhost' IDENTIFIED BY '*******';
mysql> GRANT PRIVILEGES ON nginx.* TO 'rsyslog'@'localhost';

Keep track of the password you used.

Next, install rsyslog-mysql with apt install rsyslog-mysql. When prompted, answer no to set up default database.

Conf file

Rsyslog uses a single file, /etc/rsyslog.conf, to configure how to handle log entries created by system services. This file combines all files in /etc/rsyslog.d in alphabetical order, so in order to add to the rsyslog.conf file, I should create files in /etc/rsyslog.d which load after the 50-default.conf file that already exists in that folder. Each file I create should then be prefixed with 51 or higher.

Before I do so, I will add some modules to the actual /etc/rsyslog.conf file:

# loads log entries from input file
module(load="imfile")

# parses CEE style log entries
module(load="mmjsonparse")

# provides access to a MySQL database
module(load="ommysql")

MySQL template

Next, I'll create a template to insert a log entry into the mysql database, called /etc/rsyslog.d/51-mysql.conf. Each property in the template will pull out a value from the JSON log entry, and each constant is plain sql code. The query I want to run is:

INSERT INTO $table (
  path, ip, time, method, status,
  response_time, bytes, referer
) VALUES (
  $path, $ip, $time, $method, $status,
  $response_time, $bytes, $referer
)

And this is what it looks like as an rsyslog template:

template(name="mysql-common" type="list" option.sql="on") {
  constant(value="INSERT INTO ")
  property(name="$!table")
  constant(value= " (path, ip, time, method, status, response_time, bytes, referer) VALUES ('")
  property(name="$!path")
  constant(value="','")
  property(name="$!ip")
  constant(value="','")
  property(name="$!time")
  constant(value="','")
  property(name="$!method")
  constant(value="','")
  property(name="$!status")
  constant(value="','")
  property(name="$!response_time")
  constant(value="','")
  property(name="$!bytes")
  constant(value="','")
  property(name="$!referer")
  constant(value="')")
}

The option.sql="on" option escapes all sql column values.

52-blog.conf

Next, I'll load the log entries from the nginx access log into rsyslog in a new file named /etc/rsyslog.d/52-blog.conf, which will load after /etc/rsyslog.d/51-mysql.conf. Because rsyslog is run by a single file, there's now way to distinguish between log entries coming from one source or another, so I will add the required 'Tag' option.

# Tag log entries from blog with 'blog:'
input(type="imfile" File="/var/log/blog/access.log" Tag="blog:")

If the syslog tag is equal to blog:, I should run the mmjsonparse action in order to extract information from the log entry:

if( $syslogtag == 'blog:') then {
    action(type="mmjsonparse")
}

I will also set the MySQL table to 'blog' to tell the mysql template to insert data into the blog table I created earlier.

if( $syslogtag == 'blog:') then {
    action(type="mmjsonparse")

    set $!table = "blog";
}

Finally, I will run the mysql template to insert the data into the table. Here is /etc/rsyslog.d/52-blog.conf in full:

input(type="imfile" File="/var/log/blog/access.log" Tag="blog:")

if($syslogtag == 'blog:') then {
    action(type="mmjsonparse")

    set $!table = "blog";

    action(type="ommysql"
      Server="localhost"
      db="nginx"
      uid="rsyslog"
      pwd="*********"
      Template="mysql-common")
}

The pwd here should be the same password as the one set up for the rsyslog mysql user earlier.

Reload rsyslog with service rsyslog restart and check that it is working correctly with systemctl status rsyslog.

Org mode analytics

Now that each request is being logged to the mysql server, running analytics on the blog is as simple as writing mysql scripts in org mode.

First, I need to load the babel language engine for sql:

(org-babel-do-load-languages 'org-babel-load-languages
                             '((sql . t)))

For each query, I want to disregard any requests from a set of IP addresses which I regularly use. I have a sql fragment I wrote which I can include in any of the queries:

* Excluded IPs
  #+name: exclude-ips
  #+begin_src sql
      ip NOT IN ("0.0.0.0", "1.1.1.1")
  #+end_src

Example query: unique visitors each day

* Unique visitors each day
  #+begin_src sql :engine mysql :noweb yes
     SELECT
       DATE(time) AS date,
       COUNT(DISTINCT(ip)) AS visitors
     FROM nginx.blog
     WHERE
       DATEDIFF(NOW(), time) < 31
       AND <<exclude-ips>>
     GROUP BY date
     ORDER BY date DESC
  #+end_src

Example query: last 10 errors

* Last 'n' errors
  #+header: :var num=10
  #+begin_src sql :engine mysql :noweb yes
    SELECT
      path, ip, time, method, status, response_time
    FROM nginx.blog
    WHERE
      <<exclude-ips>>
      AND status NOT IN (200, 404)
    ORDER BY time DESC
    LIMIT $num
  #+end_src