RSyslog, ZFS, and Storing logs based on the source in my HomeLab

There are many ways to store syslog data, and nearly all of them are better than what I am outlining here. If you’re looking to learn how to deal with syslog at scale, take a look at Graylog, the ELK Stack, or some other similar tool. There are plenty of free and/or open source options for this, many of which I’ve set up and used for my employers.

For me though, I’m not looking to load big piles of data into some database and keep it stored for long periods, automatically indexed and cataloged. I don’t need that.

Instead, I’d like to gather logs from a handful of devices, store them as flat files based on the date, and then just throw them away after a month. If I can also store them in a really easy to access compressed format, all the better.

OK, so Step 1 is deciding where to store this data. In my case, I am doing this on a Raspberry Pi I am already running for something else. Specifically, it’s a PiBox, which I have running a stripped-down version of Raspberry Pi OS. Nearly any computer you’re already running can handle the load of syslog data from a home lab; it requires almost no CPU time.

On this Pi, I have a pair of very inexpensive SSDs mirrored via ZFS. In addition to some performance benefits and the dependability of data written to a ZFS file system, I’m also leveraging ZFS’s inline compression. For this syslog dataset I have enabled Zstandard compression.

zfs set compression=zstd Storage/LogData
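
If the dataset doesn’t exist yet, creating it and confirming the property took is roughly this (the Storage pool and LogData dataset names are just the ones from my setup; adjust them to suit yours):

#Create a dataset for the logs on an existing pool called Storage
zfs create Storage/LogData

#Check that compression is now enabled on it
zfs get compression Storage/LogData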

Using ZFS with this compression, I am seeing a 7x reduction in the data stored on disk, without spending any time compressing it manually or decompressing it when I want to read it. ZFS handles all of this transparently, and the files appear as normal files on the file system. Syslog data is simple text, so it compresses very well.

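You can check the ratio ZFS is achieving with zfs get compressratio (same dataset name as above); on my box the output looks like this:

zfs get compressratio Storage/LogData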
NAME             PROPERTY       VALUE  SOURCE
Storage/LogData  compressratio  7.10x  -

I am storing logs from a dozen machines, and it amounts to a very modest ~6MB/day. With inline compression, that means I am storing under 1MB/day on disk. Pretty nice.

With a place to park the data ready, we can move on to Step 2: getting rsyslog set up on the machine to listen for, and store, the syslog data being sent to it.

You’ll want to get rsyslog installed on your system. If it’s not already there, it’s easy to add on just about any flavor of Linux; a quick dnf or apt install should sort it all out for you.
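
For example, pick the line that matches your distro:

sudo apt install rsyslog     #Debian, Ubuntu, Raspberry Pi OS
sudo dnf install rsyslog     #Fedora, RHEL and friends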

Once installed, you should find the folder ‘/etc/rsyslog.d’; by default, rsyslog is set up to parse any .conf files stored in there.

I add two files to that folder. The first I call ‘receive_UDP1514.conf’. It’s a simple one that just tells rsyslog to listen for syslog data on port 1514 via UDP. The default syslog port is 514/udp, but using 1514/udp allows rsyslog to run as a non-root user. When you tell a given device to send syslog messages to your server, it always asks you to enter a port anyway, so it’s a pretty low-risk change to make.

receive_UDP1514.conf:

module(load="imudp")
input(type="imudp" port="1514")

The second file I call FilterToFiles.conf, and this is where I tell rsyslog what to do with the syslog messages it receives. The file is just the same pair of config lines repeated for each IP sending data.

FilterToFiles.conf:
if $fromhost-ip startswith 'IP.ADD.GOES.HERE' then /Place/To/Store/The/File.log
& stop

And, I repeat those two lines a dozen or so times, changing the IP and file path each time, as in the sketch below.
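
For example, with two devices it might look like this (the IPs and file names here are placeholders, not my actual config):

if $fromhost-ip startswith '192.168.1.1' then /Storage/LogData/router.log
& stop
if $fromhost-ip startswith '192.168.1.20' then /Storage/LogData/switch.log
& stop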

Restart the rsyslog service, and you’re capturing syslog data.
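
On a systemd-based distro that restart is just:

sudo systemctl restart rsyslog

If you want to confirm messages are arriving, the logger tool from util-linux can fire a test message from another machine (swap in your server’s IP; BSD variants of logger use different flags):

logger --udp --server 192.168.1.50 --port 1514 "test message from the lab"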

I left my config like that for a couple of weeks, but I didn’t have an easy way to purge the data I wanted to get rid of. I did some research into what the helper app logrotate can do, and it’s quite a powerful tool, but it didn’t seem to have a method for exactly what I wanted. So, I wrote a short bash script, which we’ll call Step 3.

I run the script below just before ‘today’ ends; it runs via a cronjob at 11:59 PM. It works out today’s date and stores it in a variable in the format I want. For better or worse, I am using a format like ‘Nov_11_05_2022’; feel free to tweak that to your liking. With this variable, a folder is created in the LogData folder. The script then stops rsyslog, moves all of the current .log files into that dated folder, and starts rsyslog again. So, ‘live’ log files for the current day are always at /Storage/LogData/, and when the day ends they get filed away and the new day’s log files are created by rsyslog as data arrives.

It then does a little work to remove any empty folders, and finally purges any data older than 30 days.

Log Rotation Script:

#!/bin/bash

#Set Date Variable
DATE=$(date +"%b_%m_%d_%Y")

#Create Folder for Log Files
mkdir "/Storage/LogData/$DATE"

#Stop RSyslog
systemctl stop rsyslog

#Move All Log Files into Folder
mv -f /Storage/LogData/*.log "/Storage/LogData/$DATE/"

#Start RSyslog
systemctl start rsyslog

#Purge Empty Folders
find /Storage/LogData/ -type d -empty -delete

#Purge Files Older than 30 days
find /Storage/LogData/* -mtime +30 -exec rm -rf {} \;
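
The cronjob that drives it is nothing fancy; in root’s crontab it looks something like this (the script path is just a placeholder for wherever you keep it):

59 23 * * * /Storage/Scripts/rotate_logs.sh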

And that’s it. With the rsyslog config and the cronjob in place, this has been taking care of itself.
