I use Sanoid and Syncoid to take snapshots of my CentOS-based ZFS storage system and copy them to replicas. Generally speaking, I use repurposed Thecus N5550 storage units for the remote hardware. I have a long history with Thecus hardware, and while I’m not a fan of their software, I am a big fan of removing it and installing Linux. I usually take out the 2GB of memory it comes with and swap it for a pair of 4GB modules. The only downside, especially for systems I install in remote locations, is that I don’t have any out-of-band management.
I have the machines configured to boot up and connect to my OpenVPN server. This works great, but after a while the units drop offline. The Thecus hardware is good, but I’m asking it to do a lot more than it was built for, running more RAM than Thecus says is possible. When a unit goes offline, I just power it off and back on... or really, I ask someone nearby to do that for me.
Recently though, I started to wonder if this was a machine lock up or if OpenVPN was just getting confused. I let a system stay ‘offline’ for a week or so until I had a chance to get on site. I SSH’d into the machine locally, and lo and behold, the system was running great and OpenVPN had no idea it wasn’t connected. This is a problem I can solve…
I wanted to set up a script to check whether the system could ping an address across the VPN and, if it couldn’t, restart the OpenVPN client.
Ping, on Linux, has a feature where it will keep pinging an address for up to a set number of seconds: the ‘-w’ switch. Combined with ‘-c 1’, it completes immediately once a ping succeeds; otherwise it keeps trying, without printing any reply lines, until the wait time is over and then reports the failure. This gives me clean output to put into a script. I didn’t want to restart the OpenVPN service for a single missed ping, since that happens all the time. I wanted to be sure it was down.
#!/bin/bash
target=(IP address to ping across the VPN)
count=$( ping -w 30 -c 1 "$target" | grep icmp | wc -l )
if [ "$count" -eq 0 ]
then
    echo "Restarting VPN"
    systemctl restart (Service Name for your OpenVPN Client Config)
else
    echo "VPN is Running"
fi
(I’m sure I got the bones of this script online, but I can’t find it again to link to.)
For the script to work, enter the IP you want to ping on the line for the ‘target’ variable. The script then tries to ping that address. If it gets a response back, the count is not equal to zero and the script drops to the ‘else’ line, which just prints a status message and exits. If, after the wait time, it hears nothing, the count is ‘0’ and the script runs the command to restart the OpenVPN client process. The service name will be different depending on what you called your config file.
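As a variation on the same idea, here is a minimal sketch that relies on ping’s exit status instead of counting matching lines. On Linux, ping exits with 0 when the requested reply arrives before the deadline and non-zero otherwise. The function names, and the PING_CMD and SYSTEMCTL_CMD override variables, are my own hypothetical additions so the logic can be tried without a real VPN; they are not part of the script above.

```shell
#!/bin/bash
# Sketch only. PING_CMD and SYSTEMCTL_CMD are hypothetical override hooks;
# by default the real ping and systemctl are used.
PING_CMD="${PING_CMD:-ping}"
SYSTEMCTL_CMD="${SYSTEMCTL_CMD:-systemctl}"

# Succeeds (exit 0) if one reply arrives from $1 within 30 seconds.
vpn_is_up() {
    "$PING_CMD" -w 30 -c 1 "$1" > /dev/null 2>&1
}

# $1 = address to ping across the VPN, $2 = OpenVPN client unit name.
vpn_watchdog() {
    if vpn_is_up "$1"; then
        echo "VPN is Running"
    else
        echo "Restarting VPN"
        "$SYSTEMCTL_CMD" restart "$2"
    fi
}
```

To use it, add a final line such as vpn_watchdog 10.8.0.1 openvpn-client@myconfig, where both the address and the unit name are placeholders for your own values.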
I then have this run, via the system crontab (the format with a user field), every fifteen minutes:
*/15 * * * * (user with permissions to restart the service) (/location/of/script/)
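One possible way to install that schedule is a drop-in file under /etc/cron.d, which uses the same format with a user field. This is only a sketch: the file name, script path, and user below are hypothetical defaults, and it needs to be run as root to write to /etc/cron.d.

```shell
#!/bin/bash
# Hypothetical defaults; replace with your own file name, script path and user.
CRON_FILE="${CRON_FILE:-/etc/cron.d/vpn-check}"
SCRIPT_PATH="${SCRIPT_PATH:-/usr/local/bin/vpn-check.sh}"
RUN_AS="${RUN_AS:-root}"

# Print the schedule line: every fifteen minutes, as RUN_AS, run SCRIPT_PATH.
cron_line() {
    printf '%s %s %s\n' '*/15 * * * *' "$RUN_AS" "$SCRIPT_PATH"
}

# Write the line to CRON_FILE and make sure only the owner can write to it.
install_cron() {
    cron_line > "$CRON_FILE"
    chmod 644 "$CRON_FILE"
}
```

Calling install_cron once creates the file; running cron_line on its own shows the line first, without touching anything.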
I got all of the OpenVPN endpoints back on the network over the weekend and rolled out the automation. As luck would have it, my internet connection dropped a few hours later. One of the endpoints rejoined without any delay. The other did not. I waited for the next quarter hour, when the script would fire, and voilà: it ran, detected the down connection, and rejoined. Success.
Maybe this will save you some time too.