This was far more challenging than it needed to be. Cisco makes some SDKs available for use with Nagios, but I could not get the system to see them. They are just Python packages, so I didn’t expect much trouble, yet nothing I tried worked. I initially thought the problem was that the installer dumped the files into /usr/lib/python/site-packages/ instead of the 64-bit path /usr/lib64/python/site-packages/, but no amount of copying and permission changes made the dependencies visible. For those interested, the Cisco Nagios tools (as of 09/2017) are published as Nagios Plug-Ins for Cisco UCS. I am not using that package or its dependencies.
Instead, I found a plugin written in Go that worked a treat and needed no other dependencies. You can find it on GitHub: check_cisco_ucs
Having never used anything written in Go before, I had a few things to learn, but it turned out to be very simple. I downloaded the ‘check_cisco_ucs.go’ source file to my Nagios server. I tried to run it directly and installed a few bits of software before I understood that I needed to compile it into an executable. That turned out to be exceedingly simple.
First, I installed the ‘golang-bin’ package, which is in EPEL.
yum install golang-bin
I then navigated to the folder containing the check_cisco_ucs.go script and ran the following command:
go build check_cisco_ucs.go
With that, I was able to run the plugin. The GitHub page has a series of example commands, which I started firing at some of the Cisco C220 M4 servers I needed to monitor. The plugin was last updated in 2014, and while much of it worked, not all of the author’s examples still work against the updated BIOS on my systems.
This one worked fine once I updated the IP address and user/password as required. It takes 10-15 seconds to run and then reports back accurate information about the RAID setup and status of the system. The only caveat: since we run CIMC over HTTPS, I needed to add the ‘-M 1.2’ flag so that it would accept TLS 1.2, which CIMC was running.
./check_cisco_ucs -H
With this, I knew the solution would work, and I started going through the steps to make the required checks appear in Nagios. I moved the check_cisco_ucs binary into the folder where Nagios expects to find its plugins, which for me on CentOS 7 is /usr/lib64/nagios/plugins.
I then created a new file called check_cisco_ucs.cfg and put it into the folder my Nagios install is configured to read command definitions from. Within that file I’ve listed my commands, which are a bit messy:
define command {
    command_name    check_cisco_ucs_storage
    command_line    $USER1$/check_cisco_ucs -H $HOSTADDRESS$ -M 1.2 -t class -q storageVirtualDrive -a "raidLevel vdStatus health" -e Optimal -u admin -p yourpass
}
I then added the relevant check to the server object I wanted to check:
define service {
    use                     generic-service
    host_name               NameInNagios
    service_description     RAID Controller Status
    check_command           check_cisco_ucs_storage
}
Do a quick check of your Nagios config to make sure it’s sane and working:
nagios -v /path/to/nagios.cfg
I was able to do the same for the next example check on the GitHub page, for information about local disks.
From here though, the checks I wanted didn’t work as expected. Reading light status was problematic and the power supply command needed a small tweak.
The author suggested this power supply command:
./check_cisco_ucs -H
But, in the 3.0 version of CIMC I’m running, the ‘operState’ option is no longer present and instead I needed to use ‘operability’. My Nagios command is this tweaked version:
$USER1$/check_cisco_ucs -H $HOSTADDRESS$ -M 1.2 -u admin -p yourpass -t class -q equipmentPsu -a "id model operability serial" -e operable
The final check I wanted was to let me know if any of the status lights were amber, instead of green. The author suggested a check to watch one LED to ensure it was green but this reported no information. I instead rewrote the command to look at all of the lights and tell me if any are amber, instead of confirming one light was green.
$USER1$/check_cisco_ucs -H $HOSTADDRESS$ -M 1.2 -u admin -p password -t class -q equipmentIndicatorLed -a "id color name operState" -z -e amber
This command looks at the lights, which gives me this output:
1,green,LED_PSU_STATUS,on
2,green,LED_TEMP_STATUS,on
3,green,LED_FAN_STATUS,on
4,green,LED_HLTH_STATUS,on
5,blue,FP_ID_LED,off
0,green,OVERALL_DIMM_STATUS,on (0 of 6 ok)
The key here is the -z flag: it means that if the expected state is NOT found, i.e. there is no amber light, the check is considered OK. If there is an amber light, the check will fail and alert me. Perfect.
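To make the inverted logic concrete, here is a minimal shell sketch of the same idea, run against sample lines modelled on the output above. It is only an illustration and not how check_cisco_ucs is implemented: the check_leds function name is made up, and returning exit code 2 follows the usual Nagios convention for CRITICAL, which I am assuming matches what the plugin does when the check fails.

```shell
#!/bin/sh
# Illustration only: mimics an inverted match ("OK when amber is absent").
# check_leds is a hypothetical helper, not part of check_cisco_ucs.

# Reads "id,color,name,operState" lines on stdin.
# Prints a Nagios-style status line; returns 0 (OK) or 2 (CRITICAL).
check_leds() {
    # Collect the names of LEDs whose color field is exactly "amber".
    amber=$(awk -F, '$2 == "amber" { printf "%s%s", sep, $3; sep = "," }')
    if [ -n "$amber" ]; then
        echo "CRITICAL - amber LED(s): $amber"
        return 2
    fi
    echo "OK - no amber LEDs"
    return 0
}

# Sample data modelled on the plugin output shown above.
printf '%s\n' \
    '1,green,LED_PSU_STATUS,on' \
    '2,green,LED_TEMP_STATUS,on' \
    '5,blue,FP_ID_LED,off' | check_leds
```

The blue, switched-off ID LED does not trip the check, which is the reason for looking for amber rather than insisting that every light is green.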
The final hurdle between finding the plugin and making my checks work was a couple of hours’ effort with the Cisco UCS Rack-Mount Servers Cisco IMC XML API Programmer’s Guide. Using that guide and some curl commands, I was finally able to get out the data I wanted and transform it into commands the plugin could use.
The first step was to authenticate to the CIMC:
curl -d "
That final ‘-k’ is required to make curl skip verification of the system’s SSL certificate, which is just a basic self-signed one.
That command outputs a cookie, which you then pass in future commands to remain authenticated.
curl -d "
Be careful not to mix ' and ". The " wraps the whole command being sent via curl; the ' wraps the individual values inside that command. Mixing them will cause the commands to fail.
That should be enough to get you going if you want to find any other commands you might be interested in. I’ll try to fill in some more detail once I’m finished rolling out the Nagios install.
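As a rough sketch of that quoting pattern, the helpers below build the request bodies with double quotes on the outside and single quotes around each value, and pull the cookie out of a login response. Everything here is an assumption for illustration: the aaaLogin and configResolveClass shapes and the outCookie attribute are taken from my reading of the Cisco guide rather than from the commands above, the helper names are made up, and the sample response is invented.

```shell
#!/bin/sh
# Sketch only. The XML shapes are assumed from the Cisco IMC XML API guide;
# helper names and the sample response are hypothetical.

# Login body: the outer double quotes let the shell treat it as one string,
# the inner single quotes delimit the XML attribute values.
login_body() {
    printf "<aaaLogin inName='%s' inPassword='%s'></aaaLogin>" "$1" "$2"
}

# Class query that carries the session cookie from the login step.
class_query() {
    printf "<configResolveClass cookie='%s' inHierarchical='false' classId='%s'/>" "$1" "$2"
}

# Extract the outCookie attribute from a login response read on stdin.
# Assumes the response uses double quotes around attribute values.
extract_cookie() {
    sed -n 's/.*outCookie="\([^"]*\)".*/\1/p'
}

# Invented, shortened response used only to exercise the helpers.
sample='<aaaLogin cookie="" response="yes" outCookie="1505000000/abcd-1234" outRefresh="600"></aaaLogin>'
cookie=$(printf '%s\n' "$sample" | extract_cookie)

# Print the body that would be handed to curl's -d option.
class_query "$cookie" equipmentIndicatorLed
echo
```

Each body would then go to curl as the argument of -d, for example -d "$(login_body admin yourpass)", which keeps the two kinds of quote from ever being typed side by side.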