Sunday, October 10, 2010

Nagios monitoring Dell PE 2900 via SNMP

I decided I would like to monitor our new file server - you know, so if the RAID became degraded, I'd know... rather than lose two disks from a set like we did recently and um, lose a bit of data. Yeah, oops.

So... how hard can it be? Answer: quite hard for our Windows 2000 server. More on that later.

For Windows 2003, it wasn't too bad, but there were a few hoops to jump through. I'm documenting those hoops here for future reference:

Install OpenManage Server Administrator Managed Node (v6.3)... ahah, but not so fast - it will probably force you to install new RAID firmware and drivers first, so do that and reboot. Then here's the trick that got me the first time around: Storage Management is deselected by default, so you MUST choose a custom installation and, for the love of god, select Storage Management! Why would Dell do this??? Especially after making a song and dance about upgrading my RAID firmware... anyway...

Then, to monitor via SNMP, you need the Windows SNMP service installed: Start -> Settings -> Control Panel -> Add/Remove Programs, select "Windows Components", then "Management and Monitoring Tools", click the "Details" button, scroll down to "Simple Network Management Protocol" and make sure that's ticked. By default, SNMP only allows polling from localhost (this is either good security or absolutely stupid, depending on your point of view and level of caffeination). To allow SNMP polling from other hosts, go to the Services control panel applet, find "SNMP Service", right-click, select "Properties", click the "Security" tab and either accept SNMP packets from any host or just from the hosts you choose. While you're there, check that the community name you plan to poll with (public in the test below) is in the accepted community names list.

Test that SNMP is working:

$ snmpget -v 1 -c public hostname .1.3.6.1.4.1.674.10893.1.20.140.1.1.2.1
SNMPv2-SMI::enterprises.674.10893.1.20.140.1.1.2.1 = STRING: "System"

(This is the name, or "disk label", of virtual disk 1.) You can find a list of useful info on OpenManage SNMP here.
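If you have more than one virtual disk, walking the whole name column saves guessing at index numbers. This is only a rough sketch: list_vdisk_names is a made-up helper name, the community defaults to public as in the test above, and the OID is simply the one from the snmpget example with the trailing index removed.

```shell
#!/bin/sh
# Sketch: list the names of all virtual disks on a host.
# The OID is the name column from the snmpget test, minus the disk index.
VDISK_NAME_OID=".1.3.6.1.4.1.674.10893.1.20.140.1.1.2"

# list_vdisk_names HOST [COMMUNITY]  (hypothetical helper, not part of any tool)
list_vdisk_names() {
    host=$1
    community=${2:-public}   # assumption: same community as the test above
    # -Oqv asks net-snmp to print just the values, without the OID prefix
    snmpwalk -v 1 -c "$community" -Oqv "$host" "$VDISK_NAME_OID"
}
```

Running list_vdisk_names my-server should print one line per virtual disk.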

Great. From here I could write some simple SNMP checks for Nagios, and as long as the virtualDiskRollUpStatus (1.3.6.1.4.1.674.10893.1.20.140.1.1.19.x, where x is the virtual disk number) comes back as 3, we can assume we're all happy. But I thought maybe some helpful soul out there might have already written something more sophisticated for monitoring OpenManage, and they surely have - I settled on check_openmanage as a nice one.
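For the record, the "simple SNMP check" idea could look roughly like the sketch below. It is not what I ended up using, and apart from "3 means happy" the status mapping is my assumption about Dell's status values (4 = non-critical, 5 = critical, 6 = non-recoverable), so check it against the MIB before trusting it. The function names are made up; the exit codes are the usual Nagios plugin ones (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```shell
#!/bin/sh
# Sketch of a home-made Nagios check for virtualDiskRollUpStatus.
ROLLUP_OID=".1.3.6.1.4.1.674.10893.1.20.140.1.1.19"

# Map a roll-up status number to a Nagios message and exit code.
# Only "3 = OK" comes from the post; the other values are assumptions.
rollup_to_nagios() {
    case "$1" in
        3)   echo "OK - virtual disk roll-up status is ok"; return 0 ;;
        4)   echo "WARNING - virtual disk roll-up status is non-critical"; return 1 ;;
        5|6) echo "CRITICAL - virtual disk roll-up status is $1"; return 2 ;;
        *)   echo "UNKNOWN - unexpected roll-up status '$1'"; return 3 ;;
    esac
}

# check_vdisk HOST [COMMUNITY] [INDEX]  (hypothetical helper)
check_vdisk() {
    host=$1
    community=${2:-public}
    index=${3:-1}
    # An empty status (timeout, wrong community, ...) falls through to UNKNOWN.
    status=$(snmpget -v 1 -c "$community" -Oqv "$host" "$ROLLUP_OID.$index" 2>/dev/null) || status=""
    rollup_to_nagios "$status"
}
```

Calling check_vdisk my-server public 1 would then give Nagios a one-line status and an exit code it understands.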

So on the Nagios server, I did this:

# cd /usr/local/libexec/nagios/
# wget http://folk.uio.no/trondham/software/check_openmanage-3.6.0/check_openmanage
# chmod +x check_openmanage
# ./check_openmanage -H my-server
OK - System: 'PowerEdge 2900 III', SN: 'XXXXXX1S', 2 GB ram (2 dimms), 2 logical drives, 4 physical drives
# vi /usr/local/etc/nagios/commands.cfg

Add these lines:

# 'check_openmanage' command definition
define command{
        command_name    check_openmanage
        command_line    $USER1$/check_openmanage -H $HOSTADDRESS$
        }

Then edit the server's .cfg file to call the plugin:

define service{
        use                     local-service         ; Name of service template to use
        host_name               my-server
        service_description     OpenManage Status
        check_command           check_openmanage
        }

Then restart Nagios, wait until it polls the server, and enjoy the nice green output. Yay!
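A typo in commands.cfg will stop Nagios from starting, so it's worth letting it check its own configuration before the restart. A small sketch, assuming the main config lives at /usr/local/etc/nagios/nagios.cfg (a guess based on the commands.cfg path above); verify_and_restart is a made-up name, and you pass it whatever restart command your system uses.

```shell
#!/bin/sh
# Sketch: only restart Nagios if "nagios -v" accepts the configuration.
# The default path is an assumption; override NAGIOS_CFG if yours differs.
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/etc/nagios/nagios.cfg}"

# verify_and_restart RESTART_COMMAND [ARGS...]  (hypothetical helper)
verify_and_restart() {
    if output=$(nagios -v "$NAGIOS_CFG" 2>&1); then
        echo "config OK, restarting"
        "$@"    # run the restart command supplied by the caller
    else
        # Show what nagios complained about and leave the running daemon alone.
        printf '%s\n' "$output" >&2
        return 1
    fi
}
```

For example: verify_and_restart /usr/local/etc/rc.d/nagios restart (that rc.d path is also an assumption).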
