Sunday, November 28, 2010

ReadyNAS SNMP agent dies... aaarrgggh

If you've been following along, you'll be aware that I set up Nagios monitoring of our ReadyNAS units via SNMP. Happiness ensues! Until Nagios starts spitting out warnings:

readyNAS temp is UNKNOWN
SNMP problem - No data received from host

Oh crud. The box is still happily clicking along: it responds to pings, and FrontView (the web management interface) is still working. Strangest of all, most of SNMP is still responding:

$ snmpwalk -v1 -cpublic em-nas system
SNMPv2-MIB::sysDescr.0 = STRING: Linux em-nas 2.6.17.14ReadyNAS #1 Wed Sep 22 04:42:09 PDT 2010 padre
SNMPv2-MIB::sysObjectID.0 = OID: NET-SNMP-MIB::netSnmpAgentOIDs.10
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (50273528) 5 days, 19:38:55.28
SNMPv2-MIB::sysContact.0 = STRING: root
[snip all the exciting rest of the system section of the MIB]

But when you try to walk the ReadyNAS-specific section of the MIB:

$ snmpwalk -v1 -cpublic em-nas enterprises.4526
[nothing!]

Hmmm... taking a shrewd guess, the ReadyNAS section of the MIB is probably implemented as a sub-agent, and that sub-agent has died. Let's have a poke around... reading /etc/init.d/snmpd, sure enough, that script starts up the usual snmpd and snmptrapd AND /usr/sbin/readynas-agent - ahah, so is this process running?

em-nas:~# ps axwu | grep [a]gent
[nothing!]

So a quick solution:

em-nas:~# /etc/init.d/snmpd restart
em-nas:~# ps axwu | grep [a]gent
root     29772  0.1  1.3  9600 3168 ?        S    09:28   0:00 /usr/sbin/readynas-agent
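
A side note on the `grep [a]gent` idiom, since it looks odd at first. The pattern `[a]gent` is a regular expression that matches the text "agent", but grep's own command line contains the literal characters "[a]gent", which that pattern does not match, so grep never lists itself. The sketch below shows the effect on two made-up process listings (the grep lines and their PIDs are invented for illustration):

```shell
# Simulated "ps" output when searching with a plain pattern:
# the grep process itself shows up as a second match.
plain='root 29772 /usr/sbin/readynas-agent
root 29801 grep agent'

# Simulated "ps" output when searching with the bracketed pattern:
# grep's command line now reads "[a]gent", which the regex does not match.
bracket='root 29772 /usr/sbin/readynas-agent
root 29801 grep [a]gent'

count_plain=$(printf '%s\n' "$plain" | grep -c 'agent')
count_bracket=$(printf '%s\n' "$bracket" | grep -c '[a]gent')

echo "plain pattern matched $count_plain lines"
echo "bracketed pattern matched $count_bracket lines"
```

Quoting the pattern, as in `grep '[a]gent'`, is slightly safer: an unquoted `[a]gent` would be expanded by the shell if a file called `agent` happened to exist in the current directory.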

Now check that the ReadyNAS MIB works again:

$ snmpwalk -v1 -cpublic em-nas enterprises.4526.18.7.1
SNMPv2-SMI::enterprises.4526.18.7.1.1.1 = INTEGER: 1
SNMPv2-SMI::enterprises.4526.18.7.1.2.1 = STRING: "Volume C"
SNMPv2-SMI::enterprises.4526.18.7.1.3.1 = STRING: "RAID Level X"
SNMPv2-SMI::enterprises.4526.18.7.1.4.1 = STRING: "ok"
SNMPv2-SMI::enterprises.4526.18.7.1.5.1 = INTEGER: 2837504
SNMPv2-SMI::enterprises.4526.18.7.1.6.1 = INTEGER: 2177024

Yep, that's got it. Now to follow up: why does readynas-agent crash?
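
Until the root cause turns up, a stopgap could be a small watchdog run from cron on the NAS. This is only a sketch under assumptions: the function names and the `RESTART_CMD` variable are made up here, and it assumes the agent always appears in `ps` with its full path, as it did above.

```shell
#!/bin/sh
# Watchdog sketch: restart snmpd when readynas-agent is missing.
# agent_listed, restart_if_dead and RESTART_CMD are hypothetical names.

# Succeeds if the process listing passed as $1 mentions the agent binary.
agent_listed() {
    printf '%s\n' "$1" | grep -q '/usr/sbin/readynas-agent'
}

# Takes a process listing; restarts snmpd (whose init script also starts
# the agent) when the agent is not in the listing.
restart_if_dead() {
    if agent_listed "$1"; then
        echo "readynas-agent is running"
    else
        echo "readynas-agent is missing, restarting snmpd"
        ${RESTART_CMD:-/etc/init.d/snmpd restart}
    fi
}

# On the NAS, from cron:  restart_if_dead "$(ps axwu)"
```

Passing the listing in as an argument keeps the check easy to try out by hand with a pasted `ps` line before trusting it in cron. It only papers over the crash, of course; it does not explain it.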

Thursday, November 25, 2010

g15macro on Ubuntu 10.04.1

I have this fancy G15 keyboard from Logitech, which (theoretically) should allow me to record macros bound to the extra G keys. What's supposed to happen is that you install the g15macro package (and some dependencies) and set g15macro as a Startup Application. Then you can hit the MR (macro record?) key, type a macro, hit a G key and voilà! Your macro is bound to a G key. For an LDAP dude like me, this could save me typing 'ou=People,dc=example,dc=com,dc=au' a few times a day. Sounds like it's full of win!

Of course, things are rarely that easy. I simply never could get it to work. Once I tried running it from the terminal, I could see that after its first run (once it had written out ~/.g15macro/g15macro.conf) it would segfault as soon as it was run. Bugger!

Soon enough, googling around found me plenty of links showing that the answer was to modify g15macro.c, comment out a line that was causing the crash, and hey presto!

Being a moderately experienced Linux user, I tried to install the g15macro-dev package... nope, not found in the Ubuntu repos. OK, download source, uninstall the packaged g15macro, compile and install the new version. Along the way I hit a few dependencies for the compilation:

sudo apt-get remove g15macro
sudo apt-get install libg15daemon-client-dev libg15-dev libg15render-dev
sudo apt-get install libfreetype6-dev libxtst-dev
cd Downloads/g15macro-1.0.3/
./configure 
make check
sudo make install


Easy as pi, though compiling software from tarballs rather than using apt-get makes me feel like I've travelled back in time.

Wednesday, November 17, 2010

Bash scripting, test and OR

Rsync exits with 0 if the transfer was successful, or a non-zero value if there was a "problem". I say "problem" with quoties because one condition that can (and does) occur regularly on our mail server is when mail files get deleted while the backup is running - they "vanish". And this isn't really an error, but rsync exits with error code 24... and if your backup script does this:

rsync --lots-of-options "$SRC" "$DEST"
EXIT=$?
if [ $EXIT -eq 0 ]
then 
    echo "Yay, we are all good: [$EXIT]"
else
    echo "Oh noes, bad things happened: [$EXIT]"
fi

...you end up with spurious error reports giving you high blood pressure.
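
For reference, the rsync man page lists exit code 24 as "Partial transfer due to vanished source files". The false alarm can be reproduced without running a real backup. In this sketch `fake_rsync` and `naive_check` are hypothetical helpers, with `fake_rsync` simply standing in for an rsync run that hit vanished files:

```shell
# Stand-in for an rsync run whose source files vanished mid-transfer.
# Hypothetical helper: it copies nothing and just returns status 24.
fake_rsync() {
    return 24
}

# The same success-only check as the backup script above,
# wrapped in a function so any command can be passed in.
naive_check() {
    "$@"
    EXIT=$?
    if [ $EXIT -eq 0 ]
    then
        echo "Yay, we are all good: [$EXIT]"
    else
        echo "Oh noes, bad things happened: [$EXIT]"
    fi
}

naive_check fake_rsync   # prints: Oh noes, bad things happened: [24]
```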

So the obvious solution is to make the if condition for error code 0 or 24, and only spit out "Oh Noes" if it was something different. It took me a while to get the syntax right:

if [ \( $EXIT -eq 0 -o $EXIT -eq 24 \) ]