Tuesday, August 23, 2011

Monitoring CPU load with SNMP

Or: Nothing Is Ever Easy.

We have a small farm of Citrix servers. They run a particular app for about 130 users. After a recent upgrade to the app, we are beginning to suspect that the new version of the app is putting more load on the CPUs. Alas, we have no historical data to refer to... but doesn't it sound like the kind of thing that some Cacti graphs would be perfect for? For example, in this next graph of our network traffic, you can see a sudden jump in outbound network traffic at the end of March - that's when the Bacula backup system became live:

So for spotting trends, and detecting changes, this kind of graph is invaluable.

Since we already have Cacti, why not poll the Citrix servers for CPU load, and graph that too... maybe also memory use... all sounds good, right? All you have to do is enable SNMP on the Windows host, figure out the OIDs of each CPU (each core counts as a CPU) and hey presto, graphs! As we'll see, it's not that easy.

OK, first step: enable SNMP on your Windows host - go to Start -> Control Panel -> Add/Remove Programs -> Windows Components -> Management and Monitoring Tools and make sure "Simple Network Management Protocol" is selected:

Next step: Go to Services, scroll down to SNMP Service, and right click, select "Properties"

Now click the Agent tab, and in the Service section, enable Physical:

Without Physical selected, the SNMP service will not report on physical hardware components such as the CPUs or memory. Strangely enough, it will report on physical hardware such as disks. I'm still scratching my head over that one.

OK, so now we should be able to get some info on the CPU load. The MIB that deals with this info is the HOST-RESOURCES-MIB and the interesting bits relating to CPU load are in the hrProcessor table. Let's take a walk... an snmp-walk:

pyarra@iceberg:~$ snmpwalk -v1 -cpublic windows-server 1.3.6.1.2.1.25.3.3
HOST-RESOURCES-MIB::hrProcessorFrwID.12 = OID: SNMPv2-SMI::zeroDotZero
HOST-RESOURCES-MIB::hrProcessorFrwID.13 = OID: SNMPv2-SMI::zeroDotZero
HOST-RESOURCES-MIB::hrProcessorFrwID.14 = OID: SNMPv2-SMI::zeroDotZero
HOST-RESOURCES-MIB::hrProcessorFrwID.15 = OID: SNMPv2-SMI::zeroDotZero
HOST-RESOURCES-MIB::hrProcessorLoad.12 = INTEGER: 30
HOST-RESOURCES-MIB::hrProcessorLoad.13 = INTEGER: 15
HOST-RESOURCES-MIB::hrProcessorLoad.14 = INTEGER: 20
HOST-RESOURCES-MIB::hrProcessorLoad.15 = INTEGER: 35

Cool! We're all done, right? Well... no. You see, as the Net-snmp doco makes clear, the indexes into this table (12,13,14,15 in this example) are the device IDs of the CPUs. As such, this table is sparse - there won't be entries for other device IDs. That's okay, so long as we know the OIDs we want, no dramas. Sort of... except that when you look at how Windows enumerates those device IDs, you'll find that printers get enumerated before CPUs:

pyarra@iceberg:~$ snmpwalk -v1 -cpublic em-fap HOST-RESOURCES-MIB::hrDeviceDescr
HOST-RESOURCES-MIB::hrDeviceDescr.1 = STRING: Microsoft XPS Document Writer
HOST-RESOURCES-MIB::hrDeviceDescr.2 = STRING: Xerox Phaser 8560DT
HOST-RESOURCES-MIB::hrDeviceDescr.3 = STRING: FX DocuPrint C2100 PCL 6
HOST-RESOURCES-MIB::hrDeviceDescr.4 = STRING: HP Universal Printing PCL 6
HOST-RESOURCES-MIB::hrDeviceDescr.5 = STRING: HP Universal Printing PCL 6
HOST-RESOURCES-MIB::hrDeviceDescr.6 = STRING: Dell Laser Printer 1720dn
HOST-RESOURCES-MIB::hrDeviceDescr.7 = STRING: FX DocuPrint C2100 PCL 6
HOST-RESOURCES-MIB::hrDeviceDescr.8 = STRING: HP LaserJet 4
HOST-RESOURCES-MIB::hrDeviceDescr.9 = STRING: HP Universal Printing PCL 6
HOST-RESOURCES-MIB::hrDeviceDescr.10 = STRING: HP Universal Printing PCL 6
HOST-RESOURCES-MIB::hrDeviceDescr.11 = STRING: HP Universal Printing PCL 6
HOST-RESOURCES-MIB::hrDeviceDescr.12 = STRING: Intel
HOST-RESOURCES-MIB::hrDeviceDescr.13 = STRING: Intel
HOST-RESOURCES-MIB::hrDeviceDescr.14 = STRING: Intel
HOST-RESOURCES-MIB::hrDeviceDescr.15 = STRING: Intel

On reboot, if you have added a printer... yes, the device IDs get re-enumerated, and suddenly your CPU device IDs get changed. Some brave souls have attempted to find ways to deal with this. Me, I dunno if it's worth it.
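
One way to sidestep the renumbering, as a rough sketch: don't store the CPU indexes at all. Walk hrProcessorLoad on every poll and work with whatever rows come back. The helper name avg_cpu_load is made up for illustration, and it only understands snmpwalk's text output in the format shown above.

```shell
#!/bin/sh
# avg_cpu_load (hypothetical helper): reads snmpwalk output on stdin and
# prints the mean hrProcessorLoad across all CPU rows, whatever their indexes.
# Exits non-zero if no hrProcessorLoad rows were seen.
avg_cpu_load() {
    awk '/hrProcessorLoad\.[0-9]+ = INTEGER:/ { sum += $NF; n++ }
         END { if (n == 0) exit 1; printf "%d\n", sum / n }'
}

# Assumed usage; host name and community string are placeholders:
#   snmpwalk -v1 -cpublic windows-server HOST-RESOURCES-MIB::hrProcessorLoad | avg_cpu_load
```

Fed the four hrProcessorLoad lines from the walk above, this prints 25. Averaging hides per-core detail, but the graph survives a printer being added.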

For the record, Linux doesn't re-number the CPU OIDs on a reboot, at least, not when I added a network interface. It seems like the CPU OIDs start at 768 regardless of what else there is... I really am curious about why they start there. I'm sure it's some sort of fixed offset to avoid OID renumbering issues, but why 768? I would have expected a power of 2. Maybe I think too much like a programmer.

Anyhow, back to Windows: I also have my doubts about how useful the moment-to-moment CPU load will be, since sampling it at one- or five-minute intervals doesn't give us a great overall picture of CPU load. What we really need is something more like the Unix load average figures. And lo and behold, UCD MIBs implement this in the laTable in a range of helpful ways:

pyarra@verbena:~$ snmpwalk -v1 -cpublic localhost .1.3.6.1.4.1.2021.10
UCD-SNMP-MIB::laIndex.1 = INTEGER: 1
UCD-SNMP-MIB::laIndex.2 = INTEGER: 2
UCD-SNMP-MIB::laIndex.3 = INTEGER: 3
UCD-SNMP-MIB::laNames.1 = STRING: Load-1
UCD-SNMP-MIB::laNames.2 = STRING: Load-5
UCD-SNMP-MIB::laNames.3 = STRING: Load-15
UCD-SNMP-MIB::laLoad.1 = STRING: 1.19
UCD-SNMP-MIB::laLoad.2 = STRING: 1.12
UCD-SNMP-MIB::laLoad.3 = STRING: 0.94
UCD-SNMP-MIB::laConfig.1 = STRING: 12.00
UCD-SNMP-MIB::laConfig.2 = STRING: 14.00
UCD-SNMP-MIB::laConfig.3 = STRING: 14.00
UCD-SNMP-MIB::laLoadInt.1 = INTEGER: 118
UCD-SNMP-MIB::laLoadInt.2 = INTEGER: 112
UCD-SNMP-MIB::laLoadInt.3 = INTEGER: 93
UCD-SNMP-MIB::laLoadFloat.1 = Opaque: Float: 1.190000
UCD-SNMP-MIB::laLoadFloat.2 = Opaque: Float: 1.120000
UCD-SNMP-MIB::laLoadFloat.3 = Opaque: Float: 0.940000
UCD-SNMP-MIB::laErrorFlag.1 = INTEGER: noError(0)
UCD-SNMP-MIB::laErrorFlag.2 = INTEGER: noError(0)
UCD-SNMP-MIB::laErrorFlag.3 = INTEGER: noError(0)
UCD-SNMP-MIB::laErrMessage.1 = STRING: 
UCD-SNMP-MIB::laErrMessage.2 = STRING: 
UCD-SNMP-MIB::laErrMessage.3 = STRING: 

So now, I'm beginning to think that net-snmp on Windows might be the way to go. Time to create some VM images and start experimenting!
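
For graphing, only one number per poll is needed. Here is a minimal sketch that pulls a single load average out of the laTable walk by its laNames label, rather than assuming which row is which. The helper name la_value is hypothetical, and it assumes the unquoted STRING output shown above.

```shell
#!/bin/sh
# la_value (hypothetical helper): reads a laTable snmpwalk on stdin and prints
# the laLoad value whose laNames label matches $1 (e.g. Load-5).
la_value() {
    awk -v want="$1" '
        /laNames\.[0-9]+ = STRING:/ {
            i = $1; sub(/.*\./, "", i)      # row index is the last OID component
            if ($NF == want) idx = i
        }
        /laLoad\.[0-9]+ = STRING:/ {
            i = $1; sub(/.*\./, "", i)
            if (i == idx) { print $NF; found = 1 }
        }
        END { exit !found }'
}

# Assumed usage, same walk as above:
#   snmpwalk -v1 -cpublic localhost .1.3.6.1.4.1.2021.10 | la_value Load-5
```

With the walk output above, la_value Load-5 prints 1.12.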


Tuesday, August 9, 2011

FreeBSD, ALTQ and SNMP

Some history: we use FreeBSD (actually, a cut-down version called nanoBSD) to route and shape our WAN traffic. Works like a charm. ALTQ is the magic kernel bits that make the queuing work. It's a lot like the Linux tc stuff: set up queues, assign queuing disciplines, then push traffic into the appropriate queues based on certain criteria - in our case, usually related to port numbers and IP addresses.

Anyhow, all good, it works just like you'd expect. We can use pftop -v queues to watch in realtime how much traffic is passing through (and being dropped by) each queue.

Then I got ambitious, and decided it'd be really helpful to use Cacti to graph the queue stats. We already do this for overall traffic throughput on the interfaces, and it's handy. We just use SNMP to poll the interface counters, Cacti makes a nice graph, and we can see what's going where, when.

However, that's where things started to get a little complicated. You see, the ALTQ SNMP implementation was incomplete for a while. Specifically, this is what happens if you try to walk the ALTQ section of the MIB:

$ snmpwalk -v1 -cpublic my-router .1.3.6.1.4.1.12325.1.200.1.9.2.1
SNMPv2-SMI::enterprises.12325.1.200.1.9.2.1.2.1 = STRING: "NoRouteIPs"
SNMPv2-SMI::enterprises.12325.1.200.1.9.2.1.2.2 = STRING: "Sequencers"
SNMPv2-SMI::enterprises.12325.1.200.1.9.2.1.3.1 = INTEGER: 2
SNMPv2-SMI::enterprises.12325.1.200.1.9.2.1.3.2 = INTEGER: 2
SNMPv2-SMI::enterprises.12325.1.200.1.9.2.1.4.1 = Timeticks: (1198511000) 138 days, 17:11:50.00
SNMPv2-SMI::enterprises.12325.1.200.1.9.2.1.4.2 = Timeticks: (1198511000) 138 days, 17:11:50.00
[snip]
SNMPv2-SMI::enterprises.12325.1.200.1.9.2.1.20.1 = Counter64: 0
SNMPv2-SMI::enterprises.12325.1.200.1.9.2.1.20.2 = Counter64: 0
Error in packet.
Reason: (genError) A general failure occured
Failed object: SNMPv2-SMI::enterprises.12325.1.200.1.9.2.1.20.2

Sub-optimal! This was occurring because the section of code that would walk through the ALTQ table was implemented like this:

int
pf_tbladdr(struct snmp_context __unused *ctx, struct snmp_value __unused *val,
        u_int __unused sub, u_int __unused vindex, enum snmp_op __unused op)
{
        return (SNMP_ERR_GENERR);
} 

Yep, that'll do it. The source file is /usr/src/usr.sbin/bsnmpd/modules/snmp_pf/pf_snmp.c. Fortunately, version 1.14 of pf_snmp.c has a proper implementation. So which release is it in? Sadly, not in FreeBSD releases 8.x, but it is in FreeBSD 9.0 Beta 1. Sooo, I thought I'd run that up and give it a crack.

Had to re-compile my kernel to support ALTQ, with reference to this article and this one too, as I'm something of a FreeBSD n00b. Then reboot and...

Enabling pfpanic mutex pf task mtx owned at /usr/src/sys/contrib/pf/net/if_pfsync.c:3163
cpuid = 0
KDB: enter: panic
[ thread pid 961 tid 100059 ]
Stopped at    kdb_enter+0x3a:  movl    $0,kdb_why
db>

Oh well, nice try. I guess I'll check back when 9.0 is stable.

Tuesday, June 28, 2011

Stop Windows Search from indexing a specific folder

One of our Windows 2003 servers started logging errors in the Event Log, with a search service trying to index the contents of D:\Program Files\Trend Micro\ and failing.

I'd previously told "Indexing Service" to exclude those folders (via its snap-in to the Computer Management utility), and the service is now even disabled.

It appears that the problem was now with a different service, called "Windows Search", which apparently supersedes the old Indexing Service. All the doco mentions using GPO to configure it (!) but it turns out there's a control panel applet, and the Trend sub-folder PCCSRV was listed there specifically for indexing. I wonder if this happened because it is shared over the network? Anyhow, I removed that and it now seems to be running without errors.

Bloody computers - no sooner do you get the hang of how something works, and some bright spark goes and supersedes it!

Friday, May 27, 2011

RT4 on Ubuntu 10.04, part 1

I want to try RT4, but the pre-packaged version on Ubuntu 10.04 is RT3.8. What's a fella to do? Why, install from source, of course!

Download the tarball, extract (tar zxf, in case you were wondering :-) and do the configure dance (./configure --with-db-type=Pg)

Then there are some perl modules it needs. Some are available as packages, which are:

apt-get install build-essential
apt-get install libdatetime-perl
apt-get install libclass-returnvalue-perl
apt-get install libemail-address-perl
apt-get install libtext-quoted-perl libtext-wrapper-perl
apt-get install liblog-dispatch-perl libtree-simple-perl
apt-get install libtest-template-perl
apt-get install libtext-template-perl
apt-get install libuniversal-require-perl
apt-get install libnet-cidr-perl
apt-get install libdevel-globaldestruction-perl
apt-get install liblocale-maketext-lexicon-perl liblocale-maketext-fuzzy-perl
apt-get install libcss-squish-perl
apt-get install libregexp-common-perl
apt-get install libcache-simple-timedexpiry-perl
apt-get install libgraph-writer-graphviz-perl
apt-get install libhtml-scrubber-perl
apt-get install libmodule-versions-report-perl
apt-get install libgd-graph-perl libgd-graph3d-perl libgd-gd2-perl
apt-get install libnet-cidr-lite-perl
apt-get install libipc-run3-perl
apt-get install libtext-password-pronounceable-perl
apt-get install libfile-sharedir-perl
apt-get install libregexp-common-time-perl libregexp-copy-perl libregexp-optimizer-perl libregexp-shellish-perl libtie-regexphash-perl libppix-regexp-perl libregexp-assemble-perl
apt-get install libhtml-mason-perl
apt-get install libemail-mime-perl
apt-get install libmime-tools-perl libmime-perl libmime-lite-perl libmime-explode-perl libmime-encwords-perl libmime-charset-perl libemail-mime-creator-perl libemail-mime-createhtml-perl
apt-get install libdatetime-format-builder-perl
apt-get install libfcgi libfcgi-perl libfcgi-procmanager-perl
apt-get install libdbix-searchbuilder-perl
apt-get install libyaml-syck-perl
apt-get install libdbd-pg-perl



Then after all that, you'll need to configure CPAN:



/usr/bin/perl -MCPAN -e shell

Then run fixdeps to grab the remaining modules:

make fixdeps
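
Before and after fixdeps, it can be handy to check by hand whether a given module is loadable. A small sketch follows; have_perl_module is a name I've made up, and the three modules in the loop are just examples taken from the package list above.

```shell
#!/bin/sh
# have_perl_module (hypothetical helper): succeeds if perl can load module $1.
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Report which of a few example modules are still missing.
for mod in DateTime HTML::Mason DBIx::SearchBuilder; do
    have_perl_module "$mod" || echo "missing: $mod"
done
```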

That's as far as I've got so far, more fun to follow, I'm sure.

Wednesday, May 25, 2011

Perl installation WTF?

Sometimes, you just want to sit back and let CPAN do its magic. Don't care, just go fetch the pre-requisite packages so I can try RequestTracker. Then something whizzes past that makes you sit up and think "What was that??"

It was this:

Install module Devel::GlobalDestruction
Running install for module 'Devel::GlobalDestruction'

Global Destruction? Yeah, okay, that sounds fine too :-)

Wednesday, May 11, 2011

Nagios monitoring HP Proliant running Ubuntu 10.04 Linux

A while back I wrote about how to get the HP bits and pieces installed on Ubuntu 10.04. Now the next step: getting Nagios to poll it via SNMP and report if things are okay.

OK, I'm assuming you already have net-snmp installed (check it: snmpwalk -v1 -cpublic your.host.name system - should return basic SNMP info such as sysLocation, contact name etc).

Now run /sbin/hpsnmpconfig (installed by package hp-snmp-agents) and answer the prompts. Note that the initial question "Do you wish to use an existing snmpd.conf" fooled me the first few times through - answer "n" to have it make changes to your existing snmpd.conf file - is it just me, or is this question confusing?

Anyhow... after that it was all smooth sailing, I answered with the config I wanted, and hey presto, it made its changes to snmpd.conf

Then from another box, try a few snmpwalks to make sure the HP sub-agent is answering:

snmpwalk -v 1 -c public my.host.name 1.3.6.1.4.1.232 <-- this is where HP's SNMP subtree (arc, if you prefer the technical term) starts

You will get an abundance of SNMP output, all of it pertaining to HP components.


Then add the check_hp plugin to your Nagios box - download the file, extract the tarball, cp check_hp /path/to/nagios/executables (for us, on nanoBSD, /usr/local/libexec/nagios/). Note I had to edit the check_hp script to make the first "use lib" point to /usr/local/libexec/nagios/ - if this is wrong, check_hp will complain that it can't find utils.pm

Then test:
nagios# /usr/local/libexec/nagios/check_hp -H my.host.name
Compaq/HP Agent Check: overall system state OK

Woohoo! Now add the command definition to commands.cfg (for us, /usr/local/etc/nagios/commands.cfg):

# 'check_hp' command definition
define command{
        command_name    check_hp
        command_line    $USER1$/check_hp -H $HOSTADDRESS$
        }

Then add the check to the host in question:

define service{
        use                             local-service         ; Name of service template to use
        host_name                       my.host.name
        service_description             HP System State
        check_command                   check_hp
        }

Restart Nagios, and watch the newly-added check get run, and (hopefully) green bits appear on screen.
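
Under the hood, a Nagios plugin such as check_hp just prints one line of status text and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The sketch below shows that contract for a single integer reading. check_threshold and its arguments are hypothetical and have nothing to do with how check_hp itself is written.

```shell
#!/bin/sh
# check_threshold (hypothetical helper): compares an integer reading against
# warning and critical thresholds, prints a status line and returns the
# matching Nagios plugin exit code.
check_threshold() {
    value=$1 warn=$2 crit=$3
    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL - value is $value"
        return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING - value is $value"
        return 1
    fi
    echo "OK - value is $value"
    return 0
}

# Assumed usage inside a plugin script:
#   check_threshold "$reading" 50 80; exit $?
```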

If you're interested in what exactly the check_hp plugin is monitoring, you can run it manually with the -d flag to see debug output about the SNMP values checked. The script is also pretty easy to read, so have a look through at what is being checked. I also found a neat-o guide to the HP/Compaq MIBs which is worth a read (and considerably easier than reading the MIB files, which are enormous!)

And now that SNMP is working properly, the previously-blank System Management page (accessed at https://your.host.name:2381) will be full of useful info about your hardware. Yep, it uses SNMP to find out about the system state too. Busted SNMP == no info. No wonder I never found it very useful before!

Also a useful manual from HP here - I wish I'd had this in the first place!

Tuesday, March 22, 2011

Compiling bacula natively on the ReadyNAS

It works! Here's the Reader's Digest version:

apt-get install libc6-dev 
apt-get install gcc 
apt-get install make autoconf automake libtool flex bison gdb
apt-get install libtag1-dev 
apt-get install uuid-dev
apt-get install g++

wget http://wwwmaster.postgresql.org/redir/198/h/source/v8.4.7/postgresql-8.4.7.tar.gz
tar zxf postgresql-8.4.7.tar.gz 
cd postgresql-8.4.7
./configure --without-readline --without-zlib
make install
/usr/local/pgsql/bin/psql --version
wget http://sourceforge.net/projects/bacula/files/bacula/5.0.1/bacula-5.0.1.tar.gz/download
tar zxf bacula-5.0.1.tar.gz 
cd bacula-5.0.1
./configure --prefix=/usr/local/bacula --with-postgresql=/usr/local/pgsql/ --build=sparc-linux && make install
/usr/local/bacula/sbin/bacula-sd --help  
tar zcf bacula-5.0.1-sparc.tgz /usr/local/bacula/
tar zcf postgresql-8.4.7-sparc.tgz /usr/local/pgsql/

Then I copied both of these tarballs of binaries over to a production ReadyNAS. It already had the bacula add-on installed, so I left that in place (so I could re-use the init script and config) and disabled it from running via Frontview. I changed to / and unpacked the tarballs, then tried manually running bacula-sd with the existing config. It worked!!!!

The big test was to then try a restore, since that was the bit I was trying to get working. I decided to try restoring a subset of a previous backup to a remote server. It worked. Flawlessly. I then proceeded outside and did a victory lap of my yard.

As a clean-up: I copied my original bacula-sd.conf to /etc/bacula, and modified the add-on-installed init script to run /usr/local/bacula/sbin/bacula-sd instead of /opt/rfw/sbin/bacula. I added exit 0 near the top of the installed bacula-dir and bacula-fd init scripts, just in case.

I also used update-rc.d to get the bacula-sd init script to start at boot, though... it didn't. Still puzzling over that one, but eh, I'll get there.

One minor nit: when you use the init script to stop bacula, it reports an error:

rm-nas-1:~# /etc/init.d/bacula-sd stop
Stopping Bacula Storage Daemon: start-stop-daemon: warning: failed to kill 7948: No such process
start-stop-daemon: warning: failed to kill 7946: No such process
bacula-sd

I think this is because killing one of the three running bacula-sd processes (old-school: uses multiple processes rather than threads, it would seem) causes the others to exit, and they're gone before the kill process can zap their process IDs. Not a problem, just caused me a few minutes' puzzlement.
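
If the warnings bother you, a stop routine can simply tolerate PIDs that have already vanished. This is only a sketch of the idea, not what the ReadyNAS init script does; stop_pids is a hypothetical name.

```shell
#!/bin/sh
# stop_pids (hypothetical helper): sends SIGTERM to each PID given and
# quietly ignores any that have already exited.
stop_pids() {
    for pid in "$@"; do
        # kill fails if the process is already gone; that is fine here.
        kill "$pid" 2>/dev/null || true
    done
    return 0
}

# Assumed usage:
#   stop_pids 7946 7948
```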

Cross-compiling: I admit defeat

After days - and I mean days - of attempting to cross-compile sparc binaries for the ReadyNAS NV+ on an Ubuntu 10.04 build host, I've achieved little, other than remembering how much I loathe the old
./configure --with-billions-of-options && make
dance. For the record, here's what I tried.

Before I could compile bacula, I had to compile a database, as the configure script for bacula requires you to link against at least one of the available DBs (I suspect this is only needed when you are compiling the director, but there seems to be no make target to compile only the storage daemon). I initially tried to install libpq, but of course, that installed the intel version suited to my build host - doh! So I had to cross-compile postgresql, which is okay with me, as I have some familiarity with compiling it. I downloaded the source for 8.4.7 and made that:

./configure --prefix=/usr/local/sparc-builds/postgresql --host=sparc-linux-gnu
make install
root@sparc-cc-10-04:~# file /usr/local/sparc-builds/postgresql/lib/libpq.so.5.2
/usr/local/sparc-builds/postgresql/lib/libpq.so.5.2: ELF 32-bit MSB shared object, SPARC32PLUS, V8+ Required, version 1 (SYSV), dynamically linked, not stripped

Hmmmm... file tells me this is a sparc binary, but it looks like it's built for a much beefier CPU than the ReadyNAS has. So I grabbed /bin/ls from a ReadyNAS and compared what kind of binary that is:

root@sparc-cc-10-04:/tmp# file ls
ls: ELF 32-bit MSB executable, SPARC, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.2.0, stripped

Hmmm... I ignored this for a bit, and decided to continue with bacula compilation: so long as I had a running bacula-sd, who cares what file says?
./configure --with-postgresql=/usr/local/sparc-builds/postgresql --host=sparc-linux-gnu --prefix=/usr/local/sparc-builds/bacula ; make install
file /usr/local/sparc-builds/bacula/sbin/bacula-sd 
/usr/local/sparc-builds/bacula/sbin/bacula-sd: ELF 32-bit MSB executable, SPARC32PLUS, V8+ Required, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.8, not stripped

Undeterred, I copied this to the ReadyNAS, just to try executing it.... no go:

rg-nas:~# /tmp/bacula-sd 
-bash: /tmp/bacula-sd: cannot execute binary file

After much fruitless thrashing about, including setting CFLAGS='-mcpu=v7' and assorted incantations of -Av7, and a lot of manpage reading, I was still no closer to getting a binary that wasn't targeted at the SPARC32PLUS. I know the ReadyNAS unit is a cut-down sparc CPU - a neon, or a leon, or whatever. I figured V7 code would be a safe bet to run on it, and despite the man page for gcc stating "By default (unless configured otherwise), GCC generates code for the V7 variant of the SPARC architecture", and despite setting that CPU type explicitly, nothing would coax a V7 binary out of the process. I started to suspect an issue with libtool or maybe the make scripts... after a while, it dawned on me I should try something simpler:

root@sparc-cc-10-04:~/tmp# cat hello.c 
/* hello.c : prints "hello world" */
#include <stdio.h>
int main(void) {
    printf("hello world\n");
}

root@sparc-cc-10-04:~/tmp# sparc-linux-gnu-gcc -mcpu=v7 hello.c -o hello
root@sparc-cc-10-04:~/tmp# file hello
hello: ELF 32-bit MSB executable, SPARC32PLUS, V8+ Required, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.8, not stripped

And no, it didn't run when I copied it over to the ReadyNAS. Actually, nothing I tried seemed to have any influence on the compiler's binary output:

root@sparc-cc-10-04:~/tmp# file hello
hello: ELF 32-bit MSB executable, SPARC32PLUS, V8+ Required, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.8, not stripped
root@sparc-cc-10-04:~/tmp# export CFLAGS='-mcpu=v7'
root@sparc-cc-10-04:~/tmp# sparc-linux-gnu-gcc  hello.c -o hello
root@sparc-cc-10-04:~/tmp# file hello
hello: ELF 32-bit MSB executable, SPARC32PLUS, V8+ Required, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.8, not stripped

There's probably some trick I don't get to all this, and I'd love to know what it is. However, by the time I got to this point, too many days had wasted away, and I needed to get the backup project back on track. Time for option 2: use the sparc ReadyNAS as the build host to produce its own binaries. I haven't written that up yet, but: it worked :-)
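
Since every failed attempt above came down to eyeballing file output, here is a tiny sketch that does the eyeballing for you. is_v8plus is a made-up name, and all it does is look for the SPARC32PLUS tag that showed up on every binary that refused to run on my ReadyNAS.

```shell
#!/bin/sh
# is_v8plus (hypothetical helper): reads `file` output on stdin and succeeds
# if the binary is tagged SPARC32PLUS (i.e. V8+ required).
is_v8plus() {
    grep -q 'SPARC32PLUS'
}

# Assumed usage on the build host:
#   file ./hello | is_v8plus && echo "V8+ binary, same kind that failed on the NAS"
```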

Tuesday, March 15, 2011

Problems cross-compiling bacula-5.0.1 for ReadyNAS Sparc

It looks as though the ready-made sparc cross-compiler package will not compile Bacula 5.0.1 - stat.h seems to have changed enough that the build fails like this:


==>Entering directory /home/pyarra/bacula-5.0.1/src/findlib
make[1]: Entering directory `/home/pyarra/bacula-5.0.1/src/findlib'
Compiling attribs.c
attribs.c: In function `void encode_stat(char*, stat*, int, int)':
attribs.c:211: error: 'struct stat' has no member named 'st_flags'
attribs.c: In function `int decode_stat(char*, stat*, int32_t*)':
attribs.c:308: error: 'struct stat' has no member named 'st_flags'
attribs.c:310: error: 'struct stat' has no member named 'st_flags'
attribs.c: In function `bool set_attributes(JCR*, ATTR*, BFILE*)':
attribs.c:482: error: 'struct stat' has no member named 'st_flags'
attribs.c:482: error: `chflags' undeclared (first use this function)
attribs.c:482: error: (Each undeclared identifier is reported only once for 
   each function it appears in.)
make[1]: *** [attribs.lo] Error 1
make[1]: Leaving directory `/home/pyarra/bacula-5.0.1/src/findlib'


  ====== Error in /home/pyarra/bacula-5.0.1/src/findlib ====== 

Doh!

My latest attempt to cross-compile for sparc has meant setting up an Ubuntu 10.04 box, and I'm currently following this guide to get the cross-compile tools set up. It's not an area I know much about - although I "get" the theory behind cross-compiling, in the past, I've always just kept a build host for each architecture I'm building for (ultrasparc, alpha) and battled with drain-bamaged build environments as best I could. Hopefully, this initial pain of setting up a cross-compile environment will be worth it.

So far so good, then install libpq-dev (so I can --with-postgresql - you need to specify one DB to support when you compile, as there's no way to compile only the storage daemon).

Then ./configure --with-postgresql --host=sparc-linux-gnu and of course it bombs out here:

checking whether setpgrp takes no argument... configure: error: cannot check setpgrp when cross compiling

Agh, well how about not checking for it then?? I guess one day I'll have to learn how to use autoconf so I can fix this properly, but for now, edit configure to skip this check.
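
A less invasive alternative to editing configure, sketched here and not something I tried at the time: autoconf caches the result of this test in a variable called ac_cv_func_setpgrp_void, and a value supplied in the environment is used in place of running the check. The wrapper name cross_configure is hypothetical, and "yes" is my assumption for a Linux/glibc target, where setpgrp() takes no arguments.

```shell
#!/bin/sh
# cross_configure (hypothetical wrapper): runs the given command with the
# autoconf cache variable for the setpgrp test pre-seeded, so configure
# does not need to execute a test program for the target.
cross_configure() {
    ac_cv_func_setpgrp_void=yes "$@"
}

# Assumed usage:
#   cross_configure ./configure --with-postgresql --host=sparc-linux-gnu
```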

Monday, March 14, 2011

Bacula: mis-matched director and storage daemons, eeep!

The backup solution is coming along nicely, but... ehhh... it turns out there are really good reasons to not mix your Director and Storage Daemon versions:

11-Mar 14:45 rm-bac-1-dir JobId 5: Error: Bacula rm-bac-1-dir 5.0.1 (24Feb10): 11-Mar-2011 14:45:14
  Build OS:               i486-pc-linux-gnu ubuntu 10.04
  Job:                    RestoreFiles.2011-03-11_14.45.12_10
  Restore Client:         wd-server
  Termination:            *** Restore Error ***

11-Mar 14:48 rm-bac-1-dir JobId 6: Start Restore Job RestoreFiles.2011-03-11_14.48.53_13
11-Mar 14:48 rm-bac-1-dir JobId 6: Using Device "rm-nas-1-wd-server"
11-Mar 14:48 rm-nas-1 JobId 6: Fatal error: Bootstrap file error: Keyword Storage not found
            : Line 1, col 7 of file /opt/rfw/var/bacula/working/rm-nas-1.RestoreFiles.2011-03-11_14.48.53_13.3.bootstrap
Storage="rm-nas-1-wd-server"

11-Mar 14:48 rm-nas-1 JobId 6: Fatal error: Error parsing bootstrap file.
11-Mar 14:48 rm-bac-1-dir JobId 6: Fatal error: Bad response to Bootstrap command: wanted 3000 OK bootstrap
, got 3904 Error bootstrap


The ReadyNAS is running version 3.0.3 Storage Daemon. While it will quite happily accept the backup files sent to it, when you try to restore it bombs out - clearly the bootstrap file format has changed sufficiently that this will not work. Darn.

This leaves me with three options:

1. build my own Bacula packages for the NV+ units

2. build my own bacula storage daemon binary (and supporting libs), and
copy this over the older installed bacula on the NV+ units, or remove
packaged versions and make; make install

3. don't use independent bacula storage daemons on each ReadyNAS NV+ and
mount them via NFS

My preferred option is 1 and I am setting up a sparc cross-compile
environment and the ReadyNAS Add-on SDK - while this course has the
highest initial cost (time-wise) it should yield better results, and be
potentially useful later if we decide to package additional software for
the ReadyNAS units.

Option 3 would preclude us using local storage to keep
backup traffic local where that is desired, and would only be my
last-resort option.

You can download a pre-made cross-compile package for Linux here, which I did, and set up a VM to be the cross-compile host. Ubuntu 10.04 failed during configure:

pyarra@ubuntu-sparc-compile:~/bacula-5.0.1$ ./configure --prefix=/home/pyarra/ready-builds/ --host=sparc-linux
[snip oodles of configure messages]
checking whether setpgrp takes no argument... configure: error: cannot check setpgrp when cross compiling

So I've decided to build an Ubuntu 8.04 host to try building with. More on that later!

Thursday, March 10, 2011

Fixed: SNMP on Ubuntu listening only on localhost

By default, snmpd only listens on 127.0.0.1. Edit /etc/default/snmpd and change this line:

SNMPDOPTS='-Lsd -Lf /dev/null -u snmp -g snmp -I -smux -p /var/run/snmpd.pid 127.0.0.1'

to this:

SNMPDOPTS='-Lsd -Lf /dev/null -u snmp -g snmp -I -smux -p /var/run/snmpd.pid'

That is, remove the specification to run only on 127.0.0.1 - then restart snmpd.
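
If you have a few boxes to fix, the same edit can be scripted. A minimal sketch, assuming the SNMPDOPTS line looks exactly like the one above (single-quoted, with 127.0.0.1 as the last word); strip_localhost_bind is a made-up name, and I'd run it against a copy of the file first.

```shell
#!/bin/sh
# strip_localhost_bind (hypothetical helper): filters /etc/default/snmpd
# content from stdin to stdout, dropping a trailing " 127.0.0.1" from the
# SNMPDOPTS line and leaving every other line untouched.
strip_localhost_bind() {
    sed "/^SNMPDOPTS=/s/ 127\.0\.0\.1'/'/"
}

# Assumed usage:
#   strip_localhost_bind < /etc/default/snmpd > /tmp/snmpd.new
```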

I also removed any RW access that had been configured in /etc/snmp/snmpd.conf. I only want read access via SNMP.

Ubuntu Server on HP Proliant - managing RAID

As part of the disk-based backup project, my Bacula Director is going on an Ubuntu 10.04 LTS server on HP Proliant hardware. All well and good, but this is my first Ubuntu Server deployment on HP hardware. Under Dell, I'm used to using the MegaRAID CLI tools to monitor and manage the RAID controller, but was stuck trying to find the equivalent for Ubuntu.

The answer is to use HP's ProLiant Support Pack which offers a bunch of CLI tools. The one I was interested in here was hpacucli

To install (as root, obviously):
  1. wget http://downloads.linux.hp.com/SDR/downloads/bootstrap.sh
  2. ./bootstrap.sh ProLiantSupportPack (this adds the HP repo to apt's sources)
  3. wget http://downloads.linux.hp.com/SDR/downloads/ProLiantSupportPack/GPG-KEY-ProLiantSupportPack
  4. apt-key add GPG-KEY-ProLiantSupportPack
  5. aptitude update
  6. apt-get install hpacucli
Like any good CLI program, it is finicky to learn, and intuitive thereafter :-) Here's the thing I wanted to do:

root@rm-bac-1:~# hpacucli 
HP Array Configuration Utility CLI 8.50-6.0
Detecting Controllers...Done.
Type "help" for a list of supported commands.
Type "exit" to close the console.

==> ctrl slot=0 logicaldrive all show detail

Smart Array P410i in Slot 0 (Embedded)

   array A

      Logical Drive: 1
         Size: 136.7 GB
         Fault Tolerance: RAID 1
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 35132
         Stripe Size: 128 KB
         Status: OK
         Unique Identifier: 600508B1001031303434393834300300
         Disk Name: /dev/cciss/c0d0
         Mount Points: / 131.1 GB
         OS Status: LOCKED
         Logical Drive Label: A01197DF5001438010449840590B
         Mirror Group 0:
            physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 146 GB, OK)
         Mirror Group 1:
            physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 146 GB, OK)

Cool! There are other bits I'll want - I see one of the packages offers SNMP monitoring, which naturally, I'm going to use so Nagios can keep an eye on this server.
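
Until the SNMP side is set up, the same output can be checked from a script. This is a rough sketch that only looks at the logical drive "Status:" lines in the format shown above (it deliberately skips "OS Status:"). raid_status_ok is a hypothetical name, and piping hpacucli's one-shot output into it is an assumption about how you'd call it.

```shell
#!/bin/sh
# raid_status_ok (hypothetical helper): reads hpacucli "show detail" output on
# stdin; succeeds only if at least one "Status:" line is present and every
# one of them reads OK.
raid_status_ok() {
    awk '/^ *Status:/ { n++; if ($2 != "OK") bad++ }
         END { exit (n == 0 || bad > 0) }'
}

# Assumed usage:
#   hpacucli ctrl slot=0 logicaldrive all show detail | raid_status_ok || echo "RAID needs attention"
```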

A list of the 7 packages in the Proliant Support Pack is in this file, but here's an abridged listing:


Package: hpacucli
Description: HP Command Line Array Configuration Utility
 The HP Command Line Array Configuration Utility is the disk
 array configuration program for Array Controllers.

Package: cpqacuxe
Description: HP Array Configuration Utility
 The HP Array Configuration Utility is the web-based disk array
 configuration program for Array Controllers.

Package: hpsmh
Description: HP System Management Homepage
 Provides HTTP infrastructure for HP Agent & Utility system packages.
Homepage: http://hp.com/go/proliantlinux

Package: hp-smh-templates
Description: HP System Management Homepage Templates
 This package contains the System Management Homepage Templates for all
 hp Proliant systems with ASM, ILO, & ILO2 embedded management asics.

Package: hp-snmp-agents
Description: Insight Management SNMP Agents for HP ProLiant Systems
 This package contains the SNMP server, storage, and nic agents for all
 hp Proliant systems with ASM, ILO, & ILO2 embedded management asics.

Package: hponcfg
Description: RILOE II/iLo online configuration utility
 Hponcfg is a command line utility that can be used to configure iLO/RILOE II
 from within the operating system without requiring a reboot of the server.
Homepage: http://hp.com/go/proliantlinux

Package: hp-health
Description: hp System Health Application and Command line Utility Package
 This package contains the System Health Monitor for all hp Proliant systems
 with ASM, ILO, & ILO2 embedded management asics.  Also contained are the
 command line utilities.

Thursday, March 3, 2011

A disk-based backup solution: Bacula, ReadyNAS NV+

I decided to set up a new backup system. I don't like tapes, so I started looking at disk-based backups. I also settled on Bacula for handling the backups. I found I could buy a ReadyNAS NV+ for $380, then 2TB disks for $125 each. Allowing for a cold spare, that means $625 for disks, so the total cost for one storage unit is about $1000 for 5.4TB of RAID-backed, hot-swappable storage. That's outstanding value!
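For anyone checking my sums, here's the arithmetic as a rough Python sketch. The prices and the 5.4TB figure are from above; the four drive bays and the "one disk's worth lost to parity" layout are my assumptions about how the NV+ is set up, and the variable names are just for illustration.

```python
# Prices quoted in the post (dollars).
NAS_PRICE = 380
DISK_PRICE = 125

# Assumptions, not stated in the post: 4 bays, one disk's worth of parity.
BAYS = 4
COLD_SPARES = 1
DISK_SIZE_TB = 2

disk_count = BAYS + COLD_SPARES           # 5 disks to buy
disk_cost = disk_count * DISK_PRICE       # 625
total_cost = NAS_PRICE + disk_cost        # 1005, i.e. "about $1000"

# Usable space: (bays - 1) disks of data, converted from decimal TB to binary TiB.
usable_bytes = (BAYS - 1) * DISK_SIZE_TB * 10**12
usable_tib = usable_bytes / 2**40         # roughly 5.4-5.5

print(disk_cost, total_cost, round(usable_tib, 2))
```

Under those assumptions, the 5.4TB is what 6TB of decimal disk space looks like once it's counted in binary units.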

Better yet, I discovered that some helpful people had compiled and packaged Bacula for the Sparc ReadyNAS units (of which the NV+ is one) so all I need do is install it. Then I can run the Director on my basic HP server, the Storage Daemon on the ReadyNAS, and the File Daemon on the clients to be backed up.

I also figured we'd get better redundancy by using a pair of ReadyNAS units, and writing alternate weeks' backups to each, so weeks 1, 3 and 5 go to nas-1, and weeks 2 and 4 go to nas-2.
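The alternating-weeks idea can be sketched in a few lines of Python. This is only an illustration of the rotation, not the Bacula config itself: the function names are made up, and I'm assuming "week 1" means days 1-7 of the month, "week 2" days 8-14, and so on.

```python
from datetime import date


def week_of_month(day: date) -> int:
    """Week number within the month, assuming week 1 is days 1-7."""
    return (day.day - 1) // 7 + 1


def target_nas(day: date) -> str:
    """Odd weeks (1, 3, 5) go to nas-1, even weeks (2, 4) go to nas-2."""
    return "nas-1" if week_of_month(day) % 2 == 1 else "nas-2"


if __name__ == "__main__":
    for d in (date(2011, 3, 3), date(2011, 3, 8), date(2011, 3, 29)):
        print(d, "week", week_of_month(d), "->", target_nas(d))
```

One thing this makes obvious: with that numbering, a week 5 is followed by week 1 of the next month, so nas-1 gets two weeks in a row whenever a month runs to a fifth week.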

After trying 3 different web UIs for controlling the backups, I settled on bweb, which seems to work pretty well (only a few broken bits).

I'm still part-way through the final implementation, but when I get the config all sorted out, I'll post a follow-up. So far, I have one of the ReadyNAS units running the Storage Daemon, and I've got my desktop computer backing up to it as I type. W00T!


At this stage, we're doing simple backups of users' network drives on a range of servers situated at 6 sites, to a central facility. If this scheme works out well, we may look at adding more storage, and expanding into doing bare-metal restore support for critical machines... could be exciting stuff ahead!

Tuesday, February 1, 2011

Findstr

Wow, so I am a bit out of the loop when it comes to Windows command-line tools. I've often cursed the uselessness of find compared to grep. But it turns out that since Windows 2000, there's been the rather handier findstr, which does regexp searches, recursive searches through files and folders... why, it sounds just like GNU grep!

Full reference details at TechNet.

Wednesday, January 19, 2011

Mummy, can we run Linux at home?

While I would generally prefer Linux as a home server, I have to hand it to the folks at Microsoft, with their promotional book Mommy, Why Is There A Server In The House? (download the 24 MB PDF here) - it's funny.

No, I'm still not going to get a Windows Home Server, but it is amusing.

Monday, January 17, 2011

Webalizer: Microsoft Internet Exploder, tee hee

Looking at some web stats for our company's website, I see the most popular UA (user agent) is... "Microsoft Internet Exploder"?? I'm no great fan of Microsoft's browser, but even I haven't called it that for a while... puts me in mind of Bud Uglly's brilliant Alvin's Internot Exploder - if you're not familiar with Bud Uglly, check 'em out. When I used to work in web development, this site would make me laugh until I was almost sick.

Anyhow, I did a bit of digging, and found our webalizer.conf - it's from 2000. I think our site has been hosted there for a while!

Sunday, January 16, 2011

Squirrelmail LDAP address book filtering

I recently did some hackery to exclude email accounts that aren't "real people" (things like admin mailboxes) from our mailing lists. That was easy enough, just meant modifying the script that pulls entries out of LDAP and pokes them into Mailman's sync_members script. Cool!

Now to do the same with SquirrelMail. Ran squirrelmail_configure... hmm, no option to set up a filter. That's okay, I'm happy to edit config.php... filter, filter... nope, nothing. Hmmm... yet the doco says you can do it, with a config item called, astonishingly, filter. Reading the doco more carefully, oh noes! The filter config item wasn't added till version 1.5.1, and that's a development version. Bugger! Do I want this functionality more than I want to stay on a stable release? Hmmmm...

Monday, January 10, 2011

Taming netstat with awk

Specifically, I want to see all network connections on a Solaris host which aren't localhost or from the local LAN. Let's start with netstat:

netstat -a -n -f inet -P tcp

Which gives me *everything*. Now let's filter out localhost connections:

netstat -a -n -f inet -P tcp | awk '$2 !~ /^127\.0\./'

Sweet! Now let's get rid of stuff on our local LAN (192.168.1.0/24) and stuff that's in a LISTEN state:

bash-3.00$ netstat -a -n -f inet -P tcp | \
awk '$2 !~ /^127\.0\./ && $2 !~ /^192\.168\.1\./ && $7 !~ /LISTEN/'

TCP: IPv4
   Local Address        Remote Address    Swind Send-Q Rwind Recv-Q    State
-------------------- -------------------- ----- ------ ----- ------ -----------
      *.*                  *.*                0      0 49152      0 IDLE
      *.*                  *.*                0      0 49152      0 IDLE
      *.1017               *.*                0      0 49152      0 BOUND
      *.32832              *.*                0      0 49152      0 BOUND
192.168.1.29.22      10.88.0.90.54670     64128      0 49232      0 ESTABLISHED


Getting closer... let's also remove those idle and bound lines:

bash-3.00$ netstat -a -n -f inet -P tcp | \
awk '$2 !~ /^127\.0\./ && $2 !~ /^192\.168\.1\./ && $7 !~ /LISTEN/ && $7 !~ /BOUND/ && $7 !~ /IDLE/'

TCP: IPv4
   Local Address        Remote Address    Swind Send-Q Rwind Recv-Q    State
-------------------- -------------------- ----- ------ ----- ------ -----------
192.168.1.29.22      10.88.0.90.54670     64128      0 49232      0 ESTABLISHED

Yay. Now I can see my SSH login, and more to the point, see that there are no other connections to this host from outside our LAN, which is what I wanted to check.
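If you'd rather do the same filtering from a script (say, to run it against several hosts), here's a rough Python sketch of the same logic. It assumes the Solaris column layout shown above (remote address in column 2, state in column 7); the function name and constants are made up for illustration. Unlike the awk one-liner, it also drops the banner and header lines, and it matches the state exactly rather than as a regex.

```python
from typing import List, Tuple

# Same exclusions as the awk filter: localhost, the local LAN, and
# sockets that aren't real connections. Adjust for your own network.
IGNORED_REMOTE_PREFIXES = ("127.0.", "192.168.1.")
IGNORED_STATES = {"LISTEN", "BOUND", "IDLE"}


def external_connections(netstat_output: str) -> List[Tuple[str, str, str]]:
    """Return (local, remote, state) for connections from outside the LAN."""
    connections = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # A connection row has 7 columns with a numeric Swind in column 3;
        # this skips the "TCP: IPv4" banner, the header and the dashed rule.
        if len(fields) != 7 or not fields[2].isdigit():
            continue
        local, remote, state = fields[0], fields[1], fields[6]
        if remote.startswith(IGNORED_REMOTE_PREFIXES):
            continue
        if state in IGNORED_STATES:
            continue
        connections.append((local, remote, state))
    return connections
```

Feed it the text output of netstat -a -n -f inet -P tcp and it returns one tuple per remaining connection; an empty list means nothing is connected from outside the LAN.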