Tuesday, August 9, 2011

FreeBSD, ALTQ and SNMP

Some history: we use FreeBSD (actually, a cut-down version called nanoBSD) to route and shape our WAN traffic. Works like a charm. ALTQ is the magic kernel bits that make the queuing work. It's a lot like the Linux tc stuff: set up queues, assign queuing disciplines, then push traffic into the appropriate queues based on certain criteria - in our case, usually related to port numbers and IP addresses.
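For anyone who hasn't met this style of traffic shaping, here's a toy model of the classification step in Python. It is only a sketch: the rule fields, queue names and addresses are made up for illustration, and it has nothing to do with how ALTQ is implemented in the kernel. It does copy one real pf behaviour, which is that without the quick keyword the last matching rule wins.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass
class Rule:
    """A hypothetical classification rule: match on port and/or source net."""
    queue: str
    dst_port: Optional[int] = None  # None matches any destination port
    src_net: Optional[str] = None   # None matches any source address

    def matches(self, src_ip: str, dst_port: int) -> bool:
        if self.dst_port is not None and self.dst_port != dst_port:
            return False
        if self.src_net is not None:
            if ip_address(src_ip) not in ip_network(self.src_net):
                return False
        return True


def classify(rules, src_ip, dst_port, default="std"):
    """Return the queue name for a packet; the last matching rule wins."""
    queue = default
    for rule in rules:
        if rule.matches(src_ip, dst_port):
            queue = rule.queue
    return queue


# Made-up example rules: internal traffic is bulk, but ssh gets its own queue.
RULES = [
    Rule(queue="bulk", src_net="10.0.0.0/8"),
    Rule(queue="ssh", dst_port=22),
]
```

With these rules, web traffic from a 10.x address lands in bulk, ssh from the same host lands in ssh, and anything else falls through to the default queue.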

Anyhow, all good, it works just like you'd expect. We can use pftop -v queues to watch in realtime how much traffic is passing through (and being dropped by) each queue.

Then I got ambitious, and decided it'd be really helpful to use Cacti to graph the queue stats. We already do this for overall traffic throughput on the interfaces, and it's handy. We just use SNMP to poll the interface counters, Cacti makes a nice graph, and we can see what's going where, when.
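The arithmetic behind those graphs is simple enough to sketch. SNMP counters only ever go up (until they wrap), so a grapher takes two samples and divides the difference by the polling interval. The function names and the 300-second interval below are my own assumptions for illustration, not anything taken from Cacti's code.

```python
def counter_delta(prev, curr, bits=64):
    """Difference between two samples of a counter that may have wrapped once."""
    if curr >= prev:
        return curr - prev
    # The counter rolled over past 2**bits between the two samples.
    return (1 << bits) - prev + curr


def rate_bps(prev_octets, curr_octets, interval_s, bits=64):
    """Average bits per second between two octet-counter samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return counter_delta(prev_octets, curr_octets, bits) * 8 / interval_s


# Two samples of an octet counter taken 300 seconds apart (made-up values).
example_rate = rate_bps(1000, 4000, 300)
```

The wrap handling matters mostly for 32-bit counters on busy links; Counter64 values like the ones in the walk below take a very long time to roll over.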

However, that's where things started to get a little complicated. You see, the ALTQ SNMP implementation was incomplete for a while. Specifically, this is what happens if you try to walk the ALTQ section of the MIB:

$ snmpwalk -v1 -cpublic my-router .1.3.6.1.4.1.12325.1.200.1.9.2.1
SNMPv2-SMI::enterprises.12325.1.200.1.9.2.1.2.1 = STRING: "NoRouteIPs"
SNMPv2-SMI::enterprises.12325.1.200.1.9.2.1.2.2 = STRING: "Sequencers"
SNMPv2-SMI::enterprises.12325.1.200.1.9.2.1.3.1 = INTEGER: 2
SNMPv2-SMI::enterprises.12325.1.200.1.9.2.1.3.2 = INTEGER: 2
SNMPv2-SMI::enterprises.12325.1.200.1.9.2.1.4.1 = Timeticks: (1198511000) 138 days, 17:11:50.00
SNMPv2-SMI::enterprises.12325.1.200.1.9.2.1.4.2 = Timeticks: (1198511000) 138 days, 17:11:50.00
[snip]
SNMPv2-SMI::enterprises.12325.1.200.1.9.2.1.20.1 = Counter64: 0
SNMPv2-SMI::enterprises.12325.1.200.1.9.2.1.20.2 = Counter64: 0
Error in packet.
Reason: (genError) A general failure occured
Failed object: SNMPv2-SMI::enterprises.12325.1.200.1.9.2.1.20.2
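If you want to check for this from a script rather than by eye, something like the following would do. It's a minimal sketch that assumes the output format shown above (one "OID = TYPE: value" per line, with the error reported on lines starting "Error in packet", "Reason:" and "Failed object:"); the function name is mine.

```python
import re

# Matches lines such as:
#   SNMPv2-SMI::enterprises.12325.1.200.1.9.2.1.3.1 = INTEGER: 2
LINE_RE = re.compile(r"^(?P<oid>\S+) = (?P<type>[A-Za-z0-9-]+): (?P<value>.*)$")

ERROR_PREFIXES = ("Error in packet", "Reason:", "Failed object:")


def parse_walk(text):
    """Split snmpwalk output into (oid, type, value) rows and error lines."""
    rows, errors = [], []
    for line in text.splitlines():
        line = line.strip()
        match = LINE_RE.match(line)
        if match:
            rows.append((match["oid"], match["type"], match["value"]))
        elif line.startswith(ERROR_PREFIXES):
            errors.append(line)
        # Anything else (blank lines, "[snip]") is ignored.
    return rows, errors


SAMPLE = """\
SNMPv2-SMI::enterprises.12325.1.200.1.9.2.1.2.1 = STRING: "NoRouteIPs"
SNMPv2-SMI::enterprises.12325.1.200.1.9.2.1.3.1 = INTEGER: 2
SNMPv2-SMI::enterprises.12325.1.200.1.9.2.1.20.2 = Counter64: 0
Error in packet.
Reason: (genError) A general failure occured
Failed object: SNMPv2-SMI::enterprises.12325.1.200.1.9.2.1.20.2
"""

rows, errors = parse_walk(SAMPLE)
```

A non-empty errors list means the walk died part-way through, which is exactly the case here.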

Sub-optimal! The reason this happens is that the section of code that would then walk through the ALTQ table was implemented like this:

int
pf_tbladdr(struct snmp_context __unused *ctx, struct snmp_value __unused *val,
        u_int __unused sub, u_int __unused vindex, enum snmp_op __unused op)
{
        return (SNMP_ERR_GENERR);
} 

Yep, that'll do it. The source file is /usr/src/usr.sbin/bsnmpd/modules/snmp_pf/pf_snmp.c. Fortunately, version 1.14 of pf_snmp.c has a proper implementation. So which FreeBSD release is it in? Sadly, not the 8.x releases, but it is in FreeBSD 9.0 Beta 1. Sooo, I thought I'd run that up and give it a crack.

Had to re-compile my kernel to support ALTQ, with reference to this article and this one too, as I'm something of a FreeBSD n00b. Then reboot and...

Enabling pfpanic mutex pf task mtx owned at /usr/src/sys/contrib/pf/net/if_pfsync.c:3163
cpuid = 0
KDB: enter: panic
[ thread pid 961 tid 100059 ]
Stopped at    kdb_enter+0x3a:  movl    $0,kdb_why
db>

Oh well, nice try. I guess I'll check back when 9.0 is stable.
