Friday, January 9, 2009

How to Report a Bro Problem

Generally, when you see Bro doing something you believe it shouldn't, the best thing to do is opening a ticket in the Bro tracker, including information how to reproduce the issue. In particular, your ticket should come with the following:

  • The Bro version you're using (if working directly from the Subversion repository, the branch and revision number.)

  • A small trace in libpcap format demonstrating the effect (assuming the problem doesn't happen right at startup already).

  • The command-line you're using to run Bro with the trace. (Please run the Bro binary directly rather than using the bro.rc wrapper from the BroLite environment.)

  • Any non-standard scripts you're using (but please only those necessary; ideally just a small code snippet).

  • The output you're seeing along with a description what you'd expect Bro to do instead.

  • If you encounter a crash, information from the core dump, such as a stack backtrace, can be very helpful. See below for more on this.

It is crucial for us to have away of reliably reproducing the effect you're seeing. Unfortunately, reproducing problems can be rather tricky with Bro because more often than not, they occur only either in very rare situations or after Bro has been running for some time. In particular, getting a small trace showing a particular effect can be a real problem. In the following, I'll summarize some strategies to this end.

How Do I Get a Trace File?

Since Bro is usually running live, coming up with a small trace file can turn out to be a challenge. Often it works to best to start with a large trace triggering the problem, and then successively thin it out as much a possible.

To get to the initial, large trace, here are few things you can try:

  • Capture a trace with tcpdump, either on the same interface Bro is running on, or on another host where you can generate traffic of the kind likely triggering the problem (e.g., if you're seeing problems with the HTTP analyzer, record some of your Web browsing on your desktop.) When using tcpdump, don't forget to record complete packets (tcpdump -s 0 ...).

    You can reduce the amount of traffic captured by using the same BPF filter as Bro is using. If you add print-filter to Bro's command-line, it will print its BPF filter to stdout, which you can copy over to tcpdump.

  • Bro's command-line option -w <trace> records all packets processed by Bro to the given the trace file. You can then later run Bro offline on this trace and it will process the packets in the same way as it did live. This is particularly helpful with problems which only occur after Bro has been running for some time. For example, sometimes crashes are triggered by a particular kind of traffic only occurring rarely. Running Bro live with -w and then, after the crash, offline on the recorded trace might, with a little bit of luck, reproduce the the problem reliably.

    However, be careful with -w: it can result in huge trace files, quickly filling up your disk. (One way to mitigate the space issues is to periodically delete the trace file by configuring rotate-logs.bro accordingly.)

  • Finally, you can try running Bro on some publically available trace files, such as anonymized FTP traffic, headers-only enterprise traffic, or Defcon traffic. Some of these particularly stress certain components of Bro (e.g., the Defcon traces contain tons of scans).

Once you have a trace which demonstrates the effect, you will often notice that it's pretty big, in particular if recorded from the link you're monitoring. Therefore, the next step is to shrink its size as much as possible. Here are a few things you can try to this end:

  • Very often, a single connection is able to demonstrate the problem. If you can identify which one it is (e.g., from one of Bro's *.log files) you can extract the connection's packets from the trace with tcpdump by filtering for its 4-tuple of addresses and ports:

    tcpdump -r large.trace -w small.trace \
       host <ip1> and port <port1> \
       and host <ip2> and port <port2>.
    

  • If you can't reduce the problem to a connection, try to identify either a host pair or a single host triggering it, and filter down the trace accordingly.

  • You can try to extract a smaller time slice from the trace using the TCPslice utility. For example, to extract the first 100 seconds from the trace:

    tcpslice +100 <in >out
    

    Alternatively, tcpdump extracts the first n packets with its option -c <n>.

Getting More Information After a Crash.

If Bro crashes, a core dump can be very helpful to nail down the problem. Examining a core is not for the faint of heart but can reveal extremely useful information ...

First, you should configure Bro with the option --enable-debug and recompile; this will disable all compiler optimizations and thus make the core dump more useful (don't expect great performance of this version though; compiling Bro without optimization has a noticeable impact on its CPU usage.). Then enable core dumps if you don't have already (e.g., ulimit -c unlimited if you're using a bash).

Once Bro has crashed, start gdb with the Bro binary and the file containing the dump. (Alternatively, you can also run Bro directly inside gdb instead of working from a core file.) The first helpful information to include with your tracker ticket is a stack backtrace, which you get with gdb's bt command:

gdb bro core
[...]
> bt
....

If the crash occurs inside Bro's script interpreter, the next thing to do is identifying the line of script code processed just before the abnormal termination. Look for methods in the stack backtrace which belong to any of the script interpreter's classes; roughly speaking, these are all classes with names ending in Expr, Stmt, or Val. Then climb up the stack with up until you reach the first of these methods. The object to which this is pointing, will have a Location object, which in turn contains the file name and line number of the corresponding piece of script code. Continuing the example from above, here's how to get that information:

>up
>...
>up
>print this->location->filename
>print this->location->first_line

If the crash occurs while processing input packets but you cannot directly tell which connection is responsible (and thus not extract its packets from the trace as suggested above), try getting the 4-tuple of the connection currently being processed from the core dump. To this end again examine the stack backtrace, this time looking for methods belonging to the Connection class. The connection class has members orig_addr/resp_addr and orig_port/resp_port storing (pointers to) the IP addresses and ports respectively:

>up
>...
>up
>printf "%08x:%04x %08x:%04x\n", \
    *this->orig_addr, this->orig_port, \
    *this->resp_addr, this->resp_port

Note that these values are stored in network byte order so you will need flip the bytes around if you are on a low-endian machine (which is why the above example prints them in hex). For example, if an IP address prints as 0100007f, that's 127.0.0.1.