Previously, I wrote about the general workflow to follow if you want to seriously begin fuzzing applications, using a small YAML library as the example. In this post, we will take that workflow and apply it in real life to the open-source antivirus project ClamAV. This fuzz job was months in the making, and we were able to find some really good bugs in very security-sensitive software. ClamAV is a popular antivirus scanner for email gateways, and it is also widely used as a desktop antivirus solution on Linux and OS X.

You can download the ClamAV source code from GitHub. This post was written using ClamAV 0.99. To resolve the issues disclosed in this post, please upgrade to 0.99.2.

Why ClamAV?

There are a few reasons why we should focus on ClamAV.

ClamAV is used on network perimeters to automatically scan files sent in emails or other mediums. Software that automatically accepts untrusted, complex input at a network edge must be able to handle severely malformed inputs.
ClamAV is also popular on Linux and OS X for on-demand and automatic on-access file scanning.
Any corpus generated while fuzzing ClamAV should be very useful in fuzzing other antivirus software.

The three basic questions

ClamAV is also an encouraging and attractive target given the three questions I posed in the previous fuzzing article.

Is there example code readily available?

In the examples folder in the root of the ClamAV project, there is a file called ex1.c. This is a great bare-bones example we can use (and pare down) to fuzz ClamAV.

Can I compile it myself? (Is the build system sane?)

ClamAV uses autoconf, so building with afl-clang-fast should be cake.

Are unique testcases easily available?

Inside the test/.split folder in the root of the ClamAV project are multiple interesting files to seed with. Since ClamAV is an antivirus scanner, though, almost any file we have lying around can also serve as a seed.

Getting and compiling ClamAV

ClamAV is simple to download, compile, and instrument.

# git clone https://github.com/vrtadmin/clamav-devel.git

# cd clamav-devel && git checkout 0.99

# CC=afl-clang-fast CXX=afl-clang-fast++ LDFLAGS="-static" ./configure

# make

Once ClamAV is compiled, we can create and build the target binary we want to fuzz. The clamscan utility shipped with ClamAV is far too heavy to fuzz with, so we need a lighter-weight binary that we can make work with persistent mode.

The target binary

Using the ex1.c example file, I was able to pare down a very small C file to fuzz with (also using persistent mode).

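#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <clamav.h>

int main(int argc, char *argv[]) {
  /* The file to scan is the only argument. */
  if (argc != 2) {
    fprintf(stderr, "usage: %s <file to scan>\n", argv[0]);
    return 1;
  }

  /* Set up an engine without loading any virus definitions. */
  if (cl_init(CL_INIT_DEFAULT) != CL_SUCCESS)
    return 1;
  struct cl_engine *engine = cl_engine_new();
  if (engine == NULL || cl_engine_compile(engine) != CL_SUCCESS)
    return 1;

  const char *virusName = NULL;
  long unsigned int scanned = 0;

  /* AFL persistent mode: scan up to 1000 test cases per process. */
  while (__AFL_LOOP(1000))
    cl_scanfile(argv[1], &virusName, &scanned, engine, CL_SCAN_STDOPT);

  cl_engine_free(engine);
  return 0;
}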

I removed any code related to loading the antivirus definitions, as this is what accounts for the vast majority of the scan time when using the clamscan utility shipped with ClamAV. We aren’t fuzzing anything related to running virus definitions against our file, so this can be safely left out.

One thing to note: ClamAV requires a rewindable file stream, so we can’t pass stdin directly to ClamAV and have the data scanned. Instead, we must pass the name of a file to be read and scanned. This is unfortunate, as it introduces a bottleneck at the hard drive when reading new test cases. Other than that, this binary is quite speedy relative to the clamscan utility. We will be able to make a couple more optimizations to eke out a bit more speed, though.

We can compile and instrument our target binary to test with afl-clang-fast.

# afl-clang-fast clamfuzz.c -I ~/clamav-devel/libclamav ~/clamav-devel/libclamav/.libs/libclamav.a -ldl -lm -lpthread -lcrypto -lssl -lz -o clamfuzz

With this binary, we can now call ~/clamfuzz <file to scan>.
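Before handing the binary to afl-fuzz, it can be worth running it once over every seed, since afl-fuzz refuses to start when a seed already crashes the target. The following is only a minimal sketch and not part of the original setup; the function name smoke_test and the seeds/arc directory are hypothetical.

```shell
#!/bin/sh
# Run a target once on every file in a seed directory and report the seeds
# that kill it with a signal (the shell reports these as exit status > 128).
# smoke_test and the directory names are illustrative, not part of ClamAV or AFL.
smoke_test() {
  target=$1
  seed_dir=$2
  crashes=0
  for seed in "$seed_dir"/*; do
    [ -f "$seed" ] || continue
    "$target" "$seed" >/dev/null 2>&1
    status=$?
    if [ "$status" -gt 128 ]; then
      echo "crash (status $status): $seed"
      crashes=$((crashes + 1))
    fi
  done
  echo "$crashes crashing seed(s)"
  # Succeed only when no seed crashed the target.
  [ "$crashes" -eq 0 ]
}

# Example (hypothetical paths): smoke_test ~/clamfuzz seeds/arc
```

Any seed reported here should be set aside as a finding and removed from the seed directory before starting the fuzzers.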

Small optimizations

Very early on in the fuzzing, it became apparent that files were being left in /tmp without being deleted. More than once I woke up to find that the /tmp of the chroot contained millions of tiny temp files left over from crashes, which made hard drive access very slow. This wasn’t something I could keep dealing with, and we also don’t want to touch the hard drive at all during fuzzing if we can help it.

To resolve these issues, I ended up mounting an 8-10 GB tmpfs at /tmp. I also started a screen session that removes every ClamAV temporary file every 10 seconds or so. Because ClamAV now writes and reads its temporary files on a RAM drive, this removes a bottleneck while fuzzing and lets us quickly clean up leftover files.
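The setup described above might look roughly like the following. This is a sketch rather than the exact commands used: the 8g size is one end of the range mentioned, the clamav-* name pattern is an assumption about how ClamAV names its temporary files and directories, and clean_clamav_tmp is a hypothetical helper name. Check the pattern against what actually accumulates in your /tmp before deleting anything.

```shell
#!/bin/sh
# One-time setup, as root (left as a comment because it needs privileges):
#   mount -t tmpfs -o size=8g tmpfs /tmp

# Delete leftover ClamAV temp files and directories directly under a directory.
# ASSUMPTION: the leftovers match 'clamav-*'; adjust the pattern if yours differ.
clean_clamav_tmp() {
  dir=${1:-/tmp}
  find "$dir" -mindepth 1 -maxdepth 1 -name 'clamav-*' -exec rm -rf -- {} +
}

# Run inside a screen session to clean up every 10 seconds:
#   while true; do clean_clamav_tmp /tmp; sleep 10; done
```

Note that a blanket delete like this can also remove a temp file that a live scan is still using. For fuzzing purposes that trade-off is usually acceptable, but it is a reason not to run the loop much more often than needed.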

Testcases and strategery

When I first started thinking about fuzzing ClamAV, I wanted to focus on at least 3 types of files: archives, executables, and office documents. I chose these three file types because there are notably exotic types of archives, packed executables, and office documents. There is also a history of vulns in some of the code for these file types.

In order to keep a distinct separation during fuzzing, I decided that each filetype would get its own set of four fuzzers: 1 master and 3 slaves. This ultimately amounted to 16 afl-fuzz instances, as I also added a miscellaneous job with some file types I wanted to fuzz that didn’t fit in with the others. The miscellaneous job contained graphical image formats such as JPG and PNG, as well as an ISO image and others.

With 4 afl-fuzz instances dedicated to each filetype (as opposed to 16 total in a single job with all the file types together), I would be able to stop, prune, and restart jobs much more easily and granularly. I would also be less likely to saturate the path bitmap, which AFL uses to keep track of the currently known paths. Having highly-targeted fuzzing strategies is very important.
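As a rough illustration of that layout, the script below prints one afl-fuzz command line per instance: four jobs, each with one -M (master) and three -S (slave) instances sharing a sync directory. The job names, the seeds/ and sync/ directories, and the instance names are hypothetical; only the -i, -o, -M, and -S flags and the @@ placeholder (replaced by AFL with the test case path) come from AFL itself.

```shell
#!/bin/sh
# Hypothetical layout: seeds/<job> holds the seed files for one file type,
# and sync/<job> is the AFL output directory shared by that job's instances.
JOBS="arc exe doc misc"

# Print the four afl-fuzz command lines (1 master, 3 slaves) for one job.
print_job() {
  job=$1
  echo "afl-fuzz -i seeds/$job -o sync/$job -M ${job}_master ./clamfuzz @@"
  for n in 1 2 3; do
    echo "afl-fuzz -i seeds/$job -o sync/$job -S ${job}_slave$n ./clamfuzz @@"
  done
}

# Print the command lines for every job, 16 in total.
print_all_jobs() {
  for job in $JOBS; do
    print_job "$job"
  done
}

print_all_jobs
```

Each printed line can then be started in its own screen window. Because every job has its own sync directory, stopping and pruning one file type leaves the other three jobs untouched.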

The results

We were able to find a handful of potentially exploitable bugs, as well as many simple out-of-bounds read crashes. The full corpus generated during fuzzing is available on GitHub, along with the seeds I started with, the release of ClamAV used, and the small program used to fuzz libclamav: https://github.com/brandonprry/clamav-fuzz

CVE-2016-1371 – Crash when processing a crafted mew packer executable
CVE-2016-1372 – Multiple vulnerabilities when processing crafted 7z files

Timeline

The ClamAV team was excellent to work with when reporting these bugs. Special thanks to Steven Morgan and Joel Esler.

Mar 01 2016 - Samples sent and bug report opened

Mar 14 2016 - Agreed to a disclosure policy of up to 120 days

Apr 11 2016 - CVEs allocated, fixes to be released in 0.99.2

May 03 2016 - 0.99.2 released

Jun 13 2016 - Disclosure date

The generated corpora

I am going to supply each corpus separately for the given file types. These are the amalgamated queues for each file type.

Executables	https://github.com/brandonprry/clamav-fuzz/tree/master/exe_cov/queue
Office documents	https://github.com/brandonprry/clamav-fuzz/tree/master/doc_cov/queue
Archives	https://github.com/brandonprry/clamav-fuzz/tree/master/arc_cov/queue
Miscellaneous	https://github.com/brandonprry/clamav-fuzz/tree/master/misc_cov/queue

Conclusions

I went into this expecting to find better bugs in the packed executable parsing code, and was surprised to find more in the archive parsing code. It probably shouldn’t be surprising, given how much more archive parsing code exists than packed executable parsing code. I hope the generated corpora can be used to fuzz other applications or antivirus software, or to pick fuzzing ClamAV back up with some new seeds that hit new codepaths.