Analyzing MATIO And stb_vorbis Libraries With Mayhem

At ForAllSecure, our mission is to help developers find critical bugs in their software faster and more easily than standard development practices and tools allow. To support this mission, we have looked to the open source world for exemplar software we can analyze with our next-generation fuzzer Mayhem, in order to get a stronger sense of its effectiveness and ease of integration into existing projects. This process has proven invaluable for ForAllSecure: it provides hands-on experience in ingesting real-world software that targets a variety of environments and build systems, and it helps ensure that the process is as streamlined as possible for new adopters. For more information on Mayhem, visit www.forallsecure.com.
We have also had the opportunity to not only discover and report multiple security-relevant defects to open source projects, but also assist in the vulnerability fix and verification process, improving the security of their users.
In this post, we will examine how we analyzed two open source libraries using Mayhem in a specific workflow that we’ve found to be particularly effective for finding bugs. We will cover building fuzz targets, dockerizing them, and running them inside of Mayhem. Following this process, we found 8 previously unknown security-relevant defects across these projects, which were assigned:
stb_vorbis:
Matio:
stb_vorbis
stb is a suite of single-file C libraries in the public domain, containing utility functions useful to developers working on computer graphics applications or games. Their liberal license and ease of integration have made these libraries a popular choice for developers in these domains. The various components in this project provide an abundance of functionality including image file parsing and manipulation, font file parsing, a voxel rendering engine, Ogg Vorbis audio file parsing (the functionality explored in this post), and more.
To follow along with this post, check out commit c72a95d766b8cbf5514e68d3ddbf6437ac9425b1 for an unpatched version of the library.
Targets
For our analysis of stb_vorbis, we will generate two different Mayhem-compatible targets experimentally found to be an excellent combination for bug hunting: a LibFuzzer target and a standalone uninstrumented target compatible with Mayhem’s symbolic execution engine. These two targets not only complement each other for greater coverage in less time, but also require minimal setup to function.
The standalone target is itself sufficient for Mayhem to analyze using its support for black-box compiled binaries, but it incurs overhead: users may see fewer executions per second than when a LibFuzzer target is also provided. Setting up a LibFuzzer target requires only a marginal amount of extra work in exchange for improved analysis efficiency.
Building Targets
One of the first questions that arises when attempting to fuzz test a library target is "how do I feed fuzzed input to the target code?" In application targets, this input may be delivered via a file or the network. For a library, this is usually determined by the host application.
To simulate usage of the library by a representative application, we will set up a small bit of code so the target can take in raw bytes from the fuzzer and convert them into inputs that the library can use. To set up a LibFuzzer target, we require a function with a specific name and signature, LLVMFuzzerTestOneInput(const uint8_t *data, size_t size), that accepts raw bytes and sends them to the target function(s). If you are unfamiliar with LibFuzzer and would like to know more, its documentation goes into further depth.
The target function, in this case stb_vorbis_decode_memory, is the function that takes the raw content of an Ogg Vorbis file and parses it into meaningful audio data. Parsing functions in general have the advantages (for bug hunters) of being notoriously hard to get right, but easy to send fuzzed data into. A good place to look for examples of how to write this function is the project's test suite (if available). In our case, the file tests/test_vorbis.c provides a good example of what we need to do. Setting up the LibFuzzer target is as easy as adapting this test into an LLVMFuzzerTestOneInput function.
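As a rough illustration of what such a target can look like, here is a minimal sketch rather than the exact code used in our analysis. It assumes the stb_vorbis_decode_memory signature from stb_vorbis.c; the HAVE_STB_VORBIS and STANDALONE macros, the stand-in decoder, and the run_one_file helper are hypothetical names introduced only so the sketch builds on its own. The same entry point serves both targets: LibFuzzer calls it directly, while the standalone build reads a file and forwards its bytes.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef HAVE_STB_VORBIS
#include "stb_vorbis.c" /* single-file library, compiled straight into the target */
#else
/* Stand-in (assumption) with the same signature as the real decoder, so the
 * sketch builds without the library. It decodes nothing and reports an error. */
static int stb_vorbis_decode_memory(const unsigned char *mem, int len,
                                    int *channels, int *sample_rate,
                                    short **output) {
    (void)mem;
    (void)len;
    *channels = 0;
    *sample_rate = 0;
    *output = NULL;
    return -1;
}
#endif

/* LibFuzzer entry point: called once for every generated input. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    int channels = 0, sample_rate = 0;
    short *samples = NULL;

    if (size > INT_MAX) return 0; /* the decoder takes an int length */

    int n = stb_vorbis_decode_memory(data, (int)size, &channels,
                                     &sample_rate, &samples);
    if (n >= 0) free(samples); /* decoded audio is allocated by the library */
    return 0;
}

/* Hypothetical helper for the standalone target: read one file into memory
 * and pass its bytes to the same entry point. Returns 0 on success. */
int run_one_file(const char *path) {
    FILE *f = fopen(path, "rb");
    if (f == NULL) return 1;
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return 1; }
    long len = ftell(f);
    if (len < 0) { fclose(f); return 1; }
    rewind(f);

    uint8_t *buf = malloc(len > 0 ? (size_t)len : 1);
    if (buf == NULL) { fclose(f); return 1; }
    size_t got = fread(buf, 1, (size_t)len, f);
    fclose(f);

    LLVMFuzzerTestOneInput(buf, got);
    free(buf);
    return 0;
}

#ifdef STANDALONE
/* Standalone build only: LibFuzzer supplies its own main(). */
int main(int argc, char **argv) {
    return argc == 2 ? run_one_file(argv[1]) : 2;
}
#endif
```

Keeping the parsing logic in one function and wrapping it twice means both targets exercise exactly the same code, so a crash found by one can be replayed with the other.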
The Mayhemfile for this package lists two commands: one for the LibFuzzer target and one for the standalone target. It is also recommended to provide a starting corpus of valid inputs (Ogg Vorbis files), which can be readily found on the internet or in the test suites of other Ogg Vorbis parsers. These can be placed in a corpus directory next to the Mayhemfile.
Once these steps are done, we can run the package with mayhem run and see the results of our analysis!
Triage
Once defects are found, Mayhem will automatically classify them, provide additional analysis, and share a test case that can be run with a debugger to pinpoint the bug. This is incredibly useful: it allows consistent reproduction of the crash, further examination of the program state leading up to it, and automated analysis, which together make root-causing and patching the discovered defects relatively easy. Fixes for these issues were included in stb_vorbis version 1.17.
On this target, Mayhem found the following defects:
- Heap buffer overflow
- Stack buffer overflow
- Division by zero
- Null pointer dereference
- Usage of uninitialized stack variables
- Global out of bounds read
- Reachable assertion
The security impact of these vulnerabilities depends on the host application using this library and its deployment scenario. The two buffer overflows can be exploited to execute arbitrary code, the out of bounds read could be used to leak sensitive information from the process, and the remainder are likely limited to causing a crash / denial of service of the application.
Matio
As another example, we will examine Matio, an open source C library for parsing MATLAB files and an alternative to MATLAB's own shared libraries for performing these functions. The steps we will follow are the same as for stb_vorbis:
- Create a LibFuzzer target
- Create a standalone target
- Create a Docker image containing both
- Create a Mayhemfile for our target
- Assemble a starting corpus
- Run
To follow along, check out an unpatched version of Matio at tag v1.5.15.
Building Targets
Through an examination of the test suite and a GitHub search for API usages by downstream applications and libraries, users can develop a function that takes in bytes and feeds them to the MATLAB parsing core.
Matio has no "read data from memory" function like the one in stb_vorbis, so we write the bytes to a file and use that file as the input to the matio MATLAB file parsing function. Placing the file on the tmpfs filesystem at /dev/shm keeps the data from being written to disk unnecessarily. Once the file is parsed, we also iterate over all variables contained in it and read their data so that these code paths are included in our executions. This is critical: when coverage is not driven down these paths, the functions behind them may hide relevant bugs.
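The approach described above might be sketched as follows. This is an illustrative outline under stated assumptions, not the exact target from our analysis: the matio branch uses Mat_Open, Mat_VarReadNext, Mat_VarFree, and Mat_Close from matio.h, while the HAVE_MATIO macro, the parse_mat_file helper, the temporary file name, and the stand-in parser are hypothetical and exist only so the sketch builds without the library installed.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for mkstemp() and fdopen() */
#endif
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#ifdef HAVE_MATIO
#include <matio.h>
/* Open the file and read every variable so the per-variable parsing
 * code paths are exercised, not just the file header. */
static void parse_mat_file(const char *path) {
    mat_t *mat = Mat_Open(path, MAT_ACC_RDONLY);
    if (mat == NULL) return;
    matvar_t *var;
    while ((var = Mat_VarReadNext(mat)) != NULL)
        Mat_VarFree(var);
    Mat_Close(mat);
}
#else
/* Stand-in (assumption) so the sketch builds without matio:
 * it simply reads the file back from start to end. */
static void parse_mat_file(const char *path) {
    FILE *f = fopen(path, "rb");
    if (f == NULL) return;
    while (fgetc(f) != EOF) { /* consume the file */ }
    fclose(f);
}
#endif

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    /* /dev/shm is tmpfs on typical Linux systems, so the input stays in RAM. */
    char path[] = "/dev/shm/matio-fuzz-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return 0; /* no tmpfs available: skip this input */

    FILE *f = fdopen(fd, "wb");
    if (f == NULL) {
        close(fd);
        unlink(path);
        return 0;
    }
    size_t written = fwrite(data, 1, size, f);
    fclose(f);

    if (written == size) parse_mat_file(path);
    unlink(path); /* remove the file so repeated runs do not fill tmpfs */
    return 0;
}
```

Using mkstemp gives each execution its own file name, which avoids collisions if several fuzzing processes run side by side.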
Once these steps are done, we can run the package with mayhem run.
Triage
In this case, Mayhem found one crash caused by an integer overflow. The overflow was detected, but in response the resulting computation was set to 0 and that error condition was not checked. This resulted in a heap overflow by writing to a buffer allocated with malloc(0).
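To make the pattern concrete, here is a simplified sketch of this class of bug. It is not matio's actual code: checked_mul, alloc_elements_unchecked, and alloc_elements_checked are hypothetical names chosen for illustration. The first allocator drops the error code and can end up calling malloc(0); the second refuses to allocate when the multiplication overflows.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical checked multiplication: returns 0 on success. On overflow it
 * returns 1 and sets *res to 0, mirroring the behavior described above. */
static int checked_mul(size_t a, size_t b, size_t *res) {
    if (a != 0 && b > SIZE_MAX / a) {
        *res = 0;
        return 1;
    }
    *res = a * b;
    return 0;
}

/* Buggy pattern: the error code is ignored, so an overflowing request
 * silently becomes malloc(0) and later writes run past the allocation. */
static void *alloc_elements_unchecked(size_t count, size_t elem_size) {
    size_t nbytes;
    checked_mul(count, elem_size, &nbytes); /* return value dropped */
    return malloc(nbytes);
}

/* Fixed pattern: propagate the overflow as an allocation failure. */
static void *alloc_elements_checked(size_t count, size_t elem_size) {
    size_t nbytes;
    if (checked_mul(count, elem_size, &nbytes) != 0 || nbytes == 0)
        return NULL;
    return malloc(nbytes);
}
```

With the fixed version, a caller that already handles a NULL return from malloc also handles the overflow case without further changes.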
Although our simple fuzz target only exercised enough code to find one instance of this, a manual review of the code revealed several additional cases where this exact pattern existed. Fixing this bug involved analyzing all cases where the results of these checked multiplications were passed directly to malloc() or were otherwise improperly used. The test case produced by Mayhem gave us enough information to manually locate and correct several instances of this bug, including one in the MATLAB 7.3 file parser, which was not directly exercised by our targets.
The impact of this vulnerability again depends on how an application uses the library. An application which parses untrusted MATLAB files with an unpatched version of matio could conceivably allow an attacker to execute arbitrary code by leveraging this heap overflow.
Conclusions
While libraries require slightly more effort to fuzz than, for example, an application that directly reads input out of a file, fuzzing library functions is relatively straightforward and can lead to greater performance in terms of execs per second than repeatedly forking a full application. Combining the speed of LibFuzzer with other techniques, such as symbolic execution, furthers the efficiency of this approach. When writing LibFuzzer fuzz target functions, however, one must remain cognizant of the code paths that are being exercised. Creating multiple targets that cover different slices of the program is one approach to maximize the effectiveness of fuzzing.
Thank you to the maintainers of stb (Sean Barrett, GitHub user nothings) and Matio (Thomas Beutlich, GitHub user tbeu) for their excellent handling of these reports and timely patches for the underlying issues!
