Analyzing MATIO And stb_vorbis Libraries With Mayhem

Maxwell Koo
October 16, 2019

At ForAllSecure, our mission is to help developers find critical bugs in their software faster and more easily than standard development practices and tools allow. To that end, we have turned to the open source world for exemplar software to analyze with our next-generation fuzzer, Mayhem, to get a stronger sense of its effectiveness and ease of integration into existing projects. This process has proven invaluable for ForAllSecure. It has given us hands-on experience ingesting real-world software that targets a variety of environments and build systems, and it has helped us make the process as streamlined as possible for new adopters. For more information on Mayhem, visit www.forallsecure.com.

Along the way, we have had the opportunity not only to discover and report multiple security-relevant defects to open source projects, but also to assist in the vulnerability fix and verification process, improving security for their users.

In this post, we will examine how we analyzed two open source libraries with Mayhem, using a workflow that we have found to be particularly effective for finding bugs. We will cover building fuzz targets, dockerizing them, and running them inside Mayhem. Following this process, we found eight previously unknown security-relevant defects across these projects, which were assigned:

stb_vorbis:

Matio:

stb_vorbis

stb is a suite of single-file C libraries in the public domain, containing utility functions useful to developers working on computer graphics applications or games. Their liberal license and ease of integration have made these libraries a popular choice for developers in these domains. The various components in this project provide an abundance of functionality including image file parsing and manipulation, font file parsing, a voxel rendering engine, Ogg Vorbis audio file parsing (the functionality explored in this post), and more.

To follow along with this post, check out commit c72a95d766b8cbf5514e68d3ddbf6437ac9425b1 for an unpatched version of the library.

Targets

For our analysis of stb_vorbis, we will generate two different Mayhem-compatible targets that we have found experimentally to be an excellent combination for bug hunting: a LibFuzzer target and a standalone, uninstrumented target compatible with Mayhem's symbolic execution engine. These two targets complement each other, achieving greater coverage in less time, and both require minimal setup.

The standalone target alone is sufficient for Mayhem to analyze, thanks to its support for black-box compiled binaries, but analyzing it incurs overhead: users may see fewer executions per second than when a LibFuzzer target is also provided. Setting up a LibFuzzer target takes only a marginal amount of additional work and improves analysis efficiency.


Building Targets

One of the first questions that arise when attempting to fuzz test a library target is "how do I feed fuzzed input to the target code?" In application targets, this input may be delivered via a file or the network. For a library, this is usually determined by the host application. 

To simulate usage of the library by a representative application, we will write a small amount of glue code that takes raw bytes from the fuzzer and converts them into inputs the library can use. A LibFuzzer target requires a function with a specific name and signature, LLVMFuzzerTestOneInput(const uint8_t *data, size_t size), which accepts raw bytes and passes them to the target function(s). If you are unfamiliar with LibFuzzer and would like to know more, its documentation goes into further depth.
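To make this contract concrete, here is a minimal sketch of a fuzz target for a toy format. parse_record and its length-prefixed format are hypothetical, invented for illustration, and are not part of stb. The target accepts any byte sequence, including an empty one, never reads past size, and returns 0 so LibFuzzer moves on to the next input:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy format (illustration only, not part of stb): a one-byte
// length N followed by N payload bytes.
static bool parse_record(const uint8_t *data, size_t size,
                         std::vector<uint8_t> &payload) {
    if (size < 1)
        return false;                  // empty input: nothing to parse
    size_t len = data[0];
    if (len > size - 1)
        return false;                  // declared length exceeds the input
    payload.assign(data + 1, data + 1 + len);
    return true;
}

// LibFuzzer calls this once per generated input. The buffer is owned by the
// fuzzer, is read-only, and may have any length, including zero.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    std::vector<uint8_t> payload;      // freed automatically on return
    parse_record(data, size, payload); // result ignored; only crashes matter
    return 0;                          // return 0 for ordinary inputs
}
```

The length check is what keeps this toy parser memory-safe; dropping it is exactly the kind of mistake an ASan-instrumented fuzz target tends to surface quickly.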

The target function, in this case stb_vorbis_decode_memory, takes the raw contents of an Ogg Vorbis file and parses them into meaningful audio data. For bug hunters, parsing functions have two advantages: they are notoriously hard to get right, and they are easy to send fuzzed data into. A good place to look for examples of how to drive the target function is the project's test suite (if available). In our case, the file tests/test_vorbis.c provides a good example of what we need to do. Setting up the LibFuzzer target is as easy as adapting this test into the following code:

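// fuzz/fuzz.cpp
#include <stdint.h>
#include <stdlib.h>

// Pull in only the declarations from stb_vorbis.c; the implementation is
// compiled separately (see the Makefile below).
#define STB_VORBIS_HEADER_ONLY
#include "stb_vorbis.c"
#include "stb.h"

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    int chan, samplerate;
    short *output = NULL;

    // Decode the whole (fuzzed) Ogg Vorbis file from memory. The return value
    // is the number of decoded samples, or negative on error; we only care
    // about crashes, so it is ignored.
    stb_vorbis_decode_memory(data, size, &chan, &samplerate, &output);

    // The decoder allocates the output buffer with malloc(); free(NULL) is a no-op.
    free(output);

    return 0;
}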

To also turn the above into a standalone target, we need a main() function that reads input from a file and passes it to LLVMFuzzerTestOneInput. Because the fuzz target function has a standardized ABI, we can write this driver once and link it against any LibFuzzer target to produce a standalone target. This will prove useful when we reuse the same file to analyze Matio later in this post. Our "driver" code looks like this:

// fuzz/driver.cpp
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size);

// Map one input file into memory and hand its contents to the fuzz target.
// Returns false if the file could not be opened, stat'ed, or mapped.
static bool run_one_test(const char *filename) {
    int fd = open(filename, O_RDONLY);
    if (fd == -1) {
        return false;
    }

    struct stat st;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return false;
    }
    size_t size = st.st_size;

    void *data = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) {
        close(fd);
        return false;
    }

    LLVMFuzzerTestOneInput(static_cast<const uint8_t *>(data), size);

    /* ignore failures on cleanup */
    munmap(data, size);
    close(fd);
    return true;
}

// Run every file named on the command line through the fuzz target.
int main(int argc, char **argv) {
    for (int i = 1; i < argc; i++) {
        const char *filename = argv[i];
        if (!run_one_test(filename)) {
            printf("Warning: failed to run %s\n", filename);
        } else {
            printf("Successfully tested %s\n", filename);
        }
    }
    return 0;
}

Although this code is a little longer, it only needs to be written once. For each file named on the command line, it maps the file into memory with mmap() and passes its contents to our LibFuzzer fuzz target function.
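One caveat of the mmap() approach: Linux rejects a zero-length mapping with EINVAL, so the driver reports an empty input file as a failure rather than passing it to the target. If that matters, or if you need a driver that also builds on non-POSIX systems, a minimal alternative is to read each file into a std::vector with standard C++ streams. read_input_file below is a hypothetical helper, not part of LibFuzzer:

```cpp
#include <cstdint>
#include <fstream>
#include <iterator>
#include <vector>

// Hypothetical helper: read an entire file into memory with standard C++
// streams. Unlike mmap(), this handles empty files and works on non-POSIX
// systems. Returns false if the file cannot be opened or a read error occurs.
bool read_input_file(const char *path, std::vector<uint8_t> &out) {
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return false;
    out.assign(std::istreambuf_iterator<char>(in),
               std::istreambuf_iterator<char>());
    return !in.bad();
}
```

In run_one_test(), the mmap()/munmap() pair could then be replaced by a call to read_input_file() followed by LLVMFuzzerTestOneInput(buf.data(), buf.size()). The trade-off is one extra copy of each input, which is negligible for a driver that runs once per file.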

Now that the code for our targets is written, we'll build the library and link our code against it to generate the targets.

Building

Build systems for C and C++ code vary widely, which adds complexity when analyzing a new project or integrating a library distributed as source code into your own project. This pain is familiar to most C and C++ developers. Fortunately, stb was designed as a set of "single-file" libraries specifically to alleviate this pain, so it is easy to integrate. Assuming our fuzz target and driver files live in a fuzz/ subdirectory of the repository root, as noted in the comment at the top of each file, we can write a simple Makefile to generate our target binaries:

# fuzz/Makefile
.PHONY: all stb_vorbis_libfuzzer stb_vorbis_standalone

all: stb_vorbis_libfuzzer stb_vorbis_standalone

stb_vorbis_libfuzzer: fuzz.cpp
	clang++ -I.. -fsanitize=address,fuzzer -g ../stb_vorbis.c fuzz.cpp -o ../stb-vorbis-libfuzzer

stb_vorbis_standalone: fuzz.cpp driver.cpp
	clang++ -I.. -g ../stb_vorbis.c fuzz.cpp driver.cpp -o ../stb-vorbis-standalone

One way to prepare a set of target binaries for Mayhem is to build a Docker image containing the necessary binaries and supporting environment. Other workflows are available, but this is the recommended way to ensure all necessary dependencies are encapsulated in a way that can be effortlessly run on other systems. Detailed information about building Docker images can be found in the Docker documentation. In our case, creating the Dockerfile is straightforward:

# Dockerfile
FROM ubuntu:bionic as builder
# Install dependencies, including up-to-date Clang and LibFuzzer
RUN apt-get update && \
    apt-get install -y \
        build-essential \
        git \
        wget && \
    echo "deb http://apt.llvm.org/bionic/ llvm-toolchain-bionic-8 main" >> /etc/apt/sources.list && \
    wget -O- https://apt.llvm.org/llvm-snapshot.gpg.key | apt-key add - && \
    apt-get update && \
    apt-get install -y \
        clang-8 \
        llvm-8-dev \
        libc++-8-dev \
        libc++abi-8-dev \
        libfuzzer-8-dev \
        lld-8 && \
    ln -s /usr/lib/llvm-8/bin/* /usr/local/bin/ && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /build
ENV ASAN_OPTIONS=detect_leaks=0
RUN git clone https://github.com/nothings/stb.git && \
    cd stb && \
    git checkout c72a95d766b8cbf5514e68d3ddbf6437ac9425b1
COPY fuzz /build/stb/fuzz
RUN cd /build/stb/fuzz && \
    make all

FROM ubuntu:bionic
WORKDIR /mayhem
COPY --from=builder /build/stb/stb-vorbis-libfuzzer /mayhem/stb-vorbis-libfuzzer
COPY --from=builder /build/stb/stb-vorbis-standalone /mayhem/stb-vorbis-standalone
CMD ["/mayhem/stb-vorbis-standalone"]

To run the targets in Mayhem, we must build, tag, and push this image to a Docker registry accessible to the Mayhem installation (such as Mayhem's built-in Docker registry).

Running with Mayhem

The last step is to configure our target, which requires a Mayhemfile; you can find more details about its contents in our documentation. Because we use the Docker workflow, we need to tell Mayhem which Docker image to use and which targets exist inside the image. Our Mayhemfile looks like this:

version: '0.7'
project: stb
target: stb-vorbis
baseimage: 10.96.96.96:5000/stb
cmds:
  - cmd: /mayhem/stb-vorbis-libfuzzer
    libfuzzer: true
    asan: true
  - cmd: /mayhem/stb-vorbis-standalone @@

Note that two commands are listed: one for the LibFuzzer target and one for the standalone target, where @@ is a placeholder that Mayhem replaces with the path to each test input. We also recommend providing a starting corpus of valid inputs (Ogg Vorbis files), which can be readily found on the internet or in the test suites of other Ogg Vorbis parsers. These files can be placed in a corpus directory next to the Mayhemfile.
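When assembling a corpus from files gathered online, it can help to discard anything that is obviously not an Ogg stream. Every Ogg page begins with the four-byte capture pattern "OggS", so a rough pre-filter might look like the sketch below. looks_like_ogg is a hypothetical helper, and passing the check does not mean a file is a valid Vorbis stream:

```cpp
#include <cstring>
#include <fstream>

// Hypothetical corpus pre-filter: returns true if the file starts with the
// Ogg page capture pattern "OggS". This is only a cheap sanity check; it does
// not validate the Vorbis headers that follow.
bool looks_like_ogg(const char *path) {
    std::ifstream in(path, std::ios::binary);
    char magic[4] = {};
    if (!in.read(magic, sizeof magic))
        return false;  // unopenable, or shorter than four bytes
    return std::memcmp(magic, "OggS", 4) == 0;
}
```

Because the fuzzer mutates seeds anyway, a few malformed files are harmless; the filter mainly keeps unrelated files (an HTML error page saved by a failed download, for instance) out of the starting corpus.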

Once these steps are done, we can run the package with mayhem run and see the results of our analysis!

Triage

Once defects are found, Mayhem automatically classifies them, provides additional analysis, and supplies a test case that can be run under a debugger to pinpoint each bug. This information is incredibly useful: it enables consistent reproduction of the crash, further examination of the program state leading up to it, and automated analysis, which together make root-causing and patching the discovered defects relatively easy. Fixes for these issues were included in stb_vorbis version 1.17.

On this target, Mayhem found the following defects:

  • Heap buffer overflow
  • Stack buffer overflow
  • Division by zero
  • Null pointer dereference
  • Usage of uninitialized stack variables
  • Global out of bounds read
  • Reachable assertion

The security impact of these vulnerabilities depends on the host application using this library and its deployment scenario. The two buffer overflows could potentially be exploited to execute arbitrary code, the out-of-bounds read could be used to leak sensitive information from the process, and the remaining defects are likely limited to crashing the application (denial of service).

Matio

As another example, we will examine Matio, an open source C library for parsing MATLAB files and an alternative to MATLAB's own shared libraries for this task. The steps we will follow are the same as for stb_vorbis:

  1. Create a LibFuzzer target
  2. Create a standalone target
  3. Create a Docker image containing both
  4. Create a Mayhemfile for our target
  5. Assemble a starting corpus
  6. Run

To follow along, check out an unpatched version of Matio at tag v1.5.15.

Building Targets

By examining the test suite and searching GitHub for API usage in downstream applications and libraries, we can write a function that takes in bytes and feeds them to the MATLAB parsing core.

In this case, we write the bytes to a file on the tmpfs filesystem mounted at /dev/shm, which avoids unnecessary writes to disk. A file is needed at all because, unlike stb_vorbis, Matio has no "read data from memory" function. This file is then passed to Matio's MATLAB file parsing function. Once the file is parsed, we also iterate over all the variables it contains and read their data, so that these code paths are included in our executions. This step is critical: if coverage is never driven down these paths, any bugs they contain stay hidden.

// fuzz/fuzz.cpp
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <matio.h>

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    // Matio only parses files, so write the input to tmpfs (/dev/shm).
    char filename[256];
    snprintf(filename, sizeof(filename), "/dev/shm/libfuzzer.%d", getpid());

    FILE *fp = fopen(filename, "wb");
    if (!fp)
        return 0;

    fwrite(data, size, 1, fp);
    fclose(fp);

    mat_t *mat = Mat_Open(filename, MAT_ACC_RDONLY);
    if (mat) {
        // Walk every variable in the file and read its data as well as its
        // header, so that the data-reading code paths are exercised too.
        matvar_t *var;
        while ((var = Mat_VarReadNextInfo(mat)) != NULL) {
            int err = Mat_VarReadDataAll(mat, var);
            Mat_VarFree(var);  // free on every path to avoid leaks
            if (err)
                break;
        }
        Mat_Close(mat);  // only close a successfully opened file
    }

    unlink(filename);
    return 0;
}

We can use the same fuzz/driver.cpp as we did for stb to generate the standalone target.
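The harness above derives its temporary file name from getpid(). If you would rather have the C library guarantee a fresh, unique name (for example, when several harness processes may share the same /dev/shm), mkstemp() creates and opens the file atomically. The helper below is a hypothetical sketch, not part of Matio; the directory defaults to /dev/shm as in the harness but can be pointed anywhere:

```cpp
#include <cstdint>
#include <cstdlib>
#include <string>
#include <unistd.h>

// Hypothetical helper: write `size` bytes to a newly created, uniquely named
// file in `dir` and return its path (empty string on failure). The caller is
// responsible for unlink()ing the file when done.
std::string write_temp_input(const uint8_t *data, size_t size,
                             const std::string &dir = "/dev/shm") {
    std::string path = dir + "/libfuzzer.XXXXXX";  // mkstemp() fills in the Xs
    int fd = mkstemp(&path[0]);
    if (fd == -1)
        return std::string();
    size_t written = 0;
    while (written < size) {
        ssize_t n = write(fd, data + written, size - written);
        if (n <= 0) {  // write error: clean up and report failure
            close(fd);
            unlink(path.c_str());
            return std::string();
        }
        written += static_cast<size_t>(n);
    }
    close(fd);
    return path;
}
```

In the harness, this would replace the snprintf()/fopen()/fwrite() sequence: pass the returned path's c_str() to Mat_Open() and unlink() it afterward.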

Building

Matio uses GNU autotools for its build system, which is fairly easy to work with. However, because the LibFuzzer and standalone targets need different compiler flags, the library has to be configured and built once for each case. The fuzzer-no-link sanitizer option adds libFuzzer's coverage instrumentation to the library without linking in libFuzzer's own main(), which is linked only into the final fuzz target.

To compile for LibFuzzer, we use:

CC=clang CXX=clang++ CFLAGS="-fsanitize=address,fuzzer-no-link -g" \
    CXXFLAGS="-fsanitize=address,fuzzer-no-link -g" ./configure; make

For the standalone case, we simply use:

CC=clang CXX=clang++ ./configure; make

This generates a static library, src/.libs/libmatio.a, which our Makefile links directly into both targets:

# fuzz/Makefile
.PHONY: all matio-libfuzzer matio-standalone

all: matio-libfuzzer matio-standalone

# Build libmatio with ASan and libFuzzer coverage instrumentation, then link the LibFuzzer target.
matio-libfuzzer: fuzz.cpp
	cd .. && CC=clang CXX=clang++ \
		CFLAGS="-fsanitize=address,fuzzer-no-link -g" \
		CXXFLAGS="-fsanitize=address,fuzzer-no-link -g" ./configure
	make -C.. clean
	make -C..
	clang++ -I../src -fsanitize=address,fuzzer -g fuzz.cpp ../src/.libs/libmatio.a -lz -o ../matio-libfuzzer

# Rebuild libmatio without instrumentation and link it with the standalone driver.
matio-standalone: fuzz.cpp driver.cpp
	cd .. && CC=clang CXX=clang++ ./configure
	make -C.. clean
	make -C..
	clang++ -I../src -g fuzz.cpp driver.cpp ../src/.libs/libmatio.a -lz -o ../matio-standalone

We will also generate a Docker image with the following Dockerfile:

# Dockerfile
FROM ubuntu:bionic as builder
# Install dependencies, including up-to-date Clang and LibFuzzer
RUN apt-get update && \
    apt-get install -y \
        autoconf \
        automake \
        build-essential \
        git \
        libtool-bin \
        wget \
        zlib1g-dev && \
    echo "deb http://apt.llvm.org/bionic/ llvm-toolchain-bionic-8 main" >> /etc/apt/sources.list && \
    wget -O- https://apt.llvm.org/llvm-snapshot.gpg.key | apt-key add - && \
    apt-get update && \
    apt-get install -y \
        clang-8 \
        llvm-8-dev \
        libc++-8-dev \
        libc++abi-8-dev \
        libfuzzer-8-dev \
        lld-8 && \
    ln -s /usr/lib/llvm-8/bin/* /usr/local/bin/ && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /build
ENV ASAN_OPTIONS=detect_leaks=0
RUN git clone https://github.com/tbeu/matio -b v1.5.15
COPY fuzz /build/matio/fuzz
RUN cd /build/matio && \
    ./autogen.sh && \
    cd fuzz && \
    make

FROM ubuntu:bionic
WORKDIR /mayhem
COPY --from=builder /build/matio/matio-libfuzzer /mayhem/matio-libfuzzer
COPY --from=builder /build/matio/matio-standalone /mayhem/matio-standalone
CMD ["/mayhem/matio-standalone"]

 

Again, we build, tag, and push this image to a Docker registry accessible to Mayhem.

Running with Mayhem

Next, we’ll set up a Mayhemfile and populate our corpus directory with sample MATLAB files:

version: '0.7'
project: matio
target: matio
baseimage: 10.96.96.96:5000/matio
cmds:
  - cmd: /mayhem/matio-libfuzzer
    libfuzzer: true
    asan: true
  - cmd: /mayhem/matio-standalone @@

Once these steps are done, we can run the package with mayhem run.

Triage

In this case, Mayhem found one crash caused by an integer overflow. The overflow was detected, but the detection code responded by setting the result of the computation to 0, and that error condition was never checked. As a result, data was written into a buffer allocated with malloc(0), causing a heap overflow.

Although our simple fuzz target only exercised enough code to find one instance of this bug, a manual review of the code revealed several additional cases of the exact same pattern. Fixing the bug involved analyzing every case where the result of one of these checked multiplications was passed directly to malloc() or otherwise used improperly. The test case Mayhem produced gave us enough information to manually locate and correct several instances of this bug, including some in code our targets did not directly exercise, such as the MATLAB 7.3 file parser.
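The flawed pattern and its fix can be sketched in a few lines. This is a hypothetical illustration with invented function names, not Matio's actual code: mul_or_zero mirrors the behavior described above by signalling overflow with a result of 0, while checked_mul and alloc_array report overflow separately, so a caller cannot mistake it for a legitimate (if empty) size:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Flawed pattern (illustration only): overflow is detected, but it is
// signalled by returning 0, which a caller may pass straight to malloc().
size_t mul_or_zero(size_t a, size_t b) {
    if (a != 0 && b > SIZE_MAX / a)
        return 0;  // looks like a valid (empty) size to the caller
    return a * b;
}

// Safer pattern: report overflow separately from the result.
bool checked_mul(size_t a, size_t b, size_t *out) {
    if (a != 0 && b > SIZE_MAX / a)
        return false;
    *out = a * b;
    return true;
}

// Allocate `count` elements of `elem_size` bytes, refusing overflowing or
// zero-byte requests instead of handing malloc() a bogus size.
void *alloc_array(size_t count, size_t elem_size) {
    size_t bytes;
    if (!checked_mul(count, elem_size, &bytes) || bytes == 0)
        return nullptr;
    return std::malloc(bytes);
}
```

With mul_or_zero, a caller sizing a buffer of 8-byte elements for an attacker-controlled count larger than SIZE_MAX / 8 gets back 0, allocates malloc(0), and then writes count elements into that buffer. alloc_array returns nullptr instead, a case the caller already has to handle as an allocation failure.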

The impact of this vulnerability again depends on how an application uses the library. An application which parses untrusted MATLAB files with an unpatched version of matio could conceivably allow an attacker to execute arbitrary code by leveraging this heap overflow.

Conclusions

While libraries require slightly more effort to fuzz than, for example, an application that reads its input directly from a file, fuzzing library functions is relatively straightforward and can yield more executions per second than repeatedly forking a full application. Combining the speed of LibFuzzer with other techniques, such as symbolic execution, further improves the efficiency of this approach. However, when writing LibFuzzer fuzz target functions, one must remain cognizant of which code paths are being exercised; creating multiple targets that cover different slices of the program is one way to maximize the effectiveness of fuzzing.

Thank you to the maintainers of stb (Sean Barrett, GitHub user nothings) and Matio (Thomas Beutlich, GitHub user tbeu) for their excellent handling of these reports and their timely patches for the underlying issues!
