Uncovering Memory Defects In Cereal (CVE 2020-11104 & CVE-2020-11105)

Guido Vranken

May 26, 2020

What is Cereal Serialization?

Deserialization of untrusted input is a common attack vector, making both the MITRE top-25 most dangerous software errors. Even without an attacker, mistakes in serialization or deserialization decrease the reliability of your code.

Cereal is a light-weight, highly used, general-purpose serialization library written in C++. It’s the recommended library by awesome CPP, starred 2,300 times, and has 463 forks. Cereal supports serialization/deserialization for three basic archive types: binary, XML, and JSON. I fuzzed cereal (using Mayhem, my fuzzer of choice), and found several bugs and vulnerabilities.

Cereal is a header library included in many applications. Header-only library vulnerabilities are interesting, especially in the context of the supply chain. Header-only libraries are compiled-in, so you must recompile and redistribute the entire application to fix bugs. You can’t just update a shared object.

In this post, I cover:

What are header-only libraries
How I fuzzed cereal
Two CVEs: CVE-2020-11104 and CVE-2020-11105. They were both reported in March 2020 to the cereal developers as part of our responsible disclosure.
A number of other bugs. We also reported these in March 2020 to the cereal developers, and it’s still unclear which ones are just bugs and which might represent vulnerabilities.

We know that sharing code helps educate, so we’ve made all our fuzzing harnesses and proof of concept code available on github.

Want to Learn More About Zero-Days?

Catch the FASTR series to see a technical proof of concept on our latest zero-day findings. Episode 2 features uncovering memory defects in cereal.

Watch EP 02 See TV Guide

Header-Only Libraries are Compiled In

Header-only libraries use C++ template functionality to auto-generate the library code at compile-time. The standard example of a template is a function for computing the maximum value of different data types. The following example creates a myMax template, and calls it with a float, and int, and a char.

#include <iostream>
using namespace std;

template
T myMax(T x, T y)
{
return (x > y)? x: y;
}

int main(){
cout << myMax(1,2) << endl; // myMax for integers
cout << myMax(1.01, 2.02) << endl; // myMax for float
cout << myMax('g', 'e') << endl; // myMax for char
return 0;
}

The templated code isn’t a shared library -- it’s a separate instance of the function for each data type. When you compile (g++ -o max max.c), you can see the three different functions created in IDA pro or your favorite disassembler.

Uncovering Memory Defects In Cereal Serialization (CVE 2020-11104 & CVE-2020-11105)

IDA Pro shows 3 functions were created

When you fuzz, you’d like to be able to test all the functions that were auto-generated. In the above, all 3 instances of myMax.

Cereal is a header-only library that may create many different functions, and you want to test each for vulnerabilities. For example, the following creates a JSON deserializer:

#include <cereal/archives/json.hpp>

struct ComplexNumber
{
float real, imaginary;

template
void serialize(Archive &ar){
ar(real, imaginary);
}
};

int main()
{
cereal::JSONOutputArchive archive(std::cout);
ComplexNumber n;
n.real = 4.0;
n.imaginary = 3.0;
n.serialize(archive);
return 0;
}

‍

Like the simple class above, the C++ compiler (clang++ -o point -I/src/cereal/include point.c && ./point) expands the cereal header and creates an appropriate JSON deserializer for the type. A typical application may have several different serializers (e.g., JSON and XML), and many functions with each. You want to test each one.

Fuzzing Header-Only Libraries

Writing fuzzers for header-only libraries is similar and different to fuzzing normal libraries. It is similar in that you write a harness, just like when you fuzz any other library. On the other hand, it is different in that you need to consider the different data types that get instantiated by the header.

When I approached fuzzing cereal, I organized a set of harnesses that would instantiate the templates with different data types, and checked how the serializer (and deserializer) routines worked on those data types. My logic considered the cross-product of:

serialize and deserialize (well-formed and random input)
… of any number of variables
… internally identified by any name (cereal supports name-value pairs)
… of any data type
… holding any possible value
… in any order
… using any supported target archive (JSON, XML, binary, portable binary)
... on 64 bit and 32 bit, and be alert for discrepancies arising from conversions between the two

When building a libfuzzer-style target, one challenge is extracting many data types the compiler creates from the input byte array. To address this, I use an open-source library I built myself some time ago. Coincidentally, it is also a header-only library (just like cereal) and unsurprisingly it is called fuzzing-headers.

An example harness using fuzzing-headers is this:

#include <cstdint>
#include <cstddef>
#include <fuzzing/datasource/datasource.hpp>

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
fuzzing::datasource::Datasource ds(data, size);

try {
auto anInt = ds.Get();
auto aString = ds.Get();
auto aFloat = ds.Get();
} catch ( fuzzing::datasource::Base::OutOfData& ) { }

return 0;
}

As you can tell from the code between try and catch, it is very easy to get a variable of any type. You don’t have to worry about running out of data; if that happens, an exception will be thrown, and thanks to C++ constructors, all memory in use will be freed.

Here is some code from the actual cereal harness where it decides which archive type to use:

 const uint8_t which = ds.Get();

switch ( which ) {
case 0:
untrusted_deserialization_detail::test(ds);
break;
case 1:
untrusted_deserialization_detail::test(ds);
break;
case 2:
untrusted_deserialization_detail::test(ds);
break;
case 3:
untrusted_deserialization_detail::test(ds);
break;
}

CVE-2020-11104

My fuzzing found a bug that indicated cereal produced uninitialized data when dealing with variables of the type long double. Serialization of an (initialized) C/C++ long double variable into a BinaryArchive or PortableBinaryArchive leaks several bytes of stack or heap memory, from which sensitive information (such as memory layout or private keys) can be gleaned if the archive is distributed outside of a trusted context. MITRE gives this CVE a CVSS score of a 5.8, defined by MITRE as exploitable potentially over a network, with low/no impact on confidentiality, integrity, or availability.

A long double is represented using 12 bytes of padding and 8 bytes of actual data. The padding bytes are used for memory alignment. This definition is consistent with the IEEE 754 representation of floating point numbers, one variant of which prescribes to encode it in 80 bits of data as done by Intel.

For example, Mayhem (you can also just run valgrind) reports the following as using uninitialized memory:

#include <stdio.h>

int main(void)
{
long double v = 0;
unsigned char* p = (unsigned char*)&v;

for (int i = 0; i < sizeof(v); i++)
printf("%02X\n", *(p++));

return 0;
}

The program simply accesses the raw memory of the variable ‘v’, which is initialized, and prints it to the screen. An uninitialized memory access happens because the padding bytes are never initialized -- they are just there for alignment. This is another fascinating instance of C (and by extension C++) living up to its reputation of being esoteric, where nothing is ever quite like it seems.

The bug in cereal happens to leak those padding bytes when using a Binary or PortableBinary archive format. We have put a POC that demonstrates the problem.

CVE-2020-11105

If a user has control of a cereal archive file, they can potentially free attacker-controlled addresses in memory. MITRE has given a CVSS score of 9.8 out of 10 for this bug, indicating it could potentially be readily exploited over a network.

The problem is cereal employs caching of std::shared_ptr values, using the raw pointer address as a unique identifier. This becomes problematic if an std::shared_ptr variable goes out of scope, is freed, and a new std::shared_ptr is allocated at the same address.

The problem arises because cereal uses a caching mechanism for smart pointers. The raw pointer value is used to derive an ID, which it uses for subsequent lookups of the value of the smart pointer.

You can view a sample proof of concept (POC) illustrating the problem here. Running the POC on my machine prints out:

v is 0x5578c0144ec0
serialized: 1
v is 0x5578c0144ec0
serialized: 0
deserialized: 1
deserialized: 1

‍

Note that the input is (true, false), but the output is (true, true). The reason is the first and second bool variable in the code happen to have the same memory address, so cereal thinks they are the same variable. As a result, it erroneously stores the value of the first variable (true) into the archive twice.

Other Bugs

While fuzzing, we found a number of other bugs. The developers of cereal did not give CVEs to these bugs, and therefore we do not know their severity.

Loading untrusted XML can lead to a null pointer dereference. In our POC, this leads to a crash.
XML may output uninitialized memory. In our POC, this leaks data.
An XML stack overflow. In our POC, it causes stack exhaustion and a crash. .
A JSON stack overflow, which can lead to an unexpected crash. This appears to actually be a bug in rapidjson, which cereal uses.
A memory bomb where cereal will allocate gigabytes of memory from a mere 10-byte input archive (similar to a zip bomb).
If an input std::string contains a null byte, both the JSON and XML backends fail to deserialize it back into the original value. A primary design requirement of a serialization library is that serialize(deserialize(input))= input. This bug violates that constraint. Bugs like this may cause a semantic error in an app.
A potential XML injection issue. The XML archive for cereal allows for name/value pairs. This would only be a vulnerability if an attacker could pick the name, which seems unlikely. Fuzzing was very helpful in finding this bug because it managed to find a case where:

angular brackets were inserted (that would normally turn the archive into invalid XML)
the archive can still be parsed without issue by cereal
the serialization-deserialization symmetry requirement above is violated (data that went in was different than what comes out).

A loss of precision for unsigned long if you serialize on 64-bit, and deserialize on 32-bit. This is an interesting bug if you have an application you first started using on 32-bit, and recompiled to support a modern 64-bit CPU.
Invalid values if you deserialize a long double from a JSON archive. In particular, some long double values may throw std::out_of_range upon deserialization. These values essentially corrupt the archive, because restoring the serialized values is not possible.

Conclusion

cereal is a relatively compact library with a focus on one specific task based around several core features. The code looks professional; the developers show a deep understanding of C++, it is richly documented, and the project comes with an extensive test suite which passes with flying colors.

Extensive test suite coverage has traditionally been seen as an adequate guard against bugs slipping into code, but the uptick in fuzzing efforts in recent years has uncovered troves of bugs in projects, calling for more testing. Fuzzing algorithmically explores code and uncovers "unknown unknowns" which depend on such specific input parameters that manual testing approaches miss them.

Interested in getting started? You can learn more about Mayhem by requesting a demo at the following link.

Share this post