Uncovering Memory Defects In Cereal (CVE 2020-11104 & CVE-2020-11105)
What is Cereal Serialization?
Deserialization of untrusted input is a common attack vector, making both the MITRE top-25 most dangerous software errors. Even without an attacker, mistakes in serialization or deserialization decrease the reliability of your code.
Cereal is a light-weight, highly used, general-purpose serialization library written in C++. It’s the recommended library by awesome CPP, starred 2,300 times, and has 463 forks. Cereal supports serialization/deserialization for three basic archive types: binary, XML, and JSON. I fuzzed cereal (using Mayhem, my fuzzer of choice), and found several bugs and vulnerabilities.
Cereal is a header library included in many applications. Header-only library vulnerabilities are interesting, especially in the context of the supply chain. Header-only libraries are compiled-in, so you must recompile and redistribute the entire application to fix bugs. You can’t just update a shared object.
In this post, I cover:
- What are header-only libraries
- How I fuzzed cereal
- Two CVEs: CVE-2020-11104 and CVE-2020-11105. They were both reported in March 2020 to the cereal developers as part of our responsible disclosure.
- A number of other bugs. We also reported these in March 2020 to the cereal developers, and it’s still unclear which ones are just bugs and which might represent vulnerabilities.
We know that sharing code helps educate, so we’ve made all our fuzzing harnesses and proof of concept code available on github.
Want to Learn More About Zero-Days?
Catch the FASTR series to see a technical proof of concept on our latest zero-day findings. Episode 2 features uncovering memory defects in cereal.
Watch EP 02 See TV Guide
Header-Only Libraries are Compiled In
Header-only libraries use C++ template functionality to auto-generate the library code at compile-time. The standard example of a template is a function for computing the maximum value of different data types. The following example creates a myMax template, and calls it with a float, and int, and a char.
Like the simple class above, the C++ compiler (clang++ -o point -I/src/cereal/include point.c && ./point) expands the cereal header and creates an appropriate JSON deserializer for the type. A typical application may have several different serializers (e.g., JSON and XML), and many functions with each. You want to test each one.
Fuzzing Header-Only Libraries
Writing fuzzers for header-only libraries is similar and different to fuzzing normal libraries. It is similar in that you write a harness, just like when you fuzz any other library. On the other hand, it is different in that you need to consider the different data types that get instantiated by the header.
When I approached fuzzing cereal, I organized a set of harnesses that would instantiate the templates with different data types, and checked how the serializer (and deserializer) routines worked on those data types. My logic considered the cross-product of:
- serialize and deserialize (well-formed and random input)
- … of any number of variables
- … internally identified by any name (cereal supports name-value pairs)
- … of any data type
- … holding any possible value
- … in any order
- … using any supported target archive (JSON, XML, binary, portable binary)
- ... on 64 bit and 32 bit, and be alert for discrepancies arising from conversions between the two
When building a libfuzzer-style target, one challenge is extracting many data types the compiler creates from the input byte array. To address this, I use an open-source library I built myself some time ago. Coincidentally, it is also a header-only library (just like cereal) and unsurprisingly it is called fuzzing-headers.
An example harness using fuzzing-headers is this:
The program simply accesses the raw memory of the variable ‘v’, which is initialized, and prints it to the screen. An uninitialized memory access happens because the padding bytes are never initialized -- they are just there for alignment. This is another fascinating instance of C (and by extension C++) living up to its reputation of being esoteric, where nothing is ever quite like it seems.
The bug in cereal happens to leak those padding bytes when using a Binary or PortableBinary archive format. We have put a POC that demonstrates the problem.
CVE-2020-11105
If a user has control of a cereal archive file, they can potentially free attacker-controlled addresses in memory. MITRE has given a CVSS score of 9.8 out of 10 for this bug, indicating it could potentially be readily exploited over a network.
The problem is cereal employs caching of std::shared_ptr values, using the raw pointer address as a unique identifier. This becomes problematic if an std::shared_ptr variable goes out of scope, is freed, and a new std::shared_ptr is allocated at the same address.
The problem arises because cereal uses a caching mechanism for smart pointers. The raw pointer value is used to derive an ID, which it uses for subsequent lookups of the value of the smart pointer.
You can view a sample proof of concept (POC) illustrating the problem here. Running the POC on my machine prints out:
Note that the input is (true, false), but the output is (true, true). The reason is the first and second bool variable in the code happen to have the same memory address, so cereal thinks they are the same variable. As a result, it erroneously stores the value of the first variable (true) into the archive twice.
Other Bugs
While fuzzing, we found a number of other bugs. The developers of cereal did not give CVEs to these bugs, and therefore we do not know their severity.
- Loading untrusted XML can lead to a null pointer dereference. In our POC, this leads to a crash.
- XML may output uninitialized memory. In our POC, this leaks data.
- An XML stack overflow. In our POC, it causes stack exhaustion and a crash. .
- A JSON stack overflow, which can lead to an unexpected crash. This appears to actually be a bug in rapidjson, which cereal uses.
- A memory bomb where cereal will allocate gigabytes of memory from a mere 10-byte input archive (similar to a zip bomb).
- If an input std::string contains a null byte, both the JSON and XML backends fail to deserialize it back into the original value. A primary design requirement of a serialization library is that serialize(deserialize(input))= input. This bug violates that constraint. Bugs like this may cause a semantic error in an app.
- A potential XML injection issue. The XML archive for cereal allows for name/value pairs. This would only be a vulnerability if an attacker could pick the name, which seems unlikely. Fuzzing was very helpful in finding this bug because it managed to find a case where:
- angular brackets were inserted (that would normally turn the archive into invalid XML)
- the archive can still be parsed without issue by cereal
- the serialization-deserialization symmetry requirement above is violated (data that went in was different than what comes out).
- A loss of precision for unsigned long if you serialize on 64-bit, and deserialize on 32-bit. This is an interesting bug if you have an application you first started using on 32-bit, and recompiled to support a modern 64-bit CPU.
- Invalid values if you deserialize a long double from a JSON archive. In particular, some long double values may throw std::out_of_range upon deserialization. These values essentially corrupt the archive, because restoring the serialized values is not possible.
Conclusion
cereal is a relatively compact library with a focus on one specific task based around several core features. The code looks professional; the developers show a deep understanding of C++, it is richly documented, and the project comes with an extensive test suite which passes with flying colors.
Extensive test suite coverage has traditionally been seen as an adequate guard against bugs slipping into code, but the uptick in fuzzing efforts in recent years has uncovered troves of bugs in projects, calling for more testing. Fuzzing algorithmically explores code and uncovers "unknown unknowns" which depend on such specific input parameters that manual testing approaches miss them.
Interested in getting started? You can learn more about Mayhem by requesting a demo at the following link.
Add Mayhem to Your DevSecOps for Free.
Get a full-featured 30 day free trial.