CVE-2020-10029: Buffer overflow in GNU libc trigonometry functions?!?
Remember trigonometry, where you were given the lengths of two sides of a triangle and had to compute the third? We vaguely remembered SOH CAH TOA, but not much more. One thing we would have bet $50 on: that there wouldn’t be a buffer overflow in basic trigonometric functions. We would have lost that bet.
Earlier this year we uncovered bugs in the glibc functions cosl, sinl, sincosl, and tanl due to assumptions in an underlying common function, leading to CVE-2020-10029. These bugs, after lying dormant for 8 years (they were introduced in 2012, in this commit), are now fixed in glibc 2.32.
Bugs in floating point operations can be of tremendous consequence. To name a few:
- The Patriot Missile failure that killed 28 soldiers
- The Ariane 5 rocket explosion, which became a primary driver for model checking research
- The Intel FDIV bug that forced a recall of Pentium processors
Yet security analysis of floating point arithmetic, including fuzzing, often goes undone. Indeed, to quote David Goldberg:
Floating-point arithmetic is considered an esoteric subject by many people. This is rather surprising because floating-point is ubiquitous in computer systems.
In this post, we’ll cover:
- The basics of how floating point works.
- The vulnerability in glibc.
- How we fuzzed it with Mayhem.
The GNU libc library
A C library is a set of general-purpose utility functions that nearly every program written in C uses. The programmer uses it to work with memory buffers, files, network sockets, system time and more. It also provides over 200 functions to perform calculations with floating point numbers, which this blog post is about.
The GNU C Library (glibc) is the most common open source C library and is used on most Linux systems. Because so many applications depend on it, and because it is installed across millions of devices, it is a critical component in the open source ecosystem.
Floating Point Arithmetic
“Twenty years ago anarchy threatened floating-point arithmetic.” –Prof Kahan, 1997.
Computers represent rational numbers, like 3.14, as floating point numbers. While most developers understand how integers, like 31 or 57, are represented, relatively few understand how floating point numbers are represented by the computer.
In 1985, the IEEE released IEEE 754, the standard by which most computers now implement floating point numbers. The standard carefully weighed considerations, such as how to handle rounding and how to represent the largest range of numbers compactly.
At a high level, every floating point number consists of three fields:
- A sign bit, indicating whether the number is positive or negative
- An exponent field, so that you can represent the exponent -12 in 1.01111 x 2^-12
- A significand field, which represents the non-exponent component: for example, the “1.01111” in 1.01111 x 2^-12 above.
The different floating point types, such as float, double, and long double, use different sizes for these fields.
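To make those fields concrete, here is a small sketch of ours (not from the original analysis) that pulls apart a double. It assumes a 64-bit IEEE-754 double whose bytes can be copied into a uint64_t, which holds on common platforms:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main (void)
{
  double d = -0.000244140625;        /* exactly -1.0 x 2^-12 */
  uint64_t bits;
  memcpy (&bits, &d, sizeof (bits));

  unsigned sign        = bits >> 63;                            /* 1 bit */
  int      exponent    = (int) ((bits >> 52) & 0x7ff) - 1023;   /* 11 bits, stored with a bias of 1023 */
  uint64_t significand = bits & 0xfffffffffffffULL;             /* 52 bits, implicit leading 1 */

  printf ("sign=%u exponent=%d significand=0x%llx\n",
          sign, exponent, (unsigned long long) significand);
  return 0;
}

On x86-64 Linux this prints sign=1 exponent=-12 significand=0x0: the sign bit, the unbiased exponent, and the fractional bits described above.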
However, not all bit patterns are valid floating point numbers. This was surprising to us, and we only realized it after fuzzing and finding the bug with Mayhem.
The IEEE-754 format states that the number 0.0 is represented by the exponent and fractional bits all being zero. The problem arises: what if the exponent is non-zero, but the fractional bits are all zero? While mathematically one may think this is still zero (after all, 0.0 * 10^(any number) = 0), it’s not on a computer. In the x86 extended-precision long double format, where the leading significand bit is stored explicitly rather than implied, such a bit pattern is an invalid number.
In other words, not all bit-patterns are valid floating point numbers.
How to identify fuzz testing targets
As we were assessing the security of OpenWRT, which uses musl as its C library, we discovered that a memory bug had been found in musl libc’s floating point code last year.
Inspired by this, we set out to uncover more bugs in floating point code. To this end, we constructed a fuzzer that calls each of a C library’s floating point functions with random (fuzzer-generated) input. With this exhaustive approach, any memory bug in those functions should come to light.
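To give a flavor of what such a harness can look like, here is a minimal sketch. It is illustrative only, not our actual Mayhem harness: it reads fuzzer-generated bytes from stdin, uses the first byte to pick a function, and copies the remaining raw bytes into a long double so that invalid bit patterns get exercised too.

#define _GNU_SOURCE   /* for sincosl */
#include <math.h>
#include <stdio.h>
#include <string.h>

int main (void)
{
  unsigned char buf[1 + sizeof (long double)] = {0};
  if (fread (buf, 1, sizeof (buf), stdin) < sizeof (buf))
    return 0;                        /* not enough input; ignore this run */

  long double x;
  memcpy (&x, buf + 1, sizeof (x));  /* raw bytes, so invalid patterns are included */

  volatile long double r = 0;        /* volatile so the calls aren't optimized away */
  long double s, c;
  switch (buf[0] % 4)
    {
    case 0: r = sinl (x); break;
    case 1: r = cosl (x); break;
    case 2: r = tanl (x); break;
    case 3: sincosl (x, &s, &c); r = s + c; break;
    }
  return 0;
}

A real harness would cover every math function and let the fuzzer choose the arguments for multi-argument functions as well.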
Unfortunately, we found no bugs in musl libc. But because our fuzzer is universal and can be used on any POSIX-compliant math library, adopting its use for GNU libc was easy.
Finding CVE-2020-10029 in glibc
When we started fuzzing trig functions in glibc, we didn’t remember any of the above. So we wrote a small library that automatically calls each glibc math function. Then a vulnerability fell out, which prompted us to go back and understand the issue.
The proof of vulnerability created by Mayhem is pretty simple: pass 0x5d4141414141414100000000
to sinl and watch the stack protector scream.
Here is the full proof of vulnerability:
$ cat sinl-pov.c
#include <math.h>
#include <string.h>
int main(void)
{
const unsigned char _v[16] = {0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x5d};
long double v;
memcpy(&v, _v, sizeof(v));
/* Return the result so that gcc doesn't optimize everything away */
return sinl(v);
}
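To reproduce, this only needs the math library linked in (for example, gcc sinl-pov.c -o sinl-pov -lm); the out-of-bounds write happens inside glibc itself, which is typically built with stack protection, so no special hardening flags are needed on our side.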
Glibc crashes with SIGSEGV. What? Why would purely arithmetic code have a buffer or anything else that would cause a SIGSEGV?
With a little hunting, we found that sinl, and all trig functions operating on the long double type, call __kernel_rem_pio2. That’s where the bug is. The function reduces its argument modulo pi/2, keeping track of the remainder and of which quadrant the angle fell in.
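For intuition, argument reduction maps a huge angle back into a small range before the actual polynomial evaluation. The sketch below is ours and only illustrates the idea using the standard remquol function; glibc’s own reduction is far more precise:

#include <math.h>
#include <stdio.h>

int main (void)
{
  const long double pio2 = 1.57079632679489661923L;  /* pi/2 */
  long double x = 1.0e6L;                            /* a large angle */
  int q;
  long double r = remquol (x, pio2, &q);  /* x ~= q * pi/2 + r */
  /* sinl(x) equals +/- sinl(r) or +/- cosl(r), depending on the quadrant */
  printf ("x = %Lg reduces to r = %Lg in quadrant %d\n", x, r, q & 3);
  printf ("sinl(x) = %Lg, sinl(r) = %Lg, cosl(r) = %Lg\n",
          sinl (x), sinl (r), cosl (r));
  return 0;
}

The hard part, and what __kernel_rem_pio2 spends its code on, is doing this reduction without losing precision when x is enormous.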
At first we were thrown off by an uninitialized-use error. It turns out you always get an uninitialized-use report with glibc because of how it represents the long double bit pattern: it breaks the value up into three 32-bit words, where the top 16 bits of the last word are unused “junk” bits that exist purely for memory alignment. Think of a long double value as a union whose struct has four parts (and indeed this is from glibc code):
typedef union
{
long double value;
struct
{
uint32_t lsw; /* bits 0-31 */
uint32_t msw; /* bits 32-63 */
int sign_exponent:16; /* bits 64-79 */
unsigned int junk:16; /* bits 80-95, unused */
} parts;
} ieee_long_double_shape_type;
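To see that layout in action, here is a small illustrative program of ours, not glibc code, that stores 1.0L in such a union and prints the pieces; it assumes the x86 96-bit long double layout shown above:

#include <stdint.h>
#include <stdio.h>

typedef union
{
  long double value;
  struct
  {
    uint32_t lsw;
    uint32_t msw;
    int sign_exponent:16;
    unsigned int junk:16;
  } parts;
} ldbl_shape;

int main (void)
{
  ldbl_shape u;
  u.value = 1.0L;
  /* Expect lsw = 0, msw = 0x80000000 (the explicit leading bit),
     sign_exponent = 0x3fff (bias 16383 for exponent 0, sign clear). */
  printf ("lsw=0x%08x msw=0x%08x sign_exponent=0x%04x\n",
          u.parts.lsw, u.parts.msw, u.parts.sign_exponent & 0xffff);
  return 0;
}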
Although the vulnerability isn’t an uninitialized memory error, that detour pointed us in the right direction. The vulnerable function __kernel_rem_pio2 is called with:
- x, an array of 3 doubles holding the most significant and least significant words of the long double significand
- The exponent and sign bits e0
- A number of other options that are well documented, but not necessary to describe here
The POV discovered by Mayhem sets the array x to be all zero and e0 to be non-zero. However, the code as written expects at least one bit in x[] to be set to 1. As a result, there is an out-of-bounds memory read, followed by an out-of-bounds memory write.
The vulnerable code is dense (available here).
Here is a walk-through with the salient parts commented with “note”:
int
__kernel_rem_pio2 (double *x, double *y, int e0, int nx, int prec,
const int32_t *ipio2)
{
int32_t jz, jx, jv, jp, jk, carry, n, iq[20], i, j, k, m, q0, ih;
double z, fw, f[20], fq[20], q[20];
...
/* compute q[0],q[1],...q[jk] */
/* note: since x[j] is the input and always zero in the POV,
q[i] will always be zero */
for (i = 0; i <= jk; i++)
{
for (j = 0, fw = 0.0; j <= jx; j++)
fw += x[j] * f[jx + i - j];
q[i] = fw;
}
jz = jk;
recompute: /* note: goto-loop. Our POV iterates through it until it
gets to an OOB read and OOB write. */
/* distill q[] into iq[] reversingly */
/* note: iq[i] should be zero since q[i] is zero */
for (i = 0, j = jz, z = q[jz]; j > 0; i++, j--)
{
fw = (double) ((int32_t) (twon24 * z));
iq[i] = (int32_t) (z - two24 * fw);
z = q[j - 1] + fw;
}
...
/* check if recomputation is needed */
if (z == zero)
{
j = 0;
for (i = jz - 1; i >= jk; i--)
j |= iq[i];
/* note: iq is all zero, so this is true */
if (j == 0) /* need recomputation */
{
/* note: OOB read. iq[] is all zero for our input, so this loop
   walks down past iq[0] and only stops at the first non-zero,
   uninitialized slot it happens to hit, at least on the first
   trip through the recompute/goto loop */
for (k = 1; iq[jk - k] == 0; k++)
; /* k = no. of terms needed */
/* note: k is now some index such that iq[jk - k] is out-of-bounds,
   uninitialized memory. We then get an out-of-bounds write,
   because i is derived from k in f[jx + i] */
for (i = jz + 1; i <= jz + k; i++) /* add q[jz+1] to q[jz+k] */
{
/* note: out-of-bounds write! */
f[jx + i] = (double) ipio2[jv + i];
for (j = 0, fw = 0.0; j <= jx; j++)
fw += x[j] * f[jx + i - j];
q[i] = fw;
}
jz += k;
goto recompute;
}
}
...
}
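For illustration only, the bad input can be recognized before the reduction ever runs: the long double’s significand is entirely zero while its exponent is not. The sketch below is ours, not the actual glibc patch (that lives in the glibc sources); it just shows how such a check could look.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Sketch: detect the x86 extended-precision "pseudo" patterns that
   triggered CVE-2020-10029: a zero significand with a non-zero
   exponent.  Assumes a little-endian 80-bit long double. */
static int is_pseudo_long_double (long double x)
{
  unsigned char b[sizeof (long double)];
  uint64_t significand;
  uint16_t sign_exponent;

  memcpy (b, &x, sizeof (b));
  memcpy (&significand, b, 8);        /* bits 0-63: significand */
  memcpy (&sign_exponent, b + 8, 2);  /* bits 64-79: sign + exponent */

  return (sign_exponent & 0x7fff) != 0 && significand == 0;
}

int main (void)
{
  const unsigned char _v[16] = {0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
      0x00, 0x00, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x5d};
  long double v;
  memcpy (&v, _v, sizeof (v));
  printf ("pseudo? %d\n", is_pseudo_long_double (v));    /* prints 1 */
  printf ("pseudo? %d\n", is_pseudo_long_double (1.0L)); /* prints 0 */
  return 0;
}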
We reported this bug to the glibc private security address, then reported it to their public tracker at their request. You can follow the public thread from January 31, 2020 on the glibc developers mailing list. The developers have committed a fix, the CVE (CVE-2020-10029) is now public, and the fix ships in glibc 2.32. The bug affects the glibc functions cosl, sinl, sincosl, and tanl due to assumptions in an underlying common function. The CVSS score is 5.5.
Summary
Floating point arithmetic is fundamental to many systems, from graphics to cyber-physical systems. Floating point code is also hard to check manually; even reviewing the vulnerability after we knew where it was took considerable time. One of the benefits of advanced fuzz testing is that it produces a concrete example: an input that proves the vulnerability exists and that can be used to reproduce and fix the problem. This lets developers focus only on verified issues, expediting remediation and allowing them to verify their fixes.
Further reading:
- Glibc source showing how long doubles are represented on various architectures (GET_LDOUBLE_WORDS)
- Intel’s floating point reference sheet
- Lecture on floating point basics
- How to print out each bit of a floating point number
- Our favorite textbook introduction: Computer Systems: A Programmer’s Perspective (pricey, but good).