Demystifying A Docker Image
Six months ago ForAllSecure started analyzing Docker images. What does this mean? Imagine we have a user who wants us to fuzz their application. How do they give it to us? Do they tar it up? Do they give us access to an environment where it’s running? Do we integrate into their build pipeline? Applications are an entire ecosystem -- they require specific library versions, environment variables, users, etc. While it may seem like a small limitation conceptually, this added barrier can contribute to the friction between development and security teams, especially as organizations look to incorporate security as a part of their build cycles.
This is where Docker comes into play. We wanted Docker as a packaging solution for our users because it’s accessible and easy to use, but we didn’t want the overhead of the Docker daemon and all the other fancy features that come with it. We ended up building our own lightweight version of Docker, allowing ForAllSecure to accept Docker images, while running them with the barebones RunC runtime. This allows us to analyze code without requiring changes to developer behavior. In this blog, we’ll focus on the first part of the problem: how to ingest Docker images.
Accompanying this post is the open sourcing of Rootfs Builder, the tool we use to extract a rootfs from a Docker image. A Docker image provides a portable, efficient format. Instead of sending a 4GB rootfs across the wire, users can simply give us a string like “ubuntu:latest” and ForAllSecure servers can pull the image and extract the rootfs. This value proposition doesn’t just apply to ForAllSecure. Rootfs Builder allows any runtime to ingest a Docker image and extract the rootfs. We chose RunC, but the extracted rootfs is vanilla (i.e., it contains no Docker-specific information) and will work with rkt, NSJail, etc.
It’s worth noting that a few solutions for building a rootfs from an image already existed. Unfortunately, they do not handle whiteouts correctly (explained further below). I also want to give a shout-out to Makisu and Kaniko (written by Uber and Google respectively), which do provide functionality for extracting a rootfs from an image. They solve the problem of building Docker images in an environment not suitable for Docker, namely Kubernetes. We chose not to use their software because it was still a bit too feature-rich for us.
Now that you understand the problem we are trying to solve, we can dive into the question: what is a Docker image? How do we go from a Docker “image”, which is just some string like “alpine:latest”, to a running instance of Alpine? In short, an image is a glorified tarball. It consists of various layers which, when merged together, form the rootfs of the container. To understand these layers, we need to make a quick detour to discuss the underlying technology, OverlayFS (OFS).
OverlayFS
OverlayFS layers two directories on a single Linux host and presents them as a single directory. The first directory, referred to as the “lower” directory, is read-only and usually provides the base file system. The second directory, referred to as the “upper” directory, reflects any changes made to the lower directory, while leaving the lower directory itself unchanged. If a file is removed, a “whiteout” file is created in the upper directory to simulate the removal. The mount point presents the merged view of the two directories. Note that OFS requires support for extended attributes in order to store metadata regarding whiteouts.
OFS underlies Docker’s overlay2 storage driver and, as you can imagine, is well-suited for containers. The lowest directory is the base filesystem, and each layer on top captures the changes made to the container filesystem at a given time. OFS is an efficient way to generate and store diffs to a filesystem.
Try it out yourself:
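One way to experiment without root privileges or a real mount is to model the lookup rules in a few lines of Python. The sketch below is a simplified illustration of the lower/upper/whiteout behavior described above, not the kernel's implementation: directories are plain dictionaries, and `WHITEOUT` and `merged_view` are names made up for this example.

```python
# Simplified model of OverlayFS semantics; not how the kernel implements it.
WHITEOUT = object()  # hypothetical marker standing in for a whiteout entry


def merged_view(lower, upper):
    """Return the files visible at the mount point for one lower and one upper dir."""
    view = dict(lower)                # start from the read-only lower directory
    for path, content in upper.items():
        if content is WHITEOUT:
            view.pop(path, None)      # a whiteout hides the lower file
        else:
            view[path] = content      # upper entries shadow lower ones
    return view


lower = {"etc/hostname": "base", "bin/sh": "shell-v1"}
upper = {}

# Every change lands in the upper directory; the lower one is never touched.
upper["etc/hostname"] = "container"   # modify an existing file
upper["bin/sh"] = WHITEOUT            # delete a file by recording a whiteout
upper["tmp/new"] = "hello"            # create a new file

merged = merged_view(lower, upper)
print(sorted(merged.items()))
print(sorted(lower.items()))          # the lower directory is unchanged
```

After the three changes, the merged view contains the modified and new files but not `bin/sh`, while `lower` still holds its original two entries.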
What’s in an image?
Now that we understand the tech underlying a Docker image, we can look inside and better understand its contents. The Docker image contains 3 components:
- Manifest.json: points to all the layers and the config.json.
- Config.json: contains metadata necessary for running the container. Think Docker version, environment variables, mounts, etc.
- Layers: These are OFS layers as described above and are named using the hash of their contents. When merged together, they form the rootfs.
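To make "merged together" concrete, here is a minimal sketch of applying layer tarballs in order. It assumes the whiteout convention used in image layer archives, where deleting `dir/name` is recorded as an empty file `dir/.wh.name` and `.wh..wh..opq` marks a directory as opaque. The rootfs is modeled as an in-memory dictionary of regular files, so directories, symlinks, permissions, and ownership are ignored; `make_layer` and `apply_layer` are hypothetical helpers, not Rootfs Builder's API.

```python
import io
import tarfile

WHITEOUT_PREFIX = ".wh."
OPAQUE_MARKER = ".wh..wh..opq"


def make_layer(files):
    """Build an in-memory layer tarball from {path: bytes} (demo helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path, data in files.items():
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


def apply_layer(rootfs, layer_fileobj):
    """Apply one layer tarball to rootfs, a {path: bytes} dict of regular files."""
    with tarfile.open(fileobj=layer_fileobj, mode="r") as tar:
        members = tar.getmembers()
        # Pass 1: whiteouts only affect content that came from earlier layers.
        for member in members:
            dirname, _, basename = member.name.rpartition("/")
            if basename == OPAQUE_MARKER:
                prefix = dirname + "/" if dirname else ""
                for path in [p for p in rootfs if p.startswith(prefix)]:
                    del rootfs[path]
            elif basename.startswith(WHITEOUT_PREFIX):
                target = basename[len(WHITEOUT_PREFIX):]
                victim = f"{dirname}/{target}" if dirname else target
                doomed = [p for p in rootfs
                          if p == victim or p.startswith(victim + "/")]
                for path in doomed:
                    del rootfs[path]
        # Pass 2: add or overwrite the regular files carried by this layer.
        for member in members:
            basename = member.name.rpartition("/")[2]
            if member.isfile() and not basename.startswith(WHITEOUT_PREFIX):
                rootfs[member.name] = tar.extractfile(member).read()
    return rootfs


base = make_layer({"etc/motd": b"hello", "usr/bin/tool": b"v1"})
change = make_layer({"usr/bin/tool": b"v2", "etc/.wh.motd": b""})

rootfs = {}
for layer in (base, change):          # order matters: lowest layer first
    apply_layer(rootfs, layer)
print(rootfs)
```

Handling whiteouts in a separate first pass keeps the result independent of where the marker files happen to sit inside the layer archive.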
Let’s step through this using Docker to shed some more light on this:
Start by pulling the image with `docker pull` and saving it with `docker save`, which writes the image to a tar archive.
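Once the archive exists, its `manifest.json` can be read programmatically. The following is a rough sketch, assuming the archive layout in which `manifest.json` is a JSON list with one entry per saved image and each entry carries `Config` and `Layers` keys. The function name `describe_saved_image` and the file names in the synthetic archive are placeholders invented for this example.

```python
import io
import json
import tarfile


def describe_saved_image(archive):
    """Return (config_path, layer_paths) for the first image in a saved archive."""
    with tarfile.open(fileobj=archive, mode="r") as tar:
        manifest = json.load(tar.extractfile("manifest.json"))
    entry = manifest[0]               # one entry per image in the archive
    return entry["Config"], entry["Layers"]


def fake_saved_image():
    """Build a tiny stand-in archive so the sketch runs without Docker."""
    manifest = [{
        "Config": "config.json",                      # placeholder name
        "RepoTags": ["example:latest"],               # placeholder tag
        "Layers": ["layer0/layer.tar", "layer1/layer.tar"],
    }]
    payload = json.dumps(manifest).encode()
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name="manifest.json")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


config, layers = describe_saved_image(fake_saved_image())
print("config:", config)
for index, layer in enumerate(layers):
    print(f"layer {index}:", layer)   # listed lowest layer first
```

For a real archive, pass a file opened in binary mode instead of the synthetic one.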
This mapping reserves the first 65536 uids starting at 100000 under fas’s namespace. According to this mapping, uid 0 inside the container maps to 100000 outside the container.
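The arithmetic behind that mapping is simple enough to sketch. The example below assumes the mapping is written in the `/etc/subuid` style of `name:start:count`, so the entry described above would read `fas:100000:65536`; `parse_subuid` and `host_uid` are illustrative helper names, not part of any tool.

```python
def parse_subuid(line):
    """Split a `name:start:count` entry into its three fields."""
    name, start, count = line.strip().split(":")
    return name, int(start), int(count)


def host_uid(container_uid, start, count):
    """Translate a uid inside the container to the uid seen on the host."""
    if not 0 <= container_uid < count:
        raise ValueError(f"uid {container_uid} is outside the mapped range")
    return start + container_uid


name, start, count = parse_subuid("fas:100000:65536")
print(name, host_uid(0, start, count))       # root inside the container
print(name, host_uid(65535, start, count))   # last uid covered by the mapping
```

Any uid at or beyond the count (65536 here) has no mapping, which is why the helper raises instead of returning a value.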
Next Steps
Developers use Docker images every day, and now you know they are just glorified tarballs. There’s plenty of room for improvement with Rootfs Builder. Outstanding features we hope to add will allow the user to specify:
- The number of layers to untar.
- A layer to omit when untarring.
- A binary the user is interested in. Instead of returning an entire rootfs, this will just return the binary.
But for now, hopefully Rootfs Builder will help users introspect into Docker images. You can get started with Rootfs Builder here: https://github.com/ForAllSecure/rootfs_builder