Top Takeaways From The “Knowing The Unfuzzed And Finding Bugs With Coverage Analysis” Webinar
March 31, 2020
The adoption of fuzzing has resulted in vulnerabilities being found and fixed at scale. Although fuzzing offers benefits not seen in other Application Security Testing techniques, advanced users eventually run into two key questions:
How do we find good fuzzing targets quickly?
What is left to fuzz?
ForAllSecure security researcher Mark Griffin set out to help those who struggle with these questions, and he points to one possible solution: automated coverage analysis. Coverage analysis can be done with tools and workflows that are uncommon among software developers and security researchers alike.
In his webinar, Mark educates his viewers on “The Law of Fuzzing”, the importance of automated coverage analysis, and his motivation behind the development of bncov, an open-source coverage analysis plugin for Binary Ninja that allows its user to automate coverage analysis.
Over the last two decades, fuzzing has gained popularity among tech companies as an established security testing technique. In fact, Microsoft and security engineers have deemed it a best practice because it is an excellent technique for testing the security and resiliency of complex software.
The fuzzing market changed radically when AFL, an open-source fuzzer that brought together many improvements, was released. It changed the market because of one significant capability: coverage-guided generational fuzzing.
Although modern fuzzers have this capability today, it was significant at the time because AFL found the sweet spot between performance and coverage granularity: a static-sized bitmap that records edge coverage with hit counts. This lets the fuzzer see when the target exercises different functionality and register new behaviors, which indicate that it has reached new code. Reaching new code opens new testing opportunities that allow users to uncover more bugs.
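The bitmap idea can be illustrated with a short Python sketch modeled on the scheme described in AFL's technical documentation: each basic block gets a random ID, each edge is counted in a fixed-size map at index `cur ^ prev`, and `prev` is then set to `cur >> 1` so that the edges A to B and B to A land in different slots. Raw hit counts are collapsed into coarse buckets (1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+), so a loop that runs five times instead of once registers as new behavior even though no new edge was taken. The block names, the trace format, and the helpers `trace_bitmap` and `has_new_behavior` are hypothetical; this is an illustration, not AFL's actual code.

```python
import random

MAP_SIZE = 1 << 16  # 64 KiB map, as described for AFL


def bucket(count: int) -> int:
    """Collapse a raw hit count into an AFL-style bucket (one bit per bucket)."""
    if count <= 3:
        return (0, 1, 2, 4)[count]
    if count <= 7:
        return 8
    if count <= 15:
        return 16
    if count <= 31:
        return 32
    if count <= 127:
        return 64
    return 128


def trace_bitmap(trace, block_ids):
    """Record bucketed edge hit counts for one execution trace (illustrative only)."""
    counts = [0] * MAP_SIZE
    prev = 0
    for block in trace:
        cur = block_ids[block]
        counts[cur ^ prev] += 1  # both IDs are < MAP_SIZE, so the index fits
        prev = cur >> 1  # shift so A->B and B->A map to different slots
    return bytes(bucket(c) for c in counts)


def has_new_behavior(seen: bytearray, classified: bytes) -> bool:
    """Return True if any slot shows a bucket never seen before, and update `seen`."""
    new = False
    for i, c in enumerate(classified):
        if c & ~seen[i]:
            seen[i] |= c
            new = True
    return new


rng = random.Random(0)
block_ids = {name: rng.randrange(MAP_SIZE) for name in ("entry", "loop", "exit")}
seen = bytearray(MAP_SIZE)

once = ["entry", "loop", "exit"]
five_times = ["entry"] + ["loop"] * 5 + ["exit"]

print(has_new_behavior(seen, trace_bitmap(once, block_ids)))        # True: first run
print(has_new_behavior(seen, trace_bitmap(once, block_ids)))        # False: nothing new
print(has_new_behavior(seen, trace_bitmap(five_times, block_ids)))  # True: loop edge hit 4 times
```

The third run takes no edge that the first run missed; it is flagged only because the loop's self-edge moved into a new hit-count bucket, which is exactly the extra signal hit counts provide over plain edge coverage.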
What are the Different Types of Coverage?
Previously, we touched on edge coverage. But what other types of code coverage are there? There are three primary types:
Block: Block coverage shows what did and did not get executed. The term can refer to statement, block, or line coverage. It offers a basic understanding of code coverage and is great for understanding functionality.
Edge: Edge coverage records when execution transfers from one block to another, and it can also record how often each edge was taken. This makes edge coverage more descriptive and more granular than block coverage: it captures a basic sense of ordering, which is important in loops. AFL is one example of a fuzzer that uses edge coverage.
Path: Path coverage records the exact order of every block that was executed. It offers very granular information, but it is often impractical because of the large amount of storage it requires. For some applications, though, storing it may be reasonable, for example when you only need path coverage from a single execution.
Coverage is typically reported as a percentage: the fraction of the program’s blocks, edges, or paths that were exercised.
Figure 1: This diagram represents the three different types of coverage. The black blocks represent the information that is stored.
The key tradeoff between these coverage types is performance and storage versus granularity, which is why we often see edge coverage and rarely see path coverage.
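To see how the three coverage types differ on the same run, the following Python sketch derives each one from a single hypothetical execution trace over a small hypothetical control-flow graph. Block coverage reports 100% while edge coverage reports 80%, because the A-to-D edge was never taken, and edge coverage also captures that the loop body ran three times. The path is simply the full trace, so its size grows with the length of the execution rather than with the size of the program, which is the storage cost that makes path coverage rare.

```python
from collections import Counter

# Hypothetical control-flow graph: block -> successor blocks.
CFG = {
    "A": ["B", "D"],
    "B": ["C"],
    "C": ["B", "D"],  # C -> B is the loop back-edge
    "D": [],
}
ALL_BLOCKS = set(CFG)
ALL_EDGES = {(src, dst) for src, succs in CFG.items() for dst in succs}


def coverage(trace):
    """Derive block, edge, and path coverage from one execution trace."""
    blocks = set(trace)                     # block: what executed
    edges = Counter(zip(trace, trace[1:]))  # edge: transitions, with hit counts
    path = tuple(trace)                     # path: exact order of every block
    return blocks, edges, path


trace = ["A", "B", "C", "B", "C", "B", "C", "D"]
blocks, edges, path = coverage(trace)

print(f"block coverage: {len(blocks) / len(ALL_BLOCKS):.0%}")  # 100%
print(f"edge coverage:  {len(edges) / len(ALL_EDGES):.0%}")    # 80%
print(edges[("B", "C")])  # 3: the loop body was entered three times
print(len(path))          # 8: grows with every block executed
```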
The “Law of Fuzzing”: If It’s Never Been Fuzzed, You’ll Probably Find Bugs…
Fuzzing is an excellent technique for finding vulnerabilities, but what happens when you think you have fuzzed everything and start to see a plateau in results?
Figure 2: The diagram above shows harness coverage across a program. The yellow outline indicates the fuzz target of subsequent harnesses. Black indicates functions covered only by the second harness, green indicates functions covered by both harnesses, and gray indicates functions covered by neither harness.
Mark shares that fuzzing the same target in a different way can, surprisingly, turn up more bugs. Whether or not you’re using a modern fuzzer that tracks coverage, he advises pulling coverage information from your existing fuzzing targets to understand what code actually got fuzzed. In his proposed approach, coverage information links inputs to the target’s functionality and tells you what to fuzz next. This is particularly helpful when fuzzing results plateau, which always happens eventually because there is only a finite space to explore and because of how fuzzers explore it.
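The color categories in Figure 2 reduce to set operations over per-harness function coverage. The sketch below uses made-up function names and coverage sets; in practice these would come from your coverage tooling. The `neither` set is a direct answer to “What is left to fuzz?”

```python
# Hypothetical function-level coverage from two harnesses of the same program.
all_functions = {"main", "parse_header", "parse_body", "decode_utf8",
                 "decompress", "render", "write_log", "load_plugin"}
harness1 = {"main", "parse_header", "parse_body", "decode_utf8"}
harness2 = {"main", "parse_header", "decompress", "render"}

both = harness1 & harness2                          # green in Figure 2
only_second = harness2 - harness1                   # black in Figure 2
only_first = harness1 - harness2                    # reached only by the first harness
neither = all_functions - (harness1 | harness2)     # gray: what is left to fuzz

print(sorted(both))         # ['main', 'parse_header']
print(sorted(only_second))  # ['decompress', 'render']
print(sorted(neither))      # ['load_plugin', 'write_log']
```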
This is where automated coverage analysis tools such as bncov come into play. With bncov, you can…
Assess whether you’re working with a good target. The goal is to discover how much code was covered when fuzzing the target vs how much code could have been covered.
Automate target discovery. Identifying which parts of the code remain uncovered can be automated, because you can see which functions can reach the most uncovered functions. Today this process is often manual, and users wouldn’t mind some machine help. This feature isn’t meant to replace the human element but to help humans find new targets quickly; humans can still vet the recommended targets to decide whether it makes sense to fuzz them.
Automate the build/fuzz/analyze workflow. bncov is scriptable, so less time is spent manually executing common tasks, and automation can help scale some of those unsexy manual tasks. Automating the workflow lets fuzzing and coverage analysis run without extra interaction, so experts only need to spend time reviewing the answers to the questions they’d normally ask.
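The first two ideas in the list, assessing a target and automating target discovery, can be sketched on a call graph. The Python below does not use bncov or the Binary Ninja API; it is a standalone illustration under assumed inputs: a hypothetical `CALL_GRAPH` and a hypothetical `COVERED` set of functions the existing harness executed. `target_coverage` measures how much of the code reachable from a function was actually covered, and `rank_targets` lists functions by how many uncovered functions they can reach. The tie-breaking rule (prefer the function with the smaller reachable set, as a more focused target) is an assumption made for this sketch, and the output is a list of candidates for a human to vet, not a verdict.

```python
from collections import deque

# Hypothetical call graph: caller -> callees.
CALL_GRAPH = {
    "main": ["parse_args", "process_file"],
    "parse_args": [],
    "process_file": ["parse_header", "parse_body"],
    "parse_header": [],
    "parse_body": ["decode_chunk", "handle_extension"],
    "decode_chunk": [],
    "handle_extension": ["load_plugin", "run_script"],
    "load_plugin": [],
    "run_script": ["eval_expr"],
    "eval_expr": [],
}
# Hypothetical: functions the existing harness actually executed.
COVERED = {"main", "parse_args", "process_file", "parse_header",
           "parse_body", "decode_chunk"}


def reachable(func):
    """All functions reachable from func in the call graph (BFS), including func."""
    seen = {func}
    queue = deque([func])
    while queue:
        for callee in CALL_GRAPH.get(queue.popleft(), []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def target_coverage(func):
    """Fraction of the functions reachable from func that were covered."""
    reach = reachable(func)
    return len(reach & COVERED) / len(reach)


def rank_targets():
    """Rank candidate targets by the number of uncovered functions they reach.

    Ties go to the smaller reachable set (assumed to be a more focused target).
    """
    ranked = []
    for func in CALL_GRAPH:
        reach = reachable(func)
        uncovered = len(reach - COVERED)
        if uncovered:
            ranked.append((func, uncovered, len(reach)))
    ranked.sort(key=lambda r: (-r[1], r[2], r[0]))
    return [(func, uncovered) for func, uncovered, _ in ranked]


print(f"main: {target_coverage('main'):.0%} of reachable functions covered")
for func, uncovered in rank_targets()[:3]:
    print(f"{func}: reaches {uncovered} uncovered functions")
```

With this graph, `main` scores 60% (6 of 10 reachable functions covered), and `handle_extension` ranks first: it reaches all four uncovered functions without pulling in code the current harness already exercises.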
Mark’s suggested approach is conducted after the fuzzing cycle, which means it isn’t tied to any particular fuzzer. If you already fuzz and are struggling with similar questions, this approach may be for you. For more information on bncov, check out Mark’s blog post, “How Much Testing is Enough?”
How Much is Enough?
A commonly asked question about fuzzing is: how much is enough? Organizations such as Google recommend 1 CPU year, which may not always be possible because of product release schedules and other constraints. Mark offers two answers: the idealistic and the realistic.
“Ideally, we should fuzz every source of input and possible option, all exported functions, and internal functions with high complexity (regardless of whether they’re exposed externally), for all release versions, or nightly if your organization releases multiple times a day,” shares Mark.
He further expounds, “Realistically, we should fuzz at least our primary sources of input.” Having worked with a lot of code, Mark has often run into programs that are hard to fuzz. He encourages viewers to ask why something is hard to fuzz, because chances are it’s hard to test in general and therefore isn’t likely to be tested well in the first place.
Irrespective of what’s possible for you, it’s important to keep fuzzing, even if you’ve fuzzed the target before. New bugs inevitably appear over time as functions are added and new mistakes are introduced, so regression testing is key to the security health of your code. Google’s OSS-Fuzz program has shared excellent statistics here: OSS-Fuzz has reported that 40% of the thousands of bugs it has found are regressions.
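To put the 1 CPU year recommendation in wall-clock terms, the arithmetic below spreads a year of CPU time across a given number of cores. It assumes, optimistically, that fuzzing throughput scales linearly with core count; the function name `wall_clock_days` is just for illustration.

```python
def wall_clock_days(cores: int, cpu_years: float = 1.0) -> float:
    """Wall-clock days needed to spend `cpu_years` of CPU time on `cores` cores.

    Assumes perfectly linear scaling across cores, which is optimistic.
    """
    return cpu_years * 365 / cores


for cores in (1, 8, 64, 256):
    print(f"{cores:>3} cores: {wall_clock_days(cores):6.1f} days")
```

On 64 cores, for example, 1 CPU year works out to roughly 5.7 days of wall-clock fuzzing, while a single core would need the full year.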