DevOps Chat Podcast: $2M DARPA Award Sparks Behavior Testing With ForAllSecure's Mayhem Solution

Mitchell Ashley
August 16, 2019

Secure software depends on people finding vulnerabilities and deploying fixes before they are exploited in the wild. This has led to a world of security researchers and bug bounties directed at finding new vulnerabilities.

As dedicated as security researchers are, there is a vast ocean of software in existence, waiting for someone to find and exploit the next security vulnerability for profit or nefarious uses. With autonomous vehicles on the horizon, is there an autonomous solution to finding and fixing software vulnerabilities?

Enter DARPA Cyber Grand Challenge winner “Mayhem,” created by a team of researchers from Carnegie Mellon University who spun out security startup ForAllSecure. And they have a BHAG (Big Hairy Audacious Goal). “Our vision is to check the world’s software for exploitable bugs so they can be fixed before attackers use them to hack computers.” Mayhem has moved on from capture the flag contests to observing and finding vulnerabilities in DoD software and is working its way to corporate systems.

In this episode of DevOps Chat, we talk with David Brumley, ForAllSecure co-founder and CEO and CMU professor, about the technology behind Mayhem and how it observes software as it executes and injects changes to effect and observe new and potentially exploitable behaviors. More information about Mayhem is also available at

The transcript of the conversation follows.


Mitch Ashley: Hi, everyone, this is Mitch Ashley with, and you’re listening to another DevOps Chat podcast. Today, I’m joined by David Brumley, CEO at ForAllSecure. David is also a professor at CMU. He’s currently on leave. The topic that we’re talking about today is called Mayhem behavior testing—it ties back to David’s work at CMU. So, David, welcome to DevOps Chat.

David Brumley: I’m happy to be here.

Ashley: Great, we’re happy to have you on the podcast. Would you start by just introducing yourself a little more fully—your background, maybe a little bit about some of the research and how that led you to form ForAllSecure?

Brumley: Yeah, absolutely. When I got out of undergrad, I was a computer security officer. My job was to chase intrusions on the Stanford network and try to help people fix them. And at the time, I got pretty frustrated with the idea that we were always behind attackers, that I couldn’t find vulnerabilities first or get the fixes deployed.

I actually went back to grad school, got a Ph.D. and really made that my work since 2003 on how do we go about finding vulnerabilities before attackers and just as importantly, how do we get those fixes fielded? Because it’s not just about finding the vulnerabilities, it’s about how quickly we can get those in place.

I’ve been working on that, I’m a tenured professor at Carnegie Mellon, and for the last three years, been working on commercializing it.

Ashley: Excellent. Well, an entrepreneur and a professor, and it’s great to see that your research has led you to kinda bring this out to market. That’s exciting. So, tell us, what is Mayhem? What is Mayhem or behavioral testing?

Brumley: Yeah, what Mayhem does, we call behavioral testing. So, behavioral testing is about watching an application as it executes and learning from that execution and then trying to come up with a new input that would cause the application to do something different. And you do this again and again and again, like hundreds of times per second, with the idea that if you can learn the behaviors of a program and you can start driving it to new behaviors, you’ll, one, come up with a test suite for the program, that’s important for the DevOps part and how you get things out quickly, and second, things like vulnerabilities and exploits that trigger vulnerabilities are just triggering a particular behavior. So, you can automatically generate inputs that trigger these. At one point, people were saying we were automatically generating exploits with this technology.

Ashley: Interesting. I’m pretty sure that it’s not the same thing, but it sounds like a more thoughtful and learned approach to what we know as chaos testing from the Chaos Monkeys from Netflix, et cetera. But you’re really picking specific behaviors, building a profile of the application, and then looking for new behaviors that you can test against it?

Brumley: That’s what we’re doing, and so, there’s really kind of two technologies people may have heard of. One is called fuzzing. So, fuzzing is about running the application again and again and again. And it uses heuristics to pick those inputs, and one of the things we built is a way to monitor that application as it runs and to inform the fuzzer on how to come up with new inputs that would get different behaviors.

The second thing that we did is, we really took a page from, surprisingly, formal verification. So, formal verification was trying to prove a program was safe. What we tried to do is use those same techniques to prove where a program is unsafe, and our proof creates an input that triggers the unsafe property.

So, we use these two techniques, fuzzing and symbolic execution, and a few others in a portfolio approach to try to come up with behaviors that are exploitable.
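
To make the "prove where a program is unsafe" idea concrete, here is a toy Python sketch of symbolic-execution-style path exploration. It is an illustration only, not Mayhem's implementation: the `PROGRAM` decision tree, the two byte-sized inputs `x` and `y`, and every function name are hypothetical, and a bounded brute-force search stands in for the constraint (SMT) solver a real tool would use.

```python
from itertools import product

# Hypothetical toy program as a decision tree: (condition, then, else).
# Leaves are the strings "safe" or "UNSAFE". Inputs x and y are bytes (0..255).
PROGRAM = (
    lambda x, y: x > 100,
    (
        lambda x, y: x + y == 150,
        (lambda x, y: y % 7 == 3, "UNSAFE", "safe"),
        "safe",
    ),
    "safe",
)


def paths(node, constraints=()):
    """Enumerate every path as (tuple of branch constraints, leaf label)."""
    if isinstance(node, str):
        yield constraints, node
        return
    cond, then_branch, else_branch = node
    yield from paths(then_branch, constraints + (cond,))
    negated = lambda x, y, c=cond: not c(x, y)
    yield from paths(else_branch, constraints + (negated,))


def solve(constraints, domain=range(256)):
    """Stand-in for an SMT solver: bounded search for a satisfying (x, y)."""
    for x, y in product(domain, repeat=2):
        if all(c(x, y) for c in constraints):
            return x, y
    return None


def find_unsafe_input(program):
    """Return a concrete input that reaches an UNSAFE leaf, or None."""
    for constraints, label in paths(program):
        if label == "UNSAFE":
            witness = solve(constraints)
            if witness is not None:
                return witness
    return None


def execute(node, x, y):
    """Concrete execution: follow the branches for a real input."""
    while not isinstance(node, str):
        cond, then_branch, else_branch = node
        node = then_branch if cond(x, y) else else_branch
    return node


if __name__ == "__main__":
    witness = find_unsafe_input(PROGRAM)
    print(witness, execute(PROGRAM, *witness))  # prints (105, 45) UNSAFE
```

The witness doubles as the proof: replaying it through `execute` reaches the unsafe leaf, so there is nothing to triage as a possible false positive.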

Ashley: Interesting. Do you do this in a production environment, in a test environment, in both? How do you approach this?

Brumley: For a product, we always do it in testing. We think it’s important to have these techniques out there in testing so that you can find them before attackers. We have done some simulations in production. One of the things we participated in was, DARPA had a big autonomous Cyber Grand Challenge. So, we fielded it there, but we think the market would be in the testing environment.

Ashley: Mm-hmm. It would definitely make sense—makes more sense there. Say a little bit more about fuzzing and how that exactly works. Is it the software that’s doing the variations, or are you introducing variations to that from what you learn?

Brumley: We’re introducing the variations, so the idea is, an exploit for a program is just an input.

Ashley: Mm-hmm.

Brumley: And, you know, test cases are just inputs. So, you run the program on an input and you watch how it executes. Like, you see it executes this system call, that system call. It covers these branches. And you use that to learn a new behavior or guess a new input. And then the fuzzer will use a heuristic to pick that new input, with the idea it should trigger a new behavior.
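
The run, observe, and guess-a-new-input loop described here can be sketched as a small coverage-guided fuzzer in Python. This is a minimal sketch under stated assumptions, not ForAllSecure's code: the `target` function, its `hit` callback for reporting branches, and the mutation heuristic are all invented for illustration. Real fuzzers instrument compiled binaries rather than asking the program to report its own branches.

```python
import random


def target(data: bytes, hit) -> None:
    """Hypothetical program under test; hit(n) records that branch n ran."""
    hit(0)
    if len(data) > 0 and data[0] == ord("B"):
        hit(1)
        if len(data) > 1 and data[1] == ord("U"):
            hit(2)
            if len(data) > 2 and data[2] == ord("G"):
                hit(3)
                raise RuntimeError("simulated crash")


def run(data: bytes):
    """Execute the target once and return (branches covered, crashed?)."""
    covered = set()
    try:
        target(data, covered.add)
        return frozenset(covered), False
    except RuntimeError:
        return frozenset(covered), True


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Heuristic mutation: overwrite one random byte or append one."""
    buf = bytearray(data)
    if buf and rng.random() < 0.7:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    else:
        buf.append(rng.randrange(256))
    return bytes(buf)


def fuzz(seed: bytes = b"A", max_runs: int = 200_000, rng_seed: int = 0):
    """Keep inputs that reach new branches; return (crashing input, corpus)."""
    rng = random.Random(rng_seed)
    corpus = [seed]
    seen = set(run(seed)[0])
    for _ in range(max_runs):
        candidate = mutate(rng.choice(corpus), rng)
        covered, crashed = run(candidate)
        if crashed:
            return candidate, corpus
        if not covered <= seen:  # new behavior observed: keep this input
            seen |= covered
            corpus.append(candidate)
    return None, corpus


if __name__ == "__main__":
    crash, corpus = fuzz()
    print("crashing input:", crash)
    print("inputs kept as a test suite:", corpus)
```

The feedback step is what matters: blind random bytes would need to guess all three magic bytes at once, while keeping each input that reaches a new branch lets the fuzzer find them one at a time. The saved corpus is also the by-product test suite mentioned earlier in the conversation.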

Ashley: Step back just for a moment, you’ve talked a little bit about Mayhem and the technology behind it. You’ve stepped out of CMU to set up this company, ForAllSecure. How did you get that started? Are you working with a particular private sector, government sector? Sounds like something that might be interesting to the government side of things, too.

Brumley: Yeah, we got started by really participating in this, as I said, DARPA project. So, they have Grand Challenges every so often. One was the self-driving car contest. This was, like, a self-driving car for a computer security contest. We won that and that gave us our first $2 million, so that was, like, seed funding from the government.

Ashley: Excellent! Wow, that’s awesome.

Brumley: Yeah. And so, we’ve been working to bring it into the government so it’s used in pretty much every service and in the IC right now to try to look for vulnerabilities in weapons platforms as well as in software the government might be using to try to fix it before attackers can break into it.

Ashley: Mm-hmm.

Brumley: About, oh, six months ago, we then went and raised money from NEA to help transition this from just a government tool to something in the enterprise section as well.

Ashley: Mm-hmm, so you’re also beginning to work with the private sector, then?

Brumley: Yeah, we’re working—starting to work with the private sector. It’s really kind of an interesting difference between the two.

Ashley: Oh, there’s a lot of difference. [Laughter] Yes. Highlight what you see the difference as. I have some experience with both, also.

Brumley: Well, I think that the private sector, it’s harder for them to put a value on finding a vulnerability and fixing it quickly while, in the DoD, it’s really easy. That mission didn’t succeed, which is a big deal.

Ashley: Mm-hmm.

Brumley: So, I think that’s one difference. I think the second difference as we see going to market is, in the DoD, they care a lot about checking legacy systems, because they still have to maintain them. And if you think about it, lots of vehicles, airplanes, trains—things like that. And in an enterprise, they care about things that were developed in the last two or three years only. And so, they’re just kind of products in different parts of the life cycle.

Ashley: Yeah, interesting. You also have many layers of different parties involved in the government sector—contractors, subcontractors, primes, et cetera. So, you’re dealing with lots of layers of companies that are working as part of the government team.

Brumley: Yeah, working with the government is definitely, it’s a chore until you get into the groove and you realize a lot of the, what you perceive are barriers, are actually just regulations put up to protect taxpayers from complete abuse. So, it takes a long time, it takes about nine months to get anything really going from start to scratch in the government. But once you kind of get that cycle, it goes forward pretty quick.

Ashley: Yeah, that was my experience, too. It’s an investment, but once you get going, it’s great.

Brumley: Yeah.

Ashley: So, talk a little bit about where you are in the product life cycle? Do you have product in the market? Is Mayhem publicly available? Are you alpha, beta? Give us an idea of that.

Brumley: Yeah. So, within the DoD space, we have Mayhem that you can buy starting in July this year—so, pretty close. So, we have a number of beta installs, people are happy, and we’re gonna be switching over to general availability pretty quick.

Ashley: Mm-hmm.

Brumley: In the commercial market, we’re taking it a little bit slower, because some of these differences I’ve discussed. They use different application stacks, they’re often concerned with much newer software than old software, they often use different languages.

And so, that, we’re looking for design partners. People who wanna take Mayhem, what’s working in the DoD, what’s working in places like aerospace, and see if it works for them and figure out what we need to do to make it a really good fit.

Ashley: Mm-hmm. So, you’re spending a lot of time with customers and potential customers really learning what the commercial sector is looking for or can benefit from.

Brumley: We’re learning what they’re looking for. We’re also trying to understand, like you said, when you go to market, it’s kind of interesting in the DoD who the buyer and who the user are and trying to understand that in the commercial space. Like, it’s kind of fascinating to me in DevSecOps, it’s almost always the security team that’s the buyer but it’s the development team that’s the user.

Ashley: Mm-hmm. Yeah, exactly. And that’s an interesting dynamic there, it’s, you know, the DevSecOps conversation, one that we helped facilitate bringing together through a lot of the activities at

I’d love to hear a little bit about, so you said you’re not commercially available yet in the commercial or private sector, but you are entertaining companies to work with? Is that true?

Brumley: That is. So, we’ve worked with a large aerospace vendor that makes airplanes. Can’t say much about that. We’ve been able to really help them find some flaws in some software that they use as well as build up test suites for them so that when the developer does push a fix, that can be more rigorously tested. We’re also working with a Fortune 100 company, and then some IT infrastructure people.

Ashley: Mm-hmm. I could see, for example, data center providers being another place that it’s gonna really benefit from.

Brumley: Oh, absolutely. Anyone who has something that’s extremely high value like a web server that, if it gets compromised, your entire business is at stake.

Ashley: I’d love to hear your thoughts about how you fit this into the DevOps or the DevSecOps pattern or cycle. How does this happen? Where do you build it into?

Brumley: Yeah, so, from what we can see in research right now in a lot of DevSecOps, it’s really kind of a primitive stage where people want to run a scan before they field software. To me, this is insane. I don’t know any attacker who runs a commercial product and a scan and then says, you know, “Here’s the new zero day” or, “Oh, better not go after that, the scan was okay.”

Instead, what they do is, they’re always trying to break into that software. They’re always trying to learn from what hasn’t succeeded and come up with attacks and succeed. So, Mayhem is like that. So, the idea is, it’s more of asynchronous testing. You’re gonna push as part of your DevOps cycle through, you know, things like making sure you’re not using old versions of libraries. But then, you’re gonna launch a process that, for the lifetime of that product, tries to hack it in the background.

I think that’s really kind of the conceptual shift that we think needs to be made is to stop thinking of security as a scan you do once, and actually expect it to work, to something that’s always going on in the background.

Ashley: So it’s a shift. Just like we do with testing, where we’ve moved into continuous improvement and continuous testing, you do the same for DevSecOps, where this is running, maybe on multiple versions of the product and test environment, all the time.

Brumley: Absolutely, as well as all the dependencies. If you look at Google—so, Google runs fuzzing, one of the technologies that I talked about earlier, on Google Chrome. And they found 12,000 new vulnerabilities in the last three years, and each one was accompanied by a test case, so zero false positives. And one of the things this has allowed them to do is get ahead of attackers, because they’re finding those flaws so quickly with their automated, always on infrastructure that I’ve talked to people who try to participate in bug bounties. And by the time they report it, it’s already fixed.

Ashley: Mm-hmm. I mean, aren’t the bad guys also doing fuzzing themselves, so you’re at a disadvantage if you’re not?

Brumley: Absolutely. Every top notch hacker I know uses fuzzing extensively as one of their techniques. I mean, this is assuming you kind of do the base, like, did the person forget to set a password? Always check for that first, right? But after that, after that base level, you’re gonna do fuzzing, especially after high value targets.

Ashley: Talk a little bit about where you see things going next. You’re kinda testing with the commercial market, you’re heavily engaged with the federal market. What happens next in terms of the product roadmap?

Brumley: Well, the next question that we have in terms of the product roadmap is, how do we close that life cycle from finding a vulnerability to having a fix fielded. So, some AppSec companies, what they wanna do is always add on that next language support so that they can support a larger and larger set of languages.

Ashley: Mm-hmm.

Brumley: For us, we’re really looking at C, C++, Java, Python—the core languages used by infrastructure Go and Rust. And we want to be able to do deep analysis in these languages and then be able to help automatically suggest fixes and actually, for DARPA, we proved that the computers could automatically patch. So, it could automatically patch binary programs, and it could assess whether that patch would have any business impact, and field it, if not. And so, that’s really what we’re trying to do, and we’re going language by language as opposed to trying to cover all the languages at once.

Ashley: Interesting. What’s your top pick in terms of languages you’re looking at first?

Brumley: So, the top one that we started with is C and C++. And that’s been a bit surprising to people, because it’s an older language. I can tell you why we picked it.

Ashley: Okay.

Brumley: The reason that we picked it is, regardless of what language you choose, you’re gonna be calling out to C and C++ code. In Java, you have JNI calls. In Python, you have, you know, as part of TensorFlow, if you’re doing machine learning, everyone goes to C/C++ or a compiled language when they want performance.

And so, if you wanna analyze these applications completely, even if it’s Java or Python, you have to be able to analyze those components as well. So, we’re kind of working our way up.

Ashley: Yeah, I was gonna say, it sounds like you’re working way up the stack into the scripting languages.

Brumley: Absolutely. The second reason is, when you look at really critical infrastructure out there, it’s still primarily developed in C and C++. So, you know, I, like many people, have probably put up a Flask or Python website, it’s really quick to do, it’s online—you definitely don’t want it hacked. But when you start talking about a car or an airplane or a power plant—these are also on C/C++. And so, just from a safety point of view, you have to cover those.

Ashley: Interesting. Do you foresee, looking kinda down the road—I’m not asking you to announce anything officially, but do you see that you might be working with either the patch management companies, the vulnerability scanning companies as a complement to them? Would this be something that potentially displaces them? How do you see that future unfolding?

Brumley: Yeah, I think people who detect known vulnerabilities are complementary to us. So, the way I think of it is, there’s really two types of security companies out there. There are those who find new vulnerabilities, and there’s those who check for old vulnerabilities.

So, you have, for example, Tenable, which is a network scanner that goes and looks for known vulnerabilities. You have software composition analysis, which looks for known vulnerable versions of libraries and other things, right? So, those are pattern matchers in some sense.

We’re not one of those companies. That’s someone that you would partner with. What we’re doing is, we’re finding new vulnerabilities. And so, this is more like a typical SAST or DAST type solution.

The other sort of people that we’re looking to partner with are the patch management companies. So, one of the cool things that we can do is, when we find a vulnerability, we automatically are building a test suite as well, just as part of the process. So, when there’s a patch we can replay it, and we can tell you things like, “Hey, have any of the previously passing test cases stopped working? Has performance dropped?” as well as, of course, whether the bug’s fixed.
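
The three replay questions (did passing cases break, did performance change, is the bug fixed) can be sketched in Python as follows. Everything here is a hypothetical stand-in: `parse_v1` plays the vulnerable build, `parse_v2` the patched build, and `PASSING` and `CRASHING` play the test suite a fuzzing campaign would have accumulated. It illustrates the idea and makes no claim about how Mayhem does it.

```python
import time


def parse_v1(data: bytes) -> int:
    """Hypothetical vulnerable build: trusts the leading length byte."""
    length = data[0]  # raises IndexError on empty input
    payload = data[1:1 + length]
    if len(payload) != length:
        raise IndexError("simulated out-of-bounds read")
    return sum(payload)


def parse_v2(data: bytes) -> int:
    """Hypothetical patched build: rejects inconsistent lengths with -1."""
    if not data or data[0] > len(data) - 1:
        return -1
    return sum(data[1:1 + data[0]])


def outcome(func, data):
    """Run one input and normalize the result to ('ok', value) or ('crash', name)."""
    try:
        return ("ok", func(data))
    except Exception as exc:
        return ("crash", type(exc).__name__)


def timed(func, inputs):
    """Replay all inputs and return (list of outcomes, elapsed seconds)."""
    start = time.perf_counter()
    results = [outcome(func, d) for d in inputs]
    return results, time.perf_counter() - start


def replay(old, new, passing, crashing):
    """Compare a patched build against the old one on the saved test suite."""
    old_results, old_secs = timed(old, passing)
    new_results, new_secs = timed(new, passing)
    return {
        # Previously passing inputs whose behavior changed under the patch.
        "regressions": [d for d, o, n in zip(passing, old_results, new_results) if o != n],
        # Known crashing inputs that still crash: the bug is not fixed.
        "still_crashing": [d for d in crashing if outcome(new, d)[0] == "crash"],
        # Raw timings; deciding what counts as a slowdown is left to the caller.
        "old_seconds": old_secs,
        "new_seconds": new_secs,
    }


PASSING = [b"\x02\x01\x02", b"\x00", b"\x01\x05"]
CRASHING = [b"\x05\x01", b""]

if __name__ == "__main__":
    print(replay(parse_v1, parse_v2, PASSING, CRASHING))
```

An empty `regressions` list and an empty `still_crashing` list together mean the patch fixed the bug without changing behavior on inputs that used to work.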

Ashley: I definitely can see the partnership alignment there. If you had to paint a future of what success looks like for, both for you and ForAllSecure, what would that be?

Brumley: I think we’re really motivated by this vision that we wanna automatically protect and check the world’s software. And so, we wanna close that cycle and make it autonomous. And we’re making, actually, pretty careful design decisions as we do that.

So, I think a lot of people talk about, you know, big startups, billion dollar businesses—that’s not really our goal. Our goal is, we want to make it so the time from when we detect a vulnerability to there’s a patch that’s tested and ready to field is seconds. That, to us, is success.

Ashley: The billions will come later, right? [Laughter] That’s if you’re passionate about that core problem.

Brumley: I think so. Because this problem, we were talking about DevSecOps. Like, DevSecOps, the problems facing DevSecOps face every system administrator if we go back in the world. In Silicon Valley, we think of SREs and DevSecOps, but there’s this huge nation full of just system administrators who wanna know, “If I update it, is it gonna break? Do I need to update it?” And solving those problems for those people is what’s important for us.

Ashley: Do you have any demonstrations coming up, either online or you’re gonna be at some conferences? Is there a way for folks to see this yet?

Brumley: Yeah, so, we’re gonna be at Black Hat and we’re happy to demo it there. We’ll have a booth and a room. We also have a number of online videos. If you want to see the fully autonomous, in its splendor system, there’s definitely videos of the DARPA Cyber Grand Challenge as well.

Ashley: Going back to the original challenge.

Brumley: That was the original vision. I mean, when I talk to people about this technology, what really lights up their eyes is not finding security bugs, it’s that there’s not enough people to do the work. And by being autonomous and focusing on that value proposition, you’re really focusing on the pain point, which is, how do I do things that there’s just not enough highly skilled people to do.

Ashley: Well, I applaud you for what you’re doing. You know, there’s yet another scanner, yet another anti-virus something. It’s great to see folks that are really doing some innovation, kinda thinking about the problem differently and really trying to solve it in a more systemic way, and that’s one of the things that I see that you’re doing. It isn’t just, we’re trying to test for these kind of behaviors, but it’s also in this ecosystem of how patches happen and things can happen dynamically, automatically.

Brumley: Absolutely. I mean, we’ve had a number of customers who want us to support 10 different languages, but there’s no product in the world that can add 10 languages to support and also be really good at each one of them. For us, our main value proposition is, for everything we do, can we have zero false positives, can we make sure it’s actionable before we report it at all?

Ashley: Awesome! Well, we’re—[Laughter] the time always flies by on these podcasts. Is there any last parting thoughts that you wanna share with us before we wrap up?

Brumley: Well, thank you for having me and, as I said, we’d love to talk to you at Black Hat 2019.

Ashley: Great, great. Well, I’ll be there as well, so we’ll get to meet up there. We’d love to have you on a future podcast as things develop and hear more about how things are moving along for you and ForAllSecure.

Brumley: Thank you very much. Have a great day.

Ashley: Well, another DevOps Chat podcast has flown by, as they always do. I’d like to thank David Brumley, CEO, at ForAllSecure for joining us today. Thank you, David.

Brumley: Thank you.

Ashley: And I’d like to thank you—you, our listeners—for joining us, of course. This is Mitch Ashley with You’ve listened to another DevOps Chat. Thank you for joining us today.

Originally published on DevOps
