Why I'm Not Sold On Machine Learning In Autonomous Security: Some Hard Realities On The Limitations Of Machine Learning In Autonomous NetSec

David Brumley

October 2, 2019

Tell me if you’ve heard this: there is a new advanced network intrusion device that uses modern, super-smart Machine Learning (ML) to root out known and unknown intrusions. The IDS device is so smart, it learns what’s normal on your network and immediately informs you when it sees an anomaly. Or maybe it’s an intrusion prevention system (IPS) that will then block all malicious traffic. This AI-enabled solution boasts 99% accuracy detecting attacks. Even more, it can detect previously unknown attacks. Exciting, right?

That’s an amazing sales pitch, but can we do it? I’m not sold yet. Here are the two big reasons why:

The above pitch confused detecting an attack with detecting an intrusion. An attack may not be successful; an intrusion is. Suppose you detected five new attacks, but only one was a real intrusion. Wouldn’t you want to focus on the one successful intrusion, not the four failed attacks?
ML-enabled security may not be robust, meaning that it works well on one data set (more often than not, the vendor’s), but not on another (your real network). In a nutshell, an attacker’s job is to evade detection, and ML research has shown it’s often not hard to evade detection.

Put simply, ML algorithms are not generally designed to defeat an active adversary. Indeed, adversarial machine learning is still in its infancy as an academic research area, let alone as a technology in real products. Make no mistake -- there is amazing research and there are amazing researchers, but I don’t think it’s ready for full autonomy.

The vision of autonomous security is that machines detect, react, and defend. The two reasons I listed above add up to this: we can’t have full autonomy yet. In a nutshell:

A fully autonomous system that goes around and chases attacks (the vast majority of which are unsuccessful) is useless; it’s about preventing real intrusions.
A system that isn’t robust is easy to bypass. As soon as it’s deployed, attackers will learn what works and what doesn’t and still get in.

Let’s dig into netsec and ML, specifically using intrusion detection and prevention as our main setting.

Detection rate != real intrusions

One key reason I’m still not sold on ML comes down to this: vendors often confuse detection rates (usually quoted as accuracy) with the rate of real intrusions. In IDS, I assume users want to identify real intrusions. They don’t want to study the attacker. Instead, they want to detect when the attacker actually succeeds. Unfortunately, these wants don’t match reality.

In the first paragraph of this blog post I tried to fool you. Did you catch it?

I switched detecting attacks with detecting intrusions. These are different. Attacks are a possible indication of an intrusion, but they may or may not be successful. An intrusion is someone having success.

Consider a port sweep (e.g., with nmap), script kiddies trying default passwords and a targeted exploit against your system. Nmap sweeps don’t result in compromise, script kiddie scripts are likely to be unsuccessful, but the targeted attack -- that’s something quite different. It’s likely to be successful. You want to target your limited resources, manpower, and attention span on real intrusions.

The key concept to understand is the base rate fallacy. It’s not so much a fallacy as a difficulty humans have in intuitively understanding statistics. (For the mathematically minded, the key paper I cover in my coursework at Carnegie Mellon University is Stefan Axelsson’s paper. If you want to annoy a sales rep, ask him if he’s read it.) We see this every day, when gamblers think they are on a hot streak. Every so often they win, so they think their system is a “winner”. The same principle can apply to security -- if every so often your ML algorithm is right, your brain can trick you into thinking it’s better than it really is.

The typical example of the base rate fallacy goes like this: you’ve gone to the doctor and he’s run a test that is 99% accurate for a disease. The bad news: the test said you had the disease. The good news: the disease itself is really rare. Indeed, only 1 in 500 people get the disease. What is the probability you really have the disease?

This is the fallacy: most people would assume since the test is 99% accurate that they are very likely to have the disease. However, math states otherwise. In fact, the actual probability you have the disease is only about 20%. How can that be?

Think of it like this: a 99% accurate test (here) means that roughly 1 out of every 100 people tested will turn up positive even without the disease. However, we know only 1 in 500 people actually have the disease. For 500 tests, about 5 people are flagged, but only 1 is a true positive… or only about 20%. (If you are a mathematician, you can calculate the exact probability with Bayes’ theorem. We use approximations because they’re easier to grapple with than conditional probabilities.)
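For readers who want the exact figure rather than the approximation, here is a minimal sketch of the Bayes’ theorem calculation. It assumes “99% accurate” means both a 99% true positive rate and a 1% false positive rate, which the example does not spell out; the function and parameter names are illustrative only.

```python
def posterior(base_rate, true_positive_rate, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem."""
    true_pos = true_positive_rate * base_rate
    false_pos = false_positive_rate * (1.0 - base_rate)
    return true_pos / (true_pos + false_pos)


# Assumption: "99% accurate" = 99% true positive rate and 1% false positive rate.
p_disease = posterior(
    base_rate=1 / 500,
    true_positive_rate=0.99,
    false_positive_rate=0.01,
)
print(f"P(disease | positive test) = {p_disease:.1%}")
```

Under these assumptions the exact answer comes out a little below the back-of-the-envelope figure, at roughly 17%, because the one true case is counted alongside the roughly five false alarms in every 500 tests.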

Things get worse as the numbers get larger. For example, suppose the rate of real intrusions is only 1 in 1,000,000 events. With a 99% accurate detection rate you’ll have approximately 9,999 false positives for every true alarm. Who has time to chase that many false positives?

How does this apply to IDS and any algorithm, machine learning or not? If the actual rate of successful intrusions is low then even a 99% accurate IDS (as in the sales literature) will create a huge number of false positives. One rule of thumb used by SOCs [credits: Michael Collins at USC] is that an analyst can handle about 10 events in an hour. If almost everything is just an attack (a false positive) and not a real intrusion, you’ll at best inundate them with false positives or at worst teach them to ignore the IDS.
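To put rough numbers on this, the sketch below estimates alert volume and analyst workload. The 1-in-1,000,000 base rate and the 10-events-per-hour rule of thumb come from the text; the daily event volume, the reading of “99% accurate” as a 99% true positive rate with a 1% false positive rate, and all names are assumptions made for illustration.

```python
def alert_load(events, base_rate, true_positive_rate, false_positive_rate,
               events_per_analyst_hour=10):
    """Expected true alerts, false alerts, and analyst hours to triage them all."""
    intrusions = events * base_rate
    true_alerts = intrusions * true_positive_rate
    false_alerts = (events - intrusions) * false_positive_rate
    analyst_hours = (true_alerts + false_alerts) / events_per_analyst_hour
    return true_alerts, false_alerts, analyst_hours


# Hypothetical network: 1,000,000 events per day, 1 real intrusion per million events.
true_alerts, false_alerts, hours = alert_load(
    events=1_000_000,
    base_rate=1e-6,
    true_positive_rate=0.99,
    false_positive_rate=0.01,
)
print(f"true alerts per day:   {true_alerts:.2f}")
print(f"false alerts per day:  {false_alerts:,.0f}")
print(f"analyst hours per day: {hours:,.0f}")
```

With these made-up inputs, about 10,000 false alerts a day would need on the order of 1,000 analyst hours to triage, all to find less than one real intrusion.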

Indeed, the math shows that almost any false positive rate is likely too high.

Machine learning isn’t robust

What I mean by robustness is “does it still work once it’s expected by the attacker”. Let’s use an analogy where you are the attacker. Pretend, for a moment, you want to get into the illegal import business. You go through a security checkpoint and bam, you’re caught! So far so good; law enforcement (the IDS in this analogy) is winning. They won because you had no idea what they really were checking for and you likely triggered something.

As a determined criminal, what would you do? The natural thing is to figure out the checkpoints and the checkpoint rules and then evade them, right? In other words, as soon as you recognize the defense, you’ll figure out how to evade it. The same is true for attackers.

An IDS is robust if it continues to detect even when the attacker changes their methods. That’s the key problem: right now, ML seems attractive because attackers are not trying to fool it. As soon as you’ve deployed it, attackers will notice they are not getting through and try to evade it.

A theorem applies here: it’s called (literally) the No Free Lunch Theorem (NFL). If you are mathematically minded, the original paper is worth reading, and easier-to-read summaries are available. The “No Free Lunch” theorem states that there is no one model that works best for every problem. In our setting of security, the implication is that for any ML algorithm, an adversary can likely craft a scenario that violates the algorithm’s assumptions, so the algorithm performs poorly.

Current research in ML security has had a hard time showing it’s robust. For example:

Carrie Gates and Carol Taylor challenge ML-based IDS on many fronts, citing robustness (as well as quality of training data) as key gaps.
Malware authors routinely use VirusTotal (a system that runs many commercial anti-malware solutions) to modify their malware to evade detection.
In 2016, researchers at the University of Virginia showed they could evade state-of-the-art machine learning algorithms for detecting malware in PDFs. You can check out the results at https://evademl.org/. Here is the kicker: they showed they could automatically find evasive variants for ML classifiers for all samples in their study. A further study showed that you can trick Gmail’s malware classifier 47% of the time with two simple mutations. Google is not alone; other ML-based AV engines are also reported to be fooled.
Researchers at Carnegie Mellon University, led by Lujo Bauer, have shown that it is possible to 3D print a pair of eyeglasses that fools state-of-the-art face-recognition algorithms into identifying the adversary as (a specific) someone else. For example, an attacker could 3D print a pair of glasses to evade face recognition in an airport. While the domain may seem separate, it’s the same theory -- ML isn’t robust against evasion.
In intrusion detection, luminaries Vern Paxson (architect of the Bro network IDS) and Robin Sommer published a great overview of the challenges of ML and anomaly detection in intrusion prevention.

The lesson: attackers can learn about your defenses and if the defense isn’t robust, they’ll be able to evade it.
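As a toy illustration of that lesson (not any real product’s algorithm), consider an anomaly detector that learns a “normal” request rate and flags anything more than three standard deviations above the mean. The traffic numbers, the threshold rule, and all names below are made up for this sketch.

```python
import statistics


def learn_threshold(normal_rates, k=3.0):
    """Learn a cutoff of mean + k standard deviations from 'normal' traffic."""
    return statistics.mean(normal_rates) + k * statistics.stdev(normal_rates)


def is_anomalous(rate, threshold):
    """Flag any rate strictly above the learned cutoff."""
    return rate > threshold


# Hypothetical training data: requests per minute from one host on a quiet network.
normal_rates = [18, 22, 20, 19, 21, 23, 17, 20, 22, 18]
threshold = learn_threshold(normal_rates)

naive_scan = 500                 # attacker sweeps as fast as possible
throttled_scan = int(threshold)  # attacker who has learned the cutoff slows down

print(is_anomalous(naive_scan, threshold))      # the noisy scan is flagged
print(is_anomalous(throttled_scan, threshold))  # the throttled scan is not
```

The same scan still gets through, just more slowly. Real ML detectors are far more sophisticated than this, but the studies listed above suggest a similar dynamic once attackers learn what the detector keys on.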

Why I’m not sold

Overall, I’m not sold. In this blog post, we covered two reasons:

AI and Machine Learning IDS products often don’t have a firm grasp on how easy they are to evade. While deploying one today may get you great detection, it may be very short-lived.
The iron-clad rules of math show that almost any rate of false positives -- even if it looks really low on paper -- is likely too high if real intrusions are relatively (with respect to attacks) rare.

This is just the start. There are other technical problems we didn’t get into, such as whether the data you used for training is representative of your network. Companies like Google and AT&T have immense data and still have challenges with this. There are also organizational issues, such as whether your SLA is calibrated so your people don’t have to manage the unknown. Mature security operations centers often first figure out how much they can handle, then tune the detector down appropriately.
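One simple way to picture “tuning the detector down” is to choose the alert threshold from the analysts’ budget rather than the other way around. The sketch below is an assumption about how such tuning might look, not a description of any particular SOC’s process; the scores, the budget, and the function name are hypothetical.

```python
def threshold_for_budget(scores, alert_budget):
    """Return the lowest score that should still raise an alert so that at most
    `alert_budget` of the given events alert (assuming no tied scores)."""
    if alert_budget <= 0 or not scores:
        return float("inf")      # no capacity or no data: nothing alerts
    ranked = sorted(scores, reverse=True)
    if alert_budget >= len(ranked):
        return ranked[-1]        # capacity to look at everything
    return ranked[alert_budget - 1]


# Hypothetical anomaly scores for one hour of events (higher = more suspicious).
scores = [0.12, 0.97, 0.45, 0.88, 0.30, 0.91, 0.05, 0.76, 0.60, 0.99]
budget = 3  # hypothetical: the analyst can review 3 alerts this hour

threshold = threshold_for_budget(scores, budget)
alerts = [s for s in scores if s >= threshold]
print(threshold, alerts)
```

The trade-off is explicit: anything scoring below the cutoff goes unexamined, which is the risk you accept in exchange for a workload your team can sustain.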

My recommendations

First, think about whether you’re looking for a short-term fix or a long-term fix requiring robustness against evasion. If you only want to stop internet chaff, new machine learning products may help. But there isn’t a scientific consensus that they won’t be easy to evade as attackers learn about them.

Second, think hard about what you want to detect: do you want to study the attacker or are you in charge of responding to real problems? For example, the base rate fallacy teaches us that if your organization has relatively few intrusions per attack (ask your team if you do not know!), the iron-clad rules of math put hard limits on any approach -- ML or not -- and those limits may not be on your side.

Where can ML really help? The jury is still out, but the general principle is ML is a statistic, and will better apply where you are trying to marginally boost your statistical accuracy. I use the word “statistic” on purpose: you have to accept there is risk. For example, Google has had tremendous success on boosting ad click rates with machine learning because a 5% improvement means millions (billions?) more in revenue, but is a 5% boost enough for you?

A second place ML can help is getting rid of unsophisticated attackers, for example, that script kiddie using a well-known exploit. In such a setting we’ve removed the need to be robust since we’ve defined away someone really trying to fool the algorithm.

Finally, I want to reiterate the amazing research being done in ML today and the researchers studying it. We need more. My opinion is we are not “there” yet, at least not in the way an average user would expect.

In the intro, I said I have more confidence that we can make parts of application security fully autonomous. There are two reasons:

Application Security Testing techniques, such as fuzzing, have zero false positives.
Attackers don’t have control over which apps you deploy, so the idea of evasion doesn’t even come into play.

But do I think there is a chance ML-powered IDS will become a fully autonomous network defense? Not today.

Originally published at CSOOnline
