The Hacker Mind Podcast: Security Chaos Engineering with Kelly Shortridge

August 9, 2023

Speaking at Black Hat 2023, Kelly Shortridge is bringing cybersecurity out of the dark ages by infusing security by design to create secure patterns and practices. It's the subject of her new book on Security Chaos Engineering, and it's a topic that's long overdue for discussion in the field.

The Hacker Mind is available on all podcast platforms.

[Heads Up: This transcription was autogenerated, so there may be errors.]

VAMOSI: When I was at ZDNet, I once interviewed someone who compared how we handle crashes in airplanes and cars vs how we handle crashes in computer systems. I often think about that. 

For the airline industry, convincing people to get inside a metal tube and fly through the air demands trust. When there’s a crash, teams of investigators swarm the crash site and spend hours reconstructing everything. Sometimes they even ground similar aircraft until they can diagnose the problem. Then the fix is rolled out to all the airplanes. That’s why flying is one of the safest modes of transportation today.

Contrast that with computers. Given how much our digital lifestyles depend on them, from our banking to our healthcare, something like that should have happened with computers. Should have, but didn't. Each blue screen of death, each time a screen froze, should have been an opportunity for the vendors and others to spend hours diagnosing the problem before moving on. It didn't happen that way. And today we still get page-fault errors and so forth. There's no reason why that should be, other than that the vendors just want to keep moving on. And that's where we are today, still dealing with computing problems from the 1960s.

Some of this comes from the paradigms we use in computer science. We need new paradigms. We need a more holistic way of thinking about the issues.

My next guest is challenging the existing models and she’s bringing some much needed perspective to the problem. I hope you’ll stick around.


Welcome to The Hacker Mind, an original podcast from the makers of Mayhem Security. It's about challenging our expectations about people who hack for a living. I'm Robert Vamosi, and in this episode I'm discussing secure by design and software resilience.


Security Chaos Engineering

VAMOSI: If you've never heard of Kelly Shortridge, then you're in for a real treat.

She’s the Senior Principal Engineer in the office of the CTO at Fastly. Previously she founded a startup that was sold to CrowdStrike. Now she advises Fortune 500s, investors, startups, and federal agencies and has spoken at major technology conferences internationally. Kelly is giving a talk at Black Hat USA 2023. It’s called Fast, Ever-Evolving Defenders: The Resilience Revolution and it maps to a recent book from O'Reilly that she wrote along with Aaron Rinehart.

SHORTRIDGE:  Yes, the title is Security Chaos Engineering: Sustaining Resilience in Software and Systems. I was motivated to write it because what I see in the industry is kind of spinning its wheels, for decades in some cases, so I joke that I'm bringing cybersecurity out of the dark ages with it. And in particular, a big thing I focused on was software quality: all of the opportunities and practices that we have at our disposal to start infusing security by design, to create repeatable, secure patterns and practices. What are all those things that we can adopt, rather than just continuing to bolt ever more security solutions on top of things?

VAMOSI:  Let’s define the phrase security chaos engineering in this context.

SHORTRIDGE:  Yes, so security chaos engineering is defined as the organizational ability to respond to failure gracefully and adapt to evolving conditions. The practice within it is chaos experimentation, which is basically akin to resilience stress testing in every other domain; that's my preferred term for it. But in the context of software, chaos experimentation is basically simulating adverse scenarios to see how the system behaves end to end. So this isn't just load testing a component, certainly not something like unit testing, and it's not a penetration test. This is very much: how do you simulate these adverse conditions to see how both the socio part of the system, the humans in it, and the technical part, the machines and services, behave? So that includes, maybe we do want to understand whether an alert was generated, but we also want to understand: could the human actually take action based on the alert? Did they understand that the alert meant something bad? Really understanding the totality of the context of the system when we think about failure.

VAMOSI:  So is it kind of like fuzz testing on a macro level? Random input across the software engineering process?

SHORTRIDGE:  In some ways, yes. It's not that we don't want to introduce random input like fuzzing, but it's not quite as stochastic. What we usually do is the scientific method in action. So we develop a hypothesis. A classic one that your listeners will probably enjoy was from Aaron Rinehart, who did a fantastic job cultivating all the case studies that are in Chapter Nine of the book. He created a security chaos experiment, the first one actually, at UnitedHealth Group. His hypothesis was that, like most people, you would assume your firewall will detect, block, and alert on the event that a user accidentally or maliciously introduces a misconfigured port: very standard firewall stuff. So that's a hypothesis resting on what I like to call one of those "this will always be true" assumptions. We just take it for granted: of course the firewall will do this. But when he conducted the experiment and simulated that adverse event, he actually found that the firewall only worked 60% of the time, which is very contrary to expectations. So we aren't really introducing something at random. We're trying to poke at those assumptions we take for granted, the things we always think are true, because those are the ones that attackers love to take advantage of. They love when we've overlooked something like that.
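The experiment described above can be sketched as a simple hypothesis-testing loop. This is a minimal illustration, not the actual tooling used at UnitedHealth Group: the injection function and its detection probability are hypothetical stand-ins for a real fault injection (say, applying a bad firewall rule in a staging environment and watching for alerts).

```python
import random

def inject_misconfigured_port(run: int) -> dict:
    """Simulate the adverse condition: a user opens an unintended port.

    Hypothetical stand-in for a real injection step. Here we model a
    firewall that only catches the misconfiguration some of the time,
    mirroring the 60% figure from the interview (purely illustrative).
    """
    detected = random.random() < 0.6
    return {"run": run, "detected": detected}

def run_experiment(trials: int = 100) -> float:
    """Test the hypothesis 'the firewall always detects, blocks, and
    alerts on a misconfigured port' by repeating the injection and
    measuring how often detection actually happened."""
    results = [inject_misconfigured_port(i) for i in range(trials)]
    return sum(r["detected"] for r in results) / trials

# If the hypothesis held, this would print 100%.
print(f"Observed detection rate: {run_experiment():.0%}")
```

The point is the shape of the loop: state the assumption, inject the condition, measure, and compare the observed behavior against the "this will always be true" expectation.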

VAMOSI:  Actually the subtitle does sound more relevant, so let's start by unpacking the phrase sustaining resilience in software and systems.

SHORTRIDGE:  Yes. So I draw on the huge, huge wealth of literature and scholarship from other domains in the context of resilience, and resilience means the ability to prepare and plan for, absorb, and recover from failure gracefully. To me the underlying principle is adaptation: when we face evolving conditions, especially adverse scenarios that have come into existence, we are able to rise to the occasion and change ourselves. You know, there's something very poetic about it; it's kind of the basis of life, right, changing to stay the same. So I view it as almost a very poetic notion for cybersecurity: we have to embrace change and speed and evolution in order to succeed.

VAMOSI:  So is chaos engineering related to threat modeling and risk assessment?

SHORTRIDGE:  Oh, absolutely. I think it's very much a core part of threat modeling. In fact, I would love to upgrade the discipline of threat modeling into a two-part process where you create decision trees, which I've used very much for articulating a hypothesis about how your system would respond to attacker behavior, and then use those decision trees to formulate experiments where you're basically testing the hypothesis, because you want to validate or refute it. Then you use that to refine the system, and then you come up with new hypotheses. It's this beautiful, beautiful feedback loop that really helps the companies that have adopted it to continuously refine their systems and make them much more resilient to attack.
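A decision tree in this sense is just a tree of attacker actions, where every root-to-leaf path becomes a hypothesis you can test with an experiment. A minimal sketch of that idea (the tree contents here are hypothetical, invented for illustration):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One attacker action, with possible follow-up actions as children."""
    action: str
    children: list["Node"] = field(default_factory=list)

def attack_paths(node: Node, prefix: tuple = ()) -> list[tuple]:
    """Enumerate every root-to-leaf path. Each path is a hypothesis
    ('the system responds safely to this sequence of actions') that a
    chaos experiment can then validate or refute."""
    path = prefix + (node.action,)
    if not node.children:
        return [path]
    return [p for child in node.children for p in attack_paths(child, path)]

# Hypothetical tree for a leaked API credential scenario.
tree = Node("obtain leaked API key", [
    Node("call internal API", [Node("exfiltrate records")]),
    Node("enumerate cloud metadata", [Node("pivot to another service")]),
])

for p in attack_paths(tree):
    print(" -> ".join(p))
```

Each printed path is one candidate experiment, which is the feedback loop Shortridge describes: tree, experiment, refinement, new tree.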


VAMOSI:   I’m still trying to wrap my brain around this concept.

So is security chaos engineering more like a blue team exercise? Or is this more in the DevOps part of the world?

SHORTRIDGE:  The answer could be either. I'm a huge fan, when I've worked with companies, and even internally at Fastly, we've used decision trees, so I always think it's better for it to be collaborative, and in particular you want stakeholders who understand the system in question. There is sometimes a tendency for blue teams, security teams, to be kind of outside of the sphere of work. They may prescribe how work should be done, but they aren't always in the flow of the work. I think that is often a mistake, because when we threat model, we want people who intimately understand the system, understand its nuances, and understand the quirks of its behavior.

So I think we want those stakeholders at the table. I argue that a lot of DevOps professionals are actually quite similar to attackers, if you look at attackers who write software like malware. They certainly have an operational component where they're looking to see, hey, when we deploy that malware, is anything changing? Are there any monitoring or alerting thresholds that we might be tipping over? Are we tipping defenders off? So there's actually a kinship, I think, between developers and operations folks and attackers in that mindset. I've actually seen great success with using decision trees to get DevOps people thinking about: okay, now imagine I was disgruntled, how would I compromise the system? And they have a lot of fun with it, which is great for getting them excited about cybersecurity. Certainly, if you have a cybersecurity expert in there helping facilitate the discussion, and again, poking holes and coming through with their perspective, that can be valuable too, for sure.

VAMOSI:  So it’s a way of introducing security concepts to the DevOps world.

SHORTRIDGE:  That is one of the benefits, yes. There are myriad benefits, but I think that's a huge one. And again, one of the goals of the book was really to get DevOps people, whether that's a pure programmer through to Site Reliability Engineering, SRE, excited that, hey, I can actually make a huge difference in cybersecurity by doing a lot of the things I already do, just maybe extending them a little bit to cover resilience against attack.

VAMOSI:  So you mentioned having a cybersecurity expert in the room. Not all organizations have that. So your book actually provides some recipes, even exercises.

SHORTRIDGE:  I would say they are recommended practices. I definitely intend for the book to be accessible even to companies that don't have cybersecurity programs. I know especially right now, budgets are tight. It's difficult to have a bunch of cybersecurity experts, who usually are quite expensive, in your organization. So I tried to give, again, a wealth of opportunities that even small businesses can adopt just to make a difference. And a lot of it is, again, secure by design principles, which gets at what you asked about earlier: what does Fastly offer?

Another thing we offer is a compute platform. Essentially, think of it as high-performance serverless that relies on our global edge network. Abstracting away from Fastly, serverless in general actually is great security by design, because you have that sandboxing technology in place. You don't have to worry about maintaining your own server and all of that stuff; you're offloading those responsibilities onto the provider. That's a great example I have seen personally. Startups especially have amazing success where they can just tell their developers: yeah, as long as you're deploying onto serverless, you don't have to go through the security checklist. It actually offloads a ton of security toil. Again, I feel like there are these little opportunities where, if you just think a bit differently about it, you're able to save a lot of effort in cybersecurity.


Integrating Security Concepts into DevOps

VAMOSI: What I took away from the book is that resilience needs to be a paradigm shift. This isn't like you need to purchase a new tool, or that you need to adopt an entirely new schema. You just need to approach the problems in devops differently.

SHORTRIDGE:  Precisely. And one of the concepts I have in the book is the effort investment portfolio, because I have a finance background, so I think in investment portfolios often. The idea is that I present as many opportunities as I could in the book without it becoming a 1,000-page tome. You're not going to use all of them. Realistically, what matters is your local context. What can your organization do? Some companies are going to be able to more easily adopt serverless. Other companies are going to be able to more easily refactor something into a memory-safe language, or introduce better logging, or introduce distributed tracing. Again, you have to think about what your organization can feasibly do and consider the ROI of that effort and how it will pay off. So it was very much intended to be: hey, here's the spectrum of things to think about, and now you can go forth and choose for yourself. And the key thing is: constantly learn, and constantly improve. Don't assume security is just done. As I say, resilience is a verb; security is a verb as well. We don't want to stay static. The important thing is really that word: learning, and adapting.


VAMOSI:  The software development lifecycle typically includes seven steps: planning, analysis, design, development, testing, implementation, and maintenance. I prefer the Microsoft model, which includes training, requirements, design, implementation, verification, release, and response. Whichever method you prefer, security by design begins early in the software development lifecycle.

SHORTRIDGE:  Yes, I would say that it's very much about software design, software architecture. There are things you can do later on as well, for sure. But I think the great news is that a lot of this intersects with software quality. I personally view security as a subset of software quality, actually, because I view software quality as: the system behaves as intended. Well, an attacker is very much not something that we intend in our system, so it's just part of this whole subset. A lot of the things we want to do to achieve software quality, we can use to achieve resilience against attack as well. So this is, again, something that, for instance, when I've talked to very senior people, like infrastructure leadership, not cybersecurity at all, when they learn: oh, immutable infrastructure actually helps resilience against attack. We were just using it because it's more reliable, but actually it means that attackers can't persist on our systems. That's great news.

And I think there are a lot of opportunities like that, where it's something the engineering team maybe wants to do anyway, and with that cybersecurity benefit, maybe now you can join forces and get that upper-echelon leadership buy-in. So when I think of the software development lifecycle, it makes sense. It looks clean, looks good. You start with the planning, you go through the development, you have pre-release tests, and then you release, very cleanly. However, in the real world, that isn't the way software is developed. You don't really start at square one and progress to two, three, four. You're usually somewhere in the middle, going retroactively or proactively.

VAMOSI: Yeah, but how does that even work in the real world?

SHORTRIDGE:  In the real world, it's definitely messy. Every organization is different, and, say, testing is often neglected. There are often unit tests at best, and not integration tests. As I talk about in the book, integration tests are actually a boon for cybersecurity, but still often overlooked. So I agree it's a lot more circular, ouroboros-like in nature, a lot of the time. In the real world, there's certainly the part where, when you're starting a system from scratch, you have a design phase. Often, when you're iterating on the system, you have some sort of design review for new features. Sometimes you don't, depending on the company, which can be a problem as well. Obviously, you have developers writing code; hopefully they're using something like source code management to make sure everything gets merged in cleanly: no conflicts, version control, all that good stuff.

Then there's generally some sort of testing; sometimes it's check-the-box, like unit testing. Then there are a huge number of ways that you can deploy the software. There are still many manual deployments, which terrify me. I actually wrote another blog post earlier this year on, shall we say, 69 ways to screw up your deployments. There are just so many ways it can go wrong. So I definitely recommend automating deployments as much as possible. There are a lot of companies who have done that and adopted things like CI/CD pipelines, which are great. The CD part is harder than the CI part.

Then, once you're delivering the software in production, when it's reaching end users, you have a whole host of challenges. That's Fastly's heritage, actually: helping with some of those challenges on the content delivery side, making sure things are reliable, making sure you don't experience downtime. Then there's also all the database stuff, or stateful stuff as I like to put it; databases are just hard, right? You have logging, monitoring, you have incident response. It is so complicated, and I tried to help make it feel a little less complicated in the book and, again, provide opportunities. But the key thing, no matter where you're looking across the software delivery lifecycle, is that things have to be repeatable, because humans make mistakes. We shouldn't get mad when humans don't repeat the same action perfectly every time, because that's not what humans are best at. Humans are best at adapting. So if we can automate the things that need that repetition, it's only going to benefit us from both a reliability and a security perspective.

VAMOSI:  Right. And security is only as strong as its weakest link.

SHORTRIDGE:  I don't always agree that security is only as strong as its weakest link, because that's often used against humans. It's often used to decry: oh, well, humans make mistakes all the time, so what hope is there? I think that's not the healthiest attitude. But on integration testing: a lot of the failure we see is what attackers look for, because they think in systems, while a lot of cybersecurity teams are thinking in components. Attackers look for the interactions between things. Ransomware is essentially an attack against the fact that you have a lot of resources that are all interconnected, interact with each other, and generally have full permissions. That's part of how you can have that cascading failure. Same thing when attackers get access to, say, a server that doesn't matter much: they immediately look for, okay, what other access can I gain? What can I pivot to? How can I migrate? They're always thinking about how things are interconnected and interact. And if you don't want to leap right into chaos experiments, integration tests, to me: before you even have application security testing, you need your integration tests.

There was an example of a vulnerability in Oracle Cloud Infrastructure, which I talk about in the book, where basically you could access another user's volume through a really unintended interaction, which, to their credit, Oracle fixed super quickly; we love to see it. But that's something that could be caught with an integration test. A lot of things like multi-tenancy can fall over in the cloud case. Or you may have a perfectly secure server and a perfectly secure database, but how they talk to each other, something is messed up in the implementation, and integration tests can catch those kinds of things. So it really helps us better understand how things interact, whether they are interacting as intended, and make sure that the attackers don't find those snafus first.

VAMOSI: So going back to the definition of resilience, we have already talked a little bit about the failure aspect of it. Is it that we're just not trained to be thinking about that, that we just think the code is always going to work, so we're not thinking about the disaster scenarios that could happen?

SHORTRIDGE:  I think developers are definitely conscious of what can go wrong from, again, a performance perspective. Most any developer, probably at least a year in, has made some sort of mistake or seen something go quite wrong. I do think that in the cybersecurity industry there is a very ingrained belief in the culture that we need to prevent failure at all costs: that if an attack happens at all, then it's game over, we failed. And I think that's not a particularly constructive mindset, because you can't stop failure from happening. It's impossible in any sort of complex system. Instead, though, we can make it so that, okay, an attacker gains access to, let's say, again, a serverless function. They can't really do anything with it. They can't modify it. They can't use it to access anything else. We can immediately kill the function and restart it because we see something's going wrong. The impact is minimal. So yes, an attack happened, but we actually won, because the attacker wasn't able to succeed.

And they had to either move on to a different company or think through: okay, what am I going to do instead? So I think starting to see victories in minimizing impact is such an important shift, and I think it's getting much healthier. Hopefully we won't feel so downtrodden, like, oh, this attack happened. It's: yeah, the attacker got nothing. They didn't gain anything from it. We didn't experience any disruption to our service. No data was leaked. All of that should be seen as a victory. Especially, in my view, if you're able to keep impact the same as last year while your revenue grew, like, 5% or 10%, that's a win. It means that you're succeeding security-wise. But we don't always think about it that way today, and I'm hoping to help change that.


Challenging the Binary View

VAMOSI: What originally drew me to interview Kelly was a blog she wrote on Sun Tzu and the whole metaphor of war within cybersecurity. That personally really resonated with me. It always strikes me as rather binary -- you're good or you're bad. You're a white hat or a black hat. And yet our favorite response whenever someone asks a security question is ... it depends. So it isn't so binary, is it? Lots of nuance. Lots of shades of gray. Kelly talks about always trying to find victories. That isn't always how security is framed. So, yeah, we need to reframe it: it's not a war per se.

SHORTRIDGE:  So one thing that I was very careful about in the book, and in general, is not to frame things like a battle or use military lingo. I use terms like nurturing instead, borrowing from ecology and nature: how do we empower people, with just a little more positive language? The problem with some of the military and war lingo is that it can quickly become adversarial toward our users and the humans that are involved in the system. And it makes the stakes feel much higher, I think, than they are, excepting cases where in cybersecurity it's a nation-state game and it literally is war, right? I'm talking about most corporations and commercial use cases. So in that case, we really want to ask, again: how do we nurture the humans in our systems to adapt? How do we make sure that the secure way is the easy way? How do we work with human behavior rather than against it?

And the thing that I found when I was rereading Sun Tzu, because I was reminded of how there was a trend, even less than a decade ago, to just have a bunch of Sun Tzu quotes in your cybersecurity presentation at conferences. So I reread it, and I was like: wait a second, what he's talking about in terms of fundamental blunders is a lot of what the cybersecurity industry does today. Things like ignoring the context of your users and the context of your company; ignoring the fact that, you know, he refers to the conditions the soldiers face, right? Basically, you don't want to make them feel like they're betrayed because you're asking them to do something impossible.

Well, we often ask that of our users: we expect them to pay perfect attention to every email and every link. That's impossible. You can't expect users to pay 100% attention all the time. The other thing he says is that if you stay rigid, you can't adapt, and that's going to hurt you in battle. So what I countered with is that what Sun Tzu is actually saying is we need to understand users' local context. We need to respect the fact that human attention is finite. We need to respect the fact that they have incentives, like shipping code or closing deals; we shouldn't expect security to be their number one priority. And importantly, we need to preserve that capability for adaptation if we want to outmaneuver attackers. If we stay rigid, attackers won't be; they'll be more flexible, and they'll be able to outmaneuver us. So actually, my Black Hat talk is basically about how we can borrow attacker advantages, like being nimble, empirical, and curious, and harness them for an advantage in defense as well.

VAMOSI:   Kelly mentioned how Sun Tzu gets quoted in seemingly every cyber presentation. I've seen those talks -- they're not original. So Kelly quotes various literary sources within her book. It's refreshing to see the humanities being represented in infosec.

SHORTRIDGE:  And I joke that, you know, like many who have a liberal arts degree, it was rather pricey, so I'm trying to get my money's worth. But also, I'm just a huge literary nerd. I just devour fiction. And I think there's so much that the world outside of cybersecurity can teach us. Again, I draw on complex systems domains like healthcare, aerospace, urban water systems; even the diversity in birds comes up in the book. There are some random places that can really inspire us from a cybersecurity perspective.

And I do think there are literary aspects that can inspire us as well, because, again, resilience, if you look at it, is part of the reason why humanity has succeeded. Part of the reason why we have the technology we have today is adaptation, being able to adapt to these evolving conditions. So again, I do think there's this poetic heart to resilience that I hope makes cybersecurity maybe a little less dour and a little more like: yeah, you know what, we can do it. That would be a nice, refreshing change.

VAMOSI:  So yeah, getting to that nimbleness. We always romanticize that the attacker only has to be right once, and they've got all this flexibility and whatnot. In reality, it's not that, particularly if you want to do something specific, like a Stuxnet. It takes a tremendous amount of research to get to that level. So how can we start to be more nimble?

SHORTRIDGE:  Yeah, I totally agree. It's a myth. Attackers have to get it right once, and then they have to get it right every time after that, because every single move they make after that point is a potential for being detected and kicked out by defenders. So I think a key part of it is, again, what we're talking about: that adaptation. Some of the things I'll be talking about in my Black Hat talk, and have already talked about in the book: things like modularity by design, just making sure you can fail independently and change independently; things like those CI/CD pipelines; things like infrastructure as code, making sure that you can move quickly. Infrastructure as code has a ton of benefits for cybersecurity, like stronger change control and being able to patch more quickly. Things like, again, the isolation we talked about, some of the new WebAssembly stuff that is super cool.

And I'm privileged enough to get to work on it at Fastly. It's even down to what system signals we collect. A lot of those overlap with reliability signals. We're not always looking at things like queue depth as a security signal today, but that can actually help us be a little more nimble, because we start getting more of those leading indicators of compromise that can help us respond more quickly, too. And overall, systems thinking, thinking about the interactions between things, is also going to let us be more nimble. It lets us be more curious. And then, if you combine it with things like chaos experiments, what I call resilience stress tests, that allows you to be more empirical, which is a key advantage attackers have today, too.

VAMOSI:  I wondered if Kelly had any ideas as to how we got down that path. I mean, is it the fact that it was a largely male culture to begin with, in computers and cybersecurity? Or that the internet started out as a military project, hence the language that we discussed?

SHORTRIDGE:  It's interesting. I've been trying to look at the origins of it from an almost anthropological or sociological perspective, which is interesting because, and this might be a historical rabbit hole for your listeners, up until kind of the mid-80s there was very much this focus on how we create security by design. I think it actually over-indexed on things like formal methods a bit, but, you know, math is cool, and people liked that, and you saw a lot of it concentrated in the intelligence community.

And so a lot of it concentrated into kind of a hardcore computer science crowd: people who just deeply understood computers at a very technical level. And then at a certain point, I'm not sure I can identify all the contributing factors, but the long and short is that funding dried up for a lot of that activity. Then enter the internet, and suddenly you had the emergence of the network security companies, and then the endpoint security companies and antivirus. There was this growing need for cybersecurity professionals, because attacks were starting to emerge and happen. The interesting thing is that a lot of cybersecurity people at the time came up through more of the network side of things. They learned how to work with the cybersecurity tools, because that was really the challenge at the time. People stopped really thinking about security by design; it was out of fashion, in a certain sense. And I think from there, the emergence of the cybersecurity industry was very concentrated on the vendor side of things rather than the computer science part. I think that kind of ran rampant for quite a while.

And I think the other thing is there was still a counterculture to it. It was not something that was mainstream. Very few people knew about cybersecurity, and even today, a lot of people think you're basically speaking technobabble when you talk about cybersecurity stuff, right? So there was almost this cliquishness, like: oh, we're the cool kids who are kind of away from everyone else. And a bit of self-selection, in some cases, of: well, we're the protectors, and we know what's best for you. That might have worked for a while, but I think we're in a very different world now, where there's just more collaboration.

We do have the opportunity, with some recent innovations more on the software and infrastructure side, to start investing in security by design. This is actually a core part of psychological resilience: strategies that help you during times of stress and turmoil can actually sabotage you when times are good. They may keep you too rigid to pursue better opportunities. I think that's kind of what's happening now; we just need to get a little more open, flexible, and adaptive, because we have all these new opportunities now. So that's my very long overanalysis of why we might be in this position today.


"Secure by Design"

VAMOSI:  I want to go back to Secure by Design. CISA defines it as products where the security of the customers is a core business requirement, not just a technical feature. Haven’t we been doing this all along?

SHORTRIDGE: That's a great question. I love all your questions around secure by design; I'm a huge fan of that, and certainly resilience. Actually, in my Black Hat talk I refer to it as a resilience revolution, because I do think, like you said, it's a paradigm shift. I think it's more inclusive across teams, and it really tries to break down those silos that we see in cybersecurity, so that, like you said, anyone who's interested in computers and making software systems better can contribute towards making cybersecurity better. I'm in the midst of finalizing my Black Hat talk. I'm very immersed in it and very excited, because it's upending a lot of the folk wisdom we have, like "attackers are fast, they're ever-evolving." We can be that too. Talking about all the ways we can make that happen is a lot of fun, so I'm looking forward to it.

VAMOSI:  I hear from a lot of different people that secure by design doesn't necessarily mean from the beginning. Things -- products and projects -- are often in progress. And you never see the software development lifecycle in its totality. Usually, you're just a part of it and understand how that part fits. How can security by design be phrased differently, or be more accessible to people who are in the midst of working right now on projects?

SHORTRIDGE:  Yeah, I do like how CISA uses the phrases both "by design" and "by default," because I think that's a powerful way to indicate that we want the secure way to be the easy and fast way. There's something in the book I created called the Ice Cream Cone Hierarchy of security solutions, which basically says that when we talk about security by design, it's really being able to either eliminate hazards by design, or reduce things like hazardous methods and materials. I'm borrowing lingo from physical safety, but a hazardous method, for instance, is the manual deployments we talked about -- something incredibly hazardous. So if you're able to automate part of that process, you've reduced hazardous methods by design; you've made it so that the design of the system is now a little less prone to failure, which is what we want.

Now, other examples: a lot of companies, at least companies that have been around a while, wrestle with a lot of C code. I've analogized C code to lead: it's very useful, but it's poisoning us over time. You know, the hotness right now is Rust, but it can be very difficult to refactor C code into Rust, and there aren't that many Rust developers. But there are approaches like WebAssembly, where there's a project called RLBox that lets you wrap your C code in, in essence, a WebAssembly wrapper, so you don't have to worry as much about vulnerabilities or memory corruption issues in your C code. There are techniques to make the C code less stinky, shall we say.

And I think there are a bunch of things like that. Again, if you start thinking about things like, okay, it's a hazardous method or material: how do we either remove it entirely, or how do we at least reduce the hazard associated with it? And again, that can be something that you don't design from the beginning; very few people are starting to design from scratch. I'm a huge fan of isolation.

As an example of this, again, you could have your monolith service. If you start peeling off, let's say, the billing service into its own microservice -- that's the only separate service you have -- you put that in its own kind of isolated sandbox, however that looks, whether that's a microVM or a container. You can now make sure that only the billing service has access to whatever payment or financial data, and the rest of your monolith doesn't. You've reduced the hazard there.

So if the attacker gets access to all the other services, they still can't access that billing data; they now have to compromise the billing service instead, if that's well sandboxed. That's a huge win. And yes, that involves some effort, but it's certainly a lot better than trying to break apart your monolith entirely, right? So it's thinking through: what are the most hazardous things, and what can we do, given the resources we have, to start either eliminating or reducing those hazards by design?
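The isolation idea Kelly describes can be sketched as a simple access-policy check. This is a hypothetical toy, not from her book: the service and dataset names are made up, and a real system would enforce this with network policy or sandbox boundaries rather than a Python dict.

```python
# Toy allowlist policy: only the isolated billing service may touch
# payment data; the rest of the (former) monolith gets no access.
ACCESS_POLICY = {
    "billing": {"payment_data"},  # sandboxed service, sole access
    "catalog": set(),             # everything else: no access
    "search": set(),
}

def can_access(service: str, dataset: str) -> bool:
    """Grant access only when the policy explicitly allows it."""
    return dataset in ACCESS_POLICY.get(service, set())

# A compromised catalog service cannot reach payment data...
assert not can_access("catalog", "payment_data")
# ...while the billing service, and only the billing service, can.
assert can_access("billing", "payment_data")
```

The point of the sketch is the default-deny shape: an attacker who lands anywhere outside the billing sandbox gains nothing, which is the hazard reduction Kelly is describing.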

VAMOSI:  Kelly's book has a lot of examples which any organization can use. What would a chaos experiment look like? What would a stress test look like?

SHORTRIDGE:  The one with Aaron, which I love because the firewall only works 60% of the time, is very memorable. But in chapter nine of the book, we have a bunch of different case studies. One I know is Verizon, where they do something very clever: they basically validate that their security controls are working as intended by introducing a malicious workload, and then a legitimate workload, with whatever changes have been introduced. So you can constantly validate, okay, are our security controls working as expected? We also have one from OpenDoor which talks about experiments for logging.
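The Verizon-style validation Kelly describes -- replay one known-benign and one known-malicious workload after each change and alert if either outcome flips -- can be sketched in a few lines. This is a minimal illustration under my own assumptions: the "control" is a toy request filter, not a real firewall, and in practice both workloads would be sent through staging infrastructure.

```python
def security_control(request: str) -> bool:
    """Toy control: allow the request unless it carries an obvious
    injection marker. Stands in for a real firewall/WAF rule."""
    return "DROP TABLE" not in request.upper()

def validate_control() -> list[str]:
    """Run after every deploy; returns a list of validation failures."""
    failures = []
    # Legitimate workload: the control must let it through.
    if not security_control("GET /users?id=42"):
        failures.append("control blocked legitimate traffic")
    # Malicious workload: the control must block it.
    if security_control("GET /users?id=1; DROP TABLE users"):
        failures.append("control allowed malicious traffic")
    return failures

# An empty list means the control still behaves as intended.
assert validate_control() == []
```

The useful property is symmetry: the check catches both a control that has silently stopped blocking attacks and one that has started blocking legitimate traffic.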

The Census Bureau, I think, had an incident quite a few years ago now, where their logs were being sent to a SIEM that had been decommissioned for, I think, over a year, which is kind of a nightmare scenario, right? Anyone across the software spectrum would find that horrifying; that's clearly a mistake. And so you want to make sure that you have alerting for stuff like that. OpenDoor talks about how logging pipelines are basically our lifeblood, right?

And so if you disrupt those logging pipelines and see how the end-to-end system behaves, you can make sure that when a real incident happens, you have the data that you need. So there's a whole wealth of stuff. I even built a chaos experiment that strips cookies and forces cross-origin requests. I built it on Fastly's Compute platform. It's really nice; you can insert it in front of a service and it doesn't disrupt users, and it's a lot of fun. It's a small, simple experiment. We all just assume that our website is going to require cookies, especially a login page, and we assume that it's going to block cross-origin requests, but it's good to verify it anyway. So even small experiments like that really help us get empirical.
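The cookie-stripping experiment Kelly mentions can be sketched as a proxy step plus a verification check. This is a stand-in, not her Fastly Compute code: the handler is a toy, and the only claim being tested is the assumption "the login page requires cookies."

```python
def strip_cookies(headers: dict) -> dict:
    """Chaos-proxy step: forward the request with all cookies removed."""
    return {k: v for k, v in headers.items() if k.lower() != "cookie"}

def login_page(headers: dict) -> int:
    """Toy login endpoint: 200 with a session cookie, 401 without."""
    return 200 if "cookie" in {k.lower() for k in headers} else 401

# Run the experiment: send a normally-cookied request through the proxy.
experiment = strip_cookies({"Cookie": "session=abc", "Host": "example.com"})
status = login_page(experiment)

# The assumption "login requires cookies" holds only if we observe a 401.
assert status == 401, "login page accepted a cookieless request!"
```

Because the experiment runs against a copy of the request rather than mutating live traffic, it verifies the assumption empirically without disrupting real users, which is the property Kelly highlights.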

VAMOSI:  Finally, I get a lot of questions from people that are thinking of transitioning into security. I personally encourage anyone to take up security. If you've got a background as a nurse or perhaps running an assembly line and so forth. You will have a different perspective than someone who's been coding all their lives. It's a very different world. Does infosec need that diversity?

SHORTRIDGE:  I agree with that. I think having as many lenses as possible into how systems behave is so valuable for cybersecurity. I very fundamentally believe that it's difficult for us to protect a system that we don't understand, and given the complexity of systems today, it's very difficult for a single human to understand that system. So we need a group of people who all have those different perspectives, whether that's a programmer, an SRE, or a cybersecurity person. I also think that if you're someone who understands things like automated deployments and CI/CD, and you understand the weirdness of how computers talk to each other, like distributed systems, all of that is so valuable for cybersecurity.

And I promise you, the cybersecurity part isn't as hard as you think. You know, there's a lot of gatekeeping. But really, if you just think about, okay, if I were being really crafty, how would I navigate the system that I'm currently helping make reliable? How would I disrupt that reliability? You're going to go a long way in thinking how attackers think. And again, if you think about it very much as, how do we just make sure that the system behaves as intended, you're going to cut off the paths that attackers would naturally gravitate towards.

VAMOSI:  I’d really like to thank Kelly Shortridge for talking about her presentation at Black Hat USA 2023 and her book Security Chaos Engineering: Sustaining Resilience in Software and Systems, available from O'Reilly, on Amazon, or wherever you get your books. Evolving our software development to be more nimble and agile is going to take time and a paradigm shift. Given the alternative, we’ve got to get started today, so I anticipate we’ll be hearing more from Kelly in the near future. You can read her blogs or find out where she’s speaking next at
