The Hacker Mind Podcast: Scanning the Internet

Robert Vamosi

November 2, 2021

Traditional anti-malware research relies on telemetry from customer systems, but what if a particular piece of malware isn't on the same platform as your security software?

Marc-Etienne M. Léveillé from ESET joins The Hacker Mind podcast to talk about the challenges of building his own internet scanner to look for elusive malware. Speaking at SecTor 2021, he shares some of his findings on Kobalos, a stealthy malware that uses SSH credentials to hide and that is perhaps much more easily exposed by scanning the IPv4 space, all 3.7 billion addresses.

Vamosi: War dialing, stated simply, is a technique to automatically scan a list of telephone numbers. This was made famous in the film WarGames when David Lightman tries to connect to a server in Sunnyvale, California.

David Lightman: Yeah, for Sunnyvale, California, the number for Protovision. And could you also tell me what other prefixes cover that area?

Vamosi: The script enumerates every possible phone number in a local area, and then every possible prefix. The script would then dial every possible number. What it might find is people, or fax machines. What was most interesting, of course, were the computer connections. Perhaps there was a bulletin board behind it, or perhaps there was a corporate network.

War dialing evolved in the early 2000s into war driving, in which hackers would drive around to expose all the unprotected Wi-Fi hotspots, say, in a city. But what if you took it to the next level? What if you war dialed the entire internet? That's over 3.7 billion connections you'd have to make. And that's just in the IPv4 space. And that's just for one port. There are more than 65,000 ports on each address. That's a lot of scanning. But what might you learn? As crazy as it seems, there are legitimate reasons for scanning the IPv4 space and for diving deeper into each of the connections. In a moment, I'll introduce you to someone who built a scanner, the challenges he faced operating it, and the good data that he's now providing others.

Vamosi: Welcome to The Hacker Mind, an original podcast from ForAllSecure. It's about challenging our expectations about the people who hack for a living. I'm Robert Vamosi, and in this episode, I'm discussing the need to look for accurate data on malware, and how scanning the internet for compromised systems isn't as far-fetched as it might seem. It might just be how we do anti-malware research going forward.

Vamosi: In another episode, I talked about Heartbleed, a critical flaw in OpenSSL that could expose sensitive information. When it was first disclosed, researcher Robert Graham wanted to see how many vulnerable OpenSSL instances there were on the internet at the time. And there were a lot, about 600,000. But to find that information back in 2014, he had to scan the internet, the entire internet, and that was a very noisy process. He racked up a fair number of complaints, but it showed that scanning the entire internet was possible. At this year's SecTor, I spoke with someone who built a scanner that's designed to scan repeatedly. He works for an antivirus company and he's been scanning for malware families on the internet.

Léveillé: My name is Marc-Etienne. I've been a malware researcher at ESET for almost 10 years now.

Vamosi: Marc-Etienne's job at ESET is to learn as much as he can about specific malware families, such as those containing Trojans, botnets, ransomware. Well, you get the idea.

Léveillé: Part of our job at ESET is performing in-depth research on specific malware families. We can spend weeks or even months on a single malware family to fully understand it. So what we do is reverse engineer malware samples. We try to understand as much as possible about the network protocol and the malicious activity that this particular malware performs.

Vamosi: In many cases, that means obtaining a sample of the malware and deconstructing it to learn what it has been programmed to do.

Léveillé: Looking at samples is not enough. Sometimes we want to get more information about what the intent of the attackers is, and also how prevalent this malware family is on the internet. We do have some telemetry on detections that are triggered by our product. Sometimes it's just not enough, and it's also biased by our users, right? So we don't have a full census of the whole internet about who is targeted by this particular attack. By performing an internet-wide scan, we get a much wider picture. And also our telemetry is quite limited to the Windows ecosystem because of our user base, basically. We have far fewer users on Mac and Linux than we have on Windows.

Vamosi: Most antivirus products are found on Windows, much less so on Mac and Linux. So analysis of prevalence of malware typically represents only what's being seen on Windows boxes. In other words, it's not always representative of the entire Internet. So he set out to build a customized internet scanner.

Léveillé: I've been researching Mac and Linux malware for the past few years, and I needed something else to get the information about who the targets are and how prevalent the malware is. One of the problems with using this technique is that, with this particular malware, you must be able to reach out to the malware, whereas most of them are actually active and try to reach out to a CnC server, a command and control server.

Vamosi: Malware that typically gets deposited on your computer from, say, a phishing attack or a malicious website is sometimes just a shell. Some of them have functionality, a lot of them do not, and so they reach out to a CnC server to be told what to do next. In a recent show, I talked to two researchers who were looking to identify active command and control, or CnC, servers. In those cases, the malware is reaching out. What Marc-Etienne is looking for is different. In this case, the malware needs to be listening for his ping.

Léveillé: So this whole scanning is irrelevant if the malware is not listening on a particular port or, you know, passively waiting for a command. So it's very specific to passive malware.

Vamosi: In the war dialing example at the beginning, hanging up when the other end answers would be more like a surface scan or a port scan, where you just check to see if the port is open. When Marc-Etienne performs his scans, he waits and listens to see what's on the other side. Is it a person? Is it a fax machine? Is it a dial-up internet service provider? Or, in this case, is it malware?

Léveillé: I've been looking a lot at server-side threats, so this applies very well. Because in not all but, you know, a lot of server-side threats, there's still a kind of passive implant waiting for incoming connections. So this works very well in those cases.

Vamosi: This all seems pretty straightforward until you realize what it is he's asking for. That's a lot of traffic. And not all internet service providers, or ISPs, will put up with that.

Léveillé: So it's really the first step in having an internet scanner that's durable and sustainable. If you still want to scan the internet a year from now, then respect your neighbors and respect all the ISPs involved.

Vamosi: Yeah, ISPs are going to notice if you suddenly start scanning the entire internet, and they might just shut you down. So what you really need to do is work with them, do it in advance, let them know what you're thinking, and give them the chance to say no. In this case, at least four ISPs specifically said no, until one said yes.

Léveillé: We did try a few different ISPs before one of them agreed to let us use their network for scanning, and we have a very good relationship with them. So hopefully this will last as long as possible.

Vamosi: ISPs don't like their users scanning from their network, for various good reasons. One way to be good to the community is to have a banner that states what you're doing and how to file an abuse request.

Léveillé: We are also very respectful to them and respond to all the abuse requests that we get. If anyone tries to reach out to either them or us about the scans, then we'll be happy to answer all the questions and also to exclude IP addresses that don't want to be scanned, for whatever reason. We've had lots of abuse complaints since we started, but no one has asked for exclusion.

Vamosi: It turns out that when people heard what Marc-Etienne was up to, they were pretty cool with it.

Léveillé: Some of them did ask, but when we explain what we're doing, you know, we are trying to find victims of specific malware families, and if you ever get compromised by this malware or any other malware that we scan for in the future, we'll reach out to you about it, they just say that they're fine with the scan. So there have been some complaints but no exclusions so far, which is good for everyone, I believe.
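The exclusion handling mentioned here can be pictured as a simple filter applied before any probe is sent. The following is a minimal sketch using Python's standard ipaddress module, not the scanner's actual mechanism; the networks in EXCLUDED are reserved documentation ranges chosen as placeholders, and the function names are hypothetical.

```python
import ipaddress

# Hypothetical opt-out list. These are documentation-only ranges,
# used here as placeholders rather than real exclusion requests.
EXCLUDED = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
]


def is_excluded(address: str) -> bool:
    """Return True if the address falls inside any excluded network."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in EXCLUDED)


def filter_targets(addresses):
    """Drop every target whose owner asked not to be scanned."""
    return [a for a in addresses if not is_excluded(a)]
```

Keeping the list as networks rather than single addresses lets one entry cover an entire organization's range.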

Vamosi: This is a key difference between just pinging the internet with requests and actually having a research project behind it.

Léveillé: Our goal is not to take down any system or, in any way, use the different backdoors that are available online. We just scan, and we do notify the victims whenever we find that they're compromised. We really try to be as good as possible. Our goal is really to clean this malware off the internet.

Vamosi: So why would you scan the internet? Well, if you study malware, it makes a lot of sense. In order to get a sense of how prevalent a malware family might be, you need input. ESET makes anti-malware software, and its customer base is mostly Windows users. That means if Marc-Etienne wants telemetry from his ESET customers, he's going to be getting a Windows version of the internet. Probably he'll see clusters primarily in North America and Europe. But what about the rest of the world? And what if the malware family he's investigating infects some other OS? That's where scanning the internet for clues makes more sense. But we already have something like this: Shodan or, better yet, Censys. So why not use those?

Léveillé: One of the reasons we started scanning the internet is, well, we used to use Shodan and Censys to do scans for us. They were very kind and, you know, very generous with their time. But in some cases, we needed something a little bit more precise, and we needed to run our own custom scans based on specific malware families.

Vamosi: Shodan and Censys are well-known internet scanners. Shodan is well known for its search engine ability. It scans the corporate network for a range of devices, and then notifies that organization when something unexpected shows up. Censys, on the other hand, continually scans the public IPv4 address space on 3,500-plus ports. This gives us a snapshot of the internet at any given point in time. I have used Censys to report on vulnerable OpenSSL implementations for my Heartbleed articles. Marc-Etienne, in his work at ESET, needed a more focused and more dedicated scanner for his needs. That's not to say he was perfect in the beginning.

Léveillé: We're quite new compared to others who've been doing it for multiple years, talking about Shodan and Censys. But we do it for a different purpose than they do. They want to have an idea about the census, you know, like what services are available on the internet. And we're trying to find very precise things, devices or malware in our case, most of the time. Unlike them, we don't keep the results. We don't make them searchable, and we don't have an interface for searching. They do it way better than we would be able to.

Vamosi: In general his goals were very different.

Léveillé: Regular scans weren't enough, because we wanted to perform the first handshake with the malware to be able to confirm whether the machine was compromised or not. And running custom scans like that is out of the scope of Censys and Shodan, so we decided, well, you know, let's do it on our own, developing our own custom module to perform the scan and to find victims of very specific malware families.

Vamosi: So how does one set about scanning the internet?

Léveillé: We didn't start from scratch, because we use software from the ZMap project, which was developed at the University of Michigan. It's what powers Censys. There are actually multiple pieces of software that are part of the project, but what we use is ZMap itself and also ZGrab2, which enables us to perform the handshake with the malicious software that's running on the other end.

Vamosi: ZMap is a very fast open source TCP, UDP, and ICMP scanner. A companion project is ZGrab2, which performs the handshake at the application layer and parses the replies.

Léveillé: We do develop our own custom modules for ZGrab2 to be able to properly fingerprint the different malware families that we are looking at.
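The second stage described here, connecting to a port already known to be open and inspecting what comes back, can be illustrated with plain Python sockets. This is only a sketch of the general idea and is not ZGrab2 code or ESET's module; grab_banner and looks_like_ssh are hypothetical names, and the SSH prefix check merely stands in for a real, malware-specific fingerprint.

```python
import socket


def grab_banner(host: str, port: int, timeout: float = 3.0, max_bytes: int = 256) -> bytes:
    """Connect to host:port and return the first bytes the service volunteers.

    Returns b"" if the connection fails or nothing arrives before the timeout.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            return conn.recv(max_bytes)
    except OSError:  # refused, unreachable, or timed out
        return b""


def looks_like_ssh(banner: bytes) -> bool:
    """Stand-in fingerprint: SSH servers open with a string starting 'SSH-'."""
    return banner.startswith(b"SSH-")
```

A real module would replace looks_like_ssh with a check derived from the reverse-engineered protocol of the family being hunted.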

Vamosi: Fingerprinting. The malware researcher first identifies the unique properties of any malware that they're looking for, such as specific IP addresses for CnC servers or credentials to get them access to those CnC servers, then triggers some behavior. Sometimes it may take multiple conditions to trigger the behavior, and this isn't a one-off activity. This is something he needs to do for his active research.

Léveillé: Exactly. We've done tens of scans already for different malware families. One of our colleagues, Zuzana, has been doing research on IIS malware. IIS is a Microsoft web server component, and in these cases the internet scanner was very well suited to finding the victims of the different malware families that she found. In some cases, we knew that they were quite widespread. In some other cases there were just two or three different victims, very targeted. But she's done a survey of more than 10 IIS backdoors, and we tried to find victims for almost all of them. Not all of them we could scan for, but for most of them we were able to.

Vamosi: Connection, then, is a matter of sending a GET request and seeing what happens next.

Léveillé: Basically, we try to connect to all the IP addresses on the internet and see if they're infected or not. And this is the new thing, the fun thing about the internet scanner: we can go through the whole process in a few days, identifying which ports are open and also connecting to each of them to identify whether it's compromised or not.

Vamosi: When you access a website, you and I use a common name like thehackermind.com. Behind that is a sequence of numbers resolved by your DNS, and that sequence of numbers is the site's IP address. Currently, we use IP version 4 addresses, which use a 32-bit address space, or two to the 32nd power, about 4.3 billion addresses. Of those, about 3.7 billion are active addresses, and the rest remain reserved. That sounds like a lot, but we're about to run out of those addresses. I mean, when you think about all the IoT products, your car, your phone, well, we're burning up more addresses today than we thought would be possible back in the early 1980s, when the Advanced Research Projects Agency Network, or ARPANET, first deployed IPv4. So we're adopting a new schema, IPv6, ratified in July of 2017. IPv6 is designed to overcome the problem of IPv4 address exhaustion. It uses much longer addresses, in this case 128-bit, or two to the 128th power, meaning there are more than 340 trillion trillion trillion IP addresses. That's a very large number. Someone has suggested that it's more than the number of grains of sand on Earth. We won't exhaust that number anytime soon, although for the moment relatively few are in use. So it would be very, very hard to scan all the IPv6 addresses.
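The address-space figures quoted here can be checked with a few lines of arithmetic. This is an illustrative sketch only; the 3.7 billion figure is taken from the conversation as an approximation, and the variable names are invented for the example.

```python
# Total number of addresses each protocol can express.
IPV4_TOTAL = 2 ** 32    # 32-bit addresses
IPV6_TOTAL = 2 ** 128   # 128-bit addresses

# Approximate count of publicly usable IPv4 addresses, as quoted in the episode.
IPV4_ACTIVE_APPROX = 3_700_000_000

# Whatever is left over is reserved, private, or otherwise not scanned.
ipv4_not_scanned = IPV4_TOTAL - IPV4_ACTIVE_APPROX

# "Trillion trillion trillion" is 10**12 * 10**12 * 10**12 = 10**36.
ipv6_in_trillion_cubed = IPV6_TOTAL // 10 ** 36

print(f"IPv4 total:            {IPV4_TOTAL:,}")
print(f"IPv4 not scanned:      {ipv4_not_scanned:,}")
print(f"IPv6 / 10^36 (approx): {ipv6_in_trillion_cubed}")
```

The gap between the two protocols is why exhaustively probing every IPv6 address is not practical with the same approach.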

Léveillé: That's correct. We just scan the IPv4 space. IPv6 has a very large address space, which makes it a lot more difficult to scan. We've not solved that problem yet on our end. There are things that can be done, but for us it's out of scope for now. Scanning the IPv4 internet yields enough results for us, for now.

Vamosi: So how long does it take to scan the IPv4 internet?

Léveillé: It depends on the ISP's load, I would say, but it takes about two and a half days right now.

Vamosi: Wow. That is fast.

Léveillé: That's for a single TCP port and performing the handshake. That's for the complete scan.
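To put the two-and-a-half-day figure in perspective, here is a rough back-of-the-envelope calculation. It assumes the roughly 3.7 billion addresses mentioned earlier and treats the scan as evenly paced, which a real scan (port sweep first, handshakes afterwards) would not be; the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def probes_per_second(targets: int, days: float) -> float:
    """Average rate needed to touch every target once within the given time."""
    return targets / (days * SECONDS_PER_DAY)


# One TCP port across roughly 3.7 billion IPv4 addresses in 2.5 days.
average_rate = probes_per_second(3_700_000_000, 2.5)

# Scanning every one of the 65,535 TCP ports at the same pace would scale linearly.
days_for_all_ports = 2.5 * 65_535

print(f"average rate: about {average_rate:,.0f} addresses per second")
print(f"all ports at that pace: about {days_for_all_ports / 365.25:,.0f} years")
```

The second number is the practical reason a targeted scan sticks to one or two ports.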

Vamosi: Fortunately for his research, not every port needs to be scanned.

Léveillé: Most of the time, the malware we analyze will just listen on either one port, or it will reuse some port that's already open. For example, for the IIS research I mentioned before, scanning port 80 was sufficient, because most IIS servers will listen on port 80. I think we did 443 as well, just in case it was a server running SSL only. There are IIS servers running on non-standard ports out there, but there weren't enough of them to justify scanning, you know, 10 or 100 different ports.

Vamosi: So, what makes a good candidate for Marc-Etienne's research, for his scans? How does he identify a family that he wants to drill down on and look at in more depth?

Léveillé: In a more general sense, like which malware family we choose to dig into further, not necessarily for scanning, there's the complexity of the malware. If it's very complex, if there are questions we can't answer right away, then we can justify spending a little bit of time looking into it. Also the relevance: if it's something that's very relevant, then perhaps we'll find some time to dig a little bit further and try to get a better picture. But also, when it comes to Mac and Linux, when there's some new family, I would say some family that has been publicly disclosed, then most of the time we'll spend days looking into it to see if it's interesting or not. In some cases it's not, but it happens from time to time that, you know, this particular family is interesting, and here's why. And we'll spend a few weeks then and try to publish something interesting about this threat.

Vamosi: We mentioned earlier that most malware is either reaching out for instructions or waiting for something to contact it before acting.

Léveillé: So we don't have to contact any command and control server in some of the cases, because the malware is passively waiting for incoming connections. It's possible for a malware family to be both active and passive, where, you know, it reaches out to a CnC server and also waits for incoming connections. But in most cases, it's one or the other. So it either reaches out to a CnC server or waits for someone or something to connect to it and give it commands.

Vamosi: And so it's important to understand whether the malware is passive or active.

Léveillé: So we are scanning for the latter, the malware that waits for some incoming connection. We try to perform whatever handshake there is, and we are very careful not to execute any commands on the victim's machine. It's really important that we do not try to clean anything or ever, you know, perform anything that may be illegal, so we don't use the backdoor. We just try to identify if it's compromised or not.

Vamosi: Kobalos is one such family of malware that he researched. In Greek mythology, a kobalos is a small mischievous creature, and it fits in this case. It infects Linux, BSD, Solaris, and others. According to ESET, Kobalos is a generic backdoor in the sense that it contains broad commands that don't necessarily reveal the intent of the attackers. In short, Kobalos grants remote access to a file system, provides the ability to spawn terminal sessions, and allows proxying connections to other Kobalos-infected servers.

Léveillé: Kobalos is multi-platform malware. We've seen variants for Linux and FreeBSD, and we believe that AIX and Solaris are also targeted, perhaps even Windows, because there are strings in the malware that indicate there might be some Windows version of Kobalos. The reason we started studying Kobalos is that it was very stealthy, and the sophistication they used was pretty good. There was just this single function that needed to be reverse engineered, but it was also very complex because of all the actions that it performed. It just recursively calls itself, and that makes the analysis quite hard.

Vamosi: So Marc-Etienne starts by deconstructing what is known about the malware until he can find out what's truly unique about it, what he can effectively scan for.

Léveillé: Once we figured out the network communication protocol, we realized that we were able to produce some kind of fingerprint to do the internet scanning, and that was our first use case for the internet scanner.

Vamosi: With the scanner up and running, they can now see the prevalence and also who the victims might be.

Léveillé: We knew that it was pretty well done, pretty advanced, but we had no idea who the victims were. We were able to scan for one of the variants, and we found out that the education sector was quite overrepresented among the victims.

Vamosi: When you think about it, it's not too surprising that education is so high up there. Higher education such as universities typically have high performance servers for the research facilities and for the general university system.

Léveillé: So there were high performance computing clusters that were compromised by Kobalos, and actually multiple of them in Europe. After we realized that, we also learned that on most servers that were compromised with Kobalos, there were also some SSH credential stealers. So we believe that they were able to compromise high performance computing clusters because they used credentials that were stolen from one of them to connect to the others.

Vamosi: SSH, or Secure Shell, is a cryptographic network protocol for operating network services securely over an unsecured network. It allows servers to talk securely to each other.

Léveillé: But we also found out that they were not the only ones targeted. There were some other organizations in the US and also one very large ISP in Asia. So we knew that these were really very large systems, you know, not the regular servers that you usually end up with, but we still to this day don't know exactly what their intent was. Because the malware itself has very generic commands: it can read and write files on the file system and just execute arbitrary commands provided by the operator. So we are still unsure about who is behind this whole operation. We're still trying to track them and find out exactly what they're after.

Vamosi: But European education systems were not the only victims of Kobalos. They soon found that one of the victims in the United States was kind of curious.

Léveillé: So in the United States, there were a few personal servers and also an endpoint security vendor, actually doing security. We reached out to them about the compromise, and I know that they cleaned everything up, but we haven't been getting much information about how it was used. I don't know how they investigated on their end. That one was quite surprising. This was a business doing security, and their main website and back end were compromised.

Vamosi: I wondered, could this be used for espionage?

Léveillé: It's possible that's what they were trying to do with Kobalos. But also the available resources on those machines were quite considerable, so perhaps Bitcoin mining, well, cryptocurrency mining, was on their mind. But as far as I know, none of this was performed. You know, it would have been very noisy, and it would have been very easy to find. They were compromised for a very long time, and such activity was never noticed on the compromised systems. So it's still, like I said, totally a mystery, and we couldn't find exactly what they were after.

Vamosi: So when you're scanning the internet, you're looking at a particular port that the malware has identified, and you're scanning for that. Are you using any sort of credentials, any sort of defaults that can start a handshake?

Léveillé: So in the case of Kobalos, there was a password that was required to authenticate. So fortunately, we were able to fingerprint it before the authentication.

Vamosi: So when associating with a server, there are a couple of steps. One can think of it as knocking at the door, and then somebody answers, and that's where Marc-Etienne's team stops. They don't necessarily complete the process of gaining access. They just want to see who's on the other side of the door.

Léveillé: And as a general rule, we never try to authenticate to any servers. Our purpose is not to find devices with a default password or to try to gain access to a system that we're not authorized to authenticate to. So we would never try to send credentials unless it's something that's hardcoded in the malware itself. Then, for us, it wouldn't be something unknown, because it was given in the malware, right? So we don't try to authenticate with default passwords. But in the case where we have a password that's given to us in the samples themselves, then we could use it if necessary.

Vamosi: The point of the scanning after all is to identify how pervasive a given malware is. And sometimes that means finding malware on systems where the victims weren't even aware they had been compromised.

Léveillé: So we won't find anything more from the port itself, but we were able to reach out to some of the victims and ask for additional details. We'll come to them and say, hey, you're a victim of this malware, here's what we know, and perhaps you can look for this or that and give us the results. In some cases, people are really happy that we reach out to them and also very happy to give us additional information about the incident. We're very grateful, and it really helps us progress in our research, because we can learn that Kobalos was used to drop or install additional malware, or worked with some other malware that's either new to us or that we knew before but couldn't find out how it was installed. So whenever we receive additional information about the incidents, it's always a win-win situation, where we can progress and help other victims. Because now, if this victim has this additional malware, then I can go to all the others and say, hey, you know, one of the victims found this additional malware, perhaps you can look at your SSH client, for example, or any other software that's installed.

Vamosi: Not only does scanning serve to expose the existence of new malware, but it can, if we switch to a more active mode, help fill in the infrastructure of new and existing malware families.

Léveillé: So this is different. This is not about finding victims of malware families but finding infrastructure used by such malware families.

Vamosi: It’s funny how the criminal hackers seek to replicate legitimate software development.

Léveillé: A lot of malware operators use scripts and, you know, DevOps principles, and try to deploy servers as easily as possible. This means that in most cases, there's something that's very specific or that repeats itself across all of their infrastructure. So in some cases, it's possible to produce a fingerprint for those CnC servers, scan the whole internet, and try to find what other servers belong to this threat actor. We did that before for Lazarus, for example.

Vamosi: Lazarus is a criminal hacking group that has been traced back to North Korea. It is thought to have been behind the Sony Pictures data breach, the theft of money from the Central Bank of Bangladesh, and even the WannaCry ransomware attack in 2017. Attribution such as this is possible because of fingerprints in the malware code and in the infrastructure found on the internet.

Léveillé: The idea is that if they use some SSL certificate that's the same for all their servers, or if they use some non-standard port for the CnC servers, or any other reason, then it's possible to find them all over the internet. This happened to be quite useful in the past.

Vamosi: Again, this is where you are posing as the malware reaching out with your scan to wake up any CnC servers that might be connected with that malware or the malware family.

Léveillé: In some cases, they use multiple ISPs. I'm not a malware operator myself, so I can only imagine the reasons, but I know that some malware operators prefer to have a server that's closer to their targets. So if they do targeted attacks in, for example, Germany, then they would try to find some server in Germany for this particular cluster of victims. Or, with a lot of ISPs, you can pay in cryptocurrencies, which makes it harder to track who is paying for this particular server. But it really depends on the group of operators that we're looking at. In general, the state-sponsored groups are more careful than the cybercriminal groups. Sometimes the ISP they use is actually part of the fingerprint. So if we know that, for example, an IP address belongs to the Amazon AWS cloud, and this particular threat actor is known to use this particular cloud service, and it matches all the other fingerprints we have, then we are even more confident that it's part of their infrastructure. Because sometimes we don't have the malware sample itself. We just know that it looks like some server that belongs to this threat actor's infrastructure.

Vamosi: Sometimes this process is not so transparent.

Léveillé: Some of them are very difficult to reach, ISPs or hosting companies that hide behind a lot of other company names, and that makes it very difficult for law enforcement to get to the actual physical server. There are quite a few on the internet that do that.

Vamosi: In his talk, Marc-Etienne said, "Don't underestimate the weird things you see on the internet." So I asked him what was behind that statement.

Léveillé: There are a few things. Whenever you scan the internet, be prepared to receive any kind of data or experience any kind of network weirdness. For example, when we scanned for Kobalos, which is the example I used in the presentation, there were a lot of SSH server implementations that are single-threaded. So you basically connect once to it, and even if you disconnect and try to connect again, it won't work, because probably on the device itself there's still some kind of active connection and it doesn't support multiple connections at once. That was one of the problems we had when we scanned for Kobalos, because we had to connect twice to the device.

Vamosi: So the lack of multiple connections on a device was surprising.

Léveillé: The other thing you encounter is that there are a few servers that try to exploit vulnerabilities in the scanners themselves, like Shodan and Censys, for example, in the HTML web page output of the results. Some of them try to inject JavaScript or HTML into any of the available fields in, for example, an HTTP response. So you get those kinds of devices. We don't display our results in HTML pages, so it's not a problem for us, but if you want to display the results in an HTML page, make sure that you actually escape all the characters properly.
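The escaping advice here is easy to show concretely. The sketch below uses Python's standard html.escape on a hostile banner before it is placed in a results table; render_banner_cell is a hypothetical helper, not part of any scanner mentioned in the episode.

```python
import html


def render_banner_cell(banner: bytes) -> str:
    """Turn raw bytes received from a scanned host into a safe HTML table cell."""
    # Undecodable bytes become the Unicode replacement character instead of raising.
    text = banner.decode("utf-8", errors="replace")
    # Escape &, <, > and both quote characters so the browser treats it as text.
    return f"<td>{html.escape(text, quote=True)}</td>"


# A host that answers with markup instead of a normal banner.
hostile = b"<script>alert(1)</script>"
print(render_banner_cell(hostile))
```

The key point is that everything a scanned host sends is untrusted input, including fields that normally look harmless.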

Vamosi: Unlike Censys and Shodan, Marc-Etienne's scanner doesn't have to display the results, so he doesn't have to worry about injections on an HTML page.

Léveillé: There are also servers that send back whatever you send. So that was another one. Basically, once we started scanning for, I don't quite remember what it was exactly, but we were sending something and we were expecting the same thing, or something very similar, in response. We got all those positive results, and when we looked at it, the server was actually just sending back whatever we had sent before.
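One way to weed out such echo servers is to also send a random value that no genuine implant could know, and to discard any host that returns it. The sketch below shows only the classification logic, under the assumption that the scanner can send a second message or make a second connection; make_canary and classify are hypothetical names, and this is not described in the episode as ESET's method.

```python
import secrets


def make_canary(num_bytes: int = 16) -> bytes:
    """Random token that a real service has no reason to send back."""
    return secrets.token_hex(num_bytes).encode("ascii")


def classify(probe_reply: bytes, expected: bytes, canary: bytes, canary_reply: bytes) -> str:
    """Label a host as 'echo', 'match', or 'no-match'."""
    # A host that returns the random canary is just reflecting input.
    if canary and canary in canary_reply:
        return "echo"
    # Only trust the fingerprint once reflection has been ruled out.
    if probe_reply.startswith(expected):
        return "match"
    return "no-match"
```

Checking the canary first matters, because an echo server would otherwise satisfy any fingerprint whose expected reply resembles the probe.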

Vamosi: And then there were academic issues as well.

Léveillé: Other than that, network devices in between, I know, produce some latency in some cases. And for Kobalos, it relied on the source port of the connection, and for some reason some devices didn't like some of the source ports that we tried to use. So there were quite a few things like that, which make it a little bit harder, and you have to filter out more of the results that you get.
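When a fingerprint depends on the source port, the scanner has to bind the local end of the connection before connecting. Python's socket.create_connection accepts a source_address argument for this. The sketch below is a generic illustration with a hypothetical function name; the port to use is whatever the analysis of a given sample calls for and is not specified here.

```python
import socket


def connect_from_port(host: str, port: int, source_port: int, timeout: float = 3.0) -> socket.socket:
    """Open a TCP connection whose local (source) port is fixed.

    Raises OSError if the source port is already in use or the connection fails.
    """
    # "" lets the OS choose the local interface; only the port is pinned.
    return socket.create_connection(
        (host, port),
        timeout=timeout,
        source_address=("", source_port),
    )
```

Because a given source port can only be bound once per local address at a time, this constraint also limits how many such connections a single scanning host can have open in parallel.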

Vamosi: So given the success, does Marc-Etienne think that other companies will start doing this for themselves in the future?

Léveillé: It's certainly useful. And I have no doubt that other companies can profit from doing their own scans as well. Hopefully it will be in a good way. I mean, we've drawn the lines of what we are going to do and what we're not going to do. Hopefully everyone agrees that internet scanning is useful, but also very powerful, perhaps even too much sometimes. If you try to authenticate to different devices or try to execute commands on the malware itself, that could even be illegal in some cases. So yes, there are a lot of reasons to do it, and I'm pretty sure that some other security companies do it as well.

Vamosi: I'd like to thank Marc-Etienne for talking about his talk at SecTor 2021 in Toronto, Ontario. You can learn more about his research on WeLiveSecurity.com. And perhaps, as a result of his work and others', we'll start to see other anti-malware researchers begin to scan the internet on a regular basis. Who knows what they'll find next?

Let's keep the conversation going. DM me @RobertVamosi on Twitter or join me on Reddit or Discord. The deets are available at The Hacker Mind.

The Hacker Mind is brought to you every two weeks, commercial free, by ForAllSecure.

For The Hacker Mind, I remain your war dialing buddy Robert Vamosi.
