Auto-detecting malware? It's possible

By Markus Jakobsson | Security, antivirus, PARC

It's nearly impossible for antivirus protectors to keep up with the pace of malware, producing descriptions around the clock of what each new sample looks or acts like, especially with forty thousand new and unique malware instances appearing every day. And things are only getting worse.

Despite the fact that malware wants to hide itself, let's assume that there are secure ways for antivirus protectors to learn about all installations of software -- good and bad -- that their end users perform. Let's also assume that they could easily collect other data from these machines and users: geographic location, social networking information, type of operating system, installed programs and configurations.

If they could collect this information, it would enable them to quickly identify new malware strains without even looking at the code.

How is this possible?
We'll argue that if you know the circumstances of software installations and executions, then you can often tell what kind of software it is without even looking at the code. This information can auto-inform antivirus protectors. It can be used to provide immediate advice to a client machine, which turns to the "centralized [malware] nervous system" to ask whether a particular piece of code is safe to install or not. Let me provide a few examples of how this could work.

Geographic location. Consider a sequence of installations of some unknown program, performed over a short period of time within a small geographic area. Malware installation patterns, seen as a function of time, typically do not have a strong geographic component. But wait! Some malware does -- for example, malware that spreads over Bluetooth or Wi-Fi channels, infecting machines physically close to those already infected. Of course, if everybody in a large local company patched some software at once, this would also show up as geographically correlated installations. But the same patch is likely to be installed in many other places at the same time, and will not spread outward like ripples on water.
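To make the geographic signal concrete, here is a minimal Python sketch. It is not taken from any antivirus product; the function names, the 5 km threshold and the minimum of three installations are all assumptions chosen for illustration. It measures how tightly a batch of installations seen in one short time window is packed together.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def mean_pairwise_km(locations):
    """Average distance over all pairs of install locations (0.0 if fewer than two)."""
    pairs = list(combinations(locations, 2))
    if not pairs:
        return 0.0
    return sum(haversine_km(a, b) for a, b in pairs) / len(pairs)


def looks_geographically_clustered(locations, threshold_km=5.0):
    """Flag a batch of installs from one time window as tightly clustered.

    threshold_km and the minimum of three installs are illustrative assumptions.
    """
    return len(locations) >= 3 and mean_pairwise_km(locations) < threshold_km
```

A clustered batch on its own is not a verdict: as noted above, a company-wide patch also clusters locally, so a real system would additionally check whether the same program is being installed in many other places at the same time.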

Social graph. Now imagine a graph representing all the computers in the world, where two nodes are connected to each other if and only if the owners of the corresponding two machines know each other -- or, more practically, if one of them lists the other in their address book. In plotting installations of new software, does it seem to spread along the edges (the connections between the nodes) of the graph? Several infamous types of malware (like the Melissa virus) did just that, since they spread using the address books of infected machines. We don't need to know what the software looks like or what it does to determine if it is good or bad -- we only need to look at the pattern of installations.

However, a legitimate application -- advertised on a social network and shared between friends -- will also spread along social connections. But consider the speed at which the Melissa virus spread: a moment after a machine was infected, 50 emails were sent out to people in the address list. No matter how much users love the app their friends told them about, they are unlikely to all act that fast. And while malware writers can artificially slow down the spread to avoid automated detection, doing so gives antivirus companies more time to distribute patches.
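The two social-graph signals described above, spreading along connections and the speed of that spread, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data layout (a list of `(machine, time)` install events and a `contacts` dictionary mapping each machine to the machines it is connected to) and the function name are hypothetical, not part of any existing system.

```python
def social_spread_profile(installs, contacts):
    """Summarize how an unknown program spreads over the contact graph.

    installs: iterable of (machine_id, install_time) pairs (assumed layout).
    contacts: dict mapping machine_id -> set of connected machine_ids (assumed layout).

    Returns (fraction, delays), where fraction is the share of installs after the
    first one that happened on a machine with an already-installed contact, and
    delays lists, for each such install, the time since the most recent install
    among its contacts.
    """
    ordered = sorted(installs, key=lambda event: event[1])
    install_time = {}  # machine_id -> time at which it installed the program
    followed = 0
    delays = []
    for machine, when in ordered:
        earlier = [install_time[c] for c in contacts.get(machine, ()) if c in install_time]
        if earlier:
            followed += 1
            delays.append(when - max(earlier))
        install_time[machine] = when
    later_installs = len(ordered) - 1
    fraction = followed / later_installs if later_installs > 0 else 0.0
    return fraction, delays
```

A high fraction combined with very short delays matches the Melissa-style pattern. A high fraction with delays of hours or days looks more like friends recommending an app to each other. Any cut-off between the two would have to be tuned on real data.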

Time. Automated patching occurs around the clock, and worms infect no matter what time of day. But a Trojan, for example, depends on its victim being awake – the user has to approve its installation. Roughly speaking, if the malware takes advantage of a machine vulnerability, it often will spread independently of the local time of the day (to the extent that people leave their machines on, of course), whereas malware that relies on human vulnerabilities will depend on the time of the day (as does most legitimate software).
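One rough way to test the time-of-day argument is to ask how many installations happen while the local user is probably asleep. The Python sketch below assumes install times have already been converted to the local hour of each machine. The 01:00 to 05:59 "asleep" window and the `min_ratio` cut-off are arbitrary assumptions for illustration.

```python
# Assumed "user is asleep" window: local hours 1 through 5 (01:00-05:59).
NIGHT_HOURS = frozenset(range(1, 6))


def night_fraction(local_hours):
    """Share of installs whose local hour (0-23) falls inside NIGHT_HOURS."""
    if not local_hours:
        return 0.0
    return sum(1 for hour in local_hours if hour in NIGHT_HOURS) / len(local_hours)


def spreads_around_the_clock(local_hours, min_ratio=0.5):
    """True if night-time installs are at least min_ratio of what a uniform spread implies.

    Under a perfectly uniform spread, len(NIGHT_HOURS) / 24 of installs fall at night.
    min_ratio is an illustrative cut-off, not a calibrated value.
    """
    expected = len(NIGHT_HOURS) / 24
    return night_fraction(local_hours) >= min_ratio * expected
```

Software that installs around the clock is either exploiting a machine vulnerability or being pushed by automated patching, so this signal separates human-driven installs from machine-driven ones rather than good from bad. It needs to be combined with the other signals.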

Behavior. Malware typically behaves in a very static manner – either it uses the address book, or it does not; either it spreads over Bluetooth, or it does not; and so on. But legitimate software is different. Think of a game: a small number of enthusiasts play it, and tell their friends about it. Then, a local newspaper picks up a story about the game, and lots of people in the city where the newspaper is published – whether they know each other or not – start playing it. Some of them are in the same neighborhood, others are miles away from the closest person who also installed the game. The patterns change for many kinds of legitimate software, but not for typical malware.

Yield. This is the term used to measure the chances that a machine that could become infected actually does become infected. Or, for legitimate software, the chances that a person who is given the opportunity to install the software decides to do so.
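As a small worked example, yield can be computed as the share of exposed machines that actually installed the program. The Python sketch below uses a hypothetical function name and assumes that "exposed" means machines known to have been offered the software, for instance the recipients of an infected machine's emails.

```python
def install_yield(exposed, installed):
    """Fraction of exposed machines that went on to install the program.

    exposed: machines that had the opportunity to install (assumed to be known).
    installed: machines observed installing the program.
    Returns None when nothing was exposed, since the ratio is undefined.
    """
    exposed = set(exposed)
    if not exposed:
        return None
    return len(exposed & set(installed)) / len(exposed)
```

With made-up numbers: if 50 machines receive a message and 45 of them end up with the program, the yield is 0.9. If only 4 of 50 recipients choose to install it, the yield is 0.08.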

5 comments

    Anonymous 2 years ago
    File signatures change. We see it, there it is, with looking only where it came from, and especially if the main virus loading the malware is already a DLL replaced and not installed on the machine, then the malware can be blocks of code hiding within files and then reconstructed on the other end. Does the author know anything about what he is writing? I have a hard time believing he is a security expert working at one of the big companies. You can give a probability of infection of PCs, but to really know you need to look at the files, else there are too many ways to hide such info. You would have to be starting with a machine you could guarantee was not infected and move from there. This is impossible; even some NEW computers come with a boot virus built in from China (Lenovo, we are talking about you!).
    Anonymous 2 years ago in reply to Anonymous
    You look at all events -- installation of a new app, execution of another one, changes of parameters, ... additions of DLLs. The way, of course, that you identify these pieces of code cannot be by their name -- the malware author would simply name his code in a convenient manner. No, it is by a hash or another code signature. And yes, sometimes you may not know for sure (just as traditional AV results in false positives, and loads of false negatives). But if you can give each machine a score ... say "30% risk" ... that is a huge step forward.
    Anonymous 2 years ago
    I may not be understanding some of your concepts, but I don't think you could get the statistics for what you want to do. Yes, this might detect some of the more brazen malware, but what I want to see is a bot that is so smart it talks through forums, emulates human typing, posts pictures and uses steganography to communicate through web apps. All the data coming into and out of the server looks like real user data. It plays when the user is on and hides when they are idle. I believe that there is a perfect infection, one that can never be detected once it has been installed. One idea I did like was the point that no regular code should be polymorphic. Well, games like Counter Strike use a form of it to do copyright protection, but if all the app writers sign their code then you just don't trust any unsigned code. Then you only run trusted signed apps: I add Google's cert and only Google code can run, so if I have an app some dev wrote, I have to add his trusted cert to run the code. I think we already do a form of this, but if we stuck tight on it and never ran unsigned code we would be cool. It would work, and Counter Strike would be OK: they make their metamorphic code and sign it, and the trusted cert lets it run. Not signed, never going to run; it's that easy. If the hash for the program doesn't match the cert info, it doesn't run. Really, how many companies would you need to add to the trusted list? I don't know, I'm not a security guy yet, but I just think that all this info would be hard to collect and wouldn't give you the info you need to properly detect stuff and not get a million false positives. Everyone is becoming a developer; that's why the number of viruses written every day is growing exponentially. But deciding what is "good" code or "bad" code will never be 100%. Heuristics may catch some, possibly even most, but you can never catch it all.
    Anonymous 2 years ago
    There are some excellent ideas here. In fact, similar ideas are in development at several major anti-virus vendors right now, according to papers presented at last week’s Virus Bulletin conference in Geneva. However, the approach is not without its problems. Although it is technically possible, and not really difficult, to observe all software installations on a computer, there are serious privacy and trust issues to contend with. This is especially true in the corporate arena, where there are very good reasons why a company would not want any outside vendor, even their security vendor, to know exactly what is going on inside their network and what the contents of everyone’s address books are. If such a system is to take advantage of geographical and social information, the data cannot be truly anonymized. Treating the new and uncommon with suspicion is reasonable, but there are necessary exceptions. For example, documents used in targeted malware attacks can easily carry malicious payloads but are unlikely to have a wide distribution or a profile unlike that of legitimate documents. Using the differences in behavior of malware when compared with legitimate software is one of the ways that current solutions can distinguish good from bad, even without the information gathered from other computers. Sophos do this as part of our HIPS technology (http://www.sophos.com/security/sophoslabs/sophos-hips/). Extending the technique to cover information from multiple computers when making a threat assessment can provide both improved proactive protection and faster reaction times, but will not be the silver bullet that solves the malware problem.
    Anonymous 1 year ago in reply to Anonymous
    First off, much love to Sophos antivirus. Second, I think you make a great point here, Richard. This methodology will always be a good complement to other methodologies, but I can't see it ever being a complete stand alone solution.
