Auto-detecting malware? It's possible

September 29, 2009, 08:50 PM

It's nearly impossible for anti-virus vendors to keep pace with malware by producing, around the clock, descriptions of what each new strain looks or acts like, especially with forty thousand new and unique malware instances appearing every day. And things are only getting worse.

Malware tries to hide itself, but let's suppose there are secure ways for anti-virus vendors to learn about every software installation, good and bad, that any of their end users perform. Let's also assume that they could easily collect other data from these machines and users: geographic location, social networking information, operating system type, and installed programs and configurations.

If they could collect this information, it would enable them to quickly identify new malware strains without even looking at the code.

How is this possible?
We argue that if you know the circumstances in which software is installed and executed, you can often tell what kind of software it is without looking at the code. This information can automatically inform anti-virus vendors. It can also provide immediate advice to a client machine, which queries the "centralized [malware] nervous system" to ask whether a particular piece of code is safe to install. Here are a few examples of how this could work.
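A minimal sketch of the client-side lookup described above, in Python. Everything here is an illustrative assumption rather than part of the original proposal: the names `CentralService`, `Verdict`, `digest_of` and `should_install` are hypothetical, programs are identified by their SHA-256 digest, the "nervous system" is an in-memory table instead of a network service, and holding back unknown programs is just one possible client policy.

```python
import hashlib
from enum import Enum


class Verdict(Enum):
    SAFE = "safe"
    MALICIOUS = "malicious"
    UNKNOWN = "unknown"


class CentralService:
    """Hypothetical stand-in for the centralized "nervous system".

    Verdicts are keyed by the SHA-256 digest of a program. In the scheme the
    article describes they would be derived from installation context
    (location, social graph and so on); here they are simply recorded.
    """

    def __init__(self) -> None:
        self._verdicts: dict[str, Verdict] = {}

    def record_verdict(self, digest: str, verdict: Verdict) -> None:
        self._verdicts[digest] = verdict

    def query(self, digest: str) -> Verdict:
        # Programs the service has never assessed come back as UNKNOWN.
        return self._verdicts.get(digest, Verdict.UNKNOWN)


def digest_of(program: bytes) -> str:
    """Identify a program by the SHA-256 digest of its contents."""
    return hashlib.sha256(program).hexdigest()


def should_install(service: CentralService, program: bytes) -> bool:
    """Client-side check before installing (hypothetical policy).

    Only programs explicitly marked SAFE are installed; MALICIOUS and
    UNKNOWN programs are both held back.
    """
    return service.query(digest_of(program)) is Verdict.SAFE


if __name__ == "__main__":
    service = CentralService()
    service.record_verdict(digest_of(b"legitimate installer"), Verdict.SAFE)
    service.record_verdict(digest_of(b"worm payload"), Verdict.MALICIOUS)
    print(should_install(service, b"legitimate installer"))  # True
    print(should_install(service, b"worm payload"))          # False
    print(should_install(service, b"never seen before"))     # False
```

The client sends only a digest, never the code itself, which keeps the query small; whether unknown programs should be blocked or merely flagged is a policy decision the article leaves open.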

Geographic location. Consider a sequence of installations of some unknown program, performed over a short period of time within a small geographic area. Malware installation patterns, seen as a function of time, typically do not have a strong geographic component. But wait! Some malware does: for example, malware that spreads over Bluetooth or Wi-Fi, infecting nearby machines. Of course, if everybody in a large local company patched some software at once, this would also show up as geographically correlated installations. But the same patch is likely to be installed in many other places at the same time, and it will not spread outward like rings on water.
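The "rings on water" pattern could be tested roughly as follows. This is a sketch under stated assumptions, not a production detector: `Install`, `looks_like_ripple` and the thresholds (a 5 km radius and a time-to-distance correlation of at least 0.8) are hypothetical choices. It flags installations that stay local and move steadily outward from the first one, which is what proximity-spreading malware would produce; a company-wide patch that is simultaneously installed elsewhere fails the locality test.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0


@dataclass
class Install:
    t: float    # seconds since some reference time
    lat: float  # degrees
    lon: float  # degrees


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def looks_like_ripple(installs: list[Install],
                      max_radius_km: float = 5.0,
                      min_corr: float = 0.8) -> bool:
    """Flag installs that stay local AND spread outward from the first one."""
    if len(installs) < 3:
        return False  # too few points to see a pattern
    ordered = sorted(installs, key=lambda i: i.t)
    origin = ordered[0]
    dists = [haversine_km(origin.lat, origin.lon, i.lat, i.lon) for i in ordered]
    local = max(dists) <= max_radius_km
    # Outward spread: distance from the origin grows with time.
    outward = pearson([i.t for i in ordered], dists) >= min_corr
    return local and outward


if __name__ == "__main__":
    ripple = [Install(0, 40.0, -74.0), Install(60, 40.005, -74.0),
              Install(120, 40.01, -74.0), Install(180, 40.02, -74.0)]
    patch = [Install(0, 40.7, -74.0), Install(0, 51.5, -0.1),
             Install(5, 40.71, -74.0), Install(5, 48.85, 2.35)]
    print(looks_like_ripple(ripple))  # True
    print(looks_like_ripple(patch))   # False
```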

Social graph. Now imagine a graph representing all the computers in the world, where two nodes are connected if and only if the owners of the corresponding machines know each other (or, more practically, if one of them lists the other in his address book). When you plot installations of new software, does the software seem to spread along the edges (the connections between the nodes) of the graph? Several infamous pieces of malware, such as the Melissa virus, did just that, since they spread using the address books of infected machines. We don't need to know what the software looks like or what it does to determine whether it is good or bad; we only need to look at the pattern of installations.

However, a legitimate application that is advertised on a social network and shared between friends will also spread along social connections.
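One way to quantify "spreading along the edges" is to measure how often a machine installs a program after one of its contacts already has. A rough sketch, assuming address books are available as sets of machine identifiers and installations as (time, machine) pairs; `build_contact_graph` and `graph_following_ratio` are hypothetical names. As noted above, a high ratio fits both address-book worms and legitimately shared applications, so it is a signal to combine with others rather than a verdict on its own.

```python
from collections.abc import Iterable
from itertools import groupby


def build_contact_graph(address_books: dict[str, set[str]]) -> dict[str, set[str]]:
    """Undirected graph: two machines are linked if either lists the other."""
    graph: dict[str, set[str]] = {}
    for owner, entries in address_books.items():
        for contact in entries:
            graph.setdefault(owner, set()).add(contact)
            graph.setdefault(contact, set()).add(owner)
    return graph


def graph_following_ratio(contacts: dict[str, set[str]],
                          installs: Iterable[tuple[float, str]]) -> float:
    """Fraction of installs preceded (strictly earlier) by a contact's install.

    Installs in the earliest time slot have nothing before them and are
    not counted. Returns 0.0 when no install can be assessed.
    """
    installed: set[str] = set()
    followed = considered = 0
    # Group simultaneous installs so they cannot "follow" each other.
    for _, group in groupby(sorted(installs), key=lambda ev: ev[0]):
        machines = [m for _, m in group]
        for machine in machines:
            if installed:
                considered += 1
                if contacts.get(machine, set()) & installed:
                    followed += 1
        installed.update(machines)
    return followed / considered if considered else 0.0


if __name__ == "__main__":
    g = build_contact_graph({"alice": {"bob"}, "bob": {"carol"},
                             "carol": {"dave"}, "eve": set()})
    chain = [(1, "alice"), (2, "bob"), (3, "carol"), (4, "dave")]
    scattered = [(1, "alice"), (2, "dave"), (3, "eve")]
    print(graph_following_ratio(g, chain))      # 1.0
    print(graph_following_ratio(g, scattered))  # 0.0
```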

Comments

Useful but not the holy grail

There are some excellent ideas here. In fact, similar ideas are in development at several major anti-virus vendors right now, according to papers presented at last week’s Virus Bulletin conference in Geneva.
However, the approach is not without its problems. Although it is technically possible, and not really difficult, to observe all software installations on a computer, there are serious privacy and trust issues to contend with. This is especially true in the corporate arena: there are very good reasons why a company would not want any outside vendor, even its security vendor, to know exactly what is going on inside its network and what everyone’s address books contain. If such a system is to take advantage of geographical and social information, the data cannot be truly anonymized.
Treating the new and uncommon with suspicion is reasonable, but there are necessary exceptions. For example, documents used in targeted malware attacks can easily carry malicious payloads, yet, like legitimate documents, they are unlikely to be widely distributed or to have a distinctive profile.
Using the differences in behavior between malware and legitimate software is one of the ways that current solutions can distinguish good from bad, even without information gathered from other computers. Sophos do this as part of our HIPS technology (http://www.sophos.com/security/sophoslabs/sophos-hips/). Extending the technique to include information from multiple computers when making a threat assessment can provide both improved proactive protection and faster reaction times, but it will not be the silver bullet that solves the malware problem.

I don't agree

I may not be understanding some of your concepts, but I don't think you could get the statistics you need for this. Yes, this might detect some of the more blatant malware, but what I want to see is a bot so smart that it talks through forums, emulates human typing, posts pictures, and then uses steganography to communicate through web apps. All the data coming into and out of the server looks like real user data. It plays when the user is on and hides when they are idle. I believe there is a perfect infection, one that can never be detected once it has been installed.

One idea I did like was the point that no regular code should be polymorphic. Well, games like Counter-Strike use a form of it for copy protection, but if all app writers signed their code, you could simply refuse to trust any unsigned code. Then you only run trusted, signed apps: I add Google's certificate, and only Google's code can run. If I have an app some developer wrote, I have to add his trusted certificate to run the code. I think we already do a form of this, but if we stuck to it tightly and never ran unsigned code, we would be fine. It would work, and Counter-Strike would be OK: they make their metamorphic code, sign it, and the trusted certificate lets it run. Not signed, never going to run; it's that easy. If the hash of the program doesn't match the certificate information, it doesn't run. Really, how many companies would you need to add to the trusted list?

I dunno, I'm not a security guy yet, but I just think all this information would be hard to collect and wouldn't give you what you need to properly detect things without getting a million false positives. Everyone is becoming a developer; that's why the number of viruses written every day is growing exponentially. But deciding what is "good" code or "bad" code will never be 100% accurate. Heuristics may catch some, possibly even most, but you can never catch it all.
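The allowlist described in this comment (trust a publisher, refuse unsigned code, refuse code whose hash no longer matches) can be sketched like this. It is a simplification: HMAC-SHA256 from Python's standard library stands in for a real public-key code-signing signature, so unlike real systems the verifier here holds the publisher's secret key. `TrustStore`, `sign` and the publisher name are hypothetical.

```python
import hashlib
import hmac
from typing import Optional


def sign(key: bytes, program: bytes) -> bytes:
    """Publisher side: tag the SHA-256 digest of the program (HMAC stand-in)."""
    digest = hashlib.sha256(program).digest()
    return hmac.new(key, digest, hashlib.sha256).digest()


class TrustStore:
    """Hypothetical allowlist of trusted publishers.

    Real code signing verifies a public-key signature against the
    publisher's certificate; this sketch uses a shared HMAC key instead.
    """

    def __init__(self) -> None:
        self._keys: dict[str, bytes] = {}

    def add_publisher(self, name: str, key: bytes) -> None:
        # Analogous to adding a publisher's certificate to the trusted list.
        self._keys[name] = key

    def may_run(self, program: bytes, publisher: Optional[str],
                tag: Optional[bytes]) -> bool:
        # Unsigned code, or code from an untrusted publisher, never runs.
        if publisher is None or tag is None or publisher not in self._keys:
            return False
        # A modified program hashes differently, so its tag no longer matches.
        expected = sign(self._keys[publisher], program)
        return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    store = TrustStore()
    key = b"publisher-secret"
    store.add_publisher("ExampleSoft", key)
    build = b"game build 1"
    tag = sign(key, build)
    print(store.may_run(build, "ExampleSoft", tag))         # True
    print(store.may_run(build + b"!", "ExampleSoft", tag))  # False
    print(store.may_run(build, None, None))                 # False
```

Because every build gets its own tag, a publisher shipping metamorphic builds can sign each variant, while any tampered copy is rejected.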

Addressing the privacy question

Richard, I agree with some of your points, but respectfully disagree with others.

It is certainly true that there are several recent ideas of a similar nature. Symantec's Quorum product is probably one of the first to hit the market; it uses the reputation of a piece of software to assess whether it is malware. A new piece of software is considered bad until proven good.
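The "bad until proven good" idea can be sketched as a reputation table keyed by program digest. This is not how Symantec's Quorum works internally, which the comment does not describe; it is an illustrative assumption in which a program is trusted only once it has been reported by enough distinct machines over a long enough period. `ReputationDB`, `Reputation` and both thresholds are hypothetical.

```python
from dataclasses import dataclass, field

SECONDS_PER_DAY = 86400.0


@dataclass
class Reputation:
    first_seen: float                                # epoch seconds of first report
    machines: set[str] = field(default_factory=set)  # distinct reporting machines


class ReputationDB:
    """Toy reputation store: unknown or rare software is 'bad until proven good'."""

    def __init__(self, min_machines: int = 1000, min_age_days: float = 7.0) -> None:
        self.min_machines = min_machines
        self.min_age_s = min_age_days * SECONDS_PER_DAY
        self._records: dict[str, Reputation] = {}

    def report(self, digest: str, machine: str, now: float) -> None:
        # The first report fixes first_seen; later reports only add machines.
        rec = self._records.setdefault(digest, Reputation(first_seen=now))
        rec.machines.add(machine)

    def is_trusted(self, digest: str, now: float) -> bool:
        rec = self._records.get(digest)
        if rec is None:
            return False  # never seen: bad until proven good
        old_enough = now - rec.first_seen >= self.min_age_s
        widespread = len(rec.machines) >= self.min_machines
        return old_enough and widespread
```

Requiring both age and prevalence means a fast-spreading new worm is not trusted merely because it is common, and an old but rare file is not trusted merely because it has been around.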

A paradigm shift towards centralized detection, though, is close to the holy grail, at least as far as mobile malware goes. Mind you, many believe that in two to three years there will be more smartphones than PCs. Those will be the machines targeted by malware writers. And here's the catch: today's anti-virus approach does not scale to a large volume of threats on mobile phones, since detection would be too resource-intensive. (Read: it will drain your batteries.)
Therefore, I believe we need to push everything we can off into the cloud, to use a common buzzword.

Now, about privacy. Is it actually possible to truthfully report any threat to a client machine without severely affecting the privacy of the user and his organization? At first it may seem that you cannot, but you actually can. I will present how this can be done at the members-only APWG '09 meeting in about two weeks. I will also talk about retroactive detection, and how one can guarantee detection even for malware that infected a machine *before* the detection program was installed. (Yes, this sounds absurd, and I fully understand if you doubt it.)

If you want a preview, contact me, and we can talk.

Cheers,
Markus