September 29, 2009, 9:50 PM -- It's nearly impossible for antivirus protectors to keep pace with malware around the clock, producing descriptions of what each new sample looks or acts like, especially with forty thousand new and unique malware instances appearing every day. And things are only getting worse.
Even though malware tries to hide itself, let's assume that there are secure ways for antivirus protectors to learn about every software installation -- good and bad -- that any of their end users performs. Let's also assume that they could easily collect other data from these machines and users: geographic location, social networking information, type of operating system, installed programs and configurations.
If they could collect this information, it would enable them to quickly identify new malware strains without even looking at the code.
How is this possible?
We'll argue that if you know the circumstances of software installations and executions, then you can often tell what kind of software it is without even looking at the code. This information can auto-inform antivirus protectors. It can be used to provide immediate advice to a client machine, which turns to the "centralized [malware] nervous system" to ask whether a particular piece of code is safe to install or not. Let me provide a few examples of how this could work.
Geographic location. Consider a sequence of installations of some unknown program, performed over a short period of time within a small geographic area. Malware installation patterns, seen as a function of time, typically do not have a strong geographic component. But wait! Some malware does -- for example, malware that spreads over Bluetooth or Wi-Fi channels, infecting machines that are physically close to an already infected one. Of course, if everybody in a large local company patched some software at once, this would also show up as geographically correlated installations. But the same patch is likely to be installed in many other places at the same time, and it will not spread outward from a single point like ripples on water.
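To make the "ripples on water" idea concrete, here is a minimal sketch of one way such a pattern might be scored. It is only an illustration, not a description of any real product: the function names (`haversine_km`, `pearson`, `ripple_score`) and the input format (a list of timestamp and latitude/longitude pairs) are assumptions made for this example. The score is the correlation between how long after the first installation each later installation occurred and how far away it was. A value near 1 suggests spread radiating outward from one point; a value near 0 suggests installations landing everywhere at once, as a widely distributed patch would.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def ripple_score(installs):
    """Hypothetical score for outward geographic spread.

    installs: list of (timestamp_seconds, (lat, lon)) for one unknown program.
    Returns the correlation between elapsed time since the first install
    and distance from the first install's location.
    """
    events = sorted(installs)
    if len(events) < 3:
        return 0.0  # too few points to say anything
    t0, origin = events[0]
    times = [t - t0 for t, _ in events[1:]]
    dists = [haversine_km(origin, loc) for _, loc in events[1:]]
    return pearson(times, dists)

# Installs creeping outward from one spot, as proximity-based spread might.
creeping = [(0, (37.000, -122.0)), (60, (37.001, -122.0)),
            (120, (37.002, -122.0)), (180, (37.003, -122.0)),
            (240, (37.004, -122.0))]

# Installs appearing near and far with no relation to time, as a patch might.
scattered = [(0, (37.00, -122.0)), (60, (40.7, -74.0)),
             (120, (37.01, -122.0)), (180, (51.5, -0.1)),
             (240, (37.02, -122.0))]

print(ripple_score(creeping), ripple_score(scattered))
```

A real system would need far more than this single number (for instance, a tolerance for several simultaneous starting points), but the sketch shows that the signal can be computed from installation times and locations alone, without examining the code.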
Social graph. Now imagine a graph representing all the computers in the world, where two nodes are connected to each other if and only if the owners of the corresponding two machines know each other -- or, more practically, if one of them lists the other in their address book. In plotting installations of new software, does it seem to spread along the edges (the connections between the nodes) of the graph? Several infamous types of malware (like the Melissa virus) did just that, since they spread using the address books of infected machines. We don't need to know what the software looks like or what it does to determine if it is good or bad -- we only need to look at the pattern of installations.
However, a legitimate application -- advertised on a social network and shared between friends -- will also spread along social connections.
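The social-graph signal can be sketched in a few lines as well. The following is a hypothetical illustration, not an established detection method: the function name `social_spread_fraction` and the data layout (a dictionary of address books plus a list of timestamped installations) are assumptions made for this example. It measures what fraction of installations, after the first, happened on a machine that already had an installed contact. As noted above, a high value is consistent with address-book malware and with a legitimate application shared between friends, so it can only be one input among several.

```python
def social_spread_fraction(contacts, installs):
    """Fraction of later installs that landed next to an earlier install.

    contacts: dict mapping a machine to the set of machines in its owner's
              address book (hypothetical format).
    installs: iterable of (timestamp, machine) for one unknown program.
    """
    # Two machines are connected if either one lists the other.
    neighbors = {}
    for owner, book in contacts.items():
        for other in book:
            neighbors.setdefault(owner, set()).add(other)
            neighbors.setdefault(other, set()).add(owner)

    installed = set()
    followed_edge = 0
    later_installs = 0
    for _, machine in sorted(installs):
        if installed:  # skip the very first install; it has no predecessor
            later_installs += 1
            if neighbors.get(machine, set()) & installed:
                followed_edge += 1
        installed.add(machine)
    return followed_edge / later_installs if later_installs else 0.0

address_books = {"a": {"b"}, "b": {"c"}, "c": {"d"}, "x": set(), "y": set()}

# Each install lands on a contact of an earlier install: fraction 1.0.
chain = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]

# Installs on machines with no link to earlier installs: fraction 0.0.
unrelated = [(1, "a"), (2, "x"), (3, "y")]

print(social_spread_fraction(address_books, chain))
print(social_spread_fraction(address_books, unrelated))
```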