Naval researchers pioneer TCP-based spam detection

A group of researchers has built a SpamAssassin plug-in that detects spam by its TCP usage

By Joab Jackson, IDG News Service

A group of researchers from the U.S. Naval Academy has developed a technique for analyzing email traffic in real time to identify spam messages as they come across the wire, using only information from the TCP (Transmission Control Protocol) packets that carry the messages.

Observers argue that this approach could be a useful addition to the arsenal of today's spam-fighting techniques because, unlike most other approaches, it does not require scanning the content of the email.

The work "advanced both the science of spam fighting and ... worked through all the engineering challenges of getting these techniques built into the most popular open-source spam filter," said Massachusetts Institute of Technology computer science research affiliate Steve Bauer, who was not involved with the work. "So this is both a clever bit of research and genuinely practical contribution to the persistent problem of fighting spam."

Researchers Robert Beverly, Georgios Kakavelakis and Joel Young built a plug-in for the SpamAssassin mail filter, called SpamFlow, that incorporates their analysis techniques. They presented their work at the Usenix Large Installation System Administration (LISA) conference earlier this month in Boston.

In the paper that accompanied the presentation, the researchers showed that spam email blasts have certain characteristics at the networking transport layer. Signal analysis of factors such as timing, packet reordering, congestion and flow control can reveal the work of a spam-spewing botnet. "A lot of spam comes from spambots, which are sending as fast as they can and congesting their local uplink," Beverly said. "So you can detect them by looking really hard at the TCP stream."
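To make the idea concrete, here is a minimal sketch of how timing and reordering signals might be summarized for a single mail connection. It is an illustration only, not the researchers' code: the `Packet` record, the `flow_features` function and the three features it returns are hypothetical names chosen for this example, and the input is assumed to be the payload-carrying packets of one flow in arrival order.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Packet:
    timestamp: float  # arrival time in seconds
    seq: int          # TCP sequence number of the first payload byte
    length: int       # payload size in bytes


def flow_features(packets):
    """Summarize one flow as (mean gap, gap jitter, sequence-regression ratio).

    Hypothetical feature set for illustration; sequence-number wraparound
    is ignored to keep the sketch short.
    """
    if len(packets) < 2:
        return {"mean_gap": 0.0, "gap_jitter": 0.0, "regress_ratio": 0.0}

    # Inter-arrival times between consecutive packets.
    gaps = [b.timestamp - a.timestamp for a, b in zip(packets, packets[1:])]

    # A packet whose sequence number does not advance past the highest one
    # seen so far was either retransmitted or delivered out of order.
    max_seq = -1
    regressions = 0
    for p in packets:
        if p.seq <= max_seq:
            regressions += 1
        else:
            max_seq = p.seq

    return {
        "mean_gap": mean(gaps),
        "gap_jitter": pstdev(gaps),
        "regress_ratio": regressions / len(packets),
    }
```

A flow from a host with a congested uplink would be expected to show larger, more variable gaps and more sequence regressions than a flow from a well-provisioned mail server, which is the kind of difference a classifier can then pick up.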

Until now, techniques for analyzing spam at the network transport layer have worked offline: the email traffic is analyzed as a batch, and the results can be used only later. The naval researchers have developed an architecture for analyzing network traffic as it comes over the wire.

For the implementation, they used the SpamAssassin email filter, which has a plug-in architecture for incorporating new filtering techniques. "We have a daemon that captures all the packets and looks [at] timing and other congestion characteristics of the traffic stream," Beverly said. The plug-in can learn to identify and detect spam without human intervention. In tests, SpamFlow correctly identified spam more than 95 percent of the time after receiving 1,000 emails.
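As a rough sketch of what learning incrementally from a stream of mail could look like, the following assumes a Gaussian Naive Bayes model updated one flow at a time. The model choice, the `OnlineGaussianNB` class and the two-number feature vectors are assumptions made for illustration; this is not SpamFlow's actual code. Here the caller supplies the labels, and how a deployed system would obtain them automatically is left open.

```python
import math
from collections import defaultdict


class OnlineGaussianNB:
    """Gaussian Naive Bayes that updates per-class statistics one sample at a time."""

    def __init__(self, n_features):
        self.count = defaultdict(int)
        self.mean = defaultdict(lambda: [0.0] * n_features)
        self.m2 = defaultdict(lambda: [0.0] * n_features)  # sum of squared deviations

    def learn(self, x, label):
        # Welford's running update: no past samples need to be stored.
        self.count[label] += 1
        n = self.count[label]
        mean, m2 = self.mean[label], self.m2[label]
        for i, v in enumerate(x):
            delta = v - mean[i]
            mean[i] += delta / n
            m2[i] += delta * (v - mean[i])

    def log_score(self, x, label):
        n = self.count[label]
        score = math.log(n / sum(self.count.values()))  # class prior
        for i, v in enumerate(x):
            var = self.m2[label][i] / n + 1e-6  # small floor avoids zero variance
            diff = v - self.mean[label][i]
            score -= 0.5 * (math.log(2 * math.pi * var) + diff * diff / var)
        return score

    def predict(self, x):
        # Requires at least one learned sample; raises ValueError otherwise.
        return max(self.count, key=lambda label: self.log_score(x, label))
```

Each call to `learn` folds one more flow into the running statistics, so the model can keep adapting as mail arrives rather than waiting for an offline batch.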

The ability to detect a spam message without actually examining the contents of the message would be handy in a number of situations, noted Bruce Davie, a Cisco fellow and visiting lecturer at MIT. Davie is familiar with the work, though he was not involved in it. An Internet service provider could apply the detection algorithm without violating users' privacy. It can be applied to messages that are encrypted, such as those traveling over an encrypted link. It can also be used to detect other forms of malicious traffic, such as port scans from botnet hosts.

"Overall, I see it as a generally useful tool in the fight against malicious traffic," Davie said. "You can combine it with traditional anti-spam techniques to improve accuracy."

Currently, the team is beta testing the software at a number of locations. They plan to release it as open-source software afterward.

The U.S. National Science Foundation funded part of this work, under the Software Development for Cyberinfrastructure (SDCI) program.

Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson. Joab's e-mail address is Joab_Jackson@idg.com

3 comments

    Ray 8 weeks ago
    Dave, before you make any negative criticisms, you ought to read the paper first. First, Sendmail's "ratecontrol" setting only provides a simple threshold on the number of incoming packets. This means that for every server you set up, you have to manually set a threshold that works. This is only effective against DDoS-like attacks. There is no study of the effectiveness of this threshold either.

    The Naval paper, on the other hand, describes a method to classify "patterns" of timings using Naive Bayes classifiers and SVMs. The pattern of timings between packets is learned to determine what is normal and what is spam. This is truly unique compared to what exists in the world today. RTF paper before you spew false and destructive criticisms.
    DaveGillam_tw155432447 8 weeks ago in reply to Ray
    Ray, I believe you're talking about the MTA ratecontrol setting, not the commercial Rate Control milter I referred to. While I agree that the milter does not time TCP packets extensively, it has proven itself at many companies in my experience as being easy to set up and manage, and very effective at stopping spam floods and runaway application mail. I have now had time to re-read the Navy approach, and agree they are doing something a bit different, but the end result seems to be largely the same. You build a database of what specific IPs are doing, and stop them abusing you at the front end of the email relay process.

    One problem with learning what is normal from a TCP timing perspective arises when the sending site has multiple outbound hosts, and normally all of them share the load--then something happens to one or more of them, and the remaining ones take over the whole load. You end up with IPs that suddenly are sending lots more traffic than usual. A similar situation occurs when the sending site is transitioning to new gear. Generally the new gear is brought online, but only processes a minority of email. Then production cutover happens, and the new gear takes on the whole email load. You need to build in safeguards for these types of normal activity.
    This REALLY sounds like a copy of Sendmail Inc.'s Rate Control component, which has been deployed to many sites for the last several years. Rate Control allows the admin to throttle or otherwise block email that breaks various TCP-related thresholds (messages/second, bad recipients/second, connections/second, etc.). Further, recent real world indications show that spammers are sending fewer spams per second from individual IP addresses--they make up the volume by increasing the size of the botnet, and coordinating activity so that not too many bots hit the same relay at the same time. This is why Rate Control added an IP Reputation subcomponent a couple of years ago.

    It appears these Navy guys have simply come up with a tool that has already existed for years.
