To retrieve stolen data: Have it phone home to tell you where it is.

If stolen smartphones and wads of cash from banks can do it, data can too

What if you had Lojack or PC Phone Home for private data?

Private data on public sites has the same problem as a tourist carrying cash: it's convenient for the owner, but even more so for a thief. Once it's stolen, there's no practical way to track it back to the culprit, let alone make it self-destruct so whoever stole it can't get any use out of it.

Both those things actually are possible, according to J. Oquendo, senior security architect at E-Fensive Security Strategies and a blogger at Infosec Island.

Attention and activity around digital security from business, media and government are growing so quickly that the field is becoming what Oquendo called a "Cybersecurity Industrial Complex." Even so, very little information has become public that could identify the attackers in the Shady Rat report, the Sony attacks or any other major digital assault.

The most common information is the "source" IP addresses from which the attacks are supposed to have originated.

U.S. military cybersecurity groups have frequently attributed long-term, sophisticated attacks to specific cyberwarfare groups in China, as have reports that combine IP addresses and common sense reasoning to convict China on the basis of what may be falsified or misunderstood circumstantial evidence.

Saying an attack on the UN came from a group of IP addresses within China and that much of the data taken reflected China's interest in internal Taiwanese politics sounds convincing as a sound bite, but wouldn't stand up under cross-examination in a U.S. courtroom.

IP addresses not only can be spoofed, they have to be by anyone hoping for more than a one-attack career. Spoofed IPs, remotely controlled zombies rented from botnets, commercial or free proxies, proxies that are free because their owners don't know they're participating in cyberwarfare and a dozen other techniques can hide the source of an attack behind one wall of fakery after another.

The South China Morning Post reported China suffered 500,000 cyberattacks during 2010, Oquendo writes.

That doesn't mean it actually was hit by that many attacks, or that China is actually the scapegoat for some other international sponsor of cyberespionage rather than the source of much of it.

It does mean much of the evidence against China is circumstantial.

Making it concrete – being able to say for sure who stole a piece of data and to trace it from its source to the storage sites where it is eventually stashed – would shed a comparatively blinding amount of light on who commits data theft, where the data goes and what is done with it.

If it were possible to build a full-scale validation attribution framework, every unit of data – a record, a gigabyte, a chunk of storage space that could live in a metadata container able to contain data of many types – would have a beacon that would activate like a homing device if it were stolen, Oquendo suggests.

A data container able to identify when it had been moved from an approved location and send a stream of self-identifying packets back home would not require that much intelligence or additional storage overhead.

At least, it wouldn't if the number of containers were relatively small. That means the containers themselves would have to be really big, which raises the possibility that a container could be cracked, the data removed and the beacon function lost.

If the containers were small enough for single documents, the potential for huge increases in storage and bandwidth requirements might pose such huge cost penalties as to kill any project before it got started. (More globally, we probably don't want to give all those Word and Excel files more power to talk to each other, anyway. They cause enough trouble with the little they're already able to say.)
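As a rough illustration of the container idea, here is a minimal Python sketch, not anything Oquendo describes in detail. It assumes some runtime opens the container and runs the check on its behalf, since a file cannot run code on its own. The names `BeaconContainer`, `approved_hosts` and `send_beacon` are all hypothetical, and so is the choice of a single UDP datagram as the "phone home" message.

```python
import json
import socket
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class BeaconContainer:
    """Hypothetical wrapper that knows where its data is allowed to live."""
    container_id: str
    approved_hosts: FrozenSet[str]
    payload: bytes

    def beacon_message(self, current_host: str) -> Optional[bytes]:
        """Return a self-identifying message if the host is not approved."""
        if current_host in self.approved_hosts:
            return None  # still at home, stay quiet
        report = {"container_id": self.container_id, "seen_on": current_host}
        return json.dumps(report, sort_keys=True).encode("utf-8")


def send_beacon(message: bytes, home: Tuple[str, int]) -> None:
    """Send one UDP datagram to the owner's collector (assumed address)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, home)
```

The sketch also shows the weakness noted above: anyone who extracts `payload` from the wrapper leaves the beacon logic behind.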

Oquendo's suggestion is more refined, and it builds on something already more widely deployed.

Not only would adding beacons to every document be expensive, it would cause a huge shift in the signal/noise ratio of the Internet as well as capacity problems as millions of documents try continually to send notes home.

A more elegant approach is to add what he calls "loaded cookies" that attach themselves permanently to files and identify them when they're found.

It wouldn't require extravagant effort or even new technology. The open-source Honeynet Project created honeytokens as an integral part of its trap-the-intruder strategy in 2003.

Honeytokens are good tools for tracing and proving the ownership of data because they can be created to be unique and embedded within almost any set of protected data.

They're completely separate entities, with no real value of their own except a unique identifier the owner can use to track them down. They could be fake credit-card records in a database, for example, that could be used later to identify a data set incontrovertibly as one stolen from the facility that installed the fake record.
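The fake-record idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Honeynet Project's implementation: the function names, the `SECRET_KEY`, the record layout and the leading "9" on the card number are all hypothetical choices. Each token is derived from a key only the owner holds, so the owner can regenerate it later and check whether it turns up in a recovered data set.

```python
import hashlib
import hmac

SECRET_KEY = b"owner-held-secret"  # hypothetical; keep out of the database


def luhn_check_digit(partial: str) -> str:
    """Return the Luhn check digit for a string of digits."""
    total = 0
    # Walk right to left, doubling every other digit from the rightmost.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def make_honeytoken(dataset_id: str) -> dict:
    """Build a fake but well-formed card record unique to one data set."""
    digest = hmac.new(SECRET_KEY, dataset_id.encode("utf-8"),
                      hashlib.sha256).hexdigest()
    body = "9" + str(int(digest, 16) % 10**14).zfill(14)  # 15 digits
    return {"name": "Test Record " + digest[:8],
            "card_number": body + luhn_check_digit(body)}


def find_honeytokens(records: list, planted: dict) -> list:
    """Return IDs of data sets whose planted token appears in records."""
    seen = {r.get("card_number") for r in records}
    return sorted(ds for ds, tok in planted.items()
                  if tok["card_number"] in seen)
```

An owner would plant `make_honeytoken("db-east-01")` among the real rows, then run `find_honeytokens` over any dump that surfaces. A match ties the dump to that specific database.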

Here's the original description of honeytokens from Lance Spitzner, founder of the Honeynet Project, from 2003.

In calling them "the other honeypot," Spitzner made them easy to understand, but limited their applicability.

As digital honeypots, honeytokens could be stolen without cost to the owner, and used to locate or identify stolen data.

They couldn't phone home effectively because their streams of packets got lost in all the other billions of packets that all looked like normal Internet traffic.

Adding greater communications ability to honeytokens and salting them into data sets that are either vulnerable to theft or likely to be attacked builds into the crooks' payload a beacon that can call back to the owners to let them know where it's been taken.

Smarter honeytokens don't require the kind of full-database or record-by-record containerization that attempts to secure every bit of data would, and they wouldn't be so small and stupid they couldn't get their messages through.

If, as Pentagon cybersecurity planners recently acknowledged, it's basically impossible to keep determined hackers out of your network and better to build a stiff firewall and add the ability to detect and counter intruders once they've penetrated, smart honeytokens offer a third line of defense – not stopping data thieves, but making it possible to catch them after they've gotten the data back to where it's going.

Read more of Kevin Fogarty's CoreIT blog and follow the latest IT news at ITworld. Follow Kevin on Twitter at @KevinFogarty. For the latest IT news, analysis and how-tos, follow ITworld on Twitter and Facebook.
