January 11, 2011, 6:10 PM -- Every company gets sued eventually. Or sues someone. Or has to go to court, or to hearings that look a lot like court, to demonstrate it's being a good citizen by keeping in order all the records required by Sarbanes-Oxley, HIPAA and a host of other shorthand nicknames for anal-retention.
Compliance is eating up so much of the average IT security budget that security managers complain they can't buy gear they consider to be the minimum required to keep valuable bits inside the building from going outside without permission.
A big part of compliance is data collection, though. And a big part of data collection is efficiency, according to John Palumbo, senior litigation support manager for high-tech Boston-based law firm Foley Hoag, LLP.
Palumbo, whose job title boils down to a combination of chief records expert and CIO, has spent four years upgrading and automating the way Foley Hoag deals with the mountains of unstructured data the firm has to take in from clients for every case.
The biggest problem, the one that costs clients the most money unnecessarily?
"Clients don't know how to collect data," Palumbo said.
Hah. Simple. These are IT folks. They know how to copy a couple of files and an Exchange database.
"They don't know what information they have; they don't know where it is; they don't know how to decide what it is they need to have, so they just copy it all and send it to us," Palumbo said.
It's not unusual to have a client send 100GB piles consisting of email files, documents, transaction records and so forth that might be useful, along with the system files from the server on which some of them lived, all the attached JPGs from the email, and hundreds of copies of that cartoon or joke everyone thought was so funny for 10 minutes three years ago.
That's no crime, but it does take a while to go through all that data to sift out the maybe 5GB that's even vaguely worth reading and analyzing to find the parts relevant to the lawsuit.
There are automated ways to de-dupe and filter all that data -- for which you'll pay an average of $350 per gigabyte.
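To put rough numbers on that, here is a minimal sketch, not anything Foley Hoag or its vendors are known to run. It assumes the $350 average is charged on every gigabyte submitted, and it treats "duplicate" as "byte-for-byte identical," detected with a SHA-256 hash. The function names and the sample files are made up for illustration.

```python
import hashlib

RATE_PER_GB = 350  # average per-gigabyte fee quoted above (assumed to apply to all data sent)


def processing_cost(gigabytes, rate=RATE_PER_GB):
    """Estimated fee for running a pile of data through automated filtering."""
    return gigabytes * rate


def dedupe(files):
    """Keep the first file seen for each distinct content.

    `files` is an iterable of (name, content_bytes) pairs (a hypothetical
    stand-in for whatever the client actually shipped over).
    """
    seen = set()
    unique = []
    for name, data in files:
        digest = hashlib.sha256(data).hexdigest()  # identical bytes give identical digests
        if digest not in seen:
            seen.add(digest)
            unique.append(name)
    return unique


full_pile = processing_cost(100)   # the whole 100GB dump
useful_part = processing_cost(5)   # only the 5GB worth reading
kept = dedupe([
    ("joke_fwd1.msg", b"that cartoon"),
    ("joke_fwd2.msg", b"that cartoon"),
    ("contract.doc", b"terms and conditions"),
])
```

Under those assumptions the full 100GB pile runs $35,000, against $1,750 for the 5GB that matters, and the second copy of the joke drops out of `kept`. Exact-match hashing only catches identical copies; a forwarded email with a new header would slip through, which is part of why the commercial tools cost what they do.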
Even then you end up with a lot of extraneous emails, documents and other stuff that might be relevant to the trial, and might not.
Who goes through it and decides?