August 21, 2012, 12:00 PM — People who have been involved in the challenges of e-discovery for a while remember when email arrived on the scene nearly two decades ago. It changed the way people collaborate and left companies with mounds of digital information that was costly and time-consuming to sort through when litigation struck.
The arrival of social media is in many ways a repeat of those challenges. As was true of email, social media comes with new metadata and formats. But because of the similarities, there is an opportunity to avoid the mistakes made with email. One thing is clear: Companies that dive into social media without the right policies and solutions to govern usage will encounter information governance and e-discovery nightmares down the road.
With email, companies could plead ignorance about the e-discovery issues that arose. The digital revolution was new and case law and civil procedure rules were still in flux. With email as a precedent, however, companies cannot hide behind ignorance in the case of social media. Instead, they can get ahead of social media by putting in place governance policies, processes and tools to ensure that the email history lesson informs these new methods of collaboration.
Social media have seen widespread adoption. To avoid repeating the mistakes of the email era, companies must figure out how best to collect and preserve social-media content in the event it is needed for e-discovery. Today, this practice is extremely immature.
Recently, the eDJ Group conducted a survey on "The Cloud and eDiscovery" that looked at the experiences e-discovery professionals have had with collecting and preserving information from cloud-based sources such as Amazon, Rackspace and social-media publishers. Only 15% of respondents, at most, indicated that they had had to collect from a popular social-media service. But that figure will surely rise.
Technological methods for collection and preservation
When it comes to the collection and preservation of social-media content, companies have several choices of technological methods, each with distinct pros and cons.
Web crawling

A Web crawler is a computer program that periodically browses the Web (in this case, a social-media URL). The crawler creates a copy of the page to be stored for processing into a preservation repository. Companies can set up Web crawlers to capture content from social-media sites at various intervals. Most of these systems store social-media content as static Web pages. However, Web crawling does not necessarily create a forensic capture of a Web page in its full context and therefore may not be sufficient in certain types of cases.
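The crawl-and-store cycle described above can be sketched in a few lines. This is a minimal illustration, not a production crawler: the `fetch` callable stands in for whatever HTTP client is used, and the `repository` is represented by a simple dict-like store. The content hash is included because a preservation copy is more defensible when it can be shown the copy was not altered after capture.

```python
import hashlib
import time

def crawl_and_preserve(url, fetch, repository, interval_seconds=3600, max_captures=1):
    """Periodically fetch a social-media URL and store each copy.

    `fetch` is any callable that returns the page body as a string;
    `repository` is a dict-like preservation store keyed by capture ID.
    """
    capture_ids = []
    for i in range(max_captures):
        body = fetch(url)
        capture_id = f"{url}@{int(time.time())}"
        repository[capture_id] = {
            "url": url,
            "captured_at": time.time(),
            "body": body,
            # Hash of the captured content, recorded at capture time.
            "sha256": hashlib.sha256(body.encode()).hexdigest(),
        }
        capture_ids.append(capture_id)
        if i < max_captures - 1:
            time.sleep(interval_seconds)
    return capture_ids
```

Note that, as the article cautions, this stores only the rendered page body; dynamic content loaded by scripts and the page's surrounding context may be lost.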
Screen scraping

Companies can set up programs that will essentially take a screenshot, or screen scrape, of a Web page and then store that image as a record of the page at that point in time. In most cases, the image will be converted to a PDF (or similar) file so that it can be indexed and searched within a preservation repository. A screenshot, though, is not a full capture of the information in a Web page. It lacks metadata and other context that may be important depending on the matter.
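A screen-scrape record might be structured as follows. This sketch assumes a `render_to_pdf` callable (standing in for a headless-browser capture step, which the article does not name) that returns the rendered page as PDF bytes. Notice that the stored record carries only the page image and capture time, which illustrates the limitation above: none of the underlying post metadata survives.

```python
import hashlib
import time

def preserve_screenshot(url, render_to_pdf, repository):
    """Store a point-in-time image of a page as a searchable record.

    `render_to_pdf` stands in for a headless-browser capture and
    returns the rendered page as PDF bytes.
    """
    pdf_bytes = render_to_pdf(url)
    record_id = f"{url}@{int(time.time())}"
    repository[record_id] = {
        "url": url,
        "captured_at": time.time(),
        "format": "application/pdf",
        "content": pdf_bytes,
        # Hash recorded at capture time for later integrity checks.
        "sha256": hashlib.sha256(pdf_bytes).hexdigest(),
        # No author, timestamp-of-post, or other platform metadata is
        # available from an image capture alone.
    }
    return record_id
```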
Publisher application programming interfaces
The major social-media publishers have APIs that third parties can write to in order to enable collection directly from the publisher. By writing to an API, it is possible to capture all of the data and metadata that the publisher makes available -- for example, a Facebook page -- and then map that data back into a preservation repository. A major consideration with the API method is bandwidth: social-media sites create massive volumes of content. In 2011, content aggregator Gnip estimated that Twitter created 35MB per second of sustained network traffic. That is a lot of content to ingest, which is one reason it is wise to use third-party applications that connect to the social-media publishers directly.
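The core of an API collection is pulling every item the publisher exposes, typically by following pagination cursors, and mapping each item into the preservation schema while retaining the raw payload so no metadata is lost. The sketch below assumes a `fetch_page(cursor)` callable standing in for an authenticated publisher API call, and the field names (`id`, `from`, `created_time`, `message`) are illustrative of the kind of fields a publisher like Facebook exposes, not a guaranteed schema.

```python
def collect_via_api(fetch_page, repository):
    """Pull all items a publisher's API exposes, following pagination cursors.

    `fetch_page(cursor)` stands in for an authenticated API call; it returns
    (items, next_cursor), where next_cursor is None once the feed is exhausted.
    """
    cursor = None
    collected = 0
    while True:
        items, cursor = fetch_page(cursor)
        for item in items:
            # Map publisher fields into the preservation schema, and keep
            # the raw payload so no metadata is discarded.
            repository[item["id"]] = {
                "author": item.get("from"),
                "created_time": item.get("created_time"),
                "text": item.get("message"),
                "raw": item,
            }
            collected += 1
        if cursor is None:
            return collected
```

In practice a collector would also need authentication, rate-limit handling and retry logic, which is a large part of what the third-party connector vendors mentioned below provide.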
There are many ways to execute an API collection approach. Many third-party vendors build connectors to social-media publishers and then provide applications that let customers collect and preserve content as needed. One approach is to have employees authorize an application that sits on the social-media site and monitors and collects all of their activity. Enrollment can be enforced automatically at the company firewall, which gives the company an opportunity to restate its policies and capture login information with the informed consent of users. This practice has user-privacy implications that should be carefully evaluated by counsel, especially for global corporations with users or customers located in foreign countries with strong privacy protections.
Proxies

In the context of collecting and preserving social media, a proxy approach is one where a company requires employees to interface with social media through a proxy server so that interactions can be monitored and captured.
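The essential mechanism of the proxy approach is an intermediary that records each outbound interaction before passing it along. A minimal sketch, with the upstream call represented by an injected `forward` callable rather than real network plumbing:

```python
import time

def make_logging_proxy(forward, audit_log):
    """Wrap an outbound request function so every interaction is recorded.

    `forward(method, url, body)` stands in for the proxy's upstream call;
    each request/response pair is appended to `audit_log` for preservation.
    """
    def proxied(method, url, body=None):
        response = forward(method, url, body)
        # Record the interaction after it completes so the captured
        # response reflects what the user actually saw.
        audit_log.append({
            "timestamp": time.time(),
            "method": method,
            "url": url,
            "request_body": body,
            "response": response,
        })
        return response
    return proxied
```

A real deployment would sit at the network layer (and today would have to handle encrypted traffic), but the captured record per interaction looks much like this.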
The most comprehensive approach to social-media collection and preservation would combine the API and proxy methods. Doing so would ensure complete capture of all of a user's social-media content. But this approach is probably overkill for any but the most highly regulated organizations (and even then, it will only be a small subset of employees in regulated companies that need to be monitored so closely).