Storage Tip: Choosing an e-discovery tool

By David Hill, Mesabi Group



What seems to be the problem? When presented with the challenge of finding data to meet e-discovery requests for legal purposes, IT administrators may have to search high and low to find all the data and to put it into an analyzable format. But collecting the data into a searchable repository is only part of the challenge. The second challenge is to extract what you need, and ideally only what you need, from a potentially vast pile of information: a data haystack. Going through that haystack to find everything you need, and nothing you don't, can prove to be a formidable challenge.



What do you need to know? When you search through a stack of documents, you want the search technique to identify only relevant documents, but you also want it to identify all of them. Precision is the number of documents that are both retrieved and relevant, divided by the total number of documents retrieved. (You do not want to have to separate the data wheat from the data chaff, especially if there are a lot of documents.) Recall is the number of relevant documents retrieved, divided by the total number of relevant documents available. (You need to make sure that you get all of the data wheat.) Unfortunately, there tends to be a tradeoff between the two: precision tends to decline as recall increases. Your goal is to improve precision and recall at the same time, even though you may never fully reach that goal.
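To make the two definitions concrete, here is a minimal Python sketch that computes precision and recall for a single search. The document IDs and counts are hypothetical, chosen only to illustrate the arithmetic: 10 relevant documents exist, and the search returns 8, of which 6 are relevant.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one search.

    precision = |retrieved and relevant| / |retrieved|
    recall    = |retrieved and relevant| / |relevant|
    """
    hits = retrieved & relevant  # documents that are both retrieved and relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: doc0..doc9 are the relevant documents.
relevant = {f"doc{i}" for i in range(10)}
# The search returns doc4..doc9 (relevant) plus two irrelevant documents.
retrieved = {f"doc{i}" for i in range(4, 10)} | {"doc20", "doc21"}

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.60
```

Returning two irrelevant documents costs precision, and missing four relevant ones costs recall; the two numbers move independently, which is why both need to be measured.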



Now, powerful e-discovery search tools exist, and they may help you achieve both good precision and good recall. They may offer full Boolean capability, which means that you do not have to search on single keywords but can combine keywords with AND, OR, NOT, and NOR to filter the data. Of course, many powerful search algorithms are proprietary (think Google), although they may still use Boolean logic. But Boolean techniques are all about the association of keywords. If your query requires too many keywords to match, you may find only relevant documents, but not all relevant documents (a problem with recall). If you use too few keywords, you may get back too many non-relevant documents (a problem with precision).
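The keyword tradeoff can be illustrated with a small Boolean query evaluator. This is a toy sketch, not how any commercial e-discovery product works: the query representation (nested tuples), the `matches` and `search` functions, and the four sample documents are all hypothetical, and documents "d1" and "d2" are assumed to be the relevant ones.

```python
from typing import Union

# Hypothetical query format: a keyword string, or a tuple whose first element
# is an operator: ("AND", q1, q2, ...), ("OR", ...), ("NOR", ...), ("NOT", q).
Query = Union[str, tuple]


def matches(query: Query, words: set[str]) -> bool:
    """Evaluate a Boolean query against the set of words in one document."""
    if isinstance(query, str):
        return query.lower() in words
    op, *args = query
    if op == "AND":
        return all(matches(a, words) for a in args)
    if op == "OR":
        return any(matches(a, words) for a in args)
    if op == "NOR":  # true only when none of the sub-queries match
        return not any(matches(a, words) for a in args)
    if op == "NOT":
        return not matches(args[0], words)
    raise ValueError(f"unknown operator: {op}")


def search(query: Query, docs: dict[str, str]) -> set[str]:
    """Return the IDs of documents that satisfy the query."""
    return {doc_id for doc_id, text in docs.items()
            if matches(query, set(text.lower().split()))}


# Hypothetical collection; assume d1 and d2 are the relevant documents.
DOCS = {
    "d1": "merger contract signed by the board",
    "d2": "draft merger terms sent to counsel",
    "d3": "board lunch menu",
    "d4": "contract for office cleaning",
}

narrow = ("AND", "merger", "contract", "board")  # all three keywords required
broad = ("OR", "merger", "contract")             # any one keyword suffices
combined = ("AND", ("OR", "merger", "contract"), ("NOT", "cleaning"))

print(sorted(search(narrow, DOCS)))    # ['d1']
print(sorted(search(broad, DOCS)))     # ['d1', 'd2', 'd4']
print(sorted(search(combined, DOCS)))  # ['d1', 'd2']
```

Requiring all three keywords returns only a relevant document but misses "d2" (lower recall); a single OR also returns an irrelevant cleaning contract (lower precision); combining OR with NOT recovers both relevant documents and excludes the noise, at least in this toy collection.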



Adding the ability to search by category can improve results. Recommind is an example of a company that provides that type of capability. (Recommind recently gave me a briefing as an industry analyst.)



What is category analysis? Recommind uses the example of Java. A search on "Java" would return information on coffee, software, and an Indonesian island. You need to sort the results into categories and then select the relevant one. Recommind's software does this categorization automatically, so that you can then identify the category that matches your requirements. (In practice, the categories may not be as obvious as they are in the case of Java.) That should help both precision and recall.
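Recommind's categorization method is proprietary and is not described here. As a rough stand-in, the following Python sketch groups search hits whose vocabularies overlap (Jaccard similarity of their words, ignoring the query term itself), links overlapping hits into categories, and labels each category with its most frequent terms. The sample hits, the stopword list, and the 0.2 similarity threshold are all assumptions made for illustration.

```python
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "for"}  # hypothetical, minimal list


def terms(text: str, query: str) -> set[str]:
    """Words of a hit, minus the query term itself and a few stopwords."""
    return {w for w in text.lower().split() if w != query and w not in STOPWORDS}


def jaccard(x: set[str], y: set[str]) -> float:
    """Shared words divided by all distinct words in the two hits."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def categorize(hits: dict[str, str], query: str, threshold: float = 0.2):
    """Group hits whose vocabularies overlap (single-link, via union-find)."""
    ids = list(hits)
    vocab = {i: terms(hits[i], query) for i in ids}
    parent = {i: i for i in ids}

    def find(i: str) -> str:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link every pair of hits whose term overlap meets the threshold.
    for n, a in enumerate(ids):
        for b in ids[n + 1:]:
            if jaccard(vocab[a], vocab[b]) >= threshold:
                parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)

    # Label each category by its two most frequent terms (ties broken alphabetically).
    result = []
    for members in groups.values():
        counts = Counter(t for m in members for t in vocab[m])
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        result.append(([t for t, _ in ranked[:2]], members))
    return result


# Hypothetical hits for the query "java".
HITS = {
    "h1": "Java coffee beans roast espresso",
    "h2": "dark roast Java coffee blend",
    "h3": "Java virtual machine bytecode compiler",
    "h4": "Java compiler class bytecode",
    "h5": "Java island Indonesia volcano travel",
    "h6": "travel guide Java island beaches",
}

for label, members in categorize(HITS, "java"):
    print(label, members)
# ['coffee', 'roast'] ['h1', 'h2']
# ['bytecode', 'compiler'] ['h3', 'h4']
# ['island', 'travel'] ['h5', 'h6']
```

The reviewer then picks a category instead of endlessly refining keywords. A production tool would work on far larger and noisier collections and would likely use richer statistical models than word overlap, so treat this only as an illustration of the idea.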


What can you do about it? When selecting an e-discovery tool, your challenge is to put your users in the best possible position to get what they need, and only what they need, out of the data haystack. Work with your users, such as your legal department, to select a number of test cases against which you can benchmark e-discovery tools. You must be able to measure the precision and recall of each tool on each test case. Simple Boolean analysis may be enough in some cases, but full Boolean capability is likely to be the minimum that you need. If that alone is not sufficient, look at the other capabilities each tool provides; category analysis may turn out to be essential.
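A benchmark along these lines can be scripted once each candidate tool's results for each test case have been exported. The Python sketch below assumes that the legal department's relevance judgments and each tool's returned document IDs are available as sets; the tool names, case names, and document IDs are hypothetical.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision and recall of one tool's results for one test case."""
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


def benchmark(results: dict, test_cases: dict[str, set[str]]) -> dict:
    """Score every tool on every test case and add a per-tool mean (macro-average)."""
    report = {}
    for tool, per_case in results.items():
        # A case the tool returned nothing for is scored against an empty set.
        scores = {case: precision_recall(per_case.get(case, set()), relevant)
                  for case, relevant in test_cases.items()}
        mean_p = sum(p for p, _ in scores.values()) / len(scores)
        mean_r = sum(r for _, r in scores.values()) / len(scores)
        report[tool] = {"cases": scores, "mean": (mean_p, mean_r)}
    return report


# Hypothetical relevance judgments agreed with the legal department.
TEST_CASES = {
    "case_A": {"d1", "d2", "d3", "d4"},
    "case_B": {"d7", "d8"},
}

# Hypothetical document IDs returned by each candidate tool for each case.
RESULTS = {
    "tool_X": {"case_A": {"d1", "d2", "d9"}, "case_B": {"d7", "d8", "d10", "d11"}},
    "tool_Y": {"case_A": {"d1", "d2", "d3", "d4", "d5"}, "case_B": {"d7"}},
}

for tool, rep in benchmark(RESULTS, TEST_CASES).items():
    p, r = rep["mean"]
    print(f"{tool}: mean precision={p:.2f} mean recall={r:.2f}")
# tool_X: mean precision=0.58 mean recall=0.75
# tool_Y: mean precision=0.90 mean recall=0.75
```

Averaging per-case scores weights every test case equally regardless of size. Here both tools have the same mean recall but very different per-case profiles, so it is worth inspecting `report[tool]["cases"]` rather than relying on the mean alone.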
