Benchmarking Document Review
Document review in litigation can be a complex, expensive and labor-intensive process. Now, the tools and methods used to review the massive stores of electronic data routinely in play for large litigations are coming under more scrutiny as courts opine on how, or even whether, technology may be used as part of the review process (see Da Silva Moore v. Publicis Groupe and Global Aerospace v. Landow Aviation).
As a result, some corporations and firms are taking a hard look at their document review approach — especially those with manual processes — trying to assess whether there is a better approach to use and under what circumstances. This assessment certainly isn’t easy given the variety of matters that require review and the cost, risk and timing trade-offs inherent in any document review process. How can you even begin to evaluate which tool or method will provide you or your client the best solution for a given situation?
What is Benchmarking?
Enter benchmarking. Benchmarking, which can range from a strict, formal quantitative evaluation to anecdotal information gathering, is a method intended to inform decision making around systems and processes. Benchmarking provides a modeling framework for making an evaluation or comparison so you can:
- Capture the factors to be considered for a particular model.
- Conduct analyses through manipulation of the factors under consideration (e.g., cost, time-to-completion, risk) to understand the resulting trade-offs.
- Help minimize uncertainty and risk in decision making, thus ensuring that the actual review project will achieve the maximum possible utility (i.e., it will optimally meet the review objectives along each dimension of time, cost and risk).
To frame the benchmark for document review, a primary assumption is that no matter the form of review (manual linear, technology-assisted or otherwise), what is sought is the highest-quality result (accuracy) relative to other factors, in this case time, cost and risk. For an electronic data population, which is usually the case these days, that means finding the greatest number of relevant documents, without bringing in a lot of junk that also needs to be reviewed, within whatever time or budget constraints may exist.
The standard metrics used for accuracy reflect the risks that matter most to the companies and attorneys that undertake document review projects, namely: (1) the risk of missing documents that should have been retrieved; and (2) the risk of retrieving documents that should not have been retrieved.
In “information retrieval speak,” these two concepts are referred to as recall and precision, respectively. In other words, the most accurate process is one that finds the most relevant documents (high recall) without bringing in a lot of nonrelevant material (high precision) that would have to be reviewed or otherwise weeded out. In essence, recall addresses the risk factor of a document production while precision is more likely to impact cost.
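The two definitions above can be made concrete with a minimal sketch. It assumes the standard information retrieval formulas (precision is the share of retrieved documents that are relevant; recall is the share of relevant documents that were retrieved). The function name and the document IDs are hypothetical and used only for illustration.

```python
def precision_recall(retrieved: set, relevant: set) -> tuple[float, float]:
    """Return (precision, recall) for a set of retrieved document IDs."""
    # Documents that were both retrieved and actually relevant.
    correctly_retrieved = len(retrieved & relevant)
    precision = correctly_retrieved / len(retrieved) if retrieved else 0.0
    recall = correctly_retrieved / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 10 relevant documents exist in the population.
relevant = set(range(1, 11))
# The process retrieved 12 documents: 8 relevant ones plus 4 that are not.
retrieved = {1, 2, 3, 4, 5, 6, 7, 8, 11, 12, 13, 14}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

In this made-up case the process finds 8 of the 10 relevant documents (recall of 0.80) but a third of what it returns is nonrelevant (precision of about 0.67), which is the material that drives extra review cost.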
This is important information nowadays, especially for technology-assisted approaches whose results are coming under scrutiny by the court. In the words of Judge Andrew Peck, who made the by-now well-known Da Silva Moore ruling about technology-assisted review, “[I] will want to know what was done and why that produced defensible results. I may be less interested in the science behind the ‘black box’ of the vendor’s software than in whether it produced responsive documents with reasonably high recall and high precision.” (“Search Forward,” Law Technology News, October 2011).
Industry Benchmarking for Accuracy (and Yours for Risk)
So how do you know the recall and precision of a particular method or tool and what constitutes a good result relative to other approaches so that you can benchmark accuracy? The recall and precision measurement process involves an accurate sampling of the document population to see how well a process performs. For purchased technology-assisted solutions where you’re essentially asking a software tool to help you find relevant documents, you should ask the vendor to provide precision and recall statistics of the software’s performance — they should know it — but caveat emptor: Some accuracy claims are highly overstated. In addition, the precision and recall for a particular review must be measured during the review, so search expertise is required to be comfortable with your result.
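As a rough illustration of measuring by sampling, the sketch below estimates precision from a random sample of the retrieved documents and estimates recall by also sampling the discarded documents (often called an elusion sample). This is one common approach, not necessarily the one any given vendor or study uses. The function names, the sample counts and the population sizes are all hypothetical, and the margin of error uses the simple normal approximation.

```python
import math


def estimate_proportion(hits: int, sample_size: int, z: float = 1.96):
    """Return (point estimate, margin of error) for a sampled proportion.

    Uses the normal approximation; z = 1.96 corresponds to roughly 95% confidence.
    """
    p = hits / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, margin


def estimate_recall(n_retrieved: int, precision: float,
                    n_discarded: int, elusion: float) -> float:
    """Estimate recall from the precision of the retrieved set and the
    rate of relevant documents left behind in the discarded set."""
    relevant_found = precision * n_retrieved
    relevant_missed = elusion * n_discarded
    total_relevant = relevant_found + relevant_missed
    return relevant_found / total_relevant if total_relevant else 0.0


# Hypothetical review: 20,000 documents retrieved, 80,000 discarded.
# Reviewers code a random sample of 400 documents from each group.
precision, precision_margin = estimate_proportion(hits=320, sample_size=400)
elusion, elusion_margin = estimate_proportion(hits=8, sample_size=400)
recall = estimate_recall(20_000, precision, 80_000, elusion)

print(f"precision ~ {precision:.2f} (+/- {precision_margin:.3f})")
print(f"elusion   ~ {elusion:.3f} (+/- {elusion_margin:.3f})")
print(f"recall    ~ {recall:.3f}")
```

With these invented numbers, estimated precision is 0.80 and estimated recall is about 0.91. The recall figure inherits the uncertainty of both samples, which is one reason sampling design calls for search and statistical expertise.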
Interestingly, manual processes, which upon study have been shown to have fairly low levels of precision and recall, were always assumed to be the gold standard and were rarely challenged in court. Now that that myth has been busted and technology has entered the picture, the benchmark effort may be more important, especially if Peck’s view prevails. An uninformed practitioner may face a challenge in court by a more tech-savvy opposition who claims that the incumbent manual process isn’t good enough.
To specifically aid the legal profession, independent benchmark studies related to accuracy have been conducted over the past several years by the National Institute of Standards and Technology’s Text Retrieval Conference (TREC) Legal Track to test the recall and precision of various search tools and methods and provide a basis for comparison. Conclusions about document review based on these studies have been elaborated upon in a more accessible (read “less academic”) article by Maura R. Grossman & Gordon V. Cormack, “Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review,” XVII RICH. J.L. & TECH. 11 (2011). The study shows that simultaneous high recall and high precision can be achieved, but it requires a combination of the right technology and the right expertise.
The bottom line is that now that technology is in the picture, accuracy is under scrutiny no matter what the method, so it should be weighed alongside the other priorities driving the review.
Accuracy of a review process can be benchmarked against the TREC results, but more often than not, cost is the factor that organizations most want to benchmark. Cost is, after all, what drives purchasing and budgeting decisions. To effectively benchmark cost, it helps to capture the factors that will enable comparison among methods and/or vendors:
- Documents subject to review (e.g., gigabytes of data volume x documents per gigabyte).
- Cost to cull, process and host the data.
- Estimated documents subject to review if document set is intelligently reduced using technology-driven filters (high precision will lower costs here).
- Estimated documents remaining for review after culling and/or technology-driven filtering.
- Cost of licensing/purchase/maintenance of computer-assisted review software or contracting outsourced services, if applicable.
- Estimated time and cost to create seed sets and “train” a software tool, if applicable.
- Review rate per hour.
- Cost per hour for review.
- Resulting cost per document for production.
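The factors listed above can be combined into a simple cost model for comparing methods. The following is a minimal sketch, not a complete financial model: the class, its field names and every figure in the two scenarios are hypothetical, and cost per document is computed here against the full collected population, which is only one possible reading of that factor.

```python
from dataclasses import dataclass


@dataclass
class ReviewScenario:
    name: str
    gigabytes: float                 # data volume collected
    docs_per_gb: float               # documents per gigabyte
    processing_cost: float           # cost to cull, process and host
    fraction_after_filtering: float  # share of documents still needing review
    software_cost: float             # licensing or outsourced service, if any
    training_cost: float             # seed sets and tool training, if any
    docs_per_hour: float             # review rate per hour
    cost_per_hour: float             # reviewer cost per hour

    def docs_collected(self) -> float:
        return self.gigabytes * self.docs_per_gb

    def docs_reviewed(self) -> float:
        return self.docs_collected() * self.fraction_after_filtering

    def review_hours(self) -> float:
        return self.docs_reviewed() / self.docs_per_hour

    def total_cost(self) -> float:
        review_cost = self.review_hours() * self.cost_per_hour
        return (self.processing_cost + self.software_cost
                + self.training_cost + review_cost)

    def cost_per_document(self) -> float:
        # Assumption: divide by the collected population so that
        # scenarios over the same data set are directly comparable.
        return self.total_cost() / self.docs_collected()


# Hypothetical scenarios over the same 100 GB collection.
manual = ReviewScenario("Manual linear review", 100, 5_000, 50_000,
                        1.0, 0, 0, 50, 60)
assisted = ReviewScenario("Technology-assisted review", 100, 5_000, 50_000,
                          0.2, 75_000, 15_000, 50, 60)

for s in (manual, assisted):
    print(f"{s.name}: {s.review_hours():,.0f} hours, "
          f"total ${s.total_cost():,.0f}, "
          f"${s.cost_per_document():.2f} per document")
```

Changing one input at a time (for example, the fraction of documents remaining after filtering, or the review rate) shows how sensitive the total is to each factor. Review hours also give a rough proxy for time-to-completion.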
It should be easy to understand that there are certain trade-offs among time, cost and risk. For example, the less time, money and expertise invested in a review, the lower the accuracy will most likely be, which may increase the risk of missing documents you need for your case or may warrant a challenge from the other side.
Conversely, attempting to achieve high accuracy (high precision and high recall simultaneously, finding all of the relevant documents and only the relevant documents) is a more rigorous exercise. However, if done properly using the right combination of technology and expertise, the overall duration of the review can be shortened and costs reduced, providing more of the litigation budget for winning the case. With manual review methods, the opposite is usually true. Thus, the resulting estimated cost should be considered in light of the likely timeline of the case and any budget constraints that would address a client’s unique practical objectives, including the appetite for risk in the case. Understanding the interplay of these factors should enable you to manipulate them and conduct the analyses that can drive the decision about what methods or tools to use to achieve the prioritized objectives of reducing time, cost and risk.
Several expert competencies are involved in building more formal benchmarking models for document review. These include financial modeling expertise, an understanding of accuracy from an information retrieval standpoint (i.e., precision and recall) and its relationship to time-to-completion, cost and risk, as well as knowledge of the performance of the various processes and technologies under consideration. Such competencies are available in the marketplace and are necessary in order to conduct a truly accurate benchmark.
An organization’s investment of time and effort required to complete a benchmark is small, but the return on this investment is considerable. The results may or may not validate the originally anticipated review process; but in any event, the benchmark will provide an organization with an assessment of how to best meet its objectives while maximizing the value derived from its review expenditures.
This article is reprinted with permission from the September 12, 2012 issue of The Recorder. © 2012 ALM Media Properties, LLC. Further duplication without permission is prohibited. All rights reserved.