Document Review Accuracy: The Recall-Precision Tradeoff
By now you probably know that two measures, recall and precision, best gauge the overall effectiveness of any document review process. Whether the review process uses technology-assisted review, manual review, or simple coin flipping, recall and precision are the metrics that best convey how well you’ve done.
Why should you care about recall and precision?
High recall ensures you have what you need to produce (compliance) and, just as important, what you need to win. High precision means you produce only what you have to (maintaining advantage) and keep costs down by reviewing only what you should (i.e., fewer non-relevant documents). If things like compliance, winning, and keeping costs down don’t matter to you, then you need not care. But if they do, you’d be wise to read on and learn how the recall-precision tradeoff can impact you.
Recall and Precision
As a quick refresher, recall measures how many of the relevant documents in a collection have actually been found. For example, a 40 percent recall rate means that 40 percent of all relevant documents in a collection have been found, and 60 percent have been missed.
Precision measures how many of the documents retrieved are actually relevant, that is, how much of the result set is on target. For example, a 65 percent precision rate means that 65 percent of the documents retrieved are relevant, while 35 percent of those documents have been misidentified as relevant.
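To make the two definitions concrete, here is a minimal sketch in Python. The function name recall_precision and the document IDs are illustrative assumptions, not part of any particular review tool; the counts were simply chosen to reproduce the 40 percent recall and 65 percent precision figures used above.

```python
def recall_precision(relevant, retrieved):
    """Return (recall, precision) for two collections of document IDs.

    relevant  -- IDs of documents that truly are relevant
    retrieved -- IDs of documents the review process marked as relevant
    """
    relevant = set(relevant)
    retrieved = set(retrieved)
    found = relevant & retrieved  # relevant documents that were actually retrieved

    # Recall: share of all relevant documents that were found.
    recall = len(found) / len(relevant) if relevant else 0.0
    # Precision: share of retrieved documents that are relevant.
    precision = len(found) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical collection: 65 relevant documents (IDs 1-65).
relevant_docs = set(range(1, 66))
# Hypothetical result set: 40 documents (IDs 40-79), 26 of them relevant.
retrieved_docs = set(range(40, 80))

recall, precision = recall_precision(relevant_docs, retrieved_docs)
print(f"recall: {recall:.0%}, precision: {precision:.0%}")
# prints "recall: 40%, precision: 65%"
```

In this made-up example, 26 of the 65 relevant documents were found (40 percent recall), and 26 of the 40 retrieved documents are relevant (65 percent precision).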
The Recall-Precision Tradeoff
Any document review process can achieve either high recall or high precision, but rarely both simultaneously. An effort to improve the performance of one factor generally causes the performance of the other to drop. This is often referred to as the “Recall-Precision tradeoff.” We get a lot of questions about what causes this tradeoff, to which most review processes and search tools are subject. The diagram below helps illustrate why it happens.
The gray oval represents all documents in the population; the blue oval is the set of relevant documents; the orange oval represents the documents retrieved by your search or assessed as relevant by the review process.
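The same picture can be sketched numerically. The example below is a hedged illustration, not a description of any specific search tool: it assumes a hypothetical ranking of twelve documents, ordered from most to least likely relevant, in which the ranking is imperfect. Retrieving more of the list corresponds to widening the orange oval: it covers more of the blue oval (recall rises) but also takes in more of the gray area (precision falls). The function name sweep_cutoffs and the data are invented for illustration.

```python
def sweep_cutoffs(ranked_labels, cutoffs):
    """For each cutoff k, retrieve the top k documents and report
    (k, recall, precision).

    ranked_labels -- list of booleans, best-ranked document first;
                     True means the document is actually relevant.
    """
    total_relevant = sum(ranked_labels)
    results = []
    for k in cutoffs:
        found = sum(ranked_labels[:k])  # relevant documents among the top k
        recall = found / total_relevant if total_relevant else 0.0
        precision = found / k if k else 0.0
        results.append((k, recall, precision))
    return results


# Hypothetical ranking of 12 documents, 6 of which are relevant.
ranked_labels = [True, True, True, False, True, False,
                 True, False, False, True, False, False]

for k, recall, precision in sweep_cutoffs(ranked_labels, [3, 6, 12]):
    print(f"top {k}: recall {recall:.0%}, precision {precision:.0%}")
```

With this invented data, retrieving only the top 3 documents gives 50 percent recall at 100 percent precision; retrieving all 12 gives 100 percent recall at 50 percent precision. Casting a wider net finds more of what matters, but at the cost of more non-relevant documents to review.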
That last point bears repeating: the optimal result – high recall with high precision – is difficult to achieve. As it turns out, very difficult. And in our experience, it cannot be achieved through search technology alone. It also cannot be achieved through manual review alone. Finding just the necessary information and little else is no easy task, but it can be done through an optimal combination of technology, process and human expertise. It is the intersection of these three components, with the human expertise driving the appropriate application of technology and process, that enables simultaneously high recall and high precision.