“Conventional wisdom” holds that keyword or Boolean searching is ineffective in legal document review because it misses many relevant documents (low recall). Exhibit A in support of this belief is the oft-cited Blair and Maron study of 1985. The study found that a group of attorneys and paralegals who set out to find at least 75% of all responsive documents in a given collection, using keyword and Boolean searching, had in fact found only 20%. Conclusion? Keywords don’t work.
When Judge Peck, in Da Silva Moore, states that “keyword searches are usually not very effective,” he cites the Blair and Maron study. His statement, albeit nuanced, reinforces the same snap conclusion: Keywords don’t work. There is probably no more pervasive meme in the e-discovery biosphere than this statement. When it comes to search, words are perceived as blunt, crude, clumsy, ineffective tools, unable to properly satisfy the quest for finding relevant documents.
Keywords are words. Properly used, words enable great exactitude, subtlety and richness of expression. These properties don’t evaporate simply because a person uses words to retrieve concepts rather than to articulate them. And indeed, NIST’s TREC studies have shown that word-based/linguistic searches designed, executed, measured and iteratively improved by experts in information retrieval can be very effective—more effective, even, than human “search agents” reading every document in a collection.
Stating that (key)words don’t work is the equivalent of concluding that “scalpels don’t work” when scalpel-wielding attorneys (or scalpel-wielding linguists, for that matter) achieve a 20% survival rate among the unwitting patients they operate on. How misguided would we be to expect a good outcome when appropriate tools are so clearly in the wrong hands?
The real conclusion to be drawn from Blair & Maron is not that keywords don’t work: it is that keywords cobbled together by attorneys don’t work. And that makes sense. Effectively using language to find relevant information is an actual science. None of us is very good at performing technical tasks in which we do not have adequate academic and professional training; why would we be? Attorneys are experts in the law, and ESI (electronically stored information) experts know electronic information, but rarely is either an expert in information retrieval.
An interesting offshoot of the belief that keywords don’t work is that practitioners run hither and yon, seeking the cure-all search tool or technology that will make it all fast and easy. Just beyond the horizon, the imagined “Next Great Thing” beckons: fuzzy search, concept search (remember that?), LSI (latent semantic indexing), Bayesian classifiers, predictive coding, and on and on. Yet TREC research shows that almost none of these approaches, which are marketed to attorneys as novel even though most are decades old, do better at document review in 2012 than the paralegals and attorneys in Blair & Maron did using keywords in 1985. Thus, Judge Peck’s nuanced statement about keywords could be usefully paraphrased here: “The ‘Next Great Thing’ in search is usually not very effective.” This is not to say that these techniques cannot work. TREC does show that in the hands of information retrieval experts, some of the approaches tested can be effective as part of an expertly designed process combining technology and manual review.
So, words do not fail us. Keywords do not fail us. Science does not fail us. Effective solutions to the search challenge (including technology-assisted or automated responsiveness review) exist. They are simply not achievable absent the expertise needed to apply the technologies that are available, whether these technologies are deterministic (using specific language to make a determination of responsiveness) or probabilistic (using the statistical properties of a collection to establish a likelihood of responsiveness).
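To make the distinction between deterministic and probabilistic approaches concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the six-document collection, the Boolean rule, and the choice of a naive Bayes classifier with add-one smoothing as the probabilistic stand-in are invented here, not drawn from Blair & Maron, TREC, or any particular review tool.

```python
import math
import re
from collections import Counter

# Hypothetical toy collection of (text, is_responsive) pairs, invented for illustration.
DOCS = [
    ("the brake defect caused the accident", True),
    ("engineers reported a faulty brake assembly", True),
    ("the stopping system failed during testing", True),
    ("lunch menu for the quarterly offsite", False),
    ("brake pads are on sale this week", False),
    ("quarterly sales report and offsite agenda", False),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def deterministic_match(text):
    """Deterministic: specific language decides responsiveness.

    Hypothetical Boolean rule: brake AND (defect OR faulty OR failed).
    """
    words = set(tokenize(text))
    return "brake" in words and bool(words & {"defect", "faulty", "failed"})


def train_naive_bayes(labeled_docs):
    """Collect per-class word counts and document counts from labeled examples."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def probability_responsive(text, model):
    """Probabilistic: statistical properties yield a likelihood of responsiveness."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in (True, False):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        log_scores[label] = score
    # Convert the two log scores into a normalized probability.
    peak = max(log_scores.values())
    exp_true = math.exp(log_scores[True] - peak)
    exp_false = math.exp(log_scores[False] - peak)
    return exp_true / (exp_true + exp_false)


def recall(predictions, truths):
    """Fraction of truly responsive documents that were retrieved."""
    relevant = sum(truths)
    found = sum(1 for p, t in zip(predictions, truths) if p and t)
    return found / relevant if relevant else 0.0


truths = [label for _, label in DOCS]
boolean_hits = [deterministic_match(text) for text, _ in DOCS]
print("Boolean rule recall:", recall(boolean_hits, truths))

model = train_naive_bayes(DOCS)
print("P(responsive):", probability_responsive("faulty brake defect", model))
```

In this toy collection the Boolean rule misses the one responsive document that never uses the word “brake.” That is the kind of gap an expert would expose by measuring recall and then revising the query. The classifier, for its part, is only as good as the labeled examples it is trained on. Neither approach substitutes for the expertise needed to design, measure and refine it.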
No matter how sharp a scalpel gets, it won’t do the skilled surgeon’s job in the hands of an attorney. As long as the legal profession persists in having experts in law or in ESI attempt to solve challenges where experts in search are required, practical solutions will remain elusive and all the attendant costs and inefficiencies will continue to burden courts, litigants and regulators.