Helping Minds Meet - Talking TAR at the Meet-and-Confer: Webinar Q&A
In a recent H5-sponsored Bloomberg BNA Webinar entitled “Helping Minds Meet: Talking TAR at the Meet-and-Confer,” a litigator (Christine Payne, Kirkland & Ellis LLP) and a data scientist (Bruce Hedin, H5) discussed a number of considerations that arise when TAR is on the table at the meet-and-confer. One question submitted to Bruce for the Q&A (answered personally after the webinar) may be of interest to others.
“Don’t machine-learning TAR methods infer which words are ‘key,’ such that only the machine knows what it thinks the key words are? Would that make machine-learning methods less ‘validatable’ than explicit keyword methods?”
It is true that some machine learning systems will, by design, ignore certain words or weight some words more heavily than others. That does not make their results less open to validation, however. Regardless of how it gets there (whether manual, keyword-based, rules-based, or machine learning), as long as the system results in a set of documents classified as responsive and a set of documents classified as non-responsive, you can validate the result (and machine learning systems produce such a result, once you select a given cut-off point on the ranked list of results). It is just a matter of using sampling to obtain an estimate of the proportion of actually responsive documents in the set classified as responsive and an estimate of the proportion of actually responsive documents in the set classified as non-responsive. With those two estimates in hand, together with the known sizes of the two sets, you have what you need to compute recall and precision estimates.
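The arithmetic behind those estimates can be sketched in a few lines; the set sizes and sampled proportions below are invented for illustration:

```python
# Hypothetical example: estimating recall and precision from random samples
# drawn from the two classified sets. All numbers here are invented.
n_responsive_set = 20_000       # documents the system classified as responsive
n_nonresponsive_set = 80_000    # documents the system classified as non-responsive
p_resp_in_responsive = 0.85     # sampled proportion of that set that is truly responsive
p_resp_in_nonresponsive = 0.02  # sampled proportion of that set that is truly responsive

# Projected counts of truly responsive documents in each set
tp = p_resp_in_responsive * n_responsive_set        # true positives: 0.85 * 20,000 = 17,000
fn = p_resp_in_nonresponsive * n_nonresponsive_set  # false negatives: 0.02 * 80,000 = 1,600

precision = p_resp_in_responsive  # proportion of the responsive set that is truly responsive
recall = tp / (tp + fn)           # proportion of all responsive documents that were captured

print(f"precision ~ {precision:.2%}")  # 85.00%
print(f"recall ~ {recall:.2%}")        # 17,000 / 18,600 = 91.40%
```

Note that the recall estimate depends on both sampled proportions and the sizes of the two sets, which is why sampling the non-responsive set (sometimes called an elusion sample) is essential.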
Where the systems do differ, however, and this may have been the actual intent of the question, is in the ease with which they allow you to respond to any issues uncovered by the validation exercise. In the case of a keyword approach (or a rules-based (linguistic) approach), for example, if you do your validation and find that you are missing some important documents, it is simply a matter of making appropriate additions to your keyword list (or supplementing your knowledge base of rules).
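The keyword remedy really is that direct; as a toy sketch (documents and terms invented for illustration):

```python
# Hypothetical sketch: remedying a validation gap in a keyword approach
# by adding a term to the keyword list. Documents and terms are invented.
docs = {
    1: "merger agreement between the parties",
    2: "quarterly earnings call transcript",
    3: "acquisition term sheet draft",
}

def matches(doc_text, keywords):
    """Return True if any keyword appears in the document text."""
    return any(kw in doc_text for kw in keywords)

keywords = {"merger"}
hits = {doc_id for doc_id, text in docs.items() if matches(text, keywords)}
# hits == {1}: validation reveals doc 3 (about an acquisition) was missed,
# so the fix is simply to add the missing term and re-run the same filter.
keywords |= {"acquisition"}
hits = {doc_id for doc_id, text in docs.items() if matches(text, keywords)}
# hits now includes both documents 1 and 3
```

The change is transparent: you can point to the exact term that was added and explain why it closes the gap.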
Likewise, if you find that you are capturing too much non-responsive material, it is a fairly straightforward matter to refine your keywords or rules. In the case of machine learning systems, however, the remedy for issues uncovered in the course of validation is less transparent. You will certainly want to add data to your training set and go through another cycle of recalibrating the statistical model, but the outcome of that additional cycle is not entirely predictable: it may address the issue or it may not (in which case you will want to try again with another cycle). So, when it comes to addressing issues uncovered by a validation exercise (or any other sort of QC exercise), machine learning systems lack the immediacy and transparency of keyword or rules-based approaches.
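That retrain-and-revalidate cycle can be sketched schematically; the functions and data below are invented stand-ins for whatever a real TAR platform provides, and the point is the shape of the loop, not the learning algorithm:

```python
# Schematic sketch of the machine-learning remediation cycle: add training
# data, retrain, re-validate, and repeat until the recall target is met
# (or a cycle cap is reached, since no single cycle is guaranteed to work).
# train(), classify(), and estimate_recall() are toy stand-ins.

def train(training_set):
    # Toy "model": the set of words seen in responsive training examples.
    vocab = set()
    for text, is_responsive in training_set:
        if is_responsive:
            vocab.update(text.split())
    return vocab

def classify(model, text):
    return any(word in model for word in text.split())

def estimate_recall(model, validation_set):
    responsive = [text for text, is_responsive in validation_set if is_responsive]
    found = sum(classify(model, text) for text in responsive)
    return found / len(responsive)

TARGET_RECALL = 0.75
training_set = [("merger agreement", True), ("lunch menu", False)]
validation_set = [
    ("merger agreement draft", True),
    ("acquisition term sheet", True),
    ("acquisition financing memo", True),
    ("weekly cafeteria schedule", False),
]
extra_training = [("acquisition term sheet", True), ("share purchase deed", True)]

model = train(training_set)
for _ in range(5):  # cap the number of remediation cycles
    if estimate_recall(model, validation_set) >= TARGET_RECALL:
        break
    if extra_training:
        training_set.append(extra_training.pop(0))
    model = train(training_set)  # recalibrate and re-validate
```

Unlike the keyword fix, there is no single term you can point to afterward; the remedy is "more training data and another cycle," and whether one cycle suffices is only known after re-validation.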
More generally, I’d say that this is an aspect of the “user experience” that those looking at various TAR options should keep in mind: for a given TAR system, how easy is it to address issues that are discovered and what kinds of expertise are required to arrive at the appropriate remedies?