TAR Behind the Scenes: DESI V and the Quest for Standards
For those who toil on the frontlines and in the trenches of technology-assisted review, it is good to know that there’s also hard and diligent work going on in the background by the more erudite members of the extended community. If you need proof, just check out this growing list of research papers submitted to the International Conference on Artificial Intelligence and Law (ICAIL) by an esteemed and varied group of academics and practitioners in advance of DESI V (the fifth Discovery of Electronically Stored Information (DESI) workshop), which will focus on best practices and standards for using advanced search and review methods in e-discovery.
The papers are fascinating. They proffer a wide array of tantalizing areas for exploration, from an examination of the centrality of narrative in constructing theories of relevance in Predictive Coding, Storytelling and God: Narrative Understanding in e-Discovery (Lawrence Chapin, Simon Attfield and Efeosasere Moibi Okoro) to a due consideration of weaponry in life and in document review in The Fall of the Berlin Wall and its Parallels to E-Discovery (Karl Schieneman). A paper by Dan Regard and Tom Matzen takes a backward look to put into proper perspective the oft-quoted and equally oft-misunderstood 1985 Blair and Maron study in A Re-Examination of Blair & Maron (1985), while H5’s Bruce Hedin, Dan Brassil, and Christopher Hogan examine the need for TAR standards that would include valid recall and precision measurement in Toward a Meaningful E-Discovery Standard. Jason R. Baron and Jesse B. Freeman tackle an explanation of the “black box” of predictive coding in Cooperation, Transparency, and the Rise of Support Vector Machines in E-Discovery: Issues Raised by the Need to Classify Documents as Either Responsive or Nonresponsive. There are more. The submitted papers run a thoughtful gamut and all are good for serious study or just an informative perusal.
DESI V: A call for standards
DESI V, the workshop engendering this flurry of submissions, will be held in Rome, Italy, on June 14, 2013. The first DESI workshop, held in 2007, brought together e-discovery practitioners and a broad range of research communities in an effort to aid in the development of new technologies to support the e-discovery process. The scope was broadened in DESI workshops II through IV to include differing national settings and legal environments as well as contemplation of standard-setting frameworks.
With the use of AI and automated tools for document review very much in the e-discovery limelight, the upcoming DESI V workshop will focus on best practices and standards for using predictive coding, machine learning, and other advanced search and review methods in e-discovery. As of now, there is no commonly accepted set of quality standards for the use of these tools and techniques, a fact which only adds to uncertainty in the courtroom as judges and counsel struggle to address appropriate protocols. There is a growing consensus in the legal community that certification standards would be of significant benefit to service providers, the bench, and the legal profession as a whole, accelerating the adoption of more advanced technologies as participants grow more comfortable with generally accepted quality criteria.
Participants in DESI V will thus grapple with potential models of certification standards that might be applied to e-discovery, such as ISO 9000 or ISO/IEC 27000: sets of standards that focus, respectively, on validating the quality management systems used in an ongoing business process, and on best practices and requirements for information security management systems.
The role of measurement in a meaningful standard
In Toward a Meaningful E-Discovery Standard, H5’s Bruce Hedin, Dan Brassil, and Christopher Hogan argue that “the potential benefits of a standard can be realized only if the standard addresses the central question potential consumers have when evaluating an e-discovery product or service: how accurate are the results?” They note that either of the ISO sets of standards mentioned above would do, since both are flexible enough to include measurement provisions—specifically measures of recall and precision—for e-discovery products and services. (Recall describes the percentage of responsive documents retrieved in a search; precision describes the percentage of retrieved documents that are actually responsive.) They also argue that the standard need not include a quality threshold per se, but rather a requirement that a provider have the capability of measuring recall and precision in a meaningful way.
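The recall and precision definitions above can be illustrated with a minimal sketch. The function name, document IDs, and counts below are invented for illustration; they do not come from the H5 paper.

```python
def recall_precision(retrieved, responsive):
    """Return (recall, precision) for a set of retrieved documents,
    measured against the set of truly responsive documents."""
    retrieved = set(retrieved)
    responsive = set(responsive)
    true_positives = len(retrieved & responsive)
    # Recall: fraction of responsive documents that the search found.
    recall = true_positives / len(responsive) if responsive else 0.0
    # Precision: fraction of retrieved documents that are responsive.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    return recall, precision

# Hypothetical example: 8 responsive documents exist in the collection;
# the search retrieves 10 documents, 6 of which are responsive.
responsive = {f"doc{i}" for i in range(1, 9)}
retrieved = {f"doc{i}" for i in range(1, 7)} | {"doc20", "doc21", "doc22", "doc23"}
r, p = recall_precision(retrieved, responsive)
print(f"recall={r:.2f}, precision={p:.2f}")  # recall=0.75, precision=0.60
```

In practice, of course, the set of truly responsive documents is unknown and must itself be estimated by sampling, which is precisely why the paper emphasizes the capability to measure these quantities in a statistically meaningful way.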
One would think this an uncontroversial notion, since measurements used to assess quality are second nature for many products and services. But in the document review realm (which apparently includes places where “angels fear to tread”*), the idea is not so cut and dried. What, after all, constitutes the minimum level of recall and precision? Will a standard lead to unreasonable demands for higher levels of recall? Will the necessity of precision and recall measurements be too time-consuming and costly for litigants? Will the use of quantitative measures encourage bad-faith methods of arriving at the desired “numbers”?
These questions, addressed in the H5 paper, along with no doubt many others, will be fodder for the DESI V participants, adding heat during the summer months to an already hot topic. We await with interest what the workshop will produce.
*“For lawyers and judges to dare opine that a certain search term or terms would be more likely to produce information than the terms that were used is truly to go where angels fear to tread.” – Magistrate Judge John Facciola of the U.S. District Court for the District of Columbia, in U.S. v. O’Keefe (D.D.C. Feb. 18, 2008).