Looking beyond the keyword list paradigm as AI and analytics take the stage
In a 2012 True North blog post, one of our H5 experts provided some practical advice on reducing privilege review burdens and costs by replacing a broad net approach to keyword search with an up-front analysis of the keyword list being used in order to refine the keywords that tend to over-capture.
That advice was sound then, and it still is if you rely solely on a keyword list to identify potentially privileged information, as many people do. But today that’s a bit like greasing up your ‘57 Chevy when you could charge up a Tesla instead. Both may get you where you want to go, but one has the future in mind.
Times have changed, communication channels have changed, data sources have changed, and so have the tools and analytics, including artificial intelligence (AI) solutions such as machine learning (ML) and natural language processing (NLP), that we can now use to effectively prune data collections and identify potentially privileged information. Although advice about how to make keyword lists better for privilege review is always welcome in eDiscovery, it is probably time to look beyond the keyword list paradigm in favor of more sophisticated tools and methods that can be applied across assets, practice groups, firms, or lines of business to identify potentially privileged information.
Privilege protection: vital—and challenging
Protecting privileged information from disclosure is an ethical obligation that rests at the heart of the U.S. legal system, and it is as challenging to execute as it is important. Attorney-client or work product privilege claims may be obvious in some cases while requiring nuanced judgment calls in others, depending upon context, participants, recipients, audience, and other factors. The penalty for a misstep can be severe: inadvertent production of privileged material can sink a case by handing the opposing party information that can’t be unseen, or it can result in a waiver of privilege that exposes any number of other documents to the unwelcome light of day (a Rule 502(d) order notwithstanding).
Making everything more difficult today is the sheer volume of information usually in play. It is not uncommon for document collections to comprise thousands to millions of potentially responsive documents, a subset of which will need to be identified to be reviewed for privilege, a very costly and time-consuming exercise. Any tools and methods that can reduce the size of the potentially responsive data pile to begin with are a welcome part of the toolkit; any that can accurately identify the potentially privileged material within that pile even more so.
Enter AI and analytics
Just as technology-assisted review (TAR) took a while to catch on for responsive review, the same is true for the application of AI tools for streamlining privilege review. Privilege identification is a tougher nut to crack than responsiveness for a few reasons: it is not a straightforward binary call because context matters so much, it requires consistency throughout the data population, and more can be at stake in getting it wrong, which raises the demand for accuracy. And then there’s the dreaded privilege log to consider.
These are actually arguments in favor of using advanced technology, not bypassing it, since tools properly configured and deployed can streamline the entire process and achieve the end goal better than the best keyword list a legal team can devise and a room full of the most diligent reviewers.
Data reduction and targeting
Since many privilege assessments may be nuanced and require a human judgment call, accelerating the privilege review workflow is more about data reduction first and targeting the most likely privileged content next: that is, weeding out non-responsive and non-privileged material early on while finding better ways to elevate, prioritize and review the potentially privileged documents in the responsive pile. Advanced technologies work well here.
To tackle the weeding part, deduplication tools and methods that identify non-responsive content can be deployed to narrow the problem area. Technologies that detect and ignore legal disclaimers and boilerplate language (for example, an email footer that mentions “privileged and confidential”) can keep documents out of the “for privilege review” pile when they don’t belong there.
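As a rough illustration of the weeding step, the sketch below strips a disclaimer footer before hashing each document, so that copies differing only in boilerplate fall into the same duplicate group, and so that a keyword such as "privileged" only counts when it appears in the body. Everything here is an assumption made for illustration: the single `DISCLAIMER_RE` pattern, the function names, and the use of an exact SHA-256 hash (which catches exact duplicates after normalization, not near-duplicates). It is not how any particular review platform works.

```python
import hashlib
import re

# Hypothetical disclaimer pattern; a real matter would maintain its own list.
DISCLAIMER_RE = re.compile(
    r"this (?:e-?mail|message)[^.]*?(?:privileged|confidential)[^.]*\.",
    re.IGNORECASE,
)


def strip_boilerplate(text: str) -> str:
    """Remove disclaimer sentences that match the pattern above."""
    return DISCLAIMER_RE.sub(" ", text)


def fingerprint(text: str) -> str:
    """Hash the text after removing boilerplate, case, and extra whitespace."""
    normalized = " ".join(strip_boilerplate(text).lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(docs: dict[str, str]) -> dict[str, list[str]]:
    """Group document IDs that share the same fingerprint."""
    groups: dict[str, list[str]] = {}
    for doc_id, text in docs.items():
        groups.setdefault(fingerprint(text), []).append(doc_id)
    return groups


def keyword_hit(text: str, keywords: list[str]) -> bool:
    """Report a hit only when a keyword appears outside the boilerplate."""
    body = strip_boilerplate(text).lower()
    return any(keyword.lower() in body for keyword in keywords)
```

With this in place, a lunch invitation that carries a confidentiality footer would no longer land in the privilege pile on the strength of the footer alone, while a message asking counsel whether something is privileged still would.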
Name normalization techniques can reconcile the many ways a lawyer’s name appears throughout a document population, which elevates those documents and enables more consistent privilege calls. Moving this step to the beginning of the privilege review process can also help identify privilege; most workflows wait until the very end, which risks missing an obscure attorney name hidden in the documents. Analytics tools for email threading group messages and replies so that a reviewer can read an email chain as a coherent conversation rather than a collection of single messages, and can make nuanced coding decisions at the message, branch, or thread level. This is critical in privilege review, as only parts of a given thread may be protected.
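To make these two ideas concrete, here is a minimal sketch. It assumes names can be reduced to a "first initial plus last name" key and that each message records the ID of the message it replies to (as the standard In-Reply-To email header does). The function names and the title list are hypothetical. The key is deliberately crude: it would merge "Jane Doe" and "John Doe", so a real workflow would need a richer roster and human confirmation.

```python
import re
from typing import Optional

# Titles and suffixes to ignore; an illustrative list, not an exhaustive one.
TITLES_RE = re.compile(r"\b(esq|jr|sr|mr|ms|mrs|dr)\b\.?", re.IGNORECASE)


def name_key(raw: str) -> str:
    """Reduce a name variant to a 'first-initial last-name' key."""
    s = re.sub(r"<[^>]*>", " ", raw)  # drop an address in angle brackets
    s = TITLES_RE.sub(" ", s)
    if "," in s:  # "Doe, Jane" -> "Jane Doe"
        last, _, first = s.partition(",")
        s = f"{first} {last}"
    tokens = re.findall(r"[a-z'-]+", s.lower())
    if not tokens:
        return ""
    if len(tokens) == 1:
        return tokens[0]
    return f"{tokens[0][0]} {tokens[-1]}"


def attorneys_in(participants: list[str], roster: list[str]) -> list[str]:
    """Return the participants whose key matches someone on the roster."""
    roster_keys = {name_key(name) for name in roster}
    return [p for p in participants if name_key(p) in roster_keys]


def thread_roots(parents: dict[str, Optional[str]]) -> dict[str, str]:
    """Map each message ID to the root of its reply chain.

    `parents` maps a message ID to the ID it replies to, or None.
    """
    def root(message_id: str) -> str:
        seen = set()
        # Walk up the chain; `seen` guards against a malformed reply loop.
        while parents.get(message_id) is not None and message_id not in seen:
            seen.add(message_id)
            message_id = parents[message_id]
        return message_id

    return {message_id: root(message_id) for message_id in parents}
```

Running the normalization at the start of review means "Doe, Jane", "J. Doe" and "Jane Q. Doe, Esq." all resolve to the same person before any privilege calls are made, and grouping messages by root lets a reviewer code a whole thread, a branch, or a single message.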
Automated privilege logging
The benefits of using advanced analytics and AI tools over keyword lists alone extend beyond identifying privileged information to giving the team a head start on the privilege log. Customized or pre-built settings in certain applications can generate privilege reasons associated with privilege calls, helping to build, tailor, and streamline the log inside a review tool. This not only saves pre-production work, it also provides consistency. (Note that it helps to determine in advance whether email threads will need to be logged as single entries, and whether a categorical privilege log rather than a document-by-document log will suffice given the characteristics of the case and document population.)
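The idea of generating privilege reasons from privilege calls can be sketched as a simple lookup from a coding value to a standard description, written out as a document-by-document log. The `PrivCall` fields, the "AC" and "WP" codes, and the wording of each reason are all assumptions made for illustration; actual log requirements depend on the court, the parties' agreement, and the review tool in use.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class PrivCall:
    """One reviewer decision; field names are hypothetical."""
    doc_id: str
    date: str  # ISO format (YYYY-MM-DD) so that string sorting is chronological
    author: str
    recipients: list[str]
    basis: str  # "AC" or "WP" in this sketch


# Illustrative standard reasons keyed by the coding value.
REASONS = {
    "AC": "Attorney-client privilege: communication seeking or providing legal advice.",
    "WP": "Work product: prepared in anticipation of litigation.",
}


def build_log(calls: list[PrivCall]) -> str:
    """Return a CSV privilege log, one row per document, ordered by date."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(
        ["Doc ID", "Date", "Author", "Recipients", "Privilege Basis", "Description"]
    )
    for call in sorted(calls, key=lambda c: (c.date, c.doc_id)):
        writer.writerow([
            call.doc_id,
            call.date,
            call.author,
            "; ".join(call.recipients),
            call.basis,
            REASONS[call.basis],
        ])
    return buffer.getvalue()
```

Because every entry with the same basis draws on the same description, the log stays consistent no matter which reviewer made the call. A categorical log could be sketched the same way by grouping on the basis instead of writing one row per document.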
This is just the beginning of streamlining privilege review. As time goes on, sophisticated AI tools and prebuilt classifiers will be used more and more to help identify nuanced legal concepts based on semantic content in the documents so that less human intervention is required. More customized efforts will be able to improve the likelihood that data volume can be reduced further without missing potentially privileged information.
As with TAR, the privilege review workflow will become even more refined with time as tools and methods (and developers) evolve with experience. With such promising options available, keyword lists as the primary mechanism for finding potentially privileged information may soon enough be a thing of the past.