Five Common Mistakes In Keyword Search: How Many Do You Make?
When you’re a kid, you love games that are easy to learn and play, whether they’re interactive games, board games or card games. One of the first card games many kids learn is “Go Fish.” It’s easy to learn because you simply ask the other player if they have any cards of a certain kind (e.g., “got any Kings?”). If they do, you collect those cards from them; if they don’t, they say “Go Fish,” you draw a card from the deck and your turn ends. Easy, right?
Conducting keyword searching without a planned, controlled process that includes testing and verifying the results is somewhat like playing “Go Fish”: you might get lucky and retrieve the documents you need to support your case (without retrieving too many others), or you might not. Yet many lawyers and legal professionals think they “get” keyword searching. Why? Because they learned it in law school using Westlaw and Lexis? Or because they know how to use Google to locate web pages related to their topics? Those tools are designed to find a single item (or a handful of items) on the one topic you are researching.
Keyword searching for electronic discovery is different. It is about balancing recall (retrieving as many of the responsive documents as possible) and precision (retrieving as few non-responsive documents as possible) to produce a proportional volume of electronically stored information (ESI) that is responsive to the case. Depending on the issues in the case, that could mean thousands or even millions of responsive documents.
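To make the trade-off concrete, here is a minimal Python sketch of how recall and precision are calculated. The function names and all of the counts are hypothetical and chosen only for illustration; in a real matter the total number of responsive documents is not known up front and has to be estimated.

```python
def precision(responsive_retrieved: int, total_retrieved: int) -> float:
    """Share of the retrieved documents that are actually responsive."""
    return responsive_retrieved / total_retrieved


def recall(responsive_retrieved: int, total_responsive: int) -> float:
    """Share of all responsive documents that the search retrieved."""
    return responsive_retrieved / total_responsive


# Hypothetical counts, for illustration only.
total_retrieved = 20_000       # documents hit by the search
responsive_retrieved = 5_000   # hits that reviewers marked responsive
total_responsive = 8_000       # responsive documents in the whole collection

p = precision(responsive_retrieved, total_retrieved)
r = recall(responsive_retrieved, total_responsive)
print(f"precision = {p:.1%}, recall = {r:.1%}")
```

With these made-up numbers, only a quarter of what the search pulls back is responsive (precision of 25%), and it finds 62.5% of the responsive documents in the collection (recall). Broadening the search would tend to raise recall and lower precision; narrowing it tends to do the reverse.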
Five Common Keyword Searching Mistakes
With that in mind, here are five common mistakes that lawyers and legal professionals make when conducting keyword searches:
1. Poor Use of Wildcards: Wildcard characters can help expand the scope of a search, but only if you use them well and understand how the search engine you’re using applies them (warning: don’t use Google’s search engine as an exemplar). A poorly placed or ill-advised wildcard can completely blow up a search. A few years ago, there was a case where one of the goals was to identify documents related to apps on devices (mobile and PC), so the legal team decided to use the search term “app*” to retrieve words like “app”, “application”, “apps”, etc. Great, right? Not when that same term also retrieves words like “appear”, “apparent”, “applied”, “appraise”, etc. A better search in this case would have been (app or apps or application*). Think through word variability and consider every word form the search could hit. Also check that the wildcard operator is attached at the right point in the stem of the word so that all of the intended variants are hit. If it isn’t, the search might sweep in too many unrelated words or omit words you want to capture.
2. Use of Noise or Stop Words: To keep retrieval fast even in large databases, most platforms don’t index certain common words that appear regularly (known as “noise” or “stop” words). Yet many legal professionals fail to keep these words out of the searches they run, which yields unexpected results. Search terms such as “management did” or “counseled out” won’t work as intended if “did” and “out” are noise words that can’t be retrieved. A typical platform leaves 100 or more words unindexed, so it’s important to know what they are and plan around them when creating searches that get you as close as possible to your desired result.
3. Starting with Searches That Are Too Broad: Another common mistake is to start with searches that are too broad, assuming that the result will be easy to narrow down through additional searches. In fact, you may get a result that makes it nearly impossible to determine what is causing your search to retrieve unexpected documents. Keyword search works best when the hard work has been done up front, either by working with subject matter experts who can provide insight into the vocabulary likely to be used (e.g., shorthand, code words, slang) or via a targeted exploration of the document population. That knowledge, coupled with the effective use of Boolean operators like AND, OR, and NOT, should enable you to craft initial searches that put targeted words in the appropriate context, increasing the likelihood that relevant material will be found at the outset. Those results then provide the material for developing additional, more precise searches.
4. Failing to Test What’s Retrieved: Many legal professionals create a search, run it and then proceed to review without testing the results. Reviewing a random sample of the results can quickly reveal a search that is considerably overbroad and would yield a low prevalence of responsive documents, driving up costs for review and production. Testing the result set to ensure the search is properly scoped is well worth the extra time and effort, given the potential cost savings. Better to review an extra few hundred documents than an extra hundred thousand documents.
5. Failing to Test What’s Not Retrieved: It’s just as important to test the documents that were not retrieved by a search (the “null set”) to identify material that was potentially missed. A random sample of the null set not only helps identify searches that were too narrow in scope, it is also important in addressing defensibility concerns if your search process is challenged by opposing counsel.
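Several of the checks in the list above are easy to try before committing to a search. The Python sketch below is illustrative only. The vocabulary, the stop-word list, the function names and the coding function are all assumptions, and the wildcard matching uses Python’s fnmatch module rather than any review platform’s own syntax, which may behave differently.

```python
import fnmatch
import random

# Hypothetical stop-word list; check the list your own platform actually uses.
STOP_WORDS = {"a", "an", "and", "did", "of", "out", "the", "to"}

# Hypothetical sample of words found in the document index.
VOCABULARY = ["app", "apps", "application", "applications",
              "appear", "apparent", "applied", "appraise"]


def expand_wildcard(pattern, vocabulary):
    """Return the indexed words that a wildcard term would hit."""
    return sorted(w for w in vocabulary if fnmatch.fnmatchcase(w, pattern))


def stop_words_in(phrase, stop_words=STOP_WORDS):
    """Return the words in a search phrase that the index would ignore."""
    return [w for w in phrase.lower().split() if w in stop_words]


def sample_rate(doc_ids, is_responsive, sample_size, seed=0):
    """Estimate the responsive rate of a document set from a random sample.

    `is_responsive` stands in for a reviewer's coding decision.
    """
    docs = list(doc_ids)
    sample = random.Random(seed).sample(docs, min(sample_size, len(docs)))
    if not sample:
        return 0.0
    return sum(1 for d in sample if is_responsive(d)) / len(sample)


# Mistake 1: see what "app*" really pulls in, then compare a narrower search.
broad = expand_wildcard("app*", VOCABULARY)
narrow = sorted({"app", "apps"} | set(expand_wildcard("application*", VOCABULARY)))
print("app* hits:", broad)
print("narrowed hits:", narrow)

# Mistake 2: flag search phrases that rely on unindexed words.
for phrase in ["management did", "counseled out"]:
    print(phrase, "-> stop words:", stop_words_in(phrase))
```

The same sample_rate function can be pointed at the documents a search retrieved (mistake 4) or at the documents it left behind (mistake 5). A low rate in the first case suggests the search is overbroad, while a noticeable rate in the second suggests it is too narrow.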
The “Go Fish” analogy isn’t an original one. Then-New York Magistrate Judge Andrew J. Peck used it in his article Search, Forward over nine years ago (October 2011) when he observed that “many counsel still use the ‘Go Fish’ model of keyword search.” If you’re making some of the mistakes listed above, you might be doing so as well. Proper keyword searching is an expert-planned and managed process that avoids these mistakes to maximize the proportionality and defensibility of your discovery process. It’s not a kid’s game, so make sure you don’t treat it like one.