Investors and marketers alike are racing to capitalize on the promise of so-called Big Data applications, the “technologies that apply complex algorithms to large, heterodox data sets to extract theoretically more meaningful information than traditional analytics.” By piecing together voluminous, disparate, and sometimes non-intuitive information, Big Data applications generate intelligence that enables companies to service, market, and sell to customers in ways never before possible.
It comes as no surprise, therefore, that Big Data has become big business. According to Gartner, the industry will account for $28 billion of worldwide IT spending this year, rising to $34 billion in 2013.
What makes Big Data impressive, however, is also what makes it problematic from the standpoint of legal risk management, litigation discovery, and regulatory and government investigation response.
At the most basic level, the problem is scale. As Big Data technologies enable businesses to preserve, access, and generate value from ever-growing volumes of data, lawyers and risk management professionals will have to scramble to keep pace. Discovery practitioners strive to set boundaries and apply levers that reduce the volume of data at issue. But when provocative Big Data business practices inevitably draw the attention of privacy advocates, regulators, and plaintiffs’ lawyers, it may not be so easy to impose limits. The very scale and reach of the data may be what litigants are fighting over.
Another issue is data complexity. Big Data applications are breaking new ground by drawing from multiple, largely unstructured sources of information, such as web logs, tweets, and user comments. These sources of data are not only mountainous but also difficult to collect and normalize into the formats typically leveraged by eDiscovery tools and practices. What will it mean for litigators and investigators when the relevant data is no longer a closed universe of email and office documents?
Lastly, even if discovery practitioners can somehow harness Big Data in a physical way, that doesn’t mean they can make sense of it. Assessing information across large, disparate sources for discovery purposes is essentially a Big Data problem itself.
Does the answer lie in technology? Do we simply need a bigger, smarter tool to glean evidence from Big Data? While this idea may be appealing, decades of research have shown that technology is no panacea. Human expertise becomes all the more critical when both the target of the inquiry and the medium in which it resides are complex and unwieldy. According to a recent Harvard Business Review blog post, machine learning may be generating the wrong conclusions 70% to 80% of the time.
When recommending a movie on Netflix or generating a banner ad during a web search, it’s probably okay for the machine to miss the mark. Not so in legal discovery. An overlooked communication or an inadvertently produced secret can have a devastating impact on the outcome of a dispute.
Although eDiscovery battles today are mostly fought over email and other custodial data, it’s not hard to envision a future in which the government and other claimants begin to demand more. And if businesses can deploy Big Data for their commercial needs, judges will expect the same competence when it comes to producing that data in litigation.
Photo Credit: x-ray delta one