Using Machine Learning in Litigation
The proliferation of data increasingly calls for analyses that go beyond the limits of familiar tools such as spreadsheets and conventional statistical software.
In health care, the advent of electronic medical records, the marked decline in DNA sequencing costs, and the introduction of industry reporting requirements such as the Sunshine Act have ballooned the volume of available data. In retail, advances in payment technology allow point-of-sale devices to capture millions of individual transactions, resulting in much larger data sets along with increased security risks. Indeed, in almost any business, the volume of unstructured information contained in electronic documents and communications such as email and instant messaging is now enormous. In a litigation context, this proliferation of data can be daunting.
Enter machine learning. Machine learning uses algorithms to detect complex and unforeseen relationships in high-dimensional data, that is, data with a large number of variables, whether numeric or unstructured (e.g., text or visual images).
These new techniques can help attorneys improve legal strategies, conduct better-informed fact discovery, provide testifying experts with a more complete set of relevant information, and prepare analyses at a finer level of granularity than was previously practical.
Here are a few examples of how attorneys can leverage machine learning:
Crafting a legal strategy.
Machine learning can be applied during the discovery phase of litigation to quickly find relevant information in large quantities of data. Consider a dispute over alleged off-label promotion of prescription drugs. Conventional analyses might serve as a blunt instrument, grouping together all patients with a particular condition (e.g., lung cancer). Machine learning methods, on the other hand, can identify similarities among patients based on a wider and deeper range of variables or characteristics. (See figure.) Such clustering could reveal clinical differences (e.g., advanced age, failure of other cancer therapies, genetic markers) among groups of patients that might explain use of the drug independent of any promotion. Uncovering these types of patterns at an early stage can be beneficial to attorneys as they contemplate the theory of the case.
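To make the clustering idea concrete, below is a minimal sketch of k-means, one common clustering algorithm; the source does not say which method it has in mind. The patient records, the three features (age in decades, number of failed prior therapies, and a 0/1 genetic-marker flag), and the function name `kmeans` are all hypothetical. The features are deliberately kept on roughly comparable scales because k-means relies on Euclidean distance; real data would normally be standardized first.

```python
import math
from typing import List, Sequence

Point = List[float]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 20) -> List[int]:
    """Return a cluster label (0..k-1) for each point using Lloyd's k-means."""
    # Farthest-first initialization: start from the first point, then repeatedly
    # add the point farthest from every centroid chosen so far.
    centroids: List[Point] = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each patient joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # no patient changed cluster, so stop
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its member patients.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical patients: [age in decades, failed prior therapies, genetic marker (0/1)]
patients = [
    [5.2, 0, 0],
    [5.5, 0, 0],
    [5.0, 1, 0],
    [7.8, 3, 1],
    [8.1, 2, 1],
    [7.5, 3, 1],
]
labels = kmeans(patients, k=2)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

Here the older, heavily pretreated, marker-positive patients separate from the rest without anyone specifying that rule in advance, which is the kind of pattern the discussion above describes.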
Accessing information in unstructured communications.
Unlike conventional statistical methods, machine learning algorithms can be "taught" to recognize the importance of particular word and phrase combinations or other characteristics within documents such as published articles, patent claims, medical notes, regulatory filings, and emails. These characteristics can then be linked to specified outcomes and used to improve predictions or support an argument.
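A minimal sketch of this "teaching" step, assuming a multinomial naive Bayes classifier, one simple technique for linking word patterns to labeled outcomes and not necessarily what any particular expert or vendor uses. The training documents, the labels `relevant` and `not_relevant`, and the function names are hypothetical; real systems train on far larger coded sets and richer features such as multi-word phrases.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on whitespace; real pipelines also handle punctuation and phrases.
    return text.lower().split()


def train_naive_bayes(docs):
    """Fit a multinomial naive Bayes model from (text, label) pairs."""
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)  # label -> token frequencies
    vocab = set()
    for text, label in docs:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    priors = {label: n / len(docs) for label, n in label_counts.items()}
    return priors, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior score for `text`."""
    priors, word_counts, vocab = model
    scores = {}
    for label, prior in priors.items():
        total = sum(word_counts[label].values())
        score = math.log(prior)
        for token in tokenize(text):
            if token in vocab:  # words never seen in training carry no evidence
                # Laplace (add-one) smoothing avoids log(0) for unseen label/word pairs.
                score += math.log((word_counts[label][token] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical hand-coded training documents.
training_docs = [
    ("patient failed prior chemotherapy", "relevant"),
    ("advanced stage after failed therapy", "relevant"),
    ("quarterly sales meeting agenda", "not_relevant"),
    ("sales team lunch schedule", "not_relevant"),
]
model = train_naive_bayes(training_docs)
print(predict(model, "patient failed therapy"))  # → relevant
print(predict(model, "sales lunch"))  # → not_relevant
```

The model never receives an explicit rule; it infers from the coded examples which words tend to accompany each outcome.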
In patent infringement cases, for example, machine learning algorithms with natural language processing capabilities can sort through reams of filings to reveal features common to desired outcomes. Combined with other data, this information can be used to approximate the patent office processes that lead to final decisions. Such predictions can help the parties decide whether to negotiate a settlement or proceed with costly litigation.
Mining data more efficiently to strengthen arguments.
Machine learning can make use of the vast amounts of data in a company’s possession to conduct more sophisticated analyses that support testimony or provide counterfactual scenarios. Information that might once have been discarded as impractical or irrelevant for expert modeling purposes can be mined for use in discovery or economic analysis.
For example, a discrimination case may be proven or refuted on the basis of unstructured data such as email and voicemail communications. Conventional methods can be cumbersome, consuming valuable time and staff resources to sift through physical records. A natural language processing algorithm based on machine learning can make searches more effective while reducing the time and effort required.
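One simple way to rank communications by relevance to a search is TF-IDF weighting with cosine similarity, sketched below. This is a classic information-retrieval baseline rather than a trained model; a machine learning workflow would typically add a classifier trained on documents that reviewers have already coded. The emails, the query, and the function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per document, plus the IDF table."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Rare terms get higher weight; the +1 keeps terms found in every email above zero.
    idf = {term: math.log(n / count) + 1 for term, count in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in tokenized]
    return vectors, idf


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(emails, query):
    """Return email indices ordered from most to least similar to the query."""
    vectors, idf = tfidf_vectors(emails)
    # Query terms that appear in no email are dropped.
    q = {t: c * idf[t] for t, c in Counter(query.lower().split()).items() if t in idf}
    scores = [cosine(q, v) for v in vectors]
    return sorted(range(len(emails)), key=lambda i: scores[i], reverse=True)


# Hypothetical email snippets.
emails = [
    "lunch order for friday",
    "she is too old for the promotion",
    "promotion criteria for the sales team",
    "quarterly budget review",
]
order = rank(emails, "too old for promotion")
print(order)  # → [1, 2, 0, 3]
```

Reviewers can then start with the highest-ranked messages instead of reading every record in sequence.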
How to Employ a Machine Learning Approach
Of course, as was the case with other new techniques introduced to the courtroom (e.g., fingerprint analysis, DNA evidence), testifying experts' reliance on machine learning might invite initial skepticism. When using such a methodology, the expert will need to rigorously validate the chosen model and evaluate whether its results are meaningful and sufficiently accurate (e.g., a model that correctly predicts an outcome 90 percent of the time but has a high false positive rate might not be appropriate). Testifying experts using these methods will also need to educate the court about these less familiar models and persuade it of their validity.
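The accuracy caveat can be checked with simple arithmetic. The counts below are hypothetical: of 1,000 cases, 900 are truly positive and 100 truly negative. Because positives dominate, a model can be right 90 percent of the time while still wrongly flagging most of the negative cases. The function name `classification_metrics` is illustrative.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute headline metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # Share of truly negative cases the model wrongly flags as positive.
        "false_positive_rate": fp / (fp + tn),
        # Share of flagged cases that are truly positive.
        "precision": tp / (tp + fp),
    }


# Hypothetical: 1,000 cases, 900 truly positive and 100 truly negative.
metrics = classification_metrics(tp=880, fp=80, tn=20, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.900
# false_positive_rate: 0.800
# precision: 0.917
```

Reporting accuracy alone would hide that 80 of the 100 negative cases are misclassified, which is why validation should examine several metrics together.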
If appropriate care is taken, widespread adoption of machine learning may prove to be a significant advantage in the increasingly complex and technical world of litigation. ■
Adapted from “Machine-Learning Algorithms Can Help Health Care Litigation,” by Lisa B. Pinheiro, Jimmy Royer, Nick Dadson, and Paul E. Greenberg, published on Law360.com, June 8, 2016; and “Practical Uses For Machine Learning In Health Care Cases,” by Mihran Yenikomshian, Lisa B. Pinheiro, Jimmy Royer, and Paul E. Greenberg, published on Law360.com, September 22, 2016.