Machine Power: Applications of Natural Language Processing in Economic Consulting
Just as computational methods changed the way economic consultants analyzed quantitative data, natural language processing (NLP) algorithms are transforming the analysis of unstructured data.
The change is both one of scale – NLP algorithms can process vastly larger amounts of information than has previously been possible – and one of nature – they create entirely new analytical possibilities for detecting and extracting patterns from large unstructured datasets.
Here are three examples of this transformation in action.
Many litigation matters depend on the comparison of large datasets, such as those containing financial or health care claims information. These datasets, however, are often rife with inconsistencies in formatting or naming conventions, which cause an exact-matching algorithm to miss many close correspondences between records. While cleaning and manually matching the data may be practical for smaller datasets, it becomes impractical once the volume of data grows too large.
However, it is possible to create an algorithm that not only finds exact matches between data fields but also singles out the most promising ‘almost matches’ for manual review and confirmation, as explained in this video.
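To make the idea concrete, here is a minimal sketch of such a matcher in Python. It is not the method shown in the video: it assumes records are plain name strings, uses the standard library's `difflib.SequenceMatcher` as the similarity measure, and picks an illustrative review threshold of 0.85. The function names `normalize` and `match_records` are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def match_records(left, right, review_threshold=0.85):
    """Sort `left` records into exact matches, near matches for review, and unmatched.

    The threshold is an assumption for illustration; in practice it would be
    tuned against a hand-labeled sample of the data.
    """
    right_by_norm = {normalize(r): r for r in right}
    exact, review, unmatched = [], [], []
    for record in left:
        key = normalize(record)
        if key in right_by_norm:
            # Identical after normalization: treat as an exact match.
            exact.append((record, right_by_norm[key]))
            continue
        # Otherwise score against every candidate and keep the best one.
        best, best_score = None, 0.0
        for norm, original in right_by_norm.items():
            score = SequenceMatcher(None, key, norm).ratio()
            if score > best_score:
                best, best_score = original, score
        if best_score >= review_threshold:
            review.append((record, best, round(best_score, 3)))
        else:
            unmatched.append(record)
    return exact, review, unmatched
```

Comparing every record against every candidate is quadratic, so a sketch like this suits modest volumes; larger datasets would typically need a blocking or indexing step first to limit which pairs are scored.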
Text Similarity Analysis for Copyright Infringement Cases
Copyright infringement cases involving texts often turn on the question of whether one party improperly reproduced content copyrighted by another party, making minor changes to the text to avoid the two being identical.
NLP methods, however, can gauge the similarity of two texts beyond simple identity. They produce similarity measures that can help determine whether plagiarism or infringement has occurred. The available techniques range from deep learning (a form of machine learning) to simple scoring algorithms that a judge or jury can interpret. One such tool is detailed in this video.
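As an illustration of the simple end of that range, the sketch below scores two texts by the overlap of their three-word sequences (a Jaccard similarity over word "shingles"). This is an assumed example, not the tool from the video; the function names and the choice of three-word sequences are hypothetical.

```python
import re


def shingles(text: str, n: int = 3) -> set:
    """Return the set of overlapping n-word sequences in `text`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard_similarity(a: str, b: str, n: int = 3) -> float:
    """Share of n-word sequences the two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        # Texts shorter than n words have no shingles to compare.
        return 0.0
    return len(sa & sb) / len(sa | sb)


original = "the quick brown fox jumps over the lazy dog"
altered = "the quick brown fox leaps over the lazy dog"
print(jaccard_similarity(original, altered))
```

A score like this is easy to explain: it is the fraction of distinct three-word phrases that appear in both texts. Changing a single word alters every phrase that contains it, so lightly edited copies still score well above unrelated texts, but the measure does not capture paraphrase the way a learned model might.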
Chatroom Transcript Analysis for Investigations and Litigation
A number of finance and competition investigations and cases have centered on allegations that traders colluded to fix prices by using online chatrooms to exchange information. Reviewing the transcripts of these conversations, which can run into the millions of pages, is often prohibitively expensive and time-consuming. However, NLP algorithms can overcome these hurdles and make these analyses manageable by allowing case teams to discard irrelevant material and focus only on the pertinent sections of these transcripts, as shown in the video below.
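One very simple way to narrow a transcript is sketched below: flag messages that contain terms of interest and keep a window of surrounding messages for context. This is an assumption-laden illustration rather than the approach in the video. The watch-list, the `flag_messages` function, and the one-message context window are all hypothetical, and real matters would likely rely on richer models than keyword matching.

```python
import re

# Hypothetical watch-list; a real matter would derive terms from the allegations.
WATCH_TERMS = ["fix", "rate", "bid", "spread", "keep this between us"]


def flag_messages(messages, terms=WATCH_TERMS, context=1):
    """Return sorted indices of messages worth reviewing.

    A message is flagged when it contains any watch term as a whole word or
    phrase; `context` neighbors on each side are kept so reviewers can read
    the exchange around each hit.
    """
    patterns = [re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE) for t in terms]
    hits = [i for i, msg in enumerate(messages) if any(p.search(msg) for p in patterns)]
    keep = set()
    for i in hits:
        start = max(0, i - context)
        stop = min(len(messages), i + context + 1)
        keep.update(range(start, stop))
    return sorted(keep)
```

Everything outside the returned indices can be set aside for a first pass, which is the sense in which case teams discard irrelevant material and focus on the pertinent sections.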