Natural Language Processing

One powerful application of machine learning is natural language processing (NLP), which extracts useful data elements from raw, unstructured data. Using language- and grammar-specific constructs, NLP combines algorithms and artificial intelligence tools to analyze, extract, and classify human communications from sources such as online reviews, patent claims, physician notes in a medical file, insurance claims, and even audio recordings. The structured output can then be used to develop and implement predictive models across a number of sectors.

We have used NLP to help clients:

  • Sort through reams of filings and online product reviews to reveal which features consumers considered relevant in patent infringement and consumer protection cases
  • Collect, cluster, and analyze information from a variety of unstructured financial text sources in order to find relationships between certain language and conduct in the financial markets
  • Examine detailed online booking data, such as pricing or other specific characteristics, to measure the impact of constraints on competition (or of removing those constraints) on consumer welfare
  • Uncover issues that are not captured by traditional patient-reported outcome (PRO) instruments, including by analyzing social media and online patient data on medical conditions and their treatments
  • Efficiently conduct literature reviews by organizing large datasets of scientific articles, ranking abstracts by expected relevance to particular research topics, and highlighting changes in research topics over time
  • Develop a method to identify and standardize medical terminology, such as disease names, from the unstructured medical data in China’s electronic medical record (EMR) system
  • Identify causally related medical device adverse event reports in the Food and Drug Administration's narrative-based Manufacturer and User Facility Device Experience (MAUDE) database
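
To make one of these tasks concrete, ranking abstracts by expected relevance to a research topic can be sketched with a simple TF-IDF weighting and cosine similarity. This is only an illustrative sketch using the Python standard library, not a description of the method used in any engagement; the function names, the sample abstracts, and the query are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def rank_abstracts(query, abstracts):
    """Return (index, score) pairs sorted by descending cosine similarity."""
    docs = [Counter(tokenize(a)) for a in abstracts]
    n_docs = len(docs)

    # Document frequency: number of abstracts that contain each term.
    doc_freq = Counter()
    for counts in docs:
        doc_freq.update(counts.keys())

    def weigh(counts):
        # Smoothed IDF keeps weights positive; terms absent from the corpus are dropped.
        return {
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, tf in counts.items()
            if term in doc_freq
        }

    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        norm_u = math.sqrt(sum(w * w for w in u.values()))
        norm_v = math.sqrt(sum(w * w for w in v.values()))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

    query_vec = weigh(Counter(tokenize(query)))
    scores = [(i, cosine(query_vec, weigh(d))) for i, d in enumerate(docs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical abstracts and topic query, for illustration only.
abstracts = [
    "Patient reported outcomes in asthma treatment trials",
    "Pricing dynamics in online hotel booking platforms",
    "Adverse event reports for cardiac medical devices",
]
query = "medical device adverse events"

ranked = rank_abstracts(query, abstracts)
for index, score in ranked:
    print(f"{score:.3f}  {abstracts[index]}")
```

Because this sketch matches exact word forms only, "device" does not match "devices"; a practical pipeline would likely add stemming or lemmatization, and possibly domain-specific vocabularies, before weighting terms.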