Economic Consulting in the Data Science Era
Consultants to law firms, corporations, and government agencies are using data science to create more powerful predictive models and inform critical decision making.
The volume of data around us continues to grow at a staggering pace. According to a recent study, 90% of all data in existence today was generated in the past two years. Harnessing this data and using it to inform critical decisions is no easy task. The sheer volume can be overwhelming, with much of the data coming from a multitude of sources and in a wide range of often-incompatible formats, both structured and unstructured. These challenges bring the risk that “big data” will soon become too big to process and convert into meaningful insights.
This is where data science comes into play. The field of data science comprises a number of statistical and computational approaches and methods, including machine learning, natural language processing (NLP), and data visualization. In each of these, data scientists go beyond traditional analytics to extract deeper knowledge and new insights from datasets and sources that might otherwise be unmanageable.
Analysis Group has long been at the forefront of the disciplines that have evolved into what is known today as data science. Our continued investment in this area helps us to identify previously indiscernible patterns, efficiently comb through unstructured data, and generate more accurate and powerful predictive models. In collaboration with leading academic and industry experts, we are developing new applications for data science tools across virtually every sector of economic and litigation consulting. Examples include creating custom analytics that help companies develop effective controls against the diversion of opioid drugs; analyzing online product reviews to help assess claims of patent infringement; and efficiently analyzing billions of mutual fund transactions across numerous file formats and platforms. (See accompanying table.)
In addition, the integration of data science across our client work often means we are identifying new approaches to solve known problems. NLP is known to many as an e-discovery tool for processing documents and emails efficiently; we are also using it to gather and analyze valuable intelligence from online product reviews on websites such as Amazon and from the ever-expanding array of social media platforms. Machine learning can also be used to detect complex and unforeseen relationships across numerous data sources. In our health care work, this might include developing applications to compare reported outcomes across both structured and unstructured datasets, such as spreadsheets, handwritten physician notes, and image scans.
To generate swift and actionable insights from large amounts of data, we must be able to explain how to “connect the dots,” and then validate the results. Most machine learning tools, for example, rely on sophisticated, complex algorithms that can be perceived as a “black box.” If these tools are used inappropriately, the results can be biased or even incorrect. For this reason, it is important to fully vet and discuss the available data and choice of methodology with clients or adjudicators. This transparency allows us to deliver actionable and understandable analytics through dynamic, interactive platforms and dashboards.
The expanding world of available data has its challenges. Many of these newer data sources, especially user-generated data, bring risks and tradeoffs. While much of the data is freely available and accessible, it can carry biases that need to be addressed. For example, Amazon reviews could be artificially weighted or otherwise influenced. The overall quality of data from user-generated sources can also be uncertain. Addressing these kinds of issues in a verifiable way requires a sophisticated understanding of advanced analytical methodologies at the intersection of computer science, mathematics, statistics, and economics.
As the volume of available information continues to expand, the challenge of extracting value from the data will only grow more complex. It will be important to take full advantage of further enhancements in data storage, retrieval, and processing to keep pace. Equally important will be continuing to empower key stakeholders and decision makers – whether in the boardroom or the courtroom – by making the data, and the insights it can deliver, understandable and compelling. This will likely continue to require developing new data science tools and applications, as well as enhancing stakeholders’ ability to view and manipulate the data in real time through the continued development and refinement of user-friendly dashboards.