The use of big data and related analytics is increasingly widespread in health care. In a recent conversation, Principal Howard Birnbaum met with Managing Principals Mei Sheng Duh and Eric Wu to discuss the future of medical big data research.
Mei Sheng Duh: An area where data have exploded recently is the Internet. More than 100 million people are estimated to be discussing their health online, which means that the Internet has emerged as a substantial source of information about a wide range of medical conditions and their treatments. In a time series analysis, we found that, for certain adverse events, patient self-reports on the Internet and in social media predict the adverse event reports captured in the FDA’s MedWatch system.
Eric Wu: Social networking is alive and well in China today. It is currently being used by physicians to organize research efforts and to follow up with patients to collect better and more complete health care data.
Mei Sheng Duh: Social media chats and patient forum postings are “numerator-based” data, in contrast to traditional health economics and outcomes research data, such as insurance claims data or electronic medical records data, which are “denominator-based.” To illustrate the challenges with numerator-based data, consider the example of an anti-obesity drug that was withdrawn from the market in 2010 based on safety concerns raised by the FDA. In a study of Internet postings, we found that online patients were younger than their FDA MedWatch counterparts and that online ratings were actually higher for this anti-obesity drug than for many safe drugs. This suggests that Internet data can carry a high degree of responder bias and may lead to spurious conclusions if they are not viewed with caution.
Eric Wu: We have used a variety of data sources to conduct health economics and outcomes research and have published or presented our research findings. For example, multi-center EMR data provide rich information on clinical and economic outcomes, including diagnosis, treatment, lab results, results from other exams, as well as costs. The data can be used to address a wide range of research questions.
Eric Wu: There are many good data sets collected in China. One major challenge, however, is that data are often stored in an unstructured format. We have developed a natural language processing system to extract useful data elements from the unstructured raw data.
From Health Care Bulletin: Fall 2015