Developments in Medical Big Data Research: What Makes Data Different Today?
The use of big data and related analytics is increasingly widespread in health care. In a recent conversation, Senior Advisor Howard Birnbaum met with Managing Principals Mei Sheng Duh and Eric Wu to discuss the future of medical big data research.
Howard Birnbaum: Big data discussions are meant to drive health economics and outcomes research and produce novel epidemiological insights. But what is really different about big data today?
Mei Sheng Duh: An area where data have exploded recently is the Internet. More than 100 million people are estimated to be discussing their health online, which means that the Internet has emerged as a substantial source of information about a wide range of medical conditions and their treatments. In a time series analysis, we found that for certain adverse events, patient self-reports on the Internet and in social media are a predictor of adverse event reporting captured in the FDA’s MedWatch system.
Howard Birnbaum: Eric, what are your thoughts about the role of social networks in health care research in China?
Eric Wu: Social networking is alive and well in China today. It is currently being used by physicians to organize research efforts and to follow up with patients to collect better and more complete health care data.
Medical Data in China
Health care data in China contain clinical and economic information that can support research and decision making
Commonly used data types include electronic medical records (EMRs), claims data, chart reviews, patient and physician surveys, and registry data
Challenges in using these data include fragmentation and a lack of standardization and accuracy
Innovative approaches can generate high-quality data with rich clinical and economic information from multiple centers
Howard Birnbaum: Mei, can you describe how these Internet and social media data relate to traditional data?
Mei Sheng Duh: Social media chats and patient forum postings are “numerator-based” data: they capture reported events but not the size of the population from which those reports come. They contrast with traditional health economics and outcomes research data, such as insurance claims data or electronic medical records data, which are “denominator-based” because the underlying patient population is known. To illustrate the challenges with numerator-based data, consider the example of an anti-obesity drug that was withdrawn from the market in 2010 based on safety concerns raised by the FDA. In a study of Internet postings, we found that online patients were younger than their FDA MedWatch counterparts and that online ratings were actually higher for this anti-obesity drug than for many safe drugs. This suggests that Internet data can have a high degree of responder bias and can lead to spurious conclusions if the underlying data are not interpreted with caution.
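The distinction between numerator-based and denominator-based data can be illustrated with a small Python sketch. All counts and function names below are hypothetical and are not taken from the study discussed in the interview. The sketch only shows why a rate can be computed when the population at risk is known, and why the same number of online reports is compatible with very different rates when it is not.

```python
def incidence_rate(event_count, persons_at_risk):
    """Return events per 1,000 persons. Requires a known denominator."""
    if persons_at_risk <= 0:
        raise ValueError("persons_at_risk must be positive")
    return 1000 * event_count / persons_at_risk


def rates_under_assumed_denominators(event_count, candidate_denominators):
    """Show how the implied rate changes with each assumed denominator."""
    return {n: incidence_rate(event_count, n) for n in candidate_denominators}


# Denominator-based (hypothetical claims data): 30 events among
# 12,000 enrolled patients, so the rate is well defined.
claims_rate = incidence_rate(event_count=30, persons_at_risk=12_000)

# Numerator-based (hypothetical forum postings): 30 reports, but the
# number of users taking the drug is unknown. Each guess at the
# denominator implies a different rate.
forum_scenarios = rates_under_assumed_denominators(30, [1_000, 100_000])

print(claims_rate)
print(forum_scenarios)
```

With the denominator unknown, the forum count alone cannot distinguish a common event from a rare one, which is one reason such data call for cautious interpretation.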
Howard Birnbaum: There are different data sources in China for health care research. Eric, what has your experience been in China?
Eric Wu: We have used a variety of data sources to conduct health economics and outcomes research and have published or presented our research findings. For example, multi-center EMR data provide rich information on clinical and economic outcomes, including diagnosis, treatment, lab results, results from other exams, as well as costs. The data can be used to address a wide range of research questions.
Social Media Data
Social media and online patient data on medical conditions and their treatments can uncover issues that are not captured by traditional patient-reported outcomes (PRO) instruments
Time series analysis suggests that for certain adverse events, online sources may signal adverse events earlier than traditional FDA adverse event reporting data
Responder bias can limit the generalizability and reliability of Internet data, so online data used for PROs require careful analysis and attention to their limitations
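One simple way to explore whether online reports lead official reports is a lagged correlation. The sketch below is an illustration only: the interview does not specify the time series method that was used, and the weekly counts, function names, and maximum lag are all hypothetical. It correlates online counts in week t with official counts in week t + lag and picks the lag with the strongest correlation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # correlation is undefined for a constant series
    return cov / (sd_x * sd_y)


def lagged_correlations(online, official, max_lag):
    """Correlate online[t] with official[t + lag] for lag = 0..max_lag."""
    if len(online) != len(official):
        raise ValueError("series must have the same length")
    return {
        lag: pearson(online[: len(online) - lag], official[lag:])
        for lag in range(max_lag + 1)
    }


# Hypothetical weekly counts of reports for one adverse event.
# The official series is the online series delayed by two weeks.
online_counts = [2, 3, 8, 12, 6, 3, 2, 2, 1, 1]
official_counts = [1, 1, 2, 3, 8, 12, 6, 3, 2, 2]

correlations = lagged_correlations(online_counts, official_counts, max_lag=3)
best_lag = max(correlations, key=correlations.get)
print(best_lag, correlations)
```

A strong correlation at a positive lag is only suggestive. Responder bias and shared external drivers, such as media coverage, can produce the same pattern, so a real analysis would need further modeling.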
Howard Birnbaum: Could you please provide an example of how we handle the data challenges in health economics and outcomes research (HEOR)?
Eric Wu: There are many good data sets collected in China. One major challenge, however, is that data are often stored in an unstructured format. We have developed a natural language processing system to extract useful data elements from the unstructured raw data.
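To make the idea of extracting data elements from unstructured text concrete, here is a minimal rule-based sketch in Python. It is not the natural language processing system described above, whose design is not given in the interview. The note text, the list of lab names, and the function name are hypothetical, and a production system would need to handle far more variation, including Chinese-language text.

```python
import re

# Hypothetical pattern: a lab name, an optional connector ("is", "was",
# ":" or "="), a numeric value, and an optional unit.
LAB_PATTERN = re.compile(
    r"(?P<name>HbA1c|LDL|creatinine)\s*(?:is|was|:|=)?\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>%|mg/dL|mmol/L)?",
    re.IGNORECASE,
)


def extract_labs(note):
    """Return one structured record per lab result found in a free-text note."""
    return [
        {
            "name": match.group("name"),
            "value": float(match.group("value")),
            "unit": match.group("unit"),  # None when no unit is written
        }
        for match in LAB_PATTERN.finditer(note)
    ]


# Hypothetical free-text clinical note.
note = "Patient seen for follow-up. HbA1c was 7.8 %, LDL: 130 mg/dL. Started metformin."
records = extract_labs(note)
for record in records:
    print(record)
```

Each record can then be loaded into a structured table alongside coded fields such as diagnosis and cost, which is the kind of output that makes unstructured records usable for outcomes research.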
Howard Birnbaum: Thank you both. Clearly, big data discussions are entering uncharted territory and offer great potential to address today’s pressing health policy and research issues.
These analyses were originally presented as part of the symposium “Developments in Medical Big Data Research: United States and China” at the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) 20th Annual International Meeting in Philadelphia, Pennsylvania.