Applying Data Science to Meet Real-World Health Care Challenges
With advanced statistical approaches, health care scientists can extract knowledge from what might otherwise be unmanageable datasets and sources of information.
One of the most pressing challenges businesses face today is how to harness an ever-growing expanse of available information. In the health care field, researchers and decision makers must cope with massive datasets generated from a wide array of new sources in disparate formats. In this context, the tools provided by data science – including machine learning, natural language processing (NLP), voice and imaging recognition, and data visualization – are becoming indispensable for researchers, practitioners, and regulators.
Health care data analyses were once confined primarily to whatever information was generated in clinical trials. Now, analyses of real-world datasets (e.g., payer claims, electronic medical record (EMR) systems, registries) are becoming more commonplace. Researchers also are tapping into data sources that were wholly unimaginable not that long ago. For example, social media posts, emails, and other electronic sources are being analyzed in efforts to detect unforeseen events or previously unidentified conditions. Wearable technology and fitness tracking apps that monitor an individual’s health in real time are helping to manage chronic conditions, support disease diagnosis, and improve overall health. Patient-reported outcomes (PROs) are shedding light on the impact of particular interventions on quality of life, with the documentation of physical function, psychological and social well-being, and satisfaction with care. Handwritten physician notes, audio files, and image scans are now finding their way into large-scale data analyses as well. (See graphic.)
Advances in high-performance computing and flexible data storage undoubtedly are helping to make all this possible, but the advanced methodological approaches offered by data science can be a game-changer. Such techniques provide the ability to rapidly analyze large amounts of information in virtually any format. Take, for example, China’s massive unstructured EMR system, where NLP can help researchers identify and standardize terminology across billions of free-text fields – a key first step for many analyses. Advanced statistical techniques also are often important for developing sophisticated algorithms to monitor distribution of controlled substances and flag potential misuse.
In addition, machine learning techniques can be used to improve the accuracy of other common analyses, such as propensity score matching, predicting the risk of developing a disease, and predicting treatment outcomes. They are proving valuable for identifying patterns that are otherwise indiscernible, such as clustering patients by subtle clinical differences.
These are just a few examples of data science in practice. As the volume and types of available data continue to expand across the increasingly complex global “datasphere,” researchers will continue to identify new ways to use these tools to answer a wide range of health care questions more efficiently and generate deeper insights. ■