• New Challenges and Initiatives in Big Data Processing

     As the quantity of available data increases and traditional processing approaches become cumbersome and costly, Analysis Group teams have adopted a novel approach: leveraging the graphics processing unit (GPU) cores in graphics cards for parallel processing. Managing Principal Lisa Pinheiro notes that this can “increase processing speeds by a factor of 20 to 100 in common, time-intensive applications.” She adds, “This not only cuts time and costs, it also offers new opportunities to explore more complex analytic methods and detailed interpretation of results.”
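The data-parallel idea behind GPU acceleration can be sketched with NumPy-style array operations: one element-wise computation is applied across an entire cohort at once instead of looping patient by patient. This is a minimal CPU-side sketch, not Analysis Group's actual pipeline; the per-patient costs and the 20% effect size are invented, and real GPU work would use a GPU array library such as CuPy, which mirrors the NumPy API.

```python
# Sketch of data-parallel computation (hypothetical data). On a GPU,
# libraries such as CuPy expose the same array API as NumPy, so the
# vectorized version below is the pattern that GPU cores accelerate.
import numpy as np

def simulate_costs_loop(baseline, effect):
    """Scalar loop: one patient at a time (slow in pure Python)."""
    return [b * (1.0 - effect) for b in baseline]

def simulate_costs_vectorized(baseline, effect):
    """Vectorized: the whole cohort in one data-parallel operation."""
    return baseline * (1.0 - effect)

baseline = np.array([100.0, 250.0, 80.0])  # hypothetical per-patient annual cost
effect = 0.2                               # assumed 20% cost reduction

loop_result = simulate_costs_loop(baseline, effect)
vec_result = simulate_costs_vectorized(baseline, effect)
```

Both versions compute the same result; the vectorized form is what parallel hardware can execute across many cores simultaneously.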

    Methodologies, the types of models used, why they are used, and examples of applications:

    Methodology: Predictive Modeling and Machine Learning Algorithms
    Models used: Neural networks, random forests, clustering
    Why used: Predicting outcomes by training an algorithmic model on subgroups of observations typically outperforms standard regression approaches when many variables affect the outcome in varying and nonlinear ways
    Example applications: Predicting drug adherence based on patient and drug characteristics; predicting ultimate drug success based on characteristics of the market and the sponsoring company; identifying cancerous pixels in medical imaging; predicting resource utilization or side effects based on patient characteristics and history

    Methodology: Bootstrap
    Models used: GLM models
    Why used: Calculating confidence intervals for nonlinear models requires simulation
    Example applications: Calculating confidence intervals around estimates of cost savings or resource utilization associated with a treatment

    Methodology: Cross-Validation
    Models used: All models
    Why used: Cross-validation selects models with the best predictive power, i.e., those that perform best when applied to new data
    Example applications: Cross-validating a patient classification system to ensure appropriate accuracy before implementation

    Methodology: Simulation
    Models used: All models
    Why used: Simulations are useful for rare-event modeling
    Example applications: Simulating the future development of rare diseases or side effects
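The predictive-modeling idea of learning from similar past observations can be illustrated with a deliberately tiny sketch: a 1-nearest-neighbor classifier predicting drug adherence from two invented patient features. This is a hypothetical example, not a model from the text; real applications would use the richer models named above (neural networks, random forests).

```python
# Hypothetical sketch: predict adherence (1 = adherent, 0 = not) from
# two invented patient features, [age, comorbidity count], by copying
# the label of the most similar training patient.
import numpy as np

X_train = np.array([[30, 0], [35, 1], [70, 4], [65, 3]], dtype=float)
y_train = np.array([1, 1, 0, 0])  # invented adherence labels

def predict(x):
    """Label of the closest training patient (Euclidean distance)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    return int(y_train[np.argmin(dists)])

young = predict(np.array([32.0, 0.0]))  # resembles the adherent patients
older = predict(np.array([68.0, 4.0]))  # resembles the non-adherent patients
```

The same pattern, trained on many variables and observations, is where such algorithmic models outperform standard regression when effects are nonlinear.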
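The bootstrap row above can be sketched concretely: resample the observed data with replacement many times, recompute the statistic on each resample, and read a confidence interval off the resulting distribution. The per-patient savings data below are simulated purely for illustration.

```python
# Hypothetical sketch of a percentile bootstrap confidence interval
# around a mean per-patient cost saving (all numbers invented).
import numpy as np

rng = np.random.default_rng(0)
savings = rng.normal(loc=500.0, scale=200.0, size=300)  # simulated savings

n_boot = 2000
boot_means = np.empty(n_boot)
for i in range(n_boot):
    # resample patients with replacement and recompute the mean saving
    resample = rng.choice(savings, size=savings.size, replace=True)
    boot_means[i] = resample.mean()

lo, hi = np.percentile(boot_means, [2.5, 97.5])  # 95% percentile CI
```

The same recipe works for statistics from nonlinear GLMs where no closed-form interval exists, which is the use case the table describes.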
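Cross-validation, as described above, scores each candidate model only on data it was not trained on. A minimal k-fold sketch, with invented data and `np.polyfit` standing in for the candidate models:

```python
# Hedged sketch of 5-fold cross-validation for model selection
# (hypothetical data; polynomial fits stand in for candidate models).
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 100)
y = 2.0 * x + rng.normal(scale=0.1, size=100)  # truly linear signal

def cv_mse(degree, k=5):
    """Mean held-out squared error of a degree-`degree` polynomial."""
    idx = rng.permutation(x.size)
    folds = np.array_split(idx, k)
    errs = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)           # train on the other folds
        coeffs = np.polyfit(x[train], y[train], degree)
        pred = np.polyval(coeffs, x[fold])        # predict the held-out fold
        errs.append(np.mean((y[fold] - pred) ** 2))
    return float(np.mean(errs))

scores = {d: cv_mse(d) for d in (1, 8)}           # simple vs. flexible model
best_degree = min(scores, key=scores.get)
```

Held-out error penalizes the overly flexible model for fitting noise, which is how cross-validation guards a patient classification system against looking accurate only on its own training data.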
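The simulation row can be sketched as a Monte Carlo experiment for a rare event: draw many synthetic cohorts and study the distribution of event counts. The 0.1% incidence and cohort size below are assumptions for illustration only.

```python
# Hypothetical Monte Carlo sketch: count patients developing a rare
# side effect (assumed 0.1% incidence) across simulated cohorts of
# 10,000, to study the distribution of rare-event counts.
import numpy as np

rng = np.random.default_rng(2)
incidence = 0.001      # assumed per-patient probability of the event
cohort_size = 10_000
n_sims = 5_000

events = rng.binomial(cohort_size, incidence, size=n_sims)
sim_mean = events.mean()                 # should be near 10 expected events
p_zero = float((events == 0).mean())     # chance a cohort sees no events
```

Because rare events rarely appear in any single observed dataset, simulation of this kind is often the only practical way to quantify their future development.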