• The Depth of Provider Data

    Health care institutions that are early adopters of integrated data systems now have access to rapidly growing databases containing multiple petabytes of data.1

    Although considerations of big data in health care typically focus on patient information (e.g., individual health information and administrative claims data), health care providers, such as hospitals, also create massive quantities of internal personnel data. Hospitals have challenging staffing requirements. They must recruit and retain large numbers of individuals with different skill sets and somehow get them to work on a variety of different days and times. Hospitals stay open on holidays, weekends, and even during the Super Bowl. These staffing requirements result in complex compensation practices, which create large quantities of information on employees. Because the information that is available for such individuals is often detailed and disaggregated, it offers significant analytic opportunities and challenges. The richness of the data makes it possible to investigate the impact of specific activities on different types of hospital employees and behaviors.

    Analysis of disaggregated, hospital-based compensation data proved useful, for example, in an antitrust case in which plaintiffs alleged that a conspiracy among a group of hospitals suppressed the wages of nurses working in those hospitals. The complete enterprise payroll system in this case was more than three terabytes in size. At the class certification stage of the case, the question arose whether the alleged conspiracy affected the compensation of all of the nurses in the putative class or only some of them.

    A basic economic implication of the plaintiffs’ theory was that if the hospitals successfully suppressed nurse wages, fewer nurses would have been willing to work in the hospitals. But because hospitals have minimum staffing requirements, shifts cannot go unstaffed. This meant that if the hospitals successfully suppressed nurse wages, some class members’ opportunities to earn higher-paying overtime could have increased as a result of the conspiracy. In other words, the alleged conspiracy could actually have increased some nurses’ total compensation.
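    The arithmetic behind this point can be illustrated with a minimal sketch. Every figure below is a hypothetical assumption chosen for illustration (the wage levels, the hours, and the 1.5x overtime premium); none comes from the case itself. The sketch only shows that a lower base wage combined with additional overtime hours can yield higher total pay for one nurse and lower total pay for another.

```python
def total_pay(base_wage, regular_hours, overtime_hours, overtime_multiplier=1.5):
    """Weekly pay: regular hours at the base wage plus overtime at a premium.

    The 1.5x overtime multiplier is an illustrative assumption.
    """
    regular = base_wage * regular_hours
    overtime = base_wage * overtime_multiplier * overtime_hours
    return regular + overtime


# Hypothetical benchmark without the alleged conspiracy:
# $40/hour, 36 regular hours, no overtime.
benchmark = total_pay(40.0, 36, 0)

# Hypothetical nurse A under a suppressed wage of $38/hour who picks up
# 4 overtime hours to cover shifts that would otherwise go unstaffed.
nurse_a = total_pay(38.0, 36, 4)

# Hypothetical nurse B under the same suppressed wage with no overtime.
nurse_b = total_pay(38.0, 36, 0)

for label, pay in [("benchmark", benchmark), ("nurse A", nurse_a), ("nurse B", nurse_b)]:
    print(f"{label}: ${pay:,.2f} (difference from benchmark: ${pay - benchmark:+,.2f})")
```

    Under these assumed numbers, nurse A earns more than the benchmark while nurse B earns less, which is why data separating base pay from overtime matters for assessing whether any impact was common to the class.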

    The possibility that the alleged conspiracy increased some nurses’ total compensation had important implications for the commonality of the alleged conspiracy’s impact – a key question at the class certification stage. Evaluating this possibility required data on both the quantity (“how much”) and type (“base pay or overtime”) of compensation nurses received. In the past, such disaggregated data on how nurses were paid might have been either unavailable or too time-consuming and costly to analyze. Vice President Dov Rothman, who has worked on many cases involving very large data sets, explains, “The big data revolution has relaxed analytic constraints on data that is both ‘long,’ meaning it includes information on numerous individuals, and ‘wide,’ containing extensive details on all individuals.” Rothman adds, “Access to big data created by health care providers creates analytic opportunities that simply were not previously available.” ■


    1. Beth Israel Deaconess Medical Center (BIDMC), which is affiliated with Harvard University, now houses three petabytes of data, which is growing at a rate of 25 percent each year. John D. Halamka, “Early Experiences with Big Data at an Academic Medical Center,” Health Affairs 33, no. 7 (2014): 1132–1138.