Analyzing Causation, Damages in Data Breaches
Issues related to data security and privacy have grown with the exponential increase in the amount of corporate and consumer information being stored and accessed online.
Consumers' personal and financial information topped the list of data breach targets in 2011.
The number of corporate and consumer data breaches has risen dramatically over the past few years. A study by Verizon Communications, Inc., reported that about 174 million digital records were breached in 2011, compared with about 4 million in 2010. There has been a corresponding increase in the number of commercial and class action lawsuits filed in relation to data breaches.
Estimating damages in these matters can be complex, given the difficulty of establishing causality between the breach and any subsequent fraud suffered by individuals whose data could have been compromised, says MIT Professor and Analysis Group affiliate Arnold Barnett.
“But appropriate statistical analyses can paint a clearer picture about the damages sustained,” says Professor Barnett, who has served as a statistical expert in several high-profile matters involving the loss of personal data and damages assessments.
Here, he describes the pitfalls associated with assessing causation in these matters.
Can you talk about your involvement in various data breach cases?
I've worked on a number of cases involving the loss of credit card information in computer breaches and the consequences of these breaches. I'm often retained by law firms to assess liability and alleged damages, and to comment on the statistical reliability of damages estimates provided by the other side.
I use the available data to conduct appropriate statistical analyses of any fraudulent activities associated with the allegedly “at risk” accounts. This means identifying the at-risk population based on, say, the credit cards that were potentially compromised; defining the “fraud window,” or the time period associated with the breach; analyzing the transactions and claims for the at-risk population during that time frame; and comparing the fraud observed against benchmark levels of fraud. Performing this comparison involves a certain amount of statistical detective work.
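As a rough illustration of the steps just described, the following Python sketch restricts a transaction list to the at-risk accounts and the fraud window, computes the observed fraud rate, and compares the fraud count with what a benchmark rate would predict. Every name here (`Transaction`, `fraud_rate_in_window`, `excess_fraud_z`) is hypothetical, and the normal-approximation z-score is only one simple way to make the comparison; it is an assumption for illustration, not a description of the analysis used in any actual matter.

```python
from dataclasses import dataclass
from datetime import date
from math import sqrt


@dataclass(frozen=True)
class Transaction:
    account_id: str   # hypothetical account identifier
    day: date         # transaction date
    is_fraud: bool    # True if the transaction was reported as fraudulent


def fraud_rate_in_window(transactions, at_risk_accounts, window_start, window_end):
    """Return (fraud_count, total_count, rate) for at-risk accounts inside the fraud window."""
    in_scope = [
        t for t in transactions
        if t.account_id in at_risk_accounts and window_start <= t.day <= window_end
    ]
    fraud = sum(t.is_fraud for t in in_scope)
    total = len(in_scope)
    rate = fraud / total if total else 0.0
    return fraud, total, rate


def excess_fraud_z(fraud, total, benchmark_rate):
    """z-score of the observed fraud count against the count expected at the benchmark rate.

    Uses a normal approximation to the binomial, so it is only a rough guide
    and is unreliable when the expected count is small.
    """
    expected = total * benchmark_rate
    sd = sqrt(total * benchmark_rate * (1 - benchmark_rate))
    return (fraud - expected) / sd
```

A z-score near zero would suggest that fraud among the at-risk accounts is in line with the benchmark. A large positive value would flag an excess worth investigating, though by itself it would not show that the breach caused that excess.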
In what way? What’s most challenging about assessing damages in these cases?
Many times, the damages calculations put forward are simplistic. Sometimes investigators use an improper fraud window and blame the breach for fraudulent activities that occurred before the intrusion even happened. Or they fail to distinguish between the kinds of fraud that the breach might have affected and those for which the breach was irrelevant. Often they risk confusing correlation with causality – ignoring the fact that, for any observed change in fraud levels, there may be explanations other than the breach at issue.
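The first two pitfalls mentioned here can be guarded against with a simple screening step before any damages arithmetic. The sketch below is hypothetical throughout: the function name, the tuple layout, and especially the set of "breach-relevant" fraud types are assumptions chosen for illustration. Which fraud types a given breach could plausibly affect depends on what data was actually exposed.

```python
from datetime import date

# Assumption for illustration only: stolen card numbers could plausibly enable
# these fraud types, but not fraud that requires the physical card.
BREACH_RELEVANT_TYPES = {"counterfeit", "card_not_present"}


def screen_fraud_claims(claims, intrusion_date, window_end):
    """Split claims into those the breach could have caused and those it could not.

    Each claim is a (claim_id, fraud_date, fraud_type) tuple.
    Returns (candidates, excluded), where excluded pairs each claim_id with a reason.
    """
    candidates, excluded = [], []
    for claim_id, fraud_date, fraud_type in claims:
        if fraud_date < intrusion_date:
            # Fraud before the intrusion cannot be blamed on the breach.
            excluded.append((claim_id, "predates intrusion"))
        elif fraud_date > window_end:
            excluded.append((claim_id, "after fraud window"))
        elif fraud_type not in BREACH_RELEVANT_TYPES:
            excluded.append((claim_id, "fraud type unrelated to breach"))
        else:
            candidates.append(claim_id)
    return candidates, excluded
```

Claims that survive this screen are only candidates. Deciding how many of them the breach actually caused still requires a comparison against benchmark fraud levels.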
Comparisons between the at-risk population and other populations can be especially problematic. One cannot automatically assume that, absent the breach, the rate of fraud among the at-risk population would equal the fraud rate for the general credit card population. The holders of at-risk accounts may differ appreciably from other account holders on dimensions like income, types of merchants patronized, place of residence, age, gender, and ethnicity.
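One common way to address this kind of mismatch is to standardize the benchmark: apply stratum-specific benchmark fraud rates to the at-risk population's own mix of account holders, rather than applying a single overall rate. The sketch below is a minimal example of that idea. The function names, the strata, and all the figures in the usage note are invented for illustration and do not come from any actual case.

```python
def expected_fraud_standardized(at_risk_counts, benchmark_rates):
    """Expected fraud cases if each at-risk stratum had the benchmark rate of its own stratum.

    at_risk_counts: mapping of stratum -> number of at-risk accounts in that stratum
    benchmark_rates: mapping of stratum -> benchmark fraud rate per account
    """
    missing = set(at_risk_counts) - set(benchmark_rates)
    if missing:
        raise KeyError(f"no benchmark rate for strata: {sorted(missing)}")
    return sum(n * benchmark_rates[s] for s, n in at_risk_counts.items())


def expected_fraud_naive(at_risk_counts, overall_rate):
    """Expected fraud cases if every at-risk account had the overall benchmark rate."""
    return sum(at_risk_counts.values()) * overall_rate
```

For example, suppose the at-risk accounts are 800 urban and 200 rural, the hypothetical benchmark rates are 3% urban and 1% rural, and the overall benchmark rate is 2%. The standardized baseline is then about 26 expected fraud cases, while the naive baseline is 20. An analysis using the naive baseline would attribute roughly six ordinary fraud cases to the breach.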
Moreover, credit card holders who learn that a breach may have affected their accounts may review their billing statements with extra care. As a result, they may detect and report a higher fraction of the fraudulent transactions made on their accounts than other members of the credit card population do.
Clearly, data analysis that isn’t thorough and careful can do more to confuse than to enlighten.