The New York Times Magazine this week is a special issue on debt (a topic that has a particular resonance for me: we are still paying off an expensive, but spectacular, year in New Zealand!). There is a fascinating article on what credit card companies can learn about you based on your spending. For instance,
A 2002 study of how customers of Canadian Tire were using the company’s credit cards found that 2,220 of 100,000 cardholders who used their credit cards in drinking places missed four payments within the next 12 months. By contrast, only 530 of the cardholders who used their credit cards at the dentist missed four payments within the next 12 months.
A factor of 4 is a pretty significant difference. That should be enough to change the interest rate offered (and 2% default in 2002 is pretty high). The illustrations to the article go on to suggest that chrome accessories for your car are a sign of much more likely default, while premium bird seed suggests likely on-time payment.
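To check the arithmetic behind that "factor of 4", here is a quick sketch in Python. It assumes the dentist figure is also out of 100,000 cardholders, which the quote implies but does not state outright; the function and variable names are mine, not from the study.

```python
# Figures quoted from the article. ASSUMPTION: the dentist count (530) is per
# 100,000 cardholders, like the bar count; the quote does not say so explicitly.
CARDHOLDERS = 100_000


def missed_payment_rate(missed: int, cardholders: int = CARDHOLDERS) -> float:
    """Fraction of cardholders who missed four payments within 12 months."""
    return missed / cardholders


bar_rate = missed_payment_rate(2_220)      # used the card in drinking places
dentist_rate = missed_payment_rate(530)    # used the card at the dentist
relative_risk = bar_rate / dentist_rate    # how many times more likely

print(f"bar:     {bar_rate:.2%}")
print(f"dentist: {dentist_rate:.2%}")
print(f"ratio:   {relative_risk:.1f}x")
```

Under that assumption the rates come out to about 2.2% and 0.5%, a ratio of roughly 4.2.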
The article was not primarily about these issues: it was about how companies learn about defaulters in order to connect with them so that they will be more likely to pay back (or will pay back more). But the illustrations did get me thinking again about the ethics of data mining. Is it “right” to penalize people for activities that don’t have a direct effect on their ability to pay back but only a statistical correlation? Similar issues came up earlier when American Express started to penalize people who shopped at dollar stores.
I brought this up during my ethics talk in my data mining course, and my MBA students were split on this. On one hand, companies discriminate on statistical correlations a lot: teenage boys pay more for insurance than middle-aged women, for instance. But it seems unfair to penalize people for simply choosing to purchase one item over another. Isn’t that what capitalism is about? But statistics don’t lie. Or do they? Do statistics from the past hold equivalent relevancy in today’s unusual economy? Is relying on past statistics making today’s economy even worse? Should a company search for something with a more direct correlation or is this correlation enough?
At the Tepper School at Carnegie Mellon, we generally put more faith in so-called structural models, rather than statistical models. Can we get at the heart of what makes people default on credit card debt? For instance, spending more than you earn seems like one thing that might directly affect the ability to pay back debt. It is hard to come up with a model where paying for drinks at the bar has a similar effect. But structural models tend to be pretty reduced-form. It is hard to include 85,000 different items (like in the study reported by the New York Times) in such models.
I vacillate a lot about this issue. Right now, I am feeling that data mining like the “spend in a bar implies default on credit cards” can lead to interesting insights and directions, but those insights would not be actionable without some more fundamental insight into behavior. The level of “fundamentalness” would depend on the application: if I am simply deciding on a marketing campaign, I might not require too much insight; if I am setting or reducing credit limits, I would require much more.
I guess this is particularly critical to me since I often play the bank at our Friday Beers. Since we might have 20 people at Beers, the tab can reach $300, and I sometimes grab the cash from the table and pay by credit card. Either the credit card companies have to come up with new rules (“If the tip is > 25% [as it often is with us: they do like us at the bar!], then credit is OK; else ding the record”) or I better hit the ATM on Friday afternoons.
3 thoughts on “Whither Data Mining?”
Mike, you can take your ethical questions to the next level by asking the same questions in health care. Health care companies use similar techniques to score people’s propensity to become ill, or need treatment of some kind. In this case the action may only be an informational campaign, but the consequences of inaccuracy in these campaigns are more important and waiting for better information (or models) is not very attractive.
It really depends on what the end goal is. If you just want to predict who is going to default, then a predictive model (statistical, if we use your terminology) is fine. You just care about predicting, and you make the assumption that the past data set is representative of the future. If someone takes the output of the model and tries to “game” the system, then the past data is not representative of the future anymore.
If you want to devise policy (the goal of many economists) then you need a structural model that will give some information about causality.
I would like to assume that a credit card company will not try to steer people to drink less and buy more bird seed in order to decrease its default rate, so a predictive model seems fine in this case.
I guess, to me, basing decisions on consumer behavior and spending patterns would lead to discrimination. It’s like trying to say that a genetic makeup that gives a person a predisposition to cancer or alcoholism is the same as having a financial risk predisposition.
Your DNA is your DNA, period. But your behaviors can change based on conditions, as the article describes with Maslow’s hierarchy of needs. As Chris said, the insurance industry uses the same approach to determine premiums. The bottom line is that the mortgage and credit card industries have engaged in predatory practices and loaned to people they know cannot pay.