{"id":679,"date":"2009-05-17T18:51:17","date_gmt":"2009-05-17T22:51:17","guid":{"rendered":"http:\/\/mat.tepper.cmu.edu\/blog\/?p=679"},"modified":"2009-05-17T18:51:17","modified_gmt":"2009-05-17T22:51:17","slug":"whither-data-mining","status":"publish","type":"post","link":"https:\/\/mat.tepper.cmu.edu\/blog\/index.php\/2009\/05\/17\/whither-data-mining\/","title":{"rendered":"Whither Data Mining?"},"content":{"rendered":"<p>The<em> New York Times Magazine<\/em> this week is a special issue on debt (a topic that has a particular resonance to me:\u00a0 we are still paying off an expensive, but spectacular, <a href=\"http:\/\/greatnzadventure.blogspot.com\">year in New Zealand<\/a>!).\u00a0\u00a0 There is a <a href=\"http:\/\/www.nytimes.com\/2009\/05\/17\/magazine\/17credit-t.html?_r=1&amp;ref=magazine\">fascinating article<\/a> on what credit card companies can learn about you based on your spending.\u00a0\u00a0 For instance,<\/p>\n<blockquote><p>A 2002 study of how customers of Canadian Tire were using the company&#8217;s credit cards found that 2,220 of 100,000 cardholders who used their credit cards in drinking places missed four payments within the next 12 months. 
By contrast, only 530 of the cardholders who used their credit cards at the dentist missed four payments within the next 12 months.<\/p><\/blockquote>\n<p>A factor of about 4 is a pretty significant difference.\u00a0 That should be enough to change the interest rate offered (and 2% default in 2002 is pretty high).\u00a0\u00a0 The illustrations to the article go on to suggest that chrome accessories for your car are a sign of much more likely default, while premium bird seed suggests likely on-time payment.<\/p>\n<p>The article was not primarily about these issues:\u00a0 it was about how companies learn about defaulters in order to connect to them so that they will be more likely to pay back (or will pay back more).\u00a0 But the illustrations did get me thinking again about the ethics of data mining.\u00a0 Is it &#8220;right&#8221; to penalize people for activities that don&#8217;t have a direct effect on their ability to pay back but are only statistically correlated with it?\u00a0 <a href=\"http:\/\/mat.tepper.cmu.edu\/blog\/?p=536\">Similar issues came up earlier<\/a> when American Express started to penalize people who\u00a0 shopped at dollar stores.<\/p>\n<p>I brought this up during my ethics talk in my data mining course, and my MBA students were split on this.\u00a0 On one hand, companies discriminate on statistical correlations a lot:\u00a0 teenage boys pay more for insurance than middle-aged women, for instance.\u00a0 But it seems unfair to penalize people for simply choosing to purchase one item over another.\u00a0 Isn&#8217;t that what capitalism is about?\u00a0 But statistics don&#8217;t lie.\u00a0 Or do they?\u00a0 Do statistics from the past hold equal relevance in today&#8217;s unusual economy?\u00a0 Is relying on past statistics making today&#8217;s economy even worse?\u00a0 Should a company search for something with a more direct correlation or is this correlation enough?<\/p>\n<p>At the <a href=\"http:\/\/www.tepper.cmu.edu\">Tepper School at Carnegie 
Mellon,<\/a> we generally put more faith in so-called structural models, rather than statistical models.\u00a0\u00a0 Can we get at the heart of what makes people default on credit card debt?\u00a0 For instance, spending more than you earn seems like one thing that might directly affect the ability to pay back debt.\u00a0 It is hard to come up with a model where paying for drinks at the bar has a similar effect. But structural models tend to be pretty reduced-form.\u00a0 It is hard to include 85,000 different items (like in the study reported by the New York Times) in such models.<\/p>\n<p>I vacillate a lot about this issue.\u00a0 Right now, I am feeling that data mining like the &#8220;spend in a bar implies default on credit cards&#8221; can lead to interesting insights and directions, but those insights would not be actionable without some more fundamental insight into behavior.\u00a0 The level of &#8220;fundamentalness&#8221;\u00a0 would depend on the application:\u00a0 if I am simply deciding on a marketing campaign, I might not require too much insight;\u00a0 if I am setting or reducing credit limits, I would require much more.<\/p>\n<p>I guess this is particularly critical to me since I often play the bank at our Friday Beers.\u00a0 Since we might have 20 people at Beers, the tab can reach $300, and I sometimes grab the cash from the table and pay by credit card.\u00a0 Either the credit card companies have to come up with new rules (&#8220;If the tip is &gt; 25% [as it often is with us:\u00a0 they do like us at the bar!], then credit is OK; else ding the record&#8221;) or I better hit the ATM on Friday afternoons.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The New York Times Magazine this week is a special issue on debt (a topic that has a particular resonance to me:\u00a0 we are still paying off an expensive, but spectacular, year in New Zealand!).\u00a0\u00a0 There is a fascinating article on what credit card companies can learn about you based on your 
spending.\u00a0\u00a0 For instance, &hellip; <a href=\"https:\/\/mat.tepper.cmu.edu\/blog\/index.php\/2009\/05\/17\/whither-data-mining\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Whither Data Mining?&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[15,39],"tags":[],"class_list":["post-679","post","type-post","status-publish","format-standard","hentry","category-data-mining","category-or-in-the-press"],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/mat.tepper.cmu.edu\/blog\/index.php\/wp-json\/wp\/v2\/posts\/679","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mat.tepper.cmu.edu\/blog\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mat.tepper.cmu.edu\/blog\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mat.tepper.cmu.edu\/blog\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mat.tepper.cmu.edu\/blog\/index.php\/wp-json\/wp\/v2\/comments?post=679"}],"version-history":[{"count":0,"href":"https:\/\/mat.tepper.cmu.edu\/blog\/index.php\/wp-json\/wp\/v2\/posts\/679\/revisions"}],"wp:attachment":[{"href":"https:\/\/mat.tepper.cmu.edu\/blog\/index.php\/wp-json\/wp\/v2\/media?parent=679"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mat.tepper.cmu.edu\/blog\/index.php\/wp-json\/wp\/v2\/categories?post=679"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mat.tepper.cmu.edu\/blog\/index.php\/wp-json\/wp\/v2\/tags?post=679"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}