
Data Mining Competition from FICO and UCSD

I am a sucker for competitions. I have run a few in the past, and I see my page on the Traveling Tournament Problem as an indefinite-length computational competition. Data mining naturally leads to competitions: there are so many alternative techniques out there, and little idea of which will work well or poorly on a particular data set. The preeminent challenge of this type is the Netflix Prize, where the goal is to better predict customers' movie ratings and win a million dollars in doing so. I have written before about the lessons to be learned from this particular challenge (in short: while it might be a nice exercise, it is pretty clear that the algorithm's improvements would have little noticeable effect on the customer experience).

FICO (formerly known as Fair Isaac) has sponsored a data mining competition with the University of California, San Diego for a number of years. The competition is open to all students (and postdocs), and the organizers have just announced the 2009 competition. The competition website is now open, and the competition ends on July 15, 2009.

The task involves detecting anomalous e-commerce transactions, and the data come in “easy” and “hard” versions. I have spent a couple of minutes with the data, and it is quite interesting to work with.

I do have one complaint with this sort of data mining.  In my data mining class, I stress that you can do much better data mining if you understand the business context.  This understanding need not be overly deep, but it is hard to analyze data that is simply given as “field1”, “field2”, and so on.  For problems where creating new fields is important (say, aggregating ten types of insurance policies into one new field giving number of insurance policies purchased), if you don’t understand what the data means, it is impossible to generate appropriate new fields.  The data set in this competition has had its fields anonymized so strongly that finding any creative new fields will be more a matter of luck than anything else.
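To make the insurance example concrete, here is a minimal Python sketch of that kind of derived field. The ten policy-type column names and the 0/1 "holds this policy" encoding are assumptions made up for illustration; they do not come from the competition data or any real data set.

```python
# Hypothetical policy-type columns (illustrative names only, not from real data).
# Each column is assumed to be a 0/1 flag: does the customer hold this policy type?
POLICY_TYPES = ["auto", "home", "life", "health", "disability",
                "boat", "motorcycle", "renters", "umbrella", "travel"]


def add_policy_count(record: dict) -> dict:
    """Return a copy of record with a derived 'num_policies' field.

    Missing policy columns are treated as 0 (policy not held).
    The input record is not modified.
    """
    derived = dict(record)
    derived["num_policies"] = sum(int(record.get(p, 0)) for p in POLICY_TYPES)
    return derived


if __name__ == "__main__":
    # Two made-up customers for illustration.
    customers = [
        {"id": 1, "auto": 1, "home": 1, "life": 0, "umbrella": 1},
        {"id": 2, "renters": 1},
    ]
    for c in customers:
        print(c["id"], add_policy_count(c)["num_policies"])
```

Knowing what the columns mean is what makes this sum sensible. If the same ten columns arrived with anonymized names like `field1`, `field2`, and so on, with no description, nothing in the data itself would suggest that adding them up is meaningful.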

Despite this caveat, I think the competition is a great chance for students to show off what they have learned or developed. It would be particularly nice for an operations research approach to do well. And, unlike the Traveling Tournament Problem or, it seems, the Netflix Prize, it doesn’t last forever.