Data Mining Competition from FICO and UCSD

I am a sucker for competitions.  I have run a few in the past, and I see my page on the Traveling Tournament Problem as an indefinite-length computational competition.  Data mining naturally leads to competitions:  there are so many alternative techniques out there and little idea of what might work well or poorly on a particular data set.  The preeminent challenge of this type is the Netflix Prize, where the goal is to better predict customer movie ratings, and win a million dollars in doing so.  I have written before about the lessons to be learned from this particular challenge (in short:  while it might be a nice exercise, it is pretty clear that the improvements given by the algorithm would have little noticeable effect on the customer experience).

FICO (formerly known as FairIsaac) has sponsored a data mining competition with the University of California San Diego for a number of years.  The competition is open to all students (and postdocs), and the organizers have just announced the 2009 competition.  The website for the competition is now open, with a finish date for the competition of July 15, 2009.

The data involve detecting anomalous e-commerce transactions and come in “easy” and “hard” versions.  I have spent a couple of minutes with the data and it is quite interesting to work with.

I do have one complaint with this sort of data mining.  In my data mining class, I stress that you can do much better data mining if you understand the business context.  This understanding need not be overly deep, but it is hard to analyze data that is simply given as “field1”, “field2”, and so on.  For problems where creating new fields is important (say, aggregating ten types of insurance policies into one new field giving the number of insurance policies purchased), if you don’t understand what the data mean, it is nearly impossible to generate appropriate new fields.  The data set in this competition has had its fields anonymized so strongly that finding any creative new fields will be more a matter of luck than anything else.
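To make the insurance example concrete, here is a minimal sketch in Python.  The field names (has_auto, has_home, has_life) and the three customer records are invented for illustration and have nothing to do with the competition data.  The point is that the list POLICY_FIELDS can only be written by someone who knows which columns are insurance policies, which is exactly what anonymized fields hide.

```python
# Hypothetical customer records: field names and values are invented for illustration.
customers = [
    {"id": 1, "has_auto": 1, "has_home": 0, "has_life": 1},
    {"id": 2, "has_auto": 0, "has_home": 0, "has_life": 0},
    {"id": 3, "has_auto": 1, "has_home": 1, "has_life": 1},
]

# Knowing WHICH fields are policies is the business context that anonymization removes.
POLICY_FIELDS = ["has_auto", "has_home", "has_life"]


def add_policy_count(record, policy_fields=POLICY_FIELDS):
    """Return a copy of the record with a derived 'num_policies' field."""
    enriched = dict(record)
    enriched["num_policies"] = sum(record[f] for f in policy_fields)
    return enriched


enriched_customers = [add_policy_count(c) for c in customers]
for c in enriched_customers:
    print(c["id"], c["num_policies"])
```

With fields named only “field1”, “field2”, and so on, there is no principled way to pick the columns to sum, so a derived field like this would have to be found by trial and error.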

Despite this caveat, I think the competition is a great chance for students to show off what they have learned or developed.  It would be particularly nice for an operations research approach to do well.  And it doesn’t last forever like the Traveling Tournament Problem or, it seems, the Netflix Prize.

Kindle and Math

Added January 6, 2012.  Note that this post refers to the Kindle circa 2009.  See this discussion on reddit for more recent (late 2011) information.  Unfortunately I no longer use a Kindle so I cannot provide any updated information.

The new Kindle from Amazon is out, and it is receiving a lot of press.  Aurelie Thiele points out the funny pricing of Amazon.  Of course, none of this is open in any sense of the word:  Amazon wants to keep control here.  I bought a Kindle for a trip I am on, and I really enjoy it so far, but I really bought it for research reasons: for reasons I’ll make clear in an upcoming post, I really need to travel with a large amount of technical material, so I thought this would be a good thing.  But how to get math on the Kindle?  My friend and sometime co-author Stan Zin has been working on this and writes:

I converted a pdf file of a paper with lots of math into a Kindle-readable azw file (using @free.kindle.com).  It can’t handle the math very well, especially multi-line formulas.  Basically the math is completely garbled in translation.  I also tried to first convert pdf to a graphics file (eg, jpg, gif, png) then convert that to an azw file.  Now it’s unreadable because of scale.  The Kindle version doesn’t seem to be zoomable, and so is also unreadable.  Since Kindle’s azw format will handle Greek letters, as well as subscripts and superscripts, it would seem to have all the necessary components for generating complicated math.  But the conversion step from pdf doesn’t seem to be the way to go.  I was wondering if your OR blog readers might take this as a challenge.  How hard could it be for a hacker to create a LaTex2azw program?  I think there would be a big demand if it worked reasonably well.

What do you think?  Is there a way to get math on the Kindle?

ROADEF Challenge

The French OR society ROADEF runs a challenge every two years. This year’s challenge (actually the 2009 challenge) is on an interesting and important topic: flight disruption.

—————————————————————–
Subject of the challenge 2009: Disruption Management for commercial aviation
—————————————————————–
Commercial airlines operate flights according to a published flight schedule that is optimized from a revenue standpoint. However, external events such as mechanical failures, personnel strikes, or inclement weather frequently occur, disrupting planned airline operations. In such cases, it is necessary to find efficient solutions that minimize the duration of the disruption, thus minimizing its impact.

Traditionally, resources are re-allocated sequentially, according to a natural hierarchy: aircraft, crew, passengers. Nonetheless, this method is seriously flawed. Namely, decision making at the local level, concerning one resource, can lead to global repercussions, affecting all resources. For example, a change in the flight schedule, potentially impacting aircraft resources, may also lead to a missed connection for crew or for passengers. Thus, an increasing effort is being made to integrate different levels of decision making. The topic proposed for this challenge follows this trend, since it aims at re-assigning aircraft and passengers simultaneously in case of disruptions.

There are already some instances at the challenge web site to get you started.

More do OR and make money!

Francisco Marco-Serrano, author of the fmwaves blog, pointed out another contest to determine a recommender system.  Unlike the Netflix contest, which might not end up with a winner, the MyStrands contest appears to guarantee $100,000, and all that is needed is an idea.  Again, this seems like a good opportunity for those in OR to use our approaches to attack these sorts of problems (it seems that CS “owns” recommender systems, for no obvious reason).

Hmmmm…. blog recommendations?  systems that allow faculty to rate students?  sabbatical destination recommendations?   What do I need recommended, and how can OR help?

Aurelie Thiele on Operations Research

Aurelie Thiele of Lehigh has a wide-ranging blog on “Thoughts on business, engineering and higher education”. Many of her posts are on operations research. I particularly liked her thoughtful piece on the role operations research plays in the Grand Challenges in Engineering. The leadership of INFORMS put together a white paper on the subject, which I wrote about a while ago. Aurelie takes issue with the fragmented aspect of the proposed role:

The applications-driven paper lacks the unifying theme that a focus on information management would have provided, and instead OR comes across as an add-on to other people’s expertise – certainly valuable, but not critical. I’m not sure why anyone would want to be portrayed as jack of all trades but master of none… The authors also miss the opportunity to portray operations researchers as the center of inter-disciplinary teams bringing scientists from various disciplines together, drawing from their experience in one area to help researchers in another. When I finished reading the paper I wasn’t particularly excited to be working in the field, but I give the authors credit for trying – marketing OR is an uphill battle, given the aversion to math of most regular folks, and every little thing helps.

I have been struggling with this very issue during my year in New Zealand (since I have had the opportunity to give a number of “big picture” talks). Is OR just a collection of tools that we jealously guard or is there more commonality amongst us? And are we critical, or just a bit of sprinkle on top of the ice cream sundae? Of course, I remain excited by the field, and reading the rest of the blog, I think Aurelie does also.

Aurelie has been blogging since March: I can’t believe it took me so long to stumble across this excellent blog. Check it out!

Student finds small Turing machine

I wrote previously about a competition Stephen Wolfram (of Mathematica fame) had to show that a particularly small Turing Machine (the “2,3 Turing Machine”) is universal. Sure enough it is, as shown by a University of Birmingham (UK) student, Alex Smith, as reported in Nature. This machine, as shown in a diagram from Wolfram’s blog on the prize, has just 2 states and 3 colors. It is certainly surprising that such a simple structure can compute anything any computer can!
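To give a feel for how little machinery “2 states and 3 colors” is, here is a minimal Python sketch of a simulator for a machine of that size.  The rule table is an illustrative assumption on my part, not something taken from this post, so check it against Wolfram’s prize page before reading anything into its behaviour.  The names RULES and run are likewise just for this sketch.

```python
from collections import defaultdict

# ILLUSTRATIVE rule table (an assumption, not verified against Wolfram's prize page).
# Key:   (current state, colour under the head)
# Value: (colour to write, head move: +1 right / -1 left, next state)
RULES = {
    ("A", 0): (1, +1, "B"),
    ("A", 1): (2, -1, "A"),
    ("A", 2): (1, -1, "A"),
    ("B", 0): (2, -1, "A"),
    ("B", 1): (2, +1, "B"),
    ("B", 2): (0, +1, "A"),
}


def run(rules, steps, start_state="A"):
    """Run the machine for a fixed number of steps on an initially blank tape."""
    tape = defaultdict(int)  # unvisited cells hold colour 0
    head, state = 0, start_state
    for _ in range(steps):
        write, move, state = rules[(state, tape[head])]
        tape[head] = write  # write before moving the head
        head += move
    return dict(tape), head, state


tape, head, state = run(RULES, 5)
cells = [tape.get(i, 0) for i in range(min(tape), max(tape) + 1)]
print(state, head, cells)
```

The whole machine is the six entries of the table: two states times three colors.  There is no halting state here, so the sketch simply stops after a chosen number of steps.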

I had hoped that operations research might be of help in this, but that does not seem to be the case.

Smith learned about Wolfram’s challenge in an Internet chat room and almost immediately went to work fiddling with the machine. After learning its behaviour, he set about proving that it was computationally equivalent to another type of simple, conceptual computer known as a tag system.

Mathematicians have already shown that tag systems can compute any problem, so proving the two were equivalent effectively proved the power of Wolfram’s machine. Smith’s proof is 44 pages long.

My (what is the word for “person who makes me jealous”? Hmmm… I’ll make up a word: “Jealous Idol”) jidol, Scott Aaronson managed to tear himself away from supermodels to provide a quote (albeit of a “raining on a parade” type):

The solution isn’t hugely relevant to modern computer science, says Scott Aaronson, a computer scientist at the Massachusetts Institute of Technology (MIT) in Cambridge, Massachusetts. “Most theoretical computer scientists don’t particularly care about finding the smallest universal Turing machines,” he wrote in an e-mail. “They see it as a recreational pursuit that interested people in the 60s and 70s but is now sort of ‘retro’.”

Thanks to Kathryn Cramer for pointing out the result of this competition. She also has a very nice posting in her blog on visualizing the effect of the Federal Reserve rate cut, relevant to my previous posting on visualization.

Operations Research and Turing Machines

From slashdot.com:

An anonymous reader writes “Stephen Wolfram, creator of Mathematica and author of A New Kind of Science, is offering a prize of $25K to anyone who can prove or disprove his conjecture that a particular 2-state, 3-color Turing machine is universal. If true, it would be the simplest universal TM, and possibly the simplest universal computational system. The announcement comes on the 5-year anniversary of the publication of NKS, where among other things Wolfram introduced the current reigning TM champion — ‘rule 110,’ with 2 states and 5 colors.”

Operations research (particularly integer programming) seems relevant to this work (through things like the undecidability of integer quadratic programming). $25,000 anyone?

Grand Challenges in Engineering

The National Academy of Engineering is soliciting thoughts on “Grand Challenges for Engineering”, setting an agenda for the next 100 years. Of course, such a goal is impossibly ambitious: imagine trying to do such a thing in 1907, and come even close to what the last 100 years have achieved (or failed at). But in OR we understand the value of long-term planning with rolling horizons. If we don’t think about where we might want to go, we can’t even take the first step.

What is the role of OR in these Grand Challenges? As a field, we seem to be successful in the details, but less successful in the big picture. Even trying to define “who we are” causes more smoke than light. But we should think big: without the skills and knowledge of those in OR, any problem sufficiently broad and important to justify the title “Grand Challenge” is doomed to failure.

INFORMS President Brenda Dietrich, in her recent OR/MS Today article, talks about getting outside our comfort zone, and her comments really hit home. I am comfortable doing my research, and teaching students. I have gone a bit outside my comfort zone by moving to New Zealand for a year, but am I really stretching myself?

Thinking about Grand Challenges is a good way to get outside your comfort zone. Art Geoffrion and others have put together a wiki for OR people to think about Grand Challenges in Engineering and the role OR has to play. I think it is a great idea to spend some time thinking about the “big picture” and how engineering and OR fit into it. And definitely check out the “Grand Challenges in OR” site!