Sad News on the Netflix Prize

The followup to the wildly successful Netflix Prize has been canceled due to privacy concerns.  From a blog post by Neil Hunt, Chief Product Officer for Netflix:

In the past few months, the Federal Trade Commission (FTC) asked us how a Netflix Prize sequel might affect Netflix members’ privacy, and a lawsuit was filed by KamberLaw LLC pertaining to the sequel. With both the FTC and the plaintiffs’ lawyers, we’ve had very productive discussions centered on our commitment to protecting our members’ privacy.

We have reached an understanding with the FTC and have settled the lawsuit with plaintiffs. The resolution to both matters involves certain parameters for how we use Netflix data in any future research programs.

In light of all this, we have decided to not pursue the Netflix Prize sequel that we announced on August 6, 2009.

This is very sad news.  The first Netflix Prize did a lot of good in showing how combinations of different approaches can work better than any individual approach, and also in showing how difficult it is to find economically significant models for some types of recommendation systems.  I, along with many, had looked forward to the new insights (and possible visibility for operations research) another Netflix Challenge would bring.

For now, we are left with predicting murders in Philadelphia.  Unless someone finds privacy concerns there.

AIMMS Contest

I can’t resist competitions in operations research.  It brings out the competitor in me, even if it is more like ESPN and sports:  I like to watch others doing the work!

AIMMS (whose software I use in class) is sponsoring their second modeling competition, in conjunction with this year’s MOPTA (Modeling and Optimization: Theory and Applications) conference.  Last year’s competition was on scheduling trucks subject to maintenance requirements.  This year’s competition is on creating financial portfolios that embed tax issues:

Classical models used in portfolio optimization focus on return and risk. More complicated models take into account the effect of trading costs. In this case study your team will have to develop a tool to optimize a portfolio in the presence of different tax rules.

Financial models are not really my thing, but the case looks rich and interesting.  I think next year when we do the course, we might just assign the competition as the course project and see if our students can come up with a competitive result.

If anyone at CMU would like to take this on, and would like some faculty support (arranging for an independent study or similar), let me know:  it looks like  a lot of fun.

(In the interest of full disclosure, I should note that my colleague Willem van Hoeve arranged to get the AIMMS software for free for our course.   With AIMMS, we use Gurobi under their standard (free) academic licenses.)

Data Mining, Operations Research, and Predicting Murders

John Toczek, who writes the PuzzlOR column for OR/MS Today  (example), has put together a new operations research/data mining challenge in the spirit of, though without the million dollar reward of, the Netflix Prize.  The Analytics X Prize is a  fascinating problem:

Current Contest – 2010 – Predicting Homicides in Philadelphia

Philadelphia is a city with 5.8 million people spread out over 47 zip codes and, like any major city, it has its share of crime.  The goal of the Analytics X Prize is to use statistical techniques and any data sets you can find to predict where crime, specifically homicides, will occur in the city.  The ability to accurately predict where crime is likely to occur allows us to deploy our limited city resources more effectively.

What I really like about this challenge is how open-ended it is. Unlike the Netflix Prize, there is no data set to be analyzed. It is up to you to determine what might be an interesting/useful/important data set. Should you analyze past murder rates? Newspaper articles? Economic indicators? Success in this might require a team that mixes those who understand societal issues with data miners and operations researchers. This, to me, makes it much more of an operations research challenge than a data mining challenge.

I also like how the Prize handles evaluation: you are predicting the future, so murders are counted after your submission. Unless you have invented time travel, there is no way to know the evaluation test set, nor can you game it like you could in the Netflix Prize (at the risk of overfitting).

I asked John why he started this prize, and he replied:

I started this project about a year ago when trying to think of ways to
attract students and people from other professions into the OR field. I
write an article in ORMS Today called the PuzzlOR which I originally
started in hopes of attracting more students to our field. OR can be a bit
overwhelming when you first get into it so I wanted a way to make it easier
for the newcomers. The puzzles I wanted to run were getting a bit out of
hand in their complexity so I needed some other place to house them.

Plus, I thought it would be good advertising for the OR field in general
and would have positive impact on the city where I live.

He’s already gotten good local press for the project. The Philadelphia City Paper ran a nice article that mentions operations research prominently:

Operations research may not sound sexy; it focuses on analytics and statistics — determining which data in a gigantic data haystack is most relevant — in order to solve big problems.

There is a monetary prize involved: $20 each month plus $100 at the end of the year. It is probably a good thing that this is not a million dollar prize. Since entries are judged based on how well they do after submission, too high a prize might lead to certain … incentives … to ensure the accuracy of your murder predictions.

Operations Research Embarrassments and the Traveling Tournament Problem

Dick Lipton of Georgia Tech has a very nice blog on the theory of computation (though it ranges broader than that). He has a fascinating post on “Mathematical Embarrassments”. A “Mathematical Embarrassment” does not refer to the mistaken proof I had in a real analysis course twenty five years ago (that still causes me to cringe) or a similar goof. Instead, it is defined as:

A mathematical embarrassment (ME) is a problem that should have been solved by now. An ME usually is easy to state, seems approachable, and yet resists all attempts at an attack.

This contrasts with a Mathematical Disease (MD), which is harder to define but has a number of characteristics:

1. A problem must be easy to state to be a MD. This is not sufficient, but is required. Thus, the Hodge-Conjecture will never be a disease. I have no clue what it is about.
2. A problem must seem to be accessible, even to an amateur. This is a key requirement. When you first hear the problem your reaction should be: that is open? The problem must seem to be easy.
3. A problem must also have been repeatedly “solved” to be a true MD. A good MD usually has been “proved” many times—often by the same person. If you see a paper in with many “updates” that’s a good sign that the problem is a MD.

So, in operations research we see many people catch the “P=NP (or not)” disease, but I would not call the problem an embarrassment.

Lipton has a number of good examples of mathematical embarrassments, some of which overlap with the skills/interests of more mathematically inclined operations researchers, but none are “really” operations research. For instance, it seems that proving that both the sum and difference of pi and e are transcendental would be a homework problem from centuries ago, but it is still open. That is an embarrassment!

So what would be an operations research embarrassment?

On the computational side, I think of the Traveling Tournament Problem as a bit of an embarrassment (embarrassment is too strong for this: perhaps this is an “operations research discomfiture”). It is a sports scheduling problem that is easy to state, but even very small instances seem beyond our current methods. It was a big deal when the eight team problem was solved!

Just this month, I received word from New Zealand that the three 10-team examples (NL10, CIRC10, and Galaxy10) have all been solved (by David Uthus, Patricia Riddle, and Hans Guesgen). This is huge for the couple of dozen people really into sports scheduling! Proving optimality for this size takes 120 processors about a week so I would say that the problem is still a discomfiture.

But a discomfiture is not an embarrassment. The problem, though simple to state, is too small to rank as an embarrassment. So what would an “operations research embarrassment” be? And are there other “operations research diseases” other than P=NP (though one could argue that by causing people to spend computer-years in solving Traveling Tournament Problems, I may have infected some with a mild disease).

Become Famous by Winning the COIN-OR Cup!

COIN-OR_CMYK-296Do you use COIN-OR (open source software for operations research)?  According to the log files, lots and lots of people do!  If you are doing something interesting with COIN-OR and are planning to attend the San Diego INFORMS Meeting, I strongly encourage you to enter the COIN-OR Cup!  You could join John Forrest, Jonathan Eckstein+Bill Hart+Cindy Phillips, John Tomlin and the team from the Environmental Protection Agency + Sandia National Labs (including Jonathan Berry) as a winner of this prestigious Cup!  Fame and drinks will be yours forever!  Despite appearances, you do not need a John or Jonathan on your team to win, though it seems to help.

More seriously, if you have used COIN-OR in a neat application (or have advanced COIN-OR in a significant direction), it would be great to hear about it through an application for the award.  This would let the COIN people know of all the great ways the software is being used and help bring together the COIN-OR community.  The deadline is October 2, so don’t delay.  And thanks to IBM for sponsoring the reception (which is always a real highlight for the whole COIN-OR community).

Competition then Cooperation: More on the Netflix Challenge

Wired has a nice article on the teams competing to win the Netflix Prize (thanks for the pointer, Matt!).  I think the most interesting aspect is how the “competition” turned into a cooperation:

Teams Bellkor (AT&T Research), Big Chaos and Pragmatic Theory combined to form Bellkor’s Pragmatic Chaos, the first team to qualify for the prize on June 26 with a 10.05 percent improvement over Netflix’s existing algorithm. This triggered a 30-day window in which other teams were allowed to try to catch up.

As if drawn together by unseen forces, over 30 competitors — including heavy hitters Grand Prize Team, Opera Solutions and Vandelay Industries, as well as competitors lower on the totem pole — banded together to form a new team called, fittingly, The Ensemble.

In fact, with a bit more time, all those groups might have come together:

As much as these teams collapsed into each other during the contest’s closing stages, they might have mated yet again to ensure that everyone on both qualifying teams would see some of the $1 million prize. Greg McAlpin of The Ensemble told his team approached Bellkor’s Pragmatic Chaos and asked to join forces (he later clarified that he was still part of Vandelay Industries at this point), but was spooked by AT&T’s lawyers.“We invited their whole team to join us. The first person to suggest it was my 11-year-old son,” said McAlpin. “I thought it sounded like a really good idea, so I e-mailed the guys from Bellkor’s Pragmatic Chaos… [but] they said AT&T’s lawyers would require contracts with everyone to make agreements about who owns intellectual property… and it would take too long.”

Data mining methods can easily be combined.  A simple way is simply to have algorithms vote on the outcome.  This can often result in much better answers than any individual technique.  The Ensemble clearly did something like that:

To combine competitors’ algorithms, The Ensemble didn’t have to cut and paste much code together. Instead, they simply ran hundreds of algorithms from their 30-plus members (updated) and combined their results into a single set, using a variation of weighted averaging that favored the more accurate algorithms.

This is a great example of the intellectual advances that result from long-term “competitions”.  After a while, it is no longer competitors against competitors but rather all the competitors against the problem.  I don’t know if the Netflix Challenge was specifically designed to result in this, but it has turned out wonderfully.  It is much less likely that the groups would have gotten together if the contest was simply “Who has the best algorithm on August 1, 2009”.   The mix of initial competition (to weed out the bad ideas) and final cooperation (to get the final improvements) was extremely powerful here.

The winner announcement is expected in September.

Graph Coloring Benchmark System

For many years, I have wanted to put together a system for keeping track of results in solving instances of various combinatorial optimization problems.  This started back in the early 1990s when I was the coordinator for the DIMACS Challenge on Cliques, Coloring, and Satisfiability.  This was pre-web, so everything was done via ftp and email.  During that Challenge, David Johnson and I collected a large number of instances suitable for finding cliques and graph coloring (I also collected a number of satisfiability instances).   Lots of people tried a number of methods on these instances, and they became the “DIMACS Instances”, and have appeared in many, many papers (the resulting DIMACS volume is by far my most referenced work).

In 2002, David and I joined with Anuj Mehrotra to run some further workshops with an emphasis on graph coloring.  In 2008, a special issue of Discrete Applied Mathematics came out with ten or so papers from those workshops.

Throughout all this, I was not satisfied with the state of information.  What were the best solutions around?  Results were scattered throughout the literature, sometimes in hard-to-find places.  Worse, there seemed to be some inconsistencies (feasible solutions with values better than lower bounds) and no clear way of determining where the fault lay.  It was not enough for me to collect instances, I had to collect solutions, both values and colorings.  And I had to be much more organized about tracking results.  I was particularly interested in this since I do that for the Traveling Tournament Problem, and I think that has greatly increased interest in the problem.

But life is busy, and I never quite got around to putting the system together.  When I spent a year in New Zealand, I did some work on it, but moved on to other things.  I couldn’t get a doctoral student interested in it, so things kinda lay fallow.

A few days ago, the file I had on this churned to the top of the stack, and I refound the code I was working on.  And, while I do have some really pressing things to do (both editorial and survey article writing), I have spent some time advancing the code.   At this point, I think I can let the world see it, though it is a work in progress.  I have grand plans for this, so I have called it ROIS: Registry for Optimization Instances and Solutions. Right now, all it has the results from some graph coloring papers along with the instances.  As I move along, I will include finding cliques in graphs, the traveling tournament problem, and who knows what else.  Now that the basic code is there, it is extremely easy to add in new problems.

There is a lot more to do.  One key aspect that is not yet implemented is the ability to upload solutions and have them checked for correctness automatically.  This clearly is a key component of the system and about the only way of keeping “wrong” results out.  It is unclear how to handle lower bounds in the same way due to the lack of a succint certificate, but even handling one side is better than nothing.  I also need to add lots of things for sorting columns in different ways, but that is all pretty trivial (a good project for lazy mornings).

I’m entering in some of the published work now.  Any comments you have on this would be very much appreciated.

How to Design a Contest: The Netflix Challenge

It looks like the Netflix people made some good decisions when they designed their million dollar challenge. In particular, it appears that they kept two verification test sets: one that was the basis for the public standings and one that no one ever saw the results from. It is the success on the latter set that determines the winner. So BelKor, which appeared to come in second, based on the “public” verification test set, seems poised to be the winner based on the hidden test set. I put “public” in quotes, since the test set itself was not visible, but the results of each prediction on the test set was visible (as an aggregate statistic).

Why is this a good design? Any data set that gets as much of a workout as the public data set does is vulnerable to techniques that try to fit the model to that particular test set. In fact, there was discussion at the Netflix competition forum that some team or teams was doing exactly that: generating hundreds of slightly different predictions in an attempt to better fit the verification set. Such an approach, however, is counterproductive when it comes to working with a new, hidden data set. Any team that overfits the first verification test set is likely to do poorly on the second set.

So we’ll wait for official word, but it seems as though Netflix has a very nicely designed evaluation system

Netflix Prize ready to finish?

While I have been somewhat skeptical of the Netflix Prize (in short: it seems to be showing how little information is in the data, rather than how much; and the data is rather strange for “real data”), it is still fascinating to watch some pretty high powered groups take a stab at it. If I understand the standings correctly, “BellKor’s Pragmatic Chaos”, a group consisting of people from AT&T, Yahoo! Research, and two companies I am not familiar with, Commendo and Pragmatic Theory have passed the 10% improvement, which means we are now in the final 30 days to determine the overall winner. I wonder if anyone has a killer model hidden for just this time.