Help Santa Plan His Tour (Twice!)

Just in time for the holidays, there is a very nice competition at Kaggle: The Traveling Santa Problem. The problem was devised by the ever-creative Bob Bosch, about whom I have written before. The problem is an interesting variant on the Traveling Salesman Problem. Given a set of points and distances between them, the TS(alesman)P is to find the shortest cycle through all the points.  The TS(alesman)P would not be a great competition problem:  you could save the effort and just send the prize to my soon-to-be-neighbor Bill Cook with his Concorde code.

The TS(anta)P asks for two disjoint cycles, with a goal of minimizing the longer of the two. Of course, there is a clear explanation of why Santa wants this:

Santa likes to see new terrain every year–don’t ask, it’s a reindeer thing–and doesn’t want his route to be predictable.

OK, maybe it is not so clear. But what Santa wants, Santa gets.

There is just one instance to work with, and it is a monster with 150,000 points. It turns out that the points are not randomly scattered throughout the plane, nor do they seem to correspond to real cities. I won’t spoil it here, but there is a hint on the Kaggle discussion boards.
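To make the objective concrete, here is a minimal Python sketch of how a candidate solution might be scored. It assumes that "disjoint" means the two cycles share no edge (in either direction) and that distances are Euclidean; the function names and the five-point toy instance are made up for illustration and are not the competition's official scorer.

```python
import math


def cycle_edges(tour):
    """Undirected edges of a closed tour, so (a, b) and (b, a) count as the same edge."""
    n = len(tour)
    return {frozenset((tour[i], tour[(i + 1) % n])) for i in range(n)}


def cycle_length(points, tour):
    """Total Euclidean length of the closed tour, including the edge back to the start."""
    n = len(tour)
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % n]]) for i in range(n))


def santa_objective(points, tour_a, tour_b):
    """Length of the longer cycle, or None if the pair of tours is infeasible."""
    everyone = list(range(len(points)))
    if sorted(tour_a) != everyone or sorted(tour_b) != everyone:
        return None  # each tour must visit every point exactly once
    if cycle_edges(tour_a) & cycle_edges(tour_b):
        return None  # the tours share at least one edge
    return max(cycle_length(points, tour_a), cycle_length(points, tour_b))


# Toy instance (hypothetical): five points, a pentagon tour and a star tour.
points = [(0, 0), (2, 0), (3, 2), (1, 3), (-1, 2)]
tour_a = [0, 1, 2, 3, 4]  # edges 0-1, 1-2, 2-3, 3-4, 4-0
tour_b = [0, 2, 4, 1, 3]  # edges 0-2, 2-4, 4-1, 1-3, 3-0

print(santa_objective(points, tour_a, tour_b))
```

Minimizing the maximum of the two lengths is what separates this from running a TSP solver twice: shortening the already-shorter cycle does nothing for the score.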

There are prizes for the best solution by the end of the competition (January 18, 2013) and, most interestingly, at a random date to be chosen between December 23 and January 17.  The $3000 in prizes would certainly make a nice Christmas bonus (that is  7.5 Lego Death Stars!). You can check out the full rules, leaderboard, discussion, and more at the competition site.

I personally find this competition much more interesting than the data-mining type competitions like the General Electric sponsored Flight Quest (which is certainly not uninteresting). In Flight Quest, the goal is to predict landing times for planes. This is all fine and good, but as an operations researcher, I want to make some decisions to change those times to make the system work better.  Helping Santa avoid ennui might not be particularly realistic but it is much better than simply predicting when my Christmas toys arrive.

If we can get a good turnout for the Santa problem, perhaps we can see more optimization competitions, or better yet, competitions that combine data mining with optimization.

Which Average do you Want?

Now that I am spending a sentence as an academic administrator in a business school (the Tepper School at Carnegie Mellon University), I get first-hand knowledge of the amazing number of surveys, questionnaires, inquiries, and other information gathering methods organizations use to rank, rate, or otherwise evaluate our school. Some of these are “official”, involving accreditation (like AACSB for the business school and Middle States for the university). Others are organizations that provide information to students. Biggest of these, for us, is Business Week, where I am happy to see that our MBA program went up four positions from 15th to 11th in the recent ranking. We administrators worry about this so faculty don’t have to.

Responding to all these requests takes a huge amount of time and effort. We have a full-time person whose job is to coordinate these surveys and to analyze the results of them. Larger schools might have three or four people doing this job. And some surveys turn out to be so time-intensive to answer that we decline to be part of them. Beyond Grey Pinstripes was an interesting ranking based on sustainability, but it was a pain to fill out, which seems to be one reason for its recent demise.

As we go through the surveys, I am continually struck by the vagueness in the questions, even for questions that seem to be asking for basic, quantitative information. Take the following commonly asked question: “What is the average class size in a required course?”. Pretty easy, right? No ambiguity, right?

Let’s take a school with 4 courses per semester, and two semesters of required courses. Seven courses are “normal”, with classes run in 65-student sections, while one course is divided into 2 half-semester courses, each run in 20-student seminars (this is not the Tepper School but illustrates the issue). Here are some ways to calculate the average size:

A) A student takes 9 courses: 7 at 65 and 2 at 20 for an average of 55.
B) If you weight over time, it is really 8 semester-courses: 7 at 65 and 1 at 20 for an average of 59.4.
C) There are about 200 students, so the school offers 21 sections of size 65 (3 for each of the 7 normal courses) and 20 sections of size 20 (10 for each of the 2 half-semester courses) for an average of 43.

Which is the right one? It depends on what you are going to use the answer for. If you want to know the average student experience, then perhaps calculation B is the right one. An administrator might be much more concerned about calculation C, and that is what you get if you look at the course lists of the school and take the average over that list. If you look at a student’s transcript and just run down the size for each course, you get A.
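The three calculations are easy to check in a few lines of Python. This is only a sketch of the hypothetical school above: the section counts (3 sections per normal course, 10 seminars per half-semester course) are one reading of "about 200 students", and the variable names are invented for illustration.

```python
LARGE, SMALL = 65, 20  # section sizes from the example

# A) Transcript view: a student sees 9 course entries, 7 large and 2 small.
avg_a = (7 * LARGE + 2 * SMALL) / 9

# B) Time-weighted view: the two half-semester courses count as one semester-course.
avg_b = (7 * LARGE + 1 * SMALL) / 8

# C) Course-list view: average over every section the school offers.
large_sections = 7 * 3   # assumed: 3 sections of 65 for each normal course
small_sections = 2 * 10  # assumed: 10 seminars of 20 for each half-semester course
avg_c = (large_sections * LARGE + small_sections * SMALL) / (large_sections + small_sections)

print(f"A: {avg_a:.1f}  B: {avg_b:.1f}  C: {avg_c:.1f}")
```

Same school, same classes, and the reported "average" ranges from about 43 to about 59 depending on what is being averaged over.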

We know enough about other schools that we can say pretty clearly that different schools will answer this in different ways and I have seen all three calculations being used on the same survey by different schools. But the surveying organization will then happily collect the information, put it in a nice table, and students will sort and make decisions based on these numbers, even though the definition of “average” will vary from school to school.

This is reminiscent of a standard result in queueing theory that says that the system view of a queue need not equal a customer’s view. To take an extreme example, consider a store that is open for 8 hours. For seven of those hours, not a single customer appears. But a bus comes by and drops off 96 people who promptly stand in line for service. Suppose it takes 1 hour to clear the line. On average, the queue length was 48 during that hour. So, from a system point of view, the average (over time) queue length was (0(7)+48(1))/8=6. Not too bad! But if you ask the customers “How many people were in line when you arrived?”, the average is 48 (or 47 if they don’t count themselves). Quite a difference! What is the average queue length? Are you the store or a customer?
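The two views of the store can be worked through directly. The sketch below makes two assumptions that the example leaves open: the line drains at a constant rate during the busy hour, and the 96 passengers join the line one at a time, so the k-th person finds k-1 people ahead. Under that model the customer average comes out at 47.5 ahead (48.5 counting oneself), in line with the rough 47 or 48 quoted above.

```python
from fractions import Fraction

HOURS_OPEN = 8   # store is open for 8 hours
BUS_SIZE = 96    # everyone arrives on one bus
CLEAR_HOURS = 1  # time needed to serve the whole line

# System view: the line falls linearly from 96 to 0, so it averages 96/2
# during the busy hour and 0 during the other seven hours.
busy_hour_mean = Fraction(BUS_SIZE, 2)
time_average = busy_hour_mean * CLEAR_HOURS / HOURS_OPEN

# Customer view: average, over customers, of the line each one finds on arrival.
people_ahead = [k - 1 for k in range(1, BUS_SIZE + 1)]
customer_average = Fraction(sum(people_ahead), BUS_SIZE)
customer_average_with_self = customer_average + 1

print(float(time_average), float(customer_average), float(customer_average_with_self))
```

The time average weights every hour equally; the customer average weights every customer equally, and all the customers are in the one busy hour.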

Not surprisingly, if we can get tripped up on a simple question like “What’s your average class size?”, filling out the questionnaires can get extremely time consuming as we figure out all the different possible interpretations of the questions. And, given the importance of these rankings, it is frustrating that the results are not as comparable as they might seem.

Registries To Avoid Publication Bias

I have been thinking about the issue of how a field knows what it knows.  In a previous post, I wrote about how the field of social psychology is working through the implications of fraudulent research, and is closely examining the cozy interactions between journals, reviewers, and famous researchers.   And any empirical field based on statistical analysis has got to live with the fact that if there are 1000 results in the field, some number (50 perhaps, if p=.05 is a normal cutoff and lots of results are just under that value) are going to be wrong just because the statistical test created a false positive.  Of course, replication can help determine what is real and what is not, but how often do you see a paper “Confirming Prof. X’s result”?  Definitely not a smooth path to tenure.

This is worse if malevolent forces are at work.  Suppose a pharmaceutical company has bet the firm on drug X, and they want to show that drug X works.  And suppose drug X doesn’t work.  No problem!  Simply find 20 researchers, sign them to a non-disclosure agreement, and ask them to see if drug X works.  Chances are one or more researchers will come back with a statistically significant result (in fact, there is about a 64% chance that one or more will, given a p=.05 cutoff and independent studies).  Publish the result, and voila!  The company is saved!  Hurray for statistics and capitalism!
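The arithmetic behind the 20-researcher scheme is short enough to check. The sketch assumes the drug truly has no effect, that the 20 studies are independent, and that each has exactly a 5% chance of a false positive; the function name is made up for illustration.

```python
def prob_at_least_one_false_positive(n_studies, alpha=0.05):
    """Chance that at least one of n independent null studies looks significant."""
    return 1 - (1 - alpha) ** n_studies


# Twenty independent studies of a drug that does nothing.
p_any = prob_at_least_one_false_positive(20)

# Expected number of false positives among 1000 results where nothing is really going on.
expected_false = 1000 * 0.05

print(f"{p_any:.3f}", expected_false)
```

With 20 tries the chance of at least one "significant" result is roughly 64%, and it passes 90% once the company can afford about 45 studies.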

Fortunately, I am not the first to see this issue:  way back in 1997, the US Congress passed a bill requiring the registration of clinical trials, before the trials get underway.

The first U.S. Federal law to require trial registration was the Food and Drug Administration Modernization Act of 1997 (FDAMA).

Section 113 of FDAMA required that the National Institutes of Health (NIH) create a public information resource on certain clinical trials regulated by the Food and Drug Administration (FDA). Specifically, FDAMA 113 required that the registry include information about federally or privately funded clinical trials conducted under investigational new drug applications (INDs) to test the effectiveness of experimental drugs for patients with serious or life-threatening diseases or conditions.

This led to the creation of clinicaltrials.gov (where I am getting this history and the quotes) in 2000.  This was followed by major journals requiring registration before papers could be considered for publication:

In 2005 the International Committee of Medical Journal Editors (ICMJE) began to require trial registration as a condition of publication.

The site now lists more than 130,000 trials from around the world.  It seems this is a great way to avoid some (but by no means all!) fraud and errors.

I think it would be useful to have such systems in operations research.  When I ran a DIMACS Challenge twenty years ago, I had hoped to keep up with results on graph coloring so we had a better idea of “what we know”:  then and now there are graph coloring values in the published literature that cannot be correct (since, for instance, they contradict published clique values:  something must be wrong!).  I even wrote about a system more than two years ago but I have been unable to find enough time to take the system seriously.  I do continue to track results in sports scheduling, but we as a field need more such systems.