Authorship Order

Michael Mitzenmacher, in his excellent blog, My Biased Coin, has recent entries (here, here and here) on the order of authors on joint papers. When you have a last name that begins “Tri…”, it becomes pretty clear early on that alphabetical order is not going to result in a lot of “first author” papers. And it does tick me off when my work in voting theory becomes “Bartholdi et al.” or my work on the Traveling Tournament Problem is “Easton et al.”. I have even seen “Easton, Nemhauser, et al.” which is really hitting below the belt (since it is Easton, Nemhauser, and Trick).

Despite that, all of my papers have gone with alphabetical order, and I am glad I (and my coauthors) went that route. If even once I had gone with “order of contribution”, all of my papers would have been tainted with the thought “Trick is listed third: I guess he didn’t do as much as the others”.

The issue of determining “order of contribution” is a thorny one. Many skills go into a paper, and we know from social choice how difficult it is to aggregate multiple orderings into a single ordering. Different weightings of the skills lead to different orderings, and there is no clear way to choose among the weightings. Even given a weighting, determining the ordering on any particular aspect of the paper is often not obvious. When doing a computational test, do “running the code” and “tabulating the results” mean more than “designing the experiment” or “determining the instances”? I don’t think hours spent is a particularly good measure (“Hey, I can be more inefficient than you!”), but there is practically nothing else that can be objectively measured.
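To see just how fragile “order of contribution” is, here is a toy sketch (in Python, with entirely made-up scores of my own invention) in which nothing changes but the relative weight placed on two skills:

```python
# Toy illustration, hypothetical numbers: three coauthors scored on two
# skills.  Changing the weights alone reverses the "contribution order".
scores = {  # (theory, computation), each on a 0-10 scale (invented)
    "Author A": (9, 3),
    "Author B": (5, 8),
    "Author C": (4, 9),
}

for w_theory in (0.7, 0.3):
    w_comp = 1.0 - w_theory
    order = sorted(scores,
                   key=lambda a: -(w_theory * scores[a][0] + w_comp * scores[a][1]))
    print(f"theory weight {w_theory}: {order}")

# theory weight 0.7: ['Author A', 'Author B', 'Author C']
# theory weight 0.3: ['Author C', 'Author B', 'Author A']
```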

Further, most papers rely on the mix of skills in order to be publishable. This reminds me of an activity I undertook when I was eight or so. I had a sheet of paper and went around asking anyone nearby what was more important: “the brain, the heart, or the lungs” (anyone with a five-year-old kid will recognize a real-life version of “Sid the Science Kid” and, yes, I was a very annoying kid, thanks for asking). My father spent time explaining to me the importance of systems, and how there is no “most important” in any system that relies on the others. I would like to say that this early lesson in “systems” inspired me to make operations research my field of study, but I believe I actually browbeat him until he gave up and said “gall bladder” in order to get rid of me. But the lesson did stay with me (thanks, Dad!), and perhaps I was more careful about thinking about systems after that.

Some of the arguments over order strike me as “heart versus lungs” issues: neither can survive without the other. So, if a person has done enough work that the paper would not have survived without them, that both makes them a coauthor and entitles them to their place in alphabetical order.

As for the unfairness of having a last name beginning “Tri…”, perhaps we should talk to my recent coauthors: Yildiz, Yunes, and Zin.

NSF Operations Research Position open

The National Science Foundation is looking for a program director for operations research. I wrote about this position the last time it came open, when Robert Smith became director.

The NSF is incredibly important to the health of the field of operations research. In addition to the “regular” grant activities (CAREER grants and basic research grants), the program director has the opportunity to make the case for having OR as part of interdisciplinary and exploratory research directions. I hope someone good finds this an interesting opportunity. Robert Sloan has an interesting article on the joys of being a program director in computer science.

Doing Good with Good OR, 2010 edition

INFORMS is again sponsoring a student project prize on the theme “Doing Good with Good OR”:

Doing Good with Good OR-Student Competition is held each year to identify and honor outstanding projects in the field of operations research and the management sciences conducted by a student or student group that have a significant societal impact.

The projects must have, or are likely to have, a significant societal impact, and operations research and management science methods and tools (broadly interpreted) must be central to the success of the projects described. “Societal impact” should be construed to mean an impact on individuals, communities and organizations that goes beyond that associated with a private-sector for-profit initiative. The projects might also strive to include innovation through theory, creative computational methods, and should address implementation challenges.

Last year, David Hutton of Stanford won the award for work on screening for Hepatitis B.

The submission deadline for this year’s award is May 15, 2010.

Journalists Should Be Required to Pass an Exam on Conditional Probability

There is nothing more grating than having a journalist toss around numbers showing no understanding of conditional probability (actually, there are 12 more grating things, but this ranks right up there).  In a nice story from NBC Chicago, journalists Dick Johnson and Andrew Greiner write about an autistic teen who has picked the first two rounds of the NCAA tournament correctly:

An autistic teenager from the Chicago area has done something almost impossible.

Nearly 48 games into an upset-filled NCAA tournament, 17-year-old Alex Herrmann is perfect.

“It’s amazing,” he says. Truly.

Yes it is amazing. But the writers get tripped up when trying to project the future:

There are still four rounds remaining, so it could fall apart — the odds of a perfect wire to wire bracket is about 1 in 35,360,000 by some measures or 1 in 1,000,000,000,000 by others.

Aaargh! Let’s let pass the factor of 28,000 or so difference in estimates. THIS IS NOT THE RELEVANT STATISTIC! We already know that Alex has picked the first two rounds correctly. We are interested in the probability he has a perfect bracket given he picked the first 48 games correctly. This is about the simplest version of conditional probability you can get.

If all he did was flip a coin for each of the remaining 15 games, he would have a one in 32,768 chance of having a perfect bracket, given where he is now. Not great odds, certainly, but nothing like the probabilities given in the quote. You can argue whether 50/50 on each of the fifteen remaining games is the right probability to use (Purdue as champion?), but there is absolutely no justification for bringing in the overall probability of a perfect bracket.  By quoting the unconditional probability (and who knows where those estimates come from), the writers vastly underestimate Alex’s chance of having a perfect bracket.
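Here is the entire calculation, as a sketch in Python (the coin-flip assumption is mine, as noted above):

```python
from fractions import Fraction

# Given that the first 48 picks are correct, only the remaining games matter.
# After two rounds, 16 teams are left: 8 + 4 + 2 + 1 = 15 more games.
games_remaining = 15
p_per_game = Fraction(1, 2)   # coin-flip assumption for each remaining game

print(p_per_game ** games_remaining)   # 1/32768
```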

I swear I see the confusion between unconditional probabilities and conditional probabilities twice a day. I suspect the stroke that will finally get me will be caused by this sort of error.

Edit.  9:06PM March 23. From the comments on the original post, two further points:

  1. The writers also seem confused about joint probabilities. They write:

    One in 13,460,000, according to BookofOdds.com. It’s easier to win the lottery. Twice.

    No… not unless your lottery has a probability of winning of one in the square root of 13,460,000, or one in 3,669. While there are “lotteries” with such odds, the payoffs tend to be 1000 to 1, not millions to 1. I bet they thought winning one lottery might be 1 in 7,000,000, so two lotteries “must be” 1 in 14,000,000. No, that is not the way it works (see the sanity check after this list).

  2. It appears that if you manage a pool on cbs.sportsline.com, you can edit the picks after the games. That might be a more reasonable explanation for picking 48 games, but it is hard to tell.
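The promised sanity check on the “lottery. Twice.” claim, as a short Python sketch:

```python
import math

# If a single lottery win has probability p and the two plays are independent,
# winning twice has probability p**2.  For p**2 to be 1 in 13,460,000,
# p must be 1/sqrt(13,460,000) -- about 1 in 3,669, nothing like a real lottery.
bracket_odds = 13_460_000
print(f"implied single-lottery odds: 1 in {math.sqrt(bracket_odds):,.0f}")
```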

So, to enumerate what journalists should be tested on, let’s go with:

  1. Conditional Probability
  2. Joint Probabilities
  3. Online editing possibilities

You are welcome to add to the certification test requirements in the comments.

Update on LRMC after first round

Sokol and his team’s Logistic Regression/Markov Chain approach had a pretty good first round in the NCAA tournament.  It correctly picked 24 of the 32 games.  On the plus side, it picked the following upsets (NCAA seeds in parens):

Northern Iowa (9) over UNLV (8), Georgia Tech (10) over Oklahoma State (7), Murray State (13) over Vanderbilt (4), Old Dominion (11) over Notre Dame (6), St. Mary’s (10) over Richmond (7)

It incorrectly predicted the upsets:

San Diego State (11) over Tennessee (6), Florida State (9) over Gonzaga (8), Utah State (12) over Purdue (4)

And it missed the upsets:

Ohio (14) over Georgetown (3), Missouri (10) over Clemson (7), Washington (11) over Marquette (6), Cornell (12) over Temple (5), Wake Forest (9) over Texas (8)

Overall, favorites (higher seeds) won only 22 of the 32 first round games, so in that sense LRMC is doing better than the Selection Committee’s rankings.  Three of LRMC’s “Sweet Sixteen” have been eliminated, but its picks from the round of eight on are all still alive.

March Madness and Operations Research, 2010 Edition

Normally I do a long post on operations research and predicting the NCAA tournament.  I did so in 2009, 2008, 2007, and even in 2006 (when I think I made blog entries with an IBM Selectric typewriter).  This year, I will cede the ground to Laura McLay of Punk Rock Operations Research, who has a very nice series of OR-related entries on the NCAA tournament (like this post and the ones just previous to it).  I’d rather read her work than write my own posts.

That said, perhaps just a comment or two on Joel Sokol and his team’s LRMC picks, as covered in the Atlanta Business Chronicle.  Joel’s ranking (LRMC stands for Logistic Regression Markov Chain) can be used to predict winners, and the team has a page for their tournament picks.  Some notable predictions:

1) Their final 4 consists of 3 number 1’s (Kansas, Syracuse, and Duke) and one number 2 (West Virginia).  The remaining number 1 is Kentucky, ranked only number 9 overall by LRMC.

2)  Kansas beating Duke is the predicted final game.

3) 7th seeded BYU is ranked 4th overall, and so makes the Elite Eight before being knocked off by Syracuse (3).

4) 12th seeded Utah State is predicted to beat 5th seeded Texas A&M and 4th seeded Purdue.

5) 13th seeded Murray State over 4th seeded Vanderbilt is the biggest predicted first round upset.

Let’s see how things turn out.

Nurse, Nurse! Where’s my Nurse?

I am a sucker for competitions.  The people at PATAT (a conference series on the Practice and Theory of Automated Timetabling; I was co-chair of their 2004 conference and a member of their steering committee for a few years) do a really good job with timetabling competitions.  I really liked their competition for the 2008 conference on course and exam timetabling.  I had some thoughts of using logic-based Benders decomposition to solve one of the problems but didn’t quite get my act together for it.  But I liked the problems and I liked the organization.

The next competition has just begun, though it is on an accelerated schedule.  The topic is nurse rostering, and the competition ends June 1, 2010.  This is an important practical problem:

Building good rosters for nurses in hospitals has received ample attention in recent years and is recognised as a difficult combinatorial optimisation problem with practical relevance. In hospitals much effort is spent producing solutions which are workable and of a high quality.

It is a nicely designed competition, but be aware that you can’t use commercial codes.  So those of us who use CPLEX/XPRESS/Gurobi as a linear or integer programming base are probably out of luck:

Rule 5

Participants have to implement an algorithm to tackle the problem on a single processor machine; they can use any programming language. The use of third-party software is allowed under the following restrictions:

  • it is free software
  • its behaviour is (reasonably) documented
  • it runs under a commonly-used operating system (Linux or Windows)

Working with COIN-OR is probably legal, under a broad definition of “reasonably documented”. But it is interesting that Mac operating systems are not deemed “commonly-used”.
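For anyone wanting to kick the tires, here is a minimal rostering sketch of my own (a toy model with toy data, not the competition’s actual format) using PuLP, which drives the free COIN-OR CBC solver by default:

```python
import pulp  # free; PuLP's default solver is COIN-OR's CBC

# Toy model: cover day/night shifts for a week while keeping the heaviest
# individual workload as light as possible.
nurses = ["N1", "N2", "N3", "N4", "N5"]
days = range(7)
shifts = ["day", "night"]
demand = {"day": 2, "night": 1}  # nurses required per shift

x = pulp.LpVariable.dicts("x", (nurses, days, shifts), cat="Binary")
maxload = pulp.LpVariable("maxload", lowBound=0)

model = pulp.LpProblem("nurse_rostering", pulp.LpMinimize)
model += maxload  # objective: minimize the largest individual workload

for d in days:
    for s in shifts:  # every shift must be covered
        model += pulp.lpSum(x[n][d][s] for n in nurses) >= demand[s]
for n in nurses:
    for d in days:  # at most one shift per nurse per day
        model += pulp.lpSum(x[n][d][s] for s in shifts) <= 1
    # each nurse's total load is bounded by the (minimized) max load
    model += pulp.lpSum(x[n][d][s] for d in days for s in shifts) <= maxload

model.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.LpStatus[model.status], pulp.value(maxload))
```

The real competition instances add the constraints that make the problem hard (shift sequences, weekends, contracts, preferences), but the free-software stack would look much the same.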

Despite these caveats, I do recommend the competition, and perhaps will see you in Belfast!

Sad News on the Netflix Prize

The followup to the wildly successful Netflix Prize has been canceled due to privacy concerns.  From a blog post by Neil Hunt, Chief Product Officer for Netflix:

In the past few months, the Federal Trade Commission (FTC) asked us how a Netflix Prize sequel might affect Netflix members’ privacy, and a lawsuit was filed by KamberLaw LLC pertaining to the sequel. With both the FTC and the plaintiffs’ lawyers, we’ve had very productive discussions centered on our commitment to protecting our members’ privacy.

We have reached an understanding with the FTC and have settled the lawsuit with plaintiffs. The resolution to both matters involves certain parameters for how we use Netflix data in any future research programs.

In light of all this, we have decided to not pursue the Netflix Prize sequel that we announced on August 6, 2009.

This is very sad news.  The first Netflix Prize did a lot of good in showing how combinations of different approaches can work better than any individual approach, and also in showing how difficult it is to find economically significant models for some types of recommendation systems.  I, along with many, had looked forward to the new insights (and possible visibility for operations research) another Netflix Challenge would bring.

For now, we are left with predicting murders in Philadelphia.  Unless someone finds privacy concerns there.

Google Maps API Enhancements

Google just announced some enhancements to Directions in their Maps API.  One addition, “avoiding tolls and highways,” doesn’t really affect me much: we have only one toll road in the area, and it is pretty well needed to go either east or west.  But the other two are exciting!

First, the API now adds bicycle routing.  I don’t know if it will let you go the wrong way down one way streets or hop on a crowded sidewalk to make a shortcut, but it does contain at least some of the long-distance bike paths.  Check out the path from my house to my buddy Barack’s place.  There is a bike path starting from about 20 miles from Pittsburgh all the way to Washington, DC.  I turn 50 this year, and have promised myself that I will do that trip.  Of course, I have broken promises before, and if I don’t start training soon, it won’t happen.  But it is nice to see Google will be by my side if I do set off.

The second, and more relevant to OR, enhancement adds Traveling Salesman routing capability.  From the announcement:

Route optimization. Have many places to go but no preference as to the order you visit them in? We can now reorder the waypoints of your route to minimize the distance and time you must travel. Very useful for traveling salesman I hear.

Now I would love to experiment with this!  There have been similar efforts before.  Gebweb has offered routing since 2007, and promises optimal solutions for up to 15 cities, with 2-opt solutions for larger problems.  I would love to know how far Google guarantees optimality before resorting to a heuristic (the demo on the announcement page limits you to 11 points).  Unlike the other additions, this has not yet made it to Google Maps, but when it does, let’s see how Google competes with Concorde.
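To get a feel for why 15 cities is a natural cutoff for exactness, here is a toy sketch of mine (plain Python; nothing to do with Google’s or Gebweb’s actual code) contrasting brute force, which examines (n-1)! orderings, with 2-opt, which scales but only guarantees a local optimum:

```python
import itertools
import math
import random

def tour_length(tour, pts):
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def brute_force(pts):
    """Exact for small n: fix city 0 and try every ordering of the rest."""
    rest = range(1, len(pts))
    return min(([0] + list(p) for p in itertools.permutations(rest)),
               key=lambda t: tour_length(t, pts))

def two_opt(pts):
    """Heuristic for larger n: reverse segments while that shortens the tour."""
    tour = list(range(len(pts)))
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
                if tour_length(candidate, pts) < tour_length(tour, pts):
                    tour, improved = candidate, True
    return tour

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(9)]
print(tour_length(brute_force(pts), pts))  # optimal length
print(tour_length(two_opt(pts), pts))      # heuristic length (>= optimal)
```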