Pushing back boundaries

It is 1AM here in Cork, and an adenoidal singer with a very eclectic song selection is screaming outside my hotel window, making it difficult to sleep. So I am reading “The Science of Discworld III: Darwin’s Watch” by Terry Pratchett, Ian Stewart, and Jack Cohen. Terry Pratchett is my second favorite author (next to Patrick O’Brian), and I enjoy the “Science of Discworld” series. In these books, chapters from a novelette by Pratchett alternate with scientific discussion by Stewart and Cohen.

The first part of the book has some good things to say about the scientific process. From Pratchett:

It [a large piece of machinery in a lab] also helps in pushing back boundaries, and it doesn’t matter what boundaries these are, since any researcher will tell you it is the pushing that matters, not the boundary.

That, in essence, is the problem with many faculty reviews, much paper refereeing, and most conference paper selection. Most of the time, we evaluate the pushing, with insufficient attention to the boundary. Pratchett, as (almost) always, gets it right.

Operations Research and the US Presidential Election

I am in Cork, Ireland, attending the Irish Conference on Artificial Intelligence and Cognitive Science (I gave a talk on sports scheduling and three themes of modern integer programming: complicated variables, large scale local search, and logical Benders constraints). Conversation here (when an American is in the group: presumably, without an American, conversation is about hurling or something) is on the US Presidential Election. Some of the historical anomalies are a bit confusing. Why is it only now that Barack Obama “accepts” the nomination from the Democratic Party: shouldn’t he have decided on this long, long ago? What if he didn’t accept the nomination?

The most confusing aspect of the election process is our use of the Electoral College to elect the President. Rather than directly electing the President, voters vote for electors, with each state being given a set number of electors. For most states, all of the state’s electors are given over to just one candidate. This makes interpreting the polls quite difficult. One recent poll had Obama (the now-nominee of the Democrats) and McCain (the presumptive Republican) tied at 47% support each. A natural leap was to then assume that the election is a toss-up. But it is really the distribution of support that counts. It is possible to win the election for President of the United States with a vanishingly small share of the vote. For instance, suppose only one voter shows up in each of 49 states, and those voters vote for Obama, while 10,000,000 Republicans vote for McCain in New York. Obama would then lose the national popular vote 10,000,000 to 49 (about .0005% of the vote), but he would have an overwhelming majority in the electoral college. While the results would never be that extreme, it is certainly possible (and has happened) to win the national popular vote and lose the electoral vote.

Interpreting polls gets more complicated when you try to address the uncertainties in the polls. For instance, the 47% results above are only for those in the survey who had a preference. There are a huge number of “undecided” voters who do not yet have a preference. How should they be handled as we try to figure out who is ahead (I hate this idea of elections as a “horse race”, but if the media is going to see it as a race, they could at least accurately represent the real race)?

Sheldon Jacobson (University of Illinois), Steven Rigdon, and Ed Sewell (both of Southern Illinois University Edwardsville) are addressing this issue by taking the current poll data and determining the probability of winning the election for each candidate. They have a fascinating website that is being constantly updated.

It is worthwhile to read their methodology section.

The mathematical model employs Bayesian estimators that use available state poll results (at present, this is being taken from Rasmussen, Survey USA, and Quinnipiac, among others) to determine the probability that each candidate will win each of the states. These state-by-state probabilities are then used in a dynamic programming algorithm to determine a probability distribution for the number of Electoral College votes that each candidate will win in the 2008 presidential election.
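
To give a feel for the dynamic programming step in that description, here is a minimal sketch in Python. It is my own illustration and not the authors' code: it assumes each state is won independently with a given probability and awards all of its electors to the winner, and the three "states" and their probabilities are made-up toy numbers. The function names are hypothetical.

```python
def electoral_vote_distribution(states):
    """Distribution of electoral votes won by one candidate.

    states: list of (electoral_votes, win_probability) pairs.
    Assumption (mine): states are independent and winner-take-all.
    Returns a list dist where dist[k] is the probability of
    winning exactly k electoral votes.
    """
    total = sum(ev for ev, _ in states)
    dist = [0.0] * (total + 1)
    dist[0] = 1.0          # before any state is counted, 0 votes for sure
    reached = 0            # largest vote total reachable so far
    for ev, p in states:
        new = [0.0] * (total + 1)
        for k in range(reached + 1):
            if dist[k] == 0.0:
                continue
            new[k] += dist[k] * (1.0 - p)   # candidate loses this state
            new[k + ev] += dist[k] * p      # candidate wins this state
        dist = new
        reached += ev
    return dist


def win_probability(dist, needed):
    """Probability of winning at least `needed` electoral votes."""
    return sum(dist[needed:])


# Toy example with made-up numbers: three states worth 3, 4, and 5 votes,
# each a coin flip. A majority of the 12 votes requires 7.
toy_states = [(3, 0.5), (4, 0.5), (5, 0.5)]
toy_dist = electoral_vote_distribution(toy_states)
print(win_probability(toy_dist, 7))
```

The point of the recursion is that it processes one state at a time and keeps only the running distribution over vote totals, so the work grows with (number of states) times (total electoral votes) rather than with the 2^51 possible combinations of state outcomes.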

There is a full paper by the above authors along with Christopher Rigdon.

They point out a few limitations of their approach. Of course, the results are only as good as the poll data: if the poll data is off, then their results are meaningless. Further, they are not (currently) treating Maine and Nebraska correctly: those two states divide their electors by congressional district, while every other state is all-or-nothing.

Currently, they have Barack Obama with an 89% chance of winning, which is pretty high, but down from the 96% chance they had him at on July 31.

Dash in FairIsaac

“A.L.”, who frequently posts on sci.op-research, notes:

To improve service for Xpress-MP users even further, Fair Isaac closed
Dash office in Englewood Cliffs, NJ. This is what recorded message
says when office number is called. Guys from this office are still available and working from their
basements. The question is for how long.

I sent an email to Alkis Vazacopoulos who pointed out there is a FairIsaac NY office right across the Hudson (11.5 miles away) in Manhattan where people are working. Alkis continues to be extremely upbeat about how Dash is doing within FairIsaac. I really don’t think this rather minor office move is worth getting worked up over! After spending $32 million on Dash (admittedly a small amount of money to FairIsaac, representing 4 or 5 months of corporate earnings), I don’t think they are going to mess things up in the first six months.

The Numerati

Stephen Baker of BusinessWeek has just published a book entitled The Numerati, and has a blog related to the book. The purpose of the book is to look at how mathematicians are using data to profile people in their shopping, voting, and even dating habits.

I am not exactly an unbiased reader of the book.  I talked with Stephen during the writing of the book, and he asked me to review the two pages he wrote about “operations research” (I made a couple suggestions which didn’t make it into the final version:  I guess this is my “cutting room floor” experience).  He was kind enough to send me a review copy of the book, which I received a few weeks ago.  He also accepted my invitation to speak here at CMU to the Tepper School Faculty and doctoral students.

The book is divided into chapters corresponding to the different uses of data: “Worker”, “Shopper”, “Voter”, “Terrorist”, “Patient” and “Lover”. For instance, in the “Voter” section, the emphasis is on predicting voter behavior. In the past (perhaps), geography and economics were very good predictors of voting behavior. Now, people seem much more in flux as to their behavior. Perhaps there are better predictors. Or perhaps there are useful clusterings of like-minded people that would respond to a particular pitch. If Barack Obama were to identify a cluster of “people who blog about obscure but important mathematical modeling methods” and send a mailer (or email more likely) showing his deep understanding of operations research and a promise to use that phrase in his acceptance speech, then perhaps he would gain a crucial set of voters. Barack, are you listening?

I greatly enjoyed reading the book, and did so in one sitting. For someone like me who perhaps could be seen as one of the Numerati, there is not much technical depth to the book, but there are a number of good examples that could be used in the classroom or in conversation. There is a bit too much “The Numerati know much about you and can use it for good or EEEVVVIILLLL” for my taste, but perhaps I take comfort in understanding how poorly data mining and similar methods work in predicting individual behavior. The book is very much about modeling people, so it essentially ignores the way operations research is used to automate business decisions and processes. This is a book primarily about what I would call data mining and clustering, so there are wide swathes of the “numerati” field that are not covered. But for a popular look at how our mathematics is used to characterize and predict human behavior, The Numerati is an extremely interesting book.

Where did the summer go?

This summer seems to have been far shorter than previous summers. I start teaching next week, and it seems so unfair: I want my summer back! I think the fault lies partly with the conferences I went to (CPAIOR in May, IFORS in July, and MIP in August), which broke up the summer too much, so I never got the multiple weeks in a row to get things done.

In retrospect, it turned out to be a reasonably productive summer, at least in terms of things I could check off my Remember The Milk lists (I am an enthusiastic, but sporadic, follower of the Getting Things Done approach to organizing one’s life). So, in the interest of letting people know what faculty do when they are not teaching, here is what got checked off since I put in my final course grades in early May:

  • 5 journal referee reports
  • 14 conference paper reports
  • 3 promotion/tenure letters
  • 2 prize nominations
  • 6 papers handled in some way as journal Associate Editor
  • 2 prize committees chaired and completed
  • 3 conferences attended
  • 3 conference presentations
  • 4 professional society reports/presentations
  • 1 university presentation
  • 1 other presentation
  • 29 OR blog posts
  • 83 trips across campus for an espresso
  • 1 computer upgraded
  • 1 baseball game attended
  • 1 elephant touched

Plus research moved on (one student ready to graduate next week), though I need to make more time for that (I think any professor would much rather have “3 papers completed and submitted” than that mess above). And Alexander is hitting a baseball much better! So overall, not a bad summer, but I would like it to last a few more weeks!

For another take on the things faculty do, particularly in the summer, be sure to check out the FemaleScienceProfessor blog (and thanks to My Biased Coin for pointing that out!)

In New York doing Mixed Integer Programming

I am in New York at Columbia University, attending the Mixed Integer Programming (MIP) workshop. This workshop series was started about 5 years ago, and has grown into a hundred person workshop/conference. It is still run pretty informally (no nametags: I guess it is assumed that everyone knows everyone else. Having just shaved off my beard, I would prefer letting people know my name rather than relying on their ability to recognize me!).

So far, the most interesting aspects have been approaches much different than current practice. Rekha Thomas of the University of Washington had a very nice talk on a variant of Chvatal Rank (called Small Chvatal Rank) which involved using Hilbert basis calculations to find normals of facets of the integer hull (you can think of this as Chvatal rank independent of the right hand side). I’m not sure if it is useful, but it certainly generated a number of neat results. Peter Malkin of UC-Davis talked about using systems of polynomial equations to prove the infeasibility of problems like 3-coloring. I have seen versions of this work before (given by coauthor Susan Margulies) and always begin thinking “this can’t possibly work” but they are able to prove a lack of 3-coloring for impressively large graphs.

One of the most intriguing talks was given by John Hooker, who is exploring what he calls “principled methods of modeling” (or formulation). John has a knack of looking at seemingly well-known approaches and seeing them in a new and interesting way. It is not yet clear that this principled approach gets you anything that is not folklore formulation tricks, but it is interesting to see a theoretical underpinning to some of the things we do.

Postscript. Now that I think about it a bit more, John did present a problem that would have been difficult to formulate without his principled approach. I’ll try to track down an explanation of that example.

Personal Blogs, Personal Views

William Patry, Google’s Senior Copyright Counsel, is ending his blog “The Patry Copyright Blog”. That sentence, in short, gives one of the two reasons he is ending the blog (the other is the horrid state of copyright law, in his view). Patry is a long time copyright lawyer who started the blog while in private practice. Once he joined Google, however, suddenly people projected his blog onto Google. Of course, it is hard to see how to refer to people without saying something about their credentials. Saying, “William Patry, Google’s Senior Copyright Counsel, said this on his blog” seems to give more credence to a view than “William Patry, random guy on the street, said this”, even without making the step to “Google as a company believes this”. But too many people make the last step, making it impossible for Google’s senior copyright counsel to have a personal blog.

I’m primarily an academic, and there seems little harm in referring to me as “Michael Trick, professor at Carnegie Mellon, said …”, since there is a long tradition of celebrating individual views among professors. If everything a university professor said had to be vetted by the university (and who in the university could do such a vetting?), you wouldn’t hear much from professors on any topic. Once in a while, I get identified with organizations. Recently, I was referred to as “Michael Trick, former President of INFORMS” in a blog entry where I have pretty well convinced myself I was not referred to as a “nut with a computer” (though some doubt exists), which is a bit more troublesome. Even in 2002 as President of INFORMS, almost nothing I said was speaking on behalf of INFORMS. Certainly now, if it is not obvious, I don’t speak for INFORMS.

This is also relevant to the recent IBM/ILOG acquisition. We’d all love to hear from the ILOGers and IBMers on what they think it means. I would love to hear it on a personal level on how it will affect their life, their work, our field, and the universe. But it is essentially impossible for anyone at IBM or ILOG to speak personally at this point. Anything they say could be taken as “official” and would draw harsh legal (and company) repercussions. So, until the acquisition is complete, I think we will have to be satisfied with carefully crafted, legally-cleared, official comments.

Traffic Behavior and Operations Research

The New York Times Magazine has an article today entitled “The Urge to Merge” on how people handle tunnels, construction, and so on, when driving, where the number of lanes decreases. Some people, the lineuppers, carefully get into one of the continuing lanes and wait patiently to go through the tunnel. Others, the sidezoomers, zip along one of the ending lanes until the last minute, and then force themselves into the lane. Of course, whether they can merge in depends on the mood and attention of the lineupper involved. All this leads to aggravation and, worse, inefficiency, since the constant stop-and-go allows less traffic to flow through than a smoothly flowing system.

When I lived in Germany, traffic patterns were noticeably more organized (as was much of life). There, cars all went up to the merge point, where they alternated in the use of the lane. This “zipper” effect could be done at reasonably high speed, since there was never any question about whose turn it was. The only worry was some silly American messing up their system (at which point I got to learn lots of German words that were not taught in my classes). This is a great example of the value of coordination. Almost any solution where people each know what the others will do is better than uncertainty.

My only complaint about the article came in the following part:

So I started consulting professionals on my own: traffic engineers, the highway police, queuing theorists. The learning curve, it must be said, was robust. I hadn’t known queuing had theories. But of course it does, mathematicians and business-operations people have to work them out, the heart-attack patient gets in ahead of the sprained ankle and nobody has a problem with that, and anybody who has been to Europe intuitively understands what one engineer meant when in midsentence he said to me, “perfect England,” meaning culturally mandated compulsive queuing, and, “perfect Italy,” meaning culturally mandated compulsive nonqueuing.

Operations Research, dammit, Operations Research!