Conditional Probability in the New York Times

When you ask a question of the form “What are the chances of X given Y?”, you are asking a question of conditional probability. These sorts of questions have come up in this blog before: “What are the chances of cancer given a positive test result?” “What are the chances a monkey prefers blue M&Ms to green M&Ms, given it prefers red M&Ms to blue M&Ms?” “What are the chances of predicting the NCAA tournament perfectly, given perfect predictions for the first two rounds?”

Conditional probability is extremely important for two reasons. First, it occurs all the time: it is a fundamental building block as we aggregate information in an uncertain environment. Second, people are really, really bad at it. In case after case, our intuition misleads us and we badly misestimate conditional probabilities. When a 90% accurate drug test (meaning it is positive 90% of the time for a drug user, and negative 90% of the time for a nonuser) comes back positive, what is the probability the person uses drugs? Our intuition screams “It has to be 90%!” But the probability of “user given positive drug test” is not the same as the probability of “positive drug test given user”. If 5% of the population use drugs, then the probability of “user given positive drug test” is about 1 in 3. Consider 1000 people: 50 are drug users, so 45 will test positive; of the 950 non-users, 95 will test positive; so 45/(45+95), or about 32%, is the probability of user given positive test.
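The counting argument can be checked in a few lines of Python. This is only a sketch: the function name and its parameters are made up for illustration, and the inputs are the numbers from the example (1000 people, 5% users, a test that is right 90% of the time for both users and non-users).

```python
def posterior_by_counting(population, base_rate, sensitivity, specificity):
    """Return P(user | positive test) by counting people in each group."""
    users = population * base_rate
    non_users = population - users
    true_positives = users * sensitivity               # users who test positive
    false_positives = non_users * (1 - specificity)    # non-users who test positive
    return true_positives / (true_positives + false_positives)


# Numbers from the example: 1000 people, 5% users, 90% accurate test.
p = posterior_by_counting(1000, 0.05, 0.90, 0.90)
print(f"P(user | positive test) = {p:.3f}")
```

Changing the base rate from 5% to 50% in the call pushes the answer up to 90%, which is a quick way to see that the intuition is implicitly assuming half the population uses drugs.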

Note that in the argument above, I did not rely on the main theorem in conditional probability: Bayes' Theorem. Bayes' Theorem states P(A|B) (the probability of A given B) = P(B|A)P(A)/P(B). I could have worked it out that way, but in doing so I would have lost all intuition as to the result. For simple cases, the counting approach is much easier and shows why the result is what it is.
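For comparison, here is the same drug-test calculation done with the formula, as a small Python sketch. The variable names are my own choices for illustration; P(B), the overall chance of a positive test, is not given directly, so it is computed from the two groups using the law of total probability.

```python
def bayes(p_b_given_a, p_a, p_b):
    """Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


p_user = 0.05                # P(A): 5% of the population use drugs
p_pos_given_user = 0.90      # P(B|A): test is positive for 90% of users
p_pos_given_nonuser = 0.10   # test is wrongly positive for 10% of non-users

# P(B): overall chance of a positive test, summed over users and non-users.
p_pos = p_pos_given_user * p_user + p_pos_given_nonuser * (1 - p_user)

p_user_given_pos = bayes(p_pos_given_user, p_user, p_pos)
print(f"P(positive) = {p_pos:.3f}")
print(f"P(user | positive) = {p_user_given_pos:.3f}")
```

The answer matches 45/(45+95), but nothing in the formula tells you why it is so far from 90%; the counting version does.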

This argument is at the heart of Steven Strogatz’s excellent article “Chances Are”, online at the New York Times (thanks for the pointer, Matt). He gives some excellent examples of conditional probability, including a great riff on the O.J. Simpson murder trial.

The prosecution spent the first 10 days of the trial introducing evidence that O.J. had a history of violence toward his ex-wife, Nicole. He had allegedly battered her, thrown her against walls and groped her in public, telling onlookers, “This belongs to me.” But what did any of this have to do with a murder trial? The prosecution’s argument was that a pattern of spousal abuse reflected a motive to kill. As one of the prosecutors put it, “A slap is a prelude to homicide.”

Alan Dershowitz countered for the defense, arguing that even if the allegations of domestic violence were true, they were irrelevant and should therefore be inadmissible. He later wrote, “We knew we could prove, if we had to, that an infinitesimal percentage — certainly fewer than 1 of 2,500 — of men who slap or beat their domestic partners go on to murder them.”

In effect, both sides were asking the jury to consider the probability that a man murdered his ex-wife, given that he previously battered her. But as the statistician I. J. Good pointed out, that’s not the right number to look at.

The real question is: What’s the probability that a man murdered his ex-wife, given that he previously battered her and she was murdered by someone? That conditional probability turns out to be very far from 1 in 2,500.

Turns out that probability is about 90%.
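The same counting approach shows where a number like 90% can come from. The sketch below is in Python and rests on loudly stated assumptions: the 1-in-2,500 figure is taken from the Dershowitz quote and treated as an annual rate, while the 1-in-20,000 rate for a battered woman being murdered by someone other than her partner is an assumed, illustrative figure that does not appear in the text above. The variable names are hypothetical as well.

```python
battered_women = 100_000

# From the Dershowitz quote, treated here as a yearly rate (an assumption).
partner_murder_rate = 1 / 2_500

# ASSUMPTION: illustrative rate of being murdered by someone else;
# this figure is not given in the quoted passage.
other_murder_rate = 1 / 20_000

killed_by_partner = battered_women * partner_murder_rate   # about 40 women
killed_by_other = battered_women * other_murder_rate       # about 5 women

# Condition on "she was murdered by someone": only these women count.
p_partner_given_murdered = killed_by_partner / (killed_by_partner + killed_by_other)
print(f"P(partner did it | battered and murdered) = {p_partner_given_murdered:.2f}")
```

Under these assumptions, roughly 40 of the 45 murdered women were killed by their partners, which is the "about 90%" in the text. The point is the change of denominator: from all battered women to battered women who were murdered.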

We are in the midst of a curriculum review at the Tepper School of Business and are considering what we absolutely have to be sure our MBAs know. Being able to work with conditional probability is very high on my list: it is an area where your intuition will almost surely lead you astray. And, as the Strogatz article points out, while Bayes Rule may get you the right results, simple counting arguments are much more convincing.

What is Operations Research?

Over on the suddenly active OR-Exchange, David Woods asked the question:

What are the best quick definitions describing operations research? The kind that you’d give to someone if you only had the duration of an elevator ride to describe it…

David then goes on to answer his question with a fantastic answer:

“Operations research is the art and science of obtaining bad answers to questions to which otherwise worse answers would be given.”

I love that! But I’m not sure that outsiders will really get it. Paraphrasing Steve Martin, “I stopped using irony when I realized I was the only one using it.”

So I’m left with “Operations Research is about making better decisions through mathematical models. Say, are you a baseball fan?” where I hope to squeeze in a story about operations research and sports scheduling. I’m hoping for a better line through the answers at OR-Exchange.

Follow INFORMS Practice from your Own Home

Sadly, I’m not at INFORMS Practice, but the blog entries and tweets make me feel like I am there (or perhaps they remind me I am not). Lots of interesting things happening and the conference hasn’t even started yet! Coming up shortly: the technology workshops, followed by the Welcome Reception tonight. I’ve got the tweets in my sidebar, and highly recommend following the conference page for the blog entries.

Authorship Order

Michael Mitzenmacher, in his excellent blog, My Biased Coin, has recent entries (here, here and here) on the order of authors on joint papers. When you have a last name that begins “Tri…”, it becomes pretty clear early on that alphabetical order is not going to result in a lot of “first author” papers. And it does tick me off when my work in voting theory becomes “Bartholdi et al.” or my work on the Traveling Tournament Problem is “Easton et al.”. I have even seen “Easton, Nemhauser, et al.” which is really hitting below the belt (since it is Easton, Nemhauser, and Trick).

Despite that, all of my papers have gone with alphabetical order, and I am glad I (and my coauthors) went that route. If even once I had gone with “order of contribution”, all of my papers would have been tainted with the thought “Trick is listed third: I guess he didn’t do as much as the others”.

The issue of determining “order of contribution” is a thorny one. There tend to be many skills that go into a paper, and we know from social choice how difficult it is to aggregate multiple orders into a single ordering. Different weighting of the skills leads to different orderings, and there is no clear way to choose the weighting of the skills. Even with the weighting, determining the ordering of any particular aspect of the paper is often not obvious. When doing a computational test, does “running the code” and “tabulating the results” mean more than “designing the experiment” or “determining the instances”? I don’t think hours spent is a particularly good measure (“Hey, I can be more inefficient than you!”) but there is practically nothing else that can be objectively measured.

Further, most papers rely on the mix of skills in order to be publishable. This reminds me of an activity I undertook when I was eight or so. I had a sheet of paper and I went around surveying anyone around on what was more important: “the brain, the heart, or the lungs” (anyone with a five-year-old kid will recognize a real-life version of “Sid the Science Kid” and, yes, I was a very annoying kid, thanks for asking). My father spent time explaining to me the importance of systems, and how there is no “most important” in any system that relies on the others. I would like to say that this early lesson in “systems” inspired me to make operations research my field of study, but I believe I actually browbeat him until he gave up and said “gall bladder” in order to get rid of me. But the lesson did stay with me (thanks, Dad!), and perhaps I was more careful about thinking about systems after that.

Some of the arguments over order strike me as “heart versus lungs” issues: neither can survive without the other. So, if a person has done enough work that the paper would not have survived without them, that both makes them a coauthor, and entitles them to their place in alphabetical order.

As for the unfairness of having a last name beginning “Tri…”, perhaps we should talk to my recent coauthors: Yildiz, Yunes, and Zin.

Google Maps API Enhancements

Google just announced some enhancements to their Directions in Maps API.  One addition, “avoiding tolls and highways” doesn’t really affect me much:  we have only one toll road in the area, and it is pretty well needed to go either east or west.  But the other two are exciting!

First, the API now adds bicycle routing.  I don’t know if it will let you go the wrong way down one way streets or hop on a crowded sidewalk to make a shortcut, but it does contain at least some of the long-distance bike paths.  Check out the path from my house to my buddy Barack’s place.  There is a bike path starting from about 20 miles from Pittsburgh all the way to Washington, DC.  I turn 50 this year, and have promised myself that I will do that trip.  Of course, I have broken promises before, and if I don’t start training soon, it won’t happen.  But it is nice to see Google will be by my side if I do set off.

The second, and more relevant to OR, enhancement adds Traveling Salesman routing capability.  From the announcement:

Route optimization. Have many places to go but no preference as to the order you visit them in? We can now reorder the waypoints of your route to minimize the distance and time you must travel. Very useful for traveling salesman I hear.

Now I would love to experiment with this!  There have been similar efforts before.  Gebweb has offered routing since 2007, and promises optimal solutions for up to 15 cities with 2-opting solutions for larger problems.  I would love to know how far Google guarantees optimality before resorting to a heuristic (the demo on the announcement page limits you to 11 points).  Unlike the other additions, this has not yet made it to Google Maps, but when it does, let’s see how Google competes with Concorde.
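To make the "optimal for small problems, 2-opt for larger ones" distinction concrete, here is a small Python sketch of both ideas. It is not Google's or Gebweb's method (neither is described in enough detail here to reproduce). The simplifying assumptions are mine: straight-line distances instead of road travel times, a closed tour that returns to the start, and made-up coordinates. All function names are hypothetical.

```python
import itertools
import math


def tour_length(order, pts):
    """Length of the closed tour that visits pts in the given order."""
    n = len(order)
    return sum(math.dist(pts[order[i]], pts[order[(i + 1) % n]]) for i in range(n))


def brute_force(pts):
    """Optimal tour by trying every ordering; only sensible for a handful of points."""
    rest = min(itertools.permutations(range(1, len(pts))),
               key=lambda perm: tour_length((0,) + perm, pts))
    return [0, *rest]


def two_opt(pts, order=None):
    """Heuristic: keep reversing segments of the tour while that shortens it."""
    order = list(order) if order is not None else list(range(len(pts)))
    improved = True
    while improved:
        improved = False
        for i in range(1, len(order) - 1):
            for j in range(i + 1, len(order)):
                # Reverse the segment order[i..j] and keep it if the tour gets shorter.
                candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
                if tour_length(candidate, pts) < tour_length(order, pts) - 1e-12:
                    order = candidate
                    improved = True
    return order


# Made-up waypoints, purely for illustration.
points = [(0, 0), (5, 1), (1, 4), (6, 5), (2, 2), (4, 3), (0, 6)]
exact = brute_force(points)
heuristic = two_opt(points)
print("exact tour:", exact, round(tour_length(exact, points), 3))
print("2-opt tour:", heuristic, round(tour_length(heuristic, points), 3))
```

Brute force examines (n-1)! orderings, so it stops being practical well before 15 points; an exact service for that many cities would need something smarter, such as dynamic programming or branch and bound. The 2-opt pass is fast but only guarantees a tour that no single segment reversal can improve, not an optimal one.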

Operations Research: Growth Industry!

NPR has a nice graphic for where job growth will occur in the next decade based on US Bureau of Labor Statistics data (the NPR site is much cooler than the graphic above). Now, operations research is a little small to appear as a dot on its own, but if you look at that little dot far to the right, showing the most job growth? That is “Management, Scientific, and Technical Consulting Services”. And what field is all of “management, scientific and technical”? Operations Research, of course! The projection is for 82.8% growth.

There are some other interesting dots that might guide those in our field. Note the big dot second from the top. That is Manufacturing, with a 9% loss in jobs. Some of that might be due to efficiencies from our field, but I suspect most is due simply to a shrinkage in importance of manufacturing to the US economy. Some of the big growth areas? Education, health care and construction with growth in the 15-25% range. This suggests that applying operations research in the service industries is going to be a big driver of growth in our field (unless we miss the boat and let another field do operations research there under a different name).

Thanks to the INFORMS Facebook Page for the pointer!

Operations Research Embarrassments and the Traveling Tournament Problem

Dick Lipton of Georgia Tech has a very nice blog on the theory of computation (though it ranges broader than that). He has a fascinating post on “Mathematical Embarrassments”. A “Mathematical Embarrassment” does not refer to the mistaken proof I had in a real analysis course twenty five years ago (that still causes me to cringe) or a similar goof. Instead, it is defined as:

A mathematical embarrassment (ME) is a problem that should have been solved by now. An ME usually is easy to state, seems approachable, and yet resists all attempts at an attack.

This contrasts with a Mathematical Disease (MD), which is harder to define but has a number of characteristics:

1. A problem must be easy to state to be a MD. This is not sufficient, but is required. Thus, the Hodge-Conjecture will never be a disease. I have no clue what it is about.
2. A problem must seem to be accessible, even to an amateur. This is a key requirement. When you first hear the problem your reaction should be: that is open? The problem must seem to be easy.
3. A problem must also have been repeatedly “solved” to be a true MD. A good MD usually has been “proved” many times—often by the same person. If you see a paper in arXiv.org with many “updates” that’s a good sign that the problem is a MD.

So, in operations research we see many people catch the “P=NP (or not)” disease, but I would not call the problem an embarrassment.

Lipton has a number of good examples of mathematical embarrassments, some of which overlap with the skills/interests of more mathematically inclined operations researchers, but none are “really” operations research. For instance, it seems that proving that both the sum and difference of pi and e are transcendental would be a homework problem from centuries ago, but it is still open. That is an embarrassment!

So what would be an operations research embarrassment?

On the computational side, I think of the Traveling Tournament Problem as a bit of an embarrassment (embarrassment is too strong for this: perhaps this is an “operations research discomfiture”). It is a sports scheduling problem that is easy to state, but even very small instances seem beyond our current methods. It was a big deal when the eight team problem was solved!

Just this month, I received word from New Zealand that the three 10-team examples (NL10, CIRC10, and Galaxy10) have all been solved (by David Uthus, Patricia Riddle, and Hans Guesgen). This is huge for the couple of dozen people really into sports scheduling! Proving optimality for this size takes 120 processors about a week so I would say that the problem is still a discomfiture.

But a discomfiture is not an embarrassment. The problem, though simple to state, is too small to rank as an embarrassment. So what would an “operations research embarrassment” be? And are there “operations research diseases” other than P=NP? (One could argue that by causing people to spend computer-years in solving Traveling Tournament Problems, I may have infected some with a mild disease.)

Say Hi! to the New INFORMS Website

INFORMS has a new website and it looks great. I was the founding editor of INFORMS Online way back in 1995, building off some preliminary work done by Jim Bean and ManMohan Sodhi. Over the years, IOL changed quite a bit but I could still see lots of the original work my original editorial team had done. With this new site, INFORMS has revamped everything from scratch (well, not quite from scratch: I still see a bit of previous work in places like the membership directory, though that may change, and things like the Resources Page have only undergone a facelift, not a revamp). I’m still exploring the nooks and crannies, but my initial impression is that this is a much better face for the organization and for our field.

They are still in the process of doing the changeover, so some things are not working right (see their blog entry for more information), but overall it is a huge improvement over the previous website.

Time to Improve Operations Research on Wikipedia

The wikipedia article on operations research needs help.  Let’s see how it begins (as of today:  it might change any moment now, given the way wikipedia works):

Operations research (N America) or Operational research (UK/Europe) “is a scientific method of providing executive departments with a quantitative basis for decisions regarding the operations under their control.”[1] Other names for it include:

  • Operational analysis (UK Ministry of Defence from 1962)[2]
  • Quantitative management[3]

“The historical development of Operational Research (OR) is traditionally seen as the succession of several phases: the “heroic times” of the Second World War, the “Golden Age” between the fifties and the sixties during which major theoretical achievements were accompanied by a widespread diffusion of OR techniques in private and public organisations, a “crisis” followed by a “decline” starting with the late sixties, a phase during which OR groups in firms progressively disappeared while academia became less and less concerned with the applicability of the techniques developed.”[4]. In the current phase, the increase in computing power coupled with the birth of related techniques like business intelligence (BI) and business analytics are leading a resurgence of OR.[5]

Oh great, we get a 1947 definition of operations research, as quoted on a 1954 dustcover. It is synonymous with “Quantitative Management” based on a single South African department brochure. Its main interest is historical, though it might be resurging. By the way, it is the Canadian Operational Research Society and the Mexican Institute of Systems and Operational Research, so even the statement on where the various terms are used is incorrect. And “UK/Europe”? Wikipedia suggests there is a certain redundancy there.

After that, it is 2/3 history and 1/3 lists. This is not the way to introduce people to our field!

I have fought this fight before, and lost. So I do things like the INFORMS Resources Page (for at least a bit longer), this blog with its lists of other OR blogs, my twitter account, OR-Exchange (for questions and answers, and maybe announcements in OR), and other things to make the successes of our field better known (and maybe show an illustrative failure or two). I’m not willing to add wikipedia to my list.

Fortunately, others are willing to do so. Siamak Faridani posted a question on OR-Exchange asking how to improve the wikipedia article.

OR articles on Wikipedia obviously need some attention.

My question is how can we join forces to improve these articles?

I have made a project page on Wikipedia there is a “To do list” on that page in which you can insert the articles that need attention and request new articles. If you would like to help in edits you can also add your user name to the list of contributors and we will ask for your inputs as we work on articles.

Even if you do not like to edit Wikipedia articles please consider helping with populating the to do list and I will try to work on those articles as much as possible

By its nature, the wikipedia article cannot be the responsibility of one person. If you are looking for ways to help the field, I highly recommend (for you, not for me!) spending some time improving the articles on operations research and related topics, either on your own or in conjunction with Siamak.