Can the New York Times Magazine Please Hire a Mathematically Literate Editor?

The New York Times Magazine provides almost inexhaustible fodder for this blog.  Recently I have written about prostates, cell phones, and ketogenic diets, based on articles in the Magazine.  Normally the articles are well researched and provocative.  But sometimes things seem to go a bit haywire in the editing process, and the Magazine lets through stuff that it really should catch (see, for instance, my post on nonsensical graphics).  This week in the midst of a fascinating (and scary) article on the state of US bio-defense, there came the following sentence:

In hundreds of experiments, scientists weaponized the bacteria to extraordinary potency and then proceeded to mix the slurry with another agent… which multiplied the effect logarithmically.

I suppose I should be happy that the Magazine did not go with the lazy “exponentially”, but really: what can it mean to multiply an effect “logarithmically”? Take x and multiply by log x? Base 10? e? This does not seem to be the case based on the following sentences, where the result “shatter[s] the human immune system”. It seems more likely that the author was searching for something to add a mathematical veneer and grabbed the only function remembered from a long-ago high school math class.
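Taking the phrase at face value (base 10, say) shows how feeble it would be: multiplying an effect of 1,000 by log 1,000 boosts it by a factor of 3; multiplying an effect of 1,000,000 by log 1,000,000 boosts it by a factor of just 6. Logarithms grow so slowly that “multiplying logarithmically” is barely multiplying at all, certainly not the immune-shattering amplification the article describes.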

“… which greatly increased the effect” might not sound so sophisticated, but I think it would have been a better choice here.

P.S.  I now recall that I have seen this strange use of logarithmically before.  Perhaps I can have a category where I track this usage.


Get Money from Google for Operations Research, in Europe anyway

Google seems to be getting the idea that operations research has a lot to add to the company.  This is a long way from the time when I got the following response from a Google software engineer at an information systems conference: “Integer programming?  I tried it:  it doesn’t work”.  Oh, really?

This change is great for our field:  the line between operations research and computer science is extremely fuzzy and is getting fuzzier as “business analytics” unites traditionally CS areas such as data mining with traditionally OR areas such as optimization.  Operations Research needs to be seen as a key reason for the continuing success of companies like Google.

If you are in Europe, you can even get some money for a post-doc or other research expense from Google, as per this announcement from Laurent Perron:

Subject: Call for Proposals for the Google Focused Grant Program on Mathematical Optimization and Combinatorial Optimization in Europe.
Deadline: November 25th (strict).
Contact: proposal should be sent to both Laurent Perron (lperron@google.com) and Michel Benard (benard@google.com).
Format: A proposal is a 3-page document following the format described at http://research.google.com/university/relations/research_awards.html

The purpose of this program is to facilitate more interaction between Google and academia and also to nurture stronger relations and partnerships with universities. The intent of this focused awards program is to support academic research aimed at improving the theory and applications of mathematical and combinatorial optimization (Operations Research, Constraint Programming, Meta-Heuristics). Google funds Research Awards as unrestricted gifts and retains no intellectual property from the research. We prefer that the results of the research be open sourced and widely published. Awards through this program are for one year, in the range of $10K-$150K.
A typical grant will cover 1 Post-Doctoral student for 1 year. This program is restricted to Europe.

Areas that are of particular interest include (but are not limited to):

* Inference and relaxation methods: constraint propagation, cutting planes, global constraints, graph algorithms, dynamic programming, Lagrangean and convex relaxations, counting-based inferences and heuristics, constraint-relaxation-based heuristics.

* Search methods: branch and bound, intelligent backtracking, incomplete search, randomized search, column generation and other decomposition methods, local search, meta-heuristics, large scale parallelism.

* Integration methods: static/dynamic problem decomposition, solver communication, transformations between models and solvers, collaboration between concurrent methods, models, and solvers.

* Modeling methods: comparison of models, symmetry breaking, uncertainty, dominance relationships, model compilation into different technologies (CP, LP, etc.), learning (CP, LP) models from examples.

* Innovative applications derived from OR techniques.

As a point of comparison, last year, we funded 9 grants in the following fields: Explanations in Constraint Programming, SAT techniques in Scheduling, Just-In-Time scheduling, Complex Bin-Packing, Parallel Resources in Scheduling, Convex Integer Programming, Ride sharing, Large Scale Mixed Integer Programming, and Territory Design.

Google values well-written proposals that fit into the 3-page format plus references. These proposals should include the following sections:
– A clear description of the problem the authors are trying to solve, and the potential impact they could have if the research is successful.
– An overview of the state of the art in this field and an explanation of why the proposed research is innovative.
– A precise understanding of how the authors are going to solve this problem.
– A convincing experimental method that will prove the authors are solving the real problem on realistic data instances, and that will measure the actual gains.
– A biography of the P.I. and a link to recent publications related to the proposal.

Finally, receiving a Google Research Award does not grant access to Google data. In some instances, however, Google researchers may be interested in collaborating on the problem formulation and the algorithmic solution.

Benchmarks: Coloring, Sports and Umpires

I have always felt strongly that operations research needs more libraries of instances for various problem classes, along with listings of current best solutions.  By tracking how well we solve problems over time, we can show how we advance as a field.  It also makes new work easier to evaluate, simplifying the job of both authors and referees.

I began in this direction almost two decades ago during a year I spent at DIMACS (a fantastic research institute on discrete mathematics and computer science based at Rutgers), where I ran, together with David Johnson, their Computational Challenge, with an emphasis on solving graph coloring, clique, and satisfiability instances.  From that, I put together a page on graph coloring (which has to be one of the oldest pages on the internets!).  David, Anuj Mehrotra and I followed that up in 2003 with an updated challenge just on graph coloring.  It was great to see people re-use the same instances, so we could understand the advances in the field.  It is hard to tell exactly how many papers have used the various benchmark repositories, but based on Google Scholar hits on the DIMACS paper referencing the instances, they are clearly the basis for hundreds of papers.

I had this experience in mind ten years ago when Kelly Easton, George Nemhauser and I wanted to publish work we had done with Major League Baseball on their scheduling.  It made no sense to use MLB as a benchmark, since there is only one instance per year and much of the information about the needs of a schedule is confidential.  So we created the Traveling Tournament Problem, which abstracts two key issues in MLB scheduling: travel distance, and “flow” (the need to mix home and away games).  We created a set of instances, solved a few smaller ones, and let it loose on the world.  The result was fantastic:  dozens of groups started working on the problem, and we could clearly see which techniques worked and which didn’t.
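To make the travel half of the problem concrete, here is a minimal sketch in Python of how a schedule’s travel cost is tallied. The four teams, the distance matrix, and the little double round-robin below are made up for illustration, and the sketch ignores the TTP’s “flow” constraints (limits on consecutive home or away games):

    # Travel cost of a double round-robin schedule, TTP-style.
    # schedule[t] lists team t's games in round order as (opponent, is_home).
    # Teams start at home, move from venue to venue, and return home at the end.
    def total_travel(schedule, dist):
        total = 0
        for team, games in enumerate(schedule):
            loc = team                                # start at home
            for opponent, is_home in games:
                nxt = team if is_home else opponent   # venue of this game
                total += dist[loc][nxt]
                loc = nxt
            total += dist[loc][team]                  # return home
        return total

    # Made-up distances and a valid 4-team double round-robin (6 rounds):
    dist = [[0, 10, 15, 20], [10, 0, 35, 25], [15, 35, 0, 30], [20, 25, 30, 0]]
    schedule = [
        [(1, True),  (2, True),  (3, True),  (1, False), (2, False), (3, False)],
        [(0, False), (3, True),  (2, True),  (0, True),  (3, False), (2, False)],
        [(3, True),  (0, False), (1, False), (3, False), (0, True),  (1, True)],
        [(2, False), (1, False), (0, False), (2, True),  (1, True),  (0, True)],
    ]
    print(total_travel(schedule, dist))  # prints 380 for this toy instance

The trade-off the TTP captures is exactly the tension this number creates: bunching away games into road trips cuts travel, but the flow constraints push back.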

I had made a terrible mistake when creating the benchmarks for graph coloring:  I didn’t keep track of best results.  This led to a fair amount of chaos in the field, with contradictory results appearing (claimed coloring values better than claimed lower bounds) and no clear picture of where things stand.  I had thought at one time that I would clean things up with a “Repository of Optimization Instances and Solutions”, but too many other things have intruded for me to spend the time necessary on that.  Fortunately, Stefano Gualandi and Marco Chiarandini have put together a site for graph coloring solutions, and I hope they will be successful in their efforts to put a little structure into the field.
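Part of what such a site buys the field is cheap verification: a claimed k-coloring can be checked mechanically by anyone, while a claimed lower bound needs a proof, which is how contradictory reports could linger. A minimal sketch of the check, assuming the usual edge-list view of a DIMACS instance:

    # Verify a claimed coloring: proper iff no edge joins same-colored vertices.
    def is_proper_coloring(edges, coloring):
        return all(coloring[u] != coloring[v] for u, v in edges)

    triangle = [(1, 2), (2, 3), (1, 3)]
    print(is_proper_coloring(triangle, {1: 0, 2: 1, 3: 2}))  # True (3 colors)
    print(is_proper_coloring(triangle, {1: 0, 2: 0, 3: 1}))  # False (edge (1, 2))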

I learned from that mistake and was much more diligent about keeping track of solutions for the Traveling Tournament Problem.  The TTP site is always up to date (OK, almost always), so people can reasonably trust the results there.  I have recently extended the site to include instances for non-round-robin scheduling and for the Relaxed TTP (where there is an opportunity for off days).

One relatively new problem I am excited about is scheduling umpires in sports.  Former doctoral students Hakan Yildiz (now at Michigan State) and Tallys Yunes (Miami) and I developed a problem called the Traveling Umpire Problem, which again tries to abstract the key issues in Major League Baseball umpire scheduling.  In this case, the umpires want to travel relatively short distances (unlike the players, the umpires have no “home city”, so they are always traveling) but should not see the same teams too often.  This problem feels easier than the Traveling Tournament Problem, but we still cannot solve instances with 14 or more umpires to optimality.  This work received a fair amount of interest when the university PR people caught hold of our Interfaces paper.  Since that paper, Hakan and I have put together a couple of other papers, exploring optimization-based genetic algorithms and Benders-based local search approaches for this problem (to appear in Naval Research Logistics).  Both papers illustrate nice ways of using optimization together with heuristic approaches.  The website for the problem gives more information, along with instances and our best solutions.
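For a flavor of the problem, here is a minimal sketch of the two ingredients for a single umpire. The data and the window parameter q are made up for illustration (the real TUP parameterizes separately how soon an umpire may revisit a venue and how soon they may see a team again):

    # One umpire's season: a list of games (home_team, away_team), one per round.
    # Umpires have no home city; travel is venue to venue through the season.
    def umpire_travel(games, dist):
        venues = [home for home, _ in games]  # games are played at the home venue
        return sum(dist[a][b] for a, b in zip(venues, venues[1:]))

    # "Not too often": no team appears twice in any q consecutive rounds.
    def sees_team_too_soon(games, q):
        for i in range(len(games)):
            window = [team for game in games[i:i + q] for team in game]
            if len(window) != len(set(window)):
                return True
        return False

    dist = [[0, 10, 15, 20], [10, 0, 35, 25], [15, 35, 0, 30], [20, 25, 30, 0]]
    games = [(0, 1), (2, 3), (1, 2), (0, 3)]   # made-up four-round assignment
    print(umpire_travel(games, dist))          # 15 + 35 + 10 = 60
    print(sees_team_too_soon(games, 2))        # True: team 2 in two straight rounds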

I don’t think my repositories of benchmarks will be as influential as, say, MIPLIB, which focuses on mixed integer programs.  But I do like to think that they make the operations research world run a bit more smoothly.

Prostates and Probabilities

After a few years’ hiatus, I finally got back to seeing a doctor for an annual physical last week.  For a 51-year-old male with a fondness for beer, I am in pretty good shape:  overweight (but weighing a bit less than six months ago), pretty good blood pressure (123/83), no cholesterol issues, all without the need for a drug regimen.

Once you hit your mid-to-late 40s (and if you are a male), doctors begin to obsess about your prostate.  There is a good reason for this:  prostate cancer is the second most common cause of cancer death among males (after lung cancer).  So doctors try to figure out whether you have prostate cancer.

However, there is a downside to worrying about the prostate.  It turns out that lots of men have prostate cancer, and most will die of something else.  The cancer is often slow growing and localized, making it less of a worry.  Cancer treatment, on the other hand, is invasive and risky, causing not only death from the treatment (rare but not negligible) but also various annoying issues such as impotence, incontinence, and other nasty “in-”s.  But if the cancer is fast growing, then it is necessary to find it as early as possible and treat it aggressively.

So doctors want to check for prostate cancer.  Since the prostate is near the surface, the doctor can feel it, and my doctor did so (under the watchful gaze of a medical student following her and, as far as I know, a YouTube channel someplace).  When I say “near the surface”, I did not mention which surface:  if you are a man of a certain age, you know the surface involved.  The rest of you can google “prostate exam” and spend the rest of the day trying to get those images out of your mind.

Before she did the exam, we did have a conversation about another test:  PSA (Prostate Specific Antigen) testing.  This is a blood test that can determine the amount of a particular antigen in the blood.  High levels are associated with prostate cancer.  My doctor wanted to know if I desired the PSA test.

Well, as I was recovering from the traditional test (she declared that my prostate felt wonderful:  if it were a work of art, I own the Mona Lisa of prostates, at least by feel), I decided to focus on the decision tree for PSA testing.  And here I was let down by a lack of data.  For instance, if I have a positive PSA test, what is the probability that I have prostate cancer?  More importantly, what is the probability that I have the sort of fast-growing cancer for which aggressive, timely treatment is needed?  That turns out to be quite a complicated question.  As the National Cancer Institute of the NIH reports, there is no clear cutoff for this test:

PSA test results show the level of PSA detected in the blood. These results are usually reported as nanograms of PSA per milliliter (ng/mL) of blood. In the past, most doctors considered a PSA level below 4.0 ng/mL as normal. In one large study, however, prostate cancer was diagnosed in 15.2 percent of men with a PSA level at or below 4.0 ng/mL (2). Fifteen percent of these men, or approximately 2.3 percent overall, had high-grade cancers (2). In another study, 25 to 35 percent of men who had a PSA level between 4.1 and 9.9 ng/mL and who underwent a prostate biopsy were found to have prostate cancer, meaning that 65 to 75 percent of the remaining men did not have prostate cancer (3).

In short, even those with low PSA values have a pretty good chance of having cancer.  Therein lies the rub between having a test “associated with” a cancer and having a test that determines a cancer.  Statistical association is easy:  the correlation might be very weak, but as long as it is provably above zero, the test is correlated with the disease.  Is the correlation high enough?  That depends on a host of things, including an individual’s view of the relative risks involved.  But this test is clearly not a “bright line” sort of test, neatly dividing the (male) population into those with cancers that will kill them and those without.
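What I wanted, of course, was the posterior probability, and that is just Bayes’ rule applied to the prevalence and the test’s sensitivity and specificity. A minimal sketch; the numbers below are made up for illustration, not taken from the NCI studies quoted above:

    # P(cancer | positive test) from prevalence and test characteristics.
    def posterior(prevalence, sensitivity, specificity):
        p_positive = (sensitivity * prevalence
                      + (1 - specificity) * (1 - prevalence))
        return sensitivity * prevalence / p_positive

    # Illustrative numbers only: 17% prevalence, 80% sensitivity, 60% specificity.
    print(posterior(0.17, 0.80, 0.60))  # ~0.29: even after a positive test,
                                        # the odds still favor no cancer

The point of the sketch is how weak a mediocre test is against a modest prior: even a “positive” result here leaves roughly a 70% chance of no cancer, and says nothing about fast versus slow growing.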

In the days since my doctor’s appointment, there has been a slew of articles on PSA testing, due to the US Preventive Services Task Force moving toward declaring that PSA testing has no net benefit.  The Sunday New York Times Magazine has an article on prostate screening.  The article includes a wonderfully evocative illustration of the decision to be made:

David Newman, a director of clinical research at Mount Sinai School of Medicine in Manhattan, looks at it differently and offers a metaphor to illustrate the conundrum posed by P.S.A. screening.

“Imagine you are one of 100 men in a room,” he says. “Seventeen of you will be diagnosed with prostate cancer, and three are destined to die from it. But nobody knows which ones.” Now imagine there is a man wearing a white coat on the other side of the door. In his hand are 17 pills, one of which will save the life of one of the men with prostate cancer. “You’d probably want to invite him into the room to deliver the pill, wouldn’t you?” Newman says.

Statistics for the effects of P.S.A. testing are often represented this way — only in terms of possible benefit. But Newman says that to completely convey the P.S.A. screening story, you have to extend the metaphor. After handing out the pills, the man in the white coat randomly shoots one of the 17 men dead. Then he shoots 10 more in the groin, leaving them impotent or incontinent.

Newman pauses. “Now would you open that door?”
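Tallying the metaphor per 100 men screened:  17 are diagnosed and 3 would die of the disease; screening saves 1 of those 3, while (in Newman’s telling) the treatment kills 1 of the 17 and leaves 10 impotent or incontinent.  One life saved, weighed against one death and ten serious harms caused, is exactly why the question is hard.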

Is more information better?  To me, information matters only if it changes my actions.  Would a “positive” PSA test (whatever that means) lead me to different health-care decisions?  And is it really true that more information is always better?  Would my knowing that I, like many others, had cancerous prostate cells (without knowing if they will kill me at 54 or 104) really improve my life?

Perhaps in a few years, we’ll have some advances.  Ideal, of course, would be a test that can unmistakably determine whether a man has a prostate cancer that, untreated, will kill him.  Next best would be better, more individual models that would say, perhaps, “A 53-year-old male with normal blood pressure, a fondness for beer, a prostate shape that causes angels to sing, and a PSA value of 4.1 has an x% chance of having a prostate cancer that, untreated, will kill him within five years.”  Then I would have the data to make a truly informed decision.

This year, I opted against the PSA test, and everything I have read so far has made me more confident in my decision.  Of course, I did not necessarily opt out of PSA testing forever and ever:  I get to make the same decision next year, and the one after that, and the one after that….  But I will spend the time I save in not worrying about PSA testing by working out more at the gym (and maybe adding a bit of yoga to the regime).  That will, I think, do me much more good.