
Which Average do you Want?

Now that I am spending a sentence as an academic administrator in a business school (the Tepper School at Carnegie Mellon University), I get first-hand knowledge of the amazing number of surveys, questionnaires, inquiries, and other information-gathering methods organizations use to rank, rate, or otherwise evaluate our school. Some of these are “official”, involving accreditation (like AACSB for the business school and Middle States for the university). Others come from organizations that provide information to students. The biggest of these, for us, is Business Week, where I am happy to see that our MBA program went up four positions from 15th to 11th in the recent ranking. We administrators worry about this so faculty don’t have to.

Responding to all these requests takes a huge amount of time and effort. We have a full-time person whose job is to coordinate these surveys and to analyze the results of them. Larger schools might have three or four people doing this job. And some surveys turn out to be so time-intensive to answer that we decline to be part of them. Beyond Grey Pinstripes was an interesting ranking based on sustainability, but it was a pain to fill out, which seems to be one reason for its recent demise.

As we go through the surveys, I am continually struck by the vagueness in the questions, even for questions that seem to be asking for basic, quantitative information. Take the following commonly asked question: “What is the average class size in a required course?”. Pretty easy, right? No ambiguity, right?

Let’s take a school with four courses per semester and two semesters of required courses. Seven of the eight courses are “normal” and run in 65-student sections, while the eighth is divided into two half-semester courses, each run as 20-student seminars (this is not the Tepper School, but it illustrates the issue). Here are some ways to calculate the average size:

A) A student takes 9 courses: 7 at 65 and 2 at 20, for an average of 55.
B) If you weight over time, it is really 8 semester-courses: 7 at 65 and 1 at 20, for an average of 59.4.
C) There are about 200 students, so the school offers 21 sections of size 65 and 20 sections of size 20, for an average of 43.

Which is the right one? It depends on what you are going to use the answer for. If you want to know the average student experience, then perhaps calculation B is the right one. An administrator might be much more concerned about calculation C, and that is what you get if you look at the course lists of the school and take the average over that list. If you look at a student’s transcript and just run down the size for each course, you get A.
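To make the three interpretations concrete, here is a small Python sketch (my own illustration, using the made-up numbers from the example above) that reproduces calculations A, B, and C:

```python
# Three ways to compute "average class size" for the example school:
# 7 full-semester courses in 65-student sections, one course split into
# two half-semester 20-student seminars, roughly 200 students in the program.
full_size, seminar_size, n_full, n_students = 65, 20, 7, 200

# A) average over the 9 entries on a student's transcript
avg_A = (n_full * full_size + 2 * seminar_size) / (n_full + 2)

# B) weight the two half-semester seminars as one semester-course (8 in total)
avg_B = (n_full * full_size + 1 * seminar_size) / (n_full + 1)

# C) average over all sections the school actually offers
full_sections = n_full * round(n_students / full_size)     # 7 courses x 3 sections = 21
seminar_sections = 2 * (n_students // seminar_size)        # 2 half-courses x 10 sections = 20
avg_C = (full_sections * full_size + seminar_sections * seminar_size) / \
        (full_sections + seminar_sections)

print(f"A = {avg_A:.1f}, B = {avg_B:.1f}, C = {avg_C:.1f}")  # A = 55.0, B = 59.4, C = 43.0
```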

We know enough about other schools to say pretty clearly that different schools will answer this in different ways; I have seen all three calculations used on the same survey by different schools. But the surveying organization will then happily collect the information, put it in a nice table, and students will sort and make decisions based on these numbers, even though the definition of “average” will vary from school to school.

This is reminiscent of a standard result in queueing theory that says that the system view of a queue need not equal a customer’s view. To take an extreme example, consider a store that is open for 8 hours. For seven of those hours, not a single customer appears. But a bus comes by and drops off 96 people who promptly stand in line for service. Suppose it takes 1 hour to clear the line. On average, the queue length was 48 during that hour. So, from a system point of view, the average (over time) queue length was (0(7)+48(1))/8=6. Not too bad! But if you ask the customers “How many people were in line when you arrived?”, the average is 48 (or 47 if they don’t count themselves). Quite a difference! What is the average queue length? Are you the store or a customer?
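The arithmetic here is simple enough to write out directly. In this little sketch (my own illustration), each arriving customer counts only the people already ahead of them, which gives 47.5, i.e. the “47 or 48” above:

```python
# System (time-average) view vs. customer view of the bus-queue example.
hours_empty, hours_busy, bus_size = 7, 1, 96

# Over the busy hour the line shrinks linearly from 96 to 0, averaging 48.
time_average = (hours_empty * 0 + hours_busy * (bus_size / 2)) / (hours_empty + hours_busy)

# The k-th person off the bus finds k people already standing in line.
customer_average = sum(range(bus_size)) / bus_size

print(time_average)      # 6.0  -- the store's view
print(customer_average)  # 47.5 -- the customers' view
```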

Not surprisingly, if we can get tripped up on a simple question like “What’s your average class size?”, filling out the questionnaires can get extremely time consuming as we figure out all the different possible interpretations of the questions. And, given the importance of these rankings, it is frustrating that the results are not as comparable as they might seem.

Registries To Avoid Publication Bias

I have been thinking about the issue of how a field knows what it knows.  In a previous post, I wrote about how the field of social psychology is working through the implications of fraudulent research, and is closely examining the cozy interactions between journals, reviewers, and famous researchers.   And any empirical field based on statistical analysis has to live with the fact that if there are 1000 results in the field, some number (50 perhaps, if p=.05 is a normal cutoff and lots of results are just under that value) are going to be wrong just because the statistical test created a false positive.  Of course, replication can help determine what is real and what is not, but how often do you see a paper “Confirming Prof. X’s result”?  Definitely not a smooth path to tenure.

This is worse if malevolent forces are at work.  Suppose a pharmaceutical company has bet the firm on drug X, and they want to show that drug X works.  And suppose drug X doesn’t work.  No problem!  Simply find 20 researchers, sign them to a non-disclosure, and ask them to see if drug X works.  Chances are one or more researchers will come back with a statistically significant result (in fact, there is about a 65% chance that one or more will, given a p=.05).  Publish the result, and voila!  The company is saved!  Hurray for statistics and capitalism!
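The “20 researchers” arithmetic is worth writing out: with independent studies and a 0.05 false-positive rate each, the chance of at least one spurious success is just over 64%, which is the “about 65%” figure above.

```python
# Probability that at least one of 20 independent studies of an ineffective
# drug comes back statistically significant at p = 0.05.
p_false_positive = 0.05
n_studies = 20
p_at_least_one = 1 - (1 - p_false_positive) ** n_studies
print(f"{p_at_least_one:.1%}")   # 64.2%
```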

Fortunately, I am not the first to see this issue:  way back in 1997, the US Congress passed a bill requiring the registration of clinical trials, before the trials get underway.

The first U.S. Federal law to require trial registration was the Food and Drug Administration Modernization Act of 1997 (FDAMA) (PDF).

Section 113 of FDAMA required that the National Institutes of Health (NIH) create a public information resource on certain clinical trials regulated by the Food and Drug Administration (FDA). Specifically, FDAMA 113 required that the registry include information about federally or privately funded clinical trials conducted under investigational new drug applications (INDs) to test the effectiveness of experimental drugs for patients with serious or life-threatening diseases or conditions.

This led to the creation of clinicaltrials.gov (where I am getting this history and the quotes) in 2000.  This was followed by major journals requiring registration before papers could be considered for publication:

In 2005 the International Committee of Medical Journal Editors (ICMJE) began to require trial registration as a condition of publication.

The site now lists more than 130,000 trials from around the world.  It seems this is a great way to avoid some (but by no means all!) fraud and errors.

I think it would be useful to have such systems in operations research.  When I ran a DIMACS Challenge twenty years ago, I had hoped to keep up with results on graph coloring so we had a better idea of “what we know”:  then and now there are graph coloring values in the published literature that cannot be correct (since, for instance, they contradict published clique values:  something must be wrong!).  I even wrote about a system more than two years ago but I have been unable to find enough time to take the system seriously.  I do continue to track results in sports scheduling, but we as a field need more such systems.

 

Referees considered harmful

When doing empirical work, researchers often mess up either in the design of the experiment or in the analysis of data.  In operations research, much of our “empirical work” is in computational testing of algorithms.  Is algorithm A faster than algorithm B?  “It depends” is generally the only honest answer.  It depends on the instance selection, it depends on the computing environment, it depends on the settings, etc. etc.   If we are careful enough, we can say things that are (within the limits of the disclaimers) true.   But even a careful experiment can fall prey to issues.  For instance, throwing away “easy” instances can bias the results against whatever algorithm is used to determine easiness.  And don’t get me started on empirical approaches that test dozens of possibilities and miraculously find something “statistically significant”, to be duly marked with an asterisk in the table of results.    It is very difficult to truly do trustworthy empirical work.  And it is even harder to do such work when researchers cheat or reviewers don’t do their job.

For some fields, these issues are even more critical.  Operations research generally has some theoretical grounding:  we know about polytopes and complexity, and so on, and can prove theorems that help guide our empirical work.  In fields like Social Psychology (the study of people in their interactions with others), practically all that is known is due to the results of experiments.   The fundamental structure in this field is a mental state, something that can only be imprecisely observed.

Social psychology is in a bit of a crisis.  In a very real sense, the field no longer knows what is true.  Some of that crisis is due to academic malfeasance, particularly that of the influential researcher Diederik Stapel.  Stapel has been found to have invented data for dozens of papers, as described in a “Science Insider” column.

Due to data fraud by Stapel and others, the field has to reexamine much of what it thought was true.  Are meat eaters more selfish than vegetarians?  We thought so for a while, but now we don’t know.  A Dutch report goes into great detail on this affair.

But overt fraud is not the only issue, as outlined in the report.  I was particularly struck by the role paper reviewers played in this deceit:

It is almost inconceivable that co-authors who analysed the data intensively, or reviewers of the international “leading journals”, who are deemed to be experts in their field, could have failed to see that a reported experiment would have been almost infeasible in practice, did not notice the reporting of impossible statistical results, … and did not spot values identical to many decimal places in entire series of means in the published tables. Virtually nothing of all the impossibilities, peculiarities and sloppiness mentioned in this report was observed by all these local, national and international members of the field, and no suspicion of fraud whatsoever arose.

And the role of reviewers goes beyond that of negligence:

Reviewers have also requested that not all executed analyses be reported, for example by simply leaving unmentioned any conditions for which no effects had been found, although effects were originally expected. Sometimes reviewers insisted on retrospective pilot studies, which were then reported as having been performed in advance. In this way the experiments and choices of items are justified with the benefit of hindsight.

Not infrequently reviews were strongly in favour of telling an interesting, elegant, concise and compelling story, possibly at the expense of the necessary scientific diligence.

I think it is safe to say that these issues are not unique to social psychology.  I think that I too have, as a reviewer, pushed toward telling an interesting story, although I hope not at the expense of scientific diligence.   And perhaps I could have worked harder to replicate some results during the reviewing process.

I don’t think we in operations research are in crisis over empirical issues.  I am pretty confident that CPLEX 12.4 is faster than CPLEX 4.0 for practically any instance you can throw at it.  And some journals, like Mathematical Programming Computation, have attempted to address these issues seriously.  But I am also pretty sure that some things I think are true are not true, either due to fraud by the author or negligence by reviewers.

One important role of a reviewer is to be on the lookout for malfeasance or bias and to avoid allowing (or, worse, forcing) authors to present data in an untruthful way.  And I think many of us are not doing a great job in this regard.  I would hate to have to rethink the foundations of my field due to these issues.

Operations Research and a Baseball Job

Analytics is getting to be more and more important in sports, and sports teams and leagues are looking to people with analytical skills to fill key roles in their organizations.   The MIT Sports Analytics conference is a big deal, attracting more than 2000 attendees, with an active job placement service.  The MBAs at my own school (the Tepper School) now have a sports analytics club, with a speaker series, case competition and more (including fun things like fantasy sports competitions), and many of these exceptionally bright and ambitious students are eager for jobs in the sports industry.  While some of this may be due to the success of Moneyball, much more of it is due to the fact that computers and decision making have gotten much, much better in recent years, making analytics a key competitive advantage.  And when you get past dashboards and basic data analysis and visualization, you move into using data to make better decisions.  In other words, you move into operations research.

It is clear that many clubs in Major League Baseball get it.  I see it when talking to people with my local team, the Pittsburgh Pirates (a team that I am sure will break .500 any year now!), and I just got a job announcement that shows that the next closest team to me, the Cleveland Indians, get it too.  They are looking for a VP-Technology, but it is clear that they see this as a job involving decision making, not just infrastructure.  From the ad, the primary purpose is:

The Vice President of Technology is responsible for developing, implementing, measuring and maintaining plans that advance the organization’s achievement of its guiding commitments through enhanced Baseball Operations and business decision-making tools, increased effectiveness of systems, hardware, technology infrastructure and improved fan experience through fan-centric technology implementations.

I love the “decision-making tools” in that description.  Sounds just right for an operations research person who also understands technology.

 

The cutting plane method for matching is polynomial

Michael Mitzenmacher is a computer scientist at Harvard with a blog, My Biased Coin.  As you might expect from the title, Michael works in the area of randomized algorithms, and even has a book on the subject.  His blog is an extremely useful guide to what is happening in algorithms in CS (and what is happening in CS at Harvard, which is also quite interesting).  He often provides a summary of talks given at the big theory conferences (FOCS/STOC/etc.).  He just posted on this year’s FOCS (here and here).

There was one talk that caught my eye, summarized by a doctoral student:

[Editor: Fourth-year grad student Justin Thaler of Harvard contributes a summary of two unrelated talks.]

Paper Title: The Cutting Plane Method is Polynomial for Perfect Matchings.
Harvard’s own Karthekeyan Chandrasekaran talked about joint work with Laszlo A. Vegh and Santosh S. Vempala on cutting plane algorithms for matching problems. The cutting plane method is a popular algorithm for solving integer programs (IPs), used in commercial solvers. It works by starting with an LP relaxation of the given IP to obtain a basic optimal solution x_0, and then iteratively adding constraints that are valid for integer solutions but violated by the basic optimum. It continues until the basic optimum is integral. The goal of this paper is to take a step toward explaining the practical efficiency of cutting plane methods, by giving an efficient cutting-plane algorithm for min-cost perfect matching (MWPM). MWPM is known to be in P, but it was open (apparently for 30 years) whether there was a polynomial-time cutting-plane algorithm for this problem.
A brief summary of how they achieve this is as follows. They start with a natural, well-known LP relaxation of the MWPM problem, called the bipartite relaxation. This relaxation has the nice property that all basic optima x are half-integral, and the support of x is a disjoint union of edges and odd cycles. This makes it easy to find cuts (the cuts correspond to what are called blossom inequalities, see the paper for details). A major challenge, though, is that naively adding cuts will not preserve the half-integrality of intermediate LPs, so at each iteration they throw away some of the old cuts that were added earlier in the execution. They need to take considerable care in choosing which cuts to keep in order to guarantee half-integrality of intermediate LPs (and to ensure that their algorithm makes progress at a sufficiently high rate).
 This is pretty amazing.  First, it is wonderful that they were able to prove polynomiality.  It had bothered me that it seemed you might need an exponential number of cuts, even for something like matching.  I had looked at this 25 years ago when doing my doctorate, but didn’t have any particularly insightful ideas.
But the really amazing thing is that they were able to arrange their algorithm so they never had to work with anything worse than half-integral solutions.  This is astounding!  A bane of cutting plane approaches is the weird fractions that keep popping up, leading to numerical stability problems.  Here, they were able to keep things well under control.  And, by keeping to half-integral, maybe some old ideas I had about using generalized networks (networks with multipliers) might come back into play.   The approach certainly avoids the need for Gomory-Hu cut tree approaches to finding violated inequalities:  violated inequalities come straight out of  connected components.  This also harkens back to my dissertation where I had treated matching as a generalized network with side constraints.
So I checked out the paper that underlies the talk at arXiv, thinking I might try to implement some things this week and see how it works (CS theory people rarely implement:  you could make an entire career out of simply implementing what CS people suggest).  On the plus side, they reference my dissertation, so at least I am in the right ballpark.  On the down side:  it is looking a bit complicated!  Looks like I will have to reel in one of the bright doctoral students around to plow through this with me.
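In the meantime, here is a rough sketch (my own naive version, not the algorithm in the paper) of the kind of cutting-plane loop being studied: solve the bipartite (degree) relaxation with an off-the-shelf LP solver, read violated blossom inequalities off the odd components of the fractional support, add them, and repeat. It keeps every cut it generates, so it does nothing like the paper’s careful cut management, which is exactly what guarantees half-integrality and polynomiality; the function name and the use of scipy’s linprog are just my choices for illustration.

```python
# A naive blossom cutting-plane loop for min-cost perfect matching (a sketch,
# not the Chandrasekaran-Vegh-Vempala algorithm: all generated cuts are kept,
# so intermediate LPs need not stay half-integral).
import itertools
from scipy.optimize import linprog

def min_cost_perfect_matching(n, edges, costs, max_rounds=50):
    """Vertices 0..n-1; edges is a list of (u, v) pairs; costs is parallel to edges."""
    m = len(edges)
    # Degree constraints: every vertex is matched exactly once.
    A_eq = [[1.0 if v in edges[j] else 0.0 for j in range(m)] for v in range(n)]
    b_eq = [1.0] * n
    A_ub, b_ub = [], []                       # blossom inequalities accumulate here
    for _ in range(max_rounds):
        res = linprog(costs, A_ub=A_ub or None, b_ub=b_ub or None,
                      A_eq=A_eq, b_eq=b_eq, bounds=(0, 1), method="highs")
        x = res.x
        frac = [j for j in range(m) if 1e-6 < x[j] < 1 - 1e-6]
        if not frac:                          # integral: we have a matching
            return [edges[j] for j in range(m) if x[j] > 0.5], res.fun
        # Connected components of the fractional support; for a half-integral
        # basic optimum these are odd cycles, each giving a violated blossom cut.
        parent = list(range(n))
        def find(a):
            while parent[a] != a:
                parent[a] = parent[parent[a]]
                a = parent[a]
            return a
        for j in frac:
            u, v = edges[j]
            parent[find(u)] = find(v)
        components = {}
        for v in set(itertools.chain.from_iterable(edges[j] for j in frac)):
            components.setdefault(find(v), set()).add(v)
        added = False
        for S in components.values():
            if len(S) % 2 == 1:               # blossom inequality: x(E(S)) <= (|S|-1)/2
                A_ub.append([1.0 if set(edges[j]) <= S else 0.0 for j in range(m)])
                b_ub.append((len(S) - 1) / 2.0)
                added = True
        if not added:                         # no odd fractional component: this naive separation gives up
            return None, None
    return None, None

# Two triangles joined by an expensive edge: the degree relaxation is half-integral
# (each triangle edge at 1/2), and one round of blossom cuts forces the matching.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
costs = [1, 1, 1, 1, 1, 1, 2]
print(min_cost_perfect_matching(6, edges, costs))
```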
And I wonder what else this might be used for?

Optimizing Angry Birds

Most operations research competitions look at problems that are, frankly, dull.  At least to the non-OR-connoisseur.   Whether it is optimizing computer usage, nurse scheduling, financial portfolio creation, or rescheduling airplanes, there is a certain utilitarianism about operations research competitions.  There are some competitions that have some level of charm (predicting murders in Philadelphia perhaps).  But, to the outsider, most of these competitions sound as exciting as eating one’s vegetables.   Good to do, but rarely filled with anticipation.

I note that our colleagues in artificial intelligence and planning have addressed this issue straight on by developing the Angry Birds Challenge.  If you do not know the Angry Birds game, get yourself down to the airport, sit next to any waiting business person with a tablet, and watch what they are doing:  chances are they are playing Angry Birds. Or ask any eight year old.

With more than a billion downloads, Angry Birds is one of the world’s leading wasters of computer cycles.  The game is popular enough that my wife and I went as game characters last Halloween, as you can see.

To summarize, your goal in Angry Birds is to slingshot birds towards various structures in an attempt to kill pigs lurking within (really!).  Your birds have different capabilities (some might drop bombs, others can split into multiple birds), and it is necessary to use those capabilities to get to all the pigs.

The game is not as mindless as it might appear.  The underlying physics engine is very good, so the collapse of the structures when hit by the birds generally feels right.  Killing the pigs takes foresight, planning, creativity (of a very specialized kind), and sometimes a little luck.  So, a natural question is: can you create a computer system to play angry birds?

For some human time-wasters, computer systems are very easy to create.   It is straightforward to create a sudoku-solver far better than any human.  For Angry Birds, it is not so clear.  There is no obvious brute-force search that has a hope of leading to a solution.  Even the problem representation seems to be a difficult issue.  This looks to be quite a challenging problem!

I don’t see how to use traditional operations research methods for this problem.  But it would be great if we could show that operations research is awesome enough to solve the big problem of our time:  how to get three stars on level 4-14!

 

An Operations Research Nobel?

This entry is a copy of a blog posting I made for the INFORMS Phoenix conference blog.

The Nobel Prize committee has never quite taken to operations research.  If George Dantzig  was not awarded the prize, it is hard to see what our field has to do in order to be recognized.  But many recipients are well-known in our field and their research has drawn from, and inspired, research in operations research.  This year’s Economics award recognizes two  economists, Al Roth and Lloyd Shapley, who are very well known in our field, and have strong ties to INFORMS and its predecessor organizations.

Shapley was recognized by ORSA with the von Neumann Theory Prize in 1981.  Here is part of the citation:

Lloyd Shapley has dominated game theory for the thirty-seven years since von Neumann and Morgenstern published their path-breaking book, The Theory of Games and Economic Behavior. Shapley’s key ideas include his inventions of convex games; stochastic games; the “Shapley” value for cooperative games; and his development of the theory of the core, including his independent discovery of the famous Bondareva-Shapley Theorem about the non-emptiness of the core. He has made important contributions to network flow theory and to non-atomic game theory. His work on the core influenced the development of fixed-point and complementarity theory, and his work on stochastic games influenced the development of dynamic programming. His individual work and his joint research with Martin Shubik has helped build bridges between game theory, economics, political science, and practice.

Roth received the Lanchester Prize in 1990 for the book with coauthor Marilda Sotomayor, Two-Sided Matching: A Study in Game-Theoretic Modeling and Analysis (Cambridge University Press, 1990).  The citation read, in part:

In their book, Alvin Roth and Marilda Oliveira Sotomayor use policy evaluation and empirical observation as a guide to deep mathematical analysis. They demonstrate in precise, insightful detail how game theory in general, and matching markets in particular, evolved into fields that are grounded in strong theory and, at the same time, are quite relevant to real issues of practice. The theory of matching markets, to which the authors have been major contributors, originated with the famous 1962 Gale-Shapley paper, ‘College Admissions and the Stability of Marriage.’

The Prize Page notes that Roth’s comments included the following:

When I received my PhD in Operations Research almost 20 years ago, game theory was as much at home in O.R. as in any other discipline. Today I sit in an economics department, and the situation is very different. Game theory has grown enormously and become the backbone of modern economic theory, but in operations research it seems to be studied and used relatively little. This is an enormous missed opportunity, and one of our hopes for the book is that it should help explain the nature of this opportunity.

I don’t know if Roth’s comments hold today.  Certainly, game theory provides a strong underpinning to what we do, particularly in areas like supply chain and marketing with multiple actors with different objectives. And that area has grown tremendously since Roth’s comments in 1990.  There are more than 200 papers at this conference that have some aspect of game theory in their abstract.  So our field is certainly active in this area.  But there is a tremendous amount of work in the economics literature also.

Congratulations to Roth and Shapley.  It might not be a pure “Operations Research Nobel” but it is pretty darn close.


George Nemhauser and Laurence Wolsey win von Neumann Theory Prize

A twitter report has George Nemhauser of Georgia Tech winning the von Neumann Theory Prize from INFORMS.

 

Turns out that you can confirm this at the online conference program:  George Nemhauser and Laurence Wolsey are this year’s von Neumann Prize winners.  This is great news!  George and Laurence are two of my favorite people in the field.  I know George better than Laurence, since he and I are partners in our small sports scheduling company, so I work with him pretty closely.  And I did take a few courses from him when I was a doctoral student at Georgia Tech a century or two ago.   I have interacted often with Wolsey at conferences, and he was kind enough to host me in Louvain years ago. The book George and Laurence wrote is a classic of integer programming and I am actually stunned they did not receive this award years ago.

But this is my blog, not George or Laurence’s, so let me say two things about me in this context:

  1. George and I have a paper that we wrote back in the late 1990s on scheduling Atlantic Coast Conference basketball.  It is dated now, of course (you can solve this problem in seconds rather than hours), but it remains one of my favorite papers, and I think it was pretty influential in kindling interest in sports scheduling optimization.  I wrote it over a 3 day period while snowed in during a year I spent at MIT.  The published version is almost identical to what I wrote in those three days.  I see it is one of George’s top 15 cited papers, but that probably is not the reason he got the von Neumann.  Note that George’s most cited work is his textbook with Wolsey:  Google Scholar is a bit confused about this.
  2. Last year’s winner of the von Neumann was Gerard Cornuejols, who is my colleague at the Tepper School.  Looks like “closeness to Michael Trick” is a key criterion for the von Neumann, though actually being Michael Trick is too close to get the bonus points.

Assuming the rumor is true, Congratulations George and Laurence!

 

Fischetti Speeds Up Optimization with Randomness

I just got back from a very nice workshop on “Matheuristics” held outside of Rio de Janeiro, Brazil.  Matheuristics is the combination of optimization and heuristics.  For instance, I talked about large scale local search for sports scheduling problems (among other things).  In this approach, portions of a sports schedule are fixed, and you optimize over the rest using integer programming.  This has aspects of both metaheuristics (large scale local search) and optimization (integer programming).

I greatly enjoyed the workshop and thank Celso Ribeiro for organizing it and inviting me.  It was a small workshop, with perhaps 40 participants, with a single track of papers.  For the first time in years I attended every talk at a conference!  Try doing that at INFORMS!

My favorite talk was given by Matteo Fischetti of the University of Padova.  The talk, entitled “Erraticism in tree search” was a real eye opener in terms of my understanding of how branch-and-bound on integer programs performs in practice.  I’m going to paraphrase some of the key points of the talk.  None of the numbers in the following are exactly what Matteo presented, but this is close enough to get the idea across.

Here’s the setup:  Matteo claims to have a new cut for general mixed integer programs.  It is parameterized, so he can actually create 10 different versions of that cut.  He downloaded the MIPLIB 2010 benchmarks and some other benchmarks he found, ran default CPLEX on them, and limited his test to hard, but not insolvable instances (tossing out all instances solved by default CPLEX in under 100 seconds or requiring more than 1000 seconds).  This left a test bed of 38 instances.

When Matteo solved the testbed using each of the ten cuts (separately), he found something amazing:  a single cut often greatly decreased computation time.  The (geometric) average decreases over the testbed relative to default CPLEX for the ten different parameterizations were:

Parameterization    1     2     3     4      5      6     7     8     9    10
Decrease           12%   14%   25%    4%   -0.5%   32%   15%   12%    6%   33%

Wow!  These cuts are great!  Cut 5 had a bit of an increase in time, but it turned out that cut was kind of a dummy.  Cut 5 took a random permutation of the non-negative integer variables and added a constraint Σ xi ≥ -1. This constraint clearly doesn’t do anything: CPLEX throws it away during preprocessing.

But look at Cut 10.  It decreased computation time by 33%.  It is amazing!  Cut 10 took a random permutation of the non-negative integer variables and added a constraint Σ xi ≥ -1. And that seems to work really well!

Hold on…. What’s going on here?  In fact, each of the cuts is simply a random permutation of the variables and a redundant constraint.  How can that have any effect?

CPLEX (like Gurobi or any other solver) has a certain “randomness” in its operation.  This does not mean the final answers are not optimal; rather, the exact operation of the algorithm may depend on things like the ordering of the variables.  So, for instance, if a linear program has multiple optimal solutions, the exact solution reported can depend on which variable is first in the internal ordering of variables.  And, of course, the exact solution reported can then affect the branch and bound tree (since different variables might have fractional values).

If you change the internal ordering of the variables, you can change the operation of the branch and bound algorithm.  Adding the redundant cut changed the internal ordering of the variables, so the timing results could vary.   Not every instance has this aspect.  Some have relatively consistent computation time independent of the ordering.  But some instances vary a lot based on the ordering.

This is insight 1:  branch and bound timings may vary a lot due just to the random choices of the algorithm.

If your computation time is too long, try permuting the variables.  If you see lots of variation in computing time, perhaps it is worth running multiple copies in parallel with different variable orderings (or internal random number seeds) in the hopes that one copy will solve much faster than the others.
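As a concrete, if hypothetical, version of that advice: the experiment is easy to script.  The sketch below assumes Gurobi’s Python interface (its Seed parameter changes the internal randomization; CPLEX has an analogous random-seed parameter), and runtimes_by_seed is just my own wrapper name.  The point is simply to measure how much the solve time moves when nothing about the model changes.

```python
# Solve the same model file under several random seeds and report the spread
# of run times -- a direct way to see the "erraticism" Matteo describes.
# (Sketch assuming gurobipy; substitute your solver's seed parameter.)
import gurobipy as gp

def runtimes_by_seed(model_file, seeds=(1, 2, 3, 4, 5)):
    times = {}
    for s in seeds:
        model = gp.read(model_file)        # fresh copy of the instance
        model.setParam("OutputFlag", 0)    # keep the log quiet
        model.setParam("Seed", s)          # only the internal randomization changes
        model.optimize()
        times[s] = model.Runtime
    return times

# times = runtimes_by_seed("hard_instance.mps")
# A large max/min ratio suggests racing several seeds in parallel is worthwhile.
```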

This makes me wonder how many past papers showing that an approach is useful are simply showing different (good) random draws from the search tree?  And how many good ideas have been discarded due to “bad luck” from the random number generator?

But this still leaves a puzzle:  almost every random ordering is better than default CPLEX.   Does this mean that you should at least randomly generate a variable ordering?  One gentleman at the conference, vibrating with excitement, thought he had the justification for this:

Since most of these instances come from real problems, perhaps there is something about the way people formulate problems that leads to problems with the solution code. As I code, I would normally generate all of one type of variable (say, the “x” variables), then the “y” variables, then the “z” variables.  This must be messing up CPLEX.  What a great insight!

That hyperventilating gentleman was me.  And Matteo quickly and firmly showed that I was wrong.  How could almost all of the orderings do better than default CPLEX?  Look back, the hint is there…..

 

 

 

How did we get our instances?  Let me quote myself:

He downloaded the MIPLIB 2010 benchmarks and some other benchmarks he found, ran default CPLEX on them, and limited his test to hard, but not insolvable instances (tossing out all instances solved by default CPLEX in under 100 seconds or requiring more than 1000 seconds).

In doing this, he biased the sample!  Anything default CPLEX did well on (under 100 seconds), we threw out!  So the sample was made up of things that default CPLEX did not do well on.  It is no wonder that “Cut 1” had an advantage.  If we had run all the instances on “Cut 1” and thrown out anything it did well on, it would turn out that default CPLEX did better than “Cut 1”.  If you toss out instances that an algorithm works well on, don’t be surprised if it does poorly on the rest.

So this is insight 2:  if you want to compare algorithms A and B, make sure the definition of the testbed does not depend on which gets label A and which gets label B.
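A quick simulation makes the size of this effect concrete.  In the sketch below (entirely made-up runtime distributions, my own illustration), algorithms A and B are identical: each instance’s two run times are drawn from the same distribution.  Filtering the testbed on A’s time alone is enough to make B look noticeably faster.

```python
# Selection bias in a computational testbed: A and B are the *same* algorithm,
# but keeping only instances that A found hard (100-1000 seconds) biases the
# comparison in B's favor.
import random
import statistics

random.seed(0)
pairs = []
for _ in range(20000):
    difficulty = random.lognormvariate(4.0, 1.5)          # instance hardness, in seconds
    t_A = difficulty * random.lognormvariate(0.0, 0.5)    # A's time: hardness x noise
    t_B = difficulty * random.lognormvariate(0.0, 0.5)    # B's time: same distribution
    pairs.append((t_A, t_B))

testbed = [(a, b) for a, b in pairs if 100 <= a <= 1000]  # filter on A's time only

geo_A = statistics.geometric_mean([a for a, _ in testbed])
geo_B = statistics.geometric_mean([b for _, b in testbed])
print(len(testbed), "instances kept")
print(f"geometric mean A: {geo_A:6.1f} s")
print(f"geometric mean B: {geo_B:6.1f} s   ({1 - geo_B / geo_A:.0%} 'faster')")
```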

I can’t tell you the number of times I have read computational papers that make the bias mistake.  And I rarely noticed it.  I will now!

Of course, Matteo doesn’t make mistakes like this in his work:  this presentation was an extremely effective way of showing how easy it is to make this error.

I cannot do full justice to Matteo’s talk:  there were many more insights and surprises.  The associated paper (with Michele Monaci) is here.  And if you get a chance to see him give the talk, do so!

After that talk, I will never look at computation in integer programming the same way again.  That presentation alone was worth the flight to Brazil!

Added 9/26:  Matteo has kindly provided his slides, which are available here.

 

Come work at Carnegie Mellon!

I teach (and, now, administrate) in the Tepper School of Business at Carnegie Mellon University.  Unlike most engineering-oriented universities, CMU does not have an Industrial (or Systems) Engineering department, so operations research people are scattered throughout the university.  The Tepper contingent is a big one (senior faculty include Balas, Cornuejols, and Hooker) and is augmented by others in the school with operations research interests (people like Sridhar Tayur, who is in our operations management group).  But there are lots of OR people elsewhere, including Chemical Engineering (Ignacio Grossmann, for instance), Mathematics (Alan Frieze), and Computer Science (Tuomas Sandholm).

The Heinz College of Public Policy and Information Systems has a large contingent of operations researchers, including Al Blumstein, one of the founding researchers in our field.  And the Heinz College is hiring in our area!  Here is the announcement:

Carnegie Mellon University’s Heinz College  is currently expanding its program targeted at analysis, design, and eventual implementation of Urban Systems and is seeking to recruit tenure-track faculty to be key participants in those developments. Heinz College (www.heinz.cmu.edu) consists of a School of Public Policy and Management and a School of Information Systems and Management.

We are seeking individuals from a diversity of related disciplines, including economics, transportation, operations research and the management sciences, statistics, and systems engineering with specific attention to civil engineering.

Our interests in Urban Systems include transportation systems, public-safety systems, and the built environment. While these problem domains are diverse, to be addressed effectively an interdisciplinary perspective is required. We are thus seeking candidates who are open to collaborating with researchers from other disciplines. We have developed relationships with a number of cities, including Pittsburgh and New York, as sources of data, experimental opportunities for improved system designs, and expertise in the realm of problems associated with their operating systems.

All candidates should have a PhD (or should demonstrate that one will be achieved shortly) in relation to one of the identified disciplines and have demonstrated research competence. Interested applicants are asked to send their application to the Urban Systems Search Committee at urban-sys-heinz-college@andrew.cmu.edu. An application should include a cover letter indicating their particular interests and their sense of fit with the Heinz College, a CV, copies of any published papers, and three sources of reference letters.

We will be conducting interviews at the October INFORMS meeting in Phoenix. For consideration at these meetings, applications should be submitted at least one month in advance.

If this sounds like you, please consider applying.  I look forward to having even more operations research on this campus!