I was looking through one of the IFORS publications, *International Abstracts in Operations Research*. I am sure I will write about this more, since I think this is a very nice publication looking for its purpose in the age of Google. This journal publishes the abstracts of any paper in operations research, including papers published in non-OR journals. In doing so, it can be more useful than Google, since there is no need to either limit keywords (“Sports AND Operations Research”) or sift through tons of irrelevant links.

I was scanning through the subject categories of the recent issue of *IAOR* to find papers published in “sports”. I saw something really quite impressive. Can you see what caught my eye?

Sheep! There are five papers about operations research and sheep. They come from different journals, so this was not a “Special Issue on Sheep and Operations Research”. I love being part of a field where, in a single two-month period, there can be five different papers on the application of that field to sheep. There are also two on “cattle” and two on “pigs”. And 10 on “crop yield”!

With only one “sports” paper, perhaps I have backed myself into an unfashionable sub-area.

There is nothing that makes me more enthusiastic about the current state and future prospects of operations research than the Edelman awards. And, as a judge, I get to see all the work that doesn’t even make the finals, much of which is similarly inspiring. Operations Research is having a tremendous effect on the world, and the Edelman Prize papers are just the (very high quality) tip of the iceberg.

I was very pleased when the editors of Optima, the newsletter of the Mathematical Optimization Society, asked me to write about the relationship between optimization and the Edelman Prize. The result is in their current issue, in which the editors also published work by the 2013 winner of the Edelman: work on optimizing dike heights in the Netherlands, a fantastic piece of work that has saved the Netherlands billions in unneeded spending. My article appears on page 6. Here is one extract on why the Edelman is good for the world of optimization:

There are many reasons why those in optimization should be interested in, and should support, the Edelman award.

The first, and perhaps most important, is the visibility the Edelman competition gets within an organization. A traditional part of an Edelman presentation is a video of a company CEO extolling the benefits of the project. While, in many cases, the CEO already knew about the project, this provides a great opportunity to solidify his or her understanding of the role of optimization in the success of the company. With improved understanding comes willingness to further support optimization within the firm, which leads to more investment in the field, which is good for optimization. As a side note, I find it a personal treat to watch CEOs speak of optimization with enthusiasm: they may not truly understand what they mean when they say “Lagrangian-based constrained optimization”, but they can make a very convincing case for it.

Despite the humorous tone, I do believe this is very important: our field needs to be known at the highest levels, and the Edelman assures this happens, at least for the finalists. And, as I make clear in the article: it is not just optimization. This is all of operations research.

There are dozens of great OR projects done each year that end up being submitted for the Edelman Award. I suspect there are hundreds or thousands of equally great projects each year whose teams don’t choose to submit (the nomination is only four pages!). I am hoping for a bumper crop of them to show up in the submissions this year. The due date is not until October, but putting together the first nomination would make a great summer project.


Clustering is difficult only when it does not matter, Amit Daniely, Nati Linial, and Michael Saks, arXiv:1205.4891. […] this represents a move from worst-case complexity towards something more instance-based. The main idea here is that the only hard instances for clustering problems (under traditional worst-case algorithms) are ones in which the input is not actually clustered very well. Their definition of a “good clustering” seems very sensitive to outliers or noisy data, but perhaps that can be a subject for future work.

This paper really hit home for me. I have taught data mining quite often to the MBAs at the Tepper School and clustering is one topic I cover (in fact, research on clustering got me interested in data mining in the first place). I generally cover k-means clustering (easy to explain, nice graphics, pretty intuitive), and note that the clustering you end up with depends on the randomly-generated starting centroids. This is somewhat bothersome until you play with the method for a while and see that, generally, k-means works pretty well and pretty consistently as long as the data actually has a good clustering (with the correct number of clusters). It is only when the data doesn’t cluster well that k-means depends strongly on the starting clusters. This makes the starting centroid issue much less important: if it is important, then you shouldn’t be doing clustering anyway.
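A toy experiment makes the classroom point concrete. The sketch below (plain NumPy, not anything from the course or the paper) implements basic k-means with random starting centroids, runs it from a few dozen random starts on data that genuinely has a good clustering, and keeps the best result: the good clustering stands out clearly, and the best restart keeps re-finding it.

```python
import numpy as np

def kmeans(X, k, seed, iters=100):
    """Plain k-means with randomly chosen starting centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assignment step: each point goes to its nearest centroid
        labels = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
        # update step: move each centroid to the mean of its points
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    labels = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
    return labels, ((X - centroids[labels]) ** 2).sum()

# three tight, well-separated blobs: data with a "good clustering"
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.2, size=(50, 2))
               for c in ((0, 0), (5, 5), (10, 0))])

# a few dozen random restarts; keep the best (lowest-inertia) run
runs = [kmeans(X, k=3, seed=s) for s in range(30)]
best_labels, best_inertia = min(runs, key=lambda r: r[1])
```

Individual restarts can still land in a bad local optimum (two starting centroids in the same blob, say), but on well-clustered data the bad optima have conspicuously worse inertia, so the standard practice of taking the best of several restarts is both cheap and reliable.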

There are other operations research algorithms where I don’t think similar results occur. In my work in practice with integer programming, most practical integer programs turn out to be difficult to solve. There are at least a couple of reasons for this (in addition to the explanation “Trick is bad at solving integer programs”). Most obviously, easy problems typically don’t get the full “operations research” treatment. If the solution to a problem is obvious, it is less likely to require advanced analytics like integer programming.

More subtly, there is a problem-solving dynamic at work. If an instance is easy to solve, then the decision maker will do something to make it harder. Constraints will be tightened (“What if we require at least 3 weeks between consecutive visits instead of just two?”) or details will be added (“Can we add the lunch breaks?”) until the result becomes a challenge to solve. I have not yet had a real-world situation where we exhausted the possibilities to add details to models or to further explore the possible sets of constraints. Eventually, we get to something that we can’t solve in a reasonable amount of time and we back up to something we can (just) solve. So we live on the boundary of what can be done. Fortunately, that boundary gets pushed back every year.

I am sure there is a lot of practical work in operations research that does not have this property. But I don’t think I will wake up one morning to see a preprint: “Integer programming is difficult only when it doesn’t matter”.

At least two of the candidates in Iowa (Rick Santorum and Michele Bachmann) will have visited all 99 counties in that state leading up to the January 3, 2012 caucus date (it is a further quirk of American politics that some states don’t hold elections as such: a caucus is a gathering of neighbors to discuss, debate, and choose their delegates to a state convention, which then chooses among the candidates).

The question Bill asked was “What is the quickest way to visit all 99 counties?”. Actually, Bill asked the question “What is the quickest way to visit the county seats of all 99 counties?”, a somewhat easier problem. This is an instance of the Traveling Salesman Problem, a problem Bill has studied for years. Bill is the primary author of Concorde, clearly the best code for finding optimal Traveling Salesman tours. As Bill explains in the New York Times article, the TSP is formally a hard problem, but particular instances can be solved to optimality:

Leaving Des Moines, we have 98 choices for our second stop, then, for each of these, 97 choices for the third stop, then 96 choices for the fourth stop, and so on until we hit our last county and drive back to Des Moines. Multiplying all of these choices gives the number of possible routes through Iowa. And it is a big number, roughly 9 followed by 153 zeros. No computer in the world could look through such an enormous collection of tours one by one.

The task of choosing the best of the tours is known as the traveling salesman problem, or T.S.P. I work in an applied mathematics field called operations research, which deals with the optimal use of scarce resources, and the T.S.P. is a specialty of mine.

…

This is a tough challenge, but to tour Iowa we do not need a general solution for the problem, we just have to handle our particular 99-city example. We were able to do this using a technique known as the cutting-plane method.
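The count in Bill’s excerpt is easy to verify: fixing Des Moines as the start and end leaves 98! orderings of the remaining county seats, and Python’s arbitrary-precision integers compute that exactly.

```python
import math

# fix Des Moines as start/end; the other 98 county seats can be
# visited in any order, giving 98! distinct tours
tours = math.factorial(98)
print(len(str(tours)))   # 154 digits, i.e. roughly 9 followed by 153 zeros
print(str(tours)[0])     # leading digit is indeed 9
```

(Purists may divide by two, since a tour and its reversal have the same length; that changes nothing about the hopelessness of enumeration.)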

The Iowa instance is, presumably, not much of a challenge for Concorde: that code has found solutions to TSP instances with tens of thousands of cities. I am sure that most of the work, which Bill did together with Alain Kornhauser and Robert Vanderbei of Princeton, was in determining the distances between the 99 county seats. The actual route optimization would be a matter of seconds or less. The result is a 55.5 hour tour, traveling at legal speed limits.

If you would like to experiment with TSPs of your own, the excellent NEOS solvers allow you to upload your own instances and run Concorde on them.

Stay tuned for more from Bill Cook as we count down to the publication of his book *In Pursuit of the Traveling Salesman*, to be published in January. Trust me, you are going to like the book, so show some support for Bill by liking the Facebook page!


Land development often results in a reduction and fragmentation of natural habitat, which makes wildlife populations more vulnerable to local extinction. One method for alleviating the negative impact of land fragmentation is the creation of conservation corridors, which are continuous areas of protected land that link zones of biological significance.

My colleague, Willem van Hoeve, had worked on variants of this problem and had some nice data for the students to work with. The models were interesting in their own right, with the “contiguity” constraints causing the most challenge to the students. The resulting corridors were much cheaper (by a factor of 10) than the estimates of the cost necessary to support the wildlife. The students did a great job (as Tepper MBA students generally do!) using AIMMS to model and find solutions (are there other MBA students who come out knowing AIMMS? Not many, I would bet!).

But I was left with a big worry. The goal here was to find corridors linking “safe” regions for the grizzlies. But what keeps the grizzlies in the corridors? If you check out the diagram (not from the student project but from a research paper by van Hoeve and his coauthors), you will see the safe areas in green, connected by thin brown lines representing the corridors. It does seem that any self-respecting grizzly would say: “Hmmm… I could walk 300 miles along this trail they have made for me, or go cross country and save a few miles.” The fact that the cross-country trip goes straight through, say, Bozeman, Montana, would be unfortunate for the grizzly (and perhaps the Bozemanians). But perhaps the corridors could be made appealing enough to keep the grizzlies off the interstates.
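To give a flavor of the underlying model (this is emphatically not the students’ AIMMS formulation, which handled the contiguity constraints in an integer program over many reserves): the simplest version of “buy the cheapest connected strip of land linking two reserves” is just a shortest-path computation on a grid of land-acquisition costs, sketched here with made-up numbers.

```python
import heapq

def cheapest_corridor(cost, start, goal):
    """Dijkstra on a grid: cost[r][c] is the (hypothetical) price of
    protecting cell (r, c); returns the cheapest connected strip of
    cells linking the two reserves, endpoints included."""
    rows, cols = len(cost), len(cost[0])
    dist = {start: cost[start[0]][start[1]]}   # pay for the start cell too
    prev = {}
    pq = [(dist[start], start)]
    while pq:
        d, (r, c) = heapq.heappop(pq)
        if (r, c) == goal:
            break
        if d > dist[(r, c)]:
            continue                            # stale queue entry
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(pq, (nd, (nr, nc)))
    # walk back from the goal to recover the corridor cells
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return list(reversed(path)), dist[goal]

# toy landscape: cheap land costs 1, a town in the middle costs 50
grid = [[1,  1,  1,  1, 1],
        [1, 50, 50, 50, 1],
        [1, 50, 50, 50, 1],
        [1,  1,  1,  1, 1]]
path, total = cheapest_corridor(grid, (0, 0), (3, 4))
```

The corridor duly routes around the expensive town, which is exactly the behavior that worried me: the model keeps the corridor out of Bozeman, but nothing in it keeps the grizzly out.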

I thought of this problem as I was planning a trip to China (which I am taking at the end of November). After seeing a picture of a ridiculously cute panda cub (not this one, but similarly cute), my seven-year-old declared that we had to see the pandas. And, while there are pandas in the Beijing zoo, it was necessary to see the pandas in the picture, who turned out to be from Chengdu. So my son was set on going to Chengdu. OK, fine with me: Shanghai is pretty expensive so taking a few days elsewhere works for me.

As I explored the panda site, I found some of the research projects they are exploring. And one was perfect for me!

Construction and Optimization of the Chengdu Research Base of Giant Panda Breeding Ecological System

I was made for this project! First, I have the experience of watching my students work on grizzly ecosystems (hey, I am a professor: seeing a student do something is practically as good as doing it myself). Second, and more importantly, I have extensive experience in bamboo, which is, of course, the main food of the panda. My wife and I planted a “non-creeping” bamboo plant three years ago, and I have spent two years trying to exterminate it from our backyard without resorting to napalm. I was deep into negotiations to import a panda into Pittsburgh to eat the cursed plant before I finally seemed to gain the upper hand on the bamboo. But I fully expect the plant to reappear every time we go away for a weekend.

Between my operations research knowledge and my battle-scars in the Great Bamboo Battle, I can’t think of anyone better to design Panda Ecological Systems. So, watch out “Chengdu Research Base on Giant Panda Breeding”: I am headed your way and I am ready to optimize your environment. And if I sneak away with a panda, rest assured that I can feed it well in Pittsburgh.

*This is part of the INFORMS September Blog Challenge on operations research and the environment.*

Doing the course assignment and scheduling has been eye-opening, and a little worrisome. Just as I worry at the beginning of the season for every sports league I schedule (“Why are there three teams in Cleveland this weekend?”), I worried over the beginning of the fall term as the first of my assignments rolled out. Would all the faculty show up? Would exactly one faculty member show up for each course? Oh, except for our three co-taught courses. And… etc., etc.

It turns out there is one issue I hadn’t thought of, though fortunately it didn’t affect me. From the University of Pennsylvania (AP coverage based on the Under the Button blog entry):

PHILADELPHIA (AP) — University of Pennsylvania students who were puzzled by a no-show professor later found out why he missed the first day of class: He died months ago.

The students were waiting for Henry Teune (TOO’-nee) to teach a political science class at the Ivy League school in Philadelphia on Sept. 13.

University officials say that about an hour after the class’s start time, an administrator notified students by email that Teune had died. The email apologized for not having canceled the class.

I hadn’t thought to check on the life status of the faculty. I guess I will add “Read obituaries” to my to-do list.


Hagrid: “I’d like ter see a great Muggle like you stop him.”

Harry: “A what?”

Hagrid: “A Muggle. It’s what we call non-magic folk like them. An’ it’s your bad luck you grew up in a family o’ the biggest Muggles I ever laid eyes on.”

It is a little hard to see anything positive about the word “Muggle” in this exchange! Muggles are often willfully blind to the magic that goes on around them (though sometimes an Obliviator or two can help Muggles forget something a little too spectacular), and are generally far less interesting than the magical folk.

But Ms. Rowling is pretty even-handed in her treatment of the magical world/non-magical world divide. Just as non-magical folk have no understanding of Quidditch, dementors, Patronus charms, and the rest, the magical world is equally confused about the non-magical world:

Arthur Weasley: “What exactly is a rubber duckie for?”

The definition of a Muggle depends on where you stand!

Now what does this have to do with operations research (other than being the theme of this month’s INFORMS Blog Challenge, of which this article forms my entry)? A wonderful aspect of working in operations research, particularly on the practical side of the field, is that you both work with Muggles and get to be a Muggle.

Working with Muggles is pretty obvious. We in operations research have these magical powers to do incredible feats. Have a problem with millions of decisions to make? Easy! Well, easy as long as we can assume linear objective and constraints, and that the data is known and fixed, and… (magic even in Harry Potter’s world has limitations). But for the Muggles to believe our results, we do have to spend time explaining what we do and the assumptions we work with. So we trot out some simple examples, like the traveling salesman problem (“Consider a traveling salesman who has to visit the cities on this napkin”: don’t laugh! That is the first real conversation I had with the woman who eventually decided to marry me!). And we explain why the problem is hard (“Consider how many tours there are”). And sometimes we convince the whole world of the difficulty so well that they don’t listen to the next part: “Despite the difficulty, we can really solve these models”. There are whole swathes of the world, including, it seems, much of computer science, that believe 30-city traveling salesman instances are a true challenge, requiring an immediate application of approximation algorithms or heuristic methods. But then we solve interesting problems, and the Muggles begin to believe. And that is very satisfying.

But it gets even better when we in operations research get to be the Muggles. And this happens a lot on the practical side of operations research because we get to work with a lot of very smart and very interesting people outside of our field. A few years ago, I worked with the United States Postal Service to redesign its processing and distribution network. I know a lot about optimization and models and algorithms. About all I knew about postal delivery was that a very friendly guy comes by practically every day around 11, and he prefers if I park my car back about two feet so he can cut through the driveway easier. Turns out there is a heck of a lot more to the Postal Service than Postman Pete and his walk through our neighborhood. So I got to be the Muggle and to learn about the issues in getting mail from one side of the country to the other in a reasonably timely fashion. There is “First Class Mail” and “Third Class Mail”, but no “Second Class Mail”. Why? Well, that’s quite a story, let me tell you! And after a few months, I felt that I had passed my first class in the Magic of Mail, but was nowhere near losing my Muggle designation. But I knew enough to create a few models, and I could explain them to the Wizards of the Mail, and they could correct my Muggle understanding of mail processing (“No, no, no: a flat could never be handled in that way: that is only for Third Class, silly!”). And eventually we arrived at models that did a pretty good job of representing the mail system. I was a bit less of a Muggle about mail, and they were a bit less Muggley about operations research.

Over the years, I have started as a Muggle about cell-phone production, sports scheduling, voting systems, and a number of other areas. And I got to read about these areas, and talk to smart people about issues, and, eventually, become, if not a Wizard, then at least a competent student of these areas.

Some fields are, by their nature, inward looking. The best operations research is not, and that is a true pleasure of the field.

While I love the picture of an eager seller (“I have just two books for sale, but, man, if I sell one, I am set for life!”), the explanation is much more mundane, at least in some cases. As the “it is NOT junk” blog shows, it is clear that two sellers of the book *The Making of a Fly* (a snip at a mere $23 million) were setting their prices relative to each other’s. Seller A set its price equal to 0.99830 times that of seller B; B set its price equal to 1.27059 times that of A. The net effect: every day the price of the book went up by a factor of 1.26843. Do this for a few months, and you get prices in the millions.
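The arithmetic of the spiral is easy to reproduce. The two rule coefficients below are the ones reported on the “it is NOT junk” blog; the starting price is a hypothetical list price I made up for illustration.

```python
# the two sellers' reported pricing rules
A_RULE = 0.99830   # seller A: just undercut seller B
B_RULE = 1.27059   # seller B: charge a premium over seller A

price_A = 18.95              # hypothetical starting list price
price_B = price_A / A_RULE   # consistent starting price for B
days = 0
while price_A < 23_000_000:
    price_A = A_RULE * price_B   # A re-prices against B
    price_B = B_RULE * price_A   # then B re-prices against A
    days += 1

daily_growth = A_RULE * B_RULE   # compound factor per daily round
print(round(daily_growth, 5))    # 1.26843, matching the blog's figure
print(days)                      # a couple of months is all it takes
```

At a hair under 27% compound growth per day, it takes roughly two months to climb from a normal book price to $23 million, which matches the timeline observed on Amazon.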

This sort of market-driven pricing is not unreasonable. Sellers with good reputations are able to command higher prices (see, for instance, the paper by Ghose, Ipeirotis, and Sundararajan on “Reputation Premiums in Electronic Peer-to-Peer Markets” for results and references). A firm might well feel that its reputation is worth a premium of 27.059%. Another firm might adopt the alternative strategy of just undercutting the competition by, say, 0.17%. Everything works fine until they become the only two firms in a market. Then the exponential growth in prices appears, since there is no longer a real “market” to anchor their differentials.

Such an issue would be nothing more than an amusing sideline if it weren’t for the effect such algorithmic prices can have on more important issues than obscure used books. The “flash crash” of the stock market in 2010 appears to have been caused by the interaction between one large sale and automated systems that executed trades based on trading volumes, not price. As the SEC report states:

“… under stressed market conditions, the automated execution of a large sell order can trigger extreme price movements, especially if the automated execution algorithm does not take prices into account. Moreover, the interaction between automated execution programs and algorithmic trading strategies can quickly erode liquidity and result in disorderly markets.”

Pricing based on market signals abounds. At the recent Edelman competition, one of the finalists (InterContinental Hotels Group) discussed a price-setting mechanism that had, as one of its inputs, competing prices in the area. Fortunately, they had a “human in the loop” who prevented spiraling prices of the form seen at Amazon.

In the rush to be quick, there is great pressure to move to automated pricing. Until we create systems that are more robust to unforeseen situations, we risk not just $900,000,000 books, but all sorts of “transient” effects when systems spin out of control. And these effects can cause tremendous damage in a short period of time.

The relationship between operations research and sports is one topic that I return to often on this site. This is not surprising: I am co-owner of a small sports scheduling company that provides schedules to Major League Baseball and their umpires, to many college-level conferences, and even to my local kids’ soccer league. Sports has also been a big part of my research career. Checking my vita, I see that about 30% of my journal or book chapter papers are on sports or games, and almost 50% of my competitive conference publications are in those fields. Twenty years ago, my advisor, Don Ratliff, when looking over my somewhat eclectic vita at the time (everything from polymatroidal flows to voting systems to optimization implementation), told me that while it was great to work in lots of fields, it is important to be *known* for something. To the extent that I am known for something at this point, it is either for online stuff like this blog and or-exchange, or for my part in the great increase of operations research in sports, and sports scheduling in particular.

This started, as most things in life often do, by accident. I was talking to one of my MBA students after class (I was younger then, and childless, so I generally took my class out for drinks a couple times a semester after class) and it turned out he worked for the Pittsburgh Pirates (the local baseball team). We started discussing how the baseball schedule was created, and I mentioned that I thought the operations research techniques I was teaching (like integer programming) might be useful in creating the schedule. Next thing I know, I get a call from Doug Bureman, who had recently worked for the Pirates and was embarking on a consulting career. Doug knew a lot about what Major League Baseball might look for in a schedule, and thought we would make a good team in putting together a schedule. That was in 1996. It took until 2005 for MLB to accept one of our schedules for play. Why the wait? It turned out that the incumbent schedulers, Henry and Holly Stephenson, were very good at what they did. And, at the time, the understanding of how to create good schedules didn’t go much beyond work on minimizing breaks (consecutive home games or away games) in schedules, work done by de Werra and a few others. Over the decade from 1996-2005, we learned things about what does work and what doesn’t work in sports scheduling, so we got better on the algorithmic side. But even more important was the vast increase in speed in solving linear and integer programs. Between improvements in codes like CPLEX and increases in the speed of computers, my models were solving millions of times faster in 2005 than they did in 1996. So finally we were able to create very good schedules quickly and predictably.
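The break-minimization literature of that era builds on round-robin timetables, and the classic “circle method” for constructing one is simple enough to sketch in a few lines. (This is a textbook construction, not one of our MLB models, which are integer programs layered with many more constraints.)

```python
def circle_method(n):
    """Single round-robin for n teams (n even) by the classic 'circle'
    construction: fix one team, rotate the rest.  Produces n-1 rounds in
    which every team plays, and every pair of teams meets exactly once."""
    assert n % 2 == 0, "add a dummy team (byes) if n is odd"
    teams = list(range(n))
    rounds = []
    for _ in range(n - 1):
        # pair off the ends of the current ordering
        rounds.append([(teams[i], teams[n - 1 - i]) for i in range(n // 2)])
        # rotate every team except the fixed first one
        teams = [teams[0]] + [teams[-1]] + teams[1:-1]
    return rounds

schedule = circle_method(10)   # 9 rounds of 5 games each
```

Assigning home/away venues to these pairings with as few breaks as possible is exactly the problem de Werra studied; the construction above is the skeleton that those venue-assignment results hang on.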

In those intervening years, I didn’t spend all of my time on Major League Baseball of course. I hooked up with George Nemhauser, and we scheduled Atlantic Coast Conference basketball for years. George and I co-advised a great doctoral student, Kelly Easton, who worked with us after graduation and began doing more and more scheduling, particularly after we combined the baseball activities (with Doug) and the college stuff (with George).

After fifteen years, I still find the area of sports scheduling fascinating. Patricia Randall, in a recent blog post (part of the INFORMS Monthly Blog Challenge, as is this post), addressed the question of why sports is such a popular area of application. She points to the way many of us know at least something about sports:

I think the answer lies in the accessibility of the data and results of a sports application of OR. Often only a handful of people know enough about an OR problem to be able to fully understand the problem’s data and judge the quality of potential solutions. For instance, in an airline’s crew scheduling problem, few people may be able to look at a sequence of flights and immediately realize the sequence won’t work because it exceeds the crew’s available duty hours or the plane’s fuel capacity. The group of people who do have this expertise are probably heavily involved in the airline industry. It’s unlikely that an outsider could come in and immediately understand the intricacies of the problem and its solution.

But many people, of all ages and occupations, are sports fans. They are familiar with the rules of various sports, the teams that comprise a professional league, and the major players or superstars. This working knowledge of sports makes it easier to understand the data that would go into an optimization model as well as analyze the solutions it produces.

I agree that this is a big reason for popularity. When I give a sports scheduling talk, I know I can simply put up the schedule of the local team, and the audience will be immediately engaged and interested in how it was put together. In fact, the hard part is to get people to stop talking about the schedule so I can get on talking about Benders’ approaches or large scale local search or whatever is the real content of my talk.

But let me add to Patricia’s comments: there are lots of reasons why sports is so popular in OR (or at least for me).

First, we shouldn’t ignore the fact that sports is big business. Forbes puts the value of the teams of Major League Baseball to be over $15 billion, with the Yankees alone worth $1.7 billion. With values like that, it is not surprising that there is interest in using data to make better decisions. Lots of sports leagues around the world also have high economic effects, making the overall sports economy a significant part of the overall economy.

Second, there are a tremendous number of issues in sports, making it applicable and of interest to a wide variety of researchers. I do essentially all my work in scheduling, but there are lots of other areas of research. If you check out the MIT Sports Analytics conference, you can see the range of topics covered. By covering statistics, optimization, marketing, economics, competition and lots of other areas, sports can attract interest from a variety of perspectives, making it richer and more interesting.

A third reason that sports has a strong appeal, at least in my subarea of scheduling, is the close match between what can be solved and what needs to be solved. For some problems, we can solve far larger instances than would routinely occur in practice. An example of this might be the Traveling Salesman Problem. Are there real instances of the TSP that people want to solve to optimality that cannot be solved by Concorde? We have advanced so far in solving the problem that the vast majority of practical applications are now handled. Conversely, there are problems where our ability to solve is dwarfed by the size of problem that occurs in practice. We would like to understand, say, optimal poker play for Texas Hold’em (a game where each player works with seven cards, five of them in common with other players). Current research is on Rhode Island Hold’em, where there are three cards and strong limitations on betting strategy. We are a long way from optimal poker play.

Sports scheduling is right in the middle. A decade ago, my coauthors and I created a problem called the Traveling Tournament Problem. This problem abstracts out the key issues of baseball scheduling but provides instances of any size. The current state of the art can solve the 10-team instances to optimality, but cannot solve the 12-team instances. There are lots of sports scheduling problems where 10-20 teams is challenging. Many real sports leagues are, of course, also in the 10-20 team range. This confluence of theoretical challenge and practical interest clearly adds to the research enthusiasm in the area.
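For a flavor of what makes the TTP hard: beyond minimizing travel, its two signature side constraints limit each team to at most three consecutive home or away games and forbid immediate rematches. Checking those constraints for one team is trivial (the hypothetical helper below is just an illustration, not code from the paper); the difficulty is finding a full timetable and venue pattern that satisfies them for every team at once while minimizing total travel.

```python
def ttp_feasible(opponents, venues, max_streak=3):
    """Check one team's schedule against the TTP side constraints:
    opponents[r] is the opponent in round r, venues[r] is 'H' or 'A'.
    Returns False on a home/away streak longer than max_streak, or on
    an immediate rematch against the same opponent."""
    streak = 1
    for r in range(1, len(venues)):
        streak = streak + 1 if venues[r] == venues[r - 1] else 1
        if streak > max_streak:
            return False                       # too many in a row
        if opponents[r] == opponents[r - 1]:
            return False                       # no-repeat violated
    return True

print(ttp_feasible("BCDBC", "HHAAH"))   # True: streaks of 2, no rematches
print(ttp_feasible("BCDEF", "HHHHA"))   # False: four home games in a row
print(ttp_feasible("BBCDE", "HAAHA"))   # False: immediate rematch with B
```

Constraints this easy to state and check, coupled with instances that stall the best solvers at 12 teams, are exactly the combination that keeps researchers coming back.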

Finally, there is an immediacy and directness to sports scheduling that makes it personally rewarding. In much of what I do, waiting is a big aspect: I need to wait a year or two for a paper to be accepted, or for a research agenda to come to fruition. It is gratifying to see people play sports, whether it is my son in his kids’ soccer game or Derek Jeter in Yankee Stadium, and know not only that they are there because programs on my computer told them to be, but that the time from scheduling to play is measured in months or weeks.

*This entry is part of the March INFORMS Blog Challenge on Operations Research and Sports*.