
Touring the Rio Olympics

I’m a sportswriter who is bound for the Olympics, and I’m writing an article about maximizing the number of events I see in a particular day.

I thought the input of a mathematician with expertise in these matters would be helpful to the story.

This was an interesting start to a day a month ago.  I received this email from New York Times reporter Victor Mather via University of Waterloo prof Bill Cook.  Victor had contacted Bill since Bill is the world’s foremost expert on the Traveling Salesman Problem (he has a must-read book aimed at the general public called In Pursuit of the Traveling Salesman: Mathematics at the Limits of Computation).  The TSP involves visiting a number of sites while traveling the minimum distance.  While Victor’s task has some aspects of the TSP, it also has scheduling aspects of the kind I work on, so Bill recommended me (thanks Bill!).

Victor’s plan was to visit as many Olympic events in Rio as he could in a single day.  This was part of a competition with another reporter, Sarah Lyall.  Victor decided to optimize his day; Sarah would simply go where her heart took her.  Who would see more events?

Victor initially just wanted to talk about how optimization would work (or at least he made it seem so) but I knew from the first moment that I would be optimizing his day if he would let me.  He let me.  He agreed to get me data, and I would formulate and run a model that would find his best schedule.

On the day they chose (Monday, August 15), there were 20 events being held.  The map shows that the venues are spread out across Rio (a dense, somewhat chaotic city not known for its transportation infrastructure), so we would have to worry about travel distance.  Victor provided an initial distance matrix, with times in minutes (OLY times).  As we will see, this matrix must have been created by someone who had a few too many caipirinhas on Copacabana Beach: it did not exactly match reality.

So far the problem looks like a TSP: just minimize the distance needed to see 20 sites.  A TSP on 20 sites is pretty trivial (see a previous blog post on how hard TSPs are to solve in practice), so simply seeing the sites would be quite straightforward.  However, not surprisingly, Victor actually wanted to see the events while they were happening.  For that, we needed a schedule of the events, which Victor promptly provided:

Badminton: 8:30 am – 10:30 am // 5:30 pm – 7:00 pm
Basketball: 2:15 pm – 4:15 pm // 7:00 pm – 9:00 pm // 10:30 pm – midnight
Beach volleyball: 4:00 pm – 6:00 pm // 11:00 pm – 1:00 am
Boxing: 11:00 am – 1:30 pm // 5:00 pm – 7:30 pm
Canoeing: 9:00 am – 10:30 am
Cycling: 10:00 am – 11:00 am // 4:00 pm – 5:30 pm
Diving: 3:15 pm – 4:30 pm
Equestrian: 10:00 am – 11:00 am
Field hockey: 10:00 am – 11:30 am // 12:30 pm – 2:00 pm // 6:00 pm – 7:30 pm // 8:30 pm – 10:00 pm
Gymnastics: 2:00 pm – 3:00 pm
Handball: 9:30 am – 11:00 am // 11:30 am – 1:00 pm // 2:40 pm – 4:00 pm // 4:40 pm – 6:00 pm // 7:50 pm – 9:10 pm // 9:50 pm – 11:10 pm
Open water swimming: 9:00 am – 10:00 am
Sailing: 1:00 pm – 2:00 pm
Synchronized swimming: 11:00 am – 12:30 pm
Table tennis: 10:00 am – 11:00 am // 3:00 pm – 4:30 pm // 7:30 pm – 9:00 pm
Track: 9:30 am – 11:30 am // 8:15 pm – 10:00 pm
Volleyball: 9:30 am – 10:30 am // 11:30 am – 12:30 pm // 3:00 pm – 4:00 pm // 5:00 pm – 6:00 pm // 8:30 pm – 9:30 pm // 10:30 pm – 11:30 pm
Water polo: 2:10 pm – 3:00 pm // 3:30 pm – 4:20 pm // 6:20 pm – 7:10 pm // 7:40 pm – 8:30 pm
Weightlifting: 3:30 pm – 4:30 pm // 7:00 pm – 8:00 pm
Wrestling: 10:00 am – noon // 4:00 pm – 6:00 pm

Many events have multiple sessions:  Victor only had to see one session at each event, and decided that staying for 15 minutes was enough to declare that he had “seen” the event.  It is these “time windows” that make the problem hard(er).

With the data, I promptly sat down and modeled the problem as a mixed-integer program.  I had variables for the order in which the events were seen, the time of arrival at each event, and the session seen for each event (the first, second, third, and so on).  There were constraints to force the result to be a path through the sites (we didn’t worry about where Victor started or ended: travel to and from his hotel was not a concern) and constraints to ensure that when he showed up at an event, there was sufficient time for him to see it before moving on.

The objective was primarily to see as many events as possible.   With this data, it is possible to see all 20 events.  At this point, if you would like to see the value of optimization, you might give it a try:  can you see all 20 events just by looking at the data?  I can’t!

But there may be many ways to see all the events.  So, secondarily, I had the optimization system minimize the distance traveled.  The hope was that spending less time on the buses between events would result in a more robust schedule.
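For readers who want to see the shape of such a model, here is a minimal sketch in Python using PuLP.  It is not the AIMMS model I actually used: the events, session times, and travel times are invented, the secondary distance objective is dropped, and the big-M sequencing constraints are only one way of expressing the timing logic described above.

```python
# A minimal sketch of this kind of model (not the actual AIMMS model).
# Pick at most one session per event, stay 15 minutes, and respect travel
# times between consecutive events.  All data below are invented.
from pulp import LpProblem, LpMaximize, LpVariable, LpBinary, lpSum, value

STAY = 15          # minutes needed to "see" an event
M = 24 * 60        # big-M: one day in minutes

# sessions[event] = list of (start, end) in minutes after midnight
sessions = {
    "canoe":   [(540, 630)],               # 9:00-10:30
    "cycling": [(600, 660), (960, 1050)],  # 10:00-11:00, 16:00-17:30
    "sailing": [(780, 840)],               # 13:00-14:00
}
travel = {("canoe", "cycling"): 30, ("canoe", "sailing"): 45, ("cycling", "sailing"): 40}
tt = lambda i, j: travel.get((i, j), travel.get((j, i)))
events = list(sessions)

prob = LpProblem("olympic_day", LpMaximize)
x = {(i, k): LpVariable(f"x_{i}_{k}", cat=LpBinary)            # attend session k of event i
     for i in events for k in range(len(sessions[i]))}
v = {i: LpVariable(f"v_{i}", cat=LpBinary) for i in events}    # event i is seen at all
a = {i: LpVariable(f"a_{i}", lowBound=0, upBound=M) for i in events}   # arrival time
y = {(i, j): LpVariable(f"y_{i}_{j}", cat=LpBinary)            # i is visited before j
     for i in events for j in events if i != j}

prob += lpSum(v.values())                                      # maximize events seen

for i in events:
    prob += lpSum(x[i, k] for k in range(len(sessions[i]))) == v[i]
    for k, (start, end) in enumerate(sessions[i]):
        # if session k is chosen, the 15-minute visit must fit inside its window
        prob += a[i] >= start - M * (1 - x[i, k])
        prob += a[i] + STAY <= end + M * (1 - x[i, k])

for i in events:
    for j in events:
        if i < j:      # if both events are seen, one must precede the other
            prob += y[i, j] + y[j, i] >= v[i] + v[j] - 1
        if i != j:     # stay plus travel time when i precedes j
            prob += a[j] >= a[i] + STAY + tt(i, j) - M * (1 - y[i, j])

prob.solve()
for t, i in sorted((value(a[i]), i) for i in events if value(v[i]) > 0.5):
    print(f"{i:10s} arrive {int(t) // 60:02d}:{int(t) % 60:02d}")
```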

I put my model in the amazing modeling system AIMMS.  AIMMS lets me input data, variables, constraints, and objectives incredibly quickly and intuitively, so I was able to get a working system together in a couple of hours (most of which was just mucking about with the data).  AIMMS then generates the integer program, which is sent to Gurobi (very fast and effective mixed-integer programming software), and an hour later I had a schedule.

Event    Arrival    Travel to next (min)
canoe 9:00 20
open swimming 9:35 45
equestrian 10:35 35
wrestling 11:25 10
synchro 12:15 30
sailing 13:00 30
gymnastics 14:00 10
handball 14:40 10
basketball 15:05 10
diving 15:30 10
cycling 17:00 30
beach volleyball 17:45 30
badminton 18:40 5
boxing 19:00 5
weightlifting 19:20 5
table tennis 19:40 10
water polo 20:05 35
hockey 20:55 35
track 21:45 20
volley 22:30
end 22:45

This schedule has 385 minutes on the bus.

I discussed this problem with my colleague Willem van Hoeve, who spent about five minutes implementing a constraint programming model (also in AIMMS) to confirm this schedule.  He had access to a newer version of the software, which could find and prove optimality within minutes.  Unfortunately, my version of the software could not prove optimality overnight, so I stuck with my MIP model through the rest of the process (feeling as though I was driving a Model T while an F1 racecar sat just up the hallway).  CP looks to be the right way to go about this problem.
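For the curious, here is a rough sketch of how the same toy problem from the sketch above looks in a constraint programming style, using Google's CP-SAT solver rather than the AIMMS-based CP model Willem actually built.  The data are again invented; the point is only to show how naturally CP expresses the "either i comes before j or j comes before i" logic.

```python
# A rough CP sketch of the same toy problem (not Willem's AIMMS CP model).
from ortools.sat.python import cp_model

STAY, HORIZON = 15, 24 * 60
sessions = {"canoe": [(540, 630)], "cycling": [(600, 660), (960, 1050)], "sailing": [(780, 840)]}
travel = {("canoe", "cycling"): 30, ("canoe", "sailing"): 45, ("cycling", "sailing"): 40}
tt = lambda i, j: travel.get((i, j), travel.get((j, i)))
events = list(sessions)

m = cp_model.CpModel()
seen = {i: m.NewBoolVar(f"seen_{i}") for i in events}
arrive = {i: m.NewIntVar(0, HORIZON, f"arrive_{i}") for i in events}

for i, sess in sessions.items():
    picks = []
    for k, (start, end) in enumerate(sess):
        pick = m.NewBoolVar(f"pick_{i}_{k}")
        # if this session is picked, the 15-minute visit fits inside its window
        m.Add(arrive[i] >= start).OnlyEnforceIf(pick)
        m.Add(arrive[i] + STAY <= end).OnlyEnforceIf(pick)
        picks.append(pick)
    m.Add(sum(picks) == 1).OnlyEnforceIf(seen[i])
    m.Add(sum(picks) == 0).OnlyEnforceIf(seen[i].Not())

for i in events:
    for j in events:
        if i < j:
            before = m.NewBoolVar(f"{i}_before_{j}")
            # when both events are seen, one visit plus travel must precede the other
            m.Add(arrive[j] >= arrive[i] + STAY + tt(i, j)).OnlyEnforceIf([seen[i], seen[j], before])
            m.Add(arrive[i] >= arrive[j] + STAY + tt(i, j)).OnlyEnforceIf([seen[i], seen[j], before.Not()])

m.Maximize(sum(seen.values()))
solver = cp_model.CpSolver()
if solver.Solve(m) in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    for i in sorted(events, key=lambda e: solver.Value(arrive[e])):
        if solver.Value(seen[i]):
            t = solver.Value(arrive[i])
            print(f"{i:10s} arrive {t // 60:02d}:{t % 60:02d}")
```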

I spent some time playing around with these models to see whether I could get something that would provide Victor with more flexibility.  For instance, could he stay for 20 minutes at each venue?  No: the schedule is too tight for that.  So that was the planned schedule.

But then Victor got to Rio and realized that the planned transportation times were ludicrously off.  There was no way he could make some of those trips in the time planned.  So he provided another transportation time matrix (olympics3) with longer times.  Unfortunately, with the new times, he could not plan for all 20: the best solution only allows for seeing 19 events.

Event    Arrival    Travel to next (min)    Slack
canoe 9:00 30
track 9:45 45
equestrian 10:45 35
synchro 11:35 60 0:10
sailing 13:00 60
gymnastics 14:15 15
water polo 14:45 15
diving 15:15 15 0:25
cycling 16:10 15
handball 16:40 15
wrestling 17:10 25
boxing 17:50 15
badminton 18:20 15 0:10
weightlifting 19:00 15 0:35
table tennis 20:05 25
basketball 20:45 35 0:10
hockey 21:45 45 0:30
volley 23:15 30
beach volleyball 0:00

So that is the schedule Victor started with.  I certainly had some worries.  Foremost was the travel uncertainty.  Trying to minimize time on the bus is a good start, but we did not handle uncertainty as deeply as we could have.  Handling it more thoroughly, though, would have led to a more complicated set of rules, and it seemed that a single schedule would be more in keeping with the challenge.  The morning looked especially risky, so I suggested that Victor be certain to get to gymnastics no later than 2:15 in order to take advantage of the long stretch of close events that follow.  Sailing, in particular, looked pretty risky to try to get to.

I had other worries:  for instance, what if Victor arrived during half-time of an event, or between two games in a single session?  Victor probably wouldn’t count that as seeing the event, but I did not have the detailed data (nor an idea of the accuracy of such data if it did exist), so we had to hope for the best.  I did suggest to Victor that he try to keep to the ordering, leaving an event as soon as 15 minutes are up, hoping to get a bit more slack when travel times worked out better than planned (as-if!).

So what happened?  Victor’s full story is here and it makes great reading (my school’s PR guy says Victor is “both a reporter and a writer”, which is high praise).  Suffice it to say, things didn’t go according to plan. The article starts out:

There are 20 [events] on the schedule on Monday.  Might an intrepid reporter get to all of them in one day?  I decide to find out, although it doesn’t take long to discover just how difficult that will be.

The realization comes while I am stranded at a bus stop in Copacabana, two and a half hours into my journey.  The next bus isn’t scheduled for three hours.  And I’ve managed to get to exactly one event, which I could barely see.

I had given him some rather useless advice in this situation:

“Something goes wrong, you reoptimize,” Professor Trick had cheerfully said.  This is hard to do sitting on a bus with the computing power of a pen and a pad at my disposal.

Fortunately, he got back in time for the magical sequence of events in the afternoon and early evening.  One of my worries did happen, but Victor agilely handled the situation:

Professor Trick urged me in his instructions to “keep the order!”  So it is with some trepidation that I go off-program.  Wrestling is closer to where I am than handball, and it looks like I will land at the handball arena between games.  So I switch them.  It’s the ultimate battle of man and machine — and it pays off.  I hope the professor approves of my use of this seat-of-the-pants exchange heuristic.

I do approve, and I very much like the accurate use of the phrase “exchange heuristic”.

In the end, Victor saw 14 events.  Nineteen was probably impossible, but a few more might have been seen with better data.  I wish I had used the optimization code to give Victor some more “what-ifs”, and perhaps some better visualizations, so his “pen and pad optimization” might have worked out better.  But I am amazed he was able to see this many in one day!

And what happened to Sarah, the reporter who would just go as she pleased?  Check out her report, where she saw five events.  If Victor went a little extreme getting a university professor to plan his day, Sarah went a little extreme the other way in not even looking at a map or a schedule.

This was a lot of fun, and I am extremely grateful to Victor for letting me do this with him (and to Bill for recommending me, and to Willem for providing the CP model that gave me confidence in what I was doing).

I’ll be ready for Tokyo 2020!

The American Scientist Magazine Understands Nothing about the Traveling Salesman Problem

I like to think there are some solid foundations to my life. I will be able to do the Monday New York Times crossword and I will not be able to do the Thursday version. My dental hygienist will not be satisfied with the amount of flossing I do. I will get my favorite spaghetti meal served on my birthday.

And I can trust American Scientist to write articles that teach me science in an accessible, yet challenging, way. American Scientist is the magazine of the honor society Sigma Xi and has been my favorite science magazine since the demise of The Sciences, my all-time favorite. Or it was.

That particular pillar of my existence crumbled into dust today as I was reading an otherwise interesting article on “Paradoxes, Contradictions, and the Limits of Science” by Noson Yanofsky. In this article, there is a sidebar on the Traveling Salesman Problem that makes me weep.

It is fortunate that they have put this behind their paywall, so as to minimize the damage they have done. But let me briefly describe: there is a map of the United States, along with a truly atrocious Traveling Salesman tour through the US, apparently visiting each zip code (approximately 43,000 zip codes). The tour is created by “Hilbert Curves” and is “about 75% optimal”. Then, the inevitable: “Finding the actual shortest route would take a computer essentially an infinite amount of time to compute”. Cue the sidebar: “But what about going to 100 different cities? A computer would have to check 100×99×98×…×2×1 possible routes.” They then helpfully expand that into the 157-digit number of potential routes. “The computer would have [to] see how long the route takes, and then compare all of them to find the shortest route.”

The sidebar continues with the usual drivel about computer speeds and the number of centuries the computation would take.

No, no. 100! times no. It is not necessary to check all the routes. 100-city TSPs can be solved to proven optimality in a very short amount of time. And even the 43,000 zip codes can be handled with a non-infinite (and quite practical) amount of computation. Using integer programming and specialized techniques, Bill Cook and his collaborators, with their Concorde code, have solved problems with up to 85,900 cities to proven optimality. The “zip code” problem is just like the problems Cook has solved, only smaller.

The Traveling Salesman Problem is NP-complete (or NP-hard, depending on what you mean by the TSP). But the “figure out how big 100! is and call that the running time” argument completely misunderstands the role optimization and algorithms play in truly solving hard problems (and I mean truly: the optimum is found and proved to be optimal). My most recent blog post (sadly too long ago) was on this very subject. Perhaps I should rename this blog “Michael Trick’s Rants about Complete Enumeration Arguments”.

Even if you want to call this instance impossible to solve, why would you ever use a Hilbert curve heuristic for it? Don’t get me wrong: I am a student of John Bartholdi, and his paper with Loren Platzman on spacefilling curves, along with his paper applying the idea to Meals-on-Wheels scheduling, changed my life. But these techniques don’t give great tours (they have lots of other great properties).

It appears American Scientist has taken this tour from this webpage. The published map is the bottom one on the page. The resulting tour is clearly not good. Practically any other modern heuristic would do better. This is no complaint about Robert Kosara, the author of the original article: the page describes what he wanted to do, and he did it well. But for American Scientist to put this forward as an example of the state-of-the-art heuristic approach completely misrepresents the state of the art for heuristic approaches. And I have no idea what 75% optimal means: the tour is longer than optimal, of course.
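To give a sense of how low that bar is, here is a bare-bones nearest-neighbor construction followed by 2-opt improvement, run on random points.  It is only a sketch, not a serious heuristic, but even this simple combination typically produces far better tours than a space-filling-curve ordering.

```python
# A bare-bones TSP heuristic for illustration: nearest-neighbor construction
# followed by 2-opt improvement, on random points in the unit square.
import math, random

def tour_length(pts, tour):
    return sum(math.dist(pts[tour[i - 1]], pts[tour[i]]) for i in range(len(tour)))

def nearest_neighbor(pts):
    unvisited, tour = set(range(1, len(pts))), [0]
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda j: math.dist(pts[last], pts[j]))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour

def two_opt(pts, tour):
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                # reverse tour[i:j+1] if that shortens the tour
                a, b = tour[i - 1], tour[i]
                c, d = tour[j], tour[(j + 1) % len(tour)]
                if (math.dist(pts[a], pts[c]) + math.dist(pts[b], pts[d])
                        < math.dist(pts[a], pts[b]) + math.dist(pts[c], pts[d]) - 1e-12):
                    tour[i:j + 1] = reversed(tour[i:j + 1])
                    improved = True
    return tour

random.seed(0)
points = [(random.random(), random.random()) for _ in range(200)]
tour = two_opt(points, nearest_neighbor(points))
print(f"200 random points, 2-opt tour length: {tour_length(points, tour):.3f}")
```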

I don’t know if this means I should not trust American Scientist in areas that are not my specialty. On something I know a bit about, the magazine has failed miserably.

Complete Enumeration Arguments Deemed Harmful…

… or “The Traveling Salesman Problem is Not That Hard”.

When I talk to people about what I do for a living, I often face blank stares (or, worse, rapidly retreating backs) when I describe problems like the Traveling Salesman Problem.

Me: “Well, suppose the dots on this napkin represent cities, and you want to find the shortest route that visits them. How could you do it?”

Them: Quick scribble through the dots. “There! So you spend the day drawing lines through dots?”

Me: “No, no! Suppose there are 1000 cities. Then… well, there are 1000*999*998*…*2*1 ways through all the dots, and that is more than the number of freckles on a dalmatian’s back, so I look for better ways to draw lines through dots.”

And I fall into the same old trap.  I try to convince people the Traveling Salesman problem is hard by saying there are a lot of solutions to look through.  And that is an extremely misleading argument that I am now convinced does more harm than good.

Imagine if I had said one of the following:

  1. “Sorting a bunch of numbers!  Hoo-wee, that’s a hard one.  If I had 1000 numbers, I would have to check all orderings of the numbers to find the sorted order.  That is 1000*999*998*…*2*1 orders and that is more than the hairs on a Hobbit’s toes.”
  2. “Maximize a linear function on n variables over m linear constraints?  Well, I happen to know some theory here and know all I need to do is look at subsets of m variables and take a matrix inverse and do some checking.  Unfortunately, with 1000 variables and 700 constraints, that is 1000 choose 700, which is more than the number of atoms in a brick.”
  3. “Find a shortest path from here to the Walmart?  That’s a tough one.  There are a lot of roads from here to there and I need to check all the paths to find the shortest one.  That would take longer than the wait in a Comcast line!”

Of course, all of these are nonsense:  having lots of possibilities may be necessary for a problem to be hard, but it is certainly not sufficient.  Sorting is a problem that anyone can find a reasonable solution for.  Shortest path is a bit harder, and linear programming (problem 2) is harder still.  But all can be solved in time much, much less than the approaches above suggest.

So why do we use this argument for the Traveling Salesman Problem?  We know from complexity theory that the problem is NP-hard, so there is no known polynomial-time algorithm (there are still some who believe they have such an algorithm, but they are in the very, very tiny minority), and many of us believe that no such algorithm exists.

But that does not mean complete enumeration is necessary:  it is easy to come up with approaches that go through far fewer than the full n! solutions to the Traveling Salesman Problem (see the postscript for one example).  This is not surprising.  Complete enumeration for Satisfiability is 2^n, but there are methods known that take time proportional to something like 1.4^n [Edit: as per a comment, I meant 3-SAT].  Still exponential, but not complete enumeration.  And there is a big difference between 2^n and 1.00001^n (or 2^n/10^10), even if complexity theory likes to call them all “exponential”.
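For a concrete example of going through far fewer than n! possibilities while still being exact, the classic Held-Karp dynamic program solves the TSP in time on the order of n^2 * 2^n; for n = 20 that is roughly 4 x 10^8 states versus about 2.4 x 10^18 tours.  Here is a compact sketch on random points.

```python
# Held-Karp dynamic programming: an exact TSP algorithm over subsets of
# cities rather than over all n! tours.  Random points for illustration.
import math, random
from itertools import combinations

def held_karp(dist):
    n = len(dist)
    # best[(S, j)] = length of the shortest path that starts at city 0,
    # visits exactly the cities in frozenset S, and ends at city j
    best = {(frozenset([j]), j): dist[0][j] for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            S = frozenset(subset)
            for j in subset:
                best[(S, j)] = min(best[(S - {j}, k)] + dist[k][j]
                                   for k in subset if k != j)
    full = frozenset(range(1, n))
    return min(best[(full, j)] + dist[j][0] for j in range(1, n))

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(12)]
dist = [[math.dist(p, q) for q in pts] for p in pts]
print(f"optimal 12-city tour length: {held_karp(dist):.3f}")
```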

But the “complete enumeration” statement is even more harmful, since it glosses over a much more important issue: many “hard” problems (in the complexity-theory sense) are not particularly hard (in the English sense) if you limit yourself to instances of practical interest.  Due to the tremendous amount of work that has been done on it, the Traveling Salesman Problem is one such problem.  If you have an instance of the TSP of practical interest (i.e., it makes a difference to your life if you solve it or not, and it is really a TSP, not the result of some weird set of reductions from some other problem), then I bet you the Concorde program of Bill Cook and others will get you a provably optimal solution in a reasonable amount of time.  In fact, I would be really interested in knowing the smallest “real” instance that Concorde cannot solve in, say, one day on a normal laptop computer.

Being hard-to-solve in the complexity theory sense does not mean being hard-to-solve in the common language sense.  And neither have much to do with the number of possible solutions that complete enumeration might go through.

As an example of this confusion, here is a paragraph from Discover on work done by Randy Olson:

Planning a vacation is a daunting task, so why not let big data take the reins?

That’s exactly what data scientist Randy Olson did. Using specialized algorithms and Google Maps, Olson computed an optimized road trip that minimizes backtracking while hitting 45 of Business Insider’s “50 Places in Europe You Need to Visit in Your Lifetime.” (Five locations were omitted simply because you can’t reach them by car.)

Mapping software has come a long way, but for this kind of challenge it’s woefully inadequate. Google Maps’ software can optimize a trip of up to 10 waypoints, and the best free route optimization software can help with 20. But when you’re looking to hit 45 or 50 landmarks, things get complicated.

According to Olson, a computer would need to evaluate 3 x 10^64 possible routes to find the best one. With current computing power, it would take 9.64 x 10^52 years to find the optimal route to hit all your desired locations — Earth would have long been consumed by the Sun before you left the house. So Olson used a clever workaround, based on the assumption that we don’t need to find the absolute best route, but one that’s pretty darn good.

Now, all this is based on an article Olson wrote months ago, and it has all been hashed over on twitter and in blogs (see here and here, for example), but let me add (or repeat) a few points:

  1. 45 points, particularly geographically based ones, is a tiny problem that can be solved to optimality in seconds.  The number of possible routes is a red herring, as per the above.
  2. The “based on the assumption that we don’t need to find the absolute best route, but one that’s pretty good” idea is not a new one.  Techniques known as “heuristics” have been around for millennia, and are an important part of the operations research toolkit.
  3. “Minimize backtracking” is not exactly what the problem is.
  4. The computer does not need to evaluate all possible routes.
  5. With current computing power, it takes seconds (at most) to find the optimal route.
  6. I don’t expect the sun to consume the earth in the next minute.
  7. Olson’s “approach” is fine, but there are a zillion heuristics for the TSP, and it would take a very poor heuristic (maybe 2-opt by itself) not to do as well as Olson’s on this particular instance.
  8. Seriously, bringing in “big data”?  That term really doesn’t mean anything, does it?

In Olson’s defense, let me make a few other points:

  1. Few of the rest of us researchers in this area are covered in Discover.
  2. Journalists, even science journalists, are not particularly attuned to nuance on technical issues, and Olson is not the author here.
  3. The instances he created are provocative and interesting.
  4. Anything that brings interest and attention to our field is great! Even misleading articles.

But the bottom line is that real instances of the TSP can generally be solved to optimality.  If, for whatever reason, “close to optimal” is desired, there are zillions of heuristics that can do that.  The world is not eagerly awaiting a new TSP heuristic.  Overall, the TSP is not a particularly good problem to be looking at (unless you are Bill Cook or one of a few other specialists): there is just too much known out there.  If you do look at it, don’t look at 50-point instances: the problem is too easy to be a challenge at that size.  And if you want to state that what you learn is relevant to the TSP, please read this (or at least this, for a more accessible narrative) first.  There are other problems for which small instances remain challenging: can I suggest the Traveling Tournament Problem?

Finally, let’s drop the “number of solutions” argument:  it makes 50 point TSPs look hard, and that is just misleading.


 

 

Postscript: an example of not needing to do complete enumeration for the Traveling Salesman Problem

For instance, take four cities 1, 2, 3, and 4, and let D denote the distance matrix (assumed to be symmetric, though similar arguments work for the asymmetric case).  Then one of D(1,2)+D(3,4), D(1,3)+D(2,4), or D(1,4)+D(2,3) is maximum (if there are ties, choose one of the maximums); say it is D(1,2)+D(3,4).  It is easy to show that there is an optimal tour that does not use both edges (1,2) and (3,4): otherwise a simple exchange would give a tour that is no worse.  If you want to be convinced, consider the following diagram.  If a tour used both (1,2) and (3,4), then at least one of the two alternatives (the dashed edges (1,3) and (2,4), or the dotted edges (1,4) and (2,3)) reconnects the pieces into a single tour, and that tour would be no longer than the one using (1,2) and (3,4); which alternative works depends on how the rest of the tour links these four cities.  As drawn, the dashed edges give a shorter tour.

[Figure: four cities with tour edges (1,2) and (3,4), plus the dashed exchange edges (1,3), (2,4) and the dotted exchange edges (1,4), (2,3)]

 

If you are convinced of that, then here is an algorithm that doesn’t enumerate all the tours: enumerate your tours by building them up city by city, but start with 1, 2, 3, 4.  That partial tour already uses both edges (1,2) and (3,4), so at this point you can stop (and not move on to 1,2,3,4,5) and jump instead to 1,2,3,5 and so on.  In doing so, we avoided enumerating (n-4)! tours.  Ta da!  Of course, you can do this for every foursome of points, and the foursome need not sit in the initial slots.  All of this can be checked quickly at each node of the enumeration tree.  How many tours are we left with to go through?  I don’t know: it is still exponential, but it is a heck of a lot less than n! (and is guaranteed to be less than that for any instance).  Maybe even less than the time it takes for the sun to consume the earth.
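Here is a small sketch of that pruned enumeration on a tiny random instance.  It uses the strict version of the exchange test (prune only when the two edges in the partial tour form strictly the longest pairing), which sidesteps the tie-breaking issue, and it counts how many complete tours survive the pruning.

```python
# Enumerate tours city by city, but cut off any branch whose newest edge,
# together with some earlier disjoint edge, forms strictly the longest of
# the three pairings of those four cities: no optimal tour uses both edges.
import math, random

def tour_len(dist, tour):
    return sum(dist[tour[i - 1]][tour[i]] for i in range(len(tour)))

def pruned_enumeration(dist):
    n = len(dist)
    best, examined = [math.inf], [0]

    def longest_pairing(a, b, u, v):
        return (dist[a][b] + dist[u][v] > dist[a][u] + dist[b][v] and
                dist[a][b] + dist[u][v] > dist[a][v] + dist[b][u])

    def extend(path, remaining):
        if not remaining:                      # a complete tour
            examined[0] += 1
            best[0] = min(best[0], tour_len(dist, path))
            return
        for city in remaining:
            u, v = path[-1], city
            # compare the new edge (u, v) with earlier, disjoint path edges
            if any(longest_pairing(path[t], path[t + 1], u, v)
                   for t in range(len(path) - 2)):
                continue                       # prune this branch
            extend(path + [city], remaining - {city})

    extend([0], set(range(1, n)))
    return best[0], examined[0]

random.seed(1)
pts = [(random.random(), random.random()) for _ in range(9)]
dist = [[math.dist(p, q) for q in pts] for p in pts]
length, tours = pruned_enumeration(dist)
print(f"optimal length {length:.3f}; complete tours examined: {tours} of {math.factorial(len(pts) - 1)}")
```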

 

 

State of Operations Research Blogging

It has been almost a year since I had a blog entry here.  Not that I don’t have a lot to say!  I am using twitter more, and I do have ideas for blog entries in cases where 140 characters is not enough.  But there is limited time.

But I think something more fundamental is at work.  What is the state of blogging, and, in particular, the operations research blogging world?  It doesn’t appear all that healthy to me, but perhaps I am not seeing things correctly.

I think the blogging world was badly hurt by the cancellation of Google Reader.  At least for me, Google Reader was a fast and convenient way to follow the blogging world.  And, beyond me, I had created a public list of OR blogs and a public feed of OR blog entries.  They seemed to be well used, but both ended with the end of Reader.  Now it is harder to get the word out about OR blogging.

I have tried to continue aspects of these listings on this page with a feed of OR blogs (also in sidebar) but it is much less convenient.

I also think the relentless onslaught of comment spam discouraged (and perhaps still discourages) people from trying out blogging.

Today I went through my list of OR blogs to see who has posted in the past year, and was distressed to see how few have done so.  Even with a pretty broad view of what an OR Blog is, it came to only about 40 people, with many of those (including myself!) barely meeting the “posted in the last year” requirement.

Those that remain are a fantastic source of information.  I think particularly of Laura McLay’s Punk Rock Operations Research and Anna Nagurney’s RENeW as must-read entries.  But there now seem to be few active bloggers.

Am I missing a ton of entries in the OR Blogging world  (let me know if I am missing some from my list)?  Has the world of twitter taken over from the long-form journalism that blogging provides?

In any case, I will make an effort to write more and welcome thoughts about how to strengthen the OR blogging community.

Using Analytics for Emergency Response

I just attended a great talk by Laura McLay at the German OR Society meeting in Aachen.  In her semi-plenary, Laura talked about all the work she has done in Emergency Medical Response.  Planning the location and operation of ambulances, fire trucks, emergency medical technicians, and so on is a difficult problem, and Laura has made very good progress in putting operations research to use in making systems work better.  She has been recognized for this work not only in our field (through things like outstanding paper awards and an NSF CAREER award) but also by those directly involved in emergency response planning, as evidenced by an award from the National Association of Counties.

Laura covered a lot of ground in her talk (she has a dozen papers or more in the area), but I found one result in particular very striking.  Many ambulance systems have a goal of responding to 80% of their calls in 9 minutes (or some similar numbers).  One of the key drivers of those values is survivability from heart attacks: even minutes matter in such cases.  The graph attached (not from Laura; versions are available in lots of places on the internet) shows survival dropping off sharply as the minutes tick away.

But why 9 minutes?  It is clear from the data that if the goal is to provide a response within 9 minutes, there are an awful lot of 8-minute-30-second response times.  Systems respond to what is measured.  Wouldn’t it be better, then, to require 5-minute response times?  Clearly more people would be saved, since more people would be reached within the critical first minutes.  This looks like a clear win for evidence-based medicine and the use of analytics in decision making.

But Laura and her coauthors have a deeper insight than that.  In the area they are looking at, a mix of suburban and rural communities, the optimal placement of ambulances under a 9-minute standard is a mixture of suburban and rural locations.  With a 5-minute standard, it does no good to place an ambulance in a rural location: it can’t reach enough people in time.  All the ambulances would be placed in the higher-density suburban locations.  If a call comes in from a rural location, eventually an ambulance would wend its way there, but after 20 or 30 minutes, many cases become moot.

To figure out the optimal response-time standard, you need to account for both survivability and the number of cases the system can reach in time.  For the area Laura and her team looked at, the optimal response time turned out to be 8 to 9 minutes.
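In stylized form, that calculation multiplies the fraction of calls reachable within the standard by the survival probability at that response time.  The numbers in the sketch below are entirely made up; the point is only that the ranking of standards depends on how much coverage shrinks as the standard tightens.

```python
# A stylized, entirely hypothetical illustration of the tradeoff: a tighter
# standard raises survival for the calls you reach, but can shrink the
# fraction of calls you can reach at all (rural areas lose coverage).
def expected_survivors(calls, coverage, survival_prob):
    # calls reached within the standard, times survival at that response time
    return calls * coverage * survival_prob

calls_per_year = 1000
nine_min = expected_survivors(calls_per_year, coverage=0.80, survival_prob=0.10)
five_min = expected_survivors(calls_per_year, coverage=0.30, survival_prob=0.25)
print(f"9-minute standard: ~{nine_min:.0f} expected survivors")   # 80 with these inputs
print(f"5-minute standard: ~{five_min:.0f} expected survivors")   # 75 with these inputs
# Change the coverage numbers and the ranking flips, which is exactly why the
# optimal standard has to be computed from data rather than assumed.
```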

Of course, this analysis is not relevant if the number of ambulances is increased with the decreased response time requirement.  But the enthusiasm for spending more on emergency response is not terrifically high, so it is more likely that the time will be changed without a corresponding increase in budget.  And that can have the effect of making the entire system worse (though things are better for the few the ambulance can reach in time).

This was a great example of the conflict between individual outcome and social outcomes in emergency response.  And a good example of how careful you need to be when using analytics in health care.

I highly recommend reading her Interfaces article “Hanover County Improves its Response to Emergency Medical 911 Patients” (no free version).  I even more highly recommend her blog Punk Rock Operations Research and her twitter stream at @lauramclay.

Taking Optimization With You After Graduation

In the Tepper MBA program, we use versions of Excel’s Solver (actually a souped-up version from Frontline Systems) for most of our basic optimization courses.  Students like this since they feel comfortable with the Excel interface and they know that they can use something like it in their summer internships and first jobs, albeit likely the more limited version that ships with Excel.  For those who are particularly keen, we point them to an open-source optimization system that lets them stay within the Excel structure.

In our most advanced course, we use AIMMS with Gurobi as the underlying solver. Students generally love the power of the system, but worry that they will not be able to translate what they learn into their jobs.  This wouldn’t be an issue if companies had analytics and optimization as a core strength, and routinely had some of the commercial software, but that is not the case.  So the issue of transfer comes up often.

I am really happy to see that Gurobi has a deal in place to allow students to continue using their software, even after they graduate.  This gives new graduates some time to wow their new employers with their skills, and to make the argument for further investment in operations research capabilities.

Here is an excerpt from an email I received from Gurobi:

Academic Site License

Our FREE academic site license allows students, faculty, and staff at a degree-granting academic institution to use Gurobi from any machine on a university local-area network. This program makes it easy for everyone at the university to have access to the latest version of Gurobi, without having to obtain their own license. You can learn more on our Academic Licensing page, and your network administrator can request a license by emailing support@gurobi.com.

Take Gurobi With You Program Update

This program allows qualified recent graduates to obtain a FREE one-year license of Gurobi for use at their new employer.

Qualified recent graduates can complete a short approval process and then receive the license including maintenance and support at no cost to themselves or their employers. This reflects our continuing support of students, even after they graduate. You can learn more on our Take Gurobi With You page.

I think this sort of program can have a great effect on the use of optimization in practice.  And we need to rethink what we teach in the classrooms now that we know the “can’t take it with you” effect is lessened.

The Baa-readth of Operations Research

At the recent International Federation of Operational Research Societies (IFORS) meeting in Barcelona (a fabulous conference, by the way), I had the honor of being nominated as President of that “society of societies”.  If elected, my term will start January 1, 2016, so I get a bit of a head start in planning.

I was looking through one of the IFORS publications, International Abstracts in Operations Research.  I am sure I will write about this more, since I think it is a very nice publication looking for its purpose in the age of Google.  The journal publishes the abstracts of any paper in operations research, including papers published in non-OR journals.  In doing so, it can be more useful than Google, since there is no need either to limit keywords (“Sports AND Operations Research”) or to sift through tons of irrelevant links.

I was scanning through the subject categories of the recent issue of IAOR to find papers published in “sports”.  I saw something really quite impressive.  Can you see what caught my eye?

 


Optimization, Operations Research and the Edelman Prize

This year, I have the distinct honor of chairing the committee to award the Franz Edelman Award, given out by INFORMS for the best work that “attests to the contributions of operations research and analytics in both the profit and non-profit sectors”.  This competition has been incredibly inspiring to me throughout my career.  Just this year, as a judge, I got to see extremely high-quality presentations on eradicating polio throughout the world, bringing high-speed internet to all of Australia, facilitating long kidney exchange chains, and more.  I have seen dozens of presentations over my years as an Edelman enthusiast and judge, and I always leave with the same feeling: “Wow, I wish I had done that!”.

There is nothing that makes me more enthusiastic about the current state and future prospects of operations research than the Edelman awards.  And, as a judge, I get to see all the work that doesn’t even make the finals, much of which is similarly inspiring.  Operations Research is having a tremendous effect on the world, and the Edelman Prize papers are just the (very high quality) tip of the iceberg.

I was very pleased when the editors of Optima, the newsletter of the Mathematical Optimization Society, asked me to write about the relationship between optimization and the Edelman Prize.  The result is in their current issue.  In the same issue, the editors feature the 2013 Edelman winner, on optimizing dike heights in the Netherlands, a fantastic piece of work that has saved the Netherlands billions in unneeded spending.  My article appears on page 6.  Here is one extract on why the Edelman is good for the world of optimization:

There are many reasons why those in optimization should be interested in, and should support, the Edelman award.

The first, and perhaps most important, is the visibility the Edelman competition gets within an organization. A traditional part of an Edelman presentation is a video of a company CEO extolling the benefits of the project. While, in many cases, the CEO has already known about the project, this provides a great opportunity to solidify his or her understanding of the role of optimization in the success of the company. With improved understanding comes willingness to further support optimization within the firm, which leads to more investment in the field, which is good for optimization. As a side note, I find it a personal treat to watch CEOs speak of optimization with enthusiasm: they may not truly understand what they mean when they say “lagrangian based constrained optimization” but they can make a very convincing case for it.

Despite the humorous tone, I do believe this is very important:  our field needs to be known at the highest levels, and the Edelman assures this happens, at least for the finalists.  And, as I make clear in the article: it is not just optimization.  This is all of operations research.

There are dozens of great OR projects done each year that end up submitted to the Edelman Award.  I suspect there are hundreds or thousands of equally great projects done each year that don’t choose to submit (it is only four pages!).  I am hoping for a bumper crop of them to show up in the submissions this year.  Due date is not until October, but putting together the first nomination would make a great summer project.

Blogging and the Changing World of Education

As a blogger, I have been a failure in the last six months.  I barely have enough time to tweet, let alone sit down for these extensively researched, tightly edited, and deeply insightful missives that characterize my blog.  I tell you, 1005 words on finding love through optimization doesn’t just happen!

I have my excuses, of course.  As the fabulous PHD Comics points out, most of us academics seem somewhat overbooked, despite the freedom to set much of our own schedules.  I am not alone in being congenitally unable to turn down “opportunities” when they come by.  “Help hire a Norwegian professor?” Sounds fun! “Be the external examiner for a French habilitation degree?” I am sure I’ll learn a lot!  “Referee another paper?” How long can that take?  “Fly to Australia for a few days to do a research center review?”  Count me in!  And that was just four weeks in February.

All this is in addition to my day job that includes a more-than-healthy dose of academic administration.  Between doing my part to run a top business school and to move along in research, not to mention family time, including picking up the leavings of a hundred pound Bernese Mountain Dog (the “Mountain” in the name comes from said leavings) and entertaining a truly remarkable nine-year-old son, my time is pretty well booked up.

And then something new comes along.  For me, this newness is something I had a hand in putting together: the Tepper School’s new FlexMBA program.  This program offers our flagship MBA program in a hybrid online/onsite structure.  Every seven weeks or so, students in the program gather at one of CMU’s campuses (we have used Pittsburgh, Silicon Valley, and New York; we have not yet used our Qatar campus) and spend a couple of days intensively starting their new courses.  This is followed by six weeks of mixed synchronous and asynchronous course material.  Asynchronous material is what the students can do on their own time: videos, readings, assignments, and so on.  The synchronous lesson is a bit more than an hour spent together via group video conference, going over any issues in the material and working on case studies, sample problems, and so on.  The course ends with exams or other evaluations back on campus before the next courses start.

Our commitment is to offer the same program as our full-time residential MBA and our part-time in-Pittsburgh MBA.  So this means the same courses, faculty, learning objectives, and evaluations that our local students take.

We started this program last September with 29 students, and so far it has gone great.  The students are highly motivated, smart, hard-working, and engaged.  And the faculty have been amazing: they have put in tons of work to adapt their courses to this new structure.  Fortunately, we have some top-notch staff to keep things working.  Unlike some other MBA programs, we have not partnered with any outside firm on this.  If we are going to offer our degree, we want it to be our degree.

I have just finished my own course in this program.  I teach our “Statistical Decision Making” course.  This is a core course all MBA students take, and it revolves around multiple regression and simulation (the interesting relationships between these topics can wait for another day).  This is not the most natural course for me: my research and background are more on the optimization side, but I very much enjoy the course.  And teaching this course has made clear to me the real promise of the hot phrase “business analytics”: the best of business analytics will combine the predictive analytics of statistics and machine learning with the prescriptive analytics of optimization, again a topic for another day.

My initial meeting with the students concentrated on an overview of the course and an introduction to the software through some inspiring cases.  We then moved on to the six-week distance phase.  Each of the six modules that make up a course is composed of four to eight topics.  For instance, one of my modules on multiple regression includes the topic “Identifying and Handling Multicollinearity”.  (Briefly: multicollinearity occurs when you do regression with two or more variables that can substitute for each other; imagine predicting height using both left-foot length and right-foot length as data.  There is a small numerical sketch of this below.)  That section of the module consists of

  • A reading from their textbook on the subject
  • One 8-minute video from me on “identifying multicollinearity”
  • One 6-minute video from me on “handling multicollinearity”
  • A three-minute video of me using our statistical software to show how multicollinearity appears in the software (I separate this out so we can change software without redoing the entire course)
  • A question or two on the weekly assignment.

It would be better if I also had a quiz to check understanding of the topic, along with further pointers to additional readings.
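Here is the small numerical sketch of the foot-length example promised above.  It uses simulated data (nothing from the course), and simply refits the same regression on resampled data to show the symptom.

```python
# A tiny numerical illustration of the foot-length example: when two
# predictors are nearly copies of each other, the individual regression
# coefficients become unstable even though the overall fit stays good.
# All data are simulated for this sketch.
import numpy as np

rng = np.random.default_rng(0)
n = 200
left_foot = rng.normal(25, 2, n)
right_foot = left_foot + rng.normal(0, 0.1, n)      # nearly identical predictor
height = 100 + 3 * left_foot + rng.normal(0, 4, n)  # height depends on foot length

for trial in range(3):
    idx = rng.choice(n, n, replace=True)            # resample the data
    X = np.column_stack([np.ones(n), left_foot[idx], right_foot[idx]])
    coef, *_ = np.linalg.lstsq(X, height[idx], rcond=None)
    print(f"sample {trial}: left-foot coef {coef[1]:6.2f}, right-foot coef {coef[2]:6.2f}")
# The two coefficients swing wildly from sample to sample (while their sum
# stays near 3), which is the practical symptom of multicollinearity.
```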

So my course, which I previously thought of as 12 lectures, is now 35 or so topics, each with readings, videos, and software demonstrations.  While there are some relationships between the topics, much is independent, so it would be possible, for instance, to pull out the simulation portion and replace it with other topics if desired.  Or we can now repackage the material to supplement executive education courses.  The possibilities are endless.

Putting all this together was a blast, and I now understand the structure of the course, how things fit together, and how to improve the course.  For instance, there are topics that clearly don’t fit in this course, and would be better elsewhere in the curriculum.  We can simply move those topics to other courses.  And there are linkages between topics that I did not see before I broke down the course this finely.

I look forward to doing this for our more “operations research” type courses (as some of my colleagues have already done).  Operations Research seems an ideal topic for this sort of structure.  Due to its mathematical underpinnings and need for organized thinking, students sometimes find this subject difficult.  By forcing the faculty to think about it in digestible pieces, I think we will end up doing a better job of educating students.

Creating this course was tremendously time consuming.  I had not taken my own advice to get most of the course prepared before the start of the semester, so I was constantly struggling to stay ahead of the students.  But next year should go easier: I can substitute out some of the videos, extend the current structure with some additional quizzes and the like, adapt to any new technologies we add to the program, and generally engage in the continuous improvement we want in all our courses.

But perhaps next year, I won’t have to take a hiatus from blogging to get my teaching done!

 

Russia really owned this podium

Back in 2010, Canada’s goal was to “own the podium” at the Winter Olympics.  What “owning the podium” meant was open to interpretation.  Some argued for “most gold medals”; others opted for “most overall medals”; still others had point values for the different types of medals.  Some argued for normalizing by population (a ranking won, for London 2012, by Grenada with one medal and a population of 110,821, trailed by Jamaica, Trinidad and Tobago, New Zealand, the Bahamas, and Slovenia) (*). Others think the whole issue is silly: people win medals, not countries.  But still, each Olympics, the question remains: who won the podium?

I suggested dividing the podium by the fraction of “reasonable” medal weightings that lead to a win by each country.  A “reasonable” weighting is one that treats gold as at least as valuable as silver, treats silver as at least as valuable as bronze, gives no medal a negative weight, and has the weights sum to 1.  By that measure, in Vancouver 2010, the US won with 54.75% of the podium compared to Canada’s 45.25%.  In London 2012, the US owned the entire podium.
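For anyone who wants to play with this measure, it can be estimated by sampling weightings uniformly from that set and recording which country has the highest weighted medal count.  The sketch below uses placeholder medal counts, not actual tallies.

```python
# Estimate the "share of the podium": sample medal weightings uniformly from
# {gold >= silver >= bronze >= 0, summing to 1} and record which country has
# the highest weighted medal count.  Medal tallies below are placeholders.
import random
from collections import Counter

medals = {            # (gold, silver, bronze) -- hypothetical counts
    "Country A": (10, 8, 6),
    "Country B": (9, 12, 11),
}

def random_weighting():
    # uniform point on the simplex, then sorted so gold >= silver >= bronze
    cuts = sorted(random.random() for _ in range(2))
    parts = [cuts[0], cuts[1] - cuts[0], 1 - cuts[1]]
    return sorted(parts, reverse=True)

random.seed(0)
wins = Counter()
trials = 100_000
for _ in range(trials):
    w = random_weighting()
    scores = {c: sum(wi * mi for wi, mi in zip(w, m)) for c, m in medals.items()}
    top = max(scores.values())
    leaders = [c for c, s in scores.items() if s == top]
    if len(leaders) == 1:             # ignore exact ties (they have measure zero)
        wins[leaders[0]] += 1

for country, count in wins.items():
    print(f"{country}: {100 * count / trials:.1f}% of reasonable weightings")
```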

The Sochi Olympics have just finished and the result is…. Russia in a rout.  Here are the medal standings:

 

[Table: Sochi 2014 medal standings]

Since Russia has more gold medals than anyone else, more gold plus silver, and more medals overall, there is no reasonable weighting of gold, silver, and bronze under which anyone but the Russian Federation wins.

Nonetheless, I think Canada will take its golds in men’s and women’s hockey, along with men’s and women’s curling (among others), and declare this a successful Olympics.

———————————————————————————

(*) I note that some sports limit the number of entries from each country, giving a disadvantage to larger countries in population-based rankings (there is only one US hockey team, for instance, but Lithuania also gets just one).