Car Talk and TSP Paths

On the suddenly hot topic of the Traveling Salesman Problem (see here and here), this week’s Car Talk puzzle is a TSP-like problem (though it is really a graph theory problem: the Hamiltonian path problem, to be exact).

The company that Bobo works for just finished a new product. They wanted to promote it across the country. Bobo was asked to travel by car to each of the 48 contiguous U.S. states to promote the product. He was told that he could visit each state in whatever order he chose, but the company wanted him to start in Delaware, at their headquarters.

They asked that he visit each state only once. He could not go back into a state he had already visited because this was the “Don’t Look Back” product tour. So, Bobo sat down at his desk and began to plan his trip.

He realized immediately that it was going to be one long car trip. At that moment, his boss stopped by and said, “Hey, I’m going to join you when you reach your last state. I was born there and I’ve been looking for a reason to go back and visit. You can leave your rental car there, and I’ll fly you back in my private jet.”

Since Bobo hadn’t planned his trip yet, how did his boss know which state was going to be Bobo’s last state? And, which state would that be?

That question is not hard (I won’t ruin it by giving the solution here:  call the state X);  the key is the statement about the flight back from the final state.

But it is strange that Delaware was given as the starting state.  Does it really matter?  Well, if you start in State X, you can end up in many (but not all) of the other states.  But if you start in certain other states, there is no route through all 48 states without repeating a state.  I count seven starting states that preclude such a path.  Can you find them?  The following map is just for illustrative purposes:  the question refers to the “real” United States state structure (so you might need to check a more detailed map to see which states connect directly to others).
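
For anyone who wants to attack this as the graph problem it is, a brute-force check is easy to sketch. The following Python is a minimal backtracking search (my own toy, not anything from Car Talk); the little graph in it is made up purely for illustration, so swap in the real 48-state adjacency list if you want to hunt for the seven bad starting states.

```python
from typing import Dict, List, Set

def has_hamiltonian_path(adj: Dict[str, List[str]], start: str) -> bool:
    """Is there a path from `start` that visits every vertex exactly once?"""
    n = len(adj)

    def extend(current: str, visited: Set[str]) -> bool:
        if len(visited) == n:
            return True
        for nxt in adj[current]:
            if nxt not in visited:
                visited.add(nxt)
                if extend(nxt, visited):
                    return True
                visited.remove(nxt)
        return False

    return extend(start, {start})

# A made-up five-vertex "map" (NOT the real state adjacencies), just to show the idea.
toy = {
    "A": ["B"],
    "B": ["A", "C", "D"],
    "C": ["B", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

for state in toy:
    print(state, has_hamiltonian_path(toy, state))
```

Even on this toy map the pattern behind the puzzle shows up: the two degree-one vertices have to be the endpoints of any such path, so starting anywhere else fails.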

NOTE ADDED 1/17/2012.  Reddit’s lordlicorice has drawn the graph underlying the States:

 

Thanks to @wjcook for pointing out the Car Talk problem.

A Love Letter to the Traveling Salesman Problem

Bill Cook of Georgia Tech has a new book out on the Traveling Salesman Problem entitled “In Pursuit of the Traveling Salesman” from Princeton University Press (or from Amazon or B&N).  Unlike his previous book (with David Applegate, Bob Bixby and Vašek Chvátal), this book is aimed at a general audience. Bill takes his readers down a beautiful path covering the history, applications, and algorithms associated with the TSP. It is a fascinating story, and one that shows a researcher who truly loves his research area. You would think such a love was  common among researchers, but I have seen many researchers who seem bored by, or positively hate, their research areas. Bill is obviously enraptured by the TSP.

The Traveling Salesman Problem is remarkably easy to state:  given a set of points with distances between them, find the shortest tour through all the points.  For example, as Bill wrote about recently, given 99 towns in Iowa to visit and the road distances between every pair, what is the quickest way to see all the towns, returning to your starting position?
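
To make the statement concrete, here is a tiny brute-force sketch in Python (made-up coordinates, and certainly not how Bill’s Concorde code works): fix the first city and try every ordering of the rest. With n points that is (n-1)! tours, which is hopeless beyond a dozen or so cities, and that gap between the easy statement and the hard computation is the whole story of the book.

```python
import itertools
import math

# A handful of made-up points in the plane, purely for illustration.
points = {
    "A": (0.0, 0.0),
    "B": (3.0, 4.0),
    "C": (6.0, 1.0),
    "D": (2.0, 7.0),
    "E": (5.0, 5.0),
}

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def tour_length(order):
    """Length of the closed tour that visits `order` and returns to the start."""
    return sum(dist(points[a], points[b]) for a, b in zip(order, order[1:] + order[:1]))

cities = list(points)
# Fix the first city and enumerate permutations of the rest: (n-1)! candidate tours.
best = min(((cities[0],) + rest for rest in itertools.permutations(cities[1:])),
           key=tour_length)
print(best, round(tour_length(best), 2))
```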

Despite its simple statement, the problem of finding good or optimal tours has generated thousands of research papers.  The TSP is the standard testbed for discrete optimization:  practically every possible approach to optimization can be tried out on the TSP.  So research on the TSP has had a tremendous effect on approaches for all sorts of other optimization problems.  Bill covers every aspect of this research corpus.

Perhaps the most interesting part of this book is that Bill pulls no punches when it comes to presenting work. Most of us would be tempted to go only so deep into an area and then start hand-waving, believing “well, no one would understand this area anyway”. Here is how Bill begins a description of finding comb inequalities, a type of constraint necessary to determine the TSP polytope:

Subtour-elimination constraints? No problem. Comb inequalities? Not so good. At the moment there is no known polynomial-time algorithm that is guaranteed to spot a violated comb inequality if one exists. The comb-separation problem is also not known to be in the NP-complete complexity class, so its status is wide open. A great need coupled with inconclusive results translates into an important research problem.

This is not to say that combs are currently skipped over in computations. By hook or by crook computer codes must generate combs to push LP bounds after subtour-elimination constraints fail. This is accomplished by hit-or-miss strategies for growing handles, shrinking teeth, splicing sets, and a host of other operations.
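
For readers who want to see the objects in that passage written out, here is my gloss in standard notation (not a quote from the book): write x_e for the LP variable on edge e and x(E(S)) for the sum of the x_e over edges with both endpoints in the set S.

```latex
% Subtour-elimination constraints, one for every proper subset S with at least two cities:
x(E(S)) \le |S| - 1

% One standard form of a comb inequality, with handle H and an odd number
% t \ge 3 of disjoint teeth T_1, \dots, T_t, each tooth meeting both H and its complement:
x(E(H)) + \sum_{i=1}^{t} x(E(T_i)) \;\le\; |H| + \sum_{i=1}^{t} \bigl(|T_i| - 1\bigr) - \Bigl\lceil \tfrac{t}{2} \Bigr\rceil
```

Separating the first family is a minimum-cut computation; separating the second is the problem whose status Bill describes as wide open.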

If I tried to write this book, I don’t think I would have even gotten to comb inequalities, let alone continued to the nice description Bill provides of comb-finding heuristics.  This level of discussion requires some concentration when reading, but the book is accessible to a wide audience.  This is an ideal book to get a college student or a smart high school student interested in operations research and combinatorial optimization.

This is a highly personal look at the TSP. Bill tells you his favorite heuristic (farthest insertion), his favorite exotic computing model (time traveling solver: take as long as you like to solve but then go back in time to report the solution), and much more.

Chapters have interesting lead quotes (the book begins with “Listen, mate, I’ve traveled every road in this here land,” from Geoff Mack’s song “I’ve Been Everywhere”), and the illustrations throughout are fantastic.  I also love the way that Bill includes pictures of practically everyone who has done research in this area. Putting faces to the names is a wonderful way to personalize the research.

Through this book, you’ll learn all about the Traveling Salesman Problem and, more broadly, about the different research directions in combinatorial optimization. And you’ll also see why top-notch researchers like Bill Cook find the TSP endlessly fascinating. And you might just want to be a researcher like Bill.

I guess a bit of a disclaimer is needed here: Bill has included a very nice mention of this blog in the book (thanks Bill!) with a picture of me,  and his publisher sent me an advance copy of the book. I failed miserably in getting a nice quote in on time, but that is due to my own failings. I really love the book!

 

Death to Data Hogs! Ooops, that’s us!

The New York Times has an article on the concentration of use of mobile airwaves.  It seems that 1% of the users consume half the bandwidth:

The world’s congested mobile airwaves are being divided in a lopsided manner, with 1 percent of consumers generating half of all traffic. The top 10 percent of users, meanwhile, are consuming 90 percent of wireless bandwidth.

Arieso, a company in Newbury, England, that advises mobile operators in Europe, the United States and Africa, documented the statistical gap when it tracked 1.1 million customers of a European mobile operator during a 24-hour period in November.

Wow!  That is amazing!  Those greedy bandwidth hogs!  If it wasn’t for that 1%, usage would go down 50%. Get rid of 10% and usage goes down 90%!  Time to Occupy the Download Link!  Right?

Wrong.  As my former student Ben Peterson pointed out, measuring usage over a day doesn’t give much of an idea of the daily variability of usage.  For instance, three days ago I had some time to kill, so I downloaded a video and a few games.  I then updated all of my apps, and cleaned up my photo collection by sending some pictures to my web photo storage.   I was a complete bandwidth hog!  Since then, I haven’t used my phone much at all.  So, three days ago, I was part of the 1%;  since then I haven’t been.  Which day did you measure me?

So yes, on a given day, 1% of the people use half the bandwidth.  But, unlike financial wealth, we get to change around (every day!) who that 1% are.
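
Ben’s point is easy to check with a toy simulation. Everything below is made up (independent, heavy-tailed daily usage for every user, with the spread chosen only so the daily top 1% carry roughly half the traffic, echoing the Arieso figure); the interesting output is how quickly membership in that top 1% churns.

```python
import random

random.seed(0)
N_USERS, N_DAYS = 10_000, 30

def daily_usage():
    # Hypothetical heavy-tailed usage (lognormal); parameters are made up.
    return random.lognormvariate(0.0, 2.3)

ever_in_top1 = set()
for day in range(N_DAYS):
    usage = sorted(((daily_usage(), user) for user in range(N_USERS)), reverse=True)
    top1 = usage[: N_USERS // 100]
    ever_in_top1.update(user for _, user in top1)
    if day == 0:
        share = sum(x for x, _ in top1) / sum(x for x, _ in usage)
        print(f"Day 1: the top 1% of users generated {share:.0%} of the traffic")

print(f"Users appearing in a daily top 1% at least once over {N_DAYS} days: "
      f"{len(ever_in_top1)} of {N_USERS} ({len(ever_in_top1) / N_USERS:.0%})")
```

Under this (admittedly crude) model, something like a quarter of all users land in some day’s top 1% within a month, which is exactly the point: the daily “hogs” are not a fixed class of people.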

If you want to use analytics, you really have to think about what your statistics mean and how far you can generalize them.  The New York Times (again!), like much of journalism, rarely seems to take the next step and actually think about the numbers.

16 Clue Sudokus

I am sure everyone has seen Sudoku puzzles:  it was quite a fad a few years ago.  The puzzle begins with a 9×9 grid, partially filled with numbers from 1 to 9 (the “givens”).  The goal is to complete the grid so that every row, every column, and each of the nine 3×3 subgrids contains the numbers 1 through 9 with none missing and none repeated.  There was a time in my life when I loved Sudoku puzzles.  I had to give them up when I started to dream solely in Sudoku.

The puzzles can be difficult or easy depending on how straightforward it is to deduce the missing values.  A grid with many givens tends to be very easy, since there are few choices for the missing values (though there can be hard problems with many givens and easy ones with few).  If too few values are filled in, however, the solution may not be unique, and it is a fundamental precept of puzzle-creators that the solution must be unique.  So, how many givens are needed to ensure uniqueness? It has been known for a long time that there exist problems with 17 givens that lead to unique solutions.

Gary McGuire, Bastian Tugemann, and Gilles Civario of University College Dublin proved that this is the best possible:  there is no problem with 16 givens with a unique solution.  They did this in a very interesting way:  an exhaustive search.  Given the number of possible solutions, it is surprising that exhaustive search works, but they were clever in organizing the search and had access to a ridiculous amount of computational power.  This combination let them run through all the choices.  They didn’t find a 16, therefore there is no 16!
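
Their search is organized far more cleverly than anything I would attempt, but the core subroutine is easy to convey. Here is a minimal Python sketch (my own toy, not their code) that counts the completions of a partially filled grid, stopping as soon as it finds two; a set of givens makes a proper puzzle exactly when the count comes back as 1.

```python
def count_solutions(grid, limit=2):
    """Count completions of a 9x9 grid (0 = empty cell), stopping once `limit` is reached.
    A set of givens yields a proper puzzle exactly when this returns 1."""
    def candidates(r, c):
        used = set(grid[r]) | {grid[i][c] for i in range(9)}
        br, bc = 3 * (r // 3), 3 * (c // 3)
        used |= {grid[br + i][bc + j] for i in range(3) for j in range(3)}
        return [v for v in range(1, 10) if v not in used]

    # Branch on the empty cell with the fewest candidates (a big help on sparse grids).
    best = None
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                cand = candidates(r, c)
                if best is None or len(cand) < len(best[2]):
                    best = (r, c, cand)
    if best is None:
        return 1          # no empty cells: the grid itself is the unique completion
    r, c, cand = best
    total = 0
    for v in cand:
        grid[r][c] = v
        total += count_solutions(grid, limit - total)
        grid[r][c] = 0
        if total >= limit:
            break
    return total
```

Roughly speaking, McGuire, Tugemann, and Civario had to rule out every 16-given possibility across all essentially different completed grids, which is where the clever organization and the ridiculous amount of computing came in.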

Fascinating work, and a sign of how fast computing can change how we look at research problems.  Technology Review has an article on this;  the technical report is at arXiv.

Super Exciting News on Super Polynomiality of LP Formulations of the TSP Polytope

Years ago, I spent a very pleasant couple of weeks in a group debunking a claimed linear programming formulation of the Traveling Salesman Problem.  I wrote on this before, and bewailed the fact that I was not smart enough to figure out that there was a general theorem there:  Yannakakis showed that no symmetric linear programming formulation of polynomial size can formulate the TSP.  The “symmetric” part of the theorem was always bothersome, since it seemed an unnecessary addition.  You can make a symmetric formulation “unsymmetric” by doing something goofy like adding a useless variable, but the fundamental result still holds.  Can asymmetry work on a fundamental level? I worked on trying to remove the symmetry assumption, but had no success.  That’s not surprising, since Yannakakis clearly would not have included the requirement if it were easy to remove.  Yannakakis is smarter than Trick.  Therefore Trick cannot remove the requirement.  Q.E.D.  But I still tried for a while.

Fortunately, research moves on, and people learn more and more, and finally enough gets proved that smart people can figure out how to move forward.  It appears that the symmetric requirement can be removed.  Sebastian Pokutta and his coauthors Samuel Fiorini, Serge Massar, Hans Raj Tiwary, and Ronald de Wolf have a paper that proves that any linear programming formulation (symmetric or not) of the TSP must have super-polynomial size.  Sebastian’s blog post has a very nice description of some recent history (including the astonishing result that sometimes the symmetry requirement does matter) and gives pointers to both the paper and some slides.  I have not had time to go through things thoroughly (or at all, really), but the paper seems to be a trove of interesting results.  I feel like starting a Ph.D. seminar next week to study it!

Between Bill Cook’s soon-to-be-released TSP book and this news, 2012 is shaping up to be the year of the Traveling Salesman Problem!

The Perils of Search Engine Optimization

This blog has been around more than six years, making it ancient in the blogosphere.  And, while not popular like the big-boy blogs (I run about 125,000 hits a year with about 1500 RSS subscribers according to FeedBurner), I think I have a reasonable-sized audience for the specialized topic I cover (operations research, of course!).  People recognize me at conferences and I get the occasional email (or, more commonly, blog comment) that lets me know that I am appreciated.  So I like blogging.

The past few months, however, have been a chore due to the amount of comment spam I get.  Of course, I have software to get rid of most of the spam automatically (Akismet is what I have installed), since otherwise it would be unbearable.  Akismet stopped 5,373 spam comments in the last year.  This sounds like a lot, but it is way down from the heights of a few years ago:  Akismet stopped 6,711 spams in the month of March, 2009 alone.  Unfortunately, it is letting a lot more spam come through for me to judge: in the past year, 619 entries were put through to moderation that I determined were spam.  This is a frustrating exercise since I like my readers: if they want to say something, I want them to say it!  But comment after comment from places like “Sacremento Cabs” or “Callaway Reviews” saying vaguely on-topic things is a bit hard to take.  Sometimes it seems that someone has taken the effort to read the blog post and comments:

“From the communication between the two of you I think I can say that I wish I had a teacher like Mr. X and I wish I had a student like Y.”

came in recently, where Mr. X and Y were previous (legit) commentators.  But the URL of the commentator was to some warez site, so I deleted it all.  Is this a human posting, or just some pattern matching?

Why the sudden influx?  Further checking of the logs showed that a number of people (a couple hundred per month) are getting to this blog by searching for something like ‘site:.edu inurl:blog “post a comment”’.  Sure enough, if I do that search (logging out from Google and hoping to get something like a generic search result), I get the following:

Wow!  Of all the .edu blogs that include the phrase “Post a Comment”, I come in at number 3!  Of course, despite my efforts, Google may still be personalizing the search towards me, but clearly I am showing up pretty high to attract the attention of hundreds of people.  Through my diligence and efforts, I have made this blog attractive to the Google algorithms (I do seem to be number 1 for “operations research blog” and some other natural searches).  This is a great Search Engine Optimization success!

Or not.  Because clearly I am attracting lots of people who have no interest in what I have to say but are rather visiting to figure out how they can manipulate me and the blog for their own non-operations research purposes (I am perfectly happy to be manipulated for operations research purposes!).   The sponsored link in the search gives it away: there are companies working hard to get comments, any comments, on blogs (presumably any blogs).   How many of those 125,000 hits were really my audience (people in operations research or those who would like to know more about it)?  Do I really have an operations research audience at all (beyond Brian, Matt, Laura, and a few others who I know personally)?

I’ll spend time thinking about ways to avoid this aggravation.  I’ve already put in NOFOLLOW tags, so there is no SEO value to any URLs that get through.  I already warn that if the URL submitted is not on operations research, then the comment will be deleted.  I could get rid of URLs completely, but I do see legitimate comments as a way of leading people to interesting places.  I could add more CAPTCHAs and the like, though I am having trouble with some of those myself, particularly when surfing with a mobile device.  Or I can put up with deleting useless comments with inappropriate URLs and just relax about the whole thing.  But fundamentally: how can you do Search Engine Optimization to attract those you would like to attract without attracting the attention of the bottom-feeders?

On the off chance that one of the …. shall we say, enterprising souls, has made it through this post, perhaps you can explain in the comments why you add a comment when the topic is of no interest to you.  Do you get a dime for every one that makes it through?  Are you bored and find this a useful way to pass a quiet afternoon?  Are you a program, mindlessly grabbing parts of the post and comments to make yourself look more human?  Add a comment:  I probably won’t let your URL through but at least we can understand each other a bit better.

 

Cure for the Winter Blues

The weather here in Pittsburgh is finally feeling more like winter: it is a dreary, windy day, with intermittent snow. I suspect it will only get worse over the next two months.

Fortunately I know a cure for the winter blues: Optimization! Optimization with the INFORMS crowd, that is, at the Fourth INFORMS Optimization Society Conference, February 24-26, 2012. I’ll be there (I’m the co-chair, so attendance is kinda expected) and I think there is a great slate of plenary and special presentations:

  • Jerald Ault from the University of Miami on using analytics for managing fisheries,
  • Manoj Saxena from IBM on aspects of Watson,
  • Dimitris Bertsimas from MIT on analytics in sports,
  • Suvrajeet Sen from Ohio State on stochastic mixed integer programs, and
  • Dave Alderson from NPS on attacker-defender modeling

(all these are my gloss on the topic: formal titles and abstracts are coming soon).

Nothing like a bit of optimization to make the world a bit brighter!

If you’d like to submit an abstract for this conference, get a move on: the due date is January 6, 2012. Abstracts are short (300 characters) so it shouldn’t take much time to put something together.

Oh, and the conference will be in Miami, which might also do some good for getting away from the winter for many of us.

Operations Research Resolutions for 2012

It is the time of the year for resolutions.  Resolutions help define what we would like to be, but they are damned hard things to follow through on.  As Paul Rubin points out, gyms will be overrun for a few weeks with “resolvers” (or, as my former personal trainer called them, “resolutionists”) before the weight room empties again.  I’ve been as guilty as any in this regard:  it is no coincidence that my gym membership runs January 15 to January 15 and that I renew it each year with the best intentions, and the worst results.  Paul suggests an OR view of resolutions:

…the New Year’s resolution phenomenon offers a metaphor for O. R. practice. The “resolver” diets, exercises, stops smoking or whatever for a while because the “boss” (their conscience) is paying attention.  When the “boss” stops watching, the “resolver” makes excuses for why the new regime is too difficult, and reverts to previous behavior.  An O. R. solution to a business problem that is implemented top-down, without genuine commitment by the people who actually have to apply the solution (and change behaviors in doing so), is likely to end up a transient response leading to a return to the previous steady state.

So, in keeping with an operations research view of resolutions, I’ve been thinking about my resolutions with a particular focus on variables (what choices can I make), constraints (what are the limits on those choices) and objectives (what I am trying to accomplish).  It does no good to define objectives and go willy-nilly off in those directions without also defining the constraints that stop me from doing so.   But, of course, a creative re-definition or expansion of variables might let me find better solutions.

I have personal resolutions, and take inspiration from people around me who are able to transform themselves (yes, BB, I am talking about you!).  But I also have some professional resolutions.  So, here they are, and I hope they have been improved by an operations research view:

  1. Make time for research.  This might seem to be a funny resolution:  isn’t that a big part of what professors do?  Unfortunately, I have taken on an administrative role, and there is a never-ending list of things to do.  Short term, it seems as if I will be a better Associate Dean if I get on with the business of Associate Deaning, but long term I know I will be a better Associate Dean if I keep active in research.  The decision model on time allocation has to be long term, not short term.
  2. Do what makes me happiest.  But where will the time come from?  I need to stop doing some things, and I have an idea of what those are.  I have been very fortunate in my career: I’ve been able to take part in the wide variety of activities of a well-rounded academic career.  Not all of this gives me equal pleasure.  Some aspects (*cough* journals *cough*) keep me up at night and are not my comparative advantage.  So now is the time to stop doing some things so I can concentrate on what I like (and what I am comparatively good at).  While many of the decisions in my life can be made independently, time is a major linking constraint.
  3. Work harder at getting word out about operations research.  This has not been a great year for this blog with just 51 posts.  I don’t want to post just for the sake of posting, but I had lots of other thoughts that just never made it to typing.  Some appeared as tweets, but that is unsatisfying.  Tweets are ephemeral while blog entries continue to be useful long after their first appearance.  This has been a major part of my objective function, but I have been neglecting it.
  4. Truly embrace analytics and robustness.  While “business analytics” continues to be a hot term, I don’t think we as a field have truly internalized the effect of vast amounts of data on our operations research models.  There is still too much of a divide between predictive analytics and prescriptive analytics.  Data miners don’t really think about how their predictions will be used, while operations researchers still limit themselves to aggregate point estimates of values that are better modeled as distributions than as single, predictable numbers.  Further, operations research models often create fragile solutions.  Any deviation from the assumptions of the models can result in terrible situations.  A flight-crew schedule is cheap to run until a snowstorm shuts an airport in Chicago and flights are cancelled country-wide due to cascading effects.  How can we as a field avoid this “curse of fragility”?  (A tiny, made-up illustration appears after this list.)  And how does this affect my own research?  Perhaps this direction will loosen some constraints I have seen as I ponder the limits of my research agenda.
  5. Learn a new field.  While I have worked in a number of areas over the years, most of my recent work has been in sports scheduling.  I started in this area in the late 1990s, and it seems time to find a new one.  New variables for my decision models!
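
Here is the tiny illustration promised in resolution 4. Everything in it is made up (a toy capacity decision with a rare “snowstorm” demand spike, not an actual crew-scheduling model): the plan optimized against the point forecast looks sensible on an average day but gets hammered in expectation, while a plan chosen against the whole distribution pays a little more routinely and far less overall.

```python
import random
import statistics

random.seed(1)

# Made-up demand model: normal operations most days, an occasional big spike
# (the "snowstorm in Chicago" scenario).
def demand_sample():
    return random.gauss(100, 10) if random.random() < 0.9 else random.gauss(220, 30)

scenarios = [max(0.0, demand_sample()) for _ in range(5_000)]

OVERAGE, SHORTAGE = 1.0, 20.0   # cost per unit of idle capacity vs. unmet demand

def expected_cost(capacity):
    return statistics.fmean(
        OVERAGE * max(0.0, capacity - d) + SHORTAGE * max(0.0, d - capacity)
        for d in scenarios
    )

point_plan = statistics.fmean(scenarios)              # optimize for the point estimate
dist_plan = min(range(80, 301), key=expected_cost)    # crude search over the distribution

print(f"plan for the mean forecast: capacity {point_plan:5.1f}, expected cost {expected_cost(point_plan):6.1f}")
print(f"plan for the distribution:  capacity {dist_plan:5.1f}, expected cost {expected_cost(dist_plan):6.1f}")
```
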
OK, five resolutions seems enough.  And I am not sure I have really embraced an operations research approach:  I think more constraints are needed to help define what I can do, my objective is ill-defined, and even my set of variables is too self-limited.  But if I can make a dent in these five (along with the eight or so on my personal list) then I will certainly be able to declare 2012 to be a successful year!

Happy New Year, everyone, and I wish you all an optimal year.

This entry is part of the current INFORMS Blog Challenge.