I like to think there are some solid foundations to my life. I will be able to do the Monday New York Times crossword and I will not be able to do the Thursday version. My dental hygienist will not be satisfied with the amount of flossing I do. I will get my favorite spaghetti meal served on my birthday.
And I can trust American Scientist to write articles that teach me science in an accessible, yet challenging, way. American Scientist is the magazine of the honor society Sigma Xi and is my favorite science magazine since the demise of The Sciences, my all-time favorite science magazine. Or it was.
That particular pillar of my existence crumbled into dust today as I was reading an otherwise interesting article on “Paradoxes, Contradictions, and the Limits of Science” by Noson Yanofsky. In this article, there is a sidebar on the Traveling Salesman Problem that makes me weep.
It is fortunate that they have put this behind their firewall so as to minimize the damage they have done. But let me briefly describe: there is a map of the United States, along with a truly atrocious Traveling Salesman tour through the US, apparently visiting each zip code (approximately 43,000 zip codes). The tour is created by “Hilbert Curves” and is “about 75% optimal”. Then, the inevitable, “Finding the actual shortest route would take a computer essentially an infinite amount of time to compute”. Cue the sidebar: “But what about going to 100 different cities? A computer would have to check 100x99x98x….x2x1 possible routes.” They then helpfully expand that into the 157 digit number of potential routes. “The computer would have see how long the route takes, and then compare all of them to find the shortest route.”
The sidebar continues with the drivel of computer speed and the number of centuries it would take.
No, no. 100! times no. It is not necessary to check all the routes. 100 cities TSPs can be solved to proved optimality in a very short period of time. And even 43,000 zip codes can be solved to optimality with a non-infinite (and quite practical) amount of computation. Using integer programming and specialized techniques, Bill Cook and his gang at Concorde have solved to optimality problems with up to 85,900 cities. The “zip code” problem is just like the problems Cook has solved, just smaller.
The Traveling Salesman problem is NP-complete (or NP-hard depending what you mean by the TSP). But as a computational approach, the “figuring out how big 100! is and that is the time” completely misunderstands the role optimization and algorithms play in truly solving hard problems (I mean it: the true optimal is found and proved to be optimal). My most recent blog post (sadly too long ago) was on this very subject. Perhaps I should rename this blog “Michael Trick’s Rants about Complete Enumeration Arguments”.
Even if you want to call this instance impossible to solve, why would you ever use a Hilbert Curve heuristic for it? Don’t get me wrong: I am a student of John Bartholdi and his paper with Loren Platzman on spacefilling curves and his paper applying this to Meals-on-Wheels scheduling changed my life. But these techniques don’t give great tours (they have lots of other great properties).
It appears American Scientist has taken this tour from this webpage. The published map is the bottom one on the page. The resulting tour is clearly not good. Practically any other modern heuristic would do better. This is no complaint about Robert Kosara, the author of the original article: the page describes what he wanted to do, and he did it well. But for American Scientist to put this forward as an example of the state-of-the-art heuristic approach completely misrepresents the state of the art for heuristic approaches. And I have no idea what 75% optimal means: the tour is longer than optimal, of course.
I don’t know if this means I should not trust American Scientist in areas that are not my specialty. On something I know a bit about, the magazine has failed miserably.
Hi, good rant! The idea to use a Hilbert curve came to me when I was thinking about this years ago. So I did about five minutes of research and Wikipedia’s 75% number was good enough for me. And off I went implementing it (was pretty easy, too).
I didn’t really care how optimal it was, I was mostly after a fun variation on my scribble map. I might spend some time redoing it, since it’s another election year… got any good pointers? I can imagine lots of ways of making this more efficient with quad trees, Voronoi diagrams, etc.
Best,
Robert
Isn’t this a chance to be a scientist and publish a peer-reviewed paper refuting the results in American Scientist, showing the provably optimal tour of all US zip codes?
Well, really disappointing from the readers’ perspective. I would be glad to see operations research on a scientific magazine but it clearly needs to explain better the state-of-the-art techniques otherwise it’s just sad.
You’re just miffed because the “75% optimal” route invades Canada multiple times.Hope they’re doing it in the summer — they have to swim back and forth across Lake Michigan, and it can get a bit cold.
I hope you plan on writing a Letter to the Editor. It at least contact the author.
American Scientist does a good job of allowing readers to debate with authors who publish incorrect material in the magazine. Please do send a letter to the editor.
This is true, but this isn’t constructive. It still doesn’t explain the complexity of a constraint satisfaction problem to journalist, let alone his/her audience.
TSP is a red herring: it has only 1 constraint and human technology can solve 10k+ TSP’s optimally in reasonable time. Add a few arbitrary constraints on it (like any business case in the real world will) and that optimally goes out the window.
Instead, let’s take CVRPTW with a load balancing constraint – how do we explain to the journalist that it takes exponential time to solve? How do we explain that no humans or technology on the planet can solve optimally and scale out? The size of the brute force search space matters. But also the degree of restrictiveness of the constraints matter: not too little and not too much.
Please do write a letter to the editors; it will give them an opportunity to bring the error to the attention of all the magazine’s readers, most of whom will not see this blog post.
Let me further suggest that whenever one of your favorite science magazines disappoints you, think for a moment about how you might make it better. A letter the the editor is a start. But maybe you’d like to write an article yourself? Or just suggest subjects and authors who might produce the kind of work you’d like to read about in those pages.
American Scientist is not a distant and faceless institution. The magazine is not created on another planet and beamed here for the edification (or otherwise) of earthlings. It’s a channel of communication created by and for the entire community of science. The magazine is a nonprofit enterprise with a tiny staff of editors, illustrators, and others. They are very smart and work very hard, but they don’t know everything about every branch of science, mathematics, and engineering. With your help, they’ve got a much better chance of avoiding missteps the next time an article mentions the TSP.
Disclosure: Until recently I wrote a column for American Scientist, and once upon a time I was the magazine’s editor. Not an innocent bystander. Over the years I’ve made my own share of goofs in the pages of the magazine, and if I’ve gotten any better over the years, it’s because of all the people who have taken the trouble to show me the errors of my ways.
so…. hear me out… why not just take UPS’ right-turn-only delivery algorithms and map that out to the rest of the county? they need to be able to touch every house (let alone every zipcode). I know I’m not bright enough to sit in anyone’s lecture, but to say that finding “the most efficient way” is impossible, has I think (IMO) probably already been done – just not in the manner spoken of here…. or I’m just not understanding the problem…
Thank you for pointing out a truly fascinating article. Yanofsky’s topic is really interesting and has many intriguing ideas. As for the minor point about TSP, it remains a NP-Complete problem and there are cases where the problem still demands an exponential amount of time. I also think that talking against a great magazine like American Scientist for a sidebar of an article is a bit silly. Surely you jest!
The problem is much harder than simply trying to visit 43,000 points on the surface of a sphere optimally. Each zip code is a polygon on the surface of the sphere. The constraints say “visit each zip code”. Does that mean one has to be on a road? Or can a person walk, swim and rapel across terrain and water? If a road, please define the meaning of “a road”? Gravel? Off-road 2-track? Dirt bike trail? Even when each of these questions is answered, the problem is much harder than just visiting 43,000 points.
Thanks for all the feedback here. Brian Hayes: your column was the reason I subscribed. It was brilliant! And you are right that there are more positive things I can do. But this _particular_ bit of misinformation is extremely grating (as my previous blog post pointed out).
Aside from all the off-topic drivel on HN, I would also tend to wonder about the articles conclusions (having done this by hand myself). But then, there are very few things nowadays I agree with, news of any kind being something I will usually disagree with. Certainly not worth a blog however.
A couple of small points:
Mike, I would guess that “within 75% of optimal” probably means “within 175% of optimal,” or “with an error within 75% of the optimum.” That’s apparently not the invention of the sidebar writers, but appears in many articles Google turns up about the Hilbert curve heuristic. Nevertheless, Christofedes’s heuristic is already within 150% of optimal. And add my vote to sending a letter.
Don Wills, indeed a realistic routing problem based on this premise is harder to formulate (what points in each zip, what roads count, geodesic vs. Euclidian vs. odometer distance, etc.), much less solve. But the sidebar is still just about the stylized TSP: if you had to visit 43000 points with some distances (presumably at least respecting the triangle inequality), what’s the most efficient way to find the best tour? Certainly not complete enumeration.