Final Four based on Salaries

Payscale.com has filled out the college basketball bracket based on the median salaries of each school’s graduates (5-10 years after graduation). Stanford ($113,000) beats Notre Dame ($99,100) in the final, with Duke ($96,800) and Georgetown ($92,500) the other two in the final four.

I always do my picks based on the quality and quantity of operations research done at the school. My final four this year were George Mason, Wisconsin, Cornell (in a tough division, beating Stanford, Kentucky, Texas, and Pittsburgh along the way) and UCLA, with Cornell winning it all. Needless to say, I won’t be getting paid this year: it appears that OR quality has, at best, a weak correlation with basketball prowess.

Disaster in the making, and averted

According to the New York Times, I am unlikely to be a successful researcher in my chosen field of operations research. The reason? No, not due to an insufficient mathematical grounding, or a fuzzy understanding of methods of symmetry breaking for integer programs, but rather due to a social effect: I like to drink beer. I particularly like to drink beer with other academics. Here at the Tepper School, there is a Friday Beer group that goes out at the end of every week and drinks... yes... beer. At OR conferences, I am likely to be found in the bar talking with friends (and adversaries!), and I have often evaluated a conference on the quality of that bar and those conversations. In fact, as President of INFORMS, I took it as a platform that people should drink more beer together (actually, I advocated a stronger understanding of social capital, but it is a rather thin line). But the New York Times, via a Czech ornithologist, says that is a problem:

What is it that turns one scientist into more of a Darwin and another into more of a dud?

After years of argument over the roles of factors like genius, sex and dumb luck, a new study shows that something entirely unexpected and considerably sudsier may be at play in determining the success or failure of scientists — beer.

According to the study, published in February in Oikos, a highly respected scientific journal, the more beer a scientist drinks, the less likely the scientist is to publish a paper or to have a paper cited by another researcher, a measure of a paper’s quality and importance.

Oh no! I still have aspirations to be OR’s answer to Darwin and Einstein. Am I ruining my chances by partaking in the golden nectar? Is having my “conference preparation” limited to checking out the brewpubs in the area fundamentally flawed?

Fortunately, there are people out there who spend time debunking such myths, and lithographer Chris Mack was on the job. In a brilliant piece of work, Chris provides an excellent summary of what can go wrong in statistical analysis. He sees a number of problems with the analysis:

  1. Correlation is not causation. Perhaps it is poor research that drives one to drink (as alluded to in the Times article), or there is a common factor that drives both (a nagging spouse or an annoying dean, perhaps).
  2. There aren’t many data points: only 34, and the r-squared value is just 0.5.
  3. The entire correlation is driven by the five worst-performing, and heaviest-drinking, researchers (a toy simulation of this effect follows the list).
  4. It is likely those five are drinking with each other, messing up the independence assumption of linear regression.
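To see how much work those five points are doing, here is a toy simulation (the numbers are entirely invented; this is not the Oikos data): a cloud of researchers with no beer-publication relationship at all, plus five heavy drinkers with weak records, produces a respectable-looking r-squared that collapses once the five are dropped.

```python
import numpy as np

rng = np.random.default_rng(1)

# 29 researchers with no relationship at all between beer intake and citations
# (all numbers invented for illustration)...
beer_main = rng.uniform(0, 3, 29)
cites_main = rng.normal(50, 10, 29)

# ...plus five heavy drinkers with weak publication records.
beer_heavy = rng.uniform(8, 12, 5)
cites_heavy = rng.normal(15, 5, 5)

beer = np.concatenate([beer_main, beer_heavy])
cites = np.concatenate([cites_main, cites_heavy])

def r_squared(x, y):
    """Squared Pearson correlation."""
    r = np.corrcoef(x, y)[0, 1]
    return r * r

print("all 34 researchers: r^2 =", round(r_squared(beer, cites), 2))
print("dropping the five:  r^2 =", round(r_squared(beer_main, cites_main), 2))
```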

So, as Chris says:

Thus, the entire study came down to only one conclusion: the five worst ornithologists in the Czech Republic drank a lot of beer.

Whew! That’s a relief. The next time operations research gets together with lithography, I owe Chris a beer.

Thanks to, ironically, my drinking buddy Stan for the pointers.

P.G. Wodehouse approach to Modeling?

P.G. Wodehouse, creator of Bertie and Jeeves, Psmith, and countless other characters reeking of prewar (WWI, not Iraq) England, is one of my favorite authors (my favorite being Patrick O’Brian). When I am sick or down, or simply at loose ends, I love to get out one of my Wodehouse books and get lost in worries of stolen policeman’s hats and avoiding harassment by elderly aunts.

The Basildon Coder has an article entitled “The P.G. Wodehouse Method of Refactoring”. Now, refactoring is pretty important to a lot of operations research. In particular, for linear programming, the basis is normally kept as a sequence of eta-vectors whose application results in the inverse of the basis matrix. As the sequence gets long, there will come a time to refactor the basis matrix, reducing the eta sequence (see here for some details). What-ho! Bertie (or, more likely, Jeeves) has thoughts on this?
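Before Jeeves weighs in: for the curious, here is a toy sketch of that eta-file bookkeeping, purely my own illustration (a dense explicit inverse stands in for the LU factorization a real simplex code would keep, and the class and method names are invented):

```python
import numpy as np

class EtaFile:
    """Basis inverse kept as B0^{-1} plus a growing file of eta updates."""

    def __init__(self, B, refactor_every=20):
        self.B = np.array(B, dtype=float)   # current basis matrix, kept for refactorizing
        self.B0_inv = np.linalg.inv(self.B)
        self.etas = []                      # list of (pivot position r, eta column v)
        self.refactor_every = refactor_every

    def ftran(self, b):
        """Compute B^{-1} b: apply B0^{-1}, then each eta matrix in order."""
        x = self.B0_inv @ np.asarray(b, dtype=float)
        for r, v in self.etas:
            xr = x[r]
            x[r] = 0.0
            x = x + xr * v                  # E x, where E is I with column r replaced by v
        return x

    def replace_column(self, a_q, r):
        """Column a_q enters the basis at position r, as in the revised simplex method."""
        d = self.ftran(a_q)                 # d = B^{-1} a_q
        if abs(d[r]) < 1e-8:
            return False                    # this swap would make the basis (nearly) singular
        v = -d / d[r]
        v[r] = 1.0 / d[r]
        self.etas.append((r, v))
        self.B[:, r] = a_q
        if len(self.etas) >= self.refactor_every:
            # The eta file has grown long: refactorize and start a fresh, empty file.
            self.B0_inv = np.linalg.inv(self.B)
            self.etas = []
        return True

# Toy usage: random column swaps into an initially-identity basis,
# checking the eta-file answer against a direct solve.
rng = np.random.default_rng(0)
m = 6
ef = EtaFile(np.eye(m))
for _ in range(50):
    ef.replace_column(rng.normal(size=m), int(rng.integers(m)))
b = rng.normal(size=m)
assert np.allclose(ef.ftran(b), np.linalg.solve(ef.B, b))
```

The analogy to software refactoring is loose but real: the eta file keeps accreting patches until it is cheaper to rebuild the factorization from scratch.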

Unfortunately, not, but close. Refactoring in this context is “any change to a computer program’s code that improves its readability or simplifies its structure without changing its results”, according to Wikipedia. So when a computer program has been built on over the years, at some point there is a wish to rewrite or otherwise consolidate the code so it is back looking new, ideally with the effects of all the changes intact. Of course, this is relevant to OR also. We often put together models that have been added on to, changed, and modified to within an inch of their lives. And at some point we would like to have the model “we should have written” in the first place. So there is a temptation to toss everything out and start again. In fact, I often recommend doing so in my classes. But that isn’t very good practice, according to contemporary thinking on refactoring:

Now, the first mistake to avoid here is the compulsion to throw it away and rewrite from scratch. So often when confronted with a vast seething moiling spiritless mass of code a developer throws his hands into the air and declares it a lost cause. How seductive is the thought that 31,000 lines of code could be thrown away and replaced with ~15,000 lines of clean, well-designed, beautiful code?

Sadly, that’s often a path to disaster. It’s almost a rule of the game. jwz left Netscape because he knew their decision to rewrite from scratch was doomed. Joel Spolsky wrote a rant about the same decision – in fact, the Netscape rewrite is commonly cited as a major factor in Netscape losing the first browser war.

The problem is that warty old code isn’t always just warty – it’s battle-scarred.

This is a very good point. Back before I found someone (Hi Kelly!) who could work with me on sports scheduling, doing all the hard work (and most of the inventive work), I tossed away my main sports scheduling models (embedded in thousands of lines of C++ code) two or three times. Every time, the first thing I lost was two months as I remembered why I had put some of the more obscure conditions into the models. It was always nice to work with “new” code, but it would have been nicer to work with new, commented code, which never quite seemed to happen.

So where does P.G. Wodehouse come in? When refactoring a large computer code, the author (the Basildon Coder, not Wodehouse) suggests putting up printouts of the code, greatly reduced. The resulting display gives a hint of where code problems occur: lines squeezed to the right are a sign of deep nesting in loops, a dense mass is often overly complicated code, and so on. And this reminded him of Wodehouse:

The first time we pinned up the printouts, I suddenly recalled a Douglas Adams foreword reprinted in The Salmon of Doubt. Adams was a great fan of P.G. Wodehouse, and explained Wodehouse’s interesting drafting technique:

It is the next stage of writing—the relentless revising, refining, and polishing—that turned his works into the marvels of language we know and love. When he was writing a book, he used to pin the pages in undulating waves around the wall of his workroom. Pages he felt were working well would be pinned up high, and those that still needed work would be lower down the wall. His aim was to get the entire manuscript up to the picture rail before he handed it in.

(Adams, 2002)

This actually seems like a pretty good way to write models, and to write papers. When I write a paper, I generally work on a sentence-by-sentence level, and only rarely look at the big picture. So perhaps when you visit me next, I will have two sets of papers pinned to my wall: my current optimization model, and my current journal submission-to-be.

By the way, be sure to click on the illustration to get to the site of Kevin Cornell, who looks to me to be a very talented and humorous illustrator!

Summary of Quantum Computing

Scott Aaronson, whose writings I both admire and am jealous of, has an article in this month’s Scientific American on the limits of quantum computing.  He has posted a preliminary version of the paper on his site.  I found this extremely useful in trying to make sense of what quantum computing can and can’t do.  It is a shame the commenters at Slashdot don’t read the paper before making comments that show their confusion!

A Good List to be On

Sandy Holt of Systems Analysis and Planning, Inc. (and one of my contacts on LinkedIn) wrote to let me know that “operations research analyst” was one of the careers on MSN’s list of “Ten Jobs that pay more than $30/hour”, albeit on the list at number 10 with a median salary of $31.08/hour ($64,650/year). The description of OR analyst isn’t bad:

Operations research analysts are brought into businesses and organizations to identify, investigate and solve logistics problems through the use of statistical analysis and computer programs. The type of problems can vary depending on the nature of the business, whether it’s a production factory or the military.

I find “logistics problems” and the misleading phrase “computer programs” a bit problematic, but it is not the worst description I have seen.

Feb 10:  Did I really not include a pointer to the article?  It is here.

Andy Boyd, Pricing, and “The Engines of Our Ingenuity”

Andy Boyd, formerly chief scientist of PROS (he is actually still on their scientific board, but is not an active employee), visited CMU today as part of our CART (Center for Analytical Research in Technology) seminar series. He talked about the challenges those in pricing research face. The main point he made is that it is very difficult to figure out demand curves (hence elasticity of demand) through data. Even lots and lots of transaction-level data doesn’t help much in generating demand curves. This is not a new insight (economists refer to these sorts of issues as “identification problems”), but it was interesting to hear this from someone who has made a living doing pricing for the last decade. Without demand curves, how can pricing be done? Airlines have enough separate flights (for which you can assume no substitution) to do a fair amount of experimentation. How can other areas get similar data? Further, Andy makes the point that without understanding the sales process, it is impossible to interpret any data. For instance, for a given kind of car, there will be a few sales at a low value, lots of sales at a medium value, and a few sales at a high value. This does not mean that the demand for cars goes up then down as a function of price! Since car prices are generally negotiated, only a few of the best negotiators will get the lowest price.
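To make the car example concrete, here is a toy simulation (all numbers invented, nothing from Andy’s talk or from PROS): every buyer negotiates a discount off the same list price, so transaction prices pile up in the middle even though the underlying willingness to pay, and hence the demand curve, is strictly downward sloping in price.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy negotiated-car-price world (all numbers invented).
list_price = 30_000
n_buyers = 10_000

# True demand is downward sloping: willingness to pay is roughly normal.
wtp = rng.normal(32_000, 4_000, n_buyers)

# Every buyer haggles a discount; negotiating skill is roughly normal,
# so most land in the middle and only a few get the very best deal.
discount = np.clip(rng.normal(2_000, 800, n_buyers), 0, 5_000)
paid = list_price - discount

# Transaction prices: few low, many in the middle, few high...
hist, edges = np.histogram(paid, bins=6)
for lo, hi, n in zip(edges[:-1], edges[1:], hist):
    print(f"${lo:,.0f}-${hi:,.0f}: {n} sales")

# ...yet the underlying demand curve is monotonically decreasing in price.
for p in range(26_000, 38_001, 3_000):
    print(f"share willing to pay at least ${p:,}: {(wtp >= p).mean():.2f}")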

Andy makes a strong case that operations research needs to be applied more in the entire sales domain, from customer segmentation through pricing to negotiation. The lack of underpinning in something as fundamental as a demand curve is a little disconcerting, but he stressed that for many markets (those without “posted prices”), demand curves may be the wrong modeling approach in the first place.

Andy is now “semi-retired” (I guess he did well when PROS went public) but certainly seems to have lots going on. Once a week, he does a radio show on the Houston public radio station. The show is called Engines of Our Ingenuity and Andy does his version on Thursdays. The transcripts are available for the shows. Andy is normally referred to as “guest scientist” but he is sometimes called “operations researcher”, which makes me happy. A recent show of his was on operations research legend George Dantzig, concentrating on his development of the simplex algorithm and his lack of a Nobel Prize. Other episodes involve the four color theorem, mathematical models, parallel computing, and operations research itself, along with much, much more. John Lienhard is the driving force behind The Engines of Our Ingenuity.

Also, Andy has a new book out on pricing entitled The Future of Pricing: How Airline Ticket Pricing has Inspired a Revolution. Andy and I go back more than twenty years: it was great to see him and see all the amazing things he is doing, even if he is “semi-retired”.

Dice, Games, and ORStat

Last year, I received a paper from Prof. Henk Tijms of the Vrije Universiteit Amsterdam on using stochastic dynamic programming to analyze some simple dice games (pdf version available). A few years ago, I tried to do something similar with an analysis of a game I called Flip, but which is more commonly known as “Close the Box” (the paper appeared in INFORMS Transactions on Education). Both Tijms’ work and mine spend a fair amount of time discussing how well certain easy heuristics do relative to optimal decision making in simple games. Ideally, heuristics would get good, but not optimal, solutions: that would make the game challenging as players tried to come up with better and better heuristics. For “Close the Box”, while the optimal decision was quite subtle, some simple heuristics got pretty close (perhaps too close to discern the difference). These games make good classroom demonstrations and even better mini-projects for summer schools and the like. Tijms’ paper was written for a journal aimed at students.
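For a flavor of the dynamic programming involved, here is a minimal sketch of optimal play in a simplified version of the game (my own toy variant with tiles 1 through 9 and two dice rolled on every turn; it is not the exact model in either paper):

```python
from fractions import Fraction
from functools import lru_cache
from itertools import combinations

# Probability of each two-dice total.
ROLL_PROB = {}
for d1 in range(1, 7):
    for d2 in range(1, 7):
        ROLL_PROB[d1 + d2] = ROLL_PROB.get(d1 + d2, Fraction(0)) + Fraction(1, 36)

@lru_cache(maxsize=None)
def expected_score(open_tiles):
    """Expected sum of tiles left open at the end, under optimal play.

    open_tiles is a frozenset of the tiles still open; lower is better."""
    tiles = sorted(open_tiles)
    total = Fraction(0)
    for roll, p in ROLL_PROB.items():
        # Every subset of open tiles that exactly matches the roll is a legal move.
        moves = [c for k in range(1, len(tiles) + 1)
                 for c in combinations(tiles, k) if sum(c) == roll]
        if moves:
            best = min(expected_score(open_tiles - frozenset(m)) for m in moves)
        else:
            best = Fraction(sum(tiles))   # no legal move: score whatever is still open
        total += p * best
    return total

print("optimal expected score:", float(expected_score(frozenset(range(1, 10)))))
```

A simple heuristic, such as always closing the fewest (or largest) tiles that match the roll, can then be simulated against this optimal value to see how much is actually lost.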

Tijms has also done the field a great service by making his applied probability software available; it is a good tool for education. Check it out at his web page.

End of INFORMS Resources Page?

In 1994 I began collecting links about operations research on the internet.  Of course, it was pretty easy at the time.  There were only about 1000 web servers in total, so there were just a handful of OR links.  But there was also gopher and ftp, so I could put together a pretty good page with 30 or so entries.  Over time, “Michael Trick’s Operations Research Page” grew and grew, encompassing a couple thousand pointers.  It is through that page that I became involved in INFORMS, by becoming the founding editor of INFORMS Online.

In late 2000, I finished my terms as editor, but was then elected President of INFORMS (I suspect MTORP had something to do with that).  At that time, I donated MTORP to INFORMS, where it became the INFORMS OR Resources page.  At the time, it was the second most accessed page at INFORMS (after the conference page).  I continued to edit the page, primarily because the software made it pretty easy to do.

Over the past few years, I have been thinking that the time for the resource collection is pretty well over.  The page was started long before Google, and played an important role when finding information on the web was hard.  With Google and its competitors, that is no longer the case.  A quick search can find any page on the web in an instant.  If I want to find something about OR, I go to Google, not the OR Resources page.

The page is actually taking more time these days.  Spammers attack the page, and integrating it into the overall INFORMS Online system is a hassle.  The worst aspect is updating the page.  About a third of the links are no longer valid, but correcting everything needs to be done by hand.  So I am thinking perhaps the time for the system is over.  What do you think?  I see three choices, though I am sure there are more.  I (we) could:

  1. Shut down the page, perhaps replacing it by an edited blog on what’s new on the web in OR (similar to this blog, perhaps).
  2. Continue to limp along roughly as we are doing now.
  3. Find someone else to come in, provide direction and excitement and show what a resource pointer collection can really be!

So, I’m interested:  what do you think we should do?