A Better Random Number Generator?

In operations research, we often use “random” numbers.  Whether we are doing a simulation of a manufacturing facility, generating future economic scenarios in financial optimization, or creating “random” instances to test our algorithms, we use up lots of random numbers.  Terry Pratchett might wonder whether we are using up all the randomness in our need for these values!

For the most part, our “random” numbers are not that at all: they are numbers generated by a particular formula, the results of which pass certain tests for randomness. But any random number generator has flaws and patterns. A “weak” random number generator (think of the middle-square method, where a number is squared and the middle digits are used both as the random number and as the seed for the next squaring) will quickly show its flaws, while a strong one will hide its patterns longer.
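The middle-square method is easy to sketch, and even a few iterations show why it is weak. Here is a minimal illustration (the four-digit seed and the generator interface are my choices for the example, not from any particular reference):

```python
def middle_square(seed, digits=4):
    """von Neumann's middle-square generator: square the current value,
    pad the square to 2*digits, and keep the middle digits as the next
    value (and as the seed for the next squaring)."""
    value = seed
    while True:
        squared = str(value ** 2).zfill(2 * digits)  # pad with leading zeros
        mid = len(squared) // 2
        value = int(squared[mid - digits // 2 : mid + digits // 2])
        yield value

gen = middle_square(1234)
print([next(gen) for _ in range(6)])  # [5227, 3215, 3362, 3030, 1809, 2724]
```

Sequences from this generator quickly fall into short cycles or collapse to zero, which is exactly the kind of flaw a stronger generator hides for much longer.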

But what if you want true randomness?  Or, to avoid philosophical and physical discussions, as random as rolling a die?  The Dice-O-Matic can come to your rescue!  Able to generate more than a million die rolls per day, the Dice-O-Matic is a great addition to the offices of anyone doing work in simulation.  With a footprint of a mere 18″ by 18″ (albeit 7 feet tall), this certainly looks to be about as random as one could hope.  And with the periodic melting and squashing of dice (due to the ire of gamers annoyed at an outcome), any biases will quickly be eliminated.

I think we should all include this on our next NSF grant proposals.

Brenda Dietrich “Most Creative”

Fast Company has Brenda Dietrich, IBM Vice President of Business Analytics and Mathematical Sciences, at number 27 on its list of “100 Most Creative People in Business”. Nice quote from her:

“Mathematics,” says Brenda L. Dietrich, 49, “is not mechanical. You’re finding how things look different on the surface and then seeing what they have in common underneath.”

It would be easy to put Brenda in the top 100 most brilliant or top 100 hardest working.  It takes creativity to put her as one of the 100 most creative, and it is a creativity I am delighted to see.  It can be hard to convince people that our field is primarily about creativity:  rather than paintbrushes, we use mathematics to be creative.

Thanks to the INFORMS e-news for the pointer.

Gurobi software now available for download

Erwin Kalvelagen, who writes an extremely useful blog where many challenging modeling problems are solved (one of my “must read” blogs), beat me to announcing that Gurobi’s standalone software is now available.  I particularly like that the trial version allows 500 variables and 500 constraints, which is large enough to see how the software really works, at least for MIP.  I attended part of a software session at the recent INFORMS conference and left impressed both with the software and with Gurobi’s business plan.

Data Mining Competition from FICO and UCSD

I am a sucker for competitions.  I have run a few in the past, and I see my page on the Traveling Tournament Problem as an indefinite-length computational competition.  Data mining naturally leads to competitions: there are so many alternative techniques out there and little idea of what might work well or poorly on a particular data set.  The preeminent challenge of this type is the Netflix Prize, where the goal is to better predict customer movie ratings, winning a million dollars in the process.  I have written before about the lessons to be learned from this particular challenge (in short: while it might be a nice exercise, it is pretty clear that the improvements given by the algorithm would have little noticeable effect on the customer experience).

FICO (formerly known as Fair Isaac) has sponsored a data mining competition with the University of California, San Diego for a number of years.  The competition is open to all students (and postdocs), and the 2009 competition has just been announced.  The website for the competition is now open, with a finish date for the competition of July 15, 2009.

The data involves detecting anomalous e-commerce transactions and comes in “easy” and “hard” versions.  I have spent a couple of minutes with the data, and it is quite interesting to work with.

I do have one complaint with this sort of data mining.  In my data mining class, I stress that you can do much better data mining if you understand the business context.  This understanding need not be overly deep, but it is hard to analyze data that is simply given as “field1”, “field2”, and so on.  For problems where creating new fields is important (say, aggregating ten types of insurance policies into one new field giving number of insurance policies purchased), if you don’t understand what the data means, it is impossible to generate appropriate new fields.  The data set in this competition has had its fields anonymized so strongly that finding any creative new fields will be more a matter of luck than anything else.
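To make the point concrete, here is a toy sketch of the kind of derived field I mean, with hypothetical policy columns (the field names and records are invented for illustration):

```python
# Hypothetical records: one row per customer, one 0/1 field per policy type.
customers = [
    {"id": 1, "auto": 1, "home": 1, "life": 0},
    {"id": 2, "auto": 0, "home": 0, "life": 1},
]

# A derived field: total number of policies held. Easy to invent when you
# know what the fields mean; hopeless when they are "field1", "field2", ...
for c in customers:
    c["num_policies"] = sum(v for k, v in c.items() if k != "id")

print([c["num_policies"] for c in customers])  # [2, 1]
```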

Despite this caveat, I think the competition is a great chance for students to show off what they have learned or developed.  It would be particularly nice for an operations research approach to do well.  And it doesn’t last forever like the Traveling Tournament Problem or, it seems, the Netflix Prize.

Whither Data Mining?

The New York Times Magazine this week is a special issue on debt (a topic that has a particular resonance for me: we are still paying off an expensive, but spectacular, year in New Zealand!).  There is a fascinating article on what credit card companies can learn about you based on your spending.  For instance:

A 2002 study of how customers of Canadian Tire were using the company’s credit cards found that 2,220 of 100,000 cardholders who used their credit cards in drinking places missed four payments within the next 12 months. By contrast, only 530 of the cardholders who used their credit cards at the dentist missed four payments within the next 12 months.

A factor of 4 is a pretty significant difference.  That should be enough to change the interest rate offered (and a 2.2% rate of missed payments in 2002 is pretty high).  The illustrations to the article go on to suggest that chrome accessories for your car are a sign of much more likely default, while premium bird seed suggests likely on-time payment.
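A quick check of the arithmetic behind that factor of 4, using the figures quoted from the study:

```python
# Rates of missing four payments, from the quoted Canadian Tire study
bar_rate = 2220 / 100000      # cardholders who used the card in drinking places
dentist_rate = 530 / 100000   # cardholders who used the card at the dentist

print(f"bar: {bar_rate:.2%}, dentist: {dentist_rate:.2%}, "
      f"ratio: {bar_rate / dentist_rate:.1f}x")
# bar: 2.22%, dentist: 0.53%, ratio: 4.2x
```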

The article was not primarily about these issues:  it was about how companies learn about defaulters in order to connect to them so that they will be more likely to pay back (or will pay back more).  But the illustrations did get me thinking again about the ethics of data mining.  Is it “right” to penalize people for activities that don’t have a direct effect on their ability to pay back, but only a statistical correlation with it?  Similar issues came up earlier when American Express started to penalize people who shopped at dollar stores.

I brought this up during my ethics talk in my data mining course, and my MBA students were split on this.  On one hand, companies discriminate on statistical correlations a lot:  teenage boys pay more for insurance than middle-aged women, for instance.  On the other hand, it seems unfair to penalize people simply for choosing to purchase one item over another.  Isn’t that what capitalism is about?  But statistics don’t lie.  Or do they?  Do statistics from the past hold equivalent relevance in today’s unusual economy?  Is relying on past statistics making today’s economy even worse?  Should a company search for something with a more direct connection, or is correlation enough?

At the Tepper School at Carnegie Mellon, we generally put more faith in so-called structural models, rather than statistical models.  Can we get at the heart of what makes people default on credit card debt?  For instance, spending more than you earn seems like something that might directly affect the ability to pay back debt.  It is hard to come up with a model where paying for drinks at the bar has a similar effect.  But structural models tend to be pretty reduced-form.  It is hard to include 85,000 different items (as in the study reported by The New York Times) in such models.

I vacillate a lot about this issue.  Right now, I am feeling that data mining like the “spend in a bar implies default on credit cards” can lead to interesting insights and directions, but those insights would not be actionable without some more fundamental insight into behavior.  The level of “fundamentalness”  would depend on the application:  if I am simply deciding on a marketing campaign, I might not require too much insight;  if I am setting or reducing credit limits, I would require much more.

I guess this is particularly critical to me since I often play the bank at our Friday Beers.  Since we might have 20 people at Beers, the tab can reach $300, and I sometimes grab the cash from the table and pay by credit card.  Either the credit card companies have to come up with new rules (“If the tip is > 25% [as it often is with us:  they do like us at the bar!], then credit is OK; else ding the record”), or I had better hit the ATM on Friday afternoons.

Advice to Doctoral Students

Noah Snyder, a doctoral student in mathematics at Berkeley, has a wonderful post on how to be a successful doctoral student (I lost track of where I saw this:  if it is from an OR blog, please let me know so I can give credit.  Thanks to Yiorgos Adamopoulos for his tweet on this).  While there is an emphasis on the situation in mathematics, I think the advice he gives is great (for the most part).  His thoughts revolve around these topics:

  • Prioritize reading readable sources
  • Build narratives
  • Study other mathematicians’ taste
  • Do one early side project
  • Find a clump of other graduate students
  • Cast a wide net when looking for an advisor
  • Don’t just work on one thing
  • Don’t graduate until you have to

I particularly like his “build narratives” advice.  Every good piece of research has a story behind it.  Why is this research being done?  How does it fit?  Why should people care about this research (hint:  “The literature has a hole which I have now filled” is not a particularly evocative story)?  Once that story is found, research directions are clear, and the whole enterprise takes on a new life.  Note that the narratives are not the same as a popularization (like, say, a blog posting);  the example Noah gives is pretty technical:

In the 80s knot polynomials went through the following transition. First people thought about towers of algebras, then they replaced those with skein theory, then they related those to quantum groups. Attempts at categorification have gone backwards through this progression. First Frankel and Crane wanted to categorify quantum groups; when that proved difficult, Khovanov instead categorified the skein theory. Finally, in Khovanov’s HOMFLY homology paper he went all the way back to categorifying towers of algebras and replacing them with towers of categories.

There is a narrative there, and it is one that will resonate with the research audience.

About 22 years ago, when I was a doctoral student trying to finish my dissertation, I had a terrible time putting things together.  I had lots of papers (it was a great time to be a doctoral student at Georgia Tech:  lots of great young faculty to work with) but I couldn’t put them together into a dissertation.  I remember getting snowed in during a rare Atlanta snowstorm and sitting down with a stack of index cards, on which I wrote all that I knew about or wanted to explore further.  I spent the next day or two shuffling the cards and putting them in different groupings and orders.  Out of that, a story finally appeared.  I don’t think it was a great story (the title of my dissertation was “Networks with Specially Structured Side Constraints”), but it was enough of a story to shape the dissertation and provide a research agenda for the first years after graduation.

I see a lot of students present work who don’t go beyond “My advisor told me this was a good problem”.  Thinking more about the narrative would do a lot of us good.

The only piece of advice I would disagree with from Noah’s post is the last one, where he suggests delaying graduation.  This may be forced by his field, which is extremely competitive, but I strongly disagree with that advice in operations research.  In general, I think students should get what they can out of a doctoral program and push themselves to move on.  Our doctoral students make perhaps $20,000 a year or a bit more:  a comfortable graduate student living, of course.  But faculty salaries will be many times that amount, and even postdoc positions pay better.  Don’t stay in school hoping for that one more paper that will put you over the top.  If you have only one more paper in you, you won’t last in this field anyway.  If you have more than that, write them at the next phase of your career.

I am not suggesting rushing through your doctoral program, but once you finish your fifth year here at CMU (we admit students from the bachelors degree, and give a masters degree along the way), we’d really like for you to think about moving on, and I think that is a pretty good general outline.

If you are a doctoral student, or a researcher of any age for that matter, I highly recommend reading Noah’s comments.

Computational Sustainability

Carla Gomes from Cornell visited here a few weeks ago.  I have known Carla for a decade or so, and she has been one of the people who I have found very useful to talk to when trying to figure out the world of constraint programming.

Carla gets involved in lots of things.  She (along with others, of course, though she is the Lead PI) received a huge grant from NSF to form an Institute for Computational Sustainability, which I think is a wonderful term for an interesting topic.  The vision statement for this activity is quite provocative:

Computer scientists can — and should — play a key role in increasing the efficiency and effectiveness of the way we manage and allocate our natural resources, while enriching and transforming Computer Science.

Now, I consider Carla to be in operations research, even though she calls herself a computer scientist, and the topics that the Institute addresses have a strong operations research feel. One example: how can you most effectively create wildlife corridors to connect populations of endangered animals? If that is not an OR problem, I don’t know what is!

The Institute is holding its first conference in a few weeks, with registration open until May 22. The conference has an extremely broad group of speakers, and I suspect most of the conference will be spent trying to understand each other’s terminology, skills, and interests. Looks like it will be a fun week!

Optimal Cleaning Paths

Yesterday I twittered:

Doing too much operations research. Spent more time figuring out optimal mowing pattern than mowing lawn.

Today, I came across a picture of a Roomba’s path to clean the floor of an L-shaped room (through a number of sites, but I think I am referring to the original). I think I am a little more efficient in lawnmowing, but, then again, I know the shape of my lawn: Roombas figure out the shape dynamically. It would be an interesting OR problem to come up with the optimal Roomba algorithm.
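For a lawn of known shape, the natural baseline is a boustrophedon (back-and-forth) sweep. Here is a minimal sketch for a rectangular lawn, with the rectangle dimensions and swath width as illustrative parameters (a Roomba, discovering the room shape online, cannot plan like this in advance):

```python
def boustrophedon(width, height, swath=1):
    """Return mowing waypoints covering a width x height rectangle,
    sweeping back and forth in horizontal strips of the given swath width,
    alternating direction on each strip."""
    path = []
    y = swath / 2          # center the first strip
    left_to_right = True
    while y < height:
        xs = (0, width) if left_to_right else (width, 0)
        path.append((xs[0], y))   # start of the strip
        path.append((xs[1], y))   # end of the strip
        left_to_right = not left_to_right
        y += swath
    return path

print(boustrophedon(3, 2))  # [(0, 0.5), (3, 0.5), (3, 1.5), (0, 1.5)]
```

Minimizing total travel once the lawn has obstacles or an irregular boundary is where the problem starts to look like real OR.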

Arnoff Lecture by Keeney, Having Children, and Decision Analysis

Last year around this time, I was giving the 17th Arnoff Lecture at the University of Cincinnati, which was a great thrill. This year, the Arnoff Lecturer was Ralph Keeney who spoke on “Making Informed Business, Health, and Personal Decisions”. Keeney was the author of an OR Forum Paper on how personal decisions are a leading cause of death. The organizers have put a video of the lecture online.

The lecture contained a review of Ralph’s work on personal health decisions and his earlier work on placing a nuclear waste site. The third topic was my favorite: when should a woman decide to start a family? Being an older parent (I am 49, with Alexander turning 5 in a couple of weeks), I find it interesting to see how such a decision analysis might be done. As near as I can recall, this was a non-issue until I was 44, when Ilona told me I was about to be a father (at which point I watched “Finding Nemo” to see a fatherly role I might have to take on). Ilona might have a different view of the decision process, though I do not believe it involved explicit utility curves.

What Motivates Operations Researchers?

I was wandering through the Social Science Research Network (a place to which I don’t often go) and I checked out the top operations research papers. The most downloaded paper is by Martin Shubik on 50 years of operations research and game theory. This is an older (2001) paper but makes fascinating reading. I was struck by a paragraph on what motivates people in operations research:

This is illustrated by an event which happened to me at General Electric and another which happened to George Feeney at Stanford Research Institute. I was complaining to one of the vice-presidents that in spite of the fact that General Electric in the 1950s had hired a first class group of operations researchers, the management except for Harold Smiddy (Smiddy and Naum 1954 is still worth rereading) did not appreciate us. Jack McKitterick (the VP of marketing) replied that the trouble with the executives at General Electric was that they had not understood the basic motivation of the group they had hired. He said if they had to do it over again they would have paid us half as much but would have hired a special manager to stroke us and to go around telling each of us how smart we were. George Feeney’s experience at Stanford Research involved explaining what operations research was to one of their vice-presidents who reacted immediately. “I see”, he said, “operations research involves utilizing big minds to work on small problems.”

Ouch. I’d argue with this, but I just spent two days developing an improved optimization approach to scheduling soccer games for five year olds.