New Use for Abstracts

For a previous post on data mining, I received the following comment from “liseli bakire”:

Abstract
The purpose of this article is to investigate some managerial insights related to using the all-unit quantity discount policies under various conditions. The models developed here are general treatments that deal with four major issues: (a) one buyer or multiple buyers, (b) constant or price-elastic demand, (c) the relationship between the supplier’s production schedule or ordering policy and the buyers’ ordering sizes, and (d) the supplier either purchasing or manufacturing the item. The models are developed with two objectives: the supplier’s profit improvement or the supplier’s increased profit share analysis. Algorithms are developed to find optimal decision policies. Our analysis provides the supplier with both the optimal all-unit quantity discount policy and the optimal production (or ordering) strategy. Numerical examples are provided. © 1993 John Wiley & Sons, Inc.

Looks vaguely relevant, though not really pertinent to the topic.  My spam filter had no chance on it:  it passed it through as legit.  But why this abstract?  Ah, “liseli” has a porn site linked to the name (and it turns out the name means “high school virgins” in Turkish).  All part of the cat and mouse game as people try to get links on to the all-powerful “Michael Trick’s Operations Research Blog”.   But extra points for actually using an operations research oriented abstract!

By the way, the paper with that abstract is by Kevin Weng and Richard Wong, “General models for the supplier’s all-unit quantity discount policy,” Naval Research Logistics, 1993.  Looks like an interesting paper!

Optimizing Discounts with Data Mining

The New York Times has an article today about tailoring discounts to individuals.    They concentrated on Sam’s Club, a warehouse chain.  Sam’s Club is a good place for this sort of individual discounting since you have to be a member to shop there, and your membership is associated with every purchase you make.  So Sam’s Club has a very good record of what you are buying there.  (In fact, as a division of Walmart Stores, perhaps Sam’s has an even better picture based on the other stores in the chain, but no membership card is shown at Walmart, so it would have to be done through credit card or other information.)

The article stressed how predictive analytics could predict what an individual consumer might be interested in, and could then offer discounts or other messages to encourage buying.

Given how many loyalty cards I have, it is surprising how few really take advantage of the data they get.  Once in a while, my local supermarket seems to offer individualized coupons.   Barnes and Noble and Borders seem to offer nothing beyond “Take 20% off one item” coupons, even though everything in my buying behavior says “If you hook me on a mystery or science fiction series, I will buy each and every one of the series, including those that are only in hardcover”.

Amazon does market to me individually, seeming to offer discounts that may be designed for me alone (online retailers can hide individual versus group discounts very well:  it is hard to know what others are seeing).

For both Sam’s and Amazon, though, I would be worried that the companies would be using my data against me.  If the goal is to optimize net revenue, any optimal discounting scheme would have the following property:  if I am sufficiently likely to buy a product without a discount, then no discount should be given.  The NY Times article had two quotes from customers:

“There’s a dollar off Bounce. I use that all the time.”

and

“[A customer]  said the best eValues deal yet was $300 off a $1,200 television.

“I remember that day,” he said later. “I came to buy food, and I bought two TVs.”

The second story is a success for data mining (assuming the company made a profit off of a $900 TV):  the customer would not have purchased without it.

In the first story, the evaluation is more complicated:  if she really was going to buy Bounce anyway, then the $1 coupon was a $1 loss for Sam’s.  But consumer behavior is complicated:  by offering small discounts on many items, Sam’s encourages customers to buy all of their items there, not just the ones on discount.  So the overall effect may be positive.  But optimal discounting for these sorts of interrelated items with a lifetime environment is pretty complicated.
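The “no discount if I am likely to buy anyway” property can be made concrete with a single-item, single-visit comparison of expected profit.  This is only a sketch:  the purchase probabilities and margins are made-up numbers, the function names are hypothetical, and it deliberately ignores the store-loyalty and lifetime effects mentioned above.

```python
def expected_profit(p_buy: float, margin: float, discount: float = 0.0) -> float:
    """Expected profit on one item: purchase probability times margin net of discount."""
    return p_buy * (margin - discount)


def should_discount(p_without: float, p_with: float, margin: float, discount: float) -> bool:
    """Offer the discount only if it raises expected profit on this one item."""
    return expected_profit(p_with, margin, discount) > expected_profit(p_without, margin)


# Hypothetical habitual buyer: buys 95% of the time anyway, 99% with $1 off a $3 margin.
habitual = should_discount(p_without=0.95, p_with=0.99, margin=3.0, discount=1.0)

# Hypothetical impulse buyer: buys 2% of the time at full price,
# 30% of the time with $300 off an assumed $400 margin.
impulse = should_discount(p_without=0.02, p_with=0.30, margin=400.0, discount=300.0)

print("discount the habitual buyer?", habitual)
print("discount the impulse buyer?", impulse)
```

Under these assumed numbers the habitual buyer gets nothing and the impulse buyer gets the coupon, which is exactly the behavior that worries me as a customer.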

But here is a hypothetical situation (presumably):  it turns out that 25 year olds (say) are at a critical point in purchasing behavior when they decide exactly what brands they will purchase for the rest of their lifetime;  50 year olds are set in their ways (“I always buy Colgate, I never buy Crest”).  A 25 year old goes into Sam’s, hits the kiosk and walks away with 10 pages of coupons;  a 50 year old gets nothing.  Is this a success for data mining?  Perhaps the answer depends on whether you are 25 or 50!

And, more importantly for me, does Amazon not give me discounts once it is sufficiently certain I am going to want a book?

World Cup Forecast Pool, with a Twist

The Brazilian Society of Operations Research is organizing a competition for predicting the results of the group stage at the upcoming World Cup.  If you have to ask for which sport, you probably aren’t the target audience:  it is for football (aka soccer).  Many sites have such pools for many sports:  for US college basketball, the OR blogs are practically given over to the topic every March.

This competition is a bit different though:  you aren’t allowed to simply guess the winners of each game.  First, you need to give probabilities of win, loss or tie, with scoring based on squared errors.  Second, you need to use some sort of model to generate the predictions, and be willing to describe that model.  And no fair modifying your results to better fit your own ideas!  You need to stick to the model’s predictions.    There are three subcompetitions, with track A allowing fewer types of information than track B, and track C limited to members of the Brazilian OR society.
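Scoring “based on squared errors” is in the spirit of the Brier score.  Here is a minimal sketch, assuming the score for one match is the sum of squared differences between the forecast probabilities and the 0/1 indicator of what actually happened (the competition’s exact rule may differ, and the function name and probabilities below are made up for illustration):

```python
OUTCOMES = ("win", "tie", "loss")


def squared_error_score(forecast: dict, actual: str) -> float:
    """Sum of squared errors between forecast probabilities and the realized outcome.

    Lower is better; 0.0 means the forecast put probability 1 on what happened.
    """
    if actual not in OUTCOMES:
        raise ValueError(f"unknown outcome: {actual}")
    if abs(sum(forecast[o] for o in OUTCOMES) - 1.0) > 1e-9:
        raise ValueError("forecast probabilities must sum to 1")
    return sum((forecast[o] - (1.0 if o == actual else 0.0)) ** 2 for o in OUTCOMES)


# Hypothetical forecasts for a single match.
confident = {"win": 0.80, "tie": 0.15, "loss": 0.05}
hedged = {"win": 1 / 3, "tie": 1 / 3, "loss": 1 / 3}

for name, forecast in (("confident", confident), ("hedged", hedged)):
    for actual in ("win", "loss"):
        score = squared_error_score(forecast, actual)
        print(f"{name} forecast, actual {actual}: {score:.3f}")
```

A confident forecast that turns out right beats hedging, but a confident forecast that turns out wrong is punished heavily, which is why simply naming a winner is not enough under this kind of rule.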

There is quite a bit of past data on the players and teams, so perhaps it is possible to create useful models.  I look forward to seeing the results.  Deadline for entries is June 7, with games starting June 11.

Martin Gardner has Passed Away

Martin Gardner has passed away.  I know I am not the only person in operations research who was inspired by Gardner’s Mathematical Games columns in Scientific American.  I have a strong memory of whiling away long high school physics classes reading Gardner’s columns (and thankful that a patient and insightful physics teacher had a stack of Scientific Americans and did not mind my lack of attention to his teaching).  A large number of the columns were really about operations research problems:  “What is the best way to do this?”  “How few moves are needed to accomplish that?”  It is through his columns that I understood the breadth and beauty of mathematics, and how that world was accessible even to a high school student.  And that high school students and professional mathematicians could work on the same problem and each have something to contribute.

When I went to university, it took me some time to find the type of mathematics that inspired me the way Gardner’s columns had.  I found it in operations research, and I am thankful to Martin Gardner for showing me what to look for.

Culling Journals Time!

It is that time of the year when our librarian asks us to consider whether or not to continue subscribing to journals.  In the past, journals have been identified by “percentage increase” with the idea that those whose increase is high need special attention to determine if they are still valuable.  This assumes that we had made good decisions in the past:  if a “bad” journal keeps its increase low enough, it doesn’t show up on the radar screen.  A low priced, but valuable journal with a “big” one-time increase gets special scrutiny.  But which should get more attention: a journal going up $60 on a base of $600 or an equivalent quality journal going up $200 on a base of $5000?  Ordering by percentage increase means the first gets much more attention, but rational budgeting suggests looking carefully at the second.  While those values seem extreme, that is roughly what happens when comparing Management Science (as the “inexpensive” journal) and European Journal of Operational Research (whose price to Carnegie Mellon is $5855 per year).
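The two orderings are easy to compare directly.  A small sketch using only the two illustrative journals from the example above (the labels and the helper function are my own naming, not anything our librarian uses):

```python
# (label, base price in dollars, increase in dollars), taken from the example above
journals = [
    ("inexpensive journal", 600, 60),
    ("expensive journal", 5000, 200),
]


def percent_increase(base: float, increase: float) -> float:
    """Price increase as a percentage of the base price."""
    return 100.0 * increase / base


# Old approach: the biggest percentage increase gets the attention.
by_percent = sorted(journals, key=lambda j: percent_increase(j[1], j[2]), reverse=True)

# Budget view: the biggest dollar increase gets the attention.
by_dollars = sorted(journals, key=lambda j: j[2], reverse=True)

for label, base, increase in journals:
    print(f"{label}: +${increase} on ${base} = {percent_increase(base, increase):.0f}%")
print("first by percentage:", by_percent[0][0])
print("first by dollars:", by_dollars[0][0])
```

The $600 journal tops the percentage list at 10% against 4%, while the $5000 journal takes more than three times as many dollars out of the budget.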

This year, our librarian simply listed all journals above $500 and asked us to look those over.  Here are the ones in operations research/operations management we are considering:

INTERNATIONAL JOURNAL OF PRODUCTION RESEARCH $8,615
EUROPEAN JOURNAL OF OPERATIONAL RESEARCH $5,855
JOURNAL OF THE OPERATIONAL RESEARCH SOCIETY $1,840
COMPUTATIONAL OPTIMIZATION AND APPLICATIONS $1,166
ZEITSCHRIFT FUR OPERATIONS RESEARCH aka Mathematical Methods of Operations Research $898
OPERATIONS RESEARCH LETTERS $815
JOURNAL OF OPERATIONS MANAGEMENT $637

The INFORMS journals don’t make the list since the bundled rate puts them under $500/journal.

What to do with these?  Fortunately, I have already done some checking on journal influence and pricing.

Let’s start with the first journal listed above: “International Journal of Production Research” (Taylor and Francis). If we take a “cost per eigenfactor” value, that journal ranks 39th in the “Operations Research” ranking.  I have never considered publishing in the journal, so I don’t know much about it.  It does publish a lot of articles (24 issues per year, with around 15-20 articles per issue).  I recognize a couple of names on its editorial board.  Harzing’s indispensable Publish or Perish shows that it has a fair number of papers published with 100+ Google Scholar cites.  Overall, not a bad or junk journal, but for $8615, I would want much more.  So I would be biased towards dropping, but will bow to my colleagues in operations management on how they feel.  Near as I can tell, none of them have published in the journal either, so that might be a good one to cut.

European Journal of Operational Research (Elsevier) is a difficult one for me.  I have published two papers there recently, and it is a key outlet in operations research.  Since they publish many papers (24 issues/year times perhaps 25 papers per issue), the journal is important to our field.  Going through the same steps as above, the journal is 12th in cost per eigenfactor and number 1 in overall eigenfactor, I certainly know and admire much of the editorial board, and there are many papers above 100 cites.  I’m not crazy about a $5855 cost, but I think we are stuck paying it.

Journal of the Operational Research Society (Palgrave) would be a hard one for me to cut.  Sometimes it veers off in directions I am not crazy about (the dreaded “soft OR” versus “hard OR” debate) but it offers a nice mix of theory and application, along with the odd interesting historical piece.  At number 15 in the cost/eigenfactor listing, I think it is safe.

Computational Optimization and Applications (Springer) is a journal I have published once in, and is part of the discussion when trying to place some of my work.  It is down the list at 25th in cost/eigenfactor, but has an admirable board.  Not a huge number of papers with 100+ cites (14 in my search), but pretty reasonable.  I think it is OK.

Zeitschrift fuer Operations Research (Mathematical Methods of Operations Research) (Springer) has a long history, going back to the days when it made sense to talk about a country’s operations research journals.  But, like “operations research” groups in Fortune 500 companies, country-oriented OR journals are finding it hard to compete.  In fact, I am finding it hard to parse out exactly what the history is here, but this appears to be the combination of a couple of different journals.  In any case, at 28 in the cost/eigenfactor listing, it is clear it needs more papers like “Modeling of Extremal Events in Insurance and Finance” by Embrechts and Schmidli (1994) (at an impressive 2051 Google cites) if it is going to survive.  So keep for now, but give it a stern eye.

OR Letters (Elsevier) is a natural keep at number 9 in the cost/eigenfactor listing.  It is as close as we get to the rapid publication system that works so well in portions of computer science through their competitive conference publication system.

Journal of Operations Management (Elsevier), at number 10 in the cost/eigenfactor listing and a Financial Times journal to boot (meaning it is used to gauge research impact in the Financial Times ranking of business schools), is also a natural keep.

OK, there you have it:  I would toss out the International Journal of Production Research on the basis of stupid pricing but keep the rest.   We’ll see if my colleagues vote to keep it around for another year.

More Operations Research in the News, but not in a Welcome Way

Fabrice Tourre, “the fabulous Fab”, who is at the center of the Goldman Sachs scandal, is a 2001 graduate of Stanford University. That, in itself, is no surprise. Stanford has a top ranked business school that does about as well as the rest of us in graduating ethical MBAs (by that I mean MBAs who do, on the whole, try to act ethically, but some of whom find ethical challenges … challenging), so it is not surprising that a powerful Goldman Sachs person would come from there. But what is surprising is that Fabrice’s Stanford degree is a Masters in Operations Research! Our field is in the news!

Thinking about it, it is not so surprising. Since Fabrice is reported to be 31, a 2001 graduate would have been 22. Most business schools like to see at least a little work experience, so 22 year-olds with an MBA would be quite unusual. A Masters in Operations Research would be more common, I would think.

I can’t tell if the fabulous Fab did anything wrong, let alone illegal, but this does bring up an issue in training. At business schools, we are working hard to think about how to include ethics and other aspects of corporate social responsibility into the curriculum (with varying levels of success). What are operations research programs doing to ensure that their masters graduates are aware of the choices they make? Checking Georgia Tech, Michigan, Stanford Management Science and Engineering (is there still an MSOR from Stanford?), and Cornell (not to pick on them, but to pick a few of the best programs out there), does not lead one to believe that ethics, corporate responsibility or a traditional “engineering professional responsibility” course is part of the masters curriculum.  This is not to suggest that we are putting out a generation of unethical lying optimizers, but perhaps we should rethink the balance of our programs.  I do believe operations research to be outstanding training for a wide variety of careers:  going beyond linear and integer programming into some of the challenges of the real world would be a good direction to go for the sake of the students, and for the rest of us.

Conditional Probability in the New York Times

When you ask a question of the form “What are the chances of X given Y”, you are asking a question of conditional probability. These sorts of questions have come up in this blog before: “What are the chances of cancer given a positive test result?” “What are the chances a monkey prefers blue M&Ms to green M&Ms, given it prefers red M&Ms to blue M&Ms?” “What are the chances of predicting the NCAA tournament perfectly, given perfect predictions for the first two rounds?”

Conditional probability is extremely important for two reasons. First, it occurs all the time: it is a fundamental building block as we aggregate information in an uncertain environment. Second, people are really, really bad at it. In case after case, our intuition misleads us and we badly misestimate conditional probabilities. When a 90% accurate drug test (meaning it is positive 90% of the time for a drug user, and negative 90% of the time for a nonuser) comes back positive, what is the probability the person uses drugs? Our intuition screams “It has to be 90%”! But the probability of “User given positive drug test” is not the same as the probability of “positive drug test given user”. If 5% of the population use drugs, then the probability of “User given positive drug test” is about 1 in 3. Consider 1000 people: 50 are drug users so 45 will test positive; of the 950 non-users, 95 will test positive; so 45/(45+95), or about 32%, is the probability of user given positive test.
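The counting argument takes only a few lines to write down. A minimal sketch using the numbers from the example (1000 people, 5% users, a 90% accurate test); the function and variable names are just for illustration:

```python
def count_positives(population: int, base_rate: float, accuracy: float):
    """Count true and false positives in a population, then the share of positives who are users."""
    users = population * base_rate
    nonusers = population - users
    true_positives = users * accuracy            # users correctly flagged
    false_positives = nonusers * (1 - accuracy)  # non-users wrongly flagged
    share = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, share


true_pos, false_pos, p_user_given_positive = count_positives(1000, 0.05, 0.90)
print(f"true positives:  {true_pos:.0f}")
print(f"false positives: {false_pos:.0f}")
print(f"P(user | positive test) = {p_user_given_positive:.3f}")
```

The non-users are so numerous that even a 10% error rate among them produces more positives than the users do.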

Note that in the argument above, I did not rely on the main theorem in conditional probability: Bayes Theorem. Bayes Theorem states P(A|B) (the probability of A given B) = P(B|A)P(A)/P(B). I could have worked it out that way, but in doing so I would have lost all intuition as to the result. For simple cases, the counting approach is much easier and shows why the result is what it is.
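For comparison, here is the same drug test example worked through the formula. This is a sketch with illustrative names; P(B), the overall chance of a positive test, is obtained from the law of total probability, which is a step the statement of the theorem above leaves implicit:

```python
def bayes(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


p_user = 0.05                # P(A): the person uses drugs
p_pos_given_user = 0.90      # P(B|A): positive test given user
p_pos_given_nonuser = 0.10   # positive test given non-user (test is 90% accurate)

# P(B): overall probability of a positive test, via the law of total probability.
p_pos = p_pos_given_user * p_user + p_pos_given_nonuser * (1 - p_user)

p_user_given_pos = bayes(p_pos_given_user, p_user, p_pos)
print(f"P(positive) = {p_pos:.3f}")
print(f"P(user | positive) = {p_user_given_pos:.3f}")
```

The answer agrees with 45/(45+95), but nothing in the formula tells you why it is so far from 90%.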

This argument is at the heart of Steven Strogatz’s excellent article “Chances Are”, online at the New York Times (thanks for the pointer, Matt). He gives some excellent examples of conditional probability, including a great riff on the O.J. Simpson murder trial.

The prosecution spent the first 10 days of the trial introducing evidence that O.J. had a history of violence toward his ex-wife, Nicole. He had allegedly battered her, thrown her against walls and groped her in public, telling onlookers, “This belongs to me.” But what did any of this have to do with a murder trial? The prosecution’s argument was that a pattern of spousal abuse reflected a motive to kill. As one of the prosecutors put it, “A slap is a prelude to homicide.”

Alan Dershowitz countered for the defense, arguing that even if the allegations of domestic violence were true, they were irrelevant and should therefore be inadmissible. He later wrote, “We knew we could prove, if we had to, that an infinitesimal percentage — certainly fewer than 1 of 2,500 — of men who slap or beat their domestic partners go on to murder them.”

In effect, both sides were asking the jury to consider the probability that a man murdered his ex-wife, given that he previously battered her. But as the statistician I. J. Good pointed out, that’s not the right number to look at.

The real question is: What’s the probability that a man murdered his ex-wife, given that he previously battered her and she was murdered by someone? That conditional probability turns out to be very far from 1 in 2,500.

Turns out that probability is about 90%.
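A counting argument gets to that figure quickly. In the sketch below, the 1-in-2,500 rate is Dershowitz’s number quoted above; the background rate of 1 in 20,000 women murdered per year by someone other than an abuser is an assumption (it is not stated in this post), as are the sample size and the variable names.

```python
battered_women = 100_000

# Dershowitz's figure: about 1 in 2,500 battered women is murdered by her abuser.
killed_by_batterer = battered_women / 2_500

# ASSUMPTION: about 1 in 20,000 per year is murdered by someone else.
killed_by_someone_else = battered_women / 20_000

murdered = killed_by_batterer + killed_by_someone_else
p_batterer_given_murdered = killed_by_batterer / murdered

print(f"murdered by batterer:     {killed_by_batterer:.0f}")
print(f"murdered by someone else: {killed_by_someone_else:.0f}")
print(f"P(batterer did it | battered and murdered) = {p_batterer_given_murdered:.2f}")
```

Conditioning on the fact that she was murdered shrinks the relevant population from 100,000 to a few dozen, and that is what moves the answer from 1 in 2,500 to roughly 9 in 10.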

We are in the midst of a curriculum review at the Tepper School of Business and are considering what we absolutely have to be sure our MBAs know. Being able to work with conditional probability is very high on my list: it is an area where your intuition will almost surely lead you astray. And, as the Strogatz article points out, while Bayes Rule may get you the right results, simple counting arguments are much more convincing.

What is Operations Research?

Over on the suddenly active OR-Exchange, David Woods asked the question:

What are the best quick definitions describing operations research? The kind that you’d give to someone if you only had the duration of an elevator ride to describe it…

David then goes on to answer his question with a fantastic answer:

“Operations research is the art and science of obtaining bad answers to questions to which otherwise worse answers would be given.”

I love that! But I’m not sure that outsiders will really get it. Paraphrasing Steve Martin, “I stopped using irony when I realized I was the only one using it.”

So I’m left with “Operations Research is about making better decisions through mathematical models. Say, are you a baseball fan?” where I hope to squeeze in a story about operations research and sports scheduling. I’m hoping for a better line through the answers at OR-Exchange.

New Optimization Software Version: Gurobi

The INFORMS Practice Meeting has become a good place for optimization software firms to announce their new versions. Gurobi is first off the mark with an announcement of version 3.0. New aspects include better use of multiple cores in the barrier solver and what looks to be significant improvements to the mixed integer programming solver. Quadratic programming will wait for version 4.0, with an expected release in November, 2010.

Future plans include second order cone programming (SOCP), including a mixed integer version. I really think I should know more about SOCP, but it doesn’t seem to fit with me: I see mixed integer linear programs everywhere I look, but never say “Wow, now there is a SOCP”. Perhaps I’ll start seeing them in time for the Gurobi release a couple of versions down the road.

Follow INFORMS Practice from your Own Home

Sadly, I’m not at INFORMS Practice, but the blog entries and tweets make me feel like I am there (or perhaps they remind me I am not). Lots of interesting things happening and the conference hasn’t even started yet! Coming up shortly: the technology workshops, followed by the Welcome Reception tonight. I’ve got the tweets in my sidebar, and highly recommend following the conference page for the blog entries.