Optimizing Discounts with Data Mining

The New York Times has an article today about tailoring discounts to individuals.  It concentrates on Sam’s Club, a warehouse chain.  Sam’s Club is a good place for this sort of individual discounting since you have to be a member to shop there, and your membership is associated with every purchase you make.  So Sam’s Club has a very good record of what you are buying there.  (In fact, as a division of Walmart Stores, perhaps Sam’s has an even better picture based on the other stores in the chain, but no membership card is shown at Walmart, so it would have to be done through credit card or other information.)

The article stressed how predictive analytics could identify what an individual consumer might be interested in, so that the retailer could then offer discounts or other messages to encourage buying.

Given how many loyalty cards I have, it is surprising how few retailers really take advantage of the data they get.  Once in a while, my local supermarket seems to offer individualized coupons.  Barnes and Noble and Borders seem to offer nothing beyond “Take 20% off one item” coupons, even though everything in my buying behavior says “If you hook me on a mystery or science fiction series, I will buy each and every one of the series, including those that are only in hardcover”.

Amazon does market to me individually, seeming to offer discounts that may be designed for me alone (online retailers can hide individual versus group discounts very well:  it is hard to know what others are seeing).

For both Sam’s and Amazon, though, I would be worried that the companies would be using my data against me.  If the goal is to optimize net revenue, any optimal discounting scheme would have the following property:  if I am sufficiently likely to buy a product without a discount, then no discount should be given.  The NY Times article had two quotes from customers:

“There’s a dollar off Bounce. I use that all the time.”

and

“[A customer]  said the best eValues deal yet was $300 off a $1,200 television.

“I remember that day,” he said later. “I came to buy food, and I bought two TVs.”

The second story is a success for data mining (assuming the company made a profit off of a $900 TV):  the customer would not have purchased without the discount.

In the first story, the evaluation is more complicated:  if she really was going to buy Bounce anyway, then the $1 coupon was a $1 loss for Sam’s.  But consumer behavior is complicated:  by offering small discounts on many items, Sam’s encourages customers to buy all of their items there, not just the ones on discount.  So the overall effect may be positive.  But optimal discounting for these sorts of interrelated items over a customer’s lifetime is pretty complicated.
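To make the "no discount if I am likely to buy anyway" property concrete, here is a minimal single-product sketch in Python.  It ignores exactly the complications mentioned above (interrelated items, lifetime effects).  The $1 coupon and the $300 off a $1,200 television come from the article's quotes;  the shelf price of Bounce, the unit costs, and the purchase probabilities are made-up numbers for illustration only, and the function names are my own.

```python
def expected_profit(price, unit_cost, discount, buy_probability):
    """Expected profit from one customer on one product at a given discount."""
    return buy_probability * (price - discount - unit_cost)


def discount_is_worthwhile(price, unit_cost, discount, p_without, p_with):
    """True when offering the discount raises expected profit.

    p_without: probability the customer buys at full price (assumed known)
    p_with:    probability the customer buys with the discount (assumed known)
    """
    with_discount = expected_profit(price, unit_cost, discount, p_with)
    without_discount = expected_profit(price, unit_cost, 0.0, p_without)
    return with_discount > without_discount


# Bounce: hypothetical $8 price and $5 cost; she "uses it all the time",
# so assume she buys with probability 0.95 even without the $1 coupon.
bounce = discount_is_worthwhile(8.0, 5.0, 1.0, p_without=0.95, p_with=1.0)

# Television: $1,200 price and $300 discount from the article; the $700 cost
# and both probabilities are hypothetical. He "came to buy food".
tv = discount_is_worthwhile(1200.0, 700.0, 300.0, p_without=0.02, p_with=0.5)

print("Discount Bounce?", bounce)
print("Discount the TV?", tv)
```

Under these assumed numbers the coupon on Bounce loses money while the television discount pays off, which is the pattern a revenue-optimizing scheme would exploit.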

But here is a hypothetical situation (presumably):  it turns out that 25 year olds (say) are at a critical point in purchasing behavior when they decide exactly what brands they will purchase for the rest of their lifetime;  50 year olds are set in their ways (“I always buy Colgate, I never buy Crest”).  A 25 year old goes into Sam’s, hits the kiosk and walks away with 10 pages of coupons;  a 50 year old gets nothing.  Is this a success for data mining?  Perhaps the answer depends on whether you are 25 or 50!

And, more importantly for me, does Amazon not give me discounts once it is sufficiently certain I am going to want a book?

World Cup Forecast Pool, with a Twist

The Brazilian Society of Operations Research is organizing a competition for predicting the results of the group stage at the upcoming World Cup.  If you have to ask for which sport, you probably aren’t the target audience:  it is for football (aka soccer).  Many sites have such pools for many sports:  for US college basketball, the OR blogs are practically given over to the topic every March.

This competition is a bit different though:  you aren’t allowed to simply guess the winners of each game.  First, you need to give probabilities of win, loss or tie, with scoring based on squared errors.  Second, you need to use some sort of model to generate the predictions, and be willing to describe that model.  And no fair modifying your results to better fit your own ideas!  You need to stick to the model’s predictions.    There are three subcompetitions, with track A allowing fewer types of information than track B, and track C limited to members of the Brazilian OR society.
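The exact scoring rule is not spelled out here, but one common way to score win/tie/loss probabilities by squared error is a Brier-style score.  The Python sketch below assumes that form;  the function name and the example forecast are mine, not the competition's.

```python
OUTCOMES = ("win", "tie", "loss")


def squared_error_score(forecast, outcome):
    """Sum of squared differences between forecast probabilities and the
    realized result (1 for what happened, 0 otherwise). Lower is better.

    forecast: dict mapping "win", "tie", "loss" to probabilities summing to 1
    outcome:  the result that actually occurred, one of OUTCOMES
    """
    if outcome not in OUTCOMES:
        raise ValueError("outcome must be one of %r" % (OUTCOMES,))
    if abs(sum(forecast[k] for k in OUTCOMES) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(
        (forecast[k] - (1.0 if k == outcome else 0.0)) ** 2 for k in OUTCOMES
    )


# A hypothetical forecast for one group-stage game.
forecast = {"win": 0.5, "tie": 0.3, "loss": 0.2}
score_if_win = squared_error_score(forecast, "win")    # 0.25 + 0.09 + 0.04
score_if_loss = squared_error_score(forecast, "loss")  # 0.25 + 0.09 + 0.64
print(score_if_win, score_if_loss)
```

A rule like this rewards honest probabilities:  a confident wrong guess (probability 1 on the wrong result) scores 2, the worst possible, while a confident right guess scores 0.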

There is quite a bit of past data on the players and teams, so perhaps it is possible to create useful models.  I look forward to seeing the results.  Deadline for entries is June 7, with games starting June 11.

Martin Gardner has Passed Away

Martin Gardner has passed away.  I know I am not the only person in operations research who was inspired by Gardner’s Mathematical Games columns in Scientific American.  I have a strong memory of whiling away long high school physics classes reading Gardner’s columns (and I am thankful that a patient and insightful physics teacher had a stack of Scientific Americans and did not mind my lack of attention to his teaching).  A large number of the columns were really about operations research problems:  “What is the best way to do this?”  “How few moves are needed to accomplish that?”  It is through his columns that I understood the breadth and beauty of mathematics, and how that world was accessible even to a high school student.  And that high school students and professional mathematicians could work on the same problem and each have something to contribute.

When I went to university, it took me some time to find the type of mathematics that inspired me the way Gardner did.  I found it in operations research, and I am thankful to Martin Gardner for showing me what to look for.

Culling Journals Time!

It is that time of the year when our librarian asks us to consider whether or not to continue subscribing to journals.  In the past, journals have been identified by “percentage increase” with the idea that those whose increase is high need special attention to determine if they are still valuable.  This assumes that we had made good decisions in the past:  if a “bad” journal keeps its increase low enough, it doesn’t show up on the radar screen.  Meanwhile, a low-priced but valuable journal with a “big” one-time increase gets special scrutiny.  But which should get more attention: a journal going up $60 on a base of $600 or an equivalent quality journal going up $200 on a base of $5000?  Ordering by percentage increase means the first gets much more attention but rational budgeting suggests looking carefully at the second.  While those values seem extreme, that is roughly what happens when comparing Management Science (as the “inexpensive” journal) and European Journal of Operational Research (whose price to Carnegie Mellon is $5855 per year).
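The two orderings are easy to compare directly.  This small Python sketch uses the round numbers from the example above ($60 on $600 versus $200 on $5000);  "Journal A" and "Journal B" are placeholder names, not real titles.

```python
# (name, base price in dollars, increase in dollars)
journals = [
    ("Journal A", 600, 60),    # the "inexpensive" journal
    ("Journal B", 5000, 200),  # the expensive journal
]

# Ordering by percentage increase: A (10%) comes ahead of B (4%).
by_percent = sorted(journals, key=lambda j: j[2] / j[1], reverse=True)

# Ordering by dollars added to the budget: B ($200) comes ahead of A ($60).
by_dollars = sorted(journals, key=lambda j: j[2], reverse=True)

for name, base, increase in journals:
    print(f"{name}: +${increase} on ${base} = {100 * increase / base:.0f}% increase")

print("Most attention by percentage:", by_percent[0][0])
print("Most attention by dollars:", by_dollars[0][0])
```

The percentage ordering flags the journal that costs the budget $60 more, while the dollar ordering flags the one that costs $200 more.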

This year, our librarian simply listed all journals above $500 and asked us to look those over.  Here are the ones in operations research/operations management we are considering:

INTERNATIONAL JOURNAL OF PRODUCTION RESEARCH $8,615
EUROPEAN JOURNAL OF OPERATIONAL RESEARCH $5,855
JOURNAL OF THE OPERATIONAL RESEARCH SOCIETY $1,840
COMPUTATIONAL OPTIMIZATION AND APPLICATIONS $1,166
ZEITSCHRIFT FUR OPERATIONS RESEARCH aka Mathematical Methods of Operations Research $898
OPERATIONS RESEARCH LETTERS $815
JOURNAL OF OPERATIONS MANAGEMENT $637

The INFORMS journals don’t make the list since the bundled rate puts them under $500/journal.

What to do with these?  Fortunately, I have already done some checking on journal influence and pricing.

Let’s start with the first journal listed above: “International Journal of Production Research” (Taylor and Francis). If we take a “cost per eigenfactor” value, that journal ranks 39th in the “Operations Research” ranking.  I have never considered publishing in the journal, so I don’t know much about it.  It does publish a lot of articles (24 issues per year, with around 15-20 articles per issue).  I recognize a couple of names on its editorial board.  Harzing’s indispensable Publish or Perish shows that it has a fair number of papers published with 100+ Google Scholar cites.  Overall, not a bad or junk journal, but for $8615, I would want much more.  So I would be biased towards dropping, but will bow to my colleagues in operations management on how they feel.  Near as I can tell, none of them have published in the journal either, so that might be a good one to cut.

European Journal of Operational Research (Elsevier) is a difficult one for me.  I have published two papers there recently, and it is a key outlet in operations research.  Since they publish many papers (24 issues/year times perhaps 25 papers per issue), the journal is important to our field.  Going through the same steps as above, the journal is 12th in cost per eigenfactor and number 1 in overall eigenfactor, I certainly know and admire much of the editorial board, and there are many papers above 100 cites.  I’m not crazy about a $5855 cost, but I think we are stuck paying it.

Journal of the Operational Research Society (Palgrave) would be a hard one for me to cut.  Sometimes it veers off in directions I am not crazy about (the dreaded “soft OR” versus “hard OR” debate) but it offers a nice mix of theory and application, along with the odd interesting historical piece.  At number 15 in the cost/eigenfactor listing, I think it is safe.

Computational Optimization and Applications (Springer) is a journal I have published in once, and it is part of the discussion when trying to place some of my work.  It is down the list at 25th in cost/eigenfactor, but has an admirable board.  Not a huge number of papers with 100+ cites (14 in my search), but pretty reasonable.  I think it is OK.

Zeitschrift fuer Operations Research (Mathematical Methods of Operations Research) (Springer) has a long history, going back to the days when it made sense to talk about a country’s operations research journals.  But, like “operations research” groups in Fortune 500 companies, country-oriented OR journals are finding it hard to compete.  In fact, I am finding it hard to parse out exactly what the history is here, but this appears to be the combination of a couple different journals.  In any case, at 28 in the cost/eigenfactor listing, it is clear it needs more papers like “Modeling of Extremal Events in Insurance and Finance” by Embrechts and Schmidli (1994) (at an impressive 2051 Google cites) if it is going to survive.  So keep for now, but give it a stern eye.

OR Letters (Elsevier) is a natural keep at number 9 in the cost/eigenfactor listing.  It is as close as we get to the rapid publication system that works so well in portions of computer science through their competitive conference publication system.

Journal of Operations Management (Elsevier), at number 10 in the cost/eigenfactor listing and a Financial Times journal to boot (meaning it is used to gauge research impact in the Financial Times ranking of business schools), is also a natural keep.

OK, there you have it:  I would toss out the International Journal of Production Research on the basis of stupid pricing but keep the rest.   We’ll see if my colleagues vote to keep it around for another year.