{ Category Archives } Data Mining

Statistics, Cell Phones, and Cancer

Today’s New York Times Magazine has a very nice article entitled “Do Cellphones Cause Brain Cancer”. The emphasis of the article is on the statistical and medical issues faced when trying to find such a link. On the surface, it seems unlikely that cellphones have a role here. There has been no increase in brain […]

Everyone Needs to Know Some Statistics Part n+1

I have previously written on how decision makers (and journalists) need to know some elementary probability and statistics to prevent them from making horrendously terrible decisions.  Coincidentally, Twitter’s @ORatWork (John Poppelaars) has provided a pointer to an excellent example of how easily organizations can get messed up on some very simple things. As reported by […]

Optimizing Discounts with Data Mining

The New York Times has an article today about tailoring discounts to individuals. They concentrated on Sam’s Club, a warehouse chain. Sam’s Club is a good place for this sort of individual discounting since you have to be a member to shop there, and your membership is associated with every purchase you make. So Sam’s […]

Data Mining, Operations Research, and Predicting Murders

John Toczek, who writes the PuzzlOR column for OR/MS Today (example), has put together a new operations research/data mining challenge in the spirit of, though without the million-dollar reward of, the Netflix Prize. The Analytics X Prize is a fascinating problem: Current Contest – 2010 – Predicting Homicides in Philadelphia Philadelphia is a city […]

Competition then Cooperation: More on the Netflix Challenge

Wired has a nice article on the teams competing to win the Netflix Prize (thanks for the pointer, Matt!).  I think the most interesting aspect is how the “competition” turned into a cooperation: Teams Bellkor (AT&T Research), Big Chaos and Pragmatic Theory combined to form Bellkor’s Pragmatic Chaos, the first team to qualify for the […]

Data Mining and the Stock Market

As I have mentioned a number of times, I teach data mining to the MBA students here at the Tepper School. It is a popular course, with something like 60% of our students taking it before graduating. I offer an operations research view of data mining: here are the algorithms, here are the assumptions, here […]

How to Design a Contest: The Netflix Challenge

It looks like the Netflix people made some good decisions when they designed their million-dollar challenge. In particular, it appears that they kept two verification test sets: one that was the basis for the public standings and one that no one ever saw the results from. It is the success on the latter set […]

The Perils of “Statistical Significance”

As someone who teaches data mining, which I see as part of operations research, I often talk about what sort of results are worth changing decisions over.  Statistical significance is not the same as changing decisions.  For instance, knowing that a rare event is 3 times more likely to occur under certain circumstances might be […]

Netflix Prize ready to finish?

While I have been somewhat skeptical of the Netflix Prize (in short: it seems to be showing how little information is in the data, rather than how much; and the data is rather strange for “real data”), it is still fascinating to watch some pretty high-powered groups take a stab at it. If I […]

Data Mining Competition from FICO and UCSD

I am a sucker for competitions. I have run a few in the past, and I see my page on the Traveling Tournament Problem as an indefinite length computational competition. Data Mining naturally leads to competitions: there are so many alternative techniques out there and little idea of what might work well or poorly on […]