Seed Magazine has an excellent article entitled “Dirty Little Secret” with a subtitle of “Are most published research findings actually false? The case for reform.” (Thanks to the Complex Intelligent Systems blog for pointing to this). The article begins with some amazing statistics:
In a 2005 article in the Journal of the American Medical Association, epidemiologist John Ioannidis showed that among the 45 most highly cited clinical research findings of the past 15 years, 99 percent of molecular research had subsequently been refuted. Epidemiology findings had been contradicted in four-fifths of the cases he looked at, and the usually robust outcomes of clinical trials had a refutation rate of one in four.
Why has this happened?
The culprits appear to be the proverbial suspects: lies, damn lies, and statistics. Jonathan Sterne and George Smith, a statistician and an epidemiologist from the University of Bristol in the UK, point out in a study in the British Medical Journal that “the widespread misunderstanding of statistical significance is a fundamental problem” in medical research. What’s more, the scientist’s bias may distort statistics. Pressure to publish can lead to “selective reporting;” the implication is that attention-seeking scientists are exaggerating their results far more often than the occasional, spectacular science fraud would suggest.
Ioannidis’ paper “Why Most Published Research Findings Are False” examines in more depth why this happens. This is no surprise to those who understand statistical significance. If 20 groups independently study an effect that does not actually exist, the chance that at least one of them gets a finding significant “at the 95% level” (that is, at the 5% significance level) just by blind luck is 1 − (0.95)^20, or about 64%. And if that is the group that gets to publish (since proof of significance is much more interesting than non-proof), then the scientific literature will, by its nature, publish false findings. This is made worse by issues of bias, conflict of interest, and so on.
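The “20 groups” arithmetic can be checked directly. The sketch below assumes the 20 studies are independent, that the effect being studied does not exist, and that each uses a valid test at the 5% level, so each study has a 5% chance of a spurious “significant” result. It computes the exact probability and confirms it with a small simulation; the function names are purely illustrative.

```python
import random


def prob_at_least_one_false_positive(groups: int, alpha: float = 0.05) -> float:
    """Exact chance that at least one of `groups` independent studies of a
    nonexistent effect comes out significant at level `alpha`."""
    return 1.0 - (1.0 - alpha) ** groups


def simulate(groups: int, alpha: float = 0.05, trials: int = 50_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same probability. Under the null hypothesis a
    valid test rejects with probability alpha, modeled here as a uniform draw."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One "field": `groups` teams each test the (nonexistent) effect.
        if any(rng.random() < alpha for _ in range(groups)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    exact = prob_at_least_one_false_positive(20)
    approx = simulate(20)
    print(f"exact: {exact:.4f}  simulated: {approx:.4f}")
```

The simulated value should land close to the exact 1 − 0.95^20 ≈ 0.64. If the studies are correlated (shared data, shared methods), the exact formula no longer applies.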
Ioannidis continues with some corollaries on when it is more likely that published research is false:
Corollary 1: The smaller the studies conducted in a scientific field, the less likely the research findings are to be true.
Corollary 2: The smaller the effect sizes in a scientific field, the less likely the research findings are to be true.
Corollary 3: The greater the number and the lesser the selection of tested relationships in a scientific field, the less likely the research findings are to be true.
Corollary 4: The greater the flexibility in designs, definitions, outcomes, and analytical modes in a scientific field, the less likely the research findings are to be true.
Corollary 5: The greater the financial and other interests and prejudices in a scientific field, the less likely the research findings are to be true.
Corollary 6: The hotter a scientific field (with more scientific teams involved), the less likely the research findings are to be true.
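These corollaries fall out of a simple formula in Ioannidis’ paper for the positive predictive value (PPV), the probability that a claimed finding is true. As I read the paper, with pre-study odds R that a tested relationship is real, significance level α and power 1 − β, PPV = (1 − β)R / (R − βR + α); a bias term u and the number n of independent teams testing the same question modify it as below. This is a sketch for experimenting with the parameters, not code from the paper. Small studies and small effects mean low power (corollaries 1 and 2), many loosely selected relationships mean small R (corollary 3), flexibility and interests show up as bias u (corollaries 4 and 5), and hot fields mean large n (corollary 6).

```python
def ppv(R: float, alpha: float = 0.05, power: float = 0.8) -> float:
    """Probability a significant finding is true, given pre-study odds R."""
    beta = 1.0 - power
    return (1.0 - beta) * R / (R - beta * R + alpha)


def ppv_with_bias(R: float, u: float, alpha: float = 0.05, power: float = 0.8) -> float:
    """PPV when a fraction u of analyses that would not have been significant
    get reported as significant anyway (bias); u = 0 recovers ppv()."""
    beta = 1.0 - power
    num = (1.0 - beta) * R + u * beta * R
    den = R + alpha - beta * R + u - u * alpha + u * beta * R
    return num / den


def ppv_n_teams(R: float, n: int, alpha: float = 0.05, power: float = 0.8) -> float:
    """PPV when n independent teams test the same relationship and at least
    one significant result gets claimed; n = 1 recovers ppv()."""
    beta = 1.0 - power
    return R * (1.0 - beta ** n) / (R + 1.0 - (1.0 - alpha) ** n - R * beta ** n)


if __name__ == "__main__":
    print(f"corollaries 1-2 (power 0.8 vs 0.2): {ppv(1.0):.3f} vs {ppv(1.0, power=0.2):.3f}")
    print(f"corollary 3 (R = 1 vs R = 0.01):     {ppv(1.0):.3f} vs {ppv(0.01):.3f}")
    print(f"corollaries 4-5 (u = 0 vs u = 0.3):  {ppv_with_bias(1.0, 0.0):.3f} vs {ppv_with_bias(1.0, 0.3):.3f}")
    print(f"corollary 6 (n = 1 vs n = 10):       {ppv_n_teams(1.0, 1):.3f} vs {ppv_n_teams(1.0, 10):.3f}")
```

In every row the second number is smaller, which is each corollary in miniature.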
This got me thinking about the situation in operations research. How much of the published operations research literature is false? One key way of avoiding false results is replicability. If a referee (or better, any reader of the paper) can replicate the result, then many of the causes for false results go away. The referee may be wrong sometimes (in a statistical study at the 95% confidence level, a referee replicating an effect that does not exist will still get the “wrong” result 5% of the time), but the number of false publications decreases enormously.
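How much replication helps can be put in numbers with a short Bayesian calculation. The sketch assumes k independent studies with the same significance level α and power, all of which come out significant, and pre-study odds R that the effect is real. A false effect then survives all k with probability α^k, while a true one does so with probability power^k. The function name and the example values are illustrative assumptions, not figures from any study.

```python
def ppv_after_replications(R: float, k: int, alpha: float = 0.05, power: float = 0.8) -> float:
    """Probability a finding is true after k independent significant results
    (the original study plus k - 1 successful replications)."""
    true_path = R * power ** k   # real effect detected k times in a row
    false_path = alpha ** k      # nonexistent effect "detected" k times by chance
    return true_path / (true_path + false_path)


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"k = {k}: P(true) = {ppv_after_replications(0.1, k):.3f}, "
              f"false positive survives with prob {0.05 ** k:.6f}")
```

With R = 0.1, a single significant study is true only about 62% of the time, while one successful replication pushes that above 96%: a referee who can re-run the work filters out most chance findings even though each individual check is fallible.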
Most major medical trials are not replicable: the cost of large trials is enormous, and the time required is too long. So you can expect false publications.
Much of OR is replicable. For more mathematical work, the purpose of a proof is to allow the reader to replicate the theorem. For most papers, this is straightforward enough that the field can have confidence in the theorems. Of course, sometimes subtle mistakes are made (or not so subtle if the referees did not do their jobs), and false theorems are published. And sometimes the proof of a theorem is so complex (or reliant on computer checking) that the level of replicability decreases. But I think it is safe to say that the majority (perhaps even the vast majority) of theorems in operations research are “true” (they may, of course, be irrelevant or uninteresting or useless, but that is another matter).
For computational work, the situation is less rosy. Some types of claims (“This is a feasible solution with this value to this model”, for example) should be easy to check, but generally are not checked. This leads to problems and frustrations in the literature. In my own areas of research, there are competing claims of a coloring with k colors and a clique of size k+1 on the same graph, which is impossible, since the k+1 vertices of such a clique need k+1 different colors. I am putting together a system to try to clear out those sorts of problems. But other claims are harder to check. A claim that X is the optimal solution for this instance of a hard combinatorial problem is, by its nature, difficult to show. Worse are claims like “this heuristic is the best method for this problem”, which run into many of the same issues medical research does (with bias being perhaps even more of an issue).
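Checking the easy kinds of claims takes only a few lines once the certificate (the actual solution, coloring, or clique) is published. The sketch below is a hypothetical illustration, not the system mentioned above: it assumes a linear model written as “optimize c·x subject to Ax ≤ b”, and a graph stored as a dictionary mapping each vertex to the set of its neighbours. All function names are made up for this example.

```python
from itertools import combinations


def check_linear_solution(A, b, c, x, claimed_value, tol=1e-6):
    """True if x satisfies every constraint of A x <= b (within tol) and
    its objective value c.x matches the claimed value."""
    for row, rhs in zip(A, b):
        if sum(a_ij * x_j for a_ij, x_j in zip(row, x)) > rhs + tol:
            return False
    value = sum(c_j * x_j for c_j, x_j in zip(c, x))
    return abs(value - claimed_value) <= tol


def is_proper_coloring(graph, coloring, k):
    """True if every vertex is colored, at most k colors are used, and no edge
    joins two vertices of the same color (graph assumed symmetric)."""
    if set(coloring) != set(graph):
        return False
    if len(set(coloring.values())) > k:
        return False
    return all(coloring[u] != coloring[v] for u in graph for v in graph[u])


def is_clique(graph, clique):
    """True if the listed vertices are distinct, exist, and are pairwise adjacent."""
    vertices = list(clique)
    if len(set(vertices)) != len(vertices):
        return False
    if any(v not in graph for v in vertices):
        return False
    return all(v in graph[u] for u, v in combinations(vertices, 2))


def audit(graph, coloring, k, clique):
    """List the problems with a pair of competing claims: a k-coloring and a clique."""
    problems = []
    if len(clique) > k:
        problems.append(f"clique of size {len(clique)} contradicts a {k}-coloring")
    if not is_proper_coloring(graph, coloring, k):
        problems.append("coloring certificate fails")
    if not is_clique(graph, clique):
        problems.append("clique certificate fails")
    return problems


if __name__ == "__main__":
    triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
    print(audit(triangle, {0: "red", 1: "blue", 2: "red"}, 2, [0, 1, 2]))
```

On the triangle, the claimed 2-coloring is caught as invalid (vertices 0 and 2 share a color) while the clique checks out. The point of checking certificates rather than reported numbers is that two verified certificates can never conflict, so whenever claims disagree, at least one certificate must fail.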
I think this is an issue that deserves much more thought. Very little of the computational work in this field is replicable (contrast this with my review of the work of Applegate, Bixby, Chvátal, and Cook on the Traveling Salesman Problem, which I think is much more open than 99.9% of the work in the field), and this leads to confusion, false paths, and wasted effort. We are much better placed to replicate work than many fields are, but I do not think we actually do so. As a result, we have many more false results in our field than we should.