From Pharyngula's P.Z. Myers:
I was unimpressed with the overselling of the flaws in the science, but actually quite impressed with the article as an example of psychological manipulation.
The problem described is straightforward: many statistical results from scientific studies that showed great significance early in the analysis are less and less robust in later studies. For instance, a pharmaceutical company may release a new drug with great fanfare after extremely promising results in clinical trials, and then later, when numbers from its use in the general public trickle back, the drug shows much smaller effects. Or a scientific observation of mate choice in swallows may first show a clear preference for symmetry, but as time passes and more species are examined or the same species is re-examined, the effect seems to fade.
This isn't surprising at all. It's what we expect, and there are many very good reasons for the shift.
* Regression to the mean: As the number of data points increases, we expect the average values to regress to the true mean, and since the initial work is often done on the basis of promising early results, we expect more data to even out a fortuitously significant early outcome.
* The file drawer effect: Results that are not significant are hard to publish, and end up stashed away in a cabinet. However, as a result becomes established, contrary results become more interesting and publishable.
* Investigator bias: It's difficult to maintain scientific dispassion. We'd all love to see our hypotheses validated, so we tend to consciously or unconsciously select results that favor our views.
* Commercial bias: Drug companies want to make money. They can make money off a placebo if there is some statistical support for it; there is certainly a bias towards exploiting statistical outliers for profit.
* Population variance: Success in a well-defined subset of the population may lead to a bit of creep: if the drug helps this group with well-defined symptoms, maybe we should try it on this other group with marginal symptoms. And it doesn't, but those numbers will still be used in estimating its overall efficacy.
* Simple chance: This is a hard one to get across to people, I've found. But if something is significant at the p=0.05 level, that still means that, on average, 1 in 20 experiments with a completely useless drug will exhibit a significant effect.
* Statistical fishing: I hate this one, and I see it all the time. The planned experiment revealed no significant results, so the data is pored over and any significant correlation is seized upon and published as if it was intended. See previous explanation. If the data set is complex enough, you'll always find a correlation somewhere, purely by chance.
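The last two items are easy to check with a quick simulation. What follows is only a rough sketch, not anything from Lehrer's article: it assumes a completely useless drug, normally distributed outcomes with a known standard deviation of 1 (so a simple z-test applies), and, for the fishing case, 20 independent comparisons per study. The function names, sample sizes, and seed are all made up for illustration.

```python
import random
from statistics import NormalDist, mean


def p_value_null_experiment(rng, n=30):
    """Two-sided z-test p-value for one experiment on a drug with no effect.

    Assumption: both groups are drawn from the same N(0, 1) distribution,
    and the standard deviation (1) is treated as known.
    """
    treated = [rng.gauss(0.0, 1.0) for _ in range(n)]
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Standard error of the difference of two means, each with variance 1/n.
    z = (mean(treated) - mean(control)) / (2.0 / n) ** 0.5
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def false_positive_rate(rng, experiments=2000, alpha=0.05):
    """Simple chance: fraction of null experiments that look significant."""
    hits = sum(p_value_null_experiment(rng) < alpha for _ in range(experiments))
    return hits / experiments


def fishing_hit_rate(rng, studies=500, outcomes=20, alpha=0.05):
    """Statistical fishing: fraction of null studies in which at least one
    of `outcomes` independent comparisons comes out significant."""
    hits = sum(
        any(p_value_null_experiment(rng) < alpha for _ in range(outcomes))
        for _ in range(studies)
    )
    return hits / studies


rng = random.Random(0)  # fixed seed so the run is repeatable
fpr = false_positive_rate(rng)
fish = fishing_hit_rate(rng)
print(f"useless drug, one planned test: {fpr:.3f} of experiments significant")
print(f"useless drug, best of 20 tests: {fish:.3f} of studies significant")
```

Under these assumptions the first number should land near 0.05, and the second near 1 - 0.95^20, which is about 0.64. Real data sets rarely have independent comparisons, so the fishing figure is illustrative rather than a prediction for any particular study.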
Here's the thing about Lehrer's article: he's a smart guy, he knows this stuff. He touches on every single one of these explanations, and then some. In fact, the structure of the article is that it is a whole series of explanations of those sorts. Here's phenomenon 1, and here's explanation 1 for that result. But here's phenomenon 2, and explanation 1 doesn't work, but here's explanation 2. But now look at phenomenon 3! Explanation 2 doesn't fit! Oh, but here's explanation 3. And on and on. It's all right there, and Lehrer has explained it.
But that's where the psychological dimension comes into play. Look at the loaded language in the article: scientists are "disturbed," "depressed," and "troubled." The issues are presented as a crisis for all of science; the titles (which I hope were picked by an editor, not Lehrer) emphasize that science isn't working, when nothing in the article backs that up. The conclusion goes from a reasonable suggestion to complete bullshit.