Reproducibility…

September 26, 2013

Posted by gordonwatts in Analysis, Data Preservation, Fermilab, reproducible.

I recently stumbled across an article on reproducibility, "Science is in a reproducibility crisis: How do we resolve it?", which contained the following quote. It really caught me off guard:

Over the past few years, there has been a growing awareness that many experimentally established "facts" don’t seem to hold up to repeated investigation.

The article references an alarmist 2010 New Yorker article, "The Truth Wears Off". There is a link to a PDF of that article on their website, but I don't know if that copy is legal, so I won't link to it directly here.

Read that quote carefully: many. That means a lot. It would be all over! Searching the internet, I came across a Nature report. They looked carefully at a database of medical journal publications and their retraction rates. Here is an image of the retraction rates they found as a function of time:

First, watch the axes here: multiply the numbers on the left by 10^5 (100,000) and the numbers on the right by 10^-2 (0.01). In short, the peak rate is 0.01%, or roughly one retraction for every 10,000 papers. This is a tiny number. And, as the report points out, there are two ways to interpret the results:

This conclusion, of course, can have two interpretations, each with very different implications for the state of science. The first interpretation implies that increasing competition in science and the pressure to publish is pushing scientists to produce flawed manuscripts at a higher rate, which means that scientific integrity is indeed in decline. The second interpretation is more positive: it suggests that flawed manuscripts are identified more successfully, which means that the self-correction of science is improving.

The truth is probably a mixture of the two. But this rate is still very, very small!

I harp on this because I'm currently involved in a project that has reproducibility as one of its possible uses: preserving the data of the DZERO experiment, one of the two general-purpose detectors on the now-defunct Tevatron accelerator. Through this work I've come to appreciate exactly how difficult and potentially expensive this process can be. Especially in my field.

Let's take a very simple example. Say you use Excel to process data for a paper you are writing. The final number comes from this spreadsheet and is copied into the conclusions paragraph of your paper. So you upload your Excel spreadsheet to the journal along with the draft of the paper, and the journal archives it forever. If someone is puzzled by your result, they can go to the journal, download the spreadsheet, and see exactly what you did (this is roughly how many modern economics papers work). Win!

But wait. What if the numbers that you typed into your spreadsheet came from some calculations you ran? OK. You need to include those. And the inputs to those calculations. And so on and so on. For a medical study you would presumably have to upload the anonymized medical records of each patient, and then everything from there to the conclusion about a drug's safety or efficacy. Uploading the raw data from my field is not feasible: it is petabytes in size. And all of this is ad hoc; the tools we use do not track the data as it flows through them.
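To make that last point a little more concrete, here is a minimal, hypothetical sketch in Python of what "tracking the data as it flows through the tools" could look like: every processing step records a hash of its inputs and its output, so the final number carries a chain back to the raw data it came from. The names Tracked, source, run_step, calibrate, the 1.02 calibration factor, and the file name run_42.dat are all made up for illustration. A real system would also have to capture software versions, configuration, and the computing environment, which is where most of the cost I'm worried about lives.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Callable


def _digest(value: Any) -> str:
    """Short content hash of a JSON-serializable value."""
    blob = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:12]


@dataclass
class Tracked:
    """A value plus the record of every step that produced it."""
    value: Any
    history: list = field(default_factory=list)


def source(value: Any, label: str) -> Tracked:
    """Wrap raw input data, recording where it came from."""
    return Tracked(value, [{"step": "source", "label": label, "out": _digest(value)}])


def run_step(func: Callable[..., Any], *inputs: Tracked) -> Tracked:
    """Run func on the inputs' values and append a provenance record."""
    result = func(*(t.value for t in inputs))
    history: list = []
    # Merge the upstream histories, skipping records we have already seen.
    for t in inputs:
        for rec in t.history:
            if rec not in history:
                history.append(rec)
    history.append({
        "step": func.__name__,
        "inputs": [_digest(t.value) for t in inputs],
        "out": _digest(result),
    })
    return Tracked(result, history)


# Hypothetical analysis steps, for illustration only.
def calibrate(xs):
    return [round(x * 1.02, 6) for x in xs]  # made-up calibration factor


def mean(xs):
    return sum(xs) / len(xs)


raw = source([10.0, 12.0, 11.0], "run_42.dat")  # hypothetical input file
final = run_step(mean, run_step(calibrate, raw))
print(final.value)
for rec in final.history:
    print(rec)
```

Each record's "inputs" hash matches the "out" hash of the step before it, so anyone holding the history can check which data fed the final number. Even this toy version shows the catch: it only works if every tool in the chain cooperates.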

Early in my career as a professor, I was involved in a study that was trying to replicate and extend a result from a prior experiment. We couldn't. The group from the other experiment was forced to resurrect code on a dead operating system and figure out what they had done (that is, reproduce it) so they could answer our questions. The process took almost a year. In the end we found one error in the original paper, but the biggest change was simply that modern tools had a better model of the physics, and that was the main reason we could not replicate their results. It delayed the publication of our paper by a long time.

So, clearly, reproducibility is useful. Errors are made. Bias creeps in even with the best of intentions. Sometimes fraud is involved. But these negatives have to be balanced against the cost of making all analyses reproducible. Our tools just aren't there yet, and upgrading them will be both expensive and time consuming. Do we do that? Or do we measure a new number, rule out a new effect, or test a new drug?

Given the rates above, I'd be inclined to choose the latter, and let the tools evolve over time. No crisis.

Comments

1. Jeremy Leipzig - September 27, 2013: I’m sure physics is fine. This is really about biology. You stumbled on the wrong Nature article. The real shocker is this one: http://www.nature.com/nature/journal/v483/n7391/full/483531a.html

gordonwatts - September 27, 2013: For those who don't want to click through, that article describes a situation where the studies in 50 cancer papers judged to be "ground breaking" were re-run, and the effect was observed in only 6 of them. That is, indeed, bad. But everything leading up to that in the article seems to point to poor scientific methodology and standards, not something reproducibility would help fix. The issue there is more fundamental: getting your initial sample into a known state (i.e. knowing what types of cancer, etc., the people entering your trials have). There are also some inherent difficulties with their figure of merit: survival. That can take years, so if you treat someone and then wait 5 or 10 years to see if they are still alive... well, a lot can happen in that time that has nothing to do with the procedure.

2. gordonwatts - September 27, 2013: A minor update. I didn’t realize this when I wrote this post, but someone has pointed out to me that the author of that New Yorker article was fired from the New Yorker for making up quotes. In short, it has a credibility issue. 🙂


Life as a Physicist