The End Of The Scientific Method… Wha…?
June 26, 2008. Posted by gordonwatts in science.
There is an incredible article over on Wired right now, "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete." The article's premise is that we now have so much data on hand that you don't need to look at why things happen, just that they do happen. The author, Anderson, uses Google advertising as an example:
Google’s founding philosophy is that we don’t know why this page is better than that one: If the statistics of incoming links say it is, that’s good enough. No semantic or causal analysis is required.
Anderson then wants to extend it to science:
In short, the more we learn about biology, the further we find ourselves from a model that can explain it. There is now a better way. Petabytes [of data] allow us to say: “Correlation is enough.” We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.
His basic thesis is that once you have enough data to map out every connection and every correlation, the data becomes the model. There is no need to derive or understand what is actually happening: you have so much data that you can already make every prediction a model would have let you make in the first place. In short, you no longer need to develop a theory or hypothesis. Just map the data!
This definitely works for some things. For example, we established that aspirin works by basic data mining: we know it helps reduce heart risk because of the many trials in which that effect was measured. Imagine if everyone's detailed medical history were available for data mining. What other hidden gems are there? Probably lots!
In particle physics we use this technique all the time to analyze our data. But I have several basic problems with the thesis that this can replace science.
First, in order for this to work you need millions and millions and millions of data points. You need, basically, every single possible outcome, combined with every possible other factor. Huge amounts of data. That does not apply to all branches of science. Take medicine: for a new drug compound there is no data available, and you definitely don't want to unleash it on millions of people just to see what happens. It might be much better to use the data-mining tools to find something else that sort of does what you want, and then isolate the active compound. At that point you might know the agent and the group of people it affects, and you can study what is actually happening. Given that understanding, you can create something new and more powerful. Take the antiretroviral drugs developed for AIDS: I don't see how they could have come from anything other than an understanding of how HIV, the virus behind AIDS, works.
The second problem with this approach is that you will never discover anything new. The trouble with new things is that there is no data on them!
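To make the second point concrete, here is a toy sketch (hypothetical function names and made-up data, not anything taken from Anderson's article): a "model" that is nothing but a table of observed correlations. It answers well for inputs it has already seen, but it has nothing at all to say about a compound nobody has measured yet.

```python
from collections import Counter, defaultdict


def build_lookup(observations):
    """Map each observed input to its most frequently observed outcome.

    This is the pure "data is the model" approach: no theory, just counts.
    """
    counts = defaultdict(Counter)
    for x, y in observations:
        counts[x][y] += 1
    return {x: c.most_common(1)[0][0] for x, c in counts.items()}


def predict(table, x):
    # Without a model we can only answer for inputs already in the data;
    # anything unseen gets None.
    return table.get(x)


# Made-up, purely illustrative observations.
observations = [
    ("compound A", "lower risk"),
    ("compound A", "lower risk"),
    ("compound B", "no change"),
]
table = build_lookup(observations)

print(predict(table, "compound A"))    # lower risk
print(predict(table, "new compound"))  # None
```

However much data goes into the table, the answer for something genuinely new is still "no idea"; filling that gap is exactly what a theory is for.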
The third problem is more subjective. I just can't imagine living in a world where I'm told "well, that is the way it happens, so we just do it like that." But WHY!? I couldn't do it. 🙂
Anderson is right: we are entering a new age where the ability to mine these large amounts of data is going to open up whole new levels of understanding. Some discoveries will be made using this technique alone. I predict Woody Allen was right, and we will discover that chocolate milkshakes are a health food; tools like this will discover those sorts of things. This is a new tool, and it will open up all sorts of doors for us. But the end of the scientific method? No, because that implies an end of discovery. An end of new things.
Update: Ars does a better job than I do (of course):
Correlations are a way of catching a scientist's attention, but the models and mechanisms that explain them are how we make the predictions that not only advance science, but generate practical applications. One only needs to look at a promising field that lacks a strong theoretical foundation—high-temperature superconductivity springs to mind—to see how badly the lack of a theory can impact progress.