
Data mining November 7, 2009

Posted by gordonwatts in computers, Health.

In particle physics this is what we do. We have petabyte-sized (1000 terabytes!) datasets consisting of billions of physics interactions. For the particularly rare ones we need to pick out several hundred or several thousand and study them in detail. As you might expect, we are drowning in data and have developed many tools to help us. Computers are central – without them we would not be able to do the science we currently do!

The most common public example of data mining I’ve heard about is looking at all the receipts from Walmart purchases. This is why grocery stores like you to sign up for their frequent-use cards – they can track everything you buy, sell that data, and, more importantly, send you ads that are likely to get you in and get you to buy other things. It is an amazingly powerful tool. In business it has been getting a bit of a bad name recently because it has been connected to some fairly serious invasion-of-privacy issues (i.e. creepy things – like knowing what hour of the day you check your email, how much the average person in your zip code makes, etc.).

But one place that it could obviously be applied for the greater good that I’d never really given much thought to is medicine. Check out this long article from the NYT on the topic – Making Health Care Better. It starts with some history – and the bromide “The amount of death and disease would be less if all disease were left to itself.” from 1835… to the present day:

“Medicine adopted the scientific method,” James said… “It transformed medicine, and it’s easy to make the case.”

He talks about the testing and science applied to any new method, drug, procedure before it is allowed to be used by mainstream doctors. But…

But there is one important way in which medicine never quite adopted the scientific method… …once a treatment enters the mainstream — once we know whether it works in certain situations — science is largely left behind. The next questions — when to use it and on which patients — become matters of judgment, not measurement.

The article describes a dizzying array of treatments available to a doctor who is trying to treat heart disease. And which treatment to use is left up to the doctor’s judgment.

Clearly some hospitals and doctors have better average outcomes than others – so some doctors must have better judgment than others. Wouldn’t it be great if every doctor could start with a default procedure that has been shown to work for a patient who looks like the one the doctor is trying to treat and then modify it to fit the patient’s specifics?


“I thought there wasn’t anybody better in the world at twiddling the knobs than I was,” Jim Orme, a critical-care doctor, told me later, “so I was skeptical that any protocol generated by a group of people could do better.”

And that is just it – there are so many variations in treatments. Today’s instruments are quite complex and have many settings – so how do you know what works correctly? When the procedure is approved there is a fair amount of science recorded for each setting, and presumably most doctors follow it. But not all of them!

And this is where I think data mining could come in. What if every single modern instrument was hooked into the network, and each adjustment was recorded and linked to the patient’s medical file (so you could see the history)? Each time a nurse or doctor did something, it would be recorded. All of that in some standard format – and then shared across hospitals and doctors the country or world over?
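
To make the idea a bit more concrete, here is a rough sketch of what one such "standard format" record might look like in Python. Everything in it is my own invention for illustration: the field names, the `record_adjustment` and `adjustments_for` helpers, and the choice of one JSON object per line are assumptions, not any real medical data standard or device interface.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class AdjustmentRecord:
    """One change to one instrument setting (all field names are hypothetical)."""
    patient_id: str   # link back to the patient's medical file
    device_type: str  # what kind of instrument was adjusted
    setting: str      # which knob was turned
    old_value: float
    new_value: float
    unit: str
    changed_by: str   # role of the person who made the change
    timestamp: str    # ISO 8601 string, in UTC


def record_adjustment(patient_id, device_type, setting, old_value, new_value,
                      unit, changed_by, when=None):
    """Build a record and serialize it to a single line of JSON."""
    when = when or datetime.now(timezone.utc)
    rec = AdjustmentRecord(patient_id, device_type, setting, float(old_value),
                           float(new_value), unit, changed_by, when.isoformat())
    return json.dumps(asdict(rec), sort_keys=True)


def adjustments_for(lines, patient_id):
    """Pick out one patient's adjustments from a log, oldest first."""
    records = [json.loads(line) for line in lines]
    mine = [r for r in records if r["patient_id"] == patient_id]
    # ISO 8601 strings in the same time zone sort chronologically as text.
    return sorted(mine, key=lambda r: r["timestamp"])
```

The point of a flat, self-describing record like this is that logs from different hospitals could simply be concatenated and then filtered, which is the same basic move we make when we skim rare interactions out of a huge physics dataset. A real system would of course need far more than this (privacy protection and agreed-upon vocabularies for devices and settings, for a start).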

This has, of course, been a dream of electronic health care records for a while. It always struck me as obvious that you would attach x-rays, CT scans, descriptions of medicine given, etc., but it never occurred to me the level of detail you could go into! From a technical point of view this is hard – the data is so non-uniform, unlike the particle physics experiments I work on – but the long-term benefits could be quite good. The article describes what happened when this data mining technique was applied ad hoc in just a single hospital:

One widely circulated national study overseen by doctors at Massachusetts General Hospital had found an ARDS [Acute respiratory distress syndrome] survival rate of about 10 percent. For those in Intermountain’s study, the rate was 40 percent.

At any rate, this tickled my fancy, which is why I wrote about it. I found it ironic that on the Health home page yesterday there was also the following article:

Five years later, Medicare underwrites more than half of the $4 billion the nation now spends annually on defibrillators, but the agency is no closer to knowing how many lives that big investment is saving.

My impression of the health care bills working their way through Congress right now is that none of them really go after cost savings. Science can help*.

* Ok – making devices that can spit out data in a common format will add to their cost. But you can do simple things like the Intermountain study to start as better devices come online!