
Data mining November 7, 2009

Posted by gordonwatts in computers, Health.

In particle physics this is what we do. We have petabyte-scale (a petabyte is 1000 terabytes!) datasets consisting of billions of physics interactions. For the particularly rare ones we need to pick out a few hundred or a few thousand and study them in detail. As you might expect, we are drowning in data and have developed many tools to help us. Computers are central – without them we would not be able to do the science we currently do!

The most common public example of data mining I’ve heard about is looking at all the receipts from Walmart purchases. This is why grocery stores like you to sign up for their frequent-use cards – they can track everything you buy, sell that data, and, more importantly, send you ads that are likely to get you in and get you to buy other things. It is an amazingly powerful tool. In business it has been getting a bit of a bad name recently because it has been connected to some fairly serious invasions of privacy (i.e. creepy things – like knowing what hour of the day you check your email, how much the average person in your zip code makes, etc.).

But one place that it could obviously be applied for the greater good that I’d never really given much thought to is medicine. Check out this long article from the NYT on the topic – Making Health Care Better. It starts with some history – and the bromide “The amount of death and disease would be less if all disease were left to itself.” from 1835… to the present day:

“Medicine adopted the scientific method,” James said… “It transformed medicine, and it’s easy to make the case.”

He talks about the testing and science applied to any new method, drug, procedure before it is allowed to be used by mainstream doctors. But…

But there is one important way in which medicine never quite adopted the scientific method… …once a treatment enters the mainstream — once we know whether it works in certain situations — science is largely left behind. The next questions — when to use it and on which patients — become matters of judgment, not measurement.

The article describes a dizzying array of treatments available to a doctor who is trying to treat heart disease. And which treatment to use is left up to the doctor’s judgment.

Clearly some hospitals and doctors have better average outcomes than others – so some doctors must have better judgment than others. Wouldn’t it be great if every doctor could start with a default procedure that has been shown to work for a patient who looks like the one the doctor is trying to treat and then modify it to fit the patient’s specifics?


“I thought there wasn’t anybody better in the world at twiddling the knobs than I was,” Jim Orme, a critical-care doctor, told me later, “so I was skeptical that any protocol generated by a group of people could do better.”

And that is just it – there are so many variations in treatments. Today’s instruments are quite complex and have many settings – so how do you know what works correctly? When the procedure is approved there is a fair amount of science recorded for each setting, and presumably most doctors follow it. But not all of them!

And this is where I think data mining could come in. What if every single modern instrument was hooked into the network, and each adjustment was recorded and linked to a patient’s medical file (so you could see the history)? Each time a nurse or doctor did something, it would be recorded. All of that in some standard format – and then shared across hospitals and doctors the country or world over?
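To make the idea a little more concrete, here is a minimal sketch in Python of what one recorded adjustment might look like. Everything in it is an assumption made up for illustration: the `AdjustmentRecord` class, its field names, the JSON encoding, and the `record_adjustment` and `settings_history` helpers are hypothetical and do not come from any real medical device standard.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class AdjustmentRecord:
    """One instrument adjustment linked to a patient file (hypothetical schema)."""
    patient_id: str     # key into the patient's medical file
    device: str         # which instrument, e.g. "ventilator"
    setting: str        # which knob was turned
    old_value: float
    new_value: float
    unit: str
    operator_role: str  # e.g. "nurse" or "doctor"
    timestamp: str      # ISO 8601 string, always UTC in this sketch

    def to_json(self) -> str:
        # A common, device-independent wire format (JSON is an assumption).
        return json.dumps(asdict(self), sort_keys=True)


def record_adjustment(patient_id, device, setting, old_value, new_value,
                      unit, operator_role, when=None):
    """Build a record, stamping it with the current UTC time by default."""
    when = when or datetime.now(timezone.utc)
    return AdjustmentRecord(patient_id, device, setting, old_value,
                            new_value, unit, operator_role, when.isoformat())


def settings_history(records, patient_id, setting):
    """Return (timestamp, new_value) pairs for one patient and setting, oldest first."""
    rows = [(r.timestamp, r.new_value) for r in records
            if r.patient_id == patient_id and r.setting == setting]
    # ISO 8601 strings in the same timezone sort chronologically.
    return sorted(rows)
```

Once every device emits something like this, pulling the full history of one setting for one patient is a single call to `settings_history`, and comparing those histories across hospitals becomes an ordinary data mining problem.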

This has, of course, been a dream for a while of electronic health care records. It always struck me as obvious that you would attach x-rays, CT scans, descriptions of medicine given, etc., but it never occurred to me the level of detail you could go into! From a technical point of view this is hard – the data is so non-uniform, unlike the particle physics experiments I work on, but the long-term benefits could be quite good. The article describes what happened when this data mining technique was applied ad hoc in just a single hospital:

One widely circulated national study overseen by doctors at Massachusetts General Hospital had found an ARDS [Acute respiratory distress syndrome] survival rate of about 10 percent. For those in Intermountain’s study, the rate was 40 percent.

At any rate, this tickled my fancy, which is why I wrote about it. I found it ironic that on the Health home page yesterday there was also the following article:

Five years later, Medicare underwrites more than half of the $4 billion the nation now spends annually on defibrillators, but the agency is no closer to knowing how many lives that big investment is saving.

My impression of the health care bills working their way through Congress right now is that none of them really go after cost savings. Science can help*.

* Ok – making devices that can spit out data in a common format will add to their cost. But you can do simple things like the Intermountain study to start as better devices come online!



1. zyxo - November 8, 2009

“$4 billion the nation now spends annually on defibrillators, but the agency is no closer to knowing how many lives that big investment is saving”
Exactly the same is happening in marketing: half of the marketing budget is useless, only nobody knows which half. The only way to find out would be data mining.

2. gordonwatts - November 9, 2009

Right – I guess often the root of the problem, at least right now, is that we don’t have the data. Probably the same with marketing. Soon we’ll have so much data that the problem will be spotting what is important. At the Tevatron we all live in fear of some big discovery at the LHC, and when we go back and look at the Tevatron data we will find that we could have seen it had we only looked correctly!

3. Charlie - November 11, 2009

With information comes power, and with power comes great responsibility. I’ve no doubt that a common records system would provide great insights into the human body and indirectly prolong many lives, but one must be extremely careful not to overstep privacy, ethics, and social bounds. Just as our international postal system can be hamstrung by one evil character with the right white powder, an efficient records system has the potential to aid both the bad and the good.

As for the big discoveries – the answers are all within every cubic centimeter of vacuum and accessible to the energy stored in an automobile gas tank.

4. gordonwatts - November 11, 2009

Indeed, Charlie (though I don’t get your last comment). I was explicitly ignoring those issues in this post – though they are very real. I’m a far cry from being a privacy expert!! But there must be some good way to extract the useful information (or at least, more useful information than we are extracting right now).

5. Charlie - November 11, 2009

Whoops! Missed the privacy note in your second paragraph.

As for the second bit of my comment – the Planck energy is ~2×10^9 J, which the Wikipedia entry claims is equivalent to the chemical energy stored in a 60 L gasoline tank. A cubic centimeter of vacuum ought to be plenty large, if we’re sufficiently clever, for us to determine all of its high-energy properties.
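The arithmetic behind that comparison can be checked in a few lines of Python. This is only a rough sketch: the physical constants are the standard CODATA values, while the gasoline energy density of about 34.2 MJ per litre is an assumed typical figure, not something stated in this thread.

```python
import math

# Physical constants (CODATA values, SI units).
HBAR = 1.054571817e-34  # reduced Planck constant, J*s
C = 299_792_458.0       # speed of light, m/s
G = 6.67430e-11         # gravitational constant, m^3 kg^-1 s^-2

# Assumed typical chemical energy density of gasoline, in MJ per litre.
GASOLINE_MJ_PER_L = 34.2


def planck_energy_joules():
    """Planck energy E_P = sqrt(hbar * c^5 / G), in joules."""
    return math.sqrt(HBAR * C**5 / G)


def gasoline_energy_joules(litres):
    """Chemical energy in the given volume of gasoline, in joules."""
    return litres * GASOLINE_MJ_PER_L * 1e6


if __name__ == "__main__":
    print(f"Planck energy: {planck_energy_joules():.3e} J")
    print(f"60 L gasoline: {gasoline_energy_joules(60):.3e} J")
```

Under those assumptions both numbers come out close to 2×10^9 J, which is the equivalence the comment refers to.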

6. gordonwatts - November 12, 2009

No problems – the privacy issue is huge when it comes to this sort of thing – so it bears repeating.

So, we are trying to be clever – that is why we build those very large accelerators. Part of the problem is that the interactions we need to study in order to understand how the universe works don’t happen naturally – the natural energy level is too low. We need something like an accelerator to access those energies – which are similar to what we think was around during the Big Bang (when the natural energy scale was huge compared to today).

Or did I totally miss the point of the comment? 🙂
