
HEP in a Database March 19, 2008

Posted by gordonwatts in computers, D0, Uncategorized.

Not everyone is satisfied with ROOT as the “tool” to analyze HEP data. Back in D0’s Run I all the data was loaded into a commercial database.

So, before you roll your eyes: you are right. HEP is littered with database train wrecks (can anyone say Objectivity?). However, most of those had to do with trying to store every single last bit of data that came off the data acquisition system in the database. And then also store the reconstructed data. And then, in some cases, even the analysis level objects. In fact, ROOT grew out of disagreement with this vision (and you can tell who won…).

This project, however, was different. The goal was to store only the high level physics information. For a reconstructed jet, for example, they had the four vector and some other quantities (like the electromagnetic fraction of the calorimeter energy; 28 values in all). They had separate markers for tight, very high quality electrons and for loose, lower quality electrons. Same for muons, jets, etc. To understand the limitations of this, and what you might or might not do with this tool: if you changed your jet energy scale you would have to completely re-load the database. This is not something you do frequently, but you get the idea: this is for doing your final selection, the last mile of your analysis. Indeed, the test case was to repeat the Run 1 top discovery analysis. However, if you can do the selection quickly, imagine the power for scanning over a large SUSY parameter space!

How much data? About 62 million events. As raw ntuples it came to 62.4 GB (small by today’s standards, of course!). It took almost 1000 hours to generate these ntuples, applying the jet energy scale, etc. After being inserted into the database it was 80 GB of raw data, plus another 30 GB of database index data.

They used Microsoft’s SQL Server for this. On a qual 450 MHz Pentium II with 256 MB of memory. Does that tell you how long ago this experiment was done!?

Actually, their DB design was pretty clever. All electrons in one table, all jets in another. Then another table which just listed all tight electrons, and another one that listed all loose electrons, etc.
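To make that layout concrete, here is a minimal sketch of what it might look like. This is not the project's actual schema: I'm using Python's built-in sqlite3 instead of SQL Server, and every table and column name below is made up for illustration. I'm also assuming a tight electron shows up in the loose list as well.

```python
import sqlite3

# Hypothetical schema: names and columns are illustrative only, not the
# ones used in the real D0 database.
SCHEMA = """
CREATE TABLE electrons (
    electron_id INTEGER PRIMARY KEY,
    event_id    INTEGER NOT NULL,
    px REAL, py REAL, pz REAL, e REAL      -- four vector
);
CREATE TABLE jets (
    jet_id      INTEGER PRIMARY KEY,
    event_id    INTEGER NOT NULL,
    px REAL, py REAL, pz REAL, e REAL,     -- four vector
    em_fraction REAL                       -- one of the extra jet quantities
);
-- The quality tables hold nothing but pointers into the object table.
CREATE TABLE tight_electrons (
    electron_id INTEGER PRIMARY KEY REFERENCES electrons(electron_id)
);
CREATE TABLE loose_electrons (
    electron_id INTEGER PRIMARY KEY REFERENCES electrons(electron_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Two toy electrons in two different events (made-up numbers).
conn.executemany(
    "INSERT INTO electrons VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 100, 30.0, 0.0, 0.0, 30.0),
        (2, 101, 12.0, 0.0, 0.0, 12.0),
    ],
)
# Assumption: electron 1 passes both selections, electron 2 only the loose one.
conn.execute("INSERT INTO tight_electrons VALUES (1)")
conn.executemany("INSERT INTO loose_electrons VALUES (?)", [(1,), (2,)])


def events_with(conn, quality_table):
    """Return the sorted event ids that contain an electron of the given quality."""
    # Table names cannot be bound as SQL parameters, so check against a whitelist.
    if quality_table not in ("tight_electrons", "loose_electrons"):
        raise ValueError(f"unknown quality table: {quality_table}")
    rows = conn.execute(
        f"""
        SELECT DISTINCT e.event_id
        FROM {quality_table} AS q
        JOIN electrons AS e ON e.electron_id = q.electron_id
        ORDER BY e.event_id
        """
    )
    return [row[0] for row in rows]


tight_events = events_with(conn, "tight_electrons")
loose_events = events_with(conn, "loose_electrons")
print(tight_events, loose_events)
```

The nice thing about the separate quality tables is that picking tight electrons is a join against a short list of ids rather than a re-evaluation of the cuts on every row.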

So, how fast did this thing run? Looking for a Z boson decaying to two electrons took about 7 seconds. It found about 6000 events, which is the right number. Looking for a W boson decaying to an electron and a neutrino took about 18 seconds to find 86,000 events. That is pretty darn good!
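For a feel of what such a selection might look like as a query, here is a rough sketch of a Z to two electrons search. Again this is my own toy version in Python's sqlite3, not the queries the project ran: the table names, the 80 to 100 GeV mass window, and the index are all assumptions. The invariant mass squared of the pair is computed directly in SQL from the two four vectors.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical tables, named for illustration only.
    CREATE TABLE electrons (
        electron_id INTEGER PRIMARY KEY,
        event_id    INTEGER NOT NULL,
        px REAL, py REAL, pz REAL, e REAL
    );
    CREATE TABLE tight_electrons (
        electron_id INTEGER PRIMARY KEY REFERENCES electrons(electron_id)
    );
    -- An index on event_id lets the self-join below look up partners
    -- by event instead of scanning the whole electron table.
    CREATE INDEX idx_electrons_event ON electrons(event_id);
    """
)

# Toy data in GeV, electrons treated as massless (E = |p|).
conn.executemany(
    "INSERT INTO electrons VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 45.6, 0.0, 0.0, 45.6),   # event 1: back-to-back pair, m = 91.2
        (2, 1, -45.6, 0.0, 0.0, 45.6),
        (3, 2, 20.0, 0.0, 0.0, 20.0),   # event 2: back-to-back pair, m = 40
        (4, 2, -20.0, 0.0, 0.0, 20.0),
        (5, 3, 50.0, 0.0, 0.0, 50.0),   # event 3: only one electron
    ],
)
conn.executemany(
    "INSERT INTO tight_electrons VALUES (?)", [(i,) for i in range(1, 6)]
)

Z_QUERY = """
SELECT DISTINCT a.event_id
FROM tight_electrons AS ta
JOIN electrons AS a        ON a.electron_id = ta.electron_id
JOIN electrons AS b        ON b.event_id = a.event_id
                          AND b.electron_id > a.electron_id  -- each pair once
JOIN tight_electrons AS tb ON tb.electron_id = b.electron_id
WHERE (a.e  + b.e ) * (a.e  + b.e )
    - (a.px + b.px) * (a.px + b.px)
    - (a.py + b.py) * (a.py + b.py)
    - (a.pz + b.pz) * (a.pz + b.pz)
      BETWEEN :m2_low AND :m2_high
ORDER BY a.event_id
"""

# Assumed mass window of 80 to 100 GeV, compared as mass squared so the
# query does not need a square root.
m_low, m_high = 80.0, 100.0
z_events = [
    row[0]
    for row in conn.execute(
        Z_QUERY, {"m2_low": m_low**2, "m2_high": m_high**2}
    )
]
print(z_events)
```

Only event 1 has a tight pair whose mass lands inside the window; the 40 GeV pair and the single-electron event drop out.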

Are there plans to do this in ATLAS? Well, perhaps. We have a physics summary database, but it isn’t complete (e.g. it doesn’t have all the jets in an event). Its design goal is different: you use it to select a sample of events you actually want to run over.

The project was led by Rich Partridge at Brown University (with a lot of help from an undergraduate, Matt Bowen). For more raw information you can see a talk by Rich at a SLAC meeting the other day (CERN ATLAS agendas, look for meetings on Feb 27, the SLAC ATLAS forum).

At any rate, this was something I’ve been meaning to write about for a while. Unfortunately for an approach like this, about 95% of an analyzer’s time is spent trying to understand what exactly is a tight electron – and its fake rate. However, anything that makes for fast turn around is a boon in my book!

Comments»

1. Kevin - March 19, 2008

OK, this post forces me to ask this question: can you estimate the amount of time you spend on “science” questions versus methods development, programming or other things? After reading your blog (and enjoying it of course) for over two years now, I feel you have discussed a lot of issues in programming, ROOT, C++ vs other languages, computers and other things that are tools to do science but which are not themselves science (or at least not physics). I ask because a physics colleague of mine has recently been warned that he is doing too much methods development (and publishing on these developments). It would appear to me that the LHC has been for many years “methods development,” yet I’m assuming a couple people already have tenure on it and more than a couple PhDs have been awarded for developing the technology. Maybe I’m wrong, but it seems unfair for a single PI to be criticized for methods development by someone who has the backing and support of an international collaboration. Single PIs can’t do anything unique and original if they don’t develop their new methods, while collaborations always seem to have something to do.

2. gordonwatts - March 19, 2008

Excellent questions, Kevin. I’ll try to address them in actual posts. Probably starting Saturday or Monday.

3. Adam Kocoloski - March 19, 2008

Nice post — I’ve been thinking about trying this with my own analysis trees in STAR for some time now. It’s good to see a positive precedent.

4. Gordon Watts - March 20, 2008

The key to fast analysis is the index. That took a lot of tuning, and one can’t help but wonder if that tuning had to happen for every analysis (to get really optimal performance).

5. Tenure and Physics « Life as a Physicist - March 21, 2008

[…] March 21, 2008 Posted by gordonwatts in Tenure, university. trackback One one of my lasts posts about computers and HEP, Kevin left a comment. After reading your blog (and enjoying it of course) for over two years now, […]

6. Is the LHC Doing Physics? « Life as a Physicist - March 25, 2008

[…] Posted by gordonwatts in LHC, Tenure, physics life, university. trackback One one of my lasts posts about computers and HEP, Kevin left a comment. It would appear to me that the LHC has been for many years “methods […]

7. What Do I Spend My Time On? How do I Choose a Topic to Write About? « Life as a Physicist - March 25, 2008

[…] March 25, 2008 Posted by gordonwatts in blog, physics life. trackback One one of my lasts posts about computers and HEP, Kevin left a comment. OK, this post forces me to ask this question: can you estimate the amount of […]

