HEP in a Database
March 19, 2008
Posted by gordonwatts in computers, D0, Uncategorized.
Not everyone is satisfied with ROOT as the “tool” to analyze HEP data. Back in D0’s Run I, all of the data was loaded into a commercial database.
So, before you roll your eyes – you are right. HEP is littered with database train wrecks (can anyone say Objectivity?). However, most of those had to do with trying to store every single last bit of data that came off the data acquisition system in the database. And then also store the reconstructed data. And then, in some cases, even the analysis-level objects. In fact, ROOT grew out of disagreement with this vision (and you can tell who won…).
This project, however, was different. The goal was to store only the high-level physics information. For a reconstructed jet, for example, they had the four-vector and some other quantities (like the electromagnetic fraction of the calorimeter energy – 28 values in all). They had separate markers for tight (very high quality) electrons and for loose (lower quality) electrons, and the same for muons, jets, etc. To understand the limitations of this approach (and what you might or might not do with this tool): if you changed your jet energy scale you would have to completely reload the database. That is not something you do frequently, but you get the idea: this is for your final selection – the last mile of your analysis. Indeed, the test case was to repeat the Run I top discovery analysis. But if you can do selection that quickly, imagine the power for scanning over a large SUSY parameter space!
How much data? About 62 million events. As raw ntuples that was 62.4 GB (small by today’s standards, of course!), and it took almost 1000 hours to generate them – applying the jet energy scale, etc. Once inserted into the database it was 80 GB of raw data, plus another 30 GB of database index data.
They used Microsoft’s SQL Server for this, on a dual 450 MHz Pentium II with 256 MB of memory. Does that tell you how long ago this test was done!?
Actually, their DB design was pretty clever. All the electrons went into one table, all the jets into another. Then another table just listed all the tight electrons, another listed all the loose electrons, and so on.
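To make that concrete, here is a minimal sketch of what such a layout could look like, written against Python’s built-in sqlite3 so it runs anywhere. The table and column names are my own guesses for illustration – the real project used SQL Server and stored around 28 quantities per jet. Part of the appeal of the marker tables is that tightening or loosening an ID definition only means rewriting a small table of pointers, not the bulk electron data.

```python
import sqlite3

# Toy version of the table layout described above (names are illustrative).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- one row per reconstructed electron, keyed by event
    CREATE TABLE electrons (
        event_id INTEGER,
        elec_id  INTEGER,
        px REAL, py REAL, pz REAL, e REAL,
        PRIMARY KEY (event_id, elec_id)
    );

    -- one row per reconstructed jet (only a few of the ~28 stored values shown)
    CREATE TABLE jets (
        event_id INTEGER,
        jet_id   INTEGER,
        px REAL, py REAL, pz REAL, e REAL,
        em_fraction REAL
    );

    -- "marker" tables: just pointers into the electrons table, one listing
    -- the tight (high quality) electrons and one listing the loose ones
    CREATE TABLE tight_electrons (event_id INTEGER, elec_id INTEGER);
    CREATE TABLE loose_electrons (event_id INTEGER, elec_id INTEGER);
""")
```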
So, how fast did this thing run? Looking for a Z boson decaying to two electrons took about 7 seconds and found about 6000 events – the right number. Looking for a W boson decaying to an electron and a neutrino took about 18 seconds to find 86,000 events. That is pretty darn good!
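For flavor, here is a hedged sketch of what a Z → ee style selection could look like against the toy tables above: pair up the tight electrons in each event and cut on the pair’s invariant mass. The mass window and all the names are my own inventions; the real queries and schema may have looked quite different.

```python
import math
import sqlite3

def find_z_candidates(conn: sqlite3.Connection,
                      min_mass: float = 70.0, max_mass: float = 110.0):
    """Return (event_id, mass) for events with two tight electrons whose
    invariant mass falls inside a Z window.  Purely illustrative."""
    rows = conn.execute("""
        SELECT e1.event_id,
               e1.px + e2.px, e1.py + e2.py, e1.pz + e2.pz, e1.e + e2.e
        FROM tight_electrons t1
        JOIN tight_electrons t2
             ON t1.event_id = t2.event_id AND t1.elec_id < t2.elec_id
        JOIN electrons e1 ON e1.event_id = t1.event_id AND e1.elec_id = t1.elec_id
        JOIN electrons e2 ON e2.event_id = t2.event_id AND e2.elec_id = t2.elec_id
    """)
    candidates = []
    for event_id, px, py, pz, e in rows:
        # invariant mass of the electron pair from the summed four-vector
        mass2 = e * e - (px * px + py * py + pz * pz)
        mass = math.sqrt(mass2) if mass2 > 0 else 0.0
        if min_mass <= mass <= max_mass:
            candidates.append((event_id, mass))
    return candidates
```

The point of the design shows up here: the quality selection is just a join against a small marker table, so the heavy lifting stays inside the database engine.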
Are there plans to do this in ATLAS? Well, perhaps. We have a physics summary database – but it isn’t complete (it doesn’t have all the jets in an event, for example), and its design goal is different: you use it to select a sample of events that you then actually want to run over.
The project was led by Rich Partridge at Brown University (with a lot of help from an undergraduate, Matt Bowen). For more raw information you can see a talk Rich gave at a SLAC meeting the other day (CERN ATLAS agendas – look for the SLAC ATLAS forum meeting on Feb 27).
At any rate, this is something I’ve been meaning to write about for a while. Unfortunately for an approach like this, about 95% of an analyzer’s time is spent trying to understand what exactly a tight electron is – and what its fake rate is. However, anything that makes for a fast turnaround is a boon in my book!