Reinventing the wheel September 10, 2011Posted by gordonwatts in Analysis, computers, LINQToTTree, ROOT.
Last October (2010) my term came to and end running the ATLAS flavor-tagging group. It was time to get back to being a plot-making member of ATLAS. I don’t know how most people feel when they run a large group like this, but I start to feel separated from actually doing physics. You know a lot more about the physics, and your input affects a lot of people, but you are actually doing very little yourself.
But I had a problem. By the time I stepped down in order to even show a plot in ATLAS you had to apply multiple corrections: the z distribution of the vertex was incorrect, the transverse momentum spectrum of the jets in the Monte Carlo didn’t match, etc. Each of these corrections had to first be derived, and then applied before someone would believe your plot.
To make your one really great plot then, lets look at what you have to do:
- Run over the data to get the distributions of each thing you will be reweighting (jet pT, vertex z position, etc.).
- Run over the Monte Carlo samples to get the same thing
- Calculate the reweighting factors
- Apply the reweighting factors
- Make the plot you’d like to make.
If you are lucky then the various items you need to reweight are not correlated – so you can just run the one job on the Data and the one job on the Monte Carlo in steps one and two. Otherwise you’ll have to run multiple times. These jobs are either batch jobs that run on the GRID, or a local ROOT job you run on PROOF or something similar. The results of these jobs are typically small ROOT files.
In step three you have to author a small script that will extract the results from the two jobs in steps 1 and 2, and create the reweighting function. This is often no more difficult that dividing one histogram by another. One can do this at the start of the plotting job (the job you create for steps 4 and 5) or do ti at the command line and save the result in another ROOT file that serves as one of the inputs to the next step.
Steps 4 and 5 can normally be combined into one job. Take the results of step 3 and apply it as a weight to each event, and then plot whatever your variable of interest is, as a function of that weight. Save the result to another ROOT file and you are done!!
I don’t know about you, but this looked scary to me. I had several big issues with this. First, the LHC has been running gang-busters. This means having to constantly re-run all these steps. I’d better not be doing it by hand, especially as things get more complex, because I’m going to forget a step, or accidentally reuse an old result. Next, I was going back to be teaching a pretty difficult course – which means I was going to be distracted. So whatever I did was going to have to be able to survive me not looking at it for a week and then coming back to it… and me still being able to understand what I did! Mostly, the way I normally approach something like the above was going to lead to a mess of scripts and programs, etc., all floating around.
It took me three tries to come up with something that seems to work. It has some difficulties, and isn’t perfect in a number of respects, but it feels a lot better than what I’ve had to do in the past. Next post I’ll talk about my two failed attempts (it will be a week, but I promise it will be there!). After that I’ll discuss my 2011 Christmas project which lead to what I’m using this year.
I’m curious – what do others do to solve this? Mess of scripts and programs? Some sort of work flow? Makefiles?? What?? What I’ve outlined above doesn’t seem scalable!