The Kitchen Sink May 20, 2009

Posted by gordonwatts in ROOT.

While the plane was bouncing all over the map, it occurred to me that ROOT is a lot like Microsoft Office (indeed, any product like Office).

Looking at the list of libraries/packages in ROOT you might be tempted to call it bloatware. You’d be right, of course, but, just like MS Word and other similar programs, the libraries you think are important are different from the ones that someone else thinks are important. And almost all the libraries are available for separate use as well! But you’ll notice that few people use them that way.

The reason is obvious – simplicity.

The brilliance of large software packages like MS Office and ROOT is not that they innovate (though both certainly do); it is that they have taken tried and true ways of doing things and made them work together by packaging them up and building bridges between them. Take TMVA, for example, the package that allows you to easily implement various multivariate analysis techniques (like a boosted decision tree). You can get that kind of software separately – there are lots of packages out there. But TMVA is specifically designed to work with a TTree and other things in ROOT.

Once you get yourself into ROOT you have access to all these tools – and the potential barrier you have to get over to use them is minimal. You want to switch to something else? Of course you can do it – but it will be a lot of work! Just like MS Office. 😉

I think people complain about ROOT and how hard it is to use, but use it anyway for the same reasons they complain about MS Office and its daughter programs. Fortunately, for MS Office, there is real $$ involved so other companies are finally starting to compete – which I hope will make MS Office better. I don’t see anything like that coming along to challenge ROOT until after I retire (or there is a complete revolution in how we do data analysis in HEP).



1. tim head - May 21, 2009

“””I don’t see anything like that coming along to challenge ROOT until after I retire (or there is a complete revolution in how we do data analysis in HEP). “””

IMO that is exactly the problem, there is no competition. The plethora of web frameworks that exists comes to mind, most of them are not super (and have few users) but the good ones which survived are used by many (and borrowed some of the best ideas from the little ones).

Another worrying thing is that, as you said, it is nearly impossible to find a “problem” in particle physics which has not been addressed in ROOT. No single person can hold in their head all the different shovels and hammers available in ROOT, and the rather sparse documentation for exotic features does not help.

This makes me wonder if we were not better off having several smaller projects which are then tied together by some framework, again similar to how most good web frameworks do it (use the “best” templating engine, the “best” request handler, etc.). We would look for the “best” multivariate package, the “best” graph plotting package (which produces “pretty” plots by default), the “best” interactive shell, the “best” collection of useful snippets, etc. and bundle them together. The documentation and examples would then nearly come for free. There is so much great, modern software out there which we as HEP do not seem to take advantage of.

It would require a big effort and probably would not get off the ground because people would be reluctant to invest time to learn new things instead of publishing results.

2. gordonwatts - May 22, 2009

I know what you mean about different frameworks, but I don’t think that is possible given how today’s large HEP collaborations work. Certainly ATLAS and CMS and CDF and D0 are all large enough to support their own framework. But once they adopt one, I think pretty much the whole experiment has to stick with it – that is between 1000 and 2000 people that have to share code, and doing it in a common framework makes life a lot simpler. In ATLAS we do our best to remain framework independent (i.e. we have several), but there is a lot of extra work involved. Especially if you’ve spent the last 6 months developing your idea and now you see another 5 months of technical work ahead of you to make that idea publicly available because your experiment requires cross-framework compatibility!

One good thing about ROOT is it is quite pluggable. So it isn’t hard to add new libraries. Indeed, many of the new libraries that are being added in are just plug-ins. Most are not integrated into the core (not true of everything, of course).

We are where we are. And ROOT is “good enough” for now. The evolution that led to where we are today started in PAW with the concept of an n-tuple (remember how big a deal that was – no longer having to submit batch jobs to re-do a histogram limit?). ROOT is an evolution of PAW – but the data sets are back to where they were before (so large that making a histogram from a root-tuple requires a batch job again).

So we await the next revolution. And at that point I think we will have a chance to do it all over again.

Ideas!? 🙂

3. tim head - May 23, 2009

I wasn’t thinking so much about the framework for the whole collaboration. That is and should be something maintained centrally, changing only slowly with a well-defined release cycle, so people know when to get ready for an upgrade that will potentially break their code. Everyone uses the same versions, so one doesn’t have meetings where people get told that they have to redo things because they used the version from three weeks ago, which had a bug in it.

However, most people I know use the central framework to do very little clever stuff. Everyone tries to escape from it as early in the analysis chain as possible so that they don’t have to use it any more to make histograms or fiddle with a parameter in their code. I don’t know exactly how efficient ROOT’s ntuples/trees are (they are one of the things we should keep!), but let’s say one has 1,000,000 events at the nearly final analysis stage, maybe 2,000,000 to be on the safe side, with about ten objects each described by 8 doubles (4 for the four-vector and 4 for “other” stuff), plus some extra space for whatever bookkeeping the data structure needs:

((10 * (8 * 8 bytes)) + 32 bytes) * 1 000 000 = 672 000 000 bytes ≈ 641 megabytes (counting 1 megabyte as 2^20 bytes)

That fits into most people’s computer memory these days! I think for D0 2,000,000 events at the final stage is close to an upper limit; how it would be for LHC experiments I don’t know (I know of an LHCb background sample which contains ~80 million events, but that is before any cuts, and it also corresponds to only a few days of data taking).
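The back-of-the-envelope estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in this comment: the function name `estimate_bytes` and its parameter names are made up for illustration, and the 32 bytes of per-event bookkeeping is the same rough guess used above, not a measured property of any ROOT data structure.

```python
DOUBLE_BYTES = 8  # size of one double-precision float


def estimate_bytes(n_events, objects_per_event=10, doubles_per_object=8,
                   overhead_bytes=32):
    """Rough in-memory size of a flat event sample (illustrative only)."""
    # Payload per event: objects * doubles per object * bytes per double,
    # plus the guessed per-event bookkeeping overhead.
    per_event = objects_per_event * doubles_per_object * DOUBLE_BYTES
    per_event += overhead_bytes
    return per_event * n_events


if __name__ == "__main__":
    for n_events in (1_000_000, 2_000_000):
        total = estimate_bytes(n_events)
        print(f"{n_events:>9,} events: {total:,} bytes "
              f"= {total / 2**20:.0f} MiB")
```

For one million events this gives 672,000,000 bytes, which is about 641 MiB; doubling the sample to two million events doubles the size to roughly 1.3 GiB, still within reach of an ordinary desktop.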

The point being that I am unconvinced that, after you apply all corrections and “standard” stuff, you need a central framework per collaboration; what is needed is something which makes it easy to do useful things. Lots of things can be done in ROOT, but they are generally a bit tedious to do (often because you need to look things up and click three times through the docs before you find what you were looking for). Probably everyone already has helper snippets for these things, inherited from a previous grad student, but one can’t find them publicly available. A friend showed me this:

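# Get and StackLegend are the friend's own helper functions, not part of ROOT.
tree.Draw("foo>>fhist", "mycuthere")  # fill histogram fhist from foo, with the cut applied
tree.Draw("bar>>bhist", "mycuthere")  # fill histogram bhist from bar, with the same cut

fhist, bhist = Get("fhist", "bhist")
st = StackLegend("a stack", "TL",
                 ((fhist, kRed, "foo"),
                  (bhist, kBlue, "bar")))
st.Draw("e0 hist")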

I don’t want to know how much work is going on inside StackLegend to make it all happen, but this is nice code for someone who doesn’t want to be bothered with code, so why is it not something available “by default”?

Enough of my ranting. I always thought the best way to solve this is to start a project collecting these things. I managed to check in a README to see if the repository hosting was working and then never had time to do anything more.

