CHEP Trends: Libraries May 24, 2012Posted by gordonwatts in Analysis, computers.
I’m attending CHEP – Computers in High Energy Physics – which is being hosted by New York University this year, in New York City. A lot of fun – most of my family is on the east coast so it is cool to hang out with my sister and her family.
CHEP has been one my favorite conference series. For a while I soured on it as the GRID hijacked it. Everything else – algorithms, virtualization, etc., is making a come back now and makes the conference much more balanced and more interesting, IMHO.
There were a few striking themes (no, one of them wasn’t me being a smart-a** – that has always been true). Rene Brun, one of the inventors of ROOT, gave a talk about the history of data analysis. Check out this slide:
These two slides show, I think, two very different approaches to software architecture. In Rene’s slide, note that all the libraries are coalescing into a small number of projects (i.e. ROOT and GEANT). As anyone who has used ROOT knows, it is a bit of a kitchen sink. The Cloudera platform, on the other hand, is a project built of many small libraries mashed together. Some of them are written in-house, others are written by other groups. All open source (as far as I could understand from the talk). This is the current development paradigm in the open source world: make lots of libraries that end-programing can put together like Lego blocks.
This trend in the web world is, I think, the result of at least two forces at place: the rapid release cycle and the agile programming approach. Both mean that you want to develop small bits of functionality in isolation, if possible, which can then be rapidly integrated into the end project. As a result, development can proceed a pace on both projects, independently. However, a powerful side-effect is it also enables someone from the outside to come along and quickly build up a new system with a few unique aspects – in short, innovate.
I’ve used the fruits of this in some of my projects: it is trivial to download an load a library into one of my projects and with almost no work I’ve got a major building block. HTML parsers, and combinator parsers are two that I’ve used recently that have meant I could ignore some major bits of plumbing, but still get a very robust solution.
Will software development in particle physics ever adopt this strategy? Should it? I’m still figuring that out.