
Source Code In ATLAS June 11, 2011

Posted by gordonwatts in ATLAS, computers.
3 comments

I got asked in a comment what, really, was the size in lines of the source code that ATLAS uses. I have an imperfect answer. About 7 million total. This excludes comments in the code and blank lines in the code.

The breakdown is a bit under 4 million lines of C++ and almost 1.5 million lines of Python – the two major programming languages used by ATLAS. Additionally, in those same C++ source files there are about another million blank lines and almost a million lines of comments. Python contains similar fractions.
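The post does not say which tool produced these numbers, so what follows is only a rough sketch of how a code/comment/blank tally can work for C++ source. The names `LineCounts` and `countLines` are made up for illustration, and the classifier is deliberately simple.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Tallies for one chunk of source text (hypothetical helper type).
struct LineCounts {
    std::size_t code = 0;
    std::size_t comment = 0;
    std::size_t blank = 0;
};

// Classify each line as code, comment, or blank.
// Simplifications: a line counts as a comment only when it starts with a
// comment marker or sits inside a /* ... */ block that began at the start
// of a line; comment markers inside string literals or after code on the
// same line are not tracked.
LineCounts countLines(const std::string& source) {
    LineCounts counts;
    std::size_t total = 0;
    bool inBlock = false;  // true while inside a /* ... */ comment
    std::istringstream in(source);
    std::string line;
    while (std::getline(in, line)) {
        ++total;
        const std::size_t first = line.find_first_not_of(" \t\r");
        if (first == std::string::npos) {
            ++counts.blank;
        } else if (inBlock) {
            ++counts.comment;
            if (line.find("*/") != std::string::npos) inBlock = false;
        } else if (line.compare(first, 2, "//") == 0) {
            ++counts.comment;
        } else if (line.compare(first, 2, "/*") == 0) {
            ++counts.comment;
            if (line.find("*/", first + 2) == std::string::npos) inBlock = true;
        } else {
            ++counts.code;
        }
    }
    // Sanity check: every line landed in exactly one bucket.
    assert(counts.code + counts.comment + counts.blank == total);
    return counts;
}
```

Summing `countLines` over every file in a release gives the three totals separately, which is the kind of split quoted above.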

There are 7 lines of LISP. Which was probably an accidental check-in. Once the build runs the # of lines of source code balloons almost a factor of 10 – but that is all generated code (and HTML documentation, actually) – so shouldn’t count in the official numbers.

This is imperfect because these are just the files that are built for the reconstruction program. This is the main program that takes the raw detector signals and converts them into high level objects (electrons, muons, jets, etc.). There is another large body of code – the physics analysis code. That is the code that takes those high level objects and converts them into actual interesting measurements – like a cross section, or a top quark mass, or a limit on your favorite SUSY model. That is not always in a source code repository, and is almost impossible to get an accounting of – but I would guess that it was about another x10 or so in size, based on experience in previous experiments.

So, umm… wow. That is big. But it isn’t quite as big as I thought! I mentioned in the last post talking about source control that I was worried about the size of the source and checking it out. However, Linux is apparently about 13.5 million lines of code, and uses one of these modern source control systems. So, I guess these things are up to the job…

Can’t It Be Easy? June 8, 2011

Posted by gordonwatts in ROOT.
3 comments

Friday night. A truly spectacular day in Seattle. I had to take half of it off and was stuck outdoors hanging out with Julia. Paula is on a plane to Finland. I’ve got a beer by my side. A YouTube video of a fire in a fireplace.  Hey. I’m up for anything.

So, let’s tackle a ROOT problem.

ROOT is weird. It has made it very easy to do very simple things. For example, want to draw a previously made histogram? Just double click and you’re done. Want to see what the data in one of your TTrees looks like? Just double click on the leaf and it pops up! But, the second you want to do something harder… well, it is much harder. I’d say it was as hard to do something advanced as it was to do something intermediate in ROOT.

Plotting is an example.

Stacking the Plots

I have four plots, and I want to plot them on top of each other so I can compare them. If I do exactly what I learned how to do when I learned to plot one thing, I end up with the following:

image

Note all the lines are black, thin, and on top of each other. No legend. And that “stats” box in the upper right contains data relevant only to the first plot. The title strip is also only for the first plot. Grey background. Lousy font. It should probably have error bars but that is for a later time.

h1->Draw();
h2->Draw("SAME");
h3->Draw("SAME");
h4->Draw("SAME");

So, everyone has to make plots like this. This should be “easy” to make it look good! I suspect with a simple solution 90% of the folks who use ROOT would be very happy!

So, someone must have thought of this, right? Turns out… yes. It is called THStack. Its interface is dirt simple:

THStack *s = new THStack();
s->Add(h1);
s->Add(h2);
s->Add(h3);
s->Add(h4);
s->Draw("nostack");

and we end up with the following:

image

THStack actually took care of a lot of stuff behind our backs. It matched up the axes, it made sure the max and min of the plot were correct, removed the stats box, and killed off the title. So this is a big win for us! Thanks to the ROOT team. But we are not done. I don’t know about you, but I can’t tell what is what on there!
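To make the "max and min" part concrete: overlaying several histograms on one set of axes needs a y-range that covers all of them, not just the first one drawn. The sketch below shows that bookkeeping on plain vectors of bin contents. It is not ROOT's implementation, and `commonRange` and `BinContents` are names invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Bin contents of one histogram (stand-in for a real histogram class).
using BinContents = std::vector<double>;

// Return {min, max} over every bin of every histogram, so a single pair of
// axis limits can hold all the curves at once.
std::pair<double, double> commonRange(const std::vector<BinContents>& histos) {
    assert(!histos.empty() && !histos.front().empty());
    double lo = histos.front().front();
    double hi = lo;
    for (const auto& bins : histos) {
        for (double v : bins) {
            lo = std::min(lo, v);
            hi = std::max(hi, v);
        }
    }
    return {lo, hi};
}
```

With the plain "SAME" approach the axes are fixed by the first histogram, so any later histogram with a taller peak gets clipped. Computing the range over all of them first avoids that.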

Color

There are two options for telling the plots apart: color the lines or make them different patterns (dots, dashes, etc.). I am, fortunately, not color blind, and tend to choose color as my primary differentiator. ROOT defines a number of nice colors for you in the EColor enumeration… but you can’t really use it out of the box. Charitably, I would say the colors were designed to look good on the printed page – some of them are a disaster on a CRT, LCD, or beamer.

First, under no circumstances, under no situation, never. EVER. use the color kYellow. It is almost like using White on a White background. Just never do it. If you want a yellowish color, use kOrange as the color. At least, it looks yellow to me.

Second, try to avoid the default kGreen color. It is a fluorescent green. On a white or grey background it tends to bleed into the surrounding colors or backgrounds. Instead, use a dark green color.

Do not use both kPink and kRed on the same plot – they are too close together. kCyan suffers the same problem as kGreen, so don’t use it. kSpring (yes, that is the name) is another color that is too bright a green to be useful – stay away if you can.

After playing around a bit I settled on these colors for my automatic color assignment: kBlack, kBlue, TColor::GetColorDark(kGreen), kRed, kViolet, kOrange, kMagenta. The TColor class has some nice palettes (right there in the docs, even). But one thing it doesn’t have that it really should is what the constituents of EColor look like. These are the things that you are most likely to use.

Colors are tricky things. The thickness of the line can make a big difference, for example. The default 1 pixel line width isn’t enough in my opinion to really show off these colors (more on fixing that below).
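The automatic color assignment mentioned above boils down to handing out colors from a fixed list and wrapping around when the list runs out. Here is a minimal sketch of that idea. The class name `ColorLoop` is borrowed from the C# code at the end of this post, but this C++ version is an illustration, not that code. The colors are plain strings so the sketch compiles without ROOT, and "dark kGreen" stands in for the darkened green.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hands out line colors in a fixed order and wraps around at the end.
// In a real ROOT macro the palette would hold ROOT color constants instead
// of strings.
class ColorLoop {
public:
    explicit ColorLoop(std::vector<std::string> palette = {
        "kBlack", "kBlue", "dark kGreen", "kRed",
        "kViolet", "kOrange", "kMagenta"})
        : palette_(std::move(palette)) {
        assert(!palette_.empty());  // an empty palette cannot be cycled
    }

    // Return the next color, starting over after the last one.
    const std::string& nextColor() {
        const std::string& color = palette_[next_];
        next_ = (next_ + 1) % palette_.size();
        return color;
    }

private:
    std::vector<std::string> palette_;
    std::size_t next_ = 0;
};
```

With seven entries, an eighth histogram reuses the first color, which is a hint that the plot is probably too crowded anyway.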

After applying the colors I end up with a plot that looks like the following:

image

A Legend and Title

So the plot is starting to look ok… at least, I can tell the difference between the various things. But darned if I can tell what each one is! We need a legend. Now, ROOT comes with the TLegend object. So, we could do all the work of cycling through the histograms and putting up the proper titles, etc. However, it turns out there is a very nice short-cut provided by the ROOT folks: TPad::BuildLegend. So, just using the code:

c1->BuildLegend();

where c1 is the pointer to the current TCanvas (the one most often used when you are running from the command line). See below for its effect. The automatic legend has some problems – mainly that it doesn’t automatically detect the best placement when drawing for a stack of histograms (left, right, up or down). One can think of a simple algorithm that would get this right most of the time. But that is for another day.
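One guess at what such a simple placement algorithm could look like: find the tallest bin in the left half and in the right half of the plot, and put the legend at the top of whichever side has the lower peak. The sketch below works on plain bin contents and assumes they are non-negative. `LegendSide` and `pickLegendSide` are hypothetical names, not part of ROOT.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

enum class LegendSide { Left, Right };

// Pick the side of the plot whose tallest bin is lowest, since that is
// where a legend in the top corner is least likely to cover data.
// Assumes non-negative bin contents. Ties go to the right, an arbitrary
// choice made for this sketch.
LegendSide pickLegendSide(const std::vector<std::vector<double>>& histos) {
    assert(!histos.empty());
    double leftPeak = 0.0;
    double rightPeak = 0.0;
    for (const auto& bins : histos) {
        const std::size_t half = bins.size() / 2;
        for (std::size_t i = 0; i < bins.size(); ++i) {
            double& peak = (i < half) ? leftPeak : rightPeak;
            peak = std::max(peak, bins[i]);
        }
    }
    return (leftPeak < rightPeak) ? LegendSide::Left : LegendSide::Right;
}
```

As far as I know, BuildLegend also accepts explicit corner coordinates, so the chosen side could be turned into a set of coordinates and passed in. A falling spectrum would put the legend on the right, a rising one on the left.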

Next, I’d like to have a decent title up there, similar to what was there previously. This is also easy – we just pass it in when we create the stack of histograms.

THStack *s = new THStack("histstack", "WeightSV0");

And we now have something that is at least scientifically serviceable:

image

One thing to note here – there are no x-axis labels. If you add an x-axis label to your plot the THStack doesn’t copy it over. I’d call that a bug, I suppose.

Background And Lines And Fonts

We are getting close to what I think the plot should look like out of the box. The final bit is basically pretty-printing. Note the very ugly white-on-grey around the lines in the Legend box. Or the font (it is pixelated, even when the plot is blown up). Or (to me, at least) the lines are too thin, etc. This plot wouldn’t even make it past first base if you tried to submit it to a journal.

ROOT has a fairly nice system for dealing with this. All plots and other graphing functions tend to take their cues from a TStyle object. This defines the background, etc. The default set in ROOT is what you get above. HOWEVER… it looks like that is about to change with the new version of ROOT.

Now, a TStyle is funny. A style is applied when you draw the histograms… but it is also applied when the histogram is created. So to really get it right you have to have the proper style applied both when you create and when you draw the histogram. In short: I have an awful time with TStyle! I’m left with the choice of either setting everything in code when I do the drawing, or applying a TStyle everywhere. I’ve gone with the latter. Here is my rootlogon.C file, which contains the TStyle definition. But even this isn’t perfect. After a bunch of work I basically gave up, I’m afraid, and I ended up with this (note the #@*@ title box still has that funny background):

image

Conclusion

So, if you’ve made it this far I’m impressed. As you can tell, getting ROOT to draw nice plots is not trivial. This should work out of the box: using the “SAME” option from the first snippet, we should get behavior that looks a lot like this last plot.

Finally, a word on object ownership. ROOT is written in C++, which means it is very easy to delete an object that is being referenced by some other bit of the system. As a result, code has to carefully keep track of who owns what and when. For example, if I don’t write out the Canvas that I’ve generated right away, sometimes my canvases somehow come out blank. This is because something has deleted the objects from under me (it was my program obviously, but I have no idea what did it). Reference counting would have been the right way to go, but ROOT was started too long ago. Perhaps it is time for someone to start again? ;-)

The code I used to make the above appears below. My actual code does more (for example, it will take the legend and automatically turn it into “lightJets”, “charmJets”, etc., instead of the full blown titles you see there). It is, obviously, not in C++, but the algorithm should be clear!
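To illustrate what reference counting would buy here, the sketch below uses `std::shared_ptr` with two stand-in types. `Histogram` and `Canvas` are toy classes written for this example, not the ROOT ones, and ROOT does not work this way. The point is only that a canvas holding a counted reference cannot have its histogram deleted out from under it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a histogram.
struct Histogram {
    std::string title;
};

// Toy canvas that shares ownership of everything drawn on it.
class Canvas {
public:
    void draw(std::shared_ptr<Histogram> h) {
        assert(h != nullptr);  // drawing nothing is a caller mistake
        drawn_.push_back(std::move(h));
    }
    std::size_t size() const { return drawn_.size(); }
    const Histogram& at(std::size_t i) const { return *drawn_.at(i); }

private:
    std::vector<std::shared_ptr<Histogram>> drawn_;
};
```

If the code that created a histogram lets go of its own pointer, the histogram stays alive for as long as the canvas still refers to it, and is destroyed automatically when the last reference goes away. With that scheme a canvas written out later could not come out blank for this reason.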

        public static ROOTNET.Interface.NTCanvas PlotStacked(this ROOTNET.Interface.NTH1F[] histos, string canvasName, string canvasTitle,
            bool logy = false,
            bool normalize = false,
            bool colorize = true)
        {
            if (histos == null || histos.Length == 0)
                return null;

            var hToPlot = histos;

            ///
            /// If we have to normalize first, we need to normalize first!
            /// 

            if (normalize)
            {
                hToPlot = (from h in hToPlot
                           let clone = h.Clone() as ROOTNET.Interface.NTH1F
                           select clone.Normalize()).ToArray();
            }

            ///
            /// Reset the colors on these guys
            /// 

            if (colorize)
            {
                var cloop = new ColorLoop();
                foreach (var h in hToPlot)
                {
                    h.LineColor = cloop.NextColor();
                }
            }

            ///
            /// Use the nice ROOT utility THStack to make the plot
            /// 

            var stack = new ROOTNET.NTHStack(canvasName + "StacK", canvasTitle);
            foreach (var h in hToPlot)
            {
                stack.Add(h);
            }

            ///
            /// Now do the plotting. Use the THStack to get all the axis stuff correct.
            /// If we are plotting a log plot, then make sure to set that first before
            /// calling it as it will use that information during its painting.
            /// 

            var result = new ROOTNET.NTCanvas(canvasName, canvasTitle);
            result.FillColor = ROOTNET.NTStyle.gStyle.FrameFillColor; // This is not a sticky setting!
            if (logy)
                result.Logy = 1;
            stack.Draw("nostack");

            ///
            /// And a legend!
            /// 

            result.BuildLegend();

            ///
            /// Return the canvas so it can be saved to the file (or whatever).
            /// 

            return result;
        }

        /// <summary>
        /// Normalize this histo and return it.
        /// </summary>
        /// <param name="histo"></param>
        /// <returns></returns>
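        public static ROOTNET.Interface.NTH1F Normalize(this ROOTNET.Interface.NTH1F histo, double toArea = 1.0)
        {
            // Guard against empty histograms: dividing by a zero integral
            // would fill the bins with infinities or NaNs.
            var integral = histo.Integral();
            if (integral != 0.0)
                histo.Scale(toArea / integral);
            return histo;
        }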
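For readers who would rather see the normalization step in C++, here is a minimal sketch on a plain vector of bin contents. It treats the integral as the simple sum of the bins, and `normalize` is a free function written for this example, not a ROOT call.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Scale the bin contents so that they sum to toArea. A histogram whose
// bins sum to zero is returned unchanged rather than divided by zero.
std::vector<double> normalize(std::vector<double> bins, double toArea = 1.0) {
    assert(toArea > 0.0);  // a non-positive target is almost surely a mistake
    const double total = std::accumulate(bins.begin(), bins.end(), 0.0);
    if (total == 0.0) {
        return bins;
    }
    const double scale = toArea / total;
    for (double& b : bins) {
        b *= scale;
    }
    return bins;
}
```

Taking the vector by value means the caller's histogram is left alone, much like the clone made before normalizing in the code above.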

Yes, We may Have Made a Mistake. June 3, 2011

Posted by gordonwatts in ATLAS, computers.
9 comments

No, no. I’m not talking about this. A few months ago I wondered if, short of generating our own reality, ATLAS made a mistake. The discussion was over source control systems:

Subversion, Mercurial, and Git are all source code version control systems. When an experiment says we have 10 million lines of code – all that code is kept in one of these systems. The systems are fantastic – they can track exactly who made what modifications to any file under their control. It is how we keep anarchy from breaking out as >1000 people develop the source code that makes ATLAS (or any other large experiment) go.

Yes, another geeky post. Skip over it if you can’t stand this stuff.

ATLAS switched some time ago from a system called cvs to svn. The two systems are very much alike: centralized, top-down control. Old school. However, the internet happened. And, more to the point, the Cathedral and the Bazaar happened. New source control systems have sprung up. In particular, Mercurial and git. These systems are distributed. Rather than asking for permission to make modifications to the software, you just point your source control client at the main source and hit copy. Then you can start making modifications to your heart’s content. When you are done you let the owner of the repository know and tell them where your repository is – and they then copy your changes back! The key here is that you had your own copy of the repository – so you could make multiple modifications w/out asking the owner. Heck, you could even send your modifications to your friends for testing before asking the owner to copy them back.

That is why it is called distributed source control. Heck, you can even make modifications to the source at 30,000 feet (when no wifi is available).

When I wrote that first blog post I’d never tried anything but the old school source controls. I’ve now spent the last 5 months using Mercurial – one of the new style systems. And I’m sold. Frankly, I have no idea how you’d convert the 10 million+ lines of code in ATLAS to something like this, but if there is a sensible way to convert to git or mercurial then I’m completely in favor. Just about everything is easier with these tools… I’ve never done branch development in SVN, for example. But in Mercurial I use it all the time… because it just works. And I’m constantly flipping my development directory from one branch to another because it takes seconds – not minutes. And despite all of this I’ve only once had to deal with merge conflicts. If you look at SVN the wrong way it will give you merge conflicts.

All this said, I have no idea how git or Mercurial would scale. Clearly it isn’t reasonable to copy the repository for 10+ million lines of code onto your portable to develop one small package. But if we could figure that out, and if it integrated well into the ATLAS production builds, well, that would be fantastic.

If you are starting a small standalone project and you can choose your source control system, I’d definitely recommend trying one of these two modern tools.
