
Trends in Triggering: Offline to online June 5, 2015

Posted by gordonwatts in ATLAS, LHC, Trigger.
2 comments

The recent LHCC open meeting is a great place to look to see the current state of the Large Hadron Collider’s physics program. While watching the talks I had one of those moments. You know – where suddenly you realize something that you’d seen here and there isn’t just something you’d seen here and there, but that it is a trend. It was the LHCb talk that drove it home for me.

The trend: experiments are moving their offline reconstruction code into the online trigger. There are many reasons this is desirable, which I'll get to in a second, but the reason everyone is starting to do it now is that it has finally become possible. Moore's law is at the root of this, along with the fact that we take software more seriously than we used to.

First, some context. Software in the trigger lives in a rather harsh environment. Take the LHC. Every 25 ns a new collision occurs. The trigger must decide if that collision is interesting enough to keep, or not. Interesting, of course, means cool physics like a collision that might contain a Higgs or perhaps some new exotic particle. We can only afford to save about 1000 events per second. Afford, by the way, is the right word here: each collision we wish to save must be written to disk and tape, and must be processed multiple times, consuming CPU cycles each time. It turns out the cost of CPU cycles is the driver here.
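To put some numbers on that: a collision every 25 ns is a 40 MHz collision rate, and keeping only about 1000 events per second means throwing away all but roughly one in 40,000 collisions. A quick back-of-the-envelope sketch, using the round numbers above (nothing official):

```python
# Back-of-the-envelope trigger rates, using the round numbers quoted above
# (not official ATLAS figures).
bunch_spacing_s = 25e-9                     # a collision every 25 ns
collision_rate_hz = 1 / bunch_spacing_s     # ~40 million collisions per second
output_rate_hz = 1000                       # roughly what we can afford to save

rejection_factor = collision_rate_hz / output_rate_hz
print(f"Collision rate: {collision_rate_hz:.2e} Hz")                      # ~4e7 Hz
print(f"Must reject about {rejection_factor:,.0f} collisions per one kept")  # ~40,000
```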

Even with modern processors 25 ns isn't a lot of time. As a result we tend to divide our trigger into levels. Traditionally the first level is hardware – fast and simple – and can make a decision in the first 25 ns. A second level is often a combination of specialized hardware and standard PCs; it can take a little longer to make its decision. And the third level is usually a farm of commodity PCs (think GRID or cloud computing). Each level gets to take a longer amount of time and make more careful calculations before reaching its decision. Already Moore's law has basically eliminated Level 2. At the Tevatron, DZERO had a hardware/PC Level 2; ATLAS had a PC-only Level 2 in the 2011–2012 run, and now even that is gone in the run that just started.

Traditionally the software that ran in the 3rd level trigger (often called a High Level Trigger, or HLT for short) consisted of carefully optimized, custom-designed algorithms. Often only a select part of the collaboration wrote these, and there were lots of coding rules involved to make sure extra CPU cycles (time) weren't wasted. CPU is of utmost importance here, and every additional physics feature must be balanced against the CPU cost. The trigger will find charged particle tracks, but perhaps only the ones that can be found quickly (e.g. the obvious ones). The ones that take a little more work get skipped in the trigger because they would take too much time!
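To make the trade-off concrete, here is a toy illustration (my own sketch, not ATLAS trigger code; the budget, thresholds, and event fields are invented): the online selection runs a cheap, approximate track finder under a strict per-event CPU budget, while offline can afford the slow, thorough version.

```python
import time

TIME_BUDGET_S = 0.040   # pretend the HLT allows ~40 ms of CPU per event (made up)

def fast_track_finder(event):
    """Cheap pattern recognition: keep only the 'obvious' high-pT track seeds."""
    return [t for t in event["track_seeds"] if t["pt"] > 5.0]

def full_track_finder(event):
    """Thorough (slow) offline-style reconstruction: picks up the harder tracks too."""
    return [t for t in event["track_seeds"] if t["pt"] > 0.5]

def hlt_decision(event):
    """Accept the event if the fast pass finds at least two good tracks."""
    start = time.perf_counter()
    tracks = fast_track_finder(event)          # must fit inside the budget
    assert time.perf_counter() - start < TIME_BUDGET_S, "blew the CPU budget!"
    return len(tracks) >= 2
```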

Offline, on the other hand, was a different story. Offline refers to reconstruction code – this is code that runs after the data is recorded to tape. It can take its time – it can carefully reconstruct the data, looking for charged particle tracks anywhere in the detector, applying the latest calibrations, etc. This code is written with physics performance in mind, and traditionally, CPU and memory performance have been secondary (if that). Generally the best algorithms run here – if a charged particle track can be found by an algorithm, this is where that algorithm will reside. Who cares if it takes 5 seconds?

Traditionally, these two code bases have been exactly that: two code bases. But this does cause some physics problems. For example, you can have a situation where your offline code will find an object that your trigger code does not, or vice versa. And thus when it comes time to understand how much physics you’ve actually written to tape – a crucial step in measuring a particle like the Higgs, or searching for something new – the additional complication can be… painful (I speak from experience!).

Over time we’ve gotten much better at writing software. We now track performance in a way we never have before: physics, CPU, and memory are all measured on releases built every night. With modern tools we’ve discovered that… holy cow!… applying well known software practices means we can have our physics performance and CPU and memory performance too! And in the few places that just isn’t possible, there are usually easy knobs we can turn to reduce the CPU requirements. And even if we have to make a small CPU sacrifice, Moore’s law helps out and takes up the slack.

In preparation for Run 2 at the LHC, ATLAS went through a major software re-design. One big effort was to move as many of the offline algorithms into the trigger as possible. This was a big job – the internal data structures had to be unified, and offline algorithms' CPU performance was examined in a way it never had been before. In the end ATLAS will have less software to maintain, and it will have (I hope) more understandable reconstruction performance when it comes to doing physics.

LHCb is doing the same thing. I've seen discussions of new experiments running their offline reconstruction in real time and writing out only its output. Air shower arrays searching for large cosmic-ray showers often do quite a bit of final processing in real time. All of this made me think these were not isolated occurrences. I don't think anyone has labeled this a trend yet, but I'm ready to.

By the way, this does not mean offline code and algorithms will disappear. There will always be versions of an algorithm that use huge amounts of CPU power to get the last 10% of performance. And the offline code is not run until several days after the data is taken, in order to make sure the latest and greatest calibration data has been distributed. This calibration data is much more fine grained (and recent) than what is available to the trigger. Though as Moore's law and our ability to better engineer the software improve, perhaps even this will disappear over time.

Pi Day–We should do it more! March 15, 2015

Posted by gordonwatts in ATLAS, Outreach, physics life.
add a comment

[Photo: the packed Pi-day exhibit room at MuCEM]

Today was Pi day. To join in the festivities, I took my kid to the Pi-day exhibit at MuCEM, the fancy new museum they built here in Marseille in 2013. The room was on the top floor, and it was packed with people (sorry for the poor quality of the photo; my cell phone doesn't handle the sun pouring in the windows well!). It was full of tables with various activities, all having to do with mathematics – puzzles and games that ranged from logic to group theory. It was very well done, and the students were enthusiastic and very helpful. They really wanted nothing more than to be there on a Saturday with this huge crowd of people. For the 45 minutes we were exploring, everyone seemed to be having a good time.

And when I say packed, I really do mean packed. When we left, the fire marshals had arrived and were carefully counting people. The folks (all students from nearby universities) were carefully making sure that only one person went in for each person who went out.

Each time I go to or participate in one of these things, I'm reminded how much the public likes it. The Particle Fever movie is an obvious and really big recent example. It was shown over here in Marseille in a theater for the first time about 6 months ago. The theater sold out! This was not uncommon back in the USA (though sometimes smaller audiences happened as well!). The staging was genius: the creator of the movie is a fellow physicist, and each time a town would do a showing he would get in contact with some of his friends to do a Q&A after the movie.

Another big one I helped put together was the Higgs announcement on July 3, 2012, in Seattle. There were about six of us. It started at midnight and went on till 2 am (closing time). At midnight, on a Tuesday night, there were close to 200 people there! We'd basically packed the bar. The bar had to kick us out: people were still peppering us with questions as we tried to leave at closing time. It was a lot of fun for us, and it looked like a lot of fun for everyone else who attended.

I remember the planning stages for that clearly. We had contingency plans in case no one showed up, or for how to alter our presentation if there were only 5 people. I think we were hoping for about 40 or so. And almost 200 showed up. I think most of us did not think the public was interested. This attitude is pretty common – "why would they care about the work we do?" is a common theme in conversations about outreach. And it is demonstrably wrong. 🙂

The lesson for people in these fields: people want to know about this stuff! And we should figure out how to do these public outreach events more often. Some cost a lot and are years in the making (e.g. the movie Particle Fever), but others are easy – for example, Science Cafés around the USA.

And we should do them in more varied ways. For example, some friends of mine have come up with a neat way of looking for cosmic rays – using your cell phone (the most interesting conversation about this project can be found on Twitter). What a great way to get everyone involved!

And there are selfish reasons for us to do these things! A lot of funding for science comes from various government agencies in the USA and around the world (be it local or federal), and the more the public knows about what is being done with their tax dollars, and what interesting results are being produced, the better. Sure, there are people who will never be convinced, but there are also a lot who will become even more enthusiastic.

So… what are your next plans for an outreach project?

The Higgs. Whaaaa? July 6, 2012

Posted by gordonwatts in ATLAS, CMS, Higgs, LHC, physics, press.
9 comments

Ok. This post is for all my non-physics friends who have been asking me… What just happened? Why is everyone talking about this Higgs thing!?

It does what!?

Actually, two things. It gives fundamental particles mass. Not much help, eh? 🙂 Fundamental particles are, well, fundamental – the most basic things in nature. We are made out of arms & legs and a few other bits. Arms & legs and everything else are made out of cells. Cells are made out of molecules. Molecules are made out of atoms. Note we've not reached anything fundamental yet – we can keep peeling back the layers of the onion and peer inside. Inside the atom are electrons in a cloud around the nucleus. Yes! We've got a first fundamental particle: the electron! Everything we've done up to now says it stops with the electron. There is nothing inside it. It is a fundamental particle.

We aren’t done with the nucleus yet, however. Pop that open and you’ll find protons and neutrons. Not even those guys are fundamental, however – inside each of them you’ll find quarks – about 3 of them. Two “up” quarks and a “down” quark in the case of the proton and one “up” quark and two “down” quarks in the case of the neutron. Those quarks are fundamental particles.

The Higgs interacts with the electron and the quarks and gives them mass. You could say it “generates” the mass. I’m tempted to say that without the Higgs those fundamental particles wouldn’t have mass. So, there you have it. This is one of its roles. Without this Higgs, we would not understand at all how electrons and quarks have mass, and we wouldn’t understand how to correctly calculate the mass of an atom!

Now, any physicist who has made it this far is cringing at my last statement – a quick reading of it implies that all the mass of an atom comes from the Higgs. It turns out that we know of several different ways that mass can be "generated" – and the Higgs is just one of them. It also happens to be the only one that, up until July 4th, we didn't have any direct proof for. An atom, a proton, etc., has contributions from more than just the Higgs – indeed, most of a proton's mass (and hence, an atom's mass) comes from another mechanism. But this is a technical aside. And by reading this you know more than many reporters who are talking about the story!

The Higgs plays a second role. This is a little harder to explain, and I don't see it discussed much in the press. And, to us physicists, this feels like the really important thing. "Electro-Weak Symmetry Breaking". Oh yeah! It comes down to this: we want to tell a coherent, unified story from the time of the big bang to now. The thing about the big bang is that it was *really* hot. So hot, in fact, that the rules of physics that we see directly around us don't seem to apply. Everything was symmetric back then – it all looked the same. We have quarks and electrons now, which gives us matter – but then it was so hot that they didn't really exist – rather, we think, some single type of particle existed. As the universe cooled down from the big bang, making its way towards the present day, new particles froze out – perhaps the quarks froze out first, and then the electrons, etc.

Let me see how far I can push this analogy… when water freezes, it does so into ice crystals. Say that an electron was one particular shape of ice crystal and a quark was a different shape. So you go from a liquid state where everything looks the same – heck, it is just water – to a solid state where the ice crystals have some set of shapes – and by their shape they become electrons or quarks.

Ok, big deal. It seems like the present day "froze" out of the Big Bang. Well, think about it. If our current particles evolved out of some previous state, then we had sure as hell better be able to describe that freezing process. Even better – we had better be able to describe that original liquid – the Big Bang. In fact, you could argue, and we definitely do, that the rules that governed physics at the big bang would have to evolve into the rules that describe our present day particles. They should be connected. Unified!! Ha! See how I slipped that word in up above!?

We know about four forces in the universe: the strong (holds a proton together), weak (radioactive decay is an example), electro-magnetism (cell phones, etc. are examples), and gravity. The Higgs is a key player in the unification of the weak force and the electro-magnetic force. Finding it means we actually have a bead on how nature unifies those two forces. That is HUGE! This is a big step along the way to putting all the forces back together. We still have a lot of work to do!

Another technical aside. 🙂 We think of the first role – giving fundamental particles mass – as a consequence of the second; they are not independent roles. The Higgs is key to the unification, and in order to be that key, it must also be the source of the fundamental particles' mass.

How long have you been searching for it?

A loooooong time. We are like archeologists. Nature is what nature is. Our job is to figure out how nature works. We have a mathematical model (called the Standard Model). We change it every time we find an experimental result that doesn't agree with the calculation. The last time that happened was when we stumbled upon the unexpected fact that neutrinos have mass. The time before that was the addition of the Higgs, and that modification was first proposed in 1964 (it took a few years to become generally accepted). So, I suppose you could say in some sense we've been looking for it since 1964!

It isn’t until recently, however (say in the late 90’s) that the machines we use have become powerful enough that we could honestly say we were “in the hunt for the Higgs.” The LHC, actually, had finding the Higgs as one of its major physics goals. There was no guarantee – no reason nature had to work like that – so when we built it we were all a little nervous and excited… ok. a lot nervous and excited.

So, why did it take so long!? The main reason is we hardly ever make it in our accelerators! It is very very massive!! So it is very hard to make. Even at the LHC we make one every 3 hours… The LHC works by colliding protons together at a very high speed (almost the speed of light). We do that more than 1,000,000 times a second… and we make a Higgs only once every 3 hours. The very definition of “needle in a haystack!”
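The arithmetic behind "needle in a haystack", using the round numbers above (a sketch, not an official rate calculation):

```python
# One Higgs roughly every 3 hours, at more than a million collisions per second
# (the round numbers quoted in the paragraph above).
collisions_per_second = 1_000_000
seconds_per_higgs = 3 * 60 * 60

collisions_per_higgs = collisions_per_second * seconds_per_higgs
print(f"Roughly one Higgs per {collisions_per_higgs:.0e} collisions")  # ~1 in 10^10
```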

Who made this discovery?

Two very large teams of physicists, and a whole bunch of people running the LHC accelerator at CERN. The two teams are the two experiments: ATLAS and CMS. I and my colleagues at UW are on ATLAS. If you hear someone say “I discovered the Higgs” they are using the royal-I. This is big science. Heck – the detector is half a (American) football field long, and about 8 or 9 stories tall and wide. This is the sort of work that is done by lots of people and countries working together. ATLAS currently has people from 38 countries – the USA being one of them.

What does a Cocktail Party have to do with it?

The cocktail party analogy is the answer to why some fundamental particles are more massive than others (sadly, not why I have to keep letting my belt out year after year).

Picture a cartoon of a cocktail party. Someone very famous has just entered the room, and everyone has clumped around them! If they are trying to get to the other side of the room, they are just not going to get there very fast!!

Now, let's say I enter the room. I don't know that many people, so while some friends will come up and talk to me, it will be nothing like that famous person. So I will be able to get across the room very quickly.

The fact that I can move quickly because I interact with few people means I have little mass. The famous person has lots of interactions and can’t move quickly – and in this analogy they have lots of mass.

Ok. Bringing it back to the Higgs. The party and the people – that is the Higgs field. How much a particle interacts with the Higgs field determines its mass. The more it interacts, the more mass is “generated.”

And that is the analogy. You’ve been reading a long time. Isn’t this making you thirsty? Go get a drink!

Really, is this that big a deal?

Yes. This is a huge piece of the puzzle. This work is definitely worth a Nobel prize – look for them to award one to the people who first proposed it in 1964 (there are 6 of them, one has passed away – no idea how the committee will sort out the max of 3 they can give it to). We have confirmed a major piece of how nature works. In fact, this was the one particle that the Standard Model predicted that we hadn't found. We'd gotten all the rest! We now have a complete picture of the Standard Model, so it is time to start work on extending it. For example, dark matter and dark energy are not yet in the Standard Model. We have not yet figured out how to fully unify everything we know about.

No. The economy won’t see an up-tick or a down-tick because of this. This is pure research – we do it to understand how nature and the universe around us works. There are sometimes, by-luck, spin-offs. And there are people that work with us who take it on as one of their tasks to find spin offs. But that isn’t the reason we do this.

What is next?

Ok. You had to ask that. So… First, we are sure we have found a new boson, but the real world – and the data – is a bit messy. We have looked for it, and expect it to appear, in several different places. It appeared in most of them – in one place it seems to be playing hide and seek (where the Higgs decays to taus – a tau is very much like a heavy electron). Now, only one of the two experiments has presented results in the taus (CMS), so we have to wait for my experiment, ATLAS, to present its results before we get worried.

Second, and this is what we'd be doing no matter what happened with the taus, is… HEY! We have a shiny new particle! We are going to spend some years looking at it from every single angle possible, taking it out for a test drive, you know – kicking the tires. There is actually a scientific point to doing that – there are other possible theories out there that predict the existence of a Higgs that looks exactly like the Standard Model Higgs except for some subtle differences. So we will be looking at this new Higgs every which way to see if we can see any of those subtle differences.

ATLAS and CMS also do a huge amount of other types of physics – none of which we are talking about right now – and we will continue working on those as well.

Why do you call it the God Particle!?

We don’t. (especially check out the Pulp Fiction mash-up picture).

What will you all discover next?

I’ll get back to you on that…

Whew. I’m spent!

The Way You Look at the World Will Change… Soon December 2, 2011

Posted by gordonwatts in ATLAS, CERN, CMS, Higgs, physics.
7 comments

We are coming up on one of those "lucky to be alive to see this" moments. Sometime in the next year we will all know, one way or the other, whether the Standard Model Higgs exists. Either way, how we think about fundamental physics will change. I can't overstate the importance of this. And the first strike along this path will occur on December 13th.

If it does not exist, that will force us to tear down and rebuild – in some totally unknown way – our model of physics. Our model that we've had for 40+ years now. Imagine that – 40 years, and now that it finally meets data… poof! Gone. Or, we will find the Higgs, and we'll have a mass. Knowing the mass will be in itself interesting, and finding the Higgs won't change the fact that we still need something more than the Standard Model to complete our description of the universe. But now every single beyond-the-Standard-Model theory will have to incorporate not only electrons, muons, quarks, W's, Z's, photons, and gluons – at their measured masses – but also a Higgs with whatever mass we measure!

So, how do I know this is going to happen? Look at this plot that was released during the recent HCP conference in Paris (deepzoom version 🙂).

Ok, this takes a second to explain. First, when we look for the Higgs we do it as a function of its mass – the theory does not predict exactly how massive it will be. Second, the y-axis is the rate at which the Higgs is produced. When we look for it at a certain mass we make a statement like "if the Higgs exists at a mass of 200 GeV/c2, then it must be being produced at a rate less than 0.6 or we would have seen it." I read the 0.6 off the plot by looking at the placement of the solid black line with the square points – the observed upper limit. The rate, the y-axis, is in funny units. Basically, the red line is the rate you'd expect if it were a Standard Model Higgs. The solid black line with the square points on it is the combined LHC exclusion line. Combined means ATLAS + CMS results. So, anywhere the solid black line dips below the red horizontal line means that we are fairly confident that the Standard Model Higgs doesn't exist at that mass (BTW – even "fairly confident" has a very specific meaning here: we are 95% confident). The hatched areas are the areas where the Higgs has already been ruled out. Note the hatched areas at low mass (100 GeV or so) – those are from other experiments like LEP.

Now that that is done, a fair question is where we would expect to find the Higgs. As it turns out, a Standard Model Higgs will most likely occur at low masses – exactly that region between 114 GeV/c2 and 140 GeV/c2. There isn't a lot of room left for the Higgs to hide there!! These plots are with 2 fb-1 of data. Both experiments now have about 5 fb-1 of data recorded. And everyone wants to know exactly what they see. Heck, while in each experiment we basically know what we see, we desperately want to know what the other experiment sees. The first unveiling will occur at a joint seminar at 2pm on December 13th. I really hope it will be streamed on the web, as I'll be up in Whistler for my winter ski vacation!

So what should you look for during that seminar (or in the talks that will be uploaded when the seminar is given)? The above plot will be a quick summary of the status of the experiments. Each experiment will have an individual one. The key thing to look for is where the dashed line and the solid line deviate significantly. The solid line I've already explained – it says that, for a Higgs of a particular mass, if it is there it must be being produced at a rate less than what is shown. The dashed line is what we would expect to exclude – given everything went right and the Higgs didn't exist at that mass – that is how good we expect to be. So, for example, right around the 280 GeV/c2 level we expect to be able to exclude a rate of about 0.6, and that is almost exactly what we measure.

Now look down around 120-130 GeV/c2. There you'll notice that the observed (solid) line is well above the dashed line. How much? Well, it is just along the edge of the yellow band – which means 2 sigma. 2 sigma isn't very much – so this plot has nothing to get very excited about yet. But if one of the plots shown over the next year has a more significant excursion, and you see it in both experiments… then you have my permission to get a little excited. The real test will be whether we can get to a 5 sigma excursion.
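If you want to know what those sigmas mean as probabilities, the usual convention is the one-sided tail of a Gaussian (and, as the N.B. below points out, the bands on the limit plot are a related but not identical use of "sigma"). A small sketch using scipy:

```python
from scipy.stats import norm

# Convert "n sigma" into a one-sided Gaussian tail probability -- the
# convention usually behind statements like a "5 sigma discovery".
for n_sigma in (2, 3, 5):
    p_value = norm.sf(n_sigma)          # survival function: P(x > n_sigma)
    print(f"{n_sigma} sigma -> p = {p_value:.2g}")
# 2 sigma -> ~0.023, 3 sigma -> ~0.0013, 5 sigma -> ~2.9e-07
```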

This seminar is the first step in this final chapter of the old realm of particle physics. We are about to start a new chapter. I, for one, can’t wait!

N.B. I’m totally glossing over the fact that if we do find something in the next year that looks like a Higgs, it will take us sometime to make sure it is a Standard Model Higgs, rather than some other type of Higgs! 2nd order effect, as they say. Also, in that last long paragraph, the sigma’s I’m talking about on the plot and the 5 sigma discovery aren’t the same – so I glossed over some real details there too (and this latter one is a detail I sometimes forget, much to my embarrassment at a meeting the other day!).

Update: Matt Strassler posted a great post detailing the ifs/ands/ors behind seeing or not seeing – basically a giant flow-chart. Check it out!

Source Code In ATLAS June 11, 2011

Posted by gordonwatts in ATLAS, computers.
3 comments

I got asked in a comment what, really, was the size in lines of the source code that ATLAS uses. I have an imperfect answer. About 7 million total. This excludes comments in the code and blank lines in the code.

The breakdown is a bit under 4 million lines of C++ and almost 1.5 million lines of Python – the two major programming languages used by ATLAS. Additionally, in those same C++ source files there are about another million blank lines and almost a million lines of comments. The Python files contain similar fractions.

There are 7 lines of LISP, which was probably an accidental check-in. Once the build runs, the number of lines of source code balloons by almost a factor of 10 – but that is all generated code (and HTML documentation, actually), so it shouldn't count in the official numbers.
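If you want to do this kind of accounting yourself, tools like cloc do it properly. A minimal Python sketch of the idea – counting non-blank, non-comment lines in C++ sources, with only a crude handling of /* … */ blocks, and with the file extensions being my guess at the conventions – might look like:

```python
import sys
from pathlib import Path

def count_code_lines(path):
    """Count non-blank, non-comment lines in one C++ file (crude block-comment handling)."""
    code, in_block = 0, False
    for raw in Path(path).read_text(errors="ignore").splitlines():
        line = raw.strip()
        if in_block:
            if "*/" in line:
                in_block = False
            continue
        if not line or line.startswith("//"):
            continue
        if line.startswith("/*"):
            if "*/" not in line:
                in_block = True
            continue
        code += 1
    return code

if __name__ == "__main__":
    top = Path(sys.argv[1])
    total = sum(count_code_lines(f)
                for pattern in ("**/*.cxx", "**/*.h")
                for f in top.glob(pattern))
    print(f"{total} lines of C++ code (roughly)")
```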

This is imperfect because these are just the files that are built for the reconstruction program. This is the main program that takes the raw detector signals and converts them into high level objects (electrons, muons, jets, etc.). There is another large body of code – the physics analysis code. That is the code that takes those high level objects and converts them into actual interesting measurements – like a cross section, or a top quark mass, or a limit on your favorite SUSY model. That is not always in a source code repository, and is almost impossible to get an accounting of – but I would guess it is about another x10 or so in size, based on experience in previous experiments.

So, umm… wow. That is big. But it isn’t quite as big as I thought! I mentioned in the last post talking about source control that I was worried about the size of the source and checking it out. However, Linux is apparently about 13.5 million lines of code, and uses one of these modern source control systems. So, I guess these things are up to the job…

Yes, We may Have Made a Mistake. June 3, 2011

Posted by gordonwatts in ATLAS, computers.
9 comments

No, no. I’m not talking about this. A few months ago I wondered if, short of generating our own reality, ATLAS made a mistake. The discussion was over source control systems:

Subversion, Mercurial, and Git are all source code version control systems. When an experiment says we have 10 million lines of code – all that code is kept in one of these systems. The systems are fantastic – they can track exactly who made what modifications to any file under their control. It is how we keep anarchy from breaking out as >1000 people develop the source code that makes ATLAS (or any other large experiment) go.

Yes, another geeky post. Skip over it if you can’t stand this stuff.

ATLAS switched some time ago from a system called cvs to svn. The two systems are very much alike: centralized, top-down control. Old school. However, the internet happened. And, more to the point, the Cathedral and the Bazaar happened. New source control systems have sprung up – in particular, Mercurial and Git. These systems are distributed. Rather than asking for permission to make modifications to the software, you just point your source control client at the main source and hit copy. Then you can start making modifications to your heart's content. When you are done you let the owner of the repository know and tell them where your repository is – and they then copy your changes back! The key here is that you have your own copy of the repository – so you can make multiple modifications without asking the owner. Heck, you could even send your modifications to your friends for testing before asking the owner to copy them back.

That is why it is called distributed source control. Heck, you can even make modifications to the source at 30,000 feet (when no wifi is available).

When I wrote that first blog post I'd never tried anything but the old school source controls. I've now spent the last 5 months using Mercurial – one of the new style systems. And I'm sold. Frankly, I have no idea how you'd convert the 10 million+ lines of code in ATLAS to something like this, but if there is a sensible way to convert to Git or Mercurial then I'm completely in favor. Just about everything is easier with these tools… I've never done branch development in SVN, for example. But in Mercurial I use it all the time… because it just works. And I'm constantly flipping my development directory from one branch to another because it takes seconds – not minutes. And despite all of this I've only once had to deal with merge conflicts. If you look at SVN the wrong way it will give you merge conflicts.

All this said, I have no idea how git or Mercurial would scale. Clearly it isn’t reasonable to copy the repository for 10+ million lines of code onto your portable to develop one small package. But if we could figure that out, and if it integrated well into the ATLAS production builds, well, that would be fantastic.

If you are starting a small stand alone project and you can choose your source control system, I’d definitely recommend trying one of these two modern tools.

16,000 Physics Plots January 12, 2011

Posted by gordonwatts in ATLAS, CDF, CMS, computers, D0, DeepTalk, physics life, Pivot Physics Plots.
4 comments

Google has 20% time. I have Christmas break. If you work at Google you are supposed to have 20% of your time to work on your own little side project rather than the work you are nominally supposed to be doing. Lots of little projects are started this way (I think GMail, for example, started this way).

Each Christmas break I tend to hack on some project that interests me – but it is often not directly related to something that I'm working on. Usually by the end of the break the project is useful enough that I can start to get something out of it. I then steadily improve it over the next months as I figure out what I really wanted. Sometimes they never get used again after that initial hacking time (you know: fail often, and fail early). My deeptalk project came out of this, as did my ROOT.NET libraries. I'm not sure others have gotten a lot of use out of these projects, but I certainly have. The one I tackled this year has turned out to be a total disaster. Interesting, but still a disaster. This post is about the project I started a year ago. That was a fun one. Check this out:

[Image: the viewer showing every 2010 preliminary plot from DZERO, CDF, ATLAS, and CMS, arranged by month]

Each of those little rectangles represents a plot released last year by DZERO, CDF, ATLAS, or CMS (the Tevatron and LHC general purpose collider experiments) as a preliminary result. That huge spike in July – 3600 plots – is everyone preparing for the ICHEP conference. In all, the 4 experiments put out about 6000 preliminary plots last year.

I don’t know about you – but there is no way I can keep up with what the four experiments are doing – let alone the two I’m a member of! That is an awful lot of web pages to check – especially since the experiments, though modern, aren’t modern enough to be using something like an Atom/RSS feed! So my hack project was to write a massive web scraper and a Silverlight front-end to display it. The front-end is based on the Pivot project originally from MSR, which means you can really dig into the data.

For example, I can explode December by clicking on “December”:

[Screenshot: the viewer after clicking on December]

and that brings up the two halves of December. Clicking in the same way on the second half of December I can see:

[Screenshot: the plots from the second half of December]

From that it looks like 4 notes were released – so we can organize things by notes that were released:

[Screenshot: the plots organized by the note they were released with]

Note the two funny icons – those allow you to switch between a grid layout of the plots and a histogram layout. And after selecting that we see that it was actually 6 notes:

[Screenshot: the histogram layout, showing six notes]

 

That left note is titled "Z+Jets Inclusive Cross Section" – something I want to see more of, so I can select it to see all the plots at once for that note:

[Screenshot: all of the plots from the "Z+Jets Inclusive Cross Section" note]

And say I want to look at one plot – I just click on it (or use my mouse scroll wheel) and I see:

[Screenshot: a single plot selected, with the info bar on the right]

I can actually zoom way into the plot if I wish using my mouse scroll wheel (or typical touch-screen gestures, or on the Mac the typical zoom gesture). Note the info bar that shows up on the right hand side. That includes information about the plot (a caption, for example) as well as a link to the web page where it was pulled from. You can click on that link (see the caveat below!) and bring up the web page. Even a link to a PDF note is there if the web scraper could discover one.

Along the left hand side you’ll see a vertical bar (which I’ve rotated for display purposes here):

[Screenshot: the vertical navigation bar with the year, Recent, FS, BK, and FW buttons]

You can click on any of the years to get the plots from that year. "Recent" will give you the last 4 months of plots. By default, this is where the viewer starts up – it seems like a nice compromise between speed and breadth when you want to quickly check what has happened recently. The "FS" button (yeah, I'm not a user-interface guy) is short for "Full Screen". I definitely recommend viewing this on a large monitor! "BK" and "FW" are like the back and forward buttons on your browser and enable you to undo a selection. The info bar on the left allows you to do some of this as well, if you want.

Want to play? Go to http://deeptalk.phys.washington.edu/ColliderPlots/… but first read the following. 🙂 And feel free to leave suggestions! And let me know what you think about the idea behind this (and perhaps a better way to do this).

  • Currently works only on Windows and a Mac. Linux will happen when Moonlight supports v4.0 of Silverlight. For Windows and the Mac you will have to have the Silverlight plug-in installed (if you are on Windows you almost certainly already have it).
  • This thing needs a good network connection and a good CPU/GPU. There is some heavy graphics lifting that goes on (wait till you see the graphics animations – very cool). I can run it on my netbook, but it isn’t that great. And loading when my DSL line is not doing well can take upwards of a minute (when loading from a decent connection it takes about 10 seconds for the first load).
  • You can’t open a link to a physics note or webpage unless you install this so it is running locally. This is a security feature (cross site scripting). The install is lightweight – just right click and select install (control-click on the Mac, if I remember correctly). And I’ve signed it with a certificate, so it won’t get messed up behind your back.
  • The data is only as good as its source. Free-form web pages are a mess. I’ve done my best without investing an inordinate amount of time on the project. Keep that in mind when you find some data that makes no sense. Heck, this is open source, so feel free to contribute! Updating happens about once a day. If an experiment removes a plot from their web pages, then it will disappear from here as well at the next update.
  • Only public web pages are scanned!!
  • The biggest hole is the lack of published papers/plots. This is intentional because I would like to get them from arxiv. But the problem is that my scraper isn't intelligent enough when it hits a website – it grabs everything it needs all at once (don't worry, the second time through it asks only for headers to see if anything has changed). As a result it is bound to set off arxiv's robot sensor. And the thought of parsing TeX files for captions is just… not appealing. But this is the most obvious big hole, and I would like to fix it at some point soon.
  • This depends on public web pages. That means if an experiment changes its web pages or where they are located, all the plots will disappear from the display! I do my best to fix this as soon as I notice it. Fortunately, these are public facing web pages so this doesn’t happen very often!

Ok, now for some fun. Who has the most broken links on their public pages? CDF, by a long shot. 🙂 Who has the pages that are most machine readable? CMS and DZERO. But while their pages are machine readable, the images have no captions (which makes searching the image database for text words less useful than it should be). ATLAS is a happy medium – their preliminary results are in a nice automatically produced grid that includes captions.

Did ATLAS Make a Big Mistake? December 16, 2010

Posted by gordonwatts in ATLAS, computers.
11 comments

Ok. That is a sensationalistic headline. And, the answer is no. ATLAS is so big that, at least in this case, we can generate our own reality.

Check out this graphic, which I've pulled from a developer survey.

[Chart: developer survey results showing version control system usage for Windows, Linux, and Mac users]

Ok, I apologize for this being hard to read. However, there is very little you need to read here. The first column is Windows users, the second Linux, and the third Mac. The key colors to pay attention to are red (Git), green (Mercurial), and purple (Subversion). This survey was completed just recently, with about 500 people responding. So it isn't perfect… But…

Subversion, Mercurial, and Git are all source code version control systems. When an experiment says we have 10 million lines of code – all that code is kept in one of these systems. The systems are fantastic – they can track exactly who made what modifications to any file under their control. It is how we keep anarchy from breaking out as >1000 people develop the source code that makes ATLAS (or any other large experiment) go. Heck, I use Subversion for small little one-person projects as well. Once you get used to using them you wonder how you ever did without them.

One thing to note is that cvs, which is the grand-daddy of all version control systems and used to be the standard about 10 or 15 years ago, doesn't even show up. Experiments like CDF and DZERO, however, are still using it. The other thing to note… how small Subversion is. Particularly amongst Linux and Mac users. It is still fairly strong on Windows, though I suspect that is in part because there is absolutely amazing integration with the operating system, which makes it very easy to use. And the extent to which it is used on Linux and the Mac may also be influenced by the people who took the survey – they used Twitter to advertise it, and those folks are probably a little more cutting edge on average than the rest of us.

Just a few years ago Subversion was huge – about the current size of Git. And therein lies the key to the title of this post. Sometime in March 2009 ATLAS decided to switch from cvs to Subversion. At the time it looked like Subversion was the future of source control. Oops!

No, ATLAS doesn’t really care for the most part. Subversion seems to be working well for it and its developers. And all the code for Subversion is open source, so it won’t be going away anytime. At any rate, ATLAS is big enough that it can support the project even if it is left as one of the only users of it. Still… this shift makes you wonder!

I’ve never used Git and Mercurial – both of which are a new type of distributed source control system. The idea is that instead of having a central repository where all your changes to your files are tracked, each person has their own. They can trade batches of changes back and forth with each other without contacting the central repository. It is a technique that is used in the increasingly high speed development industry (for things like Agile programming, I guess). Also, I’ve often heard the term “social coding” applied to Git as well, though it sounds like that may have to do more with the GitHub repository’s web page setup than the actual version control system. It is certainly true that anyone I talk to raves about GitHub and other things like that. While I might not get it yet, it is pretty clear that there is something to “get”.

I wonder if ATLAS will switch? Or, I should say, when it will switch! This experiment will go on 20 years. Wonder what version control system will be in ascendance in 10 years?

Update: Below, Dale included a link to a video of Linus talking about GIT (and trashing cvs and svn). Well worth a watch while eating lunch!

Linus on Git – he really hates cvs and svn – and makes a pretty good case

The Particle Physics Version of the Anecdote December 13, 2010

Posted by gordonwatts in ATLAS, Hidden Valley.
4 comments

Anecdotes are wonderful things, used (and misused) all the time. They tell great little stories, can be the seed of a new idea, or bring down an argument. Have something that is always true? Then you need but one anecdote to bring it tumbling to the ground. People fighting the evolution vs. creationism battle know this technique well! Of course, it is often misused too – an anecdote does not a theory make or break!

In experimental particle physics we have our own version of the anecdote: the event display. In the anecdotal sense, we use it mostly as the seed of a new idea. Our eyes and brain are better at recognizing a new pattern than any computer algorithm currently known. I've often said that gut instinct does play a role in physics – and the event display is one place where we learn our gut instinct!

Take, for example, this event display shown by Monica Verducci at the Discrete2010 conference:

[Event display: the ATLAS inner detector with a simulated Hidden Valley event; two jets point back to vertices well away from the beamline]

You are looking at the inner detector of ATLAS – first (from inner to outer) are the highly accurate pixel detectors, then the silicon strip detectors, and finally all the dots are the transition radiation tracker (TRT). The hits from a simulated Hidden Valley event are shown. Now, to the average particle physicist most of that display looks very normal, and wouldn't even raise an eyebrow – except for two features. Opposite each other, just above and below the horizontal, there are two plumes of particles. While plumes of particles ("jets") are not uncommon, the fact that these draw to a point a long way – meters – from the center of the detector is. Very uncommon, in fact.

Your eye can pick those out right away. Perhaps, if you aren't a particle physicist, you didn't realize those were unique, but I bet your eye got them right away regardless. Now, the problem is to develop a computer algorithm to pick those guys out. It may look trivial – after all, something that your eye gets that easily can't be that hard – but it turns out not to be the case. Especially using full blown tracking to find those guys… tracking that is tuned to find tracks that originate from the center of the detector. Just staring at it like this I'm having a few ideas of things we could do to find those tracks.
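For what it's worth, here is one simple-minded idea as a toy sketch (my own illustration, not an ATLAS algorithm, and all the cuts are made up): standard tracking assumes tracks point back to the beamline, so tracks from a decay meters away have a large transverse impact parameter. Look for a bunch of large-impact-parameter tracks close together in angle.

```python
from dataclasses import dataclass

@dataclass
class Track:
    d0_mm: float    # transverse distance of closest approach to the beamline
    phi: float      # azimuthal direction of the track

def find_displaced_candidates(tracks, d0_cut_mm=10.0, min_tracks=3, dphi=0.3):
    """Return groups of >= min_tracks large-d0 tracks that are close in phi.

    Toy logic: sort the large-d0 tracks in phi and group neighbours
    (ignoring phi wrap-around for simplicity).
    """
    displaced = sorted((t for t in tracks if abs(t.d0_mm) > d0_cut_mm),
                       key=lambda t: t.phi)
    candidates, group = [], []
    for t in displaced:
        if group and abs(t.phi - group[-1].phi) > dphi:
            if len(group) >= min_tracks:
                candidates.append(group)
            group = []
        group.append(t)
    if len(group) >= min_tracks:
        candidates.append(group)
    return candidates
```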

Say you already have an algorithm, but it fails some 30% of the time. Then you might take 100 interactions that fail, make event displays of all of them, create a slide show, and then just watch them one after the other. If you are lucky you’ll start to see a pattern.

None of this proves anything, unfortunately. Anecdotes aren’t science. But they do lead to ideas that can be tested! Once you have an idea for the algorithm you can write some code – which is not affected by human bias! – and run it on your sample of interactions. Now you can test it, and you measure its performance and see if your idea is going to work. By measuring you’ve turned your anecdote into science.
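The "measure its performance" step might look like this in miniature (the event fields here are invented; in reality the measurement is done on fully simulated samples):

```python
def measure_performance(events, algorithm):
    """Quote an efficiency and a fake rate on a labelled simulated sample."""
    found_signal = sum(1 for e in events if e["is_signal"] and algorithm(e))
    n_signal = sum(1 for e in events if e["is_signal"])
    found_background = sum(1 for e in events if not e["is_signal"] and algorithm(e))
    n_background = len(events) - n_signal

    efficiency = found_signal / n_signal if n_signal else 0.0
    fake_rate = found_background / n_background if n_background else 0.0
    return efficiency, fake_rate
```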

That is what I mean when I say the event display can be the germ of an idea. I've seen this technique used a number of times in my field. Though not enough! Our event displays are very hard to use, and so many of us (myself included) tend to use them as a last resort. This is unfortunate, because when looking for some new sort of pattern recognition algorithm – as in this case – they are incredibly valuable. Another trend I've noticed – the older generation seems to resort to these much more quickly than the younger one. <cough>bubble chambers<cough>.

Just like with real anecdotes, we particle physicists misuse our event displays all the time. The most public example: we show an event display at a conference and then call it "a typical event." You should chuckle. Anytime you hear that, it is code for "we searched and searched for the absolutely cleanest event we could find, one that most clearly demonstrates what we want you to think of as normal and that probably happens less than once a year." <smile>

What do you mean it isn’t about the $$? December 16, 2009

Posted by gordonwatts in ATLAS, CERN, LHC, life.
3 comments

A cute article in Vanity Fair:

Among the defining attributes of now are ever tinier gadgets, ever shorter attention spans, and the privileging of marketplace values above all. Life is manically parceled into financial quarters, three-minute YouTube videos, 140-character tweets. In my pocket is a phone/computer/camera/video recorder/TV/stereo system half the size of a pack of Marlboros. And what about pursuing knowledge purely for its own sake, without any real thought of, um, monetizing it? Cute.

Something I found out from this article – the LHC is the largest machine ever built. Ok. Wow. Ever!? I would have thought that something like a large aircraft carrier would have beat it. Still.

The attention span is another interesting aspect I'd not thought about. You know that the first space shuttles used magnetic core memory (see the reference in that Wikipedia article)? There were a number of reasons for this – one of them was certainly that there was no better technology available when they started. Before it was built, more robust memory appeared – but it was too late to redesign. Later space shuttles were fitted with more modern versions of the memory.

In internet time, 6 months or a year and you are already a version behind. And it matters. It would seem part of the point of the now is to be using the latest and greatest. You know how everyone stands around a water cooler discussing the latest episode of some TV show (e.g. Lost when it first started)? Now it is the latest iPhone update or some other cool new gadget. Oops. Hee hee. I said water cooler. How quaint. Obviously, I meant Facebook.

Projects like the space shuttle or the LHC take years and years. A lot of people have to remain focused for that long – and so do the governments who provide the funding. You know how hard that is – especially for a place like the USA, where the budget is debated every year? It is hard. Some people have been working on this for 20 years. 20 years! And now data is finally arriving. Think about that: designs set down 20 years ago have finally been built and installed and integrated and tested.

This science does not operate on internet time. But we are now deep in the age of internet time. How will the next big project fare? Will we as a society have the commitment to get it done?

I like the writing style in this VF article – a cultural look at the LHC. They do a good job of describing the quench as well. I recommend the read. And, finally, yes, this post ended up very different from the way it started. 🙂

Thanks to Chris @ UW for bringing this article to my attention.