The Real Monte Carlo Story April 14, 2008
Posted by gordonwatts in ATLAS, physics.
A few posts ago I poked fun at the idea that we’d be running 10 TeV Monte Carlo in ATLAS. Not Even Wrong linked to it. So, now I have to eat my words.
We will all be running the Monte Carlo.
Here is the deal. In an experiment like ATLAS we have a huge software base. Think of it like your favorite operating system - Windows, Mac OS, or Linux. Every now and then a new version is created. It is a huge undertaking each time. Lots of source code, lots of updates, lots of new functionality, lots of old things break, etc.
It is the same thing for us in particle physics. The software code to run these experiments, simulate Monte Carlo, reconstruct the data, and analyze the physics is in constant flux. Hopefully improving. Every now and then (say every 6 months or a year or so) all the recent changes are gathered up and released. Once the release is shown to work we let it loose on all of our computer farms with the express task of generating Monte Carlo for us.
This means we regenerate most of our Monte Carlo about once or twice a year. In ATLAS we are working on the version of the code we expect to run when data taking finally arrives (how exciting is that - when data arrives!!). If all goes well in a few months or so it will be in good enough shape to start producing Monte Carlo. And guess what. Whatever energy the LHC initially turns on at - that is the energy at which we will produce the Monte Carlo.
So the whole experiment will be producing Monte Carlo at this energy, not just someone in a back room in secret. Oh well.
ATLAS Week April 12, 2008
Posted by gordonwatts in ATLAS, LHC, physics.
ATLAS has just finished one of its large collaboration meetings. One of the nice things about these meetings is we get to hear a fairly detailed report on the machine status - something I don’t always hear except in rumors. In this case it was filling in some of the blanks that were in the recent press release explaining that the start up of the LHC would be at the reduced energy of 10 TeV instead of 14 TeV.
The problem is some of the dipole magnets. They have to be trained to run at full field. Full field for most magnets is 8 Tesla, which is about 133,333 times stronger than the earth’s magnetic field. They have to be that strong in order to bend the very high energy 7 TeV beams of protons (magnets are to charged particles like protons what lenses are to light). The power requirements are stupendous (scientific term). In fact, they would probably melt if they were made out of regular copper wire. Instead they make them out of a special wire that is superconducting when it is very cold. About -270 degrees Celsius.
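For a feel for those numbers, here is a rough sketch. The factor of about 133,333 corresponds to taking the Earth's field as roughly 60 microtesla (an assumption; it varies with location). The bending radius uses the standard rule of thumb r = p / (0.2998 B) for a unit-charge particle, with p in GeV/c, B in tesla and r in metres. The function name is made up for illustration.

```python
FULL_FIELD_T = 8.0        # dipole field quoted above
EARTH_FIELD_T = 60e-6     # assumed typical value for the Earth's field
BEAM_MOMENTUM_GEV = 7000.0  # one 7 TeV proton beam


def bending_radius_m(momentum_gev, field_tesla):
    """Radius of curvature in metres for a unit-charge particle."""
    return momentum_gev / (0.299792458 * field_tesla)


ratio = FULL_FIELD_T / EARTH_FIELD_T
radius = bending_radius_m(BEAM_MOMENTUM_GEV, FULL_FIELD_T)
print(f"dipole field / Earth's field: about {ratio:,.0f}")
print(f"bending radius at 8 T: about {radius:,.0f} m")
```

Even at 8 Tesla the protons turn on a radius of nearly 3 km, which is why the ring has to be so big.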
The beauty of superconducting wire is that it doesn’t dissipate any of the energy of the current it is carrying. You know how an overloaded plug socket gets warm? That is because some of the current is converted into heat instead of being used to run your computer - a waste. When dealing with the currents in these magnets - well, it would be so hot that it would melt the magnet.
These magnets have a tendency to quench. Which is a problem. Let’s say you have a bundle of wires all at -270 degrees carrying a huge amount of current. Let’s say a flaw in one part of one wire causes it suddenly to lose its superconducting property. As a result the current flowing through that bit of wire starts to generate heat. That heat, of course, warms up all the wire around it, which causes it to “go normal” as well. This process rapidly cascades until the whole magnet ceases to be superconducting. This is called a quench. If not handled correctly this can be disastrous - you could melt the whole thing (and these things are expensive!). Part of the magnet design is quench protection.
Now, here is the cool thing. To get to their full field strength you have to train the magnets. This is particularly true when you are pushing the envelope of what the technology can do. You do this by slowly increasing the current in the magnet until it quenches. Once it has, you cool it down again and try again. And repeat.
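The ramp, quench, cool down, repeat cycle can be sketched as a toy loop. To be clear, this is not a model of a real LHC dipole: the currents are in arbitrary units, and the idea that each quench moves the quench threshold a fixed fraction of the way towards a ceiling a little above the target is purely an assumption made for illustration.

```python
def train_magnet(target, first_quench, gain=0.5, step=1.0):
    """Toy training loop; returns the currents at which the magnet quenched.

    gain must be in (0, 1]: the fraction of the remaining gap to the
    (hypothetical) ceiling that each quench gains us.
    """
    ceiling = 1.05 * target      # hypothetical limit of the conductor
    threshold = first_quench     # current at which the magnet quenches
    quench_history = []
    while True:
        current = 0.0
        # Slow ramp: stop at the target or when the threshold is reached.
        while current < target and current < threshold:
            current += step
        if current < threshold:
            return quench_history    # reached the target without quenching
        quench_history.append(current)
        # Cool down again; the next quench happens at a higher current.
        threshold += gain * (ceiling - threshold)


print(train_magnet(target=100.0, first_quench=80.0))
```

In this toy each cycle gets closer to the target, and the magnet counts as trained once a ramp reaches the target without a quench.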
This process is what will prevent the LHC from being ready to run at 14 TeV this year. The retraining of some magnets is taking too long (all the magnets were trained to full strength before they were installed - so some have become “untrained”). So their plan is to retrain the magnets that are not properly trained during the first shutdown in the winter of 2008-2009.
And that, right there, tells us how long we will be running at the reduced energy of 10 TeV. If we are very lucky we will see beam in August and that will be our first run. So, probably a few months. Now, if I’m allowed to put on my old-guy hat, I’m going to guess that we won’t really get collisions until later than that and then the data coming out of our detector won’t make much sense until just around the shutdown. So it could well be this initial 10 TeV run gets almost no useful physics out - but is exactly what we need to get our brand spanking new detector into shape for the first real 14 TeV run.
BTW, I should say that the LHC has not told the experiments at what energy it will actually run yet. People think it will probably be 10 TeV, but the official word has not come from the machine division yet. Next week that should happen.
There were several other things of general note at the meeting (actually, there was a lot, but…). One thing is if you watched Peter Jenni’s talk - he gave out a few links you can go to for status info. One has the current cooling status of the accelerator. I don’t think it is meant for everyone to look at, so I won’t post the link. But if you are a member of ATLAS you can just look at Peter’s talk on the agenda server. The graphic is cool! I want to make it the background on my computer!
The other thing that, as a member of ATLAS, really makes this time exciting is the detection of cosmic rays. More and more detectors are getting turned on - and the first thing that is done with them is to look for cosmic rays. A few months ago people talked about the first cosmic ray having been seen. Now everyone in ATLAS is showing these things. Maybe this thing will work after all…
HEP in the Cloud March 20, 2008
Posted by gordonwatts in ATLAS, computers.10 comments
Amazon has done a lot of work to make GRID computing services accessible to anyone that wants it. Actually, it surprised me that Google or Microsoft didn’t do it first — to run their search engines and other similar things they must have farm computing down to a tee.
In HEP we spend a huge amount of money and time on the GRID. A discussion in a bar some time back generated the question: what would it cost to move HEP into the cloud?
Databases
Yesterday I mentioned databases for storing event data. Amazon has SimpleDB (see this posting to get an idea of how it works). On the surface it looks rather poorly suited to do what we would want to do with our highly structured data. But, ignoring that and some of the overhead it will charge - for the 100 GB of data that Rich had in his database it would cost about 150 bucks a month to store it. Querying is dirt cheap — 14 cents per hour of CPU time used. I have no idea what the performance would be on a database like this, but even if it were x10 slower I doubt it would matter much.
ATLAS’ equivalent database to Rich’s project is thought to be 14 TB/year. That works out to be $21,500/month.
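The arithmetic behind those two figures, as a quick sketch. The rate of $1.50 per GB per month is not quoted directly; it is inferred from the "100 GB for about 150 bucks" figure, and 1 TB is taken as 1024 GB. The function name is made up for illustration.

```python
# Rate inferred from the post: 100 GB costs about $150 a month (assumption).
SIMPLEDB_USD_PER_GB_MONTH = 150.0 / 100.0
GB_PER_TB = 1024  # binary convention (assumption)


def simpledb_monthly_cost(size_gb):
    """Monthly storage cost in dollars for a database of size_gb gigabytes."""
    return size_gb * SIMPLEDB_USD_PER_GB_MONTH


rich_cost = simpledb_monthly_cost(100)
atlas_cost = simpledb_monthly_cost(14 * GB_PER_TB)
print(f"Rich's 100 GB: ${rich_cost:,.0f} per month")
print(f"ATLAS 14 TB:   ${atlas_cost:,.0f} per month")
```

Note this is the bill after one year of data; since the database grows by 14 TB every year, the monthly cost would keep climbing.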
Event Data
Amazon has a simple storage service as well (Amazon S3). Because the data is just a binary blob the cost of storage is much cheaper: 15 cents per GB per month. However, trying to figure out what size ATLAS will actually use if it stored everything in the cloud, and ignored the actual design, is difficult. Making some rough estimates from an old version of the computing model, I’m going to guess about 10 PB per year (that is petabytes!). That is about 1.6 million bucks per month. But we aren’t done yet - it costs money to move the data in and out. First, just to load the data it will cost about 1 million.
Then we have to use the data - let’s say each year we cycle through all the data once - so all 10 PB. That will run about 2.5 million per year (not per month!). But if we use Amazon’s EC2 compute cloud, moving data to it and back is free. In that case, only final datasets will probably be moved. That would be much cheaper.
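A sketch of the storage numbers, assuming 1 PB = 1024 x 1024 GB. Only the 15 cents per GB per month storage rate is quoted above; the transfer prices are not, so the code just backs out the per-GB rates implied by the "about 1 million" and "about 2.5 million" totals rather than assuming any price list.

```python
GB_PER_PB = 1024 * 1024              # binary convention (assumption)
S3_STORAGE_USD_PER_GB_MONTH = 0.15   # quoted in the post

data_gb = 10 * GB_PER_PB             # rough guess of 10 PB per year
storage_per_month = data_gb * S3_STORAGE_USD_PER_GB_MONTH
print(f"storage: ${storage_per_month:,.0f} per month")

# The post gives only totals for moving data, so back out the implied rates.
load_total_usd = 1.0e6       # "about 1 million" to load the data
readout_total_usd = 2.5e6    # "about 2.5 million per year" to read it once
implied_in = load_total_usd / data_gb
implied_out = readout_total_usd / data_gb
print(f"implied transfer in:  ${implied_in:.3f} per GB")
print(f"implied transfer out: ${implied_out:.3f} per GB")
```

Storage alone for that one year of data is about 19 million dollars a year, so it dwarfs the one-off loading cost.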
Computing
This is even harder for me to calculate. This matches up with Amazon’s EC2. One cool thing is that moving data between these computers and S3 is free. Otherwise, for a 32 bit single processor machine that has enough memory to run ATLAS software it looks like it costs about 10 cents per hour of use. Now, in ATLAS an estimate in 2005 was it would take about 3000 kSI2k-seconds to reconstruct the average event. So, for an Amazon machine (that is about 1.9 kSI2k) that would take about 26 minutes. So, about 5 cents per event to reconstruct the event. If we expect 2,000,000,000 events per year, then that will cost us $100 million to reconstruct. If someone is familiar with SpecINT2000 and how it works, perhaps they can verify I did this math “ok”. And I’ve not included analysis time which is probably x2 more.
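Here is the same back-of-the-envelope reconstruction estimate written out. It reads the 3000 figure as kSI2k-seconds of work per event (an interpretation: dividing by a 1.9 kSI2k machine only gives a time if that is the unit) and takes all other inputs straight from the paragraph above.

```python
WORK_PER_EVENT_KSI2K_S = 3000.0   # 2005 estimate, read as kSI2k-seconds
MACHINE_POWER_KSI2K = 1.9         # one Amazon machine
USD_PER_MACHINE_HOUR = 0.10
EVENTS_PER_YEAR = 2_000_000_000

seconds_per_event = WORK_PER_EVENT_KSI2K_S / MACHINE_POWER_KSI2K
minutes_per_event = seconds_per_event / 60
usd_per_event = seconds_per_event / 3600 * USD_PER_MACHINE_HOUR
usd_per_year = usd_per_event * EVENTS_PER_YEAR

print(f"time per event: {minutes_per_event:.1f} minutes")
print(f"cost per event: {usd_per_event * 100:.1f} cents")
print(f"cost per year:  ${usd_per_year / 1e6:.0f} million")
```

Without rounding up to 5 cents per event the yearly figure lands nearer 88 million than 100 million, which is well within the thickness of the envelope.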
So, there you have it. A lot of money would go into running this in the cloud. Of course, we could never walk up to someone like Amazon and dump this on them. In almost all cases we will do better on our own as we can optimize what we are doing for our uses. Further, the cash that gets spent on this is from all over, and in all different colors. Many nations, for example, buy GRID installations for all scientists in their country. ATLAS just piggybacks on these purchases and uses a portion of them. Still, interesting to see what the cost would be - about 120 million before you even start to analyze the data to produce a physics result!
WARNING: this is very much a back-of-the-envelope calculation!!
National Geographic LHC Article and Pictures March 13, 2008
Posted by gordonwatts in ATLAS, LHC, photography.1 comment so far
Thirsting for some stunning pictures of the LHC? Check out this article in the National Geographic magazine. Make sure to look at the photo-gallery that comes along with the article. Some of the pictures — like the ALICE and CMS detector pictures are really stunning (ATLAS too, of course, but we’ve seen that one already).
BTW - the ATLAS detector is no longer all that photogenic on a grand scale - the cavern is now so full of bits of the detector it is quite difficult to get an idea of how big it is - all your sight lines are blocked!
The picture is one of mine of ATLAS. The ones from ATLAS are much better!
Robust Trigger March 2, 2008
Posted by gordonwatts in ATLAS, D0.2 comments
I’ve got to give a talk on Trigger Robustness next Tuesday (the link is protected unless you are a member of ATLAS - sorry). This is supposed to be robustness in all aspects of the word: against beam-gas events, against DAQ problems, against calibration problems, etc.
If you were attending this thing and were to hear a report from the Tevatron, what would you want to hear?