HEP in the Cloud
March 20, 2008. Posted by gordonwatts in ATLAS, computers, Uncategorized.
Amazon has done a lot of work to make GRID-style computing services accessible to anyone who wants them. Actually, it surprised me that Google or Microsoft didn't do it first; to run their search engines and other similar things they must have farm computing down to a T.
In HEP we spend a huge amount of money, effort, and time on the GRID. A discussion in a bar some time back generated the question: what would it cost to move HEP into the cloud?
Yesterday I mentioned databases for storing event data. Amazon has SimpleDB (see this posting to get an idea of how it works). On the surface it looks rather poorly suited to our highly structured data. But, ignoring that and some of the overhead it charges, storing the 100 GB of data that Rich had in his database would cost about 150 bucks a month. Querying is dirt cheap: 14 cents per hour of CPU time used. I have no idea what the performance of a database like this would be, but even if it were 10× slower I doubt it would matter much.
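As a sanity check, here is that arithmetic in a few lines of Python (a sketch; the $1.50 per GB-month storage rate is my reading of SimpleDB's 2008 pricing, and the structured-data overhead charge is ignored):

```python
# Back-of-the-envelope SimpleDB storage cost (assumed 2008 price, overhead ignored).
SIMPLEDB_STORAGE_PER_GB_MONTH = 1.50  # dollars per GB per month (assumed rate)

rich_db_gb = 100  # size of Rich's database in GB
monthly_cost = rich_db_gb * SIMPLEDB_STORAGE_PER_GB_MONTH
print(f"Rich's 100 GB database: ${monthly_cost:.0f}/month")  # -> $150/month
```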
The ATLAS equivalent of Rich's database is expected to be about 14 TB/year. That works out to about $21,500/month.
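The same arithmetic scaled up to the ATLAS database (a sketch; treating 14 TB as 14 × 1024 GB is what reproduces the $21,500 figure):

```python
SIMPLEDB_STORAGE_PER_GB_MONTH = 1.50  # dollars per GB per month (assumed 2008 rate)

atlas_db_gb = 14 * 1024  # 14 TB/year, counted in binary GB
monthly_cost = atlas_db_gb * SIMPLEDB_STORAGE_PER_GB_MONTH
print(f"ATLAS 14 TB database: ${monthly_cost:,.0f}/month")  # -> $21,504/month
```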
Amazon has a simple storage service as well (Amazon S3). Because the data is stored as a plain binary blob, it is much cheaper: 15 cents per GB per month. However, it is hard to estimate how much storage ATLAS would actually use if it put everything in the cloud and ignored its actual computing design. Making some rough estimates from an old version of the computing model, I'm going to guess about 10 PB per year (that is petabytes!). That is about 1.6 million bucks per month. But we aren't done yet: it also costs money to move the data in and out. Just loading the data in will cost about 1 million.
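Here is how the two S3 numbers fall out (a sketch, assuming the 2008 rates of 15 cents per GB-month for storage and roughly 10 cents per GB for transfer in):

```python
S3_STORAGE_PER_GB_MONTH = 0.15  # dollars per GB per month (2008 rate)
S3_TRANSFER_IN_PER_GB = 0.10    # dollars per GB uploaded (assumed 2008 rate)

atlas_gb_per_year = 10 * 1024 * 1024  # 10 PB/year in binary GB
storage_monthly = atlas_gb_per_year * S3_STORAGE_PER_GB_MONTH
load_once = atlas_gb_per_year * S3_TRANSFER_IN_PER_GB
print(f"Storage: ${storage_monthly / 1e6:.1f}M/month")     # -> $1.6M/month
print(f"Initial load: ${load_once / 1e6:.1f}M one-time")   # -> $1.0M
```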
Then we have to use the data. Let's say each year we cycle through all of it once, so all 10 PB. That will run about 2.5 million per year (not per month!). But if we use Amazon's EC2 compute cloud, moving data between it and S3 is free; in that case only final datasets would need to be moved out, which would be much cheaper.
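The read-back number works out like this (a sketch; I'm assuming a flat 25 cents per GB transferred out, which is what reproduces the 2.5 million figure; the real 2008 pricing was tiered):

```python
S3_TRANSFER_OUT_PER_GB = 0.25  # dollars per GB out (assumed flat rate)

atlas_gb = 10_000_000  # one full pass through 10 PB (decimal GB for simplicity)
read_back = atlas_gb * S3_TRANSFER_OUT_PER_GB
print(f"One pass through the data: ${read_back / 1e6:.1f}M/year")  # -> $2.5M/year
```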
The compute side is even harder for me to calculate; it matches up with Amazon's EC2. One cool thing is that moving data between these machines and S3 is free. Otherwise, a 32-bit single-processor machine with enough memory to run the ATLAS software costs about 10 cents per hour of use. A 2005 ATLAS estimate put reconstruction of the average event at about 3000 kSI2k-seconds. So an Amazon machine (rated at about 1.9 kSI2k) would take about 26 minutes per event, which comes to about 5 cents per event to reconstruct. If we expect 2,000,000,000 events per year, reconstruction will cost us about $100 million. If someone is familiar with SPECint2000 and how it works, perhaps they can verify I did this math “ok”. And I've not included analysis time, which is probably another factor of 2.
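Spelling the CPU arithmetic out (a sketch; the 1.9 kSI2k rating for the EC2 instance and the 10 cents per hour rate are as quoted above, and the 3000 kSI2k-seconds per event is the 2005 ATLAS estimate):

```python
EC2_DOLLARS_PER_HOUR = 0.10  # 32-bit single-processor instance (2008 rate)
EC2_KSI2K = 1.9              # rough SPECint2000 rating of that instance (assumed)
KSI2K_SEC_PER_EVENT = 3000   # 2005 ATLAS estimate for reconstructing one event
EVENTS_PER_YEAR = 2_000_000_000

seconds_per_event = KSI2K_SEC_PER_EVENT / EC2_KSI2K               # ~1580 s
cost_per_event = seconds_per_event / 3600 * EC2_DOLLARS_PER_HOUR  # ~$0.044
total = cost_per_event * EVENTS_PER_YEAR
print(f"{seconds_per_event / 60:.0f} min/event, ${cost_per_event:.3f}/event")
print(f"Reconstruction: ${total / 1e6:.0f}M/year")  # -> ~$88M, call it $100M
```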
So, there you have it. A lot of money would go into running this in the cloud. Of course, we could never walk up to someone like Amazon and dump this on them. In almost all cases we will do better on our own, since we can optimize what we are doing for our own use. Further, the cash that gets spent on this comes from all over, and in all different colors: many nations, for example, buy GRID installations for all the scientists in their country, and ATLAS just piggybacks on those purchases and uses a portion of them. Still, it is interesting to see what the cost would be: about 120 million before you even start to analyze the data to produce a physics result!
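Adding it all up, using the numbers from above (a sketch; the factor-of-2 analysis CPU guess is left out):

```python
storage = 1.6e6 * 12    # S3 storage, dollars per year
initial_load = 1.0e6    # one-time upload to S3
read_back = 2.5e6       # one pass through the data per year
reconstruction = 100e6  # CPU, rounded up from ~$88M
database = 21_500 * 12  # SimpleDB event database, dollars per year

total = storage + initial_load + read_back + reconstruction + database
print(f"Total: ${total / 1e6:.0f}M for the first year")  # -> ~$123M, "about 120 million"
```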
WARNING: this is very much a back-of-the-envelope calculation!!