The Cost Of Free GRID Access
June 13, 2008. Posted by gordonwatts in computers, physics, science, university.
I was giving some thought to the health of our department at the University of Washington the other day. Cheap and readily available computing power means new types of physics simulations can be tackled that were never possible before. Think of it like weather forecasting – the more computing power brought to bear, the better the models are at predicting reality. Not only are the old-style models better, we can try new weather models and make predictions that were never possible with the previous versions. The same thing is happening in physics. Techniques and levels of detail we never thought possible are now tackled on a regular basis. The NSF and DOE both have programs specifically designed to fund these sorts of endeavors.
This means there is a growing need for a physics department to have a strong connection to a large computing resource – in house or otherwise – in order for its faculty members to be able to participate in these cutting-edge research topics.
Particle physics is no stranger to these sorts of large-scale computing requirements. In ATLAS, our current reconstruction programs take over 15 seconds per event, and we expect to collect 200 events per second – so we would need a farm of 200*15 = 3000 CPUs just to keep pace. And that says nothing about reprocessing, or about the huge number of Monte Carlo events we must simulate (over 2 minutes per event). And then we have to do this over and over again as we refine our analysis strategy. Oh, and let's not forget analyzing the data either!
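The back-of-the-envelope arithmetic above can be sketched in a few lines. The per-event times and the event rate are the figures quoted in this post; the Monte Carlo line assumes (purely for illustration) that you wanted to simulate events at the same rate you collect them, which is not an official ATLAS estimate.

```python
# Farm sizing from the numbers quoted above (illustrative sketch only).

RECO_SEC_PER_EVENT = 15   # reconstruction time per event, in seconds
MC_SEC_PER_EVENT = 120    # Monte Carlo simulation: over 2 minutes per event
EVENT_RATE_HZ = 200       # events collected per second

# CPUs needed just to reconstruct data as fast as it arrives:
reco_cpus = EVENT_RATE_HZ * RECO_SEC_PER_EVENT
print(reco_cpus)  # 3000

# If you also wanted to simulate Monte Carlo at that same rate
# (an assumption for illustration), the bill grows quickly:
mc_cpus = EVENT_RATE_HZ * MC_SEC_PER_EVENT
print(mc_cpus)  # 24000
```

The point of the exercise is just that the simulation load dwarfs the reconstruction load, before you even count reprocessing passes or analysis.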
However, even though many of us are located at universities, we don’t make heavy use of local clusters. I think there are two reasons. First, the small one: the jobs we run are different from most simulation tasks run by other physicists. Their research values high-bandwidth communication between CPUs (e.g. Lattice QCD calculations) and requires little memory per processor. Ours does not need the communication bandwidth but needs a huge amount of memory per processor (2 GB and growing).
The second reason is more important: we HEP folks get access to a large international GRID for “free”. This GRID is tailor-made for our needs – we drove much of its design, actually. We saw the need for this more than a decade ago, and have been working on getting it built and running smoothly ever since. While we still have a way to go towards smooth operation, it serves almost all of our needs well – and, for a university group like ours at the University of Washington, cheaply. By virtue of being a member of the ATLAS or D0 collaboration, I get a security certificate that allows me to submit large batch jobs to the GRID. An example of the power: it took us weeks to simulate 40,000 events locally. When we submitted the job to the GRID we had 100,000 events back in less than a week.
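To put a rough number on that speedup: the post says “weeks” locally and “less than a week” on the GRID, which is vague, so assume three weeks for the local run and exactly one week for the GRID run (both assumptions, not figures from the post).

```python
# Rough GRID-vs-local throughput comparison from the numbers above.
# "Weeks" and "less than a week" are vague; 3 weeks and 1 week are
# assumed here purely for illustration.

local_events, local_weeks = 40_000, 3
grid_events, grid_weeks = 100_000, 1

local_rate = local_events / local_weeks  # events per week, local cluster
grid_rate = grid_events / grid_weeks     # events per week, GRID

print(round(grid_rate / local_rate, 1))  # 7.5
```

Even under these generous assumptions for the local cluster, the GRID comes out nearly an order of magnitude faster – and since the GRID run finished in *less* than a week, the real factor is larger still.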
Given all that, we HEPers would rather spend our money on a modest-sized local analysis system – which is quite small compared to what the rest of the physics department needs – and so we don’t really participate in these large systems in our local department. I wonder if there is a hidden cost to that. Could we gain something by moving more of our processing back locally? Could you more easily convince the NSF to fund a physics compute cluster that did Lattice QCD, HEP simulation and analysis, and Astro simulations? Or would they get pissed off because we weren’t using the large centers they are already funding instead? Has anyone tried a proposal like that before?