OSG Plenary Sessions: Day 1
August 22, 2006
Posted by gordonwatts in computers.
Today’s set of talks was broadly interesting, I thought. I’m not sure if I’ll continue this series of posts as later plenary talks will be more about the details and inner workings of the Open Science Grid (OSG).
The State of the OSG – Ruth Pordes (Fermilab, OSG Director)
Ruth’s talk was meant to set the stage for the OSG meeting. Sorry about the link to her talk; I’ll add it as soon as the real talk is posted (the agenda currently links to a Twiki page). Part of the talk was a laundry list: what they decided to do last meeting, what got done, what didn’t. She outlined OSG’s mission as an integrator: take tools like Globus, Condor, etc., integrate and extend them, and then redistribute them as something easily installable and pre-configured for the OSG. They don’t actually write the tools from the ground up. I got the impression from this talk, and several others, that the OSG was basically succeeding, but there were plenty of rough edges.
eScience – Tony Hey (Microsoft)
Tony is an ex-UK particle theorist who turned to parallel computing and eScience; he ran the UK effort before taking a job in academic computing outreach with Microsoft. In general, particle physics doesn’t have much use for Microsoft. He said as much here (as he did during his talk at CHEP). But this talk was better than his CHEP talk: it spoke much more directly to what is going on in the world of GRID computing. Tony claims that Microsoft is still developing its strategy in this field, but you can hear the outline of it in his talk.
I usually think about the GRID and OSG as being about high performance computing: the more compute cycles the better. The software is all about taking a job, sending it to the computer, and getting the results back. Tony almost brushed over this aspect of the work. He talked about running jobs on both Windows and Linux farms (indeed, they did a demonstration of just that at the Supercomputing 2005 conference). Tony spent much more time talking about services on the GRID, and getting them to interoperate. His mantra was web services. When asked for specifics he used job submission as an example (actually he had 3 examples, but this is the only one I can remember off the top of my head). He would like to see a simple job submission web service protocol standardized. Then Microsoft, as well as open source developers and others, could all make batch farms and management systems that would expose this interface, and compete with each other.
Further, he seemed much more interested in solving the problems around running on the GRID. How do you analyze the huge amount of data that usually comes back from these jobs? How do you create the workflow that steers your jobs through the various steps? And, repeatedly emphasized, how do you do it without a computer scientist on your staff?
I was impressed with the fraction of the audience that Tony knew; he called out many of his questioners by name.
And the questions were much better than at CHEP (“Who defines the standards?”).
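To make the job submission idea concrete, here is a minimal sketch in Python of what "one standard interface, many competing farms" could look like. Everything in it is my own illustration: `JobRequest`, `JobSubmissionService`, and `InMemoryFarm` are hypothetical names, not anything Tony showed and not a real protocol, and a real version would sit behind a web service rather than a local class.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
import itertools


@dataclass
class JobRequest:
    """Hypothetical minimal job description: what to run and with which arguments."""
    executable: str
    arguments: list = field(default_factory=list)


class JobSubmissionService(ABC):
    """Hypothetical common interface that every batch farm would expose."""

    @abstractmethod
    def submit(self, request: JobRequest) -> str:
        """Accept a job and return an opaque job id."""

    @abstractmethod
    def status(self, job_id: str) -> str:
        """Return the current state of a previously submitted job."""


class InMemoryFarm(JobSubmissionService):
    """Stand-in for a real farm; it only records jobs and never runs them."""

    def __init__(self, name: str):
        self.name = name
        self._counter = itertools.count(1)
        self._jobs = {}

    def submit(self, request: JobRequest) -> str:
        job_id = f"{self.name}-{next(self._counter)}"
        self._jobs[job_id] = "queued"
        return job_id

    def status(self, job_id: str) -> str:
        return self._jobs[job_id]


def run_everywhere(farms, request):
    """Client code depends only on the interface, not on who built the farm."""
    return [farm.submit(request) for farm in farms]
```

The point of the design is in `run_everywhere`: the client never needs to know whether a farm runs Windows or Linux, which is what would let vendors compete behind the same interface.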
Rosetta@Home – David Baker
David is a UW professor in the bio-chemistry department. His group came up with and runs rosetta@home. The shape of a protein in the body goes a long way to determining its function. It can only bind with a cell or another protein if it has open bonds in the right geometrical configuration. This is especially true, as I learned, because hydrogen bonds are short-range. Each protein has a single shape. Proteins, being lazy like the rest of the universe, prefer a shape that is an energy minimum: in short, it takes a minimum amount of effort (or none) to hold the shape. Up to now determining this shape has been a purely experimental process using X-ray crystallography.
Protein folding programs cycle through all possible shapes of a particular protein and determine the total energy of each configuration. If they can find a shape with a small enough energy, they have found the actual shape of the protein. The problem is the huge number of shapes a protein can take on: there are hundreds of chemical bonds in a protein, and each of them can orient itself in several different ways. David and his lab use BOINC, the framework behind SETI@home, to distribute the protein folding as a screensaver to millions of users. This gives him about 33 TeraFLOPS (as of this writing).
David has entered several contests using this tool. His group is given the protein’s amino acid sequence and must determine its shape. When the contest is over the shape is determined experimentally and the closest match wins. He wins every time using the rosetta@home tool. The ultimate goal is to run this process in reverse: given a set of binding sites, create an amino acid sequence for a protein that will bind to those sites. Imagine doing this with AIDS: you find a set of sites on HIV that will render it ineffective, you create a protein that binds to it…
The two most surprising things he learned during the several years he has been running this were sociology and outreach.
He didn’t realize how much outreach and education potential there was — he has talked to more high school classes than he ever thought possible. And some of the flame wars on the forums have also caught him off guard – I found that post doing a quick scan; I’m sure others that are worse could be found. 🙂
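To get a feel for why the search David described needs so many computers, here is a toy sketch in Python. It is emphatically not Rosetta's energy function or search strategy: `ORIENTATIONS`, `toy_energy`, and `brute_force_fold` are made up for illustration, with three allowed orientations per bond and an invented energy that dislikes identical neighbours. It simply tries every shape and keeps the lowest-energy one.

```python
import itertools

# Assumption for illustration only: each bond can take one of three orientations.
ORIENTATIONS = (0, 1, 2)


def toy_energy(conformation):
    """Made-up energy: orientation 1 is cheapest alone, equal neighbours clash."""
    local = sum((o - 1) ** 2 for o in conformation)
    clash = sum(2 for a, b in zip(conformation, conformation[1:]) if a == b)
    return local + clash


def brute_force_fold(n_bonds):
    """Try every conformation and return (lowest-energy shape, its energy)."""
    best = min(itertools.product(ORIENTATIONS, repeat=n_bonds), key=toy_energy)
    return best, toy_energy(best)


def search_space_size(n_bonds):
    """Number of conformations an exhaustive search would have to score."""
    return len(ORIENTATIONS) ** n_bonds


if __name__ == "__main__":
    shape, energy = brute_force_fold(10)
    print("best toy shape for 10 bonds:", shape, "energy:", energy)
    print("shapes to score for 100 bonds:", search_space_size(100))
```

Ten bonds is already 59,049 shapes; at a hundred bonds the count has 48 digits, so exhaustive enumeration is hopeless and even a smarter search benefits from millions of volunteered screensavers.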
Influence of Confinement and Antiplasticization on the Dynamics of Polymers – Robert Riggleman (U. Wisconsin Madison)
This is a cool problem. As feature sizes on chips get smaller the laser etching is getting more and more difficult. A basic difficulty is that the etching material, a photo-sensitive polymer, is having trouble maintaining its shape after etching. Adjacent structures are bending into each other: the material isn’t stiff enough. It isn’t hard to add stiffeners to the polymer, but adding stiffeners while keeping the etching properties the same turns out to be very difficult. To study the mix Robert and others in his group have turned to large scale simulations. By adding small amounts of an antiplasticization material they are able to achieve the effects they need. Determining the concentration and chemical makeup of the agent is done by the simulation. Pretty cool! This is a project that has been run on the OSG, and they are looking forward to expanding the number of computers they run on. By the way, Condor was invented at Wisconsin.
Automated Annotation of Microbial Genomes – Margaret Romine (PNNL)
Margaret talked about using various GRID enabled tools to annotate genes. It was a fascinating process. Genomes can be quickly sequenced (yesterday’s discovery is today’s graduate student task!). But once they are sequenced one has to pick out the genes by looking for markers. The problem is the markers aren’t the same in every animal and we don’t know all the markers either. Some automation is possible. For example, if an animal is close to yours you can often find lots of genetic similarities. You can also just do a huge search to see if there is a sub-sequence in your gene that matches anything else ever seen. One fascinating thing Margaret talked about was the huge number of tools she had to run by hand to perform an analysis that a computer should have performed. She was complaining of the exact thing that Tony had been talking about as an opportunity for Microsoft: workflow integrated with high performance computing.
LHC Physics – Oliver Gutsche
Oliver gave the usual LHC GRID talk. I’m sorry to put it this way, but I’ve seen a talk like this so many times that my mind starts to drift. Towards the end he did talk about OSG use by the CMS experiment. HEP tends to use the GRID in a fairly boring way currently: we submit huge batch jobs that have almost no interaction with each other. Interactive use and analysis on the GRID are problems only just getting started, and I don’t think the field has carefully addressed the issues yet. One thing I did note was that CMS was planning on an analyzer taking 3 days to regenerate the plots for their analysis. There is just no way any analyzer is going to be willing to wait that long. I can’t help but wonder if too many of these analysis systems are getting designed by people expert in production (I include ATLAS in this criticism).
Nanotechnology Experience – David Braun
David talked about the laundry list of problems his group has had getting nanoHUB working on OSG, and specifically, getting the BioMOCA simulation package to run. This package simulates the walk of an ion through a channel in a cell membrane. Their jobs have caused OSG a lot of problems because they are different from many of the HEP jobs that have run through. For example, they run for 10 days straight. Security permits expire in less time than that, meaning the job’s permission to run goes away before it finishes!
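The arithmetic behind the expiring-permit problem is easy to sketch in Python. The 12-hour lifetime below is my assumption for illustration (the talk only said permits expire in less than 10 days), and `renewals_needed` is a hypothetical helper, not part of any grid toolkit.

```python
import math
from datetime import timedelta


def renewals_needed(job_runtime, credential_lifetime):
    """How many times a credential must be renewed for the job to finish.

    The first credential is issued at submission, so a job that fits
    inside one lifetime needs zero renewals.
    """
    if credential_lifetime <= timedelta(0):
        raise ValueError("credential lifetime must be positive")
    lifetimes = math.ceil(job_runtime / credential_lifetime)
    return max(lifetimes - 1, 0)


if __name__ == "__main__":
    job = timedelta(days=10)          # from the talk: jobs run 10 days straight
    permit = timedelta(hours=12)      # assumed lifetime, for illustration only
    print("renewals needed:", renewals_needed(job, permit))
```

Under that assumed lifetime a 10-day job outlives its permit many times over, so something has to renew the credential on the job's behalf or the lifetime has to be extended.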
The rest of the morning talks were about OSG facilities. Particularly interesting was a talk by Mike Wilde about a South Padre Island summer school. I remember South Padre Island as the spring break place when I was at the University of Texas at Austin! Moving right along, he talked about getting 40 students together and having them create and run a GRID job. Talk about outreach!
The rest of the afternoon was taken up by parallel sessions.