
CHEP Summary
September 7, 2007

Posted by gordonwatts in computers, Conference.

I’ve spent this week in Victoria, Canada attending the Computers in High Energy Physics (CHEP) conference. At one time this was my favorite conference (more on that in a later post). I’ve been writing up day summaries, which I’ll post this weekend, but thought I should get my personal conference summary out before the actual conference summary just to see how it compares.🙂

First, the vendor talks (IBM, SGI, Intel) were some of the best talks given. They got nice long slots in the plenary sessions and all the companies sent excellent speakers. I think these were the best vendor talks I’ve ever seen. It was fascinating to see the different take each company had on the future of computing: IBM sees lots of slow cores with little memory; SGI wants to evolve the current technology to its limits and pack thousands of cores into a single rack; Intel is looking at 80 cores on a single chip, with one or two big cores that have all the bells and whistles and lots of little cores that run more slowly.

Computing Hardware

Power, Moore’s law, and heat dominated the plenary sessions. Every computing facility is feeling the pain, and no one has figured out how to solve this. For the near future we will be moving towards tricks like Sun’s black box computing; longer term, some real change in technology will be needed.

Performance & The Multi-Core Future

It seems a given that as we head towards the multi-core future we will need to change the way we write code. The memory bandwidth in and out of a chip will increase more slowly than the number of flops the chip is capable of, so less data will be available per flop, and that will be a problem! It isn’t obvious we can rewrite our code to run multi-threaded: that is a huge amount of work. Further, I think we don’t have enough data on processor performance to really understand whether that would help (though it seems like it would).

To that end there was only one talk (that I saw) that really looked at the performance of some of our large reconstruction and simulation programs. The result? On a CPU with a 500 GB/sec bus (or was it 50? I can’t remember) the CMS reconstruction program is using only 40 MB/sec!! If that is true, we will have no trouble scaling up to 80 cores given the current memory bandwidth. Further, the CPU is idle about 60% of the time (it can process 4 instructions at once; on average it is doing 1.2).
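To put some rough numbers on that, here is a back-of-the-envelope sketch in C++. It just runs the figures quoted above (as I remember them) through some arithmetic; nothing here comes from the CMS code itself.

    #include <cstdio>

    int main() {
        const double busBandwidthGBs = 500.0;  // quoted bus bandwidth in GB/s (or maybe 50 -- see above)
        const double recoUsageMBs    = 40.0;   // reported CMS reconstruction memory traffic in MB/s
        const double cores           = 80.0;   // Intel's 80-cores-on-a-chip scenario
        const double issueWidth      = 4.0;    // instructions the CPU could retire per cycle
        const double achievedIPC     = 1.2;    // instructions it actually retires per cycle

        const double aggregateGBs  = cores * recoUsageMBs / 1024.0;  // ~3.1 GB/s for 80 copies
        const double issueFraction = achievedIPC / issueWidth;       // ~30% of peak issue rate

        std::printf("80 reconstruction jobs need ~%.1f GB/s of a %.0f GB/s bus\n",
                    aggregateGBs, busBandwidthGBs);
        std::printf("The CPU is issuing at ~%.0f%% of its peak rate\n",
                    issueFraction * 100.0);
        return 0;
    }

Even if the bus is really 50 GB/sec, 80 copies of the reconstruction would only use a small fraction of it, assuming those quoted numbers hold up.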

At the start of this conference I was convinced that we were going to have to alter our programming model. But now I think there is a lot of work we can do with our current installed base in the form of optimization. By the next CHEP I expect a lot more studies of this sort. It was sad that so many of us (myself included) talked without really having much data. We may still have to alter how we approach things, but there is more to be gained in our current frameworks.

I also predict that people working on offline software will now be asked to move away from creating and destroying objects all the time; in some places in the CMS code over 1 million news and deletes were occurring per second! Ack!!
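The cure is mostly mundane: allocate once and reuse, rather than hitting the heap for every hit of every event. A minimal sketch of the idea in C++ (the Hit type and both functions are made up for illustration, not taken from any experiment’s code):

    #include <cstddef>
    #include <vector>

    struct Hit { double x, y, z; };  // stand-in for whatever the reconstruction produces

    // Wasteful pattern: a fresh new/delete for every hit of every event.
    void processEventSlow(const std::vector<Hit>& raw) {
        for (std::size_t i = 0; i < raw.size(); ++i) {
            Hit* copy = new Hit(raw[i]);  // heap traffic on every iteration
            // ... do something with *copy ...
            delete copy;
        }
    }

    // Friendlier pattern: reuse one buffer across events. clear() keeps the
    // capacity, so after the first few events there are almost no allocations.
    void processEventFast(const std::vector<Hit>& raw, std::vector<Hit>& scratch) {
        scratch.clear();
        scratch.insert(scratch.end(), raw.begin(), raw.end());
        // ... do something with scratch ...
    }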

GRID

It is hard for me to tell any difference year to year. But the consensus seems to be that funding is “dead” and we need to get on with distributed computing. Oh, and, as in every year, stability is a must (sheesh). I think that with funding drying up, organizations like OSG will be disfavored, and large, centrally and professionally managed installations like TeraGrid will become the norm. GRID software will still exist so people can “easily” run on these different large centers.

I also think, given the continuing addition of layers of complexity, that user analysis will not occur on the GRID. ROOT tuples (or something similar) will be produced, downloaded to your local 10 TB cluster, and then run over locally.
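That local loop is nothing exotic; it is the usual ROOT tuple analysis, just fed from files you pulled down off the GRID. A minimal sketch of such a macro (the file, tree, and branch names here are invented placeholders):

    // local_loop.C -- run with: root -l -b -q local_loop.C
    #include "TFile.h"
    #include "TTree.h"
    #include "TH1F.h"

    void local_loop() {
        // The file, tree, and branch names are hypothetical.
        TFile* f = TFile::Open("analysis_tuple.root");
        if (!f || f->IsZombie()) return;

        TTree* tree = (TTree*)f->Get("events");
        if (!tree) return;

        double missingEt = 0.0;
        tree->SetBranchAddress("missingEt", &missingEt);

        TH1F* h = new TH1F("h_met", "Missing E_{T};GeV;events", 100, 0.0, 200.0);

        const Long64_t n = tree->GetEntries();
        for (Long64_t i = 0; i < n; ++i) {
            tree->GetEntry(i);
            h->Fill(missingEt);
        }
        h->Draw();
    }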

ROOT

Resistance is no longer an option — we have all been assimilated. It is very nice to see ROOT finally ripping itself apart and putting itself back together in a more modular and separable way. What prompted this? Slow start-up times and memory usage. Awesome! Lots of other efficiency and I/O improvements are being made as well.

I can’t tell how PROOF is coming along. There are now some real installations, but it hasn’t really started to spread. The problem is that this has been the case at almost every CHEP. Unfortunately, from my point of view, the PROOF cluster design is still a big-iron design. Hopefully it will get simpler as time passes.

Algorithms

There wasn’t enough of this at CHEP. I like CHEP because it straddles physics and computer science. It feels more and more CS-like, and less and less physics-like. There were some interesting talks, for example on integrating advanced separation techniques (like decision trees) into ROOT.
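The ROOT-integrated package in question is TMVA. Just to give a flavor, booking a boosted decision tree looks roughly like the sketch below; the trees, variable names, and option strings are placeholders, and the exact API differs between TMVA versions, so treat this as an outline rather than a recipe.

    #include "TCut.h"
    #include "TFile.h"
    #include "TTree.h"
    #include "TMVA/Factory.h"
    #include "TMVA/Types.h"

    // Sketch only: sigTree, bkgTree, and the variable names are placeholders.
    void train_bdt(TTree* sigTree, TTree* bkgTree) {
        TFile* out = TFile::Open("tmva_output.root", "RECREATE");

        TMVA::Factory factory("chep_example", out, "!V");

        factory.AddVariable("pt",  'F');   // discriminating variables (illustrative)
        factory.AddVariable("eta", 'F');

        factory.AddSignalTree(sigTree, 1.0);       // second argument is a global weight
        factory.AddBackgroundTree(bkgTree, 1.0);

        factory.PrepareTrainingAndTestTree(TCut(""), "SplitMode=Random");

        // Book a boosted decision tree; the options here are illustrative, not tuned.
        factory.BookMethod(TMVA::Types::kBDT, "BDT", "NTrees=400:MaxDepth=3");

        factory.TrainAllMethods();
        factory.TestAllMethods();
        factory.EvaluateAllMethods();

        out->Close();
    }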

There was plenty more going on, but in the few minutes I had to dash this off, this is what came to mind. It was a good conference (other than the lousy network access)! The food in Victoria is also really good!

Comments»

1. It’s the Data, Stupid! « Life as a Physicist - May 5, 2008

[…] the Data, Stupid! May 5, 2008 Posted by gordonwatts in computers. trackback I’ve mentioned before that I think multicore computing is going to hit HEP hard. The basic problem is that we run all of […]

