CHEP Summary
September 7, 2007. Posted by gordonwatts in computers, Conference.
I’ve spent this week in Victoria, Canada attending the Computing in High Energy Physics (CHEP) conference. At one time this was my favorite conference (more on that in a later post). I’ve been writing up day summaries, which I’ll post this weekend, but thought I should get my personal conference summary out before the actual conference summary just to see how it compares. 🙂
First, the vendor talks (IBM, SGI, Intel) were some of the best talks given. They got nice long slots in the plenary sessions and all the companies sent excellent speakers. I think these were the best vendor talks I’ve ever seen. It was fascinating to see the different takes each company had on the future of computing. IBM: lots of slow cores, little memory. SGI: evolve the current technology to its limits and pack 1000’s of cores into a single rack. Intel: 80 cores on a single chip, with one or two big cores with all the bells and whistles and lots of little cores that run more slowly.
Power, Moore’s law, and heat dominated the plenary sessions. Every computing facility is feeling the pain, and no one has figured out how to solve it. For the near future we will be moving towards tricks, like Sun’s black box computing. Longer term, we will need some real change in technology.
Performance & The Multi-Core Future
It seems a given that as we head towards the multi-core future we will need to change the way we write code. The memory bandwidth in and out of a chip will increase more slowly than the number of flops that chip will be capable of — so less data per flop will be a problem! It isn’t obvious we can rewrite our code to run multi-threaded: that is a huge amount of work. Further, I think we don’t have enough data on processor performance to really understand if that would help (though it seems like it would).
To that end there was only one talk (that I saw) that really looked at the performance of some of our large reconstruction and simulation programs. The result? On a CPU with a 500 GB/sec bus (or was it 50? I can’t remember) the reconstruction program of CMS is using only 40 MB/sec!! If that is true, we will have no trouble scaling up to 80 cores given the current memory bandwidth. Further, the CPU is idle about 60% of the time (it can process 4 instructions at once; on average it is doing 1.2).
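To see why those numbers look so comfortable, here is a quick back-of-the-envelope sketch. It is only arithmetic on the figures quoted above, and it makes a few assumptions of mine that were not in the talk: decimal units (1 GB = 1000 MB), one independent reconstruction job per core, and every job needing the same 40 MB/sec. The function names are made up for illustration.

```python
# Back-of-the-envelope check of the bandwidth numbers quoted above.
# Assumptions (mine, not from the talk): 1 GB = 1000 MB, one independent
# job per core, and each job needs the same 40 MB/sec.

PER_JOB_MB_PER_SEC = 40        # CMS reconstruction, as quoted
CORES = 80                     # the many-core chip from the vendor talk
BUS_GB_PER_SEC = (50, 500)     # I am unsure which figure was shown


def aggregate_demand_gb_per_sec(cores, per_job_mb_per_sec):
    """Total memory traffic if every core runs one job."""
    return cores * per_job_mb_per_sec / 1000


def bus_fraction_used(cores, per_job_mb_per_sec, bus_gb_per_sec):
    """Fraction of the bus bandwidth that the jobs would consume."""
    demand = aggregate_demand_gb_per_sec(cores, per_job_mb_per_sec)
    return demand / bus_gb_per_sec


demand = aggregate_demand_gb_per_sec(CORES, PER_JOB_MB_PER_SEC)
print(f"{CORES} jobs need about {demand:.1f} GB/sec in total")
for bus in BUS_GB_PER_SEC:
    frac = bus_fraction_used(CORES, PER_JOB_MB_PER_SEC, bus)
    print(f"  with a {bus} GB/sec bus: {frac:.1%} of the bandwidth")

# Simple ratio of instructions issued to instructions possible per cycle.
issue_slot_use = 1.2 / 4
print(f"issue slots filled on average: {issue_slot_use:.0%}")
```

Under these assumptions the 80 jobs together need a few GB/sec, which is a small fraction of the bus whichever of the two figures is right. The last line is just the ratio 1.2/4; it is not necessarily the same quantity as the idle fraction reported in the talk, which may have been measured differently.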
At the start of this conference I was convinced that we were going to have to alter our programming model. But now I think there is a lot of work we can do with our current installed base in the form of optimization. By the next CHEP I expect a lot more studies of this sort. It was sad that so many of us (myself included) talked without really having too much data. We may still have to alter how we approach things — but there is more to be gained in our current frameworks.
I also predict that people working on offline software will now be asked to move away from creating random objects all the time. In some places in the CMS code over 1 million news and deletes were occurring per second! Ack!!
On the GRID, it is hard for me to tell any difference year to year. But the consensus seems to be that funding is “dead” and we need to get on with distributed computing. Oh, and, as in every year, stability is a must (sheesh). I think that with funding drying up, organizations like OSG will be disfavored and large installations, centrally and professionally managed, like TeraGrid will become the norm. GRID software will still exist so people can “easily” run on these different large centers.
I also think, given the continuing addition of layers of complexity, that user analysis will not occur on the GRID. ROOT tuples (or something similar) will be produced, downloaded to your local 10 TB cluster, and then run locally.
On ROOT, resistance is no longer an option; we have all been assimilated. It is very nice to see ROOT finally ripping itself apart and putting itself back together in a more modular and separable way. What prompted this? Slow startup times and memory usage. Awesome! Lots of other efficiency and I/O improvements are being made as well.
I can’t tell how PROOF is coming along. There are now some real installations, but it hasn’t really started to spread. The problem is that this has been the case at almost every CHEP. Unfortunately, from my point of view, the PROOF cluster design is still a big iron design. Hopefully it will get simpler as time passes.
There wasn’t enough physics at CHEP. I like CHEP because it straddles Physics and Computer Science. It feels more and more CS-like, and less and less physics-like. There were some interesting talks, for example on integrating advanced separation techniques (like Decision Trees) into ROOT.
There was plenty more going on, but in the few minutes I had to dash this off this is what came to mind. It was a good conference (other than the lousy network access)! Food in Victoria is also really good!