Bjarne Stroustrup
September 8, 2009
Posted by gordonwatts in CERN, computers, ROOT.
If you are even semi-conscious of the computing world you know this name: Bjarne Stroustrup. He is the father of C++. He started designing the language sometime in the very late 1970’s and continues to this day trying to keep it from getting too “weird” (his words).
He visited CERN this last week, invited by the ROOT team (I took a few pictures). I couldn’t see his big plenary talk due to a meeting conflict, but my friend Axel, on the ROOT team, was nice enough to invite me along to a smaller discussion. Presentations made at this discussion should be posted soon here. The big lecture is posted here, along with video (sadly, in flash and wmv format – not quite mp4 as I’ve been discussing!!)! I see that Axel also has a blog and he is posting a summary there too – in more detail than I am.
The C++ standard – which defines the language – is currently overseen by an ISO Standards Committee. Collectively they decide on the features and changes to the language. The members are made up of compiler vendors, library vendors, library authors, large banking organizations, Intel, Microsoft, etc. – people who have a little $$ and make heavy use of C++. Even high energy physics is represented – Walter Brown from Fermilab. Apparently the committee membership is basically open – it costs about $10K/year to send someone to all the meetings. That is it. Not very expensive. The committee is currently finishing off a new version of the C++ language, commonly referred to as C++0x.
The visit was fascinating. I’ve always known there was plenty of politics when a group of people get together and try to decide things. Heck, I’m in High Energy Physics! But I guess I’d never given much thought to a programming language! Part of what made it so fascinating was that several additions to the language that folks in HEP were interested in had been taken out at the last minute – for a variety of reasons – so we were all curious as to what happened.
I learned a whole bunch of things during this discussion (sorry for going technical on everyone here!):
- Bjarne yelled at us multiple times: people like HEP are not well represented on the committee. So join the thing and get views like ours better represented (though he worried if all 150 labs joined at once that might cause a problem).
- In many ways HEP is now pushing several multi-core computing boundaries. Both in numbers of cores we wish to run on and how we use memory. Memory is, in particular, becoming an acute problem. Some support in the standard would be very helpful. Minimal support is going in to the new standard, but Bjarne said, amazingly enough, there are very few people on the committee who are willing to work on these aspects. Many have the attitude that one core is really all that is needed!!! Crazy!
- In particle physics we leak memory like a sieve. Many times our jobs crash because of it. Most of the leaks are pretty simple and a decent garbage collector could efficiently pick up everything and allow our programs to run longer. Apparently this almost made it into the standard until a coalition of the authors of the boost library killed it: if you need a garbage collector then you have a bug; just fix it. Which is all good and glorious in an ideal world, but give me a break! In a 50 million line code base!? One thing Bjarne pointed out was it takes 40 people to get something done on the committee, but it takes only 10 to stop it. Sort of like health insurance. :-)
- Built-in support for memory pools would probably be quite helpful here too. The idea is that when you read in a particle physics event you allocate all the data for that event in a special memory pool. The data from an event is pretty self-contained – you don’t need it once you have finished processing that event and moved on to the next one. If it is all in its own memory pool, then you can just wipe it out all at once – who cares about actually carefully deleting each object. As part of the discussion of why something like this wasn’t in there (scoped allocators sound like they might be partway there) he mentioned that HP was “on our side”, Intel was “not”, and Microsoft was one of the most aggressive when it came to adding new features to the language.
- I started a discussion of how the STL is used in HEP – pointing out that we make very heavy use of vector and map, and then very little else. Bjarne expressed the general frustration that no one was really writing their own containers. In the ensuing discussion he dissed something that I often make use of – the for_each loop algorithm. His biggest complaint was how much stuff it added – you had to create a whole new class – which involves lots of extra lines of code – and that the code is no longer near where it is being used (non-locality can make source code hard to read). He is right that both are problems, but to him they are big enough to nix its use except in rare circumstances. Perhaps I’ll have to re-look at the way I use it.
- He is not a fan of OpenMP. I don’t like it either, but sometimes people trot it out as the only game in town. Surely we know enough to do better now. Task-based parallelism? By slots?
- Bjarne is very uncomfortable with lambda functions – a shorthand way to write one-off functions. To me this is the single best thing being added to the language – it will now be possible to totally avoid having to write another mem_fun or bind2nd template. That is huge, because those things never worked anyway – you could spend hours trying to make the code build, and they added so much cruft to your code you could never understand what you were trying to do in the first place! He is nervous that people will start adding large amounts of code directly into lambda functions – as he said “if it is more than one line, it is important enough to be given a name!!” We’ll have to see how use develops.
- He was pretty dismissive of proprietary languages. Java and C# both were put in this category (both have international standards behind them, just like C++, however) – citing vendor lock-in. But the most venom I detected was when he was discussing the LLVM open source project. This is a compiler infrastructure that includes a JIT (and, through Clang, a C++ front end). This project was loosely run but has now been taken over by Apple – presumably to be, among other things, packaged with their machines. His comment was basically “I used to think that was very good, but now that it has been taken over by Apple I’d have to take a close look at it and see what direction they were taking it.”
- Run Time Type Information. C++ came into its own around 1983 or so. No modern language is without the ability to inspect itself. Given an object, you can usually determine what methods are on the object, what the arguments of those methods are, etc. – and most importantly, build a call to that method without having ever seen the code in source form. C++ does not have it: typeid and dynamic_cast will tell you an object’s type, but nothing about its methods. We all thought there was a big reason this wasn’t the case. The real reason: no one on the committee has pushed hard enough or is interested enough. For folks doing dynamic coding or writing interpreters this is crucial. We have to do that in our code, and adding the information in after the fact is cumbersome and causes code bloat. Apparently we just need to pack the C++ committee!
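For readers who have not hit the for_each complaint from the STL bullet above, here is a minimal sketch of the two styles side by side. The names SumPt, sum_with_functor and sum_with_lambda are made up for illustration, and the lambda uses the syntax as it was eventually standardized, so treat this as an assumption-laden example rather than anything shown at the discussion.

```cpp
#include <algorithm>
#include <vector>

// Pre-lambda style: a whole named class, defined away from where it is used.
struct SumPt {
    double total;
    SumPt() : total(0.0) {}
    void operator()(double pt) { total += pt; }
};

double sum_with_functor(const std::vector<double>& pts) {
    // for_each returns the function object, which carries the running total.
    SumPt result = std::for_each(pts.begin(), pts.end(), SumPt());
    return result.total;
}

// Lambda style: the one-line body sits right at the call site.
double sum_with_lambda(const std::vector<double>& pts) {
    double total = 0.0;
    std::for_each(pts.begin(), pts.end(),
                  [&total](double pt) { total += pt; });
    return total;
}
```

Both functions compute the same sum; the difference is only where the loop body lives and how many lines it costs, which is exactly the locality argument made in that bullet.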
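The per-event memory pool idea from the bullet above can be sketched in a few lines. This is a toy bump allocator, not ROOT code and not anything proposed to the committee: EventArena, create, reset and Track are all hypothetical names, and the sketch assumes objects that need no destructor and no unusual alignment.

```cpp
#include <cstddef>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical event data, for illustration only.
struct Track {
    double pt;
    int charge;
    Track(double p, int q) : pt(p), charge(q) {}
};

// Hypothetical per-event arena: carve objects out of one block,
// then drop them all at once with reset().
class EventArena {
public:
    explicit EventArena(std::size_t bytes) : buffer_(bytes), used_(0) {}

    template <typename T, typename... Args>
    T* create(Args&&... args) {
        // reset() never runs destructors, so only accept types that need none.
        static_assert(std::is_trivially_destructible<T>::value,
                      "EventArena never calls destructors");
        static_assert(alignof(T) <= alignof(std::max_align_t),
                      "over-aligned types are not handled in this sketch");
        std::size_t start = align_up(used_, alignof(T));
        if (start + sizeof(T) > buffer_.size()) throw std::bad_alloc();
        used_ = start + sizeof(T);
        return new (buffer_.data() + start) T(std::forward<Args>(args)...);
    }

    // End of event: forget everything in one step.
    void reset() { used_ = 0; }

    std::size_t bytes_used() const { return used_; }

private:
    static std::size_t align_up(std::size_t n, std::size_t a) {
        return (n + a - 1) / a * a;
    }
    std::vector<unsigned char> buffer_;
    std::size_t used_;
};
```

In an event loop you would call create for each object while processing and reset once at the end of the event, so there is no per-object delete to forget.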
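As a rough illustration of the run-time type information bullet above: the language can tell you what type an object really is, but finding a method by name needs a table you maintain by hand. Particle, Muon, MethodTable and particle_methods below are invented for this sketch; real frameworks generate this kind of dictionary with tooling rather than writing it out.

```cpp
#include <functional>
#include <map>
#include <string>
#include <typeinfo>

struct Particle {
    virtual ~Particle() {}
    virtual double mass() const { return 0.0; }
};

struct Muon : Particle {
    double mass() const override { return 0.1057; }  // GeV, approximate
};

// What the language provides: the dynamic type of a polymorphic object.
bool is_muon(const Particle& p) { return typeid(p) == typeid(Muon); }

// What it does not provide: looking up a method by its name.
// A hand-written table (hypothetical) has to stand in for that.
typedef std::map<std::string, std::function<double(const Particle&)> > MethodTable;

MethodTable particle_methods() {
    MethodTable table;
    table["mass"] = [](const Particle& p) { return p.mass(); };
    return table;
}
```

Every new method means another line in particle_methods, which is the after-the-fact bookkeeping and code bloat that bullet complains about.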
Usually as someone rises in importance in their field they get more and more diplomatic – it is almost a necessity. If that is the case, Bjarne must have been pretty rough when he was younger! It was great to see someone who was attempting to steer-by-committee something he invented vent his frustrations, show his passion, name names, and at one point threaten to give out phone numbers (well, not really, but he almost gave out phone numbers). He can no longer steer the language exactly as he wants it, but he is clearly still very much guiding it.
You can find slides that were used to guide the informal discussion here. I think archived video from the plenary presentation will appear linked to here eventually if you are curious.