Did ATLAS Make a Big Mistake? December 16, 2010

Posted by gordonwatts in ATLAS, computers.

Ok. That is a sensationalistic headline. And, the answer is no. ATLAS is so big that, at least in this case, we can generate our own reality.

Check out this graphic, which I’ve pulled from a developer survey.

[Figure: bar chart from the developer survey showing version control usage among Windows, Linux, and Mac developers]

Ok, I apologize for this being hard to read. However, there is very little you need to read here. The first column is Windows users, the second Linux, and the third Mac. The key colors to pay attention to are red (Git), green (Mercurial), and purple (Subversion). This survey was completed just recently, with about 500 people responding. So it isn’t perfect… But…

Subversion, Mercurial, and Git are all source code version control systems. When an experiment says we have 10 million lines of code – all that code is kept in one of these systems. The systems are fantastic – they can track exactly who made what modifications to any file under their control. It is how we keep anarchy from breaking out as >1000 people develop the source code that makes ATLAS (or any other large experiment) go. Heck, I use Subversion for small little one-person projects as well. Once you get used to using them you wonder how you ever did without them.

One thing to note is that cvs, which is the grand-daddy of all version control systems and was the standard 10 or 15 years ago, doesn’t even show up. Experiments like CDF and DZERO, however, are still using it. The other thing to note… how small Subversion is. Particularly amongst Linux and Mac users. It is still fairly strong on Windows, though I suspect that is in part because there is absolutely amazing integration with the operating system which makes it very easy to use. And the extent to which it is used on Linux and the Mac may also be influenced by the people who took the survey – they used twitter to advertise it and those folks are probably a little more cutting edge on average than the rest of us.

Just a few years ago Subversion was huge – about the current size of Git. And therein lies the key to the title of this post. Sometime in March 2009 ATLAS decided to switch from cvs to Subversion. At the time it looked like Subversion was the future of source control. Oops!

No, ATLAS doesn’t really care for the most part. Subversion seems to be working well for it and its developers. And all the code for Subversion is open source, so it won’t be going away anytime soon. At any rate, ATLAS is big enough that it can support the project even if it is left as one of the only users of it. Still… this shift makes you wonder!

I’ve never used Git or Mercurial – both of which are a new type of distributed source control system. The idea is that instead of having a central repository where all your changes to your files are tracked, each person has their own. They can trade batches of changes back and forth with each other without contacting the central repository. It is a technique that is used in the increasingly high-speed development industry (for things like Agile programming, I guess). Also, I’ve often heard the term “social coding” applied to Git as well, though it sounds like that may have more to do with the GitHub repository’s web page setup than the actual version control system. It is certainly true that everyone I talk to raves about GitHub and other things like that. While I might not get it yet, it is pretty clear that there is something to “get”.
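A minimal sketch of that peer-to-peer trading of changes, assuming git is installed (the paths and names for the two collaborators are made up for illustration):

```shell
# Two collaborators, each holding a full local repository (paths are hypothetical).
mkdir -p /tmp/alice && cd /tmp/alice && git init -q
git config user.email "alice@example.com" && git config user.name "Alice"
echo "v1" > analysis.txt
git add analysis.txt && git commit -q -m "First cut of the analysis"

# Bob clones directly from Alice -- no central server involved.
git clone -q /tmp/alice /tmp/bob
cd /tmp/bob
git config user.email "bob@example.com" && git config user.name "Bob"
echo "v2" > analysis.txt
git commit -q -am "Bob's tweak"

# Alice pulls Bob's change straight back from his repository.
cd /tmp/alice
git pull -q /tmp/bob
cat analysis.txt   # prints "v2"
```

The same mechanics work with a central server in the loop; the point is that nothing in git requires one.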

I wonder if ATLAS will switch? Or, I should say, when it will switch! This experiment will go on 20 years. Wonder what version control system will be in ascendance in 10 years?

Update: Below, Dale included a link to a video of Linus talking about Git (and trashing cvs and svn). Well worth a watch while eating lunch!

Linus on Git – he really hates cvs and svn – and makes a pretty good case

Comments»

1. Maximilian Attems - December 16, 2010

cvs was very broken from day zero. This is why Linus never wanted to use it and actively asked people not to use it. svn is just a modern version built on the same mistakes.

git makes branching super easy. It is fast and easy to use. I’d highly recommend using it.

2. Gordon Watts - December 16, 2010

Ok. Let me ask this. This branch/merge business. I’ve used branching in svn and the merge was a royal disaster. But branching and merging are independent of the source control system, aren’t they? Is merging any better in git?

3. Leah Welty-Rieger - December 16, 2010

git is awesome. I used it when I worked for a bit as a software developer after grad school before coming back as a postdoc. (don’t know if you remember me…I worked for Rick on D0)

I think the biggest advantage is that you don’t have to be online to commit changes. So in subversion there are two steps: add the change, and commit it. In git you add the changed file and commit it, but this only commits it to *your* local repository. Then you can do a “git push” and it sends it to the central repository. It was nice because you could make small changes and commit them locally. Then when you have something working you can put all the small changes up.
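The two-stage workflow described above might look like this in practice (the repository paths here are invented for the sketch):

```shell
# A bare "central" repository standing in for the experiment's server.
git init -q --bare /tmp/central.git

# A developer's personal clone -- this is where offline work happens.
git clone -q /tmp/central.git /tmp/work
cd /tmp/work
git config user.email "dev@example.com" && git config user.name "Dev"

# Several small commits, all recorded only in the local repository.
echo "step 1" >  notes.txt && git add notes.txt && git commit -q -m "Step 1"
echo "step 2" >> notes.txt && git commit -q -am "Step 2"

# Later, with a network connection, publish them all at once.
git push -q origin HEAD
```

Until that final `git push`, the central repository knows nothing about the two commits; in svn, each `svn commit` would have needed the server.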

I think merging is always a pain, but I think it’s better in git. And since everyone has a copy of the repository if you really screw up you don’t really screw it up because someone else has a copy of it.

4. Peter Onyisi - December 16, 2010

I’ve used Mercurial (not git) and definitely prefer it to svn for small projects – no central repository needed, you can both push and pull changes, you can initialize in your own working area, tagging/branching makes more sense, and so on. I suspect the “distributed” part (other than local repositories) is not so relevant for us since all code used for physics ought to wind up in the central repository anyway. A friend of mine is a great fan of the git-svn bridge – might be a good place for people to start if they want to experiment?

5. Maximilian Attems - December 17, 2010

It tries to ease collaboration. There are many ways one can merge patches, either from mail or via convenient pull requests.

Conflicts happen, but the person in charge wants to know about them anyway. git merge is quite intelligent and tries to avoid them when possible.

The mantra of the “central repository” dates from a prehistoric “grant access” mindset. It is much better to let people branch and get productive right away. As was pointed out in one of the comments above, one doesn’t need a network connection to properly use git. The local clone contains all the history. Git best practice encourages frequent commits.

6. Bitter postdoc - December 17, 2010

This all just makes it more depressing that the (fairly new) experiment I’m on is using CVS, along with the rule that you can’t have a stable and development branch of anything…

7. Nick - December 17, 2010

No, I do not believe ATLAS made a mistake moving to subversion. Much as I love git, the reality is that from what I have seen of this field, understanding it is beyond most people, either by ability or caring. Incidentally I love the mac distribution of that chart – I wonder if it has anything to do with the amazing mac-only git gui.

The other good reason for SVN is that git-SVN integration appears to be pretty mature – from what I have researched, git and SVN’s storage models are close enough that moving between the two is pretty transparent. This is good because SVN is similar enough to CVS that people wouldn’t get too confused (It is ‘professor friendly’) but if you want to learn something more complicated you can do so nearly transparently. My experiment is still stuck on CVS, which isn’t going to change – even though it’s not completely transparent to work back and forth with git, I do so anyway, because the tools git gives you are too useful and powerful.

As for branching and merging, for git at least, the system is designed around frequent branching. Making a branch is a zero-cost operation, and merging works very well too (though as it is mostly used personally, I don’t do large branches too often, preferring to rebase on top of the CVS branch before pushing the changes back up). The example usually stated is that git handles large merges on the linux kernel without a problem.
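The branch-then-rebase habit described above can be sketched roughly like this (branch and file names are made up; the trunk branch name is read back from git since it varies by version):

```shell
# A throwaway repository to demonstrate cheap branching and rebase.
git init -q /tmp/branch-demo && cd /tmp/branch-demo
git config user.email "dev@example.com" && git config user.name "Dev"
echo "base" > main.txt && git add main.txt && git commit -q -m "Base"
trunk=$(git symbolic-ref --short HEAD)   # "master" or "main", depending on git version

# Creating a branch is effectively free -- just a pointer to a commit.
git checkout -q -b feature
echo "feature work" > feature.txt && git add feature.txt && git commit -q -m "Feature work"

# Meanwhile the trunk moves on underneath us.
git checkout -q "$trunk"
echo "trunk fix" >> main.txt && git commit -q -am "Trunk fix"

# Replay the feature commit on top of the updated trunk.
git checkout -q feature
git rebase -q "$trunk"
```

After the rebase the feature branch sits on top of the latest trunk commit, with a linear history and no merge commit.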

I would heartily recommend at least trying one of the distributed source control systems. The ability to have a ‘personal’ history of commits before pushing any changes up to the central repository is, in my opinion, invaluable. git or mercurial probably doesn’t matter – I decided to use git initially and am happy, but I know other people who use mercurial too.

8. gordonwatts - December 17, 2010

Great set of comments, thanks!

First of all, at least in HEP, I don’t think you can get away w/out centralized control. That is the only way to keep a lid on things going into the large production resources. Too much depends on them being right and well understood.

The flip side, however, is that outside of that you want people to play. I assume git and others can do that w/out trouble – you just have a particular branch that has centralized control (or one repository, etc.). The distributed repositories could then be created as desired.

I also love the idea of having a local repository to push changes to w/out going up to the big central one. I’d be willing to switch based on that alone. The other thing that seems attractive to me is being able to push changes to a friend of mine, work together, and then sync up to the main repository.

I had thought that the act of an svn branch (or copy) was also basically a no-cost operation. I remember this being one of its main selling points.

BTW, careful how you use the term “professor friendly”! :-) But you are right – what most people need/want in this field is way simpler than what we give them and the costs can be quite high. I point to the build systems we use (like CMT or SRT) as an example – way more power than 90% of the packages need, and as a result the build system is much more fragile!

Finally, the mac comment about git adoption – I’m sure that the GUI is a partial driver of that. On Windows the SVN GUI integration in the explorer (i.e. Finder) is nothing short of butter. I move stuff in and out of ATLAS all the time using it, along with my various personal projects.

I’m told that the integration for Mercurial is pretty good on windows too. And partly as a result of the comments here, I’ll try that out for my next small personal project.

9. Dale - December 18, 2010

I suggest watching the presentation Linus Torvalds made to Google on Git.

10. Mike Miller - December 18, 2010

I’ve got one toe in the water with an actual agile team doing daily releases in a large production environment, and I have to say that git+github is the backbone of something very powerful. Combined with something like hudson for continuous integration testing, it forms a solid backbone. It took me a bit to understand that git enforces a very different workflow (task/issue => fork => pull => code => commit => push => generate pull request => selective merge). Very nicely introduced at http://help.github.com/forking/, worth a look.

11. Gordon Watts - December 31, 2010

Ok. Git sounds pretty amazing – the fact that it tracks content rather than individual files – so a function that moves from one file to another, and it tracks it!? WOW!

Sounds like the only performance problems come from having too many files in a single project. For an experiment that isn’t likely to be a problem as we’ve already segmented it.

After listening to this talk, Dale, I’m sold. I’m trying out Hg right now to see how I like the style for one of my new side-projects.

Updated the main post to point to this video…

