
Yes, We May Have Made a Mistake. June 3, 2011

Posted by gordonwatts in ATLAS, computers.

No, no. I’m not talking about this. A few months ago I wondered if, short of generating our own reality, ATLAS made a mistake. The discussion was over source control systems:

Subversion, Mercurial, and Git are all source code version control systems. When an experiment says it has 10 million lines of code, all that code is kept in one of these systems. The systems are fantastic – they can track exactly who made what modifications to any file under their control. It is how we keep anarchy from breaking out as more than 1000 people develop the source code that makes ATLAS (or any other large experiment) go.

Yes, another geeky post. Skip over it if you can’t stand this stuff.

Some time ago ATLAS switched from a system called cvs to svn. The two systems are very much alike: centralized, top-down control. Old school. However, the internet happened. And, more to the point, the Cathedral and the Bazaar happened. New source control systems have sprung up, in particular Mercurial and git. These systems are distributed. Rather than asking for permission to make modifications to the software, you just point your source control client at the main source and hit copy. Then you can start making modifications to your heart's content. When you are done you let the owner of the repository know and tell them where your repository is, and they then copy your changes back! The key here is that you have your own copy of the repository, so you can make multiple modifications without asking the owner. Heck, you could even send your modifications to your friends for testing before asking the owner to copy them back.

That is why it is called distributed source control. Heck, you can even make modifications to the source at 30,000 feet (when no wifi is available).
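For readers who like to see the moving parts, here is a toy Python model of the clone, commit-locally, pull-back cycle described above. It is only a sketch of the idea, not how Mercurial or git actually store data: the `Repo` and `Commit` classes are hypothetical, commit ids are a simplified hash of parent ids plus message, and pulling only fast-forwards (diverged histories are copied over but not merged).

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Commit:
    parents: tuple[str, ...]
    message: str

    @property
    def id(self) -> str:
        # Toy content address: the id depends only on the parents and the message,
        # so the same commit gets the same id in every copy of the repository.
        data = ",".join(self.parents) + "\n" + self.message
        return hashlib.sha1(data.encode()).hexdigest()[:12]


@dataclass
class Repo:
    commits: dict[str, Commit] = field(default_factory=dict)
    head: str | None = None

    def clone(self) -> Repo:
        # "Point your client at the main source and hit copy": the clone gets the full history.
        return Repo(dict(self.commits), self.head)

    def commit(self, message: str) -> str:
        # Commits are recorded locally; nobody's permission is needed.
        parents = (self.head,) if self.head is not None else ()
        c = Commit(parents, message)
        self.commits[c.id] = c
        self.head = c.id
        return c.id

    def pull(self, other: Repo) -> int:
        # Copy over every commit we do not have yet and report how many were new.
        new = {cid: c for cid, c in other.commits.items() if cid not in self.commits}
        self.commits.update(new)
        # Toy simplification: only fast-forward; diverged histories are stored but not merged.
        if other.head is not None and self.is_ancestor(self.head, other.head):
            self.head = other.head
        return len(new)

    def is_ancestor(self, a: str | None, b: str) -> bool:
        # Walk parent links back from b looking for a (None means "empty history").
        if a is None:
            return True
        stack, seen = [b], set()
        while stack:
            cid = stack.pop()
            if cid == a:
                return True
            if cid not in seen:
                seen.add(cid)
                stack.extend(self.commits[cid].parents)
        return False


owner = Repo()
owner.commit("initial import")
mine = owner.clone()                 # my own full copy of the repository
mine.commit("fix tracking bug")      # several local commits, no permission asked
mine.commit("add unit test")
friend = mine.clone()                # a colleague can test my changes first
print(owner.pull(mine))              # 2
print(owner.head == mine.head)       # True
```

The point of the model is that every copy carries the whole history, which is exactly why committing at 30,000 feet works.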

When I wrote that first blog post I’d never tried anything but the old school source control systems. I’ve now spent the last 5 months using Mercurial, one of the new style systems. And I’m sold. Frankly, I have no idea how you’d convert the 10 million+ lines of code in ATLAS to something like this, but if there is a sensible way to convert to git or Mercurial then I’m completely in favor. Just about everything is easier with these tools. I’ve never done branch development in SVN, for example. But in Mercurial I use it all the time… because it just works. And I’m constantly flipping my development directory from one branch to another because it takes seconds, not minutes. And despite all of this I’ve only once had to deal with merge conflicts. If you look at SVN the wrong way it will give you merge conflicts.

All this said, I have no idea how git or Mercurial would scale. Clearly it isn’t reasonable to copy the repository for 10+ million lines of code onto your laptop just to develop one small package. But if we could figure that out, and if it integrated well into the ATLAS production builds, well, that would be fantastic.

If you are starting a small stand-alone project and you can choose your source control system, I’d definitely recommend trying one of these two modern tools.


Comments

1. Maximilian Attems - June 3, 2011

Go directly to git. All major open source projects use git unless politics come into play, and even then they often migrate to it soon. It really scales.

Gnome moved all their stuff from svn to git. The technical steps for the migration are pretty well documented: http://live.gnome.org/GitMigration

2. Gordon Watts - June 3, 2011

So – how do you deal with huge repositories? Making a copy must take forever, no? I should find out what the total current size of the svn repository is for our code.

I’ve not used git personally, and the two-step commit seems weird, but whichever of the modern ones gets used, I’d be more of a fan of it than of svn.

3. Brian - June 3, 2011

http://en.wikipedia.org/wiki/Source_lines_of_code

The Linux kernel 2.6.35 has 13.5 million lines of code. They use git, and it seems to work just fine for them.

You could try some tests with their repository to see how it scales.

4. Tim Head - June 3, 2011

How big are the error bars on this “10 million” that ATLAS claims? Surely it contains multiple copies of the same code, universities’ analysis packages, etc. If you asked me, I’d guess it used to be 5 million, then got rounded up, and rounded up again, etc.

The real point, though: even if it is 10 million lines, you aren’t working on all the code, just parts of it. So each part of the code can exist in its own repository: your university’s code, the b-tagging code, etc.

And as we are bikeshedding: my vote is for Mercurial over git; git is just that little bit worse at explaining itself. I love that Mercurial is so good at reminding me how to use it at the command line, because I _always_ forget.

5. Gordon Watts - June 3, 2011

Good question. I’ll ask someone who knows the number of source lines. I’ll post back here if I get a definitive answer.

We definitely organize the code in separate packages, and each package is certainly manageable (with a few exceptions, but that is OK; one always has exceptions). So that approach would work. I’d never seen anyone discussing git or Mercurial talk about this, though searching around the web after these comments I see Mercurial has support for sub-repositories.

OK, so separate repositories. Then use the build system to make sure you have the right set of repositories checked out for the actual build. That is similar to the current build process.

6. Jason Mansour - June 3, 2011

I’m also a big fan of Mercurial. The great thing about it (or git or bzr) is that it encourages you to use version control for yourself. You can commit work in progress without worrying that you might “break head” (I mean the latest version, not Tim).

The distributed aspect was a bit intimidating in the beginning, but in practice it is not much different than what you are used to from CVS: When you have something nice, you just push your version to the “official” repository. But you can also tell a colleague to pull a special version from your repo (instead of saying “copy it from my working directory” as we do now).

As to the question of git vs. Mercurial: Mercurial is James Bond, git is MacGyver. :-)
http://importantshock.wordpress.com/2008/08/07/git-vs-mercurial/

Git was developed by Linus Torvalds for the Linux kernel, so it should be perfectly suited for large code bases. However, you can “see the moving parts” if you know what I mean.

Hg (Mercurial) is more polished, has fewer features, but is harder to mess up, and is closer to svn/cvs. I think for our typical workflow in particle physics, it would be perfect.

7. Peter Onyisi - June 4, 2011

I got sold by a friend on git, mostly because of the git-svn interface – it basically does svn better than svn. (For example tagging or branching an ATLAS svn package with git-svn is essentially trivial.) You also get a private repository to check in changes, you can rapidly switch from a “development” branch to a “stable” one for bugfixes, and so on. So basically all my work on ATLAS software is already in git :-)

Peter Waller - July 11, 2011

@Tim

I know you’re a Mercurial fan, but git has gotten better since we tried them both out back in the day. I don’t think Mercurial has that advantage over git anymore.

@general discussion

It would be silly to try to map the whole ATLAS repository onto one git repository. The workflow needs to be considered. I believe there should be one repository “per smallest thing that it makes sense to tag”. Repositories are very lightweight; it is not a big deal to have thousands of them sitting around.

git also has the concept of “submodules”. If one really wanted, it would be possible to represent “the current state of the ATLAS repository” in terms of submodules, where each submodule points at a particular commit in another repository. This would allow someone to say “clone this, then run git submodule update”, and get particular versions of given packages. In true git style, anyone could create and distribute such packages of other packages. This would be really useful in ATLAS, where getting consistent versions of the various pieces of “correction” code together in one place can be rather difficult.
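The pinning idea behind submodules can be sketched as a toy Python model. This is an illustration of the concept only, not git's implementation (real git records a commit id for each submodule in the superproject's tree, plus a `.gitmodules` file); the `PackageRepo` and `Superproject` classes and the package names "Tracking" and "BTagging" are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PackageRepo:
    # One repository per package; history holds commit ids, oldest first.
    name: str
    history: list[str] = field(default_factory=list)

    def commit(self, cid: str) -> None:
        self.history.append(cid)


@dataclass
class Superproject:
    # Plays the role of the superproject: one pinned commit per package.
    pins: dict[str, str] = field(default_factory=dict)

    def pin(self, pkg: PackageRepo, cid: str | None = None) -> None:
        # Pin the package's latest commit unless a specific one is requested.
        cid = pkg.history[-1] if cid is None else cid
        if cid not in pkg.history:
            raise ValueError(f"{pkg.name} has no commit {cid}")
        self.pins[pkg.name] = cid

    def update(self, repos: dict[str, PackageRepo]) -> dict[str, str]:
        # Analogue of "git submodule update": check out each package at its
        # pinned commit, regardless of newer commits in that package.
        checked_out = {}
        for name, cid in self.pins.items():
            if cid not in repos[name].history:
                raise KeyError(f"{name} has no commit {cid}")
            checked_out[name] = cid
        return checked_out


tracking = PackageRepo("Tracking")
tracking.commit("t1")
btagging = PackageRepo("BTagging")
btagging.commit("b1")

release = Superproject()
release.pin(tracking)
release.pin(btagging)
btagging.commit("b2")  # newer work lands, but the release stays pinned at b1

checked_out = release.update({"Tracking": tracking, "BTagging": btagging})
print(checked_out)  # {'Tracking': 't1', 'BTagging': 'b1'}
```

Because anyone can build their own `Superproject` out of the same package repositories, a consistent bundle of "correction" packages could be shared just by sharing the set of pins.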

Another point that Peter O made is that you don’t _have_ to switch the ATLAS repository to git to use git yourself. Though beware that with git-svn you can’t merge, only rebase (which allows you to achieve much the same effect in a controlled manner).

8. maximilian attems - June 6, 2011

Related news showing that GitHub is now the *most* popular code-hosting site (the article is in German):
http://www.heise.de/newsticker/meldung/GitHub-populaerer-als-SourceForge-und-Google-Code-1255416.html

Original press release:
https://github.com/blog/865-github-dominates-the-forges

