Wednesday, March 12, 2008

Sketching Subversion

Subversion is easy to grasp for long-time CVS users, since it is conceptually very similar. The goal of its creators was simply to build "a better CVS", as they state on their front page, and I think they succeeded.

First, each commit is atomic, and revision numbers are incremented repository-wide on each commit, whereas CVS increments revision numbers per file. This means each revision number represents a state of the whole filesystem tree. With CVS, if you want the diff of a commit, you can only approximate it using dates. In SVN, you simply ask for the diff between two revisions.
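To make the difference concrete, here is a toy model in Python. The ToyRepository class and its methods are made up for illustration and say nothing about how Subversion actually stores revisions: each commit is applied as a whole and yields a snapshot of the entire tree under the next repository-wide number, so the change set of a commit is simply the diff between two consecutive revisions.

```python
# Toy model, NOT how Subversion stores data: every commit records a full
# snapshot of the tree under the next repository-wide revision number.


class ToyRepository:
    def __init__(self):
        # snapshots[n] is the whole tree (path -> content) after revision n;
        # revision 0 is the empty tree of a freshly created repository.
        self.snapshots = [{}]

    def commit(self, changes):
        """Apply every change at once (atomically) and return the new revision."""
        tree = dict(self.snapshots[-1])
        tree.update(changes)
        self.snapshots.append(tree)
        return len(self.snapshots) - 1

    def diff(self, old, new):
        """Map each path that differs between two revisions to (old, new) content."""
        a, b = self.snapshots[old], self.snapshots[new]
        return {path: (a.get(path), b.get(path))
                for path in sorted(a.keys() | b.keys())
                if a.get(path) != b.get(path)}

    def file_history(self, path):
        """Revisions in which `path` changed, oldest first."""
        return [n for n in range(1, len(self.snapshots))
                if self.snapshots[n].get(path) != self.snapshots[n - 1].get(path)]


repo = ToyRepository()
r1 = repo.commit({"script.sh": "v1", "README": "hello"})
r2 = repo.commit({"README": "hello, world"})  # someone else's commit
r3 = repo.commit({"script.sh": "v2"})
print(repo.diff(r2, r3))               # {'script.sh': ('v1', 'v2')}
print(repo.file_history("script.sh"))  # [1, 3]
```

Note that script.sh changes in revisions 1 and 3 only: revision 2 was spent on another file, which is why a single file's revisions are not necessarily consecutive.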

The puzzling side of this is that two successive revisions of a file may not have successive revision numbers, since commits to other files in between also consume numbers. For example, I recently worked on a script stored in a repository shared by multiple developers, and here are its successive revision numbers:


Another feature worth noting is called "externals". Each repository entry, file or directory, may carry metadata as name/value pairs, so-called "properties". Their names can be whatever you want, and their values can hold anything from a copyright notice to the full license text, but a few names have a special meaning. I've only used the one named "svn:externals", which is extremely useful.

I once worked on a CVS repository containing multiple projects and a couple of home-grown libs shared among them. In order to build a project, we had to write some shell script glue that automatically checked out the required libs into the project directory before building. As far as I know, there is no other way around this problem with CVS (let me know if there is one). This is exactly the problem externals solve: by setting an svn:externals property on your project directory, with one line per library, you make Subversion pull down the required libraries along with your project. Supposing your repository is laid out like this:


You can add a "svn:externals" property to the "myproject" directory containing:

mylib file:///path/to/your/repository/lib/mylib

The next time you check out or update your project, the mylib/ subdirectory will be created automatically. You can even use it as a working copy of lib/mylib/ and commit changes from it!
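As a rough illustration of what happens on checkout, here is a toy model in Python. It assumes the pre-1.5 one-line "localdir URL" syntax shown above, treats the URL as a bare repository path, and ignores revision pins such as "-r 12". The parse_externals and checkout functions and the file names are hypothetical; the real svn client fetches each external as a separate working copy.

```python
# Toy model of svn:externals (hypothetical helpers, not the svn client).
# Assumes the old "localdir URL" syntax and plain repository paths as URLs.


def parse_externals(value):
    """Parse an svn:externals value into (local_dir, target_path) pairs.

    Simplification: the last token is taken as the target, so an optional
    revision pin ("-r 12" or "-r12") is accepted but ignored.
    """
    pairs = []
    for line in value.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        pairs.append((parts[0], parts[-1]))
    return pairs


def checkout(files, props, project):
    """Return the working copy of `project` (relative path -> content),
    including the directories pulled in through its svn:externals."""
    prefix = project.rstrip("/") + "/"
    wc = {path[len(prefix):]: text for path, text in files.items()
          if path.startswith(prefix)}
    value = props.get(project, {}).get("svn:externals", "")
    for local_dir, target in parse_externals(value):
        target_prefix = target.rstrip("/") + "/"
        for path, text in files.items():
            if path.startswith(target_prefix):
                wc[local_dir + "/" + path[len(target_prefix):]] = text
    return wc


# Hypothetical repository contents: path -> file content.
files = {
    "myproject/main.c": "int main(void) { return 0; }",
    "lib/mylib/mylib.c": "/* shared code */",
    "lib/mylib/mylib.h": "/* shared header */",
}
props = {"myproject": {"svn:externals": "mylib lib/mylib"}}
print(sorted(checkout(files, props, "myproject")))
# ['main.c', 'mylib/mylib.c', 'mylib/mylib.h']
```

The project itself only contains main.c; the mylib/ subdirectory exists in the working copy purely because of the property.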

Branches and all
Subversion knows about neither tags nor branches; it only knows about "cheap copies" of a directory or file to somewhere else in the repository. Let me explain. A newly copied entry shares its history with its ancestor, hence the name cheap copy: nothing is duplicated. From then on, commits made on either copy are not shared with the other... I'm sure the concept of a branch is already looming in your mind :-). You may also have figured out that branches are addressed through the repository namespace: you don't need an extra piece of information as in CVS ("-r BRANCH"). This is why Subversion repositories are commonly laid out like this (it is the layout advised by the SVN documentation, though this is really a matter of policy):

trunk/
branches/
tags/

You may now wonder how to create a simple tag (i.e. not a branch tag, to use CVS terminology), unless you are especially sharp and have already grasped the whole thing. Actually, there is no difference between a branch and a mere tag, except that you never commit to the latter. With CVS, when you want to create a new branch for a given release, a good practice is to tag first and then branch, so that you can address the exact branching point through the tag. In Subversion, the cheap copy itself creates a new revision, so you only have to dig it out of the log (svn log --stop-on-copy stops right there) and ask for the diff.
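A toy model can make this concrete. Everything in the Python sketch below (the ToyRepo class, its methods, the paths, the one-file-per-path simplification) is hypothetical rather than Subversion's real design. A copy only records its source, its log continues into the ancestor's log up to the copy revision, later commits stay on their own side, and a tag is simply a copy nobody commits to. The stop_on_copy flag is modeled on svn's --stop-on-copy option and yields the branch point.

```python
# Toy model of Subversion's cheap copies (hypothetical, NOT its real storage).
# Simplification: each path holds a single file. A copy stores only its
# source (path, revision), so it costs the same whatever the file size.


class ToyRepo:
    def __init__(self):
        self.head = 0    # repository-wide revision number
        self.nodes = {}  # path -> {"commits": [(rev, text)], "source", "copy_rev"}

    def commit(self, path, text):
        """Commit new content to `path` and return the new revision."""
        self.head += 1
        node = self.nodes.setdefault(
            path, {"commits": [], "source": None, "copy_rev": None})
        node["commits"].append((self.head, text))
        return self.head

    def copy(self, src, dst):
        """Cheap copy of `src` as of HEAD to `dst`; the copy is a revision too."""
        self.head += 1
        self.nodes[dst] = {"commits": [], "source": (src, self.head - 1),
                           "copy_rev": self.head}
        return self.head

    def cat(self, path, rev):
        """Content of `path` as of revision `rev`, following its copy source."""
        node = self.nodes[path]
        older = [text for r, text in node["commits"] if r <= rev]
        if older:
            return older[-1]
        if node["source"] is None or rev < node["copy_rev"]:
            raise KeyError(f"{path} does not exist in r{rev}")
        src, src_rev = node["source"]
        return self.cat(src, src_rev)

    def log(self, path, upto=None, stop_on_copy=False):
        """Revisions that touched `path`, newest first, like svn log."""
        node = self.nodes[path]
        revs = [r for r, _ in reversed(node["commits"])
                if upto is None or r <= upto]
        if node["source"] is not None and (upto is None or node["copy_rev"] <= upto):
            revs.append(node["copy_rev"])  # the copy itself is in the log
            if not stop_on_copy:
                src, src_rev = node["source"]
                revs += self.log(src, upto=src_rev)
        return revs


repo = ToyRepo()
repo.commit("trunk/script.sh", "v1")                      # r1
repo.commit("trunk/script.sh", "v2")                      # r2
repo.copy("trunk/script.sh", "branches/1.0/script.sh")    # r3: a branch
repo.copy("trunk/script.sh", "tags/1.0.0/script.sh")      # r4: a tag
repo.commit("trunk/script.sh", "v3-dev")                  # r5: trunk only
repo.commit("branches/1.0/script.sh", "v2-fix")           # r6: branch only

print(repo.log("branches/1.0/script.sh"))                 # [6, 3, 2, 1]
branch_point = repo.log("branches/1.0/script.sh", stop_on_copy=True)[-1]
print(branch_point)                                       # 3
print(repo.cat("tags/1.0.0/script.sh", repo.head))        # v2
```

Revision 5 never shows up in the branch's log and revision 6 never shows up in trunk's: past the copy, the two lines of development are independent, while both still share revisions 1 and 2.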

This also means you can easily move a file in the repository without losing its history, by copying it and then deleting the ancestor (which is what svn move does). With CVS, this required a so-called "repo-copy", i.e. duplicating the RCS file on the repository side, which is something of a waste of space.
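Moving a file this way can be sketched with an even smaller toy model. Again, the ToyRepo class and the lib/util.c and src/util.c paths are hypothetical; the point is only that the new path's log runs through the copy into the old path's history. The toy performs the copy and the delete in a single revision.

```python
# Toy model (hypothetical, not Subversion's internals): a move is a copy
# plus a delete, and the new path's log follows the copy back into the
# old path's history.


class ToyRepo:
    def __init__(self):
        self.head = 0
        self.history = {}   # path -> revisions that touched it, oldest first
        self.source = {}    # path -> (ancestor path, ancestor revision)
        self.alive = set()  # paths that exist at HEAD

    def commit(self, path):
        self.head += 1
        self.history.setdefault(path, []).append(self.head)
        self.alive.add(path)
        return self.head

    def move(self, src, dst):
        """Copy `src` to `dst`, then delete `src`, in a single revision."""
        self.head += 1
        self.source[dst] = (src, self.head - 1)
        self.history[dst] = [self.head]
        self.alive.discard(src)
        self.alive.add(dst)
        return self.head

    def log(self, path, upto=None):
        """Revisions that touched `path`, newest first, following copies."""
        revs = [r for r in reversed(self.history[path])
                if upto is None or r <= upto]
        if path in self.source:
            src, src_rev = self.source[path]
            revs += self.log(src, upto=src_rev)
        return revs


repo = ToyRepo()
repo.commit("lib/util.c")                 # r1
repo.commit("lib/util.c")                 # r2
repo.move("lib/util.c", "src/util.c")     # r3
repo.commit("src/util.c")                 # r4
print(repo.log("src/util.c"))             # [4, 3, 2, 1]
print(sorted(repo.alive))                 # ['src/util.c']
```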

To sum up, you have a namespace in which you can make cheap copies that share their history up to the copy revision. Everything else is just a matter of policy.

I really like SVN for all the reasons above.
But I would never blame CVS for its weaknesses. It was designed more than twenty years ago and relies on the RCS file format, which dates from the early eighties. It works so well that numerous projects still use it, notably the FreeBSD project, which is known to have the biggest open-source repository ever.