[Dev] Repository Performance

Brett Wooldridge bwooldridge at alterpoint.com
Tue Dec 18 20:39:42 CST 2007


I was performing some benchmarking over the weekend around scalability, particularly around the Subversion repository. Both Dylamite and I had noticed "throttling" during large-scale backups, and the bottleneck always pointed to Subversion. Part of that bottleneck is creating the directory structure that holds a device's configs and committing it into the repository. So I made a change to create and commit this structure when the device is created rather than lazily at backup time, shaving at least part of the time off the initial backup. Of course, I didn't want to impact our stellar import speed (40,000 devices in 30 seconds), so I queue up the requests to create/commit the structures and a background thread works them off.
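A minimal Java sketch of the queue-and-background-worker idea described above, assuming a single worker thread and a pluggable "create and commit" step. The class and method names (`DeviceDirectoryQueue`, `deviceCreated`, `shutdownAndWait`) are hypothetical and are not ZipTie's actual code; the slow repository work is passed in as a `Consumer<String>` so the sketch stays independent of any Subversion API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/**
 * Hypothetical sketch: the import path enqueues a device ID and returns immediately;
 * one background thread drains the queue and performs the slow create/commit step.
 */
public class DeviceDirectoryQueue {
    // Sentinel telling the worker to stop once everything queued before it is done.
    private static final String SHUTDOWN = "\u0000shutdown";

    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    private final Thread worker;

    /** @param commitDirectory the slow repository step (assumed: create + commit a device's directories) */
    public DeviceDirectoryQueue(Consumer<String> commitDirectory) {
        worker = new Thread(() -> {
            try {
                while (true) {
                    String deviceId = pending.take();       // blocks until work arrives
                    if (SHUTDOWN.equals(deviceId)) return;  // FIFO, so all earlier work is finished
                    try {
                        commitDirectory.accept(deviceId);
                    } catch (RuntimeException e) {
                        // A failed commit must not kill the worker; real code would log and retry.
                        System.err.println("commit failed for " + deviceId + ": " + e);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "repository-structure-worker");
        worker.setDaemon(true);
        worker.start();
    }

    /** Called from the import path: cheap, never touches the repository. */
    public void deviceCreated(String deviceId) {
        pending.add(deviceId);
    }

    /** Enqueues the sentinel and waits up to timeoutMillis for the backlog to drain. */
    public boolean shutdownAndWait(long timeoutMillis) {
        pending.add(SHUTDOWN);
        try {
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        List<String> committed = Collections.synchronizedList(new ArrayList<>());
        DeviceDirectoryQueue queue = new DeviceDirectoryQueue(committed::add);
        for (int i = 0; i < 5; i++) queue.deviceCreated("device-" + i);
        queue.shutdownAndWait(5_000);
        System.out.println(committed); // [device-0, device-1, device-2, device-3, device-4]
    }
}
```

A single worker keeps repository writes in order and off the import thread; whether more workers would help depends on how well the backend handles concurrent commits.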

For my test I created 100,000 devices and then monitored the repository activity to see how long it would take to drain the queue (creating/committing ~140,000 directories). Much to my chagrin (or should I say astonishment), after leaving the house and running various errands, I returned to find the server still chugging away. In all, it took Subversion two hours to commit. Granted, this is a one-time hit that occurs only when devices are first created, but it does not bode well for Subversion performance in general. The great thing about Subversion for us was the availability of Java bindings that sit right on top of its native libraries. Unfortunately, the convenience of these bindings is largely outweighed by Subversion's poor performance.
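To put the two-hour figure in perspective, here is a small back-of-the-envelope calculation in Java using only the numbers quoted above (100,000 devices, ~140,000 directories, two hours; 40,000 devices imported in 30 seconds). The class name `CommitRate` is just for illustration.

```java
/** Back-of-the-envelope rates for the benchmark figures quoted above. */
public class CommitRate {
    /** Average operations per second. */
    public static double perSecond(long ops, double seconds) {
        return ops / seconds;
    }

    /** Average milliseconds per operation. */
    public static double millisEach(long ops, double seconds) {
        return seconds * 1000.0 / ops;
    }

    public static void main(String[] args) {
        double twoHours = 2 * 60 * 60; // 7,200 s
        // Repository side: ~140,000 directories for 100,000 devices in two hours.
        System.out.printf("Repository: %.1f dirs/s (%.1f ms each), %.1f devices/s%n",
                perSecond(140_000, twoHours), millisEach(140_000, twoHours),
                perSecond(100_000, twoHours));
        // Import side: 40,000 devices in 30 seconds.
        System.out.printf("Import:     %.1f devices/s%n", perSecond(40_000, 30));
    }
}
```

That works out to roughly 19 directories (about 14 devices) per second, or about 51 ms per directory, against about 1,333 devices per second on import: the repository absorbs devices nearly 100 times slower than import creates them, which is why the structure work is queued rather than done inline.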

Obviously, this isn't much of an issue now, with most users having only a few thousand devices, but it presents a long-term barrier to scalability, and one I intend to get out in front of. Fortunately, the API exposed by our Configuration Store presents a repository-implementation-agnostic view of the revision store, so when we decide to swap out the underlying backend it will have no visible effect on existing clients or scripts. However, I'm going to take it one step further and make the internal repository implementation an OSGi Fragment. By doing so, both the external and internal contracts will be defined via interfaces, and the underlying repository implementation can be swapped merely by dropping in a different fragment (bundle).
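A rough Java sketch of what "contracts defined via interfaces" could look like. The interface `RevisionStore`, the in-memory backend, and `ConfigBackup` are hypothetical names, not ZipTie's actual API. In the design described above the implementation would be supplied by the OSGi fragment; this sketch uses no OSGi API and simply hands the implementation to a constructor, which is enough to show that client code depends only on the interface.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical internal contract; any backend (Subversion, libXDiff-based, ...) implements it. */
interface RevisionStore {
    /** Stores a new revision of a file and returns its revision number (1-based). */
    int commit(String path, String content);

    /** Returns the content of the given revision, or null if it does not exist. */
    String read(String path, int revision);

    /** Returns the latest revision number, or 0 if the path has never been committed. */
    int latestRevision(String path);
}

/** One interchangeable backend, kept in memory purely for illustration. */
class InMemoryRevisionStore implements RevisionStore {
    private final Map<String, List<String>> history = new HashMap<>();

    @Override
    public int commit(String path, String content) {
        List<String> revs = history.computeIfAbsent(path, p -> new ArrayList<>());
        revs.add(content);
        return revs.size();
    }

    @Override
    public String read(String path, int revision) {
        List<String> revs = history.get(path);
        if (revs == null || revision < 1 || revision > revs.size()) return null;
        return revs.get(revision - 1);
    }

    @Override
    public int latestRevision(String path) {
        List<String> revs = history.get(path);
        return revs == null ? 0 : revs.size();
    }
}

/** Client code sees only the interface, so swapping the backend does not affect it. */
public class ConfigBackup {
    private final RevisionStore store;

    public ConfigBackup(RevisionStore store) {
        this.store = store;
    }

    public int backup(String device, String config) {
        // Hypothetical path layout for a device's config.
        return store.commit("devices/" + device + "/running-config", config);
    }

    public static void main(String[] args) {
        ConfigBackup backup = new ConfigBackup(new InMemoryRevisionStore());
        backup.backup("router1", "hostname router1");
        int rev = backup.backup("router1", "hostname router1\nntp server 10.0.0.1");
        System.out.println(rev); // 2
    }
}
```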

In surveying the version store landscape, two things became clear. First, Subversion has gained a reputation for being "a pig" (to use someone else's characterization). Second, Git, which Linus Torvalds created after the Linux kernel project got burned by BitKeeper, is bar none the most advanced SCM for open source development out there. On the first point, here's a blog post from the architect of Site5 (a web hosting company) about their results using Subversion and ultimately replacing it with their own homegrown solution (http://www.karmiccoding.com/articles/2006/02/22/on-performance-sometimes-the-wheel-just-aint-up-to-scratch). Worth a read if you're an architecty kind of person, but the high-level summary is this:

Populating a large repository (several gigabytes, thousands of files): Subversion about 2 hours, homegrown about 126 seconds
Sweeping the same repository for changes: Subversion about 38 minutes, homegrown about 81 seconds
Sweeping smaller repositories: Subversion about 15 seconds, homegrown under 1 second
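Expressed as speedup ratios (a quick Java calculation over the three figures above; `Speedup` is just an illustrative name), the homegrown store comes out roughly 57x faster at populating and 28x faster at sweeping. For the small repositories, "under 1 second" only gives a lower bound.

```java
/** Speedup ratios for the timings quoted from the Site5 article summary. */
public class Speedup {
    public static double ratio(double slowSeconds, double fastSeconds) {
        return slowSeconds / fastSeconds;
    }

    public static void main(String[] args) {
        System.out.printf("populate: %.0fx%n", ratio(2 * 3600, 126)); // 2 hours vs 126 s
        System.out.printf("sweep:    %.0fx%n", ratio(38 * 60, 81));   // 38 minutes vs 81 s
        // "under 1 second" only bounds the speedup from below.
        System.out.printf("small:    >%.0fx%n", ratio(15, 1));
    }
}
```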

While reading some comparisons of Git to other SCM solutions, I came across some interesting numbers. The Mozilla project's CVS repository is about 3 GB; it's about 12 GB in Subversion's fsfs format. In Git it's around 300 MB. The unfortunate thing (for us) about Git is that it is GPL. The fortunate thing, however, is that it owes its performance and high compression largely to the libXDiff library, which is LGPL. I'm going to run some experiments with libXDiff to gauge the feasibility of using it as a replacement for Subversion within ZipTie. libXDiff is simply the most advanced general-purpose differencing engine available (http://www.xmailserver.org/xdiff-lib.html). I'll be sure to update you with my findings.
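To illustrate what a differencing engine contributes to a revision store, here is a minimal line-based diff in Java built on a longest-common-subsequence (LCS) table, plus a function that rebuilds the new revision from the old one and the edit script. This is a deliberately simple stand-in, not libXDiff's algorithm or API (libXDiff uses a more efficient approach); the idea it shows, under the assumption that a store keeps deltas instead of full copies, is that only the edit script needs to be persisted per revision.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal LCS-based line diff; illustrative only, not libXDiff. */
public class LineDiff {
    /** One edit-script entry: ' ' keep, '-' delete, '+' insert. */
    public record Edit(char op, String line) {
        @Override public String toString() { return op + line; }
    }

    public static List<Edit> diff(List<String> a, List<String> b) {
        int n = a.size(), m = b.size();
        // lcs[i][j] = length of the LCS of a[i..] and b[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--)
            for (int j = m - 1; j >= 0; j--)
                lcs[i][j] = a.get(i).equals(b.get(j))
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);

        // Walk the table to emit the edit script.
        List<Edit> edits = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n && j < m) {
            if (a.get(i).equals(b.get(j))) { edits.add(new Edit(' ', a.get(i))); i++; j++; }
            else if (lcs[i + 1][j] >= lcs[i][j + 1]) edits.add(new Edit('-', a.get(i++)));
            else edits.add(new Edit('+', b.get(j++)));
        }
        while (i < n) edits.add(new Edit('-', a.get(i++)));
        while (j < m) edits.add(new Edit('+', b.get(j++)));
        return edits;
    }

    /** Rebuilds the new revision from the old revision plus the edit script. */
    public static List<String> apply(List<String> a, List<Edit> edits) {
        List<String> out = new ArrayList<>();
        int i = 0;
        for (Edit e : edits) {
            switch (e.op()) {
                case ' ' -> out.add(a.get(i++));
                case '-' -> i++;
                case '+' -> out.add(e.line());
                default -> throw new IllegalArgumentException("bad op " + e.op());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> v1 = List.of("hostname r1", "ip route 0.0.0.0 0.0.0.0 10.0.0.1", "end");
        List<String> v2 = List.of("hostname r1", "ip route 0.0.0.0 0.0.0.0 10.0.0.2",
                "ntp server 10.0.0.5", "end");
        List<Edit> delta = diff(v1, v2);
        delta.forEach(System.out::println);
        System.out.println(apply(v1, delta).equals(v2)); // true
    }
}
```

The DP table costs O(n·m) time and memory, which is fine for a sketch but is exactly the kind of cost a production engine avoids.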

Kudos to all those who stuck through reading this far!

-Brett