summaryrefslogtreecommitdiff
path: root/doc/user/lj_article.txt
diff options
context:
space:
mode:
Diffstat (limited to 'doc/user/lj_article.txt')
-rw-r--r--doc/user/lj_article.txt323
1 files changed, 323 insertions, 0 deletions
diff --git a/doc/user/lj_article.txt b/doc/user/lj_article.txt
new file mode 100644
index 0000000..dca776d
--- /dev/null
+++ b/doc/user/lj_article.txt
@@ -0,0 +1,323 @@
+
+ The Subversion Project: Building a Better CVS
+ ==============================================
+
+ Ben Collins-Sussman <sussman@collab.net>
+
+ Written in August 2001
+ Published in Linux Journal, January 2002
+
+Abstract
+--------
+
+This article discusses the history, goals, features and design of
+Subversion (http://subversion.tigris.org), an open-source project that
+aims to produce a compelling replacement for CVS.
+
+
+Introduction
+------------
+
+If you work on any kind of open-source project, you've probably worked
+with CVS. You probably remember the first time you learned to do an
+anonymous checkout of a source tree over the net -- or your first
+commit, or learning how to look at CVS diffs. And then the fateful
+day came: you asked your friend how to rename a file.
+
+"You can't", was the reply.
+
+What? What do you mean?
+
+"Well, you can delete the file from the repository and then re-add it
+under a new name."
+
+Yes, but then nobody would know it had been renamed...
+
+"Let's call the CVS administrator. She can hand-edit the repository's
+RCS files for us and possibly make things work."
+
+What?
+
+"And by the way, don't try to delete a directory either."
+
+You rolled your eyes and groaned. How could such simple tasks be
+difficult?
+
+
+The Legacy of CVS
+-----------------
+
+No doubt about it, CVS has evolved into the standard Software
+Configuration Management (SCM) system of the open source community.
+And rightly so! CVS itself is Free software, and its wonderful "non
+locking" development model -- whereby dozens of far-flung programmers
+collaborate -- fits the open-source world very well. In fact, one
+might argue that without CVS, it's doubtful whether sites like
+Freshmeat or Sourceforge would ever have flourished as they do now.
+CVS and its semi-chaotic development model have become an essential
+part of open source culture.
+
+So what's wrong with CVS?
+
+Because it uses the RCS storage-system under the hood, CVS can only
+track file contents, not tree structures. As a result, the user has
+no way to copy, move, or rename items without losing history. Tree
+rearrangements are always ugly server-side tweaks.
+
+The RCS back-end cannot store binary files efficiently, and branching
+and tagging operations can grow to be very slow. CVS also uses the
+network inefficiently; many users are annoyed by long waits, because
+file differeces are sent in only one direction (from server to client,
+but not from client to server), and binary files are always
+transmitted in their entirety.
+
+From a developer's standpoint, the CVS codebase is the result of
+layers upon layers of historical "hacks". (Remember that CVS began
+life as a collection of shell-scripts to drive RCS.) This makes the
+code difficult to understand, maintain, or extend. For example: CVS's
+networking ability was essentially "stapled on". It was never
+designed to be a native client-server system.
+
+Rectifying CVS's problems is a huge task -- and we've only listed just
+a few of the many common complaints here.
+
+
+Enter Subversion
+----------------
+
+In 1995, Karl Fogel and Jim Blandy founded Cyclic Software, a company
+for commercially supporting and improving CVS. Cyclic made the first
+public release of a network-enabled CVS (contributed by Cygnus
+software.) In 1999, Karl Fogel published a book about CVS and the
+open-source development model it enables (cvsbook.red-bean.com). Karl
+and Jim had long talked about writing a replacement for CVS; Jim had
+even drafted a new, theoretical repository design. Finally, in
+February of 2000, Brian Behlendorf of CollabNet (www.collab.net)
+offered Karl a full-time job to write a CVS replacement. Karl
+gathered a team together and work began in May.
+
+The team settled on a few simple goals: it was decided that Subversion
+would be designed as a functional replacement for CVS. It would do
+everything that CVS does -- preserving the same development model
+while fixing the flaws in CVS's (lack-of) design. Existing CVS users
+would be the target audience: any CVS user should be able to start
+using Subversion with little effort. Any other SCM "bonus features"
+were decided to be of secondary importance (at least before a 1.0
+release.)
+
+At the time of writing, the original team has been coding for a little
+over a year, and we have a number of excellent volunteer contributors.
+(Subversion, like CVS, is a open-source project!)
+
+
+Subversion's Features
+----------------------
+
+Here's a quick run-down of some of the reasons you should be excited
+about Subversion:
+
+ * Real copies and renames. The Subversion repository doesn't use
+ RCS files at all; instead, it implements a 'virtual' versioned
+ filesystem that tracks tree-structures over time (described
+ below). Files *and* directories are versioned. At last, there
+ are real client-side `mv' and `cp' commands that behave just as
+ you think.
+
+ * Atomic commits. A commit either goes into the repository
+ completely, or not all.
+
+ * Advanced network layer. The Subversion network server is Apache,
+ and client and server speak WebDAV(2) to one another. (See the
+ 'design' section below.)
+
+ * Faster network access. A binary diffing algorithm is used to
+ store and transmit deltas in *both* directions, regardless of
+ whether a file is of text or binary type.
+
+ * Filesystem "properties". Each file or directory has an invisible
+ hashtable attached. You can invent and store any arbitrary
+ key/value pairs you wish: owner, perms, icons, app-creator,
+ mime-type, personal notes, etc. This is a general-purpose feature
+ for users. Properties are versioned, just like file contents.
+ And some properties are auto-detected, like the mime-type of a
+ file (no more remembering to use the '-kb' switch!)
+
+ * Extensible and hackable. Subversion has no historical baggage; it
+ was designed and then implemented as a collection of shared C
+ libraries with well-defined APIs. This makes Subversion extremely
+ maintainable and usable by other applications and languages.
+
+ * Easy migration. The Subversion command-line client is very
+ similar to CVS; the development model is the same, so CVS users
+ should have little trouble making the switch. Development of a
+ 'cvs2svn' repository converter is in progress.
+
+ * It's Free. Subversion is released under a Apache/BSD-style
+ open-source license.
+
+
+Subversion's Design
+-------------------
+
+Subversion has a modular design; it's implemented as a collection of C
+libraries. Each layer has a well-defined purpose and interface. In
+general, code flow begins at the top of the diagram and flows
+"downward" -- each layer provides an interface to the layer above it.
+
+ <<insert diagram here: svn.tiff>>
+
+
+Let's take a short tour of these layers, starting at the bottom.
+
+
+--> The Subversion filesystem.
+
+The Subversion Filesystem is *not* a kernel-level filesystem that one
+would install in an operating system (like the Linux ext2 fs.)
+Instead, it refers to the design of Subversion's repository. The
+repository is built on top of a database -- currently Berkeley DB --
+and thus is a collection of .db files. However, a library accesses
+these files and exports a C API that simulates a filesystem --
+specifically, a "versioned" filesystem.
+
+This means that writing a program to access the repository is like
+writing against other filesystem APIs: you can open files and
+directories for reading and writing as usual. The main difference is
+that this particular filesystem never loses data when written to; old
+versions of files and directories are always saved as historical
+artifacts.
+
+Whereas CVS's backend (RCS) stores revision numbers on a per-file
+basis, Subversion numbers entire trees. Each atomic 'commit' to the
+repository creates a completely new filesystem tree, and is
+individually labeled with a single, global revision number. Files and
+directories which have changed are rewritten (and older versions are
+backed up and stored as differences against the latest version), while
+unchanged entries are pointed to via a shared-storage mechanism. This
+is how the repository is able to version tree structures, not just
+file contents.
+
+Finally, it should be mentioned that using a database like Berkeley DB
+immediately provides other nice features that Subversion needs: data
+integrity, atomic writes, recoverability, and hot backups. (See
+www.sleepycat.com for more information.)
+
+
+--> The network layer.
+
+Subversion has the mark of Apache all over it. At its very core, the
+client uses the Apache Portable Runtime (APR) library. (In fact, this
+means that Subversion client should compile and run anywhere Apache
+httpd does -- right now, this list includes all flavors of Unix,
+Win32, BeOS, OS/2, Mac OS X, and possibly Netware.)
+
+However, Subversion depends on more than just APR -- the Subversion
+"server" is Apache httpd itself.
+
+Why was Apache chosen? Ultimately, the decision was about not
+reinventing the wheel. Apache is a time-tested, open-source server
+process that ready for serious use, yet is still extensible. It can
+sustain a high network load. It runs on many platforms and can
+operate through firewalls. It's able to use a number of different
+authentication protocols. It can do network pipelining and caching.
+By using Apache as a server, Subversion gets all these features for
+free. Why start from scratch?
+
+Subversion uses WebDAV as its network protocol. DAV (Distributed
+Authoring and Versioning) is a whole discussion in itself (see
+www.webdav.org) -- but in short, it's an extension to HTTP that allows
+reads/writes and "versioning" of files over the web. The Subversion
+project is hoping to ride a slowly rising tide of support for this
+protocol: all of the latest file-browsers for Win32, MacOS, and GNOME
+speak this protocol already. Interoperability will (hopefully) become
+more and more of a bonus over time.
+
+For users who simply wish to access Subversion repositories on local
+disk, the client can do this too; no network is required. The
+"Repository Access" layer (RA) is an abstract API implemented by both
+the DAV and local-access RA libraries. This is a specific benefit of
+writing a "librarized" version control system; it's a big win over
+CVS, which has two very different, difficult-to-maintain codepaths for
+local vs. network repository-access. Feel like writing a new network
+protocol for Subversion? Just write a new library that implements the
+RA API!
+
+
+--> The client libraries.
+
+On the client side, the Subversion "working copy" library maintains
+administrative information within special SVN/ subdirectories, similar
+in purpose to the CVS/ administrative directories found in CVS working
+copies.
+
+A glance inside the typical SVN/ directory turns up a bit more than
+usual, however. The `entries' file contains XML which describes the
+current state of the working copy directory (and which basically
+serves the purposes of CVS's Entries, Root, and Repository files
+combined). But other items present (and not found in CVS/) include
+storage locations for the versioned "properties" (the metadata
+mentioned in 'Subversion Features' above) and private caches of
+pristine versions of each file. This latter feature provides the
+ability to report local modifications -- and do reversions --
+*without* network access. Authentication data is also stored within
+SVN/, rather than in a single .cvspass-like file.
+
+The Subversion "client" library has the broadest responsibility; its
+job is to mingle the functionality of the working-copy library with
+that of the repository-access library, and then to provide a
+highest-level API to any application that wishes to perform general
+version control actions.
+
+For example: the C routine `svn_client_checkout()' takes a URL as an
+argument. It passes this URL to the repository-access library and
+opens an authenticated session with a particular repository. It then
+asks the repository for a certain tree, and sends this tree into the
+working-copy library, which then writes a full working copy to disk
+(SVN/ directories and all.)
+
+The client library is designed to be used by any application. While
+the Subversion source code includes a standard command-line client, it
+should be very easy to write any number of GUI clients on top of the
+client library. Hopefully, these GUIs should someday prove to be much
+better than the current crop of CVS GUI applications (the majority of
+which are no more than fragile "wrappers" around the CVS command-line
+client.)
+
+In addition, proper SWIG bindings (www.swig.org) should make
+the Subversion API available to any number of languages: java, perl,
+python, guile, and so on. In order to Subvert CVS, it helps to be
+ubiquitous!
+
+
+Subversion's Future
+-------------------
+
+The release of Subversion 1.0 is currently planned for early 2002.
+After the release of 1.0, Subversion is slated for additions such as
+i18n support, "intelligent" merging, better "changeset" manipulation,
+client-side plugins, and improved features for server administration.
+(Also on the wishlist is an eclectic collection of ideas, such as
+distributed, replicating repositories.)
+
+A final thought from Subversion's FAQ:
+
+ "We aren't (yet) attempting to break new ground in SCM systems, nor
+ are we attempting to imitate all the best features of every SCM
+ system out there. We're trying to replace CVS."
+
+If, in three years, Subversion is widely presumed to be the "standard"
+SCM system in the open-source community, then the project will have
+succeeded. But the future is still hazy: ultimately, Subversion
+will have to win this position on its own technical merits.
+
+Patches are welcome.
+
+
+For More Information
+--------------------
+
+Please visit the Subversion project website at
+http://subversion.tigris.org. There are discussion lists to join, and
+the source code is available via anonymous CVS -- and soon through
+Subversion itself.
+