1 files changed, 281 insertions, 0 deletions
diff --git a/doc/RCSFILES b/doc/RCSFILES
new file mode 100644
index 0000000..35e30ab
--- /dev/null
+++ b/doc/RCSFILES
@@ -0,0 +1,281 @@
+It would be nice if the RCS file format (which is implemented by a
+great many tools, both free and non-free, both by calling GNU RCS and
+by reimplementing access to RCS files) were documented in some
+standard separate from any one tool.  But as far as I know no such
+standard exists.  Hence this file.
+
+The place to start is the rcsfile.5 manpage in the GNU RCS 5.7
+distribution.  Then look at the diff at the end of this file (which
+contains a few fixes and clarifications to that manpage).
+
+If you are interested in MKS RCS, src/ci.c in GNU RCS 5.7 has a
+comment about their date format.  However, as far as we know there
+isn't really any document describing MKS's changes to the RCS file
+format.
+
+The rcsfile.5 manpage does not document what goes in the "text" field
+for each revision.  The answer is that the head revision contains the
+contents of that revision and every other revision contains a bunch of
+edits to produce that revision ("a" and "d" lines).  The GNU diff
+manual (the version I looked at was for GNU diff 2.4) documents this
+format somewhat (as the "RCS output format"), but the presentation is
+a bit confusing as it is all tangled up with the documentation of
+several other output formats.  If you just want some source code to
+look at, the part of CVS which applies these is RCS_deltas in
+src/rcs.c.
+
+The rcsfile.5 documentation only _very_ briefly touches on the order
+of the revisions.  The order _is_ important and CVS relies on it.
+Here is an example of what I was able to find, based on the join3
+sanity.sh testcase (and the behavior I am documenting here seems to be
+the same for RCS 5.7 and CVS 1.9.27):
+
+    1.1 ----------------->  1.2
+     \---> 1.1.2.1           \---> 1.2.2.1
+
+Here is how this shows up in the RCS file (omitting irrelevant parts):
+
+  admin:  head 1.2;
+  deltas:
+    1.2 branches 1.2.2.1; next 1.1;
+    1.1 branches 1.1.2.1; next;
+    1.1.2.1 branches; next;
+    1.2.2.1 branches; next;
+  deltatexts:
+    1.2
+    1.2.2.1
+    1.1
+    1.1.2.1
+
+Yes, the order seems to differ between the deltas and the deltatexts.
+I have no idea how much of this should actually be considered part of
+the RCS file format, or to what extent programs reading it should be
+prepared to encounter an arbitrary order.
+
+The rcsfile.5 grammar shows the {num} after "next" as optional; if it
+is omitted then there is no next delta node (for example 1.1 or the
+head of a branch will typically have no next).
+
+There is one case where CVS uses CVS-specific, non-compatible changes
+to the RCS file format, and this is magic branches.  See cvs.texinfo
+for more information on them.  CVS also sets the RCS state to "dead"
+to indicate that a file does not exist in a given revision (this is
+stored just as any other RCS state is).
+
+The RCS file format allows quite a variety of extensions to be added
+in a compatible manner by use of the "newphrase" feature documented in
+rcsfile.5.  We won't try to document extensions not used by CVS in any
+detail, but we will briefly list them.  Each occurrence of a newphrase
+begins with an identifier, which is what we list here.  Future
+designers of extensions are strongly encouraged to pick
+non-conflicting identifiers.  Note that newphrase occurs in several
+places in the RCS grammar, and a given extension may not be legal in
+all locations.  However, it seems better to reserve a particular
+identifier for all locations, to avoid confusion and complicated
+rules.
+
+   Identifier   Used by
+   ----------   -------
+   namespace    RCS library done at Silicon Graphics Inc. (SGI) in 1996
+                (a modified RCS 5.7--not sure it has any other name).
+   dead         A set of RCS patches developed by Rich Pixley at
+                Cygnus about 1992.  These were for CVS, and predated
+                the current CVS death support, which uses a state "dead"
+                rather than a "dead" newphrase.
+
+CVS does use newphrases to implement the `PreservePermissions'
+extension introduced in CVS 1.9.26.  The following new keywords are
+defined when PreservePermissions=yes:
+
+   owner
+   group
+   permissions
+   special
+   symlink
+   hardlinks
+
+The contents of the `owner' and `group' fields should be a numeric uid
+and a numeric gid, respectively, representing the user and group who
+own the file.  The `permissions' field contains an octal integer,
+representing the permissions that should be applied to the file.  The
+`special' field contains two words; the first must be either `block'
+or `character', and the second is the file's device number.  The
+`symlink' field should be present only in files which are symbolic
+links to other files, and absent on all regular files.  The
+`hardlinks' field contains a list of filenames to which the current
+file is linked, in alphabetical order.  Because filenames often contain
+characters special to RCS, like `.', and sometimes even contain spaces
+or eight-bit characters, the filenames in the hardlinks field will
+usually be enclosed in RCS strings.  For example:
+
+	hardlinks	README @install.txt@ @Installation Notes@;
+
+The hardlinks field should always include the name of the current
+file.  That is, in the repository file README,v, any hardlinks fields
+in the delta nodes should include `README'; CVS will not operate
+properly if this is not done.
+
+Newphrases are also used to implement the `commitid' feature.  The
+following new keyword is defined:
+
+   commitid
+
+The rules regarding keyword expansion are not documented along with
+the rest of the RCS file format; they are documented in the co(1)
+manpage in the RCS 5.7 distribution.  See also the "Keyword
+substitution" chapter of cvs.texinfo.  The co(1) manpage refers to
+special behavior if the log prefix for the $Log keyword is /* or (*.
+RCS 5.7 produces a warning whenever it behaves that way, and current
+versions of CVS do not handle this case in a special way (CVS 1.9 and
+earlier invoke RCS to perform keyword expansion).
+
+Note that if the "expand" keyword is omitted from the RCS file, the
+default is "kv".
+
+Note that the "comment {string};" syntax from rcsfile.5 specifies a
+comment leader, which affects expansion of the $Log keyword for old
+versions of RCS.  The comment leader is not used by RCS 5.7 or current
+versions of CVS.
+
+Both RCS 5.7 and current versions of CVS handle the $Log keyword in a
+different way if the log message starts with "checked in with -k by ".
+I don't think this behavior is documented anywhere.
+
+Here is a clarification regarding characters versus bytes in certain
+character sets like JIS and Big5:
+
+    The RCS file format, as described in the rcsfile(5) man page, is
+    actually byte-oriented, not character-oriented, despite hints to
+    the contrary in the man page.  This distinction is important for
+    multibyte characters.  For example, if a multibyte character
+    contains a `@' byte, the `@' must be doubled within strings in RCS
+    files, since RCS uses `@' bytes as escapes.
+
+    This point is not an issue for encodings like ISO 8859, which do
+    not have multibyte characters.  Nor is it an issue for encodings
+    like UTF-8 and EUC-JIS, which never use ASCII bytes within a
+    multibyte character.  It is an issue only for multibyte encodings
+    like JIS and Big5, which _do_ usurp ASCII bytes.
+
+    If `@' doubling occurs within a multibyte char, the resulting RCS
+    file is not a properly encoded text file.  Instead, it is a byte
+    stream that does not use a consistent character encoding that can
+    be understood by the usual text tools, since doubling `@' messes
+    up the encoding.  This point affects only programs that examine
+    the RCS files -- it doesn't affect the external RCS interface, as
+    the RCS commands always give you the properly encoded text files
+    and logs (assuming that you always check in properly encoded
+    text).
+
+    CVS 1.10 (and earlier) probably has some bugs in this area on
+    systems where a C "char" is signed and where the data contains
+    bytes with the eighth bit set.
+
+One common concern about the RCS file format is the fact that to get
+the head of a branch, one must apply deltas from the head of the trunk
+to the branchpoint, and then from the branchpoint to the head of the
+branch.  While more detailed analyses might be worth doing, we will
+note:
+
+    * The performance bottleneck for CVS generally is figuring out which
+    files to operate on and that sort of thing, not applying deltas.
+
+    * Here is one quick test (probably not a very good test; a better test
+    would use a normally sized file (say 50-200K) instead of a small one):
+
+	I just did a quick test with a small file (on a Sun Ultra 1/170E
+	running Solaris 5.5.1), with 1000 revisions on the main branch and
+	1000 revisions on a branch that forked at the root (i.e., RCS revisions
+	1.1, 1.2, ..., 1.1000, and branch revisions 1.1.1.1, 1.1.1.2, ...,
+	1.1.1.1000).  It took about 0.15 seconds real time to check in the
+	first revision, and about 0.6 seconds to check in and 0.3 seconds to
+	retrieve revision 1.1.1.1000 (the worst case).
+
+    * Any attempt to "fix" this problem should be careful not to interfere
+    with other features, such as lightweight creation of branches
+    (particularly using CVS magic branches).
+
+Diff follows:
+
+(Note that in the following diff the old value for the Id keyword was:
+    Id: rcsfile.5in,v 5.6 1995/06/05 08:28:35 eggert Exp 
+and the new one was:
+    Id: rcsfile.5in,v 5.7 1996/12/09 17:31:44 eggert Exp 
+but since this file itself might be subject to keyword expansion I
+haven't included a diff for that change).
+
+===================================================================
+RCS file: RCS/rcsfile.5in,v
+retrieving revision 5.6
+retrieving revision 5.7
+diff -u -r5.6 -r5.7
+--- rcsfile.5in	1995/06/05 08:28:35	5.6
++++ rcsfile.5in	1996/12/09 17:31:44	5.7
+@@ -85,7 +85,8 @@
+ .LP
+ \f2sym\fP	::=	{\f2digit\fP}* \f2idchar\fP {\f2idchar\fP | \f2digit\fP}*
+ .LP
+-\f2idchar\fP	::=	any visible graphic character except \f2special\fP
++\f2idchar\fP	::=	any visible graphic character,
++		except \f2digit\fP or \f2special\fP
+ .LP
+ \f2special\fP	::=	\f3$\fP | \f3,\fP | \f3.\fP | \f3:\fP | \f3;\fP | \f3@\fP
+ .LP
+@@ -119,12 +120,23 @@
+ the minute (00\-59),
+ and
+ .I ss
+-the second (00\-60).
++the second (00\-59).
++If
+ .I Y
+-contains just the last two digits of the year
+-for years from 1900 through 1999,
+-and all the digits of years thereafter.
+-Dates use the Gregorian calendar; times use UTC.
++contains exactly two digits,
++they are the last two digits of a year from 1900 through 1999;
++otherwise,
++.I Y
++contains all the digits of the year.
++Dates use the Gregorian calendar.
++Times use UTC, except that for portability's sake leap seconds are not allowed;
++implementations that support leap seconds should output
++.B 59
++for
++.I ss
++during an inserted leap second, and should accept
++.B 59
++for a deleted leap second.
+ .PP
+ The
+ .I newphrase
+@@ -144,16 +156,23 @@
+ field in order of decreasing numbers.
+ The
+ .B head
+-field in the
+-.I admin
+-node points to the head of that sequence (i.e., contains
++field points to the head of that sequence (i.e., contains
+ the highest pair).
+ The
+ .B branch
+-node in the admin node indicates the default
++field indicates the default
+ branch (or revision) for most \*r operations.
+ If empty, the default
+ branch is the highest branch on the trunk.
++The
++.B symbols
++field associates symbolic names with revisions.
++For example, if the file contains
++.B "symbols rr:1.1;"
++then
++.B rr
++is a name for revision
++.BR 1.1 .
+ .PP
+ All
+ .I delta
+