Diffstat (limited to 'subversion/libsvn_fs_fs/structure')
-rw-r--r-- | subversion/libsvn_fs_fs/structure | 95 |
1 files changed, 93 insertions, 2 deletions
diff --git a/subversion/libsvn_fs_fs/structure b/subversion/libsvn_fs_fs/structure
index 5472a18..41caf1d 100644
--- a/subversion/libsvn_fs_fs/structure
+++ b/subversion/libsvn_fs_fs/structure
@@ -40,6 +40,9 @@ repository) is:
   revprops/           Subdirectory containing rev-props
     <shard>/          Shard directory, if sharding is in use (see below)
       <revnum>        File containing rev-props for <revnum>
+    <shard>.pack/     Pack directory, if the repo has been packed (see below)
+      <rev>.<count>   Pack file, if the repository has been packed (see below)
+      manifest        Pack manifest file, if a pack file exists (see below)
   revprops.db         SQLite database of the packed revision properties
   transactions/       Subdirectory containing transactions
     <txnid>.txn/      Directory containing transaction <txnid>
@@ -134,6 +137,7 @@ The formats are:
   Format 3, understood by Subversion 1.5+
   Format 4, understood by Subversion 1.6+
   Format 5, understood by Subversion 1.7-dev, never released
+  Format 6, understood by Subversion 1.8
 
 The differences between the formats are:
 
@@ -173,6 +177,12 @@ Revision changed paths list:
   Format 1-3: Does not contain the node's kind.
   Format 4+: Contains the node's kind.
 
+Shard packing:
+  Format 4: Applied to revision data only.
+  Format 5: Revprops would be packed independently of revision data.
+  Format 6+: Applied equally to revision data and revprop data
+    (i.e. same min packed revision)
+
 # Incomplete list. See SVN_FS_FS__MIN_*_FORMAT
 
 
@@ -232,6 +242,80 @@ See r1143829 of this file:
 http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_fs_fs/structure?view=markup&pathrev=1143829
 
 
+Packing revision properties (format 6+)
+---------------------------
+
+Similarly to the revision data, packing will concatenate multiple
+revprops into a single file. Since they are mutable data, we put an
+upper limit to the size of these files: We will concatenate the data
+up to the limit and then use a new file for the following revisions.
+
+The limit can be set and changed at will in the configuration file.
+It is 64kB by default. Because a pack file must contain at least one
+complete property list, files containing just one revision may exceed
+that limit.
+
+Furthermore, pack files can be compressed which saves about 75% of
+disk space. A configuration file flag enables the compression; it is
+off by default and may be switched on and off at will. The pack size
+limit is always applied to the uncompressed data. For this reason,
+the default is 256kB while compression has been enabled.
+
+Files are named after their start revision as "<rev>.<counter>" where
+counter will be increased whenever we rewrite a pack file due to a
+revprop change. The manifest file contains the list of pack file
+names, one line for each revision.
+
+Many tools track repository global data in revision properties at
+revision 0. To minimize I/O overhead for those applications, we
+will never pack that revision, i.e. its data is always being kept
+in revprops/0/0.
+
+Pack file format
+
+  Top level: <packed container>
+
+  We always apply data compression to the pack file - using the
+  SVN_DELTA_COMPRESSION_LEVEL_NONE level if compression is disabled.
+  (Note that compression at SVN_DELTA_COMPRESSION_LEVEL_NONE is not
+  a no-op stream transformation although most of the data will remain
+  human readable.)
+
+  container := header '\n' (revprops)+
+  header    := start_rev '\n' rev_count '\n' (size '\n')+
+
+  All numbers in the header are given as ASCII decimals. rev_count
+  is the number of revisions packed into this container. There must
+  be exactly as many "size" and serialized "revprops". The "size"
+  values in the list are the length in bytes of the serialized
+  revprops of the respective revision.
+
+Writing to packed revprops
+
+  The old pack file is being read and the new revprops serialized.
+  If they fit into the same pack file, a temp file with the new
+  content gets written and moved into place just like an non-packed
+  revprop file would. No name change or manifest update required.
+
+  If they don't fit into the same pack file, i.e. exceed the pack
+  size limit, the pack will be split into 2 or 3 new packs just
+  before and / or after the modified revision.
+
+  In the current implementation, they will never be merged again.
+  To minimize fragmentation, the initial packing process will only
+  use about 90% of the limit, i.e. leave some room for growth.
+
+  When a pack file gets split, its counter is being increased
+  creating a new file and leaving the old content in place and
+  available for concurrent readers. Only after the new manifest
+  file got moved into place, will the old pack files be deleted.
+
+  Write access to revprops is being serialized by the global
+  filesystem write lock. We only need to build a few retries into
+  the reader code to gracefully handle manifest changes and pack
+  file deletions.
+
+
 Node-revision IDs
 -----------------
 
@@ -266,7 +350,7 @@ Within a revision: "r<rev>/<offset>"
 txn-id fields.
 
 In Format 3 and above, this uniqueness is done by changing a temporary
-id of "_<base36>" to "<rev>-<base36>". Note that this means that the
+id of "_<base36>" to "<base36>-<rev>". Note that this means that the
 originating revision of a line of history or a copy can be determined
 by looking at the node ID.
 
@@ -379,7 +463,8 @@ defined:
   props     "<rev> <offset> <length> <size> <digest>" for props rep
             <rev> and <offset> give location of rep
             <length> gives length of rep, sans header and trailer
-            <size> gives size of expanded rep
+            <size> gives size of expanded rep; may be 0 if equal
+            to the length
             <digest> gives hex MD5 digest of expanded rep
             ### in formats >=4, also present:
             <sha1-digest> gives hex SHA1 digest of expanded rep
@@ -441,6 +526,7 @@ A transaction directory has the following layout:
   node.<nid>.<cid>           New node-rev data for node
   node.<nid>.<cid>.props     Props for new node-rev, if changed
   node.<nid>.<cid>.children  Directory contents for node-rev
+  <sha1>                     Text representation of that sha1
 
 In FS formats 1 and 2, it also contains:
 
@@ -458,6 +544,11 @@ The two kinds of props files are all in hash dump format. The "props"
 file will always be present. The "node.<nid>.<cid>.props" file will
 only be present if the node-rev properties have been changed.
 
+The <sha1> files have been introduced in FS format 6. Their content
+is that of text rep references: "<rev> <offset> <length> <size> <digest>"
+They will be written for text reps in the current transaction and be
+used to eliminate duplicate reps within that transaction.
+
 The "next-ids" file contains a single line "<next-temp-node-id>
 <next-temp-copy-id>\n" giving the next temporary node-ID and copy-ID
 assignments (without the leading underscores). The next node-ID is
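
The container grammar this patch adds under "Pack file format" can be checked with a small reader. The Python sketch below is not taken from Subversion. It assumes the pack file content has already been decompressed, and the function name `split_packed_revprops` is hypothetical. It parses the header and slices the body into one serialized property list per revision, rejecting input whose sizes do not match the header.

```python
def split_packed_revprops(data: bytes) -> dict:
    r"""Split an uncompressed revprop container into {revision: raw revprops}.

    Assumes the layout described in the patch:
        container := header '\n' (revprops)+
        header    := start_rev '\n' rev_count '\n' (size '\n')+
    """
    # The header ends with "size\n" and is followed by one more "\n",
    # so the first empty line separates the header from the body.
    header, sep, body = data.partition(b"\n\n")
    if not sep:
        raise ValueError("no blank line terminating the header")

    # All header numbers are ASCII decimals, one per line.
    fields = header.split(b"\n")
    if len(fields) < 3 or not all(f.isdigit() for f in fields):
        raise ValueError("header must hold at least three ASCII decimals")

    start_rev, rev_count, *sizes = (int(f) for f in fields)
    if rev_count != len(sizes):
        raise ValueError("rev_count does not match the number of sizes")
    if sum(sizes) != len(body):
        raise ValueError("sizes do not add up to the body length")

    # Each size is the byte length of one revision's serialized revprops.
    result, offset = {}, 0
    for i, size in enumerate(sizes):
        result[start_rev + i] = body[offset:offset + size]
        offset += size
    return result
```

For example, the input `b"10\n2\n5\n3\n\nAAAAABBB"` describes two revisions starting at revision 10, with serialized lengths of 5 and 3 bytes.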