Diffstat (limited to 'subversion/libsvn_fs_x')
54 files changed, 41001 insertions, 0 deletions
diff --git a/subversion/libsvn_fs_x/TODO b/subversion/libsvn_fs_x/TODO
new file mode 100644
index 0000000..4daf45b
--- /dev/null
+++ b/subversion/libsvn_fs_x/TODO
@@ -0,0 +1,270 @@
+
+TODO (see also DONE section below)
+==================================
+
+Internal API cleanup
+--------------------
+
+During refactoring, some functions had to be declared in header files
+to make them available to other fsfs code. We need to revisit those
+function definitions to turn them into a proper API that may be useful
+to other code (such as fsfs tools).
+
+
+Checksum all metadata elements
+------------------------------
+
+All elements of an FS-X repository shall be guarded by checksums. That
+includes indexes, noderevs, etc. Larger data structures, such as index
+files, should have checksummed sub-elements such that corrupted parts
+may be identified and potentially repaired / circumvented in a meaningful
+way.
+
+Those checksums may be quite simple such as Adler32 because that meta-
+data can be cross-verified with other parts as well and acts only as a
+fallback to narrow down the affected parts.
+
+'svnadmin verify' shall check consistency based on those checksums.
+
+
+Port existing FSFS tools
+------------------------
+
+fsfs-stats, fsfsverify.py and possibly others should have equivalents
+in the FS-X world.
+
+
+Optimize data ordering during pack
+----------------------------------
+
+I/O-optimized copy algorithms are yet to be implemented. The current
+code is relatively slow as it performs quasi-random I/O on the
+input stream.
+
+
+TxDelta v2
+----------
+
+Version 1 of txdelta turns out to be limited in its effectiveness for
+larger files when data gets inserted or removed. For typical office
+documents (zip files), deltification often becomes ineffective.
+
+Version 2 shall introduce the following changes:
+
+- increase the delta window from 100kB to 1MB
+- use a sliding window instead of a fixed-size one
+- use a slightly more efficient instruction encoding
+
+When introducing it, we will make it an option at the txdelta interfaces
+(e.g. a format number). The version will be indicated in the 'SVN\x1' /
+'SVN\x2' stream header. While at it, (try to) fix the layering violations
+where those prefixes are being read or written.
+
+
+Large file storage
+------------------
+
+Even most source code repositories contain large, hard-to-compress,
+hard-to-deltify binaries. Reconstructing their content becomes very
+I/O-intensive and it "dilutes" the data in our pack files. The latter makes
+e.g. caching, prefetching and packing less efficient.
+
+Once a representation exceeds a certain configured threshold (16M default),
+the fulltext of that item will be stored in a separate file. This will
+be marked in the representation_t by an extra flag and future reps will
+not be deltified against it. From that location, the data can be forwarded
+directly via SendFile and the fulltext caches will not be used for it.
+
+Note that by making the decision contingent upon the size of the deltified
+and packed representation, all large data that benefit from these (i.e.
+have smaller increments) will still be stored within the rev and pack files.
+If a future representation is smaller than the threshold, it may be
+
+/* danielsh: so if we have a file which is 20MB over many revisions, it'll
+be stored in fulltext every single time unless the configured threshold is
+changed? Wondering if that's the best solution...
+*/
+
+
+Sorted binary directory representations
+---------------------------------------
+
+Lookup of entries in a directory is a frequent operation when following
+cached paths. Internally, directories are represented as arrays sorted by
+entry name to allow for binary search during that lookup. However, all
+external representations use hashes and the conversion is expensive.
+
+FS-X shall store directory representations sorted by element names and
+use that array representation internally wherever appropriate. This
+will minimize the conversion overhead for long directories, especially
+during transaction building.
+
+Moreover, switch from the key/value representation to a slightly tighter
+and easier-to-process binary representation (validity is already guaranteed
+by checksums).
+
+
+Star-Deltification
+------------------
+
+Current implementation is incomplete. TODO: actually support & use base
+representations, optimize instruction table.
+
+Combine this with Txdelta 2 such that the corresponding windows from
+all representations get stored in a common star-delta container.
+
+
+Multiple pack stages
+--------------------
+
+FSFS only knows one packing level - the shard. For repositories with
+a large number of revisions, it may be more efficient to start with small
+packs (10-ish) and later pack them into larger and larger ones.
+
+
+Open fewer files when opening a repository
+------------------------------------------
+
+Opening a repository reads numerous files in db/ (besides several more in
+../conf): uuid, current, format, fs-type, fsfs.conf, min-unpacked-rev, ...
+
+Combine most of them into one or two files (e.g. uuid|format(|fs-type?),
+current|min-unpacked-revprop).
+
+
+Sharded transaction directories
+-------------------------------
+
+Transaction directories contain 3 OS files per FS file modified in the
+transaction. That doesn't scale well; find something better.
+
+
+DONE
+====
+
+Turn into separate FS
+---------------------
+
+Make FS-X a separate file system alongside BDB and FSFS. Rip out all
+FSFS compatibility code.
+
+
+Logical addressing
+------------------
+
+To allow for moving data structures around within the repository, we must
+replace the current absolute addressing using file offsets with a logical
+one. All references will now take the form of (revision, index) pairs and
+a replacement for the format 6 manifest files will map those to actual file
+offsets.
+
+Having the need to map revision-local offsets to pack-file global offsets
+today already gives us some localized address mapping code that simply
+needs to be replaced.
+
+
+Optimize data ordering during pack
+----------------------------------
+
+Replace today's simple concatenating shard packing process with one
+placing fragments (representations and noderevs) from various revisions
+close to each other if they are likely needed to serve the same request.
+
+We will optimize on a per-shard basis.
+The general strategy is:
+
+* place all change lists at the beginning of the pack file
+  - strict revision order
+  - place newest ones first
+* place all file properties reps next
+  - place newer reps first
+* place all directory properties next
+  - place newer reps first
+* place all root nodes and root directories
+  - ordered newest rev -> oldest rev
+  - place rep delta chains 'en bloc'
+  - place root node in front of its rep, if that rep has not already
+    been placed as part of a rep delta chain
+* place remaining content as follows:
+  - place node revs directly in front of their reps (where they have one)
+  - start with the latest root directory not placed yet
+  - recurse into sub-folders first, sorted by name
+  - per folder, place files in naming order
+  - place rep deltification chains in deltification order (new->old)
+* no fragments should be left, but if they are, put them at the end
+
+
+Index pack files
+----------------
+
+In addition to the manifest we need for the (revision, index) -> offset
+mapping, we also introduce an offset -> (revision, index, type) index
+file. This will allow us to parse any data in a pack file without walking
+the DAG top down.
+
+
+Data prefetch
+-------------
+
+This builds on the previous item. The idea is that whenever a cache lookup
+fails, we will not just read the single missing fragment but parse all
+data within the APR file buffer and put that into the cache.
+
+For maximum efficiency, we will align the data blocks being read to
+multiples of the block size and allow that buffer size to be configured
+(where supported by APR). The default block size will be raised to 64kB.
+
+
+Extend 'svnadmin verify'
+------------------------
+
+Format 7 provides many extra chances to verify contents and contains
+extra indexes that must be consistent with the pack / rev files. We
+must extend the tests to cover all that.
+
+
+Containers
+----------
+
+Extend the index format to support containers, i.e. map a logical item index
+to (file offset, sub-index) pairs. The whole container will be read and
+cached and the specific item later accessed from the whole structure.
+
+Use these containers for reps, noderevs and changes. Provide specific
+data container types for each of these item types; different item
+types cannot be put into the same container. Containers are binary,
+i.e. there are no textual representations of their contents.
+
+This allows for significant space savings on disk due to deltification
+amongst e.g. revprops. More importantly, it reduces the size of the
+runtime data structures within the cache *and* reduces the number of
+cache entries (the cache can't handle items < 500 bytes very well).
+
+
+Packed change lists
+-------------------
+
+Change lists tend to be large, in some cases >20% of the repo. Due to the
+new ordering of pack data, the change lists can be the largest part of
+data to read for svn log. Use our standard compression method to save
+70 .. 80% of the disk space.
+
+Packing will only be applied to binary representations of change lists
+to keep the number of possible combinations low.
+
+
+Star-Deltification
+------------------
+
+Most node contents are smaller than 500k, i.e. less than the Txdelta 2
+window. Those contents shall be aggregated into star-delta containers
+upon pack. This will save significant amounts of disk space, particularly
+in case of heavy branching. Also, the data extraction is independent of
+the number of deltas (i.e. the delta chain length) within the same
+container.
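To illustrate the two mappings described in the 'Logical addressing' and
'Index pack files' entries above, here is a minimal C sketch. The struct
and function names are hypothetical and are not the FS-X data structures;
the actual code resolves these lookups via svn_fs_x__item_offset() and
svn_fs_x__p2l_entry_lookup(), which appear in cached_data.c below.

  /* Hypothetical sketch of the two pack-file index mappings. */
  #include <stdlib.h>

  /* Logical-to-physical: (revision, item index) -> offset in the pack file. */
  typedef struct l2p_entry_t
  {
    long revision;
    unsigned long item_index;
    long offset;
  } l2p_entry_t;

  /* Physical-to-logical: offset -> (revision, item index, item type).
     This reverse index is what lets a reader parse any block of a pack
     file without walking the DAG top-down. */
  typedef struct p2l_entry_t
  {
    long offset;
    long revision;
    unsigned long item_index;
    int type;
  } p2l_entry_t;

  static int
  compare_l2p(const void *lhs, const void *rhs)
  {
    const l2p_entry_t *a = lhs;
    const l2p_entry_t *b = rhs;

    if (a->revision != b->revision)
      return a->revision < b->revision ? -1 : 1;
    if (a->item_index != b->item_index)
      return a->item_index < b->item_index ? -1 : 1;
    return 0;
  }

  /* Return the pack file offset for (REVISION, ITEM_INDEX), or -1 if that
     pair is not present in ENTRIES.  ENTRIES contains COUNT elements and
     is assumed to be sorted by (revision, item index). */
  static long
  lookup_offset(const l2p_entry_t *entries, size_t count,
                long revision, unsigned long item_index)
  {
    l2p_entry_t key = { 0 };
    const l2p_entry_t *found;

    key.revision = revision;
    key.item_index = item_index;
    found = bsearch(&key, entries, count, sizeof(*entries), compare_l2p);

    return found ? found->offset : -1;
  }

With the forward index kept sorted, resolving an item reference costs a
single binary search; the reverse (p2l) direction answers the opposite
question when a whole block is read and parsed at once.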
+
+
+Support for arbitrary chars in path names
+-----------------------------------------
+
+FSFS's textual item representations break when path names contain
+newlines. FS-X revisions shall escape all control chars (e.g. < 0x20)
+in path names when using them in textual item representations.
+
diff --git a/subversion/libsvn_fs_x/cached_data.c b/subversion/libsvn_fs_x/cached_data.c
new file mode 100644
index 0000000..2fdf569
--- /dev/null
+++ b/subversion/libsvn_fs_x/cached_data.c
@@ -0,0 +1,3355 @@
+/* cached_data.c --- cached (read) access to FSX data
+ *
+ * ====================================================================
+ *    Licensed to the Apache Software Foundation (ASF) under one
+ *    or more contributor license agreements.  See the NOTICE file
+ *    distributed with this work for additional information
+ *    regarding copyright ownership.  The ASF licenses this file
+ *    to you under the Apache License, Version 2.0 (the
+ *    "License"); you may not use this file except in compliance
+ *    with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *    Unless required by applicable law or agreed to in writing,
+ *    software distributed under the License is distributed on an
+ *    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ *    KIND, either express or implied.  See the License for the
+ *    specific language governing permissions and limitations
+ *    under the License.
+ * ====================================================================
+ */
+
+#include "cached_data.h"
+
+#include <assert.h>
+
+#include "svn_hash.h"
+#include "svn_ctype.h"
+#include "svn_sorts.h"
+
+#include "private/svn_io_private.h"
+#include "private/svn_sorts_private.h"
+#include "private/svn_subr_private.h"
+#include "private/svn_temp_serializer.h"
+
+#include "fs_x.h"
+#include "low_level.h"
+#include "util.h"
+#include "pack.h"
+#include "temp_serializer.h"
+#include "index.h"
+#include "changes.h"
+#include "noderevs.h"
+#include "reps.h"
+
+#include "../libsvn_fs/fs-loader.h"
+#include "../libsvn_delta/delta.h"  /* for SVN_DELTA_WINDOW_SIZE */
+
+#include "svn_private_config.h"
+
+/* forward-declare. See implementation for the docstring */
+static svn_error_t *
+block_read(void **result,
+           svn_fs_t *fs,
+           const svn_fs_x__id_t *id,
+           svn_fs_x__revision_file_t *revision_file,
+           apr_pool_t *result_pool,
+           apr_pool_t *scratch_pool);
+
+
+/* Define this to enable access logging via dgb__log_access
+#define SVN_FS_X__LOG_ACCESS
+*/
+
+/* When SVN_FS_X__LOG_ACCESS has been defined, write a line to console
+ * showing where ID is located in FS and use ITEM to show details on its
+ * contents if not NULL. Use SCRATCH_POOL for temporary allocations.
+ */ +static svn_error_t * +dgb__log_access(svn_fs_t *fs, + const svn_fs_x__id_t *id, + void *item, + apr_uint32_t item_type, + apr_pool_t *scratch_pool) +{ + /* no-op if this macro is not defined */ +#ifdef SVN_FS_X__LOG_ACCESS + svn_fs_x__data_t *ffd = fs->fsap_data; + apr_off_t offset = -1; + apr_off_t end_offset = 0; + apr_uint32_t sub_item = 0; + svn_fs_x__p2l_entry_t *entry = NULL; + static const char *types[] = {"<n/a>", "frep ", "drep ", "fprop", "dprop", + "node ", "chgs ", "rep ", "c:", "n:", "r:"}; + const char *description = ""; + const char *type = types[item_type]; + const char *pack = ""; + svn_revnum_t revision = svn_fs_x__get_revnum(id->change_set); + + /* determine rev / pack file offset */ + SVN_ERR(svn_fs_x__item_offset(&offset, &sub_item, fs, id, scratch_pool)); + + /* constructing the pack file description */ + if (revision < ffd->min_unpacked_rev) + pack = apr_psprintf(scratch_pool, "%4ld|", + revision / ffd->max_files_per_dir); + + /* construct description if possible */ + if (item_type == SVN_FS_X__ITEM_TYPE_NODEREV && item != NULL) + { + svn_fs_x__noderev_t *node = item; + const char *data_rep + = node->data_rep + ? apr_psprintf(scratch_pool, " d=%ld/%" APR_UINT64_T_FMT, + svn_fs_x__get_revnum(node->data_rep->id.change_set), + node->data_rep->id.number) + : ""; + const char *prop_rep + = node->prop_rep + ? apr_psprintf(scratch_pool, " p=%ld/%" APR_UINT64_T_FMT, + svn_fs_x__get_revnum(node->prop_rep->id.change_set), + node->prop_rep->id.number) + : ""; + description = apr_psprintf(scratch_pool, "%s (pc=%d%s%s)", + node->created_path, + node->predecessor_count, + data_rep, + prop_rep); + } + else if (item_type == SVN_FS_X__ITEM_TYPE_ANY_REP) + { + svn_fs_x__rep_header_t *header = item; + if (header == NULL) + description = " (txdelta window)"; + else if (header->type == svn_fs_x__rep_self_delta) + description = " DELTA"; + else + description = apr_psprintf(scratch_pool, + " DELTA against %ld/%" APR_UINT64_T_FMT, + header->base_revision, + header->base_item_index); + } + else if (item_type == SVN_FS_X__ITEM_TYPE_CHANGES && item != NULL) + { + apr_array_header_t *changes = item; + switch (changes->nelts) + { + case 0: description = " no change"; + break; + case 1: description = " 1 change"; + break; + default: description = apr_psprintf(scratch_pool, " %d changes", + changes->nelts); + } + } + + /* reverse index lookup: get item description in ENTRY */ + SVN_ERR(svn_fs_x__p2l_entry_lookup(&entry, fs, revision, offset, + scratch_pool)); + if (entry) + { + /* more details */ + end_offset = offset + entry->size; + type = types[entry->type]; + + /* merge the sub-item number with the container type */ + if ( entry->type == SVN_FS_X__ITEM_TYPE_CHANGES_CONT + || entry->type == SVN_FS_X__ITEM_TYPE_NODEREVS_CONT + || entry->type == SVN_FS_X__ITEM_TYPE_REPS_CONT) + type = apr_psprintf(scratch_pool, "%s%-3d", type, sub_item); + } + + /* line output */ + printf("%5s%4lx:%04lx -%4lx:%04lx %s %7ld %5"APR_UINT64_T_FMT" %s\n", + pack, (long)(offset / ffd->block_size), + (long)(offset % ffd->block_size), + (long)(end_offset / ffd->block_size), + (long)(end_offset % ffd->block_size), + type, revision, id->number, description); + +#endif + + return SVN_NO_ERROR; +} + +/* Convenience wrapper around svn_io_file_aligned_seek, taking filesystem + FS instead of a block size. 
*/ +static svn_error_t * +aligned_seek(svn_fs_t *fs, + apr_file_t *file, + apr_off_t *buffer_start, + apr_off_t offset, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + return svn_error_trace(svn_io_file_aligned_seek(file, ffd->block_size, + buffer_start, offset, + scratch_pool)); +} + +/* Open the revision file for the item given by ID in filesystem FS and + store the newly opened file in FILE. Seek to the item's location before + returning. + + Allocate the result in RESULT_POOL and temporaries in SCRATCH_POOL. */ +static svn_error_t * +open_and_seek_revision(svn_fs_x__revision_file_t **file, + svn_fs_t *fs, + const svn_fs_x__id_t *id, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_x__revision_file_t *rev_file; + apr_off_t offset = -1; + apr_uint32_t sub_item = 0; + svn_revnum_t rev = svn_fs_x__get_revnum(id->change_set); + + SVN_ERR(svn_fs_x__ensure_revision_exists(rev, fs, scratch_pool)); + + SVN_ERR(svn_fs_x__open_pack_or_rev_file(&rev_file, fs, rev, result_pool, + scratch_pool)); + SVN_ERR(svn_fs_x__item_offset(&offset, &sub_item, fs, rev_file, id, + scratch_pool)); + SVN_ERR(aligned_seek(fs, rev_file->file, NULL, offset, scratch_pool)); + + *file = rev_file; + + return SVN_NO_ERROR; +} + +/* Open the representation REP for a node-revision in filesystem FS, seek + to its position and store the newly opened file in FILE. + + Allocate the result in RESULT_POOL and temporaries in SCRATCH_POOL. */ +static svn_error_t * +open_and_seek_transaction(svn_fs_x__revision_file_t **file, + svn_fs_t *fs, + svn_fs_x__representation_t *rep, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + apr_off_t offset; + apr_uint32_t sub_item = 0; + apr_int64_t txn_id = svn_fs_x__get_txn_id(rep->id.change_set); + + SVN_ERR(svn_fs_x__open_proto_rev_file(file, fs, txn_id, result_pool, + scratch_pool)); + + SVN_ERR(svn_fs_x__item_offset(&offset, &sub_item, fs, *file, &rep->id, + scratch_pool)); + SVN_ERR(aligned_seek(fs, (*file)->file, NULL, offset, scratch_pool)); + + return SVN_NO_ERROR; +} + +/* Given a node-id ID, and a representation REP in filesystem FS, open + the correct file and seek to the correction location. Store this + file in *FILE_P. + + Allocate the result in RESULT_POOL and temporaries in SCRATCH_POOL. */ +static svn_error_t * +open_and_seek_representation(svn_fs_x__revision_file_t **file_p, + svn_fs_t *fs, + svn_fs_x__representation_t *rep, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + if (svn_fs_x__is_revision(rep->id.change_set)) + return open_and_seek_revision(file_p, fs, &rep->id, result_pool, + scratch_pool); + else + return open_and_seek_transaction(file_p, fs, rep, result_pool, + scratch_pool); +} + + + +static svn_error_t * +err_dangling_id(svn_fs_t *fs, + const svn_fs_x__id_t *id) +{ + svn_string_t *id_str = svn_fs_x__id_unparse(id, fs->pool); + return svn_error_createf + (SVN_ERR_FS_ID_NOT_FOUND, 0, + _("Reference to non-existent node '%s' in filesystem '%s'"), + id_str->data, fs->path); +} + +/* Get the node-revision for the node ID in FS. + Set *NODEREV_P to the new node-revision structure, allocated in POOL. + See svn_fs_x__get_node_revision, which wraps this and adds another + error. 
*/ +static svn_error_t * +get_node_revision_body(svn_fs_x__noderev_t **noderev_p, + svn_fs_t *fs, + const svn_fs_x__id_t *id, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_error_t *err; + svn_boolean_t is_cached = FALSE; + svn_fs_x__data_t *ffd = fs->fsap_data; + + if (svn_fs_x__is_txn(id->change_set)) + { + apr_file_t *file; + + /* This is a transaction node-rev. Its storage logic is very + different from that of rev / pack files. */ + err = svn_io_file_open(&file, + svn_fs_x__path_txn_node_rev(fs, id, + scratch_pool, + scratch_pool), + APR_READ | APR_BUFFERED, APR_OS_DEFAULT, + scratch_pool); + if (err) + { + if (APR_STATUS_IS_ENOENT(err->apr_err)) + { + svn_error_clear(err); + return svn_error_trace(err_dangling_id(fs, id)); + } + + return svn_error_trace(err); + } + + SVN_ERR(svn_fs_x__read_noderev(noderev_p, + svn_stream_from_aprfile2(file, + FALSE, + scratch_pool), + result_pool, scratch_pool)); + } + else + { + svn_fs_x__revision_file_t *revision_file; + + /* noderevs in rev / pack files can be cached */ + svn_revnum_t revision = svn_fs_x__get_revnum(id->change_set); + svn_fs_x__pair_cache_key_t key; + + SVN_ERR(svn_fs_x__open_pack_or_rev_file(&revision_file, fs, revision, + scratch_pool, scratch_pool)); + + /* First, try a noderevs container cache lookup. */ + if ( svn_fs_x__is_packed_rev(fs, revision) + && ffd->noderevs_container_cache) + { + apr_off_t offset; + apr_uint32_t sub_item; + SVN_ERR(svn_fs_x__item_offset(&offset, &sub_item, fs, revision_file, + id, scratch_pool)); + key.revision = svn_fs_x__packed_base_rev(fs, revision); + key.second = offset; + + SVN_ERR(svn_cache__get_partial((void **)noderev_p, &is_cached, + ffd->noderevs_container_cache, &key, + svn_fs_x__noderevs_get_func, + &sub_item, result_pool)); + if (is_cached) + return SVN_NO_ERROR; + } + + key.revision = revision; + key.second = id->number; + + /* Not found or not applicable. Try a noderev cache lookup. + * If that succeeds, we are done here. */ + if (ffd->node_revision_cache) + { + SVN_ERR(svn_cache__get((void **) noderev_p, + &is_cached, + ffd->node_revision_cache, + &key, + result_pool)); + if (is_cached) + return SVN_NO_ERROR; + } + + /* block-read will parse the whole block and will also return + the one noderev that we need right now. */ + SVN_ERR(block_read((void **)noderev_p, fs, + id, + revision_file, + result_pool, + scratch_pool)); + SVN_ERR(svn_fs_x__close_revision_file(revision_file)); + } + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__get_node_revision(svn_fs_x__noderev_t **noderev_p, + svn_fs_t *fs, + const svn_fs_x__id_t *id, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_error_t *err = get_node_revision_body(noderev_p, fs, id, + result_pool, scratch_pool); + if (err && err->apr_err == SVN_ERR_FS_CORRUPT) + { + svn_string_t *id_string = svn_fs_x__id_unparse(id, scratch_pool); + return svn_error_createf(SVN_ERR_FS_CORRUPT, err, + "Corrupt node-revision '%s'", + id_string->data); + } + + SVN_ERR(dgb__log_access(fs, id, *noderev_p, + SVN_FS_X__ITEM_TYPE_NODEREV, scratch_pool)); + + return svn_error_trace(err); +} + + +svn_error_t * +svn_fs_x__get_mergeinfo_count(apr_int64_t *count, + svn_fs_t *fs, + const svn_fs_x__id_t *id, + apr_pool_t *scratch_pool) +{ + svn_fs_x__noderev_t *noderev; + + /* If we want a full acccess log, we need to provide full data and + cannot take shortcuts here. */ +#if !defined(SVN_FS_X__LOG_ACCESS) + + /* First, try a noderevs container cache lookup. */ + if (! 
svn_fs_x__is_txn(id->change_set)) + { + /* noderevs in rev / pack files can be cached */ + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_revnum_t revision = svn_fs_x__get_revnum(id->change_set); + + svn_fs_x__revision_file_t *rev_file; + SVN_ERR(svn_fs_x__open_pack_or_rev_file(&rev_file, fs, revision, + scratch_pool, scratch_pool)); + + if ( svn_fs_x__is_packed_rev(fs, revision) + && ffd->noderevs_container_cache) + { + svn_fs_x__pair_cache_key_t key; + apr_off_t offset; + apr_uint32_t sub_item; + svn_boolean_t is_cached; + + SVN_ERR(svn_fs_x__item_offset(&offset, &sub_item, fs, rev_file, + id, scratch_pool)); + key.revision = svn_fs_x__packed_base_rev(fs, revision); + key.second = offset; + + SVN_ERR(svn_cache__get_partial((void **)count, &is_cached, + ffd->noderevs_container_cache, &key, + svn_fs_x__mergeinfo_count_get_func, + &sub_item, scratch_pool)); + if (is_cached) + return SVN_NO_ERROR; + } + } +#endif + + /* fallback to the naive implementation handling all edge cases */ + SVN_ERR(svn_fs_x__get_node_revision(&noderev, fs, id, scratch_pool, + scratch_pool)); + *count = noderev->mergeinfo_count; + + return SVN_NO_ERROR; +} + +/* Describes a lazily opened rev / pack file. Instances will be shared + between multiple instances of rep_state_t. */ +typedef struct shared_file_t +{ + /* The opened file. NULL while file is not open, yet. */ + svn_fs_x__revision_file_t *rfile; + + /* file system to open the file in */ + svn_fs_t *fs; + + /* a revision contained in the FILE. Since this file may be shared, + that value may be different from REP_STATE_T->REVISION. */ + svn_revnum_t revision; + + /* pool to use when creating the FILE. This guarantees that the file + remains open / valid beyond the respective local context that required + the file to be opened eventually. */ + apr_pool_t *pool; +} shared_file_t; + +/* Represents where in the current svndiff data block each + representation is. */ +typedef struct rep_state_t +{ + /* shared lazy-open rev/pack file structure */ + shared_file_t *sfile; + /* The txdelta window cache to use or NULL. */ + svn_cache__t *window_cache; + /* Caches un-deltified windows. May be NULL. */ + svn_cache__t *combined_cache; + /* ID addressing the representation */ + svn_fs_x__id_t rep_id; + /* length of the header at the start of the rep. + 0 iff this is rep is stored in a container + (i.e. does not have a header) */ + apr_size_t header_size; + apr_off_t start; /* The starting offset for the raw + svndiff data minus header. + -1 if the offset is yet unknown. */ + /* sub-item index in case the rep is containered */ + apr_uint32_t sub_item; + apr_off_t current;/* The current offset relative to START. */ + apr_off_t size; /* The on-disk size of the representation. */ + int ver; /* If a delta, what svndiff version? + -1 for unknown delta version. */ + int chunk_index; /* number of the window to read */ +} rep_state_t; + +/* Simple wrapper around svn_fs_x__get_file_offset to simplify callers. */ +static svn_error_t * +get_file_offset(apr_off_t *offset, + rep_state_t *rs, + apr_pool_t *scratch_pool) +{ + return svn_error_trace(svn_fs_x__get_file_offset(offset, + rs->sfile->rfile->file, + scratch_pool)); +} + +/* Simple wrapper around svn_io_file_aligned_seek to simplify callers. 
*/ +static svn_error_t * +rs_aligned_seek(rep_state_t *rs, + apr_off_t *buffer_start, + apr_off_t offset, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = rs->sfile->fs->fsap_data; + return svn_error_trace(svn_io_file_aligned_seek(rs->sfile->rfile->file, + ffd->block_size, + buffer_start, offset, + scratch_pool)); +} + +/* Open FILE->FILE and FILE->STREAM if they haven't been opened, yet. */ +static svn_error_t* +auto_open_shared_file(shared_file_t *file) +{ + if (file->rfile == NULL) + SVN_ERR(svn_fs_x__open_pack_or_rev_file(&file->rfile, file->fs, + file->revision, file->pool, + file->pool)); + + return SVN_NO_ERROR; +} + +/* Set RS->START to the begin of the representation raw in RS->SFILE->RFILE, + if that hasn't been done yet. Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t* +auto_set_start_offset(rep_state_t *rs, + apr_pool_t *scratch_pool) +{ + if (rs->start == -1) + { + SVN_ERR(svn_fs_x__item_offset(&rs->start, &rs->sub_item, + rs->sfile->fs, rs->sfile->rfile, + &rs->rep_id, scratch_pool)); + rs->start += rs->header_size; + } + + return SVN_NO_ERROR; +} + +/* Set RS->VER depending on what is found in the already open RS->FILE->FILE + if the diff version is still unknown. Use SCRATCH_POOL for temporary + allocations. + */ +static svn_error_t* +auto_read_diff_version(rep_state_t *rs, + apr_pool_t *scratch_pool) +{ + if (rs->ver == -1) + { + char buf[4]; + SVN_ERR(rs_aligned_seek(rs, NULL, rs->start, scratch_pool)); + SVN_ERR(svn_io_file_read_full2(rs->sfile->rfile->file, buf, + sizeof(buf), NULL, NULL, scratch_pool)); + + /* ### Layering violation */ + if (! ((buf[0] == 'S') && (buf[1] == 'V') && (buf[2] == 'N'))) + return svn_error_create + (SVN_ERR_FS_CORRUPT, NULL, + _("Malformed svndiff data in representation")); + rs->ver = buf[3]; + + rs->chunk_index = 0; + rs->current = 4; + } + + return SVN_NO_ERROR; +} + +/* See create_rep_state, which wraps this and adds another error. */ +static svn_error_t * +create_rep_state_body(rep_state_t **rep_state, + svn_fs_x__rep_header_t **rep_header, + shared_file_t **shared_file, + svn_fs_x__representation_t *rep, + svn_fs_t *fs, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + rep_state_t *rs = apr_pcalloc(result_pool, sizeof(*rs)); + svn_fs_x__rep_header_t *rh; + svn_boolean_t is_cached = FALSE; + svn_revnum_t revision = svn_fs_x__get_revnum(rep->id.change_set); + apr_uint64_t estimated_window_storage; + + /* If the hint is + * - given, + * - refers to a valid revision, + * - refers to a packed revision, + * - as does the rep we want to read, and + * - refers to the same pack file as the rep + * we can re-use the same, already open file object + */ + svn_boolean_t reuse_shared_file + = shared_file && *shared_file && (*shared_file)->rfile + && SVN_IS_VALID_REVNUM((*shared_file)->revision) + && (*shared_file)->revision < ffd->min_unpacked_rev + && revision < ffd->min_unpacked_rev + && ( ((*shared_file)->revision / ffd->max_files_per_dir) + == (revision / ffd->max_files_per_dir)); + + svn_fs_x__representation_cache_key_t key = { 0 }; + key.revision = revision; + key.is_packed = revision < ffd->min_unpacked_rev; + key.item_index = rep->id.number; + + /* continue constructing RS and RA */ + rs->size = rep->size; + rs->rep_id = rep->id; + rs->ver = -1; + rs->start = -1; + + /* Very long files stored as self-delta will produce a huge number of + delta windows. Don't cache them lest we don't thrash the cache. 
+ Since we don't know the depth of the delta chain, let's assume, the + whole contents get rewritten 3 times. + */ + estimated_window_storage + = 4 * ( (rep->expanded_size ? rep->expanded_size : rep->size) + + SVN_DELTA_WINDOW_SIZE); + estimated_window_storage = MIN(estimated_window_storage, APR_SIZE_MAX); + + rs->window_cache = ffd->txdelta_window_cache + && svn_cache__is_cachable(ffd->txdelta_window_cache, + (apr_size_t)estimated_window_storage) + ? ffd->txdelta_window_cache + : NULL; + rs->combined_cache = ffd->combined_window_cache + && svn_cache__is_cachable(ffd->combined_window_cache, + (apr_size_t)estimated_window_storage) + ? ffd->combined_window_cache + : NULL; + + /* cache lookup, i.e. skip reading the rep header if possible */ + if (ffd->rep_header_cache && SVN_IS_VALID_REVNUM(revision)) + SVN_ERR(svn_cache__get((void **) &rh, &is_cached, + ffd->rep_header_cache, &key, result_pool)); + + /* initialize the (shared) FILE member in RS */ + if (reuse_shared_file) + { + rs->sfile = *shared_file; + } + else + { + shared_file_t *file = apr_pcalloc(result_pool, sizeof(*file)); + file->revision = revision; + file->pool = result_pool; + file->fs = fs; + rs->sfile = file; + + /* remember the current file, if suggested by the caller */ + if (shared_file) + *shared_file = file; + } + + /* read rep header, if necessary */ + if (!is_cached) + { + /* we will need the on-disk location for non-txn reps */ + apr_off_t offset; + svn_boolean_t in_container = TRUE; + + /* ensure file is open and navigate to the start of rep header */ + if (reuse_shared_file) + { + /* ... we can re-use the same, already open file object. + * This implies that we don't read from a txn. + */ + rs->sfile = *shared_file; + SVN_ERR(auto_open_shared_file(rs->sfile)); + } + else + { + /* otherwise, create a new file object. May or may not be + * an in-txn file. + */ + SVN_ERR(open_and_seek_representation(&rs->sfile->rfile, fs, rep, + result_pool, scratch_pool)); + } + + if (SVN_IS_VALID_REVNUM(revision)) + { + apr_uint32_t sub_item; + + SVN_ERR(svn_fs_x__item_offset(&offset, &sub_item, fs, + rs->sfile->rfile, &rep->id, + scratch_pool)); + + /* is rep stored in some star-deltified container? 
*/ + if (sub_item == 0) + { + svn_fs_x__p2l_entry_t *entry; + SVN_ERR(svn_fs_x__p2l_entry_lookup(&entry, fs, rs->sfile->rfile, + revision, offset, + scratch_pool, scratch_pool)); + in_container = entry->type == SVN_FS_X__ITEM_TYPE_REPS_CONT; + } + + if (in_container) + { + /* construct a container rep header */ + *rep_header = apr_pcalloc(result_pool, sizeof(**rep_header)); + (*rep_header)->type = svn_fs_x__rep_container; + + /* exit to caller */ + *rep_state = rs; + return SVN_NO_ERROR; + } + + SVN_ERR(rs_aligned_seek(rs, NULL, offset, scratch_pool)); + } + + SVN_ERR(svn_fs_x__read_rep_header(&rh, rs->sfile->rfile->stream, + result_pool, scratch_pool)); + SVN_ERR(get_file_offset(&rs->start, rs, result_pool)); + + /* populate the cache if appropriate */ + if (SVN_IS_VALID_REVNUM(revision)) + { + SVN_ERR(block_read(NULL, fs, &rs->rep_id, rs->sfile->rfile, + result_pool, scratch_pool)); + if (ffd->rep_header_cache) + SVN_ERR(svn_cache__set(ffd->rep_header_cache, &key, rh, + scratch_pool)); + } + } + + /* finalize */ + SVN_ERR(dgb__log_access(fs, &rs->rep_id, rh, SVN_FS_X__ITEM_TYPE_ANY_REP, + scratch_pool)); + + rs->header_size = rh->header_size; + *rep_state = rs; + *rep_header = rh; + + rs->chunk_index = 0; + + /* skip "SVNx" diff marker */ + rs->current = 4; + + return SVN_NO_ERROR; +} + +/* Read the rep args for REP in filesystem FS and create a rep_state + for reading the representation. Return the rep_state in *REP_STATE + and the rep args in *REP_ARGS, both allocated in POOL. + + When reading multiple reps, i.e. a skip delta chain, you may provide + non-NULL SHARED_FILE. (If SHARED_FILE is not NULL, in the first + call it should be a pointer to NULL.) The function will use this + variable to store the previous call results and tries to re-use it. + This may result in significant savings in I/O for packed files and + number of open file handles. + */ +static svn_error_t * +create_rep_state(rep_state_t **rep_state, + svn_fs_x__rep_header_t **rep_header, + shared_file_t **shared_file, + svn_fs_x__representation_t *rep, + svn_fs_t *fs, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_error_t *err = create_rep_state_body(rep_state, rep_header, + shared_file, rep, fs, + result_pool, scratch_pool); + if (err && err->apr_err == SVN_ERR_FS_CORRUPT) + { + /* ### This always returns "-1" for transaction reps, because + ### this particular bit of code doesn't know if the rep is + ### stored in the protorev or in the mutable area (for props + ### or dir contents). It is pretty rare for FSX to *read* + ### from the protorev file, though, so this is probably OK. + ### And anyone going to debug corruption errors is probably + ### going to jump straight to this comment anyway! */ + return svn_error_createf(SVN_ERR_FS_CORRUPT, err, + "Corrupt representation '%s'", + rep + ? svn_fs_x__unparse_representation + (rep, TRUE, scratch_pool, + scratch_pool)->data + : "(null)"); + } + /* ### Call representation_string() ? */ + return svn_error_trace(err); +} + +svn_error_t * +svn_fs_x__check_rep(svn_fs_x__representation_t *rep, + svn_fs_t *fs, + apr_pool_t *scratch_pool) +{ + apr_off_t offset; + apr_uint32_t sub_item; + svn_fs_x__p2l_entry_t *entry; + svn_revnum_t revision = svn_fs_x__get_revnum(rep->id.change_set); + + svn_fs_x__revision_file_t *rev_file; + SVN_ERR(svn_fs_x__open_pack_or_rev_file(&rev_file, fs, revision, + scratch_pool, scratch_pool)); + + /* Does REP->ID refer to an actual item? Which one is it? 
*/ + SVN_ERR(svn_fs_x__item_offset(&offset, &sub_item, fs, rev_file, &rep->id, + scratch_pool)); + + /* What is the type of that item? */ + SVN_ERR(svn_fs_x__p2l_entry_lookup(&entry, fs, rev_file, revision, offset, + scratch_pool, scratch_pool)); + + /* Verify that we've got an item that is actually a representation. */ + if ( entry == NULL + || ( entry->type != SVN_FS_X__ITEM_TYPE_FILE_REP + && entry->type != SVN_FS_X__ITEM_TYPE_DIR_REP + && entry->type != SVN_FS_X__ITEM_TYPE_FILE_PROPS + && entry->type != SVN_FS_X__ITEM_TYPE_DIR_PROPS + && entry->type != SVN_FS_X__ITEM_TYPE_REPS_CONT)) + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + _("No representation found at offset %s " + "for item %s in revision %ld"), + apr_off_t_toa(scratch_pool, offset), + apr_psprintf(scratch_pool, "%" APR_UINT64_T_FMT, + rep->id.number), + revision); + + return SVN_NO_ERROR; +} + +/* . + Do any allocations in POOL. */ +svn_error_t * +svn_fs_x__rep_chain_length(int *chain_length, + int *shard_count, + svn_fs_x__representation_t *rep, + svn_fs_t *fs, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_revnum_t shard_size = ffd->max_files_per_dir; + svn_boolean_t is_delta = FALSE; + int count = 0; + int shards = 1; + svn_revnum_t revision = svn_fs_x__get_revnum(rep->id.change_set); + svn_revnum_t last_shard = revision / shard_size; + + /* Note that this iteration pool will be used in a non-standard way. + * To reuse open file handles between iterations (e.g. while within the + * same pack file), we only clear this pool once in a while instead of + * at the start of each iteration. */ + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + + /* Check whether the length of the deltification chain is acceptable. + * Otherwise, shared reps may form a non-skipping delta chain in + * extreme cases. */ + svn_fs_x__representation_t base_rep = *rep; + + /* re-use open files between iterations */ + shared_file_t *file_hint = NULL; + + svn_fs_x__rep_header_t *header; + + /* follow the delta chain towards the end but for at most + * MAX_CHAIN_LENGTH steps. */ + do + { + rep_state_t *rep_state; + revision = svn_fs_x__get_revnum(base_rep.id.change_set); + if (revision / shard_size != last_shard) + { + last_shard = revision / shard_size; + ++shards; + } + + SVN_ERR(create_rep_state_body(&rep_state, + &header, + &file_hint, + &base_rep, + fs, + iterpool, + iterpool)); + + base_rep.id.change_set + = svn_fs_x__change_set_by_rev(header->base_revision); + base_rep.id.number = header->base_item_index; + base_rep.size = header->base_length; + is_delta = header->type == svn_fs_x__rep_delta; + + /* Clear it the ITERPOOL once in a while. Doing it too frequently + * renders the FILE_HINT ineffective. Doing too infrequently, may + * leave us with too many open file handles. + * + * Note that this is mostly about efficiency, with larger values + * being more efficient, and any non-zero value is legal here. When + * reading deltified contents, we may keep 10s of rev files open at + * the same time and the system has to cope with that. Thus, the + * limit of 16 chosen below is in the same ballpark. + */ + ++count; + if (count % 16 == 0) + { + file_hint = NULL; + svn_pool_clear(iterpool); + } + } + while (is_delta && base_rep.id.change_set); + + *chain_length = count; + *shard_count = shards; + svn_pool_destroy(iterpool); + + return SVN_NO_ERROR; +} + + +typedef struct rep_read_baton_t +{ + /* The FS from which we're reading. */ + svn_fs_t *fs; + + /* Representation to read. 
*/ + svn_fs_x__representation_t rep; + + /* If not NULL, this is the base for the first delta window in rs_list */ + svn_stringbuf_t *base_window; + + /* The state of all prior delta representations. */ + apr_array_header_t *rs_list; + + /* The plaintext state, if there is a plaintext. */ + rep_state_t *src_state; + + /* The index of the current delta chunk, if we are reading a delta. */ + int chunk_index; + + /* The buffer where we store undeltified data. */ + char *buf; + apr_size_t buf_pos; + apr_size_t buf_len; + + /* A checksum context for summing the data read in order to verify it. + Note: we don't need to use the sha1 checksum because we're only doing + data verification, for which md5 is perfectly safe. */ + svn_checksum_ctx_t *md5_checksum_ctx; + + svn_boolean_t checksum_finalized; + + /* The stored checksum of the representation we are reading, its + length, and the amount we've read so far. Some of this + information is redundant with rs_list and src_state, but it's + convenient for the checksumming code to have it here. */ + unsigned char md5_digest[APR_MD5_DIGESTSIZE]; + + svn_filesize_t len; + svn_filesize_t off; + + /* The key for the fulltext cache for this rep, if there is a + fulltext cache. */ + svn_fs_x__pair_cache_key_t fulltext_cache_key; + /* The text we've been reading, if we're going to cache it. */ + svn_stringbuf_t *current_fulltext; + + /* If not NULL, attempt to read the data from this cache. + Once that lookup fails, reset it to NULL. */ + svn_cache__t *fulltext_cache; + + /* Bytes delivered from the FULLTEXT_CACHE so far. If the next + lookup fails, we need to skip that much data from the reconstructed + window stream before we continue normal operation. */ + svn_filesize_t fulltext_delivered; + + /* Used for temporary allocations during the read. */ + apr_pool_t *scratch_pool; + + /* Pool used to store file handles and other data that is persistant + for the entire stream read. */ + apr_pool_t *filehandle_pool; +} rep_read_baton_t; + +/* Set window key in *KEY to address the window described by RS. + For convenience, return the KEY. */ +static svn_fs_x__window_cache_key_t * +get_window_key(svn_fs_x__window_cache_key_t *key, + rep_state_t *rs) +{ + svn_revnum_t revision = svn_fs_x__get_revnum(rs->rep_id.change_set); + assert(revision <= APR_UINT32_MAX); + + key->revision = (apr_uint32_t)revision; + key->item_index = rs->rep_id.number; + key->chunk_index = rs->chunk_index; + + return key; +} + +/* Read the WINDOW_P number CHUNK_INDEX for the representation given in + * rep state RS from the current FSX session's cache. This will be a + * no-op and IS_CACHED will be set to FALSE if no cache has been given. + * If a cache is available IS_CACHED will inform the caller about the + * success of the lookup. Allocations (of the window in particualar) will + * be made from POOL. + * + * If the information could be found, put RS to CHUNK_INDEX. + */ + +/* Return data type for get_cached_window_sizes_func. + */ +typedef struct window_sizes_t +{ + /* length of the txdelta window in its on-disk format */ + svn_filesize_t packed_len; + + /* expanded (and combined) window length */ + svn_filesize_t target_len; +} window_sizes_t; + +/* Implements svn_cache__partial_getter_func_t extracting the packed + * and expanded window sizes from a cached window and return the size + * info as a window_sizes_t* in *OUT. 
+ */ +static svn_error_t * +get_cached_window_sizes_func(void **out, + const void *data, + apr_size_t data_len, + void *baton, + apr_pool_t *pool) +{ + const svn_fs_x__txdelta_cached_window_t *window = data; + const svn_txdelta_window_t *txdelta_window + = svn_temp_deserializer__ptr(window, (const void **)&window->window); + + window_sizes_t *result = apr_palloc(pool, sizeof(*result)); + result->packed_len = window->end_offset - window->start_offset; + result->target_len = txdelta_window->tview_len; + + *out = result; + + return SVN_NO_ERROR; +} + +/* Read the WINDOW_P number CHUNK_INDEX for the representation given in + * rep state RS from the current FSFS session's cache. This will be a + * no-op and IS_CACHED will be set to FALSE if no cache has been given. + * If a cache is available IS_CACHED will inform the caller about the + * success of the lookup. Allocations of the window in will be made + * from RESULT_POOL. Use SCRATCH_POOL for temporary allocations. + * + * If the information could be found, put RS to CHUNK_INDEX. + */ +static svn_error_t * +get_cached_window_sizes(window_sizes_t **sizes, + rep_state_t *rs, + svn_boolean_t *is_cached, + apr_pool_t *pool) +{ + if (! rs->window_cache) + { + /* txdelta window has not been enabled */ + *is_cached = FALSE; + } + else + { + svn_fs_x__window_cache_key_t key = { 0 }; + SVN_ERR(svn_cache__get_partial((void **)sizes, + is_cached, + rs->window_cache, + get_window_key(&key, rs), + get_cached_window_sizes_func, + NULL, + pool)); + } + + return SVN_NO_ERROR; +} + +static svn_error_t * +get_cached_window(svn_txdelta_window_t **window_p, + rep_state_t *rs, + int chunk_index, + svn_boolean_t *is_cached, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + if (! rs->window_cache) + { + /* txdelta window has not been enabled */ + *is_cached = FALSE; + } + else + { + /* ask the cache for the desired txdelta window */ + svn_fs_x__txdelta_cached_window_t *cached_window; + svn_fs_x__window_cache_key_t key = { 0 }; + get_window_key(&key, rs); + key.chunk_index = chunk_index; + SVN_ERR(svn_cache__get((void **) &cached_window, + is_cached, + rs->window_cache, + &key, + result_pool)); + + if (*is_cached) + { + /* found it. Pass it back to the caller. */ + *window_p = cached_window->window; + + /* manipulate the RS as if we just read the data */ + rs->current = cached_window->end_offset; + rs->chunk_index = chunk_index; + } + } + + return SVN_NO_ERROR; +} + +/* Store the WINDOW read for the rep state RS with the given START_OFFSET + * within the pack / rev file in the current FSX session's cache. This + * will be a no-op if no cache has been given. + * Temporary allocations will be made from SCRATCH_POOL. */ +static svn_error_t * +set_cached_window(svn_txdelta_window_t *window, + rep_state_t *rs, + apr_off_t start_offset, + apr_pool_t *scratch_pool) +{ + if (rs->window_cache) + { + /* store the window and the first offset _past_ it */ + svn_fs_x__txdelta_cached_window_t cached_window; + svn_fs_x__window_cache_key_t key = {0}; + + cached_window.window = window; + cached_window.start_offset = start_offset - rs->start; + cached_window.end_offset = rs->current; + + /* but key it with the start offset because that is the known state + * when we will look it up */ + SVN_ERR(svn_cache__set(rs->window_cache, + get_window_key(&key, rs), + &cached_window, + scratch_pool)); + } + + return SVN_NO_ERROR; +} + +/* Read the WINDOW_P for the rep state RS from the current FSX session's + * cache. 
This will be a no-op and IS_CACHED will be set to FALSE if no + * cache has been given. If a cache is available IS_CACHED will inform + * the caller about the success of the lookup. Allocations (of the window + * in particular) will be made from POOL. + */ +static svn_error_t * +get_cached_combined_window(svn_stringbuf_t **window_p, + rep_state_t *rs, + svn_boolean_t *is_cached, + apr_pool_t *pool) +{ + if (! rs->combined_cache) + { + /* txdelta window has not been enabled */ + *is_cached = FALSE; + } + else + { + /* ask the cache for the desired txdelta window */ + svn_fs_x__window_cache_key_t key = { 0 }; + return svn_cache__get((void **)window_p, + is_cached, + rs->combined_cache, + get_window_key(&key, rs), + pool); + } + + return SVN_NO_ERROR; +} + +/* Store the WINDOW read for the rep state RS in the current FSX session's + * cache. This will be a no-op if no cache has been given. + * Temporary allocations will be made from SCRATCH_POOL. */ +static svn_error_t * +set_cached_combined_window(svn_stringbuf_t *window, + rep_state_t *rs, + apr_pool_t *scratch_pool) +{ + if (rs->combined_cache) + { + /* but key it with the start offset because that is the known state + * when we will look it up */ + svn_fs_x__window_cache_key_t key = { 0 }; + return svn_cache__set(rs->combined_cache, + get_window_key(&key, rs), + window, + scratch_pool); + } + + return SVN_NO_ERROR; +} + +/* Build an array of rep_state structures in *LIST giving the delta + reps from first_rep to a self-compressed rep. Set *SRC_STATE to + the container rep we find at the end of the chain, or to NULL if + the final delta representation is self-compressed. + The representation to start from is designated by filesystem FS, id + ID, and representation REP. + Also, set *WINDOW_P to the base window content for *LIST, if it + could be found in cache. Otherwise, *LIST will contain the base + representation for the whole delta chain. + */ +static svn_error_t * +build_rep_list(apr_array_header_t **list, + svn_stringbuf_t **window_p, + rep_state_t **src_state, + svn_fs_t *fs, + svn_fs_x__representation_t *first_rep, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_x__representation_t rep; + rep_state_t *rs = NULL; + svn_fs_x__rep_header_t *rep_header; + svn_boolean_t is_cached = FALSE; + shared_file_t *shared_file = NULL; + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + + *list = apr_array_make(result_pool, 1, sizeof(rep_state_t *)); + rep = *first_rep; + + /* for the top-level rep, we need the rep_args */ + SVN_ERR(create_rep_state(&rs, &rep_header, &shared_file, &rep, fs, + result_pool, iterpool)); + + while (1) + { + svn_pool_clear(iterpool); + + /* fetch state, if that has not been done already */ + if (!rs) + SVN_ERR(create_rep_state(&rs, &rep_header, &shared_file, + &rep, fs, result_pool, iterpool)); + + /* for txn reps and containered reps, there won't be a cached + * combined window */ + if (svn_fs_x__is_revision(rep.id.change_set) + && rep_header->type != svn_fs_x__rep_container) + SVN_ERR(get_cached_combined_window(window_p, rs, &is_cached, + result_pool)); + + if (is_cached) + { + /* We already have a reconstructed window in our cache. + Write a pseudo rep_state with the full length. */ + rs->start = 0; + rs->current = 0; + rs->size = (*window_p)->len; + *src_state = rs; + break; + } + + if (rep_header->type == svn_fs_x__rep_container) + { + /* This is a container item, so just return the current rep_state. */ + *src_state = rs; + break; + } + + /* Push this rep onto the list. 
If it's self-compressed, we're done. */ + APR_ARRAY_PUSH(*list, rep_state_t *) = rs; + if (rep_header->type == svn_fs_x__rep_self_delta) + { + *src_state = NULL; + break; + } + + rep.id.change_set + = svn_fs_x__change_set_by_rev(rep_header->base_revision); + rep.id.number = rep_header->base_item_index; + rep.size = rep_header->base_length; + + rs = NULL; + } + svn_pool_destroy(iterpool); + + return SVN_NO_ERROR; +} + + +/* Create a rep_read_baton structure for node revision NODEREV in + filesystem FS and store it in *RB_P. If FULLTEXT_CACHE_KEY is not + NULL, it is the rep's key in the fulltext cache, and a stringbuf + must be allocated to store the text. If rep is mutable, it must be + refer to file contents. + + Allocate the result in RESULT_POOL. This includes the pools within *RB_P. + */ +static svn_error_t * +rep_read_get_baton(rep_read_baton_t **rb_p, + svn_fs_t *fs, + svn_fs_x__representation_t *rep, + svn_fs_x__pair_cache_key_t fulltext_cache_key, + apr_pool_t *result_pool) +{ + rep_read_baton_t *b; + + b = apr_pcalloc(result_pool, sizeof(*b)); + b->fs = fs; + b->rep = *rep; + b->base_window = NULL; + b->chunk_index = 0; + b->buf = NULL; + b->md5_checksum_ctx = svn_checksum_ctx_create(svn_checksum_md5, + result_pool); + b->checksum_finalized = FALSE; + memcpy(b->md5_digest, rep->md5_digest, sizeof(rep->md5_digest)); + b->len = rep->expanded_size; + b->off = 0; + b->fulltext_cache_key = fulltext_cache_key; + + /* Clearable sub-pools. Since they have to remain valid for as long as B + lives, we can't take them from some scratch pool. The caller of this + function will have no control over how those subpools will be used. */ + b->scratch_pool = svn_pool_create(result_pool); + b->filehandle_pool = svn_pool_create(result_pool); + b->fulltext_cache = NULL; + b->fulltext_delivered = 0; + b->current_fulltext = NULL; + + /* Save our output baton. */ + *rb_p = b; + + return SVN_NO_ERROR; +} + +/* Skip forwards to THIS_CHUNK in REP_STATE and then read the next delta + window into *NWIN. */ +static svn_error_t * +read_delta_window(svn_txdelta_window_t **nwin, int this_chunk, + rep_state_t *rs, apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_boolean_t is_cached; + apr_off_t start_offset; + apr_off_t end_offset; + apr_pool_t *iterpool; + + SVN_ERR_ASSERT(rs->chunk_index <= this_chunk); + + SVN_ERR(dgb__log_access(rs->sfile->fs, &rs->rep_id, NULL, + SVN_FS_X__ITEM_TYPE_ANY_REP, scratch_pool)); + + /* Read the next window. But first, try to find it in the cache. */ + SVN_ERR(get_cached_window(nwin, rs, this_chunk, &is_cached, + result_pool, scratch_pool)); + if (is_cached) + return SVN_NO_ERROR; + + /* someone has to actually read the data from file. Open it */ + SVN_ERR(auto_open_shared_file(rs->sfile)); + + /* invoke the 'block-read' feature for non-txn data. + However, don't do that if we are in the middle of some representation, + because the block is unlikely to contain other data. */ + if ( rs->chunk_index == 0 + && svn_fs_x__is_revision(rs->rep_id.change_set) + && rs->window_cache) + { + SVN_ERR(block_read(NULL, rs->sfile->fs, &rs->rep_id, + rs->sfile->rfile, result_pool, scratch_pool)); + + /* reading the whole block probably also provided us with the + desired txdelta window */ + SVN_ERR(get_cached_window(nwin, rs, this_chunk, &is_cached, + result_pool, scratch_pool)); + if (is_cached) + return SVN_NO_ERROR; + } + + /* data is still not cached -> we need to read it. + Make sure we have all the necessary info. 
*/ + SVN_ERR(auto_set_start_offset(rs, scratch_pool)); + SVN_ERR(auto_read_diff_version(rs, scratch_pool)); + + /* RS->FILE may be shared between RS instances -> make sure we point + * to the right data. */ + start_offset = rs->start + rs->current; + SVN_ERR(rs_aligned_seek(rs, NULL, start_offset, scratch_pool)); + + /* Skip windows to reach the current chunk if we aren't there yet. */ + iterpool = svn_pool_create(scratch_pool); + while (rs->chunk_index < this_chunk) + { + apr_file_t *file = rs->sfile->rfile->file; + svn_pool_clear(iterpool); + + SVN_ERR(svn_txdelta_skip_svndiff_window(file, rs->ver, iterpool)); + rs->chunk_index++; + SVN_ERR(svn_fs_x__get_file_offset(&start_offset, file, iterpool)); + + rs->current = start_offset - rs->start; + if (rs->current >= rs->size) + return svn_error_create(SVN_ERR_FS_CORRUPT, NULL, + _("Reading one svndiff window read " + "beyond the end of the " + "representation")); + } + svn_pool_destroy(iterpool); + + /* Actually read the next window. */ + SVN_ERR(svn_txdelta_read_svndiff_window(nwin, rs->sfile->rfile->stream, + rs->ver, result_pool)); + SVN_ERR(get_file_offset(&end_offset, rs, scratch_pool)); + rs->current = end_offset - rs->start; + if (rs->current > rs->size) + return svn_error_create(SVN_ERR_FS_CORRUPT, NULL, + _("Reading one svndiff window read beyond " + "the end of the representation")); + + /* the window has not been cached before, thus cache it now + * (if caching is used for them at all) */ + if (svn_fs_x__is_revision(rs->rep_id.change_set)) + SVN_ERR(set_cached_window(*nwin, rs, start_offset, scratch_pool)); + + return SVN_NO_ERROR; +} + +/* Read the whole representation RS and return it in *NWIN. */ +static svn_error_t * +read_container_window(svn_stringbuf_t **nwin, + rep_state_t *rs, + apr_size_t size, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_x__rep_extractor_t *extractor = NULL; + svn_fs_t *fs = rs->sfile->fs; + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_fs_x__pair_cache_key_t key; + svn_revnum_t revision = svn_fs_x__get_revnum(rs->rep_id.change_set); + + SVN_ERR(auto_set_start_offset(rs, scratch_pool)); + key.revision = svn_fs_x__packed_base_rev(fs, revision); + key.second = rs->start; + + /* already in cache? */ + if (ffd->reps_container_cache) + { + svn_boolean_t is_cached = FALSE; + svn_fs_x__reps_baton_t baton; + baton.fs = fs; + baton.idx = rs->sub_item; + + SVN_ERR(svn_cache__get_partial((void**)&extractor, &is_cached, + ffd->reps_container_cache, &key, + svn_fs_x__reps_get_func, &baton, + result_pool)); + } + + /* read from disk, if necessary */ + if (extractor == NULL) + { + SVN_ERR(auto_open_shared_file(rs->sfile)); + SVN_ERR(block_read((void **)&extractor, fs, &rs->rep_id, + rs->sfile->rfile, result_pool, scratch_pool)); + } + + SVN_ERR(svn_fs_x__extractor_drive(nwin, extractor, rs->current, size, + result_pool, scratch_pool)); + + /* Update RS. */ + rs->current += (apr_off_t)size; + + return SVN_NO_ERROR; +} + +/* Get the undeltified window that is a result of combining all deltas + from the current desired representation identified in *RB with its + base representation. Store the window in *RESULT. */ +static svn_error_t * +get_combined_window(svn_stringbuf_t **result, + rep_read_baton_t *rb) +{ + apr_pool_t *pool, *new_pool, *window_pool; + int i; + apr_array_header_t *windows; + svn_stringbuf_t *source, *buf = rb->base_window; + rep_state_t *rs; + apr_pool_t *iterpool; + + /* Read all windows that we need to combine. 
This is fine because + the size of each window is relatively small (100kB) and skip- + delta limits the number of deltas in a chain to well under 100. + Stop early if one of them does not depend on its predecessors. */ + window_pool = svn_pool_create(rb->scratch_pool); + windows = apr_array_make(window_pool, 0, sizeof(svn_txdelta_window_t *)); + iterpool = svn_pool_create(rb->scratch_pool); + for (i = 0; i < rb->rs_list->nelts; ++i) + { + svn_txdelta_window_t *window; + + svn_pool_clear(iterpool); + + rs = APR_ARRAY_IDX(rb->rs_list, i, rep_state_t *); + SVN_ERR(read_delta_window(&window, rb->chunk_index, rs, window_pool, + iterpool)); + + APR_ARRAY_PUSH(windows, svn_txdelta_window_t *) = window; + if (window->src_ops == 0) + { + ++i; + break; + } + } + + /* Combine in the windows from the other delta reps. */ + pool = svn_pool_create(rb->scratch_pool); + for (--i; i >= 0; --i) + { + svn_txdelta_window_t *window; + + svn_pool_clear(iterpool); + + rs = APR_ARRAY_IDX(rb->rs_list, i, rep_state_t *); + window = APR_ARRAY_IDX(windows, i, svn_txdelta_window_t *); + + /* Maybe, we've got a start representation in a container. If we do, + read as much data from it as the needed for the txdelta window's + source view. + Note that BUF / SOURCE may only be NULL in the first iteration. */ + source = buf; + if (source == NULL && rb->src_state != NULL) + SVN_ERR(read_container_window(&source, rb->src_state, + window->sview_len, pool, iterpool)); + + /* Combine this window with the current one. */ + new_pool = svn_pool_create(rb->scratch_pool); + buf = svn_stringbuf_create_ensure(window->tview_len, new_pool); + buf->len = window->tview_len; + + svn_txdelta_apply_instructions(window, source ? source->data : NULL, + buf->data, &buf->len); + if (buf->len != window->tview_len) + return svn_error_create(SVN_ERR_FS_CORRUPT, NULL, + _("svndiff window length is " + "corrupt")); + + /* Cache windows only if the whole rep content could be read as a + single chunk. Only then will no other chunk need a deeper RS + list than the cached chunk. */ + if ( (rb->chunk_index == 0) && (rs->current == rs->size) + && svn_fs_x__is_revision(rs->rep_id.change_set)) + SVN_ERR(set_cached_combined_window(buf, rs, new_pool)); + + rs->chunk_index++; + + /* Cycle pools so that we only need to hold three windows at a time. */ + svn_pool_destroy(pool); + pool = new_pool; + } + svn_pool_destroy(iterpool); + + svn_pool_destroy(window_pool); + + *result = buf; + return SVN_NO_ERROR; +} + +/* Returns whether or not the expanded fulltext of the file is cachable + * based on its size SIZE. The decision depends on the cache used by RB. + */ +static svn_boolean_t +fulltext_size_is_cachable(svn_fs_x__data_t *ffd, + svn_filesize_t size) +{ + return (size < APR_SIZE_MAX) + && svn_cache__is_cachable(ffd->fulltext_cache, (apr_size_t)size); +} + +/* Close method used on streams returned by read_representation(). + */ +static svn_error_t * +rep_read_contents_close(void *baton) +{ + rep_read_baton_t *rb = baton; + + svn_pool_destroy(rb->scratch_pool); + svn_pool_destroy(rb->filehandle_pool); + + return SVN_NO_ERROR; +} + +/* Inialize the representation read state RS for the given REP_HEADER and + * p2l index ENTRY. If not NULL, assign FILE and STREAM to RS. + * Allocate all sub-structures of RS in RESULT_POOL. 
+ */ +static svn_error_t * +init_rep_state(rep_state_t *rs, + svn_fs_x__rep_header_t *rep_header, + svn_fs_t *fs, + svn_fs_x__revision_file_t *rev_file, + svn_fs_x__p2l_entry_t* entry, + apr_pool_t *result_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + shared_file_t *shared_file = apr_pcalloc(result_pool, sizeof(*shared_file)); + + /* this function does not apply to representation containers */ + SVN_ERR_ASSERT(entry->type >= SVN_FS_X__ITEM_TYPE_FILE_REP + && entry->type <= SVN_FS_X__ITEM_TYPE_DIR_PROPS); + SVN_ERR_ASSERT(entry->item_count == 1); + + shared_file->rfile = rev_file; + shared_file->fs = fs; + shared_file->revision = svn_fs_x__get_revnum(entry->items[0].change_set); + shared_file->pool = result_pool; + + rs->sfile = shared_file; + rs->rep_id = entry->items[0]; + rs->header_size = rep_header->header_size; + rs->start = entry->offset + rs->header_size; + rs->current = 4; + rs->size = entry->size - rep_header->header_size - 7; + rs->ver = 1; + rs->chunk_index = 0; + rs->window_cache = ffd->txdelta_window_cache; + rs->combined_cache = ffd->combined_window_cache; + + return SVN_NO_ERROR; +} + +/* Walk through all windows in the representation addressed by RS in FS + * (excluding the delta bases) and put those not already cached into the + * window caches. If MAX_OFFSET is not -1, don't read windows that start + * at or beyond that offset. As a side effect, return the total sum of all + * expanded window sizes in *FULLTEXT_LEN. + * Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +cache_windows(svn_filesize_t *fulltext_len, + svn_fs_t *fs, + rep_state_t *rs, + apr_off_t max_offset, + apr_pool_t *scratch_pool) +{ + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + *fulltext_len = 0; + + while (rs->current < rs->size) + { + svn_boolean_t is_cached = FALSE; + window_sizes_t *window_sizes; + + svn_pool_clear(iterpool); + if (max_offset != -1 && rs->start + rs->current >= max_offset) + { + svn_pool_destroy(iterpool); + return SVN_NO_ERROR; + } + + /* efficiently skip windows that are still being cached instead + * of fully decoding them */ + SVN_ERR(get_cached_window_sizes(&window_sizes, rs, &is_cached, + iterpool)); + if (is_cached) + { + *fulltext_len += window_sizes->target_len; + rs->current += window_sizes->packed_len; + } + else + { + svn_txdelta_window_t *window; + apr_off_t start_offset = rs->start + rs->current; + apr_off_t end_offset; + apr_off_t block_start; + + /* navigate to & read the current window */ + SVN_ERR(rs_aligned_seek(rs, &block_start, start_offset, iterpool)); + SVN_ERR(svn_txdelta_read_svndiff_window(&window, + rs->sfile->rfile->stream, + rs->ver, iterpool)); + + /* aggregate expanded window size */ + *fulltext_len += window->tview_len; + + /* determine on-disk window size */ + SVN_ERR(svn_fs_x__get_file_offset(&end_offset, + rs->sfile->rfile->file, + iterpool)); + rs->current = end_offset - rs->start; + if (rs->current > rs->size) + return svn_error_create(SVN_ERR_FS_CORRUPT, NULL, + _("Reading one svndiff window read beyond " + "the end of the representation")); + + /* if the window has not been cached before, cache it now + * (if caching is used for them at all) */ + if (!is_cached) + SVN_ERR(set_cached_window(window, rs, start_offset, iterpool)); + } + + rs->chunk_index++; + } + + svn_pool_destroy(iterpool); + + return SVN_NO_ERROR; +} + +/* Try to get the representation header identified by KEY from FS's cache. 
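/* Editorial aside -- illustrative sketch, not part of this patch.
 * cache_windows() nearby accumulates the expanded fulltext length by
 * walking the on-disk windows, consuming cached (packed_len, target_len)
 * pairs where possible and stopping once MAX_OFFSET is reached.  This
 * stand-alone model shows the same bookkeeping over a plain array; the
 * window_size_t type and all values are invented for the sketch. */
#include <stdio.h>

typedef struct { long packed_len; long target_len; } window_size_t;

/* Sum the expanded sizes of all windows that start before MAX_OFFSET
 * (or of all windows if MAX_OFFSET is -1).  REP_START mirrors rs->start
 * in the code above; CURRENT mirrors rs->current. */
static long
sum_expanded(const window_size_t *windows, int count,
             long rep_start, long max_offset)
{
  long current = 0;
  long fulltext_len = 0;
  int i;

  for (i = 0; i < count; ++i)
    {
      if (max_offset != -1 && rep_start + current >= max_offset)
        break;
      fulltext_len += windows[i].target_len;
      current += windows[i].packed_len;
    }

  return fulltext_len;
}

int main(void)
{
  const window_size_t windows[] = { { 900, 4096 }, { 1200, 4096 }, { 300, 811 } };
  printf("%ld\n", sum_expanded(windows, 3, 0, -1));    /* 9003          */
  printf("%ld\n", sum_expanded(windows, 3, 0, 1000));  /* 8192: window 3
                                                          starts at 2100 */
  return 0;
}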
+ * If it has not been cached, read it from the current position in STREAM + * and put it into the cache (if caching has been enabled for rep headers). + * Return the result in *REP_HEADER. Use POOL for allocations. + */ +static svn_error_t * +read_rep_header(svn_fs_x__rep_header_t **rep_header, + svn_fs_t *fs, + svn_stream_t *stream, + svn_fs_x__representation_cache_key_t *key, + apr_pool_t *pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_boolean_t is_cached = FALSE; + + if (ffd->rep_header_cache) + { + SVN_ERR(svn_cache__get((void**)rep_header, &is_cached, + ffd->rep_header_cache, key, pool)); + if (is_cached) + return SVN_NO_ERROR; + } + + SVN_ERR(svn_fs_x__read_rep_header(rep_header, stream, pool, pool)); + + if (ffd->rep_header_cache) + SVN_ERR(svn_cache__set(ffd->rep_header_cache, key, *rep_header, pool)); + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__get_representation_length(svn_filesize_t *packed_len, + svn_filesize_t *expanded_len, + svn_fs_t *fs, + svn_fs_x__revision_file_t *rev_file, + svn_fs_x__p2l_entry_t* entry, + apr_pool_t *scratch_pool) +{ + svn_fs_x__representation_cache_key_t key = { 0 }; + rep_state_t rs = { 0 }; + svn_fs_x__rep_header_t *rep_header; + + /* this function does not apply to representation containers */ + SVN_ERR_ASSERT(entry->type >= SVN_FS_X__ITEM_TYPE_FILE_REP + && entry->type <= SVN_FS_X__ITEM_TYPE_DIR_PROPS); + SVN_ERR_ASSERT(entry->item_count == 1); + + /* get / read the representation header */ + key.revision = svn_fs_x__get_revnum(entry->items[0].change_set); + key.is_packed = svn_fs_x__is_packed_rev(fs, key.revision); + key.item_index = entry->items[0].number; + SVN_ERR(read_rep_header(&rep_header, fs, rev_file->stream, &key, + scratch_pool)); + + /* prepare representation reader state (rs) structure */ + SVN_ERR(init_rep_state(&rs, rep_header, fs, rev_file, entry, + scratch_pool)); + + /* RS->SFILE may be shared between RS instances -> make sure we point + * to the right data. */ + *packed_len = rs.size; + SVN_ERR(cache_windows(expanded_len, fs, &rs, -1, scratch_pool)); + + return SVN_NO_ERROR; +} + +/* Return the next *LEN bytes of the rep from our plain / delta windows + and store them in *BUF. */ +static svn_error_t * +get_contents_from_windows(rep_read_baton_t *rb, + char *buf, + apr_size_t *len) +{ + apr_size_t copy_len, remaining = *len; + char *cur = buf; + rep_state_t *rs; + + /* Special case for when there are no delta reps, only a + containered text. */ + if (rb->rs_list->nelts == 0 && rb->buf == NULL) + { + copy_len = remaining; + rs = rb->src_state; + + /* reps in containers don't have a header */ + if (rs->header_size == 0 && rb->base_window == NULL) + { + /* RS->SIZE is unreliable here because it is based upon + * the delta rep size _before_ putting the data into a + * a container. */ + SVN_ERR(read_container_window(&rb->base_window, rs, rb->len, + rb->scratch_pool, rb->scratch_pool)); + rs->current -= rb->base_window->len; + } + + if (rb->base_window != NULL) + { + /* We got the desired rep directly from the cache. + This is where we need the pseudo rep_state created + by build_rep_list(). */ + apr_size_t offset = (apr_size_t)rs->current; + if (copy_len + offset > rb->base_window->len) + copy_len = offset < rb->base_window->len + ? rb->base_window->len - offset + : 0ul; + + memcpy (cur, rb->base_window->data + offset, copy_len); + } + + rs->current += copy_len; + *len = copy_len; + return SVN_NO_ERROR; + } + + while (remaining > 0) + { + /* If we have buffered data from a previous chunk, use that. 
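/* Editorial aside -- illustrative sketch, not part of this patch.
 * read_rep_header() here is the "probe the cache, read from disk on a
 * miss, then store the result" pattern that recurs throughout this file.
 * Below is a minimal stand-alone version built on a direct-mapped cache;
 * the rep_header_t type and the load_header() stand-in are invented. */
#include <stdio.h>
#include <string.h>

#define CACHE_SLOTS 16

typedef struct { long revision; long header_size; } rep_header_t;

typedef struct {
  int used[CACHE_SLOTS];
  long keys[CACHE_SLOTS];
  rep_header_t values[CACHE_SLOTS];
  int loads;                        /* number of misses served from "disk" */
} header_cache_t;

/* Stand-in for the on-disk read. */
static rep_header_t
load_header(long revision, header_cache_t *cache)
{
  rep_header_t header = { revision, 32 + revision % 7 };
  ++cache->loads;
  return header;
}

static rep_header_t
get_header(header_cache_t *cache, long revision)
{
  int slot = (int)(revision % CACHE_SLOTS);

  if (cache->used[slot] && cache->keys[slot] == revision)
    return cache->values[slot];               /* cache hit */

  /* miss: read, then remember the result for the next caller */
  cache->values[slot] = load_header(revision, cache);
  cache->keys[slot] = revision;
  cache->used[slot] = 1;
  return cache->values[slot];
}

int main(void)
{
  header_cache_t cache;
  memset(&cache, 0, sizeof(cache));

  get_header(&cache, 42);
  get_header(&cache, 42);               /* served from the cache */
  printf("loads: %d\n", cache.loads);   /* prints "loads: 1" */
  return 0;
}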
*/ + if (rb->buf) + { + /* Determine how much to copy from the buffer. */ + copy_len = rb->buf_len - rb->buf_pos; + if (copy_len > remaining) + copy_len = remaining; + + /* Actually copy the data. */ + memcpy(cur, rb->buf + rb->buf_pos, copy_len); + rb->buf_pos += copy_len; + cur += copy_len; + remaining -= copy_len; + + /* If the buffer is all used up, clear it and empty the + local pool. */ + if (rb->buf_pos == rb->buf_len) + { + svn_pool_clear(rb->scratch_pool); + rb->buf = NULL; + } + } + else + { + svn_stringbuf_t *sbuf = NULL; + + rs = APR_ARRAY_IDX(rb->rs_list, 0, rep_state_t *); + if (rs->current == rs->size) + break; + + /* Get more buffered data by evaluating a chunk. */ + SVN_ERR(get_combined_window(&sbuf, rb)); + + rb->chunk_index++; + rb->buf_len = sbuf->len; + rb->buf = sbuf->data; + rb->buf_pos = 0; + } + } + + *len = cur - buf; + + return SVN_NO_ERROR; +} + +/* Baton type for get_fulltext_partial. */ +typedef struct fulltext_baton_t +{ + /* Target buffer to write to; of at least LEN bytes. */ + char *buffer; + + /* Offset within the respective fulltext at which we shall start to + copy data into BUFFER. */ + apr_size_t start; + + /* Number of bytes to copy. The actual amount may be less in case + the fulltext is short(er). */ + apr_size_t len; + + /* Number of bytes actually copied into BUFFER. */ + apr_size_t read; +} fulltext_baton_t; + +/* Implement svn_cache__partial_getter_func_t for fulltext caches. + * From the fulltext in DATA, we copy the range specified by the + * fulltext_baton_t* BATON into the buffer provided by that baton. + * OUT and RESULT_POOL are not used. + */ +static svn_error_t * +get_fulltext_partial(void **out, + const void *data, + apr_size_t data_len, + void *baton, + apr_pool_t *result_pool) +{ + fulltext_baton_t *fulltext_baton = baton; + + /* We cached the fulltext with an NUL appended to it. */ + apr_size_t fulltext_len = data_len - 1; + + /* Clip the copy range to what the fulltext size allows. */ + apr_size_t start = MIN(fulltext_baton->start, fulltext_len); + fulltext_baton->read = MIN(fulltext_len - start, fulltext_baton->len); + + /* Copy the data to the output buffer and be done. */ + memcpy(fulltext_baton->buffer, (const char *)data + start, + fulltext_baton->read); + + return SVN_NO_ERROR; +} + +/* Find the fulltext specified in BATON in the fulltext cache given + * as well by BATON. If that succeeds, set *CACHED to TRUE and copy + * up to the next *LEN bytes into BUFFER. Set *LEN to the actual + * number of bytes copied. + */ +static svn_error_t * +get_contents_from_fulltext(svn_boolean_t *cached, + rep_read_baton_t *baton, + char *buffer, + apr_size_t *len) +{ + void *dummy; + fulltext_baton_t fulltext_baton; + + SVN_ERR_ASSERT((apr_size_t)baton->fulltext_delivered + == baton->fulltext_delivered); + fulltext_baton.buffer = buffer; + fulltext_baton.start = (apr_size_t)baton->fulltext_delivered; + fulltext_baton.len = *len; + fulltext_baton.read = 0; + + SVN_ERR(svn_cache__get_partial(&dummy, cached, baton->fulltext_cache, + &baton->fulltext_cache_key, + get_fulltext_partial, &fulltext_baton, + baton->scratch_pool)); + + if (*cached) + { + baton->fulltext_delivered += fulltext_baton.read; + *len = fulltext_baton.read; + } + + return SVN_NO_ERROR; +} + +/* Determine the optimal size of a string buf that shall receive a + * (full-) text of NEEDED bytes. + * + * The critical point is that those buffers may be very large and + * can cause memory fragmentation. We apply simple heuristics to + * make fragmentation less likely. 
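/* Editorial aside -- illustrative sketch, not part of this patch.
 * The loop in get_contents_from_windows() serves the caller from a private
 * buffer and refills that buffer one expanded chunk at a time.  This
 * stand-alone reader shows the same buffer / refill discipline over a
 * fixed set of in-memory chunks; all names are invented for the sketch. */
#include <stdio.h>
#include <string.h>

typedef struct {
  const char **chunks;  int chunk_count;  int next_chunk;
  const char *buf;      size_t buf_len;   size_t buf_pos;
} chunked_reader_t;

/* Copy up to *LEN bytes into OUT; set *LEN to the amount delivered. */
static void
reader_read(chunked_reader_t *r, char *out, size_t *len)
{
  size_t remaining = *len;
  char *cur = out;

  while (remaining > 0)
    {
      if (r->buf)                       /* serve from the buffer first */
        {
          size_t copy_len = r->buf_len - r->buf_pos;
          if (copy_len > remaining)
            copy_len = remaining;

          memcpy(cur, r->buf + r->buf_pos, copy_len);
          r->buf_pos += copy_len;
          cur += copy_len;
          remaining -= copy_len;

          if (r->buf_pos == r->buf_len) /* buffer drained */
            r->buf = NULL;
        }
      else
        {
          if (r->next_chunk == r->chunk_count)
            break;                      /* no more data */

          /* "expand" the next chunk into the buffer */
          r->buf = r->chunks[r->next_chunk++];
          r->buf_len = strlen(r->buf);
          r->buf_pos = 0;
        }
    }

  *len = (size_t)(cur - out);
}

int main(void)
{
  const char *chunks[] = { "alpha-", "beta-", "gamma" };
  chunked_reader_t r = { chunks, 3, 0, NULL, 0, 0 };
  char out[64];
  size_t len = 7;

  reader_read(&r, out, &len);
  printf("%.*s\n", (int)len, out);      /* "alpha-b"   */
  len = sizeof(out);
  reader_read(&r, out, &len);
  printf("%.*s\n", (int)len, out);      /* "eta-gamma" */
  return 0;
}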
+ */ +static apr_size_t +optimimal_allocation_size(apr_size_t needed) +{ + /* For all allocations, assume some overhead that is shared between + * OS memory managemnt, APR memory management and svn_stringbuf_t. */ + const apr_size_t overhead = 0x400; + apr_size_t optimal; + + /* If an allocation size if safe for other ephemeral buffers, it should + * be safe for ours. */ + if (needed <= SVN__STREAM_CHUNK_SIZE) + return needed; + + /* Paranoia edge case: + * Skip our heuristics if they created arithmetical overflow. + * Beware to make this test work for NEEDED = APR_SIZE_MAX as well! */ + if (needed >= APR_SIZE_MAX / 2 - overhead) + return needed; + + /* As per definition SVN__STREAM_CHUNK_SIZE is a power of two. + * Since we know NEEDED to be larger than that, use it as the + * starting point. + * + * Heuristics: Allocate a power-of-two number of bytes that fit + * NEEDED plus some OVERHEAD. The APR allocator + * will round it up to the next full page size. + */ + optimal = SVN__STREAM_CHUNK_SIZE; + while (optimal - overhead < needed) + optimal *= 2; + + /* This is above or equal to NEEDED. */ + return optimal - overhead; +} + +/* After a fulltext cache lookup failure, we will continue to read from + * combined delta or plain windows. However, we must first make that data + * stream in BATON catch up tho the position LEN already delivered from the + * fulltext cache. Also, we need to store the reconstructed fulltext if we + * want to cache it at the end. + */ +static svn_error_t * +skip_contents(rep_read_baton_t *baton, + svn_filesize_t len) +{ + svn_error_t *err = SVN_NO_ERROR; + + /* Do we want to cache the reconstructed fulltext? */ + if (SVN_IS_VALID_REVNUM(baton->fulltext_cache_key.revision)) + { + char *buffer; + svn_filesize_t to_alloc = MAX(len, baton->len); + + /* This should only be happening if BATON->LEN and LEN are + * cacheable, implying they fit into memory. */ + SVN_ERR_ASSERT((apr_size_t)to_alloc == to_alloc); + + /* Allocate the fulltext buffer. */ + baton->current_fulltext = svn_stringbuf_create_ensure( + optimimal_allocation_size((apr_size_t)to_alloc), + baton->filehandle_pool); + + /* Read LEN bytes from the window stream and store the data + * in the fulltext buffer (will be filled by further reads later). */ + baton->current_fulltext->len = (apr_size_t)len; + baton->current_fulltext->data[(apr_size_t)len] = 0; + + buffer = baton->current_fulltext->data; + while (len > 0 && !err) + { + apr_size_t to_read = (apr_size_t)len; + err = get_contents_from_windows(baton, buffer, &to_read); + len -= to_read; + buffer += to_read; + } + } + else if (len > 0) + { + /* Simply drain LEN bytes from the window stream. */ + apr_pool_t *subpool = svn_pool_create(baton->scratch_pool); + char *buffer = apr_palloc(subpool, SVN__STREAM_CHUNK_SIZE); + + while (len > 0 && !err) + { + apr_size_t to_read = len > SVN__STREAM_CHUNK_SIZE + ? SVN__STREAM_CHUNK_SIZE + : (apr_size_t)len; + + err = get_contents_from_windows(baton, buffer, &to_read); + len -= to_read; + } + + svn_pool_destroy(subpool); + } + + return svn_error_trace(err); +} + +/* BATON is of type `rep_read_baton_t'; read the next *LEN bytes of the + representation and store them in *BUF. Sum as we read and verify + the MD5 sum at the end. */ +static svn_error_t * +rep_read_contents(void *baton, + char *buf, + apr_size_t *len) +{ + rep_read_baton_t *rb = baton; + + /* Get data from the fulltext cache for as long as we can. 
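/* Editorial aside -- illustrative sketch, not part of this patch.
 * The allocation-size heuristic below/above rounds big requests up to a
 * power of two minus a fixed overhead so the underlying allocator wastes
 * less space.  This stand-alone copy uses invented constants (CHUNK_SIZE
 * is a stand-in, not necessarily the real SVN__STREAM_CHUNK_SIZE) just to
 * show the shape of the heuristic. */
#include <stdio.h>

#define CHUNK_SIZE ((size_t)16384)   /* stand-in; must be a power of two */
#define OVERHEAD   ((size_t)0x400)   /* assumed allocator + stringbuf slack */

static size_t
optimal_allocation_size(size_t needed)
{
  size_t optimal;

  if (needed <= CHUNK_SIZE)
    return needed;                        /* small request: no rounding */

  if (needed >= (size_t)-1 / 2 - OVERHEAD)
    return needed;                        /* avoid overflow in the loop */

  optimal = CHUNK_SIZE;
  while (optimal - OVERHEAD < needed)     /* next power of two that fits */
    optimal *= 2;

  return optimal - OVERHEAD;
}

int main(void)
{
  printf("%zu\n", optimal_allocation_size(1000));    /* 1000: unchanged    */
  printf("%zu\n", optimal_allocation_size(20000));   /* 31744 = 32K - 1K   */
  printf("%zu\n", optimal_allocation_size(100000));  /* 130048 = 128K - 1K */
  return 0;
}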
*/ + if (rb->fulltext_cache) + { + svn_boolean_t cached; + SVN_ERR(get_contents_from_fulltext(&cached, rb, buf, len)); + if (cached) + return SVN_NO_ERROR; + + /* Cache miss. From now on, we will never read from the fulltext + * cache for this representation anymore. */ + rb->fulltext_cache = NULL; + } + + /* No fulltext cache to help us. We must read from the window stream. */ + if (!rb->rs_list) + { + /* Window stream not initialized, yet. Do it now. */ + SVN_ERR(build_rep_list(&rb->rs_list, &rb->base_window, + &rb->src_state, rb->fs, &rb->rep, + rb->filehandle_pool, rb->scratch_pool)); + + /* In case we did read from the fulltext cache before, make the + * window stream catch up. Also, initialize the fulltext buffer + * if we want to cache the fulltext at the end. */ + SVN_ERR(skip_contents(rb, rb->fulltext_delivered)); + } + + /* Get the next block of data. */ + SVN_ERR(get_contents_from_windows(rb, buf, len)); + + if (rb->current_fulltext) + svn_stringbuf_appendbytes(rb->current_fulltext, buf, *len); + + /* Perform checksumming. We want to check the checksum as soon as + the last byte of data is read, in case the caller never performs + a short read, but we don't want to finalize the MD5 context + twice. */ + if (!rb->checksum_finalized) + { + SVN_ERR(svn_checksum_update(rb->md5_checksum_ctx, buf, *len)); + rb->off += *len; + if (rb->off == rb->len) + { + svn_checksum_t *md5_checksum; + svn_checksum_t expected; + expected.kind = svn_checksum_md5; + expected.digest = rb->md5_digest; + + rb->checksum_finalized = TRUE; + SVN_ERR(svn_checksum_final(&md5_checksum, rb->md5_checksum_ctx, + rb->scratch_pool)); + if (!svn_checksum_match(md5_checksum, &expected)) + return svn_error_create(SVN_ERR_FS_CORRUPT, + svn_checksum_mismatch_err(&expected, md5_checksum, + rb->scratch_pool, + _("Checksum mismatch while reading representation")), + NULL); + } + } + + if (rb->off == rb->len && rb->current_fulltext) + { + svn_fs_x__data_t *ffd = rb->fs->fsap_data; + SVN_ERR(svn_cache__set(ffd->fulltext_cache, &rb->fulltext_cache_key, + rb->current_fulltext, rb->scratch_pool)); + rb->current_fulltext = NULL; + } + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__get_contents(svn_stream_t **contents_p, + svn_fs_t *fs, + svn_fs_x__representation_t *rep, + svn_boolean_t cache_fulltext, + apr_pool_t *result_pool) +{ + if (! rep) + { + *contents_p = svn_stream_empty(result_pool); + } + else + { + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_filesize_t len = rep->expanded_size; + rep_read_baton_t *rb; + svn_revnum_t revision = svn_fs_x__get_revnum(rep->id.change_set); + + svn_fs_x__pair_cache_key_t fulltext_cache_key = { 0 }; + fulltext_cache_key.revision = revision; + fulltext_cache_key.second = rep->id.number; + + /* Initialize the reader baton. Some members may added lazily + * while reading from the stream */ + SVN_ERR(rep_read_get_baton(&rb, fs, rep, fulltext_cache_key, + result_pool)); + + /* Make the stream attempt fulltext cache lookups if the fulltext + * is cacheable. If it is not, then also don't try to buffer and + * cache it. */ + if (ffd->fulltext_cache && cache_fulltext + && SVN_IS_VALID_REVNUM(revision) + && fulltext_size_is_cachable(ffd, len)) + { + rb->fulltext_cache = ffd->fulltext_cache; + } + else + { + /* This will also prevent the reconstructed fulltext from being + put into the cache. 
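/* Editorial aside -- illustrative sketch, not part of this patch.
 * rep_read_contents() below feeds every delivered chunk into an MD5
 * context and finalizes/compares exactly once, when the running offset
 * reaches the expected length.  The stand-alone reader here mirrors that
 * "finalize once at exactly off == len" discipline, using plain 32-bit
 * FNV-1a as a stand-in digest instead of MD5. */
#include <stdio.h>
#include <string.h>

typedef unsigned int digest_t;

static digest_t
fnv1a_update(digest_t digest, const char *data, size_t len)
{
  size_t i;
  for (i = 0; i < len; ++i)
    {
      digest ^= (unsigned char)data[i];
      digest *= 16777619u;
    }
  return digest;
}

typedef struct {
  const char *data;   size_t len;   /* full content and expected length */
  size_t off;                        /* bytes delivered so far */
  digest_t digest;                   /* running checksum */
  int finalized;                     /* have we verified already? */
  digest_t expected;
} checked_reader_t;

/* Deliver up to *LEN bytes; verify the checksum when the last byte leaves. */
static int
checked_read(checked_reader_t *r, char *buf, size_t *len)
{
  size_t n = r->len - r->off < *len ? r->len - r->off : *len;
  memcpy(buf, r->data + r->off, n);

  if (!r->finalized)
    {
      r->digest = fnv1a_update(r->digest, buf, n);
      r->off += n;
      if (r->off == r->len)
        {
          r->finalized = 1;                    /* never finalize twice */
          if (r->digest != r->expected)
            return -1;                         /* corruption detected */
        }
    }

  *len = n;
  return 0;
}

int main(void)
{
  const char *text = "representation contents";
  checked_reader_t r = { text, strlen(text), 0, 2166136261u, 0,
                         fnv1a_update(2166136261u, text, strlen(text)) };
  char buf[8];
  size_t len;
  int err = 0;

  do { len = sizeof(buf); err = checked_read(&r, buf, &len); }
  while (!err && len > 0);
  printf("%s\n", err ? "checksum mismatch" : "checksum OK");
  return 0;
}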
*/ + rb->fulltext_cache_key.revision = SVN_INVALID_REVNUM; + } + + *contents_p = svn_stream_create(rb, result_pool); + svn_stream_set_read2(*contents_p, NULL /* only full read support */, + rep_read_contents); + svn_stream_set_close(*contents_p, rep_read_contents_close); + } + + return SVN_NO_ERROR; +} + + +/* Baton for cache_access_wrapper. Wraps the original parameters of + * svn_fs_x__try_process_file_content(). + */ +typedef struct cache_access_wrapper_baton_t +{ + svn_fs_process_contents_func_t func; + void* baton; +} cache_access_wrapper_baton_t; + +/* Wrapper to translate between svn_fs_process_contents_func_t and + * svn_cache__partial_getter_func_t. + */ +static svn_error_t * +cache_access_wrapper(void **out, + const void *data, + apr_size_t data_len, + void *baton, + apr_pool_t *pool) +{ + cache_access_wrapper_baton_t *wrapper_baton = baton; + + SVN_ERR(wrapper_baton->func((const unsigned char *)data, + data_len - 1, /* cache adds terminating 0 */ + wrapper_baton->baton, + pool)); + + /* non-NULL value to signal the calling cache that all went well */ + *out = baton; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__try_process_file_contents(svn_boolean_t *success, + svn_fs_t *fs, + svn_fs_x__noderev_t *noderev, + svn_fs_process_contents_func_t processor, + void* baton, + apr_pool_t *scratch_pool) +{ + svn_fs_x__representation_t *rep = noderev->data_rep; + if (rep) + { + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_fs_x__pair_cache_key_t fulltext_cache_key = { 0 }; + + fulltext_cache_key.revision = svn_fs_x__get_revnum(rep->id.change_set); + fulltext_cache_key.second = rep->id.number; + if (ffd->fulltext_cache + && SVN_IS_VALID_REVNUM(fulltext_cache_key.revision) + && fulltext_size_is_cachable(ffd, rep->expanded_size)) + { + cache_access_wrapper_baton_t wrapper_baton; + void *dummy = NULL; + + wrapper_baton.func = processor; + wrapper_baton.baton = baton; + return svn_cache__get_partial(&dummy, success, + ffd->fulltext_cache, + &fulltext_cache_key, + cache_access_wrapper, + &wrapper_baton, + scratch_pool); + } + } + + *success = FALSE; + return SVN_NO_ERROR; +} + +/* Baton used when reading delta windows. */ +typedef struct delta_read_baton_t +{ + struct rep_state_t *rs; + unsigned char md5_digest[APR_MD5_DIGESTSIZE]; +} delta_read_baton_t; + +/* This implements the svn_txdelta_next_window_fn_t interface. */ +static svn_error_t * +delta_read_next_window(svn_txdelta_window_t **window, + void *baton, + apr_pool_t *pool) +{ + delta_read_baton_t *drb = baton; + apr_pool_t *scratch_pool = svn_pool_create(pool); + + *window = NULL; + if (drb->rs->current < drb->rs->size) + { + SVN_ERR(read_delta_window(window, drb->rs->chunk_index, drb->rs, pool, + scratch_pool)); + drb->rs->chunk_index++; + } + + svn_pool_destroy(scratch_pool); + + return SVN_NO_ERROR; +} + +/* This implements the svn_txdelta_md5_digest_fn_t interface. */ +static const unsigned char * +delta_read_md5_digest(void *baton) +{ + delta_read_baton_t *drb = baton; + return drb->md5_digest; +} + +/* Return a txdelta stream for on-disk representation REP_STATE + * of TARGET. Allocate the result in RESULT_POOL. + */ +static svn_txdelta_stream_t * +get_storaged_delta_stream(rep_state_t *rep_state, + svn_fs_x__noderev_t *target, + apr_pool_t *result_pool) +{ + /* Create the delta read baton. 
*/ + delta_read_baton_t *drb = apr_pcalloc(result_pool, sizeof(*drb)); + drb->rs = rep_state; + memcpy(drb->md5_digest, target->data_rep->md5_digest, + sizeof(drb->md5_digest)); + return svn_txdelta_stream_create(drb, delta_read_next_window, + delta_read_md5_digest, result_pool); +} + +svn_error_t * +svn_fs_x__get_file_delta_stream(svn_txdelta_stream_t **stream_p, + svn_fs_t *fs, + svn_fs_x__noderev_t *source, + svn_fs_x__noderev_t *target, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_stream_t *source_stream, *target_stream; + rep_state_t *rep_state; + svn_fs_x__rep_header_t *rep_header; + svn_fs_x__data_t *ffd = fs->fsap_data; + + /* Try a shortcut: if the target is stored as a delta against the source, + then just use that delta. However, prefer using the fulltext cache + whenever that is available. */ + if (target->data_rep && (source || !ffd->fulltext_cache)) + { + /* Read target's base rep if any. */ + SVN_ERR(create_rep_state(&rep_state, &rep_header, NULL, + target->data_rep, fs, result_pool, + scratch_pool)); + + /* Try a shortcut: if the target is stored as a delta against the source, + then just use that delta. */ + if (source && source->data_rep && target->data_rep) + { + /* If that matches source, then use this delta as is. + Note that we want an actual delta here. E.g. a self-delta would + not be good enough. */ + if (rep_header->type == svn_fs_x__rep_delta + && rep_header->base_revision + == svn_fs_x__get_revnum(source->data_rep->id.change_set) + && rep_header->base_item_index == source->data_rep->id.number) + { + *stream_p = get_storaged_delta_stream(rep_state, target, + result_pool); + return SVN_NO_ERROR; + } + } + else if (!source) + { + /* We want a self-delta. There is a fair chance that TARGET got + added in this revision and is already stored in the requested + format. */ + if (rep_header->type == svn_fs_x__rep_self_delta) + { + *stream_p = get_storaged_delta_stream(rep_state, target, + result_pool); + return SVN_NO_ERROR; + } + } + + /* Don't keep file handles open for longer than necessary. */ + if (rep_state->sfile->rfile) + { + SVN_ERR(svn_fs_x__close_revision_file(rep_state->sfile->rfile)); + rep_state->sfile->rfile = NULL; + } + } + + /* Read both fulltexts and construct a delta. */ + if (source) + SVN_ERR(svn_fs_x__get_contents(&source_stream, fs, source->data_rep, + TRUE, result_pool)); + else + source_stream = svn_stream_empty(result_pool); + + SVN_ERR(svn_fs_x__get_contents(&target_stream, fs, target->data_rep, + TRUE, result_pool)); + + /* Because source and target stream will already verify their content, + * there is no need to do this once more. In particular if the stream + * content is being fetched from cache. */ + svn_txdelta2(stream_p, source_stream, target_stream, FALSE, result_pool); + + return SVN_NO_ERROR; +} + +/* Return TRUE when all svn_fs_x__dirent_t* in ENTRIES are already sorted + by their respective name. */ +static svn_boolean_t +sorted(apr_array_header_t *entries) +{ + int i; + + const svn_fs_x__dirent_t * const *dirents = (const void *)entries->elts; + for (i = 0; i < entries->nelts-1; ++i) + if (strcmp(dirents[i]->name, dirents[i+1]->name) > 0) + return FALSE; + + return TRUE; +} + +/* Compare the names of the two dirents given in **A and **B. 
*/ +static int +compare_dirents(const void *a, + const void *b) +{ + const svn_fs_x__dirent_t *lhs = *((const svn_fs_x__dirent_t * const *) a); + const svn_fs_x__dirent_t *rhs = *((const svn_fs_x__dirent_t * const *) b); + + return strcmp(lhs->name, rhs->name); +} + +/* Compare the name of the dirents given in **A with the C string in *B. */ +static int +compare_dirent_name(const void *a, + const void *b) +{ + const svn_fs_x__dirent_t *lhs = *((const svn_fs_x__dirent_t * const *) a); + const char *rhs = b; + + return strcmp(lhs->name, rhs); +} + +/* Into ENTRIES, read all directories entries from the key-value text in + * STREAM. If INCREMENTAL is TRUE, read until the end of the STREAM and + * update the data. ID is provided for nicer error messages. + */ +static svn_error_t * +read_dir_entries(apr_array_header_t *entries, + svn_stream_t *stream, + svn_boolean_t incremental, + const svn_fs_x__id_t *id, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + apr_hash_t *hash = incremental ? svn_hash__make(scratch_pool) : NULL; + const char *terminator = SVN_HASH_TERMINATOR; + + /* Read until the terminator (non-incremental) or the end of STREAM + (incremental mode). In the latter mode, we use a temporary HASH + to make updating and removing entries cheaper. */ + while (1) + { + svn_hash__entry_t entry; + svn_fs_x__dirent_t *dirent; + char *str; + + svn_pool_clear(iterpool); + SVN_ERR(svn_hash__read_entry(&entry, stream, terminator, + incremental, iterpool)); + + /* End of directory? */ + if (entry.key == NULL) + { + /* In incremental mode, we skip the terminator and read the + increments following it until the end of the stream. */ + if (incremental && terminator) + terminator = NULL; + else + break; + } + + /* Deleted entry? */ + if (entry.val == NULL) + { + /* We must be in incremental mode */ + assert(hash); + apr_hash_set(hash, entry.key, entry.keylen, NULL); + continue; + } + + /* Add a new directory entry. */ + dirent = apr_pcalloc(result_pool, sizeof(*dirent)); + dirent->name = apr_pstrmemdup(result_pool, entry.key, entry.keylen); + + str = svn_cstring_tokenize(" ", &entry.val); + if (str == NULL) + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + _("Directory entry corrupt in '%s'"), + svn_fs_x__id_unparse(id, scratch_pool)->data); + + if (strcmp(str, SVN_FS_X__KIND_FILE) == 0) + { + dirent->kind = svn_node_file; + } + else if (strcmp(str, SVN_FS_X__KIND_DIR) == 0) + { + dirent->kind = svn_node_dir; + } + else + { + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + _("Directory entry corrupt in '%s'"), + svn_fs_x__id_unparse(id, scratch_pool)->data); + } + + str = svn_cstring_tokenize(" ", &entry.val); + if (str == NULL) + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + _("Directory entry corrupt in '%s'"), + svn_fs_x__id_unparse(id, scratch_pool)->data); + + SVN_ERR(svn_fs_x__id_parse(&dirent->id, str)); + + /* In incremental mode, update the hash; otherwise, write to the + * final array. */ + if (incremental) + apr_hash_set(hash, dirent->name, entry.keylen, dirent); + else + APR_ARRAY_PUSH(entries, svn_fs_x__dirent_t *) = dirent; + } + + /* Convert container to a sorted array. 
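/* Editorial aside -- illustrative sketch, not part of this patch.
 * The directory code keeps its entries sorted by name: sorted() above
 * detects an already-ordered array cheaply, and the comparator is only
 * needed for the occasional sort.  Stand-alone equivalent with qsort(),
 * over an array of structs rather than pointers: */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct { const char *name; int kind; } entry_t;

static int
entries_sorted(const entry_t *entries, size_t count)
{
  size_t i;
  for (i = 0; i + 1 < count; ++i)
    if (strcmp(entries[i].name, entries[i + 1].name) > 0)
      return 0;
  return 1;
}

static int
compare_entries(const void *a, const void *b)
{
  return strcmp(((const entry_t *)a)->name, ((const entry_t *)b)->name);
}

int main(void)
{
  entry_t entries[] = { { "trunk", 1 }, { "branches", 1 }, { "tags", 1 } };
  size_t count = sizeof(entries) / sizeof(entries[0]);

  if (!entries_sorted(entries, count))            /* sort only when needed */
    qsort(entries, count, sizeof(entries[0]), compare_entries);

  printf("%s %s %s\n", entries[0].name, entries[1].name, entries[2].name);
  /* prints "branches tags trunk" */
  return 0;
}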
*/ + if (incremental) + { + apr_hash_index_t *hi; + for (hi = apr_hash_first(iterpool, hash); hi; hi = apr_hash_next(hi)) + APR_ARRAY_PUSH(entries, svn_fs_x__dirent_t *) = apr_hash_this_val(hi); + } + + if (!sorted(entries)) + svn_sort__array(entries, compare_dirents); + + svn_pool_destroy(iterpool); + + return SVN_NO_ERROR; +} + +/* Fetch the contents of a directory into ENTRIES. Values are stored + as filename to string mappings; further conversion is necessary to + convert them into svn_fs_x__dirent_t values. */ +static svn_error_t * +get_dir_contents(apr_array_header_t **entries, + svn_fs_t *fs, + svn_fs_x__noderev_t *noderev, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_stream_t *contents; + const svn_fs_x__id_t *id = &noderev->noderev_id; + + *entries = apr_array_make(result_pool, 16, sizeof(svn_fs_x__dirent_t *)); + if (noderev->data_rep + && ! svn_fs_x__is_revision(noderev->data_rep->id.change_set)) + { + const char *filename + = svn_fs_x__path_txn_node_children(fs, id, scratch_pool, + scratch_pool); + + /* The representation is mutable. Read the old directory + contents from the mutable children file, followed by the + changes we've made in this transaction. */ + SVN_ERR(svn_stream_open_readonly(&contents, filename, scratch_pool, + scratch_pool)); + SVN_ERR(read_dir_entries(*entries, contents, TRUE, id, + result_pool, scratch_pool)); + SVN_ERR(svn_stream_close(contents)); + } + else if (noderev->data_rep) + { + /* Undeltify content before parsing it. Otherwise, we could only + * parse it byte-by-byte. + */ + apr_size_t len = noderev->data_rep->expanded_size; + svn_stringbuf_t *text; + + /* The representation is immutable. Read it normally. */ + SVN_ERR(svn_fs_x__get_contents(&contents, fs, noderev->data_rep, + FALSE, scratch_pool)); + SVN_ERR(svn_stringbuf_from_stream(&text, contents, len, scratch_pool)); + SVN_ERR(svn_stream_close(contents)); + + /* de-serialize hash */ + contents = svn_stream_from_stringbuf(text, scratch_pool); + SVN_ERR(read_dir_entries(*entries, contents, FALSE, id, + result_pool, scratch_pool)); + } + + return SVN_NO_ERROR; +} + + +/* Return the cache object in FS responsible to storing the directory the + * NODEREV plus the corresponding pre-allocated *KEY. + */ +static svn_cache__t * +locate_dir_cache(svn_fs_t *fs, + svn_fs_x__id_t *key, + svn_fs_x__noderev_t *noderev) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + if (svn_fs_x__is_txn(noderev->noderev_id.change_set)) + { + /* data in txns must be addressed by ID since the representation has + not been created, yet. */ + *key = noderev->noderev_id; + } + else + { + /* committed data can use simple rev,item pairs */ + if (noderev->data_rep) + { + *key = noderev->data_rep->id; + } + else + { + /* no data rep -> empty directory. + Use a key that does definitely not clash with non-NULL reps. */ + key->change_set = SVN_FS_X__INVALID_CHANGE_SET; + key->number = SVN_FS_X__ITEM_INDEX_UNUSED; + } + } + + return ffd->dir_cache; +} + +svn_error_t * +svn_fs_x__rep_contents_dir(apr_array_header_t **entries_p, + svn_fs_t *fs, + svn_fs_x__noderev_t *noderev, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_x__id_t key; + + /* find the cache we may use */ + svn_cache__t *cache = locate_dir_cache(fs, &key, noderev); + if (cache) + { + svn_boolean_t found; + + SVN_ERR(svn_cache__get((void **)entries_p, &found, cache, &key, + result_pool)); + if (found) + return SVN_NO_ERROR; + } + + /* Read in the directory contents. 
*/ + SVN_ERR(get_dir_contents(entries_p, fs, noderev, result_pool, + scratch_pool)); + + /* Update the cache, if we are to use one. */ + if (cache) + SVN_ERR(svn_cache__set(cache, &key, *entries_p, scratch_pool)); + + return SVN_NO_ERROR; +} + +svn_fs_x__dirent_t * +svn_fs_x__find_dir_entry(apr_array_header_t *entries, + const char *name, + int *hint) +{ + svn_fs_x__dirent_t **result + = svn_sort__array_lookup(entries, name, hint, compare_dirent_name); + return result ? *result : NULL; +} + +svn_error_t * +svn_fs_x__rep_contents_dir_entry(svn_fs_x__dirent_t **dirent, + svn_fs_t *fs, + svn_fs_x__noderev_t *noderev, + const char *name, + apr_size_t *hint, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_boolean_t found = FALSE; + + /* find the cache we may use */ + svn_fs_x__id_t key; + svn_cache__t *cache = locate_dir_cache(fs, &key, noderev); + if (cache) + { + svn_fs_x__ede_baton_t baton; + baton.hint = *hint; + baton.name = name; + + /* Cache lookup. */ + SVN_ERR(svn_cache__get_partial((void **)dirent, + &found, + cache, + &key, + svn_fs_x__extract_dir_entry, + &baton, + result_pool)); + + /* Remember the new clue only if we found something at that spot. */ + if (found) + *hint = baton.hint; + } + + /* fetch data from disk if we did not find it in the cache */ + if (! found) + { + apr_array_header_t *entries; + svn_fs_x__dirent_t *entry; + svn_fs_x__dirent_t *entry_copy = NULL; + + /* read the dir from the file system. It will probably be put it + into the cache for faster lookup in future calls. */ + SVN_ERR(svn_fs_x__rep_contents_dir(&entries, fs, noderev, + scratch_pool, scratch_pool)); + + /* find desired entry and return a copy in POOL, if found */ + entry = svn_fs_x__find_dir_entry(entries, name, NULL); + if (entry) + { + entry_copy = apr_pmemdup(result_pool, entry, sizeof(*entry_copy)); + entry_copy->name = apr_pstrdup(result_pool, entry->name); + } + + *dirent = entry_copy; + } + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__get_proplist(apr_hash_t **proplist_p, + svn_fs_t *fs, + svn_fs_x__noderev_t *noderev, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + apr_hash_t *proplist; + svn_stream_t *stream; + const svn_fs_x__id_t *noderev_id = &noderev->noderev_id; + + if (noderev->prop_rep + && !svn_fs_x__is_revision(noderev->prop_rep->id.change_set)) + { + const char *filename = svn_fs_x__path_txn_node_props(fs, noderev_id, + scratch_pool, + scratch_pool); + proplist = apr_hash_make(result_pool); + + SVN_ERR(svn_stream_open_readonly(&stream, filename, scratch_pool, + scratch_pool)); + SVN_ERR(svn_hash_read2(proplist, stream, SVN_HASH_TERMINATOR, + result_pool)); + SVN_ERR(svn_stream_close(stream)); + } + else if (noderev->prop_rep) + { + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_fs_x__representation_t *rep = noderev->prop_rep; + svn_fs_x__pair_cache_key_t key = { 0 }; + + key.revision = svn_fs_x__get_revnum(rep->id.change_set); + key.second = rep->id.number; + if (ffd->properties_cache && SVN_IS_VALID_REVNUM(key.revision)) + { + svn_boolean_t is_cached; + SVN_ERR(svn_cache__get((void **) proplist_p, &is_cached, + ffd->properties_cache, &key, result_pool)); + if (is_cached) + return SVN_NO_ERROR; + } + + proplist = apr_hash_make(result_pool); + SVN_ERR(svn_fs_x__get_contents(&stream, fs, noderev->prop_rep, FALSE, + scratch_pool)); + SVN_ERR(svn_hash_read2(proplist, stream, SVN_HASH_TERMINATOR, + result_pool)); + SVN_ERR(svn_stream_close(stream)); + + if (ffd->properties_cache && SVN_IS_VALID_REVNUM(rep->id.change_set)) + 
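/* Editorial aside -- illustrative sketch, not part of this patch.
 * svn_fs_x__find_dir_entry() below takes a HINT: callers that walk a
 * directory in name order pass the previous position back in and usually
 * skip the binary search entirely.  A stand-alone version of that lookup
 * strategy (names only; HINT and HINT + 1 are probed first): */
#include <stdio.h>
#include <string.h>

/* Return the index of NAME in the sorted NAMES[0..COUNT), or -1.  If HINT
 * is non-NULL, try *HINT and *HINT + 1 first and store the result back. */
static int
find_sorted(const char *const *names, int count, const char *name, int *hint)
{
  int lo = 0, hi = count;

  if (hint && *hint >= 0 && *hint < count)
    {
      if (strcmp(names[*hint], name) == 0)
        return *hint;                             /* repeated lookup */
      if (*hint + 1 < count && strcmp(names[*hint + 1], name) == 0)
        return ++*hint;                           /* next entry in a scan */
    }

  while (lo < hi)                                  /* plain binary search */
    {
      int mid = lo + (hi - lo) / 2;
      int cmp = strcmp(names[mid], name);
      if (cmp == 0)
        {
          if (hint)
            *hint = mid;
          return mid;
        }
      else if (cmp < 0)
        lo = mid + 1;
      else
        hi = mid;
    }

  return -1;
}

int main(void)
{
  const char *names[] = { "Makefile", "README", "configure", "src" };
  int hint = -1;
  printf("%d\n", find_sorted(names, 4, "README", &hint));    /* 1            */
  printf("%d\n", find_sorted(names, 4, "configure", &hint)); /* 2, via hint  */
  printf("%d\n", find_sorted(names, 4, "missing", &hint));   /* -1           */
  return 0;
}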
SVN_ERR(svn_cache__set(ffd->properties_cache, &key, proplist, + scratch_pool)); + } + else + { + /* return an empty prop list if the node doesn't have any props */ + proplist = apr_hash_make(result_pool); + } + + *proplist_p = proplist; + + return SVN_NO_ERROR; +} + + + +svn_error_t * +svn_fs_x__get_changes(apr_array_header_t **changes, + svn_fs_t *fs, + svn_revnum_t rev, + apr_pool_t *result_pool) +{ + svn_fs_x__revision_file_t *revision_file; + svn_boolean_t found; + svn_fs_x__data_t *ffd = fs->fsap_data; + apr_pool_t *scratch_pool = svn_pool_create(result_pool); + + svn_fs_x__id_t id; + id.change_set = svn_fs_x__change_set_by_rev(rev); + id.number = SVN_FS_X__ITEM_INDEX_CHANGES; + + /* Provide revision file. */ + + SVN_ERR(svn_fs_x__ensure_revision_exists(rev, fs, scratch_pool)); + SVN_ERR(svn_fs_x__open_pack_or_rev_file(&revision_file, fs, rev, + scratch_pool, scratch_pool)); + + /* try cache lookup first */ + + if (ffd->changes_container_cache && svn_fs_x__is_packed_rev(fs, rev)) + { + apr_off_t offset; + apr_uint32_t sub_item; + svn_fs_x__pair_cache_key_t key; + + SVN_ERR(svn_fs_x__item_offset(&offset, &sub_item, fs, revision_file, + &id, scratch_pool)); + key.revision = svn_fs_x__packed_base_rev(fs, rev); + key.second = offset; + + SVN_ERR(svn_cache__get_partial((void **)changes, &found, + ffd->changes_container_cache, &key, + svn_fs_x__changes_get_list_func, + &sub_item, result_pool)); + } + else if (ffd->changes_cache) + { + SVN_ERR(svn_cache__get((void **) changes, &found, ffd->changes_cache, + &rev, result_pool)); + } + else + { + found = FALSE; + } + + if (!found) + { + /* 'block-read' will also provide us with the desired data */ + SVN_ERR(block_read((void **)changes, fs, &id, revision_file, + result_pool, scratch_pool)); + + SVN_ERR(svn_fs_x__close_revision_file(revision_file)); + } + + SVN_ERR(dgb__log_access(fs, &id, *changes, SVN_FS_X__ITEM_TYPE_CHANGES, + scratch_pool)); + + svn_pool_destroy(scratch_pool); + return SVN_NO_ERROR; +} + +/* Fetch the representation data (header, txdelta / plain windows) + * addressed by ENTRY->ITEM in FS and cache it if caches are enabled. + * Read the data from the already open FILE and the wrapping + * STREAM object. If MAX_OFFSET is not -1, don't read windows that start + * at or beyond that offset. Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +block_read_contents(svn_fs_t *fs, + svn_fs_x__revision_file_t *rev_file, + svn_fs_x__p2l_entry_t* entry, + svn_fs_x__pair_cache_key_t *key, + apr_off_t max_offset, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_fs_x__representation_cache_key_t header_key = { 0 }; + rep_state_t rs = { 0 }; + svn_filesize_t fulltext_len; + svn_fs_x__rep_header_t *rep_header; + + if (!ffd->txdelta_window_cache || !ffd->combined_window_cache) + return SVN_NO_ERROR; + + header_key.revision = (apr_int32_t)key->revision; + header_key.is_packed = svn_fs_x__is_packed_rev(fs, header_key.revision); + header_key.item_index = key->second; + + SVN_ERR(read_rep_header(&rep_header, fs, rev_file->stream, &header_key, + scratch_pool)); + SVN_ERR(init_rep_state(&rs, rep_header, fs, rev_file, entry, scratch_pool)); + SVN_ERR(cache_windows(&fulltext_len, fs, &rs, max_offset, scratch_pool)); + + return SVN_NO_ERROR; +} + +/* For the given REV_FILE in FS, in *STREAM return a stream covering the + * item specified by ENTRY. Also, verify the item's content by low-level + * checksum. Allocate the result in POOL. 
+ */ +static svn_error_t * +read_item(svn_stream_t **stream, + svn_fs_t *fs, + svn_fs_x__revision_file_t *rev_file, + svn_fs_x__p2l_entry_t* entry, + apr_pool_t *pool) +{ + apr_uint32_t digest; + svn_checksum_t *expected, *actual; + apr_uint32_t plain_digest; + + /* Read item into string buffer. */ + svn_stringbuf_t *text = svn_stringbuf_create_ensure(entry->size, pool); + text->len = entry->size; + text->data[text->len] = 0; + SVN_ERR(svn_io_file_read_full2(rev_file->file, text->data, text->len, + NULL, NULL, pool)); + + /* Return (construct, calculate) stream and checksum. */ + *stream = svn_stream_from_stringbuf(text, pool); + digest = svn__fnv1a_32x4(text->data, text->len); + + /* Checksums will match most of the time. */ + if (entry->fnv1_checksum == digest) + return SVN_NO_ERROR; + + /* Construct proper checksum objects from their digests to allow for + * nice error messages. */ + plain_digest = htonl(entry->fnv1_checksum); + expected = svn_checksum__from_digest_fnv1a_32x4( + (const unsigned char *)&plain_digest, pool); + plain_digest = htonl(digest); + actual = svn_checksum__from_digest_fnv1a_32x4( + (const unsigned char *)&plain_digest, pool); + + /* Construct the full error message with all the info we have. */ + return svn_checksum_mismatch_err(expected, actual, pool, + _("Low-level checksum mismatch while reading\n" + "%s bytes of meta data at offset %s "), + apr_psprintf(pool, "%" APR_OFF_T_FMT, entry->size), + apr_psprintf(pool, "%" APR_OFF_T_FMT, entry->offset)); +} + +/* Read all txdelta / plain windows following REP_HEADER in FS as described + * by ENTRY. Read the data from the already open FILE and the wrapping + * STREAM object. If MAX_OFFSET is not -1, don't read windows that start + * at or beyond that offset. Use SCRATCH_POOL for temporary allocations. + * If caching is not enabled, this is a no-op. + */ +static svn_error_t * +block_read_changes(apr_array_header_t **changes, + svn_fs_t *fs, + svn_fs_x__revision_file_t *rev_file, + svn_fs_x__p2l_entry_t* entry, + svn_boolean_t must_read, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_stream_t *stream; + svn_revnum_t revision = svn_fs_x__get_revnum(entry->items[0].change_set); + if (!must_read && !ffd->changes_cache) + return SVN_NO_ERROR; + + /* we don't support containers, yet */ + SVN_ERR_ASSERT(entry->item_count == 1); + + /* already in cache? */ + if (!must_read && ffd->changes_cache) + { + svn_boolean_t is_cached = FALSE; + SVN_ERR(svn_cache__has_key(&is_cached, ffd->changes_cache, &revision, + scratch_pool)); + if (is_cached) + return SVN_NO_ERROR; + } + + SVN_ERR(read_item(&stream, fs, rev_file, entry, scratch_pool)); + + /* read changes from revision file */ + + SVN_ERR(svn_fs_x__read_changes(changes, stream, result_pool, scratch_pool)); + + /* cache for future reference */ + + if (ffd->changes_cache) + { + /* Guesstimate for the size of the in-cache representation. */ + apr_size_t estimated_size = (apr_size_t)250 * (*changes)->nelts; + + /* Don't even serialize data that probably won't fit into the + * cache. This often implies that either CHANGES is very + * large, memory is scarce or both. Having a huge temporary + * copy would not be a good thing in either case. 
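/* Editorial aside -- illustrative sketch, not part of this patch.
 * block_read_changes() below estimates the in-cache footprint of a change
 * list from a fixed per-entry guess and skips caching anything that would
 * not fit anyway.  Minimal stand-alone form of that gate; the 250-byte
 * per-change estimate mirrors the code, the 512kB cap is invented: */
#include <stdio.h>

#define PER_CHANGE_ESTIMATE 250u             /* rough bytes per cached change */
#define CACHE_ITEM_LIMIT    (512u * 1024u)   /* assumed per-item capacity */

static int
should_cache_changes(unsigned int change_count)
{
  unsigned long estimated_size = (unsigned long)PER_CHANGE_ESTIMATE
                                 * change_count;
  return estimated_size <= CACHE_ITEM_LIMIT;
}

int main(void)
{
  printf("%d\n", should_cache_changes(100));     /* 1: ~25kB, cache it */
  printf("%d\n", should_cache_changes(10000));   /* 0: ~2.5MB, skip    */
  return 0;
}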
*/ + if (svn_cache__is_cachable(ffd->changes_cache, estimated_size)) + SVN_ERR(svn_cache__set(ffd->changes_cache, &revision, *changes, + scratch_pool)); + } + + return SVN_NO_ERROR; +} + +static svn_error_t * +block_read_changes_container(apr_array_header_t **changes, + svn_fs_t *fs, + svn_fs_x__revision_file_t *rev_file, + svn_fs_x__p2l_entry_t* entry, + apr_uint32_t sub_item, + svn_boolean_t must_read, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_fs_x__changes_t *container; + svn_fs_x__pair_cache_key_t key; + svn_stream_t *stream; + svn_revnum_t revision = svn_fs_x__get_revnum(entry->items[0].change_set); + + key.revision = svn_fs_x__packed_base_rev(fs, revision); + key.second = entry->offset; + + /* already in cache? */ + if (!must_read && ffd->changes_container_cache) + { + svn_boolean_t is_cached = FALSE; + SVN_ERR(svn_cache__has_key(&is_cached, ffd->changes_container_cache, + &key, scratch_pool)); + if (is_cached) + return SVN_NO_ERROR; + } + + SVN_ERR(read_item(&stream, fs, rev_file, entry, scratch_pool)); + + /* read changes from revision file */ + + SVN_ERR(svn_fs_x__read_changes_container(&container, stream, scratch_pool, + scratch_pool)); + + /* extract requested data */ + + if (must_read) + SVN_ERR(svn_fs_x__changes_get_list(changes, container, sub_item, + result_pool)); + + if (ffd->changes_container_cache) + SVN_ERR(svn_cache__set(ffd->changes_container_cache, &key, container, + scratch_pool)); + + return SVN_NO_ERROR; +} + +static svn_error_t * +block_read_noderev(svn_fs_x__noderev_t **noderev_p, + svn_fs_t *fs, + svn_fs_x__revision_file_t *rev_file, + svn_fs_x__p2l_entry_t* entry, + svn_fs_x__pair_cache_key_t *key, + svn_boolean_t must_read, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_stream_t *stream; + if (!must_read && !ffd->node_revision_cache) + return SVN_NO_ERROR; + + /* we don't support containers, yet */ + SVN_ERR_ASSERT(entry->item_count == 1); + + /* already in cache? */ + if (!must_read && ffd->node_revision_cache) + { + svn_boolean_t is_cached = FALSE; + SVN_ERR(svn_cache__has_key(&is_cached, ffd->node_revision_cache, key, + scratch_pool)); + if (is_cached) + return SVN_NO_ERROR; + } + + SVN_ERR(read_item(&stream, fs, rev_file, entry, scratch_pool)); + + /* read node rev from revision file */ + + SVN_ERR(svn_fs_x__read_noderev(noderev_p, stream, result_pool, + scratch_pool)); + if (ffd->node_revision_cache) + SVN_ERR(svn_cache__set(ffd->node_revision_cache, key, *noderev_p, + scratch_pool)); + + return SVN_NO_ERROR; +} + +static svn_error_t * +block_read_noderevs_container(svn_fs_x__noderev_t **noderev_p, + svn_fs_t *fs, + svn_fs_x__revision_file_t *rev_file, + svn_fs_x__p2l_entry_t* entry, + apr_uint32_t sub_item, + svn_boolean_t must_read, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_fs_x__noderevs_t *container; + svn_stream_t *stream; + svn_fs_x__pair_cache_key_t key; + svn_revnum_t revision = svn_fs_x__get_revnum(entry->items[0].change_set); + + key.revision = svn_fs_x__packed_base_rev(fs, revision); + key.second = entry->offset; + + /* already in cache? 
*/ + if (!must_read && ffd->noderevs_container_cache) + { + svn_boolean_t is_cached = FALSE; + SVN_ERR(svn_cache__has_key(&is_cached, ffd->noderevs_container_cache, + &key, scratch_pool)); + if (is_cached) + return SVN_NO_ERROR; + } + + SVN_ERR(read_item(&stream, fs, rev_file, entry, scratch_pool)); + + /* read noderevs from revision file */ + SVN_ERR(svn_fs_x__read_noderevs_container(&container, stream, scratch_pool, + scratch_pool)); + + /* extract requested data */ + if (must_read) + SVN_ERR(svn_fs_x__noderevs_get(noderev_p, container, sub_item, + result_pool)); + + if (ffd->noderevs_container_cache) + SVN_ERR(svn_cache__set(ffd->noderevs_container_cache, &key, container, + scratch_pool)); + + return SVN_NO_ERROR; +} + +static svn_error_t * +block_read_reps_container(svn_fs_x__rep_extractor_t **extractor, + svn_fs_t *fs, + svn_fs_x__revision_file_t *rev_file, + svn_fs_x__p2l_entry_t* entry, + apr_uint32_t sub_item, + svn_boolean_t must_read, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_fs_x__reps_t *container; + svn_stream_t *stream; + svn_fs_x__pair_cache_key_t key; + svn_revnum_t revision = svn_fs_x__get_revnum(entry->items[0].change_set); + + key.revision = svn_fs_x__packed_base_rev(fs, revision); + key.second = entry->offset; + + /* already in cache? */ + if (!must_read && ffd->reps_container_cache) + { + svn_boolean_t is_cached = FALSE; + SVN_ERR(svn_cache__has_key(&is_cached, ffd->reps_container_cache, + &key, scratch_pool)); + if (is_cached) + return SVN_NO_ERROR; + } + + SVN_ERR(read_item(&stream, fs, rev_file, entry, scratch_pool)); + + /* read noderevs from revision file */ + SVN_ERR(svn_fs_x__read_reps_container(&container, stream, result_pool, + scratch_pool)); + + /* extract requested data */ + + if (must_read) + SVN_ERR(svn_fs_x__reps_get(extractor, fs, container, sub_item, + result_pool)); + + if (ffd->noderevs_container_cache) + SVN_ERR(svn_cache__set(ffd->reps_container_cache, &key, container, + scratch_pool)); + + return SVN_NO_ERROR; +} + +static svn_error_t * +block_read(void **result, + svn_fs_t *fs, + const svn_fs_x__id_t *id, + svn_fs_x__revision_file_t *revision_file, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + apr_off_t offset, wanted_offset = 0; + apr_off_t block_start = 0; + apr_uint32_t wanted_sub_item = 0; + svn_revnum_t revision = svn_fs_x__get_revnum(id->change_set); + apr_array_header_t *entries; + int run_count = 0; + int i; + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + + /* don't try this on transaction protorev files */ + SVN_ERR_ASSERT(SVN_IS_VALID_REVNUM(revision)); + + /* index lookup: find the OFFSET of the item we *must* read plus (in the + * "do-while" block) the list of items in the same block. 
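/* Editorial aside -- illustrative sketch, not part of this patch.
 * block_read() below turns a single item lookup into a prefetch: it aligns
 * the wanted offset down to a block boundary and then processes every item
 * that starts inside that block (the real loop has further conditions for
 * oversized items and block-boundary crossings).  Stand-alone model of the
 * grouping step; offsets and the block size are invented: */
#include <stdio.h>

#define BLOCK_SIZE 0x10000L            /* assumed block size */

typedef struct { long offset; long size; } item_t;

/* Print the indexes of all items that start in the same block as
 * WANTED_OFFSET and would therefore be read (and cached) together. */
static void
items_in_block(const item_t *items, int count, long wanted_offset)
{
  long block_start = wanted_offset - (wanted_offset % BLOCK_SIZE);
  long block_end = block_start + BLOCK_SIZE;
  int i;

  for (i = 0; i < count; ++i)
    if (items[i].offset >= block_start && items[i].offset < block_end)
      printf("prefetch item %d at offset %ld\n", i, items[i].offset);
}

int main(void)
{
  const item_t items[] = { { 0x00100, 0x300 }, { 0x0fd00, 0x600 },
                           { 0x10200, 0x200 }, { 0x20100, 0x100 } };
  items_in_block(items, 4, 0x10200);   /* prefetches item 2 only   */
  items_in_block(items, 4, 0x00100);   /* prefetches items 0 and 1 */
  return 0;
}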
*/ + SVN_ERR(svn_fs_x__item_offset(&wanted_offset, &wanted_sub_item, fs, + revision_file, id, iterpool)); + + offset = wanted_offset; + do + { + /* fetch list of items in the block surrounding OFFSET */ + SVN_ERR(aligned_seek(fs, revision_file->file, &block_start, offset, + iterpool)); + SVN_ERR(svn_fs_x__p2l_index_lookup(&entries, fs, revision_file, + revision, block_start, + ffd->block_size, scratch_pool, + scratch_pool)); + + /* read all items from the block */ + for (i = 0; i < entries->nelts; ++i) + { + svn_boolean_t is_result, is_wanted; + apr_pool_t *pool; + + svn_fs_x__p2l_entry_t* entry + = &APR_ARRAY_IDX(entries, i, svn_fs_x__p2l_entry_t); + + /* skip empty sections */ + if (entry->type == SVN_FS_X__ITEM_TYPE_UNUSED) + continue; + + /* the item / container we were looking for? */ + is_wanted = entry->offset == wanted_offset + && entry->item_count >= wanted_sub_item + && svn_fs_x__id_eq(entry->items + wanted_sub_item, id); + is_result = result && is_wanted; + + /* select the pool that we want the item to be allocated in */ + pool = is_result ? result_pool : iterpool; + + /* handle all items that start within this block and are relatively + * small (i.e. < block size). Always read the item we need to return. + */ + if (is_result || ( entry->offset >= block_start + && entry->size < ffd->block_size)) + { + void *item = NULL; + svn_fs_x__pair_cache_key_t key = { 0 }; + key.revision = svn_fs_x__get_revnum(entry->items[0].change_set); + key.second = entry->items[0].number; + + SVN_ERR(svn_io_file_seek(revision_file->file, SEEK_SET, + &entry->offset, iterpool)); + switch (entry->type) + { + case SVN_FS_X__ITEM_TYPE_FILE_REP: + case SVN_FS_X__ITEM_TYPE_DIR_REP: + case SVN_FS_X__ITEM_TYPE_FILE_PROPS: + case SVN_FS_X__ITEM_TYPE_DIR_PROPS: + SVN_ERR(block_read_contents(fs, revision_file, + entry, &key, + is_wanted + ? 
-1 + : block_start + ffd->block_size, + iterpool)); + break; + + case SVN_FS_X__ITEM_TYPE_NODEREV: + if (ffd->node_revision_cache || is_result) + SVN_ERR(block_read_noderev((svn_fs_x__noderev_t **)&item, + fs, revision_file, + entry, &key, is_result, + pool, iterpool)); + break; + + case SVN_FS_X__ITEM_TYPE_CHANGES: + SVN_ERR(block_read_changes((apr_array_header_t **)&item, + fs, revision_file, + entry, is_result, + pool, iterpool)); + break; + + case SVN_FS_X__ITEM_TYPE_CHANGES_CONT: + SVN_ERR(block_read_changes_container + ((apr_array_header_t **)&item, + fs, revision_file, + entry, wanted_sub_item, + is_result, pool, iterpool)); + break; + + case SVN_FS_X__ITEM_TYPE_NODEREVS_CONT: + SVN_ERR(block_read_noderevs_container + ((svn_fs_x__noderev_t **)&item, + fs, revision_file, + entry, wanted_sub_item, + is_result, pool, iterpool)); + break; + + case SVN_FS_X__ITEM_TYPE_REPS_CONT: + SVN_ERR(block_read_reps_container + ((svn_fs_x__rep_extractor_t **)&item, + fs, revision_file, + entry, wanted_sub_item, + is_result, pool, iterpool)); + break; + + default: + break; + } + + if (is_result) + *result = item; + + /* if we crossed a block boundary, read the remainder of + * the last block as well */ + offset = entry->offset + entry->size; + if (offset > block_start + ffd->block_size) + ++run_count; + + svn_pool_clear(iterpool); + } + } + } + while(run_count++ == 1); /* can only be true once and only if a block + * boundary got crossed */ + + /* if the caller requested a result, we must have provided one by now */ + assert(!result || *result); + svn_pool_destroy(iterpool); + + return SVN_NO_ERROR; +} diff --git a/subversion/libsvn_fs_x/cached_data.h b/subversion/libsvn_fs_x/cached_data.h new file mode 100644 index 0000000..079303e --- /dev/null +++ b/subversion/libsvn_fs_x/cached_data.h @@ -0,0 +1,180 @@ +/* cached_data.h --- cached (read) access to FSX data + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#ifndef SVN_LIBSVN_FS__CACHED_DATA_H +#define SVN_LIBSVN_FS__CACHED_DATA_H + +#include "svn_pools.h" +#include "svn_fs.h" + +#include "fs.h" +#include "index.h" + + + +/* Set *NODEREV_P to the node-revision for the node ID in FS. Do any + allocations in POOL. */ +svn_error_t * +svn_fs_x__get_node_revision(svn_fs_x__noderev_t **noderev_p, + svn_fs_t *fs, + const svn_fs_x__id_t *id, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + +/* Set *COUNT to the value of the mergeinfo_count member of the node- + revision for the node ID in FS. Do temporary allocations in SCRATCH_POOL. 
+ */ +svn_error_t * +svn_fs_x__get_mergeinfo_count(apr_int64_t *count, + svn_fs_t *fs, + const svn_fs_x__id_t *id, + apr_pool_t *scratch_pool); + +/* Verify that representation REP in FS can be accessed. + Do any allocations in SCRATCH_POOL. */ +svn_error_t * +svn_fs_x__check_rep(svn_fs_x__representation_t *rep, + svn_fs_t *fs, + apr_pool_t *scratch_pool); + +/* Follow the representation delta chain in FS starting with REP. The + number of reps (including REP) in the chain will be returned in + *CHAIN_LENGTH. *SHARD_COUNT will be set to the number of shards + accessed. Do any allocations in SCRATCH_POOL. */ +svn_error_t * +svn_fs_x__rep_chain_length(int *chain_length, + int *shard_count, + svn_fs_x__representation_t *rep, + svn_fs_t *fs, + apr_pool_t *scratch_pool); + +/* Set *CONTENTS to be a readable svn_stream_t that receives the text + representation REP as seen in filesystem FS. If CACHE_FULLTEXT is + not set, bypass fulltext cache lookup for this rep and don't put the + reconstructed fulltext into cache. + Allocate *CONTENT_P in RESULT_POOL. */ +svn_error_t * +svn_fs_x__get_contents(svn_stream_t **contents_p, + svn_fs_t *fs, + svn_fs_x__representation_t *rep, + svn_boolean_t cache_fulltext, + apr_pool_t *result_pool); + +/* Determine on-disk and expanded sizes of the representation identified + * by ENTRY in FS and return the result in PACKED_LEN and EXPANDED_LEN, + * respectively. FILE must point to the start of the representation and + * STREAM must be a stream defined on top of FILE. + * Use SCRATCH_POOL for temporary allocations. + */ +svn_error_t * +svn_fs_x__get_representation_length(svn_filesize_t *packed_len, + svn_filesize_t *expanded_len, + svn_fs_t *fs, + svn_fs_x__revision_file_t *rev_file, + svn_fs_x__p2l_entry_t* entry, + apr_pool_t *scratch_pool); + +/* Attempt to fetch the text representation of node-revision NODEREV as + seen in filesystem FS and pass it along with the BATON to the PROCESSOR. + Set *SUCCESS only of the data could be provided and the processing + had been called. + Use SCRATCH_POOL for temporary allocations. + */ +svn_error_t * +svn_fs_x__try_process_file_contents(svn_boolean_t *success, + svn_fs_t *fs, + svn_fs_x__noderev_t *noderev, + svn_fs_process_contents_func_t processor, + void* baton, + apr_pool_t *scratch_pool); + +/* Set *STREAM_P to a delta stream turning the contents of the file SOURCE + into the contents of the file TARGET, allocated in RESULT_POOL. + If SOURCE is NULL, an empty string will be used in its stead. + Use SCRATCH_POOL for temporary allocations. + */ +svn_error_t * +svn_fs_x__get_file_delta_stream(svn_txdelta_stream_t **stream_p, + svn_fs_t *fs, + svn_fs_x__noderev_t *source, + svn_fs_x__noderev_t *target, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + +/* Set *ENTRIES to an apr_array_header_t of dirent structs that contain + the directory entries of node-revision NODEREV in filesystem FS. The + returned table is allocated in RESULT_POOL and entries are sorted + lexicographically. SCRATCH_POOL is used for temporary allocations. */ +svn_error_t * +svn_fs_x__rep_contents_dir(apr_array_header_t **entries_p, + svn_fs_t *fs, + svn_fs_x__noderev_t *noderev, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + +/* Return the directory entry from ENTRIES that matches NAME. If no such + entry exists, return NULL. If HINT is not NULL, set *HINT to the array + index of the entry returned. Successive calls in a linear scan scenario + will be faster called with the same HINT variable. 
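/* Editorial aside -- illustrative usage sketch, not part of this patch and
 * not compilable in isolation.  It only shows how the HINT parameter
 * documented above is meant to be reused across a name-ordered scan; FS,
 * NODEREV, the NAMES array and both pools are assumed to exist in the
 * caller. */
static svn_error_t *
lookup_in_name_order(svn_fs_t *fs,
                     svn_fs_x__noderev_t *noderev,
                     const char **names,        /* sorted, NULL-terminated */
                     apr_pool_t *result_pool,
                     apr_pool_t *scratch_pool)
{
  apr_size_t hint = APR_SIZE_MAX;   /* "no previous lookup", per the docs */
  int i;

  for (i = 0; names[i]; ++i)
    {
      svn_fs_x__dirent_t *dirent;

      /* Passing the same HINT back in lets consecutive lookups start
       * from the previous position instead of doing a full search. */
      SVN_ERR(svn_fs_x__rep_contents_dir_entry(&dirent, fs, noderev,
                                               names[i], &hint,
                                               result_pool, scratch_pool));
      if (dirent == NULL)
        return svn_error_createf(SVN_ERR_FS_NOT_FOUND, NULL,
                                 "entry '%s' not found", names[i]);
    }

  return SVN_NO_ERROR;
}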
*/ +svn_fs_x__dirent_t * +svn_fs_x__find_dir_entry(apr_array_header_t *entries, + const char *name, + int *hint); + +/* Set *DIRENT to the entry identified by NAME in the directory given + by NODEREV in filesystem FS. If no such entry exits, *DIRENT will + be NULL. The value referenced by HINT can be used to speed up + consecutive calls when travering the directory in name order. + Any value is allowed, however APR_SIZE_MAX gives best performance + when there has been no previous lookup for the same directory. + + The returned object is allocated in RESULT_POOL; SCRATCH_POOL + used for temporary allocations. */ +svn_error_t * +svn_fs_x__rep_contents_dir_entry(svn_fs_x__dirent_t **dirent, + svn_fs_t *fs, + svn_fs_x__noderev_t *noderev, + const char *name, + apr_size_t *hint, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + +/* Set *PROPLIST to be an apr_hash_t containing the property list of + node-revision NODEREV as seen in filesystem FS. Allocate the result + in RESULT_POOL and use SCRATCH_POOL for temporary allocations. */ +svn_error_t * +svn_fs_x__get_proplist(apr_hash_t **proplist, + svn_fs_t *fs, + svn_fs_x__noderev_t *noderev, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + +/* Fetch the list of change in revision REV in FS and return it in *CHANGES. + * Allocate the result in POOL. + */ +svn_error_t * +svn_fs_x__get_changes(apr_array_header_t **changes, + svn_fs_t *fs, + svn_revnum_t rev, + apr_pool_t *pool); + +#endif diff --git a/subversion/libsvn_fs_x/caching.c b/subversion/libsvn_fs_x/caching.c new file mode 100644 index 0000000..17e80bd --- /dev/null +++ b/subversion/libsvn_fs_x/caching.c @@ -0,0 +1,725 @@ +/* caching.c : in-memory caching + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#include "fs.h" +#include "fs_x.h" +#include "id.h" +#include "dag.h" +#include "tree.h" +#include "index.h" +#include "changes.h" +#include "noderevs.h" +#include "temp_serializer.h" +#include "reps.h" +#include "../libsvn_fs/fs-loader.h" + +#include "svn_config.h" +#include "svn_cache_config.h" + +#include "svn_private_config.h" +#include "svn_hash.h" +#include "svn_pools.h" + +#include "private/svn_debug.h" +#include "private/svn_subr_private.h" + +/* Take the ORIGINAL string and replace all occurrences of ":" without + * limiting the key space. Allocate the result in RESULT_POOL. 
+ */ +static const char * +normalize_key_part(const char *original, + apr_pool_t *result_pool) +{ + apr_size_t i; + apr_size_t len = strlen(original); + svn_stringbuf_t *normalized = svn_stringbuf_create_ensure(len, + result_pool); + + for (i = 0; i < len; ++i) + { + char c = original[i]; + switch (c) + { + case ':': svn_stringbuf_appendbytes(normalized, "%_", 2); + break; + case '%': svn_stringbuf_appendbytes(normalized, "%%", 2); + break; + default : svn_stringbuf_appendbyte(normalized, c); + } + } + + return normalized->data; +} + +/* *CACHE_TXDELTAS, *CACHE_FULLTEXTS and *CACHE_REVPROPS flags will be set + according to FS->CONFIG. *CACHE_NAMESPACE receives the cache prefix + to use. + + Allocate CACHE_NAMESPACE in RESULT_POOL. */ +static svn_error_t * +read_config(const char **cache_namespace, + svn_boolean_t *cache_txdeltas, + svn_boolean_t *cache_fulltexts, + svn_boolean_t *cache_revprops, + svn_fs_t *fs, + apr_pool_t *result_pool) +{ + /* No cache namespace by default. I.e. all FS instances share the + * cached data. If you specify different namespaces, the data will + * share / compete for the same cache memory but keys will not match + * across namespaces and, thus, cached data will not be shared between + * namespaces. + * + * Since the namespace will be concatenated with other elements to form + * the complete key prefix, we must make sure that the resulting string + * is unique and cannot be created by any other combination of elements. + */ + *cache_namespace + = normalize_key_part(svn_hash__get_cstring(fs->config, + SVN_FS_CONFIG_FSFS_CACHE_NS, + ""), + result_pool); + + /* don't cache text deltas by default. + * Once we reconstructed the fulltexts from the deltas, + * these deltas are rarely re-used. Therefore, only tools + * like svnadmin will activate this to speed up operations + * dump and verify. + */ + *cache_txdeltas + = svn_hash__get_bool(fs->config, + SVN_FS_CONFIG_FSFS_CACHE_DELTAS, + TRUE); + + /* by default, cache fulltexts. + * Most SVN tools care about reconstructed file content. + * Thus, this is a reasonable default. + * SVN admin tools may set that to FALSE because fulltexts + * won't be re-used rendering the cache less effective + * by squeezing wanted data out. + */ + *cache_fulltexts + = svn_hash__get_bool(fs->config, + SVN_FS_CONFIG_FSFS_CACHE_FULLTEXTS, + TRUE); + + /* don't cache revprops by default. + * Revprop caching significantly speeds up operations like + * svn ls -v. However, it requires synchronization that may + * not be available or efficient in the current server setup. + * Option "2" is equivalent to "1". + */ + if (strcmp(svn_hash__get_cstring(fs->config, + SVN_FS_CONFIG_FSFS_CACHE_REVPROPS, + ""), "2")) + *cache_revprops + = svn_hash__get_bool(fs->config, + SVN_FS_CONFIG_FSFS_CACHE_REVPROPS, + FALSE); + else + *cache_revprops = TRUE; + + return SVN_NO_ERROR; +} + + +/* Implements svn_cache__error_handler_t + * This variant clears the error after logging it. + */ +static svn_error_t * +warn_and_continue_on_cache_errors(svn_error_t *err, + void *baton, + apr_pool_t *pool) +{ + svn_fs_t *fs = baton; + (fs->warning)(fs->warning_baton, err); + svn_error_clear(err); + + return SVN_NO_ERROR; +} + +/* Implements svn_cache__error_handler_t + * This variant logs the error and passes it on to the callers. 
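
   Which variant (if any) gets attached to a cache is decided in
   create_cache() below: when the filesystem was opened in fail-stop
   mode, NO_HANDLER is passed as TRUE and no handler is registered at
   all, so cache errors abort the running operation; otherwise,
   memcache-backed caches get warn_and_continue_on_cache_errors(),
   presumably so that a memcached hiccup only degrades performance,
   while every other cache gets this fail-fast variant.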
+ */ +static svn_error_t * +warn_and_fail_on_cache_errors(svn_error_t *err, + void *baton, + apr_pool_t *pool) +{ + svn_fs_t *fs = baton; + (fs->warning)(fs->warning_baton, err); + return err; +} + +#ifdef SVN_DEBUG_CACHE_DUMP_STATS +/* Baton to be used for the dump_cache_statistics() pool cleanup function, */ +typedef struct dump_cache_baton_t +{ + /* the pool about to be cleaned up. Will be used for temp. allocations. */ + apr_pool_t *pool; + + /* the cache to dump the statistics for */ + svn_cache__t *cache; +} dump_cache_baton_t; + +/* APR pool cleanup handler that will printf the statistics of the + cache referenced by the baton in BATON_VOID. */ +static apr_status_t +dump_cache_statistics(void *baton_void) +{ + dump_cache_baton_t *baton = baton_void; + + apr_status_t result = APR_SUCCESS; + svn_cache__info_t info; + svn_string_t *text_stats; + apr_array_header_t *lines; + int i; + + svn_error_t *err = svn_cache__get_info(baton->cache, + &info, + TRUE, + baton->pool); + + /* skip unused caches */ + if (! err && (info.gets > 0 || info.sets > 0)) + { + text_stats = svn_cache__format_info(&info, TRUE, baton->pool); + lines = svn_cstring_split(text_stats->data, "\n", FALSE, baton->pool); + + for (i = 0; i < lines->nelts; ++i) + { + const char *line = APR_ARRAY_IDX(lines, i, const char *); +#ifdef SVN_DEBUG + SVN_DBG(("%s\n", line)); +#endif + } + } + + /* process error returns */ + if (err) + { + result = err->apr_err; + svn_error_clear(err); + } + + return result; +} + +static apr_status_t +dump_global_cache_statistics(void *baton_void) +{ + apr_pool_t *pool = baton_void; + + svn_cache__info_t *info = svn_cache__membuffer_get_global_info(pool); + svn_string_t *text_stats = svn_cache__format_info(info, FALSE, pool); + apr_array_header_t *lines = svn_cstring_split(text_stats->data, "\n", + FALSE, pool); + + int i; + for (i = 0; i < lines->nelts; ++i) + { + const char *line = APR_ARRAY_IDX(lines, i, const char *); +#ifdef SVN_DEBUG + SVN_DBG(("%s\n", line)); +#endif + } + + return APR_SUCCESS; +} + +#endif /* SVN_DEBUG_CACHE_DUMP_STATS */ + +/* This function sets / registers the required callbacks for a given + * not transaction-specific CACHE object in FS, if CACHE is not NULL. + * + * All these svn_cache__t instances shall be handled uniformly. Unless + * ERROR_HANDLER is NULL, register it for the given CACHE in FS. + */ +static svn_error_t * +init_callbacks(svn_cache__t *cache, + svn_fs_t *fs, + svn_cache__error_handler_t error_handler, + apr_pool_t *pool) +{ + if (cache != NULL) + { +#ifdef SVN_DEBUG_CACHE_DUMP_STATS + + /* schedule printing the access statistics upon pool cleanup, + * i.e. end of FSX session. + */ + dump_cache_baton_t *baton; + + baton = apr_palloc(pool, sizeof(*baton)); + baton->pool = pool; + baton->cache = cache; + + apr_pool_cleanup_register(pool, + baton, + dump_cache_statistics, + apr_pool_cleanup_null); +#endif + + if (error_handler) + SVN_ERR(svn_cache__set_error_handler(cache, + error_handler, + fs, + pool)); + + } + + return SVN_NO_ERROR; +} + +/* Sets *CACHE_P to cache instance based on provided options. + * Creates memcache if MEMCACHE is not NULL. Creates membuffer cache if + * MEMBUFFER is not NULL. Fallbacks to inprocess cache if MEMCACHE and + * MEMBUFFER are NULL and pages is non-zero. Sets *CACHE_P to NULL + * otherwise. Use the given PRIORITY class for the new cache. If it + * is 0, then use the default priority class. + * + * Unless NO_HANDLER is true, register an error handler that reports errors + * as warnings to the FS warning callback. 
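
   Note that PAGES and ITEMS_PER_PAGE only matter for the inprocess
   fallback; the memcache and membuffer variants size themselves and
   ignore those two arguments.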
+ * + * Cache is allocated in RESULT_POOL, temporaries in SCRATCH_POOL. + * */ +static svn_error_t * +create_cache(svn_cache__t **cache_p, + svn_memcache_t *memcache, + svn_membuffer_t *membuffer, + apr_int64_t pages, + apr_int64_t items_per_page, + svn_cache__serialize_func_t serializer, + svn_cache__deserialize_func_t deserializer, + apr_ssize_t klen, + const char *prefix, + apr_uint32_t priority, + svn_fs_t *fs, + svn_boolean_t no_handler, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_cache__error_handler_t error_handler = no_handler + ? NULL + : warn_and_fail_on_cache_errors; + if (priority == 0) + priority = SVN_CACHE__MEMBUFFER_DEFAULT_PRIORITY; + + if (memcache) + { + SVN_ERR(svn_cache__create_memcache(cache_p, memcache, + serializer, deserializer, klen, + prefix, result_pool)); + error_handler = no_handler + ? NULL + : warn_and_continue_on_cache_errors; + } + else if (membuffer) + { + SVN_ERR(svn_cache__create_membuffer_cache( + cache_p, membuffer, serializer, deserializer, + klen, prefix, priority, FALSE, result_pool, scratch_pool)); + } + else if (pages) + { + SVN_ERR(svn_cache__create_inprocess( + cache_p, serializer, deserializer, klen, pages, + items_per_page, FALSE, prefix, result_pool)); + } + else + { + *cache_p = NULL; + } + + SVN_ERR(init_callbacks(*cache_p, fs, error_handler, result_pool)); + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__initialize_caches(svn_fs_t *fs, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + const char *prefix = apr_pstrcat(scratch_pool, + "fsx:", fs->uuid, + "/", normalize_key_part(fs->path, + scratch_pool), + ":", + SVN_VA_NULL); + svn_membuffer_t *membuffer; + svn_boolean_t no_handler = ffd->fail_stop; + svn_boolean_t cache_txdeltas; + svn_boolean_t cache_fulltexts; + svn_boolean_t cache_revprops; + const char *cache_namespace; + + /* Evaluating the cache configuration. */ + SVN_ERR(read_config(&cache_namespace, + &cache_txdeltas, + &cache_fulltexts, + &cache_revprops, + fs, + scratch_pool)); + + prefix = apr_pstrcat(scratch_pool, "ns:", cache_namespace, ":", prefix, + SVN_VA_NULL); + + membuffer = svn_cache__get_global_membuffer_cache(); + + /* General rules for assigning cache priorities: + * + * - Data that can be reconstructed from other elements has low prio + * (e.g. fulltexts, directories etc.) + * - Index data required to find any of the other data has high prio + * (e.g. noderevs, L2P and P2L index pages) + * - everthing else should use default prio + */ + +#ifdef SVN_DEBUG_CACHE_DUMP_STATS + + /* schedule printing the global access statistics upon pool cleanup, + * i.e. end of FSX session. + */ + if (membuffer) + apr_pool_cleanup_register(fs->pool, + fs->pool, + dump_global_cache_statistics, + apr_pool_cleanup_null); +#endif + + /* Rough estimate: revision DAG nodes have size around 320 bytes, so + * let's put 16 on a page. */ + SVN_ERR(create_cache(&(ffd->rev_node_cache), + NULL, + membuffer, + 1024, 16, + svn_fs_x__dag_serialize, + svn_fs_x__dag_deserialize, + APR_HASH_KEY_STRING, + apr_pstrcat(scratch_pool, prefix, "DAG", SVN_VA_NULL), + SVN_CACHE__MEMBUFFER_LOW_PRIORITY, + fs, + no_handler, + fs->pool, scratch_pool)); + + /* 1st level DAG node cache */ + ffd->dag_node_cache = svn_fs_x__create_dag_cache(fs->pool); + + /* Very rough estimate: 1K per directory. 
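
   To make the key layout concrete (illustrative values only): for a
   repository with UUID <uuid> located at /srv/repo and no configured
   cache namespace, the directory cache created below ends up with the
   key prefix

     ns::fsx:<uuid>//srv/repo:DIR

   so its entries can never collide with those of other repositories or
   with the other caches sharing the same membuffer.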
*/ + SVN_ERR(create_cache(&(ffd->dir_cache), + NULL, + membuffer, + 1024, 8, + svn_fs_x__serialize_dir_entries, + svn_fs_x__deserialize_dir_entries, + sizeof(svn_fs_x__id_t), + apr_pstrcat(scratch_pool, prefix, "DIR", SVN_VA_NULL), + SVN_CACHE__MEMBUFFER_DEFAULT_PRIORITY, + fs, + no_handler, + fs->pool, scratch_pool)); + + /* Only 16 bytes per entry (a revision number + the corresponding offset). + Since we want ~8k pages, that means 512 entries per page. */ + SVN_ERR(create_cache(&(ffd->packed_offset_cache), + NULL, + membuffer, + 32, 1, + svn_fs_x__serialize_manifest, + svn_fs_x__deserialize_manifest, + sizeof(svn_revnum_t), + apr_pstrcat(scratch_pool, prefix, "PACK-MANIFEST", + SVN_VA_NULL), + SVN_CACHE__MEMBUFFER_HIGH_PRIORITY, + fs, + no_handler, + fs->pool, scratch_pool)); + + /* initialize node revision cache, if caching has been enabled */ + SVN_ERR(create_cache(&(ffd->node_revision_cache), + NULL, + membuffer, + 32, 32, /* ~200 byte / entry; 1k entries total */ + svn_fs_x__serialize_node_revision, + svn_fs_x__deserialize_node_revision, + sizeof(svn_fs_x__pair_cache_key_t), + apr_pstrcat(scratch_pool, prefix, "NODEREVS", + SVN_VA_NULL), + SVN_CACHE__MEMBUFFER_HIGH_PRIORITY, + fs, + no_handler, + fs->pool, scratch_pool)); + + /* initialize representation header cache, if caching has been enabled */ + SVN_ERR(create_cache(&(ffd->rep_header_cache), + NULL, + membuffer, + 1, 1000, /* ~8 bytes / entry; 1k entries total */ + svn_fs_x__serialize_rep_header, + svn_fs_x__deserialize_rep_header, + sizeof(svn_fs_x__representation_cache_key_t), + apr_pstrcat(scratch_pool, prefix, "REPHEADER", + SVN_VA_NULL), + SVN_CACHE__MEMBUFFER_DEFAULT_PRIORITY, + fs, + no_handler, + fs->pool, scratch_pool)); + + /* initialize node change list cache, if caching has been enabled */ + SVN_ERR(create_cache(&(ffd->changes_cache), + NULL, + membuffer, + 1, 8, /* 1k / entry; 8 entries total, rarely used */ + svn_fs_x__serialize_changes, + svn_fs_x__deserialize_changes, + sizeof(svn_revnum_t), + apr_pstrcat(scratch_pool, prefix, "CHANGES", + SVN_VA_NULL), + 0, + fs, + no_handler, + fs->pool, scratch_pool)); + + /* if enabled, cache fulltext and other derived information */ + if (cache_fulltexts) + { + SVN_ERR(create_cache(&(ffd->fulltext_cache), + ffd->memcache, + membuffer, + 0, 0, /* Do not use inprocess cache */ + /* Values are svn_stringbuf_t */ + NULL, NULL, + sizeof(svn_fs_x__pair_cache_key_t), + apr_pstrcat(scratch_pool, prefix, "TEXT", + SVN_VA_NULL), + SVN_CACHE__MEMBUFFER_DEFAULT_PRIORITY, + fs, + no_handler, + fs->pool, scratch_pool)); + + SVN_ERR(create_cache(&(ffd->properties_cache), + NULL, + membuffer, + 0, 0, /* Do not use inprocess cache */ + svn_fs_x__serialize_properties, + svn_fs_x__deserialize_properties, + sizeof(svn_fs_x__pair_cache_key_t), + apr_pstrcat(scratch_pool, prefix, "PROP", + SVN_VA_NULL), + SVN_CACHE__MEMBUFFER_DEFAULT_PRIORITY, + fs, + no_handler, + fs->pool, scratch_pool)); + + SVN_ERR(create_cache(&(ffd->mergeinfo_cache), + NULL, + membuffer, + 0, 0, /* Do not use inprocess cache */ + svn_fs_x__serialize_mergeinfo, + svn_fs_x__deserialize_mergeinfo, + APR_HASH_KEY_STRING, + apr_pstrcat(scratch_pool, prefix, "MERGEINFO", + SVN_VA_NULL), + 0, + fs, + no_handler, + fs->pool, scratch_pool)); + + SVN_ERR(create_cache(&(ffd->mergeinfo_existence_cache), + NULL, + membuffer, + 0, 0, /* Do not use inprocess cache */ + /* Values are svn_stringbuf_t */ + NULL, NULL, + APR_HASH_KEY_STRING, + apr_pstrcat(scratch_pool, prefix, "HAS_MERGEINFO", + SVN_VA_NULL), + 0, + fs, + no_handler, + 
fs->pool, scratch_pool)); + } + else + { + ffd->fulltext_cache = NULL; + ffd->properties_cache = NULL; + ffd->mergeinfo_cache = NULL; + ffd->mergeinfo_existence_cache = NULL; + } + + /* if enabled, cache revprops */ + if (cache_revprops) + { + SVN_ERR(create_cache(&(ffd->revprop_cache), + NULL, + membuffer, + 0, 0, /* Do not use inprocess cache */ + svn_fs_x__serialize_properties, + svn_fs_x__deserialize_properties, + sizeof(svn_fs_x__pair_cache_key_t), + apr_pstrcat(scratch_pool, prefix, "REVPROP", + SVN_VA_NULL), + SVN_CACHE__MEMBUFFER_DEFAULT_PRIORITY, + fs, + no_handler, + fs->pool, scratch_pool)); + } + else + { + ffd->revprop_cache = NULL; + } + + /* if enabled, cache text deltas and their combinations */ + if (cache_txdeltas) + { + SVN_ERR(create_cache(&(ffd->txdelta_window_cache), + NULL, + membuffer, + 0, 0, /* Do not use inprocess cache */ + svn_fs_x__serialize_txdelta_window, + svn_fs_x__deserialize_txdelta_window, + sizeof(svn_fs_x__window_cache_key_t), + apr_pstrcat(scratch_pool, prefix, "TXDELTA_WINDOW", + SVN_VA_NULL), + SVN_CACHE__MEMBUFFER_LOW_PRIORITY, + fs, + no_handler, + fs->pool, scratch_pool)); + + SVN_ERR(create_cache(&(ffd->combined_window_cache), + NULL, + membuffer, + 0, 0, /* Do not use inprocess cache */ + /* Values are svn_stringbuf_t */ + NULL, NULL, + sizeof(svn_fs_x__window_cache_key_t), + apr_pstrcat(scratch_pool, prefix, "COMBINED_WINDOW", + SVN_VA_NULL), + SVN_CACHE__MEMBUFFER_LOW_PRIORITY, + fs, + no_handler, + fs->pool, scratch_pool)); + } + else + { + ffd->txdelta_window_cache = NULL; + ffd->combined_window_cache = NULL; + } + + SVN_ERR(create_cache(&(ffd->noderevs_container_cache), + NULL, + membuffer, + 16, 4, /* Important, largish objects */ + svn_fs_x__serialize_noderevs_container, + svn_fs_x__deserialize_noderevs_container, + sizeof(svn_fs_x__pair_cache_key_t), + apr_pstrcat(scratch_pool, prefix, "NODEREVSCNT", + SVN_VA_NULL), + SVN_CACHE__MEMBUFFER_HIGH_PRIORITY, + fs, + no_handler, + fs->pool, scratch_pool)); + SVN_ERR(create_cache(&(ffd->changes_container_cache), + NULL, + membuffer, + 0, 0, /* Do not use inprocess cache */ + svn_fs_x__serialize_changes_container, + svn_fs_x__deserialize_changes_container, + sizeof(svn_fs_x__pair_cache_key_t), + apr_pstrcat(scratch_pool, prefix, "CHANGESCNT", + SVN_VA_NULL), + 0, + fs, + no_handler, + fs->pool, scratch_pool)); + SVN_ERR(create_cache(&(ffd->reps_container_cache), + NULL, + membuffer, + 0, 0, /* Do not use inprocess cache */ + svn_fs_x__serialize_reps_container, + svn_fs_x__deserialize_reps_container, + sizeof(svn_fs_x__pair_cache_key_t), + apr_pstrcat(scratch_pool, prefix, "REPSCNT", + SVN_VA_NULL), + 0, + fs, + no_handler, + fs->pool, scratch_pool)); + + SVN_ERR(create_cache(&(ffd->l2p_header_cache), + NULL, + membuffer, + 64, 16, /* entry size varies but we must cover + a reasonable number of revisions (1k) */ + svn_fs_x__serialize_l2p_header, + svn_fs_x__deserialize_l2p_header, + sizeof(svn_fs_x__pair_cache_key_t), + apr_pstrcat(scratch_pool, prefix, "L2P_HEADER", + SVN_VA_NULL), + SVN_CACHE__MEMBUFFER_HIGH_PRIORITY, + fs, + no_handler, + fs->pool, scratch_pool)); + SVN_ERR(create_cache(&(ffd->l2p_page_cache), + NULL, + membuffer, + 64, 16, /* entry size varies but we must cover + a reasonable number of revisions (1k) */ + svn_fs_x__serialize_l2p_page, + svn_fs_x__deserialize_l2p_page, + sizeof(svn_fs_x__page_cache_key_t), + apr_pstrcat(scratch_pool, prefix, "L2P_PAGE", + SVN_VA_NULL), + SVN_CACHE__MEMBUFFER_HIGH_PRIORITY, + fs, + no_handler, + fs->pool, scratch_pool)); + 
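
   The optional caches above (fulltexts, revprops, txdelta windows) are
   switched on and off through the FS config hash evaluated by
   read_config(). A sketch of how a client could set these options
   explicitly when opening the filesystem (illustrative only; the path,
   namespace and pool are assumptions of this example):

     apr_hash_t *fs_config = apr_hash_make(pool);
     svn_fs_t *fs;

     svn_hash_sets(fs_config, SVN_FS_CONFIG_FSFS_CACHE_DELTAS, "1");
     svn_hash_sets(fs_config, SVN_FS_CONFIG_FSFS_CACHE_REVPROPS, "2");
     svn_hash_sets(fs_config, SVN_FS_CONFIG_FSFS_CACHE_NS, "my-instance");

     SVN_ERR(svn_fs_open(&fs, "/srv/repo", fs_config, pool));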
SVN_ERR(create_cache(&(ffd->p2l_header_cache), + NULL, + membuffer, + 4, 1, /* Large entries. Rarely used. */ + svn_fs_x__serialize_p2l_header, + svn_fs_x__deserialize_p2l_header, + sizeof(svn_fs_x__pair_cache_key_t), + apr_pstrcat(scratch_pool, prefix, "P2L_HEADER", + SVN_VA_NULL), + SVN_CACHE__MEMBUFFER_HIGH_PRIORITY, + fs, + no_handler, + fs->pool, scratch_pool)); + SVN_ERR(create_cache(&(ffd->p2l_page_cache), + NULL, + membuffer, + 4, 16, /* Variably sized entries. Rarely used. */ + svn_fs_x__serialize_p2l_page, + svn_fs_x__deserialize_p2l_page, + sizeof(svn_fs_x__page_cache_key_t), + apr_pstrcat(scratch_pool, prefix, "P2L_PAGE", + SVN_VA_NULL), + SVN_CACHE__MEMBUFFER_HIGH_PRIORITY, + fs, + no_handler, + fs->pool, scratch_pool)); + + return SVN_NO_ERROR; +} diff --git a/subversion/libsvn_fs_x/changes.c b/subversion/libsvn_fs_x/changes.c new file mode 100644 index 0000000..a7d5ee2 --- /dev/null +++ b/subversion/libsvn_fs_x/changes.c @@ -0,0 +1,536 @@ +/* changes.h --- FSX changed paths lists container + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#include "svn_private_config.h" + +#include "private/svn_packed_data.h" + +#include "changes.h" +#include "string_table.h" +#include "temp_serializer.h" + +/* These flags will be used with the FLAGS field in binary_change_t. + */ + +/* the change contains a text modification */ +#define CHANGE_TEXT_MOD 0x00001 + +/* the change contains a property modification */ +#define CHANGE_PROP_MOD 0x00002 + +/* the last part (rev_id) of node revision ID is a transaction ID */ +#define CHANGE_TXN_NODE 0x00004 + +/* (flags & CHANGE_NODE_MASK) >> CHANGE_NODE_SHIFT extracts the node type */ +#define CHANGE_NODE_SHIFT 0x00003 +#define CHANGE_NODE_MASK 0x00018 + +/* node types according to svn_node_kind_t */ +#define CHANGE_NODE_NONE 0x00000 +#define CHANGE_NODE_FILE 0x00008 +#define CHANGE_NODE_DIR 0x00010 +#define CHANGE_NODE_UNKNOWN 0x00018 + +/* (flags & CHANGE_KIND_MASK) >> CHANGE_KIND_SHIFT extracts the change type */ +#define CHANGE_KIND_SHIFT 0x00005 +#define CHANGE_KIND_MASK 0x000E0 + +/* node types according to svn_fs_path_change_kind_t */ +#define CHANGE_KIND_MODIFY 0x00000 +#define CHANGE_KIND_ADD 0x00020 +#define CHANGE_KIND_DELETE 0x00040 +#define CHANGE_KIND_REPLACE 0x00060 +#define CHANGE_KIND_RESET 0x00080 +#define CHANGE_KIND_MOVE 0x000A0 +#define CHANGE_KIND_MOVEREPLACE 0x000C0 + +/* Our internal representation of a change */ +typedef struct binary_change_t +{ + /* define the kind of change and what specific information is present */ + int flags; + + /* Path of the change. */ + apr_size_t path; + + /* copy-from information. 
+ * Not present if COPYFROM_REV is SVN_INVALID_REVNUM. */
+ svn_revnum_t copyfrom_rev;
+ apr_size_t copyfrom_path;
+
+ /* Relevant parts of the node revision ID of the change.
+ * Empty, if REV_ID is not "used". */
+ svn_fs_x__id_t noderev_id;
+
+} binary_change_t;
+
+/* The actual container object. Change lists are concatenated into CHANGES
+ * and their start and end offsets are stored in OFFSETS.
+ */
+struct svn_fs_x__changes_t
+{
+ /* The paths - either in 'builder' mode or finalized mode.
+ * The respective other pointer will be NULL. */
+ string_table_builder_t *builder;
+ string_table_t *paths;
+
+ /* All changes of all change lists concatenated.
+ * Array elements are binary_change_t structs (not pointers!) */
+ apr_array_header_t *changes;
+
+ /* [Offsets[index] .. Offsets[index+1]) is the range in CHANGES that
+ * forms the contents of change list INDEX. */
+ apr_array_header_t *offsets;
+};
+
+/* Create and return a new container object, allocated in RESULT_POOL with
+ * an initial capacity of INITIAL_COUNT changes. The PATHS and BUILDER
+ * members must be initialized by the caller afterwards.
+ */
+static svn_fs_x__changes_t *
+changes_create_body(apr_size_t initial_count,
+ apr_pool_t *result_pool)
+{
+ svn_fs_x__changes_t *changes = apr_pcalloc(result_pool, sizeof(*changes));
+
+ changes->changes = apr_array_make(result_pool, (int)initial_count,
+ sizeof(binary_change_t));
+ changes->offsets = apr_array_make(result_pool, 16, sizeof(int));
+ APR_ARRAY_PUSH(changes->offsets, int) = 0;
+
+ return changes;
+}
+
+svn_fs_x__changes_t *
+svn_fs_x__changes_create(apr_size_t initial_count,
+ apr_pool_t *result_pool)
+{
+ svn_fs_x__changes_t *changes = changes_create_body(initial_count,
+ result_pool);
+ changes->builder = svn_fs_x__string_table_builder_create(result_pool);
+
+ return changes;
+}
+
+/* Add CHANGE to the latest change list in CHANGES.
+ */
+static svn_error_t *
+append_change(svn_fs_x__changes_t *changes,
+ svn_fs_x__change_t *change)
+{
+ binary_change_t binary_change = { 0 };
+ svn_boolean_t is_txn_id;
+
+ /* CHANGE must be sufficiently complete */
+ SVN_ERR_ASSERT(change);
+ SVN_ERR_ASSERT(change->path.data);
+
+ /* Relevant parts of the revision ID of the change. */
+ binary_change.noderev_id = change->noderev_id;
+
+ /* define the kind of change and what specific information is present */
+ is_txn_id = svn_fs_x__is_txn(binary_change.noderev_id.change_set);
+ binary_change.flags = (change->text_mod ? CHANGE_TEXT_MOD : 0)
+ | (change->prop_mod ? CHANGE_PROP_MOD : 0)
+ | (is_txn_id ? CHANGE_TXN_NODE : 0)
+ | ((int)change->change_kind << CHANGE_KIND_SHIFT)
+ | ((int)change->node_kind << CHANGE_NODE_SHIFT);
+
+ /* Path of the change. 
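
   As a worked example of the FLAGS value computed above (assuming the
   standard enumerator values svn_node_file == 1 and
   svn_fs_path_change_replace == 3): replacing a file and modifying its
   text packs to
   CHANGE_TEXT_MOD | (1 << CHANGE_NODE_SHIFT) | (3 << CHANGE_KIND_SHIFT),
   i.e. 0x0001 | 0x0008 | 0x0060 == 0x0069, which is
   CHANGE_TEXT_MOD | CHANGE_NODE_FILE | CHANGE_KIND_REPLACE.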
*/ + binary_change.path + = svn_fs_x__string_table_builder_add(changes->builder, + change->path.data, + change->path.len); + + /* copy-from information, if presence is indicated by FLAGS */ + if (SVN_IS_VALID_REVNUM(change->copyfrom_rev)) + { + binary_change.copyfrom_rev = change->copyfrom_rev; + binary_change.copyfrom_path + = svn_fs_x__string_table_builder_add(changes->builder, + change->copyfrom_path, + 0); + } + else + { + binary_change.copyfrom_rev = SVN_INVALID_REVNUM; + binary_change.copyfrom_path = 0; + } + + APR_ARRAY_PUSH(changes->changes, binary_change_t) = binary_change; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__changes_append_list(apr_size_t *list_index, + svn_fs_x__changes_t *changes, + apr_array_header_t *list) +{ + int i; + + /* CHANGES must be in 'builder' mode */ + SVN_ERR_ASSERT(changes->builder); + SVN_ERR_ASSERT(changes->paths == NULL); + + /* simply append the list and all changes */ + for (i = 0; i < list->nelts; ++i) + append_change(changes, APR_ARRAY_IDX(list, i, svn_fs_x__change_t *)); + + /* terminate the list by storing the next changes offset */ + APR_ARRAY_PUSH(changes->offsets, int) = changes->changes->nelts; + *list_index = (apr_size_t)(changes->offsets->nelts - 2); + + return SVN_NO_ERROR; +} + +apr_size_t +svn_fs_x__changes_estimate_size(const svn_fs_x__changes_t *changes) +{ + /* CHANGES must be in 'builder' mode */ + if (changes->builder == NULL) + return 0; + + /* string table code makes its own prediction, + * changes should be < 10 bytes each, + * some static overhead should be assumed */ + return svn_fs_x__string_table_builder_estimate_size(changes->builder) + + changes->changes->nelts * 10 + + 100; +} + +svn_error_t * +svn_fs_x__changes_get_list(apr_array_header_t **list, + const svn_fs_x__changes_t *changes, + apr_size_t idx, + apr_pool_t *pool) +{ + int first; + int last; + int i; + + /* CHANGES must be in 'finalized' mode */ + SVN_ERR_ASSERT(changes->builder == NULL); + SVN_ERR_ASSERT(changes->paths); + + /* validate index */ + if (idx + 1 >= (apr_size_t)changes->offsets->nelts) + return svn_error_createf(SVN_ERR_FS_CONTAINER_INDEX, NULL, + apr_psprintf(pool, + _("Changes list index %%%s" + " exceeds container size %%d"), + APR_SIZE_T_FMT), + idx, changes->offsets->nelts - 1); + + /* range of changes to return */ + first = APR_ARRAY_IDX(changes->offsets, (int)idx, int); + last = APR_ARRAY_IDX(changes->offsets, (int)idx + 1, int); + + /* construct result */ + *list = apr_array_make(pool, last - first, sizeof(svn_fs_x__change_t*)); + for (i = first; i < last; ++i) + { + const binary_change_t *binary_change + = &APR_ARRAY_IDX(changes->changes, i, binary_change_t); + + /* convert BINARY_CHANGE into a standard FSX svn_fs_x__change_t */ + svn_fs_x__change_t *change = apr_pcalloc(pool, sizeof(*change)); + change->path.data = svn_fs_x__string_table_get(changes->paths, + binary_change->path, + &change->path.len, + pool); + + if (binary_change->noderev_id.change_set != SVN_FS_X__INVALID_CHANGE_SET) + change->noderev_id = binary_change->noderev_id; + + change->change_kind = (svn_fs_path_change_kind_t) + ((binary_change->flags & CHANGE_KIND_MASK) >> CHANGE_KIND_SHIFT); + change->text_mod = (binary_change->flags & CHANGE_TEXT_MOD) != 0; + change->prop_mod = (binary_change->flags & CHANGE_PROP_MOD) != 0; + change->node_kind = (svn_node_kind_t) + ((binary_change->flags & CHANGE_NODE_MASK) >> CHANGE_NODE_SHIFT); + + change->copyfrom_rev = binary_change->copyfrom_rev; + change->copyfrom_known = TRUE; + if 
(SVN_IS_VALID_REVNUM(binary_change->copyfrom_rev)) + change->copyfrom_path + = svn_fs_x__string_table_get(changes->paths, + binary_change->copyfrom_path, + NULL, + pool); + + /* add it to the result */ + APR_ARRAY_PUSH(*list, svn_fs_x__change_t*) = change; + } + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__write_changes_container(svn_stream_t *stream, + const svn_fs_x__changes_t *changes, + apr_pool_t *scratch_pool) +{ + int i; + + string_table_t *paths = changes->paths + ? changes->paths + : svn_fs_x__string_table_create(changes->builder, + scratch_pool); + + svn_packed__data_root_t *root = svn_packed__data_create_root(scratch_pool); + + /* one top-level stream for each array */ + svn_packed__int_stream_t *offsets_stream + = svn_packed__create_int_stream(root, TRUE, FALSE); + svn_packed__int_stream_t *changes_stream + = svn_packed__create_int_stream(root, FALSE, FALSE); + + /* structure the CHANGES_STREAM such we can extract much of the redundancy + * from the binary_change_t structs */ + svn_packed__create_int_substream(changes_stream, TRUE, FALSE); + svn_packed__create_int_substream(changes_stream, TRUE, FALSE); + svn_packed__create_int_substream(changes_stream, TRUE, TRUE); + svn_packed__create_int_substream(changes_stream, TRUE, FALSE); + svn_packed__create_int_substream(changes_stream, TRUE, TRUE); + svn_packed__create_int_substream(changes_stream, TRUE, FALSE); + + /* serialize offsets array */ + for (i = 0; i < changes->offsets->nelts; ++i) + svn_packed__add_uint(offsets_stream, + APR_ARRAY_IDX(changes->offsets, i, int)); + + /* serialize changes array */ + for (i = 0; i < changes->changes->nelts; ++i) + { + const binary_change_t *change + = &APR_ARRAY_IDX(changes->changes, i, binary_change_t); + + svn_packed__add_uint(changes_stream, change->flags); + svn_packed__add_uint(changes_stream, change->path); + + svn_packed__add_int(changes_stream, change->copyfrom_rev); + svn_packed__add_uint(changes_stream, change->copyfrom_path); + + svn_packed__add_int(changes_stream, change->noderev_id.change_set); + svn_packed__add_uint(changes_stream, change->noderev_id.number); + } + + /* write to disk */ + SVN_ERR(svn_fs_x__write_string_table(stream, paths, scratch_pool)); + SVN_ERR(svn_packed__data_write(stream, root, scratch_pool)); + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__read_changes_container(svn_fs_x__changes_t **changes_p, + svn_stream_t *stream, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + apr_size_t i; + apr_size_t count; + + svn_fs_x__changes_t *changes = apr_pcalloc(result_pool, sizeof(*changes)); + + svn_packed__data_root_t *root; + svn_packed__int_stream_t *offsets_stream; + svn_packed__int_stream_t *changes_stream; + + /* read from disk */ + SVN_ERR(svn_fs_x__read_string_table(&changes->paths, stream, + result_pool, scratch_pool)); + + SVN_ERR(svn_packed__data_read(&root, stream, result_pool, scratch_pool)); + offsets_stream = svn_packed__first_int_stream(root); + changes_stream = svn_packed__next_int_stream(offsets_stream); + + /* read offsets array */ + count = svn_packed__int_count(offsets_stream); + changes->offsets = apr_array_make(result_pool, (int)count, sizeof(int)); + for (i = 0; i < count; ++i) + APR_ARRAY_PUSH(changes->offsets, int) + = (int)svn_packed__get_uint(offsets_stream); + + /* read changes array */ + count + = svn_packed__int_count(svn_packed__first_int_substream(changes_stream)); + changes->changes + = apr_array_make(result_pool, (int)count, sizeof(binary_change_t)); + for (i = 0; i < count; ++i) + { + binary_change_t 
change; + + change.flags = (int)svn_packed__get_uint(changes_stream); + change.path = (apr_size_t)svn_packed__get_uint(changes_stream); + + change.copyfrom_rev = (svn_revnum_t)svn_packed__get_int(changes_stream); + change.copyfrom_path = (apr_size_t)svn_packed__get_uint(changes_stream); + + change.noderev_id.change_set = svn_packed__get_int(changes_stream); + change.noderev_id.number = svn_packed__get_uint(changes_stream); + + APR_ARRAY_PUSH(changes->changes, binary_change_t) = change; + } + + *changes_p = changes; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__serialize_changes_container(void **data, + apr_size_t *data_len, + void *in, + apr_pool_t *pool) +{ + svn_fs_x__changes_t *changes = in; + svn_stringbuf_t *serialized; + + /* make a guesstimate on the size of the serialized data. Erring on the + * low side will cause the serializer to re-alloc its buffer. */ + apr_size_t size + = changes->changes->elt_size * changes->changes->nelts + + changes->offsets->elt_size * changes->offsets->nelts + + 10 * changes->changes->elt_size + + 100; + + /* serialize array header and all its elements */ + svn_temp_serializer__context_t *context + = svn_temp_serializer__init(changes, sizeof(*changes), size, pool); + + /* serialize sub-structures */ + svn_fs_x__serialize_string_table(context, &changes->paths); + svn_fs_x__serialize_apr_array(context, &changes->changes); + svn_fs_x__serialize_apr_array(context, &changes->offsets); + + /* return the serialized result */ + serialized = svn_temp_serializer__get(context); + + *data = serialized->data; + *data_len = serialized->len; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__deserialize_changes_container(void **out, + void *data, + apr_size_t data_len, + apr_pool_t *pool) +{ + svn_fs_x__changes_t *changes = (svn_fs_x__changes_t *)data; + + /* de-serialize sub-structures */ + svn_fs_x__deserialize_string_table(changes, &changes->paths); + svn_fs_x__deserialize_apr_array(changes, &changes->changes, pool); + svn_fs_x__deserialize_apr_array(changes, &changes->offsets, pool); + + /* done */ + *out = changes; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__changes_get_list_func(void **out, + const void *data, + apr_size_t data_len, + void *baton, + apr_pool_t *pool) +{ + int first; + int last; + int i; + apr_array_header_t *list; + + apr_uint32_t idx = *(apr_uint32_t *)baton; + const svn_fs_x__changes_t *container = data; + + /* resolve all the sub-container pointers we need */ + const string_table_t *paths + = svn_temp_deserializer__ptr(container, + (const void *const *)&container->paths); + const apr_array_header_t *serialized_offsets + = svn_temp_deserializer__ptr(container, + (const void *const *)&container->offsets); + const apr_array_header_t *serialized_changes + = svn_temp_deserializer__ptr(container, + (const void *const *)&container->changes); + const int *offsets + = svn_temp_deserializer__ptr(serialized_offsets, + (const void *const *)&serialized_offsets->elts); + const binary_change_t *changes + = svn_temp_deserializer__ptr(serialized_changes, + (const void *const *)&serialized_changes->elts); + + /* validate index */ + if (idx + 1 >= (apr_size_t)serialized_offsets->nelts) + return svn_error_createf(SVN_ERR_FS_CONTAINER_INDEX, NULL, + _("Changes list index %u exceeds container " + "size %d"), + (unsigned)idx, serialized_offsets->nelts - 1); + + /* range of changes to return */ + first = offsets[idx]; + last = offsets[idx+1]; + + /* construct result */ + list = apr_array_make(pool, last - first, 
sizeof(svn_fs_x__change_t*)); + + for (i = first; i < last; ++i) + { + const binary_change_t *binary_change = &changes[i]; + + /* convert BINARY_CHANGE into a standard FSX svn_fs_x__change_t */ + svn_fs_x__change_t *change = apr_pcalloc(pool, sizeof(*change)); + change->path.data + = svn_fs_x__string_table_get_func(paths, binary_change->path, + &change->path.len, pool); + + change->noderev_id = binary_change->noderev_id; + + change->change_kind = (svn_fs_path_change_kind_t) + ((binary_change->flags & CHANGE_KIND_MASK) >> CHANGE_KIND_SHIFT); + change->text_mod = (binary_change->flags & CHANGE_TEXT_MOD) != 0; + change->prop_mod = (binary_change->flags & CHANGE_PROP_MOD) != 0; + change->node_kind = (svn_node_kind_t) + ((binary_change->flags & CHANGE_NODE_MASK) >> CHANGE_NODE_SHIFT); + + change->copyfrom_rev = binary_change->copyfrom_rev; + change->copyfrom_known = TRUE; + if (SVN_IS_VALID_REVNUM(binary_change->copyfrom_rev)) + change->copyfrom_path + = svn_fs_x__string_table_get_func(paths, + binary_change->copyfrom_path, + NULL, + pool); + + /* add it to the result */ + APR_ARRAY_PUSH(list, svn_fs_x__change_t*) = change; + } + + *out = list; + + return SVN_NO_ERROR; +} diff --git a/subversion/libsvn_fs_x/changes.h b/subversion/libsvn_fs_x/changes.h new file mode 100644 index 0000000..ccb2647 --- /dev/null +++ b/subversion/libsvn_fs_x/changes.h @@ -0,0 +1,132 @@ +/* changes.h --- FSX changed paths lists container + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#ifndef SVN_LIBSVN_FS__CHANGES_H +#define SVN_LIBSVN_FS__CHANGES_H + +#include "svn_io.h" +#include "fs.h" + +/* Entries in a revision's change list tend to be widely redundant (similar + * changes to similar paths). Even more so, change lists from a larger + * revision range also tend to overlap. + * + * In its serialized form, the svn_fs_x__changes_t container extracts most + * of that redundancy and the run-time representation is also much smaller + * than sum of the respective svn_fs_x__change_t* arrays. + * + * As with other containers, this one has two modes: 'construction', in + * which you may add data to it, and 'getter' in which there is only r/o + * access to the data. + */ + +/* An opaque collection of change lists (apr_array_header_t * of + * svn_fs_x__change_t *). + */ +typedef struct svn_fs_x__changes_t svn_fs_x__changes_t; + +/* Create and populate changes containers. */ + +/* Create and return a new changes container with an initial capacity of + * INITIAL_COUNT svn_fs_x__change_t objects. + * Allocate the result in RESULT_POOL. 
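
   A sketch of the typical write-side lifecycle (hypothetical caller;
   CHANGE_LISTS, STREAM and the two pools are assumptions of this
   example):

     svn_fs_x__changes_t *changes
       = svn_fs_x__changes_create(16, result_pool);
     apr_size_t list_index;
     int i;

     for (i = 0; i < change_lists->nelts; ++i)
       SVN_ERR(svn_fs_x__changes_append_list(
                 &list_index, changes,
                 APR_ARRAY_IDX(change_lists, i, apr_array_header_t *)));

     SVN_ERR(svn_fs_x__write_changes_container(stream, changes,
                                               scratch_pool));

   svn_fs_x__changes_estimate_size() can be consulted before writing if
   the caller wants to budget an output buffer.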
+ */ +svn_fs_x__changes_t * +svn_fs_x__changes_create(apr_size_t initial_count, + apr_pool_t *result_pool); + +/* Start a new change list CHANGES (implicitly terminating the previous one) + * and return its index in *LIST_INDEX. Append all changes from LIST to + * that new change list. + */ +svn_error_t * +svn_fs_x__changes_append_list(apr_size_t *list_index, + svn_fs_x__changes_t *changes, + apr_array_header_t *list); + +/* Return a rough estimate in bytes for the serialized representation + * of CHANGES. + */ +apr_size_t +svn_fs_x__changes_estimate_size(const svn_fs_x__changes_t *changes); + +/* Read changes containers. */ + +/* From CHANGES, extract the change list with the given IDX. Allocate + * the result in POOL and return it in *LIST. + */ +svn_error_t * +svn_fs_x__changes_get_list(apr_array_header_t **list, + const svn_fs_x__changes_t *changes, + apr_size_t idx, + apr_pool_t *pool); + +/* I/O interface. */ + +/* Write a serialized representation of CHANGES to STREAM. + * Use SCRATCH_POOL for temporary allocations. + */ +svn_error_t * +svn_fs_x__write_changes_container(svn_stream_t *stream, + const svn_fs_x__changes_t *changes, + apr_pool_t *scratch_pool); + +/* Read a changes container from its serialized representation in STREAM. + * Allocate the result in RESULT_POOL and return it in *CHANGES_P. Use + * SCRATCH_POOL for temporary allocations. + */ +svn_error_t * +svn_fs_x__read_changes_container(svn_fs_x__changes_t **changes_p, + svn_stream_t *stream, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + +/* Implements #svn_cache__serialize_func_t for svn_fs_x__changes_t objects. + */ +svn_error_t * +svn_fs_x__serialize_changes_container(void **data, + apr_size_t *data_len, + void *in, + apr_pool_t *pool); + +/* Implements #svn_cache__deserialize_func_t for svn_fs_x__changes_t objects. + */ +svn_error_t * +svn_fs_x__deserialize_changes_container(void **out, + void *data, + apr_size_t data_len, + apr_pool_t *pool); + +/* Implements svn_cache__partial_getter_func_t for svn_fs_x__changes_t, + * setting *OUT to the change list (apr_array_header_t *) selected by + * the apr_uint32_t index passed in as *BATON. This function is similar + * to svn_fs_x__changes_get_list but operates on the cache serialized + * representation of the container. + */ +svn_error_t * +svn_fs_x__changes_get_list_func(void **out, + const void *data, + apr_size_t data_len, + void *baton, + apr_pool_t *pool); + +#endif diff --git a/subversion/libsvn_fs_x/dag.c b/subversion/libsvn_fs_x/dag.c new file mode 100644 index 0000000..2f5bcb2 --- /dev/null +++ b/subversion/libsvn_fs_x/dag.c @@ -0,0 +1,1368 @@ +/* dag.c : DAG-like interface filesystem, private to libsvn_fs + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. 
See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#include <string.h> + +#include "svn_path.h" +#include "svn_error.h" +#include "svn_fs.h" +#include "svn_props.h" +#include "svn_pools.h" + +#include "dag.h" +#include "fs.h" +#include "fs_x.h" +#include "fs_id.h" +#include "cached_data.h" +#include "transaction.h" + +#include "../libsvn_fs/fs-loader.h" + +#include "private/svn_fspath.h" +#include "svn_private_config.h" +#include "private/svn_temp_serializer.h" +#include "temp_serializer.h" + + +/* Initializing a filesystem. */ + +struct dag_node_t +{ + /* The filesystem this dag node came from. */ + svn_fs_t *fs; + + /* The node revision ID for this dag node. */ + svn_fs_x__id_t id; + + /* In the special case that this node is the root of a transaction + that has not yet been modified, the revision of this node is the + respective txn's base rev. Otherwise, this is SVN_INVALID_REVNUM + for txn nodes and the respective crev for committed nodes. + (Used in svn_fs_node_created_rev.) */ + svn_revnum_t revision; + + /* The node's type (file, dir, etc.) */ + svn_node_kind_t kind; + + /* The node's NODE-REVISION, or NULL if we haven't read it in yet. + This is allocated in this node's POOL. + + If you're willing to respect all the rules above, you can munge + this yourself, but you're probably better off just calling + `get_node_revision' and `set_node_revision', which take care of + things for you. */ + svn_fs_x__noderev_t *node_revision; + + /* The pool to allocate NODE_REVISION in. */ + apr_pool_t *node_pool; + + /* the path at which this node was created. */ + const char *created_path; + + /* Directory entry lookup hint to speed up consecutive calls to + svn_fs_x__rep_contents_dir_entry(). Only used for directory nodes. + Any value is legal but should default to APR_SIZE_MAX. */ + apr_size_t hint; +}; + + + +/* Trivial helper/accessor functions. */ +svn_node_kind_t +svn_fs_x__dag_node_kind(dag_node_t *node) +{ + return node->kind; +} + +const svn_fs_x__id_t * +svn_fs_x__dag_get_id(const dag_node_t *node) +{ + return &node->id; +} + + +const char * +svn_fs_x__dag_get_created_path(dag_node_t *node) +{ + return node->created_path; +} + + +svn_fs_t * +svn_fs_x__dag_get_fs(dag_node_t *node) +{ + return node->fs; +} + +void +svn_fs_x__dag_set_fs(dag_node_t *node, + svn_fs_t *fs) +{ + node->fs = fs; +} + + +/* Dup NODEREV and all associated data into RESULT_POOL. + Leaves the id and is_fresh_txn_root fields as zero bytes. */ +static svn_fs_x__noderev_t * +copy_node_revision(svn_fs_x__noderev_t *noderev, + apr_pool_t *result_pool) +{ + svn_fs_x__noderev_t *nr = apr_pmemdup(result_pool, noderev, + sizeof(*noderev)); + + if (noderev->copyfrom_path) + nr->copyfrom_path = apr_pstrdup(result_pool, noderev->copyfrom_path); + + nr->copyroot_path = apr_pstrdup(result_pool, noderev->copyroot_path); + nr->data_rep = svn_fs_x__rep_copy(noderev->data_rep, result_pool); + nr->prop_rep = svn_fs_x__rep_copy(noderev->prop_rep, result_pool); + + if (noderev->created_path) + nr->created_path = apr_pstrdup(result_pool, noderev->created_path); + + return nr; +} + + +/* Set *NODEREV_P to the cached node-revision for NODE. + If the node-revision was not already cached in NODE, read it in, + allocating the cache in NODE->NODE_POOL. + + If you plan to change the contents of NODE, be careful! We're + handing you a pointer directly to our cached node-revision, not + your own copy. 
If you change it as part of some operation, but + then some Berkeley DB function deadlocks or gets an error, you'll + need to back out your changes, or else the cache will reflect + changes that never got committed. It's probably best not to change + the structure at all. */ +static svn_error_t * +get_node_revision(svn_fs_x__noderev_t **noderev_p, + dag_node_t *node) +{ + /* If we've already got a copy, there's no need to read it in. */ + if (! node->node_revision) + { + svn_fs_x__noderev_t *noderev; + apr_pool_t *scratch_pool = svn_pool_create(node->node_pool); + + SVN_ERR(svn_fs_x__get_node_revision(&noderev, node->fs, &node->id, + node->node_pool, scratch_pool)); + node->node_revision = noderev; + svn_pool_destroy(scratch_pool); + } + + /* Now NODE->node_revision is set. */ + *noderev_p = node->node_revision; + return SVN_NO_ERROR; +} + +/* Return the node revision ID of NODE. The value returned is shared + with NODE, and will be deallocated when NODE is. */ +svn_error_t * +svn_fs_x__dag_get_node_id(svn_fs_x__id_t *node_id, + dag_node_t *node) +{ + svn_fs_x__noderev_t *noderev; + SVN_ERR(get_node_revision(&noderev, node)); + + *node_id = noderev->node_id; + return SVN_NO_ERROR; +} + +/* Return the node revision ID of NODE. The value returned is shared + with NODE, and will be deallocated when NODE is. */ +svn_error_t * +svn_fs_x__dag_get_copy_id(svn_fs_x__id_t *copy_id, + dag_node_t *node) +{ + svn_fs_x__noderev_t *noderev; + SVN_ERR(get_node_revision(&noderev, node)); + + *copy_id = noderev->copy_id; + return SVN_NO_ERROR; +} + +/* Return the node ID of NODE. The value returned is shared with NODE, + and will be deallocated when NODE is. */ +svn_error_t * +svn_fs_x__dag_related_node(svn_boolean_t *same, + dag_node_t *lhs, + dag_node_t *rhs) +{ + svn_fs_x__id_t lhs_node, rhs_node; + + SVN_ERR(svn_fs_x__dag_get_node_id(&lhs_node, lhs)); + SVN_ERR(svn_fs_x__dag_get_node_id(&rhs_node, rhs)); + *same = svn_fs_x__id_eq(&lhs_node, &rhs_node); + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__dag_same_line_of_history(svn_boolean_t *same, + dag_node_t *lhs, + dag_node_t *rhs) +{ + svn_fs_x__noderev_t *lhs_noderev, *rhs_noderev; + + SVN_ERR(get_node_revision(&lhs_noderev, lhs)); + SVN_ERR(get_node_revision(&rhs_noderev, rhs)); + + *same = svn_fs_x__id_eq(&lhs_noderev->node_id, &rhs_noderev->node_id) + && svn_fs_x__id_eq(&lhs_noderev->copy_id, &rhs_noderev->copy_id); + + return SVN_NO_ERROR; +} + +svn_boolean_t +svn_fs_x__dag_check_mutable(const dag_node_t *node) +{ + return svn_fs_x__is_txn(svn_fs_x__dag_get_id(node)->change_set); +} + + +svn_error_t * +svn_fs_x__dag_get_node(dag_node_t **node, + svn_fs_t *fs, + const svn_fs_x__id_t *id, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + dag_node_t *new_node; + svn_fs_x__noderev_t *noderev; + + /* Construct the node. */ + new_node = apr_pcalloc(result_pool, sizeof(*new_node)); + new_node->fs = fs; + new_node->id = *id; + new_node->hint = APR_SIZE_MAX; + + /* Grab the contents so we can inspect the node's kind and created path. */ + SVN_ERR(svn_fs_x__get_node_revision(&noderev, fs, id, + result_pool, scratch_pool)); + new_node->node_pool = result_pool; + new_node->node_revision = noderev; + + /* Initialize the KIND and CREATED_PATH attributes */ + new_node->kind = noderev->kind; + new_node->created_path = noderev->created_path; + + /* Support our quirky svn_fs_node_created_rev API. + Untouched txn roots report the base rev as theirs. */ + new_node->revision + = ( svn_fs_x__is_fresh_txn_root(noderev) + ? 
svn_fs_x__get_revnum(noderev->predecessor_id.change_set) + : svn_fs_x__get_revnum(id->change_set)); + + /* Return a fresh new node */ + *node = new_node; + return SVN_NO_ERROR; +} + + +svn_revnum_t +svn_fs_x__dag_get_revision(const dag_node_t *node) +{ + return node->revision; +} + + +svn_error_t * +svn_fs_x__dag_get_predecessor_id(svn_fs_x__id_t *id_p, + dag_node_t *node) +{ + svn_fs_x__noderev_t *noderev; + + SVN_ERR(get_node_revision(&noderev, node)); + *id_p = noderev->predecessor_id; + + return SVN_NO_ERROR; +} + + +svn_error_t * +svn_fs_x__dag_get_predecessor_count(int *count, + dag_node_t *node) +{ + svn_fs_x__noderev_t *noderev; + + SVN_ERR(get_node_revision(&noderev, node)); + *count = noderev->predecessor_count; + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__dag_get_mergeinfo_count(apr_int64_t *count, + dag_node_t *node) +{ + svn_fs_x__noderev_t *noderev; + + SVN_ERR(get_node_revision(&noderev, node)); + *count = noderev->mergeinfo_count; + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__dag_has_mergeinfo(svn_boolean_t *has_mergeinfo, + dag_node_t *node) +{ + svn_fs_x__noderev_t *noderev; + + SVN_ERR(get_node_revision(&noderev, node)); + *has_mergeinfo = noderev->has_mergeinfo; + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__dag_has_descendants_with_mergeinfo(svn_boolean_t *do_they, + dag_node_t *node) +{ + svn_fs_x__noderev_t *noderev; + + if (node->kind != svn_node_dir) + { + *do_they = FALSE; + return SVN_NO_ERROR; + } + + SVN_ERR(get_node_revision(&noderev, node)); + if (noderev->mergeinfo_count > 1) + *do_they = TRUE; + else if (noderev->mergeinfo_count == 1 && !noderev->has_mergeinfo) + *do_they = TRUE; + else + *do_they = FALSE; + return SVN_NO_ERROR; +} + + +/*** Directory node functions ***/ + +/* Some of these are helpers for functions outside this section. */ + +/* Set *ID_P to the noderev-id for entry NAME in PARENT. If no such + entry, set *ID_P to NULL but do not error. */ +static svn_error_t * +dir_entry_id_from_node(svn_fs_x__id_t *id_p, + dag_node_t *parent, + const char *name, + apr_pool_t *scratch_pool) +{ + svn_fs_x__dirent_t *dirent; + svn_fs_x__noderev_t *noderev; + + SVN_ERR(get_node_revision(&noderev, parent)); + if (noderev->kind != svn_node_dir) + return svn_error_create(SVN_ERR_FS_NOT_DIRECTORY, NULL, + _("Can't get entries of non-directory")); + + /* Make sure that NAME is a single path component. */ + if (! svn_path_is_single_path_component(name)) + return svn_error_createf + (SVN_ERR_FS_NOT_SINGLE_PATH_COMPONENT, NULL, + "Attempted to open node with an illegal name '%s'", name); + + /* Get a dirent hash for this directory. */ + SVN_ERR(svn_fs_x__rep_contents_dir_entry(&dirent, parent->fs, noderev, + name, &parent->hint, + scratch_pool, scratch_pool)); + if (dirent) + *id_p = dirent->id; + else + svn_fs_x__id_reset(id_p); + + return SVN_NO_ERROR; +} + + +/* Add or set in PARENT a directory entry NAME pointing to ID. + Temporary allocations are done in SCRATCH_POOL. + + Assumptions: + - PARENT is a mutable directory. + - ID does not refer to an ancestor of parent + - NAME is a single path component +*/ +static svn_error_t * +set_entry(dag_node_t *parent, + const char *name, + const svn_fs_x__id_t *id, + svn_node_kind_t kind, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *scratch_pool) +{ + svn_fs_x__noderev_t *parent_noderev; + + /* Get the parent's node-revision. */ + SVN_ERR(get_node_revision(&parent_noderev, parent)); + + /* Set the new entry. 
*/ + return svn_fs_x__set_entry(parent->fs, txn_id, parent_noderev, name, id, + kind, parent->node_pool, scratch_pool); +} + + +/* Make a new entry named NAME in PARENT. If IS_DIR is true, then the + node revision the new entry points to will be a directory, else it + will be a file. The new node will be allocated in RESULT_POOL. PARENT + must be mutable, and must not have an entry named NAME. + + Use SCRATCH_POOL for all temporary allocations. + */ +static svn_error_t * +make_entry(dag_node_t **child_p, + dag_node_t *parent, + const char *parent_path, + const char *name, + svn_boolean_t is_dir, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_x__noderev_t new_noderev, *parent_noderev; + + /* Make sure that NAME is a single path component. */ + if (! svn_path_is_single_path_component(name)) + return svn_error_createf + (SVN_ERR_FS_NOT_SINGLE_PATH_COMPONENT, NULL, + _("Attempted to create a node with an illegal name '%s'"), name); + + /* Make sure that parent is a directory */ + if (parent->kind != svn_node_dir) + return svn_error_create + (SVN_ERR_FS_NOT_DIRECTORY, NULL, + _("Attempted to create entry in non-directory parent")); + + /* Check that the parent is mutable. */ + if (! svn_fs_x__dag_check_mutable(parent)) + return svn_error_createf + (SVN_ERR_FS_NOT_MUTABLE, NULL, + _("Attempted to clone child of non-mutable node")); + + /* Create the new node's NODE-REVISION */ + memset(&new_noderev, 0, sizeof(new_noderev)); + new_noderev.kind = is_dir ? svn_node_dir : svn_node_file; + new_noderev.created_path = svn_fspath__join(parent_path, name, result_pool); + + SVN_ERR(get_node_revision(&parent_noderev, parent)); + new_noderev.copyroot_path = apr_pstrdup(result_pool, + parent_noderev->copyroot_path); + new_noderev.copyroot_rev = parent_noderev->copyroot_rev; + new_noderev.copyfrom_rev = SVN_INVALID_REVNUM; + new_noderev.copyfrom_path = NULL; + svn_fs_x__id_reset(&new_noderev.predecessor_id); + + SVN_ERR(svn_fs_x__create_node + (svn_fs_x__dag_get_fs(parent), &new_noderev, + &parent_noderev->copy_id, txn_id, scratch_pool)); + + /* Create a new dag_node_t for our new node */ + SVN_ERR(svn_fs_x__dag_get_node(child_p, svn_fs_x__dag_get_fs(parent), + &new_noderev.noderev_id, result_pool, + scratch_pool)); + + /* We can safely call set_entry because we already know that + PARENT is mutable, and we just created CHILD, so we know it has + no ancestors (therefore, PARENT cannot be an ancestor of CHILD) */ + return set_entry(parent, name, &new_noderev.noderev_id, + new_noderev.kind, txn_id, scratch_pool); +} + + +svn_error_t * +svn_fs_x__dag_dir_entries(apr_array_header_t **entries, + dag_node_t *node, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_x__noderev_t *noderev; + + SVN_ERR(get_node_revision(&noderev, node)); + + if (noderev->kind != svn_node_dir) + return svn_error_create(SVN_ERR_FS_NOT_DIRECTORY, NULL, + _("Can't get entries of non-directory")); + + return svn_fs_x__rep_contents_dir(entries, node->fs, noderev, result_pool, + scratch_pool); +} + + +svn_error_t * +svn_fs_x__dag_set_entry(dag_node_t *node, + const char *entry_name, + const svn_fs_x__id_t *id, + svn_node_kind_t kind, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *scratch_pool) +{ + /* Check it's a directory. */ + if (node->kind != svn_node_dir) + return svn_error_create + (SVN_ERR_FS_NOT_DIRECTORY, NULL, + _("Attempted to set entry in non-directory node")); + + /* Check it's mutable. */ + if (! 
svn_fs_x__dag_check_mutable(node)) + return svn_error_create + (SVN_ERR_FS_NOT_MUTABLE, NULL, + _("Attempted to set entry in immutable node")); + + return set_entry(node, entry_name, id, kind, txn_id, scratch_pool); +} + + + +/*** Proplists. ***/ + +svn_error_t * +svn_fs_x__dag_get_proplist(apr_hash_t **proplist_p, + dag_node_t *node, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_x__noderev_t *noderev; + apr_hash_t *proplist = NULL; + + SVN_ERR(get_node_revision(&noderev, node)); + + SVN_ERR(svn_fs_x__get_proplist(&proplist, node->fs, noderev, result_pool, + scratch_pool)); + + *proplist_p = proplist; + + return SVN_NO_ERROR; +} + + +svn_error_t * +svn_fs_x__dag_set_proplist(dag_node_t *node, + apr_hash_t *proplist, + apr_pool_t *scratch_pool) +{ + svn_fs_x__noderev_t *noderev; + + /* Sanity check: this node better be mutable! */ + if (! svn_fs_x__dag_check_mutable(node)) + { + svn_string_t *idstr = svn_fs_x__id_unparse(&node->id, scratch_pool); + return svn_error_createf + (SVN_ERR_FS_NOT_MUTABLE, NULL, + "Can't set proplist on *immutable* node-revision %s", + idstr->data); + } + + /* Go get a fresh NODE-REVISION for this node. */ + SVN_ERR(get_node_revision(&noderev, node)); + + /* Set the new proplist. */ + return svn_fs_x__set_proplist(node->fs, noderev, proplist, scratch_pool); +} + + +svn_error_t * +svn_fs_x__dag_increment_mergeinfo_count(dag_node_t *node, + apr_int64_t increment, + apr_pool_t *scratch_pool) +{ + svn_fs_x__noderev_t *noderev; + + /* Sanity check: this node better be mutable! */ + if (! svn_fs_x__dag_check_mutable(node)) + { + svn_string_t *idstr = svn_fs_x__id_unparse(&node->id, scratch_pool); + return svn_error_createf + (SVN_ERR_FS_NOT_MUTABLE, NULL, + "Can't increment mergeinfo count on *immutable* node-revision %s", + idstr->data); + } + + if (increment == 0) + return SVN_NO_ERROR; + + /* Go get a fresh NODE-REVISION for this node. */ + SVN_ERR(get_node_revision(&noderev, node)); + + noderev->mergeinfo_count += increment; + if (noderev->mergeinfo_count < 0) + { + svn_string_t *idstr = svn_fs_x__id_unparse(&node->id, scratch_pool); + return svn_error_createf + (SVN_ERR_FS_CORRUPT, NULL, + apr_psprintf(scratch_pool, + _("Can't increment mergeinfo count on node-revision %%s " + "to negative value %%%s"), + APR_INT64_T_FMT), + idstr->data, noderev->mergeinfo_count); + } + if (noderev->mergeinfo_count > 1 && noderev->kind == svn_node_file) + { + svn_string_t *idstr = svn_fs_x__id_unparse(&node->id, scratch_pool); + return svn_error_createf + (SVN_ERR_FS_CORRUPT, NULL, + apr_psprintf(scratch_pool, + _("Can't increment mergeinfo count on *file* " + "node-revision %%s to %%%s (> 1)"), + APR_INT64_T_FMT), + idstr->data, noderev->mergeinfo_count); + } + + /* Flush it out. */ + return svn_fs_x__put_node_revision(node->fs, noderev, scratch_pool); +} + +svn_error_t * +svn_fs_x__dag_set_has_mergeinfo(dag_node_t *node, + svn_boolean_t has_mergeinfo, + apr_pool_t *scratch_pool) +{ + svn_fs_x__noderev_t *noderev; + + /* Sanity check: this node better be mutable! */ + if (! svn_fs_x__dag_check_mutable(node)) + { + svn_string_t *idstr = svn_fs_x__id_unparse(&node->id, scratch_pool); + return svn_error_createf + (SVN_ERR_FS_NOT_MUTABLE, NULL, + "Can't set mergeinfo flag on *immutable* node-revision %s", + idstr->data); + } + + /* Go get a fresh NODE-REVISION for this node. */ + SVN_ERR(get_node_revision(&noderev, node)); + + noderev->has_mergeinfo = has_mergeinfo; + + /* Flush it out. 
*/ + return svn_fs_x__put_node_revision(node->fs, noderev, scratch_pool); +} + + +/*** Roots. ***/ + +svn_error_t * +svn_fs_x__dag_revision_root(dag_node_t **node_p, + svn_fs_t *fs, + svn_revnum_t rev, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_x__id_t root_id; + + svn_fs_x__init_rev_root(&root_id, rev); + return svn_fs_x__dag_get_node(node_p, fs, &root_id, result_pool, + scratch_pool); +} + + +svn_error_t * +svn_fs_x__dag_txn_root(dag_node_t **node_p, + svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_x__id_t root_id; + + svn_fs_x__init_txn_root(&root_id, txn_id); + return svn_fs_x__dag_get_node(node_p, fs, &root_id, result_pool, + scratch_pool); +} + + +svn_error_t * +svn_fs_x__dag_clone_child(dag_node_t **child_p, + dag_node_t *parent, + const char *parent_path, + const char *name, + const svn_fs_x__id_t *copy_id, + svn_fs_x__txn_id_t txn_id, + svn_boolean_t is_parent_copyroot, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + dag_node_t *cur_entry; /* parent's current entry named NAME */ + const svn_fs_x__id_t *new_node_id; /* node id we'll put into NEW_NODE */ + svn_fs_t *fs = svn_fs_x__dag_get_fs(parent); + + /* First check that the parent is mutable. */ + if (! svn_fs_x__dag_check_mutable(parent)) + return svn_error_createf + (SVN_ERR_FS_NOT_MUTABLE, NULL, + "Attempted to clone child of non-mutable node"); + + /* Make sure that NAME is a single path component. */ + if (! svn_path_is_single_path_component(name)) + return svn_error_createf + (SVN_ERR_FS_NOT_SINGLE_PATH_COMPONENT, NULL, + "Attempted to make a child clone with an illegal name '%s'", name); + + /* Find the node named NAME in PARENT's entries list if it exists. */ + SVN_ERR(svn_fs_x__dag_open(&cur_entry, parent, name, scratch_pool, + scratch_pool)); + if (! cur_entry) + return svn_error_createf + (SVN_ERR_FS_NOT_FOUND, NULL, + "Attempted to open non-existent child node '%s'", name); + + /* Check for mutability in the node we found. If it's mutable, we + don't need to clone it. */ + if (svn_fs_x__dag_check_mutable(cur_entry)) + { + /* This has already been cloned */ + new_node_id = svn_fs_x__dag_get_id(cur_entry); + } + else + { + svn_fs_x__noderev_t *noderev, *parent_noderev; + + /* Go get a fresh NODE-REVISION for current child node. */ + SVN_ERR(get_node_revision(&noderev, cur_entry)); + + if (is_parent_copyroot) + { + SVN_ERR(get_node_revision(&parent_noderev, parent)); + noderev->copyroot_rev = parent_noderev->copyroot_rev; + noderev->copyroot_path = apr_pstrdup(scratch_pool, + parent_noderev->copyroot_path); + } + + noderev->copyfrom_path = NULL; + noderev->copyfrom_rev = SVN_INVALID_REVNUM; + + noderev->predecessor_id = noderev->noderev_id; + noderev->predecessor_count++; + noderev->created_path = svn_fspath__join(parent_path, name, + scratch_pool); + + if (copy_id == NULL) + copy_id = &noderev->copy_id; + + SVN_ERR(svn_fs_x__create_successor(fs, noderev, copy_id, txn_id, + scratch_pool)); + new_node_id = &noderev->noderev_id; + + /* Replace the ID in the parent's ENTRY list with the ID which + refers to the mutable clone of this child. */ + SVN_ERR(set_entry(parent, name, new_node_id, noderev->kind, txn_id, + scratch_pool)); + } + + /* Initialize the youngster. */ + return svn_fs_x__dag_get_node(child_p, fs, new_node_id, result_pool, + scratch_pool); +} + + +/* Delete all mutable node revisions reachable from node ID, including + ID itself, from FS's `nodes' table. 
Also delete any mutable + representations and strings associated with that node revision. + ID may refer to a file or directory, which may be mutable or immutable. + + Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +delete_if_mutable(svn_fs_t *fs, + const svn_fs_x__id_t *id, + apr_pool_t *scratch_pool) +{ + dag_node_t *node; + + /* Get the node. */ + SVN_ERR(svn_fs_x__dag_get_node(&node, fs, id, scratch_pool, scratch_pool)); + + /* If immutable, do nothing and return immediately. */ + if (! svn_fs_x__dag_check_mutable(node)) + return SVN_NO_ERROR; + + /* Else it's mutable. Recurse on directories... */ + if (node->kind == svn_node_dir) + { + apr_array_header_t *entries; + int i; + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + + /* Loop over directory entries */ + SVN_ERR(svn_fs_x__dag_dir_entries(&entries, node, scratch_pool, + iterpool)); + for (i = 0; i < entries->nelts; ++i) + { + const svn_fs_x__id_t *noderev_id + = &APR_ARRAY_IDX(entries, i, svn_fs_x__dirent_t *)->id; + + svn_pool_clear(iterpool); + SVN_ERR(delete_if_mutable(fs, noderev_id, iterpool)); + } + + svn_pool_destroy(iterpool); + } + + /* ... then delete the node itself, after deleting any mutable + representations and strings it points to. */ + return svn_fs_x__delete_node_revision(fs, id, scratch_pool); +} + + +svn_error_t * +svn_fs_x__dag_delete(dag_node_t *parent, + const char *name, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *scratch_pool) +{ + svn_fs_x__noderev_t *parent_noderev; + svn_fs_t *fs = parent->fs; + svn_fs_x__dirent_t *dirent; + apr_pool_t *subpool; + + /* Make sure parent is a directory. */ + if (parent->kind != svn_node_dir) + return svn_error_createf + (SVN_ERR_FS_NOT_DIRECTORY, NULL, + "Attempted to delete entry '%s' from *non*-directory node", name); + + /* Make sure parent is mutable. */ + if (! svn_fs_x__dag_check_mutable(parent)) + return svn_error_createf + (SVN_ERR_FS_NOT_MUTABLE, NULL, + "Attempted to delete entry '%s' from immutable directory node", name); + + /* Make sure that NAME is a single path component. */ + if (! svn_path_is_single_path_component(name)) + return svn_error_createf + (SVN_ERR_FS_NOT_SINGLE_PATH_COMPONENT, NULL, + "Attempted to delete a node with an illegal name '%s'", name); + + /* Get a fresh NODE-REVISION for the parent node. */ + SVN_ERR(get_node_revision(&parent_noderev, parent)); + + subpool = svn_pool_create(scratch_pool); + + /* Search this directory for a dirent with that NAME. */ + SVN_ERR(svn_fs_x__rep_contents_dir_entry(&dirent, fs, parent_noderev, + name, &parent->hint, + subpool, subpool)); + + /* If we never found ID in ENTRIES (perhaps because there are no + ENTRIES, perhaps because ID just isn't in the existing ENTRIES + ... it doesn't matter), return an error. */ + if (! dirent) + return svn_error_createf + (SVN_ERR_FS_NO_SUCH_ENTRY, NULL, + "Delete failed--directory has no entry '%s'", name); + + /* If mutable, remove it and any mutable children from db. */ + SVN_ERR(delete_if_mutable(parent->fs, &dirent->id, scratch_pool)); + svn_pool_destroy(subpool); + + /* Remove this entry from its parent's entries list. 
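     Judging from how the call is used here, passing a NULL id together with
     svn_node_unknown appears to be what tells svn_fs_x__set_entry() to drop
     the entry rather than update it.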
*/ + return svn_fs_x__set_entry(parent->fs, txn_id, parent_noderev, name, + NULL, svn_node_unknown, parent->node_pool, + scratch_pool); +} + + +svn_error_t * +svn_fs_x__dag_make_file(dag_node_t **child_p, + dag_node_t *parent, + const char *parent_path, + const char *name, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + /* Call our little helper function */ + return make_entry(child_p, parent, parent_path, name, FALSE, txn_id, + result_pool, scratch_pool); +} + + +svn_error_t * +svn_fs_x__dag_make_dir(dag_node_t **child_p, + dag_node_t *parent, + const char *parent_path, + const char *name, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + /* Call our little helper function */ + return make_entry(child_p, parent, parent_path, name, TRUE, txn_id, + result_pool, scratch_pool); +} + + +svn_error_t * +svn_fs_x__dag_get_contents(svn_stream_t **contents_p, + dag_node_t *file, + apr_pool_t *result_pool) +{ + svn_fs_x__noderev_t *noderev; + svn_stream_t *contents; + + /* Make sure our node is a file. */ + if (file->kind != svn_node_file) + return svn_error_createf + (SVN_ERR_FS_NOT_FILE, NULL, + "Attempted to get textual contents of a *non*-file node"); + + /* Go get a fresh node-revision for FILE. */ + SVN_ERR(get_node_revision(&noderev, file)); + + /* Get a stream to the contents. */ + SVN_ERR(svn_fs_x__get_contents(&contents, file->fs, + noderev->data_rep, TRUE, result_pool)); + + *contents_p = contents; + + return SVN_NO_ERROR; +} + + +svn_error_t * +svn_fs_x__dag_get_file_delta_stream(svn_txdelta_stream_t **stream_p, + dag_node_t *source, + dag_node_t *target, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_x__noderev_t *src_noderev; + svn_fs_x__noderev_t *tgt_noderev; + + /* Make sure our nodes are files. */ + if ((source && source->kind != svn_node_file) + || target->kind != svn_node_file) + return svn_error_createf + (SVN_ERR_FS_NOT_FILE, NULL, + "Attempted to get textual contents of a *non*-file node"); + + /* Go get fresh node-revisions for the nodes. */ + if (source) + SVN_ERR(get_node_revision(&src_noderev, source)); + else + src_noderev = NULL; + SVN_ERR(get_node_revision(&tgt_noderev, target)); + + /* Get the delta stream. */ + return svn_fs_x__get_file_delta_stream(stream_p, target->fs, + src_noderev, tgt_noderev, + result_pool, scratch_pool); +} + + +svn_error_t * +svn_fs_x__dag_try_process_file_contents(svn_boolean_t *success, + dag_node_t *node, + svn_fs_process_contents_func_t processor, + void* baton, + apr_pool_t *scratch_pool) +{ + svn_fs_x__noderev_t *noderev; + + /* Go get fresh node-revisions for the nodes. */ + SVN_ERR(get_node_revision(&noderev, node)); + + return svn_fs_x__try_process_file_contents(success, node->fs, + noderev, + processor, baton, scratch_pool); +} + + +svn_error_t * +svn_fs_x__dag_file_length(svn_filesize_t *length, + dag_node_t *file) +{ + svn_fs_x__noderev_t *noderev; + + /* Make sure our node is a file. */ + if (file->kind != svn_node_file) + return svn_error_createf + (SVN_ERR_FS_NOT_FILE, NULL, + "Attempted to get length of a *non*-file node"); + + /* Go get a fresh node-revision for FILE, and . 
*/ + SVN_ERR(get_node_revision(&noderev, file)); + + return svn_fs_x__file_length(length, noderev); +} + + +svn_error_t * +svn_fs_x__dag_file_checksum(svn_checksum_t **checksum, + dag_node_t *file, + svn_checksum_kind_t kind, + apr_pool_t *result_pool) +{ + svn_fs_x__noderev_t *noderev; + + if (file->kind != svn_node_file) + return svn_error_createf + (SVN_ERR_FS_NOT_FILE, NULL, + "Attempted to get checksum of a *non*-file node"); + + SVN_ERR(get_node_revision(&noderev, file)); + + return svn_fs_x__file_checksum(checksum, noderev, kind, result_pool); +} + + +svn_error_t * +svn_fs_x__dag_get_edit_stream(svn_stream_t **contents, + dag_node_t *file, + apr_pool_t *result_pool) +{ + svn_fs_x__noderev_t *noderev; + svn_stream_t *ws; + + /* Make sure our node is a file. */ + if (file->kind != svn_node_file) + return svn_error_createf + (SVN_ERR_FS_NOT_FILE, NULL, + "Attempted to set textual contents of a *non*-file node"); + + /* Make sure our node is mutable. */ + if (! svn_fs_x__dag_check_mutable(file)) + return svn_error_createf + (SVN_ERR_FS_NOT_MUTABLE, NULL, + "Attempted to set textual contents of an immutable node"); + + /* Get the node revision. */ + SVN_ERR(get_node_revision(&noderev, file)); + + SVN_ERR(svn_fs_x__set_contents(&ws, file->fs, noderev, result_pool)); + + *contents = ws; + + return SVN_NO_ERROR; +} + + + +svn_error_t * +svn_fs_x__dag_finalize_edits(dag_node_t *file, + const svn_checksum_t *checksum, + apr_pool_t *scratch_pool) +{ + if (checksum) + { + svn_checksum_t *file_checksum; + + SVN_ERR(svn_fs_x__dag_file_checksum(&file_checksum, file, + checksum->kind, scratch_pool)); + if (!svn_checksum_match(checksum, file_checksum)) + return svn_checksum_mismatch_err(checksum, file_checksum, + scratch_pool, + _("Checksum mismatch for '%s'"), + file->created_path); + } + + return SVN_NO_ERROR; +} + + +dag_node_t * +svn_fs_x__dag_dup(const dag_node_t *node, + apr_pool_t *result_pool) +{ + /* Allocate our new node. */ + dag_node_t *new_node = apr_pmemdup(result_pool, node, sizeof(*new_node)); + + /* Only copy cached svn_fs_x__noderev_t for immutable nodes. */ + if (node->node_revision && !svn_fs_x__dag_check_mutable(node)) + { + new_node->node_revision = copy_node_revision(node->node_revision, + result_pool); + new_node->created_path = new_node->node_revision->created_path; + } + else + { + new_node->node_revision = NULL; + new_node->created_path = apr_pstrdup(result_pool, node->created_path); + } + + new_node->node_pool = result_pool; + + return new_node; +} + +dag_node_t * +svn_fs_x__dag_copy_into_pool(dag_node_t *node, + apr_pool_t *result_pool) +{ + return (node->node_pool == result_pool + ? node + : svn_fs_x__dag_dup(node, result_pool)); +} + +svn_error_t * +svn_fs_x__dag_serialize(void **data, + apr_size_t *data_len, + void *in, + apr_pool_t *pool) +{ + dag_node_t *node = in; + svn_stringbuf_t *serialized; + + /* create an serialization context and serialize the dag node as root */ + svn_temp_serializer__context_t *context = + svn_temp_serializer__init(node, + sizeof(*node), + 1024 - SVN_TEMP_SERIALIZER__OVERHEAD, + pool); + + /* for mutable nodes, we will _never_ cache the noderev */ + if (node->node_revision && !svn_fs_x__dag_check_mutable(node)) + { + svn_fs_x__noderev_serialize(context, &node->node_revision); + } + else + { + svn_temp_serializer__set_null(context, + (const void * const *)&node->node_revision); + svn_temp_serializer__add_string(context, &node->created_path); + } + + /* The deserializer will use its own pool. 
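     We therefore null out our pool pointer rather than serializing it;
     svn_fs_x__dag_deserialize() installs the target pool again when the
     node comes back out of the cache.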
*/ + svn_temp_serializer__set_null(context, + (const void * const *)&node->node_pool); + + /* return serialized data */ + serialized = svn_temp_serializer__get(context); + *data = serialized->data; + *data_len = serialized->len; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__dag_deserialize(void **out, + void *data, + apr_size_t data_len, + apr_pool_t *pool) +{ + dag_node_t *node = (dag_node_t *)data; + if (data_len == 0) + return svn_error_create(SVN_ERR_FS_CORRUPT, NULL, + _("Empty noderev in cache")); + + /* Copy the _full_ buffer as it also contains the sub-structures. */ + node->fs = NULL; + + /* fixup all references to sub-structures */ + svn_fs_x__noderev_deserialize(node, &node->node_revision, pool); + node->node_pool = pool; + + if (node->node_revision) + node->created_path = node->node_revision->created_path; + else + svn_temp_deserializer__resolve(node, (void**)&node->created_path); + + /* return result */ + *out = node; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__dag_open(dag_node_t **child_p, + dag_node_t *parent, + const char *name, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_x__id_t node_id; + + /* Ensure that NAME exists in PARENT's entry list. */ + SVN_ERR(dir_entry_id_from_node(&node_id, parent, name, scratch_pool)); + if (! svn_fs_x__id_used(&node_id)) + { + *child_p = NULL; + return SVN_NO_ERROR; + } + + /* Now get the node that was requested. */ + return svn_fs_x__dag_get_node(child_p, svn_fs_x__dag_get_fs(parent), + &node_id, result_pool, scratch_pool); +} + + +svn_error_t * +svn_fs_x__dag_copy(dag_node_t *to_node, + const char *entry, + dag_node_t *from_node, + svn_boolean_t preserve_history, + svn_revnum_t from_rev, + const char *from_path, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *scratch_pool) +{ + const svn_fs_x__id_t *id; + + if (preserve_history) + { + svn_fs_x__noderev_t *from_noderev, *to_noderev; + svn_fs_x__id_t copy_id; + svn_fs_t *fs = svn_fs_x__dag_get_fs(from_node); + + /* Make a copy of the original node revision. */ + SVN_ERR(get_node_revision(&from_noderev, from_node)); + to_noderev = copy_node_revision(from_noderev, scratch_pool); + + /* Reserve a copy ID for this new copy. */ + SVN_ERR(svn_fs_x__reserve_copy_id(©_id, fs, txn_id, scratch_pool)); + + /* Create a successor with its predecessor pointing at the copy + source. */ + to_noderev->predecessor_id = to_noderev->noderev_id; + to_noderev->predecessor_count++; + to_noderev->created_path = + svn_fspath__join(svn_fs_x__dag_get_created_path(to_node), entry, + scratch_pool); + to_noderev->copyfrom_path = apr_pstrdup(scratch_pool, from_path); + to_noderev->copyfrom_rev = from_rev; + + /* Set the copyroot equal to our own id. */ + to_noderev->copyroot_path = NULL; + + SVN_ERR(svn_fs_x__create_successor(fs, to_noderev, + ©_id, txn_id, scratch_pool)); + id = &to_noderev->noderev_id; + } + else /* don't preserve history */ + { + id = svn_fs_x__dag_get_id(from_node); + } + + /* Set the entry in to_node to the new id. */ + return svn_fs_x__dag_set_entry(to_node, entry, id, from_node->kind, + txn_id, scratch_pool); +} + + + +/*** Comparison. ***/ + +svn_error_t * +svn_fs_x__dag_things_different(svn_boolean_t *props_changed, + svn_boolean_t *contents_changed, + dag_node_t *node1, + dag_node_t *node2, + svn_boolean_t strict, + apr_pool_t *scratch_pool) +{ + svn_fs_x__noderev_t *noderev1, *noderev2; + svn_fs_t *fs; + svn_boolean_t same; + + /* If we have no place to store our results, don't bother doing + anything. */ + if (! props_changed && ! 
contents_changed) + return SVN_NO_ERROR; + + fs = svn_fs_x__dag_get_fs(node1); + + /* The node revision skels for these two nodes. */ + SVN_ERR(get_node_revision(&noderev1, node1)); + SVN_ERR(get_node_revision(&noderev2, node2)); + + /* Compare property keys. */ + if (props_changed != NULL) + { + SVN_ERR(svn_fs_x__prop_rep_equal(&same, fs, noderev1, noderev2, + strict, scratch_pool)); + *props_changed = !same; + } + + /* Compare contents keys. */ + if (contents_changed != NULL) + *contents_changed = !svn_fs_x__file_text_rep_equal(noderev1->data_rep, + noderev2->data_rep); + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__dag_get_copyroot(svn_revnum_t *rev, + const char **path, + dag_node_t *node) +{ + svn_fs_x__noderev_t *noderev; + + /* Go get a fresh node-revision for NODE. */ + SVN_ERR(get_node_revision(&noderev, node)); + + *rev = noderev->copyroot_rev; + *path = noderev->copyroot_path; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__dag_get_copyfrom_rev(svn_revnum_t *rev, + dag_node_t *node) +{ + svn_fs_x__noderev_t *noderev; + + /* Go get a fresh node-revision for NODE. */ + SVN_ERR(get_node_revision(&noderev, node)); + + *rev = noderev->copyfrom_rev; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__dag_get_copyfrom_path(const char **path, + dag_node_t *node) +{ + svn_fs_x__noderev_t *noderev; + + /* Go get a fresh node-revision for NODE. */ + SVN_ERR(get_node_revision(&noderev, node)); + + *path = noderev->copyfrom_path; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__dag_update_ancestry(dag_node_t *target, + dag_node_t *source, + apr_pool_t *scratch_pool) +{ + svn_fs_x__noderev_t *source_noderev, *target_noderev; + + if (! svn_fs_x__dag_check_mutable(target)) + return svn_error_createf + (SVN_ERR_FS_NOT_MUTABLE, NULL, + _("Attempted to update ancestry of non-mutable node")); + + SVN_ERR(get_node_revision(&source_noderev, source)); + SVN_ERR(get_node_revision(&target_noderev, target)); + + target_noderev->predecessor_id = source_noderev->noderev_id; + target_noderev->predecessor_count = source_noderev->predecessor_count; + target_noderev->predecessor_count++; + + return svn_fs_x__put_node_revision(target->fs, target_noderev, + scratch_pool); +} diff --git a/subversion/libsvn_fs_x/dag.h b/subversion/libsvn_fs_x/dag.h new file mode 100644 index 0000000..6d5e85b --- /dev/null +++ b/subversion/libsvn_fs_x/dag.h @@ -0,0 +1,580 @@ +/* dag.h : DAG-like interface filesystem, private to libsvn_fs + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ * ==================================================================== + */ + +#ifndef SVN_LIBSVN_FS_DAG_H +#define SVN_LIBSVN_FS_DAG_H + +#include "svn_fs.h" +#include "svn_delta.h" +#include "private/svn_cache.h" + +#include "fs.h" +#include "id.h" + +#ifdef __cplusplus +extern "C" { +#endif /* __cplusplus */ + + +/* The interface in this file provides all the essential filesystem + operations, but exposes the filesystem's DAG structure. This makes + it simpler to implement than the public interface, since a client + of this interface has to understand and cope with shared structure + directly as it appears in the database. However, it's still a + self-consistent set of invariants to maintain, making it + (hopefully) a useful interface boundary. + + In other words: + + - The dag_node_t interface exposes the internal DAG structure of + the filesystem, while the svn_fs.h interface does any cloning + necessary to make the filesystem look like a tree. + + - The dag_node_t interface exposes the existence of copy nodes, + whereas the svn_fs.h handles them transparently. + + - dag_node_t's must be explicitly cloned, whereas the svn_fs.h + operations make clones implicitly. + + - Callers of the dag_node_t interface use Berkeley DB transactions + to ensure consistency between operations, while callers of the + svn_fs.h interface use Subversion transactions. */ + + +/* Generic DAG node stuff. */ + +typedef struct dag_node_t dag_node_t; + +/* Fill *NODE with a dag_node_t representing node revision ID in FS, + allocating in RESULT_POOL. Use SCRATCH_POOL for temporaries. */ +svn_error_t * +svn_fs_x__dag_get_node(dag_node_t **node, + svn_fs_t *fs, + const svn_fs_x__id_t *id, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + + +/* Return a new dag_node_t object referring to the same node as NODE, + allocated in RESULT_POOL. If you're trying to build a structure in a + pool that wants to refer to dag nodes that may have been allocated + elsewhere, you can call this function and avoid inter-pool pointers. */ +dag_node_t * +svn_fs_x__dag_dup(const dag_node_t *node, + apr_pool_t *result_pool); + +/* If NODE has been allocated in POOL, return NODE. Otherwise, return + a copy created in RESULT_POOL with svn_fs_fs__dag_dup. */ +dag_node_t * +svn_fs_x__dag_copy_into_pool(dag_node_t *node, + apr_pool_t *result_pool); + +/* Serialize a DAG node, except don't try to preserve the 'fs' member. + Implements svn_cache__serialize_func_t */ +svn_error_t * +svn_fs_x__dag_serialize(void **data, + apr_size_t *data_len, + void *in, + apr_pool_t *pool); + +/* Deserialize a DAG node, leaving the 'fs' member as NULL. + Implements svn_cache__deserialize_func_t */ +svn_error_t * +svn_fs_x__dag_deserialize(void **out, + void *data, + apr_size_t data_len, + apr_pool_t *pool); + +/* Return the filesystem containing NODE. */ +svn_fs_t * +svn_fs_x__dag_get_fs(dag_node_t *node); + +/* Changes the filesystem containing NODE to FS. (Used when pulling + nodes out of a shared cache, say.) */ +void +svn_fs_x__dag_set_fs(dag_node_t *node, + svn_fs_t *fs); + + +/* Return NODE's revision number. If NODE has never been committed as + part of a revision, set *REV to SVN_INVALID_REVNUM. */ +svn_revnum_t +svn_fs_x__dag_get_revision(const dag_node_t *node); + + +/* Return the node revision ID of NODE. The value returned is shared + with NODE, and will be deallocated when NODE is. */ +const svn_fs_x__id_t * +svn_fs_x__dag_get_id(const dag_node_t *node); + +/* Return the node ID of NODE. 
The value returned is shared with NODE, + and will be deallocated when NODE is. */ +svn_error_t * +svn_fs_x__dag_get_node_id(svn_fs_x__id_t *node_id, + dag_node_t *node); + +/* Return the copy ID of NODE. The value returned is shared with NODE, + and will be deallocated when NODE is. */ +svn_error_t * +svn_fs_x__dag_get_copy_id(svn_fs_x__id_t *copy_id, + dag_node_t *node); + +/* Set *SAME to TRUE, if nodes LHS and RHS have the same node ID. */ +svn_error_t * +svn_fs_x__dag_related_node(svn_boolean_t *same, + dag_node_t *lhs, + dag_node_t *rhs); + +/* Set *SAME to TRUE, if nodes LHS and RHS have the same node and copy IDs. + */ +svn_error_t * +svn_fs_x__dag_same_line_of_history(svn_boolean_t *same, + dag_node_t *lhs, + dag_node_t *rhs); + +/* Return the created path of NODE. The value returned is shared + with NODE, and will be deallocated when NODE is. */ +const char * +svn_fs_x__dag_get_created_path(dag_node_t *node); + + +/* Set *ID_P to the node revision ID of NODE's immediate predecessor. + */ +svn_error_t * +svn_fs_x__dag_get_predecessor_id(svn_fs_x__id_t *id_p, + dag_node_t *node); + + +/* Set *COUNT to the number of predecessors NODE has (recursively). + */ +/* ### This function is currently only used by 'verify'. */ +svn_error_t * +svn_fs_x__dag_get_predecessor_count(int *count, + dag_node_t *node); + +/* Set *COUNT to the number of node under NODE (inclusive) with + svn:mergeinfo properties. + */ +svn_error_t * +svn_fs_x__dag_get_mergeinfo_count(apr_int64_t *count, + dag_node_t *node); + +/* Set *DO_THEY to a flag indicating whether or not NODE is a + directory with at least one descendant (not including itself) with + svn:mergeinfo. + */ +svn_error_t * +svn_fs_x__dag_has_descendants_with_mergeinfo(svn_boolean_t *do_they, + dag_node_t *node); + +/* Set *HAS_MERGEINFO to a flag indicating whether or not NODE itself + has svn:mergeinfo set on it. + */ +svn_error_t * +svn_fs_x__dag_has_mergeinfo(svn_boolean_t *has_mergeinfo, + dag_node_t *node); + +/* Return non-zero IFF NODE is currently mutable. */ +svn_boolean_t +svn_fs_x__dag_check_mutable(const dag_node_t *node); + +/* Return the node kind of NODE. */ +svn_node_kind_t +svn_fs_x__dag_node_kind(dag_node_t *node); + +/* Set *PROPLIST_P to a PROPLIST hash representing the entire property + list of NODE, allocating from POOL. The hash has const char * + names (the property names) and svn_string_t * values (the property + values). + + If properties do not exist on NODE, *PROPLIST_P will be set to + NULL. + + Allocate the result in RESULT_POOL and use SCRATCH_POOL for temporaries. + */ +svn_error_t * +svn_fs_x__dag_get_proplist(apr_hash_t **proplist_p, + dag_node_t *node, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + +/* Set the property list of NODE to PROPLIST, allocating from POOL. + The node being changed must be mutable. + + Use SCRATCH_POOL for temporary allocations. + */ +svn_error_t * +svn_fs_x__dag_set_proplist(dag_node_t *node, + apr_hash_t *proplist, + apr_pool_t *scratch_pool); + +/* Increment the mergeinfo_count field on NODE by INCREMENT. The node + being changed must be mutable. + + Use SCRATCH_POOL for temporary allocations. + */ +svn_error_t * +svn_fs_x__dag_increment_mergeinfo_count(dag_node_t *node, + apr_int64_t increment, + apr_pool_t *scratch_pool); + +/* Set the has-mergeinfo flag on NODE to HAS_MERGEINFO. The node + being changed must be mutable. + + Use SCRATCH_POOL for temporary allocations. 
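   Purely as an illustrative sketch (hypothetical caller code, not a
   declaration from this header): assuming a mutable dag_node_t *node in
   some transaction, that the node carried no mergeinfo before, and that
   the caller handles errors and pool lifetimes, the proplist and mergeinfo
   setters can be combined roughly like this:

     apr_hash_t *props;
     svn_string_t *val = svn_string_create("/trunk:1-100", scratch_pool);

     SVN_ERR(svn_fs_x__dag_get_proplist(&props, node, scratch_pool,
                                        scratch_pool));
     if (props == NULL)
       props = apr_hash_make(scratch_pool);

     apr_hash_set(props, "svn:mergeinfo", APR_HASH_KEY_STRING, val);
     SVN_ERR(svn_fs_x__dag_set_proplist(node, props, scratch_pool));

     SVN_ERR(svn_fs_x__dag_set_has_mergeinfo(node, TRUE, scratch_pool));
     SVN_ERR(svn_fs_x__dag_increment_mergeinfo_count(node, 1, scratch_pool));

   The mergeinfo value "/trunk:1-100" above is a made-up example.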
+ */ +svn_error_t * +svn_fs_x__dag_set_has_mergeinfo(dag_node_t *node, + svn_boolean_t has_mergeinfo, + apr_pool_t *scratch_pool); + + + +/* Revision and transaction roots. */ + + +/* Open the root of revision REV of filesystem FS, allocating from + RESULT_POOL. Set *NODE_P to the new node. Use SCRATCH_POOL for + temporary allocations.*/ +svn_error_t * +svn_fs_x__dag_revision_root(dag_node_t **node_p, + svn_fs_t *fs, + svn_revnum_t rev, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + + +/* Set *NODE_P to the root of transaction TXN_ID in FS, allocating + from RESULT_POOL. Use SCRATCH_POOL for temporary allocations. */ +svn_error_t * +svn_fs_x__dag_txn_root(dag_node_t **node_p, + svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + + +/* Directories. */ + + +/* Open the node named NAME in the directory PARENT. Set *CHILD_P to + the new node, allocated in RESULT_POOL. NAME must be a single path + component; it cannot be a slash-separated directory path. If NAME does + not exist within PARENT, set *CHILD_P to NULL. + */ +svn_error_t * +svn_fs_x__dag_open(dag_node_t **child_p, + dag_node_t *parent, + const char *name, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + + +/* Set *ENTRIES_P to an array of NODE's entries, sorted by entry names, + and the values are svn_fs_x__dirent_t. The returned table (and elements) + is allocated in RESULT_POOL, temporaries in SCRATCH_POOL. */ +svn_error_t * +svn_fs_x__dag_dir_entries(apr_array_header_t **entries_p, + dag_node_t *node, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + +/* Set ENTRY_NAME in NODE to point to ID (with kind KIND), allocating + from POOL. NODE must be a mutable directory. ID can refer to a + mutable or immutable node. If ENTRY_NAME does not exist, it will + be created. TXN_ID is the Subversion transaction under which this + occurs. + + Use SCRATCH_POOL for temporary allocations. + */ +svn_error_t * +svn_fs_x__dag_set_entry(dag_node_t *node, + const char *entry_name, + const svn_fs_x__id_t *id, + svn_node_kind_t kind, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *scratch_pool); + + +/* Make a new mutable clone of the node named NAME in PARENT, and + adjust PARENT's directory entry to point to it, unless NAME in + PARENT already refers to a mutable node. In either case, set + *CHILD_P to a reference to the new node, allocated in POOL. PARENT + must be mutable. NAME must be a single path component; it cannot + be a slash-separated directory path. PARENT_PATH must be the + canonicalized absolute path of the parent directory. + + COPY_ID, if non-NULL, is a key into the `copies' table, and + indicates that this new node is being created as the result of a + copy operation, and specifically which operation that was. + + PATH is the canonicalized absolute path at which this node is being + created. + + TXN_ID is the Subversion transaction under which this occurs. + + Allocate *CHILD_P in RESULT_POOL and use SCRATCH_POOL for temporaries. + */ +svn_error_t * +svn_fs_x__dag_clone_child(dag_node_t **child_p, + dag_node_t *parent, + const char *parent_path, + const char *name, + const svn_fs_x__id_t *copy_id, + svn_fs_x__txn_id_t txn_id, + svn_boolean_t is_parent_copyroot, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + + +/* Delete the directory entry named NAME from PARENT, allocating from + POOL. PARENT must be mutable. NAME must be a single path + component; it cannot be a slash-separated directory path. 
If the + node being deleted is a mutable directory, remove all mutable nodes + reachable from it. TXN_ID is the Subversion transaction under + which this occurs. + + If return SVN_ERR_FS_NO_SUCH_ENTRY, then there is no entry NAME in + PARENT. + + Use SCRATCH_POOL for temporary allocations. + */ +svn_error_t * +svn_fs_x__dag_delete(dag_node_t *parent, + const char *name, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *scratch_pool); + + +/* Create a new mutable directory named NAME in PARENT. Set *CHILD_P + to a reference to the new node, allocated in RESULT_POOL. The new + directory has no contents, and no properties. PARENT must be + mutable. NAME must be a single path component; it cannot be a + slash-separated directory path. PARENT_PATH must be the + canonicalized absolute path of the parent directory. PARENT must + not currently have an entry named NAME. TXN_ID is the Subversion + transaction under which this occurs. + + Use SCRATCH_POOL for temporary allocations. + */ +svn_error_t * +svn_fs_x__dag_make_dir(dag_node_t **child_p, + dag_node_t *parent, + const char *parent_path, + const char *name, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + + + +/* Files. */ + + +/* Set *CONTENTS to a readable generic stream which yields the + contents of FILE. Allocate the stream in RESULT_POOL. + + If FILE is not a file, return SVN_ERR_FS_NOT_FILE. + */ +svn_error_t * +svn_fs_x__dag_get_contents(svn_stream_t **contents, + dag_node_t *file, + apr_pool_t *result_pool); + +/* Attempt to fetch the contents of NODE and pass it along with the BATON + to the PROCESSOR. Set *SUCCESS only of the data could be provided + and the processor had been called. + + Use SCRATCH_POOL for temporary allocations. + */ +svn_error_t * +svn_fs_x__dag_try_process_file_contents(svn_boolean_t *success, + dag_node_t *node, + svn_fs_process_contents_func_t processor, + void* baton, + apr_pool_t *scratch_pool); + + +/* Set *STREAM_P to a delta stream that will turn the contents of SOURCE into + the contents of TARGET, allocated in RESULT_POOL. If SOURCE is null, the + empty string will be used is its stead. + + Use SCRATCH_POOL for temporary allocations. + */ +svn_error_t * +svn_fs_x__dag_get_file_delta_stream(svn_txdelta_stream_t **stream_p, + dag_node_t *source, + dag_node_t *target, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + +/* Return a generic writable stream in *CONTENTS with which to set the + contents of FILE. Allocate the stream in RESULT_POOL. + + Any previous edits on the file will be deleted, and a new edit + stream will be constructed. + */ +svn_error_t * +svn_fs_x__dag_get_edit_stream(svn_stream_t **contents, + dag_node_t *file, + apr_pool_t *result_pool); + + +/* Signify the completion of edits to FILE made using the stream + returned by svn_fs_x__dag_get_edit_stream. + + If CHECKSUM is non-null, it must match the checksum for FILE's + contents (note: this is not recalculated, the recorded checksum is + used), else the error SVN_ERR_CHECKSUM_MISMATCH is returned. + + This operation is a no-op if no edits are present. + + Use SCRATCH_POOL for temporary allocations. + */ +svn_error_t * +svn_fs_x__dag_finalize_edits(dag_node_t *file, + const svn_checksum_t *checksum, + apr_pool_t *scratch_pool); + + +/* Set *LENGTH to the length of the contents of FILE. + */ +svn_error_t * +svn_fs_x__dag_file_length(svn_filesize_t *length, + dag_node_t *file); + +/* Put the recorded checksum of type KIND for FILE into CHECKSUM, allocating + from RESULT_POOL. 
+ + If no stored checksum is available, do not calculate the checksum, + just put NULL into CHECKSUM. + */ +svn_error_t * +svn_fs_x__dag_file_checksum(svn_checksum_t **checksum, + dag_node_t *file, + svn_checksum_kind_t kind, + apr_pool_t *result_pool); + +/* Create a new mutable file named NAME in PARENT. Set *CHILD_P to a + reference to the new node, allocated in RESULT_POOL. The new file's + contents are the empty string, and it has no properties. PARENT + must be mutable. NAME must be a single path component; it cannot + be a slash-separated directory path. PARENT_PATH must be the + canonicalized absolute path of the parent directory. TXN_ID is the + Subversion transaction under which this occurs. + + Use SCRATCH_POOL for temporary allocations. + */ +svn_error_t * +svn_fs_x__dag_make_file(dag_node_t **child_p, + dag_node_t *parent, + const char *parent_path, + const char *name, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + + + +/* Copies */ + +/* Make ENTRY in TO_NODE be a copy of FROM_NODE. TO_NODE must be mutable. + TXN_ID is the Subversion transaction under which this occurs. + + If PRESERVE_HISTORY is true, the new node will record that it was + copied from FROM_PATH in FROM_REV; therefore, FROM_NODE should be + the node found at FROM_PATH in FROM_REV, although this is not + checked. FROM_PATH should be canonicalized before being passed + here. + + If PRESERVE_HISTORY is false, FROM_PATH and FROM_REV are ignored. + + Use SCRATCH_POOL for temporary allocations. + */ +svn_error_t * +svn_fs_x__dag_copy(dag_node_t *to_node, + const char *entry, + dag_node_t *from_node, + svn_boolean_t preserve_history, + svn_revnum_t from_rev, + const char *from_path, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *scratch_pool); + + +/* Comparison */ + +/* Find out what is the same between two nodes. If STRICT is FALSE, + this function may report false positives, i.e. report changes even + if the resulting contents / props are equal. + + If PROPS_CHANGED is non-null, set *PROPS_CHANGED to 1 if the two + nodes have different property lists, or to 0 if same. + + If CONTENTS_CHANGED is non-null, set *CONTENTS_CHANGED to 1 if the + two nodes have different contents, or to 0 if same. NODE1 and NODE2 + must refer to files from the same filesystem. + + Use SCRATCH_POOL for temporary allocations. + */ +svn_error_t * +svn_fs_x__dag_things_different(svn_boolean_t *props_changed, + svn_boolean_t *contents_changed, + dag_node_t *node1, + dag_node_t *node2, + svn_boolean_t strict, + apr_pool_t *scratch_pool); + + +/* Set *REV and *PATH to the copyroot revision and path of node NODE, or + to SVN_INVALID_REVNUM and NULL if no copyroot exists. + */ +svn_error_t * +svn_fs_x__dag_get_copyroot(svn_revnum_t *rev, + const char **path, + dag_node_t *node); + +/* Set *REV to the copyfrom revision associated with NODE. + */ +svn_error_t * +svn_fs_x__dag_get_copyfrom_rev(svn_revnum_t *rev, + dag_node_t *node); + +/* Set *PATH to the copyfrom path associated with NODE. + */ +svn_error_t * +svn_fs_x__dag_get_copyfrom_path(const char **path, + dag_node_t *node); + +/* Update *TARGET so that SOURCE is it's predecessor. + + Use SCRATCH_POOL for temporary allocations. 
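   To sketch how the comparison and copy-tracking calls above fit together
   (illustration only: "iota", the FS/REV/pool variables and the minimal
   error handling are all hypothetical), a caller could check whether a
   file changed between two revisions and where it was copied from like
   this:

     dag_node_t *old_root, *new_root, *old_node, *new_node;
     svn_boolean_t props_changed, text_changed;
     svn_revnum_t copyfrom_rev;
     const char *copyfrom_path;

     SVN_ERR(svn_fs_x__dag_revision_root(&old_root, fs, old_rev,
                                         pool, scratch_pool));
     SVN_ERR(svn_fs_x__dag_revision_root(&new_root, fs, new_rev,
                                         pool, scratch_pool));
     SVN_ERR(svn_fs_x__dag_open(&old_node, old_root, "iota",
                                pool, scratch_pool));
     SVN_ERR(svn_fs_x__dag_open(&new_node, new_root, "iota",
                                pool, scratch_pool));
     if (old_node && new_node)
       {
         SVN_ERR(svn_fs_x__dag_things_different(&props_changed,
                                                &text_changed,
                                                old_node, new_node, TRUE,
                                                scratch_pool));
         SVN_ERR(svn_fs_x__dag_get_copyfrom_rev(&copyfrom_rev, new_node));
         SVN_ERR(svn_fs_x__dag_get_copyfrom_path(&copyfrom_path, new_node));
       }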
+ */ +svn_error_t * +svn_fs_x__dag_update_ancestry(dag_node_t *target, + dag_node_t *source, + apr_pool_t *scratch_pool); +#ifdef __cplusplus +} +#endif /* __cplusplus */ + +#endif /* SVN_LIBSVN_FS_DAG_H */ diff --git a/subversion/libsvn_fs_x/fs.c b/subversion/libsvn_fs_x/fs.c new file mode 100644 index 0000000..abc564d --- /dev/null +++ b/subversion/libsvn_fs_x/fs.c @@ -0,0 +1,669 @@ +/* fs.c --- creating, opening and closing filesystems + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#include <stdlib.h> +#include <stdio.h> +#include <string.h> + +#include <apr_general.h> +#include <apr_pools.h> +#include <apr_file_io.h> +#include <apr_thread_mutex.h> + +#include "svn_fs.h" +#include "svn_delta.h" +#include "svn_version.h" +#include "svn_pools.h" +#include "fs.h" +#include "fs_x.h" +#include "pack.h" +#include "recovery.h" +#include "hotcopy.h" +#include "verify.h" +#include "tree.h" +#include "lock.h" +#include "id.h" +#include "revprops.h" +#include "rep-cache.h" +#include "transaction.h" +#include "util.h" +#include "svn_private_config.h" +#include "private/svn_fs_util.h" + +#include "../libsvn_fs/fs-loader.h" + +/* A prefix for the pool userdata variables used to hold + per-filesystem shared data. See fs_serialized_init. */ +#define SVN_FSX_SHARED_USERDATA_PREFIX "svn-fsx-shared-" + + + +/* Initialize the part of FS that requires global serialization across all + instances. The caller is responsible of ensuring that serialization. + Use COMMON_POOL for process-wide and SCRATCH_POOL for temporary + allocations. */ +static svn_error_t * +x_serialized_init(svn_fs_t *fs, + apr_pool_t *common_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + const char *key; + void *val; + svn_fs_x__shared_data_t *ffsd; + apr_status_t status; + + /* Note that we are allocating a small amount of long-lived data for + each separate repository opened during the lifetime of the + svn_fs_initialize pool. It's unlikely that anyone will notice + the modest expenditure; the alternative is to allocate each structure + in a subpool, add a reference-count, and add a serialized destructor + to the FS vtable. That's more machinery than it's worth. + + Picking an appropriate key for the shared data is tricky, because, + unfortunately, a filesystem UUID is not really unique. It is implicitly + shared between hotcopied (1), dump / loaded (2) or naively copied (3) + filesystems. We tackle this problem by using a combination of the UUID + and an instance ID as the key. This allows us to avoid key clashing + in (1) and (2). 
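     (For illustration: with the prefix defined above, the key for a given
     filesystem comes out as roughly "svn-fsx-shared-<uuid>:<instance-id>",
     both placeholders being taken from the repository being opened.)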
+ + Speaking of (3), there is not so much we can do about it, except maybe + provide a convenient way of fixing things. Naively copied filesystems + have identical filesystem UUIDs *and* instance IDs. With the key being + a combination of these two, clashes can be fixed by changing either of + them (or both), e.g. with svn_fs_set_uuid(). */ + + + SVN_ERR_ASSERT(fs->uuid); + SVN_ERR_ASSERT(ffd->instance_id); + + key = apr_pstrcat(scratch_pool, SVN_FSX_SHARED_USERDATA_PREFIX, + fs->uuid, ":", ffd->instance_id, SVN_VA_NULL); + status = apr_pool_userdata_get(&val, key, common_pool); + if (status) + return svn_error_wrap_apr(status, _("Can't fetch FSX shared data")); + ffsd = val; + + if (!ffsd) + { + ffsd = apr_pcalloc(common_pool, sizeof(*ffsd)); + ffsd->common_pool = common_pool; + + /* POSIX fcntl locks are per-process, so we need a mutex for + intra-process synchronization when grabbing the repository write + lock. */ + SVN_ERR(svn_mutex__init(&ffsd->fs_write_lock, + SVN_FS_X__USE_LOCK_MUTEX, common_pool)); + + /* ... the pack lock ... */ + SVN_ERR(svn_mutex__init(&ffsd->fs_pack_lock, + SVN_FS_X__USE_LOCK_MUTEX, common_pool)); + + /* ... not to mention locking the txn-current file. */ + SVN_ERR(svn_mutex__init(&ffsd->txn_current_lock, + SVN_FS_X__USE_LOCK_MUTEX, common_pool)); + + /* We also need a mutex for synchronizing access to the active + transaction list and free transaction pointer. */ + SVN_ERR(svn_mutex__init(&ffsd->txn_list_lock, TRUE, common_pool)); + + key = apr_pstrdup(common_pool, key); + status = apr_pool_userdata_set(ffsd, key, NULL, common_pool); + if (status) + return svn_error_wrap_apr(status, _("Can't store FSX shared data")); + } + + ffd->shared = ffsd; + + return SVN_NO_ERROR; +} + + + +/* This function is provided for Subversion 1.0.x compatibility. It + has no effect for fsx backed Subversion filesystems. It conforms + to the fs_library_vtable_t.bdb_set_errcall() API. 
*/ +static svn_error_t * +x_set_errcall(svn_fs_t *fs, + void (*db_errcall_fcn)(const char *errpfx, char *msg)) +{ + + return SVN_NO_ERROR; +} + +typedef struct x_freeze_baton_t { + svn_fs_t *fs; + svn_fs_freeze_func_t freeze_func; + void *freeze_baton; +} x_freeze_baton_t; + +static svn_error_t * +x_freeze_body(void *baton, + apr_pool_t *scratch_pool) +{ + x_freeze_baton_t *b = baton; + svn_boolean_t exists; + + SVN_ERR(svn_fs_x__exists_rep_cache(&exists, b->fs, scratch_pool)); + if (exists) + SVN_ERR(svn_fs_x__with_rep_cache_lock(b->fs, + b->freeze_func, b->freeze_baton, + scratch_pool)); + else + SVN_ERR(b->freeze_func(b->freeze_baton, scratch_pool)); + + return SVN_NO_ERROR; +} + +static svn_error_t * +x_freeze_body2(void *baton, + apr_pool_t *scratch_pool) +{ + x_freeze_baton_t *b = baton; + SVN_ERR(svn_fs_x__with_write_lock(b->fs, x_freeze_body, baton, + scratch_pool)); + + return SVN_NO_ERROR; +} + +static svn_error_t * +x_freeze(svn_fs_t *fs, + svn_fs_freeze_func_t freeze_func, + void *freeze_baton, + apr_pool_t *scratch_pool) +{ + x_freeze_baton_t b; + + b.fs = fs; + b.freeze_func = freeze_func; + b.freeze_baton = freeze_baton; + + SVN_ERR(svn_fs__check_fs(fs, TRUE)); + SVN_ERR(svn_fs_x__with_pack_lock(fs, x_freeze_body2, &b, scratch_pool)); + + return SVN_NO_ERROR; +} + +static svn_error_t * +x_info(const void **fsx_info, + svn_fs_t *fs, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_fs_fsx_info_t *info = apr_palloc(result_pool, sizeof(*info)); + info->fs_type = SVN_FS_TYPE_FSX; + info->shard_size = ffd->max_files_per_dir; + info->min_unpacked_rev = ffd->min_unpacked_rev; + *fsx_info = info; + return SVN_NO_ERROR; +} + +/* Wrapper around svn_fs_x__revision_prop() adapting between function + signatures. */ +static svn_error_t * +x_revision_prop(svn_string_t **value_p, + svn_fs_t *fs, + svn_revnum_t rev, + const char *propname, + apr_pool_t *pool) +{ + apr_pool_t *scratch_pool = svn_pool_create(pool); + SVN_ERR(svn_fs_x__revision_prop(value_p, fs, rev, propname, pool, + scratch_pool)); + svn_pool_destroy(scratch_pool); + + return SVN_NO_ERROR; +} + +/* Wrapper around svn_fs_x__get_revision_proplist() adapting between function + signatures. */ +static svn_error_t * +x_revision_proplist(apr_hash_t **proplist_p, + svn_fs_t *fs, + svn_revnum_t rev, + apr_pool_t *pool) +{ + apr_pool_t *scratch_pool = svn_pool_create(pool); + + /* No need to bypass the caches for r/o access to revprops. */ + SVN_ERR(svn_fs_x__get_revision_proplist(proplist_p, fs, rev, FALSE, + pool, scratch_pool)); + svn_pool_destroy(scratch_pool); + + return SVN_NO_ERROR; +} + +/* Wrapper around svn_fs_x__set_uuid() adapting between function + signatures. */ +static svn_error_t * +x_set_uuid(svn_fs_t *fs, + const char *uuid, + apr_pool_t *scratch_pool) +{ + /* Whenever we set a new UUID, imply that FS will also be a different + * instance (on formats that support this). */ + return svn_error_trace(svn_fs_x__set_uuid(fs, uuid, NULL, scratch_pool)); +} + +/* Wrapper around svn_fs_x__begin_txn() providing the scratch pool. */ +static svn_error_t * +x_begin_txn(svn_fs_txn_t **txn_p, + svn_fs_t *fs, + svn_revnum_t rev, + apr_uint32_t flags, + apr_pool_t *pool) +{ + apr_pool_t *scratch_pool = svn_pool_create(pool); + SVN_ERR(svn_fs_x__begin_txn(txn_p, fs, rev, flags, pool, scratch_pool)); + svn_pool_destroy(scratch_pool); + + return SVN_NO_ERROR; +} + + + +/* The vtable associated with a specific open filesystem. 
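   The FS loader dispatches the public svn_fs_* API through this table, so
   a call such as svn_fs_revision_prop() on an FSX filesystem presumably
   ends up in x_revision_prop() above.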
*/ +static fs_vtable_t fs_vtable = { + svn_fs_x__youngest_rev, + x_revision_prop, + x_revision_proplist, + svn_fs_x__change_rev_prop, + x_set_uuid, + svn_fs_x__revision_root, + x_begin_txn, + svn_fs_x__open_txn, + svn_fs_x__purge_txn, + svn_fs_x__list_transactions, + svn_fs_x__deltify, + svn_fs_x__lock, + svn_fs_x__generate_lock_token, + svn_fs_x__unlock, + svn_fs_x__get_lock, + svn_fs_x__get_locks, + svn_fs_x__info_format, + svn_fs_x__info_config_files, + x_info, + svn_fs_x__verify_root, + x_freeze, + x_set_errcall +}; + + +/* Creating a new filesystem. */ + +/* Set up vtable and fsap_data fields in FS. */ +static svn_error_t * +initialize_fs_struct(svn_fs_t *fs) +{ + svn_fs_x__data_t *ffd = apr_pcalloc(fs->pool, sizeof(*ffd)); + fs->vtable = &fs_vtable; + fs->fsap_data = ffd; + return SVN_NO_ERROR; +} + +/* Reset vtable and fsap_data fields in FS such that the FS is basically + * closed now. Note that FS must not hold locks when you call this. */ +static void +uninitialize_fs_struct(svn_fs_t *fs) +{ + fs->vtable = NULL; + fs->fsap_data = NULL; +} + +/* This implements the fs_library_vtable_t.create() API. Create a new + fsx-backed Subversion filesystem at path PATH and link it into + *FS. + + Perform temporary allocations in SCRATCH_POOL, and fs-global allocations + in COMMON_POOL. The latter must be serialized using COMMON_POOL_LOCK. */ +static svn_error_t * +x_create(svn_fs_t *fs, + const char *path, + svn_mutex__t *common_pool_lock, + apr_pool_t *scratch_pool, + apr_pool_t *common_pool) +{ + SVN_ERR(svn_fs__check_fs(fs, FALSE)); + + SVN_ERR(initialize_fs_struct(fs)); + + SVN_ERR(svn_fs_x__create(fs, path, scratch_pool)); + + SVN_ERR(svn_fs_x__initialize_caches(fs, scratch_pool)); + SVN_MUTEX__WITH_LOCK(common_pool_lock, + x_serialized_init(fs, common_pool, scratch_pool)); + + return SVN_NO_ERROR; +} + + + +/* Gaining access to an existing filesystem. */ + +/* This implements the fs_library_vtable_t.open() API. Open an FSX + Subversion filesystem located at PATH, set *FS to point to the + correct vtable for the filesystem. Use SCRATCH_POOL for any temporary + allocations, and COMMON_POOL for fs-global allocations. + The latter must be serialized using COMMON_POOL_LOCK. */ +static svn_error_t * +x_open(svn_fs_t *fs, + const char *path, + svn_mutex__t *common_pool_lock, + apr_pool_t *scratch_pool, + apr_pool_t *common_pool) +{ + apr_pool_t *subpool = svn_pool_create(scratch_pool); + + SVN_ERR(svn_fs__check_fs(fs, FALSE)); + + SVN_ERR(initialize_fs_struct(fs)); + + SVN_ERR(svn_fs_x__open(fs, path, subpool)); + + SVN_ERR(svn_fs_x__initialize_caches(fs, subpool)); + SVN_MUTEX__WITH_LOCK(common_pool_lock, + x_serialized_init(fs, common_pool, subpool)); + + svn_pool_destroy(subpool); + + return SVN_NO_ERROR; +} + + + +/* This implements the fs_library_vtable_t.open_for_recovery() API. */ +static svn_error_t * +x_open_for_recovery(svn_fs_t *fs, + const char *path, + svn_mutex__t *common_pool_lock, + apr_pool_t *scratch_pool, + apr_pool_t *common_pool) +{ + svn_error_t * err; + svn_revnum_t youngest_rev; + apr_pool_t * subpool = svn_pool_create(scratch_pool); + + /* Recovery for FSFS is currently limited to recreating the 'current' + file from the latest revision. */ + + /* The only thing we have to watch out for is that the 'current' file + might not exist or contain garbage. So we'll try to read it here + and provide or replace the existing file if we couldn't read it. + (We'll also need it to exist later anyway as a source for the new + file's permissions). 
*/ + + /* Use a partly-filled fs pointer first to create 'current'. */ + fs->path = apr_pstrdup(fs->pool, path); + + SVN_ERR(initialize_fs_struct(fs)); + + /* Figure out the repo format and check that we can even handle it. */ + SVN_ERR(svn_fs_x__read_format_file(fs, subpool)); + + /* Now, read 'current' and try to patch it if necessary. */ + err = svn_fs_x__youngest_rev(&youngest_rev, fs, subpool); + if (err) + { + const char *file_path; + + /* 'current' file is missing or contains garbage. Since we are trying + * to recover from whatever problem there is, being picky about the + * error code here won't do us much good. If there is a persistent + * problem that we can't fix, it will show up when we try rewrite the + * file a few lines further below and we will report the failure back + * to the caller. + * + * Start recovery with HEAD = 0. */ + svn_error_clear(err); + file_path = svn_fs_x__path_current(fs, subpool); + + /* Best effort to ensure the file exists and is valid. + * This may fail for r/o filesystems etc. */ + SVN_ERR(svn_io_remove_file2(file_path, TRUE, subpool)); + SVN_ERR(svn_io_file_create_empty(file_path, subpool)); + SVN_ERR(svn_fs_x__write_current(fs, 0, subpool)); + } + + uninitialize_fs_struct(fs); + svn_pool_destroy(subpool); + + /* Now open the filesystem properly by calling the vtable method directly. */ + return x_open(fs, path, common_pool_lock, scratch_pool, common_pool); +} + + + +/* This implements the fs_library_vtable_t.upgrade_fs() API. */ +static svn_error_t * +x_upgrade(svn_fs_t *fs, + const char *path, + svn_fs_upgrade_notify_t notify_func, + void *notify_baton, + svn_cancel_func_t cancel_func, + void *cancel_baton, + svn_mutex__t *common_pool_lock, + apr_pool_t *scratch_pool, + apr_pool_t *common_pool) +{ + SVN_ERR(x_open(fs, path, common_pool_lock, scratch_pool, common_pool)); + return svn_fs_x__upgrade(fs, notify_func, notify_baton, + cancel_func, cancel_baton, scratch_pool); +} + +static svn_error_t * +x_verify(svn_fs_t *fs, + const char *path, + svn_revnum_t start, + svn_revnum_t end, + svn_fs_progress_notify_func_t notify_func, + void *notify_baton, + svn_cancel_func_t cancel_func, + void *cancel_baton, + svn_mutex__t *common_pool_lock, + apr_pool_t *scratch_pool, + apr_pool_t *common_pool) +{ + SVN_ERR(x_open(fs, path, common_pool_lock, scratch_pool, common_pool)); + return svn_fs_x__verify(fs, start, end, notify_func, notify_baton, + cancel_func, cancel_baton, scratch_pool); +} + +static svn_error_t * +x_pack(svn_fs_t *fs, + const char *path, + svn_fs_pack_notify_t notify_func, + void *notify_baton, + svn_cancel_func_t cancel_func, + void *cancel_baton, + svn_mutex__t *common_pool_lock, + apr_pool_t *scratch_pool, + apr_pool_t *common_pool) +{ + SVN_ERR(x_open(fs, path, common_pool_lock, scratch_pool, common_pool)); + return svn_fs_x__pack(fs, notify_func, notify_baton, + cancel_func, cancel_baton, scratch_pool); +} + + + + +/* This implements the fs_library_vtable_t.hotcopy() API. Copy a + possibly live Subversion filesystem SRC_FS from SRC_PATH to a + DST_FS at DEST_PATH. If INCREMENTAL is TRUE, make an effort not to + re-copy data which already exists in DST_FS. + The CLEAN_LOGS argument is ignored and included for Subversion + 1.0.x compatibility. The NOTIFY_FUNC and NOTIFY_BATON arguments + are also currently ignored. + Perform all temporary allocations in SCRATCH_POOL. 
*/ +static svn_error_t * +x_hotcopy(svn_fs_t *src_fs, + svn_fs_t *dst_fs, + const char *src_path, + const char *dst_path, + svn_boolean_t clean_logs, + svn_boolean_t incremental, + svn_fs_hotcopy_notify_t notify_func, + void *notify_baton, + svn_cancel_func_t cancel_func, + void *cancel_baton, + svn_mutex__t *common_pool_lock, + apr_pool_t *scratch_pool, + apr_pool_t *common_pool) +{ + /* Open the source repo as usual. */ + SVN_ERR(x_open(src_fs, src_path, common_pool_lock, scratch_pool, + common_pool)); + if (cancel_func) + SVN_ERR(cancel_func(cancel_baton)); + + /* Test target repo when in INCREMENTAL mode, initialize it when not. + * For this, we need our FS internal data structures to be temporarily + * available. */ + SVN_ERR(initialize_fs_struct(dst_fs)); + SVN_ERR(svn_fs_x__hotcopy_prepare_target(src_fs, dst_fs, dst_path, + incremental, scratch_pool)); + uninitialize_fs_struct(dst_fs); + + /* Now, the destination repo should open just fine. */ + SVN_ERR(x_open(dst_fs, dst_path, common_pool_lock, scratch_pool, + common_pool)); + if (cancel_func) + SVN_ERR(cancel_func(cancel_baton)); + + /* Now, we may copy data as needed ... */ + return svn_fs_x__hotcopy(src_fs, dst_fs, incremental, + notify_func, notify_baton, + cancel_func, cancel_baton, scratch_pool); +} + + + +/* This function is included for Subversion 1.0.x compatibility. It + has no effect for fsx backed Subversion filesystems. It conforms + to the fs_library_vtable_t.bdb_logfiles() API. */ +static svn_error_t * +x_logfiles(apr_array_header_t **logfiles, + const char *path, + svn_boolean_t only_unused, + apr_pool_t *pool) +{ + /* A no-op for FSX. */ + *logfiles = apr_array_make(pool, 0, sizeof(const char *)); + + return SVN_NO_ERROR; +} + + + + + +/* Delete the filesystem located at path PATH. Perform any temporary + allocations in SCRATCH_POOL. */ +static svn_error_t * +x_delete_fs(const char *path, + apr_pool_t *scratch_pool) +{ + /* Remove everything. */ + return svn_error_trace(svn_io_remove_dir2(path, FALSE, NULL, NULL, + scratch_pool)); +} + +static const svn_version_t * +x_version(void) +{ + SVN_VERSION_BODY; +} + +static const char * +x_get_description(void) +{ + return _("Module for working with an experimental (FSX) repository."); +} + +static svn_error_t * +x_set_svn_fs_open(svn_fs_t *fs, + svn_error_t *(*svn_fs_open_)(svn_fs_t **, + const char *, + apr_hash_t *, + apr_pool_t *, + apr_pool_t *)) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + ffd->svn_fs_open_ = svn_fs_open_; + return SVN_NO_ERROR; +} + +static void * +x_info_dup(const void *fsx_info_void, + apr_pool_t *result_pool) +{ + /* All fields are either ints or static strings. */ + const svn_fs_fsx_info_t *fsx_info = fsx_info_void; + return apr_pmemdup(result_pool, fsx_info, sizeof(*fsx_info)); +} + + +/* Base FS library vtable, used by the FS loader library. */ + +static fs_library_vtable_t library_vtable = { + x_version, + x_create, + x_open, + x_open_for_recovery, + x_upgrade, + x_verify, + x_delete_fs, + x_hotcopy, + x_get_description, + svn_fs_x__recover, + x_pack, + x_logfiles, + NULL /* parse_id */, + x_set_svn_fs_open, + x_info_dup +}; + +svn_error_t * +svn_fs_x__init(const svn_version_t *loader_version, + fs_library_vtable_t **vtable, + apr_pool_t* common_pool) +{ + static const svn_version_checklist_t checklist[] = + { + { "svn_subr", svn_subr_version }, + { "svn_delta", svn_delta_version }, + { "svn_fs_util", svn_fs_util__version }, + { NULL, NULL } + }; + + /* Simplified version check to make sure we can safely use the + VTABLE parameter. 
The FS loader does a more exhaustive check. */ + if (loader_version->major != SVN_VER_MAJOR) + return svn_error_createf(SVN_ERR_VERSION_MISMATCH, NULL, + _("Unsupported FS loader version (%d) for fsx"), + loader_version->major); + SVN_ERR(svn_ver_check_list2(x_version(), checklist, svn_ver_equal)); + + *vtable = &library_vtable; + return SVN_NO_ERROR; +} diff --git a/subversion/libsvn_fs_x/fs.h b/subversion/libsvn_fs_x/fs.h new file mode 100644 index 0000000..afb4b2a --- /dev/null +++ b/subversion/libsvn_fs_x/fs.h @@ -0,0 +1,574 @@ +/* fs.h : interface to Subversion filesystem, private to libsvn_fs + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#ifndef SVN_LIBSVN_FS_X_H +#define SVN_LIBSVN_FS_X_H + +#include <apr_pools.h> +#include <apr_hash.h> +#include <apr_network_io.h> +#include <apr_md5.h> +#include <apr_sha1.h> + +#include "svn_fs.h" +#include "svn_config.h" +#include "private/svn_atomic.h" +#include "private/svn_cache.h" +#include "private/svn_fs_private.h" +#include "private/svn_sqlite.h" +#include "private/svn_mutex.h" + +#include "id.h" + +#ifdef __cplusplus +extern "C" { +#endif /* __cplusplus */ + + +/*** The filesystem structure. ***/ + +/* Following are defines that specify the textual elements of the + native filesystem directories and revision files. */ + +/* Names of special files in the fs_x filesystem. */ +#define PATH_FORMAT "format" /* Contains format number */ +#define PATH_UUID "uuid" /* Contains UUID */ +#define PATH_CURRENT "current" /* Youngest revision */ +#define PATH_LOCK_FILE "write-lock" /* Revision lock file */ +#define PATH_PACK_LOCK_FILE "pack-lock" /* Pack lock file */ +#define PATH_REVS_DIR "revs" /* Directory of revisions */ +#define PATH_REVPROPS_DIR "revprops" /* Directory of revprops */ +#define PATH_TXNS_DIR "transactions" /* Directory of transactions */ +#define PATH_NODE_ORIGINS_DIR "node-origins" /* Lazy node-origin cache */ +#define PATH_TXN_PROTOS_DIR "txn-protorevs" /* Directory of proto-revs */ +#define PATH_TXN_CURRENT "txn-current" /* File with next txn key */ +#define PATH_TXN_CURRENT_LOCK "txn-current-lock" /* Lock for txn-current */ +#define PATH_LOCKS_DIR "locks" /* Directory of locks */ +#define PATH_MIN_UNPACKED_REV "min-unpacked-rev" /* Oldest revision which + has not been packed. 
*/ +#define PATH_REVPROP_GENERATION "revprop-generation" + /* Current revprop generation*/ +#define PATH_MANIFEST "manifest" /* Manifest file name */ +#define PATH_PACKED "pack" /* Packed revision data file */ +#define PATH_EXT_PACKED_SHARD ".pack" /* Extension for packed + shards */ +#define PATH_EXT_L2P_INDEX ".l2p" /* extension of the log- + to-phys index */ +#define PATH_EXT_P2L_INDEX ".p2l" /* extension of the phys- + to-log index */ +/* If you change this, look at tests/svn_test_fs.c(maybe_install_fsx_conf) */ +#define PATH_CONFIG "fsx.conf" /* Configuration */ + +/* Names of special files and file extensions for transactions */ +#define PATH_CHANGES "changes" /* Records changes made so far */ +#define PATH_TXN_PROPS "props" /* Transaction properties */ +#define PATH_TXN_PROPS_FINAL "props-final" /* Final transaction properties + before moving to revprops */ +#define PATH_NEXT_IDS "next-ids" /* Next temporary ID assignments */ +#define PATH_PREFIX_NODE "node." /* Prefix for node filename */ +#define PATH_EXT_TXN ".txn" /* Extension of txn dir */ +#define PATH_EXT_CHILDREN ".children" /* Extension for dir contents */ +#define PATH_EXT_PROPS ".props" /* Extension for node props */ +#define PATH_EXT_REV ".rev" /* Extension of protorev file */ +#define PATH_EXT_REV_LOCK ".rev-lock" /* Extension of protorev lock file */ +#define PATH_TXN_ITEM_INDEX "itemidx" /* File containing the current item + index number */ +#define PATH_INDEX "index" /* name of index files w/o ext */ + +/* Names of files in legacy FS formats */ +#define PATH_REV "rev" /* Proto rev file */ +#define PATH_REV_LOCK "rev-lock" /* Proto rev (write) lock file */ + +/* Names of sections and options in fsx.conf. */ +#define CONFIG_SECTION_CACHES "caches" +#define CONFIG_OPTION_FAIL_STOP "fail-stop" +#define CONFIG_SECTION_REP_SHARING "rep-sharing" +#define CONFIG_OPTION_ENABLE_REP_SHARING "enable-rep-sharing" +#define CONFIG_SECTION_DELTIFICATION "deltification" +#define CONFIG_OPTION_MAX_DELTIFICATION_WALK "max-deltification-walk" +#define CONFIG_OPTION_MAX_LINEAR_DELTIFICATION "max-linear-deltification" +#define CONFIG_OPTION_COMPRESSION_LEVEL "compression-level" +#define CONFIG_SECTION_PACKED_REVPROPS "packed-revprops" +#define CONFIG_OPTION_REVPROP_PACK_SIZE "revprop-pack-size" +#define CONFIG_OPTION_COMPRESS_PACKED_REVPROPS "compress-packed-revprops" +#define CONFIG_SECTION_IO "io" +#define CONFIG_OPTION_BLOCK_SIZE "block-size" +#define CONFIG_OPTION_L2P_PAGE_SIZE "l2p-page-size" +#define CONFIG_OPTION_P2L_PAGE_SIZE "p2l-page-size" +#define CONFIG_SECTION_DEBUG "debug" +#define CONFIG_OPTION_PACK_AFTER_COMMIT "pack-after-commit" + +/* The format number of this filesystem. + This is independent of the repository format number, and + independent of any other FS back ends. + + Note: If you bump this, please update the switch statement in + svn_fs_x__create() as well. + */ +#define SVN_FS_X__FORMAT_NUMBER 1 + +/* On most operating systems apr implements file locks per process, not + per file. On Windows apr implements the locking as per file handle + locks, so we don't have to add our own mutex for just in-process + synchronization. */ +#if APR_HAS_THREADS && !defined(WIN32) +#define SVN_FS_X__USE_LOCK_MUTEX 1 +#else +#define SVN_FS_X__USE_LOCK_MUTEX 0 +#endif + +/* Private FSX-specific data shared between all svn_txn_t objects that + relate to a particular transaction in a filesystem (as identified + by transaction id and filesystem UUID). Objects of this type are + allocated in their own subpool of the common pool. 
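
Taken together, the PATH_* constants above describe the on-disk layout of an FSX filesystem. A rough sketch of a freshly created, unpacked filesystem directory (default shard size 1000; illustrative and not exhaustive):

    format              "1\nlayout sharded 1000\n"
    uuid                repository UUID plus instance ID
    current             youngest revision
    fsx.conf            run-time configuration
    min-unpacked-rev    oldest revision not yet packed
    write-lock, pack-lock, txn-current, txn-current-lock
    revs/0/0            revision data for r0 (shard 0)
    revprops/0/0        revision properties for r0
    transactions/       one <txn-id>.txn directory per open transaction
    txn-protorevs/      <txn-id>.rev and <txn-id>.rev-lock files
    locks/              path lock data
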
*/ +typedef struct svn_fs_x__shared_txn_data_t +{ + /* The next transaction in the list, or NULL if there is no following + transaction. */ + struct svn_fs_x__shared_txn_data_t *next; + + /* ID of this transaction. */ + svn_fs_x__txn_id_t txn_id; + + /* Whether the transaction's prototype revision file is locked for + writing by any thread in this process (including the current + thread; recursive locks are not permitted). This is effectively + a non-recursive mutex. */ + svn_boolean_t being_written; + + /* The pool in which this object has been allocated; a subpool of the + common pool. */ + apr_pool_t *pool; +} svn_fs_x__shared_txn_data_t; + +/* Private FSX-specific data shared between all svn_fs_t objects that + relate to a particular filesystem, as identified by filesystem UUID. + Objects of this type are allocated in the common pool. */ +typedef struct svn_fs_x__shared_data_t +{ + /* A list of shared transaction objects for each transaction that is + currently active, or NULL if none are. All access to this list, + including the contents of the objects stored in it, is synchronised + under TXN_LIST_LOCK. */ + svn_fs_x__shared_txn_data_t *txns; + + /* A free transaction object, or NULL if there is no free object. + Access to this object is synchronised under TXN_LIST_LOCK. */ + svn_fs_x__shared_txn_data_t *free_txn; + + /* The following lock must be taken out in reverse order of their + declaration here. Any subset may be acquired and held at any given + time but their relative acquisition order must not change. + + (lock 'txn-current' before 'pack' before 'write' before 'txn-list') */ + + /* A lock for intra-process synchronization when accessing the TXNS list. */ + svn_mutex__t *txn_list_lock; + + /* A lock for intra-process synchronization when grabbing the + repository write lock. */ + svn_mutex__t *fs_write_lock; + + /* A lock for intra-process synchronization when grabbing the + repository pack operation lock. */ + svn_mutex__t *fs_pack_lock; + + /* A lock for intra-process synchronization when locking the + txn-current file. */ + svn_mutex__t *txn_current_lock; + + /* The common pool, under which this object is allocated, subpools + of which are used to allocate the transaction objects. */ + apr_pool_t *common_pool; +} svn_fs_x__shared_data_t; + +/* Data structure for the 1st level DAG node cache. */ +typedef struct svn_fs_x__dag_cache_t svn_fs_x__dag_cache_t; + +/* Key type for all caches that use revision + offset / counter as key. + + Note: Cache keys should be 16 bytes for best performance and there + should be no padding. */ +typedef struct svn_fs_x__pair_cache_key_t +{ + /* The object's revision. Use the 64 data type to prevent padding. */ + apr_int64_t revision; + + /* Sub-address: item index, revprop generation, packed flag, etc. */ + apr_int64_t second; +} svn_fs_x__pair_cache_key_t; + +/* Key type that identifies a representation / rep header. + + Note: Cache keys should require no padding. */ +typedef struct svn_fs_x__representation_cache_key_t +{ + /* Revision that contains the representation */ + apr_int64_t revision; + + /* Packed or non-packed representation (boolean)? */ + apr_int64_t is_packed; + + /* Item index of the representation */ + apr_uint64_t item_index; +} svn_fs_x__representation_cache_key_t; + +/* Key type that identifies a txdelta window. + + Note: Cache keys should require no padding. */ +typedef struct svn_fs_x__window_cache_key_t +{ + /* The object's revision. Use the 64 data type to prevent padding. 
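
The no-padding requirement stated for these key types can be made explicit at compile time; a minimal sketch (the typedef name is only an illustration and is not part of this header):

    /* Fails to compile if the compiler inserted padding into the pair key,
       i.e. if the key is not exactly two 64-bit integers. */
    typedef char svn_fs_x__pair_key_size_check
        [sizeof(svn_fs_x__pair_cache_key_t) == 2 * sizeof(apr_int64_t)
         ? 1 : -1];
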
*/ + apr_int64_t revision; + + /* Window number within that representation. */ + apr_int64_t chunk_index; + + /* Item index of the representation */ + apr_uint64_t item_index; +} svn_fs_x__window_cache_key_t; + +/* Private (non-shared) FSX-specific data for each svn_fs_t object. + Any caches in here may be NULL. */ +typedef struct svn_fs_x__data_t +{ + /* The format number of this FS. */ + int format; + + /* The maximum number of files to store per directory. */ + int max_files_per_dir; + + /* Rev / pack file read granularity in bytes. */ + apr_int64_t block_size; + + /* Rev / pack file granularity (in bytes) covered by a single phys-to-log + * index page. */ + /* Capacity in entries of log-to-phys index pages */ + apr_int64_t l2p_page_size; + + /* Rev / pack file granularity covered by phys-to-log index pages */ + apr_int64_t p2l_page_size; + + /* The revision that was youngest, last time we checked. */ + svn_revnum_t youngest_rev_cache; + + /* Caches of immutable data. (Note that these may be shared between + multiple svn_fs_t's for the same filesystem.) */ + + /* Access to the configured memcached instances. May be NULL. */ + svn_memcache_t *memcache; + + /* If TRUE, don't ignore any cache-related errors. If FALSE, errors from + e.g. memcached may be ignored as caching is an optional feature. */ + svn_boolean_t fail_stop; + + /* Caches native dag_node_t* instances and acts as a 1st level cache */ + svn_fs_x__dag_cache_t *dag_node_cache; + + /* DAG node cache for immutable nodes. Maps (revision, fspath) + to (dag_node_t *). This is the 2nd level cache for DAG nodes. */ + svn_cache__t *rev_node_cache; + + /* A cache of the contents of immutable directories; maps from + unparsed FS ID to a apr_hash_t * mapping (const char *) dirent + names to (svn_fs_x__dirent_t *). */ + svn_cache__t *dir_cache; + + /* Fulltext cache; currently only used with memcached. Maps from + rep key (revision/offset) to svn_stringbuf_t. */ + svn_cache__t *fulltext_cache; + + /* Access object to the revprop "generation". Will be NULL until + the first access. May be also get closed and set to NULL again. */ + apr_file_t *revprop_generation_file; + + /* Revision property cache. Maps from (rev,generation) to apr_hash_t. */ + svn_cache__t *revprop_cache; + + /* Node properties cache. Maps from rep key to apr_hash_t. */ + svn_cache__t *properties_cache; + + /* Pack manifest cache; a cache mapping (svn_revnum_t) shard number to + a manifest; and a manifest is a mapping from (svn_revnum_t) revision + number offset within a shard to (apr_off_t) byte-offset in the + respective pack file. 
*/ + svn_cache__t *packed_offset_cache; + + /* Cache for txdelta_window_t objects; + * the key is svn_fs_x__window_cache_key_t */ + svn_cache__t *txdelta_window_cache; + + /* Cache for combined windows as svn_stringbuf_t objects; + the key is svn_fs_x__window_cache_key_t */ + svn_cache__t *combined_window_cache; + + /* Cache for svn_fs_x__rep_header_t objects; + * the key is (revision, item index) */ + svn_cache__t *node_revision_cache; + + /* Cache for noderevs_t containers; + the key is a (pack file revision, file offset) pair */ + svn_cache__t *noderevs_container_cache; + + /* Cache for change lists as APR arrays of svn_fs_x__change_t * objects; + the key is the revision */ + svn_cache__t *changes_cache; + + /* Cache for change_list_t containers; + the key is a (pack file revision, file offset) pair */ + svn_cache__t *changes_container_cache; + + /* Cache for star-delta / representation containers; + the key is a (pack file revision, file offset) pair */ + svn_cache__t *reps_container_cache; + + /* Cache for svn_fs_x__rep_header_t objects; the key is a + (revision, item index) pair */ + svn_cache__t *rep_header_cache; + + /* Cache for svn_mergeinfo_t objects; the key is a combination of + revision, inheritance flags and path. */ + svn_cache__t *mergeinfo_cache; + + /* Cache for presence of svn_mergeinfo_t on a noderev; the key is a + combination of revision, inheritance flags and path; value is "1" + if the node has mergeinfo, "0" if it doesn't. */ + svn_cache__t *mergeinfo_existence_cache; + + /* Cache for l2p_header_t objects; the key is (revision, is-packed). + Will be NULL for pre-format7 repos */ + svn_cache__t *l2p_header_cache; + + /* Cache for l2p_page_t objects; the key is svn_fs_x__page_cache_key_t. + Will be NULL for pre-format7 repos */ + svn_cache__t *l2p_page_cache; + + /* Cache for p2l_header_t objects; the key is (revision, is-packed). + Will be NULL for pre-format7 repos */ + svn_cache__t *p2l_header_cache; + + /* Cache for apr_array_header_t objects containing svn_fs_x__p2l_entry_t + elements; the key is svn_fs_x__page_cache_key_t. + Will be NULL for pre-format7 repos */ + svn_cache__t *p2l_page_cache; + + /* TRUE while the we hold a lock on the write lock file. */ + svn_boolean_t has_write_lock; + + /* Data shared between all svn_fs_t objects for a given filesystem. */ + svn_fs_x__shared_data_t *shared; + + /* The sqlite database used for rep caching. */ + svn_sqlite__db_t *rep_cache_db; + + /* Thread-safe boolean */ + svn_atomic_t rep_cache_db_opened; + + /* The oldest revision not in a pack file. It also applies to revprops + * if revprop packing has been enabled by the FSX format version. */ + svn_revnum_t min_unpacked_rev; + + /* Whether rep-sharing is supported by the filesystem + * and allowed by the configuration. */ + svn_boolean_t rep_sharing_allowed; + + /* File size limit in bytes up to which multiple revprops shall be packed + * into a single file. */ + apr_int64_t revprop_pack_size; + + /* Whether packed revprop files shall be compressed. */ + svn_boolean_t compress_packed_revprops; + + /* Restart deltification histories after each multiple of this value */ + apr_int64_t max_deltification_walk; + + /* Maximum number of length of the linear part at the top of the + * deltification history after which skip deltas will be used. */ + apr_int64_t max_linear_deltification; + + /* Compression level to use with txdelta storage format in new revs. */ + int delta_compression_level; + + /* Pack after every commit. 
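
Most of the svn_cache__t members above are read through the same get-or-reconstruct pattern. A minimal sketch for a cache keyed by the svn_fs_x__pair_cache_key_t defined earlier (the cache member and local variable names here are placeholders, not fields of this struct):

    svn_fs_x__pair_cache_key_t key = { 0 };
    svn_boolean_t found;
    void *value;

    key.revision = revision;      /* revision containing the item */
    key.second = item_index;      /* sub-address within that revision */

    SVN_ERR(svn_cache__get(&value, &found, some_cache, &key, result_pool));
    if (!found)
      {
        /* Reconstruct the object from disk, then remember it. */
        SVN_ERR(svn_cache__set(some_cache, &key, value, scratch_pool));
      }
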
*/ + svn_boolean_t pack_after_commit; + + /* Per-instance filesystem ID, which provides an additional level of + uniqueness for filesystems that share the same UUID, but should + still be distinguishable (e.g. backups produced by svn_fs_hotcopy() + or dump / load cycles). */ + const char *instance_id; + + /* Pointer to svn_fs_open. */ + svn_error_t *(*svn_fs_open_)(svn_fs_t **, const char *, apr_hash_t *, + apr_pool_t *, apr_pool_t *); +} svn_fs_x__data_t; + + +/*** Filesystem Transaction ***/ +typedef struct svn_fs_x__transaction_t +{ + /* property list (const char * name, svn_string_t * value). + may be NULL if there are no properties. */ + apr_hash_t *proplist; + + /* revision upon which this txn is base. (unfinished only) */ + svn_revnum_t base_rev; + + /* copies list (const char * copy_ids), or NULL if there have been + no copies in this transaction. */ + apr_array_header_t *copies; + +} svn_fs_x__transaction_t; + + +/*** Representation ***/ +/* If you add fields to this, check to see if you need to change + * svn_fs_x__rep_copy. */ +typedef struct svn_fs_x__representation_t +{ + /* Checksums digests for the contents produced by this representation. + This checksum is for the contents the rep shows to consumers, + regardless of how the rep stores the data under the hood. It is + independent of the storage (fulltext, delta, whatever). + + If has_sha1 is FALSE, then for compatibility behave as though this + checksum matches the expected checksum. + + The md5 checksum is always filled, unless this is rep which was + retrieved from the rep-cache. The sha1 checksum is only computed on + a write, for use with rep-sharing. */ + svn_boolean_t has_sha1; + unsigned char sha1_digest[APR_SHA1_DIGESTSIZE]; + unsigned char md5_digest[APR_MD5_DIGESTSIZE]; + + /* Change set and item number where this representation is located. */ + svn_fs_x__id_t id; + + /* The size of the representation in bytes as seen in the revision + file. */ + svn_filesize_t size; + + /* The size of the fulltext of the representation. */ + svn_filesize_t expanded_size; + +} svn_fs_x__representation_t; + + +/*** Node-Revision ***/ +/* If you add fields to this, check to see if you need to change + * copy_node_revision in dag.c. */ +typedef struct svn_fs_x__noderev_t +{ + /* Predecessor node revision id. Will be "unused" if there is no + predecessor for this node revision. */ + svn_fs_x__id_t predecessor_id; + + /* The ID of this noderev */ + svn_fs_x__id_t noderev_id; + + /* Identifier of the node that this noderev belongs to. */ + svn_fs_x__id_t node_id; + + /* Copy identifier of this line of history. */ + svn_fs_x__id_t copy_id; + + /* If this node-rev is a copy, where was it copied from? */ + const char *copyfrom_path; + svn_revnum_t copyfrom_rev; + + /* Helper for history tracing, root of the parent tree from whence + this node-rev was copied. */ + svn_revnum_t copyroot_rev; + const char *copyroot_path; + + /* node kind */ + svn_node_kind_t kind; + + /* number of predecessors this node revision has (recursively). */ + int predecessor_count; + + /* representation key for this node's properties. may be NULL if + there are no properties. */ + svn_fs_x__representation_t *prop_rep; + + /* representation for this node's data. may be NULL if there is + no data. */ + svn_fs_x__representation_t *data_rep; + + /* path at which this node first came into existence. */ + const char *created_path; + + /* Does this node itself have svn:mergeinfo? 
*/ + svn_boolean_t has_mergeinfo; + + /* Number of nodes with svn:mergeinfo properties that are + descendants of this node (including it itself) */ + apr_int64_t mergeinfo_count; + +} svn_fs_x__noderev_t; + + +/** The type of a directory entry. */ +typedef struct svn_fs_x__dirent_t +{ + + /** The name of this directory entry. */ + const char *name; + + /** The node revision ID it names. */ + svn_fs_x__id_t id; + + /** The node kind. */ + svn_node_kind_t kind; +} svn_fs_x__dirent_t; + + +/*** Change ***/ +typedef struct svn_fs_x__change_t +{ + /* Path of the change. */ + svn_string_t path; + + /* node revision id of changed path */ + svn_fs_x__id_t noderev_id; + + /* See svn_fs_path_change2_t for a description for the remaining elements. + */ + svn_fs_path_change_kind_t change_kind; + + svn_boolean_t text_mod; + svn_boolean_t prop_mod; + svn_node_kind_t node_kind; + + svn_boolean_t copyfrom_known; + svn_revnum_t copyfrom_rev; + const char *copyfrom_path; + + svn_tristate_t mergeinfo_mod; +} svn_fs_x__change_t; + + +#ifdef __cplusplus +} +#endif /* __cplusplus */ + +#endif /* SVN_LIBSVN_FS_X_H */ diff --git a/subversion/libsvn_fs_x/fs_id.c b/subversion/libsvn_fs_x/fs_id.c new file mode 100644 index 0000000..16f8f26 --- /dev/null +++ b/subversion/libsvn_fs_x/fs_id.c @@ -0,0 +1,319 @@ +/* fs_id.c : FSX's implementation of svn_fs_id_t + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#include "svn_pools.h" + +#include "cached_data.h" +#include "fs_id.h" + +#include "../libsvn_fs/fs-loader.h" +#include "private/svn_string_private.h" + + + +/* Structure holding everything needed to implement svn_fs_id_t for FSX. + */ +typedef struct fs_x__id_t +{ + /* API visible part. + The fsap_data member points to our svn_fs_x__id_context_t object. */ + svn_fs_id_t generic_id; + + /* Private members. + This addresses the DAG node identified by this ID object. + If it refers to a TXN, it may become . */ + svn_fs_x__id_t noderev_id; + +} fs_x__id_t; + + + +/* The state machine behind this is as follows: + + (A) FS passed in during context construction still open and uses a + different pool as the context (Usually the initial state). In that + case, FS_PATH is NULL and we watch for either pool's cleanup. + + Next states: + (B). Transition triggered by FS->POOL cleanup. + (D). Transition triggered by OWNER cleanup. + + (B) FS has been closed but not the OWNER pool, i.e. the context is valid. + FS is NULL, FS_NAME has been set. No cleanup functions are registered. + + Next states: + (C). Transition triggered by successful access to the file system. + (D). Transition triggered by OWNER cleanup. 
+ + (C) FS is open, allocated in the context's OWNER pool (maybe the initial + state but that is atypical). No cleanup functions are registered. + + Next states: + (D). Transition triggered by OWNER cleanup. + + (D) Destroyed. No access nor notification is allowed. + Final state. + + */ +struct svn_fs_x__id_context_t +{ + /* If this is NULL, FS_PATH points to the on-disk path to the file system + we need to re-open the FS. */ + svn_fs_t *fs; + + /* If FS is NULL, this points to the on-disk path to pass into svn_fs_open2 + to reopen the filesystem. Allocated in OWNER. May only be NULL if FS + is not.*/ + const char *fs_path; + + /* If FS is NULL, this points to svn_fs_open() as passed to the library. */ + svn_error_t *(*svn_fs_open_)(svn_fs_t **, + const char *, + apr_hash_t *, + apr_pool_t *, + apr_pool_t *); + + /* Pool that this context struct got allocated in. */ + apr_pool_t *owner; + + /* A sub-pool of ONWER. We use this when querying data from FS. Gets + cleanup up immediately after usage. NULL until needed for the first + time. */ + apr_pool_t *aux_pool; +}; + +/* Forward declaration. */ +static apr_status_t +fs_cleanup(void *baton); + +/* APR pool cleanup notification for the svn_fs_x__id_context_t given as + BATON. Sent at state (A)->(D) transition. */ +static apr_status_t +owner_cleanup(void *baton) +{ + svn_fs_x__id_context_t *context = baton; + + /* Everything in CONTEXT gets cleaned up automatically. + However, we must prevent notifications from other pools. */ + apr_pool_cleanup_kill(context->fs->pool, context, fs_cleanup); + + return APR_SUCCESS; +} + +/* APR pool cleanup notification for the svn_fs_x__id_context_t given as + BATON. Sent at state (A)->(B) transition. */ +static apr_status_t +fs_cleanup(void *baton) +{ + svn_fs_x__id_context_t *context = baton; + svn_fs_x__data_t *ffd = context->fs->fsap_data; + + /* Remember the FS_PATH to potentially reopen and mark the FS as n/a. */ + context->fs_path = apr_pstrdup(context->owner, context->fs->path); + context->svn_fs_open_ = ffd->svn_fs_open_; + context->fs = NULL; + + + /* No need for further notifications because from now on, everything is + allocated in OWNER. */ + apr_pool_cleanup_kill(context->owner, context, owner_cleanup); + + return APR_SUCCESS; +} + +/* Return the filesystem provided by CONTEXT. Re-open it if necessary. + Returns NULL if the FS could not be opened. */ +static svn_fs_t * +get_fs(svn_fs_x__id_context_t *context) +{ + if (!context->fs) + { + svn_error_t *err; + + SVN_ERR_ASSERT_NO_RETURN(context->svn_fs_open_); + + err = context->svn_fs_open_(&context->fs, context->fs_path, NULL, + context->owner, context->owner); + if (err) + { + svn_error_clear(err); + context->fs = NULL; + } + } + + return context->fs; +} + +/* Provide the auto-created auxiliary pool from ID's context object. */ +static apr_pool_t * +get_aux_pool(const fs_x__id_t *id) +{ + svn_fs_x__id_context_t *context = id->generic_id.fsap_data; + if (!context->aux_pool) + context->aux_pool = svn_pool_create(context->owner); + + return context->aux_pool; +} + +/* Return the noderev structure identified by ID. Returns NULL for invalid + IDs or inaccessible repositories. The caller should clear the auxiliary + pool before returning to its respective caller. 
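
A rough caller-side sketch of the lifecycle described by the state machine above (pool and variable names are illustrative only):

    /* (A): FS is open; the context lives in its own result pool. */
    svn_fs_x__id_context_t *ctx
      = svn_fs_x__id_create_context(fs, result_pool);
    svn_fs_id_t *id = svn_fs_x__id_create(ctx, &noderev_id, result_pool);

    /* (A) -> (B): destroying the FS's pool fires fs_cleanup(); the context
       keeps only fs->path and the svn_fs_open_ callback. */
    svn_pool_destroy(fs_owning_pool);

    /* (B) -> (C): the next query through ID, e.g. an ID comparison issued
       by the FS loader, makes get_fs() reopen the filesystem inside the
       context's owner pool. */

    /* (C) or (A) -> (D): destroying the owner pool ends the context. */
    svn_pool_destroy(result_pool);
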
*/ +static svn_fs_x__noderev_t * +get_noderev(const fs_x__id_t *id) +{ + svn_fs_x__noderev_t *result = NULL; + + svn_fs_x__id_context_t *context = id->generic_id.fsap_data; + svn_fs_t *fs = get_fs(context); + apr_pool_t *pool = get_aux_pool(id); + + if (fs) + { + svn_error_t *err = svn_fs_x__get_node_revision(&result, fs, + &id->noderev_id, + pool, pool); + if (err) + { + svn_error_clear(err); + result = NULL; + } + } + + return result; +} + + + +/*** Implement v-table functions ***/ + +/* Implement id_vtable_t.unparse */ +static svn_string_t * +id_unparse(const svn_fs_id_t *fs_id, + apr_pool_t *result_pool) +{ + const fs_x__id_t *id = (const fs_x__id_t *)fs_id; + return svn_fs_x__id_unparse(&id->noderev_id, result_pool); +} + +/* Implement id_vtable_t.compare. + + The result is efficiently computed for matching IDs. The far less + meaningful "common ancestor" relationship has a larger latency when + evaluated first for a given context object. Subsequent calls are + moderately fast. */ +static svn_fs_node_relation_t +id_compare(const svn_fs_id_t *a, + const svn_fs_id_t *b) +{ + const fs_x__id_t *id_a = (const fs_x__id_t *)a; + const fs_x__id_t *id_b = (const fs_x__id_t *)b; + svn_fs_x__noderev_t *noderev_a, *noderev_b; + svn_boolean_t same_node; + + /* Quick check: same IDs? */ + if (svn_fs_x__id_eq(&id_a->noderev_id, &id_b->noderev_id)) + return svn_fs_node_unchanged; + + /* Fetch the nodesrevs, compare the IDs of the nodes they belong to and + clean up any temporaries. If we can't find one of the noderevs, don't + get access to the FS etc., report the IDs as "unrelated" as only + valid / existing things may be related. */ + noderev_a = get_noderev(id_a); + noderev_b = get_noderev(id_b); + + if (noderev_a && noderev_b) + same_node = svn_fs_x__id_eq(&noderev_a->node_id, &noderev_b->node_id); + else + same_node = FALSE; + + svn_pool_clear(get_aux_pool(id_a)); + svn_pool_clear(get_aux_pool(id_b)); + + /* Return result. */ + return same_node ? svn_fs_node_common_ancestor : svn_fs_node_unrelated; +} + + +/* Creating ID's. */ + +static id_vtable_t id_vtable = { + id_unparse, + id_compare +}; + +svn_fs_x__id_context_t * +svn_fs_x__id_create_context(svn_fs_t *fs, + apr_pool_t *result_pool) +{ + svn_fs_x__id_context_t *result = apr_pcalloc(result_pool, sizeof(*result)); + result->fs = fs; + result->owner = result_pool; + + /* Check for a special case: + If the owner of the context also owns the FS, there will be no reason + to notify them of the respective other's cleanup. */ + if (result_pool != fs->pool) + { + /* If the context's owner gets cleaned up before FS, we must disconnect + from the FS. */ + apr_pool_cleanup_register(result_pool, + result, + owner_cleanup, + apr_pool_cleanup_null); + + /* If the FS gets cleaned up before the context's owner, disconnect + from the FS and remember its path on disk to be able to re-open it + later if necessary. */ + apr_pool_cleanup_register(fs->pool, + result, + fs_cleanup, + apr_pool_cleanup_null); + } + + return result; +} + +svn_fs_id_t * +svn_fs_x__id_create(svn_fs_x__id_context_t *context, + const svn_fs_x__id_t *noderev_id, + apr_pool_t *result_pool) +{ + fs_x__id_t *id; + + /* Special case: NULL IDs */ + if (!svn_fs_x__id_used(noderev_id)) + return NULL; + + /* In theory, the CONTEXT might not be owned by POOL. It's FS might even + have been closed. Make sure we have a context owned by POOL. */ + if (context->owner != result_pool) + context = svn_fs_x__id_create_context(get_fs(context), result_pool); + + /* Finally, construct the ID object. 
*/ + id = apr_pcalloc(result_pool, sizeof(*id)); + id->noderev_id = *noderev_id; + + id->generic_id.vtable = &id_vtable; + id->generic_id.fsap_data = context; + + return (svn_fs_id_t *)id; +} diff --git a/subversion/libsvn_fs_x/fs_id.h b/subversion/libsvn_fs_x/fs_id.h new file mode 100644 index 0000000..6d6a08a --- /dev/null +++ b/subversion/libsvn_fs_x/fs_id.h @@ -0,0 +1,62 @@ +/* fs_id.h : FSX's implementation of svn_fs_id_t + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#ifndef SVN_LIBSVN_FS_X_FS_ID_H +#define SVN_LIBSVN_FS_X_FS_ID_H + +#include "id.h" + +#ifdef __cplusplus +extern "C" { +#endif /* __cplusplus */ + +/* Transparent FS access object to be used with FSX's implementation for + svn_fs_id_t. It allows the ID object query data from the respective FS + to check for node relationships etc. It also allows to re-open the repo + after the original svn_fs_t object got cleaned up, i.e. the ID object's + functionality does not depend on any other object's lifetime. + + For efficiency, multiple svn_fs_id_t should share the same context. + */ +typedef struct svn_fs_x__id_context_t svn_fs_x__id_context_t; + +/* Return a context object for filesystem FS; construct it in RESULT_POOL. */ +svn_fs_x__id_context_t * +svn_fs_x__id_create_context(svn_fs_t *fs, + apr_pool_t *result_pool); + +/* Create a permanent ID based on NODEREV_ID, allocated in RESULT_POOL. + For complex requests, access the filesystem provided with CONTEXT. + + For efficiency, CONTEXT should have been created in RESULT_POOL and be + shared between multiple ID instances allocated in the same pool. + */ +svn_fs_id_t * +svn_fs_x__id_create(svn_fs_x__id_context_t *context, + const svn_fs_x__id_t *noderev_id, + apr_pool_t *result_pool); + +#ifdef __cplusplus +} +#endif /* __cplusplus */ + +#endif /* SVN_LIBSVN_FS_X_FS_ID_H */ diff --git a/subversion/libsvn_fs_x/fs_x.c b/subversion/libsvn_fs_x/fs_x.c new file mode 100644 index 0000000..b766b58 --- /dev/null +++ b/subversion/libsvn_fs_x/fs_x.c @@ -0,0 +1,1228 @@ +/* fs_x.c --- filesystem operations specific to fs_x + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#include "fs_x.h" + +#include <apr_uuid.h> + +#include "svn_hash.h" +#include "svn_props.h" +#include "svn_time.h" +#include "svn_dirent_uri.h" +#include "svn_sorts.h" +#include "svn_version.h" + +#include "cached_data.h" +#include "id.h" +#include "rep-cache.h" +#include "revprops.h" +#include "transaction.h" +#include "tree.h" +#include "util.h" +#include "index.h" + +#include "private/svn_fs_util.h" +#include "private/svn_string_private.h" +#include "private/svn_subr_private.h" +#include "../libsvn_fs/fs-loader.h" + +#include "svn_private_config.h" + +/* The default maximum number of files per directory to store in the + rev and revprops directory. The number below is somewhat arbitrary, + and can be overridden by defining the macro while compiling; the + figure of 1000 is reasonable for VFAT filesystems, which are by far + the worst performers in this area. */ +#ifndef SVN_FS_X_DEFAULT_MAX_FILES_PER_DIR +#define SVN_FS_X_DEFAULT_MAX_FILES_PER_DIR 1000 +#endif + +/* Begin deltification after a node history exceeded this this limit. + Useful values are 4 to 64 with 16 being a good compromise between + computational overhead and repository size savings. + Should be a power of 2. + Values < 2 will result in standard skip-delta behavior. */ +#define SVN_FS_X_MAX_LINEAR_DELTIFICATION 16 + +/* Finding a deltification base takes operations proportional to the + number of changes being skipped. To prevent exploding runtime + during commits, limit the deltification range to this value. + Should be a power of 2 minus one. + Values < 1 disable deltification. */ +#define SVN_FS_X_MAX_DELTIFICATION_WALK 1023 + + + + +/* Check that BUF, a nul-terminated buffer of text from format file PATH, + contains only digits at OFFSET and beyond, raising an error if not. + + Uses SCRATCH_POOL for temporary allocation. */ +static svn_error_t * +check_format_file_buffer_numeric(const char *buf, + apr_off_t offset, + const char *path, + apr_pool_t *scratch_pool) +{ + return svn_fs_x__check_file_buffer_numeric(buf, offset, path, "Format", + scratch_pool); +} + +/* Return the error SVN_ERR_FS_UNSUPPORTED_FORMAT if FS's format + number is not the same as a format number supported by this + Subversion. */ +static svn_error_t * +check_format(int format) +{ + /* Put blacklisted versions here. */ + + /* We support all formats from 1-current simultaneously */ + if (1 <= format && format <= SVN_FS_X__FORMAT_NUMBER) + return SVN_NO_ERROR; + + return svn_error_createf(SVN_ERR_FS_UNSUPPORTED_FORMAT, NULL, + _("Expected FS format between '1' and '%d'; found format '%d'"), + SVN_FS_X__FORMAT_NUMBER, format); +} + +/* Read the format file at PATH and set *PFORMAT to the format version found + * and *MAX_FILES_PER_DIR to the shard size. Use SCRATCH_POOL for temporary + * allocations. 
*/ +static svn_error_t * +read_format(int *pformat, + int *max_files_per_dir, + const char *path, + apr_pool_t *scratch_pool) +{ + svn_stream_t *stream; + svn_stringbuf_t *content; + svn_stringbuf_t *buf; + svn_boolean_t eos = FALSE; + + SVN_ERR(svn_stringbuf_from_file2(&content, path, scratch_pool)); + stream = svn_stream_from_stringbuf(content, scratch_pool); + SVN_ERR(svn_stream_readline(stream, &buf, "\n", &eos, scratch_pool)); + if (buf->len == 0 && eos) + { + /* Return a more useful error message. */ + return svn_error_createf(SVN_ERR_BAD_VERSION_FILE_FORMAT, NULL, + _("Can't read first line of format file '%s'"), + svn_dirent_local_style(path, scratch_pool)); + } + + /* Check that the first line contains only digits. */ + SVN_ERR(check_format_file_buffer_numeric(buf->data, 0, path, scratch_pool)); + SVN_ERR(svn_cstring_atoi(pformat, buf->data)); + + /* Check that we support this format at all */ + SVN_ERR(check_format(*pformat)); + + /* Read any options. */ + SVN_ERR(svn_stream_readline(stream, &buf, "\n", &eos, scratch_pool)); + if (!eos && strncmp(buf->data, "layout sharded ", 15) == 0) + { + /* Check that the argument is numeric. */ + SVN_ERR(check_format_file_buffer_numeric(buf->data, 15, path, + scratch_pool)); + SVN_ERR(svn_cstring_atoi(max_files_per_dir, buf->data + 15)); + } + else + return svn_error_createf(SVN_ERR_BAD_VERSION_FILE_FORMAT, NULL, + _("'%s' contains invalid filesystem format option '%s'"), + svn_dirent_local_style(path, scratch_pool), buf->data); + + return SVN_NO_ERROR; +} + +/* Write the format number and maximum number of files per directory + to a new format file in PATH, possibly expecting to overwrite a + previously existing file. + + Use SCRATCH_POOL for temporary allocation. */ +svn_error_t * +svn_fs_x__write_format(svn_fs_t *fs, + svn_boolean_t overwrite, + apr_pool_t *scratch_pool) +{ + svn_stringbuf_t *sb; + const char *path = svn_fs_x__path_format(fs, scratch_pool); + svn_fs_x__data_t *ffd = fs->fsap_data; + + SVN_ERR_ASSERT(1 <= ffd->format && ffd->format <= SVN_FS_X__FORMAT_NUMBER); + + sb = svn_stringbuf_createf(scratch_pool, "%d\n", ffd->format); + svn_stringbuf_appendcstr(sb, apr_psprintf(scratch_pool, + "layout sharded %d\n", + ffd->max_files_per_dir)); + + /* svn_io_write_version_file() does a load of magic to allow it to + replace version files that already exist. We only need to do + that when we're allowed to overwrite an existing file. */ + if (! overwrite) + { + /* Create the file */ + SVN_ERR(svn_io_file_create(path, sb->data, scratch_pool)); + } + else + { + SVN_ERR(svn_io_write_atomic(path, sb->data, sb->len, + NULL /* copy_perms_path */, scratch_pool)); + } + + /* And set the perms to make it read only */ + return svn_io_set_file_read_only(path, FALSE, scratch_pool); +} + +/* Check that BLOCK_SIZE is a valid block / page size, i.e. it is within + * the range of what the current system may address in RAM and it is a + * power of 2. Assume that the element size within the block is ITEM_SIZE. + * Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +verify_block_size(apr_int64_t block_size, + apr_size_t item_size, + const char *name, + apr_pool_t *scratch_pool) +{ + /* Limit range. 
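
For reference, a format file accepted by read_format() above and produced by svn_fs_x__write_format() looks like this (the shard size shown is the 1000-files-per-directory default):

    1
    layout sharded 1000
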
*/ + if (block_size <= 0) + return svn_error_createf(SVN_ERR_BAD_CONFIG_VALUE, NULL, + _("%s is too small for fsfs.conf setting '%s'."), + apr_psprintf(scratch_pool, + "%" APR_INT64_T_FMT, + block_size), + name); + + if (block_size > SVN_MAX_OBJECT_SIZE / item_size) + return svn_error_createf(SVN_ERR_BAD_CONFIG_VALUE, NULL, + _("%s is too large for fsfs.conf setting '%s'."), + apr_psprintf(scratch_pool, + "%" APR_INT64_T_FMT, + block_size), + name); + + /* Ensure it is a power of two. + * For positive X, X & (X-1) will reset the lowest bit set. + * If the result is 0, at most one bit has been set. */ + if (0 != (block_size & (block_size - 1))) + return svn_error_createf(SVN_ERR_BAD_CONFIG_VALUE, NULL, + _("%s is invalid for fsfs.conf setting '%s' " + "because it is not a power of 2."), + apr_psprintf(scratch_pool, + "%" APR_INT64_T_FMT, + block_size), + name); + + return SVN_NO_ERROR; +} + +/* Read the configuration information of the file system at FS_PATH + * and set the respective values in FFD. Use pools as usual. + */ +static svn_error_t * +read_config(svn_fs_x__data_t *ffd, + const char *fs_path, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_config_t *config; + apr_int64_t compression_level; + + SVN_ERR(svn_config_read3(&config, + svn_dirent_join(fs_path, PATH_CONFIG, scratch_pool), + FALSE, FALSE, FALSE, scratch_pool)); + + /* Initialize ffd->rep_sharing_allowed. */ + SVN_ERR(svn_config_get_bool(config, &ffd->rep_sharing_allowed, + CONFIG_SECTION_REP_SHARING, + CONFIG_OPTION_ENABLE_REP_SHARING, TRUE)); + + /* Initialize deltification settings in ffd. */ + SVN_ERR(svn_config_get_int64(config, &ffd->max_deltification_walk, + CONFIG_SECTION_DELTIFICATION, + CONFIG_OPTION_MAX_DELTIFICATION_WALK, + SVN_FS_X_MAX_DELTIFICATION_WALK)); + SVN_ERR(svn_config_get_int64(config, &ffd->max_linear_deltification, + CONFIG_SECTION_DELTIFICATION, + CONFIG_OPTION_MAX_LINEAR_DELTIFICATION, + SVN_FS_X_MAX_LINEAR_DELTIFICATION)); + SVN_ERR(svn_config_get_int64(config, &compression_level, + CONFIG_SECTION_DELTIFICATION, + CONFIG_OPTION_COMPRESSION_LEVEL, + SVN_DELTA_COMPRESSION_LEVEL_DEFAULT)); + ffd->delta_compression_level + = (int)MIN(MAX(SVN_DELTA_COMPRESSION_LEVEL_NONE, compression_level), + SVN_DELTA_COMPRESSION_LEVEL_MAX); + + /* Initialize revprop packing settings in ffd. */ + SVN_ERR(svn_config_get_bool(config, &ffd->compress_packed_revprops, + CONFIG_SECTION_PACKED_REVPROPS, + CONFIG_OPTION_COMPRESS_PACKED_REVPROPS, + TRUE)); + SVN_ERR(svn_config_get_int64(config, &ffd->revprop_pack_size, + CONFIG_SECTION_PACKED_REVPROPS, + CONFIG_OPTION_REVPROP_PACK_SIZE, + ffd->compress_packed_revprops + ? 0x100 + : 0x40)); + + ffd->revprop_pack_size *= 1024; + + /* I/O settings in ffd. */ + SVN_ERR(svn_config_get_int64(config, &ffd->block_size, + CONFIG_SECTION_IO, + CONFIG_OPTION_BLOCK_SIZE, + 64)); + SVN_ERR(svn_config_get_int64(config, &ffd->l2p_page_size, + CONFIG_SECTION_IO, + CONFIG_OPTION_L2P_PAGE_SIZE, + 0x2000)); + SVN_ERR(svn_config_get_int64(config, &ffd->p2l_page_size, + CONFIG_SECTION_IO, + CONFIG_OPTION_P2L_PAGE_SIZE, + 0x400)); + + /* Don't accept unreasonable or illegal values. + * Block size and P2L page size are in kbytes; + * L2P blocks are arrays of apr_off_t. 
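
The power-of-two test used by verify_block_size() can be spelled out as a tiny stand-alone predicate; a sketch (the helper name is made up for illustration):

    /* TRUE iff X is a positive power of two: for such X, clearing the
       lowest set bit with X & (X-1) leaves zero.  E.g. 64 & 63 == 0,
       while 96 & 95 == 64. */
    static svn_boolean_t
    is_power_of_two(apr_int64_t x)
    {
      return x > 0 && (x & (x - 1)) == 0;
    }
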
*/ + SVN_ERR(verify_block_size(ffd->block_size, 0x400, + CONFIG_OPTION_BLOCK_SIZE, scratch_pool)); + SVN_ERR(verify_block_size(ffd->p2l_page_size, 0x400, + CONFIG_OPTION_P2L_PAGE_SIZE, scratch_pool)); + SVN_ERR(verify_block_size(ffd->l2p_page_size, sizeof(apr_off_t), + CONFIG_OPTION_L2P_PAGE_SIZE, scratch_pool)); + + /* convert kBytes to bytes */ + ffd->block_size *= 0x400; + ffd->p2l_page_size *= 0x400; + /* L2P pages are in entries - not in (k)Bytes */ + + /* Debug options. */ + SVN_ERR(svn_config_get_bool(config, &ffd->pack_after_commit, + CONFIG_SECTION_DEBUG, + CONFIG_OPTION_PACK_AFTER_COMMIT, + FALSE)); + + /* memcached configuration */ + SVN_ERR(svn_cache__make_memcache_from_config(&ffd->memcache, config, + result_pool, scratch_pool)); + + SVN_ERR(svn_config_get_bool(config, &ffd->fail_stop, + CONFIG_SECTION_CACHES, CONFIG_OPTION_FAIL_STOP, + FALSE)); + + return SVN_NO_ERROR; +} + +/* Write FS' initial configuration file. + * Use SCRATCH_POOL for temporary allocations. */ +static svn_error_t * +write_config(svn_fs_t *fs, + apr_pool_t *scratch_pool) +{ +#define NL APR_EOL_STR + static const char * const fsx_conf_contents = +"### This file controls the configuration of the FSX filesystem." NL +"" NL +"[" SVN_CACHE_CONFIG_CATEGORY_MEMCACHED_SERVERS "]" NL +"### These options name memcached servers used to cache internal FSX" NL +"### data. See http://www.danga.com/memcached/ for more information on" NL +"### memcached. To use memcached with FSX, run one or more memcached" NL +"### servers, and specify each of them as an option like so:" NL +"# first-server = 127.0.0.1:11211" NL +"# remote-memcached = mymemcached.corp.example.com:11212" NL +"### The option name is ignored; the value is of the form HOST:PORT." NL +"### memcached servers can be shared between multiple repositories;" NL +"### however, if you do this, you *must* ensure that repositories have" NL +"### distinct UUIDs and paths, or else cached data from one repository" NL +"### might be used by another accidentally. Note also that memcached has" NL +"### no authentication for reads or writes, so you must ensure that your" NL +"### memcached servers are only accessible by trusted users." NL +"" NL +"[" CONFIG_SECTION_CACHES "]" NL +"### When a cache-related error occurs, normally Subversion ignores it" NL +"### and continues, logging an error if the server is appropriately" NL +"### configured (and ignoring it with file:// access). To make" NL +"### Subversion never ignore cache errors, uncomment this line." NL +"# " CONFIG_OPTION_FAIL_STOP " = true" NL +"" NL +"[" CONFIG_SECTION_REP_SHARING "]" NL +"### To conserve space, the filesystem can optionally avoid storing" NL +"### duplicate representations. This comes at a slight cost in" NL +"### performance, as maintaining a database of shared representations can" NL +"### increase commit times. The space savings are dependent upon the size" NL +"### of the repository, the number of objects it contains and the amount of" NL +"### duplication between them, usually a function of the branching and" NL +"### merging process." NL +"###" NL +"### The following parameter enables rep-sharing in the repository. It can" NL +"### be switched on and off at will, but for best space-saving results" NL +"### should be enabled consistently over the life of the repository." NL +"### 'svnadmin verify' will check the rep-cache regardless of this setting." NL +"### rep-sharing is enabled by default." 
NL +"# " CONFIG_OPTION_ENABLE_REP_SHARING " = true" NL +"" NL +"[" CONFIG_SECTION_DELTIFICATION "]" NL +"### To conserve space, the filesystem stores data as differences against" NL +"### existing representations. This comes at a slight cost in performance," NL +"### as calculating differences can increase commit times. Reading data" NL +"### will also create higher CPU load and the data will be fragmented." NL +"### Since deltification tends to save significant amounts of disk space," NL +"### the overall I/O load can actually be lower." NL +"###" NL +"### The options in this section allow for tuning the deltification" NL +"### strategy. Their effects on data size and server performance may vary" NL +"### from one repository to another." NL +"###" NL +"### During commit, the server may need to walk the whole change history of" NL +"### of a given node to find a suitable deltification base. This linear" NL +"### process can impact commit times, svnadmin load and similar operations." NL +"### This setting limits the depth of the deltification history. If the" NL +"### threshold has been reached, the node will be stored as fulltext and a" NL +"### new deltification history begins." NL +"### Note, this is unrelated to svn log." NL +"### Very large values rarely provide significant additional savings but" NL +"### can impact performance greatly - in particular if directory" NL +"### deltification has been activated. Very small values may be useful in" NL +"### repositories that are dominated by large, changing binaries." NL +"### Should be a power of two minus 1. A value of 0 will effectively" NL +"### disable deltification." NL +"### For 1.9, the default value is 1023." NL +"# " CONFIG_OPTION_MAX_DELTIFICATION_WALK " = 1023" NL +"###" NL +"### The skip-delta scheme used by FSX tends to repeatably store redundant" NL +"### delta information where a simple delta against the latest version is" NL +"### often smaller. By default, 1.9+ will therefore use skip deltas only" NL +"### after the linear chain of deltas has grown beyond the threshold" NL +"### specified by this setting." NL +"### Values up to 64 can result in some reduction in repository size for" NL +"### the cost of quickly increasing I/O and CPU costs. Similarly, smaller" NL +"### numbers can reduce those costs at the cost of more disk space. For" NL +"### rarely read repositories or those containing larger binaries, this may" NL +"### present a better trade-off." NL +"### Should be a power of two. A value of 1 or smaller will cause the" NL +"### exclusive use of skip-deltas." NL +"### For 1.8, the default value is 16." NL +"# " CONFIG_OPTION_MAX_LINEAR_DELTIFICATION " = 16" NL +"###" NL +"### After deltification, we compress the data through zlib to minimize on-" NL +"### disk size. That can be an expensive and ineffective process. This" NL +"### setting controls the usage of zlib in future revisions." NL +"### Revisions with highly compressible data in them may shrink in size" NL +"### if the setting is increased but may take much longer to commit. The" NL +"### time taken to uncompress that data again is widely independent of the" NL +"### compression level." NL +"### Compression will be ineffective if the incoming content is already" NL +"### highly compressed. In that case, disabling the compression entirely" NL +"### will speed up commits as well as reading the data. 
Repositories with" NL +"### many small compressible files (source code) but also a high percentage" NL +"### of large incompressible ones (artwork) may benefit from compression" NL +"### levels lowered to e.g. 1." NL +"### Valid values are 0 to 9 with 9 providing the highest compression ratio" NL +"### and 0 disabling it altogether." NL +"### The default value is 5." NL +"# " CONFIG_OPTION_COMPRESSION_LEVEL " = 5" NL +"" NL +"[" CONFIG_SECTION_PACKED_REVPROPS "]" NL +"### This parameter controls the size (in kBytes) of packed revprop files." NL +"### Revprops of consecutive revisions will be concatenated into a single" NL +"### file up to but not exceeding the threshold given here. However, each" NL +"### pack file may be much smaller and revprops of a single revision may be" NL +"### much larger than the limit set here. The threshold will be applied" NL +"### before optional compression takes place." NL +"### Large values will reduce disk space usage at the expense of increased" NL +"### latency and CPU usage reading and changing individual revprops. They" NL +"### become an advantage when revprop caching has been enabled because a" NL +"### lot of data can be read in one go. Values smaller than 4 kByte will" NL +"### not improve latency any further and quickly render revprop packing" NL +"### ineffective." NL +"### revprop-pack-size is 64 kBytes by default for non-compressed revprop" NL +"### pack files and 256 kBytes when compression has been enabled." NL +"# " CONFIG_OPTION_REVPROP_PACK_SIZE " = 64" NL +"###" NL +"### To save disk space, packed revprop files may be compressed. Standard" NL +"### revprops tend to allow for very effective compression. Reading and" NL +"### even more so writing, become significantly more CPU intensive. With" NL +"### revprop caching enabled, the overhead can be offset by reduced I/O" NL +"### unless you often modify revprops after packing." NL +"### Compressing packed revprops is enabled by default." NL +"# " CONFIG_OPTION_COMPRESS_PACKED_REVPROPS " = true" NL +"" NL +"[" CONFIG_SECTION_IO "]" NL +"### Parameters in this section control the data access granularity in" NL +"### format 7 repositories and later. The defaults should translate into" NL +"### decent performance over a wide range of setups." NL +"###" NL +"### When a specific piece of information needs to be read from disk, a" NL +"### data block is being read at once and its contents are being cached." NL +"### If the repository is being stored on a RAID, the block size should be" NL +"### either 50% or 100% of RAID block size / granularity. Also, your file" NL +"### system blocks/clusters should be properly aligned and sized. In that" NL +"### setup, each access will hit only one disk (minimizes I/O load) but" NL +"### uses all the data provided by the disk in a single access." NL +"### For SSD-based storage systems, slightly lower values around 16 kB" NL +"### may improve latency while still maximizing throughput." NL +"### Can be changed at any time but must be a power of 2." NL +"### block-size is given in kBytes and with a default of 64 kBytes." NL +"# " CONFIG_OPTION_BLOCK_SIZE " = 64" NL +"###" NL +"### The log-to-phys index maps data item numbers to offsets within the" NL +"### rev or pack file. This index is organized in pages of a fixed maximum" NL +"### capacity. To access an item, the page table and the respective page" NL +"### must be read." NL +"### This parameter only affects revisions with thousands of changed paths." 
NL +"### If you have several extremely large revisions (~1 mio changes), think" NL +"### about increasing this setting. Reducing the value will rarely result" NL +"### in a net speedup." NL +"### This is an expert setting. Must be a power of 2." NL +"### l2p-page-size is 8192 entries by default." NL +"# " CONFIG_OPTION_L2P_PAGE_SIZE " = 8192" NL +"###" NL +"### The phys-to-log index maps positions within the rev or pack file to" NL +"### to data items, i.e. describes what piece of information is being" NL +"### stored at any particular offset. The index describes the rev file" NL +"### in chunks (pages) and keeps a global list of all those pages. Large" NL +"### pages mean a shorter page table but a larger per-page description of" NL +"### data items in it. The latency sweet spot depends on the change size" NL +"### distribution but covers a relatively wide range." NL +"### If the repository contains very large files, i.e. individual changes" NL +"### of tens of MB each, increasing the page size will shorten the index" NL +"### file at the expense of a slightly increased latency in sections with" NL +"### smaller changes." NL +"### For source code repositories, this should be about 16x the block-size." NL +"### Must be a power of 2." NL +"### p2l-page-size is given in kBytes and with a default of 1024 kBytes." NL +"# " CONFIG_OPTION_P2L_PAGE_SIZE " = 1024" NL +; +#undef NL + return svn_io_file_create(svn_dirent_join(fs->path, PATH_CONFIG, + scratch_pool), + fsx_conf_contents, scratch_pool); +} + +/* Read FS's UUID file and store the data in the FS struct. */ +static svn_error_t * +read_uuid(svn_fs_t *fs, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + apr_file_t *uuid_file; + char buf[APR_UUID_FORMATTED_LENGTH + 2]; + apr_size_t limit; + + /* Read the repository uuid. */ + SVN_ERR(svn_io_file_open(&uuid_file, svn_fs_x__path_uuid(fs, scratch_pool), + APR_READ | APR_BUFFERED, APR_OS_DEFAULT, + scratch_pool)); + + limit = sizeof(buf); + SVN_ERR(svn_io_read_length_line(uuid_file, buf, &limit, scratch_pool)); + fs->uuid = apr_pstrdup(fs->pool, buf); + + /* Read the instance ID. */ + limit = sizeof(buf); + SVN_ERR(svn_io_read_length_line(uuid_file, buf, &limit, + scratch_pool)); + ffd->instance_id = apr_pstrdup(fs->pool, buf); + + SVN_ERR(svn_io_file_close(uuid_file, scratch_pool)); + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__read_format_file(svn_fs_t *fs, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + int format, max_files_per_dir; + + /* Read info from format file. */ + SVN_ERR(read_format(&format, &max_files_per_dir, + svn_fs_x__path_format(fs, scratch_pool), scratch_pool)); + + /* Now that we've got *all* info, store / update values in FFD. */ + ffd->format = format; + ffd->max_files_per_dir = max_files_per_dir; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__open(svn_fs_t *fs, + const char *path, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + fs->path = apr_pstrdup(fs->pool, path); + + /* Read the FS format file. */ + SVN_ERR(svn_fs_x__read_format_file(fs, scratch_pool)); + + /* Read in and cache the repository uuid. */ + SVN_ERR(read_uuid(fs, scratch_pool)); + + /* Read the min unpacked revision. */ + SVN_ERR(svn_fs_x__update_min_unpacked_rev(fs, scratch_pool)); + + /* Read the configuration file. 
*/
+  SVN_ERR(read_config(ffd, fs->path, fs->pool, scratch_pool));
+
+  return svn_error_trace(svn_fs_x__read_current(&ffd->youngest_rev_cache,
+                                                fs, scratch_pool));
+}
+
+/* Baton type bridging svn_fs_x__upgrade and upgrade_body carrying
+ * parameters over between them. */
+typedef struct upgrade_baton_t
+{
+  svn_fs_t *fs;
+  svn_fs_upgrade_notify_t notify_func;
+  void *notify_baton;
+  svn_cancel_func_t cancel_func;
+  void *cancel_baton;
+} upgrade_baton_t;
+
+/* Upgrade the FS given in (upgrade_baton_t *)BATON to the latest format
+ * version.  Apply options and invoke the callbacks from that BATON.
+ * Temporary allocations are to be made from SCRATCH_POOL.
+ *
+ * At the moment, this is a simple placeholder as we don't support upgrades
+ * from experimental FSX versions.
+ */
+static svn_error_t *
+upgrade_body(void *baton,
+             apr_pool_t *scratch_pool)
+{
+  upgrade_baton_t *upgrade_baton = baton;
+  svn_fs_t *fs = upgrade_baton->fs;
+  int format, max_files_per_dir;
+  const char *format_path = svn_fs_x__path_format(fs, scratch_pool);
+
+  /* Read the FS format number and max-files-per-dir setting. */
+  SVN_ERR(read_format(&format, &max_files_per_dir, format_path,
+                      scratch_pool));
+
+  /* If we're already up-to-date, there's nothing else to be done here. */
+  if (format == SVN_FS_X__FORMAT_NUMBER)
+    return SVN_NO_ERROR;
+
+  /* Done */
+  return SVN_NO_ERROR;
+}
+
+
+svn_error_t *
+svn_fs_x__upgrade(svn_fs_t *fs,
+                  svn_fs_upgrade_notify_t notify_func,
+                  void *notify_baton,
+                  svn_cancel_func_t cancel_func,
+                  void *cancel_baton,
+                  apr_pool_t *scratch_pool)
+{
+  upgrade_baton_t baton;
+  baton.fs = fs;
+  baton.notify_func = notify_func;
+  baton.notify_baton = notify_baton;
+  baton.cancel_func = cancel_func;
+  baton.cancel_baton = cancel_baton;
+
+  return svn_fs_x__with_all_locks(fs, upgrade_body, (void *)&baton,
+                                  scratch_pool);
+}
+
+
+svn_error_t *
+svn_fs_x__youngest_rev(svn_revnum_t *youngest_p,
+                       svn_fs_t *fs,
+                       apr_pool_t *scratch_pool)
+{
+  svn_fs_x__data_t *ffd = fs->fsap_data;
+  SVN_ERR(svn_fs_x__read_current(youngest_p, fs, scratch_pool));
+  ffd->youngest_rev_cache = *youngest_p;
+
+  return SVN_NO_ERROR;
+}
+
+svn_error_t *
+svn_fs_x__ensure_revision_exists(svn_revnum_t rev,
+                                 svn_fs_t *fs,
+                                 apr_pool_t *scratch_pool)
+{
+  svn_fs_x__data_t *ffd = fs->fsap_data;
+
+  if (! SVN_IS_VALID_REVNUM(rev))
+    return svn_error_createf(SVN_ERR_FS_NO_SUCH_REVISION, NULL,
+                             _("Invalid revision number '%ld'"), rev);
+
+
+  /* Did the revision exist the last time we checked the current
+     file? */
+  if (rev <= ffd->youngest_rev_cache)
+    return SVN_NO_ERROR;
+
+  SVN_ERR(svn_fs_x__read_current(&ffd->youngest_rev_cache, fs, scratch_pool));
+
+  /* Check again. */
+  if (rev <= ffd->youngest_rev_cache)
+    return SVN_NO_ERROR;
+
+  return svn_error_createf(SVN_ERR_FS_NO_SUCH_REVISION, NULL,
+                           _("No such revision %ld"), rev);
+}
+
+
+svn_error_t *
+svn_fs_x__file_length(svn_filesize_t *length,
+                      svn_fs_x__noderev_t *noderev)
+{
+  if (noderev->data_rep)
+    *length = noderev->data_rep->expanded_size;
+  else
+    *length = 0;
+
+  return SVN_NO_ERROR;
+}
+
+svn_boolean_t
+svn_fs_x__file_text_rep_equal(svn_fs_x__representation_t *a,
+                              svn_fs_x__representation_t *b)
+{
+  svn_boolean_t a_empty = a == NULL || a->expanded_size == 0;
+  svn_boolean_t b_empty = b == NULL || b->expanded_size == 0;
+
+  /* This makes sure that neither rep will be NULL later on */
+  if (a_empty && b_empty)
+    return TRUE;
+
+  if (a_empty != b_empty)
+    return FALSE;
+
+  /* Same physical representation?  Note that these IDs are always up-to-date
+     instead of e.g. being set lazily. */
+  if (svn_fs_x__id_eq(&a->id, &b->id))
+    return TRUE;
+
+  /* Contents are equal if the checksums match.  These are also always known.
+   */
+  return memcmp(a->md5_digest, b->md5_digest, sizeof(a->md5_digest)) == 0
+      && memcmp(a->sha1_digest, b->sha1_digest, sizeof(a->sha1_digest)) == 0;
+}
+
+svn_error_t *
+svn_fs_x__prop_rep_equal(svn_boolean_t *equal,
+                         svn_fs_t *fs,
+                         svn_fs_x__noderev_t *a,
+                         svn_fs_x__noderev_t *b,
+                         svn_boolean_t strict,
+                         apr_pool_t *scratch_pool)
+{
+  svn_fs_x__representation_t *rep_a = a->prop_rep;
+  svn_fs_x__representation_t *rep_b = b->prop_rep;
+  apr_hash_t *proplist_a;
+  apr_hash_t *proplist_b;
+
+  /* Mainly for a==b==NULL */
+  if (rep_a == rep_b)
+    {
+      *equal = TRUE;
+      return SVN_NO_ERROR;
+    }
+
+  /* Committed property lists can be compared quickly */
+  if (   rep_a && rep_b
+      && svn_fs_x__is_revision(rep_a->id.change_set)
+      && svn_fs_x__is_revision(rep_b->id.change_set))
+    {
+      /* MD5 must be given.  Having the same checksum is good enough for
+         accepting the prop lists as equal. */
+      *equal = memcmp(rep_a->md5_digest, rep_b->md5_digest,
+                      sizeof(rep_a->md5_digest)) == 0;
+      return SVN_NO_ERROR;
+    }
+
+  /* Same path in same txn? */
+  if (svn_fs_x__id_eq(&a->noderev_id, &b->noderev_id))
+    {
+      *equal = TRUE;
+      return SVN_NO_ERROR;
+    }
+
+  /* Skip the expensive bits unless we are in strict mode.
+     Simply assume that there is a difference. */
+  if (!strict)
+    {
+      *equal = FALSE;
+      return SVN_NO_ERROR;
+    }
+
+  /* At least one of the reps has been modified in a txn.
+     Fetch and compare them. */
+  SVN_ERR(svn_fs_x__get_proplist(&proplist_a, fs, a, scratch_pool,
+                                 scratch_pool));
+  SVN_ERR(svn_fs_x__get_proplist(&proplist_b, fs, b, scratch_pool,
+                                 scratch_pool));
+
+  *equal = svn_fs__prop_lists_equal(proplist_a, proplist_b, scratch_pool);
+  return SVN_NO_ERROR;
+}
+
+
+svn_error_t *
+svn_fs_x__file_checksum(svn_checksum_t **checksum,
+                        svn_fs_x__noderev_t *noderev,
+                        svn_checksum_kind_t kind,
+                        apr_pool_t *result_pool)
+{
+  *checksum = NULL;
+
+  if (noderev->data_rep)
+    {
+      svn_checksum_t temp;
+      temp.kind = kind;
+
+      switch(kind)
+        {
+          case svn_checksum_md5:
+            temp.digest = noderev->data_rep->md5_digest;
+            break;
+
+          case svn_checksum_sha1:
+            if (! noderev->data_rep->has_sha1)
+              return SVN_NO_ERROR;
+
+            temp.digest = noderev->data_rep->sha1_digest;
+            break;
+
+          default:
+            return SVN_NO_ERROR;
+        }
+
+      *checksum = svn_checksum_dup(&temp, result_pool);
+    }
+
+  return SVN_NO_ERROR;
+}
+
+svn_fs_x__representation_t *
+svn_fs_x__rep_copy(svn_fs_x__representation_t *rep,
+                   apr_pool_t *result_pool)
+{
+  if (rep == NULL)
+    return NULL;
+
+  return apr_pmemdup(result_pool, rep, sizeof(*rep));
+}
+
+
+/* Write out the zeroth revision for filesystem FS.
+   Perform temporary allocations in SCRATCH_POOL. */
+static svn_error_t *
+write_revision_zero(svn_fs_t *fs,
+                    apr_pool_t *scratch_pool)
+{
+  /* Use an explicit sub-pool to have full control over temp file lifetimes.
+   * Since we have it, use it for everything else as well. */
+  apr_pool_t *subpool = svn_pool_create(scratch_pool);
+  const char *path_revision_zero = svn_fs_x__path_rev(fs, 0, subpool);
+  apr_hash_t *proplist;
+  svn_string_t date;
+
+  apr_array_header_t *index_entries;
+  svn_fs_x__p2l_entry_t *entry;
+  svn_fs_x__revision_file_t *rev_file;
+  const char *l2p_proto_index, *p2l_proto_index;
+
+  /* Construct a skeleton r0 with no indexes.
*/ + svn_string_t *noderev_str = svn_string_create("id: 2+0\n" + "node: 0+0\n" + "copy: 0+0\n" + "type: dir\n" + "count: 0\n" + "cpath: /\n" + "\n", + subpool); + svn_string_t *changes_str = svn_string_create("\n", + subpool); + svn_string_t *r0 = svn_string_createf(subpool, "%s%s", + noderev_str->data, + changes_str->data); + + /* Write skeleton r0 to disk. */ + SVN_ERR(svn_io_file_create(path_revision_zero, r0->data, subpool)); + + /* Construct the index P2L contents: describe the 2 items we have. + Be sure to create them in on-disk order. */ + index_entries = apr_array_make(subpool, 2, sizeof(entry)); + + entry = apr_pcalloc(subpool, sizeof(*entry)); + entry->offset = 0; + entry->size = (apr_off_t)noderev_str->len; + entry->type = SVN_FS_X__ITEM_TYPE_NODEREV; + entry->item_count = 1; + entry->items = apr_pcalloc(subpool, sizeof(*entry->items)); + entry->items[0].change_set = 0; + entry->items[0].number = SVN_FS_X__ITEM_INDEX_ROOT_NODE; + APR_ARRAY_PUSH(index_entries, svn_fs_x__p2l_entry_t *) = entry; + + entry = apr_pcalloc(subpool, sizeof(*entry)); + entry->offset = (apr_off_t)noderev_str->len; + entry->size = (apr_off_t)changes_str->len; + entry->type = SVN_FS_X__ITEM_TYPE_CHANGES; + entry->item_count = 1; + entry->items = apr_pcalloc(subpool, sizeof(*entry->items)); + entry->items[0].change_set = 0; + entry->items[0].number = SVN_FS_X__ITEM_INDEX_CHANGES; + APR_ARRAY_PUSH(index_entries, svn_fs_x__p2l_entry_t *) = entry; + + /* Now re-open r0, create proto-index files from our entries and + rewrite the index section of r0. */ + SVN_ERR(svn_fs_x__open_pack_or_rev_file_writable(&rev_file, fs, 0, + subpool, subpool)); + SVN_ERR(svn_fs_x__p2l_index_from_p2l_entries(&p2l_proto_index, fs, + rev_file, index_entries, + subpool, subpool)); + SVN_ERR(svn_fs_x__l2p_index_from_p2l_entries(&l2p_proto_index, fs, + index_entries, + subpool, subpool)); + SVN_ERR(svn_fs_x__add_index_data(fs, rev_file->file, l2p_proto_index, + p2l_proto_index, 0, subpool)); + SVN_ERR(svn_fs_x__close_revision_file(rev_file)); + + SVN_ERR(svn_io_set_file_read_only(path_revision_zero, FALSE, fs->pool)); + + /* Set a date on revision 0. */ + date.data = svn_time_to_cstring(apr_time_now(), fs->pool); + date.len = strlen(date.data); + proplist = apr_hash_make(fs->pool); + svn_hash_sets(proplist, SVN_PROP_REVISION_DATE, &date); + return svn_fs_x__set_revision_proplist(fs, 0, proplist, fs->pool); +} + +svn_error_t * +svn_fs_x__create_file_tree(svn_fs_t *fs, + const char *path, + int format, + int shard_size, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + + fs->path = apr_pstrdup(fs->pool, path); + ffd->format = format; + + /* Use an appropriate sharding mode if supported by the format. */ + ffd->max_files_per_dir = shard_size; + + /* Create the revision data directories. */ + SVN_ERR(svn_io_make_dir_recursively( + svn_fs_x__path_rev_shard(fs, 0, scratch_pool), + scratch_pool)); + + /* Create the revprops directory. */ + SVN_ERR(svn_io_make_dir_recursively( + svn_fs_x__path_revprops_shard(fs, 0, scratch_pool), + scratch_pool)); + + /* Create the transaction directory. */ + SVN_ERR(svn_io_make_dir_recursively( + svn_fs_x__path_txns_dir(fs, scratch_pool), + scratch_pool)); + + /* Create the protorevs directory. */ + SVN_ERR(svn_io_make_dir_recursively( + svn_fs_x__path_txn_proto_revs(fs, scratch_pool), + scratch_pool)); + + /* Create the 'current' file. 
*/ + SVN_ERR(svn_io_file_create_empty(svn_fs_x__path_current(fs, scratch_pool), + scratch_pool)); + SVN_ERR(svn_fs_x__write_current(fs, 0, scratch_pool)); + + /* Create the 'uuid' file. */ + SVN_ERR(svn_io_file_create_empty(svn_fs_x__path_lock(fs, scratch_pool), + scratch_pool)); + SVN_ERR(svn_fs_x__set_uuid(fs, NULL, NULL, scratch_pool)); + + /* Create the fsfs.conf file. */ + SVN_ERR(write_config(fs, scratch_pool)); + SVN_ERR(read_config(ffd, fs->path, fs->pool, scratch_pool)); + + /* Add revision 0. */ + SVN_ERR(write_revision_zero(fs, scratch_pool)); + + /* Create the min unpacked rev file. */ + SVN_ERR(svn_io_file_create( + svn_fs_x__path_min_unpacked_rev(fs, scratch_pool), + "0\n", scratch_pool)); + + /* Create the txn-current file if the repository supports + the transaction sequence file. */ + SVN_ERR(svn_io_file_create(svn_fs_x__path_txn_current(fs, scratch_pool), + "0\n", scratch_pool)); + SVN_ERR(svn_io_file_create_empty( + svn_fs_x__path_txn_current_lock(fs, scratch_pool), + scratch_pool)); + + /* Initialize the revprop caching info. */ + SVN_ERR(svn_fs_x__reset_revprop_generation_file(fs, scratch_pool)); + + ffd->youngest_rev_cache = 0; + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__create(svn_fs_t *fs, + const char *path, + apr_pool_t *scratch_pool) +{ + int format = SVN_FS_X__FORMAT_NUMBER; + svn_fs_x__data_t *ffd = fs->fsap_data; + + fs->path = apr_pstrdup(fs->pool, path); + /* See if compatibility with older versions was explicitly requested. */ + if (fs->config) + { + svn_version_t *compatible_version; + SVN_ERR(svn_fs__compatible_version(&compatible_version, fs->config, + scratch_pool)); + + /* select format number */ + switch(compatible_version->minor) + { + case 0: + case 1: + case 2: + case 3: + case 4: + case 5: + case 6: + case 7: + case 8: return svn_error_create(SVN_ERR_FS_UNSUPPORTED_FORMAT, NULL, + _("FSX is not compatible with Subversion prior to 1.9")); + + default:format = SVN_FS_X__FORMAT_NUMBER; + } + } + + /* Actual FS creation. */ + SVN_ERR(svn_fs_x__create_file_tree(fs, path, format, + SVN_FS_X_DEFAULT_MAX_FILES_PER_DIR, + scratch_pool)); + + /* This filesystem is ready. Stamp it with a format number. */ + SVN_ERR(svn_fs_x__write_format(fs, FALSE, scratch_pool)); + + ffd->youngest_rev_cache = 0; + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__set_uuid(svn_fs_t *fs, + const char *uuid, + const char *instance_id, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + const char *uuid_path = svn_fs_x__path_uuid(fs, scratch_pool); + svn_stringbuf_t *contents = svn_stringbuf_create_empty(scratch_pool); + + if (! uuid) + uuid = svn_uuid_generate(scratch_pool); + + if (! instance_id) + instance_id = svn_uuid_generate(scratch_pool); + + svn_stringbuf_appendcstr(contents, uuid); + svn_stringbuf_appendcstr(contents, "\n"); + svn_stringbuf_appendcstr(contents, instance_id); + svn_stringbuf_appendcstr(contents, "\n"); + + /* We use the permissions of the 'current' file, because the 'uuid' + file does not exist during repository creation. */ + SVN_ERR(svn_io_write_atomic(uuid_path, contents->data, contents->len, + /* perms */ + svn_fs_x__path_current(fs, scratch_pool), + scratch_pool)); + + fs->uuid = apr_pstrdup(fs->pool, uuid); + ffd->instance_id = apr_pstrdup(fs->pool, instance_id); + + return SVN_NO_ERROR; +} + +/** Node origin lazy cache. 
*/ + +/* If directory PATH does not exist, create it and give it the same + permissions as FS_path.*/ +svn_error_t * +svn_fs_x__ensure_dir_exists(const char *path, + const char *fs_path, + apr_pool_t *scratch_pool) +{ + svn_error_t *err = svn_io_dir_make(path, APR_OS_DEFAULT, scratch_pool); + if (err && APR_STATUS_IS_EEXIST(err->apr_err)) + { + svn_error_clear(err); + return SVN_NO_ERROR; + } + SVN_ERR(err); + + /* We successfully created a new directory. Dup the permissions + from FS->path. */ + return svn_io_copy_perms(fs_path, path, scratch_pool); +} + + +/*** Revisions ***/ + +svn_error_t * +svn_fs_x__revision_prop(svn_string_t **value_p, + svn_fs_t *fs, + svn_revnum_t rev, + const char *propname, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + apr_hash_t *table; + + SVN_ERR(svn_fs__check_fs(fs, TRUE)); + SVN_ERR(svn_fs_x__get_revision_proplist(&table, fs, rev, FALSE, + scratch_pool, scratch_pool)); + + *value_p = svn_string_dup(svn_hash_gets(table, propname), result_pool); + + return SVN_NO_ERROR; +} + + +/* Baton used for change_rev_prop_body below. */ +typedef struct change_rev_prop_baton_t { + svn_fs_t *fs; + svn_revnum_t rev; + const char *name; + const svn_string_t *const *old_value_p; + const svn_string_t *value; +} change_rev_prop_baton_t; + +/* The work-horse for svn_fs_x__change_rev_prop, called with the FS + write lock. This implements the svn_fs_x__with_write_lock() + 'body' callback type. BATON is a 'change_rev_prop_baton_t *'. */ +static svn_error_t * +change_rev_prop_body(void *baton, + apr_pool_t *scratch_pool) +{ + change_rev_prop_baton_t *cb = baton; + apr_hash_t *table; + + /* Read current revprop values from disk (never from cache). + Even if somehow the cache got out of sync, we want to make sure that + we read, update and write up-to-date data. */ + SVN_ERR(svn_fs_x__get_revision_proplist(&table, cb->fs, cb->rev, TRUE, + scratch_pool, scratch_pool)); + + if (cb->old_value_p) + { + const svn_string_t *wanted_value = *cb->old_value_p; + const svn_string_t *present_value = svn_hash_gets(table, cb->name); + if ((!wanted_value != !present_value) + || (wanted_value && present_value + && !svn_string_compare(wanted_value, present_value))) + { + /* What we expected isn't what we found. */ + return svn_error_createf(SVN_ERR_FS_PROP_BASEVALUE_MISMATCH, NULL, + _("revprop '%s' has unexpected value in " + "filesystem"), + cb->name); + } + /* Fall through. 
*/ + } + svn_hash_sets(table, cb->name, cb->value); + + return svn_fs_x__set_revision_proplist(cb->fs, cb->rev, table, + scratch_pool); +} + +svn_error_t * +svn_fs_x__change_rev_prop(svn_fs_t *fs, + svn_revnum_t rev, + const char *name, + const svn_string_t *const *old_value_p, + const svn_string_t *value, + apr_pool_t *scratch_pool) +{ + change_rev_prop_baton_t cb; + + SVN_ERR(svn_fs__check_fs(fs, TRUE)); + + cb.fs = fs; + cb.rev = rev; + cb.name = name; + cb.old_value_p = old_value_p; + cb.value = value; + + return svn_fs_x__with_write_lock(fs, change_rev_prop_body, &cb, + scratch_pool); +} + + +svn_error_t * +svn_fs_x__info_format(int *fs_format, + svn_version_t **supports_version, + svn_fs_t *fs, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + *fs_format = ffd->format; + *supports_version = apr_palloc(result_pool, sizeof(svn_version_t)); + + (*supports_version)->major = SVN_VER_MAJOR; + (*supports_version)->minor = 9; + (*supports_version)->patch = 0; + (*supports_version)->tag = ""; + + switch (ffd->format) + { + case 1: + break; +#ifdef SVN_DEBUG +# if SVN_FS_X__FORMAT_NUMBER != 1 +# error "Need to add a 'case' statement here" +# endif +#endif + } + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__info_config_files(apr_array_header_t **files, + svn_fs_t *fs, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + *files = apr_array_make(result_pool, 1, sizeof(const char *)); + APR_ARRAY_PUSH(*files, const char *) = svn_dirent_join(fs->path, PATH_CONFIG, + result_pool); + return SVN_NO_ERROR; +} diff --git a/subversion/libsvn_fs_x/fs_x.h b/subversion/libsvn_fs_x/fs_x.h new file mode 100644 index 0000000..98be702 --- /dev/null +++ b/subversion/libsvn_fs_x/fs_x.h @@ -0,0 +1,202 @@ +/* fs_x.h : interface to the native filesystem layer + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#ifndef SVN_LIBSVN_FS__FS_X_H +#define SVN_LIBSVN_FS__FS_X_H + +#include "fs.h" + +/* Read the 'format' file of fsx filesystem FS and store its info in FS. + * Use SCRATCH_POOL for temporary allocations. */ +svn_error_t * +svn_fs_x__read_format_file(svn_fs_t *fs, + apr_pool_t *scratch_pool); + +/* Open the fsx filesystem pointed to by PATH and associate it with + filesystem object FS. Use SCRATCH_POOL for temporary allocations. + + ### Some parts of *FS must have been initialized beforehand; some parts + (including FS->path) are initialized by this function. */ +svn_error_t * +svn_fs_x__open(svn_fs_t *fs, + const char *path, + apr_pool_t *scratch_pool); + +/* Upgrade the fsx filesystem FS. 
Indicate progress via the optional
+ * NOTIFY_FUNC callback using NOTIFY_BATON.  The optional CANCEL_FUNC
+ * will periodically be called with CANCEL_BATON to allow for preemption.
+ * Use SCRATCH_POOL for temporary allocations. */
+svn_error_t *
+svn_fs_x__upgrade(svn_fs_t *fs,
+                  svn_fs_upgrade_notify_t notify_func,
+                  void *notify_baton,
+                  svn_cancel_func_t cancel_func,
+                  void *cancel_baton,
+                  apr_pool_t *scratch_pool);
+
+/* Set *YOUNGEST to the youngest revision in filesystem FS.  Do any
+   temporary allocation in SCRATCH_POOL. */
+svn_error_t *
+svn_fs_x__youngest_rev(svn_revnum_t *youngest,
+                       svn_fs_t *fs,
+                       apr_pool_t *scratch_pool);
+
+/* Return SVN_ERR_FS_NO_SUCH_REVISION if the given revision REV is newer
+   than the current youngest revision in FS or is simply not a valid
+   revision number, else return success.  Use SCRATCH_POOL for temporary
+   allocations. */
+svn_error_t *
+svn_fs_x__ensure_revision_exists(svn_revnum_t rev,
+                                 svn_fs_t *fs,
+                                 apr_pool_t *scratch_pool);
+
+/* Set *LENGTH to the fulltext length of the node revision
+   specified by NODEREV. */
+svn_error_t *
+svn_fs_x__file_length(svn_filesize_t *length,
+                      svn_fs_x__noderev_t *noderev);
+
+/* Return TRUE if the representations in A and B have equal contents, else
+   return FALSE. */
+svn_boolean_t
+svn_fs_x__file_text_rep_equal(svn_fs_x__representation_t *a,
+                              svn_fs_x__representation_t *b);
+
+/* Set *EQUAL to TRUE if the property representations in A and B within FS
+   have equal contents, else set it to FALSE.  If STRICT is not set, allow
+   for false negatives.
+   Use SCRATCH_POOL for temporary allocations. */
+svn_error_t *
+svn_fs_x__prop_rep_equal(svn_boolean_t *equal,
+                         svn_fs_t *fs,
+                         svn_fs_x__noderev_t *a,
+                         svn_fs_x__noderev_t *b,
+                         svn_boolean_t strict,
+                         apr_pool_t *scratch_pool);
+
+
+/* Return a copy of the representation REP allocated from RESULT_POOL. */
+svn_fs_x__representation_t *
+svn_fs_x__rep_copy(svn_fs_x__representation_t *rep,
+                   apr_pool_t *result_pool);
+
+
+/* Set *CHECKSUM to the recorded checksum of type KIND for the text
+   representation of NODEREV, allocating it in RESULT_POOL.  If no stored
+   checksum of that kind is available, set *CHECKSUM to NULL. */
+svn_error_t *
+svn_fs_x__file_checksum(svn_checksum_t **checksum,
+                        svn_fs_x__noderev_t *noderev,
+                        svn_checksum_kind_t kind,
+                        apr_pool_t *result_pool);
+
+/* Under the repository db PATH, create an FSX repository with the given
+ * FORMAT and SHARD_SIZE.  Settings not supported by the respective format
+ * will be ignored.  FS will be updated.
+ *
+ * The only file not being written is the 'format' file.  This allows
+ * callers such as hotcopy to modify the contents before turning the
+ * tree into an accessible repository.
+ *
+ * Use SCRATCH_POOL for temporary allocations.
+ */
+svn_error_t *
+svn_fs_x__create_file_tree(svn_fs_t *fs,
+                           const char *path,
+                           int format,
+                           int shard_size,
+                           apr_pool_t *scratch_pool);
+
+/* Create an FSX filesystem referenced by FS at path PATH.  Get any
+   temporary allocations from SCRATCH_POOL.
+
+   ### Some parts of *FS must have been initialized beforehand; some parts
+       (including FS->path) are initialized by this function. */
+svn_error_t *
+svn_fs_x__create(svn_fs_t *fs,
+                 const char *path,
+                 apr_pool_t *scratch_pool);
+
+/* Set the uuid of repository FS to UUID and the instance ID to INSTANCE_ID.
+   If any of them is NULL, use a newly generated UUID / ID instead.
+   Perform temporary allocations in SCRATCH_POOL.
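+
+   For illustration, this mirrors the call made by hotcopy_create_empty_dest()
+   in hotcopy.c, which keeps the source UUID but requests a freshly
+   generated instance ID:
+
+     SVN_ERR(svn_fs_x__set_uuid(dst_fs, src_fs->uuid, NULL, scratch_pool));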
*/
+svn_error_t *
+svn_fs_x__set_uuid(svn_fs_t *fs,
+                   const char *uuid,
+                   const char *instance_id,
+                   apr_pool_t *scratch_pool);
+
+/* Write the 'format' file of filesystem FS, i.e. the format number and
+   the sharding layout currently stored in FS.  Unless OVERWRITE is set,
+   fail if the format file already exists.
+
+   Use SCRATCH_POOL for temporary allocation. */
+svn_error_t *
+svn_fs_x__write_format(svn_fs_t *fs,
+                       svn_boolean_t overwrite,
+                       apr_pool_t *scratch_pool);
+
+/* Find the value of the property named PROPNAME of revision REV.
+   Return the contents in *VALUE_P, allocated from RESULT_POOL.
+   Use SCRATCH_POOL for temporary allocations. */
+svn_error_t *
+svn_fs_x__revision_prop(svn_string_t **value_p,
+                        svn_fs_t *fs,
+                        svn_revnum_t rev,
+                        const char *propname,
+                        apr_pool_t *result_pool,
+                        apr_pool_t *scratch_pool);
+
+/* Change, add, or delete a property on a revision REV in filesystem
+   FS.  NAME gives the name of the property, and value, if non-NULL,
+   gives the new contents of the property.  If value is NULL, then the
+   property will be deleted.  If OLD_VALUE_P is not NULL, do nothing unless
+   the preexisting value is *OLD_VALUE_P.
+   Do any temporary allocation in SCRATCH_POOL. */
+svn_error_t *
+svn_fs_x__change_rev_prop(svn_fs_t *fs,
+                          svn_revnum_t rev,
+                          const char *name,
+                          const svn_string_t *const *old_value_p,
+                          const svn_string_t *value,
+                          apr_pool_t *scratch_pool);
+
+/* If directory PATH does not exist, create it and give it the same
+   permissions as FS_PATH.  Do any temporary allocation in SCRATCH_POOL. */
+svn_error_t *
+svn_fs_x__ensure_dir_exists(const char *path,
+                            const char *fs_path,
+                            apr_pool_t *scratch_pool);
+
+/* Initialize all session-local caches in FS according to the global
+   cache settings.  Use SCRATCH_POOL for temporary allocations.
+
+   Please note that it is permissible for this function to set some
+   or all of these caches to NULL, regardless of any setting. */
+svn_error_t *
+svn_fs_x__initialize_caches(svn_fs_t *fs,
+                            apr_pool_t *scratch_pool);
+
+#endif
diff --git a/subversion/libsvn_fs_x/hotcopy.c b/subversion/libsvn_fs_x/hotcopy.c
new file mode 100644
index 0000000..c9f0af2
--- /dev/null
+++ b/subversion/libsvn_fs_x/hotcopy.c
@@ -0,0 +1,991 @@
+/* hotcopy.c --- FS hotcopy functionality for FSX
+ *
+ * ====================================================================
+ *    Licensed to the Apache Software Foundation (ASF) under one
+ *    or more contributor license agreements.  See the NOTICE file
+ *    distributed with this work for additional information
+ *    regarding copyright ownership.  The ASF licenses this file
+ *    to you under the Apache License, Version 2.0 (the
+ *    "License"); you may not use this file except in compliance
+ *    with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *    Unless required by applicable law or agreed to in writing,
+ *    software distributed under the License is distributed on an
+ *    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ *    KIND, either express or implied.  See the License for the
+ *    specific language governing permissions and limitations
+ *    under the License.
+ * ==================================================================== + */ +#include "svn_pools.h" +#include "svn_path.h" +#include "svn_dirent_uri.h" + +#include "fs_x.h" +#include "hotcopy.h" +#include "util.h" +#include "revprops.h" +#include "rep-cache.h" +#include "transaction.h" +#include "recovery.h" + +#include "../libsvn_fs/fs-loader.h" + +#include "svn_private_config.h" + +/* Like svn_io_dir_file_copy(), but doesn't copy files that exist at + * the destination and do not differ in terms of kind, size, and mtime. + * Set *SKIPPED_P to FALSE only if the file was copied, do not change + * the value in *SKIPPED_P otherwise. SKIPPED_P may be NULL if not + * required. */ +static svn_error_t * +hotcopy_io_dir_file_copy(svn_boolean_t *skipped_p, + const char *src_path, + const char *dst_path, + const char *file, + apr_pool_t *scratch_pool) +{ + const svn_io_dirent2_t *src_dirent; + const svn_io_dirent2_t *dst_dirent; + const char *src_target; + const char *dst_target; + + /* Does the destination already exist? If not, we must copy it. */ + dst_target = svn_dirent_join(dst_path, file, scratch_pool); + SVN_ERR(svn_io_stat_dirent2(&dst_dirent, dst_target, FALSE, TRUE, + scratch_pool, scratch_pool)); + if (dst_dirent->kind != svn_node_none) + { + /* If the destination's stat information indicates that the file + * is equal to the source, don't bother copying the file again. */ + src_target = svn_dirent_join(src_path, file, scratch_pool); + SVN_ERR(svn_io_stat_dirent2(&src_dirent, src_target, FALSE, FALSE, + scratch_pool, scratch_pool)); + if (src_dirent->kind == dst_dirent->kind && + src_dirent->special == dst_dirent->special && + src_dirent->filesize == dst_dirent->filesize && + src_dirent->mtime <= dst_dirent->mtime) + return SVN_NO_ERROR; + } + + if (skipped_p) + *skipped_p = FALSE; + + return svn_error_trace(svn_io_dir_file_copy(src_path, dst_path, file, + scratch_pool)); +} + +/* Set *NAME_P to the UTF-8 representation of directory entry NAME. + * NAME is in the internal encoding used by APR; PARENT is in + * UTF-8 and in internal (not local) style. + * + * Use PARENT only for generating an error string if the conversion + * fails because NAME could not be represented in UTF-8. In that + * case, return a two-level error in which the outer error's message + * mentions PARENT, but the inner error's message does not mention + * NAME (except possibly in hex) since NAME may not be printable. + * Such a compound error at least allows the user to go looking in the + * right directory for the problem. + * + * If there is any other error, just return that error directly. + * + * If there is any error, the effect on *NAME_P is undefined. + * + * *NAME_P and NAME may refer to the same storage. + */ +static svn_error_t * +entry_name_to_utf8(const char **name_p, + const char *name, + const char *parent, + apr_pool_t *result_pool) +{ + svn_error_t *err = svn_path_cstring_to_utf8(name_p, name, result_pool); + if (err && err->apr_err == APR_EINVAL) + { + return svn_error_createf(err->apr_err, err, + _("Error converting entry " + "in directory '%s' to UTF-8"), + svn_dirent_local_style(parent, result_pool)); + } + return err; +} + +/* Like svn_io_copy_dir_recursively() but doesn't copy regular files that + * exist in the destination and do not differ from the source in terms of + * kind, size, and mtime. Set *SKIPPED_P to FALSE only if at least one + * file was copied, do not change the value in *SKIPPED_P otherwise. + * SKIPPED_P may be NULL if not required. 
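+ *
+ * Note that *SKIPPED_P is only ever set to FALSE here; a caller that wants
+ * to detect whether anything was copied initializes it to TRUE first, e.g.:
+ *
+ *   svn_boolean_t skipped = TRUE;
+ *   SVN_ERR(hotcopy_io_copy_dir_recursively(&skipped, src, dst_parent,
+ *                                           dst_basename, TRUE, NULL, NULL,
+ *                                           scratch_pool));
+ *   if (!skipped)
+ *     ... at least one file was (re-)copied ...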
*/ +static svn_error_t * +hotcopy_io_copy_dir_recursively(svn_boolean_t *skipped_p, + const char *src, + const char *dst_parent, + const char *dst_basename, + svn_boolean_t copy_perms, + svn_cancel_func_t cancel_func, + void *cancel_baton, + apr_pool_t *scratch_pool) +{ + svn_node_kind_t kind; + apr_status_t status; + const char *dst_path; + apr_dir_t *this_dir; + apr_finfo_t this_entry; + apr_int32_t flags = APR_FINFO_TYPE | APR_FINFO_NAME; + + /* Make a subpool for recursion */ + apr_pool_t *subpool = svn_pool_create(scratch_pool); + + /* The 'dst_path' is simply dst_parent/dst_basename */ + dst_path = svn_dirent_join(dst_parent, dst_basename, scratch_pool); + + /* Sanity checks: SRC and DST_PARENT are directories, and + DST_BASENAME doesn't already exist in DST_PARENT. */ + SVN_ERR(svn_io_check_path(src, &kind, subpool)); + if (kind != svn_node_dir) + return svn_error_createf(SVN_ERR_NODE_UNEXPECTED_KIND, NULL, + _("Source '%s' is not a directory"), + svn_dirent_local_style(src, scratch_pool)); + + SVN_ERR(svn_io_check_path(dst_parent, &kind, subpool)); + if (kind != svn_node_dir) + return svn_error_createf(SVN_ERR_NODE_UNEXPECTED_KIND, NULL, + _("Destination '%s' is not a directory"), + svn_dirent_local_style(dst_parent, + scratch_pool)); + + SVN_ERR(svn_io_check_path(dst_path, &kind, subpool)); + + /* Create the new directory. */ + /* ### TODO: copy permissions (needs apr_file_attrs_get()) */ + SVN_ERR(svn_io_make_dir_recursively(dst_path, scratch_pool)); + + /* Loop over the dirents in SRC. ('.' and '..' are auto-excluded) */ + SVN_ERR(svn_io_dir_open(&this_dir, src, subpool)); + + for (status = apr_dir_read(&this_entry, flags, this_dir); + status == APR_SUCCESS; + status = apr_dir_read(&this_entry, flags, this_dir)) + { + if ((this_entry.name[0] == '.') + && ((this_entry.name[1] == '\0') + || ((this_entry.name[1] == '.') + && (this_entry.name[2] == '\0')))) + { + continue; + } + else + { + const char *entryname_utf8; + + if (cancel_func) + SVN_ERR(cancel_func(cancel_baton)); + + SVN_ERR(entry_name_to_utf8(&entryname_utf8, this_entry.name, + src, subpool)); + if (this_entry.filetype == APR_REG) /* regular file */ + { + SVN_ERR(hotcopy_io_dir_file_copy(skipped_p, src, dst_path, + entryname_utf8, subpool)); + } + else if (this_entry.filetype == APR_LNK) /* symlink */ + { + const char *src_target = svn_dirent_join(src, entryname_utf8, + subpool); + const char *dst_target = svn_dirent_join(dst_path, + entryname_utf8, + subpool); + SVN_ERR(svn_io_copy_link(src_target, dst_target, + subpool)); + } + else if (this_entry.filetype == APR_DIR) /* recurse */ + { + const char *src_target; + + /* Prevent infinite recursion by filtering off our + newly created destination path. */ + if (strcmp(src, dst_parent) == 0 + && strcmp(entryname_utf8, dst_basename) == 0) + continue; + + src_target = svn_dirent_join(src, entryname_utf8, subpool); + SVN_ERR(hotcopy_io_copy_dir_recursively(skipped_p, + src_target, + dst_path, + entryname_utf8, + copy_perms, + cancel_func, + cancel_baton, + subpool)); + } + /* ### support other APR node types someday?? */ + + } + } + + if (! 
(APR_STATUS_IS_ENOENT(status))) + return svn_error_wrap_apr(status, _("Can't read directory '%s'"), + svn_dirent_local_style(src, scratch_pool)); + + status = apr_dir_close(this_dir); + if (status) + return svn_error_wrap_apr(status, _("Error closing directory '%s'"), + svn_dirent_local_style(src, scratch_pool)); + + /* Free any memory used by recursion */ + svn_pool_destroy(subpool); + + return SVN_NO_ERROR; +} + +/* Copy an un-packed revision or revprop file for revision REV from SRC_SUBDIR + * to DST_SUBDIR. Assume a sharding layout based on MAX_FILES_PER_DIR. + * Set *SKIPPED_P to FALSE only if the file was copied, do not change the + * value in *SKIPPED_P otherwise. SKIPPED_P may be NULL if not required. + * Use SCRATCH_POOL for temporary allocations. */ +static svn_error_t * +hotcopy_copy_shard_file(svn_boolean_t *skipped_p, + const char *src_subdir, + const char *dst_subdir, + svn_revnum_t rev, + int max_files_per_dir, + apr_pool_t *scratch_pool) +{ + const char *src_subdir_shard = src_subdir, + *dst_subdir_shard = dst_subdir; + + const char *shard = apr_psprintf(scratch_pool, "%ld", + rev / max_files_per_dir); + src_subdir_shard = svn_dirent_join(src_subdir, shard, scratch_pool); + dst_subdir_shard = svn_dirent_join(dst_subdir, shard, scratch_pool); + + if (rev % max_files_per_dir == 0) + { + SVN_ERR(svn_io_make_dir_recursively(dst_subdir_shard, scratch_pool)); + SVN_ERR(svn_io_copy_perms(dst_subdir, dst_subdir_shard, + scratch_pool)); + } + + SVN_ERR(hotcopy_io_dir_file_copy(skipped_p, + src_subdir_shard, dst_subdir_shard, + apr_psprintf(scratch_pool, "%ld", rev), + scratch_pool)); + return SVN_NO_ERROR; +} + + +/* Copy a packed shard containing revision REV, and which contains + * MAX_FILES_PER_DIR revisions, from SRC_FS to DST_FS. + * Update *DST_MIN_UNPACKED_REV in case the shard is new in DST_FS. + * Do not re-copy data which already exists in DST_FS. + * Set *SKIPPED_P to FALSE only if at least one part of the shard + * was copied, do not change the value in *SKIPPED_P otherwise. + * SKIPPED_P may be NULL if not required. + * Use SCRATCH_POOL for temporary allocations. */ +static svn_error_t * +hotcopy_copy_packed_shard(svn_boolean_t *skipped_p, + svn_revnum_t *dst_min_unpacked_rev, + svn_fs_t *src_fs, + svn_fs_t *dst_fs, + svn_revnum_t rev, + int max_files_per_dir, + apr_pool_t *scratch_pool) +{ + const char *src_subdir; + const char *dst_subdir; + const char *packed_shard; + const char *src_subdir_packed_shard; + svn_revnum_t revprop_rev; + apr_pool_t *iterpool; + svn_fs_x__data_t *src_ffd = src_fs->fsap_data; + + /* Copy the packed shard. */ + src_subdir = svn_dirent_join(src_fs->path, PATH_REVS_DIR, scratch_pool); + dst_subdir = svn_dirent_join(dst_fs->path, PATH_REVS_DIR, scratch_pool); + packed_shard = apr_psprintf(scratch_pool, "%ld" PATH_EXT_PACKED_SHARD, + rev / max_files_per_dir); + src_subdir_packed_shard = svn_dirent_join(src_subdir, packed_shard, + scratch_pool); + SVN_ERR(hotcopy_io_copy_dir_recursively(skipped_p, src_subdir_packed_shard, + dst_subdir, packed_shard, + TRUE /* copy_perms */, + NULL /* cancel_func */, NULL, + scratch_pool)); + + /* Copy revprops belonging to revisions in this pack. 
*/ + src_subdir = svn_dirent_join(src_fs->path, PATH_REVPROPS_DIR, scratch_pool); + dst_subdir = svn_dirent_join(dst_fs->path, PATH_REVPROPS_DIR, scratch_pool); + + if (src_ffd->min_unpacked_rev < rev + max_files_per_dir) + { + /* copy unpacked revprops rev by rev */ + iterpool = svn_pool_create(scratch_pool); + for (revprop_rev = rev; + revprop_rev < rev + max_files_per_dir; + revprop_rev++) + { + svn_pool_clear(iterpool); + + SVN_ERR(hotcopy_copy_shard_file(skipped_p, src_subdir, dst_subdir, + revprop_rev, max_files_per_dir, + iterpool)); + } + svn_pool_destroy(iterpool); + } + else + { + /* revprop for revision 0 will never be packed */ + if (rev == 0) + SVN_ERR(hotcopy_copy_shard_file(skipped_p, src_subdir, dst_subdir, + 0, max_files_per_dir, + scratch_pool)); + + /* packed revprops folder */ + packed_shard = apr_psprintf(scratch_pool, "%ld" PATH_EXT_PACKED_SHARD, + rev / max_files_per_dir); + src_subdir_packed_shard = svn_dirent_join(src_subdir, packed_shard, + scratch_pool); + SVN_ERR(hotcopy_io_copy_dir_recursively(skipped_p, + src_subdir_packed_shard, + dst_subdir, packed_shard, + TRUE /* copy_perms */, + NULL /* cancel_func */, NULL, + scratch_pool)); + } + + /* If necessary, update the min-unpacked rev file in the hotcopy. */ + if (*dst_min_unpacked_rev < rev + max_files_per_dir) + { + *dst_min_unpacked_rev = rev + max_files_per_dir; + SVN_ERR(svn_fs_x__write_min_unpacked_rev(dst_fs, + *dst_min_unpacked_rev, + scratch_pool)); + } + + return SVN_NO_ERROR; +} + +/* Remove file PATH, if it exists - even if it is read-only. + * Use SCRATCH_POOL for temporary allocations. */ +static svn_error_t * +hotcopy_remove_file(const char *path, + apr_pool_t *scratch_pool) +{ + /* Make the rev file writable and remove it. */ + SVN_ERR(svn_io_set_file_read_write(path, TRUE, scratch_pool)); + SVN_ERR(svn_io_remove_file2(path, TRUE, scratch_pool)); + + return SVN_NO_ERROR; +} + + +/* Remove revision or revprop files between START_REV (inclusive) and + * END_REV (non-inclusive) from folder DST_SUBDIR in DST_FS. Assume + * sharding as per MAX_FILES_PER_DIR. + * Use SCRATCH_POOL for temporary allocations. */ +static svn_error_t * +hotcopy_remove_files(svn_fs_t *dst_fs, + const char *dst_subdir, + svn_revnum_t start_rev, + svn_revnum_t end_rev, + int max_files_per_dir, + apr_pool_t *scratch_pool) +{ + const char *shard; + const char *dst_subdir_shard; + svn_revnum_t rev; + apr_pool_t *iterpool; + + /* Pre-compute paths for initial shard. */ + shard = apr_psprintf(scratch_pool, "%ld", start_rev / max_files_per_dir); + dst_subdir_shard = svn_dirent_join(dst_subdir, shard, scratch_pool); + + iterpool = svn_pool_create(scratch_pool); + for (rev = start_rev; rev < end_rev; rev++) + { + svn_pool_clear(iterpool); + + /* If necessary, update paths for shard. */ + if (rev != start_rev && rev % max_files_per_dir == 0) + { + shard = apr_psprintf(iterpool, "%ld", rev / max_files_per_dir); + dst_subdir_shard = svn_dirent_join(dst_subdir, shard, scratch_pool); + } + + /* remove files for REV */ + SVN_ERR(hotcopy_remove_file(svn_dirent_join(dst_subdir_shard, + apr_psprintf(iterpool, + "%ld", rev), + iterpool), + iterpool)); + } + + svn_pool_destroy(iterpool); + + return SVN_NO_ERROR; +} + +/* Remove revisions between START_REV (inclusive) and END_REV (non-inclusive) + * from DST_FS. Assume sharding as per MAX_FILES_PER_DIR. + * Use SCRATCH_POOL for temporary allocations. 
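+ *
+ * For example, with MAX_FILES_PER_DIR == 1000 (an arbitrary example value),
+ * revision 1234 is stored as the file "1234" inside the shard directory "1"
+ * (i.e. 1234 / MAX_FILES_PER_DIR) of the revision tree.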
*/ +static svn_error_t * +hotcopy_remove_rev_files(svn_fs_t *dst_fs, + svn_revnum_t start_rev, + svn_revnum_t end_rev, + int max_files_per_dir, + apr_pool_t *scratch_pool) +{ + SVN_ERR_ASSERT(start_rev <= end_rev); + SVN_ERR(hotcopy_remove_files(dst_fs, + svn_dirent_join(dst_fs->path, + PATH_REVS_DIR, + scratch_pool), + start_rev, end_rev, + max_files_per_dir, scratch_pool)); + + return SVN_NO_ERROR; +} + +/* Remove revision properties between START_REV (inclusive) and END_REV + * (non-inclusive) from DST_FS. Assume sharding as per MAX_FILES_PER_DIR. + * Use SCRATCH_POOL for temporary allocations. Revision 0 revprops will + * not be deleted. */ +static svn_error_t * +hotcopy_remove_revprop_files(svn_fs_t *dst_fs, + svn_revnum_t start_rev, + svn_revnum_t end_rev, + int max_files_per_dir, + apr_pool_t *scratch_pool) +{ + SVN_ERR_ASSERT(start_rev <= end_rev); + + /* don't delete rev 0 props */ + SVN_ERR(hotcopy_remove_files(dst_fs, + svn_dirent_join(dst_fs->path, + PATH_REVPROPS_DIR, + scratch_pool), + start_rev ? start_rev : 1, end_rev, + max_files_per_dir, scratch_pool)); + + return SVN_NO_ERROR; +} + +/* Verify that DST_FS is a suitable destination for an incremental + * hotcopy from SRC_FS. */ +static svn_error_t * +hotcopy_incremental_check_preconditions(svn_fs_t *src_fs, + svn_fs_t *dst_fs) +{ + svn_fs_x__data_t *src_ffd = src_fs->fsap_data; + svn_fs_x__data_t *dst_ffd = dst_fs->fsap_data; + + /* We only support incremental hotcopy between the same format. */ + if (src_ffd->format != dst_ffd->format) + return svn_error_createf(SVN_ERR_UNSUPPORTED_FEATURE, NULL, + _("The FSX format (%d) of the hotcopy source does not match the " + "FSX format (%d) of the hotcopy destination; please upgrade " + "both repositories to the same format"), + src_ffd->format, dst_ffd->format); + + /* Make sure the UUID of source and destination match up. + * We don't want to copy over a different repository. */ + if (strcmp(src_fs->uuid, dst_fs->uuid) != 0) + return svn_error_create(SVN_ERR_RA_UUID_MISMATCH, NULL, + _("The UUID of the hotcopy source does " + "not match the UUID of the hotcopy " + "destination")); + + /* Also require same shard size. */ + if (src_ffd->max_files_per_dir != dst_ffd->max_files_per_dir) + return svn_error_create(SVN_ERR_UNSUPPORTED_FEATURE, NULL, + _("The sharding layout configuration " + "of the hotcopy source does not match " + "the sharding layout configuration of " + "the hotcopy destination")); + return SVN_NO_ERROR; +} + +/* Remove folder PATH. Ignore errors due to the sub-tree not being empty. + * CANCEL_FUNC and CANCEL_BATON do the usual thing. + * Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +remove_folder(const char *path, + svn_cancel_func_t cancel_func, + void *cancel_baton, + apr_pool_t *scratch_pool) +{ + svn_error_t *err = svn_io_remove_dir2(path, TRUE, + cancel_func, cancel_baton, + scratch_pool); + + if (err && APR_STATUS_IS_ENOTEMPTY(err->apr_err)) + { + svn_error_clear(err); + err = SVN_NO_ERROR; + } + + return svn_error_trace(err); +} + +/* Copy the revision and revprop files (possibly sharded / packed) from + * SRC_FS to DST_FS. Do not re-copy data which already exists in DST_FS. + * When copying packed or unpacked shards, checkpoint the result in DST_FS + * for every shard by updating the 'current' file if necessary. Assume + * the >= SVN_FS_FS__MIN_NO_GLOBAL_IDS_FORMAT filesystem format without + * global next-ID counters. Indicate progress via the optional NOTIFY_FUNC + * callback using NOTIFY_BATON. 
Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +hotcopy_revisions(svn_fs_t *src_fs, + svn_fs_t *dst_fs, + svn_revnum_t src_youngest, + svn_revnum_t dst_youngest, + svn_boolean_t incremental, + const char *src_revs_dir, + const char *dst_revs_dir, + const char *src_revprops_dir, + const char *dst_revprops_dir, + svn_fs_hotcopy_notify_t notify_func, + void* notify_baton, + svn_cancel_func_t cancel_func, + void* cancel_baton, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *src_ffd = src_fs->fsap_data; + int max_files_per_dir = src_ffd->max_files_per_dir; + svn_revnum_t src_min_unpacked_rev; + svn_revnum_t dst_min_unpacked_rev; + svn_revnum_t rev; + apr_pool_t *iterpool; + + /* Copy the min unpacked rev, and read its value. */ + SVN_ERR(svn_fs_x__read_min_unpacked_rev(&src_min_unpacked_rev, src_fs, + scratch_pool)); + SVN_ERR(svn_fs_x__read_min_unpacked_rev(&dst_min_unpacked_rev, dst_fs, + scratch_pool)); + + /* We only support packs coming from the hotcopy source. + * The destination should not be packed independently from + * the source. This also catches the case where users accidentally + * swap the source and destination arguments. */ + if (src_min_unpacked_rev < dst_min_unpacked_rev) + return svn_error_createf(SVN_ERR_UNSUPPORTED_FEATURE, NULL, + _("The hotcopy destination already contains " + "more packed revisions (%lu) than the " + "hotcopy source contains (%lu)"), + dst_min_unpacked_rev - 1, + src_min_unpacked_rev - 1); + + SVN_ERR(svn_io_dir_file_copy(src_fs->path, dst_fs->path, + PATH_MIN_UNPACKED_REV, scratch_pool)); + + if (cancel_func) + SVN_ERR(cancel_func(cancel_baton)); + + /* + * Copy the necessary rev files. + */ + + iterpool = svn_pool_create(scratch_pool); + /* First, copy packed shards. */ + for (rev = 0; rev < src_min_unpacked_rev; rev += max_files_per_dir) + { + svn_boolean_t skipped = TRUE; + svn_revnum_t pack_end_rev; + + svn_pool_clear(iterpool); + + if (cancel_func) + SVN_ERR(cancel_func(cancel_baton)); + + /* Copy the packed shard. */ + SVN_ERR(hotcopy_copy_packed_shard(&skipped, &dst_min_unpacked_rev, + src_fs, dst_fs, + rev, max_files_per_dir, + iterpool)); + + pack_end_rev = rev + max_files_per_dir - 1; + + /* Whenever this pack did not previously exist in the destination, + * update 'current' to the most recent packed rev (so readers can see + * new revisions which arrived in this pack). */ + if (pack_end_rev > dst_youngest) + { + SVN_ERR(svn_fs_x__write_current(dst_fs, pack_end_rev, iterpool)); + } + + /* When notifying about packed shards, make things simpler by either + * reporting a full revision range, i.e [pack start, pack end] or + * reporting nothing. There is one case when this approach might not + * be exact (incremental hotcopy with a pack replacing last unpacked + * revisions), but generally this is good enough. */ + if (notify_func && !skipped) + notify_func(notify_baton, rev, pack_end_rev, iterpool); + + /* Remove revision files which are now packed. */ + if (incremental) + { + SVN_ERR(hotcopy_remove_rev_files(dst_fs, rev, + rev + max_files_per_dir, + max_files_per_dir, iterpool)); + SVN_ERR(hotcopy_remove_revprop_files(dst_fs, rev, + rev + max_files_per_dir, + max_files_per_dir, + iterpool)); + } + + /* Now that all revisions have moved into the pack, the original + * rev dir can be removed. 
*/ + SVN_ERR(remove_folder(svn_fs_x__path_rev_shard(dst_fs, rev, iterpool), + cancel_func, cancel_baton, iterpool)); + if (rev > 0) + SVN_ERR(remove_folder(svn_fs_x__path_revprops_shard(dst_fs, rev, + iterpool), + cancel_func, cancel_baton, iterpool)); + } + + if (cancel_func) + SVN_ERR(cancel_func(cancel_baton)); + + SVN_ERR_ASSERT(rev == src_min_unpacked_rev); + SVN_ERR_ASSERT(src_min_unpacked_rev == dst_min_unpacked_rev); + + /* Now, copy pairs of non-packed revisions and revprop files. + * If necessary, update 'current' after copying all files from a shard. */ + for (; rev <= src_youngest; rev++) + { + svn_boolean_t skipped = TRUE; + + svn_pool_clear(iterpool); + + if (cancel_func) + SVN_ERR(cancel_func(cancel_baton)); + + /* Copying non-packed revisions is racy in case the source repository is + * being packed concurrently with this hotcopy operation. With the pack + * lock, however, the race is impossible, because hotcopy and pack + * operations block each other. + * + * We assume that all revisions coming after 'min-unpacked-rev' really + * are unpacked and that's not necessarily true with concurrent packing. + * Don't try to be smart in this edge case, because handling it properly + * might require copying *everything* from the start. Just abort the + * hotcopy with an ENOENT (revision file moved to a pack, so it is no + * longer where we expect it to be). */ + + /* Copy the rev file. */ + SVN_ERR(hotcopy_copy_shard_file(&skipped, src_revs_dir, dst_revs_dir, + rev, max_files_per_dir, + iterpool)); + + /* Copy the revprop file. */ + SVN_ERR(hotcopy_copy_shard_file(&skipped, src_revprops_dir, + dst_revprops_dir, + rev, max_files_per_dir, + iterpool)); + + /* Whenever this revision did not previously exist in the destination, + * checkpoint the progress via 'current' (do that once per full shard + * in order not to slow things down). */ + if (rev > dst_youngest) + { + if (max_files_per_dir && (rev % max_files_per_dir == 0)) + { + SVN_ERR(svn_fs_x__write_current(dst_fs, rev, iterpool)); + } + } + + if (notify_func && !skipped) + notify_func(notify_baton, rev, rev, iterpool); + } + svn_pool_destroy(iterpool); + + /* We assume that all revisions were copied now, i.e. we didn't exit the + * above loop early. 'rev' was last incremented during exit of the loop. */ + SVN_ERR_ASSERT(rev == src_youngest + 1); + + return SVN_NO_ERROR; +} + +/* Baton for hotcopy_body(). */ +typedef struct hotcopy_body_baton_t { + svn_fs_t *src_fs; + svn_fs_t *dst_fs; + svn_boolean_t incremental; + svn_fs_hotcopy_notify_t notify_func; + void *notify_baton; + svn_cancel_func_t cancel_func; + void *cancel_baton; +} hotcopy_body_baton_t; + +/* Perform a hotcopy, either normal or incremental. + * + * Normal hotcopy assumes that the destination exists as an empty + * directory. It behaves like an incremental hotcopy except that + * none of the copied files already exist in the destination. + * + * An incremental hotcopy copies only changed or new files to the destination, + * and removes files from the destination no longer present in the source. + * While the incremental hotcopy is running, readers should still be able + * to access the destintation repository without error and should not see + * revisions currently in progress of being copied. Readers are able to see + * new fully copied revisions even if the entire incremental hotcopy procedure + * has not yet completed. + * + * Writers are blocked out completely during the entire incremental hotcopy + * process to ensure consistency. 
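+ *
+ * In this file, hotcopy_body() only runs via svn_fs_x__hotcopy(), which
+ * wraps it in svn_fs_x__with_all_locks() on the destination and, through
+ * hotcopy_locking_src_body(), additionally takes the source's pack lock.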
This function assumes that the repository + * write-lock is held. + */ +static svn_error_t * +hotcopy_body(void *baton, + apr_pool_t *scratch_pool) +{ + hotcopy_body_baton_t *hbb = baton; + svn_fs_t *src_fs = hbb->src_fs; + svn_fs_t *dst_fs = hbb->dst_fs; + svn_boolean_t incremental = hbb->incremental; + svn_fs_hotcopy_notify_t notify_func = hbb->notify_func; + void* notify_baton = hbb->notify_baton; + svn_cancel_func_t cancel_func = hbb->cancel_func; + void* cancel_baton = hbb->cancel_baton; + svn_revnum_t src_youngest; + svn_revnum_t dst_youngest; + const char *src_revprops_dir; + const char *dst_revprops_dir; + const char *src_revs_dir; + const char *dst_revs_dir; + const char *src_subdir; + const char *dst_subdir; + svn_node_kind_t kind; + + /* Try to copy the config. + * + * ### We try copying the config file before doing anything else, + * ### because higher layers will abort the hotcopy if we throw + * ### an error from this function, and that renders the hotcopy + * ### unusable anyway. */ + SVN_ERR(svn_io_dir_file_copy(src_fs->path, dst_fs->path, PATH_CONFIG, + scratch_pool)); + + if (cancel_func) + SVN_ERR(cancel_func(cancel_baton)); + + /* Find the youngest revision in the source and destination. + * We only support hotcopies from sources with an equal or greater amount + * of revisions than the destination. + * This also catches the case where users accidentally swap the + * source and destination arguments. */ + SVN_ERR(svn_fs_x__read_current(&src_youngest, src_fs, scratch_pool)); + if (incremental) + { + SVN_ERR(svn_fs_x__youngest_rev(&dst_youngest, dst_fs, scratch_pool)); + if (src_youngest < dst_youngest) + return svn_error_createf(SVN_ERR_UNSUPPORTED_FEATURE, NULL, + _("The hotcopy destination already contains more revisions " + "(%lu) than the hotcopy source contains (%lu); are source " + "and destination swapped?"), + dst_youngest, src_youngest); + } + else + dst_youngest = 0; + + src_revs_dir = svn_dirent_join(src_fs->path, PATH_REVS_DIR, scratch_pool); + dst_revs_dir = svn_dirent_join(dst_fs->path, PATH_REVS_DIR, scratch_pool); + src_revprops_dir = svn_dirent_join(src_fs->path, PATH_REVPROPS_DIR, + scratch_pool); + dst_revprops_dir = svn_dirent_join(dst_fs->path, PATH_REVPROPS_DIR, + scratch_pool); + + /* Ensure that the required folders exist in the destination + * before actually copying the revisions and revprops. */ + SVN_ERR(svn_io_make_dir_recursively(dst_revs_dir, scratch_pool)); + SVN_ERR(svn_io_make_dir_recursively(dst_revprops_dir, scratch_pool)); + + if (cancel_func) + SVN_ERR(cancel_func(cancel_baton)); + + /* Split the logic for new and old FS formats. The latter is much simpler + * due to the absense of sharding and packing. However, it requires special + * care when updating the 'current' file (which contains not just the + * revision number, but also the next-ID counters). */ + SVN_ERR(hotcopy_revisions(src_fs, dst_fs, src_youngest, dst_youngest, + incremental, src_revs_dir, dst_revs_dir, + src_revprops_dir, dst_revprops_dir, + notify_func, notify_baton, + cancel_func, cancel_baton, scratch_pool)); + SVN_ERR(svn_fs_x__write_current(dst_fs, src_youngest, scratch_pool)); + + /* Replace the locks tree. + * This is racy in case readers are currently trying to list locks in + * the destination. However, we need to get rid of stale locks. + * This is the simplest way of doing this, so we accept this small race. 
*/ + dst_subdir = svn_dirent_join(dst_fs->path, PATH_LOCKS_DIR, scratch_pool); + SVN_ERR(svn_io_remove_dir2(dst_subdir, TRUE, cancel_func, cancel_baton, + scratch_pool)); + src_subdir = svn_dirent_join(src_fs->path, PATH_LOCKS_DIR, scratch_pool); + SVN_ERR(svn_io_check_path(src_subdir, &kind, scratch_pool)); + if (kind == svn_node_dir) + SVN_ERR(svn_io_copy_dir_recursively(src_subdir, dst_fs->path, + PATH_LOCKS_DIR, TRUE, + cancel_func, cancel_baton, + scratch_pool)); + + /* Now copy the node-origins cache tree. */ + src_subdir = svn_dirent_join(src_fs->path, PATH_NODE_ORIGINS_DIR, + scratch_pool); + SVN_ERR(svn_io_check_path(src_subdir, &kind, scratch_pool)); + if (kind == svn_node_dir) + SVN_ERR(hotcopy_io_copy_dir_recursively(NULL, src_subdir, dst_fs->path, + PATH_NODE_ORIGINS_DIR, TRUE, + cancel_func, cancel_baton, + scratch_pool)); + + /* + * NB: Data copied below is only read by writers, not readers. + * Writers are still locked out at this point. + */ + + /* Copy the rep cache and then remove entries for revisions + * younger than the destination's youngest revision. */ + src_subdir = svn_dirent_join(src_fs->path, REP_CACHE_DB_NAME, scratch_pool); + dst_subdir = svn_dirent_join(dst_fs->path, REP_CACHE_DB_NAME, scratch_pool); + SVN_ERR(svn_io_check_path(src_subdir, &kind, scratch_pool)); + if (kind == svn_node_file) + { + /* Copy the rep cache and then remove entries for revisions + * that did not make it into the destination. */ + SVN_ERR(svn_sqlite__hotcopy(src_subdir, dst_subdir, scratch_pool)); + SVN_ERR(svn_fs_x__del_rep_reference(dst_fs, src_youngest, + scratch_pool)); + } + + /* Copy the txn-current file. */ + SVN_ERR(svn_io_dir_file_copy(src_fs->path, dst_fs->path, + PATH_TXN_CURRENT, scratch_pool)); + + /* If a revprop generation file exists in the source filesystem, + * reset it to zero (since this is on a different path, it will not + * overlap with data already in cache). Also, clean up stale files + * used for the named atomics implementation. */ + SVN_ERR(svn_fs_x__reset_revprop_generation_file(dst_fs, scratch_pool)); + + return SVN_NO_ERROR; +} + +/* Wrapper around hotcopy_body taking out all necessary source repository + * locks. + */ +static svn_error_t * +hotcopy_locking_src_body(void *baton, + apr_pool_t *scratch_pool) +{ + hotcopy_body_baton_t *hbb = baton; + + return svn_error_trace(svn_fs_x__with_pack_lock(hbb->src_fs, hotcopy_body, + baton, scratch_pool)); +} + +/* Create an empty filesystem at DST_FS at DST_PATH with the same + * configuration as SRC_FS (uuid, format, and other parameters). + * After creation DST_FS has no revisions, not even revision zero. */ +static svn_error_t * +hotcopy_create_empty_dest(svn_fs_t *src_fs, + svn_fs_t *dst_fs, + const char *dst_path, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *src_ffd = src_fs->fsap_data; + + /* Create the DST_FS repository with the same layout as SRC_FS. */ + SVN_ERR(svn_fs_x__create_file_tree(dst_fs, dst_path, src_ffd->format, + src_ffd->max_files_per_dir, + scratch_pool)); + + /* Copy the UUID. Hotcopy destination receives a new instance ID, but + * has the same filesystem UUID as the source. */ + SVN_ERR(svn_fs_x__set_uuid(dst_fs, src_fs->uuid, NULL, scratch_pool)); + + /* Remove revision 0 contents. Otherwise, it may not get overwritten + * due to having a newer timestamp. 
*/
+  SVN_ERR(hotcopy_remove_file(svn_fs_x__path_rev(dst_fs, 0, scratch_pool),
+                              scratch_pool));
+  SVN_ERR(hotcopy_remove_file(svn_fs_x__path_revprops(dst_fs, 0,
+                                                      scratch_pool),
+                              scratch_pool));
+
+  /* This filesystem is ready.  Stamp it with a format number.  Fail if
+   * the 'format' file already exists. */
+  SVN_ERR(svn_fs_x__write_format(dst_fs, FALSE, scratch_pool));
+
+  return SVN_NO_ERROR;
+}
+
+svn_error_t *
+svn_fs_x__hotcopy_prepare_target(svn_fs_t *src_fs,
+                                 svn_fs_t *dst_fs,
+                                 const char *dst_path,
+                                 svn_boolean_t incremental,
+                                 apr_pool_t *scratch_pool)
+{
+  if (incremental)
+    {
+      const char *dst_format_abspath;
+      svn_node_kind_t dst_format_kind;
+
+      /* Check destination format to be sure we know how to incrementally
+       * hotcopy to the destination FS. */
+      dst_format_abspath = svn_dirent_join(dst_path, PATH_FORMAT,
+                                           scratch_pool);
+      SVN_ERR(svn_io_check_path(dst_format_abspath, &dst_format_kind,
+                                scratch_pool));
+      if (dst_format_kind == svn_node_none)
+        {
+          /* Destination doesn't exist yet.  Perform a normal hotcopy to an
+           * empty destination using the same configuration as the source. */
+          SVN_ERR(hotcopy_create_empty_dest(src_fs, dst_fs, dst_path,
+                                            scratch_pool));
+        }
+      else
+        {
+          /* Check the existing repository. */
+          SVN_ERR(svn_fs_x__open(dst_fs, dst_path, scratch_pool));
+          SVN_ERR(hotcopy_incremental_check_preconditions(src_fs, dst_fs));
+        }
+    }
+  else
+    {
+      /* Start out with an empty destination using the same configuration
+       * as the source. */
+      SVN_ERR(hotcopy_create_empty_dest(src_fs, dst_fs, dst_path,
+                                        scratch_pool));
+    }
+
+  return SVN_NO_ERROR;
+}
+
+svn_error_t *
+svn_fs_x__hotcopy(svn_fs_t *src_fs,
+                  svn_fs_t *dst_fs,
+                  svn_boolean_t incremental,
+                  svn_fs_hotcopy_notify_t notify_func,
+                  void *notify_baton,
+                  svn_cancel_func_t cancel_func,
+                  void *cancel_baton,
+                  apr_pool_t *scratch_pool)
+{
+  hotcopy_body_baton_t hbb;
+
+  hbb.src_fs = src_fs;
+  hbb.dst_fs = dst_fs;
+  hbb.incremental = incremental;
+  hbb.notify_func = notify_func;
+  hbb.notify_baton = notify_baton;
+  hbb.cancel_func = cancel_func;
+  hbb.cancel_baton = cancel_baton;
+  SVN_ERR(svn_fs_x__with_all_locks(dst_fs, hotcopy_locking_src_body, &hbb,
+                                   scratch_pool));
+
+  return SVN_NO_ERROR;
+}
diff --git a/subversion/libsvn_fs_x/hotcopy.h b/subversion/libsvn_fs_x/hotcopy.h
new file mode 100644
index 0000000..516c66a
--- /dev/null
+++ b/subversion/libsvn_fs_x/hotcopy.h
@@ -0,0 +1,53 @@
+/* hotcopy.h : interface to the native filesystem layer
+ *
+ * ====================================================================
+ *    Licensed to the Apache Software Foundation (ASF) under one
+ *    or more contributor license agreements.  See the NOTICE file
+ *    distributed with this work for additional information
+ *    regarding copyright ownership.  The ASF licenses this file
+ *    to you under the Apache License, Version 2.0 (the
+ *    "License"); you may not use this file except in compliance
+ *    with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *    Unless required by applicable law or agreed to in writing,
+ *    software distributed under the License is distributed on an
+ *    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ *    KIND, either express or implied.  See the License for the
+ *    specific language governing permissions and limitations
+ *    under the License.
+ * ==================================================================== + */ + +#ifndef SVN_LIBSVN_FS__HOTCOPY_H +#define SVN_LIBSVN_FS__HOTCOPY_H + +#include "fs.h" + +/* Create an empty copy of the fsfs filesystem SRC_FS into a new DST_FS at + * DST_PATH. If INCREMENTAL is TRUE, perform a few pre-checks only if + * a repo already exists at DST_PATH. + * Use SCRATCH_POOL for temporary allocations. */ +svn_error_t * +svn_fs_x__hotcopy_prepare_target(svn_fs_t *src_fs, + svn_fs_t *dst_fs, + const char *dst_path, + svn_boolean_t incremental, + apr_pool_t *scratch_pool); + +/* Copy the fsfs filesystem SRC_FS into DST_FS. If INCREMENTAL is TRUE, do + * not re-copy data which already exists in DST_FS. Indicate progress via + * the optional NOTIFY_FUNC callback using NOTIFY_BATON. + * Use SCRATCH_POOL for temporary allocations. */ +svn_error_t * +svn_fs_x__hotcopy(svn_fs_t *src_fs, + svn_fs_t *dst_fs, + svn_boolean_t incremental, + svn_fs_hotcopy_notify_t notify_func, + void *notify_baton, + svn_cancel_func_t cancel_func, + void *cancel_baton, + apr_pool_t *scratch_pool); + +#endif diff --git a/subversion/libsvn_fs_x/id.c b/subversion/libsvn_fs_x/id.c new file mode 100644 index 0000000..0127175 --- /dev/null +++ b/subversion/libsvn_fs_x/id.c @@ -0,0 +1,198 @@ +/* id.c : implements FSX-internal ID functions + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#include <assert.h> + +#include "id.h" +#include "index.h" +#include "util.h" + +#include "private/svn_string_private.h" + + + +svn_boolean_t +svn_fs_x__is_txn(svn_fs_x__change_set_t change_set) +{ + return change_set < SVN_FS_X__INVALID_CHANGE_SET; +} + +svn_boolean_t +svn_fs_x__is_revision(svn_fs_x__change_set_t change_set) +{ + return change_set > SVN_FS_X__INVALID_CHANGE_SET; +} + +svn_revnum_t +svn_fs_x__get_revnum(svn_fs_x__change_set_t change_set) +{ + return svn_fs_x__is_revision(change_set) + ? (svn_revnum_t)change_set + : SVN_INVALID_REVNUM; +} + +apr_int64_t +svn_fs_x__get_txn_id(svn_fs_x__change_set_t change_set) +{ + return svn_fs_x__is_txn(change_set) + ? -change_set + SVN_FS_X__INVALID_CHANGE_SET -1 + : SVN_FS_X__INVALID_TXN_ID; +} + + +svn_fs_x__change_set_t +svn_fs_x__change_set_by_rev(svn_revnum_t revnum) +{ + assert(revnum >= SVN_FS_X__INVALID_CHANGE_SET); + return revnum; +} + +svn_fs_x__change_set_t +svn_fs_x__change_set_by_txn(apr_int64_t txn_id) +{ + assert(txn_id >= SVN_FS_X__INVALID_CHANGE_SET); + return -txn_id + SVN_FS_X__INVALID_CHANGE_SET -1; +} + + +/* Parse the NUL-terminated ID part at DATA and write the result into *PART. + * Return TRUE if no errors were detected. 
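+ *
+ * For example, the ID part "2+0" (the root noderev ID that
+ * write_revision_zero() writes for r0) parses to NUMBER 2 within
+ * revision 0, while a '-' separator yields a negative change set,
+ * i.e. a transaction.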
*/ +static svn_boolean_t +part_parse(svn_fs_x__id_t *part, + const char *data) +{ + part->number = svn__base36toui64(&data, data); + switch (data[0]) + { + /* txn number? */ + case '-': part->change_set = -svn__base36toui64(&data, data + 1); + return TRUE; + + /* revision number? */ + case '+': part->change_set = svn__base36toui64(&data, data + 1); + return TRUE; + + /* everything else is forbidden */ + default: return FALSE; + } +} + +/* Write the textual representation of *PART into P and return a pointer + * to the first position behind that string. + */ +static char * +part_unparse(char *p, + const svn_fs_x__id_t *part) +{ + p += svn__ui64tobase36(p, part->number); + if (part->change_set >= 0) + { + *(p++) = '+'; + p += svn__ui64tobase36(p, part->change_set); + } + else + { + *(p++) = '-'; + p += svn__ui64tobase36(p, -part->change_set); + } + + return p; +} + + + +/* Operations on ID parts */ + +svn_boolean_t +svn_fs_x__id_is_root(const svn_fs_x__id_t* part) +{ + return part->change_set == 0 && part->number == 0; +} + +svn_boolean_t +svn_fs_x__id_eq(const svn_fs_x__id_t *lhs, + const svn_fs_x__id_t *rhs) +{ + return lhs->change_set == rhs->change_set && lhs->number == rhs->number; +} + +svn_error_t * +svn_fs_x__id_parse(svn_fs_x__id_t *part, + const char *data) +{ + if (!part_parse(part, data)) + return svn_error_createf(SVN_ERR_FS_MALFORMED_NODEREV_ID, NULL, + "Malformed ID string"); + + return SVN_NO_ERROR; +} + +svn_string_t * +svn_fs_x__id_unparse(const svn_fs_x__id_t *id, + apr_pool_t *result_pool) +{ + char string[2 * SVN_INT64_BUFFER_SIZE + 1]; + char *p = part_unparse(string, id); + + return svn_string_ncreate(string, p - string, result_pool); +} + +void +svn_fs_x__id_reset(svn_fs_x__id_t *part) +{ + part->change_set = SVN_FS_X__INVALID_CHANGE_SET; + part->number = 0; +} + +svn_boolean_t +svn_fs_x__id_used(const svn_fs_x__id_t *part) +{ + return part->change_set != SVN_FS_X__INVALID_CHANGE_SET; +} + +void +svn_fs_x__init_txn_root(svn_fs_x__id_t *noderev_id, + svn_fs_x__txn_id_t txn_id) +{ + noderev_id->change_set = svn_fs_x__change_set_by_txn(txn_id); + noderev_id->number = SVN_FS_X__ITEM_INDEX_ROOT_NODE; +} + +void +svn_fs_x__init_rev_root(svn_fs_x__id_t *noderev_id, + svn_revnum_t rev) +{ + noderev_id->change_set = svn_fs_x__change_set_by_rev(rev); + noderev_id->number = SVN_FS_X__ITEM_INDEX_ROOT_NODE; +} + +int +svn_fs_x__id_compare(const svn_fs_x__id_t *a, + const svn_fs_x__id_t *b) +{ + if (a->change_set < b->change_set) + return -1; + if (a->change_set > b->change_set) + return 1; + + return a->number < b->number ? -1 : a->number == b->number ? 0 : 1; +} diff --git a/subversion/libsvn_fs_x/id.h b/subversion/libsvn_fs_x/id.h new file mode 100644 index 0000000..e584043 --- /dev/null +++ b/subversion/libsvn_fs_x/id.h @@ -0,0 +1,135 @@ +/* id.h : interface to FSX-internal ID functions + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
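Taken together, part_parse() and part_unparse() define the textual ID syntax: a base36 item number, followed by '+' and a base36 revision for committed data, or '-' and a base36 transaction change set. A small in-tree sketch of the round trip through the public wrappers follows; demo_id_roundtrip is a hypothetical name and the code assumes the libsvn_fs_x build context.

#include "svn_string.h"
#include "svn_error.h"
#include "id.h"

/* Hypothetical demonstration, not part of the library. */
static svn_error_t *
demo_id_roundtrip(apr_pool_t *pool)
{
  svn_fs_x__id_t id, reparsed;
  svn_string_t *str;

  /* Item #42 created in revision 7. */
  id.change_set = svn_fs_x__change_set_by_rev(7);
  id.number = 42;

  /* Unparse to the base36 syntax handled above (e.g. "16+7") and back. */
  str = svn_fs_x__id_unparse(&id, pool);
  SVN_ERR(svn_fs_x__id_parse(&reparsed, str->data));

  if (!svn_fs_x__id_eq(&id, &reparsed))
    return svn_error_create(SVN_ERR_FS_MALFORMED_NODEREV_ID, NULL,
                            "ID round trip failed");

  return SVN_NO_ERROR;
}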
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#ifndef SVN_LIBSVN_FS_X_ID_H +#define SVN_LIBSVN_FS_X_ID_H + +#include "svn_fs.h" + +#ifdef __cplusplus +extern "C" { +#endif /* __cplusplus */ + +/* Unique identifier for a transaction within the given repository. */ +typedef apr_int64_t svn_fs_x__txn_id_t; + +/* svn_fs_x__txn_id_t value for everything that is not a transaction. */ +#define SVN_FS_X__INVALID_TXN_ID ((svn_fs_x__txn_id_t)(-1)) + +/* Change set is the umbrella term for transaction and revision in FSX. + * Revision numbers (>=0) map 1:1 onto change sets while txns are mapped + * onto the negatve value range. */ +typedef apr_int64_t svn_fs_x__change_set_t; + +/* Invalid / unused change set number. */ +#define SVN_FS_X__INVALID_CHANGE_SET ((svn_fs_x__change_set_t)(-1)) + +/* Return TRUE iff the CHANGE_SET refers to a revision + (will return FALSE for SVN_INVALID_REVNUM). */ +svn_boolean_t +svn_fs_x__is_revision(svn_fs_x__change_set_t change_set); + +/* Return TRUE iff the CHANGE_SET refers to a transaction + (will return FALSE for SVN_FS_X__INVALID_TXN_ID). */ +svn_boolean_t +svn_fs_x__is_txn(svn_fs_x__change_set_t change_set); + +/* Return the revision number that corresponds to CHANGE_SET. + Will SVN_INVALID_REVNUM for transactions. */ +svn_revnum_t +svn_fs_x__get_revnum(svn_fs_x__change_set_t change_set); + +/* Return the transaction ID that corresponds to CHANGE_SET. + Will SVN_FS_X__INVALID_TXN_ID for revisions. */ +svn_fs_x__txn_id_t +svn_fs_x__get_txn_id(svn_fs_x__change_set_t change_set); + +/* Convert REVNUM into a change set number */ +svn_fs_x__change_set_t +svn_fs_x__change_set_by_rev(svn_revnum_t revnum); + +/* Convert TXN_ID into a change set number */ +svn_fs_x__change_set_t +svn_fs_x__change_set_by_txn(svn_fs_x__txn_id_t txn_id); + +/* An ID in FSX consists of a creation CHANGE_SET number and some changeset- + * local counter value (NUMBER). + */ +typedef struct svn_fs_x__id_t +{ + svn_fs_x__change_set_t change_set; + + apr_uint64_t number; +} svn_fs_x__id_t; + + +/*** Operations on ID parts. ***/ + +/* Return TRUE, if both elements of the PART is 0, i.e. this is the default + * value if e.g. no copies were made of this node. */ +svn_boolean_t +svn_fs_x__id_is_root(const svn_fs_x__id_t *part); + +/* Return TRUE, if all element values of *LHS and *RHS match. */ +svn_boolean_t +svn_fs_x__id_eq(const svn_fs_x__id_t *lhs, + const svn_fs_x__id_t *rhs); + +/* Parse the NUL-terminated ID part at DATA and write the result into *PART. + */ +svn_error_t * +svn_fs_x__id_parse(svn_fs_x__id_t *part, + const char *data); + +/* Convert ID into string form, allocated in RESULT_POOL. */ +svn_string_t * +svn_fs_x__id_unparse(const svn_fs_x__id_t*id, + apr_pool_t *result_pool); + +/* Set *PART to "unused". */ +void +svn_fs_x__id_reset(svn_fs_x__id_t *part); + +/* Return TRUE if *PART is belongs to either a revision or transaction. */ +svn_boolean_t +svn_fs_x__id_used(const svn_fs_x__id_t *part); + +/* Return 0 if A and B are equal, 1 if A is "greater than" B, -1 otherwise. 
*/ +int +svn_fs_x__id_compare(const svn_fs_x__id_t *a, + const svn_fs_x__id_t *b); + +/* Set *NODEREV_ID to the root node ID of transaction TXN_ID. */ +void +svn_fs_x__init_txn_root(svn_fs_x__id_t *noderev_id, + svn_fs_x__txn_id_t txn_id); + +/* Set *NODEREV_ID to the root node ID of revision REV. */ +void +svn_fs_x__init_rev_root(svn_fs_x__id_t *noderev_id, + svn_revnum_t rev); + +#ifdef __cplusplus +} +#endif /* __cplusplus */ + +#endif /* SVN_LIBSVN_FS_X_ID_H */ diff --git a/subversion/libsvn_fs_x/index.c b/subversion/libsvn_fs_x/index.c new file mode 100644 index 0000000..7d568f9 --- /dev/null +++ b/subversion/libsvn_fs_x/index.c @@ -0,0 +1,3981 @@ +/* index.c indexing support for FSX support + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#include <assert.h> + +#include "svn_io.h" +#include "svn_pools.h" +#include "svn_sorts.h" + +#include "index.h" +#include "util.h" +#include "pack.h" + +#include "private/svn_dep_compat.h" +#include "private/svn_sorts_private.h" +#include "private/svn_subr_private.h" +#include "private/svn_temp_serializer.h" + +#include "svn_private_config.h" +#include "temp_serializer.h" +#include "fs_x.h" + +#include "../libsvn_fs/fs-loader.h" + +/* maximum length of a uint64 in an 7/8b encoding */ +#define ENCODED_INT_LENGTH 10 + +/* APR is missing an APR_OFF_T_MAX. So, define one. We will use it to + * limit file offsets stored in the indexes. + * + * We assume that everything shorter than 64 bits, it is at least 32 bits. + * We also assume that the type is always signed meaning we only have an + * effective positive range of 63 or 31 bits, respectively. + */ +static +const apr_uint64_t off_t_max = (sizeof(apr_off_t) == sizeof(apr_int64_t)) + ? APR_INT64_MAX + : APR_INT32_MAX; + +/* We store P2L proto-index entries as 6 values, 64 bits each on disk. + * See also svn_fs_fs__p2l_proto_index_add_entry(). + */ +#define P2L_PROTO_INDEX_ENTRY_SIZE (6 * sizeof(apr_uint64_t)) + +/* We put this string in front of the L2P index header. */ +#define L2P_STREAM_PREFIX "L2P-INDEX\n" + +/* We put this string in front of the P2L index header. */ +#define P2L_STREAM_PREFIX "P2L-INDEX\n" + +/* Size of the buffer that will fit the index header prefixes. */ +#define STREAM_PREFIX_LEN MAX(sizeof(L2P_STREAM_PREFIX), \ + sizeof(P2L_STREAM_PREFIX)) + +/* Page tables in the log-to-phys index file exclusively contain entries + * of this type to describe position and size of a given page. 
+ */ +typedef struct l2p_page_table_entry_t +{ + /* global offset on the page within the index file */ + apr_uint64_t offset; + + /* number of mapping entries in that page */ + apr_uint32_t entry_count; + + /* size of the page on disk (in the index file) */ + apr_uint32_t size; +} l2p_page_table_entry_t; + +/* Master run-time data structure of an log-to-phys index. It contains + * the page tables of every revision covered by that index - but not the + * pages themselves. + */ +typedef struct l2p_header_t +{ + /* first revision covered by this index */ + svn_revnum_t first_revision; + + /* number of revisions covered */ + apr_size_t revision_count; + + /* (max) number of entries per page */ + apr_uint32_t page_size; + + /* indexes into PAGE_TABLE that mark the first page of the respective + * revision. PAGE_TABLE_INDEX[REVISION_COUNT] points to the end of + * PAGE_TABLE. + */ + apr_size_t * page_table_index; + + /* Page table covering all pages in the index */ + l2p_page_table_entry_t * page_table; +} l2p_header_t; + +/* Run-time data structure containing a single log-to-phys index page. + */ +typedef struct l2p_page_t +{ + /* number of entries in the OFFSETS array */ + apr_uint32_t entry_count; + + /* global file offsets (item index is the array index) within the + * packed or non-packed rev file. Offset will be -1 for unused / + * invalid item index values. */ + apr_off_t *offsets; + + /* In case that the item is stored inside a container, this is the + * identifying index of the item within that container. 0 for the + * container itself or for items that aren't containers. */ + apr_uint32_t *sub_items; +} l2p_page_t; + +/* All of the log-to-phys proto index file consist of entries of this type. + */ +typedef struct l2p_proto_entry_t +{ + /* phys offset + 1 of the data container. 0 for "new revision" entries. */ + apr_uint64_t offset; + + /* corresponding item index. 0 for "new revision" entries. */ + apr_uint64_t item_index; + + /* index within the container starting @ offset. 0 for "new revision" + * entries and for items with no outer container. */ + apr_uint32_t sub_item; +} l2p_proto_entry_t; + +/* Master run-time data structure of an phys-to-log index. It contains + * an array with one offset value for each rev file cluster. + */ +typedef struct p2l_header_t +{ + /* first revision covered by the index (and rev file) */ + svn_revnum_t first_revision; + + /* number of bytes in the rev files covered by each p2l page */ + apr_uint64_t page_size; + + /* number of pages / clusters in that rev file */ + apr_size_t page_count; + + /* number of bytes in the rev file */ + apr_uint64_t file_size; + + /* offsets of the pages / cluster descriptions within the index file */ + apr_off_t *offsets; +} p2l_header_t; + +/* + * packed stream array + */ + +/* How many numbers we will pre-fetch and buffer in a packed number stream. + */ +enum { MAX_NUMBER_PREFETCH = 64 }; + +/* Prefetched number entry in a packed number stream. + */ +typedef struct value_position_pair_t +{ + /* prefetched number */ + apr_uint64_t value; + + /* number of bytes read, *including* this number, since the buffer start */ + apr_size_t total_len; +} value_position_pair_t; + +/* State of a prefetching packed number stream. It will read compressed + * index data efficiently and present it as a series of non-packed uint64. + */ +struct svn_fs_x__packed_number_stream_t +{ + /* underlying data file containing the packed values */ + apr_file_t *file; + + /* Offset within FILE at which the stream data starts + * (i.e. 
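Because the header stores a fixed (maximum) number of entries per page, locating an item within the log-to-phys index is plain div/mod arithmetic on the item index. The stand-alone sketch below illustrates that addressing scheme; the same computation appears later in l2p_header_copy(). The helper name and the 8192 page size are example values only.

#include <assert.h>
#include <stdint.h>

/* Illustrative helper, not part of this file: split an item index into the
 * page number and the offset within that page for a given page size. */
static void
locate_item(uint64_t item_index,
            uint32_t page_size,
            uint32_t *page_no,
            uint32_t *page_offset)
{
  *page_no = (uint32_t)(item_index / page_size);
  *page_offset = (uint32_t)(item_index % page_size);
}

int main(void)
{
  uint32_t page_no, page_offset;

  locate_item(10, 8192, &page_no, &page_offset);     /* fits into page 0 */
  assert(page_no == 0 && page_offset == 10);

  locate_item(20000, 8192, &page_no, &page_offset);  /* spills into page 2 */
  assert(page_no == 2 && page_offset == 3616);
  return 0;
}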
which offset will reported as offset 0 by packed_stream_offset). */ + apr_off_t stream_start; + + /* First offset within FILE after the stream data. + * Attempts to read beyond this will cause an "Unexpected End Of Stream" + * error. */ + apr_off_t stream_end; + + /* number of used entries in BUFFER (starting at index 0) */ + apr_size_t used; + + /* index of the next number to read from the BUFFER (0 .. USED). + * If CURRENT == USED, we need to read more data upon get() */ + apr_size_t current; + + /* offset in FILE from which the first entry in BUFFER has been read */ + apr_off_t start_offset; + + /* offset in FILE from which the next number has to be read */ + apr_off_t next_offset; + + /* read the file in chunks of this size */ + apr_size_t block_size; + + /* pool to be used for file ops etc. */ + apr_pool_t *pool; + + /* buffer for prefetched values */ + value_position_pair_t buffer[MAX_NUMBER_PREFETCH]; +}; + +/* Return an svn_error_t * object for error ERR on STREAM with the given + * MESSAGE string. The latter must have a placeholder for the index file + * name ("%s") and the current read offset (e.g. "0x%lx"). + */ +static svn_error_t * +stream_error_create(svn_fs_x__packed_number_stream_t *stream, + apr_status_t err, + const char *message) +{ + const char *file_name; + apr_off_t offset; + SVN_ERR(svn_io_file_name_get(&file_name, stream->file, + stream->pool)); + SVN_ERR(svn_fs_x__get_file_offset(&offset, stream->file, stream->pool)); + + return svn_error_createf(err, NULL, message, file_name, + apr_psprintf(stream->pool, + "%" APR_UINT64_T_HEX_FMT, + (apr_uint64_t)offset)); +} + +/* Read up to MAX_NUMBER_PREFETCH numbers from the STREAM->NEXT_OFFSET in + * STREAM->FILE and buffer them. + * + * We don't want GCC and others to inline this (infrequently called) + * function into packed_stream_get() because it prevents the latter from + * being inlined itself. + */ +SVN__PREVENT_INLINE +static svn_error_t * +packed_stream_read(svn_fs_x__packed_number_stream_t *stream) +{ + unsigned char buffer[MAX_NUMBER_PREFETCH]; + apr_size_t read = 0; + apr_size_t i; + value_position_pair_t *target; + apr_off_t block_start = 0; + apr_off_t block_left = 0; + apr_status_t err; + + /* all buffered data will have been read starting here */ + stream->start_offset = stream->next_offset; + + /* packed numbers are usually not aligned to MAX_NUMBER_PREFETCH blocks, + * i.e. the last number has been incomplete (and not buffered in stream) + * and need to be re-read. Therefore, always correct the file pointer. + */ + SVN_ERR(svn_io_file_aligned_seek(stream->file, stream->block_size, + &block_start, stream->next_offset, + stream->pool)); + + /* prefetch at least one number but, if feasible, don't cross block + * boundaries. This shall prevent jumping back and forth between two + * blocks because the extra data was not actually request _now_. + */ + read = sizeof(buffer); + block_left = stream->block_size - (stream->next_offset - block_start); + if (block_left >= 10 && block_left < read) + read = (apr_size_t)block_left; + + /* Don't read beyond the end of the file section that belongs to this + * index / stream. 
*/ + read = (apr_size_t)MIN(read, stream->stream_end - stream->next_offset); + + err = apr_file_read(stream->file, buffer, &read); + if (err && !APR_STATUS_IS_EOF(err)) + return stream_error_create(stream, err, + _("Can't read index file '%s' at offset 0x%")); + + /* if the last number is incomplete, trim it from the buffer */ + while (read > 0 && buffer[read-1] >= 0x80) + --read; + + /* we call read() only if get() requires more data. So, there must be + * at least *one* further number. */ + if SVN__PREDICT_FALSE(read == 0) + return stream_error_create(stream, err, + _("Unexpected end of index file %s at offset 0x%")); + + /* parse file buffer and expand into stream buffer */ + target = stream->buffer; + for (i = 0; i < read;) + { + if (buffer[i] < 0x80) + { + /* numbers < 128 are relatively frequent and particularly easy + * to decode. Give them special treatment. */ + target->value = buffer[i]; + ++i; + target->total_len = i; + ++target; + } + else + { + apr_uint64_t value = 0; + apr_uint64_t shift = 0; + while (buffer[i] >= 0x80) + { + value += ((apr_uint64_t)buffer[i] & 0x7f) << shift; + shift += 7; + ++i; + } + + target->value = value + ((apr_uint64_t)buffer[i] << shift); + ++i; + target->total_len = i; + ++target; + + /* let's catch corrupted data early. It would surely cause + * havoc further down the line. */ + if SVN__PREDICT_FALSE(shift > 8 * sizeof(value)) + return svn_error_createf(SVN_ERR_FS_INDEX_CORRUPTION, NULL, + _("Corrupt index: number too large")); + } + } + + /* update stream state */ + stream->used = target - stream->buffer; + stream->next_offset = stream->start_offset + i; + stream->current = 0; + + return SVN_NO_ERROR; +} + +/* Create and open a packed number stream reading from offsets START to + * END in FILE and return it in *STREAM. Access the file in chunks of + * BLOCK_SIZE bytes. Expect the stream to be prefixed by STREAM_PREFIX. + * Allocate *STREAM in RESULT_POOL and use SCRATCH_POOL for temporaries. + */ +static svn_error_t * +packed_stream_open(svn_fs_x__packed_number_stream_t **stream, + apr_file_t *file, + apr_off_t start, + apr_off_t end, + const char *stream_prefix, + apr_size_t block_size, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + char buffer[STREAM_PREFIX_LEN + 1] = { 0 }; + apr_size_t len = strlen(stream_prefix); + svn_fs_x__packed_number_stream_t *result; + + /* If this is violated, we forgot to adjust STREAM_PREFIX_LEN after + * changing the index header prefixes. */ + SVN_ERR_ASSERT(len < sizeof(buffer)); + + /* Read the header prefix and compare it with the expected prefix */ + SVN_ERR(svn_io_file_aligned_seek(file, block_size, NULL, start, + scratch_pool)); + SVN_ERR(svn_io_file_read_full2(file, buffer, len, NULL, NULL, + scratch_pool)); + + if (strncmp(buffer, stream_prefix, len)) + return svn_error_createf(SVN_ERR_FS_INDEX_CORRUPTION, NULL, + _("Index stream header prefix mismatch.\n" + " expected: %s" + " found: %s"), stream_prefix, buffer); + + /* Construct the actual stream object. */ + result = apr_palloc(result_pool, sizeof(*result)); + + result->pool = result_pool; + result->file = file; + result->stream_start = start + len; + result->stream_end = end; + + result->used = 0; + result->current = 0; + result->start_offset = result->stream_start; + result->next_offset = result->stream_start; + result->block_size = block_size; + + *stream = result; + + return SVN_NO_ERROR; +} + +/* + * The forced inline is required for performance reasons: This is a very + * hot code path (called for every item we read) but e.g. 
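The parse loop in packed_stream_read() implements a little-endian base-128 format: every byte with the high bit set contributes seven payload bits, and the first byte below 0x80 terminates the number. The stand-alone decoder below handles the same byte format without the stream machinery; decode_uint is an illustrative name, not part of this file.

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one 7/8b number starting at P; return the number of bytes used. */
static size_t
decode_uint(uint64_t *value, const unsigned char *p)
{
  uint64_t result = 0;
  unsigned int shift = 0;
  size_t len = 0;

  while (p[len] >= 0x80)
    {
      /* continuation byte: 7 payload bits, least significant group first */
      result += (uint64_t)(p[len] & 0x7f) << shift;
      shift += 7;
      ++len;
    }

  /* terminating byte (< 0x80) carries the most significant bits */
  *value = result + ((uint64_t)p[len] << shift);
  return len + 1;
}

int main(void)
{
  const unsigned char encoded[] = { 0x81, 0x01 };   /* 1 + (1 << 7) */
  uint64_t value;

  assert(decode_uint(&value, encoded) == sizeof(encoded));
  assert(value == 129);
  return 0;
}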
GCC would rather + * chose to inline packed_stream_read() here, preventing packed_stream_get + * from being inlined itself. + */ +SVN__FORCE_INLINE +static svn_error_t* +packed_stream_get(apr_uint64_t *value, + svn_fs_x__packed_number_stream_t *stream) +{ + if (stream->current == stream->used) + SVN_ERR(packed_stream_read(stream)); + + *value = stream->buffer[stream->current].value; + ++stream->current; + + return SVN_NO_ERROR; +} + +/* Navigate STREAM to packed stream offset OFFSET. There will be no checks + * whether the given OFFSET is valid. + */ +static void +packed_stream_seek(svn_fs_x__packed_number_stream_t *stream, + apr_off_t offset) +{ + apr_off_t file_offset = offset + stream->stream_start; + + if ( stream->used == 0 + || offset < stream->start_offset + || offset >= stream->next_offset) + { + /* outside buffered data. Next get() will read() from OFFSET. */ + stream->start_offset = file_offset; + stream->next_offset = file_offset; + stream->current = 0; + stream->used = 0; + } + else + { + /* Find the suitable location in the stream buffer. + * Since our buffer is small, it is efficient enough to simply scan + * it for the desired position. */ + apr_size_t i; + for (i = 0; i < stream->used; ++i) + if (stream->buffer[i].total_len > file_offset - stream->start_offset) + break; + + stream->current = i; + } +} + +/* Return the packed stream offset of at which the next number in the stream + * can be found. + */ +static apr_off_t +packed_stream_offset(svn_fs_x__packed_number_stream_t *stream) +{ + apr_off_t file_offset + = stream->current == 0 + ? stream->start_offset + : stream->buffer[stream->current-1].total_len + stream->start_offset; + + return file_offset - stream->stream_start; +} + +/* Write VALUE to the PROTO_INDEX file, using SCRATCH_POOL for temporary + * allocations. + * + * The point of this function is to ensure an architecture-independent + * proto-index file format. All data is written as unsigned 64 bits ints + * in little endian byte order. 64 bits is the largest portable integer + * we have and unsigned values have well-defined conversions in C. + */ +static svn_error_t * +write_uint64_to_proto_index(apr_file_t *proto_index, + apr_uint64_t value, + apr_pool_t *scratch_pool) +{ + apr_byte_t buffer[sizeof(value)]; + int i; + apr_size_t written; + + /* Split VALUE into 8 bytes using LE ordering. */ + for (i = 0; i < sizeof(buffer); ++i) + { + /* Unsigned conversions are well-defined ... */ + buffer[i] = (apr_byte_t)value; + value >>= CHAR_BIT; + } + + /* Write it all to disk. */ + SVN_ERR(svn_io_file_write_full(proto_index, buffer, sizeof(buffer), + &written, scratch_pool)); + SVN_ERR_ASSERT(written == sizeof(buffer)); + + return SVN_NO_ERROR; +} + +/* Read one unsigned 64 bit value from PROTO_INDEX file and return it in + * *VALUE_P. If EOF is NULL, error out when trying to read beyond EOF. + * Use SCRATCH_POOL for temporary allocations. + * + * This function is the inverse to write_uint64_to_proto_index (see there), + * reading the external LE byte order and convert it into host byte order. + */ +static svn_error_t * +read_uint64_from_proto_index(apr_file_t *proto_index, + apr_uint64_t *value_p, + svn_boolean_t *eof, + apr_pool_t *scratch_pool) +{ + apr_byte_t buffer[sizeof(*value_p)]; + apr_size_t read; + + /* Read the full 8 bytes or our 64 bit value, unless we hit EOF. + * Assert that we never read partial values. 
*/ + SVN_ERR(svn_io_file_read_full2(proto_index, buffer, sizeof(buffer), + &read, eof, scratch_pool)); + SVN_ERR_ASSERT((eof && *eof) || read == sizeof(buffer)); + + /* If we did not hit EOF, reconstruct the uint64 value and return it. */ + if (!eof || !*eof) + { + int i; + apr_uint64_t value; + + /* This could only overflow if CHAR_BIT had a value that is not + * a divisor of 64. */ + value = 0; + for (i = sizeof(buffer) - 1; i >= 0; --i) + value = (value << CHAR_BIT) + buffer[i]; + + *value_p = value; + } + + return SVN_NO_ERROR; +} + +/* Convenience function similar to read_uint64_from_proto_index, but returns + * an uint32 value in VALUE_P. Return an error if the value does not fit. + */ +static svn_error_t * +read_uint32_from_proto_index(apr_file_t *proto_index, + apr_uint32_t *value_p, + svn_boolean_t *eof, + apr_pool_t *scratch_pool) +{ + apr_uint64_t value; + SVN_ERR(read_uint64_from_proto_index(proto_index, &value, eof, + scratch_pool)); + if (!eof || !*eof) + { + if (value > APR_UINT32_MAX) + return svn_error_createf(SVN_ERR_FS_INDEX_OVERFLOW, NULL, + _("UINT32 0x%s too large, max = 0x%s"), + apr_psprintf(scratch_pool, + "%" APR_UINT64_T_HEX_FMT, + value), + apr_psprintf(scratch_pool, + "%" APR_UINT64_T_HEX_FMT, + (apr_uint64_t)APR_UINT32_MAX)); + + /* This conversion is not lossy because the value can be represented + * in the target type. */ + *value_p = (apr_uint32_t)value; + } + + return SVN_NO_ERROR; +} + +/* Convenience function similar to read_uint64_from_proto_index, but returns + * an off_t value in VALUE_P. Return an error if the value does not fit. + */ +static svn_error_t * +read_off_t_from_proto_index(apr_file_t *proto_index, + apr_off_t *value_p, + svn_boolean_t *eof, + apr_pool_t *scratch_pool) +{ + apr_uint64_t value; + SVN_ERR(read_uint64_from_proto_index(proto_index, &value, eof, + scratch_pool)); + if (!eof || !*eof) + { + if (value > off_t_max) + return svn_error_createf(SVN_ERR_FS_INDEX_OVERFLOW, NULL, + _("File offset 0x%s too large, max = 0x%s"), + apr_psprintf(scratch_pool, + "%" APR_UINT64_T_HEX_FMT, + value), + apr_psprintf(scratch_pool, + "%" APR_UINT64_T_HEX_FMT, + off_t_max)); + + /* Shortening conversion from unsigned to signed int is well-defined + * and not lossy in C because the value can be represented in the + * target type. */ + *value_p = (apr_off_t)value; + } + + return SVN_NO_ERROR; +} + +/* + * log-to-phys index + */ +svn_error_t * +svn_fs_x__l2p_proto_index_open(apr_file_t **proto_index, + const char *file_name, + apr_pool_t *result_pool) +{ + SVN_ERR(svn_io_file_open(proto_index, file_name, APR_READ | APR_WRITE + | APR_CREATE | APR_APPEND | APR_BUFFERED, + APR_OS_DEFAULT, result_pool)); + + return SVN_NO_ERROR; +} + +/* Append ENTRY to log-to-phys PROTO_INDEX file. + * Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +write_l2p_entry_to_proto_index(apr_file_t *proto_index, + l2p_proto_entry_t entry, + apr_pool_t *scratch_pool) +{ + SVN_ERR(write_uint64_to_proto_index(proto_index, entry.offset, + scratch_pool)); + SVN_ERR(write_uint64_to_proto_index(proto_index, entry.item_index, + scratch_pool)); + SVN_ERR(write_uint64_to_proto_index(proto_index, entry.sub_item, + scratch_pool)); + + return SVN_NO_ERROR; +} + +/* Read *ENTRY from log-to-phys PROTO_INDEX file and indicate end-of-file + * in *EOF, or error out in that case if EOF is NULL. *ENTRY is in an + * undefined state if an end-of-file occurred. + * Use SCRATCH_POOL for temporary allocations. 
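The proto-index number format used by write_uint64_to_proto_index() and read_uint64_from_proto_index() above is deliberately trivial: every value is a uint64 written as eight little-endian bytes, so proto-index files are byte-for-byte identical across architectures. A stand-alone model of the split and reassembly, with illustrative helper names:

#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Split VALUE into 8 bytes, least significant byte first. */
static void
u64_to_le(unsigned char buffer[sizeof(uint64_t)], uint64_t value)
{
  size_t i;
  for (i = 0; i < sizeof(uint64_t); ++i)
    {
      buffer[i] = (unsigned char)value;   /* unsigned truncation is defined */
      value >>= CHAR_BIT;
    }
}

/* Reassemble the value in host byte order, most significant byte first. */
static uint64_t
le_to_u64(const unsigned char buffer[sizeof(uint64_t)])
{
  uint64_t value = 0;
  int i;
  for (i = (int)sizeof(uint64_t) - 1; i >= 0; --i)
    value = (value << CHAR_BIT) + buffer[i];
  return value;
}

int main(void)
{
  unsigned char buffer[sizeof(uint64_t)];

  u64_to_le(buffer, UINT64_C(0x0123456789abcdef));
  assert(buffer[0] == 0xef && buffer[7] == 0x01);   /* least significant first */
  assert(le_to_u64(buffer) == UINT64_C(0x0123456789abcdef));
  return 0;
}

Writing the bytes explicitly, rather than dumping an in-memory struct, is what keeps proto-index files portable between little- and big-endian hosts.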
+ */ +static svn_error_t * +read_l2p_entry_from_proto_index(apr_file_t *proto_index, + l2p_proto_entry_t *entry, + svn_boolean_t *eof, + apr_pool_t *scratch_pool) +{ + SVN_ERR(read_uint64_from_proto_index(proto_index, &entry->offset, eof, + scratch_pool)); + SVN_ERR(read_uint64_from_proto_index(proto_index, &entry->item_index, eof, + scratch_pool)); + SVN_ERR(read_uint32_from_proto_index(proto_index, &entry->sub_item, eof, + scratch_pool)); + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__l2p_proto_index_add_revision(apr_file_t *proto_index, + apr_pool_t *scratch_pool) +{ + l2p_proto_entry_t entry = { 0 }; + return svn_error_trace(write_l2p_entry_to_proto_index(proto_index, entry, + scratch_pool)); +} + +svn_error_t * +svn_fs_x__l2p_proto_index_add_entry(apr_file_t *proto_index, + apr_off_t offset, + apr_uint32_t sub_item, + apr_uint64_t item_index, + apr_pool_t *scratch_pool) +{ + l2p_proto_entry_t entry = { 0 }; + + /* make sure the conversion to uint64 works */ + SVN_ERR_ASSERT(offset >= -1); + + /* we support offset '-1' as a "not used" indication */ + entry.offset = (apr_uint64_t)offset + 1; + + /* make sure we can use item_index as an array index when building the + * final index file */ + SVN_ERR_ASSERT(item_index < UINT_MAX / 2); + entry.item_index = item_index; + + /* no limits on the container sub-item index */ + entry.sub_item = sub_item; + + return svn_error_trace(write_l2p_entry_to_proto_index(proto_index, entry, + scratch_pool)); +} + +/* Encode VALUE as 7/8b into P and return the number of bytes written. + * This will be used when _writing_ packed data. packed_stream_* is for + * read operations only. + */ +static apr_size_t +encode_uint(unsigned char *p, apr_uint64_t value) +{ + unsigned char *start = p; + while (value >= 0x80) + { + *p = (unsigned char)((value % 0x80) + 0x80); + value /= 0x80; + ++p; + } + + *p = (unsigned char)(value % 0x80); + return (p - start) + 1; +} + +/* Encode VALUE as 7/8b into P and return the number of bytes written. + * This maps signed ints onto unsigned ones. + */ +static apr_size_t +encode_int(unsigned char *p, apr_int64_t value) +{ + return encode_uint(p, (apr_uint64_t)(value < 0 ? -1 - 2*value : 2*value)); +} + +/* Append VALUE to STREAM in 7/8b encoding. + */ +static svn_error_t * +stream_write_encoded(svn_stream_t *stream, + apr_uint64_t value) +{ + unsigned char encoded[ENCODED_INT_LENGTH]; + + apr_size_t len = encode_uint(encoded, value); + return svn_error_trace(svn_stream_write(stream, (char *)encoded, &len)); +} + +/* Run-length-encode the uint64 numbers in ARRAY starting at index START + * up to but not including END. All numbers must be > 0. + * Return the number of remaining entries in ARRAY after START. + */ +static int +rle_array(apr_array_header_t *array, int start, int end) +{ + int i; + int target = start; + for (i = start; i < end; ++i) + { + apr_uint64_t value = APR_ARRAY_IDX(array, i, apr_uint64_t); + assert(value > 0); + + if (value == 1) + { + int counter; + for (counter = 1; i + counter < end; ++counter) + if (APR_ARRAY_IDX(array, i + counter, apr_uint64_t) != 1) + break; + + if (--counter) + { + APR_ARRAY_IDX(array, target, apr_uint64_t) = 0; + APR_ARRAY_IDX(array, target + 1, apr_uint64_t) = counter; + target += 2; + i += counter; + continue; + } + } + + APR_ARRAY_IDX(array, target, apr_uint64_t) = value; + ++target; + } + + return target; +} + +/* Utility data structure describing an log-2-phys page entry. + * This is only used as a transient representation during index creation. 
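encode_int() above folds signed values into the unsigned 7/8b encoder by mapping a non-negative n to 2n and a negative n to -1 - 2n, so small magnitudes of either sign stay short on disk. The stand-alone sketch below shows that mapping together with a possible inverse; the inverse is not part of the excerpt above and the names are illustrative.

#include <assert.h>
#include <stdint.h>

/* Same mapping as encode_int() uses before calling encode_uint(). */
static uint64_t
map_int(int64_t value)
{
  return (uint64_t)(value < 0 ? -1 - 2 * value : 2 * value);
}

/* A matching inverse: odd values are negative, even values non-negative. */
static int64_t
unmap_int(uint64_t value)
{
  return (value & 1) ? -1 - (int64_t)(value / 2) : (int64_t)(value / 2);
}

int main(void)
{
  int64_t v;

  assert(map_int(0) == 0);
  assert(map_int(-1) == 1);     /* small magnitudes stay small */
  assert(map_int(1) == 2);
  assert(map_int(-2) == 3);

  for (v = -1000; v <= 1000; ++v)
    assert(unmap_int(map_int(v)) == v);
  return 0;
}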
+ */ +typedef struct l2p_page_entry_t +{ + apr_uint64_t offset; + apr_uint32_t sub_item; +} l2p_page_entry_t; + +/* qsort-compatible compare function taking two l2p_page_entry_t and + * ordering them by offset. + */ +static int +compare_l2p_entries_by_offset(const l2p_page_entry_t *lhs, + const l2p_page_entry_t *rhs) +{ + return lhs->offset > rhs->offset ? 1 + : lhs->offset == rhs->offset ? 0 : -1; +} + +/* Write the log-2-phys index page description for the l2p_page_entry_t + * array ENTRIES, starting with element START up to but not including END. + * Write the resulting representation into BUFFER. Use SCRATCH_POOL for + * temporary allocations. + */ +static svn_error_t * +encode_l2p_page(apr_array_header_t *entries, + int start, + int end, + svn_spillbuf_t *buffer, + apr_pool_t *scratch_pool) +{ + unsigned char encoded[ENCODED_INT_LENGTH]; + apr_hash_t *containers = apr_hash_make(scratch_pool); + int count = end - start; + int container_count = 0; + apr_uint64_t last_offset = 0; + int i; + + apr_size_t data_size = count * sizeof(l2p_page_entry_t); + svn_stringbuf_t *container_offsets + = svn_stringbuf_create_ensure(count * 2, scratch_pool); + + /* SORTED: relevant items from ENTRIES, sorted by offset */ + l2p_page_entry_t *sorted + = apr_pmemdup(scratch_pool, + entries->elts + start * sizeof(l2p_page_entry_t), + data_size); + qsort(sorted, end - start, sizeof(l2p_page_entry_t), + (int (*)(const void *, const void *))compare_l2p_entries_by_offset); + + /* identify container offsets and create container list */ + for (i = 0; i < count; ++i) + { + /* skip "unused" entries */ + if (sorted[i].offset == 0) + continue; + + /* offset already covered? */ + if (i > 0 && sorted[i].offset == sorted[i-1].offset) + continue; + + /* is this a container item + * (appears more than once or accesses to sub-items other than 0)? 
*/ + if ( (i != count-1 && sorted[i].offset == sorted[i+1].offset) + || (sorted[i].sub_item != 0)) + { + svn_stringbuf_appendbytes(container_offsets, (const char *)encoded, + encode_uint(encoded, sorted[i].offset + - last_offset)); + last_offset = sorted[i].offset; + apr_hash_set(containers, + &sorted[i].offset, + sizeof(sorted[i].offset), + (void *)(apr_uintptr_t)++container_count); + } + } + + /* write container list to BUFFER */ + SVN_ERR(svn_spillbuf__write(buffer, (const char *)encoded, + encode_uint(encoded, container_count), + scratch_pool)); + SVN_ERR(svn_spillbuf__write(buffer, (const char *)container_offsets->data, + container_offsets->len, scratch_pool)); + + /* encode items */ + for (i = start; i < end; ++i) + { + l2p_page_entry_t *entry = &APR_ARRAY_IDX(entries, i, l2p_page_entry_t); + if (entry->offset == 0) + { + SVN_ERR(svn_spillbuf__write(buffer, "\0", 1, scratch_pool)); + } + else + { + void *void_idx = apr_hash_get(containers, &entry->offset, + sizeof(entry->offset)); + if (void_idx == NULL) + { + apr_uint64_t value = entry->offset + container_count; + SVN_ERR(svn_spillbuf__write(buffer, (const char *)encoded, + encode_uint(encoded, value), + scratch_pool)); + } + else + { + apr_uintptr_t idx = (apr_uintptr_t)void_idx; + apr_uint64_t value = entry->sub_item; + SVN_ERR(svn_spillbuf__write(buffer, (const char *)encoded, + encode_uint(encoded, idx), + scratch_pool)); + SVN_ERR(svn_spillbuf__write(buffer, (const char *)encoded, + encode_uint(encoded, value), + scratch_pool)); + } + } + } + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__l2p_index_append(svn_checksum_t **checksum, + svn_fs_t *fs, + apr_file_t *index_file, + const char *proto_file_name, + svn_revnum_t revision, + apr_pool_t * result_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + apr_file_t *proto_index = NULL; + svn_stream_t *stream; + int i; + int end; + apr_uint64_t entry; + svn_boolean_t eof = FALSE; + + int last_page_count = 0; /* total page count at the start of + the current revision */ + + /* temporary data structures that collect the data which will be moved + to the target file in a second step */ + apr_pool_t *local_pool = svn_pool_create(scratch_pool); + apr_pool_t *iterpool = svn_pool_create(local_pool); + apr_array_header_t *page_counts + = apr_array_make(local_pool, 16, sizeof(apr_uint64_t)); + apr_array_header_t *page_sizes + = apr_array_make(local_pool, 16, sizeof(apr_uint64_t)); + apr_array_header_t *entry_counts + = apr_array_make(local_pool, 16, sizeof(apr_uint64_t)); + + /* collect the item offsets and sub-item value for the current revision */ + apr_array_header_t *entries + = apr_array_make(local_pool, 256, sizeof(l2p_page_entry_t)); + + /* 64k blocks, spill after 16MB */ + svn_spillbuf_t *buffer + = svn_spillbuf__create(0x10000, 0x1000000, local_pool); + + /* Paranoia check that makes later casting to int32 safe. + * The current implementation is limited to 2G entries per page. 
*/ + if (ffd->l2p_page_size > APR_INT32_MAX) + return svn_error_createf(SVN_ERR_FS_INDEX_OVERFLOW , NULL, + _("L2P index page size %s" + " exceeds current limit of 2G entries"), + apr_psprintf(local_pool, "%" APR_UINT64_T_FMT, + ffd->l2p_page_size)); + + /* start at the beginning of the source file */ + SVN_ERR(svn_io_file_open(&proto_index, proto_file_name, + APR_READ | APR_CREATE | APR_BUFFERED, + APR_OS_DEFAULT, local_pool)); + + /* process all entries until we fail due to EOF */ + for (entry = 0; !eof; ++entry) + { + l2p_proto_entry_t proto_entry; + + /* (attempt to) read the next entry from the source */ + SVN_ERR(read_l2p_entry_from_proto_index(proto_index, &proto_entry, + &eof, local_pool)); + + /* handle new revision */ + if ((entry > 0 && proto_entry.offset == 0) || eof) + { + /* dump entries, grouped into pages */ + + int entry_count = 0; + for (i = 0; i < entries->nelts; i += entry_count) + { + /* 1 page with up to L2P_PAGE_SIZE entries. + * fsfs.conf settings validation guarantees this to fit into + * our address space. */ + apr_size_t last_buffer_size + = (apr_size_t)svn_spillbuf__get_size(buffer); + + svn_pool_clear(iterpool); + + entry_count = ffd->l2p_page_size < entries->nelts - i + ? (int)ffd->l2p_page_size + : entries->nelts - i; + SVN_ERR(encode_l2p_page(entries, i, i + entry_count, + buffer, iterpool)); + + APR_ARRAY_PUSH(entry_counts, apr_uint64_t) = entry_count; + APR_ARRAY_PUSH(page_sizes, apr_uint64_t) + = svn_spillbuf__get_size(buffer) - last_buffer_size; + } + + apr_array_clear(entries); + + /* store the number of pages in this revision */ + APR_ARRAY_PUSH(page_counts, apr_uint64_t) + = page_sizes->nelts - last_page_count; + + last_page_count = page_sizes->nelts; + } + else + { + int idx; + + /* store the mapping in our array */ + l2p_page_entry_t page_entry = { 0 }; + + if (proto_entry.item_index > APR_INT32_MAX) + return svn_error_createf(SVN_ERR_FS_INDEX_OVERFLOW , NULL, + _("Item index %s too large " + "in l2p proto index for revision %ld"), + apr_psprintf(local_pool, + "%" APR_UINT64_T_FMT, + proto_entry.item_index), + revision + page_counts->nelts); + + idx = (int)proto_entry.item_index; + while (idx >= entries->nelts) + APR_ARRAY_PUSH(entries, l2p_page_entry_t) = page_entry; + + page_entry.offset = proto_entry.offset; + page_entry.sub_item = proto_entry.sub_item; + APR_ARRAY_IDX(entries, idx, l2p_page_entry_t) = page_entry; + } + } + + /* we are now done with the source file */ + SVN_ERR(svn_io_file_close(proto_index, local_pool)); + + /* Paranoia check that makes later casting to int32 safe. + * The current implementation is limited to 2G pages per index. */ + if (page_counts->nelts > APR_INT32_MAX) + return svn_error_createf(SVN_ERR_FS_INDEX_OVERFLOW , NULL, + _("L2P index page count %d" + " exceeds current limit of 2G pages"), + page_counts->nelts); + + /* open target stream. 
*/ + stream = svn_stream_checksummed2(svn_stream_from_aprfile2(index_file, TRUE, + local_pool), + NULL, checksum, svn_checksum_md5, FALSE, + result_pool); + + + /* write header info */ + SVN_ERR(svn_stream_puts(stream, L2P_STREAM_PREFIX)); + SVN_ERR(stream_write_encoded(stream, revision)); + SVN_ERR(stream_write_encoded(stream, page_counts->nelts)); + SVN_ERR(stream_write_encoded(stream, ffd->l2p_page_size)); + SVN_ERR(stream_write_encoded(stream, page_sizes->nelts)); + + /* write the revision table */ + end = rle_array(page_counts, 0, page_counts->nelts); + for (i = 0; i < end; ++i) + { + apr_uint64_t value = APR_ARRAY_IDX(page_counts, i, apr_uint64_t); + SVN_ERR(stream_write_encoded(stream, value)); + } + + /* write the page table */ + for (i = 0; i < page_sizes->nelts; ++i) + { + apr_uint64_t value = APR_ARRAY_IDX(page_sizes, i, apr_uint64_t); + SVN_ERR(stream_write_encoded(stream, value)); + value = APR_ARRAY_IDX(entry_counts, i, apr_uint64_t); + SVN_ERR(stream_write_encoded(stream, value)); + } + + /* append page contents and implicitly close STREAM */ + SVN_ERR(svn_stream_copy3(svn_stream__from_spillbuf(buffer, local_pool), + stream, NULL, NULL, local_pool)); + + svn_pool_destroy(local_pool); + + return SVN_NO_ERROR; +} + +/* Return the base revision used to identify the p2l or lp2 index covering + * REVISION in FS. + */ +static svn_revnum_t +base_revision(svn_fs_t *fs, svn_revnum_t revision) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + return svn_fs_x__is_packed_rev(fs, revision) + ? revision - (revision % ffd->max_files_per_dir) + : revision; +} + +/* Data structure that describes which l2p page info shall be extracted + * from the cache and contains the fields that receive the result. + */ +typedef struct l2p_page_info_baton_t +{ + /* input data: we want the page covering (REVISION,ITEM_INDEX) */ + svn_revnum_t revision; + apr_uint64_t item_index; + + /* out data */ + /* page location and size of the page within the l2p index file */ + l2p_page_table_entry_t entry; + + /* page number within the pages for REVISION (not l2p index global!) */ + apr_uint32_t page_no; + + /* offset of ITEM_INDEX within that page */ + apr_uint32_t page_offset; + + /* revision identifying the l2p index file, also the first rev in that */ + svn_revnum_t first_revision; +} l2p_page_info_baton_t; + + +/* Utility function that copies the info requested by BATON->REVISION and + * BATON->ITEM_INDEX and from HEADER and PAGE_TABLE into the output fields + * of *BATON. Use SCRATCH_POOL for temporary allocations. 
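base_revision() above reduces any revision to the cache key of the index that covers it: packed revisions are rounded down to the first revision of their pack shard, unpacked revisions key on themselves. A stand-alone sketch of the rounding; the helper name and the shard sizes used below are illustrative inputs, not FSX configuration defaults.

#include <assert.h>

/* Round REVISION down to the first revision of its shard. */
static long
pack_base_revision(long revision, long max_files_per_dir)
{
  return revision - (revision % max_files_per_dir);
}

int main(void)
{
  assert(pack_base_revision(1234, 1000) == 1000);
  assert(pack_base_revision(999, 1000) == 0);
  assert(pack_base_revision(1000, 1000) == 1000);
  return 0;
}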
+ */ +static svn_error_t * +l2p_header_copy(l2p_page_info_baton_t *baton, + const l2p_header_t *header, + const l2p_page_table_entry_t *page_table, + const apr_size_t *page_table_index, + apr_pool_t *scratch_pool) +{ + /* revision offset within the index file */ + apr_size_t rel_revision = baton->revision - header->first_revision; + if (rel_revision >= header->revision_count) + return svn_error_createf(SVN_ERR_FS_INDEX_REVISION , NULL, + _("Revision %ld not covered by item index"), + baton->revision); + + /* select the relevant page */ + if (baton->item_index < header->page_size) + { + /* most revs fit well into a single page */ + baton->page_offset = (apr_uint32_t)baton->item_index; + baton->page_no = 0; + baton->entry = page_table[page_table_index[rel_revision]]; + } + else + { + const l2p_page_table_entry_t *first_entry; + const l2p_page_table_entry_t *last_entry; + apr_uint64_t max_item_index; + + /* range of pages for this rev */ + first_entry = page_table + page_table_index[rel_revision]; + last_entry = page_table + page_table_index[rel_revision + 1]; + + /* do we hit a valid index page? */ + max_item_index = (apr_uint64_t)header->page_size + * (last_entry - first_entry); + if (baton->item_index >= max_item_index) + return svn_error_createf(SVN_ERR_FS_INDEX_OVERFLOW , NULL, + _("Item index %s exceeds l2p limit " + "of %s for revision %ld"), + apr_psprintf(scratch_pool, + "%" APR_UINT64_T_FMT, + baton->item_index), + apr_psprintf(scratch_pool, + "%" APR_UINT64_T_FMT, + max_item_index), + baton->revision); + + /* all pages are of the same size and full, except for the last one */ + baton->page_offset = (apr_uint32_t)(baton->item_index % header->page_size); + baton->page_no = (apr_uint32_t)(baton->item_index / header->page_size); + baton->entry = first_entry[baton->page_no]; + } + + baton->first_revision = header->first_revision; + + return SVN_NO_ERROR; +} + +/* Implement svn_cache__partial_getter_func_t: copy the data requested in + * l2p_page_info_baton_t *BATON from l2p_header_t *DATA into the output + * fields in *BATON. + */ +static svn_error_t * +l2p_header_access_func(void **out, + const void *data, + apr_size_t data_len, + void *baton, + apr_pool_t *result_pool) +{ + /* resolve all pointer values of in-cache data */ + const l2p_header_t *header = data; + const l2p_page_table_entry_t *page_table + = svn_temp_deserializer__ptr(header, + (const void *const *)&header->page_table); + const apr_size_t *page_table_index + = svn_temp_deserializer__ptr(header, + (const void *const *)&header->page_table_index); + + /* copy the info */ + return l2p_header_copy(baton, header, page_table, page_table_index, + result_pool); +} + +/* Read COUNT run-length-encoded (see rle_array) uint64 from STREAM and + * return them in VALUES. + */ +static svn_error_t * +expand_rle(apr_array_header_t *values, + svn_fs_x__packed_number_stream_t *stream, + apr_size_t count) +{ + apr_array_clear(values); + + while (count) + { + apr_uint64_t value; + SVN_ERR(packed_stream_get(&value, stream)); + + if (value) + { + APR_ARRAY_PUSH(values, apr_uint64_t) = value; + --count; + } + else + { + apr_uint64_t i; + apr_uint64_t repetitions; + SVN_ERR(packed_stream_get(&repetitions, stream)); + if (++repetitions > count) + repetitions = count; + + for (i = 0; i < repetitions; ++i) + APR_ARRAY_PUSH(values, apr_uint64_t) = 1; + + count -= repetitions; + } + } + + return SVN_NO_ERROR; +} + +/* If REV_FILE->L2P_STREAM is NULL, create a new stream for the log-to-phys + * index for REVISION in FS and return it in REV_FILE. 
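expand_rle() is the inverse of rle_array() shown earlier: runs of two or more 1s are stored as a 0 marker followed by the repetition count minus one, while everything else is stored verbatim. The stand-alone encode/decode pair below mirrors that scheme; the fixed-size buffers and function names are only for illustration.

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode COUNT values (> 0) from IN into OUT; return the encoded length. */
static size_t
rle_encode(uint64_t *out, const uint64_t *in, size_t count)
{
  size_t i, len = 0;

  for (i = 0; i < count; )
    {
      size_t run = 0;
      while (i + run < count && in[i + run] == 1)
        ++run;

      if (run >= 2)
        {
          out[len++] = 0;          /* marker for a run of 1s */
          out[len++] = run - 1;    /* repetitions beyond the first 1 */
          i += run;
        }
      else
        {
          out[len++] = in[i++];    /* singles and non-1 values: verbatim */
        }
    }

  return len;
}

/* Expand LEN encoded values from IN into OUT; return the expanded count. */
static size_t
rle_decode(uint64_t *out, const uint64_t *in, size_t len)
{
  size_t i, count = 0;

  for (i = 0; i < len; ++i)
    {
      if (in[i] == 0)
        {
          uint64_t reps = in[++i] + 1;
          while (reps--)
            out[count++] = 1;
        }
      else
        {
          out[count++] = in[i];
        }
    }

  return count;
}

int main(void)
{
  const uint64_t original[] = { 5, 1, 1, 1, 7, 1 };
  uint64_t packed[8];
  uint64_t expanded[8];
  size_t packed_len = rle_encode(packed, original, 6);
  size_t count = rle_decode(expanded, packed, packed_len);
  size_t i;

  assert(packed_len == 5);                  /* { 5, 0, 2, 7, 1 } */
  assert(count == 6);
  for (i = 0; i < 6; ++i)
    assert(expanded[i] == original[i]);

  return 0;
}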
+ */ +static svn_error_t * +auto_open_l2p_index(svn_fs_x__revision_file_t *rev_file, + svn_fs_t *fs, + svn_revnum_t revision) +{ + if (rev_file->l2p_stream == NULL) + { + svn_fs_x__data_t *ffd = fs->fsap_data; + + SVN_ERR(svn_fs_x__auto_read_footer(rev_file)); + SVN_ERR(packed_stream_open(&rev_file->l2p_stream, + rev_file->file, + rev_file->l2p_offset, + rev_file->p2l_offset, + L2P_STREAM_PREFIX, + (apr_size_t)ffd->block_size, + rev_file->pool, + rev_file->pool)); + } + + return SVN_NO_ERROR; +} + +/* Read the header data structure of the log-to-phys index for REVISION + * in FS and return it in *HEADER, allocated in RESULT_POOL. Use REV_FILE + * to access on-disk data. Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +get_l2p_header_body(l2p_header_t **header, + svn_fs_x__revision_file_t *rev_file, + svn_fs_t *fs, + svn_revnum_t revision, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + apr_uint64_t value; + apr_size_t i; + apr_size_t page, page_count; + apr_off_t offset; + l2p_header_t *result = apr_pcalloc(result_pool, sizeof(*result)); + apr_size_t page_table_index; + svn_revnum_t next_rev; + apr_array_header_t *expanded_values + = apr_array_make(scratch_pool, 16, sizeof(apr_uint64_t)); + + svn_fs_x__pair_cache_key_t key; + key.revision = rev_file->start_revision; + key.second = rev_file->is_packed; + + SVN_ERR(auto_open_l2p_index(rev_file, fs, revision)); + packed_stream_seek(rev_file->l2p_stream, 0); + + /* Read the table sizes. Check the data for plausibility and + * consistency with other bits. */ + SVN_ERR(packed_stream_get(&value, rev_file->l2p_stream)); + result->first_revision = (svn_revnum_t)value; + if (result->first_revision != rev_file->start_revision) + return svn_error_create(SVN_ERR_FS_INDEX_CORRUPTION, NULL, + _("Index rev / pack file revision numbers do not match")); + + SVN_ERR(packed_stream_get(&value, rev_file->l2p_stream)); + result->revision_count = (int)value; + if ( result->revision_count != 1 + && result->revision_count != (apr_uint64_t)ffd->max_files_per_dir) + return svn_error_create(SVN_ERR_FS_INDEX_CORRUPTION, NULL, + _("Invalid number of revisions in L2P index")); + + SVN_ERR(packed_stream_get(&value, rev_file->l2p_stream)); + result->page_size = (apr_uint32_t)value; + if (!result->page_size || (result->page_size & (result->page_size - 1))) + return svn_error_create(SVN_ERR_FS_INDEX_CORRUPTION, NULL, + _("L2P index page size is not a power of two")); + + SVN_ERR(packed_stream_get(&value, rev_file->l2p_stream)); + page_count = (apr_size_t)value; + if (page_count < result->revision_count) + return svn_error_create(SVN_ERR_FS_INDEX_CORRUPTION, NULL, + _("Fewer L2P index pages than revisions")); + if (page_count > (rev_file->p2l_offset - rev_file->l2p_offset) / 2) + return svn_error_create(SVN_ERR_FS_INDEX_CORRUPTION, NULL, + _("L2P index page count implausibly large")); + + next_rev = result->first_revision + (svn_revnum_t)result->revision_count; + if (result->first_revision > revision || next_rev <= revision) + return svn_error_createf(SVN_ERR_FS_INDEX_CORRUPTION, NULL, + _("Corrupt L2P index for r%ld only covers r%ld:%ld"), + revision, result->first_revision, next_rev); + + /* allocate the page tables */ + result->page_table + = apr_pcalloc(result_pool, page_count * sizeof(*result->page_table)); + result->page_table_index + = apr_pcalloc(result_pool, (result->revision_count + 1) + * sizeof(*result->page_table_index)); + + /* read per-revision page table sizes (i.e. 
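One of the plausibility checks in get_l2p_header_body() rejects page sizes that are not powers of two, using the usual bit trick value & (value - 1). A stand-alone sketch of that idiom; is_power_of_two is an illustrative name.

#include <assert.h>
#include <stdint.h>

/* A non-zero value is a power of two iff clearing its lowest set bit
 * leaves zero. */
static int
is_power_of_two(uint64_t value)
{
  return value != 0 && (value & (value - 1)) == 0;
}

int main(void)
{
  assert(is_power_of_two(1));
  assert(is_power_of_two(8192));
  assert(!is_power_of_two(0));
  assert(!is_power_of_two(8193));
  return 0;
}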
number of pages per rev) */ + page_table_index = 0; + result->page_table_index[0] = page_table_index; + SVN_ERR(expand_rle(expanded_values, rev_file->l2p_stream, + result->revision_count)); + for (i = 0; i < result->revision_count; ++i) + { + value = (apr_size_t)APR_ARRAY_IDX(expanded_values, i, apr_uint64_t); + if (value == 0) + return svn_error_create(SVN_ERR_FS_INDEX_CORRUPTION, NULL, + _("Revision with no L2P index pages")); + + page_table_index += (apr_size_t)value; + if (page_table_index > page_count) + return svn_error_create(SVN_ERR_FS_INDEX_CORRUPTION, NULL, + _("L2P page table exceeded")); + + result->page_table_index[i+1] = page_table_index; + } + + if (page_table_index != page_count) + return svn_error_create(SVN_ERR_FS_INDEX_CORRUPTION, NULL, + _("Revisions do not cover the full L2P index page table")); + + /* read actual page tables */ + for (page = 0; page < page_count; ++page) + { + SVN_ERR(packed_stream_get(&value, rev_file->l2p_stream)); + if (value == 0) + return svn_error_create(SVN_ERR_FS_INDEX_CORRUPTION, NULL, + _("Empty L2P index page")); + + result->page_table[page].size = (apr_uint32_t)value; + SVN_ERR(packed_stream_get(&value, rev_file->l2p_stream)); + if (value > result->page_size) + return svn_error_create(SVN_ERR_FS_INDEX_CORRUPTION, NULL, + _("Page exceeds L2P index page size")); + + result->page_table[page].entry_count = (apr_uint32_t)value; + } + + /* correct the page description offsets */ + offset = packed_stream_offset(rev_file->l2p_stream); + for (page = 0; page < page_count; ++page) + { + result->page_table[page].offset = offset; + offset += result->page_table[page].size; + } + + /* return and cache the header */ + *header = result; + SVN_ERR(svn_cache__set(ffd->l2p_header_cache, &key, result, scratch_pool)); + + return SVN_NO_ERROR; +} + +/* Get the page info requested in *BATON from FS and set the output fields + * in *BATON. + * To maximize efficiency, use or return the data stream in *STREAM. + * Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +get_l2p_page_info(l2p_page_info_baton_t *baton, + svn_fs_x__revision_file_t *rev_file, + svn_fs_t *fs, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + l2p_header_t *result; + svn_boolean_t is_cached = FALSE; + void *dummy = NULL; + + /* try to find the info in the cache */ + svn_fs_x__pair_cache_key_t key; + key.revision = base_revision(fs, baton->revision); + key.second = svn_fs_x__is_packed_rev(fs, baton->revision); + SVN_ERR(svn_cache__get_partial((void**)&dummy, &is_cached, + ffd->l2p_header_cache, &key, + l2p_header_access_func, baton, + scratch_pool)); + if (is_cached) + return SVN_NO_ERROR; + + /* read from disk, cache and copy the result */ + SVN_ERR(get_l2p_header_body(&result, rev_file, fs, baton->revision, + scratch_pool, scratch_pool)); + SVN_ERR(l2p_header_copy(baton, result, result->page_table, + result->page_table_index, scratch_pool)); + + return SVN_NO_ERROR; +} + +/* Read the log-to-phys header info of the index covering REVISION from FS + * and return it in *HEADER. REV_FILE provides the pack / rev status. + * Allocate *HEADER in RESULT_POOL, use SCRATCH_POOL for temporary + * allocations. 
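The fix-up loop above ("correct the page description offsets") reconstructs absolute page offsets from the per-page sizes stored in the index, as a running sum starting at the current stream position. A stand-alone model of that conversion with illustrative names:

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Turn COUNT per-page sizes into absolute offsets, starting at START. */
static void
sizes_to_offsets(uint64_t *offsets,
                 const uint64_t *sizes,
                 size_t count,
                 uint64_t start)
{
  size_t i;
  uint64_t offset = start;

  for (i = 0; i < count; ++i)
    {
      offsets[i] = offset;
      offset += sizes[i];
    }
}

int main(void)
{
  const uint64_t sizes[] = { 10, 4, 25 };
  uint64_t offsets[3];

  sizes_to_offsets(offsets, sizes, 3, 100);
  assert(offsets[0] == 100 && offsets[1] == 110 && offsets[2] == 114);
  return 0;
}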
+ */ +static svn_error_t * +get_l2p_header(l2p_header_t **header, + svn_fs_x__revision_file_t *rev_file, + svn_fs_t *fs, + svn_revnum_t revision, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_boolean_t is_cached = FALSE; + + /* first, try cache lookop */ + svn_fs_x__pair_cache_key_t key; + key.revision = rev_file->start_revision; + key.second = rev_file->is_packed; + SVN_ERR(svn_cache__get((void**)header, &is_cached, ffd->l2p_header_cache, + &key, result_pool)); + if (is_cached) + return SVN_NO_ERROR; + + /* read from disk and cache the result */ + SVN_ERR(get_l2p_header_body(header, rev_file, fs, revision, result_pool, + scratch_pool)); + + return SVN_NO_ERROR; +} + +/* From the log-to-phys index file starting at START_REVISION in FS, read + * the mapping page identified by TABLE_ENTRY and return it in *PAGE. + * Use REV_FILE to access on-disk files. + * Use RESULT_POOL for allocations. + */ +static svn_error_t * +get_l2p_page(l2p_page_t **page, + svn_fs_x__revision_file_t *rev_file, + svn_fs_t *fs, + svn_revnum_t start_revision, + l2p_page_table_entry_t *table_entry, + apr_pool_t *result_pool) +{ + apr_uint64_t value, last_value = 0; + apr_uint32_t i; + l2p_page_t *result = apr_pcalloc(result_pool, sizeof(*result)); + apr_uint64_t container_count; + apr_off_t *container_offsets; + + /* open index file and select page */ + SVN_ERR(auto_open_l2p_index(rev_file, fs, start_revision)); + packed_stream_seek(rev_file->l2p_stream, table_entry->offset); + + /* initialize the page content */ + result->entry_count = table_entry->entry_count; + result->offsets = apr_pcalloc(result_pool, result->entry_count + * sizeof(*result->offsets)); + result->sub_items = apr_pcalloc(result_pool, result->entry_count + * sizeof(*result->sub_items)); + + /* container offsets array */ + + SVN_ERR(packed_stream_get(&container_count, rev_file->l2p_stream)); + container_offsets = apr_pcalloc(result_pool, + container_count * sizeof(*result)); + for (i = 0; i < container_count; ++i) + { + SVN_ERR(packed_stream_get(&value, rev_file->l2p_stream)); + last_value += value; + container_offsets[i] = (apr_off_t)last_value - 1; + /* '-1' is represented as '0' in the index file */ + } + + /* read all page entries (offsets in rev file and container sub-items) */ + for (i = 0; i < result->entry_count; ++i) + { + SVN_ERR(packed_stream_get(&value, rev_file->l2p_stream)); + if (value == 0) + { + result->offsets[i] = -1; + result->sub_items[i] = 0; + } + else if (value <= container_count) + { + result->offsets[i] = container_offsets[value - 1]; + SVN_ERR(packed_stream_get(&value, rev_file->l2p_stream)); + result->sub_items[i] = (apr_uint32_t)value; + } + else + { + result->offsets[i] = (apr_off_t)(value - 1 - container_count); + result->sub_items[i] = 0; + } + } + + /* After reading all page entries, the read cursor must have moved by + * TABLE_ENTRY->SIZE bytes. */ + if ( packed_stream_offset(rev_file->l2p_stream) + != table_entry->offset + table_entry->size) + return svn_error_create(SVN_ERR_FS_INDEX_CORRUPTION, NULL, + _("L2P actual page size does not match page table value.")); + + *page = result; + + return SVN_NO_ERROR; +} + +/* Request data structure for l2p_page_access_func. + */ +typedef struct l2p_page_baton_t +{ + /* in data */ + /* revision. Used for error messages only */ + svn_revnum_t revision; + + /* item index to look up. 
Used for error messages only */ + apr_uint64_t item_index; + + /* offset within the cached page */ + apr_uint32_t page_offset; + + /* out data */ + /* absolute item or container offset in rev / pack file */ + apr_off_t offset; + + /* 0 -> container / item itself; sub-item in container otherwise */ + apr_uint32_t sub_item; + +} l2p_page_baton_t; + +/* Return the rev / pack file offset of the item at BATON->PAGE_OFFSET in + * OFFSETS of PAGE and write it to *OFFSET. + * Allocate temporaries in SCRATCH_POOL. + */ +static svn_error_t * +l2p_page_get_offset(l2p_page_baton_t *baton, + const l2p_page_t *page, + const apr_off_t *offsets, + const apr_uint32_t *sub_items, + apr_pool_t *scratch_pool) +{ + /* overflow check */ + if (page->entry_count <= baton->page_offset) + return svn_error_createf(SVN_ERR_FS_INDEX_OVERFLOW , NULL, + _("Item index %s too large in" + " revision %ld"), + apr_psprintf(scratch_pool, "%" APR_UINT64_T_FMT, + baton->item_index), + baton->revision); + + /* return the result */ + baton->offset = offsets[baton->page_offset]; + baton->sub_item = sub_items[baton->page_offset]; + + return SVN_NO_ERROR; +} + +/* Implement svn_cache__partial_getter_func_t: copy the data requested in + * l2p_page_baton_t *BATON from l2p_page_t *DATA into apr_off_t *OUT. + */ +static svn_error_t * +l2p_page_access_func(void **out, + const void *data, + apr_size_t data_len, + void *baton, + apr_pool_t *result_pool) +{ + /* resolve all in-cache pointers */ + const l2p_page_t *page = data; + const apr_off_t *offsets + = svn_temp_deserializer__ptr(page, (const void *const *)&page->offsets); + const apr_uint32_t *sub_items + = svn_temp_deserializer__ptr(page, (const void *const *)&page->sub_items); + + /* return the requested data */ + return l2p_page_get_offset(baton, page, offsets, sub_items, result_pool); +} + +/* Data request structure used by l2p_page_table_access_func. + */ +typedef struct l2p_page_table_baton_t +{ + /* revision for which to read the page table */ + svn_revnum_t revision; + + /* page table entries (of type l2p_page_table_entry_t). + * Must be created by caller and will be filled by callee. */ + apr_array_header_t *pages; +} l2p_page_table_baton_t; + +/* Implement svn_cache__partial_getter_func_t: copy the data requested in + * l2p_page_baton_t *BATON from l2p_page_t *DATA into BATON->PAGES and *OUT. 
+ */ +static svn_error_t * +l2p_page_table_access_func(void **out, + const void *data, + apr_size_t data_len, + void *baton, + apr_pool_t *result_pool) +{ + /* resolve in-cache pointers */ + l2p_page_table_baton_t *table_baton = baton; + const l2p_header_t *header = (const l2p_header_t *)data; + const l2p_page_table_entry_t *page_table + = svn_temp_deserializer__ptr(header, + (const void *const *)&header->page_table); + const apr_size_t *page_table_index + = svn_temp_deserializer__ptr(header, + (const void *const *)&header->page_table_index); + + /* copy the revision's page table into BATON */ + apr_size_t rel_revision = table_baton->revision - header->first_revision; + if (rel_revision < header->revision_count) + { + const l2p_page_table_entry_t *entry + = page_table + page_table_index[rel_revision]; + const l2p_page_table_entry_t *last_entry + = page_table + page_table_index[rel_revision + 1]; + + for (; entry < last_entry; ++entry) + APR_ARRAY_PUSH(table_baton->pages, l2p_page_table_entry_t) + = *entry; + } + + /* set output as a courtesy to the caller */ + *out = table_baton->pages; + + return SVN_NO_ERROR; +} + +/* Read the l2p index page table for REVISION in FS from cache and return + * it in PAGES. The later must be provided by the caller (and can be + * re-used); existing entries will be removed before writing the result. + * If the data cannot be found in the cache, the result will be empty + * (it never can be empty for a valid REVISION if the data is cached). + * Use the info from REV_FILE to determine pack / rev file properties. + * Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +get_l2p_page_table(apr_array_header_t *pages, + svn_fs_t *fs, + svn_revnum_t revision, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_boolean_t is_cached = FALSE; + l2p_page_table_baton_t baton; + + svn_fs_x__pair_cache_key_t key; + key.revision = base_revision(fs, revision); + key.second = svn_fs_x__is_packed_rev(fs, revision); + + apr_array_clear(pages); + baton.revision = revision; + baton.pages = pages; + SVN_ERR(svn_cache__get_partial((void**)&pages, &is_cached, + ffd->l2p_header_cache, &key, + l2p_page_table_access_func, &baton, + scratch_pool)); + + return SVN_NO_ERROR; +} + +/* Utility function. Read the l2p index pages for REVISION in FS from + * STREAM and put them into the cache. Skip page number EXLCUDED_PAGE_NO + * (use -1 for 'skip none') and pages outside the MIN_OFFSET, MAX_OFFSET + * range in the l2p index file. The index is being identified by + * FIRST_REVISION. PAGES is a scratch container provided by the caller. + * SCRATCH_POOL is used for temporary allocations. + * + * This function may be a no-op if the header cache lookup fails / misses. + */ +static svn_error_t * +prefetch_l2p_pages(svn_boolean_t *end, + svn_fs_t *fs, + svn_fs_x__revision_file_t *rev_file, + svn_revnum_t first_revision, + svn_revnum_t revision, + apr_array_header_t *pages, + int exlcuded_page_no, + apr_off_t min_offset, + apr_off_t max_offset, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + int i; + apr_pool_t *iterpool; + svn_fs_x__page_cache_key_t key = { 0 }; + + /* Parameter check. 
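+   * Negative MIN_OFFSET values get clamped to 0; a non-positive MAX_OFFSET
+   * means there is nothing to prefetch at all.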
*/ + if (min_offset < 0) + min_offset = 0; + + if (max_offset <= 0) + { + /* Nothing to do */ + *end = TRUE; + return SVN_NO_ERROR; + } + + /* get the page table for REVISION from cache */ + *end = FALSE; + SVN_ERR(get_l2p_page_table(pages, fs, revision, scratch_pool)); + if (pages->nelts == 0) + { + /* not found -> we can't continue without hitting the disk again */ + *end = TRUE; + return SVN_NO_ERROR; + } + + /* prefetch pages individually until all are done or we found one in + * the cache */ + iterpool = svn_pool_create(scratch_pool); + assert(revision <= APR_UINT32_MAX); + key.revision = (apr_uint32_t)revision; + key.is_packed = svn_fs_x__is_packed_rev(fs, revision); + + for (i = 0; i < pages->nelts && !*end; ++i) + { + svn_boolean_t is_cached; + + l2p_page_table_entry_t *entry + = &APR_ARRAY_IDX(pages, i, l2p_page_table_entry_t); + svn_pool_clear(iterpool); + + if (i == exlcuded_page_no) + continue; + + /* skip pages outside the specified index file range */ + if ( entry->offset < (apr_uint64_t)min_offset + || entry->offset + entry->size > (apr_uint64_t)max_offset) + { + *end = TRUE; + continue; + } + + /* page already in cache? */ + key.page = i; + SVN_ERR(svn_cache__has_key(&is_cached, ffd->l2p_page_cache, + &key, iterpool)); + if (!is_cached) + { + /* no in cache -> read from stream (data already buffered in APR) + * and cache the result */ + l2p_page_t *page = NULL; + SVN_ERR(get_l2p_page(&page, rev_file, fs, first_revision, + entry, iterpool)); + + SVN_ERR(svn_cache__set(ffd->l2p_page_cache, &key, page, + iterpool)); + } + } + + svn_pool_destroy(iterpool); + + return SVN_NO_ERROR; +} + +/* Using the log-to-phys indexes in FS, find the absolute offset in the + * rev file for (REVISION, ITEM_INDEX) and return it in *OFFSET. + * Use SCRATCH_POOL for temporary allocations. 
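+ * If the item is stored in a container, return its sub-item index within
+ * that container in *SUB_ITEM; otherwise, *SUB_ITEM will be 0.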
+ */ +static svn_error_t * +l2p_index_lookup(apr_off_t *offset, + apr_uint32_t *sub_item, + svn_fs_t *fs, + svn_fs_x__revision_file_t *rev_file, + svn_revnum_t revision, + apr_uint64_t item_index, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + l2p_page_info_baton_t info_baton; + l2p_page_baton_t page_baton; + l2p_page_t *page = NULL; + svn_fs_x__page_cache_key_t key = { 0 }; + svn_boolean_t is_cached = FALSE; + void *dummy = NULL; + + /* read index master data structure and extract the info required to + * access the l2p index page for (REVISION,ITEM_INDEX)*/ + info_baton.revision = revision; + info_baton.item_index = item_index; + SVN_ERR(get_l2p_page_info(&info_baton, rev_file, fs, scratch_pool)); + + /* try to find the page in the cache and get the OFFSET from it */ + page_baton.revision = revision; + page_baton.item_index = item_index; + page_baton.page_offset = info_baton.page_offset; + + assert(revision <= APR_UINT32_MAX); + key.revision = (apr_uint32_t)revision; + key.is_packed = svn_fs_x__is_packed_rev(fs, revision); + key.page = info_baton.page_no; + + SVN_ERR(svn_cache__get_partial(&dummy, &is_cached, + ffd->l2p_page_cache, &key, + l2p_page_access_func, &page_baton, + scratch_pool)); + + if (!is_cached) + { + /* we need to read the info from disk (might already be in the + * APR file buffer, though) */ + apr_array_header_t *pages; + svn_revnum_t prefetch_revision; + svn_revnum_t last_revision + = info_baton.first_revision + + svn_fs_x__pack_size(fs, info_baton.first_revision); + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + svn_boolean_t end; + apr_off_t max_offset + = APR_ALIGN(info_baton.entry.offset + info_baton.entry.size, + ffd->block_size); + apr_off_t min_offset = max_offset - ffd->block_size; + + /* read the relevant page */ + SVN_ERR(get_l2p_page(&page, rev_file, fs, info_baton.first_revision, + &info_baton.entry, scratch_pool)); + + /* cache the page and extract the result we need */ + SVN_ERR(svn_cache__set(ffd->l2p_page_cache, &key, page, scratch_pool)); + SVN_ERR(l2p_page_get_offset(&page_baton, page, page->offsets, + page->sub_items, scratch_pool)); + + /* prefetch pages from following and preceding revisions */ + pages = apr_array_make(scratch_pool, 16, + sizeof(l2p_page_table_entry_t)); + end = FALSE; + for (prefetch_revision = revision; + prefetch_revision < last_revision && !end; + ++prefetch_revision) + { + int excluded_page_no = prefetch_revision == revision + ? info_baton.page_no + : -1; + svn_pool_clear(iterpool); + + SVN_ERR(prefetch_l2p_pages(&end, fs, rev_file, + info_baton.first_revision, + prefetch_revision, pages, + excluded_page_no, min_offset, + max_offset, iterpool)); + } + + end = FALSE; + for (prefetch_revision = revision-1; + prefetch_revision >= info_baton.first_revision && !end; + --prefetch_revision) + { + svn_pool_clear(iterpool); + + SVN_ERR(prefetch_l2p_pages(&end, fs, rev_file, + info_baton.first_revision, + prefetch_revision, pages, -1, + min_offset, max_offset, iterpool)); + } + + svn_pool_destroy(iterpool); + } + + *offset = page_baton.offset; + *sub_item = page_baton.sub_item; + + return SVN_NO_ERROR; +} + +/* Using the log-to-phys proto index in transaction TXN_ID in FS, find the + * absolute offset in the proto rev file for the given ITEM_INDEX and return + * it in *OFFSET. Use SCRATCH_POOL for temporary allocations. 
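+ * If ITEM_INDEX cannot be found, *OFFSET will be -1 and *SUB_ITEM remains
+ * unchanged.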
+ */ +static svn_error_t * +l2p_proto_index_lookup(apr_off_t *offset, + apr_uint32_t *sub_item, + svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_uint64_t item_index, + apr_pool_t *scratch_pool) +{ + svn_boolean_t eof = FALSE; + apr_file_t *file = NULL; + SVN_ERR(svn_io_file_open(&file, + svn_fs_x__path_l2p_proto_index(fs, txn_id, + scratch_pool), + APR_READ | APR_BUFFERED, APR_OS_DEFAULT, + scratch_pool)); + + /* process all entries until we fail due to EOF */ + *offset = -1; + while (!eof) + { + l2p_proto_entry_t entry; + + /* (attempt to) read the next entry from the source */ + SVN_ERR(read_l2p_entry_from_proto_index(file, &entry, &eof, + scratch_pool)); + + /* handle new revision */ + if (!eof && entry.item_index == item_index) + { + *offset = (apr_off_t)entry.offset - 1; + *sub_item = entry.sub_item; + break; + } + } + + SVN_ERR(svn_io_file_close(file, scratch_pool)); + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__l2p_get_max_ids(apr_array_header_t **max_ids, + svn_fs_t *fs, + svn_revnum_t start_rev, + apr_size_t count, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + l2p_header_t *header = NULL; + svn_revnum_t revision; + svn_revnum_t last_rev = (svn_revnum_t)(start_rev + count); + svn_fs_x__revision_file_t *rev_file; + apr_pool_t *header_pool = svn_pool_create(scratch_pool); + + /* read index master data structure for the index covering START_REV */ + SVN_ERR(svn_fs_x__open_pack_or_rev_file(&rev_file, fs, start_rev, + header_pool, header_pool)); + SVN_ERR(get_l2p_header(&header, rev_file, fs, start_rev, header_pool, + header_pool)); + SVN_ERR(svn_fs_x__close_revision_file(rev_file)); + + /* Determine the length of the item index list for each rev. + * Read new index headers as required. */ + *max_ids = apr_array_make(result_pool, (int)count, sizeof(apr_uint64_t)); + for (revision = start_rev; revision < last_rev; ++revision) + { + apr_uint64_t full_page_count; + apr_uint64_t item_count; + apr_size_t first_page_index, last_page_index; + + if (revision >= header->first_revision + header->revision_count) + { + /* need to read the next index. Clear up memory used for the + * previous one. Note that intermittent pack runs do not change + * the number of items in a revision, i.e. there is no consistency + * issue here. */ + svn_pool_clear(header_pool); + SVN_ERR(svn_fs_x__open_pack_or_rev_file(&rev_file, fs, revision, + header_pool, header_pool)); + SVN_ERR(get_l2p_header(&header, rev_file, fs, revision, + header_pool, header_pool)); + SVN_ERR(svn_fs_x__close_revision_file(rev_file)); + } + + /* in a revision with N index pages, the first N-1 index pages are + * "full", i.e. 
contain HEADER->PAGE_SIZE entries */ + first_page_index + = header->page_table_index[revision - header->first_revision]; + last_page_index + = header->page_table_index[revision - header->first_revision + 1]; + full_page_count = last_page_index - first_page_index - 1; + item_count = full_page_count * header->page_size + + header->page_table[last_page_index - 1].entry_count; + + APR_ARRAY_PUSH(*max_ids, apr_uint64_t) = item_count; + } + + svn_pool_destroy(header_pool); + return SVN_NO_ERROR; +} + +/* + * phys-to-log index + */ +svn_fs_x__p2l_entry_t * +svn_fs_x__p2l_entry_dup(const svn_fs_x__p2l_entry_t *entry, + apr_pool_t *result_pool) +{ + svn_fs_x__p2l_entry_t *new_entry = apr_pmemdup(result_pool, entry, + sizeof(*new_entry)); + if (new_entry->item_count) + new_entry->items = apr_pmemdup(result_pool, + entry->items, + entry->item_count * sizeof(*entry->items)); + + return new_entry; +} + +/* + * phys-to-log index + */ +svn_error_t * +svn_fs_x__p2l_proto_index_open(apr_file_t **proto_index, + const char *file_name, + apr_pool_t *result_pool) +{ + SVN_ERR(svn_io_file_open(proto_index, file_name, APR_READ | APR_WRITE + | APR_CREATE | APR_APPEND | APR_BUFFERED, + APR_OS_DEFAULT, result_pool)); + + return SVN_NO_ERROR; +} + + +svn_error_t * +svn_fs_x__p2l_proto_index_add_entry(apr_file_t *proto_index, + const svn_fs_x__p2l_entry_t *entry, + apr_pool_t *scratch_pool) +{ + apr_uint64_t revision; + apr_int32_t i; + + /* Make sure all signed elements of ENTRY have non-negative values. + * + * For file offsets and sizes, this is a given as we use them to describe + * absolute positions and sizes. For revisions, SVN_INVALID_REVNUM is + * valid, hence we have to shift it by 1. + */ + SVN_ERR_ASSERT(entry->offset >= 0); + SVN_ERR_ASSERT(entry->size >= 0); + + /* Now, all values will nicely convert to uint64. */ + /* Make sure to keep P2L_PROTO_INDEX_ENTRY_SIZE consistent with this: */ + + SVN_ERR(write_uint64_to_proto_index(proto_index, entry->offset, + scratch_pool)); + SVN_ERR(write_uint64_to_proto_index(proto_index, entry->size, + scratch_pool)); + SVN_ERR(write_uint64_to_proto_index(proto_index, entry->type, + scratch_pool)); + SVN_ERR(write_uint64_to_proto_index(proto_index, entry->fnv1_checksum, + scratch_pool)); + SVN_ERR(write_uint64_to_proto_index(proto_index, entry->item_count, + scratch_pool)); + + /* Add sub-items. */ + for (i = 0; i < entry->item_count; ++i) + { + const svn_fs_x__id_t *sub_item = &entry->items[i]; + + /* Make sure all signed elements of ENTRY have non-negative values. + * + * For file offsets and sizes, this is a given as we use them to + * describe absolute positions and sizes. For revisions, + * SVN_INVALID_REVNUM is valid, hence we have to shift it by 1. + */ + SVN_ERR_ASSERT( sub_item->change_set >= 0 + || sub_item->change_set == SVN_INVALID_REVNUM); + + /* Write sub- record. */ + revision = sub_item->change_set == SVN_INVALID_REVNUM + ? 0 + : ((apr_uint64_t)sub_item->change_set + 1); + + SVN_ERR(write_uint64_to_proto_index(proto_index, revision, + scratch_pool)); + SVN_ERR(write_uint64_to_proto_index(proto_index, sub_item->number, + scratch_pool)); + } + + /* Add trailer: rev / pack file offset of the next item */ + SVN_ERR(write_uint64_to_proto_index(proto_index, + entry->offset + entry->size, + scratch_pool)); + + return SVN_NO_ERROR; +} + +/* Read *ENTRY from log-to-phys PROTO_INDEX file and indicate end-of-file + * in *EOF, or error out in that case if EOF is NULL. *ENTRY is in an + * undefined state if an end-of-file occurred. 
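+ * Only the fixed-size part of the phys-to-log entry is read here; any
+ * sub-items are read separately via read_p2l_sub_items_from_proto_index.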
+ * Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +read_p2l_entry_from_proto_index(apr_file_t *proto_index, + svn_fs_x__p2l_entry_t *entry, + svn_boolean_t *eof, + apr_pool_t *scratch_pool) +{ + SVN_ERR(read_off_t_from_proto_index(proto_index, &entry->offset, + eof, scratch_pool)); + SVN_ERR(read_off_t_from_proto_index(proto_index, &entry->size, + eof, scratch_pool)); + SVN_ERR(read_uint32_from_proto_index(proto_index, &entry->type, + eof, scratch_pool)); + SVN_ERR(read_uint32_from_proto_index(proto_index, &entry->fnv1_checksum, + eof, scratch_pool)); + SVN_ERR(read_uint32_from_proto_index(proto_index, &entry->item_count, + eof, scratch_pool)); + + return SVN_NO_ERROR; +} + +static svn_error_t * +read_p2l_sub_items_from_proto_index(apr_file_t *proto_index, + svn_fs_x__p2l_entry_t *entry, + svn_boolean_t *eof, + apr_pool_t *scratch_pool) +{ + apr_int32_t i; + for (i = 0; i < entry->item_count; ++i) + { + apr_uint64_t revision; + svn_fs_x__id_t *sub_item = &entry->items[i]; + + SVN_ERR(read_uint64_from_proto_index(proto_index, &revision, + eof, scratch_pool)); + SVN_ERR(read_uint64_from_proto_index(proto_index, &sub_item->number, + eof, scratch_pool)); + + /* Do the inverse REVSION number conversion (see + * svn_fs_x__p2l_proto_index_add_entry), if we actually read a + * complete record. + */ + if (!eof || !*eof) + { + /* Be careful with the arithmetics here (overflows and wrap-around): + */ + if (revision > 0 && revision - 1 > LONG_MAX) + return svn_error_createf(SVN_ERR_FS_INDEX_OVERFLOW, NULL, + _("Revision 0x%s too large, max = 0x%s"), + apr_psprintf(scratch_pool, + "%" APR_UINT64_T_FMT, + revision), + apr_psprintf(scratch_pool, + "%" APR_UINT64_T_FMT, + (apr_uint64_t)LONG_MAX)); + + /* Shortening conversion from unsigned to signed int is well- + * defined and not lossy in C because the value can be represented + * in the target type. Also, cast to 'long' instead of + * 'svn_revnum_t' here to provoke a compiler warning if those + * types should differ and we would need to change the overflow + * checking logic. + */ + sub_item->change_set = revision == 0 + ? SVN_INVALID_REVNUM + : (long)(revision - 1); + } + + } + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__p2l_proto_index_next_offset(apr_off_t *next_offset, + apr_file_t *proto_index, + apr_pool_t *scratch_pool) +{ + apr_off_t offset = 0; + + /* Empty index file? */ + SVN_ERR(svn_io_file_seek(proto_index, APR_END, &offset, scratch_pool)); + if (offset == 0) + { + *next_offset = 0; + } + else + { + /* The last 64 bits contain the value we are looking for. 
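+       * svn_fs_x__p2l_proto_index_add_entry appends the end offset of each
+       * entry as a trailer, so the final uint64 in the file is the offset
+       * right behind the last entry.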
*/ + offset -= sizeof(apr_uint64_t); + SVN_ERR(svn_io_file_seek(proto_index, APR_SET, &offset, scratch_pool)); + SVN_ERR(read_off_t_from_proto_index(proto_index, next_offset, NULL, + scratch_pool)); + } + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__p2l_index_append(svn_checksum_t **checksum, + svn_fs_t *fs, + apr_file_t *index_file, + const char *proto_file_name, + svn_revnum_t revision, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + apr_uint64_t page_size = ffd->p2l_page_size; + apr_file_t *proto_index = NULL; + svn_stream_t *stream; + int i; + apr_uint32_t sub_item; + svn_boolean_t eof = FALSE; + unsigned char encoded[ENCODED_INT_LENGTH]; + + apr_uint64_t last_entry_end = 0; + apr_uint64_t last_page_end = 0; + apr_size_t last_buffer_size = 0; /* byte offset in the spill buffer at + the begin of the current revision */ + apr_uint64_t file_size = 0; + + /* temporary data structures that collect the data which will be moved + to the target file in a second step */ + apr_pool_t *local_pool = svn_pool_create(scratch_pool); + apr_array_header_t *table_sizes + = apr_array_make(local_pool, 16, sizeof(apr_uint64_t)); + + /* 64k blocks, spill after 16MB */ + svn_spillbuf_t *buffer + = svn_spillbuf__create(0x10000, 0x1000000, local_pool); + + /* for loop temps ... */ + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + + /* start at the beginning of the source file */ + SVN_ERR(svn_io_file_open(&proto_index, proto_file_name, + APR_READ | APR_CREATE | APR_BUFFERED, + APR_OS_DEFAULT, local_pool)); + + /* process all entries until we fail due to EOF */ + while (!eof) + { + svn_fs_x__p2l_entry_t entry; + apr_uint64_t entry_end; + svn_boolean_t new_page = svn_spillbuf__get_size(buffer) == 0; + svn_revnum_t last_revision = revision; + apr_uint64_t last_number = 0; + + svn_pool_clear(iterpool); + + /* (attempt to) read the next entry from the source */ + SVN_ERR(read_p2l_entry_from_proto_index(proto_index, &entry, + &eof, iterpool)); + + if (entry.item_count && !eof) + { + entry.items = apr_palloc(iterpool, + entry.item_count * sizeof(*entry.items)); + SVN_ERR(read_p2l_sub_items_from_proto_index(proto_index, &entry, + &eof, iterpool)); + } + + /* Read entry trailer. However, we won't need its content. */ + if (!eof) + { + apr_uint64_t trailer; + SVN_ERR(read_uint64_from_proto_index(proto_index, &trailer, &eof, + scratch_pool)); + } + + /* "unused" (and usually non-existent) section to cover the offsets + at the end the of the last page. 
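+       * If the last entry already ends exactly on a page boundary, this
+       * section will simply have size 0.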
*/ + if (eof) + { + file_size = last_entry_end; + + entry.offset = last_entry_end; + entry.size = APR_ALIGN(entry.offset, page_size) - entry.offset; + entry.type = SVN_FS_X__ITEM_TYPE_UNUSED; + entry.fnv1_checksum = 0; + entry.item_count = 0; + entry.items = NULL; + } + + for (sub_item = 0; sub_item < entry.item_count; ++sub_item) + if (entry.items[sub_item].change_set == SVN_FS_X__INVALID_CHANGE_SET) + entry.items[sub_item].change_set + = svn_fs_x__change_set_by_rev(revision); + + /* end pages if entry is extending beyond their boundaries */ + entry_end = entry.offset + entry.size; + while (entry_end - last_page_end > page_size) + { + apr_uint64_t buffer_size = svn_spillbuf__get_size(buffer); + APR_ARRAY_PUSH(table_sizes, apr_uint64_t) + = buffer_size - last_buffer_size; + + last_buffer_size = buffer_size; + last_page_end += page_size; + new_page = TRUE; + } + + /* this entry starts a new table -> store its offset + (all following entries in the same table will store sizes only) */ + if (new_page) + { + SVN_ERR(svn_spillbuf__write(buffer, (const char *)encoded, + encode_uint(encoded, entry.offset), + iterpool)); + last_revision = revision; + } + + /* write simple item / container entry */ + SVN_ERR(svn_spillbuf__write(buffer, (const char *)encoded, + encode_uint(encoded, entry.size), + iterpool)); + SVN_ERR(svn_spillbuf__write(buffer, (const char *)encoded, + encode_uint(encoded, entry.type + entry.item_count * 16), + iterpool)); + SVN_ERR(svn_spillbuf__write(buffer, (const char *)encoded, + encode_uint(encoded, entry.fnv1_checksum), + iterpool)); + + /* container contents (only one for non-container items) */ + for (sub_item = 0; sub_item < entry.item_count; ++sub_item) + { + svn_revnum_t item_rev + = svn_fs_x__get_revnum(entry.items[sub_item].change_set); + apr_int64_t diff = item_rev - last_revision; + SVN_ERR(svn_spillbuf__write(buffer, (const char *)encoded, + encode_int(encoded, diff), + iterpool)); + last_revision = item_rev; + } + + for (sub_item = 0; sub_item < entry.item_count; ++sub_item) + { + apr_int64_t diff = entry.items[sub_item].number - last_number; + SVN_ERR(svn_spillbuf__write(buffer, (const char *)encoded, + encode_int(encoded, diff), + iterpool)); + last_number = entry.items[sub_item].number; + } + + last_entry_end = entry_end; + } + + /* close the source file */ + SVN_ERR(svn_io_file_close(proto_index, local_pool)); + + /* store length of last table */ + APR_ARRAY_PUSH(table_sizes, apr_uint64_t) + = svn_spillbuf__get_size(buffer) - last_buffer_size; + + /* Open target stream. 
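+   * Everything is written through a checksumming wrapper; the MD5 digest
+   * over the complete index contents is returned in *CHECKSUM.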
*/ + stream = svn_stream_checksummed2(svn_stream_from_aprfile2(index_file, TRUE, + local_pool), + NULL, checksum, svn_checksum_md5, FALSE, + result_pool); + + /* write the start revision, file size and page size */ + SVN_ERR(svn_stream_puts(stream, P2L_STREAM_PREFIX)); + SVN_ERR(stream_write_encoded(stream, revision)); + SVN_ERR(stream_write_encoded(stream, file_size)); + SVN_ERR(stream_write_encoded(stream, page_size)); + + /* write the page table (actually, the sizes of each page description) */ + SVN_ERR(stream_write_encoded(stream, table_sizes->nelts)); + for (i = 0; i < table_sizes->nelts; ++i) + { + apr_uint64_t value = APR_ARRAY_IDX(table_sizes, i, apr_uint64_t); + SVN_ERR(stream_write_encoded(stream, value)); + } + + /* append page contents and implicitly close STREAM */ + SVN_ERR(svn_stream_copy3(svn_stream__from_spillbuf(buffer, local_pool), + stream, NULL, NULL, local_pool)); + + svn_pool_destroy(iterpool); + svn_pool_destroy(local_pool); + + return SVN_NO_ERROR; +} + +/* If REV_FILE->P2L_STREAM is NULL, create a new stream for the phys-to-log + * index for REVISION in FS using the rev / pack file provided by REV_FILE. + */ +static svn_error_t * +auto_open_p2l_index(svn_fs_x__revision_file_t *rev_file, + svn_fs_t *fs, + svn_revnum_t revision) +{ + if (rev_file->p2l_stream == NULL) + { + svn_fs_x__data_t *ffd = fs->fsap_data; + + SVN_ERR(svn_fs_x__auto_read_footer(rev_file)); + SVN_ERR(packed_stream_open(&rev_file->p2l_stream, + rev_file->file, + rev_file->p2l_offset, + rev_file->footer_offset, + P2L_STREAM_PREFIX, + (apr_size_t)ffd->block_size, + rev_file->pool, + rev_file->pool)); + } + + return SVN_NO_ERROR; +} + +/* Data structure that describes which p2l page info shall be extracted + * from the cache and contains the fields that receive the result. + */ +typedef struct p2l_page_info_baton_t +{ + /* input variables */ + /* revision identifying the index file */ + svn_revnum_t revision; + + /* offset within the page in rev / pack file */ + apr_off_t offset; + + /* output variables */ + /* page containing OFFSET */ + apr_size_t page_no; + + /* first revision in this p2l index */ + svn_revnum_t first_revision; + + /* offset within the p2l index file describing this page */ + apr_off_t start_offset; + + /* offset within the p2l index file describing the following page */ + apr_off_t next_offset; + + /* PAGE_NO * PAGE_SIZE (if <= OFFSET) */ + apr_off_t page_start; + + /* total number of pages indexed */ + apr_size_t page_count; + + /* size of each page in pack / rev file */ + apr_uint64_t page_size; +} p2l_page_info_baton_t; + +/* From HEADER and the list of all OFFSETS, fill BATON with the page info + * requested by BATON->OFFSET. + */ +static void +p2l_page_info_copy(p2l_page_info_baton_t *baton, + const p2l_header_t *header, + const apr_off_t *offsets) +{ + /* if the requested offset is out of bounds, return info for + * a zero-sized empty page right behind the last page. + */ + if (baton->offset / header->page_size < header->page_count) + { + /* This cast is safe because the value is < header->page_count. */ + baton->page_no = (apr_size_t)(baton->offset / header->page_size); + baton->start_offset = offsets[baton->page_no]; + baton->next_offset = offsets[baton->page_no + 1]; + baton->page_size = header->page_size; + } + else + { + /* Beyond the last page. 
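+       * OFFSETS contains PAGE_COUNT + 1 elements, so indexing it with
+       * PAGE_NO == PAGE_COUNT is still valid here.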
*/ + baton->page_no = header->page_count; + baton->start_offset = offsets[baton->page_no]; + baton->next_offset = offsets[baton->page_no]; + baton->page_size = 0; + } + + baton->first_revision = header->first_revision; + baton->page_start = (apr_off_t)(header->page_size * baton->page_no); + baton->page_count = header->page_count; +} + +/* Implement svn_cache__partial_getter_func_t: extract the p2l page info + * requested by BATON and return it in BATON. + */ +static svn_error_t * +p2l_page_info_func(void **out, + const void *data, + apr_size_t data_len, + void *baton, + apr_pool_t *result_pool) +{ + /* all the pointers to cached data we need */ + const p2l_header_t *header = data; + const apr_off_t *offsets + = svn_temp_deserializer__ptr(header, + (const void *const *)&header->offsets); + + /* copy data from cache to BATON */ + p2l_page_info_copy(baton, header, offsets); + return SVN_NO_ERROR; +} + +/* Read the header data structure of the phys-to-log index for REVISION in + * FS and return it in *HEADER, allocated in RESULT_POOL. Use REV_FILE to + * access on-disk data. Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +get_p2l_header(p2l_header_t **header, + svn_fs_x__revision_file_t *rev_file, + svn_fs_t *fs, + svn_revnum_t revision, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + apr_uint64_t value; + apr_size_t i; + apr_off_t offset; + p2l_header_t *result; + svn_boolean_t is_cached = FALSE; + + /* look for the header data in our cache */ + svn_fs_x__pair_cache_key_t key; + key.revision = rev_file->start_revision; + key.second = rev_file->is_packed; + + SVN_ERR(svn_cache__get((void**)header, &is_cached, ffd->p2l_header_cache, + &key, result_pool)); + if (is_cached) + return SVN_NO_ERROR; + + /* not found -> must read it from disk. + * Open index file or position read pointer to the begin of the file */ + SVN_ERR(auto_open_p2l_index(rev_file, fs, key.revision)); + packed_stream_seek(rev_file->p2l_stream, 0); + + /* allocate result data structure */ + result = apr_pcalloc(result_pool, sizeof(*result)); + + /* Read table sizes, check them for plausibility and allocate page array. 
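+   * The header is a sequence of packed uints: the first revision covered
+   * by this index, the rev / pack file size, the page size, the page count
+   * and, finally, the size of each page description.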
*/ + SVN_ERR(packed_stream_get(&value, rev_file->p2l_stream)); + result->first_revision = (svn_revnum_t)value; + if (result->first_revision != rev_file->start_revision) + return svn_error_create(SVN_ERR_FS_INDEX_CORRUPTION, NULL, + _("Index rev / pack file revision numbers do not match")); + + SVN_ERR(packed_stream_get(&value, rev_file->p2l_stream)); + result->file_size = value; + if (result->file_size != (apr_uint64_t)rev_file->l2p_offset) + return svn_error_create(SVN_ERR_FS_INDEX_CORRUPTION, NULL, + _("Index offset and rev / pack file size do not match")); + + SVN_ERR(packed_stream_get(&value, rev_file->p2l_stream)); + result->page_size = value; + if (!result->page_size || (result->page_size & (result->page_size - 1))) + return svn_error_create(SVN_ERR_FS_INDEX_CORRUPTION, NULL, + _("P2L index page size is not a power of two")); + + SVN_ERR(packed_stream_get(&value, rev_file->p2l_stream)); + result->page_count = (apr_size_t)value; + if (result->page_count != (result->file_size - 1) / result->page_size + 1) + return svn_error_create(SVN_ERR_FS_INDEX_CORRUPTION, NULL, + _("P2L page count does not match rev / pack file size")); + + result->offsets + = apr_pcalloc(result_pool, (result->page_count + 1) * sizeof(*result->offsets)); + + /* read page sizes and derive page description offsets from them */ + result->offsets[0] = 0; + for (i = 0; i < result->page_count; ++i) + { + SVN_ERR(packed_stream_get(&value, rev_file->p2l_stream)); + result->offsets[i+1] = result->offsets[i] + (apr_off_t)value; + } + + /* correct the offset values */ + offset = packed_stream_offset(rev_file->p2l_stream); + for (i = 0; i <= result->page_count; ++i) + result->offsets[i] += offset; + + /* cache the header data */ + SVN_ERR(svn_cache__set(ffd->p2l_header_cache, &key, result, scratch_pool)); + + /* return the result */ + *header = result; + + return SVN_NO_ERROR; +} + +/* Read the header data structure of the phys-to-log index for revision + * BATON->REVISION in FS. Return in *BATON all info relevant to read the + * index page for the rev / pack file offset BATON->OFFSET. Use REV_FILE + * to access on-disk data. Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +get_p2l_page_info(p2l_page_info_baton_t *baton, + svn_fs_x__revision_file_t *rev_file, + svn_fs_t *fs, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + p2l_header_t *header; + svn_boolean_t is_cached = FALSE; + void *dummy = NULL; + + /* look for the header data in our cache */ + svn_fs_x__pair_cache_key_t key; + key.revision = base_revision(fs, baton->revision); + key.second = svn_fs_x__is_packed_rev(fs, baton->revision); + + SVN_ERR(svn_cache__get_partial(&dummy, &is_cached, ffd->p2l_header_cache, + &key, p2l_page_info_func, baton, + scratch_pool)); + if (is_cached) + return SVN_NO_ERROR; + + SVN_ERR(get_p2l_header(&header, rev_file, fs, baton->revision, + scratch_pool, scratch_pool)); + + /* copy the requested info into *BATON */ + p2l_page_info_copy(baton, header, header->offsets); + + return SVN_NO_ERROR; +} + +/* Read a mapping entry from the phys-to-log index STREAM and append it to + * RESULT. *ITEM_INDEX contains the phys offset for the entry and will + * be moved forward by the size of entry. 
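+ * REVISION is the base revision against which the sub-items' change set
+ * numbers are delta-encoded in the index.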
+ */ +static svn_error_t * +read_entry(svn_fs_x__packed_number_stream_t *stream, + apr_off_t *item_offset, + svn_revnum_t revision, + apr_array_header_t *result) +{ + apr_uint64_t value; + apr_uint64_t number = 0; + apr_uint32_t sub_item; + + svn_fs_x__p2l_entry_t entry; + + entry.offset = *item_offset; + SVN_ERR(packed_stream_get(&value, stream)); + entry.size = (apr_off_t)value; + SVN_ERR(packed_stream_get(&value, stream)); + entry.type = (int)value % 16; + entry.item_count = (apr_uint32_t)(value / 16); + + /* Verify item type. */ + if (entry.type > SVN_FS_X__ITEM_TYPE_REPS_CONT) + return svn_error_create(SVN_ERR_FS_INDEX_CORRUPTION, NULL, + _("Invalid item type in P2L index")); + + SVN_ERR(packed_stream_get(&value, stream)); + entry.fnv1_checksum = (apr_uint32_t)value; + + /* Truncating the checksum to 32 bits may have hidden random data in the + * unused extra bits of the on-disk representation (7/8 bit representation + * uses 5 bytes on disk for the 32 bit value, leaving 3 bits unused). */ + if (value > APR_UINT32_MAX) + return svn_error_create(SVN_ERR_FS_INDEX_CORRUPTION, NULL, + _("Invalid FNV1 checksum in P2L index")); + + /* Some of the index data for empty rev / pack file sections will not be + * used during normal operation. Thus, we have strict rules for the + * contents of those unused fields. */ + if (entry.type == SVN_FS_X__ITEM_TYPE_UNUSED) + if ( entry.fnv1_checksum != 0 + || entry.item_count != 0) + return svn_error_create(SVN_ERR_FS_INDEX_CORRUPTION, NULL, + _("Unused regions must be empty and have checksum 0")); + + if (entry.item_count == 0) + { + entry.items = NULL; + } + else + { + entry.items + = apr_pcalloc(result->pool, entry.item_count * sizeof(*entry.items)); + + if ( entry.item_count > 1 + && entry.type < SVN_FS_X__ITEM_TYPE_CHANGES_CONT) + return svn_error_create(SVN_ERR_FS_INDEX_CORRUPTION, NULL, + _("Only containers may have more than one sub-item")); + + for (sub_item = 0; sub_item < entry.item_count; ++sub_item) + { + SVN_ERR(packed_stream_get(&value, stream)); + revision += (svn_revnum_t)(value % 2 ? -1 - value / 2 : value / 2); + entry.items[sub_item].change_set + = svn_fs_x__change_set_by_rev(revision); + } + + for (sub_item = 0; sub_item < entry.item_count; ++sub_item) + { + SVN_ERR(packed_stream_get(&value, stream)); + number += value % 2 ? -1 - value / 2 : value / 2; + entry.items[sub_item].number = number; + + if ( ( entry.type == SVN_FS_X__ITEM_TYPE_CHANGES + || entry.type == SVN_FS_X__ITEM_TYPE_CHANGES_CONT) + && number != SVN_FS_X__ITEM_INDEX_CHANGES) + return svn_error_create(SVN_ERR_FS_INDEX_CORRUPTION, NULL, + _("Changed path list must have item number 1")); + } + } + + APR_ARRAY_PUSH(result, svn_fs_x__p2l_entry_t) = entry; + *item_offset += entry.size; + + return SVN_NO_ERROR; +} + +/* Read the phys-to-log mappings for the cluster beginning at rev file + * offset PAGE_START from the index for START_REVISION in FS. The data + * can be found in the index page beginning at START_OFFSET with the next + * page beginning at NEXT_OFFSET. PAGE_SIZE is the L2P index page size. + * Return the relevant index entries in *ENTRIES. Use REV_FILE to access + * on-disk data. Allocate *ENTRIES in RESULT_POOL. 
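+ * If the entries of this page do not extend up to the end of the cluster,
+ * the first entry of the following page is read and appended as well.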
+ */ +static svn_error_t * +get_p2l_page(apr_array_header_t **entries, + svn_fs_x__revision_file_t *rev_file, + svn_fs_t *fs, + svn_revnum_t start_revision, + apr_off_t start_offset, + apr_off_t next_offset, + apr_off_t page_start, + apr_uint64_t page_size, + apr_pool_t *result_pool) +{ + apr_uint64_t value; + apr_array_header_t *result + = apr_array_make(result_pool, 16, sizeof(svn_fs_x__p2l_entry_t)); + apr_off_t item_offset; + apr_off_t offset; + + /* open index and navigate to page start */ + SVN_ERR(auto_open_p2l_index(rev_file, fs, start_revision)); + packed_stream_seek(rev_file->p2l_stream, start_offset); + + /* read rev file offset of the first page entry (all page entries will + * only store their sizes). */ + SVN_ERR(packed_stream_get(&value, rev_file->p2l_stream)); + item_offset = (apr_off_t)value; + + /* Special case: empty pages. */ + if (start_offset == next_offset) + { + /* Empty page. This only happens if the first entry of the next page + * also covers this page (and possibly more) completely. */ + SVN_ERR(read_entry(rev_file->p2l_stream, &item_offset, start_revision, + result)); + } + else + { + /* Read non-empty page. */ + do + { + SVN_ERR(read_entry(rev_file->p2l_stream, &item_offset, + start_revision, result)); + offset = packed_stream_offset(rev_file->p2l_stream); + } + while (offset < next_offset); + + /* We should now be exactly at the next offset, i.e. the numbers in + * the stream cannot overlap into the next page description. */ + if (offset != next_offset) + return svn_error_create(SVN_ERR_FS_INDEX_CORRUPTION, NULL, + _("P2L page description overlaps with next page description")); + + /* if we haven't covered the cluster end yet, we must read the first + * entry of the next page */ + if (item_offset < page_start + page_size) + { + SVN_ERR(packed_stream_get(&value, rev_file->p2l_stream)); + item_offset = (apr_off_t)value; + SVN_ERR(read_entry(rev_file->p2l_stream, &item_offset, + start_revision, result)); + } + } + + *entries = result; + + return SVN_NO_ERROR; +} + +/* If it cannot be found in FS's caches, read the p2l index page selected + * by BATON->OFFSET from *STREAM. If the latter is yet to be constructed, + * do so in STREAM_POOL. Don't read the page if it precedes MIN_OFFSET. + * Set *END to TRUE if the caller should stop refeching. + * + * *BATON will be updated with the selected page's info and SCRATCH_POOL + * will be used for temporary allocations. If the data is alread in the + * cache, descrease *LEAKING_BUCKET and increase it otherwise. With that + * pattern we will still read all pages from the block even if some of + * them survived in the cached. + */ +static svn_error_t * +prefetch_p2l_page(svn_boolean_t *end, + int *leaking_bucket, + svn_fs_t *fs, + svn_fs_x__revision_file_t *rev_file, + p2l_page_info_baton_t *baton, + apr_off_t min_offset, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_boolean_t already_cached; + apr_array_header_t *page; + svn_fs_x__page_cache_key_t key = { 0 }; + + /* fetch the page info */ + *end = FALSE; + baton->revision = baton->first_revision; + SVN_ERR(get_p2l_page_info(baton, rev_file, fs, scratch_pool)); + if (baton->start_offset < min_offset) + { + /* page outside limits -> stop prefetching */ + *end = TRUE; + return SVN_NO_ERROR; + } + + /* do we have that page in our caches already? 
*/ + assert(baton->first_revision <= APR_UINT32_MAX); + key.revision = (apr_uint32_t)baton->first_revision; + key.is_packed = svn_fs_x__is_packed_rev(fs, baton->first_revision); + key.page = baton->page_no; + SVN_ERR(svn_cache__has_key(&already_cached, ffd->p2l_page_cache, + &key, scratch_pool)); + + /* yes, already cached */ + if (already_cached) + { + /* stop prefetching if most pages are already cached. */ + if (!--*leaking_bucket) + *end = TRUE; + + return SVN_NO_ERROR; + } + + ++*leaking_bucket; + + /* read from disk */ + SVN_ERR(get_p2l_page(&page, rev_file, fs, + baton->first_revision, + baton->start_offset, + baton->next_offset, + baton->page_start, + baton->page_size, + scratch_pool)); + + /* and put it into our cache */ + SVN_ERR(svn_cache__set(ffd->p2l_page_cache, &key, page, scratch_pool)); + + return SVN_NO_ERROR; +} + +/* Lookup & construct the baton and key information that we will need for + * a P2L page cache lookup. We want the page covering OFFSET in the rev / + * pack file containing REVSION in FS. Return the results in *PAGE_INFO_P + * and *KEY_P. Read data through REV_FILE. Use SCRATCH_POOL for temporary + * allocations. + */ +static svn_error_t * +get_p2l_keys(p2l_page_info_baton_t *page_info_p, + svn_fs_x__page_cache_key_t *key_p, + svn_fs_x__revision_file_t *rev_file, + svn_fs_t *fs, + svn_revnum_t revision, + apr_off_t offset, + apr_pool_t *scratch_pool) +{ + p2l_page_info_baton_t page_info; + + /* request info for the index pages that describes the pack / rev file + * contents at pack / rev file position OFFSET. */ + page_info.offset = offset; + page_info.revision = revision; + SVN_ERR(get_p2l_page_info(&page_info, rev_file, fs, scratch_pool)); + + /* if the offset refers to a non-existent page, bail out */ + if (page_info.page_count <= page_info.page_no) + return svn_error_createf(SVN_ERR_FS_INDEX_OVERFLOW , NULL, + _("Offset %s too large in revision %ld"), + apr_off_t_toa(scratch_pool, offset), revision); + + /* return results */ + if (page_info_p) + *page_info_p = page_info; + + /* construct cache key */ + if (key_p) + { + svn_fs_x__page_cache_key_t key = { 0 }; + assert(page_info.first_revision <= APR_UINT32_MAX); + key.revision = (apr_uint32_t)page_info.first_revision; + key.is_packed = svn_fs_x__is_packed_rev(fs, revision); + key.page = page_info.page_no; + + *key_p = key; + } + + return SVN_NO_ERROR; +} + +/* qsort-compatible compare function that compares the OFFSET of the + * svn_fs_x__p2l_entry_t in *LHS with the apr_off_t in *RHS. */ +static int +compare_start_p2l_entry(const void *lhs, + const void *rhs) +{ + const svn_fs_x__p2l_entry_t *entry = lhs; + apr_off_t start = *(const apr_off_t*)rhs; + apr_off_t diff = entry->offset - start; + + /* restrict result to int */ + return diff < 0 ? -1 : (diff == 0 ? 0 : 1); +} + +/* From the PAGE_ENTRIES array of svn_fs_x__p2l_entry_t, ordered + * by their OFFSET member, copy all elements overlapping the range + * [BLOCK_START, BLOCK_END) to ENTRIES. If RESOLVE_PTR is set, the ITEMS + * sub-array in each entry needs to be de-serialized. 
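+ * RESOLVE_PTR must be set when PAGE_ENTRIES comes straight from the cache
+ * and its ITEMS pointers are therefore still in serialized form.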
*/ +static void +append_p2l_entries(apr_array_header_t *entries, + apr_array_header_t *page_entries, + apr_off_t block_start, + apr_off_t block_end, + svn_boolean_t resolve_ptr) +{ + const svn_fs_x__p2l_entry_t *entry; + int idx = svn_sort__bsearch_lower_bound(page_entries, &block_start, + compare_start_p2l_entry); + + /* start at the first entry that overlaps with BLOCK_START */ + if (idx > 0) + { + entry = &APR_ARRAY_IDX(page_entries, idx - 1, svn_fs_x__p2l_entry_t); + if (entry->offset + entry->size > block_start) + --idx; + } + + /* copy all entries covering the requested range */ + for ( ; idx < page_entries->nelts; ++idx) + { + svn_fs_x__p2l_entry_t *copy; + entry = &APR_ARRAY_IDX(page_entries, idx, svn_fs_x__p2l_entry_t); + if (entry->offset >= block_end) + break; + + /* Copy the entry record. */ + copy = apr_array_push(entries); + *copy = *entry; + + /* Copy the items of that entries. */ + if (entry->item_count) + { + const svn_fs_x__id_t *items + = resolve_ptr + ? svn_temp_deserializer__ptr(page_entries->elts, + (const void * const *)&entry->items) + : entry->items; + + copy->items = apr_pmemdup(entries->pool, items, + entry->item_count * sizeof(*items)); + } + } +} + +/* Auxilliary struct passed to p2l_entries_func selecting the relevant + * data range. */ +typedef struct p2l_entries_baton_t +{ + apr_off_t start; + apr_off_t end; +} p2l_entries_baton_t; + +/* Implement svn_cache__partial_getter_func_t: extract p2l entries from + * the page in DATA which overlap the p2l_entries_baton_t in BATON. + * The target array is already provided in *OUT. + */ +static svn_error_t * +p2l_entries_func(void **out, + const void *data, + apr_size_t data_len, + void *baton, + apr_pool_t *result_pool) +{ + apr_array_header_t *entries = *(apr_array_header_t **)out; + const apr_array_header_t *raw_page = data; + p2l_entries_baton_t *block = baton; + + /* Make PAGE a readable APR array. */ + apr_array_header_t page = *raw_page; + page.elts = (void *)svn_temp_deserializer__ptr(raw_page, + (const void * const *)&raw_page->elts); + + /* append relevant information to result */ + append_p2l_entries(entries, &page, block->start, block->end, TRUE); + + return SVN_NO_ERROR; +} + + +/* Body of svn_fs_x__p2l_index_lookup. However, do a single index page + * lookup and append the result to the ENTRIES array provided by the caller. + * Use successive calls to cover larger ranges. 
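+ * If BLOCK_START lies outside the range covered by the index, an
+ * SVN_ERR_FS_INDEX_OVERFLOW error is returned.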
+ */ +static svn_error_t * +p2l_index_lookup(apr_array_header_t *entries, + svn_fs_x__revision_file_t *rev_file, + svn_fs_t *fs, + svn_revnum_t revision, + apr_off_t block_start, + apr_off_t block_end, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_fs_x__page_cache_key_t key; + svn_boolean_t is_cached = FALSE; + p2l_page_info_baton_t page_info; + apr_array_header_t *local_result = entries; + + /* baton selecting the relevant entries from the one page we access */ + p2l_entries_baton_t block; + block.start = block_start; + block.end = block_end; + + /* if we requested an empty range, the result would be empty */ + SVN_ERR_ASSERT(block_start < block_end); + + /* look for the fist page of the range in our cache */ + SVN_ERR(get_p2l_keys(&page_info, &key, rev_file, fs, revision, block_start, + scratch_pool)); + SVN_ERR(svn_cache__get_partial((void**)&local_result, &is_cached, + ffd->p2l_page_cache, &key, p2l_entries_func, + &block, scratch_pool)); + + if (!is_cached) + { + svn_boolean_t end; + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + apr_off_t original_page_start = page_info.page_start; + int leaking_bucket = 4; + p2l_page_info_baton_t prefetch_info = page_info; + apr_array_header_t *page_entries; + + apr_off_t max_offset + = APR_ALIGN(page_info.next_offset, ffd->block_size); + apr_off_t min_offset + = APR_ALIGN(page_info.start_offset, ffd->block_size) - ffd->block_size; + + /* Since we read index data in larger chunks, we probably got more + * page data than we requested. Parse & cache that until either we + * encounter pages already cached or reach the end of the buffer. + */ + + /* pre-fetch preceding pages */ + end = FALSE; + prefetch_info.offset = original_page_start; + while (prefetch_info.offset >= prefetch_info.page_size && !end) + { + prefetch_info.offset -= prefetch_info.page_size; + SVN_ERR(prefetch_p2l_page(&end, &leaking_bucket, fs, rev_file, + &prefetch_info, min_offset, iterpool)); + svn_pool_clear(iterpool); + } + + /* fetch page from disk and put it into the cache */ + SVN_ERR(get_p2l_page(&page_entries, rev_file, fs, + page_info.first_revision, + page_info.start_offset, + page_info.next_offset, + page_info.page_start, + page_info.page_size, iterpool)); + + /* The last cache entry must not end beyond the range covered by + * this index. The same applies for any subset of entries. 
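+       * Since the entries are ordered by offset, checking the last one
+       * is sufficient.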
*/ + if (page_entries->nelts) + { + const svn_fs_x__p2l_entry_t *entry + = &APR_ARRAY_IDX(page_entries, page_entries->nelts - 1, + svn_fs_x__p2l_entry_t); + if ( entry->offset + entry->size + > page_info.page_size * page_info.page_count) + return svn_error_createf(SVN_ERR_FS_INDEX_OVERFLOW , NULL, + _("Last P2L index entry extends beyond " + "the last page in revision %ld."), + revision); + } + + SVN_ERR(svn_cache__set(ffd->p2l_page_cache, &key, page_entries, + iterpool)); + + /* append relevant information to result */ + append_p2l_entries(entries, page_entries, block_start, block_end, FALSE); + + /* pre-fetch following pages */ + end = FALSE; + leaking_bucket = 4; + prefetch_info = page_info; + prefetch_info.offset = original_page_start; + while ( prefetch_info.next_offset < max_offset + && prefetch_info.page_no + 1 < prefetch_info.page_count + && !end) + { + prefetch_info.offset += prefetch_info.page_size; + SVN_ERR(prefetch_p2l_page(&end, &leaking_bucket, fs, rev_file, + &prefetch_info, min_offset, iterpool)); + svn_pool_clear(iterpool); + } + + svn_pool_destroy(iterpool); + } + + /* We access a valid page (otherwise, we had seen an error in the + * get_p2l_keys request). Hence, at least one entry must be found. */ + SVN_ERR_ASSERT(entries->nelts > 0); + + /* Add an "unused" entry if it extends beyond the end of the data file. + * Since the index page size might be smaller than the current data + * read block size, the trailing "unused" entry in this index may not + * fully cover the end of the last block. */ + if (page_info.page_no + 1 >= page_info.page_count) + { + svn_fs_x__p2l_entry_t *entry + = &APR_ARRAY_IDX(entries, entries->nelts-1, svn_fs_x__p2l_entry_t); + + apr_off_t entry_end = entry->offset + entry->size; + if (entry_end < block_end) + { + if (entry->type == SVN_FS_X__ITEM_TYPE_UNUSED) + { + /* extend the terminal filler */ + entry->size = block_end - entry->offset; + } + else + { + /* No terminal filler. Add one. */ + entry = apr_array_push(entries); + entry->offset = entry_end; + entry->size = block_end - entry_end; + entry->type = SVN_FS_X__ITEM_TYPE_UNUSED; + entry->fnv1_checksum = 0; + entry->item_count = 0; + entry->items = NULL; + } + } + } + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__p2l_index_lookup(apr_array_header_t **entries, + svn_fs_t *fs, + svn_fs_x__revision_file_t *rev_file, + svn_revnum_t revision, + apr_off_t block_start, + apr_off_t block_size, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + apr_off_t block_end = block_start + block_size; + + /* the receiving container */ + int last_count = 0; + apr_array_header_t *result = apr_array_make(result_pool, 16, + sizeof(svn_fs_x__p2l_entry_t)); + + /* Fetch entries page-by-page. Since the p2l index is supposed to cover + * every single byte in the rev / pack file - even unused sections - + * every iteration must result in some progress. */ + while (block_start < block_end) + { + svn_fs_x__p2l_entry_t *entry; + SVN_ERR(p2l_index_lookup(result, rev_file, fs, revision, block_start, + block_end, scratch_pool)); + SVN_ERR_ASSERT(result->nelts > 0); + + /* continue directly behind last item */ + entry = &APR_ARRAY_IDX(result, result->nelts-1, svn_fs_x__p2l_entry_t); + block_start = entry->offset + entry->size; + + /* Some paranoia check. Successive iterations should never return + * duplicates but if it did, we might get into trouble later on. 
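+       * Hence, verify that the first entry added by this iteration starts
+       * at or behind the end of the last entry of the previous iteration.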
*/ + if (last_count > 0 && last_count < result->nelts) + { + entry = &APR_ARRAY_IDX(result, last_count - 1, + svn_fs_x__p2l_entry_t); + SVN_ERR_ASSERT(APR_ARRAY_IDX(result, last_count, + svn_fs_x__p2l_entry_t).offset + >= entry->offset + entry->size); + } + + last_count = result->nelts; + } + + *entries = result; + return SVN_NO_ERROR; +} + +/* compare_fn_t comparing a svn_fs_x__p2l_entry_t at LHS with an offset + * RHS. + */ +static int +compare_p2l_entry_offsets(const void *lhs, const void *rhs) +{ + const svn_fs_x__p2l_entry_t *entry = (const svn_fs_x__p2l_entry_t *)lhs; + apr_off_t offset = *(const apr_off_t *)rhs; + + return entry->offset < offset ? -1 : (entry->offset == offset ? 0 : 1); +} + +/* Cached data extraction utility. DATA is a P2L index page, e.g. an APR + * array of svn_fs_fs__p2l_entry_t elements. Return the entry for the item, + * allocated in RESULT_POOL, starting at OFFSET or NULL if that's not an + * the start offset of any item. Use SCRATCH_POOL for temporary allocations. + */ +static svn_fs_x__p2l_entry_t * +get_p2l_entry_from_cached_page(const void *data, + apr_off_t offset, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + /* resolve all pointer values of in-cache data */ + const apr_array_header_t *page = data; + apr_array_header_t *entries = apr_pmemdup(scratch_pool, page, + sizeof(*page)); + svn_fs_x__p2l_entry_t *entry; + + entries->elts = (char *)svn_temp_deserializer__ptr(page, + (const void *const *)&page->elts); + + /* search of the offset we want */ + entry = svn_sort__array_lookup(entries, &offset, NULL, + (int (*)(const void *, const void *))compare_p2l_entry_offsets); + + /* return it, if it is a perfect match */ + if (entry) + { + svn_fs_x__p2l_entry_t *result + = apr_pmemdup(result_pool, entry, sizeof(*result)); + result->items + = (svn_fs_x__id_t *)svn_temp_deserializer__ptr(entries->elts, + (const void *const *)&entry->items); + return result; + } + + return NULL; +} + +/* Implements svn_cache__partial_getter_func_t for P2L index pages, copying + * the entry for the apr_off_t at BATON into *OUT. *OUT will be NULL if + * there is no matching entry in the index page at DATA. + */ +static svn_error_t * +p2l_entry_lookup_func(void **out, + const void *data, + apr_size_t data_len, + void *baton, + apr_pool_t *result_pool) +{ + svn_fs_x__p2l_entry_t *entry + = get_p2l_entry_from_cached_page(data, *(apr_off_t *)baton, result_pool, + result_pool); + + *out = entry && entry->offset == *(apr_off_t *)baton + ? svn_fs_x__p2l_entry_dup(entry, result_pool) + : NULL; + + return SVN_NO_ERROR; +} + +static svn_error_t * +p2l_entry_lookup(svn_fs_x__p2l_entry_t **entry_p, + svn_fs_x__revision_file_t *rev_file, + svn_fs_t *fs, + svn_revnum_t revision, + apr_off_t offset, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_fs_x__page_cache_key_t key = { 0 }; + svn_boolean_t is_cached = FALSE; + p2l_page_info_baton_t page_info; + + /* look for this info in our cache */ + SVN_ERR(get_p2l_keys(&page_info, &key, rev_file, fs, revision, offset, + scratch_pool)); + SVN_ERR(svn_cache__get_partial((void**)entry_p, &is_cached, + ffd->p2l_page_cache, &key, + p2l_entry_lookup_func, &offset, + result_pool)); + if (!is_cached) + { + /* do a standard index lookup. This is will automatically prefetch + * data to speed up future lookups. 
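+       * p2l_index_lookup returns all entries overlapping the range
+       * [OFFSET, OFFSET + 1), so we still have to pick the one that starts
+       * exactly at OFFSET.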
*/ + apr_array_header_t *entries = apr_array_make(result_pool, 1, + sizeof(**entry_p)); + SVN_ERR(p2l_index_lookup(entries, rev_file, fs, revision, offset, + offset + 1, scratch_pool)); + + /* Find the entry that we want. */ + *entry_p = svn_sort__array_lookup(entries, &offset, NULL, + (int (*)(const void *, const void *))compare_p2l_entry_offsets); + } + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__p2l_entry_lookup(svn_fs_x__p2l_entry_t **entry_p, + svn_fs_t *fs, + svn_fs_x__revision_file_t *rev_file, + svn_revnum_t revision, + apr_off_t offset, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + /* look for this info in our cache */ + SVN_ERR(p2l_entry_lookup(entry_p, rev_file, fs, revision, offset, + result_pool, scratch_pool)); + + return SVN_NO_ERROR; +} + +/* Baton structure for p2l_item_lookup_func. It describes which sub_item + * info shall be returned. + */ +typedef struct p2l_item_lookup_baton_t +{ + /* file offset to find the P2L index entry for */ + apr_off_t offset; + + /* return the sub-item at this position within that entry */ + apr_uint32_t sub_item; +} p2l_item_lookup_baton_t; + +/* Implements svn_cache__partial_getter_func_t for P2L index pages, copying + * the svn_fs_x__id_t for the item described 2l_item_lookup_baton_t + * *BATON. *OUT will be NULL if there is no matching index entry or the + * sub-item is out of range. + */ +static svn_error_t * +p2l_item_lookup_func(void **out, + const void *data, + apr_size_t data_len, + void *baton, + apr_pool_t *result_pool) +{ + p2l_item_lookup_baton_t *lookup_baton = baton; + svn_fs_x__p2l_entry_t *entry + = get_p2l_entry_from_cached_page(data, lookup_baton->offset, result_pool, + result_pool); + + *out = entry + && entry->offset == lookup_baton->offset + && entry->item_count > lookup_baton->sub_item + ? apr_pmemdup(result_pool, + entry->items + lookup_baton->sub_item, + sizeof(*entry->items)) + : NULL; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__p2l_item_lookup(svn_fs_x__id_t **item, + svn_fs_t *fs, + svn_fs_x__revision_file_t *rev_file, + svn_revnum_t revision, + apr_off_t offset, + apr_uint32_t sub_item, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_fs_x__page_cache_key_t key = { 0 }; + svn_boolean_t is_cached = FALSE; + p2l_page_info_baton_t page_info; + p2l_item_lookup_baton_t baton; + + *item = NULL; + + /* look for this info in our cache */ + SVN_ERR(get_p2l_keys(&page_info, &key, rev_file, fs, revision, offset, + scratch_pool)); + baton.offset = offset; + baton.sub_item = sub_item; + SVN_ERR(svn_cache__get_partial((void**)item, &is_cached, + ffd->p2l_page_cache, &key, + p2l_item_lookup_func, &baton, result_pool)); + if (!is_cached) + { + /* do a standard index lookup. This is will automatically prefetch + * data to speed up future lookups. */ + svn_fs_x__p2l_entry_t *entry; + SVN_ERR(p2l_entry_lookup(&entry, rev_file, fs, revision, offset, + result_pool, scratch_pool)); + + /* return result */ + if (entry && entry->item_count > sub_item) + *item = apr_pmemdup(result_pool, entry->items + sub_item, + sizeof(**item)); + } + + return SVN_NO_ERROR; +} + +/* Implements svn_cache__partial_getter_func_t for P2L headers, setting *OUT + * to the largest the first offset not covered by this P2L index. 
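+ * That is the FILE_SIZE value from the index header, i.e. the size of the
+ * revision data in the rev / pack file.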
+ */ +static svn_error_t * +p2l_get_max_offset_func(void **out, + const void *data, + apr_size_t data_len, + void *baton, + apr_pool_t *result_pool) +{ + const p2l_header_t *header = data; + apr_off_t max_offset = header->file_size; + *out = apr_pmemdup(result_pool, &max_offset, sizeof(max_offset)); + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__p2l_get_max_offset(apr_off_t *offset, + svn_fs_t *fs, + svn_fs_x__revision_file_t *rev_file, + svn_revnum_t revision, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + p2l_header_t *header; + svn_boolean_t is_cached = FALSE; + apr_off_t *offset_p; + + /* look for the header data in our cache */ + svn_fs_x__pair_cache_key_t key; + key.revision = base_revision(fs, revision); + key.second = svn_fs_x__is_packed_rev(fs, revision); + + SVN_ERR(svn_cache__get_partial((void **)&offset_p, &is_cached, + ffd->p2l_header_cache, &key, + p2l_get_max_offset_func, NULL, + scratch_pool)); + if (is_cached) + { + *offset = *offset_p; + return SVN_NO_ERROR; + } + + SVN_ERR(get_p2l_header(&header, rev_file, fs, revision, scratch_pool, + scratch_pool)); + *offset = header->file_size; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__item_offset(apr_off_t *absolute_position, + apr_uint32_t *sub_item, + svn_fs_t *fs, + svn_fs_x__revision_file_t *rev_file, + const svn_fs_x__id_t *item_id, + apr_pool_t *scratch_pool) +{ + if (svn_fs_x__is_txn(item_id->change_set)) + SVN_ERR(l2p_proto_index_lookup(absolute_position, sub_item, fs, + svn_fs_x__get_txn_id(item_id->change_set), + item_id->number, scratch_pool)); + else + SVN_ERR(l2p_index_lookup(absolute_position, sub_item, fs, rev_file, + svn_fs_x__get_revnum(item_id->change_set), + item_id->number, scratch_pool)); + + return SVN_NO_ERROR; +} + +/* Calculate the FNV1 checksum over the offset range in REV_FILE, covered by + * ENTRY. Store the result in ENTRY->FNV1_CHECKSUM. Use SCRATCH_POOL for + * temporary allocations. */ +static svn_error_t * +calc_fnv1(svn_fs_x__p2l_entry_t *entry, + svn_fs_x__revision_file_t *rev_file, + apr_pool_t *scratch_pool) +{ + unsigned char buffer[4096]; + svn_checksum_t *checksum; + svn_checksum_ctx_t *context + = svn_checksum_ctx_create(svn_checksum_fnv1a_32x4, scratch_pool); + apr_off_t size = entry->size; + + /* Special rules apply to unused sections / items. The data must be a + * sequence of NUL bytes (not checked here) and the checksum is fixed to 0. + */ + if (entry->type == SVN_FS_X__ITEM_TYPE_UNUSED) + { + entry->fnv1_checksum = 0; + return SVN_NO_ERROR; + } + + /* Read the block and feed it to the checksum calculator. */ + SVN_ERR(svn_io_file_seek(rev_file->file, APR_SET, &entry->offset, + scratch_pool)); + while (size > 0) + { + apr_size_t to_read = size > sizeof(buffer) + ? sizeof(buffer) + : (apr_size_t)size; + SVN_ERR(svn_io_file_read_full2(rev_file->file, buffer, to_read, NULL, + NULL, scratch_pool)); + SVN_ERR(svn_checksum_update(context, buffer, to_read)); + size -= to_read; + } + + /* Store final checksum in ENTRY. */ + SVN_ERR(svn_checksum_final(&checksum, context, scratch_pool)); + entry->fnv1_checksum = ntohl(*(const apr_uint32_t *)checksum->digest); + + return SVN_NO_ERROR; +} + +/* + * Index (re-)creation utilities. 
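+ *
+ * The functions below rebuild proto index files from a list of
+ * svn_fs_x__p2l_entry_t describing the rev / pack file contents.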
+ */ + +svn_error_t * +svn_fs_x__p2l_index_from_p2l_entries(const char **protoname, + svn_fs_t *fs, + svn_fs_x__revision_file_t *rev_file, + apr_array_header_t *entries, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + apr_file_t *proto_index; + + /* Use a subpool for immediate temp file cleanup at the end of this + * function. */ + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + int i; + + /* Create a proto-index file. */ + SVN_ERR(svn_io_open_unique_file3(NULL, protoname, NULL, + svn_io_file_del_on_pool_cleanup, + result_pool, scratch_pool)); + SVN_ERR(svn_fs_x__p2l_proto_index_open(&proto_index, *protoname, + scratch_pool)); + + /* Write ENTRIES to proto-index file and calculate checksums as we go. */ + for (i = 0; i < entries->nelts; ++i) + { + svn_fs_x__p2l_entry_t *entry + = APR_ARRAY_IDX(entries, i, svn_fs_x__p2l_entry_t *); + svn_pool_clear(iterpool); + + SVN_ERR(calc_fnv1(entry, rev_file, iterpool)); + SVN_ERR(svn_fs_x__p2l_proto_index_add_entry(proto_index, entry, + iterpool)); + } + + /* Convert proto-index into final index and move it into position. + * Note that REV_FILE contains the start revision of the shard file if it + * has been packed while REVISION may be somewhere in the middle. For + * non-packed shards, they will have identical values. */ + SVN_ERR(svn_io_file_close(proto_index, iterpool)); + + /* Temp file cleanup. */ + svn_pool_destroy(iterpool); + + return SVN_NO_ERROR; +} + +/* Decorator for svn_fs_x__p2l_entry_t that associates it with a sorted + * variant of its ITEMS array. + */ +typedef struct sub_item_ordered_t +{ + /* ENTRY that got wrapped */ + svn_fs_x__p2l_entry_t *entry; + + /* Array of pointers into ENTRY->ITEMS, sorted by their revision member + * _descending_ order. May be NULL if ENTRY->ITEM_COUNT < 2. */ + svn_fs_x__id_t **order; +} sub_item_ordered_t; + +/* implements compare_fn_t. Place LHS before RHS, if the latter is younger. + * Used to sort sub_item_ordered_t::order + */ +static int +compare_sub_items(const svn_fs_x__id_t * const * lhs, + const svn_fs_x__id_t * const * rhs) +{ + return (*lhs)->change_set < (*rhs)->change_set + ? 1 + : ((*lhs)->change_set > (*rhs)->change_set ? -1 : 0); +} + +/* implements compare_fn_t. Place LHS before RHS, if the latter belongs to + * a newer revision. + */ +static int +compare_p2l_info_rev(const sub_item_ordered_t * lhs, + const sub_item_ordered_t * rhs) +{ + svn_fs_x__id_t *lhs_part; + svn_fs_x__id_t *rhs_part; + + assert(lhs != rhs); + if (lhs->entry->item_count == 0) + return rhs->entry->item_count == 0 ? 0 : -1; + if (rhs->entry->item_count == 0) + return 1; + + lhs_part = lhs->order ? lhs->order[lhs->entry->item_count - 1] + : &lhs->entry->items[0]; + rhs_part = rhs->order ? rhs->order[rhs->entry->item_count - 1] + : &rhs->entry->items[0]; + + if (lhs_part->change_set == rhs_part->change_set) + return 0; + + return lhs_part->change_set < rhs_part->change_set ? -1 : 1; +} + +svn_error_t * +svn_fs_x__l2p_index_from_p2l_entries(const char **protoname, + svn_fs_t *fs, + apr_array_header_t *entries, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + apr_file_t *proto_index; + + /* Use a subpool for immediate temp file cleanup at the end of this + * function. */ + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + svn_revnum_t prev_rev = SVN_INVALID_REVNUM; + int i; + apr_uint32_t k; + svn_priority_queue__t *queue; + apr_size_t count = 0; + apr_array_header_t *sub_item_orders; + + /* Create the temporary proto-rev file. 
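+ * (More precisely: the auto-deleting L2P proto index file whose name is
+ * returned in *PROTONAME and which lives as long as RESULT_POOL.)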
*/ + SVN_ERR(svn_io_open_unique_file3(NULL, protoname, NULL, + svn_io_file_del_on_pool_cleanup, + result_pool, scratch_pool)); + SVN_ERR(svn_fs_x__l2p_proto_index_open(&proto_index, *protoname, + scratch_pool)); + + + /* wrap P2L entries such that we have access to the sub-items in revision + order. The ENTRY_COUNT member will point to the next item to read+1. */ + sub_item_orders = apr_array_make(scratch_pool, entries->nelts, + sizeof(sub_item_ordered_t)); + sub_item_orders->nelts = entries->nelts; + + for (i = 0; i < entries->nelts; ++i) + { + svn_fs_x__p2l_entry_t *entry + = APR_ARRAY_IDX(entries, i, svn_fs_x__p2l_entry_t *); + sub_item_ordered_t *ordered + = &APR_ARRAY_IDX(sub_item_orders, i, sub_item_ordered_t); + + /* skip unused regions (e.g. padding) */ + if (entry->item_count == 0) + { + --sub_item_orders->nelts; + continue; + } + + assert(entry); + ordered->entry = entry; + count += entry->item_count; + + if (entry->item_count > 1) + { + ordered->order + = apr_palloc(scratch_pool, + sizeof(*ordered->order) * entry->item_count); + for (k = 0; k < entry->item_count; ++k) + ordered->order[k] = &entry->items[k]; + + qsort(ordered->order, entry->item_count, sizeof(*ordered->order), + (int (*)(const void *, const void *))compare_sub_items); + } + } + + /* we need to write the index in ascending revision order */ + queue = svn_priority_queue__create + (sub_item_orders, + (int (*)(const void *, const void *))compare_p2l_info_rev); + + /* write index entries */ + for (i = 0; i < count; ++i) + { + svn_fs_x__id_t *sub_item; + sub_item_ordered_t *ordered = svn_priority_queue__peek(queue); + + if (ordered->entry->item_count > 0) + { + /* if there is only one item, we skip the overhead of having an + extra array for the item order */ + sub_item = ordered->order + ? ordered->order[ordered->entry->item_count - 1] + : &ordered->entry->items[0]; + + /* next revision? */ + if (prev_rev != svn_fs_x__get_revnum(sub_item->change_set)) + { + prev_rev = svn_fs_x__get_revnum(sub_item->change_set); + SVN_ERR(svn_fs_x__l2p_proto_index_add_revision + (proto_index, iterpool)); + } + + /* add entry */ + SVN_ERR(svn_fs_x__l2p_proto_index_add_entry + (proto_index, ordered->entry->offset, + (apr_uint32_t)(sub_item - ordered->entry->items), + sub_item->number, iterpool)); + + /* make ITEM_COUNT point the next sub-item to use+1 */ + --ordered->entry->item_count; + } + + /* process remaining sub-items (if any) of that container later */ + if (ordered->entry->item_count) + svn_priority_queue__update(queue); + else + svn_priority_queue__pop(queue); + + /* keep memory usage in check */ + if (i % 256 == 0) + svn_pool_clear(iterpool); + } + + /* Convert proto-index into final index and move it into position. */ + SVN_ERR(svn_io_file_close(proto_index, iterpool)); + + /* Temp file cleanup. 
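+ * (This only destroys ITERPOOL; the proto index file itself lives on with
+ * RESULT_POOL, as documented in index.h.)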
*/ + svn_pool_destroy(iterpool); + + return SVN_NO_ERROR; +} + + +/* + * Standard (de-)serialization functions + */ + +svn_error_t * +svn_fs_x__serialize_l2p_header(void **data, + apr_size_t *data_len, + void *in, + apr_pool_t *pool) +{ + l2p_header_t *header = in; + svn_temp_serializer__context_t *context; + svn_stringbuf_t *serialized; + apr_size_t page_count = header->page_table_index[header->revision_count]; + apr_size_t page_table_size = page_count * sizeof(*header->page_table); + apr_size_t index_size + = (header->revision_count + 1) * sizeof(*header->page_table_index); + apr_size_t data_size = sizeof(*header) + index_size + page_table_size; + + /* serialize header and all its elements */ + context = svn_temp_serializer__init(header, + sizeof(*header), + data_size + 32, + pool); + + /* page table index array */ + svn_temp_serializer__add_leaf(context, + (const void * const *)&header->page_table_index, + index_size); + + /* page table array */ + svn_temp_serializer__add_leaf(context, + (const void * const *)&header->page_table, + page_table_size); + + /* return the serialized result */ + serialized = svn_temp_serializer__get(context); + + *data = serialized->data; + *data_len = serialized->len; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__deserialize_l2p_header(void **out, + void *data, + apr_size_t data_len, + apr_pool_t *pool) +{ + l2p_header_t *header = (l2p_header_t *)data; + + /* resolve the pointers in the struct */ + svn_temp_deserializer__resolve(header, (void**)&header->page_table_index); + svn_temp_deserializer__resolve(header, (void**)&header->page_table); + + /* done */ + *out = header; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__serialize_l2p_page(void **data, + apr_size_t *data_len, + void *in, + apr_pool_t *pool) +{ + l2p_page_t *page = in; + svn_temp_serializer__context_t *context; + svn_stringbuf_t *serialized; + apr_size_t of_table_size = page->entry_count * sizeof(*page->offsets); + apr_size_t si_table_size = page->entry_count * sizeof(*page->sub_items); + + /* serialize struct and all its elements */ + context = svn_temp_serializer__init(page, + sizeof(*page), + of_table_size + si_table_size + + sizeof(*page) + 32, + pool); + + /* offsets and sub_items arrays */ + svn_temp_serializer__add_leaf(context, + (const void * const *)&page->offsets, + of_table_size); + svn_temp_serializer__add_leaf(context, + (const void * const *)&page->sub_items, + si_table_size); + + /* return the serialized result */ + serialized = svn_temp_serializer__get(context); + + *data = serialized->data; + *data_len = serialized->len; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__deserialize_l2p_page(void **out, + void *data, + apr_size_t data_len, + apr_pool_t *pool) +{ + l2p_page_t *page = data; + + /* resolve the pointers in the struct */ + svn_temp_deserializer__resolve(page, (void**)&page->offsets); + svn_temp_deserializer__resolve(page, (void**)&page->sub_items); + + /* done */ + *out = page; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__serialize_p2l_header(void **data, + apr_size_t *data_len, + void *in, + apr_pool_t *pool) +{ + p2l_header_t *header = in; + svn_temp_serializer__context_t *context; + svn_stringbuf_t *serialized; + apr_size_t table_size = (header->page_count + 1) * sizeof(*header->offsets); + + /* serialize header and all its elements */ + context = svn_temp_serializer__init(header, + sizeof(*header), + table_size + sizeof(*header) + 32, + pool); + + /* offsets array */ + svn_temp_serializer__add_leaf(context, + (const void * 
const *)&header->offsets, + table_size); + + /* return the serialized result */ + serialized = svn_temp_serializer__get(context); + + *data = serialized->data; + *data_len = serialized->len; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__deserialize_p2l_header(void **out, + void *data, + apr_size_t data_len, + apr_pool_t *pool) +{ + p2l_header_t *header = data; + + /* resolve the only pointer in the struct */ + svn_temp_deserializer__resolve(header, (void**)&header->offsets); + + /* done */ + *out = header; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__serialize_p2l_page(void **data, + apr_size_t *data_len, + void *in, + apr_pool_t *pool) +{ + apr_array_header_t *page = in; + svn_temp_serializer__context_t *context; + svn_stringbuf_t *serialized; + apr_size_t table_size = page->elt_size * page->nelts; + svn_fs_x__p2l_entry_t *entries = (svn_fs_x__p2l_entry_t *)page->elts; + int i; + + /* serialize array header and all its elements */ + context = svn_temp_serializer__init(page, + sizeof(*page), + table_size + sizeof(*page) + 32, + pool); + + /* items in the array */ + svn_temp_serializer__push(context, + (const void * const *)&page->elts, + table_size); + + for (i = 0; i < page->nelts; ++i) + svn_temp_serializer__add_leaf(context, + (const void * const *)&entries[i].items, + entries[i].item_count + * sizeof(*entries[i].items)); + + svn_temp_serializer__pop(context); + + /* return the serialized result */ + serialized = svn_temp_serializer__get(context); + + *data = serialized->data; + *data_len = serialized->len; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__deserialize_p2l_page(void **out, + void *data, + apr_size_t data_len, + apr_pool_t *pool) +{ + apr_array_header_t *page = (apr_array_header_t *)data; + svn_fs_x__p2l_entry_t *entries; + int i; + + /* resolve the only pointer in the struct */ + svn_temp_deserializer__resolve(page, (void**)&page->elts); + + /* resolve sub-struct pointers*/ + entries = (svn_fs_x__p2l_entry_t *)page->elts; + for (i = 0; i < page->nelts; ++i) + svn_temp_deserializer__resolve(entries, (void**)&entries[i].items); + + /* patch up members */ + page->pool = pool; + page->nalloc = page->nelts; + + /* done */ + *out = page; + + return SVN_NO_ERROR; +} diff --git a/subversion/libsvn_fs_x/index.h b/subversion/libsvn_fs_x/index.h new file mode 100644 index 0000000..4e0e1dd --- /dev/null +++ b/subversion/libsvn_fs_x/index.h @@ -0,0 +1,411 @@ +/* index.h : interface to FSX indexing functionality + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ * ==================================================================== + */ + +#ifndef SVN_LIBSVN_FS__INDEX_H +#define SVN_LIBSVN_FS__INDEX_H + +#include "fs.h" +#include "rev_file.h" + +/* Per-defined item index values. They are used to identify empty or + * mandatory items. + */ +#define SVN_FS_X__ITEM_INDEX_UNUSED 0 /* invalid / reserved value */ +#define SVN_FS_X__ITEM_INDEX_CHANGES 1 /* list of changed paths */ +#define SVN_FS_X__ITEM_INDEX_ROOT_NODE 2 /* the root noderev */ +#define SVN_FS_X__ITEM_INDEX_FIRST_USER 3 /* first noderev to be freely + assigned */ + +/* Data / item types as stored in the phys-to-log index. + */ +#define SVN_FS_X__ITEM_TYPE_UNUSED 0 /* file section not used */ +#define SVN_FS_X__ITEM_TYPE_FILE_REP 1 /* item is a file representation */ +#define SVN_FS_X__ITEM_TYPE_DIR_REP 2 /* item is a directory rep. */ +#define SVN_FS_X__ITEM_TYPE_FILE_PROPS 3 /* item is a file property rep. */ +#define SVN_FS_X__ITEM_TYPE_DIR_PROPS 4 /* item is a directory prop rep */ +#define SVN_FS_X__ITEM_TYPE_NODEREV 5 /* item is a noderev */ +#define SVN_FS_X__ITEM_TYPE_CHANGES 6 /* item is a changed paths list */ + +#define SVN_FS_X__ITEM_TYPE_ANY_REP 7 /* item is any representation. + Only used in pre-format7. */ + +#define SVN_FS_X__ITEM_TYPE_CHANGES_CONT 8 /* item is a changes container */ +#define SVN_FS_X__ITEM_TYPE_NODEREVS_CONT 9 /* item is a noderevs container */ +#define SVN_FS_X__ITEM_TYPE_REPS_CONT 10 /* item is a representations + container */ + +/* (user visible) entry in the phys-to-log index. It describes a section + * of some packed / non-packed rev file as containing a specific item. + * There must be no overlapping / conflicting entries. + */ +typedef struct svn_fs_x__p2l_entry_t +{ + /* offset of the first byte that belongs to the item */ + apr_off_t offset; + + /* length of the item in bytes */ + apr_off_t size; + + /* type of the item (see SVN_FS_X__ITEM_TYPE_*) defines */ + apr_uint32_t type; + + /* modified FNV-1a checksum. 0 if unknown checksum */ + apr_uint32_t fnv1_checksum; + + /* Number of items in this block / container. Their list can be found + * in *ITEMS. 0 for unused sections. 1 for non-container items, + * > 1 for containers. */ + apr_uint32_t item_count; + + /* List of items in that block / container */ + svn_fs_x__id_t *items; +} svn_fs_x__p2l_entry_t; + +/* Return a (deep) copy of ENTRY, allocated in RESULT_POOL. + */ +svn_fs_x__p2l_entry_t * +svn_fs_x__p2l_entry_dup(const svn_fs_x__p2l_entry_t *entry, + apr_pool_t *result_pool); + +/* Open / create a log-to-phys index file with the full file path name + * FILE_NAME. Return the open file in *PROTO_INDEX allocated in + * RESULT_POOL. + */ +svn_error_t * +svn_fs_x__l2p_proto_index_open(apr_file_t **proto_index, + const char *file_name, + apr_pool_t *result_pool); + +/* Call this function before adding entries for the next revision to the + * log-to-phys index file in PROTO_INDEX. Use SCRATCH_POOL for temporary + * allocations. + */ +svn_error_t * +svn_fs_x__l2p_proto_index_add_revision(apr_file_t *proto_index, + apr_pool_t *scratch_pool); + +/* Add a new mapping, ITEM_INDEX to the (OFFSET, SUB_ITEM) pair, to log-to- + * phys index file in PROTO_INDEX. Please note that mappings may be added + * in any order but duplicate entries for the same ITEM_INDEX, SUB_ITEM + * are not supported. Not all possible index values need to be used. + * (OFFSET, SUB_ITEM) may be (-1, 0) to mark 'invalid' item indexes but + * that is already implied for all item indexes not explicitly given a + * mapping. 
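+ *
+ * A purely illustrative call, mapping the root noderev of the revision
+ * (item index SVN_FS_X__ITEM_INDEX_ROOT_NODE) to sub-item 0 of a container
+ * assumed to start at file offset 0:
+ *
+ *   SVN_ERR(svn_fs_x__l2p_proto_index_add_entry(proto_index, 0, 0,
+ *                                               SVN_FS_X__ITEM_INDEX_ROOT_NODE,
+ *                                               scratch_pool));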
+ * + * Use SCRATCH_POOL for temporary allocations. + */ +svn_error_t * +svn_fs_x__l2p_proto_index_add_entry(apr_file_t *proto_index, + apr_off_t offset, + apr_uint32_t sub_item, + apr_uint64_t item_index, + apr_pool_t *scratch_pool); + +/* Use the proto index file stored at PROTO_FILE_NAME, construct the final + * log-to-phys index and append it to INDEX_FILE. The first revision will + * be REVISION, entries to the next revision will be assigned to REVISION+1 + * and so forth. + * + * Return the MD5 checksum of the on-disk index data in *CHECKSUM, allocated + * in RESULT_POOL. Use SCRATCH_POOL for temporary allocations. + */ +svn_error_t * +svn_fs_x__l2p_index_append(svn_checksum_t **checksum, + svn_fs_t *fs, + apr_file_t *index_file, + const char *proto_file_name, + svn_revnum_t revision, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + +/* Open / create a phys-to-log index file with the full file path name + * FILE_NAME. Return the open file in *PROTO_INDEX allocated in + * RESULT_POOL. + */ +svn_error_t * +svn_fs_x__p2l_proto_index_open(apr_file_t **proto_index, + const char *file_name, + apr_pool_t *result_pool); + +/* Add a new mapping ENTRY to the phys-to-log index file in PROTO_INDEX. + * The entries must be added in ascending offset order and must not leave + * intermittent ranges uncovered. The revision value in ENTRY may be + * SVN_INVALID_REVISION. Use SCRATCH_POOL for temporary allocations. + */ +svn_error_t * +svn_fs_x__p2l_proto_index_add_entry(apr_file_t *proto_index, + const svn_fs_x__p2l_entry_t *entry, + apr_pool_t *scratch_pool); + +/* Set *NEXT_OFFSET to the first offset behind the last entry in the + * phys-to-log proto index file PROTO_INDEX. This will be 0 for empty + * index files. Use SCRATCH_POOL for temporary allocations. + */ +svn_error_t * +svn_fs_x__p2l_proto_index_next_offset(apr_off_t *next_offset, + apr_file_t *proto_index, + apr_pool_t *scratch_pool); + +/* Use the proto index file stored at PROTO_FILE_NAME, construct the final + * phys-to-log index and append it to INDEX_FILE. Entries without a valid + * revision will be assigned to the REVISION given here. + * + * Return the MD5 checksum of the on-disk index data in *CHECKSUM, allocated + * in RESULT_POOL. Use SCRATCH_POOL for temporary allocations. + */ +svn_error_t * +svn_fs_x__p2l_index_append(svn_checksum_t **checksum, + svn_fs_t *fs, + apr_file_t *index_file, + const char *proto_file_name, + svn_revnum_t revision, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + +/* Use the phys-to-log mapping files in FS to build a list of entries + * that (at least partly) overlap with the range given by BLOCK_START + * offset and BLOCK_SIZE in the rep / pack file containing REVISION. + * Return the array in *ENTRIES with svn_fs_fs__p2l_entry_t as elements, + * allocated in RESULT_POOL. REV_FILE determines whether to access single + * rev or pack file data. If that is not available anymore (neither in + * cache nor on disk), return an error. Use SCRATCH_POOL for temporary + * allocations. + * + * Note that (only) the first and the last mapping may cross a cluster + * boundary. 
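+ *
+ * A minimal caller sketch, with all names taken from the declaration below:
+ *
+ *   apr_array_header_t *entries;
+ *   SVN_ERR(svn_fs_x__p2l_index_lookup(&entries, fs, rev_file, revision,
+ *                                      block_start, block_size,
+ *                                      result_pool, scratch_pool));
+ *
+ * ENTRIES then holds the svn_fs_x__p2l_entry_t elements overlapping
+ * [BLOCK_START, BLOCK_START + BLOCK_SIZE).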
+ */ +svn_error_t * +svn_fs_x__p2l_index_lookup(apr_array_header_t **entries, + svn_fs_t *fs, + svn_fs_x__revision_file_t *rev_file, + svn_revnum_t revision, + apr_off_t block_start, + apr_off_t block_size, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + +/* Use the phys-to-log mapping files in FS to return the entry for the + * container or single item starting at global OFFSET in the rep file + * containing REVISION in*ENTRY, allocated in RESULT_POOL. Sets *ENTRY + * to NULL if no item starts at exactly that offset. REV_FILE determines + * whether to access single rev or pack file data. If that is not available + * anymore (neither in cache nor on disk), return an error. + * Use SCRATCH_POOL for temporary allocations. + */ +svn_error_t * +svn_fs_x__p2l_entry_lookup(svn_fs_x__p2l_entry_t **entry, + svn_fs_t *fs, + svn_fs_x__revision_file_t *rev_file, + svn_revnum_t revision, + apr_off_t offset, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + +/* Use the phys-to-log mapping files in FS to return the svn_fs_x__id_t + * for the SUB_ITEM of the container starting at global OFFSET in the rep / + * pack file containing REVISION in *ITEM, allocated in RESULT_POOL. Sets + * *ITEM to NULL if no element starts at exactly that offset or if it + * contains no more than SUB_ITEM sub-items. + * + * Use SCRATCH_POOL for temporary allocations. + */ +svn_error_t * +svn_fs_x__p2l_item_lookup(svn_fs_x__id_t **item, + svn_fs_t *fs, + svn_fs_x__revision_file_t *rev_file, + svn_revnum_t revision, + apr_off_t offset, + apr_uint32_t sub_item, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + +/* For ITEM_ID in FS, return the position in the respective rev or pack file + * in *ABSOLUTE_POSITION and the *SUB_ITEM number within the object at that + * location. *SUB_ITEM will be 0 for non-container items. + * + * REV_FILE determines whether to access single rev or pack file data. + * If that is not available anymore (neither in cache nor on disk), re-open + * the rev / pack file and retry to open the index file. For transaction + * content, REV_FILE may be NULL. + * + * Use SCRATCH_POOL for temporary allocations. + */ +svn_error_t * +svn_fs_x__item_offset(apr_off_t *absolute_position, + apr_uint32_t *sub_item, + svn_fs_t *fs, + svn_fs_x__revision_file_t *rev_file, + const svn_fs_x__id_t *item_id, + apr_pool_t *scratch_pool); + +/* Use the log-to-phys indexes in FS to determine the maximum item indexes + * assigned to revision START_REV to START_REV + COUNT - 1. That is a + * close upper limit to the actual number of items in the respective revs. + * Return the results in *MAX_IDS, allocated in RESULT_POOL. + * Use SCRATCH_POOL for temporary allocations. + */ +svn_error_t * +svn_fs_x__l2p_get_max_ids(apr_array_header_t **max_ids, + svn_fs_t *fs, + svn_revnum_t start_rev, + apr_size_t count, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + +/* In *OFFSET, return the first OFFSET in the pack / rev file containing + * REVISION in FS not covered by the log-to-phys index. + * Use SCRATCH_POOL for temporary allocations. + */ +svn_error_t * +svn_fs_x__p2l_get_max_offset(apr_off_t *offset, + svn_fs_t *fs, + svn_fs_x__revision_file_t *rev_file, + svn_revnum_t revision, + apr_pool_t *scratch_pool); + +/* Index (re-)creation utilities. + */ + +/* For FS, create a new L2P auto-deleting proto index file in POOL and return + * its name in *PROTONAME. All entries to write are given in ENTRIES and + * entries are of type svn_fs_fs__p2l_entry_t* (sic!). The ENTRIES array + * will be reordered. 
Give the proto index file the lifetime of RESULT_POOL + * and use SCRATCH_POOL for temporary allocations. + */ +svn_error_t * +svn_fs_x__l2p_index_from_p2l_entries(const char **protoname, + svn_fs_t *fs, + apr_array_header_t *entries, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + +/* For FS, create a new P2L auto-deleting proto index file in POOL and return + * its name in *PROTONAME. All entries to write are given in ENTRIES and + * of type svn_fs_fs__p2l_entry_t*. The FVN1 checksums are not taken from + * ENTRIES but are begin calculated from the current contents of REV_FILE + * as we go. Give the proto index file the lifetime of RESULT_POOL and use + * SCRATCH_POOL for temporary allocations. + */ +svn_error_t * +svn_fs_x__p2l_index_from_p2l_entries(const char **protoname, + svn_fs_t *fs, + svn_fs_x__revision_file_t *rev_file, + apr_array_header_t *entries, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + +/* Serialization and caching interface + */ + +/* We use this key type to address individual pages from both index types. + */ +typedef struct svn_fs_x__page_cache_key_t +{ + /* in l2p: this is the revision of the items being mapped + in p2l: this is the start revision identifying the pack / rev file */ + apr_uint32_t revision; + + /* if TRUE, this is the index to a pack file + */ + svn_boolean_t is_packed; + + /* in l2p: page number within the revision + * in p2l: page number with the rev / pack file + */ + apr_uint64_t page; +} svn_fs_x__page_cache_key_t; + +/* + * Implements svn_cache__serialize_func_t for l2p_header_t objects. + */ +svn_error_t * +svn_fs_x__serialize_l2p_header(void **data, + apr_size_t *data_len, + void *in, + apr_pool_t *pool); + +/* + * Implements svn_cache__deserialize_func_t for l2p_header_t objects. + */ +svn_error_t * +svn_fs_x__deserialize_l2p_header(void **out, + void *data, + apr_size_t data_len, + apr_pool_t *pool); + +/* + * Implements svn_cache__serialize_func_t for l2p_page_t objects. + */ +svn_error_t * +svn_fs_x__serialize_l2p_page(void **data, + apr_size_t *data_len, + void *in, + apr_pool_t *pool); + +/* + * Implements svn_cache__deserialize_func_t for l2p_page_t objects. + */ +svn_error_t * +svn_fs_x__deserialize_l2p_page(void **out, + void *data, + apr_size_t data_len, + apr_pool_t *pool); + +/* + * Implements svn_cache__serialize_func_t for p2l_header_t objects. + */ +svn_error_t * +svn_fs_x__serialize_p2l_header(void **data, + apr_size_t *data_len, + void *in, + apr_pool_t *pool); + +/* + * Implements svn_cache__deserialize_func_t for p2l_header_t objects. + */ +svn_error_t * +svn_fs_x__deserialize_p2l_header(void **out, + void *data, + apr_size_t data_len, + apr_pool_t *pool); + +/* + * Implements svn_cache__serialize_func_t for apr_array_header_t objects + * with elements of type svn_fs_x__p2l_entry_t. + */ +svn_error_t * +svn_fs_x__serialize_p2l_page(void **data, + apr_size_t *data_len, + void *in, + apr_pool_t *pool); + +/* + * Implements svn_cache__deserialize_func_t for apr_array_header_t objects + * with elements of type svn_fs_x__p2l_entry_t. 
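+ *
+ * These (de-)serialization pairs back the header and page caches that
+ * svn_cache__get_partial() reads from in index.c (FFD->P2L_HEADER_CACHE and
+ * FFD->P2L_PAGE_CACHE), so whole pages can be cached as flat buffers and
+ * individual entries extracted without a full deserialization.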
+ */ +svn_error_t * +svn_fs_x__deserialize_p2l_page(void **out, + void *data, + apr_size_t data_len, + apr_pool_t *pool); + +#endif diff --git a/subversion/libsvn_fs_x/libsvn_fs_x.pc.in b/subversion/libsvn_fs_x/libsvn_fs_x.pc.in new file mode 100644 index 0000000..46d93dc --- /dev/null +++ b/subversion/libsvn_fs_x/libsvn_fs_x.pc.in @@ -0,0 +1,12 @@ +prefix=@prefix@ +exec_prefix=@exec_prefix@ +libdir=@libdir@ +includedir=@includedir@ + +Name: libsvn_fs_x +Description: Subversion FSX Repository Filesystem Library +Version: @PACKAGE_VERSION@ +Requires: apr-util-@SVN_APR_MAJOR_VERSION@ apr-@SVN_APR_MAJOR_VERSION@ +Requires.private: libsvn_delta libsvn_subr libsvn_fs_util +Libs: -L${libdir} -lsvn_fs_x +Cflags: -I${includedir} diff --git a/subversion/libsvn_fs_x/lock.c b/subversion/libsvn_fs_x/lock.c new file mode 100644 index 0000000..6819f63 --- /dev/null +++ b/subversion/libsvn_fs_x/lock.c @@ -0,0 +1,1492 @@ +/* lock.c : functions for manipulating filesystem locks. + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#include "svn_pools.h" +#include "svn_error.h" +#include "svn_dirent_uri.h" +#include "svn_path.h" +#include "svn_fs.h" +#include "svn_hash.h" +#include "svn_time.h" +#include "svn_utf.h" + +#include <apr_uuid.h> +#include <apr_file_io.h> +#include <apr_file_info.h> + +#include "lock.h" +#include "tree.h" +#include "fs_x.h" +#include "transaction.h" +#include "util.h" +#include "../libsvn_fs/fs-loader.h" + +#include "private/svn_fs_util.h" +#include "private/svn_fspath.h" +#include "private/svn_sorts_private.h" +#include "svn_private_config.h" + +/* Names of hash keys used to store a lock for writing to disk. */ +#define PATH_KEY "path" +#define TOKEN_KEY "token" +#define OWNER_KEY "owner" +#define CREATION_DATE_KEY "creation_date" +#define EXPIRATION_DATE_KEY "expiration_date" +#define COMMENT_KEY "comment" +#define IS_DAV_COMMENT_KEY "is_dav_comment" +#define CHILDREN_KEY "children" + +/* Number of characters from the head of a digest file name used to + calculate a subdirectory in which to drop that file. */ +#define DIGEST_SUBDIR_LEN 3 + + + +/*** Generic helper functions. ***/ + +/* Set *DIGEST to the MD5 hash of STR. 
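+   (In its ASCII hex form, as produced by svn_checksum_to_cstring_display()
+   below.)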
*/ +static svn_error_t * +make_digest(const char **digest, + const char *str, + apr_pool_t *pool) +{ + svn_checksum_t *checksum; + + SVN_ERR(svn_checksum(&checksum, svn_checksum_md5, str, strlen(str), pool)); + + *digest = svn_checksum_to_cstring_display(checksum, pool); + return SVN_NO_ERROR; +} + + +/* Set the value of KEY (whose size is KEY_LEN, or APR_HASH_KEY_STRING + if unknown) to an svn_string_t-ized version of VALUE (whose size is + VALUE_LEN, or APR_HASH_KEY_STRING if unknown) in HASH. The value + will be allocated in POOL; KEY will not be duped. If either KEY or VALUE + is NULL, this function will do nothing. */ +static void +hash_store(apr_hash_t *hash, + const char *key, + apr_ssize_t key_len, + const char *value, + apr_ssize_t value_len, + apr_pool_t *pool) +{ + if (! (key && value)) + return; + if (value_len == APR_HASH_KEY_STRING) + value_len = strlen(value); + apr_hash_set(hash, key, key_len, + svn_string_ncreate(value, value_len, pool)); +} + + +/* Fetch the value of KEY from HASH, returning only the cstring data + of that value (if it exists). */ +static const char * +hash_fetch(apr_hash_t *hash, + const char *key) +{ + svn_string_t *str = svn_hash_gets(hash, key); + return str ? str->data : NULL; +} + + +/* SVN_ERR_FS_CORRUPT: the lockfile for PATH in FS is corrupt. */ +static svn_error_t * +err_corrupt_lockfile(const char *fs_path, const char *path) +{ + return + svn_error_createf( + SVN_ERR_FS_CORRUPT, 0, + _("Corrupt lockfile for path '%s' in filesystem '%s'"), + path, fs_path); +} + + +/*** Digest file handling functions. ***/ + +/* Return the path of the lock/entries file for which DIGEST is the + hashed repository relative path. */ +static const char * +digest_path_from_digest(const char *fs_path, + const char *digest, + apr_pool_t *pool) +{ + return svn_dirent_join_many(pool, fs_path, PATH_LOCKS_DIR, + apr_pstrmemdup(pool, digest, DIGEST_SUBDIR_LEN), + digest, SVN_VA_NULL); +} + + +/* Set *DIGEST_PATH to the path to the lock/entries digest file associate + with PATH, where PATH is the path to the lock file or lock entries file + in FS. */ +static svn_error_t * +digest_path_from_path(const char **digest_path, + const char *fs_path, + const char *path, + apr_pool_t *pool) +{ + const char *digest; + SVN_ERR(make_digest(&digest, path, pool)); + *digest_path = svn_dirent_join_many(pool, fs_path, PATH_LOCKS_DIR, + apr_pstrmemdup(pool, digest, + DIGEST_SUBDIR_LEN), + digest, SVN_VA_NULL); + return SVN_NO_ERROR; +} + + +/* Write to DIGEST_PATH a representation of CHILDREN (which may be + empty, if the versioned path in FS represented by DIGEST_PATH has + no children) and LOCK (which may be NULL if that versioned path is + lock itself locked). Set the permissions of DIGEST_PATH to those of + PERMS_REFERENCE. Use POOL for temporary allocations. 
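+
+   For orientation, the file body is a plain svn hash dump as produced by
+   svn_hash_write2() below; an abridged, made-up example:
+
+     K 4
+     path
+     V 10
+     /trunk/foo
+     [... further K/V pairs for token, owner, dates, children ...]
+     END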
+ */ +static svn_error_t * +write_digest_file(apr_hash_t *children, + svn_lock_t *lock, + const char *fs_path, + const char *digest_path, + const char *perms_reference, + apr_pool_t *scratch_pool) +{ + svn_error_t *err = SVN_NO_ERROR; + svn_stream_t *stream; + apr_hash_index_t *hi; + apr_hash_t *hash = apr_hash_make(scratch_pool); + const char *tmp_path; + + SVN_ERR(svn_fs_x__ensure_dir_exists(svn_dirent_join(fs_path, PATH_LOCKS_DIR, + scratch_pool), + fs_path, scratch_pool)); + SVN_ERR(svn_fs_x__ensure_dir_exists(svn_dirent_dirname(digest_path, + scratch_pool), + fs_path, scratch_pool)); + + if (lock) + { + const char *creation_date = NULL, *expiration_date = NULL; + if (lock->creation_date) + creation_date = svn_time_to_cstring(lock->creation_date, + scratch_pool); + if (lock->expiration_date) + expiration_date = svn_time_to_cstring(lock->expiration_date, + scratch_pool); + + hash_store(hash, PATH_KEY, sizeof(PATH_KEY)-1, + lock->path, APR_HASH_KEY_STRING, scratch_pool); + hash_store(hash, TOKEN_KEY, sizeof(TOKEN_KEY)-1, + lock->token, APR_HASH_KEY_STRING, scratch_pool); + hash_store(hash, OWNER_KEY, sizeof(OWNER_KEY)-1, + lock->owner, APR_HASH_KEY_STRING, scratch_pool); + hash_store(hash, COMMENT_KEY, sizeof(COMMENT_KEY)-1, + lock->comment, APR_HASH_KEY_STRING, scratch_pool); + hash_store(hash, IS_DAV_COMMENT_KEY, sizeof(IS_DAV_COMMENT_KEY)-1, + lock->is_dav_comment ? "1" : "0", 1, scratch_pool); + hash_store(hash, CREATION_DATE_KEY, sizeof(CREATION_DATE_KEY)-1, + creation_date, APR_HASH_KEY_STRING, scratch_pool); + hash_store(hash, EXPIRATION_DATE_KEY, sizeof(EXPIRATION_DATE_KEY)-1, + expiration_date, APR_HASH_KEY_STRING, scratch_pool); + } + if (apr_hash_count(children)) + { + svn_stringbuf_t *children_list + = svn_stringbuf_create_empty(scratch_pool); + for (hi = apr_hash_first(scratch_pool, children); + hi; + hi = apr_hash_next(hi)) + { + svn_stringbuf_appendbytes(children_list, + apr_hash_this_key(hi), + apr_hash_this_key_len(hi)); + svn_stringbuf_appendbyte(children_list, '\n'); + } + hash_store(hash, CHILDREN_KEY, sizeof(CHILDREN_KEY)-1, + children_list->data, children_list->len, scratch_pool); + } + + SVN_ERR(svn_stream_open_unique(&stream, &tmp_path, + svn_dirent_dirname(digest_path, + scratch_pool), + svn_io_file_del_none, scratch_pool, + scratch_pool)); + if ((err = svn_hash_write2(hash, stream, SVN_HASH_TERMINATOR, + scratch_pool))) + { + svn_error_clear(svn_stream_close(stream)); + return svn_error_createf(err->apr_err, + err, + _("Cannot write lock/entries hashfile '%s'"), + svn_dirent_local_style(tmp_path, + scratch_pool)); + } + + SVN_ERR(svn_stream_close(stream)); + SVN_ERR(svn_io_file_rename(tmp_path, digest_path, scratch_pool)); + SVN_ERR(svn_io_copy_perms(perms_reference, digest_path, scratch_pool)); + return SVN_NO_ERROR; +} + + +/* Parse the file at DIGEST_PATH, populating the lock LOCK_P in that + file (if it exists, and if *LOCK_P is non-NULL) and the hash of + CHILDREN_P (if any exist, and if *CHILDREN_P is non-NULL). Use POOL + for all allocations. 
*/ +static svn_error_t * +read_digest_file(apr_hash_t **children_p, + svn_lock_t **lock_p, + const char *fs_path, + const char *digest_path, + apr_pool_t *pool) +{ + svn_error_t *err = SVN_NO_ERROR; + svn_lock_t *lock; + apr_hash_t *hash; + svn_stream_t *stream; + const char *val; + svn_node_kind_t kind; + + if (lock_p) + *lock_p = NULL; + if (children_p) + *children_p = apr_hash_make(pool); + + SVN_ERR(svn_io_check_path(digest_path, &kind, pool)); + if (kind == svn_node_none) + return SVN_NO_ERROR; + + /* If our caller doesn't care about anything but the presence of the + file... whatever. */ + if (kind == svn_node_file && !lock_p && !children_p) + return SVN_NO_ERROR; + + SVN_ERR(svn_stream_open_readonly(&stream, digest_path, pool, pool)); + + hash = apr_hash_make(pool); + if ((err = svn_hash_read2(hash, stream, SVN_HASH_TERMINATOR, pool))) + { + svn_error_clear(svn_stream_close(stream)); + return svn_error_createf(err->apr_err, + err, + _("Can't parse lock/entries hashfile '%s'"), + svn_dirent_local_style(digest_path, pool)); + } + SVN_ERR(svn_stream_close(stream)); + + /* If our caller cares, see if we have a lock path in our hash. If + so, we'll assume we have a lock here. */ + val = hash_fetch(hash, PATH_KEY); + if (val && lock_p) + { + const char *path = val; + + /* Create our lock and load it up. */ + lock = svn_lock_create(pool); + lock->path = path; + + if (! ((lock->token = hash_fetch(hash, TOKEN_KEY)))) + return svn_error_trace(err_corrupt_lockfile(fs_path, path)); + + if (! ((lock->owner = hash_fetch(hash, OWNER_KEY)))) + return svn_error_trace(err_corrupt_lockfile(fs_path, path)); + + if (! ((val = hash_fetch(hash, IS_DAV_COMMENT_KEY)))) + return svn_error_trace(err_corrupt_lockfile(fs_path, path)); + lock->is_dav_comment = (val[0] == '1'); + + if (! ((val = hash_fetch(hash, CREATION_DATE_KEY)))) + return svn_error_trace(err_corrupt_lockfile(fs_path, path)); + SVN_ERR(svn_time_from_cstring(&(lock->creation_date), val, pool)); + + if ((val = hash_fetch(hash, EXPIRATION_DATE_KEY))) + SVN_ERR(svn_time_from_cstring(&(lock->expiration_date), val, pool)); + + lock->comment = hash_fetch(hash, COMMENT_KEY); + + *lock_p = lock; + } + + /* If our caller cares, see if we have any children for this path. */ + val = hash_fetch(hash, CHILDREN_KEY); + if (val && children_p) + { + apr_array_header_t *kiddos = svn_cstring_split(val, "\n", FALSE, pool); + int i; + + for (i = 0; i < kiddos->nelts; i++) + { + svn_hash_sets(*children_p, APR_ARRAY_IDX(kiddos, i, const char *), + (void *)1); + } + } + return SVN_NO_ERROR; +} + + + +/*** Lock helper functions (path here are still FS paths, not on-disk + schema-supporting paths) ***/ + + +/* Write LOCK in FS to the actual OS filesystem. + + Use PERMS_REFERENCE for the permissions of any digest files. + */ +static svn_error_t * +set_lock(const char *fs_path, + svn_lock_t *lock, + const char *perms_reference, + apr_pool_t *scratch_pool) +{ + const char *digest_path; + apr_hash_t *children; + + SVN_ERR(digest_path_from_path(&digest_path, fs_path, lock->path, + scratch_pool)); + + /* We could get away without reading the file as children should + always come back empty. 
*/ + SVN_ERR(read_digest_file(&children, NULL, fs_path, digest_path, + scratch_pool)); + + SVN_ERR(write_digest_file(children, lock, fs_path, digest_path, + perms_reference, scratch_pool)); + + return SVN_NO_ERROR; +} + +static svn_error_t * +delete_lock(const char *fs_path, + const char *path, + apr_pool_t *scratch_pool) +{ + const char *digest_path; + + SVN_ERR(digest_path_from_path(&digest_path, fs_path, path, scratch_pool)); + + SVN_ERR(svn_io_remove_file2(digest_path, TRUE, scratch_pool)); + + return SVN_NO_ERROR; +} + +static svn_error_t * +add_to_digest(const char *fs_path, + apr_array_header_t *paths, + const char *index_path, + const char *perms_reference, + apr_pool_t *scratch_pool) +{ + const char *index_digest_path; + apr_hash_t *children; + svn_lock_t *lock; + int i; + unsigned int original_count; + + SVN_ERR(digest_path_from_path(&index_digest_path, fs_path, index_path, + scratch_pool)); + SVN_ERR(read_digest_file(&children, &lock, fs_path, index_digest_path, + scratch_pool)); + + original_count = apr_hash_count(children); + + for (i = 0; i < paths->nelts; ++i) + { + const char *path = APR_ARRAY_IDX(paths, i, const char *); + const char *digest_path, *digest_file; + + SVN_ERR(digest_path_from_path(&digest_path, fs_path, path, + scratch_pool)); + digest_file = svn_dirent_basename(digest_path, NULL); + svn_hash_sets(children, digest_file, (void *)1); + } + + if (apr_hash_count(children) != original_count) + SVN_ERR(write_digest_file(children, lock, fs_path, index_digest_path, + perms_reference, scratch_pool)); + + return SVN_NO_ERROR; +} + +static svn_error_t * +delete_from_digest(const char *fs_path, + apr_array_header_t *paths, + const char *index_path, + const char *perms_reference, + apr_pool_t *scratch_pool) +{ + const char *index_digest_path; + apr_hash_t *children; + svn_lock_t *lock; + int i; + + SVN_ERR(digest_path_from_path(&index_digest_path, fs_path, index_path, + scratch_pool)); + SVN_ERR(read_digest_file(&children, &lock, fs_path, index_digest_path, + scratch_pool)); + + for (i = 0; i < paths->nelts; ++i) + { + const char *path = APR_ARRAY_IDX(paths, i, const char *); + const char *digest_path, *digest_file; + + SVN_ERR(digest_path_from_path(&digest_path, fs_path, path, + scratch_pool)); + digest_file = svn_dirent_basename(digest_path, NULL); + svn_hash_sets(children, digest_file, NULL); + } + + if (apr_hash_count(children) || lock) + SVN_ERR(write_digest_file(children, lock, fs_path, index_digest_path, + perms_reference, scratch_pool)); + else + SVN_ERR(svn_io_remove_file2(index_digest_path, TRUE, scratch_pool)); + + return SVN_NO_ERROR; +} + +static svn_error_t * +unlock_single(svn_fs_t *fs, + svn_lock_t *lock, + apr_pool_t *pool); + +/* Set *LOCK_P to the lock for PATH in FS. HAVE_WRITE_LOCK should be + TRUE if the caller (or one of its callers) has taken out the + repository-wide write lock, FALSE otherwise. If MUST_EXIST is + not set, the function will simply return NULL in *LOCK_P instead + of creating an SVN_FS__ERR_NO_SUCH_LOCK error in case the lock + was not found (much faster). Use POOL for allocations. 
*/ +static svn_error_t * +get_lock(svn_lock_t **lock_p, + svn_fs_t *fs, + const char *path, + svn_boolean_t have_write_lock, + svn_boolean_t must_exist, + apr_pool_t *pool) +{ + svn_lock_t *lock = NULL; + const char *digest_path; + svn_node_kind_t kind; + + SVN_ERR(digest_path_from_path(&digest_path, fs->path, path, pool)); + SVN_ERR(svn_io_check_path(digest_path, &kind, pool)); + + *lock_p = NULL; + if (kind != svn_node_none) + SVN_ERR(read_digest_file(NULL, &lock, fs->path, digest_path, pool)); + + if (! lock) + return must_exist ? SVN_FS__ERR_NO_SUCH_LOCK(fs, path) : SVN_NO_ERROR; + + /* Don't return an expired lock. */ + if (lock->expiration_date && (apr_time_now() > lock->expiration_date)) + { + /* Only remove the lock if we have the write lock. + Read operations shouldn't change the filesystem. */ + if (have_write_lock) + SVN_ERR(unlock_single(fs, lock, pool)); + return SVN_FS__ERR_LOCK_EXPIRED(fs, lock->token); + } + + *lock_p = lock; + return SVN_NO_ERROR; +} + + +/* Set *LOCK_P to the lock for PATH in FS. HAVE_WRITE_LOCK should be + TRUE if the caller (or one of its callers) has taken out the + repository-wide write lock, FALSE otherwise. Use POOL for + allocations. */ +static svn_error_t * +get_lock_helper(svn_fs_t *fs, + svn_lock_t **lock_p, + const char *path, + svn_boolean_t have_write_lock, + apr_pool_t *pool) +{ + svn_lock_t *lock; + svn_error_t *err; + + err = get_lock(&lock, fs, path, have_write_lock, FALSE, pool); + + /* We've deliberately decided that this function doesn't tell the + caller *why* the lock is unavailable. */ + if (err && ((err->apr_err == SVN_ERR_FS_NO_SUCH_LOCK) + || (err->apr_err == SVN_ERR_FS_LOCK_EXPIRED))) + { + svn_error_clear(err); + *lock_p = NULL; + return SVN_NO_ERROR; + } + else + SVN_ERR(err); + + *lock_p = lock; + return SVN_NO_ERROR; +} + + +/* Baton for locks_walker(). */ +typedef struct walk_locks_baton_t +{ + svn_fs_get_locks_callback_t get_locks_func; + void *get_locks_baton; + svn_fs_t *fs; +} walk_locks_baton_t; + +/* Implements walk_digests_callback_t. */ +static svn_error_t * +locks_walker(void *baton, + const char *fs_path, + const char *digest_path, + svn_lock_t *lock, + svn_boolean_t have_write_lock, + apr_pool_t *pool) +{ + walk_locks_baton_t *wlb = baton; + + if (lock) + { + /* Don't report an expired lock. */ + if (lock->expiration_date == 0 + || (apr_time_now() <= lock->expiration_date)) + { + if (wlb->get_locks_func) + SVN_ERR(wlb->get_locks_func(wlb->get_locks_baton, lock, pool)); + } + else + { + /* Only remove the lock if we have the write lock. + Read operations shouldn't change the filesystem. */ + if (have_write_lock) + SVN_ERR(unlock_single(wlb->fs, lock, pool)); + } + } + + return SVN_NO_ERROR; +} + +/* Callback type for walk_digest_files(). + * + * LOCK come from a read_digest_file(digest_path) call. + */ +typedef svn_error_t *(*walk_digests_callback_t)(void *baton, + const char *fs_path, + const char *digest_path, + svn_lock_t *lock, + svn_boolean_t have_write_lock, + apr_pool_t *pool); + +/* A function that calls WALK_DIGESTS_FUNC/WALK_DIGESTS_BATON for + all lock digest files in and under PATH in FS. + HAVE_WRITE_LOCK should be true if the caller (directly or indirectly) + has the FS write lock. 
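+
+   Note that a single level of iteration suffices here: lock_body() below
+   adds the digest file name of every locked path to each of its ancestor
+   indexes, so the children read from this digest file already cover all
+   locks in the subtree.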
*/ +static svn_error_t * +walk_digest_files(const char *fs_path, + const char *digest_path, + walk_digests_callback_t walk_digests_func, + void *walk_digests_baton, + svn_boolean_t have_write_lock, + apr_pool_t *pool) +{ + apr_hash_index_t *hi; + apr_hash_t *children; + apr_pool_t *subpool; + svn_lock_t *lock; + + /* First, send up any locks in the current digest file. */ + SVN_ERR(read_digest_file(&children, &lock, fs_path, digest_path, pool)); + + SVN_ERR(walk_digests_func(walk_digests_baton, fs_path, digest_path, lock, + have_write_lock, pool)); + + /* Now, report all the child entries (if any; bail otherwise). */ + if (! apr_hash_count(children)) + return SVN_NO_ERROR; + subpool = svn_pool_create(pool); + for (hi = apr_hash_first(pool, children); hi; hi = apr_hash_next(hi)) + { + const char *digest = apr_hash_this_key(hi); + svn_pool_clear(subpool); + + SVN_ERR(read_digest_file + (NULL, &lock, fs_path, + digest_path_from_digest(fs_path, digest, subpool), subpool)); + + SVN_ERR(walk_digests_func(walk_digests_baton, fs_path, digest_path, lock, + have_write_lock, subpool)); + } + svn_pool_destroy(subpool); + return SVN_NO_ERROR; +} + +/* A function that calls GET_LOCKS_FUNC/GET_LOCKS_BATON for + all locks in and under PATH in FS. + HAVE_WRITE_LOCK should be true if the caller (directly or indirectly) + has the FS write lock. */ +static svn_error_t * +walk_locks(svn_fs_t *fs, + const char *digest_path, + svn_fs_get_locks_callback_t get_locks_func, + void *get_locks_baton, + svn_boolean_t have_write_lock, + apr_pool_t *pool) +{ + walk_locks_baton_t wlb; + + wlb.get_locks_func = get_locks_func; + wlb.get_locks_baton = get_locks_baton; + wlb.fs = fs; + SVN_ERR(walk_digest_files(fs->path, digest_path, locks_walker, &wlb, + have_write_lock, pool)); + return SVN_NO_ERROR; +} + + +/* Utility function: verify that a lock can be used. Interesting + errors returned from this function: + + SVN_ERR_FS_NO_USER: No username attached to FS. + SVN_ERR_FS_LOCK_OWNER_MISMATCH: FS's username doesn't match LOCK's owner. + SVN_ERR_FS_BAD_LOCK_TOKEN: FS doesn't hold matching lock-token for LOCK. + */ +static svn_error_t * +verify_lock(svn_fs_t *fs, + svn_lock_t *lock) +{ + if ((! fs->access_ctx) || (! fs->access_ctx->username)) + return svn_error_createf + (SVN_ERR_FS_NO_USER, NULL, + _("Cannot verify lock on path '%s'; no username available"), + lock->path); + + else if (strcmp(fs->access_ctx->username, lock->owner) != 0) + return svn_error_createf + (SVN_ERR_FS_LOCK_OWNER_MISMATCH, NULL, + _("User '%s' does not own lock on path '%s' (currently locked by '%s')"), + fs->access_ctx->username, lock->path, lock->owner); + + else if (svn_hash_gets(fs->access_ctx->lock_tokens, lock->token) == NULL) + return svn_error_createf + (SVN_ERR_FS_BAD_LOCK_TOKEN, NULL, + _("Cannot verify lock on path '%s'; no matching lock-token available"), + lock->path); + + return SVN_NO_ERROR; +} + + +/* This implements the svn_fs_get_locks_callback_t interface, where + BATON is just an svn_fs_t object. */ +static svn_error_t * +get_locks_callback(void *baton, + svn_lock_t *lock, + apr_pool_t *pool) +{ + return verify_lock(baton, lock); +} + + +/* The main routine for lock enforcement, used throughout libsvn_fs_x. */ +svn_error_t * +svn_fs_x__allow_locked_operation(const char *path, + svn_fs_t *fs, + svn_boolean_t recurse, + svn_boolean_t have_write_lock, + apr_pool_t *scratch_pool) +{ + path = svn_fs__canonicalize_abspath(path, scratch_pool); + if (recurse) + { + /* Discover all locks at or below the path. 
*/ + const char *digest_path; + SVN_ERR(digest_path_from_path(&digest_path, fs->path, path, + scratch_pool)); + SVN_ERR(walk_locks(fs, digest_path, get_locks_callback, + fs, have_write_lock, scratch_pool)); + } + else + { + /* Discover and verify any lock attached to the path. */ + svn_lock_t *lock; + SVN_ERR(get_lock_helper(fs, &lock, path, have_write_lock, + scratch_pool)); + if (lock) + SVN_ERR(verify_lock(fs, lock)); + } + return SVN_NO_ERROR; +} + +/* The effective arguments for lock_body() below. */ +typedef struct lock_baton_t { + svn_fs_t *fs; + apr_array_header_t *targets; + apr_array_header_t *infos; + const char *comment; + svn_boolean_t is_dav_comment; + apr_time_t expiration_date; + svn_boolean_t steal_lock; + apr_pool_t *result_pool; +} lock_baton_t; + +static svn_error_t * +check_lock(svn_error_t **fs_err, + const char *path, + const svn_fs_lock_target_t *target, + lock_baton_t *lb, + svn_fs_root_t *root, + svn_revnum_t youngest_rev, + apr_pool_t *pool) +{ + svn_node_kind_t kind; + svn_lock_t *existing_lock; + + *fs_err = SVN_NO_ERROR; + + SVN_ERR(svn_fs_x__check_path(&kind, root, path, pool)); + if (kind == svn_node_dir) + { + *fs_err = SVN_FS__ERR_NOT_FILE(lb->fs, path); + return SVN_NO_ERROR; + } + + /* While our locking implementation easily supports the locking of + nonexistent paths, we deliberately choose not to allow such madness. */ + if (kind == svn_node_none) + { + if (SVN_IS_VALID_REVNUM(target->current_rev)) + *fs_err = svn_error_createf( + SVN_ERR_FS_OUT_OF_DATE, NULL, + _("Path '%s' doesn't exist in HEAD revision"), + path); + else + *fs_err = svn_error_createf( + SVN_ERR_FS_NOT_FOUND, NULL, + _("Path '%s' doesn't exist in HEAD revision"), + path); + + return SVN_NO_ERROR; + } + + /* Is the caller attempting to lock an out-of-date working file? */ + if (SVN_IS_VALID_REVNUM(target->current_rev)) + { + svn_revnum_t created_rev; + + if (target->current_rev > youngest_rev) + { + *fs_err = svn_error_createf(SVN_ERR_FS_NO_SUCH_REVISION, NULL, + _("No such revision %ld"), + target->current_rev); + return SVN_NO_ERROR; + } + + SVN_ERR(svn_fs_x__node_created_rev(&created_rev, root, path, + pool)); + + /* SVN_INVALID_REVNUM means the path doesn't exist. So + apparently somebody is trying to lock something in their + working copy, but somebody else has deleted the thing + from HEAD. That counts as being 'out of date'. */ + if (! SVN_IS_VALID_REVNUM(created_rev)) + { + *fs_err = svn_error_createf + (SVN_ERR_FS_OUT_OF_DATE, NULL, + _("Path '%s' doesn't exist in HEAD revision"), path); + + return SVN_NO_ERROR; + } + + if (target->current_rev < created_rev) + { + *fs_err = svn_error_createf + (SVN_ERR_FS_OUT_OF_DATE, NULL, + _("Lock failed: newer version of '%s' exists"), path); + + return SVN_NO_ERROR; + } + } + + /* If the caller provided a TOKEN, we *really* need to see + if a lock already exists with that token, and if so, verify that + the lock's path matches PATH. Otherwise we run the risk of + breaking the 1-to-1 mapping of lock tokens to locked paths. */ + /* ### TODO: actually do this check. This is tough, because the + schema doesn't supply a lookup-by-token mechanism. */ + + /* Is the path already locked? + + Note that this next function call will automatically ignore any + errors about {the path not existing as a key, the path's token + not existing as a key, the lock just having been expired}. And + that's totally fine. 
Any of these three errors are perfectly + acceptable to ignore; it means that the path is now free and + clear for locking, because the fsx funcs just cleared out both + of the tables for us. */ + SVN_ERR(get_lock_helper(lb->fs, &existing_lock, path, TRUE, pool)); + if (existing_lock) + { + if (! lb->steal_lock) + { + /* Sorry, the path is already locked. */ + *fs_err = SVN_FS__ERR_PATH_ALREADY_LOCKED(lb->fs, existing_lock); + return SVN_NO_ERROR; + } + } + + return SVN_NO_ERROR; +} + +typedef struct lock_info_t { + const char *path; + const char *component; + svn_lock_t *lock; + svn_error_t *fs_err; +} lock_info_t; + +/* The body of svn_fs_x__lock(), which see. + + BATON is a 'lock_baton_t *' holding the effective arguments. + BATON->targets is an array of 'svn_sort__item_t' targets, sorted by + path, mapping canonical path to 'svn_fs_lock_target_t'. Set + BATON->infos to an array of 'lock_info_t' holding the results. For + the other arguments, see svn_fs_lock_many(). + + This implements the svn_fs_x__with_write_lock() 'body' callback + type, and assumes that the write lock is held. + */ +static svn_error_t * +lock_body(void *baton, apr_pool_t *pool) +{ + lock_baton_t *lb = baton; + svn_fs_root_t *root; + svn_revnum_t youngest; + const char *rev_0_path; + int i, outstanding = 0; + apr_pool_t *iterpool = svn_pool_create(pool); + + lb->infos = apr_array_make(lb->result_pool, lb->targets->nelts, + sizeof(lock_info_t)); + + /* Until we implement directory locks someday, we only allow locks + on files or non-existent paths. */ + /* Use fs->vtable->foo instead of svn_fs_foo to avoid circular + library dependencies, which are not portable. */ + SVN_ERR(lb->fs->vtable->youngest_rev(&youngest, lb->fs, pool)); + SVN_ERR(lb->fs->vtable->revision_root(&root, lb->fs, youngest, pool)); + + for (i = 0; i < lb->targets->nelts; ++i) + { + const svn_sort__item_t *item = &APR_ARRAY_IDX(lb->targets, i, + svn_sort__item_t); + const svn_fs_lock_target_t *target = item->value; + lock_info_t info; + + svn_pool_clear(iterpool); + + info.path = item->key; + SVN_ERR(check_lock(&info.fs_err, info.path, target, lb, root, + youngest, iterpool)); + info.lock = NULL; + info.component = NULL; + APR_ARRAY_PUSH(lb->infos, lock_info_t) = info; + if (!info.fs_err) + ++outstanding; + } + + rev_0_path = svn_fs_x__path_rev_absolute(lb->fs, 0, pool); + + /* Given the paths: + + /foo/bar/f + /foo/bar/g + /zig/x + + we loop through repeatedly. The first pass sees '/' on all paths + and writes the '/' index. The second pass sees '/foo' twice and + writes that index followed by '/zig' and that index. The third + pass sees '/foo/bar' twice and writes that index, and then writes + the lock for '/zig/x'. The fourth pass writes the locks for + '/foo/bar/f' and '/foo/bar/g'. + + Writing indices before locks is correct: if interrupted it leaves + indices without locks rather than locks without indices. An + index without a lock is consistent in that it always shows up as + unlocked in svn_fs_x__allow_locked_operation. A lock without an + index is inconsistent, svn_fs_x__allow_locked_operation will + show locked on the file but unlocked on the parent. 
*/ + + + while (outstanding) + { + const char *last_path = NULL; + apr_array_header_t *paths; + + svn_pool_clear(iterpool); + paths = apr_array_make(iterpool, 1, sizeof(const char *)); + + for (i = 0; i < lb->infos->nelts; ++i) + { + lock_info_t *info = &APR_ARRAY_IDX(lb->infos, i, lock_info_t); + const svn_sort__item_t *item = &APR_ARRAY_IDX(lb->targets, i, + svn_sort__item_t); + const svn_fs_lock_target_t *target = item->value; + + if (!info->fs_err && !info->lock) + { + if (!info->component) + { + info->component = info->path; + APR_ARRAY_PUSH(paths, const char *) = info->path; + last_path = "/"; + } + else + { + info->component = strchr(info->component + 1, '/'); + if (!info->component) + { + /* The component is a path to lock, this cannot + match a previous path that need to be indexed. */ + if (paths->nelts) + { + SVN_ERR(add_to_digest(lb->fs->path, paths, last_path, + rev_0_path, iterpool)); + apr_array_clear(paths); + last_path = NULL; + } + + info->lock = svn_lock_create(lb->result_pool); + if (target->token) + info->lock->token = target->token; + else + SVN_ERR(svn_fs_x__generate_lock_token( + &(info->lock->token), lb->fs, + lb->result_pool)); + info->lock->path = info->path; + info->lock->owner = lb->fs->access_ctx->username; + info->lock->comment = lb->comment; + info->lock->is_dav_comment = lb->is_dav_comment; + info->lock->creation_date = apr_time_now(); + info->lock->expiration_date = lb->expiration_date; + + info->fs_err = set_lock(lb->fs->path, info->lock, + rev_0_path, iterpool); + --outstanding; + } + else + { + /* The component is a path to an index. */ + apr_size_t len = info->component - info->path; + + if (last_path + && (strncmp(last_path, info->path, len) + || strlen(last_path) != len)) + { + /* No match to the previous paths to index. */ + SVN_ERR(add_to_digest(lb->fs->path, paths, last_path, + rev_0_path, iterpool)); + apr_array_clear(paths); + last_path = NULL; + } + APR_ARRAY_PUSH(paths, const char *) = info->path; + if (!last_path) + last_path = apr_pstrndup(iterpool, info->path, len); + } + } + } + + if (last_path && i == lb->infos->nelts - 1) + SVN_ERR(add_to_digest(lb->fs->path, paths, last_path, + rev_0_path, iterpool)); + } + } + + return SVN_NO_ERROR; +} + +/* The effective arguments for unlock_body() below. */ +typedef struct unlock_baton_t { + svn_fs_t *fs; + apr_array_header_t *targets; + apr_array_header_t *infos; + /* Set skip_check TRUE to prevent the checks that set infos[].fs_err. */ + svn_boolean_t skip_check; + svn_boolean_t break_lock; + apr_pool_t *result_pool; +} unlock_baton_t; + +static svn_error_t * +check_unlock(svn_error_t **fs_err, + const char *path, + const char *token, + unlock_baton_t *ub, + svn_fs_root_t *root, + apr_pool_t *pool) +{ + svn_lock_t *lock; + + *fs_err = get_lock(&lock, ub->fs, path, TRUE, TRUE, pool); + if (!*fs_err && !ub->break_lock) + { + if (strcmp(token, lock->token) != 0) + *fs_err = SVN_FS__ERR_NO_SUCH_LOCK(ub->fs, path); + else if (strcmp(ub->fs->access_ctx->username, lock->owner) != 0) + *fs_err = SVN_FS__ERR_LOCK_OWNER_MISMATCH(ub->fs, + ub->fs->access_ctx->username, + lock->owner); + } + + return SVN_NO_ERROR; +} + +typedef struct unlock_info_t { + const char *path; + const char *component; + svn_error_t *fs_err; + svn_boolean_t done; + int components; +} unlock_info_t; + +/* The body of svn_fs_x__unlock(), which see. + + BATON is a 'unlock_baton_t *' holding the effective arguments. 
+ BATON->targets is an array of 'svn_sort__item_t' targets, sorted by + path, mapping canonical path to (const char *) token. Set + BATON->infos to an array of 'unlock_info_t' results. For the other + arguments, see svn_fs_unlock_many(). + + This implements the svn_fs_x__with_write_lock() 'body' callback + type, and assumes that the write lock is held. + */ +static svn_error_t * +unlock_body(void *baton, apr_pool_t *pool) +{ + unlock_baton_t *ub = baton; + svn_fs_root_t *root; + svn_revnum_t youngest; + const char *rev_0_path; + int i, max_components = 0, outstanding = 0; + apr_pool_t *iterpool = svn_pool_create(pool); + + ub->infos = apr_array_make(ub->result_pool, ub->targets->nelts, + sizeof( unlock_info_t)); + + SVN_ERR(ub->fs->vtable->youngest_rev(&youngest, ub->fs, pool)); + SVN_ERR(ub->fs->vtable->revision_root(&root, ub->fs, youngest, pool)); + + for (i = 0; i < ub->targets->nelts; ++i) + { + const svn_sort__item_t *item = &APR_ARRAY_IDX(ub->targets, i, + svn_sort__item_t); + const char *token = item->value; + unlock_info_t info = { 0 }; + + svn_pool_clear(iterpool); + + info.path = item->key; + if (!ub->skip_check) + SVN_ERR(check_unlock(&info.fs_err, info.path, token, ub, root, + iterpool)); + if (!info.fs_err) + { + const char *s; + + info.components = 1; + info.component = info.path; + while((s = strchr(info.component + 1, '/'))) + { + info.component = s; + ++info.components; + } + + if (info.components > max_components) + max_components = info.components; + + ++outstanding; + } + APR_ARRAY_PUSH(ub->infos, unlock_info_t) = info; + } + + rev_0_path = svn_fs_x__path_rev_absolute(ub->fs, 0, pool); + + for (i = max_components; i >= 0; --i) + { + const char *last_path = NULL; + apr_array_header_t *paths; + int j; + + svn_pool_clear(iterpool); + paths = apr_array_make(pool, 1, sizeof(const char *)); + + for (j = 0; j < ub->infos->nelts; ++j) + { + unlock_info_t *info = &APR_ARRAY_IDX(ub->infos, j, unlock_info_t); + + if (!info->fs_err && info->path) + { + + if (info->components == i) + { + SVN_ERR(delete_lock(ub->fs->path, info->path, iterpool)); + info->done = TRUE; + } + else if (info->components > i) + { + apr_size_t len = info->component - info->path; + + if (last_path + && strcmp(last_path, "/") + && (strncmp(last_path, info->path, len) + || strlen(last_path) != len)) + { + SVN_ERR(delete_from_digest(ub->fs->path, paths, last_path, + rev_0_path, iterpool)); + apr_array_clear(paths); + last_path = NULL; + } + APR_ARRAY_PUSH(paths, const char *) = info->path; + if (!last_path) + { + if (info->component > info->path) + last_path = apr_pstrndup(pool, info->path, len); + else + last_path = "/"; + } + + if (info->component > info->path) + { + --info->component; + while(info->component[0] != '/') + --info->component; + } + } + } + + if (last_path && j == ub->infos->nelts - 1) + SVN_ERR(delete_from_digest(ub->fs->path, paths, last_path, + rev_0_path, iterpool)); + } + } + + return SVN_NO_ERROR; +} + +/* Unlock the lock described by LOCK->path and LOCK->token in FS. + + This assumes that the write lock is held. 
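+
+   For illustration (hypothetical caller): code that already holds the FS
+   write lock and has decided a lock is defunct -- for instance an expired
+   one -- can hand it to this helper to remove the lock and its index
+   entries without re-running the usual owner/token checks (note that
+   skip_check is set below).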
+ */ +static svn_error_t * +unlock_single(svn_fs_t *fs, + svn_lock_t *lock, + apr_pool_t *scratch_pool) +{ + unlock_baton_t ub; + svn_sort__item_t item; + apr_array_header_t *targets = apr_array_make(scratch_pool, 1, + sizeof(svn_sort__item_t)); + item.key = lock->path; + item.klen = strlen(item.key); + item.value = (char*)lock->token; + APR_ARRAY_PUSH(targets, svn_sort__item_t) = item; + + ub.fs = fs; + ub.targets = targets; + ub.skip_check = TRUE; + ub.result_pool = scratch_pool; + + /* No ub.infos[].fs_err error because skip_check is TRUE. */ + SVN_ERR(unlock_body(&ub, scratch_pool)); + + return SVN_NO_ERROR; +} + + +/*** Public API implementations ***/ + +svn_error_t * +svn_fs_x__lock(svn_fs_t *fs, + apr_hash_t *targets, + const char *comment, + svn_boolean_t is_dav_comment, + apr_time_t expiration_date, + svn_boolean_t steal_lock, + svn_fs_lock_callback_t lock_callback, + void *lock_baton, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + lock_baton_t lb; + apr_array_header_t *sorted_targets; + apr_hash_t *canonical_targets = apr_hash_make(scratch_pool); + apr_hash_index_t *hi; + apr_pool_t *iterpool; + svn_error_t *err, *cb_err = SVN_NO_ERROR; + int i; + + SVN_ERR(svn_fs__check_fs(fs, TRUE)); + + /* We need to have a username attached to the fs. */ + if (!fs->access_ctx || !fs->access_ctx->username) + return SVN_FS__ERR_NO_USER(fs); + + /* The FS locking API allows both canonical and non-canonical + paths which means that the same canonical path could be + represented more than once in the TARGETS hash. We just keep + one, choosing one with a token if possible. */ + for (hi = apr_hash_first(scratch_pool, targets); hi; hi = apr_hash_next(hi)) + { + const char *path = apr_hash_this_key(hi); + const svn_fs_lock_target_t *target = apr_hash_this_val(hi); + const svn_fs_lock_target_t *other; + + path = svn_fspath__canonicalize(path, result_pool); + other = svn_hash_gets(canonical_targets, path); + + if (!other || (!other->token && target->token)) + svn_hash_sets(canonical_targets, path, target); + } + + sorted_targets = svn_sort__hash(canonical_targets, + svn_sort_compare_items_as_paths, + scratch_pool); + + lb.fs = fs; + lb.targets = sorted_targets; + lb.comment = comment; + lb.is_dav_comment = is_dav_comment; + lb.expiration_date = expiration_date; + lb.steal_lock = steal_lock; + lb.result_pool = result_pool; + + iterpool = svn_pool_create(scratch_pool); + err = svn_fs_x__with_write_lock(fs, lock_body, &lb, iterpool); + for (i = 0; i < lb.infos->nelts; ++i) + { + struct lock_info_t *info = &APR_ARRAY_IDX(lb.infos, i, + struct lock_info_t); + svn_pool_clear(iterpool); + if (!cb_err && lock_callback) + { + if (!info->lock && !info->fs_err) + info->fs_err = svn_error_createf(SVN_ERR_FS_LOCK_OPERATION_FAILED, + 0, _("Failed to lock '%s'"), + info->path); + + cb_err = lock_callback(lock_baton, info->path, info->lock, + info->fs_err, iterpool); + } + svn_error_clear(info->fs_err); + } + svn_pool_destroy(iterpool); + + if (err && cb_err) + svn_error_compose(err, cb_err); + else if (!err) + err = cb_err; + + return svn_error_trace(err); +} + + +svn_error_t * +svn_fs_x__generate_lock_token(const char **token, + svn_fs_t *fs, + apr_pool_t *pool) +{ + SVN_ERR(svn_fs__check_fs(fs, TRUE)); + + /* Notice that 'fs' is currently unused. But perhaps someday, we'll + want to use the fs UUID + some incremented number? For now, we + generate a URI that matches the DAV RFC. We could change this to + some other URI scheme someday, if we wish. 
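+
+     For illustration, a generated token therefore looks something like
+     "opaquelocktoken:01234567-89ab-cdef-0123-456789abcdef" (hypothetical
+     UUID value).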
*/ + *token = apr_pstrcat(pool, "opaquelocktoken:", + svn_uuid_generate(pool), SVN_VA_NULL); + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__unlock(svn_fs_t *fs, + apr_hash_t *targets, + svn_boolean_t break_lock, + svn_fs_lock_callback_t lock_callback, + void *lock_baton, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + unlock_baton_t ub; + apr_array_header_t *sorted_targets; + apr_hash_t *canonical_targets = apr_hash_make(scratch_pool); + apr_hash_index_t *hi; + apr_pool_t *iterpool; + svn_error_t *err, *cb_err = SVN_NO_ERROR; + int i; + + SVN_ERR(svn_fs__check_fs(fs, TRUE)); + + /* We need to have a username attached to the fs. */ + if (!fs->access_ctx || !fs->access_ctx->username) + return SVN_FS__ERR_NO_USER(fs); + + for (hi = apr_hash_first(scratch_pool, targets); hi; hi = apr_hash_next(hi)) + { + const char *path = apr_hash_this_key(hi); + const char *token = apr_hash_this_val(hi); + const char *other; + + path = svn_fspath__canonicalize(path, result_pool); + other = svn_hash_gets(canonical_targets, path); + + if (!other) + svn_hash_sets(canonical_targets, path, token); + } + + sorted_targets = svn_sort__hash(canonical_targets, + svn_sort_compare_items_as_paths, + scratch_pool); + + ub.fs = fs; + ub.targets = sorted_targets; + ub.skip_check = FALSE; + ub.break_lock = break_lock; + ub.result_pool = result_pool; + + iterpool = svn_pool_create(scratch_pool); + err = svn_fs_x__with_write_lock(fs, unlock_body, &ub, iterpool); + for (i = 0; i < ub.infos->nelts; ++i) + { + unlock_info_t *info = &APR_ARRAY_IDX(ub.infos, i, unlock_info_t); + svn_pool_clear(iterpool); + if (!cb_err && lock_callback) + { + if (!info->done && !info->fs_err) + info->fs_err = svn_error_createf(SVN_ERR_FS_LOCK_OPERATION_FAILED, + 0, _("Failed to unlock '%s'"), + info->path); + cb_err = lock_callback(lock_baton, info->path, NULL, info->fs_err, + iterpool); + } + svn_error_clear(info->fs_err); + } + svn_pool_destroy(iterpool); + + if (err && cb_err) + svn_error_compose(err, cb_err); + else if (!err) + err = cb_err; + + return svn_error_trace(err); +} + + +svn_error_t * +svn_fs_x__get_lock(svn_lock_t **lock_p, + svn_fs_t *fs, + const char *path, + apr_pool_t *pool) +{ + SVN_ERR(svn_fs__check_fs(fs, TRUE)); + path = svn_fs__canonicalize_abspath(path, pool); + return get_lock_helper(fs, lock_p, path, FALSE, pool); +} + + +/* Baton for get_locks_filter_func(). */ +typedef struct get_locks_filter_baton_t +{ + const char *path; + svn_depth_t requested_depth; + svn_fs_get_locks_callback_t get_locks_func; + void *get_locks_baton; + +} get_locks_filter_baton_t; + + +/* A wrapper for the GET_LOCKS_FUNC passed to svn_fs_x__get_locks() + which filters out locks on paths that aren't within + BATON->requested_depth of BATON->path before called + BATON->get_locks_func() with BATON->get_locks_baton. + + NOTE: See issue #3660 for details about how the FSX lock + management code is inconsistent. Until that inconsistency is + resolved, we take this filtering approach rather than honoring + depth requests closer to the crawling code. In other words, once + we decide how to resolve issue #3660, there might be a more + performant way to honor the depth passed to svn_fs_x__get_locks(). */ +static svn_error_t * +get_locks_filter_func(void *baton, + svn_lock_t *lock, + apr_pool_t *pool) +{ + get_locks_filter_baton_t *b = baton; + + /* Filter out unwanted paths. Since Subversion only allows + locks on files, we can treat depth=immediates the same as + depth=files for filtering purposes. 
Meaning, we'll keep + this lock if: + + a) its path is the very path we queried, or + b) we've asked for a fully recursive answer, or + c) we've asked for depth=files or depth=immediates, and this + lock is on an immediate child of our query path. + */ + if ((strcmp(b->path, lock->path) == 0) + || (b->requested_depth == svn_depth_infinity)) + { + SVN_ERR(b->get_locks_func(b->get_locks_baton, lock, pool)); + } + else if ((b->requested_depth == svn_depth_files) || + (b->requested_depth == svn_depth_immediates)) + { + const char *rel_uri = svn_fspath__skip_ancestor(b->path, lock->path); + if (rel_uri && (svn_path_component_count(rel_uri) == 1)) + SVN_ERR(b->get_locks_func(b->get_locks_baton, lock, pool)); + } + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__get_locks(svn_fs_t *fs, + const char *path, + svn_depth_t depth, + svn_fs_get_locks_callback_t get_locks_func, + void *get_locks_baton, + apr_pool_t *scratch_pool) +{ + const char *digest_path; + get_locks_filter_baton_t glfb; + + SVN_ERR(svn_fs__check_fs(fs, TRUE)); + path = svn_fs__canonicalize_abspath(path, scratch_pool); + + glfb.path = path; + glfb.requested_depth = depth; + glfb.get_locks_func = get_locks_func; + glfb.get_locks_baton = get_locks_baton; + + /* Get the top digest path in our tree of interest, and then walk it. */ + SVN_ERR(digest_path_from_path(&digest_path, fs->path, path, scratch_pool)); + SVN_ERR(walk_locks(fs, digest_path, get_locks_filter_func, &glfb, + FALSE, scratch_pool)); + return SVN_NO_ERROR; +} diff --git a/subversion/libsvn_fs_x/lock.h b/subversion/libsvn_fs_x/lock.h new file mode 100644 index 0000000..1db5eb7 --- /dev/null +++ b/subversion/libsvn_fs_x/lock.h @@ -0,0 +1,116 @@ +/* lock.h : internal interface to lock functions + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#ifndef SVN_LIBSVN_FS_LOCK_H +#define SVN_LIBSVN_FS_LOCK_H + +#ifdef __cplusplus +extern "C" { +#endif /* __cplusplus */ + + + +/* These functions implement some of the calls in the FS loader + library's fs vtables. */ + +/* See svn_fs_lock(), svn_fs_lock_many(). */ +svn_error_t * +svn_fs_x__lock(svn_fs_t *fs, + apr_hash_t *targets, + const char *comment, + svn_boolean_t is_dav_comment, + apr_time_t expiration_date, + svn_boolean_t steal_lock, + svn_fs_lock_callback_t lock_callback, + void *lock_baton, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + +/* See svn_fs_generate_lock_token(). */ +svn_error_t * +svn_fs_x__generate_lock_token(const char **token, + svn_fs_t *fs, + apr_pool_t *pool); + +/* See svn_fs_unlock(), svn_fs_unlock_many(). 
*/ +svn_error_t * +svn_fs_x__unlock(svn_fs_t *fs, + apr_hash_t *targets, + svn_boolean_t break_lock, + svn_fs_lock_callback_t lock_callback, + void *lock_baton, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + +/* See svn_fs_get_lock(). */ +svn_error_t * +svn_fs_x__get_lock(svn_lock_t **lock, + svn_fs_t *fs, + const char *path, + apr_pool_t *pool); + +/* See svn_fs_get_locks2(). */ +svn_error_t * +svn_fs_x__get_locks(svn_fs_t *fs, + const char *path, + svn_depth_t depth, + svn_fs_get_locks_callback_t get_locks_func, + void *get_locks_baton, + apr_pool_t *scratch_pool); + + +/* Examine PATH for existing locks, and check whether they can be + used. Use SCRATCH_POOL for temporary allocations. + + If no locks are present, return SVN_NO_ERROR. + + If PATH is locked (or contains locks "below" it, when RECURSE is + set), then verify that: + + 1. a username has been supplied to TRAIL->fs's access-context, + else return SVN_ERR_FS_NO_USER. + + 2. for every lock discovered, the current username in the access + context of TRAIL->fs matches the "owner" of the lock, else + return SVN_ERR_FS_LOCK_OWNER_MISMATCH. + + 3. for every lock discovered, a matching lock token has been + passed into TRAIL->fs's access-context, else return + SVN_ERR_FS_BAD_LOCK_TOKEN. + + If all three conditions are met, return SVN_NO_ERROR. + + If the caller (directly or indirectly) has the FS write lock, + HAVE_WRITE_LOCK should be true. +*/ +svn_error_t * +svn_fs_x__allow_locked_operation(const char *path, + svn_fs_t *fs, + svn_boolean_t recurse, + svn_boolean_t have_write_lock, + apr_pool_t *scratch_pool); + +#ifdef __cplusplus +} +#endif /* __cplusplus */ + +#endif /* SVN_LIBSVN_FS_LOCK_H */ diff --git a/subversion/libsvn_fs_x/low_level.c b/subversion/libsvn_fs_x/low_level.c new file mode 100644 index 0000000..76f3fd2 --- /dev/null +++ b/subversion/libsvn_fs_x/low_level.c @@ -0,0 +1,1123 @@ +/* low_level.c --- low level r/w access to fs_x file structures + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#include "svn_private_config.h" +#include "svn_hash.h" +#include "svn_pools.h" +#include "svn_sorts.h" +#include "private/svn_sorts_private.h" +#include "private/svn_string_private.h" +#include "private/svn_subr_private.h" +#include "private/svn_fspath.h" + +#include "../libsvn_fs/fs-loader.h" + +#include "low_level.h" +#include "util.h" +#include "pack.h" +#include "cached_data.h" + +/* Headers used to describe node-revision in the revision file. 
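+ *
+ * For illustration, an abbreviated (hypothetical) header block as written
+ * by svn_fs_x__write_noderev() might read:
+ *
+ *   id: ...
+ *   node: ...
+ *   copy: ...
+ *   type: file
+ *   count: 3
+ *   text: ...
+ *   cpath: /trunk/foo.c
+ *
+ * terminated by an empty line.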
*/ +#define HEADER_ID "id" +#define HEADER_NODE "node" +#define HEADER_COPY "copy" +#define HEADER_TYPE "type" +#define HEADER_COUNT "count" +#define HEADER_PROPS "props" +#define HEADER_TEXT "text" +#define HEADER_CPATH "cpath" +#define HEADER_PRED "pred" +#define HEADER_COPYFROM "copyfrom" +#define HEADER_COPYROOT "copyroot" +#define HEADER_MINFO_HERE "minfo-here" +#define HEADER_MINFO_CNT "minfo-cnt" + +/* Kinds that a change can be. */ +#define ACTION_MODIFY "modify" +#define ACTION_ADD "add" +#define ACTION_DELETE "delete" +#define ACTION_REPLACE "replace" +#define ACTION_RESET "reset" + +/* True and False flags. */ +#define FLAG_TRUE "true" +#define FLAG_FALSE "false" + +/* Kinds of representation. */ +#define REP_DELTA "DELTA" + +/* An arbitrary maximum path length, so clients can't run us out of memory + * by giving us arbitrarily large paths. */ +#define FSX_MAX_PATH_LEN 4096 + +/* The 256 is an arbitrary size large enough to hold the node id and the + * various flags. */ +#define MAX_CHANGE_LINE_LEN FSX_MAX_PATH_LEN + 256 + +/* Convert the C string in *TEXT to a revision number and return it in *REV. + * Overflows, negative values other than -1 and terminating characters other + * than 0x20 or 0x0 will cause an error. Set *TEXT to the first char after + * the initial separator or to EOS. + */ +static svn_error_t * +parse_revnum(svn_revnum_t *rev, + const char **text) +{ + const char *string = *text; + if ((string[0] == '-') && (string[1] == '1')) + { + *rev = SVN_INVALID_REVNUM; + string += 2; + } + else + { + SVN_ERR(svn_revnum_parse(rev, string, &string)); + } + + if (*string == ' ') + ++string; + else if (*string != '\0') + return svn_error_create(SVN_ERR_FS_CORRUPT, NULL, + _("Invalid character in revision number")); + + *text = string; + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__parse_footer(apr_off_t *l2p_offset, + svn_checksum_t **l2p_checksum, + apr_off_t *p2l_offset, + svn_checksum_t **p2l_checksum, + svn_stringbuf_t *footer, + svn_revnum_t rev, + apr_pool_t *result_pool) +{ + apr_int64_t val; + char *last_str = footer->data; + + /* Get the L2P offset. */ + const char *str = svn_cstring_tokenize(" ", &last_str); + if (str == NULL) + return svn_error_create(SVN_ERR_FS_CORRUPT, NULL, + _("Invalid revision footer")); + + SVN_ERR(svn_cstring_atoi64(&val, str)); + *l2p_offset = (apr_off_t)val; + + /* Get the L2P checksum. */ + str = svn_cstring_tokenize(" ", &last_str); + if (str == NULL) + return svn_error_create(SVN_ERR_FS_CORRUPT, NULL, + _("Invalid revision footer")); + + SVN_ERR(svn_checksum_parse_hex(l2p_checksum, svn_checksum_md5, str, + result_pool)); + + /* Get the P2L offset. */ + str = svn_cstring_tokenize(" ", &last_str); + if (str == NULL) + return svn_error_create(SVN_ERR_FS_CORRUPT, NULL, + _("Invalid revision footer")); + + SVN_ERR(svn_cstring_atoi64(&val, str)); + *p2l_offset = (apr_off_t)val; + + /* Get the P2L checksum. 
*/ + str = svn_cstring_tokenize(" ", &last_str); + if (str == NULL) + return svn_error_create(SVN_ERR_FS_CORRUPT, NULL, + _("Invalid revision footer")); + + SVN_ERR(svn_checksum_parse_hex(p2l_checksum, svn_checksum_md5, str, + result_pool)); + + return SVN_NO_ERROR; +} + +svn_stringbuf_t * +svn_fs_x__unparse_footer(apr_off_t l2p_offset, + svn_checksum_t *l2p_checksum, + apr_off_t p2l_offset, + svn_checksum_t *p2l_checksum, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + return svn_stringbuf_createf(result_pool, + "%" APR_OFF_T_FMT " %s %" APR_OFF_T_FMT " %s", + l2p_offset, + svn_checksum_to_cstring(l2p_checksum, + scratch_pool), + p2l_offset, + svn_checksum_to_cstring(p2l_checksum, + scratch_pool)); +} + +/* Given a revision file FILE that has been pre-positioned at the + beginning of a Node-Rev header block, read in that header block and + store it in the apr_hash_t HEADERS. All allocations will be from + RESULT_POOL. */ +static svn_error_t * +read_header_block(apr_hash_t **headers, + svn_stream_t *stream, + apr_pool_t *result_pool) +{ + *headers = svn_hash__make(result_pool); + + while (1) + { + svn_stringbuf_t *header_str; + const char *name, *value; + apr_size_t i = 0; + apr_size_t name_len; + svn_boolean_t eof; + + SVN_ERR(svn_stream_readline(stream, &header_str, "\n", &eof, + result_pool)); + + if (eof || header_str->len == 0) + break; /* end of header block */ + + while (header_str->data[i] != ':') + { + if (header_str->data[i] == '\0') + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + _("Found malformed header '%s' in " + "revision file"), + header_str->data); + i++; + } + + /* Create a 'name' string and point to it. */ + header_str->data[i] = '\0'; + name = header_str->data; + name_len = i; + + /* Check if we have enough data to parse. */ + if (i + 2 > header_str->len) + { + /* Restore the original line for the error. */ + header_str->data[i] = ':'; + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + _("Found malformed header '%s' in " + "revision file"), + header_str->data); + } + + /* Skip over the NULL byte and the space following it. */ + i += 2; + + value = header_str->data + i; + + /* header_str is safely in our pool, so we can use bits of it as + key and value. 
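+
+         For example, a (hypothetical) input line "type: file" ends up
+         here with name = "type", name_len = 4 and value = "file".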
*/ + apr_hash_set(*headers, name, name_len, value); + } + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__parse_representation(svn_fs_x__representation_t **rep_p, + svn_stringbuf_t *text, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_x__representation_t *rep; + char *str; + apr_int64_t val; + char *string = text->data; + svn_checksum_t *checksum; + + rep = apr_pcalloc(result_pool, sizeof(*rep)); + *rep_p = rep; + + str = svn_cstring_tokenize(" ", &string); + if (str == NULL) + return svn_error_create(SVN_ERR_FS_CORRUPT, NULL, + _("Malformed text representation offset line in node-rev")); + + SVN_ERR(svn_cstring_atoi64(&rep->id.change_set, str)); + + /* while in transactions, it is legal to simply write "-1" */ + if (rep->id.change_set == -1) + return SVN_NO_ERROR; + + str = svn_cstring_tokenize(" ", &string); + if (str == NULL) + { + if (rep->id.change_set == SVN_FS_X__INVALID_CHANGE_SET) + return SVN_NO_ERROR; + + return svn_error_create(SVN_ERR_FS_CORRUPT, NULL, + _("Malformed text representation offset line in node-rev")); + } + + SVN_ERR(svn_cstring_atoi64(&val, str)); + rep->id.number = (apr_off_t)val; + + str = svn_cstring_tokenize(" ", &string); + if (str == NULL) + return svn_error_create(SVN_ERR_FS_CORRUPT, NULL, + _("Malformed text representation offset line in node-rev")); + + SVN_ERR(svn_cstring_atoi64(&val, str)); + rep->size = (svn_filesize_t)val; + + str = svn_cstring_tokenize(" ", &string); + if (str == NULL) + return svn_error_create(SVN_ERR_FS_CORRUPT, NULL, + _("Malformed text representation offset line in node-rev")); + + SVN_ERR(svn_cstring_atoi64(&val, str)); + rep->expanded_size = (svn_filesize_t)val; + + /* Read in the MD5 hash. */ + str = svn_cstring_tokenize(" ", &string); + if ((str == NULL) || (strlen(str) != (APR_MD5_DIGESTSIZE * 2))) + return svn_error_create(SVN_ERR_FS_CORRUPT, NULL, + _("Malformed text representation offset line in node-rev")); + + SVN_ERR(svn_checksum_parse_hex(&checksum, svn_checksum_md5, str, + scratch_pool)); + if (checksum) + memcpy(rep->md5_digest, checksum->digest, sizeof(rep->md5_digest)); + + /* The remaining fields are only used for formats >= 4, so check that. */ + str = svn_cstring_tokenize(" ", &string); + if (str == NULL) + return SVN_NO_ERROR; + + /* Read the SHA1 hash. */ + if (strlen(str) != (APR_SHA1_DIGESTSIZE * 2)) + return svn_error_create(SVN_ERR_FS_CORRUPT, NULL, + _("Malformed text representation offset line in node-rev")); + + SVN_ERR(svn_checksum_parse_hex(&checksum, svn_checksum_sha1, str, + scratch_pool)); + rep->has_sha1 = checksum != NULL; + if (checksum) + memcpy(rep->sha1_digest, checksum->digest, sizeof(rep->sha1_digest)); + + return SVN_NO_ERROR; +} + +/* Wrap read_rep_offsets_body(), extracting its TXN_ID from our NODEREV_ID, + and adding an error message. 
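+
+   For illustration, a (hypothetical) representation string as accepted by
+   svn_fs_x__parse_representation() reads
+
+     17 4 1253 2000 <md5-hex> <sha1-hex>
+
+   i.e. "<change-set> <item-number> <size> <expanded-size> <md5> [<sha1>]",
+   where the trailing SHA1 column is optional.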
*/ +static svn_error_t * +read_rep_offsets(svn_fs_x__representation_t **rep_p, + char *string, + const svn_fs_x__id_t *noderev_id, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_error_t *err + = svn_fs_x__parse_representation(rep_p, + svn_stringbuf_create_wrap(string, + scratch_pool), + result_pool, + scratch_pool); + if (err) + { + const svn_string_t *id_unparsed; + const char *where; + + id_unparsed = svn_fs_x__id_unparse(noderev_id, scratch_pool); + where = apr_psprintf(scratch_pool, + _("While reading representation offsets " + "for node-revision '%s':"), + id_unparsed->data); + + return svn_error_quick_wrap(err, where); + } + + return SVN_NO_ERROR; +} + +/* If PATH needs to be escaped, return an escaped version of it, allocated + * from RESULT_POOL. Otherwise, return PATH directly. */ +static const char * +auto_escape_path(const char *path, + apr_pool_t *result_pool) +{ + apr_size_t len = strlen(path); + apr_size_t i; + const char esc = '\x1b'; + + for (i = 0; i < len; ++i) + if (path[i] < ' ') + { + svn_stringbuf_t *escaped = svn_stringbuf_create_ensure(2 * len, + result_pool); + for (i = 0; i < len; ++i) + if (path[i] < ' ') + { + svn_stringbuf_appendbyte(escaped, esc); + svn_stringbuf_appendbyte(escaped, path[i] + 'A' - 1); + } + else + { + svn_stringbuf_appendbyte(escaped, path[i]); + } + + return escaped->data; + } + + return path; +} + +/* If PATH has been escaped, return the un-escaped version of it, allocated + * from RESULT_POOL. Otherwise, return PATH directly. */ +static const char * +auto_unescape_path(const char *path, + apr_pool_t *result_pool) +{ + const char esc = '\x1b'; + if (strchr(path, esc)) + { + apr_size_t len = strlen(path); + apr_size_t i; + + svn_stringbuf_t *unescaped = svn_stringbuf_create_ensure(len, + result_pool); + for (i = 0; i < len; ++i) + if (path[i] == esc) + svn_stringbuf_appendbyte(unescaped, path[++i] + 1 - 'A'); + else + svn_stringbuf_appendbyte(unescaped, path[i]); + + return unescaped->data; + } + + return path; +} + +/* Find entry HEADER_NAME in HEADERS and parse its value into *ID. */ +static svn_error_t * +read_id_part(svn_fs_x__id_t *id, + apr_hash_t *headers, + const char *header_name) +{ + const char *value = svn_hash_gets(headers, header_name); + if (value == NULL) + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + _("Missing %s field in node-rev"), + header_name); + + SVN_ERR(svn_fs_x__id_parse(id, value)); + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__read_noderev(svn_fs_x__noderev_t **noderev_p, + svn_stream_t *stream, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + apr_hash_t *headers; + svn_fs_x__noderev_t *noderev; + char *value; + const char *noderev_id; + + SVN_ERR(read_header_block(&headers, stream, scratch_pool)); + SVN_ERR(svn_stream_close(stream)); + + noderev = apr_pcalloc(result_pool, sizeof(*noderev)); + + /* for error messages later */ + noderev_id = svn_hash_gets(headers, HEADER_ID); + + /* Read the node-rev id. */ + SVN_ERR(read_id_part(&noderev->noderev_id, headers, HEADER_ID)); + SVN_ERR(read_id_part(&noderev->node_id, headers, HEADER_NODE)); + SVN_ERR(read_id_part(&noderev->copy_id, headers, HEADER_COPY)); + + /* Read the type. 
*/ + value = svn_hash_gets(headers, HEADER_TYPE); + + if ((value == NULL) || + ( strcmp(value, SVN_FS_X__KIND_FILE) + && strcmp(value, SVN_FS_X__KIND_DIR))) + /* ### s/kind/type/ */ + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + _("Missing kind field in node-rev '%s'"), + noderev_id); + + noderev->kind = (strcmp(value, SVN_FS_X__KIND_FILE) == 0) + ? svn_node_file + : svn_node_dir; + + /* Read the 'count' field. */ + value = svn_hash_gets(headers, HEADER_COUNT); + if (value) + SVN_ERR(svn_cstring_atoi(&noderev->predecessor_count, value)); + else + noderev->predecessor_count = 0; + + /* Get the properties location. */ + value = svn_hash_gets(headers, HEADER_PROPS); + if (value) + { + SVN_ERR(read_rep_offsets(&noderev->prop_rep, value, + &noderev->noderev_id, result_pool, + scratch_pool)); + } + + /* Get the data location. */ + value = svn_hash_gets(headers, HEADER_TEXT); + if (value) + { + SVN_ERR(read_rep_offsets(&noderev->data_rep, value, + &noderev->noderev_id, result_pool, + scratch_pool)); + } + + /* Get the created path. */ + value = svn_hash_gets(headers, HEADER_CPATH); + if (value == NULL) + { + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + _("Missing cpath field in node-rev '%s'"), + noderev_id); + } + else + { + if (!svn_fspath__is_canonical(value)) + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + _("Non-canonical cpath field in node-rev '%s'"), + noderev_id); + + noderev->created_path = auto_unescape_path(apr_pstrdup(result_pool, + value), + result_pool); + } + + /* Get the predecessor ID. */ + value = svn_hash_gets(headers, HEADER_PRED); + if (value) + SVN_ERR(svn_fs_x__id_parse(&noderev->predecessor_id, value)); + else + svn_fs_x__id_reset(&noderev->predecessor_id); + + /* Get the copyroot. */ + value = svn_hash_gets(headers, HEADER_COPYROOT); + if (value == NULL) + { + noderev->copyroot_path = noderev->created_path; + noderev->copyroot_rev + = svn_fs_x__get_revnum(noderev->noderev_id.change_set); + } + else + { + SVN_ERR(parse_revnum(&noderev->copyroot_rev, (const char **)&value)); + + if (!svn_fspath__is_canonical(value)) + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + _("Malformed copyroot line in node-rev '%s'"), + noderev_id); + noderev->copyroot_path = auto_unescape_path(apr_pstrdup(result_pool, + value), + result_pool); + } + + /* Get the copyfrom. */ + value = svn_hash_gets(headers, HEADER_COPYFROM); + if (value == NULL) + { + noderev->copyfrom_path = NULL; + noderev->copyfrom_rev = SVN_INVALID_REVNUM; + } + else + { + SVN_ERR(parse_revnum(&noderev->copyfrom_rev, (const char **)&value)); + + if (*value == 0) + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + _("Malformed copyfrom line in node-rev '%s'"), + noderev_id); + noderev->copyfrom_path = auto_unescape_path(apr_pstrdup(result_pool, + value), + result_pool); + } + + /* Get the mergeinfo count. */ + value = svn_hash_gets(headers, HEADER_MINFO_CNT); + if (value) + SVN_ERR(svn_cstring_atoi64(&noderev->mergeinfo_count, value)); + else + noderev->mergeinfo_count = 0; + + /* Get whether *this* node has mergeinfo. */ + value = svn_hash_gets(headers, HEADER_MINFO_HERE); + noderev->has_mergeinfo = (value != NULL); + + *noderev_p = noderev; + + return SVN_NO_ERROR; +} + +/* Return a textual representation of the DIGEST of given KIND. + * If IS_NULL is TRUE, no digest is available. + * Allocate the result in RESULT_POOL. 
+ */ +static const char * +format_digest(const unsigned char *digest, + svn_checksum_kind_t kind, + svn_boolean_t is_null, + apr_pool_t *result_pool) +{ + svn_checksum_t checksum; + checksum.digest = digest; + checksum.kind = kind; + + if (is_null) + return "(null)"; + + return svn_checksum_to_cstring_display(&checksum, result_pool); +} + +svn_stringbuf_t * +svn_fs_x__unparse_representation(svn_fs_x__representation_t *rep, + svn_boolean_t mutable_rep_truncated, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + if (!rep->has_sha1) + return svn_stringbuf_createf + (result_pool, + "%" APR_INT64_T_FMT " %" APR_UINT64_T_FMT " %" SVN_FILESIZE_T_FMT + " %" SVN_FILESIZE_T_FMT " %s", + rep->id.change_set, rep->id.number, rep->size, + rep->expanded_size, + format_digest(rep->md5_digest, svn_checksum_md5, FALSE, + scratch_pool)); + + return svn_stringbuf_createf + (result_pool, + "%" APR_INT64_T_FMT " %" APR_UINT64_T_FMT " %" SVN_FILESIZE_T_FMT + " %" SVN_FILESIZE_T_FMT " %s %s", + rep->id.change_set, rep->id.number, rep->size, + rep->expanded_size, + format_digest(rep->md5_digest, svn_checksum_md5, + FALSE, scratch_pool), + format_digest(rep->sha1_digest, svn_checksum_sha1, + !rep->has_sha1, scratch_pool)); +} + + +svn_error_t * +svn_fs_x__write_noderev(svn_stream_t *outfile, + svn_fs_x__noderev_t *noderev, + apr_pool_t *scratch_pool) +{ + svn_string_t *str_id; + + str_id = svn_fs_x__id_unparse(&noderev->noderev_id, scratch_pool); + SVN_ERR(svn_stream_printf(outfile, scratch_pool, HEADER_ID ": %s\n", + str_id->data)); + str_id = svn_fs_x__id_unparse(&noderev->node_id, scratch_pool); + SVN_ERR(svn_stream_printf(outfile, scratch_pool, HEADER_NODE ": %s\n", + str_id->data)); + str_id = svn_fs_x__id_unparse(&noderev->copy_id, scratch_pool); + SVN_ERR(svn_stream_printf(outfile, scratch_pool, HEADER_COPY ": %s\n", + str_id->data)); + + SVN_ERR(svn_stream_printf(outfile, scratch_pool, HEADER_TYPE ": %s\n", + (noderev->kind == svn_node_file) ? 
+ SVN_FS_X__KIND_FILE : SVN_FS_X__KIND_DIR)); + + if (svn_fs_x__id_used(&noderev->predecessor_id)) + SVN_ERR(svn_stream_printf(outfile, scratch_pool, HEADER_PRED ": %s\n", + svn_fs_x__id_unparse(&noderev->predecessor_id, + scratch_pool)->data)); + + SVN_ERR(svn_stream_printf(outfile, scratch_pool, HEADER_COUNT ": %d\n", + noderev->predecessor_count)); + + if (noderev->data_rep) + SVN_ERR(svn_stream_printf(outfile, scratch_pool, HEADER_TEXT ": %s\n", + svn_fs_x__unparse_representation + (noderev->data_rep, + noderev->kind == svn_node_dir, + scratch_pool, scratch_pool)->data)); + + if (noderev->prop_rep) + SVN_ERR(svn_stream_printf(outfile, scratch_pool, HEADER_PROPS ": %s\n", + svn_fs_x__unparse_representation + (noderev->prop_rep, + TRUE, scratch_pool, scratch_pool)->data)); + + SVN_ERR(svn_stream_printf(outfile, scratch_pool, HEADER_CPATH ": %s\n", + auto_escape_path(noderev->created_path, + scratch_pool))); + + if (noderev->copyfrom_path) + SVN_ERR(svn_stream_printf(outfile, scratch_pool, HEADER_COPYFROM ": %ld" + " %s\n", + noderev->copyfrom_rev, + auto_escape_path(noderev->copyfrom_path, + scratch_pool))); + + if ( ( noderev->copyroot_rev + != svn_fs_x__get_revnum(noderev->noderev_id.change_set)) + || (strcmp(noderev->copyroot_path, noderev->created_path) != 0)) + SVN_ERR(svn_stream_printf(outfile, scratch_pool, HEADER_COPYROOT ": %ld" + " %s\n", + noderev->copyroot_rev, + auto_escape_path(noderev->copyroot_path, + scratch_pool))); + + if (noderev->mergeinfo_count > 0) + SVN_ERR(svn_stream_printf(outfile, scratch_pool, HEADER_MINFO_CNT ": %" + APR_INT64_T_FMT "\n", + noderev->mergeinfo_count)); + + if (noderev->has_mergeinfo) + SVN_ERR(svn_stream_puts(outfile, HEADER_MINFO_HERE ": y\n")); + + return svn_stream_puts(outfile, "\n"); +} + +svn_error_t * +svn_fs_x__read_rep_header(svn_fs_x__rep_header_t **header, + svn_stream_t *stream, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_stringbuf_t *buffer; + char *str, *last_str; + apr_int64_t val; + svn_boolean_t eol = FALSE; + + SVN_ERR(svn_stream_readline(stream, &buffer, "\n", &eol, scratch_pool)); + + *header = apr_pcalloc(result_pool, sizeof(**header)); + (*header)->header_size = buffer->len + 1; + if (strcmp(buffer->data, REP_DELTA) == 0) + { + /* This is a delta against the empty stream. */ + (*header)->type = svn_fs_x__rep_self_delta; + return SVN_NO_ERROR; + } + + (*header)->type = svn_fs_x__rep_delta; + + /* We have hopefully a DELTA vs. a non-empty base revision. */ + last_str = buffer->data; + str = svn_cstring_tokenize(" ", &last_str); + if (! str || (strcmp(str, REP_DELTA) != 0)) + goto error; + + SVN_ERR(parse_revnum(&(*header)->base_revision, (const char **)&last_str)); + + str = svn_cstring_tokenize(" ", &last_str); + if (! str) + goto error; + SVN_ERR(svn_cstring_atoi64(&val, str)); + (*header)->base_item_index = (apr_off_t)val; + + str = svn_cstring_tokenize(" ", &last_str); + if (! 
str) + goto error; + SVN_ERR(svn_cstring_atoi64(&val, str)); + (*header)->base_length = (svn_filesize_t)val; + + return SVN_NO_ERROR; + + error: + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + _("Malformed representation header")); +} + +svn_error_t * +svn_fs_x__write_rep_header(svn_fs_x__rep_header_t *header, + svn_stream_t *stream, + apr_pool_t *scratch_pool) +{ + const char *text; + + switch (header->type) + { + case svn_fs_x__rep_self_delta: + text = REP_DELTA "\n"; + break; + + default: + text = apr_psprintf(scratch_pool, REP_DELTA " %ld %" APR_OFF_T_FMT + " %" SVN_FILESIZE_T_FMT "\n", + header->base_revision, header->base_item_index, + header->base_length); + } + + return svn_error_trace(svn_stream_puts(stream, text)); +} + +/* Read the next entry in the changes record from file FILE and store + the resulting change in *CHANGE_P. If there is no next record, + store NULL there. Perform all allocations from POOL. */ +static svn_error_t * +read_change(svn_fs_x__change_t **change_p, + svn_stream_t *stream, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_stringbuf_t *line; + svn_boolean_t eof = TRUE; + svn_fs_x__change_t *change; + char *str, *last_str, *kind_str; + + /* Default return value. */ + *change_p = NULL; + + SVN_ERR(svn_stream_readline(stream, &line, "\n", &eof, scratch_pool)); + + /* Check for a blank line. */ + if (eof || (line->len == 0)) + return SVN_NO_ERROR; + + change = apr_pcalloc(result_pool, sizeof(*change)); + last_str = line->data; + + /* Get the node-id of the change. */ + str = svn_cstring_tokenize(" ", &last_str); + if (str == NULL) + return svn_error_create(SVN_ERR_FS_CORRUPT, NULL, + _("Invalid changes line in rev-file")); + + SVN_ERR(svn_fs_x__id_parse(&change->noderev_id, str)); + + /* Get the change type. */ + str = svn_cstring_tokenize(" ", &last_str); + if (str == NULL) + return svn_error_create(SVN_ERR_FS_CORRUPT, NULL, + _("Invalid changes line in rev-file")); + + /* Don't bother to check the format number before looking for + * node-kinds: just read them if you find them. */ + change->node_kind = svn_node_unknown; + kind_str = strchr(str, '-'); + if (kind_str) + { + /* Cap off the end of "str" (the action). */ + *kind_str = '\0'; + kind_str++; + if (strcmp(kind_str, SVN_FS_X__KIND_FILE) == 0) + change->node_kind = svn_node_file; + else if (strcmp(kind_str, SVN_FS_X__KIND_DIR) == 0) + change->node_kind = svn_node_dir; + else + return svn_error_create(SVN_ERR_FS_CORRUPT, NULL, + _("Invalid changes line in rev-file")); + } + + if (strcmp(str, ACTION_MODIFY) == 0) + { + change->change_kind = svn_fs_path_change_modify; + } + else if (strcmp(str, ACTION_ADD) == 0) + { + change->change_kind = svn_fs_path_change_add; + } + else if (strcmp(str, ACTION_DELETE) == 0) + { + change->change_kind = svn_fs_path_change_delete; + } + else if (strcmp(str, ACTION_REPLACE) == 0) + { + change->change_kind = svn_fs_path_change_replace; + } + else if (strcmp(str, ACTION_RESET) == 0) + { + change->change_kind = svn_fs_path_change_reset; + } + else + { + return svn_error_create(SVN_ERR_FS_CORRUPT, NULL, + _("Invalid change kind in rev file")); + } + + /* Get the text-mod flag. 
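+
+     For reference, a complete (hypothetical) changes entry as parsed here
+     spans two lines, e.g.
+
+       <noderev-id> add-file true false false /trunk/foo.c
+       12 /branches/1.x/foo.c
+
+     where the second (copyfrom) line may be empty.  The token read next
+     is the text-mod flag ("true" / "false").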
*/ + str = svn_cstring_tokenize(" ", &last_str); + if (str == NULL) + return svn_error_create(SVN_ERR_FS_CORRUPT, NULL, + _("Invalid changes line in rev-file")); + + if (strcmp(str, FLAG_TRUE) == 0) + { + change->text_mod = TRUE; + } + else if (strcmp(str, FLAG_FALSE) == 0) + { + change->text_mod = FALSE; + } + else + { + return svn_error_create(SVN_ERR_FS_CORRUPT, NULL, + _("Invalid text-mod flag in rev-file")); + } + + /* Get the prop-mod flag. */ + str = svn_cstring_tokenize(" ", &last_str); + if (str == NULL) + return svn_error_create(SVN_ERR_FS_CORRUPT, NULL, + _("Invalid changes line in rev-file")); + + if (strcmp(str, FLAG_TRUE) == 0) + { + change->prop_mod = TRUE; + } + else if (strcmp(str, FLAG_FALSE) == 0) + { + change->prop_mod = FALSE; + } + else + { + return svn_error_create(SVN_ERR_FS_CORRUPT, NULL, + _("Invalid prop-mod flag in rev-file")); + } + + /* Get the mergeinfo-mod flag. */ + str = svn_cstring_tokenize(" ", &last_str); + if (str == NULL) + return svn_error_create(SVN_ERR_FS_CORRUPT, NULL, + _("Invalid changes line in rev-file")); + + if (strcmp(str, FLAG_TRUE) == 0) + { + change->mergeinfo_mod = svn_tristate_true; + } + else if (strcmp(str, FLAG_FALSE) == 0) + { + change->mergeinfo_mod = svn_tristate_false; + } + else + { + return svn_error_create(SVN_ERR_FS_CORRUPT, NULL, + _("Invalid mergeinfo-mod flag in rev-file")); + } + + /* Get the changed path. */ + if (!svn_fspath__is_canonical(last_str)) + return svn_error_create(SVN_ERR_FS_CORRUPT, NULL, + _("Invalid path in changes line")); + + change->path.data = auto_unescape_path(apr_pstrmemdup(result_pool, + last_str, + strlen(last_str)), + result_pool); + change->path.len = strlen(change->path.data); + + /* Read the next line, the copyfrom line. */ + SVN_ERR(svn_stream_readline(stream, &line, "\n", &eof, result_pool)); + change->copyfrom_known = TRUE; + if (eof || line->len == 0) + { + change->copyfrom_rev = SVN_INVALID_REVNUM; + change->copyfrom_path = NULL; + } + else + { + last_str = line->data; + SVN_ERR(parse_revnum(&change->copyfrom_rev, (const char **)&last_str)); + + if (!svn_fspath__is_canonical(last_str)) + return svn_error_create(SVN_ERR_FS_CORRUPT, NULL, + _("Invalid copy-from path in changes line")); + + change->copyfrom_path = auto_unescape_path(last_str, result_pool); + } + + *change_p = change; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__read_changes(apr_array_header_t **changes, + svn_stream_t *stream, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_x__change_t *change; + apr_pool_t *iterpool; + + /* Pre-allocate enough room for most change lists. + (will be auto-expanded as necessary). + + Chose the default to just below 2^N such that the doubling reallocs + will request roughly 2^M bytes from the OS without exceeding the + respective two-power by just a few bytes (leaves room array and APR + node overhead for large enough M). 
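+
+     (For example, with 8-byte pointers the initial 63 entries occupy
+     504 bytes; doubling to 126, 252, ... entries keeps the element data
+     just below 1kB, 2kB, etc., leaving the few remaining bytes for the
+     array header and allocator overhead.)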
+ */ + *changes = apr_array_make(result_pool, 63, sizeof(svn_fs_x__change_t *)); + + SVN_ERR(read_change(&change, stream, result_pool, scratch_pool)); + iterpool = svn_pool_create(scratch_pool); + while (change) + { + APR_ARRAY_PUSH(*changes, svn_fs_x__change_t*) = change; + SVN_ERR(read_change(&change, stream, result_pool, iterpool)); + svn_pool_clear(iterpool); + } + svn_pool_destroy(iterpool); + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__read_changes_incrementally(svn_stream_t *stream, + svn_fs_x__change_receiver_t + change_receiver, + void *change_receiver_baton, + apr_pool_t *scratch_pool) +{ + svn_fs_x__change_t *change; + apr_pool_t *iterpool; + + iterpool = svn_pool_create(scratch_pool); + do + { + svn_pool_clear(iterpool); + + SVN_ERR(read_change(&change, stream, iterpool, iterpool)); + if (change) + SVN_ERR(change_receiver(change_receiver_baton, change, iterpool)); + } + while (change); + svn_pool_destroy(iterpool); + + return SVN_NO_ERROR; +} + +/* Write a single change entry, path PATH, change CHANGE, to STREAM. + + All temporary allocations are in SCRATCH_POOL. */ +static svn_error_t * +write_change_entry(svn_stream_t *stream, + svn_fs_x__change_t *change, + apr_pool_t *scratch_pool) +{ + const char *idstr; + const char *change_string = NULL; + const char *kind_string = ""; + svn_stringbuf_t *buf; + apr_size_t len; + + switch (change->change_kind) + { + case svn_fs_path_change_modify: + change_string = ACTION_MODIFY; + break; + case svn_fs_path_change_add: + change_string = ACTION_ADD; + break; + case svn_fs_path_change_delete: + change_string = ACTION_DELETE; + break; + case svn_fs_path_change_replace: + change_string = ACTION_REPLACE; + break; + case svn_fs_path_change_reset: + change_string = ACTION_RESET; + break; + default: + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + _("Invalid change type %d"), + change->change_kind); + } + + idstr = svn_fs_x__id_unparse(&change->noderev_id, scratch_pool)->data; + + SVN_ERR_ASSERT(change->node_kind == svn_node_dir + || change->node_kind == svn_node_file); + kind_string = apr_psprintf(scratch_pool, "-%s", + change->node_kind == svn_node_dir + ? SVN_FS_X__KIND_DIR + : SVN_FS_X__KIND_FILE); + + buf = svn_stringbuf_createf(scratch_pool, "%s %s%s %s %s %s %s\n", + idstr, change_string, kind_string, + change->text_mod ? FLAG_TRUE : FLAG_FALSE, + change->prop_mod ? FLAG_TRUE : FLAG_FALSE, + change->mergeinfo_mod == svn_tristate_true + ? FLAG_TRUE : FLAG_FALSE, + auto_escape_path(change->path.data, scratch_pool)); + + if (SVN_IS_VALID_REVNUM(change->copyfrom_rev)) + { + svn_stringbuf_appendcstr(buf, apr_psprintf(scratch_pool, "%ld %s", + change->copyfrom_rev, + auto_escape_path(change->copyfrom_path, + scratch_pool))); + } + + svn_stringbuf_appendbyte(buf, '\n'); + + /* Write all change info in one write call. */ + len = buf->len; + return svn_error_trace(svn_stream_write(stream, buf->data, &len)); +} + +svn_error_t * +svn_fs_x__write_changes(svn_stream_t *stream, + svn_fs_t *fs, + apr_hash_t *changes, + svn_boolean_t terminate_list, + apr_pool_t *scratch_pool) +{ + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + apr_array_header_t *sorted_changed_paths; + int i; + + /* For the sake of the repository administrator sort the changes so + that the final file is deterministic and repeatable, however the + rest of the FSX code doesn't require any particular order here. + + Also, this sorting is only effective in writing all entries with + a single call as write_final_changed_path_info() does. 
For the + list being written incrementally during transaction, we actually + *must not* change the order of entries from different calls. + */ + sorted_changed_paths = svn_sort__hash(changes, + svn_sort_compare_items_lexically, + scratch_pool); + + /* Write all items to disk in the new order. */ + for (i = 0; i < sorted_changed_paths->nelts; ++i) + { + svn_fs_x__change_t *change; + + svn_pool_clear(iterpool); + change = APR_ARRAY_IDX(sorted_changed_paths, i, svn_sort__item_t).value; + + /* Write out the new entry into the final rev-file. */ + SVN_ERR(write_change_entry(stream, change, iterpool)); + } + + if (terminate_list) + svn_stream_puts(stream, "\n"); + + svn_pool_destroy(iterpool); + + return SVN_NO_ERROR; +} + diff --git a/subversion/libsvn_fs_x/low_level.h b/subversion/libsvn_fs_x/low_level.h new file mode 100644 index 0000000..e4fdf05 --- /dev/null +++ b/subversion/libsvn_fs_x/low_level.h @@ -0,0 +1,214 @@ +/* low_level.c --- low level r/w access to fs_x file structures + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#ifndef SVN_LIBSVN_FS__LOW_LEVEL_H +#define SVN_LIBSVN_FS__LOW_LEVEL_H + +#include "svn_fs.h" + +#include "fs_x.h" +#include "id.h" + +#ifdef __cplusplus +extern "C" { +#endif /* __cplusplus */ + +/* Kinds that a node-rev can be. */ +#define SVN_FS_X__KIND_FILE "file" +#define SVN_FS_X__KIND_DIR "dir" + +/* The functions are grouped as follows: + * + * - revision footer + * - representation (as in "text:" and "props:" lines) + * - node revision + * - representation header ("DELTA" lines) + * - changed path list + */ + +/* Given the FSX revision / pack FOOTER, parse it destructively + * and return the start offsets of the index data in *L2P_OFFSET and + * *P2L_OFFSET, respectively. Also, return the expected checksums in + * in *L2P_CHECKSUM and *P2L_CHECKSUM. + * + * Note that REV is only used to construct nicer error objects that + * mention this revision. Allocate the checksums in RESULT_POOL. + */ +svn_error_t * +svn_fs_x__parse_footer(apr_off_t *l2p_offset, + svn_checksum_t **l2p_checksum, + apr_off_t *p2l_offset, + svn_checksum_t **p2l_checksum, + svn_stringbuf_t *footer, + svn_revnum_t rev, + apr_pool_t *result_pool); + +/* Given the offset of the L2P index data in L2P_OFFSET, the content + * checksum in L2P_CHECKSUM and the offset plus checksum of the P2L + * index data in P2L_OFFSET and P2L_CHECKSUM. + * + * Return the corresponding format 7+ revision / pack file footer. + * Allocate it in RESULT_POOL and use SCRATCH_POOL for temporary. 
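+ *
+ * For illustration, the footer is a single line of the form
+ *   "<l2p-offset> <l2p-checksum> <p2l-offset> <p2l-checksum>",
+ * e.g. (hypothetical values) "10239 5c3f..d2 20480 91ab..7e", with both
+ * checksums given as hex MD5 digests.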
+ */ +svn_stringbuf_t * +svn_fs_x__unparse_footer(apr_off_t l2p_offset, + svn_checksum_t *l2p_checksum, + apr_off_t p2l_offset, + svn_checksum_t *p2l_checksum, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + +/* Parse the description of a representation from TEXT and store it + into *REP_P. TEXT will be invalidated by this call. Allocate *REP_P in + RESULT_POOL and use SCRATCH_POOL for temporaries. */ +svn_error_t * +svn_fs_x__parse_representation(svn_fs_x__representation_t **rep_p, + svn_stringbuf_t *text, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + +/* Return a formatted string that represents the location of representation + * REP. If MUTABLE_REP_TRUNCATED is given, the rep is for props or dir + * contents, and only a "-1" revision number will be given for a mutable rep. + * If MAY_BE_CORRUPT is true, guard for NULL when constructing the string. + * Allocate the result in RESULT_POOL and temporaries in SCRATCH_POOL. */ +svn_stringbuf_t * +svn_fs_x__unparse_representation(svn_fs_x__representation_t *rep, + svn_boolean_t mutable_rep_truncated, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + +/* Read a node-revision from STREAM. Set *NODEREV to the new structure, + allocated in RESULT_POOL. */ +svn_error_t * +svn_fs_x__read_noderev(svn_fs_x__noderev_t **noderev, + svn_stream_t *stream, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + +/* Write the node-revision NODEREV into the stream OUTFILE. + Temporary allocations are from SCRATCH_POOL. */ +svn_error_t * +svn_fs_x__write_noderev(svn_stream_t *outfile, + svn_fs_x__noderev_t *noderev, + apr_pool_t *scratch_pool); + +/* This type enumerates all forms of representations that we support. */ +typedef enum svn_fs_x__rep_type_t +{ + /* this is a DELTA representation with no base representation */ + svn_fs_x__rep_self_delta, + + /* this is a DELTA representation against some base representation */ + svn_fs_x__rep_delta, + + /* this is a representation in a star-delta container */ + svn_fs_x__rep_container +} svn_fs_x__rep_type_t; + +/* This structure is used to hold the information stored in a representation + * header. */ +typedef struct svn_fs_x__rep_header_t +{ + /* type of the representation, i.e. whether self-DELTA etc. */ + svn_fs_x__rep_type_t type; + + /* if this rep is a delta against some other rep, that base rep can + * be found in this revision. Should be 0 if there is no base rep. */ + svn_revnum_t base_revision; + + /* if this rep is a delta against some other rep, that base rep can + * be found at this item index within the base rep's revision. Should + * be 0 if there is no base rep. */ + apr_off_t base_item_index; + + /* if this rep is a delta against some other rep, this is the (deltified) + * size of that base rep. Should be 0 if there is no base rep. */ + svn_filesize_t base_length; + + /* length of the textual representation of the header in the rep or pack + * file, including EOL. Only valid after reading it from disk. + * Should be 0 otherwise. */ + apr_size_t header_size; +} svn_fs_x__rep_header_t; + +/* Read the next line from STREAM and parse it as a text + representation header. Return the parsed entry in *HEADER, allocated + in RESULT_POOL. Perform temporary allocations in SCRATCH_POOL. */ +svn_error_t * +svn_fs_x__read_rep_header(svn_fs_x__rep_header_t **header, + svn_stream_t *stream, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + +/* Write the representation HEADER to STREAM. + * Use SCRATCH_POOL for allocations. 
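+ *
+ * For illustration, the header written is either the single line
+ *   "DELTA"
+ * for a self-delta, or something like (hypothetical numbers)
+ *   "DELTA 12 7 9431"
+ * i.e. "DELTA <base-revision> <base-item-index> <base-length>" for a
+ * delta against an existing base representation.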
*/ +svn_error_t * +svn_fs_x__write_rep_header(svn_fs_x__rep_header_t *header, + svn_stream_t *stream, + apr_pool_t *scratch_pool); + +/* Read all the changes from STREAM and store them in *CHANGES, + allocated in RESULT_POOL. Do temporary allocations in SCRATCH_POOL. */ +svn_error_t * +svn_fs_x__read_changes(apr_array_header_t **changes, + svn_stream_t *stream, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + +/* Callback function used by svn_fs_fs__read_changes_incrementally(), + * asking the receiver to process to process CHANGE using BATON. CHANGE + * and SCRATCH_POOL will not be valid beyond the current callback invocation. + */ +typedef svn_error_t *(*svn_fs_x__change_receiver_t)( + void *baton, + svn_fs_x__change_t *change, + apr_pool_t *scratch_pool); + +/* Read all the changes from STREAM and invoke CHANGE_RECEIVER on each change. + Do all allocations in SCRATCH_POOL. */ +svn_error_t * +svn_fs_x__read_changes_incrementally(svn_stream_t *stream, + svn_fs_x__change_receiver_t + change_receiver, + void *change_receiver_baton, + apr_pool_t *scratch_pool); + +/* Write the changed path info from CHANGES in filesystem FS to the + output stream STREAM. You may call this function multiple time on + the same stream. If you are writing to a (proto-)revision file, + the last call must set TERMINATE_LIST to write an extra empty line + that marks the end of the changed paths list. + Perform temporary allocations in SCRATCH_POOL. + */ +svn_error_t * +svn_fs_x__write_changes(svn_stream_t *stream, + svn_fs_t *fs, + apr_hash_t *changes, + svn_boolean_t terminate_list, + apr_pool_t *scratch_pool); + +#ifdef __cplusplus +} +#endif /* __cplusplus */ + +#endif /* SVN_LIBSVN_FS__LOW_LEVEL_H */ diff --git a/subversion/libsvn_fs_x/noderevs.c b/subversion/libsvn_fs_x/noderevs.c new file mode 100644 index 0000000..60c6029 --- /dev/null +++ b/subversion/libsvn_fs_x/noderevs.c @@ -0,0 +1,912 @@ +/* noderevs.h --- FSX node revision container + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#include "svn_private_config.h" + +#include "private/svn_dep_compat.h" +#include "private/svn_packed_data.h" +#include "private/svn_subr_private.h" +#include "private/svn_temp_serializer.h" + +#include "noderevs.h" +#include "string_table.h" +#include "temp_serializer.h" + +/* These flags will be used with the FLAGS field in binary_noderev_t. 
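+ *
+ * For illustration, a (hypothetical) file node-rev that carries mergeinfo
+ * and has a created path would be stored with
+ *
+ *   flags = (apr_uint32_t)svn_node_file | NODEREV_HAS_MINFO
+ *           | NODEREV_HAS_CPATH;
+ *
+ * the node kind occupies the low bits selected by NODEREV_KIND_MASK.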
+ */ + +/* (flags & NODEREV_KIND_MASK) extracts the noderev type */ +#define NODEREV_KIND_MASK 0x00007 + +/* the noderev has merge info */ +#define NODEREV_HAS_MINFO 0x00008 + +/* the noderev has copy-from-path and revision */ +#define NODEREV_HAS_COPYFROM 0x00010 + +/* the noderev has copy-root path and revision */ +#define NODEREV_HAS_COPYROOT 0x00020 + +/* the noderev has copy-root path and revision */ +#define NODEREV_HAS_CPATH 0x00040 + +/* Our internal representation of a svn_fs_x__noderev_t. + * + * We will store path strings in a string container and reference them + * from here. Similarly, IDs and representations are being stored in + * separate containers and then also referenced here. This eliminates + * the need to store the same IDs and representations more than once. + */ +typedef struct binary_noderev_t +{ + /* node type and presence indicators */ + apr_uint32_t flags; + + /* Index+1 of the noderev-id for this node-rev. */ + int id; + + /* Index+1 of the node-id for this node-rev. */ + int node_id; + + /* Index+1 of the copy-id for this node-rev. */ + int copy_id; + + /* Index+1 of the predecessor node revision id, or 0 if there is no + predecessor for this node revision */ + int predecessor_id; + + /* number of predecessors this node revision has (recursively), or + -1 if not known (for backward compatibility). */ + int predecessor_count; + + /* If this node-rev is a copy, what revision was it copied from? */ + svn_revnum_t copyfrom_rev; + + /* Helper for history tracing, root revision of the parent tree from + whence this node-rev was copied. */ + svn_revnum_t copyroot_rev; + + /* If this node-rev is a copy, this is the string index+1 of the path + from which that copy way made. 0, otherwise. */ + apr_size_t copyfrom_path; + + /* String index+1 of the root of the parent tree from whence this node- + * rev was copied. */ + apr_size_t copyroot_path; + + /* Index+1 of the representation key for this node's properties. + May be 0 if there are no properties. */ + int prop_rep; + + /* Index+1 of the representation for this node's data. + May be 0 if there is no data. */ + int data_rep; + + /* String index+1 of the path at which this node first came into + existence. */ + apr_size_t created_path; + + /* Number of nodes with svn:mergeinfo properties that are + descendants of this node (including it itself) */ + apr_int64_t mergeinfo_count; + +} binary_noderev_t; + +/* The actual container object. Node revisions are concatenated into + * NODEREVS, referenced representations are stored in DATA_REPS / PROP_REPS + * and the ids in IDs. PATHS is the string table for all paths. + * + * During construction, BUILDER will be used instead of PATHS. IDS_DICT, + * DATA_REPS_DICT and PROP_REPS_DICT are also only used during construction + * and are NULL otherwise. + */ +struct svn_fs_x__noderevs_t +{ + /* The paths - either in 'builder' mode or finalized mode. + * The respective other pointer will be NULL. */ + string_table_builder_t *builder; + string_table_t *paths; + + /* During construction, maps a full binary_id_t to an index into IDS */ + apr_hash_t *ids_dict; + + /* During construction, maps a full binary_representation_t to an index + * into REPS. */ + apr_hash_t *reps_dict; + + /* array of binary_id_t */ + apr_array_header_t *ids; + + /* array of binary_representation_t */ + apr_array_header_t *reps; + + /* array of binary_noderev_t. 
*/ + apr_array_header_t *noderevs; +}; + +svn_fs_x__noderevs_t * +svn_fs_x__noderevs_create(int initial_count, + apr_pool_t* result_pool) +{ + svn_fs_x__noderevs_t *noderevs + = apr_palloc(result_pool, sizeof(*noderevs)); + + noderevs->builder = svn_fs_x__string_table_builder_create(result_pool); + noderevs->ids_dict = svn_hash__make(result_pool); + noderevs->reps_dict = svn_hash__make(result_pool); + noderevs->paths = NULL; + + noderevs->ids + = apr_array_make(result_pool, 2 * initial_count, sizeof(svn_fs_x__id_t)); + noderevs->reps + = apr_array_make(result_pool, 2 * initial_count, + sizeof(svn_fs_x__representation_t)); + noderevs->noderevs + = apr_array_make(result_pool, initial_count, sizeof(binary_noderev_t)); + + return noderevs; +} + +/* Given the ID, return the index+1 into IDS that contains a binary_id + * for it. Returns 0 for NULL IDs. We use DICT to detect duplicates. + */ +static int +store_id(apr_array_header_t *ids, + apr_hash_t *dict, + const svn_fs_x__id_t *id) +{ + int idx; + void *idx_void; + + if (!svn_fs_x__id_used(id)) + return 0; + + idx_void = apr_hash_get(dict, &id, sizeof(id)); + idx = (int)(apr_uintptr_t)idx_void; + if (idx == 0) + { + APR_ARRAY_PUSH(ids, svn_fs_x__id_t) = *id; + idx = ids->nelts; + apr_hash_set(dict, ids->elts + (idx-1) * ids->elt_size, + ids->elt_size, (void*)(apr_uintptr_t)idx); + } + + return idx; +} + +/* Given the REP, return the index+1 into REPS that contains a copy of it. + * Returns 0 for NULL IDs. We use DICT to detect duplicates. + */ +static int +store_representation(apr_array_header_t *reps, + apr_hash_t *dict, + const svn_fs_x__representation_t *rep) +{ + int idx; + void *idx_void; + + if (rep == NULL) + return 0; + + idx_void = apr_hash_get(dict, rep, sizeof(*rep)); + idx = (int)(apr_uintptr_t)idx_void; + if (idx == 0) + { + APR_ARRAY_PUSH(reps, svn_fs_x__representation_t) = *rep; + idx = reps->nelts; + apr_hash_set(dict, reps->elts + (idx-1) * reps->elt_size, + reps->elt_size, (void*)(apr_uintptr_t)idx); + } + + return idx; +} + +apr_size_t +svn_fs_x__noderevs_add(svn_fs_x__noderevs_t *container, + svn_fs_x__noderev_t *noderev) +{ + binary_noderev_t binary_noderev = { 0 }; + + binary_noderev.flags = (noderev->has_mergeinfo ? NODEREV_HAS_MINFO : 0) + | (noderev->copyfrom_path ? NODEREV_HAS_COPYFROM : 0) + | (noderev->copyroot_path ? NODEREV_HAS_COPYROOT : 0) + | (noderev->created_path ? 
NODEREV_HAS_CPATH : 0) + | (int)noderev->kind; + + binary_noderev.id + = store_id(container->ids, container->ids_dict, &noderev->noderev_id); + binary_noderev.node_id + = store_id(container->ids, container->ids_dict, &noderev->node_id); + binary_noderev.copy_id + = store_id(container->ids, container->ids_dict, &noderev->copy_id); + binary_noderev.predecessor_id + = store_id(container->ids, container->ids_dict, &noderev->predecessor_id); + + if (noderev->copyfrom_path) + { + binary_noderev.copyfrom_path + = svn_fs_x__string_table_builder_add(container->builder, + noderev->copyfrom_path, + 0); + binary_noderev.copyfrom_rev = noderev->copyfrom_rev; + } + + if (noderev->copyroot_path) + { + binary_noderev.copyroot_path + = svn_fs_x__string_table_builder_add(container->builder, + noderev->copyroot_path, + 0); + binary_noderev.copyroot_rev = noderev->copyroot_rev; + } + + binary_noderev.predecessor_count = noderev->predecessor_count; + binary_noderev.prop_rep = store_representation(container->reps, + container->reps_dict, + noderev->prop_rep); + binary_noderev.data_rep = store_representation(container->reps, + container->reps_dict, + noderev->data_rep); + + if (noderev->created_path) + binary_noderev.created_path + = svn_fs_x__string_table_builder_add(container->builder, + noderev->created_path, + 0); + + binary_noderev.mergeinfo_count = noderev->mergeinfo_count; + + APR_ARRAY_PUSH(container->noderevs, binary_noderev_t) = binary_noderev; + + return container->noderevs->nelts - 1; +} + +apr_size_t +svn_fs_x__noderevs_estimate_size(const svn_fs_x__noderevs_t *container) +{ + /* CONTAINER must be in 'builder' mode */ + if (container->builder == NULL) + return 0; + + /* string table code makes its own prediction, + * noderevs should be < 16 bytes each, + * id parts < 4 bytes each, + * data representations < 40 bytes each, + * property representations < 30 bytes each, + * some static overhead should be assumed */ + return svn_fs_x__string_table_builder_estimate_size(container->builder) + + container->noderevs->nelts * 16 + + container->ids->nelts * 4 + + container->reps->nelts * 40 + + 100; +} + +/* Set *ID to the ID part stored at index IDX in IDS. + */ +static svn_error_t * +get_id(svn_fs_x__id_t *id, + const apr_array_header_t *ids, + int idx) +{ + /* handle NULL IDs */ + if (idx == 0) + { + svn_fs_x__id_reset(id); + return SVN_NO_ERROR; + } + + /* check for corrupted data */ + if (idx < 0 || idx > ids->nelts) + return svn_error_createf(SVN_ERR_FS_CONTAINER_INDEX, NULL, + _("ID part index %d exceeds container size %d"), + idx, ids->nelts); + + /* Return the requested ID. */ + *id = APR_ARRAY_IDX(ids, idx - 1, svn_fs_x__id_t); + + return SVN_NO_ERROR; +} + +/* Create a svn_fs_x__representation_t in *REP, allocated in POOL based on the + * representation stored at index IDX in REPS. + */ +static svn_error_t * +get_representation(svn_fs_x__representation_t **rep, + const apr_array_header_t *reps, + int idx, + apr_pool_t *pool) +{ + /* handle NULL representations */ + if (idx == 0) + { + *rep = NULL; + return SVN_NO_ERROR; + } + + /* check for corrupted data */ + if (idx < 0 || idx > reps->nelts) + return svn_error_createf(SVN_ERR_FS_CONTAINER_INDEX, NULL, + _("Node revision ID index %d" + " exceeds container size %d"), + idx, reps->nelts); + + /* no translation required. 
Just duplicate the info */ + *rep = apr_pmemdup(pool, + &APR_ARRAY_IDX(reps, idx - 1, svn_fs_x__representation_t), + sizeof(**rep)); + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__noderevs_get(svn_fs_x__noderev_t **noderev_p, + const svn_fs_x__noderevs_t *container, + apr_size_t idx, + apr_pool_t *pool) +{ + svn_fs_x__noderev_t *noderev; + binary_noderev_t *binary_noderev; + + /* CONTAINER must be in 'finalized' mode */ + SVN_ERR_ASSERT(container->builder == NULL); + SVN_ERR_ASSERT(container->paths); + + /* validate index */ + if (idx >= (apr_size_t)container->noderevs->nelts) + return svn_error_createf(SVN_ERR_FS_CONTAINER_INDEX, NULL, + apr_psprintf(pool, + _("Node revision index %%%s" + " exceeds container size %%d"), + APR_SIZE_T_FMT), + idx, container->noderevs->nelts); + + /* allocate result struct and fill it field by field */ + noderev = apr_pcalloc(pool, sizeof(*noderev)); + binary_noderev = &APR_ARRAY_IDX(container->noderevs, idx, binary_noderev_t); + + noderev->kind = (svn_node_kind_t)(binary_noderev->flags & NODEREV_KIND_MASK); + SVN_ERR(get_id(&noderev->noderev_id, container->ids, binary_noderev->id)); + SVN_ERR(get_id(&noderev->node_id, container->ids, + binary_noderev->node_id)); + SVN_ERR(get_id(&noderev->copy_id, container->ids, + binary_noderev->copy_id)); + SVN_ERR(get_id(&noderev->predecessor_id, container->ids, + binary_noderev->predecessor_id)); + + if (binary_noderev->flags & NODEREV_HAS_COPYFROM) + { + noderev->copyfrom_path + = svn_fs_x__string_table_get(container->paths, + binary_noderev->copyfrom_path, + NULL, + pool); + noderev->copyfrom_rev = binary_noderev->copyfrom_rev; + } + else + { + noderev->copyfrom_path = NULL; + noderev->copyfrom_rev = SVN_INVALID_REVNUM; + } + + if (binary_noderev->flags & NODEREV_HAS_COPYROOT) + { + noderev->copyroot_path + = svn_fs_x__string_table_get(container->paths, + binary_noderev->copyroot_path, + NULL, + pool); + noderev->copyroot_rev = binary_noderev->copyroot_rev; + } + else + { + noderev->copyroot_path = NULL; + noderev->copyroot_rev = 0; + } + + noderev->predecessor_count = binary_noderev->predecessor_count; + + SVN_ERR(get_representation(&noderev->prop_rep, container->reps, + binary_noderev->prop_rep, pool)); + SVN_ERR(get_representation(&noderev->data_rep, container->reps, + binary_noderev->data_rep, pool)); + + if (binary_noderev->flags & NODEREV_HAS_CPATH) + noderev->created_path + = svn_fs_x__string_table_get(container->paths, + binary_noderev->created_path, + NULL, + pool); + + noderev->mergeinfo_count = binary_noderev->mergeinfo_count; + + noderev->has_mergeinfo = (binary_noderev->flags & NODEREV_HAS_MINFO) ? 1 : 0; + *noderev_p = noderev; + + return SVN_NO_ERROR; +} + +/* Create and return a stream for representations in PARENT. + * Initialize the sub-streams for all fields, except checksums. + */ +static svn_packed__int_stream_t * +create_rep_stream(svn_packed__int_stream_t *parent) +{ + svn_packed__int_stream_t *stream + = svn_packed__create_int_substream(parent, FALSE, FALSE); + + /* sub-streams for members - except for checksums */ + /* has_sha1 */ + svn_packed__create_int_substream(stream, FALSE, FALSE); + + /* rev, item_index, size, expanded_size */ + svn_packed__create_int_substream(stream, TRUE, FALSE); + svn_packed__create_int_substream(stream, FALSE, FALSE); + svn_packed__create_int_substream(stream, FALSE, FALSE); + svn_packed__create_int_substream(stream, FALSE, FALSE); + + return stream; +} + +/* Serialize all representations in REP. 
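The field order written here must stay in sync with read_reps() further down, which consumes the very same streams.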
Store checksums in DIGEST_STREAM, + * put all other fields into REP_STREAM. + */ +static void +write_reps(svn_packed__int_stream_t *rep_stream, + svn_packed__byte_stream_t *digest_stream, + apr_array_header_t *reps) +{ + int i; + for (i = 0; i < reps->nelts; ++i) + { + svn_fs_x__representation_t *rep + = &APR_ARRAY_IDX(reps, i, svn_fs_x__representation_t); + + svn_packed__add_uint(rep_stream, rep->has_sha1); + + svn_packed__add_uint(rep_stream, rep->id.change_set); + svn_packed__add_uint(rep_stream, rep->id.number); + svn_packed__add_uint(rep_stream, rep->size); + svn_packed__add_uint(rep_stream, rep->expanded_size); + + svn_packed__add_bytes(digest_stream, + (const char *)rep->md5_digest, + sizeof(rep->md5_digest)); + if (rep->has_sha1) + svn_packed__add_bytes(digest_stream, + (const char *)rep->sha1_digest, + sizeof(rep->sha1_digest)); + } +} + +svn_error_t * +svn_fs_x__write_noderevs_container(svn_stream_t *stream, + const svn_fs_x__noderevs_t *container, + apr_pool_t *scratch_pool) +{ + int i; + + string_table_t *paths = container->paths + ? container->paths + : svn_fs_x__string_table_create(container->builder, + scratch_pool); + + svn_packed__data_root_t *root = svn_packed__data_create_root(scratch_pool); + + /* one common top-level stream for all arrays. One sub-stream */ + svn_packed__int_stream_t *structs_stream + = svn_packed__create_int_stream(root, FALSE, FALSE); + svn_packed__int_stream_t *ids_stream + = svn_packed__create_int_substream(structs_stream, FALSE, FALSE); + svn_packed__int_stream_t *reps_stream + = create_rep_stream(structs_stream); + svn_packed__int_stream_t *noderevs_stream + = svn_packed__create_int_substream(structs_stream, FALSE, FALSE); + svn_packed__byte_stream_t *digests_stream + = svn_packed__create_bytes_stream(root); + + /* structure the IDS_STREAM such we can extract much of the redundancy + * from the svn_fs_x__ip_part_t structs */ + for (i = 0; i < 2; ++i) + svn_packed__create_int_substream(ids_stream, TRUE, FALSE); + + /* Same storing binary_noderev_t in the NODEREVS_STREAM */ + svn_packed__create_int_substream(noderevs_stream, FALSE, FALSE); + for (i = 0; i < 13; ++i) + svn_packed__create_int_substream(noderevs_stream, TRUE, FALSE); + + /* serialize ids array */ + for (i = 0; i < container->ids->nelts; ++i) + { + svn_fs_x__id_t *id = &APR_ARRAY_IDX(container->ids, i, svn_fs_x__id_t); + + svn_packed__add_int(ids_stream, id->change_set); + svn_packed__add_uint(ids_stream, id->number); + } + + /* serialize rep arrays */ + write_reps(reps_stream, digests_stream, container->reps); + + /* serialize noderevs array */ + for (i = 0; i < container->noderevs->nelts; ++i) + { + const binary_noderev_t *noderev + = &APR_ARRAY_IDX(container->noderevs, i, binary_noderev_t); + + svn_packed__add_uint(noderevs_stream, noderev->flags); + + svn_packed__add_uint(noderevs_stream, noderev->id); + svn_packed__add_uint(noderevs_stream, noderev->node_id); + svn_packed__add_uint(noderevs_stream, noderev->copy_id); + svn_packed__add_uint(noderevs_stream, noderev->predecessor_id); + svn_packed__add_uint(noderevs_stream, noderev->predecessor_count); + + svn_packed__add_uint(noderevs_stream, noderev->copyfrom_path); + svn_packed__add_int(noderevs_stream, noderev->copyfrom_rev); + svn_packed__add_uint(noderevs_stream, noderev->copyroot_path); + svn_packed__add_int(noderevs_stream, noderev->copyroot_rev); + + svn_packed__add_uint(noderevs_stream, noderev->prop_rep); + svn_packed__add_uint(noderevs_stream, noderev->data_rep); + + svn_packed__add_uint(noderevs_stream, 
noderev->created_path); + svn_packed__add_uint(noderevs_stream, noderev->mergeinfo_count); + } + + /* write to disk */ + SVN_ERR(svn_fs_x__write_string_table(stream, paths, scratch_pool)); + SVN_ERR(svn_packed__data_write(stream, root, scratch_pool)); + + return SVN_NO_ERROR; +} + +/* Allocate a svn_fs_x__representation_t array in POOL and return it in + * REPS_P. Deserialize the data in REP_STREAM and DIGEST_STREAM and store + * the resulting representations into the *REPS_P. + */ +static svn_error_t * +read_reps(apr_array_header_t **reps_p, + svn_packed__int_stream_t *rep_stream, + svn_packed__byte_stream_t *digest_stream, + apr_pool_t *pool) +{ + apr_size_t i; + apr_size_t len; + const char *bytes; + + apr_size_t count + = svn_packed__int_count(svn_packed__first_int_substream(rep_stream)); + apr_array_header_t *reps + = apr_array_make(pool, (int)count, sizeof(svn_fs_x__representation_t)); + + for (i = 0; i < count; ++i) + { + svn_fs_x__representation_t rep; + + rep.has_sha1 = (svn_boolean_t)svn_packed__get_uint(rep_stream); + + rep.id.change_set = (svn_revnum_t)svn_packed__get_uint(rep_stream); + rep.id.number = svn_packed__get_uint(rep_stream); + rep.size = svn_packed__get_uint(rep_stream); + rep.expanded_size = svn_packed__get_uint(rep_stream); + + /* when extracting the checksums, beware of buffer under/overflows + caused by disk data corruption. */ + bytes = svn_packed__get_bytes(digest_stream, &len); + if (len != sizeof(rep.md5_digest)) + return svn_error_createf(SVN_ERR_FS_CONTAINER_INDEX, NULL, + apr_psprintf(pool, + _("Unexpected MD5" + " digest size %%%s"), + APR_SIZE_T_FMT), + len); + + memcpy(rep.md5_digest, bytes, sizeof(rep.md5_digest)); + if (rep.has_sha1) + { + bytes = svn_packed__get_bytes(digest_stream, &len); + if (len != sizeof(rep.sha1_digest)) + return svn_error_createf(SVN_ERR_FS_CONTAINER_INDEX, NULL, + apr_psprintf(pool, + _("Unexpected SHA1" + " digest size %%%s"), + APR_SIZE_T_FMT), + len); + + memcpy(rep.sha1_digest, bytes, sizeof(rep.sha1_digest)); + } + + APR_ARRAY_PUSH(reps, svn_fs_x__representation_t) = rep; + } + + *reps_p = reps; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__read_noderevs_container(svn_fs_x__noderevs_t **container, + svn_stream_t *stream, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + apr_size_t i; + apr_size_t count; + + svn_fs_x__noderevs_t *noderevs + = apr_pcalloc(result_pool, sizeof(*noderevs)); + + svn_packed__data_root_t *root; + svn_packed__int_stream_t *structs_stream; + svn_packed__int_stream_t *ids_stream; + svn_packed__int_stream_t *reps_stream; + svn_packed__int_stream_t *noderevs_stream; + svn_packed__byte_stream_t *digests_stream; + + /* read everything from disk */ + SVN_ERR(svn_fs_x__read_string_table(&noderevs->paths, stream, + result_pool, scratch_pool)); + SVN_ERR(svn_packed__data_read(&root, stream, result_pool, scratch_pool)); + + /* get streams */ + structs_stream = svn_packed__first_int_stream(root); + ids_stream = svn_packed__first_int_substream(structs_stream); + reps_stream = svn_packed__next_int_stream(ids_stream); + noderevs_stream = svn_packed__next_int_stream(reps_stream); + digests_stream = svn_packed__first_byte_stream(root); + + /* read ids array */ + count + = svn_packed__int_count(svn_packed__first_int_substream(ids_stream)); + noderevs->ids + = apr_array_make(result_pool, (int)count, sizeof(svn_fs_x__id_t)); + for (i = 0; i < count; ++i) + { + svn_fs_x__id_t id; + + id.change_set = (svn_revnum_t)svn_packed__get_int(ids_stream); + id.number = svn_packed__get_uint(ids_stream); + + 
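      /* the index of this ID, plus 1, is what the ID fields in binary_noderev_t store; 0 is reserved for "no ID" */ +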
APR_ARRAY_PUSH(noderevs->ids, svn_fs_x__id_t) = id; + } + + /* read rep arrays */ + SVN_ERR(read_reps(&noderevs->reps, reps_stream, digests_stream, + result_pool)); + + /* read noderevs array */ + count + = svn_packed__int_count(svn_packed__first_int_substream(noderevs_stream)); + noderevs->noderevs + = apr_array_make(result_pool, (int)count, sizeof(binary_noderev_t)); + for (i = 0; i < count; ++i) + { + binary_noderev_t noderev; + + noderev.flags = (apr_uint32_t)svn_packed__get_uint(noderevs_stream); + + noderev.id = (int)svn_packed__get_uint(noderevs_stream); + noderev.node_id = (int)svn_packed__get_uint(noderevs_stream); + noderev.copy_id = (int)svn_packed__get_uint(noderevs_stream); + noderev.predecessor_id = (int)svn_packed__get_uint(noderevs_stream); + noderev.predecessor_count = (int)svn_packed__get_uint(noderevs_stream); + + noderev.copyfrom_path = (apr_size_t)svn_packed__get_uint(noderevs_stream); + noderev.copyfrom_rev = (svn_revnum_t)svn_packed__get_int(noderevs_stream); + noderev.copyroot_path = (apr_size_t)svn_packed__get_uint(noderevs_stream); + noderev.copyroot_rev = (svn_revnum_t)svn_packed__get_int(noderevs_stream); + + noderev.prop_rep = (int)svn_packed__get_uint(noderevs_stream); + noderev.data_rep = (int)svn_packed__get_uint(noderevs_stream); + + noderev.created_path = (apr_size_t)svn_packed__get_uint(noderevs_stream); + noderev.mergeinfo_count = svn_packed__get_uint(noderevs_stream); + + APR_ARRAY_PUSH(noderevs->noderevs, binary_noderev_t) = noderev; + } + + *container = noderevs; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__serialize_noderevs_container(void **data, + apr_size_t *data_len, + void *in, + apr_pool_t *pool) +{ + svn_fs_x__noderevs_t *noderevs = in; + svn_stringbuf_t *serialized; + apr_size_t size + = noderevs->ids->elt_size * noderevs->ids->nelts + + noderevs->reps->elt_size * noderevs->reps->nelts + + noderevs->noderevs->elt_size * noderevs->noderevs->nelts + + 10 * noderevs->noderevs->elt_size + + 100; + + /* serialize array header and all its elements */ + svn_temp_serializer__context_t *context + = svn_temp_serializer__init(noderevs, sizeof(*noderevs), size, pool); + + /* serialize sub-structures */ + svn_fs_x__serialize_string_table(context, &noderevs->paths); + svn_fs_x__serialize_apr_array(context, &noderevs->ids); + svn_fs_x__serialize_apr_array(context, &noderevs->reps); + svn_fs_x__serialize_apr_array(context, &noderevs->noderevs); + + /* return the serialized result */ + serialized = svn_temp_serializer__get(context); + + *data = serialized->data; + *data_len = serialized->len; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__deserialize_noderevs_container(void **out, + void *data, + apr_size_t data_len, + apr_pool_t *pool) +{ + svn_fs_x__noderevs_t *noderevs = (svn_fs_x__noderevs_t *)data; + + /* de-serialize sub-structures */ + svn_fs_x__deserialize_string_table(noderevs, &noderevs->paths); + svn_fs_x__deserialize_apr_array(noderevs, &noderevs->ids, pool); + svn_fs_x__deserialize_apr_array(noderevs, &noderevs->reps, pool); + svn_fs_x__deserialize_apr_array(noderevs, &noderevs->noderevs, pool); + + /* done */ + *out = noderevs; + + return SVN_NO_ERROR; +} + +/* Deserialize the cache serialized APR struct at *IN in BUFFER and write + * the result to OUT. Note that this will only resolve the pointers and + * not the array elements themselves. 
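That is enough here because the element types (svn_fs_x__id_t, svn_fs_x__representation_t and binary_noderev_t) do not contain pointers of their own.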
*/ +static void +resolve_apr_array_header(apr_array_header_t *out, + const void *buffer, + apr_array_header_t * const *in) +{ + const apr_array_header_t *array + = svn_temp_deserializer__ptr(buffer, (const void *const *)in); + const char *elements + = svn_temp_deserializer__ptr(array, (const void *const *)&array->elts); + + *out = *array; + out->elts = (char *)elements; + out->pool = NULL; +} + +svn_error_t * +svn_fs_x__noderevs_get_func(void **out, + const void *data, + apr_size_t data_len, + void *baton, + apr_pool_t *pool) +{ + svn_fs_x__noderev_t *noderev; + binary_noderev_t *binary_noderev; + + apr_array_header_t ids; + apr_array_header_t reps; + apr_array_header_t noderevs; + + apr_uint32_t idx = *(apr_uint32_t *)baton; + const svn_fs_x__noderevs_t *container = data; + + /* Resolve all container pointers */ + const string_table_t *paths + = svn_temp_deserializer__ptr(container, + (const void *const *)&container->paths); + + resolve_apr_array_header(&ids, container, &container->ids); + resolve_apr_array_header(&reps, container, &container->reps); + resolve_apr_array_header(&noderevs, container, &container->noderevs); + + /* allocate result struct and fill it field by field */ + noderev = apr_pcalloc(pool, sizeof(*noderev)); + binary_noderev = &APR_ARRAY_IDX(&noderevs, idx, binary_noderev_t); + + noderev->kind = (svn_node_kind_t)(binary_noderev->flags & NODEREV_KIND_MASK); + SVN_ERR(get_id(&noderev->noderev_id, &ids, binary_noderev->id)); + SVN_ERR(get_id(&noderev->node_id, &ids, binary_noderev->node_id)); + SVN_ERR(get_id(&noderev->copy_id, &ids, binary_noderev->copy_id)); + SVN_ERR(get_id(&noderev->predecessor_id, &ids, + binary_noderev->predecessor_id)); + + if (binary_noderev->flags & NODEREV_HAS_COPYFROM) + { + noderev->copyfrom_path + = svn_fs_x__string_table_get_func(paths, + binary_noderev->copyfrom_path, + NULL, + pool); + noderev->copyfrom_rev = binary_noderev->copyfrom_rev; + } + else + { + noderev->copyfrom_path = NULL; + noderev->copyfrom_rev = SVN_INVALID_REVNUM; + } + + if (binary_noderev->flags & NODEREV_HAS_COPYROOT) + { + noderev->copyroot_path + = svn_fs_x__string_table_get_func(paths, + binary_noderev->copyroot_path, + NULL, + pool); + noderev->copyroot_rev = binary_noderev->copyroot_rev; + } + else + { + noderev->copyroot_path = NULL; + noderev->copyroot_rev = 0; + } + + noderev->predecessor_count = binary_noderev->predecessor_count; + + SVN_ERR(get_representation(&noderev->prop_rep, &reps, + binary_noderev->prop_rep, pool)); + SVN_ERR(get_representation(&noderev->data_rep, &reps, + binary_noderev->data_rep, pool)); + + if (binary_noderev->flags & NODEREV_HAS_CPATH) + noderev->created_path + = svn_fs_x__string_table_get_func(paths, + binary_noderev->created_path, + NULL, + pool); + + noderev->mergeinfo_count = binary_noderev->mergeinfo_count; + + noderev->has_mergeinfo = (binary_noderev->flags & NODEREV_HAS_MINFO) ? 
1 : 0; + *out = noderev; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__mergeinfo_count_get_func(void **out, + const void *data, + apr_size_t data_len, + void *baton, + apr_pool_t *pool) +{ + binary_noderev_t *binary_noderev; + apr_array_header_t noderevs; + + apr_uint32_t idx = *(apr_uint32_t *)baton; + const svn_fs_x__noderevs_t *container = data; + + /* Resolve all container pointers */ + resolve_apr_array_header(&noderevs, container, &container->noderevs); + binary_noderev = &APR_ARRAY_IDX(&noderevs, idx, binary_noderev_t); + + *(apr_int64_t *)out = binary_noderev->mergeinfo_count; + + return SVN_NO_ERROR; +} diff --git a/subversion/libsvn_fs_x/noderevs.h b/subversion/libsvn_fs_x/noderevs.h new file mode 100644 index 0000000..f9b79dc --- /dev/null +++ b/subversion/libsvn_fs_x/noderevs.h @@ -0,0 +1,142 @@ +/* noderevs.h --- FSX node revision container + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#ifndef SVN_LIBSVN_FS__NODEREVS_H +#define SVN_LIBSVN_FS__NODEREVS_H + +#include "svn_io.h" +#include "fs.h" + +/* A collection of related noderevs tends to be widely redundant (similar + * paths, predecessor ID matching anothers ID, shared representations etc.) + * Also, the binary representation of a noderev can be much shorter than + * the ordinary textual variant. + * + * In its serialized form, the svn_fs_x__noderevs_t container extracts + * most of that redundancy and the run-time representation is also much + * smaller than sum of the respective svn_fs_x__noderev_t objects. + * + * As with other containers, this one has two modes: 'construction', in + * which you may add data to it, and 'getter' in which there is only r/o + * access to the data. + */ + +/* An opaque collection of node revisions. + */ +typedef struct svn_fs_x__noderevs_t svn_fs_x__noderevs_t; + +/* Create and populate noderev containers. */ + +/* Create and return a new noderevs container with an initial capacity of + * INITIAL_COUNT svn_fs_x__noderev_t objects. + * Allocate the result in RESULT_POOL. + */ +svn_fs_x__noderevs_t * +svn_fs_x__noderevs_create(int initial_count, + apr_pool_t *result_pool); + +/* Add NODEREV to the CONTAINER. Return the index that identifies the new + * item in this container. + */ +apr_size_t +svn_fs_x__noderevs_add(svn_fs_x__noderevs_t *container, + svn_fs_x__noderev_t *noderev); + +/* Return a rough estimate in bytes for the serialized representation + * of CONTAINER. + */ +apr_size_t +svn_fs_x__noderevs_estimate_size(const svn_fs_x__noderevs_t *container); + +/* Read from noderev containers. */ + +/* From CONTAINER, extract the noderev with the given IDX. 
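IDX is the index that svn_fs_x__noderevs_add() returned when the noderev was added to CONTAINER.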
Allocate + * the result in POOL and return it in *NODEREV_P. + */ +svn_error_t * +svn_fs_x__noderevs_get(svn_fs_x__noderev_t **noderev_p, + const svn_fs_x__noderevs_t *container, + apr_size_t idx, + apr_pool_t *pool); + +/* I/O interface. */ + +/* Write a serialized representation of CONTAINER to STREAM. + * Use SCRATCH_POOL for temporary allocations. + */ +svn_error_t * +svn_fs_x__write_noderevs_container(svn_stream_t *stream, + const svn_fs_x__noderevs_t *container, + apr_pool_t *scratch_pool); + +/* Read a noderev container from its serialized representation in STREAM. + * Allocate the result in RESULT_POOL and return it in *CONTAINER. Use + * SCRATCH_POOL for temporary allocations. + */ +svn_error_t * +svn_fs_x__read_noderevs_container(svn_fs_x__noderevs_t **container, + svn_stream_t *stream, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + +/* Implements #svn_cache__serialize_func_t for svn_fs_x__noderevs_t + * objects. + */ +svn_error_t * +svn_fs_x__serialize_noderevs_container(void **data, + apr_size_t *data_len, + void *in, + apr_pool_t *pool); + +/* Implements #svn_cache__deserialize_func_t for svn_fs_x__noderevs_t + * objects. + */ +svn_error_t * +svn_fs_x__deserialize_noderevs_container(void **out, + void *data, + apr_size_t data_len, + apr_pool_t *pool); + +/* Implements svn_cache__partial_getter_func_t for svn_fs_x__noderevs_t, + * setting *OUT to the svn_fs_x__noderev_t selected by the apr_uint32_t index + * passed in as *BATON. This function is similar to svn_fs_x__noderevs_get + * but operates on the cache serialized representation of the container. + */ +svn_error_t * +svn_fs_x__noderevs_get_func(void **out, + const void *data, + apr_size_t data_len, + void *baton, + apr_pool_t *pool); + +/* Implements svn_cache__partial_getter_func_t for the mergeinfo_count in + * the stored noderevs, setting *OUT to the apr_int64_t counter value of + * the noderev selected by the apr_uint32_t index passed in as *BATON. + */ +svn_error_t * +svn_fs_x__mergeinfo_count_get_func(void **out, + const void *data, + apr_size_t data_len, + void *baton, + apr_pool_t *pool); + +#endif diff --git a/subversion/libsvn_fs_x/pack.c b/subversion/libsvn_fs_x/pack.c new file mode 100644 index 0000000..cdbb980 --- /dev/null +++ b/subversion/libsvn_fs_x/pack.c @@ -0,0 +1,2324 @@ +/* pack.c --- FSX shard packing functionality + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ * ==================================================================== + */ +#include <assert.h> + +#include "svn_pools.h" +#include "svn_dirent_uri.h" +#include "svn_sorts.h" +#include "private/svn_sorts_private.h" +#include "private/svn_subr_private.h" +#include "private/svn_string_private.h" +#include "private/svn_temp_serializer.h" + +#include "fs_x.h" +#include "pack.h" +#include "util.h" +#include "revprops.h" +#include "transaction.h" +#include "index.h" +#include "low_level.h" +#include "cached_data.h" +#include "changes.h" +#include "noderevs.h" +#include "reps.h" + +#include "../libsvn_fs/fs-loader.h" + +#include "svn_private_config.h" +#include "temp_serializer.h" + +/* Packing logic: + * + * We pack files on a pack file basis (e.g. 1000 revs) without changing + * existing pack files nor the revision files outside the range to pack. + * + * First, we will scan the revision file indexes to determine the number + * of items to "place" (i.e. determine their optimal position within the + * future pack file). For each item, we will need a constant amount of + * memory to track it. A MAX_MEM parameter sets a limit to the number of + * items we may place in one go. That means, we may not be able to add + * all revisions at once. Instead, we will run the placement for a subset + * of revisions at a time. The very unlikely worst case will simply append + * all revision data with just a little reshuffling inside each revision. + * + * In a second step, we read all revisions in the selected range, build + * the item tracking information and copy the items themselves from the + * revision files to temporary files. The latter serve as buckets for a + * very coarse bucket presort: Separate change lists, file properties, + * directory properties and noderevs + representations from one another. + * + * The third step will determine an optimized placement for the items in + * each of the 4 buckets separately. The first three will simply order + * their items by revision, starting with the newest once. Placing rep + * and noderev items is a more elaborate process documented in the code. + * + * In short, we store items in the following order: + * - changed paths lists + * - node property + * - directory properties + * - directory representations corresponding noderevs, lexical path order + * with special treatment of "trunk" and "branches" + * - same for file representations + * + * Step 4 copies the items from the temporary buckets into the final + * pack file and writes the temporary index files. + * + * Finally, after the last range of revisions, create the final indexes. + */ + +/* Maximum amount of memory we allocate for placement information during + * the pack process. + */ +#define DEFAULT_MAX_MEM (64 * 1024 * 1024) + +/* Data structure describing a node change at PATH, REVISION. + * We will sort these instances by PATH and NODE_ID such that we can combine + * similar nodes in the same reps container and store containers in path + * major order. + */ +typedef struct path_order_t +{ + /* changed path */ + svn_prefix_string__t *path; + + /* node ID for this PATH in REVISION */ + svn_fs_x__id_t node_id; + + /* when this change happened */ + svn_revnum_t revision; + + /* this is a directory node */ + svn_boolean_t is_dir; + + /* length of the expanded representation content */ + apr_int64_t expanded_size; + + /* item ID of the noderev linked to the change. May be (0, 0). */ + svn_fs_x__id_t noderev_id; + + /* item ID of the representation containing the new data. May be (0, 0). 
*/ + svn_fs_x__id_t rep_id; +} path_order_t; + +/* Represents a reference from item FROM to item TO. FROM may be a noderev + * or rep_id while TO is (currently) always a representation. We will sort + * them by TO which allows us to collect all dependent items. + */ +typedef struct reference_t +{ + svn_fs_x__id_t to; + svn_fs_x__id_t from; +} reference_t; + +/* This structure keeps track of all the temporary data and status that + * needs to be kept around during the creation of one pack file. After + * each revision range (in case we can't process all revs at once due to + * memory restrictions), parts of the data will get re-initialized. + */ +typedef struct pack_context_t +{ + /* file system that we operate on */ + svn_fs_t *fs; + + /* cancel function to invoke at regular intervals. May be NULL */ + svn_cancel_func_t cancel_func; + + /* baton to pass to CANCEL_FUNC */ + void *cancel_baton; + + /* first revision in the shard (and future pack file) */ + svn_revnum_t shard_rev; + + /* first revision in the range to process (>= SHARD_REV) */ + svn_revnum_t start_rev; + + /* first revision after the range to process (<= SHARD_END_REV) */ + svn_revnum_t end_rev; + + /* first revision after the current shard */ + svn_revnum_t shard_end_rev; + + /* log-to-phys proto index for the whole pack file */ + apr_file_t *proto_l2p_index; + + /* phys-to-log proto index for the whole pack file */ + apr_file_t *proto_p2l_index; + + /* full shard directory path (containing the unpacked revisions) */ + const char *shard_dir; + + /* full packed shard directory path (containing the pack file + indexes) */ + const char *pack_file_dir; + + /* full pack file path (including PACK_FILE_DIR) */ + const char *pack_file_path; + + /* current write position (i.e. file length) in the pack file */ + apr_off_t pack_offset; + + /* the pack file to ultimately write all data to */ + apr_file_t *pack_file; + + /* array of svn_fs_x__p2l_entry_t *, all referring to change lists. + * Will be filled in phase 2 and be cleared after each revision range. */ + apr_array_header_t *changes; + + /* temp file receiving all change list items (referenced by CHANGES). + * Will be filled in phase 2 and be cleared after each revision range. */ + apr_file_t *changes_file; + + /* array of svn_fs_x__p2l_entry_t *, all referring to file properties. + * Will be filled in phase 2 and be cleared after each revision range. */ + apr_array_header_t *file_props; + + /* temp file receiving all file prop items (referenced by FILE_PROPS). + * Will be filled in phase 2 and be cleared after each revision range.*/ + apr_file_t *file_props_file; + + /* array of svn_fs_x__p2l_entry_t *, all referring to directory properties. + * Will be filled in phase 2 and be cleared after each revision range. */ + apr_array_header_t *dir_props; + + /* temp file receiving all directory prop items (referenced by DIR_PROPS). + * Will be filled in phase 2 and be cleared after each revision range.*/ + apr_file_t *dir_props_file; + + /* container for all PATH members in PATH_ORDER. */ + svn_prefix_tree__t *paths; + + /* array of path_order_t *. Will be filled in phase 2 and be cleared + * after each revision range. Sorted by PATH, NODE_ID. */ + apr_array_header_t *path_order; + + /* array of reference_t* linking representations to their delta bases. + * Will be filled in phase 2 and be cleared after each revision range. + * It will be sorted by the FROM members (for rep->base rep lookup). */ + apr_array_header_t *references; + + /* array of svn_fs_x__p2l_entry_t*. 
Will be filled in phase 2 and be + * cleared after each revision range. During phase 3, we will set items + * to NULL that we already processed. */ + apr_array_header_t *reps; + + /* array of int, marking for each revision, the which offset their items + * begin in REPS. Will be filled in phase 2 and be cleared after + * each revision range. */ + apr_array_header_t *rev_offsets; + + /* temp file receiving all items referenced by REPS. + * Will be filled in phase 2 and be cleared after each revision range.*/ + apr_file_t *reps_file; + + /* pool used for temporary data structures that will be cleaned up when + * the next range of revisions is being processed */ + apr_pool_t *info_pool; +} pack_context_t; + +/* Create and initialize a new pack context for packing shard SHARD_REV in + * SHARD_DIR into PACK_FILE_DIR within filesystem FS. Allocate it in POOL + * and return the structure in *CONTEXT. + * + * Limit the number of items being copied per iteration to MAX_ITEMS. + * Set CANCEL_FUNC and CANCEL_BATON as well. + */ +static svn_error_t * +initialize_pack_context(pack_context_t *context, + svn_fs_t *fs, + const char *pack_file_dir, + const char *shard_dir, + svn_revnum_t shard_rev, + int max_items, + svn_cancel_func_t cancel_func, + void *cancel_baton, + apr_pool_t *pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + const char *temp_dir; + int max_revs = MIN(ffd->max_files_per_dir, max_items); + + SVN_ERR_ASSERT(shard_rev % ffd->max_files_per_dir == 0); + + /* where we will place our various temp files */ + SVN_ERR(svn_io_temp_dir(&temp_dir, pool)); + + /* store parameters */ + context->fs = fs; + context->cancel_func = cancel_func; + context->cancel_baton = cancel_baton; + + context->shard_rev = shard_rev; + context->start_rev = shard_rev; + context->end_rev = shard_rev; + context->shard_end_rev = shard_rev + ffd->max_files_per_dir; + + /* Create the new directory and pack file. 
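Opening it with APR_EXCL below makes sure we fail instead of overwriting a pack file that already exists.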
*/ + context->shard_dir = shard_dir; + context->pack_file_dir = pack_file_dir; + context->pack_file_path + = svn_dirent_join(pack_file_dir, PATH_PACKED, pool); + SVN_ERR(svn_io_file_open(&context->pack_file, context->pack_file_path, + APR_WRITE | APR_BUFFERED | APR_BINARY | APR_EXCL + | APR_CREATE, APR_OS_DEFAULT, pool)); + + /* Proto index files */ + SVN_ERR(svn_fs_x__l2p_proto_index_open( + &context->proto_l2p_index, + svn_dirent_join(pack_file_dir, + PATH_INDEX PATH_EXT_L2P_INDEX, + pool), + pool)); + SVN_ERR(svn_fs_x__p2l_proto_index_open( + &context->proto_p2l_index, + svn_dirent_join(pack_file_dir, + PATH_INDEX PATH_EXT_P2L_INDEX, + pool), + pool)); + + /* item buckets: one item info array and one temp file per bucket */ + context->changes = apr_array_make(pool, max_items, + sizeof(svn_fs_x__p2l_entry_t *)); + SVN_ERR(svn_io_open_unique_file3(&context->changes_file, NULL, temp_dir, + svn_io_file_del_on_close, pool, pool)); + context->file_props = apr_array_make(pool, max_items, + sizeof(svn_fs_x__p2l_entry_t *)); + SVN_ERR(svn_io_open_unique_file3(&context->file_props_file, NULL, temp_dir, + svn_io_file_del_on_close, pool, pool)); + context->dir_props = apr_array_make(pool, max_items, + sizeof(svn_fs_x__p2l_entry_t *)); + SVN_ERR(svn_io_open_unique_file3(&context->dir_props_file, NULL, temp_dir, + svn_io_file_del_on_close, pool, pool)); + + /* noderev and representation item bucket */ + context->rev_offsets = apr_array_make(pool, max_revs, sizeof(int)); + context->path_order = apr_array_make(pool, max_items, + sizeof(path_order_t *)); + context->references = apr_array_make(pool, max_items, + sizeof(reference_t *)); + context->reps = apr_array_make(pool, max_items, + sizeof(svn_fs_x__p2l_entry_t *)); + SVN_ERR(svn_io_open_unique_file3(&context->reps_file, NULL, temp_dir, + svn_io_file_del_on_close, pool, pool)); + + /* the pool used for temp structures */ + context->info_pool = svn_pool_create(pool); + context->paths = svn_prefix_tree__create(context->info_pool); + + return SVN_NO_ERROR; +} + +/* Clean up / free all revision range specific data and files in CONTEXT. + * Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +reset_pack_context(pack_context_t *context, + apr_pool_t *scratch_pool) +{ + apr_array_clear(context->changes); + SVN_ERR(svn_io_file_trunc(context->changes_file, 0, scratch_pool)); + apr_array_clear(context->file_props); + SVN_ERR(svn_io_file_trunc(context->file_props_file, 0, scratch_pool)); + apr_array_clear(context->dir_props); + SVN_ERR(svn_io_file_trunc(context->dir_props_file, 0, scratch_pool)); + + apr_array_clear(context->rev_offsets); + apr_array_clear(context->path_order); + apr_array_clear(context->references); + apr_array_clear(context->reps); + SVN_ERR(svn_io_file_trunc(context->reps_file, 0, scratch_pool)); + + svn_pool_clear(context->info_pool); + + return SVN_NO_ERROR; +} + +/* Call this after the last revision range. It will finalize all index files + * for CONTEXT and close any open files. + * Use SCRATCH_POOL for temporary allocations. 
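+ *
+ * A rough sketch of the intended calling sequence (the actual driver code
+ * follows further down in this file):
+ *
+ *   pack_context_t ctx = { 0 };
+ *   SVN_ERR(initialize_pack_context(&ctx, fs, pack_file_dir, shard_dir,
+ *                                   shard_rev, max_items,
+ *                                   cancel_func, cancel_baton, pool));
+ *   while (more revision ranges remain)
+ *     {
+ *       ... copy, sort and place the items of the current range ...
+ *       SVN_ERR(reset_pack_context(&ctx, iterpool));
+ *     }
+ *   SVN_ERR(close_pack_context(&ctx, pool));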
+ */ +static svn_error_t * +close_pack_context(pack_context_t *context, + apr_pool_t *scratch_pool) +{ + const char *proto_l2p_index_path; + const char *proto_p2l_index_path; + + /* need the file names for the actual index creation call further down */ + SVN_ERR(svn_io_file_name_get(&proto_l2p_index_path, + context->proto_l2p_index, scratch_pool)); + SVN_ERR(svn_io_file_name_get(&proto_p2l_index_path, + context->proto_p2l_index, scratch_pool)); + + /* finalize proto index files */ + SVN_ERR(svn_io_file_close(context->proto_l2p_index, scratch_pool)); + SVN_ERR(svn_io_file_close(context->proto_p2l_index, scratch_pool)); + + /* Append the actual index data to the pack file. */ + SVN_ERR(svn_fs_x__add_index_data(context->fs, context->pack_file, + proto_l2p_index_path, + proto_p2l_index_path, + context->shard_rev, + scratch_pool)); + + /* remove proto index files */ + SVN_ERR(svn_io_remove_file2(proto_l2p_index_path, FALSE, scratch_pool)); + SVN_ERR(svn_io_remove_file2(proto_p2l_index_path, FALSE, scratch_pool)); + + SVN_ERR(svn_io_file_close(context->pack_file, scratch_pool)); + + return SVN_NO_ERROR; +} + +/* Efficiently copy SIZE bytes from SOURCE to DEST. Invoke the CANCEL_FUNC + * from CONTEXT at regular intervals. + * Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +copy_file_data(pack_context_t *context, + apr_file_t *dest, + apr_file_t *source, + apr_off_t size, + apr_pool_t *scratch_pool) +{ + /* most non-representation items will be small. Minimize the buffer + * and infrastructure overhead in that case. */ + enum { STACK_BUFFER_SIZE = 1024 }; + + if (size < STACK_BUFFER_SIZE) + { + /* copy small data using a fixed-size buffer on stack */ + char buffer[STACK_BUFFER_SIZE]; + SVN_ERR(svn_io_file_read_full2(source, buffer, (apr_size_t)size, + NULL, NULL, scratch_pool)); + SVN_ERR(svn_io_file_write_full(dest, buffer, (apr_size_t)size, + NULL, scratch_pool)); + } + else + { + /* use streaming copies for larger data blocks. That may require + * the allocation of larger buffers and we should make sure that + * this extra memory is released asap. */ + svn_fs_x__data_t *ffd = context->fs->fsap_data; + apr_pool_t *copypool = svn_pool_create(scratch_pool); + char *buffer = apr_palloc(copypool, ffd->block_size); + + while (size) + { + apr_size_t to_copy = (apr_size_t)(MIN(size, ffd->block_size)); + if (context->cancel_func) + SVN_ERR(context->cancel_func(context->cancel_baton)); + + SVN_ERR(svn_io_file_read_full2(source, buffer, to_copy, + NULL, NULL, scratch_pool)); + SVN_ERR(svn_io_file_write_full(dest, buffer, to_copy, + NULL, scratch_pool)); + + size -= to_copy; + } + + svn_pool_destroy(copypool); + } + + return SVN_NO_ERROR; +} + +/* Writes SIZE bytes, all 0, to DEST. + * Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +write_null_bytes(apr_file_t *dest, + apr_off_t size, + apr_pool_t *scratch_pool) +{ + /* Have a collection of high-quality, easy to access NUL bytes handy. */ + enum { BUFFER_SIZE = 1024 }; + static const char buffer[BUFFER_SIZE] = { 0 }; + + /* copy SIZE of them into the file's buffer */ + while (size) + { + apr_size_t to_write = MIN(size, BUFFER_SIZE); + SVN_ERR(svn_io_file_write_full(dest, buffer, to_write, NULL, + scratch_pool)); + size -= to_write; + } + + return SVN_NO_ERROR; +} + +/* Copy the "simple" item (changed paths list or property representation) + * from the current position in REV_FILE to TEMP_FILE using CONTEXT. 
Add + * a copy of ENTRY to ENTRIES but with an updated offset value that points + * to the copy destination in TEMP_FILE. + * Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +copy_item_to_temp(pack_context_t *context, + apr_array_header_t *entries, + apr_file_t *temp_file, + svn_fs_x__revision_file_t *rev_file, + svn_fs_x__p2l_entry_t *entry, + apr_pool_t *scratch_pool) +{ + svn_fs_x__p2l_entry_t *new_entry + = svn_fs_x__p2l_entry_dup(entry, context->info_pool); + + SVN_ERR(svn_fs_x__get_file_offset(&new_entry->offset, temp_file, + scratch_pool)); + APR_ARRAY_PUSH(entries, svn_fs_x__p2l_entry_t *) = new_entry; + + SVN_ERR(copy_file_data(context, temp_file, rev_file->file, entry->size, + scratch_pool)); + + return SVN_NO_ERROR; +} + +/* Return the offset within CONTEXT->REPS that corresponds to item + * ITEM_INDEX in REVISION. + */ +static int +get_item_array_index(pack_context_t *context, + svn_revnum_t revision, + apr_int64_t item_index) +{ + assert(revision >= context->start_rev); + return (int)item_index + APR_ARRAY_IDX(context->rev_offsets, + revision - context->start_rev, + int); +} + +/* Write INFO to the correct position in CONTEXT->REPS. The latter may + * need auto-expanding. Overwriting an array element is not allowed. + */ +static void +add_item_rep_mapping(pack_context_t *context, + svn_fs_x__p2l_entry_t *entry) +{ + int idx; + assert(entry->item_count == 1); + + /* index of INFO */ + idx = get_item_array_index(context, + entry->items[0].change_set, + entry->items[0].number); + + /* make sure the index exists in the array */ + while (context->reps->nelts <= idx) + APR_ARRAY_PUSH(context->reps, void *) = NULL; + + /* set the element. If there is already an entry, there are probably + * two items claiming to be the same -> bail out */ + assert(!APR_ARRAY_IDX(context->reps, idx, void *)); + APR_ARRAY_IDX(context->reps, idx, void *) = entry; +} + +/* Return the P2L entry from CONTEXT->REPS for the given ID. If there is + * none (or not anymore), return NULL. If RESET has been specified, set + * the array entry to NULL after returning the entry. + */ +static svn_fs_x__p2l_entry_t * +get_item(pack_context_t *context, + const svn_fs_x__id_t *id, + svn_boolean_t reset) +{ + svn_fs_x__p2l_entry_t *result = NULL; + svn_revnum_t revision = svn_fs_x__get_revnum(id->change_set); + if (id->number && revision >= context->start_rev) + { + int idx = get_item_array_index(context, revision, id->number); + if (context->reps->nelts > idx) + { + result = APR_ARRAY_IDX(context->reps, idx, void *); + if (result && reset) + APR_ARRAY_IDX(context->reps, idx, void *) = NULL; + } + } + + return result; +} + +/* Copy representation item identified by ENTRY from the current position + * in REV_FILE into CONTEXT->REPS_FILE. Add all tracking into needed by + * our placement algorithm to CONTEXT. + * Use SCRATCH_POOL for temporary allocations. 
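+ * ("Tracking info" is the entry's position registered via
+ * add_item_rep_mapping() plus, if the rep is a delta against a base
+ * within the current revision range, a reference_t linking rep and base.)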
+ */ +static svn_error_t * +copy_rep_to_temp(pack_context_t *context, + svn_fs_x__revision_file_t *rev_file, + svn_fs_x__p2l_entry_t *entry, + apr_pool_t *scratch_pool) +{ + svn_fs_x__rep_header_t *rep_header; + apr_off_t source_offset = entry->offset; + + /* create a copy of ENTRY, make it point to the copy destination and + * store it in CONTEXT */ + entry = svn_fs_x__p2l_entry_dup(entry, context->info_pool); + SVN_ERR(svn_fs_x__get_file_offset(&entry->offset, context->reps_file, + scratch_pool)); + add_item_rep_mapping(context, entry); + + /* read & parse the representation header */ + SVN_ERR(svn_fs_x__read_rep_header(&rep_header, rev_file->stream, + scratch_pool, scratch_pool)); + + /* if the representation is a delta against some other rep, link the two */ + if ( rep_header->type == svn_fs_x__rep_delta + && rep_header->base_revision >= context->start_rev) + { + reference_t *reference = apr_pcalloc(context->info_pool, + sizeof(*reference)); + reference->from = entry->items[0]; + reference->to.change_set + = svn_fs_x__change_set_by_rev(rep_header->base_revision); + reference->to.number = rep_header->base_item_index; + APR_ARRAY_PUSH(context->references, reference_t *) = reference; + } + + /* copy the whole rep (including header!) to our temp file */ + SVN_ERR(svn_io_file_seek(rev_file->file, APR_SET, &source_offset, + scratch_pool)); + SVN_ERR(copy_file_data(context, context->reps_file, rev_file->file, + entry->size, scratch_pool)); + + return SVN_NO_ERROR; +} + +/* Directories first, dirs / files sorted by name in reverse lexical order. + * This maximizes the chance of two items being located close to one another + * in *all* pack files independent of their change order. It also groups + * multi-project repos nicely according to their sub-projects. The reverse + * order aspect gives "trunk" preference over "tags" and "branches", so + * trunk-related items are more likely to be contiguous. + */ +static int +compare_dir_entries(const svn_sort__item_t *a, + const svn_sort__item_t *b) +{ + const svn_fs_dirent_t *lhs = (const svn_fs_dirent_t *) a->value; + const svn_fs_dirent_t *rhs = (const svn_fs_dirent_t *) b->value; + + if (lhs->kind != rhs->kind) + return lhs->kind == svn_node_dir ? -1 : 1; + + return strcmp(lhs->name, rhs->name); +} + +apr_array_header_t * +svn_fs_x__order_dir_entries(svn_fs_t *fs, + apr_hash_t *directory, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + apr_array_header_t *ordered + = svn_sort__hash(directory, compare_dir_entries, scratch_pool); + + apr_array_header_t *result + = apr_array_make(result_pool, ordered->nelts, sizeof(svn_fs_dirent_t *)); + + int i; + for (i = 0; i < ordered->nelts; ++i) + APR_ARRAY_PUSH(result, svn_fs_dirent_t *) + = APR_ARRAY_IDX(ordered, i, svn_sort__item_t).value; + + return result; +} + +/* Return a duplicate of the the ORIGINAL path and with special sub-strins + * (e.g. "trunk") modified in such a way that have a lower lexicographic + * value than any other "normal" file name. + */ +static const char * +tweak_path_for_ordering(const char *original, + apr_pool_t *pool) +{ + /* We may add further special cases as needed. */ + enum {SPECIAL_COUNT = 2}; + static const char *special[SPECIAL_COUNT] = {"trunk", "branch"}; + char *pos; + char *path = apr_pstrdup(pool, original); + int i; + + /* Replace the first char of any "special" sub-string we find by + * a control char, i.e. '\1' .. '\31'. 
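For example, "/trunk/subversion/libsvn_fs_x/pack.c" would be compared as if it read "/\1runk/subversion/libsvn_fs_x/pack.c".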
In the rare event that this + * would clash with existing paths, no data will be lost but merely + * the node ordering will be sub-optimal. + */ + for (i = 0; i < SPECIAL_COUNT; ++i) + for (pos = strstr(path, special[i]); + pos; + pos = strstr(pos + 1, special[i])) + { + *pos = (char)(i + '\1'); + } + + return path; +} + +/* Copy node revision item identified by ENTRY from the current position + * in REV_FILE into CONTEXT->REPS_FILE. Add all tracking into needed by + * our placement algorithm to CONTEXT. + * Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +copy_node_to_temp(pack_context_t *context, + svn_fs_x__revision_file_t *rev_file, + svn_fs_x__p2l_entry_t *entry, + apr_pool_t *scratch_pool) +{ + path_order_t *path_order = apr_pcalloc(context->info_pool, + sizeof(*path_order)); + svn_fs_x__noderev_t *noderev; + const char *sort_path; + apr_off_t source_offset = entry->offset; + + /* read & parse noderev */ + SVN_ERR(svn_fs_x__read_noderev(&noderev, rev_file->stream, scratch_pool, + scratch_pool)); + + /* create a copy of ENTRY, make it point to the copy destination and + * store it in CONTEXT */ + entry = svn_fs_x__p2l_entry_dup(entry, context->info_pool); + SVN_ERR(svn_fs_x__get_file_offset(&entry->offset, context->reps_file, + scratch_pool)); + add_item_rep_mapping(context, entry); + + /* copy the noderev to our temp file */ + SVN_ERR(svn_io_file_seek(rev_file->file, APR_SET, &source_offset, + scratch_pool)); + SVN_ERR(copy_file_data(context, context->reps_file, rev_file->file, + entry->size, scratch_pool)); + + /* if the node has a data representation, make that the node's "base". + * This will (often) cause the noderev to be placed right in front of + * its data representation. */ + + if (noderev->data_rep + && svn_fs_x__get_revnum(noderev->data_rep->id.change_set) + >= context->start_rev) + { + reference_t *reference = apr_pcalloc(context->info_pool, + sizeof(*reference)); + reference->from = entry->items[0]; + reference->to.change_set = noderev->data_rep->id.change_set; + reference->to.number = noderev->data_rep->id.number; + APR_ARRAY_PUSH(context->references, reference_t *) = reference; + + path_order->rep_id = reference->to; + path_order->expanded_size = noderev->data_rep->expanded_size; + } + + /* Sort path is the key used for ordering noderevs and associated reps. + * It will not be stored in the final pack file. */ + sort_path = tweak_path_for_ordering(noderev->created_path, scratch_pool); + path_order->path = svn_prefix_string__create(context->paths, sort_path); + path_order->node_id = noderev->node_id; + path_order->revision = svn_fs_x__get_revnum(noderev->noderev_id.change_set); + path_order->is_dir = noderev->kind == svn_node_dir; + path_order->noderev_id = noderev->noderev_id; + APR_ARRAY_PUSH(context->path_order, path_order_t *) = path_order; + + return SVN_NO_ERROR; +} + +/* implements compare_fn_t. Place LHS before RHS, if the latter is older. + */ +static int +compare_p2l_info(const svn_fs_x__p2l_entry_t * const * lhs, + const svn_fs_x__p2l_entry_t * const * rhs) +{ + assert(*lhs != *rhs); + if ((*lhs)->item_count == 0) + return (*lhs)->item_count == 0 ? 0 : -1; + if ((*lhs)->item_count == 0) + return 1; + + if ((*lhs)->items[0].change_set == (*rhs)->items[0].change_set) + return (*lhs)->items[0].number > (*rhs)->items[0].number ? -1 : 1; + + return (*lhs)->items[0].change_set > (*rhs)->items[0].change_set ? -1 : 1; +} + +/* Sort svn_fs_x__p2l_entry_t * array ENTRIES by age. Place the latest + * items first. 
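+ * ("Latest" is defined by compare_p2l_info() above: entries with higher
+ * change sets, and within one change set higher item numbers, sort first.)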
+ */ +static void +sort_items(apr_array_header_t *entries) +{ + svn_sort__array(entries, + (int (*)(const void *, const void *))compare_p2l_info); +} + +/* implements compare_fn_t. Sort descending by PATH, NODE_ID and REVISION. + */ +static int +compare_path_order(const path_order_t * const * lhs_p, + const path_order_t * const * rhs_p) +{ + const path_order_t * lhs = *lhs_p; + const path_order_t * rhs = *rhs_p; + + /* cluster all directories */ + int diff = rhs->is_dir - lhs->is_dir; + if (diff) + return diff; + + /* lexicographic order on path and node (i.e. latest first) */ + diff = svn_prefix_string__compare(lhs->path, rhs->path); + if (diff) + return diff; + + /* reverse order on node (i.e. latest first) */ + diff = svn_fs_x__id_compare(&rhs->node_id, &lhs->node_id); + if (diff) + return diff; + + /* reverse order on revision (i.e. latest first) */ + if (lhs->revision != rhs->revision) + return lhs->revision < rhs->revision ? 1 : -1; + + return 0; +} + +/* implements compare_fn_t. Sort ascending by TO, FROM. + */ +static int +compare_references(const reference_t * const * lhs_p, + const reference_t * const * rhs_p) +{ + const reference_t * lhs = *lhs_p; + const reference_t * rhs = *rhs_p; + + int diff = svn_fs_x__id_compare(&lhs->to, &rhs->to); + return diff ? diff : svn_fs_x__id_compare(&lhs->from, &rhs->from); +} + +/* Order the data collected in CONTEXT such that we can place them in the + * desired order. + */ +static void +sort_reps(pack_context_t *context) +{ + svn_sort__array(context->path_order, + (int (*)(const void *, const void *))compare_path_order); + svn_sort__array(context->references, + (int (*)(const void *, const void *))compare_references); +} + +/* Return the remaining unused bytes in the current block in CONTEXT's + * pack file. + */ +static apr_ssize_t +get_block_left(pack_context_t *context) +{ + svn_fs_x__data_t *ffd = context->fs->fsap_data; + return ffd->block_size - (context->pack_offset % ffd->block_size); +} + +/* To prevent items from overlapping a block boundary, we will usually + * put them into the next block and top up the old one with NUL bytes. + * Pad CONTEXT's pack file to the end of the current block, if that padding + * is short enough. Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +auto_pad_block(pack_context_t *context, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = context->fs->fsap_data; + + /* This is the maximum number of bytes "wasted" that way per block. + * Larger items will cross the block boundaries. */ + const apr_off_t max_padding = MAX(ffd->block_size / 50, 512); + + /* Is wasted space small enough to align the current item to the next + * block? */ + apr_off_t padding = get_block_left(context); + + if (padding < max_padding) + { + /* Yes. To up with NUL bytes and don't forget to create + * an P2L index entry marking this section as unused. */ + svn_fs_x__p2l_entry_t null_entry; + + null_entry.offset = context->pack_offset; + null_entry.size = padding; + null_entry.type = SVN_FS_X__ITEM_TYPE_UNUSED; + null_entry.fnv1_checksum = 0; + null_entry.item_count = 0; + null_entry.items = NULL; + + SVN_ERR(write_null_bytes(context->pack_file, padding, scratch_pool)); + SVN_ERR(svn_fs_x__p2l_proto_index_add_entry + (context->proto_p2l_index, &null_entry, scratch_pool)); + context->pack_offset += padding; + } + + return SVN_NO_ERROR; +} + +/* Return the index of the first entry in CONTEXT->REFERENCES that + * references ITEM->ITEMS[0] if such entries exist. All matching items + * will be consecutive. 
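+/* Implementation note: find_first_reference() below is a lower-bound
+ * binary search; it relies on CONTEXT->REFERENCES having been sorted
+ * with compare_references(), see sort_reps() above. */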
+ */
+static int
+find_first_reference(pack_context_t *context,
+                     svn_fs_x__p2l_entry_t *item)
+{
+  int lower = 0;
+  int upper = context->references->nelts - 1;
+
+  while (lower <= upper)
+    {
+      int current = lower + (upper - lower) / 2;
+      reference_t *reference
+        = APR_ARRAY_IDX(context->references, current, reference_t *);
+
+      if (svn_fs_x__id_compare(&reference->to, item->items) < 0)
+        lower = current + 1;
+      else
+        upper = current - 1;
+    }
+
+  return lower;
+}
+
+/* Check whether entry number IDX in CONTEXT->REFERENCES references ITEM.
+ */
+static svn_boolean_t
+is_reference_match(pack_context_t *context,
+                   int idx,
+                   svn_fs_x__p2l_entry_t *item)
+{
+  reference_t *reference;
+  if (context->references->nelts <= idx)
+    return FALSE;
+
+  reference = APR_ARRAY_IDX(context->references, idx, reference_t *);
+  return svn_fs_x__id_eq(&reference->to, item->items);
+}
+
+/* Starting at IDX in CONTEXT->PATH_ORDER, select all representations and
+ * noderevs that should be placed into the same container, respectively.
+ * The path_order_t * elements encountered will be appended to SELECTED,
+ * the svn_fs_x__p2l_entry_t * of the representations that should be placed
+ * into the same reps container will be appended to REP_PARTS and the
+ * svn_fs_x__p2l_entry_t * of the noderevs referencing those reps will
+ * be appended to NODE_PARTS.
+ *
+ * Remove all returned items from the CONTEXT->REPS container and prevent
+ * them from being placed a second time later on.  That also means that the
+ * caller has to place all items returned.
+ */
+static svn_error_t *
+select_reps(pack_context_t *context,
+            int idx,
+            apr_array_header_t *selected,
+            apr_array_header_t *node_parts,
+            apr_array_header_t *rep_parts)
+{
+  apr_array_header_t *path_order = context->path_order;
+  path_order_t *start_path = APR_ARRAY_IDX(path_order, idx, path_order_t *);
+
+  svn_fs_x__p2l_entry_t *node_part;
+  svn_fs_x__p2l_entry_t *rep_part;
+  svn_fs_x__p2l_entry_t *depending;
+  int i, k;
+
+  /* collect all path_order records as well as rep and noderev items
+   * that occupy the same path with the same node. */
+  for (; idx < path_order->nelts; ++idx)
+    {
+      path_order_t *current_path
+        = APR_ARRAY_IDX(path_order, idx, path_order_t *);
+
+      if (!svn_fs_x__id_eq(&start_path->node_id, &current_path->node_id))
+        break;
+
+      APR_ARRAY_IDX(path_order, idx, path_order_t *) = NULL;
+      node_part = get_item(context, &current_path->noderev_id, TRUE);
+      rep_part = get_item(context, &current_path->rep_id, TRUE);
+
+      if (node_part && rep_part)
+        APR_ARRAY_PUSH(selected, path_order_t *) = current_path;
+
+      if (node_part)
+        APR_ARRAY_PUSH(node_parts, svn_fs_x__p2l_entry_t *) = node_part;
+      if (rep_part)
+        APR_ARRAY_PUSH(rep_parts, svn_fs_x__p2l_entry_t *) = rep_part;
+    }
+
+  /* collect depending reps and noderevs that reference any of the collected
+   * reps */
+  for (i = 0; i < rep_parts->nelts; ++i)
+    {
+      rep_part = APR_ARRAY_IDX(rep_parts, i, svn_fs_x__p2l_entry_t*);
+      for (k = find_first_reference(context, rep_part);
+           is_reference_match(context, k, rep_part);
+           ++k)
+        {
+          reference_t *reference
+            = APR_ARRAY_IDX(context->references, k, reference_t *);
+
+          depending = get_item(context, &reference->from, TRUE);
+          if (!depending)
+            continue;
+
+          if (depending->type == SVN_FS_X__ITEM_TYPE_NODEREV)
+            APR_ARRAY_PUSH(node_parts, svn_fs_x__p2l_entry_t *) = depending;
+          else
+            APR_ARRAY_PUSH(rep_parts, svn_fs_x__p2l_entry_t *) = depending;
+        }
+    }
+
+  return SVN_NO_ERROR;
+}
+
+/* Return TRUE if all path_order_t * in SELECTED reference content that is
+ * not longer than LIMIT.
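find_first_reference() above is a plain lower-bound binary search over CONTEXT->REFERENCES, which compare_references() keeps sorted by TO, then FROM. The same pattern over a plain int array, as an illustrative sketch (not Subversion API):

/* Illustrative only: the "lower bound" scheme used by find_first_reference(),
 * applied to a plain int array.  Returns the index of the first element in
 * SORTED[0 .. COUNT-1] that is not smaller than KEY, or COUNT if none is. */
static int
lower_bound(const int *sorted, int count, int key)
{
  int lower = 0;
  int upper = count - 1;

  while (lower <= upper)
    {
      int current = lower + (upper - lower) / 2;

      if (sorted[current] < key)
        lower = current + 1;
      else
        upper = current - 1;
    }

  return lower;
}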
+ */ +static svn_boolean_t +reps_fit_into_containers(apr_array_header_t *selected, + apr_uint64_t limit) +{ + int i; + for (i = 0; i < selected->nelts; ++i) + if (APR_ARRAY_IDX(selected, i, path_order_t *)->expanded_size > limit) + return FALSE; + + return TRUE; +} + +/* Write the *CONTAINER containing the noderevs described by the + * svn_fs_x__p2l_entry_t * in ITEMS to the pack file on CONTEXT. + * Append a P2L entry for the container to CONTAINER->REPS. + * Afterwards, clear ITEMS and re-allocate *CONTAINER in CONTAINER_POOL + * so the caller may fill them again. + * Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +write_nodes_container(pack_context_t *context, + svn_fs_x__noderevs_t **container, + apr_array_header_t *items, + apr_pool_t *container_pool, + apr_pool_t *scratch_pool) +{ + int i; + apr_off_t offset = 0; + svn_fs_x__p2l_entry_t *container_entry; + svn_stream_t *pack_stream; + + if (items->nelts == 0) + return SVN_NO_ERROR; + + /* serialize container */ + container_entry = apr_palloc(context->info_pool, sizeof(*container_entry)); + pack_stream = svn_checksum__wrap_write_stream_fnv1a_32x4 + (&container_entry->fnv1_checksum, + svn_stream_from_aprfile2(context->pack_file, + TRUE, scratch_pool), + scratch_pool); + SVN_ERR(svn_fs_x__write_noderevs_container(pack_stream, *container, + scratch_pool)); + SVN_ERR(svn_stream_close(pack_stream)); + SVN_ERR(svn_io_file_seek(context->pack_file, APR_CUR, &offset, + scratch_pool)); + + /* replace first noderev item in ENTRIES with the container + and set all others to NULL */ + container_entry->offset = context->pack_offset; + container_entry->size = offset - container_entry->offset; + container_entry->type = SVN_FS_X__ITEM_TYPE_NODEREVS_CONT; + container_entry->item_count = items->nelts; + container_entry->items = apr_palloc(context->info_pool, + sizeof(svn_fs_x__id_t) * container_entry->item_count); + + for (i = 0; i < items->nelts; ++i) + container_entry->items[i] + = APR_ARRAY_IDX(items, i, svn_fs_x__p2l_entry_t *)->items[0]; + + context->pack_offset = offset; + APR_ARRAY_PUSH(context->reps, svn_fs_x__p2l_entry_t *) + = container_entry; + + /* Write P2L index for copied items, i.e. the 1 container */ + SVN_ERR(svn_fs_x__p2l_proto_index_add_entry + (context->proto_p2l_index, container_entry, scratch_pool)); + + svn_pool_clear(container_pool); + *container = svn_fs_x__noderevs_create(16, container_pool); + apr_array_clear(items); + + return SVN_NO_ERROR; +} + +/* Read the noderevs given by the svn_fs_x__p2l_entry_t * in NODE_PARTS + * from TEMP_FILE and add them to *CONTAINER and NODES_IN_CONTAINER. + * Whenever the container grows bigger than the current block in CONTEXT, + * write the data to disk and continue in the next block. + * + * Use CONTAINER_POOL to re-allocate the *CONTAINER as necessary and + * SCRATCH_POOL to temporary allocations. + */ +static svn_error_t * +store_nodes(pack_context_t *context, + apr_file_t *temp_file, + apr_array_header_t *node_parts, + svn_fs_x__noderevs_t **container, + apr_array_header_t *nodes_in_container, + apr_pool_t *container_pool, + apr_pool_t *scratch_pool) +{ + int i; + + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + svn_stream_t *stream + = svn_stream_from_aprfile2(temp_file, TRUE, scratch_pool); + + /* number of bytes in the current block not being spent on fixed-size + items (i.e. those not put into the container). 
*/ + apr_size_t capacity_left = get_block_left(context); + + /* Estimated noderev container size */ + apr_size_t last_container_size = 0, container_size = 0; + + /* Estimate extra capacity we will gain from container compression. */ + apr_size_t pack_savings = 0; + for (i = 0; i < node_parts->nelts; ++i) + { + svn_fs_x__noderev_t *noderev; + svn_fs_x__p2l_entry_t *entry + = APR_ARRAY_IDX(node_parts, i, svn_fs_x__p2l_entry_t *); + + /* if we reached the limit, check whether we saved some space + through the container. */ + if (capacity_left + pack_savings < container_size + entry->size) + container_size = svn_fs_x__noderevs_estimate_size(*container); + + /* If necessary and the container is large enough, try harder + by actually serializing the container and determine current + savings due to compression. */ + if ( capacity_left + pack_savings < container_size + entry->size + && container_size > last_container_size + 2000) + { + svn_stringbuf_t *serialized + = svn_stringbuf_create_ensure(container_size, iterpool); + svn_stream_t *temp_stream + = svn_stream_from_stringbuf(serialized, iterpool); + + SVN_ERR(svn_fs_x__write_noderevs_container(temp_stream, *container, + iterpool)); + SVN_ERR(svn_stream_close(temp_stream)); + + last_container_size = container_size; + pack_savings = container_size - serialized->len; + } + + /* still doesn't fit? -> block is full. Flush */ + if ( capacity_left + pack_savings < container_size + entry->size + && nodes_in_container->nelts < 2) + { + SVN_ERR(auto_pad_block(context, iterpool)); + capacity_left = get_block_left(context); + } + + /* still doesn't fit? -> block is full. Flush */ + if (capacity_left + pack_savings < container_size + entry->size) + { + SVN_ERR(write_nodes_container(context, container, + nodes_in_container, container_pool, + iterpool)); + + capacity_left = get_block_left(context); + pack_savings = 0; + container_size = 0; + } + + /* item will fit into the block. */ + SVN_ERR(svn_io_file_seek(temp_file, APR_SET, &entry->offset, iterpool)); + SVN_ERR(svn_fs_x__read_noderev(&noderev, stream, iterpool, iterpool)); + svn_fs_x__noderevs_add(*container, noderev); + + container_size += entry->size; + APR_ARRAY_PUSH(nodes_in_container, svn_fs_x__p2l_entry_t *) = entry; + + svn_pool_clear(iterpool); + } + + svn_pool_destroy(iterpool); + + return SVN_NO_ERROR; +} + + +/* Finalize CONTAINER and write it to CONTEXT's pack file. + * Append an P2L entry containing the given SUB_ITEMS to NEW_ENTRIES. + * Use SCRATCH_POOL for temporary allocations. 
+ */ +static svn_error_t * +write_reps_container(pack_context_t *context, + svn_fs_x__reps_builder_t *container, + apr_array_header_t *sub_items, + apr_array_header_t *new_entries, + apr_pool_t *scratch_pool) +{ + apr_off_t offset = 0; + svn_fs_x__p2l_entry_t container_entry; + + svn_stream_t *pack_stream + = svn_checksum__wrap_write_stream_fnv1a_32x4 + (&container_entry.fnv1_checksum, + svn_stream_from_aprfile2(context->pack_file, + TRUE, scratch_pool), + scratch_pool); + + SVN_ERR(svn_fs_x__write_reps_container(pack_stream, container, + scratch_pool)); + SVN_ERR(svn_stream_close(pack_stream)); + SVN_ERR(svn_io_file_seek(context->pack_file, APR_CUR, &offset, + scratch_pool)); + + container_entry.offset = context->pack_offset; + container_entry.size = offset - container_entry.offset; + container_entry.type = SVN_FS_X__ITEM_TYPE_REPS_CONT; + container_entry.item_count = sub_items->nelts; + container_entry.items = (svn_fs_x__id_t *)sub_items->elts; + + context->pack_offset = offset; + APR_ARRAY_PUSH(new_entries, svn_fs_x__p2l_entry_t *) + = svn_fs_x__p2l_entry_dup(&container_entry, context->info_pool); + + SVN_ERR(svn_fs_x__p2l_proto_index_add_entry + (context->proto_p2l_index, &container_entry, scratch_pool)); + + return SVN_NO_ERROR; +} + +/* Read the (property) representations identified by svn_fs_x__p2l_entry_t + * elements in ENTRIES from TEMP_FILE, aggregate them and write them into + * CONTEXT->PACK_FILE. Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +write_reps_containers(pack_context_t *context, + apr_array_header_t *entries, + apr_file_t *temp_file, + apr_array_header_t *new_entries, + apr_pool_t *scratch_pool) +{ + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + apr_pool_t *container_pool = svn_pool_create(scratch_pool); + int i; + + apr_ssize_t block_left = get_block_left(context); + + svn_fs_x__reps_builder_t *container + = svn_fs_x__reps_builder_create(context->fs, container_pool); + apr_array_header_t *sub_items + = apr_array_make(scratch_pool, 64, sizeof(svn_fs_x__id_t)); + svn_fs_x__revision_file_t *file; + + SVN_ERR(svn_fs_x__wrap_temp_rev_file(&file, context->fs, temp_file, + scratch_pool)); + + /* copy all items in strict order */ + for (i = entries->nelts-1; i >= 0; --i) + { + svn_fs_x__representation_t representation = { 0 }; + svn_stringbuf_t *contents; + svn_stream_t *stream; + apr_size_t list_index; + svn_fs_x__p2l_entry_t *entry + = APR_ARRAY_IDX(entries, i, svn_fs_x__p2l_entry_t *); + + if ((block_left < entry->size) && sub_items->nelts) + { + block_left = get_block_left(context) + - svn_fs_x__reps_estimate_size(container); + } + + if ((block_left < entry->size) && sub_items->nelts) + { + SVN_ERR(write_reps_container(context, container, sub_items, + new_entries, iterpool)); + + apr_array_clear(sub_items); + svn_pool_clear(container_pool); + container = svn_fs_x__reps_builder_create(context->fs, + container_pool); + block_left = get_block_left(context); + } + + /* still enough space in current block? 
*/ + if (block_left < entry->size) + { + SVN_ERR(auto_pad_block(context, iterpool)); + block_left = get_block_left(context); + } + + assert(entry->item_count == 1); + representation.id = entry->items[0]; + + /* select the change list in the source file, parse it and add it to + * the container */ + SVN_ERR(svn_io_file_seek(temp_file, APR_SET, &entry->offset, + iterpool)); + SVN_ERR(svn_fs_x__get_representation_length(&representation.size, + &representation.expanded_size, + context->fs, file, + entry, iterpool)); + SVN_ERR(svn_fs_x__get_contents(&stream, context->fs, &representation, + FALSE, iterpool)); + contents = svn_stringbuf_create_ensure(representation.expanded_size, + iterpool); + contents->len = representation.expanded_size; + + /* The representation is immutable. Read it normally. */ + SVN_ERR(svn_stream_read_full(stream, contents->data, &contents->len)); + SVN_ERR(svn_stream_close(stream)); + + SVN_ERR(svn_fs_x__reps_add(&list_index, container, + svn_stringbuf__morph_into_string(contents))); + SVN_ERR_ASSERT(list_index == sub_items->nelts); + block_left -= entry->size; + + APR_ARRAY_PUSH(sub_items, svn_fs_x__id_t) = entry->items[0]; + + svn_pool_clear(iterpool); + } + + if (sub_items->nelts) + SVN_ERR(write_reps_container(context, container, sub_items, + new_entries, iterpool)); + + svn_pool_destroy(iterpool); + svn_pool_destroy(container_pool); + + return SVN_NO_ERROR; +} + +/* Return TRUE if the estimated size of the NODES_IN_CONTAINER plus the + * representations given as svn_fs_x__p2l_entry_t * in ENTRIES may exceed + * the space left in the current block. + */ +static svn_boolean_t +should_flush_nodes_container(pack_context_t *context, + svn_fs_x__noderevs_t *nodes_container, + apr_array_header_t *entries) +{ + apr_ssize_t block_left = get_block_left(context); + apr_ssize_t rep_sum = 0; + apr_ssize_t container_size + = svn_fs_x__noderevs_estimate_size(nodes_container); + + int i; + for (i = 0; i < entries->nelts; ++i) + { + svn_fs_x__p2l_entry_t *entry + = APR_ARRAY_IDX(entries, i, svn_fs_x__p2l_entry_t *); + rep_sum += entry->size; + } + + return block_left < rep_sum + container_size; +} + +/* Read the contents of the first COUNT non-NULL, non-empty items in ITEMS + * from TEMP_FILE and write them to CONTEXT->PACK_FILE. + * Use SCRATCH_POOL for temporary allocations. 
+ */ +static svn_error_t * +store_items(pack_context_t *context, + apr_file_t *temp_file, + apr_array_header_t *items, + int count, + apr_pool_t *scratch_pool) +{ + int i; + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + + /* copy all items in strict order */ + for (i = 0; i < count; ++i) + { + svn_fs_x__p2l_entry_t *entry + = APR_ARRAY_IDX(items, i, svn_fs_x__p2l_entry_t *); + if (!entry + || entry->type == SVN_FS_X__ITEM_TYPE_UNUSED + || entry->item_count == 0) + continue; + + /* select the item in the source file and copy it into the target + * pack file */ + SVN_ERR(svn_io_file_seek(temp_file, APR_SET, &entry->offset, + iterpool)); + SVN_ERR(copy_file_data(context, context->pack_file, temp_file, + entry->size, iterpool)); + + /* write index entry and update current position */ + entry->offset = context->pack_offset; + context->pack_offset += entry->size; + + SVN_ERR(svn_fs_x__p2l_proto_index_add_entry + (context->proto_p2l_index, entry, iterpool)); + + APR_ARRAY_PUSH(context->reps, svn_fs_x__p2l_entry_t *) = entry; + svn_pool_clear(iterpool); + } + + svn_pool_destroy(iterpool); + + return SVN_NO_ERROR; +} + +/* Copy (append) the items identified by svn_fs_x__p2l_entry_t * elements + * in ENTRIES strictly in order from TEMP_FILE into CONTEXT->PACK_FILE. + * Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +copy_reps_from_temp(pack_context_t *context, + apr_file_t *temp_file, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = context->fs->fsap_data; + + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + apr_pool_t *container_pool = svn_pool_create(scratch_pool); + apr_array_header_t *path_order = context->path_order; + apr_array_header_t *reps = context->reps; + apr_array_header_t *selected = apr_array_make(scratch_pool, 16, + path_order->elt_size); + apr_array_header_t *node_parts = apr_array_make(scratch_pool, 16, + reps->elt_size); + apr_array_header_t *rep_parts = apr_array_make(scratch_pool, 16, + reps->elt_size); + apr_array_header_t *nodes_in_container = apr_array_make(scratch_pool, 16, + reps->elt_size); + int i, k; + int initial_reps_count = reps->nelts; + + /* 1 container for all noderevs in the current block. We will try to + * not write it to disk until the current block fills up, i.e. aim for + * a single noderevs container per block. */ + svn_fs_x__noderevs_t *nodes_container + = svn_fs_x__noderevs_create(16, container_pool); + + /* copy items in path order. Create block-sized containers. */ + for (i = 0; i < path_order->nelts; ++i) + { + if (APR_ARRAY_IDX(path_order, i, path_order_t *) == NULL) + continue; + + /* Collect reps to combine and all noderevs referencing them */ + SVN_ERR(select_reps(context, i, selected, node_parts, rep_parts)); + + /* store the noderevs container in front of the reps */ + SVN_ERR(store_nodes(context, temp_file, node_parts, &nodes_container, + nodes_in_container, container_pool, iterpool)); + + /* actually flush the noderevs to disk if the reps container is likely + * to fill the block, i.e. no further noderevs will be added to the + * nodes container. */ + if (should_flush_nodes_container(context, nodes_container, node_parts)) + SVN_ERR(write_nodes_container(context, &nodes_container, + nodes_in_container, container_pool, + iterpool)); + + /* if all reps are short enough put them into one container. + * Otherwise, just store all containers here. 
*/ + if (reps_fit_into_containers(selected, 2 * ffd->block_size)) + SVN_ERR(write_reps_containers(context, rep_parts, temp_file, + context->reps, iterpool)); + else + SVN_ERR(store_items(context, temp_file, rep_parts, rep_parts->nelts, + iterpool)); + + /* processed all items */ + apr_array_clear(selected); + apr_array_clear(node_parts); + apr_array_clear(rep_parts); + + svn_pool_clear(iterpool); + } + + /* flush noderevs container to disk */ + if (nodes_in_container->nelts) + SVN_ERR(write_nodes_container(context, &nodes_container, + nodes_in_container, container_pool, + iterpool)); + + /* copy all items in strict order */ + SVN_ERR(store_items(context, temp_file, reps, initial_reps_count, + scratch_pool)); + + /* vaccum ENTRIES array: eliminate NULL entries */ + for (i = 0, k = 0; i < reps->nelts; ++i) + { + svn_fs_x__p2l_entry_t *entry + = APR_ARRAY_IDX(reps, i, svn_fs_x__p2l_entry_t *); + if (entry) + { + APR_ARRAY_IDX(reps, k, svn_fs_x__p2l_entry_t *) = entry; + ++k; + } + } + reps->nelts = k; + + svn_pool_destroy(iterpool); + svn_pool_destroy(container_pool); + + return SVN_NO_ERROR; +} + +/* Finalize CONTAINER and write it to CONTEXT's pack file. + * Append an P2L entry containing the given SUB_ITEMS to NEW_ENTRIES. + * Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +write_changes_container(pack_context_t *context, + svn_fs_x__changes_t *container, + apr_array_header_t *sub_items, + apr_array_header_t *new_entries, + apr_pool_t *scratch_pool) +{ + apr_off_t offset = 0; + svn_fs_x__p2l_entry_t container_entry; + + svn_stream_t *pack_stream + = svn_checksum__wrap_write_stream_fnv1a_32x4 + (&container_entry.fnv1_checksum, + svn_stream_from_aprfile2(context->pack_file, + TRUE, scratch_pool), + scratch_pool); + + SVN_ERR(svn_fs_x__write_changes_container(pack_stream, + container, + scratch_pool)); + SVN_ERR(svn_stream_close(pack_stream)); + SVN_ERR(svn_io_file_seek(context->pack_file, APR_CUR, &offset, + scratch_pool)); + + container_entry.offset = context->pack_offset; + container_entry.size = offset - container_entry.offset; + container_entry.type = SVN_FS_X__ITEM_TYPE_CHANGES_CONT; + container_entry.item_count = sub_items->nelts; + container_entry.items = (svn_fs_x__id_t *)sub_items->elts; + + context->pack_offset = offset; + APR_ARRAY_PUSH(new_entries, svn_fs_x__p2l_entry_t *) + = svn_fs_x__p2l_entry_dup(&container_entry, context->info_pool); + + SVN_ERR(svn_fs_x__p2l_proto_index_add_entry + (context->proto_p2l_index, &container_entry, scratch_pool)); + + return SVN_NO_ERROR; +} + +/* Read the change lists identified by svn_fs_x__p2l_entry_t * elements + * in ENTRIES strictly in from TEMP_FILE, aggregate them and write them + * into CONTEXT->PACK_FILE. Use SCRATCH_POOL for temporary allocations. 
+ */ +static svn_error_t * +write_changes_containers(pack_context_t *context, + apr_array_header_t *entries, + apr_file_t *temp_file, + apr_pool_t *scratch_pool) +{ + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + apr_pool_t *container_pool = svn_pool_create(scratch_pool); + int i; + + apr_ssize_t block_left = get_block_left(context); + apr_ssize_t estimated_addition = 0; + + svn_fs_x__changes_t *container + = svn_fs_x__changes_create(1000, container_pool); + apr_array_header_t *sub_items + = apr_array_make(scratch_pool, 64, sizeof(svn_fs_x__id_t)); + apr_array_header_t *new_entries + = apr_array_make(context->info_pool, 16, entries->elt_size); + svn_stream_t *temp_stream + = svn_stream_from_aprfile2(temp_file, TRUE, scratch_pool); + + /* copy all items in strict order */ + for (i = entries->nelts-1; i >= 0; --i) + { + apr_array_header_t *changes; + apr_size_t list_index; + svn_fs_x__p2l_entry_t *entry + = APR_ARRAY_IDX(entries, i, svn_fs_x__p2l_entry_t *); + + /* zip compression alone will significantly reduce the size of large + * change lists. So, we will probably need even less than this estimate. + */ + apr_ssize_t estimated_size = (entry->size / 5) + 250; + + /* If necessary and enough data has been added to the container since + * the last test, try harder by actually serializing the container and + * determine current savings due to compression. */ + if (block_left < estimated_size && estimated_addition > 2000) + { + svn_stringbuf_t *serialized + = svn_stringbuf_create_ensure(get_block_left(context), iterpool); + svn_stream_t *memory_stream + = svn_stream_from_stringbuf(serialized, iterpool); + + SVN_ERR(svn_fs_x__write_changes_container(memory_stream, + container, iterpool)); + SVN_ERR(svn_stream_close(temp_stream)); + + block_left = get_block_left(context) - serialized->len; + estimated_addition = 0; + } + + if ((block_left < estimated_size) && sub_items->nelts) + { + SVN_ERR(write_changes_container(context, container, sub_items, + new_entries, iterpool)); + + apr_array_clear(sub_items); + svn_pool_clear(container_pool); + container = svn_fs_x__changes_create(1000, container_pool); + block_left = get_block_left(context); + estimated_addition = 0; + } + + /* still enough space in current block? */ + if (block_left < estimated_size) + { + SVN_ERR(auto_pad_block(context, iterpool)); + block_left = get_block_left(context); + } + + /* select the change list in the source file, parse it and add it to + * the container */ + SVN_ERR(svn_io_file_seek(temp_file, APR_SET, &entry->offset, + iterpool)); + SVN_ERR(svn_fs_x__read_changes(&changes, temp_stream, scratch_pool, + iterpool)); + SVN_ERR(svn_fs_x__changes_append_list(&list_index, container, changes)); + SVN_ERR_ASSERT(list_index == sub_items->nelts); + block_left -= estimated_size; + estimated_addition += estimated_size; + + APR_ARRAY_PUSH(sub_items, svn_fs_x__id_t) = entry->items[0]; + + svn_pool_clear(iterpool); + } + + if (sub_items->nelts) + SVN_ERR(write_changes_container(context, container, sub_items, + new_entries, iterpool)); + + *entries = *new_entries; + svn_pool_destroy(iterpool); + svn_pool_destroy(container_pool); + + return SVN_NO_ERROR; +} + +/* Read the (property) representations identified by svn_fs_x__p2l_entry_t + * elements in ENTRIES from TEMP_FILE, aggregate them and write them into + * CONTEXT->PACK_FILE. Use SCRATCH_POOL for temporary allocations. 
+ */ +static svn_error_t * +write_property_containers(pack_context_t *context, + apr_array_header_t *entries, + apr_file_t *temp_file, + apr_pool_t *scratch_pool) +{ + apr_array_header_t *new_entries + = apr_array_make(context->info_pool, 16, entries->elt_size); + + SVN_ERR(write_reps_containers(context, entries, temp_file, new_entries, + scratch_pool)); + + *entries = *new_entries; + + return SVN_NO_ERROR; +} + +/* Append all entries of svn_fs_x__p2l_entry_t * array TO_APPEND to + * svn_fs_x__p2l_entry_t * array DEST. + */ +static void +append_entries(apr_array_header_t *dest, + apr_array_header_t *to_append) +{ + int i; + for (i = 0; i < to_append->nelts; ++i) + APR_ARRAY_PUSH(dest, svn_fs_x__p2l_entry_t *) + = APR_ARRAY_IDX(to_append, i, svn_fs_x__p2l_entry_t *); +} + +/* Write the log-to-phys proto index file for CONTEXT and use POOL for + * temporary allocations. All items in all buckets must have been placed + * by now. + */ +static svn_error_t * +write_l2p_index(pack_context_t *context, + apr_pool_t *pool) +{ + apr_pool_t *scratch_pool = svn_pool_create(pool); + const char *temp_name; + const char *proto_index; + apr_off_t offset = 0; + + /* lump all items into one bucket. As target, use the bucket that + * probably has the most entries already. */ + append_entries(context->reps, context->changes); + append_entries(context->reps, context->file_props); + append_entries(context->reps, context->dir_props); + + /* Let the index code do the expensive L2P -> P2L transformation. */ + SVN_ERR(svn_fs_x__l2p_index_from_p2l_entries(&temp_name, + context->fs, + context->reps, + pool, scratch_pool)); + + /* Append newly written segment to exisiting proto index file. */ + SVN_ERR(svn_io_file_name_get(&proto_index, context->proto_l2p_index, + scratch_pool)); + + SVN_ERR(svn_io_file_flush(context->proto_l2p_index, scratch_pool)); + SVN_ERR(svn_io_append_file(temp_name, proto_index, scratch_pool)); + SVN_ERR(svn_io_remove_file2(temp_name, FALSE, scratch_pool)); + SVN_ERR(svn_io_file_seek(context->proto_l2p_index, APR_END, &offset, + scratch_pool)); + + /* Done. */ + svn_pool_destroy(scratch_pool); + + return SVN_NO_ERROR; +} + +/* Pack the current revision range of CONTEXT, i.e. this covers phases 2 + * to 4. Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +pack_range(pack_context_t *context, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = context->fs->fsap_data; + apr_pool_t *revpool = svn_pool_create(scratch_pool); + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + + /* Phase 2: Copy items into various buckets and build tracking info */ + svn_revnum_t revision; + for (revision = context->start_rev; revision < context->end_rev; ++revision) + { + apr_off_t offset = 0; + svn_fs_x__revision_file_t *rev_file; + + /* Get the rev file dimensions (mainly index locations). */ + SVN_ERR(svn_fs_x__open_pack_or_rev_file(&rev_file, context->fs, + revision, revpool, iterpool)); + SVN_ERR(svn_fs_x__auto_read_footer(rev_file)); + + /* store the indirect array index */ + APR_ARRAY_PUSH(context->rev_offsets, int) = context->reps->nelts; + + /* read the phys-to-log index file until we covered the whole rev file. + * That index contains enough info to build both target indexes from it. 
*/ + while (offset < rev_file->l2p_offset) + { + /* read one cluster */ + int i; + apr_array_header_t *entries; + svn_pool_clear(iterpool); + + SVN_ERR(svn_fs_x__p2l_index_lookup(&entries, context->fs, + rev_file, revision, offset, + ffd->p2l_page_size, iterpool, + iterpool)); + + for (i = 0; i < entries->nelts; ++i) + { + svn_fs_x__p2l_entry_t *entry + = &APR_ARRAY_IDX(entries, i, svn_fs_x__p2l_entry_t); + + /* skip first entry if that was duplicated due crossing a + cluster boundary */ + if (offset > entry->offset) + continue; + + /* process entry while inside the rev file */ + offset = entry->offset; + if (offset < rev_file->l2p_offset) + { + SVN_ERR(svn_io_file_seek(rev_file->file, APR_SET, &offset, + iterpool)); + + if (entry->type == SVN_FS_X__ITEM_TYPE_CHANGES) + SVN_ERR(copy_item_to_temp(context, + context->changes, + context->changes_file, + rev_file, entry, iterpool)); + else if (entry->type == SVN_FS_X__ITEM_TYPE_FILE_PROPS) + SVN_ERR(copy_item_to_temp(context, + context->file_props, + context->file_props_file, + rev_file, entry, iterpool)); + else if (entry->type == SVN_FS_X__ITEM_TYPE_DIR_PROPS) + SVN_ERR(copy_item_to_temp(context, + context->dir_props, + context->dir_props_file, + rev_file, entry, iterpool)); + else if ( entry->type == SVN_FS_X__ITEM_TYPE_FILE_REP + || entry->type == SVN_FS_X__ITEM_TYPE_DIR_REP) + SVN_ERR(copy_rep_to_temp(context, rev_file, entry, + iterpool)); + else if (entry->type == SVN_FS_X__ITEM_TYPE_NODEREV) + SVN_ERR(copy_node_to_temp(context, rev_file, entry, + iterpool)); + else + SVN_ERR_ASSERT(entry->type == SVN_FS_X__ITEM_TYPE_UNUSED); + + offset += entry->size; + } + } + + if (context->cancel_func) + SVN_ERR(context->cancel_func(context->cancel_baton)); + } + + svn_pool_clear(revpool); + } + + svn_pool_destroy(iterpool); + + /* phase 3: placement. + * Use "newest first" placement for simple items. */ + sort_items(context->changes); + sort_items(context->file_props); + sort_items(context->dir_props); + + /* follow dependencies recursively for noderevs and data representations */ + sort_reps(context); + + /* phase 4: copy bucket data to pack file. Write P2L index. */ + SVN_ERR(write_changes_containers(context, context->changes, + context->changes_file, revpool)); + svn_pool_clear(revpool); + SVN_ERR(write_property_containers(context, context->file_props, + context->file_props_file, revpool)); + svn_pool_clear(revpool); + SVN_ERR(write_property_containers(context, context->dir_props, + context->dir_props_file, revpool)); + svn_pool_clear(revpool); + SVN_ERR(copy_reps_from_temp(context, context->reps_file, revpool)); + svn_pool_clear(revpool); + + /* write L2P index as well (now that we know all target offsets) */ + SVN_ERR(write_l2p_index(context, revpool)); + + svn_pool_destroy(revpool); + + return SVN_NO_ERROR; +} + +/* Append CONTEXT->START_REV to the context's pack file with no re-ordering. + * This function will only be used for very large revisions (>>100k changes). + * Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +append_revision(pack_context_t *context, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = context->fs->fsap_data; + apr_off_t offset = 0; + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + svn_fs_x__revision_file_t *rev_file; + apr_finfo_t finfo; + + /* Get the size of the file. 
*/ + const char *path = svn_dirent_join(context->shard_dir, + apr_psprintf(iterpool, "%ld", + context->start_rev), + scratch_pool); + SVN_ERR(svn_io_stat(&finfo, path, APR_FINFO_SIZE, scratch_pool)); + + /* Copy all the bits from the rev file to the end of the pack file. */ + SVN_ERR(svn_fs_x__open_pack_or_rev_file(&rev_file, context->fs, + context->start_rev, scratch_pool, + iterpool)); + SVN_ERR(copy_file_data(context, context->pack_file, rev_file->file, + finfo.size, iterpool)); + + /* mark the start of a new revision */ + SVN_ERR(svn_fs_x__l2p_proto_index_add_revision(context->proto_l2p_index, + scratch_pool)); + + /* read the phys-to-log index file until we covered the whole rev file. + * That index contains enough info to build both target indexes from it. */ + while (offset < finfo.size) + { + /* read one cluster */ + int i; + apr_array_header_t *entries; + SVN_ERR(svn_fs_x__p2l_index_lookup(&entries, context->fs, rev_file, + context->start_rev, offset, + ffd->p2l_page_size, iterpool, + iterpool)); + + for (i = 0; i < entries->nelts; ++i) + { + svn_fs_x__p2l_entry_t *entry + = &APR_ARRAY_IDX(entries, i, svn_fs_x__p2l_entry_t); + + /* skip first entry if that was duplicated due crossing a + cluster boundary */ + if (offset > entry->offset) + continue; + + /* process entry while inside the rev file */ + offset = entry->offset; + if (offset < finfo.size) + { + /* there should be true containers */ + SVN_ERR_ASSERT(entry->item_count == 1); + + entry->offset += context->pack_offset; + offset += entry->size; + SVN_ERR(svn_fs_x__l2p_proto_index_add_entry + (context->proto_l2p_index, entry->offset, 0, + entry->items[0].number, iterpool)); + SVN_ERR(svn_fs_x__p2l_proto_index_add_entry + (context->proto_p2l_index, entry, iterpool)); + } + } + + svn_pool_clear(iterpool); + } + + svn_pool_destroy(iterpool); + context->pack_offset += finfo.size; + + return SVN_NO_ERROR; +} + +/* Format 7 packing logic. + * + * Pack the revision shard starting at SHARD_REV in filesystem FS from + * SHARD_DIR into the PACK_FILE_DIR, using SCRATCH_POOL for temporary + * allocations. Limit the extra memory consumption to MAX_MEM bytes. + * CANCEL_FUNC and CANCEL_BATON are what you think they are. + */ +static svn_error_t * +pack_log_addressed(svn_fs_t *fs, + const char *pack_file_dir, + const char *shard_dir, + svn_revnum_t shard_rev, + apr_size_t max_mem, + svn_cancel_func_t cancel_func, + void *cancel_baton, + apr_pool_t *scratch_pool) +{ + enum + { + /* estimated amount of memory used to represent one item in memory + * during rev file packing */ + PER_ITEM_MEM = APR_ALIGN_DEFAULT(sizeof(path_order_t)) + + APR_ALIGN_DEFAULT(2 *sizeof(void*)) + + APR_ALIGN_DEFAULT(sizeof(reference_t)) + + APR_ALIGN_DEFAULT(sizeof(svn_fs_x__p2l_entry_t)) + + 6 * sizeof(void*) + }; + + int max_items = max_mem / PER_ITEM_MEM > INT_MAX + ? 
INT_MAX
+                    : (int)(max_mem / PER_ITEM_MEM);
+  apr_array_header_t *max_ids;
+  pack_context_t context = { 0 };
+  int i;
+  apr_size_t item_count = 0;
+  apr_pool_t *iterpool = svn_pool_create(scratch_pool);
+
+  /* set up a pack context */
+  SVN_ERR(initialize_pack_context(&context, fs, pack_file_dir, shard_dir,
+                                  shard_rev, max_items, cancel_func,
+                                  cancel_baton, scratch_pool));
+
+  /* phase 1: determine the size of the revisions to pack */
+  SVN_ERR(svn_fs_x__l2p_get_max_ids(&max_ids, fs, shard_rev,
+                                    context.shard_end_rev - shard_rev,
+                                    scratch_pool, scratch_pool));
+
+  /* pack revisions in ranges that don't exceed MAX_MEM */
+  for (i = 0; i < max_ids->nelts; ++i)
+    if (APR_ARRAY_IDX(max_ids, i, apr_uint64_t) + item_count <= max_items)
+      {
+        context.end_rev++;
+        item_count += (apr_size_t)APR_ARRAY_IDX(max_ids, i, apr_uint64_t);
+      }
+    else
+      {
+        /* some unpacked revisions before this one? */
+        if (context.start_rev < context.end_rev)
+          {
+            /* pack them intelligently (might be just 1 rev but still ...) */
+            SVN_ERR(pack_range(&context, iterpool));
+            SVN_ERR(reset_pack_context(&context, iterpool));
+            item_count = 0;
+          }
+
+        /* next revision range is to start with the current revision */
+        context.start_rev = i + context.shard_rev;
+        context.end_rev = context.start_rev + 1;
+
+        /* if this is a very large revision, we must place it as is */
+        if (APR_ARRAY_IDX(max_ids, i, apr_uint64_t) > max_items)
+          {
+            SVN_ERR(append_revision(&context, iterpool));
+            context.start_rev++;
+          }
+        else
+          item_count += (apr_size_t)APR_ARRAY_IDX(max_ids, i, apr_uint64_t);
+
+        svn_pool_clear(iterpool);
+      }
+
+  /* non-empty revision range at the end? */
+  if (context.start_rev < context.end_rev)
+    SVN_ERR(pack_range(&context, iterpool));
+
+  /* last phase: finalize indexes and clean up */
+  SVN_ERR(reset_pack_context(&context, iterpool));
+  SVN_ERR(close_pack_context(&context, iterpool));
+  svn_pool_destroy(iterpool);
+
+  return SVN_NO_ERROR;
+}
+
+/* Given REV in FS, set *REV_OFFSET to REV's offset in the packed file.
+   Use SCRATCH_POOL for temporary allocations. */
+svn_error_t *
+svn_fs_x__get_packed_offset(apr_off_t *rev_offset,
+                            svn_fs_t *fs,
+                            svn_revnum_t rev,
+                            apr_pool_t *scratch_pool)
+{
+  svn_fs_x__data_t *ffd = fs->fsap_data;
+  svn_stream_t *manifest_stream;
+  svn_boolean_t is_cached;
+  svn_revnum_t shard;
+  apr_int64_t shard_pos;
+  apr_array_header_t *manifest;
+  apr_pool_t *iterpool;
+
+  shard = rev / ffd->max_files_per_dir;
+
+  /* position of the revision within the shard's manifest */
+  shard_pos = rev % ffd->max_files_per_dir;
+
+  /* fetch exactly that element into *rev_offset, if the manifest is found
+     in the cache */
+  SVN_ERR(svn_cache__get_partial((void **) rev_offset, &is_cached,
+                                 ffd->packed_offset_cache, &shard,
+                                 svn_fs_x__get_sharded_offset, &shard_pos,
+                                 scratch_pool));
+
+  if (is_cached)
+    return SVN_NO_ERROR;
+
+  /* Open the manifest file. */
+  SVN_ERR(svn_stream_open_readonly(&manifest_stream,
+                                   svn_fs_x__path_rev_packed(fs, rev, PATH_MANIFEST,
+                                                             scratch_pool),
+                                   scratch_pool, scratch_pool));
+
+  /* While we're here, let's just read the entire manifest file into an array,
+     so we can cache the entire thing.
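The range-splitting loop above only needs the per-revision item counts and the item budget derived from MAX_MEM. A stand-alone sketch of the same policy with made-up numbers (illustrative only, not the FSX code):

#include <stdio.h>

/* Illustrative only: split revisions into pack ranges so that the sum of
 * per-revision item counts never exceeds MAX_ITEMS, appending oversized
 * revisions as-is -- the same policy pack_log_addressed() follows. */
int main(void)
{
  const unsigned long max_items = 100;
  const unsigned long per_rev_items[] = { 40, 30, 50, 250, 10, 90 };
  const int n = sizeof(per_rev_items) / sizeof(per_rev_items[0]);

  int start = 0;
  unsigned long count = 0;
  int i;

  for (i = 0; i < n; ++i)
    {
      if (per_rev_items[i] > max_items)
        {
          if (start < i)
            printf("pack_range r%d..r%d\n", start, i - 1);
          printf("append_revision r%d (too large)\n", i);
          start = i + 1;
          count = 0;
        }
      else if (count + per_rev_items[i] > max_items)
        {
          printf("pack_range r%d..r%d\n", start, i - 1);
          start = i;
          count = per_rev_items[i];
        }
      else
        count += per_rev_items[i];
    }

  if (start < n)
    printf("pack_range r%d..r%d\n", start, n - 1);

  return 0;
}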
*/ + iterpool = svn_pool_create(scratch_pool); + manifest = apr_array_make(scratch_pool, ffd->max_files_per_dir, + sizeof(apr_off_t)); + while (1) + { + svn_boolean_t eof; + apr_int64_t val; + + svn_pool_clear(iterpool); + SVN_ERR(svn_fs_x__read_number_from_stream(&val, &eof, manifest_stream, + iterpool)); + if (eof) + break; + + APR_ARRAY_PUSH(manifest, apr_off_t) = (apr_off_t)val; + } + svn_pool_destroy(iterpool); + + *rev_offset = APR_ARRAY_IDX(manifest, rev % ffd->max_files_per_dir, + apr_off_t); + + /* Close up shop and cache the array. */ + SVN_ERR(svn_stream_close(manifest_stream)); + return svn_cache__set(ffd->packed_offset_cache, &shard, manifest, + scratch_pool); +} + +/* In filesystem FS, pack the revision SHARD containing exactly + * MAX_FILES_PER_DIR revisions from SHARD_PATH into the PACK_FILE_DIR, + * using SCRATCH_POOL for temporary allocations. Try to limit the amount of + * temporary memory needed to MAX_MEM bytes. CANCEL_FUNC and CANCEL_BATON + * are what you think they are. + * + * If for some reason we detect a partial packing already performed, we + * remove the pack file and start again. + * + * The actual packing will be done in a format-specific sub-function. + */ +static svn_error_t * +pack_rev_shard(svn_fs_t *fs, + const char *pack_file_dir, + const char *shard_path, + apr_int64_t shard, + int max_files_per_dir, + apr_size_t max_mem, + svn_cancel_func_t cancel_func, + void *cancel_baton, + apr_pool_t *scratch_pool) +{ + const char *pack_file_path; + svn_revnum_t shard_rev = (svn_revnum_t) (shard * max_files_per_dir); + + /* Some useful paths. */ + pack_file_path = svn_dirent_join(pack_file_dir, PATH_PACKED, scratch_pool); + + /* Remove any existing pack file for this shard, since it is incomplete. */ + SVN_ERR(svn_io_remove_dir2(pack_file_dir, TRUE, cancel_func, cancel_baton, + scratch_pool)); + + /* Create the new directory and pack file. */ + SVN_ERR(svn_io_dir_make(pack_file_dir, APR_OS_DEFAULT, scratch_pool)); + + /* Index information files */ + SVN_ERR(pack_log_addressed(fs, pack_file_dir, shard_path, shard_rev, + max_mem, cancel_func, cancel_baton, + scratch_pool)); + + SVN_ERR(svn_io_copy_perms(shard_path, pack_file_dir, scratch_pool)); + SVN_ERR(svn_io_set_file_read_only(pack_file_path, FALSE, scratch_pool)); + + return SVN_NO_ERROR; +} + +/* In the file system at FS_PATH, pack the SHARD in REVS_DIR and + * REVPROPS_DIR containing exactly MAX_FILES_PER_DIR revisions, using + * SCRATCH_POOL temporary for allocations. REVPROPS_DIR will be NULL if + * revprop packing is not supported. COMPRESSION_LEVEL and MAX_PACK_SIZE + * will be ignored in that case. + * + * CANCEL_FUNC and CANCEL_BATON are what you think they are; similarly + * NOTIFY_FUNC and NOTIFY_BATON. + * + * If for some reason we detect a partial packing already performed, we + * remove the pack file and start again. + */ +static svn_error_t * +pack_shard(const char *revs_dir, + const char *revsprops_dir, + svn_fs_t *fs, + apr_int64_t shard, + int max_files_per_dir, + apr_off_t max_pack_size, + int compression_level, + svn_fs_pack_notify_t notify_func, + void *notify_baton, + svn_cancel_func_t cancel_func, + void *cancel_baton, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + const char *rev_shard_path, *rev_pack_file_dir; + const char *revprops_shard_path, *revprops_pack_file_dir; + + /* Notify caller we're starting to pack this shard. */ + if (notify_func) + SVN_ERR(notify_func(notify_baton, shard, svn_fs_pack_notify_start, + scratch_pool)); + + /* Some useful paths. 
*/ + rev_pack_file_dir = svn_dirent_join(revs_dir, + apr_psprintf(scratch_pool, + "%" APR_INT64_T_FMT PATH_EXT_PACKED_SHARD, + shard), + scratch_pool); + rev_shard_path = svn_dirent_join(revs_dir, + apr_psprintf(scratch_pool, "%" APR_INT64_T_FMT, shard), + scratch_pool); + + /* pack the revision content */ + SVN_ERR(pack_rev_shard(fs, rev_pack_file_dir, rev_shard_path, + shard, max_files_per_dir, DEFAULT_MAX_MEM, + cancel_func, cancel_baton, scratch_pool)); + + /* if enabled, pack the revprops in an equivalent way */ + if (revsprops_dir) + { + revprops_pack_file_dir = svn_dirent_join(revsprops_dir, + apr_psprintf(scratch_pool, + "%" APR_INT64_T_FMT PATH_EXT_PACKED_SHARD, + shard), + scratch_pool); + revprops_shard_path = svn_dirent_join(revsprops_dir, + apr_psprintf(scratch_pool, "%" APR_INT64_T_FMT, shard), + scratch_pool); + + SVN_ERR(svn_fs_x__pack_revprops_shard(revprops_pack_file_dir, + revprops_shard_path, + shard, max_files_per_dir, + (int)(0.9 * max_pack_size), + compression_level, + cancel_func, cancel_baton, + scratch_pool)); + } + + /* Update the min-unpacked-rev file to reflect our newly packed shard. */ + SVN_ERR(svn_fs_x__write_min_unpacked_rev(fs, + (svn_revnum_t)((shard + 1) * max_files_per_dir), + scratch_pool)); + ffd->min_unpacked_rev = (svn_revnum_t)((shard + 1) * max_files_per_dir); + + /* Finally, remove the existing shard directories. + * For revprops, clean up older obsolete shards as well as they might + * have been left over from an interrupted FS upgrade. */ + SVN_ERR(svn_io_remove_dir2(rev_shard_path, TRUE, + cancel_func, cancel_baton, scratch_pool)); + if (revsprops_dir) + { + svn_node_kind_t kind = svn_node_dir; + apr_int64_t to_cleanup = shard; + do + { + SVN_ERR(svn_fs_x__delete_revprops_shard(revprops_shard_path, + to_cleanup, + max_files_per_dir, + cancel_func, cancel_baton, + scratch_pool)); + + /* If the previous shard exists, clean it up as well. + Don't try to clean up shard 0 as it we can't tell quickly + whether it actually needs cleaning up. */ + revprops_shard_path = svn_dirent_join(revsprops_dir, + apr_psprintf(scratch_pool, + "%" APR_INT64_T_FMT, + --to_cleanup), + scratch_pool); + SVN_ERR(svn_io_check_path(revprops_shard_path, &kind, scratch_pool)); + } + while (kind == svn_node_dir && to_cleanup > 0); + } + + /* Notify caller we're starting to pack this shard. */ + if (notify_func) + SVN_ERR(notify_func(notify_baton, shard, svn_fs_pack_notify_end, + scratch_pool)); + + return SVN_NO_ERROR; +} + +typedef struct pack_baton_t +{ + svn_fs_t *fs; + svn_fs_pack_notify_t notify_func; + void *notify_baton; + svn_cancel_func_t cancel_func; + void *cancel_baton; +} pack_baton_t; + + +/* The work-horse for svn_fs_x__pack, called with the FS write lock. + This implements the svn_fs_x__with_write_lock() 'body' callback + type. BATON is a 'pack_baton_t *'. + + WARNING: if you add a call to this function, please note: + The code currently assumes that any piece of code running with + the write-lock set can rely on the ffd->min_unpacked_rev and + ffd->min_unpacked_revprop caches to be up-to-date (and, by + extension, on not having to use a retry when calling + svn_fs_x__path_rev_absolute() and friends). If you add a call + to this function, consider whether you have to call + update_min_unpacked_rev(). 
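Assuming a sharded layout with 1000 files per shard (the real value comes from ffd->max_files_per_dir) and PATH_EXT_PACKED_SHARD expanding to ".pack", the shard arithmetic used by svn_fs_x__get_packed_offset(), pack_shard() and the min-unpacked-rev update works out as in this illustrative sketch:

#include <stdio.h>

/* Illustrative only: map a revision to its shard, to its slot in the shard
 * manifest, and to the directory names used while packing. */
int main(void)
{
  long max_files_per_dir = 1000;   /* assumed shard size */
  long rev = 12345;

  long shard = rev / max_files_per_dir;           /* 12 */
  long manifest_slot = rev % max_files_per_dir;   /* 345 */

  printf("r%ld lives in shard %ld, manifest slot %ld\n",
         rev, shard, manifest_slot);
  printf("unpacked shard dir:  revs/%ld/\n", shard);
  printf("packed shard dir:    revs/%ld.pack/\n", shard);
  printf("min-unpacked-rev after packing this shard: %ld\n",
         (shard + 1) * max_files_per_dir);        /* 13000 */

  return 0;
}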
+ See this thread: http://thread.gmane.org/1291206765.3782.3309.camel@edith + */ +static svn_error_t * +pack_body(void *baton, + apr_pool_t *scratch_pool) +{ + pack_baton_t *pb = baton; + svn_fs_x__data_t *ffd = pb->fs->fsap_data; + apr_int64_t completed_shards; + apr_int64_t i; + svn_revnum_t youngest; + apr_pool_t *iterpool; + const char *rev_data_path; + const char *revprops_data_path = NULL; + + /* If we aren't using sharding, we can't do any packing, so quit. */ + SVN_ERR(svn_fs_x__read_min_unpacked_rev(&ffd->min_unpacked_rev, pb->fs, + scratch_pool)); + + SVN_ERR(svn_fs_x__youngest_rev(&youngest, pb->fs, scratch_pool)); + completed_shards = (youngest + 1) / ffd->max_files_per_dir; + + /* See if we've already completed all possible shards thus far. */ + if (ffd->min_unpacked_rev == (completed_shards * ffd->max_files_per_dir)) + return SVN_NO_ERROR; + + rev_data_path = svn_dirent_join(pb->fs->path, PATH_REVS_DIR, scratch_pool); + revprops_data_path = svn_dirent_join(pb->fs->path, PATH_REVPROPS_DIR, + scratch_pool); + + iterpool = svn_pool_create(scratch_pool); + for (i = ffd->min_unpacked_rev / ffd->max_files_per_dir; + i < completed_shards; + i++) + { + svn_pool_clear(iterpool); + + if (pb->cancel_func) + SVN_ERR(pb->cancel_func(pb->cancel_baton)); + + SVN_ERR(pack_shard(rev_data_path, revprops_data_path, + pb->fs, i, ffd->max_files_per_dir, + ffd->revprop_pack_size, + ffd->compress_packed_revprops + ? SVN__COMPRESSION_ZLIB_DEFAULT + : SVN__COMPRESSION_NONE, + pb->notify_func, pb->notify_baton, + pb->cancel_func, pb->cancel_baton, iterpool)); + } + + svn_pool_destroy(iterpool); + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__pack(svn_fs_t *fs, + svn_fs_pack_notify_t notify_func, + void *notify_baton, + svn_cancel_func_t cancel_func, + void *cancel_baton, + apr_pool_t *scratch_pool) +{ + pack_baton_t pb = { 0 }; + pb.fs = fs; + pb.notify_func = notify_func; + pb.notify_baton = notify_baton; + pb.cancel_func = cancel_func; + pb.cancel_baton = cancel_baton; + return svn_fs_x__with_pack_lock(fs, pack_body, &pb, scratch_pool); +} diff --git a/subversion/libsvn_fs_x/pack.h b/subversion/libsvn_fs_x/pack.h new file mode 100644 index 0000000..5541619 --- /dev/null +++ b/subversion/libsvn_fs_x/pack.h @@ -0,0 +1,65 @@ +/* pack.h : interface FSX pack functionality + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#ifndef SVN_LIBSVN_FS__PACK_H +#define SVN_LIBSVN_FS__PACK_H + +#include "fs.h" + +/* Possibly pack the repository at PATH. This just take full shards, and + combines all the revision files into a single one, with a manifest header. 
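A minimal usage sketch for the svn_fs_x__pack() entry point defined above, assuming an already-open svn_fs_t * and passing no notification or cancellation callbacks (illustrative only):

/* Illustrative only: pack all complete shards of FS without progress
 * notification and without cancellation support. */
static svn_error_t *
pack_everything(svn_fs_t *fs, apr_pool_t *scratch_pool)
{
  return svn_fs_x__pack(fs,
                        NULL /* notify_func */, NULL /* notify_baton */,
                        NULL /* cancel_func */, NULL /* cancel_baton */,
                        scratch_pool);
}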
+ Use optional CANCEL_FUNC/CANCEL_BATON for cancellation support. + Use SCRATCH_POOL for temporary allocations. + + Existing filesystem references need not change. */ +svn_error_t * +svn_fs_x__pack(svn_fs_t *fs, + svn_fs_pack_notify_t notify_func, + void *notify_baton, + svn_cancel_func_t cancel_func, + void *cancel_baton, + apr_pool_t *scratch_pool); + +/** + * For the packed revision REV in FS, determine the offset within the + * revision pack file and return it in REV_OFFSET. + * Use SCRATCH_POOL for temporary allocations. + */ +svn_error_t * +svn_fs_x__get_packed_offset(apr_off_t *rev_offset, + svn_fs_t *fs, + svn_revnum_t rev, + apr_pool_t *scratch_pool); + +/* Return the svn_dir_entry_t* objects of DIRECTORY in an APR array + * allocated in RESULT_POOL with entries added in storage (on-disk) order. + * FS' format will be used to pick the optimal ordering strategy. Use + * SCRATCH_POOL for temporary allocations. + */ +apr_array_header_t * +svn_fs_x__order_dir_entries(svn_fs_t *fs, + apr_hash_t *directory, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + + +#endif diff --git a/subversion/libsvn_fs_x/recovery.c b/subversion/libsvn_fs_x/recovery.c new file mode 100644 index 0000000..984b740 --- /dev/null +++ b/subversion/libsvn_fs_x/recovery.c @@ -0,0 +1,263 @@ +/* recovery.c --- FSX recovery functionality +* + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#include "recovery.h" + +#include "svn_hash.h" +#include "svn_pools.h" +#include "private/svn_string_private.h" + +#include "low_level.h" +#include "rep-cache.h" +#include "revprops.h" +#include "transaction.h" +#include "util.h" +#include "cached_data.h" +#include "index.h" + +#include "../libsvn_fs/fs-loader.h" + +#include "svn_private_config.h" + +/* Part of the recovery procedure. Return the largest revision *REV in + filesystem FS. Use SCRATCH_POOL for temporary allocation. */ +static svn_error_t * +recover_get_largest_revision(svn_fs_t *fs, + svn_revnum_t *rev, + apr_pool_t *scratch_pool) +{ + /* Discovering the largest revision in the filesystem would be an + expensive operation if we did a readdir() or searched linearly, + so we'll do a form of binary search. left is a revision that we + know exists, right a revision that we know does not exist. */ + apr_pool_t *iterpool; + svn_revnum_t left, right = 1; + + iterpool = svn_pool_create(scratch_pool); + /* Keep doubling right, until we find a revision that doesn't exist. 
*/ + while (1) + { + svn_error_t *err; + svn_fs_x__revision_file_t *file; + svn_pool_clear(iterpool); + + err = svn_fs_x__open_pack_or_rev_file(&file, fs, right, iterpool, + iterpool); + if (err && err->apr_err == SVN_ERR_FS_NO_SUCH_REVISION) + { + svn_error_clear(err); + break; + } + else + SVN_ERR(err); + + right <<= 1; + } + + left = right >> 1; + + /* We know that left exists and right doesn't. Do a normal bsearch to find + the last revision. */ + while (left + 1 < right) + { + svn_revnum_t probe = left + ((right - left) / 2); + svn_error_t *err; + svn_fs_x__revision_file_t *file; + svn_pool_clear(iterpool); + + err = svn_fs_x__open_pack_or_rev_file(&file, fs, probe, iterpool, + iterpool); + if (err && err->apr_err == SVN_ERR_FS_NO_SUCH_REVISION) + { + svn_error_clear(err); + right = probe; + } + else + { + SVN_ERR(err); + left = probe; + } + } + + svn_pool_destroy(iterpool); + + /* left is now the largest revision that exists. */ + *rev = left; + return SVN_NO_ERROR; +} + +/* Baton used for recover_body below. */ +typedef struct recover_baton_t { + svn_fs_t *fs; + svn_cancel_func_t cancel_func; + void *cancel_baton; +} recover_baton_t; + +/* The work-horse for svn_fs_x__recover, called with the FS + write lock. This implements the svn_fs_x__with_write_lock() + 'body' callback type. BATON is a 'recover_baton_t *'. */ +static svn_error_t * +recover_body(void *baton, + apr_pool_t *scratch_pool) +{ + recover_baton_t *b = baton; + svn_fs_t *fs = b->fs; + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_revnum_t max_rev; + svn_revnum_t youngest_rev; + svn_boolean_t revprop_missing = TRUE; + svn_boolean_t revprop_accessible = FALSE; + + /* Lose potentially corrupted data in temp files */ + SVN_ERR(svn_fs_x__reset_revprop_generation_file(fs, scratch_pool)); + + /* The admin may have created a plain copy of this repo before attempting + to recover it (hotcopy may or may not work with corrupted repos). + Bump the instance ID. */ + SVN_ERR(svn_fs_x__set_uuid(fs, fs->uuid, NULL, scratch_pool)); + + /* We need to know the largest revision in the filesystem. */ + SVN_ERR(recover_get_largest_revision(fs, &max_rev, scratch_pool)); + + /* Get the expected youngest revision */ + SVN_ERR(svn_fs_x__youngest_rev(&youngest_rev, fs, scratch_pool)); + + /* Policy note: + + Since the revprops file is written after the revs file, the true + maximum available revision is the youngest one for which both are + present. That's probably the same as the max_rev we just found, + but if it's not, we could, in theory, repeatedly decrement + max_rev until we find a revision that has both a revs and + revprops file, then write db/current with that. + + But we choose not to. If a repository is so corrupt that it's + missing at least one revprops file, we shouldn't assume that the + youngest revision for which both the revs and revprops files are + present is healthy. In other words, we're willing to recover + from a missing or out-of-date db/current file, because db/current + is truly redundant -- it's basically a cache so we don't have to + find max_rev each time, albeit a cache with unusual semantics, + since it also officially defines when a revision goes live. But + if we're missing more than the cache, it's time to back out and + let the admin reconstruct things by hand: correctness at that + point may depend on external things like checking a commit email + list, looking in particular working copies, etc. + + This policy matches well with a typical naive backup scenario. 
+ Say you're rsyncing your FSX repository nightly to the same + location. Once revs and revprops are written, you've got the + maximum rev; if the backup should bomb before db/current is + written, then db/current could stay arbitrarily out-of-date, but + we can still recover. It's a small window, but we might as well + do what we can. */ + + /* Even if db/current were missing, it would be created with 0 by + get_youngest(), so this conditional remains valid. */ + if (youngest_rev > max_rev) + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + _("Expected current rev to be <= %ld " + "but found %ld"), max_rev, youngest_rev); + + /* Before setting current, verify that there is a revprops file + for the youngest revision. (Issue #2992) */ + if (svn_fs_x__is_packed_revprop(fs, max_rev)) + { + revprop_accessible + = svn_fs_x__packed_revprop_available(&revprop_missing, fs, max_rev, + scratch_pool); + } + else + { + svn_node_kind_t youngest_revprops_kind; + SVN_ERR(svn_io_check_path(svn_fs_x__path_revprops(fs, max_rev, + scratch_pool), + &youngest_revprops_kind, scratch_pool)); + + if (youngest_revprops_kind == svn_node_file) + { + revprop_missing = FALSE; + revprop_accessible = TRUE; + } + else if (youngest_revprops_kind != svn_node_none) + { + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + _("Revision %ld has a non-file where its " + "revprops file should be"), + max_rev); + } + } + + if (!revprop_accessible) + { + if (revprop_missing) + { + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + _("Revision %ld has a revs file but no " + "revprops file"), + max_rev); + } + else + { + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + _("Revision %ld has a revs file but the " + "revprops file is inaccessible"), + max_rev); + } + } + + /* Prune younger-than-(newfound-youngest) revisions from the rep + cache if sharing is enabled taking care not to create the cache + if it does not exist. */ + if (ffd->rep_sharing_allowed) + { + svn_boolean_t rep_cache_exists; + + SVN_ERR(svn_fs_x__exists_rep_cache(&rep_cache_exists, fs, + scratch_pool)); + if (rep_cache_exists) + SVN_ERR(svn_fs_x__del_rep_reference(fs, max_rev, scratch_pool)); + } + + /* Now store the discovered youngest revision, and the next IDs if + relevant, in a new 'current' file. */ + return svn_fs_x__write_current(fs, max_rev, scratch_pool); +} + +/* This implements the fs_library_vtable_t.recover() API. */ +svn_error_t * +svn_fs_x__recover(svn_fs_t *fs, + svn_cancel_func_t cancel_func, + void *cancel_baton, + apr_pool_t *scratch_pool) +{ + recover_baton_t b; + + /* We have no way to take out an exclusive lock in FSX, so we're + restricted as to the types of recovery we can do. Luckily, + we just want to recreate the 'current' file, and we can do that just + by blocking other writers. */ + b.fs = fs; + b.cancel_func = cancel_func; + b.cancel_baton = cancel_baton; + return svn_fs_x__with_all_locks(fs, recover_body, &b, scratch_pool); +} diff --git a/subversion/libsvn_fs_x/recovery.h b/subversion/libsvn_fs_x/recovery.h new file mode 100644 index 0000000..4fe0a07 --- /dev/null +++ b/subversion/libsvn_fs_x/recovery.h @@ -0,0 +1,37 @@ +/* recovery.h : interface to the FSX recovery functionality + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#ifndef SVN_LIBSVN_FS__RECOVERY_H +#define SVN_LIBSVN_FS__RECOVERY_H + +#include "fs.h" + +/* Recover the fsx associated with filesystem FS. + Use optional CANCEL_FUNC/CANCEL_BATON for cancellation support. + Use SCRATCH_POOL for temporary allocations. */ +svn_error_t * +svn_fs_x__recover(svn_fs_t *fs, + svn_cancel_func_t cancel_func, + void *cancel_baton, + apr_pool_t *scratch_pool); + +#endif diff --git a/subversion/libsvn_fs_x/rep-cache-db.h b/subversion/libsvn_fs_x/rep-cache-db.h new file mode 100644 index 0000000..918955f --- /dev/null +++ b/subversion/libsvn_fs_x/rep-cache-db.h @@ -0,0 +1,92 @@ +/* This file is automatically generated from rep-cache-db.sql and .dist_sandbox/subversion-1.9.7/subversion/libsvn_fs_x/token-map.h. + * Do not edit this file -- edit the source and rerun gen-make.py */ + +#define STMT_CREATE_SCHEMA 0 +#define STMT_0_INFO {"STMT_CREATE_SCHEMA", NULL} +#define STMT_0 \ + "PRAGMA PAGE_SIZE = 4096; " \ + "CREATE TABLE rep_cache ( " \ + " hash TEXT NOT NULL PRIMARY KEY, " \ + " revision INTEGER NOT NULL, " \ + " offset INTEGER NOT NULL, " \ + " size INTEGER NOT NULL, " \ + " expanded_size INTEGER NOT NULL " \ + " ); " \ + "PRAGMA USER_VERSION = 1; " \ + "" + +#define STMT_GET_REP 1 +#define STMT_1_INFO {"STMT_GET_REP", NULL} +#define STMT_1 \ + "SELECT revision, offset, size, expanded_size " \ + "FROM rep_cache " \ + "WHERE hash = ?1 " \ + "" + +#define STMT_SET_REP 2 +#define STMT_2_INFO {"STMT_SET_REP", NULL} +#define STMT_2 \ + "INSERT OR FAIL INTO rep_cache (hash, revision, offset, size, expanded_size) " \ + "VALUES (?1, ?2, ?3, ?4, ?5) " \ + "" + +#define STMT_GET_REPS_FOR_RANGE 3 +#define STMT_3_INFO {"STMT_GET_REPS_FOR_RANGE", NULL} +#define STMT_3 \ + "SELECT hash, revision, offset, size, expanded_size " \ + "FROM rep_cache " \ + "WHERE revision >= ?1 AND revision <= ?2 " \ + "" + +#define STMT_GET_MAX_REV 4 +#define STMT_4_INFO {"STMT_GET_MAX_REV", NULL} +#define STMT_4 \ + "SELECT MAX(revision) " \ + "FROM rep_cache " \ + "" + +#define STMT_DEL_REPS_YOUNGER_THAN_REV 5 +#define STMT_5_INFO {"STMT_DEL_REPS_YOUNGER_THAN_REV", NULL} +#define STMT_5 \ + "DELETE FROM rep_cache " \ + "WHERE revision > ?1 " \ + "" + +#define STMT_LOCK_REP 6 +#define STMT_6_INFO {"STMT_LOCK_REP", NULL} +#define STMT_6 \ + "BEGIN TRANSACTION; " \ + "INSERT INTO rep_cache VALUES ('dummy', 0, 0, 0, 0) " \ + "" + +#define STMT_UNLOCK_REP 7 +#define STMT_7_INFO {"STMT_UNLOCK_REP", NULL} +#define STMT_7 \ + "ROLLBACK TRANSACTION; " \ + "" + +#define REP_CACHE_DB_SQL_DECLARE_STATEMENTS(varname) \ + static const char * const varname[] = { \ + STMT_0, \ + STMT_1, \ + STMT_2, \ + STMT_3, \ + STMT_4, \ + STMT_5, \ + STMT_6, \ + STMT_7, \ + NULL \ + } + +#define REP_CACHE_DB_SQL_DECLARE_STATEMENT_INFO(varname) \ + static const char * const varname[][2] = { \ + STMT_0_INFO, \ + STMT_1_INFO, \ + STMT_2_INFO, \ + STMT_3_INFO, \ + 
STMT_4_INFO, \ + STMT_5_INFO, \ + STMT_6_INFO, \ + STMT_7_INFO, \ + {NULL, NULL} \ + } diff --git a/subversion/libsvn_fs_x/rep-cache-db.sql b/subversion/libsvn_fs_x/rep-cache-db.sql new file mode 100644 index 0000000..7ad402a --- /dev/null +++ b/subversion/libsvn_fs_x/rep-cache-db.sql @@ -0,0 +1,70 @@ +/* rep-cache-db.sql -- schema for use in rep-caching + * This is intended for use with SQLite 3 + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +-- STMT_CREATE_SCHEMA +PRAGMA PAGE_SIZE = 4096; + +/* A table mapping representation hashes to locations in a rev file. */ +CREATE TABLE rep_cache ( + hash TEXT NOT NULL PRIMARY KEY, + revision INTEGER NOT NULL, + offset INTEGER NOT NULL, + size INTEGER NOT NULL, + expanded_size INTEGER NOT NULL + ); + +PRAGMA USER_VERSION = 1; + + +-- STMT_GET_REP +SELECT revision, offset, size, expanded_size +FROM rep_cache +WHERE hash = ?1 + +-- STMT_SET_REP +INSERT OR FAIL INTO rep_cache (hash, revision, offset, size, expanded_size) +VALUES (?1, ?2, ?3, ?4, ?5) + +-- STMT_GET_REPS_FOR_RANGE +SELECT hash, revision, offset, size, expanded_size +FROM rep_cache +WHERE revision >= ?1 AND revision <= ?2 + +-- STMT_GET_MAX_REV +SELECT MAX(revision) +FROM rep_cache + +-- STMT_DEL_REPS_YOUNGER_THAN_REV +DELETE FROM rep_cache +WHERE revision > ?1 + +/* An INSERT takes an SQLite reserved lock that prevents other writes + but doesn't block reads. The incomplete transaction means that no + permanent change is made to the database and the transaction is + removed when the database is closed. */ +-- STMT_LOCK_REP +BEGIN TRANSACTION; +INSERT INTO rep_cache VALUES ('dummy', 0, 0, 0, 0) + +-- STMT_UNLOCK_REP +ROLLBACK TRANSACTION; diff --git a/subversion/libsvn_fs_x/rep-cache.c b/subversion/libsvn_fs_x/rep-cache.c new file mode 100644 index 0000000..85e62a4 --- /dev/null +++ b/subversion/libsvn_fs_x/rep-cache.c @@ -0,0 +1,416 @@ +/* rep-sharing.c --- the rep-sharing cache for fsx + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#include "svn_pools.h" + +#include "svn_private_config.h" + +#include "fs_x.h" +#include "fs.h" +#include "rep-cache.h" +#include "util.h" +#include "../libsvn_fs/fs-loader.h" + +#include "svn_path.h" + +#include "private/svn_sqlite.h" + +#include "rep-cache-db.h" + +/* A few magic values */ +#define REP_CACHE_SCHEMA_FORMAT 1 + +REP_CACHE_DB_SQL_DECLARE_STATEMENTS(statements); + + + +/** Helper functions. **/ +static APR_INLINE const char * +path_rep_cache_db(const char *fs_path, + apr_pool_t *result_pool) +{ + return svn_dirent_join(fs_path, REP_CACHE_DB_NAME, result_pool); +} + + +/** Library-private API's. **/ + +/* Body of svn_fs_x__open_rep_cache(). + Implements svn_atomic__init_once().init_func. + */ +static svn_error_t * +open_rep_cache(void *baton, + apr_pool_t *scratch_pool) +{ + svn_fs_t *fs = baton; + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_sqlite__db_t *sdb; + const char *db_path; + int version; + + /* Open (or create) the sqlite database. It will be automatically + closed when fs->pool is destroyed. */ + db_path = path_rep_cache_db(fs->path, scratch_pool); +#ifndef WIN32 + { + /* We want to extend the permissions that apply to the repository + as a whole when creating a new rep cache and not simply default + to umask. */ + svn_boolean_t exists; + + SVN_ERR(svn_fs_x__exists_rep_cache(&exists, fs, scratch_pool)); + if (!exists) + { + const char *current = svn_fs_x__path_current(fs, scratch_pool); + svn_error_t *err = svn_io_file_create_empty(db_path, scratch_pool); + + if (err && !APR_STATUS_IS_EEXIST(err->apr_err)) + /* A real error. */ + return svn_error_trace(err); + else if (err) + /* Some other thread/process created the file. */ + svn_error_clear(err); + else + /* We created the file. */ + SVN_ERR(svn_io_copy_perms(current, db_path, scratch_pool)); + } + } +#endif + SVN_ERR(svn_sqlite__open(&sdb, db_path, + svn_sqlite__mode_rwcreate, statements, + 0, NULL, 0, + fs->pool, scratch_pool)); + + SVN_ERR(svn_sqlite__read_schema_version(&version, sdb, scratch_pool)); + if (version < REP_CACHE_SCHEMA_FORMAT) + { + /* Must be 0 -- an uninitialized (no schema) database. Create + the schema. Results in schema version of 1. */ + SVN_ERR(svn_sqlite__exec_statements(sdb, STMT_CREATE_SCHEMA)); + } + + /* This is used as a flag that the database is available so don't + set it earlier. 
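+     Other entry points in this file merely test FFD->REP_CACHE_DB for
+     NULL and call svn_fs_x__open_rep_cache() otherwise, so the schema
+     has to be fully in place before the pointer becomes visible.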
*/ + ffd->rep_cache_db = sdb; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__open_rep_cache(svn_fs_t *fs, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_error_t *err = svn_atomic__init_once(&ffd->rep_cache_db_opened, + open_rep_cache, fs, scratch_pool); + return svn_error_quick_wrap(err, _("Couldn't open rep-cache database")); +} + +svn_error_t * +svn_fs_x__exists_rep_cache(svn_boolean_t *exists, + svn_fs_t *fs, + apr_pool_t *scratch_pool) +{ + svn_node_kind_t kind; + + SVN_ERR(svn_io_check_path(path_rep_cache_db(fs->path, scratch_pool), + &kind, scratch_pool)); + + *exists = (kind != svn_node_none); + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__walk_rep_reference(svn_fs_t *fs, + svn_revnum_t start, + svn_revnum_t end, + svn_error_t *(*walker)(svn_fs_x__representation_t *, + void *, + svn_fs_t *, + apr_pool_t *), + void *walker_baton, + svn_cancel_func_t cancel_func, + void *cancel_baton, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_sqlite__stmt_t *stmt; + svn_boolean_t have_row; + int iterations = 0; + + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + + if (! ffd->rep_cache_db) + SVN_ERR(svn_fs_x__open_rep_cache(fs, scratch_pool)); + + /* Check global invariants. */ + if (start == 0) + { + svn_revnum_t max; + + SVN_ERR(svn_sqlite__get_statement(&stmt, ffd->rep_cache_db, + STMT_GET_MAX_REV)); + SVN_ERR(svn_sqlite__step(&have_row, stmt)); + max = svn_sqlite__column_revnum(stmt, 0); + SVN_ERR(svn_sqlite__reset(stmt)); + if (SVN_IS_VALID_REVNUM(max)) /* The rep-cache could be empty. */ + SVN_ERR(svn_fs_x__ensure_revision_exists(max, fs, iterpool)); + } + + SVN_ERR(svn_sqlite__get_statement(&stmt, ffd->rep_cache_db, + STMT_GET_REPS_FOR_RANGE)); + SVN_ERR(svn_sqlite__bindf(stmt, "rr", + start, end)); + + /* Walk the cache entries. */ + SVN_ERR(svn_sqlite__step(&have_row, stmt)); + while (have_row) + { + svn_fs_x__representation_t *rep; + const char *sha1_digest; + svn_error_t *err; + svn_checksum_t *checksum; + + /* Clear ITERPOOL occasionally. */ + if (iterations++ % 16 == 0) + svn_pool_clear(iterpool); + + /* Check for cancellation. */ + if (cancel_func) + { + err = cancel_func(cancel_baton); + if (err) + return svn_error_compose_create(err, svn_sqlite__reset(stmt)); + } + + /* Construct a svn_fs_x__representation_t. */ + rep = apr_pcalloc(iterpool, sizeof(*rep)); + sha1_digest = svn_sqlite__column_text(stmt, 0, iterpool); + err = svn_checksum_parse_hex(&checksum, svn_checksum_sha1, + sha1_digest, iterpool); + if (err) + return svn_error_compose_create(err, svn_sqlite__reset(stmt)); + + rep->has_sha1 = TRUE; + memcpy(rep->sha1_digest, checksum->digest, sizeof(rep->sha1_digest)); + rep->id.change_set = svn_sqlite__column_revnum(stmt, 1); + rep->id.number = svn_sqlite__column_int64(stmt, 2); + rep->size = svn_sqlite__column_int64(stmt, 3); + rep->expanded_size = svn_sqlite__column_int64(stmt, 4); + + /* Walk. */ + err = walker(rep, walker_baton, fs, iterpool); + if (err) + return svn_error_compose_create(err, svn_sqlite__reset(stmt)); + + SVN_ERR(svn_sqlite__step(&have_row, stmt)); + } + + SVN_ERR(svn_sqlite__reset(stmt)); + svn_pool_destroy(iterpool); + + return SVN_NO_ERROR; +} + + +/* This function's caller ignores most errors it returns. + If you extend this function, check the callsite to see if you have + to make it not-ignore additional error codes. 
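+
+   For illustration only (a sketch, not the actual callsite; the names
+   SHA1_CHECKSUM, REP and the surrounding pools are made up), the usage
+   pattern this refers to looks roughly like:
+
+     svn_fs_x__representation_t *cached;
+     SVN_ERR(svn_fs_x__get_rep_reference(&cached, fs, sha1_checksum,
+                                         result_pool, scratch_pool));
+     if (cached)
+       rep = cached;        <- share the existing representation
+     else
+       ...                  <- no match; keep the newly written rep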
*/ +svn_error_t * +svn_fs_x__get_rep_reference(svn_fs_x__representation_t **rep, + svn_fs_t *fs, + svn_checksum_t *checksum, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_sqlite__stmt_t *stmt; + svn_boolean_t have_row; + + SVN_ERR_ASSERT(ffd->rep_sharing_allowed); + if (! ffd->rep_cache_db) + SVN_ERR(svn_fs_x__open_rep_cache(fs, scratch_pool)); + + /* We only allow SHA1 checksums in this table. */ + if (checksum->kind != svn_checksum_sha1) + return svn_error_create(SVN_ERR_BAD_CHECKSUM_KIND, NULL, + _("Only SHA1 checksums can be used as keys in the " + "rep_cache table.\n")); + + SVN_ERR(svn_sqlite__get_statement(&stmt, ffd->rep_cache_db, STMT_GET_REP)); + SVN_ERR(svn_sqlite__bindf(stmt, "s", + svn_checksum_to_cstring(checksum, scratch_pool))); + + SVN_ERR(svn_sqlite__step(&have_row, stmt)); + if (have_row) + { + *rep = apr_pcalloc(result_pool, sizeof(**rep)); + memcpy((*rep)->sha1_digest, checksum->digest, + sizeof((*rep)->sha1_digest)); + (*rep)->has_sha1 = TRUE; + (*rep)->id.change_set = svn_sqlite__column_revnum(stmt, 0); + (*rep)->id.number = svn_sqlite__column_int64(stmt, 1); + (*rep)->size = svn_sqlite__column_int64(stmt, 2); + (*rep)->expanded_size = svn_sqlite__column_int64(stmt, 3); + } + else + *rep = NULL; + + SVN_ERR(svn_sqlite__reset(stmt)); + + if (*rep) + { + /* Check that REP refers to a revision that exists in FS. */ + svn_revnum_t revision = svn_fs_x__get_revnum((*rep)->id.change_set); + svn_error_t *err = svn_fs_x__ensure_revision_exists(revision, fs, + scratch_pool); + if (err) + return svn_error_createf(SVN_ERR_FS_CORRUPT, err, + "Checksum '%s' in rep-cache is beyond HEAD", + svn_checksum_to_cstring_display(checksum, scratch_pool)); + } + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__set_rep_reference(svn_fs_t *fs, + svn_fs_x__representation_t *rep, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_sqlite__stmt_t *stmt; + svn_error_t *err; + svn_checksum_t checksum; + checksum.kind = svn_checksum_sha1; + checksum.digest = rep->sha1_digest; + + SVN_ERR_ASSERT(ffd->rep_sharing_allowed); + if (! ffd->rep_cache_db) + SVN_ERR(svn_fs_x__open_rep_cache(fs, scratch_pool)); + + /* We only allow SHA1 checksums in this table. */ + if (! rep->has_sha1) + return svn_error_create(SVN_ERR_BAD_CHECKSUM_KIND, NULL, + _("Only SHA1 checksums can be used as keys in the " + "rep_cache table.\n")); + + SVN_ERR(svn_sqlite__get_statement(&stmt, ffd->rep_cache_db, STMT_SET_REP)); + SVN_ERR(svn_sqlite__bindf(stmt, "siiii", + svn_checksum_to_cstring(&checksum, scratch_pool), + (apr_int64_t) rep->id.change_set, + (apr_int64_t) rep->id.number, + (apr_int64_t) rep->size, + (apr_int64_t) rep->expanded_size)); + + err = svn_sqlite__insert(NULL, stmt); + if (err) + { + svn_fs_x__representation_t *old_rep; + + if (err->apr_err != SVN_ERR_SQLITE_CONSTRAINT) + return svn_error_trace(err); + + svn_error_clear(err); + + /* Constraint failed so the mapping for SHA1_CHECKSUM->REP + should exist. If so that's cool -- just do nothing. If not, + that's a red flag! */ + SVN_ERR(svn_fs_x__get_rep_reference(&old_rep, fs, &checksum, + scratch_pool, scratch_pool)); + + if (!old_rep) + { + /* Something really odd at this point, we failed to insert the + checksum AND failed to read an existing checksum. Do we need + to flag this? 
*/ + } + } + + return SVN_NO_ERROR; +} + + +svn_error_t * +svn_fs_x__del_rep_reference(svn_fs_t *fs, + svn_revnum_t youngest, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_sqlite__stmt_t *stmt; + + if (! ffd->rep_cache_db) + SVN_ERR(svn_fs_x__open_rep_cache(fs, scratch_pool)); + + SVN_ERR(svn_sqlite__get_statement(&stmt, ffd->rep_cache_db, + STMT_DEL_REPS_YOUNGER_THAN_REV)); + SVN_ERR(svn_sqlite__bindf(stmt, "r", youngest)); + SVN_ERR(svn_sqlite__step_done(stmt)); + + return SVN_NO_ERROR; +} + +/* Start a transaction to take an SQLite reserved lock that prevents + other writes. + + See unlock_rep_cache(). */ +static svn_error_t * +lock_rep_cache(svn_fs_t *fs, + apr_pool_t *pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + + if (! ffd->rep_cache_db) + SVN_ERR(svn_fs_x__open_rep_cache(fs, pool)); + + SVN_ERR(svn_sqlite__exec_statements(ffd->rep_cache_db, STMT_LOCK_REP)); + + return SVN_NO_ERROR; +} + +/* End the transaction started by lock_rep_cache(). */ +static svn_error_t * +unlock_rep_cache(svn_fs_t *fs, + apr_pool_t *pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + + SVN_ERR_ASSERT(ffd->rep_cache_db); /* was opened by lock_rep_cache() */ + + SVN_ERR(svn_sqlite__exec_statements(ffd->rep_cache_db, STMT_UNLOCK_REP)); + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__with_rep_cache_lock(svn_fs_t *fs, + svn_error_t *(*body)(void *, + apr_pool_t *), + void *baton, + apr_pool_t *pool) +{ + svn_error_t *err; + + SVN_ERR(lock_rep_cache(fs, pool)); + err = body(baton, pool); + return svn_error_compose_create(err, unlock_rep_cache(fs, pool)); +} diff --git a/subversion/libsvn_fs_x/rep-cache.h b/subversion/libsvn_fs_x/rep-cache.h new file mode 100644 index 0000000..1fe26da --- /dev/null +++ b/subversion/libsvn_fs_x/rep-cache.h @@ -0,0 +1,105 @@ +/* rep-cache.h : interface to rep cache db functions + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#ifndef SVN_LIBSVN_FS_X_REP_CACHE_H +#define SVN_LIBSVN_FS_X_REP_CACHE_H + +#include "svn_error.h" + +#include "fs.h" + +#ifdef __cplusplus +extern "C" { +#endif /* __cplusplus */ + + +#define REP_CACHE_DB_NAME "rep-cache.db" + +/* Open and create, if needed, the rep cache database associated with FS. + Use SCRATCH_POOL for temporary allocations. */ +svn_error_t * +svn_fs_x__open_rep_cache(svn_fs_t *fs, + apr_pool_t *scratch_pool); + +/* Set *EXISTS to TRUE iff the rep-cache DB file exists. */ +svn_error_t * +svn_fs_x__exists_rep_cache(svn_boolean_t *exists, + svn_fs_t *fs, + apr_pool_t *scratch_pool); + +/* Iterate all representations currently in FS's cache. 
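+   Call WALKER with WALKER_BATON, FS and a scratch pool for every entry
+   whose revision lies in [START, END], honoring CANCEL_FUNC / CANCEL_BATON.
+   As a hedged illustration only (not an existing caller; COUNT_WALKER,
+   HEAD_REV and POOL are made-up names), a walker could look like this:
+
+     static svn_error_t *
+     count_walker(svn_fs_x__representation_t *rep,
+                  void *baton,
+                  svn_fs_t *fs,
+                  apr_pool_t *scratch_pool)
+     {
+       ++*(apr_int64_t *)baton;
+       return SVN_NO_ERROR;
+     }
+
+     ...
+     apr_int64_t count = 0;
+     SVN_ERR(svn_fs_x__walk_rep_reference(fs, 0, head_rev, count_walker,
+                                          &count, NULL, NULL, pool));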
*/ +svn_error_t * +svn_fs_x__walk_rep_reference(svn_fs_t *fs, + svn_revnum_t start, + svn_revnum_t end, + svn_error_t *(*walker)(svn_fs_x__representation_t *rep, + void *walker_baton, + svn_fs_t *fs, + apr_pool_t *scratch_pool), + void *walker_baton, + svn_cancel_func_t cancel_func, + void *cancel_baton, + apr_pool_t *scratch_pool); + +/* Return the representation REP in FS which has fulltext CHECKSUM. + REP is allocated in RESULT_POOL. If the rep cache database has not been + opened, just set *REP to NULL. Returns SVN_ERR_FS_CORRUPT if a reference + beyond HEAD is detected. Uses SCRATCH_POOL for temporary allocations. */ +svn_error_t * +svn_fs_x__get_rep_reference(svn_fs_x__representation_t **rep, + svn_fs_t *fs, + svn_checksum_t *checksum, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + +/* Set the representation REP in FS, using REP->CHECKSUM. + Use SCRATCH_POOL for temporary allocations. Returns SVN_ERR_FS_CORRUPT + if an existing reference beyond HEAD is detected. + + If the rep cache database has not been opened, this may be a no op. */ +svn_error_t * +svn_fs_x__set_rep_reference(svn_fs_t *fs, + svn_fs_x__representation_t *rep, + apr_pool_t *scratch_pool); + +/* Delete from the cache all reps corresponding to revisions younger + than YOUNGEST. */ +svn_error_t * +svn_fs_x__del_rep_reference(svn_fs_t *fs, + svn_revnum_t youngest, + apr_pool_t *scratch_pool); + + +/* Start a transaction to take an SQLite reserved lock that prevents + other writes, call BODY, end the transaction, and return what BODY returned. + */ +svn_error_t * +svn_fs_x__with_rep_cache_lock(svn_fs_t *fs, + svn_error_t *(*body)(void *baton, + apr_pool_t *pool), + void *baton, + apr_pool_t *pool); +#ifdef __cplusplus +} +#endif /* __cplusplus */ + +#endif /* SVN_LIBSVN_FS_X_REP_CACHE_H */ diff --git a/subversion/libsvn_fs_x/reps.c b/subversion/libsvn_fs_x/reps.c new file mode 100644 index 0000000..85a5269 --- /dev/null +++ b/subversion/libsvn_fs_x/reps.c @@ -0,0 +1,948 @@ +/* reps.c --- FSX representation container + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#include "reps.h" + +#include "svn_sorts.h" +#include "private/svn_string_private.h" +#include "private/svn_packed_data.h" +#include "private/svn_temp_serializer.h" + +#include "svn_private_config.h" + +#include "cached_data.h" + +/* Length of the text chunks we hash and match. The algorithm will find + * most matches with a length of 2 * MATCH_BLOCKSIZE and only specific + * ones that are shorter than MATCH_BLOCKSIZE. + * + * This should be a power of two and must be a multiple of 8. + * Good choices are 32, 64 and 128. 
+ */ +#define MATCH_BLOCKSIZE 64 + +/* Limit the total text body within a container to 16MB. Larger values + * of up to 2GB are possible but become increasingly impractical as the + * container has to be loaded in its entirety before any of it can be read. + */ +#define MAX_TEXT_BODY 0x1000000 + +/* Limit the size of the instructions stream. This should not exceed the + * text body size limit. */ +#define MAX_INSTRUCTIONS (MAX_TEXT_BODY / 8) + +/* value of unused hash buckets */ +#define NO_OFFSET ((apr_uint32_t)(-1)) + +/* Byte strings are described by a series of copy instructions that each + * do one of the following + * + * - copy a given number of bytes from the text corpus starting at a + * given offset + * - reference other instruction and specify how many of instructions of + * that sequence shall be executed (i.e. a sub-sequence) + * - copy a number of bytes from the base representation buffer starting + * at a given offset + */ + +/* The contents of a fulltext / representation is defined by its first + * instruction and the number of instructions to execute. + */ +typedef struct rep_t +{ + apr_uint32_t first_instruction; + apr_uint32_t instruction_count; +} rep_t; + +/* A single instruction. The instruction type is being encoded in OFFSET. + */ +typedef struct instruction_t +{ + /* Instruction type and offset. + * - offset < 0 + * reference to instruction sub-sequence starting with + * container->instructions[-offset]. + * - 0 <= offset < container->base_text_len + * reference to the base text corpus; + * start copy at offset + * - offset >= container->base_text_len + * reference to the text corpus; + * start copy at offset-container->base_text_len + */ + apr_int32_t offset; + + /* Number of bytes to copy / instructions to execute + */ + apr_uint32_t count; +} instruction_t; + +/* Describe a base fulltext. + */ +typedef struct base_t +{ + /* Revision */ + svn_revnum_t revision; + + /* Item within that revision */ + apr_uint64_t item_index; + + /* Priority with which to use this base over others */ + int priority; + + /* Index into builder->representations that identifies the copy + * instructions for this base. */ + apr_uint32_t rep; +} base_t; + +/* Yet another hash data structure. This one tries to be more cache + * friendly by putting the first byte of each hashed sequence in a + * common array. This array will often fit into L1 or L2 at least and + * give a 99% accurate test for a match without giving false negatives. + */ +typedef struct hash_t +{ + /* for used entries i, prefixes[i] == text[offsets[i]]; 0 otherwise. + * This allows for a quick check without resolving the double + * indirection. */ + char *prefixes; + + /* for used entries i, offsets[i] is start offset in the text corpus; + * NO_OFFSET otherwise. + */ + apr_uint32_t *offsets; + + /* to be used later for optimizations. */ + apr_uint32_t *last_matches; + + /* number of buckets in this hash, i.e. elements in each array above. + * Must be 1 << (8 * sizeof(hash_key_t) - shift) */ + apr_size_t size; + + /* number of buckets actually in use. Must be <= size. */ + apr_size_t used; + + /* number of bits to shift right to map a hash_key_t to a bucket index */ + apr_size_t shift; + + /* pool to use when growing the hash */ + apr_pool_t *pool; +} hash_t; + +/* Hash key type. 32 bits for pseudo-Adler32 hash sums. + */ +typedef apr_uint32_t hash_key_t; + +/* Constructor data structure. 
+ */ +struct svn_fs_x__reps_builder_t +{ + /* file system to read base representations from */ + svn_fs_t *fs; + + /* text corpus */ + svn_stringbuf_t *text; + + /* text block hash */ + hash_t hash; + + /* array of base_t objects describing all bases defined so far */ + apr_array_header_t *bases; + + /* array of rep_t objects describing all fulltexts (including bases) + * added so far */ + apr_array_header_t *reps; + + /* array of instruction_t objects describing all instructions */ + apr_array_header_t *instructions; + + /* number of bytes in the text corpus that belongs to bases */ + apr_size_t base_text_len; +}; + +/* R/o container. + */ +struct svn_fs_x__reps_t +{ + /* text corpus */ + const char *text; + + /* length of the text corpus in bytes */ + apr_size_t text_len; + + /* bases used */ + const base_t *bases; + + /* number of bases used */ + apr_size_t base_count; + + /* fulltext i can be reconstructed by executing instructions + * first_instructions[i] .. first_instructions[i+1]-1 + * (this array has one extra element at the end) + */ + const apr_uint32_t *first_instructions; + + /* number of fulltexts (no bases) */ + apr_size_t rep_count; + + /* instructions */ + const instruction_t *instructions; + + /* total number of instructions */ + apr_size_t instruction_count; + + /* offsets > 0 but smaller that this are considered base references */ + apr_size_t base_text_len; +}; + +/* describe a section in the extractor's result string that is not filled + * yet (but already exists). + */ +typedef struct missing_t +{ + /* start offset within the result string */ + apr_uint32_t start; + + /* number of bytes to write */ + apr_uint32_t count; + + /* index into extractor->bases selecting the base representation to + * copy from */ + apr_uint32_t base; + + /* copy source offset within that base representation */ + apr_uint32_t offset; +} missing_t; + +/* Fulltext extractor data structure. + */ +struct svn_fs_x__rep_extractor_t +{ + /* filesystem to read the bases from */ + svn_fs_t *fs; + + /* fulltext being constructed */ + svn_stringbuf_t *result; + + /* bases (base_t) yet to process (not used ATM) */ + apr_array_header_t *bases; + + /* missing sections (missing_t) in result->data that need to be filled, + * yet */ + apr_array_header_t *missing; + + /* pool to use for allocating the above arrays */ + apr_pool_t *pool; +}; + +/* Given the ADLER32 checksum for a certain range of MATCH_BLOCKSIZE + * bytes, return the checksum for the range excluding the first byte + * C_OUT and appending C_IN. + */ +static hash_key_t +hash_key_replace(hash_key_t adler32, const char c_out, const char c_in) +{ + adler32 -= (MATCH_BLOCKSIZE * 0x10000u * ((unsigned char) c_out)); + + adler32 -= (unsigned char)c_out; + adler32 += (unsigned char)c_in; + + return adler32 + adler32 * 0x10000; +} + +/* Calculate an pseudo-adler32 checksum for MATCH_BLOCKSIZE bytes starting + at DATA. Return the checksum value. */ +static hash_key_t +hash_key(const char *data) +{ + const unsigned char *input = (const unsigned char *)data; + const unsigned char *last = input + MATCH_BLOCKSIZE; + + hash_key_t s1 = 0; + hash_key_t s2 = 0; + + for (; input < last; input += 8) + { + s1 += input[0]; s2 += s1; + s1 += input[1]; s2 += s1; + s1 += input[2]; s2 += s1; + s1 += input[3]; s2 += s1; + s1 += input[4]; s2 += s1; + s1 += input[5]; s2 += s1; + s1 += input[6]; s2 += s1; + s1 += input[7]; s2 += s1; + } + + return s2 * 0x10000 + s1; +} + +/* Map the ADLER32 key to a bucket index in HASH and return that index. 
+ */ +static apr_size_t +hash_to_index(hash_t *hash, hash_key_t adler32) +{ + return (adler32 * 0xd1f3da69) >> hash->shift; +} + +/* Allocate and initialized SIZE buckets in RESULT_POOL. + * Assign them to HASH. + */ +static void +allocate_hash_members(hash_t *hash, + apr_size_t size, + apr_pool_t *result_pool) +{ + apr_size_t i; + + hash->pool = result_pool; + hash->size = size; + + hash->prefixes = apr_pcalloc(result_pool, size); + hash->last_matches = apr_pcalloc(result_pool, + sizeof(*hash->last_matches) * size); + hash->offsets = apr_palloc(result_pool, sizeof(*hash->offsets) * size); + + for (i = 0; i < size; ++i) + hash->offsets[i] = NO_OFFSET; +} + +/* Initialize the HASH data structure with 2**TWOPOWER buckets allocated + * in RESULT_POOL. + */ +static void +init_hash(hash_t *hash, + apr_size_t twoPower, + apr_pool_t *result_pool) +{ + hash->used = 0; + hash->shift = sizeof(hash_key_t) * 8 - twoPower; + + allocate_hash_members(hash, 1 << twoPower, result_pool); +} + +/* Make HASH have at least MIN_SIZE buckets but at least double the number + * of buckets in HASH by rehashing it based TEXT. + */ +static void +grow_hash(hash_t *hash, + svn_stringbuf_t *text, + apr_size_t min_size) +{ + hash_t copy; + apr_size_t i; + + /* determine the new hash size */ + apr_size_t new_size = hash->size * 2; + apr_size_t new_shift = hash->shift - 1; + while (new_size < min_size) + { + new_size *= 2; + --new_shift; + } + + /* allocate new hash */ + allocate_hash_members(©, new_size, hash->pool); + copy.used = 0; + copy.shift = new_shift; + + /* copy / translate data */ + for (i = 0; i < hash->size; ++i) + { + apr_uint32_t offset = hash->offsets[i]; + if (offset != NO_OFFSET) + { + hash_key_t key = hash_key(text->data + offset); + size_t idx = hash_to_index(©, key); + + if (copy.offsets[idx] == NO_OFFSET) + copy.used++; + + copy.prefixes[idx] = hash->prefixes[i]; + copy.offsets[idx] = offset; + copy.last_matches[idx] = hash->last_matches[i]; + } + } + + *hash = copy; +} + +svn_fs_x__reps_builder_t * +svn_fs_x__reps_builder_create(svn_fs_t *fs, + apr_pool_t *result_pool) +{ + svn_fs_x__reps_builder_t *result = apr_pcalloc(result_pool, + sizeof(*result)); + + result->fs = fs; + result->text = svn_stringbuf_create_empty(result_pool); + init_hash(&result->hash, 4, result_pool); + + result->bases = apr_array_make(result_pool, 0, sizeof(base_t)); + result->reps = apr_array_make(result_pool, 0, sizeof(rep_t)); + result->instructions = apr_array_make(result_pool, 0, + sizeof(instruction_t)); + + return result; +} + +svn_error_t * +svn_fs_x__reps_add_base(svn_fs_x__reps_builder_t *builder, + svn_fs_x__representation_t *rep, + int priority, + apr_pool_t *scratch_pool) +{ + base_t base; + apr_size_t text_start_offset = builder->text->len; + + svn_stream_t *stream; + svn_string_t *contents; + apr_size_t idx; + SVN_ERR(svn_fs_x__get_contents(&stream, builder->fs, rep, FALSE, + scratch_pool)); + SVN_ERR(svn_string_from_stream(&contents, stream, scratch_pool, + scratch_pool)); + SVN_ERR(svn_fs_x__reps_add(&idx, builder, contents)); + + base.revision = svn_fs_x__get_revnum(rep->id.change_set); + base.item_index = rep->id.number; + base.priority = priority; + base.rep = (apr_uint32_t)idx; + + APR_ARRAY_PUSH(builder->bases, base_t) = base; + builder->base_text_len += builder->text->len - text_start_offset; + + return SVN_NO_ERROR; +} + +/* Add LEN bytes from DATA to BUILDER's text corpus. Also, add a copy + * operation for that text fragment. 
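+ *
+ * The hash gets grown in advance so that its load factor stays below
+ * roughly 2/3, and an entry is added for every MATCH_BLOCKSIZE-aligned
+ * block of the newly appended text so that later calls to
+ * svn_fs_x__reps_add() can match against it.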
+ */ +static void +add_new_text(svn_fs_x__reps_builder_t *builder, + const char *data, + apr_size_t len) +{ + instruction_t instruction; + apr_size_t offset; + apr_size_t buckets_required; + + if (len == 0) + return; + + /* new instruction */ + instruction.offset = (apr_int32_t)builder->text->len; + instruction.count = (apr_uint32_t)len; + APR_ARRAY_PUSH(builder->instructions, instruction_t) = instruction; + + /* add to text corpus */ + svn_stringbuf_appendbytes(builder->text, data, len); + + /* expand the hash upfront to minimize the chances of collisions */ + buckets_required = builder->hash.used + len / MATCH_BLOCKSIZE; + if (buckets_required * 3 >= builder->hash.size * 2) + grow_hash(&builder->hash, builder->text, 2 * buckets_required); + + /* add hash entries for the new sequence */ + for (offset = instruction.offset; + offset + MATCH_BLOCKSIZE <= builder->text->len; + offset += MATCH_BLOCKSIZE) + { + hash_key_t key = hash_key(builder->text->data + offset); + size_t idx = hash_to_index(&builder->hash, key); + + /* Don't replace hash entries that stem from the current text. + * This makes early matches more likely. */ + if (builder->hash.offsets[idx] == NO_OFFSET) + ++builder->hash.used; + else if (builder->hash.offsets[idx] >= instruction.offset) + continue; + + builder->hash.offsets[idx] = (apr_uint32_t)offset; + builder->hash.prefixes[idx] = builder->text->data[offset]; + } +} + +svn_error_t * +svn_fs_x__reps_add(apr_size_t *rep_idx, + svn_fs_x__reps_builder_t *builder, + const svn_string_t *contents) +{ + rep_t rep; + const char *current = contents->data; + const char *processed = current; + const char *end = current + contents->len; + const char *last_to_test = end - MATCH_BLOCKSIZE - 1; + + if (builder->text->len + contents->len > MAX_TEXT_BODY) + return svn_error_create(SVN_ERR_FS_CONTAINER_SIZE, NULL, + _("Text body exceeds star delta container capacity")); + + if ( builder->instructions->nelts + 2 * contents->len / MATCH_BLOCKSIZE + > MAX_INSTRUCTIONS) + return svn_error_create(SVN_ERR_FS_CONTAINER_SIZE, NULL, + _("Instruction count exceeds star delta container capacity")); + + rep.first_instruction = (apr_uint32_t)builder->instructions->nelts; + while (current < last_to_test) + { + hash_key_t key = hash_key(current); + size_t offset; + size_t idx; + + /* search for the next matching sequence */ + + for (; current < last_to_test; ++current) + { + idx = hash_to_index(&builder->hash, key); + if (builder->hash.prefixes[idx] == current[0]) + { + offset = builder->hash.offsets[idx]; + if ( (offset != NO_OFFSET) + && (memcmp(&builder->text->data[offset], current, + MATCH_BLOCKSIZE) == 0)) + break; + } + key = hash_key_replace(key, current[0], current[MATCH_BLOCKSIZE]); + } + + /* found it? 
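+
+         If so, the non-matching text since the last match is added as
+         new corpus text below, and the match itself, extended in both
+         directions, becomes a single copy instruction.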
*/ + + if (current < last_to_test) + { + instruction_t instruction; + + /* extend the match */ + + size_t prefix_match + = svn_cstring__reverse_match_length(current, + builder->text->data + offset, + MIN(offset, current - processed)); + size_t postfix_match + = svn_cstring__match_length(current + MATCH_BLOCKSIZE, + builder->text->data + offset + MATCH_BLOCKSIZE, + MIN(builder->text->len - offset - MATCH_BLOCKSIZE, + end - current - MATCH_BLOCKSIZE)); + + /* non-matched section */ + + size_t new_copy = (current - processed) - prefix_match; + if (new_copy) + add_new_text(builder, processed, new_copy); + + /* add instruction for matching section */ + + instruction.offset = (apr_int32_t)(offset - prefix_match); + instruction.count = (apr_uint32_t)(prefix_match + postfix_match + + MATCH_BLOCKSIZE); + APR_ARRAY_PUSH(builder->instructions, instruction_t) = instruction; + + processed = current + MATCH_BLOCKSIZE + postfix_match; + current = processed; + } + } + + add_new_text(builder, processed, end - processed); + rep.instruction_count = (apr_uint32_t)builder->instructions->nelts + - rep.first_instruction; + APR_ARRAY_PUSH(builder->reps, rep_t) = rep; + + *rep_idx = (apr_size_t)(builder->reps->nelts - 1); + return SVN_NO_ERROR; +} + +apr_size_t +svn_fs_x__reps_estimate_size(const svn_fs_x__reps_builder_t *builder) +{ + /* approx: size of the text exclusive to us @ 50% compression rate + * + 2 bytes per instruction + * + 2 bytes per representation + * + 8 bytes per base representation + * + 1:8 inefficiency in using the base representations + * + 100 bytes static overhead + */ + return (builder->text->len - builder->base_text_len) / 2 + + builder->instructions->nelts * 2 + + builder->reps->nelts * 2 + + builder->bases->nelts * 8 + + builder->base_text_len / 8 + + 100; +} + +/* Execute COUNT instructions starting at INSTRUCTION_IDX in CONTAINER + * and fill the parts of EXTRACTOR->RESULT that we can from this container. + * Record the remainder in EXTRACTOR->MISSING. + * + * This function will recurse for instructions that reference other + * instruction sequences. COUNT refers to the top-level instructions only. + */ +static void +get_text(svn_fs_x__rep_extractor_t *extractor, + const svn_fs_x__reps_t *container, + apr_size_t instruction_idx, + apr_size_t count) +{ + const instruction_t *instruction; + const char *offset_0 = container->text - container->base_text_len; + + for (instruction = container->instructions + instruction_idx; + instruction < container->instructions + instruction_idx + count; + instruction++) + if (instruction->offset < 0) + { + /* instruction sub-sequence */ + get_text(extractor, container, -instruction->offset, + instruction->count); + } + else if (instruction->offset >= container->base_text_len) + { + /* direct copy instruction */ + svn_stringbuf_appendbytes(extractor->result, + offset_0 + instruction->offset, + instruction->count); + } + else + { + /* a section that we need to fill from some external base rep. 
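+           We note it in EXTRACTOR->MISSING and reserve zero-filled space
+           in the result.  Note that svn_fs_x__extractor_drive() currently
+           asserts that this list stays empty, i.e. container-external
+           bases are not supported at extraction time yet.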
*/ + missing_t missing; + missing.base = 0; + missing.start = (apr_uint32_t)extractor->result->len; + missing.count = instruction->count; + missing.offset = instruction->offset; + svn_stringbuf_appendfill(extractor->result, 0, instruction->count); + + if (extractor->missing == NULL) + extractor->missing = apr_array_make(extractor->pool, 1, + sizeof(missing)); + + APR_ARRAY_PUSH(extractor->missing, missing_t) = missing; + } +} + +svn_error_t * +svn_fs_x__reps_get(svn_fs_x__rep_extractor_t **extractor, + svn_fs_t *fs, + const svn_fs_x__reps_t *container, + apr_size_t idx, + apr_pool_t *pool) +{ + apr_uint32_t first = container->first_instructions[idx]; + apr_uint32_t last = container->first_instructions[idx + 1]; + + /* create the extractor object */ + svn_fs_x__rep_extractor_t *result = apr_pcalloc(pool, sizeof(*result)); + result->fs = fs; + result->result = svn_stringbuf_create_empty(pool); + result->pool = pool; + + /* fill all the bits of the result that we can, i.e. all but bits coming + * from base representations */ + get_text(result, container, first, last - first); + *extractor = result; + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__extractor_drive(svn_stringbuf_t **contents, + svn_fs_x__rep_extractor_t *extractor, + apr_size_t start_offset, + apr_size_t size, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + /* we don't support base reps right now */ + SVN_ERR_ASSERT(extractor->missing == NULL); + + if (size == 0) + { + *contents = svn_stringbuf_dup(extractor->result, result_pool); + } + else + { + /* clip the selected range */ + if (start_offset > extractor->result->len) + start_offset = extractor->result->len; + + if (size > extractor->result->len - start_offset) + size = extractor->result->len - start_offset; + + *contents = svn_stringbuf_ncreate(extractor->result->data + start_offset, + size, result_pool); + } + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__write_reps_container(svn_stream_t *stream, + const svn_fs_x__reps_builder_t *builder, + apr_pool_t *scratch_pool) +{ + int i; + svn_packed__data_root_t *root = svn_packed__data_create_root(scratch_pool); + + /* one top-level stream for each array */ + svn_packed__int_stream_t *bases_stream + = svn_packed__create_int_stream(root, FALSE, FALSE); + svn_packed__int_stream_t *reps_stream + = svn_packed__create_int_stream(root, TRUE, FALSE); + svn_packed__int_stream_t *instructions_stream + = svn_packed__create_int_stream(root, FALSE, FALSE); + + /* for misc stuff */ + svn_packed__int_stream_t *misc_stream + = svn_packed__create_int_stream(root, FALSE, FALSE); + + /* TEXT will be just a single string */ + svn_packed__byte_stream_t *text_stream + = svn_packed__create_bytes_stream(root); + + /* structure the struct streams such we can extract much of the redundancy + */ + svn_packed__create_int_substream(bases_stream, TRUE, TRUE); + svn_packed__create_int_substream(bases_stream, TRUE, FALSE); + svn_packed__create_int_substream(bases_stream, TRUE, FALSE); + svn_packed__create_int_substream(bases_stream, TRUE, FALSE); + + svn_packed__create_int_substream(instructions_stream, TRUE, TRUE); + svn_packed__create_int_substream(instructions_stream, FALSE, FALSE); + + /* text */ + svn_packed__add_bytes(text_stream, builder->text->data, builder->text->len); + + /* serialize bases */ + for (i = 0; i < builder->bases->nelts; ++i) + { + const base_t *base = &APR_ARRAY_IDX(builder->bases, i, base_t); + svn_packed__add_int(bases_stream, base->revision); + svn_packed__add_uint(bases_stream, base->item_index); + 
svn_packed__add_uint(bases_stream, base->priority); + svn_packed__add_uint(bases_stream, base->rep); + } + + /* serialize reps */ + for (i = 0; i < builder->reps->nelts; ++i) + { + const rep_t *rep = &APR_ARRAY_IDX(builder->reps, i, rep_t); + svn_packed__add_uint(reps_stream, rep->first_instruction); + } + + svn_packed__add_uint(reps_stream, builder->instructions->nelts); + + /* serialize instructions */ + for (i = 0; i < builder->instructions->nelts; ++i) + { + const instruction_t *instruction + = &APR_ARRAY_IDX(builder->instructions, i, instruction_t); + svn_packed__add_int(instructions_stream, instruction->offset); + svn_packed__add_uint(instructions_stream, instruction->count); + } + + /* other elements */ + svn_packed__add_uint(misc_stream, 0); + + /* write to stream */ + SVN_ERR(svn_packed__data_write(stream, root, scratch_pool)); + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__read_reps_container(svn_fs_x__reps_t **container, + svn_stream_t *stream, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + apr_size_t i; + + base_t *bases; + apr_uint32_t *first_instructions; + instruction_t *instructions; + + svn_fs_x__reps_t *reps = apr_pcalloc(result_pool, sizeof(*reps)); + + svn_packed__data_root_t *root; + svn_packed__int_stream_t *bases_stream; + svn_packed__int_stream_t *reps_stream; + svn_packed__int_stream_t *instructions_stream; + svn_packed__int_stream_t *misc_stream; + svn_packed__byte_stream_t *text_stream; + + /* read from disk */ + SVN_ERR(svn_packed__data_read(&root, stream, result_pool, scratch_pool)); + + bases_stream = svn_packed__first_int_stream(root); + reps_stream = svn_packed__next_int_stream(bases_stream); + instructions_stream = svn_packed__next_int_stream(reps_stream); + misc_stream = svn_packed__next_int_stream(instructions_stream); + text_stream = svn_packed__first_byte_stream(root); + + /* text */ + reps->text = svn_packed__get_bytes(text_stream, &reps->text_len); + reps->text = apr_pmemdup(result_pool, reps->text, reps->text_len); + + /* de-serialize bases */ + reps->base_count + = svn_packed__int_count(svn_packed__first_int_substream(bases_stream)); + bases = apr_palloc(result_pool, reps->base_count * sizeof(*bases)); + reps->bases = bases; + + for (i = 0; i < reps->base_count; ++i) + { + base_t *base = bases + i; + base->revision = (svn_revnum_t)svn_packed__get_int(bases_stream); + base->item_index = svn_packed__get_uint(bases_stream); + base->priority = (int)svn_packed__get_uint(bases_stream); + base->rep = (apr_uint32_t)svn_packed__get_uint(bases_stream); + } + + /* de-serialize instructions */ + reps->instruction_count + = svn_packed__int_count + (svn_packed__first_int_substream(instructions_stream)); + instructions + = apr_palloc(result_pool, + reps->instruction_count * sizeof(*instructions)); + reps->instructions = instructions; + + for (i = 0; i < reps->instruction_count; ++i) + { + instruction_t *instruction = instructions + i; + instruction->offset + = (apr_int32_t)svn_packed__get_int(instructions_stream); + instruction->count + = (apr_uint32_t)svn_packed__get_uint(instructions_stream); + } + + /* de-serialize reps */ + reps->rep_count = svn_packed__int_count(reps_stream); + first_instructions + = apr_palloc(result_pool, + (reps->rep_count + 1) * sizeof(*first_instructions)); + reps->first_instructions = first_instructions; + + for (i = 0; i < reps->rep_count; ++i) + first_instructions[i] + = (apr_uint32_t)svn_packed__get_uint(reps_stream); + first_instructions[reps->rep_count] = (apr_uint32_t)reps->instruction_count; + + /* other 
elements */ + reps->base_text_len = (apr_size_t)svn_packed__get_uint(misc_stream); + + /* return result */ + *container = reps; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__serialize_reps_container(void **data, + apr_size_t *data_len, + void *in, + apr_pool_t *pool) +{ + svn_fs_x__reps_t *reps = in; + svn_stringbuf_t *serialized; + + /* make a guesstimate on the size of the serialized data. Erring on the + * low side will cause the serializer to re-alloc its buffer. */ + apr_size_t size + = reps->text_len + + reps->base_count * sizeof(*reps->bases) + + reps->rep_count * sizeof(*reps->first_instructions) + + reps->instruction_count * sizeof(*reps->instructions) + + 100; + + /* serialize array header and all its elements */ + svn_temp_serializer__context_t *context + = svn_temp_serializer__init(reps, sizeof(*reps), size, pool); + + /* serialize sub-structures */ + svn_temp_serializer__add_leaf(context, (const void **)&reps->text, + reps->text_len); + svn_temp_serializer__add_leaf(context, (const void **)&reps->bases, + reps->base_count * sizeof(*reps->bases)); + svn_temp_serializer__add_leaf(context, + (const void **)&reps->first_instructions, + reps->rep_count * + sizeof(*reps->first_instructions)); + svn_temp_serializer__add_leaf(context, (const void **)&reps->instructions, + reps->instruction_count * + sizeof(*reps->instructions)); + + /* return the serialized result */ + serialized = svn_temp_serializer__get(context); + + *data = serialized->data; + *data_len = serialized->len; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__deserialize_reps_container(void **out, + void *data, + apr_size_t data_len, + apr_pool_t *pool) +{ + svn_fs_x__reps_t *reps = (svn_fs_x__reps_t *)data; + + /* de-serialize sub-structures */ + svn_temp_deserializer__resolve(reps, (void **)&reps->text); + svn_temp_deserializer__resolve(reps, (void **)&reps->bases); + svn_temp_deserializer__resolve(reps, (void **)&reps->first_instructions); + svn_temp_deserializer__resolve(reps, (void **)&reps->instructions); + + /* done */ + *out = reps; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__reps_get_func(void **out, + const void *data, + apr_size_t data_len, + void *baton, + apr_pool_t *pool) +{ + svn_fs_x__reps_baton_t *reps_baton = baton; + + /* get a usable reps structure */ + const svn_fs_x__reps_t *cached = data; + svn_fs_x__reps_t *reps = apr_pmemdup(pool, cached, sizeof(*reps)); + + reps->text + = svn_temp_deserializer__ptr(cached, (const void **)&cached->text); + reps->bases + = svn_temp_deserializer__ptr(cached, (const void **)&cached->bases); + reps->first_instructions + = svn_temp_deserializer__ptr(cached, + (const void **)&cached->first_instructions); + reps->instructions + = svn_temp_deserializer__ptr(cached, + (const void **)&cached->instructions); + + /* return an extractor for the selected item */ + SVN_ERR(svn_fs_x__reps_get((svn_fs_x__rep_extractor_t **)out, + reps_baton->fs, reps, reps_baton->idx, pool)); + + return SVN_NO_ERROR; +} diff --git a/subversion/libsvn_fs_x/reps.h b/subversion/libsvn_fs_x/reps.h new file mode 100644 index 0000000..720bfbf --- /dev/null +++ b/subversion/libsvn_fs_x/reps.h @@ -0,0 +1,190 @@ +/* reps.h --- FSX representation container + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#ifndef SVN_LIBSVN_FS__REPS_H +#define SVN_LIBSVN_FS__REPS_H + +#include "svn_io.h" +#include "fs.h" + +/* This container type implements the start-delta (aka pick lists) data + * structure plus functions to create it and read data from it. The key + * point is to identify common sub-strings within a whole set of fulltexts + * instead of only two as in the classic txdelta code. + * + * Because it is relatively expensive to optimize the final in-memory + * layout, representation containers cannot be updated. A builder object + * will do most of the space saving when adding fulltexts but the final + * data will only be created immediately before serializing everything to + * disk. So, builders are write only and representation containers are + * read-only. + * + * Extracting data from a representation container is O(length) but it + * may require multiple iterations if base representations outside the + * container were used. Therefore, you will first create an extractor + * object (this may happen while holding a cache lock) and the you need + * to "drive" the extractor outside any cache context. + */ + +/* A write-only constructor object for representation containers. + */ +typedef struct svn_fs_x__reps_builder_t svn_fs_x__reps_builder_t; + +/* A read-only representation container - + * an opaque collection of fulltexts, i.e. byte strings. + */ +typedef struct svn_fs_x__reps_t svn_fs_x__reps_t; + +/* The fulltext extractor utility object. + */ +typedef struct svn_fs_x__rep_extractor_t svn_fs_x__rep_extractor_t; + +/* Baton type to be passed to svn_fs_x__reps_get_func. + */ +typedef struct svn_fs_x__reps_baton_t +{ + /* filesystem the resulting extractor shall operate on */ + svn_fs_t *fs; + + /* element index of the item to extract from the container */ + apr_size_t idx; +} svn_fs_x__reps_baton_t; + +/* Create and populate noderev containers. */ + +/* Create and return a new builder object, allocated in RESULT_POOL. + */ +svn_fs_x__reps_builder_t * +svn_fs_x__reps_builder_create(svn_fs_t *fs, + apr_pool_t *result_pool); + +/* To BUILDER, add reference to the fulltext currently stored in + * representation REP. Substrings matching with any of the base reps + * in BUILDER can be removed from the text base and be replaced by + * references to those base representations. + * + * The PRIORITY is a mere hint on which base representations should + * preferred in case we could re-use the same contents from multiple bases. + * Higher numerical value means higher priority / likelihood of being + * selected over others. + * + * Use SCRATCH_POOL for temporary allocations. + */ +svn_error_t * +svn_fs_x__reps_add_base(svn_fs_x__reps_builder_t *builder, + svn_fs_x__representation_t *rep, + int priority, + apr_pool_t *scratch_pool); + +/* Add the byte string CONTENTS to BUILDER. 
Return the item index under + * which the fulltext can be retrieved from the final container in *REP_IDX. + */ +svn_error_t * +svn_fs_x__reps_add(apr_size_t *rep_idx, + svn_fs_x__reps_builder_t *builder, + const svn_string_t *contents); + +/* Return a rough estimate in bytes for the serialized representation + * of BUILDER. + */ +apr_size_t +svn_fs_x__reps_estimate_size(const svn_fs_x__reps_builder_t *builder); + +/* Read from representation containers. */ + +/* For fulltext IDX in CONTAINER in filesystem FS, create an extract object + * allocated in POOL and return it in *EXTRACTOR. + */ +svn_error_t * +svn_fs_x__reps_get(svn_fs_x__rep_extractor_t **extractor, + svn_fs_t *fs, + const svn_fs_x__reps_t *container, + apr_size_t idx, + apr_pool_t *pool); + +/* Let the EXTRACTOR object fetch all parts of the desired fulltext and + * return the latter in *CONTENTS. If SIZE is not 0, return SIZE bytes + * starting at offset START_OFFSET of the full contents. If that range + * lies partly or completely outside the content, clip it accordingly. + * Allocate the result in RESULT_POOL and use SCRATCH_POOL for temporary + * allocations. + * + * Note, you may not run this inside a cache access function. + */ +svn_error_t * +svn_fs_x__extractor_drive(svn_stringbuf_t** contents, + svn_fs_x__rep_extractor_t* extractor, + apr_size_t start_offset, + apr_size_t size, + apr_pool_t* result_pool, + apr_pool_t* scratch_pool); + +/* I/O interface. */ + +/* Write a serialized representation of the final container described by + * BUILDER to STREAM. Use SCRATCH_POOL for temporary allocations. + */ +svn_error_t * +svn_fs_x__write_reps_container(svn_stream_t *stream, + const svn_fs_x__reps_builder_t *builder, + apr_pool_t *scratch_pool); + +/* Read a representations container from its serialized representation in + * STREAM. Allocate the result in RESULT_POOL and return it in *CONTAINER. + * Use SCRATCH_POOL for temporary allocations. + */ +svn_error_t * +svn_fs_x__read_reps_container(svn_fs_x__reps_t **container, + svn_stream_t *stream, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + +/* Implements #svn_cache__serialize_func_t for svn_fs_x__reps_t objects. + */ +svn_error_t * +svn_fs_x__serialize_reps_container(void **data, + apr_size_t *data_len, + void *in, + apr_pool_t *pool); + +/* Implements #svn_cache__deserialize_func_t for svn_fs_x__reps_t objects. + */ +svn_error_t * +svn_fs_x__deserialize_reps_container(void **out, + void *data, + apr_size_t data_len, + apr_pool_t *pool); + +/* Implements svn_cache__partial_getter_func_t for svn_fs_x__reps_t, + * setting *OUT to an svn_fs_x__rep_extractor_t object defined by the + * svn_fs_x__reps_baton_t passed in as *BATON. This function is similar + * to svn_fs_x__reps_get but operates on the cache serialized + * representation of the container. + */ +svn_error_t * +svn_fs_x__reps_get_func(void **out, + const void *data, + apr_size_t data_len, + void *baton, + apr_pool_t *pool); + +#endif diff --git a/subversion/libsvn_fs_x/rev_file.c b/subversion/libsvn_fs_x/rev_file.c new file mode 100644 index 0000000..445d45b --- /dev/null +++ b/subversion/libsvn_fs_x/rev_file.c @@ -0,0 +1,316 @@ +/* rev_file.c --- revision file and index access functions + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#include "rev_file.h" +#include "fs_x.h" +#include "index.h" +#include "low_level.h" +#include "util.h" + +#include "../libsvn_fs/fs-loader.h" + +#include "private/svn_io_private.h" +#include "svn_private_config.h" + +/* Return a new revision file instance, allocated in RESULT_POOL, for + * filesystem FS. Set its pool member to the provided RESULT_POOL. */ +static svn_fs_x__revision_file_t * +create_revision_file(svn_fs_t *fs, + apr_pool_t *result_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_fs_x__revision_file_t *file = apr_palloc(result_pool, sizeof(*file)); + + file->is_packed = FALSE; + file->start_revision = SVN_INVALID_REVNUM; + + file->file = NULL; + file->stream = NULL; + file->p2l_stream = NULL; + file->l2p_stream = NULL; + file->block_size = ffd->block_size; + file->l2p_offset = -1; + file->l2p_checksum = NULL; + file->p2l_offset = -1; + file->p2l_checksum = NULL; + file->footer_offset = -1; + file->pool = result_pool; + + return file; +} + +/* Return a new revision file instance, allocated in RESULT_POOL, for + * REVISION in filesystem FS. Set its pool member to the provided + * RESULT_POOL. */ +static svn_fs_x__revision_file_t * +init_revision_file(svn_fs_t *fs, + svn_revnum_t revision, + apr_pool_t *result_pool) +{ + svn_fs_x__revision_file_t *file = create_revision_file(fs, result_pool); + + file->is_packed = svn_fs_x__is_packed_rev(fs, revision); + file->start_revision = svn_fs_x__packed_base_rev(fs, revision); + + return file; +} + +/* Baton type for set_read_only() */ +typedef struct set_read_only_baton_t +{ + /* File to set to read-only. */ + const char *file_path; + + /* Scratch pool sufficient life time. + * Ideally the pool that we registered the cleanup on. */ + apr_pool_t *pool; +} set_read_only_baton_t; + +/* APR pool cleanup callback taking a set_read_only_baton_t baton and then + * (trying to) set the specified file to r/o mode. */ +static apr_status_t +set_read_only(void *baton) +{ + set_read_only_baton_t *ro_baton = baton; + apr_status_t status = APR_SUCCESS; + svn_error_t *err; + + err = svn_io_set_file_read_only(ro_baton->file_path, TRUE, ro_baton->pool); + if (err) + { + status = err->apr_err; + svn_error_clear(err); + } + + return status; +} + +/* If the file at PATH is read-only, attempt to make it writable. The + * original state will be restored with RESULT_POOL gets cleaned up. + * SCRATCH_POOL is for temporary allocations. 
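+ * (A note on the mechanism, for orientation: the r/o flag is restored by a pool cleanup handler, see set_read_only(); if the file no longer exists at cleanup time, the attempt fails harmlessly.)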
*/ +static svn_error_t * +auto_make_writable(const char *path, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_boolean_t is_read_only; + apr_finfo_t finfo; + + SVN_ERR(svn_io_stat(&finfo, path, SVN__APR_FINFO_READONLY, scratch_pool)); + SVN_ERR(svn_io__is_finfo_read_only(&is_read_only, &finfo, scratch_pool)); + + if (is_read_only) + { + /* Tell the pool to restore the r/o state upon cleanup + (assuming the file will still exist, failing silently otherwise). */ + set_read_only_baton_t *baton = apr_pcalloc(result_pool, + sizeof(*baton)); + baton->pool = result_pool; + baton->file_path = apr_pstrdup(result_pool, path); + apr_pool_cleanup_register(result_pool, baton, + set_read_only, apr_pool_cleanup_null); + + /* Finally, allow write access (undoing it has already been scheduled + and is idempotent). */ + SVN_ERR(svn_io_set_file_read_write(path, FALSE, scratch_pool)); + } + + return SVN_NO_ERROR; +} + +/* Core implementation of svn_fs_fs__open_pack_or_rev_file working on an + * existing, initialized FILE structure. If WRITABLE is TRUE, give write + * access to the file - temporarily resetting the r/o state if necessary. + */ +static svn_error_t * +open_pack_or_rev_file(svn_fs_x__revision_file_t *file, + svn_fs_t *fs, + svn_revnum_t rev, + svn_boolean_t writable, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_error_t *err; + svn_boolean_t retry = FALSE; + + do + { + const char *path = svn_fs_x__path_rev_absolute(fs, rev, scratch_pool); + apr_file_t *apr_file; + apr_int32_t flags = writable + ? APR_READ | APR_WRITE | APR_BUFFERED + : APR_READ | APR_BUFFERED; + + /* We may have to *temporarily* enable write access. */ + err = writable ? auto_make_writable(path, result_pool, scratch_pool) + : SVN_NO_ERROR; + + /* open the revision file in buffered r/o or r/w mode */ + if (!err) + err = svn_io_file_open(&apr_file, path, flags, APR_OS_DEFAULT, + result_pool); + + if (!err) + { + file->file = apr_file; + file->stream = svn_stream_from_aprfile2(apr_file, TRUE, + result_pool); + + return SVN_NO_ERROR; + } + + if (err && APR_STATUS_IS_ENOENT(err->apr_err)) + { + /* Could not open the file. This may happen if the + * file once existed but got packed later. */ + svn_error_clear(err); + + /* if that was our 2nd attempt, leave it at that. */ + if (retry) + return svn_error_createf(SVN_ERR_FS_NO_SUCH_REVISION, NULL, + _("No such revision %ld"), rev); + + /* We failed for the first time. Refresh cache & retry. 
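+ * The revision may have been packed concurrently, so refresh the min-unpacked-rev info and recompute the pack start revision before the second attempt.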
*/ + SVN_ERR(svn_fs_x__update_min_unpacked_rev(fs, scratch_pool)); + file->start_revision = svn_fs_x__packed_base_rev(fs, rev); + + retry = TRUE; + } + else + { + retry = FALSE; + } + } + while (retry); + + return svn_error_trace(err); +} + +svn_error_t * +svn_fs_x__open_pack_or_rev_file(svn_fs_x__revision_file_t **file, + svn_fs_t *fs, + svn_revnum_t rev, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + *file = init_revision_file(fs, rev, result_pool); + return svn_error_trace(open_pack_or_rev_file(*file, fs, rev, FALSE, + result_pool, scratch_pool)); +} + +svn_error_t * +svn_fs_x__open_pack_or_rev_file_writable(svn_fs_x__revision_file_t** file, + svn_fs_t* fs, + svn_revnum_t rev, + apr_pool_t* result_pool, + apr_pool_t *scratch_pool) +{ + *file = init_revision_file(fs, rev, result_pool); + return svn_error_trace(open_pack_or_rev_file(*file, fs, rev, TRUE, + result_pool, scratch_pool)); +} + +svn_error_t * +svn_fs_x__auto_read_footer(svn_fs_x__revision_file_t *file) +{ + if (file->l2p_offset == -1) + { + apr_off_t filesize = 0; + unsigned char footer_length; + svn_stringbuf_t *footer; + + /* Determine file size. */ + SVN_ERR(svn_io_file_seek(file->file, APR_END, &filesize, file->pool)); + + /* Read last byte (containing the length of the footer). */ + SVN_ERR(svn_io_file_aligned_seek(file->file, file->block_size, NULL, + filesize - 1, file->pool)); + SVN_ERR(svn_io_file_read_full2(file->file, &footer_length, + sizeof(footer_length), NULL, NULL, + file->pool)); + + /* Read footer. */ + footer = svn_stringbuf_create_ensure(footer_length, file->pool); + SVN_ERR(svn_io_file_aligned_seek(file->file, file->block_size, NULL, + filesize - 1 - footer_length, + file->pool)); + SVN_ERR(svn_io_file_read_full2(file->file, footer->data, footer_length, + &footer->len, NULL, file->pool)); + footer->data[footer->len] = '\0'; + + /* Extract index locations. 
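+ * The footer is parsed into the L2P / P2L index offsets and their checksums via svn_fs_x__parse_footer().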
*/ + SVN_ERR(svn_fs_x__parse_footer(&file->l2p_offset, &file->l2p_checksum, + &file->p2l_offset, &file->p2l_checksum, + footer, file->start_revision, + file->pool)); + file->footer_offset = filesize - footer_length - 1; + } + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__open_proto_rev_file(svn_fs_x__revision_file_t **file, + svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_pool_t* result_pool, + apr_pool_t *scratch_pool) +{ + apr_file_t *apr_file; + SVN_ERR(svn_io_file_open(&apr_file, + svn_fs_x__path_txn_proto_rev(fs, txn_id, + scratch_pool), + APR_READ | APR_BUFFERED, APR_OS_DEFAULT, + result_pool)); + + return svn_error_trace(svn_fs_x__wrap_temp_rev_file(file, fs, apr_file, + result_pool)); +} + +svn_error_t * +svn_fs_x__wrap_temp_rev_file(svn_fs_x__revision_file_t **file, + svn_fs_t *fs, + apr_file_t *temp_file, + apr_pool_t *result_pool) +{ + *file = create_revision_file(fs, result_pool); + (*file)->file = temp_file; + (*file)->stream = svn_stream_from_aprfile2(temp_file, TRUE, result_pool); + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__close_revision_file(svn_fs_x__revision_file_t *file) +{ + if (file->stream) + SVN_ERR(svn_stream_close(file->stream)); + if (file->file) + SVN_ERR(svn_io_file_close(file->file, file->pool)); + + file->file = NULL; + file->stream = NULL; + file->l2p_stream = NULL; + file->p2l_stream = NULL; + + return SVN_NO_ERROR; +} diff --git a/subversion/libsvn_fs_x/rev_file.h b/subversion/libsvn_fs_x/rev_file.h new file mode 100644 index 0000000..b96d035 --- /dev/null +++ b/subversion/libsvn_fs_x/rev_file.h @@ -0,0 +1,154 @@ +/* rev_file.h --- revision file and index access data structure + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#ifndef SVN_LIBSVN_FS_X__REV_FILE_H +#define SVN_LIBSVN_FS_X__REV_FILE_H + +#include "svn_fs.h" +#include "id.h" + +/* In format 7, index files must be read in sync with the respective + * revision / pack file. I.e. we must use packed index files for packed + * rev files and unpacked ones for non-packed rev files. So, the whole + * point is to open them with matching "is packed" setting in case some + * background pack process was run. + */ + +/* Opaque index stream type. + */ +typedef struct svn_fs_x__packed_number_stream_t + svn_fs_x__packed_number_stream_t; + +/* Data file, including indexes data, and associated properties for + * START_REVISION. As the FILE is kept open, background pack operations + * will not cause access to this file to fail. + */ +typedef struct svn_fs_x__revision_file_t +{ + /* first (potentially only) revision in the rev / pack file. + * SVN_INVALID_REVNUM for txn proto-rev files. 
*/ + svn_revnum_t start_revision; + + /* the revision was packed when the first file / stream got opened */ + svn_boolean_t is_packed; + + /* rev / pack file */ + apr_file_t *file; + + /* stream based on FILE and not NULL exactly when FILE is not NULL */ + svn_stream_t *stream; + + /* the opened P2L index stream or NULL. Always NULL for txns. */ + svn_fs_x__packed_number_stream_t *p2l_stream; + + /* the opened L2P index stream or NULL. Always NULL for txns. */ + svn_fs_x__packed_number_stream_t *l2p_stream; + + /* Copied from FS->FFD->BLOCK_SIZE upon creation. It allows us to + * use aligned seek() without having the FS handy. */ + apr_off_t block_size; + + /* Offset within FILE at which the rev data ends and the L2P index + * data starts. Less than P2L_OFFSET. -1 if svn_fs_fs__auto_read_footer + * has not been called, yet. */ + apr_off_t l2p_offset; + + /* MD5 checksum on the whole on-disk representation of the L2P index. + * NULL if svn_fs_fs__auto_read_footer has not been called, yet. */ + svn_checksum_t *l2p_checksum; + + /* Offset within FILE at which the L2P index ends and the P2L index + * data starts. Greater than L2P_OFFSET. -1 if svn_fs_fs__auto_read_footer + * has not been called, yet. */ + apr_off_t p2l_offset; + + /* MD5 checksum on the whole on-disk representation of the P2L index. + * NULL if svn_fs_fs__auto_read_footer has not been called, yet. */ + svn_checksum_t *p2l_checksum; + + /* Offset within FILE at which the P2L index ends and the footer starts. + * Greater than P2L_OFFSET. -1 if svn_fs_fs__auto_read_footer has not + * been called, yet. */ + apr_off_t footer_offset; + + /* pool containing this object */ + apr_pool_t *pool; +} svn_fs_x__revision_file_t; + +/* Open the correct revision file for REV. If the filesystem FS has + * been packed, *FILE will be set to the packed file; otherwise, set *FILE + * to the revision file for REV. Return SVN_ERR_FS_NO_SUCH_REVISION if the + * file doesn't exist. Allocate *FILE in RESULT_POOL and use SCRATCH_POOL + * for temporaries. */ +svn_error_t * +svn_fs_x__open_pack_or_rev_file(svn_fs_x__revision_file_t **file, + svn_fs_t *fs, + svn_revnum_t rev, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + +/* Open the correct revision file for REV with read and write access. + * If necessary, temporarily reset the file's read-only state. If the + * filesystem FS has been packed, *FILE will be set to the packed file; + * otherwise, set *FILE to the revision file for REV. + * + * Return SVN_ERR_FS_NO_SUCH_REVISION if the file doesn't exist. + * Allocate *FILE in RESULT_POOL and use SCRATCH_POOLfor temporaries. */ +svn_error_t * +svn_fs_x__open_pack_or_rev_file_writable(svn_fs_x__revision_file_t **file, + svn_fs_t *fs, + svn_revnum_t rev, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + +/* If the footer data in FILE has not been read, yet, do so now. + * Index locations will only be read upon request as we assume they get + * cached and the FILE is usually used for REP data access only. + * Hence, the separate step. + */ +svn_error_t * +svn_fs_x__auto_read_footer(svn_fs_x__revision_file_t *file); + +/* Open the proto-rev file of transaction TXN_ID in FS and return it in *FILE. + * Allocate *FILE in RESULT_POOL use and SCRATCH_POOL for temporaries.. 
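+ * Proto-rev files carry no footer or index data, so START_REVISION remains SVN_INVALID_REVNUM and the index streams stay NULL.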
*/ +svn_error_t * +svn_fs_x__open_proto_rev_file(svn_fs_x__revision_file_t **file, + svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_pool_t* result_pool, + apr_pool_t *scratch_pool); + +/* Wrap the TEMP_FILE, used in the context of FS, into a revision file + * struct, allocated in RESULT_POOL, and return it in *FILE. + */ +svn_error_t * +svn_fs_x__wrap_temp_rev_file(svn_fs_x__revision_file_t **file, + svn_fs_t *fs, + apr_file_t *temp_file, + apr_pool_t *result_pool); + +/* Close all files and streams in FILE. + */ +svn_error_t * +svn_fs_x__close_revision_file(svn_fs_x__revision_file_t *file); + +#endif diff --git a/subversion/libsvn_fs_x/revprops.c b/subversion/libsvn_fs_x/revprops.c new file mode 100644 index 0000000..5bc62cc --- /dev/null +++ b/subversion/libsvn_fs_x/revprops.c @@ -0,0 +1,1948 @@ +/* revprops.c --- everything needed to handle revprops in FSX + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#include <assert.h> +#include <apr_md5.h> + +#include "svn_pools.h" +#include "svn_hash.h" +#include "svn_dirent_uri.h" + +#include "fs_x.h" +#include "revprops.h" +#include "util.h" +#include "transaction.h" + +#include "private/svn_subr_private.h" +#include "private/svn_string_private.h" +#include "../libsvn_fs/fs-loader.h" + +#include "svn_private_config.h" + +/* Give writing processes 10 seconds to replace an existing revprop + file with a new one. After that time, we assume that the writing + process got aborted and that we have re-read revprops. */ +#define REVPROP_CHANGE_TIMEOUT (10 * 1000000) + +/* In case of an inconsistent read, close the generation file, yield, + re-open and re-read. This is the number of times we try this before + giving up. */ +#define GENERATION_READ_RETRY_COUNT 100 + +/* Maximum size of the generation number file contents (including NUL). */ +#define CHECKSUMMED_NUMBER_BUFFER_LEN \ + (SVN_INT64_BUFFER_SIZE + 3 + APR_MD5_DIGESTSIZE * 2) + + +svn_error_t * +svn_fs_x__upgrade_pack_revprops(svn_fs_t *fs, + svn_fs_upgrade_notify_t notify_func, + void *notify_baton, + svn_cancel_func_t cancel_func, + void *cancel_baton, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + const char *revprops_shard_path; + const char *revprops_pack_file_dir; + apr_int64_t shard; + apr_int64_t first_unpacked_shard + = ffd->min_unpacked_rev / ffd->max_files_per_dir; + + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + const char *revsprops_dir = svn_dirent_join(fs->path, PATH_REVPROPS_DIR, + scratch_pool); + int compression_level = ffd->compress_packed_revprops + ? 
SVN_DELTA_COMPRESSION_LEVEL_DEFAULT + : SVN_DELTA_COMPRESSION_LEVEL_NONE; + + /* first, pack all revprops shards to match the packed revision shards */ + for (shard = 0; shard < first_unpacked_shard; ++shard) + { + svn_pool_clear(iterpool); + + revprops_pack_file_dir = svn_dirent_join(revsprops_dir, + apr_psprintf(iterpool, + "%" APR_INT64_T_FMT PATH_EXT_PACKED_SHARD, + shard), + iterpool); + revprops_shard_path = svn_dirent_join(revsprops_dir, + apr_psprintf(iterpool, "%" APR_INT64_T_FMT, shard), + iterpool); + + SVN_ERR(svn_fs_x__pack_revprops_shard(revprops_pack_file_dir, + revprops_shard_path, + shard, ffd->max_files_per_dir, + (int)(0.9 * ffd->revprop_pack_size), + compression_level, + cancel_func, cancel_baton, iterpool)); + if (notify_func) + SVN_ERR(notify_func(notify_baton, shard, + svn_fs_upgrade_pack_revprops, iterpool)); + } + + svn_pool_destroy(iterpool); + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__upgrade_cleanup_pack_revprops(svn_fs_t *fs, + svn_fs_upgrade_notify_t notify_func, + void *notify_baton, + svn_cancel_func_t cancel_func, + void *cancel_baton, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + const char *revprops_shard_path; + apr_int64_t shard; + apr_int64_t first_unpacked_shard + = ffd->min_unpacked_rev / ffd->max_files_per_dir; + + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + const char *revsprops_dir = svn_dirent_join(fs->path, PATH_REVPROPS_DIR, + scratch_pool); + + /* delete the non-packed revprops shards afterwards */ + for (shard = 0; shard < first_unpacked_shard; ++shard) + { + svn_pool_clear(iterpool); + + revprops_shard_path = svn_dirent_join(revsprops_dir, + apr_psprintf(iterpool, "%" APR_INT64_T_FMT, shard), + iterpool); + SVN_ERR(svn_fs_x__delete_revprops_shard(revprops_shard_path, + shard, ffd->max_files_per_dir, + cancel_func, cancel_baton, + iterpool)); + if (notify_func) + SVN_ERR(notify_func(notify_baton, shard, + svn_fs_upgrade_cleanup_revprops, iterpool)); + } + + svn_pool_destroy(iterpool); + + return SVN_NO_ERROR; +} + +/* Revprop caching management. + * + * Mechanism: + * ---------- + * + * Revprop caching needs to be activated and will be deactivated for the + * respective FS instance if the necessary infrastructure could not be + * initialized. As long as no revprops are being read or changed, revprop + * caching imposes no overhead. + * + * When activated, we cache revprops using (revision, generation) pairs + * as keys with the generation being incremented upon every revprop change. + * Since the cache is process-local, the generation needs to be tracked + * for at least as long as the process lives but may be reset afterwards. + * + * We track the revprop generation in a persistent, unbuffered file that + * we may keep open for the lifetime of the svn_fs_t. It is the OS' + * responsibility to provide us with the latest contents upon read. To + * detect incomplete updates due to non-atomic reads, we put a MD5 checksum + * next to the actual generation number and verify that it matches. + * + * Since we cannot guarantee that the OS will provide us with up-to-date + * data buffers for open files, we re-open and re-read the file before + * modifying it. This will prevent lost updates. + * + * A race condition exists between switching to the modified revprop data + * and bumping the generation number. In particular, the process may crash + * just after switching to the new revprop data and before bumping the + * generation. 
To be able to detect this scenario, we bump the generation + * twice per revprop change: once immediately before (creating an odd number) + * and once after the atomic switch (even generation). + * + * A writer holding the write lock can immediately assume a crashed writer + * in case of an odd generation or they would not have been able to acquire + * the lock. A reader detecting an odd generation will use that number and + * be forced to re-read any revprop data - usually getting the new revprops + * already. If the generation file modification timestamp is too old, the + * reader will assume a crashed writer, acquire the write lock and bump + * the generation if it is still odd. So, for about REVPROP_CHANGE_TIMEOUT + * after the crash, reader caches may be stale. + */ + +/* If the revprop generation file in FS is open, close it. This is a no-op + * if the file is not open. + */ +static svn_error_t * +close_revprop_generation_file(svn_fs_t *fs, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + if (ffd->revprop_generation_file) + { + SVN_ERR(svn_io_file_close(ffd->revprop_generation_file, scratch_pool)); + ffd->revprop_generation_file = NULL; + } + + return SVN_NO_ERROR; +} + +/* Make sure the revprop_generation member in FS is set. If READ_ONLY is + * set, open the file w/o write permission if the file is not open yet. + * The file is kept open if it has sufficient rights (or more) but will be + * closed and re-opened if it provided insufficient access rights. + * + * Call only for repos that support revprop caching. + */ +static svn_error_t * +open_revprop_generation_file(svn_fs_t *fs, + svn_boolean_t read_only, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + apr_int32_t flags = read_only ? APR_READ : (APR_READ | APR_WRITE); + + /* Close the current file handle if it has insufficient rights. */ + if ( ffd->revprop_generation_file + && (apr_file_flags_get(ffd->revprop_generation_file) & flags) != flags) + SVN_ERR(close_revprop_generation_file(fs, scratch_pool)); + + /* If not open already, open with sufficient rights. */ + if (ffd->revprop_generation_file == NULL) + { + const char *path = svn_fs_x__path_revprop_generation(fs, scratch_pool); + SVN_ERR(svn_io_file_open(&ffd->revprop_generation_file, path, + flags, APR_OS_DEFAULT, fs->pool)); + } + + return SVN_NO_ERROR; +} + +/* Return the textual representation of NUMBER and its checksum in *BUFFER. + */ +static svn_error_t * +checkedsummed_number(svn_stringbuf_t **buffer, + apr_int64_t number, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_checksum_t *checksum; + const char *digest; + + char str[SVN_INT64_BUFFER_SIZE]; + apr_size_t len = svn__i64toa(str, number); + str[len] = 0; + + SVN_ERR(svn_checksum(&checksum, svn_checksum_md5, str, len, scratch_pool)); + digest = svn_checksum_to_cstring_display(checksum, scratch_pool); + + *buffer = svn_stringbuf_createf(result_pool, "%s %s\n", digest, str); + + return SVN_NO_ERROR; +} + +/* Extract the generation number from the text BUFFER of LEN bytes and + * verify it against the checksum in the same BUFFER. If they match, return + * the generation in *NUMBER. Otherwise, return an error. + * BUFFER does not need to be NUL-terminated. + */ +static svn_error_t * +verify_extract_number(apr_int64_t *number, + const char *buffer, + apr_size_t len, + apr_pool_t *scratch_pool) +{ + const char *digest_end = strchr(buffer, ' '); + + /* Does the buffer even contain checksum _and_ number? 
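+ * As written by checkedsummed_number(), the buffer should read "<md5 hex digest> <decimal generation>".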
*/ + if (digest_end != NULL) + { + svn_checksum_t *expected; + svn_checksum_t *actual; + + SVN_ERR(svn_checksum_parse_hex(&expected, svn_checksum_md5, buffer, + scratch_pool)); + SVN_ERR(svn_checksum(&actual, svn_checksum_md5, digest_end + 1, + (buffer + len) - (digest_end + 1), scratch_pool)); + + if (svn_checksum_match(expected, actual)) + return svn_error_trace(svn_cstring_atoi64(number, digest_end + 1)); + } + + /* Incomplete buffer or not a match. */ + return svn_error_create(SVN_ERR_FS_INVALID_GENERATION, NULL, + _("Invalid generation number data.")); +} + +/* Read revprop generation as stored on disk for repository FS. The result is + * returned in *CURRENT. Call only for repos that support revprop caching. + */ +static svn_error_t * +read_revprop_generation_file(apr_int64_t *current, + svn_fs_t *fs, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + char buf[CHECKSUMMED_NUMBER_BUFFER_LEN]; + apr_size_t len; + apr_off_t offset = 0; + int i; + svn_error_t *err = SVN_NO_ERROR; + + /* Retry in case of incomplete file buffer updates. */ + for (i = 0; i < GENERATION_READ_RETRY_COUNT; ++i) + { + svn_error_clear(err); + svn_pool_clear(iterpool); + + /* If we can't even access the data, things are very wrong. + * Don't retry in that case. + */ + SVN_ERR(open_revprop_generation_file(fs, TRUE, iterpool)); + SVN_ERR(svn_io_file_seek(ffd->revprop_generation_file, APR_SET, &offset, + iterpool)); + + len = sizeof(buf); + SVN_ERR(svn_io_read_length_line(ffd->revprop_generation_file, buf, &len, + iterpool)); + + /* Some data has been read. It will most likely be complete and + * consistent. Extract and verify anyway. */ + err = verify_extract_number(current, buf, len, iterpool); + if (!err) + break; + + /* Got unlucky and data was invalid. Retry. */ + SVN_ERR(close_revprop_generation_file(fs, iterpool)); + +#if APR_HAS_THREADS + apr_thread_yield(); +#else + apr_sleep(0); +#endif + } + + svn_pool_destroy(iterpool); + + /* If we had to give up, propagate the error. */ + return svn_error_trace(err); +} + +/* Write the CURRENT revprop generation to disk for repository FS. + * Call only for repos that support revprop caching. + */ +static svn_error_t * +write_revprop_generation_file(svn_fs_t *fs, + apr_int64_t current, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_stringbuf_t *buffer; + apr_off_t offset = 0; + + SVN_ERR(checkedsummed_number(&buffer, current, scratch_pool, scratch_pool)); + + SVN_ERR(open_revprop_generation_file(fs, FALSE, scratch_pool)); + SVN_ERR(svn_io_file_seek(ffd->revprop_generation_file, APR_SET, &offset, + scratch_pool)); + SVN_ERR(svn_io_file_write_full(ffd->revprop_generation_file, buffer->data, + buffer->len, NULL, scratch_pool)); + SVN_ERR(svn_io_file_flush_to_disk(ffd->revprop_generation_file, + scratch_pool)); + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__reset_revprop_generation_file(svn_fs_t *fs, + apr_pool_t *scratch_pool) +{ + const char *path = svn_fs_x__path_revprop_generation(fs, scratch_pool); + svn_stringbuf_t *buffer; + + /* Unconditionally close the revprop generation file. + * Don't care about FS formats. This ensures consistent internal state. */ + SVN_ERR(close_revprop_generation_file(fs, scratch_pool)); + + /* Unconditionally remove any old revprop generation file. + * Don't care about FS formats. This ensures consistent on-disk state + * for old format repositories. 
*/ + SVN_ERR(svn_io_remove_file2(path, TRUE, scratch_pool)); + + /* Write the initial revprop generation file contents, if supported by + * the current format. This ensures consistent on-disk state for new + * format repositories. */ + SVN_ERR(checkedsummed_number(&buffer, 0, scratch_pool, scratch_pool)); + SVN_ERR(svn_io_write_atomic(path, buffer->data, buffer->len, NULL, + scratch_pool)); + + /* ffd->revprop_generation_file will be re-opened on demand. */ + + return SVN_NO_ERROR; +} + +/* Create an error object with the given MESSAGE and pass it to the + WARNING member of FS. Clears UNDERLYING_ERR. */ +static void +log_revprop_cache_init_warning(svn_fs_t *fs, + svn_error_t *underlying_err, + const char *message, + apr_pool_t *scratch_pool) +{ + svn_error_t *err = svn_error_createf( + SVN_ERR_FS_REVPROP_CACHE_INIT_FAILURE, + underlying_err, message, + svn_dirent_local_style(fs->path, scratch_pool)); + + if (fs->warning) + (fs->warning)(fs->warning_baton, err); + + svn_error_clear(err); +} + +/* Test whether revprop cache and necessary infrastructure are + available in FS. */ +static svn_boolean_t +has_revprop_cache(svn_fs_t *fs, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_error_t *error; + + /* is the cache (still) enabled? */ + if (ffd->revprop_cache == NULL) + return FALSE; + + /* try initialize our file-backed infrastructure */ + error = open_revprop_generation_file(fs, TRUE, scratch_pool); + if (error) + { + /* failure -> disable revprop cache for good */ + + ffd->revprop_cache = NULL; + log_revprop_cache_init_warning(fs, error, + "Revprop caching for '%s' disabled " + "because infrastructure for revprop " + "caching failed to initialize.", + scratch_pool); + + return FALSE; + } + + return TRUE; +} + +/* Baton structure for revprop_generation_fixup. */ +typedef struct revprop_generation_fixup_t +{ + /* revprop generation to read */ + apr_int64_t *generation; + + /* file system context */ + svn_fs_t *fs; +} revprop_generation_upgrade_t; + +/* If the revprop generation has an odd value, it means the original writer + of the revprop got killed. We don't know whether that process as able + to change the revprop data but we assume that it was. Therefore, we + increase the generation in that case to basically invalidate everyone's + cache content. + Execute this only while holding the write lock to the repo in baton->FFD. + */ +static svn_error_t * +revprop_generation_fixup(void *void_baton, + apr_pool_t *scratch_pool) +{ + revprop_generation_upgrade_t *baton = void_baton; + svn_fs_x__data_t *ffd = baton->fs->fsap_data; + assert(ffd->has_write_lock); + + /* Make sure we don't operate on stale OS buffers. */ + SVN_ERR(close_revprop_generation_file(baton->fs, scratch_pool)); + + /* Maybe, either the original revprop writer or some other reader has + already corrected / bumped the revprop generation. Thus, we need + to read it again. However, we will now be the only ones changing + the file contents due to us holding the write lock. */ + SVN_ERR(read_revprop_generation_file(baton->generation, baton->fs, + scratch_pool)); + + /* Cause everyone to re-read revprops upon their next access, if the + last revprop write did not complete properly. */ + if (*baton->generation % 2) + { + ++*baton->generation; + SVN_ERR(write_revprop_generation_file(baton->fs, + *baton->generation, + scratch_pool)); + } + + return SVN_NO_ERROR; +} + +/* Read the current revprop generation and return it in *GENERATION. + Also, detect aborted / crashed writers and recover from that. 
+ Use the access object in FS to set the shared mem values. */ +static svn_error_t * +read_revprop_generation(apr_int64_t *generation, + svn_fs_t *fs, + apr_pool_t *scratch_pool) +{ + apr_int64_t current = 0; + svn_fs_x__data_t *ffd = fs->fsap_data; + + /* read the current revprop generation number */ + SVN_ERR(read_revprop_generation_file(&current, fs, scratch_pool)); + + /* is an unfinished revprop write under way? */ + if (current % 2) + { + svn_boolean_t timeout = FALSE; + + /* Has the writer process been aborted? + * Either by timeout or by us being the writer now. + */ + if (!ffd->has_write_lock) + { + apr_time_t mtime; + SVN_ERR(svn_io_file_affected_time(&mtime, + svn_fs_x__path_revprop_generation(fs, scratch_pool), + scratch_pool)); + timeout = apr_time_now() > mtime + REVPROP_CHANGE_TIMEOUT; + } + + if (ffd->has_write_lock || timeout) + { + revprop_generation_upgrade_t baton; + baton.generation = &current; + baton.fs = fs; + + /* Ensure that the original writer process no longer exists by + * acquiring the write lock to this repository. Then, fix up + * the revprop generation. + */ + if (ffd->has_write_lock) + SVN_ERR(revprop_generation_fixup(&baton, scratch_pool)); + else + SVN_ERR(svn_fs_x__with_write_lock(fs, revprop_generation_fixup, + &baton, scratch_pool)); + } + } + + /* return the value we just got */ + *generation = current; + return SVN_NO_ERROR; +} + +/* Set the revprop generation in FS to the next odd number to indicate + that there is a revprop write process under way. Return that value + in *GENERATION. If the change times out, readers shall recover from + that state & re-read revprops. + This is a no-op for repo formats that don't support revprop caching. */ +static svn_error_t * +begin_revprop_change(apr_int64_t *generation, + svn_fs_t *fs, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + SVN_ERR_ASSERT(ffd->has_write_lock); + + /* Close and re-open to make sure we read the latest data. */ + SVN_ERR(close_revprop_generation_file(fs, scratch_pool)); + SVN_ERR(open_revprop_generation_file(fs, FALSE, scratch_pool)); + + /* Set the revprop generation to an odd value to indicate + * that a write is in progress. + */ + SVN_ERR(read_revprop_generation(generation, fs, scratch_pool)); + ++*generation; + SVN_ERR(write_revprop_generation_file(fs, *generation, scratch_pool)); + + return SVN_NO_ERROR; +} + +/* Set the revprop generation in FS to the next even generation after + the odd value in GENERATION to indicate that + a) readers shall re-read revprops, and + b) the write process has been completed (no recovery required). + This is a no-op for repo formats that don't support revprop caching. */ +static svn_error_t * +end_revprop_change(svn_fs_t *fs, + apr_int64_t generation, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + SVN_ERR_ASSERT(ffd->has_write_lock); + SVN_ERR_ASSERT(generation % 2); + + /* Set the revprop generation to an even value to indicate + * that a write has been completed. Since we held the write + * lock, nobody else could have updated the file contents. + */ + SVN_ERR(write_revprop_generation_file(fs, generation + 1, scratch_pool)); + + return SVN_NO_ERROR; +} + +/* Container for all data required to access the packed revprop file + * for a given REVISION. This structure will be filled incrementally + * by read_pack_revprops() and its sub-routines. 
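+ * Note that SIZES and OFFSETS are only populated when the pack is read with READ_ALL set; a plain single-revision lookup leaves them NULL.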
+ */ +typedef struct packed_revprops_t +{ + /* revision number to read (not necessarily the first in the pack) */ + svn_revnum_t revision; + + /* current revprop generation. Used when populating the revprop cache */ + apr_int64_t generation; + + /* the actual revision properties */ + apr_hash_t *properties; + + /* their size when serialized to a single string + * (as found in PACKED_REVPROPS) */ + apr_size_t serialized_size; + + + /* name of the pack file (without folder path) */ + const char *filename; + + /* packed shard folder path */ + const char *folder; + + /* sum of values in SIZES */ + apr_size_t total_size; + + /* first revision in the pack (>= MANIFEST_START) */ + svn_revnum_t start_revision; + + /* size of the revprops in PACKED_REVPROPS */ + apr_array_header_t *sizes; + + /* offset of the revprops in PACKED_REVPROPS */ + apr_array_header_t *offsets; + + + /* concatenation of the serialized representation of all revprops + * in the pack, i.e. the pack content without header and compression */ + svn_stringbuf_t *packed_revprops; + + /* First revision covered by MANIFEST. + * Will equal the shard start revision or 1, for the 1st shard. */ + svn_revnum_t manifest_start; + + /* content of the manifest. + * Maps long(rev - MANIFEST_START) to const char* pack file name */ + apr_array_header_t *manifest; +} packed_revprops_t; + +/* Parse the serialized revprops in CONTENT and return them in *PROPERTIES. + * Also, put them into the revprop cache, if activated, for future use. + * Three more parameters are being used to update the revprop cache: FS is + * our file system, the revprops belong to REVISION and the global revprop + * GENERATION is used as well. + * + * The returned hash will be allocated in RESULT_POOL, SCRATCH_POOL is + * being used for temporary allocations. + */ +static svn_error_t * +parse_revprop(apr_hash_t **properties, + svn_fs_t *fs, + svn_revnum_t revision, + apr_int64_t generation, + svn_string_t *content, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_stream_t *stream = svn_stream_from_string(content, scratch_pool); + *properties = apr_hash_make(result_pool); + + SVN_ERR(svn_hash_read2(*properties, stream, SVN_HASH_TERMINATOR, + result_pool)); + if (has_revprop_cache(fs, scratch_pool)) + { + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_fs_x__pair_cache_key_t key = { 0 }; + + key.revision = revision; + key.second = generation; + SVN_ERR(svn_cache__set(ffd->revprop_cache, &key, *properties, + scratch_pool)); + } + + return SVN_NO_ERROR; +} + +/* Read the non-packed revprops for revision REV in FS, put them into the + * revprop cache if activated and return them in *PROPERTIES. GENERATION + * is the current revprop generation. + * + * If the data could not be read due to an otherwise recoverable error, + * leave *PROPERTIES unchanged. No error will be returned in that case. + * + * Allocate *PROPERTIES in RESULT_POOL and temporaries in SCRATCH_POOL. 
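+ * The read is retried up to SVN_FS_X__RECOVERABLE_RETRY_COUNT times because a concurrent writer may be in the middle of replacing the file.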
+ */ +static svn_error_t * +read_non_packed_revprop(apr_hash_t **properties, + svn_fs_t *fs, + svn_revnum_t rev, + apr_int64_t generation, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_stringbuf_t *content = NULL; + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + svn_boolean_t missing = FALSE; + int i; + + for (i = 0; + i < SVN_FS_X__RECOVERABLE_RETRY_COUNT && !missing && !content; + ++i) + { + svn_pool_clear(iterpool); + SVN_ERR(svn_fs_x__try_stringbuf_from_file(&content, + &missing, + svn_fs_x__path_revprops(fs, rev, iterpool), + i + 1 < SVN_FS_X__RECOVERABLE_RETRY_COUNT, + iterpool)); + } + + if (content) + SVN_ERR(parse_revprop(properties, fs, rev, generation, + svn_stringbuf__morph_into_string(content), + result_pool, iterpool)); + + svn_pool_clear(iterpool); + + return SVN_NO_ERROR; +} + +/* Return the minimum length of any packed revprop file name in REVPROPS. */ +static apr_size_t +get_min_filename_len(packed_revprops_t *revprops) +{ + char number_buffer[SVN_INT64_BUFFER_SIZE]; + + /* The revprop filenames have the format <REV>.<COUNT> - with <REV> being + * at least the first rev in the shard and <COUNT> having at least one + * digit. Thus, the minimum is 2 + #decimal places in the start rev. + */ + return svn__i64toa(number_buffer, revprops->manifest_start) + 2; +} + +/* Given FS and REVPROPS->REVISION, fill the FILENAME, FOLDER and MANIFEST + * members. Use RESULT_POOL for allocating results and SCRATCH_POOL for + * temporaries. + */ +static svn_error_t * +get_revprop_packname(svn_fs_t *fs, + packed_revprops_t *revprops, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_stringbuf_t *content = NULL; + const char *manifest_file_path; + int idx, rev_count; + char *buffer, *buffer_end; + const char **filenames, **filenames_end; + apr_size_t min_filename_len; + + /* Determine the dimensions. Rev 0 is excluded from the first shard. */ + rev_count = ffd->max_files_per_dir; + revprops->manifest_start + = revprops->revision - (revprops->revision % rev_count); + if (revprops->manifest_start == 0) + { + ++revprops->manifest_start; + --rev_count; + } + + revprops->manifest = apr_array_make(result_pool, rev_count, + sizeof(const char*)); + + /* No line in the file can be less than this number of chars long. */ + min_filename_len = get_min_filename_len(revprops); + + /* Read the content of the manifest file */ + revprops->folder + = svn_fs_x__path_revprops_pack_shard(fs, revprops->revision, result_pool); + manifest_file_path = svn_dirent_join(revprops->folder, PATH_MANIFEST, + result_pool); + + SVN_ERR(svn_fs_x__read_content(&content, manifest_file_path, result_pool)); + + /* There CONTENT must have a certain minimal size and there no + * unterminated lines at the end of the file. Both guarantees also + * simplify the parser loop below. + */ + if ( content->len < rev_count * (min_filename_len + 1) + || content->data[content->len - 1] != '\n') + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + _("Packed revprop manifest for r%ld not " + "properly terminated"), revprops->revision); + + /* Chop (parse) the manifest CONTENT into filenames, one per line. + * We only have to replace all newlines with NUL and add all line + * starts to REVPROPS->MANIFEST. + * + * There must be exactly REV_COUNT lines and that is the number of + * lines we parse from BUFFER to FILENAMES. 
Set the end pointer for + * the source BUFFER such that BUFFER+MIN_FILENAME_LEN is still valid + * BUFFER_END is always valid due to CONTENT->LEN > MIN_FILENAME_LEN. + * + * Please note that this loop is performance critical for e.g. 'svn log'. + * It is run 1000x per revprop access, i.e. per revision and about + * 50 million times per sec (and CPU core). + */ + for (filenames = (const char **)revprops->manifest->elts, + filenames_end = filenames + rev_count, + buffer = content->data, + buffer_end = buffer + content->len - min_filename_len; + (filenames < filenames_end) && (buffer < buffer_end); + ++filenames) + { + /* BUFFER always points to the start of the next line / filename. */ + *filenames = buffer; + + /* Find the next EOL. This is guaranteed to stay within the CONTENT + * buffer because we left enough room after BUFFER_END and we know + * we will always see a newline as the last non-NUL char. */ + buffer += min_filename_len; + while (*buffer != '\n') + ++buffer; + + /* Found EOL. Turn it into the filename terminator and move BUFFER + * to the start of the next line or CONTENT buffer end. */ + *buffer = '\0'; + ++buffer; + } + + /* We must have reached the end of both buffers. */ + if (buffer < content->data + content->len) + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + _("Packed revprop manifest for r%ld " + "has too many entries"), revprops->revision); + + if (filenames < filenames_end) + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + _("Packed revprop manifest for r%ld " + "has too few entries"), revprops->revision); + + /* The target array has now exactly one entry per revision. */ + revprops->manifest->nelts = rev_count; + + /* Now get the file name */ + idx = (int)(revprops->revision - revprops->manifest_start); + revprops->filename = APR_ARRAY_IDX(revprops->manifest, idx, const char*); + + return SVN_NO_ERROR; +} + +/* Return TRUE, if revision R1 and R2 refer to the same shard in FS. + */ +static svn_boolean_t +same_shard(svn_fs_t *fs, + svn_revnum_t r1, + svn_revnum_t r2) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + return (r1 / ffd->max_files_per_dir) == (r2 / ffd->max_files_per_dir); +} + +/* Given FS and the full packed file content in REVPROPS->PACKED_REVPROPS, + * fill the START_REVISION member, and make PACKED_REVPROPS point to the + * first serialized revprop. If READ_ALL is set, initialize the SIZES + * and OFFSETS members as well. + * + * Parse the revprops for REVPROPS->REVISION and set the PROPERTIES as + * well as the SERIALIZED_SIZE member. If revprop caching has been + * enabled, parse all revprops in the pack and cache them. 
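+ * For orientation, the uncompressed pack data is a plain-text header followed by the concatenated serialized property hashes (cf. serialize_revprops_header()):
+ *
+ *   <first rev>\n<rev count>\n<size 0>\n...\n<size N-1>\n\n<props 0><props 1>...
+ *
+ * with the blank line marking the end of the header.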
+ */ +static svn_error_t * +parse_packed_revprops(svn_fs_t *fs, + packed_revprops_t *revprops, + svn_boolean_t read_all, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_stream_t *stream; + apr_int64_t first_rev, count, i; + apr_off_t offset; + const char *header_end; + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + svn_boolean_t cache_all = has_revprop_cache(fs, scratch_pool); + + /* decompress (even if the data is only "stored", there is still a + * length header to remove) */ + svn_stringbuf_t *compressed = revprops->packed_revprops; + svn_stringbuf_t *uncompressed = svn_stringbuf_create_empty(result_pool); + SVN_ERR(svn__decompress(compressed, uncompressed, APR_SIZE_MAX)); + + /* read first revision number and number of revisions in the pack */ + stream = svn_stream_from_stringbuf(uncompressed, scratch_pool); + SVN_ERR(svn_fs_x__read_number_from_stream(&first_rev, NULL, stream, + iterpool)); + SVN_ERR(svn_fs_x__read_number_from_stream(&count, NULL, stream, iterpool)); + + /* Check revision range for validity. */ + if ( !same_shard(fs, revprops->revision, first_rev) + || !same_shard(fs, revprops->revision, first_rev + count - 1) + || count < 1) + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + _("Revprop pack for revision r%ld" + " contains revprops for r%ld .. r%ld"), + revprops->revision, + (svn_revnum_t)first_rev, + (svn_revnum_t)(first_rev + count -1)); + + /* Since start & end are in the same shard, it is enough to just test + * the FIRST_REV for being actually packed. That will also cover the + * special case of rev 0 never being packed. */ + if (!svn_fs_x__is_packed_revprop(fs, first_rev)) + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + _("Revprop pack for revision r%ld" + " starts at non-packed revisions r%ld"), + revprops->revision, (svn_revnum_t)first_rev); + + /* make PACKED_REVPROPS point to the first char after the header. + * This is where the serialized revprops are. */ + header_end = strstr(uncompressed->data, "\n\n"); + if (header_end == NULL) + return svn_error_create(SVN_ERR_FS_CORRUPT, NULL, + _("Header end not found")); + + offset = header_end - uncompressed->data + 2; + + revprops->packed_revprops = svn_stringbuf_create_empty(result_pool); + revprops->packed_revprops->data = uncompressed->data + offset; + revprops->packed_revprops->len = (apr_size_t)(uncompressed->len - offset); + revprops->packed_revprops->blocksize = (apr_size_t)(uncompressed->blocksize - offset); + + /* STREAM still points to the first entry in the sizes list. */ + revprops->start_revision = (svn_revnum_t)first_rev; + if (read_all) + { + /* Init / construct REVPROPS members. */ + revprops->sizes = apr_array_make(result_pool, (int)count, + sizeof(offset)); + revprops->offsets = apr_array_make(result_pool, (int)count, + sizeof(offset)); + } + + /* Now parse, revision by revision, the size and content of each + * revisions' revprops. 
*/ + for (i = 0, offset = 0, revprops->total_size = 0; i < count; ++i) + { + apr_int64_t size; + svn_string_t serialized; + svn_revnum_t revision = (svn_revnum_t)(first_rev + i); + svn_pool_clear(iterpool); + + /* read & check the serialized size */ + SVN_ERR(svn_fs_x__read_number_from_stream(&size, NULL, stream, + iterpool)); + if (size + offset > (apr_int64_t)revprops->packed_revprops->len) + return svn_error_create(SVN_ERR_FS_CORRUPT, NULL, + _("Packed revprop size exceeds pack file size")); + + /* Parse this revprops list, if necessary */ + serialized.data = revprops->packed_revprops->data + offset; + serialized.len = (apr_size_t)size; + + if (revision == revprops->revision) + { + /* Parse (and possibly cache) the one revprop list we care about. */ + SVN_ERR(parse_revprop(&revprops->properties, fs, revision, + revprops->generation, &serialized, + result_pool, iterpool)); + revprops->serialized_size = serialized.len; + + /* If we only wanted the revprops for REVISION then we are done. */ + if (!read_all && !cache_all) + break; + } + else if (cache_all) + { + /* Parse and cache all other revprop lists. */ + apr_hash_t *properties; + SVN_ERR(parse_revprop(&properties, fs, revision, + revprops->generation, &serialized, + iterpool, iterpool)); + } + + if (read_all) + { + /* fill REVPROPS data structures */ + APR_ARRAY_PUSH(revprops->sizes, apr_off_t) = serialized.len; + APR_ARRAY_PUSH(revprops->offsets, apr_off_t) = offset; + } + revprops->total_size += serialized.len; + + offset += serialized.len; + } + + return SVN_NO_ERROR; +} + +/* In filesystem FS, read the packed revprops for revision REV into + * *REVPROPS. Use GENERATION to populate the revprop cache, if enabled. + * If you want to modify revprop contents / update REVPROPS, READ_ALL + * must be set. Otherwise, only the properties of REV are being provided. + * + * Allocate *PROPERTIES in RESULT_POOL and temporaries in SCRATCH_POOL. + */ +static svn_error_t * +read_pack_revprop(packed_revprops_t **revprops, + svn_fs_t *fs, + svn_revnum_t rev, + apr_int64_t generation, + svn_boolean_t read_all, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + svn_boolean_t missing = FALSE; + svn_error_t *err; + packed_revprops_t *result; + int i; + + /* someone insisted that REV is packed. Double-check if necessary */ + if (!svn_fs_x__is_packed_revprop(fs, rev)) + SVN_ERR(svn_fs_x__update_min_unpacked_rev(fs, iterpool)); + + if (!svn_fs_x__is_packed_revprop(fs, rev)) + return svn_error_createf(SVN_ERR_FS_NO_SUCH_REVISION, NULL, + _("No such packed revision %ld"), rev); + + /* initialize the result data structure */ + result = apr_pcalloc(result_pool, sizeof(*result)); + result->revision = rev; + result->generation = generation; + + /* try to read the packed revprops. This may require retries if we have + * concurrent writers. */ + for (i = 0; + i < SVN_FS_X__RECOVERABLE_RETRY_COUNT && !result->packed_revprops; + ++i) + { + const char *file_path; + svn_pool_clear(iterpool); + + /* there might have been concurrent writes. + * Re-read the manifest and the pack file. + */ + SVN_ERR(get_revprop_packname(fs, result, result_pool, iterpool)); + file_path = svn_dirent_join(result->folder, + result->filename, + iterpool); + SVN_ERR(svn_fs_x__try_stringbuf_from_file(&result->packed_revprops, + &missing, + file_path, + i + 1 < SVN_FS_X__RECOVERABLE_RETRY_COUNT, + result_pool)); + + /* If we could not find the file, there was a write. 
+ * So, we should refresh our revprop generation info as well such + * that others may find data we will put into the cache. They would + * consider it outdated, otherwise. + */ + if (missing && has_revprop_cache(fs, iterpool)) + SVN_ERR(read_revprop_generation(&result->generation, fs, iterpool)); + } + + /* the file content should be available now */ + if (!result->packed_revprops) + return svn_error_createf(SVN_ERR_FS_PACKED_REVPROP_READ_FAILURE, NULL, + _("Failed to read revprop pack file for r%ld"), rev); + + /* parse it. RESULT will be complete afterwards. */ + err = parse_packed_revprops(fs, result, read_all, result_pool, iterpool); + svn_pool_destroy(iterpool); + if (err) + return svn_error_createf(SVN_ERR_FS_CORRUPT, err, + _("Revprop pack file for r%ld is corrupt"), rev); + + *revprops = result; + + return SVN_NO_ERROR; +} + +/* Read the revprops for revision REV in FS and return them in *PROPERTIES_P. + * + * Allocations will be done in POOL. + */ +svn_error_t * +svn_fs_x__get_revision_proplist(apr_hash_t **proplist_p, + svn_fs_t *fs, + svn_revnum_t rev, + svn_boolean_t bypass_cache, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + apr_int64_t generation = 0; + + /* not found, yet */ + *proplist_p = NULL; + + /* should they be available at all? */ + SVN_ERR(svn_fs_x__ensure_revision_exists(rev, fs, scratch_pool)); + + /* Try cache lookup first. */ + if (!bypass_cache && has_revprop_cache(fs, scratch_pool)) + { + svn_boolean_t is_cached; + svn_fs_x__pair_cache_key_t key = { 0 }; + + SVN_ERR(read_revprop_generation(&generation, fs, scratch_pool)); + + key.revision = rev; + key.second = generation; + SVN_ERR(svn_cache__get((void **) proplist_p, &is_cached, + ffd->revprop_cache, &key, result_pool)); + if (is_cached) + return SVN_NO_ERROR; + } + + /* if REV had not been packed when we began, try reading it from the + * non-packed shard. If that fails, we will fall through to packed + * shard reads. */ + if (!svn_fs_x__is_packed_revprop(fs, rev)) + { + svn_error_t *err = read_non_packed_revprop(proplist_p, fs, rev, + generation, result_pool, + scratch_pool); + if (err) + { + if (!APR_STATUS_IS_ENOENT(err->apr_err)) + return svn_error_trace(err); + + svn_error_clear(err); + *proplist_p = NULL; /* in case read_non_packed_revprop changed it */ + } + } + + /* if revprop packing is available and we have not read the revprops, yet, + * try reading them from a packed shard. If that fails, REV is most + * likely invalid (or its revprops highly contested). */ + if (!*proplist_p) + { + packed_revprops_t *revprops; + SVN_ERR(read_pack_revprop(&revprops, fs, rev, generation, FALSE, + result_pool, scratch_pool)); + *proplist_p = revprops->properties; + } + + /* The revprops should have been there. Did we get them? */ + if (!*proplist_p) + return svn_error_createf(SVN_ERR_FS_NO_SUCH_REVISION, NULL, + _("Could not read revprops for revision %ld"), + rev); + + return SVN_NO_ERROR; +} + +/* Serialize the revision property list PROPLIST of revision REV in + * filesystem FS to a non-packed file. Return the name of that temporary + * file in *TMP_PATH and the file path that it must be moved to in + * *FINAL_PATH. + * + * Allocate *FINAL_PATH and *TMP_PATH in RESULT_POOL. Use SCRATCH_POOL + * for temporary allocations. 
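+ * For illustration only (variable names below are placeholders, not code from this file), a caller is expected to follow up along the lines of:
+ *
+ *   SVN_ERR(write_non_packed_revprop(&final_path, &tmp_path, fs, rev,
+ *                                    proplist, pool, pool));
+ *   SVN_ERR(switch_to_new_revprop(fs, final_path, tmp_path, perms_reference,
+ *                                 NULL, TRUE, pool));
+ *
+ * which moves the temporary file into place and bumps the revprop generation.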
+ */ +static svn_error_t * +write_non_packed_revprop(const char **final_path, + const char **tmp_path, + svn_fs_t *fs, + svn_revnum_t rev, + apr_hash_t *proplist, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_stream_t *stream; + *final_path = svn_fs_x__path_revprops(fs, rev, result_pool); + + /* ### do we have a directory sitting around already? we really shouldn't + ### have to get the dirname here. */ + SVN_ERR(svn_stream_open_unique(&stream, tmp_path, + svn_dirent_dirname(*final_path, + scratch_pool), + svn_io_file_del_none, + result_pool, scratch_pool)); + SVN_ERR(svn_hash_write2(proplist, stream, SVN_HASH_TERMINATOR, + scratch_pool)); + SVN_ERR(svn_stream_close(stream)); + + return SVN_NO_ERROR; +} + +/* After writing the new revprop file(s), call this function to move the + * file at TMP_PATH to FINAL_PATH and give it the permissions from + * PERMS_REFERENCE. + * + * If indicated in BUMP_GENERATION, increase FS' revprop generation. + * Finally, delete all the temporary files given in FILES_TO_DELETE. + * The latter may be NULL. + * + * Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +switch_to_new_revprop(svn_fs_t *fs, + const char *final_path, + const char *tmp_path, + const char *perms_reference, + apr_array_header_t *files_to_delete, + svn_boolean_t bump_generation, + apr_pool_t *scratch_pool) +{ + apr_int64_t generation; + + /* Now, we may actually be replacing revprops. Make sure that all other + threads and processes will know about this. */ + if (bump_generation) + SVN_ERR(begin_revprop_change(&generation, fs, scratch_pool)); + + SVN_ERR(svn_fs_x__move_into_place(tmp_path, final_path, perms_reference, + scratch_pool)); + + /* Indicate that the update (if relevant) has been completed. */ + if (bump_generation) + SVN_ERR(end_revprop_change(fs, generation, scratch_pool)); + + /* Clean up temporary files, if necessary. */ + if (files_to_delete) + { + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + int i; + + for (i = 0; i < files_to_delete->nelts; ++i) + { + const char *path = APR_ARRAY_IDX(files_to_delete, i, const char*); + + svn_pool_clear(iterpool); + SVN_ERR(svn_io_remove_file2(path, TRUE, iterpool)); + } + + svn_pool_destroy(iterpool); + } + return SVN_NO_ERROR; +} + +/* Write a pack file header to STREAM that starts at revision START_REVISION + * and contains the indexes [START,END) of SIZES. + */ +static svn_error_t * +serialize_revprops_header(svn_stream_t *stream, + svn_revnum_t start_revision, + apr_array_header_t *sizes, + int start, + int end, + apr_pool_t *scratch_pool) +{ + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + int i; + + SVN_ERR_ASSERT(start < end); + + /* start revision and entry count */ + SVN_ERR(svn_stream_printf(stream, scratch_pool, "%ld\n", start_revision)); + SVN_ERR(svn_stream_printf(stream, scratch_pool, "%d\n", end - start)); + + /* the sizes array */ + for (i = start; i < end; ++i) + { + /* Non-standard pool usage. + * + * We only allocate a few bytes each iteration -- even with a + * million iterations we would still be in good shape memory-wise. + */ + apr_off_t size = APR_ARRAY_IDX(sizes, i, apr_off_t); + SVN_ERR(svn_stream_printf(stream, iterpool, "%" APR_OFF_T_FMT "\n", + size)); + } + + /* the double newline char indicates the end of the header */ + SVN_ERR(svn_stream_printf(stream, iterpool, "\n")); + + svn_pool_destroy(iterpool); + return SVN_NO_ERROR; +} + +/* Writes the a pack file to FILE_STREAM. 
It copies the serialized data + * from REVPROPS for the indexes [START,END) except for index CHANGED_INDEX. + * + * The data for the latter is taken from NEW_SERIALIZED. Note, that + * CHANGED_INDEX may be outside the [START,END) range, i.e. no new data is + * taken in that case but only a subset of the old data will be copied. + * + * NEW_TOTAL_SIZE is a hint for pre-allocating buffers of appropriate size. + * SCRATCH_POOL is used for temporary allocations. + */ +static svn_error_t * +repack_revprops(svn_fs_t *fs, + packed_revprops_t *revprops, + int start, + int end, + int changed_index, + svn_stringbuf_t *new_serialized, + apr_off_t new_total_size, + svn_stream_t *file_stream, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_stream_t *stream; + int i; + + /* create data empty buffers and the stream object */ + svn_stringbuf_t *uncompressed + = svn_stringbuf_create_ensure((apr_size_t)new_total_size, scratch_pool); + svn_stringbuf_t *compressed + = svn_stringbuf_create_empty(scratch_pool); + stream = svn_stream_from_stringbuf(uncompressed, scratch_pool); + + /* write the header*/ + SVN_ERR(serialize_revprops_header(stream, revprops->start_revision + start, + revprops->sizes, start, end, + scratch_pool)); + + /* append the serialized revprops */ + for (i = start; i < end; ++i) + if (i == changed_index) + { + SVN_ERR(svn_stream_write(stream, + new_serialized->data, + &new_serialized->len)); + } + else + { + apr_size_t size + = (apr_size_t)APR_ARRAY_IDX(revprops->sizes, i, apr_off_t); + apr_size_t offset + = (apr_size_t)APR_ARRAY_IDX(revprops->offsets, i, apr_off_t); + + SVN_ERR(svn_stream_write(stream, + revprops->packed_revprops->data + offset, + &size)); + } + + /* flush the stream buffer (if any) to our underlying data buffer */ + SVN_ERR(svn_stream_close(stream)); + + /* compress / store the data */ + SVN_ERR(svn__compress(uncompressed, + compressed, + ffd->compress_packed_revprops + ? SVN_DELTA_COMPRESSION_LEVEL_DEFAULT + : SVN_DELTA_COMPRESSION_LEVEL_NONE)); + + /* finally, write the content to the target stream and close it */ + SVN_ERR(svn_stream_write(file_stream, compressed->data, &compressed->len)); + SVN_ERR(svn_stream_close(file_stream)); + + return SVN_NO_ERROR; +} + +/* Allocate a new pack file name for revisions + * [REVPROPS->START_REVISION + START, REVPROPS->START_REVISION + END - 1] + * of REVPROPS->MANIFEST. Add the name of old file to FILES_TO_DELETE, + * auto-create that array if necessary. Return an open file stream to + * the new file in *STREAM allocated in RESULT_POOL. Allocate the paths + * in *FILES_TO_DELETE from the same pool that contains the array itself. + * + * Use SCRATCH_POOL for temporary allocations. 
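+ *
+ * Naming example (numbers are made up): if the revisions to re-pack
+ * currently live in the pack file "1000.0" and START is 5, the new file
+ * will be named "1005.1", i.e. the first revision of the new range
+ * followed by the old counter increased by one.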
+ */ +static svn_error_t * +repack_stream_open(svn_stream_t **stream, + svn_fs_t *fs, + packed_revprops_t *revprops, + int start, + int end, + apr_array_header_t **files_to_delete, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + apr_int64_t tag; + const char *tag_string; + svn_string_t *new_filename; + int i; + apr_file_t *file; + int manifest_offset + = (int)(revprops->start_revision - revprops->manifest_start); + + /* get the old (= current) file name and enlist it for later deletion */ + const char *old_filename = APR_ARRAY_IDX(revprops->manifest, + start + manifest_offset, + const char*); + + if (*files_to_delete == NULL) + *files_to_delete = apr_array_make(result_pool, 3, sizeof(const char*)); + + APR_ARRAY_PUSH(*files_to_delete, const char*) + = svn_dirent_join(revprops->folder, old_filename, + (*files_to_delete)->pool); + + /* increase the tag part, i.e. the counter after the dot */ + tag_string = strchr(old_filename, '.'); + if (tag_string == NULL) + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + _("Packed file '%s' misses a tag"), + old_filename); + + SVN_ERR(svn_cstring_atoi64(&tag, tag_string + 1)); + new_filename = svn_string_createf((*files_to_delete)->pool, + "%ld.%" APR_INT64_T_FMT, + revprops->start_revision + start, + ++tag); + + /* update the manifest to point to the new file */ + for (i = start; i < end; ++i) + APR_ARRAY_IDX(revprops->manifest, i + manifest_offset, const char*) + = new_filename->data; + + /* create a file stream for the new file */ + SVN_ERR(svn_io_file_open(&file, svn_dirent_join(revprops->folder, + new_filename->data, + scratch_pool), + APR_WRITE | APR_CREATE, APR_OS_DEFAULT, + result_pool)); + *stream = svn_stream_from_aprfile2(file, FALSE, result_pool); + + return SVN_NO_ERROR; +} + +/* For revision REV in filesystem FS, set the revision properties to + * PROPLIST. Return a new file in *TMP_PATH that the caller shall move + * to *FINAL_PATH to make the change visible. Files to be deleted will + * be listed in *FILES_TO_DELETE which may remain unchanged / unallocated. + * + * Allocate output values in RESULT_POOL and temporaries from SCRATCH_POOL. + */ +static svn_error_t * +write_packed_revprop(const char **final_path, + const char **tmp_path, + apr_array_header_t **files_to_delete, + svn_fs_t *fs, + svn_revnum_t rev, + apr_hash_t *proplist, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + packed_revprops_t *revprops; + apr_int64_t generation = 0; + svn_stream_t *stream; + svn_stringbuf_t *serialized; + apr_off_t new_total_size; + int changed_index; + + /* read the current revprop generation. This value will not change + * while we hold the global write lock to this FS. 
*/ + if (has_revprop_cache(fs, scratch_pool)) + SVN_ERR(read_revprop_generation(&generation, fs, scratch_pool)); + + /* read contents of the current pack file */ + SVN_ERR(read_pack_revprop(&revprops, fs, rev, generation, TRUE, + scratch_pool, scratch_pool)); + + /* serialize the new revprops */ + serialized = svn_stringbuf_create_empty(scratch_pool); + stream = svn_stream_from_stringbuf(serialized, scratch_pool); + SVN_ERR(svn_hash_write2(proplist, stream, SVN_HASH_TERMINATOR, + scratch_pool)); + SVN_ERR(svn_stream_close(stream)); + + /* calculate the size of the new data */ + changed_index = (int)(rev - revprops->start_revision); + new_total_size = revprops->total_size - revprops->serialized_size + + serialized->len + + (revprops->offsets->nelts + 2) * SVN_INT64_BUFFER_SIZE; + + APR_ARRAY_IDX(revprops->sizes, changed_index, apr_off_t) = serialized->len; + + /* can we put the new data into the same pack as the before? */ + if ( new_total_size < ffd->revprop_pack_size + || revprops->sizes->nelts == 1) + { + /* simply replace the old pack file with new content as we do it + * in the non-packed case */ + + *final_path = svn_dirent_join(revprops->folder, revprops->filename, + result_pool); + SVN_ERR(svn_stream_open_unique(&stream, tmp_path, revprops->folder, + svn_io_file_del_none, result_pool, + scratch_pool)); + SVN_ERR(repack_revprops(fs, revprops, 0, revprops->sizes->nelts, + changed_index, serialized, new_total_size, + stream, scratch_pool)); + } + else + { + /* split the pack file into two of roughly equal size */ + int right_count, left_count, i; + + int left = 0; + int right = revprops->sizes->nelts - 1; + apr_off_t left_size = 2 * SVN_INT64_BUFFER_SIZE; + apr_off_t right_size = 2 * SVN_INT64_BUFFER_SIZE; + + /* let left and right side grow such that their size difference + * is minimal after each step. */ + while (left <= right) + if ( left_size + APR_ARRAY_IDX(revprops->sizes, left, apr_off_t) + < right_size + APR_ARRAY_IDX(revprops->sizes, right, apr_off_t)) + { + left_size += APR_ARRAY_IDX(revprops->sizes, left, apr_off_t) + + SVN_INT64_BUFFER_SIZE; + ++left; + } + else + { + right_size += APR_ARRAY_IDX(revprops->sizes, right, apr_off_t) + + SVN_INT64_BUFFER_SIZE; + --right; + } + + /* since the items need much less than SVN_INT64_BUFFER_SIZE + * bytes to represent their length, the split may not be optimal */ + left_count = left; + right_count = revprops->sizes->nelts - left; + + /* if new_size is large, one side may exceed the pack size limit. + * In that case, split before and after the modified revprop.*/ + if ( left_size > ffd->revprop_pack_size + || right_size > ffd->revprop_pack_size) + { + left_count = changed_index; + right_count = revprops->sizes->nelts - left_count - 1; + } + + /* Allocate this here such that we can call the repack functions with + * the scratch pool alone. 
*/ + if (*files_to_delete == NULL) + *files_to_delete = apr_array_make(result_pool, 3, + sizeof(const char*)); + + /* write the new, split files */ + if (left_count) + { + SVN_ERR(repack_stream_open(&stream, fs, revprops, 0, + left_count, files_to_delete, + scratch_pool, scratch_pool)); + SVN_ERR(repack_revprops(fs, revprops, 0, left_count, + changed_index, serialized, new_total_size, + stream, scratch_pool)); + } + + if (left_count + right_count < revprops->sizes->nelts) + { + SVN_ERR(repack_stream_open(&stream, fs, revprops, changed_index, + changed_index + 1, files_to_delete, + scratch_pool, scratch_pool)); + SVN_ERR(repack_revprops(fs, revprops, changed_index, + changed_index + 1, + changed_index, serialized, new_total_size, + stream, scratch_pool)); + } + + if (right_count) + { + SVN_ERR(repack_stream_open(&stream, fs, revprops, + revprops->sizes->nelts - right_count, + revprops->sizes->nelts, + files_to_delete, scratch_pool, + scratch_pool)); + SVN_ERR(repack_revprops(fs, revprops, + revprops->sizes->nelts - right_count, + revprops->sizes->nelts, changed_index, + serialized, new_total_size, stream, + scratch_pool)); + } + + /* write the new manifest */ + *final_path = svn_dirent_join(revprops->folder, PATH_MANIFEST, + result_pool); + SVN_ERR(svn_stream_open_unique(&stream, tmp_path, revprops->folder, + svn_io_file_del_none, result_pool, + scratch_pool)); + + for (i = 0; i < revprops->manifest->nelts; ++i) + { + const char *filename = APR_ARRAY_IDX(revprops->manifest, i, + const char*); + SVN_ERR(svn_stream_printf(stream, scratch_pool, "%s\n", filename)); + } + + SVN_ERR(svn_stream_close(stream)); + } + + return SVN_NO_ERROR; +} + +/* Set the revision property list of revision REV in filesystem FS to + PROPLIST. Use SCRATCH_POOL for temporary allocations. */ +svn_error_t * +svn_fs_x__set_revision_proplist(svn_fs_t *fs, + svn_revnum_t rev, + apr_hash_t *proplist, + apr_pool_t *scratch_pool) +{ + svn_boolean_t is_packed; + svn_boolean_t bump_generation = FALSE; + const char *final_path; + const char *tmp_path; + const char *perms_reference; + apr_array_header_t *files_to_delete = NULL; + + SVN_ERR(svn_fs_x__ensure_revision_exists(rev, fs, scratch_pool)); + + /* this info will not change while we hold the global FS write lock */ + is_packed = svn_fs_x__is_packed_revprop(fs, rev); + + /* Test whether revprops already exist for this revision. + * Only then will we need to bump the revprop generation. + * The fact that they did not yet exist is never cached. */ + if (is_packed) + { + bump_generation = TRUE; + } + else + { + svn_node_kind_t kind; + SVN_ERR(svn_io_check_path(svn_fs_x__path_revprops(fs, rev, + scratch_pool), + &kind, scratch_pool)); + bump_generation = kind != svn_node_none; + } + + /* Serialize the new revprop data */ + if (is_packed) + SVN_ERR(write_packed_revprop(&final_path, &tmp_path, &files_to_delete, + fs, rev, proplist, scratch_pool, + scratch_pool)); + else + SVN_ERR(write_non_packed_revprop(&final_path, &tmp_path, + fs, rev, proplist, scratch_pool, + scratch_pool)); + + /* We use the rev file of this revision as the perms reference, + * because when setting revprops for the first time, the revprop + * file won't exist and therefore can't serve as its own reference. + * (Whereas the rev file should already exist at this point.) + */ + perms_reference = svn_fs_x__path_rev_absolute(fs, rev, scratch_pool); + + /* Now, switch to the new revprop data. 
*/ + SVN_ERR(switch_to_new_revprop(fs, final_path, tmp_path, perms_reference, + files_to_delete, bump_generation, + scratch_pool)); + + return SVN_NO_ERROR; +} + +/* Return TRUE, if for REVISION in FS, we can find the revprop pack file. + * Use SCRATCH_POOL for temporary allocations. + * Set *MISSING, if the reason is a missing manifest or pack file. + */ +svn_boolean_t +svn_fs_x__packed_revprop_available(svn_boolean_t *missing, + svn_fs_t *fs, + svn_revnum_t revision, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_stringbuf_t *content = NULL; + + /* try to read the manifest file */ + const char *folder = svn_fs_x__path_revprops_pack_shard(fs, revision, + scratch_pool); + const char *manifest_path = svn_dirent_join(folder, PATH_MANIFEST, + scratch_pool); + + svn_error_t *err = svn_fs_x__try_stringbuf_from_file(&content, + missing, + manifest_path, + FALSE, + scratch_pool); + + /* if the manifest cannot be read, consider the pack files inaccessible + * even if the file itself exists. */ + if (err) + { + svn_error_clear(err); + return FALSE; + } + + if (*missing) + return FALSE; + + /* parse manifest content until we find the entry for REVISION. + * Revision 0 is never packed. */ + revision = revision < ffd->max_files_per_dir + ? revision - 1 + : revision % ffd->max_files_per_dir; + while (content->data) + { + char *next = strchr(content->data, '\n'); + if (next) + { + *next = 0; + ++next; + } + + if (revision-- == 0) + { + /* the respective pack file must exist (and be a file) */ + svn_node_kind_t kind; + err = svn_io_check_path(svn_dirent_join(folder, content->data, + scratch_pool), + &kind, scratch_pool); + if (err) + { + svn_error_clear(err); + return FALSE; + } + + *missing = kind == svn_node_none; + return kind == svn_node_file; + } + + content->data = next; + } + + return FALSE; +} + + +/****** Packing FSX shards *********/ + +svn_error_t * +svn_fs_x__copy_revprops(const char *pack_file_dir, + const char *pack_filename, + const char *shard_path, + svn_revnum_t start_rev, + svn_revnum_t end_rev, + apr_array_header_t *sizes, + apr_size_t total_size, + int compression_level, + svn_cancel_func_t cancel_func, + void *cancel_baton, + apr_pool_t *scratch_pool) +{ + svn_stream_t *pack_stream; + apr_file_t *pack_file; + svn_revnum_t rev; + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + svn_stream_t *stream; + + /* create empty data buffer and a write stream on top of it */ + svn_stringbuf_t *uncompressed + = svn_stringbuf_create_ensure(total_size, scratch_pool); + svn_stringbuf_t *compressed + = svn_stringbuf_create_empty(scratch_pool); + pack_stream = svn_stream_from_stringbuf(uncompressed, scratch_pool); + + /* write the pack file header */ + SVN_ERR(serialize_revprops_header(pack_stream, start_rev, sizes, 0, + sizes->nelts, iterpool)); + + /* Some useful paths. */ + SVN_ERR(svn_io_file_open(&pack_file, svn_dirent_join(pack_file_dir, + pack_filename, + scratch_pool), + APR_WRITE | APR_CREATE, APR_OS_DEFAULT, + scratch_pool)); + + /* Iterate over the revisions in this shard, squashing them together. */ + for (rev = start_rev; rev <= end_rev; rev++) + { + const char *path; + + svn_pool_clear(iterpool); + + /* Construct the file name. */ + path = svn_dirent_join(shard_path, apr_psprintf(iterpool, "%ld", rev), + iterpool); + + /* Copy all the bits from the non-packed revprop file to the end of + * the pack file. 
*/ + SVN_ERR(svn_stream_open_readonly(&stream, path, iterpool, iterpool)); + SVN_ERR(svn_stream_copy3(stream, pack_stream, + cancel_func, cancel_baton, iterpool)); + } + + /* flush stream buffers to content buffer */ + SVN_ERR(svn_stream_close(pack_stream)); + + /* compress the content (or just store it for COMPRESSION_LEVEL 0) */ + SVN_ERR(svn__compress(uncompressed, compressed, compression_level)); + + /* write the pack file content to disk */ + stream = svn_stream_from_aprfile2(pack_file, FALSE, scratch_pool); + SVN_ERR(svn_stream_write(stream, compressed->data, &compressed->len)); + SVN_ERR(svn_stream_close(stream)); + + svn_pool_destroy(iterpool); + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__pack_revprops_shard(const char *pack_file_dir, + const char *shard_path, + apr_int64_t shard, + int max_files_per_dir, + apr_off_t max_pack_size, + int compression_level, + svn_cancel_func_t cancel_func, + void *cancel_baton, + apr_pool_t *scratch_pool) +{ + const char *manifest_file_path, *pack_filename = NULL; + svn_stream_t *manifest_stream; + svn_revnum_t start_rev, end_rev, rev; + apr_off_t total_size; + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + apr_array_header_t *sizes; + + /* Some useful paths. */ + manifest_file_path = svn_dirent_join(pack_file_dir, PATH_MANIFEST, + scratch_pool); + + /* Remove any existing pack file for this shard, since it is incomplete. */ + SVN_ERR(svn_io_remove_dir2(pack_file_dir, TRUE, cancel_func, cancel_baton, + scratch_pool)); + + /* Create the new directory and manifest file stream. */ + SVN_ERR(svn_io_dir_make(pack_file_dir, APR_OS_DEFAULT, scratch_pool)); + SVN_ERR(svn_stream_open_writable(&manifest_stream, manifest_file_path, + scratch_pool, scratch_pool)); + + /* revisions to handle. Special case: revision 0 */ + start_rev = (svn_revnum_t) (shard * max_files_per_dir); + end_rev = (svn_revnum_t) ((shard + 1) * (max_files_per_dir) - 1); + if (start_rev == 0) + ++start_rev; + /* Special special case: if max_files_per_dir is 1, then at this point + start_rev == 1 and end_rev == 0 (!). Fortunately, everything just + works. */ + + /* initialize the revprop size info */ + sizes = apr_array_make(scratch_pool, max_files_per_dir, sizeof(apr_off_t)); + total_size = 2 * SVN_INT64_BUFFER_SIZE; + + /* Iterate over the revisions in this shard, determine their size and + * squashing them together into pack files. */ + for (rev = start_rev; rev <= end_rev; rev++) + { + apr_finfo_t finfo; + const char *path; + + svn_pool_clear(iterpool); + + /* Get the size of the file. */ + path = svn_dirent_join(shard_path, apr_psprintf(iterpool, "%ld", rev), + iterpool); + SVN_ERR(svn_io_stat(&finfo, path, APR_FINFO_SIZE, iterpool)); + + /* if we already have started a pack file and this revprop cannot be + * appended to it, write the previous pack file. */ + if (sizes->nelts != 0 && + total_size + SVN_INT64_BUFFER_SIZE + finfo.size > max_pack_size) + { + SVN_ERR(svn_fs_x__copy_revprops(pack_file_dir, pack_filename, + shard_path, start_rev, rev-1, + sizes, (apr_size_t)total_size, + compression_level, cancel_func, + cancel_baton, iterpool)); + + /* next pack file starts empty again */ + apr_array_clear(sizes); + total_size = 2 * SVN_INT64_BUFFER_SIZE; + start_rev = rev; + } + + /* Update the manifest. 
Allocate a file name for the current pack + * file if it is a new one */ + if (sizes->nelts == 0) + pack_filename = apr_psprintf(scratch_pool, "%ld.0", rev); + + SVN_ERR(svn_stream_printf(manifest_stream, iterpool, "%s\n", + pack_filename)); + + /* add to list of files to put into the current pack file */ + APR_ARRAY_PUSH(sizes, apr_off_t) = finfo.size; + total_size += SVN_INT64_BUFFER_SIZE + finfo.size; + } + + /* write the last pack file */ + if (sizes->nelts != 0) + SVN_ERR(svn_fs_x__copy_revprops(pack_file_dir, pack_filename, shard_path, + start_rev, rev-1, sizes, + (apr_size_t)total_size, compression_level, + cancel_func, cancel_baton, iterpool)); + + /* flush the manifest file and update permissions */ + SVN_ERR(svn_stream_close(manifest_stream)); + SVN_ERR(svn_io_copy_perms(shard_path, pack_file_dir, iterpool)); + + svn_pool_destroy(iterpool); + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__delete_revprops_shard(const char *shard_path, + apr_int64_t shard, + int max_files_per_dir, + svn_cancel_func_t cancel_func, + void *cancel_baton, + apr_pool_t *scratch_pool) +{ + if (shard == 0) + { + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + int i; + + /* delete all files except the one for revision 0 */ + for (i = 1; i < max_files_per_dir; ++i) + { + const char *path; + svn_pool_clear(iterpool); + + path = svn_dirent_join(shard_path, + apr_psprintf(iterpool, "%d", i), + iterpool); + if (cancel_func) + SVN_ERR((*cancel_func)(cancel_baton)); + + SVN_ERR(svn_io_remove_file2(path, TRUE, iterpool)); + } + + svn_pool_destroy(iterpool); + } + else + SVN_ERR(svn_io_remove_dir2(shard_path, TRUE, + cancel_func, cancel_baton, scratch_pool)); + + return SVN_NO_ERROR; +} + diff --git a/subversion/libsvn_fs_x/revprops.h b/subversion/libsvn_fs_x/revprops.h new file mode 100644 index 0000000..c4827c4 --- /dev/null +++ b/subversion/libsvn_fs_x/revprops.h @@ -0,0 +1,184 @@ +/* revprops.h --- everything needed to handle revprops in FSX + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#ifndef SVN_LIBSVN_FS__REVPROPS_H +#define SVN_LIBSVN_FS__REVPROPS_H + +#include "svn_fs.h" + +#ifdef __cplusplus +extern "C" { +#endif /* __cplusplus */ + +/* Auto-create / replace the revprop generation file in FS with its + * initial contents. In any case, FS will not hold an open handle to + * it after this function succeeds. + * + * Use SCRATCH_POOL for temporary allocations. + */ +svn_error_t * +svn_fs_x__reset_revprop_generation_file(svn_fs_t *fs, + apr_pool_t *scratch_pool); + +/* In the filesystem FS, pack all revprop shards up to min_unpacked_rev. 
+ * + * NOTE: Keep the old non-packed shards around until after the format bump. + * Otherwise, re-running upgrade will drop the packed revprop shard but + * have no unpacked data anymore. Call upgrade_cleanup_pack_revprops after + * the bump. + * + * NOTIFY_FUNC and NOTIFY_BATON as well as CANCEL_FUNC and CANCEL_BATON are + * used in the usual way. Temporary allocations are done in SCRATCH_POOL. + */ +svn_error_t * +svn_fs_x__upgrade_pack_revprops(svn_fs_t *fs, + svn_fs_upgrade_notify_t notify_func, + void *notify_baton, + svn_cancel_func_t cancel_func, + void *cancel_baton, + apr_pool_t *scratch_pool); + +/* In the filesystem FS, remove all non-packed revprop shards up to + * min_unpacked_rev. Temporary allocations are done in SCRATCH_POOL. + * + * NOTIFY_FUNC and NOTIFY_BATON as well as CANCEL_FUNC and CANCEL_BATON are + * used in the usual way. Cancellation is supported in the sense that we + * will cleanly abort the operation. However, there will be remnant shards + * that must be removed manually. + * + * See upgrade_pack_revprops for more info. + */ +svn_error_t * +svn_fs_x__upgrade_cleanup_pack_revprops(svn_fs_t *fs, + svn_fs_upgrade_notify_t notify_func, + void *notify_baton, + svn_cancel_func_t cancel_func, + void *cancel_baton, + apr_pool_t *scratch_pool); + +/* Read the revprops for revision REV in FS and return them in *PROPLIST_P. + * If BYPASS_CACHE is set, don't consult the disks but always read from disk. + * + * Allocate the *PROPLIST_P in RESULT_POOL and use SCRATCH_POOL for temporary + * allocations. + */ +svn_error_t * +svn_fs_x__get_revision_proplist(apr_hash_t **proplist_p, + svn_fs_t *fs, + svn_revnum_t rev, + svn_boolean_t bypass_cache, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + +/* Set the revision property list of revision REV in filesystem FS to + PROPLIST. Use SCRATCH_POOL for temporary allocations. */ +svn_error_t * +svn_fs_x__set_revision_proplist(svn_fs_t *fs, + svn_revnum_t rev, + apr_hash_t *proplist, + apr_pool_t *scratch_pool); + + +/* Return TRUE, if for REVISION in FS, we can find the revprop pack file. + * Use SCRATCH_POOL for temporary allocations. + * Set *MISSING, if the reason is a missing manifest or pack file. + */ +svn_boolean_t +svn_fs_x__packed_revprop_available(svn_boolean_t *missing, + svn_fs_t *fs, + svn_revnum_t revision, + apr_pool_t *scratch_pool); + + +/****** Packing FSX shards *********/ + +/* Copy revprop files for revisions [START_REV, END_REV) from SHARD_PATH + * to the pack file at PACK_FILE_NAME in PACK_FILE_DIR. + * + * The file sizes have already been determined and written to SIZES. + * Please note that this function will be executed while the filesystem + * has been locked and that revprops files will therefore not be modified + * while the pack is in progress. + * + * COMPRESSION_LEVEL defines how well the resulting pack file shall be + * compressed or whether is shall be compressed at all. TOTAL_SIZE is + * a hint on which initial buffer size we should use to hold the pack file + * content. + * + * CANCEL_FUNC and CANCEL_BATON are used as usual. Temporary allocations + * are done in SCRATCH_POOL. 
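+ *
+ * Layout sketch (sizes are made up): before compression, the pack file
+ * content for three revisions starting at r1000 is a small header
+ * followed by the concatenated hash dumps of the individual revprop
+ * files:
+ *
+ *   1000              <- first revision covered by this pack file
+ *   3                 <- number of revisions in this pack file
+ *   110               <- size of the r1000 revprop dump
+ *   213               <- size of the r1001 revprop dump
+ *   78                <- size of the r1002 revprop dump
+ *                     <- empty line terminating the header
+ *   <110 bytes of r1000 props><213 bytes of r1001 props><78 bytes of
+ *   r1002 props>
+ *
+ * The whole stream is then run through svn__compress() with
+ * COMPRESSION_LEVEL and written to the pack file.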
+ */ +svn_error_t * +svn_fs_x__copy_revprops(const char *pack_file_dir, + const char *pack_filename, + const char *shard_path, + svn_revnum_t start_rev, + svn_revnum_t end_rev, + apr_array_header_t *sizes, + apr_size_t total_size, + int compression_level, + svn_cancel_func_t cancel_func, + void *cancel_baton, + apr_pool_t *scratch_pool); + +/* For the revprop SHARD at SHARD_PATH with exactly MAX_FILES_PER_DIR + * revprop files in it, create a packed shared at PACK_FILE_DIR. + * + * COMPRESSION_LEVEL defines how well the resulting pack file shall be + * compressed or whether is shall be compressed at all. Individual pack + * file containing more than one revision will be limited to a size of + * MAX_PACK_SIZE bytes before compression. + * + * CANCEL_FUNC and CANCEL_BATON are used in the usual way. Temporary + * allocations are done in SCRATCH_POOL. + */ +svn_error_t * +svn_fs_x__pack_revprops_shard(const char *pack_file_dir, + const char *shard_path, + apr_int64_t shard, + int max_files_per_dir, + apr_off_t max_pack_size, + int compression_level, + svn_cancel_func_t cancel_func, + void *cancel_baton, + apr_pool_t *scratch_pool); + +/* Delete the non-packed revprop SHARD at SHARD_PATH with exactly + * MAX_FILES_PER_DIR revprop files in it. If this is shard 0, keep the + * revprop file for revision 0. + * + * CANCEL_FUNC and CANCEL_BATON are used in the usual way. Temporary + * allocations are done in SCRATCH_POOL. + */ +svn_error_t * +svn_fs_x__delete_revprops_shard(const char *shard_path, + apr_int64_t shard, + int max_files_per_dir, + svn_cancel_func_t cancel_func, + void *cancel_baton, + apr_pool_t *scratch_pool); + +#ifdef __cplusplus +} +#endif /* __cplusplus */ + +#endif /* SVN_LIBSVN_FS__REVPROPS_H */ diff --git a/subversion/libsvn_fs_x/string_table.c b/subversion/libsvn_fs_x/string_table.c new file mode 100644 index 0000000..7b3b645 --- /dev/null +++ b/subversion/libsvn_fs_x/string_table.c @@ -0,0 +1,904 @@ +/* string_table.c : operations on string tables + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ * ==================================================================== + */ + +#include <assert.h> +#include <string.h> +#include <apr_tables.h> + +#include "svn_string.h" +#include "svn_sorts.h" +#include "private/svn_dep_compat.h" +#include "private/svn_string_private.h" +#include "private/svn_subr_private.h" +#include "private/svn_packed_data.h" +#include "string_table.h" + + + +#define MAX_DATA_SIZE 0xffff +#define MAX_SHORT_STRING_LEN (MAX_DATA_SIZE / 4) +#define TABLE_SHIFT 13 +#define MAX_STRINGS_PER_TABLE (1 << (TABLE_SHIFT - 1)) +#define LONG_STRING_MASK (1 << (TABLE_SHIFT - 1)) +#define STRING_INDEX_MASK ((1 << (TABLE_SHIFT - 1)) - 1) +#define PADDING (sizeof(apr_uint64_t)) + + +typedef struct builder_string_t +{ + svn_string_t string; + int position; + apr_size_t depth; + struct builder_string_t *previous; + struct builder_string_t *next; + apr_size_t previous_match_len; + apr_size_t next_match_len; + struct builder_string_t *left; + struct builder_string_t *right; +} builder_string_t; + +typedef struct builder_table_t +{ + apr_size_t max_data_size; + builder_string_t *top; + builder_string_t *first; + builder_string_t *last; + apr_array_header_t *short_strings; + apr_array_header_t *long_strings; + apr_hash_t *long_string_dict; + apr_size_t long_string_size; +} builder_table_t; + +struct string_table_builder_t +{ + apr_pool_t *pool; + apr_array_header_t *tables; +}; + +typedef struct string_header_t +{ + apr_uint16_t head_string; + apr_uint16_t head_length; + apr_uint16_t tail_start; + apr_uint16_t tail_length; +} string_header_t; + +typedef struct string_sub_table_t +{ + const char *data; + apr_size_t data_size; + + string_header_t *short_strings; + apr_size_t short_string_count; + + svn_string_t *long_strings; + apr_size_t long_string_count; +} string_sub_table_t; + +struct string_table_t +{ + apr_size_t size; + string_sub_table_t *sub_tables; +}; + + +/* Accessing ID Pieces. */ + +static builder_table_t * +add_table(string_table_builder_t *builder) +{ + builder_table_t *table = apr_pcalloc(builder->pool, sizeof(*table)); + table->max_data_size = MAX_DATA_SIZE - PADDING; /* ensure there remain a few + unused bytes at the end */ + table->short_strings = apr_array_make(builder->pool, 64, + sizeof(builder_string_t *)); + table->long_strings = apr_array_make(builder->pool, 0, + sizeof(svn_string_t)); + table->long_string_dict = svn_hash__make(builder->pool); + + APR_ARRAY_PUSH(builder->tables, builder_table_t *) = table; + + return table; +} + +string_table_builder_t * +svn_fs_x__string_table_builder_create(apr_pool_t *result_pool) +{ + string_table_builder_t *result = apr_palloc(result_pool, sizeof(*result)); + result->pool = result_pool; + result->tables = apr_array_make(result_pool, 1, sizeof(builder_table_t *)); + + add_table(result); + + return result; +} + +static void +balance(builder_table_t *table, + builder_string_t **parent, + builder_string_t *node) +{ + apr_size_t left_height = node->left ? node->left->depth + 1 : 0; + apr_size_t right_height = node->right ? 
node->right->depth + 1 : 0; + + if (left_height > right_height + 1) + { + builder_string_t *temp = node->left->right; + node->left->right = node; + *parent = node->left; + node->left = temp; + + --left_height; + } + else if (left_height + 1 < right_height) + { + builder_string_t *temp = node->right->left; + *parent = node->right; + node->right->left = node; + node->right = temp; + + --right_height; + } + + node->depth = MAX(left_height, right_height); +} + +static apr_uint16_t +match_length(const svn_string_t *lhs, + const svn_string_t *rhs) +{ + apr_size_t len = MIN(lhs->len, rhs->len); + return (apr_uint16_t)svn_cstring__match_length(lhs->data, rhs->data, len); +} + +static apr_uint16_t +insert_string(builder_table_t *table, + builder_string_t **parent, + builder_string_t *to_insert) +{ + apr_uint16_t result; + builder_string_t *current = *parent; + int diff = strcmp(current->string.data, to_insert->string.data); + if (diff == 0) + { + apr_array_pop(table->short_strings); + return current->position; + } + + if (diff < 0) + { + if (current->left == NULL) + { + current->left = to_insert; + + to_insert->previous = current->previous; + to_insert->next = current; + + if (to_insert->previous == NULL) + { + table->first = to_insert; + } + else + { + builder_string_t *previous = to_insert->previous; + to_insert->previous_match_len + = match_length(&previous->string, &to_insert->string); + + previous->next = to_insert; + previous->next_match_len = to_insert->previous_match_len; + } + + current->previous = to_insert; + to_insert->next_match_len + = match_length(¤t->string, &to_insert->string); + current->previous_match_len = to_insert->next_match_len; + + table->max_data_size -= to_insert->string.len; + if (to_insert->previous == NULL) + table->max_data_size += to_insert->next_match_len; + else + table->max_data_size += MIN(to_insert->previous_match_len, + to_insert->next_match_len); + + return to_insert->position; + } + else + result = insert_string(table, ¤t->left, to_insert); + } + else + { + if (current->right == NULL) + { + current->right = to_insert; + + to_insert->next = current->next; + to_insert->previous = current; + + if (to_insert->next == NULL) + { + table->last = to_insert; + } + else + { + builder_string_t *next = to_insert->next; + to_insert->next_match_len + = match_length(&next->string, &to_insert->string); + + next->previous = to_insert; + next->previous_match_len = to_insert->next_match_len; + } + + current->next = current->right; + to_insert->previous_match_len + = match_length(¤t->string, &to_insert->string); + current->next_match_len = to_insert->previous_match_len; + + table->max_data_size -= to_insert->string.len; + if (to_insert->next == NULL) + table->max_data_size += to_insert->previous_match_len; + else + table->max_data_size += MIN(to_insert->previous_match_len, + to_insert->next_match_len); + + return to_insert->position; + } + else + result = insert_string(table, ¤t->right, to_insert); + } + + balance(table, parent, current); + return result; +} + +apr_size_t +svn_fs_x__string_table_builder_add(string_table_builder_t *builder, + const char *string, + apr_size_t len) +{ + apr_size_t result; + builder_table_t *table = APR_ARRAY_IDX(builder->tables, + builder->tables->nelts - 1, + builder_table_t *); + if (len == 0) + len = strlen(string); + + string = apr_pstrmemdup(builder->pool, string, len); + if (len > MAX_SHORT_STRING_LEN) + { + void *idx_void; + svn_string_t item; + item.data = string; + item.len = len; + + idx_void = apr_hash_get(table->long_string_dict, 
string, len); + result = (apr_uintptr_t)idx_void; + if (result) + return result - 1 + + LONG_STRING_MASK + + (((apr_size_t)builder->tables->nelts - 1) << TABLE_SHIFT); + + if (table->long_strings->nelts == MAX_STRINGS_PER_TABLE) + table = add_table(builder); + + result = table->long_strings->nelts + + LONG_STRING_MASK + + (((apr_size_t)builder->tables->nelts - 1) << TABLE_SHIFT); + APR_ARRAY_PUSH(table->long_strings, svn_string_t) = item; + apr_hash_set(table->long_string_dict, string, len, + (void*)(apr_uintptr_t)table->long_strings->nelts); + + table->long_string_size += len; + } + else + { + builder_string_t *item = apr_pcalloc(builder->pool, sizeof(*item)); + item->string.data = string; + item->string.len = len; + item->previous_match_len = 0; + item->next_match_len = 0; + + if ( table->short_strings->nelts == MAX_STRINGS_PER_TABLE + || table->max_data_size < len) + table = add_table(builder); + + item->position = table->short_strings->nelts; + APR_ARRAY_PUSH(table->short_strings, builder_string_t *) = item; + + if (table->top == NULL) + { + table->max_data_size -= len; + table->top = item; + table->first = item; + table->last = item; + + result = ((apr_size_t)builder->tables->nelts - 1) << TABLE_SHIFT; + } + else + { + result = insert_string(table, &table->top, item) + + (((apr_size_t)builder->tables->nelts - 1) << TABLE_SHIFT); + } + } + + return result; +} + +apr_size_t +svn_fs_x__string_table_builder_estimate_size(string_table_builder_t *builder) +{ + apr_size_t total = 0; + int i; + + for (i = 0; i < builder->tables->nelts; ++i) + { + builder_table_t *table + = APR_ARRAY_IDX(builder->tables, i, builder_table_t*); + + /* total number of chars to store, + * 8 bytes per short string table entry + * 4 bytes per long string table entry + * some static overhead */ + apr_size_t table_size + = MAX_DATA_SIZE - table->max_data_size + + table->long_string_size + + table->short_strings->nelts * 8 + + table->long_strings->nelts * 4 + + 10; + + total += table_size; + } + + /* ZIP compression should give us a 50% reduction. + * add some static overhead */ + return 200 + total / 2; + +} + +static void +create_table(string_sub_table_t *target, + builder_table_t *source, + apr_pool_t *pool, + apr_pool_t *scratch_pool) +{ + int i = 0; + apr_hash_t *tails = svn_hash__make(scratch_pool); + svn_stringbuf_t *data + = svn_stringbuf_create_ensure(MAX_DATA_SIZE - source->max_data_size, + scratch_pool); + + /* pack sub-strings */ + target->short_string_count = (apr_size_t)source->short_strings->nelts; + target->short_strings = apr_palloc(pool, sizeof(*target->short_strings) * + target->short_string_count); + for (i = 0; i < source->short_strings->nelts; ++i) + { + const builder_string_t *string + = APR_ARRAY_IDX(source->short_strings, i, const builder_string_t *); + + string_header_t *entry = &target->short_strings[i]; + const char *tail = string->string.data + string->previous_match_len; + string_header_t *tail_match; + apr_size_t head_length = string->previous_match_len; + + /* Minimize the number of strings to visit when reconstructing the + string head. So, skip all predecessors that don't contribute to + first HEAD_LENGTH chars of our string. 
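+
+         Example (hypothetical match lengths): if a string S3 has a head
+         of 9 chars shared with its list predecessor S2, and S2 in turn
+         shares its first 10 chars with its own predecessor S1, then S1
+         already provides all 9 chars that S3 needs.  S3 therefore
+         references S1 directly and S2 is skipped during reconstruction.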
*/ + if (head_length) + { + const builder_string_t *furthest_prev = string->previous; + while (furthest_prev->previous_match_len >= head_length) + furthest_prev = furthest_prev->previous; + entry->head_string = furthest_prev->position; + } + else + entry->head_string = 0; + + /* head & tail length are known */ + entry->head_length = (apr_uint16_t)head_length; + entry->tail_length + = (apr_uint16_t)(string->string.len - entry->head_length); + + /* try to reuse an existing tail segment */ + tail_match = apr_hash_get(tails, tail, entry->tail_length); + if (tail_match) + { + entry->tail_start = tail_match->tail_start; + } + else + { + entry->tail_start = (apr_uint16_t)data->len; + svn_stringbuf_appendbytes(data, tail, entry->tail_length); + apr_hash_set(tails, tail, entry->tail_length, entry); + } + } + + /* pack long strings */ + target->long_string_count = (apr_size_t)source->long_strings->nelts; + target->long_strings = apr_palloc(pool, sizeof(*target->long_strings) * + target->long_string_count); + for (i = 0; i < source->long_strings->nelts; ++i) + { + svn_string_t *string = &target->long_strings[i]; + *string = APR_ARRAY_IDX(source->long_strings, i, svn_string_t); + string->data = apr_pstrmemdup(pool, string->data, string->len); + } + + data->len += PADDING; /* add a few extra bytes at the end of the buffer + that we want to keep valid for chunky access */ + assert(data->len < data->blocksize); + memset(data->data + data->len - PADDING, 0, PADDING); + + target->data = apr_pmemdup(pool, data->data, data->len); + target->data_size = data->len; +} + +string_table_t * +svn_fs_x__string_table_create(const string_table_builder_t *builder, + apr_pool_t *pool) +{ + apr_size_t i; + + string_table_t *result = apr_pcalloc(pool, sizeof(*result)); + result->size = (apr_size_t)builder->tables->nelts; + result->sub_tables + = apr_pcalloc(pool, result->size * sizeof(*result->sub_tables)); + + for (i = 0; i < result->size; ++i) + create_table(&result->sub_tables[i], + APR_ARRAY_IDX(builder->tables, i, builder_table_t*), + pool, + builder->pool); + + return result; +} + +/* Masks used by table_copy_string. copy_mask[I] is used if the target + content to be preserved starts at byte I within the current chunk. + This is used to work around alignment issues. + */ +#if SVN_UNALIGNED_ACCESS_IS_OK +static const char *copy_masks[8] = { "\xff\xff\xff\xff\xff\xff\xff\xff", + "\x00\xff\xff\xff\xff\xff\xff\xff", + "\x00\x00\xff\xff\xff\xff\xff\xff", + "\x00\x00\x00\xff\xff\xff\xff\xff", + "\x00\x00\x00\x00\xff\xff\xff\xff", + "\x00\x00\x00\x00\x00\xff\xff\xff", + "\x00\x00\x00\x00\x00\x00\xff\xff", + "\x00\x00\x00\x00\x00\x00\x00\xff" }; +#endif + +static void +table_copy_string(char *buffer, + apr_size_t len, + const string_sub_table_t *table, + string_header_t *header) +{ + buffer[len] = '\0'; + do + { + assert(header->head_length <= len); + { +#if SVN_UNALIGNED_ACCESS_IS_OK + /* the sections that we copy tend to be short but we can copy + *all* of it chunky because we made sure that source and target + buffer have some extra padding to prevent segfaults. 
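+
+             Worked example (numbers are made up): to copy an 11 byte
+             section, one full 8-byte chunk is copied first.  For the
+             3 remaining bytes, copy_masks[3] (00 00 00 ff ff ff ff ff)
+             is applied to an 8-byte load: bytes 0..2 are taken from the
+             source (~mask) while bytes 3..7 keep whatever the target
+             already contains (mask), e.g. the terminating NUL and tail
+             data copied in earlier iterations.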
*/ + apr_uint64_t mask; + apr_size_t to_copy = len - header->head_length; + apr_size_t copied = 0; + + const char *source = table->data + header->tail_start; + char *target = buffer + header->head_length; + len = header->head_length; + + /* copy whole chunks */ + while (to_copy >= copied + sizeof(apr_uint64_t)) + { + *(apr_uint64_t *)(target + copied) + = *(const apr_uint64_t *)(source + copied); + copied += sizeof(apr_uint64_t); + } + + /* copy the remainder assuming that we have up to 8 extra bytes + of addressable buffer on the source and target sides. + Now, we simply copy 8 bytes and use a mask to filter & merge + old with new data. */ + mask = *(const apr_uint64_t *)copy_masks[to_copy - copied]; + *(apr_uint64_t *)(target + copied) + = (*(apr_uint64_t *)(target + copied) & mask) + | (*(const apr_uint64_t *)(source + copied) & ~mask); +#else + memcpy(buffer + header->head_length, + table->data + header->tail_start, + len - header->head_length); + len = header->head_length; +#endif + } + + header = &table->short_strings[header->head_string]; + } + while (len); +} + +const char* +svn_fs_x__string_table_get(const string_table_t *table, + apr_size_t idx, + apr_size_t *length, + apr_pool_t *pool) +{ + apr_size_t table_number = idx >> TABLE_SHIFT; + apr_size_t sub_index = idx & STRING_INDEX_MASK; + + if (table_number < table->size) + { + string_sub_table_t *sub_table = &table->sub_tables[table_number]; + if (idx & LONG_STRING_MASK) + { + if (sub_index < sub_table->long_string_count) + { + if (length) + *length = sub_table->long_strings[sub_index].len; + + return apr_pstrmemdup(pool, + sub_table->long_strings[sub_index].data, + sub_table->long_strings[sub_index].len); + } + } + else + { + if (sub_index < sub_table->short_string_count) + { + string_header_t *header = sub_table->short_strings + sub_index; + apr_size_t len = header->head_length + header->tail_length; + char *result = apr_palloc(pool, len + PADDING); + + if (length) + *length = len; + table_copy_string(result, len, sub_table, header); + + return result; + } + } + } + + return apr_pstrmemdup(pool, "", 0); +} + +svn_error_t * +svn_fs_x__write_string_table(svn_stream_t *stream, + const string_table_t *table, + apr_pool_t *scratch_pool) +{ + apr_size_t i, k; + + svn_packed__data_root_t *root = svn_packed__data_create_root(scratch_pool); + + svn_packed__int_stream_t *table_sizes + = svn_packed__create_int_stream(root, FALSE, FALSE); + svn_packed__int_stream_t *small_strings_headers + = svn_packed__create_int_stream(root, FALSE, FALSE); + svn_packed__byte_stream_t *large_strings + = svn_packed__create_bytes_stream(root); + svn_packed__byte_stream_t *small_strings_data + = svn_packed__create_bytes_stream(root); + + svn_packed__create_int_substream(small_strings_headers, TRUE, FALSE); + svn_packed__create_int_substream(small_strings_headers, FALSE, FALSE); + svn_packed__create_int_substream(small_strings_headers, TRUE, FALSE); + svn_packed__create_int_substream(small_strings_headers, FALSE, FALSE); + + /* number of sub-tables */ + + svn_packed__add_uint(table_sizes, table->size); + + /* all short-string char data sizes */ + + for (i = 0; i < table->size; ++i) + svn_packed__add_uint(table_sizes, + table->sub_tables[i].short_string_count); + + for (i = 0; i < table->size; ++i) + svn_packed__add_uint(table_sizes, + table->sub_tables[i].long_string_count); + + /* all strings */ + + for (i = 0; i < table->size; ++i) + { + string_sub_table_t *sub_table = &table->sub_tables[i]; + svn_packed__add_bytes(small_strings_data, + sub_table->data, + 
sub_table->data_size); + + for (k = 0; k < sub_table->short_string_count; ++k) + { + string_header_t *string = &sub_table->short_strings[k]; + + svn_packed__add_uint(small_strings_headers, string->head_string); + svn_packed__add_uint(small_strings_headers, string->head_length); + svn_packed__add_uint(small_strings_headers, string->tail_start); + svn_packed__add_uint(small_strings_headers, string->tail_length); + } + + for (k = 0; k < sub_table->long_string_count; ++k) + svn_packed__add_bytes(large_strings, + sub_table->long_strings[k].data, + sub_table->long_strings[k].len + 1); + } + + /* write to target stream */ + + SVN_ERR(svn_packed__data_write(stream, root, scratch_pool)); + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__read_string_table(string_table_t **table_p, + svn_stream_t *stream, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + apr_size_t i, k; + + string_table_t *table = apr_palloc(result_pool, sizeof(*table)); + + svn_packed__data_root_t *root; + svn_packed__int_stream_t *table_sizes; + svn_packed__byte_stream_t *large_strings; + svn_packed__byte_stream_t *small_strings_data; + svn_packed__int_stream_t *headers; + + SVN_ERR(svn_packed__data_read(&root, stream, result_pool, scratch_pool)); + table_sizes = svn_packed__first_int_stream(root); + headers = svn_packed__next_int_stream(table_sizes); + large_strings = svn_packed__first_byte_stream(root); + small_strings_data = svn_packed__next_byte_stream(large_strings); + + /* create sub-tables */ + + table->size = (apr_size_t)svn_packed__get_uint(table_sizes); + table->sub_tables = apr_pcalloc(result_pool, + table->size * sizeof(*table->sub_tables)); + + /* read short strings */ + + for (i = 0; i < table->size; ++i) + { + string_sub_table_t *sub_table = &table->sub_tables[i]; + + sub_table->short_string_count + = (apr_size_t)svn_packed__get_uint(table_sizes); + if (sub_table->short_string_count) + { + sub_table->short_strings + = apr_pcalloc(result_pool, sub_table->short_string_count + * sizeof(*sub_table->short_strings)); + + /* read short string headers */ + + for (k = 0; k < sub_table->short_string_count; ++k) + { + string_header_t *string = &sub_table->short_strings[k]; + + string->head_string = (apr_uint16_t)svn_packed__get_uint(headers); + string->head_length = (apr_uint16_t)svn_packed__get_uint(headers); + string->tail_start = (apr_uint16_t)svn_packed__get_uint(headers); + string->tail_length = (apr_uint16_t)svn_packed__get_uint(headers); + } + } + + sub_table->data = svn_packed__get_bytes(small_strings_data, + &sub_table->data_size); + } + + /* read long strings */ + + for (i = 0; i < table->size; ++i) + { + /* initialize long string table */ + string_sub_table_t *sub_table = &table->sub_tables[i]; + + sub_table->long_string_count = svn_packed__get_uint(table_sizes); + if (sub_table->long_string_count) + { + sub_table->long_strings + = apr_pcalloc(result_pool, sub_table->long_string_count + * sizeof(*sub_table->long_strings)); + + /* read long strings */ + + for (k = 0; k < sub_table->long_string_count; ++k) + { + svn_string_t *string = &sub_table->long_strings[k]; + string->data = svn_packed__get_bytes(large_strings, + &string->len); + string->len--; + } + } + } + + /* done */ + + *table_p = table; + + return SVN_NO_ERROR; +} + +void +svn_fs_x__serialize_string_table(svn_temp_serializer__context_t *context, + string_table_t **st) +{ + apr_size_t i, k; + string_table_t *string_table = *st; + if (string_table == NULL) + return; + + /* string table struct */ + svn_temp_serializer__push(context, + (const 
void * const *)st, + sizeof(*string_table)); + + /* sub-table array (all structs in a single memory block) */ + svn_temp_serializer__push(context, + (const void * const *)&string_table->sub_tables, + sizeof(*string_table->sub_tables) * + string_table->size); + + /* sub-elements of all sub-tables */ + for (i = 0; i < string_table->size; ++i) + { + string_sub_table_t *sub_table = &string_table->sub_tables[i]; + svn_temp_serializer__add_leaf(context, + (const void * const *)&sub_table->data, + sub_table->data_size); + svn_temp_serializer__add_leaf(context, + (const void * const *)&sub_table->short_strings, + sub_table->short_string_count * sizeof(string_header_t)); + + /* all "long string" instances form a single memory block */ + svn_temp_serializer__push(context, + (const void * const *)&sub_table->long_strings, + sub_table->long_string_count * sizeof(svn_string_t)); + + /* serialize actual long string contents */ + for (k = 0; k < sub_table->long_string_count; ++k) + { + svn_string_t *string = &sub_table->long_strings[k]; + svn_temp_serializer__add_leaf(context, + (const void * const *)&string->data, + string->len + 1); + } + + svn_temp_serializer__pop(context); + } + + /* back to the caller's nesting level */ + svn_temp_serializer__pop(context); + svn_temp_serializer__pop(context); +} + +void +svn_fs_x__deserialize_string_table(void *buffer, + string_table_t **table) +{ + apr_size_t i, k; + string_sub_table_t *sub_tables; + + svn_temp_deserializer__resolve(buffer, (void **)table); + if (*table == NULL) + return; + + svn_temp_deserializer__resolve(*table, (void **)&(*table)->sub_tables); + sub_tables = (*table)->sub_tables; + for (i = 0; i < (*table)->size; ++i) + { + string_sub_table_t *sub_table = sub_tables + i; + + svn_temp_deserializer__resolve(sub_tables, + (void **)&sub_table->data); + svn_temp_deserializer__resolve(sub_tables, + (void **)&sub_table->short_strings); + svn_temp_deserializer__resolve(sub_tables, + (void **)&sub_table->long_strings); + + for (k = 0; k < sub_table->long_string_count; ++k) + svn_temp_deserializer__resolve(sub_table->long_strings, + (void **)&sub_table->long_strings[k].data); + } +} + +const char* +svn_fs_x__string_table_get_func(const string_table_t *table, + apr_size_t idx, + apr_size_t *length, + apr_pool_t *pool) +{ + apr_size_t table_number = idx >> TABLE_SHIFT; + apr_size_t sub_index = idx & STRING_INDEX_MASK; + + if (table_number < table->size) + { + /* resolve TABLE->SUB_TABLES pointer and select sub-table */ + string_sub_table_t *sub_tables + = (string_sub_table_t *)svn_temp_deserializer__ptr(table, + (const void *const *)&table->sub_tables); + string_sub_table_t *sub_table = sub_tables + table_number; + + /* pick the right kind of string */ + if (idx & LONG_STRING_MASK) + { + if (sub_index < sub_table->long_string_count) + { + /* resolve SUB_TABLE->LONG_STRINGS, select the string we want + and resolve the pointer to its char data */ + svn_string_t *long_strings + = (svn_string_t *)svn_temp_deserializer__ptr(sub_table, + (const void *const *)&sub_table->long_strings); + const char *str_data + = (const char*)svn_temp_deserializer__ptr(long_strings, + (const void *const *)&long_strings[sub_index].data); + + /* return a copy of the char data */ + if (length) + *length = long_strings[sub_index].len; + + return apr_pstrmemdup(pool, + str_data, + long_strings[sub_index].len); + } + } + else + { + if (sub_index < sub_table->short_string_count) + { + string_header_t *header; + apr_size_t len; + char *result; + + /* construct a copy of our sub-table 
struct with SHORT_STRINGS + and DATA pointers resolved. Leave all other pointers as + they are. This allows us to use the same code for string + reconstruction here as in the non-serialized case. */ + string_sub_table_t table_copy = *sub_table; + table_copy.data + = (const char *)svn_temp_deserializer__ptr(sub_tables, + (const void *const *)&sub_table->data); + table_copy.short_strings + = (string_header_t *)svn_temp_deserializer__ptr(sub_tables, + (const void *const *)&sub_table->short_strings); + + /* reconstruct the char data and return it */ + header = table_copy.short_strings + sub_index; + len = header->head_length + header->tail_length; + result = apr_palloc(pool, len + PADDING); + if (length) + *length = len; + + table_copy_string(result, len, &table_copy, header); + + return result; + } + } + } + + return ""; +} diff --git a/subversion/libsvn_fs_x/string_table.h b/subversion/libsvn_fs_x/string_table.h new file mode 100644 index 0000000..f7ab476 --- /dev/null +++ b/subversion/libsvn_fs_x/string_table.h @@ -0,0 +1,133 @@ +/* string_table.h : interface to string tables, private to libsvn_fs_x + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#ifndef SVN_LIBSVN_FS_X_STRING_TABLE_H +#define SVN_LIBSVN_FS_X_STRING_TABLE_H + +#include "svn_io.h" +#include "private/svn_temp_serializer.h" + +#ifdef __cplusplus +extern "C" { +#endif /* __cplusplus */ + +/* A string table is a very space efficient, read-only representation for + * a set of strings with high degreed of prefix and postfix overhead. + * + * Creating a string table is a two-stage process: Use a builder class, + * stuff all the strings in there and let it then do the heavy lifting of + * classification and compression to create the actual string table object. + * + * We will use this for the various path values in FSX change lists and + * node revision items. + */ + +/* the string table builder */ +typedef struct string_table_builder_t string_table_builder_t; + +/* the string table */ +typedef struct string_table_t string_table_t; + +/* Returns a new string table builder object, allocated in RESULT_POOL. + */ +string_table_builder_t * +svn_fs_x__string_table_builder_create(apr_pool_t *result_pool); + +/* Add an arbitrary NUL-terminated C-string STRING of the given length LEN + * to BUILDER. Return the index of that string in the future string table. + * If LEN is 0, determine the length of the C-string internally. + */ +apr_size_t +svn_fs_x__string_table_builder_add(string_table_builder_t *builder, + const char *string, + apr_size_t len); + +/* Return an estimate for the on-disk size of the resulting string table. 
+ * The estimate may err in both directions but tends to overestimate the + * space requirements for larger tables. + */ +apr_size_t +svn_fs_x__string_table_builder_estimate_size(string_table_builder_t *builder); + +/* From the given BUILDER object, create a string table object allocated + * in POOL that contains all strings previously added to BUILDER. + */ +string_table_t * +svn_fs_x__string_table_create(const string_table_builder_t *builder, + apr_pool_t *pool); + +/* Extract string number INDEX from TABLE and return a copy of it allocated + * in POOL. If LENGTH is not NULL, set *LENGTH to strlen() of the result + * string. Returns an empty string for invalid indexes. + */ +const char* +svn_fs_x__string_table_get(const string_table_t *table, + apr_size_t index, + apr_size_t *length, + apr_pool_t *pool); + +/* Write a serialized representation of the string table TABLE to STREAM. + * Use SCRATCH_POOL for temporary allocations. + */ +svn_error_t * +svn_fs_x__write_string_table(svn_stream_t *stream, + const string_table_t *table, + apr_pool_t *scratch_pool); + +/* Read the serialized string table representation from STREAM and return + * the resulting runtime representation in *TABLE_P. Allocate it in + * RESULT_POOL and use SCRATCH_POOL for temporary allocations. + */ +svn_error_t * +svn_fs_x__read_string_table(string_table_t **table_p, + svn_stream_t *stream, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + +/* Serialize string table *ST within the serialization CONTEXT. + */ +void +svn_fs_x__serialize_string_table(svn_temp_serializer__context_t *context, + string_table_t **st); + +/* Deserialize string table *TABLE within the BUFFER. + */ +void +svn_fs_x__deserialize_string_table(void *buffer, + string_table_t **table); + +/* Extract string number INDEX from the cache serialized representation at + * TABLE and return a copy of it allocated in POOL. If LENGTH is not NULL, + * set *LENGTH to strlen() of the result string. Returns an empty string + * for invalid indexes. + */ +const char* +svn_fs_x__string_table_get_func(const string_table_t *table, + apr_size_t idx, + apr_size_t *length, + apr_pool_t *pool); + +#ifdef __cplusplus +} +#endif /* __cplusplus */ + +#endif /* SVN_LIBSVN_FS_X_STRING_TABLE_H */ diff --git a/subversion/libsvn_fs_x/structure b/subversion/libsvn_fs_x/structure new file mode 100644 index 0000000..8d10c3d --- /dev/null +++ b/subversion/libsvn_fs_x/structure @@ -0,0 +1,336 @@ +This file will describe the design, layouts, and file formats of a +libsvn_fs_x repository. + +Since FSX is still in a very early phase of its development, all sections +either subject to major change or simply "TBD". + +Design +------ + +TBD. + +Similar to FSFS format 7 but using a radically different on-disk format. + +In FSFS, each committed revision is represented as an immutable file +containing the new node-revisions, contents, and changed-path +information for the revision, plus a second, changeable file +containing the revision properties. + +To reduce the size of the on-disk representation, revision data gets +packed, i.e. multiple revision files get combined into a single pack +file of smaller total size. The same strategy is applied to revprops. + +In-progress transactions are represented with a prototype rev file +containing only the new text representations of files (appended to as +changed file contents come in), along with a separate file for each +node-revision, directory representation, or property representation +which has been changed or added in the transaction. 
During the final +stage of the commit, these separate files are marshalled onto the end +of the prototype rev file to form the immutable revision file. + +Layout of the FS directory +-------------------------- + +The layout of the FS directory (the "db" subdirectory of the +repository) is: + + revs/ Subdirectory containing revs + <shard>/ Shard directory, if sharding is in use (see below) + <revnum> File containing rev <revnum> + <shard>.pack/ Pack directory, if the repo has been packed (see below) + pack Pack file, if the repository has been packed (see below) + manifest Pack manifest file, if a pack file exists (see below) + revprops/ Subdirectory containing rev-props + <shard>/ Shard directory, if sharding is in use (see below) + <revnum> File containing rev-props for <revnum> + <shard>.pack/ Pack directory, if the repo has been packed (see below) + <rev>.<count> Pack file, if the repository has been packed (see below) + manifest Pack manifest file, if a pack file exists (see below) + revprops.db SQLite database of the packed revision properties + transactions/ Subdirectory containing transactions + <txnid>.txn/ Directory containing transaction <txnid> + txn-protorevs/ Subdirectory containing transaction proto-revision files + <txnid>.rev Proto-revision file for transaction <txnid> + <txnid>.rev-lock Write lock for proto-rev file + txn-current File containing the next transaction key + locks/ Subdirectory containing locks + <partial-digest>/ Subdirectory named for first 3 letters of an MD5 digest + <digest> File containing locks/children for path with <digest> + current File specifying current revision and next node/copy id + fs-type File identifying this filesystem as an FSFS filesystem + write-lock Empty file, locked to serialise writers + pack-lock Empty file, locked to serialise 'svnadmin pack' (f. 7+) + txn-current-lock Empty file, locked to serialise 'txn-current' + uuid File containing the UUID of the repository + format File containing the format number of this filesystem + fsx.conf Configuration file + min-unpacked-rev File containing the oldest revision not in a pack file + min-unpacked-revprop File containing the oldest revision of unpacked revprop + rep-cache.db SQLite database mapping rep checksums to locations + +Files in the revprops directory are in the hash dump format used by +svn_hash_write. + +The format of the "current" file is a single line of the form +"<youngest-revision>\n" giving the youngest revision for the +repository. + +The "write-lock" file is an empty file which is locked before the +final stage of a commit and unlocked after the new "current" file has +been moved into place to indicate that a new revision is present. It +is also locked during a revprop propchange while the revprop file is +read in, mutated, and written out again. Furthermore, it will be used +to serialize the repository structure changes during 'svnadmin pack' +(see also next section). Note that readers are never blocked by any +operation - writers must ensure that the filesystem is always in a +consistent state. + +The "pack-lock" file is an empty file which is locked before an 'svnadmin +pack' operation commences. Thus, only one process may attempt to modify +the repository structure at a time while other processes may still read +and write (commit) to the repository during most of the pack procedure. +It is only available with format 7 and newer repositories. Older formats +use the global write-lock instead which disables commits completely +for the duration of the pack process. 
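+
+As a rough illustration of that mechanism (a minimal sketch, not the
+actual FSX locking code), the pattern for all three lock files is to
+open the empty file and hold an exclusive advisory lock on it for the
+duration of the protected operation:
+
+  #include <apr_file_io.h>
+
+  static apr_status_t
+  with_exclusive_lock(const char *lock_path,  /* e.g. "db/write-lock" */
+                      apr_status_t (*body)(void *baton, apr_pool_t *pool),
+                      void *baton,
+                      apr_pool_t *pool)
+  {
+    apr_file_t *lock_file;
+    apr_status_t status;
+
+    /* Open (or create) the empty lock file. */
+    status = apr_file_open(&lock_file, lock_path,
+                           APR_FOPEN_WRITE | APR_FOPEN_CREATE,
+                           APR_OS_DEFAULT, pool);
+    if (status)
+      return status;
+
+    /* Block until we own the file exclusively, run the protected
+       operation, then release the lock again. */
+    status = apr_file_lock(lock_file, APR_FLOCK_EXCLUSIVE);
+    if (!status)
+      {
+        status = body(baton, pool);
+        apr_file_unlock(lock_file);
+      }
+
+    apr_file_close(lock_file);
+    return status;
+  }
+
+Readers never take any of these locks, which is why writers alone are
+responsible for keeping the on-disk state consistent at every
+intermediate step.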
+
+The "txn-current" file is a file with a single line of text that
+contains only a base-36 number. The current value will be used in the
+next transaction name, along with the revision number the transaction
+is based on. This sequence number ensures that transaction names are
+not reused, even if the transaction is aborted and a new transaction
+based on the same revision is begun. The only operation that FSFS
+performs on this file is "get and increment"; the "txn-current-lock"
+file is locked during this operation.
+
+"fsx.conf" is a configuration file in the standard Subversion/Python
+config format. It is automatically generated when you create a new
+repository; read the generated file for details on what it controls.
+
+When representation sharing is enabled, the filesystem tracks
+representation checksum and location mappings using a SQLite database in
+"rep-cache.db". The database has a single table, which stores the sha1
+hash text as the primary key, mapped to the representation revision, offset,
+size and expanded size. This file is only consulted during writes and never
+during reads. Consequently, it is not required, and may be removed at an
+arbitrary time, with the subsequent loss of rep-sharing capabilities for
+revisions written thereafter.
+
+Filesystem formats
+------------------
+
+TBD.
+
+The "format" file defines what features are permitted within the
+filesystem, and indicates changes that are not backward-compatible.
+It serves the same purpose as the repository file of the same name.
+
+So far, there is only format 1.
+
+
+Node-revision IDs
+-----------------
+
+A node-rev ID consists of the following three fields:
+
+    node_revision_id ::= node_id '.' copy_id '.' txn_id
+
+At this level, the form of the ID is the same as for BDB - see the
+section called "ID's" in <../libsvn_fs_base/notes/structure>.
+
+In order to support efficient lookup of node-revisions by their IDs
+and to simplify the allocation of fresh node-IDs during a transaction,
+we treat the fields of a node-rev ID in new and interesting ways.
+
+Within a new transaction:
+
+  New node-revision IDs assigned within a transaction have a txn-id
+  field of the form "t<txnid>".
+
+  When a new node-id or copy-id is assigned in a transaction, the ID
+  used is a "_" followed by a base36 number unique to the transaction.
+
+Within a revision:
+
+  Within a revision file, node-revs have a txn-id field of the form
+  "r<rev>/<offset>", to support easy lookup. The <offset> is the (ASCII
+  decimal) number of bytes from the start of the revision file to the
+  start of the node-rev.
+
+  During the final phase of a commit, node-revision IDs are rewritten
+  to have repository-wide unique node-ID and copy-ID fields, and to have
+  "r<rev>/<offset>" txn-id fields.
+
+  This uniqueness is achieved by changing the temporary
+  id of "_<base36>" to "<base36>-<rev>". Note that this means that the
+  originating revision of a line of history or a copy can be determined
+  by looking at the node ID.
+
+The temporary assignment of node-ID and copy-ID fields has
+implications for svn_fs_compare_ids and svn_fs_check_related. The ID
+_1.0.t1 is not related to the ID _1.0.t2 even though they have the
+same node-ID, because temporary node-IDs are restricted in scope to
+the transactions they belong to.
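+
+As a concrete illustration of the base-36 numbers used above (for the
+"txn-current" counter as well as for temporary node-IDs and copy-IDs),
+here is a small, self-contained sketch of the encoding. It is not the
+FSX helper itself, and the way the counter and the base revision are
+combined into a transaction name in the example output is only an
+assumption:
+
+  #include <stdio.h>
+
+  /* Encode VALUE using the 0-9a-z alphabet, writing into BUF (which
+     must be large enough) and returning a pointer to the first digit. */
+  static char *
+  encode_base36(char *buf, size_t buf_len, unsigned long value)
+  {
+    static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
+    char *p = buf + buf_len - 1;
+
+    *p = '\0';
+    do
+      {
+        *--p = digits[value % 36];
+        value /= 36;
+      }
+    while (value > 0 && p > buf);
+
+    return p;
+  }
+
+  int main(void)
+  {
+    char buf[16];
+
+    /* If "txn-current" currently reads "2c" (decimal 84), the next
+       transaction based on r17 might be named "17-2c" (assumed
+       format), giving its new node-revisions a txn-id field of
+       "t17-2c". */
+    printf("t17-%s\n", encode_base36(buf, sizeof(buf), 84));
+    return 0;
+  }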
+ +Copy-IDs and copy roots +----------------------- + +Copy-IDs are assigned in the same manner as they are in the BDB +implementation: + + * A node-rev resulting from a creation operation (with no copy + history) receives the copy-ID of its parent directory. + + * A node-rev resulting from a copy operation receives a fresh + copy-ID, as one would expect. + + * A node-rev resulting from a modification operation receives a + copy-ID depending on whether its predecessor derives from a + copy operation or whether it derives from a creation operation + with no intervening copies: + + - If the predecessor does not derive from a copy, the new + node-rev receives the copy-ID of its parent directory. If the + node-rev is being modified through its created-path, this will + be the same copy-ID as the predecessor node-rev has; however, + if the node-rev is being modified through a copied ancestor + directory (i.e. we are performing a "lazy copy"), this will be + a different copy-ID. + + - If the predecessor derives from a copy and the node-rev is + being modified through its created-path, the new node-rev + receives the copy-ID of the predecessor. + + - If the predecessor derives from a copy and the node-rev is not + being modified through its created path, the new node-rev + receives a fresh copy-ID. This is called a "soft copy" + operation, as distinct from a "true copy" operation which was + actually requested through the svn_fs interface. Soft copies + exist to ensure that the same <node-ID,copy-ID> pair is not + used twice within a transaction. + +Unlike the BDB implementation, we do not have a "copies" table. +Instead, each node-revision record contains a "copyroot" field +identifying the node-rev resulting from the true copy operation most +proximal to the node-rev. If the node-rev does not itself derive from +a copy operation, then the copyroot field identifies the copy of an +ancestor directory; if no ancestor directories derive from a copy +operation, then the copyroot field identifies the root directory of +rev 0. + +Revision file format +-------------------- + +TBD + +A revision file contains a concatenation of various kinds of data: + + * Text and property representations + * Node-revisions + * The changed-path data + +That data is aggregated in compressed containers with a binary on-disk +representation. + +Transaction layout +------------------ + +A transaction directory has the following layout: + + props Transaction props + props-final Final transaction props (optional) + next-ids Next temporary node-ID and copy-ID + changes Changed-path information so far + node.<nid>.<cid> New node-rev data for node + node.<nid>.<cid>.props Props for new node-rev, if changed + node.<nid>.<cid>.children Directory contents for node-rev + <sha1> Text representation of that sha1 + + txn-protorevs/rev Prototype rev file with new text reps + txn-protorevs/rev-lock Lockfile for writing to the above + +The prototype rev file is used to store the text representations as +they are received from the client. To ensure that only one client is +writing to the file at a given time, the "rev-lock" file is locked for +the duration of each write. + +The three kinds of props files are all in hash dump format. The "props" +file will always be present. The "node.<nid>.<cid>.props" file will +only be present if the node-rev properties have been changed. The +"props-final" only exists while converting the transaction into a revision. 
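+
+For reference, the hash dump format used by these props files (and by
+the revprop files described earlier) is the plain key/value
+serialization produced by svn_hash_write: a "K <key length>" line
+followed by the key, a "V <value length>" line followed by the value,
+repeated for each entry and closed with "END". A transaction "props"
+file could therefore look like this (the property values are invented
+for illustration):
+
+  K 10
+  svn:author
+  V 5
+  alice
+  K 7
+  svn:log
+  V 11
+  Fix a typo.
+  END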
+ +The <sha1> files' content is that of text rep references: +"<rev> <offset> <length> <size> <digest>" +They will be written for text reps in the current transaction and be +used to eliminate duplicate reps within that transaction. + +The "next-ids" file contains a single line "<next-temp-node-id> +<next-temp-copy-id>\n" giving the next temporary node-ID and copy-ID +assignments (without the leading underscores). The next node-ID is +also used as a uniquifier for representations which may share the same +underlying rep. + +The "children" file for a node-revision begins with a copy of the hash +dump representation of the directory entries from the old node-rev (or +a dump of the empty hash for new directories), and then an incremental +hash dump entry for each change made to the directory. + +The "changes" file contains changed-path entries in the same form as +the changed-path entries in a rev file, except that <id> and <action> +may both be "reset" (in which case <text-mod> and <prop-mod> are both +always "false") to indicate that all changes to a path should be +considered undone. Reset entries are only used during the final merge +phase of a transaction. Actions in the "changes" file always contain +a node kind. + +The node-rev files have the same format as node-revs in a revision +file, except that the "text" and "props" fields are augmented as +follows: + + * The "props" field may have the value "-1" if properties have + been changed and are contained in a "props" file within the + node-rev subdirectory. + + * For directory node-revs, the "text" field may have the value + "-1" if entries have been changed and are contained in a + "contents" file in the node-rev subdirectory. + + * For the directory node-rev representing the root of the + transaction, the "is-fresh-txn-root" field indicates that it has + not been made mutable yet (see Issue #2608). + + * For file node-revs, the "text" field may have the value "-1 + <offset> <length> <size> <digest>" if the text representation is + within the prototype rev file. + + * The "copyroot" field may have the value "-1 <created-path>" if the + copy root of the node-rev is part of the transaction in process. + + +Locks layout +------------ + +Locks in FSX are stored in serialized hash format in files whose +names are MD5 digests of the FS path which the lock is associated +with. For the purposes of keeping directory inode usage down, these +digest files live in subdirectories of the main lock directory whose +names are the first 3 characters of the digest filename. + +Also stored in the digest file for a given FS path are pointers to +other digest files which contain information associated with other FS +paths that are beneath our path (an immediate child thereof, or a +grandchild, or a great-grandchild, ...). + +To answer the question, "Does path FOO have a lock associated with +it?", one need only generate the MD5 digest of FOO's +absolute-in-the-FS path (say, 3b1b011fed614a263986b5c4869604e8), look +for a file located like so: + + /path/to/repos/locks/3b1/3b1b011fed614a263986b5c4869604e8 + +And then see if that file contains lock information. + +To inquire about locks on children of the path FOO, you would +reference the same path as above, but look for a list of children in +that file (instead of lock information). Children are listed as MD5 +digests, too, so you would simply iterate over those digests and +consult the files they reference for lock information. 
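+
+To make that layout concrete, the sketch below maps an FS path to its
+lock digest file, "locks/<first three hex characters>/<32-character
+MD5 hex digest>". It uses APR's MD5 routine directly and only
+illustrates the naming scheme (it is not the FSX lock code); the
+repository and FS paths in the example are hypothetical:
+
+  #include <stdio.h>
+  #include <string.h>
+  #include <apr_general.h>
+  #include <apr_md5.h>
+
+  /* Build "<repos_path>/locks/<abc>/<abc...>" for the FS path FS_PATH. */
+  static void
+  lock_digest_path(char *buf, size_t buf_len,
+                   const char *repos_path, const char *fs_path)
+  {
+    unsigned char digest[APR_MD5_DIGESTSIZE];
+    char hex[2 * APR_MD5_DIGESTSIZE + 1];
+    int i;
+
+    apr_md5(digest, fs_path, strlen(fs_path));
+    for (i = 0; i < APR_MD5_DIGESTSIZE; i++)
+      sprintf(hex + 2 * i, "%02x", digest[i]);
+
+    /* The first three characters of the digest name the subdirectory. */
+    snprintf(buf, buf_len, "%s/locks/%.3s/%s", repos_path, hex, hex);
+  }
+
+  int main(void)
+  {
+    char path[512];
+
+    apr_initialize();
+    lock_digest_path(path, sizeof(path), "/path/to/repos", "/trunk/foo.c");
+    printf("%s\n", path);
+    apr_terminate();
+    return 0;
+  }
+
+Answering "does FOO have a lock?" then reduces to checking whether the
+file at that location exists and contains lock information, exactly as
+described above.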
diff --git a/subversion/libsvn_fs_x/temp_serializer.c b/subversion/libsvn_fs_x/temp_serializer.c new file mode 100644 index 0000000..65a2c3f --- /dev/null +++ b/subversion/libsvn_fs_x/temp_serializer.c @@ -0,0 +1,1337 @@ +/* temp_serializer.c: serialization functions for caching of FSX structures + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#include <apr_pools.h> + +#include "svn_pools.h" +#include "svn_hash.h" +#include "svn_sorts.h" +#include "svn_fs.h" + +#include "private/svn_fs_util.h" +#include "private/svn_sorts_private.h" +#include "private/svn_temp_serializer.h" +#include "private/svn_subr_private.h" + +#include "id.h" +#include "temp_serializer.h" +#include "low_level.h" +#include "cached_data.h" + +/* Utility to encode a signed NUMBER into a variable-length sequence of + * 8-bit chars in KEY_BUFFER and return the last writen position. + * + * Numbers will be stored in 7 bits / byte and using byte values above + * 32 (' ') to make them combinable with other string by simply separating + * individual parts with spaces. + */ +static char* +encode_number(apr_int64_t number, char *key_buffer) +{ + /* encode the sign in the first byte */ + if (number < 0) + { + number = -number; + *key_buffer = (char)((number & 63) + ' ' + 65); + } + else + *key_buffer = (char)((number & 63) + ' ' + 1); + number /= 64; + + /* write 7 bits / byte until no significant bits are left */ + while (number) + { + *++key_buffer = (char)((number & 127) + ' ' + 1); + number /= 128; + } + + /* return the last written position */ + return key_buffer; +} + +const char* +svn_fs_x__combine_number_and_string(apr_int64_t number, + const char *string, + apr_pool_t *pool) +{ + apr_size_t len = strlen(string); + + /* number part requires max. 10x7 bits + 1 space. + * Add another 1 for the terminal 0 */ + char *key_buffer = apr_palloc(pool, len + 12); + const char *key = key_buffer; + + /* Prepend the number to the string and separate them by space. No other + * number can result in the same prefix, no other string in the same + * postfix nor can the boundary between them be ambiguous. */ + key_buffer = encode_number(number, key_buffer); + *++key_buffer = ' '; + memcpy(++key_buffer, string, len+1); + + /* return the start of the key */ + return key; +} + +/* Utility function to serialize string S in the given serialization CONTEXT. + */ +static void +serialize_svn_string(svn_temp_serializer__context_t *context, + const svn_string_t * const *s) +{ + const svn_string_t *string = *s; + + /* Nothing to do for NULL string references. 
*/ + if (string == NULL) + return; + + svn_temp_serializer__push(context, + (const void * const *)s, + sizeof(*string)); + + /* the "string" content may actually be arbitrary binary data. + * Thus, we cannot use svn_temp_serializer__add_string. */ + svn_temp_serializer__add_leaf(context, + (const void * const *)&string->data, + string->len + 1); + + /* back to the caller's nesting level */ + svn_temp_serializer__pop(context); +} + +/* Utility function to deserialize the STRING inside the BUFFER. + */ +static void +deserialize_svn_string(void *buffer, svn_string_t **string) +{ + svn_temp_deserializer__resolve(buffer, (void **)string); + if (*string == NULL) + return; + + svn_temp_deserializer__resolve(*string, (void **)&(*string)->data); +} + +/* Utility function to serialize the REPRESENTATION within the given + * serialization CONTEXT. + */ +static void +serialize_representation(svn_temp_serializer__context_t *context, + svn_fs_x__representation_t * const *representation) +{ + const svn_fs_x__representation_t * rep = *representation; + if (rep == NULL) + return; + + /* serialize the representation struct itself */ + svn_temp_serializer__add_leaf(context, + (const void * const *)representation, + sizeof(*rep)); +} + +void +svn_fs_x__serialize_apr_array(svn_temp_serializer__context_t *context, + apr_array_header_t **a) +{ + const apr_array_header_t *array = *a; + + /* Nothing to do for NULL string references. */ + if (array == NULL) + return; + + /* array header struct */ + svn_temp_serializer__push(context, + (const void * const *)a, + sizeof(*array)); + + /* contents */ + svn_temp_serializer__add_leaf(context, + (const void * const *)&array->elts, + (apr_size_t)array->nelts * array->elt_size); + + /* back to the caller's nesting level */ + svn_temp_serializer__pop(context); +} + +void +svn_fs_x__deserialize_apr_array(void *buffer, + apr_array_header_t **array, + apr_pool_t *pool) +{ + svn_temp_deserializer__resolve(buffer, (void **)array); + if (*array == NULL) + return; + + svn_temp_deserializer__resolve(*array, (void **)&(*array)->elts); + (*array)->pool = pool; +} + +/* auxilliary structure representing the content of a directory array */ +typedef struct dir_data_t +{ + /* number of entries in the directory + * (it's int because the directory is an APR array) */ + int count; + + /* number of unused dir entry buckets in the index */ + apr_size_t over_provision; + + /* internal modifying operations counter + * (used to repack data once in a while) */ + apr_size_t operations; + + /* size of the serialization buffer actually used. + * (we will allocate more than we actually need such that we may + * append more data in situ later) */ + apr_size_t len; + + /* reference to the entries */ + svn_fs_x__dirent_t **entries; + + /* size of the serialized entries and don't be too wasteful + * (needed since the entries are no longer in sequence) */ + apr_uint32_t *lengths; +} dir_data_t; + +/* Utility function to serialize the *ENTRY_P into a the given + * serialization CONTEXT. Return the serialized size of the + * dir entry in *LENGTH. 
+ */ +static void +serialize_dir_entry(svn_temp_serializer__context_t *context, + svn_fs_x__dirent_t **entry_p, + apr_uint32_t *length) +{ + svn_fs_x__dirent_t *entry = *entry_p; + apr_size_t initial_length = svn_temp_serializer__get_length(context); + + svn_temp_serializer__push(context, + (const void * const *)entry_p, + sizeof(svn_fs_x__dirent_t)); + + svn_temp_serializer__add_string(context, &entry->name); + + *length = (apr_uint32_t)( svn_temp_serializer__get_length(context) + - APR_ALIGN_DEFAULT(initial_length)); + + svn_temp_serializer__pop(context); +} + +/* Utility function to serialize the ENTRIES into a new serialization + * context to be returned. + * + * Temporary allocation will be made form SCRATCH_POOL. + */ +static svn_temp_serializer__context_t * +serialize_dir(apr_array_header_t *entries, + apr_pool_t *scratch_pool) +{ + dir_data_t dir_data; + int i = 0; + svn_temp_serializer__context_t *context; + + /* calculate sizes */ + int count = entries->nelts; + apr_size_t over_provision = 2 + count / 4; + apr_size_t entries_len = (count + over_provision) + * sizeof(svn_fs_x__dirent_t*); + apr_size_t lengths_len = (count + over_provision) * sizeof(apr_uint32_t); + + /* copy the hash entries to an auxiliary struct of known layout */ + dir_data.count = count; + dir_data.over_provision = over_provision; + dir_data.operations = 0; + dir_data.entries = apr_palloc(scratch_pool, entries_len); + dir_data.lengths = apr_palloc(scratch_pool, lengths_len); + + for (i = 0; i < count; ++i) + dir_data.entries[i] = APR_ARRAY_IDX(entries, i, svn_fs_x__dirent_t *); + + /* Serialize that aux. structure into a new one. Also, provide a good + * estimate for the size of the buffer that we will need. */ + context = svn_temp_serializer__init(&dir_data, + sizeof(dir_data), + 50 + count * 200 + entries_len, + scratch_pool); + + /* serialize entries references */ + svn_temp_serializer__push(context, + (const void * const *)&dir_data.entries, + entries_len); + + /* serialize the individual entries and their sub-structures */ + for (i = 0; i < count; ++i) + serialize_dir_entry(context, + &dir_data.entries[i], + &dir_data.lengths[i]); + + svn_temp_serializer__pop(context); + + /* serialize entries references */ + svn_temp_serializer__push(context, + (const void * const *)&dir_data.lengths, + lengths_len); + + return context; +} + +/* Utility function to reconstruct a dir entries array from serialized data + * in BUFFER and DIR_DATA. Allocation will be made form POOL. 
+ */ +static apr_array_header_t * +deserialize_dir(void *buffer, dir_data_t *dir_data, apr_pool_t *pool) +{ + apr_array_header_t *result + = apr_array_make(pool, dir_data->count, sizeof(svn_fs_x__dirent_t *)); + apr_size_t i; + apr_size_t count; + svn_fs_x__dirent_t *entry; + svn_fs_x__dirent_t **entries; + + /* resolve the reference to the entries array */ + svn_temp_deserializer__resolve(buffer, (void **)&dir_data->entries); + entries = dir_data->entries; + + /* fixup the references within each entry and add it to the hash */ + for (i = 0, count = dir_data->count; i < count; ++i) + { + svn_temp_deserializer__resolve(entries, (void **)&entries[i]); + entry = dir_data->entries[i]; + + /* pointer fixup */ + svn_temp_deserializer__resolve(entry, (void **)&entry->name); + + /* add the entry to the hash */ + APR_ARRAY_PUSH(result, svn_fs_x__dirent_t *) = entry; + } + + /* return the now complete hash */ + return result; +} + +void +svn_fs_x__noderev_serialize(svn_temp_serializer__context_t *context, + svn_fs_x__noderev_t * const *noderev_p) +{ + const svn_fs_x__noderev_t *noderev = *noderev_p; + if (noderev == NULL) + return; + + /* serialize the representation struct itself */ + svn_temp_serializer__push(context, + (const void * const *)noderev_p, + sizeof(*noderev)); + + /* serialize sub-structures */ + serialize_representation(context, &noderev->prop_rep); + serialize_representation(context, &noderev->data_rep); + + svn_temp_serializer__add_string(context, &noderev->copyfrom_path); + svn_temp_serializer__add_string(context, &noderev->copyroot_path); + svn_temp_serializer__add_string(context, &noderev->created_path); + + /* return to the caller's nesting level */ + svn_temp_serializer__pop(context); +} + + +void +svn_fs_x__noderev_deserialize(void *buffer, + svn_fs_x__noderev_t **noderev_p, + apr_pool_t *pool) +{ + svn_fs_x__noderev_t *noderev; + + /* fixup the reference to the representation itself, + * if this is part of a parent structure. */ + if (buffer != *noderev_p) + svn_temp_deserializer__resolve(buffer, (void **)noderev_p); + + noderev = *noderev_p; + if (noderev == NULL) + return; + + /* fixup of sub-structures */ + svn_temp_deserializer__resolve(noderev, (void **)&noderev->prop_rep); + svn_temp_deserializer__resolve(noderev, (void **)&noderev->data_rep); + + svn_temp_deserializer__resolve(noderev, (void **)&noderev->copyfrom_path); + svn_temp_deserializer__resolve(noderev, (void **)&noderev->copyroot_path); + svn_temp_deserializer__resolve(noderev, (void **)&noderev->created_path); +} + + +/* Utility function to serialize COUNT svn_txdelta_op_t objects + * at OPS in the given serialization CONTEXT. + */ +static void +serialize_txdelta_ops(svn_temp_serializer__context_t *context, + const svn_txdelta_op_t * const * ops, + apr_size_t count) +{ + if (*ops == NULL) + return; + + /* the ops form a contiguous chunk of memory with no further references */ + svn_temp_serializer__add_leaf(context, + (const void * const *)ops, + count * sizeof(svn_txdelta_op_t)); +} + +/* Utility function to serialize W in the given serialization CONTEXT. 
+ */ +static void +serialize_txdeltawindow(svn_temp_serializer__context_t *context, + svn_txdelta_window_t * const * w) +{ + svn_txdelta_window_t *window = *w; + + /* serialize the window struct itself */ + svn_temp_serializer__push(context, + (const void * const *)w, + sizeof(svn_txdelta_window_t)); + + /* serialize its sub-structures */ + serialize_txdelta_ops(context, &window->ops, window->num_ops); + serialize_svn_string(context, &window->new_data); + + svn_temp_serializer__pop(context); +} + +svn_error_t * +svn_fs_x__serialize_txdelta_window(void **buffer, + apr_size_t *buffer_size, + void *item, + apr_pool_t *pool) +{ + svn_fs_x__txdelta_cached_window_t *window_info = item; + svn_stringbuf_t *serialized; + + /* initialize the serialization process and allocate a buffer large + * enough to do without the need of re-allocations in most cases. */ + apr_size_t text_len = window_info->window->new_data + ? window_info->window->new_data->len + : 0; + svn_temp_serializer__context_t *context = + svn_temp_serializer__init(window_info, + sizeof(*window_info), + 500 + text_len, + pool); + + /* serialize the sub-structure(s) */ + serialize_txdeltawindow(context, &window_info->window); + + /* return the serialized result */ + serialized = svn_temp_serializer__get(context); + + *buffer = serialized->data; + *buffer_size = serialized->len; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__deserialize_txdelta_window(void **item, + void *buffer, + apr_size_t buffer_size, + apr_pool_t *pool) +{ + svn_txdelta_window_t *window; + + /* Copy the _full_ buffer as it also contains the sub-structures. */ + svn_fs_x__txdelta_cached_window_t *window_info = + (svn_fs_x__txdelta_cached_window_t *)buffer; + + /* pointer reference fixup */ + svn_temp_deserializer__resolve(window_info, + (void **)&window_info->window); + window = window_info->window; + + svn_temp_deserializer__resolve(window, (void **)&window->ops); + + deserialize_svn_string(window, (svn_string_t**)&window->new_data); + + /* done */ + *item = window_info; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__serialize_manifest(void **data, + apr_size_t *data_len, + void *in, + apr_pool_t *pool) +{ + apr_array_header_t *manifest = in; + + *data_len = sizeof(apr_off_t) *manifest->nelts; + *data = apr_palloc(pool, *data_len); + memcpy(*data, manifest->elts, *data_len); + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__deserialize_manifest(void **out, + void *data, + apr_size_t data_len, + apr_pool_t *pool) +{ + apr_array_header_t *manifest = apr_array_make(pool, 1, sizeof(apr_off_t)); + + manifest->nelts = (int) (data_len / sizeof(apr_off_t)); + manifest->nalloc = (int) (data_len / sizeof(apr_off_t)); + manifest->elts = (char*)data; + + *out = manifest; + + return SVN_NO_ERROR; +} + +/* Auxiliary structure representing the content of a properties hash. + This structure is much easier to (de-)serialize than an apr_hash. + */ +typedef struct properties_data_t +{ + /* number of entries in the hash */ + apr_size_t count; + + /* reference to the keys */ + const char **keys; + + /* reference to the values */ + const svn_string_t **values; +} properties_data_t; + +/* Serialize COUNT C-style strings from *STRINGS into CONTEXT. 
*/ +static void +serialize_cstring_array(svn_temp_serializer__context_t *context, + const char ***strings, + apr_size_t count) +{ + apr_size_t i; + const char **entries = *strings; + + /* serialize COUNT entries pointers (the array) */ + svn_temp_serializer__push(context, + (const void * const *)strings, + count * sizeof(const char*)); + + /* serialize array elements */ + for (i = 0; i < count; ++i) + svn_temp_serializer__add_string(context, &entries[i]); + + svn_temp_serializer__pop(context); +} + +/* Serialize COUNT svn_string_t* items from *STRINGS into CONTEXT. */ +static void +serialize_svn_string_array(svn_temp_serializer__context_t *context, + const svn_string_t ***strings, + apr_size_t count) +{ + apr_size_t i; + const svn_string_t **entries = *strings; + + /* serialize COUNT entries pointers (the array) */ + svn_temp_serializer__push(context, + (const void * const *)strings, + count * sizeof(const char*)); + + /* serialize array elements */ + for (i = 0; i < count; ++i) + serialize_svn_string(context, &entries[i]); + + svn_temp_serializer__pop(context); +} + +svn_error_t * +svn_fs_x__serialize_properties(void **data, + apr_size_t *data_len, + void *in, + apr_pool_t *pool) +{ + apr_hash_t *hash = in; + properties_data_t properties; + svn_temp_serializer__context_t *context; + apr_hash_index_t *hi; + svn_stringbuf_t *serialized; + apr_size_t i; + + /* create our auxiliary data structure */ + properties.count = apr_hash_count(hash); + properties.keys = apr_palloc(pool, sizeof(const char*) * (properties.count + 1)); + properties.values = apr_palloc(pool, sizeof(const char*) * properties.count); + + /* populate it with the hash entries */ + for (hi = apr_hash_first(pool, hash), i=0; hi; hi = apr_hash_next(hi), ++i) + { + properties.keys[i] = apr_hash_this_key(hi); + properties.values[i] = apr_hash_this_val(hi); + } + + /* serialize it */ + context = svn_temp_serializer__init(&properties, + sizeof(properties), + properties.count * 100, + pool); + + properties.keys[i] = ""; + serialize_cstring_array(context, &properties.keys, properties.count + 1); + serialize_svn_string_array(context, &properties.values, properties.count); + + /* return the serialized result */ + serialized = svn_temp_serializer__get(context); + + *data = serialized->data; + *data_len = serialized->len; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__deserialize_properties(void **out, + void *data, + apr_size_t data_len, + apr_pool_t *pool) +{ + apr_hash_t *hash = svn_hash__make(pool); + properties_data_t *properties = (properties_data_t *)data; + size_t i; + + /* de-serialize our auxiliary data structure */ + svn_temp_deserializer__resolve(properties, (void**)&properties->keys); + svn_temp_deserializer__resolve(properties, (void**)&properties->values); + + /* de-serialize each entry and put it into the hash */ + for (i = 0; i < properties->count; ++i) + { + apr_size_t len = properties->keys[i+1] - properties->keys[i] - 1; + svn_temp_deserializer__resolve(properties->keys, + (void**)&properties->keys[i]); + + deserialize_svn_string(properties->values, + (svn_string_t **)&properties->values[i]); + + apr_hash_set(hash, + properties->keys[i], len, + properties->values[i]); + } + + /* done */ + *out = hash; + + return SVN_NO_ERROR; +} + +/** Caching svn_fs_x__noderev_t objects. 
**/ + +svn_error_t * +svn_fs_x__serialize_node_revision(void **buffer, + apr_size_t *buffer_size, + void *item, + apr_pool_t *pool) +{ + svn_stringbuf_t *serialized; + svn_fs_x__noderev_t *noderev = item; + + /* create an (empty) serialization context with plenty of (initial) + * buffer space. */ + svn_temp_serializer__context_t *context = + svn_temp_serializer__init(NULL, 0, + 1024 - SVN_TEMP_SERIALIZER__OVERHEAD, + pool); + + /* serialize the noderev */ + svn_fs_x__noderev_serialize(context, &noderev); + + /* return serialized data */ + serialized = svn_temp_serializer__get(context); + *buffer = serialized->data; + *buffer_size = serialized->len; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__deserialize_node_revision(void **item, + void *buffer, + apr_size_t buffer_size, + apr_pool_t *pool) +{ + /* Copy the _full_ buffer as it also contains the sub-structures. */ + svn_fs_x__noderev_t *noderev = (svn_fs_x__noderev_t *)buffer; + + /* fixup of all pointers etc. */ + svn_fs_x__noderev_deserialize(noderev, &noderev, pool); + + /* done */ + *item = noderev; + return SVN_NO_ERROR; +} + +/* Utility function that returns the directory serialized inside CONTEXT + * to DATA and DATA_LEN. */ +static svn_error_t * +return_serialized_dir_context(svn_temp_serializer__context_t *context, + void **data, + apr_size_t *data_len) +{ + svn_stringbuf_t *serialized = svn_temp_serializer__get(context); + + *data = serialized->data; + *data_len = serialized->blocksize; + ((dir_data_t *)serialized->data)->len = serialized->len; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__serialize_dir_entries(void **data, + apr_size_t *data_len, + void *in, + apr_pool_t *pool) +{ + apr_array_header_t *dir = in; + + /* serialize the dir content into a new serialization context + * and return the serialized data */ + return return_serialized_dir_context(serialize_dir(dir, pool), + data, + data_len); +} + +svn_error_t * +svn_fs_x__deserialize_dir_entries(void **out, + void *data, + apr_size_t data_len, + apr_pool_t *pool) +{ + /* Copy the _full_ buffer as it also contains the sub-structures. */ + dir_data_t *dir_data = (dir_data_t *)data; + + /* reconstruct the hash from the serialized data */ + *out = deserialize_dir(dir_data, dir_data, pool); + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__get_sharded_offset(void **out, + const void *data, + apr_size_t data_len, + void *baton, + apr_pool_t *pool) +{ + const apr_off_t *manifest = data; + apr_int64_t shard_pos = *(apr_int64_t *)baton; + + *(apr_off_t *)out = manifest[shard_pos]; + + return SVN_NO_ERROR; +} + +/* Utility function that returns the lowest index of the first entry in + * *ENTRIES that points to a dir entry with a name equal or larger than NAME. + * If an exact match has been found, *FOUND will be set to TRUE. COUNT is + * the number of valid entries in ENTRIES. 
+ */ +static apr_size_t +find_entry(svn_fs_x__dirent_t **entries, + const char *name, + apr_size_t count, + svn_boolean_t *found) +{ + /* binary search for the desired entry by name */ + apr_size_t lower = 0; + apr_size_t upper = count; + apr_size_t middle; + + for (middle = upper / 2; lower < upper; middle = (upper + lower) / 2) + { + const svn_fs_x__dirent_t *entry = + svn_temp_deserializer__ptr(entries, (const void *const *)&entries[middle]); + const char* entry_name = + svn_temp_deserializer__ptr(entry, (const void *const *)&entry->name); + + int diff = strcmp(entry_name, name); + if (diff < 0) + lower = middle + 1; + else + upper = middle; + } + + /* check whether we actually found a match */ + *found = FALSE; + if (lower < count) + { + const svn_fs_x__dirent_t *entry = + svn_temp_deserializer__ptr(entries, (const void *const *)&entries[lower]); + const char* entry_name = + svn_temp_deserializer__ptr(entry, (const void *const *)&entry->name); + + if (strcmp(entry_name, name) == 0) + *found = TRUE; + } + + return lower; +} + +/* Utility function that returns TRUE if entry number IDX in ENTRIES has the + * name NAME. + */ +static svn_boolean_t +found_entry(const svn_fs_x__dirent_t * const *entries, + const char *name, + apr_size_t idx) +{ + /* check whether we actually found a match */ + const svn_fs_x__dirent_t *entry = + svn_temp_deserializer__ptr(entries, (const void *const *)&entries[idx]); + const char* entry_name = + svn_temp_deserializer__ptr(entry, (const void *const *)&entry->name); + + return strcmp(entry_name, name) == 0; +} + +svn_error_t * +svn_fs_x__extract_dir_entry(void **out, + const void *data, + apr_size_t data_len, + void *baton, + apr_pool_t *pool) +{ + const dir_data_t *dir_data = data; + svn_fs_x__ede_baton_t *b = baton; + svn_boolean_t found; + apr_size_t pos; + + /* resolve the reference to the entries array */ + const svn_fs_x__dirent_t * const *entries = + svn_temp_deserializer__ptr(data, (const void *const *)&dir_data->entries); + + /* resolve the reference to the lengths array */ + const apr_uint32_t *lengths = + svn_temp_deserializer__ptr(data, (const void *const *)&dir_data->lengths); + + /* Special case: Early out for empty directories. + That simplifies tests further down the road. */ + *out = NULL; + if (dir_data->count == 0) + return SVN_NO_ERROR; + + /* HINT _might_ be the position we hit last time. + If within valid range, check whether HINT+1 is a hit. */ + if ( b->hint < dir_data->count - 1 + && found_entry(entries, b->name, b->hint + 1)) + { + /* Got lucky. */ + pos = b->hint + 1; + found = TRUE; + } + else + { + /* Binary search for the desired entry by name. */ + pos = find_entry((svn_fs_x__dirent_t **)entries, b->name, + dir_data->count, &found); + } + + /* Remember the hit index - if we FOUND the entry. */ + if (found) + b->hint = pos; + + /* de-serialize that entry or return NULL, if no match has been found */ + if (found) + { + const svn_fs_x__dirent_t *source = + svn_temp_deserializer__ptr(entries, (const void *const *)&entries[pos]); + + /* Entries have been serialized one-by-one, each time including all + * nested structures and strings. 
Therefore, they occupy a single + * block of memory whose end-offset is either the beginning of the + * next entry or the end of the buffer + */ + apr_size_t size = lengths[pos]; + + /* copy & deserialize the entry */ + svn_fs_x__dirent_t *new_entry = apr_palloc(pool, size); + memcpy(new_entry, source, size); + + svn_temp_deserializer__resolve(new_entry, (void **)&new_entry->name); + *(svn_fs_x__dirent_t **)out = new_entry; + } + + return SVN_NO_ERROR; +} + +/* Utility function for svn_fs_x__replace_dir_entry that implements the + * modification as a simply deserialize / modify / serialize sequence. + */ +static svn_error_t * +slowly_replace_dir_entry(void **data, + apr_size_t *data_len, + void *baton, + apr_pool_t *pool) +{ + replace_baton_t *replace_baton = (replace_baton_t *)baton; + dir_data_t *dir_data = (dir_data_t *)*data; + apr_array_header_t *dir; + int idx = -1; + svn_fs_x__dirent_t *entry; + + SVN_ERR(svn_fs_x__deserialize_dir_entries((void **)&dir, + *data, + dir_data->len, + pool)); + + entry = svn_fs_x__find_dir_entry(dir, replace_baton->name, &idx); + + /* Replacement or removal? */ + if (replace_baton->new_entry) + { + /* Replace ENTRY with / insert the NEW_ENTRY */ + if (entry) + APR_ARRAY_IDX(dir, idx, svn_fs_x__dirent_t *) + = replace_baton->new_entry; + else + svn_sort__array_insert(dir, &replace_baton->new_entry, idx); + } + else + { + /* Remove the old ENTRY. */ + if (entry) + svn_sort__array_delete(dir, idx, 1); + } + + return svn_fs_x__serialize_dir_entries(data, data_len, dir, pool); +} + +svn_error_t * +svn_fs_x__replace_dir_entry(void **data, + apr_size_t *data_len, + void *baton, + apr_pool_t *pool) +{ + replace_baton_t *replace_baton = (replace_baton_t *)baton; + dir_data_t *dir_data = (dir_data_t *)*data; + svn_boolean_t found; + svn_fs_x__dirent_t **entries; + apr_uint32_t *lengths; + apr_uint32_t length; + apr_size_t pos; + + svn_temp_serializer__context_t *context; + + /* after quite a number of operations, let's re-pack everything. + * This is to limit the number of wasted space as we cannot overwrite + * existing data but must always append. */ + if (dir_data->operations > 2 + dir_data->count / 4) + return slowly_replace_dir_entry(data, data_len, baton, pool); + + /* resolve the reference to the entries array */ + entries = (svn_fs_x__dirent_t **) + svn_temp_deserializer__ptr((const char *)dir_data, + (const void *const *)&dir_data->entries); + + /* resolve the reference to the lengths array */ + lengths = (apr_uint32_t *) + svn_temp_deserializer__ptr((const char *)dir_data, + (const void *const *)&dir_data->lengths); + + /* binary search for the desired entry by name */ + pos = find_entry(entries, replace_baton->name, dir_data->count, &found); + + /* handle entry removal (if found at all) */ + if (replace_baton->new_entry == NULL) + { + if (found) + { + /* remove reference to the entry from the index */ + memmove(&entries[pos], + &entries[pos + 1], + sizeof(entries[pos]) * (dir_data->count - pos)); + memmove(&lengths[pos], + &lengths[pos + 1], + sizeof(lengths[pos]) * (dir_data->count - pos)); + + dir_data->count--; + dir_data->over_provision++; + dir_data->operations++; + } + + return SVN_NO_ERROR; + } + + /* if not found, prepare to insert the new entry */ + if (!found) + { + /* fallback to slow operation if there is no place left to insert an + * new entry to index. That will automatically give add some spare + * entries ("overprovision"). 
*/ + if (dir_data->over_provision == 0) + return slowly_replace_dir_entry(data, data_len, baton, pool); + + /* make entries[index] available for pointing to the new entry */ + memmove(&entries[pos + 1], + &entries[pos], + sizeof(entries[pos]) * (dir_data->count - pos)); + memmove(&lengths[pos + 1], + &lengths[pos], + sizeof(lengths[pos]) * (dir_data->count - pos)); + + dir_data->count++; + dir_data->over_provision--; + dir_data->operations++; + } + + /* de-serialize the new entry */ + entries[pos] = replace_baton->new_entry; + context = svn_temp_serializer__init_append(dir_data, + entries, + dir_data->len, + *data_len, + pool); + serialize_dir_entry(context, &entries[pos], &length); + + /* return the updated serialized data */ + SVN_ERR (return_serialized_dir_context(context, + data, + data_len)); + + /* since the previous call may have re-allocated the buffer, the lengths + * pointer may no longer point to the entry in that buffer. Therefore, + * re-map it again and store the length value after that. */ + + dir_data = (dir_data_t *)*data; + lengths = (apr_uint32_t *) + svn_temp_deserializer__ptr((const char *)dir_data, + (const void *const *)&dir_data->lengths); + lengths[pos] = length; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__serialize_rep_header(void **data, + apr_size_t *data_len, + void *in, + apr_pool_t *pool) +{ + svn_fs_x__rep_header_t *copy = apr_palloc(pool, sizeof(*copy)); + *copy = *(svn_fs_x__rep_header_t *)in; + + *data_len = sizeof(svn_fs_x__rep_header_t); + *data = copy; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__deserialize_rep_header(void **out, + void *data, + apr_size_t data_len, + apr_pool_t *pool) +{ + svn_fs_x__rep_header_t *copy = apr_palloc(pool, sizeof(*copy)); + SVN_ERR_ASSERT(data_len == sizeof(*copy)); + + *copy = *(svn_fs_x__rep_header_t *)data; + *out = data; + + return SVN_NO_ERROR; +} + +/* Utility function to serialize change CHANGE_P in the given serialization + * CONTEXT. + */ +static void +serialize_change(svn_temp_serializer__context_t *context, + svn_fs_x__change_t * const *change_p) +{ + const svn_fs_x__change_t * change = *change_p; + if (change == NULL) + return; + + /* serialize the change struct itself */ + svn_temp_serializer__push(context, + (const void * const *)change_p, + sizeof(*change)); + + /* serialize sub-structures */ + svn_temp_serializer__add_string(context, &change->path.data); + svn_temp_serializer__add_string(context, &change->copyfrom_path); + + /* return to the caller's nesting level */ + svn_temp_serializer__pop(context); +} + +/* Utility function to serialize the CHANGE_P within the given + * serialization CONTEXT. + */ +static void +deserialize_change(void *buffer, + svn_fs_x__change_t **change_p, + apr_pool_t *pool) +{ + svn_fs_x__change_t * change; + + /* fix-up of the pointer to the struct in question */ + svn_temp_deserializer__resolve(buffer, (void **)change_p); + + change = *change_p; + if (change == NULL) + return; + + /* fix-up of sub-structures */ + svn_temp_deserializer__resolve(change, (void **)&change->path.data); + svn_temp_deserializer__resolve(change, (void **)&change->copyfrom_path); +} + +/* Auxiliary structure representing the content of a svn_fs_x__change_t array. + This structure is much easier to (de-)serialize than an APR array. 
+ */ +typedef struct changes_data_t +{ + /* number of entries in the array */ + int count; + + /* reference to the changes */ + svn_fs_x__change_t **changes; +} changes_data_t; + +svn_error_t * +svn_fs_x__serialize_changes(void **data, + apr_size_t *data_len, + void *in, + apr_pool_t *pool) +{ + apr_array_header_t *array = in; + changes_data_t changes; + svn_temp_serializer__context_t *context; + svn_stringbuf_t *serialized; + int i; + + /* initialize our auxiliary data structure and link it to the + * array elements */ + changes.count = array->nelts; + changes.changes = (svn_fs_x__change_t **)array->elts; + + /* serialize it and all its elements */ + context = svn_temp_serializer__init(&changes, + sizeof(changes), + changes.count * 250, + pool); + + svn_temp_serializer__push(context, + (const void * const *)&changes.changes, + changes.count * sizeof(svn_fs_x__change_t*)); + + for (i = 0; i < changes.count; ++i) + serialize_change(context, &changes.changes[i]); + + svn_temp_serializer__pop(context); + + /* return the serialized result */ + serialized = svn_temp_serializer__get(context); + + *data = serialized->data; + *data_len = serialized->len; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__deserialize_changes(void **out, + void *data, + apr_size_t data_len, + apr_pool_t *pool) +{ + int i; + changes_data_t *changes = (changes_data_t *)data; + apr_array_header_t *array = apr_array_make(pool, 0, + sizeof(svn_fs_x__change_t *)); + + /* de-serialize our auxiliary data structure */ + svn_temp_deserializer__resolve(changes, (void**)&changes->changes); + + /* de-serialize each entry and add it to the array */ + for (i = 0; i < changes->count; ++i) + deserialize_change(changes->changes, + (svn_fs_x__change_t **)&changes->changes[i], + pool); + + /* Use the changes buffer as the array's data buffer + * (DATA remains valid for at least as long as POOL). */ + array->elts = (char *)changes->changes; + array->nelts = changes->count; + array->nalloc = changes->count; + + /* done */ + *out = array; + + return SVN_NO_ERROR; +} + +/* Auxiliary structure representing the content of a svn_mergeinfo_t hash. + This structure is much easier to (de-)serialize than an APR array. 
+ */ +typedef struct mergeinfo_data_t +{ + /* number of paths in the hash */ + unsigned count; + + /* COUNT keys (paths) */ + const char **keys; + + /* COUNT keys lengths (strlen of path) */ + apr_ssize_t *key_lengths; + + /* COUNT entries, each giving the number of ranges for the key */ + int *range_counts; + + /* all ranges in a single, concatenated buffer */ + svn_merge_range_t *ranges; +} mergeinfo_data_t; + +svn_error_t * +svn_fs_x__serialize_mergeinfo(void **data, + apr_size_t *data_len, + void *in, + apr_pool_t *pool) +{ + svn_mergeinfo_t mergeinfo = in; + mergeinfo_data_t merges; + svn_temp_serializer__context_t *context; + svn_stringbuf_t *serialized; + apr_hash_index_t *hi; + unsigned i; + int k; + apr_size_t range_count; + + /* initialize our auxiliary data structure */ + merges.count = apr_hash_count(mergeinfo); + merges.keys = apr_palloc(pool, sizeof(*merges.keys) * merges.count); + merges.key_lengths = apr_palloc(pool, sizeof(*merges.key_lengths) * + merges.count); + merges.range_counts = apr_palloc(pool, sizeof(*merges.range_counts) * + merges.count); + + i = 0; + range_count = 0; + for (hi = apr_hash_first(pool, mergeinfo); hi; hi = apr_hash_next(hi), ++i) + { + svn_rangelist_t *ranges; + apr_hash_this(hi, (const void**)&merges.keys[i], + &merges.key_lengths[i], + (void **)&ranges); + merges.range_counts[i] = ranges->nelts; + range_count += ranges->nelts; + } + + merges.ranges = apr_palloc(pool, sizeof(*merges.ranges) * range_count); + + i = 0; + for (hi = apr_hash_first(pool, mergeinfo); hi; hi = apr_hash_next(hi)) + { + svn_rangelist_t *ranges = apr_hash_this_val(hi); + for (k = 0; k < ranges->nelts; ++k, ++i) + merges.ranges[i] = *APR_ARRAY_IDX(ranges, k, svn_merge_range_t*); + } + + /* serialize it and all its elements */ + context = svn_temp_serializer__init(&merges, + sizeof(merges), + range_count * 30, + pool); + + /* keys array */ + svn_temp_serializer__push(context, + (const void * const *)&merges.keys, + merges.count * sizeof(*merges.keys)); + + for (i = 0; i < merges.count; ++i) + svn_temp_serializer__add_string(context, &merges.keys[i]); + + svn_temp_serializer__pop(context); + + /* key lengths array */ + svn_temp_serializer__add_leaf(context, + (const void * const *)&merges.key_lengths, + merges.count * sizeof(*merges.key_lengths)); + + /* range counts array */ + svn_temp_serializer__add_leaf(context, + (const void * const *)&merges.range_counts, + merges.count * sizeof(*merges.range_counts)); + + /* ranges */ + svn_temp_serializer__add_leaf(context, + (const void * const *)&merges.ranges, + range_count * sizeof(*merges.ranges)); + + /* return the serialized result */ + serialized = svn_temp_serializer__get(context); + + *data = serialized->data; + *data_len = serialized->len; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__deserialize_mergeinfo(void **out, + void *data, + apr_size_t data_len, + apr_pool_t *pool) +{ + unsigned i; + int k, n; + mergeinfo_data_t *merges = (mergeinfo_data_t *)data; + svn_mergeinfo_t mergeinfo; + + /* de-serialize our auxiliary data structure */ + svn_temp_deserializer__resolve(merges, (void**)&merges->keys); + svn_temp_deserializer__resolve(merges, (void**)&merges->key_lengths); + svn_temp_deserializer__resolve(merges, (void**)&merges->range_counts); + svn_temp_deserializer__resolve(merges, (void**)&merges->ranges); + + /* de-serialize keys and add entries to the result */ + n = 0; + mergeinfo = svn_hash__make(pool); + for (i = 0; i < merges->count; ++i) + { + svn_rangelist_t *ranges = apr_array_make(pool, + 
merges->range_counts[i], + sizeof(svn_merge_range_t*)); + for (k = 0; k < merges->range_counts[i]; ++k, ++n) + APR_ARRAY_PUSH(ranges, svn_merge_range_t*) = &merges->ranges[n]; + + svn_temp_deserializer__resolve(merges->keys, + (void**)&merges->keys[i]); + apr_hash_set(mergeinfo, merges->keys[i], merges->key_lengths[i], ranges); + } + + /* done */ + *out = mergeinfo; + + return SVN_NO_ERROR; +} + diff --git a/subversion/libsvn_fs_x/temp_serializer.h b/subversion/libsvn_fs_x/temp_serializer.h new file mode 100644 index 0000000..80f5004 --- /dev/null +++ b/subversion/libsvn_fs_x/temp_serializer.h @@ -0,0 +1,301 @@ +/* temp_serializer.h : serialization functions for caching of FSX structures + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#ifndef SVN_LIBSVN_FS__TEMP_SERIALIZER_H +#define SVN_LIBSVN_FS__TEMP_SERIALIZER_H + +#include "private/svn_temp_serializer.h" +#include "fs.h" + +/** + * Prepend the @a number to the @a string in a space efficient way such that + * no other (number,string) combination can produce the same result. + * Allocate temporaries as well as the result from @a pool. + */ +const char* +svn_fs_x__combine_number_and_string(apr_int64_t number, + const char *string, + apr_pool_t *pool); + +/** + * Serialize a @a noderev_p within the serialization @a context. + */ +void +svn_fs_x__noderev_serialize(struct svn_temp_serializer__context_t *context, + svn_fs_x__noderev_t * const *noderev_p); + +/** + * Deserialize a @a noderev_p within the @a buffer and associate it with + * @a pool. + */ +void +svn_fs_x__noderev_deserialize(void *buffer, + svn_fs_x__noderev_t **noderev_p, + apr_pool_t *pool); + +/** + * Serialize APR array @a *a within the serialization @a context. + * The elements within the array must not contain pointers. + */ +void +svn_fs_x__serialize_apr_array(struct svn_temp_serializer__context_t *context, + apr_array_header_t **a); + +/** + * Deserialize APR @a *array within the @a buffer. Set its pool member to + * @a pool. The elements within the array must not contain pointers. + */ +void +svn_fs_x__deserialize_apr_array(void *buffer, + apr_array_header_t **array, + apr_pool_t *pool); + + +/** + * #svn_txdelta_window_t is not sufficient for caching the data it + * represents because data read process needs auxiliary information. 
+ */ +typedef struct +{ + /* the txdelta window information cached / to be cached */ + svn_txdelta_window_t *window; + + /* the revision file read pointer position before reading the window */ + apr_off_t start_offset; + + /* the revision file read pointer position right after reading the window */ + apr_off_t end_offset; +} svn_fs_x__txdelta_cached_window_t; + +/** + * Implements #svn_cache__serialize_func_t for + * #svn_fs_x__txdelta_cached_window_t. + */ +svn_error_t * +svn_fs_x__serialize_txdelta_window(void **buffer, + apr_size_t *buffer_size, + void *item, + apr_pool_t *pool); + +/** + * Implements #svn_cache__deserialize_func_t for + * #svn_fs_x__txdelta_cached_window_t. + */ +svn_error_t * +svn_fs_x__deserialize_txdelta_window(void **item, + void *buffer, + apr_size_t buffer_size, + apr_pool_t *pool); + +/** + * Implements #svn_cache__serialize_func_t for a manifest + * (@a in is an #apr_array_header_t of apr_off_t elements). + */ +svn_error_t * +svn_fs_x__serialize_manifest(void **data, + apr_size_t *data_len, + void *in, + apr_pool_t *pool); + +/** + * Implements #svn_cache__deserialize_func_t for a manifest + * (@a *out is an #apr_array_header_t of apr_off_t elements). + */ +svn_error_t * +svn_fs_x__deserialize_manifest(void **out, + void *data, + apr_size_t data_len, + apr_pool_t *pool); + +/** + * Implements #svn_cache__serialize_func_t for a properties hash + * (@a in is an #apr_hash_t of svn_string_t elements, keyed by const char*). + */ +svn_error_t * +svn_fs_x__serialize_properties(void **data, + apr_size_t *data_len, + void *in, + apr_pool_t *pool); + +/** + * Implements #svn_cache__deserialize_func_t for a properties hash + * (@a *out is an #apr_hash_t of svn_string_t elements, keyed by const char*). + */ +svn_error_t * +svn_fs_x__deserialize_properties(void **out, + void *data, + apr_size_t data_len, + apr_pool_t *pool); + +/** + * Implements #svn_cache__serialize_func_t for #svn_fs_x__noderev_t + */ +svn_error_t * +svn_fs_x__serialize_node_revision(void **buffer, + apr_size_t *buffer_size, + void *item, + apr_pool_t *pool); + +/** + * Implements #svn_cache__deserialize_func_t for #svn_fs_x__noderev_t + */ +svn_error_t * +svn_fs_x__deserialize_node_revision(void **item, + void *buffer, + apr_size_t buffer_size, + apr_pool_t *pool); + +/** + * Implements #svn_cache__serialize_func_t for a directory contents array + */ +svn_error_t * +svn_fs_x__serialize_dir_entries(void **data, + apr_size_t *data_len, + void *in, + apr_pool_t *pool); + +/** + * Implements #svn_cache__deserialize_func_t for a directory contents array + */ +svn_error_t * +svn_fs_x__deserialize_dir_entries(void **out, + void *data, + apr_size_t data_len, + apr_pool_t *pool); + +/** + * Implements #svn_cache__partial_getter_func_t. Set (apr_off_t) @a *out + * to the element indexed by (apr_int64_t) @a *baton within the + * serialized manifest array @a data and @a data_len. */ +svn_error_t * +svn_fs_x__get_sharded_offset(void **out, + const void *data, + apr_size_t data_len, + void *baton, + apr_pool_t *pool); + +/** + * Baton type to be used with svn_fs_x__extract_dir_entry. */ +typedef struct svn_fs_x__ede_baton_t +{ + /* Name of the directory entry to find. */ + const char *name; + + /* Lookup hint [in / out] */ + apr_size_t hint; +} svn_fs_x__ede_baton_t; + +/** + * Implements #svn_cache__partial_getter_func_t for a single + * #svn_fs_x__dirent_t within a serialized directory contents hash, + * identified by its name (given in @a svn_fs_x__ede_baton_t @a *baton). 
+ */ +svn_error_t * +svn_fs_x__extract_dir_entry(void **out, + const void *data, + apr_size_t data_len, + void *baton, + apr_pool_t *pool); + +/** + * Describes the change to be done to a directory: Set the entry + * identified by @a name to the value @a new_entry. If the latter is + * @c NULL, the entry shall be removed if it exists. Otherwise, the entry + * will be replaced if it already exists or added if it does not. + */ +typedef struct replace_baton_t +{ + /** name of the directory entry to modify */ + const char *name; + + /** directory entry to insert instead */ + svn_fs_x__dirent_t *new_entry; +} replace_baton_t; + +/** + * Implements #svn_cache__partial_setter_func_t for a single + * #svn_fs_x__dirent_t within a serialized directory contents hash, + * identified by its name in the #replace_baton_t in @a baton. + */ +svn_error_t * +svn_fs_x__replace_dir_entry(void **data, + apr_size_t *data_len, + void *baton, + apr_pool_t *pool); + +/** + * Implements #svn_cache__serialize_func_t for a #svn_fs_x__rep_header_t. + */ +svn_error_t * +svn_fs_x__serialize_rep_header(void **data, + apr_size_t *data_len, + void *in, + apr_pool_t *pool); + +/** + * Implements #svn_cache__deserialize_func_t for a #svn_fs_x__rep_header_t. + */ +svn_error_t * +svn_fs_x__deserialize_rep_header(void **out, + void *data, + apr_size_t data_len, + apr_pool_t *pool); + +/** + * Implements #svn_cache__serialize_func_t for an #apr_array_header_t of + * #svn_fs_x__change_t *. + */ +svn_error_t * +svn_fs_x__serialize_changes(void **data, + apr_size_t *data_len, + void *in, + apr_pool_t *pool); + +/** + * Implements #svn_cache__deserialize_func_t for an #apr_array_header_t of + * #svn_fs_x__change_t *. + */ +svn_error_t * +svn_fs_x__deserialize_changes(void **out, + void *data, + apr_size_t data_len, + apr_pool_t *pool); + +/** + * Implements #svn_cache__serialize_func_t for #svn_mergeinfo_t objects. + */ +svn_error_t * +svn_fs_x__serialize_mergeinfo(void **data, + apr_size_t *data_len, + void *in, + apr_pool_t *pool); + +/** + * Implements #svn_cache__deserialize_func_t for #svn_mergeinfo_t objects. + */ +svn_error_t * +svn_fs_x__deserialize_mergeinfo(void **out, + void *data, + apr_size_t data_len, + apr_pool_t *pool); + +#endif diff --git a/subversion/libsvn_fs_x/transaction.c b/subversion/libsvn_fs_x/transaction.c new file mode 100644 index 0000000..5f3adc5 --- /dev/null +++ b/subversion/libsvn_fs_x/transaction.c @@ -0,0 +1,3782 @@ +/* transaction.c --- transaction-related functions of FSX + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License.
+ * ==================================================================== + */ + +#include "transaction.h" + +#include <assert.h> +#include <apr_sha1.h> + +#include "svn_hash.h" +#include "svn_props.h" +#include "svn_sorts.h" +#include "svn_time.h" +#include "svn_dirent_uri.h" + +#include "fs_x.h" +#include "tree.h" +#include "util.h" +#include "id.h" +#include "low_level.h" +#include "temp_serializer.h" +#include "cached_data.h" +#include "lock.h" +#include "rep-cache.h" +#include "index.h" + +#include "private/svn_fs_util.h" +#include "private/svn_fspath.h" +#include "private/svn_sorts_private.h" +#include "private/svn_string_private.h" +#include "private/svn_subr_private.h" +#include "private/svn_io_private.h" +#include "../libsvn_fs/fs-loader.h" + +#include "svn_private_config.h" + +/* The vtable associated with an open transaction object. */ +static txn_vtable_t txn_vtable = { + svn_fs_x__commit_txn, + svn_fs_x__abort_txn, + svn_fs_x__txn_prop, + svn_fs_x__txn_proplist, + svn_fs_x__change_txn_prop, + svn_fs_x__txn_root, + svn_fs_x__change_txn_props +}; + +/* FSX-specific data being attached to svn_fs_txn_t. + */ +typedef struct fs_txn_data_t +{ + /* Strongly typed representation of the TXN's ID member. */ + svn_fs_x__txn_id_t txn_id; +} fs_txn_data_t; + +svn_fs_x__txn_id_t +svn_fs_x__txn_get_id(svn_fs_txn_t *txn) +{ + fs_txn_data_t *ftd = txn->fsap_data; + return ftd->txn_id; +} + +/* Functions for working with shared transaction data. */ + +/* Return the transaction object for transaction TXN_ID from the + transaction list of filesystem FS (which must already be locked via the + txn_list_lock mutex). If the transaction does not exist in the list, + then create a new transaction object and return it (if CREATE_NEW is + true) or return NULL (otherwise). */ +static svn_fs_x__shared_txn_data_t * +get_shared_txn(svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + svn_boolean_t create_new) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_fs_x__shared_data_t *ffsd = ffd->shared; + svn_fs_x__shared_txn_data_t *txn; + + for (txn = ffsd->txns; txn; txn = txn->next) + if (txn->txn_id == txn_id) + break; + + if (txn || !create_new) + return txn; + + /* Use the transaction object from the (single-object) freelist, + if one is available, or otherwise create a new object. */ + if (ffsd->free_txn) + { + txn = ffsd->free_txn; + ffsd->free_txn = NULL; + } + else + { + apr_pool_t *subpool = svn_pool_create(ffsd->common_pool); + txn = apr_palloc(subpool, sizeof(*txn)); + txn->pool = subpool; + } + + txn->txn_id = txn_id; + txn->being_written = FALSE; + + /* Link this transaction into the head of the list. We will typically + be dealing with only one active transaction at a time, so it makes + sense for searches through the transaction list to look at the + newest transactions first. */ + txn->next = ffsd->txns; + ffsd->txns = txn; + + return txn; +} + +/* Free the transaction object for transaction TXN_ID, and remove it + from the transaction list of filesystem FS (which must already be + locked via the txn_list_lock mutex). Do nothing if the transaction + does not exist. 
*/ +static void +free_shared_txn(svn_fs_t *fs, svn_fs_x__txn_id_t txn_id) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_fs_x__shared_data_t *ffsd = ffd->shared; + svn_fs_x__shared_txn_data_t *txn, *prev = NULL; + + for (txn = ffsd->txns; txn; prev = txn, txn = txn->next) + if (txn->txn_id == txn_id) + break; + + if (!txn) + return; + + if (prev) + prev->next = txn->next; + else + ffsd->txns = txn->next; + + /* As we typically will be dealing with one transaction after another, + we will maintain a single-object free list so that we can hopefully + keep reusing the same transaction object. */ + if (!ffsd->free_txn) + ffsd->free_txn = txn; + else + svn_pool_destroy(txn->pool); +} + + +/* Obtain a lock on the transaction list of filesystem FS, call BODY + with FS, BATON, and POOL, and then unlock the transaction list. + Return what BODY returned. */ +static svn_error_t * +with_txnlist_lock(svn_fs_t *fs, + svn_error_t *(*body)(svn_fs_t *fs, + const void *baton, + apr_pool_t *pool), + const void *baton, + apr_pool_t *pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_fs_x__shared_data_t *ffsd = ffd->shared; + + SVN_MUTEX__WITH_LOCK(ffsd->txn_list_lock, + body(fs, baton, pool)); + + return SVN_NO_ERROR; +} + + +/* Get a lock on empty file LOCK_FILENAME, creating it in RESULT_POOL. */ +static svn_error_t * +get_lock_on_filesystem(const char *lock_filename, + apr_pool_t *result_pool) +{ + return svn_error_trace(svn_io__file_lock_autocreate(lock_filename, + result_pool)); +} + +/* Reset the HAS_WRITE_LOCK member in the FFD given as BATON_VOID. + When registered with the pool holding the lock on the lock file, + this makes sure the flag gets reset just before we release the lock. */ +static apr_status_t +reset_lock_flag(void *baton_void) +{ + svn_fs_x__data_t *ffd = baton_void; + ffd->has_write_lock = FALSE; + return APR_SUCCESS; +} + +/* Structure defining a file system lock to be acquired and the function + to be executed while the lock is held. + + Instances of this structure may be nested to allow for multiple locks to + be taken out before executing the user-provided body. In that case, BODY + and BATON of the outer instances will be with_lock and a with_lock_baton_t + instance (transparently, no special treatment is required.). It is + illegal to attempt to acquire the same lock twice within the same lock + chain or via nesting calls using separate lock chains. + + All instances along the chain share the same LOCK_POOL such that only one + pool needs to be created and cleared for all locks. We also allocate as + much data from that lock pool as possible to minimize memory usage in + caller pools. */ +typedef struct with_lock_baton_t +{ + /* The filesystem we operate on. Same for all instances along the chain. */ + svn_fs_t *fs; + + /* Mutex to complement the lock file in an APR threaded process. + No-op object for non-threaded processes but never NULL. */ + svn_mutex__t *mutex; + + /* Path to the file to lock. */ + const char *lock_path; + + /* If true, set FS->HAS_WRITE_LOCK after we acquired the lock. */ + svn_boolean_t is_global_lock; + + /* Function body to execute after we acquired the lock. + This may be user-provided or a nested call to with_lock(). */ + svn_error_t *(*body)(void *baton, + apr_pool_t *scratch_pool); + + /* Baton to pass to BODY; possibly NULL. + This may be user-provided or a nested lock baton instance. */ + void *baton; + + /* Pool for all allocations along the lock chain and BODY. 
Will hold the + file locks and gets destroyed after the outermost BODY returned, + releasing all file locks. + Same for all instances along the chain. */ + apr_pool_t *lock_pool; + + /* TRUE, iff BODY is the user-provided body. */ + svn_boolean_t is_inner_most_lock; + + /* TRUE, iff this is not a nested lock. + Then responsible for destroying LOCK_POOL. */ + svn_boolean_t is_outer_most_lock; +} with_lock_baton_t; + +/* Obtain a write lock on the file BATON->LOCK_PATH and call BATON->BODY + with BATON->BATON. If this is the outermost lock call, release all file + locks after the body returned. If BATON->IS_GLOBAL_LOCK is set, set the + HAS_WRITE_LOCK flag while we keep the write lock. */ +static svn_error_t * +with_some_lock_file(with_lock_baton_t *baton) +{ + apr_pool_t *pool = baton->lock_pool; + svn_error_t *err = get_lock_on_filesystem(baton->lock_path, pool); + + if (!err) + { + svn_fs_t *fs = baton->fs; + svn_fs_x__data_t *ffd = fs->fsap_data; + + if (baton->is_global_lock) + { + /* set the "got the lock" flag and register reset function */ + apr_pool_cleanup_register(pool, + ffd, + reset_lock_flag, + apr_pool_cleanup_null); + ffd->has_write_lock = TRUE; + } + + /* nobody else will modify the repo state + => read HEAD & pack info once */ + if (baton->is_inner_most_lock) + { + err = svn_fs_x__update_min_unpacked_rev(fs, pool); + if (!err) + err = svn_fs_x__youngest_rev(&ffd->youngest_rev_cache, fs, pool); + } + + if (!err) + err = baton->body(baton->baton, pool); + } + + if (baton->is_outer_most_lock) + svn_pool_destroy(pool); + + return svn_error_trace(err); +} + +/* Wraps with_some_lock_file, protecting it with BATON->MUTEX. + + SCRATCH_POOL is unused here and only provided for signature compatibility + with WITH_LOCK_BATON_T.BODY. */ +static svn_error_t * +with_lock(void *baton, + apr_pool_t *scratch_pool) +{ + with_lock_baton_t *lock_baton = baton; + SVN_MUTEX__WITH_LOCK(lock_baton->mutex, with_some_lock_file(lock_baton)); + + return SVN_NO_ERROR; +} + +/* Enum identifying a filesystem lock. */ +typedef enum lock_id_t +{ + write_lock, + txn_lock, + pack_lock +} lock_id_t; + +/* Initialize BATON->MUTEX, BATON->LOCK_PATH and BATON->IS_GLOBAL_LOCK + according to the LOCK_ID. All other members of BATON must already be + valid. */ +static void +init_lock_baton(with_lock_baton_t *baton, + lock_id_t lock_id) +{ + svn_fs_x__data_t *ffd = baton->fs->fsap_data; + svn_fs_x__shared_data_t *ffsd = ffd->shared; + + switch (lock_id) + { + case write_lock: + baton->mutex = ffsd->fs_write_lock; + baton->lock_path = svn_fs_x__path_lock(baton->fs, baton->lock_pool); + baton->is_global_lock = TRUE; + break; + + case txn_lock: + baton->mutex = ffsd->txn_current_lock; + baton->lock_path = svn_fs_x__path_txn_current_lock(baton->fs, + baton->lock_pool); + baton->is_global_lock = FALSE; + break; + + case pack_lock: + baton->mutex = ffsd->fs_pack_lock; + baton->lock_path = svn_fs_x__path_pack_lock(baton->fs, + baton->lock_pool); + baton->is_global_lock = FALSE; + break; + } +} + +/* Return the baton for the innermost lock of a (potential) lock chain. + The baton shall take out LOCK_ID from FS and execute BODY with BATON + while the lock is being held. Allocate the result in a sub-pool of + RESULT_POOL. + */ +static with_lock_baton_t * +create_lock_baton(svn_fs_t *fs, + lock_id_t lock_id, + svn_error_t *(*body)(void *baton, + apr_pool_t *scratch_pool), + void *baton, + apr_pool_t *result_pool) +{ + /* Allocate everything along the lock chain into a single sub-pool. 
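+ (chain_lock_baton() below reuses this same pool for any further batons, + and the outermost baton destroys it exactly once after its BODY returned.)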
+ This minimizes memory usage and cleanup overhead. */ + apr_pool_t *lock_pool = svn_pool_create(result_pool); + with_lock_baton_t *result = apr_pcalloc(lock_pool, sizeof(*result)); + + /* Store parameters. */ + result->fs = fs; + result->body = body; + result->baton = baton; + + /* File locks etc. will use this pool as well for easy cleanup. */ + result->lock_pool = lock_pool; + + /* Right now, we are the first, (only, ) and last struct in the chain. */ + result->is_inner_most_lock = TRUE; + result->is_outer_most_lock = TRUE; + + /* Select mutex and lock file path depending on LOCK_ID. + Also, initialize dependent members (IS_GLOBAL_LOCK only, ATM). */ + init_lock_baton(result, lock_id); + + return result; +} + +/* Return a baton that wraps NESTED and requests LOCK_ID as additional lock. + * + * That means, when you create a lock chain, start with the last / innermost + * lock to take out and add the first / outermost lock last. + */ +static with_lock_baton_t * +chain_lock_baton(lock_id_t lock_id, + with_lock_baton_t *nested) +{ + /* Use the same pool for batons along the lock chain. */ + apr_pool_t *lock_pool = nested->lock_pool; + with_lock_baton_t *result = apr_pcalloc(lock_pool, sizeof(*result)); + + /* All locks along the chain operate on the same FS. */ + result->fs = nested->fs; + + /* Execution of this baton means acquiring the nested lock and its + execution. */ + result->body = with_lock; + result->baton = nested; + + /* Shared among all locks along the chain. */ + result->lock_pool = lock_pool; + + /* We are the new outermost lock but surely not the innermost lock. */ + result->is_inner_most_lock = FALSE; + result->is_outer_most_lock = TRUE; + nested->is_outer_most_lock = FALSE; + + /* Select mutex and lock file path depending on LOCK_ID. + Also, initialize dependent members (IS_GLOBAL_LOCK only, ATM). */ + init_lock_baton(result, lock_id); + + return result; +} + +svn_error_t * +svn_fs_x__with_write_lock(svn_fs_t *fs, + svn_error_t *(*body)(void *baton, + apr_pool_t *scratch_pool), + void *baton, + apr_pool_t *scratch_pool) +{ + return svn_error_trace( + with_lock(create_lock_baton(fs, write_lock, body, baton, + scratch_pool), + scratch_pool)); +} + +svn_error_t * +svn_fs_x__with_pack_lock(svn_fs_t *fs, + svn_error_t *(*body)(void *baton, + apr_pool_t *scratch_pool), + void *baton, + apr_pool_t *scratch_pool) +{ + return svn_error_trace( + with_lock(create_lock_baton(fs, pack_lock, body, baton, + scratch_pool), + scratch_pool)); +} + +svn_error_t * +svn_fs_x__with_txn_current_lock(svn_fs_t *fs, + svn_error_t *(*body)(void *baton, + apr_pool_t *scratch_pool), + void *baton, + apr_pool_t *scratch_pool) +{ + return svn_error_trace( + with_lock(create_lock_baton(fs, txn_lock, body, baton, + scratch_pool), + scratch_pool)); +} + +svn_error_t * +svn_fs_x__with_all_locks(svn_fs_t *fs, + svn_error_t *(*body)(void *baton, + apr_pool_t *scratch_pool), + void *baton, + apr_pool_t *scratch_pool) +{ + /* Be sure to use the correct lock ordering as documented in + fs_fs_shared_data_t. The lock chain is being created in + innermost (last to acquire) -> outermost (first to acquire) order. */ + with_lock_baton_t *lock_baton + = create_lock_baton(fs, write_lock, body, baton, scratch_pool); + + lock_baton = chain_lock_baton(pack_lock, lock_baton); + lock_baton = chain_lock_baton(txn_lock, lock_baton); + + return svn_error_trace(with_lock(lock_baton, scratch_pool)); +} + + +/* A structure used by unlock_proto_rev() and unlock_proto_rev_body(), + which see. 
*/ +typedef struct unlock_proto_rev_baton_t +{ + svn_fs_x__txn_id_t txn_id; + void *lockcookie; +} unlock_proto_rev_baton_t; + +/* Callback used in the implementation of unlock_proto_rev(). */ +static svn_error_t * +unlock_proto_rev_body(svn_fs_t *fs, + const void *baton, + apr_pool_t *scratch_pool) +{ + const unlock_proto_rev_baton_t *b = baton; + apr_file_t *lockfile = b->lockcookie; + svn_fs_x__shared_txn_data_t *txn = get_shared_txn(fs, b->txn_id, FALSE); + apr_status_t apr_err; + + if (!txn) + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + _("Can't unlock unknown transaction '%s'"), + svn_fs_x__txn_name(b->txn_id, scratch_pool)); + if (!txn->being_written) + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + _("Can't unlock nonlocked transaction '%s'"), + svn_fs_x__txn_name(b->txn_id, scratch_pool)); + + apr_err = apr_file_unlock(lockfile); + if (apr_err) + return svn_error_wrap_apr + (apr_err, + _("Can't unlock prototype revision lockfile for transaction '%s'"), + svn_fs_x__txn_name(b->txn_id, scratch_pool)); + apr_err = apr_file_close(lockfile); + if (apr_err) + return svn_error_wrap_apr + (apr_err, + _("Can't close prototype revision lockfile for transaction '%s'"), + svn_fs_x__txn_name(b->txn_id, scratch_pool)); + + txn->being_written = FALSE; + + return SVN_NO_ERROR; +} + +/* Unlock the prototype revision file for transaction TXN_ID in filesystem + FS using cookie LOCKCOOKIE. The original prototype revision file must + have been closed _before_ calling this function. + + Perform temporary allocations in SCRATCH_POOL. */ +static svn_error_t * +unlock_proto_rev(svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + void *lockcookie, + apr_pool_t *scratch_pool) +{ + unlock_proto_rev_baton_t b; + + b.txn_id = txn_id; + b.lockcookie = lockcookie; + return with_txnlist_lock(fs, unlock_proto_rev_body, &b, scratch_pool); +} + +/* A structure used by get_writable_proto_rev() and + get_writable_proto_rev_body(), which see. */ +typedef struct get_writable_proto_rev_baton_t +{ + void **lockcookie; + svn_fs_x__txn_id_t txn_id; +} get_writable_proto_rev_baton_t; + +/* Callback used in the implementation of get_writable_proto_rev(). */ +static svn_error_t * +get_writable_proto_rev_body(svn_fs_t *fs, + const void *baton, + apr_pool_t *scratch_pool) +{ + const get_writable_proto_rev_baton_t *b = baton; + void **lockcookie = b->lockcookie; + svn_fs_x__shared_txn_data_t *txn = get_shared_txn(fs, b->txn_id, TRUE); + + /* First, ensure that no thread in this process (including this one) + is currently writing to this transaction's proto-rev file. */ + if (txn->being_written) + return svn_error_createf(SVN_ERR_FS_REP_BEING_WRITTEN, NULL, + _("Cannot write to the prototype revision file " + "of transaction '%s' because a previous " + "representation is currently being written by " + "this process"), + svn_fs_x__txn_name(b->txn_id, scratch_pool)); + + + /* We know that no thread in this process is writing to the proto-rev + file, and by extension, that no thread in this process is holding a + lock on the prototype revision lock file. It is therefore safe + for us to attempt to lock this file, to see if any other process + is holding a lock. */ + + { + apr_file_t *lockfile; + apr_status_t apr_err; + const char *lockfile_path + = svn_fs_x__path_txn_proto_rev_lock(fs, b->txn_id, scratch_pool); + + /* Open the proto-rev lockfile, creating it if necessary, as it may + not exist if the transaction dates from before the lockfiles were + introduced. 
+ + ### We'd also like to use something like svn_io_file_lock2(), but + that forces us to create a subpool just to be able to unlock + the file, which seems a waste. */ + SVN_ERR(svn_io_file_open(&lockfile, lockfile_path, + APR_WRITE | APR_CREATE, APR_OS_DEFAULT, + scratch_pool)); + + apr_err = apr_file_lock(lockfile, + APR_FLOCK_EXCLUSIVE | APR_FLOCK_NONBLOCK); + if (apr_err) + { + svn_error_clear(svn_io_file_close(lockfile, scratch_pool)); + + if (APR_STATUS_IS_EAGAIN(apr_err)) + return svn_error_createf(SVN_ERR_FS_REP_BEING_WRITTEN, NULL, + _("Cannot write to the prototype revision " + "file of transaction '%s' because a " + "previous representation is currently " + "being written by another process"), + svn_fs_x__txn_name(b->txn_id, + scratch_pool)); + + return svn_error_wrap_apr(apr_err, + _("Can't get exclusive lock on file '%s'"), + svn_dirent_local_style(lockfile_path, + scratch_pool)); + } + + *lockcookie = lockfile; + } + + /* We've successfully locked the transaction; mark it as such. */ + txn->being_written = TRUE; + + return SVN_NO_ERROR; +} + +/* Make sure the length ACTUAL_LENGTH of the proto-revision file PROTO_REV + of transaction TXN_ID in filesystem FS matches the proto-index file. + Trim any crash / failure related extra data from the proto-rev file. + + If the prototype revision file is too short, we can't do much but bail out. + + Perform all allocations in SCRATCH_POOL. */ +static svn_error_t * +auto_truncate_proto_rev(svn_fs_t *fs, + apr_file_t *proto_rev, + apr_off_t actual_length, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *scratch_pool) +{ + /* Determine file range covered by the proto-index so far. Note that + we always append to both file, i.e. the last index entry also + corresponds to the last addition in the rev file. */ + const char *path = svn_fs_x__path_p2l_proto_index(fs, txn_id, scratch_pool); + apr_file_t *file; + apr_off_t indexed_length; + + SVN_ERR(svn_fs_x__p2l_proto_index_open(&file, path, scratch_pool)); + SVN_ERR(svn_fs_x__p2l_proto_index_next_offset(&indexed_length, file, + scratch_pool)); + SVN_ERR(svn_io_file_close(file, scratch_pool)); + + /* Handle mismatches. */ + if (indexed_length < actual_length) + SVN_ERR(svn_io_file_trunc(proto_rev, indexed_length, scratch_pool)); + else if (indexed_length > actual_length) + return svn_error_createf(SVN_ERR_FS_INDEX_INCONSISTENT, + NULL, + _("p2l proto index offset %s beyond proto" + "rev file size %s for TXN %s"), + apr_off_t_toa(scratch_pool, indexed_length), + apr_off_t_toa(scratch_pool, actual_length), + svn_fs_x__txn_name(txn_id, scratch_pool)); + + return SVN_NO_ERROR; +} + +/* Get a handle to the prototype revision file for transaction TXN_ID in + filesystem FS, and lock it for writing. Return FILE, a file handle + positioned at the end of the file, and LOCKCOOKIE, a cookie that + should be passed to unlock_proto_rev() to unlock the file once FILE + has been closed. + + If the prototype revision file is already locked, return error + SVN_ERR_FS_REP_BEING_WRITTEN. + + Perform all allocations in POOL. */ +static svn_error_t * +get_writable_proto_rev(apr_file_t **file, + void **lockcookie, + svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *pool) +{ + get_writable_proto_rev_baton_t b; + svn_error_t *err; + apr_off_t end_offset = 0; + + b.lockcookie = lockcookie; + b.txn_id = txn_id; + + SVN_ERR(with_txnlist_lock(fs, get_writable_proto_rev_body, &b, pool)); + + /* Now open the prototype revision file and seek to the end. 
*/ + err = svn_io_file_open(file, + svn_fs_x__path_txn_proto_rev(fs, txn_id, pool), + APR_WRITE | APR_BUFFERED, APR_OS_DEFAULT, pool); + + /* You might expect that we could dispense with the following seek + and achieve the same thing by opening the file using APR_APPEND. + Unfortunately, APR's buffered file implementation unconditionally + places its initial file pointer at the start of the file (even for + files opened with APR_APPEND), so we need this seek to reconcile + the APR file pointer to the OS file pointer (since we need to be + able to read the current file position later). */ + if (!err) + err = svn_io_file_seek(*file, APR_END, &end_offset, pool); + + /* We don't want unused sections (such as leftovers from failed delta + stream) in our file. If we use log addressing, we would need an + index entry for the unused section and that section would need to + be all NUL by convention. So, detect and fix those cases by truncating + the protorev file. */ + if (!err) + err = auto_truncate_proto_rev(fs, *file, end_offset, txn_id, pool); + + if (err) + { + err = svn_error_compose_create( + err, + unlock_proto_rev(fs, txn_id, *lockcookie, pool)); + + *lockcookie = NULL; + } + + return svn_error_trace(err); +} + +/* Callback used in the implementation of purge_shared_txn(). */ +static svn_error_t * +purge_shared_txn_body(svn_fs_t *fs, + const void *baton, + apr_pool_t *scratch_pool) +{ + svn_fs_x__txn_id_t txn_id = *(const svn_fs_x__txn_id_t *)baton; + + free_shared_txn(fs, txn_id); + + return SVN_NO_ERROR; +} + +/* Purge the shared data for transaction TXN_ID in filesystem FS. + Perform all temporary allocations in SCRATCH_POOL. */ +static svn_error_t * +purge_shared_txn(svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *scratch_pool) +{ + return with_txnlist_lock(fs, purge_shared_txn_body, &txn_id, scratch_pool); +} + + +svn_boolean_t +svn_fs_x__is_fresh_txn_root(svn_fs_x__noderev_t *noderev) +{ + /* Is it a root node? */ + if (noderev->noderev_id.number != SVN_FS_X__ITEM_INDEX_ROOT_NODE) + return FALSE; + + /* ... in a transaction? */ + if (!svn_fs_x__is_txn(noderev->noderev_id.change_set)) + return FALSE; + + /* ... with no prop change in that txn? + (Once we set a property, the prop rep will never become NULL again.) */ + if (noderev->prop_rep && svn_fs_x__is_txn(noderev->prop_rep->id.change_set)) + return FALSE; + + /* ... and no sub-tree change? + (Once we set a text, the data rep will never become NULL again.) */ + if (noderev->data_rep && svn_fs_x__is_txn(noderev->data_rep->id.change_set)) + return FALSE; + + /* Root node of a txn with no changes. */ + return TRUE; +} + +svn_error_t * +svn_fs_x__put_node_revision(svn_fs_t *fs, + svn_fs_x__noderev_t *noderev, + apr_pool_t *scratch_pool) +{ + apr_file_t *noderev_file; + const svn_fs_x__id_t *id = &noderev->noderev_id; + + if (! 
svn_fs_x__is_txn(id->change_set)) + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + _("Attempted to write to non-transaction '%s'"), + svn_fs_x__id_unparse(id, scratch_pool)->data); + + SVN_ERR(svn_io_file_open(&noderev_file, + svn_fs_x__path_txn_node_rev(fs, id, scratch_pool, + scratch_pool), + APR_WRITE | APR_CREATE | APR_TRUNCATE + | APR_BUFFERED, APR_OS_DEFAULT, scratch_pool)); + + SVN_ERR(svn_fs_x__write_noderev(svn_stream_from_aprfile2(noderev_file, TRUE, + scratch_pool), + noderev, scratch_pool)); + + SVN_ERR(svn_io_file_close(noderev_file, scratch_pool)); + + return SVN_NO_ERROR; +} + +/* For the in-transaction NODEREV within FS, write the sha1->rep mapping + * file in the respective transaction, if rep sharing has been enabled etc. + * Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +store_sha1_rep_mapping(svn_fs_t *fs, + svn_fs_x__noderev_t *noderev, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + + /* if rep sharing has been enabled and the noderev has a data rep and + * its SHA-1 is known, store the rep struct under its SHA1. */ + if ( ffd->rep_sharing_allowed + && noderev->data_rep + && noderev->data_rep->has_sha1) + { + apr_file_t *rep_file; + apr_int64_t txn_id + = svn_fs_x__get_txn_id(noderev->data_rep->id.change_set); + const char *file_name + = svn_fs_x__path_txn_sha1(fs, txn_id, + noderev->data_rep->sha1_digest, + scratch_pool); + svn_stringbuf_t *rep_string + = svn_fs_x__unparse_representation(noderev->data_rep, + (noderev->kind == svn_node_dir), + scratch_pool, scratch_pool); + + SVN_ERR(svn_io_file_open(&rep_file, file_name, + APR_WRITE | APR_CREATE | APR_TRUNCATE + | APR_BUFFERED, APR_OS_DEFAULT, scratch_pool)); + + SVN_ERR(svn_io_file_write_full(rep_file, rep_string->data, + rep_string->len, NULL, scratch_pool)); + + SVN_ERR(svn_io_file_close(rep_file, scratch_pool)); + } + + return SVN_NO_ERROR; +} + +static svn_error_t * +unparse_dir_entry(svn_fs_x__dirent_t *dirent, + svn_stream_t *stream, + apr_pool_t *scratch_pool) +{ + const char *val + = apr_psprintf(scratch_pool, "%s %s", + (dirent->kind == svn_node_file) ? SVN_FS_X__KIND_FILE + : SVN_FS_X__KIND_DIR, + svn_fs_x__id_unparse(&dirent->id, scratch_pool)->data); + + SVN_ERR(svn_stream_printf(stream, scratch_pool, "K %" APR_SIZE_T_FMT + "\n%s\nV %" APR_SIZE_T_FMT "\n%s\n", + strlen(dirent->name), dirent->name, + strlen(val), val)); + return SVN_NO_ERROR; +} + +/* Write the directory given as array of dirent structs in ENTRIES to STREAM. + Perform temporary allocations in SCRATCH_POOL. */ +static svn_error_t * +unparse_dir_entries(apr_array_header_t *entries, + svn_stream_t *stream, + apr_pool_t *scratch_pool) +{ + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + int i; + for (i = 0; i < entries->nelts; ++i) + { + svn_fs_x__dirent_t *dirent; + + svn_pool_clear(iterpool); + dirent = APR_ARRAY_IDX(entries, i, svn_fs_x__dirent_t *); + SVN_ERR(unparse_dir_entry(dirent, stream, iterpool)); + } + + SVN_ERR(svn_stream_printf(stream, scratch_pool, "%s\n", + SVN_HASH_TERMINATOR)); + + svn_pool_destroy(iterpool); + return SVN_NO_ERROR; +} + +/* Return a deep copy of SOURCE and allocate it in RESULT_POOL.
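+ The path is duplicated with apr_pstrmemdup(), so the copy is NUL-terminated + even though PATH.LEN does not count a terminator; COPYFROM_PATH is only + copied if present.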
+ */ +static svn_fs_x__change_t * +path_change_dup(const svn_fs_x__change_t *source, + apr_pool_t *result_pool) +{ + svn_fs_x__change_t *result + = apr_pmemdup(result_pool, source, sizeof(*source)); + result->path.data + = apr_pstrmemdup(result_pool, source->path.data, source->path.len); + + if (source->copyfrom_path) + result->copyfrom_path = apr_pstrdup(result_pool, source->copyfrom_path); + + return result; +} + +/* Merge the internal-use-only CHANGE into a hash of public-FS + svn_fs_x__change_t CHANGED_PATHS, collapsing multiple changes into a + single summarized change per path. DELETIONS is + also a path->svn_fs_x__change_t hash and contains all the deletions + that got turned into a replacement. */ +static svn_error_t * +fold_change(apr_hash_t *changed_paths, + apr_hash_t *deletions, + const svn_fs_x__change_t *change) +{ + apr_pool_t *pool = apr_hash_pool_get(changed_paths); + svn_fs_x__change_t *old_change, *new_change; + const svn_string_t *path = &change->path; + + if ((old_change = apr_hash_get(changed_paths, path->data, path->len))) + { + /* This path already exists in the hash, so we have to merge + this change into the already existing one. */ + + /* Sanity check: only allow unused node revision IDs in the + `reset' case. */ + if ((! svn_fs_x__id_used(&change->noderev_id)) + && (change->change_kind != svn_fs_path_change_reset)) + return svn_error_create + (SVN_ERR_FS_CORRUPT, NULL, + _("Missing required node revision ID")); + + /* Sanity check: we should be talking about the same node + revision ID as our last change except where the last change + was a deletion. */ + if (svn_fs_x__id_used(&change->noderev_id) + && (!svn_fs_x__id_eq(&old_change->noderev_id, &change->noderev_id)) + && (old_change->change_kind != svn_fs_path_change_delete)) + return svn_error_create + (SVN_ERR_FS_CORRUPT, NULL, + _("Invalid change ordering: new node revision ID " + "without delete")); + + /* Sanity check: an add, replacement, or reset must be the first + thing to follow a deletion. */ + if ((old_change->change_kind == svn_fs_path_change_delete) + && (! ((change->change_kind == svn_fs_path_change_replace) + || (change->change_kind == svn_fs_path_change_reset) + || (change->change_kind == svn_fs_path_change_add)))) + return svn_error_create + (SVN_ERR_FS_CORRUPT, NULL, + _("Invalid change ordering: non-add change on deleted path")); + + /* Sanity check: an add can't follow anything except + a delete or reset. */ + if ((change->change_kind == svn_fs_path_change_add) + && (old_change->change_kind != svn_fs_path_change_delete) + && (old_change->change_kind != svn_fs_path_change_reset)) + return svn_error_create + (SVN_ERR_FS_CORRUPT, NULL, + _("Invalid change ordering: add change on preexisting path")); + + /* Now, merge that change in. */ + switch (change->change_kind) + { + case svn_fs_path_change_reset: + /* A reset here will simply remove the path change from the + hash. */ + apr_hash_set(changed_paths, path->data, path->len, NULL); + break; + + case svn_fs_path_change_delete: + if (old_change->change_kind == svn_fs_path_change_add) + { + /* If the path was introduced in this transaction via an + add, and we are deleting it, just remove the path + altogether. (The caller will delete any child paths.) */ + apr_hash_set(changed_paths, path->data, path->len, NULL); + } + else if (old_change->change_kind == svn_fs_path_change_replace) + { + /* Deleting a 'replace' restores the original deletion.
*/ + new_change = apr_hash_get(deletions, path->data, path->len); + SVN_ERR_ASSERT(new_change); + apr_hash_set(changed_paths, path->data, path->len, new_change); + } + else + { + /* A deletion overrules a previous change (modify). */ + new_change = path_change_dup(change, pool); + apr_hash_set(changed_paths, path->data, path->len, new_change); + } + break; + + case svn_fs_path_change_add: + case svn_fs_path_change_replace: + /* An add at this point must be following a previous delete, + so treat it just like a replace. Remember the original + deletion such that we are able to delete this path again + (the replacement may have changed node kind and id). */ + new_change = path_change_dup(change, pool); + new_change->change_kind = svn_fs_path_change_replace; + + apr_hash_set(changed_paths, path->data, path->len, new_change); + + /* Remember the original change. + * Make sure to allocate the hash key in a durable pool. */ + apr_hash_set(deletions, + apr_pstrmemdup(apr_hash_pool_get(deletions), + path->data, path->len), + path->len, old_change); + break; + + case svn_fs_path_change_modify: + default: + /* If the new change modifies some attribute of the node, set + the corresponding flag, whether it already was set or not. + Note: We do not reset a flag to FALSE if a change is undone. */ + if (change->text_mod) + old_change->text_mod = TRUE; + if (change->prop_mod) + old_change->prop_mod = TRUE; + if (change->mergeinfo_mod == svn_tristate_true) + old_change->mergeinfo_mod = svn_tristate_true; + break; + } + } + else + { + /* Add this path. The API makes no guarantees that this (new) key + will not be retained. Thus, we copy the key into the target pool + to ensure a proper lifetime. */ + new_change = path_change_dup(change, pool); + apr_hash_set(changed_paths, new_change->path.data, + new_change->path.len, new_change); + } + + return SVN_NO_ERROR; +} + +/* Baton type to be used with process_changes(). */ +typedef struct process_changes_baton_t +{ + /* Folded list of path changes. */ + apr_hash_t *changed_paths; + + /* Path changes that are deletions and have been turned into + replacements. If those replacements get deleted again, this + container contains the record that we have to revert to. */ + apr_hash_t *deletions; +} process_changes_baton_t; + +/* An implementation of svn_fs_x__change_receiver_t. + Examine all the changed path entries in CHANGES and store them in + *CHANGED_PATHS. Folding is done to remove redundant or unnecessary + data. Use SCRATCH_POOL for temporary allocations. */ +static svn_error_t * +process_changes(void *baton_p, + svn_fs_x__change_t *change, + apr_pool_t *scratch_pool) +{ + process_changes_baton_t *baton = baton_p; + + SVN_ERR(fold_change(baton->changed_paths, baton->deletions, change)); + + /* Now, if our change was a deletion or replacement, we have to + blow away any changes thus far on paths that are (or, were) + children of this path. + ### i won't bother with another iteration pool here -- at + most we talking about a few extra dups of paths into what + is already a temporary subpool. + */ + + if ((change->change_kind == svn_fs_path_change_delete) + || (change->change_kind == svn_fs_path_change_replace)) + { + apr_hash_index_t *hi; + + /* a potential child path must contain at least 2 more chars + (the path separator plus at least one char for the name). + Also, we should not assume that all paths have been normalized + i.e. some might have trailing path separators. 
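+ For example, a change to "/trunk" (6 chars, no trailing '/') can only + affect children of at least 8 chars such as "/trunk/f"; had the path been + given as "/trunk/" (7 chars), only one additional char for the name would + be needed, again 8 in total.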
+ */ + apr_ssize_t path_len = change->path.len; + apr_ssize_t min_child_len = path_len == 0 + ? 1 + : change->path.data[path_len-1] == '/' + ? path_len + 1 + : path_len + 2; + + /* CAUTION: This is the inner loop of an O(n^2) algorithm. + The number of changes to process may be >> 1000. + Therefore, keep the inner loop as tight as possible. + */ + for (hi = apr_hash_first(scratch_pool, baton->changed_paths); + hi; + hi = apr_hash_next(hi)) + { + /* KEY is the path. */ + const void *path; + apr_ssize_t klen; + apr_hash_this(hi, &path, &klen, NULL); + + /* If we come across a child of our path, remove it. + Call svn_fspath__skip_ancestor only if there is a chance that + this is actually a sub-path. + */ + if (klen >= min_child_len) + { + const char *child; + + child = svn_fspath__skip_ancestor(change->path.data, path); + if (child && child[0] != '\0') + apr_hash_set(baton->changed_paths, path, klen, NULL); + } + } + } + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__txn_changes_fetch(apr_hash_t **changed_paths_p, + svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *pool) +{ + apr_file_t *file; + apr_hash_t *changed_paths = apr_hash_make(pool); + apr_pool_t *scratch_pool = svn_pool_create(pool); + process_changes_baton_t baton; + + baton.changed_paths = changed_paths; + baton.deletions = apr_hash_make(scratch_pool); + + SVN_ERR(svn_io_file_open(&file, + svn_fs_x__path_txn_changes(fs, txn_id, scratch_pool), + APR_READ | APR_BUFFERED, APR_OS_DEFAULT, + scratch_pool)); + + SVN_ERR(svn_fs_x__read_changes_incrementally( + svn_stream_from_aprfile2(file, TRUE, + scratch_pool), + process_changes, &baton, + scratch_pool)); + svn_pool_destroy(scratch_pool); + + *changed_paths_p = changed_paths; + + return SVN_NO_ERROR; +} + +/* Copy a revision node-rev SRC into the current transaction TXN_ID in + the filesystem FS. This is only used to create the root of a transaction. + Temporary allocations are from SCRATCH_POOL. */ +static svn_error_t * +create_new_txn_noderev_from_rev(svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + svn_fs_x__id_t *src, + apr_pool_t *scratch_pool) +{ + svn_fs_x__noderev_t *noderev; + SVN_ERR(svn_fs_x__get_node_revision(&noderev, fs, src, scratch_pool, + scratch_pool)); + + /* This must be a root node. */ + SVN_ERR_ASSERT( noderev->node_id.number == 0 + && noderev->copy_id.number == 0); + + if (svn_fs_x__is_txn(noderev->noderev_id.change_set)) + return svn_error_create(SVN_ERR_FS_CORRUPT, NULL, + _("Copying from transactions not allowed")); + + noderev->predecessor_id = noderev->noderev_id; + noderev->predecessor_count++; + noderev->copyfrom_path = NULL; + noderev->copyfrom_rev = SVN_INVALID_REVNUM; + + /* For the transaction root, the copyroot never changes. */ + svn_fs_x__init_txn_root(&noderev->noderev_id, txn_id); + + return svn_fs_x__put_node_revision(fs, noderev, scratch_pool); +} + +/* A structure used by get_and_increment_txn_key_body(). */ +typedef struct get_and_increment_txn_key_baton_t +{ + svn_fs_t *fs; + apr_uint64_t txn_number; +} get_and_increment_txn_key_baton_t; + +/* Callback used in the implementation of create_txn_dir(). This gets + the current base 36 value in PATH_TXN_CURRENT and increments it. + It returns the original value by the baton. 
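+ For example, if the txn-current file contains "z", the new transaction + gets number 35 and "10" (i.e. 36 in base 36) is written back.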
*/ +static svn_error_t * +get_and_increment_txn_key_body(void *baton, + apr_pool_t *scratch_pool) +{ + get_and_increment_txn_key_baton_t *cb = baton; + const char *txn_current_filename = svn_fs_x__path_txn_current(cb->fs, + scratch_pool); + const char *tmp_filename; + char new_id_str[SVN_INT64_BUFFER_SIZE]; + + svn_stringbuf_t *buf; + SVN_ERR(svn_fs_x__read_content(&buf, txn_current_filename, scratch_pool)); + + /* remove trailing newlines */ + cb->txn_number = svn__base36toui64(NULL, buf->data); + + /* Increment the key and add a trailing \n to the string so the + txn-current file has a newline in it. */ + SVN_ERR(svn_io_write_unique(&tmp_filename, + svn_dirent_dirname(txn_current_filename, + scratch_pool), + new_id_str, + svn__ui64tobase36(new_id_str, cb->txn_number+1), + svn_io_file_del_none, scratch_pool)); + SVN_ERR(svn_fs_x__move_into_place(tmp_filename, txn_current_filename, + txn_current_filename, scratch_pool)); + + return SVN_NO_ERROR; +} + +/* Create a unique directory for a transaction in FS based on revision REV. + Return the ID for this transaction in *ID_P and *TXN_ID. Use a sequence + value in the transaction ID to prevent reuse of transaction IDs. */ +static svn_error_t * +create_txn_dir(const char **id_p, + svn_fs_x__txn_id_t *txn_id, + svn_fs_t *fs, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + get_and_increment_txn_key_baton_t cb; + const char *txn_dir; + + /* Get the current transaction sequence value, which is a base-36 + number, from the txn-current file, and write an + incremented value back out to the file. Place the revision + number the transaction is based off into the transaction id. */ + cb.fs = fs; + SVN_ERR(svn_fs_x__with_txn_current_lock(fs, + get_and_increment_txn_key_body, + &cb, + scratch_pool)); + *txn_id = cb.txn_number; + + *id_p = svn_fs_x__txn_name(*txn_id, result_pool); + txn_dir = svn_fs_x__path_txn_dir(fs, *txn_id, scratch_pool); + + return svn_io_dir_make(txn_dir, APR_OS_DEFAULT, scratch_pool); +} + +/* Create a new transaction in filesystem FS, based on revision REV, + and store it in *TXN_P, allocated in RESULT_POOL. Allocate necessary + temporaries from SCRATCH_POOL. */ +static svn_error_t * +create_txn(svn_fs_txn_t **txn_p, + svn_fs_t *fs, + svn_revnum_t rev, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_txn_t *txn; + fs_txn_data_t *ftd; + svn_fs_x__id_t root_id; + + txn = apr_pcalloc(result_pool, sizeof(*txn)); + ftd = apr_pcalloc(result_pool, sizeof(*ftd)); + + /* Valid revision number? */ + SVN_ERR(svn_fs_x__ensure_revision_exists(rev, fs, scratch_pool)); + + /* Get the txn_id. */ + SVN_ERR(create_txn_dir(&txn->id, &ftd->txn_id, fs, result_pool, + scratch_pool)); + + txn->fs = fs; + txn->base_rev = rev; + + txn->vtable = &txn_vtable; + txn->fsap_data = ftd; + *txn_p = txn; + + /* Create a new root node for this transaction. */ + svn_fs_x__init_rev_root(&root_id, rev); + SVN_ERR(create_new_txn_noderev_from_rev(fs, ftd->txn_id, &root_id, + scratch_pool)); + + /* Create an empty rev file. */ + SVN_ERR(svn_io_file_create_empty( + svn_fs_x__path_txn_proto_rev(fs, ftd->txn_id, scratch_pool), + scratch_pool)); + + /* Create an empty rev-lock file. */ + SVN_ERR(svn_io_file_create_empty( + svn_fs_x__path_txn_proto_rev_lock(fs, ftd->txn_id, scratch_pool), + scratch_pool)); + + /* Create an empty changes file. */ + SVN_ERR(svn_io_file_create_empty( + svn_fs_x__path_txn_changes(fs, ftd->txn_id, scratch_pool), + scratch_pool)); + + /* Create the next-ids file. 
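+ Its initial content "0 0\n" stands for node-id 0 and copy-id 0, both + encoded in base 36.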
*/ + SVN_ERR(svn_io_file_create( + svn_fs_x__path_txn_next_ids(fs, ftd->txn_id, scratch_pool), + "0 0\n", scratch_pool)); + + return SVN_NO_ERROR; +} + +/* Store the property list for transaction TXN_ID in PROPLIST. + Perform temporary allocations in POOL. */ +static svn_error_t * +get_txn_proplist(apr_hash_t *proplist, + svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *pool) +{ + svn_stream_t *stream; + + /* Check for issue #3696. (When we find and fix the cause, we can change + * this to an assertion.) */ + if (txn_id == SVN_FS_X__INVALID_TXN_ID) + return svn_error_create(SVN_ERR_INCORRECT_PARAMS, NULL, + _("Internal error: a null transaction id was " + "passed to get_txn_proplist()")); + + /* Open the transaction properties file. */ + SVN_ERR(svn_stream_open_readonly(&stream, + svn_fs_x__path_txn_props(fs, txn_id, pool), + pool, pool)); + + /* Read in the property list. */ + SVN_ERR(svn_hash_read2(proplist, stream, SVN_HASH_TERMINATOR, pool)); + + return svn_stream_close(stream); +} + +/* Save the property list PROPS as the revprops for transaction TXN_ID + in FS. Perform temporary allocations in SCRATCH_POOL. */ +static svn_error_t * +set_txn_proplist(svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_hash_t *props, + svn_boolean_t final, + apr_pool_t *scratch_pool) +{ + svn_stringbuf_t *buf; + svn_stream_t *stream; + + /* Write out the new file contents to BUF. */ + buf = svn_stringbuf_create_ensure(1024, scratch_pool); + stream = svn_stream_from_stringbuf(buf, scratch_pool); + SVN_ERR(svn_hash_write2(props, stream, SVN_HASH_TERMINATOR, scratch_pool)); + SVN_ERR(svn_stream_close(stream)); + + /* Open the transaction properties file and write new contents to it. */ + SVN_ERR(svn_io_write_atomic((final + ? svn_fs_x__path_txn_props_final(fs, txn_id, + scratch_pool) + : svn_fs_x__path_txn_props(fs, txn_id, + scratch_pool)), + buf->data, buf->len, + NULL /* copy_perms_path */, scratch_pool)); + return SVN_NO_ERROR; +} + + +svn_error_t * +svn_fs_x__change_txn_prop(svn_fs_txn_t *txn, + const char *name, + const svn_string_t *value, + apr_pool_t *scratch_pool) +{ + apr_array_header_t *props = apr_array_make(scratch_pool, 1, + sizeof(svn_prop_t)); + svn_prop_t prop; + + prop.name = name; + prop.value = value; + APR_ARRAY_PUSH(props, svn_prop_t) = prop; + + return svn_fs_x__change_txn_props(txn, props, scratch_pool); +} + +svn_error_t * +svn_fs_x__change_txn_props(svn_fs_txn_t *txn, + const apr_array_header_t *props, + apr_pool_t *scratch_pool) +{ + fs_txn_data_t *ftd = txn->fsap_data; + apr_hash_t *txn_prop = apr_hash_make(scratch_pool); + int i; + svn_error_t *err; + + err = get_txn_proplist(txn_prop, txn->fs, ftd->txn_id, scratch_pool); + /* Here - and here only - we need to deal with the possibility that the + transaction property file doesn't yet exist. The rest of the + implementation assumes that the file exists, but we're called to set the + initial transaction properties as the transaction is being created. */ + if (err && (APR_STATUS_IS_ENOENT(err->apr_err))) + svn_error_clear(err); + else if (err) + return svn_error_trace(err); + + for (i = 0; i < props->nelts; i++) + { + svn_prop_t *prop = &APR_ARRAY_IDX(props, i, svn_prop_t); + + if (svn_hash_gets(txn_prop, SVN_FS__PROP_TXN_CLIENT_DATE) + && !strcmp(prop->name, SVN_PROP_REVISION_DATE)) + svn_hash_sets(txn_prop, SVN_FS__PROP_TXN_CLIENT_DATE, + svn_string_create("1", scratch_pool)); + + svn_hash_sets(txn_prop, prop->name, prop->value); + } + + /* Create a new version of the file and write out the new props. 
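+ set_txn_proplist() serializes the hash into a buffer first and writes it + out atomically, so concurrent readers never see a partially written file.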
*/ + /* Open the transaction properties file. */ + SVN_ERR(set_txn_proplist(txn->fs, ftd->txn_id, txn_prop, FALSE, + scratch_pool)); + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__get_txn(svn_fs_x__transaction_t **txn_p, + svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *pool) +{ + svn_fs_x__transaction_t *txn; + svn_fs_x__noderev_t *noderev; + svn_fs_x__id_t root_id; + + txn = apr_pcalloc(pool, sizeof(*txn)); + txn->proplist = apr_hash_make(pool); + + SVN_ERR(get_txn_proplist(txn->proplist, fs, txn_id, pool)); + svn_fs_x__init_txn_root(&root_id, txn_id); + + SVN_ERR(svn_fs_x__get_node_revision(&noderev, fs, &root_id, pool, pool)); + + txn->base_rev = svn_fs_x__get_revnum(noderev->predecessor_id.change_set); + txn->copies = NULL; + + *txn_p = txn; + + return SVN_NO_ERROR; +} + +/* If it is supported by the format of file system FS, store the (ITEM_INDEX, + * OFFSET) pair in the log-to-phys proto index file of transaction TXN_ID. + * Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +store_l2p_index_entry(svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_off_t offset, + apr_uint64_t item_index, + apr_pool_t *scratch_pool) +{ + const char *path = svn_fs_x__path_l2p_proto_index(fs, txn_id, scratch_pool); + apr_file_t *file; + SVN_ERR(svn_fs_x__l2p_proto_index_open(&file, path, scratch_pool)); + SVN_ERR(svn_fs_x__l2p_proto_index_add_entry(file, offset, 0, + item_index, scratch_pool)); + SVN_ERR(svn_io_file_close(file, scratch_pool)); + + return SVN_NO_ERROR; +} + +/* If it is supported by the format of file system FS, store ENTRY in the + * phys-to-log proto index file of transaction TXN_ID. + * Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +store_p2l_index_entry(svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + svn_fs_x__p2l_entry_t *entry, + apr_pool_t *scratch_pool) +{ + const char *path = svn_fs_x__path_p2l_proto_index(fs, txn_id, scratch_pool); + apr_file_t *file; + SVN_ERR(svn_fs_x__p2l_proto_index_open(&file, path, scratch_pool)); + SVN_ERR(svn_fs_x__p2l_proto_index_add_entry(file, entry, scratch_pool)); + SVN_ERR(svn_io_file_close(file, scratch_pool)); + + return SVN_NO_ERROR; +} + +/* Allocate an item index in the transaction TXN_ID of file system FS and + * return it in *ITEM_INDEX. Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +allocate_item_index(apr_uint64_t *item_index, + svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *scratch_pool) +{ + apr_file_t *file; + char buffer[SVN_INT64_BUFFER_SIZE] = { 0 }; + svn_boolean_t eof = FALSE; + apr_size_t to_write; + apr_size_t read; + apr_off_t offset = 0; + + /* read number */ + SVN_ERR(svn_io_file_open(&file, + svn_fs_x__path_txn_item_index(fs, txn_id, + scratch_pool), + APR_READ | APR_WRITE + | APR_CREATE | APR_BUFFERED, + APR_OS_DEFAULT, scratch_pool)); + SVN_ERR(svn_io_file_read_full2(file, buffer, sizeof(buffer)-1, + &read, &eof, scratch_pool)); + if (read) + SVN_ERR(svn_cstring_atoui64(item_index, buffer)); + else + *item_index = SVN_FS_X__ITEM_INDEX_FIRST_USER; + + /* increment it */ + to_write = svn__ui64toa(buffer, *item_index + 1); + + /* write it back to disk */ + SVN_ERR(svn_io_file_seek(file, APR_SET, &offset, scratch_pool)); + SVN_ERR(svn_io_file_write_full(file, buffer, to_write, NULL, scratch_pool)); + SVN_ERR(svn_io_file_close(file, scratch_pool)); + + return SVN_NO_ERROR; +} + +/* Write out the currently available next node_id NODE_ID and copy_id + COPY_ID for transaction TXN_ID in filesystem FS. 
The next node-id is + used both for creating new unique nodes for the given transaction, as + well as uniquifying representations. Perform temporary allocations in + SCRATCH_POOL. */ +static svn_error_t * +write_next_ids(svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_uint64_t node_id, + apr_uint64_t copy_id, + apr_pool_t *scratch_pool) +{ + apr_file_t *file; + char buffer[2 * SVN_INT64_BUFFER_SIZE + 2]; + char *p = buffer; + + p += svn__ui64tobase36(p, node_id); + *(p++) = ' '; + p += svn__ui64tobase36(p, copy_id); + *(p++) = '\n'; + *(p++) = '\0'; + + SVN_ERR(svn_io_file_open(&file, + svn_fs_x__path_txn_next_ids(fs, txn_id, + scratch_pool), + APR_WRITE | APR_TRUNCATE, + APR_OS_DEFAULT, scratch_pool)); + SVN_ERR(svn_io_file_write_full(file, buffer, p - buffer, NULL, + scratch_pool)); + return svn_io_file_close(file, scratch_pool); +} + +/* Find out what the next unique node-id and copy-id are for + transaction TXN_ID in filesystem FS. Store the results in *NODE_ID + and *COPY_ID. The next node-id is used both for creating new unique + nodes for the given transaction, as well as uniquifying representations. + Perform temporary allocations in SCRATCH_POOL. */ +static svn_error_t * +read_next_ids(apr_uint64_t *node_id, + apr_uint64_t *copy_id, + svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *scratch_pool) +{ + svn_stringbuf_t *buf; + const char *str; + SVN_ERR(svn_fs_x__read_content(&buf, + svn_fs_x__path_txn_next_ids(fs, txn_id, + scratch_pool), + scratch_pool)); + + /* Parse this into two separate strings. */ + + str = buf->data; + *node_id = svn__base36toui64(&str, str); + if (*str != ' ') + return svn_error_create(SVN_ERR_FS_CORRUPT, NULL, + _("next-id file corrupt")); + + ++str; + *copy_id = svn__base36toui64(&str, str); + if (*str != '\n') + return svn_error_create(SVN_ERR_FS_CORRUPT, NULL, + _("next-id file corrupt")); + + return SVN_NO_ERROR; +} + +/* Get a new and unique to this transaction node-id for transaction + TXN_ID in filesystem FS. Store the new node-id in *NODE_ID_P. + Node-ids are guaranteed to be unique to this transaction, but may + not necessarily be sequential. + Perform temporary allocations in SCRATCH_POOL. */ +static svn_error_t * +get_new_txn_node_id(svn_fs_x__id_t *node_id_p, + svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *scratch_pool) +{ + apr_uint64_t node_id, copy_id; + + /* First read in the current next-ids file. */ + SVN_ERR(read_next_ids(&node_id, &copy_id, fs, txn_id, scratch_pool)); + + node_id_p->change_set = svn_fs_x__change_set_by_txn(txn_id); + node_id_p->number = node_id; + + SVN_ERR(write_next_ids(fs, txn_id, ++node_id, copy_id, scratch_pool)); + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__reserve_copy_id(svn_fs_x__id_t *copy_id_p, + svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *scratch_pool) +{ + apr_uint64_t node_id, copy_id; + + /* First read in the current next-ids file. */ + SVN_ERR(read_next_ids(&node_id, &copy_id, fs, txn_id, scratch_pool)); + + copy_id_p->change_set = svn_fs_x__change_set_by_txn(txn_id); + copy_id_p->number = copy_id; + + SVN_ERR(write_next_ids(fs, txn_id, node_id, ++copy_id, scratch_pool)); + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__create_node(svn_fs_t *fs, + svn_fs_x__noderev_t *noderev, + const svn_fs_x__id_t *copy_id, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *scratch_pool) +{ + /* Get a new node-id for this node. */ + SVN_ERR(get_new_txn_node_id(&noderev->node_id, fs, txn_id, scratch_pool)); + + /* Assign copy-id.
*/ + noderev->copy_id = *copy_id; + + /* Noderev-id = Change set and item number within this change set. */ + noderev->noderev_id.change_set = svn_fs_x__change_set_by_txn(txn_id); + SVN_ERR(allocate_item_index(&noderev->noderev_id.number, fs, txn_id, + scratch_pool)); + + SVN_ERR(svn_fs_x__put_node_revision(fs, noderev, scratch_pool)); + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__purge_txn(svn_fs_t *fs, + const char *txn_id_str, + apr_pool_t *scratch_pool) +{ + svn_fs_x__txn_id_t txn_id; + SVN_ERR(svn_fs_x__txn_by_name(&txn_id, txn_id_str)); + + /* Remove the shared transaction object associated with this transaction. */ + SVN_ERR(purge_shared_txn(fs, txn_id, scratch_pool)); + /* Remove the directory associated with this transaction. */ + SVN_ERR(svn_io_remove_dir2(svn_fs_x__path_txn_dir(fs, txn_id, scratch_pool), + FALSE, NULL, NULL, scratch_pool)); + + /* Delete protorev and its lock, which aren't in the txn + directory. It's OK if they don't exist (for example, if this + is post-commit and the proto-rev has been moved into + place). */ + SVN_ERR(svn_io_remove_file2( + svn_fs_x__path_txn_proto_rev(fs, txn_id, scratch_pool), + TRUE, scratch_pool)); + SVN_ERR(svn_io_remove_file2( + svn_fs_x__path_txn_proto_rev_lock(fs, txn_id, scratch_pool), + TRUE, scratch_pool)); + + return SVN_NO_ERROR; +} + + +svn_error_t * +svn_fs_x__abort_txn(svn_fs_txn_t *txn, + apr_pool_t *scratch_pool) +{ + SVN_ERR(svn_fs__check_fs(txn->fs, TRUE)); + + /* Now, purge the transaction. */ + SVN_ERR_W(svn_fs_x__purge_txn(txn->fs, txn->id, scratch_pool), + apr_psprintf(scratch_pool, _("Transaction '%s' cleanup failed"), + txn->id)); + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__set_entry(svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + svn_fs_x__noderev_t *parent_noderev, + const char *name, + const svn_fs_x__id_t *id, + svn_node_kind_t kind, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_x__representation_t *rep = parent_noderev->data_rep; + const char *filename + = svn_fs_x__path_txn_node_children(fs, &parent_noderev->noderev_id, + scratch_pool, scratch_pool); + apr_file_t *file; + svn_stream_t *out; + svn_fs_x__data_t *ffd = fs->fsap_data; + apr_pool_t *subpool = svn_pool_create(scratch_pool); + + if (!rep || !svn_fs_x__is_txn(rep->id.change_set)) + { + apr_array_header_t *entries; + + /* Before we can modify the directory, we need to dump its old + contents into a mutable representation file. */ + SVN_ERR(svn_fs_x__rep_contents_dir(&entries, fs, parent_noderev, + subpool, subpool)); + SVN_ERR(svn_io_file_open(&file, filename, + APR_WRITE | APR_CREATE | APR_BUFFERED, + APR_OS_DEFAULT, scratch_pool)); + out = svn_stream_from_aprfile2(file, TRUE, scratch_pool); + SVN_ERR(unparse_dir_entries(entries, out, subpool)); + + svn_pool_clear(subpool); + + /* Provide the parent with a data rep if it had none before + (directories so far empty). */ + if (!rep) + { + rep = apr_pcalloc(result_pool, sizeof(*rep)); + parent_noderev->data_rep = rep; + } + + /* Mark the node-rev's data rep as mutable. */ + rep->id.change_set = svn_fs_x__change_set_by_txn(txn_id); + rep->id.number = SVN_FS_X__ITEM_INDEX_UNUSED; + + /* Save noderev to disk. */ + SVN_ERR(svn_fs_x__put_node_revision(fs, parent_noderev, subpool)); + } + else + { + /* The directory rep is already mutable, so just open it for append. 
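+ Entries are then recorded incrementally: "K" / "V" records (see + unparse_dir_entry) add or update an entry while "D" records mark a + deletion.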
*/ + SVN_ERR(svn_io_file_open(&file, filename, APR_WRITE | APR_APPEND, + APR_OS_DEFAULT, scratch_pool)); + out = svn_stream_from_aprfile2(file, TRUE, scratch_pool); + } + + /* update directory cache */ + { + /* build parameters: (name, new entry) pair */ + const svn_fs_x__id_t *key = &(parent_noderev->noderev_id); + replace_baton_t baton; + + baton.name = name; + baton.new_entry = NULL; + + if (id) + { + baton.new_entry = apr_pcalloc(subpool, sizeof(*baton.new_entry)); + baton.new_entry->name = name; + baton.new_entry->kind = kind; + baton.new_entry->id = *id; + } + + /* actually update the cached directory (if cached) */ + SVN_ERR(svn_cache__set_partial(ffd->dir_cache, key, + svn_fs_x__replace_dir_entry, &baton, + subpool)); + } + svn_pool_clear(subpool); + + /* Append an incremental hash entry for the entry change. */ + if (id) + { + svn_fs_x__dirent_t entry; + entry.name = name; + entry.id = *id; + entry.kind = kind; + + SVN_ERR(unparse_dir_entry(&entry, out, subpool)); + } + else + { + SVN_ERR(svn_stream_printf(out, subpool, "D %" APR_SIZE_T_FMT "\n%s\n", + strlen(name), name)); + } + + SVN_ERR(svn_io_file_close(file, subpool)); + svn_pool_destroy(subpool); + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__add_change(svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + const char *path, + const svn_fs_x__id_t *id, + svn_fs_path_change_kind_t change_kind, + svn_boolean_t text_mod, + svn_boolean_t prop_mod, + svn_boolean_t mergeinfo_mod, + svn_node_kind_t node_kind, + svn_revnum_t copyfrom_rev, + const char *copyfrom_path, + apr_pool_t *scratch_pool) +{ + apr_file_t *file; + svn_fs_x__change_t change; + apr_hash_t *changes = apr_hash_make(scratch_pool); + + /* Not using APR_BUFFERED to append change in one atomic write operation. */ + SVN_ERR(svn_io_file_open(&file, + svn_fs_x__path_txn_changes(fs, txn_id, + scratch_pool), + APR_APPEND | APR_WRITE | APR_CREATE, + APR_OS_DEFAULT, scratch_pool)); + + change.path.data = path; + change.path.len = strlen(path); + change.noderev_id = *id; + change.change_kind = change_kind; + change.text_mod = text_mod; + change.prop_mod = prop_mod; + change.mergeinfo_mod = mergeinfo_mod ? svn_tristate_true + : svn_tristate_false; + change.node_kind = node_kind; + change.copyfrom_known = TRUE; + change.copyfrom_rev = copyfrom_rev; + if (copyfrom_path) + change.copyfrom_path = apr_pstrdup(scratch_pool, copyfrom_path); + + svn_hash_sets(changes, path, &change); + SVN_ERR(svn_fs_x__write_changes(svn_stream_from_aprfile2(file, TRUE, + scratch_pool), + fs, changes, FALSE, scratch_pool)); + + return svn_io_file_close(file, scratch_pool); +} + +/* This baton is used by the representation writing streams. It keeps + track of the checksum information as well as the total size of the + representation so far. */ +typedef struct rep_write_baton_t +{ + /* The FS we are writing to. */ + svn_fs_t *fs; + + /* Actual file to which we are writing. */ + svn_stream_t *rep_stream; + + /* A stream from the delta combiner. Data written here gets + deltified, then eventually written to rep_stream. */ + svn_stream_t *delta_stream; + + /* Where is this representation header stored. */ + apr_off_t rep_offset; + + /* Start of the actual data. */ + apr_off_t delta_start; + + /* How many bytes have been written to this rep already. */ + svn_filesize_t rep_size; + + /* The node revision for which we're writing out info. */ + svn_fs_x__noderev_t *noderev; + + /* Actual output file. */ + apr_file_t *file; + /* Lock 'cookie' used to unlock the output file once we've finished + writing to it. 
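+     The cookie is obtained from get_writable_proto_rev() and must be
+     handed back to unlock_proto_rev() once writing has finished.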
*/ + void *lockcookie; + + svn_checksum_ctx_t *md5_checksum_ctx; + svn_checksum_ctx_t *sha1_checksum_ctx; + + /* Receives the low-level checksum when closing REP_STREAM. */ + apr_uint32_t fnv1a_checksum; + + /* Local pool, available for allocations that must remain valid as long + as this baton is used but may be cleaned up immediately afterwards. */ + apr_pool_t *local_pool; + + /* Outer / result pool. */ + apr_pool_t *result_pool; +} rep_write_baton_t; + +/* Handler for the write method of the representation writable stream. + BATON is a rep_write_baton_t, DATA is the data to write, and *LEN is + the length of this data. */ +static svn_error_t * +rep_write_contents(void *baton, + const char *data, + apr_size_t *len) +{ + rep_write_baton_t *b = baton; + + SVN_ERR(svn_checksum_update(b->md5_checksum_ctx, data, *len)); + SVN_ERR(svn_checksum_update(b->sha1_checksum_ctx, data, *len)); + b->rep_size += *len; + + return svn_stream_write(b->delta_stream, data, len); +} + +/* Set *SPANNED to the number of shards touched when walking WALK steps on + * NODEREV's predecessor chain in FS. + * Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +shards_spanned(int *spanned, + svn_fs_t *fs, + svn_fs_x__noderev_t *noderev, + int walk, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + int shard_size = ffd->max_files_per_dir; + apr_pool_t *iterpool; + + int count = walk ? 1 : 0; /* The start of a walk already touches a shard. */ + svn_revnum_t shard, last_shard = ffd->youngest_rev_cache / shard_size; + iterpool = svn_pool_create(scratch_pool); + while (walk-- && noderev->predecessor_count) + { + svn_fs_x__id_t id = noderev->predecessor_id; + + svn_pool_clear(iterpool); + SVN_ERR(svn_fs_x__get_node_revision(&noderev, fs, &id, scratch_pool, + iterpool)); + shard = svn_fs_x__get_revnum(id.change_set) / shard_size; + if (shard != last_shard) + { + ++count; + last_shard = shard; + } + } + svn_pool_destroy(iterpool); + + *spanned = count; + return SVN_NO_ERROR; +} + +/* Given a node-revision NODEREV in filesystem FS, return the + representation in *REP to use as the base for a text representation + delta if PROPS is FALSE. If PROPS has been set, a suitable props + base representation will be returned. Perform temporary allocations + in *POOL. */ +static svn_error_t * +choose_delta_base(svn_fs_x__representation_t **rep, + svn_fs_t *fs, + svn_fs_x__noderev_t *noderev, + svn_boolean_t props, + apr_pool_t *pool) +{ + /* The zero-based index (counting from the "oldest" end), along NODEREVs line + * predecessors, of the node-rev we will use as delta base. */ + int count; + /* The length of the linear part of a delta chain. (Delta chains use + * skip-delta bits for the high-order bits and are linear in the low-order + * bits.) */ + int walk; + svn_fs_x__noderev_t *base; + svn_fs_x__data_t *ffd = fs->fsap_data; + apr_pool_t *iterpool; + + /* If we have no predecessors, or that one is empty, then use the empty + * stream as a base. */ + if (! noderev->predecessor_count) + { + *rep = NULL; + return SVN_NO_ERROR; + } + + /* Flip the rightmost '1' bit of the predecessor count to determine + which file rev (counting from 0) we want to use. (To see why + count & (count - 1) unsets the rightmost set bit, think about how + you decrement a binary number.) */ + count = noderev->predecessor_count; + count = count & (count - 1); + + /* Finding the delta base over a very long distance can become extremely + expensive for very deep histories, possibly causing client timeouts etc. 
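+     (Every step of such a walk reads yet another noderev from disk, so
+     the cost grows linearly with the distance covered.)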
+     OTOH, this is a rare operation and its gains are minimal.  Let's simply
+     start deltification from scratch roughly every 1000 changes or so.  */
+  walk = noderev->predecessor_count - count;
+  if (walk > (int)ffd->max_deltification_walk)
+    {
+      *rep = NULL;
+      return SVN_NO_ERROR;
+    }
+
+  /* We use skip delta for limiting the number of delta operations
+     along very long node histories.  Close to HEAD however, we create
+     a linear history to minimize delta size.  */
+  if (walk < (int)ffd->max_linear_deltification)
+    {
+      int shards;
+      SVN_ERR(shards_spanned(&shards, fs, noderev, walk, pool));
+
+      /* We also don't want the linear deltification to span more shards
+         than if deltas were used in a simple skip-delta scheme.  */
+      if ((1 << (--shards)) <= walk)
+        count = noderev->predecessor_count - 1;
+    }
+
+  /* Walk back a number of predecessors equal to the difference
+     between count and the original predecessor count.  (For example,
+     if noderev has ten predecessors and we want the eighth file rev,
+     walk back two predecessors.)  */
+  base = noderev;
+  iterpool = svn_pool_create(pool);
+  while ((count++) < noderev->predecessor_count)
+    {
+      svn_fs_x__id_t id = noderev->predecessor_id;
+      svn_pool_clear(iterpool);
+      SVN_ERR(svn_fs_x__get_node_revision(&base, fs, &id, pool, iterpool));
+    }
+  svn_pool_destroy(iterpool);
+
+  /* return a suitable base representation */
+  *rep = props ? base->prop_rep : base->data_rep;
+
+  /* if we encountered a shared rep, its parent chain may be different
+   * from the node-rev parent chain. */
+  if (*rep)
+    {
+      int chain_length = 0;
+      int shard_count = 0;
+
+      /* Very short rep bases are simply not worth it as we are unlikely
+       * to recoup the deltification space overhead of 20+ bytes. */
+      svn_filesize_t rep_size = (*rep)->expanded_size
+                              ? (*rep)->expanded_size
+                              : (*rep)->size;
+      if (rep_size < 64)
+        {
+          *rep = NULL;
+          return SVN_NO_ERROR;
+        }
+
+      /* Check whether the length of the deltification chain is acceptable.
+       * Otherwise, shared reps may form a non-skipping delta chain in
+       * extreme cases. */
+      SVN_ERR(svn_fs_x__rep_chain_length(&chain_length, &shard_count,
+                                         *rep, fs, pool));
+
+      /* Some reasonable limit, depending on how acceptable longer linear
+       * chains are in this repo.  Also, allow for some minimal chain. */
+      if (chain_length >= 2 * (int)ffd->max_linear_deltification + 2)
+        *rep = NULL;
+      else
+        /* To make it worth opening additional shards / pack files, we
+         * require that the reps have a certain minimal size.  To deltify
+         * against a rep in a different shard, the lower limit is 512 bytes
+         * and doubles with every extra shard to visit along the delta
+         * chain. */
+        if (   shard_count > 1
+            && ((svn_filesize_t)128 << shard_count) >= rep_size)
+          *rep = NULL;
+    }
+
+  return SVN_NO_ERROR;
+}
+
+/* Something went wrong and the pool for the rep write is being
+   cleared before we've finished writing the rep.  So we need
+   to remove the rep from the protorevfile and we need to unlock
+   the protorevfile. */
+static apr_status_t
+rep_write_cleanup(void *data)
+{
+  svn_error_t *err;
+  rep_write_baton_t *b = data;
+  svn_fs_x__txn_id_t txn_id
+    = svn_fs_x__get_txn_id(b->noderev->noderev_id.change_set);
+
+  /* Truncate and close the protorevfile.
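+     Truncating back to REP_OFFSET removes the partially written
+     representation from the proto-rev file.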
*/ + err = svn_io_file_trunc(b->file, b->rep_offset, b->local_pool); + err = svn_error_compose_create(err, svn_io_file_close(b->file, + b->local_pool)); + + /* Remove our lock regardless of any preceding errors so that the + being_written flag is always removed and stays consistent with the + file lock which will be removed no matter what since the pool is + going away. */ + err = svn_error_compose_create(err, + unlock_proto_rev(b->fs, txn_id, + b->lockcookie, + b->local_pool)); + if (err) + { + apr_status_t rc = err->apr_err; + svn_error_clear(err); + return rc; + } + + return APR_SUCCESS; +} + +/* Get a rep_write_baton_t, allocated from RESULT_POOL, and store it in + WB_P for the representation indicated by NODEREV in filesystem FS. + Only appropriate for file contents, not for props or directory contents. + */ +static svn_error_t * +rep_write_get_baton(rep_write_baton_t **wb_p, + svn_fs_t *fs, + svn_fs_x__noderev_t *noderev, + apr_pool_t *result_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + rep_write_baton_t *b; + apr_file_t *file; + svn_fs_x__representation_t *base_rep; + svn_stream_t *source; + svn_txdelta_window_handler_t wh; + void *whb; + int diff_version = 1; + svn_fs_x__rep_header_t header = { 0 }; + svn_fs_x__txn_id_t txn_id + = svn_fs_x__get_txn_id(noderev->noderev_id.change_set); + + b = apr_pcalloc(result_pool, sizeof(*b)); + + b->sha1_checksum_ctx = svn_checksum_ctx_create(svn_checksum_sha1, + result_pool); + b->md5_checksum_ctx = svn_checksum_ctx_create(svn_checksum_md5, + result_pool); + + b->fs = fs; + b->result_pool = result_pool; + b->local_pool = svn_pool_create(result_pool); + b->rep_size = 0; + b->noderev = noderev; + + /* Open the prototype rev file and seek to its end. */ + SVN_ERR(get_writable_proto_rev(&file, &b->lockcookie, fs, txn_id, + b->local_pool)); + + b->file = file; + b->rep_stream = svn_checksum__wrap_write_stream_fnv1a_32x4( + &b->fnv1a_checksum, + svn_stream_from_aprfile2(file, TRUE, + b->local_pool), + b->local_pool); + + SVN_ERR(svn_fs_x__get_file_offset(&b->rep_offset, file, b->local_pool)); + + /* Get the base for this delta. */ + SVN_ERR(choose_delta_base(&base_rep, fs, noderev, FALSE, b->local_pool)); + SVN_ERR(svn_fs_x__get_contents(&source, fs, base_rep, TRUE, + b->local_pool)); + + /* Write out the rep header. */ + if (base_rep) + { + header.base_revision = svn_fs_x__get_revnum(base_rep->id.change_set); + header.base_item_index = base_rep->id.number; + header.base_length = base_rep->size; + header.type = svn_fs_x__rep_delta; + } + else + { + header.type = svn_fs_x__rep_self_delta; + } + SVN_ERR(svn_fs_x__write_rep_header(&header, b->rep_stream, + b->local_pool)); + + /* Now determine the offset of the actual svndiff data. */ + SVN_ERR(svn_fs_x__get_file_offset(&b->delta_start, file, + b->local_pool)); + + /* Cleanup in case something goes wrong. */ + apr_pool_cleanup_register(b->local_pool, b, rep_write_cleanup, + apr_pool_cleanup_null); + + /* Prepare to write the svndiff data. */ + svn_txdelta_to_svndiff3(&wh, + &whb, + svn_stream_disown(b->rep_stream, b->result_pool), + diff_version, + ffd->delta_compression_level, + result_pool); + + b->delta_stream = svn_txdelta_target_push(wh, whb, source, + b->result_pool); + + *wb_p = b; + + return SVN_NO_ERROR; +} + +/* For REP->SHA1_CHECKSUM, try to find an already existing representation + in FS and return it in *OUT_REP. If no such representation exists or + if rep sharing has been disabled for FS, NULL will be returned. 
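+   Lookups are attempted, in order, against REPS_HASH, the rep-cache
+   database of FS and, finally, a file in the txn directory named after
+   the SHA1 checksum.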
Since + there may be new duplicate representations within the same uncommitted + revision, those can be passed in REPS_HASH (maps a sha1 digest onto + svn_fs_x__representation_t*), otherwise pass in NULL for REPS_HASH. + Use RESULT_POOL for *OLD_REP allocations and SCRATCH_POOL for temporaries. + The lifetime of *OLD_REP is limited by both, RESULT_POOL and REP lifetime. + */ +static svn_error_t * +get_shared_rep(svn_fs_x__representation_t **old_rep, + svn_fs_t *fs, + svn_fs_x__representation_t *rep, + apr_hash_t *reps_hash, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_error_t *err; + svn_fs_x__data_t *ffd = fs->fsap_data; + + /* Return NULL, if rep sharing has been disabled. */ + *old_rep = NULL; + if (!ffd->rep_sharing_allowed) + return SVN_NO_ERROR; + + /* Check and see if we already have a representation somewhere that's + identical to the one we just wrote out. Start with the hash lookup + because it is cheepest. */ + if (reps_hash) + *old_rep = apr_hash_get(reps_hash, + rep->sha1_digest, + APR_SHA1_DIGESTSIZE); + + /* If we haven't found anything yet, try harder and consult our DB. */ + if (*old_rep == NULL) + { + svn_checksum_t checksum; + checksum.digest = rep->sha1_digest; + checksum.kind = svn_checksum_sha1; + err = svn_fs_x__get_rep_reference(old_rep, fs, &checksum, result_pool, + scratch_pool); + + /* ### Other error codes that we shouldn't mask out? */ + if (err == SVN_NO_ERROR) + { + if (*old_rep) + SVN_ERR(svn_fs_x__check_rep(*old_rep, fs, scratch_pool)); + } + else if (err->apr_err == SVN_ERR_FS_CORRUPT + || SVN_ERROR_IN_CATEGORY(err->apr_err, + SVN_ERR_MALFUNC_CATEGORY_START)) + { + /* Fatal error; don't mask it. + + In particular, this block is triggered when the rep-cache refers + to revisions in the future. We signal that as a corruption situation + since, once those revisions are less than youngest (because of more + commits), the rep-cache would be invalid. + */ + SVN_ERR(err); + } + else + { + /* Something's wrong with the rep-sharing index. We can continue + without rep-sharing, but warn. + */ + (fs->warning)(fs->warning_baton, err); + svn_error_clear(err); + *old_rep = NULL; + } + } + + /* look for intra-revision matches (usually data reps but not limited + to them in case props happen to look like some data rep) + */ + if (*old_rep == NULL && svn_fs_x__is_txn(rep->id.change_set)) + { + svn_node_kind_t kind; + const char *file_name + = svn_fs_x__path_txn_sha1(fs, + svn_fs_x__get_txn_id(rep->id.change_set), + rep->sha1_digest, scratch_pool); + + /* in our txn, is there a rep file named with the wanted SHA1? + If so, read it and use that rep. + */ + SVN_ERR(svn_io_check_path(file_name, &kind, scratch_pool)); + if (kind == svn_node_file) + { + svn_stringbuf_t *rep_string; + SVN_ERR(svn_stringbuf_from_file2(&rep_string, file_name, + scratch_pool)); + SVN_ERR(svn_fs_x__parse_representation(old_rep, rep_string, + result_pool, scratch_pool)); + } + } + + /* Add information that is missing in the cached data. */ + if (*old_rep) + { + /* Use the old rep for this content. */ + memcpy((*old_rep)->md5_digest, rep->md5_digest, sizeof(rep->md5_digest)); + } + + return SVN_NO_ERROR; +} + +/* Copy the hash sum calculation results from MD5_CTX, SHA1_CTX into REP. + * Use SCRATCH_POOL for temporary allocations. 
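+ * REP->HAS_SHA1 is set only if the SHA1 context actually produced a
+ * checksum.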
+ */ +static svn_error_t * +digests_final(svn_fs_x__representation_t *rep, + const svn_checksum_ctx_t *md5_ctx, + const svn_checksum_ctx_t *sha1_ctx, + apr_pool_t *scratch_pool) +{ + svn_checksum_t *checksum; + + SVN_ERR(svn_checksum_final(&checksum, md5_ctx, scratch_pool)); + memcpy(rep->md5_digest, checksum->digest, svn_checksum_size(checksum)); + SVN_ERR(svn_checksum_final(&checksum, sha1_ctx, scratch_pool)); + rep->has_sha1 = checksum != NULL; + if (rep->has_sha1) + memcpy(rep->sha1_digest, checksum->digest, svn_checksum_size(checksum)); + + return SVN_NO_ERROR; +} + +/* Close handler for the representation write stream. BATON is a + rep_write_baton_t. Writes out a new node-rev that correctly + references the representation we just finished writing. */ +static svn_error_t * +rep_write_contents_close(void *baton) +{ + rep_write_baton_t *b = baton; + svn_fs_x__representation_t *rep; + svn_fs_x__representation_t *old_rep; + apr_off_t offset; + apr_int64_t txn_id; + + rep = apr_pcalloc(b->result_pool, sizeof(*rep)); + + /* Close our delta stream so the last bits of svndiff are written + out. */ + SVN_ERR(svn_stream_close(b->delta_stream)); + + /* Determine the length of the svndiff data. */ + SVN_ERR(svn_fs_x__get_file_offset(&offset, b->file, b->local_pool)); + rep->size = offset - b->delta_start; + + /* Fill in the rest of the representation field. */ + rep->expanded_size = b->rep_size; + txn_id = svn_fs_x__get_txn_id(b->noderev->noderev_id.change_set); + rep->id.change_set = svn_fs_x__change_set_by_txn(txn_id); + + /* Finalize the checksum. */ + SVN_ERR(digests_final(rep, b->md5_checksum_ctx, b->sha1_checksum_ctx, + b->result_pool)); + + /* Check and see if we already have a representation somewhere that's + identical to the one we just wrote out. */ + SVN_ERR(get_shared_rep(&old_rep, b->fs, rep, NULL, b->result_pool, + b->local_pool)); + + if (old_rep) + { + /* We need to erase from the protorev the data we just wrote. */ + SVN_ERR(svn_io_file_trunc(b->file, b->rep_offset, b->local_pool)); + + /* Use the old rep for this content. */ + b->noderev->data_rep = old_rep; + } + else + { + /* Write out our cosmetic end marker. */ + SVN_ERR(svn_stream_puts(b->rep_stream, "ENDREP\n")); + SVN_ERR(allocate_item_index(&rep->id.number, b->fs, txn_id, + b->local_pool)); + SVN_ERR(store_l2p_index_entry(b->fs, txn_id, b->rep_offset, + rep->id.number, b->local_pool)); + + b->noderev->data_rep = rep; + } + + SVN_ERR(svn_stream_close(b->rep_stream)); + + /* Remove cleanup callback. */ + apr_pool_cleanup_kill(b->local_pool, b, rep_write_cleanup); + + /* Write out the new node-rev information. 
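+     For a newly created rep (OLD_REP is NULL), the SHA1-to-rep mapping
+     and the P2L index entry are written below as well.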
*/ + SVN_ERR(svn_fs_x__put_node_revision(b->fs, b->noderev, b->local_pool)); + if (!old_rep) + { + svn_fs_x__p2l_entry_t entry; + svn_fs_x__id_t noderev_id; + noderev_id.change_set = SVN_FS_X__INVALID_CHANGE_SET; + noderev_id.number = rep->id.number; + + entry.offset = b->rep_offset; + SVN_ERR(svn_fs_x__get_file_offset(&offset, b->file, b->local_pool)); + entry.size = offset - b->rep_offset; + entry.type = SVN_FS_X__ITEM_TYPE_FILE_REP; + entry.item_count = 1; + entry.items = &noderev_id; + entry.fnv1_checksum = b->fnv1a_checksum; + + SVN_ERR(store_sha1_rep_mapping(b->fs, b->noderev, b->local_pool)); + SVN_ERR(store_p2l_index_entry(b->fs, txn_id, &entry, b->local_pool)); + } + + SVN_ERR(svn_io_file_close(b->file, b->local_pool)); + SVN_ERR(unlock_proto_rev(b->fs, txn_id, b->lockcookie, b->local_pool)); + svn_pool_destroy(b->local_pool); + + return SVN_NO_ERROR; +} + +/* Store a writable stream in *CONTENTS_P, allocated in RESULT_POOL, that + will receive all data written and store it as the file data representation + referenced by NODEREV in filesystem FS. Only appropriate for file data, + not props or directory contents. */ +static svn_error_t * +set_representation(svn_stream_t **contents_p, + svn_fs_t *fs, + svn_fs_x__noderev_t *noderev, + apr_pool_t *result_pool) +{ + rep_write_baton_t *wb; + + if (! svn_fs_x__is_txn(noderev->noderev_id.change_set)) + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + _("Attempted to write to non-transaction '%s'"), + svn_fs_x__id_unparse(&noderev->noderev_id, + result_pool)->data); + + SVN_ERR(rep_write_get_baton(&wb, fs, noderev, result_pool)); + + *contents_p = svn_stream_create(wb, result_pool); + svn_stream_set_write(*contents_p, rep_write_contents); + svn_stream_set_close(*contents_p, rep_write_contents_close); + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__set_contents(svn_stream_t **stream, + svn_fs_t *fs, + svn_fs_x__noderev_t *noderev, + apr_pool_t *result_pool) +{ + if (noderev->kind != svn_node_file) + return svn_error_create(SVN_ERR_FS_NOT_FILE, NULL, + _("Can't set text contents of a directory")); + + return set_representation(stream, fs, noderev, result_pool); +} + +svn_error_t * +svn_fs_x__create_successor(svn_fs_t *fs, + svn_fs_x__noderev_t *new_noderev, + const svn_fs_x__id_t *copy_id, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *scratch_pool) +{ + new_noderev->copy_id = *copy_id; + new_noderev->noderev_id.change_set = svn_fs_x__change_set_by_txn(txn_id); + SVN_ERR(allocate_item_index(&new_noderev->noderev_id.number, fs, txn_id, + scratch_pool)); + + if (! new_noderev->copyroot_path) + { + new_noderev->copyroot_path + = apr_pstrdup(scratch_pool, new_noderev->created_path); + new_noderev->copyroot_rev + = svn_fs_x__get_revnum(new_noderev->noderev_id.change_set); + } + + SVN_ERR(svn_fs_x__put_node_revision(fs, new_noderev, scratch_pool)); + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__set_proplist(svn_fs_t *fs, + svn_fs_x__noderev_t *noderev, + apr_hash_t *proplist, + apr_pool_t *scratch_pool) +{ + const svn_fs_x__id_t *id = &noderev->noderev_id; + const char *filename = svn_fs_x__path_txn_node_props(fs, id, scratch_pool, + scratch_pool); + apr_file_t *file; + svn_stream_t *out; + + /* Dump the property list to the mutable property file. 
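+     The props are stored in svn_hash_write2() format for now; they get
+     turned into a deltified representation only when the txn is committed
+     (see write_final_rev).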
*/ + SVN_ERR(svn_io_file_open(&file, filename, + APR_WRITE | APR_CREATE | APR_TRUNCATE + | APR_BUFFERED, APR_OS_DEFAULT, scratch_pool)); + out = svn_stream_from_aprfile2(file, TRUE, scratch_pool); + SVN_ERR(svn_hash_write2(proplist, out, SVN_HASH_TERMINATOR, scratch_pool)); + SVN_ERR(svn_io_file_close(file, scratch_pool)); + + /* Mark the node-rev's prop rep as mutable, if not already done. */ + if (!noderev->prop_rep + || svn_fs_x__is_revision(noderev->prop_rep->id.change_set)) + { + svn_fs_x__txn_id_t txn_id + = svn_fs_x__get_txn_id(noderev->noderev_id.change_set); + noderev->prop_rep = apr_pcalloc(scratch_pool, sizeof(*noderev->prop_rep)); + noderev->prop_rep->id.change_set = id->change_set; + SVN_ERR(allocate_item_index(&noderev->prop_rep->id.number, fs, + txn_id, scratch_pool)); + SVN_ERR(svn_fs_x__put_node_revision(fs, noderev, scratch_pool)); + } + + return SVN_NO_ERROR; +} + +/* This baton is used by the stream created for write_container_rep. */ +typedef struct write_container_baton_t +{ + svn_stream_t *stream; + + apr_size_t size; + + svn_checksum_ctx_t *md5_ctx; + svn_checksum_ctx_t *sha1_ctx; +} write_container_baton_t; + +/* The handler for the write_container_rep stream. BATON is a + write_container_baton_t, DATA has the data to write and *LEN is the number + of bytes to write. */ +static svn_error_t * +write_container_handler(void *baton, + const char *data, + apr_size_t *len) +{ + write_container_baton_t *whb = baton; + + SVN_ERR(svn_checksum_update(whb->md5_ctx, data, *len)); + SVN_ERR(svn_checksum_update(whb->sha1_ctx, data, *len)); + + SVN_ERR(svn_stream_write(whb->stream, data, len)); + whb->size += *len; + + return SVN_NO_ERROR; +} + +/* Callback function type. Write the data provided by BATON into STREAM. */ +typedef svn_error_t * +(* collection_writer_t)(svn_stream_t *stream, + void *baton, + apr_pool_t *scratch_pool); + +/* Implement collection_writer_t writing the C string->svn_string_t hash + given as BATON. */ +static svn_error_t * +write_hash_to_stream(svn_stream_t *stream, + void *baton, + apr_pool_t *scratch_pool) +{ + apr_hash_t *hash = baton; + SVN_ERR(svn_hash_write2(hash, stream, SVN_HASH_TERMINATOR, scratch_pool)); + + return SVN_NO_ERROR; +} + +/* Implement collection_writer_t writing the svn_fs_x__dirent_t* array given + as BATON. */ +static svn_error_t * +write_directory_to_stream(svn_stream_t *stream, + void *baton, + apr_pool_t *scratch_pool) +{ + apr_array_header_t *dir = baton; + SVN_ERR(unparse_dir_entries(dir, stream, scratch_pool)); + + return SVN_NO_ERROR; +} + + +/* Write out the COLLECTION pertaining to the NODEREV in FS as a deltified + text representation to file FILE using WRITER. In the process, record the + total size and the md5 digest in REP and add the representation of type + ITEM_TYPE to the indexes if necessary. If rep sharing has been enabled and + REPS_HASH is not NULL, it will be used in addition to the on-disk cache to + find earlier reps with the same content. When such existing reps can be + found, we will truncate the one just written from the file and return the + existing rep. + + If ITEM_TYPE is IS_PROPS equals SVN_FS_FS__ITEM_TYPE_*_PROPS, assume + that we want to a props representation as the base for our delta. + If FINAL_REVISION is not SVN_INVALID_REVNUM, use it to determine whether + to write to the proto-index files. + Perform temporary allocations in SCRATCH_POOL. 
+ */ +static svn_error_t * +write_container_delta_rep(svn_fs_x__representation_t *rep, + apr_file_t *file, + void *collection, + collection_writer_t writer, + svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + svn_fs_x__noderev_t *noderev, + apr_hash_t *reps_hash, + apr_uint32_t item_type, + svn_revnum_t final_revision, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_txdelta_window_handler_t diff_wh; + void *diff_whb; + + svn_stream_t *file_stream; + svn_stream_t *stream; + svn_fs_x__representation_t *base_rep; + svn_fs_x__representation_t *old_rep; + svn_fs_x__p2l_entry_t entry; + svn_stream_t *source; + svn_fs_x__rep_header_t header = { 0 }; + + apr_off_t rep_end = 0; + apr_off_t delta_start = 0; + apr_off_t offset = 0; + + write_container_baton_t *whb; + int diff_version = 1; + svn_boolean_t is_props = (item_type == SVN_FS_X__ITEM_TYPE_FILE_PROPS) + || (item_type == SVN_FS_X__ITEM_TYPE_DIR_PROPS); + + /* Get the base for this delta. */ + SVN_ERR(choose_delta_base(&base_rep, fs, noderev, is_props, scratch_pool)); + SVN_ERR(svn_fs_x__get_contents(&source, fs, base_rep, FALSE, scratch_pool)); + + SVN_ERR(svn_fs_x__get_file_offset(&offset, file, scratch_pool)); + + /* Write out the rep header. */ + if (base_rep) + { + header.base_revision = svn_fs_x__get_revnum(base_rep->id.change_set); + header.base_item_index = base_rep->id.number; + header.base_length = base_rep->size; + header.type = svn_fs_x__rep_delta; + } + else + { + header.type = svn_fs_x__rep_self_delta; + } + + file_stream = svn_checksum__wrap_write_stream_fnv1a_32x4( + &entry.fnv1_checksum, + svn_stream_from_aprfile2(file, TRUE, + scratch_pool), + scratch_pool); + SVN_ERR(svn_fs_x__write_rep_header(&header, file_stream, scratch_pool)); + SVN_ERR(svn_fs_x__get_file_offset(&delta_start, file, scratch_pool)); + + /* Prepare to write the svndiff data. */ + svn_txdelta_to_svndiff3(&diff_wh, + &diff_whb, + svn_stream_disown(file_stream, scratch_pool), + diff_version, + ffd->delta_compression_level, + scratch_pool); + + whb = apr_pcalloc(scratch_pool, sizeof(*whb)); + whb->stream = svn_txdelta_target_push(diff_wh, diff_whb, source, + scratch_pool); + whb->size = 0; + whb->md5_ctx = svn_checksum_ctx_create(svn_checksum_md5, scratch_pool); + whb->sha1_ctx = svn_checksum_ctx_create(svn_checksum_sha1, scratch_pool); + + /* serialize the hash */ + stream = svn_stream_create(whb, scratch_pool); + svn_stream_set_write(stream, write_container_handler); + + SVN_ERR(writer(stream, collection, scratch_pool)); + SVN_ERR(svn_stream_close(whb->stream)); + + /* Store the results. */ + SVN_ERR(digests_final(rep, whb->md5_ctx, whb->sha1_ctx, scratch_pool)); + + /* Check and see if we already have a representation somewhere that's + identical to the one we just wrote out. */ + SVN_ERR(get_shared_rep(&old_rep, fs, rep, reps_hash, scratch_pool, + scratch_pool)); + + if (old_rep) + { + SVN_ERR(svn_stream_close(file_stream)); + + /* We need to erase from the protorev the data we just wrote. */ + SVN_ERR(svn_io_file_trunc(file, offset, scratch_pool)); + + /* Use the old rep for this content. */ + memcpy(rep, old_rep, sizeof (*rep)); + } + else + { + svn_fs_x__id_t noderev_id; + + /* Write out our cosmetic end marker. 
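+         REP_END is taken before the marker is written, so the marker
+         bytes are not counted in rep->size below.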
*/ + SVN_ERR(svn_fs_x__get_file_offset(&rep_end, file, scratch_pool)); + SVN_ERR(svn_stream_puts(file_stream, "ENDREP\n")); + SVN_ERR(svn_stream_close(file_stream)); + + SVN_ERR(allocate_item_index(&rep->id.number, fs, txn_id, + scratch_pool)); + SVN_ERR(store_l2p_index_entry(fs, txn_id, offset, rep->id.number, + scratch_pool)); + + noderev_id.change_set = SVN_FS_X__INVALID_CHANGE_SET; + noderev_id.number = rep->id.number; + + entry.offset = offset; + SVN_ERR(svn_fs_x__get_file_offset(&offset, file, scratch_pool)); + entry.size = offset - entry.offset; + entry.type = item_type; + entry.item_count = 1; + entry.items = &noderev_id; + + SVN_ERR(store_p2l_index_entry(fs, txn_id, &entry, scratch_pool)); + + /* update the representation */ + rep->expanded_size = whb->size; + rep->size = rep_end - delta_start; + } + + return SVN_NO_ERROR; +} + +/* Sanity check ROOT_NODEREV, a candidate for being the root node-revision + of (not yet committed) revision REV in FS. Use SCRATCH_POOL for temporary + allocations. + + If you change this function, consider updating svn_fs_x__verify() too. + */ +static svn_error_t * +validate_root_noderev(svn_fs_t *fs, + svn_fs_x__noderev_t *root_noderev, + svn_revnum_t rev, + apr_pool_t *scratch_pool) +{ + svn_revnum_t head_revnum = rev-1; + int head_predecessor_count; + + SVN_ERR_ASSERT(rev > 0); + + /* Compute HEAD_PREDECESSOR_COUNT. */ + { + svn_fs_x__id_t head_root_id; + svn_fs_x__noderev_t *head_root_noderev; + + /* Get /@HEAD's noderev. */ + svn_fs_x__init_rev_root(&head_root_id, head_revnum); + SVN_ERR(svn_fs_x__get_node_revision(&head_root_noderev, fs, + &head_root_id, scratch_pool, + scratch_pool)); + + head_predecessor_count = head_root_noderev->predecessor_count; + } + + /* Check that the root noderev's predecessor count equals REV. + + This kind of corruption was seen on svn.apache.org (both on + the root noderev and on other fspaths' noderevs); see + issue #4129. + + Normally (rev == root_noderev->predecessor_count), but here we + use a more roundabout check that should only trigger on new instances + of the corruption, rather then trigger on each and every new commit + to a repository that has triggered the bug somewhere in its root + noderev's history. + */ + if ((root_noderev->predecessor_count - head_predecessor_count) + != (rev - head_revnum)) + { + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + _("predecessor count for " + "the root node-revision is wrong: " + "found (%d+%ld != %d), committing r%ld"), + head_predecessor_count, + rev - head_revnum, /* This is equal to 1. */ + root_noderev->predecessor_count, + rev); + } + + return SVN_NO_ERROR; +} + +/* Given the potentially txn-local id PART, update that to a permanent ID + * based on the REVISION. + */ +static void +get_final_id(svn_fs_x__id_t *part, + svn_revnum_t revision) +{ + if (!svn_fs_x__is_revision(part->change_set)) + part->change_set = svn_fs_x__change_set_by_rev(revision); +} + +/* Copy a node-revision specified by id ID in fileystem FS from a + transaction into the proto-rev-file FILE. Set *NEW_ID_P to a + pointer to the new noderev-id. If this is a directory, copy all + children as well. + + START_NODE_ID and START_COPY_ID are + the first available node and copy ids for this filesystem, for older + FS formats. + + REV is the revision number that this proto-rev-file will represent. + + INITIAL_OFFSET is the offset of the proto-rev-file on entry to + commit_body. 
+ + If REPS_TO_CACHE is not NULL, append to it a copy (allocated in + REPS_POOL) of each data rep that is new in this revision. + + If REPS_HASH is not NULL, append copies (allocated in REPS_POOL) + of the representations of each property rep that is new in this + revision. + + AT_ROOT is true if the node revision being written is the root + node-revision. It is only controls additional sanity checking + logic. + + Temporary allocations are also from SCRATCH_POOL. */ +static svn_error_t * +write_final_rev(svn_fs_x__id_t *new_id_p, + apr_file_t *file, + svn_revnum_t rev, + svn_fs_t *fs, + const svn_fs_x__id_t *id, + apr_off_t initial_offset, + apr_array_header_t *reps_to_cache, + apr_hash_t *reps_hash, + apr_pool_t *reps_pool, + svn_boolean_t at_root, + apr_pool_t *scratch_pool) +{ + svn_fs_x__noderev_t *noderev; + apr_off_t my_offset; + svn_fs_x__id_t new_id; + svn_fs_x__id_t noderev_id; + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_fs_x__txn_id_t txn_id = svn_fs_x__get_txn_id(id->change_set); + svn_fs_x__p2l_entry_t entry; + svn_fs_x__change_set_t change_set = svn_fs_x__change_set_by_rev(rev); + svn_stream_t *file_stream; + apr_pool_t *subpool; + + /* Check to see if this is a transaction node. */ + if (txn_id == SVN_FS_X__INVALID_TXN_ID) + { + svn_fs_x__id_reset(new_id_p); + return SVN_NO_ERROR; + } + + subpool = svn_pool_create(scratch_pool); + SVN_ERR(svn_fs_x__get_node_revision(&noderev, fs, id, scratch_pool, + subpool)); + + if (noderev->kind == svn_node_dir) + { + apr_array_header_t *entries; + int i; + + /* This is a directory. Write out all the children first. */ + + SVN_ERR(svn_fs_x__rep_contents_dir(&entries, fs, noderev, scratch_pool, + subpool)); + for (i = 0; i < entries->nelts; ++i) + { + svn_fs_x__dirent_t *dirent = APR_ARRAY_IDX(entries, i, + svn_fs_x__dirent_t *); + + svn_pool_clear(subpool); + SVN_ERR(write_final_rev(&new_id, file, rev, fs, &dirent->id, + initial_offset, reps_to_cache, reps_hash, + reps_pool, FALSE, subpool)); + if ( svn_fs_x__id_used(&new_id) + && (svn_fs_x__get_revnum(new_id.change_set) == rev)) + dirent->id = new_id; + } + + if (noderev->data_rep + && ! svn_fs_x__is_revision(noderev->data_rep->id.change_set)) + { + /* Write out the contents of this directory as a text rep. */ + noderev->data_rep->id.change_set = change_set; + SVN_ERR(write_container_delta_rep(noderev->data_rep, file, + entries, + write_directory_to_stream, + fs, txn_id, noderev, NULL, + SVN_FS_X__ITEM_TYPE_DIR_REP, + rev, scratch_pool)); + } + } + else + { + /* This is a file. We should make sure the data rep, if it + exists in a "this" state, gets rewritten to our new revision + num. */ + + if (noderev->data_rep + && svn_fs_x__is_txn(noderev->data_rep->id.change_set)) + { + noderev->data_rep->id.change_set = change_set; + } + } + + svn_pool_destroy(subpool); + + /* Fix up the property reps. */ + if (noderev->prop_rep + && svn_fs_x__is_txn(noderev->prop_rep->id.change_set)) + { + apr_hash_t *proplist; + apr_uint32_t item_type = noderev->kind == svn_node_dir + ? SVN_FS_X__ITEM_TYPE_DIR_PROPS + : SVN_FS_X__ITEM_TYPE_FILE_PROPS; + SVN_ERR(svn_fs_x__get_proplist(&proplist, fs, noderev, scratch_pool, + scratch_pool)); + + noderev->prop_rep->id.change_set = change_set; + + SVN_ERR(write_container_delta_rep(noderev->prop_rep, file, proplist, + write_hash_to_stream, fs, txn_id, + noderev, reps_hash, item_type, rev, + scratch_pool)); + } + + /* Convert our temporary ID into a permanent revision one. 
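+     Only the change-set part of each ID gets rewritten to REV; the item
+     numbers allocated within the txn remain unchanged.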
*/ + get_final_id(&noderev->node_id, rev); + get_final_id(&noderev->copy_id, rev); + get_final_id(&noderev->noderev_id, rev); + + if (noderev->copyroot_rev == SVN_INVALID_REVNUM) + noderev->copyroot_rev = rev; + + SVN_ERR(svn_fs_x__get_file_offset(&my_offset, file, scratch_pool)); + + SVN_ERR(store_l2p_index_entry(fs, txn_id, my_offset, + noderev->noderev_id.number, scratch_pool)); + new_id = noderev->noderev_id; + + if (ffd->rep_sharing_allowed) + { + /* Save the data representation's hash in the rep cache. */ + if ( noderev->data_rep && noderev->kind == svn_node_file + && svn_fs_x__get_revnum(noderev->data_rep->id.change_set) == rev) + { + SVN_ERR_ASSERT(reps_to_cache && reps_pool); + APR_ARRAY_PUSH(reps_to_cache, svn_fs_x__representation_t *) + = svn_fs_x__rep_copy(noderev->data_rep, reps_pool); + } + + if ( noderev->prop_rep + && svn_fs_x__get_revnum(noderev->prop_rep->id.change_set) == rev) + { + /* Add new property reps to hash and on-disk cache. */ + svn_fs_x__representation_t *copy + = svn_fs_x__rep_copy(noderev->prop_rep, reps_pool); + + SVN_ERR_ASSERT(reps_to_cache && reps_pool); + APR_ARRAY_PUSH(reps_to_cache, svn_fs_x__representation_t *) = copy; + + apr_hash_set(reps_hash, + copy->sha1_digest, + APR_SHA1_DIGESTSIZE, + copy); + } + } + + /* don't serialize SHA1 for dirs to disk (waste of space) */ + if (noderev->data_rep && noderev->kind == svn_node_dir) + noderev->data_rep->has_sha1 = FALSE; + + /* don't serialize SHA1 for props to disk (waste of space) */ + if (noderev->prop_rep) + noderev->prop_rep->has_sha1 = FALSE; + + /* Write out our new node-revision. */ + if (at_root) + SVN_ERR(validate_root_noderev(fs, noderev, rev, scratch_pool)); + + file_stream = svn_checksum__wrap_write_stream_fnv1a_32x4( + &entry.fnv1_checksum, + svn_stream_from_aprfile2(file, TRUE, + scratch_pool), + scratch_pool); + SVN_ERR(svn_fs_x__write_noderev(file_stream, noderev, scratch_pool)); + SVN_ERR(svn_stream_close(file_stream)); + + /* reference the root noderev from the log-to-phys index */ + noderev_id = noderev->noderev_id; + noderev_id.change_set = SVN_FS_X__INVALID_CHANGE_SET; + + entry.offset = my_offset; + SVN_ERR(svn_fs_x__get_file_offset(&my_offset, file, scratch_pool)); + entry.size = my_offset - entry.offset; + entry.type = SVN_FS_X__ITEM_TYPE_NODEREV; + entry.item_count = 1; + entry.items = &noderev_id; + + SVN_ERR(store_p2l_index_entry(fs, txn_id, &entry, scratch_pool)); + + /* Return our ID that references the revision file. */ + *new_id_p = new_id; + + return SVN_NO_ERROR; +} + +/* Write the changed path info CHANGED_PATHS from transaction TXN_ID to the + permanent rev-file FILE representing NEW_REV in filesystem FS. *OFFSET_P + is set the to offset in the file of the beginning of this information. + NEW_REV is the revision currently being committed. + Perform temporary allocations in SCRATCH_POOL. 
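+   The changes are also registered with the L2P and P2L proto-indexes
+   under SVN_FS_X__ITEM_INDEX_CHANGES.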
*/ +static svn_error_t * +write_final_changed_path_info(apr_off_t *offset_p, + apr_file_t *file, + svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_hash_t *changed_paths, + svn_revnum_t new_rev, + apr_pool_t *scratch_pool) +{ + apr_off_t offset; + svn_stream_t *stream; + svn_fs_x__p2l_entry_t entry; + svn_fs_x__id_t rev_item + = {SVN_INVALID_REVNUM, SVN_FS_X__ITEM_INDEX_CHANGES}; + + SVN_ERR(svn_fs_x__get_file_offset(&offset, file, scratch_pool)); + + /* write to target file & calculate checksum */ + stream = svn_checksum__wrap_write_stream_fnv1a_32x4(&entry.fnv1_checksum, + svn_stream_from_aprfile2(file, TRUE, scratch_pool), + scratch_pool); + SVN_ERR(svn_fs_x__write_changes(stream, fs, changed_paths, TRUE, + scratch_pool)); + SVN_ERR(svn_stream_close(stream)); + + *offset_p = offset; + + /* reference changes from the indexes */ + entry.offset = offset; + SVN_ERR(svn_fs_x__get_file_offset(&offset, file, scratch_pool)); + entry.size = offset - entry.offset; + entry.type = SVN_FS_X__ITEM_TYPE_CHANGES; + entry.item_count = 1; + entry.items = &rev_item; + + SVN_ERR(store_p2l_index_entry(fs, txn_id, &entry, scratch_pool)); + SVN_ERR(store_l2p_index_entry(fs, txn_id, entry.offset, + SVN_FS_X__ITEM_INDEX_CHANGES, scratch_pool)); + + return SVN_NO_ERROR; +} + +/* Open a new svn_fs_t handle to FS, set that handle's concept of "current + youngest revision" to NEW_REV, and call svn_fs_x__verify_root() on + NEW_REV's revision root. + + Intended to be called as the very last step in a commit before 'current' + is bumped. This implies that we are holding the write lock. */ +static svn_error_t * +verify_as_revision_before_current_plus_plus(svn_fs_t *fs, + svn_revnum_t new_rev, + apr_pool_t *scratch_pool) +{ +#ifdef SVN_DEBUG + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_fs_t *ft; /* fs++ == ft */ + svn_fs_root_t *root; + svn_fs_x__data_t *ft_ffd; + apr_hash_t *fs_config; + + SVN_ERR_ASSERT(ffd->svn_fs_open_); + + /* make sure FT does not simply return data cached by other instances + * but actually retrieves it from disk at least once. + */ + fs_config = apr_hash_make(scratch_pool); + svn_hash_sets(fs_config, SVN_FS_CONFIG_FSFS_CACHE_NS, + svn_uuid_generate(scratch_pool)); + SVN_ERR(ffd->svn_fs_open_(&ft, fs->path, + fs_config, + scratch_pool, + scratch_pool)); + ft_ffd = ft->fsap_data; + /* Don't let FT consult rep-cache.db, either. */ + ft_ffd->rep_sharing_allowed = FALSE; + + /* Time travel! */ + ft_ffd->youngest_rev_cache = new_rev; + + SVN_ERR(svn_fs_x__revision_root(&root, ft, new_rev, scratch_pool)); + SVN_ERR_ASSERT(root->is_txn_root == FALSE && root->rev == new_rev); + SVN_ERR_ASSERT(ft_ffd->youngest_rev_cache == new_rev); + SVN_ERR(svn_fs_x__verify_root(root, scratch_pool)); +#endif /* SVN_DEBUG */ + + return SVN_NO_ERROR; +} + +/* Verify that the user registered with FS has all the locks necessary to + permit all the changes associated with TXN_NAME. + The FS write lock is assumed to be held by the caller. */ +static svn_error_t * +verify_locks(svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_hash_t *changed_paths, + apr_pool_t *scratch_pool) +{ + apr_pool_t *iterpool; + apr_array_header_t *changed_paths_sorted; + svn_stringbuf_t *last_recursed = NULL; + int i; + + /* Make an array of the changed paths, and sort them depth-first-ily. */ + changed_paths_sorted = svn_sort__hash(changed_paths, + svn_sort_compare_items_as_paths, + scratch_pool); + + /* Now, traverse the array of changed paths, verify locks. 
Note + that if we need to do a recursive verification a path, we'll skip + over children of that path when we get to them. */ + iterpool = svn_pool_create(scratch_pool); + for (i = 0; i < changed_paths_sorted->nelts; i++) + { + const svn_sort__item_t *item; + const char *path; + svn_fs_x__change_t *change; + svn_boolean_t recurse = TRUE; + + svn_pool_clear(iterpool); + + item = &APR_ARRAY_IDX(changed_paths_sorted, i, svn_sort__item_t); + + /* Fetch the change associated with our path. */ + path = item->key; + change = item->value; + + /* If this path has already been verified as part of a recursive + check of one of its parents, no need to do it again. */ + if (last_recursed + && svn_fspath__skip_ancestor(last_recursed->data, path)) + continue; + + /* What does it mean to succeed at lock verification for a given + path? For an existing file or directory getting modified + (text, props), it means we hold the lock on the file or + directory. For paths being added or removed, we need to hold + the locks for that path and any children of that path. + + WHEW! We have no reliable way to determine the node kind + of deleted items, but fortunately we are going to do a + recursive check on deleted paths regardless of their kind. */ + if (change->change_kind == svn_fs_path_change_modify) + recurse = FALSE; + SVN_ERR(svn_fs_x__allow_locked_operation(path, fs, recurse, TRUE, + iterpool)); + + /* If we just did a recursive check, remember the path we + checked (so children can be skipped). */ + if (recurse) + { + if (! last_recursed) + last_recursed = svn_stringbuf_create(path, scratch_pool); + else + svn_stringbuf_set(last_recursed, path); + } + } + svn_pool_destroy(iterpool); + return SVN_NO_ERROR; +} + +/* Return in *PATH the path to a file containing the properties that + make up the final revision properties file. This involves setting + svn:date and removing any temporary properties associated with the + commit flags. */ +static svn_error_t * +write_final_revprop(const char **path, + svn_fs_txn_t *txn, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *pool) +{ + apr_hash_t *txnprops; + svn_boolean_t final_mods = FALSE; + svn_string_t date; + svn_string_t *client_date; + + SVN_ERR(svn_fs_x__txn_proplist(&txnprops, txn, pool)); + + /* Remove any temporary txn props representing 'flags'. */ + if (svn_hash_gets(txnprops, SVN_FS__PROP_TXN_CHECK_OOD)) + { + svn_hash_sets(txnprops, SVN_FS__PROP_TXN_CHECK_OOD, NULL); + final_mods = TRUE; + } + + if (svn_hash_gets(txnprops, SVN_FS__PROP_TXN_CHECK_LOCKS)) + { + svn_hash_sets(txnprops, SVN_FS__PROP_TXN_CHECK_LOCKS, NULL); + final_mods = TRUE; + } + + client_date = svn_hash_gets(txnprops, SVN_FS__PROP_TXN_CLIENT_DATE); + if (client_date) + { + svn_hash_sets(txnprops, SVN_FS__PROP_TXN_CLIENT_DATE, NULL); + final_mods = TRUE; + } + + /* Update commit time to ensure that svn:date revprops remain ordered if + requested. 
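+     A client-supplied svn:date is kept only if the client-date txn prop
+     equals "1"; otherwise the property gets replaced with the server's
+     current time.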
*/ + if (!client_date || strcmp(client_date->data, "1")) + { + date.data = svn_time_to_cstring(apr_time_now(), pool); + date.len = strlen(date.data); + svn_hash_sets(txnprops, SVN_PROP_REVISION_DATE, &date); + final_mods = TRUE; + } + + if (final_mods) + { + SVN_ERR(set_txn_proplist(txn->fs, txn_id, txnprops, TRUE, pool)); + *path = svn_fs_x__path_txn_props_final(txn->fs, txn_id, pool); + } + else + { + *path = svn_fs_x__path_txn_props(txn->fs, txn_id, pool); + } + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__add_index_data(svn_fs_t *fs, + apr_file_t *file, + const char *l2p_proto_index, + const char *p2l_proto_index, + svn_revnum_t revision, + apr_pool_t *scratch_pool) +{ + apr_off_t l2p_offset; + apr_off_t p2l_offset; + svn_stringbuf_t *footer; + unsigned char footer_length; + svn_checksum_t *l2p_checksum; + svn_checksum_t *p2l_checksum; + + /* Append the actual index data to the pack file. */ + l2p_offset = 0; + SVN_ERR(svn_io_file_seek(file, APR_END, &l2p_offset, scratch_pool)); + SVN_ERR(svn_fs_x__l2p_index_append(&l2p_checksum, fs, file, + l2p_proto_index, revision, + scratch_pool, scratch_pool)); + + p2l_offset = 0; + SVN_ERR(svn_io_file_seek(file, APR_END, &p2l_offset, scratch_pool)); + SVN_ERR(svn_fs_x__p2l_index_append(&p2l_checksum, fs, file, + p2l_proto_index, revision, + scratch_pool, scratch_pool)); + + /* Append footer. */ + footer = svn_fs_x__unparse_footer(l2p_offset, l2p_checksum, + p2l_offset, p2l_checksum, scratch_pool, + scratch_pool); + SVN_ERR(svn_io_file_write_full(file, footer->data, footer->len, NULL, + scratch_pool)); + + footer_length = footer->len; + SVN_ERR_ASSERT(footer_length == footer->len); + SVN_ERR(svn_io_file_write_full(file, &footer_length, 1, NULL, + scratch_pool)); + + return SVN_NO_ERROR; +} + +/* Baton used for commit_body below. */ +typedef struct commit_baton_t { + svn_revnum_t *new_rev_p; + svn_fs_t *fs; + svn_fs_txn_t *txn; + apr_array_header_t *reps_to_cache; + apr_hash_t *reps_hash; + apr_pool_t *reps_pool; +} commit_baton_t; + +/* The work-horse for svn_fs_x__commit, called with the FS write lock. + This implements the svn_fs_x__with_write_lock() 'body' callback + type. BATON is a 'commit_baton_t *'. */ +static svn_error_t * +commit_body(void *baton, + apr_pool_t *scratch_pool) +{ + commit_baton_t *cb = baton; + svn_fs_x__data_t *ffd = cb->fs->fsap_data; + const char *old_rev_filename, *rev_filename, *proto_filename; + const char *revprop_filename, *final_revprop; + svn_fs_x__id_t root_id, new_root_id; + svn_revnum_t old_rev, new_rev; + apr_file_t *proto_file; + void *proto_file_lockcookie; + apr_off_t initial_offset, changed_path_offset; + svn_fs_x__txn_id_t txn_id = svn_fs_x__txn_get_id(cb->txn); + apr_hash_t *changed_paths; + + /* Re-Read the current repository format. All our repo upgrade and + config evaluation strategies are such that existing information in + FS and FFD remains valid. + + Although we don't recommend upgrading hot repositories, people may + still do it and we must make sure to either handle them gracefully + or to error out. + + Committing pre-format 3 txns will fail after upgrade to format 3+ + because the proto-rev cannot be found; no further action needed. + Upgrades from pre-f7 to f7+ means a potential change in addressing + mode for the final rev. We must be sure to detect that cause because + the failure would only manifest once the new revision got committed. + */ + SVN_ERR(svn_fs_x__read_format_file(cb->fs, scratch_pool)); + + /* Get the current youngest revision. 
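+     The FS write lock is already held at this point, so the youngest
+     revision cannot change underneath us until 'current' gets bumped
+     below.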
*/ + SVN_ERR(svn_fs_x__youngest_rev(&old_rev, cb->fs, scratch_pool)); + + /* Check to make sure this transaction is based off the most recent + revision. */ + if (cb->txn->base_rev != old_rev) + return svn_error_create(SVN_ERR_FS_TXN_OUT_OF_DATE, NULL, + _("Transaction out of date")); + + /* We need the changes list for verification as well as for writing it + to the final rev file. */ + SVN_ERR(svn_fs_x__txn_changes_fetch(&changed_paths, cb->fs, txn_id, + scratch_pool)); + + /* Locks may have been added (or stolen) between the calling of + previous svn_fs.h functions and svn_fs_commit_txn(), so we need + to re-examine every changed-path in the txn and re-verify all + discovered locks. */ + SVN_ERR(verify_locks(cb->fs, txn_id, changed_paths, scratch_pool)); + + /* We are going to be one better than this puny old revision. */ + new_rev = old_rev + 1; + + /* Get a write handle on the proto revision file. */ + SVN_ERR(get_writable_proto_rev(&proto_file, &proto_file_lockcookie, + cb->fs, txn_id, scratch_pool)); + SVN_ERR(svn_fs_x__get_file_offset(&initial_offset, proto_file, + scratch_pool)); + + /* Write out all the node-revisions and directory contents. */ + svn_fs_x__init_txn_root(&root_id, txn_id); + SVN_ERR(write_final_rev(&new_root_id, proto_file, new_rev, cb->fs, &root_id, + initial_offset, cb->reps_to_cache, cb->reps_hash, + cb->reps_pool, TRUE, scratch_pool)); + + /* Write the changed-path information. */ + SVN_ERR(write_final_changed_path_info(&changed_path_offset, proto_file, + cb->fs, txn_id, changed_paths, + new_rev, scratch_pool)); + + /* Append the index data to the rev file. */ + SVN_ERR(svn_fs_x__add_index_data(cb->fs, proto_file, + svn_fs_x__path_l2p_proto_index(cb->fs, txn_id, scratch_pool), + svn_fs_x__path_p2l_proto_index(cb->fs, txn_id, scratch_pool), + new_rev, scratch_pool)); + + SVN_ERR(svn_io_file_flush_to_disk(proto_file, scratch_pool)); + SVN_ERR(svn_io_file_close(proto_file, scratch_pool)); + + /* We don't unlock the prototype revision file immediately to avoid a + race with another caller writing to the prototype revision file + before we commit it. */ + + /* Create the shard for the rev and revprop file, if we're sharding and + this is the first revision of a new shard. We don't care if this + fails because the shard already existed for some reason. */ + if (new_rev % ffd->max_files_per_dir == 0) + { + /* Create the revs shard. */ + { + const char *new_dir + = svn_fs_x__path_rev_shard(cb->fs, new_rev, scratch_pool); + svn_error_t *err = svn_io_dir_make(new_dir, APR_OS_DEFAULT, + scratch_pool); + if (err && !APR_STATUS_IS_EEXIST(err->apr_err)) + return svn_error_trace(err); + svn_error_clear(err); + SVN_ERR(svn_io_copy_perms(svn_dirent_join(cb->fs->path, + PATH_REVS_DIR, + scratch_pool), + new_dir, scratch_pool)); + } + + /* Create the revprops shard. */ + SVN_ERR_ASSERT(! svn_fs_x__is_packed_revprop(cb->fs, new_rev)); + { + const char *new_dir + = svn_fs_x__path_revprops_shard(cb->fs, new_rev, scratch_pool); + svn_error_t *err = svn_io_dir_make(new_dir, APR_OS_DEFAULT, + scratch_pool); + if (err && !APR_STATUS_IS_EEXIST(err->apr_err)) + return svn_error_trace(err); + svn_error_clear(err); + SVN_ERR(svn_io_copy_perms(svn_dirent_join(cb->fs->path, + PATH_REVPROPS_DIR, + scratch_pool), + new_dir, scratch_pool)); + } + } + + /* Move the finished rev file into place. + + ### This "breaks" the transaction by removing the protorev file + ### but the revision is not yet complete. If this commit does + ### not complete for any reason the transaction will be lost. 
*/ + old_rev_filename = svn_fs_x__path_rev_absolute(cb->fs, old_rev, + scratch_pool); + + rev_filename = svn_fs_x__path_rev(cb->fs, new_rev, scratch_pool); + proto_filename = svn_fs_x__path_txn_proto_rev(cb->fs, txn_id, + scratch_pool); + SVN_ERR(svn_fs_x__move_into_place(proto_filename, rev_filename, + old_rev_filename, scratch_pool)); + + /* Now that we've moved the prototype revision file out of the way, + we can unlock it (since further attempts to write to the file + will fail as it no longer exists). We must do this so that we can + remove the transaction directory later. */ + SVN_ERR(unlock_proto_rev(cb->fs, txn_id, proto_file_lockcookie, + scratch_pool)); + + /* Move the revprops file into place. */ + SVN_ERR_ASSERT(! svn_fs_x__is_packed_revprop(cb->fs, new_rev)); + SVN_ERR(write_final_revprop(&revprop_filename, cb->txn, txn_id, + scratch_pool)); + final_revprop = svn_fs_x__path_revprops(cb->fs, new_rev, scratch_pool); + SVN_ERR(svn_fs_x__move_into_place(revprop_filename, final_revprop, + old_rev_filename, scratch_pool)); + + /* Update the 'current' file. */ + SVN_ERR(verify_as_revision_before_current_plus_plus(cb->fs, new_rev, + scratch_pool)); + SVN_ERR(svn_fs_x__write_current(cb->fs, new_rev, scratch_pool)); + + /* At this point the new revision is committed and globally visible + so let the caller know it succeeded by giving it the new revision + number, which fulfills svn_fs_commit_txn() contract. Any errors + after this point do not change the fact that a new revision was + created. */ + *cb->new_rev_p = new_rev; + + ffd->youngest_rev_cache = new_rev; + + /* Remove this transaction directory. */ + SVN_ERR(svn_fs_x__purge_txn(cb->fs, cb->txn->id, scratch_pool)); + + return SVN_NO_ERROR; +} + +/* Add the representations in REPS_TO_CACHE (an array of + * svn_fs_x__representation_t *) to the rep-cache database of FS. */ +static svn_error_t * +write_reps_to_cache(svn_fs_t *fs, + const apr_array_header_t *reps_to_cache, + apr_pool_t *scratch_pool) +{ + int i; + + for (i = 0; i < reps_to_cache->nelts; i++) + { + svn_fs_x__representation_t *rep + = APR_ARRAY_IDX(reps_to_cache, i, svn_fs_x__representation_t *); + + /* FALSE because we don't care if another parallel commit happened to + * collide with us. (Non-parallel collisions will not be detected.) */ + SVN_ERR(svn_fs_x__set_rep_reference(fs, rep, scratch_pool)); + } + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__commit(svn_revnum_t *new_rev_p, + svn_fs_t *fs, + svn_fs_txn_t *txn, + apr_pool_t *scratch_pool) +{ + commit_baton_t cb; + svn_fs_x__data_t *ffd = fs->fsap_data; + + cb.new_rev_p = new_rev_p; + cb.fs = fs; + cb.txn = txn; + + if (ffd->rep_sharing_allowed) + { + cb.reps_to_cache = apr_array_make(scratch_pool, 5, + sizeof(svn_fs_x__representation_t *)); + cb.reps_hash = apr_hash_make(scratch_pool); + cb.reps_pool = scratch_pool; + } + else + { + cb.reps_to_cache = NULL; + cb.reps_hash = NULL; + cb.reps_pool = NULL; + } + + SVN_ERR(svn_fs_x__with_write_lock(fs, commit_body, &cb, scratch_pool)); + + /* At this point, *NEW_REV_P has been set, so errors below won't affect + the success of the commit. (See svn_fs_commit_txn().) */ + + if (ffd->rep_sharing_allowed) + { + SVN_ERR(svn_fs_x__open_rep_cache(fs, scratch_pool)); + + /* Write new entries to the rep-sharing database. + * + * We use an sqlite transaction to speed things up; + * see <http://www.sqlite.org/faq.html#q19>. + */ + /* ### A commit that touches thousands of files will starve other + (reader/writer) commits for the duration of the below call. 
+ Maybe write in batches? */ + SVN_SQLITE__WITH_TXN( + write_reps_to_cache(fs, cb.reps_to_cache, scratch_pool), + ffd->rep_cache_db); + } + + return SVN_NO_ERROR; +} + + +svn_error_t * +svn_fs_x__list_transactions(apr_array_header_t **names_p, + svn_fs_t *fs, + apr_pool_t *pool) +{ + const char *txn_dir; + apr_hash_t *dirents; + apr_hash_index_t *hi; + apr_array_header_t *names; + apr_size_t ext_len = strlen(PATH_EXT_TXN); + + names = apr_array_make(pool, 1, sizeof(const char *)); + + /* Get the transactions directory. */ + txn_dir = svn_fs_x__path_txns_dir(fs, pool); + + /* Now find a listing of this directory. */ + SVN_ERR(svn_io_get_dirents3(&dirents, txn_dir, TRUE, pool, pool)); + + /* Loop through all the entries and return anything that ends with '.txn'. */ + for (hi = apr_hash_first(pool, dirents); hi; hi = apr_hash_next(hi)) + { + const char *name = apr_hash_this_key(hi); + apr_ssize_t klen = apr_hash_this_key_len(hi); + const char *id; + + /* The name must end with ".txn" to be considered a transaction. */ + if ((apr_size_t) klen <= ext_len + || (strcmp(name + klen - ext_len, PATH_EXT_TXN)) != 0) + continue; + + /* Truncate the ".txn" extension and store the ID. */ + id = apr_pstrndup(pool, name, strlen(name) - ext_len); + APR_ARRAY_PUSH(names, const char *) = id; + } + + *names_p = names; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__open_txn(svn_fs_txn_t **txn_p, + svn_fs_t *fs, + const char *name, + apr_pool_t *pool) +{ + svn_fs_txn_t *txn; + fs_txn_data_t *ftd; + svn_node_kind_t kind; + svn_fs_x__transaction_t *local_txn; + svn_fs_x__txn_id_t txn_id; + + SVN_ERR(svn_fs_x__txn_by_name(&txn_id, name)); + + /* First check to see if the directory exists. */ + SVN_ERR(svn_io_check_path(svn_fs_x__path_txn_dir(fs, txn_id, pool), + &kind, pool)); + + /* Did we find it? */ + if (kind != svn_node_dir) + return svn_error_createf(SVN_ERR_FS_NO_SUCH_TRANSACTION, NULL, + _("No such transaction '%s'"), + name); + + txn = apr_pcalloc(pool, sizeof(*txn)); + ftd = apr_pcalloc(pool, sizeof(*ftd)); + ftd->txn_id = txn_id; + + /* Read in the root node of this transaction. */ + txn->id = apr_pstrdup(pool, name); + txn->fs = fs; + + SVN_ERR(svn_fs_x__get_txn(&local_txn, fs, txn_id, pool)); + + txn->base_rev = local_txn->base_rev; + + txn->vtable = &txn_vtable; + txn->fsap_data = ftd; + *txn_p = txn; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__txn_proplist(apr_hash_t **table_p, + svn_fs_txn_t *txn, + apr_pool_t *pool) +{ + apr_hash_t *proplist = apr_hash_make(pool); + SVN_ERR(get_txn_proplist(proplist, txn->fs, svn_fs_x__txn_get_id(txn), + pool)); + *table_p = proplist; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__delete_node_revision(svn_fs_t *fs, + const svn_fs_x__id_t *id, + apr_pool_t *scratch_pool) +{ + svn_fs_x__noderev_t *noderev; + SVN_ERR(svn_fs_x__get_node_revision(&noderev, fs, id, scratch_pool, + scratch_pool)); + + /* Delete any mutable property representation. */ + if (noderev->prop_rep + && svn_fs_x__is_txn(noderev->prop_rep->id.change_set)) + SVN_ERR(svn_io_remove_file2(svn_fs_x__path_txn_node_props(fs, id, + scratch_pool, + scratch_pool), + FALSE, scratch_pool)); + + /* Delete any mutable data representation. 
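+     Only directories keep their mutable contents in a separate
+     'children' file; mutable file contents live in the proto-rev file
+     and need no extra cleanup here.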
*/ + if (noderev->data_rep + && svn_fs_x__is_txn(noderev->data_rep->id.change_set) + && noderev->kind == svn_node_dir) + { + svn_fs_x__data_t *ffd = fs->fsap_data; + const svn_fs_x__id_t *key = id; + + SVN_ERR(svn_io_remove_file2( + svn_fs_x__path_txn_node_children(fs, id, scratch_pool, + scratch_pool), + FALSE, scratch_pool)); + + /* remove the corresponding entry from the cache, if such exists */ + SVN_ERR(svn_cache__set(ffd->dir_cache, key, NULL, scratch_pool)); + } + + return svn_io_remove_file2(svn_fs_x__path_txn_node_rev(fs, id, + scratch_pool, + scratch_pool), + FALSE, scratch_pool); +} + + + +/*** Transactions ***/ + +svn_error_t * +svn_fs_x__get_base_rev(svn_revnum_t *revnum, + svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *scratch_pool) +{ + svn_fs_x__transaction_t *txn; + SVN_ERR(svn_fs_x__get_txn(&txn, fs, txn_id, scratch_pool)); + *revnum = txn->base_rev; + + return SVN_NO_ERROR; +} + + +/* Generic transaction operations. */ + +svn_error_t * +svn_fs_x__txn_prop(svn_string_t **value_p, + svn_fs_txn_t *txn, + const char *propname, + apr_pool_t *pool) +{ + apr_hash_t *table; + svn_fs_t *fs = txn->fs; + + SVN_ERR(svn_fs__check_fs(fs, TRUE)); + SVN_ERR(svn_fs_x__txn_proplist(&table, txn, pool)); + + *value_p = svn_hash_gets(table, propname); + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__begin_txn(svn_fs_txn_t **txn_p, + svn_fs_t *fs, + svn_revnum_t rev, + apr_uint32_t flags, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_string_t date; + fs_txn_data_t *ftd; + apr_hash_t *props = apr_hash_make(scratch_pool); + + SVN_ERR(svn_fs__check_fs(fs, TRUE)); + + SVN_ERR(create_txn(txn_p, fs, rev, result_pool, scratch_pool)); + + /* Put a datestamp on the newly created txn, so we always know + exactly how old it is. (This will help sysadmins identify + long-abandoned txns that may need to be manually removed.) When + a txn is promoted to a revision, this property will be + automatically overwritten with a revision datestamp. */ + date.data = svn_time_to_cstring(apr_time_now(), scratch_pool); + date.len = strlen(date.data); + + svn_hash_sets(props, SVN_PROP_REVISION_DATE, &date); + + /* Set temporary txn props that represent the requested 'flags' + behaviors. */ + if (flags & SVN_FS_TXN_CHECK_OOD) + svn_hash_sets(props, SVN_FS__PROP_TXN_CHECK_OOD, + svn_string_create("true", scratch_pool)); + + if (flags & SVN_FS_TXN_CHECK_LOCKS) + svn_hash_sets(props, SVN_FS__PROP_TXN_CHECK_LOCKS, + svn_string_create("true", scratch_pool)); + + if (flags & SVN_FS_TXN_CLIENT_DATE) + svn_hash_sets(props, SVN_FS__PROP_TXN_CLIENT_DATE, + svn_string_create("0", scratch_pool)); + + ftd = (*txn_p)->fsap_data; + SVN_ERR(set_txn_proplist(fs, ftd->txn_id, props, FALSE, scratch_pool)); + + return SVN_NO_ERROR; +} diff --git a/subversion/libsvn_fs_x/transaction.h b/subversion/libsvn_fs_x/transaction.h new file mode 100644 index 0000000..490f716 --- /dev/null +++ b/subversion/libsvn_fs_x/transaction.h @@ -0,0 +1,316 @@ +/* transaction.h --- transaction-related functions of FSX + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#ifndef SVN_LIBSVN_FS__TRANSACTION_H +#define SVN_LIBSVN_FS__TRANSACTION_H + +#include "fs.h" + +/* Return the transaction ID of TXN. + */ +svn_fs_x__txn_id_t +svn_fs_x__txn_get_id(svn_fs_txn_t *txn); + +/* Obtain a write lock on the filesystem FS in a subpool of SCRATCH_POOL, + call BODY with BATON and that subpool, destroy the subpool (releasing the + write lock) and return what BODY returned. */ +svn_error_t * +svn_fs_x__with_write_lock(svn_fs_t *fs, + svn_error_t *(*body)(void *baton, + apr_pool_t *scratch_pool), + void *baton, + apr_pool_t *scratch_pool); + +/* Obtain a pack operation lock on the filesystem FS in a subpool of + SCRATCH_POOL, call BODY with BATON and that subpool, destroy the subpool + (releasing the write lock) and return what BODY returned. */ +svn_error_t * +svn_fs_x__with_pack_lock(svn_fs_t *fs, + svn_error_t *(*body)(void *baton, + apr_pool_t *scratch_pool), + void *baton, + apr_pool_t *scratch_pool); + +/* Obtain the txn-current file lock on the filesystem FS in a subpool of + SCRATCH_POOL, call BODY with BATON and that subpool, destroy the subpool + (releasing the write lock) and return what BODY returned. */ +svn_error_t * +svn_fs_x__with_txn_current_lock(svn_fs_t *fs, + svn_error_t *(*body)(void *baton, + apr_pool_t *scratch_pool), + void *baton, + apr_pool_t *scratch_pool); + +/* Obtain all locks on the filesystem FS in a subpool of SCRATCH_POOL, + call BODY with BATON and that subpool, destroy the subpool (releasing + the locks) and return what BODY returned. + + This combines svn_fs_fs__with_write_lock, svn_fs_fs__with_pack_lock, + and svn_fs_fs__with_txn_current_lock, ensuring correct lock ordering. */ +svn_error_t * +svn_fs_x__with_all_locks(svn_fs_t *fs, + svn_error_t *(*body)(void *baton, + apr_pool_t *scratch_pool), + void *baton, + apr_pool_t *scratch_pool); + +/* Return TRUE, iff NODEREV is the root node of a transaction that has not + seen any modifications, yet. */ +svn_boolean_t +svn_fs_x__is_fresh_txn_root(svn_fs_x__noderev_t *noderev); + +/* Store NODEREV as the node-revision in the transaction defined by NODEREV's + ID within FS. Do any necessary temporary allocation in SCRATCH_POOL. */ +svn_error_t * +svn_fs_x__put_node_revision(svn_fs_t *fs, + svn_fs_x__noderev_t *noderev, + apr_pool_t *scratch_pool); + +/* Find the paths which were changed in transaction TXN_ID of + filesystem FS and store them in *CHANGED_PATHS_P. + Get any temporary allocations from SCRATCH_POOL. */ +svn_error_t * +svn_fs_x__txn_changes_fetch(apr_hash_t **changed_paths_p, + svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *scratch_pool); + +/* Set the transaction property NAME to the value VALUE in transaction + TXN. Perform temporary allocations from SCRATCH_POOL. */ +svn_error_t * +svn_fs_x__change_txn_prop(svn_fs_txn_t *txn, + const char *name, + const svn_string_t *value, + apr_pool_t *scratch_pool); + +/* Change transaction properties in transaction TXN based on PROPS. + Perform temporary allocations from SCRATCH_POOL. 
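+
+   Illustrative sketch only (this example is not part of the original
+   comment): assuming PROPS is an apr_array_header_t whose elements are
+   svn_prop_t, as with the public svn_fs_change_txn_props() API, a caller
+   could do
+
+     apr_array_header_t *props = apr_array_make(pool, 1,
+                                                sizeof(svn_prop_t));
+     svn_prop_t *prop = &APR_ARRAY_PUSH(props, svn_prop_t);
+     prop->name = SVN_PROP_REVISION_LOG;
+     prop->value = svn_string_create("tweaked log message", pool);
+     SVN_ERR(svn_fs_x__change_txn_props(txn, props, pool));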
*/ +svn_error_t * +svn_fs_x__change_txn_props(svn_fs_txn_t *txn, + const apr_array_header_t *props, + apr_pool_t *scratch_pool); + +/* Store a transaction record in *TXN_P for the transaction identified + by TXN_ID in filesystem FS. Allocate everything from POOL. */ +svn_error_t * +svn_fs_x__get_txn(svn_fs_x__transaction_t **txn_p, + svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *pool); + +/* Return the next available copy_id in *COPY_ID for the transaction + TXN_ID in filesystem FS. Allocate temporaries in SCRATCH_POOL. */ +svn_error_t * +svn_fs_x__reserve_copy_id(svn_fs_x__id_t *copy_id_p, + svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *scratch_pool); + +/* Create an entirely new mutable node in the filesystem FS, whose + node-revision is NODEREV. COPY_ID is the copy_id to use in the + node revision ID. TXN_ID is the Subversion transaction under + which this occurs. */ +svn_error_t * +svn_fs_x__create_node(svn_fs_t *fs, + svn_fs_x__noderev_t *noderev, + const svn_fs_x__id_t *copy_id, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *scratch_pool); + +/* Remove all references to the transaction TXN_ID from filesystem FS. + Temporary allocations are from SCRATCH_POOL. */ +svn_error_t * +svn_fs_x__purge_txn(svn_fs_t *fs, + const char *txn_id, + apr_pool_t *scratch_pool); + +/* Abort the existing transaction TXN, performing any temporary + allocations in SCRATCH_POOL. */ +svn_error_t * +svn_fs_x__abort_txn(svn_fs_txn_t *txn, + apr_pool_t *scratch_pool); + +/* Add or set in filesystem FS, transaction TXN_ID, in directory + PARENT_NODEREV a directory entry for NAME pointing to ID of type + KIND. The PARENT_NODEREV's DATA_REP will be redirected to the in-txn + representation, if it had not been mutable before. + + If PARENT_NODEREV does not have a DATA_REP, allocate one in RESULT_POOL. + Temporary allocations are done in SCRATCH_POOL. */ +svn_error_t * +svn_fs_x__set_entry(svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + svn_fs_x__noderev_t *parent_noderev, + const char *name, + const svn_fs_x__id_t *id, + svn_node_kind_t kind, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + +/* Add a change to the changes record for filesystem FS in transaction + TXN_ID. Mark path PATH, having noderev-id ID, as changed according to + the type in CHANGE_KIND. If the text representation was changed set + TEXT_MOD to TRUE, and likewise for PROP_MOD as well as MERGEINFO_MOD. + If this change was the result of a copy, set COPYFROM_REV and + COPYFROM_PATH to the revision and path of the copy source, otherwise + they should be set to SVN_INVALID_REVNUM and NULL. Perform any + temporary allocations from SCRATCH_POOL. */ +svn_error_t * +svn_fs_x__add_change(svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + const char *path, + const svn_fs_x__id_t *id, + svn_fs_path_change_kind_t change_kind, + svn_boolean_t text_mod, + svn_boolean_t prop_mod, + svn_boolean_t mergeinfo_mod, + svn_node_kind_t node_kind, + svn_revnum_t copyfrom_rev, + const char *copyfrom_path, + apr_pool_t *scratch_pool); + +/* Return a writable stream in *STREAM, allocated in RESULT_POOL, that + allows storing the text representation of node-revision NODEREV in + filesystem FS. */ +svn_error_t * +svn_fs_x__set_contents(svn_stream_t **stream, + svn_fs_t *fs, + svn_fs_x__noderev_t *noderev, + apr_pool_t *result_pool); + +/* Create a node revision in FS which is an immediate successor of + NEW_NODEREV's predecessor. Use SCRATCH_POOL for any temporary allocation. 
+ + COPY_ID is a key into the `copies' table, and + indicates that this new node is being created as the result of a + copy operation, and specifically which operation that was. + + TXN_ID is the Subversion transaction under which this occurs. + + After this call, the deltification code assumes that the new node's + contents will change frequently, and will avoid representing other + nodes as deltas against this node's contents. */ +svn_error_t * +svn_fs_x__create_successor(svn_fs_t *fs, + svn_fs_x__noderev_t *new_noderev, + const svn_fs_x__id_t *copy_id, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *scratch_pool); + +/* Write a new property list PROPLIST for node-revision NODEREV in + filesystem FS. Perform any temporary allocations in SCRATCH_POOL. */ +svn_error_t * +svn_fs_x__set_proplist(svn_fs_t *fs, + svn_fs_x__noderev_t *noderev, + apr_hash_t *proplist, + apr_pool_t *scratch_pool); + +/* Append the L2P and P2L indexes given by their proto index file names + * L2P_PROTO_INDEX and P2L_PROTO_INDEX to the revision / pack FILE. + * The latter contains revision(s) starting at REVISION in FS. + * Use SCRATCH_POOL for temporary allocations. */ +svn_error_t * +svn_fs_x__add_index_data(svn_fs_t *fs, + apr_file_t *file, + const char *l2p_proto_index, + const char *p2l_proto_index, + svn_revnum_t revision, + apr_pool_t *scratch_pool); + +/* Commit the transaction TXN in filesystem FS and return its new + revision number in *REV. If the transaction is out of date, return + the error SVN_ERR_FS_TXN_OUT_OF_DATE. Use SCRATCH_POOL for temporary + allocations. */ +svn_error_t * +svn_fs_x__commit(svn_revnum_t *new_rev_p, + svn_fs_t *fs, + svn_fs_txn_t *txn, + apr_pool_t *scratch_pool); + +/* Set *NAMES_P to an array of names which are all the active + transactions in filesystem FS. Allocate the array from POOL. */ +svn_error_t * +svn_fs_x__list_transactions(apr_array_header_t **names_p, + svn_fs_t *fs, + apr_pool_t *pool); + +/* Open the transaction named NAME in filesystem FS. Set *TXN_P to + * the transaction. If there is no such transaction, return + * SVN_ERR_FS_NO_SUCH_TRANSACTION. Allocate the new transaction in + * POOL. */ +svn_error_t * +svn_fs_x__open_txn(svn_fs_txn_t **txn_p, + svn_fs_t *fs, + const char *name, + apr_pool_t *pool); + +/* Return the property list from transaction TXN and store it in + *PROPLIST. Allocate the property list from POOL. */ +svn_error_t * +svn_fs_x__txn_proplist(apr_hash_t **table_p, + svn_fs_txn_t *txn, + apr_pool_t *pool); + +/* Delete the mutable node-revision referenced by ID, along with any + mutable props or directory contents associated with it. Perform + temporary allocations in SCRATCH_POOL. */ +svn_error_t * +svn_fs_x__delete_node_revision(svn_fs_t *fs, + const svn_fs_x__id_t *id, + apr_pool_t *scratch_pool); + +/* Retrieve information about the Subversion transaction TXN_ID from + the `transactions' table of FS, using SCRATCH_POOL for temporary + allocations. Set *REVNUM to the transaction's base revision. + + If there is no such transaction, SVN_ERR_FS_NO_SUCH_TRANSACTION is + the error returned. + + Returns SVN_ERR_FS_TRANSACTION_NOT_MUTABLE if TXN_NAME refers to a + transaction that has already been committed. */ +svn_error_t * +svn_fs_x__get_base_rev(svn_revnum_t *revnum, + svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *scratch_pool); + +/* Find the value of the property named PROPNAME in transaction TXN. + Return the contents in *VALUE_P. The contents will be allocated + from POOL.
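+
+   A minimal usage sketch (for illustration; not part of the original
+   comment), reading the datestamp that svn_fs_x__begin_txn() puts on
+   every new transaction:
+
+     svn_string_t *value;
+     SVN_ERR(svn_fs_x__txn_prop(&value, txn, SVN_PROP_REVISION_DATE,
+                                pool));
+
+   VALUE comes back as NULL if TXN has no property by that name.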
*/ +svn_error_t * +svn_fs_x__txn_prop(svn_string_t **value_p, + svn_fs_txn_t *txn, + const char *propname, + apr_pool_t *pool); + +/* Begin a new transaction in filesystem FS, based on existing + revision REV. The new transaction is returned in *TXN_P, allocated + in RESULT_POOL. Allocate temporaries from SCRATCH_POOL. */ +svn_error_t * +svn_fs_x__begin_txn(svn_fs_txn_t **txn_p, + svn_fs_t *fs, + svn_revnum_t rev, + apr_uint32_t flags, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + +#endif diff --git a/subversion/libsvn_fs_x/tree.c b/subversion/libsvn_fs_x/tree.c new file mode 100644 index 0000000..ce24765 --- /dev/null +++ b/subversion/libsvn_fs_x/tree.c @@ -0,0 +1,4542 @@ +/* tree.c : tree-like filesystem, built on DAG filesystem + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + + +/* The job of this layer is to take a filesystem with lots of node + sharing going on --- the real DAG filesystem as it appears in the + database --- and make it look and act like an ordinary tree + filesystem, with no sharing. + + We do just-in-time cloning: you can walk from some unfinished + transaction's root down into directories and files shared with + committed revisions; as soon as you try to change something, the + appropriate nodes get cloned (and parent directory entries updated) + invisibly, behind your back. Any other references you have to + nodes that have been cloned by other changes, even made by other + processes, are automatically updated to point to the right clones. */ + + +#include <stdlib.h> +#include <string.h> +#include <assert.h> +#include <apr_pools.h> +#include <apr_hash.h> + +#include "svn_hash.h" +#include "svn_private_config.h" +#include "svn_pools.h" +#include "svn_error.h" +#include "svn_path.h" +#include "svn_mergeinfo.h" +#include "svn_fs.h" +#include "svn_props.h" +#include "svn_sorts.h" + +#include "fs.h" +#include "dag.h" +#include "lock.h" +#include "tree.h" +#include "fs_x.h" +#include "fs_id.h" +#include "temp_serializer.h" +#include "cached_data.h" +#include "transaction.h" +#include "pack.h" +#include "util.h" + +#include "private/svn_mergeinfo_private.h" +#include "private/svn_subr_private.h" +#include "private/svn_fs_util.h" +#include "private/svn_fspath.h" +#include "../libsvn_fs/fs-loader.h" + + + +/* The root structures. + + Why do they contain different data? Well, transactions are mutable + enough that it isn't safe to cache the DAG node for the root + directory or the hash of copyfrom data: somebody else might modify + them concurrently on disk! (Why is the DAG node cache safer than + the root DAG node? 
When cloning transaction DAG nodes in and out + of the cache, all of the possibly-mutable data from the + svn_fs_x__noderev_t inside the dag_node_t is dropped.) Additionally, + revisions are immutable enough that their DAG node cache can be + kept in the FS object and shared among multiple revision root + objects. +*/ +typedef dag_node_t fs_rev_root_data_t; + +typedef struct fs_txn_root_data_t +{ + /* TXN_ID value from the main struct but as a struct instead of a string */ + svn_fs_x__txn_id_t txn_id; + + /* Cache of txn DAG nodes (without their nested noderevs, because + * it's mutable). Same keys/values as ffd->rev_node_cache. */ + svn_cache__t *txn_node_cache; +} fs_txn_root_data_t; + +/* Declared here to resolve the circular dependencies. */ +static svn_error_t * +get_dag(dag_node_t **dag_node_p, + svn_fs_root_t *root, + const char *path, + apr_pool_t *pool); + +static svn_fs_root_t * +make_revision_root(svn_fs_t *fs, + svn_revnum_t rev, + apr_pool_t *result_pool); + +static svn_error_t * +make_txn_root(svn_fs_root_t **root_p, + svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + svn_revnum_t base_rev, + apr_uint32_t flags, + apr_pool_t *result_pool); + +static svn_error_t * +x_closest_copy(svn_fs_root_t **root_p, + const char **path_p, + svn_fs_root_t *root, + const char *path, + apr_pool_t *pool); + + +/*** Node Caching ***/ + +/* 1st level cache */ + +/* An entry in the first-level cache. REVISION and PATH form the key that + will ultimately be matched. + */ +typedef struct cache_entry_t +{ + /* hash value derived from PATH, REVISION. + Used to short-circuit failed lookups. */ + apr_uint32_t hash_value; + + /* revision to which the NODE belongs */ + svn_revnum_t revision; + + /* path of the NODE */ + char *path; + + /* cached value of strlen(PATH). */ + apr_size_t path_len; + + /* the node allocated in the cache's pool. NULL for empty entries. */ + dag_node_t *node; +} cache_entry_t; + +/* Number of entries in the cache. Keep this low to keep pressure on the + CPU caches low as well. A binary value is most efficient. If we walk + a directory tree, we want enough entries to store nodes for all files + without overwriting the nodes for the parent folder. That way, there + will be no unnecessary misses (except for a few random ones caused by + hash collision). + + The actual number of instances may be higher but entries that got + overwritten are no longer visible. + */ +enum { BUCKET_COUNT = 256 }; + +/* The actual cache structure. All nodes will be allocated in POOL. + When the number of INSERTIONS (i.e. objects created form that pool) + exceeds a certain threshold, the pool will be cleared and the cache + with it. + */ +struct svn_fs_x__dag_cache_t +{ + /* fixed number of (possibly empty) cache entries */ + cache_entry_t buckets[BUCKET_COUNT]; + + /* pool used for all node allocation */ + apr_pool_t *pool; + + /* number of entries created from POOL since the last cleanup */ + apr_size_t insertions; + + /* Property lookups etc. have a very high locality (75% re-hit). + Thus, remember the last hit location for optimistic lookup. */ + apr_size_t last_hit; + + /* Position of the last bucket hit that actually had a DAG node in it. + LAST_HIT may refer to a bucket that matches path@rev but has not + its NODE element set, yet. + This value is a mere hint for optimistic lookup and any value is + valid (as long as it is < BUCKET_COUNT). 
*/ + apr_size_t last_non_empty; +}; + +svn_fs_x__dag_cache_t* +svn_fs_x__create_dag_cache(apr_pool_t *result_pool) +{ + svn_fs_x__dag_cache_t *result = apr_pcalloc(result_pool, sizeof(*result)); + result->pool = svn_pool_create(result_pool); + + return result; +} + +/* Clears the CACHE at regular intervals (destroying all cached nodes) + */ +static void +auto_clear_dag_cache(svn_fs_x__dag_cache_t* cache) +{ + if (cache->insertions > BUCKET_COUNT) + { + svn_pool_clear(cache->pool); + + memset(cache->buckets, 0, sizeof(cache->buckets)); + cache->insertions = 0; + } +} + +/* For the given REVISION and PATH, return the respective entry in CACHE. + If the entry is empty, its NODE member will be NULL and the caller + may then set it to the corresponding DAG node allocated in CACHE->POOL. + */ +static cache_entry_t * +cache_lookup( svn_fs_x__dag_cache_t *cache + , svn_revnum_t revision + , const char *path) +{ + apr_size_t i, bucket_index; + apr_size_t path_len = strlen(path); + apr_uint32_t hash_value = (apr_uint32_t)revision; + +#if SVN_UNALIGNED_ACCESS_IS_OK + /* "randomizing" / distributing factor used in our hash function */ + const apr_uint32_t factor = 0xd1f3da69; +#endif + + /* optimistic lookup: hit the same bucket again? */ + cache_entry_t *result = &cache->buckets[cache->last_hit]; + if ( (result->revision == revision) + && (result->path_len == path_len) + && !memcmp(result->path, path, path_len)) + { + /* Remember the position of the last node we found in this cache. */ + if (result->node) + cache->last_non_empty = cache->last_hit; + + return result; + } + + /* need to do a full lookup. Calculate the hash value + (HASH_VALUE has been initialized to REVISION). */ + i = 0; +#if SVN_UNALIGNED_ACCESS_IS_OK + /* We relax the dependency chain between iterations by processing + two chunks from the input per hash_value self-multiplication. + The HASH_VALUE update latency is now 1 MUL latency + 1 ADD latency + per 2 chunks instead of 1 chunk. + */ + for (; i + 8 <= path_len; i += 8) + hash_value = hash_value * factor * factor + + ( *(const apr_uint32_t*)(path + i) * factor + + *(const apr_uint32_t*)(path + i + 4)); +#endif + + for (; i < path_len; ++i) + /* Help GCC to minimize the HASH_VALUE update latency by splitting the + MUL 33 of the naive implementation: h = h * 33 + path[i]. This + shortens the dependency chain from 1 shift + 2 ADDs to 1 shift + 1 ADD. + */ + hash_value = hash_value * 32 + (hash_value + (unsigned char)path[i]); + + bucket_index = hash_value + (hash_value >> 16); + bucket_index = (bucket_index + (bucket_index >> 8)) % BUCKET_COUNT; + + /* access the corresponding bucket and remember its location */ + result = &cache->buckets[bucket_index]; + cache->last_hit = bucket_index; + + /* if it is *NOT* a match, clear the bucket, expect the caller to fill + in the node and count it as an insertion */ + if ( (result->hash_value != hash_value) + || (result->revision != revision) + || (result->path_len != path_len) + || memcmp(result->path, path, path_len)) + { + result->hash_value = hash_value; + result->revision = revision; + if (result->path_len < path_len) + result->path = apr_palloc(cache->pool, path_len + 1); + result->path_len = path_len; + memcpy(result->path, path, path_len + 1); + + result->node = NULL; + + cache->insertions++; + } + else if (result->node) + { + /* This bucket is valid & has a suitable DAG node in it. + Remember its location. 
*/ + cache->last_non_empty = bucket_index; + } + + return result; +} + +/* Optimistic lookup using the last seen non-empty location in CACHE. + Return the node of that entry, if it is still in use and matches PATH. + Return NULL otherwise. Since the caller usually already knows the path + length, provide it in PATH_LEN. */ +static dag_node_t * +cache_lookup_last_path(svn_fs_x__dag_cache_t *cache, + const char *path, + apr_size_t path_len) +{ + cache_entry_t *result = &cache->buckets[cache->last_non_empty]; + assert(strlen(path) == path_len); + + if ( result->node + && (result->path_len == path_len) + && !memcmp(result->path, path, path_len)) + { + return result->node; + } + + return NULL; +} + +/* 2nd level cache */ + +/* Find and return the DAG node cache for ROOT and the key that + should be used for PATH. + + RESULT_POOL will only be used for allocating a new keys if necessary. */ +static void +locate_cache(svn_cache__t **cache, + const char **key, + svn_fs_root_t *root, + const char *path, + apr_pool_t *result_pool) +{ + if (root->is_txn_root) + { + fs_txn_root_data_t *frd = root->fsap_data; + + if (cache) + *cache = frd->txn_node_cache; + if (key && path) + *key = path; + } + else + { + svn_fs_x__data_t *ffd = root->fs->fsap_data; + + if (cache) + *cache = ffd->rev_node_cache; + if (key && path) + *key = svn_fs_x__combine_number_and_string(root->rev, path, + result_pool); + } +} + +/* Return NODE for PATH from ROOT's node cache, or NULL if the node + isn't cached; read it from the FS. *NODE remains valid until either + POOL or the FS gets cleared or destroyed (whichever comes first). + */ +static svn_error_t * +dag_node_cache_get(dag_node_t **node_p, + svn_fs_root_t *root, + const char *path, + apr_pool_t *pool) +{ + svn_boolean_t found; + dag_node_t *node = NULL; + svn_cache__t *cache; + const char *key; + + SVN_ERR_ASSERT(*path == '/'); + + if (!root->is_txn_root) + { + /* immutable DAG node. use the global caches for it */ + + svn_fs_x__data_t *ffd = root->fs->fsap_data; + cache_entry_t *bucket; + + auto_clear_dag_cache(ffd->dag_node_cache); + bucket = cache_lookup(ffd->dag_node_cache, root->rev, path); + if (bucket->node == NULL) + { + locate_cache(&cache, &key, root, path, pool); + SVN_ERR(svn_cache__get((void **)&node, &found, cache, key, + ffd->dag_node_cache->pool)); + if (found && node) + { + /* Patch up the FS, since this might have come from an old FS + * object. */ + svn_fs_x__dag_set_fs(node, root->fs); + bucket->node = node; + } + } + else + { + node = bucket->node; + } + } + else + { + /* DAG is mutable / may become invalid. Use the TXN-local cache */ + + locate_cache(&cache, &key, root, path, pool); + + SVN_ERR(svn_cache__get((void **) &node, &found, cache, key, pool)); + if (found && node) + { + /* Patch up the FS, since this might have come from an old FS + * object. */ + svn_fs_x__dag_set_fs(node, root->fs); + } + } + + *node_p = node; + + return SVN_NO_ERROR; +} + + +/* Add the NODE for PATH to ROOT's node cache. */ +static svn_error_t * +dag_node_cache_set(svn_fs_root_t *root, + const char *path, + dag_node_t *node, + apr_pool_t *scratch_pool) +{ + svn_cache__t *cache; + const char *key; + + SVN_ERR_ASSERT(*path == '/'); + + /* Do *not* attempt to dup and put the node into L1. + * dup() is twice as expensive as an L2 lookup (which will set also L1). + */ + locate_cache(&cache, &key, root, path, scratch_pool); + + return svn_cache__set(cache, key, node, scratch_pool); +} + + +/* Baton for find_descendants_in_cache. 
*/ +typedef struct fdic_baton_t +{ + const char *path; + apr_array_header_t *list; + apr_pool_t *pool; +} fdic_baton_t; + +/* If the given item is a descendant of BATON->PATH, push + * it onto BATON->LIST (copying into BATON->POOL). Implements + * the svn_iter_apr_hash_cb_t prototype. */ +static svn_error_t * +find_descendants_in_cache(void *baton, + const void *key, + apr_ssize_t klen, + void *val, + apr_pool_t *pool) +{ + fdic_baton_t *b = baton; + const char *item_path = key; + + if (svn_fspath__skip_ancestor(b->path, item_path)) + APR_ARRAY_PUSH(b->list, const char *) = apr_pstrdup(b->pool, item_path); + + return SVN_NO_ERROR; +} + +/* Invalidate cache entries for PATH and any of its children. This + should *only* be called on a transaction root! */ +static svn_error_t * +dag_node_cache_invalidate(svn_fs_root_t *root, + const char *path, + apr_pool_t *scratch_pool) +{ + fdic_baton_t b; + svn_cache__t *cache; + apr_pool_t *iterpool; + int i; + + b.path = path; + b.pool = svn_pool_create(scratch_pool); + b.list = apr_array_make(b.pool, 1, sizeof(const char *)); + + SVN_ERR_ASSERT(root->is_txn_root); + locate_cache(&cache, NULL, root, NULL, b.pool); + + + SVN_ERR(svn_cache__iter(NULL, cache, find_descendants_in_cache, + &b, b.pool)); + + iterpool = svn_pool_create(b.pool); + + for (i = 0; i < b.list->nelts; i++) + { + const char *descendant = APR_ARRAY_IDX(b.list, i, const char *); + svn_pool_clear(iterpool); + SVN_ERR(svn_cache__set(cache, descendant, NULL, iterpool)); + } + + svn_pool_destroy(iterpool); + svn_pool_destroy(b.pool); + return SVN_NO_ERROR; +} + + + +/* Creating transaction and revision root nodes. */ + +svn_error_t * +svn_fs_x__txn_root(svn_fs_root_t **root_p, + svn_fs_txn_t *txn, + apr_pool_t *pool) +{ + apr_uint32_t flags = 0; + apr_hash_t *txnprops; + + /* Look for the temporary txn props representing 'flags'. */ + SVN_ERR(svn_fs_x__txn_proplist(&txnprops, txn, pool)); + if (txnprops) + { + if (svn_hash_gets(txnprops, SVN_FS__PROP_TXN_CHECK_OOD)) + flags |= SVN_FS_TXN_CHECK_OOD; + + if (svn_hash_gets(txnprops, SVN_FS__PROP_TXN_CHECK_LOCKS)) + flags |= SVN_FS_TXN_CHECK_LOCKS; + } + + return make_txn_root(root_p, txn->fs, svn_fs_x__txn_get_id(txn), + txn->base_rev, flags, pool); +} + + +svn_error_t * +svn_fs_x__revision_root(svn_fs_root_t **root_p, + svn_fs_t *fs, + svn_revnum_t rev, + apr_pool_t *pool) +{ + SVN_ERR(svn_fs__check_fs(fs, TRUE)); + SVN_ERR(svn_fs_x__ensure_revision_exists(rev, fs, pool)); + + *root_p = make_revision_root(fs, rev, pool); + + return SVN_NO_ERROR; +} + + + +/* Getting dag nodes for roots. */ + +/* Return the transaction ID to a given transaction ROOT. */ +static svn_fs_x__txn_id_t +root_txn_id(svn_fs_root_t *root) +{ + fs_txn_root_data_t *frd = root->fsap_data; + assert(root->is_txn_root); + + return frd->txn_id; +} + +/* Set *NODE_P to a freshly opened dag node referring to the root + directory of ROOT, allocating from RESULT_POOL. Use SCRATCH_POOL + for temporary allocations. */ +static svn_error_t * +root_node(dag_node_t **node_p, + svn_fs_root_t *root, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + if (root->is_txn_root) + { + /* It's a transaction root. Open a fresh copy. */ + return svn_fs_x__dag_txn_root(node_p, root->fs, root_txn_id(root), + result_pool, scratch_pool); + } + else + { + /* It's a revision root, so we already have its root directory + opened. 
*/ + return svn_fs_x__dag_revision_root(node_p, root->fs, root->rev, + result_pool, scratch_pool); + } +} + + +/* Set *NODE_P to a mutable root directory for ROOT, cloning if + necessary, allocating in RESULT_POOL. ROOT must be a transaction root. + Use ERROR_PATH in error messages. Use SCRATCH_POOL for temporaries.*/ +static svn_error_t * +mutable_root_node(dag_node_t **node_p, + svn_fs_root_t *root, + const char *error_path, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + if (root->is_txn_root) + { + /* It's a transaction root. Open a fresh copy. */ + return svn_fs_x__dag_txn_root(node_p, root->fs, root_txn_id(root), + result_pool, scratch_pool); + } + else + /* If it's not a transaction root, we can't change its contents. */ + return SVN_FS__ERR_NOT_MUTABLE(root->fs, root->rev, error_path); +} + + + +/* Traversing directory paths. */ + +typedef enum copy_id_inherit_t +{ + copy_id_inherit_unknown = 0, + copy_id_inherit_self, + copy_id_inherit_parent, + copy_id_inherit_new + +} copy_id_inherit_t; + +/* A linked list representing the path from a node up to a root + directory. We use this for cloning, and for operations that need + to deal with both a node and its parent directory. For example, a + `delete' operation needs to know that the node actually exists, but + also needs to change the parent directory. */ +typedef struct parent_path_t +{ + + /* A node along the path. This could be the final node, one of its + parents, or the root. Every parent path ends with an element for + the root directory. */ + dag_node_t *node; + + /* The name NODE has in its parent directory. This is zero for the + root directory, which (obviously) has no name in its parent. */ + char *entry; + + /* The parent of NODE, or zero if NODE is the root directory. */ + struct parent_path_t *parent; + + /* The copy ID inheritance style. */ + copy_id_inherit_t copy_inherit; + + /* If copy ID inheritance style is copy_id_inherit_new, this is the + path which should be implicitly copied; otherwise, this is NULL. */ + const char *copy_src_path; + +} parent_path_t; + +/* Return a text string describing the absolute path of parent_path + PARENT_PATH. It will be allocated in POOL. */ +static const char * +parent_path_path(parent_path_t *parent_path, + apr_pool_t *pool) +{ + const char *path_so_far = "/"; + if (parent_path->parent) + path_so_far = parent_path_path(parent_path->parent, pool); + return parent_path->entry + ? svn_fspath__join(path_so_far, parent_path->entry, pool) + : path_so_far; +} + + +/* Return the FS path for the parent path chain object CHILD relative + to its ANCESTOR in the same chain, allocated in POOL. */ +static const char * +parent_path_relpath(parent_path_t *child, + parent_path_t *ancestor, + apr_pool_t *pool) +{ + const char *path_so_far = ""; + parent_path_t *this_node = child; + while (this_node != ancestor) + { + assert(this_node != NULL); + path_so_far = svn_relpath_join(this_node->entry, path_so_far, pool); + this_node = this_node->parent; + } + return path_so_far; +} + + + +/* Choose a copy ID inheritance method *INHERIT_P to be used in the + event that immutable node CHILD in FS needs to be made mutable. If + the inheritance method is copy_id_inherit_new, also return a + *COPY_SRC_PATH on which to base the new copy ID (else return NULL + for that path). CHILD must have a parent (it cannot be the root + node). Allocations are taken from POOL. 
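+
+   Informal summary of the decision made below (a reading aid, not an
+   addition to the algorithm):
+
+     CHILD already mutable                    -> copy_id_inherit_self
+     CHILD's copy ID is the root copy ID      -> copy_id_inherit_parent
+     CHILD and parent share one copy ID       -> copy_id_inherit_parent
+     CHILD unrelated to its copyroot node     -> copy_id_inherit_parent
+     CHILD reached via its created path       -> copy_id_inherit_self
+     otherwise (nested branch point)          -> copy_id_inherit_new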
*/ +static svn_error_t * +get_copy_inheritance(copy_id_inherit_t *inherit_p, + const char **copy_src_path, + svn_fs_t *fs, + parent_path_t *child, + apr_pool_t *pool) +{ + svn_fs_x__id_t child_copy_id, parent_copy_id; + svn_boolean_t related; + const char *id_path = NULL; + svn_fs_root_t *copyroot_root; + dag_node_t *copyroot_node; + svn_revnum_t copyroot_rev; + const char *copyroot_path; + + SVN_ERR_ASSERT(child && child->parent); + + /* Initialize some convenience variables. */ + SVN_ERR(svn_fs_x__dag_get_copy_id(&child_copy_id, child->node)); + SVN_ERR(svn_fs_x__dag_get_copy_id(&parent_copy_id, child->parent->node)); + + /* If this child is already mutable, we have nothing to do. */ + if (svn_fs_x__dag_check_mutable(child->node)) + { + *inherit_p = copy_id_inherit_self; + *copy_src_path = NULL; + return SVN_NO_ERROR; + } + + /* From this point on, we'll assume that the child will just take + its copy ID from its parent. */ + *inherit_p = copy_id_inherit_parent; + *copy_src_path = NULL; + + /* Special case: if the child's copy ID is '0', use the parent's + copy ID. */ + if (svn_fs_x__id_is_root(&child_copy_id)) + return SVN_NO_ERROR; + + /* Compare the copy IDs of the child and its parent. If they are + the same, then the child is already on the same branch as the + parent, and should use the same mutability copy ID that the + parent will use. */ + if (svn_fs_x__id_eq(&child_copy_id, &parent_copy_id)) + return SVN_NO_ERROR; + + /* If the child is on the same branch that the parent is on, the + child should just use the same copy ID that the parent would use. + Else, the child needs to generate a new copy ID to use should it + need to be made mutable. We will claim that child is on the same + branch as its parent if the child itself is not a branch point, + or if it is a branch point that we are accessing via its original + copy destination path. */ + SVN_ERR(svn_fs_x__dag_get_copyroot(©root_rev, ©root_path, + child->node)); + SVN_ERR(svn_fs_x__revision_root(©root_root, fs, copyroot_rev, pool)); + SVN_ERR(get_dag(©root_node, copyroot_root, copyroot_path, pool)); + + SVN_ERR(svn_fs_x__dag_related_node(&related, copyroot_node, child->node)); + if (!related) + return SVN_NO_ERROR; + + /* Determine if we are looking at the child via its original path or + as a subtree item of a copied tree. */ + id_path = svn_fs_x__dag_get_created_path(child->node); + if (strcmp(id_path, parent_path_path(child, pool)) == 0) + { + *inherit_p = copy_id_inherit_self; + return SVN_NO_ERROR; + } + + /* We are pretty sure that the child node is an unedited nested + branched node. When it needs to be made mutable, it should claim + a new copy ID. */ + *inherit_p = copy_id_inherit_new; + *copy_src_path = id_path; + return SVN_NO_ERROR; +} + +/* Allocate a new parent_path_t node from RESULT_POOL, referring to NODE, + ENTRY, PARENT, and COPY_ID. */ +static parent_path_t * +make_parent_path(dag_node_t *node, + char *entry, + parent_path_t *parent, + apr_pool_t *result_pool) +{ + parent_path_t *parent_path = apr_pcalloc(result_pool, sizeof(*parent_path)); + if (node) + parent_path->node = svn_fs_x__dag_copy_into_pool(node, result_pool); + parent_path->entry = entry; + parent_path->parent = parent; + parent_path->copy_inherit = copy_id_inherit_unknown; + parent_path->copy_src_path = NULL; + return parent_path; +} + + +/* Flags for open_path. */ +typedef enum open_path_flags_t { + + /* The last component of the PATH need not exist. (All parent + directories must exist, as usual.) 
If the last component doesn't + exist, simply leave the `node' member of the bottom parent_path + component zero. */ + open_path_last_optional = 1, + + /* When this flag is set, don't bother to lookup the DAG node in + our caches because we already tried this. Ignoring this flag + has no functional impact. */ + open_path_uncached = 2, + + /* The caller does not care about the parent node chain but only + the final DAG node. */ + open_path_node_only = 4, + + /* The caller wants a NULL path object instead of an error if the + path cannot be found. */ + open_path_allow_null = 8 +} open_path_flags_t; + +/* Try a short-cut for the open_path() function using the last node accessed. + * If that ROOT is that nodes's "created rev" and PATH of PATH_LEN chars is + * its "created path", return the node in *NODE_P. Set it to NULL otherwise. + * + * This function is used to support ra_serf-style access patterns where we + * are first asked for path@rev and then for path@c_rev of the same node. + * The shortcut works by ignoring the "rev" part of the cache key and then + * checking whether we got lucky. Lookup and verification are both quick + * plus there are many early outs for common types of mismatch. + */ +static svn_error_t * +try_match_last_node(dag_node_t **node_p, + svn_fs_root_t *root, + const char *path, + apr_size_t path_len, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = root->fs->fsap_data; + + /* Optimistic lookup: if the last node returned from the cache applied to + the same PATH, return it in NODE. */ + dag_node_t *node + = cache_lookup_last_path(ffd->dag_node_cache, path, path_len); + + /* Did we get a bucket with a committed node? */ + if (node && !svn_fs_x__dag_check_mutable(node)) + { + /* Get the path&rev pair at which this node was created. + This is repository location for which this node is _known_ to be + the right lookup result irrespective of how we found it. */ + const char *created_path + = svn_fs_x__dag_get_created_path(node); + svn_revnum_t revision = svn_fs_x__dag_get_revision(node); + + /* Is it an exact match? */ + if (revision == root->rev && strcmp(created_path, path) == 0) + { + /* Cache it under its full path@rev access path. */ + SVN_ERR(dag_node_cache_set(root, path, node, scratch_pool)); + + *node_p = node; + return SVN_NO_ERROR; + } + } + + *node_p = NULL; + return SVN_NO_ERROR; +} + + +/* Open the node identified by PATH in ROOT, allocating in POOL. Set + *PARENT_PATH_P to a path from the node up to ROOT. The resulting + **PARENT_PATH_P value is guaranteed to contain at least one + *element, for the root directory. PATH must be in canonical form. + + If resulting *PARENT_PATH_P will eventually be made mutable and + modified, or if copy ID inheritance information is otherwise needed, + IS_TXN_PATH must be set. If IS_TXN_PATH is FALSE, no copy ID + inheritance information will be calculated for the *PARENT_PATH_P chain. + + If FLAGS & open_path_last_optional is zero, return the error + SVN_ERR_FS_NOT_FOUND if the node PATH refers to does not exist. If + non-zero, require all the parent directories to exist as normal, + but if the final path component doesn't exist, simply return a path + whose bottom `node' member is zero. This option is useful for + callers that create new nodes --- we find the parent directory for + them, and tell them whether the entry exists already. 
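+
+   For instance (an illustrative sketch; the path literal is made up),
+   a caller that is about to create "/trunk/new-file" could call
+
+     parent_path_t *parent_path;
+     SVN_ERR(open_path(&parent_path, root, "/trunk/new-file",
+                       open_path_last_optional, TRUE, pool));
+
+   and then test parent_path->node: NULL means the entry does not exist
+   yet while all of its parent directories do.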
+ + The remaining bits in FLAGS are hints that allow this function + to take shortcuts based on knowledge that the caller provides, + such as the caller is not actually being interested in PARENT_PATH_P, + but only in (*PARENT_PATH_P)->NODE. + + NOTE: Public interfaces which only *read* from the filesystem + should not call this function directly, but should instead use + get_dag(). +*/ +static svn_error_t * +open_path(parent_path_t **parent_path_p, + svn_fs_root_t *root, + const char *path, + int flags, + svn_boolean_t is_txn_path, + apr_pool_t *pool) +{ + svn_fs_t *fs = root->fs; + dag_node_t *here = NULL; /* The directory we're currently looking at. */ + parent_path_t *parent_path; /* The path from HERE up to the root. */ + const char *rest = NULL; /* The portion of PATH we haven't traversed yet. */ + apr_pool_t *iterpool = svn_pool_create(pool); + + /* path to the currently processed entry without trailing '/'. + We will reuse this across iterations by simply putting a NUL terminator + at the respective position and replacing that with a '/' in the next + iteration. This is correct as we assert() PATH to be canonical. */ + svn_stringbuf_t *path_so_far = svn_stringbuf_create(path, pool); + apr_size_t path_len = path_so_far->len; + + /* Callers often traverse the DAG in some path-based order or along the + history segments. That allows us to try a few guesses about where to + find the next item. This is only useful if the caller didn't request + the full parent chain. */ + assert(svn_fs__is_canonical_abspath(path)); + path_so_far->len = 0; /* "" */ + if (flags & open_path_node_only) + { + const char *directory; + + /* First attempt: Assume that we access the DAG for the same path as + in the last lookup but for a different revision that happens to be + the last revision that touched the respective node. This is a + common pattern when e.g. checking out over ra_serf. Note that this + will only work for committed data as the revision info for nodes in + txns is bogus. + + This shortcut is quick and will exit this function upon success. + So, try it first. */ + if (!root->is_txn_root) + { + dag_node_t *node; + SVN_ERR(try_match_last_node(&node, root, path, path_len, iterpool)); + + /* Did the shortcut work? */ + if (node) + { + /* Construct and return the result. */ + svn_pool_destroy(iterpool); + + parent_path = make_parent_path(node, 0, 0, pool); + parent_path->copy_inherit = copy_id_inherit_self; + *parent_path_p = parent_path; + + return SVN_NO_ERROR; + } + } + + /* Second attempt: Try starting the lookup immediately at the parent + node. We will often have recently accessed either a sibling or + said parent DIRECTORY itself for the same revision. */ + directory = svn_dirent_dirname(path, pool); + if (directory[1] != 0) /* root nodes are covered anyway */ + { + SVN_ERR(dag_node_cache_get(&here, root, directory, pool)); + + /* Did the shortcut work? */ + if (here) + { + apr_size_t dirname_len = strlen(directory); + path_so_far->len = dirname_len; + rest = path + dirname_len + 1; + } + } + } + + /* did the shortcut work? */ + if (!here) + { + /* Make a parent_path item for the root node, using its own current + copy id. 
*/ + SVN_ERR(root_node(&here, root, pool, iterpool)); + rest = path + 1; /* skip the leading '/', it saves in iteration */ + } + + path_so_far->data[path_so_far->len] = '\0'; + parent_path = make_parent_path(here, 0, 0, pool); + parent_path->copy_inherit = copy_id_inherit_self; + + /* Whenever we are at the top of this loop: + - HERE is our current directory, + - ID is the node revision ID of HERE, + - REST is the path we're going to find in HERE, and + - PARENT_PATH includes HERE and all its parents. */ + for (;;) + { + const char *next; + char *entry; + dag_node_t *child; + + svn_pool_clear(iterpool); + + /* The NODE in PARENT_PATH always lives in POOL, i.e. it will + * survive the cleanup of ITERPOOL and the DAG cache.*/ + here = parent_path->node; + + /* Parse out the next entry from the path. */ + entry = svn_fs__next_entry_name(&next, rest, pool); + + /* Update the path traversed thus far. */ + path_so_far->data[path_so_far->len] = '/'; + path_so_far->len += strlen(entry) + 1; + path_so_far->data[path_so_far->len] = '\0'; + + /* Given the behavior of svn_fs__next_entry_name(), ENTRY may be an + empty string when the path either starts or ends with a slash. + In either case, we stay put: the current directory stays the + same, and we add nothing to the parent path. We only need to + process non-empty path segments. */ + if (*entry != '\0') + { + copy_id_inherit_t inherit; + const char *copy_path = NULL; + dag_node_t *cached_node = NULL; + + /* If we found a directory entry, follow it. First, we + check our node cache, and, failing that, we hit the DAG + layer. Don't bother to contact the cache for the last + element if we already know the lookup to fail for the + complete path. */ + if (next || !(flags & open_path_uncached)) + SVN_ERR(dag_node_cache_get(&cached_node, root, path_so_far->data, + pool)); + if (cached_node) + child = cached_node; + else + SVN_ERR(svn_fs_x__dag_open(&child, here, entry, pool, iterpool)); + + /* "file not found" requires special handling. */ + if (child == NULL) + { + /* If this was the last path component, and the caller + said it was optional, then don't return an error; + just put a NULL node pointer in the path. */ + + if ((flags & open_path_last_optional) + && (! next || *next == '\0')) + { + parent_path = make_parent_path(NULL, entry, parent_path, + pool); + break; + } + else if (flags & open_path_allow_null) + { + parent_path = NULL; + break; + } + else + { + /* Build a better error message than svn_fs_x__dag_open + can provide, giving the root and full path name. */ + return SVN_FS__NOT_FOUND(root, path); + } + } + + if (flags & open_path_node_only) + { + /* Shortcut: the caller only wants the final DAG node. */ + parent_path->node = svn_fs_x__dag_copy_into_pool(child, pool); + } + else + { + /* Now, make a parent_path item for CHILD. */ + parent_path = make_parent_path(child, entry, parent_path, pool); + if (is_txn_path) + { + SVN_ERR(get_copy_inheritance(&inherit, ©_path, fs, + parent_path, iterpool)); + parent_path->copy_inherit = inherit; + parent_path->copy_src_path = apr_pstrdup(pool, copy_path); + } + } + + /* Cache the node we found (if it wasn't already cached). */ + if (! cached_node) + SVN_ERR(dag_node_cache_set(root, path_so_far->data, child, + iterpool)); + } + + /* Are we finished traversing the path? */ + if (! next) + break; + + /* The path isn't finished yet; we'd better be in a directory. 
*/ + if (svn_fs_x__dag_node_kind(child) != svn_node_dir) + SVN_ERR_W(SVN_FS__ERR_NOT_DIRECTORY(fs, path_so_far->data), + apr_psprintf(iterpool, _("Failure opening '%s'"), path)); + + rest = next; + } + + svn_pool_destroy(iterpool); + *parent_path_p = parent_path; + return SVN_NO_ERROR; +} + + +/* Make the node referred to by PARENT_PATH mutable, if it isn't already, + allocating from RESULT_POOL. ROOT must be the root from which + PARENT_PATH descends. Clone any parent directories as needed. + Adjust the dag nodes in PARENT_PATH to refer to the clones. Use + ERROR_PATH in error messages. Use SCRATCH_POOL for temporaries. */ +static svn_error_t * +make_path_mutable(svn_fs_root_t *root, + parent_path_t *parent_path, + const char *error_path, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + dag_node_t *clone; + svn_fs_x__txn_id_t txn_id = root_txn_id(root); + + /* Is the node mutable already? */ + if (svn_fs_x__dag_check_mutable(parent_path->node)) + return SVN_NO_ERROR; + + /* Are we trying to clone the root, or somebody's child node? */ + if (parent_path->parent) + { + svn_fs_x__id_t copy_id = { SVN_INVALID_REVNUM, 0 }; + svn_fs_x__id_t *copy_id_ptr = ©_id; + copy_id_inherit_t inherit = parent_path->copy_inherit; + const char *clone_path, *copyroot_path; + svn_revnum_t copyroot_rev; + svn_boolean_t is_parent_copyroot = FALSE; + svn_fs_root_t *copyroot_root; + dag_node_t *copyroot_node; + svn_boolean_t related; + + /* We're trying to clone somebody's child. Make sure our parent + is mutable. */ + SVN_ERR(make_path_mutable(root, parent_path->parent, + error_path, result_pool, scratch_pool)); + + switch (inherit) + { + case copy_id_inherit_parent: + SVN_ERR(svn_fs_x__dag_get_copy_id(©_id, + parent_path->parent->node)); + break; + + case copy_id_inherit_new: + SVN_ERR(svn_fs_x__reserve_copy_id(©_id, root->fs, txn_id, + scratch_pool)); + break; + + case copy_id_inherit_self: + copy_id_ptr = NULL; + break; + + case copy_id_inherit_unknown: + default: + SVN_ERR_MALFUNCTION(); /* uh-oh -- somebody didn't calculate copy-ID + inheritance data. */ + } + + /* Determine what copyroot our new child node should use. */ + SVN_ERR(svn_fs_x__dag_get_copyroot(©root_rev, ©root_path, + parent_path->node)); + SVN_ERR(svn_fs_x__revision_root(©root_root, root->fs, + copyroot_rev, scratch_pool)); + SVN_ERR(get_dag(©root_node, copyroot_root, copyroot_path, + result_pool)); + + SVN_ERR(svn_fs_x__dag_related_node(&related, copyroot_node, + parent_path->node)); + if (!related) + is_parent_copyroot = TRUE; + + /* Now make this node mutable. */ + clone_path = parent_path_path(parent_path->parent, scratch_pool); + SVN_ERR(svn_fs_x__dag_clone_child(&clone, + parent_path->parent->node, + clone_path, + parent_path->entry, + copy_id_ptr, txn_id, + is_parent_copyroot, + result_pool, + scratch_pool)); + + /* Update the path cache. */ + SVN_ERR(dag_node_cache_set(root, + parent_path_path(parent_path, scratch_pool), + clone, scratch_pool)); + } + else + { + /* We're trying to clone the root directory. */ + SVN_ERR(mutable_root_node(&clone, root, error_path, result_pool, + scratch_pool)); + } + + /* Update the PARENT_PATH link to refer to the clone. */ + parent_path->node = clone; + + return SVN_NO_ERROR; +} + + +/* Open the node identified by PATH in ROOT. Set DAG_NODE_P to the + node we find, allocated in POOL. Return the error + SVN_ERR_FS_NOT_FOUND if this node doesn't exist. 
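+
+   This is the usual entry point for the read-only queries in this file;
+   condensed from x_node_prop() further below, for illustration:
+
+     dag_node_t *node;
+     apr_hash_t *proplist;
+     SVN_ERR(get_dag(&node, root, path, pool));
+     SVN_ERR(svn_fs_x__dag_get_proplist(&proplist, node, pool,
+                                        scratch_pool));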
+ */ +static svn_error_t * +get_dag(dag_node_t **dag_node_p, + svn_fs_root_t *root, + const char *path, + apr_pool_t *pool) +{ + parent_path_t *parent_path; + dag_node_t *node = NULL; + + /* First we look for the DAG in our cache + (if the path may be canonical). */ + if (*path == '/') + SVN_ERR(dag_node_cache_get(&node, root, path, pool)); + + if (! node) + { + /* Canonicalize the input PATH. As it turns out, >95% of all paths + * seen here during e.g. svnadmin verify are non-canonical, i.e. + * miss the leading '/'. Unconditional canonicalization has a net + * performance benefit over previously checking path for being + * canonical. */ + path = svn_fs__canonicalize_abspath(path, pool); + SVN_ERR(dag_node_cache_get(&node, root, path, pool)); + + if (! node) + { + /* Call open_path with no flags, as we want this to return an + * error if the node for which we are searching doesn't exist. */ + SVN_ERR(open_path(&parent_path, root, path, + open_path_uncached | open_path_node_only, + FALSE, pool)); + node = parent_path->node; + + /* No need to cache our find -- open_path() will do that for us. */ + } + } + + *dag_node_p = svn_fs_x__dag_copy_into_pool(node, pool); + return SVN_NO_ERROR; +} + + + +/* Populating the `changes' table. */ + +/* Add a change to the changes table in FS, keyed on transaction id + TXN_ID, and indicated that a change of kind CHANGE_KIND occurred on + PATH (whose node revision id is--or was, in the case of a + deletion--NODEREV_ID), and optionally that TEXT_MODs, PROP_MODs or + MERGEINFO_MODs occurred. If the change resulted from a copy, + COPYFROM_REV and COPYFROM_PATH specify under which revision and path + the node was copied from. If this was not part of a copy, COPYFROM_REV + should be SVN_INVALID_REVNUM. Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +add_change(svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + const char *path, + const svn_fs_x__id_t *noderev_id, + svn_fs_path_change_kind_t change_kind, + svn_boolean_t text_mod, + svn_boolean_t prop_mod, + svn_boolean_t mergeinfo_mod, + svn_node_kind_t node_kind, + svn_revnum_t copyfrom_rev, + const char *copyfrom_path, + apr_pool_t *scratch_pool) +{ + return svn_fs_x__add_change(fs, txn_id, + svn_fs__canonicalize_abspath(path, + scratch_pool), + noderev_id, change_kind, + text_mod, prop_mod, mergeinfo_mod, + node_kind, copyfrom_rev, copyfrom_path, + scratch_pool); +} + + + +/* Generic node operations. */ + +/* Get the id of a node referenced by path PATH in ROOT. Return the + id in *ID_P allocated in POOL. */ +static svn_error_t * +x_node_id(const svn_fs_id_t **id_p, + svn_fs_root_t *root, + const char *path, + apr_pool_t *pool) +{ + svn_fs_x__id_t noderev_id; + + if ((! root->is_txn_root) + && (path[0] == '\0' || ((path[0] == '/') && (path[1] == '\0')))) + { + /* Optimize the case where we don't need any db access at all. + The root directory ("" or "/") node is stored in the + svn_fs_root_t object, and never changes when it's a revision + root, so we can just reach in and grab it directly. 
*/ + svn_fs_x__init_rev_root(&noderev_id, root->rev); + } + else + { + dag_node_t *node; + + SVN_ERR(get_dag(&node, root, path, pool)); + noderev_id = *svn_fs_x__dag_get_id(node); + } + + *id_p = svn_fs_x__id_create(svn_fs_x__id_create_context(root->fs, pool), + &noderev_id, pool); + + return SVN_NO_ERROR; +} + +static svn_error_t * +x_node_relation(svn_fs_node_relation_t *relation, + svn_fs_root_t *root_a, + const char *path_a, + svn_fs_root_t *root_b, + const char *path_b, + apr_pool_t *scratch_pool) +{ + dag_node_t *node; + svn_fs_x__id_t noderev_id_a, noderev_id_b, node_id_a, node_id_b; + + /* Root paths are a common special case. */ + svn_boolean_t a_is_root_dir + = (path_a[0] == '\0') || ((path_a[0] == '/') && (path_a[1] == '\0')); + svn_boolean_t b_is_root_dir + = (path_b[0] == '\0') || ((path_b[0] == '/') && (path_b[1] == '\0')); + + /* Path from different repository are always unrelated. */ + if (root_a->fs != root_b->fs) + { + *relation = svn_fs_node_unrelated; + return SVN_NO_ERROR; + } + + /* Are both (!) root paths? Then, they are related and we only test how + * direct the relation is. */ + if (a_is_root_dir && b_is_root_dir) + { + svn_boolean_t different_txn + = root_a->is_txn_root && root_b->is_txn_root + && strcmp(root_a->txn, root_b->txn); + + /* For txn roots, root->REV is the base revision of that TXN. */ + *relation = ( (root_a->rev == root_b->rev) + && (root_a->is_txn_root == root_b->is_txn_root) + && !different_txn) + ? svn_fs_node_unchanged + : svn_fs_node_common_ancestor; + return SVN_NO_ERROR; + } + + /* We checked for all separations between ID spaces (repos, txn). + * Now, we can simply test for the ID values themselves. */ + SVN_ERR(get_dag(&node, root_a, path_a, scratch_pool)); + noderev_id_a = *svn_fs_x__dag_get_id(node); + SVN_ERR(svn_fs_x__dag_get_node_id(&node_id_a, node)); + + SVN_ERR(get_dag(&node, root_b, path_b, scratch_pool)); + noderev_id_b = *svn_fs_x__dag_get_id(node); + SVN_ERR(svn_fs_x__dag_get_node_id(&node_id_b, node)); + + /* In FSX, even in-txn IDs are globally unique. + * So, we can simply compare them. */ + if (svn_fs_x__id_eq(&noderev_id_a, &noderev_id_b)) + *relation = svn_fs_node_unchanged; + else if (svn_fs_x__id_eq(&node_id_a, &node_id_b)) + *relation = svn_fs_node_common_ancestor; + else + *relation = svn_fs_node_unrelated; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__node_created_rev(svn_revnum_t *revision, + svn_fs_root_t *root, + const char *path, + apr_pool_t *scratch_pool) +{ + dag_node_t *node; + + SVN_ERR(get_dag(&node, root, path, scratch_pool)); + *revision = svn_fs_x__dag_get_revision(node); + + return SVN_NO_ERROR; +} + + +/* Set *CREATED_PATH to the path at which PATH under ROOT was created. + Return a string allocated in POOL. */ +static svn_error_t * +x_node_created_path(const char **created_path, + svn_fs_root_t *root, + const char *path, + apr_pool_t *pool) +{ + dag_node_t *node; + + SVN_ERR(get_dag(&node, root, path, pool)); + *created_path = svn_fs_x__dag_get_created_path(node); + + return SVN_NO_ERROR; +} + + +/* Set *KIND_P to the type of node located at PATH under ROOT. + Perform temporary allocations in SCRATCH_POOL. */ +static svn_error_t * +node_kind(svn_node_kind_t *kind_p, + svn_fs_root_t *root, + const char *path, + apr_pool_t *scratch_pool) +{ + dag_node_t *node; + + /* Get the node id. */ + SVN_ERR(get_dag(&node, root, path, scratch_pool)); + + /* Use the node id to get the real kind. 
*/ + *kind_p = svn_fs_x__dag_node_kind(node); + + return SVN_NO_ERROR; +} + + +/* Set *KIND_P to the type of node present at PATH under ROOT. If + PATH does not exist under ROOT, set *KIND_P to svn_node_none. Use + SCRATCH_POOL for temporary allocation. */ +svn_error_t * +svn_fs_x__check_path(svn_node_kind_t *kind_p, + svn_fs_root_t *root, + const char *path, + apr_pool_t *scratch_pool) +{ + svn_error_t *err = node_kind(kind_p, root, path, scratch_pool); + if (err && + ((err->apr_err == SVN_ERR_FS_NOT_FOUND) + || (err->apr_err == SVN_ERR_FS_NOT_DIRECTORY))) + { + svn_error_clear(err); + err = SVN_NO_ERROR; + *kind_p = svn_node_none; + } + + return svn_error_trace(err); +} + +/* Set *VALUE_P to the value of the property named PROPNAME of PATH in + ROOT. If the node has no property by that name, set *VALUE_P to + zero. Allocate the result in POOL. */ +static svn_error_t * +x_node_prop(svn_string_t **value_p, + svn_fs_root_t *root, + const char *path, + const char *propname, + apr_pool_t *pool) +{ + dag_node_t *node; + apr_hash_t *proplist; + apr_pool_t *scratch_pool = svn_pool_create(pool); + + SVN_ERR(get_dag(&node, root, path, pool)); + SVN_ERR(svn_fs_x__dag_get_proplist(&proplist, node, pool, scratch_pool)); + *value_p = NULL; + if (proplist) + *value_p = svn_hash_gets(proplist, propname); + + svn_pool_destroy(scratch_pool); + return SVN_NO_ERROR; +} + + +/* Set *TABLE_P to the entire property list of PATH under ROOT, as an + APR hash table allocated in POOL. The resulting property table + maps property names to pointers to svn_string_t objects containing + the property value. */ +static svn_error_t * +x_node_proplist(apr_hash_t **table_p, + svn_fs_root_t *root, + const char *path, + apr_pool_t *pool) +{ + dag_node_t *node; + apr_pool_t *scratch_pool = svn_pool_create(pool); + + SVN_ERR(get_dag(&node, root, path, pool)); + SVN_ERR(svn_fs_x__dag_get_proplist(table_p, node, pool, scratch_pool)); + + svn_pool_destroy(scratch_pool); + return SVN_NO_ERROR; +} + +static svn_error_t * +x_node_has_props(svn_boolean_t *has_props, + svn_fs_root_t *root, + const char *path, + apr_pool_t *scratch_pool) +{ + apr_hash_t *props; + + SVN_ERR(x_node_proplist(&props, root, path, scratch_pool)); + + *has_props = (0 < apr_hash_count(props)); + + return SVN_NO_ERROR; +} + +static svn_error_t * +increment_mergeinfo_up_tree(parent_path_t *pp, + apr_int64_t increment, + apr_pool_t *scratch_pool) +{ + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + + for (; pp; pp = pp->parent) + { + svn_pool_clear(iterpool); + SVN_ERR(svn_fs_x__dag_increment_mergeinfo_count(pp->node, + increment, + iterpool)); + } + + svn_pool_destroy(iterpool); + return SVN_NO_ERROR; +} + +/* Change, add, or delete a node's property value. The affected node + is PATH under ROOT, the property value to modify is NAME, and VALUE + points to either a string value to set the new contents to, or NULL + if the property should be deleted. Perform temporary allocations + in SCRATCH_POOL. */ +static svn_error_t * +x_change_node_prop(svn_fs_root_t *root, + const char *path, + const char *name, + const svn_string_t *value, + apr_pool_t *scratch_pool) +{ + parent_path_t *parent_path; + apr_hash_t *proplist; + svn_fs_x__txn_id_t txn_id; + svn_boolean_t mergeinfo_mod = FALSE; + apr_pool_t *subpool = svn_pool_create(scratch_pool); + + if (! 
root->is_txn_root) + return SVN_FS__NOT_TXN(root); + txn_id = root_txn_id(root); + + path = svn_fs__canonicalize_abspath(path, subpool); + SVN_ERR(open_path(&parent_path, root, path, 0, TRUE, subpool)); + + /* Check (non-recursively) to see if path is locked; if so, check + that we can use it. */ + if (root->txn_flags & SVN_FS_TXN_CHECK_LOCKS) + SVN_ERR(svn_fs_x__allow_locked_operation(path, root->fs, FALSE, FALSE, + subpool)); + + SVN_ERR(make_path_mutable(root, parent_path, path, subpool, subpool)); + SVN_ERR(svn_fs_x__dag_get_proplist(&proplist, parent_path->node, subpool, + subpool)); + + /* If there's no proplist, but we're just deleting a property, exit now. */ + if ((! proplist) && (! value)) + return SVN_NO_ERROR; + + /* Now, if there's no proplist, we know we need to make one. */ + if (! proplist) + proplist = apr_hash_make(subpool); + + if (strcmp(name, SVN_PROP_MERGEINFO) == 0) + { + apr_int64_t increment = 0; + svn_boolean_t had_mergeinfo; + SVN_ERR(svn_fs_x__dag_has_mergeinfo(&had_mergeinfo, parent_path->node)); + + if (value && !had_mergeinfo) + increment = 1; + else if (!value && had_mergeinfo) + increment = -1; + + if (increment != 0) + { + SVN_ERR(increment_mergeinfo_up_tree(parent_path, increment, subpool)); + SVN_ERR(svn_fs_x__dag_set_has_mergeinfo(parent_path->node, + (value != NULL), subpool)); + } + + mergeinfo_mod = TRUE; + } + + /* Set the property. */ + svn_hash_sets(proplist, name, value); + + /* Overwrite the node's proplist. */ + SVN_ERR(svn_fs_x__dag_set_proplist(parent_path->node, proplist, + subpool)); + + /* Make a record of this modification in the changes table. */ + SVN_ERR(add_change(root->fs, txn_id, path, + svn_fs_x__dag_get_id(parent_path->node), + svn_fs_path_change_modify, FALSE, TRUE, mergeinfo_mod, + svn_fs_x__dag_node_kind(parent_path->node), + SVN_INVALID_REVNUM, NULL, subpool)); + + svn_pool_destroy(subpool); + return SVN_NO_ERROR; +} + + +/* Determine if the properties of two path/root combinations are + different. Set *CHANGED_P to TRUE if the properties at PATH1 under + ROOT1 differ from those at PATH2 under ROOT2, or FALSE otherwise. + Both roots must be in the same filesystem. */ +static svn_error_t * +x_props_changed(svn_boolean_t *changed_p, + svn_fs_root_t *root1, + const char *path1, + svn_fs_root_t *root2, + const char *path2, + svn_boolean_t strict, + apr_pool_t *scratch_pool) +{ + dag_node_t *node1, *node2; + apr_pool_t *subpool = svn_pool_create(scratch_pool); + + /* Check that roots are in the same fs. */ + if (root1->fs != root2->fs) + return svn_error_create + (SVN_ERR_FS_GENERAL, NULL, + _("Cannot compare property value between two different filesystems")); + + SVN_ERR(get_dag(&node1, root1, path1, subpool)); + SVN_ERR(get_dag(&node2, root2, path2, subpool)); + SVN_ERR(svn_fs_x__dag_things_different(changed_p, NULL, node1, node2, + strict, subpool)); + svn_pool_destroy(subpool); + + return SVN_NO_ERROR; +} + + + +/* Merges and commits. */ + +/* Set *NODE to the root node of ROOT. */ +static svn_error_t * +get_root(dag_node_t **node, svn_fs_root_t *root, apr_pool_t *pool) +{ + return get_dag(node, root, "/", pool); +} + + +/* Set the contents of CONFLICT_PATH to PATH, and return an + SVN_ERR_FS_CONFLICT error that indicates that there was a conflict + at PATH. Perform all allocations in POOL (except the allocation of + CONFLICT_PATH, which should be handled outside this function). 
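+   In practice, merge() passes a single CONFLICT_PATH buffer down its
+   recursion, so the first conflict found fills the buffer and unwinds
+   the whole merge via the returned error.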
*/ +static svn_error_t * +conflict_err(svn_stringbuf_t *conflict_path, + const char *path) +{ + svn_stringbuf_set(conflict_path, path); + return svn_error_createf(SVN_ERR_FS_CONFLICT, NULL, + _("Conflict at '%s'"), path); +} + +/* Compare the directory representations at nodes LHS and RHS in FS and set + * *CHANGED to TRUE, if at least one entry has been added or removed them. + * Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +compare_dir_structure(svn_boolean_t *changed, + svn_fs_t *fs, + dag_node_t *lhs, + dag_node_t *rhs, + apr_pool_t *scratch_pool) +{ + apr_array_header_t *lhs_entries; + apr_array_header_t *rhs_entries; + int i; + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + + SVN_ERR(svn_fs_x__dag_dir_entries(&lhs_entries, lhs, scratch_pool, + iterpool)); + SVN_ERR(svn_fs_x__dag_dir_entries(&rhs_entries, rhs, scratch_pool, + iterpool)); + + /* different number of entries -> some addition / removal */ + if (lhs_entries->nelts != rhs_entries->nelts) + { + svn_pool_destroy(iterpool); + *changed = TRUE; + + return SVN_NO_ERROR; + } + + /* Since directories are sorted by name, we can simply compare their + entries one-by-one without binary lookup etc. */ + for (i = 0; i < lhs_entries->nelts; ++i) + { + svn_fs_x__dirent_t *lhs_entry + = APR_ARRAY_IDX(lhs_entries, i, svn_fs_x__dirent_t *); + svn_fs_x__dirent_t *rhs_entry + = APR_ARRAY_IDX(rhs_entries, i, svn_fs_x__dirent_t *); + + if (strcmp(lhs_entry->name, rhs_entry->name) == 0) + { + svn_boolean_t same_history; + dag_node_t *lhs_node, *rhs_node; + + /* Unchanged entry? */ + if (!svn_fs_x__id_eq(&lhs_entry->id, &rhs_entry->id)) + continue; + + /* We get here rarely. */ + svn_pool_clear(iterpool); + + /* Modified but not copied / replaced or anything? */ + SVN_ERR(svn_fs_x__dag_get_node(&lhs_node, fs, &lhs_entry->id, + iterpool, iterpool)); + SVN_ERR(svn_fs_x__dag_get_node(&rhs_node, fs, &rhs_entry->id, + iterpool, iterpool)); + SVN_ERR(svn_fs_x__dag_same_line_of_history(&same_history, + lhs_node, rhs_node)); + if (same_history) + continue; + } + + /* This is a different entry. */ + *changed = TRUE; + svn_pool_destroy(iterpool); + + return SVN_NO_ERROR; + } + + svn_pool_destroy(iterpool); + *changed = FALSE; + + return SVN_NO_ERROR; +} + +/* Merge changes between ANCESTOR and SOURCE into TARGET. ANCESTOR + * and TARGET must be distinct node revisions. TARGET_PATH should + * correspond to TARGET's full path in its filesystem, and is used for + * reporting conflict location. + * + * SOURCE, TARGET, and ANCESTOR are generally directories; this + * function recursively merges the directories' contents. If any are + * files, this function simply returns an error whenever SOURCE, + * TARGET, and ANCESTOR are all distinct node revisions. + * + * If there are differences between ANCESTOR and SOURCE that conflict + * with changes between ANCESTOR and TARGET, this function returns an + * SVN_ERR_FS_CONFLICT error, and updates CONFLICT_P to the name of the + * conflicting node in TARGET, with TARGET_PATH prepended as a path. + * + * If there are no conflicting differences, CONFLICT_P is updated to + * the empty string. + * + * CONFLICT_P must point to a valid svn_stringbuf_t. + * + * Do any necessary temporary allocation in POOL. 
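+ *
+ * For example: if both SOURCE and TARGET added an entry 'foo' under
+ * TARGET_PATH, the two additions cannot be auto-merged, so CONFLICT_P
+ * is set to 'TARGET_PATH/foo' and SVN_ERR_FS_CONFLICT is returned.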
+ */ +static svn_error_t * +merge(svn_stringbuf_t *conflict_p, + const char *target_path, + dag_node_t *target, + dag_node_t *source, + dag_node_t *ancestor, + svn_fs_x__txn_id_t txn_id, + apr_int64_t *mergeinfo_increment_out, + apr_pool_t *pool) +{ + const svn_fs_x__id_t *source_id, *target_id, *ancestor_id; + apr_array_header_t *s_entries, *t_entries, *a_entries; + int i, s_idx = -1, t_idx = -1; + svn_fs_t *fs; + apr_pool_t *iterpool; + apr_int64_t mergeinfo_increment = 0; + + /* Make sure everyone comes from the same filesystem. */ + fs = svn_fs_x__dag_get_fs(ancestor); + if ((fs != svn_fs_x__dag_get_fs(source)) + || (fs != svn_fs_x__dag_get_fs(target))) + { + return svn_error_create + (SVN_ERR_FS_CORRUPT, NULL, + _("Bad merge; ancestor, source, and target not all in same fs")); + } + + /* We have the same fs, now check it. */ + SVN_ERR(svn_fs__check_fs(fs, TRUE)); + + source_id = svn_fs_x__dag_get_id(source); + target_id = svn_fs_x__dag_get_id(target); + ancestor_id = svn_fs_x__dag_get_id(ancestor); + + /* It's improper to call this function with ancestor == target. */ + if (svn_fs_x__id_eq(ancestor_id, target_id)) + { + svn_string_t *id_str = svn_fs_x__id_unparse(target_id, pool); + return svn_error_createf + (SVN_ERR_FS_GENERAL, NULL, + _("Bad merge; target '%s' has id '%s', same as ancestor"), + target_path, id_str->data); + } + + svn_stringbuf_setempty(conflict_p); + + /* Base cases: + * Either no change made in source, or same change as made in target. + * Both mean nothing to merge here. + */ + if (svn_fs_x__id_eq(ancestor_id, source_id) + || (svn_fs_x__id_eq(source_id, target_id))) + return SVN_NO_ERROR; + + /* Else proceed, knowing all three are distinct node revisions. + * + * How to merge from this point: + * + * if (not all 3 are directories) + * { + * early exit with conflict; + * } + * + * // Property changes may only be made to up-to-date + * // directories, because once the client commits the prop + * // change, it bumps the directory's revision, and therefore + * // must be able to depend on there being no other changes to + * // that directory in the repository. + * if (target's property list differs from ancestor's) + * conflict; + * + * For each entry NAME in the directory ANCESTOR: + * + * Let ANCESTOR-ENTRY, SOURCE-ENTRY, and TARGET-ENTRY be the IDs of + * the name within ANCESTOR, SOURCE, and TARGET respectively. + * (Possibly null if NAME does not exist in SOURCE or TARGET.) + * + * If ANCESTOR-ENTRY == SOURCE-ENTRY, then: + * No changes were made to this entry while the transaction was in + * progress, so do nothing to the target. + * + * Else if ANCESTOR-ENTRY == TARGET-ENTRY, then: + * A change was made to this entry while the transaction was in + * process, but the transaction did not touch this entry. Replace + * TARGET-ENTRY with SOURCE-ENTRY. + * + * Else: + * Changes were made to this entry both within the transaction and + * to the repository while the transaction was in progress. They + * must be merged or declared to be in conflict. + * + * If SOURCE-ENTRY and TARGET-ENTRY are both null, that's a + * double delete; flag a conflict. + * + * If any of the three entries is of type file, declare a conflict. + * + * If either SOURCE-ENTRY or TARGET-ENTRY is not a direct + * modification of ANCESTOR-ENTRY (determine by comparing the + * node-id fields), declare a conflict. A replacement is + * incompatible with a modification or other replacement--even + * an identical replacement. 
+ * + * Direct modifications were made to the directory ANCESTOR-ENTRY + * in both SOURCE and TARGET. Recursively merge these + * modifications. + * + * For each leftover entry NAME in the directory SOURCE: + * + * If NAME exists in TARGET, declare a conflict. Even if SOURCE and + * TARGET are adding exactly the same thing, two additions are not + * auto-mergeable with each other. + * + * Add NAME to TARGET with the entry from SOURCE. + * + * Now that we are done merging the changes from SOURCE into the + * directory TARGET, update TARGET's predecessor to be SOURCE. + */ + + if ((svn_fs_x__dag_node_kind(source) != svn_node_dir) + || (svn_fs_x__dag_node_kind(target) != svn_node_dir) + || (svn_fs_x__dag_node_kind(ancestor) != svn_node_dir)) + { + return conflict_err(conflict_p, target_path); + } + + + /* Possible early merge failure: if target and ancestor have + different property lists, then the merge should fail. + Propchanges can *only* be committed on an up-to-date directory. + ### TODO: see issue #418 about the inelegance of this. + + Another possible, similar, early merge failure: if source and + ancestor have different property lists (meaning someone else + changed directory properties while our commit transaction was + happening), the merge should fail. See issue #2751. + */ + { + svn_fs_x__noderev_t *tgt_nr, *anc_nr, *src_nr; + svn_boolean_t same; + apr_pool_t *scratch_pool; + + /* Get node revisions for our id's. */ + scratch_pool = svn_pool_create(pool); + SVN_ERR(svn_fs_x__get_node_revision(&tgt_nr, fs, target_id, + pool, scratch_pool)); + svn_pool_clear(scratch_pool); + SVN_ERR(svn_fs_x__get_node_revision(&anc_nr, fs, ancestor_id, + pool, scratch_pool)); + svn_pool_clear(scratch_pool); + SVN_ERR(svn_fs_x__get_node_revision(&src_nr, fs, source_id, + pool, scratch_pool)); + svn_pool_destroy(scratch_pool); + + /* Now compare the prop-keys of the skels. Note that just because + the keys are different -doesn't- mean the proplists have + different contents. */ + SVN_ERR(svn_fs_x__prop_rep_equal(&same, fs, src_nr, anc_nr, TRUE, pool)); + if (! same) + return conflict_err(conflict_p, target_path); + + /* The directory entries got changed in the repository but the directory + properties did not. */ + SVN_ERR(svn_fs_x__prop_rep_equal(&same, fs, tgt_nr, anc_nr, TRUE, pool)); + if (! same) + { + /* There is an incoming prop change for this directory. + We will accept it only if the directory changes were mere updates + to its entries, i.e. there were no additions or removals. + Those could cause update problems to the working copy. */ + svn_boolean_t changed; + SVN_ERR(compare_dir_structure(&changed, fs, source, ancestor, pool)); + + if (changed) + return conflict_err(conflict_p, target_path); + } + } + + /* ### todo: it would be more efficient to simply check for a NULL + entries hash where necessary below than to allocate an empty hash + here, but another day, another day... */ + iterpool = svn_pool_create(pool); + SVN_ERR(svn_fs_x__dag_dir_entries(&s_entries, source, pool, iterpool)); + SVN_ERR(svn_fs_x__dag_dir_entries(&t_entries, target, pool, iterpool)); + SVN_ERR(svn_fs_x__dag_dir_entries(&a_entries, ancestor, pool, iterpool)); + + /* for each entry E in a_entries... 
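+     Each ancestor entry falls into one of three cases below: unchanged
+     in SOURCE (nothing to do), changed only in SOURCE (copy SOURCE's
+     entry into TARGET, or delete it there), or changed in both SOURCE
+     and TARGET (recurse into a sub-merge, or flag a conflict).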
*/ + for (i = 0; i < a_entries->nelts; ++i) + { + svn_fs_x__dirent_t *s_entry, *t_entry, *a_entry; + svn_pool_clear(iterpool); + + a_entry = APR_ARRAY_IDX(a_entries, i, svn_fs_x__dirent_t *); + s_entry = svn_fs_x__find_dir_entry(s_entries, a_entry->name, &s_idx); + t_entry = svn_fs_x__find_dir_entry(t_entries, a_entry->name, &t_idx); + + /* No changes were made to this entry while the transaction was + in progress, so do nothing to the target. */ + if (s_entry && svn_fs_x__id_eq(&a_entry->id, &s_entry->id)) + continue; + + /* A change was made to this entry while the transaction was in + process, but the transaction did not touch this entry. */ + else if (t_entry && svn_fs_x__id_eq(&a_entry->id, &t_entry->id)) + { + apr_int64_t mergeinfo_start; + apr_int64_t mergeinfo_end; + + dag_node_t *t_ent_node; + SVN_ERR(svn_fs_x__dag_get_node(&t_ent_node, fs, &t_entry->id, + iterpool, iterpool)); + SVN_ERR(svn_fs_x__dag_get_mergeinfo_count(&mergeinfo_start, + t_ent_node)); + mergeinfo_increment -= mergeinfo_start; + + if (s_entry) + { + dag_node_t *s_ent_node; + SVN_ERR(svn_fs_x__dag_get_node(&s_ent_node, fs, &s_entry->id, + iterpool, iterpool)); + + SVN_ERR(svn_fs_x__dag_get_mergeinfo_count(&mergeinfo_end, + s_ent_node)); + mergeinfo_increment += mergeinfo_end; + + SVN_ERR(svn_fs_x__dag_set_entry(target, a_entry->name, + &s_entry->id, + s_entry->kind, + txn_id, + iterpool)); + } + else + { + SVN_ERR(svn_fs_x__dag_delete(target, a_entry->name, txn_id, + iterpool)); + } + } + + /* Changes were made to this entry both within the transaction + and to the repository while the transaction was in progress. + They must be merged or declared to be in conflict. */ + else + { + dag_node_t *s_ent_node, *t_ent_node, *a_ent_node; + const char *new_tpath; + apr_int64_t sub_mergeinfo_increment; + svn_boolean_t s_a_same, t_a_same; + + /* If SOURCE-ENTRY and TARGET-ENTRY are both null, that's a + double delete; if one of them is null, that's a delete versus + a modification. In any of these cases, flag a conflict. */ + if (s_entry == NULL || t_entry == NULL) + return conflict_err(conflict_p, + svn_fspath__join(target_path, + a_entry->name, + iterpool)); + + /* If any of the three entries is of type file, flag a conflict. */ + if (s_entry->kind == svn_node_file + || t_entry->kind == svn_node_file + || a_entry->kind == svn_node_file) + return conflict_err(conflict_p, + svn_fspath__join(target_path, + a_entry->name, + iterpool)); + + /* Fetch DAG nodes to efficiently access ID parts. */ + SVN_ERR(svn_fs_x__dag_get_node(&s_ent_node, fs, &s_entry->id, + iterpool, iterpool)); + SVN_ERR(svn_fs_x__dag_get_node(&t_ent_node, fs, &t_entry->id, + iterpool, iterpool)); + SVN_ERR(svn_fs_x__dag_get_node(&a_ent_node, fs, &a_entry->id, + iterpool, iterpool)); + + /* If either SOURCE-ENTRY or TARGET-ENTRY is not a direct + modification of ANCESTOR-ENTRY, declare a conflict. */ + SVN_ERR(svn_fs_x__dag_same_line_of_history(&s_a_same, s_ent_node, + a_ent_node)); + SVN_ERR(svn_fs_x__dag_same_line_of_history(&t_a_same, t_ent_node, + a_ent_node)); + if (!s_a_same || !t_a_same) + return conflict_err(conflict_p, + svn_fspath__join(target_path, + a_entry->name, + iterpool)); + + /* Direct modifications were made to the directory + ANCESTOR-ENTRY in both SOURCE and TARGET. Recursively + merge these modifications. 
*/ + new_tpath = svn_fspath__join(target_path, t_entry->name, iterpool); + SVN_ERR(merge(conflict_p, new_tpath, + t_ent_node, s_ent_node, a_ent_node, + txn_id, + &sub_mergeinfo_increment, + iterpool)); + mergeinfo_increment += sub_mergeinfo_increment; + } + } + + /* For each entry E in source but not in ancestor */ + for (i = 0; i < s_entries->nelts; ++i) + { + svn_fs_x__dirent_t *a_entry, *s_entry, *t_entry; + dag_node_t *s_ent_node; + apr_int64_t mergeinfo_s; + + svn_pool_clear(iterpool); + + s_entry = APR_ARRAY_IDX(s_entries, i, svn_fs_x__dirent_t *); + a_entry = svn_fs_x__find_dir_entry(a_entries, s_entry->name, &s_idx); + t_entry = svn_fs_x__find_dir_entry(t_entries, s_entry->name, &t_idx); + + /* Process only entries in source that are NOT in ancestor. */ + if (a_entry) + continue; + + /* If NAME exists in TARGET, declare a conflict. */ + if (t_entry) + return conflict_err(conflict_p, + svn_fspath__join(target_path, + t_entry->name, + iterpool)); + + SVN_ERR(svn_fs_x__dag_get_node(&s_ent_node, fs, &s_entry->id, + iterpool, iterpool)); + SVN_ERR(svn_fs_x__dag_get_mergeinfo_count(&mergeinfo_s, s_ent_node)); + mergeinfo_increment += mergeinfo_s; + + SVN_ERR(svn_fs_x__dag_set_entry + (target, s_entry->name, &s_entry->id, s_entry->kind, + txn_id, iterpool)); + } + svn_pool_destroy(iterpool); + + SVN_ERR(svn_fs_x__dag_update_ancestry(target, source, pool)); + + SVN_ERR(svn_fs_x__dag_increment_mergeinfo_count(target, + mergeinfo_increment, + pool)); + + if (mergeinfo_increment_out) + *mergeinfo_increment_out = mergeinfo_increment; + + return SVN_NO_ERROR; +} + +/* Merge changes between an ancestor and SOURCE_NODE into + TXN. The ancestor is either ANCESTOR_NODE, or if + that is null, TXN's base node. + + If the merge is successful, TXN's base will become + SOURCE_NODE, and its root node will have a new ID, a + successor of SOURCE_NODE. + + If a conflict results, update *CONFLICT to the path in the txn that + conflicted; see the CONFLICT_P parameter of merge() for details. */ +static svn_error_t * +merge_changes(dag_node_t *ancestor_node, + dag_node_t *source_node, + svn_fs_txn_t *txn, + svn_stringbuf_t *conflict, + apr_pool_t *scratch_pool) +{ + dag_node_t *txn_root_node; + svn_fs_t *fs = txn->fs; + svn_fs_x__txn_id_t txn_id = svn_fs_x__txn_get_id(txn); + svn_boolean_t related; + + SVN_ERR(svn_fs_x__dag_txn_root(&txn_root_node, fs, txn_id, scratch_pool, + scratch_pool)); + + if (ancestor_node == NULL) + { + svn_revnum_t base_rev; + SVN_ERR(svn_fs_x__get_base_rev(&base_rev, fs, txn_id, scratch_pool)); + SVN_ERR(svn_fs_x__dag_revision_root(&ancestor_node, fs, base_rev, + scratch_pool, scratch_pool)); + } + + SVN_ERR(svn_fs_x__dag_related_node(&related, ancestor_node, txn_root_node)); + if (!related) + { + /* If no changes have been made in TXN since its current base, + then it can't conflict with any changes since that base. + The caller isn't supposed to call us in that case. */ + SVN_ERR_MALFUNCTION(); + } + else + SVN_ERR(merge(conflict, "/", txn_root_node, + source_node, ancestor_node, txn_id, NULL, scratch_pool)); + + return SVN_NO_ERROR; +} + + +svn_error_t * +svn_fs_x__commit_txn(const char **conflict_p, + svn_revnum_t *new_rev, + svn_fs_txn_t *txn, + apr_pool_t *pool) +{ + /* How do commits work in Subversion? + * + * When you're ready to commit, here's what you have: + * + * 1. A transaction, with a mutable tree hanging off it. + * 2. A base revision, against which TXN_TREE was made. + * 3. A latest revision, which may be newer than the base rev. 
+ * + * The problem is that if latest != base, then one can't simply + * attach the txn root as the root of the new revision, because that + * would lose all the changes between base and latest. It is also + * not acceptable to insist that base == latest; in a busy + * repository, commits happen too fast to insist that everyone keep + * their entire tree up-to-date at all times. Non-overlapping + * changes should not interfere with each other. + * + * The solution is to merge the changes between base and latest into + * the txn tree [see the function merge()]. The txn tree is the + * only one of the three trees that is mutable, so it has to be the + * one to adjust. + * + * You might have to adjust it more than once, if a new latest + * revision gets committed while you were merging in the previous + * one. For example: + * + * 1. Jane starts txn T, based at revision 6. + * 2. Someone commits (or already committed) revision 7. + * 3. Jane's starts merging the changes between 6 and 7 into T. + * 4. Meanwhile, someone commits revision 8. + * 5. Jane finishes the 6-->7 merge. T could now be committed + * against a latest revision of 7, if only that were still the + * latest. Unfortunately, 8 is now the latest, so... + * 6. Jane starts merging the changes between 7 and 8 into T. + * 7. Meanwhile, no one commits any new revisions. Whew. + * 8. Jane commits T, creating revision 9, whose tree is exactly + * T's tree, except immutable now. + * + * Lather, rinse, repeat. + */ + + svn_error_t *err = SVN_NO_ERROR; + svn_stringbuf_t *conflict = svn_stringbuf_create_empty(pool); + svn_fs_t *fs = txn->fs; + svn_fs_x__data_t *ffd = fs->fsap_data; + + /* Limit memory usage when the repository has a high commit rate and + needs to run the following while loop multiple times. The memory + growth without an iteration pool is very noticeable when the + transaction modifies a node that has 20,000 sibling nodes. */ + apr_pool_t *iterpool = svn_pool_create(pool); + + /* Initialize output params. */ + *new_rev = SVN_INVALID_REVNUM; + if (conflict_p) + *conflict_p = NULL; + + while (1729) + { + svn_revnum_t youngish_rev; + svn_fs_root_t *youngish_root; + dag_node_t *youngish_root_node; + + svn_pool_clear(iterpool); + + /* Get the *current* youngest revision. We call it "youngish" + because new revisions might get committed after we've + obtained it. */ + + SVN_ERR(svn_fs_x__youngest_rev(&youngish_rev, fs, iterpool)); + SVN_ERR(svn_fs_x__revision_root(&youngish_root, fs, youngish_rev, + iterpool)); + + /* Get the dag node for the youngest revision. Later we'll use + it as the SOURCE argument to a merge, and if the merge + succeeds, this youngest root node will become the new base + root for the svn txn that was the target of the merge (but + note that the youngest rev may have changed by then -- that's + why we're careful to get this root in its own bdb txn + here). */ + SVN_ERR(get_root(&youngish_root_node, youngish_root, iterpool)); + + /* Try to merge. If the merge succeeds, the base root node of + TARGET's txn will become the same as youngish_root_node, so + any future merges will only be between that node and whatever + the root node of the youngest rev is by then. */ + err = merge_changes(NULL, youngish_root_node, txn, conflict, iterpool); + if (err) + { + if ((err->apr_err == SVN_ERR_FS_CONFLICT) && conflict_p) + *conflict_p = conflict->data; + goto cleanup; + } + txn->base_rev = youngish_rev; + + /* Try to commit. 
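+         If a newer revision was committed while we were merging, this
+         fails with SVN_ERR_FS_TXN_OUT_OF_DATE and we loop to merge those
+         changes in as well; any other error ends the commit attempt.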
*/ + err = svn_fs_x__commit(new_rev, fs, txn, iterpool); + if (err && (err->apr_err == SVN_ERR_FS_TXN_OUT_OF_DATE)) + { + /* Did someone else finish committing a new revision while we + were in mid-merge or mid-commit? If so, we'll need to + loop again to merge the new changes in, then try to + commit again. Or if that's not what happened, then just + return the error. */ + svn_revnum_t youngest_rev; + SVN_ERR(svn_fs_x__youngest_rev(&youngest_rev, fs, iterpool)); + if (youngest_rev == youngish_rev) + goto cleanup; + else + svn_error_clear(err); + } + else if (err) + { + goto cleanup; + } + else + { + err = SVN_NO_ERROR; + goto cleanup; + } + } + + cleanup: + + svn_pool_destroy(iterpool); + + SVN_ERR(err); + + if (ffd->pack_after_commit) + { + SVN_ERR(svn_fs_x__pack(fs, NULL, NULL, NULL, NULL, pool)); + } + + return SVN_NO_ERROR; +} + + +/* Merge changes between two nodes into a third node. Given nodes + SOURCE_PATH under SOURCE_ROOT, TARGET_PATH under TARGET_ROOT and + ANCESTOR_PATH under ANCESTOR_ROOT, modify target to contain all the + changes between the ancestor and source. If there are conflicts, + return SVN_ERR_FS_CONFLICT and set *CONFLICT_P to a textual + description of the offending changes. Perform any temporary + allocations in POOL. */ +static svn_error_t * +x_merge(const char **conflict_p, + svn_fs_root_t *source_root, + const char *source_path, + svn_fs_root_t *target_root, + const char *target_path, + svn_fs_root_t *ancestor_root, + const char *ancestor_path, + apr_pool_t *pool) +{ + dag_node_t *source, *ancestor; + svn_fs_txn_t *txn; + svn_error_t *err; + svn_stringbuf_t *conflict = svn_stringbuf_create_empty(pool); + + if (! target_root->is_txn_root) + return SVN_FS__NOT_TXN(target_root); + + /* Paranoia. */ + if ((source_root->fs != ancestor_root->fs) + || (target_root->fs != ancestor_root->fs)) + { + return svn_error_create + (SVN_ERR_FS_CORRUPT, NULL, + _("Bad merge; ancestor, source, and target not all in same fs")); + } + + /* ### kff todo: is there any compelling reason to get the nodes in + one db transaction? Right now we don't; txn_body_get_root() gets + one node at a time. This will probably need to change: + + Jim Blandy <jimb@zwingli.cygnus.com> writes: + > svn_fs_merge needs to be a single transaction, to protect it against + > people deleting parents of nodes it's working on, etc. + */ + + /* Get the ancestor node. */ + SVN_ERR(get_root(&ancestor, ancestor_root, pool)); + + /* Get the source node. */ + SVN_ERR(get_root(&source, source_root, pool)); + + /* Open a txn for the txn root into which we're merging. */ + SVN_ERR(svn_fs_x__open_txn(&txn, ancestor_root->fs, target_root->txn, + pool)); + + /* Merge changes between ANCESTOR and SOURCE into TXN. */ + err = merge_changes(ancestor, source, txn, conflict, pool); + if (err) + { + if ((err->apr_err == SVN_ERR_FS_CONFLICT) && conflict_p) + *conflict_p = conflict->data; + return svn_error_trace(err); + } + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__deltify(svn_fs_t *fs, + svn_revnum_t revision, + apr_pool_t *scratch_pool) +{ + /* Deltify is a no-op for fs_x. */ + + return SVN_NO_ERROR; +} + + + +/* Directories. */ + +/* Set *TABLE_P to a newly allocated APR hash table containing the + entries of the directory at PATH in ROOT. The keys of the table + are entry names, as byte strings, excluding the final null + character; the table's values are pointers to svn_fs_svn_fs_x__dirent_t + structures. Allocate the table and its contents in POOL. 
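+   (The internal svn_fs_x__dirent_t entries are converted into public
+   svn_fs_dirent_t structures for the returned hash.)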
*/ +static svn_error_t * +x_dir_entries(apr_hash_t **table_p, + svn_fs_root_t *root, + const char *path, + apr_pool_t *pool) +{ + dag_node_t *node; + apr_hash_t *hash = svn_hash__make(pool); + apr_array_header_t *table; + int i; + svn_fs_x__id_context_t *context = NULL; + apr_pool_t *scratch_pool = svn_pool_create(pool); + + /* Get the entries for this path in the caller's pool. */ + SVN_ERR(get_dag(&node, root, path, scratch_pool)); + SVN_ERR(svn_fs_x__dag_dir_entries(&table, node, scratch_pool, + scratch_pool)); + + if (table->nelts) + context = svn_fs_x__id_create_context(root->fs, pool); + + /* Convert directory array to hash. */ + for (i = 0; i < table->nelts; ++i) + { + svn_fs_x__dirent_t *entry + = APR_ARRAY_IDX(table, i, svn_fs_x__dirent_t *); + apr_size_t len = strlen(entry->name); + + svn_fs_dirent_t *api_dirent = apr_pcalloc(pool, sizeof(*api_dirent)); + api_dirent->name = apr_pstrmemdup(pool, entry->name, len); + api_dirent->kind = entry->kind; + api_dirent->id = svn_fs_x__id_create(context, &entry->id, pool); + + apr_hash_set(hash, api_dirent->name, len, api_dirent); + } + + *table_p = hash; + svn_pool_destroy(scratch_pool); + + return SVN_NO_ERROR; +} + +static svn_error_t * +x_dir_optimal_order(apr_array_header_t **ordered_p, + svn_fs_root_t *root, + apr_hash_t *entries, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + *ordered_p = svn_fs_x__order_dir_entries(root->fs, entries, result_pool, + scratch_pool); + + return SVN_NO_ERROR; +} + +/* Create a new directory named PATH in ROOT. The new directory has + no entries, and no properties. ROOT must be the root of a + transaction, not a revision. Do any necessary temporary allocation + in SCRATCH_POOL. */ +static svn_error_t * +x_make_dir(svn_fs_root_t *root, + const char *path, + apr_pool_t *scratch_pool) +{ + parent_path_t *parent_path; + dag_node_t *sub_dir; + svn_fs_x__txn_id_t txn_id = root_txn_id(root); + apr_pool_t *subpool = svn_pool_create(scratch_pool); + + path = svn_fs__canonicalize_abspath(path, subpool); + SVN_ERR(open_path(&parent_path, root, path, open_path_last_optional, + TRUE, subpool)); + + /* Check (recursively) to see if some lock is 'reserving' a path at + that location, or even some child-path; if so, check that we can + use it. */ + if (root->txn_flags & SVN_FS_TXN_CHECK_LOCKS) + SVN_ERR(svn_fs_x__allow_locked_operation(path, root->fs, TRUE, FALSE, + subpool)); + + /* If there's already a sub-directory by that name, complain. This + also catches the case of trying to make a subdirectory named `/'. */ + if (parent_path->node) + return SVN_FS__ALREADY_EXISTS(root, path); + + /* Create the subdirectory. */ + SVN_ERR(make_path_mutable(root, parent_path->parent, path, subpool, + subpool)); + SVN_ERR(svn_fs_x__dag_make_dir(&sub_dir, + parent_path->parent->node, + parent_path_path(parent_path->parent, + subpool), + parent_path->entry, + txn_id, + subpool, subpool)); + + /* Add this directory to the path cache. */ + SVN_ERR(dag_node_cache_set(root, parent_path_path(parent_path, subpool), + sub_dir, subpool)); + + /* Make a record of this modification in the changes table. */ + SVN_ERR(add_change(root->fs, txn_id, path, svn_fs_x__dag_get_id(sub_dir), + svn_fs_path_change_add, FALSE, FALSE, FALSE, + svn_node_dir, SVN_INVALID_REVNUM, NULL, subpool)); + + svn_pool_destroy(subpool); + return SVN_NO_ERROR; +} + + +/* Delete the node at PATH under ROOT. ROOT must be a transaction + root. Perform temporary allocations in SCRATCH_POOL. 
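+   Deleting a node also subtracts its aggregated mergeinfo count from all
+   of its parent directories and drops the subtree from the DAG node cache.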
*/ +static svn_error_t * +x_delete_node(svn_fs_root_t *root, + const char *path, + apr_pool_t *scratch_pool) +{ + parent_path_t *parent_path; + svn_fs_x__txn_id_t txn_id; + apr_int64_t mergeinfo_count = 0; + svn_node_kind_t kind; + apr_pool_t *subpool = svn_pool_create(scratch_pool); + + if (! root->is_txn_root) + return SVN_FS__NOT_TXN(root); + + txn_id = root_txn_id(root); + path = svn_fs__canonicalize_abspath(path, subpool); + SVN_ERR(open_path(&parent_path, root, path, 0, TRUE, subpool)); + kind = svn_fs_x__dag_node_kind(parent_path->node); + + /* We can't remove the root of the filesystem. */ + if (! parent_path->parent) + return svn_error_create(SVN_ERR_FS_ROOT_DIR, NULL, + _("The root directory cannot be deleted")); + + /* Check to see if path (or any child thereof) is locked; if so, + check that we can use the existing lock(s). */ + if (root->txn_flags & SVN_FS_TXN_CHECK_LOCKS) + SVN_ERR(svn_fs_x__allow_locked_operation(path, root->fs, TRUE, FALSE, + subpool)); + + /* Make the parent directory mutable, and do the deletion. */ + SVN_ERR(make_path_mutable(root, parent_path->parent, path, subpool, + subpool)); + SVN_ERR(svn_fs_x__dag_get_mergeinfo_count(&mergeinfo_count, + parent_path->node)); + SVN_ERR(svn_fs_x__dag_delete(parent_path->parent->node, + parent_path->entry, + txn_id, subpool)); + + /* Remove this node and any children from the path cache. */ + SVN_ERR(dag_node_cache_invalidate(root, parent_path_path(parent_path, + subpool), + subpool)); + + /* Update mergeinfo counts for parents */ + if (mergeinfo_count > 0) + SVN_ERR(increment_mergeinfo_up_tree(parent_path->parent, + -mergeinfo_count, + subpool)); + + /* Make a record of this modification in the changes table. */ + SVN_ERR(add_change(root->fs, txn_id, path, + svn_fs_x__dag_get_id(parent_path->node), + svn_fs_path_change_delete, FALSE, FALSE, FALSE, kind, + SVN_INVALID_REVNUM, NULL, subpool)); + + svn_pool_destroy(subpool); + return SVN_NO_ERROR; +} + + +/* Set *SAME_P to TRUE if FS1 and FS2 have the same UUID, else set to FALSE. + Use SCRATCH_POOL for temporary allocation only. + Note: this code is duplicated between libsvn_fs_x and libsvn_fs_base. */ +static svn_error_t * +x_same_p(svn_boolean_t *same_p, + svn_fs_t *fs1, + svn_fs_t *fs2, + apr_pool_t *scratch_pool) +{ + *same_p = ! strcmp(fs1->uuid, fs2->uuid); + return SVN_NO_ERROR; +} + +/* Copy the node at FROM_PATH under FROM_ROOT to TO_PATH under + TO_ROOT. If PRESERVE_HISTORY is set, then the copy is recorded in + the copies table. Perform temporary allocations in SCRATCH_POOL. */ +static svn_error_t * +copy_helper(svn_fs_root_t *from_root, + const char *from_path, + svn_fs_root_t *to_root, + const char *to_path, + svn_boolean_t preserve_history, + apr_pool_t *scratch_pool) +{ + dag_node_t *from_node; + parent_path_t *to_parent_path; + svn_fs_x__txn_id_t txn_id = root_txn_id(to_root); + svn_boolean_t same_p; + + /* Use an error check, not an assert, because even the caller cannot + guarantee that a filesystem's UUID has not changed "on the fly". */ + SVN_ERR(x_same_p(&same_p, from_root->fs, to_root->fs, scratch_pool)); + if (! same_p) + return svn_error_createf + (SVN_ERR_UNSUPPORTED_FEATURE, NULL, + _("Cannot copy between two different filesystems ('%s' and '%s')"), + from_root->fs->path, to_root->fs->path); + + /* more things that we can't do ATM */ + if (from_root->is_txn_root) + return svn_error_create + (SVN_ERR_UNSUPPORTED_FEATURE, NULL, + _("Copy from mutable tree not currently supported")); + + if (! 
to_root->is_txn_root) + return svn_error_create + (SVN_ERR_UNSUPPORTED_FEATURE, NULL, + _("Copy immutable tree not supported")); + + /* Get the NODE for FROM_PATH in FROM_ROOT.*/ + SVN_ERR(get_dag(&from_node, from_root, from_path, scratch_pool)); + + /* Build up the parent path from TO_PATH in TO_ROOT. If the last + component does not exist, it's not that big a deal. We'll just + make one there. */ + SVN_ERR(open_path(&to_parent_path, to_root, to_path, + open_path_last_optional, TRUE, scratch_pool)); + + /* Check to see if path (or any child thereof) is locked; if so, + check that we can use the existing lock(s). */ + if (to_root->txn_flags & SVN_FS_TXN_CHECK_LOCKS) + SVN_ERR(svn_fs_x__allow_locked_operation(to_path, to_root->fs, + TRUE, FALSE, scratch_pool)); + + /* If the destination node already exists as the same node as the + source (in other words, this operation would result in nothing + happening at all), just do nothing an return successfully, + proud that you saved yourself from a tiresome task. */ + if (to_parent_path->node && + svn_fs_x__id_eq(svn_fs_x__dag_get_id(from_node), + svn_fs_x__dag_get_id(to_parent_path->node))) + return SVN_NO_ERROR; + + if (! from_root->is_txn_root) + { + svn_fs_path_change_kind_t kind; + dag_node_t *new_node; + const char *from_canonpath; + apr_int64_t mergeinfo_start; + apr_int64_t mergeinfo_end; + + /* If TO_PATH already existed prior to the copy, note that this + operation is a replacement, not an addition. */ + if (to_parent_path->node) + { + kind = svn_fs_path_change_replace; + SVN_ERR(svn_fs_x__dag_get_mergeinfo_count(&mergeinfo_start, + to_parent_path->node)); + } + else + { + kind = svn_fs_path_change_add; + mergeinfo_start = 0; + } + + SVN_ERR(svn_fs_x__dag_get_mergeinfo_count(&mergeinfo_end, from_node)); + + /* Make sure the target node's parents are mutable. */ + SVN_ERR(make_path_mutable(to_root, to_parent_path->parent, + to_path, scratch_pool, scratch_pool)); + + /* Canonicalize the copyfrom path. */ + from_canonpath = svn_fs__canonicalize_abspath(from_path, scratch_pool); + + SVN_ERR(svn_fs_x__dag_copy(to_parent_path->parent->node, + to_parent_path->entry, + from_node, + preserve_history, + from_root->rev, + from_canonpath, + txn_id, scratch_pool)); + + if (kind != svn_fs_path_change_add) + SVN_ERR(dag_node_cache_invalidate(to_root, + parent_path_path(to_parent_path, + scratch_pool), + scratch_pool)); + + if (mergeinfo_start != mergeinfo_end) + SVN_ERR(increment_mergeinfo_up_tree(to_parent_path->parent, + mergeinfo_end - mergeinfo_start, + scratch_pool)); + + /* Make a record of this modification in the changes table. */ + SVN_ERR(get_dag(&new_node, to_root, to_path, scratch_pool)); + SVN_ERR(add_change(to_root->fs, txn_id, to_path, + svn_fs_x__dag_get_id(new_node), kind, FALSE, + FALSE, FALSE, svn_fs_x__dag_node_kind(from_node), + from_root->rev, from_canonpath, scratch_pool)); + } + else + { + /* See IZ Issue #436 */ + /* Copying from transaction roots not currently available. + + ### cmpilato todo someday: make this not so. :-) Note that + when copying from mutable trees, you have to make sure that + you aren't creating a cyclic graph filesystem, and a simple + referencing operation won't cut it. Currently, we should not + be able to reach this clause, and the interface reports that + this only works from immutable trees anyway, but JimB has + stated that this requirement need not be necessary in the + future. 
*/ + + SVN_ERR_MALFUNCTION(); + } + + return SVN_NO_ERROR; +} + + +/* Create a copy of FROM_PATH in FROM_ROOT named TO_PATH in TO_ROOT. + If FROM_PATH is a directory, copy it recursively. Temporary + allocations are from SCRATCH_POOL.*/ +static svn_error_t * +x_copy(svn_fs_root_t *from_root, + const char *from_path, + svn_fs_root_t *to_root, + const char *to_path, + apr_pool_t *scratch_pool) +{ + apr_pool_t *subpool = svn_pool_create(scratch_pool); + + SVN_ERR(copy_helper(from_root, + svn_fs__canonicalize_abspath(from_path, subpool), + to_root, + svn_fs__canonicalize_abspath(to_path, subpool), + TRUE, subpool)); + + svn_pool_destroy(subpool); + + return SVN_NO_ERROR; +} + + +/* Create a copy of FROM_PATH in FROM_ROOT named TO_PATH in TO_ROOT. + If FROM_PATH is a directory, copy it recursively. No history is + preserved. Temporary allocations are from SCRATCH_POOL. */ +static svn_error_t * +x_revision_link(svn_fs_root_t *from_root, + svn_fs_root_t *to_root, + const char *path, + apr_pool_t *scratch_pool) +{ + apr_pool_t *subpool; + + if (! to_root->is_txn_root) + return SVN_FS__NOT_TXN(to_root); + + subpool = svn_pool_create(scratch_pool); + + path = svn_fs__canonicalize_abspath(path, subpool); + SVN_ERR(copy_helper(from_root, path, to_root, path, FALSE, subpool)); + + svn_pool_destroy(subpool); + + return SVN_NO_ERROR; +} + + +/* Discover the copy ancestry of PATH under ROOT. Return a relevant + ancestor/revision combination in *PATH_P and *REV_P. Temporary + allocations are in POOL. */ +static svn_error_t * +x_copied_from(svn_revnum_t *rev_p, + const char **path_p, + svn_fs_root_t *root, + const char *path, + apr_pool_t *pool) +{ + dag_node_t *node; + + /* There is no cached entry, look it up the old-fashioned + way. */ + SVN_ERR(get_dag(&node, root, path, pool)); + SVN_ERR(svn_fs_x__dag_get_copyfrom_rev(rev_p, node)); + SVN_ERR(svn_fs_x__dag_get_copyfrom_path(path_p, node)); + + return SVN_NO_ERROR; +} + + + +/* Files. */ + +/* Create the empty file PATH under ROOT. Temporary allocations are + in SCRATCH_POOL. */ +static svn_error_t * +x_make_file(svn_fs_root_t *root, + const char *path, + apr_pool_t *scratch_pool) +{ + parent_path_t *parent_path; + dag_node_t *child; + svn_fs_x__txn_id_t txn_id = root_txn_id(root); + apr_pool_t *subpool = svn_pool_create(scratch_pool); + + path = svn_fs__canonicalize_abspath(path, subpool); + SVN_ERR(open_path(&parent_path, root, path, open_path_last_optional, + TRUE, subpool)); + + /* If there's already a file by that name, complain. + This also catches the case of trying to make a file named `/'. */ + if (parent_path->node) + return SVN_FS__ALREADY_EXISTS(root, path); + + /* Check (non-recursively) to see if path is locked; if so, check + that we can use it. */ + if (root->txn_flags & SVN_FS_TXN_CHECK_LOCKS) + SVN_ERR(svn_fs_x__allow_locked_operation(path, root->fs, FALSE, FALSE, + subpool)); + + /* Create the file. */ + SVN_ERR(make_path_mutable(root, parent_path->parent, path, subpool, + subpool)); + SVN_ERR(svn_fs_x__dag_make_file(&child, + parent_path->parent->node, + parent_path_path(parent_path->parent, + subpool), + parent_path->entry, + txn_id, + subpool, subpool)); + + /* Add this file to the path cache. */ + SVN_ERR(dag_node_cache_set(root, parent_path_path(parent_path, subpool), + child, subpool)); + + /* Make a record of this modification in the changes table. 
*/ + SVN_ERR(add_change(root->fs, txn_id, path, svn_fs_x__dag_get_id(child), + svn_fs_path_change_add, TRUE, FALSE, FALSE, + svn_node_file, SVN_INVALID_REVNUM, NULL, subpool)); + + svn_pool_destroy(subpool); + return SVN_NO_ERROR; +} + + +/* Set *LENGTH_P to the size of the file PATH under ROOT. Temporary + allocations are in SCRATCH_POOL. */ +static svn_error_t * +x_file_length(svn_filesize_t *length_p, + svn_fs_root_t *root, + const char *path, + apr_pool_t *scratch_pool) +{ + dag_node_t *file; + + /* First create a dag_node_t from the root/path pair. */ + SVN_ERR(get_dag(&file, root, path, scratch_pool)); + + /* Now fetch its length */ + return svn_fs_x__dag_file_length(length_p, file); +} + + +/* Set *CHECKSUM to the checksum of type KIND for PATH under ROOT, or + NULL if that information isn't available. Temporary allocations + are from POOL. */ +static svn_error_t * +x_file_checksum(svn_checksum_t **checksum, + svn_checksum_kind_t kind, + svn_fs_root_t *root, + const char *path, + apr_pool_t *pool) +{ + dag_node_t *file; + + SVN_ERR(get_dag(&file, root, path, pool)); + return svn_fs_x__dag_file_checksum(checksum, file, kind, pool); +} + + +/* --- Machinery for svn_fs_file_contents() --- */ + +/* Set *CONTENTS to a readable stream that will return the contents of + PATH under ROOT. The stream is allocated in POOL. */ +static svn_error_t * +x_file_contents(svn_stream_t **contents, + svn_fs_root_t *root, + const char *path, + apr_pool_t *pool) +{ + dag_node_t *node; + svn_stream_t *file_stream; + + /* First create a dag_node_t from the root/path pair. */ + SVN_ERR(get_dag(&node, root, path, pool)); + + /* Then create a readable stream from the dag_node_t. */ + SVN_ERR(svn_fs_x__dag_get_contents(&file_stream, node, pool)); + + *contents = file_stream; + return SVN_NO_ERROR; +} + +/* --- End machinery for svn_fs_file_contents() --- */ + + +/* --- Machinery for svn_fs_try_process_file_contents() --- */ + +static svn_error_t * +x_try_process_file_contents(svn_boolean_t *success, + svn_fs_root_t *root, + const char *path, + svn_fs_process_contents_func_t processor, + void* baton, + apr_pool_t *pool) +{ + dag_node_t *node; + SVN_ERR(get_dag(&node, root, path, pool)); + + return svn_fs_x__dag_try_process_file_contents(success, node, + processor, baton, pool); +} + +/* --- End machinery for svn_fs_try_process_file_contents() --- */ + + +/* --- Machinery for svn_fs_apply_textdelta() --- */ + + +/* Local baton type for all the helper functions below. */ +typedef struct txdelta_baton_t +{ + /* This is the custom-built window consumer given to us by the delta + library; it uniquely knows how to read data from our designated + "source" stream, interpret the window, and write data to our + designated "target" stream (in this case, our repos file.) */ + svn_txdelta_window_handler_t interpreter; + void *interpreter_baton; + + /* The original file info */ + svn_fs_root_t *root; + const char *path; + + /* Derived from the file info */ + dag_node_t *node; + + svn_stream_t *source_stream; + svn_stream_t *target_stream; + + /* MD5 digest for the base text against which a delta is to be + applied, and for the resultant fulltext, respectively. Either or + both may be null, in which case ignored. */ + svn_checksum_t *base_checksum; + svn_checksum_t *result_checksum; + + /* Pool used by db txns */ + apr_pool_t *pool; + +} txdelta_baton_t; + + +/* The main window handler returned by svn_fs_apply_textdelta. 
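+   It forwards each window to the delta interpreter and, once the final
+   NULL window arrives, finalizes the edit against RESULT_CHECKSUM.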
*/ +static svn_error_t * +window_consumer(svn_txdelta_window_t *window, void *baton) +{ + txdelta_baton_t *tb = (txdelta_baton_t *) baton; + + /* Send the window right through to the custom window interpreter. + In theory, the interpreter will then write more data to + cb->target_string. */ + SVN_ERR(tb->interpreter(window, tb->interpreter_baton)); + + /* Is the window NULL? If so, we're done. The stream has already been + closed by the interpreter. */ + if (! window) + SVN_ERR(svn_fs_x__dag_finalize_edits(tb->node, tb->result_checksum, + tb->pool)); + + return SVN_NO_ERROR; +} + +/* Helper function for fs_apply_textdelta. BATON is of type + txdelta_baton_t. */ +static svn_error_t * +apply_textdelta(void *baton, + apr_pool_t *scratch_pool) +{ + txdelta_baton_t *tb = (txdelta_baton_t *) baton; + parent_path_t *parent_path; + svn_fs_x__txn_id_t txn_id = root_txn_id(tb->root); + + /* Call open_path with no flags, as we want this to return an error + if the node for which we are searching doesn't exist. */ + SVN_ERR(open_path(&parent_path, tb->root, tb->path, 0, TRUE, scratch_pool)); + + /* Check (non-recursively) to see if path is locked; if so, check + that we can use it. */ + if (tb->root->txn_flags & SVN_FS_TXN_CHECK_LOCKS) + SVN_ERR(svn_fs_x__allow_locked_operation(tb->path, tb->root->fs, + FALSE, FALSE, scratch_pool)); + + /* Now, make sure this path is mutable. */ + SVN_ERR(make_path_mutable(tb->root, parent_path, tb->path, scratch_pool, + scratch_pool)); + tb->node = svn_fs_x__dag_dup(parent_path->node, tb->pool); + + if (tb->base_checksum) + { + svn_checksum_t *checksum; + + /* Until we finalize the node, its data_key points to the old + contents, in other words, the base text. */ + SVN_ERR(svn_fs_x__dag_file_checksum(&checksum, tb->node, + tb->base_checksum->kind, + scratch_pool)); + if (!svn_checksum_match(tb->base_checksum, checksum)) + return svn_checksum_mismatch_err(tb->base_checksum, checksum, + scratch_pool, + _("Base checksum mismatch on '%s'"), + tb->path); + } + + /* Make a readable "source" stream out of the current contents of + ROOT/PATH; obviously, this must done in the context of a db_txn. + The stream is returned in tb->source_stream. */ + SVN_ERR(svn_fs_x__dag_get_contents(&(tb->source_stream), + tb->node, tb->pool)); + + /* Make a writable "target" stream */ + SVN_ERR(svn_fs_x__dag_get_edit_stream(&(tb->target_stream), tb->node, + tb->pool)); + + /* Now, create a custom window handler that uses our two streams. */ + svn_txdelta_apply(tb->source_stream, + tb->target_stream, + NULL, + tb->path, + tb->pool, + &(tb->interpreter), + &(tb->interpreter_baton)); + + /* Make a record of this modification in the changes table. */ + return add_change(tb->root->fs, txn_id, tb->path, + svn_fs_x__dag_get_id(tb->node), + svn_fs_path_change_modify, TRUE, FALSE, FALSE, + svn_node_file, SVN_INVALID_REVNUM, NULL, scratch_pool); +} + + +/* Set *CONTENTS_P and *CONTENTS_BATON_P to a window handler and baton + that will accept text delta windows to modify the contents of PATH + under ROOT. Allocations are in POOL. 
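+
+   For illustration only (txn_root, new_contents and pool stand in for the
+   caller's objects): a typical caller obtains the handler/baton pair via
+   the public svn_fs_apply_textdelta() API and then pushes delta windows
+   at it, e.g. by sending a plain fulltext, with the optional checksum
+   arguments omitted here:
+
+     svn_txdelta_window_handler_t handler;
+     void *handler_baton;
+
+     SVN_ERR(svn_fs_apply_textdelta(&handler, &handler_baton,
+                                    txn_root, "/trunk/foo.c",
+                                    NULL, NULL, pool));
+     SVN_ERR(svn_txdelta_send_string(new_contents, handler,
+                                     handler_baton, pool));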
*/ +static svn_error_t * +x_apply_textdelta(svn_txdelta_window_handler_t *contents_p, + void **contents_baton_p, + svn_fs_root_t *root, + const char *path, + svn_checksum_t *base_checksum, + svn_checksum_t *result_checksum, + apr_pool_t *pool) +{ + apr_pool_t *scratch_pool = svn_pool_create(pool); + txdelta_baton_t *tb = apr_pcalloc(pool, sizeof(*tb)); + + tb->root = root; + tb->path = svn_fs__canonicalize_abspath(path, pool); + tb->pool = pool; + tb->base_checksum = svn_checksum_dup(base_checksum, pool); + tb->result_checksum = svn_checksum_dup(result_checksum, pool); + + SVN_ERR(apply_textdelta(tb, scratch_pool)); + + *contents_p = window_consumer; + *contents_baton_p = tb; + + svn_pool_destroy(scratch_pool); + return SVN_NO_ERROR; +} + +/* --- End machinery for svn_fs_apply_textdelta() --- */ + +/* --- Machinery for svn_fs_apply_text() --- */ + +/* Baton for svn_fs_apply_text(). */ +typedef struct text_baton_t +{ + /* The original file info */ + svn_fs_root_t *root; + const char *path; + + /* Derived from the file info */ + dag_node_t *node; + + /* The returned stream that will accept the file's new contents. */ + svn_stream_t *stream; + + /* The actual fs stream that the returned stream will write to. */ + svn_stream_t *file_stream; + + /* MD5 digest for the final fulltext written to the file. May + be null, in which case ignored. */ + svn_checksum_t *result_checksum; + + /* Pool used by db txns */ + apr_pool_t *pool; +} text_baton_t; + + +/* A wrapper around svn_fs_x__dag_finalize_edits, but for + * fulltext data, not text deltas. Closes BATON->file_stream. + * + * Note: If you're confused about how this function relates to another + * of similar name, think of it this way: + * + * svn_fs_apply_textdelta() ==> ... ==> txn_body_txdelta_finalize_edits() + * svn_fs_apply_text() ==> ... ==> txn_body_fulltext_finalize_edits() + */ + +/* Write function for the publically returned stream. */ +static svn_error_t * +text_stream_writer(void *baton, + const char *data, + apr_size_t *len) +{ + text_baton_t *tb = baton; + + /* Psst, here's some data. Pass it on to the -real- file stream. */ + return svn_stream_write(tb->file_stream, data, len); +} + +/* Close function for the publically returned stream. */ +static svn_error_t * +text_stream_closer(void *baton) +{ + text_baton_t *tb = baton; + + /* Close the internal-use stream. ### This used to be inside of + txn_body_fulltext_finalize_edits(), but that invoked a nested + Berkeley DB transaction -- scandalous! */ + SVN_ERR(svn_stream_close(tb->file_stream)); + + /* Need to tell fs that we're done sending text */ + return svn_fs_x__dag_finalize_edits(tb->node, tb->result_checksum, + tb->pool); +} + + +/* Helper function for fs_apply_text. BATON is of type + text_baton_t. */ +static svn_error_t * +apply_text(void *baton, + apr_pool_t *scratch_pool) +{ + text_baton_t *tb = baton; + parent_path_t *parent_path; + svn_fs_x__txn_id_t txn_id = root_txn_id(tb->root); + + /* Call open_path with no flags, as we want this to return an error + if the node for which we are searching doesn't exist. */ + SVN_ERR(open_path(&parent_path, tb->root, tb->path, 0, TRUE, scratch_pool)); + + /* Check (non-recursively) to see if path is locked; if so, check + that we can use it. */ + if (tb->root->txn_flags & SVN_FS_TXN_CHECK_LOCKS) + SVN_ERR(svn_fs_x__allow_locked_operation(tb->path, tb->root->fs, + FALSE, FALSE, scratch_pool)); + + /* Now, make sure this path is mutable. 
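+     The node is duplicated into TB->POOL below so that it stays valid for
+     the lifetime of the returned stream, i.e. after SCRATCH_POOL is gone.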
*/ + SVN_ERR(make_path_mutable(tb->root, parent_path, tb->path, scratch_pool, + scratch_pool)); + tb->node = svn_fs_x__dag_dup(parent_path->node, tb->pool); + + /* Make a writable stream for replacing the file's text. */ + SVN_ERR(svn_fs_x__dag_get_edit_stream(&(tb->file_stream), tb->node, + tb->pool)); + + /* Create a 'returnable' stream which writes to the file_stream. */ + tb->stream = svn_stream_create(tb, tb->pool); + svn_stream_set_write(tb->stream, text_stream_writer); + svn_stream_set_close(tb->stream, text_stream_closer); + + /* Make a record of this modification in the changes table. */ + return add_change(tb->root->fs, txn_id, tb->path, + svn_fs_x__dag_get_id(tb->node), + svn_fs_path_change_modify, TRUE, FALSE, FALSE, + svn_node_file, SVN_INVALID_REVNUM, NULL, scratch_pool); +} + + +/* Return a writable stream that will set the contents of PATH under + ROOT. RESULT_CHECKSUM is the MD5 checksum of the final result. + Temporary allocations are in POOL. */ +static svn_error_t * +x_apply_text(svn_stream_t **contents_p, + svn_fs_root_t *root, + const char *path, + svn_checksum_t *result_checksum, + apr_pool_t *pool) +{ + apr_pool_t *scratch_pool = svn_pool_create(pool); + text_baton_t *tb = apr_pcalloc(pool, sizeof(*tb)); + + tb->root = root; + tb->path = svn_fs__canonicalize_abspath(path, pool); + tb->pool = pool; + tb->result_checksum = svn_checksum_dup(result_checksum, pool); + + SVN_ERR(apply_text(tb, scratch_pool)); + + *contents_p = tb->stream; + + svn_pool_destroy(scratch_pool); + return SVN_NO_ERROR; +} + +/* --- End machinery for svn_fs_apply_text() --- */ + + +/* Check if the contents of PATH1 under ROOT1 are different from the + contents of PATH2 under ROOT2. If they are different set + *CHANGED_P to TRUE, otherwise set it to FALSE. */ +static svn_error_t * +x_contents_changed(svn_boolean_t *changed_p, + svn_fs_root_t *root1, + const char *path1, + svn_fs_root_t *root2, + const char *path2, + svn_boolean_t strict, + apr_pool_t *scratch_pool) +{ + dag_node_t *node1, *node2; + apr_pool_t *subpool = svn_pool_create(scratch_pool); + + /* Check that roots are in the same fs. */ + if (root1->fs != root2->fs) + return svn_error_create + (SVN_ERR_FS_GENERAL, NULL, + _("Cannot compare file contents between two different filesystems")); + + /* Check that both paths are files. */ + { + svn_node_kind_t kind; + + SVN_ERR(svn_fs_x__check_path(&kind, root1, path1, subpool)); + if (kind != svn_node_file) + return svn_error_createf + (SVN_ERR_FS_GENERAL, NULL, _("'%s' is not a file"), path1); + + SVN_ERR(svn_fs_x__check_path(&kind, root2, path2, subpool)); + if (kind != svn_node_file) + return svn_error_createf + (SVN_ERR_FS_GENERAL, NULL, _("'%s' is not a file"), path2); + } + + SVN_ERR(get_dag(&node1, root1, path1, subpool)); + SVN_ERR(get_dag(&node2, root2, path2, subpool)); + SVN_ERR(svn_fs_x__dag_things_different(NULL, changed_p, node1, node2, + strict, subpool)); + + svn_pool_destroy(subpool); + return SVN_NO_ERROR; +} + + + +/* Public interface to computing file text deltas. 
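+
+   This backs the public svn_fs_get_file_delta_stream() API. For
+   illustration only (old_root, new_root, out_stream and pool are
+   placeholders), turning the returned delta stream into svndiff data
+   might look like:
+
+     svn_txdelta_stream_t *delta_stream;
+     svn_txdelta_window_handler_t handler;
+     void *handler_baton;
+
+     SVN_ERR(svn_fs_get_file_delta_stream(&delta_stream,
+                                          old_root, "/trunk/foo.c",
+                                          new_root, "/trunk/foo.c", pool));
+     svn_txdelta_to_svndiff3(&handler, &handler_baton, out_stream,
+                             1, SVN_DELTA_COMPRESSION_LEVEL_DEFAULT, pool);
+     SVN_ERR(svn_txdelta_send_txstream(delta_stream, handler,
+                                       handler_baton, pool));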
*/ + +static svn_error_t * +x_get_file_delta_stream(svn_txdelta_stream_t **stream_p, + svn_fs_root_t *source_root, + const char *source_path, + svn_fs_root_t *target_root, + const char *target_path, + apr_pool_t *pool) +{ + dag_node_t *source_node, *target_node; + apr_pool_t *scratch_pool = svn_pool_create(pool); + + if (source_root && source_path) + SVN_ERR(get_dag(&source_node, source_root, source_path, scratch_pool)); + else + source_node = NULL; + SVN_ERR(get_dag(&target_node, target_root, target_path, scratch_pool)); + + /* Create a delta stream that turns the source into the target. */ + SVN_ERR(svn_fs_x__dag_get_file_delta_stream(stream_p, source_node, + target_node, pool, + scratch_pool)); + + svn_pool_destroy(scratch_pool); + return SVN_NO_ERROR; +} + + + +/* Finding Changes */ + +/* Copy CHANGE into a FS API object allocated in RESULT_POOL and return + it in *RESULT_P. Pass CONTEXT to the ID API object being created. */ +static svn_error_t * +construct_fs_path_change(svn_fs_path_change2_t **result_p, + svn_fs_x__id_context_t *context, + svn_fs_x__change_t *change, + apr_pool_t *result_pool) +{ + const svn_fs_id_t *id + = svn_fs_x__id_create(context, &change->noderev_id, result_pool); + svn_fs_path_change2_t *result + = svn_fs__path_change_create_internal(id, change->change_kind, + result_pool); + + result->text_mod = change->text_mod; + result->prop_mod = change->prop_mod; + result->node_kind = change->node_kind; + + result->copyfrom_known = change->copyfrom_known; + result->copyfrom_rev = change->copyfrom_rev; + result->copyfrom_path = change->copyfrom_path; + + result->mergeinfo_mod = change->mergeinfo_mod; + + *result_p = result; + + return SVN_NO_ERROR; +} + +/* Set *CHANGED_PATHS_P to a newly allocated hash containing + descriptions of the paths changed under ROOT. The hash is keyed + with const char * paths and has svn_fs_path_change2_t * values. Use + POOL for all allocations. */ +static svn_error_t * +x_paths_changed(apr_hash_t **changed_paths_p, + svn_fs_root_t *root, + apr_pool_t *pool) +{ + apr_hash_t *changed_paths; + svn_fs_path_change2_t *path_change; + svn_fs_x__id_context_t *context + = svn_fs_x__id_create_context(root->fs, pool); + + if (root->is_txn_root) + { + apr_hash_index_t *hi; + SVN_ERR(svn_fs_x__txn_changes_fetch(&changed_paths, root->fs, + root_txn_id(root), pool)); + for (hi = apr_hash_first(pool, changed_paths); + hi; + hi = apr_hash_next(hi)) + { + svn_fs_x__change_t *change = apr_hash_this_val(hi); + SVN_ERR(construct_fs_path_change(&path_change, context, change, + pool)); + apr_hash_set(changed_paths, + apr_hash_this_key(hi), apr_hash_this_key_len(hi), + path_change); + } + } + else + { + apr_array_header_t *changes; + int i; + + SVN_ERR(svn_fs_x__get_changes(&changes, root->fs, root->rev, pool)); + + changed_paths = svn_hash__make(pool); + for (i = 0; i < changes->nelts; ++i) + { + svn_fs_x__change_t *change = APR_ARRAY_IDX(changes, i, + svn_fs_x__change_t *); + SVN_ERR(construct_fs_path_change(&path_change, context, change, + pool)); + apr_hash_set(changed_paths, change->path.data, change->path.len, + path_change); + } + } + + *changed_paths_p = changed_paths; + + return SVN_NO_ERROR; +} + + + +/* Our coolio opaque history object. */ +typedef struct fs_history_data_t +{ + /* filesystem object */ + svn_fs_t *fs; + + /* path and revision of historical location */ + const char *path; + svn_revnum_t revision; + + /* internal-use hints about where to resume the history search. 
*/ + const char *path_hint; + svn_revnum_t rev_hint; + + /* FALSE until the first call to svn_fs_history_prev(). */ + svn_boolean_t is_interesting; +} fs_history_data_t; + +static svn_fs_history_t * +assemble_history(svn_fs_t *fs, + const char *path, + svn_revnum_t revision, + svn_boolean_t is_interesting, + const char *path_hint, + svn_revnum_t rev_hint, + apr_pool_t *result_pool); + + +/* Set *HISTORY_P to an opaque node history object which represents + PATH under ROOT. ROOT must be a revision root. Use POOL for all + allocations. */ +static svn_error_t * +x_node_history(svn_fs_history_t **history_p, + svn_fs_root_t *root, + const char *path, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_node_kind_t kind; + + /* We require a revision root. */ + if (root->is_txn_root) + return svn_error_create(SVN_ERR_FS_NOT_REVISION_ROOT, NULL, NULL); + + /* And we require that the path exist in the root. */ + SVN_ERR(svn_fs_x__check_path(&kind, root, path, scratch_pool)); + if (kind == svn_node_none) + return SVN_FS__NOT_FOUND(root, path); + + /* Okay, all seems well. Build our history object and return it. */ + *history_p = assemble_history(root->fs, path, root->rev, FALSE, NULL, + SVN_INVALID_REVNUM, result_pool); + return SVN_NO_ERROR; +} + +/* Find the youngest copyroot for path PARENT_PATH or its parents in + filesystem FS, and store the copyroot in *REV_P and *PATH_P. */ +static svn_error_t * +find_youngest_copyroot(svn_revnum_t *rev_p, + const char **path_p, + svn_fs_t *fs, + parent_path_t *parent_path) +{ + svn_revnum_t rev_mine; + svn_revnum_t rev_parent = SVN_INVALID_REVNUM; + const char *path_mine; + const char *path_parent = NULL; + + /* First find our parent's youngest copyroot. */ + if (parent_path->parent) + SVN_ERR(find_youngest_copyroot(&rev_parent, &path_parent, fs, + parent_path->parent)); + + /* Find our copyroot. */ + SVN_ERR(svn_fs_x__dag_get_copyroot(&rev_mine, &path_mine, + parent_path->node)); + + /* If a parent and child were copied to in the same revision, prefer + the child copy target, since it is the copy relevant to the + history of the child. */ + if (rev_mine >= rev_parent) + { + *rev_p = rev_mine; + *path_p = path_mine; + } + else + { + *rev_p = rev_parent; + *path_p = path_parent; + } + + return SVN_NO_ERROR; +} + + +static svn_error_t * +x_closest_copy(svn_fs_root_t **root_p, + const char **path_p, + svn_fs_root_t *root, + const char *path, + apr_pool_t *pool) +{ + svn_fs_t *fs = root->fs; + parent_path_t *parent_path, *copy_dst_parent_path; + svn_revnum_t copy_dst_rev, created_rev; + const char *copy_dst_path; + svn_fs_root_t *copy_dst_root; + dag_node_t *copy_dst_node; + svn_boolean_t related; + apr_pool_t *scratch_pool = svn_pool_create(pool); + + /* Initialize return values. */ + *root_p = NULL; + *path_p = NULL; + + path = svn_fs__canonicalize_abspath(path, scratch_pool); + SVN_ERR(open_path(&parent_path, root, path, 0, FALSE, scratch_pool)); + + /* Find the youngest copyroot in the path of this node-rev, which + will indicate the target of the innermost copy affecting the + node-rev. */ + SVN_ERR(find_youngest_copyroot(©_dst_rev, ©_dst_path, + fs, parent_path)); + if (copy_dst_rev == 0) /* There are no copies affecting this node-rev. */ + { + svn_pool_destroy(scratch_pool); + return SVN_NO_ERROR; + } + + /* It is possible that this node was created from scratch at some + revision between COPY_DST_REV and REV. Make sure that PATH + exists as of COPY_DST_REV and is related to this node-rev. 
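
find_youngest_copyroot() above settles ties by preferring the child's copyroot whenever its revision is at least as recent as the parent's. A standalone sketch of that walk over a toy parent-path chain; the toy_parent_path_t type and the -1 stand-in for SVN_INVALID_REVNUM are assumptions of this illustration.

#include <stdio.h>

/* Toy parent-path entry: each node knows its own copyroot revision/path. */
typedef struct toy_parent_path_t
{
  struct toy_parent_path_t *parent;  /* NULL for the root directory */
  long copyroot_rev;
  const char *copyroot_path;
} toy_parent_path_t;

/* Resolve the parent chain first, then prefer the child's copyroot whenever
   its revision is at least as young as the parent's, mirroring the
   "prefer the child copy target" rule described above. */
static void
youngest_copyroot(const toy_parent_path_t *pp, long *rev, const char **path)
{
  long parent_rev = -1;              /* stands in for SVN_INVALID_REVNUM */
  const char *parent_path = NULL;

  if (pp->parent)
    youngest_copyroot(pp->parent, &parent_rev, &parent_path);

  if (pp->copyroot_rev >= parent_rev)
    {
      *rev = pp->copyroot_rev;
      *path = pp->copyroot_path;
    }
  else
    {
      *rev = parent_rev;
      *path = parent_path;
    }
}

int
main(void)
{
  toy_parent_path_t root = { NULL, 0, "/" };
  toy_parent_path_t dir  = { &root, 7, "/branches/b" };
  toy_parent_path_t file = { &dir, 7, "/branches/b/f" };
  long rev;
  const char *path;

  youngest_copyroot(&file, &rev, &path);
  printf("copyroot: %s@%ld\n", path, rev);   /* -> /branches/b/f@7 */
  return 0;
}
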
*/ + SVN_ERR(svn_fs_x__revision_root(©_dst_root, fs, copy_dst_rev, pool)); + SVN_ERR(open_path(©_dst_parent_path, copy_dst_root, path, + open_path_node_only | open_path_allow_null, FALSE, + scratch_pool)); + if (copy_dst_parent_path == NULL) + { + svn_pool_destroy(scratch_pool); + return SVN_NO_ERROR; + } + + copy_dst_node = copy_dst_parent_path->node; + SVN_ERR(svn_fs_x__dag_related_node(&related, copy_dst_node, + parent_path->node)); + if (!related) + { + svn_pool_destroy(scratch_pool); + return SVN_NO_ERROR; + } + + /* One final check must be done here. If you copy a directory and + create a new entity somewhere beneath that directory in the same + txn, then we can't claim that the copy affected the new entity. + For example, if you do: + + copy dir1 dir2 + create dir2/new-thing + commit + + then dir2/new-thing was not affected by the copy of dir1 to dir2. + We detect this situation by asking if PATH@COPY_DST_REV's + created-rev is COPY_DST_REV, and that node-revision has no + predecessors, then there is no relevant closest copy. + */ + created_rev = svn_fs_x__dag_get_revision(copy_dst_node); + if (created_rev == copy_dst_rev) + { + svn_fs_x__id_t pred; + SVN_ERR(svn_fs_x__dag_get_predecessor_id(&pred, copy_dst_node)); + if (!svn_fs_x__id_used(&pred)) + { + svn_pool_destroy(scratch_pool); + return SVN_NO_ERROR; + } + } + + /* The copy destination checks out. Return it. */ + *root_p = copy_dst_root; + *path_p = apr_pstrdup(pool, copy_dst_path); + + svn_pool_destroy(scratch_pool); + return SVN_NO_ERROR; +} + + +static svn_error_t * +x_node_origin_rev(svn_revnum_t *revision, + svn_fs_root_t *root, + const char *path, + apr_pool_t *scratch_pool) +{ + svn_fs_x__id_t node_id; + dag_node_t *node; + + path = svn_fs__canonicalize_abspath(path, scratch_pool); + + SVN_ERR(get_dag(&node, root, path, scratch_pool)); + SVN_ERR(svn_fs_x__dag_get_node_id(&node_id, node)); + + *revision = svn_fs_x__get_revnum(node_id.change_set); + + return SVN_NO_ERROR; +} + + +static svn_error_t * +history_prev(svn_fs_history_t **prev_history, + svn_fs_history_t *history, + svn_boolean_t cross_copies, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + fs_history_data_t *fhd = history->fsap_data; + const char *commit_path, *src_path, *path = fhd->path; + svn_revnum_t commit_rev, src_rev, dst_rev; + svn_revnum_t revision = fhd->revision; + svn_fs_t *fs = fhd->fs; + parent_path_t *parent_path; + dag_node_t *node; + svn_fs_root_t *root; + svn_boolean_t reported = fhd->is_interesting; + svn_revnum_t copyroot_rev; + const char *copyroot_path; + + /* Initialize our return value. */ + *prev_history = NULL; + + /* If our last history report left us hints about where to pickup + the chase, then our last report was on the destination of a + copy. If we are crossing copies, start from those locations, + otherwise, we're all done here. */ + if (fhd->path_hint && SVN_IS_VALID_REVNUM(fhd->rev_hint)) + { + reported = FALSE; + if (! cross_copies) + return SVN_NO_ERROR; + path = fhd->path_hint; + revision = fhd->rev_hint; + } + + /* Construct a ROOT for the current revision. */ + SVN_ERR(svn_fs_x__revision_root(&root, fs, revision, scratch_pool)); + + /* Open PATH/REVISION, and get its node and a bunch of other + goodies. 
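
The "created from scratch inside the copied tree" test above reduces to a small predicate: a node whose node-revision was created in the copy destination revision and that has no predecessor was not brought there by the copy. A hedged sketch of just that decision; copy_is_relevant is an invented name.

#include <stdio.h>

/* Decide whether the innermost copy at COPY_DST_REV is relevant for a node
   whose node-revision was created in CREATED_REV.  A node that first came
   into existence in the copy's own revision and has no predecessor was
   created from scratch after the copy, not carried along by it. */
static int
copy_is_relevant(long created_rev, long copy_dst_rev, int has_predecessor)
{
  if (created_rev == copy_dst_rev && !has_predecessor)
    return 0;
  return 1;
}

int
main(void)
{
  /* dir2/new-thing created in the same txn as the dir1 -> dir2 copy: */
  printf("%d\n", copy_is_relevant(42, 42, 0));   /* 0: no relevant copy */
  /* dir2/old-file that already existed in dir1 before the copy: */
  printf("%d\n", copy_is_relevant(40, 42, 1));   /* 1: copy is relevant */
  return 0;
}
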
*/ + SVN_ERR(open_path(&parent_path, root, path, 0, FALSE, scratch_pool)); + node = parent_path->node; + commit_path = svn_fs_x__dag_get_created_path(node); + commit_rev = svn_fs_x__dag_get_revision(node); + + /* The Subversion filesystem is written in such a way that a given + line of history may have at most one interesting history point + per filesystem revision. Either that node was edited (and + possibly copied), or it was copied but not edited. And a copy + source cannot be from the same revision as its destination. So, + if our history revision matches its node's commit revision, we + know that ... */ + if (revision == commit_rev) + { + if (! reported) + { + /* ... we either have not yet reported on this revision (and + need now to do so) ... */ + *prev_history = assemble_history(fs, commit_path, + commit_rev, TRUE, NULL, + SVN_INVALID_REVNUM, result_pool); + return SVN_NO_ERROR; + } + else + { + /* ... or we *have* reported on this revision, and must now + progress toward this node's predecessor (unless there is + no predecessor, in which case we're all done!). */ + svn_fs_x__id_t pred_id; + + SVN_ERR(svn_fs_x__dag_get_predecessor_id(&pred_id, node)); + if (!svn_fs_x__id_used(&pred_id)) + return SVN_NO_ERROR; + + /* Replace NODE and friends with the information from its + predecessor. */ + SVN_ERR(svn_fs_x__dag_get_node(&node, fs, &pred_id, scratch_pool, + scratch_pool)); + commit_path = svn_fs_x__dag_get_created_path(node); + commit_rev = svn_fs_x__dag_get_revision(node); + } + } + + /* Find the youngest copyroot in the path of this node, including + itself. */ + SVN_ERR(find_youngest_copyroot(©root_rev, ©root_path, fs, + parent_path)); + + /* Initialize some state variables. */ + src_path = NULL; + src_rev = SVN_INVALID_REVNUM; + dst_rev = SVN_INVALID_REVNUM; + + if (copyroot_rev > commit_rev) + { + const char *remainder_path; + const char *copy_dst, *copy_src; + svn_fs_root_t *copyroot_root; + + SVN_ERR(svn_fs_x__revision_root(©root_root, fs, copyroot_rev, + scratch_pool)); + SVN_ERR(get_dag(&node, copyroot_root, copyroot_path, scratch_pool)); + copy_dst = svn_fs_x__dag_get_created_path(node); + + /* If our current path was the very destination of the copy, + then our new current path will be the copy source. If our + current path was instead the *child* of the destination of + the copy, then figure out its previous location by taking its + path relative to the copy destination and appending that to + the copy source. Finally, if our current path doesn't meet + one of these other criteria ... ### for now just fallback to + the old copy hunt algorithm. */ + remainder_path = svn_fspath__skip_ancestor(copy_dst, path); + + if (remainder_path) + { + /* If we get here, then our current path is the destination + of, or the child of the destination of, a copy. Fill + in the return values and get outta here. */ + SVN_ERR(svn_fs_x__dag_get_copyfrom_rev(&src_rev, node)); + SVN_ERR(svn_fs_x__dag_get_copyfrom_path(©_src, node)); + + dst_rev = copyroot_rev; + src_path = svn_fspath__join(copy_src, remainder_path, scratch_pool); + } + } + + /* If we calculated a copy source path and revision, we'll make a + 'copy-style' history object. */ + if (src_path && SVN_IS_VALID_REVNUM(src_rev)) + { + svn_boolean_t retry = FALSE; + + /* It's possible for us to find a copy location that is the same + as the history point we've just reported. If that happens, + we simply need to take another trip through this history + search. 
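
history_prev() above relocates a copied child by taking its path relative to the copy destination and appending that remainder to the copy source. A standalone sketch of that remapping with a simplified stand-in for svn_fspath__skip_ancestor(); the helper names and the fixed-size buffer are illustrative only.

#include <stdio.h>
#include <string.h>

/* If PATH equals ANCESTOR or lives below it, return the remainder
   ("" for an exact match), otherwise NULL: a simplified stand-in for
   svn_fspath__skip_ancestor(). */
static const char *
skip_ancestor(const char *ancestor, const char *path)
{
  size_t len = strlen(ancestor);

  if (strncmp(ancestor, path, len) != 0)
    return NULL;
  if (path[len] == '\0')
    return "";
  if (path[len] == '/')
    return path + len + 1;
  return NULL;
}

int
main(void)
{
  const char *copy_dst = "/branches/b";
  const char *copy_src = "/trunk";
  const char *path = "/branches/b/sub/file.c";
  const char *remainder = skip_ancestor(copy_dst, path);
  char src_path[256];

  if (remainder)
    {
      /* The child's pre-copy location: copy source plus the part of the
         path below the copy destination. */
      if (*remainder)
        snprintf(src_path, sizeof(src_path), "%s/%s", copy_src, remainder);
      else
        snprintf(src_path, sizeof(src_path), "%s", copy_src);
      printf("previous location: %s\n", src_path);  /* /trunk/sub/file.c */
    }
  return 0;
}
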
*/ + if ((dst_rev == revision) && reported) + retry = TRUE; + + *prev_history = assemble_history(fs, path, dst_rev, ! retry, + src_path, src_rev, result_pool); + } + else + { + *prev_history = assemble_history(fs, commit_path, commit_rev, TRUE, + NULL, SVN_INVALID_REVNUM, result_pool); + } + + return SVN_NO_ERROR; +} + + +/* Implement svn_fs_history_prev, set *PREV_HISTORY_P to a new + svn_fs_history_t object that represents the predecessory of + HISTORY. If CROSS_COPIES is true, *PREV_HISTORY_P may be related + only through a copy operation. Perform all allocations in POOL. */ +static svn_error_t * +fs_history_prev(svn_fs_history_t **prev_history_p, + svn_fs_history_t *history, + svn_boolean_t cross_copies, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_history_t *prev_history = NULL; + fs_history_data_t *fhd = history->fsap_data; + svn_fs_t *fs = fhd->fs; + + /* Special case: the root directory changes in every single + revision, no exceptions. And, the root can't be the target (or + child of a target -- duh) of a copy. So, if that's our path, + then we need only decrement our revision by 1, and there you go. */ + if (strcmp(fhd->path, "/") == 0) + { + if (! fhd->is_interesting) + prev_history = assemble_history(fs, "/", fhd->revision, + 1, NULL, SVN_INVALID_REVNUM, + result_pool); + else if (fhd->revision > 0) + prev_history = assemble_history(fs, "/", fhd->revision - 1, + 1, NULL, SVN_INVALID_REVNUM, + result_pool); + } + else + { + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + prev_history = history; + + while (1) + { + svn_pool_clear(iterpool); + SVN_ERR(history_prev(&prev_history, prev_history, cross_copies, + result_pool, iterpool)); + + if (! prev_history) + break; + fhd = prev_history->fsap_data; + if (fhd->is_interesting) + break; + } + + svn_pool_destroy(iterpool); + } + + *prev_history_p = prev_history; + return SVN_NO_ERROR; +} + + +/* Set *PATH and *REVISION to the path and revision for the HISTORY + object. Allocate *PATH in RESULT_POOL. */ +static svn_error_t * +fs_history_location(const char **path, + svn_revnum_t *revision, + svn_fs_history_t *history, + apr_pool_t *result_pool) +{ + fs_history_data_t *fhd = history->fsap_data; + + *path = apr_pstrdup(result_pool, fhd->path); + *revision = fhd->revision; + return SVN_NO_ERROR; +} + +static history_vtable_t history_vtable = { + fs_history_prev, + fs_history_location +}; + +/* Return a new history object (marked as "interesting") for PATH and + REVISION, allocated in RESULT_POOL, and with its members set to the + values of the parameters provided. Note that PATH and PATH_HINT get + normalized and duplicated in RESULT_POOL. */ +static svn_fs_history_t * +assemble_history(svn_fs_t *fs, + const char *path, + svn_revnum_t revision, + svn_boolean_t is_interesting, + const char *path_hint, + svn_revnum_t rev_hint, + apr_pool_t *result_pool) +{ + svn_fs_history_t *history = apr_pcalloc(result_pool, sizeof(*history)); + fs_history_data_t *fhd = apr_pcalloc(result_pool, sizeof(*fhd)); + fhd->path = svn_fs__canonicalize_abspath(path, result_pool); + fhd->revision = revision; + fhd->is_interesting = is_interesting; + fhd->path_hint = path_hint + ? svn_fs__canonicalize_abspath(path_hint, result_pool) + : NULL; + fhd->rev_hint = rev_hint; + fhd->fs = fs; + + history->vtable = &history_vtable; + history->fsap_data = fhd; + return history; +} + + +/* mergeinfo queries */ + + +/* DIR_DAG is a directory DAG node which has mergeinfo in its + descendants. This function iterates over its children. 
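
fs_history_prev() above keeps stepping backwards until it either runs out of history or lands on an "interesting" point. The same driver loop over a toy, array-backed history chain; toy_history_t and toy_prev are invented for illustration, and the real code additionally clears an iteration pool on every round.

#include <stdio.h>

/* Toy history point: a revision plus an "interesting" flag. */
typedef struct toy_history_t
{
  long rev;
  int is_interesting;
} toy_history_t;

/* Stand-in for history_prev(): step backwards through a fixed chain and
   return NULL when the chain is exhausted. */
static const toy_history_t *
toy_prev(const toy_history_t *chain, int len, const toy_history_t *cur)
{
  int i = (int)(cur - chain);
  return i + 1 < len ? &chain[i + 1] : NULL;
}

int
main(void)
{
  /* Chain ordered youngest to oldest; uninteresting points get skipped. */
  const toy_history_t chain[] =
    { { 9, 1 }, { 8, 0 }, { 7, 0 }, { 5, 1 }, { 2, 1 } };
  const toy_history_t *cur = &chain[0];

  while (1)
    {
      cur = toy_prev(chain, 5, cur);
      if (!cur || cur->is_interesting)
        break;
    }

  if (cur)
    printf("previous interesting revision: %ld\n", cur->rev);  /* 5 */
  return 0;
}
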
For each + child with immediate mergeinfo, it adds its mergeinfo to + RESULT_CATALOG. appropriate arguments. For each child with + descendants with mergeinfo, it recurses. Note that it does *not* + call the action on the path for DIR_DAG itself. + + POOL is used for temporary allocations, including the mergeinfo + hashes passed to actions; RESULT_POOL is used for the mergeinfo added + to RESULT_CATALOG. + */ +static svn_error_t * +crawl_directory_dag_for_mergeinfo(svn_fs_root_t *root, + const char *this_path, + dag_node_t *dir_dag, + svn_mergeinfo_catalog_t result_catalog, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + apr_array_header_t *entries; + int i; + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + + SVN_ERR(svn_fs_x__dag_dir_entries(&entries, dir_dag, scratch_pool, + iterpool)); + for (i = 0; i < entries->nelts; ++i) + { + svn_fs_x__dirent_t *dirent = APR_ARRAY_IDX(entries, i, svn_fs_x__dirent_t *); + const char *kid_path; + dag_node_t *kid_dag; + svn_boolean_t has_mergeinfo, go_down; + + svn_pool_clear(iterpool); + + kid_path = svn_fspath__join(this_path, dirent->name, iterpool); + SVN_ERR(get_dag(&kid_dag, root, kid_path, iterpool)); + + SVN_ERR(svn_fs_x__dag_has_mergeinfo(&has_mergeinfo, kid_dag)); + SVN_ERR(svn_fs_x__dag_has_descendants_with_mergeinfo(&go_down, kid_dag)); + + if (has_mergeinfo) + { + /* Save this particular node's mergeinfo. */ + apr_hash_t *proplist; + svn_mergeinfo_t kid_mergeinfo; + svn_string_t *mergeinfo_string; + svn_error_t *err; + + SVN_ERR(svn_fs_x__dag_get_proplist(&proplist, kid_dag, iterpool, + iterpool)); + mergeinfo_string = svn_hash_gets(proplist, SVN_PROP_MERGEINFO); + if (!mergeinfo_string) + { + svn_string_t *idstr + = svn_fs_x__id_unparse(&dirent->id, iterpool); + return svn_error_createf + (SVN_ERR_FS_CORRUPT, NULL, + _("Node-revision #'%s' claims to have mergeinfo but doesn't"), + idstr->data); + } + + /* Issue #3896: If a node has syntactically invalid mergeinfo, then + treat it as if no mergeinfo is present rather than raising a parse + error. */ + err = svn_mergeinfo_parse(&kid_mergeinfo, + mergeinfo_string->data, + result_pool); + if (err) + { + if (err->apr_err == SVN_ERR_MERGEINFO_PARSE_ERROR) + svn_error_clear(err); + else + return svn_error_trace(err); + } + else + { + svn_hash_sets(result_catalog, apr_pstrdup(result_pool, kid_path), + kid_mergeinfo); + } + } + + if (go_down) + SVN_ERR(crawl_directory_dag_for_mergeinfo(root, + kid_path, + kid_dag, + result_catalog, + result_pool, + iterpool)); + } + + svn_pool_destroy(iterpool); + return SVN_NO_ERROR; +} + +/* Return the cache key as a combination of REV_ROOT->REV, the inheritance + flags INHERIT and ADJUST_INHERITED_MERGEINFO, and the PATH. The result + will be allocated in RESULT_POOL. + */ +static const char * +mergeinfo_cache_key(const char *path, + svn_fs_root_t *rev_root, + svn_mergeinfo_inheritance_t inherit, + svn_boolean_t adjust_inherited_mergeinfo, + apr_pool_t *result_pool) +{ + apr_int64_t number = rev_root->rev; + number = number * 4 + + (inherit == svn_mergeinfo_nearest_ancestor ? 2 : 0) + + (adjust_inherited_mergeinfo ? 1 : 0); + + return svn_fs_x__combine_number_and_string(number, path, result_pool); +} + +/* Calculates the mergeinfo for PATH under REV_ROOT using inheritance + type INHERIT. Returns it in *MERGEINFO, or NULL if there is none. + The result is allocated in RESULT_POOL; SCRATCH_POOL is + used for temporary allocations. 
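
The mergeinfo_cache_key() helper above packs the revision and the two boolean query parameters into a single number, two low bits for the flags and the rest for the revision, before combining it with the path. A minimal sketch of that encoding and of how it decodes again; pack_key is an invented name, and the real key also appends the path via svn_fs_x__combine_number_and_string().

#include <stdio.h>
#include <inttypes.h>

/* Pack the revision and the two booleans into one number: two low bits
   for the flags, the remaining bits for the revision. */
static int64_t
pack_key(long rev, int nearest_ancestor, int adjust_inherited)
{
  return (int64_t)rev * 4
         + (nearest_ancestor ? 2 : 0)
         + (adjust_inherited ? 1 : 0);
}

int
main(void)
{
  int64_t key = pack_key(1234, 1, 0);

  printf("key number: %" PRId64 "\n", key);                 /* 4938 */
  printf("rev=%" PRId64 " nearest=%d adjust=%d\n",
         key / 4, (int)((key / 2) % 2), (int)(key % 2));    /* 1234 1 0 */
  return 0;
}
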
+ */ +static svn_error_t * +get_mergeinfo_for_path_internal(svn_mergeinfo_t *mergeinfo, + svn_fs_root_t *rev_root, + const char *path, + svn_mergeinfo_inheritance_t inherit, + svn_boolean_t adjust_inherited_mergeinfo, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + parent_path_t *parent_path, *nearest_ancestor; + apr_hash_t *proplist; + svn_string_t *mergeinfo_string; + + path = svn_fs__canonicalize_abspath(path, scratch_pool); + + SVN_ERR(open_path(&parent_path, rev_root, path, 0, FALSE, scratch_pool)); + + if (inherit == svn_mergeinfo_nearest_ancestor && ! parent_path->parent) + return SVN_NO_ERROR; + + if (inherit == svn_mergeinfo_nearest_ancestor) + nearest_ancestor = parent_path->parent; + else + nearest_ancestor = parent_path; + + while (TRUE) + { + svn_boolean_t has_mergeinfo; + + SVN_ERR(svn_fs_x__dag_has_mergeinfo(&has_mergeinfo, + nearest_ancestor->node)); + if (has_mergeinfo) + break; + + /* No need to loop if we're looking for explicit mergeinfo. */ + if (inherit == svn_mergeinfo_explicit) + { + return SVN_NO_ERROR; + } + + nearest_ancestor = nearest_ancestor->parent; + + /* Run out? There's no mergeinfo. */ + if (!nearest_ancestor) + { + return SVN_NO_ERROR; + } + } + + SVN_ERR(svn_fs_x__dag_get_proplist(&proplist, nearest_ancestor->node, + scratch_pool, scratch_pool)); + mergeinfo_string = svn_hash_gets(proplist, SVN_PROP_MERGEINFO); + if (!mergeinfo_string) + return svn_error_createf + (SVN_ERR_FS_CORRUPT, NULL, + _("Node-revision '%s@%ld' claims to have mergeinfo but doesn't"), + parent_path_path(nearest_ancestor, scratch_pool), rev_root->rev); + + /* Parse the mergeinfo; store the result in *MERGEINFO. */ + { + /* Issue #3896: If a node has syntactically invalid mergeinfo, then + treat it as if no mergeinfo is present rather than raising a parse + error. */ + svn_error_t *err = svn_mergeinfo_parse(mergeinfo, + mergeinfo_string->data, + result_pool); + if (err) + { + if (err->apr_err == SVN_ERR_MERGEINFO_PARSE_ERROR) + { + svn_error_clear(err); + err = NULL; + *mergeinfo = NULL; + } + return svn_error_trace(err); + } + } + + /* If our nearest ancestor is the very path we inquired about, we + can return the mergeinfo results directly. Otherwise, we're + inheriting the mergeinfo, so we need to a) remove non-inheritable + ranges and b) telescope the merged-from paths. */ + if (adjust_inherited_mergeinfo && (nearest_ancestor != parent_path)) + { + svn_mergeinfo_t tmp_mergeinfo; + + SVN_ERR(svn_mergeinfo_inheritable2(&tmp_mergeinfo, *mergeinfo, + NULL, SVN_INVALID_REVNUM, + SVN_INVALID_REVNUM, TRUE, + scratch_pool, scratch_pool)); + SVN_ERR(svn_fs__append_to_merged_froms(mergeinfo, tmp_mergeinfo, + parent_path_relpath( + parent_path, nearest_ancestor, + scratch_pool), + result_pool)); + } + + return SVN_NO_ERROR; +} + +/* Caching wrapper around get_mergeinfo_for_path_internal(). 
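
get_mergeinfo_for_path_internal() above walks from the node (or from its parent, for "nearest ancestor" inheritance) towards the root until it finds mergeinfo, and gives up immediately for "explicit" inheritance. A standalone sketch of that walk over a toy node chain; the toy_node_t type and the inherit_t enum are illustrative, not the library's types.

#include <stdio.h>
#include <stddef.h>

typedef enum { INHERIT_EXPLICIT, INHERIT_INHERITED, INHERIT_NEAREST } inherit_t;

typedef struct toy_node_t
{
  struct toy_node_t *parent;   /* NULL at the filesystem root */
  const char *path;
  int has_mergeinfo;
} toy_node_t;

/* Walk towards the root until a node with mergeinfo is found:
   "nearest ancestor" starts at the parent, "explicit" never walks. */
static const toy_node_t *
nearest_mergeinfo(const toy_node_t *node, inherit_t inherit)
{
  const toy_node_t *cur = (inherit == INHERIT_NEAREST) ? node->parent : node;

  while (cur)
    {
      if (cur->has_mergeinfo)
        return cur;
      if (inherit == INHERIT_EXPLICIT)
        return NULL;
      cur = cur->parent;
    }
  return NULL;
}

int
main(void)
{
  toy_node_t root   = { NULL,    "/",             0 };
  toy_node_t branch = { &root,   "/branches/b",   1 };
  toy_node_t file   = { &branch, "/branches/b/f", 0 };
  const toy_node_t *hit = nearest_mergeinfo(&file, INHERIT_INHERITED);

  printf("%s\n", hit ? hit->path : "(none)");   /* /branches/b */
  return 0;
}
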
+ */ +static svn_error_t * +get_mergeinfo_for_path(svn_mergeinfo_t *mergeinfo, + svn_fs_root_t *rev_root, + const char *path, + svn_mergeinfo_inheritance_t inherit, + svn_boolean_t adjust_inherited_mergeinfo, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = rev_root->fs->fsap_data; + const char *cache_key; + svn_boolean_t found = FALSE; + svn_stringbuf_t *mergeinfo_exists; + + *mergeinfo = NULL; + + cache_key = mergeinfo_cache_key(path, rev_root, inherit, + adjust_inherited_mergeinfo, scratch_pool); + if (ffd->mergeinfo_existence_cache) + { + SVN_ERR(svn_cache__get((void **)&mergeinfo_exists, &found, + ffd->mergeinfo_existence_cache, + cache_key, result_pool)); + if (found && mergeinfo_exists->data[0] == '1') + SVN_ERR(svn_cache__get((void **)mergeinfo, &found, + ffd->mergeinfo_cache, + cache_key, result_pool)); + } + + if (! found) + { + SVN_ERR(get_mergeinfo_for_path_internal(mergeinfo, rev_root, path, + inherit, + adjust_inherited_mergeinfo, + result_pool, scratch_pool)); + if (ffd->mergeinfo_existence_cache) + { + mergeinfo_exists = svn_stringbuf_create(*mergeinfo ? "1" : "0", + scratch_pool); + SVN_ERR(svn_cache__set(ffd->mergeinfo_existence_cache, + cache_key, mergeinfo_exists, scratch_pool)); + if (*mergeinfo) + SVN_ERR(svn_cache__set(ffd->mergeinfo_cache, + cache_key, *mergeinfo, scratch_pool)); + } + } + + return SVN_NO_ERROR; +} + +/* Adds mergeinfo for each descendant of PATH (but not PATH itself) + under ROOT to RESULT_CATALOG. Returned values are allocated in + RESULT_POOL; temporary values in POOL. */ +static svn_error_t * +add_descendant_mergeinfo(svn_mergeinfo_catalog_t result_catalog, + svn_fs_root_t *root, + const char *path, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + dag_node_t *this_dag; + svn_boolean_t go_down; + + SVN_ERR(get_dag(&this_dag, root, path, scratch_pool)); + SVN_ERR(svn_fs_x__dag_has_descendants_with_mergeinfo(&go_down, + this_dag)); + if (go_down) + SVN_ERR(crawl_directory_dag_for_mergeinfo(root, + path, + this_dag, + result_catalog, + result_pool, + scratch_pool)); + return SVN_NO_ERROR; +} + + +/* Get the mergeinfo for a set of paths, returned in + *MERGEINFO_CATALOG. Returned values are allocated in + POOL, while temporary values are allocated in a sub-pool. 
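
get_mergeinfo_for_path() above consults a cheap existence cache before touching the value cache or recomputing, so negative answers are remembered without storing full mergeinfo. A toy single-slot version of that two-level lookup; have_existence, value_cache and compute_mergeinfo are invented stand-ins for the real keyed caches.

#include <stdio.h>
#include <string.h>

static int have_existence = 0;     /* existence-cache entry present? */
static int existence = 0;          /* cached "does mergeinfo exist?" */
static char value_cache[64];       /* value cache, filled only if it exists */
static int computations = 0;

/* The expensive path that the caches are meant to avoid. */
static const char *
compute_mergeinfo(void)
{
  ++computations;
  return "/trunk:1-100";
}

static const char *
lookup(void)
{
  if (have_existence)
    return existence ? value_cache : NULL;

  /* Miss: compute once, then fill both cache levels. */
  {
    const char *result = compute_mergeinfo();
    existence = (result != NULL);
    have_existence = 1;
    if (existence)
      strncpy(value_cache, result, sizeof(value_cache) - 1);
    return existence ? value_cache : NULL;
  }
}

int
main(void)
{
  lookup();
  lookup();                                    /* served from the caches */
  printf("computations: %d\n", computations);  /* 1 */
  return 0;
}
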
*/ +static svn_error_t * +get_mergeinfos_for_paths(svn_fs_root_t *root, + svn_mergeinfo_catalog_t *mergeinfo_catalog, + const apr_array_header_t *paths, + svn_mergeinfo_inheritance_t inherit, + svn_boolean_t include_descendants, + svn_boolean_t adjust_inherited_mergeinfo, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + svn_mergeinfo_catalog_t result_catalog = svn_hash__make(result_pool); + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + int i; + + for (i = 0; i < paths->nelts; i++) + { + svn_error_t *err; + svn_mergeinfo_t path_mergeinfo; + const char *path = APR_ARRAY_IDX(paths, i, const char *); + + svn_pool_clear(iterpool); + + err = get_mergeinfo_for_path(&path_mergeinfo, root, path, + inherit, adjust_inherited_mergeinfo, + result_pool, iterpool); + if (err) + { + if (err->apr_err == SVN_ERR_MERGEINFO_PARSE_ERROR) + { + svn_error_clear(err); + err = NULL; + path_mergeinfo = NULL; + } + else + { + return svn_error_trace(err); + } + } + + if (path_mergeinfo) + svn_hash_sets(result_catalog, path, path_mergeinfo); + if (include_descendants) + SVN_ERR(add_descendant_mergeinfo(result_catalog, root, path, + result_pool, scratch_pool)); + } + svn_pool_destroy(iterpool); + + *mergeinfo_catalog = result_catalog; + return SVN_NO_ERROR; +} + + +/* Implements svn_fs_get_mergeinfo. */ +static svn_error_t * +x_get_mergeinfo(svn_mergeinfo_catalog_t *catalog, + svn_fs_root_t *root, + const apr_array_header_t *paths, + svn_mergeinfo_inheritance_t inherit, + svn_boolean_t include_descendants, + svn_boolean_t adjust_inherited_mergeinfo, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + /* We require a revision root. */ + if (root->is_txn_root) + return svn_error_create(SVN_ERR_FS_NOT_REVISION_ROOT, NULL, NULL); + + /* Retrieve a path -> mergeinfo hash mapping. */ + return get_mergeinfos_for_paths(root, catalog, paths, + inherit, + include_descendants, + adjust_inherited_mergeinfo, + result_pool, scratch_pool); +} + + +/* The vtable associated with root objects. */ +static root_vtable_t root_vtable = { + x_paths_changed, + svn_fs_x__check_path, + x_node_history, + x_node_id, + x_node_relation, + svn_fs_x__node_created_rev, + x_node_origin_rev, + x_node_created_path, + x_delete_node, + x_copy, + x_revision_link, + x_copied_from, + x_closest_copy, + x_node_prop, + x_node_proplist, + x_node_has_props, + x_change_node_prop, + x_props_changed, + x_dir_entries, + x_dir_optimal_order, + x_make_dir, + x_file_length, + x_file_checksum, + x_file_contents, + x_try_process_file_contents, + x_make_file, + x_apply_textdelta, + x_apply_text, + x_contents_changed, + x_get_file_delta_stream, + x_merge, + x_get_mergeinfo, +}; + +/* Construct a new root object in FS, allocated from RESULT_POOL. */ +static svn_fs_root_t * +make_root(svn_fs_t *fs, + apr_pool_t *result_pool) +{ + svn_fs_root_t *root = apr_pcalloc(result_pool, sizeof(*root)); + + root->fs = fs; + root->pool = result_pool; + root->vtable = &root_vtable; + + return root; +} + + +/* Construct a root object referring to the root of revision REV in FS. + Create the new root in RESULT_POOL. */ +static svn_fs_root_t * +make_revision_root(svn_fs_t *fs, + svn_revnum_t rev, + apr_pool_t *result_pool) +{ + svn_fs_root_t *root = make_root(fs, result_pool); + + root->is_txn_root = FALSE; + root->rev = rev; + + return root; +} + + +/* Construct a root object referring to the root of the transaction + named TXN and based on revision BASE_REV in FS, with FLAGS to + describe transaction's behavior. Create the new root in RESULT_POOL. 
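
The root_vtable / make_root() pair above is the usual C "vtable" pattern: one shared, statically allocated table of function pointers, wired up by a small constructor. A self-contained toy version; toy_root_t and its companions are invented, only the shape matches.

#include <stdio.h>
#include <stdlib.h>

typedef struct toy_root_t toy_root_t;

/* Shared table of operations, analogous to root_vtable_t. */
typedef struct toy_root_vtable_t
{
  int (*is_txn_root)(const toy_root_t *root);
} toy_root_vtable_t;

struct toy_root_t
{
  const toy_root_vtable_t *vtable;
  int is_txn;
  long rev;
};

static int
toy_is_txn_root(const toy_root_t *root)
{
  return root->is_txn;
}

static const toy_root_vtable_t toy_vtable = { toy_is_txn_root };

/* Constructor in the spirit of make_root() / make_revision_root(). */
static toy_root_t *
make_toy_revision_root(long rev)
{
  toy_root_t *root = calloc(1, sizeof(*root));
  if (!root)
    return NULL;
  root->vtable = &toy_vtable;
  root->is_txn = 0;
  root->rev = rev;
  return root;
}

int
main(void)
{
  toy_root_t *root = make_toy_revision_root(42);
  if (!root)
    return 1;
  printf("txn root? %d\n", root->vtable->is_txn_root(root));  /* 0 */
  free(root);
  return 0;
}
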
*/ +static svn_error_t * +make_txn_root(svn_fs_root_t **root_p, + svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + svn_revnum_t base_rev, + apr_uint32_t flags, + apr_pool_t *result_pool) +{ + svn_fs_root_t *root = make_root(fs, result_pool); + fs_txn_root_data_t *frd = apr_pcalloc(root->pool, sizeof(*frd)); + frd->txn_id = txn_id; + + root->is_txn_root = TRUE; + root->txn = svn_fs_x__txn_name(txn_id, root->pool); + root->txn_flags = flags; + root->rev = base_rev; + + /* Because this cache actually tries to invalidate elements, keep + the number of elements per page down. + + Note that since dag_node_cache_invalidate uses svn_cache__iter, + this *cannot* be a memcache-based cache. */ + SVN_ERR(svn_cache__create_inprocess(&(frd->txn_node_cache), + svn_fs_x__dag_serialize, + svn_fs_x__dag_deserialize, + APR_HASH_KEY_STRING, + 32, 20, FALSE, + root->txn, + root->pool)); + + root->fsap_data = frd; + + *root_p = root; + return SVN_NO_ERROR; +} + + + +/* Verify. */ +static const char * +stringify_node(dag_node_t *node, + apr_pool_t *result_pool) +{ + /* ### TODO: print some PATH@REV to it, too. */ + return svn_fs_x__id_unparse(svn_fs_x__dag_get_id(node), result_pool)->data; +} + +/* Check metadata sanity on NODE, and on its children. Manually verify + information for DAG nodes in revision REV, and trust the metadata + accuracy for nodes belonging to older revisions. To detect cycles, + provide all parent dag_node_t * in PARENT_NODES. */ +static svn_error_t * +verify_node(dag_node_t *node, + svn_revnum_t rev, + apr_array_header_t *parent_nodes, + apr_pool_t *scratch_pool) +{ + svn_boolean_t has_mergeinfo; + apr_int64_t mergeinfo_count; + svn_fs_x__id_t pred_id; + svn_fs_t *fs = svn_fs_x__dag_get_fs(node); + int pred_count; + svn_node_kind_t kind; + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + int i; + + /* Detect (non-)DAG cycles. */ + for (i = 0; i < parent_nodes->nelts; ++i) + { + dag_node_t *parent = APR_ARRAY_IDX(parent_nodes, i, dag_node_t *); + if (svn_fs_x__id_eq(svn_fs_x__dag_get_id(parent), + svn_fs_x__dag_get_id(node))) + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + "Node is its own direct or indirect parent '%s'", + stringify_node(node, iterpool)); + } + + /* Fetch some data. */ + SVN_ERR(svn_fs_x__dag_has_mergeinfo(&has_mergeinfo, node)); + SVN_ERR(svn_fs_x__dag_get_mergeinfo_count(&mergeinfo_count, node)); + SVN_ERR(svn_fs_x__dag_get_predecessor_id(&pred_id, node)); + SVN_ERR(svn_fs_x__dag_get_predecessor_count(&pred_count, node)); + kind = svn_fs_x__dag_node_kind(node); + + /* Sanity check. */ + if (mergeinfo_count < 0) + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + "Negative mergeinfo-count %" APR_INT64_T_FMT + " on node '%s'", + mergeinfo_count, stringify_node(node, iterpool)); + + /* Issue #4129. (This check will explicitly catch non-root instances too.) */ + if (svn_fs_x__id_used(&pred_id)) + { + dag_node_t *pred; + int pred_pred_count; + SVN_ERR(svn_fs_x__dag_get_node(&pred, fs, &pred_id, iterpool, + iterpool)); + SVN_ERR(svn_fs_x__dag_get_predecessor_count(&pred_pred_count, pred)); + if (pred_pred_count+1 != pred_count) + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + "Predecessor count mismatch: " + "%s has %d, but %s has %d", + stringify_node(node, iterpool), pred_count, + stringify_node(pred, iterpool), + pred_pred_count); + } + + /* Kind-dependent verifications. 
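
verify_node() above guards against cycles by checking, before descending, that a node's id does not already occur on the current parent chain. A minimal sketch of that check with integer ids standing in for svn_fs_x__id_t.

#include <stdio.h>

/* Return non-zero if NODE_ID already appears among the ids on the path
   from the root to the current node, i.e. the "tree" has a cycle. */
static int
is_own_ancestor(const int *parent_ids, int depth, int node_id)
{
  int i;
  for (i = 0; i < depth; ++i)
    if (parent_ids[i] == node_id)
      return 1;
  return 0;
}

int
main(void)
{
  int parents[] = { 1, 5, 9 };           /* ids on the path from the root */

  printf("%d\n", is_own_ancestor(parents, 3, 5));   /* 1: corrupt, cycle */
  printf("%d\n", is_own_ancestor(parents, 3, 7));   /* 0: fine */
  return 0;
}
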
*/ + if (kind == svn_node_none) + { + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + "Node '%s' has kind 'none'", + stringify_node(node, iterpool)); + } + if (kind == svn_node_file) + { + if (has_mergeinfo != mergeinfo_count) /* comparing int to bool */ + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + "File node '%s' has inconsistent mergeinfo: " + "has_mergeinfo=%d, " + "mergeinfo_count=%" APR_INT64_T_FMT, + stringify_node(node, iterpool), + has_mergeinfo, mergeinfo_count); + } + if (kind == svn_node_dir) + { + apr_array_header_t *entries; + apr_int64_t children_mergeinfo = 0; + APR_ARRAY_PUSH(parent_nodes, dag_node_t*) = node; + + SVN_ERR(svn_fs_x__dag_dir_entries(&entries, node, scratch_pool, + iterpool)); + + /* Compute CHILDREN_MERGEINFO. */ + for (i = 0; i < entries->nelts; ++i) + { + svn_fs_x__dirent_t *dirent + = APR_ARRAY_IDX(entries, i, svn_fs_x__dirent_t *); + dag_node_t *child; + apr_int64_t child_mergeinfo; + + svn_pool_clear(iterpool); + + /* Compute CHILD_REV. */ + if (svn_fs_x__get_revnum(dirent->id.change_set) == rev) + { + SVN_ERR(svn_fs_x__dag_get_node(&child, fs, &dirent->id, + iterpool, iterpool)); + SVN_ERR(verify_node(child, rev, parent_nodes, iterpool)); + SVN_ERR(svn_fs_x__dag_get_mergeinfo_count(&child_mergeinfo, + child)); + } + else + { + SVN_ERR(svn_fs_x__get_mergeinfo_count(&child_mergeinfo, fs, + &dirent->id, iterpool)); + } + + children_mergeinfo += child_mergeinfo; + } + + /* Side-effect of issue #4129. */ + if (children_mergeinfo+has_mergeinfo != mergeinfo_count) + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + "Mergeinfo-count discrepancy on '%s': " + "expected %" APR_INT64_T_FMT "+%d, " + "counted %" APR_INT64_T_FMT, + stringify_node(node, iterpool), + mergeinfo_count, has_mergeinfo, + children_mergeinfo); + + /* If we don't make it here, there was an error / corruption. + * In that case, nobody will need PARENT_NODES anymore. */ + apr_array_pop(parent_nodes); + } + + svn_pool_destroy(iterpool); + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__verify_root(svn_fs_root_t *root, + apr_pool_t *scratch_pool) +{ + dag_node_t *root_dir; + apr_array_header_t *parent_nodes; + + /* Issue #4129: bogus pred-counts and minfo-cnt's on the root node-rev + (and elsewhere). This code makes more thorough checks than the + commit-time checks in validate_root_noderev(). */ + + /* Callers should disable caches by setting SVN_FS_CONFIG_FSX_CACHE_NS; + see r1462436. + + When this code is called in the library, we want to ensure we + use the on-disk data --- rather than some data that was read + in the possibly-distance past and cached since. */ + SVN_ERR(root_node(&root_dir, root, scratch_pool, scratch_pool)); + + /* Recursively verify ROOT_DIR. */ + parent_nodes = apr_array_make(scratch_pool, 16, sizeof(dag_node_t *)); + SVN_ERR(verify_node(root_dir, root->rev, parent_nodes, scratch_pool)); + + /* Verify explicitly the predecessor of the root. */ + { + svn_fs_x__id_t pred_id; + svn_boolean_t has_predecessor; + + /* Only r0 should have no predecessor. */ + SVN_ERR(svn_fs_x__dag_get_predecessor_id(&pred_id, root_dir)); + has_predecessor = svn_fs_x__id_used(&pred_id); + if (!root->is_txn_root && has_predecessor != !!root->rev) + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + "r%ld's root node's predecessor is " + "unexpectedly '%s'", + root->rev, + (has_predecessor + ? 
svn_fs_x__id_unparse(&pred_id, + scratch_pool)->data + : "(null)")); + if (root->is_txn_root && !has_predecessor) + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + "Transaction '%s''s root node's predecessor is " + "unexpectedly NULL", + root->txn); + + /* Check the predecessor's revision. */ + if (has_predecessor) + { + svn_revnum_t pred_rev = svn_fs_x__get_revnum(pred_id.change_set); + if (! root->is_txn_root && pred_rev+1 != root->rev) + /* Issue #4129. */ + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + "r%ld's root node's predecessor is r%ld" + " but should be r%ld", + root->rev, pred_rev, root->rev - 1); + if (root->is_txn_root && pred_rev != root->rev) + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + "Transaction '%s''s root node's predecessor" + " is r%ld" + " but should be r%ld", + root->txn, pred_rev, root->rev); + } + } + + return SVN_NO_ERROR; +} diff --git a/subversion/libsvn_fs_x/tree.h b/subversion/libsvn_fs_x/tree.h new file mode 100644 index 0000000..9c5d44a --- /dev/null +++ b/subversion/libsvn_fs_x/tree.h @@ -0,0 +1,112 @@ +/* tree.h : internal interface to tree node functions + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#ifndef SVN_LIBSVN_FS_TREE_H +#define SVN_LIBSVN_FS_TREE_H + +#include "fs.h" + +#ifdef __cplusplus +extern "C" { +#endif /* __cplusplus */ + + + +/* In RESULT_POOL, create an instance of a DAG node 1st level cache. */ +svn_fs_x__dag_cache_t* +svn_fs_x__create_dag_cache(apr_pool_t *result_pool); + +/* Set *ROOT_P to the root directory of revision REV in filesystem FS. + Allocate the structure in POOL. */ +svn_error_t * +svn_fs_x__revision_root(svn_fs_root_t **root_p, + svn_fs_t *fs, + svn_revnum_t rev, + apr_pool_t *pool); + +/* Does nothing, but included for Subversion 1.0.x compatibility. */ +svn_error_t * +svn_fs_x__deltify(svn_fs_t *fs, + svn_revnum_t rev, + apr_pool_t *scratch_pool); + +/* Commit the transaction TXN as a new revision. Return the new + revision in *NEW_REV. If the transaction conflicts with other + changes return SVN_ERR_FS_CONFLICT and set *CONFLICT_P to a string + that details the cause of the conflict. */ +svn_error_t * +svn_fs_x__commit_txn(const char **conflict_p, + svn_revnum_t *new_rev, + svn_fs_txn_t *txn, + apr_pool_t *pool); + +/* Set ROOT_P to the root directory of transaction TXN. Allocate the + structure in POOL. */ +svn_error_t * +svn_fs_x__txn_root(svn_fs_root_t **root_p, + svn_fs_txn_t *txn, + apr_pool_t *pool); + + +/* Set KIND_P to the node kind of the node at PATH in ROOT. + Use SCRATCH_POOL for temporary allocations. 
*/ +svn_error_t * +svn_fs_x__check_path(svn_node_kind_t *kind_p, + svn_fs_root_t *root, + const char *path, + apr_pool_t *scratch_pool); + +/* Set *REVISION to the revision in which PATH under ROOT was created. + Use SCRATCH_POOL for any temporary allocations. If PATH is in an + uncommitted transaction, *REVISION will be set to + SVN_INVALID_REVNUM. */ +svn_error_t * +svn_fs_x__node_created_rev(svn_revnum_t *revision, + svn_fs_root_t *root, + const char *path, + apr_pool_t *scratch_pool); + +/* Verify metadata for ROOT. + ### Currently only implemented for revision roots. */ +svn_error_t * +svn_fs_x__verify_root(svn_fs_root_t *root, + apr_pool_t *scratch_pool); + +svn_error_t * +svn_fs_x__info_format(int *fs_format, + svn_version_t **supports_version, + svn_fs_t *fs, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + + +svn_error_t * +svn_fs_x__info_config_files(apr_array_header_t **files, + svn_fs_t *fs, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool); + +#ifdef __cplusplus +} +#endif /* __cplusplus */ + +#endif /* SVN_LIBSVN_FS_TREE_H */ diff --git a/subversion/libsvn_fs_x/util.c b/subversion/libsvn_fs_x/util.c new file mode 100644 index 0000000..da004ad --- /dev/null +++ b/subversion/libsvn_fs_x/util.c @@ -0,0 +1,777 @@ +/* util.c --- utility functions for FSX repo access + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#include <assert.h> + +#include "svn_ctype.h" +#include "svn_dirent_uri.h" +#include "private/svn_string_private.h" + +#include "fs_x.h" +#include "id.h" +#include "util.h" + +#include "../libsvn_fs/fs-loader.h" + +#include "svn_private_config.h" + +/* Following are defines that specify the textual elements of the + native filesystem directories and revision files. */ + +/* Notes: + +To avoid opening and closing the rev-files all the time, it would +probably be advantageous to keep each rev-file open for the +lifetime of the transaction object. I'll leave that as a later +optimization for now. + +I didn't keep track of pool lifetimes at all in this code. There +are likely some errors because of that. + +*/ + +/* Pathname helper functions */ + +/* Return TRUE is REV is packed in FS, FALSE otherwise. */ +svn_boolean_t +svn_fs_x__is_packed_rev(svn_fs_t *fs, svn_revnum_t rev) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + + return (rev < ffd->min_unpacked_rev); +} + +/* Return TRUE is REV is packed in FS, FALSE otherwise. 
*/ +svn_boolean_t +svn_fs_x__is_packed_revprop(svn_fs_t *fs, svn_revnum_t rev) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + + /* rev 0 will not be packed */ + return (rev < ffd->min_unpacked_rev) && (rev != 0); +} + +svn_revnum_t +svn_fs_x__packed_base_rev(svn_fs_t *fs, svn_revnum_t rev) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + + return rev < ffd->min_unpacked_rev + ? rev - (rev % ffd->max_files_per_dir) + : rev; +} + +svn_revnum_t +svn_fs_x__pack_size(svn_fs_t *fs, svn_revnum_t rev) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + + return rev < ffd->min_unpacked_rev ? ffd->max_files_per_dir : 1; +} + +const char * +svn_fs_x__path_format(svn_fs_t *fs, + apr_pool_t *result_pool) +{ + return svn_dirent_join(fs->path, PATH_FORMAT, result_pool); +} + +const char * +svn_fs_x__path_uuid(svn_fs_t *fs, + apr_pool_t *result_pool) +{ + return svn_dirent_join(fs->path, PATH_UUID, result_pool); +} + +const char * +svn_fs_x__path_current(svn_fs_t *fs, + apr_pool_t *result_pool) +{ + return svn_dirent_join(fs->path, PATH_CURRENT, result_pool); +} + +const char * +svn_fs_x__path_txn_current(svn_fs_t *fs, + apr_pool_t *result_pool) +{ + return svn_dirent_join(fs->path, PATH_TXN_CURRENT, + result_pool); +} + +const char * +svn_fs_x__path_txn_current_lock(svn_fs_t *fs, + apr_pool_t *result_pool) +{ + return svn_dirent_join(fs->path, PATH_TXN_CURRENT_LOCK, result_pool); +} + +const char * +svn_fs_x__path_lock(svn_fs_t *fs, + apr_pool_t *result_pool) +{ + return svn_dirent_join(fs->path, PATH_LOCK_FILE, result_pool); +} + +const char * +svn_fs_x__path_pack_lock(svn_fs_t *fs, + apr_pool_t *result_pool) +{ + return svn_dirent_join(fs->path, PATH_PACK_LOCK_FILE, result_pool); +} + +const char * +svn_fs_x__path_revprop_generation(svn_fs_t *fs, + apr_pool_t *result_pool) +{ + return svn_dirent_join(fs->path, PATH_REVPROP_GENERATION, result_pool); +} + +/* Return the full path of the file FILENAME within revision REV's shard in + * FS. If FILENAME is NULL, return the shard directory directory itself. + * REVPROPS indicates the parent of the shard parent folder ("revprops" or + * "revs"). PACKED says whether we want the packed shard's name. + * + * Allocate the result in RESULT_POOL. + */static const char* +construct_shard_sub_path(svn_fs_t *fs, + svn_revnum_t rev, + svn_boolean_t revprops, + svn_boolean_t packed, + const char *filename, + apr_pool_t *result_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + char buffer[SVN_INT64_BUFFER_SIZE + sizeof(PATH_EXT_PACKED_SHARD)] = { 0 }; + + /* Select the appropriate parent path constant. */ + const char *parent = revprops ? PATH_REVPROPS_DIR : PATH_REVS_DIR; + + /* String containing the shard number. */ + apr_size_t len = svn__i64toa(buffer, rev / ffd->max_files_per_dir); + + /* Append the suffix. Limit it to the buffer size (should never hit it). */ + if (packed) + strncpy(buffer + len, PATH_EXT_PACKED_SHARD, sizeof(buffer) - len - 1); + + /* This will also work for NULL FILENAME as well. 
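
The shard arithmetic used by the path helpers above is plain integer division: the shard directory is rev / max_files_per_dir, and for packed revisions the pack starts at rev rounded down to a multiple of that value. A small sketch, assuming 1000 files per shard; the real value comes from ffd->max_files_per_dir and is not hard-coded anywhere.

#include <stdio.h>

int
main(void)
{
  const long max_files_per_dir = 1000;   /* assumed default shard size */
  const long rev = 123456;

  long shard = rev / max_files_per_dir;              /* shard number 123 */
  long packed_base = rev - rev % max_files_per_dir;  /* first rev 123000 */

  printf("r%ld lives in shard %ld\n", rev, shard);
  printf("its pack starts at r%ld\n", packed_base);
  return 0;
}
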
*/ + return svn_dirent_join_many(result_pool, fs->path, parent, buffer, + filename, SVN_VA_NULL); +} + +const char * +svn_fs_x__path_rev_packed(svn_fs_t *fs, + svn_revnum_t rev, + const char *kind, + apr_pool_t *result_pool) +{ + assert(svn_fs_x__is_packed_rev(fs, rev)); + return construct_shard_sub_path(fs, rev, FALSE, TRUE, kind, result_pool); +} + +const char * +svn_fs_x__path_rev_shard(svn_fs_t *fs, + svn_revnum_t rev, + apr_pool_t *result_pool) +{ + return construct_shard_sub_path(fs, rev, FALSE, FALSE, NULL, result_pool); +} + +const char * +svn_fs_x__path_rev(svn_fs_t *fs, + svn_revnum_t rev, + apr_pool_t *result_pool) +{ + char buffer[SVN_INT64_BUFFER_SIZE]; + svn__i64toa(buffer, rev); + + assert(! svn_fs_x__is_packed_rev(fs, rev)); + return construct_shard_sub_path(fs, rev, FALSE, FALSE, buffer, result_pool); +} + +const char * +svn_fs_x__path_rev_absolute(svn_fs_t *fs, + svn_revnum_t rev, + apr_pool_t *result_pool) +{ + return svn_fs_x__is_packed_rev(fs, rev) + ? svn_fs_x__path_rev_packed(fs, rev, PATH_PACKED, result_pool) + : svn_fs_x__path_rev(fs, rev, result_pool); +} + +const char * +svn_fs_x__path_revprops_shard(svn_fs_t *fs, + svn_revnum_t rev, + apr_pool_t *result_pool) +{ + return construct_shard_sub_path(fs, rev, TRUE, FALSE, NULL, result_pool); +} + +const char * +svn_fs_x__path_revprops_pack_shard(svn_fs_t *fs, + svn_revnum_t rev, + apr_pool_t *result_pool) +{ + return construct_shard_sub_path(fs, rev, TRUE, TRUE, NULL, result_pool); +} + +const char * +svn_fs_x__path_revprops(svn_fs_t *fs, + svn_revnum_t rev, + apr_pool_t *result_pool) +{ + char buffer[SVN_INT64_BUFFER_SIZE]; + svn__i64toa(buffer, rev); + + assert(! svn_fs_x__is_packed_revprop(fs, rev)); + return construct_shard_sub_path(fs, rev, TRUE, FALSE, buffer, result_pool); +} + +const char * +svn_fs_x__txn_name(svn_fs_x__txn_id_t txn_id, + apr_pool_t *result_pool) +{ + char *p = apr_palloc(result_pool, SVN_INT64_BUFFER_SIZE); + svn__ui64tobase36(p, txn_id); + return p; +} + +svn_error_t * +svn_fs_x__txn_by_name(svn_fs_x__txn_id_t *txn_id, + const char *txn_name) +{ + const char *next; + apr_uint64_t id = svn__base36toui64(&next, txn_name); + if (next == NULL || *next != 0 || *txn_name == 0) + return svn_error_createf(SVN_ERR_INCORRECT_PARAMS, NULL, + "Malformed TXN name '%s'", txn_name); + + *txn_id = id; + return SVN_NO_ERROR; +} + +const char * +svn_fs_x__path_txns_dir(svn_fs_t *fs, + apr_pool_t *result_pool) +{ + return svn_dirent_join(fs->path, PATH_TXNS_DIR, result_pool); +} + +/* Return the full path of the file FILENAME within transaction TXN_ID's + * transaction directory in FS. If FILENAME is NULL, return the transaction + * directory itself. + * + * Allocate the result in RESULT_POOL. + */ +static const char * +construct_txn_path(svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + const char *filename, + apr_pool_t *result_pool) +{ + /* Construct the transaction directory name without temp. allocations. */ + char buffer[SVN_INT64_BUFFER_SIZE + sizeof(PATH_EXT_TXN)] = { 0 }; + apr_size_t len = svn__ui64tobase36(buffer, txn_id); + strncpy(buffer + len, PATH_EXT_TXN, sizeof(buffer) - len - 1); + + /* If FILENAME is NULL, it will terminate the list of segments + to concatenate. 
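
Transaction names above are the transaction id rendered in base 36. A round-trip sketch in the spirit of svn__ui64tobase36() / svn__base36toui64(); the digit set (0-9 followed by lowercase a-z) and the lack of input validation are assumptions of this toy version.

#include <stdio.h>

/* Render VALUE in base 36 into BUF (at least 14 bytes for 64-bit input). */
static void
to_base36(char *buf, unsigned long long value)
{
  char tmp[16];
  int i = 0;

  do
    {
      unsigned digit = (unsigned)(value % 36);
      tmp[i++] = digit < 10 ? (char)('0' + digit) : (char)('a' + digit - 10);
      value /= 36;
    }
  while (value > 0);

  /* The digits come out least-significant first; reverse them. */
  buf[i] = '\0';
  while (i--)
    *buf++ = tmp[i];
}

/* Parse a lowercase base-36 string back into a number. */
static unsigned long long
from_base36(const char *str)
{
  unsigned long long value = 0;

  for (; *str; ++str)
    value = value * 36 + (*str <= '9' ? (unsigned)(*str - '0')
                                      : (unsigned)(*str - 'a' + 10));
  return value;
}

int
main(void)
{
  char name[16];

  to_base36(name, 123456789ULL);
  printf("txn name: %s -> %llu\n", name, from_base36(name));
  return 0;
}
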
*/ + return svn_dirent_join_many(result_pool, fs->path, PATH_TXNS_DIR, + buffer, filename, SVN_VA_NULL); +} + +const char * +svn_fs_x__path_txn_dir(svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *result_pool) +{ + return construct_txn_path(fs, txn_id, NULL, result_pool); +} + +/* Return the name of the sha1->rep mapping file in transaction TXN_ID + * within FS for the given SHA1 checksum. Use POOL for allocations. + */ +const char * +svn_fs_x__path_txn_sha1(svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + const unsigned char *sha1, + apr_pool_t *pool) +{ + svn_checksum_t checksum; + checksum.digest = sha1; + checksum.kind = svn_checksum_sha1; + + return svn_dirent_join(svn_fs_x__path_txn_dir(fs, txn_id, pool), + svn_checksum_to_cstring(&checksum, pool), + pool); +} + +const char * +svn_fs_x__path_txn_changes(svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *result_pool) +{ + return construct_txn_path(fs, txn_id, PATH_CHANGES, result_pool); +} + +const char * +svn_fs_x__path_txn_props(svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *result_pool) +{ + return construct_txn_path(fs, txn_id, PATH_TXN_PROPS, result_pool); +} + +const char * +svn_fs_x__path_txn_props_final(svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *result_pool) +{ + return construct_txn_path(fs, txn_id, PATH_TXN_PROPS_FINAL, result_pool); +} + +const char* +svn_fs_x__path_l2p_proto_index(svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *result_pool) +{ + return construct_txn_path(fs, txn_id, PATH_INDEX PATH_EXT_L2P_INDEX, + result_pool); +} + +const char* +svn_fs_x__path_p2l_proto_index(svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *result_pool) +{ + return construct_txn_path(fs, txn_id, PATH_INDEX PATH_EXT_P2L_INDEX, + result_pool); +} + +const char * +svn_fs_x__path_txn_next_ids(svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *result_pool) +{ + return construct_txn_path(fs, txn_id, PATH_NEXT_IDS, result_pool); +} + +const char * +svn_fs_x__path_min_unpacked_rev(svn_fs_t *fs, + apr_pool_t *result_pool) +{ + return svn_dirent_join(fs->path, PATH_MIN_UNPACKED_REV, result_pool); +} + +const char * +svn_fs_x__path_txn_proto_revs(svn_fs_t *fs, + apr_pool_t *result_pool) +{ + return svn_dirent_join(fs->path, PATH_TXN_PROTOS_DIR, result_pool); +} + +const char * +svn_fs_x__path_txn_item_index(svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *result_pool) +{ + return construct_txn_path(fs, txn_id, PATH_TXN_ITEM_INDEX, result_pool); +} + +/* Return the full path of the proto-rev file / lock file for transaction + * TXN_ID in FS. The SUFFIX determines what file (rev / lock) it will be. + * + * Allocate the result in RESULT_POOL. + */ +static const char * +construct_proto_rev_path(svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + const char *suffix, + apr_pool_t *result_pool) +{ + /* Construct the file name without temp. allocations. */ + char buffer[SVN_INT64_BUFFER_SIZE + sizeof(PATH_EXT_REV_LOCK)] = { 0 }; + apr_size_t len = svn__ui64tobase36(buffer, txn_id); + strncpy(buffer + len, suffix, sizeof(buffer) - len - 1); + + /* If FILENAME is NULL, it will terminate the list of segments + to concatenate. 
*/ + return svn_dirent_join_many(result_pool, fs->path, PATH_TXN_PROTOS_DIR, + buffer, SVN_VA_NULL); +} + +const char * +svn_fs_x__path_txn_proto_rev(svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *result_pool) +{ + return construct_proto_rev_path(fs, txn_id, PATH_EXT_REV, result_pool); +} + +const char * +svn_fs_x__path_txn_proto_rev_lock(svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *result_pool) +{ + return construct_proto_rev_path(fs, txn_id, PATH_EXT_REV_LOCK, result_pool); +} + +/* Return the full path of the noderev-related file with the extension SUFFIX + * for noderev *ID in transaction TXN_ID in FS. + * + * Allocate the result in RESULT_POOL and temporaries in SCRATCH_POOL. + */ +static const char * +construct_txn_node_path(svn_fs_t *fs, + const svn_fs_x__id_t *id, + const char *suffix, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + const char *filename = svn_fs_x__id_unparse(id, result_pool)->data; + apr_int64_t txn_id = svn_fs_x__get_txn_id(id->change_set); + + return svn_dirent_join(svn_fs_x__path_txn_dir(fs, txn_id, scratch_pool), + apr_psprintf(scratch_pool, PATH_PREFIX_NODE "%s%s", + filename, suffix), + result_pool); +} + +const char * +svn_fs_x__path_txn_node_rev(svn_fs_t *fs, + const svn_fs_x__id_t *id, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + return construct_txn_node_path(fs, id, "", result_pool, scratch_pool); +} + +const char * +svn_fs_x__path_txn_node_props(svn_fs_t *fs, + const svn_fs_x__id_t *id, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + return construct_txn_node_path(fs, id, PATH_EXT_PROPS, result_pool, + scratch_pool); +} + +const char * +svn_fs_x__path_txn_node_children(svn_fs_t *fs, + const svn_fs_x__id_t *id, + apr_pool_t *result_pool, + apr_pool_t *scratch_pool) +{ + return construct_txn_node_path(fs, id, PATH_EXT_CHILDREN, result_pool, + scratch_pool); +} + +svn_error_t * +svn_fs_x__check_file_buffer_numeric(const char *buf, + apr_off_t offset, + const char *path, + const char *title, + apr_pool_t *scratch_pool) +{ + const char *p; + + for (p = buf + offset; *p; p++) + if (!svn_ctype_isdigit(*p)) + return svn_error_createf(SVN_ERR_BAD_VERSION_FILE_FORMAT, NULL, + _("%s file '%s' contains unexpected non-digit '%c' within '%s'"), + title, svn_dirent_local_style(path, scratch_pool), *p, buf); + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__read_min_unpacked_rev(svn_revnum_t *min_unpacked_rev, + svn_fs_t *fs, + apr_pool_t *scratch_pool) +{ + char buf[80]; + apr_file_t *file; + apr_size_t len; + + SVN_ERR(svn_io_file_open(&file, + svn_fs_x__path_min_unpacked_rev(fs, scratch_pool), + APR_READ | APR_BUFFERED, + APR_OS_DEFAULT, + scratch_pool)); + len = sizeof(buf); + SVN_ERR(svn_io_read_length_line(file, buf, &len, scratch_pool)); + SVN_ERR(svn_io_file_close(file, scratch_pool)); + + SVN_ERR(svn_revnum_parse(min_unpacked_rev, buf, NULL)); + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__update_min_unpacked_rev(svn_fs_t *fs, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + return svn_fs_x__read_min_unpacked_rev(&ffd->min_unpacked_rev, fs, + scratch_pool); +} + +/* Write a file FILENAME in directory FS_PATH, containing a single line + * with the number REVNUM in ASCII decimal. Move the file into place + * atomically, overwriting any existing file. + * + * Similar to write_current(). 
*/ +svn_error_t * +svn_fs_x__write_min_unpacked_rev(svn_fs_t *fs, + svn_revnum_t revnum, + apr_pool_t *scratch_pool) +{ + const char *final_path; + char buf[SVN_INT64_BUFFER_SIZE]; + apr_size_t len = svn__i64toa(buf, revnum); + buf[len] = '\n'; + + final_path = svn_fs_x__path_min_unpacked_rev(fs, scratch_pool); + + SVN_ERR(svn_io_write_atomic(final_path, buf, len + 1, + final_path /* copy_perms */, scratch_pool)); + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__read_current(svn_revnum_t *rev, + svn_fs_t *fs, + apr_pool_t *scratch_pool) +{ + const char *str; + svn_stringbuf_t *content; + SVN_ERR(svn_fs_x__read_content(&content, + svn_fs_x__path_current(fs, scratch_pool), + scratch_pool)); + SVN_ERR(svn_revnum_parse(rev, content->data, &str)); + if (*str != '\n') + return svn_error_create(SVN_ERR_FS_CORRUPT, NULL, + _("Corrupt 'current' file")); + + return SVN_NO_ERROR; +} + +/* Atomically update the 'current' file to hold the specifed REV. + Perform temporary allocations in SCRATCH_POOL. */ +svn_error_t * +svn_fs_x__write_current(svn_fs_t *fs, + svn_revnum_t rev, + apr_pool_t *scratch_pool) +{ + char *buf; + const char *tmp_name, *name; + + /* Now we can just write out this line. */ + buf = apr_psprintf(scratch_pool, "%ld\n", rev); + + name = svn_fs_x__path_current(fs, scratch_pool); + SVN_ERR(svn_io_write_unique(&tmp_name, + svn_dirent_dirname(name, scratch_pool), + buf, strlen(buf), + svn_io_file_del_none, scratch_pool)); + + return svn_fs_x__move_into_place(tmp_name, name, name, scratch_pool); +} + + +svn_error_t * +svn_fs_x__try_stringbuf_from_file(svn_stringbuf_t **content, + svn_boolean_t *missing, + const char *path, + svn_boolean_t last_attempt, + apr_pool_t *result_pool) +{ + svn_error_t *err = svn_stringbuf_from_file2(content, path, result_pool); + if (missing) + *missing = FALSE; + + if (err) + { + *content = NULL; + + if (APR_STATUS_IS_ENOENT(err->apr_err)) + { + if (!last_attempt) + { + svn_error_clear(err); + if (missing) + *missing = TRUE; + return SVN_NO_ERROR; + } + } +#ifdef ESTALE + else if (APR_TO_OS_ERROR(err->apr_err) == ESTALE + || APR_TO_OS_ERROR(err->apr_err) == EIO) + { + if (!last_attempt) + { + svn_error_clear(err); + return SVN_NO_ERROR; + } + } +#endif + } + + return svn_error_trace(err); +} + +/* Fetch the current offset of FILE into *OFFSET_P. */ +svn_error_t * +svn_fs_x__get_file_offset(apr_off_t *offset_p, + apr_file_t *file, + apr_pool_t *scratch_pool) +{ + apr_off_t offset; + + /* Note that, for buffered files, one (possibly surprising) side-effect + of this call is to flush any unwritten data to disk. */ + offset = 0; + SVN_ERR(svn_io_file_seek(file, APR_CUR, &offset, scratch_pool)); + *offset_p = offset; + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__read_content(svn_stringbuf_t **content, + const char *fname, + apr_pool_t *result_pool) +{ + int i; + *content = NULL; + + for (i = 0; !*content && (i < SVN_FS_X__RECOVERABLE_RETRY_COUNT); ++i) + SVN_ERR(svn_fs_x__try_stringbuf_from_file(content, NULL, + fname, i + 1 < SVN_FS_X__RECOVERABLE_RETRY_COUNT, + result_pool)); + + if (!*content) + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + _("Can't read '%s'"), + svn_dirent_local_style(fname, result_pool)); + + return SVN_NO_ERROR; +} + +/* Reads a line from STREAM and converts it to a 64 bit integer to be + * returned in *RESULT. If we encounter eof, set *HIT_EOF and leave + * *RESULT unchanged. If HIT_EOF is NULL, EOF causes an "corrupt FS" + * error return. + * SCRATCH_POOL is used for temporary allocations. 
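
write_current() above writes the new value to a unique temporary file and then moves it into place, relying on rename() being atomic within a single filesystem. A standalone POSIX sketch of the same idea, with the cross-device (EXDEV) copy-and-fsync fallback; move_into_place here is a toy, not the svn_fs_x__ function, the file names are made up, and error handling is deliberately minimal.

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Rename atomically; across devices fall back to copy + fsync so a crash
   cannot leave a torn target file.  (The real code also copies permissions
   from a reference file and, on Linux, fsyncs the parent directory so the
   new directory entry itself is durable.) */
static int
move_into_place(const char *old_name, const char *new_name)
{
  char buf[8192];
  ssize_t n;
  int in, out;

  if (rename(old_name, new_name) == 0)
    return 0;
  if (errno != EXDEV)
    return -1;

  in = open(old_name, O_RDONLY);
  if (in < 0)
    return -1;
  out = open(new_name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (out < 0)
    {
      close(in);
      return -1;
    }

  while ((n = read(in, buf, sizeof(buf))) > 0)
    if (write(out, buf, (size_t)n) != n)
      break;

  if (n != 0 || fsync(out) != 0)     /* copy error or flush failure */
    {
      close(in);
      close(out);
      return -1;
    }

  close(in);
  close(out);
  return unlink(old_name);
}

int
main(void)
{
  FILE *f = fopen("current.tmp", "w");
  if (!f)
    return 1;
  fputs("42\n", f);
  fclose(f);
  return move_into_place("current.tmp", "current") == 0 ? 0 : 1;
}
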
+ */ +svn_error_t * +svn_fs_x__read_number_from_stream(apr_int64_t *result, + svn_boolean_t *hit_eof, + svn_stream_t *stream, + apr_pool_t *scratch_pool) +{ + svn_stringbuf_t *sb; + svn_boolean_t eof; + svn_error_t *err; + + SVN_ERR(svn_stream_readline(stream, &sb, "\n", &eof, scratch_pool)); + if (hit_eof) + *hit_eof = eof; + else + if (eof) + return svn_error_create(SVN_ERR_FS_CORRUPT, NULL, _("Unexpected EOF")); + + if (!eof) + { + err = svn_cstring_atoi64(result, sb->data); + if (err) + return svn_error_createf(SVN_ERR_FS_CORRUPT, err, + _("Number '%s' invalid or too large"), + sb->data); + } + + return SVN_NO_ERROR; +} + + +/* Move a file into place from OLD_FILENAME in the transactions + directory to its final location NEW_FILENAME in the repository. On + Unix, match the permissions of the new file to the permissions of + PERMS_REFERENCE. Temporary allocations are from SCRATCH_POOL. + + This function almost duplicates svn_io_file_move(), but it tries to + guarantee a flush. */ +svn_error_t * +svn_fs_x__move_into_place(const char *old_filename, + const char *new_filename, + const char *perms_reference, + apr_pool_t *scratch_pool) +{ + svn_error_t *err; + + SVN_ERR(svn_io_copy_perms(perms_reference, old_filename, scratch_pool)); + + /* Move the file into place. */ + err = svn_io_file_rename(old_filename, new_filename, scratch_pool); + if (err && APR_STATUS_IS_EXDEV(err->apr_err)) + { + apr_file_t *file; + + /* Can't rename across devices; fall back to copying. */ + svn_error_clear(err); + err = SVN_NO_ERROR; + SVN_ERR(svn_io_copy_file(old_filename, new_filename, TRUE, + scratch_pool)); + + /* Flush the target of the copy to disk. */ + SVN_ERR(svn_io_file_open(&file, new_filename, APR_READ, + APR_OS_DEFAULT, scratch_pool)); + /* ### BH: Does this really guarantee a flush of the data written + ### via a completely different handle on all operating systems? + ### + ### Maybe we should perform the copy ourselves instead of making + ### apr do that and flush the real handle? */ + SVN_ERR(svn_io_file_flush_to_disk(file, scratch_pool)); + SVN_ERR(svn_io_file_close(file, scratch_pool)); + } + if (err) + return svn_error_trace(err); + +#ifdef __linux__ + { + /* Linux has the unusual feature that fsync() on a file is not + enough to ensure that a file's directory entries have been + flushed to disk; you have to fsync the directory as well. + On other operating systems, we'd only be asking for trouble + by trying to open and fsync a directory. */ + const char *dirname; + apr_file_t *file; + + dirname = svn_dirent_dirname(new_filename, scratch_pool); + SVN_ERR(svn_io_file_open(&file, dirname, APR_READ, APR_OS_DEFAULT, + scratch_pool)); + SVN_ERR(svn_io_file_flush_to_disk(file, scratch_pool)); + SVN_ERR(svn_io_file_close(file, scratch_pool)); + } +#endif + + return SVN_NO_ERROR; +} diff --git a/subversion/libsvn_fs_x/util.h b/subversion/libsvn_fs_x/util.h new file mode 100644 index 0000000..0010723 --- /dev/null +++ b/subversion/libsvn_fs_x/util.h @@ -0,0 +1,476 @@ +/* util.h --- utility functions for FSX repo access + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ * ====================================================================
+ */
+
+#ifndef SVN_LIBSVN_FS__UTIL_H
+#define SVN_LIBSVN_FS__UTIL_H
+
+#include "svn_fs.h"
+#include "id.h"
+
+/* Functions for dealing with recoverable errors on mutable files
+ *
+ * Revprops, current, and txn-current files are mutable; that is, they
+ * change as part of normal fsx operation, in contrast to revs files, or
+ * the format file, which are written once at create (or upgrade) time.
+ * When more than one host writes to the same repository, we will
+ * sometimes see these recoverable errors when accessing these files.
+ *
+ * These errors all relate to NFS, and thus we only use this retry code if
+ * ESTALE is defined.
+ *
+ ** ESTALE
+ *
+ * In NFS v3 and under, the server doesn't track opened files. If you
+ * unlink(2) or rename(2) a file held open by another process *on the
+ * same host*, that host's kernel typically renames the file to
+ * .nfsXXXX and automatically deletes that when it's no longer open,
+ * but this behavior is not required.
+ *
+ * For obvious reasons, this does not work *across hosts*. No one
+ * knows about the opened file; not the server, and not the deleting
+ * client. So the file vanishes, and the reader gets a stale NFS file
+ * handle.
+ *
+ ** EIO, ENOENT
+ *
+ * Some client implementations (at least the 2.6.18.5 kernel that ships
+ * with Ubuntu Dapper) sometimes give spurious ENOENT (only on open) or
+ * even EIO errors when trying to read these files that have been renamed
+ * over on some other host.
+ *
+ ** Solution
+ *
+ * Try open and read of such files in try_stringbuf_from_file(). Call
+ * this function within a loop of SVN_FS_X__RECOVERABLE_RETRY_COUNT
+ * iterations (though, realistically, the second try will succeed).
+ */
+
+#define SVN_FS_X__RECOVERABLE_RETRY_COUNT 10
+
+/* Pathname helper functions */
+
+/* Return TRUE if REV is packed in FS, FALSE otherwise. */
+svn_boolean_t
+svn_fs_x__is_packed_rev(svn_fs_t *fs,
+ svn_revnum_t rev);
+
+/* Return TRUE if the revprops of REV are packed in FS, FALSE otherwise. */
+svn_boolean_t
+svn_fs_x__is_packed_revprop(svn_fs_t *fs,
+ svn_revnum_t rev);
+
+/* Return the first revision in the pack / rev file containing REV in
+ * filesystem FS. For non-packed revs, this will simply be REV. */
+svn_revnum_t
+svn_fs_x__packed_base_rev(svn_fs_t *fs,
+ svn_revnum_t rev);
+
+/* Return the number of revisions in the pack / rev file in FS that contains
+ * revision REV. */
+svn_revnum_t
+svn_fs_x__pack_size(svn_fs_t *fs, svn_revnum_t rev);
+
+/* Return the full path of the "format" file in FS.
+ * The result will be allocated in RESULT_POOL.
+ */
+const char *
+svn_fs_x__path_format(svn_fs_t *fs,
+ apr_pool_t *result_pool);
+
+/* Return the path to the 'current' file in FS.
+ Perform allocation in RESULT_POOL. */
+const char *
+svn_fs_x__path_current(svn_fs_t *fs,
+ apr_pool_t *result_pool);
+
+/* Return the full path of the "uuid" file in FS.
+ * The result will be allocated in RESULT_POOL.
+ */ +const char * +svn_fs_x__path_uuid(svn_fs_t *fs, + apr_pool_t *result_pool); + +/* Return the full path of the "txn-current" file in FS. + * The result will be allocated in RESULT_POOL. + */ +const char * +svn_fs_x__path_txn_current(svn_fs_t *fs, + apr_pool_t *result_pool); + +/* Return the full path of the "txn-current-lock" file in FS. + * The result will be allocated in RESULT_POOL. + */ +const char * +svn_fs_x__path_txn_current_lock(svn_fs_t *fs, + apr_pool_t *result_pool); + +/* Return the full path of the global write lock file in FS. + * The result will be allocated in RESULT_POOL. + */ +const char * +svn_fs_x__path_lock(svn_fs_t *fs, + apr_pool_t *result_pool); + +/* Return the full path of the pack operation lock file in FS. + * The result will be allocated in RESULT_POOL. + */ +const char * +svn_fs_x__path_pack_lock(svn_fs_t *fs, + apr_pool_t *result_pool); + +/* Return the full path of the revprop generation file in FS. + * Allocate the result in RESULT_POOL. + */ +const char * +svn_fs_x__path_revprop_generation(svn_fs_t *fs, + apr_pool_t *result_pool); + +/* Return the path of the pack-related file that for revision REV in FS. + * KIND specifies the file name base, e.g. "pack". + * The result will be allocated in RESULT_POOL. + */ +const char * +svn_fs_x__path_rev_packed(svn_fs_t *fs, + svn_revnum_t rev, + const char *kind, + apr_pool_t *result_pool); + +/* Return the full path of the rev shard directory that will contain + * revision REV in FS. Allocate the result in RESULT_POOL. + */ +const char * +svn_fs_x__path_rev_shard(svn_fs_t *fs, + svn_revnum_t rev, + apr_pool_t *result_pool); + +/* Return the full path of the non-packed rev file containing revision REV + * in FS. Allocate the result in RESULT_POOL. + */ +const char * +svn_fs_x__path_rev(svn_fs_t *fs, + svn_revnum_t rev, + apr_pool_t *result_pool); + +/* Set *PATH to the path of REV in FS, whether in a pack file or not. + Allocate *PATH in RESULT_POOL. + + Note: If the caller does not have the write lock on FS, then the path is + not guaranteed to be correct or to remain correct after the function + returns, because the revision might become packed before or after this + call. If a file exists at that path, then it is correct; if not, then + the caller should call update_min_unpacked_rev() and re-try once. */ +const char * +svn_fs_x__path_rev_absolute(svn_fs_t *fs, + svn_revnum_t rev, + apr_pool_t *result_pool); + +/* Return the full path of the revision properties shard directory that + * will contain the properties of revision REV in FS. + * Allocate the result in RESULT_POOL. + */ +const char * +svn_fs_x__path_revprops_shard(svn_fs_t *fs, + svn_revnum_t rev, + apr_pool_t *result_pool); + +/* Return the full path of the revision properties pack shard directory + * that will contain the packed properties of revision REV in FS. + * Allocate the result in RESULT_POOL. + */ +const char * +svn_fs_x__path_revprops_pack_shard(svn_fs_t *fs, + svn_revnum_t rev, + apr_pool_t *result_pool); + +/* Return the full path of the non-packed revision properties file that + * contains the props for revision REV in FS. + * Allocate the result in RESULT_POOL. + */ +const char * +svn_fs_x__path_revprops(svn_fs_t *fs, + svn_revnum_t rev, + apr_pool_t *result_pool); + +/* Convert the TXN_ID into a string, allocated from RESULT_POOL. + */ +const char * +svn_fs_x__txn_name(svn_fs_x__txn_id_t txn_id, + apr_pool_t *result_pool); + +/* Convert TXN_NAME into an ID and return it in *TXN_ID. 
*/ +svn_error_t * +svn_fs_x__txn_by_name(svn_fs_x__txn_id_t *txn_id, + const char *txn_name); + +/* Return the path of the directory containing the transaction TXN_ID in FS. + * The result will be allocated in RESULT_POOL. + */ +const char * +svn_fs_x__path_txn_dir(svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *result_pool); + +/* Return the path of the 'transactions' directory in FS. + * The result will be allocated in RESULT_POOL. + */ +const char * +svn_fs_x__path_txns_dir(svn_fs_t *fs, + apr_pool_t *result_pool); + +/* Return the name of the sha1->rep mapping file in transaction TXN_ID + * within FS for the given SHA1 checksum. Use POOL for allocations. + */ +const char * +svn_fs_x__path_txn_sha1(svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + const unsigned char *sha1, + apr_pool_t *pool); + +/* Return the path of the 'txn-protorevs' directory in FS, even if that + * folder may not exist in FS. The result will be allocated in RESULT_POOL. + */ +const char * +svn_fs_x__path_txn_proto_revs(svn_fs_t *fs, + apr_pool_t *result_pool); + +/* Return the path of the changes file for transaction TXN_ID in FS. + * The result will be allocated in RESULT_POOL. + */ +const char * +svn_fs_x__path_txn_changes(svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *result_pool); + +/* Return the path of the file containing the log-to-phys index for + * the transaction identified by TXN_ID in FS. + * The result will be allocated in RESULT_POOL. + */ +const char* +svn_fs_x__path_l2p_proto_index(svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *result_pool); + +/* Return the path of the file containing the phys-to-log index for + * the transaction identified by TXN_ID in FS. + * The result will be allocated in RESULT_POOL. + */ +const char* +svn_fs_x__path_p2l_proto_index(svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *result_pool); + +/* Return the path of the file containing the transaction properties for + * the transaction identified by TXN_ID in FS. + * The result will be allocated in RESULT_POOL. + */ +const char * +svn_fs_x__path_txn_props(svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *result_pool); + +/* Return the path of the file containing the "final" transaction + * properties for the transaction identified by TXN_ID in FS. + * The result will be allocated in RESULT_POOL. + */ +const char * +svn_fs_x__path_txn_props_final(svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *result_pool); + +/* Return the path of the file containing the node and copy ID counters for + * the transaction identified by TXN_ID in FS. + * The result will be allocated in RESULT_POOL. + */ +const char * +svn_fs_x__path_txn_next_ids(svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *result_pool); + +/* Return the path of the file storing the oldest non-packed revision in FS. + * The result will be allocated in RESULT_POOL. + */ +const char * +svn_fs_x__path_min_unpacked_rev(svn_fs_t *fs, + apr_pool_t *result_pool); + +/* Return the path of the file containing item_index counter for + * the transaction identified by TXN_ID in FS. + * The result will be allocated in RESULT_POOL. + */ +const char * +svn_fs_x__path_txn_item_index(svn_fs_t *fs, + svn_fs_x__txn_id_t txn_id, + apr_pool_t *result_pool); + +/* Return the path of the proto-revision file for transaction TXN_ID in FS. + * The result will be allocated in RESULT_POOL. 
+ */
+const char *
+svn_fs_x__path_txn_proto_rev(svn_fs_t *fs,
+ svn_fs_x__txn_id_t txn_id,
+ apr_pool_t *result_pool);
+
+/* Return the path of the proto-revision lock file for transaction TXN_ID
+ * in FS. The result will be allocated in RESULT_POOL.
+ */
+const char *
+svn_fs_x__path_txn_proto_rev_lock(svn_fs_t *fs,
+ svn_fs_x__txn_id_t txn_id,
+ apr_pool_t *result_pool);
+
+/* Return the path of the file containing the in-transaction node revision
+ * identified by ID in FS.
+ * The result will be allocated in RESULT_POOL, temporaries in SCRATCH_POOL.
+ */
+const char *
+svn_fs_x__path_txn_node_rev(svn_fs_t *fs,
+ const svn_fs_x__id_t *id,
+ apr_pool_t *result_pool,
+ apr_pool_t *scratch_pool);
+
+/* Return the path of the file containing the in-transaction properties of
+ * the node identified by ID in FS.
+ * The result will be allocated in RESULT_POOL, temporaries in SCRATCH_POOL.
+ */
+const char *
+svn_fs_x__path_txn_node_props(svn_fs_t *fs,
+ const svn_fs_x__id_t *id,
+ apr_pool_t *result_pool,
+ apr_pool_t *scratch_pool);
+
+/* Return the path of the file containing the directory entries of the
+ * in-transaction directory node identified by ID in FS.
+ * The result will be allocated in RESULT_POOL, temporaries in SCRATCH_POOL.
+ */
+const char *
+svn_fs_x__path_txn_node_children(svn_fs_t *fs,
+ const svn_fs_x__id_t *id,
+ apr_pool_t *result_pool,
+ apr_pool_t *scratch_pool);
+
+/* Check that BUF, a nul-terminated buffer of text from file PATH,
+ contains only digits at OFFSET and beyond, raising an error if not.
+ TITLE contains a user-visible description of the file, usually the
+ short file name.
+
+ Uses SCRATCH_POOL for temporary allocation. */
+svn_error_t *
+svn_fs_x__check_file_buffer_numeric(const char *buf,
+ apr_off_t offset,
+ const char *path,
+ const char *title,
+ apr_pool_t *scratch_pool);
+
+/* Set *MIN_UNPACKED_REV to the integer value read from the file returned
+ * by #svn_fs_x__path_min_unpacked_rev() for FS.
+ * Use SCRATCH_POOL for temporary allocations.
+ */
+svn_error_t *
+svn_fs_x__read_min_unpacked_rev(svn_revnum_t *min_unpacked_rev,
+ svn_fs_t *fs,
+ apr_pool_t *scratch_pool);
+
+/* Re-read the MIN_UNPACKED_REV member of FS from disk.
+ * Use SCRATCH_POOL for temporary allocations.
+ */
+svn_error_t *
+svn_fs_x__update_min_unpacked_rev(svn_fs_t *fs,
+ apr_pool_t *scratch_pool);
+
+/* Atomically update the 'min-unpacked-rev' file in FS to hold the specified
+ * REVNUM. Perform temporary allocations in SCRATCH_POOL.
+ */
+svn_error_t *
+svn_fs_x__write_min_unpacked_rev(svn_fs_t *fs,
+ svn_revnum_t revnum,
+ apr_pool_t *scratch_pool);
+
+/* Set *REV to the value read from the 'current' file. Perform temporary
+ * allocations in SCRATCH_POOL.
+ */
+svn_error_t *
+svn_fs_x__read_current(svn_revnum_t *rev,
+ svn_fs_t *fs,
+ apr_pool_t *scratch_pool);
+
+/* Atomically update the 'current' file to hold the specified REV.
+ Perform temporary allocations in SCRATCH_POOL. */
+svn_error_t *
+svn_fs_x__write_current(svn_fs_t *fs,
+ svn_revnum_t rev,
+ apr_pool_t *scratch_pool);
+
+/* Read the file at PATH and return its content in *CONTENT, allocated in
+ * RESULT_POOL. *CONTENT will not be modified unless the whole file was
+ * read successfully.
+ *
+ * ESTALE, EIO and ENOENT will not cause this function to return an error
+ * unless LAST_ATTEMPT has been set. If MISSING is not NULL, indicate
+ * missing files (ENOENT) there.
+ */ +svn_error_t * +svn_fs_x__try_stringbuf_from_file(svn_stringbuf_t **content, + svn_boolean_t *missing, + const char *path, + svn_boolean_t last_attempt, + apr_pool_t *result_pool); + +/* Fetch the current offset of FILE into *OFFSET_P. + * Perform temporary allocations in SCRATCH_POOL. */ +svn_error_t * +svn_fs_x__get_file_offset(apr_off_t *offset_p, + apr_file_t *file, + apr_pool_t *scratch_pool); + +/* Read the file FNAME and store the contents in *BUF. + Allocations are performed in RESULT_POOL. */ +svn_error_t * +svn_fs_x__read_content(svn_stringbuf_t **content, + const char *fname, + apr_pool_t *result_pool); + +/* Reads a line from STREAM and converts it to a 64 bit integer to be + * returned in *RESULT. If we encounter eof, set *HIT_EOF and leave + * *RESULT unchanged. If HIT_EOF is NULL, EOF causes an "corrupt FS" + * error return. + * SCRATCH_POOL is used for temporary allocations. + */ +svn_error_t * +svn_fs_x__read_number_from_stream(apr_int64_t *result, + svn_boolean_t *hit_eof, + svn_stream_t *stream, + apr_pool_t *scratch_pool); + +/* Move a file into place from OLD_FILENAME in the transactions + directory to its final location NEW_FILENAME in the repository. On + Unix, match the permissions of the new file to the permissions of + PERMS_REFERENCE. Temporary allocations are from SCRATCH_POOL. + + This function almost duplicates svn_io_file_move(), but it tries to + guarantee a flush. */ +svn_error_t * +svn_fs_x__move_into_place(const char *old_filename, + const char *new_filename, + const char *perms_reference, + apr_pool_t *scratch_pool); + +#endif diff --git a/subversion/libsvn_fs_x/verify.c b/subversion/libsvn_fs_x/verify.c new file mode 100644 index 0000000..4ea0728 --- /dev/null +++ b/subversion/libsvn_fs_x/verify.c @@ -0,0 +1,850 @@ +/* verify.c --- verification of FSX filesystems + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#include "verify.h" +#include "fs_x.h" +#include "svn_time.h" +#include "private/svn_subr_private.h" + +#include "cached_data.h" +#include "rep-cache.h" +#include "util.h" +#include "index.h" + +#include "../libsvn_fs/fs-loader.h" + +#include "svn_private_config.h" + + +/** Verifying. **/ + +/* Baton type expected by verify_walker(). The purpose is to limit the + * number of notifications sent. 
+ */
+typedef struct verify_walker_baton_t
+{
+ /* number of calls to verify_walker() since the last clean */
+ int iteration_count;
+
+ /* progress notification callback to invoke periodically (may be NULL) */
+ svn_fs_progress_notify_func_t notify_func;
+
+ /* baton to use with NOTIFY_FUNC */
+ void *notify_baton;
+
+ /* remember the last revision for which we called notify_func */
+ svn_revnum_t last_notified_revision;
+} verify_walker_baton_t;
+
+/* Used by svn_fs_x__verify().
+ Implements svn_fs_x__walk_rep_reference().walker. */
+static svn_error_t *
+verify_walker(svn_fs_x__representation_t *rep,
+ void *baton,
+ svn_fs_t *fs,
+ apr_pool_t *scratch_pool)
+{
+ verify_walker_baton_t *walker_baton = baton;
+
+ /* notify and free resources periodically */
+ if (walker_baton->iteration_count > 1000)
+ {
+ svn_revnum_t revision = svn_fs_x__get_revnum(rep->id.change_set);
+ if ( walker_baton->notify_func
+ && revision != walker_baton->last_notified_revision)
+ {
+ walker_baton->notify_func(revision,
+ walker_baton->notify_baton,
+ scratch_pool);
+ walker_baton->last_notified_revision = revision;
+ }
+
+ walker_baton->iteration_count = 0;
+ }
+
+ /* access the repo data */
+ SVN_ERR(svn_fs_x__check_rep(rep, fs, scratch_pool));
+
+ /* update resource usage counters */
+ walker_baton->iteration_count++;
+
+ return SVN_NO_ERROR;
+}
+
+/* Verify the rep cache DB's consistency with our rev / pack data.
+ * The function signature is similar to svn_fs_x__verify.
+ * The values of START and END have already been auto-selected and
+ * verified.
+ */
+static svn_error_t *
+verify_rep_cache(svn_fs_t *fs,
+ svn_revnum_t start,
+ svn_revnum_t end,
+ svn_fs_progress_notify_func_t notify_func,
+ void *notify_baton,
+ svn_cancel_func_t cancel_func,
+ void *cancel_baton,
+ apr_pool_t *scratch_pool)
+{
+ svn_boolean_t exists;
+
+ /* rep-cache verification. */
+ SVN_ERR(svn_fs_x__exists_rep_cache(&exists, fs, scratch_pool));
+ if (exists)
+ {
+ /* provide a baton to allow the reuse of open file handles between
+ iterations (saves 2/3 of OS level file operations). */
+ verify_walker_baton_t *baton
+ = apr_pcalloc(scratch_pool, sizeof(*baton));
+
+ baton->last_notified_revision = SVN_INVALID_REVNUM;
+ baton->notify_func = notify_func;
+ baton->notify_baton = notify_baton;
+
+ /* tell the user that we are now ready to do *something* */
+ if (notify_func)
+ notify_func(SVN_INVALID_REVNUM, notify_baton, scratch_pool);
+
+ /* Do not attempt to walk the rep-cache database if its file does
+ not exist, since doing so would create it --- which may confuse
+ the administrator. Don't take any lock. */
+ SVN_ERR(svn_fs_x__walk_rep_reference(fs, start, end,
+ verify_walker, baton,
+ cancel_func, cancel_baton,
+ scratch_pool));
+ }
+
+ return SVN_NO_ERROR;
+}
+
+/* Verify that the MD5 checksum of the data between offsets START and END
+ * in FILE matches the EXPECTED checksum. If there is a mismatch use the
+ * index NAME in the error message. Supports cancellation with CANCEL_FUNC
+ * and CANCEL_BATON. SCRATCH_POOL is for temporary allocations. */
+static svn_error_t *
+verify_index_checksum(apr_file_t *file,
+ const char *name,
+ apr_off_t start,
+ apr_off_t end,
+ svn_checksum_t *expected,
+ svn_cancel_func_t cancel_func,
+ void *cancel_baton,
+ apr_pool_t *scratch_pool)
+{
+ unsigned char buffer[SVN__STREAM_CHUNK_SIZE];
+ apr_off_t size = end - start;
+ svn_checksum_t *actual;
+ svn_checksum_ctx_t *context
+ = svn_checksum_ctx_create(svn_checksum_md5, scratch_pool);
+
+ /* Calculate the index checksum.
*/ + SVN_ERR(svn_io_file_seek(file, APR_SET, &start, scratch_pool)); + while (size > 0) + { + apr_size_t to_read = size > sizeof(buffer) + ? sizeof(buffer) + : (apr_size_t)size; + SVN_ERR(svn_io_file_read_full2(file, buffer, to_read, NULL, NULL, + scratch_pool)); + SVN_ERR(svn_checksum_update(context, buffer, to_read)); + size -= to_read; + + if (cancel_func) + SVN_ERR(cancel_func(cancel_baton)); + } + + SVN_ERR(svn_checksum_final(&actual, context, scratch_pool)); + + /* Verify that it matches the expected checksum. */ + if (!svn_checksum_match(expected, actual)) + { + const char *file_name; + + SVN_ERR(svn_io_file_name_get(&file_name, file, scratch_pool)); + SVN_ERR(svn_checksum_mismatch_err(expected, actual, scratch_pool, + _("%s checksum mismatch in file %s"), + name, file_name)); + } + + return SVN_NO_ERROR; +} + +/* Verify the MD5 checksums of the index data in the rev / pack file + * containing revision START in FS. If given, invoke CANCEL_FUNC with + * CANCEL_BATON at regular intervals. Use SCRATCH_POOL for temporary + * allocations. + */ +static svn_error_t * +verify_index_checksums(svn_fs_t *fs, + svn_revnum_t start, + svn_cancel_func_t cancel_func, + void *cancel_baton, + apr_pool_t *scratch_pool) +{ + svn_fs_x__revision_file_t *rev_file; + + /* Open the rev / pack file and read the footer */ + SVN_ERR(svn_fs_x__open_pack_or_rev_file(&rev_file, fs, start, + scratch_pool, scratch_pool)); + SVN_ERR(svn_fs_x__auto_read_footer(rev_file)); + + /* Verify the index contents against the checksum from the footer. */ + SVN_ERR(verify_index_checksum(rev_file->file, "L2P index", + rev_file->l2p_offset, rev_file->p2l_offset, + rev_file->l2p_checksum, + cancel_func, cancel_baton, scratch_pool)); + SVN_ERR(verify_index_checksum(rev_file->file, "P2L index", + rev_file->p2l_offset, rev_file->footer_offset, + rev_file->p2l_checksum, + cancel_func, cancel_baton, scratch_pool)); + + /* Done. */ + SVN_ERR(svn_fs_x__close_revision_file(rev_file)); + + return SVN_NO_ERROR; +} + +/* Verify that for all log-to-phys index entries for revisions START to + * START + COUNT-1 in FS there is a consistent entry in the phys-to-log + * index. If given, invoke CANCEL_FUNC with CANCEL_BATON at regular + * intervals. Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +compare_l2p_to_p2l_index(svn_fs_t *fs, + svn_revnum_t start, + svn_revnum_t count, + svn_cancel_func_t cancel_func, + void *cancel_baton, + apr_pool_t *scratch_pool) +{ + svn_revnum_t i; + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + apr_array_header_t *max_ids; + + /* common file access structure */ + svn_fs_x__revision_file_t *rev_file; + SVN_ERR(svn_fs_x__open_pack_or_rev_file(&rev_file, fs, start, scratch_pool, + iterpool)); + + /* determine the range of items to check for each revision */ + SVN_ERR(svn_fs_x__l2p_get_max_ids(&max_ids, fs, start, count, scratch_pool, + iterpool)); + + /* check all items in all revisions if the given range */ + for (i = 0; i < max_ids->nelts; ++i) + { + apr_uint64_t k; + apr_uint64_t max_id = APR_ARRAY_IDX(max_ids, i, apr_uint64_t); + svn_revnum_t revision = start + i; + + for (k = 0; k < max_id; ++k) + { + apr_off_t offset; + apr_uint32_t sub_item; + svn_fs_x__id_t l2p_item; + svn_fs_x__id_t *p2l_item; + + l2p_item.change_set = svn_fs_x__change_set_by_rev(revision); + l2p_item.number = k; + + /* get L2P entry. Ignore unused entries. 
*/ + SVN_ERR(svn_fs_x__item_offset(&offset, &sub_item, fs, rev_file, + &l2p_item, iterpool)); + if (offset == -1) + continue; + + /* find the corresponding P2L entry */ + SVN_ERR(svn_fs_x__p2l_item_lookup(&p2l_item, fs, rev_file, + revision, offset, sub_item, + iterpool, iterpool)); + + if (p2l_item == NULL) + return svn_error_createf(SVN_ERR_FS_INDEX_INCONSISTENT, + NULL, + _("p2l index entry not found for " + "PHYS o%s:s%ld returned by " + "l2p index for LOG r%ld:i%ld"), + apr_off_t_toa(scratch_pool, offset), + (long)sub_item, revision, (long)k); + + if (!svn_fs_x__id_eq(&l2p_item, p2l_item)) + return svn_error_createf(SVN_ERR_FS_INDEX_INCONSISTENT, + NULL, + _("p2l index info LOG r%ld:i%ld" + " does not match " + "l2p index for LOG r%ld:i%ld"), + svn_fs_x__get_revnum(p2l_item->change_set), + (long)p2l_item->number, revision, + (long)k); + + svn_pool_clear(iterpool); + } + + if (cancel_func) + SVN_ERR(cancel_func(cancel_baton)); + } + + svn_pool_destroy(iterpool); + + SVN_ERR(svn_fs_x__close_revision_file(rev_file)); + + return SVN_NO_ERROR; +} + +/* Verify that for all phys-to-log index entries for revisions START to + * START + COUNT-1 in FS there is a consistent entry in the log-to-phys + * index. If given, invoke CANCEL_FUNC with CANCEL_BATON at regular + * intervals. Use SCRATCH_POOL for temporary allocations. + * + * Please note that we can only check on pack / rev file granularity and + * must only be called for a single rev / pack file. + */ +static svn_error_t * +compare_p2l_to_l2p_index(svn_fs_t *fs, + svn_revnum_t start, + svn_revnum_t count, + svn_cancel_func_t cancel_func, + void *cancel_baton, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + apr_pool_t *iterpool2 = svn_pool_create(scratch_pool); + apr_off_t max_offset; + apr_off_t offset = 0; + + /* common file access structure */ + svn_fs_x__revision_file_t *rev_file; + SVN_ERR(svn_fs_x__open_pack_or_rev_file(&rev_file, fs, start, scratch_pool, + iterpool)); + + /* get the size of the rev / pack file as covered by the P2L index */ + SVN_ERR(svn_fs_x__p2l_get_max_offset(&max_offset, fs, rev_file, start, + scratch_pool)); + + /* for all offsets in the file, get the P2L index entries and check + them against the L2P index */ + for (offset = 0; offset < max_offset; ) + { + apr_array_header_t *entries; + svn_fs_x__p2l_entry_t *last_entry; + int i; + + svn_pool_clear(iterpool); + + /* get all entries for the current block */ + SVN_ERR(svn_fs_x__p2l_index_lookup(&entries, fs, rev_file, start, + offset, ffd->p2l_page_size, + iterpool, iterpool)); + if (entries->nelts == 0) + return svn_error_createf(SVN_ERR_FS_INDEX_CORRUPTION, + NULL, + _("p2l does not cover offset %s" + " for revision %ld"), + apr_off_t_toa(scratch_pool, offset), start); + + /* process all entries (and later continue with the next block) */ + last_entry + = &APR_ARRAY_IDX(entries, entries->nelts-1, svn_fs_x__p2l_entry_t); + offset = last_entry->offset + last_entry->size; + + for (i = 0; i < entries->nelts; ++i) + { + apr_uint32_t k; + svn_fs_x__p2l_entry_t *entry + = &APR_ARRAY_IDX(entries, i, svn_fs_x__p2l_entry_t); + + /* check all sub-items for consist entries in the L2P index */ + for (k = 0; k < entry->item_count; ++k) + { + apr_off_t l2p_offset; + apr_uint32_t sub_item; + svn_fs_x__id_t *p2l_item = &entry->items[k]; + svn_revnum_t revision + = svn_fs_x__get_revnum(p2l_item->change_set); + + svn_pool_clear(iterpool2); + SVN_ERR(svn_fs_x__item_offset(&l2p_offset, &sub_item, fs, 
+ rev_file, p2l_item, iterpool2)); + + if (sub_item != k || l2p_offset != entry->offset) + return svn_error_createf(SVN_ERR_FS_INDEX_INCONSISTENT, + NULL, + _("l2p index entry PHYS o%s:s%ld " + "does not match p2l index value " + "LOG r%ld:i%ld for PHYS o%s:s%ld"), + apr_off_t_toa(scratch_pool, + l2p_offset), + (long)sub_item, + revision, + (long)p2l_item->number, + apr_off_t_toa(scratch_pool, + entry->offset), + (long)k); + } + } + + if (cancel_func) + SVN_ERR(cancel_func(cancel_baton)); + } + + svn_pool_destroy(iterpool2); + svn_pool_destroy(iterpool); + + SVN_ERR(svn_fs_x__close_revision_file(rev_file)); + + return SVN_NO_ERROR; +} + +/* Items smaller than this can be read at once into a buffer and directly + * be checksummed. Larger items require stream processing. + * Must be a multiple of 8. */ +#define STREAM_THRESHOLD 4096 + +/* Verify that the next SIZE bytes read from FILE are NUL. SIZE must not + * exceed STREAM_THRESHOLD. Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +expect_buffer_nul(apr_file_t *file, + apr_off_t size, + apr_pool_t *scratch_pool) +{ + union + { + unsigned char buffer[STREAM_THRESHOLD]; + apr_uint64_t chunks[STREAM_THRESHOLD / sizeof(apr_uint64_t)]; + } data; + + apr_size_t i; + SVN_ERR_ASSERT(size <= STREAM_THRESHOLD); + + /* read the whole data block; error out on failure */ + data.chunks[(size - 1)/ sizeof(apr_uint64_t)] = 0; + SVN_ERR(svn_io_file_read_full2(file, data.buffer, size, NULL, NULL, + scratch_pool)); + + /* chunky check */ + for (i = 0; i < size / sizeof(apr_uint64_t); ++i) + if (data.chunks[i] != 0) + break; + + /* byte-wise check upon mismatch or at the end of the block */ + for (i *= sizeof(apr_uint64_t); i < size; ++i) + if (data.buffer[i] != 0) + { + const char *file_name; + apr_off_t offset; + + SVN_ERR(svn_io_file_name_get(&file_name, file, scratch_pool)); + SVN_ERR(svn_fs_x__get_file_offset(&offset, file, scratch_pool)); + offset -= size - i; + + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + _("Empty section in file %s contains " + "non-NUL data at offset %s"), + file_name, + apr_off_t_toa(scratch_pool, offset)); + } + + return SVN_NO_ERROR; +} + +/* Verify that the next SIZE bytes read from FILE are NUL. + * Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +read_all_nul(apr_file_t *file, + apr_off_t size, + apr_pool_t *scratch_pool) +{ + for (; size >= STREAM_THRESHOLD; size -= STREAM_THRESHOLD) + SVN_ERR(expect_buffer_nul(file, STREAM_THRESHOLD, scratch_pool)); + + if (size) + SVN_ERR(expect_buffer_nul(file, size, scratch_pool)); + + return SVN_NO_ERROR; +} + +/* Compare the ACTUAL checksum with the one expected by ENTRY. + * Return an error in case of mismatch. Use the name of FILE + * in error message. Allocate temporary data in SCRATCH_POOL. + */ +static svn_error_t * +expected_checksum(apr_file_t *file, + svn_fs_x__p2l_entry_t *entry, + apr_uint32_t actual, + apr_pool_t *scratch_pool) +{ + if (actual != entry->fnv1_checksum) + { + const char *file_name; + + SVN_ERR(svn_io_file_name_get(&file_name, file, scratch_pool)); + SVN_ERR(svn_io_file_name_get(&file_name, file, scratch_pool)); + return svn_error_createf(SVN_ERR_FS_CORRUPT, NULL, + _("Checksum mismatch in item at offset %s of " + "length %s bytes in file %s"), + apr_off_t_toa(scratch_pool, entry->offset), + apr_off_t_toa(scratch_pool, entry->size), + file_name); + } + + return SVN_NO_ERROR; +} + +/* Verify that the FNV checksum over the next ENTRY->SIZE bytes read + * from FILE will match ENTRY's expected checksum. 
SIZE must not + * exceed STREAM_THRESHOLD. Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +expected_buffered_checksum(apr_file_t *file, + svn_fs_x__p2l_entry_t *entry, + apr_pool_t *scratch_pool) +{ + unsigned char buffer[STREAM_THRESHOLD]; + SVN_ERR_ASSERT(entry->size <= STREAM_THRESHOLD); + + SVN_ERR(svn_io_file_read_full2(file, buffer, (apr_size_t)entry->size, + NULL, NULL, scratch_pool)); + SVN_ERR(expected_checksum(file, entry, + svn__fnv1a_32x4(buffer, (apr_size_t)entry->size), + scratch_pool)); + + return SVN_NO_ERROR; +} + +/* Verify that the FNV checksum over the next ENTRY->SIZE bytes read from + * FILE will match ENTRY's expected checksum. + * Use SCRATCH_POOL for temporary allocations. + */ +static svn_error_t * +expected_streamed_checksum(apr_file_t *file, + svn_fs_x__p2l_entry_t *entry, + apr_pool_t *scratch_pool) +{ + unsigned char buffer[STREAM_THRESHOLD]; + svn_checksum_t *checksum; + svn_checksum_ctx_t *context + = svn_checksum_ctx_create(svn_checksum_fnv1a_32x4, scratch_pool); + apr_off_t size = entry->size; + + while (size > 0) + { + apr_size_t to_read = size > sizeof(buffer) + ? sizeof(buffer) + : (apr_size_t)size; + SVN_ERR(svn_io_file_read_full2(file, buffer, to_read, NULL, NULL, + scratch_pool)); + SVN_ERR(svn_checksum_update(context, buffer, to_read)); + size -= to_read; + } + + SVN_ERR(svn_checksum_final(&checksum, context, scratch_pool)); + SVN_ERR(expected_checksum(file, entry, + ntohl(*(const apr_uint32_t *)checksum->digest), + scratch_pool)); + + return SVN_NO_ERROR; +} + +/* Verify that for all phys-to-log index entries for revisions START to + * START + COUNT-1 in FS match the actual pack / rev file contents. + * If given, invoke CANCEL_FUNC with CANCEL_BATON at regular intervals. + * Use SCRATCH_POOL for temporary allocations. + * + * Please note that we can only check on pack / rev file granularity and + * must only be called for a single rev / pack file. + */ +static svn_error_t * +compare_p2l_to_rev(svn_fs_t *fs, + svn_revnum_t start, + svn_revnum_t count, + svn_cancel_func_t cancel_func, + void *cancel_baton, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + apr_off_t max_offset; + apr_off_t offset = 0; + svn_fs_x__revision_file_t *rev_file; + + /* open the pack / rev file that is covered by the p2l index */ + SVN_ERR(svn_fs_x__open_pack_or_rev_file(&rev_file, fs, start, scratch_pool, + iterpool)); + + /* check file size vs. range covered by index */ + SVN_ERR(svn_fs_x__auto_read_footer(rev_file)); + SVN_ERR(svn_fs_x__p2l_get_max_offset(&max_offset, fs, rev_file, start, + scratch_pool)); + + if (rev_file->l2p_offset != max_offset) + return svn_error_createf(SVN_ERR_FS_INDEX_INCONSISTENT, NULL, + _("File size of %s for revision r%ld does " + "not match p2l index size of %s"), + apr_off_t_toa(scratch_pool, + rev_file->l2p_offset), + start, + apr_off_t_toa(scratch_pool, + max_offset)); + + SVN_ERR(svn_io_file_aligned_seek(rev_file->file, ffd->block_size, NULL, 0, + scratch_pool)); + + /* for all offsets in the file, get the P2L index entries and check + them against the L2P index */ + for (offset = 0; offset < max_offset; ) + { + apr_array_header_t *entries; + int i; + + svn_pool_clear(iterpool); + + /* get all entries for the current block */ + SVN_ERR(svn_fs_x__p2l_index_lookup(&entries, fs, rev_file, start, + offset, ffd->p2l_page_size, + iterpool, iterpool)); + + /* The above might have moved the file pointer. 
+ * Ensure we actually start reading at OFFSET. */
+ SVN_ERR(svn_io_file_aligned_seek(rev_file->file, ffd->block_size,
+ NULL, offset, iterpool));
+
+ /* process all entries (and later continue with the next block) */
+ for (i = 0; i < entries->nelts; ++i)
+ {
+ svn_fs_x__p2l_entry_t *entry
+ = &APR_ARRAY_IDX(entries, i, svn_fs_x__p2l_entry_t);
+
+ /* skip bits we previously checked */
+ if (i == 0 && entry->offset < offset)
+ continue;
+
+ /* skip zero-sized entries */
+ if (entry->size == 0)
+ continue;
+
+ /* p2l index must cover all rev / pack file offsets exactly once */
+ if (entry->offset != offset)
+ return svn_error_createf(SVN_ERR_FS_INDEX_INCONSISTENT,
+ NULL,
+ _("p2l index entry for revision r%ld"
+ " is non-contiguous between offsets "
+ " %s and %s"),
+ start,
+ apr_off_t_toa(scratch_pool, offset),
+ apr_off_t_toa(scratch_pool,
+ entry->offset));
+
+ /* empty sections must contain NUL bytes only */
+ if (entry->type == SVN_FS_X__ITEM_TYPE_UNUSED)
+ {
+ /* skip filler entry at the end of the p2l index */
+ if (entry->offset != max_offset)
+ SVN_ERR(read_all_nul(rev_file->file, entry->size, iterpool));
+ }
+ else
+ {
+ if (entry->size < STREAM_THRESHOLD)
+ SVN_ERR(expected_buffered_checksum(rev_file->file, entry,
+ iterpool));
+ else
+ SVN_ERR(expected_streamed_checksum(rev_file->file, entry,
+ iterpool));
+ }
+
+ /* advance offset */
+ offset += entry->size;
+ }
+
+ if (cancel_func)
+ SVN_ERR(cancel_func(cancel_baton));
+ }
+
+ svn_pool_destroy(iterpool);
+
+ return SVN_NO_ERROR;
+}
+
+/* Verify that the revprops of the revisions START to END in FS can be
+ * accessed. Invoke CANCEL_FUNC with CANCEL_BATON at regular intervals.
+ *
+ * The values of START and END have already been auto-selected and
+ * verified.
+ */
+static svn_error_t *
+verify_revprops(svn_fs_t *fs,
+ svn_revnum_t start,
+ svn_revnum_t end,
+ svn_cancel_func_t cancel_func,
+ void *cancel_baton,
+ apr_pool_t *scratch_pool)
+{
+ svn_revnum_t revision;
+ apr_pool_t *iterpool = svn_pool_create(scratch_pool);
+
+ for (revision = start; revision < end; ++revision)
+ {
+ svn_string_t *date;
+ apr_time_t timetemp;
+
+ svn_pool_clear(iterpool);
+
+ /* Access the svn:date revprop.
+ * This implies parsing all revprops for that revision. */
+ SVN_ERR(svn_fs_x__revision_prop(&date, fs, revision,
+ SVN_PROP_REVISION_DATE,
+ iterpool, iterpool));
+
+ /* The time stamp is the only revprop that, if given, needs to
+ * have valid content. */
+ if (date)
+ SVN_ERR(svn_time_from_cstring(&timetemp, date->data, iterpool));
+
+ if (cancel_func)
+ SVN_ERR(cancel_func(cancel_baton));
+ }
+
+ svn_pool_destroy(iterpool);
+
+ return SVN_NO_ERROR;
+}
+
+/* Verify that the on-disk representation has not been tampered with (in a
+ * way that leaves the repository in a corrupted state). This compares log-to-
+ * phys with phys-to-log indexes, verifies the low-level checksums and
+ * checks that all revprops are available. The function signature is
+ * similar to svn_fs_x__verify.
+ *
+ * The values of START and END have already been auto-selected and
+ * verified.
+ */ +static svn_error_t * +verify_metadata_consistency(svn_fs_t *fs, + svn_revnum_t start, + svn_revnum_t end, + svn_fs_progress_notify_func_t notify_func, + void *notify_baton, + svn_cancel_func_t cancel_func, + void *cancel_baton, + apr_pool_t *scratch_pool) +{ + svn_error_t *err; + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_revnum_t revision, next_revision; + apr_pool_t *iterpool = svn_pool_create(scratch_pool); + + for (revision = start; revision <= end; revision = next_revision) + { + svn_revnum_t count = svn_fs_x__packed_base_rev(fs, revision); + svn_revnum_t pack_start = count; + svn_revnum_t pack_end = pack_start + svn_fs_x__pack_size(fs, revision); + + svn_pool_clear(iterpool); + + if (notify_func && (pack_start % ffd->max_files_per_dir == 0)) + notify_func(pack_start, notify_baton, iterpool); + + /* Check for external corruption to the indexes. */ + err = verify_index_checksums(fs, pack_start, cancel_func, + cancel_baton, iterpool); + + /* two-way index check */ + if (!err) + err = compare_l2p_to_p2l_index(fs, pack_start, pack_end - pack_start, + cancel_func, cancel_baton, iterpool); + if (!err) + err = compare_p2l_to_l2p_index(fs, pack_start, pack_end - pack_start, + cancel_func, cancel_baton, iterpool); + + /* verify in-index checksums and types vs. actual rev / pack files */ + if (!err) + err = compare_p2l_to_rev(fs, pack_start, pack_end - pack_start, + cancel_func, cancel_baton, iterpool); + + /* ensure that revprops are available and accessible */ + if (!err) + err = verify_revprops(fs, pack_start, pack_end, + cancel_func, cancel_baton, iterpool); + + /* concurrent packing is one of the reasons why verification may fail. + Make sure, we operate on up-to-date information. */ + if (err) + SVN_ERR(svn_fs_x__read_min_unpacked_rev(&ffd->min_unpacked_rev, + fs, scratch_pool)); + + /* retry the whole shard if it got packed in the meantime */ + if (err && count != svn_fs_x__pack_size(fs, revision)) + { + svn_error_clear(err); + + /* We could simply assign revision here but the code below is + more intuitive to maintainers. */ + next_revision = svn_fs_x__packed_base_rev(fs, revision); + } + else + { + SVN_ERR(err); + next_revision = pack_end; + } + } + + svn_pool_destroy(iterpool); + + return SVN_NO_ERROR; +} + +svn_error_t * +svn_fs_x__verify(svn_fs_t *fs, + svn_revnum_t start, + svn_revnum_t end, + svn_fs_progress_notify_func_t notify_func, + void *notify_baton, + svn_cancel_func_t cancel_func, + void *cancel_baton, + apr_pool_t *scratch_pool) +{ + svn_fs_x__data_t *ffd = fs->fsap_data; + svn_revnum_t youngest = ffd->youngest_rev_cache; /* cache is current */ + + /* Input validation. */ + if (! SVN_IS_VALID_REVNUM(start)) + start = 0; + if (! SVN_IS_VALID_REVNUM(end)) + end = youngest; + SVN_ERR(svn_fs_x__ensure_revision_exists(start, fs, scratch_pool)); + SVN_ERR(svn_fs_x__ensure_revision_exists(end, fs, scratch_pool)); + + /* log/phys index consistency. We need to check them first to make + sure we can access the rev / pack files in format7. 
*/ + SVN_ERR(verify_metadata_consistency(fs, start, end, + notify_func, notify_baton, + cancel_func, cancel_baton, + scratch_pool)); + + /* rep cache consistency */ + SVN_ERR(verify_rep_cache(fs, start, end, notify_func, notify_baton, + cancel_func, cancel_baton, scratch_pool)); + + return SVN_NO_ERROR; +} diff --git a/subversion/libsvn_fs_x/verify.h b/subversion/libsvn_fs_x/verify.h new file mode 100644 index 0000000..805f654 --- /dev/null +++ b/subversion/libsvn_fs_x/verify.h @@ -0,0 +1,43 @@ +/* verify.h : verification interface of the native filesystem layer + * + * ==================================================================== + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + * ==================================================================== + */ + +#ifndef SVN_LIBSVN_FS__VERIFY_H +#define SVN_LIBSVN_FS__VERIFY_H + +#include "fs.h" + +/* Verify metadata in fsx filesystem FS. Limit the checks to revisions + * START to END where possible. Indicate progress via the optional + * NOTIFY_FUNC callback using NOTIFY_BATON. The optional CANCEL_FUNC + * will periodically be called with CANCEL_BATON to allow for preemption. + * Use SCRATCH_POOL for temporary allocations. */ +svn_error_t * +svn_fs_x__verify(svn_fs_t *fs, + svn_revnum_t start, + svn_revnum_t end, + svn_fs_progress_notify_func_t notify_func, + void *notify_baton, + svn_cancel_func_t cancel_func, + void *cancel_baton, + apr_pool_t *scratch_pool); + +#endif |
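The header above only declares the private verification entry point; seeing a caller makes the control flow easier to follow. The following sketch is not part of the commit: it assumes code living inside libsvn_fs_x with an already-open svn_fs_t (in practice svn_fs_x__verify() is reached through the FS loader, e.g. by 'svnadmin verify'), and the callback and function names below are illustrative only.

/* Illustrative sketch, not part of the diff: drive svn_fs_x__verify()
 * over the whole repository with a trivial progress callback. */
#include <stdio.h>

#include "verify.h"

/* Hypothetical progress callback; matches the way verify.c invokes
 * svn_fs_progress_notify_func_t: (revision, baton, scratch_pool). */
static void
show_progress(svn_revnum_t revision,
              void *baton,
              apr_pool_t *scratch_pool)
{
  if (SVN_IS_VALID_REVNUM(revision))
    printf("* verified r%ld\n", revision);
}

/* Verify all of FS.  Passing SVN_INVALID_REVNUM for START / END lets
 * svn_fs_x__verify() auto-select the range 0 .. youngest. */
static svn_error_t *
verify_everything(svn_fs_t *fs,
                  apr_pool_t *scratch_pool)
{
  return svn_fs_x__verify(fs,
                          SVN_INVALID_REVNUM /* start */,
                          SVN_INVALID_REVNUM /* end */,
                          show_progress, NULL /* notify_baton */,
                          NULL /* cancel_func */, NULL /* cancel_baton */,
                          scratch_pool);
}

Passing real CANCEL_FUNC / CANCEL_BATON values would let a long verification run be interrupted at the regular cancellation checkpoints that verify.c already contains.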