author | Alex Gorrod <alexander.gorrod@mongodb.com> | 2016-04-28 21:16:44 +1000 |
---|---|---|
committer | Keith Bostic <keith.bostic@mongodb.com> | 2016-04-28 07:16:44 -0400 |
commit | b217c497e38141e8980babd2785c98926867e675 (patch) | |
tree | aae0e6b2026952186b17aba9f3f468162d039d49 /src | |
parent | 316f7f535da96cbea59e17d33283190bda804c5f (diff) | |
WT-2552 Add public API for pluggable filesystems (#2671)
* WT-2552 Add public API for pluggable filesystems
Not yet compiling. The main parts of this change should be here,
but it involved extensive parameter re-organization. There are also
a number of layering violations between our existing file system
implementations and the WT_FH, that aren't possible with the new
structure.
There are a number of specific todo comments in the code. One of the main
issues is that the in-memory file system had a special close semantic
that relied on WiredTiger handle tracking. The in-memory file-system should
do its own tracking of file handles; I've gone part way down that road by
adding a queue for closed handles. Need to also add in live handles, and
manage the queue as appropriate.
I haven't created an example application that uses the new API yet.
* WT-2552 Add public API for pluggable filesystems
I always forget you have to remove the already-built html files when
changing PREDEFINED, add a reminder to the complaint.
* WT-2552 Add public API for pluggable filesystems
You have to remove the .js files, too.
* WT-2552 Add public API for pluggable filesystems
Make dist/s_all run cleanly.
* WT-2552 Add public API for pluggable filesystems
Whitespace.
* WT-2552 Add public API for pluggable filesystems
Make it compile/build/lint.
* WT-2552 Add public API for pluggable filesystems
block_write.c: In function '__wt_block_extend':
block_write.c:130:71: error: missing terminating ' character [-Werror]
* WT-2552 Add public API for pluggable filesystems
os_fs_inmemory.c: In function '__im_file_truncate':
os_fs_inmemory.c:344:10: error: 'session' is used uninitialized in this
function [-Werror=uninitialized]
* WT-2552 Add public API for pluggable filesystems
os_fs.c: In function '__posix_directory_sync':
os_fs.c:92:10: error: 'session' is used uninitialized in this function
[-Werror=uninitialized]
* WT-2552 Add public API for pluggable filesystems
Go back to using bool types in the file-system API, this requires we add
<stdbool.h> to the "standard" wiredtiger.h includes.
Consistently use wt_session to represent a WT_SESSION, we were using
"wtsession" in some places.
Make a pass over the Windows code, but I'm sure it doesn't compile yet.
* WT-2552 Add public API for pluggable filesystems
Fix up another couple of bool types.
* WT-2552 Add public API for pluggable filesystems
Move the file naming work out of the underlying filesystem functions,
the calls to __wt_filename are now in the upper-level code, in os_fs.i;
that means the filesystem code is no longer responsible for figuring out
paths. This is cleaner, although the directory-sync call is a bit of a
kluge, and I've committed us to handling NULL filesystem methods.
With this set of changes, in-memory runs again.
More Windows naming fixes.
* WT-2552 Add public API for pluggable filesystems
os_fs.c: In function '__posix_directory_sync':
os_fs.c:96:3: error: label 'err' used but not defined
* WT-2552 Add public API for pluggable filesystems
Pull out another call to __wt_filename() from the filesystem-dependent
code.
* WT-2552 Add public API for pluggable filesystems
Consistently check for missing file-system methods when doing
file-system calls.
Other minor lint & cleanup.
* WT-2552 Add public API for pluggable filesystems
Change the in-memory code to maintain a complete list of the files it
has ever opened, and depend on that list instead of reaching up into the
common layer for the WT_FH handle list.
This means __wt_handle_search is only used by the common WT_FH handle
code, simplify it, and add a __wt_handle_is_open function that can be
called for diagnostic purposes (to check for open files that are being
renamed or removed, for example).
* Fix compiler warning and ignore the file system API in Java
* Flesh out the example file system implementation.
* Add in some plumbing for set_file_system in wiredtiger_open.
* WT-2552 Add public API for pluggable filesystems
Whitespace.
* WT-2552 Add public API for pluggable filesystems
WT_CONFIG_ITEM.val isn't a boolean, don't use boolean types in
equal/not-equal comparisons.
* WT-2552 Add public API for pluggable filesystems
Remove unused #includes.
Increment/decrement the DEMO_FILE_SYSTEM.{opened,closed}_file_count.
Allocate demo structures, they're larger than the underlying structures.
Swap the number/size calloc arguments, number comes first.
Fix a couple of statics.
* WT-2552 Add public API for pluggable filesystems
Use %u instead of casting to %d.
* WT-2552 Add public API for pluggable filesystems
Add ex_file_system.c to the list of example programs.
* WT-2552 Add public API for pluggable filesystems
Change ex_file_system.c to not require <wt_internal.h>: strip down a
copy of FreeBSD's <queue.h> for local inclusion, rewrite a few other
minor pieces of code.
* WT-2552 Add public API for pluggable filesystems
Update spell check info
* WT-2552 Add public API for pluggable filesystems
__conn_load_extensions() shouldn't set the "early" boolean to true.
* WT-2552 Add public API for pluggable filesystems
Don't indirect through a NULL pointer if "local" was set and no path was
specified, always set the name to something useful.
* WT-2552 Add public API for pluggable filesystems
Don't indirect through a NULL pointer if "local" was set and no path was
specified, always set the name to something useful.
* WT-2552 Add public API for pluggable filesystems
wt_off_t vs. size_t conversion lint.
* WT-2552 Add public API for pluggable filesystems
Add -rdynamic to the load for ex_file_system, the main executable
symbols are not exported by default.
* WT-2552 Add public API for pluggable filesystems
The underlying handle name includes the enclosing directory,
compare against the WT_FH.name field instead.
* WT-2552 Add public API for pluggable filesystems
demo_fs_rename should return 0 if successful, simplify error handling
Don't bother casting arguments to free(), it's not necessary.
* WT-2552 Add public API for pluggable filesystems
General WT_FILE_SYSTEM cleanup.
Move OS initialization into the wiredtiger_open() code (the
os_common/os_init.c file is no longer needed).
Allow early-load extensions to be part of the environment settings,
matching the "in-memory" and "readonly" configurations.
Syntax check the set of a file-system, remove tests for NULL methods in
the file-system structure unless it's legal for them to be NULL.
Windows, POSIX and in-memory file systems now set WT_FILE_SYSTEM.terminate,
call that function to cleanup when discarding a WT_CONNECTION.
Export file-type and open-flags constants for WT_FILE_SYSTEM.open_file,
sort the WT_FILE_SYSTEM methods, do an editing pass.
Change the WT_FILE_HANDLE type from (const char *) to (char *), it's
"owned" by the underlying layer, and it's simpler that way.
Minor (untested) cleanup of the Windows WT_FILE_SYSTEM.open-file method.
* WT-2552 Add public API for pluggable filesystems
Export the advise argument #defines for the WT_FILE_HANDLE.fadvise method.
Sort the WT_FILE_HANDLE methods.
* WT-2552 Add public API for pluggable filesystems
Clean up and simplify WT_FILE_SYSTEM/WT_FILE_HANDLE documentation's
description of the handles.
* WT-2552 Add public API for pluggable filesystems
WT_FILE_HANDLE.close is a required function (at the least, it
has to free the memory).
WT_FILE_HANDLE.fadvise isn't a required function, if it's not
configured, don't call it.
* WT-2552 Add public API for pluggable filesystems
The WT_FILE_HANDLE.lock function is required.
Change the __wt_open() signature to match WT_FILE_SYSTEM.open_file().
* WT-2552 Add public API for pluggable filesystems
Rework all of the WT_FILE_HANDLE mapped region methods to be optional.
* WT-2552 Add public API for pluggable filesystems
The WT_FILE_HANDLE.{read,size} methods are required.
The WT_FILE_HANDLE.sync method is not required.
Split the WT_FILE_HANDLE.sync method into .sync and .sync_nowait versions,
it makes the upper-level code simpler (Windows supports .sync but doesn't
support .sync_nowait).
* WT-2552 Add public API for pluggable filesystems
The WT_FILE_HANDLE.{truncate,write} methods are required IFF the file
is not readonly.
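
The required/optional rules in the entries above (close, read and size always required; sync and fadvise optional; truncate and write required only when the file is not readonly) can be sketched as a check over a handle's method table. This is a minimal sketch, not the WiredTiger code: `demo_handle`, `demo_ok` and `demo_handle_check` are hypothetical names, and every method is simplified to take only the handle.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a file handle's method table. The real
 * methods take more arguments; only NULL vs. non-NULL matters here.
 */
struct demo_handle {
	int (*close)(struct demo_handle *);	/* always required */
	int (*read)(struct demo_handle *);	/* always required */
	int (*size)(struct demo_handle *);	/* always required */
	int (*sync)(struct demo_handle *);	/* optional */
	int (*truncate)(struct demo_handle *);	/* required unless readonly */
	int (*write)(struct demo_handle *);	/* required unless readonly */
};

/* Stub method used to fill in slots for demonstration. */
static int
demo_ok(struct demo_handle *h)
{
	(void)h;
	return (0);
}

/* Return 0 if the table satisfies the rules, EINVAL otherwise. */
static int
demo_handle_check(const struct demo_handle *h, bool readonly)
{
	if (h->close == NULL || h->read == NULL || h->size == NULL)
		return (EINVAL);
	if (!readonly && (h->truncate == NULL || h->write == NULL))
		return (EINVAL);
	return (0);
}
```

A handle with only close, read and size set passes the check when opened readonly and fails it otherwise.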
* WT-2552 Add public API for pluggable filesystems
POSIX shouldn't declare a no-sync handle function unless the
sync_file_range system call is available.
* WT-2552 Add public API for pluggable filesystems
Typo, missing semi-colon.
* Fix a bug in ex_file_system.c
* Fix a memory leak in posix file handle implementation
* WT-2552 Use the correct flags when opening backup file.
* WT-2552 Add public API for pluggable filesystems
Simplify open-file error handling by calling the close function on the
handle, that way we won't forget to free all of the applicable memory
allocations.
* WT-2552 Add public API for pluggable filesystems
Simplify the directory-list method, don't pass in an include/exclude
file, if prefix is non-NULL, it implies we only want files matching
the prefix.
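
The prefix rule for the directory-list method can be sketched as a single predicate. This is an illustration only; `demo_name_matches` is a hypothetical helper, not a WiredTiger function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: decide whether a directory entry belongs in the
 * returned list. A NULL prefix means "return every entry"; a non-NULL
 * prefix means "return only names that start with it".
 */
static bool
demo_name_matches(const char *name, const char *prefix)
{
	if (prefix == NULL)
		return (true);
	return (strncmp(name, prefix, strlen(prefix)) == 0);
}
```

With this shape the caller no longer needs a separate include/exclude flag: passing NULL lists everything.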
* WT-2552 Add public API for pluggable filesystems
Replace WT_FILE_HANDLE_POSIX.fallocate_{available,requires_locking} with
WT_FILE_HANDLE.fallocate and WT_FILE_HANDLE.fallocate_nolock.
Example code doesn't need to set WT_FILE_HANDLE methods to NULL, the
allocation does that.
Free the I/O buffer if open-handle allocation fails in the example code.
Remove snippets for WT_FILE_SYSTEM and WT_FILE_HANDLE methods, we're
not going to provide example code for them.
* WT-2552 Add public API for pluggable filesystems
Document we expect either ENOTSUP or EBUSY from optionally supported
APIs. Review and clean up ENOTSUP/EBUSY returns from optionally supported
APIs.
Make WT_FILE_HANDLE.lock optional.
Don't configure or call the POSIX fadvise function on files configured
for direct I/O.
Rename __wt_filesize_name to __wt_size for consistency.
Update the spelling list.
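
The ENOTSUP/EBUSY convention for optionally supported APIs can be sketched as a small wrapper. The names `demo_optional_fn`, `demo_call_optional` and `demo_return_code` are hypothetical; only the final conditional mirrors the pattern used in this change's block_write.c.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical optional method: may be NULL, or may decline at run time. */
typedef int (*demo_optional_fn)(void *handle);

/*
 * Call an optional method. A missing method, ENOTSUP or EBUSY all mean
 * "not done, carry on"; any other non-zero return is a real error.
 */
static int
demo_call_optional(demo_optional_fn fn, void *handle)
{
	int ret;

	if (fn == NULL)
		return (0);
	ret = fn(handle);
	return (ret == EBUSY || ret == ENOTSUP ? 0 : ret);
}

/* Demonstration stub: returns the int that "handle" points to. */
static int
demo_return_code(void *handle)
{
	return (*(int *)handle);
}
```

Callers then only need to handle genuine failures; they never have to remember whether a given platform implements the method.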
* WT-2552 Add public API for pluggable filesystems
WT_FILE_HANDLE.truncate requires locking in all known implementations,
document it is not called concurrently with other operations.
* WT-2552 Add public API for pluggable filesystems
Don't terminate the filesystem unless we've actually configured one.
* WT-2552 Add public API for pluggable filesystems
Remove WT_FILE_SYSTEM and WT_FILE_HANDLE from SWIG so the test suite
can pass again.
* WT-2552 Add public API for pluggable filesystems
Merge __conn_load_early_extensions() and __conn_load_extensions().
Fix a problem where I moved the early extensions load to where it could
include the WiredTiger environment variable, but I didn't pass the built
cfg into the function.
* WT-2552 Add public API for pluggable filesystems
Linux build typo.
* WT-2552 Add public API for pluggable filesystems
Get rid of the "bool silent" argument to WT_FILE_SYSTEM.size by testing
for the file's existence before requesting the size (an extra system
call, but guaranteed to hit in the buffer cache at least).
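
The exist-then-size pattern described above can be sketched with POSIX stat(). This is an assumption-laden illustration: `demo_exist`, `demo_size` and `demo_size_if_present` are hypothetical helpers, and the sketch ignores the window in which a file could be removed between the two calls.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical existence test: a missing file is not an error. */
static int
demo_exist(const char *path, bool *existp)
{
	struct stat sb;

	if (stat(path, &sb) == 0) {
		*existp = true;
		return (0);
	}
	if (errno == ENOENT) {
		*existp = false;
		return (0);
	}
	return (errno);
}

/* Hypothetical size call: a missing file IS an error here. */
static int
demo_size(const char *path, off_t *sizep)
{
	struct stat sb;

	if (stat(path, &sb) != 0)
		return (errno);
	*sizep = sb.st_size;
	return (0);
}

/* Caller that tolerates a missing file without a "silent" argument. */
static int
demo_size_if_present(const char *path, bool *existp, off_t *sizep)
{
	int ret;

	*sizep = 0;
	if ((ret = demo_exist(path, existp)) != 0 || !*existp)
		return (ret);
	return (demo_size(path, sizep));
}
```

The cost is the extra stat() call the entry mentions; the benefit is that the size method itself never needs a flag to suppress error messages.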
* WT-2552 Add public API for pluggable filesystems
Naming consistency pass over the WT_FILE_SYSTEM functions.
* WT-2552 Add public API for pluggable filesystems
Fix a spin lock mismatch.
* WT-2552 Add public API for pluggable filesystems
Another spinlock mismatch.
* Update example pluggable file system.
Add a directory list implementation to the example, which uncovered
an issue with the API. The directory list API allocates memory that
is freed by WiredTiger, which I don't think is kosher.
* Change file-directory-sync to use regular fsync.
The distinction in os_fs.i doesn't work with the filesystem API.
Also add directory_sync application to the example application.
* WT-2552 Add public API for pluggable filesystems
Whitespace.
* WT-2552 Add public API for pluggable filesystems
Rewrite __wt_free to not evaluate macro arguments multiple times.
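
The multiple-evaluation problem can be sketched as follows. These macros are hypothetical illustrations, not the actual __wt_free definition: the safe form takes the argument's address exactly once, so an argument with side effects (for example `slots[i++]`) is evaluated once.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: takes the ADDRESS of a pointer, frees what it
 * points to and clears it. memcpy moves the pointer value in and out,
 * which assumes all object pointers share one representation (true on
 * mainstream platforms).
 */
static void
demo_free_int(void *p_arg)
{
	void *p;

	memcpy(&p, p_arg, sizeof(p));
	free(p);
	p = NULL;
	memcpy(p_arg, &p, sizeof(p));
}

/* Unsafe form, for contrast: "p" is expanded, and evaluated, twice. */
#define	DEMO_FREE_TWICE(p) do {						\
	free(p);							\
	(p) = NULL;							\
} while (0)

/* Safe form: "p" appears once, so its side effects happen once. */
#define	DEMO_FREE(p) do {						\
	void *demo_addr_ = &(p);					\
	demo_free_int(demo_addr_);					\
} while (0)
```

With `DEMO_FREE_TWICE(slots[i++])` the first slot would be freed and the second one cleared; `DEMO_FREE(slots[i++])` frees and clears the same slot.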
* WT-2552 Add public API for pluggable filesystems
Simplify the directory-list functions: __wt_realloc_def() already
handles scaling the size of the allocations, there's no need to
involve a separate constant that increments the allocation size.
* WT-2552 Add public API for pluggable filesystems
Fix a grouping problem in a realloc call, we need to multiply the size
by (the previously allocated slots + 10).
Fix buffer overrun, if "count" has already been incremented, the memset
would skip clearing the first slot and clear one slot past the end of
the buffer.
Remove a comment, realloc requires clearing allocated memory, it's not
paranoia.
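
The two realloc fixes above (grouping of the size expression, and where the clearing memset starts) can be sketched as one growth helper. `demo_grow` is a hypothetical function, not the code in this change; it assumes an all-zero bit pattern is a NULL pointer, which holds on mainstream platforms.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical growth helper for an array of name slots: grow by 10
 * slots and clear only the newly added ones.
 */
static int
demo_grow(char ***entriesp, size_t *allocatedp)
{
	char **p;
	size_t old_slots, new_slots;

	old_slots = *allocatedp;
	new_slots = old_slots + 10;

	/* Grouping matters: (old + 10) * size, not old + 10 * size. */
	if ((p = realloc(*entriesp, new_slots * sizeof(char *))) == NULL)
		return (ENOMEM);

	/*
	 * realloc doesn't clear new memory. Start at the first NEW slot
	 * (the old allocation count), not at a counter that may already
	 * have been incremented, and clear exactly the added slots.
	 */
	memset(p + old_slots, 0, (new_slots - old_slots) * sizeof(char *));

	*entriesp = p;
	*allocatedp = new_slots;
	return (0);
}
```

Starting the memset one slot late would leave the first new slot uninitialized and write one slot past the end of the buffer, which is the overrun the entry describes.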
* WT-2552 Add public API for pluggable filesystems
Add the mapping-cookie argument to the map-preload and map-discard
functions.
Change page-discard to stop reaching down through the block manager,
instead, provide a block-manager map-discard function that does the
work.
* WT-2552 Add public API for pluggable filesystems
Require a directory-list function.
Implement a directory-list function for the in-memory filesystem.
Consistency pass, make all the directory-list functions look the same.
* WT-2552 Add public API for pluggable filesystems
The WT_FILE_SYSTEM.{directory_sync, remove, rename} methods are not
required for read-only systems.
* WT-2552 Add public API for pluggable filesystems
Change the WT_FILE_SYSTEM.open_file file_type argument from a set of
constants to an enum.
This requires changing how we store connection direct I/O configuration
(the constants used to be flags stored in the WT_CONNECTION_IMPL), and
requiring all callers of __wt_open() do their own work to figure out if
WT_OPEN_DIRECTIO should be specified.
* WT-2552 Add public API for pluggable filesystems
Make no guarantees WT_FILE_SYSTEM and WT_FILE_HANDLE methods are
not called concurrently (except for WT_FILE_HANDLE::fallocate and
WT_FILE_HANDLE::fallocate_nolock).
Rewrite the in-memory FS code to lock across all methods (for example,
WT_FILE_HANDLE.close), that means including a reference to the enclosing
WT_FILE_SYSTEM in the WT_FILE_HANDLE structure so we can find a lock
without using the WT_CONNECTION_IMPL structure.
* WT-2552 Add public API for pluggable filesystems
Remove __wt_directory_sync_fh, it's no longer useful.
* WT-2552 Add public API for pluggable filesystems
Rename WT_INMEMORY_FILE_SYSTEM to WT_FILE_SYSTEM_INMEM, matching
WT_FILE_HANDLE_INMEM.
* WT-2552 Add public API for pluggable filesystems
Add WT_FILE_SYSTEM.directory_list_free, to free memory allocated
by WT_FILE_SYSTEM.directory_list.
Fix a memory leak in __log_archive_once (if __wt_readlock failed,
we leaked the directory-list memory).
* WT-2552 Add public API for pluggable filesystems
Typo, check WT_DIRECT_IO_LOG, not WT_DIRECT_IO_CHECKPOINT.
* WT-2552 Add public API for pluggable filesystems
Typo, unreachable code.
* WT-2552 Add public API for pluggable filesystems
We don't require WT_FILE_SYSTEM.{remove,rename} if the system is
read-only.
* Fix Windows build with pluggable file system.
Involved removing u_int from the public API.
* Fix line wrapping.
* Fix Windows terminate function.
* Forgot something in my last commit.
* Fix Windows munmap bug.
* Add new example to Windows build. Extend example to be more complete.
* Fix example loading on Windows
* Update documentation
* Add missing spell words
* Remove old comment.
Diffstat (limited to 'src')
50 files changed, 2379 insertions, 1679 deletions
diff --git a/src/block/block_map.c b/src/block/block_map.c
index b16fe7f8423..ce6fe8602f5 100644
--- a/src/block/block_map.c
+++ b/src/block/block_map.c
@@ -13,24 +13,16 @@
  *	Map a segment of the file in, if possible.
  */
 int
-__wt_block_map(
-    WT_SESSION_IMPL *session, WT_BLOCK *block, void *mapp, size_t *maplenp,
-    void **mappingcookie)
+__wt_block_map(WT_SESSION_IMPL *session, WT_BLOCK *block,
+    void *mapped_regionp, size_t *lengthp, void *mapped_cookiep)
 {
 	WT_DECL_RET;
+	WT_FILE_HANDLE *handle;

-	*(void **)mapp = NULL;
-	*maplenp = 0;
+	*(void **)mapped_regionp = NULL;
+	*lengthp = 0;
+	*(void **)mapped_cookiep = NULL;

-#ifdef WORDS_BIGENDIAN
-	/*
-	 * The underlying objects are little-endian, mapping objects isn't
-	 * currently supported on big-endian systems.
-	 */
-	WT_UNUSED(session);
-	WT_UNUSED(block);
-	WT_UNUSED(mappingcookie);
-#else
 	/* Map support is configurable. */
 	if (!S2C(session)->mmap)
 		return (0);
@@ -51,15 +43,23 @@ __wt_block_map(
 		return (0);

 	/*
+	 * There may be no underlying functionality.
+	 */
+	handle = block->fh->handle;
+	if (handle->map == NULL)
+		return (0);
+
+	/*
 	 * Map the file into memory.
 	 * Ignore not-supported errors, we'll read the file through the cache
 	 * if map fails.
 	 */
-	ret = block->fh->fh_map(
-	    session, block->fh, mapp, maplenp, mappingcookie);
-	if (ret == ENOTSUP)
+	ret = handle->map(handle,
+	    (WT_SESSION *)session, mapped_regionp, lengthp, mapped_cookiep);
+	if (ret == ENOTSUP) {
+		*(void **)mapped_regionp = NULL;
 		ret = 0;
-#endif
+	}

 	return (ret);
 }
@@ -69,11 +69,13 @@ __wt_block_map(
  *	Unmap any mapped-in segment of the file.
  */
 int
-__wt_block_unmap(
-    WT_SESSION_IMPL *session, WT_BLOCK *block, void *map, size_t maplen,
-    void **mappingcookie)
+__wt_block_unmap(WT_SESSION_IMPL *session,
+    WT_BLOCK *block, void *mapped_region, size_t length, void *mapped_cookie)
 {
+	WT_FILE_HANDLE *handle;
+
 	/* Unmap the file from memory. */
-	return (block->fh->fh_map_unmap(
-	    session, block->fh, map, maplen, mappingcookie));
+	handle = block->fh->handle;
+	return (handle->unmap(handle,
+	    (WT_SESSION *)session, mapped_region, length, mapped_cookie));
 }
diff --git a/src/block/block_mgr.c b/src/block/block_mgr.c
index 06150a0f062..465952d8ca5 100644
--- a/src/block/block_mgr.c
+++ b/src/block/block_mgr.c
@@ -103,7 +103,7 @@ __bm_checkpoint_load(WT_BM *bm, WT_SESSION_IMPL *session,
 	 * of being read into cache buffers.
 	 */
 	WT_RET(__wt_block_map(session,
-	    bm->block, &bm->map, &bm->maplen, &bm->mappingcookie));
+	    bm->block, &bm->map, &bm->maplen, &bm->mapped_cookie));

 	/*
 	 * If this handle is for a checkpoint, that is, read-only, there
@@ -149,7 +149,7 @@ __bm_checkpoint_unload(WT_BM *bm, WT_SESSION_IMPL *session)
 	/* Unmap any mapped segment. */
 	if (bm->map != NULL)
 		WT_TRET(__wt_block_unmap(session,
-		    bm->block, bm->map, bm->maplen, &bm->mappingcookie));
+		    bm->block, bm->map, bm->maplen, &bm->mapped_cookie));

 	/* Unload the checkpoint. */
 	WT_TRET(__wt_block_checkpoint_unload(session, bm->block, !bm->is_live));
@@ -302,6 +302,20 @@ __bm_is_mapped(WT_BM *bm, WT_SESSION_IMPL *session)
 }

 /*
+ * __bm_map_discard --
+ *	Discard a mapped segment.
+ */
+static int
+__bm_map_discard(WT_BM *bm, WT_SESSION_IMPL *session, void *map, size_t len)
+{
+	WT_FILE_HANDLE *handle;
+
+	handle = bm->block->fh->handle;
+	return (handle->map_discard(
+	    handle, (WT_SESSION *)session, map, len, bm->mapped_cookie));
+}
+
+/*
  * __bm_salvage_end --
  *	End a block manager salvage.
  */
@@ -413,19 +427,7 @@ __bm_stat(WT_BM *bm, WT_SESSION_IMPL *session, WT_DSRC_STATS *stats)
 static int
 __bm_sync(WT_BM *bm, WT_SESSION_IMPL *session, bool block)
 {
-	WT_DECL_RET;
-
-	if (!block && !bm->block->nowait_sync_available)
-		return (0);
-
-	if ((ret = __wt_fsync(session, bm->block->fh, block)) == 0)
-		return (0);
-
-	/* Ignore ENOTSUP, but don't try again. */
-	if (ret != ENOTSUP)
-		return (ret);
-	bm->block->nowait_sync_available = false;
-	return (0);
+	return (__wt_fsync(session, bm->block->fh, block));
 }

 /*
@@ -544,6 +546,7 @@ __bm_method_set(WT_BM *bm, bool readonly)
 		bm->compact_start = __bm_compact_start;
 		bm->free = __bm_free;
 		bm->is_mapped = __bm_is_mapped;
+		bm->map_discard = __bm_map_discard;
 		bm->preload = __wt_bm_preload;
 		bm->read = __wt_bm_read;
 		bm->salvage_end = __bm_salvage_end;
diff --git a/src/block/block_open.c b/src/block/block_open.c
index cc3d8dbb46e..e58bef30a6d 100644
--- a/src/block/block_open.c
+++ b/src/block/block_open.c
@@ -43,7 +43,7 @@ __wt_block_manager_create(
 	 * in our space. Move any existing files out of the way and complain.
 	 */
 	for (;;) {
-		if ((ret = __wt_open(session, filename, WT_FILE_TYPE_DATA,
+		if ((ret = __wt_open(session, filename, WT_OPEN_FILE_TYPE_DATA,
 		    WT_OPEN_CREATE | WT_OPEN_EXCLUSIVE, &fh)) == 0)
 			break;
 		WT_ERR_TEST(ret != EEXIST, ret);
@@ -53,10 +53,10 @@ __wt_block_manager_create(
 		for (suffix = 1;; ++suffix) {
 			WT_ERR(__wt_buf_fmt(
 			    session, tmp, "%s.%d", filename, suffix));
-			WT_ERR(__wt_exist(session, tmp->data, &exists));
+			WT_ERR(__wt_fs_exist(session, tmp->data, &exists));
 			if (!exists) {
-				WT_ERR(
-				    __wt_rename(session, filename, tmp->data));
+				WT_ERR(__wt_fs_rename(
+				    session, filename, tmp->data));
 				WT_ERR(__wt_msg(session,
 				    "unexpected file %s found, renamed to %s",
 				    filename, (char *)tmp->data));
@@ -82,11 +82,11 @@ __wt_block_manager_create(
 	 * that the file will appear.
 	 */
 	if (ret == 0)
-		WT_TRET(__wt_directory_sync(session, filename));
+		WT_TRET(__wt_fs_directory_sync(session, filename));

 	/* Undo any create on error. */
 	if (ret != 0)
-		WT_TRET(__wt_remove(session, filename));
+		WT_TRET(__wt_fs_remove(session, filename));

 err:	__wt_scr_free(session, &tmp);

@@ -200,20 +200,18 @@ __wt_block_open(WT_SESSION_IMPL *session,
 	/* Set the file extension information. */
 	block->extend_len = conn->data_extend_len;

-	/* Set the asynchronous flush, preload availability. */
-	block->nowait_sync_available = true;
-	block->preload_available = true;
-
 	/*
 	 * Open the underlying file handle.
 	 *
 	 * "direct_io=checkpoint" configures direct I/O for readonly data files.
 	 */
 	flags = 0;
-	if (readonly && FLD_ISSET(conn->direct_io, WT_FILE_TYPE_CHECKPOINT))
+	if (readonly && FLD_ISSET(conn->direct_io, WT_DIRECT_IO_CHECKPOINT))
+		LF_SET(WT_OPEN_DIRECTIO);
+	if (!readonly && FLD_ISSET(conn->direct_io, WT_DIRECT_IO_DATA))
 		LF_SET(WT_OPEN_DIRECTIO);
 	WT_ERR(__wt_open(
-	    session, filename, WT_FILE_TYPE_DATA, flags, &block->fh));
+	    session, filename, WT_OPEN_FILE_TYPE_DATA, flags, &block->fh));

 	/* Set the file's size. */
 	WT_ERR(__wt_filesize(session, block->fh, &block->size));
@@ -426,5 +424,5 @@ int
 __wt_block_manager_named_size(
     WT_SESSION_IMPL *session, const char *name, wt_off_t *sizep)
 {
-	return (__wt_filesize_name(session, name, false, sizep));
+	return (__wt_fs_size(session, name, sizep));
 }
diff --git a/src/block/block_read.c b/src/block/block_read.c
index 6f0c41c1b5c..7304f6ff4bc 100644
--- a/src/block/block_read.c
+++ b/src/block/block_read.c
@@ -19,44 +19,32 @@ __wt_bm_preload(
 	WT_BLOCK *block;
 	WT_DECL_ITEM(tmp);
 	WT_DECL_RET;
+	WT_FILE_HANDLE *handle;
 	wt_off_t offset;
 	uint32_t cksum, size;
 	bool mapped;

 	WT_UNUSED(addr_size);
+
 	block = bm->block;

 	WT_STAT_FAST_CONN_INCR(session, block_preload);

-	/* Preload the block. */
-	if (block->preload_available) {
-		/* Crack the cookie. */
-		WT_RET(__wt_block_buffer_to_addr(
-		    block, addr, &offset, &size, &cksum));
-
-		mapped = bm->map != NULL &&
-		    offset + size <= (wt_off_t)bm->maplen;
-		if (mapped)
-			ret = block->fh->fh_map_preload(session,
-			    block->fh, (uint8_t *)bm->map + offset, size);
-		else
-			ret = block->fh->fh_advise(session,
-			    block->fh, (wt_off_t)offset,
-			    (wt_off_t)size, POSIX_FADV_WILLNEED);
-		if (ret == 0)
-			return (0);
-
-		/* Ignore ENOTSUP, but don't try again. */
-		if (ret != ENOTSUP)
-			return (ret);
-		block->preload_available = false;
-	}
+	/* Crack the cookie. */
+	WT_RET(__wt_block_buffer_to_addr(block, addr, &offset, &size, &cksum));

-	/*
-	 * If preload isn't supported, do it the slow way; don't call the
-	 * underlying read routine directly, we don't know for certain if
-	 * this is a mapped range.
-	 */
+	handle = block->fh->handle;
+	mapped = bm->map != NULL && offset + size <= (wt_off_t)bm->maplen;
+	if (mapped && handle->map_preload != NULL)
+		ret = handle->map_preload(handle, (WT_SESSION *)session,
+		    (uint8_t *)bm->map + offset, size, bm->mapped_cookie);
+	if (!mapped && handle->fadvise != NULL)
+		ret = handle->fadvise(handle, (WT_SESSION *)session,
+		    (wt_off_t)offset, (wt_off_t)size, WT_FILE_HANDLE_WILLNEED);
+	if (ret != EBUSY && ret != ENOTSUP)
+		return (ret);
+
+	/* If preload isn't supported, do it the slow way. */
 	WT_RET(__wt_scr_alloc(session, 0, &tmp));
 	ret = __wt_bm_read(bm, session, tmp, addr, addr_size);
 	__wt_scr_free(session, &tmp);
@@ -74,6 +62,7 @@ __wt_bm_read(WT_BM *bm, WT_SESSION_IMPL *session,
 {
 	WT_BLOCK *block;
 	WT_DECL_RET;
+	WT_FILE_HANDLE *handle;
 	wt_off_t offset;
 	uint32_t cksum, size;
 	bool mapped;
@@ -87,23 +76,17 @@ __wt_bm_read(WT_BM *bm, WT_SESSION_IMPL *session,
 	/*
 	 * Map the block if it's possible.
 	 */
+	handle = block->fh->handle;
 	mapped = bm->map != NULL && offset + size <= (wt_off_t)bm->maplen;
-	if (mapped) {
+	if (mapped && handle->map_preload != NULL) {
 		buf->data = (uint8_t *)bm->map + offset;
 		buf->size = size;
-		if (block->preload_available) {
-			ret = block->fh->fh_map_preload(
-			    session, block->fh, buf->data, buf->size);
-
-			/* Ignore ENOTSUP, but don't try again. */
-			if (ret != ENOTSUP)
-				return (ret);
-			block->preload_available = false;
-		}
+		ret = handle->map_preload(handle, (WT_SESSION *)session,
+		    buf->data, buf->size,bm->mapped_cookie);

 		WT_STAT_FAST_CONN_INCR(session, block_map_read);
 		WT_STAT_FAST_CONN_INCRV(session, block_byte_map_read, size);
-		return (0);
+		return (ret);
 	}

 #ifdef HAVE_DIAGNOSTIC
diff --git a/src/block/block_write.c b/src/block/block_write.c
index e79e538c920..4f1224f3c13 100644
--- a/src/block/block_write.c
+++ b/src/block/block_write.c
@@ -48,27 +48,28 @@ int
 __wt_block_discard(WT_SESSION_IMPL *session, WT_BLOCK *block, size_t added_size)
 {
 	WT_DECL_RET;
+	WT_FILE_HANDLE *handle;

+	/* The file may not support this call. */
+	handle = block->fh->handle;
+	if (handle->fadvise == NULL)
+		return (0);
+
+	/* The call may not be configured. */
 	if (block->os_cache_max == 0)
 		return (0);

 	/*
 	 * We're racing on the addition, but I'm not willing to serialize on it
-	 * in the standard read path with more evidence it's needed.
+	 * in the standard read path without evidence it's needed.
 	 */
 	if ((block->os_cache += added_size) <= block->os_cache_max)
 		return (0);

 	block->os_cache = 0;
-	WT_ERR(block->fh->fh_advise(session,
-	    block->fh, (wt_off_t)0, (wt_off_t)0, POSIX_FADV_DONTNEED));
-	return (0);
-
-err:	/* Ignore ENOTSUP, but don't try again. */
-	if (ret != ENOTSUP)
-		return (ret);
-	block->os_cache_max = 0;
-	return (0);
+	ret = handle->fadvise(handle, (WT_SESSION *)session,
+	    (wt_off_t)0, (wt_off_t)0, WT_FILE_HANDLE_DONTNEED);
+	return (ret == EBUSY || ret == ENOTSUP ? 0 : ret);
 }

 /*
@@ -80,6 +81,7 @@ __wt_block_extend(WT_SESSION_IMPL *session, WT_BLOCK *block,
     WT_FH *fh, wt_off_t offset, size_t align_size, bool *release_lockp)
 {
 	WT_DECL_RET;
+	WT_FILE_HANDLE *handle;
 	bool locked;

 	/*
@@ -125,7 +127,8 @@ __wt_block_extend(WT_SESSION_IMPL *session, WT_BLOCK *block,
 	 * based on the filesystem type, fall back to ftruncate in that case,
 	 * and remember that ftruncate requires locking.
 	 */
-	if (fh->fallocate_available != WT_FALLOCATE_NOT_AVAILABLE) {
+	handle = fh->handle;
+	if (handle->fallocate != NULL || handle->fallocate_nolock != NULL) {
 		/*
 		 * Release any locally acquired lock if not needed to extend the
 		 * file, extending the file may require updating on-disk file's
@@ -133,7 +136,7 @@ __wt_block_extend(WT_SESSION_IMPL *session, WT_BLOCK *block,
 		 * configure for file extension on systems that require locking
 		 * over the extend call.)
 		 */
-		if (!fh->fallocate_requires_locking && *release_lockp) {
+		if (handle->fallocate_nolock != NULL && *release_lockp) {
 			*release_lockp = locked = false;
 			__wt_spin_unlock(session, &block->live_lock);
 		}
@@ -149,8 +152,7 @@ __wt_block_extend(WT_SESSION_IMPL *session, WT_BLOCK *block,
 		if ((ret = __wt_fallocate(
 		    session, fh, block->size, block->extend_len * 2)) == 0)
 			return (0);
-		if (ret != ENOTSUP)
-			return (ret);
+		WT_RET_ERROR_OK(ret, ENOTSUP);
 	}

 	/*
@@ -173,9 +175,8 @@ __wt_block_extend(WT_SESSION_IMPL *session, WT_BLOCK *block,
 	 * The truncate might fail if there's a mapped file (in other words, if
 	 * there's an open checkpoint on the file), that's OK.
 	 */
-	if ((ret = __wt_ftruncate(session, fh, block->extend_size)) == EBUSY)
-		ret = 0;
-	return (ret);
+	WT_RET_BUSY_OK(__wt_ftruncate(session, fh, block->extend_size));
+	return (0);
 }

 /*
diff --git a/src/btree/bt_discard.c b/src/btree/bt_discard.c
index 509333551c4..9807d5bc88f 100644
--- a/src/btree/bt_discard.c
+++ b/src/btree/bt_discard.c
@@ -40,7 +40,6 @@ __wt_ref_out(WT_SESSION_IMPL *session, WT_REF *ref)
 void
 __wt_page_out(WT_SESSION_IMPL *session, WT_PAGE **pagep)
 {
-	WT_FH *fh;
 	WT_PAGE *page;
 	WT_PAGE_HEADER *dsk;
 	WT_PAGE_MODIFY *mod;
@@ -134,10 +133,11 @@ __wt_page_out(WT_SESSION_IMPL *session, WT_PAGE **pagep)
 	dsk = (WT_PAGE_HEADER *)page->dsk;
 	if (F_ISSET_ATOMIC(page, WT_PAGE_DISK_ALLOC))
 		__wt_overwrite_and_free_len(session, dsk, dsk->mem_size);
-	if (F_ISSET_ATOMIC(page, WT_PAGE_DISK_MAPPED)) {
-		fh = S2BT(session)->bm->block->fh;
-		(void)fh->fh_map_discard(session, fh, dsk, dsk->mem_size);
-	}
+
+	/* Discard any mapped image. */
+	if (F_ISSET_ATOMIC(page, WT_PAGE_DISK_MAPPED))
+		(void)S2BT(session)->bm->map_discard(
+		    S2BT(session)->bm, session, dsk, (size_t)dsk->mem_size);

 	__wt_overwrite_and_free(session, page);
 }
diff --git a/src/config/config_def.c b/src/config/config_def.c
index 3c0940bfc4c..c7bbdf50280 100644
--- a/src/config/config_def.c
+++ b/src/config/config_def.c
@@ -17,6 +17,7 @@ static const WT_CONFIG_CHECK confchk_WT_CONNECTION_close[] = {

 static const WT_CONFIG_CHECK confchk_WT_CONNECTION_load_extension[] = {
 	{ "config", "string", NULL, NULL, NULL, 0 },
+	{ "early_load", "boolean", NULL, NULL, NULL, 0 },
 	{ "entry", "string", NULL, NULL, NULL, 0 },
 	{ "terminate", "string", NULL, NULL, NULL, 0 },
 	{ NULL, NULL, NULL, NULL, NULL, 0 }
@@ -958,9 +959,9 @@ static const WT_CONFIG_ENTRY config_entries[] = {
 	  confchk_WT_CONNECTION_close, 1
 	},
 	{ "WT_CONNECTION.load_extension",
-	  "config=,entry=wiredtiger_extension_init,"
+	  "config=,early_load=0,entry=wiredtiger_extension_init,"
 	  "terminate=wiredtiger_extension_terminate",
-	  confchk_WT_CONNECTION_load_extension, 3
+	  confchk_WT_CONNECTION_load_extension, 4
 	},
 	{ "WT_CONNECTION.open_session",
 	  "isolation=read-committed",
@@ -982,6 +983,10 @@ static const WT_CONFIG_ENTRY config_entries[] = {
 	  "timestamp=\"%b %d %H:%M:%S\",wait=0),verbose=",
 	  confchk_WT_CONNECTION_reconfigure, 18
 	},
+	{ "WT_CONNECTION.set_file_system",
+	  "",
+	  NULL, 0
+	},
 	{ "WT_CURSOR.close",
 	  "",
 	  NULL, 0
diff --git a/src/conn/conn_api.c b/src/conn/conn_api.c
index 4efa853851e..18ad383ec74 100644
--- a/src/conn/conn_api.c
+++ b/src/conn/conn_api.c
@@ -806,6 +806,7 @@ static int
 __conn_load_default_extensions(WT_CONNECTION_IMPL *conn)
 {
 	WT_UNUSED(conn);
+
 #ifdef HAVE_BUILTIN_EXTENSION_SNAPPY
 	WT_RET(snappy_extension_init(&conn->iface, NULL));
 #endif
@@ -819,18 +820,16 @@ __conn_load_default_extensions(WT_CONNECTION_IMPL *conn)
 }

 /*
- * __conn_load_extension --
- *	WT_CONNECTION->load_extension method.
+ * __conn_load_extension_int --
+ *	Internal extension load interface
  */
 static int
-__conn_load_extension(
-    WT_CONNECTION *wt_conn, const char *path, const char *config)
+__conn_load_extension_int(WT_SESSION_IMPL *session,
+    const char *path, const char *cfg[], bool early_load)
 {
 	WT_CONFIG_ITEM cval;
-	WT_CONNECTION_IMPL *conn;
 	WT_DECL_RET;
 	WT_DLH *dlh;
-	WT_SESSION_IMPL *session;
 	int (*load)(WT_CONNECTION *, WT_CONFIG_ARG *);
 	bool is_local;
 	const char *init_name, *terminate_name;
@@ -839,8 +838,10 @@ __conn_load_extension(
 	init_name = terminate_name = NULL;
 	is_local = strcmp(path, "local") == 0;

-	conn = (WT_CONNECTION_IMPL *)wt_conn;
-	CONNECTION_API_CALL(conn, session, load_extension, config, cfg);
+	/* Ensure that the load matches the phase of startup we are in. */
+	WT_ERR(__wt_config_gets(session, cfg, "early_load", &cval));
+	if ((cval.val == 0 && early_load) || (cval.val != 0 && !early_load))
+		return (0);

 	/*
 	 * This assumes the underlying shared libraries are reference counted,
@@ -865,20 +866,39 @@ __conn_load_extension(
 	    __wt_dlsym(session, dlh, terminate_name, false, &dlh->terminate));

 	/* Call the load function last, it simplifies error handling. */
-	WT_ERR(load(wt_conn, (WT_CONFIG_ARG *)cfg));
+	WT_ERR(load(&S2C(session)->iface, (WT_CONFIG_ARG *)cfg));

 	/* Link onto the environment's list of open libraries. */
-	__wt_spin_lock(session, &conn->api_lock);
-	TAILQ_INSERT_TAIL(&conn->dlhqh, dlh, q);
-	__wt_spin_unlock(session, &conn->api_lock);
+	__wt_spin_lock(session, &S2C(session)->api_lock);
+	TAILQ_INSERT_TAIL(&S2C(session)->dlhqh, dlh, q);
+	__wt_spin_unlock(session, &S2C(session)->api_lock);
 	dlh = NULL;

 err:	if (dlh != NULL)
 		WT_TRET(__wt_dlclose(session, dlh));
 	__wt_free(session, init_name);
 	__wt_free(session, terminate_name);
+	return (ret);
+}

-	API_END_RET_NOTFOUND_MAP(session, ret);
+/*
+ * __conn_load_extension --
+ *	WT_CONNECTION->load_extension method.
+ */
+static int
+__conn_load_extension(
+    WT_CONNECTION *wt_conn, const char *path, const char *config)
+{
+	WT_CONNECTION_IMPL *conn;
+	WT_DECL_RET;
+	WT_SESSION_IMPL *session;
+
+	conn = (WT_CONNECTION_IMPL *)wt_conn;
+	CONNECTION_API_CALL(conn, session, load_extension, config, cfg);
+
+	ret = __conn_load_extension_int(session, path, cfg, false);
+
+err:	API_END_RET_NOTFOUND_MAP(session, ret);
 }

 /*
@@ -886,18 +906,16 @@ err:	if (dlh != NULL)
  *	Load the list of application-configured extensions.
  */
 static int
-__conn_load_extensions(WT_SESSION_IMPL *session, const char *cfg[])
+__conn_load_extensions(
+    WT_SESSION_IMPL *session, const char *cfg[], bool early_load)
 {
 	WT_CONFIG subconfig;
 	WT_CONFIG_ITEM cval, skey, sval;
-	WT_CONNECTION_IMPL *conn;
 	WT_DECL_ITEM(exconfig);
 	WT_DECL_ITEM(expath);
 	WT_DECL_RET;
-
-	conn = S2C(session);
-
-	WT_ERR(__conn_load_default_extensions(conn));
+	const char *sub_cfg[] = {
+	    WT_CONFIG_BASE(session, WT_CONNECTION_load_extension), NULL, NULL };

 	WT_ERR(__wt_config_gets(session, cfg, "extensions", &cval));
 	WT_ERR(__wt_config_subinit(session, &subconfig, &cval));
@@ -912,8 +930,9 @@ __conn_load_extensions(WT_SESSION_IMPL *session, const char *cfg[])
 			WT_ERR(__wt_buf_fmt(session,
 			    exconfig, "%.*s", (int)sval.len, sval.str));
 		}
-		WT_ERR(conn->iface.load_extension(&conn->iface,
-		    expath->data, (sval.len > 0) ? exconfig->data : NULL));
+		sub_cfg[1] = sval.len > 0 ? exconfig->data : NULL;
+		WT_ERR(__conn_load_extension_int(
+		    session, expath->data, sub_cfg, early_load));
 	}
 	WT_ERR_NOTFOUND_OK(ret);

@@ -1192,12 +1211,12 @@ __conn_config_file(WT_SESSION_IMPL *session,
 	fh = NULL;

 	/* Configuration files are always optional. */
-	WT_RET(__wt_exist(session, filename, &exist));
+	WT_RET(__wt_fs_exist(session, filename, &exist));
 	if (!exist)
 		return (0);

 	/* Open the configuration file. */
-	WT_RET(__wt_open(session, filename, WT_FILE_TYPE_REGULAR, 0, &fh));
+	WT_RET(__wt_open(session, filename, WT_OPEN_FILE_TYPE_REGULAR, 0, &fh));
 	WT_ERR(__wt_filesize(session, fh, &size));
 	if (size == 0)
 		goto err;
@@ -1488,8 +1507,8 @@ __conn_single(WT_SESSION_IMPL *session, const char *cfg[])
 	 */
 	exist = false;
 	if (!is_create)
-		WT_ERR(__wt_exist(session, WT_WIREDTIGER, &exist));
-	ret = __wt_open(session, WT_SINGLETHREAD, WT_FILE_TYPE_REGULAR,
+		WT_ERR(__wt_fs_exist(session, WT_WIREDTIGER, &exist));
+	ret = __wt_open(session, WT_SINGLETHREAD, WT_OPEN_FILE_TYPE_REGULAR,
 	    is_create || exist ? WT_OPEN_CREATE : 0, &conn->lock_fh);

 	/*
@@ -1545,7 +1564,7 @@ __conn_single(WT_SESSION_IMPL *session, const char *cfg[])

 	/* We own the lock file, optionally create the WiredTiger file. */
 	ret = __wt_open(session, WT_WIREDTIGER,
-	    WT_FILE_TYPE_REGULAR, is_create ? WT_OPEN_CREATE : 0, &fh);
+	    WT_OPEN_FILE_TYPE_REGULAR, is_create ? WT_OPEN_CREATE : 0, &fh);

 	/*
 	 * If we're read-only, check for success as well as handled errors.
@@ -1582,7 +1601,7 @@ __conn_single(WT_SESSION_IMPL *session, const char *cfg[])
 	 * and there's never a database home after that point without a turtle
 	 * file. If the turtle file doesn't exist, it's a create.
 	 */
-	WT_ERR(__wt_exist(session, WT_METADATA_TURTLE, &exist));
+	WT_ERR(__wt_fs_exist(session, WT_METADATA_TURTLE, &exist));
 	conn->is_new = exist ? 0 : 1;

 	if (conn->is_new) {
@@ -1789,7 +1808,7 @@ __conn_write_base_config(WT_SESSION_IMPL *session, const char *cfg[])
 	 * only NOT exist if we crashed before it was created; in other words,
 	 * if the base configuration file exists, we're done.
 	 */
-	WT_RET(__wt_exist(session, WT_BASECONFIG, &exist));
+	WT_RET(__wt_fs_exist(session, WT_BASECONFIG, &exist));
 	if (exist)
 		return (0);

@@ -1864,6 +1883,57 @@ err:	WT_TRET(__wt_fclose(session, &fs));
 }

 /*
+ * __conn_set_file_system --
+ *	Configure a custom file system implementation on database open.
+ */
+static int
+__conn_set_file_system(
+    WT_CONNECTION *wt_conn, WT_FILE_SYSTEM *file_system, const char *config)
+{
+	WT_CONNECTION_IMPL *conn;
+	WT_DECL_RET;
+	WT_SESSION_IMPL *session;
+
+	conn = (WT_CONNECTION_IMPL *)wt_conn;
+	CONNECTION_API_CALL(conn, session, set_file_system, config, cfg);
+	WT_UNUSED(cfg);
+
+	conn->file_system = file_system;
+
+err:	API_END_RET(session, ret);
+}
+
+/*
+ * __conn_chk_file_system --
+ *	Check the configured file system.
+ */ +static int +__conn_chk_file_system(WT_SESSION_IMPL *session, bool readonly) +{ + WT_CONNECTION_IMPL *conn; + + conn = S2C(session); + +#define WT_CONN_SET_FILE_SYSTEM_REQ(name) \ + if (conn->file_system->name == NULL) \ + WT_RET_MSG(session, EINVAL, \ + "a WT_FILE_SYSTEM.%s method must be configured", #name) + + WT_CONN_SET_FILE_SYSTEM_REQ(directory_list); + WT_CONN_SET_FILE_SYSTEM_REQ(directory_list_free); + /* not required: directory_sync */ + WT_CONN_SET_FILE_SYSTEM_REQ(exist); + WT_CONN_SET_FILE_SYSTEM_REQ(open_file); + if (!readonly) { + WT_CONN_SET_FILE_SYSTEM_REQ(remove); + WT_CONN_SET_FILE_SYSTEM_REQ(rename); + } + WT_CONN_SET_FILE_SYSTEM_REQ(size); + + return (0); +} + +/* * wiredtiger_open -- * Main library entry point: open a new connection to a WiredTiger * database. @@ -1887,12 +1957,13 @@ wiredtiger_open(const char *home, WT_EVENT_HANDLER *event_handler, __conn_add_compressor, __conn_add_encryptor, __conn_add_extractor, + __conn_set_file_system, __conn_get_extension_api }; static const WT_NAME_FLAG file_types[] = { - { "checkpoint", WT_FILE_TYPE_CHECKPOINT }, - { "data", WT_FILE_TYPE_DATA }, - { "log", WT_FILE_TYPE_LOG }, + { "checkpoint", WT_DIRECT_IO_CHECKPOINT }, + { "data", WT_DIRECT_IO_DATA }, + { "log", WT_DIRECT_IO_LOG }, { NULL, 0 } }; @@ -1982,10 +2053,27 @@ wiredtiger_open(const char *home, WT_EVENT_HANDLER *event_handler, F_SET(conn, WT_CONN_READONLY); /* - * After checking readonly and in-memory, but before we do anything that - * touches the filesystem, configure the OS layer. + * Load early extensions before doing further initialization (one early + * extension is to configure a file system). */ - WT_ERR(__wt_os_init(session)); + WT_ERR(__conn_load_extensions(session, cfg, true)); + + /* + * If the application didn't configure its own file system, configure + * one of ours. Check to ensure we have a valid file system. 
+ */ + if (conn->file_system == NULL) { + if (F_ISSET(conn, WT_CONN_IN_MEMORY)) + WT_ERR(__wt_os_inmemory(session)); + else +#if defined(_MSC_VER) + WT_ERR(__wt_os_win(session)); +#else + WT_ERR(__wt_os_posix(session)); +#endif + } + WT_ERR( + __conn_chk_file_system(session, F_ISSET(conn, WT_CONN_READONLY))); /* * Capture the config_base setting file for later use. Again, if the @@ -2118,8 +2206,8 @@ wiredtiger_open(const char *home, WT_EVENT_HANDLER *event_handler, if (ret == 0) { if (sval.val) FLD_SET(conn->direct_io, ft->flag); - } else if (ret != WT_NOTFOUND) - goto err; + } else + WT_ERR_NOTFOUND_OK(ret); } WT_ERR(__wt_config_gets(session, cfg, "write_through", &cval)); @@ -2128,8 +2216,8 @@ wiredtiger_open(const char *home, WT_EVENT_HANDLER *event_handler, if (ret == 0) { if (sval.val) FLD_SET(conn->write_through, ft->flag); - } else if (ret != WT_NOTFOUND) - goto err; + } else + WT_ERR_NOTFOUND_OK(ret); } /* @@ -2153,15 +2241,15 @@ wiredtiger_open(const char *home, WT_EVENT_HANDLER *event_handler, ret = __wt_config_subgets(session, &cval, ft->name, &sval); if (ret == 0) { switch (ft->flag) { - case WT_FILE_TYPE_DATA: + case WT_DIRECT_IO_DATA: conn->data_extend_len = sval.val; break; - case WT_FILE_TYPE_LOG: + case WT_DIRECT_IO_LOG: conn->log_extend_len = sval.val; break; } - } else if (ret != WT_NOTFOUND) - goto err; + } else + WT_ERR_NOTFOUND_OK(ret); } WT_ERR(__wt_config_gets(session, cfg, "mmap", &cval)); @@ -2190,7 +2278,8 @@ wiredtiger_open(const char *home, WT_EVENT_HANDLER *event_handler, * everything else to be in place, and the extensions call back into the * library. 
*/ - WT_ERR(__conn_load_extensions(session, cfg)); + WT_ERR(__conn_load_default_extensions(conn)); + WT_ERR(__conn_load_extensions(session, cfg, false)); /* * The metadata/log encryptor is configured after extensions, since diff --git a/src/conn/conn_handle.c b/src/conn/conn_handle.c index 5f4c38e7361..509966793e5 100644 --- a/src/conn/conn_handle.c +++ b/src/conn/conn_handle.c @@ -149,15 +149,17 @@ __wt_connection_destroy(WT_CONNECTION_IMPL *conn) __wt_spin_destroy(session, &conn->page_lock[i]); __wt_free(session, conn->page_lock); + /* Destroy the file-system configuration. */ + if (conn->file_system != NULL && conn->file_system->terminate != NULL) + WT_TRET(conn->file_system->terminate( + conn->file_system, (WT_SESSION *)session)); + /* Free allocated memory. */ __wt_free(session, conn->cfg); __wt_free(session, conn->home); __wt_free(session, conn->error_prefix); __wt_free(session, conn->sessions); - /* Destroy the OS configuration. */ - WT_TRET(__wt_os_cleanup(session)); - __wt_free(NULL, conn); return (ret); } diff --git a/src/conn/conn_log.c b/src/conn/conn_log.c index 672071b59bf..394378b65fc 100644 --- a/src/conn/conn_log.c +++ b/src/conn/conn_log.c @@ -178,6 +178,7 @@ __log_archive_once(WT_SESSION_IMPL *session, uint32_t backup_file) conn = S2C(session); log = conn->log; logcount = 0; + locked = false; logfiles = NULL; /* @@ -198,14 +199,14 @@ __log_archive_once(WT_SESSION_IMPL *session, uint32_t backup_file) * Main archive code. Get the list of all log files and * remove any earlier than the minimum log number. */ - WT_RET(__wt_dirlist(session, conn->log_path, - WT_LOG_FILENAME, WT_DIRLIST_INCLUDE, &logfiles, &logcount)); + WT_ERR(__wt_fs_directory_list( + session, conn->log_path, WT_LOG_FILENAME, &logfiles, &logcount)); /* * We can only archive files if a hot backup is not in progress or * if we are the backup. 
*/ - WT_RET(__wt_readlock(session, conn->hot_backup_lock)); + WT_ERR(__wt_readlock(session, conn->hot_backup_lock)); locked = true; if (!conn->hot_backup || backup_file != 0) { for (i = 0; i < logcount; i++) { @@ -218,9 +219,6 @@ __log_archive_once(WT_SESSION_IMPL *session, uint32_t backup_file) } WT_ERR(__wt_readunlock(session, conn->hot_backup_lock)); locked = false; - __wt_log_files_free(session, logfiles, logcount); - logfiles = NULL; - logcount = 0; /* * Indicate what is our new earliest LSN. It is the start @@ -232,8 +230,7 @@ __log_archive_once(WT_SESSION_IMPL *session, uint32_t backup_file) err: __wt_err(session, ret, "log archive server error"); if (locked) WT_TRET(__wt_readunlock(session, conn->hot_backup_lock)); - if (logfiles != NULL) - __wt_log_files_free(session, logfiles, logcount); + WT_TRET(__wt_fs_directory_list_free(session, &logfiles, &logcount)); return (ret); } @@ -259,10 +256,10 @@ __log_prealloc_once(WT_SESSION_IMPL *session) * Allocate up to the maximum number, accounting for any existing * files that may not have been used yet. */ - WT_ERR(__wt_dirlist(session, conn->log_path, - WT_LOG_PREPNAME, WT_DIRLIST_INCLUDE, &recfiles, &reccount)); - __wt_log_files_free(session, recfiles, reccount); - recfiles = NULL; + WT_ERR(__wt_fs_directory_list( + session, conn->log_path, WT_LOG_PREPNAME, &recfiles, &reccount)); + WT_ERR(__wt_fs_directory_list_free(session, &recfiles, &reccount)); + /* * Adjust the number of files to pre-allocate if we find that * the critical path had to allocate them since we last ran. 
@@ -292,8 +289,7 @@ __log_prealloc_once(WT_SESSION_IMPL *session) if (0) err: __wt_err(session, ret, "log pre-alloc server error"); - if (recfiles != NULL) - __wt_log_files_free(session, recfiles, reccount); + WT_TRET(__wt_fs_directory_list_free(session, &recfiles, &reccount)); return (ret); } @@ -868,9 +864,9 @@ __wt_logmgr_create(WT_SESSION_IMPL *session, const char *cfg[]) "log write LSN")); WT_RET(__wt_rwlock_alloc(session, &log->log_archive_lock, "log archive lock")); - if (FLD_ISSET(conn->direct_io, WT_FILE_TYPE_LOG)) - log->allocsize = - WT_MAX((uint32_t)conn->buffer_alignment, WT_LOG_ALIGN); + if (FLD_ISSET(conn->direct_io, WT_DIRECT_IO_LOG)) + log->allocsize = (uint32_t) + WT_MAX(conn->buffer_alignment, WT_LOG_ALIGN); else log->allocsize = WT_LOG_ALIGN; WT_INIT_LSN(&log->alloc_lsn); diff --git a/src/cursor/cur_backup.c b/src/cursor/cur_backup.c index c89f002fa75..b901b5a0869 100644 --- a/src/cursor/cur_backup.c +++ b/src/cursor/cur_backup.c @@ -178,8 +178,7 @@ __backup_log_append(WT_SESSION_IMPL *session, WT_CURSOR_BACKUP *cb, bool active) for (i = 0; i < logcount; i++) WT_ERR(__backup_list_append(session, cb, logfiles[i])); } -err: if (logfiles != NULL) - __wt_log_files_free(session, logfiles, logcount); +err: WT_TRET(__wt_fs_directory_list_free(session, &logfiles, &logcount)); return (ret); } @@ -257,11 +256,11 @@ __backup_start( session, cb, WT_INCREMENTAL_BACKUP)); } else { WT_ERR(__backup_list_append(session, cb, WT_METADATA_BACKUP)); - WT_ERR(__wt_exist(session, WT_BASECONFIG, &exist)); + WT_ERR(__wt_fs_exist(session, WT_BASECONFIG, &exist)); if (exist) WT_ERR(__backup_list_append( session, cb, WT_BASECONFIG)); - WT_ERR(__wt_exist(session, WT_USERCONFIG, &exist)); + WT_ERR(__wt_fs_exist(session, WT_USERCONFIG, &exist)); if (exist) WT_ERR(__backup_list_append( session, cb, WT_USERCONFIG)); diff --git a/src/cursor/cur_join.c b/src/cursor/cur_join.c index 93c1711ef93..8bf7007527b 100644 --- a/src/cursor/cur_join.c +++ b/src/cursor/cur_join.c @@ -211,9 
+211,8 @@ err: __wt_free(session, uri); /* * __curjoin_iter_bump -- - * Called to advance the iterator to the next endpoint, - * which may in turn advance to the next entry. - * + * Called to advance the iterator to the next endpoint, which may in turn + * advance to the next entry. */ static int __curjoin_iter_bump(WT_CURSOR_JOIN_ITER *iter) diff --git a/src/docs/Doxyfile b/src/docs/Doxyfile index 4c1682de6eb..69e9716b425 100644 --- a/src/docs/Doxyfile +++ b/src/docs/Doxyfile @@ -1570,6 +1570,8 @@ PREDEFINED = DOXYGEN \ __wt_event_handler:=WT_EVENT_HANDLER \ __wt_extension_api:=WT_EXTENSION_API \ __wt_extractor:=WT_EXTRACTOR \ + __wt_file_handle:=WT_FILE_HANDLE \ + __wt_file_system:=WT_FILE_SYSTEM \ __wt_item:=WT_ITEM \ __wt_lsn:=WT_LSN \ __wt_session:=WT_SESSION \ diff --git a/src/docs/custom-file-systems.dox b/src/docs/custom-file-systems.dox new file mode 100644 index 00000000000..4b012952e15 --- /dev/null +++ b/src/docs/custom-file-systems.dox @@ -0,0 +1,25 @@ +/*! @page custom_file_systems Custom File Systems + +Applications can provide a custom file system implementation that will be +used by WiredTiger to interact with the I/O subsystem using the +WT_FILE_SYSTEM and WT_FILE_HANDLE interfaces. + +It is not necessary for all file system providers to implement all methods +in the WT_FILE_SYSTEM and WT_FILE_HANDLE structures. The documentation for +those structures indicate which methods are optional. Methods which are not +provided should be set to NULL. Generally the function pointers should not +be changed once a handle is created. There is one exception to this, which +are the fallocate and fallocate_nolock - for an example of how fallocate +can be changed after create see the WiredTiger POSIX file system +implementation. + +WT_FILE_SYSTEM and WT_FILE_HANDLE methods which fail but not fatally +(for example, a file truncation call which fails because the file is +currently mapped into memory), should return EBUSY. 
+ +Unless explicitly stated otherwise, WiredTiger may invoke methods on the +WT_FILE_SYSTEM and WT_FILE_HANDLE interfaces from multiple threads +concurrently. It is the responsibility of the implementation to protect +any shared data. + +*/ diff --git a/src/docs/examples.dox b/src/docs/examples.dox index 3ed7357b52c..9e3e6844da4 100644 --- a/src/docs/examples.dox +++ b/src/docs/examples.dox @@ -55,4 +55,7 @@ Shows how to access the database log files. @example ex_thread.c Shows how to access a database with multiple threads. +@example ex_file_system.c +Shows how to extend WiredTiger with a custom file-system implementation. + */ diff --git a/src/docs/programming.dox b/src/docs/programming.dox index 7ec68ca9b46..81e612e8ee8 100644 --- a/src/docs/programming.dox +++ b/src/docs/programming.dox @@ -56,6 +56,7 @@ each of which is ordered by one or more columns. - @subpage custom_collators - @subpage custom_extractors - @subpage custom_data_sources +- @subpage custom_file_systems - @subpage helium @m_endif diff --git a/src/docs/spell.ok b/src/docs/spell.ok index 965d28f2ec6..d197b5517f2 100644 --- a/src/docs/spell.ok +++ b/src/docs/spell.ok @@ -25,6 +25,7 @@ EBUSY ECMA EINVAL ENCRYPTOR +ENOTSUP EmpId Encryptors Facebook @@ -209,6 +210,7 @@ erlang errno exe fadvise +fallocate failchk fd's fdatasync @@ -333,6 +335,7 @@ nocase nocasecoll nodup noflush +nolock nolocking nommap nop diff --git a/src/include/block.h b/src/include/block.h index e964fb4e8c2..9f652ceddb9 100644 --- a/src/include/block.h +++ b/src/include/block.h @@ -174,6 +174,7 @@ struct __wt_bm { int (*compact_start)(WT_BM *, WT_SESSION_IMPL *); int (*free)(WT_BM *, WT_SESSION_IMPL *, const uint8_t *, size_t); bool (*is_mapped)(WT_BM *, WT_SESSION_IMPL *); + int (*map_discard)(WT_BM *, WT_SESSION_IMPL *, void *, size_t); int (*preload)(WT_BM *, WT_SESSION_IMPL *, const uint8_t *, size_t); int (*read) (WT_BM *, WT_SESSION_IMPL *, WT_ITEM *, const uint8_t *, size_t); @@ -196,9 +197,9 @@ struct __wt_bm { WT_BLOCK 
*block; /* Underlying file */ - void *map; /* Mapped region */ - size_t maplen; - void *mappingcookie; + void *map; /* Mapped region */ + size_t maplen; + void *mapped_cookie; /* * There's only a single block manager handle that can be written, all @@ -224,8 +225,6 @@ struct __wt_block { wt_off_t size; /* File size */ wt_off_t extend_size; /* File extended size */ wt_off_t extend_len; /* File extend chunk size */ - bool nowait_sync_available; /* File can flush asynchronously */ - bool preload_available; /* File pages can be preloaded */ /* Configuration information, set when the file is opened. */ uint32_t allocfirst; /* Allocation is first-fit */ diff --git a/src/include/config.h b/src/include/config.h index 48a255134af..486aa50e86c 100644 --- a/src/include/config.h +++ b/src/include/config.h @@ -59,41 +59,42 @@ struct __wt_config_parser_impl { #define WT_CONFIG_ENTRY_WT_CONNECTION_load_extension 7 #define WT_CONFIG_ENTRY_WT_CONNECTION_open_session 8 #define WT_CONFIG_ENTRY_WT_CONNECTION_reconfigure 9 -#define WT_CONFIG_ENTRY_WT_CURSOR_close 10 -#define WT_CONFIG_ENTRY_WT_CURSOR_reconfigure 11 -#define WT_CONFIG_ENTRY_WT_SESSION_begin_transaction 12 -#define WT_CONFIG_ENTRY_WT_SESSION_checkpoint 13 -#define WT_CONFIG_ENTRY_WT_SESSION_close 14 -#define WT_CONFIG_ENTRY_WT_SESSION_commit_transaction 15 -#define WT_CONFIG_ENTRY_WT_SESSION_compact 16 -#define WT_CONFIG_ENTRY_WT_SESSION_create 17 -#define WT_CONFIG_ENTRY_WT_SESSION_drop 18 -#define WT_CONFIG_ENTRY_WT_SESSION_join 19 -#define WT_CONFIG_ENTRY_WT_SESSION_log_flush 20 -#define WT_CONFIG_ENTRY_WT_SESSION_log_printf 21 -#define WT_CONFIG_ENTRY_WT_SESSION_open_cursor 22 -#define WT_CONFIG_ENTRY_WT_SESSION_rebalance 23 -#define WT_CONFIG_ENTRY_WT_SESSION_reconfigure 24 -#define WT_CONFIG_ENTRY_WT_SESSION_rename 25 -#define WT_CONFIG_ENTRY_WT_SESSION_reset 26 -#define WT_CONFIG_ENTRY_WT_SESSION_rollback_transaction 27 -#define WT_CONFIG_ENTRY_WT_SESSION_salvage 28 -#define WT_CONFIG_ENTRY_WT_SESSION_snapshot 29 
-#define WT_CONFIG_ENTRY_WT_SESSION_strerror 30 -#define WT_CONFIG_ENTRY_WT_SESSION_transaction_sync 31 -#define WT_CONFIG_ENTRY_WT_SESSION_truncate 32 -#define WT_CONFIG_ENTRY_WT_SESSION_upgrade 33 -#define WT_CONFIG_ENTRY_WT_SESSION_verify 34 -#define WT_CONFIG_ENTRY_colgroup_meta 35 -#define WT_CONFIG_ENTRY_file_config 36 -#define WT_CONFIG_ENTRY_file_meta 37 -#define WT_CONFIG_ENTRY_index_meta 38 -#define WT_CONFIG_ENTRY_lsm_meta 39 -#define WT_CONFIG_ENTRY_table_meta 40 -#define WT_CONFIG_ENTRY_wiredtiger_open 41 -#define WT_CONFIG_ENTRY_wiredtiger_open_all 42 -#define WT_CONFIG_ENTRY_wiredtiger_open_basecfg 43 -#define WT_CONFIG_ENTRY_wiredtiger_open_usercfg 44 +#define WT_CONFIG_ENTRY_WT_CONNECTION_set_file_system 10 +#define WT_CONFIG_ENTRY_WT_CURSOR_close 11 +#define WT_CONFIG_ENTRY_WT_CURSOR_reconfigure 12 +#define WT_CONFIG_ENTRY_WT_SESSION_begin_transaction 13 +#define WT_CONFIG_ENTRY_WT_SESSION_checkpoint 14 +#define WT_CONFIG_ENTRY_WT_SESSION_close 15 +#define WT_CONFIG_ENTRY_WT_SESSION_commit_transaction 16 +#define WT_CONFIG_ENTRY_WT_SESSION_compact 17 +#define WT_CONFIG_ENTRY_WT_SESSION_create 18 +#define WT_CONFIG_ENTRY_WT_SESSION_drop 19 +#define WT_CONFIG_ENTRY_WT_SESSION_join 20 +#define WT_CONFIG_ENTRY_WT_SESSION_log_flush 21 +#define WT_CONFIG_ENTRY_WT_SESSION_log_printf 22 +#define WT_CONFIG_ENTRY_WT_SESSION_open_cursor 23 +#define WT_CONFIG_ENTRY_WT_SESSION_rebalance 24 +#define WT_CONFIG_ENTRY_WT_SESSION_reconfigure 25 +#define WT_CONFIG_ENTRY_WT_SESSION_rename 26 +#define WT_CONFIG_ENTRY_WT_SESSION_reset 27 +#define WT_CONFIG_ENTRY_WT_SESSION_rollback_transaction 28 +#define WT_CONFIG_ENTRY_WT_SESSION_salvage 29 +#define WT_CONFIG_ENTRY_WT_SESSION_snapshot 30 +#define WT_CONFIG_ENTRY_WT_SESSION_strerror 31 +#define WT_CONFIG_ENTRY_WT_SESSION_transaction_sync 32 +#define WT_CONFIG_ENTRY_WT_SESSION_truncate 33 +#define WT_CONFIG_ENTRY_WT_SESSION_upgrade 34 +#define WT_CONFIG_ENTRY_WT_SESSION_verify 35 +#define WT_CONFIG_ENTRY_colgroup_meta 
36 +#define WT_CONFIG_ENTRY_file_config 37 +#define WT_CONFIG_ENTRY_file_meta 38 +#define WT_CONFIG_ENTRY_index_meta 39 +#define WT_CONFIG_ENTRY_lsm_meta 40 +#define WT_CONFIG_ENTRY_table_meta 41 +#define WT_CONFIG_ENTRY_wiredtiger_open 42 +#define WT_CONFIG_ENTRY_wiredtiger_open_all 43 +#define WT_CONFIG_ENTRY_wiredtiger_open_basecfg 44 +#define WT_CONFIG_ENTRY_wiredtiger_open_usercfg 45 /* * configuration section: END * DO NOT EDIT: automatically built by dist/flags.py. diff --git a/src/include/connection.h b/src/include/connection.h index 5023fb1872a..81229315c48 100644 --- a/src/include/connection.h +++ b/src/include/connection.h @@ -414,32 +414,26 @@ struct __wt_connection_impl { wt_off_t data_extend_len; /* file_extend data length */ wt_off_t log_extend_len; /* file_extend log length */ - /* O_DIRECT/FILE_FLAG_NO_BUFFERING file type flags */ - uint32_t direct_io; - uint32_t write_through; /* FILE_FLAG_WRITE_THROUGH type flags */ +#define WT_DIRECT_IO_CHECKPOINT 0x01 /* Checkpoints */ +#define WT_DIRECT_IO_DATA 0x02 /* Data files */ +#define WT_DIRECT_IO_LOG 0x04 /* Log files */ + uint32_t direct_io; /* O_DIRECT, FILE_FLAG_NO_BUFFERING */ + + uint32_t write_through; /* FILE_FLAG_WRITE_THROUGH */ + bool mmap; /* mmap configuration */ int page_size; /* OS page size for mmap alignment */ uint32_t verbose; - void *inmemory; /* In-memory configuration cookie */ - #define WT_STDERR(s) (&S2C(s)->wt_stderr) #define WT_STDOUT(s) (&S2C(s)->wt_stdout) WT_FSTREAM wt_stderr, wt_stdout; /* - * OS library/system call jump table, to support in-memory and readonly - * configurations as well as special devices with other non-POSIX APIs. + * File system interface abstracted to support alternative file system + * implementations. 
*/ - int (*file_directory_list)(WT_SESSION_IMPL *, - const char *, const char *, uint32_t, char ***, u_int *); - int (*file_directory_sync)(WT_SESSION_IMPL *, const char *); - int (*file_exist)(WT_SESSION_IMPL *, const char *, bool *); - int (*file_remove)(WT_SESSION_IMPL *, const char *); - int (*file_rename)(WT_SESSION_IMPL *, const char *, const char *); - int (*file_size)(WT_SESSION_IMPL *, const char *, bool, wt_off_t *); - int (*file_open)(WT_SESSION_IMPL *, - WT_FH *, const char *, uint32_t, uint32_t); + WT_FILE_SYSTEM *file_system; uint32_t flags; }; diff --git a/src/include/extern.h b/src/include/extern.h index ae82424078d..22346698574 100644 --- a/src/include/extern.h +++ b/src/include/extern.h @@ -41,8 +41,8 @@ extern int __wt_block_extlist_write(WT_SESSION_IMPL *session, WT_BLOCK *block, W extern int __wt_block_extlist_truncate( WT_SESSION_IMPL *session, WT_BLOCK *block, WT_EXTLIST *el); extern int __wt_block_extlist_init(WT_SESSION_IMPL *session, WT_EXTLIST *el, const char *name, const char *extname, bool track_size); extern void __wt_block_extlist_free(WT_SESSION_IMPL *session, WT_EXTLIST *el); -extern int __wt_block_map( WT_SESSION_IMPL *session, WT_BLOCK *block, void *mapp, size_t *maplenp, void **mappingcookie); -extern int __wt_block_unmap( WT_SESSION_IMPL *session, WT_BLOCK *block, void *map, size_t maplen, void **mappingcookie); +extern int __wt_block_map(WT_SESSION_IMPL *session, WT_BLOCK *block, void *mapped_regionp, size_t *lengthp, void *mapped_cookiep); +extern int __wt_block_unmap(WT_SESSION_IMPL *session, WT_BLOCK *block, void *mapped_region, size_t length, void *mapped_cookie); extern int __wt_block_manager_open(WT_SESSION_IMPL *session, const char *filename, const char *cfg[], bool forced_salvage, bool readonly, uint32_t allocsize, WT_BM **bmp); extern int __wt_block_manager_drop(WT_SESSION_IMPL *session, const char *filename); extern int __wt_block_manager_create( WT_SESSION_IMPL *session, const char *filename, uint32_t allocsize); @@ 
-356,7 +356,6 @@ extern int __wt_log_force_sync(WT_SESSION_IMPL *session, WT_LSN *min_lsn); extern int __wt_log_needs_recovery(WT_SESSION_IMPL *session, WT_LSN *ckp_lsn, bool *recp); extern void __wt_log_written_reset(WT_SESSION_IMPL *session); extern int __wt_log_get_all_files(WT_SESSION_IMPL *session, char ***filesp, u_int *countp, uint32_t *maxid, bool active_only); -extern void __wt_log_files_free(WT_SESSION_IMPL *session, char **files, u_int count); extern int __wt_log_extract_lognum( WT_SESSION_IMPL *session, const char *name, uint32_t *id); extern int __wt_log_acquire(WT_SESSION_IMPL *session, uint64_t recsize, WT_LOGSLOT *slot); extern int __wt_log_allocfile( WT_SESSION_IMPL *session, uint32_t lognum, const char *dest); @@ -713,7 +712,7 @@ extern int __wt_txn_named_snapshot_config(WT_SESSION_IMPL *session, const char * extern int __wt_txn_named_snapshot_destroy(WT_SESSION_IMPL *session); extern int __wt_txn_recover(WT_SESSION_IMPL *session); extern bool __wt_absolute_path(const char *path); -extern bool __wt_handle_search(WT_SESSION_IMPL *session, const char *name, bool increment_ref, WT_FH *newfh, WT_FH **fhp); +extern bool __wt_handle_is_open(WT_SESSION_IMPL *session, const char *name); extern bool __wt_has_priv(void); extern const char *__wt_path_separator(void); extern const char *__wt_strerror(WT_SESSION_IMPL *session, int error, char *errbuf, size_t errlen); @@ -740,22 +739,18 @@ extern int __wt_malloc(WT_SESSION_IMPL *session, size_t bytes_to_allocate, void extern int __wt_map_error_rdonly(int error); extern int __wt_nfilename( WT_SESSION_IMPL *session, const char *name, size_t namelen, char **path); extern int __wt_once(void (*init_routine)(void)); -extern int __wt_open(WT_SESSION_IMPL *session, const char *name, uint32_t file_type, uint32_t flags, WT_FH **fhp); -extern int __wt_os_cleanup(WT_SESSION_IMPL *session); -extern int __wt_os_init(WT_SESSION_IMPL *session); +extern int __wt_open(WT_SESSION_IMPL *session, const char *name, WT_OPEN_FILE_TYPE 
file_type, u_int flags, WT_FH **fhp); extern int __wt_os_inmemory(WT_SESSION_IMPL *session); -extern int __wt_os_inmemory_cleanup(WT_SESSION_IMPL *session); extern int __wt_os_posix(WT_SESSION_IMPL *session); -extern int __wt_os_posix_cleanup(WT_SESSION_IMPL *session); extern int __wt_os_stdio(WT_SESSION_IMPL *session); extern int __wt_os_win(WT_SESSION_IMPL *session); -extern int __wt_os_win_cleanup(WT_SESSION_IMPL *session); -extern int __wt_posix_directory_list(WT_SESSION_IMPL *session, const char *dir, const char *prefix, uint32_t flags, char ***dirlist, u_int *countp); -extern int __wt_posix_file_allocate( WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t offset, wt_off_t len); -extern int __wt_posix_map(WT_SESSION_IMPL *session, WT_FH *fh, void *mapp, size_t *lenp, void **mappingcookie); -extern int __wt_posix_map_discard( WT_SESSION_IMPL *session, WT_FH *fh, void *p, size_t size); -extern int __wt_posix_map_preload( WT_SESSION_IMPL *session, WT_FH *fh, const void *p, size_t size); -extern int __wt_posix_map_unmap(WT_SESSION_IMPL *session, WT_FH *fh, void *map, size_t len, void **mappingcookie); +extern int __wt_posix_directory_list(WT_FILE_SYSTEM *file_system, WT_SESSION *wt_session, const char *directory, const char *prefix, char ***dirlistp, uint32_t *countp); +extern int __wt_posix_directory_list_free(WT_FILE_SYSTEM *file_system, WT_SESSION *wt_session, char **dirlist, uint32_t count); +extern int __wt_posix_file_fallocate(WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session, wt_off_t offset, wt_off_t len); +extern int __wt_posix_map(WT_FILE_HANDLE *fh, WT_SESSION *wt_session, void *mapped_regionp, size_t *lenp, void *mapped_cookiep); +extern int __wt_posix_map_discard(WT_FILE_HANDLE *fh, WT_SESSION *wt_session, void *map, size_t length, void *mapped_cookie); +extern int __wt_posix_map_preload(WT_FILE_HANDLE *fh, WT_SESSION *wt_session, const void *map, size_t length, void *mapped_cookie); +extern int __wt_posix_unmap(WT_FILE_HANDLE *fh, WT_SESSION 
*wt_session, void *mapped_region, size_t len, void *mapped_cookie); extern int __wt_realloc(WT_SESSION_IMPL *session, size_t *bytes_allocated_ret, size_t bytes_to_allocate, void *retp); extern int __wt_realloc_aligned(WT_SESSION_IMPL *session, size_t *bytes_allocated_ret, size_t bytes_to_allocate, void *retp); extern int __wt_realloc_noclear(WT_SESSION_IMPL *session, size_t *bytes_allocated_ret, size_t bytes_to_allocate, void *retp); @@ -764,15 +759,14 @@ extern int __wt_rename_and_sync_directory( WT_SESSION_IMPL *session, const char extern int __wt_strndup(WT_SESSION_IMPL *session, const void *str, size_t len, void *retp); extern int __wt_thread_create(WT_SESSION_IMPL *session, wt_thread_t *tidret, WT_THREAD_CALLBACK(*func)(void *), void *arg); extern int __wt_thread_join(WT_SESSION_IMPL *session, wt_thread_t tid); -extern int __wt_win_directory_list(WT_SESSION_IMPL *session, const char *dir, const char *prefix, uint32_t flags, char ***dirlist, u_int *countp); -extern int __wt_win_map(WT_SESSION_IMPL *session, WT_FH *fh, void *mapp, size_t *lenp, void **mappingcookie); -extern int __wt_win_map_discard(WT_SESSION_IMPL *session, WT_FH *fh, void *p, size_t size); -extern int __wt_win_map_preload( WT_SESSION_IMPL *session, WT_FH *fh, const void *p, size_t size); -extern int __wt_win_map_unmap(WT_SESSION_IMPL *session, WT_FH *fh, void *map, size_t len, void **mappingcookie); +extern int __wt_win_directory_list(WT_FILE_SYSTEM *file_system, WT_SESSION *wt_session, const char *directory, const char *prefix, char ***dirlistp, uint32_t *countp); +extern int __wt_win_directory_list_free(WT_FILE_SYSTEM *file_system, WT_SESSION *wt_session, char **dirlist, uint32_t count); +extern int __wt_win_fs_size(WT_FILE_SYSTEM *file_system, WT_SESSION *wt_session, const char *name, wt_off_t *sizep); +extern int __wt_win_map(WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session, void *mapped_regionp, size_t *lenp, void *mapped_cookiep); +extern int __wt_win_unmap(WT_FILE_HANDLE 
*file_handle, WT_SESSION *wt_session, void *mapped_region, size_t length, void *mapped_cookie); extern uint64_t __wt_strtouq(const char *nptr, char **endptr, int base); extern void __wt_abort(WT_SESSION_IMPL *session) WT_GCC_FUNC_DECL_ATTRIBUTE((noreturn)); extern void __wt_free_int(WT_SESSION_IMPL *session, const void *p_arg); -extern void __wt_posix_file_allocate_configure(WT_SESSION_IMPL *session, WT_FH *fh); extern void __wt_sleep(uint64_t seconds, uint64_t micro_seconds); extern void __wt_stream_set_line_buffer(FILE *fp); extern void __wt_stream_set_no_buffer(FILE *fp); diff --git a/src/include/flags.h b/src/include/flags.h index 3d9b0ed716b..da7aee7b059 100644 --- a/src/include/flags.h +++ b/src/include/flags.h @@ -24,11 +24,6 @@ #define WT_EVICT_IN_MEMORY 0x00000002 #define WT_EVICT_LOOKASIDE 0x00000004 #define WT_EVICT_UPDATE_RESTORE 0x00000008 -#define WT_FILE_TYPE_CHECKPOINT 0x00000001 -#define WT_FILE_TYPE_DATA 0x00000002 -#define WT_FILE_TYPE_DIRECTORY 0x00000004 -#define WT_FILE_TYPE_LOG 0x00000008 -#define WT_FILE_TYPE_REGULAR 0x00000010 #define WT_LOGSCAN_FIRST 0x00000001 #define WT_LOGSCAN_FROM_CKP 0x00000002 #define WT_LOGSCAN_ONE 0x00000004 diff --git a/src/include/misc.h b/src/include/misc.h index 07d52c61eac..4c7c9572905 100644 --- a/src/include/misc.h +++ b/src/include/misc.h @@ -96,8 +96,9 @@ * the caller remember to put the & operator on the pointer. */ #define __wt_free(session, p) do { \ - if ((p) != NULL) \ - __wt_free_int(session, (void *)&(p)); \ + void *__p = &(p); \ + if (*(void **)__p != NULL) \ + __wt_free_int(session, __p); \ } while (0) #ifdef HAVE_DIAGNOSTIC #define __wt_overwrite_and_free(session, p) do { \ diff --git a/src/include/os.h b/src/include/os.h index 830277fb5f3..0df2ea49197 100644 --- a/src/include/os.h +++ b/src/include/os.h @@ -6,14 +6,6 @@ * See the file LICENSE for redistribution information. */ -/* - * Number of directory entries can grow dynamically. 
- */ -#define WT_DIR_ENTRY 32 - -#define WT_DIRLIST_EXCLUDE 0x1 /* Exclude files matching prefix */ -#define WT_DIRLIST_INCLUDE 0x2 /* Include files matching prefix */ - #define WT_SYSCALL_RETRY(call, ret) do { \ int __retry; \ for (__retry = 0; __retry < 10; ++__retry) { \ @@ -58,81 +50,68 @@ (t1).tv_nsec < (t2).tv_nsec ? -1 : \ (t1).tv_nsec == (t2).tv_nsec ? 0 : 1 : 1) -/* - * The underlying OS calls return ENOTSUP if posix_fadvise functionality isn't - * available, but WiredTiger uses the POSIX flag names in the API. Use distinct - * values so the underlying code can distinguish. - */ -#ifndef POSIX_FADV_DONTNEED -#define POSIX_FADV_DONTNEED 0x01 -#endif -#ifndef POSIX_FADV_WILLNEED -#define POSIX_FADV_WILLNEED 0x02 -#endif - -#define WT_OPEN_CREATE 0x001 /* Create */ -#define WT_OPEN_DIRECTIO 0x002 /* Direct I/O (if available) */ -#define WT_OPEN_EXCLUSIVE 0x004 /* Open exclusively */ -#define WT_OPEN_FIXED 0x008 /* Path isn't relative to home */ -#define WT_OPEN_READONLY 0x010 /* Open readonly (internal) */ - struct __wt_fh { + /* + * There is a file name field in both the WT_FH and WT_FILE_HANDLE + * structures, which isn't ideal. There would be compromises to keeping + * a single copy: If it were in WT_FH, file systems could not access + * the name field, if it were just in the WT_FILE_HANDLE internal + * WiredTiger code would need to maintain a string inside a structure + * that is owned by the user (since we care about the content of the + * file name). Keeping two copies seems most reasonable. 
+ */ const char *name; /* File name */ - uint64_t name_hash; /* Hash of name */ - TAILQ_ENTRY(__wt_fh) q; /* List of open handles */ - TAILQ_ENTRY(__wt_fh) hashq; /* Hashed list of handles */ - u_int ref; /* Reference count */ + uint64_t name_hash; /* hash of name */ + TAILQ_ENTRY(__wt_fh) q; /* internal queue */ + TAILQ_ENTRY(__wt_fh) hashq; /* internal hash queue */ + u_int ref; /* reference count */ + + WT_FILE_HANDLE *handle; +}; +#ifdef _WIN32 +struct __wt_file_handle_win { + WT_FILE_HANDLE iface; /* - * Underlying file system handle support. + * Windows specific file handle fields */ -#ifdef _WIN32 HANDLE filehandle; /* Windows file handle */ HANDLE filehandle_secondary; /* Windows file handle for file size changes */ + bool direct_io; /* O_DIRECT configured */ +}; + #else - int fd; /* POSIX file handle */ -#endif - /* In-memory specific fields. */ - size_t off; /* Read/write offset */ - WT_ITEM buf; /* Data */ +struct __wt_file_handle_posix { + WT_FILE_HANDLE iface; - bool direct_io; /* O_DIRECT configured */ + /* + * POSIX specific file handle fields + */ + int fd; /* POSIX file handle */ - enum { /* file extend configuration */ - WT_FALLOCATE_AVAILABLE, - WT_FALLOCATE_NOT_AVAILABLE, - WT_FALLOCATE_POSIX, - WT_FALLOCATE_STD, - WT_FALLOCATE_SYS } fallocate_available; - bool fallocate_requires_locking; + bool direct_io; /* O_DIRECT configured */ +}; +#endif -#define WT_FH_IN_MEMORY 0x01 /* In-memory, don't remove */ - uint32_t flags; +struct __wt_file_handle_inmem { + WT_FILE_HANDLE iface; + /* + * In memory specific file handle fields + */ + TAILQ_ENTRY(__wt_file_handle_inmem) q; /* Closed file queue */ - int (*fh_advise)(WT_SESSION_IMPL *, WT_FH *, wt_off_t, wt_off_t, int); - int (*fh_allocate)(WT_SESSION_IMPL *, WT_FH *, wt_off_t, wt_off_t); - int (*fh_close)(WT_SESSION_IMPL *, WT_FH *); - int (*fh_lock)(WT_SESSION_IMPL *, WT_FH *, bool); - int (*fh_map)(WT_SESSION_IMPL *, WT_FH *, void *, size_t *, void **); - int (*fh_map_discard)(WT_SESSION_IMPL *, 
WT_FH *, void *, size_t); - int (*fh_map_preload)(WT_SESSION_IMPL *, WT_FH *, const void *, size_t); - int (*fh_map_unmap)( - WT_SESSION_IMPL *, WT_FH *, void *, size_t, void **); - int (*fh_read)(WT_SESSION_IMPL *, WT_FH *, wt_off_t, size_t, void *); - int (*fh_size)(WT_SESSION_IMPL *, WT_FH *, wt_off_t *); - int (*fh_sync)(WT_SESSION_IMPL *, WT_FH *, bool); - int (*fh_truncate)(WT_SESSION_IMPL *, WT_FH *, wt_off_t); - int (*fh_write)( - WT_SESSION_IMPL *, WT_FH *, wt_off_t, size_t, const void *); + size_t off; /* Read/write offset */ + WT_ITEM buf; /* Data */ + u_int ref; /* Reference count */ }; struct __wt_fstream { - const char *name; /* Stream name */ + const char *name; /* Stream name */ - FILE *fp; /* stdio FILE stream */ + FILE *fp; /* stdio FILE stream */ WT_FH *fh; /* WT file handle */ wt_off_t off; /* Read/write offset */ wt_off_t size; /* File size */ diff --git a/src/include/os_fhandle.i b/src/include/os_fhandle.i index 4a5d7d2c3a7..a093d80d388 100644 --- a/src/include/os_fhandle.i +++ b/src/include/os_fhandle.i @@ -7,18 +7,24 @@ */ /* - * __wt_directory_sync_fh -- - * Flush a directory file handle to ensure file creation is durable. - * - * We don't use the normal sync path because many file systems don't require - * this step and we don't want to penalize them. + * __wt_fsync -- + * POSIX fsync. */ static inline int -__wt_directory_sync_fh(WT_SESSION_IMPL *session, WT_FH *fh) +__wt_fsync(WT_SESSION_IMPL *session, WT_FH *fh, bool block) { - WT_ASSERT(session, !F_ISSET(S2C(session), WT_CONN_READONLY)); + WT_FILE_HANDLE *handle; - return (fh->fh_sync(session, fh, true)); + WT_RET(__wt_verbose( + session, WT_VERB_HANDLEOPS, "%s: handle-sync", fh->handle->name)); + + handle = fh->handle; + if (block) + return (handle->sync == NULL ? 0 : + handle->sync(handle, (WT_SESSION *)session)); + else + return (handle->sync_nowait == NULL ? 
0 : + handle->sync_nowait(handle, (WT_SESSION *)session)); } /* @@ -29,14 +35,34 @@ static inline int __wt_fallocate( WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t offset, wt_off_t len) { + WT_DECL_RET; + WT_FILE_HANDLE *handle; + WT_ASSERT(session, !F_ISSET(S2C(session), WT_CONN_READONLY)); WT_ASSERT(session, !F_ISSET(S2C(session), WT_CONN_IN_MEMORY)); WT_RET(__wt_verbose(session, WT_VERB_HANDLEOPS, "%s: handle-allocate: %" PRIuMAX " at %" PRIuMAX, - fh->name, (uintmax_t)len, (uintmax_t)offset)); - - return (fh->fh_allocate(session, fh, offset, len)); + fh->handle->name, (uintmax_t)len, (uintmax_t)offset)); + + /* + * Our caller is responsible for handling any locking issues, all we + * have to do is find a function to call. + * + * Be cautious, the underlying system might have configured the nolock + * flavor, that failed, and we have to fallback to the locking flavor. + */ + handle = fh->handle; + if (handle->fallocate_nolock != NULL) { + if ((ret = handle->fallocate_nolock( + handle, (WT_SESSION *)session, offset, len)) == 0) + return (0); + WT_RET_ERROR_OK(ret, ENOTSUP); + } + if (handle->fallocate != NULL) + return (handle->fallocate( + handle, (WT_SESSION *)session, offset, len)); + return (ENOTSUP); } /* @@ -46,10 +72,14 @@ __wt_fallocate( static inline int __wt_file_lock(WT_SESSION_IMPL * session, WT_FH *fh, bool lock) { + WT_FILE_HANDLE *handle; + WT_RET(__wt_verbose(session, WT_VERB_HANDLEOPS, - "%s: handle-lock: %s", fh->name, lock ? "lock" : "unlock")); + "%s: handle-lock: %s", fh->handle->name, lock ? "lock" : "unlock")); - return (fh->fh_lock(session, fh, lock)); + handle = fh->handle; + return (handle->lock == NULL ? 
0 : + handle->lock(handle, (WT_SESSION*)session, lock)); } /* @@ -62,11 +92,12 @@ __wt_read( { WT_RET(__wt_verbose(session, WT_VERB_HANDLEOPS, "%s: handle-read: %" WT_SIZET_FMT " at %" PRIuMAX, - fh->name, len, (uintmax_t)offset)); + fh->handle->name, len, (uintmax_t)offset)); WT_STAT_FAST_CONN_INCR(session, read_io); - return (fh->fh_read(session, fh, offset, len, buf)); + return (fh->handle->read( + fh->handle, (WT_SESSION *)session, offset, len, buf)); } /* @@ -77,22 +108,9 @@ static inline int __wt_filesize(WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t *sizep) { WT_RET(__wt_verbose( - session, WT_VERB_HANDLEOPS, "%s: handle-size", fh->name)); - - return (fh->fh_size(session, fh, sizep)); -} - -/* - * __wt_fsync -- - * POSIX fsync. - */ -static inline int -__wt_fsync(WT_SESSION_IMPL *session, WT_FH *fh, bool block) -{ - WT_RET(__wt_verbose( - session, WT_VERB_HANDLEOPS, "%s: handle-sync", fh->name)); + session, WT_VERB_HANDLEOPS, "%s: handle-size", fh->handle->name)); - return (fh->fh_sync(session, fh, block)); + return (fh->handle->size(fh->handle, (WT_SESSION *)session, sizep)); } /* @@ -105,9 +123,10 @@ __wt_ftruncate(WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t len) WT_ASSERT(session, !F_ISSET(S2C(session), WT_CONN_READONLY)); WT_RET(__wt_verbose(session, WT_VERB_HANDLEOPS, - "%s: handle-truncate: %" PRIuMAX, fh->name, (uintmax_t)len)); + "%s: handle-truncate: %" PRIuMAX, + fh->handle->name, (uintmax_t)len)); - return (fh->fh_truncate(session, fh, len)); + return (fh->handle->truncate(fh->handle, (WT_SESSION *)session, len)); } /* @@ -124,9 +143,10 @@ __wt_write(WT_SESSION_IMPL *session, WT_RET(__wt_verbose(session, WT_VERB_HANDLEOPS, "%s: handle-write: %" WT_SIZET_FMT " at %" PRIuMAX, - fh->name, len, (uintmax_t)offset)); + fh->handle->name, len, (uintmax_t)offset)); WT_STAT_FAST_CONN_INCR(session, write_io); - return (fh->fh_write(session, fh, offset, len, buf)); + return (fh->handle->write( + fh->handle, (WT_SESSION *)session, offset, len, buf)); } diff 
--git a/src/include/os_fs.i b/src/include/os_fs.i index 8f3920ffdb2..b7389a39e06 100644 --- a/src/include/os_fs.i +++ b/src/include/os_fs.i @@ -7,89 +7,238 @@ */ /* - * __wt_dirlist -- + * __wt_fs_directory_list -- * Get a list of files from a directory. */ static inline int -__wt_dirlist(WT_SESSION_IMPL *session, const char *dir, - const char *prefix, uint32_t flags, char ***dirlist, u_int *countp) +__wt_fs_directory_list(WT_SESSION_IMPL *session, + const char *dir, const char *prefix, char ***dirlistp, u_int *countp) { - WT_ASSERT(session, !F_ISSET(S2C(session), WT_CONN_IN_MEMORY)); + WT_DECL_RET; + WT_FILE_SYSTEM *file_system; + WT_SESSION *wt_session; + char *path; + + *dirlistp = NULL; + *countp = 0; WT_RET(__wt_verbose(session, WT_VERB_FILEOPS, "%s: directory-list: %s prefix %s", - dir, LF_ISSET(WT_DIRLIST_INCLUDE) ? "include" : "exclude", - prefix == NULL ? "all" : prefix)); + dir, prefix == NULL ? "all" : prefix)); + + WT_RET(__wt_filename(session, dir, &path)); + + file_system = S2C(session)->file_system; + wt_session = (WT_SESSION *)session; + ret = file_system->directory_list( + file_system, wt_session, path, prefix, dirlistp, countp); - return (S2C(session)->file_directory_list( - session, dir, prefix, flags, dirlist, countp)); + __wt_free(session, path); + return (ret); } /* - * __wt_directory_sync -- + * __wt_fs_directory_list_free -- + * Free memory allocated by __wt_fs_directory_list. + */ +static inline int +__wt_fs_directory_list_free( + WT_SESSION_IMPL *session, char ***dirlistp, u_int *countp) +{ + WT_DECL_RET; + WT_FILE_SYSTEM *file_system; + WT_SESSION *wt_session; + + if (*dirlistp != NULL) { + file_system = S2C(session)->file_system; + wt_session = (WT_SESSION *)session; + ret = file_system->directory_list_free( + file_system, wt_session, *dirlistp, *countp); + } + + *dirlistp = NULL; + *countp = 0; + return (ret); +} + +/* + * __wt_fs_directory_sync -- * Flush a directory to ensure file creation is durable. 
 */ static inline int -__wt_directory_sync(WT_SESSION_IMPL *session, const char *name) +__wt_fs_directory_sync(WT_SESSION_IMPL *session, const char *name) { + WT_DECL_RET; + WT_FILE_SYSTEM *file_system; + WT_SESSION *wt_session; + char *copy, *dir; + WT_ASSERT(session, !F_ISSET(S2C(session), WT_CONN_READONLY)); WT_RET(__wt_verbose( session, WT_VERB_FILEOPS, "%s: directory-sync", name)); - return (S2C(session)->file_directory_sync(session, name)); + /* + * POSIX 1003.1 does not require that fsync of a file handle ensures the + * entry in the directory containing the file has also reached disk (and + * there are historic Linux filesystems requiring it). If the underlying + * filesystem method is set, do an explicit fsync on a file descriptor + * for the directory to be sure. + * + * directory-sync is not a required call, no method means the call isn't + * needed. + */ + file_system = S2C(session)->file_system; + if (file_system->directory_sync == NULL) + return (0); + + copy = NULL; + if (name == NULL || strchr(name, '/') == NULL) + name = S2C(session)->home; + else { + /* + * File name construction should not return a path without any + * slash separator, but caution isn't unreasonable. + */ + WT_RET(__wt_filename(session, name, &copy)); + if ((dir = strrchr(copy, '/')) == NULL) + name = S2C(session)->home; + else { + dir[1] = '\0'; + name = copy; + } + } + + wt_session = (WT_SESSION *)session; + ret = file_system->directory_sync(file_system, wt_session, name); + + __wt_free(session, copy); + return (ret); } /* - * __wt_exist -- + * __wt_fs_exist -- * Return if the file exists. 
*/ static inline int -__wt_exist(WT_SESSION_IMPL *session, const char *name, bool *existp) +__wt_fs_exist(WT_SESSION_IMPL *session, const char *name, bool *existp) { + WT_DECL_RET; + WT_FILE_SYSTEM *file_system; + WT_SESSION *wt_session; + char *path; + WT_RET(__wt_verbose(session, WT_VERB_FILEOPS, "%s: file-exist", name)); - return (S2C(session)->file_exist(session, name, existp)); + WT_RET(__wt_filename(session, name, &path)); + + file_system = S2C(session)->file_system; + wt_session = (WT_SESSION *)session; + ret = file_system->exist(file_system, wt_session, path, existp); + + __wt_free(session, path); + return (ret); } /* - * __wt_remove -- + * __wt_fs_remove -- * POSIX remove. */ static inline int -__wt_remove(WT_SESSION_IMPL *session, const char *name) +__wt_fs_remove(WT_SESSION_IMPL *session, const char *name) { + WT_DECL_RET; + WT_FILE_SYSTEM *file_system; + WT_SESSION *wt_session; + char *path; + WT_ASSERT(session, !F_ISSET(S2C(session), WT_CONN_READONLY)); WT_RET(__wt_verbose(session, WT_VERB_FILEOPS, "%s: file-remove", name)); - return (S2C(session)->file_remove(session, name)); +#ifdef HAVE_DIAGNOSTIC + /* + * It is a layering violation to retrieve a WT_FH here, but it is a + * useful diagnostic to ensure WiredTiger doesn't have the handle open. + */ + if (__wt_handle_is_open(session, name)) + WT_RET_MSG(session, EINVAL, + "%s: file-remove: file has open handles", name); +#endif + + WT_RET(__wt_filename(session, name, &path)); + + file_system = S2C(session)->file_system; + wt_session = (WT_SESSION *)session; + ret = file_system->remove(file_system, wt_session, path); + + __wt_free(session, path); + return (ret); } /* - * __wt_rename -- + * __wt_fs_rename -- * POSIX rename. 
*/ static inline int -__wt_rename(WT_SESSION_IMPL *session, const char *from, const char *to) +__wt_fs_rename(WT_SESSION_IMPL *session, const char *from, const char *to) { + WT_DECL_RET; + WT_FILE_SYSTEM *file_system; + WT_SESSION *wt_session; + char *from_path, *to_path; + WT_ASSERT(session, !F_ISSET(S2C(session), WT_CONN_READONLY)); WT_RET(__wt_verbose( session, WT_VERB_FILEOPS, "%s to %s: file-rename", from, to)); - return (S2C(session)->file_rename(session, from, to)); +#ifdef HAVE_DIAGNOSTIC + /* + * It is a layering violation to retrieve a WT_FH here, but it is a + * useful diagnostic to ensure WiredTiger doesn't have the handle open. + */ + if (__wt_handle_is_open(session, from)) + WT_RET_MSG(session, EINVAL, + "%s: file-rename: file has open handles", from); + if (__wt_handle_is_open(session, to)) + WT_RET_MSG(session, EINVAL, + "%s: file-rename: file has open handles", to); +#endif + + from_path = to_path = NULL; + WT_ERR(__wt_filename(session, from, &from_path)); + WT_ERR(__wt_filename(session, to, &to_path)); + + file_system = S2C(session)->file_system; + wt_session = (WT_SESSION *)session; + ret = file_system->rename(file_system, wt_session, from_path, to_path); + +err: __wt_free(session, from_path); + __wt_free(session, to_path); + return (ret); } /* - * __wt_filesize_name -- + * __wt_fs_size -- * Get the size of a file in bytes, by file name. 
*/ static inline int -__wt_filesize_name( - WT_SESSION_IMPL *session, const char *name, bool silent, wt_off_t *sizep) +__wt_fs_size(WT_SESSION_IMPL *session, const char *name, wt_off_t *sizep) { + WT_DECL_RET; + WT_FILE_SYSTEM *file_system; + WT_SESSION *wt_session; + char *path; + WT_RET(__wt_verbose(session, WT_VERB_FILEOPS, "%s: file-size", name)); - return (S2C(session)->file_size(session, name, silent, sizep)); + WT_RET(__wt_filename(session, name, &path)); + + file_system = S2C(session)->file_system; + wt_session = (WT_SESSION *)session; + ret = file_system->size(file_system, wt_session, path, sizep); + + __wt_free(session, path); + return (ret); } diff --git a/src/include/wiredtiger.in b/src/include/wiredtiger.in index 87f7ed276e2..6bb1bf418e9 100644 --- a/src/include/wiredtiger.in +++ b/src/include/wiredtiger.in @@ -71,6 +71,8 @@ struct __wt_encryptor; typedef struct __wt_encryptor WT_ENCRYPTOR; struct __wt_event_handler; typedef struct __wt_event_handler WT_EVENT_HANDLER; struct __wt_extension_api; typedef struct __wt_extension_api WT_EXTENSION_API; struct __wt_extractor; typedef struct __wt_extractor WT_EXTRACTOR; +struct __wt_file_handle; typedef struct __wt_file_handle WT_FILE_HANDLE; +struct __wt_file_system; typedef struct __wt_file_system WT_FILE_SYSTEM; struct __wt_item; typedef struct __wt_item WT_ITEM; struct __wt_session; typedef struct __wt_session WT_SESSION; @@ -2034,6 +2036,10 @@ struct __wt_connection { * @configstart{WT_CONNECTION.load_extension, see dist/api_data.py} * @config{config, configuration string passed to the entry point of the * extension as its WT_CONFIG_ARG argument., a string; default empty.} + * @config{early_load, whether this extension should be loaded at the + * beginning of ::wiredtiger_open. 
Only applicable to extensions loaded + * via the wiredtiger_open configurations string., a boolean flag; + * default \c false.} * @config{entry, the entry point of the extension\, called to * initialize the extension when it is loaded. The signature of the * function must match ::wiredtiger_extension_init., a string; default @@ -2145,6 +2151,23 @@ struct __wt_connection { WT_EXTRACTOR *extractor, const char *config); /*! + * Configure a custom file system. + * + * This method can only be called from an early loaded extension + * module. The application must first implement the WT_FILE_SYSTEM + * interface and then register the implementation with WiredTiger: + * + * @snippet ex_file_system.c WT_FILE_SYSTEM register + * + * @param connection the connection handle + * @param fs the populated file system structure + * @configempty{WT_CONNECTION.set_file_system, see dist/api_data.py} + * @errors + */ + int __F(set_file_system)( + WT_CONNECTION *connection, WT_FILE_SYSTEM *fs, const char *config); + + /*! * Return a reference to the WiredTiger extension functions. * * @snippet ex_data_source.c WT_EXTENSION_API declaration @@ -3056,7 +3079,7 @@ const char *wiredtiger_version(int *majorp, int *minorp, int *patchp); /******************************************* * Forward structure declarations for the extension API *******************************************/ -struct __wt_config_arg; typedef struct __wt_config_arg WT_CONFIG_ARG; +struct __wt_config_arg; typedef struct __wt_config_arg WT_CONFIG_ARG; /*! * The interface implemented by applications to provide custom ordering of @@ -3587,7 +3610,7 @@ struct __wt_encryptor { * number of bytes needed. * * @param[out] expansion_constantp the additional number of bytes needed - * when encrypting. + * when encrypting. * @returns zero for success, non-zero to indicate an error. * * @snippet nop_encrypt.c WT_ENCRYPTOR sizing @@ -3606,8 +3629,7 @@ struct __wt_encryptor { * is used instead of this one for any callbacks. 
* * @param[in] encrypt_config the "encryption" portion of the - * configuration from the wiredtiger_open or - * WT_SESSION::create call + * configuration from the wiredtiger_open or WT_SESSION::create call * @param[out] customp the new modified encryptor, or NULL. * @returns zero for success, non-zero to indicate an error. */ @@ -3682,6 +3704,466 @@ struct __wt_extractor { int (*terminate)(WT_EXTRACTOR *extractor, WT_SESSION *session); }; +#if !defined(SWIG) +/*! WT_FILE_SYSTEM::open_file file types */ +typedef enum { + WT_OPEN_FILE_TYPE_CHECKPOINT, /*!< open a data file checkpoint */ + WT_OPEN_FILE_TYPE_DATA, /*!< open a data file */ + WT_OPEN_FILE_TYPE_DIRECTORY, /*!< open a directory */ + WT_OPEN_FILE_TYPE_LOG, /*!< open a log file */ + WT_OPEN_FILE_TYPE_REGULAR /*!< open a regular file */ +} WT_OPEN_FILE_TYPE; + +/*! WT_FILE_SYSTEM::open_file flags: create if does not exist */ +#define WT_OPEN_CREATE 0x001 +/*! WT_FILE_SYSTEM::open_file flags: direct I/O requested */ +#define WT_OPEN_DIRECTIO 0x002 +/*! WT_FILE_SYSTEM::open_file flags: error if exclusive use not available */ +#define WT_OPEN_EXCLUSIVE 0x004 +#ifndef DOXYGEN +#define WT_OPEN_FIXED 0x008 /* Path not home relative (internal) */ +#endif +/*! WT_FILE_SYSTEM::open_file flags: open is read-only */ +#define WT_OPEN_READONLY 0x010 + +/*! + * The interface implemented by applications to provide a custom file system + * implementation. + * + * <b>Thread safety:</b> WiredTiger may invoke methods on the WT_FILE_SYSTEM + * interface from multiple threads concurrently. It is the responsibility of + * the implementation to protect any shared data. + * + * Applications register implementations with WiredTiger by calling + * WT_CONNECTION::add_file_system. See @ref custom_file_systems for more + * information. + * + * @snippet ex_file_system.c WT_FILE_SYSTEM register + */ +struct __wt_file_system { + /*! + * Return a list of file names for the named directory. 
+ * + * @errors + * + * @param file_system the WT_FILE_SYSTEM + * @param session the current WiredTiger session + * @param directory the name of the directory + * @param prefix if not NULL, only files with names matching the prefix + * are returned + * @param[out] dirlist the method returns an allocated array of + * individually allocated strings, one for each entry in the + * directory. + * @param[out] countp the method the number of entries returned + */ + int (*directory_list)(WT_FILE_SYSTEM *file_system, WT_SESSION *session, + const char *directory, const char *prefix, char ***dirlist, + uint32_t *countp); + + /*! + * Free memory allocated by WT_FILE_SYSTEM::directory_list. + * + * @errors + * + * @param file_system the WT_FILE_SYSTEM + * @param session the current WiredTiger session + * @param dirlist array returned by WT_FILE_SYSTEM::directory_list + * @param count count returned by WT_FILE_SYSTEM::directory_list + */ + int (*directory_list_free)(WT_FILE_SYSTEM *file_system, + WT_SESSION *session, char **dirlist, uint32_t count); + + /*! + * Flush the named directory. + * + * This method is not required for readonly file systems or file systems + * where it is not necessary to flush a file's directory to ensure the + * durability of file system operations, and should be set to NULL when + * not required by the file system. + * + * @errors + * + * @param file_system the WT_FILE_SYSTEM + * @param session the current WiredTiger session + * @param directory the name of the directory + */ + int (*directory_sync)(WT_FILE_SYSTEM *file_system, + WT_SESSION *session, const char *directory); + + /*! + * Return if the named file system object exists. + * + * @errors + * + * @param file_system the WT_FILE_SYSTEM + * @param session the current WiredTiger session + * @param name the name of the file + * @param[out] existp If the named file system object exists + */ + int (*exist)(WT_FILE_SYSTEM *file_system, + WT_SESSION *session, const char *name, bool *existp); + + /*! 
+ * Open a handle for a named file system object + * + * @errors + * + * @param file_system the WT_FILE_SYSTEM + * @param session the current WiredTiger session + * @param name the name of the file system object + * @param file_type the type of the file + * The file type is provided to allow optimization for different file + * access patterns. + * @param flags flags indicating how to open the file, one or more of + * ::WT_OPEN_CREATE, ::WT_OPEN_DIRECTIO, ::WT_OPEN_EXCLUSIVE or + * ::WT_OPEN_READONLY. + * @param[out] file_handlep the handle to the newly opened file. File + * system implementations must allocate memory for the handle and + * the WT_FILE_HANDLE::name field, and fill in the WT_FILE_HANDLE:: + * fields. Applications wanting to associate private information + * with the WT_FILE_HANDLE:: structure should declare and allocate + * their own structure as a superset of a WT_FILE_HANDLE:: structure. + */ + int (*open_file)(WT_FILE_SYSTEM *file_system, WT_SESSION *session, + const char *name, WT_OPEN_FILE_TYPE file_type, uint32_t flags, + WT_FILE_HANDLE **file_handlep); + + /*! + * Remove a named file system object + * + * This method is not required for readonly file systems and should be + * set to NULL when not required by the file system. + * + * @errors + * + * @param file_system the WT_FILE_SYSTEM + * @param session the current WiredTiger session + * @param name the name of the file system object + */ + int (*remove)( + WT_FILE_SYSTEM *file_system, WT_SESSION *session, const char *name); + + /*! + * Rename a named file system object + * + * This method is not required for readonly file systems and should be + * set to NULL when not required by the file system. 
+ * + * @errors + * + * @param file_system the WT_FILE_SYSTEM + * @param session the current WiredTiger session + * @param from the original name of the object + * @param to the new name for the object + */ + int (*rename)(WT_FILE_SYSTEM *file_system, + WT_SESSION *session, const char *from, const char *to); + + /*! + * Return the size of a named file system object + * + * @errors + * + * @param file_system the WT_FILE_SYSTEM + * @param session the current WiredTiger session + * @param name the name of the file system object + * @param[out] sizep the size of the file system entry + */ + int (*size)(WT_FILE_SYSTEM *file_system, + WT_SESSION *session, const char *name, wt_off_t *sizep); + + /*! + * A callback performed when the file system is closed and will no + * longer be accessed by the WiredTiger database. + * + * This method is not required and should be set to NULL when not + * required by the file system. + * + * The WT_FILE_SYSTEM::terminate callback is intended to allow cleanup, + * the handle will not be subsequently accessed by WiredTiger. + */ + int (*terminate)(WT_FILE_SYSTEM *file_system, WT_SESSION *session); +}; + +/*! WT_FILE_HANDLE::fadvise flags: no longer need */ +#define WT_FILE_HANDLE_DONTNEED 1 +/*! WT_FILE_HANDLE::fadvise flags: will need */ +#define WT_FILE_HANDLE_WILLNEED 2 + +/*! + * A file handle implementation returned by WT_FILE_SYSTEM::open_file. + * + * <b>Thread safety:</b> Unless explicitly stated otherwise, WiredTiger may + * invoke methods on the WT_FILE_HANDLE interface from multiple threads + * concurrently. It is the responsibility of the implementation to protect + * any shared data. + * + * See @ref custom_file_systems for more information. + */ +struct __wt_file_handle { + /*! + * The enclosing file system, set by WT_FILE_SYSTEM::open_file. + */ + WT_FILE_SYSTEM *file_system; + + /*! + * The name of the file, set by WT_FILE_SYSTEM::open_file. + */ + char *name; + + /*! 
+ * Close a file handle, the handle will not be further accessed by + * WiredTiger. + * + * @errors + * + * @param file_handle the WT_FILE_HANDLE + * @param session the current WiredTiger session + */ + int (*close)(WT_FILE_HANDLE *file_handle, WT_SESSION *session); + + /*! + * Indicate expected future use of file ranges, based on the POSIX + * 1003.1 standard fadvise. + * + * This method is not required, and should be set to NULL when not + * supported by the file. + * + * @errors + * + * @param file_handle the WT_FILE_HANDLE + * @param session the current WiredTiger session + * @param offset the file offset + * @param len the size of the advisory + * @param advice one of ::WT_FILE_HANDLE_WILLNEED or + * ::WT_FILE_HANDLE_DONTNEED. + */ + int (*fadvise)(WT_FILE_HANDLE *file_handle, + WT_SESSION *session, wt_off_t offset, wt_off_t len, int advice); + + /*! + * Ensure disk space is allocated for the file, based on the POSIX + * 1003.1 standard fallocate. + * + * This method is not required, and should be set to NULL when not + * supported by the file. + * + * This method is not called by multiple threads concurrently (on the + * same file handle). If the file handle's fallocate method supports + * concurrent calls, set the WT_FILE_HANDLE::fallocate_nolock method + * instead. + * + * @errors + * + * @param file_handle the WT_FILE_HANDLE + * @param session the current WiredTiger session + * @param offset the file offset + * @param len the size of the advisory + */ + int (*fallocate)(WT_FILE_HANDLE *file_handle, + WT_SESSION *session, wt_off_t, wt_off_t); + + /*! + * Ensure disk space is allocated for the file, based on the POSIX + * 1003.1 standard fallocate. + * + * This method is not required, and should be set to NULL when not + * supported by the file. + * + * This method may be called by multiple threads concurrently (on the + * same file handle). 
If the file handle's fallocate method does not + * support concurrent calls, set the WT_FILE_HANDLE::fallocate method + * instead. + * + * @errors + * + * @param file_handle the WT_FILE_HANDLE + * @param session the current WiredTiger session + * @param offset the file offset + * @param len the size of the advisory + */ + int (*fallocate_nolock)(WT_FILE_HANDLE *file_handle, + WT_SESSION *session, wt_off_t, wt_off_t); + + /*! + * Lock/unlock a file from the perspective of other processes running + * in the system. + * + * @errors + * + * @param file_handle the WT_FILE_HANDLE + * @param session the current WiredTiger session + * @param lock whether to lock or unlock + */ + int (*lock)( + WT_FILE_HANDLE *file_handle, WT_SESSION *session, bool lock); + + /*! + * Map a file into memory, based on the POSIX 1003.1 standard mmap. + * + * This method is not required, and should be set to NULL when not + * supported by the file. + * + * @errors + * + * @param file_handle the WT_FILE_HANDLE + * @param session the current WiredTiger session + * @param[out] mapped_regionp a reference to a memory location into + * which should be stored a pointer to the start of the mapped region + * @param[out] lengthp a reference to a memory location into which + * should be stored the length of the region + * @param[out] mapped_cookiep a reference to a memory location into + * which can be optionally stored a pointer to an opaque cookie + * which is subsequently passed to WT_FILE_HANDLE::unmap. + */ + int (*map)(WT_FILE_HANDLE *file_handle, WT_SESSION *session, + void *mapped_regionp, size_t *lengthp, void *mapped_cookiep); + + /*! + * Unmap part of a memory mapped file, based on the POSIX 1003.1 + * standard madvise. + * + * This method is not required, and should be set to NULL when not + * supported by the file. 
+ * + * @errors + * + * @param file_handle the WT_FILE_HANDLE + * @param session the current WiredTiger session + * @param map a location in the mapped region unlikely to be used in the + * near future + * @param length the length of the mapped region to discard + * @param mapped_cookie any cookie set by the WT_FILE_HANDLE::map method + */ + int (*map_discard)(WT_FILE_HANDLE *file_handle, + WT_SESSION *session, void *map, size_t length, void *mapped_cookie); + + /*! + * Preload part of a memory mapped file, based on the POSIX 1003.1 + * standard madvise. + * + * This method is not required, and should be set to NULL when not + * supported by the file. + * + * @errors + * + * @param file_handle the WT_FILE_HANDLE + * @param session the current WiredTiger session + * @param map a location in the mapped region likely to be used in the + * near future + * @param length the size of the mapped region to preload + * @param mapped_cookie any cookie set by the WT_FILE_HANDLE::map method + */ + int (*map_preload)(WT_FILE_HANDLE *file_handle, WT_SESSION *session, + const void *map, size_t length, void *mapped_cookie); + + /*! + * Unmap a memory mapped file, based on the POSIX 1003.1 standard + * munmap. + * + * This method is only required if a valid implementation of map is + * provided by the file, and should be set to NULL otherwise. + * + * @errors + * + * @param file_handle the WT_FILE_HANDLE + * @param session the current WiredTiger session + * @param mapped_region a pointer to the start of the mapped region + * @param length the length of the mapped region + * @param mapped_cookie any cookie set by the WT_FILE_HANDLE::map method + */ + int (*unmap)(WT_FILE_HANDLE *file_handle, WT_SESSION *session, + void *mapped_region, size_t length, void *mapped_cookie); + + /*! + * Read from a file, based on the POSIX 1003.1 standard pread. 
+ * + * @errors + * + * @param file_handle the WT_FILE_HANDLE + * @param session the current WiredTiger session + * @param offset the offset in the file to start reading from + * @param len the amount to read + * @param[out] buf buffer to hold the content read from file + */ + int (*read)(WT_FILE_HANDLE *file_handle, + WT_SESSION *session, wt_off_t offset, size_t len, void *buf); + + /*! + * Return the size of a file. + * + * @errors + * + * @param file_handle the WT_FILE_HANDLE + * @param session the current WiredTiger session + * @param sizep the size of the file + */ + int (*size)( + WT_FILE_HANDLE *file_handle, WT_SESSION *session, wt_off_t *sizep); + + /*! + * Make outstanding file writes durable and do not return until writes + * are complete. + * + * This method is not required for read-only files, and should be set + * to NULL when not supported by the file. + * + * @errors + * + * @param file_handle the WT_FILE_HANDLE + * @param session the current WiredTiger session + */ + int (*sync)(WT_FILE_HANDLE *file_handle, WT_SESSION *session); + + /*! + * Schedule the outstanding file writes required for durability and + * return immediately. + * + * This method is not required, and should be set to NULL when not + * supported by the file. + * + * @errors + * + * @param file_handle the WT_FILE_HANDLE + * @param session the current WiredTiger session + */ + int (*sync_nowait)(WT_FILE_HANDLE *file_handle, WT_SESSION *session); + + /*! + * Lengthen or shorten a file to the specified length, based on the + * POSIX 1003.1 standard ftruncate. + * + * This method is not required for read-only files, and should be set + * to NULL when not supported by the file. + * + * @errors + * + * @param file_handle the WT_FILE_HANDLE + * @param session the current WiredTiger session + * @param length desired file size after truncate + */ + int (*truncate)( + WT_FILE_HANDLE *file_handle, WT_SESSION *session, wt_off_t length); + + /*! 
+ * Write to a file, based on the POSIX 1003.1 standard pwrite. + * + * This method is not required for read-only files, and should be set + * to NULL when not supported by the file. + * + * @errors + * + * @param file_handle the WT_FILE_HANDLE + * @param session the current WiredTiger session + * @param offset offset at which to start writing + * @param length amount of data to write + * @param buf content to be written to the file + */ + int (*write)(WT_FILE_HANDLE *file_handle, WT_SESSION *session, + wt_off_t offset, size_t length, const void *buf); +}; +#endif /* !defined(SWIG) */ + /*! * Entry point to an extension, called when the extension is loaded. * diff --git a/src/include/wt_internal.h b/src/include/wt_internal.h index e149ba9b3a7..0c8abf36cfe 100644 --- a/src/include/wt_internal.h +++ b/src/include/wt_internal.h @@ -181,6 +181,12 @@ struct __wt_fair_lock; typedef struct __wt_fair_lock WT_FAIR_LOCK; struct __wt_fh; typedef struct __wt_fh WT_FH; +struct __wt_file_handle_inmem; + typedef struct __wt_file_handle_inmem WT_FILE_HANDLE_INMEM; +struct __wt_file_handle_posix; + typedef struct __wt_file_handle_posix WT_FILE_HANDLE_POSIX; +struct __wt_file_handle_win; + typedef struct __wt_file_handle_win WT_FILE_HANDLE_WIN; struct __wt_fstream; typedef struct __wt_fstream WT_FSTREAM; struct __wt_hazard; diff --git a/src/log/log.c b/src/log/log.c index aabf629f867..fd5d4bca5bc 100644 --- a/src/log/log.c +++ b/src/log/log.c @@ -124,7 +124,7 @@ __wt_log_force_sync(WT_SESSION_IMPL *session, WT_LSN *min_lsn) "log_force_sync: sync directory %s to LSN %" PRIu32 "/%" PRIu32, log->log_dir_fh->name, min_lsn->l.file, min_lsn->l.offset)); - WT_ERR(__wt_directory_sync_fh(session, log->log_dir_fh)); + WT_ERR(__wt_fsync(session, log->log_dir_fh, true)); log->sync_dir_lsn = *min_lsn; WT_STAT_FAST_CONN_INCR(session, log_sync_dir); } @@ -258,8 +258,8 @@ __log_get_files(WT_SESSION_IMPL *session, log_path = conn->log_path; if (log_path == NULL) log_path = ""; - return 
(__wt_dirlist(session, log_path, file_prefix, - WT_DIRLIST_INCLUDE, filesp, countp)); + return (__wt_fs_directory_list( + session, log_path, file_prefix, filesp, countp)); } /* @@ -277,6 +277,9 @@ __wt_log_get_all_files(WT_SESSION_IMPL *session, uint32_t id, max; u_int count, i; + *filesp = NULL; + *countp = 0; + id = 0; log = S2C(session)->log; @@ -307,26 +310,12 @@ __wt_log_get_all_files(WT_SESSION_IMPL *session, *countp = count; if (0) { -err: __wt_log_files_free(session, files, count); +err: WT_TRET(__wt_fs_directory_list_free(session, &files, &count)); } return (ret); } /* - * __wt_log_files_free -- - * Free memory associated with a log file list. - */ -void -__wt_log_files_free(WT_SESSION_IMPL *session, char **files, u_int count) -{ - u_int i; - - for (i = 0; i < count; i++) - __wt_free(session, files[i]); - __wt_free(session, files); -} - -/* * __log_filename -- * Given a log number, return a WT_ITEM of a generated log file name * of the given prefix type. @@ -450,14 +439,20 @@ __log_prealloc(WT_SESSION_IMPL *session, WT_FH *fh) * and zero the log file based on what is available. */ if (FLD_ISSET(conn->log_flags, WT_CONN_LOG_ZERO_FILL)) - ret = __log_zero(session, fh, - WT_LOG_FIRST_RECORD, conn->log_file_max); - else if (fh->fallocate_available == WT_FALLOCATE_NOT_AVAILABLE || - (ret = __wt_fallocate(session, fh, - WT_LOG_FIRST_RECORD, conn->log_file_max)) == ENOTSUP) - ret = __wt_ftruncate(session, fh, - WT_LOG_FIRST_RECORD + conn->log_file_max); - return (ret); + return (__log_zero(session, fh, + WT_LOG_FIRST_RECORD, conn->log_file_max)); + + /* + * We have exclusive access to the log file and there are no other + * writes happening concurrently, so there are no locking issues. 
+ */ + if ((ret = __wt_fallocate( + session, fh, WT_LOG_FIRST_RECORD, conn->log_file_max)) == 0) + return (0); + WT_RET_ERROR_OK(ret, ENOTSUP); + + return (__wt_ftruncate( + session, fh, WT_LOG_FIRST_RECORD + conn->log_file_max)); } /* @@ -669,14 +664,17 @@ static int __log_openfile(WT_SESSION_IMPL *session, bool ok_create, WT_FH **fhp, const char *file_prefix, uint32_t id) { + WT_CONNECTION_IMPL *conn; WT_DECL_ITEM(buf); WT_DECL_RET; WT_LOG *log; WT_LOG_DESC *desc; WT_LOG_RECORD *logrec; uint32_t allocsize; + u_int flags; - log = S2C(session)->log; + conn = S2C(session); + log = conn->log; if (log == NULL) allocsize = WT_LOG_ALIGN; else @@ -685,8 +683,14 @@ __log_openfile(WT_SESSION_IMPL *session, WT_ERR(__log_filename(session, id, file_prefix, buf)); WT_ERR(__wt_verbose(session, WT_VERB_LOG, "opening log %s", (const char *)buf->data)); - WT_ERR(__wt_open(session, buf->data, - WT_FILE_TYPE_LOG, ok_create ? WT_OPEN_CREATE : 0, fhp)); + flags = 0; + if (ok_create) + LF_SET(WT_OPEN_CREATE); + if (FLD_ISSET(conn->direct_io, WT_DIRECT_IO_LOG)) + LF_SET(WT_OPEN_DIRECTIO); + WT_ERR(__wt_open( + session, buf->data, WT_OPEN_FILE_TYPE_LOG, flags, fhp)); + /* * If we are not creating the log file but opening it for reading, * check that the magic number and versions are correct. @@ -757,12 +761,11 @@ __log_alloc_prealloc(WT_SESSION_IMPL *session, uint32_t to_num) * All file setup, writing the header and pre-allocation was done * before. We only need to rename it. 
*/ - WT_ERR(__wt_rename(session, from_path->data, to_path->data)); + WT_ERR(__wt_fs_rename(session, from_path->data, to_path->data)); err: __wt_scr_free(session, &from_path); __wt_scr_free(session, &to_path); - if (logfiles != NULL) - __wt_log_files_free(session, logfiles, logcount); + WT_TRET(__wt_fs_directory_list_free(session, &logfiles, &logcount)); return (ret); } @@ -984,8 +987,7 @@ __log_truncate(WT_SESSION_IMPL *session, } } err: WT_TRET(__wt_close(session, &log_fh)); - if (logfiles != NULL) - __wt_log_files_free(session, logfiles, logcount); + WT_TRET(__wt_fs_directory_list_free(session, &logfiles, &logcount)); return (ret); } @@ -1037,7 +1039,7 @@ __wt_log_allocfile( /* * Rename it into place and make it available. */ - WT_ERR(__wt_rename(session, from_path->data, to_path->data)); + WT_ERR(__wt_fs_rename(session, from_path->data, to_path->data)); err: __wt_scr_free(session, &from_path); __wt_scr_free(session, &to_path); @@ -1060,7 +1062,7 @@ __wt_log_remove(WT_SESSION_IMPL *session, WT_ERR(__log_filename(session, lognum, file_prefix, path)); WT_ERR(__wt_verbose(session, WT_VERB_LOG, "log_remove: remove log %s", (char *)path->data)); - WT_ERR(__wt_remove(session, path->data)); + WT_ERR(__wt_fs_remove(session, path->data)); err: __wt_scr_free(session, &path); return (ret); } @@ -1096,7 +1098,7 @@ __wt_log_open(WT_SESSION_IMPL *session) WT_RET(__wt_verbose(session, WT_VERB_LOG, "log_open: open fh to directory %s", conn->log_path)); WT_RET(__wt_open(session, conn->log_path, - WT_FILE_TYPE_DIRECTORY, 0, &log->log_dir_fh)); + WT_OPEN_FILE_TYPE_DIRECTORY, 0, &log->log_dir_fh)); } if (!F_ISSET(conn, WT_CONN_READONLY)) { @@ -1113,9 +1115,8 @@ __wt_log_open(WT_SESSION_IMPL *session) WT_ERR(__wt_log_remove( session, WT_LOG_TMPNAME, lognum)); } - __wt_log_files_free(session, logfiles, logcount); - logfiles = NULL; - logcount = 0; + WT_ERR( + __wt_fs_directory_list_free(session, &logfiles, &logcount)); WT_ERR(__log_get_files(session, WT_LOG_PREPNAME, &logfiles, 
&logcount)); for (i = 0; i < logcount; i++) { @@ -1124,8 +1125,8 @@ __wt_log_open(WT_SESSION_IMPL *session) WT_ERR(__wt_log_remove( session, WT_LOG_PREPNAME, lognum)); } - __wt_log_files_free(session, logfiles, logcount); - logfiles = NULL; + WT_ERR( + __wt_fs_directory_list_free(session, &logfiles, &logcount)); } /* @@ -1163,8 +1164,7 @@ __wt_log_open(WT_SESSION_IMPL *session) FLD_SET(conn->log_flags, WT_CONN_LOG_EXISTED); } -err: if (logfiles != NULL) - __wt_log_files_free(session, logfiles, logcount); +err: WT_TRET(__wt_fs_directory_list_free(session, &logfiles, &logcount)); return (ret); } @@ -1200,8 +1200,7 @@ __wt_log_close(WT_SESSION_IMPL *session) WT_RET(__wt_verbose(session, WT_VERB_LOG, "closing log directory %s", log->log_dir_fh->name)); if (!F_ISSET(conn, WT_CONN_READONLY)) - WT_RET( - __wt_directory_sync_fh(session, log->log_dir_fh)); + WT_RET(__wt_fsync(session, log->log_dir_fh, true)); WT_RET(__wt_close(session, &log->log_dir_fh)); log->log_dir_fh = NULL; } @@ -1408,8 +1407,7 @@ __wt_log_release(WT_SESSION_IMPL *session, WT_LOGSLOT *slot, bool *freep) "/%" PRIu32, log->log_dir_fh->name, sync_lsn.l.file, sync_lsn.l.offset)); - WT_ERR(__wt_directory_sync_fh( - session, log->log_dir_fh)); + WT_ERR(__wt_fsync(session, log->log_dir_fh, true)); log->sync_dir_lsn = sync_lsn; WT_STAT_FAST_CONN_INCR(session, log_sync_dir); } @@ -1550,8 +1548,8 @@ __wt_log_scan(WT_SESSION_IMPL *session, WT_LSN *lsnp, uint32_t flags, } WT_SET_LSN(&start_lsn, firstlog, 0); WT_SET_LSN(&end_lsn, lastlog, 0); - __wt_log_files_free(session, logfiles, logcount); - logfiles = NULL; + WT_ERR( + __wt_fs_directory_list_free(session, &logfiles, &logcount)); } WT_ERR(__log_openfile( session, false, &log_fh, WT_LOG_FILENAME, start_lsn.l.file)); @@ -1747,8 +1745,7 @@ advance: err: WT_STAT_FAST_CONN_INCR(session, log_scans); - if (logfiles != NULL) - __wt_log_files_free(session, logfiles, logcount); + WT_TRET(__wt_fs_directory_list_free(session, &logfiles, &logcount)); __wt_scr_free(session, 
&buf); __wt_scr_free(session, &decryptitem); diff --git a/src/lsm/lsm_tree.c b/src/lsm/lsm_tree.c index de6dc005bc6..da106ae2089 100644 --- a/src/lsm/lsm_tree.c +++ b/src/lsm/lsm_tree.c @@ -235,7 +235,7 @@ __wt_lsm_tree_set_chunk_size( if (!WT_PREFIX_SKIP(filename, "file:")) WT_RET_MSG(session, EINVAL, "Expected a 'file:' URI: %s", chunk->uri); - WT_RET(__wt_filesize_name(session, filename, false, &size)); + WT_RET(__wt_fs_size(session, filename, &size)); chunk->size = (uint64_t)size; @@ -256,7 +256,7 @@ __lsm_tree_cleanup_old(WT_SESSION_IMPL *session, const char *uri) { WT_CONFIG_BASE(session, WT_SESSION_drop), "force", NULL }; bool exists; - WT_RET(__wt_exist(session, uri + strlen("file:"), &exists)); + WT_RET(__wt_fs_exist(session, uri + strlen("file:"), &exists)); if (exists) WT_WITH_SCHEMA_LOCK(session, ret, ret = __wt_schema_drop(session, uri, cfg)); diff --git a/src/lsm/lsm_work_unit.c b/src/lsm/lsm_work_unit.c index 51cf2e981de..821a996c38b 100644 --- a/src/lsm/lsm_work_unit.c +++ b/src/lsm/lsm_work_unit.c @@ -525,7 +525,7 @@ __lsm_drop_file(WT_SESSION_IMPL *session, const char *uri) ret = __wt_schema_drop(session, uri, drop_cfg)); if (ret == 0) - ret = __wt_remove(session, uri + strlen("file:")); + ret = __wt_fs_remove(session, uri + strlen("file:")); WT_RET(__wt_verbose(session, WT_VERB_LSM, "Dropped %s", uri)); if (ret == EBUSY || ret == ENOENT) diff --git a/src/meta/meta_track.c b/src/meta/meta_track.c index a73b7e09d37..4fe628e319b 100644 --- a/src/meta/meta_track.c +++ b/src/meta/meta_track.c @@ -194,8 +194,8 @@ __meta_track_unroll(WT_SESSION_IMPL *session, WT_META_TRACK *trk) __wt_err(session, ret, "metadata unroll rename %s to %s", trk->b, trk->a); - if (trk->a == NULL && - (ret = __wt_remove(session, trk->b + strlen("file:"))) != 0) + if (trk->a == NULL && (ret = + __wt_fs_remove(session, trk->b + strlen("file:"))) != 0) __wt_err(session, ret, "metadata unroll create %s", trk->b); diff --git a/src/meta/meta_turtle.c b/src/meta/meta_turtle.c index 
a45e7ecf9e0..ee9ee522748 100644 --- a/src/meta/meta_turtle.c +++ b/src/meta/meta_turtle.c @@ -75,11 +75,11 @@ __metadata_load_hot_backup(WT_SESSION_IMPL *session) bool exist; /* Look for a hot backup file: if we find it, load it. */ - WT_RET(__wt_exist(session, WT_METADATA_BACKUP, &exist)); + WT_RET(__wt_fs_exist(session, WT_METADATA_BACKUP, &exist)); if (!exist) return (0); WT_RET(__wt_fopen(session, - WT_METADATA_BACKUP, WT_FILE_TYPE_REGULAR, WT_STREAM_READ, &fs)); + WT_METADATA_BACKUP, 0, WT_STREAM_READ, &fs)); /* Read line pairs and load them into the metadata file. */ WT_ERR(__wt_scr_alloc(session, 512, &key)); @@ -128,7 +128,7 @@ __metadata_load_bulk(WT_SESSION_IMPL *session) continue; /* If the file exists, it's all good. */ - WT_ERR(__wt_exist(session, key, &exist)); + WT_ERR(__wt_fs_exist(session, key, &exist)); if (exist) continue; @@ -182,9 +182,9 @@ __wt_turtle_init(WT_SESSION_IMPL *session) * that is an error. Otherwise, if there's already a turtle file, we're * done. */ - WT_RET(__wt_exist(session, WT_INCREMENTAL_BACKUP, &exist_incr)); - WT_RET(__wt_exist(session, WT_METADATA_BACKUP, &exist_backup)); - WT_RET(__wt_exist(session, WT_METADATA_TURTLE, &exist_turtle)); + WT_RET(__wt_fs_exist(session, WT_INCREMENTAL_BACKUP, &exist_incr)); + WT_RET(__wt_fs_exist(session, WT_METADATA_BACKUP, &exist_backup)); + WT_RET(__wt_fs_exist(session, WT_METADATA_TURTLE, &exist_turtle)); if (exist_turtle) { if (exist_incr) WT_RET_MSG(session, EINVAL, @@ -254,7 +254,7 @@ __wt_turtle_read(WT_SESSION_IMPL *session, const char *key, char **valuep) * the turtle file, and that means returning the default configuration * string for the metadata file. */ - WT_RET(__wt_exist(session, WT_METADATA_TURTLE, &exist)); + WT_RET(__wt_fs_exist(session, WT_METADATA_TURTLE, &exist)); if (!exist) return (strcmp(key, WT_METAFILE_URI) == 0 ? 
__metadata_config(session, valuep) : WT_NOTFOUND); diff --git a/src/os_common/filename.c b/src/os_common/filename.c index 771cf61f081..5f174288350 100644 --- a/src/os_common/filename.c +++ b/src/os_common/filename.c @@ -60,9 +60,9 @@ __wt_remove_if_exists(WT_SESSION_IMPL *session, const char *name) { bool exist; - WT_RET(__wt_exist(session, name, &exist)); + WT_RET(__wt_fs_exist(session, name, &exist)); if (exist) - WT_RET(__wt_remove(session, name)); + WT_RET(__wt_fs_remove(session, name)); return (0); } @@ -78,7 +78,7 @@ __wt_rename_and_sync_directory( bool same_directory; /* Rename the source file to the target. */ - WT_RET(__wt_rename(session, from, to)); + WT_RET(__wt_fs_rename(session, from, to)); /* * Flush the backing directory to guarantee the rename. My reading of @@ -89,7 +89,7 @@ __wt_rename_and_sync_directory( * with specific mount options. Flush both of the from/to directories * until it's a performance problem. */ - WT_RET(__wt_directory_sync(session, from)); + WT_RET(__wt_fs_directory_sync(session, from)); /* * In almost all cases, we're going to be renaming files in the same @@ -101,7 +101,7 @@ __wt_rename_and_sync_directory( (fp != NULL && tp != NULL && fp - from == tp - to && memcmp(from, to, (size_t)(fp - from)) == 0); - return (same_directory ? 0 : __wt_directory_sync(session, to)); + return (same_directory ? 0 : __wt_fs_directory_sync(session, to)); } /* @@ -138,9 +138,9 @@ __wt_copy_and_sync(WT_SESSION *wt_session, const char *from, const char *to) WT_ERR(__wt_remove_if_exists(session, tmp->data)); /* Open the from and temporary file handles. */ - WT_ERR(__wt_open(session, from, WT_FILE_TYPE_REGULAR, 0, &ffh)); - WT_ERR(__wt_open(session, tmp->data, - WT_FILE_TYPE_REGULAR, WT_OPEN_CREATE | WT_OPEN_EXCLUSIVE, &tfh)); + WT_ERR(__wt_open(session, from, WT_OPEN_FILE_TYPE_REGULAR, 0, &ffh)); + WT_ERR(__wt_open(session, tmp->data, WT_OPEN_FILE_TYPE_REGULAR, + WT_OPEN_CREATE | WT_OPEN_EXCLUSIVE, &tfh)); /* * Allocate a copy buffer. 
Don't use a scratch buffer, this thing is diff --git a/src/os_common/os_fhandle.c b/src/os_common/os_fhandle.c index c14fa084130..ec92797fb50 100644 --- a/src/os_common/os_fhandle.c +++ b/src/os_common/os_fhandle.c @@ -9,213 +9,88 @@ #include "wt_internal.h" /* - * __fhandle_advise_notsup -- - * POSIX fadvise unsupported. + * __fhandle_method_finalize -- + * Initialize any NULL WT_FH structure methods to not-supported. Doing + * this means that custom file systems with incomplete implementations + * won't dereference NULL pointers. */ static int -__fhandle_advise_notsup(WT_SESSION_IMPL *session, - WT_FH *fh, wt_off_t offset, wt_off_t len, int advice) +__fhandle_method_finalize( + WT_SESSION_IMPL *session, WT_FILE_HANDLE *handle, bool readonly) { - WT_UNUSED(session); - WT_UNUSED(fh); - WT_UNUSED(offset); - WT_UNUSED(len); - WT_UNUSED(advice); - - /* Quietly fail, callers expect not-supported failures. */ - return (ENOTSUP); -} - -/* - * __fhandle_allocate_notsup -- - * POSIX fallocate unsupported. - */ -static int -__fhandle_allocate_notsup( - WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t offset, wt_off_t len) -{ - WT_UNUSED(offset); - WT_UNUSED(len); - WT_RET_MSG(session, ENOTSUP, "%s: file-allocate", fh->name); -} - -/* - * __fhandle_close_notsup -- - * ANSI C close/fclose unsupported. - */ -static int -__fhandle_close_notsup(WT_SESSION_IMPL *session, WT_FH *fh) -{ - WT_RET_MSG(session, ENOTSUP, "%s: file-close", fh->name); -} - -/* - * __fhandle_lock_notsup -- - * Lock/unlock a file unsupported. - */ -static int -__fhandle_lock_notsup(WT_SESSION_IMPL *session, WT_FH *fh, bool lock) -{ - WT_UNUSED(lock); - WT_RET_MSG(session, ENOTSUP, "%s: file-lock", fh->name); -} - -/* - * __fhandle_map_notsup -- - * Map a file unsupported. 
- */ -static int -__fhandle_map_notsup(WT_SESSION_IMPL *session, - WT_FH *fh, void *p, size_t *lenp, void **mappingcookie) -{ - WT_UNUSED(p); - WT_UNUSED(lenp); - WT_UNUSED(mappingcookie); - WT_RET_MSG(session, ENOTSUP, "%s: file-map", fh->name); -} - -/* - * __fhandle_map_discard_notsup -- - * Discard a section of a mapped region unsupported. - */ -static int -__fhandle_map_discard_notsup( - WT_SESSION_IMPL *session, WT_FH *fh, void *p, size_t len) -{ - WT_UNUSED(p); - WT_UNUSED(len); - WT_RET_MSG(session, ENOTSUP, "%s: file-map-discard", fh->name); -} +#define WT_HANDLE_METHOD_REQ(name) \ + if (handle->name == NULL) \ + WT_RET_MSG(session, EINVAL, \ + "a WT_FILE_HANDLE.%s method must be configured", #name) + + WT_HANDLE_METHOD_REQ(close); + /* not required: fadvise */ + /* not required: fallocate */ + /* not required: fallocate_nolock */ + /* not required: lock */ + /* not required: map */ + /* not required: map_discard */ + /* not required: map_preload */ + /* not required: map_unmap */ + WT_HANDLE_METHOD_REQ(read); + WT_HANDLE_METHOD_REQ(size); + /* not required: sync */ + /* not required: sync_nowait */ + if (!readonly) { + WT_HANDLE_METHOD_REQ(truncate); + WT_HANDLE_METHOD_REQ(write); + } -/* - * __fhandle_map_preload_notsup -- - * Preload a section of a mapped region unsupported. - */ -static int -__fhandle_map_preload_notsup( - WT_SESSION_IMPL *session, WT_FH *fh, const void *p, size_t len) -{ - WT_UNUSED(p); - WT_UNUSED(len); - WT_RET_MSG(session, ENOTSUP, "%s: file-map-preload", fh->name); + return (0); } +#ifdef HAVE_DIAGNOSTIC /* - * __fhandle_map_unmap_notsup -- - * Unmap a file unsupported. + * __wt_handle_is_open -- + * Return if there's an open handle matching a name. 
*/ -static int -__fhandle_map_unmap_notsup(WT_SESSION_IMPL *session, - WT_FH *fh, void *p, size_t len, void **mappingcookie) +bool +__wt_handle_is_open(WT_SESSION_IMPL *session, const char *name) { - WT_UNUSED(p); - WT_UNUSED(len); - WT_UNUSED(mappingcookie); - WT_RET_MSG(session, ENOTSUP, "%s: file-map-unmap", fh->name); -} + WT_CONNECTION_IMPL *conn; + WT_FH *fh; + uint64_t bucket, hash; + bool found; -/* - * __fhandle_read_notsup -- - * POSIX pread unsupported. - */ -static int -__fhandle_read_notsup( - WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t offset, size_t len, void *buf) -{ - WT_UNUSED(offset); - WT_UNUSED(len); - WT_UNUSED(buf); - WT_RET_MSG(session, ENOTSUP, "%s: file-read", fh->name); -} + conn = S2C(session); + found = false; -/* - * __fhandle_size_notsup -- - * Get the size of a file in bytes unsupported. - */ -static int -__fhandle_size_notsup(WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t *sizep) -{ - WT_UNUSED(sizep); - WT_RET_MSG(session, ENOTSUP, "%s: file-size", fh->name); -} + hash = __wt_hash_city64(name, strlen(name)); + bucket = hash % WT_HASH_ARRAY_SIZE; -/* - * __fhandle_sync_notsup -- - * POSIX fsync unsupported. - */ -static int -__fhandle_sync_notsup(WT_SESSION_IMPL *session, WT_FH *fh, bool block) -{ - WT_UNUSED(block); - WT_RET_MSG(session, ENOTSUP, "%s: file-sync", fh->name); -} + __wt_spin_lock(session, &conn->fh_lock); -/* - * __fhandle_truncate_notsup -- - * POSIX ftruncate. - */ -static int -__fhandle_truncate_notsup(WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t len) -{ - WT_UNUSED(len); - WT_RET_MSG(session, ENOTSUP, "%s: file-truncate", fh->name); -} + TAILQ_FOREACH(fh, &conn->fhhash[bucket], hashq) + if (strcmp(name, fh->name) == 0) { + found = true; + break; + } -/* - * __fhandle_write_notsup -- - * POSIX pwrite. 
- */ -static int -__fhandle_write_notsup(WT_SESSION_IMPL *session, - WT_FH *fh, wt_off_t offset, size_t len, const void *buf) -{ - WT_UNUSED(offset); - WT_UNUSED(len); - WT_UNUSED(buf); - WT_RET_MSG(session, ENOTSUP, "%s: file-write", fh->name); -} + __wt_spin_unlock(session, &conn->fh_lock); -/* - * __fhandle_method_init -- - * Initialize the WT_FH structure's methods to not-supported. - */ -static void -__fhandle_method_init(WT_FH *fh) -{ - /* - * Set up the initial set of handle methods to standard "not-supported" - * functions, the underlying open functions turn on supported functions. - */ - fh->fh_advise = __fhandle_advise_notsup; - fh->fh_allocate = __fhandle_allocate_notsup; - fh->fh_close = __fhandle_close_notsup; - fh->fh_lock = __fhandle_lock_notsup; - fh->fh_map = __fhandle_map_notsup; - fh->fh_map_discard = __fhandle_map_discard_notsup; - fh->fh_map_preload = __fhandle_map_preload_notsup; - fh->fh_map_unmap = __fhandle_map_unmap_notsup; - fh->fh_read = __fhandle_read_notsup; - fh->fh_size = __fhandle_size_notsup; - fh->fh_sync = __fhandle_sync_notsup; - fh->fh_truncate = __fhandle_truncate_notsup; - fh->fh_write = __fhandle_write_notsup; + return (found); } +#endif /* - * __wt_handle_search -- + * __handle_search -- * Search for a matching handle. */ -bool -__wt_handle_search(WT_SESSION_IMPL *session, - const char *name, bool increment_ref, WT_FH *newfh, WT_FH **fhp) +static bool +__handle_search( + WT_SESSION_IMPL *session, const char *name, WT_FH *newfh, WT_FH **fhp) { WT_CONNECTION_IMPL *conn; WT_FH *fh; uint64_t bucket, hash; bool found; - if (fhp != NULL) - *fhp = NULL; + *fhp = NULL; conn = S2C(session); found = false; @@ -226,15 +101,13 @@ __wt_handle_search(WT_SESSION_IMPL *session, __wt_spin_lock(session, &conn->fh_lock); /* - * If we already have the file open, optionally increment the reference - * count and return a pointer. + * If we already have the file open, increment the reference count and + * return a pointer. 
*/ TAILQ_FOREACH(fh, &conn->fhhash[bucket], hashq) if (strcmp(name, fh->name) == 0) { - if (increment_ref) - ++fh->ref; - if (fhp != NULL) - *fhp = fh; + ++fh->ref; + *fhp = fh; found = true; break; } @@ -245,10 +118,8 @@ __wt_handle_search(WT_SESSION_IMPL *session, WT_CONN_FILE_INSERT(conn, newfh, bucket); (void)__wt_atomic_add32(&conn->open_file_count, 1); - if (increment_ref) - ++newfh->ref; - if (fhp != NULL) - *fhp = newfh; + ++newfh->ref; + *fhp = newfh; } __wt_spin_unlock(session, &conn->fh_lock); @@ -261,8 +132,8 @@ __wt_handle_search(WT_SESSION_IMPL *session, * Optionally output a verbose message on handle open. */ static inline int -__open_verbose(WT_SESSION_IMPL *session, - const char *name, uint32_t file_type, uint32_t flags) +__open_verbose( + WT_SESSION_IMPL *session, const char *name, int file_type, u_int flags) { #ifdef HAVE_VERBOSE WT_DECL_RET; @@ -278,19 +149,19 @@ __open_verbose(WT_SESSION_IMPL *session, */ switch (file_type) { - case WT_FILE_TYPE_CHECKPOINT: + case WT_OPEN_FILE_TYPE_CHECKPOINT: file_type_tag = "checkpoint"; break; - case WT_FILE_TYPE_DATA: + case WT_OPEN_FILE_TYPE_DATA: file_type_tag = "data"; break; - case WT_FILE_TYPE_DIRECTORY: + case WT_OPEN_FILE_TYPE_DIRECTORY: file_type_tag = "directory"; break; - case WT_FILE_TYPE_LOG: + case WT_OPEN_FILE_TYPE_LOG: file_type_tag = "log"; break; - case WT_FILE_TYPE_REGULAR: + case WT_OPEN_FILE_TYPE_REGULAR: file_type_tag = "regular"; break; default: @@ -337,17 +208,19 @@ err: __wt_scr_free(session, &tmp); */ int __wt_open(WT_SESSION_IMPL *session, - const char *name, uint32_t file_type, uint32_t flags, WT_FH **fhp) + const char *name, WT_OPEN_FILE_TYPE file_type, u_int flags, WT_FH **fhp) { WT_CONNECTION_IMPL *conn; WT_DECL_RET; WT_FH *fh; + WT_FILE_SYSTEM *file_system; bool lock_file, open_called; char *path; WT_ASSERT(session, file_type != 0); /* A file type is required. 
*/ conn = S2C(session); + file_system = conn->file_system; fh = NULL; open_called = false; path = NULL; @@ -355,7 +228,7 @@ __wt_open(WT_SESSION_IMPL *session, WT_RET(__open_verbose(session, name, file_type, flags)); /* Check if the handle is already open. */ - if (__wt_handle_search(session, name, true, NULL, &fh)) { + if (__handle_search(session, name, NULL, &fh)) { *fhp = fh; return (0); } @@ -363,7 +236,6 @@ __wt_open(WT_SESSION_IMPL *session, /* Allocate and initialize the handle. */ WT_ERR(__wt_calloc_one(session, &fh)); WT_ERR(__wt_strdup(session, name, &fh->name)); - __fhandle_method_init(fh); /* * If this is a read-only connection, open all files read-only except @@ -378,30 +250,26 @@ __wt_open(WT_SESSION_IMPL *session, WT_ASSERT(session, lock_file || !LF_ISSET(WT_OPEN_CREATE)); } - /* - * Direct I/O: file-type is a flag from the set of possible flags stored - * in the connection handle during configuration, check for a match. - */ - fh->direct_io = false; - if (FLD_ISSET(conn->direct_io, file_type)) - LF_SET(WT_OPEN_DIRECTIO); - /* Create the path to the file. */ if (!LF_ISSET(WT_OPEN_FIXED)) WT_ERR(__wt_filename(session, name, &path)); /* Call the underlying open function. */ - WT_ERR(conn->file_open( - session, fh, path == NULL ? name : path, file_type, flags)); + WT_ERR(file_system->open_file(file_system, &session->iface, + path == NULL ? name : path, file_type, flags, &fh->handle)); open_called = true; + WT_ERR(__fhandle_method_finalize( + session, fh->handle, LF_ISSET(WT_OPEN_READONLY))); + /* * Repeat the check for a match: if there's no match, link our newly * created handle onto the database's list of files. 
*/ - if (__wt_handle_search(session, name, true, fh, fhp)) { + if (__handle_search(session, name, fh, fhp)) { err: if (open_called) - WT_TRET(fh->fh_close(session, fh)); + WT_TRET(fh->handle->close( + fh->handle, (WT_SESSION *)session)); if (fh != NULL) { __wt_free(session, fh->name); __wt_free(session, fh); @@ -443,7 +311,7 @@ __wt_close(WT_SESSION_IMPL *session, WT_FH **fhp) */ __wt_spin_lock(session, &conn->fh_lock); WT_ASSERT(session, fh->ref > 0); - if ((fh->ref > 0 && --fh->ref > 0) || F_ISSET(fh, WT_FH_IN_MEMORY)) { + if ((fh->ref > 0 && --fh->ref > 0)) { __wt_spin_unlock(session, &conn->fh_lock); return (0); } @@ -456,7 +324,7 @@ __wt_close(WT_SESSION_IMPL *session, WT_FH **fhp) __wt_spin_unlock(session, &conn->fh_lock); /* Discard underlying resources. */ - ret = fh->fh_close(session, fh); + ret = fh->handle->close(fh->handle, (WT_SESSION *)session); __wt_free(session, fh->name); __wt_free(session, fh); @@ -478,18 +346,13 @@ __wt_close_connection_close(WT_SESSION_IMPL *session) conn = S2C(session); while ((fh = TAILQ_FIRST(&conn->fhqh)) != NULL) { - /* - * In-memory configurations will have open files, but the ref - * counts should be zero. - */ - if (!F_ISSET(conn, WT_CONN_IN_MEMORY) || fh->ref != 0) { + if (fh->ref != 0) { ret = EBUSY; __wt_errx(session, "Connection has open file handles: %s", fh->name); } fh->ref = 1; - F_CLR(fh, WT_FH_IN_MEMORY); WT_TRET(__wt_close(session, &fh)); } diff --git a/src/os_common/os_fs_inmemory.c b/src/os_common/os_fs_inmemory.c index b4a6fd64784..55facbbaec1 100644 --- a/src/os_common/os_fs_inmemory.c +++ b/src/os_common/os_fs_inmemory.c @@ -9,39 +9,147 @@ #include "wt_internal.h" /* - * In-memory information. + * File system interface for in-memory implementation. 
*/ typedef struct { + WT_FILE_SYSTEM iface; + + TAILQ_HEAD(__wt_closed_file_handle_qh, __wt_file_handle_inmem) fileq; + WT_SPINLOCK lock; -} WT_IM; +} WT_FILE_SYSTEM_INMEM; + +static int __im_file_size(WT_FILE_HANDLE *, WT_SESSION *, wt_off_t *); /* - * __im_directory_list -- - * Get a list of files from a directory, in-memory version. + * __im_handle_search -- + * Return a matching handle, if one exists. + */ +static WT_FILE_HANDLE_INMEM * +__im_handle_search(WT_FILE_SYSTEM *file_system, const char *name) +{ + WT_FILE_HANDLE_INMEM *im_fh; + WT_FILE_SYSTEM_INMEM *im_fs; + + im_fs = (WT_FILE_SYSTEM_INMEM *)file_system; + + TAILQ_FOREACH(im_fh, &im_fs->fileq, q) + if (strcmp(im_fh->iface.name, name) == 0) + break; + return (im_fh); +} + +/* + * __im_handle_remove -- + * Destroy an in-memory file handle. Should only happen on remove or + * shutdown. */ static int -__im_directory_list(WT_SESSION_IMPL *session, const char *dir, - const char *prefix, uint32_t flags, char ***dirlist, u_int *countp) +__im_handle_remove(WT_SESSION_IMPL *session, + WT_FILE_SYSTEM *file_system, WT_FILE_HANDLE_INMEM *im_fh) { - WT_UNUSED(session); - WT_UNUSED(dir); - WT_UNUSED(prefix); - WT_UNUSED(flags); - WT_UNUSED(dirlist); - WT_UNUSED(countp); + WT_FILE_HANDLE *fhp; + WT_FILE_SYSTEM_INMEM *im_fs; + + im_fs = (WT_FILE_SYSTEM_INMEM *)file_system; + + if (im_fh->ref != 0) + WT_RET_MSG(session, EBUSY, + "%s: file-remove", im_fh->iface.name); + + TAILQ_REMOVE(&im_fs->fileq, im_fh, q); - WT_RET_MSG(session, ENOTSUP, "directory-list"); + /* Clean up private information. */ + __wt_buf_free(session, &im_fh->buf); + + /* Clean up public information. */ + fhp = (WT_FILE_HANDLE *)im_fh; + __wt_free(session, fhp->name); + + __wt_free(session, im_fh); + + return (0); } /* - * __im_directory_sync -- - * Flush a directory to ensure file creation is durable. + * __im_fs_directory_list -- + * Return the directory contents. 
*/ static int -__im_directory_sync(WT_SESSION_IMPL *session, const char *path) +__im_fs_directory_list(WT_FILE_SYSTEM *file_system, + WT_SESSION *wt_session, const char *directory, + const char *prefix, char ***dirlistp, uint32_t *countp) { - WT_UNUSED(session); - WT_UNUSED(path); + WT_DECL_RET; + WT_FILE_HANDLE_INMEM *im_fh; + WT_FILE_SYSTEM_INMEM *im_fs; + WT_SESSION_IMPL *session; + size_t dirallocsz, len; + uint32_t count; + char *name, **entries; + + im_fs = (WT_FILE_SYSTEM_INMEM *)file_system; + session = (WT_SESSION_IMPL *)wt_session; + + *dirlistp = NULL; + *countp = 0; + + dirallocsz = 0; + len = strlen(directory); + entries = NULL; + + __wt_spin_lock(session, &im_fs->lock); + + count = 0; + TAILQ_FOREACH(im_fh, &im_fs->fileq, q) { + name = im_fh->iface.name; + if (strncmp(name, directory, len) != 0 || + (prefix != NULL && !WT_PREFIX_MATCH(name + len, prefix))) + continue; + + WT_ERR(__wt_realloc_def( + session, &dirallocsz, count + 1, &entries)); + WT_ERR(__wt_strdup(session, name, &entries[count])); + ++count; + } + + *dirlistp = entries; + *countp = count; + +err: __wt_spin_unlock(session, &im_fs->lock); + if (ret == 0) + return (0); + + if (entries != NULL) { + while (count > 0) + __wt_free(session, entries[--count]); + __wt_free(session, entries); + } + + WT_RET_MSG(session, ret, + "%s: directory-list, prefix \"%s\"", + directory, prefix == NULL ? "" : prefix); +} + +/* + * __im_fs_directory_list_free -- + * Free memory returned by __im_fs_directory_list. + */ +static int +__im_fs_directory_list_free(WT_FILE_SYSTEM *file_system, + WT_SESSION *wt_session, char **dirlist, uint32_t count) +{ + WT_SESSION_IMPL *session; + + WT_UNUSED(file_system); + + session = (WT_SESSION_IMPL *)wt_session; + + if (dirlist != NULL) { + while (count > 0) + __wt_free(session, dirlist[--count]); + __wt_free(session, dirlist); + } return (0); } @@ -50,9 +158,20 @@ __im_directory_sync(WT_SESSION_IMPL *session, const char *path) * Return if the file exists. 
*/ static int -__im_fs_exist(WT_SESSION_IMPL *session, const char *name, bool *existp) +__im_fs_exist(WT_FILE_SYSTEM *file_system, + WT_SESSION *wt_session, const char *name, bool *existp) { - *existp = __wt_handle_search(session, name, false, NULL, NULL); + WT_FILE_SYSTEM_INMEM *im_fs; + WT_SESSION_IMPL *session; + + im_fs = (WT_FILE_SYSTEM_INMEM *)file_system; + session = (WT_SESSION_IMPL *)wt_session; + + __wt_spin_lock(session, &im_fs->lock); + + *existp = __im_handle_search(file_system, name) != NULL; + + __wt_spin_unlock(session, &im_fs->lock); return (0); } @@ -61,18 +180,24 @@ __im_fs_exist(WT_SESSION_IMPL *session, const char *name, bool *existp) * POSIX remove. */ static int -__im_fs_remove(WT_SESSION_IMPL *session, const char *name) +__im_fs_remove( + WT_FILE_SYSTEM *file_system, WT_SESSION *wt_session, const char *name) { WT_DECL_RET; - WT_FH *fh; + WT_FILE_HANDLE_INMEM *im_fh; + WT_FILE_SYSTEM_INMEM *im_fs; + WT_SESSION_IMPL *session; - if (__wt_handle_search(session, name, true, NULL, &fh)) { - WT_ASSERT(session, fh->ref == 1); + im_fs = (WT_FILE_SYSTEM_INMEM *)file_system; + session = (WT_SESSION_IMPL *)wt_session; - /* Force a discard of the handle. */ - F_CLR(fh, WT_FH_IN_MEMORY); - ret = __wt_close(session, &fh); - } + __wt_spin_lock(session, &im_fs->lock); + + ret = ENOENT; + if ((im_fh = __im_handle_search(file_system, name)) != NULL) + ret = __im_handle_remove(session, file_system, im_fh); + + __wt_spin_unlock(session, &im_fs->lock); return (ret); } @@ -81,55 +206,29 @@ __im_fs_remove(WT_SESSION_IMPL *session, const char *name) * POSIX rename. */ static int -__im_fs_rename(WT_SESSION_IMPL *session, const char *from, const char *to) +__im_fs_rename(WT_FILE_SYSTEM *file_system, + WT_SESSION *wt_session, const char *from, const char *to) { - WT_CONNECTION_IMPL *conn; WT_DECL_RET; - WT_FH *fh; - uint64_t bucket, hash; - char *to_name; - - conn = S2C(session); - - /* We'll need a copy of the target name. 
*/ - WT_RET(__wt_strdup(session, to, &to_name)); - - __wt_spin_lock(session, &conn->fh_lock); - - /* Make sure the target name isn't active. */ - hash = __wt_hash_city64(to, strlen(to)); - bucket = hash % WT_HASH_ARRAY_SIZE; - TAILQ_FOREACH(fh, &conn->fhhash[bucket], hashq) - if (strcmp(to, fh->name) == 0) - WT_ERR(EPERM); + WT_FILE_HANDLE_INMEM *im_fh; + WT_FILE_SYSTEM_INMEM *im_fs; + WT_SESSION_IMPL *session; + char *copy; - /* Find the source name. */ - hash = __wt_hash_city64(from, strlen(from)); - bucket = hash % WT_HASH_ARRAY_SIZE; - TAILQ_FOREACH(fh, &conn->fhhash[bucket], hashq) - if (strcmp(from, fh->name) == 0) - break; - if (fh == NULL) - WT_ERR(ENOENT); - - /* Remove source from the list. */ - WT_CONN_FILE_REMOVE(conn, fh, bucket); + im_fs = (WT_FILE_SYSTEM_INMEM *)file_system; + session = (WT_SESSION_IMPL *)wt_session; - /* Swap the names. */ - __wt_free(session, fh->name); - fh->name = to_name; - to_name = NULL; + __wt_spin_lock(session, &im_fs->lock); - /* Put source back on the list. */ - hash = __wt_hash_city64(to, strlen(to)); - bucket = hash % WT_HASH_ARRAY_SIZE; - WT_CONN_FILE_INSERT(conn, fh, bucket); + ret = ENOENT; + if ((im_fh = __im_handle_search(file_system, from)) != NULL) { + WT_ERR(__wt_strdup(session, to, &copy)); - if (0) { -err: __wt_free(session, to_name); + __wt_free(session, im_fh->iface.name); + im_fh->iface.name = copy; } - __wt_spin_unlock(session, &conn->fh_lock); +err: __wt_spin_unlock(session, &im_fs->lock); return (ret); } @@ -138,25 +237,25 @@ err: __wt_free(session, to_name); * Get the size of a file in bytes, by file name. 
*/ static int -__im_fs_size( - WT_SESSION_IMPL *session, const char *name, bool silent, wt_off_t *sizep) +__im_fs_size(WT_FILE_SYSTEM *file_system, + WT_SESSION *wt_session, const char *name, wt_off_t *sizep) { WT_DECL_RET; - WT_FH *fh; - WT_IM *im; + WT_FILE_HANDLE_INMEM *im_fh; + WT_FILE_SYSTEM_INMEM *im_fs; + WT_SESSION_IMPL *session; - WT_UNUSED(silent); + im_fs = (WT_FILE_SYSTEM_INMEM *)file_system; + session = (WT_SESSION_IMPL *)wt_session; - im = S2C(session)->inmemory; - __wt_spin_lock(session, &im->lock); + __wt_spin_lock(session, &im_fs->lock); - if (__wt_handle_search(session, name, true, NULL, &fh)) { - WT_ERR(fh->fh_size(session, fh, sizep)); - WT_ERR(__wt_close(session, &fh)); - } else - ret = ENOENT; + ret = ENOENT; + if ((im_fh = __im_handle_search(file_system, name)) != NULL) + ret = __im_file_size( + (WT_FILE_HANDLE *)im_fh, wt_session, sizep); -err: __wt_spin_unlock(session, &im->lock); + __wt_spin_unlock(session, &im_fs->lock); return (ret); } @@ -165,24 +264,22 @@ err: __wt_spin_unlock(session, &im->lock); * ANSI C close. */ static int -__im_file_close(WT_SESSION_IMPL *session, WT_FH *fh) +__im_file_close(WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session) { - __wt_buf_free(session, &fh->buf); + WT_FILE_HANDLE_INMEM *im_fh; + WT_FILE_SYSTEM_INMEM *im_fs; + WT_SESSION_IMPL *session; - return (0); -} + im_fh = (WT_FILE_HANDLE_INMEM *)file_handle; + im_fs = (WT_FILE_SYSTEM_INMEM *)file_handle->file_system; + session = (WT_SESSION_IMPL *)wt_session; + + __wt_spin_lock(session, &im_fs->lock); + + --im_fh->ref; + + __wt_spin_unlock(session, &im_fs->lock); -/* - * __im_file_lock -- - * Lock/unlock a file. - */ -static int -__im_file_lock(WT_SESSION_IMPL *session, WT_FH *fh, bool lock) -{ - /* Locks are always granted. */ - WT_UNUSED(session); - WT_UNUSED(fh); - WT_UNUSED(lock); return (0); } @@ -191,31 +288,36 @@ __im_file_lock(WT_SESSION_IMPL *session, WT_FH *fh, bool lock) * POSIX pread. 
*/ static int -__im_file_read( - WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t offset, size_t len, void *buf) +__im_file_read(WT_FILE_HANDLE *file_handle, + WT_SESSION *wt_session, wt_off_t offset, size_t len, void *buf) { WT_DECL_RET; - WT_IM *im; + WT_FILE_HANDLE_INMEM *im_fh; + WT_FILE_SYSTEM_INMEM *im_fs; + WT_SESSION_IMPL *session; size_t off; - im = S2C(session)->inmemory; - __wt_spin_lock(session, &im->lock); + im_fh = (WT_FILE_HANDLE_INMEM *)file_handle; + im_fs = (WT_FILE_SYSTEM_INMEM *)file_handle->file_system; + session = (WT_SESSION_IMPL *)wt_session; + + __wt_spin_lock(session, &im_fs->lock); off = (size_t)offset; - if (off < fh->buf.size) { - len = WT_MIN(len, fh->buf.size - off); - memcpy(buf, (uint8_t *)fh->buf.mem + off, len); - fh->off = off + len; + if (off < im_fh->buf.size) { + len = WT_MIN(len, im_fh->buf.size - off); + memcpy(buf, (uint8_t *)im_fh->buf.mem + off, len); + im_fh->off = off + len; } else ret = WT_ERROR; - __wt_spin_unlock(session, &im->lock); + __wt_spin_unlock(session, &im_fs->lock); if (ret == 0) return (0); WT_RET_MSG(session, WT_ERROR, "%s: handle-read: failed to read %" WT_SIZET_FMT " bytes at " "offset %" WT_SIZET_FMT, - fh->name, len, off); + im_fh->iface.name, len, off); } /* @@ -223,34 +325,29 @@ __im_file_read( * Get the size of a file in bytes, by file handle. */ static int -__im_file_size(WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t *sizep) +__im_file_size( + WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session, wt_off_t *sizep) { - WT_UNUSED(session); + WT_FILE_HANDLE_INMEM *im_fh; + WT_FILE_SYSTEM_INMEM *im_fs; + WT_SESSION_IMPL *session; + + im_fh = (WT_FILE_HANDLE_INMEM *)file_handle; + im_fs = (WT_FILE_SYSTEM_INMEM *)file_handle->file_system; + session = (WT_SESSION_IMPL *)wt_session; + + __wt_spin_lock(session, &im_fs->lock); /* * XXX hack - MongoDB assumes that any file with content will have a * non-zero size. In memory tables generally are zero-sized, make * MongoDB happy. */ - *sizep = fh->buf.size == 0 ? 
1024 : (wt_off_t)fh->buf.size; - return (0); -} + *sizep = im_fh->buf.size == 0 ? 1024 : (wt_off_t)im_fh->buf.size; -/* - * __im_file_sync -- - * POSIX fflush/fsync. - */ -static int -__im_file_sync(WT_SESSION_IMPL *session, WT_FH *fh, bool block) -{ - WT_UNUSED(session); - WT_UNUSED(fh); + __wt_spin_unlock(session, &im_fs->lock); - /* - * Callers attempting asynchronous flush handle ENOTSUP returns, and - * won't make further attempts. - */ - return (block ? 0 : ENOTSUP); + return (0); } /* @@ -258,27 +355,33 @@ __im_file_sync(WT_SESSION_IMPL *session, WT_FH *fh, bool block) * POSIX ftruncate. */ static int -__im_file_truncate(WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t offset) +__im_file_truncate( + WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session, wt_off_t offset) { WT_DECL_RET; - WT_IM *im; + WT_FILE_HANDLE_INMEM *im_fh; + WT_FILE_SYSTEM_INMEM *im_fs; + WT_SESSION_IMPL *session; size_t off; - im = S2C(session)->inmemory; - __wt_spin_lock(session, &im->lock); + im_fh = (WT_FILE_HANDLE_INMEM *)file_handle; + im_fs = (WT_FILE_SYSTEM_INMEM *)file_handle->file_system; + session = (WT_SESSION_IMPL *)wt_session; + + __wt_spin_lock(session, &im_fs->lock); /* * Grow the buffer as necessary, clear any new space in the file, * and reset the file's data length. */ off = (size_t)offset; - WT_ERR(__wt_buf_grow(session, &fh->buf, off)); - if (fh->buf.size < off) - memset((uint8_t *) - fh->buf.data + fh->buf.size, 0, off - fh->buf.size); - fh->buf.size = off; + WT_ERR(__wt_buf_grow(session, &im_fh->buf, off)); + if (im_fh->buf.size < off) + memset((uint8_t *)im_fh->buf.data + im_fh->buf.size, + 0, off - im_fh->buf.size); + im_fh->buf.size = off; -err: __wt_spin_unlock(session, &im->lock); +err: __wt_spin_unlock(session, &im_fs->lock); return (ret); } @@ -287,31 +390,36 @@ err: __wt_spin_unlock(session, &im->lock); * POSIX pwrite. 
*/ static int -__im_file_write(WT_SESSION_IMPL *session, - WT_FH *fh, wt_off_t offset, size_t len, const void *buf) +__im_file_write(WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session, + wt_off_t offset, size_t len, const void *buf) { WT_DECL_RET; - WT_IM *im; + WT_FILE_HANDLE_INMEM *im_fh; + WT_FILE_SYSTEM_INMEM *im_fs; + WT_SESSION_IMPL *session; size_t off; - im = S2C(session)->inmemory; - __wt_spin_lock(session, &im->lock); + im_fh = (WT_FILE_HANDLE_INMEM *)file_handle; + im_fs = (WT_FILE_SYSTEM_INMEM *)file_handle->file_system; + session = (WT_SESSION_IMPL *)wt_session; + + __wt_spin_lock(session, &im_fs->lock); off = (size_t)offset; - WT_ERR(__wt_buf_grow(session, &fh->buf, off + len + 1024)); + WT_ERR(__wt_buf_grow(session, &im_fh->buf, off + len + 1024)); - memcpy((uint8_t *)fh->buf.data + off, buf, len); - if (off + len > fh->buf.size) - fh->buf.size = off + len; - fh->off = off + len; + memcpy((uint8_t *)im_fh->buf.data + off, buf, len); + if (off + len > im_fh->buf.size) + im_fh->buf.size = off + len; + im_fh->off = off + len; -err: __wt_spin_unlock(session, &im->lock); +err: __wt_spin_unlock(session, &im_fs->lock); if (ret == 0) return (0); WT_RET_MSG(session, ret, "%s: handle-write: failed to write %" WT_SIZET_FMT " bytes at " "offset %" WT_SIZET_FMT, - fh->name, len, off); + im_fh->iface.name, len, off); } /* @@ -319,85 +427,134 @@ err: __wt_spin_unlock(session, &im->lock); * POSIX fopen/open. 
*/ static int -__im_file_open(WT_SESSION_IMPL *session, - WT_FH *fh, const char *path, uint32_t file_type, uint32_t flags) +__im_file_open(WT_FILE_SYSTEM *file_system, WT_SESSION *wt_session, + const char *name, WT_OPEN_FILE_TYPE file_type, uint32_t flags, + WT_FILE_HANDLE **file_handlep) { - WT_UNUSED(session); - WT_UNUSED(path); + WT_DECL_RET; + WT_FILE_HANDLE *file_handle; + WT_FILE_HANDLE_INMEM *im_fh; + WT_FILE_SYSTEM_INMEM *im_fs; + WT_SESSION_IMPL *session; + WT_UNUSED(file_type); WT_UNUSED(flags); + im_fs = (WT_FILE_SYSTEM_INMEM *)file_system; + session = (WT_SESSION_IMPL *)wt_session; + + __wt_spin_lock(session, &im_fs->lock); + /* - * Unlike other file handle open implementations, the in-memory version - * is called whenever the WT_FH structure reference count goes to 0. - * This is because the in-memory implementation reuses WT_FH structures, - * and so we have to reset the file offset and potentially the list of - * functions, in the case of the file being opened in a different way. + * First search the file queue, if we find it, assert there's only a + * single reference, in-memory only supports a single handle on any + * file, for now. */ - fh->off = 0; - F_SET(fh, WT_FH_IN_MEMORY); + im_fh = __im_handle_search(file_system, name); + if (im_fh != NULL) { - fh->fh_close = __im_file_close; - fh->fh_lock = __im_file_lock; - fh->fh_read = __im_file_read; - fh->fh_size = __im_file_size; - fh->fh_sync = __im_file_sync; - fh->fh_truncate = __im_file_truncate; - fh->fh_write = __im_file_write; + if (im_fh->ref != 0) + WT_ERR_MSG(session, EBUSY, + "%s: file-open: already open", name); - return (0); + im_fh->ref = 1; + im_fh->off = 0; + + *file_handlep = (WT_FILE_HANDLE *)im_fh; + + __wt_spin_unlock(session, &im_fs->lock); + return (0); + } + + /* The file hasn't been opened before, create a new one. */ + WT_ERR(__wt_calloc_one(session, &im_fh)); + + /* Initialize private information. */ + im_fh->ref = 1; + im_fh->off = 0; + + /* Initialize public information. 
*/ + file_handle = (WT_FILE_HANDLE *)im_fh; + file_handle->file_system = file_system; + WT_ERR(__wt_strdup(session, name, &file_handle->name)); + + file_handle->close = __im_file_close; + file_handle->read = __im_file_read; + file_handle->size = __im_file_size; + file_handle->truncate = __im_file_truncate; + file_handle->write = __im_file_write; + + TAILQ_INSERT_HEAD(&im_fs->fileq, im_fh, q); + + *file_handlep = file_handle; + + if (0) { +err: __wt_free(session, im_fh); + } + + __wt_spin_unlock(session, &im_fs->lock); + return (ret); } /* - * __wt_os_inmemory -- - * Initialize an in-memory configuration. + * __im_terminate -- + * Terminate an in-memory configuration. */ -int -__wt_os_inmemory(WT_SESSION_IMPL *session) +static int +__im_terminate(WT_FILE_SYSTEM *file_system, WT_SESSION *wt_session) { - WT_CONNECTION_IMPL *conn; WT_DECL_RET; - WT_IM *im; + WT_FILE_HANDLE_INMEM *im_fh; + WT_FILE_SYSTEM_INMEM *im_fs; + WT_SESSION_IMPL *session; - conn = S2C(session); - im = NULL; + WT_UNUSED(file_system); - /* Initialize the in-memory jump table. */ - conn->file_directory_list = __im_directory_list; - conn->file_directory_sync = __im_directory_sync; - conn->file_exist = __im_fs_exist; - conn->file_remove = __im_fs_remove; - conn->file_rename = __im_fs_rename; - conn->file_size = __im_fs_size; - conn->file_open = __im_file_open; - - /* Allocate an in-memory structure. */ - WT_RET(__wt_calloc_one(session, &im)); - WT_ERR(__wt_spin_init(session, &im->lock, "in-memory I/O")); - conn->inmemory = im; + session = (WT_SESSION_IMPL *)wt_session; + im_fs = (WT_FILE_SYSTEM_INMEM *)file_system; - return (0); + while ((im_fh = TAILQ_FIRST(&im_fs->fileq)) != NULL) + WT_TRET(__im_handle_remove(session, file_system, im_fh)); + + __wt_spin_destroy(session, &im_fs->lock); + __wt_free(session, im_fs); -err: __wt_free(session, im); return (ret); } /* - * __wt_os_inmemory_cleanup -- - * Discard an in-memory configuration. + * __wt_os_inmemory -- + * Initialize an in-memory configuration. 
*/ int -__wt_os_inmemory_cleanup(WT_SESSION_IMPL *session) +__wt_os_inmemory(WT_SESSION_IMPL *session) { WT_DECL_RET; - WT_IM *im; + WT_FILE_SYSTEM *file_system; + WT_FILE_SYSTEM_INMEM *im_fs; - if ((im = S2C(session)->inmemory) == NULL) - return (0); - S2C(session)->inmemory = NULL; + WT_RET(__wt_calloc_one(session, &im_fs)); + + /* Initialize private information. */ + TAILQ_INIT(&im_fs->fileq); + WT_ERR(__wt_spin_init(session, &im_fs->lock, "in-memory I/O")); - __wt_spin_destroy(session, &im->lock); - __wt_free(session, im); + /* Initialize the in-memory jump table. */ + file_system = (WT_FILE_SYSTEM *)im_fs; + file_system->directory_list = __im_fs_directory_list; + file_system->directory_list_free = __im_fs_directory_list_free; + file_system->exist = __im_fs_exist; + file_system->open_file = __im_file_open; + file_system->remove = __im_fs_remove; + file_system->rename = __im_fs_rename; + file_system->size = __im_fs_size; + file_system->terminate = __im_terminate; + + /* Switch the file system into place. */ + S2C(session)->file_system = (WT_FILE_SYSTEM *)im_fs; + + return (0); +err: __wt_free(session, im_fs); return (ret); } diff --git a/src/os_common/os_fstream.c b/src/os_common/os_fstream.c index fe67c3312a5..fc0daf1c211 100644 --- a/src/os_common/os_fstream.c +++ b/src/os_common/os_fstream.c @@ -182,7 +182,8 @@ __wt_fopen(WT_SESSION_IMPL *session, fs = NULL; - WT_RET(__wt_open(session, name, WT_FILE_TYPE_REGULAR, open_flags, &fh)); + WT_RET(__wt_open( + session, name, WT_OPEN_FILE_TYPE_REGULAR, open_flags, &fh)); WT_ERR(__wt_calloc_one(session, &fs)); fs->fh = fh; diff --git a/src/os_common/os_init.c b/src/os_common/os_init.c deleted file mode 100644 index 512216c52a5..00000000000 --- a/src/os_common/os_init.c +++ /dev/null @@ -1,41 +0,0 @@ -/*- - * Copyright (c) 2014-2016 MongoDB, Inc. - * Copyright (c) 2008-2014 WiredTiger, Inc. - * All rights reserved. - * - * See the file LICENSE for redistribution information. 
- */ - -#include "wt_internal.h" - -/* - * __wt_os_init -- - * Initialize the OS layer. - */ -int -__wt_os_init(WT_SESSION_IMPL *session) -{ - return (F_ISSET(S2C(session), WT_CONN_IN_MEMORY) ? - __wt_os_inmemory(session) : -#if defined(_MSC_VER) - __wt_os_win(session)); -#else - __wt_os_posix(session)); -#endif -} - -/* - * __wt_os_cleanup -- - * Clean up the OS layer. - */ -int -__wt_os_cleanup(WT_SESSION_IMPL *session) -{ - return (F_ISSET(S2C(session), WT_CONN_IN_MEMORY) ? - __wt_os_inmemory_cleanup(session) : -#if defined(_MSC_VER) - __wt_os_win_cleanup(session)); -#else - __wt_os_posix_cleanup(session)); -#endif -} diff --git a/src/os_posix/os_dir.c b/src/os_posix/os_dir.c index 78ae5f8edd4..a23051e5b93 100644 --- a/src/os_posix/os_dir.c +++ b/src/os_posix/os_dir.c @@ -15,30 +15,33 @@ * Get a list of files from a directory, POSIX version. */ int -__wt_posix_directory_list(WT_SESSION_IMPL *session, const char *dir, - const char *prefix, uint32_t flags, char ***dirlist, u_int *countp) +__wt_posix_directory_list(WT_FILE_SYSTEM *file_system, + WT_SESSION *wt_session, const char *directory, + const char *prefix, char ***dirlistp, uint32_t *countp) { struct dirent *dp; DIR *dirp; WT_DECL_RET; + WT_SESSION_IMPL *session; size_t dirallocsz; - u_int count, dirsz; - bool match; - char **entries, *path; + uint32_t count; + char **entries; - *dirlist = NULL; - *countp = 0; + WT_UNUSED(file_system); + + session = (WT_SESSION_IMPL *)wt_session; - WT_RET(__wt_filename(session, dir, &path)); + *dirlistp = NULL; + *countp = 0; dirp = NULL; dirallocsz = 0; - dirsz = 0; entries = NULL; - WT_SYSCALL_RETRY(((dirp = opendir(path)) == NULL ? 1 : 0), ret); + WT_SYSCALL_RETRY(((dirp = opendir(directory)) == NULL ? 
1 : 0), ret); if (ret != 0) - WT_ERR_MSG(session, ret, "%s: directory-list: opendir", path); + WT_RET_MSG(session, ret, + "%s: directory-list: opendir", directory); for (count = 0; (dp = readdir(dirp)) != NULL;) { /* @@ -49,44 +52,50 @@ __wt_posix_directory_list(WT_SESSION_IMPL *session, const char *dir, continue; /* The list of files is optionally filtered by a prefix. */ - match = false; - if (prefix != NULL && - ((LF_ISSET(WT_DIRLIST_INCLUDE) && - WT_PREFIX_MATCH(dp->d_name, prefix)) || - (LF_ISSET(WT_DIRLIST_EXCLUDE) && - !WT_PREFIX_MATCH(dp->d_name, prefix)))) - match = true; - if (prefix == NULL || match) { - /* - * We have a file name we want to return. - */ - count++; - if (count > dirsz) { - dirsz += WT_DIR_ENTRY; - WT_ERR(__wt_realloc_def( - session, &dirallocsz, dirsz, &entries)); - } - WT_ERR(__wt_strdup( - session, dp->d_name, &entries[count-1])); - } + if (prefix != NULL && !WT_PREFIX_MATCH(dp->d_name, prefix)) + continue; + + WT_ERR(__wt_realloc_def( + session, &dirallocsz, count + 1, &entries)); + WT_ERR(__wt_strdup(session, dp->d_name, &entries[count])); + ++count; } - if (count > 0) - *dirlist = entries; + + *dirlistp = entries; *countp = count; err: if (dirp != NULL) (void)closedir(dirp); - __wt_free(session, path); if (ret == 0) return (0); - if (*dirlist != NULL) { - for (count = dirsz; count > 0; count--) - __wt_free(session, entries[count]); - __wt_free(session, entries); - } + WT_TRET(__wt_posix_directory_list_free( + file_system, wt_session, entries, count)); + WT_RET_MSG(session, ret, "%s: directory-list, prefix \"%s\"", - dir, prefix == NULL ? "" : prefix); + directory, prefix == NULL ? "" : prefix); +} + +/* + * __wt_posix_directory_list_free -- + * Free memory returned by __wt_posix_directory_list. 
+ */ +int +__wt_posix_directory_list_free(WT_FILE_SYSTEM *file_system, + WT_SESSION *wt_session, char **dirlist, uint32_t count) +{ + WT_SESSION_IMPL *session; + + WT_UNUSED(file_system); + + session = (WT_SESSION_IMPL *)wt_session; + + if (dirlist != NULL) { + while (count > 0) + __wt_free(session, dirlist[--count]); + __wt_free(session, dirlist); + } + return (0); } diff --git a/src/os_posix/os_dlopen.c b/src/os_posix/os_dlopen.c index 9a74eb4813d..ad1fcc90150 100644 --- a/src/os_posix/os_dlopen.c +++ b/src/os_posix/os_dlopen.c @@ -19,7 +19,7 @@ __wt_dlopen(WT_SESSION_IMPL *session, const char *path, WT_DLH **dlhp) WT_DLH *dlh; WT_RET(__wt_calloc_one(session, &dlh)); - WT_ERR(__wt_strdup(session, path, &dlh->name)); + WT_ERR(__wt_strdup(session, path == NULL ? "local" : path, &dlh->name)); if ((dlh->handle = dlopen(path, RTLD_LAZY)) == NULL) WT_ERR_MSG( diff --git a/src/os_posix/os_fallocate.c b/src/os_posix/os_fallocate.c index 51e29aab4de..92569d84c99 100644 --- a/src/os_posix/os_fallocate.c +++ b/src/os_posix/os_fallocate.c @@ -12,47 +12,28 @@ #include <linux/falloc.h> #include <sys/syscall.h> #endif -/* - * __wt_posix_file_allocate_configure -- - * Configure POSIX file-extension behavior for a file handle. - */ -void -__wt_posix_file_allocate_configure(WT_SESSION_IMPL *session, WT_FH *fh) -{ - WT_UNUSED(session); - - fh->fallocate_available = WT_FALLOCATE_NOT_AVAILABLE; - fh->fallocate_requires_locking = false; - - /* - * Check for the availability of some form of fallocate; in all cases, - * start off requiring locking, we'll relax that requirement once we - * know which system calls work with the handle's underlying filesystem. 
- */ -#if defined(HAVE_FALLOCATE) || defined(HAVE_POSIX_FALLOCATE) - fh->fallocate_available = WT_FALLOCATE_AVAILABLE; - fh->fallocate_requires_locking = true; -#endif -#if defined(__linux__) && defined(SYS_fallocate) - fh->fallocate_available = WT_FALLOCATE_AVAILABLE; - fh->fallocate_requires_locking = true; -#endif -} /* * __posix_std_fallocate -- * Linux fallocate call. */ static int -__posix_std_fallocate(WT_FH *fh, wt_off_t offset, wt_off_t len) +__posix_std_fallocate(WT_FILE_HANDLE *file_handle, + WT_SESSION *wt_session, wt_off_t offset, wt_off_t len) { #if defined(HAVE_FALLOCATE) WT_DECL_RET; + WT_FILE_HANDLE_POSIX *pfh; - WT_SYSCALL_RETRY(fallocate(fh->fd, 0, offset, len), ret); + WT_UNUSED(wt_session); + + pfh = (WT_FILE_HANDLE_POSIX *)file_handle; + + WT_SYSCALL_RETRY(fallocate(pfh->fd, 0, offset, len), ret); return (ret); #else - WT_UNUSED(fh); + WT_UNUSED(file_handle); + WT_UNUSED(wt_session); WT_UNUSED(offset); WT_UNUSED(len); return (ENOTSUP); @@ -64,10 +45,16 @@ __posix_std_fallocate(WT_FH *fh, wt_off_t offset, wt_off_t len) * Linux fallocate call (system call version). */ static int -__posix_sys_fallocate(WT_FH *fh, wt_off_t offset, wt_off_t len) +__posix_sys_fallocate(WT_FILE_HANDLE *file_handle, + WT_SESSION *wt_session, wt_off_t offset, wt_off_t len) { #if defined(__linux__) && defined(SYS_fallocate) WT_DECL_RET; + WT_FILE_HANDLE_POSIX *pfh; + + WT_UNUSED(wt_session); + + pfh = (WT_FILE_HANDLE_POSIX *)file_handle; /* * Try the system call for fallocate even if the C library wrapper was @@ -75,10 +62,11 @@ __posix_sys_fallocate(WT_FH *fh, wt_off_t offset, wt_off_t len) * Linux versions (RHEL 5.5), but not in the version of the C library. * This allows it to work everywhere the kernel supports it. 
*/ - WT_SYSCALL_RETRY(syscall(SYS_fallocate, fh->fd, 0, offset, len), ret); + WT_SYSCALL_RETRY(syscall(SYS_fallocate, pfh->fd, 0, offset, len), ret); return (ret); #else - WT_UNUSED(fh); + WT_UNUSED(file_handle); + WT_UNUSED(wt_session); WT_UNUSED(offset); WT_UNUSED(len); return (ENOTSUP); @@ -90,15 +78,22 @@ __posix_sys_fallocate(WT_FH *fh, wt_off_t offset, wt_off_t len) * POSIX fallocate call. */ static int -__posix_posix_fallocate(WT_FH *fh, wt_off_t offset, wt_off_t len) +__posix_posix_fallocate(WT_FILE_HANDLE *file_handle, + WT_SESSION *wt_session, wt_off_t offset, wt_off_t len) { #if defined(HAVE_POSIX_FALLOCATE) WT_DECL_RET; + WT_FILE_HANDLE_POSIX *pfh; + + WT_UNUSED(wt_session); - WT_SYSCALL_RETRY(posix_fallocate(fh->fd, offset, len), ret); + pfh = (WT_FILE_HANDLE_POSIX *)file_handle; + + WT_SYSCALL_RETRY(posix_fallocate(pfh->fd, offset, len), ret); return (ret); #else - WT_UNUSED(fh); + WT_UNUSED(file_handle); + WT_UNUSED(wt_session); WT_UNUSED(offset); WT_UNUSED(len); return (ENOTSUP); @@ -106,67 +101,45 @@ __posix_posix_fallocate(WT_FH *fh, wt_off_t offset, wt_off_t len) } /* - * __wt_posix_file_allocate -- + * __wt_posix_file_fallocate -- * POSIX fallocate. */ int -__wt_posix_file_allocate( - WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t offset, wt_off_t len) +__wt_posix_file_fallocate(WT_FILE_HANDLE *file_handle, + WT_SESSION *wt_session, wt_off_t offset, wt_off_t len) { - WT_DECL_RET; - - switch (fh->fallocate_available) { - /* - * Check for already configured handles and make the configured call. 
- */ - case WT_FALLOCATE_POSIX: - if ((ret = __posix_posix_fallocate(fh, offset, len)) == 0) - return (0); - WT_RET_MSG(session, ret, "%s: posix_fallocate", fh->name); - case WT_FALLOCATE_STD: - if ((ret = __posix_std_fallocate(fh, offset, len)) == 0) - return (0); - WT_RET_MSG(session, ret, "%s: fallocate", fh->name); - case WT_FALLOCATE_SYS: - if ((ret = __posix_sys_fallocate(fh, offset, len)) == 0) - return (0); - WT_RET_MSG(session, ret, "%s: sys_fallocate", fh->name); - /* - * Figure out what allocation call this system/filesystem supports, if - * any. + * The first fallocate call: figure out what allocation call this + * system/filesystem supports, if any. + * + * We've seen Linux systems where posix_fallocate has corrupted + * existing file data (even though that is explicitly disallowed + * by POSIX). FreeBSD and Solaris support posix_fallocate, and + * so far we've seen no problems leaving it unlocked. Check for + * fallocate (and the system call version of fallocate) first to + * avoid locking on Linux if at all possible. */ - case WT_FALLOCATE_AVAILABLE: - /* - * We've seen Linux systems where posix_fallocate has corrupted - * existing file data (even though that is explicitly disallowed - * by POSIX). FreeBSD and Solaris support posix_fallocate, and - * so far we've seen no problems leaving it unlocked. Check for - * fallocate (and the system call version of fallocate) first to - * avoid locking on Linux if at all possible. 
- */ - if ((ret = __posix_std_fallocate(fh, offset, len)) == 0) { - fh->fallocate_available = WT_FALLOCATE_STD; - fh->fallocate_requires_locking = false; - return (0); - } - if ((ret = __posix_sys_fallocate(fh, offset, len)) == 0) { - fh->fallocate_available = WT_FALLOCATE_SYS; - fh->fallocate_requires_locking = false; - return (0); - } - if ((ret = __posix_posix_fallocate(fh, offset, len)) == 0) { - fh->fallocate_available = WT_FALLOCATE_POSIX; -#if !defined(__linux__) - fh->fallocate_requires_locking = false; + if (__posix_std_fallocate(file_handle, wt_session, offset, len) == 0) { + file_handle->fallocate = NULL; + file_handle->fallocate_nolock = __posix_std_fallocate; + return (0); + } + if (__posix_sys_fallocate(file_handle, wt_session, offset, len) == 0) { + file_handle->fallocate = NULL; + file_handle->fallocate_nolock = __posix_sys_fallocate; + return (0); + } + if (__posix_posix_fallocate( + file_handle, wt_session, offset, len) == 0) { +#if defined(__linux__) + file_handle->fallocate = __posix_posix_fallocate; +#else + file_handle->fallocate = NULL; + file_handle->fallocate_nolock = __posix_posix_fallocate; #endif - return (0); - } - /* FALLTHROUGH */ - case WT_FALLOCATE_NOT_AVAILABLE: - default: - fh->fallocate_available = WT_FALLOCATE_NOT_AVAILABLE; - return (ENOTSUP); + return (0); } - /* NOTREACHED */ + + file_handle->fallocate = NULL; + return (ENOTSUP); } diff --git a/src/os_posix/os_fs.c b/src/os_posix/os_fs.c index 5cf8ac2118b..9645652d3e9 100644 --- a/src/os_posix/os_fs.c +++ b/src/os_posix/os_fs.c @@ -13,30 +13,13 @@ * Underlying support function to flush a file handle. 
*/ static int -__posix_sync(WT_SESSION_IMPL *session, - int fd, const char *name, const char *func, bool block) +__posix_sync( + WT_SESSION_IMPL *session, int fd, const char *name, const char *func) { WT_DECL_RET; WT_ASSERT(session, !F_ISSET(S2C(session), WT_CONN_READONLY)); -#ifdef HAVE_SYNC_FILE_RANGE - if (!block) { - WT_SYSCALL_RETRY(sync_file_range(fd, - (off64_t)0, (off64_t)0, SYNC_FILE_RANGE_WRITE), ret); - if (ret == 0) - return (0); - WT_RET_MSG(session, ret, "%s: %s: sync_file_range", name, func); - } -#else - /* - * Callers attempting asynchronous flush handle ENOTSUP returns, and - * won't make further attempts. - */ - if (!block) - return (ENOTSUP); -#endif - #if defined(F_FULLFSYNC) /* * OS X fsync documentation: @@ -73,47 +56,29 @@ __posix_sync(WT_SESSION_IMPL *session, #endif } +#ifdef __linux__ /* * __posix_directory_sync -- * Flush a directory to ensure file creation is durable. */ static int -__posix_directory_sync(WT_SESSION_IMPL *session, const char *path) +__posix_directory_sync( + WT_FILE_SYSTEM *file_system, WT_SESSION *wt_session, const char *path) { -#ifdef __linux__ WT_DECL_RET; + WT_SESSION_IMPL *session; int fd, tret; - char *copy, *dir; - /* - * POSIX 1003.1 does not require that fsync of a file handle ensures the - * entry in the directory containing the file has also reached disk (and - * there are historic Linux filesystems requiring this), do an explicit - * fsync on a file descriptor for the directory to be sure. - */ - copy = NULL; - if (path == NULL || strchr(path, '/') == NULL) - path = S2C(session)->home; - else { - /* - * File name construction should not return a path without any - * slash separator, but caution isn't unreasonable. 
- */ - WT_RET(__wt_filename(session, path, &copy)); - if ((dir = strrchr(copy, '/')) == NULL) - path = S2C(session)->home; - else { - dir[1] = '\0'; - path = copy; - } - } + WT_UNUSED(file_system); + + session = (WT_SESSION_IMPL *)wt_session; WT_SYSCALL_RETRY(( (fd = open(path, O_RDONLY, 0444)) == -1 ? 1 : 0), ret); if (ret != 0) - WT_ERR_MSG(session, ret, "%s: directory-sync: open", path); + WT_RET_MSG(session, ret, "%s: directory-sync: open", path); - ret = __posix_sync(session, fd, path, "directory-sync", true); + ret = __posix_sync(session, fd, path, "directory-sync"); WT_SYSCALL_RETRY(close(fd), tret); if (tret != 0) { @@ -121,40 +86,36 @@ __posix_directory_sync(WT_SESSION_IMPL *session, const char *path) if (ret == 0) ret = tret; } -err: __wt_free(session, copy); return (ret); -#else - WT_UNUSED(session); - WT_UNUSED(path); - return (0); -#endif } +#endif /* * __posix_fs_exist -- * Return if the file exists. */ static int -__posix_fs_exist(WT_SESSION_IMPL *session, const char *name, bool *existp) +__posix_fs_exist(WT_FILE_SYSTEM *file_system, + WT_SESSION *wt_session, const char *name, bool *existp) { struct stat sb; WT_DECL_RET; - char *path; + WT_SESSION_IMPL *session; + + WT_UNUSED(file_system); - WT_RET(__wt_filename(session, name, &path)); - name = path; + session = (WT_SESSION_IMPL *)wt_session; WT_SYSCALL_RETRY(stat(name, &sb), ret); - if (ret == 0) + if (ret == 0) { *existp = true; - else if (ret == ENOENT) { + return (0); + } + if (ret == ENOENT) { *existp = false; - ret = 0; - } else - __wt_err(session, ret, "%s: file-exist: stat", name); - - __wt_free(session, path); - return (ret); + return (0); + } + WT_RET_MSG(session, ret, "%s: file-exist: stat", name); } /* @@ -162,26 +123,20 @@ __posix_fs_exist(WT_SESSION_IMPL *session, const char *name, bool *existp) * Remove a file. 
*/ static int -__posix_fs_remove(WT_SESSION_IMPL *session, const char *name) +__posix_fs_remove( + WT_FILE_SYSTEM *file_system, WT_SESSION *wt_session, const char *name) { WT_DECL_RET; - char *path; + WT_SESSION_IMPL *session; -#ifdef HAVE_DIAGNOSTIC - if (__wt_handle_search(session, name, false, NULL, NULL)) - WT_RET_MSG(session, EINVAL, - "%s: file-remove: file has open handles", name); -#endif + WT_UNUSED(file_system); - WT_RET(__wt_filename(session, name, &path)); - name = path; + session = (WT_SESSION_IMPL *)wt_session; WT_SYSCALL_RETRY(remove(name), ret); - if (ret != 0) - __wt_err(session, ret, "%s: file-remove: remove", name); - - __wt_free(session, path); - return (ret); + if (ret == 0) + return (0); + WT_RET_MSG(session, ret, "%s: file-remove: remove", name); } /* @@ -189,34 +144,20 @@ __posix_fs_remove(WT_SESSION_IMPL *session, const char *name) * Rename a file. */ static int -__posix_fs_rename(WT_SESSION_IMPL *session, const char *from, const char *to) +__posix_fs_rename(WT_FILE_SYSTEM *file_system, + WT_SESSION *wt_session, const char *from, const char *to) { WT_DECL_RET; - char *from_path, *to_path; - -#ifdef HAVE_DIAGNOSTIC - if (__wt_handle_search(session, from, false, NULL, NULL)) - WT_RET_MSG(session, EINVAL, - "%s: file-rename: file has open handles", from); - if (__wt_handle_search(session, to, false, NULL, NULL)) - WT_RET_MSG(session, EINVAL, - "%s: file-rename: file has open handles", to); -#endif + WT_SESSION_IMPL *session; - from_path = to_path = NULL; - WT_ERR(__wt_filename(session, from, &from_path)); - from = from_path; - WT_ERR(__wt_filename(session, to, &to_path)); - to = to_path; + WT_UNUSED(file_system); - WT_SYSCALL_RETRY(rename(from, to), ret); - if (ret != 0) - __wt_err(session, ret, - "%s to %s: file-rename: rename", from, to); + session = (WT_SESSION_IMPL *)wt_session; -err: __wt_free(session, from_path); - __wt_free(session, to_path); - return (ret); + WT_SYSCALL_RETRY(rename(from, to), ret); + if (ret == 0) + return (0); + 
WT_RET_MSG(session, ret, "%s to %s: file-rename: rename", from, to); } /* @@ -224,90 +165,86 @@ err: __wt_free(session, from_path); * Get the size of a file in bytes, by file name. */ static int -__posix_fs_size( - WT_SESSION_IMPL *session, const char *name, bool silent, wt_off_t *sizep) +__posix_fs_size(WT_FILE_SYSTEM *file_system, + WT_SESSION *wt_session, const char *name, wt_off_t *sizep) { struct stat sb; WT_DECL_RET; - char *path; + WT_SESSION_IMPL *session; - WT_RET(__wt_filename(session, name, &path)); - name = path; + WT_UNUSED(file_system); + + session = (WT_SESSION_IMPL *)wt_session; - /* - * Optionally don't log errors on ENOENT; some callers of this function - * expect failure in that case and don't want an error message logged. - */ WT_SYSCALL_RETRY(stat(name, &sb), ret); - if (ret == 0) + if (ret == 0) { *sizep = sb.st_size; - else if (ret != ENOENT || !silent) - __wt_err(session, ret, "%s: file-size: stat", name); - - __wt_free(session, path); - - return (ret); + return (0); + } + WT_RET_MSG(session, ret, "%s: file-size: stat", name); } +#if defined(HAVE_POSIX_FADVISE) /* * __posix_file_advise -- * POSIX fadvise. */ static int -__posix_file_advise(WT_SESSION_IMPL *session, - WT_FH *fh, wt_off_t offset, wt_off_t len, int advice) +__posix_file_advise(WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session, + wt_off_t offset, wt_off_t len, int advice) { -#if defined(HAVE_POSIX_FADVISE) WT_DECL_RET; + WT_FILE_HANDLE_POSIX *pfh; + WT_SESSION_IMPL *session; - /* - * Refuse pre-load when direct I/O is configured for the file, the - * kernel cache isn't interesting. 
- */ - if (advice == POSIX_MADV_WILLNEED && fh->direct_io) - return (ENOTSUP); + session = (WT_SESSION_IMPL *)wt_session; + pfh = (WT_FILE_HANDLE_POSIX *)file_handle; - WT_SYSCALL_RETRY(posix_fadvise(fh->fd, offset, len, advice), ret); + WT_SYSCALL_RETRY(posix_fadvise(pfh->fd, offset, len, advice), ret); if (ret == 0) return (0); /* * Treat EINVAL as not-supported, some systems don't support some flags. - * Quietly fail, callers expect not-supported failures. + * Quietly fail, callers expect not-supported failures, and reset the + * handle method to prevent future calls. */ - if (ret == EINVAL) + if (ret == EINVAL) { + file_handle->fadvise = NULL; return (ENOTSUP); + } - WT_RET_MSG(session, ret, "%s: handle-advise: posix_fadvise", fh->name); -#else - WT_UNUSED(session); - WT_UNUSED(fh); - WT_UNUSED(offset); - WT_UNUSED(len); - WT_UNUSED(advice); + WT_RET_MSG(session, ret, + "%s: handle-advise: posix_fadvise", file_handle->name); - /* Quietly fail, callers expect not-supported failures. */ - return (ENOTSUP); -#endif } +#endif /* * __posix_file_close -- * ANSI C close. */ static int -__posix_file_close(WT_SESSION_IMPL *session, WT_FH *fh) +__posix_file_close(WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session) { WT_DECL_RET; + WT_FILE_HANDLE_POSIX *pfh; + WT_SESSION_IMPL *session; + + session = (WT_SESSION_IMPL *)wt_session; + pfh = (WT_FILE_HANDLE_POSIX *)file_handle; /* Close the file handle. */ - if (fh->fd == -1) - return (0); + if (pfh->fd != -1) { + WT_SYSCALL_RETRY(close(pfh->fd), ret); + if (ret != 0) + __wt_err(session, ret, + "%s: handle-close: close", file_handle->name); + } - WT_SYSCALL_RETRY(close(fh->fd), ret); - if (ret == 0) - return (0); - WT_RET_MSG(session, ret, "%s: handle-close: close", fh->name); + __wt_free(session, file_handle->name); + __wt_free(session, pfh); + return (ret); } /* @@ -315,10 +252,16 @@ __posix_file_close(WT_SESSION_IMPL *session, WT_FH *fh) * Lock/unlock a file. 
*/ static int -__posix_file_lock(WT_SESSION_IMPL *session, WT_FH *fh, bool lock) +__posix_file_lock( + WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session, bool lock) { struct flock fl; WT_DECL_RET; + WT_FILE_HANDLE_POSIX *pfh; + WT_SESSION_IMPL *session; + + session = (WT_SESSION_IMPL *)wt_session; + pfh = (WT_FILE_HANDLE_POSIX *)file_handle; /* * WiredTiger requires this function be able to acquire locks past @@ -334,10 +277,10 @@ __posix_file_lock(WT_SESSION_IMPL *session, WT_FH *fh, bool lock) fl.l_type = lock ? F_WRLCK : F_UNLCK; fl.l_whence = SEEK_SET; - WT_SYSCALL_RETRY(fcntl(fh->fd, F_SETLK, &fl), ret); + WT_SYSCALL_RETRY(fcntl(pfh->fd, F_SETLK, &fl), ret); if (ret == 0) return (0); - WT_RET_MSG(session, ret, "%s: handle-lock: fcntl", fh->name); + WT_RET_MSG(session, ret, "%s: handle-lock: fcntl", file_handle->name); } /* @@ -345,16 +288,21 @@ __posix_file_lock(WT_SESSION_IMPL *session, WT_FH *fh, bool lock) * POSIX pread. */ static int -__posix_file_read( - WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t offset, size_t len, void *buf) +__posix_file_read(WT_FILE_HANDLE *file_handle, + WT_SESSION *wt_session, wt_off_t offset, size_t len, void *buf) { + WT_FILE_HANDLE_POSIX *pfh; + WT_SESSION_IMPL *session; size_t chunk; ssize_t nr; uint8_t *addr; + session = (WT_SESSION_IMPL *)wt_session; + pfh = (WT_FILE_HANDLE_POSIX *)file_handle; + /* Assert direct I/O is aligned and a multiple of the alignment. */ WT_ASSERT(session, - !fh->direct_io || + !pfh->direct_io || S2C(session)->buffer_alignment == 0 || (!((uintptr_t)buf & (uintptr_t)(S2C(session)->buffer_alignment - 1)) && @@ -364,11 +312,11 @@ __posix_file_read( /* Break reads larger than 1GB into 1GB chunks. */ for (addr = buf; len > 0; addr += nr, len -= (size_t)nr, offset += nr) { chunk = WT_MIN(len, WT_GIGABYTE); - if ((nr = pread(fh->fd, addr, chunk, offset)) <= 0) + if ((nr = pread(pfh->fd, addr, chunk, offset)) <= 0) WT_RET_MSG(session, nr == 0 ? 
WT_ERROR : __wt_errno(), "%s: handle-read: pread: failed to read %" WT_SIZET_FMT " bytes at offset %" PRIuMAX, - fh->name, chunk, (uintmax_t)offset); + file_handle->name, chunk, (uintmax_t)offset); } return (0); } @@ -378,17 +326,23 @@ __posix_file_read( * Get the size of a file in bytes, by file handle. */ static int -__posix_file_size(WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t *sizep) +__posix_file_size( + WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session, wt_off_t *sizep) { struct stat sb; WT_DECL_RET; + WT_FILE_HANDLE_POSIX *pfh; + WT_SESSION_IMPL *session; + + session = (WT_SESSION_IMPL *)wt_session; + pfh = (WT_FILE_HANDLE_POSIX *)file_handle; - WT_SYSCALL_RETRY(fstat(fh->fd, &sb), ret); + WT_SYSCALL_RETRY(fstat(pfh->fd, &sb), ret); if (ret == 0) { *sizep = sb.st_size; return (0); } - WT_RET_MSG(session, ret, "%s: handle-size: fstat", fh->name); + WT_RET_MSG(session, ret, "%s: handle-size: fstat", file_handle->name); } /* @@ -396,24 +350,62 @@ __posix_file_size(WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t *sizep) * POSIX fsync. */ static int -__posix_file_sync(WT_SESSION_IMPL *session, WT_FH *fh, bool block) +__posix_file_sync(WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session) { - return (__posix_sync(session, fh->fd, fh->name, "handle-sync", block)); + WT_FILE_HANDLE_POSIX *pfh; + WT_SESSION_IMPL *session; + + session = (WT_SESSION_IMPL *)wt_session; + pfh = (WT_FILE_HANDLE_POSIX *)file_handle; + + return ( + __posix_sync(session, pfh->fd, file_handle->name, "handle-sync")); } +#ifdef HAVE_SYNC_FILE_RANGE +/* + * __posix_file_sync_nowait -- + * POSIX fsync. 
+ */ +static int +__posix_file_sync_nowait(WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session) +{ + WT_DECL_RET; + WT_FILE_HANDLE_POSIX *pfh; + WT_SESSION_IMPL *session; + + session = (WT_SESSION_IMPL *)wt_session; + pfh = (WT_FILE_HANDLE_POSIX *)file_handle; + + WT_SYSCALL_RETRY(sync_file_range(pfh->fd, + (off64_t)0, (off64_t)0, SYNC_FILE_RANGE_WRITE), ret); + if (ret == 0) + return (0); + WT_RET_MSG(session, ret, + "%s: handle-sync-nowait: sync_file_range", file_handle->name); +} +#endif + /* * __posix_file_truncate -- * POSIX ftruncate. */ static int -__posix_file_truncate(WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t len) +__posix_file_truncate( + WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session, wt_off_t len) { WT_DECL_RET; + WT_FILE_HANDLE_POSIX *pfh; + WT_SESSION_IMPL *session; + + session = (WT_SESSION_IMPL *)wt_session; + pfh = (WT_FILE_HANDLE_POSIX *)file_handle; - WT_SYSCALL_RETRY(ftruncate(fh->fd, len), ret); + WT_SYSCALL_RETRY(ftruncate(pfh->fd, len), ret); if (ret == 0) return (0); - WT_RET_MSG(session, ret, "%s: handle-truncate: ftruncate", fh->name); + WT_RET_MSG(session, ret, + "%s: handle-truncate: ftruncate", file_handle->name); } /* @@ -421,16 +413,21 @@ __posix_file_truncate(WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t len) * POSIX pwrite. */ static int -__posix_file_write(WT_SESSION_IMPL *session, - WT_FH *fh, wt_off_t offset, size_t len, const void *buf) +__posix_file_write(WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session, + wt_off_t offset, size_t len, const void *buf) { + WT_FILE_HANDLE_POSIX *pfh; + WT_SESSION_IMPL *session; size_t chunk; ssize_t nw; const uint8_t *addr; + session = (WT_SESSION_IMPL *)wt_session; + pfh = (WT_FILE_HANDLE_POSIX *)file_handle; + /* Assert direct I/O is aligned and a multiple of the alignment. 
*/ WT_ASSERT(session, - !fh->direct_io || + !pfh->direct_io || S2C(session)->buffer_alignment == 0 || (!((uintptr_t)buf & (uintptr_t)(S2C(session)->buffer_alignment - 1)) && @@ -440,21 +437,21 @@ __posix_file_write(WT_SESSION_IMPL *session, /* Break writes larger than 1GB into 1GB chunks. */ for (addr = buf; len > 0; addr += nw, len -= (size_t)nw, offset += nw) { chunk = WT_MIN(len, WT_GIGABYTE); - if ((nw = pwrite(fh->fd, addr, chunk, offset)) < 0) + if ((nw = pwrite(pfh->fd, addr, chunk, offset)) < 0) WT_RET_MSG(session, __wt_errno(), "%s: handle-write: pwrite: failed to write %" WT_SIZET_FMT " bytes at offset %" PRIuMAX, - fh->name, chunk, (uintmax_t)offset); + file_handle->name, chunk, (uintmax_t)offset); } return (0); } /* - * __posix_file_open_cloexec -- + * __posix_open_file_cloexec -- * Prevent child access to file handles. */ static inline int -__posix_file_open_cloexec(WT_SESSION_IMPL *session, int fd, const char *name) +__posix_open_file_cloexec(WT_SESSION_IMPL *session, int fd, const char *name) { #if defined(HAVE_FCNTL) && defined(FD_CLOEXEC) && !defined(O_CLOEXEC) int f; @@ -479,24 +476,35 @@ __posix_file_open_cloexec(WT_SESSION_IMPL *session, int fd, const char *name) } /* - * __posix_file_open -- + * __posix_open_file -- * Open a file handle. */ static int -__posix_file_open(WT_SESSION_IMPL *session, - WT_FH *fh, const char *name, uint32_t file_type, uint32_t flags) +__posix_open_file(WT_FILE_SYSTEM *file_system, WT_SESSION *wt_session, + const char *name, WT_OPEN_FILE_TYPE file_type, uint32_t flags, + WT_FILE_HANDLE **file_handlep) { WT_CONNECTION_IMPL *conn; WT_DECL_RET; + WT_FILE_HANDLE *file_handle; + WT_FILE_HANDLE_POSIX *pfh; + WT_SESSION_IMPL *session; mode_t mode; - int f, fd, tret; + int f; + WT_UNUSED(file_system); + + *file_handlep = NULL; + + session = (WT_SESSION_IMPL *)wt_session; conn = S2C(session); + WT_RET(__wt_calloc_one(session, &pfh)); + /* Set up error handling. 
*/ - fh->fd = fd = -1; + pfh->fd = -1; - if (file_type == WT_FILE_TYPE_DIRECTORY) { + if (file_type == WT_OPEN_FILE_TYPE_DIRECTORY) { f = O_RDONLY; #ifdef O_CLOEXEC /* @@ -507,10 +515,10 @@ __posix_file_open(WT_SESSION_IMPL *session, f |= O_CLOEXEC; #endif WT_SYSCALL_RETRY(( - (fd = open(name, f, 0444)) == -1 ? 1 : 0), ret); + (pfh->fd = open(name, f, 0444)) == -1 ? 1 : 0), ret); if (ret != 0) WT_ERR_MSG(session, ret, "%s: handle-open: open", name); - WT_ERR(__posix_file_open_cloexec(session, fd, name)); + WT_ERR(__posix_open_file_cloexec(session, pfh->fd, name)); goto directory_open; } @@ -539,16 +547,17 @@ __posix_file_open(WT_SESSION_IMPL *session, /* Direct I/O. */ if (LF_ISSET(WT_OPEN_DIRECTIO)) { f |= O_DIRECT; - fh->direct_io = true; - } + pfh->direct_io = true; + } else + pfh->direct_io = false; #endif #ifdef O_NOATIME /* Avoid updating metadata for read-only workloads. */ - if (file_type == WT_FILE_TYPE_DATA) + if (file_type == WT_OPEN_FILE_TYPE_DATA) f |= O_NOATIME; #endif - if (file_type == WT_FILE_TYPE_LOG && + if (file_type == WT_OPEN_FILE_TYPE_LOG && FLD_ISSET(conn->txn_logsync, WT_LOG_DSYNC)) { #ifdef O_DSYNC f |= O_DSYNC; @@ -560,20 +569,24 @@ __posix_file_open(WT_SESSION_IMPL *session, #endif } - WT_SYSCALL_RETRY(((fd = open(name, f, mode)) == -1 ? 1 : 0), ret); + WT_SYSCALL_RETRY(((pfh->fd = open(name, f, mode)) == -1 ? 1 : 0), ret); if (ret != 0) WT_ERR_MSG(session, ret, - fh->direct_io ? + pfh->direct_io ? "%s: handle-open: open: failed with direct I/O configured, " "some filesystem types do not support direct I/O" : "%s: handle-open: open", name); - WT_ERR(__posix_file_open_cloexec(session, fd, name)); + WT_ERR(__posix_open_file_cloexec(session, pfh->fd, name)); - /* Disable read-ahead on trees: it slows down random read workloads. */ #if defined(HAVE_POSIX_FADVISE) - if (file_type == WT_FILE_TYPE_DATA) { + /* + * Disable read-ahead on trees: it slows down random read workloads. 
+ * Ignore fadvise when doing direct I/O, the kernel cache isn't + * interesting. + */ + if (!pfh->direct_io && file_type == WT_OPEN_FILE_TYPE_DATA) { WT_SYSCALL_RETRY( - posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM), ret); + posix_fadvise(pfh->fd, 0, 0, POSIX_FADV_RANDOM), ret); if (ret != 0) WT_ERR_MSG(session, ret, "%s: handle-open: posix_fadvise", name); @@ -581,66 +594,99 @@ __posix_file_open(WT_SESSION_IMPL *session, #endif directory_open: - fh->fd = fd; - - /* Configure fallocate calls. */ - __wt_posix_file_allocate_configure(session, fh); - - fh->fh_advise = __posix_file_advise; - fh->fh_allocate = __wt_posix_file_allocate; - fh->fh_close = __posix_file_close; - fh->fh_lock = __posix_file_lock; - fh->fh_map = __wt_posix_map; - fh->fh_map_discard = __wt_posix_map_discard; - fh->fh_map_preload = __wt_posix_map_preload; - fh->fh_map_unmap = __wt_posix_map_unmap; - fh->fh_read = __posix_file_read; - fh->fh_size = __posix_file_size; - fh->fh_sync = __posix_file_sync; - fh->fh_truncate = __posix_file_truncate; - fh->fh_write = __posix_file_write; + /* Initialize public information. */ + file_handle = (WT_FILE_HANDLE *)pfh; + WT_ERR(__wt_strdup(session, name, &file_handle->name)); + + file_handle->close = __posix_file_close; +#if defined(HAVE_POSIX_FADVISE) + /* + * Ignore fadvise when doing direct I/O, the kernel cache isn't + * interesting. + */ + if (!pfh->direct_io) + file_handle->fadvise = __posix_file_advise; +#endif + file_handle->fallocate = __wt_posix_file_fallocate; + file_handle->lock = __posix_file_lock; +#ifdef WORDS_BIGENDIAN + /* + * The underlying objects are little-endian, mapping objects isn't + * currently supported on big-endian systems. 
+ */ +#else + file_handle->map = __wt_posix_map; +#ifdef HAVE_POSIX_MADVISE + file_handle->map_discard = __wt_posix_map_discard; + file_handle->map_preload = __wt_posix_map_preload; +#endif + file_handle->unmap = __wt_posix_unmap; +#endif + file_handle->read = __posix_file_read; + file_handle->size = __posix_file_size; + file_handle->sync = __posix_file_sync; +#ifdef HAVE_SYNC_FILE_RANGE + file_handle->sync_nowait = __posix_file_sync_nowait; +#endif + file_handle->truncate = __posix_file_truncate; + file_handle->write = __posix_file_write; + + *file_handlep = file_handle; return (0); -err: if (fd != -1) { - WT_SYSCALL_RETRY(close(fd), tret); - if (tret != 0) - __wt_err(session, tret, "%s: handle-open: close", name); - } +err: WT_TRET(__posix_file_close((WT_FILE_HANDLE *)pfh, wt_session)); return (ret); } /* - * __wt_os_posix -- - * Initialize a POSIX configuration. + * __posix_terminate -- + * Terminate a POSIX configuration. */ -int -__wt_os_posix(WT_SESSION_IMPL *session) +static int +__posix_terminate(WT_FILE_SYSTEM *file_system, WT_SESSION *wt_session) { - WT_CONNECTION_IMPL *conn; + WT_SESSION_IMPL *session; - conn = S2C(session); + WT_UNUSED(file_system); - /* Initialize the POSIX jump table. */ - conn->file_directory_list = __wt_posix_directory_list; - conn->file_directory_sync = __posix_directory_sync; - conn->file_exist = __posix_fs_exist; - conn->file_open = __posix_file_open; - conn->file_remove = __posix_fs_remove; - conn->file_rename = __posix_fs_rename; - conn->file_size = __posix_fs_size; + session = (WT_SESSION_IMPL *)wt_session; + __wt_free(session, file_system); return (0); } /* - * __wt_os_posix_cleanup -- - * Discard a POSIX configuration. + * __wt_os_posix -- + * Initialize a POSIX configuration. 
*/ int -__wt_os_posix_cleanup(WT_SESSION_IMPL *session) +__wt_os_posix(WT_SESSION_IMPL *session) { - WT_UNUSED(session); + WT_CONNECTION_IMPL *conn; + WT_FILE_SYSTEM *file_system; + + conn = S2C(session); + + WT_RET(__wt_calloc_one(session, &file_system)); + + /* Initialize the POSIX jump table. */ + file_system->directory_list = __wt_posix_directory_list; + file_system->directory_list_free = __wt_posix_directory_list_free; +#ifdef __linux__ + file_system->directory_sync = __posix_directory_sync; +#else + file_system->directory_sync = NULL; +#endif + file_system->exist = __posix_fs_exist; + file_system->open_file = __posix_open_file; + file_system->remove = __posix_fs_remove; + file_system->rename = __posix_fs_rename; + file_system->size = __posix_fs_size; + file_system->terminate = __posix_terminate; + + /* Switch it into place. */ + conn->file_system = file_system; return (0); } diff --git a/src/os_posix/os_map.c b/src/os_posix/os_map.c index de28891ffd1..7fde4037250 100644 --- a/src/os_posix/os_map.c +++ b/src/os_posix/os_map.c @@ -13,23 +13,26 @@ * Map a file into memory. */ int -__wt_posix_map(WT_SESSION_IMPL *session, - WT_FH *fh, void *mapp, size_t *lenp, void **mappingcookie) +__wt_posix_map(WT_FILE_HANDLE *fh, WT_SESSION *wt_session, + void *mapped_regionp, size_t *lenp, void *mapped_cookiep) { + WT_FILE_HANDLE_POSIX *pfh; + WT_SESSION_IMPL *session; size_t len; wt_off_t file_size; void *map; - WT_UNUSED(mappingcookie); + WT_UNUSED(mapped_cookiep); - WT_ASSERT(session, !F_ISSET(S2C(session), WT_CONN_IN_MEMORY)); + session = (WT_SESSION_IMPL *)wt_session; + pfh = (WT_FILE_HANDLE_POSIX *)fh; /* * Mapping isn't possible if direct I/O configured for the file, the * Linux open(2) documentation says applications should avoid mixing * mmap(2) of files with direct I/O to the same files. 
*/ - if (fh->direct_io) + if (pfh->direct_io) return (ENOTSUP); /* @@ -37,7 +40,7 @@ __wt_posix_map(WT_SESSION_IMPL *session, * underneath us, our caller needs to ensure consistency of the mapped * region vs. any other file activity. */ - WT_RET(__wt_filesize(session, fh, &file_size)); + WT_RET(fh->size(fh, wt_session, &file_size)); len = (size_t)file_size; (void)__wt_verbose(session, WT_VERB_HANDLEOPS, @@ -49,43 +52,48 @@ __wt_posix_map(WT_SESSION_IMPL *session, MAP_NOCORE | #endif MAP_PRIVATE, - fh->fd, (wt_off_t)0)) == MAP_FAILED) + pfh->fd, (wt_off_t)0)) == MAP_FAILED) WT_RET_MSG(session, __wt_errno(), "%s: memory-map: mmap", fh->name); - *(void **)mapp = map; + *(void **)mapped_regionp = map; *lenp = len; return (0); } #ifdef HAVE_POSIX_MADVISE /* - * __posix_map_preload_madvise -- + * __wt_posix_map_preload -- * Cause a section of a memory map to be faulted in. */ -static int -__posix_map_preload_madvise( - WT_SESSION_IMPL *session, WT_FH *fh, const void *p, size_t size) +int +__wt_posix_map_preload(WT_FILE_HANDLE *fh, + WT_SESSION *wt_session, const void *map, size_t length, void *mapped_cookie) { WT_BM *bm; WT_CONNECTION_IMPL *conn; WT_DECL_RET; + WT_SESSION_IMPL *session; void *blk; + WT_UNUSED(mapped_cookie); + + session = (WT_SESSION_IMPL *)wt_session; + conn = S2C(session); bm = S2BT(session)->bm; /* Linux requires the address be aligned to a 4KB boundary. */ - blk = (void *)((uintptr_t)p & ~(uintptr_t)(conn->page_size - 1)); - size += WT_PTRDIFF(p, blk); + blk = (void *)((uintptr_t)map & ~(uintptr_t)(conn->page_size - 1)); + length += WT_PTRDIFF(map, blk); /* XXX proxy for "am I doing a scan?" -- manual read-ahead */ if (F_ISSET(session, WT_SESSION_NO_CACHE)) { /* Read in 2MB blocks every 1MB of data. 
*/ - if (((uintptr_t)((uint8_t *)blk + size) & + if (((uintptr_t)((uint8_t *)blk + length) & (uintptr_t)((1<<20) - 1)) < (uintptr_t)blk) return (0); - size = WT_MIN(WT_MAX(20 * size, 2 << 20), + length = WT_MIN(WT_MAX(20 * length, 2 << 20), WT_PTRDIFF((uint8_t *)bm->map + bm->maplen, blk)); } @@ -93,10 +101,10 @@ __posix_map_preload_madvise( * Manual pages aren't clear on whether alignment is required for the * size, so we will be conservative. */ - size &= ~(size_t)(conn->page_size - 1); + length &= ~(size_t)(conn->page_size - 1); - if (size <= (size_t)conn->page_size || - (ret = posix_madvise(blk, size, POSIX_MADV_WILLNEED)) == 0) + if (length <= (size_t)conn->page_size || + (ret = posix_madvise(blk, length, POSIX_MADV_WILLNEED)) == 0) return (0); WT_RET_MSG(session, ret, "%s: memory-map preload: posix_madvise: POSIX_MADV_WILLNEED", @@ -104,46 +112,30 @@ __posix_map_preload_madvise( } #endif -/* - * __wt_posix_map_preload -- - * Cause a section of a memory map to be faulted in. - */ -int -__wt_posix_map_preload( - WT_SESSION_IMPL *session, WT_FH *fh, const void *p, size_t size) -{ - WT_ASSERT(session, !F_ISSET(S2C(session), WT_CONN_IN_MEMORY)); - -#ifdef HAVE_POSIX_MADVISE - return (__posix_map_preload_madvise(session, fh, p, size)); -#else - WT_UNUSED(fh); - WT_UNUSED(p); - WT_UNUSED(size); - return (ENOTSUP); -#endif -} - #ifdef HAVE_POSIX_MADVISE /* - * __posix_map_discard_madvise -- + * __wt_posix_map_discard -- * Discard a chunk of the memory map. */ -static int -__posix_map_discard_madvise( - WT_SESSION_IMPL *session, WT_FH *fh, void *p, size_t size) +int +__wt_posix_map_discard(WT_FILE_HANDLE *fh, + WT_SESSION *wt_session, void *map, size_t length, void *mapped_cookie) { WT_CONNECTION_IMPL *conn; WT_DECL_RET; + WT_SESSION_IMPL *session; void *blk; + WT_UNUSED(mapped_cookie); + + session = (WT_SESSION_IMPL *)wt_session; conn = S2C(session); /* Linux requires the address be aligned to a 4KB boundary. 
*/ - blk = (void *)((uintptr_t)p & ~(uintptr_t)(conn->page_size - 1)); - size += WT_PTRDIFF(p, blk); + blk = (void *)((uintptr_t)map & ~(uintptr_t)(conn->page_size - 1)); + length += WT_PTRDIFF(map, blk); - if ((ret = posix_madvise(blk, size, POSIX_MADV_DONTNEED)) == 0) + if ((ret = posix_madvise(blk, length, POSIX_MADV_DONTNEED)) == 0) return (0); WT_RET_MSG(session, ret, "%s: memory-map discard: posix_madvise: POSIX_MADV_DONTNEED", @@ -152,41 +144,23 @@ __posix_map_discard_madvise( #endif /* - * __wt_posix_map_discard -- - * Discard a chunk of the memory map. - */ -int -__wt_posix_map_discard( - WT_SESSION_IMPL *session, WT_FH *fh, void *p, size_t size) -{ - WT_ASSERT(session, !F_ISSET(S2C(session), WT_CONN_IN_MEMORY)); - -#ifdef HAVE_POSIX_MADVISE - return (__posix_map_discard_madvise(session, fh, p, size)); -#else - WT_UNUSED(fh); - WT_UNUSED(p); - WT_UNUSED(size); - return (ENOTSUP); -#endif -} - -/* - * __wt_posix_map_unmap -- + * __wt_posix_unmap -- * Remove a memory mapping. */ int -__wt_posix_map_unmap(WT_SESSION_IMPL *session, - WT_FH *fh, void *map, size_t len, void **mappingcookie) +__wt_posix_unmap(WT_FILE_HANDLE *fh, WT_SESSION *wt_session, + void *mapped_region, size_t len, void *mapped_cookie) { - WT_UNUSED(mappingcookie); + WT_SESSION_IMPL *session; + + WT_UNUSED(mapped_cookie); - WT_ASSERT(session, !F_ISSET(S2C(session), WT_CONN_IN_MEMORY)); + session = (WT_SESSION_IMPL *)wt_session; (void)__wt_verbose(session, WT_VERB_HANDLEOPS, "%s: memory-unmap: %" WT_SIZET_FMT " bytes", fh->name, len); - if (munmap(map, len) == 0) + if (munmap(mapped_region, len) == 0) return (0); WT_RET_MSG(session, __wt_errno(), "%s: memory-unmap: munmap", fh->name); diff --git a/src/os_win/os_dir.c b/src/os_win/os_dir.c index 64eae60983c..6f796f6ef7d 100644 --- a/src/os_win/os_dir.c +++ b/src/os_win/os_dir.c @@ -13,34 +13,37 @@ * Get a list of files from a directory, MSVC version. 
*/ int -__wt_win_directory_list(WT_SESSION_IMPL *session, const char *dir, - const char *prefix, uint32_t flags, char ***dirlist, u_int *countp) +__wt_win_directory_list(WT_FILE_SYSTEM *file_system, + WT_SESSION *wt_session, const char *directory, + const char *prefix, char ***dirlistp, uint32_t *countp) { HANDLE findhandle; WIN32_FIND_DATA finddata; WT_DECL_ITEM(pathbuf); WT_DECL_RET; + WT_SESSION_IMPL *session; size_t dirallocsz, pathlen; - u_int count, dirsz; - bool match; - char **entries, *path; + uint32_t count; + char *dir_copy, **entries; - *dirlist = NULL; - *countp = 0; + WT_UNUSED(file_system); - WT_RET(__wt_filename(session, dir, &path)); + session = (WT_SESSION_IMPL *)wt_session; - pathlen = strlen(path); - if (path[pathlen - 1] == '\\') - path[pathlen - 1] = '\0'; - WT_ERR(__wt_scr_alloc(session, pathlen + 3, &pathbuf)); - WT_ERR(__wt_buf_fmt(session, pathbuf, "%s\\*", path)); + *dirlistp = NULL; + *countp = 0; findhandle = INVALID_HANDLE_VALUE; dirallocsz = 0; - dirsz = 0; entries = NULL; + WT_ERR(__wt_strdup(session, directory, &dir_copy)); + pathlen = strlen(dir_copy); + if (dir_copy[pathlen - 1] == '\\') + dir_copy[pathlen - 1] = '\0'; + WT_ERR(__wt_scr_alloc(session, pathlen + 3, &pathbuf)); + WT_ERR(__wt_buf_fmt(session, pathbuf, "%s\\*", dir_copy)); + findhandle = FindFirstFileA(pathbuf->data, &finddata); if (findhandle == INVALID_HANDLE_VALUE) WT_ERR_MSG(session, __wt_getlasterror(), @@ -56,46 +59,54 @@ __wt_win_directory_list(WT_SESSION_IMPL *session, const char *dir, continue; /* The list of files is optionally filtered by a prefix. */ - match = false; if (prefix != NULL && - ((LF_ISSET(WT_DIRLIST_INCLUDE) && - WT_PREFIX_MATCH(finddata.cFileName, prefix)) || - (LF_ISSET(WT_DIRLIST_EXCLUDE) && - !WT_PREFIX_MATCH(finddata.cFileName, prefix)))) - match = true; - if (prefix == NULL || match) { - /* - * We have a file name we want to return. 
- */ - count++; - if (count > dirsz) { - dirsz += WT_DIR_ENTRY; - WT_ERR(__wt_realloc_def(session, - &dirallocsz, dirsz, &entries)); - } - WT_ERR(__wt_strdup(session, - finddata.cFileName, &entries[count - 1])); - } + !WT_PREFIX_MATCH(finddata.cFileName, prefix)) + continue; + + WT_ERR(__wt_realloc_def( + session, &dirallocsz, count + 1, &entries)); + WT_ERR(__wt_strdup( + session, finddata.cFileName, &entries[count])); + ++count; } while (FindNextFileA(findhandle, &finddata) != 0); - if (count > 0) - *dirlist = entries; + + *dirlistp = entries; *countp = count; err: if (findhandle != INVALID_HANDLE_VALUE) (void)FindClose(findhandle); - __wt_free(session, path); + __wt_free(session, dir_copy); __wt_scr_free(session, &pathbuf); if (ret == 0) return (0); - if (*dirlist != NULL) { - for (count = dirsz; count > 0; count--) - __wt_free(session, entries[count]); - __wt_free(session, entries); - } + WT_TRET(__wt_win_directory_list_free( + file_system, wt_session, entries, count)); WT_RET_MSG(session, ret, "%s: directory-list, prefix \"%s\"", - dir, prefix == NULL ? "" : prefix); + directory, prefix == NULL ? "" : prefix); +} + +/* + * __wt_win_directory_list_free -- + * Free memory returned by __wt_win_directory_list, Windows version. + */ +int +__wt_win_directory_list_free(WT_FILE_SYSTEM *file_system, + WT_SESSION *wt_session, char **dirlist, uint32_t count) +{ + WT_SESSION_IMPL *session; + + WT_UNUSED(file_system); + + session = (WT_SESSION_IMPL *)wt_session; + + if (dirlist != NULL) { + while (count > 0) + __wt_free(session, dirlist[--count]); + __wt_free(session, dirlist); + } + return (0); } diff --git a/src/os_win/os_dlopen.c b/src/os_win/os_dlopen.c index ce949e4ea5f..9289c8f6488 100644 --- a/src/os_win/os_dlopen.c +++ b/src/os_win/os_dlopen.c @@ -20,6 +20,7 @@ __wt_dlopen(WT_SESSION_IMPL *session, const char *path, WT_DLH **dlhp) WT_RET(__wt_calloc_one(session, &dlh)); WT_ERR(__wt_strdup(session, path, &dlh->name)); + WT_ERR(__wt_strdup(session, path == NULL ? 
"local" : path, &dlh->name)); /* NULL means load from the current binary */ if (path == NULL) { diff --git a/src/os_win/os_fs.c b/src/os_win/os_fs.c index afe3a074374..318ff723829 100644 --- a/src/os_win/os_fs.c +++ b/src/os_win/os_fs.c @@ -9,34 +9,21 @@ #include "wt_internal.h" /* - * __win_directory_sync -- - * Flush a directory to ensure a file creation is durable. - */ -static int -__win_directory_sync(WT_SESSION_IMPL *session, const char *path) -{ - WT_UNUSED(session); - WT_UNUSED(path); - return (0); -} - -/* - * __win_file_exist -- + * __win_fs_exist -- * Return if the file exists. */ static int -__win_file_exist(WT_SESSION_IMPL *session, const char *name, bool *existp) +__win_fs_exist(WT_FILE_SYSTEM *file_system, + WT_SESSION *wt_session, const char *name, bool *existp) { WT_DECL_RET; - char *path; - - WT_RET(__wt_filename(session, name, &path)); + WT_SESSION_IMPL *session; - ret = GetFileAttributesA(path); + WT_UNUSED(file_system); - __wt_free(session, path); + session = (WT_SESSION_IMPL *)wt_session; - if (ret != INVALID_FILE_ATTRIBUTES) + if (GetFileAttributesA(name) != INVALID_FILE_ATTRIBUTES) *existp = true; else *existp = false; @@ -45,142 +32,96 @@ __win_file_exist(WT_SESSION_IMPL *session, const char *name, bool *existp) } /* - * __win_file_remove -- + * __win_fs_remove -- * Remove a file. 
*/ static int -__win_file_remove(WT_SESSION_IMPL *session, const char *name) +__win_fs_remove( + WT_FILE_SYSTEM *file_system, WT_SESSION *wt_session, const char *name) { WT_DECL_RET; - char *path; + WT_SESSION_IMPL *session; -#ifdef HAVE_DIAGNOSTIC - if (__wt_handle_search(session, name, false, NULL, NULL)) - WT_RET_MSG(session, EINVAL, - "%s: file-remove: file has open handles", name); -#endif + WT_UNUSED(file_system); - WT_RET(__wt_filename(session, name, &path)); - name = path; + session = (WT_SESSION_IMPL *)wt_session; - if (DeleteFileA(name) == FALSE) { - ret = __wt_getlasterror(); - __wt_err(session, ret, "%s: file-remove: DeleteFileA", name); - } + if (DeleteFileA(name) == FALSE) + WT_RET_MSG(session, __wt_getlasterror(), + "%s: file-remove: DeleteFileA", name); - __wt_free(session, path); - return (ret); + return (0); } /* - * __win_file_rename -- + * __win_fs_rename -- * Rename a file. */ static int -__win_file_rename(WT_SESSION_IMPL *session, const char *from, const char *to) +__win_fs_rename(WT_FILE_SYSTEM *file_system, + WT_SESSION *wt_session, const char *from, const char *to) { WT_DECL_RET; - char *from_path, *to_path; + WT_SESSION_IMPL *session; -#ifdef HAVE_DIAGNOSTIC - if (__wt_handle_search(session, from, false, NULL, NULL)) - WT_RET_MSG(session, EINVAL, - "%s: file-rename: file has open handles", from); - if (__wt_handle_search(session, to, false, NULL, NULL)) - WT_RET_MSG(session, EINVAL, - "%s: file-rename: file has open handles", to); -#endif + WT_UNUSED(file_system); - from_path = to_path = NULL; - WT_ERR(__wt_filename(session, from, &from_path)); - from = from_path; - WT_ERR(__wt_filename(session, to, &to_path)); - to = to_path; + session = (WT_SESSION_IMPL *)wt_session; /* * Check if file exists since Windows does not override the file if * it exists. 
*/ if (GetFileAttributesA(to) != INVALID_FILE_ATTRIBUTES) - if (DeleteFileA(to) == FALSE) { - ret = __wt_getlasterror(); - __wt_err(session, ret, + if (DeleteFileA(to) == FALSE) + WT_RET_MSG(session, __wt_getlasterror(), "%s to %s: file-rename: rename", from, to); - } - if (ret == 0 && MoveFileA(from, to) == FALSE) { - ret = __wt_getlasterror(); - __wt_err(session, ret, + if (MoveFileA(from, to) == FALSE) + WT_RET_MSG(session, __wt_getlasterror(), "%s to %s: file-rename: rename", from, to); - } -err: __wt_free(session, from_path); - __wt_free(session, to_path); - return (ret); + return (0); } /* - * __win_file_size -- + * __wt_win_fs_size -- * Get the size of a file in bytes, by file name. */ -static int -__win_file_size( - WT_SESSION_IMPL *session, const char *name, bool silent, wt_off_t *sizep) +int +__wt_win_fs_size(WT_FILE_SYSTEM *file_system, + WT_SESSION *wt_session, const char *name, wt_off_t *sizep) { WIN32_FILE_ATTRIBUTE_DATA data; - WT_DECL_RET; - char *path; - - WT_RET(__wt_filename(session, name, &path)); + WT_SESSION_IMPL *session; - ret = GetFileAttributesExA(path, GetFileExInfoStandard, &data); + WT_UNUSED(file_system); - __wt_free(session, path); + session = (WT_SESSION_IMPL *)wt_session; - if (ret != 0) { + if (GetFileAttributesExA(name, GetFileExInfoStandard, &data) != 0) { *sizep = ((int64_t)data.nFileSizeHigh << 32) | data.nFileSizeLow; return (0); } - /* - * Some callers of this function expect failure if the file doesn't - * exist, and don't want an error message logged. - */ - ret = __wt_getlasterror(); - if (!silent) - WT_RET_MSG(session, ret, - "%s: file-size: GetFileAttributesEx", name); - return (ret); -} - -/* - * __win_handle_allocate_configure -- - * Configure fallocate behavior for a file handle. - */ -static void -__win_handle_allocate_configure(WT_SESSION_IMPL *session, WT_FH *fh) -{ - WT_UNUSED(session); - - /* - * fallocate on Windows would be implemented using SetEndOfFile, which - * can also truncate the file. 
WiredTiger expects fallocate to ignore - * requests to truncate the file which Windows does not do, so we don't - * support the call. - */ - fh->fallocate_available = WT_FALLOCATE_NOT_AVAILABLE; - fh->fallocate_requires_locking = false; + WT_RET_MSG(session, __wt_getlasterror(), + "%s: file-size: GetFileAttributesEx", name); } /* - * __win_handle_close -- + * __win_file_close -- * ANSI C close. */ static int -__win_handle_close(WT_SESSION_IMPL *session, WT_FH *fh) +__win_file_close(WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session) { WT_DECL_RET; + WT_FILE_HANDLE_WIN *win_fh; + WT_SESSION_IMPL *session; + + win_fh = (WT_FILE_HANDLE_WIN *)file_handle; + session = (WT_SESSION_IMPL *)wt_session; /* * Close the primary and secondary handles. @@ -189,31 +130,40 @@ __win_handle_close(WT_SESSION_IMPL *session, WT_FH *fh) * flushing, as it's not necessary (or possible) to flush a directory * on Windows. Confirm the file handle is open before closing it. */ - if (fh->filehandle != INVALID_HANDLE_VALUE && - CloseHandle(fh->filehandle) == 0) { + if (win_fh->filehandle != INVALID_HANDLE_VALUE && + CloseHandle(win_fh->filehandle) == 0) { ret = __wt_getlasterror(); __wt_err(session, ret, - "%s: handle-close: CloseHandle", fh->name); + "%s: handle-close: CloseHandle", file_handle->name); } - if (fh->filehandle_secondary != INVALID_HANDLE_VALUE && - CloseHandle(fh->filehandle_secondary) == 0) { + if (win_fh->filehandle_secondary != INVALID_HANDLE_VALUE && + CloseHandle(win_fh->filehandle_secondary) == 0) { ret = __wt_getlasterror(); __wt_err(session, ret, - "%s: handle-close: secondary: CloseHandle", fh->name); + "%s: handle-close: secondary: CloseHandle", + file_handle->name); } + __wt_free(session, file_handle->name); + __wt_free(session, win_fh); return (ret); } /* - * __win_handle_lock -- + * __win_file_lock -- * Lock/unlock a file. 
  */
 static int
-__win_handle_lock(WT_SESSION_IMPL *session, WT_FH *fh, bool lock)
+__win_file_lock(
+    WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session, bool lock)
 {
 	WT_DECL_RET;
+	WT_FILE_HANDLE_WIN *win_fh;
+	WT_SESSION_IMPL *session;
+
+	win_fh = (WT_FILE_HANDLE_WIN *)file_handle;
+	session = (WT_SESSION_IMPL *)wt_session;
 
 	/*
 	 * WiredTiger requires this function be able to acquire locks past
@@ -231,37 +181,42 @@ __win_handle_lock(WT_SESSION_IMPL *session, WT_FH *fh, bool lock)
 	 * This is useful to coordinate adding records to the end of a file.
 	 */
 	if (lock) {
-		if (LockFile(fh->filehandle, 0, 0, 1, 0) == FALSE) {
+		if (LockFile(win_fh->filehandle, 0, 0, 1, 0) == FALSE) {
 			ret = __wt_getlasterror();
 			__wt_err(session, ret,
-			    "%s: handle-lock: LockFile", fh->name);
+			    "%s: handle-lock: LockFile", file_handle->name);
 		}
 	} else
-		if (UnlockFile(fh->filehandle, 0, 0, 1, 0) == FALSE) {
+		if (UnlockFile(win_fh->filehandle, 0, 0, 1, 0) == FALSE) {
 			ret = __wt_getlasterror();
 			__wt_err(session, ret,
-			    "%s: handle-lock: UnlockFile", fh->name);
+			    "%s: handle-lock: UnlockFile", file_handle->name);
 		}
 	return (ret);
 }
 
 /*
- * __win_handle_read --
+ * __win_file_read --
  *	Read a chunk.
  */
 static int
-__win_handle_read(
-    WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t offset, size_t len, void *buf)
+__win_file_read(WT_FILE_HANDLE *file_handle,
+    WT_SESSION *wt_session, wt_off_t offset, size_t len, void *buf)
 {
 	DWORD chunk, nr;
 	uint8_t *addr;
 	OVERLAPPED overlapped = { 0 };
+	WT_FILE_HANDLE_WIN *win_fh;
+	WT_SESSION_IMPL *session;
+
+	win_fh = (WT_FILE_HANDLE_WIN *)file_handle;
+	session = (WT_SESSION_IMPL *)wt_session;
 
 	nr = 0;
 
 	/* Assert direct I/O is aligned and a multiple of the alignment.
*/ WT_ASSERT(session, - !fh->direct_io || + !win_fh->direct_io || S2C(session)->buffer_alignment == 0 || (!((uintptr_t)buf & (uintptr_t)(S2C(session)->buffer_alignment - 1)) && @@ -274,42 +229,54 @@ __win_handle_read( overlapped.Offset = UINT32_MAX & offset; overlapped.OffsetHigh = UINT32_MAX & (offset >> 32); - if (!ReadFile(fh->filehandle, addr, chunk, &nr, &overlapped)) + if (!ReadFile( + win_fh->filehandle, addr, chunk, &nr, &overlapped)) WT_RET_MSG(session, __wt_getlasterror(), "%s: handle-read: ReadFile: failed to read %lu " "bytes at offset %" PRIuMAX, - fh->name, chunk, (uintmax_t)offset); + file_handle->name, chunk, (uintmax_t)offset); } return (0); } /* - * __win_handle_size -- + * __win_file_size -- * Get the size of a file in bytes, by file handle. */ static int -__win_handle_size(WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t *sizep) +__win_file_size( + WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session, wt_off_t *sizep) { + WT_FILE_HANDLE_WIN *win_fh; + WT_SESSION_IMPL *session; LARGE_INTEGER size; - if (GetFileSizeEx(fh->filehandle, &size) != 0) { + win_fh = (WT_FILE_HANDLE_WIN *)file_handle; + session = (WT_SESSION_IMPL *)wt_session; + + if (GetFileSizeEx(win_fh->filehandle, &size) != 0) { *sizep = size.QuadPart; return (0); } - WT_RET_MSG(session, - __wt_getlasterror(), "%s: handle-size: GetFileSizeEx", fh->name); + WT_RET_MSG(session, __wt_getlasterror(), + "%s: handle-size: GetFileSizeEx", file_handle->name); } /* - * __win_handle_sync -- + * __win_file_sync -- * MSVC fsync. 
  */
 static int
-__win_handle_sync(WT_SESSION_IMPL *session, WT_FH *fh, bool block)
+__win_file_sync(WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session)
 {
 	WT_DECL_RET;
+	WT_FILE_HANDLE_WIN *win_fh;
+	WT_SESSION_IMPL *session;
+
+	win_fh = (WT_FILE_HANDLE_WIN *)file_handle;
+	session = (WT_SESSION_IMPL *)wt_session;
 
 	/*
 	 * We don't open Windows system handles when opening directories
@@ -317,72 +284,79 @@ __win_handle_sync(WT_SESSION_IMPL *session, WT_FH *fh, bool block)
 	 * a directory on Windows. Confirm the file handle is set before
 	 * attempting to sync it.
 	 */
-	if (fh->filehandle == INVALID_HANDLE_VALUE)
+	if (win_fh->filehandle == INVALID_HANDLE_VALUE)
 		return (0);
 
-	/*
-	 * Callers attempting asynchronous flush handle ENOTSUP returns,
-	 * and won't make further attempts.
-	 */
-	if (!block)
-		return (ENOTSUP);
-
-	if (FlushFileBuffers(fh->filehandle) == FALSE) {
+	if (FlushFileBuffers(win_fh->filehandle) == FALSE) {
 		ret = __wt_getlasterror();
 		WT_RET_MSG(session, ret,
-		    "%s handle-sync: FlushFileBuffers error", fh->name);
+		    "%s handle-sync: FlushFileBuffers error",
+		    file_handle->name);
 	}
 	return (0);
 }
 
 /*
- * __win_handle_truncate --
+ * __win_file_truncate --
  *	Truncate a file.
  */
 static int
-__win_handle_truncate(WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t len)
+__win_file_truncate(
+    WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session, wt_off_t len)
 {
 	WT_DECL_RET;
+	WT_FILE_HANDLE_WIN *win_fh;
+	WT_SESSION_IMPL *session;
 	LARGE_INTEGER largeint;
 
+	win_fh = (WT_FILE_HANDLE_WIN *)file_handle;
+	session = (WT_SESSION_IMPL *)wt_session;
+
 	largeint.QuadPart = len;
 
-	if (fh->filehandle_secondary == INVALID_HANDLE_VALUE)
+	if (win_fh->filehandle_secondary == INVALID_HANDLE_VALUE)
 		WT_RET_MSG(session, EINVAL,
-		    "%s: handle-truncate: read-only", fh->name);
+		    "%s: handle-truncate: read-only", file_handle->name);
 
 	if (SetFilePointerEx(
-	    fh->filehandle_secondary, largeint, NULL, FILE_BEGIN) == FALSE)
+	    win_fh->filehandle_secondary, largeint, NULL, FILE_BEGIN) == FALSE)
 		WT_RET_MSG(session, __wt_getlasterror(),
-		    "%s: handle-truncate: SetFilePointerEx", fh->name);
+		    "%s: handle-truncate: SetFilePointerEx",
+		    file_handle->name);
 
-	if (SetEndOfFile(fh->filehandle_secondary) == FALSE) {
+	if (SetEndOfFile(win_fh->filehandle_secondary) == FALSE) {
 		if (GetLastError() == ERROR_USER_MAPPED_FILE)
 			return (EBUSY);
 		WT_RET_MSG(session, __wt_getlasterror(),
-		    "%s: handle-truncate: SetEndOfFile error", fh->name);
+		    "%s: handle-truncate: SetEndOfFile error",
+		    file_handle->name);
 	}
 	return (0);
 }
 
 /*
- * __win_handle_write --
+ * __win_file_write --
  *	Write a chunk.
  */
 static int
-__win_handle_write(WT_SESSION_IMPL *session,
-    WT_FH *fh, wt_off_t offset, size_t len, const void *buf)
+__win_file_write(WT_FILE_HANDLE *file_handle,
+    WT_SESSION *wt_session, wt_off_t offset, size_t len, const void *buf)
 {
 	DWORD chunk;
 	DWORD nw;
 	const uint8_t *addr;
 	OVERLAPPED overlapped = { 0 };
+	WT_FILE_HANDLE_WIN *win_fh;
+	WT_SESSION_IMPL *session;
+
+	win_fh = (WT_FILE_HANDLE_WIN *)file_handle;
+	session = (WT_SESSION_IMPL *)wt_session;
 
 	nw = 0;
 
 	/* Assert direct I/O is aligned and a multiple of the alignment.
*/ WT_ASSERT(session, - !fh->direct_io || + !win_fh->direct_io || S2C(session)->buffer_alignment == 0 || (!((uintptr_t)buf & (uintptr_t)(S2C(session)->buffer_alignment - 1)) && @@ -395,36 +369,47 @@ __win_handle_write(WT_SESSION_IMPL *session, overlapped.Offset = UINT32_MAX & offset; overlapped.OffsetHigh = UINT32_MAX & (offset >> 32); - if (!WriteFile(fh->filehandle, addr, chunk, &nw, &overlapped)) + if (!WriteFile( + win_fh->filehandle, addr, chunk, &nw, &overlapped)) WT_RET_MSG(session, __wt_getlasterror(), "%s: handle-write: WriteFile: failed to write %lu " "bytes at offset %" PRIuMAX, - fh->name, chunk, (uintmax_t)offset); + file_handle->name, chunk, (uintmax_t)offset); } return (0); } /* - * __win_file_open -- + * __win_open_file -- * Open a file handle. */ static int -__win_file_open(WT_SESSION_IMPL *session, - WT_FH *fh, const char *name, uint32_t file_type, uint32_t flags) +__win_open_file(WT_FILE_SYSTEM *file_system, WT_SESSION *wt_session, + const char *name, WT_OPEN_FILE_TYPE file_type, uint32_t flags, + WT_FILE_HANDLE **file_handlep) { DWORD dwCreationDisposition; - HANDLE filehandle, filehandle_secondary; WT_CONNECTION_IMPL *conn; WT_DECL_RET; + WT_FILE_HANDLE *file_handle; + WT_FILE_HANDLE_WIN *win_fh; + WT_SESSION_IMPL *session; int desired_access, f; - bool direct_io; + WT_UNUSED(file_system); + + *file_handlep = NULL; + + session = (WT_SESSION_IMPL *)wt_session; conn = S2C(session); - direct_io = false; + + WT_RET(__wt_calloc_one(session, &win_fh)); + + win_fh->direct_io = false; /* Set up error handling. */ - fh->filehandle = fh->filehandle_secondary = - filehandle = filehandle_secondary = INVALID_HANDLE_VALUE; + win_fh->filehandle = + win_fh->filehandle_secondary = INVALID_HANDLE_VALUE; /* * Opening a file handle on a directory is only to support filesystems @@ -432,7 +417,7 @@ __win_file_open(WT_SESSION_IMPL *session, * require that functionality: create an empty WT_FH structure with * invalid handles. 
*/ - if (file_type == WT_FILE_TYPE_DIRECTORY) + if (file_type == WT_OPEN_FILE_TYPE_DIRECTORY) goto directory_open; desired_access = GENERIC_READ; @@ -460,33 +445,33 @@ __win_file_open(WT_SESSION_IMPL *session, /* Direct I/O. */ if (LF_ISSET(WT_OPEN_DIRECTIO)) { f |= FILE_FLAG_NO_BUFFERING; - fh->direct_io = true; + win_fh->direct_io = true; } /* FILE_FLAG_WRITE_THROUGH does not require aligned buffers */ if (FLD_ISSET(conn->write_through, file_type)) f |= FILE_FLAG_WRITE_THROUGH; - if (file_type == WT_FILE_TYPE_LOG && + if (file_type == WT_OPEN_FILE_TYPE_LOG && FLD_ISSET(conn->txn_logsync, WT_LOG_DSYNC)) f |= FILE_FLAG_WRITE_THROUGH; /* Disable read-ahead on trees: it slows down random read workloads. */ - if (file_type == WT_FILE_TYPE_DATA) + if (file_type == WT_OPEN_FILE_TYPE_DATA) f |= FILE_FLAG_RANDOM_ACCESS; - filehandle = CreateFileA(name, desired_access, + win_fh->filehandle = CreateFileA(name, desired_access, FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, dwCreationDisposition, f, NULL); - if (filehandle == INVALID_HANDLE_VALUE) { + if (win_fh->filehandle == INVALID_HANDLE_VALUE) { if (LF_ISSET(WT_OPEN_CREATE) && GetLastError() == ERROR_FILE_EXISTS) - filehandle = CreateFileA(name, desired_access, + win_fh->filehandle = CreateFileA(name, desired_access, FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_EXISTING, f, NULL); - if (filehandle == INVALID_HANDLE_VALUE) + if (win_fh->filehandle == INVALID_HANDLE_VALUE) WT_ERR_MSG(session, __wt_getlasterror(), - direct_io ? + win_fh->direct_io ? "%s: handle-open: CreateFileA: failed with direct " "I/O configured, some filesystem types do not " "support direct I/O" : @@ -499,74 +484,88 @@ __win_file_open(WT_SESSION_IMPL *session, * pointer. 
*/ if (!LF_ISSET(WT_OPEN_READONLY)) { - filehandle_secondary = CreateFileA(name, desired_access, + win_fh->filehandle_secondary = CreateFileA(name, desired_access, FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_EXISTING, f, NULL); - if (filehandle_secondary == INVALID_HANDLE_VALUE) + if (win_fh->filehandle_secondary == INVALID_HANDLE_VALUE) WT_ERR_MSG(session, __wt_getlasterror(), "%s: handle-open: CreateFileA: secondary", name); } - /* Configure fallocate/posix_fallocate calls. */ - __win_handle_allocate_configure(session, fh); - directory_open: - fh->filehandle = filehandle; - fh->filehandle_secondary = filehandle_secondary; - - fh->fh_close = __win_handle_close; - fh->fh_lock = __win_handle_lock; - fh->fh_map = __wt_win_map; - fh->fh_map_discard = __wt_win_map_discard; - fh->fh_map_preload = __wt_win_map_preload; - fh->fh_map_unmap = __wt_win_map_unmap; - fh->fh_read = __win_handle_read; - fh->fh_size = __win_handle_size; - fh->fh_sync = __win_handle_sync; - fh->fh_truncate = __win_handle_truncate; - fh->fh_write = __win_handle_write; + /* Initialize public information. */ + file_handle = (WT_FILE_HANDLE *)win_fh; + WT_ERR(__wt_strdup(session, name, &file_handle->name)); - return (0); + file_handle->close = __win_file_close; + file_handle->lock = __win_file_lock; +#ifdef WORDS_BIGENDIAN + /* + * The underlying objects are little-endian, mapping objects isn't + * currently supported on big-endian systems. 
+ */ +#else + file_handle->map = __wt_win_map; + file_handle->map_discard = NULL; + file_handle->map_preload = NULL; + file_handle->unmap = __wt_win_unmap; +#endif + file_handle->read = __win_file_read; + file_handle->size = __win_file_size; + file_handle->sync = __win_file_sync; + file_handle->truncate = __win_file_truncate; + file_handle->write = __win_file_write; + + *file_handlep = file_handle; -err: if (filehandle != INVALID_HANDLE_VALUE) - (void)CloseHandle(filehandle); - if (filehandle_secondary != INVALID_HANDLE_VALUE) - (void)CloseHandle(filehandle_secondary); + return (0); +err: WT_TRET(__win_file_close((WT_FILE_HANDLE *)win_fh, wt_session)); return (ret); } /* - * __wt_os_win -- - * Initialize a MSVC configuration. + * __win_terminate -- + * Discard a Windows configuration. */ -int -__wt_os_win(WT_SESSION_IMPL *session) +static int +__win_terminate(WT_FILE_SYSTEM *file_system, WT_SESSION *wt_session) { - WT_CONNECTION_IMPL *conn; - - conn = S2C(session); + WT_SESSION_IMPL *session; - /* Initialize the POSIX jump table. */ - conn->file_directory_list = __wt_win_directory_list; - conn->file_directory_sync = __win_directory_sync; - conn->file_exist = __win_file_exist; - conn->file_open = __win_file_open; - conn->file_remove = __win_file_remove; - conn->file_rename = __win_file_rename; - conn->file_size = __win_file_size; + session = (WT_SESSION_IMPL *)wt_session; + __wt_free(session, file_system); return (0); } /* - * __wt_os_win_cleanup -- - * Discard a POSIX configuration. + * __wt_os_win -- + * Initialize a MSVC configuration. */ int -__wt_os_win_cleanup(WT_SESSION_IMPL *session) +__wt_os_win(WT_SESSION_IMPL *session) { - WT_UNUSED(session); + WT_CONNECTION_IMPL *conn; + WT_FILE_SYSTEM *file_system; + + conn = S2C(session); + + WT_RET(__wt_calloc_one(session, &file_system)); + + /* Initialize the Windows jump table. 
 */
+	file_system->directory_list = __wt_win_directory_list;
+	file_system->directory_list_free = __wt_win_directory_list_free;
+	file_system->directory_sync = NULL;
+	file_system->exist = __win_fs_exist;
+	file_system->open_file = __win_open_file;
+	file_system->remove = __win_fs_remove;
+	file_system->rename = __win_fs_rename;
+	file_system->size = __wt_win_fs_size;
+	file_system->terminate = __win_terminate;
+
+	/* Switch it into place. */
+	conn->file_system = file_system;
 
 	return (0);
 }
diff --git a/src/os_win/os_map.c b/src/os_win/os_map.c
index b043f9c9923..488cbfb2ceb 100644
--- a/src/os_win/os_map.c
+++ b/src/os_win/os_map.c
@@ -13,106 +13,83 @@
  *	Map a file into memory.
  */
 int
-__wt_win_map(WT_SESSION_IMPL *session,
-    WT_FH *fh, void *mapp, size_t *lenp, void **mappingcookie)
+__wt_win_map(WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session,
+    void *mapped_regionp, size_t *lenp, void *mapped_cookiep)
 {
 	WT_DECL_RET;
+	WT_FILE_HANDLE_WIN *win_fh;
+	WT_SESSION_IMPL *session;
 	size_t len;
 	wt_off_t file_size;
-	void *map;
+	void *map, *mapped_cookie;
+
+	win_fh = (WT_FILE_HANDLE_WIN *)file_handle;
+	session = (WT_SESSION_IMPL *)wt_session;
 
 	/*
 	 * There's no locking here to prevent the underlying file from changing
 	 * underneath us, our caller needs to ensure consistency of the mapped
 	 * region vs. any other file activity.
 	 */
-	WT_RET(__wt_filesize(session, fh, &file_size));
+	WT_RET(__wt_win_fs_size(file_handle->file_system,
+	    wt_session, file_handle->name, &file_size));
 	len = (size_t)file_size;
 
 	(void)__wt_verbose(session, WT_VERB_HANDLEOPS,
-	    "%s: memory-map: %" WT_SIZET_FMT " bytes", fh->name, len);
+	    "%s: memory-map: %" WT_SIZET_FMT " bytes", file_handle->name, len);
 
-	*mappingcookie =
-	    CreateFileMappingA(fh->filehandle, NULL, PAGE_READONLY, 0, 0, NULL);
-	if (*mappingcookie == NULL)
+	mapped_cookie = CreateFileMappingA(
+	    win_fh->filehandle, NULL, PAGE_READONLY, 0, 0, NULL);
+	if (mapped_cookie == NULL)
 		WT_RET_MSG(session, __wt_getlasterror(),
-		    "%s: memory-map: CreateFileMappingA", fh->name);
+		    "%s: memory-map: CreateFileMappingA", file_handle->name);
 
 	if ((map =
-	    MapViewOfFile(*mappingcookie, FILE_MAP_READ, 0, 0, len)) == NULL) {
+	    MapViewOfFile(mapped_cookie, FILE_MAP_READ, 0, 0, len)) == NULL) {
 		/* Retrieve the error before cleaning up. */
 		ret = __wt_getlasterror();
-		CloseHandle(*mappingcookie);
-		*mappingcookie = NULL;
+		CloseHandle(mapped_cookie);
 		WT_RET_MSG(session, ret,
-		    "%s: memory-map: MapViewOfFile", fh->name);
+		    "%s: memory-map: MapViewOfFile", file_handle->name);
 	}
 
-	*(void **)mapp = map;
+	*(void **)mapped_cookiep = mapped_cookie;
+	*(void **)mapped_regionp = map;
 	*lenp = len;
 
 	return (0);
 }
 
 /*
- * __wt_win_map_preload --
- *	Cause a section of a memory map to be faulted in.
- */
-int
-__wt_win_map_preload(
-    WT_SESSION_IMPL *session, WT_FH *fh, const void *p, size_t size)
-{
-	WT_UNUSED(session);
-	WT_UNUSED(fh);
-	WT_UNUSED(p);
-	WT_UNUSED(size);
-
-	return (ENOTSUP);
-}
-
-/*
- * __wt_win_map_discard --
- *	Discard a chunk of the memory map.
- */
-int
-__wt_win_map_discard(WT_SESSION_IMPL *session, WT_FH *fh, void *p, size_t size)
-{
-	WT_UNUSED(session);
-	WT_UNUSED(fh);
-	WT_UNUSED(p);
-	WT_UNUSED(size);
-
-	return (ENOTSUP);
-}
-
-/*
- * __wt_win_map_unmap --
+ * __wt_win_unmap --
  *	Remove a memory mapping.
  */
 int
-__wt_win_map_unmap(WT_SESSION_IMPL *session,
-    WT_FH *fh, void *map, size_t len, void **mappingcookie)
+__wt_win_unmap(WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session,
+    void *mapped_region, size_t length, void *mapped_cookie)
 {
 	WT_DECL_RET;
+	WT_FILE_HANDLE_WIN *win_fh;
+	WT_SESSION_IMPL *session;
 
-	(void)__wt_verbose(session, WT_VERB_HANDLEOPS,
-	    "%s: memory-unmap: %" WT_SIZET_FMT " bytes", fh->name, len);
+	win_fh = (WT_FILE_HANDLE_WIN *)file_handle;
+	session = (WT_SESSION_IMPL *)wt_session;
 
-	WT_ASSERT(session, *mappingcookie != NULL);
+	(void)__wt_verbose(session, WT_VERB_HANDLEOPS,
+	    "%s: memory-unmap: %" WT_SIZET_FMT " bytes",
+	    file_handle->name, length);
 
-	if (UnmapViewOfFile(map) == 0) {
+	if (UnmapViewOfFile(mapped_region) == 0) {
 		ret = __wt_getlasterror();
 		__wt_err(session, ret,
-		    "%s: memory-unmap: UnmapViewOfFile", fh->name);
+		    "%s: memory-unmap: UnmapViewOfFile", file_handle->name);
 	}
 
-	if (CloseHandle(*mappingcookie) == 0) {
+	if (CloseHandle(*(void **)mapped_cookie) == 0) {
 		ret = __wt_getlasterror();
 		__wt_err(session, ret,
-		    "%s: memory-unmap: CloseHandle", fh->name);
+		    "%s: memory-unmap: CloseHandle", file_handle->name);
 	}
 
-	*mappingcookie = NULL;
-
 	return (ret);
 }
diff --git a/src/schema/schema_create.c b/src/schema/schema_create.c
index 756f1fdcc6c..67d64cf1c75 100644
--- a/src/schema/schema_create.c
+++ b/src/schema/schema_create.c
@@ -35,7 +35,7 @@ __wt_direct_io_size_check(WT_SESSION_IMPL *session,
 	 * units of its happy place.
 	 */
 	if (FLD_ISSET(conn->direct_io,
-	    WT_FILE_TYPE_CHECKPOINT | WT_FILE_TYPE_DATA)) {
+	    WT_DIRECT_IO_CHECKPOINT | WT_DIRECT_IO_DATA)) {
 		align = (int64_t)conn->buffer_alignment;
 		if (align != 0 && (cval.val < align || cval.val % align != 0))
 			WT_RET_MSG(session, EINVAL,
diff --git a/src/schema/schema_rename.c b/src/schema/schema_rename.c
index 21402ed9332..8f4d374fd22 100644
--- a/src/schema/schema_rename.c
+++ b/src/schema/schema_rename.c
@@ -55,7 +55,7 @@ __rename_file(
 	default:
 		WT_ERR(ret);
 	}
-	WT_ERR(__wt_exist(session, newfile, &exist));
+	WT_ERR(__wt_fs_exist(session, newfile, &exist));
 	if (exist)
 		WT_ERR_MSG(session, EEXIST, "%s", newfile);
 
@@ -64,7 +64,7 @@ __rename_file(
 	WT_ERR(__wt_metadata_insert(session, newuri, oldvalue));
 
 	/* Rename the underlying file. */
-	WT_ERR(__wt_rename(session, filename, newfile));
+	WT_ERR(__wt_fs_rename(session, filename, newfile));
 	if (WT_META_TRACKING(session))
 		WT_ERR(__wt_meta_track_fileop(session, uri, newuri));
 
diff --git a/src/schema/schema_stat.c b/src/schema/schema_stat.c
index d3d0605c60a..c204d6b1a24 100644
--- a/src/schema/schema_stat.c
+++ b/src/schema/schema_stat.c
@@ -69,6 +69,7 @@ __curstat_size_only(WT_SESSION_IMPL *session,
 	WT_ITEM namebuf;
 	wt_off_t filesize;
 	char *tableconf;
+	bool exist;
 
 	WT_CLEAR(namebuf);
 	*was_fast = false;
@@ -96,10 +97,11 @@ __curstat_size_only(WT_SESSION_IMPL *session,
 	 * are concurrent schema level operations (for example drop). That is
 	 * fine - failing here results in falling back to the slow path of
 	 * opening the handle.
-	 * !!! Deliberately discard the return code from a failed call - the
-	 * error is flagged by not setting fast to true.
 	 */
-	if (__wt_filesize_name(session, namebuf.data, true, &filesize) == 0) {
+	WT_ERR(__wt_fs_exist(session, namebuf.data, &exist));
+	if (exist) {
+		WT_ERR(__wt_fs_size(session, namebuf.data, &filesize));
+
 		/* Setup and populate the statistics structure */
 		__wt_stat_dsrc_init_single(&cst->u.dsrc_stats);
 		cst->u.dsrc_stats.block_size = filesize;