author Alex Gorrod <alexander.gorrod@mongodb.com> 2016-04-28 21:16:44 +1000
committer Keith Bostic <keith.bostic@mongodb.com> 2016-04-28 07:16:44 -0400
commit b217c497e38141e8980babd2785c98926867e675
tree aae0e6b2026952186b17aba9f3f468162d039d49 /src
parent 316f7f535da96cbea59e17d33283190bda804c5f
WT-2552 Add public API for pluggable filesystems (#2671)
* WT-2552 Add public API for pluggable filesystems
  Not yet compiling. The main parts of this change should be here, but it involved extensive parameter re-organization. There are also a number of layering violations between our existing file system implementations and the WT_FH, that aren't possible with the new structure. There are a number of specific todo comments in the code. One of the main issues is that the in-memory file system had a special close semantic that relied on WiredTiger handle tracking. The in-memory file-system should do its own tracking of file handles, I've gone part way down that road by adding a queue for closed handles. Need to also add in live handles, and manage the queue as appropriate. I haven't created an example application that uses the new API yet.
* WT-2552 Add public API for pluggable filesystems
  I always forget you have to remove the already-built html files when changing PREDEFINED, add a reminder to the complaint.
* WT-2552 Add public API for pluggable filesystems
  You have to remove the .js files, too.
* WT-2552 Add public API for pluggable filesystems
  Make dist/s_all run cleanly.
* WT-2552 Add public API for pluggable filesystems
  Whitespace.
* WT-2552 Add public API for pluggable filesystems
  Make it compile/build/lint.
* WT-2552 Add public API for pluggable filesystems
  block_write.c: In function '__wt_block_extend': block_write.c:130:71: error: missing terminating ' character [-Werror]
* WT-2552 Add public API for pluggable filesystems
  os_fs_inmemory.c: In function '__im_file_truncate': os_fs_inmemory.c:344:10: error: 'session' is used uninitialized in this function [-Werror=uninitialized]
* WT-2552 Add public API for pluggable filesystems
  os_fs.c: In function '__posix_directory_sync': os_fs.c:92:10: error: 'session' is used uninitialized in this function [-Werror=uninitialized]
* WT-2552 Add public API for pluggable filesystems
  Go back to using bool types in the file-system API, this requires we add <stdbool.h> to the "standard" wiredtiger.h includes. Consistently use wt_session to represent a WT_SESSION, we were using "wtsession" in some places. Make a pass over the Windows code, but I'm sure it doesn't compile yet.
* WT-2552 Add public API for pluggable filesystems
  Fix up another couple of bool types.
* WT-2552 Add public API for pluggable filesystems
  Move the file naming work out of the underlying filesystem functions, the calls to __wt_filename are now in the upper-level code, in os_fs.i; that means the filesystem code is no longer responsible for figuring out paths. This is cleaner, although the directory-sync call is a bit of a kluge, and I've committed us to handling NULL filesystem methods. With this set of changes, in-memory runs again. More Windows naming fixes.
* WT-2552 Add public API for pluggable filesystems
  os_fs.c: In function '__posix_directory_sync': os_fs.c:96:3: error: label 'err' used but not defined
* WT-2552 Add public API for pluggable filesystems
  Pull out another call to __wt_filename() from the filesystem-dependent code.
* WT-2552 Add public API for pluggable filesystems
  Consistently check for missing file-system methods when doing file-system calls. Other minor lint & cleanup.
* WT-2552 Add public API for pluggable filesystems
  Change the in-memory code to maintain a complete list of the files it has ever opened, and depend on that list instead of reaching up into the common layer for the WT_FH handle list. This means __wt_handle_search is only used by the common WT_FH handle code, simplify it, and add a __wt_handle_is_open function that can be called for diagnostic purposes (to check for open files that are being renamed or removed, for example).
* Fix compiler warning and ignore the file system API in Java
* Flesh out the example file system implementation.
* Add in some plumbing for set_file_system in wiredtiger_open.
* WT-2552 Add public API for pluggable filesystems
  Whitespace.
* WT-2552 Add public API for pluggable filesystems
  WT_CONFIG_ITEM.val isn't a boolean, don't use boolean types in equal/not-equal comparisons.
* WT-2552 Add public API for pluggable filesystems
  Remove unused #includes. Increment/decrement the DEMO_FILE_SYSTEM.{opened,closed}_file_count. Allocate demo structures, they're larger than the underlying structures. Swap the number/size calloc arguments, number comes first. Fix a couple of statics.
* WT-2552 Add public API for pluggable filesystems
  Use %u instead of casting to %d.
* WT-2552 Add public API for pluggable filesystems
  Add ex_file_system.c to the list of example programs.
* WT-2552 Add public API for pluggable filesystems
  Change ex_file_system.c to not require <wt_internal.h>: strip down a copy of FreeBSD's <queue.h> for local inclusion, rewrite a few other minor pieces of code.
* WT-2552 Add public API for pluggable filesystems
  Update spell check info
* WT-2552 Add public API for pluggable filesystems
  __conn_load_extensions() shouldn't set the "early" boolean to true.
* WT-2552 Add public API for pluggable filesystems
  Don't indirect through a NULL pointer if "local" was set and no path was specified, always set the name to something useful.
* WT-2552 Add public API for pluggable filesystems
  Don't indirect through a NULL pointer if "local" was set and no path was specified, always set the name to something useful.
* WT-2552 Add public API for pluggable filesystems
  wt_off_t vs. size_t conversion lint.
* WT-2552 Add public API for pluggable filesystems
  Add -rdynamic to the load for ex_file_system, the main executable symbols are not exported by default.
* WT-2552 Add public API for pluggable filesystems
  The underlying handle name includes the enclosing directory, compare against the WT_FH.name field instead.
* WT-2552 Add public API for pluggable filesystems
  demo_fs_rename should return 0 if successful, simplify error handling. Don't bother casting arguments to free(), it's not necessary.
* WT-2552 Add public API for pluggable filesystems
  General WT_FILE_SYSTEM cleanup. Move OS initialization into the wiredtiger_open() code (the os_common/os_init.c file is no longer needed). Allow early-load extensions to be part of the environment settings, matching the "in-memory" and "readonly" configurations. Syntax check the set of a file-system, remove tests for NULL methods in the file-system structure unless it's legal for them to be NULL. Windows, POSIX and in-memory file systems now set WT_FILE_SYSTEM.terminate, call that function to cleanup when discarding a WT_CONNECTION. Export file-type and open-flags constants for WT_FILE_SYSTEM.open_file, sort the WT_FILE_SYSTEM methods, do an editing pass. Change the WT_FILE_HANDLE type from (const char *) to (char *), it's "owned" by the underlying layer, and it's simpler that way. Minor (untested) cleanup of the Windows WT_FILE_SYSTEM.open-file method.
* WT-2552 Add public API for pluggable filesystems
  Export the advise argument #defines for the WT_FILE_HANDLE.fadvise method. Sort the WT_FILE_HANDLE methods.
* WT-2552 Add public API for pluggable filesystems
  Clean up and simplify WT_FILE_SYSTEM/WT_FILE_HANDLE documentation's description of the handles.
* WT-2552 Add public API for pluggable filesystems
  WT_FILE_HANDLE.close is a required function (at the least, it has to free the memory). WT_FILE_HANDLE.fadvise isn't a required function, if it's not configured, don't call it.
* WT-2552 Add public API for pluggable filesystems
  The WT_FILE_HANDLE.lock function is required. Change the __wt_open() signature to match WT_FILE_SYSTEM.open_file().
* WT-2552 Add public API for pluggable filesystems
  Rework all of the WT_FILE_HANDLE mapped region methods to be optional.
* WT-2552 Add public API for pluggable filesystems
  The WT_FILE_HANDLE.{read,size} methods are required. The WT_FILE_HANDLE.sync method is not required. Split the WT_FILE_HANDLE.sync method into .sync and .sync_nowait versions, it makes the upper-level code simpler (Windows supports .sync but doesn't support .sync_nowait).
* WT-2552 Add public API for pluggable filesystems
  The WT_FILE_HANDLE.{truncate,write} methods are required IFF the file is not readonly.
* WT-2552 Add public API for pluggable filesystems
  POSIX shouldn't declare a no-sync handle function unless the sync_file_range system call is available.
* WT-2552 Add public API for pluggable filesystems
  Typo, missing semi-colon.
* Fix a bug in ex_file_system.c
* Fix a memory leak in posix file handle implementation
* WT-2552 Use the correct flags when opening backup file.
* WT-2552 Add public API for pluggable filesystems
  Simplify open-file error handling by calling the close function on the handle, that way we won't forget to free all of the applicable memory allocations.
* WT-2552 Add public API for pluggable filesystems
  Simplify the directory-list method, don't pass in an include/exclude file, if prefix is non-NULL, it implies we only want files matching the prefix.
* WT-2552 Add public API for pluggable filesystems
  Replace WT_FILE_HANDLE_POSIX.fallocate_{available,requires_locking} with WT_FILE_HANDLE.fallocate and WT_FILE_HANDLE.fallocate_nolock.
  Example code doesn't need to set WT_FILE_HANDLE methods to NULL, the allocation does that. Free the I/O buffer if open-handle allocation fails in the example code. Remove snippets for WT_FILE_SYSTEM and WT_FILE_HANDLE methods, we're not going to provide example code for them.
* WT-2552 Add public API for pluggable filesystems
  Document we expect either ENOTSUP or EBUSY from optionally supported APIs. Review and clean up ENOTSUP/EBUSY returns from optionally supported APIs. Make WT_FILE_HANDLE.lock optional. Don't configure or call the POSIX fadvise function on files configured for direct I/O. Rename __wt_filesize_name to __wt_size for consistency. Update the spelling list.
* WT-2552 Add public API for pluggable filesystems
  WT_FILE_HANDLE.truncate requires locking in all known implementations, document it is not called concurrently with other operations.
* WT-2552 Add public API for pluggable filesystems
  Don't terminate the filesystem unless we've actually configured one.
* WT-2552 Add public API for pluggable filesystems
  Remove WT_FILE_SYSTEM and WT_FILE_HANDLE from SWIG so the test suite can pass again.
* WT-2552 Add public API for pluggable filesystems
  Merge __conn_load_early_extensions() and __conn_load_extensions(). Fix a problem where I moved the early extensions load to where it could include the WiredTiger environment variable, but I didn't pass the built cfg into the function.
* WT-2552 Add public API for pluggable filesystems
  Linux build typo.
* WT-2552 Add public API for pluggable filesystems
  Get rid of the "bool silent" argument to WT_FILE_SYSTEM.size by testing for the file's existence before requesting the size (an extra system call, but guaranteed to hit in the buffer cache at least).
* WT-2552 Add public API for pluggable filesystems
  Naming consistency pass over the WT_FILE_SYSTEM functions.
* WT-2552 Add public API for pluggable filesystems
  Fix a spin lock mismatch.
* WT-2552 Add public API for pluggable filesystems
  Another spinlock mismatch.
* Update example pluggable file system. Add a directory list implementation to the example, which uncovered an issue with the API. The directory list API allocates memory that is freed by WiredTiger, which I don't think is kosher.
* Change file-directory-sync to use regular fsync. The distinction in os_fs.i doesn't work with the filesystem API. Also add directory_sync application to the example application.
* WT-2552 Add public API for pluggable filesystems
  Whitespace.
* WT-2552 Add public API for pluggable filesystems
  Rewrite __wt_free to not evaluate macro arguments multiple times.
* WT-2552 Add public API for pluggable filesystems
  Simplify the directory-list functions: __wt_realloc_def() already handles scaling the size of the allocations, there's no need to involve a separate constant that increments the allocation size.
* WT-2552 Add public API for pluggable filesystems
  Fix a grouping problem in a realloc call, we need to multiply the size by the previously allocated slots + 10. Fix buffer overrun, if "count" has already been incremented, the memset would skip clearing the first slot and clear one slot past the end of the buffer. Remove a comment, realloc requires clearing allocated memory, it's not paranoia.
* WT-2552 Add public API for pluggable filesystems
  Add the mapping-cookie argument to the map-preload and map-discard functions. Change page-discard to stop reaching down through the block manager, instead, provide a block-manager map-discard function that does the work.
* WT-2552 Add public API for pluggable filesystems
  Require a directory-list function. Implement a directory-list function for the in-memory filesystem. Consistency pass, make all the directory-list functions look the same.
* WT-2552 Add public API for pluggable filesystems
  The WT_FILE_SYSTEM.{directory_sync, remove, rename} methods are not required for read-only systems.
* WT-2552 Add public API for pluggable filesystems
  Change the WT_FILE_SYSTEM.open_file file_type argument from a set of constants to an enum. This requires changing how we store connection direct I/O configuration (the constants used to be flags stored in the WT_CONNECTION_IMPL), and requiring all callers of __wt_open() do their own work to figure out if WT_OPEN_DIRECTIO should be specified.
* WT-2552 Add public API for pluggable filesystems
  Make no guarantees WT_FILE_SYSTEM and WT_FILE_HANDLE methods are not called concurrently (except for WT_FILE_HANDLE::fallocate and WT_FILE_HANDLE::fallocate_nolock). Rewrite the in-memory FS code to lock across all methods (for example, WT_FILE_HANDLE.close), that means including a reference to the enclosing WT_FILE_SYSTEM in the WT_FILE_HANDLE structure so we can find a lock without using the WT_CONNECTION_IMPL structure.
* WT-2552 Add public API for pluggable filesystems
  Remove __wt_directory_sync_fh, it's no longer useful.
* WT-2552 Add public API for pluggable filesystems
  Rename WT_INMEMORY_FILE_SYSTEM to WT_FILE_SYSTEM_INMEM, matching WT_FILE_HANDLE_INMEM.
* WT-2552 Add public API for pluggable filesystems
  Add WT_FILE_SYSTEM.directory_list_free, to free memory allocated by WT_FILE_SYSTEM.directory_list. Fix a memory leak in __log_archive_once (if __wt_readlock failed, we leaked the directory-list memory).
* WT-2552 Add public API for pluggable filesystems
  Typo, check WT_DIRECT_IO_LOG, not WT_DIRECT_IO_CHECKPOINT.
* WT-2552 Add public API for pluggable filesystems
  Typo, unreachable code.
* WT-2552 Add public API for pluggable filesystems
  We don't require WT_FILE_SYSTEM.{remove,rename} if the system is read-only.
* Fix Windows build with pluggable file system. Involved removing u_int from the public API.
* Fix line wrapping.
* Fix Windows terminate function.
* Forgot something in my last commit.
* Fix Windows munmap bug.
* Add new example to Windows build. Extend example to be more complete.
* Fix example loading on Windows
* Update documentation
* Add missing spell words
* Remove old comment.
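The message above says optional file-handle methods may be left NULL and may return ENOTSUP or EBUSY, and that callers should tolerate both. A minimal sketch of that calling convention follows. The names `demo_handle`, `demo_discard` and the two stub functions are hypothetical and are not part of the WiredTiger API; the real WT_FILE_HANDLE methods take more arguments.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical handle with one optional method; not the WT_FILE_HANDLE layout. */
struct demo_handle {
	/* NULL when the file system offers no advisory call. */
	int (*fadvise)(struct demo_handle *h, long offset, long len);
};

/* Stub that reports the operation as unsupported. */
int
demo_fadvise_unsupported(struct demo_handle *h, long offset, long len)
{
	(void)h; (void)offset; (void)len;
	return (ENOTSUP);
}

/* Stub that reports a real I/O failure. */
int
demo_fadvise_failing(struct demo_handle *h, long offset, long len)
{
	(void)h; (void)offset; (void)len;
	return (EIO);
}

/*
 * demo_discard --
 *	Call the optional method if it is configured. A missing method,
 *	ENOTSUP and EBUSY all count as success; any other error is returned.
 */
int
demo_discard(struct demo_handle *h)
{
	int ret;

	assert(h != NULL);
	if (h->fadvise == NULL)		/* Method not configured. */
		return (0);
	ret = h->fadvise(h, 0, 0);
	return (ret == EBUSY || ret == ENOTSUP ? 0 : ret);
}
```

The same shape appears in the `__wt_block_discard` hunk of this diff, where the per-block "don't try again" flag is replaced by a NULL check on the method pointer.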
Diffstat (limited to 'src')
-rw-r--r-- src/block/block_map.c | 48
-rw-r--r-- src/block/block_mgr.c | 33
-rw-r--r-- src/block/block_open.c | 24
-rw-r--r-- src/block/block_read.c | 61
-rw-r--r-- src/block/block_write.c | 35
-rw-r--r-- src/btree/bt_discard.c | 10
-rw-r--r-- src/config/config_def.c | 9
-rw-r--r-- src/conn/conn_api.c | 175
-rw-r--r-- src/conn/conn_handle.c | 8
-rw-r--r-- src/conn/conn_log.c | 30
-rw-r--r-- src/cursor/cur_backup.c | 7
-rw-r--r-- src/cursor/cur_join.c | 5
-rw-r--r-- src/docs/Doxyfile | 2
-rw-r--r-- src/docs/custom-file-systems.dox | 25
-rw-r--r-- src/docs/examples.dox | 3
-rw-r--r-- src/docs/programming.dox | 1
-rw-r--r-- src/docs/spell.ok | 3
-rw-r--r-- src/include/block.h | 9
-rw-r--r-- src/include/config.h | 71
-rw-r--r-- src/include/connection.h | 26
-rw-r--r-- src/include/extern.h | 38
-rw-r--r-- src/include/flags.h | 5
-rw-r--r-- src/include/misc.h | 5
-rw-r--r-- src/include/os.h | 107
-rw-r--r-- src/include/os_fhandle.i | 88
-rw-r--r-- src/include/os_fs.i | 197
-rw-r--r-- src/include/wiredtiger.in | 490
-rw-r--r-- src/include/wt_internal.h | 6
-rw-r--r-- src/log/log.c | 101
-rw-r--r-- src/lsm/lsm_tree.c | 4
-rw-r--r-- src/lsm/lsm_work_unit.c | 2
-rw-r--r-- src/meta/meta_track.c | 4
-rw-r--r-- src/meta/meta_turtle.c | 14
-rw-r--r-- src/os_common/filename.c | 16
-rw-r--r-- src/os_common/os_fhandle.c | 305
-rw-r--r-- src/os_common/os_fs_inmemory.c | 557
-rw-r--r-- src/os_common/os_fstream.c | 3
-rw-r--r-- src/os_common/os_init.c | 41
-rw-r--r-- src/os_posix/os_dir.c | 89
-rw-r--r-- src/os_posix/os_dlopen.c | 2
-rw-r--r-- src/os_posix/os_fallocate.c | 153
-rw-r--r-- src/os_posix/os_fs.c | 502
-rw-r--r-- src/os_posix/os_map.c | 118
-rw-r--r-- src/os_win/os_dir.c | 95
-rw-r--r-- src/os_win/os_dlopen.c | 1
-rw-r--r-- src/os_win/os_fs.c | 425
-rw-r--r-- src/os_win/os_map.c | 91
-rw-r--r-- src/schema/schema_create.c | 2
-rw-r--r-- src/schema/schema_rename.c | 4
-rw-r--r-- src/schema/schema_stat.c | 8
50 files changed, 2379 insertions, 1679 deletions
diff --git a/src/block/block_map.c b/src/block/block_map.c
index b16fe7f8423..ce6fe8602f5 100644
--- a/src/block/block_map.c
+++ b/src/block/block_map.c
@@ -13,24 +13,16 @@
* Map a segment of the file in, if possible.
*/
int
-__wt_block_map(
- WT_SESSION_IMPL *session, WT_BLOCK *block, void *mapp, size_t *maplenp,
- void **mappingcookie)
+__wt_block_map(WT_SESSION_IMPL *session, WT_BLOCK *block,
+ void *mapped_regionp, size_t *lengthp, void *mapped_cookiep)
{
WT_DECL_RET;
+ WT_FILE_HANDLE *handle;
- *(void **)mapp = NULL;
- *maplenp = 0;
+ *(void **)mapped_regionp = NULL;
+ *lengthp = 0;
+ *(void **)mapped_cookiep = NULL;
-#ifdef WORDS_BIGENDIAN
- /*
- * The underlying objects are little-endian, mapping objects isn't
- * currently supported on big-endian systems.
- */
- WT_UNUSED(session);
- WT_UNUSED(block);
- WT_UNUSED(mappingcookie);
-#else
/* Map support is configurable. */
if (!S2C(session)->mmap)
return (0);
@@ -51,15 +43,23 @@ __wt_block_map(
return (0);
/*
+ * There may be no underlying functionality.
+ */
+ handle = block->fh->handle;
+ if (handle->map == NULL)
+ return (0);
+
+ /*
* Map the file into memory.
* Ignore not-supported errors, we'll read the file through the cache
* if map fails.
*/
- ret = block->fh->fh_map(
- session, block->fh, mapp, maplenp, mappingcookie);
- if (ret == ENOTSUP)
+ ret = handle->map(handle,
+ (WT_SESSION *)session, mapped_regionp, lengthp, mapped_cookiep);
+ if (ret == ENOTSUP) {
+ *(void **)mapped_regionp = NULL;
ret = 0;
-#endif
+ }
return (ret);
}
@@ -69,11 +69,13 @@ __wt_block_map(
* Unmap any mapped-in segment of the file.
*/
int
-__wt_block_unmap(
- WT_SESSION_IMPL *session, WT_BLOCK *block, void *map, size_t maplen,
- void **mappingcookie)
+__wt_block_unmap(WT_SESSION_IMPL *session,
+ WT_BLOCK *block, void *mapped_region, size_t length, void *mapped_cookie)
{
+ WT_FILE_HANDLE *handle;
+
/* Unmap the file from memory. */
- return (block->fh->fh_map_unmap(
- session, block->fh, map, maplen, mappingcookie));
+ handle = block->fh->handle;
+ return (handle->unmap(handle,
+ (WT_SESSION *)session, mapped_region, length, mapped_cookie));
}
diff --git a/src/block/block_mgr.c b/src/block/block_mgr.c
index 06150a0f062..465952d8ca5 100644
--- a/src/block/block_mgr.c
+++ b/src/block/block_mgr.c
@@ -103,7 +103,7 @@ __bm_checkpoint_load(WT_BM *bm, WT_SESSION_IMPL *session,
* of being read into cache buffers.
*/
WT_RET(__wt_block_map(session,
- bm->block, &bm->map, &bm->maplen, &bm->mappingcookie));
+ bm->block, &bm->map, &bm->maplen, &bm->mapped_cookie));
/*
* If this handle is for a checkpoint, that is, read-only, there
@@ -149,7 +149,7 @@ __bm_checkpoint_unload(WT_BM *bm, WT_SESSION_IMPL *session)
/* Unmap any mapped segment. */
if (bm->map != NULL)
WT_TRET(__wt_block_unmap(session,
- bm->block, bm->map, bm->maplen, &bm->mappingcookie));
+ bm->block, bm->map, bm->maplen, &bm->mapped_cookie));
/* Unload the checkpoint. */
WT_TRET(__wt_block_checkpoint_unload(session, bm->block, !bm->is_live));
@@ -302,6 +302,20 @@ __bm_is_mapped(WT_BM *bm, WT_SESSION_IMPL *session)
}
/*
+ * __bm_map_discard --
+ * Discard a mapped segment.
+ */
+static int
+__bm_map_discard(WT_BM *bm, WT_SESSION_IMPL *session, void *map, size_t len)
+{
+ WT_FILE_HANDLE *handle;
+
+ handle = bm->block->fh->handle;
+ return (handle->map_discard(
+ handle, (WT_SESSION *)session, map, len, bm->mapped_cookie));
+}
+
+/*
* __bm_salvage_end --
* End a block manager salvage.
*/
@@ -413,19 +427,7 @@ __bm_stat(WT_BM *bm, WT_SESSION_IMPL *session, WT_DSRC_STATS *stats)
static int
__bm_sync(WT_BM *bm, WT_SESSION_IMPL *session, bool block)
{
- WT_DECL_RET;
-
- if (!block && !bm->block->nowait_sync_available)
- return (0);
-
- if ((ret = __wt_fsync(session, bm->block->fh, block)) == 0)
- return (0);
-
- /* Ignore ENOTSUP, but don't try again. */
- if (ret != ENOTSUP)
- return (ret);
- bm->block->nowait_sync_available = false;
- return (0);
+ return (__wt_fsync(session, bm->block->fh, block));
}
/*
@@ -544,6 +546,7 @@ __bm_method_set(WT_BM *bm, bool readonly)
bm->compact_start = __bm_compact_start;
bm->free = __bm_free;
bm->is_mapped = __bm_is_mapped;
+ bm->map_discard = __bm_map_discard;
bm->preload = __wt_bm_preload;
bm->read = __wt_bm_read;
bm->salvage_end = __bm_salvage_end;
diff --git a/src/block/block_open.c b/src/block/block_open.c
index cc3d8dbb46e..e58bef30a6d 100644
--- a/src/block/block_open.c
+++ b/src/block/block_open.c
@@ -43,7 +43,7 @@ __wt_block_manager_create(
* in our space. Move any existing files out of the way and complain.
*/
for (;;) {
- if ((ret = __wt_open(session, filename, WT_FILE_TYPE_DATA,
+ if ((ret = __wt_open(session, filename, WT_OPEN_FILE_TYPE_DATA,
WT_OPEN_CREATE | WT_OPEN_EXCLUSIVE, &fh)) == 0)
break;
WT_ERR_TEST(ret != EEXIST, ret);
@@ -53,10 +53,10 @@ __wt_block_manager_create(
for (suffix = 1;; ++suffix) {
WT_ERR(__wt_buf_fmt(
session, tmp, "%s.%d", filename, suffix));
- WT_ERR(__wt_exist(session, tmp->data, &exists));
+ WT_ERR(__wt_fs_exist(session, tmp->data, &exists));
if (!exists) {
- WT_ERR(
- __wt_rename(session, filename, tmp->data));
+ WT_ERR(__wt_fs_rename(
+ session, filename, tmp->data));
WT_ERR(__wt_msg(session,
"unexpected file %s found, renamed to %s",
filename, (char *)tmp->data));
@@ -82,11 +82,11 @@ __wt_block_manager_create(
* that the file will appear.
*/
if (ret == 0)
- WT_TRET(__wt_directory_sync(session, filename));
+ WT_TRET(__wt_fs_directory_sync(session, filename));
/* Undo any create on error. */
if (ret != 0)
- WT_TRET(__wt_remove(session, filename));
+ WT_TRET(__wt_fs_remove(session, filename));
err: __wt_scr_free(session, &tmp);
@@ -200,20 +200,18 @@ __wt_block_open(WT_SESSION_IMPL *session,
/* Set the file extension information. */
block->extend_len = conn->data_extend_len;
- /* Set the asynchronous flush, preload availability. */
- block->nowait_sync_available = true;
- block->preload_available = true;
-
/*
* Open the underlying file handle.
*
* "direct_io=checkpoint" configures direct I/O for readonly data files.
*/
flags = 0;
- if (readonly && FLD_ISSET(conn->direct_io, WT_FILE_TYPE_CHECKPOINT))
+ if (readonly && FLD_ISSET(conn->direct_io, WT_DIRECT_IO_CHECKPOINT))
+ LF_SET(WT_OPEN_DIRECTIO);
+ if (!readonly && FLD_ISSET(conn->direct_io, WT_DIRECT_IO_DATA))
LF_SET(WT_OPEN_DIRECTIO);
WT_ERR(__wt_open(
- session, filename, WT_FILE_TYPE_DATA, flags, &block->fh));
+ session, filename, WT_OPEN_FILE_TYPE_DATA, flags, &block->fh));
/* Set the file's size. */
WT_ERR(__wt_filesize(session, block->fh, &block->size));
@@ -426,5 +424,5 @@ int
__wt_block_manager_named_size(
WT_SESSION_IMPL *session, const char *name, wt_off_t *sizep)
{
- return (__wt_filesize_name(session, name, false, sizep));
+ return (__wt_fs_size(session, name, sizep));
}
diff --git a/src/block/block_read.c b/src/block/block_read.c
index 6f0c41c1b5c..7304f6ff4bc 100644
--- a/src/block/block_read.c
+++ b/src/block/block_read.c
@@ -19,44 +19,32 @@ __wt_bm_preload(
WT_BLOCK *block;
WT_DECL_ITEM(tmp);
WT_DECL_RET;
+ WT_FILE_HANDLE *handle;
wt_off_t offset;
uint32_t cksum, size;
bool mapped;
WT_UNUSED(addr_size);
+
block = bm->block;
WT_STAT_FAST_CONN_INCR(session, block_preload);
- /* Preload the block. */
- if (block->preload_available) {
- /* Crack the cookie. */
- WT_RET(__wt_block_buffer_to_addr(
- block, addr, &offset, &size, &cksum));
-
- mapped = bm->map != NULL &&
- offset + size <= (wt_off_t)bm->maplen;
- if (mapped)
- ret = block->fh->fh_map_preload(session,
- block->fh, (uint8_t *)bm->map + offset, size);
- else
- ret = block->fh->fh_advise(session,
- block->fh, (wt_off_t)offset,
- (wt_off_t)size, POSIX_FADV_WILLNEED);
- if (ret == 0)
- return (0);
-
- /* Ignore ENOTSUP, but don't try again. */
- if (ret != ENOTSUP)
- return (ret);
- block->preload_available = false;
- }
+ /* Crack the cookie. */
+ WT_RET(__wt_block_buffer_to_addr(block, addr, &offset, &size, &cksum));
- /*
- * If preload isn't supported, do it the slow way; don't call the
- * underlying read routine directly, we don't know for certain if
- * this is a mapped range.
- */
+ handle = block->fh->handle;
+ mapped = bm->map != NULL && offset + size <= (wt_off_t)bm->maplen;
+ if (mapped && handle->map_preload != NULL)
+ ret = handle->map_preload(handle, (WT_SESSION *)session,
+ (uint8_t *)bm->map + offset, size, bm->mapped_cookie);
+ if (!mapped && handle->fadvise != NULL)
+ ret = handle->fadvise(handle, (WT_SESSION *)session,
+ (wt_off_t)offset, (wt_off_t)size, WT_FILE_HANDLE_WILLNEED);
+ if (ret != EBUSY && ret != ENOTSUP)
+ return (ret);
+
+ /* If preload isn't supported, do it the slow way. */
WT_RET(__wt_scr_alloc(session, 0, &tmp));
ret = __wt_bm_read(bm, session, tmp, addr, addr_size);
__wt_scr_free(session, &tmp);
@@ -74,6 +62,7 @@ __wt_bm_read(WT_BM *bm, WT_SESSION_IMPL *session,
{
WT_BLOCK *block;
WT_DECL_RET;
+ WT_FILE_HANDLE *handle;
wt_off_t offset;
uint32_t cksum, size;
bool mapped;
@@ -87,23 +76,17 @@ __wt_bm_read(WT_BM *bm, WT_SESSION_IMPL *session,
/*
* Map the block if it's possible.
*/
+ handle = block->fh->handle;
mapped = bm->map != NULL && offset + size <= (wt_off_t)bm->maplen;
- if (mapped) {
+ if (mapped && handle->map_preload != NULL) {
buf->data = (uint8_t *)bm->map + offset;
buf->size = size;
- if (block->preload_available) {
- ret = block->fh->fh_map_preload(
- session, block->fh, buf->data, buf->size);
-
- /* Ignore ENOTSUP, but don't try again. */
- if (ret != ENOTSUP)
- return (ret);
- block->preload_available = false;
- }
+ ret = handle->map_preload(handle, (WT_SESSION *)session,
+ buf->data, buf->size,bm->mapped_cookie);
WT_STAT_FAST_CONN_INCR(session, block_map_read);
WT_STAT_FAST_CONN_INCRV(session, block_byte_map_read, size);
- return (0);
+ return (ret);
}
#ifdef HAVE_DIAGNOSTIC
diff --git a/src/block/block_write.c b/src/block/block_write.c
index e79e538c920..4f1224f3c13 100644
--- a/src/block/block_write.c
+++ b/src/block/block_write.c
@@ -48,27 +48,28 @@ int
__wt_block_discard(WT_SESSION_IMPL *session, WT_BLOCK *block, size_t added_size)
{
WT_DECL_RET;
+ WT_FILE_HANDLE *handle;
+ /* The file may not support this call. */
+ handle = block->fh->handle;
+ if (handle->fadvise == NULL)
+ return (0);
+
+ /* The call may not be configured. */
if (block->os_cache_max == 0)
return (0);
/*
* We're racing on the addition, but I'm not willing to serialize on it
- * in the standard read path with more evidence it's needed.
+ * in the standard read path without evidence it's needed.
*/
if ((block->os_cache += added_size) <= block->os_cache_max)
return (0);
block->os_cache = 0;
- WT_ERR(block->fh->fh_advise(session,
- block->fh, (wt_off_t)0, (wt_off_t)0, POSIX_FADV_DONTNEED));
- return (0);
-
-err: /* Ignore ENOTSUP, but don't try again. */
- if (ret != ENOTSUP)
- return (ret);
- block->os_cache_max = 0;
- return (0);
+ ret = handle->fadvise(handle, (WT_SESSION *)session,
+ (wt_off_t)0, (wt_off_t)0, WT_FILE_HANDLE_DONTNEED);
+ return (ret == EBUSY || ret == ENOTSUP ? 0 : ret);
}
/*
@@ -80,6 +81,7 @@ __wt_block_extend(WT_SESSION_IMPL *session, WT_BLOCK *block,
WT_FH *fh, wt_off_t offset, size_t align_size, bool *release_lockp)
{
WT_DECL_RET;
+ WT_FILE_HANDLE *handle;
bool locked;
/*
@@ -125,7 +127,8 @@ __wt_block_extend(WT_SESSION_IMPL *session, WT_BLOCK *block,
* based on the filesystem type, fall back to ftruncate in that case,
* and remember that ftruncate requires locking.
*/
- if (fh->fallocate_available != WT_FALLOCATE_NOT_AVAILABLE) {
+ handle = fh->handle;
+ if (handle->fallocate != NULL || handle->fallocate_nolock != NULL) {
/*
* Release any locally acquired lock if not needed to extend the
* file, extending the file may require updating on-disk file's
@@ -133,7 +136,7 @@ __wt_block_extend(WT_SESSION_IMPL *session, WT_BLOCK *block,
* configure for file extension on systems that require locking
* over the extend call.)
*/
- if (!fh->fallocate_requires_locking && *release_lockp) {
+ if (handle->fallocate_nolock != NULL && *release_lockp) {
*release_lockp = locked = false;
__wt_spin_unlock(session, &block->live_lock);
}
@@ -149,8 +152,7 @@ __wt_block_extend(WT_SESSION_IMPL *session, WT_BLOCK *block,
if ((ret = __wt_fallocate(
session, fh, block->size, block->extend_len * 2)) == 0)
return (0);
- if (ret != ENOTSUP)
- return (ret);
+ WT_RET_ERROR_OK(ret, ENOTSUP);
}
/*
@@ -173,9 +175,8 @@ __wt_block_extend(WT_SESSION_IMPL *session, WT_BLOCK *block,
* The truncate might fail if there's a mapped file (in other words, if
* there's an open checkpoint on the file), that's OK.
*/
- if ((ret = __wt_ftruncate(session, fh, block->extend_size)) == EBUSY)
- ret = 0;
- return (ret);
+ WT_RET_BUSY_OK(__wt_ftruncate(session, fh, block->extend_size));
+ return (0);
}
/*
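The `__wt_block_extend` hunk above replaces the `fallocate_available` and `fallocate_requires_locking` fields with two optional method pointers: extension is attempted if either is set, and the locally held lock is released only when the no-lock variant exists. A minimal sketch of those two decisions, assuming a hypothetical `demo_handle` type and helper names (none of these are WiredTiger identifiers, and which method the real `__wt_fallocate` ends up calling is not shown in this excerpt):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical handle carrying both optional extend methods. */
struct demo_handle {
	int (*fallocate)(struct demo_handle *h, long offset, long len);
	int (*fallocate_nolock)(struct demo_handle *h, long offset, long len);
};

/* Stub extend method that always succeeds. */
int
demo_extend_stub(struct demo_handle *h, long offset, long len)
{
	(void)h; (void)offset; (void)len;
	return (0);
}

/* True if the handle can extend the file through either method. */
bool
demo_can_extend(const struct demo_handle *h)
{
	assert(h != NULL);
	return (h->fallocate != NULL || h->fallocate_nolock != NULL);
}

/* True if the caller has to keep holding its lock across the extend call. */
bool
demo_extend_needs_lock(const struct demo_handle *h)
{
	assert(h != NULL);
	return (h->fallocate_nolock == NULL);
}
```

Encoding the capability in which pointer is non-NULL means the caller no longer needs per-handle state flags to remember what the file system supports.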
diff --git a/src/btree/bt_discard.c b/src/btree/bt_discard.c
index 509333551c4..9807d5bc88f 100644
--- a/src/btree/bt_discard.c
+++ b/src/btree/bt_discard.c
@@ -40,7 +40,6 @@ __wt_ref_out(WT_SESSION_IMPL *session, WT_REF *ref)
void
__wt_page_out(WT_SESSION_IMPL *session, WT_PAGE **pagep)
{
- WT_FH *fh;
WT_PAGE *page;
WT_PAGE_HEADER *dsk;
WT_PAGE_MODIFY *mod;
@@ -134,10 +133,11 @@ __wt_page_out(WT_SESSION_IMPL *session, WT_PAGE **pagep)
dsk = (WT_PAGE_HEADER *)page->dsk;
if (F_ISSET_ATOMIC(page, WT_PAGE_DISK_ALLOC))
__wt_overwrite_and_free_len(session, dsk, dsk->mem_size);
- if (F_ISSET_ATOMIC(page, WT_PAGE_DISK_MAPPED)) {
- fh = S2BT(session)->bm->block->fh;
- (void)fh->fh_map_discard(session, fh, dsk, dsk->mem_size);
- }
+
+ /* Discard any mapped image. */
+ if (F_ISSET_ATOMIC(page, WT_PAGE_DISK_MAPPED))
+ (void)S2BT(session)->bm->map_discard(
+ S2BT(session)->bm, session, dsk, (size_t)dsk->mem_size);
__wt_overwrite_and_free(session, page);
}
diff --git a/src/config/config_def.c b/src/config/config_def.c
index 3c0940bfc4c..c7bbdf50280 100644
--- a/src/config/config_def.c
+++ b/src/config/config_def.c
@@ -17,6 +17,7 @@ static const WT_CONFIG_CHECK confchk_WT_CONNECTION_close[] = {
static const WT_CONFIG_CHECK confchk_WT_CONNECTION_load_extension[] = {
{ "config", "string", NULL, NULL, NULL, 0 },
+ { "early_load", "boolean", NULL, NULL, NULL, 0 },
{ "entry", "string", NULL, NULL, NULL, 0 },
{ "terminate", "string", NULL, NULL, NULL, 0 },
{ NULL, NULL, NULL, NULL, NULL, 0 }
@@ -958,9 +959,9 @@ static const WT_CONFIG_ENTRY config_entries[] = {
confchk_WT_CONNECTION_close, 1
},
{ "WT_CONNECTION.load_extension",
- "config=,entry=wiredtiger_extension_init,"
+ "config=,early_load=0,entry=wiredtiger_extension_init,"
"terminate=wiredtiger_extension_terminate",
- confchk_WT_CONNECTION_load_extension, 3
+ confchk_WT_CONNECTION_load_extension, 4
},
{ "WT_CONNECTION.open_session",
"isolation=read-committed",
@@ -982,6 +983,10 @@ static const WT_CONFIG_ENTRY config_entries[] = {
"timestamp=\"%b %d %H:%M:%S\",wait=0),verbose=",
confchk_WT_CONNECTION_reconfigure, 18
},
+ { "WT_CONNECTION.set_file_system",
+ "",
+ NULL, 0
+ },
{ "WT_CURSOR.close",
"",
NULL, 0
diff --git a/src/conn/conn_api.c b/src/conn/conn_api.c
index 4efa853851e..18ad383ec74 100644
--- a/src/conn/conn_api.c
+++ b/src/conn/conn_api.c
@@ -806,6 +806,7 @@ static int
__conn_load_default_extensions(WT_CONNECTION_IMPL *conn)
{
WT_UNUSED(conn);
+
#ifdef HAVE_BUILTIN_EXTENSION_SNAPPY
WT_RET(snappy_extension_init(&conn->iface, NULL));
#endif
@@ -819,18 +820,16 @@ __conn_load_default_extensions(WT_CONNECTION_IMPL *conn)
}
/*
- * __conn_load_extension --
- * WT_CONNECTION->load_extension method.
+ * __conn_load_extension_int --
+ * Internal extension load interface.
*/
static int
-__conn_load_extension(
- WT_CONNECTION *wt_conn, const char *path, const char *config)
+__conn_load_extension_int(WT_SESSION_IMPL *session,
+ const char *path, const char *cfg[], bool early_load)
{
WT_CONFIG_ITEM cval;
- WT_CONNECTION_IMPL *conn;
WT_DECL_RET;
WT_DLH *dlh;
- WT_SESSION_IMPL *session;
int (*load)(WT_CONNECTION *, WT_CONFIG_ARG *);
bool is_local;
const char *init_name, *terminate_name;
@@ -839,8 +838,10 @@ __conn_load_extension(
init_name = terminate_name = NULL;
is_local = strcmp(path, "local") == 0;
- conn = (WT_CONNECTION_IMPL *)wt_conn;
- CONNECTION_API_CALL(conn, session, load_extension, config, cfg);
+ /* Ensure that the load matches the phase of startup we are in. */
+ WT_ERR(__wt_config_gets(session, cfg, "early_load", &cval));
+ if ((cval.val == 0 && early_load) || (cval.val != 0 && !early_load))
+ return (0);
/*
* This assumes the underlying shared libraries are reference counted,
@@ -865,20 +866,39 @@ __conn_load_extension(
__wt_dlsym(session, dlh, terminate_name, false, &dlh->terminate));
/* Call the load function last, it simplifies error handling. */
- WT_ERR(load(wt_conn, (WT_CONFIG_ARG *)cfg));
+ WT_ERR(load(&S2C(session)->iface, (WT_CONFIG_ARG *)cfg));
/* Link onto the environment's list of open libraries. */
- __wt_spin_lock(session, &conn->api_lock);
- TAILQ_INSERT_TAIL(&conn->dlhqh, dlh, q);
- __wt_spin_unlock(session, &conn->api_lock);
+ __wt_spin_lock(session, &S2C(session)->api_lock);
+ TAILQ_INSERT_TAIL(&S2C(session)->dlhqh, dlh, q);
+ __wt_spin_unlock(session, &S2C(session)->api_lock);
dlh = NULL;
err: if (dlh != NULL)
WT_TRET(__wt_dlclose(session, dlh));
__wt_free(session, init_name);
__wt_free(session, terminate_name);
+ return (ret);
+}
- API_END_RET_NOTFOUND_MAP(session, ret);
+/*
+ * __conn_load_extension --
+ * WT_CONNECTION->load_extension method.
+ */
+static int
+__conn_load_extension(
+ WT_CONNECTION *wt_conn, const char *path, const char *config)
+{
+ WT_CONNECTION_IMPL *conn;
+ WT_DECL_RET;
+ WT_SESSION_IMPL *session;
+
+ conn = (WT_CONNECTION_IMPL *)wt_conn;
+ CONNECTION_API_CALL(conn, session, load_extension, config, cfg);
+
+ ret = __conn_load_extension_int(session, path, cfg, false);
+
+err: API_END_RET_NOTFOUND_MAP(session, ret);
}
/*
@@ -886,18 +906,16 @@ err: if (dlh != NULL)
* Load the list of application-configured extensions.
*/
static int
-__conn_load_extensions(WT_SESSION_IMPL *session, const char *cfg[])
+__conn_load_extensions(
+ WT_SESSION_IMPL *session, const char *cfg[], bool early_load)
{
WT_CONFIG subconfig;
WT_CONFIG_ITEM cval, skey, sval;
- WT_CONNECTION_IMPL *conn;
WT_DECL_ITEM(exconfig);
WT_DECL_ITEM(expath);
WT_DECL_RET;
-
- conn = S2C(session);
-
- WT_ERR(__conn_load_default_extensions(conn));
+ const char *sub_cfg[] = {
+ WT_CONFIG_BASE(session, WT_CONNECTION_load_extension), NULL, NULL };
WT_ERR(__wt_config_gets(session, cfg, "extensions", &cval));
WT_ERR(__wt_config_subinit(session, &subconfig, &cval));
@@ -912,8 +930,9 @@ __conn_load_extensions(WT_SESSION_IMPL *session, const char *cfg[])
WT_ERR(__wt_buf_fmt(session,
exconfig, "%.*s", (int)sval.len, sval.str));
}
- WT_ERR(conn->iface.load_extension(&conn->iface,
- expath->data, (sval.len > 0) ? exconfig->data : NULL));
+ sub_cfg[1] = sval.len > 0 ? exconfig->data : NULL;
+ WT_ERR(__conn_load_extension_int(
+ session, expath->data, sub_cfg, early_load));
}
WT_ERR_NOTFOUND_OK(ret);
@@ -1192,12 +1211,12 @@ __conn_config_file(WT_SESSION_IMPL *session,
fh = NULL;
/* Configuration files are always optional. */
- WT_RET(__wt_exist(session, filename, &exist));
+ WT_RET(__wt_fs_exist(session, filename, &exist));
if (!exist)
return (0);
/* Open the configuration file. */
- WT_RET(__wt_open(session, filename, WT_FILE_TYPE_REGULAR, 0, &fh));
+ WT_RET(__wt_open(session, filename, WT_OPEN_FILE_TYPE_REGULAR, 0, &fh));
WT_ERR(__wt_filesize(session, fh, &size));
if (size == 0)
goto err;
@@ -1488,8 +1507,8 @@ __conn_single(WT_SESSION_IMPL *session, const char *cfg[])
*/
exist = false;
if (!is_create)
- WT_ERR(__wt_exist(session, WT_WIREDTIGER, &exist));
- ret = __wt_open(session, WT_SINGLETHREAD, WT_FILE_TYPE_REGULAR,
+ WT_ERR(__wt_fs_exist(session, WT_WIREDTIGER, &exist));
+ ret = __wt_open(session, WT_SINGLETHREAD, WT_OPEN_FILE_TYPE_REGULAR,
is_create || exist ? WT_OPEN_CREATE : 0, &conn->lock_fh);
/*
@@ -1545,7 +1564,7 @@ __conn_single(WT_SESSION_IMPL *session, const char *cfg[])
/* We own the lock file, optionally create the WiredTiger file. */
ret = __wt_open(session, WT_WIREDTIGER,
- WT_FILE_TYPE_REGULAR, is_create ? WT_OPEN_CREATE : 0, &fh);
+ WT_OPEN_FILE_TYPE_REGULAR, is_create ? WT_OPEN_CREATE : 0, &fh);
/*
* If we're read-only, check for success as well as handled errors.
@@ -1582,7 +1601,7 @@ __conn_single(WT_SESSION_IMPL *session, const char *cfg[])
* and there's never a database home after that point without a turtle
* file. If the turtle file doesn't exist, it's a create.
*/
- WT_ERR(__wt_exist(session, WT_METADATA_TURTLE, &exist));
+ WT_ERR(__wt_fs_exist(session, WT_METADATA_TURTLE, &exist));
conn->is_new = exist ? 0 : 1;
if (conn->is_new) {
@@ -1789,7 +1808,7 @@ __conn_write_base_config(WT_SESSION_IMPL *session, const char *cfg[])
* only NOT exist if we crashed before it was created; in other words,
* if the base configuration file exists, we're done.
*/
- WT_RET(__wt_exist(session, WT_BASECONFIG, &exist));
+ WT_RET(__wt_fs_exist(session, WT_BASECONFIG, &exist));
if (exist)
return (0);
@@ -1864,6 +1883,57 @@ err: WT_TRET(__wt_fclose(session, &fs));
}
/*
+ * __conn_set_file_system --
+ * Configure a custom file system implementation on database open.
+ */
+static int
+__conn_set_file_system(
+ WT_CONNECTION *wt_conn, WT_FILE_SYSTEM *file_system, const char *config)
+{
+ WT_CONNECTION_IMPL *conn;
+ WT_DECL_RET;
+ WT_SESSION_IMPL *session;
+
+ conn = (WT_CONNECTION_IMPL *)wt_conn;
+ CONNECTION_API_CALL(conn, session, set_file_system, config, cfg);
+ WT_UNUSED(cfg);
+
+ conn->file_system = file_system;
+
+err: API_END_RET(session, ret);
+}
+
+/*
+ * __conn_chk_file_system --
+ * Check the configured file system.
+ */
+static int
+__conn_chk_file_system(WT_SESSION_IMPL *session, bool readonly)
+{
+ WT_CONNECTION_IMPL *conn;
+
+ conn = S2C(session);
+
+#define WT_CONN_SET_FILE_SYSTEM_REQ(name) \
+ if (conn->file_system->name == NULL) \
+ WT_RET_MSG(session, EINVAL, \
+ "a WT_FILE_SYSTEM.%s method must be configured", #name)
+
+ WT_CONN_SET_FILE_SYSTEM_REQ(directory_list);
+ WT_CONN_SET_FILE_SYSTEM_REQ(directory_list_free);
+ /* not required: directory_sync */
+ WT_CONN_SET_FILE_SYSTEM_REQ(exist);
+ WT_CONN_SET_FILE_SYSTEM_REQ(open_file);
+ if (!readonly) {
+ WT_CONN_SET_FILE_SYSTEM_REQ(remove);
+ WT_CONN_SET_FILE_SYSTEM_REQ(rename);
+ }
+ WT_CONN_SET_FILE_SYSTEM_REQ(size);
+
+ return (0);
+}
+
+/*
* wiredtiger_open --
* Main library entry point: open a new connection to a WiredTiger
* database.
@@ -1887,12 +1957,13 @@ wiredtiger_open(const char *home, WT_EVENT_HANDLER *event_handler,
__conn_add_compressor,
__conn_add_encryptor,
__conn_add_extractor,
+ __conn_set_file_system,
__conn_get_extension_api
};
static const WT_NAME_FLAG file_types[] = {
- { "checkpoint", WT_FILE_TYPE_CHECKPOINT },
- { "data", WT_FILE_TYPE_DATA },
- { "log", WT_FILE_TYPE_LOG },
+ { "checkpoint", WT_DIRECT_IO_CHECKPOINT },
+ { "data", WT_DIRECT_IO_DATA },
+ { "log", WT_DIRECT_IO_LOG },
{ NULL, 0 }
};
@@ -1982,10 +2053,27 @@ wiredtiger_open(const char *home, WT_EVENT_HANDLER *event_handler,
F_SET(conn, WT_CONN_READONLY);
/*
- * After checking readonly and in-memory, but before we do anything that
- * touches the filesystem, configure the OS layer.
+ * Load early extensions before doing further initialization (one use of
+ * an early extension is to configure a file system).
*/
- WT_ERR(__wt_os_init(session));
+ WT_ERR(__conn_load_extensions(session, cfg, true));
+
+ /*
+ * If the application didn't configure its own file system, configure
+ * one of ours. Check to ensure we have a valid file system.
+ */
+ if (conn->file_system == NULL) {
+ if (F_ISSET(conn, WT_CONN_IN_MEMORY))
+ WT_ERR(__wt_os_inmemory(session));
+ else
+#if defined(_MSC_VER)
+ WT_ERR(__wt_os_win(session));
+#else
+ WT_ERR(__wt_os_posix(session));
+#endif
+ }
+ WT_ERR(
+ __conn_chk_file_system(session, F_ISSET(conn, WT_CONN_READONLY)));
/*
* Capture the config_base setting file for later use. Again, if the
@@ -2118,8 +2206,8 @@ wiredtiger_open(const char *home, WT_EVENT_HANDLER *event_handler,
if (ret == 0) {
if (sval.val)
FLD_SET(conn->direct_io, ft->flag);
- } else if (ret != WT_NOTFOUND)
- goto err;
+ } else
+ WT_ERR_NOTFOUND_OK(ret);
}
WT_ERR(__wt_config_gets(session, cfg, "write_through", &cval));
@@ -2128,8 +2216,8 @@ wiredtiger_open(const char *home, WT_EVENT_HANDLER *event_handler,
if (ret == 0) {
if (sval.val)
FLD_SET(conn->write_through, ft->flag);
- } else if (ret != WT_NOTFOUND)
- goto err;
+ } else
+ WT_ERR_NOTFOUND_OK(ret);
}
/*
@@ -2153,15 +2241,15 @@ wiredtiger_open(const char *home, WT_EVENT_HANDLER *event_handler,
ret = __wt_config_subgets(session, &cval, ft->name, &sval);
if (ret == 0) {
switch (ft->flag) {
- case WT_FILE_TYPE_DATA:
+ case WT_DIRECT_IO_DATA:
conn->data_extend_len = sval.val;
break;
- case WT_FILE_TYPE_LOG:
+ case WT_DIRECT_IO_LOG:
conn->log_extend_len = sval.val;
break;
}
- } else if (ret != WT_NOTFOUND)
- goto err;
+ } else
+ WT_ERR_NOTFOUND_OK(ret);
}
WT_ERR(__wt_config_gets(session, cfg, "mmap", &cval));
@@ -2190,7 +2278,8 @@ wiredtiger_open(const char *home, WT_EVENT_HANDLER *event_handler,
* everything else to be in place, and the extensions call back into the
* library.
*/
- WT_ERR(__conn_load_extensions(session, cfg));
+ WT_ERR(__conn_load_default_extensions(conn));
+ WT_ERR(__conn_load_extensions(session, cfg, false));
/*
* The metadata/log encryptor is configured after extensions, since
diff --git a/src/conn/conn_handle.c b/src/conn/conn_handle.c
index 5f4c38e7361..509966793e5 100644
--- a/src/conn/conn_handle.c
+++ b/src/conn/conn_handle.c
@@ -149,15 +149,17 @@ __wt_connection_destroy(WT_CONNECTION_IMPL *conn)
__wt_spin_destroy(session, &conn->page_lock[i]);
__wt_free(session, conn->page_lock);
+ /* Destroy the file-system configuration. */
+ if (conn->file_system != NULL && conn->file_system->terminate != NULL)
+ WT_TRET(conn->file_system->terminate(
+ conn->file_system, (WT_SESSION *)session));
+
/* Free allocated memory. */
__wt_free(session, conn->cfg);
__wt_free(session, conn->home);
__wt_free(session, conn->error_prefix);
__wt_free(session, conn->sessions);
- /* Destroy the OS configuration. */
- WT_TRET(__wt_os_cleanup(session));
-
__wt_free(NULL, conn);
return (ret);
}
diff --git a/src/conn/conn_log.c b/src/conn/conn_log.c
index 672071b59bf..394378b65fc 100644
--- a/src/conn/conn_log.c
+++ b/src/conn/conn_log.c
@@ -178,6 +178,7 @@ __log_archive_once(WT_SESSION_IMPL *session, uint32_t backup_file)
conn = S2C(session);
log = conn->log;
logcount = 0;
+ locked = false;
logfiles = NULL;
/*
@@ -198,14 +199,14 @@ __log_archive_once(WT_SESSION_IMPL *session, uint32_t backup_file)
* Main archive code. Get the list of all log files and
* remove any earlier than the minimum log number.
*/
- WT_RET(__wt_dirlist(session, conn->log_path,
- WT_LOG_FILENAME, WT_DIRLIST_INCLUDE, &logfiles, &logcount));
+ WT_ERR(__wt_fs_directory_list(
+ session, conn->log_path, WT_LOG_FILENAME, &logfiles, &logcount));
/*
* We can only archive files if a hot backup is not in progress or
* if we are the backup.
*/
- WT_RET(__wt_readlock(session, conn->hot_backup_lock));
+ WT_ERR(__wt_readlock(session, conn->hot_backup_lock));
locked = true;
if (!conn->hot_backup || backup_file != 0) {
for (i = 0; i < logcount; i++) {
@@ -218,9 +219,6 @@ __log_archive_once(WT_SESSION_IMPL *session, uint32_t backup_file)
}
WT_ERR(__wt_readunlock(session, conn->hot_backup_lock));
locked = false;
- __wt_log_files_free(session, logfiles, logcount);
- logfiles = NULL;
- logcount = 0;
/*
* Indicate what is our new earliest LSN. It is the start
@@ -232,8 +230,7 @@ __log_archive_once(WT_SESSION_IMPL *session, uint32_t backup_file)
err: __wt_err(session, ret, "log archive server error");
if (locked)
WT_TRET(__wt_readunlock(session, conn->hot_backup_lock));
- if (logfiles != NULL)
- __wt_log_files_free(session, logfiles, logcount);
+ WT_TRET(__wt_fs_directory_list_free(session, &logfiles, &logcount));
return (ret);
}
@@ -259,10 +256,10 @@ __log_prealloc_once(WT_SESSION_IMPL *session)
* Allocate up to the maximum number, accounting for any existing
* files that may not have been used yet.
*/
- WT_ERR(__wt_dirlist(session, conn->log_path,
- WT_LOG_PREPNAME, WT_DIRLIST_INCLUDE, &recfiles, &reccount));
- __wt_log_files_free(session, recfiles, reccount);
- recfiles = NULL;
+ WT_ERR(__wt_fs_directory_list(
+ session, conn->log_path, WT_LOG_PREPNAME, &recfiles, &reccount));
+ WT_ERR(__wt_fs_directory_list_free(session, &recfiles, &reccount));
+
/*
* Adjust the number of files to pre-allocate if we find that
* the critical path had to allocate them since we last ran.
@@ -292,8 +289,7 @@ __log_prealloc_once(WT_SESSION_IMPL *session)
if (0)
err: __wt_err(session, ret, "log pre-alloc server error");
- if (recfiles != NULL)
- __wt_log_files_free(session, recfiles, reccount);
+ WT_TRET(__wt_fs_directory_list_free(session, &recfiles, &reccount));
return (ret);
}
@@ -868,9 +864,9 @@ __wt_logmgr_create(WT_SESSION_IMPL *session, const char *cfg[])
"log write LSN"));
WT_RET(__wt_rwlock_alloc(session,
&log->log_archive_lock, "log archive lock"));
- if (FLD_ISSET(conn->direct_io, WT_FILE_TYPE_LOG))
- log->allocsize =
- WT_MAX((uint32_t)conn->buffer_alignment, WT_LOG_ALIGN);
+ if (FLD_ISSET(conn->direct_io, WT_DIRECT_IO_LOG))
+ log->allocsize = (uint32_t)
+ WT_MAX(conn->buffer_alignment, WT_LOG_ALIGN);
else
log->allocsize = WT_LOG_ALIGN;
WT_INIT_LSN(&log->alloc_lsn);
diff --git a/src/cursor/cur_backup.c b/src/cursor/cur_backup.c
index c89f002fa75..b901b5a0869 100644
--- a/src/cursor/cur_backup.c
+++ b/src/cursor/cur_backup.c
@@ -178,8 +178,7 @@ __backup_log_append(WT_SESSION_IMPL *session, WT_CURSOR_BACKUP *cb, bool active)
for (i = 0; i < logcount; i++)
WT_ERR(__backup_list_append(session, cb, logfiles[i]));
}
-err: if (logfiles != NULL)
- __wt_log_files_free(session, logfiles, logcount);
+err: WT_TRET(__wt_fs_directory_list_free(session, &logfiles, &logcount));
return (ret);
}
@@ -257,11 +256,11 @@ __backup_start(
session, cb, WT_INCREMENTAL_BACKUP));
} else {
WT_ERR(__backup_list_append(session, cb, WT_METADATA_BACKUP));
- WT_ERR(__wt_exist(session, WT_BASECONFIG, &exist));
+ WT_ERR(__wt_fs_exist(session, WT_BASECONFIG, &exist));
if (exist)
WT_ERR(__backup_list_append(
session, cb, WT_BASECONFIG));
- WT_ERR(__wt_exist(session, WT_USERCONFIG, &exist));
+ WT_ERR(__wt_fs_exist(session, WT_USERCONFIG, &exist));
if (exist)
WT_ERR(__backup_list_append(
session, cb, WT_USERCONFIG));
diff --git a/src/cursor/cur_join.c b/src/cursor/cur_join.c
index 93c1711ef93..8bf7007527b 100644
--- a/src/cursor/cur_join.c
+++ b/src/cursor/cur_join.c
@@ -211,9 +211,8 @@ err: __wt_free(session, uri);
/*
* __curjoin_iter_bump --
- * Called to advance the iterator to the next endpoint,
- * which may in turn advance to the next entry.
- *
+ * Called to advance the iterator to the next endpoint, which may in turn
+ * advance to the next entry.
*/
static int
__curjoin_iter_bump(WT_CURSOR_JOIN_ITER *iter)
diff --git a/src/docs/Doxyfile b/src/docs/Doxyfile
index 4c1682de6eb..69e9716b425 100644
--- a/src/docs/Doxyfile
+++ b/src/docs/Doxyfile
@@ -1570,6 +1570,8 @@ PREDEFINED = DOXYGEN \
__wt_event_handler:=WT_EVENT_HANDLER \
__wt_extension_api:=WT_EXTENSION_API \
__wt_extractor:=WT_EXTRACTOR \
+ __wt_file_handle:=WT_FILE_HANDLE \
+ __wt_file_system:=WT_FILE_SYSTEM \
__wt_item:=WT_ITEM \
__wt_lsn:=WT_LSN \
__wt_session:=WT_SESSION \
diff --git a/src/docs/custom-file-systems.dox b/src/docs/custom-file-systems.dox
new file mode 100644
index 00000000000..4b012952e15
--- /dev/null
+++ b/src/docs/custom-file-systems.dox
@@ -0,0 +1,25 @@
+/*! @page custom_file_systems Custom File Systems
+
+Applications can provide a custom file system implementation that will be
+used by WiredTiger to interact with the I/O subsystem using the
+WT_FILE_SYSTEM and WT_FILE_HANDLE interfaces.
+
+It is not necessary for all file system providers to implement all methods
+in the WT_FILE_SYSTEM and WT_FILE_HANDLE structures. The documentation for
+those structures indicates which methods are optional. Methods which are
+not provided should be set to NULL. Generally the function pointers should
+not be changed once a handle is created. The one exception is the fallocate
+and fallocate_nolock methods: for an example of how fallocate can be
+changed after creation, see the WiredTiger POSIX file system
+implementation.
+
+WT_FILE_SYSTEM and WT_FILE_HANDLE methods which fail but not fatally
+(for example, a file truncation call which fails because the file is
+currently mapped into memory) should return EBUSY.
+
+Unless explicitly stated otherwise, WiredTiger may invoke methods on the
+WT_FILE_SYSTEM and WT_FILE_HANDLE interfaces from multiple threads
+concurrently. It is the responsibility of the implementation to protect
+any shared data.
+
+*/
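Note (not part of this commit): the conventions described in custom-file-systems.dox can be sketched in a few lines of C. The sketch assumes a simplified, hypothetical method table: DEMO_FILE_SYSTEM, DEMO_FILE_HANDLE, DEMO_FS_METHOD and every demo_* function are invented for illustration and are not the WiredTiger structures or signatures. Only the method names and the required/optional split mirror what __conn_chk_file_system checks in this change.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a file-system method table. Every method shares
 * one simplified signature here; the real WT_FILE_SYSTEM methods differ.
 */
typedef struct demo_file_system DEMO_FILE_SYSTEM;
typedef int (*DEMO_FS_METHOD)(DEMO_FILE_SYSTEM *, const char *);

struct demo_file_system {
	DEMO_FS_METHOD directory_list;
	DEMO_FS_METHOD directory_list_free;
	DEMO_FS_METHOD directory_sync;	/* Optional, may be NULL. */
	DEMO_FS_METHOD exist;
	DEMO_FS_METHOD open_file;
	DEMO_FS_METHOD remove;		/* Not required when read-only. */
	DEMO_FS_METHOD rename;		/* Not required when read-only. */
	DEMO_FS_METHOD size;
};

/* Hypothetical file handle: just enough state to show the EBUSY rule. */
typedef struct {
	bool mapped;
	long size;
} DEMO_FILE_HANDLE;

/* Stub method used to populate the table; it always succeeds. */
static int
demo_stub(DEMO_FILE_SYSTEM *fs, const char *name)
{
	(void)fs;
	(void)name;
	return (0);
}

/* Fill in the required methods, leaving the optional one set to NULL. */
static void
demo_fs_init(DEMO_FILE_SYSTEM *fs)
{
	memset(fs, 0, sizeof(*fs));
	fs->directory_list = fs->directory_list_free = demo_stub;
	fs->exist = fs->open_file = demo_stub;
	fs->remove = fs->rename = fs->size = demo_stub;
}

/* Return 0 if every required method is set, EINVAL otherwise. */
static int
demo_chk_file_system(const DEMO_FILE_SYSTEM *fs, bool readonly)
{
	assert(fs != NULL);

#define	DEMO_REQ(name)							\
	if (fs->name == NULL)						\
		return (EINVAL)

	DEMO_REQ(directory_list);
	DEMO_REQ(directory_list_free);
	/* not required: directory_sync */
	DEMO_REQ(exist);
	DEMO_REQ(open_file);
	if (!readonly) {
		DEMO_REQ(remove);
		DEMO_REQ(rename);
	}
	DEMO_REQ(size);
#undef DEMO_REQ

	return (0);
}

/* Call an optional method only when the provider supplied it. */
static int
demo_directory_sync(DEMO_FILE_SYSTEM *fs, const char *path)
{
	if (fs->directory_sync == NULL)
		return (0);	/* Missing optional method: nothing to do. */
	return (fs->directory_sync(fs, path));
}

/* A non-fatal failure (the file is still mapped) is reported as EBUSY. */
static int
demo_truncate(DEMO_FILE_HANDLE *fh, long len)
{
	if (fh->mapped)
		return (EBUSY);
	fh->size = len;
	return (0);
}
```

A caller would run demo_fs_init, then demo_chk_file_system once at open time, and route every later call to an optional method through a wrapper such as demo_directory_sync. Checking for NULL at the call site is what lets a provider leave optional methods unset.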
diff --git a/src/docs/examples.dox b/src/docs/examples.dox
index 3ed7357b52c..9e3e6844da4 100644
--- a/src/docs/examples.dox
+++ b/src/docs/examples.dox
@@ -55,4 +55,7 @@ Shows how to access the database log files.
@example ex_thread.c
Shows how to access a database with multiple threads.
+@example ex_file_system.c
+Shows how to extend WiredTiger with a custom file-system implementation.
+
*/
diff --git a/src/docs/programming.dox b/src/docs/programming.dox
index 7ec68ca9b46..81e612e8ee8 100644
--- a/src/docs/programming.dox
+++ b/src/docs/programming.dox
@@ -56,6 +56,7 @@ each of which is ordered by one or more columns.
- @subpage custom_collators
- @subpage custom_extractors
- @subpage custom_data_sources
+- @subpage custom_file_systems
- @subpage helium
@m_endif
diff --git a/src/docs/spell.ok b/src/docs/spell.ok
index 965d28f2ec6..d197b5517f2 100644
--- a/src/docs/spell.ok
+++ b/src/docs/spell.ok
@@ -25,6 +25,7 @@ EBUSY
ECMA
EINVAL
ENCRYPTOR
+ENOTSUP
EmpId
Encryptors
Facebook
@@ -209,6 +210,7 @@ erlang
errno
exe
fadvise
+fallocate
failchk
fd's
fdatasync
@@ -333,6 +335,7 @@ nocase
nocasecoll
nodup
noflush
+nolock
nolocking
nommap
nop
diff --git a/src/include/block.h b/src/include/block.h
index e964fb4e8c2..9f652ceddb9 100644
--- a/src/include/block.h
+++ b/src/include/block.h
@@ -174,6 +174,7 @@ struct __wt_bm {
int (*compact_start)(WT_BM *, WT_SESSION_IMPL *);
int (*free)(WT_BM *, WT_SESSION_IMPL *, const uint8_t *, size_t);
bool (*is_mapped)(WT_BM *, WT_SESSION_IMPL *);
+ int (*map_discard)(WT_BM *, WT_SESSION_IMPL *, void *, size_t);
int (*preload)(WT_BM *, WT_SESSION_IMPL *, const uint8_t *, size_t);
int (*read)
(WT_BM *, WT_SESSION_IMPL *, WT_ITEM *, const uint8_t *, size_t);
@@ -196,9 +197,9 @@ struct __wt_bm {
WT_BLOCK *block; /* Underlying file */
- void *map; /* Mapped region */
- size_t maplen;
- void *mappingcookie;
+ void *map; /* Mapped region */
+ size_t maplen;
+ void *mapped_cookie;
/*
* There's only a single block manager handle that can be written, all
@@ -224,8 +225,6 @@ struct __wt_block {
wt_off_t size; /* File size */
wt_off_t extend_size; /* File extended size */
wt_off_t extend_len; /* File extend chunk size */
- bool nowait_sync_available; /* File can flush asynchronously */
- bool preload_available; /* File pages can be preloaded */
/* Configuration information, set when the file is opened. */
uint32_t allocfirst; /* Allocation is first-fit */
diff --git a/src/include/config.h b/src/include/config.h
index 48a255134af..486aa50e86c 100644
--- a/src/include/config.h
+++ b/src/include/config.h
@@ -59,41 +59,42 @@ struct __wt_config_parser_impl {
#define WT_CONFIG_ENTRY_WT_CONNECTION_load_extension 7
#define WT_CONFIG_ENTRY_WT_CONNECTION_open_session 8
#define WT_CONFIG_ENTRY_WT_CONNECTION_reconfigure 9
-#define WT_CONFIG_ENTRY_WT_CURSOR_close 10
-#define WT_CONFIG_ENTRY_WT_CURSOR_reconfigure 11
-#define WT_CONFIG_ENTRY_WT_SESSION_begin_transaction 12
-#define WT_CONFIG_ENTRY_WT_SESSION_checkpoint 13
-#define WT_CONFIG_ENTRY_WT_SESSION_close 14
-#define WT_CONFIG_ENTRY_WT_SESSION_commit_transaction 15
-#define WT_CONFIG_ENTRY_WT_SESSION_compact 16
-#define WT_CONFIG_ENTRY_WT_SESSION_create 17
-#define WT_CONFIG_ENTRY_WT_SESSION_drop 18
-#define WT_CONFIG_ENTRY_WT_SESSION_join 19
-#define WT_CONFIG_ENTRY_WT_SESSION_log_flush 20
-#define WT_CONFIG_ENTRY_WT_SESSION_log_printf 21
-#define WT_CONFIG_ENTRY_WT_SESSION_open_cursor 22
-#define WT_CONFIG_ENTRY_WT_SESSION_rebalance 23
-#define WT_CONFIG_ENTRY_WT_SESSION_reconfigure 24
-#define WT_CONFIG_ENTRY_WT_SESSION_rename 25
-#define WT_CONFIG_ENTRY_WT_SESSION_reset 26
-#define WT_CONFIG_ENTRY_WT_SESSION_rollback_transaction 27
-#define WT_CONFIG_ENTRY_WT_SESSION_salvage 28
-#define WT_CONFIG_ENTRY_WT_SESSION_snapshot 29
-#define WT_CONFIG_ENTRY_WT_SESSION_strerror 30
-#define WT_CONFIG_ENTRY_WT_SESSION_transaction_sync 31
-#define WT_CONFIG_ENTRY_WT_SESSION_truncate 32
-#define WT_CONFIG_ENTRY_WT_SESSION_upgrade 33
-#define WT_CONFIG_ENTRY_WT_SESSION_verify 34
-#define WT_CONFIG_ENTRY_colgroup_meta 35
-#define WT_CONFIG_ENTRY_file_config 36
-#define WT_CONFIG_ENTRY_file_meta 37
-#define WT_CONFIG_ENTRY_index_meta 38
-#define WT_CONFIG_ENTRY_lsm_meta 39
-#define WT_CONFIG_ENTRY_table_meta 40
-#define WT_CONFIG_ENTRY_wiredtiger_open 41
-#define WT_CONFIG_ENTRY_wiredtiger_open_all 42
-#define WT_CONFIG_ENTRY_wiredtiger_open_basecfg 43
-#define WT_CONFIG_ENTRY_wiredtiger_open_usercfg 44
+#define WT_CONFIG_ENTRY_WT_CONNECTION_set_file_system 10
+#define WT_CONFIG_ENTRY_WT_CURSOR_close 11
+#define WT_CONFIG_ENTRY_WT_CURSOR_reconfigure 12
+#define WT_CONFIG_ENTRY_WT_SESSION_begin_transaction 13
+#define WT_CONFIG_ENTRY_WT_SESSION_checkpoint 14
+#define WT_CONFIG_ENTRY_WT_SESSION_close 15
+#define WT_CONFIG_ENTRY_WT_SESSION_commit_transaction 16
+#define WT_CONFIG_ENTRY_WT_SESSION_compact 17
+#define WT_CONFIG_ENTRY_WT_SESSION_create 18
+#define WT_CONFIG_ENTRY_WT_SESSION_drop 19
+#define WT_CONFIG_ENTRY_WT_SESSION_join 20
+#define WT_CONFIG_ENTRY_WT_SESSION_log_flush 21
+#define WT_CONFIG_ENTRY_WT_SESSION_log_printf 22
+#define WT_CONFIG_ENTRY_WT_SESSION_open_cursor 23
+#define WT_CONFIG_ENTRY_WT_SESSION_rebalance 24
+#define WT_CONFIG_ENTRY_WT_SESSION_reconfigure 25
+#define WT_CONFIG_ENTRY_WT_SESSION_rename 26
+#define WT_CONFIG_ENTRY_WT_SESSION_reset 27
+#define WT_CONFIG_ENTRY_WT_SESSION_rollback_transaction 28
+#define WT_CONFIG_ENTRY_WT_SESSION_salvage 29
+#define WT_CONFIG_ENTRY_WT_SESSION_snapshot 30
+#define WT_CONFIG_ENTRY_WT_SESSION_strerror 31
+#define WT_CONFIG_ENTRY_WT_SESSION_transaction_sync 32
+#define WT_CONFIG_ENTRY_WT_SESSION_truncate 33
+#define WT_CONFIG_ENTRY_WT_SESSION_upgrade 34
+#define WT_CONFIG_ENTRY_WT_SESSION_verify 35
+#define WT_CONFIG_ENTRY_colgroup_meta 36
+#define WT_CONFIG_ENTRY_file_config 37
+#define WT_CONFIG_ENTRY_file_meta 38
+#define WT_CONFIG_ENTRY_index_meta 39
+#define WT_CONFIG_ENTRY_lsm_meta 40
+#define WT_CONFIG_ENTRY_table_meta 41
+#define WT_CONFIG_ENTRY_wiredtiger_open 42
+#define WT_CONFIG_ENTRY_wiredtiger_open_all 43
+#define WT_CONFIG_ENTRY_wiredtiger_open_basecfg 44
+#define WT_CONFIG_ENTRY_wiredtiger_open_usercfg 45
/*
* configuration section: END
* DO NOT EDIT: automatically built by dist/flags.py.
diff --git a/src/include/connection.h b/src/include/connection.h
index 5023fb1872a..81229315c48 100644
--- a/src/include/connection.h
+++ b/src/include/connection.h
@@ -414,32 +414,26 @@ struct __wt_connection_impl {
wt_off_t data_extend_len; /* file_extend data length */
wt_off_t log_extend_len; /* file_extend log length */
- /* O_DIRECT/FILE_FLAG_NO_BUFFERING file type flags */
- uint32_t direct_io;
- uint32_t write_through; /* FILE_FLAG_WRITE_THROUGH type flags */
+#define WT_DIRECT_IO_CHECKPOINT 0x01 /* Checkpoints */
+#define WT_DIRECT_IO_DATA 0x02 /* Data files */
+#define WT_DIRECT_IO_LOG 0x04 /* Log files */
+ uint32_t direct_io; /* O_DIRECT, FILE_FLAG_NO_BUFFERING */
+
+ uint32_t write_through; /* FILE_FLAG_WRITE_THROUGH */
+
bool mmap; /* mmap configuration */
int page_size; /* OS page size for mmap alignment */
uint32_t verbose;
- void *inmemory; /* In-memory configuration cookie */
-
#define WT_STDERR(s) (&S2C(s)->wt_stderr)
#define WT_STDOUT(s) (&S2C(s)->wt_stdout)
WT_FSTREAM wt_stderr, wt_stdout;
/*
- * OS library/system call jump table, to support in-memory and readonly
- * configurations as well as special devices with other non-POSIX APIs.
+ * File system interface abstracted to support alternative file system
+ * implementations.
*/
- int (*file_directory_list)(WT_SESSION_IMPL *,
- const char *, const char *, uint32_t, char ***, u_int *);
- int (*file_directory_sync)(WT_SESSION_IMPL *, const char *);
- int (*file_exist)(WT_SESSION_IMPL *, const char *, bool *);
- int (*file_remove)(WT_SESSION_IMPL *, const char *);
- int (*file_rename)(WT_SESSION_IMPL *, const char *, const char *);
- int (*file_size)(WT_SESSION_IMPL *, const char *, bool, wt_off_t *);
- int (*file_open)(WT_SESSION_IMPL *,
- WT_FH *, const char *, uint32_t, uint32_t);
+ WT_FILE_SYSTEM *file_system;
uint32_t flags;
};
diff --git a/src/include/extern.h b/src/include/extern.h
index ae82424078d..22346698574 100644
--- a/src/include/extern.h
+++ b/src/include/extern.h
@@ -41,8 +41,8 @@ extern int __wt_block_extlist_write(WT_SESSION_IMPL *session, WT_BLOCK *block, W
extern int __wt_block_extlist_truncate( WT_SESSION_IMPL *session, WT_BLOCK *block, WT_EXTLIST *el);
extern int __wt_block_extlist_init(WT_SESSION_IMPL *session, WT_EXTLIST *el, const char *name, const char *extname, bool track_size);
extern void __wt_block_extlist_free(WT_SESSION_IMPL *session, WT_EXTLIST *el);
-extern int __wt_block_map( WT_SESSION_IMPL *session, WT_BLOCK *block, void *mapp, size_t *maplenp, void **mappingcookie);
-extern int __wt_block_unmap( WT_SESSION_IMPL *session, WT_BLOCK *block, void *map, size_t maplen, void **mappingcookie);
+extern int __wt_block_map(WT_SESSION_IMPL *session, WT_BLOCK *block, void *mapped_regionp, size_t *lengthp, void *mapped_cookiep);
+extern int __wt_block_unmap(WT_SESSION_IMPL *session, WT_BLOCK *block, void *mapped_region, size_t length, void *mapped_cookie);
extern int __wt_block_manager_open(WT_SESSION_IMPL *session, const char *filename, const char *cfg[], bool forced_salvage, bool readonly, uint32_t allocsize, WT_BM **bmp);
extern int __wt_block_manager_drop(WT_SESSION_IMPL *session, const char *filename);
extern int __wt_block_manager_create( WT_SESSION_IMPL *session, const char *filename, uint32_t allocsize);
@@ -356,7 +356,6 @@ extern int __wt_log_force_sync(WT_SESSION_IMPL *session, WT_LSN *min_lsn);
extern int __wt_log_needs_recovery(WT_SESSION_IMPL *session, WT_LSN *ckp_lsn, bool *recp);
extern void __wt_log_written_reset(WT_SESSION_IMPL *session);
extern int __wt_log_get_all_files(WT_SESSION_IMPL *session, char ***filesp, u_int *countp, uint32_t *maxid, bool active_only);
-extern void __wt_log_files_free(WT_SESSION_IMPL *session, char **files, u_int count);
extern int __wt_log_extract_lognum( WT_SESSION_IMPL *session, const char *name, uint32_t *id);
extern int __wt_log_acquire(WT_SESSION_IMPL *session, uint64_t recsize, WT_LOGSLOT *slot);
extern int __wt_log_allocfile( WT_SESSION_IMPL *session, uint32_t lognum, const char *dest);
@@ -713,7 +712,7 @@ extern int __wt_txn_named_snapshot_config(WT_SESSION_IMPL *session, const char *
extern int __wt_txn_named_snapshot_destroy(WT_SESSION_IMPL *session);
extern int __wt_txn_recover(WT_SESSION_IMPL *session);
extern bool __wt_absolute_path(const char *path);
-extern bool __wt_handle_search(WT_SESSION_IMPL *session, const char *name, bool increment_ref, WT_FH *newfh, WT_FH **fhp);
+extern bool __wt_handle_is_open(WT_SESSION_IMPL *session, const char *name);
extern bool __wt_has_priv(void);
extern const char *__wt_path_separator(void);
extern const char *__wt_strerror(WT_SESSION_IMPL *session, int error, char *errbuf, size_t errlen);
@@ -740,22 +739,18 @@ extern int __wt_malloc(WT_SESSION_IMPL *session, size_t bytes_to_allocate, void
extern int __wt_map_error_rdonly(int error);
extern int __wt_nfilename( WT_SESSION_IMPL *session, const char *name, size_t namelen, char **path);
extern int __wt_once(void (*init_routine)(void));
-extern int __wt_open(WT_SESSION_IMPL *session, const char *name, uint32_t file_type, uint32_t flags, WT_FH **fhp);
-extern int __wt_os_cleanup(WT_SESSION_IMPL *session);
-extern int __wt_os_init(WT_SESSION_IMPL *session);
+extern int __wt_open(WT_SESSION_IMPL *session, const char *name, WT_OPEN_FILE_TYPE file_type, u_int flags, WT_FH **fhp);
extern int __wt_os_inmemory(WT_SESSION_IMPL *session);
-extern int __wt_os_inmemory_cleanup(WT_SESSION_IMPL *session);
extern int __wt_os_posix(WT_SESSION_IMPL *session);
-extern int __wt_os_posix_cleanup(WT_SESSION_IMPL *session);
extern int __wt_os_stdio(WT_SESSION_IMPL *session);
extern int __wt_os_win(WT_SESSION_IMPL *session);
-extern int __wt_os_win_cleanup(WT_SESSION_IMPL *session);
-extern int __wt_posix_directory_list(WT_SESSION_IMPL *session, const char *dir, const char *prefix, uint32_t flags, char ***dirlist, u_int *countp);
-extern int __wt_posix_file_allocate( WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t offset, wt_off_t len);
-extern int __wt_posix_map(WT_SESSION_IMPL *session, WT_FH *fh, void *mapp, size_t *lenp, void **mappingcookie);
-extern int __wt_posix_map_discard( WT_SESSION_IMPL *session, WT_FH *fh, void *p, size_t size);
-extern int __wt_posix_map_preload( WT_SESSION_IMPL *session, WT_FH *fh, const void *p, size_t size);
-extern int __wt_posix_map_unmap(WT_SESSION_IMPL *session, WT_FH *fh, void *map, size_t len, void **mappingcookie);
+extern int __wt_posix_directory_list(WT_FILE_SYSTEM *file_system, WT_SESSION *wt_session, const char *directory, const char *prefix, char ***dirlistp, uint32_t *countp);
+extern int __wt_posix_directory_list_free(WT_FILE_SYSTEM *file_system, WT_SESSION *wt_session, char **dirlist, uint32_t count);
+extern int __wt_posix_file_fallocate(WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session, wt_off_t offset, wt_off_t len);
+extern int __wt_posix_map(WT_FILE_HANDLE *fh, WT_SESSION *wt_session, void *mapped_regionp, size_t *lenp, void *mapped_cookiep);
+extern int __wt_posix_map_discard(WT_FILE_HANDLE *fh, WT_SESSION *wt_session, void *map, size_t length, void *mapped_cookie);
+extern int __wt_posix_map_preload(WT_FILE_HANDLE *fh, WT_SESSION *wt_session, const void *map, size_t length, void *mapped_cookie);
+extern int __wt_posix_unmap(WT_FILE_HANDLE *fh, WT_SESSION *wt_session, void *mapped_region, size_t len, void *mapped_cookie);
extern int __wt_realloc(WT_SESSION_IMPL *session, size_t *bytes_allocated_ret, size_t bytes_to_allocate, void *retp);
extern int __wt_realloc_aligned(WT_SESSION_IMPL *session, size_t *bytes_allocated_ret, size_t bytes_to_allocate, void *retp);
extern int __wt_realloc_noclear(WT_SESSION_IMPL *session, size_t *bytes_allocated_ret, size_t bytes_to_allocate, void *retp);
@@ -764,15 +759,14 @@ extern int __wt_rename_and_sync_directory( WT_SESSION_IMPL *session, const char
extern int __wt_strndup(WT_SESSION_IMPL *session, const void *str, size_t len, void *retp);
extern int __wt_thread_create(WT_SESSION_IMPL *session, wt_thread_t *tidret, WT_THREAD_CALLBACK(*func)(void *), void *arg);
extern int __wt_thread_join(WT_SESSION_IMPL *session, wt_thread_t tid);
-extern int __wt_win_directory_list(WT_SESSION_IMPL *session, const char *dir, const char *prefix, uint32_t flags, char ***dirlist, u_int *countp);
-extern int __wt_win_map(WT_SESSION_IMPL *session, WT_FH *fh, void *mapp, size_t *lenp, void **mappingcookie);
-extern int __wt_win_map_discard(WT_SESSION_IMPL *session, WT_FH *fh, void *p, size_t size);
-extern int __wt_win_map_preload( WT_SESSION_IMPL *session, WT_FH *fh, const void *p, size_t size);
-extern int __wt_win_map_unmap(WT_SESSION_IMPL *session, WT_FH *fh, void *map, size_t len, void **mappingcookie);
+extern int __wt_win_directory_list(WT_FILE_SYSTEM *file_system, WT_SESSION *wt_session, const char *directory, const char *prefix, char ***dirlistp, uint32_t *countp);
+extern int __wt_win_directory_list_free(WT_FILE_SYSTEM *file_system, WT_SESSION *wt_session, char **dirlist, uint32_t count);
+extern int __wt_win_fs_size(WT_FILE_SYSTEM *file_system, WT_SESSION *wt_session, const char *name, wt_off_t *sizep);
+extern int __wt_win_map(WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session, void *mapped_regionp, size_t *lenp, void *mapped_cookiep);
+extern int __wt_win_unmap(WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session, void *mapped_region, size_t length, void *mapped_cookie);
extern uint64_t __wt_strtouq(const char *nptr, char **endptr, int base);
extern void __wt_abort(WT_SESSION_IMPL *session) WT_GCC_FUNC_DECL_ATTRIBUTE((noreturn));
extern void __wt_free_int(WT_SESSION_IMPL *session, const void *p_arg);
-extern void __wt_posix_file_allocate_configure(WT_SESSION_IMPL *session, WT_FH *fh);
extern void __wt_sleep(uint64_t seconds, uint64_t micro_seconds);
extern void __wt_stream_set_line_buffer(FILE *fp);
extern void __wt_stream_set_no_buffer(FILE *fp);
diff --git a/src/include/flags.h b/src/include/flags.h
index 3d9b0ed716b..da7aee7b059 100644
--- a/src/include/flags.h
+++ b/src/include/flags.h
@@ -24,11 +24,6 @@
#define WT_EVICT_IN_MEMORY 0x00000002
#define WT_EVICT_LOOKASIDE 0x00000004
#define WT_EVICT_UPDATE_RESTORE 0x00000008
-#define WT_FILE_TYPE_CHECKPOINT 0x00000001
-#define WT_FILE_TYPE_DATA 0x00000002
-#define WT_FILE_TYPE_DIRECTORY 0x00000004
-#define WT_FILE_TYPE_LOG 0x00000008
-#define WT_FILE_TYPE_REGULAR 0x00000010
#define WT_LOGSCAN_FIRST 0x00000001
#define WT_LOGSCAN_FROM_CKP 0x00000002
#define WT_LOGSCAN_ONE 0x00000004
diff --git a/src/include/misc.h b/src/include/misc.h
index 07d52c61eac..4c7c9572905 100644
--- a/src/include/misc.h
+++ b/src/include/misc.h
@@ -96,8 +96,9 @@
* the caller remember to put the & operator on the pointer.
*/
#define __wt_free(session, p) do { \
- if ((p) != NULL) \
- __wt_free_int(session, (void *)&(p)); \
+ void *__p = &(p); \
+ if (*(void **)__p != NULL) \
+ __wt_free_int(session, __p); \
} while (0)
#ifdef HAVE_DIAGNOSTIC
#define __wt_overwrite_and_free(session, p) do { \
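The revised __wt_free macro above takes the address of the pointer once, tests the pointer through that address, and passes the same address to the helper. A minimal stand-alone sketch of the pattern follows; demo_free_int, DEMO_FREE and demo_free_twice are hypothetical names, and the sketch assumes the helper clears the caller's pointer after freeing it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the helper: it receives the pointer's address. */
static void
demo_free_int(void *p_arg)
{
    void **pp = p_arg;

    free(*pp);
    *pp = NULL;        /* Assumption: the helper clears the caller's pointer. */
}

/*
 * Take the address of the pointer once, test the pointer through that
 * address, then pass the same address on to the helper.
 */
#define DEMO_FREE(p) do {                       \
    void *demo_p_ = &(p);                       \
    if (*(void **)demo_p_ != NULL)              \
        demo_free_int(demo_p_);                 \
} while (0)

/* Allocate, free twice, and report whether the pointer ended up NULL. */
static int
demo_free_twice(void)
{
    void *buf = malloc(16);

    if (buf == NULL)
        return (-1);
    DEMO_FREE(buf);
    DEMO_FREE(buf);    /* The second call is a no-op: buf is already NULL. */
    return (buf == NULL ? 0 : 1);
}
```

The sketch uses a void * variable so the read through (void **) stays within the C aliasing rules; the macro in the diff applies the same cast to pointers of other types.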
diff --git a/src/include/os.h b/src/include/os.h
index 830277fb5f3..0df2ea49197 100644
--- a/src/include/os.h
+++ b/src/include/os.h
@@ -6,14 +6,6 @@
* See the file LICENSE for redistribution information.
*/
-/*
- * Number of directory entries can grow dynamically.
- */
-#define WT_DIR_ENTRY 32
-
-#define WT_DIRLIST_EXCLUDE 0x1 /* Exclude files matching prefix */
-#define WT_DIRLIST_INCLUDE 0x2 /* Include files matching prefix */
-
#define WT_SYSCALL_RETRY(call, ret) do { \
int __retry; \
for (__retry = 0; __retry < 10; ++__retry) { \
@@ -58,81 +50,68 @@
(t1).tv_nsec < (t2).tv_nsec ? -1 : \
(t1).tv_nsec == (t2).tv_nsec ? 0 : 1 : 1)
-/*
- * The underlying OS calls return ENOTSUP if posix_fadvise functionality isn't
- * available, but WiredTiger uses the POSIX flag names in the API. Use distinct
- * values so the underlying code can distinguish.
- */
-#ifndef POSIX_FADV_DONTNEED
-#define POSIX_FADV_DONTNEED 0x01
-#endif
-#ifndef POSIX_FADV_WILLNEED
-#define POSIX_FADV_WILLNEED 0x02
-#endif
-
-#define WT_OPEN_CREATE 0x001 /* Create */
-#define WT_OPEN_DIRECTIO 0x002 /* Direct I/O (if available) */
-#define WT_OPEN_EXCLUSIVE 0x004 /* Open exclusively */
-#define WT_OPEN_FIXED 0x008 /* Path isn't relative to home */
-#define WT_OPEN_READONLY 0x010 /* Open readonly (internal) */
-
struct __wt_fh {
+ /*
+ * There is a file name field in both the WT_FH and WT_FILE_HANDLE
+ * structures, which isn't ideal. There would be compromises to keeping
+ * a single copy: If it were in WT_FH, file systems could not access
+ * the name field; if it were just in the WT_FILE_HANDLE, internal
+ * WiredTiger code would need to maintain a string inside a structure
+ * that is owned by the user (since we care about the content of the
+ * file name). Keeping two copies seems most reasonable.
+ */
const char *name; /* File name */
- uint64_t name_hash; /* Hash of name */
- TAILQ_ENTRY(__wt_fh) q; /* List of open handles */
- TAILQ_ENTRY(__wt_fh) hashq; /* Hashed list of handles */
- u_int ref; /* Reference count */
+ uint64_t name_hash; /* hash of name */
+ TAILQ_ENTRY(__wt_fh) q; /* internal queue */
+ TAILQ_ENTRY(__wt_fh) hashq; /* internal hash queue */
+ u_int ref; /* reference count */
+
+ WT_FILE_HANDLE *handle;
+};
+#ifdef _WIN32
+struct __wt_file_handle_win {
+ WT_FILE_HANDLE iface;
/*
- * Underlying file system handle support.
+ * Windows specific file handle fields
*/
-#ifdef _WIN32
HANDLE filehandle; /* Windows file handle */
HANDLE filehandle_secondary; /* Windows file handle
for file size changes */
+ bool direct_io; /* O_DIRECT configured */
+};
+
#else
- int fd; /* POSIX file handle */
-#endif
- /* In-memory specific fields. */
- size_t off; /* Read/write offset */
- WT_ITEM buf; /* Data */
+struct __wt_file_handle_posix {
+ WT_FILE_HANDLE iface;
- bool direct_io; /* O_DIRECT configured */
+ /*
+ * POSIX specific file handle fields
+ */
+ int fd; /* POSIX file handle */
- enum { /* file extend configuration */
- WT_FALLOCATE_AVAILABLE,
- WT_FALLOCATE_NOT_AVAILABLE,
- WT_FALLOCATE_POSIX,
- WT_FALLOCATE_STD,
- WT_FALLOCATE_SYS } fallocate_available;
- bool fallocate_requires_locking;
+ bool direct_io; /* O_DIRECT configured */
+};
+#endif
-#define WT_FH_IN_MEMORY 0x01 /* In-memory, don't remove */
- uint32_t flags;
+struct __wt_file_handle_inmem {
+ WT_FILE_HANDLE iface;
+ /*
+ * In memory specific file handle fields
+ */
+ TAILQ_ENTRY(__wt_file_handle_inmem) q; /* Closed file queue */
- int (*fh_advise)(WT_SESSION_IMPL *, WT_FH *, wt_off_t, wt_off_t, int);
- int (*fh_allocate)(WT_SESSION_IMPL *, WT_FH *, wt_off_t, wt_off_t);
- int (*fh_close)(WT_SESSION_IMPL *, WT_FH *);
- int (*fh_lock)(WT_SESSION_IMPL *, WT_FH *, bool);
- int (*fh_map)(WT_SESSION_IMPL *, WT_FH *, void *, size_t *, void **);
- int (*fh_map_discard)(WT_SESSION_IMPL *, WT_FH *, void *, size_t);
- int (*fh_map_preload)(WT_SESSION_IMPL *, WT_FH *, const void *, size_t);
- int (*fh_map_unmap)(
- WT_SESSION_IMPL *, WT_FH *, void *, size_t, void **);
- int (*fh_read)(WT_SESSION_IMPL *, WT_FH *, wt_off_t, size_t, void *);
- int (*fh_size)(WT_SESSION_IMPL *, WT_FH *, wt_off_t *);
- int (*fh_sync)(WT_SESSION_IMPL *, WT_FH *, bool);
- int (*fh_truncate)(WT_SESSION_IMPL *, WT_FH *, wt_off_t);
- int (*fh_write)(
- WT_SESSION_IMPL *, WT_FH *, wt_off_t, size_t, const void *);
+ size_t off; /* Read/write offset */
+ WT_ITEM buf; /* Data */
+ u_int ref; /* Reference count */
};
struct __wt_fstream {
- const char *name; /* Stream name */
+ const char *name; /* Stream name */
- FILE *fp; /* stdio FILE stream */
+ FILE *fp; /* stdio FILE stream */
WT_FH *fh; /* WT file handle */
wt_off_t off; /* Read/write offset */
wt_off_t size; /* File size */
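The per-platform structures in the os.h hunk above all place a WT_FILE_HANDLE named iface as their first member, so a method that receives the public handle can cast it back to the private structure. The sketch below illustrates that layout with stand-in types; DEMO_FILE_HANDLE, DEMO_POSIX_HANDLE, demo_open and demo_close are hypothetical and are not the WiredTiger definitions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the public handle type; not the real WT_FILE_HANDLE. */
typedef struct demo_file_handle DEMO_FILE_HANDLE;
struct demo_file_handle {
    char *name;
    int (*close)(DEMO_FILE_HANDLE *);
};

/* Implementation-private handle: the public part must come first. */
typedef struct {
    DEMO_FILE_HANDLE iface;
    int fd;                     /* Private field, invisible to callers. */
} DEMO_POSIX_HANDLE;

static int demo_closed_fd = -1; /* Records the last descriptor "closed". */

/* Close method: recover the private structure from the public pointer. */
static int
demo_close(DEMO_FILE_HANDLE *fh)
{
    DEMO_POSIX_HANDLE *pfh = (DEMO_POSIX_HANDLE *)fh;

    demo_closed_fd = pfh->fd;
    free(fh->name);
    free(pfh);
    return (0);
}

/* Open: allocate the private structure, return its public part. */
static DEMO_FILE_HANDLE *
demo_open(const char *name, int fd)
{
    DEMO_POSIX_HANDLE *pfh;
    size_t len = strlen(name) + 1;

    if ((pfh = calloc(1, sizeof(*pfh))) == NULL)
        return (NULL);
    if ((pfh->iface.name = malloc(len)) == NULL) {
        free(pfh);
        return (NULL);
    }
    memcpy(pfh->iface.name, name, len);
    pfh->iface.close = demo_close;
    pfh->fd = fd;
    return (&pfh->iface);
}
```

The cast in demo_close is valid because a pointer to a structure, suitably converted, points to its first member and vice versa.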
diff --git a/src/include/os_fhandle.i b/src/include/os_fhandle.i
index 4a5d7d2c3a7..a093d80d388 100644
--- a/src/include/os_fhandle.i
+++ b/src/include/os_fhandle.i
@@ -7,18 +7,24 @@
*/
/*
- * __wt_directory_sync_fh --
- * Flush a directory file handle to ensure file creation is durable.
- *
- * We don't use the normal sync path because many file systems don't require
- * this step and we don't want to penalize them.
+ * __wt_fsync --
+ * POSIX fsync.
*/
static inline int
-__wt_directory_sync_fh(WT_SESSION_IMPL *session, WT_FH *fh)
+__wt_fsync(WT_SESSION_IMPL *session, WT_FH *fh, bool block)
{
- WT_ASSERT(session, !F_ISSET(S2C(session), WT_CONN_READONLY));
+ WT_FILE_HANDLE *handle;
- return (fh->fh_sync(session, fh, true));
+ WT_RET(__wt_verbose(
+ session, WT_VERB_HANDLEOPS, "%s: handle-sync", fh->handle->name));
+
+ handle = fh->handle;
+ if (block)
+ return (handle->sync == NULL ? 0 :
+ handle->sync(handle, (WT_SESSION *)session));
+ else
+ return (handle->sync_nowait == NULL ? 0 :
+ handle->sync_nowait(handle, (WT_SESSION *)session));
}
/*
@@ -29,14 +35,34 @@ static inline int
__wt_fallocate(
WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t offset, wt_off_t len)
{
+ WT_DECL_RET;
+ WT_FILE_HANDLE *handle;
+
WT_ASSERT(session, !F_ISSET(S2C(session), WT_CONN_READONLY));
WT_ASSERT(session, !F_ISSET(S2C(session), WT_CONN_IN_MEMORY));
WT_RET(__wt_verbose(session, WT_VERB_HANDLEOPS,
"%s: handle-allocate: %" PRIuMAX " at %" PRIuMAX,
- fh->name, (uintmax_t)len, (uintmax_t)offset));
-
- return (fh->fh_allocate(session, fh, offset, len));
+ fh->handle->name, (uintmax_t)len, (uintmax_t)offset));
+
+ /*
+ * Our caller is responsible for handling any locking issues, all we
+ * have to do is find a function to call.
+ *
+ * Be cautious: the underlying system might have configured the nolock
+ * flavor and that call can fail, in which case we fall back to the
+ * locking flavor.
+ */
+ handle = fh->handle;
+ if (handle->fallocate_nolock != NULL) {
+ if ((ret = handle->fallocate_nolock(
+ handle, (WT_SESSION *)session, offset, len)) == 0)
+ return (0);
+ WT_RET_ERROR_OK(ret, ENOTSUP);
+ }
+ if (handle->fallocate != NULL)
+ return (handle->fallocate(
+ handle, (WT_SESSION *)session, offset, len));
+ return (ENOTSUP);
}
/*
@@ -46,10 +72,14 @@ __wt_fallocate(
static inline int
__wt_file_lock(WT_SESSION_IMPL * session, WT_FH *fh, bool lock)
{
+ WT_FILE_HANDLE *handle;
+
WT_RET(__wt_verbose(session, WT_VERB_HANDLEOPS,
- "%s: handle-lock: %s", fh->name, lock ? "lock" : "unlock"));
+ "%s: handle-lock: %s", fh->handle->name, lock ? "lock" : "unlock"));
- return (fh->fh_lock(session, fh, lock));
+ handle = fh->handle;
+ return (handle->lock == NULL ? 0 :
+ handle->lock(handle, (WT_SESSION *)session, lock));
}
/*
@@ -62,11 +92,12 @@ __wt_read(
{
WT_RET(__wt_verbose(session, WT_VERB_HANDLEOPS,
"%s: handle-read: %" WT_SIZET_FMT " at %" PRIuMAX,
- fh->name, len, (uintmax_t)offset));
+ fh->handle->name, len, (uintmax_t)offset));
WT_STAT_FAST_CONN_INCR(session, read_io);
- return (fh->fh_read(session, fh, offset, len, buf));
+ return (fh->handle->read(
+ fh->handle, (WT_SESSION *)session, offset, len, buf));
}
/*
@@ -77,22 +108,9 @@ static inline int
__wt_filesize(WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t *sizep)
{
WT_RET(__wt_verbose(
- session, WT_VERB_HANDLEOPS, "%s: handle-size", fh->name));
-
- return (fh->fh_size(session, fh, sizep));
-}
-
-/*
- * __wt_fsync --
- * POSIX fsync.
- */
-static inline int
-__wt_fsync(WT_SESSION_IMPL *session, WT_FH *fh, bool block)
-{
- WT_RET(__wt_verbose(
- session, WT_VERB_HANDLEOPS, "%s: handle-sync", fh->name));
+ session, WT_VERB_HANDLEOPS, "%s: handle-size", fh->handle->name));
- return (fh->fh_sync(session, fh, block));
+ return (fh->handle->size(fh->handle, (WT_SESSION *)session, sizep));
}
/*
@@ -105,9 +123,10 @@ __wt_ftruncate(WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t len)
WT_ASSERT(session, !F_ISSET(S2C(session), WT_CONN_READONLY));
WT_RET(__wt_verbose(session, WT_VERB_HANDLEOPS,
- "%s: handle-truncate: %" PRIuMAX, fh->name, (uintmax_t)len));
+ "%s: handle-truncate: %" PRIuMAX,
+ fh->handle->name, (uintmax_t)len));
- return (fh->fh_truncate(session, fh, len));
+ return (fh->handle->truncate(fh->handle, (WT_SESSION *)session, len));
}
/*
@@ -124,9 +143,10 @@ __wt_write(WT_SESSION_IMPL *session,
WT_RET(__wt_verbose(session, WT_VERB_HANDLEOPS,
"%s: handle-write: %" WT_SIZET_FMT " at %" PRIuMAX,
- fh->name, len, (uintmax_t)offset));
+ fh->handle->name, len, (uintmax_t)offset));
WT_STAT_FAST_CONN_INCR(session, write_io);
- return (fh->fh_write(session, fh, offset, len, buf));
+ return (fh->handle->write(
+ fh->handle, (WT_SESSION *)session, offset, len, buf));
}
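The __wt_fallocate wrapper in the os_fhandle.i hunk above treats both allocation methods as optional: it tries the nolock flavor first, falls through to the locking flavor only when the first reports ENOTSUP, and returns ENOTSUP if neither method is set. A minimal sketch of that dispatch, using a hypothetical DEMO_HANDLE rather than the real WT_FILE_HANDLE, could look like this:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in handle with two optional methods; not the real WT_FILE_HANDLE. */
typedef struct demo_handle DEMO_HANDLE;
struct demo_handle {
    int (*fallocate)(DEMO_HANDLE *, long offset, long len);
    int (*fallocate_nolock)(DEMO_HANDLE *, long offset, long len);
};

/* Try the nolock flavor, then the locking flavor, else report ENOTSUP. */
static int
demo_fallocate(DEMO_HANDLE *h, long offset, long len)
{
    int ret;

    if (h->fallocate_nolock != NULL) {
        if ((ret = h->fallocate_nolock(h, offset, len)) == 0)
            return (0);
        if (ret != ENOTSUP)     /* Any other error is returned as-is. */
            return (ret);
    }
    if (h->fallocate != NULL)
        return (h->fallocate(h, offset, len));
    return (ENOTSUP);
}

/* Hypothetical method implementations used to exercise the dispatch. */
static int
demo_ok(DEMO_HANDLE *h, long offset, long len)
{
    (void)h; (void)offset; (void)len;
    return (0);
}

static int
demo_unsupported(DEMO_HANDLE *h, long offset, long len)
{
    (void)h; (void)offset; (void)len;
    return (ENOTSUP);
}

static int
demo_eio(DEMO_HANDLE *h, long offset, long len)
{
    (void)h; (void)offset; (void)len;
    return (EIO);
}
```

As the comment in the diff notes, the caller remains responsible for any locking; the wrapper only selects which method to call.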
diff --git a/src/include/os_fs.i b/src/include/os_fs.i
index 8f3920ffdb2..b7389a39e06 100644
--- a/src/include/os_fs.i
+++ b/src/include/os_fs.i
@@ -7,89 +7,238 @@
*/
/*
- * __wt_dirlist --
+ * __wt_fs_directory_list --
* Get a list of files from a directory.
*/
static inline int
-__wt_dirlist(WT_SESSION_IMPL *session, const char *dir,
- const char *prefix, uint32_t flags, char ***dirlist, u_int *countp)
+__wt_fs_directory_list(WT_SESSION_IMPL *session,
+ const char *dir, const char *prefix, char ***dirlistp, u_int *countp)
{
- WT_ASSERT(session, !F_ISSET(S2C(session), WT_CONN_IN_MEMORY));
+ WT_DECL_RET;
+ WT_FILE_SYSTEM *file_system;
+ WT_SESSION *wt_session;
+ char *path;
+
+ *dirlistp = NULL;
+ *countp = 0;
WT_RET(__wt_verbose(session, WT_VERB_FILEOPS,
"%s: directory-list: %s prefix %s",
- dir, LF_ISSET(WT_DIRLIST_INCLUDE) ? "include" : "exclude",
- prefix == NULL ? "all" : prefix));
+ dir, prefix == NULL ? "all" : prefix));
+
+ WT_RET(__wt_filename(session, dir, &path));
+
+ file_system = S2C(session)->file_system;
+ wt_session = (WT_SESSION *)session;
+ ret = file_system->directory_list(
+ file_system, wt_session, path, prefix, dirlistp, countp);
- return (S2C(session)->file_directory_list(
- session, dir, prefix, flags, dirlist, countp));
+ __wt_free(session, path);
+ return (ret);
}
/*
- * __wt_directory_sync --
+ * __wt_fs_directory_list_free --
+ * Free memory allocated by __wt_fs_directory_list.
+ */
+static inline int
+__wt_fs_directory_list_free(
+ WT_SESSION_IMPL *session, char ***dirlistp, u_int *countp)
+{
+ WT_DECL_RET;
+ WT_FILE_SYSTEM *file_system;
+ WT_SESSION *wt_session;
+
+ if (*dirlistp != NULL) {
+ file_system = S2C(session)->file_system;
+ wt_session = (WT_SESSION *)session;
+ ret = file_system->directory_list_free(
+ file_system, wt_session, *dirlistp, *countp);
+ }
+
+ *dirlistp = NULL;
+ *countp = 0;
+ return (ret);
+}
+
+/*
+ * __wt_fs_directory_sync --
* Flush a directory to ensure file creation is durable.
*/
static inline int
-__wt_directory_sync(WT_SESSION_IMPL *session, const char *name)
+__wt_fs_directory_sync(WT_SESSION_IMPL *session, const char *name)
{
+ WT_DECL_RET;
+ WT_FILE_SYSTEM *file_system;
+ WT_SESSION *wt_session;
+ char *copy, *dir;
+
WT_ASSERT(session, !F_ISSET(S2C(session), WT_CONN_READONLY));
WT_RET(__wt_verbose(
session, WT_VERB_FILEOPS, "%s: directory-sync", name));
- return (S2C(session)->file_directory_sync(session, name));
+ /*
+ * POSIX 1003.1 does not require that fsync of a file handle ensures the
+ * entry in the directory containing the file has also reached disk (and
+ * there are historic Linux filesystems requiring it). If the underlying
+ * filesystem method is set, do an explicit fsync on a file descriptor
+ * for the directory to be sure.
+ *
+ * directory-sync is not a required call, no method means the call isn't
+ * needed.
+ */
+ file_system = S2C(session)->file_system;
+ if (file_system->directory_sync == NULL)
+ return (0);
+
+ copy = NULL;
+ if (name == NULL || strchr(name, '/') == NULL)
+ name = S2C(session)->home;
+ else {
+ /*
+ * File name construction should not return a path without any
+ * slash separator, but caution isn't unreasonable.
+ */
+ WT_RET(__wt_filename(session, name, &copy));
+ if ((dir = strrchr(copy, '/')) == NULL)
+ name = S2C(session)->home;
+ else {
+ dir[1] = '\0';
+ name = copy;
+ }
+ }
+
+ wt_session = (WT_SESSION *)session;
+ ret = file_system->directory_sync(file_system, wt_session, name);
+
+ __wt_free(session, copy);
+ return (ret);
}
/*
- * __wt_exist --
+ * __wt_fs_exist --
* Return if the file exists.
*/
static inline int
-__wt_exist(WT_SESSION_IMPL *session, const char *name, bool *existp)
+__wt_fs_exist(WT_SESSION_IMPL *session, const char *name, bool *existp)
{
+ WT_DECL_RET;
+ WT_FILE_SYSTEM *file_system;
+ WT_SESSION *wt_session;
+ char *path;
+
WT_RET(__wt_verbose(session, WT_VERB_FILEOPS, "%s: file-exist", name));
- return (S2C(session)->file_exist(session, name, existp));
+ WT_RET(__wt_filename(session, name, &path));
+
+ file_system = S2C(session)->file_system;
+ wt_session = (WT_SESSION *)session;
+ ret = file_system->exist(file_system, wt_session, path, existp);
+
+ __wt_free(session, path);
+ return (ret);
}
/*
- * __wt_remove --
+ * __wt_fs_remove --
* POSIX remove.
*/
static inline int
-__wt_remove(WT_SESSION_IMPL *session, const char *name)
+__wt_fs_remove(WT_SESSION_IMPL *session, const char *name)
{
+ WT_DECL_RET;
+ WT_FILE_SYSTEM *file_system;
+ WT_SESSION *wt_session;
+ char *path;
+
WT_ASSERT(session, !F_ISSET(S2C(session), WT_CONN_READONLY));
WT_RET(__wt_verbose(session, WT_VERB_FILEOPS, "%s: file-remove", name));
- return (S2C(session)->file_remove(session, name));
+#ifdef HAVE_DIAGNOSTIC
+ /*
+ * It is a layering violation to retrieve a WT_FH here, but it is a
+ * useful diagnostic to ensure WiredTiger doesn't have the handle open.
+ */
+ if (__wt_handle_is_open(session, name))
+ WT_RET_MSG(session, EINVAL,
+ "%s: file-remove: file has open handles", name);
+#endif
+
+ WT_RET(__wt_filename(session, name, &path));
+
+ file_system = S2C(session)->file_system;
+ wt_session = (WT_SESSION *)session;
+ ret = file_system->remove(file_system, wt_session, path);
+
+ __wt_free(session, path);
+ return (ret);
}
/*
- * __wt_rename --
+ * __wt_fs_rename --
* POSIX rename.
*/
static inline int
-__wt_rename(WT_SESSION_IMPL *session, const char *from, const char *to)
+__wt_fs_rename(WT_SESSION_IMPL *session, const char *from, const char *to)
{
+ WT_DECL_RET;
+ WT_FILE_SYSTEM *file_system;
+ WT_SESSION *wt_session;
+ char *from_path, *to_path;
+
WT_ASSERT(session, !F_ISSET(S2C(session), WT_CONN_READONLY));
WT_RET(__wt_verbose(
session, WT_VERB_FILEOPS, "%s to %s: file-rename", from, to));
- return (S2C(session)->file_rename(session, from, to));
+#ifdef HAVE_DIAGNOSTIC
+ /*
+ * It is a layering violation to retrieve a WT_FH here, but it is a
+ * useful diagnostic to ensure WiredTiger doesn't have the handle open.
+ */
+ if (__wt_handle_is_open(session, from))
+ WT_RET_MSG(session, EINVAL,
+ "%s: file-rename: file has open handles", from);
+ if (__wt_handle_is_open(session, to))
+ WT_RET_MSG(session, EINVAL,
+ "%s: file-rename: file has open handles", to);
+#endif
+
+ from_path = to_path = NULL;
+ WT_ERR(__wt_filename(session, from, &from_path));
+ WT_ERR(__wt_filename(session, to, &to_path));
+
+ file_system = S2C(session)->file_system;
+ wt_session = (WT_SESSION *)session;
+ ret = file_system->rename(file_system, wt_session, from_path, to_path);
+
+err: __wt_free(session, from_path);
+ __wt_free(session, to_path);
+ return (ret);
}
/*
- * __wt_filesize_name --
+ * __wt_fs_size --
* Get the size of a file in bytes, by file name.
*/
static inline int
-__wt_filesize_name(
- WT_SESSION_IMPL *session, const char *name, bool silent, wt_off_t *sizep)
+__wt_fs_size(WT_SESSION_IMPL *session, const char *name, wt_off_t *sizep)
{
+ WT_DECL_RET;
+ WT_FILE_SYSTEM *file_system;
+ WT_SESSION *wt_session;
+ char *path;
+
WT_RET(__wt_verbose(session, WT_VERB_FILEOPS, "%s: file-size", name));
- return (S2C(session)->file_size(session, name, silent, sizep));
+ WT_RET(__wt_filename(session, name, &path));
+
+ file_system = S2C(session)->file_system;
+ wt_session = (WT_SESSION *)session;
+ ret = file_system->size(file_system, wt_session, path, sizep);
+
+ __wt_free(session, path);
+ return (ret);
}
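__wt_fs_directory_sync in the os_fs.i hunk above derives the directory to flush from a file name: it cuts the path just after the last '/' and falls back to the database home when there is no separator. The sketch below isolates that step; demo_sync_dir is a hypothetical helper, and it deliberately omits the __wt_filename call that the real function uses to build the full path first.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a newly allocated copy of the directory part of path, keeping the
 * trailing slash, or a copy of home when path is NULL or has no slash.
 * The caller frees the result; NULL means the allocation failed.
 */
static char *
demo_sync_dir(const char *home, const char *path)
{
    const char *slash, *src;
    char *dir;
    size_t len;

    if (path != NULL && (slash = strrchr(path, '/')) != NULL) {
        src = path;
        len = (size_t)(slash - path) + 1;   /* Keep the slash. */
    } else {
        src = home;
        len = strlen(home);
    }
    if ((dir = malloc(len + 1)) == NULL)
        return (NULL);
    memcpy(dir, src, len);
    dir[len] = '\0';
    return (dir);
}
```

Calling demo_sync_dir("/db", "/db/journal/log.1") yields "/db/journal/", while a bare name such as "table.wt" yields the home directory "/db".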
diff --git a/src/include/wiredtiger.in b/src/include/wiredtiger.in
index 87f7ed276e2..6bb1bf418e9 100644
--- a/src/include/wiredtiger.in
+++ b/src/include/wiredtiger.in
@@ -71,6 +71,8 @@ struct __wt_encryptor; typedef struct __wt_encryptor WT_ENCRYPTOR;
struct __wt_event_handler; typedef struct __wt_event_handler WT_EVENT_HANDLER;
struct __wt_extension_api; typedef struct __wt_extension_api WT_EXTENSION_API;
struct __wt_extractor; typedef struct __wt_extractor WT_EXTRACTOR;
+struct __wt_file_handle; typedef struct __wt_file_handle WT_FILE_HANDLE;
+struct __wt_file_system; typedef struct __wt_file_system WT_FILE_SYSTEM;
struct __wt_item; typedef struct __wt_item WT_ITEM;
struct __wt_session; typedef struct __wt_session WT_SESSION;
@@ -2034,6 +2036,10 @@ struct __wt_connection {
* @configstart{WT_CONNECTION.load_extension, see dist/api_data.py}
* @config{config, configuration string passed to the entry point of the
* extension as its WT_CONFIG_ARG argument., a string; default empty.}
+ * @config{early_load, whether this extension should be loaded at the
+ * beginning of ::wiredtiger_open. Only applicable to extensions loaded
+ * via the wiredtiger_open configuration string., a boolean flag;
+ * default \c false.}
* @config{entry, the entry point of the extension\, called to
* initialize the extension when it is loaded. The signature of the
* function must match ::wiredtiger_extension_init., a string; default
@@ -2145,6 +2151,23 @@ struct __wt_connection {
WT_EXTRACTOR *extractor, const char *config);
/*!
+ * Configure a custom file system.
+ *
+ * This method can only be called from an early loaded extension
+ * module. The application must first implement the WT_FILE_SYSTEM
+ * interface and then register the implementation with WiredTiger:
+ *
+ * @snippet ex_file_system.c WT_FILE_SYSTEM register
+ *
+ * @param connection the connection handle
+ * @param fs the populated file system structure
+ * @configempty{WT_CONNECTION.set_file_system, see dist/api_data.py}
+ * @errors
+ */
+ int __F(set_file_system)(
+ WT_CONNECTION *connection, WT_FILE_SYSTEM *fs, const char *config);
+
+ /*!
* Return a reference to the WiredTiger extension functions.
*
* @snippet ex_data_source.c WT_EXTENSION_API declaration
@@ -3056,7 +3079,7 @@ const char *wiredtiger_version(int *majorp, int *minorp, int *patchp);
/*******************************************
* Forward structure declarations for the extension API
*******************************************/
-struct __wt_config_arg; typedef struct __wt_config_arg WT_CONFIG_ARG;
+struct __wt_config_arg; typedef struct __wt_config_arg WT_CONFIG_ARG;
/*!
* The interface implemented by applications to provide custom ordering of
@@ -3587,7 +3610,7 @@ struct __wt_encryptor {
* number of bytes needed.
*
* @param[out] expansion_constantp the additional number of bytes needed
- * when encrypting.
+ * when encrypting.
* @returns zero for success, non-zero to indicate an error.
*
* @snippet nop_encrypt.c WT_ENCRYPTOR sizing
@@ -3606,8 +3629,7 @@ struct __wt_encryptor {
* is used instead of this one for any callbacks.
*
* @param[in] encrypt_config the "encryption" portion of the
- * configuration from the wiredtiger_open or
- * WT_SESSION::create call
+ * configuration from the wiredtiger_open or WT_SESSION::create call
* @param[out] customp the new modified encryptor, or NULL.
* @returns zero for success, non-zero to indicate an error.
*/
@@ -3682,6 +3704,466 @@ struct __wt_extractor {
int (*terminate)(WT_EXTRACTOR *extractor, WT_SESSION *session);
};
+#if !defined(SWIG)
+/*! WT_FILE_SYSTEM::open_file file types */
+typedef enum {
+ WT_OPEN_FILE_TYPE_CHECKPOINT, /*!< open a data file checkpoint */
+ WT_OPEN_FILE_TYPE_DATA, /*!< open a data file */
+ WT_OPEN_FILE_TYPE_DIRECTORY, /*!< open a directory */
+ WT_OPEN_FILE_TYPE_LOG, /*!< open a log file */
+ WT_OPEN_FILE_TYPE_REGULAR /*!< open a regular file */
+} WT_OPEN_FILE_TYPE;
+
+/*! WT_FILE_SYSTEM::open_file flags: create if does not exist */
+#define WT_OPEN_CREATE 0x001
+/*! WT_FILE_SYSTEM::open_file flags: direct I/O requested */
+#define WT_OPEN_DIRECTIO 0x002
+/*! WT_FILE_SYSTEM::open_file flags: error if exclusive use not available */
+#define WT_OPEN_EXCLUSIVE 0x004
+#ifndef DOXYGEN
+#define WT_OPEN_FIXED 0x008 /* Path not home relative (internal) */
+#endif
+/*! WT_FILE_SYSTEM::open_file flags: open is read-only */
+#define WT_OPEN_READONLY 0x010
+
+/*!
+ * The interface implemented by applications to provide a custom file system
+ * implementation.
+ *
+ * <b>Thread safety:</b> WiredTiger may invoke methods on the WT_FILE_SYSTEM
+ * interface from multiple threads concurrently. It is the responsibility of
+ * the implementation to protect any shared data.
+ *
+ * Applications register implementations with WiredTiger by calling
+ * WT_CONNECTION::set_file_system. See @ref custom_file_systems for more
+ * information.
+ *
+ * @snippet ex_file_system.c WT_FILE_SYSTEM register
+ */
+struct __wt_file_system {
+ /*!
+ * Return a list of file names for the named directory.
+ *
+ * @errors
+ *
+ * @param file_system the WT_FILE_SYSTEM
+ * @param session the current WiredTiger session
+ * @param directory the name of the directory
+ * @param prefix if not NULL, only files with names matching the prefix
+ * are returned
+ * @param[out] dirlist the method returns an allocated array of
+ * individually allocated strings, one for each entry in the
+ * directory.
+ * @param[out] countp the number of entries returned
+ */
+ int (*directory_list)(WT_FILE_SYSTEM *file_system, WT_SESSION *session,
+ const char *directory, const char *prefix, char ***dirlist,
+ uint32_t *countp);
+
+ /*!
+ * Free memory allocated by WT_FILE_SYSTEM::directory_list.
+ *
+ * @errors
+ *
+ * @param file_system the WT_FILE_SYSTEM
+ * @param session the current WiredTiger session
+ * @param dirlist array returned by WT_FILE_SYSTEM::directory_list
+ * @param count count returned by WT_FILE_SYSTEM::directory_list
+ */
+ int (*directory_list_free)(WT_FILE_SYSTEM *file_system,
+ WT_SESSION *session, char **dirlist, uint32_t count);
+
+ /*!
+ * Flush the named directory.
+ *
+ * This method is not required for readonly file systems or file systems
+ * where it is not necessary to flush a file's directory to ensure the
+ * durability of file system operations, and should be set to NULL when
+ * not required by the file system.
+ *
+ * @errors
+ *
+ * @param file_system the WT_FILE_SYSTEM
+ * @param session the current WiredTiger session
+ * @param directory the name of the directory
+ */
+ int (*directory_sync)(WT_FILE_SYSTEM *file_system,
+ WT_SESSION *session, const char *directory);
+
+ /*!
+ * Return if the named file system object exists.
+ *
+ * @errors
+ *
+ * @param file_system the WT_FILE_SYSTEM
+ * @param session the current WiredTiger session
+ * @param name the name of the file
+ * @param[out] existp If the named file system object exists
+ */
+ int (*exist)(WT_FILE_SYSTEM *file_system,
+ WT_SESSION *session, const char *name, bool *existp);
+
+ /*!
+ * Open a handle for a named file system object.
+ *
+ * @errors
+ *
+ * @param file_system the WT_FILE_SYSTEM
+ * @param session the current WiredTiger session
+ * @param name the name of the file system object
+ * @param file_type the type of the file
+ * The file type is provided to allow optimization for different file
+ * access patterns.
+ * @param flags flags indicating how to open the file, one or more of
+ * ::WT_OPEN_CREATE, ::WT_OPEN_DIRECTIO, ::WT_OPEN_EXCLUSIVE or
+ * ::WT_OPEN_READONLY.
+ * @param[out] file_handlep the handle to the newly opened file. File
+ * system implementations must allocate memory for the handle and
+ * the WT_FILE_HANDLE::name field, and fill in the WT_FILE_HANDLE::
+ * fields. Applications wanting to associate private information
+ * with the WT_FILE_HANDLE:: structure should declare and allocate
+ * their own structure as a superset of a WT_FILE_HANDLE:: structure.
+ */
+ int (*open_file)(WT_FILE_SYSTEM *file_system, WT_SESSION *session,
+ const char *name, WT_OPEN_FILE_TYPE file_type, uint32_t flags,
+ WT_FILE_HANDLE **file_handlep);
+
+ /*!
+ * Remove a named file system object.
+ *
+ * This method is not required for readonly file systems and should be
+ * set to NULL when not required by the file system.
+ *
+ * @errors
+ *
+ * @param file_system the WT_FILE_SYSTEM
+ * @param session the current WiredTiger session
+ * @param name the name of the file system object
+ */
+ int (*remove)(
+ WT_FILE_SYSTEM *file_system, WT_SESSION *session, const char *name);
+
+ /*!
+ * Rename a named file system object.
+ *
+ * This method is not required for readonly file systems and should be
+ * set to NULL when not required by the file system.
+ *
+ * @errors
+ *
+ * @param file_system the WT_FILE_SYSTEM
+ * @param session the current WiredTiger session
+ * @param from the original name of the object
+ * @param to the new name for the object
+ */
+ int (*rename)(WT_FILE_SYSTEM *file_system,
+ WT_SESSION *session, const char *from, const char *to);
+
+ /*!
+ * Return the size of a named file system object.
+ *
+ * @errors
+ *
+ * @param file_system the WT_FILE_SYSTEM
+ * @param session the current WiredTiger session
+ * @param name the name of the file system object
+ * @param[out] sizep the size of the file system entry
+ */
+ int (*size)(WT_FILE_SYSTEM *file_system,
+ WT_SESSION *session, const char *name, wt_off_t *sizep);
+
+ /*!
+ * A callback performed when the file system is closed and will no
+ * longer be accessed by the WiredTiger database.
+ *
+ * This method is not required and should be set to NULL when not
+ * required by the file system.
+ *
+ * The WT_FILE_SYSTEM::terminate callback is intended to allow cleanup,
+ * the handle will not be subsequently accessed by WiredTiger.
+ */
+ int (*terminate)(WT_FILE_SYSTEM *file_system, WT_SESSION *session);
+};
+
+/*! WT_FILE_HANDLE::fadvise flags: no longer need */
+#define WT_FILE_HANDLE_DONTNEED 1
+/*! WT_FILE_HANDLE::fadvise flags: will need */
+#define WT_FILE_HANDLE_WILLNEED 2
+
+/*!
+ * A file handle implementation returned by WT_FILE_SYSTEM::open_file.
+ *
+ * <b>Thread safety:</b> Unless explicitly stated otherwise, WiredTiger may
+ * invoke methods on the WT_FILE_HANDLE interface from multiple threads
+ * concurrently. It is the responsibility of the implementation to protect
+ * any shared data.
+ *
+ * See @ref custom_file_systems for more information.
+ */
+struct __wt_file_handle {
+ /*!
+ * The enclosing file system, set by WT_FILE_SYSTEM::open_file.
+ */
+ WT_FILE_SYSTEM *file_system;
+
+ /*!
+ * The name of the file, set by WT_FILE_SYSTEM::open_file.
+ */
+ char *name;
+
+ /*!
+ * Close a file handle, the handle will not be further accessed by
+ * WiredTiger.
+ *
+ * @errors
+ *
+ * @param file_handle the WT_FILE_HANDLE
+ * @param session the current WiredTiger session
+ */
+ int (*close)(WT_FILE_HANDLE *file_handle, WT_SESSION *session);
+
+ /*!
+ * Indicate expected future use of file ranges, based on the POSIX
+ * 1003.1 standard fadvise.
+ *
+ * This method is not required, and should be set to NULL when not
+ * supported by the file.
+ *
+ * @errors
+ *
+ * @param file_handle the WT_FILE_HANDLE
+ * @param session the current WiredTiger session
+ * @param offset the file offset
+ * @param len the size of the advisory
+ * @param advice one of ::WT_FILE_HANDLE_WILLNEED or
+ * ::WT_FILE_HANDLE_DONTNEED.
+ */
+ int (*fadvise)(WT_FILE_HANDLE *file_handle,
+ WT_SESSION *session, wt_off_t offset, wt_off_t len, int advice);
+
+ /*!
+ * Ensure disk space is allocated for the file, based on the POSIX
+ * 1003.1 standard fallocate.
+ *
+ * This method is not required, and should be set to NULL when not
+ * supported by the file.
+ *
+ * This method is not called by multiple threads concurrently (on the
+ * same file handle). If the file handle's fallocate method supports
+ * concurrent calls, set the WT_FILE_HANDLE::fallocate_nolock method
+ * instead.
+ *
+ * @errors
+ *
+ * @param file_handle the WT_FILE_HANDLE
+ * @param session the current WiredTiger session
+ * @param offset the file offset
+ * @param len the size of the allocation
+ */
+ int (*fallocate)(WT_FILE_HANDLE *file_handle,
+ WT_SESSION *session, wt_off_t offset, wt_off_t len);
+
+ /*!
+ * Ensure disk space is allocated for the file, based on the POSIX
+ * 1003.1 standard fallocate.
+ *
+ * This method is not required, and should be set to NULL when not
+ * supported by the file.
+ *
+ * This method may be called by multiple threads concurrently (on the
+ * same file handle). If the file handle's fallocate method does not
+ * support concurrent calls, set the WT_FILE_HANDLE::fallocate method
+ * instead.
+ *
+ * @errors
+ *
+ * @param file_handle the WT_FILE_HANDLE
+ * @param session the current WiredTiger session
+ * @param offset the file offset
+ * @param len the length of the range to allocate
+ */
+ int (*fallocate_nolock)(WT_FILE_HANDLE *file_handle,
+ WT_SESSION *session, wt_off_t offset, wt_off_t len);
+
+ /*!
+ * Lock/unlock a file from the perspective of other processes running
+ * in the system.
+ *
+ * @errors
+ *
+ * @param file_handle the WT_FILE_HANDLE
+ * @param session the current WiredTiger session
+ * @param lock whether to lock or unlock
+ */
+ int (*lock)(
+ WT_FILE_HANDLE *file_handle, WT_SESSION *session, bool lock);
+
+ /*!
+ * Map a file into memory, based on the POSIX 1003.1 standard mmap.
+ *
+ * This method is not required, and should be set to NULL when not
+ * supported by the file.
+ *
+ * @errors
+ *
+ * @param file_handle the WT_FILE_HANDLE
+ * @param session the current WiredTiger session
+ * @param[out] mapped_regionp a reference to a memory location into
+ * which should be stored a pointer to the start of the mapped region
+ * @param[out] lengthp a reference to a memory location into which
+ * should be stored the length of the region
+ * @param[out] mapped_cookiep a reference to a memory location into
+ * which can be optionally stored a pointer to an opaque cookie
+ * which is subsequently passed to WT_FILE_HANDLE::unmap.
+ */
+ int (*map)(WT_FILE_HANDLE *file_handle, WT_SESSION *session,
+ void *mapped_regionp, size_t *lengthp, void *mapped_cookiep);
+
+ /*!
+ * Discard part of a memory mapped file, based on the POSIX 1003.1
+ * standard madvise.
+ *
+ * This method is not required, and should be set to NULL when not
+ * supported by the file.
+ *
+ * @errors
+ *
+ * @param file_handle the WT_FILE_HANDLE
+ * @param session the current WiredTiger session
+ * @param map a location in the mapped region unlikely to be used in the
+ * near future
+ * @param length the length of the mapped region to discard
+ * @param mapped_cookie any cookie set by the WT_FILE_HANDLE::map method
+ */
+ int (*map_discard)(WT_FILE_HANDLE *file_handle,
+ WT_SESSION *session, void *map, size_t length, void *mapped_cookie);
+
+ /*!
+ * Preload part of a memory mapped file, based on the POSIX 1003.1
+ * standard madvise.
+ *
+ * This method is not required, and should be set to NULL when not
+ * supported by the file.
+ *
+ * @errors
+ *
+ * @param file_handle the WT_FILE_HANDLE
+ * @param session the current WiredTiger session
+ * @param map a location in the mapped region likely to be used in the
+ * near future
+ * @param length the size of the mapped region to preload
+ * @param mapped_cookie any cookie set by the WT_FILE_HANDLE::map method
+ */
+ int (*map_preload)(WT_FILE_HANDLE *file_handle, WT_SESSION *session,
+ const void *map, size_t length, void *mapped_cookie);
+
+ /*!
+ * Unmap a memory mapped file, based on the POSIX 1003.1 standard
+ * munmap.
+ *
+ * This method is only required if a valid implementation of map is
+ * provided by the file, and should be set to NULL otherwise.
+ *
+ * @errors
+ *
+ * @param file_handle the WT_FILE_HANDLE
+ * @param session the current WiredTiger session
+ * @param mapped_region a pointer to the start of the mapped region
+ * @param length the length of the mapped region
+ * @param mapped_cookie any cookie set by the WT_FILE_HANDLE::map method
+ */
+ int (*unmap)(WT_FILE_HANDLE *file_handle, WT_SESSION *session,
+ void *mapped_region, size_t length, void *mapped_cookie);
+
+ /*!
+ * Read from a file, based on the POSIX 1003.1 standard pread.
+ *
+ * @errors
+ *
+ * @param file_handle the WT_FILE_HANDLE
+ * @param session the current WiredTiger session
+ * @param offset the offset in the file to start reading from
+ * @param len the amount to read
+ * @param[out] buf buffer to hold the content read from file
+ */
+ int (*read)(WT_FILE_HANDLE *file_handle,
+ WT_SESSION *session, wt_off_t offset, size_t len, void *buf);
+
+ /*!
+ * Return the size of a file.
+ *
+ * @errors
+ *
+ * @param file_handle the WT_FILE_HANDLE
+ * @param session the current WiredTiger session
+ * @param[out] sizep the size of the file
+ */
+ int (*size)(
+ WT_FILE_HANDLE *file_handle, WT_SESSION *session, wt_off_t *sizep);
+
+ /*!
+ * Make outstanding file writes durable and do not return until writes
+ * are complete.
+ *
+ * This method is not required for read-only files, and should be set
+ * to NULL when not supported by the file.
+ *
+ * @errors
+ *
+ * @param file_handle the WT_FILE_HANDLE
+ * @param session the current WiredTiger session
+ */
+ int (*sync)(WT_FILE_HANDLE *file_handle, WT_SESSION *session);
+
+ /*!
+ * Schedule the outstanding file writes required for durability and
+ * return immediately.
+ *
+ * This method is not required, and should be set to NULL when not
+ * supported by the file.
+ *
+ * @errors
+ *
+ * @param file_handle the WT_FILE_HANDLE
+ * @param session the current WiredTiger session
+ */
+ int (*sync_nowait)(WT_FILE_HANDLE *file_handle, WT_SESSION *session);
+
+ /*!
+ * Lengthen or shorten a file to the specified length, based on the
+ * POSIX 1003.1 standard ftruncate.
+ *
+ * This method is not required for read-only files, and should be set
+ * to NULL when not supported by the file.
+ *
+ * @errors
+ *
+ * @param file_handle the WT_FILE_HANDLE
+ * @param session the current WiredTiger session
+ * @param length desired file size after truncate
+ */
+ int (*truncate)(
+ WT_FILE_HANDLE *file_handle, WT_SESSION *session, wt_off_t length);
+
+ /*!
+ * Write to a file, based on the POSIX 1003.1 standard pwrite.
+ *
+ * This method is not required for read-only files, and should be set
+ * to NULL when not supported by the file.
+ *
+ * @errors
+ *
+ * @param file_handle the WT_FILE_HANDLE
+ * @param session the current WiredTiger session
+ * @param offset offset at which to start writing
+ * @param length amount of data to write
+ * @param buf content to be written to the file
+ */
+ int (*write)(WT_FILE_HANDLE *file_handle, WT_SESSION *session,
+ wt_off_t offset, size_t length, const void *buf);
+};
+#endif /* !defined(SWIG) */
+
/*!
* Entry point to an extension, called when the extension is loaded.
*
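Editorial note: the WT_FILE_HANDLE structure above is a table of function pointers that a custom file system fills in when it opens a file. The sketch below shows the general shape of such an implementation using a hypothetical, much-simplified stand-in (`DEMO_FILE_HANDLE`, with no session argument and only a few methods); none of the `demo_` names come from WiredTiger, and `int64_t` is assumed as a substitute for `wt_off_t`. It is meant to illustrate the pattern of embedding the interface as the first member of a private structure, not to be a working WiredTiger extension.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for WT_FILE_HANDLE. */
typedef struct demo_file_handle DEMO_FILE_HANDLE;
struct demo_file_handle {
	int (*close)(DEMO_FILE_HANDLE *);
	int (*read)(DEMO_FILE_HANDLE *, int64_t, size_t, void *);
	int (*size)(DEMO_FILE_HANDLE *, int64_t *);
	int (*sync)(DEMO_FILE_HANDLE *);	/* optional: left NULL here */
	int (*write)(DEMO_FILE_HANDLE *, int64_t, size_t, const void *);
};

/* Private handle: the interface is the first member, so casts are safe. */
typedef struct {
	DEMO_FILE_HANDLE iface;
	char *data;
	size_t len;
} DEMO_MEM_HANDLE;

static int
demo_mem_close(DEMO_FILE_HANDLE *fh)
{
	DEMO_MEM_HANDLE *mh = (DEMO_MEM_HANDLE *)fh;

	free(mh->data);
	free(mh);
	return (0);
}

static int
demo_mem_read(DEMO_FILE_HANDLE *fh, int64_t offset, size_t len, void *buf)
{
	DEMO_MEM_HANDLE *mh = (DEMO_MEM_HANDLE *)fh;

	assert(buf != NULL);
	/* Reads past the end of the buffer are rejected, not shortened. */
	if (offset < 0 || (size_t)offset > mh->len ||
	    len > mh->len - (size_t)offset)
		return (EINVAL);
	if (len != 0)
		memcpy(buf, mh->data + offset, len);
	return (0);
}

static int
demo_mem_size(DEMO_FILE_HANDLE *fh, int64_t *sizep)
{
	*sizep = (int64_t)((DEMO_MEM_HANDLE *)fh)->len;
	return (0);
}

static int
demo_mem_write(
    DEMO_FILE_HANDLE *fh, int64_t offset, size_t len, const void *buf)
{
	DEMO_MEM_HANDLE *mh = (DEMO_MEM_HANDLE *)fh;
	size_t end;
	char *p;

	assert(buf != NULL);
	if (offset < 0)
		return (EINVAL);
	if (len == 0)
		return (0);
	end = (size_t)offset + len;
	if (end > mh->len) {
		/* Grow the buffer and zero-fill any gap before the write. */
		if ((p = realloc(mh->data, end)) == NULL)
			return (ENOMEM);
		memset(p + mh->len, 0, end - mh->len);
		mh->data = p;
		mh->len = end;
	}
	memcpy(mh->data + offset, buf, len);
	return (0);
}

/* Allocate a handle and fill in its method table. */
static DEMO_FILE_HANDLE *
demo_mem_open(void)
{
	DEMO_MEM_HANDLE *mh;

	if ((mh = calloc(1, sizeof(*mh))) == NULL)
		return (NULL);
	mh->iface.close = demo_mem_close;
	mh->iface.read = demo_mem_read;
	mh->iface.size = demo_mem_size;
	mh->iface.write = demo_mem_write;
	return (&mh->iface);
}
```

A caller only ever sees the `DEMO_FILE_HANDLE *` and calls through its pointers, for example `fh->write(fh, 0, 5, "hello")` followed by `fh->close(fh)`.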
diff --git a/src/include/wt_internal.h b/src/include/wt_internal.h
index e149ba9b3a7..0c8abf36cfe 100644
--- a/src/include/wt_internal.h
+++ b/src/include/wt_internal.h
@@ -181,6 +181,12 @@ struct __wt_fair_lock;
typedef struct __wt_fair_lock WT_FAIR_LOCK;
struct __wt_fh;
typedef struct __wt_fh WT_FH;
+struct __wt_file_handle_inmem;
+ typedef struct __wt_file_handle_inmem WT_FILE_HANDLE_INMEM;
+struct __wt_file_handle_posix;
+ typedef struct __wt_file_handle_posix WT_FILE_HANDLE_POSIX;
+struct __wt_file_handle_win;
+ typedef struct __wt_file_handle_win WT_FILE_HANDLE_WIN;
struct __wt_fstream;
typedef struct __wt_fstream WT_FSTREAM;
struct __wt_hazard;
diff --git a/src/log/log.c b/src/log/log.c
index aabf629f867..fd5d4bca5bc 100644
--- a/src/log/log.c
+++ b/src/log/log.c
@@ -124,7 +124,7 @@ __wt_log_force_sync(WT_SESSION_IMPL *session, WT_LSN *min_lsn)
"log_force_sync: sync directory %s to LSN %" PRIu32
"/%" PRIu32,
log->log_dir_fh->name, min_lsn->l.file, min_lsn->l.offset));
- WT_ERR(__wt_directory_sync_fh(session, log->log_dir_fh));
+ WT_ERR(__wt_fsync(session, log->log_dir_fh, true));
log->sync_dir_lsn = *min_lsn;
WT_STAT_FAST_CONN_INCR(session, log_sync_dir);
}
@@ -258,8 +258,8 @@ __log_get_files(WT_SESSION_IMPL *session,
log_path = conn->log_path;
if (log_path == NULL)
log_path = "";
- return (__wt_dirlist(session, log_path, file_prefix,
- WT_DIRLIST_INCLUDE, filesp, countp));
+ return (__wt_fs_directory_list(
+ session, log_path, file_prefix, filesp, countp));
}
/*
@@ -277,6 +277,9 @@ __wt_log_get_all_files(WT_SESSION_IMPL *session,
uint32_t id, max;
u_int count, i;
+ *filesp = NULL;
+ *countp = 0;
+
id = 0;
log = S2C(session)->log;
@@ -307,26 +310,12 @@ __wt_log_get_all_files(WT_SESSION_IMPL *session,
*countp = count;
if (0) {
-err: __wt_log_files_free(session, files, count);
+err: WT_TRET(__wt_fs_directory_list_free(session, &files, &count));
}
return (ret);
}
/*
- * __wt_log_files_free --
- * Free memory associated with a log file list.
- */
-void
-__wt_log_files_free(WT_SESSION_IMPL *session, char **files, u_int count)
-{
- u_int i;
-
- for (i = 0; i < count; i++)
- __wt_free(session, files[i]);
- __wt_free(session, files);
-}
-
-/*
* __log_filename --
* Given a log number, return a WT_ITEM of a generated log file name
* of the given prefix type.
@@ -450,14 +439,20 @@ __log_prealloc(WT_SESSION_IMPL *session, WT_FH *fh)
* and zero the log file based on what is available.
*/
if (FLD_ISSET(conn->log_flags, WT_CONN_LOG_ZERO_FILL))
- ret = __log_zero(session, fh,
- WT_LOG_FIRST_RECORD, conn->log_file_max);
- else if (fh->fallocate_available == WT_FALLOCATE_NOT_AVAILABLE ||
- (ret = __wt_fallocate(session, fh,
- WT_LOG_FIRST_RECORD, conn->log_file_max)) == ENOTSUP)
- ret = __wt_ftruncate(session, fh,
- WT_LOG_FIRST_RECORD + conn->log_file_max);
- return (ret);
+ return (__log_zero(session, fh,
+ WT_LOG_FIRST_RECORD, conn->log_file_max));
+
+ /*
+ * We have exclusive access to the log file and there are no other
+ * writes happening concurrently, so there are no locking issues.
+ */
+ if ((ret = __wt_fallocate(
+ session, fh, WT_LOG_FIRST_RECORD, conn->log_file_max)) == 0)
+ return (0);
+ WT_RET_ERROR_OK(ret, ENOTSUP);
+
+ return (__wt_ftruncate(
+ session, fh, WT_LOG_FIRST_RECORD + conn->log_file_max));
}
/*
@@ -669,14 +664,17 @@ static int
__log_openfile(WT_SESSION_IMPL *session,
bool ok_create, WT_FH **fhp, const char *file_prefix, uint32_t id)
{
+ WT_CONNECTION_IMPL *conn;
WT_DECL_ITEM(buf);
WT_DECL_RET;
WT_LOG *log;
WT_LOG_DESC *desc;
WT_LOG_RECORD *logrec;
uint32_t allocsize;
+ u_int flags;
- log = S2C(session)->log;
+ conn = S2C(session);
+ log = conn->log;
if (log == NULL)
allocsize = WT_LOG_ALIGN;
else
@@ -685,8 +683,14 @@ __log_openfile(WT_SESSION_IMPL *session,
WT_ERR(__log_filename(session, id, file_prefix, buf));
WT_ERR(__wt_verbose(session, WT_VERB_LOG,
"opening log %s", (const char *)buf->data));
- WT_ERR(__wt_open(session, buf->data,
- WT_FILE_TYPE_LOG, ok_create ? WT_OPEN_CREATE : 0, fhp));
+ flags = 0;
+ if (ok_create)
+ LF_SET(WT_OPEN_CREATE);
+ if (FLD_ISSET(conn->direct_io, WT_DIRECT_IO_LOG))
+ LF_SET(WT_OPEN_DIRECTIO);
+ WT_ERR(__wt_open(
+ session, buf->data, WT_OPEN_FILE_TYPE_LOG, flags, fhp));
+
/*
* If we are not creating the log file but opening it for reading,
* check that the magic number and versions are correct.
@@ -757,12 +761,11 @@ __log_alloc_prealloc(WT_SESSION_IMPL *session, uint32_t to_num)
* All file setup, writing the header and pre-allocation was done
* before. We only need to rename it.
*/
- WT_ERR(__wt_rename(session, from_path->data, to_path->data));
+ WT_ERR(__wt_fs_rename(session, from_path->data, to_path->data));
err: __wt_scr_free(session, &from_path);
__wt_scr_free(session, &to_path);
- if (logfiles != NULL)
- __wt_log_files_free(session, logfiles, logcount);
+ WT_TRET(__wt_fs_directory_list_free(session, &logfiles, &logcount));
return (ret);
}
@@ -984,8 +987,7 @@ __log_truncate(WT_SESSION_IMPL *session,
}
}
err: WT_TRET(__wt_close(session, &log_fh));
- if (logfiles != NULL)
- __wt_log_files_free(session, logfiles, logcount);
+ WT_TRET(__wt_fs_directory_list_free(session, &logfiles, &logcount));
return (ret);
}
@@ -1037,7 +1039,7 @@ __wt_log_allocfile(
/*
* Rename it into place and make it available.
*/
- WT_ERR(__wt_rename(session, from_path->data, to_path->data));
+ WT_ERR(__wt_fs_rename(session, from_path->data, to_path->data));
err: __wt_scr_free(session, &from_path);
__wt_scr_free(session, &to_path);
@@ -1060,7 +1062,7 @@ __wt_log_remove(WT_SESSION_IMPL *session,
WT_ERR(__log_filename(session, lognum, file_prefix, path));
WT_ERR(__wt_verbose(session, WT_VERB_LOG,
"log_remove: remove log %s", (char *)path->data));
- WT_ERR(__wt_remove(session, path->data));
+ WT_ERR(__wt_fs_remove(session, path->data));
err: __wt_scr_free(session, &path);
return (ret);
}
@@ -1096,7 +1098,7 @@ __wt_log_open(WT_SESSION_IMPL *session)
WT_RET(__wt_verbose(session, WT_VERB_LOG,
"log_open: open fh to directory %s", conn->log_path));
WT_RET(__wt_open(session, conn->log_path,
- WT_FILE_TYPE_DIRECTORY, 0, &log->log_dir_fh));
+ WT_OPEN_FILE_TYPE_DIRECTORY, 0, &log->log_dir_fh));
}
if (!F_ISSET(conn, WT_CONN_READONLY)) {
@@ -1113,9 +1115,8 @@ __wt_log_open(WT_SESSION_IMPL *session)
WT_ERR(__wt_log_remove(
session, WT_LOG_TMPNAME, lognum));
}
- __wt_log_files_free(session, logfiles, logcount);
- logfiles = NULL;
- logcount = 0;
+ WT_ERR(
+ __wt_fs_directory_list_free(session, &logfiles, &logcount));
WT_ERR(__log_get_files(session,
WT_LOG_PREPNAME, &logfiles, &logcount));
for (i = 0; i < logcount; i++) {
@@ -1124,8 +1125,8 @@ __wt_log_open(WT_SESSION_IMPL *session)
WT_ERR(__wt_log_remove(
session, WT_LOG_PREPNAME, lognum));
}
- __wt_log_files_free(session, logfiles, logcount);
- logfiles = NULL;
+ WT_ERR(
+ __wt_fs_directory_list_free(session, &logfiles, &logcount));
}
/*
@@ -1163,8 +1164,7 @@ __wt_log_open(WT_SESSION_IMPL *session)
FLD_SET(conn->log_flags, WT_CONN_LOG_EXISTED);
}
-err: if (logfiles != NULL)
- __wt_log_files_free(session, logfiles, logcount);
+err: WT_TRET(__wt_fs_directory_list_free(session, &logfiles, &logcount));
return (ret);
}
@@ -1200,8 +1200,7 @@ __wt_log_close(WT_SESSION_IMPL *session)
WT_RET(__wt_verbose(session, WT_VERB_LOG,
"closing log directory %s", log->log_dir_fh->name));
if (!F_ISSET(conn, WT_CONN_READONLY))
- WT_RET(
- __wt_directory_sync_fh(session, log->log_dir_fh));
+ WT_RET(__wt_fsync(session, log->log_dir_fh, true));
WT_RET(__wt_close(session, &log->log_dir_fh));
log->log_dir_fh = NULL;
}
@@ -1408,8 +1407,7 @@ __wt_log_release(WT_SESSION_IMPL *session, WT_LOGSLOT *slot, bool *freep)
"/%" PRIu32,
log->log_dir_fh->name,
sync_lsn.l.file, sync_lsn.l.offset));
- WT_ERR(__wt_directory_sync_fh(
- session, log->log_dir_fh));
+ WT_ERR(__wt_fsync(session, log->log_dir_fh, true));
log->sync_dir_lsn = sync_lsn;
WT_STAT_FAST_CONN_INCR(session, log_sync_dir);
}
@@ -1550,8 +1548,8 @@ __wt_log_scan(WT_SESSION_IMPL *session, WT_LSN *lsnp, uint32_t flags,
}
WT_SET_LSN(&start_lsn, firstlog, 0);
WT_SET_LSN(&end_lsn, lastlog, 0);
- __wt_log_files_free(session, logfiles, logcount);
- logfiles = NULL;
+ WT_ERR(
+ __wt_fs_directory_list_free(session, &logfiles, &logcount));
}
WT_ERR(__log_openfile(
session, false, &log_fh, WT_LOG_FILENAME, start_lsn.l.file));
@@ -1747,8 +1745,7 @@ advance:
err: WT_STAT_FAST_CONN_INCR(session, log_scans);
- if (logfiles != NULL)
- __wt_log_files_free(session, logfiles, logcount);
+ WT_TRET(__wt_fs_directory_list_free(session, &logfiles, &logcount));
__wt_scr_free(session, &buf);
__wt_scr_free(session, &decryptitem);
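Editorial note: the reworked `__log_prealloc` in the log.c hunk above tries `__wt_fallocate` first and falls back to `__wt_ftruncate` only when the allocation call reports `ENOTSUP`; any other error is returned to the caller. The standalone sketch below mirrors that control flow with hypothetical callbacks (`demo_alloc_fn`, `demo_trunc_fn`) in place of the WiredTiger functions. Treating a NULL allocate callback as "not supported" is an assumption of the sketch, as are the stub callbacks, which exist only to exercise it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback types standing in for fallocate and ftruncate. */
typedef int (*demo_alloc_fn)(void *cookie, int64_t offset, int64_t len);
typedef int (*demo_trunc_fn)(void *cookie, int64_t len);

/*
 * demo_prealloc --
 *	Try to allocate space; if allocation is not supported, extend the
 *	file by truncating it to the desired size instead.
 */
static int
demo_prealloc(void *cookie, demo_alloc_fn alloc, demo_trunc_fn trunc,
    int64_t first_record, int64_t file_max)
{
	int ret;

	assert(trunc != NULL);

	if (alloc != NULL) {
		if ((ret = alloc(cookie, first_record, file_max)) == 0)
			return (0);
		/* Only "not supported" falls through to the fallback. */
		if (ret != ENOTSUP)
			return (ret);
	}
	return (trunc(cookie, first_record + file_max));
}

/* Stub: an allocate method that is never supported. */
static int
demo_alloc_notsup(void *cookie, int64_t offset, int64_t len)
{
	(void)cookie;
	(void)offset;
	(void)len;
	return (ENOTSUP);
}

/* Stub: an allocate method that fails with a real I/O error. */
static int
demo_alloc_eio(void *cookie, int64_t offset, int64_t len)
{
	(void)cookie;
	(void)offset;
	(void)len;
	return (EIO);
}

/* Stub: a truncate method that records the requested size. */
static int
demo_trunc_record(void *cookie, int64_t len)
{
	*(int64_t *)cookie = len;
	return (0);
}
```

With these stubs, `demo_prealloc(&size, demo_alloc_notsup, demo_trunc_record, 128, 1000)` reaches the truncate fallback, while substituting `demo_alloc_eio` returns the error without truncating.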
diff --git a/src/lsm/lsm_tree.c b/src/lsm/lsm_tree.c
index de6dc005bc6..da106ae2089 100644
--- a/src/lsm/lsm_tree.c
+++ b/src/lsm/lsm_tree.c
@@ -235,7 +235,7 @@ __wt_lsm_tree_set_chunk_size(
if (!WT_PREFIX_SKIP(filename, "file:"))
WT_RET_MSG(session, EINVAL,
"Expected a 'file:' URI: %s", chunk->uri);
- WT_RET(__wt_filesize_name(session, filename, false, &size));
+ WT_RET(__wt_fs_size(session, filename, &size));
chunk->size = (uint64_t)size;
@@ -256,7 +256,7 @@ __lsm_tree_cleanup_old(WT_SESSION_IMPL *session, const char *uri)
{ WT_CONFIG_BASE(session, WT_SESSION_drop), "force", NULL };
bool exists;
- WT_RET(__wt_exist(session, uri + strlen("file:"), &exists));
+ WT_RET(__wt_fs_exist(session, uri + strlen("file:"), &exists));
if (exists)
WT_WITH_SCHEMA_LOCK(session, ret,
ret = __wt_schema_drop(session, uri, cfg));
diff --git a/src/lsm/lsm_work_unit.c b/src/lsm/lsm_work_unit.c
index 51cf2e981de..821a996c38b 100644
--- a/src/lsm/lsm_work_unit.c
+++ b/src/lsm/lsm_work_unit.c
@@ -525,7 +525,7 @@ __lsm_drop_file(WT_SESSION_IMPL *session, const char *uri)
ret = __wt_schema_drop(session, uri, drop_cfg));
if (ret == 0)
- ret = __wt_remove(session, uri + strlen("file:"));
+ ret = __wt_fs_remove(session, uri + strlen("file:"));
WT_RET(__wt_verbose(session, WT_VERB_LSM, "Dropped %s", uri));
if (ret == EBUSY || ret == ENOENT)
diff --git a/src/meta/meta_track.c b/src/meta/meta_track.c
index a73b7e09d37..4fe628e319b 100644
--- a/src/meta/meta_track.c
+++ b/src/meta/meta_track.c
@@ -194,8 +194,8 @@ __meta_track_unroll(WT_SESSION_IMPL *session, WT_META_TRACK *trk)
__wt_err(session, ret,
"metadata unroll rename %s to %s", trk->b, trk->a);
- if (trk->a == NULL &&
- (ret = __wt_remove(session, trk->b + strlen("file:"))) != 0)
+ if (trk->a == NULL && (ret =
+ __wt_fs_remove(session, trk->b + strlen("file:"))) != 0)
__wt_err(session, ret,
"metadata unroll create %s", trk->b);
diff --git a/src/meta/meta_turtle.c b/src/meta/meta_turtle.c
index a45e7ecf9e0..ee9ee522748 100644
--- a/src/meta/meta_turtle.c
+++ b/src/meta/meta_turtle.c
@@ -75,11 +75,11 @@ __metadata_load_hot_backup(WT_SESSION_IMPL *session)
bool exist;
/* Look for a hot backup file: if we find it, load it. */
- WT_RET(__wt_exist(session, WT_METADATA_BACKUP, &exist));
+ WT_RET(__wt_fs_exist(session, WT_METADATA_BACKUP, &exist));
if (!exist)
return (0);
WT_RET(__wt_fopen(session,
- WT_METADATA_BACKUP, WT_FILE_TYPE_REGULAR, WT_STREAM_READ, &fs));
+ WT_METADATA_BACKUP, 0, WT_STREAM_READ, &fs));
/* Read line pairs and load them into the metadata file. */
WT_ERR(__wt_scr_alloc(session, 512, &key));
@@ -128,7 +128,7 @@ __metadata_load_bulk(WT_SESSION_IMPL *session)
continue;
/* If the file exists, it's all good. */
- WT_ERR(__wt_exist(session, key, &exist));
+ WT_ERR(__wt_fs_exist(session, key, &exist));
if (exist)
continue;
@@ -182,9 +182,9 @@ __wt_turtle_init(WT_SESSION_IMPL *session)
* that is an error. Otherwise, if there's already a turtle file, we're
* done.
*/
- WT_RET(__wt_exist(session, WT_INCREMENTAL_BACKUP, &exist_incr));
- WT_RET(__wt_exist(session, WT_METADATA_BACKUP, &exist_backup));
- WT_RET(__wt_exist(session, WT_METADATA_TURTLE, &exist_turtle));
+ WT_RET(__wt_fs_exist(session, WT_INCREMENTAL_BACKUP, &exist_incr));
+ WT_RET(__wt_fs_exist(session, WT_METADATA_BACKUP, &exist_backup));
+ WT_RET(__wt_fs_exist(session, WT_METADATA_TURTLE, &exist_turtle));
if (exist_turtle) {
if (exist_incr)
WT_RET_MSG(session, EINVAL,
@@ -254,7 +254,7 @@ __wt_turtle_read(WT_SESSION_IMPL *session, const char *key, char **valuep)
* the turtle file, and that means returning the default configuration
* string for the metadata file.
*/
- WT_RET(__wt_exist(session, WT_METADATA_TURTLE, &exist));
+ WT_RET(__wt_fs_exist(session, WT_METADATA_TURTLE, &exist));
if (!exist)
return (strcmp(key, WT_METAFILE_URI) == 0 ?
__metadata_config(session, valuep) : WT_NOTFOUND);
diff --git a/src/os_common/filename.c b/src/os_common/filename.c
index 771cf61f081..5f174288350 100644
--- a/src/os_common/filename.c
+++ b/src/os_common/filename.c
@@ -60,9 +60,9 @@ __wt_remove_if_exists(WT_SESSION_IMPL *session, const char *name)
{
bool exist;
- WT_RET(__wt_exist(session, name, &exist));
+ WT_RET(__wt_fs_exist(session, name, &exist));
if (exist)
- WT_RET(__wt_remove(session, name));
+ WT_RET(__wt_fs_remove(session, name));
return (0);
}
@@ -78,7 +78,7 @@ __wt_rename_and_sync_directory(
bool same_directory;
/* Rename the source file to the target. */
- WT_RET(__wt_rename(session, from, to));
+ WT_RET(__wt_fs_rename(session, from, to));
/*
* Flush the backing directory to guarantee the rename. My reading of
@@ -89,7 +89,7 @@ __wt_rename_and_sync_directory(
* with specific mount options. Flush both of the from/to directories
* until it's a performance problem.
*/
- WT_RET(__wt_directory_sync(session, from));
+ WT_RET(__wt_fs_directory_sync(session, from));
/*
* In almost all cases, we're going to be renaming files in the same
@@ -101,7 +101,7 @@ __wt_rename_and_sync_directory(
(fp != NULL && tp != NULL &&
fp - from == tp - to && memcmp(from, to, (size_t)(fp - from)) == 0);
- return (same_directory ? 0 : __wt_directory_sync(session, to));
+ return (same_directory ? 0 : __wt_fs_directory_sync(session, to));
}
/*
@@ -138,9 +138,9 @@ __wt_copy_and_sync(WT_SESSION *wt_session, const char *from, const char *to)
WT_ERR(__wt_remove_if_exists(session, tmp->data));
/* Open the from and temporary file handles. */
- WT_ERR(__wt_open(session, from, WT_FILE_TYPE_REGULAR, 0, &ffh));
- WT_ERR(__wt_open(session, tmp->data,
- WT_FILE_TYPE_REGULAR, WT_OPEN_CREATE | WT_OPEN_EXCLUSIVE, &tfh));
+ WT_ERR(__wt_open(session, from, WT_OPEN_FILE_TYPE_REGULAR, 0, &ffh));
+ WT_ERR(__wt_open(session, tmp->data, WT_OPEN_FILE_TYPE_REGULAR,
+ WT_OPEN_CREATE | WT_OPEN_EXCLUSIVE, &tfh));
/*
* Allocate a copy buffer. Don't use a scratch buffer, this thing is
diff --git a/src/os_common/os_fhandle.c b/src/os_common/os_fhandle.c
index c14fa084130..ec92797fb50 100644
--- a/src/os_common/os_fhandle.c
+++ b/src/os_common/os_fhandle.c
@@ -9,213 +9,88 @@
#include "wt_internal.h"
/*
- * __fhandle_advise_notsup --
- * POSIX fadvise unsupported.
+ * __fhandle_method_finalize --
+ * Verify that the required WT_FILE_HANDLE methods are configured, so
+ * that custom file systems with incomplete implementations fail at open
+ * rather than dereferencing NULL pointers later.
*/
static int
-__fhandle_advise_notsup(WT_SESSION_IMPL *session,
- WT_FH *fh, wt_off_t offset, wt_off_t len, int advice)
+__fhandle_method_finalize(
+ WT_SESSION_IMPL *session, WT_FILE_HANDLE *handle, bool readonly)
{
- WT_UNUSED(session);
- WT_UNUSED(fh);
- WT_UNUSED(offset);
- WT_UNUSED(len);
- WT_UNUSED(advice);
-
- /* Quietly fail, callers expect not-supported failures. */
- return (ENOTSUP);
-}
-
-/*
- * __fhandle_allocate_notsup --
- * POSIX fallocate unsupported.
- */
-static int
-__fhandle_allocate_notsup(
- WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t offset, wt_off_t len)
-{
- WT_UNUSED(offset);
- WT_UNUSED(len);
- WT_RET_MSG(session, ENOTSUP, "%s: file-allocate", fh->name);
-}
-
-/*
- * __fhandle_close_notsup --
- * ANSI C close/fclose unsupported.
- */
-static int
-__fhandle_close_notsup(WT_SESSION_IMPL *session, WT_FH *fh)
-{
- WT_RET_MSG(session, ENOTSUP, "%s: file-close", fh->name);
-}
-
-/*
- * __fhandle_lock_notsup --
- * Lock/unlock a file unsupported.
- */
-static int
-__fhandle_lock_notsup(WT_SESSION_IMPL *session, WT_FH *fh, bool lock)
-{
- WT_UNUSED(lock);
- WT_RET_MSG(session, ENOTSUP, "%s: file-lock", fh->name);
-}
-
-/*
- * __fhandle_map_notsup --
- * Map a file unsupported.
- */
-static int
-__fhandle_map_notsup(WT_SESSION_IMPL *session,
- WT_FH *fh, void *p, size_t *lenp, void **mappingcookie)
-{
- WT_UNUSED(p);
- WT_UNUSED(lenp);
- WT_UNUSED(mappingcookie);
- WT_RET_MSG(session, ENOTSUP, "%s: file-map", fh->name);
-}
-
-/*
- * __fhandle_map_discard_notsup --
- * Discard a section of a mapped region unsupported.
- */
-static int
-__fhandle_map_discard_notsup(
- WT_SESSION_IMPL *session, WT_FH *fh, void *p, size_t len)
-{
- WT_UNUSED(p);
- WT_UNUSED(len);
- WT_RET_MSG(session, ENOTSUP, "%s: file-map-discard", fh->name);
-}
+#define WT_HANDLE_METHOD_REQ(name) \
+ if (handle->name == NULL) \
+ WT_RET_MSG(session, EINVAL, \
+ "a WT_FILE_HANDLE.%s method must be configured", #name)
+
+ WT_HANDLE_METHOD_REQ(close);
+ /* not required: fadvise */
+ /* not required: fallocate */
+ /* not required: fallocate_nolock */
+ /* not required: lock */
+ /* not required: map */
+ /* not required: map_discard */
+ /* not required: map_preload */
+ /* not required: unmap */
+ WT_HANDLE_METHOD_REQ(read);
+ WT_HANDLE_METHOD_REQ(size);
+ /* not required: sync */
+ /* not required: sync_nowait */
+ if (!readonly) {
+ WT_HANDLE_METHOD_REQ(truncate);
+ WT_HANDLE_METHOD_REQ(write);
+ }
-/*
- * __fhandle_map_preload_notsup --
- * Preload a section of a mapped region unsupported.
- */
-static int
-__fhandle_map_preload_notsup(
- WT_SESSION_IMPL *session, WT_FH *fh, const void *p, size_t len)
-{
- WT_UNUSED(p);
- WT_UNUSED(len);
- WT_RET_MSG(session, ENOTSUP, "%s: file-map-preload", fh->name);
+ return (0);
}
+#ifdef HAVE_DIAGNOSTIC
/*
- * __fhandle_map_unmap_notsup --
- * Unmap a file unsupported.
+ * __wt_handle_is_open --
+ * Return if there's an open handle matching a name.
*/
-static int
-__fhandle_map_unmap_notsup(WT_SESSION_IMPL *session,
- WT_FH *fh, void *p, size_t len, void **mappingcookie)
+bool
+__wt_handle_is_open(WT_SESSION_IMPL *session, const char *name)
{
- WT_UNUSED(p);
- WT_UNUSED(len);
- WT_UNUSED(mappingcookie);
- WT_RET_MSG(session, ENOTSUP, "%s: file-map-unmap", fh->name);
-}
+ WT_CONNECTION_IMPL *conn;
+ WT_FH *fh;
+ uint64_t bucket, hash;
+ bool found;
-/*
- * __fhandle_read_notsup --
- * POSIX pread unsupported.
- */
-static int
-__fhandle_read_notsup(
- WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t offset, size_t len, void *buf)
-{
- WT_UNUSED(offset);
- WT_UNUSED(len);
- WT_UNUSED(buf);
- WT_RET_MSG(session, ENOTSUP, "%s: file-read", fh->name);
-}
+ conn = S2C(session);
+ found = false;
-/*
- * __fhandle_size_notsup --
- * Get the size of a file in bytes unsupported.
- */
-static int
-__fhandle_size_notsup(WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t *sizep)
-{
- WT_UNUSED(sizep);
- WT_RET_MSG(session, ENOTSUP, "%s: file-size", fh->name);
-}
+ hash = __wt_hash_city64(name, strlen(name));
+ bucket = hash % WT_HASH_ARRAY_SIZE;
-/*
- * __fhandle_sync_notsup --
- * POSIX fsync unsupported.
- */
-static int
-__fhandle_sync_notsup(WT_SESSION_IMPL *session, WT_FH *fh, bool block)
-{
- WT_UNUSED(block);
- WT_RET_MSG(session, ENOTSUP, "%s: file-sync", fh->name);
-}
+ __wt_spin_lock(session, &conn->fh_lock);
-/*
- * __fhandle_truncate_notsup --
- * POSIX ftruncate.
- */
-static int
-__fhandle_truncate_notsup(WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t len)
-{
- WT_UNUSED(len);
- WT_RET_MSG(session, ENOTSUP, "%s: file-truncate", fh->name);
-}
+ TAILQ_FOREACH(fh, &conn->fhhash[bucket], hashq)
+ if (strcmp(name, fh->name) == 0) {
+ found = true;
+ break;
+ }
-/*
- * __fhandle_write_notsup --
- * POSIX pwrite.
- */
-static int
-__fhandle_write_notsup(WT_SESSION_IMPL *session,
- WT_FH *fh, wt_off_t offset, size_t len, const void *buf)
-{
- WT_UNUSED(offset);
- WT_UNUSED(len);
- WT_UNUSED(buf);
- WT_RET_MSG(session, ENOTSUP, "%s: file-write", fh->name);
-}
+ __wt_spin_unlock(session, &conn->fh_lock);
-/*
- * __fhandle_method_init --
- * Initialize the WT_FH structure's methods to not-supported.
- */
-static void
-__fhandle_method_init(WT_FH *fh)
-{
- /*
- * Set up the initial set of handle methods to standard "not-supported"
- * functions, the underlying open functions turn on supported functions.
- */
- fh->fh_advise = __fhandle_advise_notsup;
- fh->fh_allocate = __fhandle_allocate_notsup;
- fh->fh_close = __fhandle_close_notsup;
- fh->fh_lock = __fhandle_lock_notsup;
- fh->fh_map = __fhandle_map_notsup;
- fh->fh_map_discard = __fhandle_map_discard_notsup;
- fh->fh_map_preload = __fhandle_map_preload_notsup;
- fh->fh_map_unmap = __fhandle_map_unmap_notsup;
- fh->fh_read = __fhandle_read_notsup;
- fh->fh_size = __fhandle_size_notsup;
- fh->fh_sync = __fhandle_sync_notsup;
- fh->fh_truncate = __fhandle_truncate_notsup;
- fh->fh_write = __fhandle_write_notsup;
+ return (found);
}
+#endif
/*
- * __wt_handle_search --
+ * __handle_search --
* Search for a matching handle.
*/
-bool
-__wt_handle_search(WT_SESSION_IMPL *session,
- const char *name, bool increment_ref, WT_FH *newfh, WT_FH **fhp)
+static bool
+__handle_search(
+ WT_SESSION_IMPL *session, const char *name, WT_FH *newfh, WT_FH **fhp)
{
WT_CONNECTION_IMPL *conn;
WT_FH *fh;
uint64_t bucket, hash;
bool found;
- if (fhp != NULL)
- *fhp = NULL;
+ *fhp = NULL;
conn = S2C(session);
found = false;
@@ -226,15 +101,13 @@ __wt_handle_search(WT_SESSION_IMPL *session,
__wt_spin_lock(session, &conn->fh_lock);
/*
- * If we already have the file open, optionally increment the reference
- * count and return a pointer.
+ * If we already have the file open, increment the reference count and
+ * return a pointer.
*/
TAILQ_FOREACH(fh, &conn->fhhash[bucket], hashq)
if (strcmp(name, fh->name) == 0) {
- if (increment_ref)
- ++fh->ref;
- if (fhp != NULL)
- *fhp = fh;
+ ++fh->ref;
+ *fhp = fh;
found = true;
break;
}
@@ -245,10 +118,8 @@ __wt_handle_search(WT_SESSION_IMPL *session,
WT_CONN_FILE_INSERT(conn, newfh, bucket);
(void)__wt_atomic_add32(&conn->open_file_count, 1);
- if (increment_ref)
- ++newfh->ref;
- if (fhp != NULL)
- *fhp = newfh;
+ ++newfh->ref;
+ *fhp = newfh;
}
__wt_spin_unlock(session, &conn->fh_lock);
@@ -261,8 +132,8 @@ __wt_handle_search(WT_SESSION_IMPL *session,
* Optionally output a verbose message on handle open.
*/
static inline int
-__open_verbose(WT_SESSION_IMPL *session,
- const char *name, uint32_t file_type, uint32_t flags)
+__open_verbose(
+ WT_SESSION_IMPL *session, const char *name, int file_type, u_int flags)
{
#ifdef HAVE_VERBOSE
WT_DECL_RET;
@@ -278,19 +149,19 @@ __open_verbose(WT_SESSION_IMPL *session,
*/
switch (file_type) {
- case WT_FILE_TYPE_CHECKPOINT:
+ case WT_OPEN_FILE_TYPE_CHECKPOINT:
file_type_tag = "checkpoint";
break;
- case WT_FILE_TYPE_DATA:
+ case WT_OPEN_FILE_TYPE_DATA:
file_type_tag = "data";
break;
- case WT_FILE_TYPE_DIRECTORY:
+ case WT_OPEN_FILE_TYPE_DIRECTORY:
file_type_tag = "directory";
break;
- case WT_FILE_TYPE_LOG:
+ case WT_OPEN_FILE_TYPE_LOG:
file_type_tag = "log";
break;
- case WT_FILE_TYPE_REGULAR:
+ case WT_OPEN_FILE_TYPE_REGULAR:
file_type_tag = "regular";
break;
default:
@@ -337,17 +208,19 @@ err: __wt_scr_free(session, &tmp);
*/
int
__wt_open(WT_SESSION_IMPL *session,
- const char *name, uint32_t file_type, uint32_t flags, WT_FH **fhp)
+ const char *name, WT_OPEN_FILE_TYPE file_type, u_int flags, WT_FH **fhp)
{
WT_CONNECTION_IMPL *conn;
WT_DECL_RET;
WT_FH *fh;
+ WT_FILE_SYSTEM *file_system;
bool lock_file, open_called;
char *path;
WT_ASSERT(session, file_type != 0); /* A file type is required. */
conn = S2C(session);
+ file_system = conn->file_system;
fh = NULL;
open_called = false;
path = NULL;
@@ -355,7 +228,7 @@ __wt_open(WT_SESSION_IMPL *session,
WT_RET(__open_verbose(session, name, file_type, flags));
/* Check if the handle is already open. */
- if (__wt_handle_search(session, name, true, NULL, &fh)) {
+ if (__handle_search(session, name, NULL, &fh)) {
*fhp = fh;
return (0);
}
@@ -363,7 +236,6 @@ __wt_open(WT_SESSION_IMPL *session,
/* Allocate and initialize the handle. */
WT_ERR(__wt_calloc_one(session, &fh));
WT_ERR(__wt_strdup(session, name, &fh->name));
- __fhandle_method_init(fh);
/*
* If this is a read-only connection, open all files read-only except
@@ -378,30 +250,26 @@ __wt_open(WT_SESSION_IMPL *session,
WT_ASSERT(session, lock_file || !LF_ISSET(WT_OPEN_CREATE));
}
- /*
- * Direct I/O: file-type is a flag from the set of possible flags stored
- * in the connection handle during configuration, check for a match.
- */
- fh->direct_io = false;
- if (FLD_ISSET(conn->direct_io, file_type))
- LF_SET(WT_OPEN_DIRECTIO);
-
/* Create the path to the file. */
if (!LF_ISSET(WT_OPEN_FIXED))
WT_ERR(__wt_filename(session, name, &path));
/* Call the underlying open function. */
- WT_ERR(conn->file_open(
- session, fh, path == NULL ? name : path, file_type, flags));
+ WT_ERR(file_system->open_file(file_system, &session->iface,
+ path == NULL ? name : path, file_type, flags, &fh->handle));
open_called = true;
+ WT_ERR(__fhandle_method_finalize(
+ session, fh->handle, LF_ISSET(WT_OPEN_READONLY)));
+
/*
* Repeat the check for a match: if there's no match, link our newly
* created handle onto the database's list of files.
*/
- if (__wt_handle_search(session, name, true, fh, fhp)) {
+ if (__handle_search(session, name, fh, fhp)) {
err: if (open_called)
- WT_TRET(fh->fh_close(session, fh));
+ WT_TRET(fh->handle->close(
+ fh->handle, (WT_SESSION *)session));
if (fh != NULL) {
__wt_free(session, fh->name);
__wt_free(session, fh);
@@ -443,7 +311,7 @@ __wt_close(WT_SESSION_IMPL *session, WT_FH **fhp)
*/
__wt_spin_lock(session, &conn->fh_lock);
WT_ASSERT(session, fh->ref > 0);
- if ((fh->ref > 0 && --fh->ref > 0) || F_ISSET(fh, WT_FH_IN_MEMORY)) {
+ if ((fh->ref > 0 && --fh->ref > 0)) {
__wt_spin_unlock(session, &conn->fh_lock);
return (0);
}
@@ -456,7 +324,7 @@ __wt_close(WT_SESSION_IMPL *session, WT_FH **fhp)
__wt_spin_unlock(session, &conn->fh_lock);
/* Discard underlying resources. */
- ret = fh->fh_close(session, fh);
+ ret = fh->handle->close(fh->handle, (WT_SESSION *)session);
__wt_free(session, fh->name);
__wt_free(session, fh);
@@ -478,18 +346,13 @@ __wt_close_connection_close(WT_SESSION_IMPL *session)
conn = S2C(session);
while ((fh = TAILQ_FIRST(&conn->fhqh)) != NULL) {
- /*
- * In-memory configurations will have open files, but the ref
- * counts should be zero.
- */
- if (!F_ISSET(conn, WT_CONN_IN_MEMORY) || fh->ref != 0) {
+ if (fh->ref != 0) {
ret = EBUSY;
__wt_errx(session,
"Connection has open file handles: %s", fh->name);
}
fh->ref = 1;
- F_CLR(fh, WT_FH_IN_MEMORY);
WT_TRET(__wt_close(session, &fh));
}
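Note on the dispatch change in the file above: the upper layer no longer calls `fh->fh_close(session, fh)`. It goes through a method table owned by the handle, as in `fh->handle->close(fh->handle, ...)`. The following is a minimal, self-contained sketch of that pattern, not WiredTiger code. The names `file_handle`, `mem_handle` and `mem_open` are hypothetical. The only assumption carried over from the diff is that the public interface is the first member of the private structure, so a pointer to one can be cast to the other.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Public interface: a name plus a table of methods (hypothetical). */
typedef struct file_handle file_handle;
struct file_handle {
	char *name;
	int (*close)(file_handle *);
	int (*size)(file_handle *, long *);
};

/* Private implementation: the interface must be the first member. */
typedef struct {
	file_handle iface;
	long bytes;
} mem_handle;

static int
mem_size(file_handle *fh, long *sizep)
{
	/* Valid because iface is the first member of mem_handle. */
	*sizep = ((mem_handle *)fh)->bytes;
	return (0);
}

static int
mem_close(file_handle *fh)
{
	free(fh->name);
	free(fh);
	return (0);
}

/* Open: allocate the private structure, return the public view. */
static int
mem_open(const char *name, long bytes, file_handle **fhp)
{
	mem_handle *mh;
	size_t len;

	assert(name != NULL && fhp != NULL);

	*fhp = NULL;
	if ((mh = calloc(1, sizeof(*mh))) == NULL)
		return (-1);
	len = strlen(name) + 1;
	if ((mh->iface.name = malloc(len)) == NULL) {
		free(mh);
		return (-1);
	}
	memcpy(mh->iface.name, name, len);
	mh->iface.close = mem_close;
	mh->iface.size = mem_size;
	mh->bytes = bytes;
	*fhp = &mh->iface;
	return (0);
}
```

A caller only ever sees `file_handle *` and invokes `fh->size(fh, &n)` or `fh->close(fh)`, which is what lets a different implementation be substituted without touching the caller.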
diff --git a/src/os_common/os_fs_inmemory.c b/src/os_common/os_fs_inmemory.c
index b4a6fd64784..55facbbaec1 100644
--- a/src/os_common/os_fs_inmemory.c
+++ b/src/os_common/os_fs_inmemory.c
@@ -9,39 +9,147 @@
#include "wt_internal.h"
/*
- * In-memory information.
+ * File system interface for in-memory implementation.
*/
typedef struct {
+ WT_FILE_SYSTEM iface;
+
+ TAILQ_HEAD(__wt_closed_file_handle_qh, __wt_file_handle_inmem) fileq;
+
WT_SPINLOCK lock;
-} WT_IM;
+} WT_FILE_SYSTEM_INMEM;
+
+static int __im_file_size(WT_FILE_HANDLE *, WT_SESSION *, wt_off_t *);
/*
- * __im_directory_list --
- * Get a list of files from a directory, in-memory version.
+ * __im_handle_search --
+ * Return a matching handle, if one exists.
+ */
+static WT_FILE_HANDLE_INMEM *
+__im_handle_search(WT_FILE_SYSTEM *file_system, const char *name)
+{
+ WT_FILE_HANDLE_INMEM *im_fh;
+ WT_FILE_SYSTEM_INMEM *im_fs;
+
+ im_fs = (WT_FILE_SYSTEM_INMEM *)file_system;
+
+ TAILQ_FOREACH(im_fh, &im_fs->fileq, q)
+ if (strcmp(im_fh->iface.name, name) == 0)
+ break;
+ return (im_fh);
+}
+
+/*
+ * __im_handle_remove --
+ * Destroy an in-memory file handle. Should only happen on remove or
+ * shutdown.
*/
static int
-__im_directory_list(WT_SESSION_IMPL *session, const char *dir,
- const char *prefix, uint32_t flags, char ***dirlist, u_int *countp)
+__im_handle_remove(WT_SESSION_IMPL *session,
+ WT_FILE_SYSTEM *file_system, WT_FILE_HANDLE_INMEM *im_fh)
{
- WT_UNUSED(session);
- WT_UNUSED(dir);
- WT_UNUSED(prefix);
- WT_UNUSED(flags);
- WT_UNUSED(dirlist);
- WT_UNUSED(countp);
+ WT_FILE_HANDLE *fhp;
+ WT_FILE_SYSTEM_INMEM *im_fs;
+
+ im_fs = (WT_FILE_SYSTEM_INMEM *)file_system;
+
+ if (im_fh->ref != 0)
+ WT_RET_MSG(session, EBUSY,
+ "%s: file-remove", im_fh->iface.name);
+
+ TAILQ_REMOVE(&im_fs->fileq, im_fh, q);
- WT_RET_MSG(session, ENOTSUP, "directory-list");
+ /* Clean up private information. */
+ __wt_buf_free(session, &im_fh->buf);
+
+ /* Clean up public information. */
+ fhp = (WT_FILE_HANDLE *)im_fh;
+ __wt_free(session, fhp->name);
+
+ __wt_free(session, im_fh);
+
+ return (0);
}
/*
- * __im_directory_sync --
- * Flush a directory to ensure file creation is durable.
+ * __im_fs_directory_list --
+ * Return the directory contents.
*/
static int
-__im_directory_sync(WT_SESSION_IMPL *session, const char *path)
+__im_fs_directory_list(WT_FILE_SYSTEM *file_system,
+ WT_SESSION *wt_session, const char *directory,
+ const char *prefix, char ***dirlistp, uint32_t *countp)
{
- WT_UNUSED(session);
- WT_UNUSED(path);
+ WT_DECL_RET;
+ WT_FILE_HANDLE_INMEM *im_fh;
+ WT_FILE_SYSTEM_INMEM *im_fs;
+ WT_SESSION_IMPL *session;
+ size_t dirallocsz, len;
+ uint32_t count;
+ char *name, **entries;
+
+ im_fs = (WT_FILE_SYSTEM_INMEM *)file_system;
+ session = (WT_SESSION_IMPL *)wt_session;
+
+ *dirlistp = NULL;
+ *countp = 0;
+
+ dirallocsz = 0;
+ len = strlen(directory);
+ entries = NULL;
+
+ __wt_spin_lock(session, &im_fs->lock);
+
+ count = 0;
+ TAILQ_FOREACH(im_fh, &im_fs->fileq, q) {
+ name = im_fh->iface.name;
+ if (strncmp(name, directory, len) != 0 ||
+ (prefix != NULL && !WT_PREFIX_MATCH(name + len, prefix)))
+ continue;
+
+ WT_ERR(__wt_realloc_def(
+ session, &dirallocsz, count + 1, &entries));
+ WT_ERR(__wt_strdup(session, name, &entries[count]));
+ ++count;
+ }
+
+ *dirlistp = entries;
+ *countp = count;
+
+err: __wt_spin_unlock(session, &im_fs->lock);
+ if (ret == 0)
+ return (0);
+
+ if (entries != NULL) {
+ while (count > 0)
+ __wt_free(session, entries[--count]);
+ __wt_free(session, entries);
+ }
+
+ WT_RET_MSG(session, ret,
+ "%s: directory-list, prefix \"%s\"",
+ directory, prefix == NULL ? "" : prefix);
+}
+
+/*
+ * __im_fs_directory_list_free --
+ * Free memory returned by __im_fs_directory_list.
+ */
+static int
+__im_fs_directory_list_free(WT_FILE_SYSTEM *file_system,
+ WT_SESSION *wt_session, char **dirlist, uint32_t count)
+{
+ WT_SESSION_IMPL *session;
+
+ WT_UNUSED(file_system);
+
+ session = (WT_SESSION_IMPL *)wt_session;
+
+ if (dirlist != NULL) {
+ while (count > 0)
+ __wt_free(session, dirlist[--count]);
+ __wt_free(session, dirlist);
+ }
return (0);
}
@@ -50,9 +158,20 @@ __im_directory_sync(WT_SESSION_IMPL *session, const char *path)
* Return if the file exists.
*/
static int
-__im_fs_exist(WT_SESSION_IMPL *session, const char *name, bool *existp)
+__im_fs_exist(WT_FILE_SYSTEM *file_system,
+ WT_SESSION *wt_session, const char *name, bool *existp)
{
- *existp = __wt_handle_search(session, name, false, NULL, NULL);
+ WT_FILE_SYSTEM_INMEM *im_fs;
+ WT_SESSION_IMPL *session;
+
+ im_fs = (WT_FILE_SYSTEM_INMEM *)file_system;
+ session = (WT_SESSION_IMPL *)wt_session;
+
+ __wt_spin_lock(session, &im_fs->lock);
+
+ *existp = __im_handle_search(file_system, name) != NULL;
+
+ __wt_spin_unlock(session, &im_fs->lock);
return (0);
}
@@ -61,18 +180,24 @@ __im_fs_exist(WT_SESSION_IMPL *session, const char *name, bool *existp)
* POSIX remove.
*/
static int
-__im_fs_remove(WT_SESSION_IMPL *session, const char *name)
+__im_fs_remove(
+ WT_FILE_SYSTEM *file_system, WT_SESSION *wt_session, const char *name)
{
WT_DECL_RET;
- WT_FH *fh;
+ WT_FILE_HANDLE_INMEM *im_fh;
+ WT_FILE_SYSTEM_INMEM *im_fs;
+ WT_SESSION_IMPL *session;
- if (__wt_handle_search(session, name, true, NULL, &fh)) {
- WT_ASSERT(session, fh->ref == 1);
+ im_fs = (WT_FILE_SYSTEM_INMEM *)file_system;
+ session = (WT_SESSION_IMPL *)wt_session;
- /* Force a discard of the handle. */
- F_CLR(fh, WT_FH_IN_MEMORY);
- ret = __wt_close(session, &fh);
- }
+ __wt_spin_lock(session, &im_fs->lock);
+
+ ret = ENOENT;
+ if ((im_fh = __im_handle_search(file_system, name)) != NULL)
+ ret = __im_handle_remove(session, file_system, im_fh);
+
+ __wt_spin_unlock(session, &im_fs->lock);
return (ret);
}
@@ -81,55 +206,29 @@ __im_fs_remove(WT_SESSION_IMPL *session, const char *name)
* POSIX rename.
*/
static int
-__im_fs_rename(WT_SESSION_IMPL *session, const char *from, const char *to)
+__im_fs_rename(WT_FILE_SYSTEM *file_system,
+ WT_SESSION *wt_session, const char *from, const char *to)
{
- WT_CONNECTION_IMPL *conn;
WT_DECL_RET;
- WT_FH *fh;
- uint64_t bucket, hash;
- char *to_name;
-
- conn = S2C(session);
-
- /* We'll need a copy of the target name. */
- WT_RET(__wt_strdup(session, to, &to_name));
-
- __wt_spin_lock(session, &conn->fh_lock);
-
- /* Make sure the target name isn't active. */
- hash = __wt_hash_city64(to, strlen(to));
- bucket = hash % WT_HASH_ARRAY_SIZE;
- TAILQ_FOREACH(fh, &conn->fhhash[bucket], hashq)
- if (strcmp(to, fh->name) == 0)
- WT_ERR(EPERM);
+ WT_FILE_HANDLE_INMEM *im_fh;
+ WT_FILE_SYSTEM_INMEM *im_fs;
+ WT_SESSION_IMPL *session;
+ char *copy;
- /* Find the source name. */
- hash = __wt_hash_city64(from, strlen(from));
- bucket = hash % WT_HASH_ARRAY_SIZE;
- TAILQ_FOREACH(fh, &conn->fhhash[bucket], hashq)
- if (strcmp(from, fh->name) == 0)
- break;
- if (fh == NULL)
- WT_ERR(ENOENT);
-
- /* Remove source from the list. */
- WT_CONN_FILE_REMOVE(conn, fh, bucket);
+ im_fs = (WT_FILE_SYSTEM_INMEM *)file_system;
+ session = (WT_SESSION_IMPL *)wt_session;
- /* Swap the names. */
- __wt_free(session, fh->name);
- fh->name = to_name;
- to_name = NULL;
+ __wt_spin_lock(session, &im_fs->lock);
- /* Put source back on the list. */
- hash = __wt_hash_city64(to, strlen(to));
- bucket = hash % WT_HASH_ARRAY_SIZE;
- WT_CONN_FILE_INSERT(conn, fh, bucket);
+ ret = ENOENT;
+ if ((im_fh = __im_handle_search(file_system, from)) != NULL) {
+ WT_ERR(__wt_strdup(session, to, &copy));
- if (0) {
-err: __wt_free(session, to_name);
+ __wt_free(session, im_fh->iface.name);
+ im_fh->iface.name = copy;
}
- __wt_spin_unlock(session, &conn->fh_lock);
+err: __wt_spin_unlock(session, &im_fs->lock);
return (ret);
}
@@ -138,25 +237,25 @@ err: __wt_free(session, to_name);
* Get the size of a file in bytes, by file name.
*/
static int
-__im_fs_size(
- WT_SESSION_IMPL *session, const char *name, bool silent, wt_off_t *sizep)
+__im_fs_size(WT_FILE_SYSTEM *file_system,
+ WT_SESSION *wt_session, const char *name, wt_off_t *sizep)
{
WT_DECL_RET;
- WT_FH *fh;
- WT_IM *im;
+ WT_FILE_HANDLE_INMEM *im_fh;
+ WT_FILE_SYSTEM_INMEM *im_fs;
+ WT_SESSION_IMPL *session;
- WT_UNUSED(silent);
+ im_fs = (WT_FILE_SYSTEM_INMEM *)file_system;
+ session = (WT_SESSION_IMPL *)wt_session;
- im = S2C(session)->inmemory;
- __wt_spin_lock(session, &im->lock);
+ __wt_spin_lock(session, &im_fs->lock);
- if (__wt_handle_search(session, name, true, NULL, &fh)) {
- WT_ERR(fh->fh_size(session, fh, sizep));
- WT_ERR(__wt_close(session, &fh));
- } else
- ret = ENOENT;
+ ret = ENOENT;
+ if ((im_fh = __im_handle_search(file_system, name)) != NULL)
+ ret = __im_file_size(
+ (WT_FILE_HANDLE *)im_fh, wt_session, sizep);
-err: __wt_spin_unlock(session, &im->lock);
+ __wt_spin_unlock(session, &im_fs->lock);
return (ret);
}
@@ -165,24 +264,22 @@ err: __wt_spin_unlock(session, &im->lock);
* ANSI C close.
*/
static int
-__im_file_close(WT_SESSION_IMPL *session, WT_FH *fh)
+__im_file_close(WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session)
{
- __wt_buf_free(session, &fh->buf);
+ WT_FILE_HANDLE_INMEM *im_fh;
+ WT_FILE_SYSTEM_INMEM *im_fs;
+ WT_SESSION_IMPL *session;
- return (0);
-}
+ im_fh = (WT_FILE_HANDLE_INMEM *)file_handle;
+ im_fs = (WT_FILE_SYSTEM_INMEM *)file_handle->file_system;
+ session = (WT_SESSION_IMPL *)wt_session;
+
+ __wt_spin_lock(session, &im_fs->lock);
+
+ --im_fh->ref;
+
+ __wt_spin_unlock(session, &im_fs->lock);
-/*
- * __im_file_lock --
- * Lock/unlock a file.
- */
-static int
-__im_file_lock(WT_SESSION_IMPL *session, WT_FH *fh, bool lock)
-{
- /* Locks are always granted. */
- WT_UNUSED(session);
- WT_UNUSED(fh);
- WT_UNUSED(lock);
return (0);
}
@@ -191,31 +288,36 @@ __im_file_lock(WT_SESSION_IMPL *session, WT_FH *fh, bool lock)
* POSIX pread.
*/
static int
-__im_file_read(
- WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t offset, size_t len, void *buf)
+__im_file_read(WT_FILE_HANDLE *file_handle,
+ WT_SESSION *wt_session, wt_off_t offset, size_t len, void *buf)
{
WT_DECL_RET;
- WT_IM *im;
+ WT_FILE_HANDLE_INMEM *im_fh;
+ WT_FILE_SYSTEM_INMEM *im_fs;
+ WT_SESSION_IMPL *session;
size_t off;
- im = S2C(session)->inmemory;
- __wt_spin_lock(session, &im->lock);
+ im_fh = (WT_FILE_HANDLE_INMEM *)file_handle;
+ im_fs = (WT_FILE_SYSTEM_INMEM *)file_handle->file_system;
+ session = (WT_SESSION_IMPL *)wt_session;
+
+ __wt_spin_lock(session, &im_fs->lock);
off = (size_t)offset;
- if (off < fh->buf.size) {
- len = WT_MIN(len, fh->buf.size - off);
- memcpy(buf, (uint8_t *)fh->buf.mem + off, len);
- fh->off = off + len;
+ if (off < im_fh->buf.size) {
+ len = WT_MIN(len, im_fh->buf.size - off);
+ memcpy(buf, (uint8_t *)im_fh->buf.mem + off, len);
+ im_fh->off = off + len;
} else
ret = WT_ERROR;
- __wt_spin_unlock(session, &im->lock);
+ __wt_spin_unlock(session, &im_fs->lock);
if (ret == 0)
return (0);
WT_RET_MSG(session, WT_ERROR,
"%s: handle-read: failed to read %" WT_SIZET_FMT " bytes at "
"offset %" WT_SIZET_FMT,
- fh->name, len, off);
+ im_fh->iface.name, len, off);
}
/*
@@ -223,34 +325,29 @@ __im_file_read(
* Get the size of a file in bytes, by file handle.
*/
static int
-__im_file_size(WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t *sizep)
+__im_file_size(
+ WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session, wt_off_t *sizep)
{
- WT_UNUSED(session);
+ WT_FILE_HANDLE_INMEM *im_fh;
+ WT_FILE_SYSTEM_INMEM *im_fs;
+ WT_SESSION_IMPL *session;
+
+ im_fh = (WT_FILE_HANDLE_INMEM *)file_handle;
+ im_fs = (WT_FILE_SYSTEM_INMEM *)file_handle->file_system;
+ session = (WT_SESSION_IMPL *)wt_session;
+
+ __wt_spin_lock(session, &im_fs->lock);
/*
* XXX hack - MongoDB assumes that any file with content will have a
* non-zero size. In memory tables generally are zero-sized, make
* MongoDB happy.
*/
- *sizep = fh->buf.size == 0 ? 1024 : (wt_off_t)fh->buf.size;
- return (0);
-}
+ *sizep = im_fh->buf.size == 0 ? 1024 : (wt_off_t)im_fh->buf.size;
-/*
- * __im_file_sync --
- * POSIX fflush/fsync.
- */
-static int
-__im_file_sync(WT_SESSION_IMPL *session, WT_FH *fh, bool block)
-{
- WT_UNUSED(session);
- WT_UNUSED(fh);
+ __wt_spin_unlock(session, &im_fs->lock);
- /*
- * Callers attempting asynchronous flush handle ENOTSUP returns, and
- * won't make further attempts.
- */
- return (block ? 0 : ENOTSUP);
+ return (0);
}
/*
@@ -258,27 +355,33 @@ __im_file_sync(WT_SESSION_IMPL *session, WT_FH *fh, bool block)
* POSIX ftruncate.
*/
static int
-__im_file_truncate(WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t offset)
+__im_file_truncate(
+ WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session, wt_off_t offset)
{
WT_DECL_RET;
- WT_IM *im;
+ WT_FILE_HANDLE_INMEM *im_fh;
+ WT_FILE_SYSTEM_INMEM *im_fs;
+ WT_SESSION_IMPL *session;
size_t off;
- im = S2C(session)->inmemory;
- __wt_spin_lock(session, &im->lock);
+ im_fh = (WT_FILE_HANDLE_INMEM *)file_handle;
+ im_fs = (WT_FILE_SYSTEM_INMEM *)file_handle->file_system;
+ session = (WT_SESSION_IMPL *)wt_session;
+
+ __wt_spin_lock(session, &im_fs->lock);
/*
* Grow the buffer as necessary, clear any new space in the file,
* and reset the file's data length.
*/
off = (size_t)offset;
- WT_ERR(__wt_buf_grow(session, &fh->buf, off));
- if (fh->buf.size < off)
- memset((uint8_t *)
- fh->buf.data + fh->buf.size, 0, off - fh->buf.size);
- fh->buf.size = off;
+ WT_ERR(__wt_buf_grow(session, &im_fh->buf, off));
+ if (im_fh->buf.size < off)
+ memset((uint8_t *)im_fh->buf.data + im_fh->buf.size,
+ 0, off - im_fh->buf.size);
+ im_fh->buf.size = off;
-err: __wt_spin_unlock(session, &im->lock);
+err: __wt_spin_unlock(session, &im_fs->lock);
return (ret);
}
@@ -287,31 +390,36 @@ err: __wt_spin_unlock(session, &im->lock);
* POSIX pwrite.
*/
static int
-__im_file_write(WT_SESSION_IMPL *session,
- WT_FH *fh, wt_off_t offset, size_t len, const void *buf)
+__im_file_write(WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session,
+ wt_off_t offset, size_t len, const void *buf)
{
WT_DECL_RET;
- WT_IM *im;
+ WT_FILE_HANDLE_INMEM *im_fh;
+ WT_FILE_SYSTEM_INMEM *im_fs;
+ WT_SESSION_IMPL *session;
size_t off;
- im = S2C(session)->inmemory;
- __wt_spin_lock(session, &im->lock);
+ im_fh = (WT_FILE_HANDLE_INMEM *)file_handle;
+ im_fs = (WT_FILE_SYSTEM_INMEM *)file_handle->file_system;
+ session = (WT_SESSION_IMPL *)wt_session;
+
+ __wt_spin_lock(session, &im_fs->lock);
off = (size_t)offset;
- WT_ERR(__wt_buf_grow(session, &fh->buf, off + len + 1024));
+ WT_ERR(__wt_buf_grow(session, &im_fh->buf, off + len + 1024));
- memcpy((uint8_t *)fh->buf.data + off, buf, len);
- if (off + len > fh->buf.size)
- fh->buf.size = off + len;
- fh->off = off + len;
+ memcpy((uint8_t *)im_fh->buf.data + off, buf, len);
+ if (off + len > im_fh->buf.size)
+ im_fh->buf.size = off + len;
+ im_fh->off = off + len;
-err: __wt_spin_unlock(session, &im->lock);
+err: __wt_spin_unlock(session, &im_fs->lock);
if (ret == 0)
return (0);
WT_RET_MSG(session, ret,
"%s: handle-write: failed to write %" WT_SIZET_FMT " bytes at "
"offset %" WT_SIZET_FMT,
- fh->name, len, off);
+ im_fh->iface.name, len, off);
}
/*
@@ -319,85 +427,134 @@ err: __wt_spin_unlock(session, &im->lock);
* POSIX fopen/open.
*/
static int
-__im_file_open(WT_SESSION_IMPL *session,
- WT_FH *fh, const char *path, uint32_t file_type, uint32_t flags)
+__im_file_open(WT_FILE_SYSTEM *file_system, WT_SESSION *wt_session,
+ const char *name, WT_OPEN_FILE_TYPE file_type, uint32_t flags,
+ WT_FILE_HANDLE **file_handlep)
{
- WT_UNUSED(session);
- WT_UNUSED(path);
+ WT_DECL_RET;
+ WT_FILE_HANDLE *file_handle;
+ WT_FILE_HANDLE_INMEM *im_fh;
+ WT_FILE_SYSTEM_INMEM *im_fs;
+ WT_SESSION_IMPL *session;
+
WT_UNUSED(file_type);
WT_UNUSED(flags);
+ im_fs = (WT_FILE_SYSTEM_INMEM *)file_system;
+ session = (WT_SESSION_IMPL *)wt_session;
+
+ __wt_spin_lock(session, &im_fs->lock);
+
/*
- * Unlike other file handle open implementations, the in-memory version
- * is called whenever the WT_FH structure reference count goes to 0.
- * This is because the in-memory implementation reuses WT_FH structures,
- * and so we have to reset the file offset and potentially the list of
- * functions, in the case of the file being opened in a different way.
+ * First search the file queue. If the file is found, fail with EBUSY
+ * unless it has no open references: for now, the in-memory file system
+ * supports only a single handle on any file.
*/
- fh->off = 0;
- F_SET(fh, WT_FH_IN_MEMORY);
+ im_fh = __im_handle_search(file_system, name);
+ if (im_fh != NULL) {
- fh->fh_close = __im_file_close;
- fh->fh_lock = __im_file_lock;
- fh->fh_read = __im_file_read;
- fh->fh_size = __im_file_size;
- fh->fh_sync = __im_file_sync;
- fh->fh_truncate = __im_file_truncate;
- fh->fh_write = __im_file_write;
+ if (im_fh->ref != 0)
+ WT_ERR_MSG(session, EBUSY,
+ "%s: file-open: already open", name);
- return (0);
+ im_fh->ref = 1;
+ im_fh->off = 0;
+
+ *file_handlep = (WT_FILE_HANDLE *)im_fh;
+
+ __wt_spin_unlock(session, &im_fs->lock);
+ return (0);
+ }
+
+ /* The file hasn't been opened before, create a new one. */
+ WT_ERR(__wt_calloc_one(session, &im_fh));
+
+ /* Initialize private information. */
+ im_fh->ref = 1;
+ im_fh->off = 0;
+
+ /* Initialize public information. */
+ file_handle = (WT_FILE_HANDLE *)im_fh;
+ file_handle->file_system = file_system;
+ WT_ERR(__wt_strdup(session, name, &file_handle->name));
+
+ file_handle->close = __im_file_close;
+ file_handle->read = __im_file_read;
+ file_handle->size = __im_file_size;
+ file_handle->truncate = __im_file_truncate;
+ file_handle->write = __im_file_write;
+
+ TAILQ_INSERT_HEAD(&im_fs->fileq, im_fh, q);
+
+ *file_handlep = file_handle;
+
+ if (0) {
+err: __wt_free(session, im_fh);
+ }
+
+ __wt_spin_unlock(session, &im_fs->lock);
+ return (ret);
}
/*
- * __wt_os_inmemory --
- * Initialize an in-memory configuration.
+ * __im_terminate --
+ * Terminate an in-memory configuration.
*/
-int
-__wt_os_inmemory(WT_SESSION_IMPL *session)
+static int
+__im_terminate(WT_FILE_SYSTEM *file_system, WT_SESSION *wt_session)
{
- WT_CONNECTION_IMPL *conn;
WT_DECL_RET;
- WT_IM *im;
+ WT_FILE_HANDLE_INMEM *im_fh;
+ WT_FILE_SYSTEM_INMEM *im_fs;
+ WT_SESSION_IMPL *session;
- conn = S2C(session);
- im = NULL;
+ WT_UNUSED(file_system);
- /* Initialize the in-memory jump table. */
- conn->file_directory_list = __im_directory_list;
- conn->file_directory_sync = __im_directory_sync;
- conn->file_exist = __im_fs_exist;
- conn->file_remove = __im_fs_remove;
- conn->file_rename = __im_fs_rename;
- conn->file_size = __im_fs_size;
- conn->file_open = __im_file_open;
-
- /* Allocate an in-memory structure. */
- WT_RET(__wt_calloc_one(session, &im));
- WT_ERR(__wt_spin_init(session, &im->lock, "in-memory I/O"));
- conn->inmemory = im;
+ session = (WT_SESSION_IMPL *)wt_session;
+ im_fs = (WT_FILE_SYSTEM_INMEM *)file_system;
- return (0);
+ while ((im_fh = TAILQ_FIRST(&im_fs->fileq)) != NULL)
+ WT_TRET(__im_handle_remove(session, file_system, im_fh));
+
+ __wt_spin_destroy(session, &im_fs->lock);
+ __wt_free(session, im_fs);
-err: __wt_free(session, im);
return (ret);
}
/*
- * __wt_os_inmemory_cleanup --
- * Discard an in-memory configuration.
+ * __wt_os_inmemory --
+ * Initialize an in-memory configuration.
*/
int
-__wt_os_inmemory_cleanup(WT_SESSION_IMPL *session)
+__wt_os_inmemory(WT_SESSION_IMPL *session)
{
WT_DECL_RET;
- WT_IM *im;
+ WT_FILE_SYSTEM *file_system;
+ WT_FILE_SYSTEM_INMEM *im_fs;
- if ((im = S2C(session)->inmemory) == NULL)
- return (0);
- S2C(session)->inmemory = NULL;
+ WT_RET(__wt_calloc_one(session, &im_fs));
+
+ /* Initialize private information. */
+ TAILQ_INIT(&im_fs->fileq);
+ WT_ERR(__wt_spin_init(session, &im_fs->lock, "in-memory I/O"));
- __wt_spin_destroy(session, &im->lock);
- __wt_free(session, im);
+ /* Initialize the in-memory jump table. */
+ file_system = (WT_FILE_SYSTEM *)im_fs;
+ file_system->directory_list = __im_fs_directory_list;
+ file_system->directory_list_free = __im_fs_directory_list_free;
+ file_system->exist = __im_fs_exist;
+ file_system->open_file = __im_file_open;
+ file_system->remove = __im_fs_remove;
+ file_system->rename = __im_fs_rename;
+ file_system->size = __im_fs_size;
+ file_system->terminate = __im_terminate;
+
+ /* Switch the file system into place. */
+ S2C(session)->file_system = (WT_FILE_SYSTEM *)im_fs;
+
+ return (0);
+err: __wt_free(session, im_fs);
return (ret);
}
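Note on the in-memory read, write and truncate methods above: each treats a file as a growable heap buffer with a separate "bytes in use" length. The sketch below illustrates that idea in isolation. The names `mem_file`, `mem_grow`, `mem_write`, `mem_read` and `mem_truncate` are hypothetical, locking is omitted, and two details are choices of the sketch rather than of the diff: a write past the end zero-fills the gap, and a read reports how many bytes it copied.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* A file is a heap buffer: cap bytes allocated, size bytes in use. */
typedef struct {
	uint8_t *mem;
	size_t cap;
	size_t size;
} mem_file;

/* Ensure at least need bytes are allocated; contents are preserved. */
static int
mem_grow(mem_file *f, size_t need)
{
	uint8_t *p;

	if (need <= f->cap)
		return (0);
	if ((p = realloc(f->mem, need)) == NULL)
		return (ENOMEM);
	f->mem = p;
	f->cap = need;
	return (0);
}

/* Write len bytes at off, extending the file if the write passes the end. */
static int
mem_write(mem_file *f, size_t off, const void *buf, size_t len)
{
	int ret;

	if (len == 0)
		return (0);
	if ((ret = mem_grow(f, off + len)) != 0)
		return (ret);
	/* Zero any gap between the old end of file and the write. */
	if (off > f->size)
		memset(f->mem + f->size, 0, off - f->size);
	memcpy(f->mem + off, buf, len);
	if (off + len > f->size)
		f->size = off + len;
	return (0);
}

/* Read up to len bytes at off; fail if off is at or past the end of file. */
static int
mem_read(const mem_file *f, size_t off, void *buf, size_t len, size_t *nreadp)
{
	if (off >= f->size)
		return (EINVAL);
	if (len > f->size - off)
		len = f->size - off;
	memcpy(buf, f->mem + off, len);
	*nreadp = len;
	return (0);
}

/* Set the file length to len, zero-filling when the file grows. */
static int
mem_truncate(mem_file *f, size_t len)
{
	int ret;

	if ((ret = mem_grow(f, len)) != 0)
		return (ret);
	if (len > f->size)
		memset(f->mem + f->size, 0, len - f->size);
	f->size = len;
	return (0);
}
```

Shrinking with `mem_truncate` only lowers `size` and keeps the allocation, so a later write to the same range needs no reallocation.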
diff --git a/src/os_common/os_fstream.c b/src/os_common/os_fstream.c
index fe67c3312a5..fc0daf1c211 100644
--- a/src/os_common/os_fstream.c
+++ b/src/os_common/os_fstream.c
@@ -182,7 +182,8 @@ __wt_fopen(WT_SESSION_IMPL *session,
fs = NULL;
- WT_RET(__wt_open(session, name, WT_FILE_TYPE_REGULAR, open_flags, &fh));
+ WT_RET(__wt_open(
+ session, name, WT_OPEN_FILE_TYPE_REGULAR, open_flags, &fh));
WT_ERR(__wt_calloc_one(session, &fs));
fs->fh = fh;
diff --git a/src/os_common/os_init.c b/src/os_common/os_init.c
deleted file mode 100644
index 512216c52a5..00000000000
--- a/src/os_common/os_init.c
+++ /dev/null
@@ -1,41 +0,0 @@
-/*-
- * Copyright (c) 2014-2016 MongoDB, Inc.
- * Copyright (c) 2008-2014 WiredTiger, Inc.
- * All rights reserved.
- *
- * See the file LICENSE for redistribution information.
- */
-
-#include "wt_internal.h"
-
-/*
- * __wt_os_init --
- * Initialize the OS layer.
- */
-int
-__wt_os_init(WT_SESSION_IMPL *session)
-{
- return (F_ISSET(S2C(session), WT_CONN_IN_MEMORY) ?
- __wt_os_inmemory(session) :
-#if defined(_MSC_VER)
- __wt_os_win(session));
-#else
- __wt_os_posix(session));
-#endif
-}
-
-/*
- * __wt_os_cleanup --
- * Clean up the OS layer.
- */
-int
-__wt_os_cleanup(WT_SESSION_IMPL *session)
-{
- return (F_ISSET(S2C(session), WT_CONN_IN_MEMORY) ?
- __wt_os_inmemory_cleanup(session) :
-#if defined(_MSC_VER)
- __wt_os_win_cleanup(session));
-#else
- __wt_os_posix_cleanup(session));
-#endif
-}
diff --git a/src/os_posix/os_dir.c b/src/os_posix/os_dir.c
index 78ae5f8edd4..a23051e5b93 100644
--- a/src/os_posix/os_dir.c
+++ b/src/os_posix/os_dir.c
@@ -15,30 +15,33 @@
* Get a list of files from a directory, POSIX version.
*/
int
-__wt_posix_directory_list(WT_SESSION_IMPL *session, const char *dir,
- const char *prefix, uint32_t flags, char ***dirlist, u_int *countp)
+__wt_posix_directory_list(WT_FILE_SYSTEM *file_system,
+ WT_SESSION *wt_session, const char *directory,
+ const char *prefix, char ***dirlistp, uint32_t *countp)
{
struct dirent *dp;
DIR *dirp;
WT_DECL_RET;
+ WT_SESSION_IMPL *session;
size_t dirallocsz;
- u_int count, dirsz;
- bool match;
- char **entries, *path;
+ uint32_t count;
+ char **entries;
- *dirlist = NULL;
- *countp = 0;
+ WT_UNUSED(file_system);
+
+ session = (WT_SESSION_IMPL *)wt_session;
- WT_RET(__wt_filename(session, dir, &path));
+ *dirlistp = NULL;
+ *countp = 0;
dirp = NULL;
dirallocsz = 0;
- dirsz = 0;
entries = NULL;
- WT_SYSCALL_RETRY(((dirp = opendir(path)) == NULL ? 1 : 0), ret);
+ WT_SYSCALL_RETRY(((dirp = opendir(directory)) == NULL ? 1 : 0), ret);
if (ret != 0)
- WT_ERR_MSG(session, ret, "%s: directory-list: opendir", path);
+ WT_RET_MSG(session, ret,
+ "%s: directory-list: opendir", directory);
for (count = 0; (dp = readdir(dirp)) != NULL;) {
/*
@@ -49,44 +52,50 @@ __wt_posix_directory_list(WT_SESSION_IMPL *session, const char *dir,
continue;
/* The list of files is optionally filtered by a prefix. */
- match = false;
- if (prefix != NULL &&
- ((LF_ISSET(WT_DIRLIST_INCLUDE) &&
- WT_PREFIX_MATCH(dp->d_name, prefix)) ||
- (LF_ISSET(WT_DIRLIST_EXCLUDE) &&
- !WT_PREFIX_MATCH(dp->d_name, prefix))))
- match = true;
- if (prefix == NULL || match) {
- /*
- * We have a file name we want to return.
- */
- count++;
- if (count > dirsz) {
- dirsz += WT_DIR_ENTRY;
- WT_ERR(__wt_realloc_def(
- session, &dirallocsz, dirsz, &entries));
- }
- WT_ERR(__wt_strdup(
- session, dp->d_name, &entries[count-1]));
- }
+ if (prefix != NULL && !WT_PREFIX_MATCH(dp->d_name, prefix))
+ continue;
+
+ WT_ERR(__wt_realloc_def(
+ session, &dirallocsz, count + 1, &entries));
+ WT_ERR(__wt_strdup(session, dp->d_name, &entries[count]));
+ ++count;
}
- if (count > 0)
- *dirlist = entries;
+
+ *dirlistp = entries;
*countp = count;
err: if (dirp != NULL)
(void)closedir(dirp);
- __wt_free(session, path);
if (ret == 0)
return (0);
- if (*dirlist != NULL) {
- for (count = dirsz; count > 0; count--)
- __wt_free(session, entries[count]);
- __wt_free(session, entries);
- }
+ WT_TRET(__wt_posix_directory_list_free(
+ file_system, wt_session, entries, count));
+
WT_RET_MSG(session, ret,
"%s: directory-list, prefix \"%s\"",
- dir, prefix == NULL ? "" : prefix);
+ directory, prefix == NULL ? "" : prefix);
+}
+
+/*
+ * __wt_posix_directory_list_free --
+ * Free memory returned by __wt_posix_directory_list.
+ */
+int
+__wt_posix_directory_list_free(WT_FILE_SYSTEM *file_system,
+ WT_SESSION *wt_session, char **dirlist, uint32_t count)
+{
+ WT_SESSION_IMPL *session;
+
+ WT_UNUSED(file_system);
+
+ session = (WT_SESSION_IMPL *)wt_session;
+
+ if (dirlist != NULL) {
+ while (count > 0)
+ __wt_free(session, dirlist[--count]);
+ __wt_free(session, dirlist);
+ }
+ return (0);
}
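Note on the directory-list rewrite above: the new loop skips non-matching names with `continue`, grows the result array one slot at a time, and pairs the list function with a matching free function. A stand-alone sketch of the same shape, using only POSIX `opendir`/`readdir` and the C library, might look like the following. `dir_list` and `dir_list_free` are hypothetical names, and the sketch returns plain `errno` values instead of using WiredTiger's error macros.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Free a list returned by dir_list (hypothetical helper). */
static void
dir_list_free(char **entries, uint32_t count)
{
	if (entries == NULL)
		return;
	while (count > 0)
		free(entries[--count]);
	free(entries);
}

/*
 * List the names in a directory, optionally keeping only those that start
 * with prefix. Returns 0 or an errno value; on error nothing is returned.
 */
static int
dir_list(const char *directory,
    const char *prefix, char ***entriesp, uint32_t *countp)
{
	struct dirent *dp;
	DIR *dirp;
	char **entries, **tmp;
	size_t len, plen;
	uint32_t count;
	int ret;

	*entriesp = NULL;
	*countp = 0;
	entries = NULL;
	count = 0;
	ret = 0;
	plen = prefix == NULL ? 0 : strlen(prefix);

	if ((dirp = opendir(directory)) == NULL)
		return (errno);

	while ((dp = readdir(dirp)) != NULL) {
		/* Skip the "." and ".." entries. */
		if (strcmp(dp->d_name, ".") == 0 ||
		    strcmp(dp->d_name, "..") == 0)
			continue;
		if (plen != 0 && strncmp(dp->d_name, prefix, plen) != 0)
			continue;

		/* Grow the array by one slot, then copy the name into it. */
		tmp = realloc(entries, (count + 1) * sizeof(*entries));
		if (tmp == NULL) {
			ret = ENOMEM;
			break;
		}
		entries = tmp;
		len = strlen(dp->d_name) + 1;
		if ((entries[count] = malloc(len)) == NULL) {
			ret = ENOMEM;
			break;
		}
		memcpy(entries[count], dp->d_name, len);
		++count;
	}
	(void)closedir(dirp);

	if (ret != 0) {
		dir_list_free(entries, count);
		return (ret);
	}
	*entriesp = entries;
	*countp = count;
	return (0);
}
```

Because `count` is incremented only after a name has been copied, the cleanup path can free exactly `count` entries without tracking a separate allocated-slot total.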
diff --git a/src/os_posix/os_dlopen.c b/src/os_posix/os_dlopen.c
index 9a74eb4813d..ad1fcc90150 100644
--- a/src/os_posix/os_dlopen.c
+++ b/src/os_posix/os_dlopen.c
@@ -19,7 +19,7 @@ __wt_dlopen(WT_SESSION_IMPL *session, const char *path, WT_DLH **dlhp)
WT_DLH *dlh;
WT_RET(__wt_calloc_one(session, &dlh));
- WT_ERR(__wt_strdup(session, path, &dlh->name));
+ WT_ERR(__wt_strdup(session, path == NULL ? "local" : path, &dlh->name));
if ((dlh->handle = dlopen(path, RTLD_LAZY)) == NULL)
WT_ERR_MSG(
diff --git a/src/os_posix/os_fallocate.c b/src/os_posix/os_fallocate.c
index 51e29aab4de..92569d84c99 100644
--- a/src/os_posix/os_fallocate.c
+++ b/src/os_posix/os_fallocate.c
@@ -12,47 +12,28 @@
#include <linux/falloc.h>
#include <sys/syscall.h>
#endif
-/*
- * __wt_posix_file_allocate_configure --
- * Configure POSIX file-extension behavior for a file handle.
- */
-void
-__wt_posix_file_allocate_configure(WT_SESSION_IMPL *session, WT_FH *fh)
-{
- WT_UNUSED(session);
-
- fh->fallocate_available = WT_FALLOCATE_NOT_AVAILABLE;
- fh->fallocate_requires_locking = false;
-
- /*
- * Check for the availability of some form of fallocate; in all cases,
- * start off requiring locking, we'll relax that requirement once we
- * know which system calls work with the handle's underlying filesystem.
- */
-#if defined(HAVE_FALLOCATE) || defined(HAVE_POSIX_FALLOCATE)
- fh->fallocate_available = WT_FALLOCATE_AVAILABLE;
- fh->fallocate_requires_locking = true;
-#endif
-#if defined(__linux__) && defined(SYS_fallocate)
- fh->fallocate_available = WT_FALLOCATE_AVAILABLE;
- fh->fallocate_requires_locking = true;
-#endif
-}
/*
* __posix_std_fallocate --
* Linux fallocate call.
*/
static int
-__posix_std_fallocate(WT_FH *fh, wt_off_t offset, wt_off_t len)
+__posix_std_fallocate(WT_FILE_HANDLE *file_handle,
+ WT_SESSION *wt_session, wt_off_t offset, wt_off_t len)
{
#if defined(HAVE_FALLOCATE)
WT_DECL_RET;
+ WT_FILE_HANDLE_POSIX *pfh;
- WT_SYSCALL_RETRY(fallocate(fh->fd, 0, offset, len), ret);
+ WT_UNUSED(wt_session);
+
+ pfh = (WT_FILE_HANDLE_POSIX *)file_handle;
+
+ WT_SYSCALL_RETRY(fallocate(pfh->fd, 0, offset, len), ret);
return (ret);
#else
- WT_UNUSED(fh);
+ WT_UNUSED(file_handle);
+ WT_UNUSED(wt_session);
WT_UNUSED(offset);
WT_UNUSED(len);
return (ENOTSUP);
@@ -64,10 +45,16 @@ __posix_std_fallocate(WT_FH *fh, wt_off_t offset, wt_off_t len)
* Linux fallocate call (system call version).
*/
static int
-__posix_sys_fallocate(WT_FH *fh, wt_off_t offset, wt_off_t len)
+__posix_sys_fallocate(WT_FILE_HANDLE *file_handle,
+ WT_SESSION *wt_session, wt_off_t offset, wt_off_t len)
{
#if defined(__linux__) && defined(SYS_fallocate)
WT_DECL_RET;
+ WT_FILE_HANDLE_POSIX *pfh;
+
+ WT_UNUSED(wt_session);
+
+ pfh = (WT_FILE_HANDLE_POSIX *)file_handle;
/*
* Try the system call for fallocate even if the C library wrapper was
@@ -75,10 +62,11 @@ __posix_sys_fallocate(WT_FH *fh, wt_off_t offset, wt_off_t len)
* Linux versions (RHEL 5.5), but not in the version of the C library.
* This allows it to work everywhere the kernel supports it.
*/
- WT_SYSCALL_RETRY(syscall(SYS_fallocate, fh->fd, 0, offset, len), ret);
+ WT_SYSCALL_RETRY(syscall(SYS_fallocate, pfh->fd, 0, offset, len), ret);
return (ret);
#else
- WT_UNUSED(fh);
+ WT_UNUSED(file_handle);
+ WT_UNUSED(wt_session);
WT_UNUSED(offset);
WT_UNUSED(len);
return (ENOTSUP);
@@ -90,15 +78,22 @@ __posix_sys_fallocate(WT_FH *fh, wt_off_t offset, wt_off_t len)
* POSIX fallocate call.
*/
static int
-__posix_posix_fallocate(WT_FH *fh, wt_off_t offset, wt_off_t len)
+__posix_posix_fallocate(WT_FILE_HANDLE *file_handle,
+ WT_SESSION *wt_session, wt_off_t offset, wt_off_t len)
{
#if defined(HAVE_POSIX_FALLOCATE)
WT_DECL_RET;
+ WT_FILE_HANDLE_POSIX *pfh;
+
+ WT_UNUSED(wt_session);
- WT_SYSCALL_RETRY(posix_fallocate(fh->fd, offset, len), ret);
+ pfh = (WT_FILE_HANDLE_POSIX *)file_handle;
+
+ WT_SYSCALL_RETRY(posix_fallocate(pfh->fd, offset, len), ret);
return (ret);
#else
- WT_UNUSED(fh);
+ WT_UNUSED(file_handle);
+ WT_UNUSED(wt_session);
WT_UNUSED(offset);
WT_UNUSED(len);
return (ENOTSUP);
@@ -106,67 +101,45 @@ __posix_posix_fallocate(WT_FH *fh, wt_off_t offset, wt_off_t len)
}
/*
- * __wt_posix_file_allocate --
+ * __wt_posix_file_fallocate --
* POSIX fallocate.
*/
int
-__wt_posix_file_allocate(
- WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t offset, wt_off_t len)
+__wt_posix_file_fallocate(WT_FILE_HANDLE *file_handle,
+ WT_SESSION *wt_session, wt_off_t offset, wt_off_t len)
{
- WT_DECL_RET;
-
- switch (fh->fallocate_available) {
- /*
- * Check for already configured handles and make the configured call.
- */
- case WT_FALLOCATE_POSIX:
- if ((ret = __posix_posix_fallocate(fh, offset, len)) == 0)
- return (0);
- WT_RET_MSG(session, ret, "%s: posix_fallocate", fh->name);
- case WT_FALLOCATE_STD:
- if ((ret = __posix_std_fallocate(fh, offset, len)) == 0)
- return (0);
- WT_RET_MSG(session, ret, "%s: fallocate", fh->name);
- case WT_FALLOCATE_SYS:
- if ((ret = __posix_sys_fallocate(fh, offset, len)) == 0)
- return (0);
- WT_RET_MSG(session, ret, "%s: sys_fallocate", fh->name);
-
/*
- * Figure out what allocation call this system/filesystem supports, if
- * any.
+ * The first fallocate call: figure out what allocation call this
+ * system/filesystem supports, if any.
+ *
+ * We've seen Linux systems where posix_fallocate has corrupted
+ * existing file data (even though that is explicitly disallowed
+ * by POSIX). FreeBSD and Solaris support posix_fallocate, and
+ * so far we've seen no problems leaving it unlocked. Check for
+ * fallocate (and the system call version of fallocate) first to
+ * avoid locking on Linux if at all possible.
*/
- case WT_FALLOCATE_AVAILABLE:
- /*
- * We've seen Linux systems where posix_fallocate has corrupted
- * existing file data (even though that is explicitly disallowed
- * by POSIX). FreeBSD and Solaris support posix_fallocate, and
- * so far we've seen no problems leaving it unlocked. Check for
- * fallocate (and the system call version of fallocate) first to
- * avoid locking on Linux if at all possible.
- */
- if ((ret = __posix_std_fallocate(fh, offset, len)) == 0) {
- fh->fallocate_available = WT_FALLOCATE_STD;
- fh->fallocate_requires_locking = false;
- return (0);
- }
- if ((ret = __posix_sys_fallocate(fh, offset, len)) == 0) {
- fh->fallocate_available = WT_FALLOCATE_SYS;
- fh->fallocate_requires_locking = false;
- return (0);
- }
- if ((ret = __posix_posix_fallocate(fh, offset, len)) == 0) {
- fh->fallocate_available = WT_FALLOCATE_POSIX;
-#if !defined(__linux__)
- fh->fallocate_requires_locking = false;
+ if (__posix_std_fallocate(file_handle, wt_session, offset, len) == 0) {
+ file_handle->fallocate = NULL;
+ file_handle->fallocate_nolock = __posix_std_fallocate;
+ return (0);
+ }
+ if (__posix_sys_fallocate(file_handle, wt_session, offset, len) == 0) {
+ file_handle->fallocate = NULL;
+ file_handle->fallocate_nolock = __posix_sys_fallocate;
+ return (0);
+ }
+ if (__posix_posix_fallocate(
+ file_handle, wt_session, offset, len) == 0) {
+#if defined(__linux__)
+ file_handle->fallocate = __posix_posix_fallocate;
+#else
+ file_handle->fallocate = NULL;
+ file_handle->fallocate_nolock = __posix_posix_fallocate;
#endif
- return (0);
- }
- /* FALLTHROUGH */
- case WT_FALLOCATE_NOT_AVAILABLE:
- default:
- fh->fallocate_available = WT_FALLOCATE_NOT_AVAILABLE;
- return (ENOTSUP);
+ return (0);
}
- /* NOTREACHED */
+
+ file_handle->fallocate = NULL;
+ return (ENOTSUP);
}
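
The rewritten __wt_posix_file_fallocate above no longer keeps a per-handle state enum. The first call probes the candidate calls in order and then rebinds the handle's method slots, so later calls go straight to the working function. A minimal sketch of that idea follows. Everything prefixed demo_ or DEMO_ is hypothetical and not part of WiredTiger, and the caller-side dispatch in demo_extend is an assumption about how an upper layer might choose between the two slots.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical handle, standing in for WT_FILE_HANDLE. */
typedef struct demo_handle DEMO_HANDLE;
typedef int (*demo_alloc_fn)(DEMO_HANDLE *, long offset, long len);

struct demo_handle {
	demo_alloc_fn fallocate;	/* Variant that needs locking. */
	demo_alloc_fn fallocate_nolock;	/* Variant that does not. */
	int probes;			/* Probe count, illustration only. */
};

/* Stand-in for a call this system does not support. */
static int
demo_std_fallocate(DEMO_HANDLE *h, long offset, long len)
{
	(void)h; (void)offset; (void)len;
	return (ENOTSUP);
}

/* Stand-in for a call that works. */
static int
demo_posix_fallocate(DEMO_HANDLE *h, long offset, long len)
{
	(void)h; (void)offset; (void)len;
	return (0);
}

/* First-call probe: try each candidate, then rebind the method slots. */
static int
demo_probe_fallocate(DEMO_HANDLE *h, long offset, long len)
{
	++h->probes;
	if (demo_std_fallocate(h, offset, len) == 0) {
		h->fallocate = NULL;
		h->fallocate_nolock = demo_std_fallocate;
		return (0);
	}
	if (demo_posix_fallocate(h, offset, len) == 0) {
		h->fallocate = NULL;
		h->fallocate_nolock = demo_posix_fallocate;
		return (0);
	}
	/* Nothing works: clear the slot so callers stop trying. */
	h->fallocate = NULL;
	return (ENOTSUP);
}

/* Assumed caller: prefer the no-lock slot, fall back to the locking one. */
static int
demo_extend(DEMO_HANDLE *h, long offset, long len)
{
	if (h->fallocate_nolock != NULL)
		return (h->fallocate_nolock(h, offset, len));
	if (h->fallocate != NULL)
		return (h->fallocate(h, offset, len));
	return (ENOTSUP);
}
```

A handle would start with fallocate set to demo_probe_fallocate and fallocate_nolock set to NULL. After the first demo_extend call the probe is never entered again, which is the property the diff relies on when it sets file_handle->fallocate to NULL.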
diff --git a/src/os_posix/os_fs.c b/src/os_posix/os_fs.c
index 5cf8ac2118b..9645652d3e9 100644
--- a/src/os_posix/os_fs.c
+++ b/src/os_posix/os_fs.c
@@ -13,30 +13,13 @@
* Underlying support function to flush a file handle.
*/
static int
-__posix_sync(WT_SESSION_IMPL *session,
- int fd, const char *name, const char *func, bool block)
+__posix_sync(
+ WT_SESSION_IMPL *session, int fd, const char *name, const char *func)
{
WT_DECL_RET;
WT_ASSERT(session, !F_ISSET(S2C(session), WT_CONN_READONLY));
-#ifdef HAVE_SYNC_FILE_RANGE
- if (!block) {
- WT_SYSCALL_RETRY(sync_file_range(fd,
- (off64_t)0, (off64_t)0, SYNC_FILE_RANGE_WRITE), ret);
- if (ret == 0)
- return (0);
- WT_RET_MSG(session, ret, "%s: %s: sync_file_range", name, func);
- }
-#else
- /*
- * Callers attempting asynchronous flush handle ENOTSUP returns, and
- * won't make further attempts.
- */
- if (!block)
- return (ENOTSUP);
-#endif
-
#if defined(F_FULLFSYNC)
/*
* OS X fsync documentation:
@@ -73,47 +56,29 @@ __posix_sync(WT_SESSION_IMPL *session,
#endif
}
+#ifdef __linux__
/*
* __posix_directory_sync --
* Flush a directory to ensure file creation is durable.
*/
static int
-__posix_directory_sync(WT_SESSION_IMPL *session, const char *path)
+__posix_directory_sync(
+ WT_FILE_SYSTEM *file_system, WT_SESSION *wt_session, const char *path)
{
-#ifdef __linux__
WT_DECL_RET;
+ WT_SESSION_IMPL *session;
int fd, tret;
- char *copy, *dir;
- /*
- * POSIX 1003.1 does not require that fsync of a file handle ensures the
- * entry in the directory containing the file has also reached disk (and
- * there are historic Linux filesystems requiring this), do an explicit
- * fsync on a file descriptor for the directory to be sure.
- */
- copy = NULL;
- if (path == NULL || strchr(path, '/') == NULL)
- path = S2C(session)->home;
- else {
- /*
- * File name construction should not return a path without any
- * slash separator, but caution isn't unreasonable.
- */
- WT_RET(__wt_filename(session, path, &copy));
- if ((dir = strrchr(copy, '/')) == NULL)
- path = S2C(session)->home;
- else {
- dir[1] = '\0';
- path = copy;
- }
- }
+ WT_UNUSED(file_system);
+
+ session = (WT_SESSION_IMPL *)wt_session;
WT_SYSCALL_RETRY((
(fd = open(path, O_RDONLY, 0444)) == -1 ? 1 : 0), ret);
if (ret != 0)
- WT_ERR_MSG(session, ret, "%s: directory-sync: open", path);
+ WT_RET_MSG(session, ret, "%s: directory-sync: open", path);
- ret = __posix_sync(session, fd, path, "directory-sync", true);
+ ret = __posix_sync(session, fd, path, "directory-sync");
WT_SYSCALL_RETRY(close(fd), tret);
if (tret != 0) {
@@ -121,40 +86,36 @@ __posix_directory_sync(WT_SESSION_IMPL *session, const char *path)
if (ret == 0)
ret = tret;
}
-err: __wt_free(session, copy);
return (ret);
-#else
- WT_UNUSED(session);
- WT_UNUSED(path);
- return (0);
-#endif
}
+#endif
/*
* __posix_fs_exist --
* Return if the file exists.
*/
static int
-__posix_fs_exist(WT_SESSION_IMPL *session, const char *name, bool *existp)
+__posix_fs_exist(WT_FILE_SYSTEM *file_system,
+ WT_SESSION *wt_session, const char *name, bool *existp)
{
struct stat sb;
WT_DECL_RET;
- char *path;
+ WT_SESSION_IMPL *session;
+
+ WT_UNUSED(file_system);
- WT_RET(__wt_filename(session, name, &path));
- name = path;
+ session = (WT_SESSION_IMPL *)wt_session;
WT_SYSCALL_RETRY(stat(name, &sb), ret);
- if (ret == 0)
+ if (ret == 0) {
*existp = true;
- else if (ret == ENOENT) {
+ return (0);
+ }
+ if (ret == ENOENT) {
*existp = false;
- ret = 0;
- } else
- __wt_err(session, ret, "%s: file-exist: stat", name);
-
- __wt_free(session, path);
- return (ret);
+ return (0);
+ }
+ WT_RET_MSG(session, ret, "%s: file-exist: stat", name);
}
/*
@@ -162,26 +123,20 @@ __posix_fs_exist(WT_SESSION_IMPL *session, const char *name, bool *existp)
* Remove a file.
*/
static int
-__posix_fs_remove(WT_SESSION_IMPL *session, const char *name)
+__posix_fs_remove(
+ WT_FILE_SYSTEM *file_system, WT_SESSION *wt_session, const char *name)
{
WT_DECL_RET;
- char *path;
+ WT_SESSION_IMPL *session;
-#ifdef HAVE_DIAGNOSTIC
- if (__wt_handle_search(session, name, false, NULL, NULL))
- WT_RET_MSG(session, EINVAL,
- "%s: file-remove: file has open handles", name);
-#endif
+ WT_UNUSED(file_system);
- WT_RET(__wt_filename(session, name, &path));
- name = path;
+ session = (WT_SESSION_IMPL *)wt_session;
WT_SYSCALL_RETRY(remove(name), ret);
- if (ret != 0)
- __wt_err(session, ret, "%s: file-remove: remove", name);
-
- __wt_free(session, path);
- return (ret);
+ if (ret == 0)
+ return (0);
+ WT_RET_MSG(session, ret, "%s: file-remove: remove", name);
}
/*
@@ -189,34 +144,20 @@ __posix_fs_remove(WT_SESSION_IMPL *session, const char *name)
* Rename a file.
*/
static int
-__posix_fs_rename(WT_SESSION_IMPL *session, const char *from, const char *to)
+__posix_fs_rename(WT_FILE_SYSTEM *file_system,
+ WT_SESSION *wt_session, const char *from, const char *to)
{
WT_DECL_RET;
- char *from_path, *to_path;
-
-#ifdef HAVE_DIAGNOSTIC
- if (__wt_handle_search(session, from, false, NULL, NULL))
- WT_RET_MSG(session, EINVAL,
- "%s: file-rename: file has open handles", from);
- if (__wt_handle_search(session, to, false, NULL, NULL))
- WT_RET_MSG(session, EINVAL,
- "%s: file-rename: file has open handles", to);
-#endif
+ WT_SESSION_IMPL *session;
- from_path = to_path = NULL;
- WT_ERR(__wt_filename(session, from, &from_path));
- from = from_path;
- WT_ERR(__wt_filename(session, to, &to_path));
- to = to_path;
+ WT_UNUSED(file_system);
- WT_SYSCALL_RETRY(rename(from, to), ret);
- if (ret != 0)
- __wt_err(session, ret,
- "%s to %s: file-rename: rename", from, to);
+ session = (WT_SESSION_IMPL *)wt_session;
-err: __wt_free(session, from_path);
- __wt_free(session, to_path);
- return (ret);
+ WT_SYSCALL_RETRY(rename(from, to), ret);
+ if (ret == 0)
+ return (0);
+ WT_RET_MSG(session, ret, "%s to %s: file-rename: rename", from, to);
}
/*
@@ -224,90 +165,86 @@ err: __wt_free(session, from_path);
* Get the size of a file in bytes, by file name.
*/
static int
-__posix_fs_size(
- WT_SESSION_IMPL *session, const char *name, bool silent, wt_off_t *sizep)
+__posix_fs_size(WT_FILE_SYSTEM *file_system,
+ WT_SESSION *wt_session, const char *name, wt_off_t *sizep)
{
struct stat sb;
WT_DECL_RET;
- char *path;
+ WT_SESSION_IMPL *session;
- WT_RET(__wt_filename(session, name, &path));
- name = path;
+ WT_UNUSED(file_system);
+
+ session = (WT_SESSION_IMPL *)wt_session;
- /*
- * Optionally don't log errors on ENOENT; some callers of this function
- * expect failure in that case and don't want an error message logged.
- */
WT_SYSCALL_RETRY(stat(name, &sb), ret);
- if (ret == 0)
+ if (ret == 0) {
*sizep = sb.st_size;
- else if (ret != ENOENT || !silent)
- __wt_err(session, ret, "%s: file-size: stat", name);
-
- __wt_free(session, path);
-
- return (ret);
+ return (0);
+ }
+ WT_RET_MSG(session, ret, "%s: file-size: stat", name);
}
+#if defined(HAVE_POSIX_FADVISE)
/*
* __posix_file_advise --
* POSIX fadvise.
*/
static int
-__posix_file_advise(WT_SESSION_IMPL *session,
- WT_FH *fh, wt_off_t offset, wt_off_t len, int advice)
+__posix_file_advise(WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session,
+ wt_off_t offset, wt_off_t len, int advice)
{
-#if defined(HAVE_POSIX_FADVISE)
WT_DECL_RET;
+ WT_FILE_HANDLE_POSIX *pfh;
+ WT_SESSION_IMPL *session;
- /*
- * Refuse pre-load when direct I/O is configured for the file, the
- * kernel cache isn't interesting.
- */
- if (advice == POSIX_MADV_WILLNEED && fh->direct_io)
- return (ENOTSUP);
+ session = (WT_SESSION_IMPL *)wt_session;
+ pfh = (WT_FILE_HANDLE_POSIX *)file_handle;
- WT_SYSCALL_RETRY(posix_fadvise(fh->fd, offset, len, advice), ret);
+ WT_SYSCALL_RETRY(posix_fadvise(pfh->fd, offset, len, advice), ret);
if (ret == 0)
return (0);
/*
* Treat EINVAL as not-supported, some systems don't support some flags.
- * Quietly fail, callers expect not-supported failures.
+ * Fail quietly (callers expect not-supported failures) and reset the
+ * handle method to prevent future calls.
*/
- if (ret == EINVAL)
+ if (ret == EINVAL) {
+ file_handle->fadvise = NULL;
return (ENOTSUP);
+ }
- WT_RET_MSG(session, ret, "%s: handle-advise: posix_fadvise", fh->name);
-#else
- WT_UNUSED(session);
- WT_UNUSED(fh);
- WT_UNUSED(offset);
- WT_UNUSED(len);
- WT_UNUSED(advice);
+ WT_RET_MSG(session, ret,
+ "%s: handle-advise: posix_fadvise", file_handle->name);
- /* Quietly fail, callers expect not-supported failures. */
- return (ENOTSUP);
-#endif
}
+#endif
/*
* __posix_file_close --
* ANSI C close.
*/
static int
-__posix_file_close(WT_SESSION_IMPL *session, WT_FH *fh)
+__posix_file_close(WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session)
{
WT_DECL_RET;
+ WT_FILE_HANDLE_POSIX *pfh;
+ WT_SESSION_IMPL *session;
+
+ session = (WT_SESSION_IMPL *)wt_session;
+ pfh = (WT_FILE_HANDLE_POSIX *)file_handle;
/* Close the file handle. */
- if (fh->fd == -1)
- return (0);
+ if (pfh->fd != -1) {
+ WT_SYSCALL_RETRY(close(pfh->fd), ret);
+ if (ret != 0)
+ __wt_err(session, ret,
+ "%s: handle-close: close", file_handle->name);
+ }
- WT_SYSCALL_RETRY(close(fh->fd), ret);
- if (ret == 0)
- return (0);
- WT_RET_MSG(session, ret, "%s: handle-close: close", fh->name);
+ __wt_free(session, file_handle->name);
+ __wt_free(session, pfh);
+ return (ret);
}
/*
@@ -315,10 +252,16 @@ __posix_file_close(WT_SESSION_IMPL *session, WT_FH *fh)
* Lock/unlock a file.
*/
static int
-__posix_file_lock(WT_SESSION_IMPL *session, WT_FH *fh, bool lock)
+__posix_file_lock(
+ WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session, bool lock)
{
struct flock fl;
WT_DECL_RET;
+ WT_FILE_HANDLE_POSIX *pfh;
+ WT_SESSION_IMPL *session;
+
+ session = (WT_SESSION_IMPL *)wt_session;
+ pfh = (WT_FILE_HANDLE_POSIX *)file_handle;
/*
* WiredTiger requires this function be able to acquire locks past
@@ -334,10 +277,10 @@ __posix_file_lock(WT_SESSION_IMPL *session, WT_FH *fh, bool lock)
fl.l_type = lock ? F_WRLCK : F_UNLCK;
fl.l_whence = SEEK_SET;
- WT_SYSCALL_RETRY(fcntl(fh->fd, F_SETLK, &fl), ret);
+ WT_SYSCALL_RETRY(fcntl(pfh->fd, F_SETLK, &fl), ret);
if (ret == 0)
return (0);
- WT_RET_MSG(session, ret, "%s: handle-lock: fcntl", fh->name);
+ WT_RET_MSG(session, ret, "%s: handle-lock: fcntl", file_handle->name);
}
/*
@@ -345,16 +288,21 @@ __posix_file_lock(WT_SESSION_IMPL *session, WT_FH *fh, bool lock)
* POSIX pread.
*/
static int
-__posix_file_read(
- WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t offset, size_t len, void *buf)
+__posix_file_read(WT_FILE_HANDLE *file_handle,
+ WT_SESSION *wt_session, wt_off_t offset, size_t len, void *buf)
{
+ WT_FILE_HANDLE_POSIX *pfh;
+ WT_SESSION_IMPL *session;
size_t chunk;
ssize_t nr;
uint8_t *addr;
+ session = (WT_SESSION_IMPL *)wt_session;
+ pfh = (WT_FILE_HANDLE_POSIX *)file_handle;
+
/* Assert direct I/O is aligned and a multiple of the alignment. */
WT_ASSERT(session,
- !fh->direct_io ||
+ !pfh->direct_io ||
S2C(session)->buffer_alignment == 0 ||
(!((uintptr_t)buf &
(uintptr_t)(S2C(session)->buffer_alignment - 1)) &&
@@ -364,11 +312,11 @@ __posix_file_read(
/* Break reads larger than 1GB into 1GB chunks. */
for (addr = buf; len > 0; addr += nr, len -= (size_t)nr, offset += nr) {
chunk = WT_MIN(len, WT_GIGABYTE);
- if ((nr = pread(fh->fd, addr, chunk, offset)) <= 0)
+ if ((nr = pread(pfh->fd, addr, chunk, offset)) <= 0)
WT_RET_MSG(session, nr == 0 ? WT_ERROR : __wt_errno(),
"%s: handle-read: pread: failed to read %"
WT_SIZET_FMT " bytes at offset %" PRIuMAX,
- fh->name, chunk, (uintmax_t)offset);
+ file_handle->name, chunk, (uintmax_t)offset);
}
return (0);
}
@@ -378,17 +326,23 @@ __posix_file_read(
* Get the size of a file in bytes, by file handle.
*/
static int
-__posix_file_size(WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t *sizep)
+__posix_file_size(
+ WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session, wt_off_t *sizep)
{
struct stat sb;
WT_DECL_RET;
+ WT_FILE_HANDLE_POSIX *pfh;
+ WT_SESSION_IMPL *session;
+
+ session = (WT_SESSION_IMPL *)wt_session;
+ pfh = (WT_FILE_HANDLE_POSIX *)file_handle;
- WT_SYSCALL_RETRY(fstat(fh->fd, &sb), ret);
+ WT_SYSCALL_RETRY(fstat(pfh->fd, &sb), ret);
if (ret == 0) {
*sizep = sb.st_size;
return (0);
}
- WT_RET_MSG(session, ret, "%s: handle-size: fstat", fh->name);
+ WT_RET_MSG(session, ret, "%s: handle-size: fstat", file_handle->name);
}
/*
@@ -396,24 +350,62 @@ __posix_file_size(WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t *sizep)
* POSIX fsync.
*/
static int
-__posix_file_sync(WT_SESSION_IMPL *session, WT_FH *fh, bool block)
+__posix_file_sync(WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session)
{
- return (__posix_sync(session, fh->fd, fh->name, "handle-sync", block));
+ WT_FILE_HANDLE_POSIX *pfh;
+ WT_SESSION_IMPL *session;
+
+ session = (WT_SESSION_IMPL *)wt_session;
+ pfh = (WT_FILE_HANDLE_POSIX *)file_handle;
+
+ return (
+ __posix_sync(session, pfh->fd, file_handle->name, "handle-sync"));
}
+#ifdef HAVE_SYNC_FILE_RANGE
+/*
+ * __posix_file_sync_nowait --
+ * Start write-out with sync_file_range, without waiting for completion.
+ */
+static int
+__posix_file_sync_nowait(WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session)
+{
+ WT_DECL_RET;
+ WT_FILE_HANDLE_POSIX *pfh;
+ WT_SESSION_IMPL *session;
+
+ session = (WT_SESSION_IMPL *)wt_session;
+ pfh = (WT_FILE_HANDLE_POSIX *)file_handle;
+
+ WT_SYSCALL_RETRY(sync_file_range(pfh->fd,
+ (off64_t)0, (off64_t)0, SYNC_FILE_RANGE_WRITE), ret);
+ if (ret == 0)
+ return (0);
+ WT_RET_MSG(session, ret,
+ "%s: handle-sync-nowait: sync_file_range", file_handle->name);
+}
+#endif
+
/*
* __posix_file_truncate --
* POSIX ftruncate.
*/
static int
-__posix_file_truncate(WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t len)
+__posix_file_truncate(
+ WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session, wt_off_t len)
{
WT_DECL_RET;
+ WT_FILE_HANDLE_POSIX *pfh;
+ WT_SESSION_IMPL *session;
+
+ session = (WT_SESSION_IMPL *)wt_session;
+ pfh = (WT_FILE_HANDLE_POSIX *)file_handle;
- WT_SYSCALL_RETRY(ftruncate(fh->fd, len), ret);
+ WT_SYSCALL_RETRY(ftruncate(pfh->fd, len), ret);
if (ret == 0)
return (0);
- WT_RET_MSG(session, ret, "%s: handle-truncate: ftruncate", fh->name);
+ WT_RET_MSG(session, ret,
+ "%s: handle-truncate: ftruncate", file_handle->name);
}
/*
@@ -421,16 +413,21 @@ __posix_file_truncate(WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t len)
* POSIX pwrite.
*/
static int
-__posix_file_write(WT_SESSION_IMPL *session,
- WT_FH *fh, wt_off_t offset, size_t len, const void *buf)
+__posix_file_write(WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session,
+ wt_off_t offset, size_t len, const void *buf)
{
+ WT_FILE_HANDLE_POSIX *pfh;
+ WT_SESSION_IMPL *session;
size_t chunk;
ssize_t nw;
const uint8_t *addr;
+ session = (WT_SESSION_IMPL *)wt_session;
+ pfh = (WT_FILE_HANDLE_POSIX *)file_handle;
+
/* Assert direct I/O is aligned and a multiple of the alignment. */
WT_ASSERT(session,
- !fh->direct_io ||
+ !pfh->direct_io ||
S2C(session)->buffer_alignment == 0 ||
(!((uintptr_t)buf &
(uintptr_t)(S2C(session)->buffer_alignment - 1)) &&
@@ -440,21 +437,21 @@ __posix_file_write(WT_SESSION_IMPL *session,
/* Break writes larger than 1GB into 1GB chunks. */
for (addr = buf; len > 0; addr += nw, len -= (size_t)nw, offset += nw) {
chunk = WT_MIN(len, WT_GIGABYTE);
- if ((nw = pwrite(fh->fd, addr, chunk, offset)) < 0)
+ if ((nw = pwrite(pfh->fd, addr, chunk, offset)) < 0)
WT_RET_MSG(session, __wt_errno(),
"%s: handle-write: pwrite: failed to write %"
WT_SIZET_FMT " bytes at offset %" PRIuMAX,
- fh->name, chunk, (uintmax_t)offset);
+ file_handle->name, chunk, (uintmax_t)offset);
}
return (0);
}
/*
- * __posix_file_open_cloexec --
+ * __posix_open_file_cloexec --
* Prevent child access to file handles.
*/
static inline int
-__posix_file_open_cloexec(WT_SESSION_IMPL *session, int fd, const char *name)
+__posix_open_file_cloexec(WT_SESSION_IMPL *session, int fd, const char *name)
{
#if defined(HAVE_FCNTL) && defined(FD_CLOEXEC) && !defined(O_CLOEXEC)
int f;
@@ -479,24 +476,35 @@ __posix_file_open_cloexec(WT_SESSION_IMPL *session, int fd, const char *name)
}
/*
- * __posix_file_open --
+ * __posix_open_file --
* Open a file handle.
*/
static int
-__posix_file_open(WT_SESSION_IMPL *session,
- WT_FH *fh, const char *name, uint32_t file_type, uint32_t flags)
+__posix_open_file(WT_FILE_SYSTEM *file_system, WT_SESSION *wt_session,
+ const char *name, WT_OPEN_FILE_TYPE file_type, uint32_t flags,
+ WT_FILE_HANDLE **file_handlep)
{
WT_CONNECTION_IMPL *conn;
WT_DECL_RET;
+ WT_FILE_HANDLE *file_handle;
+ WT_FILE_HANDLE_POSIX *pfh;
+ WT_SESSION_IMPL *session;
mode_t mode;
- int f, fd, tret;
+ int f;
+ WT_UNUSED(file_system);
+
+ *file_handlep = NULL;
+
+ session = (WT_SESSION_IMPL *)wt_session;
conn = S2C(session);
+ WT_RET(__wt_calloc_one(session, &pfh));
+
/* Set up error handling. */
- fh->fd = fd = -1;
+ pfh->fd = -1;
- if (file_type == WT_FILE_TYPE_DIRECTORY) {
+ if (file_type == WT_OPEN_FILE_TYPE_DIRECTORY) {
f = O_RDONLY;
#ifdef O_CLOEXEC
/*
@@ -507,10 +515,10 @@ __posix_file_open(WT_SESSION_IMPL *session,
f |= O_CLOEXEC;
#endif
WT_SYSCALL_RETRY((
- (fd = open(name, f, 0444)) == -1 ? 1 : 0), ret);
+ (pfh->fd = open(name, f, 0444)) == -1 ? 1 : 0), ret);
if (ret != 0)
WT_ERR_MSG(session, ret, "%s: handle-open: open", name);
- WT_ERR(__posix_file_open_cloexec(session, fd, name));
+ WT_ERR(__posix_open_file_cloexec(session, pfh->fd, name));
goto directory_open;
}
@@ -539,16 +547,17 @@ __posix_file_open(WT_SESSION_IMPL *session,
/* Direct I/O. */
if (LF_ISSET(WT_OPEN_DIRECTIO)) {
f |= O_DIRECT;
- fh->direct_io = true;
- }
+ pfh->direct_io = true;
+ } else
+ pfh->direct_io = false;
#endif
#ifdef O_NOATIME
/* Avoid updating metadata for read-only workloads. */
- if (file_type == WT_FILE_TYPE_DATA)
+ if (file_type == WT_OPEN_FILE_TYPE_DATA)
f |= O_NOATIME;
#endif
- if (file_type == WT_FILE_TYPE_LOG &&
+ if (file_type == WT_OPEN_FILE_TYPE_LOG &&
FLD_ISSET(conn->txn_logsync, WT_LOG_DSYNC)) {
#ifdef O_DSYNC
f |= O_DSYNC;
@@ -560,20 +569,24 @@ __posix_file_open(WT_SESSION_IMPL *session,
#endif
}
- WT_SYSCALL_RETRY(((fd = open(name, f, mode)) == -1 ? 1 : 0), ret);
+ WT_SYSCALL_RETRY(((pfh->fd = open(name, f, mode)) == -1 ? 1 : 0), ret);
if (ret != 0)
WT_ERR_MSG(session, ret,
- fh->direct_io ?
+ pfh->direct_io ?
"%s: handle-open: open: failed with direct I/O configured, "
"some filesystem types do not support direct I/O" :
"%s: handle-open: open", name);
- WT_ERR(__posix_file_open_cloexec(session, fd, name));
+ WT_ERR(__posix_open_file_cloexec(session, pfh->fd, name));
- /* Disable read-ahead on trees: it slows down random read workloads. */
#if defined(HAVE_POSIX_FADVISE)
- if (file_type == WT_FILE_TYPE_DATA) {
+ /*
+ * Disable read-ahead on trees: it slows down random read workloads.
+ * Ignore fadvise when doing direct I/O, the kernel cache isn't
+ * interesting.
+ */
+ if (!pfh->direct_io && file_type == WT_OPEN_FILE_TYPE_DATA) {
WT_SYSCALL_RETRY(
- posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM), ret);
+ posix_fadvise(pfh->fd, 0, 0, POSIX_FADV_RANDOM), ret);
if (ret != 0)
WT_ERR_MSG(session, ret,
"%s: handle-open: posix_fadvise", name);
@@ -581,66 +594,99 @@ __posix_file_open(WT_SESSION_IMPL *session,
#endif
directory_open:
- fh->fd = fd;
-
- /* Configure fallocate calls. */
- __wt_posix_file_allocate_configure(session, fh);
-
- fh->fh_advise = __posix_file_advise;
- fh->fh_allocate = __wt_posix_file_allocate;
- fh->fh_close = __posix_file_close;
- fh->fh_lock = __posix_file_lock;
- fh->fh_map = __wt_posix_map;
- fh->fh_map_discard = __wt_posix_map_discard;
- fh->fh_map_preload = __wt_posix_map_preload;
- fh->fh_map_unmap = __wt_posix_map_unmap;
- fh->fh_read = __posix_file_read;
- fh->fh_size = __posix_file_size;
- fh->fh_sync = __posix_file_sync;
- fh->fh_truncate = __posix_file_truncate;
- fh->fh_write = __posix_file_write;
+ /* Initialize public information. */
+ file_handle = (WT_FILE_HANDLE *)pfh;
+ WT_ERR(__wt_strdup(session, name, &file_handle->name));
+
+ file_handle->close = __posix_file_close;
+#if defined(HAVE_POSIX_FADVISE)
+ /*
+ * Ignore fadvise when doing direct I/O, the kernel cache isn't
+ * interesting.
+ */
+ if (!pfh->direct_io)
+ file_handle->fadvise = __posix_file_advise;
+#endif
+ file_handle->fallocate = __wt_posix_file_fallocate;
+ file_handle->lock = __posix_file_lock;
+#ifdef WORDS_BIGENDIAN
+ /*
+ * The underlying objects are little-endian, so mapping objects isn't
+ * currently supported on big-endian systems.
+ */
+#else
+ file_handle->map = __wt_posix_map;
+#ifdef HAVE_POSIX_MADVISE
+ file_handle->map_discard = __wt_posix_map_discard;
+ file_handle->map_preload = __wt_posix_map_preload;
+#endif
+ file_handle->unmap = __wt_posix_unmap;
+#endif
+ file_handle->read = __posix_file_read;
+ file_handle->size = __posix_file_size;
+ file_handle->sync = __posix_file_sync;
+#ifdef HAVE_SYNC_FILE_RANGE
+ file_handle->sync_nowait = __posix_file_sync_nowait;
+#endif
+ file_handle->truncate = __posix_file_truncate;
+ file_handle->write = __posix_file_write;
+
+ *file_handlep = file_handle;
return (0);
-err: if (fd != -1) {
- WT_SYSCALL_RETRY(close(fd), tret);
- if (tret != 0)
- __wt_err(session, tret, "%s: handle-open: close", name);
- }
+err: WT_TRET(__posix_file_close((WT_FILE_HANDLE *)pfh, wt_session));
return (ret);
}
/*
- * __wt_os_posix --
- * Initialize a POSIX configuration.
+ * __posix_terminate --
+ * Terminate a POSIX configuration.
*/
-int
-__wt_os_posix(WT_SESSION_IMPL *session)
+static int
+__posix_terminate(WT_FILE_SYSTEM *file_system, WT_SESSION *wt_session)
{
- WT_CONNECTION_IMPL *conn;
+ WT_SESSION_IMPL *session;
- conn = S2C(session);
+ WT_UNUSED(file_system);
- /* Initialize the POSIX jump table. */
- conn->file_directory_list = __wt_posix_directory_list;
- conn->file_directory_sync = __posix_directory_sync;
- conn->file_exist = __posix_fs_exist;
- conn->file_open = __posix_file_open;
- conn->file_remove = __posix_fs_remove;
- conn->file_rename = __posix_fs_rename;
- conn->file_size = __posix_fs_size;
+ session = (WT_SESSION_IMPL *)wt_session;
+ __wt_free(session, file_system);
return (0);
}
/*
- * __wt_os_posix_cleanup --
- * Discard a POSIX configuration.
+ * __wt_os_posix --
+ * Initialize a POSIX configuration.
*/
int
-__wt_os_posix_cleanup(WT_SESSION_IMPL *session)
+__wt_os_posix(WT_SESSION_IMPL *session)
{
- WT_UNUSED(session);
+ WT_CONNECTION_IMPL *conn;
+ WT_FILE_SYSTEM *file_system;
+
+ conn = S2C(session);
+
+ WT_RET(__wt_calloc_one(session, &file_system));
+
+ /* Initialize the POSIX jump table. */
+ file_system->directory_list = __wt_posix_directory_list;
+ file_system->directory_list_free = __wt_posix_directory_list_free;
+#ifdef __linux__
+ file_system->directory_sync = __posix_directory_sync;
+#else
+ file_system->directory_sync = NULL;
+#endif
+ file_system->exist = __posix_fs_exist;
+ file_system->open_file = __posix_open_file;
+ file_system->remove = __posix_fs_remove;
+ file_system->rename = __posix_fs_rename;
+ file_system->size = __posix_fs_size;
+ file_system->terminate = __posix_terminate;
+
+ /* Switch it into place. */
+ conn->file_system = file_system;
return (0);
}
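
The jump table built in __wt_os_posix above leaves directory_sync NULL on non-Linux systems, and the commit message notes that upper-level code must check for missing file-system methods. A minimal sketch of that calling convention follows. The DEMO_FS type and the demo_ functions are hypothetical. Treating a NULL directory_sync as success is an assumption, chosen because the removed non-Linux branch of __posix_directory_sync returned 0.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical method table, standing in for WT_FILE_SYSTEM. */
typedef struct demo_fs DEMO_FS;
struct demo_fs {
	int (*directory_sync)(DEMO_FS *, const char *path);
	int (*exist)(DEMO_FS *, const char *name, int *existp);
};

/* Stand-in implementation: any non-empty name "exists". */
static int
demo_exist(DEMO_FS *fs, const char *name, int *existp)
{
	(void)fs;
	*existp = (name != NULL && name[0] != '\0');
	return (0);
}

/* Optional method: a NULL slot means there is nothing to do. */
static int
demo_fs_directory_sync(DEMO_FS *fs, const char *path)
{
	if (fs->directory_sync == NULL)
		return (0);
	return (fs->directory_sync(fs, path));
}

/* Required method: report a missing slot instead of dereferencing it. */
static int
demo_fs_exist(DEMO_FS *fs, const char *name, int *existp)
{
	if (fs->exist == NULL)
		return (ENOTSUP);
	return (fs->exist(fs, name, existp));
}
```

Keeping the NULL checks in thin wrappers like these means a custom file system only has to fill in the methods it supports, and every call site behaves the same way when a method is absent.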
diff --git a/src/os_posix/os_map.c b/src/os_posix/os_map.c
index de28891ffd1..7fde4037250 100644
--- a/src/os_posix/os_map.c
+++ b/src/os_posix/os_map.c
@@ -13,23 +13,26 @@
* Map a file into memory.
*/
int
-__wt_posix_map(WT_SESSION_IMPL *session,
- WT_FH *fh, void *mapp, size_t *lenp, void **mappingcookie)
+__wt_posix_map(WT_FILE_HANDLE *fh, WT_SESSION *wt_session,
+ void *mapped_regionp, size_t *lenp, void *mapped_cookiep)
{
+ WT_FILE_HANDLE_POSIX *pfh;
+ WT_SESSION_IMPL *session;
size_t len;
wt_off_t file_size;
void *map;
- WT_UNUSED(mappingcookie);
+ WT_UNUSED(mapped_cookiep);
- WT_ASSERT(session, !F_ISSET(S2C(session), WT_CONN_IN_MEMORY));
+ session = (WT_SESSION_IMPL *)wt_session;
+ pfh = (WT_FILE_HANDLE_POSIX *)fh;
/*
* Mapping isn't possible if direct I/O configured for the file, the
* Linux open(2) documentation says applications should avoid mixing
* mmap(2) of files with direct I/O to the same files.
*/
- if (fh->direct_io)
+ if (pfh->direct_io)
return (ENOTSUP);
/*
@@ -37,7 +40,7 @@ __wt_posix_map(WT_SESSION_IMPL *session,
* underneath us, our caller needs to ensure consistency of the mapped
* region vs. any other file activity.
*/
- WT_RET(__wt_filesize(session, fh, &file_size));
+ WT_RET(fh->size(fh, wt_session, &file_size));
len = (size_t)file_size;
(void)__wt_verbose(session, WT_VERB_HANDLEOPS,
@@ -49,43 +52,48 @@ __wt_posix_map(WT_SESSION_IMPL *session,
MAP_NOCORE |
#endif
MAP_PRIVATE,
- fh->fd, (wt_off_t)0)) == MAP_FAILED)
+ pfh->fd, (wt_off_t)0)) == MAP_FAILED)
WT_RET_MSG(session,
__wt_errno(), "%s: memory-map: mmap", fh->name);
- *(void **)mapp = map;
+ *(void **)mapped_regionp = map;
*lenp = len;
return (0);
}
#ifdef HAVE_POSIX_MADVISE
/*
- * __posix_map_preload_madvise --
+ * __wt_posix_map_preload --
* Cause a section of a memory map to be faulted in.
*/
-static int
-__posix_map_preload_madvise(
- WT_SESSION_IMPL *session, WT_FH *fh, const void *p, size_t size)
+int
+__wt_posix_map_preload(WT_FILE_HANDLE *fh,
+ WT_SESSION *wt_session, const void *map, size_t length, void *mapped_cookie)
{
WT_BM *bm;
WT_CONNECTION_IMPL *conn;
WT_DECL_RET;
+ WT_SESSION_IMPL *session;
void *blk;
+ WT_UNUSED(mapped_cookie);
+
+ session = (WT_SESSION_IMPL *)wt_session;
+
conn = S2C(session);
bm = S2BT(session)->bm;
/* Linux requires the address be aligned to a 4KB boundary. */
- blk = (void *)((uintptr_t)p & ~(uintptr_t)(conn->page_size - 1));
- size += WT_PTRDIFF(p, blk);
+ blk = (void *)((uintptr_t)map & ~(uintptr_t)(conn->page_size - 1));
+ length += WT_PTRDIFF(map, blk);
/* XXX proxy for "am I doing a scan?" -- manual read-ahead */
if (F_ISSET(session, WT_SESSION_NO_CACHE)) {
/* Read in 2MB blocks every 1MB of data. */
- if (((uintptr_t)((uint8_t *)blk + size) &
+ if (((uintptr_t)((uint8_t *)blk + length) &
(uintptr_t)((1<<20) - 1)) < (uintptr_t)blk)
return (0);
- size = WT_MIN(WT_MAX(20 * size, 2 << 20),
+ length = WT_MIN(WT_MAX(20 * length, 2 << 20),
WT_PTRDIFF((uint8_t *)bm->map + bm->maplen, blk));
}
@@ -93,10 +101,10 @@ __posix_map_preload_madvise(
* Manual pages aren't clear on whether alignment is required for the
* size, so we will be conservative.
*/
- size &= ~(size_t)(conn->page_size - 1);
+ length &= ~(size_t)(conn->page_size - 1);
- if (size <= (size_t)conn->page_size ||
- (ret = posix_madvise(blk, size, POSIX_MADV_WILLNEED)) == 0)
+ if (length <= (size_t)conn->page_size ||
+ (ret = posix_madvise(blk, length, POSIX_MADV_WILLNEED)) == 0)
return (0);
WT_RET_MSG(session, ret,
"%s: memory-map preload: posix_madvise: POSIX_MADV_WILLNEED",
@@ -104,46 +112,30 @@ __posix_map_preload_madvise(
}
#endif
-/*
- * __wt_posix_map_preload --
- * Cause a section of a memory map to be faulted in.
- */
-int
-__wt_posix_map_preload(
- WT_SESSION_IMPL *session, WT_FH *fh, const void *p, size_t size)
-{
- WT_ASSERT(session, !F_ISSET(S2C(session), WT_CONN_IN_MEMORY));
-
-#ifdef HAVE_POSIX_MADVISE
- return (__posix_map_preload_madvise(session, fh, p, size));
-#else
- WT_UNUSED(fh);
- WT_UNUSED(p);
- WT_UNUSED(size);
- return (ENOTSUP);
-#endif
-}
-
#ifdef HAVE_POSIX_MADVISE
/*
- * __posix_map_discard_madvise --
+ * __wt_posix_map_discard --
* Discard a chunk of the memory map.
*/
-static int
-__posix_map_discard_madvise(
- WT_SESSION_IMPL *session, WT_FH *fh, void *p, size_t size)
+int
+__wt_posix_map_discard(WT_FILE_HANDLE *fh,
+ WT_SESSION *wt_session, void *map, size_t length, void *mapped_cookie)
{
WT_CONNECTION_IMPL *conn;
WT_DECL_RET;
+ WT_SESSION_IMPL *session;
void *blk;
+ WT_UNUSED(mapped_cookie);
+
+ session = (WT_SESSION_IMPL *)wt_session;
conn = S2C(session);
/* Linux requires the address be aligned to a 4KB boundary. */
- blk = (void *)((uintptr_t)p & ~(uintptr_t)(conn->page_size - 1));
- size += WT_PTRDIFF(p, blk);
+ blk = (void *)((uintptr_t)map & ~(uintptr_t)(conn->page_size - 1));
+ length += WT_PTRDIFF(map, blk);
- if ((ret = posix_madvise(blk, size, POSIX_MADV_DONTNEED)) == 0)
+ if ((ret = posix_madvise(blk, length, POSIX_MADV_DONTNEED)) == 0)
return (0);
WT_RET_MSG(session, ret,
"%s: memory-map discard: posix_madvise: POSIX_MADV_DONTNEED",
@@ -152,41 +144,23 @@ __posix_map_discard_madvise(
#endif
/*
- * __wt_posix_map_discard --
- * Discard a chunk of the memory map.
- */
-int
-__wt_posix_map_discard(
- WT_SESSION_IMPL *session, WT_FH *fh, void *p, size_t size)
-{
- WT_ASSERT(session, !F_ISSET(S2C(session), WT_CONN_IN_MEMORY));
-
-#ifdef HAVE_POSIX_MADVISE
- return (__posix_map_discard_madvise(session, fh, p, size));
-#else
- WT_UNUSED(fh);
- WT_UNUSED(p);
- WT_UNUSED(size);
- return (ENOTSUP);
-#endif
-}
-
-/*
- * __wt_posix_map_unmap --
+ * __wt_posix_unmap --
* Remove a memory mapping.
*/
int
-__wt_posix_map_unmap(WT_SESSION_IMPL *session,
- WT_FH *fh, void *map, size_t len, void **mappingcookie)
+__wt_posix_unmap(WT_FILE_HANDLE *fh, WT_SESSION *wt_session,
+ void *mapped_region, size_t len, void *mapped_cookie)
{
- WT_UNUSED(mappingcookie);
+ WT_SESSION_IMPL *session;
+
+ WT_UNUSED(mapped_cookie);
- WT_ASSERT(session, !F_ISSET(S2C(session), WT_CONN_IN_MEMORY));
+ session = (WT_SESSION_IMPL *)wt_session;
(void)__wt_verbose(session, WT_VERB_HANDLEOPS,
"%s: memory-unmap: %" WT_SIZET_FMT " bytes", fh->name, len);
- if (munmap(map, len) == 0)
+ if (munmap(mapped_region, len) == 0)
return (0);
WT_RET_MSG(session, __wt_errno(), "%s: memory-unmap: munmap", fh->name);
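
Both __wt_posix_map_preload and __wt_posix_map_discard in this file round the caller's address down to a page boundary and grow the length by the bytes skipped, and the preload path also trims the length to whole pages. The arithmetic can be sketched in isolation as below. The demo_ names are hypothetical, and the sketch assumes the page size is a power of two, which the masking in the diff also depends on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * demo_align_region --
 *	Round an address down to a page boundary and extend the length to
 *	cover the bytes between the boundary and the original address.
 */
static void
demo_align_region(uintptr_t addr, size_t len, size_t page_size,
    uintptr_t *blkp, size_t *lenp)
{
	uintptr_t blk;

	/* The mask below is only valid for a power-of-two page size. */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	blk = addr & ~(uintptr_t)(page_size - 1);
	*blkp = blk;
	*lenp = len + (size_t)(addr - blk);
}

/*
 * demo_trim_length --
 *	Truncate a length to a whole number of pages, the conservative
 *	choice when it is unclear whether the size must be aligned.
 */
static size_t
demo_trim_length(size_t len, size_t page_size)
{
	return (len & ~(page_size - 1));
}
```

For example, with 4KB pages an address of 0x12345 rounds down to 0x12000 and a 100-byte request grows by 0x345 bytes so it still covers the original range.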
diff --git a/src/os_win/os_dir.c b/src/os_win/os_dir.c
index 64eae60983c..6f796f6ef7d 100644
--- a/src/os_win/os_dir.c
+++ b/src/os_win/os_dir.c
@@ -13,34 +13,37 @@
* Get a list of files from a directory, MSVC version.
*/
int
-__wt_win_directory_list(WT_SESSION_IMPL *session, const char *dir,
- const char *prefix, uint32_t flags, char ***dirlist, u_int *countp)
+__wt_win_directory_list(WT_FILE_SYSTEM *file_system,
+ WT_SESSION *wt_session, const char *directory,
+ const char *prefix, char ***dirlistp, uint32_t *countp)
{
HANDLE findhandle;
WIN32_FIND_DATA finddata;
WT_DECL_ITEM(pathbuf);
WT_DECL_RET;
+ WT_SESSION_IMPL *session;
size_t dirallocsz, pathlen;
- u_int count, dirsz;
- bool match;
- char **entries, *path;
+ uint32_t count;
+ char *dir_copy, **entries;
- *dirlist = NULL;
- *countp = 0;
+ WT_UNUSED(file_system);
- WT_RET(__wt_filename(session, dir, &path));
+ session = (WT_SESSION_IMPL *)wt_session;
- pathlen = strlen(path);
- if (path[pathlen - 1] == '\\')
- path[pathlen - 1] = '\0';
- WT_ERR(__wt_scr_alloc(session, pathlen + 3, &pathbuf));
- WT_ERR(__wt_buf_fmt(session, pathbuf, "%s\\*", path));
+ *dirlistp = NULL;
+ *countp = 0;
findhandle = INVALID_HANDLE_VALUE;
dirallocsz = 0;
- dirsz = 0;
entries = NULL;
+ WT_ERR(__wt_strdup(session, directory, &dir_copy));
+ pathlen = strlen(dir_copy);
+ if (dir_copy[pathlen - 1] == '\\')
+ dir_copy[pathlen - 1] = '\0';
+ WT_ERR(__wt_scr_alloc(session, pathlen + 3, &pathbuf));
+ WT_ERR(__wt_buf_fmt(session, pathbuf, "%s\\*", dir_copy));
+
findhandle = FindFirstFileA(pathbuf->data, &finddata);
if (findhandle == INVALID_HANDLE_VALUE)
WT_ERR_MSG(session, __wt_getlasterror(),
@@ -56,46 +59,54 @@ __wt_win_directory_list(WT_SESSION_IMPL *session, const char *dir,
continue;
/* The list of files is optionally filtered by a prefix. */
- match = false;
if (prefix != NULL &&
- ((LF_ISSET(WT_DIRLIST_INCLUDE) &&
- WT_PREFIX_MATCH(finddata.cFileName, prefix)) ||
- (LF_ISSET(WT_DIRLIST_EXCLUDE) &&
- !WT_PREFIX_MATCH(finddata.cFileName, prefix))))
- match = true;
- if (prefix == NULL || match) {
- /*
- * We have a file name we want to return.
- */
- count++;
- if (count > dirsz) {
- dirsz += WT_DIR_ENTRY;
- WT_ERR(__wt_realloc_def(session,
- &dirallocsz, dirsz, &entries));
- }
- WT_ERR(__wt_strdup(session,
- finddata.cFileName, &entries[count - 1]));
- }
+ !WT_PREFIX_MATCH(finddata.cFileName, prefix))
+ continue;
+
+ WT_ERR(__wt_realloc_def(
+ session, &dirallocsz, count + 1, &entries));
+ WT_ERR(__wt_strdup(
+ session, finddata.cFileName, &entries[count]));
+ ++count;
} while (FindNextFileA(findhandle, &finddata) != 0);
- if (count > 0)
- *dirlist = entries;
+
+ *dirlistp = entries;
*countp = count;
err: if (findhandle != INVALID_HANDLE_VALUE)
(void)FindClose(findhandle);
- __wt_free(session, path);
+ __wt_free(session, dir_copy);
__wt_scr_free(session, &pathbuf);
if (ret == 0)
return (0);
- if (*dirlist != NULL) {
- for (count = dirsz; count > 0; count--)
- __wt_free(session, entries[count]);
- __wt_free(session, entries);
- }
+ WT_TRET(__wt_win_directory_list_free(
+ file_system, wt_session, entries, count));
WT_RET_MSG(session, ret,
"%s: directory-list, prefix \"%s\"",
- dir, prefix == NULL ? "" : prefix);
+ directory, prefix == NULL ? "" : prefix);
+}
+
+/*
+ * __wt_win_directory_list_free --
+ * Free memory returned by __wt_win_directory_list, Windows version.
+ */
+int
+__wt_win_directory_list_free(WT_FILE_SYSTEM *file_system,
+ WT_SESSION *wt_session, char **dirlist, uint32_t count)
+{
+ WT_SESSION_IMPL *session;
+
+ WT_UNUSED(file_system);
+
+ session = (WT_SESSION_IMPL *)wt_session;
+
+ if (dirlist != NULL) {
+ while (count > 0)
+ __wt_free(session, dirlist[--count]);
+ __wt_free(session, dirlist);
+ }
+ return (0);
}
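Reviewer note on the os_dir.c change: the rewritten loop drops the include/exclude flags, so a non-NULL prefix now only selects matching names, each accepted name asks for room for count + 1 entries, and cleanup moves into a separate list-free routine that is also used on the error path. The sketch below mirrors that shape in portable C. The names prefix_match, list_filter and list_free are hypothetical stand-ins (for WT_PREFIX_MATCH, the listing loop and the free routine), and plain realloc/malloc replace the WiredTiger allocators, so treat it as an illustration under those assumptions rather than the library's code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for WT_PREFIX_MATCH: true if str starts with pfx. */
static int
prefix_match(const char *str, const char *pfx)
{
    return (strncmp(str, pfx, strlen(pfx)) == 0);
}

/* Free a list built by list_filter; a NULL list is a no-op. */
static void
list_free(char **list, uint32_t count)
{
    if (list == NULL)
        return;
    while (count > 0)
        free(list[--count]);
    free(list);
}

/*
 * Copy the names that match an optional prefix into a newly allocated list.
 * Returns 0 on success, -1 on allocation failure (nothing is leaked).
 */
static int
list_filter(const char **names, uint32_t n, const char *prefix,
    char ***listp, uint32_t *countp)
{
    char **entries, **tmp;
    size_t len;
    uint32_t count, i;

    *listp = NULL;
    *countp = 0;
    entries = NULL;
    count = 0;

    for (i = 0; i < n; ++i) {
        /* The list is optionally filtered by a prefix. */
        if (prefix != NULL && !prefix_match(names[i], prefix))
            continue;

        /* Make room for one more entry, then copy the name. */
        tmp = realloc(entries, ((size_t)count + 1) * sizeof(char *));
        if (tmp == NULL)
            goto err;
        entries = tmp;
        len = strlen(names[i]) + 1;
        if ((entries[count] = malloc(len)) == NULL)
            goto err;
        memcpy(entries[count], names[i], len);
        ++count;
    }

    *listp = entries;
    *countp = count;
    return (0);

err:
    /* Only the first count slots are populated, so free exactly those. */
    list_free(entries, count);
    return (-1);
}
```

Because the count is only incremented after a successful copy, the error path can hand the partially built list straight to the free routine, which is the same reason the diff can call the list-free function from its err label.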
diff --git a/src/os_win/os_dlopen.c b/src/os_win/os_dlopen.c
index ce949e4ea5f..9289c8f6488 100644
--- a/src/os_win/os_dlopen.c
+++ b/src/os_win/os_dlopen.c
@@ -20,6 +20,7 @@ __wt_dlopen(WT_SESSION_IMPL *session, const char *path, WT_DLH **dlhp)
WT_RET(__wt_calloc_one(session, &dlh));
WT_ERR(__wt_strdup(session, path, &dlh->name));
+ WT_ERR(__wt_strdup(session, path == NULL ? "local" : path, &dlh->name));
/* NULL means load from the current binary */
if (path == NULL) {
diff --git a/src/os_win/os_fs.c b/src/os_win/os_fs.c
index afe3a074374..318ff723829 100644
--- a/src/os_win/os_fs.c
+++ b/src/os_win/os_fs.c
@@ -9,34 +9,21 @@
#include "wt_internal.h"
/*
- * __win_directory_sync --
- * Flush a directory to ensure a file creation is durable.
- */
-static int
-__win_directory_sync(WT_SESSION_IMPL *session, const char *path)
-{
- WT_UNUSED(session);
- WT_UNUSED(path);
- return (0);
-}
-
-/*
- * __win_file_exist --
+ * __win_fs_exist --
* Return if the file exists.
*/
static int
-__win_file_exist(WT_SESSION_IMPL *session, const char *name, bool *existp)
+__win_fs_exist(WT_FILE_SYSTEM *file_system,
+ WT_SESSION *wt_session, const char *name, bool *existp)
{
WT_DECL_RET;
- char *path;
-
- WT_RET(__wt_filename(session, name, &path));
+ WT_SESSION_IMPL *session;
- ret = GetFileAttributesA(path);
+ WT_UNUSED(file_system);
- __wt_free(session, path);
+ session = (WT_SESSION_IMPL *)wt_session;
- if (ret != INVALID_FILE_ATTRIBUTES)
+ if (GetFileAttributesA(name) != INVALID_FILE_ATTRIBUTES)
*existp = true;
else
*existp = false;
@@ -45,142 +32,96 @@ __win_file_exist(WT_SESSION_IMPL *session, const char *name, bool *existp)
}
/*
- * __win_file_remove --
+ * __win_fs_remove --
* Remove a file.
*/
static int
-__win_file_remove(WT_SESSION_IMPL *session, const char *name)
+__win_fs_remove(
+ WT_FILE_SYSTEM *file_system, WT_SESSION *wt_session, const char *name)
{
WT_DECL_RET;
- char *path;
+ WT_SESSION_IMPL *session;
-#ifdef HAVE_DIAGNOSTIC
- if (__wt_handle_search(session, name, false, NULL, NULL))
- WT_RET_MSG(session, EINVAL,
- "%s: file-remove: file has open handles", name);
-#endif
+ WT_UNUSED(file_system);
- WT_RET(__wt_filename(session, name, &path));
- name = path;
+ session = (WT_SESSION_IMPL *)wt_session;
- if (DeleteFileA(name) == FALSE) {
- ret = __wt_getlasterror();
- __wt_err(session, ret, "%s: file-remove: DeleteFileA", name);
- }
+ if (DeleteFileA(name) == FALSE)
+ WT_RET_MSG(session, __wt_getlasterror(),
+ "%s: file-remove: DeleteFileA", name);
- __wt_free(session, path);
- return (ret);
+ return (0);
}
/*
- * __win_file_rename --
+ * __win_fs_rename --
* Rename a file.
*/
static int
-__win_file_rename(WT_SESSION_IMPL *session, const char *from, const char *to)
+__win_fs_rename(WT_FILE_SYSTEM *file_system,
+ WT_SESSION *wt_session, const char *from, const char *to)
{
WT_DECL_RET;
- char *from_path, *to_path;
+ WT_SESSION_IMPL *session;
-#ifdef HAVE_DIAGNOSTIC
- if (__wt_handle_search(session, from, false, NULL, NULL))
- WT_RET_MSG(session, EINVAL,
- "%s: file-rename: file has open handles", from);
- if (__wt_handle_search(session, to, false, NULL, NULL))
- WT_RET_MSG(session, EINVAL,
- "%s: file-rename: file has open handles", to);
-#endif
+ WT_UNUSED(file_system);
- from_path = to_path = NULL;
- WT_ERR(__wt_filename(session, from, &from_path));
- from = from_path;
- WT_ERR(__wt_filename(session, to, &to_path));
- to = to_path;
+ session = (WT_SESSION_IMPL *)wt_session;
/*
* Check if file exists since Windows does not override the file if
* it exists.
*/
if (GetFileAttributesA(to) != INVALID_FILE_ATTRIBUTES)
- if (DeleteFileA(to) == FALSE) {
- ret = __wt_getlasterror();
- __wt_err(session, ret,
+ if (DeleteFileA(to) == FALSE)
+ WT_RET_MSG(session, __wt_getlasterror(),
"%s to %s: file-rename: rename", from, to);
- }
- if (ret == 0 && MoveFileA(from, to) == FALSE) {
- ret = __wt_getlasterror();
- __wt_err(session, ret,
+ if (MoveFileA(from, to) == FALSE)
+ WT_RET_MSG(session, __wt_getlasterror(),
"%s to %s: file-rename: rename", from, to);
- }
-err: __wt_free(session, from_path);
- __wt_free(session, to_path);
- return (ret);
+ return (0);
}
/*
- * __win_file_size --
+ * __wt_win_fs_size --
* Get the size of a file in bytes, by file name.
*/
-static int
-__win_file_size(
- WT_SESSION_IMPL *session, const char *name, bool silent, wt_off_t *sizep)
+int
+__wt_win_fs_size(WT_FILE_SYSTEM *file_system,
+ WT_SESSION *wt_session, const char *name, wt_off_t *sizep)
{
WIN32_FILE_ATTRIBUTE_DATA data;
- WT_DECL_RET;
- char *path;
-
- WT_RET(__wt_filename(session, name, &path));
+ WT_SESSION_IMPL *session;
- ret = GetFileAttributesExA(path, GetFileExInfoStandard, &data);
+ WT_UNUSED(file_system);
- __wt_free(session, path);
+ session = (WT_SESSION_IMPL *)wt_session;
- if (ret != 0) {
+ if (GetFileAttributesExA(name, GetFileExInfoStandard, &data) != 0) {
*sizep =
((int64_t)data.nFileSizeHigh << 32) | data.nFileSizeLow;
return (0);
}
- /*
- * Some callers of this function expect failure if the file doesn't
- * exist, and don't want an error message logged.
- */
- ret = __wt_getlasterror();
- if (!silent)
- WT_RET_MSG(session, ret,
- "%s: file-size: GetFileAttributesEx", name);
- return (ret);
-}
-
-/*
- * __win_handle_allocate_configure --
- * Configure fallocate behavior for a file handle.
- */
-static void
-__win_handle_allocate_configure(WT_SESSION_IMPL *session, WT_FH *fh)
-{
- WT_UNUSED(session);
-
- /*
- * fallocate on Windows would be implemented using SetEndOfFile, which
- * can also truncate the file. WiredTiger expects fallocate to ignore
- * requests to truncate the file which Windows does not do, so we don't
- * support the call.
- */
- fh->fallocate_available = WT_FALLOCATE_NOT_AVAILABLE;
- fh->fallocate_requires_locking = false;
+ WT_RET_MSG(session, __wt_getlasterror(),
+ "%s: file-size: GetFileAttributesEx", name);
}
/*
- * __win_handle_close --
+ * __win_file_close --
* ANSI C close.
*/
static int
-__win_handle_close(WT_SESSION_IMPL *session, WT_FH *fh)
+__win_file_close(WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session)
{
WT_DECL_RET;
+ WT_FILE_HANDLE_WIN *win_fh;
+ WT_SESSION_IMPL *session;
+
+ win_fh = (WT_FILE_HANDLE_WIN *)file_handle;
+ session = (WT_SESSION_IMPL *)wt_session;
/*
* Close the primary and secondary handles.
@@ -189,31 +130,40 @@ __win_handle_close(WT_SESSION_IMPL *session, WT_FH *fh)
* flushing, as it's not necessary (or possible) to flush a directory
* on Windows. Confirm the file handle is open before closing it.
*/
- if (fh->filehandle != INVALID_HANDLE_VALUE &&
- CloseHandle(fh->filehandle) == 0) {
+ if (win_fh->filehandle != INVALID_HANDLE_VALUE &&
+ CloseHandle(win_fh->filehandle) == 0) {
ret = __wt_getlasterror();
__wt_err(session, ret,
- "%s: handle-close: CloseHandle", fh->name);
+ "%s: handle-close: CloseHandle", file_handle->name);
}
- if (fh->filehandle_secondary != INVALID_HANDLE_VALUE &&
- CloseHandle(fh->filehandle_secondary) == 0) {
+ if (win_fh->filehandle_secondary != INVALID_HANDLE_VALUE &&
+ CloseHandle(win_fh->filehandle_secondary) == 0) {
ret = __wt_getlasterror();
__wt_err(session, ret,
- "%s: handle-close: secondary: CloseHandle", fh->name);
+ "%s: handle-close: secondary: CloseHandle",
+ file_handle->name);
}
+ __wt_free(session, file_handle->name);
+ __wt_free(session, win_fh);
return (ret);
}
/*
- * __win_handle_lock --
+ * __win_file_lock --
* Lock/unlock a file.
*/
static int
-__win_handle_lock(WT_SESSION_IMPL *session, WT_FH *fh, bool lock)
+__win_file_lock(
+ WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session, bool lock)
{
WT_DECL_RET;
+ WT_FILE_HANDLE_WIN *win_fh;
+ WT_SESSION_IMPL *session;
+
+ win_fh = (WT_FILE_HANDLE_WIN *)file_handle;
+ session = (WT_SESSION_IMPL *)wt_session;
/*
* WiredTiger requires this function be able to acquire locks past
@@ -231,37 +181,42 @@ __win_handle_lock(WT_SESSION_IMPL *session, WT_FH *fh, bool lock)
* This is useful to coordinate adding records to the end of a file.
*/
if (lock) {
- if (LockFile(fh->filehandle, 0, 0, 1, 0) == FALSE) {
+ if (LockFile(win_fh->filehandle, 0, 0, 1, 0) == FALSE) {
ret = __wt_getlasterror();
__wt_err(session, ret,
- "%s: handle-lock: LockFile", fh->name);
+ "%s: handle-lock: LockFile", file_handle->name);
}
} else
- if (UnlockFile(fh->filehandle, 0, 0, 1, 0) == FALSE) {
+ if (UnlockFile(win_fh->filehandle, 0, 0, 1, 0) == FALSE) {
ret = __wt_getlasterror();
__wt_err(session, ret,
- "%s: handle-lock: UnlockFile", fh->name);
+ "%s: handle-lock: UnlockFile", file_handle->name);
}
return (ret);
}
/*
- * __win_handle_read --
+ * __win_file_read --
* Read a chunk.
*/
static int
-__win_handle_read(
- WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t offset, size_t len, void *buf)
+__win_file_read(WT_FILE_HANDLE *file_handle,
+ WT_SESSION *wt_session, wt_off_t offset, size_t len, void *buf)
{
DWORD chunk, nr;
uint8_t *addr;
OVERLAPPED overlapped = { 0 };
+ WT_FILE_HANDLE_WIN *win_fh;
+ WT_SESSION_IMPL *session;
+
+ win_fh = (WT_FILE_HANDLE_WIN *)file_handle;
+ session = (WT_SESSION_IMPL *)wt_session;
nr = 0;
/* Assert direct I/O is aligned and a multiple of the alignment. */
WT_ASSERT(session,
- !fh->direct_io ||
+ !win_fh->direct_io ||
S2C(session)->buffer_alignment == 0 ||
(!((uintptr_t)buf &
(uintptr_t)(S2C(session)->buffer_alignment - 1)) &&
@@ -274,42 +229,54 @@ __win_handle_read(
overlapped.Offset = UINT32_MAX & offset;
overlapped.OffsetHigh = UINT32_MAX & (offset >> 32);
- if (!ReadFile(fh->filehandle, addr, chunk, &nr, &overlapped))
+ if (!ReadFile(
+ win_fh->filehandle, addr, chunk, &nr, &overlapped))
WT_RET_MSG(session,
__wt_getlasterror(),
"%s: handle-read: ReadFile: failed to read %lu "
"bytes at offset %" PRIuMAX,
- fh->name, chunk, (uintmax_t)offset);
+ file_handle->name, chunk, (uintmax_t)offset);
}
return (0);
}
/*
- * __win_handle_size --
+ * __win_file_size --
* Get the size of a file in bytes, by file handle.
*/
static int
-__win_handle_size(WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t *sizep)
+__win_file_size(
+ WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session, wt_off_t *sizep)
{
+ WT_FILE_HANDLE_WIN *win_fh;
+ WT_SESSION_IMPL *session;
LARGE_INTEGER size;
- if (GetFileSizeEx(fh->filehandle, &size) != 0) {
+ win_fh = (WT_FILE_HANDLE_WIN *)file_handle;
+ session = (WT_SESSION_IMPL *)wt_session;
+
+ if (GetFileSizeEx(win_fh->filehandle, &size) != 0) {
*sizep = size.QuadPart;
return (0);
}
- WT_RET_MSG(session,
- __wt_getlasterror(), "%s: handle-size: GetFileSizeEx", fh->name);
+ WT_RET_MSG(session, __wt_getlasterror(),
+ "%s: handle-size: GetFileSizeEx", file_handle->name);
}
/*
- * __win_handle_sync --
+ * __win_file_sync --
* MSVC fsync.
*/
static int
-__win_handle_sync(WT_SESSION_IMPL *session, WT_FH *fh, bool block)
+__win_file_sync(WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session)
{
WT_DECL_RET;
+ WT_FILE_HANDLE_WIN *win_fh;
+ WT_SESSION_IMPL *session;
+
+ win_fh = (WT_FILE_HANDLE_WIN *)file_handle;
+ session = (WT_SESSION_IMPL *)wt_session;
/*
* We don't open Windows system handles when opening directories
@@ -317,72 +284,79 @@ __win_handle_sync(WT_SESSION_IMPL *session, WT_FH *fh, bool block)
* a directory on Windows. Confirm the file handle is set before
* attempting to sync it.
*/
- if (fh->filehandle == INVALID_HANDLE_VALUE)
+ if (win_fh->filehandle == INVALID_HANDLE_VALUE)
return (0);
- /*
- * Callers attempting asynchronous flush handle ENOTSUP returns,
- * and won't make further attempts.
- */
- if (!block)
- return (ENOTSUP);
-
- if (FlushFileBuffers(fh->filehandle) == FALSE) {
+ if (FlushFileBuffers(win_fh->filehandle) == FALSE) {
ret = __wt_getlasterror();
WT_RET_MSG(session, ret,
- "%s handle-sync: FlushFileBuffers error", fh->name);
+ "%s handle-sync: FlushFileBuffers error",
+ file_handle->name);
}
return (0);
}
/*
- * __win_handle_truncate --
+ * __win_file_truncate --
* Truncate a file.
*/
static int
-__win_handle_truncate(WT_SESSION_IMPL *session, WT_FH *fh, wt_off_t len)
+__win_file_truncate(
+ WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session, wt_off_t len)
{
WT_DECL_RET;
+ WT_FILE_HANDLE_WIN *win_fh;
+ WT_SESSION_IMPL *session;
LARGE_INTEGER largeint;
+ win_fh = (WT_FILE_HANDLE_WIN *)file_handle;
+ session = (WT_SESSION_IMPL *)wt_session;
+
largeint.QuadPart = len;
- if (fh->filehandle_secondary == INVALID_HANDLE_VALUE)
+ if (win_fh->filehandle_secondary == INVALID_HANDLE_VALUE)
WT_RET_MSG(session, EINVAL,
- "%s: handle-truncate: read-only", fh->name);
+ "%s: handle-truncate: read-only", file_handle->name);
if (SetFilePointerEx(
- fh->filehandle_secondary, largeint, NULL, FILE_BEGIN) == FALSE)
+ win_fh->filehandle_secondary, largeint, NULL, FILE_BEGIN) == FALSE)
WT_RET_MSG(session, __wt_getlasterror(),
- "%s: handle-truncate: SetFilePointerEx", fh->name);
+ "%s: handle-truncate: SetFilePointerEx",
+ file_handle->name);
- if (SetEndOfFile(fh->filehandle_secondary) == FALSE) {
+ if (SetEndOfFile(win_fh->filehandle_secondary) == FALSE) {
if (GetLastError() == ERROR_USER_MAPPED_FILE)
return (EBUSY);
WT_RET_MSG(session, __wt_getlasterror(),
- "%s: handle-truncate: SetEndOfFile error", fh->name);
+ "%s: handle-truncate: SetEndOfFile error",
+ file_handle->name);
}
return (0);
}
/*
- * __win_handle_write --
+ * __win_file_write --
* Write a chunk.
*/
static int
-__win_handle_write(WT_SESSION_IMPL *session,
- WT_FH *fh, wt_off_t offset, size_t len, const void *buf)
+__win_file_write(WT_FILE_HANDLE *file_handle,
+ WT_SESSION *wt_session, wt_off_t offset, size_t len, const void *buf)
{
DWORD chunk;
DWORD nw;
const uint8_t *addr;
OVERLAPPED overlapped = { 0 };
+ WT_FILE_HANDLE_WIN *win_fh;
+ WT_SESSION_IMPL *session;
+
+ win_fh = (WT_FILE_HANDLE_WIN *)file_handle;
+ session = (WT_SESSION_IMPL *)wt_session;
nw = 0;
/* Assert direct I/O is aligned and a multiple of the alignment. */
WT_ASSERT(session,
- !fh->direct_io ||
+ !win_fh->direct_io ||
S2C(session)->buffer_alignment == 0 ||
(!((uintptr_t)buf &
(uintptr_t)(S2C(session)->buffer_alignment - 1)) &&
@@ -395,36 +369,47 @@ __win_handle_write(WT_SESSION_IMPL *session,
overlapped.Offset = UINT32_MAX & offset;
overlapped.OffsetHigh = UINT32_MAX & (offset >> 32);
- if (!WriteFile(fh->filehandle, addr, chunk, &nw, &overlapped))
+ if (!WriteFile(
+ win_fh->filehandle, addr, chunk, &nw, &overlapped))
WT_RET_MSG(session, __wt_getlasterror(),
"%s: handle-write: WriteFile: failed to write %lu "
"bytes at offset %" PRIuMAX,
- fh->name, chunk, (uintmax_t)offset);
+ file_handle->name, chunk, (uintmax_t)offset);
}
return (0);
}
/*
- * __win_file_open --
+ * __win_open_file --
* Open a file handle.
*/
static int
-__win_file_open(WT_SESSION_IMPL *session,
- WT_FH *fh, const char *name, uint32_t file_type, uint32_t flags)
+__win_open_file(WT_FILE_SYSTEM *file_system, WT_SESSION *wt_session,
+ const char *name, WT_OPEN_FILE_TYPE file_type, uint32_t flags,
+ WT_FILE_HANDLE **file_handlep)
{
DWORD dwCreationDisposition;
- HANDLE filehandle, filehandle_secondary;
WT_CONNECTION_IMPL *conn;
WT_DECL_RET;
+ WT_FILE_HANDLE *file_handle;
+ WT_FILE_HANDLE_WIN *win_fh;
+ WT_SESSION_IMPL *session;
int desired_access, f;
- bool direct_io;
+ WT_UNUSED(file_system);
+
+ *file_handlep = NULL;
+
+ session = (WT_SESSION_IMPL *)wt_session;
conn = S2C(session);
- direct_io = false;
+
+ WT_RET(__wt_calloc_one(session, &win_fh));
+
+ win_fh->direct_io = false;
/* Set up error handling. */
- fh->filehandle = fh->filehandle_secondary =
- filehandle = filehandle_secondary = INVALID_HANDLE_VALUE;
+ win_fh->filehandle =
+ win_fh->filehandle_secondary = INVALID_HANDLE_VALUE;
/*
* Opening a file handle on a directory is only to support filesystems
@@ -432,7 +417,7 @@ __win_file_open(WT_SESSION_IMPL *session,
* require that functionality: create an empty WT_FH structure with
* invalid handles.
*/
- if (file_type == WT_FILE_TYPE_DIRECTORY)
+ if (file_type == WT_OPEN_FILE_TYPE_DIRECTORY)
goto directory_open;
desired_access = GENERIC_READ;
@@ -460,33 +445,33 @@ __win_file_open(WT_SESSION_IMPL *session,
/* Direct I/O. */
if (LF_ISSET(WT_OPEN_DIRECTIO)) {
f |= FILE_FLAG_NO_BUFFERING;
- fh->direct_io = true;
+ win_fh->direct_io = true;
}
/* FILE_FLAG_WRITE_THROUGH does not require aligned buffers */
if (FLD_ISSET(conn->write_through, file_type))
f |= FILE_FLAG_WRITE_THROUGH;
- if (file_type == WT_FILE_TYPE_LOG &&
+ if (file_type == WT_OPEN_FILE_TYPE_LOG &&
FLD_ISSET(conn->txn_logsync, WT_LOG_DSYNC))
f |= FILE_FLAG_WRITE_THROUGH;
/* Disable read-ahead on trees: it slows down random read workloads. */
- if (file_type == WT_FILE_TYPE_DATA)
+ if (file_type == WT_OPEN_FILE_TYPE_DATA)
f |= FILE_FLAG_RANDOM_ACCESS;
- filehandle = CreateFileA(name, desired_access,
+ win_fh->filehandle = CreateFileA(name, desired_access,
FILE_SHARE_READ | FILE_SHARE_WRITE,
NULL, dwCreationDisposition, f, NULL);
- if (filehandle == INVALID_HANDLE_VALUE) {
+ if (win_fh->filehandle == INVALID_HANDLE_VALUE) {
if (LF_ISSET(WT_OPEN_CREATE) &&
GetLastError() == ERROR_FILE_EXISTS)
- filehandle = CreateFileA(name, desired_access,
+ win_fh->filehandle = CreateFileA(name, desired_access,
FILE_SHARE_READ | FILE_SHARE_WRITE,
NULL, OPEN_EXISTING, f, NULL);
- if (filehandle == INVALID_HANDLE_VALUE)
+ if (win_fh->filehandle == INVALID_HANDLE_VALUE)
WT_ERR_MSG(session, __wt_getlasterror(),
- direct_io ?
+ win_fh->direct_io ?
"%s: handle-open: CreateFileA: failed with direct "
"I/O configured, some filesystem types do not "
"support direct I/O" :
@@ -499,74 +484,88 @@ __win_file_open(WT_SESSION_IMPL *session,
* pointer.
*/
if (!LF_ISSET(WT_OPEN_READONLY)) {
- filehandle_secondary = CreateFileA(name, desired_access,
+ win_fh->filehandle_secondary = CreateFileA(name, desired_access,
FILE_SHARE_READ | FILE_SHARE_WRITE,
NULL, OPEN_EXISTING, f, NULL);
- if (filehandle_secondary == INVALID_HANDLE_VALUE)
+ if (win_fh->filehandle_secondary == INVALID_HANDLE_VALUE)
WT_ERR_MSG(session, __wt_getlasterror(),
"%s: handle-open: CreateFileA: secondary", name);
}
- /* Configure fallocate/posix_fallocate calls. */
- __win_handle_allocate_configure(session, fh);
-
directory_open:
- fh->filehandle = filehandle;
- fh->filehandle_secondary = filehandle_secondary;
-
- fh->fh_close = __win_handle_close;
- fh->fh_lock = __win_handle_lock;
- fh->fh_map = __wt_win_map;
- fh->fh_map_discard = __wt_win_map_discard;
- fh->fh_map_preload = __wt_win_map_preload;
- fh->fh_map_unmap = __wt_win_map_unmap;
- fh->fh_read = __win_handle_read;
- fh->fh_size = __win_handle_size;
- fh->fh_sync = __win_handle_sync;
- fh->fh_truncate = __win_handle_truncate;
- fh->fh_write = __win_handle_write;
+ /* Initialize public information. */
+ file_handle = (WT_FILE_HANDLE *)win_fh;
+ WT_ERR(__wt_strdup(session, name, &file_handle->name));
- return (0);
+ file_handle->close = __win_file_close;
+ file_handle->lock = __win_file_lock;
+#ifdef WORDS_BIGENDIAN
+ /*
+ * The underlying objects are little-endian, mapping objects isn't
+ * currently supported on big-endian systems.
+ */
+#else
+ file_handle->map = __wt_win_map;
+ file_handle->map_discard = NULL;
+ file_handle->map_preload = NULL;
+ file_handle->unmap = __wt_win_unmap;
+#endif
+ file_handle->read = __win_file_read;
+ file_handle->size = __win_file_size;
+ file_handle->sync = __win_file_sync;
+ file_handle->truncate = __win_file_truncate;
+ file_handle->write = __win_file_write;
+
+ *file_handlep = file_handle;
-err: if (filehandle != INVALID_HANDLE_VALUE)
- (void)CloseHandle(filehandle);
- if (filehandle_secondary != INVALID_HANDLE_VALUE)
- (void)CloseHandle(filehandle_secondary);
+ return (0);
+err: WT_TRET(__win_file_close((WT_FILE_HANDLE *)win_fh, wt_session));
return (ret);
}
/*
- * __wt_os_win --
- * Initialize a MSVC configuration.
+ * __win_terminate --
+ * Discard a Windows configuration.
*/
-int
-__wt_os_win(WT_SESSION_IMPL *session)
+static int
+__win_terminate(WT_FILE_SYSTEM *file_system, WT_SESSION *wt_session)
{
- WT_CONNECTION_IMPL *conn;
-
- conn = S2C(session);
+ WT_SESSION_IMPL *session;
- /* Initialize the POSIX jump table. */
- conn->file_directory_list = __wt_win_directory_list;
- conn->file_directory_sync = __win_directory_sync;
- conn->file_exist = __win_file_exist;
- conn->file_open = __win_file_open;
- conn->file_remove = __win_file_remove;
- conn->file_rename = __win_file_rename;
- conn->file_size = __win_file_size;
+ session = (WT_SESSION_IMPL *)wt_session;
+ __wt_free(session, file_system);
return (0);
}
/*
- * __wt_os_win_cleanup --
- * Discard a POSIX configuration.
+ * __wt_os_win --
+ * Initialize a MSVC configuration.
*/
int
-__wt_os_win_cleanup(WT_SESSION_IMPL *session)
+__wt_os_win(WT_SESSION_IMPL *session)
{
- WT_UNUSED(session);
+ WT_CONNECTION_IMPL *conn;
+ WT_FILE_SYSTEM *file_system;
+
+ conn = S2C(session);
+
+ WT_RET(__wt_calloc_one(session, &file_system));
+
+ /* Initialize the Windows jump table. */
+ file_system->directory_list = __wt_win_directory_list;
+ file_system->directory_list_free = __wt_win_directory_list_free;
+ file_system->directory_sync = NULL;
+ file_system->exist = __win_fs_exist;
+ file_system->open_file = __win_open_file;
+ file_system->remove = __win_fs_remove;
+ file_system->rename = __win_fs_rename;
+ file_system->size = __wt_win_fs_size;
+ file_system->terminate = __win_terminate;
+
+ /* Switch it into place. */
+ conn->file_system = file_system;
return (0);
}
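Reviewer note on the os_fs.c change: the Windows jump table now leaves directory_sync set to NULL instead of installing a do-nothing function, which only works if callers check for missing methods before dispatching (the commit message mentions handling NULL filesystem methods). A minimal sketch of that caller-side check follows. The struct demo_fs, its two methods and the wrapper are hypothetical and cut down for illustration, and the assumption that a missing directory_sync means "nothing to do, report success" is mine, not something this hunk shows.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for a file-system method table. */
struct demo_fs {
    /* Optional: may be NULL when the platform has nothing to do. */
    int (*directory_sync)(struct demo_fs *, const char *);
    /* Required. */
    int (*exist)(struct demo_fs *, const char *, int *);
};

/* Toy "exist" method: any non-empty name is reported as existing. */
static int
demo_exist(struct demo_fs *fs, const char *name, int *existp)
{
    (void)fs;
    *existp = (name != NULL && name[0] != '\0');
    return (0);
}

/* Caller-side wrapper: a missing optional method is treated as success. */
static int
demo_fs_directory_sync(struct demo_fs *fs, const char *name)
{
    if (fs->directory_sync == NULL)
        return (0);
    return (fs->directory_sync(fs, name));
}

/* Fill in the table the way the Windows initializer does: no sync method. */
static void
demo_fs_init(struct demo_fs *fs)
{
    fs->directory_sync = NULL;
    fs->exist = demo_exist;
}
```

Keeping the NULL check in one wrapper means each platform only has to supply the methods it can actually implement.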
diff --git a/src/os_win/os_map.c b/src/os_win/os_map.c
index b043f9c9923..488cbfb2ceb 100644
--- a/src/os_win/os_map.c
+++ b/src/os_win/os_map.c
@@ -13,106 +13,83 @@
* Map a file into memory.
*/
int
-__wt_win_map(WT_SESSION_IMPL *session,
- WT_FH *fh, void *mapp, size_t *lenp, void **mappingcookie)
+__wt_win_map(WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session,
+ void *mapped_regionp, size_t *lenp, void *mapped_cookiep)
{
WT_DECL_RET;
+ WT_FILE_HANDLE_WIN *win_fh;
+ WT_SESSION_IMPL *session;
size_t len;
wt_off_t file_size;
- void *map;
+ void *map, *mapped_cookie;
+
+ win_fh = (WT_FILE_HANDLE_WIN *)file_handle;
+ session = (WT_SESSION_IMPL *)wt_session;
/*
* There's no locking here to prevent the underlying file from changing
* underneath us, our caller needs to ensure consistency of the mapped
* region vs. any other file activity.
*/
- WT_RET(__wt_filesize(session, fh, &file_size));
+ WT_RET(__wt_win_fs_size(file_handle->file_system,
+ wt_session, file_handle->name, &file_size));
len = (size_t)file_size;
(void)__wt_verbose(session, WT_VERB_HANDLEOPS,
- "%s: memory-map: %" WT_SIZET_FMT " bytes", fh->name, len);
+ "%s: memory-map: %" WT_SIZET_FMT " bytes", file_handle->name, len);
- *mappingcookie =
- CreateFileMappingA(fh->filehandle, NULL, PAGE_READONLY, 0, 0, NULL);
- if (*mappingcookie == NULL)
+ mapped_cookie = CreateFileMappingA(
+ win_fh->filehandle, NULL, PAGE_READONLY, 0, 0, NULL);
+ if (mapped_cookie == NULL)
WT_RET_MSG(session, __wt_getlasterror(),
- "%s: memory-map: CreateFileMappingA", fh->name);
+ "%s: memory-map: CreateFileMappingA", file_handle->name);
if ((map =
- MapViewOfFile(*mappingcookie, FILE_MAP_READ, 0, 0, len)) == NULL) {
+ MapViewOfFile(mapped_cookie, FILE_MAP_READ, 0, 0, len)) == NULL) {
/* Retrieve the error before cleaning up. */
ret = __wt_getlasterror();
- CloseHandle(*mappingcookie);
- *mappingcookie = NULL;
+ CloseHandle(mapped_cookie);
WT_RET_MSG(session, ret,
- "%s: memory-map: MapViewOfFile", fh->name);
+ "%s: memory-map: MapViewOfFile", file_handle->name);
}
- *(void **)mapp = map;
+ *(void **)mapped_cookiep = mapped_cookie;
+ *(void **)mapped_regionp = map;
*lenp = len;
return (0);
}
/*
- * __wt_win_map_preload --
- * Cause a section of a memory map to be faulted in.
- */
-int
-__wt_win_map_preload(
- WT_SESSION_IMPL *session, WT_FH *fh, const void *p, size_t size)
-{
- WT_UNUSED(session);
- WT_UNUSED(fh);
- WT_UNUSED(p);
- WT_UNUSED(size);
-
- return (ENOTSUP);
-}
-
-/*
- * __wt_win_map_discard --
- * Discard a chunk of the memory map.
- */
-int
-__wt_win_map_discard(WT_SESSION_IMPL *session, WT_FH *fh, void *p, size_t size)
-{
- WT_UNUSED(session);
- WT_UNUSED(fh);
- WT_UNUSED(p);
- WT_UNUSED(size);
-
- return (ENOTSUP);
-}
-
-/*
- * __wt_win_map_unmap --
+ * __wt_win_unmap --
* Remove a memory mapping.
*/
int
-__wt_win_map_unmap(WT_SESSION_IMPL *session,
- WT_FH *fh, void *map, size_t len, void **mappingcookie)
+__wt_win_unmap(WT_FILE_HANDLE *file_handle, WT_SESSION *wt_session,
+ void *mapped_region, size_t length, void *mapped_cookie)
{
WT_DECL_RET;
+ WT_FILE_HANDLE_WIN *win_fh;
+ WT_SESSION_IMPL *session;
- (void)__wt_verbose(session, WT_VERB_HANDLEOPS,
- "%s: memory-unmap: %" WT_SIZET_FMT " bytes", fh->name, len);
+ win_fh = (WT_FILE_HANDLE_WIN *)file_handle;
+ session = (WT_SESSION_IMPL *)wt_session;
- WT_ASSERT(session, *mappingcookie != NULL);
+ (void)__wt_verbose(session, WT_VERB_HANDLEOPS,
+ "%s: memory-unmap: %" WT_SIZET_FMT " bytes",
+ file_handle->name, length);
- if (UnmapViewOfFile(map) == 0) {
+ if (UnmapViewOfFile(mapped_region) == 0) {
ret = __wt_getlasterror();
__wt_err(session, ret,
- "%s: memory-unmap: UnmapViewOfFile", fh->name);
+ "%s: memory-unmap: UnmapViewOfFile", file_handle->name);
}
- if (CloseHandle(*mappingcookie) == 0) {
+ if (CloseHandle(*(void **)mapped_cookie) == 0) {
ret = __wt_getlasterror();
__wt_err(session, ret,
- "%s: memory-unmap: CloseHandle", fh->name);
+ "%s: memory-unmap: CloseHandle", file_handle->name);
}
- *mappingcookie = NULL;
-
return (ret);
}
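Reviewer note on the os_map.c change: the map function keeps the comment "Retrieve the error before cleaning up", and the new code preserves that ordering by reading the error code before calling CloseHandle on the mapping. The portable sketch below shows why the order matters. Everything in it is simulated: last_error, get_last_error, create_mapping, map_view and close_handle are hypothetical stand-ins for the Win32 calls, and the cleanup routine deliberately resets the error state to model a cleanup call that overwrites it.

```c
#include <assert.h>

/* Simulated per-thread error state, standing in for GetLastError(). */
static int last_error;

static int
get_last_error(void)
{
    return (last_error);
}

/* Simulated first step: always succeeds and hands back a handle. */
static int
create_mapping(int *handlep)
{
    *handlep = 1;
    return (0);
}

/* Simulated second step: optionally fails and records error code 8. */
static int
map_view(int handle, int fail)
{
    (void)handle;
    if (fail) {
        last_error = 8;
        return (-1);
    }
    return (0);
}

/* Simulated cleanup: releases the handle and overwrites the error state. */
static void
close_handle(int *handlep)
{
    *handlep = 0;
    last_error = 0;
}

/* Two-step acquire that reports the failure cause, not the cleanup result. */
static int
demo_map(int fail, int *handlep)
{
    int handle, ret;

    *handlep = 0;
    if (create_mapping(&handle) != 0)
        return (get_last_error());
    if (map_view(handle, fail) != 0) {
        /* Retrieve the error before cleaning up. */
        ret = get_last_error();
        close_handle(&handle);
        return (ret);
    }
    *handlep = handle;
    return (0);
}
```

Had the sketch called close_handle first, the failing case would return 0 and the caller would see a success with no mapping.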
diff --git a/src/schema/schema_create.c b/src/schema/schema_create.c
index 756f1fdcc6c..67d64cf1c75 100644
--- a/src/schema/schema_create.c
+++ b/src/schema/schema_create.c
@@ -35,7 +35,7 @@ __wt_direct_io_size_check(WT_SESSION_IMPL *session,
* units of its happy place.
*/
if (FLD_ISSET(conn->direct_io,
- WT_FILE_TYPE_CHECKPOINT | WT_FILE_TYPE_DATA)) {
+ WT_DIRECT_IO_CHECKPOINT | WT_DIRECT_IO_DATA)) {
align = (int64_t)conn->buffer_alignment;
if (align != 0 && (cval.val < align || cval.val % align != 0))
WT_RET_MSG(session, EINVAL,
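Reviewer note on the schema_create.c change: only the flag names change here, the size rule itself is untouched. For readers unfamiliar with it, the condition rejects a configured size that is smaller than the buffer alignment or not a multiple of it, and skips the check entirely when no alignment is configured. A standalone sketch of that predicate, with the hypothetical name direct_io_size_ok_check and plain integers in place of the configuration value and connection field:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Return 0 if value is acceptable for the given alignment, EINVAL otherwise.
 * An alignment of 0 means "no alignment configured" and accepts any value.
 */
static int
direct_io_size_ok_check(int64_t value, int64_t align)
{
    if (align != 0 && (value < align || value % align != 0))
        return (EINVAL);
    return (0);
}
```

For example, with a 512-byte alignment a size of 4096 passes, while 100 (too small) and 768 (not a multiple) are both rejected.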
diff --git a/src/schema/schema_rename.c b/src/schema/schema_rename.c
index 21402ed9332..8f4d374fd22 100644
--- a/src/schema/schema_rename.c
+++ b/src/schema/schema_rename.c
@@ -55,7 +55,7 @@ __rename_file(
default:
WT_ERR(ret);
}
- WT_ERR(__wt_exist(session, newfile, &exist));
+ WT_ERR(__wt_fs_exist(session, newfile, &exist));
if (exist)
WT_ERR_MSG(session, EEXIST, "%s", newfile);
@@ -64,7 +64,7 @@ __rename_file(
WT_ERR(__wt_metadata_insert(session, newuri, oldvalue));
/* Rename the underlying file. */
- WT_ERR(__wt_rename(session, filename, newfile));
+ WT_ERR(__wt_fs_rename(session, filename, newfile));
if (WT_META_TRACKING(session))
WT_ERR(__wt_meta_track_fileop(session, uri, newuri));
diff --git a/src/schema/schema_stat.c b/src/schema/schema_stat.c
index d3d0605c60a..c204d6b1a24 100644
--- a/src/schema/schema_stat.c
+++ b/src/schema/schema_stat.c
@@ -69,6 +69,7 @@ __curstat_size_only(WT_SESSION_IMPL *session,
WT_ITEM namebuf;
wt_off_t filesize;
char *tableconf;
+ bool exist;
WT_CLEAR(namebuf);
*was_fast = false;
@@ -96,10 +97,11 @@ __curstat_size_only(WT_SESSION_IMPL *session,
* are concurrent schema level operations (for example drop). That is
* fine - failing here results in falling back to the slow path of
* opening the handle.
- * !!! Deliberately discard the return code from a failed call - the
- * error is flagged by not setting fast to true.
*/
- if (__wt_filesize_name(session, namebuf.data, true, &filesize) == 0) {
+ WT_ERR(__wt_fs_exist(session, namebuf.data, &exist));
+ if (exist) {
+ WT_ERR(__wt_fs_size(session, namebuf.data, &filesize));
+
/* Setup and populate the statistics structure */
__wt_stat_dsrc_init_single(&cst->u.dsrc_stats);
cst->u.dsrc_stats.block_size = filesize;