| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If two clients created a snapshot at the same time, the one with the
higher snapshot id might be created first, so the lower snapshot id
would be added to the snapshot context and the snapshot seq would be
set to the lower one.
Instead of allowing this to happen, return -ESTALE if the snapshot id
is lower than the currently stored snapshot sequence number. On the
client side, get a new id and retry if this error is encountered.
Backport: argonaut
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
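
A minimal sketch of the check-and-retry behaviour described above, written in Python for brevity rather than in the project's own C++. The `SnapshotHeader` class, `create_snapshot()` helper, `allocate_id` callback and `max_retries` bound are all hypothetical names introduced for illustration and are not taken from librbd.

```python
import errno


class SnapshotHeader:
    """Toy stand-in for the server-side image header (hypothetical)."""

    def __init__(self):
        self.snap_seq = 0   # highest snapshot id stored so far
        self.snaps = {}     # snapshot id -> snapshot name

    def add_snap(self, snap_id, name):
        # Reject ids lower than the stored snapshot sequence number
        # instead of letting snap_seq move backwards.
        if snap_id < self.snap_seq:
            return -errno.ESTALE
        self.snaps[snap_id] = name
        self.snap_seq = snap_id
        return 0


def create_snapshot(header, allocate_id, name, max_retries=10):
    """Client side: get a new id and retry when ESTALE is returned."""
    for _ in range(max_retries):
        snap_id = allocate_id()
        r = header.add_snap(snap_id, name)
        if r != -errno.ESTALE:
            return r, snap_id
    return -errno.ESTALE, None
```

The retry bound is a choice of this sketch; the commit message does not say whether the client retries indefinitely.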
|
|
|
|
|
|
| |
This is a backport of a3ad98a3eef062e9ed51dd2d1e58c593e12c9703
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The assign_bid method has issues with replay because it is a write
that also returns data. This means that the replayed operation would
return success, but no data, and cause a create to fail. Instead, let
the client set the bid based on its global id and a random number.
This only affects the creation of new images, since the bid is put
into an opaque string as part of the object prefix.
Keep the server side assign_bid around in case there are old clients
still using it.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
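
As an illustration of building the bid on the client, the sketch below joins a global id and a random number into one hex string. The function name, the 32-bit width of the random part and the hex encoding are assumptions made for this example, not details confirmed by the commit message.

```python
import random


def make_block_name_id(global_id, rng=random):
    """Hypothetical helper: combine the client's global id with a random
    number into an opaque hex string for use in an object prefix."""
    # Hex-encode both parts; the result is only ever treated as an
    # opaque string, so no separator is needed.
    return format(global_id, "x") + format(rng.getrandbits(32), "x")
```

Because no write operation has to return data, a replayed create has nothing to lose here: the client already knows the id it chose.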
|
|
|
|
|
|
|
| |
The only caller, snapshot_add(), already does a notify when add_snap()
succeeds.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
|
|
|
|
|
|
|
|
| |
The actual data object ids don't need to be artificially restricted in
length. RBD_MAX_BLOCK_NAME_SIZE just limits the size of the object
prefix, since it's used in rbd_info_t.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
|
|
|
|
|
|
|
|
|
|
| |
This will fail if features are requested that the client or server
does not support. Currently there are no features defined, so
zero is the only valid value.
copy() preserves the format and features of the source image.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
|
|
|
|
| |
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
|
|
|
|
|
|
|
|
| |
The value must be passed, and it shouldn't be below 4k
(enforced by the command line tool already) or above the
range expressible in the header.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
|
|
|
|
|
|
|
| |
This is more consistent with the rest of the code now,
and is a bit more clear.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of interpreting the header, just copy all the data and
omap values from the original header to the newly named one.
This will continue working with future header changes.
We can create the new header and write all data and omap values
to it atomically to avoid some races.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
|
|
|
|
|
|
|
| |
Use the old or new methods to make resize, snapshot add and snapshot
remove work with both old and new formats.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
|
|
|
|
|
|
|
|
|
| |
Make most of them take the parameters they actually use.
trim_image() now takes an ImageCtx, which means remove() must
open the image. This has the nice side effect of not duplicating
the snapshot listing code for the old format.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Checking that it exists doesn't prevent you from having the snapshot
change out from under you in the following situation:
You have the image open at snapshot "foo".
Someone removes snapshot "foo", writes some data to the image, and
creates a new snapshot called "foo".
This second snapshot will have a different id, but nothing prevents it
from having the name of a previously deleted snapshot.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
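
The scenario above can be reproduced with a toy name-to-id mapping. The dictionary layout and the `snapshot_still_mapped()` helper are hypothetical; comparing ids is shown as one possible way to detect the change, not as the exact check librbd performs.

```python
def snapshot_still_mapped(mapped_snap_id, current_snaps):
    """Return True if the snapshot id we opened still exists.

    current_snaps maps snapshot name -> snapshot id (hypothetical layout).
    """
    return mapped_snap_id in current_snaps.values()


# The image is opened at snapshot "foo", which currently has id 4.
snaps = {"foo": 4}
mapped_name, mapped_id = "foo", snaps["foo"]

# Someone removes "foo", writes data, and creates a new "foo" with id 7.
del snaps["foo"]
snaps["foo"] = 7

name_check = mapped_name in snaps                       # name still exists
id_check = snapshot_still_mapped(mapped_id, snaps)      # id 4 is gone
```

Here the name-based check still passes while the id-based check reports that the snapshot originally opened no longer exists.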
|
|
|
|
|
|
|
|
| |
It now sets the member variables of ImageCtx so other functions
don't have to use the on-disk header. If the features used by
the new format are incompatible with this client, an error is returned.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Detect the format when an image is opened by the presence of the
original format header object. Use member variables of ImageCtx to
store image metadata instead of the on-disk header format
ImageCtx::header.
This lays the foundation for changing the rest of librbd to work with
old and new formats.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
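
A rough sketch of detecting the format by header presence. The `object_exists` callback, the `".rbd"` header suffix and the `"old"`/`"new"` return values are assumptions made for illustration and are not stated in the commit message.

```python
def detect_format(object_exists, image_name, old_header_suffix=".rbd"):
    """Return "old" if the original-format header object is present,
    otherwise "new".

    object_exists is a hypothetical callable taking an object name and
    returning True or False; the suffix is likewise an assumption.
    """
    if object_exists(image_name + old_header_suffix):
        return "old"
    return "new"
```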
|
|
|
|
|
|
| |
There will be an exception if memory can't be allocated.
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
|
|
|
|
|
|
| |
Remove possibility of set_start_time before set_ictx error
Signed-off-by: Dan Mick <dan.mick@inktank.com>
|
|
|
|
|
|
|
| |
Fixes: #2408
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Return errors from flushing to the caller. Warn
if an error occurs during invalidation, but don't retry,
since the higher level handles these cases, namely:
* rollback (doing this with an image open is asking for trouble)
* shrink (doing this with writes in flight may create extra objects anyway)
* shutdown (qemu flushes before closing the device)
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
|
|
|
|
|
|
| |
This replaces the hard-coded 1 second writeback timer.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Make ObjectCacher users specify the cache size for each ObjectCacher
instance. This avoids the confusing config namespace for the object
cache (client_oc_*), and also will make it possible to eventually have
cache sizes that vary between (say) RBD images.
- drop unused client_oc_max_sync_write
- add rbd_cache_max_size, max_dirty, target_dirty config values (these are
the defaults for each image)
We probably want to add librbd calls to specify the cache size on a
per-image basis? Alternatively, we should make it possible to share a
cache pool between multiple images in some explicit way.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
|
|
|
|
|
| |
This gives us access to the original ObjectExtent (useful later), and
simplifies the callers.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We do three things here:
- Wait for the dirty limit to drop _after_ writing into the cache. This
means that an active thread can always provide its dirty data to the
cache for potential writing without waiting (a small win). It's also
helpful later... (see below, and next commit)
- Don't wait for other waiters. If another thread is dirtying 1MB and is
waiting for it, don't wait for them too. This prevents two threads
writing 1MB at a time with a limit of 1MB from serializing: both can
dirty their 1MB and initiate a flush, and once 1/2 of that has
flushed one of them will be allowed to proceed.
- Update the flusher to add the dirty_waiting bytes to the amount to
write so that the OPs will indeed be parallel.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
|
|
|
|
|
| |
This allows the rbd tool to provide a useful error message, instead of
compounding more possible causes into one error code.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
|
|
|
|
|
|
| |
size_t was accidentally copy-pasted.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
|
|\
| |
| |
| | |
Reviewed-by: Sage Weil <sage.weil@dreamhost.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This way we can't miss an update if we get a notify during ictx_refresh.
Specifically, a race like this:
Thread 1                Thread 2                Process 2
ictx_refresh()
  read_header()
                                                snap_create()
                        notify()
                          need_refresh = true
  process header...
  need_refresh = false
If this happened, we would not re-read the header with the new
snapshot, so the snapshot would not happen at the intended point
in time, but only after we re-read the header again.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
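
The lost update can be replayed deterministically in a single thread. This sketch assumes that "this way" refers to clearing the flag before the header is read; the `ImageCtx` stand-in, both refresh functions and the `on_read` hook that simulates the notify are hypothetical.

```python
class ImageCtx:
    """Toy image context holding only the fields the race touches."""

    def __init__(self):
        self.need_refresh = True
        self.seen_version = None


def refresh_clear_after(ictx, read_header, on_read=None):
    """Racy ordering: the flag is cleared after the header is processed."""
    header = read_header()
    if on_read:
        on_read()                  # a notify arrives at this point
    ictx.seen_version = header     # process header...
    ictx.need_refresh = False      # overwrites the flag set by the notify


def refresh_clear_before(ictx, read_header, on_read=None):
    """Safe ordering: the flag is cleared before the header is read."""
    ictx.need_refresh = False
    header = read_header()
    if on_read:
        on_read()                  # the notify sets the flag again
    ictx.seen_version = header     # stale, but another refresh is pending
```

With the second ordering a notify that lands mid-refresh leaves `need_refresh` set, so the next check re-reads the header instead of trusting the stale copy.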
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* snapid should determine whether our mapped snapshot is gone, not snapname
* snap_set(<nonexistent_snap>) shouldn't reset us to CEPH_NOSNAP
* snapname should be set before using it in the perfcounter name
* snapname and image name don't need to be passed as arguments since an
ImageCtx already contains that info
* ictx_check() doesn't need to check for non-existent snaps - only I/Os care,
so check in check_io() instead
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
|
| |
| |
| |
| |
| |
| |
| | |
The earlier condition is >. != means < at this point, and the nesting
is unnecessary.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
|
|/
|
|
|
|
|
|
|
| |
In particular, the OSD may return EBUSY if there are still watchers.
Ignore ENOENT, as that may indicate we are cleaning up a previously
aborted removal.
Fixes: #2311
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
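
A small sketch of the error handling described above: failures are passed on to the caller, except for ENOENT. The `remove_object` callback and the function name are hypothetical stand-ins for whatever call actually removes the object.

```python
import errno


def remove_ignoring_missing(remove_object, name):
    """Remove an object, returning errors other than ENOENT.

    remove_object is a hypothetical callable returning 0 on success or a
    negative errno value, for example -EBUSY while watchers remain.
    """
    r = remove_object(name)
    if r < 0 and r != -errno.ENOENT:
        return r
    # ENOENT may mean an earlier, aborted removal already deleted it.
    return 0
```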
|
|\ |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
I was seeing failures of LibRBD.TestIOToSnapshot where we would fail to
refresh after rollback, even though the snap existed. I assume it is
because the std::string whose c_str() we were pointing to was reallocated.
Use a std::string here instead.
This code is weird.
Signed-off-by: Sage Weil <sage@newdream.net>
|
| |
| |
| |
| |
| |
| |
| | |
Track IO operations on a per-image basis.
Implements: #1451
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The caller is still invalidating the entire cache, so we don't need to
deal with discard at this level. That might be worth cleaning up
later, though.
Fixes: #2296
Signed-off-by: Sage Weil <sage@newdream.net>
|
| |
| |
| |
| |
| |
| |
| | |
Do not assume the object extents are at the trailing edge of objects.
Instead, discard arbitrary extents. Fix callers.
Signed-off-by: Sage Weil <sage@newdream.net>
|
| |
| |
| |
| |
| |
| | |
'objects' is misleading here; these are byte offsets
Signed-off-by: Sage Weil <sage@newdream.net>
|
| |
| |
| |
| |
| |
| | |
Fed this to test_librbd_fsx and it was happy.
Signed-off-by: Sage Weil <sage@newdream.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
handle_sparse_read() was taking buf_ofs and buf_len, but buf_len was being
interpreted as the total size of the buffer, not the length of the extent
in the buffer starting at buf_ofs. Both callers pass in an extent length, so
fix the zero code to do the right thing.
Specifically, the behavior I saw was:
- read range spanning 2 objects, trailing 20k and leading 50k
- first object didn't exist, zeroed first 20k of buffer
- second object didn't exist, zeroed next 30k (50k-20k) of buffer
- the last 20k of buffer was unzeroed.
Signed-off-by: Sage Weil <sage@newdream.net>
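
The behaviour listed above can be reproduced with a 70k buffer. Both functions are simplified stand-ins for the zeroing step only, not the real handle_sparse_read(); their names and the 0xff fill pattern are assumptions for this example.

```python
K = 1024


def zero_extent_buggy(buf, buf_ofs, buf_len):
    """Misreads buf_len as the total buffer size, zeroing up to buf_len."""
    n = max(buf_len - buf_ofs, 0)
    buf[buf_ofs:buf_ofs + n] = bytes(n)


def zero_extent_fixed(buf, buf_ofs, buf_len):
    """Treats buf_len as the extent length starting at buf_ofs."""
    buf[buf_ofs:buf_ofs + buf_len] = bytes(buf_len)


def read_two_missing_objects(zero_extent):
    """Read spanning two nonexistent objects: trailing 20k, leading 50k."""
    buf = bytearray(b"\xff" * (70 * K))
    zero_extent(buf, 0, 20 * K)        # first object: extent of 20k
    zero_extent(buf, 20 * K, 50 * K)   # second object: extent of 50k
    return buf
```

With the buggy version the second call zeroes only 50k - 20k = 30k, leaving the last 20k of the buffer untouched; the fixed version zeroes all 70k.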
|
|/
|
|
|
|
| |
Print old -> new, not new -> old.
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
|
|
|
|
|
| |
'enabled' is useless verbiage. We should fix the rgw option too,
probably...
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
|
|
| |
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
|
|
|
|
|
|
|
|
|
|
| |
Implement sync and async discard. Embed an ObjectWriteOperation in the
BlockCompletion struct.
The sync version does a sync op on every block, just like write()... very
stupid. Both of these should be fixed.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
|
|
|
|
|
|
|
| |
This makes sure the state is as consistent as librbd can make it
before the snapshot is actually created.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
|
|
|
|
|
|
|
| |
This is a temporary workaround until the ObjectCacher
is smarter about snapshots.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
|
|
|
|
|
|
|
| |
ObjectCacher will never do short reads, and always returns 0.
librados may do short reads at the end of an object.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
|
|
|
|
|
|
|
| |
librados does this for us normally, but caching does not check for this.
We might as well check early to avoid scheduling a bunch of aios anyway.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
|
|
|
|
|
|
|
| |
This uses the existing infrastructure of ObjectCacher for
buffer management and expiry.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
|
|
|
|
|
|
| |
This is superseded by a full-fledged writeback cache.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
|
|
|
|
|
|
|
|
|
| |
- explicitly defined subsystems, and ceph_subsys_FOO enums to go with them
- modular log system with Entry object
- separate gather level and log level
- drop lots of DoutStreambuf hackery
Signed-off-by: Sage Weil <sage@newdream.net>
|
|
|
|
|
|
| |
Don't expose internal CephContext type name.
Signed-off-by: Sage Weil <sage@newdream.net>
|