author David Hadas <davidh@il.ibm.com> 2013-02-27 00:49:51 +0200
committer David Hadas <davidh@il.ibm.com> 2013-03-08 22:28:06 +0200
commit 8b140033f01333fbd6d41e2946db949ab6f92599
tree 567ade27c4073e340584d7e9f1d5e86a091a829e /doc/source/overview_container_sync.rst
parent a8af3835c060f384c7e4e6779073b43e65c222e9
Improved container-sync resiliency

container-sync now skips faulty objects in the first and second rounds.
All replicas try in the second round. No server will give up until the
faulty object succeeds.

Fixes: bug #1068423
Change-Id: I0defc174b2ce3796a6acf410a2d2eae138e8193d

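The behaviour described in this commit message can be modelled with a minimal Python sketch. It is not the code in Swift's container-sync daemon: `sync_pass`, `owns`, and `sync_row` are hypothetical names, and rewinding SP2 to just before the first failed row is an assumption about how the retry is arranged.

```python
def sync_pass(row_ids, sp1, sp2, owns, sync_row):
    """Run one simplified container-sync pass and return the new (sp1, sp2).

    row_ids  -- ascending integer row ids in the container database
    owns     -- owns(row_id) -> bool, True if this replica owns the row
    sync_row -- sync_row(row_id) -> bool, True if the row synced
    (All three names are hypothetical, for illustration only.)
    """
    # Second round: retry every row in (sp2, sp1], whoever owns it.
    # A failure does not stop the loop; only the first one is remembered.
    first_failed = None
    for row_id in row_ids:
        if sp2 < row_id <= sp1 and not sync_row(row_id):
            if first_failed is None:
                first_failed = row_id

    # Assumption: park SP2 just before the first failed row so that the
    # next pass retries it. With no failures, SP2 catches up with SP1.
    new_sp2 = sp1 if first_failed is None else first_failed - 1

    # First round: try the rows past SP1 that this replica owns. A faulty
    # row is skipped here and picked up by a later second round.
    new_sp1 = sp1
    for row_id in row_ids:
        if row_id > sp1:
            if owns(row_id):
                sync_row(row_id)
            new_sp1 = row_id
    return new_sp1, new_sp2
```

In this model a row that fails in the second round holds SP2 back on every pass until it finally syncs, while the rows after it are still attempted each time.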
Diffstat (limited to 'doc/source/overview_container_sync.rst')
-rw-r--r-- doc/source/overview_container_sync.rst 24
1 files changed, 14 insertions, 10 deletions
diff --git a/doc/source/overview_container_sync.rst b/doc/source/overview_container_sync.rst
index b62136d25..c0ab3a10b 100644
--- a/doc/source/overview_container_sync.rst
+++ b/doc/source/overview_container_sync.rst
@@ -223,7 +223,7 @@ hash of the object name, so it's not always guaranteed to be exactly
one out of every three rows, but it usually gets close. For the sake
of example, let's say that this process ends up owning rows 2 and 5.
-Once it's finished syncing those rows, it updates SP1 to be the
+Once it's finished trying to sync those rows, it updates SP1 to be the
biggest row-id that it's seen, which is 6 in this example. ::
SP2 SP1
@@ -241,19 +241,23 @@ container, creating new rows in the database. ::
On the next run, the container-sync starts off looking at rows with
ids between SP1 and SP2. This time, there are a bunch of them. The
-sync process takes the ones it *does not* own and syncs them. Again,
-this is based on the hashes, so this will be everything it didn't sync
-before. In this example, that's rows 0, 1, 3, 4, and 6.
-
-Under normal circumstances, the container-sync processes for the other
-replicas will have already taken care of synchronizing those rows, so
-this is a set of quick checks. However, if one of those other sync
+sync process tries to sync all of them. If it succeeds, it will set
+SP2 to equal SP1. If it fails, it will set SP2 to the failed object
+and will continue to try all other objects till SP1, setting SP2 to
+the first object that failed.
+
+Under normal circumstances, the container-sync processes
+will have already taken care of synchronizing all rows, between SP1
+and SP2, resulting in a set of quick checks.
+However, if one of the sync
processes failed for some reason, then this is a vital fallback to
make sure all the objects in the container get synchronized. Without
this seemingly-redundant work, any container-sync failure results in
-unsynchronized objects.
+unsynchronized objects. Note that the container sync will persistently
+retry syncing any faulty object until it succeeds, logging each failure.
-Once it's done with the fallback rows, SP2 is advanced to SP1. ::
+Once it's done with the fallback rows, and assuming no faults occurred,
+SP2 is advanced to SP1. ::
SP2
SP1