author | David Hadas <davidh@il.ibm.com> | 2013-02-27 00:49:51 +0200 |
---|---|---|
committer | David Hadas <davidh@il.ibm.com> | 2013-03-08 22:28:06 +0200 |
commit | 8b140033f01333fbd6d41e2946db949ab6f92599 (patch) | |
tree | 567ade27c4073e340584d7e9f1d5e86a091a829e /doc/source/overview_container_sync.rst | |
parent | a8af3835c060f384c7e4e6779073b43e65c222e9 (diff) | |
Improved container-sync resiliency
container-sync now skips faulty objects in the first and second rounds.
All replicas try in the second round.
No server will give up until the faulty object succeeds.
Fixes: bug #1068423
Change-Id: I0defc174b2ce3796a6acf410a2d2eae138e8193d
Diffstat (limited to 'doc/source/overview_container_sync.rst')
-rw-r--r-- | doc/source/overview_container_sync.rst | 24 |
1 files changed, 14 insertions, 10 deletions
diff --git a/doc/source/overview_container_sync.rst b/doc/source/overview_container_sync.rst
index b62136d25..c0ab3a10b 100644
--- a/doc/source/overview_container_sync.rst
+++ b/doc/source/overview_container_sync.rst
@@ -223,7 +223,7 @@ hash of the object name, so it's not always guaranteed to be exactly
 one out of every three rows, but it usually gets close. For the sake of
 example, let's say that this process ends up owning rows 2 and 5.
 
-Once it's finished syncing those rows, it updates SP1 to be the
+Once it's finished trying to sync those rows, it updates SP1 to be the
 biggest row-id that it's seen, which is 6 in this example. ::
 
    SP2           SP1
@@ -241,19 +241,23 @@ container, creating new rows in the database. ::
 
 On the next run, the container-sync starts off looking at rows with
 ids between SP1 and SP2. This time, there are a bunch of them. The
-sync process takes the ones it *does not* own and syncs them. Again,
-this is based on the hashes, so this will be everything it didn't sync
-before. In this example, that's rows 0, 1, 3, 4, and 6.
-
-Under normal circumstances, the container-sync processes for the other
-replicas will have already taken care of synchronizing those rows, so
-this is a set of quick checks. However, if one of those other sync
+sync process try to sync all of them. If it succeeds, it will set
+SP2 to equal SP1. If it fails, it will set SP2 to the failed object
+and will continue to try all other objects till SP1, setting SP2 to
+the first object that failed.
+
+Under normal circumstances, the container-sync processes
+will have already taken care of synchronizing all rows, between SP1
+and SP2, resulting in a set of quick checks.
+However, if one of the sync
 processes failed for some reason, then this is a vital fallback to
 make sure all the objects in the container get synchronized. Without
 this seemingly-redundant work, any container-sync failure results in
-unsynchronized objects.
+unsynchronized objects. Note that the container sync will persistently
+retry to sync any faulty object until success, while logging each failure.
 
-Once it's done with the fallback rows, SP2 is advanced to SP1. ::
+Once it's done with the fallback rows, and assuming no faults occured,
+SP2 is advanced to SP1. ::
 
                  SP2
                  SP1
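
The two-round behaviour described in the revised text can be sketched roughly as follows. This is an illustrative model only, not the code changed by this commit: `sync_pass`, `owns`, and `try_sync` are hypothetical names, rows are reduced to bare integer IDs, and the sketch assumes that after a failure SP2 is left just before the first failed row so that row is retried on the next pass.

```python
from typing import Callable, Iterable, Optional, Tuple


def sync_pass(
    row_ids: Iterable[int],
    sp1: int,
    sp2: int,
    owns: Callable[[int], bool],
    try_sync: Callable[[int], bool],
) -> Tuple[int, int]:
    """Run one container-sync pass and return the updated (sp1, sp2).

    Hypothetical model: `owns(row_id)` says whether this replica owns the
    row, and `try_sync(row_id)` returns True on success, False on failure.
    """
    ordered = sorted(row_ids)

    # Fallback round: every replica retries ALL rows in (sp2, sp1],
    # owned or not. A failure does not stop the round.
    retry_from: Optional[int] = None
    cursor = sp2
    for rid in ordered:
        if sp2 < rid <= sp1:
            if not try_sync(rid) and retry_from is None:
                # Assumption: remember the position just before the
                # first failed row so the next pass starts there.
                retry_from = cursor
            cursor = rid
    new_sp2 = sp1 if retry_from is None else retry_from

    # First-attempt round: rows newer than sp1 are tried only by the
    # replica that owns them; a faulty row is skipped, not retried here.
    new_sp1 = sp1
    for rid in ordered:
        if rid > sp1:
            if owns(rid):
                try_sync(rid)
            new_sp1 = rid

    return new_sp1, new_sp2
```

With the row numbers from the documentation example, a first pass over rows 0 to 6 starting from SP1 = SP2 = -1 moves SP1 to 6 and leaves SP2 at -1. If row 3 then fails during the fallback round of the next pass, SP2 stays behind row 3 instead of advancing to SP1, so the faulty row is retried on every later pass until it succeeds.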