author     Samuel Merritt <sam@swiftstack.com>    2012-11-21 14:57:21 -0800
committer  Samuel Merritt <sam@swiftstack.com>    2012-11-21 14:59:26 -0800
commit     89a871d42f1226c2dd292ea739dfda01d6f4b3f2 (patch)
tree       a2464cd559d2f6e92d6267b82c003e7b6ce71084 /doc/source/overview_container_sync.rst
parent     2fc9716ec9384b0079d9c077e0f081a13ad76624 (diff)
download   swift-89a871d42f1226c2dd292ea739dfda01d6f4b3f2.tar.gz
Improve container-sync docs.
Two improvements: first, document that the container-sync process
connects to the remote cluster's proxy server, so outbound connectivity
is required. Second, rewrite the behind-the-scenes container-sync
example and add some ASCII-art diagrams.

Fixes bug 1068430.

Bonus fix of docstring in wsgi.py to squelch a sphinx warning.

Change-Id: I85bd56c2bd14431e13f7c57a43852777f14014fb
Diffstat (limited to 'doc/source/overview_container_sync.rst')
-rw-r--r--  doc/source/overview_container_sync.rst | 121
1 file changed, 85 insertions, 36 deletions
diff --git a/doc/source/overview_container_sync.rst b/doc/source/overview_container_sync.rst
index af0168791..b62136d25 100644
--- a/doc/source/overview_container_sync.rst
+++ b/doc/source/overview_container_sync.rst
@@ -174,6 +174,13 @@ to the other container.
.. note::
+ The swift-container-sync process runs on each container server in
+ the cluster and talks to the proxy servers in the remote cluster.
+ Therefore, the container servers must be permitted to initiate
+ outbound connections to the remote proxy servers.
+
+.. note::
+
Container sync will sync object POSTs only if the proxy server is set to
use "object_post_as_copy = true" which is the default. So-called fast
object posts, "object_post_as_copy = false" do not update the container
@@ -184,39 +191,81 @@ The actual syncing is slightly more complicated to make use of the three
do the exact same work but also without missing work if one node happens to
be down.
-Two sync points are kept per container database. All rows between the two
-sync points trigger updates. Any rows newer than both sync points cause
-updates depending on the node's position for the container (primary nodes
-do one third, etc. depending on the replica count of course). After a sync
-run, the first sync point is set to the newest ROWID known and the second
-sync point is set to newest ROWID for which all updates have been sent.
-
-An example may help. Assume replica count is 3 and perfectly matching
-ROWIDs starting at 1.
-
- First sync run, database has 6 rows:
-
- * SyncPoint1 starts as -1.
- * SyncPoint2 starts as -1.
- * No rows between points, so no "all updates" rows.
- * Six rows newer than SyncPoint1, so a third of the rows are sent
- by node 1, another third by node 2, remaining third by node 3.
- * SyncPoint1 is set as 6 (the newest ROWID known).
- * SyncPoint2 is left as -1 since no "all updates" rows were synced.
-
- Next sync run, database has 12 rows:
-
- * SyncPoint1 starts as 6.
- * SyncPoint2 starts as -1.
- * The rows between -1 and 6 all trigger updates (most of which
- should short-circuit on the remote end as having already been
- done).
- * Six more rows newer than SyncPoint1, so a third of the rows are
- sent by node 1, another third by node 2, remaining third by node
- 3.
- * SyncPoint1 is set as 12 (the newest ROWID known).
- * SyncPoint2 is set as 6 (the newest "all updates" ROWID).
-
-In this way, under normal circumstances each node sends its share of
-updates each run and just sends a batch of older updates to ensure nothing
-was missed.
+Two sync points are kept in each container database. When syncing a
+container, the container-sync process figures out which replica of the
+container it has. In a standard 3-replica scenario, the process will
+have either replica number 0, 1, or 2. This is used to figure out
+which rows belong to this sync process and which ones don't.
+
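+In code terms, each container database carries the pair of sync points,
+and each sync process knows its position among the container's
+replicas. The following is a minimal sketch of that state (illustrative
+names only, not Swift's actual API; replica_ordinal() is a hypothetical
+helper)::
+
+    # Per-container state:
+    #   sync_point1 -- newest row ID this replica has scanned (-1 initially)
+    #   sync_point2 -- newest row ID up to which *every* row has been
+    #                  handled by this replica (-1 initially)
+
+    def replica_ordinal(my_node_id, replica_node_ids):
+        """Position of this node among the container's replicas:
+        0, 1, or 2 in a standard three-replica ring."""
+        return sorted(replica_node_ids).index(my_node_id)
+
+    # e.g. replica_ordinal('node-b', ['node-c', 'node-a', 'node-b']) -> 1
+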
+An example may help. Assume a replica count of 3 and that database row
+IDs are 1..6. Also, assume that container-sync is running on this
+container for the first time, hence SP1 = SP2 = -1. ::
+
+    SP1
+    SP2
+     |
+     v
+    -1 0 1 2 3 4 5 6
+
+First, the container-sync process looks for rows with IDs between SP1
+and SP2. Since this is the first run, SP1 = SP2 = -1, and there aren't
+any such rows. ::
+
+    SP1
+    SP2
+     |
+     v
+    -1 0 1 2 3 4 5 6
+
+Second, the container-sync process looks for rows with IDs greater than
+SP1, and syncs those rows which it owns. Ownership is based on the
+hash of the object name, so it's not always guaranteed to be exactly
+one out of every three rows, but it usually gets close. For the sake
+of example, let's say that this process ends up owning rows 2 and 5.
+
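+Ownership can be pictured as hashing the object name and mapping the
+result onto the replicas. The following is only an illustrative sketch
+(Swift's real check differs in its details; owns_row() is a hypothetical
+helper, not part of Swift's API)::
+
+    import hashlib
+
+    def owns_row(object_name, ordinal, replica_count=3):
+        # Hash the object name and map it onto one of the replicas; the
+        # replica whose ordinal matches is the row's primary owner.
+        digest = hashlib.md5(object_name.encode('utf-8')).hexdigest()
+        return int(digest, 16) % replica_count == ordinal
+
+    # Exactly one of the three replicas owns any given name, e.g.:
+    # [owns_row('photo-001.jpg', i) for i in range(3)] -> [False, True, False]
+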
+Once it's finished syncing those rows, it updates SP1 to be the
+biggest row ID that it's seen, which is 6 in this example. ::
+
+    SP2           SP1
+     |             |
+     v             v
+    -1 0 1 2 3 4 5 6
+
+While all that was going on, clients uploaded new objects into the
+container, creating new rows in the database. ::
+
+    SP2           SP1
+     |             |
+     v             v
+    -1 0 1 2 3 4 5 6 7 8 9 10 11 12
+
+On the next run, the container-sync process starts off looking at rows
+with IDs between SP1 and SP2. This time, there are a bunch of them. The
+sync process takes the ones it *does not* own and syncs them. Again,
+this is based on the hashes, so this will be everything it didn't sync
+before. In this example, that's rows 1, 3, 4, and 6.
+
+Under normal circumstances, the container-sync processes for the other
+replicas will have already taken care of synchronizing those rows, so
+this is a set of quick checks. However, if one of those other sync
+processes failed for some reason, then this is a vital fallback to
+make sure all the objects in the container get synchronized. Without
+this seemingly redundant work, any container-sync failure results in
+unsynchronized objects.
+
+Once it's done with the fallback rows, SP2 is advanced to SP1. ::
+
+                  SP2
+                  SP1
+                   |
+                   v
+    -1 0 1 2 3 4 5 6 7 8 9 10 11 12
+
+Then, rows with row ID greater than SP1 are synchronized (provided
+this container-sync process is responsible for them), and SP1 is moved
+up to the greatest row ID seen. ::
+
+                  SP2           SP1
+                   |             |
+                   v             v
+    -1 0 1 2 3 4 5 6 7 8 9 10 11 12
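+
+Putting the two passes together, one run over a single container can be
+sketched roughly as follows. This is illustrative Python only, not
+Swift's implementation: broker.get_rows(), broker.save_sync_points(),
+send_to_remote(), and owns_row() are hypothetical helpers, and all
+error handling is omitted. ::
+
+    def sync_container_once(broker, ordinal, replica_count=3):
+        # Both sync points start out as -1 for a brand-new container.
+        sp1, sp2 = broker.get_sync_points()
+
+        # Pass 1: rows in (SP2, SP1] -- send the rows this replica does
+        # NOT own, as a safety net in case another replica's sync failed.
+        for row in broker.get_rows(since=sp2, until=sp1):
+            if not owns_row(row['name'], ordinal, replica_count):
+                send_to_remote(row)
+        sp2 = sp1                           # SP2 catches up to SP1
+
+        # Pass 2: rows newer than SP1 -- send only the rows this replica
+        # owns; the other replicas handle the rest.
+        for row in broker.get_rows(since=sp1):
+            if owns_row(row['name'], ordinal, replica_count):
+                send_to_remote(row)
+            sp1 = max(sp1, row['ROWID'])    # advance to the newest row seen
+
+        broker.save_sync_points(sp1, sp2)
+
+Running this twice against the example database reproduces the pictures
+above: the first run ends with SP1 = 6 and SP2 = -1, and the second run
+ends with SP1 = 12 and SP2 = 6.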