author    athanatos <rexludorum@gmail.com>  2013-05-14 15:28:45 -0700
committer athanatos <rexludorum@gmail.com>  2013-05-14 15:28:45 -0700
commit    5ff703d60adceb7753a50ecc8bf1e32a95999caf (patch)
tree      65f198ad2010740d8d263fb97c2aacef7ca85173
parent    52b0438c66b23c5eec4eed62a489143f995f6c94 (diff)
parent    2a4425af0eec5438f28fc515b22dd768ab3afb8e (diff)
Merge pull request #283 from dachary/wip-5058
internal documentation proofreading

Reviewed-by: Sam Just <sam.just@inktank.com>
-rw-r--r--  doc/dev/osd_internals/map_message_handling.rst  | 46
-rw-r--r--  doc/dev/osd_internals/pg.rst                    |  6
-rw-r--r--  doc/dev/osd_internals/pg_removal.rst            | 11
3 files changed, 35 insertions, 28 deletions
diff --git a/doc/dev/osd_internals/map_message_handling.rst b/doc/dev/osd_internals/map_message_handling.rst
index 39b035b11dd..eb27396df37 100644
--- a/doc/dev/osd_internals/map_message_handling.rst
+++ b/doc/dev/osd_internals/map_message_handling.rst
@@ -5,7 +5,7 @@ Map and PG Message handling
Overview
--------
The OSD handles routing incoming messages to PGs, creating the PG if necessary
-in come cases.
+in some cases.
PG messages generally come in two varieties:
@@ -64,6 +64,7 @@ messages. That is, messages from different PGs may be reordered.
MOSDPGOps follow the following process:
1. OSD::handle_op: validates permissions and crush mapping.
+   discards the request if the client is no longer connected and so cannot receive the reply (see OSD::op_is_discardable)
See OSDService::handle_misdirected_op
See OSD::op_has_sufficient_caps
See OSD::require_same_or_newer_map
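The validation steps above can be summarized in a schematic sketch. This is illustrative Python, not Ceph's actual C++; the flat boolean/epoch parameters are hypothetical stand-ins for the real checks cited above:

```python
# Illustrative sketch of OSD::handle_op's validation gate (hypothetical
# signature; the real checks are the C++ functions cited above).

def handle_op(client_connected, caps_ok, op_epoch, osd_epoch, mapped_here):
    if not client_connected:
        return "discard"        # OSD::op_is_discardable: reply undeliverable
    if not caps_ok:
        return "reject"         # OSD::op_has_sufficient_caps
    if op_epoch > osd_epoch:
        return "wait_for_map"   # OSD::require_same_or_newer_map
    if not mapped_here:
        return "misdirected"    # OSDService::handle_misdirected_op
    return "enqueue"

print(handle_op(True, True, 10, 12, True))   # enqueue
print(handle_op(False, True, 10, 12, True))  # discard
```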
@@ -74,26 +75,37 @@ MOSDSubOps follow the following process:
1. OSD::handle_sub_op checks that sender is an OSD
2. OSD::enqueue_op
-OSD::enqueue_op calls PG::queue_op which checks can_discard_request before
-queueing the op in the op_queue and the PG in the OpWQ. Note, a single PG
-may be in the op queue multiple times for multiple ops.
+OSD::enqueue_op calls PG::queue_op, which checks waiting_for_map before calling OpWQ::queue, which adds the op to the queue of the PG responsible for handling it.
-dequeue_op is then eventually called on the PG. At this time, the op is popped
-off of op_queue and passed to PG::do_request, which checks that the PG map is
-new enough (must_delay_op) and then processes the request.
+OSD::dequeue_op is then eventually called, with a lock on the PG. At
+this time, the op is passed to PG::do_request, which checks that:
-In summary, the possible ways that an op may wait or be discarded in are:
+ 1. the PG map is new enough (PG::must_delay_op)
+ 2. the client requesting the op has enough permissions (PG::op_has_sufficient_caps)
+ 3. the op is not to be discarded (PG::can_discard_{request,op,subop,scan,backfill})
+ 4. the PG is active (PG::flushed boolean)
+ 5. if the op is a CEPH_MSG_OSD_OP, the PG is in PG_STATE_ACTIVE and not in PG_STATE_REPLAY
- 1. Wait in waiting_for_osdmap due to OSD::require_same_or_newer_map from
- OSD::handle_*.
- 2. Discarded in OSD::can_discard_op at enqueue_op.
- 3. Wait in PG::op_waiters due to PG::must_delay_request in PG::do_request.
- 4. Wait in PG::waiting_for_active in due_request due to !flushed.
- 5. Wait in PG::waiting_for_active due to !active() in do_op/do_sub_op.
- 6. Wait in PG::waiting_for_(degraded|missing) in do_op.
- 7. Wait in PG::waiting_for_active due to scrub_block_writes in do_op
+If these conditions are not met, the op is either discarded or queued for later processing. If all conditions are met, the op is processed according to its type:
-TODO: The above is not a complete list.
+ 1. CEPH_MSG_OSD_OP is handled by PG::do_op
+ 2. MSG_OSD_SUBOP is handled by PG::do_sub_op
+ 3. MSG_OSD_SUBOPREPLY is handled by PG::do_sub_op_reply
+ 4. MSG_OSD_PG_SCAN is handled by PG::do_scan
+ 5. MSG_OSD_PG_BACKFILL is handled by PG::do_backfill
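Taken together, the checks and the type dispatch above can be sketched schematically. This is illustrative Python, not Ceph's C++; the `Op` and `pg` fields are hypothetical stand-ins for the real per-PG state:

```python
from types import SimpleNamespace

# Illustrative sketch of PG::do_request's check-then-dispatch flow
# (hypothetical fields; the real checks are C++ methods on PG).

HANDLERS = {
    "CEPH_MSG_OSD_OP":     "PG::do_op",
    "MSG_OSD_SUBOP":       "PG::do_sub_op",
    "MSG_OSD_SUBOPREPLY":  "PG::do_sub_op_reply",
    "MSG_OSD_PG_SCAN":     "PG::do_scan",
    "MSG_OSD_PG_BACKFILL": "PG::do_backfill",
}

def do_request(pg, op):
    if pg.map_epoch < op.min_epoch:   # PG::must_delay_op: map too old
        return "wait_for_map"
    if not op.caps_ok:                # PG::op_has_sufficient_caps
        return "reject"
    if op.discardable:                # PG::can_discard_request
        return "discard"
    if not pg.flushed:                # PG not active yet
        return "waiting_for_active"
    return HANDLERS[op.msg_type]      # dispatch by message type

pg = SimpleNamespace(map_epoch=20, flushed=True)
op = SimpleNamespace(min_epoch=18, caps_ok=True, discardable=False,
                     msg_type="CEPH_MSG_OSD_OP")
print(do_request(pg, op))  # PG::do_op
```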
+
+CEPH_MSG_OSD_OP processing
+--------------------------
+
+ReplicatedPG::do_op handles a CEPH_MSG_OSD_OP op and will queue it
+
+ 1. in wait_for_all_missing if it is a CEPH_OSD_OP_PGLS for a designated snapid and some object updates are still missing
+ 2. in waiting_for_active if the op may write but the scrubber is working
+ 3. in waiting_for_missing_object if the op requires an object or a snapdir or a specific snap that is still missing
+ 4. in waiting_for_degraded_object if the op may write an object or a snapdir that is degraded, or if another object blocks it ("blocked_by")
+ 5. in waiting_for_backfill_pos if the op requires an object that will be available after the backfill is complete
+ 6. in waiting_for_ack if an ack from another OSD is expected
+ 7. in waiting_for_ondisk if the op is waiting for a write to complete
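The queueing decisions above can be sketched as one decision chain. This is an illustrative Python sketch, not Ceph's C++; the boolean fields are hypothetical stand-ins for the real object and scrub state checks:

```python
from types import SimpleNamespace

# Illustrative sketch of ReplicatedPG::do_op's queue selection
# (hypothetical fields; the real checks inspect object and PG state).

def choose_queue(op):
    if op.is_pgls and op.snap_updates_missing:
        return "wait_for_all_missing"
    if op.may_write and op.scrub_blocks_writes:
        return "waiting_for_active"
    if op.object_missing:
        return "waiting_for_missing_object"
    if op.may_write and (op.object_degraded or op.blocked_by):
        return "waiting_for_degraded_object"
    if op.past_backfill_pos:
        return "waiting_for_backfill_pos"
    return "process_now"

op = SimpleNamespace(is_pgls=False, snap_updates_missing=False,
                     may_write=True, scrub_blocks_writes=False,
                     object_missing=False, object_degraded=True,
                     blocked_by=False, past_backfill_pos=False)
print(choose_queue(op))  # waiting_for_degraded_object
```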
Peering Messages
----------------
diff --git a/doc/dev/osd_internals/pg.rst b/doc/dev/osd_internals/pg.rst
index 2c2c572fa51..405536396f1 100644
--- a/doc/dev/osd_internals/pg.rst
+++ b/doc/dev/osd_internals/pg.rst
@@ -7,19 +7,19 @@ Concepts
*Peering Interval*
See PG::start_peering_interval.
- See PG::up_acting_affected.
+ See PG::acting_up_affected
See PG::RecoveryState::Reset
A peering interval is a maximal set of contiguous map epochs in which the
up and acting sets did not change. PG::RecoveryMachine represents a
transition from one interval to another as passing through
- RecoveryState::Reset. On PG;:RecoveryState::AdvMap PG::up_acting_affected can
+ RecoveryState::Reset. On PG::RecoveryState::AdvMap PG::acting_up_affected can
cause the pg to transition to Reset.
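The interval-boundary test described above reduces to a set comparison. A minimal sketch, assuming the check boils down to "did the up or acting set change between maps" (the real PG::acting_up_affected is a C++ method with more inputs):

```python
# Illustrative sketch of the interval-boundary test performed when a
# new map arrives (hypothetical signature).

def interval_changed(old_up, old_acting, new_up, new_acting):
    # A new peering interval begins whenever the up or acting set changes.
    return old_up != new_up or old_acting != new_acting

# Same sets across epochs: still inside the same interval.
print(interval_changed([0, 1, 2], [0, 1, 2], [0, 1, 2], [0, 1, 2]))  # False
# osd.1 replaced by osd.3: new interval, the PG passes through Reset.
print(interval_changed([0, 1, 2], [0, 1, 2], [0, 3, 2], [0, 3, 2]))  # True
```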
Peering Details and Gotchas
---------------------------
-For an overview of peering, see Peering.
+For an overview of peering, see `Peering <../../peering>`_.
* PG::flushed defaults to false and is set to false in
PG::start_peering_interval. Upon transitioning to PG::RecoveryState::Started
diff --git a/doc/dev/osd_internals/pg_removal.rst b/doc/dev/osd_internals/pg_removal.rst
index 4ac0d331b23..c5e0582fefa 100644
--- a/doc/dev/osd_internals/pg_removal.rst
+++ b/doc/dev/osd_internals/pg_removal.rst
@@ -20,19 +20,14 @@ deleted. Each DeletingState object in deleting_pgs lives while at
least one reference to it remains. Each item in RemoveWQ carries a
reference to the DeletingState for the relevant pg such that
deleting_pgs.lookup(pgid) will return a null ref only if there are no
-collections currently being deleted for that pg. DeletingState allows
-you to register a callback to be called when the deletion is finally
-complete. See PG::start_flush. We use this mechanism to prevent the
-pg from being "flushed" until any pending deletes are complete.
-Metadata operations are safe since we did remove the old metadata
-objects and we inherit the osr from the previous copy of the pg.
+collections currently being deleted for that pg.
The DeletingState for a pg also carries information about the status
of the current deletion and allows the deletion to be cancelled.
The possible states are:
1. QUEUED: the PG is in the RemoveWQ
- 2. CLEARING_DIR: the PG's contents are being removed syncronously
+ 2. CLEARING_DIR: the PG's contents are being removed synchronously
 3. DELETING_DIR: the PG's directories and metadata are being queued for removal
4. DELETED_DIR: the final removal transaction has been queued
5. CANCELED: the deletion has been canceled
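The states above, and the rule that cancellation only succeeds before the removal transaction is queued, can be modeled as a small state machine. This is an illustrative Python sketch (the real DeletingState is C++ and guarded by a lock):

```python
# Illustrative state machine for DeletingState (hypothetical sketch).

TRANSITIONS = {
    "QUEUED":       {"CLEARING_DIR", "CANCELED"},
    "CLEARING_DIR": {"DELETING_DIR", "CANCELED"},
    "DELETING_DIR": {"DELETED_DIR"},   # removal transaction queued: no cancel
    "DELETED_DIR":  set(),
    "CANCELED":     set(),
}

class DeletingState:
    def __init__(self):
        self.state = "QUEUED"

    def advance(self, new_state):
        assert new_state in TRANSITIONS[self.state], "illegal transition"
        self.state = new_state

    def try_cancel(self):
        # Cancellation only succeeds before DELETING_DIR is reached.
        if "CANCELED" in TRANSITIONS[self.state]:
            self.state = "CANCELED"
            return True
        return False

d = DeletingState()
d.advance("CLEARING_DIR")
print(d.try_cancel())  # True
```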
@@ -46,7 +41,7 @@ fails to stop the deletion will not return until the final removal
transaction is queued. This ensures that any operations queued after
that point will be ordered after the pg deletion.
-_create_lock_pg must handle two cases:
+OSD::_create_lock_pg must handle two cases:
1. Either there is no DeletingStateRef for the pg, or it failed to cancel
2. We succeeded in canceling the deletion.
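The two cases can be sketched as a lookup-then-cancel attempt. This is illustrative Python with a hypothetical `FakeDeletingRef` helper; the real code manipulates C++ refs and collections:

```python
# Illustrative sketch of the two cases OSD::_create_lock_pg must handle
# (hypothetical helper types).

class FakeDeletingRef:
    def __init__(self, cancelable):
        self._cancelable = cancelable

    def try_cancel(self):
        # Succeeds only before the final removal transaction is queued.
        return self._cancelable

def create_lock_pg(deleting_pgs, pgid):
    ref = deleting_pgs.get(pgid)          # DeletingStateRef lookup
    if ref is not None and ref.try_cancel():
        # Case 2: cancel succeeded; the old collection can be reused.
        return "resurrect_old_pg"
    # Case 1: no deletion pending, or cancel failed because the removal
    # transaction is already queued; create the PG from scratch, ordered
    # after the deletion.
    return "create_fresh_pg"

print(create_lock_pg({1: FakeDeletingRef(True)}, 1))   # resurrect_old_pg
print(create_lock_pg({}, 1))                           # create_fresh_pg
```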