summaryrefslogtreecommitdiff
path: root/src/doc/anchortable.txt
blob: d9c0fefc31e08842e53494be58dcf772ab6306f3 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54

ANCHOR TABLE PROTOCOL

MDS sends an update PREPARE to the anchortable MDS.  The prepare is
identified by the ino and operation type; only one for each type
(create, update, destroy) can be pending at any time.  Both parties
may actually be the same local node, but for simplicity we treat that
situation the same.  (That is, we act as if they may fail
independently, even if they can't.)

The anchortable journals the proposed update, and responds with an
AGREE and a version number.  This uniquely identifies the request.

The MDS can then update the filesystem metadata however it sees fit.  
When it is finished (and the results journaled), it sends a COMMIT to
the anchortable.  The table journals the commit, frees any state from
the transaction, and sends an ACK.  The initiating MDS should then
journal the ACK to complete the transaction.


ANCHOR TABLE FAILURE

If the AT fails before journaling the PREPARE and sending the AGREE,
the initiating MDS will simply retry the request.

If the AT fails after journaling PREPARE but before journaling COMMIT,
it will resend AGREE to the initiating MDS.

If the AT fails after the COMMIT, the transaction has been closed, and it
takes no action.  If it receives a COMMIT for which it has no open
transaction, it will reply with ACK.


INITIATING MDS FAILURE

If the MDS fails before the metadata update has been journaled, no
action is taken, since nothing is known about the previously proposed
transaction.  If an AGREE message is received and there is no
corresponding PREPARE or pending-commit state, and ROLLBACK is sent to
the anchor table.

If the MDS fails after journaling the metadata update but before
journaling the ACK, it resends COMMIT to the anchor table.  If it
receives an AGREE after resending the COMMIT, it simply ignores the
AGREE.  The anchortable will respond with an ACK, allowing the
initiating MDS to journal the final ACK and close out the transaction
locally.  

On journal replay, each metadata update (EMetaBlob) encountered that
includes an anchor transaction is noted in the AnchorClient by adding
it to the pending_commit list, and each journaled ACK is removed from
that list.  Journal replay may enounter ACKs with no prior metadata
update; these are ignored.  When recovery finishes, a COMMIT is sent
for all outstanding transactions.