diff options
Diffstat (limited to 'ACE/protocols/ace/TMCast/README')
-rw-r--r-- | ACE/protocols/ace/TMCast/README | 240 |
1 files changed, 240 insertions, 0 deletions
diff --git a/ACE/protocols/ace/TMCast/README b/ACE/protocols/ace/TMCast/README new file mode 100644 index 00000000000..7104be46e30 --- /dev/null +++ b/ACE/protocols/ace/TMCast/README @@ -0,0 +1,240 @@ + + +Introduction +------------ + +TMCast (stands for Transaction MultiCast) is an implementation of a +transactional multicast protocol. In essence, the idea is to represent +each message delivery to members of a multicast group as a transaction +- an atomic, consistent and isolated action. A multicast transaction +can be viewed as an atomic transition of the group members to a new +state. If we define [Mo] as a set of operational (non-faulty) members +of the group, [Mf] as a set of faulty members of the group, [Ma] as a +set of members that view transition [Tn] as aborted and [Mc] as a set +of members that view transition [Tn] as committed, then this atomic +transition [Tn] should satisfy one of the following equations: + +Mo(Tn-1) = Ma(T) U Mf(T) +Mo(Tn-1) = Mc(T) U Mf(T) + +Or, in other words, after transaction T has been committed (aborted), +all operational (before transaction T) members are either in the +committed (aborted) or failed state. + +Thus, for each member of the group, outcome of the transaction can be +commit, abort or a member failure. It is important for a member to +exhibit a failfast (error latency is less than transaction cycle) +behavior. Or, in other words, if a member transitioned into a wrong +state, it is guaranteed to fail instead of delivering a wrong result. + +In order to achieve such an error detection in a decentralized +environment, certain limitations were imposed. One of the most +user-visible limitation is the fact that the lifetime of the group +with only one member is very short. This is because there is not way +for a member to distinguish "no members yet" case from "my link to the +group is down". In such a situation, the member assumes the latter +case. There is also a military saying that puts it quite nicely: two +is one, one is nothing. + + + +State of Implementation +----------------------- + +The current implementation is in a prototypical stage. The following +parts are not implemented or still under development: + +* Handling of network partitioning (TODO) + +* Redundant network support (TODO) + +* Member failure detection (partial implementation) + + +Examples +-------- + +There is a simple example available in examples/TMCast/Member with +the corresponding README. + + +Architecture +------------ + +Primary goals of the protocol are to (1) mask transient failures of the +underlying multicast protocol (or, more precisely, allow to recover +from transient failures) and (2) exhibit failfast behavior in cases of +permanent failures. + +The distinction between transient and permanent failures is based on +timeouts thus what can be a transient failure in one configuration of +the protocol could be a permanent failure in the other. + +[Maybe talk more about a transient/permanent threshold and its effect +on performance/resource utilization/etc.] + +[Maybe add a terminology section.] + +Each member of a multicast group has its unique (group-wise) id: + +struct MemberId +{ + char id[MEMBER_ID_LENGTH]; +}; + +Each payload delivery is part of a transaction. Each transaction is +identified by TransactionId: + +typedef unsigned short TransactionId; + + +Each transaction has a status code which identifies its state, as viewed by +a group member: + + +typedef unsigned char TransactionStatus; + +TransactionStatus const TS_BEGIN = 1; +TransactionStatus const TS_COMMIT = 2; +TransactionStatus const TS_ABORT = 3; +TransactionStatus const TS_COMMITTED = 4; +TransactionStatus const TS_ABORTED = 5; + +Thus each transaction is described by its id and status: + +struct Transaction +{ + TransactionId id; + TransactionStatus status; +}; + +The outcome of some predefined number of recent transactions is stored +in TransactionList: + +typedef Transaction TransactionList[TL_LENGTH]; + + +Each message sent to a multicast group has the following header: + +struct MessageHeader +{ + unsigned long length; + unsigned long check_sum; + MemberId member_id; + Transaction current; + TransactionList transaction_list; +}; + +[Maybe describe each field here.] + +A new member joins the group with transaction id 0 and status +TS_COMMITTED. + +Each member sends a periodic 'pulse' messages with some predefined interval +advertising its current view of the group. This includes the state of the +current transaction and the history of the recent transactions. + + +If a member of the group needs a payload delivery it starts a new +transaction by sending a message with current transaction set to + +{++current_id, TS_BEGIN} + +and payload appended after the header. + + +Each member joins a transaction in one of the following ways: + +* A member that began the transaction joins it 'to commit' (TS_COMMIT) + +* A member that received TS_BEGIN joins current transaction 'to commit' + (TS_COMMIT). + +* A member that received TS_COMMIT or TS_ABORT but did not receive TS_BEGIN + joins current transaction 'to abort' (TS_ABORT). + + +After a member has joined the transaction it starts participating in the +transaction's voting phase. On this phase members of the group decide the +fate of the transaction. Each member sends a predefined number of messages +where it announces its vote. In between those messages the member is receiving +and processing votes from other members and can be influenced by their +'opinion'. + +In their decision-making members follow the principle of the majority. As +the voting progresses (and comes close to an end) members become more and +more reluctant to deviate from the decision of the majority. + +[Maybe add an equation that measures member's willingness to change +its mind.] + +At the end of the voting phase each member declares the current transaction +either committed (TS_COMMITTED) or aborted (TS_ABORTED). If this decision does +not agree with the majority the member declares itself failed. + +In addition, each member builds a 'majority view' of the transaction history +(based on transaction_list). If it deviates from the member's own history the +member declares itself failed. + +Here are some example scenarios of how the protocol behaves in different +situations. Let's say we have three members of the group S, R1, R2. S +initiates a transaction. R1 and R2 join it. + +Scenario 1. (two-step voting) + +1. S initiates a transaction (TS_BEGIN) +2a. R1 receives TS_BEGIN, joins for commit +2b. R2 receives TS_BEGIN, joins for commit +3a. S announces TS_COMMIT (first vote) +3b. R1 announces TS_COMMIT (first vote) +3c. R2 announces TS_COMMIT (first vote) +4a. S announces TS_COMMIT (second vote) +4b. R1 announces TS_COMMIT (second vote) +4c. R2 announces TS_COMMIT (second vote) +5a. S announces TS_COMMITTED (end of vote) +5b. R1 announces TS_COMMITTED (end of vote) +5c. R2 announces TS_COMMITTED (end of vote) + + +Scenario 2. (two-step voting) + +1. S initiates a transaction (TS_BEGIN) +2a. R1 receives TS_BEGIN, joins for commit +2b. R2 didn't receive TS_BEGIN +3a. S announces TS_COMMIT (first vote) +3b. R1 announces TS_COMMIT (first vote) +3c. R2 received R1's TS_COMMIT announces TS_ABORT (first vote) +4a. S received R2's TS_ABORT announces TS_ABORT (second vote) +4b. R1 received R2's TS_ABORT announces TS_ABORT (second vote) +4c. R2 announces TS_ABORT (second vote) +5a. S announces TS_ABORTED (end of vote) +5b. R1 announces TS_ABORTED (end of vote) +5c. R2 announces TS_ABORTED (end of vote) + + +Scenario 3. (three-step voting) + +1. S initiates a transaction (TS_BEGIN) +2a. R1 receives TS_BEGIN, joins for commit +2b. R2 didn't receive TS_BEGIN +3a. S announces TS_COMMIT (first vote) +3b. R1 announces TS_COMMIT (first vote) +3c. R2 still didn't receive anything +4a. S announces TS_COMMIT (second vote) +4b. R1 announces TS_COMMIT (second vote) +4c. R2 received R1's TS_COMMIT, announces TS_ABORT (first vote) + +5a. S received R2's TS_ABORT but it is the end of the voting phase and + majority (S and R1) vote for commit, announces TS_COMMIT (third vote) +5b. R1 received R2's TS_ABORT but it is the end of the voting phase and + majority (S and R1) vote for commit, announces TS_COMMIT (third vote) +5c. R2 announces TS_ABORT (second vote) + +6a. S announces TS_COMMITTED (end of vote) +6b. R1 announces TS_COMMITTED (end of vote) +6c. R2 discovers that the the majority has declared current transaction + committed and thus declares itself failed. + + +-- +Boris Kolpackov <boris@dre.vanderbilt.edu> |