Overview

RolyPoly is a simple example that shows how to increase application
reliability by using replication to tolerate faults. It allows you
to start two or more replicas of the same object that a client
logically sees as a single object. Furthermore, you can terminate
one of the replicas without interrupting the service provided by
the object.

RolyPoly uses request/reply logging to suppress repeated requests
(thus guaranteeing exactly-once semantics) and state synchronization
(to ensure all replicas are in a consistent state). Since replicas
are generally distributed across multiple nodes in the network,
logging and state synchronization are done using a multicast group
communication protocol.
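
As a rough illustration of the exactly-once idea, here is a
simplified sketch of suppressing a repeated request with a reply
log. The request id type and log structure are assumptions for
illustration, not the actual ReplicaController data structures:

#include <map>
#include <string>

// Hypothetical reply log keyed by request id.
typedef std::map<unsigned long, std::string> ReplyLog;

std::string
dispatch (ReplyLog& log, unsigned long request_id)
{
  ReplyLog::const_iterator i = log.find (request_id);
  if (i != log.end ())
    return i->second;           // Repeated request: replay the reply.

  std::string reply = "result"; // Stand-in for executing the request.
  log[request_id] = reply;      // Log the reply before returning it.
  return reply;
}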

To make the example illustrative, each replica can be set to fail
at one of several predefined places called crash points. The
following crash point numbers are defined:

0 - no crash point (default).

1 - fail before reply logging/state synchronization.

2 - fail after reply logging/state synchronization but before
    returning the reply to the client.

The essential difference between crash points 1 and 2 is that in
the second case the logged reply is replayed, while in the first
case the request is simply re-executed (this can be observed in
the trace messages of the replicas).
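
As a sketch of where these crash points sit in the request path
(the abort calls below are illustrative assumptions; the example's
actual crash mechanism may differ):

#include <cstdlib>

void
handle_request (int crash_point)
{
  // ... execute the request on the servant ...

  if (crash_point == 1)
    std::abort (); // Crash point 1: before reply logging/state sync.

  // ... log the reply and synchronize state with other replicas ...

  if (crash_point == 2)
    std::abort (); // Crash point 2: after logging, before the reply
                   // reaches the client.

  // ... return the reply to the client ...
}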


Execution Scenario

In this example scenario we will start three replicas. For one
of them (let us call it the primary) we will specify a crash
point other than 0. Then we will start a client to execute
requests on the resulting object. After a few requests the
primary will fail, and we will be able to observe the client
transparently shifting to another replica. We will also be able
to verify that, after this shift, the object is still in the
expected state (i.e., the sequence of returned numbers is not
interrupted and, in the case of crash point 2, the request is
not re-executed).

Note that, due to the underlying group communication architecture,
a group with only one member (a replica in our case) can exist
only for a very short period of time. This, in turn, means that
we need to start the first two replicas at virtually the same
time. This is also the reason why we need three replicas instead
of two: if one replica is going to fail, the other one won't live
very long alone. For more information on why it works this way,
please see the TMCast documentation available at
$(ACE_ROOT)/ace/TMCast/README.

Suppose we have node0, node1 and node2 on which we are going
to start our replicas (they could all be the same node). Then,
to start the replicas, we can execute the following commands:

node0$ ./server -o replica-0.ior -c 2
node1$ ./server -o replica-1.ior
node2$ ./server -o replica-2.ior

When all replicas are up, we can start the client:

$ ./client -k file://replica-0.ior -k file://replica-1.ior

In this scenario, after executing a few requests, replica-0
will fail at crash point 2. After that, replica-1 will continue
executing client requests. You can see what is going on with the
replicas by looking at the various trace messages printed during
execution.


Architecture

Most of the replication logic is carried out by the
ReplicaController. In particular, it performs the following
tasks:

* management of distributed request/reply log

* state synchronization

* repeated request suppression


The object implementation (interface RolyPoly in our case) can
use two different strategies for delivering state updates to the
ReplicaController:

* push model: the object implementation calls
  Checkpointable::associate_state to associate the state update
  with the current request.

* pull model: the ReplicaController calls
  Checkpointable::get_state, which is implemented by the servant.

These two models can be used simultaneously. In the RolyPoly
interface implementation you can comment out the corresponding
piece of code to choose one of the strategies.
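
A minimal sketch of how a servant might use the two models
follows; the class layout and the signatures of associate_state
and get_state here are illustrative assumptions, so consult the
RolyPoly sources for the real interfaces:

// Hypothetical stand-in for the push-model entry point.
struct ReplicaControllerLike
{
  virtual void associate_state (long state) = 0;
  virtual ~ReplicaControllerLike () {}
};

class RolyPoly_impl
{
  long counter_;               // The replicated state.
  ReplicaControllerLike& rc_;

public:
  RolyPoly_impl (ReplicaControllerLike& rc)
    : counter_ (0), rc_ (rc) {}

  long next ()
  {
    ++counter_;
    rc_.associate_state (counter_); // Push model.
    return counter_;
  }

  // Pull model: the controller calls back to obtain current state.
  long get_state () const { return counter_; }
};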

-- 
Boris Kolpackov <boris@dre.vanderbilt.edu>