diff options
Diffstat (limited to 'trunk/TAO/orbsvcs/examples/FaultTolerance/RolyPoly/README')
-rw-r--r-- | trunk/TAO/orbsvcs/examples/FaultTolerance/RolyPoly/README | 105 |
1 files changed, 105 insertions, 0 deletions
diff --git a/trunk/TAO/orbsvcs/examples/FaultTolerance/RolyPoly/README b/trunk/TAO/orbsvcs/examples/FaultTolerance/RolyPoly/README new file mode 100644 index 00000000000..786b42a81d9 --- /dev/null +++ b/trunk/TAO/orbsvcs/examples/FaultTolerance/RolyPoly/README @@ -0,0 +1,105 @@ + + +Overview + +RolyPoly is a simple example that shows how to increase application +reliability by using replication to tolerate faults. It allows you +to start two replicas of the same object which are logically seen as +one object by a client. Furthermore, you can terminate one of the +replicas without interrupting the service provided by the object. + +RolyPoly is using request/reply logging to suppress repeated +requests (thus guaranteeing exactly-once semantic) and state +synchronization (to ensure all replicas are in a consistent +state). Since replicas are generally distributed across multiple +nodes in the network, logging and state synchronizations are +done using multicast group communication protocol. + +In order to make it illustrative, each replica can be set to +fail in one of the predefined places called crash points. The +following crash point numbers are defined: + +0 - no crash point (default). + +1 - fail before reply logging/state synchronization. + +2 - fail after reply logging/state synchronization but before + returning reply to the client. + +Essential difference between crash point 1 and 2 is that in +the second case there should be reply replay while in the +first case request is simply re-executed (this can be observed +in the trace messages of the replicas). + + +Execution Scenario + +In this example scenario we will start three replicas. For one +of them (let us call it primary) we will specify a crash point +other than 0. Then we will start a client to execute requests +on the resulting object. After a few requests, primary will +fail and we will be able to observe transparent shifting of +client to the other replica. Also we will be able to make sure +that, after this shifting, object is still in expected state +(i.e. the sequence of returned numbers is not interrupted and +that, in case of the crash point 2, request is not re-executed). + +Note, due to the underlying group communication architecture, +the group with only one member (replica in our case) can only +exist for a very short period of time. This, in turn, means +that we need to start first two replicas virtually at the same +time. This is also a reason why we need three replicas instead +of two - if one replica is going to fail then the other one +won't live very long alone. For more information on the reasons +why it works this way please see documentation for TMCast +available at $(ACE_ROOT)/ace/TMCast/README. + +Suppose we have node0, node1 and node2 on which we are going +to start our replicas (it could be the same node). Then, to +start our replicas we can execute the following commands: + +node0$ ./server -o replica-0.ior -c 2 +node1$ ./server -o replica-1.ior +node2$ ./server -o replica-2.ior + +When all replicas are up we can start the client: + +$ ./client -k file://replica-0.ior -k file://replica-1.ior + +In this scenario, after executing a few requests, replica-0 +will fail in crash point 2. After that, replica-1 will continue +executing client requests. You can see what's going on with +replicas by looking at various trace messages printed during +execution. + + +Architecture + +The biggest part of the replication logic is carried out by +the ReplicaController. In particular it performs the +following tasks: + +* management of distributed request/reply log + +* state synchronization + +* repeated request suppression + + +Object implementation (interface RolyPoly in our case) can use +two different strategies for delivering state update to the +ReplicaController: + +* push model: client calls Checkpointable::associate_state + to associate the state update with current request. + +* pull model: ReplicaController will call Checkpointable::get_state + implemented by the servant. + +This two model can be used simultaneously. In RolyPoly interface +implementation you can comment out corresponding piece of code to +chose one of the strategies. + +-- +Boris Kolpackov <boris@dre.vanderbilt.edu> + |