summaryrefslogtreecommitdiff
path: root/TAO/docs/releasenotes/ftcorba_services.html
blob: d603acd716a7f2104e06ea2e2f65035ef4dfb03c (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<!--  -->
<html>
<head>
   <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
   <meta name="Author" content="Steve Totten">
   <title>Fault Tolerant (FT) CORBA Services</title>
</head>
<body>

<center>

<h2>Fault Tolerant (FT) CORBA Services</h2></center>

<p>
Points of contact: <a href="mailto:wilson_d@ociweb.com">Dale
  Wilson</a> and <a href="mailto:totten_s@ociweb.com">Steve Totten</a>
</p>

<p>
<ul>
  <li><a href="#Introduction">Introduction</a></li>
  <li>
    <a href="#FTCORBA_Services">FT CORBA Services</a>
    <ul>
      <li><a href="#Replication_Manager">Replication Manager</a></li>
      <li><a href="#Fault_Notifier">Fault Notifier</a></li>
      <li><a href="#Fault_Detector">Fault Detector</a></li>
      <li><a href="#Fault_Detector_Factory">Fault Detector Factory</a></li>
      <li><a href="#Redundancy">Redundancy of FT CORBA Infrastructure
        Services</a></li>
    </ul>
  </li>
  <li>
    <a href="#Sample_FT_Application">Sample FT Application</a>
    <ul>
      <li><a href="#Replica_Factory">Replica Factory</a></li>
      <li><a href="#Replica">Replica</a></li>
      <li><a href="#Object_Group_Creator">Object Group Creator</a></li>
      <li><a href="#Client">Client</a></li>
      <li><a href="#Prototype_Architecture">Prototype Architecture</a></li>
    </ul>
  </li>
  <li><a href="#Propagating_IOGRs">Propagating IOGRs</a></li>
  <li><a href="#IOGR_Creation_Manipulation">IOGR Creation and
    Manipulation</a></li>
  <li><a href="#Bootstrapping">Bootstrapping of FT CORBA
    Infrastructure and Application</a></li>
  <li><a href="#Future_Work">Future Work</a></li>
</ul>
</p>

<h3><a name="Introduction"></a>Introduction</h3>

<p>
Object Computing, Inc. (OCI) and the Distributed Object Computing
(DOC) group at Vanderbilt University's Institute for Software
Intensive Systems (ISIS) collaborated on a research and development
(R&amp;D) effort to demonstrate the viability of the OMG FT CORBA
specification (defined in Chapter 23 of the <a
  href="http://www.omg.org/cgi-bin/doc?formal/02-12-02">CORBA 3.0
  specification</a>), with some extensions, as a platform for building
fault-tolerant DRE applications.
</p>

<p>
The OCI team designed, implemented, and tested FT service-level
entities and other components needed to support an FT infrastructure
in TAO, and a sample application to demonstrate TAO's FT capabilities.
The ISIS team provided enhancements to TAO's ORB Core to support fault
tolerance in applications, including implementing ORB-core-level
features defined in the FT CORBA specification.
</p>

<p>
Extensions to the FT CORBA specification investigated during the
project included:
<ol>
  <li>Adding a <code>SEMI_ACTIVE</code> replication style similar to
  that described <a
    href="http://www.dre.vanderbilt.edu/~schmidt/PDF/WDMS02.pdf">here</a>.</li>
  <li>Separating interfaces and type definitions that are common
  across multiple specifications into a Portable Group module as
  described in the <a
    href="http://www.omg.org/cgi-bin/doc?ptc/01-11-09">OMG Data
    Parallel Processing specification</a> and the <a
    href="http://www.omg.org/cgi-bin/doc?ptc/01-11-08">Unreliable
    Multicast Inter-ORB Protocol specification</a></li>
  <li>Adding factory registration and a fault detector factory
  interfaces.</li>
  <li>Adding mechanisms for bootstrapping FT CORBA systems.</li>
  <li>Defining protocols for operating between the ORB core and FT
  services.</li>
</ol>

<h3><a name="FTCORBA_Services"></a>FT CORBA Services</h3>

<h4><a name="Replication_Manager"></a>Replication Manager</h4>

<p>
The Replication Manager is perhaps the most visible of FT CORBA's
infrastructure components. Fault tolerant services interact with the
replication manager to create object groups, manage an object group's
properties, control an object group's membership, and so forth. The
Replication Manager is also solely responsible for the creation and
maintenance of Interoperable Object Group References (IOGRs).
According to the FT CORBA specification, the Replication Manager's
operations are defined by three separate interfaces:
</p>

<ul>
  <li><em>Property Manager</em>: Defines operations for setting
  properties, such as replication style, at several levels: by group,
  by type, or by default.</li>
  <li><em>Object Group Manager</em>: Defines operations to add and
  remove members of an object group, change the primary member of an
  object group (for passive replication styles only), specify or get
  the locations of object group members, and get the current object
  group reference and object group identifier.</li>
  <li><em>Generic Factory</em>: Defines operations for creating and
  destroying objects. The replication manager's realization of these
  interfaces effect the creation and destruction of object groups.
  These operations are also realized by application specific object
  factories to create and destroy replicas as object groups are
  created and maintained.</li>
</ul>

<p>
<em>
Note: The Data Parallel (DP) CORBA final adopted specification defines
a new <code>PortableGroup</code> module, including the three
interfaces listed above, to share common interfaces and their
supporting types among DP CORBA, FT CORBA, Load Balancing, and other
specifications. It is identical to a subset of the FT CORBA
specification with a few changes to make it more generic to group
management.  TAO already has an implementation of the PortableGroup
module that we adapted for reuse by the Replication Manager's
implementation for this project.
</em>
</p>

<p>
In addition, the Replication Manager serves the role of a consumer for
fault report events propagated to it via the Fault Notifier, so it
must realize the Structured Push Consumer interface from the
CosNotifyComm module (defined in the <a
  href="http://www.omg.org/cgi-bin/doc?formal/02-08-04">OMG's
  Notification Service specification (formal/02-08-04)</a>.  Our
design provides a <em>Fault Consumer/Analyzer framework</em> and
concrete fault consumer and fault analyzer implementations that are
tailored for use by the Replication Manager.  The classes making up
this framework, and the relationships among them, are shown in the
figure below.
</p>

<center>
<p>
<a name="FT_FaultAnalyzerFramework"></a> <img
src="FT_FaultAnalyzerFramework.jpg" alt="Fault Consumer/Analyzer
Framework" title="Fault Consumer/Analyzer Framework"></img>
</p>
<h4>Figure 1: Fault Consumer/Analyzer Framework</h4>
</center>


<p>
The OCI team also added a <em>Factory Registry</em> interface that is
also realized by the Replication Manager to allow various types of
factories, such as Replica Factories and Fault Detector Factories (all
of which implement the Generic Factory interface) to register with the
Replication Manager.  The Replication Manager then uses these
factories when necessary to add members (replicas) to object groups or
to create a new Fault Detector at a specific location to monitor a
replica.
</p>

<p>
The Replication Manager's interfaces are defined in <a
  href="../../orbsvcs/orbsvcs/FT_ReplicationManager.idl">FT_ReplicationManager.idl</a>.
The Replication Manager also supports interfaces and types from the
PortableGroup module that is defined in <a
  href="../../orbsvcs/orbsvcs/PortableGroup.idl">PortableGroup.idl</a>
and additional interfaces and types from the FT module that is defined
in <a
  href="../../orbsvcs/orbsvcs/FT_CORBA.idl">FT_CORBA.idl</a>.
Source code for the Replication Manager's implementation is found in
the <a
  href="../../orbsvcs/FT_ReplicationManager/">FT_ReplicationManager</a>
directory.
</p>

<h4><a name="Fault_Notifier"></a>Fault Notifier</h4>

<p>
FT CORBA's Fault Notifier is based upon a subset of the <a
  href="http://www.omg.org/cgi-bin/doc?formal/02-08-04">OMG's
  Notification Service specification (formal/02-08-04)</a>. The Fault
Notifier typically gathers fault reports from Fault Detectors as well
as from application- or platform-specific fault detectors.
</p>

<p>
The Fault Notifier can support an arbitrary number of Fault Detectors
and consumers because it is based upon the Notification Service.
Components interested in receiving fault reports assume the role of
push consumer with respect to the Fault Notifier.  The Replication
Manager is one such component, as described above; an application may
provide its own fault analysis capability by connecting an
application-specific fault analyzer as a consumer to the Fault
Notifier.  (In fact, a real-world application will likely participate
intimately in identifying and analyzing faults.  One way this could be
done is to "plug-in" an application-specific fault analyzer to the
Replication Manager, using the <a
  href="#FT_FaultAnalyzerFramework">Fault Consumer/Analyzer
  framework</a> described above.)
</p>

<p>
The Fault Notifier's interfaces are defined in <a
  href="../../orbsvcs/orbsvcs/FT_Notifier.idl">FT_Notifier.idl</a>.
Source code for the Fault Notifier's implementation is found in the <a
  href="../../orbsvcs/Fault_Notifier/">Fault_Notifier</a>
directory.
</p>


<h4><a name="Fault_Detector"></a>Fault Detector</h4>

<p>
The Fault Detector is the basic component in FT CORBA for monitoring a
fault tolerant system's software components, processes, and processing
nodes and reporting faults.  The FT CORBA specification defines a
single monitoring style, the <em>pull</em> monitoring style, in which
a Fault Detector periodically issues a CORBA request
(<code>is_alive</code>) to monitored objects and reports faults for
those objects that fail to respond.  A fault detector that monitors a
single replica may be co-located on the same host as that replica.  If
the replica fails (defined as failure to reply to the detector's
<code>is_alive</code> invocation within a prescribed time-out period),
the detector issues a fault report to the Fault Notifier, which it
finds via the Replication Manager.  In our example application, the
detector then exits since the replica that it was monitoring no longer
exists.  Fault detectors can also be deployed on other nodes and used
to monitor other FT CORBA infrastructure components, such as a Fault
Detector or Fault Detector Factory on another host.  The pull
monitoring style is used to monitor these components as well.
</p>

<h4><a name="Fault_Detector_Factory"></a>Fault Detector Factory</h4>

<p>
Fault Detectors are created and managed by a Fault Detector Factory.
There may be many Fault Detector Factories deployed in a typical fault
tolerant system.  The Fault Detector Factory implements the Generic
Factory interface.
</p>

<p>
Fault Detectors are created in the same process as their Fault
Detector Factory.  Each Fault Detector runs in its own thread and
monitors its replica according to its prescribed monitoring interval
(defined by a property on the object group).  The Fault Detector
Factory owns the thread manager for these threads.  If a replica
member is removed from an object group, the Fault Detector Factory
that "owns" the Fault Detector that is monitoring that replica can
cause the detector to "quit," thereby causing it to clean up any
resources and its thread to exit.
</p>

<p>
Fault Detector Factories register with the Replication Manager via the
Factory Registry interface.  The Replication Manager then uses these
Fault Detector Factories to create new Fault Detectors as needed to
monitor replicas as they are created.
</p>

<p>
The Fault Detector Factory's interfaces are defined in
<a
  href="../../orbsvcs/orbsvcs/FT_FaultDetectorFactory.idl">FT_FaultDetectorFactory.idl</a>.  Source code for the
Fault Detector and Fault Detector Factory implementations is found in
the <a
  href="../../orbsvcs/Fault_Detector/">Fault_Detector</a> directory.
</p>

<h4><a name="Redundancy"></a>Redundancy of FT CORBA Infrastructure Services</h4>

<p>
To achieve fault tolerance, a system must not have a single point of
failure.  This includes not only application services, but
infrastructure services as well.  In the case of this project, the
following FT infrastructure services need to be made fault tolerant
via redundancy:
</p>

<ul>
  <li>Replication Manager</li>
  <li>Fault Notifier</li>
  <li>Fault Detector Factories</li>
  <li>Replica Factories</li>
</ul>

<p>
<em>
One of the initial goals of this project was to provide redundant
implementations of each of these services after first providing basic
non-redundant implementations.  Unfortunately, due to complexities
encountered during implementation and the relatively short time frame
for the project, we did not complete development of redundant versions
of the various FT infrastructure services.  We encourage this to be
given a high priority for any follow-on work.
</em>
</p>

<p>
Note that making the Replication Manager redundant will require direct access to the lower-level state synchronization mechanism (i.e., via a synchronization strategy) while other FT infrastructure services can likely be made fault tolerant using the full range of FT CORBA mechanisms.
</p>


<h3><a name="Sample_FT_Application"></a>Sample FT Application</h3>

<h4><a name="Replica_Factory"></a>Replica Factory</h4>

<p>
A Replica Factory is an application-defined entity that implements the
Generic Factory interface.  There may be many Replica Factories
deployed in a typical fault tolerant application.  Each Replica
Factory acts as an agent for the Replication Manager to create and
manage replica members of object groups of a specific type at a
specific location.  Replica Factories register with the Replication
Manager via the Factory Registry interface.  The Replication Manager
then uses these Replica Factories to create new replicas as needed
when creating object groups or adding new members to existing object
groups.
</p>

<p>
We have provided a sample implementation of a Replica Factory as part
of our example application for this project.  It implements the
Generic Factory interface from the FT module defined in <a
  href="../../orbsvcs/orbsvcs/FT_CORBA.idl">FT_CORBA.idl</a>.
Source code for the example application's Replica Factory is found in
the <a
  href="../../orbsvcs/tests/FT_App/">FT_App</a>
directory.
</p>


<h4><a name="Replica"></a>Replica</h4>

<p>
A Replica is an application object that serves as a member of an
object group.  Each replica implements an application-defined
interface.  In addition, each replica must implement the Pull
Monitorable interface so it can be monitored by a Fault Detector.
Replicas are created by Replica Factories by the Replication Manager
or by another application.  Each new replica is then added to an
object group and managed by the Replication Manager.
</p>

<p>
We have provided a sample implementation of a Replica as part of our
example application for this project.  A Replica must implement the
<code>PullMonitorable</code>, <code>Checkpointable</code>, and
<code>Updateable</code> interfaces, which are defined in <a
  href="../../orbsvcs/orbsvcs/FT_Replica.idl">FT_Replica.idl</a>.
For our example application, a test replica interface is defined in <a
  href="../../orbsvcs/tests/FT_App/FT_TestReplica.idl">FT_App/FT_TestReplica.idl</a>.
The implementation of the test replica is also in the <a
  href="../../orbsvcs/tests/FT_App/">FT_App</a>
directory.
</p>

<h4><a name="Object_Group_Creator"></a>Object Group Creator</h4>

<p>
The Object Group Creator is a utility for creating an object group.
It can be used by an application to create an initial set of objects
in a system.  The Object Group Creator finds the Replication Manager
and uses its Factory Registry interface to get a list of factories it
can use to create objects of the desired type.  The Object Group
Creator can be used in different ways depending upon if the object
group's <code>MembershipStyle</code> property value is
application-controlled membership or infrastructure-controlled
membership.
</p>

<ul>
  <li>Application-controlled membership: If application-controlled
  membership is being used, the Object Group Creator calls the
  Replication Manager to create an empty object group, then calls the
  factories to create members for the group.  Members are added via
  the Replication Manager's <code>add_member</code> operation.</li>

  <li>Infrastructure-controlled membership: If
  infrastructure-controlled membership is being used, and the object
  creator is configured to set factories at the type level, the object
  creator optionally passes the set of factories to the
  <code>set_type_properties</code> operation of the Replication
  Manager, then calls Replication Manager's <code>create_object</code>
  operation to create an object group.</li>
</ul>

<p>
After creating the object group, the Object Group Creator can
optionally write the group's IOGR to a file or bind it in the Naming
Service so it can be accessed by clients.
</p>

<p>
The Object Group Creator can exist as a stand-alone utility or it can
be integrated with an application.  Our example application includes
an implementation of the Object Group Creator in the <a
  href="../../orbsvcs/tests/FT_App/">FT_App</a>
directory.
</p>


<h4><a name="Client"></a>Client</h4>

<p>
A client application obtains the object group reference from a file or
from the Naming Service and invokes operations on it as it would a
normal IOR.  In the <code>SEMI_ACTIVE</code> replication style, only the primary
replica receives and processes each request.  The state
synchronization strategy developed by the ISIS team for this project
is used synchronize state between the primary and backup replicas with
the completion of each request.  If the primary replica fails, the
transparent reinvocation mechanism inherent in the FT ORB (also
developed by the ISIS team) causes the client's failed request to be
automatically reinvoked on a backup replica.  Meanwhile, the fault
detection mechanisms described above are used to notify the
Replication Manager of the fault and the Replication Manager takes the
necessary actions to maintain the object group's integrity.  Our
example application includes a simple client in the <a
  href="../../orbsvcs/tests/FT_App/">FT_App</a>
directory.
</p>

<h4><a name="Prototype_Architecture"></a>Prototype Architecture</h4>

<p>
The figure below shows the architecture of a prototypical FT system
and the relationships among the various FT infrastructure and
application-defined components described above.
</p>

<center>
<p>
<a name="FT_PrototypeArchitecture"></a> <img
src="FT_PrototypeArchitecture.jpg" alt="Architecture of Prototypical FT System" title="Architecture of Prototypical FT System"></img>
</p>
<h4>Figure 2: Architecture of Prototypical FT System</h4>
</center>

<p>
The steps involved in orderly start-up and operation of an FT system
are numbered in Figure 2 and described below:
</p>

<ol>
  <li>Start the Naming Service. (This step is optional as none of the
  FT components actually depends upon the Naming Service.)</li>
  <li>Start the Replication Manager.</li>
  <li>Start the Fault Notifier.</li>
  <li>The Fault Notifier finds the Replication Manager and registers
  with it.</li>
  <li>The Replication Manager connects as a consumer to the Fault
  Notifier.</li>
  <li>Start one or more Fault Detector Factories.</li>
  <li>The Fault Detector Factories register with the Replication Manager's Factory Registry.</li>
  <li>Start one or more Replica Factories.</li>
  <li>The Replica Factories register with the Replication Manager's Factory Registry.</li>
  <li>Start the Object Group Creator.</li>
  <li>(not shown) The Object Group Creator finds the Replication Manager and gets a list of Fault Detector Factories for the Replication Manager's Factory Registry.</li>
  <li>(not shown) The Object Group Creator gets a list of Replica Factories from the Replication Manager's Factory Registry.</li>
  <li>The Object Group Creator creates an object group via the Replication Manager's Generic Factory interface.</li>
  <li>The Object Group Creator creates one or more Replicas via Replica Factories.</li>
  <li>Each Replica Factory creates a Replica.</li>
  <li>The Object Group Creator creates a Fault Detector for each Replica via the Fault Detector Factories.</li>
  <li>Each Fault Detector Factory creates a Fault Detector for a Replica.</li>
  <li>Each Fault Detector finds the Replication Manager and gets the Fault Notifier from the Replication Manager.</li>
  <li>Each Fault Detector connects as a supplier to the Fault Notifier.</li>
  <li>The Object Group Creator adds each Replica as a member to the object group via the Replication Manager's Object Group Manager interface.</li>
  <li>The Replication Manager generates a new IOGR for each added Replica and updates each Replica member of the object group with the new IOGR.</li>
  <li>The Object Group Creator optionally binds the IOGR of the object group with the Naming Service or publishes its IOGR in some other way, such as a file.</li>
  <li>Start a Client.</li>
  <li>The Client optionally resolves the object group by name from the Naming Service or resolves it in some other way, such as from a file or via a corbaloc ObjectURL.</li>
  <li>The Client invokes a request on the object group.  This request is carried out by the primary Replica of the object group.</li>
  <li>Each Fault Detector periodically pings its Replica via the Replica's PullMonitorable interface.</li>
  <li>If a Replica fails, the Fault Detector pushes a structured fault report to the Fault Notifier.</li>
  <li>The Fault Notifier pushes the structured fault report as an event to the Replication Manager's consumer.</li>
  <li>(not shown) The Replication Manager removes the failed member from the object group, selects a new primary for the object group, generates a new IOGR, and updates each Replica member of the object group with the new IOGR.</li>
  <li>(not shown) The Replication Manager may also add new members to the object group if the number of replicas has fallen below the object group's MinimumNumberReplicas property.  When it adds new members, the Replication Manager also generates a new IOGR and updates each Replica member of the object group with the new IOGR.</li>
</ol>


<h3><a name="Propagating_IOGRs"></a>Propagating IOGRs</h3>

<p>
The FT CORBA specification requires the Replication Manager to create
and maintain IOGRs.  It also requires the FT ORB to perform
<em>most-recent IOGR</em> processing, whereby the FT ORB can update a
client using an old IOGR, by means of a <code>LOCATION_FORWARD</code>
reply, with a new IOGR.  However, the specification fails to define a
way for the Replication Manager to propagate revised IOGRs to the FT
ORBs of object group members.  Therefore, the OCI and ISIS teams
agreed upon a simple interface (<code>tao_update_iogr</code>) by which
the Replication Manager can propagate revised IOGRs to the FT ORB for
each member of an object group (e.g., after failure of a primary
replica, selection of a new primary member, and generation of a new
IOGR by the Replication Manager).  While this interface is
TAO-specific, it accomplishes one of our research goals of
investigating formal protocols by which different ORB implementations
of FT CORBA could be made interoperable.  The ISIS team implemented
this interface and the OCI team incorporated its use within the
Replication Manager.
</p>

<h3><a name="IOGR_Creation_Manipulation"></a>IOGR Creation and
  Manipulation</h3>

<p>
To support FT CORBA, the Replication Manager must be able to create
and manipulate IOGRs.  For example, the Replication Manager's
realization of the Generic Factory interface must return an IOGR.
Also, upon receiving a fault report on an object group, the
Replication Manager may need to remove a member, designate a member as
the new primary replica, and generate a new IOGR that can then be
propagated to each member of the object group.
</p>

<p>
For the purposes of this project, we used TAO's existing
IORManipulation library for creating and managing IOGRs.  However, the
IORManipulation library lacked certain features that were needed.  We
worked with the ISIS team to define extensions to the IORManipulation
library to:
</p>

<ul>
  <li>Support the creation of "empty" IORs that have no profiles.</li>
  <li>Support the complete replacement of all the profiles in an
IOR.</li>
  <li>Support the addition of the following tagged components:</li>
  <ul>
    <li>TAG_MULTIPLE_COMPONENTS</li>
    <li>TAG_GROUP</li>
    <li>TAG_FT_PRIMARY</li>
  </ul>
</ul>

<p>
<em>
While the IORManipulation library can be used to create and manipulate
IOGRs, a longer term approach may be to use a specialized
implementation of the Object Reference Template and IORInterceptor
abstractions defined in sections 21.5.3 and 21.5.4 of the <a
  href="http://www.omg.org/cgi-bin/doc?formal/02-12-02">CORBA 3.0
  specification</a>, respectively.
</em>
</p>

<h3><a name="Bootstrapping"></a>Bootstrapping of FT CORBA Infrastructure and Application</h3>

<p>
FT CORBA infrastructure and application components must collaborate to
achieve fault tolerance.  To do so, infrastructure and application
components must be started and initial objects and object groups
created in an orderly fashion.  Much of this "bootstrapping" can be
accomplished by scripting.  However, an entity to control the creation
of initial objects and object groups can greatly simplify certain
aspects of the bootstrapping process.
</p>

<p>
The sample application provided as part of this project uses an Object
Group Creator utility to create initial objects and object groups.
The Object Group Creator is implemented as a library that can be
easily integrated with other parts of an application.  A simple
wrapper is also provided allowing the Object Group Creator to be used
as a stand-alone executable.
</p>

<h3><a name="Future_Work"></a>Future Work</h3>

<p>
During the course of this research, we uncovered several areas for
further research and development, including:
</p>

<ul>
  <li>Adding redundancy to the FT infrastructure services.</li>
  <li>Extending FT support to additional platforms, such as the
  VxWorks RTOS.</li>
  <li>Incorporating FT capabilities into an advanced application,
  such as TAO's RT Notification Service, or Naming
  Service.</li>
  <li> Integrating the <a href="ftrt_ec.html"> FT RT Event Service </a>
  with FT Services. The current implementation of FT RT Event
  Service in TAO is not based on the FT Services implemented
  by OCI because they are developed independently.
  </li>
  <li>Investigating improvements to the mechanisms used to detect
  faults in replicated application objects.</li>
  <li>Investigating advanced fault analysis capabilities and
  mechanisms by which application- or platform-specific fault
  analyzers to be "plugged" into the Replication Manager or other
  components.</li>
  <li>Investigating formal protocols that will allow interoperability
  among different ORB implementations of FT CORBA.</li>
  <li>Refactoring of the Replication Manager and Portable Group
  implementations to consolidate the representation of object
  groups.</li>
  <li>Employing standard mechanisms, such as Object Reference Template
  and IORInterceptors, instead of the TAO-specific IORManipulation
  library, for the creation and manipulation of IOGRs.</li>
  <li>Investigating performance and scalability issues in FT
  systems.</li>
  <li>Enabling the configuration and enforcement of FT quality of
  service (QoS) properties, such as bounds on fault detection and
  recovery times.</li>
</ul>

</body>
</html>