<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
  <head>
    <title>Using the Reliable Notification Service</title>
    <meta content="False" name="vs_snapToGrid">
    <meta content="False" name="vs_showGrid">
    <!-- $Id$ -->
  </head>
  <body>
    <h1>Using the Reliable Notification Service</h1>
    <h2>Background</h2>
    <p>There are two CORBA services defined by the OMG to support the
      Supplier/Consumer design pattern.&nbsp; This pattern allows messages (known as
      Events in this context) to be generated by one or more suppliers and delivered
      to one or more consumers without requiring that the suppliers and consumers
      have any knowledge of each other.&nbsp;</p>
    <P>The Event Service provides a basic implementation of this pattern, and the
      Notification service extends this basic service to support a rich variety of
      optional features.</P>
    <h2>Reliability and Persistence</h2>
    <p>One of the optional features of the Notification service is Reliability.&nbsp;
      By default the Event Service and the Notification service provide only <EM>best-effort</EM>
      support for event delivery.&nbsp; If things go wrong -- program crashes,
      communications failures, etc. -- events may be lost without notice.</p>
    <P>There are some circumstances in which losing events is not
      acceptable.&nbsp; The Notification service may be used for these situations if
      it is configured for reliable operation.&nbsp; Reliable operation is not
      available in the Event Service.&nbsp; Reliable operation means information is
      saved persistently (usually on a disk file) and used to recover from the
      various failures that might otherwise lead to loss of data.</P>
    <P>There are two separate, but related, issues that need to be addressed to
      provide reliable event delivery:&nbsp; topology persistence and event
      persistence.</P>
    <P>To provide topology persistence, sometimes called connection persistence, the
      Notification service must keep track of what clients (Suppliers and Consumers)
      have connected to the Notification service and what options have been specified
      to control the delivery of events.</P>
    <P>To provide event persistence the Notification service tracks each event in
      persistent storage to be sure it is delivered to every consumer that should
      receive it.&nbsp;
    </P>
    <P>There may be situations in which topology persistence is all that is necessary
      -- it&nbsp;may be&nbsp;acceptable to lose events during a failure as long as
      the system is restored to normal operation afterward.&nbsp; Event persistence
      on the other hand can only be supported if topology persistence is also being
      used.&nbsp; It doesn't help to keep track of events if the system is unable to
      find the consumers to which the events should be delivered.</P>
    <P>Two separate issues must be addressed as part of setting up the Notification
      service for reliable operation.&nbsp; At the system administration level the
      Notification&nbsp; service must be configured for topology persistence and
      possibly for event persistence.&nbsp; At the application level,&nbsp;programs
      that operate as consumers and suppliers must set the appropriate parameters to
      enable reliable operation, and must cooperate with the reconnection process
      that occurs during topology recovery.</P>
    <h2>Configuring Notification Service Reliability</h2>
    <h3>Service Configurator Changes</h3>
    <P>Runtime configuration of the Notification Service is supported through the
      service configurator file. This file is normally named svc.conf; however the
      -ORBSvcConf command line option allows an alternate service configuration file
      to be specified.
    </P>
    <P>
      Service configuration changes to support Notification Service reliability
      include a new option on the existing&nbsp; <code>Notify_Default_Event_Manager_Objects_Factory</code>
      service configuration command, and two new service configuration commands.
    </P>
    <H4>Notify_Default_Event_Manager_Objects_Factory option: -AllowReconnect</H4>
    <p>Certain recovery cases require that a Consumer be able to reconnect to an
      existing proxy object in the Notification Service in order to receive all
      events delivered by that proxy object. This behavior is a departure from the
      OMG Specification which mandates that the Notification Service should throw an
      "Already Connected" exception when a consumer attempts to connect to a proxy
      that was previously used by another Consumer.
    </p>
    <p>A new option, -AllowReconnect,&nbsp;is available for the existing <code>Notify_Default_Event_Manager_Objects_Factory
      </code>command to support this requirement. As an example of its use, the
      following line configures the Notification Service for multi-threaded operation
      supporting reconnection.</p>
    <code>static Notify_Default_Event_Manager_Objects_Factory "-DispatchingThreads 2
      -SourceThreads 2 -AllowReconnect" </code>
    <H3>Configuring Connection (Topology) Reliability</H3>
    <p>The support for persistent topology is actually a configurable strategy.&nbsp;
      TAO includes an XML Topology Persistence Strategy that uses an XML file for
      persistent storage, but it is designed to allow other strategies to be
      developed.&nbsp; For example, if topology information should be stored in a
      relational database, it is possible to develop a persistent topology
      strategy to do so.&nbsp; The details of doing this are beyond the scope of this
      document.
    </p>
    <P>This document describes how to configure the XML topology persistence included
      with TAO.</P>
    <P>An example of the&nbsp;service configuration command to&nbsp;configure the XML
      strategy is:
    </P>
    <p><code>dynamic Topology_Factory Service_Object*
        TAO_CosNotification_Persist:_make_XML_Topology_Factory() "-base_path ./reconnect_test" </code>
    </p>
    <p>The first part of this line, <code>dynamic Topology_Factory Service_Object*
        TAO_CosNotification_Persist:_make_XML_Topology_Factory()</code>, should be given
      exactly as shown. For details on this syntax, see chapter 17 of the TAO
      Developer's Guide.
    </p>
    <P>The quoted string at the end of the line contains arguments for the configured
      strategy. The arguments recognized by the XML topology strategy implemented in
      this project are:
    </P>
    <ul>
      <li>
      -v
      <li>
        -base_path <EM>file_path</EM>
      <li>
        -backup_count&nbsp;<EM>count</EM>
      <li>
        -save_base_path <EM>file_path</EM>
      <li>
        -load_base_path <EM>file_path</EM>
      <li>
        -no_timestamp
      </li>
    </ul>
    <H4>Topology_Factory Option: -v</H4>
    <P>To help diagnose and/or document svc.conf settings, the -v option causes the
      options for the Topology_Factory to be displayed as they are interpreted.</P>
    <H4>Topology_Factory Option: -base_path file_path
    </H4>
    <P>The argument for this option is a fully qualified path name, without an
      extension, for the XML file in which topology information is saved. Three
      extensions will be appended to this name: .new, .xml, and .000.
    </P>
    <P>Saved topology information will be written to <EM>file_path</EM>.new file.
      Information with a .new extension is not necessarily complete and will not be
      used to restore the topology.
    </P>
    <P>When the .new file is complete, the previous <EM>file_path</EM>.000 (if any)
      will be deleted, the previous <EM>file_path</EM>.xml (if any) will be renamed
      as <EM>file_path</EM>.000 and the <EM>file_path</EM>.new file will be renamed
      as <EM>file_path</EM>.xml. The assumption is that a file system rename operation is
      atomic. If this assumption holds then at any time the file <EM>file_path</EM>.xml
      (if it exists) contains the most recent complete save. If <EM>file_path</EM>.xml
      does not exist then <EM>file_path</EM>.000 contains the most recent complete
      save. If neither of these files exists the saved topology information is not
      available.
    </P>
    <H4>Topology_Factory Option: -backup_count count</H4>
    <P>This option modifies the behavior described in the preceding section to allow
      additional backup copies of the topology file to be retained. The default
      value, 1, means that only the <EM>file_path</EM>.000 file will be kept. If a
      higher number is specified, then older versions will be kept. Rather than
      deleting <EM>file_path</EM>.000, the system will rename it to <EM>file_path</EM>.001.&nbsp;
      Older versions will be named <EM>file_path</EM>.002, <EM>file_path</EM>.003 and
      so on.
    </P>
    <P>Under normal circumstances only one backup file is required -- in fact these
      additional backup files will not be used to restore the topology.&nbsp; However
      setting this number to a larger value lets the system keep a brief history of
      topology changes. Since the XML files are roughly human-readable this can be
      used as a diagnostic tool for problems related to Notification Service
      topology.
    </P>
    <H4>Topology_Factory Options: -save_base_path file_path and -load_base_path
      file_path
    </H4>
    <P>These options are alternatives to the -base_path option. They allow the file
      from which topology information is loaded at Notification Service startup time
      to be different from the file to which this information is saved as the system
      runs.
    </P>
    <P>These options are mostly used for developer testing, but a system administrator may
      find other uses for them -- possibly involving script files that
      rename the XML files during recovery from a Notification Service failure.
    </P>
    <H4>Topology_Factory Option: -no_timestamp</H4>
    <P>The XML files include a timestamp to indicate when the information was saved.
      The timestamp is for information only and is not needed for correct functioning
      of the topology persistence. This option suppresses that timestamp. Doing so
      makes it possible to compare XML files using a program like diff to see if the
      files represent the same topology.
    </P>
    <P>This option is intended primarily for testing the persistent topology
      implementation.
    </P>
    <h3>Configuring Event Reliability</h3>
    <p>A new service configuration object, "Event_Persistence", can be configured in
      the service configuration file to enable and configure Event Reliability.
      An example of the line needed to configure the Event_Persistence object is:
    </p>
    <p><code>dynamic Event_Persistence Service_Object*
        TAO_CosNotification_Persist:_make_Standard_Event_Persistence() "-v -file_path
        ./event_persist.db" </code>
    </p>
    <p>If this line does not appear in svc.conf, then event reliability
      will not be supported. QoS parameters for reliable event delivery will be
      silently ignored when Event Reliability is not configured. Event reliability
      also requires topology reliability, so if this line appears there must also be
      a "Topology_Factory" line in the file. If not, the Notification Service will
      fail to start up.
    </p>
    <P>The beginning of this line, up to and including the parentheses, should appear
      exactly as shown. For details on this syntax, see chapter 17 of the TAO
      Developer's Guide. The quoted string at the end of the line contains options
      for Event_Persistence.
    </P>
    <h4>Event_Persistence Option: -v</h4>
    <p>This option and any option that appears after this option will be written to
      the log (normally the console) as it is processed. This is intended to help
      diagnose and document the Event Persistence settings. The default is to
      configure Event Persistence silently.
    </p>
    <h4>Event_Persistence Option: -file_path path
    </h4>
    <p>This option gives the fully qualified name of the file in which
      persistent event information will be stored. The file should be located on a
      reliable device that supports synchronized writes (i.e. writes that flush the
      operating system's write cache). A device that is suitable for storing a reliable
      database would be appropriate for storing this file. The file will be subject
      to a relatively high number of small (single block) write requests, but very
      few, if any, read requests. If the file does not exist, then a new file will be
      created. If the file does exist, and if topology is successfully loaded, the
      events from this file will be reloaded and redelivered automatically. This is a
      required option. There is no default value.
    </p>
    <h4>Event_Persistence Option: -block_size n
    </h4>
    <p>This option gives the block size in bytes for the device on which the event
      reliability file is stored. For both performance and reliability reasons it is
      important that the value matches the physical characteristics of the device.
      The default value is 512.
    </p>
    <h2>Application Programming Changes to Support Reliability</h2>
    <p>
    When it is configured as described above, the Notification service
    supports reliable connectivity and/or event delivery.&nbsp;
    Actually achieving such reliability, however, requires cooperation from the
    Notification service clients (Suppliers and Consumers).
    <P>
    There are a number of failure possibilities and different recovery techniques
    are needed to handle them.&nbsp; The simplest case is when a client
    fails&nbsp;and is restarted.&nbsp;
    <P>
    The Notification service will have maintained the connection points (Supplier
    and Consumer Admins, Proxy Consumers, Proxy Suppliers, etc.).&nbsp; As each of these
    connections was established, an ID was returned by the Notification
    service.&nbsp; An application that wishes to be reconnected after a failure
    should save a persistent copy of these IDs.&nbsp; For example, it could write
    the IDs to a file, then read them back from the file after restarting.&nbsp;
    Using these IDs the application can reconnect to the existing connection
    points in the Notification service.&nbsp; The reconnection to the Proxy objects
    will only work if the Notification service has been configured with the
    -AllowReconnect option described above, but otherwise this process is fairly
    straightforward.
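    <P>
    A minimal sketch of the "write the IDs to a file, then read them back"
    step is shown here in standard C++.&nbsp; The structure
    <code>SavedConnection</code>, its field names, and the one-line text file
    format are all assumptions made for illustration; they are not part of any
    TAO interface.&nbsp; After loading the IDs, a client would presumably pass them
    to the lookup operations of the factory, channel and admin objects (for
    example <code>get_event_channel</code>) instead of creating new objects.

```cpp
#include <fstream>
#include <optional>
#include <string>

// IDs handed back by the Notification service while the connection was
// being built. Field names are illustrative only.
struct SavedConnection {
  long channel_id = 0;
  long admin_id = 0;
  long proxy_id = 0;
};

// Write the IDs to a small text file so they survive a client restart.
bool save_ids(const std::string& file, const SavedConnection& c)
{
  std::ofstream out(file, std::ios::trunc);
  out << c.channel_id << ' ' << c.admin_id << ' ' << c.proxy_id << '\n';
  out.flush();
  return static_cast<bool>(out);
}

// Read the IDs back. An empty result means there is no usable previous
// session, so the client should create fresh connections instead.
std::optional<SavedConnection> load_ids(const std::string& file)
{
  std::ifstream in(file);
  SavedConnection c;
  if (in >> c.channel_id >> c.admin_id >> c.proxy_id) return c;
  return std::nullopt;
}
```

    <P>
    A client would call <code>save_ids</code> once its connections are
    established and <code>load_ids</code> at startup to decide between
    reconnecting and connecting from scratch.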
    <P>
    As soon as a supplier has reconnected, it can resume sending events.&nbsp; As
    soon as a consumer has reconnected, persistent events (if any) and new events
    will start to arrive.
    <P>
    Notice that the identity of a consumer or supplier is determined by these saved
    IDs.&nbsp; This is true even if the restarted client is running on a completely
    different machine from the original client.
    <P>
      The case of the Notification service itself failing then being restarted on the
      same or a different machine is somewhat more complicated.&nbsp; The
      Notification service wasn't designed to initiate a connection to a
      client.&nbsp; It must wait for the client to reconnect before it can start
      accepting or delivering events.&nbsp; The difficulty is in having the client
      know when to initiate the reconnection, and where the Notification service
      is now running in case it was necessary to move it to a new machine due to the
      failure.
      <H3>Reconnection Registry</H3>
    <p>The reconnection registry provides an answer to the question of how the client
      knows where and when to reconnect to the Notification&nbsp; service.&nbsp; This
      TAO-specific interface is implemented by the EventChannelFactory in the
      reliable Notification Service.&nbsp; Clients can narrow the EventChannelFactory
      object reference to a Reconnection Registry interface, then register a
      Reconnection Callback object that will be notified when the Notification
      service has restarted and is ready for reconnections.&nbsp; The
      EventChannelFactory passes its own object reference to the Reconnection
      Callback object to inform the client where the Notification service is now
      running.</p>
    <P>The interfaces involved are defined in the NotifyExt.idl file (in
      $TAO_ROOT/orbsvcs/orbsvcs) and are shown here:</P>
  <pre>
  /** 
   * \brief An interface which gets registered with a ReconnectionRegistry.
   * 
   * A supplier or consumer must implement this interface in order to 
   * allow the Notification Service to attempt to reconnect to it after 
   * a failure.  The supplier or consumer must register its instance of 
   * this interface with the ReconnectionRegistry.
   */ 
  interface ReconnectionCallback
  { 
    /// Perform operations to reconnect to the Notification Service 
    /// after a failure.
    void reconnect (in Object new_connection); 

    /// Check to see if the ReconnectionCallback is alive
    boolean is_alive ();
  };

  /**
   * \brief An interface that handles registration of suppliers and consumers.
   *
   * This registry should be implemented by an EventChannelFactory and
   * will call the appropriate reconnect methods for all ReconnectionCallback
   * objects registered with it.
   */
  interface ReconnectionRegistry
  {
    typedef unsigned long ReconnectionID;
    ReconnectionID register_callback(in ReconnectionCallback reconection);

    void unregister_callback (in ReconnectionID id);

    /// Check to see if the ReconnectionRegistry is alive
    boolean is_alive ();
  };
  </pre>
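    <P>To make the register / notify / unregister flow concrete, here is a toy,
      in-process model of the registry written in standard C++. It is not the
      TAO implementation and uses no CORBA types: the class
      <code>ToyReconnectionRegistry</code>, the use of a string for the new
      location, and the <code>announce_restart</code> method are all
      hypothetical stand-ins, and the <code>is_alive</code> operations are
      omitted.</P>

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Toy stand-in for the registry; models only the calling pattern.
class ToyReconnectionRegistry {
public:
  using ReconnectionID = unsigned long;
  // Plays the role of ReconnectionCallback::reconnect().
  using Callback = std::function<void(const std::string& new_location)>;

  ReconnectionID register_callback(Callback cb)
  {
    const ReconnectionID id = next_id_++;
    callbacks_[id] = std::move(cb);
    return id;
  }

  void unregister_callback(ReconnectionID id) { callbacks_.erase(id); }

  // Invoked once after a restart: tell every registered client where the
  // service now lives. Returns the number of clients notified.
  std::size_t announce_restart(const std::string& new_location) const
  {
    for (const auto& entry : callbacks_) entry.second(new_location);
    return callbacks_.size();
  }

private:
  ReconnectionID next_id_ = 1;
  std::map<ReconnectionID, Callback> callbacks_;
};
```

    <P>In a real client the callback body would use the object reference it
      receives, together with its saved IDs, to reconnect to its existing
      proxy.</P>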
    <H3>Using&nbsp;Event Reliability</H3>
    <P>Configuring the Notification service for reliable event delivery is necessary,
      but not sufficient to enable reliable handling of events.&nbsp; The application
      code in either the client or the server must configure the EventChannel through
      which the events are delivered to operate in the reliable mode.&nbsp; This is
      done by setting the QoSProperties named "ConnectionReliability" and
      "EventReliability" to the value "persistent" -- either at the time the channel
      is created or at a later time using the set_qos method.</P>
    <P>Once a channel has been configured for reliable operation, persistence can be
      disabled on an event by event basis using QoSProperties of the event
      itself.&nbsp; This could be done, for example, to avoid the overhead of
      persistently storing events for which reliability is not needed.</P>
    <P>The supplier sends events to the EventChannel using a push() method.&nbsp; For
      persistent events, this call will not return to the supplier until the
      Notification service is prepared to guarantee event delivery.&nbsp;
    </P>
    <P>Application code in the Consumer should be written with the knowledge that
      events are guaranteed to be delivered, but during recovery from a failure there
      is a possibility that an event may arrive more than once.&nbsp; This could
      happen, for example, if the event was in the process of being delivered at the
      time the failure occurred and the failure prevents the Notification service from
      determining if the delivery completed successfully.&nbsp; To meet its
      commitment that every event will be delivered, the Notification service will
      retry the delivery in this case, which may result in a duplicate event.</P>
    <P>As long as this situation is understood at the time the application is
      designed, it should be possible for the application to handle this situation.</P>
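    <P>One common way to tolerate duplicates is to make the consumer idempotent.
      The following standard C++ sketch assumes that each event carries an
      application-defined unique identifier; that identifier, and the class
      name <code>DedupConsumer</code>, are assumptions made for this example
      and are not something this document says the Notification service
      provides.</P>

```cpp
#include <cstdint>
#include <functional>
#include <unordered_set>
#include <utility>

// Wraps an event handler so that a redelivered event is processed once.
// Assumes every event has an application-defined unique ID.
class DedupConsumer {
public:
  explicit DedupConsumer(std::function<void(std::uint64_t)> handler)
    : handler_(std::move(handler)) {}

  // Returns true if the event was handled, false if it was a duplicate.
  bool push(std::uint64_t event_id)
  {
    if (!seen_.insert(event_id).second) return false;  // already seen
    handler_(event_id);
    return true;
  }

private:
  std::function<void(std::uint64_t)> handler_;
  std::unordered_set<std::uint64_t> seen_;  // in memory only, grows unbounded
};
```

    <P>The set of seen IDs in this sketch lives only in memory. A consumer that
      must itself survive restarts would need to persist it, or bound it, in
      whatever way suits the application.</P>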
  </body>
</html>