<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Using the Reliable Notification Service</title>
<meta content="False" name="vs_snapToGrid">
<meta content="False" name="vs_showGrid">
<!-- $Id$ -->
</head>
<body>
<h1>Using the Reliable Notification Service</h1>
<h2>Background</h2>
<p>There are two CORBA services defined by the OMG to support the
Supplier/Consumer design pattern. This pattern allows messages (known as
Events in this context) to be generated by one or more suppliers and delivered
to one or more consumers without requiring that the suppliers and consumers
have any knowledge of each other. </p>
<P>The Event Service provides a basic implementation of this pattern, and the
Notification service extends this basic service to support a rich variety of
optional features.</P>
<h2>Reliability and Persistence</h2>
<p>One of the optional features of the Notification service is Reliability.
By default the Event Service and the Notification service provide only <EM>best-effort</EM>
support for event delivery. If things go wrong -- program crashes,
communications failures, etc. -- events may be lost without notice.</p>
<P>There are some circumstances in which losing events is not
acceptable. The Notification service may be used for these situations if
it is configured for reliable operation. Reliable operation is not
available in the Event Service. Reliable operation means information is
saved persistently (usually on a disk file) and used to recover from the
various failures that might otherwise lead to loss of data.</P>
<P>There are two separate, but related, issues that need to be addressed to
provide reliable event delivery: topology persistence and event
persistence.</P>
<P>To provide topology persistence, sometimes called connection persistence, the
Notification service must keep track of what clients (Suppliers and Consumers)
have connected to the Notification service and what options have been specified
to control the delivery of events.</P>
<P>To provide event persistence the Notification service tracks each event in
persistent storage to be sure it is delivered to every consumer that should
receive it.
</P>
<P>There may be situations in which topology persistence is all that is necessary
-- it may be acceptable to lose events during a failure as long as
the system is restored to normal operation afterward. Event persistence
on the other hand can only be supported if topology persistence is also being
used. It doesn't help to keep track of events if the system is unable to
find the consumers to which the events should be delivered.</P>
<P>Two separate issues must be addressed as part of setting up the Notification
service for reliable operation. At the system administration level the
Notification service must be configured for topology persistence and
possibly for event persistence. At the application level, programs
that operate as consumers and suppliers must set the appropriate parameters to
enable reliable operation, and must cooperate with the reconnection process
that occurs during topology recovery.</P>
<h2>Configuring Notification Service Reliability</h2>
<h3>Service Configurator Changes</h3>
<P>Runtime configuration of the Notification Service is supported through the
service configurator file. This file is normally named svc.conf; however the
-ORBSvcConf command line option allows an alternate service configuration file
to be specified.
</P>
<P>
Service configuration changes to support Notification Service reliability
include a new option on the existing <code>Notify_Default_Event_Manager_Objects_Factory</code>
service configuration command, and two new service configuration commands.
</P>
<H4>Notify_Default_Event_Manager_Objects_Factory option: -AllowReconnect</H4>
<p>Certain recovery cases require that a Consumer be able to reconnect to an
existing proxy object in the Notification Service in order to receive all
events delivered by that proxy object. This behavior is a departure from the
OMG Specification which mandates that the Notification Service should throw an
"Already Connected" exception when a consumer attempts to connect to a proxy
that was previously used by another Consumer.
</p>
<p>A new option, -AllowReconnect, is available for the existing <code>Notify_Default_Event_Manager_Objects_Factory
</code>command to support this requirement. As an example of its use, the
following line configures the Notification Service for multi-threaded operation
supporting reconnection.</p>
<code>static Notify_Default_Event_Manager_Objects_Factory "-DispatchingThreads 2
-SourceThreads 2 -AllowReconnect" </code>
<H3>Configuring Connection (Topology) Reliability</H3>
<p>The support for persistent topology is actually a configurable strategy.
TAO includes an XML Topology Persistence Strategy that uses an XML file for
persistent storage, but it is designed to allow other strategies to be
developed. For example if topology information should be stored in a
relational database file, it is possible to develop a persistent topology
strategy to do so. The details of doing this are beyond the scope of this
document.
</p>
<P>This document describes how to configure the XML topology persistence included
with TAO.</P>
<P>An example of the service configuration command to configure the XML
strategy is:
</P>
<p><code>dynamic Topology_Factory Service_Object*
TAO_CosNotification_Persist:_make_XML_Topology_Factory() "-base_path ./reconnect_test" </code>
</p>
<p>The first part of this line: <code>dynamic Topology_Factory Service_Object*
TAO_CosNotification_Persist:_make_XML_Topology_Factory()</code> should be given
exactly as shown. For details on this syntax, see chapter 17 of the TAO
Developer's Guide.
</p>
<P>The quoted string at the end of the line contains arguments for the configured
strategy. The arguments recognized by the XML topology strategy implemented in
this project are:
</P>
<ul>
<li>
-v
<li>
-base_path <EM>file_path</EM>
<li>
-backup_count <EM>count</EM>
<li>
-save_base_path <EM>file_path</EM>
<li>
-load_base_path <EM>file_path</EM>
<li>
-no_timestamp
</li>
</ul>
<H4>Topology_Factory Option: -v</H4>
<P>To help diagnose and/or document svc.conf settings, the "-v" option causes the
options for the Topology_Factory to be displayed as they are interpreted.</P>
<H4>Topology_Factory Option: -base_path file_path
</H4>
<P>The argument for this option is a fully qualified path name, without an
extension, for the XML file in which topology information is saved. Three
extensions will be appended to this name: .new, .xml, and .000.
</P>
<P>Saved topology information will be written to the <EM>file_path</EM>.new file.
Information with a .new extension is not necessarily complete and will not be
used to restore the topology.
</P>
<P>When the .new file is complete, the previous <EM>file_path</EM>.000 (if any)
will be deleted, the previous <EM>file_path</EM>.xml (if any) will be renamed
as <EM>file_path</EM>.000, and the <EM>file_path</EM>.new file will be renamed
as <EM>file_path</EM>.xml. The assumption is that a file system rename operation is
atomic. If this assumption holds, then at any time the file <EM>file_path</EM>.xml
(if it exists) contains the most recent complete save. If <EM>file_path</EM>.xml
does not exist, then <EM>file_path</EM>.000 contains the most recent complete
save. If neither of these files exists, the saved topology information is not
available.
</P>
<H4>Topology_Factory Option: -backup_count count</H4>
<P>This option modifies the behavior described in the preceding section to allow
additional backup copies of the topology file to be retained. The default
value, 1, means that only the <EM>file_path</EM>.000 file will be kept. If a
higher number is specified, then older versions will be kept. Rather than
deleting <EM>file_path</EM>.000, the system will rename it to <EM>file_path</EM>.001.
Older versions will be named <EM>file_path</EM>.002, <EM>file_path</EM>.003, and
so on.
</P>
<P>Under normal circumstances only one backup file is required -- in fact these
additional backup files will not be used to restore the topology. However,
setting this number to a larger value lets the system keep a brief history of
topology changes. Since the XML files are roughly human-readable this can be
used as a diagnostic tool for problems related to Notification Service
topology.
</P>
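<P>The save and restore sequence described for -base_path and -backup_count can
be sketched in a few lines of C++. This is only an illustration of the file
rotation as the text describes it, not the code used by the XML topology
strategy; the function names <code>save_topology</code>, <code>latest_complete_save</code>
and <code>backup_name</code> are hypothetical, and the sketch assumes C++17
and a backup count of at least 1.</P>

```cpp
#include <cassert>
#include <cstdio>
#include <filesystem>
#include <fstream>
#include <optional>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: build "<base>.000", "<base>.001", and so on.
static fs::path backup_name(const std::string& base, unsigned index)
{
  char suffix[16];
  std::snprintf(suffix, sizeof suffix, ".%03u", index);
  return fs::path(base + suffix);
}

// Hypothetical sketch of one save: write <base>.new, then rotate the files.
void save_topology(const std::string& base, const std::string& xml_text,
                   unsigned backup_count = 1)
{
  assert(backup_count >= 1);  // assumption: at least <base>.000 is kept
  const fs::path fresh = base + ".new";
  const fs::path current = base + ".xml";

  // 1. Write the complete document to <base>.new (never used for restore).
  {
    std::ofstream out(fresh, std::ios::trunc);
    out << xml_text;
  }  // stream is closed here, before any rename

  // 2. Delete the oldest backup, then shift the remaining backups up by one.
  fs::remove(backup_name(base, backup_count - 1));
  for (unsigned i = backup_count - 1; i > 0; --i) {
    const fs::path from = backup_name(base, i - 1);
    if (fs::exists(from)) fs::rename(from, backup_name(base, i));
  }

  // 3. The previous complete save becomes <base>.000.
  if (fs::exists(current)) fs::rename(current, backup_name(base, 0));

  // 4. The new save becomes <base>.xml.
  fs::rename(fresh, current);
}

// Hypothetical sketch of restore: prefer <base>.xml, fall back to <base>.000.
std::optional<fs::path> latest_complete_save(const std::string& base)
{
  const fs::path current = base + ".xml";
  const fs::path backup = backup_name(base, 0);
  if (fs::exists(current)) return current;
  if (fs::exists(backup)) return backup;
  return std::nullopt;  // no saved topology is available
}
```

<P>Calling <code>save_topology(base, text, 2)</code> repeatedly leaves at most
<EM>base</EM>.xml, <EM>base</EM>.000 and <EM>base</EM>.001 on disk. Only the
first two are ever candidates for a restore, which matches the statement above
that the additional backups serve as history rather than as recovery data.</P>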
<H4>Topology_Factory Options: -save_base_path file_path and -load_base_path
file_path
</H4>
<P>These options are alternatives to the -base_path option. They allow the file
from which topology information is loaded at Notification Service startup time
to be different from the file to which this information is saved as the system
runs.
</P>
<P>These options are mostly used for developer testing, but a system administrator may
find an interesting use for them -- possibly involving script files that
rename the XML files during recovery from a Notification Service failure.
</P>
<H4>Topology_Factory Option: -no_timestamp</H4>
<P>The XML files include a timestamp to indicate when the information was saved.
The timestamp is for information only and is not needed for correct functioning
of the topology persistence. This option suppresses that timestamp. Doing so
makes it possible to compare XML files using a program like diff to see if the
files represent the same topology.
</P>
<P>This option is intended primarily for testing the persistent topology
implementation.
</P>
<h3>Configuring Event Reliability</h3>
<p>A new service configuration object, "Event_Persistence", can be configured in
the service configuration file to enable and configure Event Reliability.
An example of the line needed to configure the Event_Persistence object is:
</p>
<p><code>dynamic Event_Persistence Service_Object*
TAO_CosNotification_Persist:_make_Standard_Event_Persistence() "-v -file_path
./event_persist.db" </code>
</p>
<p>If this line does not appear in svc.conf, then event reliability
will not be supported. QoS parameters for reliable event delivery will be
silently ignored when Event Reliability is not configured. Event reliability
also requires topology reliability, so if this line appears there must also be
a "Topology_Factory" line in the file. If not, the Notification Service will
fail to start up.
</p>
<P>The beginning of this line, up to and including the parentheses, should appear
exactly as shown. For details on this syntax, see chapter 17 of the TAO
Developer's Guide. The quoted string at the end of the line contains options
for Event_Persistence.
</P>
<h4>Event_Persistence Option: -v</h4>
<p>This option and any option that appears after this option will be written to
the log (normally the console) as it is processed. This is intended to help
diagnose and document the Event Persistence settings. The default is to
configure Event Persistence silently.
</p>
<h4>Event_Persistence Option: -file_path path
</h4>
<p>This option gives the completely qualified name for the file in which
persistent event information will be stored. The file should be configured on a
reliable device that supports synchronized writes (i.e. flushing the operating
system's write cache.) A device that is suitable for storing a reliable
database would be appropriate for storing this file. The file will be subject
to a relatively high number of small (single block) write requests, but very
few, if any, read requests. If the file does not exist, then a new file will be
created. If the file does exist, and if topology is successfully loaded, the
events from this file will be reloaded and redelivered automatically. This is a
required option. There is no default value.
</p>
<h4>Event_Persistence Option: -block_size n
</h4>
<p>This option gives the block size in bytes for the device on which the event
reliability file is stored. For both performance and reliability reasons it is
important that the value matches the physical characteristics of the device.
The default value is 512.
</p>
<h2>Application Programming Changes to Support Reliability</h2>
<p>
When it is configured as described above, the Notification service
supports reliable connectivity and/or event delivery.
Actually achieving such reliability, however, requires cooperation from the
Notification service clients (Suppliers and Consumers).
<P>
There are a number of failure possibilities and different recovery techniques
are needed to handle them. The simplest case is when a client
fails and is restarted.
<P>
The Notification service will have maintained the connection points (Supplier
and Consumer Admins, Proxy Consumers, Proxy Suppliers, etc.) As each of these
connections was established, an ID was returned by the Notification
service. An application that wishes to be reconnected after a failure
should save a persistent copy of these IDs. For example, it could write
the IDs to a file, then read them back from the file after restarting.
Using these IDs the application can reconnect to the existing connection
points in the Notification service. The reconnection to the Proxy objects
will only work if the Notification service has been configured with the
-AllowReconnect option described above, but otherwise this process is fairly
straightforward.
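<P>
A minimal sketch of saving and reloading the IDs follows. It assumes the
client holds one channel ID, one admin ID and one proxy ID, each of which fits
in a 32-bit integer. The <code>SavedIds</code> structure, the functions
<code>save_ids</code> and <code>load_ids</code>, and the one-line text file
format are hypothetical choices made for illustration. The CORBA calls that
obtain the IDs and that look up the existing objects from them are deliberately
left out.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <optional>
#include <string>

// Hypothetical record of the IDs a client received while connecting.
struct SavedIds {
  std::int32_t channel_id;
  std::int32_t admin_id;
  std::int32_t proxy_id;
};

// Write the IDs as one line of text. Returns false if the write failed.
bool save_ids(const std::string& path, const SavedIds& ids)
{
  assert(!path.empty());
  std::ofstream out(path, std::ios::trunc);
  out << ids.channel_id << ' ' << ids.admin_id << ' ' << ids.proxy_id << '\n';
  out.flush();
  return static_cast<bool>(out);
}

// Read the IDs back after a restart. An empty result means there is no
// usable file, so the client should make a fresh connection instead.
std::optional<SavedIds> load_ids(const std::string& path)
{
  std::ifstream in(path);
  SavedIds ids{};
  if (in >> ids.channel_id >> ids.admin_id >> ids.proxy_id) return ids;
  return std::nullopt;
}
```

<P>
A restarted client would call <code>load_ids</code> first. If it returns a
value, the client reconnects using those IDs; otherwise it connects as a new
client and calls <code>save_ids</code> once the IDs are known.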
<P>
As soon as a supplier has reconnected, it can resume sending events. As
soon as a consumer has reconnected, persistent events (if any) and new events
will start to arrive.
<P>
Notice that the identity of a consumer or supplier is determined by these saved
IDs. This is true even if the restarted client is running on a completely
different machine from the original client.
<P>
The case of the Notification service itself failing and then being restarted on the
same or a different machine is somewhat more complicated. The
Notification service wasn't designed to initiate a connection to a
client. It must wait for the client to reconnect before it can start
accepting or delivering events. The difficulty is in having the client
know when to initiate the reconnection, and where the Notification service
is running in case it was necessary to move it to a new machine due to the
failure.
<H3>Reconnection Registry</H3>
<p>The reconnection registry provides an answer to the question of how the client
knows where and when to reconnect to the Notification service. This
TAO-specific interface is implemented by the EventChannelFactory in the
reliable Notification Service. Clients can narrow the EventChannelFactory
object reference to a Reconnection Registry interface, then register a
Reconnection Callback object that will be notified when the Notification
service has restarted and is ready for reconnections. The
EventChannelFactory passes its own object reference to the Reconnection
Callback object to inform the client where the Notification service is now
running.</p>
<P>The interfaces involved are defined in the NotifyExt.idl file (in
$TAO_ROOT/orbsvcs/orbsvcs) and are shown here:</P>
<pre>
/**
* \brief An interface which gets registered with a ReconnectionRegistry.
*
* A supplier or consumer must implement this interface in order to
* allow the Notification Service to attempt to reconnect to it after
* a failure. The supplier or consumer must register its instance of
* this interface with the ReconnectionRegistry.
*/
interface ReconnectionCallback
{
/// Perform operations to reconnect to the Notification Service
/// after a failure.
void reconnect (in Object new_connection);
/// Check to see if the ReconnectionCallback is alive
boolean is_alive ();
};
/**
* \brief An interface that handles registration of suppliers and consumers.
*
* This registry should be implemented by an EventChannelFactory and
* will call the appropriate reconnect methods for all ReconnectionCallback
* objects registered with it.
*/
interface ReconnectionRegistry
{
typedef unsigned long ReconnectionID;
ReconnectionID register_callback(in ReconnectionCallback reconection);
void unregister_callback (in ReconnectionID id);
/// Check to see if the ReconnectionRegistry is alive
boolean is_alive ();
};
</pre>
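<P>The interaction between the two interfaces can be illustrated with plain C++
stand-ins. The sketch below does not use CORBA or the generated TAO stubs: the
classes only mirror the IDL operation names, the object reference is replaced
by a string, and <code>announce_restart</code> and <code>RecordingClient</code>
are hypothetical names. Whether the real service checks <code>is_alive</code>
before calling <code>reconnect</code> is an assumption of this sketch.</P>

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Stand-in for the ReconnectionCallback interface (a client implements it).
struct ReconnectionCallback {
  virtual ~ReconnectionCallback() = default;
  virtual void reconnect(const std::string& new_connection) = 0;
  virtual bool is_alive() const = 0;
};

// Stand-in for the ReconnectionRegistry implemented by the factory.
class ReconnectionRegistry {
public:
  using ReconnectionID = std::uint32_t;

  ReconnectionID register_callback(ReconnectionCallback* callback)
  {
    assert(callback != nullptr);
    const ReconnectionID id = next_id_++;
    callbacks_[id] = callback;
    return id;
  }

  void unregister_callback(ReconnectionID id) { callbacks_.erase(id); }

  // Hypothetical: what a restarted service does once it is ready again.
  void announce_restart(const std::string& new_connection)
  {
    for (auto& entry : callbacks_) {
      ReconnectionCallback* callback = entry.second;
      if (callback->is_alive()) callback->reconnect(new_connection);
    }
  }

private:
  ReconnectionID next_id_ = 0;
  std::map<ReconnectionID, ReconnectionCallback*> callbacks_;
};

// Hypothetical client that records where it was told to reconnect.
struct RecordingClient : ReconnectionCallback {
  std::string last_connection;
  int reconnects = 0;

  void reconnect(const std::string& new_connection) override
  {
    last_connection = new_connection;
    ++reconnects;
  }
  bool is_alive() const override { return true; }
};
```

<P>In a real client the body of <code>reconnect</code> would use the object
reference it receives, together with the saved IDs, to find its existing
connection points in the restarted Notification service.</P>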
<H3>Using Event Reliability</H3>
<P>Configuring the Notification service for reliable event delivery is necessary,
but not sufficient, to enable reliable handling of events. The application
code in either the client or the server must configure the EventChannel through
which the events are delivered to operate in the reliable mode. This is
done by setting the QoSProperties named "ConnectionReliability" and
"EventReliability" to the value "persistent" -- either at the time the channel
is created or at a later time using the set_qos method.</P>
<P>Once a channel has been configured for reliable operation, persistence can be
disabled on an event-by-event basis using QoSProperties of the event
itself. This could be done, for example, to avoid the overhead of
persistently storing events for which reliability is not needed.</P>
<P>The supplier sends events to the EventChannel using a push() method. For
persistent events, this call will not return to the supplier until the
Notification service is prepared to guarantee event delivery.
</P>
<P>Application code in the Consumer should be written with the knowledge that
events are guaranteed to be delivered, but during recovery from a failure there
is a possibility that an event may arrive more than once. This could
happen, for example, if the event was in the process of being delivered at the
time the failure occurred and the failure prevents the Notification service from
determining if the delivery completed successfully. To meet its
commitment that every event will be delivered, the Notification service will
retry the delivery in this case, which may result in a duplicate event.</P>
<P>As long as this situation is understood at the time the application is
designed, it should be possible for the application to handle this situation.</P>
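<P>One common way to tolerate duplicates is to make the consumer ignore an
event it has already processed. The sketch below assumes that the supplier
places a unique, application-defined identifier in every event; the
Notification service is not described here as providing one. The
<code>Event</code> structure and the <code>DedupingConsumer</code> class are
hypothetical and stand in for the real structured event and consumer servant.</P>

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical event: the id is assigned by the supplier, not by the service.
struct Event {
  std::uint64_t id;
  std::string payload;
};

// Hypothetical consumer that processes each event id at most once.
class DedupingConsumer {
public:
  // Returns true if the event was processed, false if it was a duplicate.
  bool push(const Event& event)
  {
    // insert() reports whether the id was newly added to the set.
    if (!seen_.insert(event.id).second) return false;
    processed_.push_back(event.payload);
    return true;
  }

  const std::vector<std::string>& processed() const { return processed_; }

private:
  std::unordered_set<std::uint64_t> seen_;
  std::vector<std::string> processed_;
};
```

<P>The set of seen identifiers in this sketch lives in memory and grows without
bound. A consumer that must also survive its own restart would need to store
the identifiers persistently and discard old ones according to a policy suited
to the application.</P>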
</body>
</html>