# $Id$
The new implementation of the Event Channel
= Abstract
The current implementation of TAO's real-time Event Channel
has proven to be efficient, reliable, and generally bug free; but it
was designed to resolve the forces of a single problem domain
(hard real-time avionics), so performance considerations (chiefly
latency) took precedence over other requirements.
A new implementation of the real-time Event Channel is
proposed that will preserve the performance of the original event
channel, yet will be properly strategized so it can be adapted and
extended to other domains (such as distributed interactive
simulation).
= Thanks to Irfan for providing feedback on this document and
designing the delayed operation for the event dispatching.
= The module architecture
The current event channel is based on a series of modules
organized in a stack.  Each module is supposed to execute a single
task, for instance correlation, and could potentially be extracted or
replaced to adapt the EC to new requirements.  Unfortunately the
modules keep explicit references to the modules above them and below
them, including the exact type of each module.  Even though adding
base classes to represent the modules in a generic way would solve
the syntactic problems of this architecture, the relations between
the modules have semantic significance and cannot simply be organized
in an arbitrary way; moreover, we have found that only a few
operations on each module need to be strategized, and that using
Template Method or Strategy on each module would be a better solution
than replacing the modules wholesale.
= The new architecture
The new Event Channel will consist of the following
components:
- The Filter object: each consumer will have a filter object; these
filters are organized in a hierarchical structure.  For instance, a
disjunction filter has N children and accepts an event if any of its
children does; in contrast, a conjunction filter waits until all of
its children have accepted the event.
The filter hierarchy for a particular consumer is created using
the Builder pattern, i.e., a separate class creates the hierarchy
from the ConsumerQOS specification.  The user can add new filtering
strategies (such as "wait until these events arrive in this
sequence" or "do not accept events from this source") by providing
new Filter and Filter_Builder objects.
Notice that the per-consumer filters were already present in the
old event channel; the main difference is that all events went
through the correlation module, in many cases just to check that the
consumer did not require correlation.  The new scheme eliminates
that extra test.  More importantly, if the event was accepted an
ACE_ES_Dispatch_Request was dynamically allocated and passed to the
dispatching module; this was necessary because some implementations
of the dispatching module needed to enqueue the event.  Clearly
the allocation of a queue node is a decision better left to the
dispatching module, thus avoiding memory allocations in the
single-threaded case.
Another important feature of this design is that in the case where
the consumer only has disjunctive filtering no copies of the data
are needed (until we reach the dispatching module).
These filter objects can also be used to strategize the priority
assignment in conjunctive and disjunctive correlations; for example,
some consumers may require a fixed priority for disjunctions or the
highest priority for conjunctions, instead of just the priority of
the last event.  A sketch of such a filter hierarchy follows.
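As a concrete illustration, here is a minimal sketch of such a
filter hierarchy in C++; the names (Event, EC_Filter,
Disjunction_Filter, Conjunction_Filter) and signatures are
illustrative assumptions, not the actual TAO interfaces:

    #include <cstddef>
    #include <vector>

    struct Event { int source; int type; };    // simplified payload

    class EC_Filter {
    public:
      virtual ~EC_Filter () {}
      // Returns true if the filter accepts (and forwards) the event.
      virtual bool push (const Event &event) = 0;
      virtual void clear () {}                 // reset correlation state
    };

    // Accepts an event as soon as *any* child accepts it.
    class Disjunction_Filter : public EC_Filter {
    public:
      void add_child (EC_Filter *child) { children_.push_back (child); }
      virtual bool push (const Event &event) {
        for (std::size_t i = 0; i != children_.size (); ++i)
          if (children_[i]->push (event))
            return true;
        return false;
      }
    private:
      std::vector<EC_Filter*> children_;
    };

    // Accepts only once *all* children have accepted some event.
    class Conjunction_Filter : public EC_Filter {
    public:
      void add_child (EC_Filter *child) {
        children_.push_back (child);
        seen_.push_back (false);
      }
      virtual bool push (const Event &event) {
        bool all = true;
        for (std::size_t i = 0; i != children_.size (); ++i) {
          if (!seen_[i] && children_[i]->push (event))
            seen_[i] = true;
          all = all && seen_[i];
        }
        if (all)
          this->clear ();                      // rearm the correlation
        return all;
      }
      virtual void clear () { seen_.assign (seen_.size (), false); }
    private:
      std::vector<EC_Filter*> children_;
      std::vector<bool> seen_;
    };

A Filter_Builder would walk the ConsumerQOS specification and
assemble one such tree per consumer.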
- The Dispatching module: as in the original event channel
implementation there are multiple ways to dispatch events; this
feature is preserved with only a few syntactic changes.
Possible implementations: reactive dispatching, a single queue
served by multiple threads, or multiple queues serviced by threads
at different priorities.
An interesting challenge to solve in this module is to minimize
the amount of event copying, for instance:
+ Reactive dispatching does not require any copies of an event;
the event is simply pushed from the supplier to the consumer.
+ If the dispatching module has any form of queueing then at least
one copy must be made.  Ideally we want to avoid making multiple
copies for each consumer interested in the event; a potential
solution is to let the dispatching module create up front a
"reference counted" version of the event; this reference-counted
version is used by the SupplierFiltering strategy to push to
each consumer, as sketched below.
@@ TODO: how do we match this with the filtering interfaces?
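A minimal sketch of such a reference-counted event; the names are
hypothetical, and a real implementation would need atomic reference
counting for the multi-threaded configurations:

    struct Event { int source; int type; };   // simplified payload

    class Refcounted_Event {
    public:
      explicit Refcounted_Event (const Event &e)
        : event_ (e), refcount_ (1) {}
      Refcounted_Event *duplicate () { ++refcount_; return this; }
      void release () { if (--refcount_ == 0) delete this; }
      const Event &event () const { return event_; }
    private:
      ~Refcounted_Event () {}   // only release() may destroy it
      Event event_;
      unsigned long refcount_;  // a real version needs atomic ops
    };

The dispatching module would create one Refcounted_Event up front;
each enqueued request calls duplicate() and later release(), so the
last user frees the single shared copy.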
- The ConsumerAdmin module: this object acts as a factory for the
ProxyPushSupplier objects (i.e., the interface between the event
channel and its consumers); the object delegates to the event
channel implementation to create the objects, and provides a simple
mechanism to control the object activation in different POAs: it
queries the Event Channel for the right POA to use.
Possible implementations: it could keep the consumers classified
by the events they consume, minimizing the time required to
connect/disconnect a consumer or to dispatch an event
[see the relationship with SupplierFiltering]
- The SupplierAdmin module: this is a factory for the
ProxyPushConsumer objects (again delegating to the event channel);
it also provides the user with control over the proxy activation.
It provides template methods (sketched after this item) that can be
used to:
+ Inform all suppliers that a consumer has connected/disconnected.
+ Inform only the suppliers that publish the events the consumer
subscribes to.
+ Inform no suppliers at all.
Possible implementations: it could keep the suppliers classified
by the events they publish, minimizing the time to
connect/disconnect a supplier.
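A minimal sketch of those hooks, with hypothetical names and a
trivialized ConsumerQOS:

    struct ConsumerQOS {};          // stand-in for the real QOS type

    class SupplierAdmin {
    public:
      virtual ~SupplierAdmin () {}
      // Template method invoked when a consumer connects; the
      // disconnect case would be analogous.
      void consumer_connected (const ConsumerQOS &qos) {
        this->inform_suppliers (qos);
      }
    protected:
      // Hook: the default informs no suppliers; subclasses can
      // notify all suppliers, or only those publishing the events
      // the consumer subscribes to.
      virtual void inform_suppliers (const ConsumerQOS &) {}
    };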
- The ProxyPushSupplier object delegates to the filter to do most of
its job, but it can be subclassed to provide event counting and
similar features.  Notice that providing the Supplier with a
Null_Filter moves all the filtering responsibility to the
ProxyPushConsumers.
Possible implementations: use templates to define the kind of
locking strategy.
- The ProxyPushConsumer object is strategized to provide different
ways to handle filtering close to the supplier (see the
SupplierFiltering description).
Possible implementations: use templates to define the kind of
locking strategy.
- The SupplierFiltering classes are used to control the filtering
strategy close to the Suppliers (remember that the object close to
the supplier is the ProxyPushConsumer).  This object receives an
event set from the PushSupplier and passes it up to the right set of
ProxyPushSuppliers (the Consumer representatives).
Possible implementations:
+ Each one keeps track of the consumers possibly interested in
the events published here; in this way the dispatching is
proportional to the number of consumers interested in the
event, not the total number of consumers.
+ Use the global consumer list to find objects interested in the
current event; it is simpler and scales better memory-wise, but
will perform worse than the first alternative, unless most
consumers are interested in most events.
+ Use a global list for each type of event, thus amortizing the
cost across all the suppliers that publish that event.
+ Keep a single list of consumers, but do not try to filter them
by source or type; the ProxyPushSuppliers are then responsible
for filtering (using the EC_Filter objects).
An interesting aspect of this object is how it is to manage event
dispatching: will it iterate over the set of consumers (holding a
lock?) and just dispatch the event to each one?  Will it make a copy
of the consumers (reducing the duration of the lock)?  We foresee
several alternatives:
+ Simply iterate over the set of consumers for the current
supplier (this can be a global set), holding a lock as we go.
This is potentially the most efficient version, but it can
suffer from priority inversion because the lock is held for
a long time.
It can also produce deadlocks if there is no queueing in
the dispatching module.
+ Make a copy of the current set of consumers; unfortunately
this could require a memory allocation, and potentially
increase the reference count on the consumers.
An interesting idea to explore is to keep a work array in
TSS (thread-specific storage); this array can be used to copy
the consumers from the shared resource.
Since the size of the array can be determined beforehand
(using the subscriptions and publications), the array could
be pre-allocated to the maximum size over all the suppliers, or
simply grown on demand.  For hard real-time applications the
initial size of this array could be configured at compile
time, and chosen so that no re-allocation is ever needed.
In any case the shared resource is held for a shorter
time, just long enough to copy the necessary elements into
the array; the dispatching to each consumer is done after
releasing the global lock.
+ Mark the list as "busy" so no changes will be made to it
while we iterate over it.  Instead, any changes to the list
are "posted" in a list of Command objects.
After finishing the iteration the lock is acquired again,
the list is marked as "idle" and any posted operations are
executed.
"Busy" and "idle" could be implemented using a reference
count, so multiple "reader" threads can go in [with a
limit?].
Drawbacks: this can lead to deadlock if the high-water mark
(HWM) on readers is reached when a consumer that also plays
the Supplier role pushes an event as part of its upcall.
It can give too much priority to the writers if the HWM is 1.
A sketch of this last alternative follows.
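A minimal sketch of the "busy list + posted commands" alternative;
the names are hypothetical, and the lock acquire/release points are
indicated as comments to keep the sketch self-contained:

    #include <list>

    class Command {
    public:
      virtual ~Command () {}
      virtual void execute () = 0;  // applies one connect/disconnect
    };

    class Consumer_Set {
    public:
      Consumer_Set () : busy_count_ (0) {}

      void dispatch_to_all (/* const Event &event */) {
        // <acquire lock>
        ++this->busy_count_;              // mark the list "busy"
        // <release lock>
        // ... iterate over the consumers and push the event,
        //     without holding the lock ...
        // <acquire lock>
        if (--this->busy_count_ == 0)
          this->execute_pending ();       // list is "idle" again
        // <release lock>
      }

      void post_change (Command *cmd) {
        // <acquire lock>
        if (this->busy_count_ > 0)
          this->pending_.push_back (cmd); // post it for later
        else {
          cmd->execute ();                // apply it immediately
          delete cmd;
        }
        // <release lock>
      }

    private:
      void execute_pending () {
        while (!this->pending_.empty ()) {
          Command *cmd = this->pending_.front ();
          this->pending_.pop_front ();
          cmd->execute ();
          delete cmd;
        }
      }
      int busy_count_;
      std::list<Command*> pending_;
    };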
- The Timer_Module: this is used by the EC_Filter_Builder to create
per-consumer timeout events. It will simply push events directly to
the EC_ProxyPushSupplier objects.
- The Event_Channel: this class acts as a mediator between the
components above.  It is completely strategized by an abstract
factory (EC_Factory) that creates each one of the objects already
described; in fact, the factory methods in the Event Channel are
implemented as simple delegation to the EC_Factory.
- The EC_Factory: using this class the user can strategize and
control all object creation in the EC; notice that this is an
abstract factory because not all combinations of the classes
described make sense.  A sketch of the interface follows.
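A minimal sketch of the factory interface; the method names are
plausible guesses, not the actual signatures:

    class EC_Dispatching;      // the modules described above
    class EC_ConsumerAdmin;
    class EC_SupplierAdmin;
    class EC_Filter_Builder;

    class EC_Factory {
    public:
      virtual ~EC_Factory () {}
      virtual EC_Dispatching    *create_dispatching ()    = 0;
      virtual EC_ConsumerAdmin  *create_consumer_admin () = 0;
      virtual EC_SupplierAdmin  *create_supplier_admin () = 0;
      virtual EC_Filter_Builder *create_filter_builder () = 0;
      // ... plus factory methods for the proxies, locks, etc. ...
    };

Because it is an abstract factory, a concrete subclass produces one
coherent family of components, which prevents nonsensical
combinations.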
= Interactions
The architecture described above is easier to understand once we
describe the interactions between its components:
- Dispatching an event: the path to dispatch an event starts with an
EC_ProxyPushConsumer that receives the event from its supplier;
it simply delegates to the SupplierFiltering object bound to it at
creation time.
The SupplierFiltering object pushes the event [since the event is
a set it has to push one event at a time] to a set of
ProxyPushSuppliers [recall that these are the consumer
ambassadors], as sketched below.
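A minimal sketch of this path, with illustrative names and
signatures:

    #include <cstddef>
    #include <vector>

    struct Event { int source; int type; };   // simplified payload
    typedef std::vector<Event> EventSet;

    class ProxyPushSupplier {                 // the consumer ambassador
    public:
      void push (const Event &event) {
        // ... run the consumer's EC_Filter, then dispatch ...
      }
    };

    class SupplierFiltering {
    public:
      void add_proxy (ProxyPushSupplier *p) { proxies_.push_back (p); }
      void push (const EventSet &events) {
        // The set is unbundled: one event at a time to each proxy.
        for (std::size_t i = 0; i != events.size (); ++i)
          for (std::size_t j = 0; j != proxies_.size (); ++j)
            proxies_[j]->push (events[i]);
      }
    private:
      std::vector<ProxyPushSupplier*> proxies_;
    };

    class ProxyPushConsumer {
    public:
      ProxyPushConsumer (SupplierFiltering *f) : filtering_ (f) {}
      void push (const EventSet &events) {
        this->filtering_->push (events);      // simple delegation
      }
    private:
      SupplierFiltering *filtering_;
    };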
- Adding a consumer:
- Adding a supplier:
- Removing a consumer:
- Removing a supplier:
- Shutting down the event channel:
= Performance
The main sources of overhead in the EC are:
- Locking
- Memory allocation
- Data copying
the new design does not neglect these issues; for instance:
- The EC_Factory creates strategized locks, so the single-threaded
implementations can perform optimally.
- Each consumer has its own filters, so no locking is required at
each filter (just one for the consumer).
- In the common case data can be pushed from the original supplier
to the dispatching module without any data copies or memory
allocations; at that point the dispatching module can make the
copies it deems necessary to push the data ahead.
- The filtering mechanism will provide two data paths, one for
data that is owned by the filter or the event channel (and thus
requires no copying) and one for external data; this will be used
in the dispatching module to minimize data copying too.
- Features that are not important for production code can be
configured out; for example, timestamping (for performance
analysis) can be implemented as a Decorator on the
ProxyPushConsumer and the Dispatching classes.  Similarly, the
configuration runs require that the EC call the scheduler with the
subscription and correlation information; these calls can be
completely removed from the production code.
- Initial experiments show that the new EC can (in some
configurations) dispatch one event without a single memory
allocation; contrast this with the eight memory allocations in the
old EC.
- Initial experiments show that the new EC can (in some
configurations) dispatch one event with zero locks for
single-threaded applications, and we estimate that fewer than five
locks (exact numbers are not available yet) will be required for
the multi-threaded version.
Contrast this with the 28 locks required by the old EC.
- Initial experiments suggest that the EC can filter and dispatch
events with no data copying (with Reactive dispatching) and with
only one data copy (if Prioritized dispatching is used).
We believe that even this extra data copy could be minimized in
the multi-threaded case (or at least limited to the event header
and avoided for the event payload).
In general we expect that the performance of the new real-time
Event Service will be equal to or better than that of the previous
implementation.  As mentioned above, initial experiments suggest
that this will be the case.
= Locking revisited
There are several ways to support strategized locking, for example:
- Each class is implemented without any locking; all the locks must
be taken outside the context of the class.
- Each class is implemented as a base class without any locking,
derived classes provide the right kind of locking [inflexible]
- Each class can be Decorated with a version that supports locking
[inefficient]
- Each class is parameterized over the kind of locking [complex]
- Each class has a strategy (ACE_Lock) to do the locking [easier]
We will try the last alternative first; if the performance
penalty in the case with no locking (or in the case where the exact
locking type is known well in advance) proves to be too high we can
explore the other solutions.  A sketch of this alternative follows.
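A minimal sketch of that last alternative, using a plain C++
stand-in for ACE_Lock to keep the example self-contained; the names
are illustrative:

    class Lock {
    public:
      virtual ~Lock () {}
      virtual int acquire () = 0;
      virtual int release () = 0;
    };

    class Null_Lock : public Lock {  // single-threaded configuration
    public:
      virtual int acquire () { return 0; }
      virtual int release () { return 0; }
    };

    class Consumer_Set {
    public:
      // The lock would be created by the EC_Factory, so the same
      // code runs locked or lock-free depending on configuration.
      Consumer_Set (Lock *lock) : lock_ (lock) {}
      void connected (/* ProxyPushSupplier *supplier */) {
        this->lock_->acquire ();
        // ... update the set ...
        this->lock_->release ();
      }
    private:
      Lock *lock_;
    };

The cost of this approach is one virtual call per acquire/release
even when a Null_Lock is installed; that is the performance penalty
referred to above.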