<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <head>
    <!-- $Id$ -->
    <title>TAO Performance Tuning</title>
    <LINK href="tao.css" rel="stylesheet" type="text/css">
  </head>

  <body>
  <hr><p>
    <h3>TAO Performance Tuning</h3>

    <a name="overview"></a>
    <h3>Overview</h3>

    <p>
      TAO is increasingly being used to support high-performance
      distributed real-time and embedded (DRE) applications.  DRE
      applications constitute an important class of distributed
      systems where predictability and efficiency are essential for
      success.  This document describes how to configure
      <a href="index.html">TAO</a> to enhance its throughput, scalability,
      and latency for a variety of applications. </p>

     <p>
      As with most applications, including compilers, enabling
      optimizations can often introduce side-effects that may not be
      desirable for all use-cases.  TAO's default configuration
      therefore emphasizes programming simplicity rather than top
      speed or scalability.  Our goal is to ensure that CORBA
      applications work correctly ``out-of-the-box,'' while also
      enabling developers to further optimize their CORBA applications
      to meet stringent performance requirements. </P>
      
    <p>  
      TAO's performance tuning philosophy reflects the fact that there
      are trade-offs between speed, size, scalability, and programming
      simplicity.  For example, certain ORB configurations work well
      for a large number of clients, whereas others work better for a
      small number.  Likewise, certain configurations minimize
      internal ORB synchronization and memory allocation overhead by
      making assumptions about how applications are designed.
    </p>

    <p>
      This document is organized as follows:
    </p>
    <ul>
      <li>
	<a href="#throughput">Optimizing Throughput</a>
	<ul>
	  <li>
	    <a href="#client_throughput">Optimizing Client Throughput</a>
	  </li>
	  <li>
	    <a href="#server_throughput">Optimizing Server Throughput</a>
	  </li>
	</ul>
      </li>
      <li>
	<a href="#scalability">Optimizing Scalability</a>
	<ul>
	  <li>
	    <a href="#client_scalability">Optimizing Client Scalability</a>
	  </li>
	  <li>
	    <a href="#server_scalability">Optimizing Server Scalability</a>
	  </li>
	</ul>
      </li>
    </ul>
      
    <p><hr><p>
    <a name="throughput"></a>
    <h3>Optimizing Throughput</h3>

    <p>
      In this context, ``throughput'' refers to the number of events
      occurring per unit time, where ``events'' can refer to
      ORB-mediated operation invocations, for example.  This section
      describes how to optimize client and server throughput.
    </p>

    <p>
      It is important to understand that enabling throughput
      optimizations for the client may not affect the server
      performance and vice versa.  In particular, the client and
      server ORBs may be designed by different ORB suppliers.
    </p>

    <a name="client_throughput"></a>
    <h3>Optimizing Client Throughput</h3>

    <p>
      Client ORB throughput optimizations improve the rate at which
      CORBA requests (operation invocations) are sent to the target
      server.  Depending on the application, various techniques can be
      employed to improve the rate at which CORBA requests are sent
      and/or the amount of work the client can perform as requests are
      sent or replies received.  These techniques consist of:
    </p>
    <ul>
      <li>
	<b>Run-time features</b> offered by the ORB, such as
	Asynchronous Method Invocations (AMI)
        <!-- Ossama, are there other examples you can list here? -->
      </li>
      <li>
	<b>ORB configurations</b>, such as disabling synchronization
	of various parts of the ORB in a single-threaded application
      </li>
    </ul>

    <p>
      We explore these techniques below.
    </p>

    <h4>Run-time Client Optimizations</h4>

    <p>
      For two-way invocations, i.e., those that expect a reply
      (including ``<CODE>void</CODE>'' replies), asynchronous method
      invocations (AMI) can be used to give the client the opportunity
      to perform other work while a CORBA request is sent to the target,
      handled by the target, and the reply is received.
    </p>

    <h4>Client Optimizations via ORB Configuration</h4>

    <p>
      A TAO client ORB can be optimized for various types of
      applications:
    </p>

    <ul>
      <li>
	<b>Single-Threaded</b>
	<ul>
	  <li>
	    <p>
	      A single-threaded client application may not require
	      the internal thread synchronization performed by TAO.
	      It may therefore be useful to add the following line to your
	      <code>svc.conf</code> file:
	    </p>

	    <blockquote>
	      <code>static <a href = "Options.html#DefaultClient">Client_Strategy_Factory</a> "<a href="Options.html#-ORBProfileLock">-ORBProfileLock</a> null"</code>
	    </blockquote>

	    <p>
	      If such an entry already exists in your
	      <code>svc.conf</code> file, then just add
	      <code>-ORBProfileLock null</code> to the list of options
	      between the quotes found after
	      <code>Client_Strategy_Factory</code>.
	    </p>

	    <p>
	      Other options include disabling synchronization in the
	      components of TAO responsible for constructing and sending
	      requests to the target and for receiving replies.  These
	      components are called ``connection handlers.''  To disable
	      synchronization in the client connection handlers, simply
	      add:
	    </p>

	    <blockquote>
	      <code>
		<a href="Options.html#-ORBClientConnectionHandler">
		  -ORBClientConnectionHandler</a> ST
	      </code>
	    </blockquote>

	    <p>
	      to the list of <code>Client_Strategy_Factory</code>
	      options.  Other values for this option, such as
	      <code>RW</code>, are more appropriate for "pure"
	      clients.  See the <code>
		<a href="Options.html#-ORBClientConnectionHandler">
		  -ORBClientConnectionHandler</a></code> option
	      documentation for details.
	    </p>
	    
	  </li>
	</ul>
      </li>

      <li>
	<b>Low Client Scalability Requirements</b>
	<ul>
	  <li>
	    <p>
	      Clients with lower scalability requirements can dedicate a
	      connection to one request at a time, which means that no
              other requests or replies will be sent or received,
	      respectively, over that connection while a request is
              pending.  The connection is <i>exclusive</i> to a given
              request, thus reducing contention on a connection.
              However, that exclusivity 
	      comes at the cost of a smaller number of requests that
	      may be issued at a given point in time.  To enable this
	      behavior, add the following option to the
	      <code>Client_Strategy_Factory</code> line of your
	      <code>svc.conf</code> file:
	    </p>

	    <blockquote>
	      <code>
		<a href="Options.html#-ORBTransportMuxStrategy">
		  -ORBTransportMuxStrategy</a> EXCLUSIVE
	      </code>
	    </blockquote>

	  </li>
	</ul>
      </li>
    </ul>
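
    <p>
      As a minimal sketch, assuming a single-threaded client that also
      has low scalability requirements, the client options described
      above could be combined into a single
      <code>Client_Strategy_Factory</code> entry in
      <code>svc.conf</code>.  This particular combination is only an
      illustration; include only the options that suit your
      application:
    </p>

    <blockquote>
      <pre>static Client_Strategy_Factory "-ORBProfileLock null -ORBClientConnectionHandler ST -ORBTransportMuxStrategy EXCLUSIVE"</pre>
    </blockquote>

    <p>
      All options for a given factory go inside one pair of quotes on
      one line, separated by spaces.
    </p>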

    <a name="server_throughput"></a>
    <h3>Optimizing Server Throughput</h3>

    <p>
      Throughput on the server side can be improved by configuring TAO
      to use a <i>thread-per-connection</i> concurrency model.  With
      this concurrency model, a single thread is assigned to service
      each connection.  That same thread is used to dispatch the
      request to the appropriate servant, meaning that thread context
      switching is kept to a minimum.  To enable this concurrency model
      in TAO, add the following option to the
      <code>
	<a href="Options.html#DefaultServer">Server_Strategy_Factory</a>
      </code>
      entry in your <code>svc.conf</code> file:
    </p>

    <blockquote>
      <code>
	<a href="Options.html#orb_concurrency">
	  -ORBConcurrency</a> thread-per-connection
      </code>
    </blockquote>

    <p>
      While the <i>thread-per-connection</i> concurrency model may
      improve throughput, it generally does not scale well due to
      limitations of the platform on which the application is running.
      In particular, most operating systems cannot efficiently handle
      more than <code>100</code> or <code>200</code> threads running
      concurrently.  Hence performance often degrades sharply as the
      number of connections increases beyond those numbers.
    </p>
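
    <p>
      For illustration, a complete <code>svc.conf</code> entry that
      selects this concurrency model might look like the following
      sketch, which assumes that no other
      <code>Server_Strategy_Factory</code> options are needed:
    </p>

    <blockquote>
      <pre>static Server_Strategy_Factory "-ORBConcurrency thread-per-connection"</pre>
    </blockquote>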

    <p>
      Other concurrency models are further discussed in the
      <i><a href="#server_scalability">Optimizing Server
	  Scalability</a></i> section below.
    </p>

    <p><hr><p>

    <a name="scalability"></a>
    <h3>Optimizing Scalability</h3>

    <p>
      In this context, ``scalability'' refers to how well an ORB
      performs as the number of CORBA requests increases.  For
      example, a non-scalable configuration will perform poorly as the
      number of pending CORBA requests on the client increases from
      <code>10</code> to <code>1,000</code>, and similarly on the
      server.  ORB scalability is particularly important on the server
      since it must often handle many requests from multiple clients.
    </p>

    <a name="client_scalability"></a>
    <h3>Optimizing Client Scalability</h3>

    <p>
      To optimize TAO for scalability on the client side, connection
      multiplexing must be enabled.  With multiplexing, multiple
      requests may be issued and pending over the same connection.
      Sharing a connection in this manner reduces the amount of
      resources required by the ORB, which in turn makes more
      resources available to the application.  To enable this behavior,
      use the following <code>Client_Strategy_Factory</code> option:
    </p>

    <blockquote>
      <code>
	<a href="Options.html#-ORBTransportMuxStrategy">
	  -ORBTransportMuxStrategy</a> MUXED
      </code>
    </blockquote>

    <p>
      This is the default setting used by TAO.
    </p>

    <a name="server_scalability"></a>
    <h3>Optimizing Server Scalability</h3>

    <p>
      Scalability on the server side depends greatly on the
      <i>concurrency model</i> in use.  TAO supports two concurrency
      models:
    </p>

    <ol>
      <li>Reactive, and</li>
      <li>Thread-per-connection</li>
    </ol>

    <p>
      The thread-per-connection concurrency model is described above
      in the
      <i><a href="#server_throughput">Optimizing Server
	  Throughput</a></i>
      section.
    </p>

    <p>
      A <i>reactive</i> concurrency model employs the Reactor design
      pattern to demultiplex incoming CORBA requests.  The underlying
      event demultiplexing mechanism is typically one of the
      mechanisms provided by the operating system, such as the
      <code>select(2)</code> system call.  To enable this concurrency
      model, add the following option to the
      <code>
	<a href="Options.html#DefaultServer">Server_Strategy_Factory</a>
      </code>
      entry in your <code>svc.conf</code> file:
    </p>

    <blockquote>
      <code>
	<a href="Options.html#orb_concurrency">
	  -ORBConcurrency</a> reactive
      </code>
    </blockquote>

    <p>
      This is the default setting used by TAO.
    </p>

    <p>
      The reactive concurrency model provides improved scalability on
      the server side because fewer resources are used, which in turn
      allows a very large number of requests to be handled by the
      server side ORB.  This concurrency model provides much better
      scalability than the thread-per-connection model described
      above.
    </p>

    <p>
      Further scalability tuning can be achieved by choosing a Reactor
      appropriate for your application.  For example, if your
      application is single-threaded then a reactor optimized for
      single-threaded use may be appropriate.  To select a
      single-threaded <code>select(2)</code> based reactor, add the
      following option to the
      <code>
	<a href="Options.html#AdvancedResourceFactory">Advanced_Resource_Factory</a>
      </code>
      entry in your <code>svc.conf</code> file:
    </p>

    <blockquote>
      <code>
	<a href="Options.html#-ORBReactorType">
	  -ORBReactorType</a> select_st
      </code>
    </blockquote>

    <p>
      If your application uses thread pools, then the thread pool
      reactor may be a better choice.  To use it, add the following
      option instead:
    </p>

    <blockquote>
      <code>
	<a href="Options.html#-ORBReactorType">
	  -ORBReactorType</a> tp_reactor
      </code>
    </blockquote>

    <p>
      This is TAO's default reactor.  See the
      <code>
	<a href="Options.html#-ORBReactorType">-ORBReactorType</a>
      </code>
      documentation for other reactor choices.
    </p>

    <p>
      Note that you may have to link the <code>TAO_Strategies</code>
      library into your application in order to take advantage of the
      <code>
	<a href="Options.html#AdvancedResourceFactory">Advanced_Resource_Factory</a>
      </code>
      features, such as alternate reactor choices.
    </p>

    <p>
      A third concurrency model, <i>un</i>supported by TAO, is
      <i>thread-per-request</i>.  In this case, a single thread is
      used to service each request as it arrives.  This concurrency
      model generally provides neither scalability nor speed, which is
      the reason why it is often not used in practice.
    </p>

    <hr><P>
    <address><a href="mailto:ossama@uci.edu">Ossama Othman</a></address>
<!-- Created: Mon Nov 26 13:22:00 PST 2001 -->
<!-- hhmts start -->
Last modified: Sun Dec 16 10:03:16 Pacific Standard Time 2001
<!-- hhmts end -->
  </body>
</html>