Note: this file is somewhat outdated.

The intention of this file is to capture and document CIDL compiler design
ideas/decisions.

Conceptual parts of CIDL compiler design
----------------------------------------
Option Parser                        Consists of option parser and option
                                     database.

C Preprocessor Interfacing           Represents mechanism of preprocessing
                                     cidl files.

IDL Compiler Interfacing             Represents mechanism of invoking IDL
                                     compiler.

Scanner                              Scanner for preprocessed cidl file.

Parser                               CIDL grammar parser. Consists of grammar
                                     and semantic rules.

Syntax Tree                          Intermediate representation of cidl file.
                                     Consists of syntax tree nodes themselves
                                     and perhaps symbol tables.

Semantic Analyzer                    Traverses Syntax Tree and performs
                                     semantic analysis as well as some
                                     semantic expansions.

Code Generation Stream               Stream to output generated code to. Used
                                     by concrete Code Generators.

Code Generators
{
  Executor Mapping Generator         Generator for local executor mapping.

  Executor Implementation Generator  Generator for partial implementation
                                     of local executor mapping.

  Skeleton Thunk Generator           Generator for skeleton thunks i.e.
                                     code that implements skeleton and
                                     thunks user-defined functions to
                                     executor mapping.
}

Compiler driver                      Establishes order of execution of
                                     different components as part of
                                     compilation process.

How everything works together
-----------------------------
(1) Compiler Driver executes Option Parser to populate Option Database.
(2) Compiler Driver executes C Preprocessor on a supplied cidl file.
(3) Compiler Driver executes Parser which uses Scanner to scan preprocessed
cidl file and generates Syntax Tree by means of semantic rules.
(4) At this point we have Syntax Tree corresponding to the original cidl
file. Compiler Driver executes Executor Mapping Generator,
Executor Implementation Generator and Skeleton Thunk Generator on
Syntax Tree.
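The ordering in steps (1)-(4) could be sketched as a driver that runs
registered stages in sequence and stops at the first failure. This is only
an illustration: CompilationContext, CompilerDriver, make_driver and the
stage names are hypothetical, and every stage here is a stub.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout; the notes do not define a driver interface.
struct CompilationContext
{
  std::vector<std::string> log;   // records which stages ran, in order
};

class CompilerDriver
{
public:
  using Stage = std::function<bool (CompilationContext&)>;

  void add_stage (std::string name, Stage stage)
  {
    stages_.emplace_back (std::move (name), std::move (stage));
  }

  // Runs stages in registration order and stops at the first failure.
  bool run (CompilationContext& ctx) const
  {
    for (auto const& s : stages_)
    {
      ctx.log.push_back (s.first);
      if (!s.second (ctx)) return false;
    }
    return true;
  }

private:
  std::vector<std::pair<std::string, Stage>> stages_;
};

// Builds a driver whose stages mirror steps (1)-(4); each stage is a stub.
inline CompilerDriver make_driver ()
{
  auto ok = [] (CompilationContext&) { return true; };
  CompilerDriver d;
  d.add_stage ("option-parser", ok);
  d.add_stage ("c-preprocessor", ok);
  d.add_stage ("parser", ok);
  d.add_stage ("code-generators", ok);
  return d;
}
```

Calling make_driver ().run (ctx) on a fresh context leaves the four stage
names in ctx.log in the order listed above.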
General Design Ideas/Decision
-----------------------------
[IDEA]: There is an effort to use autoconf/automake in ACE/TAO. Maybe it's
a good idea to start using it with CIDLC? There is one side advantage
of this approach: if we decide to embed GCC CPP then we will have to
use configure (or otherwise ACE-ify the code, which doesn't sound like
the right solution).
[IDEA]: CIDLC is a prototype for a new IDLC, PSDLC and IfR model. Here are
basic concepts:
- use common IDL grammar, semantic rules and syntax tree nodes
for IDLC, CIDLC, PSDLC and IfR. Possibly have several libraries,
for example ast_idl-2.so, ast_idl-3.so, scanner_idl-2.so,
scanner_idl-3.so, parser_idl-2.so, parser_idl-3.so. The dependency
graph would look like this:
  ast_idl-2.so                    scanner_idl-2.so
        |                                 |
        |---------------------------------|
        |                |                |
        |                |                |
        |         parser_idl-2.so         |
        |                |                |
  ast_idl-3.so           |        scanner_idl-3.so
        |                |                |
        |                |                |
        |                |                |
         ---------parser_idl-3.so---------
Same idea applies for CIDL and PSDL.
- use the same internal representation (syntax tree) in all
compilers and IfR. This way, if at some stage we need
to make one of the compilers IfR-integrated (import keyword?),
it will be a much easier task than it is now. This internal
representation may also be usable in typecodes.
@@ boris: not clear to me.
@@ jeff: A typecode is like a piece of the Syntax Tree with these
exceptions -
(1) There is no typecode for an IDL module.
(2) Typecodes for interfaces and valuetypes lack some of the
information in the corresponding Syntax Tree nodes.
With these exceptions in mind, a typecode can be composed and
traversed in the same manner as a Syntax Tree, perhaps with
different classes than used to compose the ST itself.
@@ boris: Ok, let me see if I got it right. So when typecode
is kept in parsed state (as opposed to binary) (btw, when
does it happen?) it makes sense to apply the same techniques
(if in fact not the same ST nodes and traversal mechanisms) as
for XIDL compilation.
[IDEA]: We should be consistent with the way external compilers that we call
report errors. For now those are CPP and IDLC.
Option Parser
-------------
[IDEA]: Use Spirit parser framework to generate option parser.
[IDEA]: Option Database is probably a singleton.
@@ jeff: This is a good idea, especially when passing some of the
options to a preprocessor or spawned IDL compiler. But I think we
will still need 'state' classes for the front and back ends (to
hold values set by command line options and default values) so
we can keep them decoupled.
@@ boris: I understand what you mean. Though I think we will be
able to do with one 'runtime database'. Each 'compiler module'
will be able to populate its 'namespace' with (1) default
values, (2) with module-specific options and (3) arbitrary
runtime information. I will present a prototype design shortly.
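A minimal sketch of the single 'runtime database' discussed above, assuming
string keys and values grouped by a per-module namespace. RuntimeDatabase and
its member names are hypothetical; the prototype design mentioned in the
comment is not reproduced here.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical sketch of one 'runtime database' shared by all compiler modules.
class RuntimeDatabase
{
public:
  // Meyers singleton: one database per process.
  static RuntimeDatabase& instance ()
  {
    static RuntimeDatabase db;
    return db;
  }

  // Stores a default; it does not overwrite a value that is already present
  // (for example one set from the command line).
  void set_default (std::string const& ns,
                    std::string const& key,
                    std::string const& value)
  {
    data_[ns].insert (std::make_pair (key, value));
  }

  // Stores or overwrites a value in the given module namespace.
  void set (std::string const& ns,
            std::string const& key,
            std::string const& value)
  {
    data_[ns][key] = value;
  }

  // Returns the stored value, or an empty string if there is none.
  std::string get (std::string const& ns, std::string const& key) const
  {
    auto n = data_.find (ns);
    if (n == data_.end ()) return std::string ();
    auto k = n->second.find (key);
    return k == n->second.end () ? std::string () : k->second;
  }

private:
  RuntimeDatabase () = default;
  std::map<std::string, std::map<std::string, std::string>> data_;
};
```

Each module would call set_default for its own namespace at start-up, while
the option parser would call set, so explicit options win regardless of the
order in which the two happen.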
[IDEA]: It seems we will have to execute at least two external programs
as part of CIDLC execution: CPP and IDLC. Why wouldn't we follow the
GCC specs model (gcc -dumpspecs)? Here are candidates to be put into
specs:
- default CPP name and options
- default IDLC name and options
- default file extensions and formats for different mappings
- other ideas?
[IDEA]: Provide short and long option names (e.g. -o and --output-dir)
for every option (maybe except -I, -D, etc).
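One way to keep short and long names in step is a single table that maps
both spellings to one canonical name. The sketch below is hypothetical
(OptionSpec and canonical_name are not existing CIDLC names) and handles
only the option name, not option arguments.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one option with both spellings.
struct OptionSpec
{
  char short_name;        // e.g. 'o'
  std::string long_name;  // e.g. "output-dir"
};

// Maps "-o" or "--output-dir" to the canonical long name; returns an empty
// string for anything that is not a known option.
inline std::string canonical_name (std::vector<OptionSpec> const& specs,
                                   std::string const& arg)
{
  for (OptionSpec const& s : specs)
  {
    if (arg == std::string ("-") + s.short_name) return s.long_name;
    if (arg == "--" + s.long_name) return s.long_name;
  }
  return std::string ();
}
```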
C Preprocessor Interfacing
--------------------------
[IDEA]: Embed/require GCC CPP
[IDEA]: We need a new model of handling includes in CIDLC (as well as IDLC).
Right now I'm mentally testing a new model (thanks to Carlos for the
comments). Soon I will put the description here.
[IDEA]: We cannot move the cidl file being preprocessed to, for example,
/tmp, as is currently the case with IDLC.
[IDEA]: Can we use pipes (ACE Pipes) portably to avoid temporary files?
(Kitty, you had some ideas about that?)
IDL Compiler Interfacing
------------------------
[IDEA]: Same as for CPP: Can we use pipes?
@@ jeff: check with Nanbor on this. I think there may be CCM/CIAO
use cases where we need the intermediate IDL file.
[IDEA]: Will need a mechanism to pass options to IDLC from CIDLC command
line (would be nice to have this ability for CPP as well).
Something like -x in xterm? Better ideas?
Scanner
-------
[IDEA]: Use Spirit framework to construct scanner. Can the resulting
sequence be a sequence of objects? BTW, Spirit parser expects a "forward
iterator"-based scanner, which basically means that we may have to
keep the whole sequence in memory. BTW, this is another good reason
to have a scanner: if we manage to make the scanner a predictive parser
(i.e. no backtracking) then we don't have to keep the whole
preprocessed cidl file in memory.
Parser
------
[IDEA]: Use Spirit framework to construct parser.
[IDEA]: Define IDL grammar as a number of grammar capsules. This way it's
much easier to reuse/inherit even dynamically. Need to elaborate
this idea.
[IDEA]: Use functors as semantic actions. This way we can specify (via
functor's data member) on which Syntax Tree they are working.
Bad side: semantic rules are defined during grammar construction.
However we can use a modification of the factory method pattern.
Better ideas?
@@ jeff: I think ST node creation with a factory
is a good idea - another ST implementation could be plugged in,
as long as it uses a factory with the same method names.
@@ boris: Right. In fact it's our 'improved' way of handling 'BE'
use cases.
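The functor-plus-factory idea could look roughly like the sketch below. All
names (Node, SyntaxTree, NodeFactory, AddInterface) are hypothetical, and the
iterator-pair call signature is an assumption about how a parser framework
would invoke the action; no Spirit API is used here.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// All names here are hypothetical illustrations, not CIDLC classes.
struct Node
{
  explicit Node (std::string n) : name (std::move (n)) {}
  virtual ~Node () = default;
  virtual std::string kind () const { return "interface"; }
  std::string name;
};

struct SyntaxTree
{
  std::vector<std::unique_ptr<Node>> nodes;
};

// Factory with a fixed set of method names; a subclass may override them to
// return extended node types.
struct NodeFactory
{
  virtual ~NodeFactory () = default;
  virtual std::unique_ptr<Node> create_interface (std::string const& name) const
  {
    return std::unique_ptr<Node> (new Node (name));
  }
};

// Semantic action as a functor: the tree and factory it works on are data
// members, so they are chosen when the grammar is constructed.
class AddInterface
{
public:
  AddInterface (SyntaxTree& tree, NodeFactory const& factory)
    : tree_ (tree), factory_ (factory) {}

  // A parser would call this with the range of the matched identifier.
  template <typename Iterator>
  void operator() (Iterator first, Iterator last) const
  {
    tree_.nodes.push_back (
      factory_.create_interface (std::string (first, last)));
  }

private:
  SyntaxTree& tree_;
  NodeFactory const& factory_;
};
```

Because the action only talks to NodeFactory, a different syntax tree
implementation can be plugged in by passing a derived factory, without
touching the grammar.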
Syntax Tree
-----------
[IDEA]: Use interface repository model as a base for Syntax Tree hierarchy.
[IDEA]: Currently (in IDLC) symbol lookup is accomplished by AST navigation,
and is probably the biggest single bottleneck in performance. Perhaps
a separate symbol table would be preferable. Also, lookups could be
specialized, e.g., for declaration, for references, and perhaps a
third type for argument-related lookups.
[NOTE]: If we are to implement symbol tables then we need to think how we
are going to inherit (extend) these tables.
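A minimal sketch of a separate symbol table, assuming declarations are keyed
by fully scoped name. Declaration and SymbolTable are hypothetical names, and
the lookup deliberately ignores the parts of IDL name resolution that involve
inherited scopes.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a syntax tree node.
struct Declaration { std::string scoped_name; };

// Maps fully scoped names ("::M::I") to declarations so that lookup does not
// have to navigate the tree.
class SymbolTable
{
public:
  void declare (Declaration const& d) { table_[d.scoped_name] = &d; }

  // Resolves 'name' as seen from 'scope' (for example "::M::N"), trying the
  // innermost scope first and then each enclosing scope up to the global one.
  // 'scope' must be empty (global scope) or start with "::".
  Declaration const* lookup (std::string scope, std::string const& name) const
  {
    assert (scope.empty () || scope.compare (0, 2, "::") == 0);
    for (;;)
    {
      auto i = table_.find (scope + "::" + name);
      if (i != table_.end ()) return i->second;
      if (scope.empty ()) return nullptr;
      scope.erase (scope.rfind ("::"));   // drop the innermost scope
    }
  }

private:
  std::unordered_map<std::string, Declaration const*> table_;
};
```

Each probe is a hash lookup, so the cost grows with the nesting depth of the
scope rather than with the size of the tree.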
[NOTE]: Inheritance/supports graphs: these graphs need to be traversed at
several points in the back end. Currently they are rebuilt for each
use, using an n-squared algorithm. We could at least build them only
once for each interface/valuetype, perhaps even with a better
algorithm. It could be integrated into inheritance/supports error
checking at node creation time, which could also be streamlined.
@@ boris: Well, I think we should design our Syntax Tree so that
every interface/valuetype has a list (flat?) of interfaces it
inherits from/supports.
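The flat list could be built once per node from the already flattened lists
of its direct bases. The sketch below assumes that bases are fully defined
before the derived interface is created; Interface and flatten_bases are
hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical interface node carrying both direct and flattened bases.
struct Interface
{
  std::string name;
  std::vector<Interface const*> bases;       // direct bases, declaration order
  std::vector<Interface const*> all_bases;   // flat list, filled once
};

// Fills i.all_bases with every direct and indirect base exactly once.
// Assumes all_bases of each direct base has already been computed, which
// holds if this is called at node creation time.
inline void flatten_bases (Interface& i)
{
  std::unordered_set<Interface const*> seen;
  i.all_bases.clear ();
  for (Interface const* b : i.bases)
  {
    if (seen.insert (b).second) i.all_bases.push_back (b);
    for (Interface const* bb : b->all_bases)
      if (seen.insert (bb).second) i.all_bases.push_back (bb);
  }
}
```

With a diamond (D inherits B and C, both of which inherit A), A appears in
D's flat list only once.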
[IDEA]: We will probably want to use factories to instantiate Syntax Tree
Nodes (STN). This will allow concrete code generators to alter (i.e.
inherit from and extend) vanilla STNs (i.e. an alternative to BE nodes
in the current IDLC design).
Common Syntax Tree traversal Design Ideas/Decision
--------------------------------------------------
[IDEA] If we specify Syntax Tree traversal facility then we will be able
to specify (or even plug dynamically) Syntax Tree traversal agents
that may not only generate something but also annotate or modify
Syntax Tree. We are already using this technique for a number of
features (e.g. AMI, IDL3 extension, what else?) but all these agents
are hardwired inside TAO IDLC. If we have this facility then we will
be able to produce modular and highly extensible design. Notes:
- Some traversal agents can change Syntax Tree so that it will be
unusable by some later traversal agents. So maybe the more
generic approach would be to produce new Syntax Tree?
@@ jeff: Yes, say for example that we were using a common ST
representation for the IDL compiler and the IFR. We would not
want to send the extra AMI nodes to the IFR so in that case
simple modification of the ST might not be best.
[IDEA] Need a generic name for "Syntax Tree Traversal Agents". What about
"Syntax Tree Traverser"?
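The "produce a new Syntax Tree" variant could be sketched as agents that take
a tree by const reference and return a modified copy. Everything here is
hypothetical: Tree is reduced to a list of operation names, and AmiExpander
is only loosely modeled on AMI implied IDL.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Deliberately tiny stand-in for a syntax tree: a list of operation names.
struct Tree { std::vector<std::string> operations; };

// A traverser reads one tree and returns a new one, leaving its input intact.
struct Traverser
{
  virtual ~Traverser () = default;
  virtual Tree traverse (Tree const& in) const = 0;
};

// Example agent: adds an implied "sendc_" operation for each operation.
struct AmiExpander : Traverser
{
  Tree traverse (Tree const& in) const override
  {
    Tree out (in);
    for (std::string const& op : in.operations)
      out.operations.push_back ("sendc_" + op);
    return out;
  }
};

// Runs agents in order, feeding each one the previous agent's result.
inline Tree run_all (Tree tree, std::vector<Traverser const*> const& agents)
{
  for (Traverser const* a : agents) tree = a->traverse (tree);
  return tree;
}
```

The caller keeps the original tree, so a consumer such as the IfR can be
given the tree without the extra AMI nodes while the code generators use the
expanded one. Copying whole trees has a cost that this sketch ignores.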
Code Generation Stream
----------------------
[IDEA] Use language indentation engines for code generation (like c-mode
in emacs). The idea is that code like this

  out << "long foo (long arg0, " << endl
      << " long arg1) " << endl
      << "{ " << endl
      << " return arg0 + arg1; " << endl
      << "} " << endl;

will result in generated code like this:

  namespace N
  {
    ...
    long foo (long arg0,
              long arg1)
    {
      return arg0 + arg1;
    }
    ...
  }

Note that no special actions were taken to ensure proper indentation.
Instead the stream's indentation engine is responsible for that.
The same mechanism can be used for different languages (e.g. XML).
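A minimal sketch of such a stream, assuming a filtering stream buffer that
indents each line by brace nesting depth. IndentBuf is a hypothetical name.
This is much simpler than a real indentation engine: it does not align
continuation lines and does not recognize braces inside comments or string
literals.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

// Filtering stream buffer that re-indents complete lines by brace depth.
class IndentBuf : public std::streambuf
{
public:
  explicit IndentBuf (std::streambuf* dest, std::size_t width = 2)
    : dest_ (dest), width_ (width), depth_ (0) {}

protected:
  // Called for every character because no put area is installed.
  int_type overflow (int_type c) override
  {
    if (traits_type::eq_int_type (c, traits_type::eof ()))
      return traits_type::not_eof (c);
    char ch = traits_type::to_char_type (c);
    if (ch == '\n') flush_line ();
    else line_ += ch;
    return c;
  }

private:
  // Writes the buffered line with its own leading and trailing blanks
  // replaced by indentation derived from the current depth.
  void flush_line ()
  {
    std::string::size_type b = line_.find_first_not_of (" \t");
    if (b != std::string::npos)
    {
      std::string::size_type e = line_.find_last_not_of (" \t");
      std::string text = line_.substr (b, e - b + 1);
      if (text[0] == '}' && depth_ > 0) --depth_;
      std::string out = std::string (depth_ * width_, ' ') + text;
      dest_->sputn (out.data (), static_cast<std::streamsize> (out.size ()));
      if (text[text.size () - 1] == '{') ++depth_;
    }
    dest_->sputc ('\n');
    line_.clear ();
  }

  std::streambuf* dest_;
  std::size_t width_;
  std::size_t depth_;
  std::string line_;
};
```

A generator would wrap the real output buffer, for example
IndentBuf buf (file.rdbuf ()); std::ostream out (&buf);, and then write
unindented text to out. A line is emitted only when its newline arrives.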
Code Generators
---------------
[IDEA] It makes sense to establish a general concept of code generators.
"Executor Mapping Generator", "Executor Implementation Generator"
and "Skeleton Thunk Generator" would be concrete code generators.
[IDEA] Expression evaluation: currently the result (not the expression)
is generated, which may not always be necessary.
@@ boris: I would say may not always be correct
However, for purposes of type coercion and other checking (such as
for positive integer values in string, array and sequence bounds)
evaluation must be done internally.
@@ boris: note that evaluation is needed only to verify that things
are correct. You don't have to (shouldn't?) substitute the original
(const) expression with what's been evaluated.
@@ jeff: it may be necessary in some cases to append 'f' or 'U' to
a generated number to avoid a C++ compiler warning.
@@ boris: shouldn't this 'f' and 'U' be in IDL as well?
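The "evaluate to check, but generate the expression" position could be
sketched as below. ConstExpr and generate_bound are hypothetical names, and
the sketch assumes the expression text can be emitted as is, which ignores
any name mapping the generated C++ would need.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical representation of a constant expression used as a bound:
// the source text is kept next to the value computed by the front end.
struct ConstExpr
{
  std::string text;   // as written in IDL, e.g. "MAX_LEN * 2"
  long long value;    // result of internal evaluation
};

// Checks the evaluated value but generates the original expression text.
inline std::string generate_bound (ConstExpr const& e)
{
  if (e.value <= 0)
    throw std::invalid_argument (
      "bound must be a positive integer: " + e.text);
  return e.text;
}
```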
[IDEA] I wonder if it's a good idea to use a separate pass over syntax tree
for semantic checking (e.g. type coercion, positive values for
sequence bounds).
@@ jeff: This may hurt performance a little - more lookups - but it
will improve error reporting.
@@ boris: As we discussed earlier this pass could be used to do
'semantic expansions' (e.g. calculate a flat list of interface's
children, etc). Also I don't think we should worry about speed
very much here (of course I don't say we have to be stupid ;-)).
In fact if we are trading better design vs faster compilation
at this stage we should always go for better design.
Executor Mapping Generator
--------------------------
Executor Implementation Generator
---------------------------------
[IDEA]: Translate CIDL composition to C++ namespace.
Skeleton Thunk Generator
------------------------
Compiler driver
---------------
Vault
-----
Some thoughts from Jeff that are not directly related to CIDLC but rather
concern current IDLC design defects:
* AMI/AMH implied IDL: more can be done in the BE preprocessing pass,
hopefully eliminating a big chunk of the huge volume of AMI/AMH visitor
code. The implied IDL generated for CCM types, for example, leaves almost
nothing extra for the visitors to do.
* Fwd decl redefinition: forward declaration nodes all initially contain a
heap-allocated dummy full-definition member, later replaced by a copy
of the full definition. This needs to be streamlined.
* Memory leaks: inconsistent copying/passing policies make it almost
impossible to eliminate the huge number of leaks. The front end will be
more and more reused, and it may be desirable to make it executable as a
function call, in which case it will be important to eliminate the leaks.
Perhaps copying of AST nodes can be eliminated with reference counting or
just with careful management, similarly for string identifiers and literals.
Destroy() methods have been put in all the node classes, and are called
recursively from the AST root at destruction time, but they are far from
doing a complete job.
* Visitor instantiation: the huge visitor factory has already been much
reduced, and the huge enum of context state values is being reduced.
However there will still be an abundance of switch statements at nearly
every instance of visitor creation at scope nesting. We could make better
use of polymorphism to get rid of them.
* Node narrowing: instead of the impenetrable macros we use now, we
could either generate valuetype-like downcast methods for the (C)IDL
types, or we could just use dynamic_cast.
* Error reporting: error messages could be more informative and error
recovery could be a lot better, as they are in most other IDL compilers. If a
recursive descent parser is used (such as Spirit), there is a simple
generic algorithm for error recovery.
* FE/BE node classes: if BE node classes are implemented at all, there should
be a complete separation of concerns - BE node classes should contain only
info related to code generation, and FE node classes should contain only
info related to the AST representation. As the front end becomes more
modular and reusable, this will become more and more necessary.
@@ boris: It doesn't seem we will need two separate and parallel hierarchies.
* Undefined fwd decls: now that we have dropped support for platforms without
namespaces, the code generated for fwd declarations not defined in the same
translation unit can be much improved, most likely by the elimination of
generated flat-name global methods, and perhaps other improvements as well.
* Strategized code generation: many places now have either lots of
duplication, or an explosion of branching in a single visitor. Adding code
generation for use cases incrementally may give us an opportunity to
refactor and strategize it better.
* Node generator: this class does nothing more than call 'new' and pass
unchanged the arguments it gets to the appropriate constructor - it can be
eliminated.
* Virtual methods: there are many member functions in the IDL compiler that
are needlessly virtual.
* Misc. leveraging: redesign of mechanisms listed above can have an effect
on other mechanisms, such as the handling of pragma prefix, typeprefix, and
reopened modules.
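For the node narrowing item above, the dynamic_cast alternative could be as
small as the sketch below. AstNode, AstInterface, AstStruct and narrow are
hypothetical names, and the approach assumes RTTI is enabled and node classes
are polymorphic.

```cpp
#include <cassert>

// Hypothetical node hierarchy; a virtual destructor makes it polymorphic.
struct AstNode { virtual ~AstNode () = default; };
struct AstInterface : AstNode { int n_operations = 0; };
struct AstStruct : AstNode {};

// Returns the node as T, or a null pointer if it is some other node type.
template <typename T>
T* narrow (AstNode* node)
{
  return dynamic_cast<T*> (node);
}
```

A caller would write if (AstInterface* i = narrow<AstInterface> (node)) and
use i inside the branch.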