Diffstat (limited to 'TAO/CIAO/CIDLC/DesignNotes')
-rw-r--r-- | TAO/CIAO/CIDLC/DesignNotes | 457 |
1 files changed, 457 insertions, 0 deletions
diff --git a/TAO/CIAO/CIDLC/DesignNotes b/TAO/CIAO/CIDLC/DesignNotes
new file mode 100644
index 00000000000..b01fd452762
--- /dev/null
+++ b/TAO/CIAO/CIDLC/DesignNotes
@@ -0,0 +1,457 @@
+
+Intention of this file is to capture and document CIDL compiler design
+ideas/decisions.
+
+Conceptual parts of CIDL compiler design
+----------------------------------------
+
+Option Parser                       Consists of option parser and option
+                                    database.
+
+C Preprocessor Interfacing          Represents mechanism of preprocessing
+                                    cidl files.
+
+IDL Compiler Interfacing            Represents mechanism of invoking IDL
+                                    compiler.
+
+Scanner                             Scanner for preprocessed cidl file.
+
+Parser                              CIDL grammar parser. Consists of grammar
+                                    and semantic rules.
+
+Syntax Tree                         Intermediate representation of cidl file.
+                                    Consists of syntax tree nodes themselves
+                                    and perhaps symbol tables.
+
+Semantic Analyzer                   Traverses Syntax Tree and performs
+                                    semantic analysis as well as some
+                                    semantic expansions.
+
+
+Code Generation Stream              Stream to output generated code to. Used
+                                    by concrete Code Generators.
+
+Code Generators
+{
+
+  Executor Mapping Generator        Generator for local executor mapping.
+
+  Executor Implementation Generator Generator for partial implementation
+                                    of local executor mapping.
+
+  Skeleton Thunk Generator          Generator for skeleton thunks i.e.
+                                    code that implements skeleton and
+                                    thunks user-defined functions to
+                                    executor mapping.
+}
+
+Compiler driver                     Establishes order of execution of
+                                    different components as part of
+                                    compilation process.
+
+
+How everything works together
+-----------------------------
+
+(1) Compiler Driver executes Option Parser to populate Option Database
+
+(2) Compiler Driver executes C Preprocessor on a supplied cidl file
+
+(3) Compiler Driver executes Parser which uses Scanner to scan preprocessed
+    cidl file and generates Syntax Tree by means of semantic rules.
+
+(4) At this point we have Syntax Tree corresponding to the original cidl
+    file.
+    Compiler Driver executes Executor Mapping Generator,
+    Executor Implementation Generator and Skeleton Thunk Generator on
+    Syntax Tree.
+
+
+
+General Design Ideas/Decision
+-----------------------------
+
+[IDEA]: There is an effort to use autoconf/automake in ACE/TAO. Maybe it's
+        a good idea to start using it with CIDLC? There is one side advantage
+        of this approach: if we decide to embed GCC CPP then we will have to
+        use configure (or otherwise ACE-ify the code which doesn't sound like
+        a right solution).
+
+[IDEA]: CIDLC is a prototype for a new IDLC, PSDLC and IfR model. Here are
+        basic concepts:
+
+        - use common IDL grammar, semantic rules and syntax tree nodes
+          for IDLC, CIDLC, PSDLC and IfR. Possibly have several libraries
+          for example ast_idl-2.so, ast_idl-3.so, scanner_idl-2.so,
+          scanner_idl-3.so, parser_idl-2.so, parser_idl-3.so. Dependency
+          graph would look like this:
+
+
+         ast_idl-2.so                    scanner_idl-2.so
+              |                                 |
+              |---------------------------------|
+              |                |                |
+              |                |                |
+              |         parser_idl-2.so         |
+              |                |                |
+         ast_idl-3.so          |         scanner_idl-3.so
+              |                |                |
+              |                |                |
+              |                |                |
+               ---------parser_idl-3.so---------
+
+          Same idea applies for CIDL and PSDL.
+
+
+        - use the same internal representation (syntax tree) in all
+          compilers and IfR. This way at some stage if we need
+          to make one of the compilers IfR-integrated (import keyword?)
+          then it will be a much easier task than it is now. This internal
+          representation may also be usable in typecodes
+
+          @@ boris: not clear to me.
+
+          @@ jeff: A typecode is like a piece of the Syntax Tree with these
+          exceptions -
+
+          (1) There is no typecode for an IDL module.
+
+          (2) Typecodes for interfaces and valuetypes lack some of the
+              information in the corresponding Syntax Tree nodes.
+
+          With these exceptions in mind, a typecode can be composed and
+          traversed in the same manner as a Syntax Tree, perhaps with
+          different classes than used to compose the ST itself.
+
+          @@ boris: Ok, let me see if I got it right.
+          So when typecode
+          is kept in parsed state (as opposed to binary) (btw, when
+          does it happen?) it makes sense to apply the same techniques
+          (if in fact not the same ST nodes and traversal mechs) as
+          for XIDL compilation.
+
+[IDEA]: We should be consistent with the way external compilers that we call
+        report errors. For now those are CPP and IDLC.
+
+Option Parser
+-------------
+
+[IDEA]: Use Spirit parser framework to generate option parser.
+
+[IDEA]: Option Database is probably a singleton.
+
+        @@ jeff: This is a good idea, especially when passing some of the
+        options to a preprocessor or spawned IDL compiler. But I think we
+        will still need 'state' classes for the front and back ends (to
+        hold values set by command line options and default values) so
+        we can keep them decoupled.
+
+
+        @@ boris: I understand what you mean. Though I think we will be
+        able to do with one 'runtime database'. Each 'compiler module'
+        will be able to populate its 'namespace' with (1) default
+        values, (2) module-specific options and (3) arbitrary
+        runtime information. I will present prototype design shortly.
+
+
+[IDEA]: It seems we will have to execute at least two external programs
+        as part of CIDLC execution: CPP and IDLC. Why wouldn't we follow
+        GCC specs model (gcc -dumpspecs)? Here are candidates to be put into
+        specs:
+
+        - default CPP name and options
+        - default IDLC name and options
+        - default file extensions and formats for different mappings
+        - other ideas?
+
+[IDEA]: Provide short and long option names (e.g. -o and --output-dir)
+        for every option (maybe except -I, -D, etc).
+
+
+C Preprocessor Interfacing
+--------------------------
+
+[IDEA]: Embed/require GCC CPP
+
+[IDEA]: We need a new model of handling includes in CIDLC (as well as IDLC).
+        Right now I'm mentally testing a new model (thanks to Carlos for the
+        comments). Soon I will put the description here.
+
+[IDEA]: We cannot move cidl file being preprocessed to for example /tmp
+        as it's currently the case with IDLC.
+
+[IDEA]: Can we use pipes (ACE Pipes) portably to avoid temporary files?
+        (Kitty, you had some ideas about that?)
+
+
+
+IDL Compiler Interfacing
+------------------------
+
+[IDEA]: Same as for CPP: Can we use pipes?
+
+        @@ jeff: check with Nanbor on this. I think there may be CCM/CIAO
+        use cases where we need the intermediate IDL file.
+
+[IDEA]: Will need a mechanism to pass options to IDLC from CIDLC command
+        line (would be nice to have this ability for CPP as well).
+        Something like -x in xterm? Better ideas?
+
+
+
+Scanner
+-------
+
+[IDEA]: Use Spirit framework to construct scanner. The resulting sequence
+        can be a sequence of objects? BTW, Spirit parser expects a "forward
+        iterator"-based scanner. So this basically means that we may have to
+        keep the whole sequence in memory. BTW, this is another good reason
+        to have scanner: if we manage to make scanner a predictive parser
+        (i.e. no backtracking) then we don't have to keep the whole
+        preprocessed cidl file in memory.
+
+
+
+Parser
+------
+
+[IDEA]: Use Spirit framework to construct parser.
+
+[IDEA]: Define IDL grammar as a number of grammar capsules. This way it's
+        much easier to reuse/inherit even dynamically. Need to elaborate
+        this idea.
+
+[IDEA]: Use functors as semantic actions. This way we can specify (via
+        functor's data member) on which Syntax Tree they are working.
+        Bad side: semantic rules are defined during grammar construction.
+        However we can use a modification of the factory method pattern.
+        Better ideas?
+
+        @@ jeff: I think ST node creation with a factory
+        is a good idea - another ST implementation could be plugged in,
+        as long as it uses a factory with the same method names.
+
+        @@ boris: Right. In fact it's our 'improved' way of handling 'BE'
+        usecases.
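
Editorial sketch (not part of the committed file): the functor-plus-factory idea from the Parser notes above could look roughly like the following. Every name here (Node, Interface, NodeFactory, AnnotatingFactory, SyntaxTree, InterfaceAction) is hypothetical and invented for illustration. Spirit itself is deliberately left out, so the functor is called by hand where a grammar rule would normally invoke it.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal Syntax Tree node; not a real CIDLC class.
struct Node
{
  explicit Node (std::string n) : name (std::move (n)) {}
  virtual ~Node () = default;
  virtual std::string kind () const { return "node"; }
  std::string name;
};

struct Interface : Node
{
  explicit Interface (std::string n) : Node (std::move (n)) {}
  std::string kind () const override { return "interface"; }
};

// Factory with fixed method names. Another ST implementation can be
// plugged in by overriding the creation methods.
struct NodeFactory
{
  virtual ~NodeFactory () = default;
  virtual std::unique_ptr<Node>
  create_interface (std::string const& name) const
  {
    return std::make_unique<Interface> (name);
  }
};

// A generator-specific node that extends the vanilla one (assumed to
// play the role that 'BE' nodes play in the current IDLC).
struct AnnotatedInterface : Interface
{
  explicit AnnotatedInterface (std::string n) : Interface (std::move (n)) {}
  std::string kind () const override { return "annotated interface"; }
};

struct AnnotatingFactory : NodeFactory
{
  std::unique_ptr<Node>
  create_interface (std::string const& name) const override
  {
    return std::make_unique<AnnotatedInterface> (name);
  }
};

// Hypothetical Syntax Tree: just a flat list of nodes.
struct SyntaxTree
{
  std::vector<std::unique_ptr<Node>> nodes;
};

// Semantic action as a functor. The tree it works on and the factory it
// uses are data members, so both are chosen when the functor is built,
// i.e. at grammar construction time.
class InterfaceAction
{
public:
  InterfaceAction (SyntaxTree& tree, NodeFactory const& factory)
    : tree_ (tree), factory_ (factory) {}

  void operator() (std::string const& name) const
  {
    tree_.nodes.push_back (factory_.create_interface (name));
  }

private:
  SyntaxTree& tree_;
  NodeFactory const& factory_;
};
```

Constructing `InterfaceAction` with an `AnnotatingFactory` instead of the plain `NodeFactory` changes which node classes end up in the tree without touching the action itself, which is the decoupling the comments above are after.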
+
+
+
+Syntax Tree
+-----------
+
+[IDEA]: Use interface repository model as a base for Syntax Tree hierarchy.
+
+[IDEA]: Currently (in IDLC) symbol lookup is accomplished by AST navigation,
+        and is probably the biggest single bottleneck in performance. Perhaps
+        a separate symbol table would be preferable. Also, lookups could be
+        specialized, e.g., for declaration, for references, and perhaps a
+        third type for argument-related lookups.
+
+[NOTE]: If we are to implement symbol tables then we need to think how we
+        are going to inherit (extend) these tables.
+
+[NOTE]: Inheritance/supports graphs: these graphs need to be traversed at
+        several points in the back end. Currently they are rebuilt for each
+        use, using an n-squared algorithm. We could at least build them only
+        once for each interface/valuetype, perhaps even with a better
+        algorithm. It could be integrated into inheritance/supports error
+        checking at node creation time, which could also be streamlined.
+
+        @@ boris: Well, I think we should design our Syntax Tree so that
+        every interface/valuetype has a list (flat?) of interfaces it
+        inherits from/supports.
+
+[IDEA]: We will probably want to use factories to instantiate Syntax Tree
+        Nodes (STN). This will allow concrete code generators to alter (i.e.
+        inherit from and extend) vanilla STNs (i.e. alternative to BE nodes
+        in current IDLC design).
+
+
+Common Syntax Tree traversal Design Ideas/Decision
+--------------------------------------------------
+
+[IDEA] If we specify Syntax Tree traversal facility then we will be able
+       to specify (or even plug dynamically) Syntax Tree traversal agents
+       that may not only generate something but also annotate or modify
+       Syntax Tree. We are already using this technique for a number of
+       features (e.g. AMI, IDL3 extension, what else?) but all these agents
+       are hardwired inside TAO IDLC. If we have this facility then we will
+       be able to produce a modular and highly extensible design.
+       Notes:
+
+       - Some traversal agents can change Syntax Tree so that it will be
+         unusable by some later traversal agents. So maybe the more
+         generic approach would be to produce new Syntax Tree?
+
+         @@ jeff: Yes, say for example that we were using a common ST
+         representation for the IDL compiler and the IFR. We would not
+         want to send the extra AMI nodes to the IFR so in that case
+         simple modification of the ST might not be best.
+
+[IDEA] Need a generic name for "Syntax Tree Traversal Agents". What about
+       "Syntax Tree Traverser"?
+
+
+Code Generation Stream
+----------------------
+
+[IDEA] Use language indentation engines for code generation (like a c-mode
+       in emacs). The idea is that code like this
+
+       out << "long foo (long arg0, " << endl
+           << " long arg1) " << endl
+           << "{ " << endl
+           << " return arg0 + arg1; " << endl
+           << "} " << endl;
+
+       will result in generated code like this:
+
+       namespace N
+       {
+         ...
+
+         long foo (long arg0,
+                   long arg1)
+         {
+           return arg0 + arg1;
+         }
+
+         ...
+       }
+
+       Note that no special actions were taken to ensure proper indentation.
+       Instead the stream's indentation engine is responsible for that.
+       The same mech can be used for different languages (e.g. XML).
+
+
+Code Generators
+---------------
+
+[IDEA] It makes sense to establish a general concept of code generators.
+       "Executor Mapping Generator", "Executor Implementation Generator"
+       and "Skeleton Thunk Generator" would be concrete code generators.
+
+[IDEA] Expression evaluation: currently the result (not the expression)
+       is generated, which may not always be necessary.
+
+       @@ boris: I would say may not always be correct
+
+
+       However, for purposes of type coercion and other checking (such as
+       for positive integer values in string, array and sequence bounds)
+       evaluation must be done internally.
+
+       @@ boris: note that evaluation is needed only to verify that things
+       are correct. You don't have to (shouldn't?)
+       substitute the original (const) expression with what's been
+       evaluated.
+
+
+       @@ jeff: it may be necessary in some cases to append 'f' or 'U' to
+       a generated number to avoid a C++ compiler warning.
+
+       @@ boris: shouldn't this 'f' and 'U' be in IDL as well?
+
+[IDEA] I wonder if it's a good idea to use a separate pass over syntax tree
+       for semantic checking (e.g. type coercion, positive values for
+       sequence bounds).
+
+       @@ jeff: This may hurt performance a little - more lookups - but it
+       will improve error reporting.
+
+       @@ boris: As we discussed earlier this pass could be used to do
+       'semantic expansions' (e.g. calculate a flat list of interface's
+       children, etc). Also I don't think we should worry about speed
+       very much here (of course I don't say we have to be stupid ;-)
+       In fact if we are trading better design vs faster compilation
+       at this stage we should always go for better design.
+
+
+Executor Mapping Generator
+--------------------------
+
+
+
+Executor Implementation Generator
+---------------------------------
+
+[IDEA]: Translate CIDL composition to C++ namespace.
+
+
+
+Skeleton Thunk Generator
+------------------------
+
+
+
+
+Compiler driver
+---------------
+
+
+
+Vault
+-----
+
+Some thoughts from Jeff that are not directly related to CIDLC and are
+rather current IDLC design defects:
+
+* AMI/AMH implied IDL: more can be done in the BE preprocessing pass,
+  hopefully eliminating a big chunk of the huge volume of AMI/AMH visitor
+  code. The implied IDL generated for CCM types, for example, leaves almost
+  nothing extra for the visitors to do.
+
+* Fwd decl redefinition: forward declaration nodes all initially contain a
+  heap-allocated dummy full-definition member, later replaced by a copy
+  of the full definition. This needs to be streamlined.
+
+* Memory leaks: inconsistent copying/passing policies make it almost
+  impossible to eliminate the huge number of leaks.
+  The front end will be
+  more and more reused, and it may be desirable to make it executable as a
+  function call, in which case it will be important to eliminate the leaks.
+  Perhaps copying of AST nodes can be eliminated with reference counting or
+  just with careful management, similarly for string identifiers and literals.
+  Destroy() methods have been put in all the node classes, and are called
+  recursively from the AST root at destruction time, but they are far from
+  doing a complete job.
+
+* Visitor instantiation: the huge visitor factory has already been much
+  reduced, and the huge enum of context state values is being reduced.
+  However there will still be an abundance of switch statements at nearly
+  every instance of visitor creation at scope nesting. We could make better
+  use of polymorphism to get rid of them.
+
+* Node narrowing: instead of the impenetrable macros we use now, we
+  could either generate valuetype-like downcast methods for the (C)IDL
+  types, or we could just use dynamic_cast.
+
+* Error reporting: error messages could be more informative and error
+  recovery could be a lot better, as they are in most other IDL compilers.
+  If a recursive descent parser is used (such as Spirit), there is a simple
+  generic algorithm for error recovery.
+
+
+* FE/BE node classes: if BE node classes are implemented at all, there should
+  be a complete separation of concerns - BE node classes should contain only
+  info related to code generation, and FE node classes should contain only
+  info related to the AST representation. As the front end becomes more
+  modular and reusable, this will become more and more necessary.
+
+  @@ boris: It doesn't seem we will need two separate and parallel hierarchies.
+
+* Undefined fwd decls: now that we have dropped support for platforms without
+  namespaces, the code generated for fwd declarations not defined in the same
+  translation unit can be much improved, most likely by the elimination of
+  generated flat-name global methods, and perhaps other improvements as well.
+
+* Strategized code generation: many places now have either lots of
+  duplication, or an explosion of branching in a single visitor. Adding code
+  generation for use cases incrementally may give us an opportunity to
+  refactor and strategize it better.
+
+* Node generator: this class does nothing more than call 'new' and pass
+  unchanged the arguments it gets to the appropriate constructor - it can be
+  eliminated.
+
+* Virtual methods: there are many member functions in the IDL compiler that
+  are needlessly virtual.
+
+* Misc. leveraging: redesign of mechanisms listed above can have an effect
+  on other mechanisms, such as the handling of pragma prefix, typeprefix, and
+  reopened modules.
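
Editorial sketch (not part of the committed file): the "Node narrowing" item in the Vault list above suggests plain dynamic_cast in place of the current macros. A minimal version, assuming a polymorphic node base class, might look like this. The names Node, Interface, ValueType and narrow are hypothetical and are not the IDLC's actual classes.

```cpp
#include <string>

// Hypothetical node hierarchy. The virtual destructor makes Node
// polymorphic, which dynamic_cast requires.
struct Node
{
  virtual ~Node () = default;
};

struct Interface : Node
{
  std::string name;
};

struct ValueType : Node
{
  std::string name;
};

// Generic narrowing helper built on dynamic_cast. Returns a null
// pointer when the node is not of the requested type.
template <typename T>
T const* narrow (Node const& n)
{
  return dynamic_cast<T const*> (&n);
}
```

A caller would write `narrow<Interface> (node)` and check the result for null before use. Whether this is fast enough for the compiler's hot paths is an open question that the notes do not address.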