Note: this file is somewhat outdated

Intention of this file is to capture and document CIDL complier design
ideas/decisions.

Conceptual parts of CIDL compiler design
----------------------------------------

Option Parser                       Consists of option parser and option
                                    database.

C Preprocessor Interfacing          Represents mechanism of preprocessing
                                    cidl files.

IDL Compiler Interfacing            Represents mechanism of invoking IDL
                                    compiler.

Scanner                             Scanner for preprocessed cidl file.

Parser                              CIDL grammar parser. Consists of grammar
                                    and semantic rules.

Syntax Tree                         Intermediate representation of cidl file.
                                    Consists of syntax tree nodes itself and
                                    perhaps symbol tables.

Semantic Analyzer                   Traverses Syntax Tree and performs
                                    semantic analysis as well as some
                                    semantic expansions.


Code Generation Stream              Stream to output generated code to. Used
                                    by concrete Code Generators

Code Generators
{

  Executor Mapping Generator          Generator for local executor mapping.

  Executor Implementation Generator   Generator for partial implementation
                                      of local executor mapping.

  Skeleton Thunk Generator            Generator for skeleton thunks i.e.
                                      code that implements skeleton and
                                      thunks user-defined functions to
                                      executor mapping.
}

Compiler driver                     Establishes order of execution of
                                    different components as part of
                                    compilation process.


How everything works together
-----------------------------

(1) Compiler Driver executes Option Parser to populate Option Database

(2) Compiler Driver executes C Preprocessor on a supplied cidl file

(3) Compiler Driver executes Parser which uses Scanner to scan preprocessed
    cidl file and generates Syntax Tree by means of semantic rules.

(4) At this point we have Syntax Tree corresponding to the original cidl
    file. Compiler Driver executes Executor Mapping Generator,
    Executor Implementation Generator and Skeleton Thunk Generator on
    Syntax Tree.


General Design Ideas/Decision
-------------

[IDEA]: There is an effort to use autoconf/automake in ACE/TAO. Maybe it's
        a good idea to start using it with CIDLC? There is one side advantage
        of this approach: if we decide to embed GCC CPP then we will have to
        use configure (or otherwise ACE-ify the code which doesn't sound like
        a right solution).

[IDEA]: CIDLC is a prototype for a new IDLC, PSDLC and IfR model. Here are
        basic concepts:

          - use common IDL grammar, semantic rules and syntax tree nodes
            for IDLC, CIDLC, PSDLC and IfR. Possibly have several libraries
            for example ast_idl-2.so, ast_idl-3.so, scaner_idl-2.so
            scaner_idl-3.so, parser_idl-2.so, parser_idl-3.so. Dependency
            graph would look like this:


                 ast_idl-2.so                     scanner_idl-2.so
                      |                                 |
                      |---------------------------------|
                      |               |                 |
                      |               |                 |
                      |         parser_idl-2.so         |
                      |               |                 |
                 ast_idl-3.so         |           scanner_idl-3.so
                      |               |                 |
                      |               |                 |
                      |               |                 |
                       ---------parser_idl-3.so---------

            Same idea applies for CIDL and PSDL.


          - use the same internal representation (syntax tree) in all
            compilers and IfR. This way at some stage if we will need
            to make one of the compilers IfR-integrated (import keyword?)
            then it will be a much easier task than it's now. This internal
            representation may also be usable in typecodes

            @@ boris: not clear to me.

            @@ jeff: A typecode is like a piece of the Syntax Tree with these
               exceptions -

                (1) There is no typecode for an IDL module.

                (2) Typecodes for interfaces and valuetypes lack some of the
                    information in the corresponding Syntax Tree nodes.

               With these exceptions in mind, a typecode can be composed and
               traversed in the same manner as a Syntax Tree, perhaps with
               different classes than used to compose the ST itself.

            @@ boris: Ok, let me see if I got it right. So when typecode
               is kept in parsed state (as opposite to binary) (btw, when
               does it happen?) it makes sense to apply the same techniques
               (if in fact not the same ST nodes and traversal mechs) as
               for XIDL compilation.

[IDEA]: We should be consistent with the way external compilers that we call
        report errors. For now those are CPP and IDLC.

Option Parser
-------------

[IDEA]: Use Spirit parser framework to generate option parser.

[IDEA]: Option Database is probably a singleton.

        @@ jeff: This is a good idea, especially when passing some of the
           options to a preprocessor or spawned IDL compier. But I think we
           will still need 'state' classes for the front and back ends (to
           hold values set by command line options and default values) so
           we can keep them decoupled).


        @@ boris: I understand what you mean. Though I think we will be
           able to do with one 'runtime database'. Each 'compiler module'
           will be able to populate its 'namespace' with (1) default
           values, (2) with module-specific options and (3) arbitrary
           runtime information. I will present prototopy design shortly.


[IDEA]: It seems we will have to execute at least two external programs
        as part of CIDLC execution: CPP and IDLC. Why wouldn't we follow
        GCC specs model (gcc -dumpspecs). Here are candidates to be put into
        specs:

          - default CPP name and options
          - default  IDLC name and options
          - default file extensions and formats for different mappings
          - other ideas?

[IDEA]: Provide short and long option names (e.g. -o and --output-dir)
        for every option (maybe except -I, -D, etc).


C Preprocessor Interfacing
--------------------------

[IDEA]: Embed/require GCC CPP

[IDEA]: We need a new model of handling includes in CIDLC (as well as IDLC).
        Right now I'm mentally testing a new model (thanks to Carlos for the
        comments). Soon I will put the description here.

[IDEA]: We cannot move cidl file being preprocessed to for example /tmp
        as it's currently the case with IDLC.

[IDEA]: Can we use pipes (ACE Pipes) portably to avoid temporary files?
        (Kitty, you had some ideas about that?)


IDL Compiler Interfacing
------------------------

[IDEA]: Same as for CPP: Can we use pipes?

        @@ jeff: check with Nanbor on this. I think there may be CCM/CIAO
           use cases where we need the intermediate IDL file.

[IDEA]: Will need a mechanism to pass options to IDLC from CIDLC command
        line (would be nice to have this ability for CPP as well).
        Something like -x in xterm? Better ideas?


Scanner
------

[IDEA]: Use Spirit framework to construct scanner. The resulting sequence
        can be sequence of objects? BTW, Spirit parser expects a "forward
        iterator"-based scanner. So this basically mean that we may have to
        keep the whole sequence in memory. BTW, this is another good reason
        to have scanner: if we manage to make scanner a predictable parser
        (i.e. no backtracking) then we don't have to keep the whole
        preprocessed cidl file in memory.


Parser
------

[IDEA]: Use Spirit framework to construct parser.

[IDEA]: Define IDL grammar as a number of grammar capsules. This way it's
        much easier to reuse/inherit even dynamically. Need to elaborate
        this idea.

[IDEA]: Use functors as semantic actions. This way we can specify (via
        functor's data member) on which Syntax Tree they are working.
        Bad side: semantic rules are defined during grammar construction.
        However we can use a modification of the factory method pattern.
        Better ideas?

        @@ jeff: I think ST node creation with a factory
           is a good idea - another ST implementation could be plugged in,
           as long as it uses a factory with the same method names.

        @@ boris: Right. In fact it's our 'improved' way of handling 'BE'
           usecases.


Syntax Tree
-----------

[IDEA]: Use interface repository model as a base for Syntax Tree hierarchy.

[IDEA]: Currently (in IDLC) symbol lookup is accomplished by AST navigation,
        and is probably the biggest single bottleneck in performance. Perhaps
        a separate symbol table would be preferable. Also, lookups could be
        specialized, e.g., for declaration, for references, and perhaps a
        third type for argument-related lookups.

[NOTE]: If we are to implement symbol tables then we need to think how we
        are going to inherit (extend) this tables.

[NOTE]: Inheritance/supports graphs: these graphs need to be traversed at
        several points in the back end. Currently they are rebuilt for each
        use, using an n-squared algorithm. We could at least build them only
        once for each interface/valuetype, perhaps even with a better
        algorithm. It could be integrated into inheritance/supports error
        checking at node creation time, which also be streamlined.

        @@ boris: Well, I think we should design our Syntax Tree so that
           every interface/valuetype has a list (flat?) of interfaces it
           inherits from/supports.

[IDEA]: We will probably want to use factories to instantiate Syntax Tree
        Nodes (STN). This will allow a concrete code generators to alter (i.e.
        inherit off and extend) vanilla STNs (i.e. alternative to BE nodes
        in current IDLC design).


Common Syntax Tree traversal Design Ideas/Decision
--------------------------------------------------

[IDEA] If we specify Syntax Tree traversal facility then we will be able
       to specify (or even plug dynamically) Syntax Tree traversal agents
       that may not only generate something but also annotate or modify
       Syntax Tree. We are already using this technique for a number of
       features (e.g. AMI, IDL3 extension, what else?) but all these agents
       are hardwired inside TAO IDLC. If we have this facility then we will
       be able to produce modular and highly extensible design. Notes:

       - Some traversal agents can change Syntax Tree so that it will be
         unusable by some later traversal agents. So maybe the more
         generic approach would be to produce new Syntax Tree?

         @@ jeff: Yes, say for example that we were using a common ST
            representation for the IDL compiler and the IFR. We would not
            want to send the extra AMI nodes to the IFR so in that case
            simple modification of the ST might not be best.

[IDEA] Need a generic name for "Syntax Tree Traversal Agents". What about
       "Syntax Tree Traverser"?


Code Generation Stream
----------------------

[IDEA] Use language indentation engines for code generation (like a c-mode
       in emacs). The idea is that code like this

       out << "long foo (long arg0,  "  << endl
           << "          long arg1)  "  << endl
           << "{                     "  << endl
           << "  return arg0 + arg1; "  << endl
           << "}                     "  << endl;

	will result in a generated code like this:

       namespace N
       {
         ...

         long foo (long arg0,
                   long arg1)
         {
           return arg0 + arg1;
         }

         ...
       }

       Note that no special actions were taken to ensure proper indentation.
       Instead the stream's indentation engine is responsible for that.
       The same mech can be used for different languages (e.g. XML).


Code Generators
---------------

[IDEA] It makes sense to establish a general concept of code generators.
       "Executor Mapping Generator", "Executor Implementation Generator"
       and "Skeleton Thunk Generator" would be a concrete code generators.

[IDEA] Expression evaluation: currently the result (not the expression)
       is generated, which may not always be necessary.

       @@ boris: I would say may not always be correct


       However, for purposes of type coercion and other checking (such as
       for positive integer values in string, array and sequence bounds)
       evaluation must be done internally.

       @@ boris: note that evaluation is needed to only verify that things
          are correct. You don't have to (shouldn't?) substitute original
          (const) expression with what's been evaluated.


       @@ jeff: it may be necessary in some cases to append 'f' or 'U' to
          a generated number to avoid a C++ compiler warning.

       @@ boris: shouldn't this 'f' and 'U' be in IDL as well?

[IDEA] I wonder if it's a good idea to use a separate pass over syntax tree
       for semantic checking (e.g. type coercion, positive values for
       sequence bounds).

       @@ jeff: This may hurt performance a little - more lookups - but it
          will improve error reporting.

       @@ boris:  As we dicussed earlier this pass could be used to do
          'semantic expansions' (e.g. calculate a flat list of interface's
          children, etc). Also I don't think we should worry about speed
          very much here (of course I don't say we have to be stupid ;-)
          In fact if we are trading better design vs faster compilation
          at this stage we should always go for better design.


Executor Mapping Generator
--------------------------


Executor Implementation Generator
--------------------------------

[IDEA]: Translate CIDL composition to C++ namespace.


Skeleton Thunk Generator
------------------------


Compiler driver
---------------


Vault
-----

Some thoughts from Jeff that I are not directly related to CIDLC and are
rather current IDLC design defects:

* AMI/AMH implied IDL: more can be done in the BE preprocessing pass,
  hopefully eliminating a big chunk of the huge volume of AMI/AMH visitor
  code. The implied IDL generated for CCM types, for example, leaves almost
  nothing extra for the visitors to do.

* Fwd decl redefinition: forward declaration nodes all initially contain a
  heap-allocated dummy full-definition member, later replaced by a copy
  of the full definition. This needs to be streamlined.

* Memory leaks: inconsistent copying/passing policies make it almost
  impossible to eliminate the huge number of leaks. The front end will be
  more and more reused, and it may be desirable to make it executable as a
  function call, in which case it will important to eliminate the leaks.
  Perhaps copying of AST nodes can be eliminated with reference counting or
  just with careful management, similarly for string identifiers and literals.
  Destroy() methods have been put in all the node classes, and are called
  recursively from the AST root at destruction time, but they are far from
  doing a complete job.

* Visitor instantiation: the huge visitor factory has already been much
  reduced, and the huge enum of context state values is being reduced.
  However there will still be an abundance of switch statements at nearly
  every instance of visitor creation at scope nesting. We could make better
  use of polymorphism to get rid of them.

* Node narrowing: instead of the impenetrable macros we use now, we
  could either generate valuetype-like downcast methods for the (C)IDL
  types, or we could just use dynamic_cast.

* Error reporting: making error messages more informative, and error recovery
  could both be a lot better, as they are in most other IDL compilers. If a
  recursive descent parser is used (such as Spirit), there is a simple
  generic algorithm for error recovery.


* FE/BE node classes: if BE node classes are implemented at all, there should
  be a complete separation of concerns - BE node classes should contain only
  info related to code generation, and FE node classes should contain only
  info related to the AST representation. As the front end becomes more
  modular and reusable, this will become more and more necessary.

  @@ boris: It doesn't seem we will need two separate and parallel hierarhies.

* Undefined fwd decls: now that we have dropped support for platforms without
  namespaces, the code generated for fwd declarations not defined in the same
  translation unit can be much improved, most likely by the elimination of
  generated flat-name global methods, and perhaps other improvements as well.

* Strategized code generation: many places now have either lots of
  duplication, or an explosion of branching in a single visitor. Adding code
  generation for use cases incrementally may give us an opportunity to
  refactor and strategize it better.

* Node generator: this class does nothing more than call 'new' and pass
  unchanged the arguments it gets to the appropriate constructor - it can be
  eliminated.

* Virtual methods: there are many member functions in the IDL compiler that
  are needlessly virtual.

* Misc. leveraging: redesign of mechanisms listed above can have an effect
  on other mechanisms, such as the handling of pragma prefix, typeprefix, and
  reopened modules.