From greyham Thu Oct 28 18:42:35 1993
Newsgroups: comp.lang.c++,comp.programming.literate
Subject: An Automatic C++ documentation compilation project.
Summary: Anyone willing to add C++ support to c2man?
Keywords: c2man, C, C++, Literate Programming, Documentation

Copyright 1993, 1994 by Graham Stoney. This may be freely redistributed
or quoted, so long as it's attributed to me.

Writing and maintaining documentation has often been a thorn in the side
of the Software Engineer and Programmer. After a great deal of time and
effort has been spent writing documentation about a program or software
system, the code invariably changes, quickly rendering the documentation
out of date. The documentation becomes misleading, gets neglected, and
quickly becomes useless.

"Literate Programming" is one approach to solving this problem. It
effectively introduces a whole new (typesetting) language, requires
quite a radical shift on the part of the "non-literate" programmer and
still requires a good deal of effort on the part of the programmer[1].

I'd like to suggest a different approach which lies considerably closer
to more traditional programming practices, and can offer quite immediate
benefits when functional interface documentation is the main
documentation required.

The primary philosophy here is to use the programming language as far as
possible to express the programmer's intentions, and to use comments
only when the programming language is not sufficiently expressive. A
comment can then become part of the language grammar which is recognised
by a "documentation compiler". This tool parses a superset of the
programming language and can automatically generate documentation in
human-readable form by associating the programmer's comments with the
objects in the code by their context.
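
As a minimal sketch of what "associating a comment with an object by its
context" can mean, the C fragment below takes the text of one function
parameter and pairs the declaration with the comment written beside it.
The names doc_entry, copy_trimmed and attach_comment are made up for
illustration, and the string searching is only an assumption chosen to
keep the example short; a real documentation compiler would recognise
the comment inside its grammar instead.

```c
#include <assert.h>
#include <string.h>

/* One documented object: the declaration text and the comment found
 * beside it.  (Hypothetical structure, for illustration only.) */
struct doc_entry {
    char decl[64];      /* declaration text, e.g. "int count" */
    char comment[128];  /* comment text, empty if there was none */
};

/* Copy the characters in [begin, end) into dst, dropping leading and
 * trailing blanks and truncating to fit a buffer of cap bytes. */
static void copy_trimmed(char *dst, size_t cap,
                         const char *begin, const char *end)
{
    size_t len;

    assert(cap > 0);
    while (begin < end && (*begin == ' ' || *begin == '\t'))
        begin++;
    while (end > begin && (end[-1] == ' ' || end[-1] == '\t'))
        end--;
    len = (size_t)(end - begin);
    if (len >= cap)
        len = cap - 1;
    memcpy(dst, begin, len);
    dst[len] = '\0';
}

/* Split the text of a single parameter into its declaration and the
 * comment that follows it.  Returns 1 if a complete comment was found
 * and 0 otherwise; an unterminated comment counts as no comment, and
 * the whole text is then kept as the declaration. */
int attach_comment(const char *param, struct doc_entry *out)
{
    const char *open = strstr(param, "/*");
    const char *close = open ? strstr(open + 2, "*/") : NULL;

    if (open == NULL || close == NULL) {
        copy_trimmed(out->decl, sizeof out->decl,
                     param, param + strlen(param));
        out->comment[0] = '\0';
        return 0;
    }
    /* Everything before the comment is the declaration ... */
    copy_trimmed(out->decl, sizeof out->decl, param, open);
    /* ... and the comment body is its description. */
    copy_trimmed(out->comment, sizeof out->comment, open + 2, close);
    return 1;
}
```

Given the parameter text "int count /* number of items */", a call to
attach_comment() fills in the declaration "int count" and the comment
"number of items" and returns 1. A parameter with no comment returns 0
and leaves the comment empty, so the programmer never has to restate
the name or type inside the comment.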

Whilst the idea of extracting documentation from comments in source code
is by no means new, the difference here is that the comments actually
form part of the grammar of the language recognised by the documentation
compiler[2]. Comments should not repeat information that is already
represented in the program code; for instance, a comment describing a
function argument should not repeat the name and type of that argument
(since that information has already been included, for the compiler),
but should appear near the argument.

For example, in C, the programmer should write this:

    /* include an example in the article */
    enum Result example(int page /* page it appears on */);

Rather than this:

    /* include an example in the article
     *
     * PARAMETERS:
     *   int page    page it appears on
     *
     * RETURNS:
     *   RESULT_YES             The readers agreed
     *   RESULT_NO              The readers disagreed
     *   RESULT_YOURE_JOKING    The readers disagreed strongly
     *   RESULT_BLANK_LOOKS     The readers didn't understand
     */
    enum Result example(int page);

Also in this example, the documentation compiler knows the possible
enumerated values that the function can return (as does the "real"
compiler), so it is unnecessary for the programmer to restate them. The
comments need simply be included in the definition for "enum Result" for
the "RETURNS" information to be generated automatically:

    enum Result
    {
        RESULT_YES,             /* The readers agreed */
        RESULT_NO,              /* The readers disagreed */
        RESULT_YOURE_JOKING,    /* The readers disagreed strongly */
        RESULT_BLANK_LOOKS      /* The readers didn't understand */
    };

Critics have suggested that the latter style in the example is easier to
read for someone wishing to call the function in question. Of course,
this is a style question which depends on each person's tastes; but the
criticism is tied to the notion that the source code needs to look
"beautiful" because it is the primary reference for someone wishing to
use that function.
This becomes much less significant once documentation is available which
is known to _always_ be up to date. The latter style also takes longer
to write and maintain, and can become out of date if the name or type of
the parameter is changed but the comment is neglected.

I have implemented one such documentation compiler for the C language
called "c2man", which is freely available[3]. The response from users
has been extremely encouraging; I suspect this is partly because of the
wide variety of styles of comment placement that are recognised: it
often correctly recognises comments that weren't written with c2man in
mind at all. While it is aimed solely at functional interface
documentation and doesn't have anywhere near the power of a full
Literate Programming system, the focus is on reducing the effort
required by the programmer to the absolute minimum, and seeing how much
documentation we can get essentially "for free".

Many people have requested C++ support be added to c2man, and I suspect
that this philosophy would be even more suitable and powerful for
documenting interfaces to C++ classes automatically.

Here is an example of how I envisage this philosophy would work when
applied to C++. It's interesting to note that this code was written a
couple of years ago exactly as you see it here, without the idea of
generating documentation from it in mind at all:

    // generic Timer class
    class Timer
    {
    private:
        static int numactive;       // number of constructed timers.
        static Timer *first;        // first one in list.
        Timer *next;                // next one in linked list.
        Time ticksdiff;             // ticks we take to expire once at front.
        enum {
            INACTIVE,               // timer is not in chain.
            STARTED,                // one-shot
            RUNNING                 // continuous.
        } state;

        // original interrupt vector value.
        static void interrupt (far *old_vector)(...);

        void (*timeout_function)(int);  // function called when we time out
        int timeout_parameter;      // gets passed to timeout_function
        Time duration;              // timer length (ticks)

        static void interrupt far tick(...);    // clock tick routine.
        void insert();              // add into active chain.
        void remove();              // remove from active chain.
        void set(Time milliseconds);    // set duration from ms.

    public:
        // constructor
        Timer(Time time=0,                  // milliseconds
              void (*function)(int)=0,      // called at timeout
              int param=-1);                // param for function

        // destructor
        ~Timer();

        // start (or restart) a timer running.
        void Start();
        void Start(Time duration);  // how long to run for

        // start a timer running continuous.
        void Run();

        // stop a timer.
        void Stop();

        // is a timer active?
        boolean Active() const { return state != INACTIVE; };
    };

Processing this class declaration could generate the following
automatically:

    NAME
        Timer - generic timer class

    SYNOPSIS
        class Timer
        {
        public:
            Timer(Time time=0, void (*function)(int)=0, int param=-1);
            ~Timer();
            void Start();
            void Start(Time duration);
            void Run();
            void Stop();
            boolean Active() const;
        };

    PARAMETERS
        Time time
            Milliseconds.

        void (*function)(int)
            Called at timeout.

        int param
            Param for function.

        Time duration
            How long to run for.

    DESCRIPTION
        Timer
            Constructor.

        ~Timer
            Destructor.

        Start
            Start (or restart) a timer running.

        Run
            Start a timer running continuous.

        Stop
            Stop a timer.

        Active
            Is a timer active?

It should also be possible to extract this information from the
implementation of the class (rather than the declaration), if that's
where the user prefers to put the comments describing each member
function and its parameters.

The ideal tool should:

  1. Avoid imposing a style on the programmer.

  2. Work out section names (NAME, SYNOPSIS etc) without the programmer
     having to specify them explicitly.

  3. Handle C++ and C style code equally well.

  4. Not require the programmer to restate information which is already
     expressed in the syntax of the programming language.

  5. Work reasonably well with existing code.

  6. Flatten the class hierarchy so that the documentation for each
     class includes virtually everything the user needs to know about
     it.

A number of tools already exist which attempt to tackle this problem,
such as class2man, genman, classdoc and docclass. They vary in
sophistication, utility, and the demands they place on the programmer;
however, none as yet meets all the criteria set out above, and no one
tool will suit the tastes of all programmers.

Pouring lots of effort into a really ``smart'' documentation generator
makes sense because once it's done, you get a payback for every document
you generate. Every little feature added to the documentation generator
to make things easier for the programmer pays off multiple times, and
minimising the effort required by the programmer is the key.

The logical starting point would be to graft Jim Roskind's C++
grammar[4] into c2man, modifying it to recognise comments in the
relevant places, and adding all the necessary structures to hold the
information from the parser that will get included in the output. Very
little functional change should be needed in the lexer, which already
recognises C++ comments.

Unfortunately, at present I do not have sufficient spare time to make
the additions to c2man required to support C++. It would be a great
contribution to the C++ community, not to mention a saving in their own
documentation time, for someone involved in C++ work to add this support
and release the result[5].

If you work with a team developing C++ code, please consider putting one
of your developers on a ``Usenet Sabbatical'' to extend this philosophy
to C++, and start reaping the benefits in documentation time savings. It
could also make an ideal Computer Science student compiler project.
Please contact me via E-mail if you are interested in undertaking such a
project.

Graham Stoney
greyham@research.canon.com.au

Footnotes:

1. Advocates of Literate Programming would argue that Literate
   Programming is much more than snazzy documents and that it encourages
   this extra effort to focus early on in the design of the software,
   which pays off later.

2. To get a better idea, see the file grammar.y in the c2man
   distribution.

3. c2man has been posted to comp.sources.misc. It should be available
   from:

   location:   ftp from any comp.sources.misc archive, in volume42
               (the version in the comp.sources.reviewed archive is
               obsolete)
               ftp /pub/Unix/Util/c2man-2.0.*.tar.gz
                   from dnpap.et.tudelft.nl
   Australia:  ftp /usenet/comp.sources.misc/volume42/c2man-2.0/*
                   from archie.au
   N.America:  ftp /usenet/comp.sources.misc/volume42/c2man-2.0/*
                   from ftp.wustl.edu
   Europe:     ftp /News/comp.sources.misc/volume42/c2man-2.0/*
                   from ftp.irisa.fr
   Japan:      ftp /pub/NetNews/comp.sources.misc/volume42/c2man-2.0/*
                   from ftp.iij.ad.jp
   Patches:    ftp pub/netnews/sources.bugs/volume93/sep/c2man*
                   from lth.se

4. Jim Roskind's yaccable C++ grammar is available via ftp from
   ics.uci.edu in the ftp/pub directory as:

       c++grammar2.0.tar.Z
       byacc1.8.tar.Z

5. c2man's copyright requires that all derivative works remain freely
   available.