diff options
Diffstat (limited to 'docs/comm/rts-libs/coding-style.html')
-rw-r--r-- | docs/comm/rts-libs/coding-style.html | 516 |
1 files changed, 516 insertions, 0 deletions
diff --git a/docs/comm/rts-libs/coding-style.html b/docs/comm/rts-libs/coding-style.html new file mode 100644 index 0000000000..58f5b4f9bb --- /dev/null +++ b/docs/comm/rts-libs/coding-style.html @@ -0,0 +1,516 @@ +<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> +<html> + <head> + <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1"> + <title>The GHC Commentary - Style Guidelines for RTS C code</title> + </head> + +<body> +<H1>The GHC Commentary - Style Guidelines for RTS C code</h1> + +<h2>Comments</h2> + +<p>These coding style guidelines are mainly intended for use in +<tt>ghc/rts</tt> and <tt>ghc/includes</tt>. + +<p>NB These are just suggestions. They're not set in stone. Some of +them are probably misguided. If you disagree with them, feel free to +modify this document (and make your commit message reasonably +informative) or mail someone (eg. <a +href="glasgow-haskell-users@haskell.org">The GHC mailing list</a>) + +<h2>References</h2> + +If you haven't read them already, you might like to check the following. +Where they conflict with our suggestions, they're probably right. + +<ul> + +<li> +The C99 standard. One reasonable reference is <a +href="http://home.tiscalinet.ch/t_wolf/tw/c/c9x_changes.html">here</a>. + +<p><li> +Writing Solid Code, Microsoft Press. (Highly recommended. Possibly +the only Microsoft Press book that's worth reading.) + +<p><li> +Autoconf documentation. +See also <a href="http://peti.gmd.de/autoconf-archive/">The autoconf macro archive</a> and +<a href="http://www.cyclic.com/cyclic-pages/autoconf.html">Cyclic Software's description</a> + +<p><li> <a +href="http://www.cs.umd.edu/users/cml/cstyle/indhill-cstyle.html">Indian +Hill C Style and Coding Standards</a>. + +<p><li> +<a href="http://www.cs.umd.edu/users/cml/cstyle/">A list of C programming style links</a> + +<p><li> +<a href="http://www.lysator.liu.se/c/c-www.html">A very large list of C programming links</a> + +<p><li> +<a href="http://www.geek-girl.com/unix.html">A list of Unix programming links</a> + +</ul> + + +<h2>Portability issues</h2> + +<ul> +<p><li> We try to stick to C99 where possible. We use the following +C99 features relative to C89, some of which were previously GCC +extensions (possibly with different syntax): + +<ul> +<p><li>Variable length arrays as the last field of a struct. GCC has +a similar extension, but the syntax is slightly different: in GCC you +would declare the array as <tt>arr[0]</tt>, whereas in C99 it is +declared as <tt>arr[]</tt>. + +<p><li>Inline annotations on functions (see later) + +<p><li>Labeled elements in initialisers. Again, GCC has a slightly +different syntax from C99 here, and we stick with the GCC syntax until +GCC implements the C99 proposal. + +<p><li>C++-style comments. These are part of the C99 standard, and we +prefer to use them whenever possible. +</ul> + +<p>In addition we use ANSI-C-style function declarations and +prototypes exclusively. Every function should have a prototype; +static function prototypes may be placed near the top of the file in +which they are declared, and external prototypes are usually placed in +a header file with the same basename as the source file (although there +are exceptions to this rule, particularly when several source files +together implement a subsystem which is described by a single external +header file). + +<p><li>We use the following GCC extensions, but surround them with +<tt>#ifdef __GNUC__</tt>: + +<ul> +<p><li>Function attributes (mostly just <code>no_return</code> and +<code>unused</code>) +<p><li>Inline assembly. +</ul> + +<p><li> +char can be signed or unsigned - always say which you mean + +<p><li>Our POSIX policy: try to write code that only uses POSIX (IEEE +Std 1003.1) interfaces and APIs. We used to define +<code>POSIX_SOURCE</code> by default, but found that this caused more +problems than it solved, so now we require any code that is +POSIX-compliant to explicitly say so by having <code>#include +"PosixSource.h"</code> at the top. Try to do this whenever possible. + +<p><li> Some architectures have memory alignment constraints. Others +don't have any constraints but go faster if you align things. These +macros (from <tt>ghcconfig.h</tt>) tell you which alignment to use + +<pre> + /* minimum alignment of unsigned int */ + #define ALIGNMENT_UNSIGNED_INT 4 + + /* minimum alignment of long */ + #define ALIGNMENT_LONG 4 + + /* minimum alignment of float */ + #define ALIGNMENT_FLOAT 4 + + /* minimum alignment of double */ + #define ALIGNMENT_DOUBLE 4 +</pre> + +<p><li> Use <tt>StgInt</tt>, <tt>StgWord</tt> and <tt>StgPtr</tt> when +reading/writing ints and ptrs to the stack or heap. Note that, by +definition, <tt>StgInt</tt>, <tt>StgWord</tt> and <tt>StgPtr</tt> are +the same size and have the same alignment constraints even if +<code>sizeof(int) != sizeof(ptr)</code> on that platform. + +<p><li> Use <tt>StgInt8</tt>, <tt>StgInt16</tt>, etc when you need a +certain minimum number of bits in a type. Use <tt>int</tt> and +<tt>nat</tt> when there's no particular constraint. ANSI C only +guarantees that ints are at least 16 bits but within GHC we assume +they are 32 bits. + +<p><li> Use <tt>StgFloat</tt> and <tt>StgDouble</tt> for floating +point values which will go on/have come from the stack or heap. Note +that <tt>StgDouble</tt> may occupy more than one <tt>StgWord</tt>, but +it will always be a whole number multiple. + +<p> +Use <code>PK_FLT(addr)</code>, <code>PK_DBL(addr)</code> to read +<tt>StgFloat</tt> and <tt>StgDouble</tt> values from the stack/heap, +and <code>ASSIGN_FLT(val,addr)</code> / +<code>ASSIGN_DBL(val,addr)</code> to assign StgFloat/StgDouble values +to heap/stack locations. These macros take care of alignment +restrictions. + +<p> +Heap/Stack locations are always <tt>StgWord</tt> aligned; the +alignment requirements of an <tt>StgDouble</tt> may be more than that +of <tt>StgWord</tt>, but we don't pad misaligned <tt>StgDoubles</tt> +because doing so would be too much hassle (see <code>PK_DBL</code> & +co above). + +<p><li> +Avoid conditional code like this: + +<pre> + #ifdef solaris_host_OS + // do something solaris specific + #endif +</pre> + +Instead, add an appropriate test to the configure.ac script and use +the result of that test instead. + +<pre> + #ifdef HAVE_BSD_H + // use a BSD library + #endif +</pre> + +<p>The problem is that things change from one version of an OS to another +- things get added, things get deleted, things get broken, some things +are optional extras. Using "feature tests" instead of "system tests" +makes things a lot less brittle. Things also tend to get documented +better. + +</ul> + +<h2>Debugging/robustness tricks</h2> + + +Anyone who has tried to debug a garbage collector or code generator +will tell you: "If a program is going to crash, it should crash as +soon, as noisily and as often as possible." There's nothing worse +than trying to find a bug which only shows up when running GHC on +itself and doesn't manifest itself until 10 seconds after the actual +cause of the problem. + +<p>We put all our debugging code inside <tt>#ifdef DEBUG</tt>. The +general policy is we don't ship code with debugging checks and +assertions in it, but we do run with those checks in place when +developing and testing. Anything inside <tt>#ifdef DEBUG</tt> should +not slow down the code by more than a factor of 2. + +<p>We also have more expensive "sanity checking" code for hardcore +debugging - this can slow down the code by a large factor, but is only +enabled on demand by a command-line flag. General sanity checking in +the RTS is currently enabled with the <tt>-DS</tt> RTS flag. + +<p>There are a number of RTS flags which control debugging output and +sanity checking in various parts of the system when <tt>DEBUG</tt> is +defined. For example, to get the scheduler to be verbose about what +it is doing, you would say <tt>+RTS -Ds -RTS</tt>. See +<tt>includes/RtsFlags.h</tt> and <tt>rts/RtsFlags.c</tt> for the full +set of debugging flags. To check one of these flags in the code, +write: + +<pre> + IF_DEBUG(gc, fprintf(stderr, "...")); +</pre> + +would check the <tt>gc</tt> flag before generating the output (and the +code is removed altogether if <tt>DEBUG</tt> is not defined). + +<p>All debugging output should go to <tt>stderr</tt>. + +<p> +Particular guidelines for writing robust code: + +<ul> +<p><li> +Use assertions. Use lots of assertions. If you write a comment +that says "takes a +ve number" add an assertion. If you're casting +an int to a nat, add an assertion. If you're casting an int to a char, +add an assertion. We use the <tt>ASSERT</tt> macro for writing +assertions; it goes away when <tt>DEBUG</tt> is not defined. + +<p><li> +Write special debugging code to check the integrity of your data structures. +(Most of the runtime checking code is in <tt>rts/Sanity.c</tt>) +Add extra assertions which call this code at the start and end of any +code that operates on your data structures. + +<p><li> +When you find a hard-to-spot bug, try to think of some assertions, +sanity checks or whatever that would have made the bug easier to find. + +<p><li> +When defining an enumeration, it's a good idea not to use 0 for normal +values. Instead, make 0 raise an internal error. The idea here is to +make it easier to detect pointer-related errors on the assumption that +random pointers are more likely to point to a 0 than to anything else. + +<pre> +typedef enum + { i_INTERNAL_ERROR /* Instruction 0 raises an internal error */ + , i_PANIC /* irrefutable pattern match failed! */ + , i_ERROR /* user level error */ + + ... +</pre> + +<p><li> Use <tt>#warning</tt> or <tt>#error</tt> whenever you write a +piece of incomplete/broken code. + +<p><li> When testing, try to make infrequent things happen often. + For example, make a context switch/gc/etc happen every time a + context switch/gc/etc can happen. The system will run like a + pig but it'll catch a lot of bugs. + +</ul> + +<h2>Syntactic details</h2> + +<ul> +<p><li><b>Important:</b> Put "redundant" braces or parens in your code. +Omitting braces and parens leads to very hard to spot bugs - +especially if you use macros (and you might have noticed that GHC does +this a lot!) + +<p> +In particular: +<ul> +<p><li> +Put braces round the body of for loops, while loops, if statements, etc. +even if they "aren't needed" because it's really hard to find the resulting +bug if you mess up. Indent them any way you like but put them in there! +</ul> + +<p><li> +When defining a macro, always put parens round args - just in case. +For example, write: +<pre> + #define add(x,y) ((x)+(y)) +</pre> +instead of +<pre> + #define add(x,y) x+y +</pre> + +<p><li> Don't declare and initialize variables at the same time. +Separating the declaration and initialization takes more lines, but +make the code clearer. + +<p><li> +Use inline functions instead of macros if possible - they're a lot +less tricky to get right and don't suffer from the usual problems +of side effects, evaluation order, multiple evaluation, etc. + +<ul> +<p><li>Inline functions get the naming issue right. E.g. they + can have local variables which (in an expression context) + macros can't. + +<p><li> Inline functions have call-by-value semantics whereas macros + are call-by-name. You can be bitten by duplicated computation + if you aren't careful. + +<p><li> You can use inline functions from inside gdb if you compile with + -O0 or -fkeep-inline-functions. If you use macros, you'd better + know what they expand to. +</ul> + +However, note that macros can serve as both l-values and r-values and +can be "polymorphic" as these examples show: +<pre> + // you can use this as an l-value or an l-value + #define PROF_INFO(cl) (((StgClosure*)(cl))->header.profInfo) + + // polymorphic case + // but note that min(min(1,2),3) does 3 comparisions instead of 2!! + #define min(x,y) (((x)<=(y)) ? (x) : (y)) +</pre> + +<p><li> +Inline functions should be "static inline" because: +<ul> +<p><li> +gcc will delete static inlines if not used or theyre always inlined. + +<p><li> + if they're externed, we could get conflicts between 2 copies of the + same function if, for some reason, gcc is unable to delete them. + If they're static, we still get multiple copies but at least they don't conflict. +</ul> + +OTOH, the gcc manual says this +so maybe we should use extern inline? + +<pre> + When a function is both inline and `static', if all calls to the +function are integrated into the caller, and the function's address is +never used, then the function's own assembler code is never referenced. +In this case, GNU CC does not actually output assembler code for the +function, unless you specify the option `-fkeep-inline-functions'. +Some calls cannot be integrated for various reasons (in particular, +calls that precede the function's definition cannot be integrated, and +neither can recursive calls within the definition). If there is a +nonintegrated call, then the function is compiled to assembler code as +usual. The function must also be compiled as usual if the program +refers to its address, because that can't be inlined. + + When an inline function is not `static', then the compiler must +assume that there may be calls from other source files; since a global +symbol can be defined only once in any program, the function must not +be defined in the other source files, so the calls therein cannot be +integrated. Therefore, a non-`static' inline function is always +compiled on its own in the usual fashion. + + If you specify both `inline' and `extern' in the function +definition, then the definition is used only for inlining. In no case +is the function compiled on its own, not even if you refer to its +address explicitly. Such an address becomes an external reference, as +if you had only declared the function, and had not defined it. + + This combination of `inline' and `extern' has almost the effect of a +macro. The way to use it is to put a function definition in a header +file with these keywords, and put another copy of the definition +(lacking `inline' and `extern') in a library file. The definition in +the header file will cause most calls to the function to be inlined. +If any uses of the function remain, they will refer to the single copy +in the library. +</pre> + +<p><li> +Don't define macros that expand to a list of statements. +You could just use braces as in: + +<pre> + #define ASSIGN_CC_ID(ccID) \ + { \ + ccID = CC_ID; \ + CC_ID++; \ + } +</pre> + +(but it's usually better to use an inline function instead - see above). + +<p><li> +Don't even write macros that expand to 0 statements - they can mess you +up as well. Use the doNothing macro instead. +<pre> + #define doNothing() do { } while (0) +</pre> + +<p><li> +This code +<pre> +int* p, q; +</pre> +looks like it declares two pointers but, in fact, only p is a pointer. +It's safer to write this: +<pre> +int* p; +int* q; +</pre> +You could also write this: +<pre> +int *p, *q; +</pre> +but it is preferrable to split the declarations. + +<p><li> +Try to use ANSI C's enum feature when defining lists of constants of +the same type. Among other benefits, you'll notice that gdb uses the +name instead of its (usually inscrutable) number when printing values +with enum types and gdb will let you use the name in expressions you +type. + +<p> +Examples: +<pre> + typedef enum { /* N.B. Used as indexes into arrays */ + NO_HEAP_PROFILING, + HEAP_BY_CC, + HEAP_BY_MOD, + HEAP_BY_GRP, + HEAP_BY_DESCR, + HEAP_BY_TYPE, + HEAP_BY_TIME + } ProfilingFlags; +</pre> +instead of +<pre> + # define NO_HEAP_PROFILING 0 /* N.B. Used as indexes into arrays */ + # define HEAP_BY_CC 1 + # define HEAP_BY_MOD 2 + # define HEAP_BY_GRP 3 + # define HEAP_BY_DESCR 4 + # define HEAP_BY_TYPE 5 + # define HEAP_BY_TIME 6 +</pre> +and +<pre> + typedef enum { + CCchar = 'C', + MODchar = 'M', + GRPchar = 'G', + DESCRchar = 'D', + TYPEchar = 'Y', + TIMEchar = 'T' + } ProfilingTag; +</pre> +instead of +<pre> + # define CCchar 'C' + # define MODchar 'M' + # define GRPchar 'G' + # define DESCRchar 'D' + # define TYPEchar 'Y' + # define TIMEchar 'T' +</pre> + +<p><li> Please keep to 80 columns: the line has to be drawn somewhere, +and by keeping it to 80 columns we can ensure that code looks OK on +everyone's screen. Long lines are hard to read, and a sign that the +code needs to be restructured anyway. + +<p><li> When commenting out large chunks of code, use <code>#ifdef 0 +... #endif</code> rather than <code>/* ... */</code> because C doesn't +have nested comments. + +<p><li>When declaring a typedef for a struct, give the struct a name +as well, so that other headers can forward-reference the struct name +and it becomes possible to have opaque pointers to the struct. Our +convention is to name the struct the same as the typedef, but add a +leading underscore. For example: + +<pre> + typedef struct _Foo { + ... + } Foo; +</pre> + +<p><li>Do not use <tt>!</tt> instead of explicit comparison against +<tt>NULL</tt> or <tt>'\0'</tt>; the latter is much clearer. + +<p><li> We don't care too much about your indentation style but, if +you're modifying a function, please try to use the same style as the +rest of the function (or file). If you're writing new code, a +tab width of 4 is preferred. + +</ul> + +<h2>CVS issues</h2> + +<ul> +<p><li> +Don't be tempted to reindent or reorganise large chunks of code - it +generates large diffs in which it's hard to see whether anything else +was changed. +<p> +If you must reindent or reorganise, don't include any functional +changes that commit and give advance warning that you're about to do +it in case anyone else is changing that file. +</ul> + + +</body> +</html> |