summaryrefslogtreecommitdiff
path: root/docs/comm/rts-libs/coding-style.html
diff options
context:
space:
mode:
Diffstat (limited to 'docs/comm/rts-libs/coding-style.html')
-rw-r--r--docs/comm/rts-libs/coding-style.html516
1 files changed, 516 insertions, 0 deletions
diff --git a/docs/comm/rts-libs/coding-style.html b/docs/comm/rts-libs/coding-style.html
new file mode 100644
index 0000000000..58f5b4f9bb
--- /dev/null
+++ b/docs/comm/rts-libs/coding-style.html
@@ -0,0 +1,516 @@
+<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
+<html>
+ <head>
+ <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
+ <title>The GHC Commentary - Style Guidelines for RTS C code</title>
+ </head>
+
+<body>
+<H1>The GHC Commentary - Style Guidelines for RTS C code</h1>
+
+<h2>Comments</h2>
+
+<p>These coding style guidelines are mainly intended for use in
+<tt>ghc/rts</tt> and <tt>ghc/includes</tt>.
+
+<p>NB These are just suggestions. They're not set in stone. Some of
+them are probably misguided. If you disagree with them, feel free to
+modify this document (and make your commit message reasonably
+informative) or mail someone (eg. <a
+href="glasgow-haskell-users@haskell.org">The GHC mailing list</a>)
+
+<h2>References</h2>
+
+If you haven't read them already, you might like to check the following.
+Where they conflict with our suggestions, they're probably right.
+
+<ul>
+
+<li>
+The C99 standard. One reasonable reference is <a
+href="http://home.tiscalinet.ch/t_wolf/tw/c/c9x_changes.html">here</a>.
+
+<p><li>
+Writing Solid Code, Microsoft Press. (Highly recommended. Possibly
+the only Microsoft Press book that's worth reading.)
+
+<p><li>
+Autoconf documentation.
+See also <a href="http://peti.gmd.de/autoconf-archive/">The autoconf macro archive</a> and
+<a href="http://www.cyclic.com/cyclic-pages/autoconf.html">Cyclic Software's description</a>
+
+<p><li> <a
+href="http://www.cs.umd.edu/users/cml/cstyle/indhill-cstyle.html">Indian
+Hill C Style and Coding Standards</a>.
+
+<p><li>
+<a href="http://www.cs.umd.edu/users/cml/cstyle/">A list of C programming style links</a>
+
+<p><li>
+<a href="http://www.lysator.liu.se/c/c-www.html">A very large list of C programming links</a>
+
+<p><li>
+<a href="http://www.geek-girl.com/unix.html">A list of Unix programming links</a>
+
+</ul>
+
+
+<h2>Portability issues</h2>
+
+<ul>
+<p><li> We try to stick to C99 where possible. We use the following
+C99 features relative to C89, some of which were previously GCC
+extensions (possibly with different syntax):
+
+<ul>
+<p><li>Variable length arrays as the last field of a struct. GCC has
+a similar extension, but the syntax is slightly different: in GCC you
+would declare the array as <tt>arr[0]</tt>, whereas in C99 it is
+declared as <tt>arr[]</tt>.
+
+<p><li>Inline annotations on functions (see later)
+
+<p><li>Labeled elements in initialisers. Again, GCC has a slightly
+different syntax from C99 here, and we stick with the GCC syntax until
+GCC implements the C99 proposal.
+
+<p><li>C++-style comments. These are part of the C99 standard, and we
+prefer to use them whenever possible.
+</ul>
+
+<p>In addition we use ANSI-C-style function declarations and
+prototypes exclusively. Every function should have a prototype;
+static function prototypes may be placed near the top of the file in
+which they are declared, and external prototypes are usually placed in
+a header file with the same basename as the source file (although there
+are exceptions to this rule, particularly when several source files
+together implement a subsystem which is described by a single external
+header file).
+
+<p><li>We use the following GCC extensions, but surround them with
+<tt>#ifdef __GNUC__</tt>:
+
+<ul>
+<p><li>Function attributes (mostly just <code>no_return</code> and
+<code>unused</code>)
+<p><li>Inline assembly.
+</ul>
+
+<p><li>
+char can be signed or unsigned - always say which you mean
+
+<p><li>Our POSIX policy: try to write code that only uses POSIX (IEEE
+Std 1003.1) interfaces and APIs. We used to define
+<code>POSIX_SOURCE</code> by default, but found that this caused more
+problems than it solved, so now we require any code that is
+POSIX-compliant to explicitly say so by having <code>#include
+"PosixSource.h"</code> at the top. Try to do this whenever possible.
+
+<p><li> Some architectures have memory alignment constraints. Others
+don't have any constraints but go faster if you align things. These
+macros (from <tt>ghcconfig.h</tt>) tell you which alignment to use
+
+<pre>
+ /* minimum alignment of unsigned int */
+ #define ALIGNMENT_UNSIGNED_INT 4
+
+ /* minimum alignment of long */
+ #define ALIGNMENT_LONG 4
+
+ /* minimum alignment of float */
+ #define ALIGNMENT_FLOAT 4
+
+ /* minimum alignment of double */
+ #define ALIGNMENT_DOUBLE 4
+</pre>
+
+<p><li> Use <tt>StgInt</tt>, <tt>StgWord</tt> and <tt>StgPtr</tt> when
+reading/writing ints and ptrs to the stack or heap. Note that, by
+definition, <tt>StgInt</tt>, <tt>StgWord</tt> and <tt>StgPtr</tt> are
+the same size and have the same alignment constraints even if
+<code>sizeof(int) != sizeof(ptr)</code> on that platform.
+
+<p><li> Use <tt>StgInt8</tt>, <tt>StgInt16</tt>, etc when you need a
+certain minimum number of bits in a type. Use <tt>int</tt> and
+<tt>nat</tt> when there's no particular constraint. ANSI C only
+guarantees that ints are at least 16 bits but within GHC we assume
+they are 32 bits.
+
+<p><li> Use <tt>StgFloat</tt> and <tt>StgDouble</tt> for floating
+point values which will go on/have come from the stack or heap. Note
+that <tt>StgDouble</tt> may occupy more than one <tt>StgWord</tt>, but
+it will always be a whole number multiple.
+
+<p>
+Use <code>PK_FLT(addr)</code>, <code>PK_DBL(addr)</code> to read
+<tt>StgFloat</tt> and <tt>StgDouble</tt> values from the stack/heap,
+and <code>ASSIGN_FLT(val,addr)</code> /
+<code>ASSIGN_DBL(val,addr)</code> to assign StgFloat/StgDouble values
+to heap/stack locations. These macros take care of alignment
+restrictions.
+
+<p>
+Heap/Stack locations are always <tt>StgWord</tt> aligned; the
+alignment requirements of an <tt>StgDouble</tt> may be more than that
+of <tt>StgWord</tt>, but we don't pad misaligned <tt>StgDoubles</tt>
+because doing so would be too much hassle (see <code>PK_DBL</code> &
+co above).
+
+<p><li>
+Avoid conditional code like this:
+
+<pre>
+ #ifdef solaris_host_OS
+ // do something solaris specific
+ #endif
+</pre>
+
+Instead, add an appropriate test to the configure.ac script and use
+the result of that test instead.
+
+<pre>
+ #ifdef HAVE_BSD_H
+ // use a BSD library
+ #endif
+</pre>
+
+<p>The problem is that things change from one version of an OS to another
+- things get added, things get deleted, things get broken, some things
+are optional extras. Using "feature tests" instead of "system tests"
+makes things a lot less brittle. Things also tend to get documented
+better.
+
+</ul>
+
+<h2>Debugging/robustness tricks</h2>
+
+
+Anyone who has tried to debug a garbage collector or code generator
+will tell you: "If a program is going to crash, it should crash as
+soon, as noisily and as often as possible." There's nothing worse
+than trying to find a bug which only shows up when running GHC on
+itself and doesn't manifest itself until 10 seconds after the actual
+cause of the problem.
+
+<p>We put all our debugging code inside <tt>#ifdef DEBUG</tt>. The
+general policy is we don't ship code with debugging checks and
+assertions in it, but we do run with those checks in place when
+developing and testing. Anything inside <tt>#ifdef DEBUG</tt> should
+not slow down the code by more than a factor of 2.
+
+<p>We also have more expensive "sanity checking" code for hardcore
+debugging - this can slow down the code by a large factor, but is only
+enabled on demand by a command-line flag. General sanity checking in
+the RTS is currently enabled with the <tt>-DS</tt> RTS flag.
+
+<p>There are a number of RTS flags which control debugging output and
+sanity checking in various parts of the system when <tt>DEBUG</tt> is
+defined. For example, to get the scheduler to be verbose about what
+it is doing, you would say <tt>+RTS -Ds -RTS</tt>. See
+<tt>includes/RtsFlags.h</tt> and <tt>rts/RtsFlags.c</tt> for the full
+set of debugging flags. To check one of these flags in the code,
+write:
+
+<pre>
+ IF_DEBUG(gc, fprintf(stderr, "..."));
+</pre>
+
+would check the <tt>gc</tt> flag before generating the output (and the
+code is removed altogether if <tt>DEBUG</tt> is not defined).
+
+<p>All debugging output should go to <tt>stderr</tt>.
+
+<p>
+Particular guidelines for writing robust code:
+
+<ul>
+<p><li>
+Use assertions. Use lots of assertions. If you write a comment
+that says "takes a +ve number" add an assertion. If you're casting
+an int to a nat, add an assertion. If you're casting an int to a char,
+add an assertion. We use the <tt>ASSERT</tt> macro for writing
+assertions; it goes away when <tt>DEBUG</tt> is not defined.
+
+<p><li>
+Write special debugging code to check the integrity of your data structures.
+(Most of the runtime checking code is in <tt>rts/Sanity.c</tt>)
+Add extra assertions which call this code at the start and end of any
+code that operates on your data structures.
+
+<p><li>
+When you find a hard-to-spot bug, try to think of some assertions,
+sanity checks or whatever that would have made the bug easier to find.
+
+<p><li>
+When defining an enumeration, it's a good idea not to use 0 for normal
+values. Instead, make 0 raise an internal error. The idea here is to
+make it easier to detect pointer-related errors on the assumption that
+random pointers are more likely to point to a 0 than to anything else.
+
+<pre>
+typedef enum
+ { i_INTERNAL_ERROR /* Instruction 0 raises an internal error */
+ , i_PANIC /* irrefutable pattern match failed! */
+ , i_ERROR /* user level error */
+
+ ...
+</pre>
+
+<p><li> Use <tt>#warning</tt> or <tt>#error</tt> whenever you write a
+piece of incomplete/broken code.
+
+<p><li> When testing, try to make infrequent things happen often.
+ For example, make a context switch/gc/etc happen every time a
+ context switch/gc/etc can happen. The system will run like a
+ pig but it'll catch a lot of bugs.
+
+</ul>
+
+<h2>Syntactic details</h2>
+
+<ul>
+<p><li><b>Important:</b> Put "redundant" braces or parens in your code.
+Omitting braces and parens leads to very hard to spot bugs -
+especially if you use macros (and you might have noticed that GHC does
+this a lot!)
+
+<p>
+In particular:
+<ul>
+<p><li>
+Put braces round the body of for loops, while loops, if statements, etc.
+even if they "aren't needed" because it's really hard to find the resulting
+bug if you mess up. Indent them any way you like but put them in there!
+</ul>
+
+<p><li>
+When defining a macro, always put parens round args - just in case.
+For example, write:
+<pre>
+ #define add(x,y) ((x)+(y))
+</pre>
+instead of
+<pre>
+ #define add(x,y) x+y
+</pre>
+
+<p><li> Don't declare and initialize variables at the same time.
+Separating the declaration and initialization takes more lines, but
+make the code clearer.
+
+<p><li>
+Use inline functions instead of macros if possible - they're a lot
+less tricky to get right and don't suffer from the usual problems
+of side effects, evaluation order, multiple evaluation, etc.
+
+<ul>
+<p><li>Inline functions get the naming issue right. E.g. they
+ can have local variables which (in an expression context)
+ macros can't.
+
+<p><li> Inline functions have call-by-value semantics whereas macros
+ are call-by-name. You can be bitten by duplicated computation
+ if you aren't careful.
+
+<p><li> You can use inline functions from inside gdb if you compile with
+ -O0 or -fkeep-inline-functions. If you use macros, you'd better
+ know what they expand to.
+</ul>
+
+However, note that macros can serve as both l-values and r-values and
+can be "polymorphic" as these examples show:
+<pre>
+ // you can use this as an l-value or an l-value
+ #define PROF_INFO(cl) (((StgClosure*)(cl))->header.profInfo)
+
+ // polymorphic case
+ // but note that min(min(1,2),3) does 3 comparisions instead of 2!!
+ #define min(x,y) (((x)<=(y)) ? (x) : (y))
+</pre>
+
+<p><li>
+Inline functions should be "static inline" because:
+<ul>
+<p><li>
+gcc will delete static inlines if not used or theyre always inlined.
+
+<p><li>
+ if they're externed, we could get conflicts between 2 copies of the
+ same function if, for some reason, gcc is unable to delete them.
+ If they're static, we still get multiple copies but at least they don't conflict.
+</ul>
+
+OTOH, the gcc manual says this
+so maybe we should use extern inline?
+
+<pre>
+ When a function is both inline and `static', if all calls to the
+function are integrated into the caller, and the function's address is
+never used, then the function's own assembler code is never referenced.
+In this case, GNU CC does not actually output assembler code for the
+function, unless you specify the option `-fkeep-inline-functions'.
+Some calls cannot be integrated for various reasons (in particular,
+calls that precede the function's definition cannot be integrated, and
+neither can recursive calls within the definition). If there is a
+nonintegrated call, then the function is compiled to assembler code as
+usual. The function must also be compiled as usual if the program
+refers to its address, because that can't be inlined.
+
+ When an inline function is not `static', then the compiler must
+assume that there may be calls from other source files; since a global
+symbol can be defined only once in any program, the function must not
+be defined in the other source files, so the calls therein cannot be
+integrated. Therefore, a non-`static' inline function is always
+compiled on its own in the usual fashion.
+
+ If you specify both `inline' and `extern' in the function
+definition, then the definition is used only for inlining. In no case
+is the function compiled on its own, not even if you refer to its
+address explicitly. Such an address becomes an external reference, as
+if you had only declared the function, and had not defined it.
+
+ This combination of `inline' and `extern' has almost the effect of a
+macro. The way to use it is to put a function definition in a header
+file with these keywords, and put another copy of the definition
+(lacking `inline' and `extern') in a library file. The definition in
+the header file will cause most calls to the function to be inlined.
+If any uses of the function remain, they will refer to the single copy
+in the library.
+</pre>
+
+<p><li>
+Don't define macros that expand to a list of statements.
+You could just use braces as in:
+
+<pre>
+ #define ASSIGN_CC_ID(ccID) \
+ { \
+ ccID = CC_ID; \
+ CC_ID++; \
+ }
+</pre>
+
+(but it's usually better to use an inline function instead - see above).
+
+<p><li>
+Don't even write macros that expand to 0 statements - they can mess you
+up as well. Use the doNothing macro instead.
+<pre>
+ #define doNothing() do { } while (0)
+</pre>
+
+<p><li>
+This code
+<pre>
+int* p, q;
+</pre>
+looks like it declares two pointers but, in fact, only p is a pointer.
+It's safer to write this:
+<pre>
+int* p;
+int* q;
+</pre>
+You could also write this:
+<pre>
+int *p, *q;
+</pre>
+but it is preferrable to split the declarations.
+
+<p><li>
+Try to use ANSI C's enum feature when defining lists of constants of
+the same type. Among other benefits, you'll notice that gdb uses the
+name instead of its (usually inscrutable) number when printing values
+with enum types and gdb will let you use the name in expressions you
+type.
+
+<p>
+Examples:
+<pre>
+ typedef enum { /* N.B. Used as indexes into arrays */
+ NO_HEAP_PROFILING,
+ HEAP_BY_CC,
+ HEAP_BY_MOD,
+ HEAP_BY_GRP,
+ HEAP_BY_DESCR,
+ HEAP_BY_TYPE,
+ HEAP_BY_TIME
+ } ProfilingFlags;
+</pre>
+instead of
+<pre>
+ # define NO_HEAP_PROFILING 0 /* N.B. Used as indexes into arrays */
+ # define HEAP_BY_CC 1
+ # define HEAP_BY_MOD 2
+ # define HEAP_BY_GRP 3
+ # define HEAP_BY_DESCR 4
+ # define HEAP_BY_TYPE 5
+ # define HEAP_BY_TIME 6
+</pre>
+and
+<pre>
+ typedef enum {
+ CCchar = 'C',
+ MODchar = 'M',
+ GRPchar = 'G',
+ DESCRchar = 'D',
+ TYPEchar = 'Y',
+ TIMEchar = 'T'
+ } ProfilingTag;
+</pre>
+instead of
+<pre>
+ # define CCchar 'C'
+ # define MODchar 'M'
+ # define GRPchar 'G'
+ # define DESCRchar 'D'
+ # define TYPEchar 'Y'
+ # define TIMEchar 'T'
+</pre>
+
+<p><li> Please keep to 80 columns: the line has to be drawn somewhere,
+and by keeping it to 80 columns we can ensure that code looks OK on
+everyone's screen. Long lines are hard to read, and a sign that the
+code needs to be restructured anyway.
+
+<p><li> When commenting out large chunks of code, use <code>#ifdef 0
+... #endif</code> rather than <code>/* ... */</code> because C doesn't
+have nested comments.
+
+<p><li>When declaring a typedef for a struct, give the struct a name
+as well, so that other headers can forward-reference the struct name
+and it becomes possible to have opaque pointers to the struct. Our
+convention is to name the struct the same as the typedef, but add a
+leading underscore. For example:
+
+<pre>
+ typedef struct _Foo {
+ ...
+ } Foo;
+</pre>
+
+<p><li>Do not use <tt>!</tt> instead of explicit comparison against
+<tt>NULL</tt> or <tt>'\0'</tt>; the latter is much clearer.
+
+<p><li> We don't care too much about your indentation style but, if
+you're modifying a function, please try to use the same style as the
+rest of the function (or file). If you're writing new code, a
+tab width of 4 is preferred.
+
+</ul>
+
+<h2>CVS issues</h2>
+
+<ul>
+<p><li>
+Don't be tempted to reindent or reorganise large chunks of code - it
+generates large diffs in which it's hard to see whether anything else
+was changed.
+<p>
+If you must reindent or reorganise, don't include any functional
+changes that commit and give advance warning that you're about to do
+it in case anyone else is changing that file.
+</ul>
+
+
+</body>
+</html>