summaryrefslogtreecommitdiff
path: root/docs/coding-style.html
blob: ceca6b9babec64da02a610dbd85c29372756d8d1 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
   <TITLE>Access To The GHC CVS Repository</TITLE>
   <META NAME="GENERATOR" CONTENT="Mozilla/3.04Gold (X11; I; SunOS 5.5.1 i86pc) [Netscape]">
</HEAD>
<BODY>

<H1>Coding suggestions for GHC/Hugs related code</h1>

<h2>Comments</h2>

NB These are just suggestions.  They're not set in stone.  Some of
them are probably misguided.  If you disagree with them, feel free to 
modify this document (and make your commit message reasonably informative)
or mail someone (eg <a href="fp-cvs-fptools@dcs.gla.ac.uk">The FP-CVS mailing list</a> or
<a
href="mailto:reid-alastair@cs.yale.edu">reid-alastair@cs.yale.edu</a>).


<h2>References</h2>

If you haven't read them already, you might like to check the following.
Where they conflict with our suggestions, they're probably right.

<ul>
<li>
Writing Solid Code, Microsoft Press.  (Highly recommended.  Possibly
the only Microsoft Press book that's worth reading.  SimonPJ has a
copy.)

<li>
Autoconf documentation (which doesn't seem to be on the web).
See also <a href="http://peti.gmd.de/autoconf-archive/">The autoconf macro archive</a> and 
<a href="http://www.cyclic.com/cyclic-pages/autoconf.html">Cyclic Software's description</a>

<li>
<a href="http://www.cs.umd.edu/users/cml/cstyle/indhill-cstyle.html">Indian Hill C Style and Coding Standards</a>.

<li>
<a href="http://www.cs.umd.edu/users/cml/cstyle/">A list of C programming style links</a>

<li>
<a href="http://www.lysator.liu.se/c/c-www.html">A very large list of C programming links</a>

<li>
<a href="http://www.geek-girl.com/unix.html">A list of Unix programming links</a>


</ul>


<h2>Portability issues</h2>

<ul>
<li>
We use ANSI C with some extensions.  In particular, we use:
<ul>
<li>enum
<li>ANSI style prototypes
<li>#elsif, #error, #warning, ## and other cpp features
</ul>

<li>
We use the following gcc extensions (see gcc documentation):
<ul>
<li>zero length arrays (useful as the last field of a struct)
<li>inline annotations on functions (see later)
<li>Labeled elements in initialisers
<li>Function attributes (mostly just no_return)
<li>Macro varargs (actually, we don't use them yet but I'm very tempted)
<li>Alastair really likes to use C++ style comments - but
    he'll probably regret it later.
<li>other stuff I've forgotten about...
</ul>

Some of these gcc/ANSI features could be avoided (for example, we
could use __inline__ instead of inline or use the usual PROTO((...))
trick in function prototypes) - but some of them can't be avoided
so we don't try with the others.

<li>
char can be signed or unsigned - always say which you mean

<li>
Some architectures have memory alignment constraints.
Others don't have any constraints but go faster if you align things.
These macros tell you which alignment to use

<pre>
  /* minimum alignment of unsigned int */
  #define ALIGNMENT_UNSIGNED_INT 4

  /* minimum alignment of long */
  #define ALIGNMENT_LONG 4

  /* minimum alignment of float */
  #define ALIGNMENT_FLOAT 4

  /* minimum alignment of double */
  #define ALIGNMENT_DOUBLE 4
</pre>

<li>
Use StgInt, StgWord and StgPtr when reading/writing ints and ptrs to
the stack or heap.  Note that, by definition, StgInt, StgWord and
StgPtr are the same size and have the same alignment constraints
even if sizeof(int) != sizeof(ptr) on that platform.

<li>
Use StgInt8, StgInt16, etc when you need a certain minimum number of
bits in a type.  Use int and nat when there's no particular
constraint.  ANSI C only guarantees that ints are at least 16 bits but
within GHC we assume they are 32 bits.  (I'm not sure if this is a
good idea - ADR)

<li>
Use StgFloat and StgDouble for floating point values which will go
on/have come from the stack or heap.  Note that StgFloat may be the
same as StgDouble on some architectures (eg Alphas) and that it might
occupy many StgWords.

<p>
Use PK_FLT(addr), PK_DBL(addr) to read StgFloat and
StgDouble values from the stack/heap, and ASSIGN_FLT(val,addr)/
ASSIGN_DBL(val,addr) to assign StgFloat/StgDouble values to heap/stack
locations.  These macros take care of alignment restrictions.

<p>
Heap/Stack locations are StgWord aligned; the alignment requirements
of an StgDouble may be more than that of StgWord, but we don't pad
misaligned StgDoubles (doing so would be too much hassle).  

<p>
Doing a direct assignment/read of an StgDouble to/from a mis-aligned
location may not work, so we use the ASSIGN_DBL(,)/PK_DBL() macro,
which goes via a temporary.  

<p>
Problem: if the architecture allows mis-aligned accesses, but prefers
aligned accesses, these macros just add an extra level of indirection.
We need to distinguish between an architecture that allows mis-aligned
accesses and one that just imposes a performance penalty (this is most
of them).  Perhaps have PREFERRED_ALIGNMENT and REQUIRED_ALIGMENT
configurations?

<li>
Avoid conditional code like this:

<pre>
  #ifdef solaris_HOST_OS
  // do something solaris specific
  #endif
</pre>

Instead, add an appropriate test to the configure.in script and use
the result of that test instead. 

<pre>
  #ifdef HAVE_BSD_H
  // use a BSD library
  #endif
</pre>

The problem is that things change from one version of an OS to another
- things get added, things get deleted, things get broken, some things
are optional extras.  Using "feature tests" instead of "system tests"
makes things a lot less brittle.  Things also tend to get documented
better.

</ul>

<h2>Debugging/robustness tricks</h2>


Anyone who has tried to debug a garbage collector or code generator
will tell you: "If a program is going to crash, it should crash as
soon, as noisily and as often as possible."  There's nothing worse
than trying to find a bug which only shows up when running GHC on
itself and doesn't manifest itself until 10 seconds after the actual
cause of the problem.

<p>
The ideas in this section are mostly aimed at this issue:

<ul>
<li>
Use assertions.  Use lots of assertions.  If you write a comment
that says "takes a +ve number" add an assertion.  If you're casting
an int to a nat, add an assertion.  If you're casting an int to a char,
add an assertion.

<li>
Write special debugging code to check the integrity of your data structures.
(Most of the runtime checking code is in <tt>src/Sanity.c</tt>)
Add extra assertions which call this code at the start and end of any
code that operates on your data structures.

<li>
When you find a hard-to-spot bug, try to think of some assertions,
sanity checks or whatever that would have made the bug easier to find.

<li>
When defining an enumeration, it's a good idea not to use 0 for normal
values.  Instead, make 0 raise an internal error.  The idea here is to
make it easier to detect pointer-related errors on the assumption that
random pointers are more likely to point to a 0 than to anything else.

<pre>
typedef enum
    { i_INTERNAL_ERROR  /* Instruction 0 raises an internal error */
    , i_PANIC           /* irrefutable pattern match failed! */
    , i_ERROR           /* user level error */

    ...
</pre>

<li> Use #warning or #error whenever you write a piece of incomplete/broken code.

<li> When testing, try to make infrequent things happen often.
     For example, make a context switch/gc/etc happen every time a 
     context switch/gc/etc can happen.  The system will run like a 
     pig but it'll catch a lot of bugs.

</ul>

<h2>Syntactic details</h2>

<ul>
<li><b>Important:</b> Put "redundant" braces or parens in your code.
Omitting braces and parens leads to very hard to spot bugs -
especially if you use macros (and you might have noticed that GHC does
this a lot!)

<p>
In particular:
<ul>
<li>
Put braces round the body of for loops, while loops, if statements, etc.
even if they "aren't needed" because it's really hard to find the resulting
bug if you mess up.  Indent them any way you like but put them in there!

<li>
When defining a macro, always put parens round args - just in case.
For example, write:
<pre>
  #define add(x,y) ((x)+(y))
</pre>
instead of
<pre>
  #define add(x,y) x+y
</pre>

<li>
Don't define macros that expand to a list of statements.  
You could just use braces as in:

<pre>
  #define ASSIGN_CC_ID(ccID)              \
        {                                 \
        ccID = CC_ID;                     \
        CC_ID++;                          \
        }
</pre>

but it's better to use the "do { ... } while (0)" trick instead:

<pre>
  #define ASSIGN_CC_ID(ccID)              \
        do {                              \
        ccID = CC_ID;                     \
        CC_ID++;                          \
        } while(0)
</pre>

The following explanation comes from 
<a href="http://www.cs.umd.edu/users/cml/cstyle/code-std-disc.txt">The Usenet C programming FAQ</a>
<pre>
10.4:   What's the best way to write a multi-statement macro?

A:      The usual goal is to write a macro that can be invoked as if it
        were a statement consisting of a single function call.  This
        means that the "caller" will be supplying the final semicolon,
        so the macro body should not.  The macro body cannot therefore
        be a simple brace-enclosed compound statement, because syntax
        errors would result if it were invoked (apparently as a single
        statement, but with a resultant extra semicolon) as the if
        branch of an if/else statement with an explicit else clause.

        The traditional solution, therefore, is to use

                #define MACRO(arg1, arg2) do {  \
                        /* declarations */      \
                        stmt1;                  \
                        stmt2;                  \
                        /* ... */               \
                        } while(0)      /* (no trailing ; ) */

        When the caller appends a semicolon, this expansion becomes a
        single statement regardless of context.  (An optimizing compiler
        will remove any "dead" tests or branches on the constant
        condition 0, although lint may complain.)

        If all of the statements in the intended macro are simple
        expressions, with no declarations or loops, another technique is
        to write a single, parenthesized expression using one or more
        comma operators.  (For an example, see the first DEBUG() macro
        in question 10.26.)  This technique also allows a value to be
        "returned."

        References: H&S Sec. 3.3.2 p. 45; CT&P Sec. 6.3 pp. 82-3.
</pre>

<li>
Don't even write macros that expand to 0 statements - they can mess you 
up as well.  Use the doNothing macro instead.
<pre>
  #define doNothing() do { } while (0)
</pre>
</ul>

<li>
Use inline functions instead of macros if possible - they're a lot
less tricky to get right and don't suffer from the usual problems
of side effects, evaluation order, multiple evaluation, etc.

<ul>
<li>Inline functions get the naming issue right.  E.g. they
  can have local variables which (in an expression context)
  macros can't.

<li> Inline functions have call-by-value semantics whereas macros
  are call-by-name.  You can be bitten by duplicated computation
  if you aren't careful.

<li> You can use inline functions from inside gdb if you compile with
  -O0 or -fkeep-inline-functions.  If you use macros, you'd better
  know what they expand to.
</ul>

However, note that macros can serve as both l-values and r-values and
can be "polymorphic" as these examples show:
<pre>
  // you can use this as an l-value or an l-value
  #define PROF_INFO(cl) (((StgClosure*)(cl))->header.profInfo)

  // polymorphic case
  // but note that min(min(1,2),3) does 3 comparisions instead of 2!!
  #define min(x,y) (((x)<=(y)) ? (x) : (y))
</pre>

<li>
Inline functions should be "static inline" because:
<ul>
<li>
gcc will delete static inlines if not used or theyre always inlined.

<li>
  if they're externed, we could get conflicts between 2 copies of the 
  same function if, for some reason, gcc is unable to delete them.
  If they're static, we still get multiple copies but at least they don't conflict.
</ul>

OTOH, the gcc manual says this
so maybe we should use extern inline?

<pre>
   When a function is both inline and `static', if all calls to the
function are integrated into the caller, and the function's address is
never used, then the function's own assembler code is never referenced.
In this case, GNU CC does not actually output assembler code for the
function, unless you specify the option `-fkeep-inline-functions'.
Some calls cannot be integrated for various reasons (in particular,
calls that precede the function's definition cannot be integrated, and
neither can recursive calls within the definition).  If there is a
nonintegrated call, then the function is compiled to assembler code as
usual.  The function must also be compiled as usual if the program
refers to its address, because that can't be inlined.

   When an inline function is not `static', then the compiler must
assume that there may be calls from other source files; since a global
symbol can be defined only once in any program, the function must not
be defined in the other source files, so the calls therein cannot be
integrated.  Therefore, a non-`static' inline function is always
compiled on its own in the usual fashion.

   If you specify both `inline' and `extern' in the function
definition, then the definition is used only for inlining.  In no case
is the function compiled on its own, not even if you refer to its
address explicitly.  Such an address becomes an external reference, as
if you had only declared the function, and had not defined it.

   This combination of `inline' and `extern' has almost the effect of a
macro.  The way to use it is to put a function definition in a header
file with these keywords, and put another copy of the definition
(lacking `inline' and `extern') in a library file.  The definition in
the header file will cause most calls to the function to be inlined.
If any uses of the function remain, they will refer to the single copy
in the library.
</pre>

<li>
This code
<pre>
int* p, q;
</pre>
looks like it declares two pointers but, in fact, only p is a pointer.
It's safer to write this:
<pre>
int* p;
int* q;
</pre>
You could also write this:
<pre>
int *p, *q;
</pre>
but Alastair prefers to split the declarations.

<li>
Try to use ANSI C's enum feature when defining lists of constants of
the same type.  Among other benefits, you'll notice that gdb uses the
name instead of its (usually inscrutable) number when printing values
with enum types and gdb will let you use the name in expressions you
type.  

<p>
Examples:
<pre>
    typedef enum { /* N.B. Used as indexes into arrays */
     NO_HEAP_PROFILING,		
     HEAP_BY_CC,		
     HEAP_BY_MOD,		
     HEAP_BY_GRP,		
     HEAP_BY_DESCR,		
     HEAP_BY_TYPE,		
     HEAP_BY_TIME		
    } ProfilingFlags;
</pre>
instead of
<pre>
    # define NO_HEAP_PROFILING	0	/* N.B. Used as indexes into arrays */
    # define HEAP_BY_CC		1
    # define HEAP_BY_MOD	2
    # define HEAP_BY_GRP	3
    # define HEAP_BY_DESCR	4
    # define HEAP_BY_TYPE	5
    # define HEAP_BY_TIME	6
</pre>
and 
<pre>
    typedef enum {
     CCchar    = 'C',
     MODchar   = 'M',
     GRPchar   = 'G',
     DESCRchar = 'D',
     TYPEchar  = 'Y',
     TIMEchar  = 'T'
    } ProfilingTag;
</pre>
instead of
<pre>
    # define CCchar    'C'
    # define MODchar   'M'
    # define GRPchar   'G'
    # define DESCRchar 'D'
    # define TYPEchar  'Y'
    # define TIMEchar  'T'
</pre>
ToDo: at the time of writing, we still use the former.

<li>
Alastair likes to use stgCast instead of C syntax.  He claims
it's easier to write and easier to grep for. YMMV.
<pre>
#define stgCast(ty,e) ((ty)(e))
</pre>

<li> We don't care too much about your indentation style but, if
you're modifying a function, please try to use the same style as the
rest of the function (or file).  

<p>
On which note:
Hugs related pieces of code often start with the line:
<pre>
  /* -*- mode: hugs-c; -*- */
</pre>
which helps Emacs mimic the indentation style used by Mark P Jones
within Hugs.  Add this to your .emacs file.
<pre>
  (defun hugs-c-mode ()
    "C mode with adjusted defaults for use with Hugs (based on linux-c-mode)"
    (interactive)
    (c-mode)
    (setq c-basic-offset 4)
    (setq indent-tabs-mode nil) ; don't use tabs to indent
    (setq c-recognize-knr-r nil)  ; no K&R here - don't pay the price
    ; no: (setq tab-width 4)

    (c-set-offset 'knr-argdecl-intro    0)
    (c-set-offset 'case-label           0)
    (c-set-offset 'statement-case-intro '++)
    (c-set-offset 'statement-case-open  '+)
    )
</pre>


</ul>

<h2>CVS issues</h2>

<ul>
<li>
Don't be tempted to reindent or reorganise large chunks of code - it
generates large diffs in which it's hard to see whether anything else
was changed.
<p>
If you must reindent or reorganise, don't do anything else in that commit
and give advance warning that you're about to do it in case anyone else
is changing that file.
</ul>



</body>
</html>