<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
  <head>
    <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
    <title>The GHC Commentary - Just Syntax</title>
  </head>

  <body BGCOLOR="FFFFFF">
    <h1>The GHC Commentary - Just Syntax</h1>
    <p>
      The lexical and syntactic analysers for Haskell programs are located in
      <a
      href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/parser/"><code>fptools/ghc/compiler/parser/</code></a>.
    </p>

    <h2>The Lexer</h2>
    <p>
      The lexer is a rather tedious piece of Haskell code contained in the
      module <a
      href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/parser/Lex.lhs"><code>Lex</code></a>.
      Its complexity stems partly from covering the whole range of GHC
      language extensions in addition to Haskell 98, and partly from its
      ability to analyse interface files as well as normal Haskell source.  The lexer
      defines a parser monad <code>P a</code>, where <code>a</code> is the
      type of the result expected from a successful parse.  More precisely, a
      result of type
<blockquote><pre>
data ParseResult a = POk PState a
		   | PFailed Message</pre>
</blockquote>
    <p>
      is produced, where <code>Message</code> comes from <a
      href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/main/ErrUtils.lhs"><code>ErrUtils</code></a>
      (and is currently simply a synonym for <code>SDoc</code>).
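    <p>
      As a rough illustration of how a monad of this shape can be built
      around <code>ParseResult</code>, here is a minimal sketch.  It is not
      GHC's actual definition: the two-field <code>PState</code>, the use of
      <code>String</code> for <code>Message</code>, and the names
      <code>runP</code>, <code>failP</code>, and <code>nextChar</code> are
      assumptions made for the example.
<blockquote><pre>
-- Hypothetical stand-ins; GHC's real PState and Message are much richer.
type Message = String
data PState  = PState { line :: Int, rest :: String }

data ParseResult a = POk PState a
                   | PFailed Message

-- A parser is a function from the current state to a result.
newtype P a = P { runP :: PState -> ParseResult a }

instance Functor P where
  fmap f (P m) = P (\s -> case m s of
    POk s' a    -> POk s' (f a)
    PFailed msg -> PFailed msg)

instance Applicative P where
  pure a = P (\s -> POk s a)
  P mf &lt;*> P ma = P (\s -> case mf s of
    POk s' f    -> case ma s' of
      POk s'' a   -> POk s'' (f a)
      PFailed msg -> PFailed msg
    PFailed msg -> PFailed msg)

instance Monad P where
  -- Pass the updated state on to the continuation; stop at the first failure.
  P m >>= k = P (\s -> case m s of
    POk s' a    -> runP (k a) s'
    PFailed msg -> PFailed msg)

-- Abort the parse with a message.
failP :: Message -> P a
failP msg = P (\_ -> PFailed msg)

-- Consume one character, bumping the line counter on a newline.
nextChar :: P Char
nextChar = P (\s -> case rest s of
  []     -> PFailed ("unexpected end of input at line " ++ show (line s))
  (c:cs) -> POk (PState (if c == '\n' then line s + 1 else line s) cs) c)</pre>
</blockquote>
    <p>
      Threading the state through <code>POk</code> lets each action see the
      position left behind by the previous one, while <code>PFailed</code>
      short-circuits the remainder of the parse.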
    <p>
      The record type <code>PState</code> contains information such as the
      current source location, buffer state, contexts for layout processing,
      and whether Glasgow extensions are accepted (either due to
      <code>-fglasgow-exts</code> or due to reading an interface file).  Most
      of the fields of <code>PState</code> store unboxed values; in fact, even
      the flag indicating whether Glasgow extensions are enabled is
      represented by an unboxed integer instead of by a <code>Bool</code>.  My
      (= chak's) guess is that this is to avoid having to perform a
      <code>case</code> on a boxed value in the inner loop of the lexer.
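    <p>
      The idea behind that guess can be sketched as follows.  The record
      <code>LexState</code> and its field names are invented for this
      example and are not the fields of GHC's <code>PState</code>; the sketch
      only assumes GHC's <code>MagicHash</code> extension and the
      <code>Int#</code> type from <code>GHC.Exts</code>.
<blockquote><pre>
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int#)

-- Hypothetical, cut-down state record; the field names are invented here.
data LexState = LexState
  { lsLine    :: Int#   -- current line number, unboxed
  , lsGlaExts :: Int#   -- 0# means extensions off, anything else means on
  }

-- An Int# is a raw machine integer, so scrutinising it involves no thunk
-- to force, unlike a case on a lifted Bool field.
glasgowExtsOn :: LexState -> Bool
glasgowExtsOn st = case lsGlaExts st of
  0# -> False
  _  -> True</pre>
</blockquote>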
    <p>
      The same lexer is used by the Haskell source parser, the Haskell
      interface parser, and the package configuration parser.
    
    <h2>The Haskell Source Parser</h2>
    <p>
      The parser for Haskell source files is defined in the form of a parser
      specification for the parser generator <a
      href="http://haskell.org/happy/">Happy</a> in the file <a
      href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/parser/Parser.y"><code>Parser.y</code></a>. 
      The parser exports three entry points for parsing entire modules
      (<code>parseModule</code>), individual statements
      (<code>parseStmt</code>), and individual identifiers
      (<code>parseIdentifier</code>), respectively.  The last two are needed
      for GHCi.  All three require a parser state (of type
      <code>PState</code>) and are invoked from <a
      href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/main/HscMain.lhs"><code>HscMain</code></a>.
    <p>
      Parsing of Haskell is a rather involved process.  The most challenging
      features are probably the treatment of layout and expressions that
      contain infix operators.  The latter may be user-defined and so are not
      easily captured in a static syntax specification.  Infix operators may
      also appear in patterns on the left-hand sides of value definitions, and
      so GHC's parser treats those in the same way as expressions.  In other
      words, as general expressions are a syntactic superset of patterns - ok, they
      <em>nearly</em> are - the parser simply attempts to parse a general
      expression in such positions.  Afterwards, the generated parse tree is
      inspected to ensure that the accepted phrase indeed forms a legal
      pattern.  This and similar checks are performed by the routines from <a
      href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/parser/ParseUtil.lhs"><code>ParseUtil</code></a>. In
      some cases, in addition to checking for well-formedness, these routines
      also transform the parse tree so that it fits the syntactic context in
      which it was parsed; in fact, this happens
      for patterns, which are transformed from a representation of type
      <code>RdrNameHsExpr</code> into a representation of type
      <code>RdrNamePat</code>.
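    <p>
      The parse-as-expression, then check-and-convert approach can be
      sketched on a toy syntax tree.  The types <code>Expr</code> and
      <code>Pat</code> and the function <code>checkPattern</code> below are
      hypothetical miniatures written for this example; they are not the
      definitions of <code>RdrNameHsExpr</code>, <code>RdrNamePat</code>, or
      the routines in <code>ParseUtil</code>.
<blockquote><pre>
-- Hypothetical miniature syntax trees; GHC's real types are far larger.
data Expr = EVar String        -- variable, e.g. x
          | ECon String        -- data constructor, e.g. Just
          | EApp Expr Expr     -- application
          | ELit Integer       -- integer literal
          | ELam String Expr   -- lambda: a legal expression, never a pattern
          deriving (Eq, Show)

data Pat = PVar String
         | PLit Integer
         | PCon String [Pat]   -- constructor applied to argument patterns
         deriving (Eq, Show)

-- Check that an already parsed expression is a legal pattern and, if so,
-- rebuild it in the pattern representation.
checkPattern :: Expr -> Either String Pat
checkPattern e = go e []
  where
    -- The second argument collects the converted arguments of an application.
    go (EVar v)   []   = Right (PVar v)
    go (ELit n)   []   = Right (PLit n)
    go (ECon c)   args = Right (PCon c args)
    go (EApp f x) args = checkPattern x >>= \p -> go f (p : args)
    go _          _    = Left ("not a legal pattern: " ++ show e)</pre>
</blockquote>
    <p>
      With these definitions,
      <code>checkPattern (EApp (ECon "Just") (EVar "x"))</code> yields
      <code>Right (PCon "Just" [PVar "x"])</code>, whereas a lambda or a
      variable applied to arguments is rejected with a <code>Left</code>.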

    <h2>The Haskell Interface Parser</h2>
    <p>
      The parser for interface files is also generated by Happy from <a href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/rename/ParseIface.y"><code>ParseIface.y</code></a>.
      Its main routine <code>parseIface</code> is invoked from <a href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/rename/RnHiFiles.lhs"><code>RnHiFiles</code></a><code>.readIface</code>.

    <h2>The Package Configuration Parser</h2>
    <p>
      The parser for configuration files is by far the smallest of the three
      and is defined in <a href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/main/ParsePkgConf.y"><code>ParsePkgConf.y</code></a>.
      It exports <code>loadPackageConfig</code>, which is used by <a href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/main/DriverState.hs"><code>DriverState</code></a><code>.readPackageConf</code>.
    
    <p><small>
<!-- hhmts start -->
Last modified: Wed Jan 16 00:30:14 EST 2002
<!-- hhmts end -->
    </small>
  </body>
</html>