Diffstat (limited to 'docs/comm/the-beast/syntax.html')
-rw-r--r-- | docs/comm/the-beast/syntax.html | 99 |
1 files changed, 99 insertions, 0 deletions
diff --git a/docs/comm/the-beast/syntax.html b/docs/comm/the-beast/syntax.html
new file mode 100644
index 0000000000..be5bbefa17
--- /dev/null
+++ b/docs/comm/the-beast/syntax.html
@@ -0,0 +1,99 @@
+<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
+<html>
+ <head>
+ <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
+ <title>The GHC Commentary - Just Syntax</title>
+ </head>
+
+ <body BGCOLOR="FFFFFF">
+ <h1>The GHC Commentary - Just Syntax</h1>
+ <p>
+ The lexical and syntactic analysers for Haskell programs are located in
+ <a
+ href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/parser/"><code>fptools/ghc/compiler/parser/</code></a>.
+ </p>
+
+ <h2>The Lexer</h2>
+ <p>
+ The lexer is a rather tedious piece of Haskell code contained in the
+ module <a
+ href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/parser/Lex.lhs"><code>Lex</code></a>.
+ Its complexity partially stems from covering, in addition to Haskell 98,
+ also the whole range of GHC language extensions plus its ability to
+ analyse interface files in addition to normal Haskell source. The lexer
+ defines a parser monad <code>P a</code>, where <code>a</code> is the
+ type of the result expected from a successful parse. More precisely, a
+ result of type
+<blockquote><pre>
+data ParseResult a = POk PState a
+                   | PFailed Message</pre>
+</blockquote>
+ <p>
+ is produced with <code>Message</code> being from <a
+ href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/main/ErrUtils.lhs"><code>ErrUtils</code></a>
+ (and currently is simply a synonym for <code>SDoc</code>).
+ <p>
+ The record type <code>PState</code> contains information such as the
+ current source location, buffer state, contexts for layout processing,
+ and whether Glasgow extensions are accepted (either due to
+ <code>-fglasgow-exts</code> or due to reading an interface file).
Most
+ of the fields of <code>PState</code> store unboxed values; in fact, even
+ the flag indicating whether Glasgow extensions are enabled is
+ represented by an unboxed integer instead of by a <code>Bool</code>. My
+ (= chak's) guess is that this is to avoid having to perform a
+ <code>case</code> on a boxed value in the inner loop of the lexer.
+ <p>
+ The same lexer is used by the Haskell source parser, the Haskell
+ interface parser, and the package configuration parser.
+
+ <h2>The Haskell Source Parser</h2>
+ <p>
+ The parser for Haskell source files is defined in the form of a parser
+ specification for the parser generator <a
+ href="http://haskell.org/happy/">Happy</a> in the file <a
+ href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/parser/Parser.y"><code>Parser.y</code></a>.
+ The parser exports three entry points for parsing entire modules
+ (<code>parseModule</code>), individual statements
+ (<code>parseStmt</code>), and individual identifiers
+ (<code>parseIdentifier</code>), respectively. The last two are needed
+ for GHCi. All three require a parser state (of type
+ <code>PState</code>) and are invoked from <a
+ href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/main/HscMain.lhs"><code>HscMain</code></a>.
+ <p>
+ Parsing of Haskell is a rather involved process. The most challenging
+ features are probably the treatment of layout and expressions that
+ contain infix operators. The latter may be user-defined and so are not
+ easily captured in a static syntax specification. Infix operators may
+ also appear in the left hand sides of value definitions, and so, GHC's
+ parser treats those in the same way as expressions. In other words, as
+ general expressions are a syntactic superset of patterns - ok, they
+ <em>nearly</em> are - the parser simply attempts to parse a general
+ expression in such positions.
Afterwards, the generated parse tree is
+ inspected to ensure that the accepted phrase indeed forms a legal
+ pattern. This and similar checks are performed by the routines from <a
+ href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/parser/ParseUtil.lhs"><code>ParseUtil</code></a>. In
+ some cases, these routines do, in addition to checking for
+ well-formedness, also transform the parse tree, such that it fits into
+ the syntactic context in which it has been parsed; in fact, this happens
+ for patterns, which are transformed from a representation of type
+ <code>RdrNameHsExpr</code> into a representation of type
+ <code>RdrNamePat</code>.
+
+ <h2>The Haskell Interface Parser</h2>
+ <p>
+ The parser for interface files is also generated by Happy from <a href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/rename/ParseIface.y"><code>ParseIface.y</code></a>.
+ Its main routine <code>parseIface</code> is invoked from <a href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/rename/RnHiFiles.lhs"><code>RnHiFiles</code></a><code>.readIface</code>.
+
+ <h2>The Package Configuration Parser</h2>
+ <p>
+ The parser for configuration files is by far the smallest of the three
+ and is defined in <a href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/main/ParsePkgConf.y"><code>ParsePkgConf.y</code></a>.
+ It exports <code>loadPackageConfig</code>, which is used by <a href="http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/ghc/compiler/main/DriverState.hs"><code>DriverState</code></a><code>.readPackageConf</code>.
+
+ <p><small>
+<!-- hhmts start -->
+Last modified: Wed Jan 16 00:30:14 EST 2002
+<!-- hhmts end -->
+ </small>
+ </body>
+</html>
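
The commentary added by this diff describes a parse-then-check approach: a phrase is first accepted as a general expression, and afterwards the parse tree is inspected to confirm that it forms a legal pattern and is converted to a pattern representation. The following is only a minimal sketch of that idea under stated assumptions, not GHC's code. The types `Expr`, `Pat`, and `Result` and the function `checkPattern` are hypothetical stand-ins for `RdrNameHsExpr`, `RdrNamePat`, `ParseResult`, and the `ParseUtil` routines; in particular, `Result` omits the `PState` that `POk` carries.

```haskell
-- Hypothetical, heavily simplified stand-ins; none of these are GHC's definitions.
type Message = String

-- Same two-way shape as ParseResult in the commentary, but without a PState.
data Result a = Ok a | Failed Message
  deriving (Show, Eq)

instance Functor Result where
  fmap f (Ok a)     = Ok (f a)
  fmap _ (Failed m) = Failed m

instance Applicative Result where
  pure = Ok
  Ok f     <*> r = fmap f r
  Failed m <*> _ = Failed m

-- A tiny expression syntax (assumed stand-in for RdrNameHsExpr).
data Expr
  = EVar String          -- variable
  | ELit Integer         -- integer literal
  | ECon String [Expr]   -- saturated constructor application
  | EApp Expr Expr       -- general function application
  | ELam String Expr     -- lambda abstraction
  deriving (Show, Eq)

-- A tiny pattern syntax (assumed stand-in for RdrNamePat).
data Pat
  = PVar String
  | PLit Integer
  | PCon String [Pat]
  deriving (Show, Eq)

-- Inspect an already parsed expression: succeed with the corresponding
-- pattern if every node has a pattern counterpart, otherwise fail.
checkPattern :: Expr -> Result Pat
checkPattern (EVar v)      = Ok (PVar v)
checkPattern (ELit n)      = Ok (PLit n)
checkPattern (ECon c args) = PCon c <$> traverse checkPattern args
checkPattern e             = Failed ("not a legal pattern: " ++ show e)
```

In GHCi, `checkPattern (ECon "Just" [EVar "x"])` evaluates to `Ok (PCon "Just" [PVar "x"])`, whereas an expression containing a lambda or a general application yields a `Failed` value. The design point is that the grammar needs only one production for both syntactic positions, and the distinction is pushed into a post-pass over the tree.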