summaryrefslogtreecommitdiff
path: root/compiler/GHC/Parser
diff options
context:
space:
mode:
authorSylvain Henry <sylvain@haskus.fr>2020-04-05 17:39:13 +0200
committerSylvain Henry <sylvain@haskus.fr>2020-04-18 20:04:46 +0200
commit15312bbb53f247c9ed2c5cf75100a9f44c1c7227 (patch)
tree8306dcc04a5b7c82464f903044dfdd589e7fdcd7 /compiler/GHC/Parser
parent3ca52151881451ce5b3a7740d003e811b586140d (diff)
downloadhaskell-15312bbb53f247c9ed2c5cf75100a9f44c1c7227.tar.gz
Modules (#13009)
* SysTools * Parser * GHC.Builtin * GHC.Iface.Recomp * Settings Update Haddock submodule Metric Decrease: Naperian parsing001
Diffstat (limited to 'compiler/GHC/Parser')
-rw-r--r--compiler/GHC/Parser/Annotation.hs378
-rw-r--r--compiler/GHC/Parser/CharClass.hs215
-rw-r--r--compiler/GHC/Parser/Header.hs361
-rw-r--r--compiler/GHC/Parser/Lexer.x3294
-rw-r--r--compiler/GHC/Parser/PostProcess.hs3090
-rw-r--r--compiler/GHC/Parser/PostProcess/Haddock.hs35
6 files changed, 7373 insertions, 0 deletions
diff --git a/compiler/GHC/Parser/Annotation.hs b/compiler/GHC/Parser/Annotation.hs
new file mode 100644
index 0000000000..dbd1f79e23
--- /dev/null
+++ b/compiler/GHC/Parser/Annotation.hs
@@ -0,0 +1,378 @@
+{-# LANGUAGE DeriveDataTypeable #-}
+
+module GHC.Parser.Annotation (
+ getAnnotation, getAndRemoveAnnotation,
+ getAnnotationComments,getAndRemoveAnnotationComments,
+ ApiAnns(..),
+ ApiAnnKey,
+ AnnKeywordId(..),
+ AnnotationComment(..),
+ IsUnicodeSyntax(..),
+ unicodeAnn,
+ HasE(..),
+ LRdrName -- Exists for haddocks only
+ ) where
+
+import GhcPrelude
+
+import GHC.Types.Name.Reader
+import Outputable
+import GHC.Types.SrcLoc
+import qualified Data.Map as Map
+import Data.Data
+
+
+{-
+Note [Api annotations]
+~~~~~~~~~~~~~~~~~~~~~~
+Given a parse tree of a Haskell module, how can we reconstruct
+the original Haskell source code, retaining all whitespace and
+source code comments? We need to track the locations of all
+elements from the original source: this includes keywords such as
+'let' / 'in' / 'do' etc as well as punctuation such as commas and
+braces, and also comments. We collectively refer to this
+metadata as the "API annotations".
+
+Rather than annotate the resulting parse tree with these locations
+directly (this would be a major change to some fairly core data
+structures in GHC), we instead capture locations for these elements in a
+structure separate from the parse tree, and returned in the
+pm_annotations field of the ParsedModule type.
+
+The full ApiAnns type is
+
+> data ApiAnns =
+> ApiAnns
+> { apiAnnItems :: Map.Map ApiAnnKey [RealSrcSpan],
+> apiAnnEofPos :: Maybe RealSrcSpan,
+> apiAnnComments :: Map.Map RealSrcSpan [RealLocated AnnotationComment],
+> apiAnnRogueComments :: [RealLocated AnnotationComment]
+> }
+
+NON-COMMENT ELEMENTS
+
+Intuitively, every AST element directly contains a bag of keywords
+(keywords can show up more than once in a node: a semicolon i.e. newline
+can show up multiple times before the next AST element), each of which
+needs to be associated with its location in the original source code.
+
+Consequently, the structure that records non-comment elements is logically
+a two level map, from the RealSrcSpan of the AST element containing it, to
+a map from keywords ('AnnKeyWord') to all locations of the keyword directly
+in the AST element:
+
+> type ApiAnnKey = (RealSrcSpan,AnnKeywordId)
+>
+> Map.Map ApiAnnKey [RealSrcSpan]
+
+So
+
+> let x = 1 in 2 *x
+
+would result in the AST element
+
+ L span (HsLet (binds for x = 1) (2 * x))
+
+and the annotations
+
+ (span,AnnLet) having the location of the 'let' keyword
+ (span,AnnEqual) having the location of the '=' sign
+ (span,AnnIn) having the location of the 'in' keyword
+
+For any given element in the AST, there is only a set number of
+keywords that are applicable for it (e.g., you'll never see an
+'import' keyword associated with a let-binding.) The set of allowed
+keywords is documented in a comment associated with the constructor
+of a given AST element, although the ground truth is in GHC.Parser
+and GHC.Parser.PostProcess (which actually add the annotations; see #13012).
+
+COMMENT ELEMENTS
+
+Every comment is associated with a *located* AnnotationComment.
+We associate comments with the lowest (most specific) AST element
+enclosing them:
+
+> Map.Map RealSrcSpan [RealLocated AnnotationComment]
+
+PARSER STATE
+
+There are three fields in PState (the parser state) which play a role
+with annotations.
+
+> annotations :: [(ApiAnnKey,[RealSrcSpan])],
+> comment_q :: [RealLocated AnnotationComment],
+> annotations_comments :: [(RealSrcSpan,[RealLocated AnnotationComment])]
+
+The 'annotations' and 'annotations_comments' fields are simple: they simply
+accumulate annotations that will end up in 'ApiAnns' at the end
+(after they are passed to Map.fromList).
+
+The 'comment_q' field captures comments as they are seen in the token stream,
+so that when they are ready to be allocated via the parser they are
+available (at the time we lex a comment, we don't know what the enclosing
+AST node of it is, so we can't associate it with a RealSrcSpan in
+annotations_comments).
+
+PARSER EMISSION OF ANNOTATIONS
+
+The parser interacts with the lexer using the function
+
+> addAnnotation :: RealSrcSpan -> AnnKeywordId -> RealSrcSpan -> P ()
+
+which takes the AST element RealSrcSpan, the annotation keyword and the
+target RealSrcSpan.
+
+This adds the annotation to the `annotations` field of `PState` and
+transfers any comments in `comment_q` WHICH ARE ENCLOSED by
+the RealSrcSpan of this element to the `annotations_comments`
+field. (Comments which are outside of this annotation are deferred
+until later. 'allocateComments' in 'Lexer' is responsible for
+making sure we only attach comments that actually fit in the 'SrcSpan'.)
+
+The wiki page describing this feature is
+https://gitlab.haskell.org/ghc/ghc/wikis/api-annotations
+
+-}
+-- ---------------------------------------------------------------------
+
+-- If you update this, update the Note [Api annotations] above
+data ApiAnns =
+ ApiAnns
+ { apiAnnItems :: Map.Map ApiAnnKey [RealSrcSpan],
+ apiAnnEofPos :: Maybe RealSrcSpan,
+ apiAnnComments :: Map.Map RealSrcSpan [RealLocated AnnotationComment],
+ apiAnnRogueComments :: [RealLocated AnnotationComment]
+ }
+
+-- If you update this, update the Note [Api annotations] above
+type ApiAnnKey = (RealSrcSpan,AnnKeywordId)
+
+
+-- | Retrieve a list of annotation 'SrcSpan's based on the 'SrcSpan'
+-- of the annotated AST element, and the known type of the annotation.
+getAnnotation :: ApiAnns -> RealSrcSpan -> AnnKeywordId -> [RealSrcSpan]
+getAnnotation anns span ann =
+ case Map.lookup ann_key ann_items of
+ Nothing -> []
+ Just ss -> ss
+ where ann_items = apiAnnItems anns
+ ann_key = (span,ann)
+
+-- | Retrieve a list of annotation 'SrcSpan's based on the 'SrcSpan'
+-- of the annotated AST element, and the known type of the annotation.
+-- The list is removed from the annotations.
+getAndRemoveAnnotation :: ApiAnns -> RealSrcSpan -> AnnKeywordId
+ -> ([RealSrcSpan],ApiAnns)
+getAndRemoveAnnotation anns span ann =
+ case Map.lookup ann_key ann_items of
+ Nothing -> ([],anns)
+ Just ss -> (ss,anns{ apiAnnItems = Map.delete ann_key ann_items })
+ where ann_items = apiAnnItems anns
+ ann_key = (span,ann)
+
+-- |Retrieve the comments allocated to the current 'SrcSpan'
+--
+-- Note: A given 'SrcSpan' may appear in multiple AST elements,
+-- beware of duplicates
+getAnnotationComments :: ApiAnns -> RealSrcSpan -> [RealLocated AnnotationComment]
+getAnnotationComments anns span =
+ case Map.lookup span (apiAnnComments anns) of
+ Just cs -> cs
+ Nothing -> []
+
+-- |Retrieve the comments allocated to the current 'SrcSpan', and
+-- remove them from the annotations
+getAndRemoveAnnotationComments :: ApiAnns -> RealSrcSpan
+ -> ([RealLocated AnnotationComment],ApiAnns)
+getAndRemoveAnnotationComments anns span =
+ case Map.lookup span ann_comments of
+ Just cs -> (cs, anns{ apiAnnComments = Map.delete span ann_comments })
+ Nothing -> ([], anns)
+ where ann_comments = apiAnnComments anns
+
+-- --------------------------------------------------------------------
+
+-- | API Annotations exist so that tools can perform source to source
+-- conversions of Haskell code. They are used to keep track of the
+-- various syntactic keywords that are not captured in the existing
+-- AST.
+--
+-- The annotations, together with original source comments are made
+-- available in the @'pm_annotations'@ field of @'GHC.ParsedModule'@.
+-- Comments are only retained if @'Opt_KeepRawTokenStream'@ is set in
+-- @'DynFlags.DynFlags'@ before parsing.
+--
+-- The wiki page describing this feature is
+-- https://gitlab.haskell.org/ghc/ghc/wikis/api-annotations
+--
+-- Note: in general the names of these are taken from the
+-- corresponding token, unless otherwise noted
+-- See note [Api annotations] above for details of the usage
+data AnnKeywordId
+ = AnnAnyclass
+ | AnnAs
+ | AnnAt
+ | AnnBang -- ^ '!'
+ | AnnBackquote -- ^ '`'
+ | AnnBy
+ | AnnCase -- ^ case or lambda case
+ | AnnClass
+ | AnnClose -- ^ '\#)' or '\#-}' etc
+ | AnnCloseB -- ^ '|)'
+ | AnnCloseBU -- ^ '|)', unicode variant
+ | AnnCloseC -- ^ '}'
+ | AnnCloseQ -- ^ '|]'
+ | AnnCloseQU -- ^ '|]', unicode variant
+ | AnnCloseP -- ^ ')'
+ | AnnCloseS -- ^ ']'
+ | AnnColon
+ | AnnComma -- ^ as a list separator
+ | AnnCommaTuple -- ^ in a RdrName for a tuple
+ | AnnDarrow -- ^ '=>'
+ | AnnDarrowU -- ^ '=>', unicode variant
+ | AnnData
+ | AnnDcolon -- ^ '::'
+ | AnnDcolonU -- ^ '::', unicode variant
+ | AnnDefault
+ | AnnDeriving
+ | AnnDo
+ | AnnDot -- ^ '.'
+ | AnnDotdot -- ^ '..'
+ | AnnElse
+ | AnnEqual
+ | AnnExport
+ | AnnFamily
+ | AnnForall
+ | AnnForallU -- ^ Unicode variant
+ | AnnForeign
+ | AnnFunId -- ^ for function name in matches where there are
+ -- multiple equations for the function.
+ | AnnGroup
+ | AnnHeader -- ^ for CType
+ | AnnHiding
+ | AnnIf
+ | AnnImport
+ | AnnIn
+ | AnnInfix -- ^ 'infix' or 'infixl' or 'infixr'
+ | AnnInstance
+ | AnnLam
+ | AnnLarrow -- ^ '<-'
+ | AnnLarrowU -- ^ '<-', unicode variant
+ | AnnLet
+ | AnnMdo
+ | AnnMinus -- ^ '-'
+ | AnnModule
+ | AnnNewtype
+ | AnnName -- ^ where a name loses its location in the AST, this carries it
+ | AnnOf
+ | AnnOpen -- ^ '(\#' or '{-\# LANGUAGE' etc
+ | AnnOpenB -- ^ '(|'
+ | AnnOpenBU -- ^ '(|', unicode variant
+ | AnnOpenC -- ^ '{'
+ | AnnOpenE -- ^ '[e|' or '[e||'
+ | AnnOpenEQ -- ^ '[|'
+ | AnnOpenEQU -- ^ '[|', unicode variant
+ | AnnOpenP -- ^ '('
+ | AnnOpenS -- ^ '['
+ | AnnDollar -- ^ prefix '$' -- TemplateHaskell
+ | AnnDollarDollar -- ^ prefix '$$' -- TemplateHaskell
+ | AnnPackageName
+ | AnnPattern
+ | AnnProc
+ | AnnQualified
+ | AnnRarrow -- ^ '->'
+ | AnnRarrowU -- ^ '->', unicode variant
+ | AnnRec
+ | AnnRole
+ | AnnSafe
+ | AnnSemi -- ^ ';'
+ | AnnSimpleQuote -- ^ '''
+ | AnnSignature
+ | AnnStatic -- ^ 'static'
+ | AnnStock
+ | AnnThen
+ | AnnThIdSplice -- ^ '$'
+ | AnnThIdTySplice -- ^ '$$'
+ | AnnThTyQuote -- ^ double '''
+ | AnnTilde -- ^ '~'
+ | AnnType
+ | AnnUnit -- ^ '()' for types
+ | AnnUsing
+ | AnnVal -- ^ e.g. INTEGER
+ | AnnValStr -- ^ String value, will need quotes when output
+ | AnnVbar -- ^ '|'
+ | AnnVia -- ^ 'via'
+ | AnnWhere
+ | Annlarrowtail -- ^ '-<'
+ | AnnlarrowtailU -- ^ '-<', unicode variant
+ | Annrarrowtail -- ^ '->'
+ | AnnrarrowtailU -- ^ '->', unicode variant
+ | AnnLarrowtail -- ^ '-<<'
+ | AnnLarrowtailU -- ^ '-<<', unicode variant
+ | AnnRarrowtail -- ^ '>>-'
+ | AnnRarrowtailU -- ^ '>>-', unicode variant
+ deriving (Eq, Ord, Data, Show)
+
+instance Outputable AnnKeywordId where
+ ppr x = text (show x)
+
+-- ---------------------------------------------------------------------
+
+data AnnotationComment =
+ -- Documentation annotations
+ AnnDocCommentNext String -- ^ something beginning '-- |'
+ | AnnDocCommentPrev String -- ^ something beginning '-- ^'
+ | AnnDocCommentNamed String -- ^ something beginning '-- $'
+ | AnnDocSection Int String -- ^ a section heading
+ | AnnDocOptions String -- ^ doc options (prune, ignore-exports, etc)
+ | AnnLineComment String -- ^ comment starting by "--"
+ | AnnBlockComment String -- ^ comment in {- -}
+ deriving (Eq, Ord, Data, Show)
+-- Note: these are based on the Token versions, but the Token type is
+-- defined in GHC.Parser.Lexer and bringing it in here would create a loop
+
+instance Outputable AnnotationComment where
+ ppr x = text (show x)
+
+-- | - 'ApiAnnotation.AnnKeywordId' : 'ApiAnnotation.AnnOpen',
+-- 'ApiAnnotation.AnnClose','ApiAnnotation.AnnComma',
+-- 'ApiAnnotation.AnnRarrow'
+-- 'ApiAnnotation.AnnTilde'
+-- - May have 'ApiAnnotation.AnnComma' when in a list
+type LRdrName = Located RdrName
+
+
+-- | Certain tokens can have alternate representations when unicode syntax is
+-- enabled. This flag is attached to those tokens in the lexer so that the
+-- original source representation can be reproduced in the corresponding
+-- 'ApiAnnotation'
+data IsUnicodeSyntax = UnicodeSyntax | NormalSyntax
+ deriving (Eq, Ord, Data, Show)
+
+-- | Convert a normal annotation into its unicode equivalent one
+unicodeAnn :: AnnKeywordId -> AnnKeywordId
+unicodeAnn AnnForall = AnnForallU
+unicodeAnn AnnDcolon = AnnDcolonU
+unicodeAnn AnnLarrow = AnnLarrowU
+unicodeAnn AnnRarrow = AnnRarrowU
+unicodeAnn AnnDarrow = AnnDarrowU
+unicodeAnn Annlarrowtail = AnnlarrowtailU
+unicodeAnn Annrarrowtail = AnnrarrowtailU
+unicodeAnn AnnLarrowtail = AnnLarrowtailU
+unicodeAnn AnnRarrowtail = AnnRarrowtailU
+unicodeAnn AnnOpenB = AnnOpenBU
+unicodeAnn AnnCloseB = AnnCloseBU
+unicodeAnn AnnOpenEQ = AnnOpenEQU
+unicodeAnn AnnCloseQ = AnnCloseQU
+unicodeAnn ann = ann
+
+
+-- | Some template haskell tokens have two variants, one with an `e` the other
+-- not:
+--
+-- > [| or [e|
+-- > [|| or [e||
+--
+-- This type indicates whether the 'e' is present or not.
+data HasE = HasE | NoE
+ deriving (Eq, Ord, Data, Show)
diff --git a/compiler/GHC/Parser/CharClass.hs b/compiler/GHC/Parser/CharClass.hs
new file mode 100644
index 0000000000..dc98d48f94
--- /dev/null
+++ b/compiler/GHC/Parser/CharClass.hs
@@ -0,0 +1,215 @@
+-- Character classification
+{-# LANGUAGE CPP #-}
+module GHC.Parser.CharClass
+ ( is_ident -- Char# -> Bool
+ , is_symbol -- Char# -> Bool
+ , is_any -- Char# -> Bool
+ , is_space -- Char# -> Bool
+ , is_lower -- Char# -> Bool
+ , is_upper -- Char# -> Bool
+ , is_digit -- Char# -> Bool
+ , is_alphanum -- Char# -> Bool
+
+ , is_decdigit, is_hexdigit, is_octdigit, is_bindigit
+ , hexDigit, octDecDigit
+ ) where
+
+#include "HsVersions.h"
+
+import GhcPrelude
+
+import Data.Bits ( Bits((.&.),(.|.)) )
+import Data.Char ( ord, chr )
+import Data.Word
+import Panic
+
+-- Bit masks
+
+cIdent, cSymbol, cAny, cSpace, cLower, cUpper, cDigit :: Word8
+cIdent = 1
+cSymbol = 2
+cAny = 4
+cSpace = 8
+cLower = 16
+cUpper = 32
+cDigit = 64
+
+-- | The predicates below look costly, but aren't, GHC+GCC do a great job
+-- at the big case below.
+
+{-# INLINABLE is_ctype #-}
+is_ctype :: Word8 -> Char -> Bool
+is_ctype mask c = (charType c .&. mask) /= 0
+
+is_ident, is_symbol, is_any, is_space, is_lower, is_upper, is_digit,
+ is_alphanum :: Char -> Bool
+is_ident = is_ctype cIdent
+is_symbol = is_ctype cSymbol
+is_any = is_ctype cAny
+is_space = is_ctype cSpace
+is_lower = is_ctype cLower
+is_upper = is_ctype cUpper
+is_digit = is_ctype cDigit
+is_alphanum = is_ctype (cLower+cUpper+cDigit)
+
+-- Utils
+
+hexDigit :: Char -> Int
+hexDigit c | is_decdigit c = ord c - ord '0'
+ | otherwise = ord (to_lower c) - ord 'a' + 10
+
+octDecDigit :: Char -> Int
+octDecDigit c = ord c - ord '0'
+
+is_decdigit :: Char -> Bool
+is_decdigit c
+ = c >= '0' && c <= '9'
+
+is_hexdigit :: Char -> Bool
+is_hexdigit c
+ = is_decdigit c
+ || (c >= 'a' && c <= 'f')
+ || (c >= 'A' && c <= 'F')
+
+is_octdigit :: Char -> Bool
+is_octdigit c = c >= '0' && c <= '7'
+
+is_bindigit :: Char -> Bool
+is_bindigit c = c == '0' || c == '1'
+
+to_lower :: Char -> Char
+to_lower c
+ | c >= 'A' && c <= 'Z' = chr (ord c - (ord 'A' - ord 'a'))
+ | otherwise = c
+
+charType :: Char -> Word8
+charType c = case c of
+ '\0' -> 0 -- \000
+ '\1' -> 0 -- \001
+ '\2' -> 0 -- \002
+ '\3' -> 0 -- \003
+ '\4' -> 0 -- \004
+ '\5' -> 0 -- \005
+ '\6' -> 0 -- \006
+ '\7' -> 0 -- \007
+ '\8' -> 0 -- \010
+ '\9' -> cSpace -- \t (not allowed in strings, so !cAny)
+ '\10' -> cSpace -- \n (ditto)
+ '\11' -> cSpace -- \v (ditto)
+ '\12' -> cSpace -- \f (ditto)
+ '\13' -> cSpace -- ^M (ditto)
+ '\14' -> 0 -- \016
+ '\15' -> 0 -- \017
+ '\16' -> 0 -- \020
+ '\17' -> 0 -- \021
+ '\18' -> 0 -- \022
+ '\19' -> 0 -- \023
+ '\20' -> 0 -- \024
+ '\21' -> 0 -- \025
+ '\22' -> 0 -- \026
+ '\23' -> 0 -- \027
+ '\24' -> 0 -- \030
+ '\25' -> 0 -- \031
+ '\26' -> 0 -- \032
+ '\27' -> 0 -- \033
+ '\28' -> 0 -- \034
+ '\29' -> 0 -- \035
+ '\30' -> 0 -- \036
+ '\31' -> 0 -- \037
+ '\32' -> cAny .|. cSpace --
+ '\33' -> cAny .|. cSymbol -- !
+ '\34' -> cAny -- "
+ '\35' -> cAny .|. cSymbol -- #
+ '\36' -> cAny .|. cSymbol -- $
+ '\37' -> cAny .|. cSymbol -- %
+ '\38' -> cAny .|. cSymbol -- &
+ '\39' -> cAny .|. cIdent -- '
+ '\40' -> cAny -- (
+ '\41' -> cAny -- )
+ '\42' -> cAny .|. cSymbol -- *
+ '\43' -> cAny .|. cSymbol -- +
+ '\44' -> cAny -- ,
+ '\45' -> cAny .|. cSymbol -- -
+ '\46' -> cAny .|. cSymbol -- .
+ '\47' -> cAny .|. cSymbol -- /
+ '\48' -> cAny .|. cIdent .|. cDigit -- 0
+ '\49' -> cAny .|. cIdent .|. cDigit -- 1
+ '\50' -> cAny .|. cIdent .|. cDigit -- 2
+ '\51' -> cAny .|. cIdent .|. cDigit -- 3
+ '\52' -> cAny .|. cIdent .|. cDigit -- 4
+ '\53' -> cAny .|. cIdent .|. cDigit -- 5
+ '\54' -> cAny .|. cIdent .|. cDigit -- 6
+ '\55' -> cAny .|. cIdent .|. cDigit -- 7
+ '\56' -> cAny .|. cIdent .|. cDigit -- 8
+ '\57' -> cAny .|. cIdent .|. cDigit -- 9
+ '\58' -> cAny .|. cSymbol -- :
+ '\59' -> cAny -- ;
+ '\60' -> cAny .|. cSymbol -- <
+ '\61' -> cAny .|. cSymbol -- =
+ '\62' -> cAny .|. cSymbol -- >
+ '\63' -> cAny .|. cSymbol -- ?
+ '\64' -> cAny .|. cSymbol -- @
+ '\65' -> cAny .|. cIdent .|. cUpper -- A
+ '\66' -> cAny .|. cIdent .|. cUpper -- B
+ '\67' -> cAny .|. cIdent .|. cUpper -- C
+ '\68' -> cAny .|. cIdent .|. cUpper -- D
+ '\69' -> cAny .|. cIdent .|. cUpper -- E
+ '\70' -> cAny .|. cIdent .|. cUpper -- F
+ '\71' -> cAny .|. cIdent .|. cUpper -- G
+ '\72' -> cAny .|. cIdent .|. cUpper -- H
+ '\73' -> cAny .|. cIdent .|. cUpper -- I
+ '\74' -> cAny .|. cIdent .|. cUpper -- J
+ '\75' -> cAny .|. cIdent .|. cUpper -- K
+ '\76' -> cAny .|. cIdent .|. cUpper -- L
+ '\77' -> cAny .|. cIdent .|. cUpper -- M
+ '\78' -> cAny .|. cIdent .|. cUpper -- N
+ '\79' -> cAny .|. cIdent .|. cUpper -- O
+ '\80' -> cAny .|. cIdent .|. cUpper -- P
+ '\81' -> cAny .|. cIdent .|. cUpper -- Q
+ '\82' -> cAny .|. cIdent .|. cUpper -- R
+ '\83' -> cAny .|. cIdent .|. cUpper -- S
+ '\84' -> cAny .|. cIdent .|. cUpper -- T
+ '\85' -> cAny .|. cIdent .|. cUpper -- U
+ '\86' -> cAny .|. cIdent .|. cUpper -- V
+ '\87' -> cAny .|. cIdent .|. cUpper -- W
+ '\88' -> cAny .|. cIdent .|. cUpper -- X
+ '\89' -> cAny .|. cIdent .|. cUpper -- Y
+ '\90' -> cAny .|. cIdent .|. cUpper -- Z
+ '\91' -> cAny -- [
+ '\92' -> cAny .|. cSymbol -- backslash
+ '\93' -> cAny -- ]
+ '\94' -> cAny .|. cSymbol -- ^
+ '\95' -> cAny .|. cIdent .|. cLower -- _
+ '\96' -> cAny -- `
+ '\97' -> cAny .|. cIdent .|. cLower -- a
+ '\98' -> cAny .|. cIdent .|. cLower -- b
+ '\99' -> cAny .|. cIdent .|. cLower -- c
+ '\100' -> cAny .|. cIdent .|. cLower -- d
+ '\101' -> cAny .|. cIdent .|. cLower -- e
+ '\102' -> cAny .|. cIdent .|. cLower -- f
+ '\103' -> cAny .|. cIdent .|. cLower -- g
+ '\104' -> cAny .|. cIdent .|. cLower -- h
+ '\105' -> cAny .|. cIdent .|. cLower -- i
+ '\106' -> cAny .|. cIdent .|. cLower -- j
+ '\107' -> cAny .|. cIdent .|. cLower -- k
+ '\108' -> cAny .|. cIdent .|. cLower -- l
+ '\109' -> cAny .|. cIdent .|. cLower -- m
+ '\110' -> cAny .|. cIdent .|. cLower -- n
+ '\111' -> cAny .|. cIdent .|. cLower -- o
+ '\112' -> cAny .|. cIdent .|. cLower -- p
+ '\113' -> cAny .|. cIdent .|. cLower -- q
+ '\114' -> cAny .|. cIdent .|. cLower -- r
+ '\115' -> cAny .|. cIdent .|. cLower -- s
+ '\116' -> cAny .|. cIdent .|. cLower -- t
+ '\117' -> cAny .|. cIdent .|. cLower -- u
+ '\118' -> cAny .|. cIdent .|. cLower -- v
+ '\119' -> cAny .|. cIdent .|. cLower -- w
+ '\120' -> cAny .|. cIdent .|. cLower -- x
+ '\121' -> cAny .|. cIdent .|. cLower -- y
+ '\122' -> cAny .|. cIdent .|. cLower -- z
+ '\123' -> cAny -- {
+ '\124' -> cAny .|. cSymbol -- |
+ '\125' -> cAny -- }
+ '\126' -> cAny .|. cSymbol -- ~
+ '\127' -> 0 -- \177
+ _ -> panic ("charType: " ++ show c)
diff --git a/compiler/GHC/Parser/Header.hs b/compiler/GHC/Parser/Header.hs
new file mode 100644
index 0000000000..e2373827f4
--- /dev/null
+++ b/compiler/GHC/Parser/Header.hs
@@ -0,0 +1,361 @@
+{-# LANGUAGE CPP #-}
+{-# LANGUAGE ViewPatterns #-}
+{-# LANGUAGE TypeFamilies #-}
+
+-----------------------------------------------------------------------------
+--
+-- | Parsing the top of a Haskell source file to get its module name,
+-- imports and options.
+--
+-- (c) Simon Marlow 2005
+-- (c) Lemmih 2006
+--
+-----------------------------------------------------------------------------
+
+module GHC.Parser.Header
+ ( getImports
+ , mkPrelImports -- used by the renamer too
+ , getOptionsFromFile
+ , getOptions
+ , optionsErrorMsgs
+ , checkProcessArgsResult
+ )
+where
+
+#include "HsVersions.h"
+
+import GhcPrelude
+
+import GHC.Platform
+import GHC.Driver.Types
+import GHC.Parser ( parseHeader )
+import GHC.Parser.Lexer
+import FastString
+import GHC.Hs
+import GHC.Types.Module
+import GHC.Builtin.Names
+import StringBuffer
+import GHC.Types.SrcLoc
+import GHC.Driver.Session
+import ErrUtils
+import Util
+import Outputable
+import Maybes
+import Bag ( emptyBag, listToBag, unitBag )
+import MonadUtils
+import Exception
+import GHC.Types.Basic
+import qualified GHC.LanguageExtensions as LangExt
+
+import Control.Monad
+import System.IO
+import System.IO.Unsafe
+import Data.List
+
+------------------------------------------------------------------------------
+
+-- | Parse the imports of a source file.
+--
+-- Throws a 'SourceError' if parsing fails.
+getImports :: DynFlags
+ -> StringBuffer -- ^ Parse this.
+ -> FilePath -- ^ Filename the buffer came from. Used for
+ -- reporting parse error locations.
+ -> FilePath -- ^ The original source filename (used for locations
+ -- in the function result)
+ -> IO (Either
+ ErrorMessages
+ ([(Maybe FastString, Located ModuleName)],
+ [(Maybe FastString, Located ModuleName)],
+ Located ModuleName))
+ -- ^ The source imports and normal imports (with optional package
+ -- names from -XPackageImports), and the module name.
+getImports dflags buf filename source_filename = do
+ let loc = mkRealSrcLoc (mkFastString filename) 1 1
+ case unP parseHeader (mkPState dflags buf loc) of
+ PFailed pst ->
+ -- assuming we're not logging warnings here as per below
+ return $ Left $ getErrorMessages pst dflags
+ POk pst rdr_module -> fmap Right $ do
+ let _ms@(_warns, errs) = getMessages pst dflags
+ -- don't log warnings: they'll be reported when we parse the file
+ -- for real. See #2500.
+ ms = (emptyBag, errs)
+ -- logWarnings warns
+ if errorsFound dflags ms
+ then throwIO $ mkSrcErr errs
+ else
+ let hsmod = unLoc rdr_module
+ mb_mod = hsmodName hsmod
+ imps = hsmodImports hsmod
+ main_loc = srcLocSpan (mkSrcLoc (mkFastString source_filename)
+ 1 1)
+ mod = mb_mod `orElse` L main_loc mAIN_NAME
+ (src_idecls, ord_idecls) = partition (ideclSource.unLoc) imps
+
+ -- GHC.Prim doesn't exist physically, so don't go looking for it.
+ ordinary_imps = filter ((/= moduleName gHC_PRIM) . unLoc
+ . ideclName . unLoc)
+ ord_idecls
+
+ implicit_prelude = xopt LangExt.ImplicitPrelude dflags
+ implicit_imports = mkPrelImports (unLoc mod) main_loc
+ implicit_prelude imps
+ convImport (L _ i) = (fmap sl_fs (ideclPkgQual i), ideclName i)
+ in
+ return (map convImport src_idecls,
+ map convImport (implicit_imports ++ ordinary_imps),
+ mod)
+
+mkPrelImports :: ModuleName
+ -> SrcSpan -- Attribute the "import Prelude" to this location
+ -> Bool -> [LImportDecl GhcPs]
+ -> [LImportDecl GhcPs]
+-- Construct the implicit declaration "import Prelude" (or not)
+--
+-- NB: opt_NoImplicitPrelude is slightly different to import Prelude ();
+-- because the former doesn't even look at Prelude.hi for instance
+-- declarations, whereas the latter does.
+mkPrelImports this_mod loc implicit_prelude import_decls
+ | this_mod == pRELUDE_NAME
+ || explicit_prelude_import
+ || not implicit_prelude
+ = []
+ | otherwise = [preludeImportDecl]
+ where
+ explicit_prelude_import
+ = notNull [ () | L _ (ImportDecl { ideclName = mod
+ , ideclPkgQual = Nothing })
+ <- import_decls
+ , unLoc mod == pRELUDE_NAME ]
+
+ preludeImportDecl :: LImportDecl GhcPs
+ preludeImportDecl
+ = L loc $ ImportDecl { ideclExt = noExtField,
+ ideclSourceSrc = NoSourceText,
+ ideclName = L loc pRELUDE_NAME,
+ ideclPkgQual = Nothing,
+ ideclSource = False,
+ ideclSafe = False, -- Not a safe import
+ ideclQualified = NotQualified,
+ ideclImplicit = True, -- Implicit!
+ ideclAs = Nothing,
+ ideclHiding = Nothing }
+
+--------------------------------------------------------------
+-- Get options
+--------------------------------------------------------------
+
+-- | Parse OPTIONS and LANGUAGE pragmas of the source file.
+--
+-- Throws a 'SourceError' if flag parsing fails (including unsupported flags.)
+getOptionsFromFile :: DynFlags
+ -> FilePath -- ^ Input file
+ -> IO [Located String] -- ^ Parsed options, if any.
+getOptionsFromFile dflags filename
+ = Exception.bracket
+ (openBinaryFile filename ReadMode)
+ (hClose)
+ (\handle -> do
+ opts <- fmap (getOptions' dflags)
+ (lazyGetToks dflags' filename handle)
+ seqList opts $ return opts)
+ where -- We don't need to get haddock doc tokens when we're just
+ -- getting the options from pragmas, and lazily lexing them
+ -- correctly is a little tricky: If there is "\n" or "\n-"
+ -- left at the end of a buffer then the haddock doc may
+ -- continue past the end of the buffer, despite the fact that
+ -- we already have an apparently-complete token.
+ -- We therefore just turn Opt_Haddock off when doing the lazy
+ -- lex.
+ dflags' = gopt_unset dflags Opt_Haddock
+
+blockSize :: Int
+-- blockSize = 17 -- for testing :-)
+blockSize = 1024
+
+lazyGetToks :: DynFlags -> FilePath -> Handle -> IO [Located Token]
+lazyGetToks dflags filename handle = do
+ buf <- hGetStringBufferBlock handle blockSize
+ unsafeInterleaveIO $ lazyLexBuf handle (pragState dflags buf loc) False blockSize
+ where
+ loc = mkRealSrcLoc (mkFastString filename) 1 1
+
+ lazyLexBuf :: Handle -> PState -> Bool -> Int -> IO [Located Token]
+ lazyLexBuf handle state eof size = do
+ case unP (lexer False return) state of
+ POk state' t -> do
+ -- pprTrace "lazyLexBuf" (text (show (buffer state'))) (return ())
+ if atEnd (buffer state') && not eof
+ -- if this token reached the end of the buffer, and we haven't
+ -- necessarily read up to the end of the file, then the token might
+ -- be truncated, so read some more of the file and lex it again.
+ then getMore handle state size
+ else case unLoc t of
+ ITeof -> return [t]
+ _other -> do rest <- lazyLexBuf handle state' eof size
+ return (t : rest)
+ _ | not eof -> getMore handle state size
+ | otherwise -> return [L (mkSrcSpanPs (last_loc state)) ITeof]
+ -- parser assumes an ITeof sentinel at the end
+
+ getMore :: Handle -> PState -> Int -> IO [Located Token]
+ getMore handle state size = do
+ -- pprTrace "getMore" (text (show (buffer state))) (return ())
+ let new_size = size * 2
+ -- double the buffer size each time we read a new block. This
+ -- counteracts the quadratic slowdown we otherwise get for very
+ -- large module names (#5981)
+ nextbuf <- hGetStringBufferBlock handle new_size
+ if (len nextbuf == 0) then lazyLexBuf handle state True new_size else do
+ newbuf <- appendStringBuffers (buffer state) nextbuf
+ unsafeInterleaveIO $ lazyLexBuf handle state{buffer=newbuf} False new_size
+
+
+getToks :: DynFlags -> FilePath -> StringBuffer -> [Located Token]
+getToks dflags filename buf = lexAll (pragState dflags buf loc)
+ where
+ loc = mkRealSrcLoc (mkFastString filename) 1 1
+
+ lexAll state = case unP (lexer False return) state of
+ POk _ t@(L _ ITeof) -> [t]
+ POk state' t -> t : lexAll state'
+ _ -> [L (mkSrcSpanPs (last_loc state)) ITeof]
+
+
+-- | Parse OPTIONS and LANGUAGE pragmas of the source file.
+--
+-- Throws a 'SourceError' if flag parsing fails (including unsupported flags.)
+getOptions :: DynFlags
+ -> StringBuffer -- ^ Input Buffer
+ -> FilePath -- ^ Source filename. Used for location info.
+ -> [Located String] -- ^ Parsed options.
+getOptions dflags buf filename
+ = getOptions' dflags (getToks dflags filename buf)
+
+-- The token parser is written manually because Happy can't
+-- return a partial result when it encounters a lexer error.
+-- We want to extract options before the buffer is passed through
+-- CPP, so we can't use the same trick as 'getImports'.
+getOptions' :: DynFlags
+ -> [Located Token] -- Input buffer
+ -> [Located String] -- Options.
+getOptions' dflags toks
+ = parseToks toks
+ where
+ parseToks (open:close:xs)
+ | IToptions_prag str <- unLoc open
+ , ITclose_prag <- unLoc close
+ = case toArgs str of
+ Left _err -> optionsParseError str dflags $ -- #15053
+ combineSrcSpans (getLoc open) (getLoc close)
+ Right args -> map (L (getLoc open)) args ++ parseToks xs
+ parseToks (open:close:xs)
+ | ITinclude_prag str <- unLoc open
+ , ITclose_prag <- unLoc close
+ = map (L (getLoc open)) ["-#include",removeSpaces str] ++
+ parseToks xs
+ parseToks (open:close:xs)
+ | ITdocOptions str <- unLoc open
+ , ITclose_prag <- unLoc close
+ = map (L (getLoc open)) ["-haddock-opts", removeSpaces str]
+ ++ parseToks xs
+ parseToks (open:xs)
+ | ITlanguage_prag <- unLoc open
+ = parseLanguage xs
+ parseToks (comment:xs) -- Skip over comments
+ | isComment (unLoc comment)
+ = parseToks xs
+ parseToks _ = []
+ parseLanguage ((L loc (ITconid fs)):rest)
+ = checkExtension dflags (L loc fs) :
+ case rest of
+ (L _loc ITcomma):more -> parseLanguage more
+ (L _loc ITclose_prag):more -> parseToks more
+ (L loc _):_ -> languagePragParseError dflags loc
+ [] -> panic "getOptions'.parseLanguage(1) went past eof token"
+ parseLanguage (tok:_)
+ = languagePragParseError dflags (getLoc tok)
+ parseLanguage []
+ = panic "getOptions'.parseLanguage(2) went past eof token"
+
+ isComment :: Token -> Bool
+ isComment c =
+ case c of
+ (ITlineComment {}) -> True
+ (ITblockComment {}) -> True
+ (ITdocCommentNext {}) -> True
+ (ITdocCommentPrev {}) -> True
+ (ITdocCommentNamed {}) -> True
+ (ITdocSection {}) -> True
+ _ -> False
+
+-----------------------------------------------------------------------------
+
+-- | Complain about non-dynamic flags in OPTIONS pragmas.
+--
+-- Throws a 'SourceError' if the input list is non-empty claiming that the
+-- input flags are unknown.
+checkProcessArgsResult :: MonadIO m => DynFlags -> [Located String] -> m ()
+checkProcessArgsResult dflags flags
+ = when (notNull flags) $
+ liftIO $ throwIO $ mkSrcErr $ listToBag $ map mkMsg flags
+ where mkMsg (L loc flag)
+ = mkPlainErrMsg dflags loc $
+ (text "unknown flag in {-# OPTIONS_GHC #-} pragma:" <+>
+ text flag)
+
+-----------------------------------------------------------------------------
+
+checkExtension :: DynFlags -> Located FastString -> Located String
+checkExtension dflags (L l ext)
+-- Checks if a given extension is valid, and if so returns
+-- its corresponding flag. Otherwise it throws an exception.
+ = if ext' `elem` supported
+ then L l ("-X"++ext')
+ else unsupportedExtnError dflags l ext'
+ where
+ ext' = unpackFS ext
+ supported = supportedLanguagesAndExtensions $ platformMini $ targetPlatform dflags
+
+languagePragParseError :: DynFlags -> SrcSpan -> a
+languagePragParseError dflags loc =
+ throwErr dflags loc $
+ vcat [ text "Cannot parse LANGUAGE pragma"
+ , text "Expecting comma-separated list of language options,"
+ , text "each starting with a capital letter"
+ , nest 2 (text "E.g. {-# LANGUAGE TemplateHaskell, GADTs #-}") ]
+
+unsupportedExtnError :: DynFlags -> SrcSpan -> String -> a
+unsupportedExtnError dflags loc unsup =
+ throwErr dflags loc $
+ text "Unsupported extension: " <> text unsup $$
+ if null suggestions then Outputable.empty else text "Perhaps you meant" <+> quotedListWithOr (map text suggestions)
+ where
+ supported = supportedLanguagesAndExtensions $ platformMini $ targetPlatform dflags
+ suggestions = fuzzyMatch unsup supported
+
+
+optionsErrorMsgs :: DynFlags -> [String] -> [Located String] -> FilePath -> Messages
+optionsErrorMsgs dflags unhandled_flags flags_lines _filename
+ = (emptyBag, listToBag (map mkMsg unhandled_flags_lines))
+ where unhandled_flags_lines :: [Located String]
+ unhandled_flags_lines = [ L l f
+ | f <- unhandled_flags
+ , L l f' <- flags_lines
+ , f == f' ]
+ mkMsg (L flagSpan flag) =
+ ErrUtils.mkPlainErrMsg dflags flagSpan $
+ text "unknown flag in {-# OPTIONS_GHC #-} pragma:" <+> text flag
+
+optionsParseError :: String -> DynFlags -> SrcSpan -> a -- #15053
+optionsParseError str dflags loc =
+ throwErr dflags loc $
+ vcat [ text "Error while parsing OPTIONS_GHC pragma."
+ , text "Expecting whitespace-separated list of GHC options."
+ , text " E.g. {-# OPTIONS_GHC -Wall -O2 #-}"
+ , text ("Input was: " ++ show str) ]
+
+throwErr :: DynFlags -> SrcSpan -> SDoc -> a -- #15053
+throwErr dflags loc doc =
+ throw $ mkSrcErr $ unitBag $ mkPlainErrMsg dflags loc doc
diff --git a/compiler/GHC/Parser/Lexer.x b/compiler/GHC/Parser/Lexer.x
new file mode 100644
index 0000000000..17b6674c95
--- /dev/null
+++ b/compiler/GHC/Parser/Lexer.x
@@ -0,0 +1,3294 @@
+-----------------------------------------------------------------------------
+-- (c) The University of Glasgow, 2006
+--
+-- GHC's lexer for Haskell 2010 [1].
+--
+-- This is a combination of an Alex-generated lexer [2] from a regex
+-- definition, with some hand-coded bits. [3]
+--
+-- Completely accurate information about token-spans within the source
+-- file is maintained. Every token has a start and end RealSrcLoc
+-- attached to it.
+--
+-- References:
+-- [1] https://www.haskell.org/onlinereport/haskell2010/haskellch2.html
+-- [2] http://www.haskell.org/alex/
+-- [3] https://gitlab.haskell.org/ghc/ghc/wikis/commentary/compiler/parser
+--
+-----------------------------------------------------------------------------
+
+-- ToDo / known bugs:
+-- - parsing integers is a bit slow
+-- - readRational is a bit slow
+--
+-- Known bugs, that were also in the previous version:
+-- - M... should be 3 tokens, not 1.
+-- - pragma-end should be only valid in a pragma
+
+-- qualified operator NOTES.
+--
+-- - If M.(+) is a single lexeme, then..
+-- - Probably (+) should be a single lexeme too, for consistency.
+-- Otherwise ( + ) would be a prefix operator, but M.( + ) would not be.
+-- - But we have to rule out reserved operators, otherwise (..) becomes
+-- a different lexeme.
+-- - Should we therefore also rule out reserved operators in the qualified
+-- form? This is quite difficult to achieve. We don't do it for
+-- qualified varids.
+
+
+-- -----------------------------------------------------------------------------
+-- Alex "Haskell code fragment top"
+
+{
+{-# LANGUAGE BangPatterns #-}
+{-# LANGUAGE LambdaCase #-}
+{-# LANGUAGE MultiWayIf #-}
+
+{-# OPTIONS_GHC -funbox-strict-fields #-}
+{-# OPTIONS_GHC -Wno-incomplete-uni-patterns #-}
+
+module GHC.Parser.Lexer (
+ Token(..), lexer, lexerDbg, pragState, mkPState, mkPStatePure, PState(..),
+ P(..), ParseResult(..), mkParserFlags, mkParserFlags', ParserFlags(..),
+ appendWarning,
+ appendError,
+ allocateComments,
+ MonadP(..),
+ getRealSrcLoc, getPState, withThisPackage,
+ failMsgP, failLocMsgP, srcParseFail,
+ getErrorMessages, getMessages,
+ popContext, pushModuleContext, setLastToken, setSrcLoc,
+ activeContext, nextIsEOF,
+ getLexState, popLexState, pushLexState,
+ ExtBits(..),
+ xtest,
+ lexTokenStream,
+ AddAnn(..),mkParensApiAnn,
+ addAnnsAt,
+ commentToAnnotation
+ ) where
+
+import GhcPrelude
+
+-- base
+import Control.Monad
+import Data.Bits
+import Data.Char
+import Data.List
+import Data.Maybe
+import Data.Word
+
+import EnumSet (EnumSet)
+import qualified EnumSet
+
+-- ghc-boot
+import qualified GHC.LanguageExtensions as LangExt
+
+-- bytestring
+import Data.ByteString (ByteString)
+
+-- containers
+import Data.Map (Map)
+import qualified Data.Map as Map
+
+-- compiler/utils
+import Bag
+import Outputable
+import StringBuffer
+import FastString
+import GHC.Types.Unique.FM
+import Util ( readRational, readHexRational )
+
+-- compiler/main
+import ErrUtils
+import GHC.Driver.Session as DynFlags
+
+-- compiler/basicTypes
+import GHC.Types.SrcLoc
+import GHC.Types.Module
+import GHC.Types.Basic ( InlineSpec(..), RuleMatchInfo(..),
+ IntegralLit(..), FractionalLit(..),
+ SourceText(..) )
+
+-- compiler/parser
+import GHC.Parser.CharClass
+
+import GHC.Parser.Annotation
+}
+
+-- -----------------------------------------------------------------------------
+-- Alex "Character set macros"
+
+-- NB: The logic behind these definitions is also reflected in basicTypes/Lexeme.hs
+-- Any changes here should likely be reflected there.
+$unispace = \x05 -- Trick Alex into handling Unicode. See [Unicode in Alex].
+$nl = [\n\r\f]
+$whitechar = [$nl\v\ $unispace]
+$white_no_nl = $whitechar # \n -- TODO #8424
+$tab = \t
+
+$ascdigit = 0-9
+$unidigit = \x03 -- Trick Alex into handling Unicode. See [Unicode in Alex].
+$decdigit = $ascdigit -- for now, should really be $digit (ToDo)
+$digit = [$ascdigit $unidigit]
+
+$special = [\(\)\,\;\[\]\`\{\}]
+$ascsymbol = [\!\#\$\%\&\*\+\.\/\<\=\>\?\@\\\^\|\-\~\:]
+$unisymbol = \x04 -- Trick Alex into handling Unicode. See [Unicode in Alex].
+$symbol = [$ascsymbol $unisymbol] # [$special \_\"\']
+
+$unilarge = \x01 -- Trick Alex into handling Unicode. See [Unicode in Alex].
+$asclarge = [A-Z]
+$large = [$asclarge $unilarge]
+
+$unismall = \x02 -- Trick Alex into handling Unicode. See [Unicode in Alex].
+$ascsmall = [a-z]
+$small = [$ascsmall $unismall \_]
+
+$unigraphic = \x06 -- Trick Alex into handling Unicode. See [Unicode in Alex].
+$graphic = [$small $large $symbol $digit $special $unigraphic \"\']
+
+$binit = 0-1
+$octit = 0-7
+$hexit = [$decdigit A-F a-f]
+
+$uniidchar = \x07 -- Trick Alex into handling Unicode. See [Unicode in Alex].
+$idchar = [$small $large $digit $uniidchar \']
+
+$pragmachar = [$small $large $digit]
+
+$docsym = [\| \^ \* \$]
+
+
+-- -----------------------------------------------------------------------------
+-- Alex "Regular expression macros"
+
+@varid = $small $idchar* -- variable identifiers
+@conid = $large $idchar* -- constructor identifiers
+
+@varsym = ($symbol # \:) $symbol* -- variable (operator) symbol
+@consym = \: $symbol* -- constructor (operator) symbol
+
+-- See Note [Lexing NumericUnderscores extension] and #14473
+@numspc = _* -- numeric spacer (#14473)
+@decimal = $decdigit(@numspc $decdigit)*
+@binary = $binit(@numspc $binit)*
+@octal = $octit(@numspc $octit)*
+@hexadecimal = $hexit(@numspc $hexit)*
+@exponent = @numspc [eE] [\-\+]? @decimal
+@bin_exponent = @numspc [pP] [\-\+]? @decimal
+
+@qual = (@conid \.)+
+@qvarid = @qual @varid
+@qconid = @qual @conid
+@qvarsym = @qual @varsym
+@qconsym = @qual @consym
+
+@floating_point = @numspc @decimal \. @decimal @exponent? | @numspc @decimal @exponent
+@hex_floating_point = @numspc @hexadecimal \. @hexadecimal @bin_exponent? | @numspc @hexadecimal @bin_exponent
+
+-- normal signed numerical literals can only be explicitly negative,
+-- not explicitly positive (contrast @exponent)
+@negative = \-
+@signed = @negative ?
+
+
+-- -----------------------------------------------------------------------------
+-- Alex "Identifier"
+
+haskell :-
+
+
+-- -----------------------------------------------------------------------------
+-- Alex "Rules"
+
+-- everywhere: skip whitespace
+$white_no_nl+ ;
+$tab { warnTab }
+
+-- Everywhere: deal with nested comments. We explicitly rule out
+-- pragmas, "{-#", so that we don't accidentally treat them as comments.
+-- (this can happen even though pragmas will normally take precedence due to
+-- longest-match, because pragmas aren't valid in every state, but comments
+-- are). We also rule out nested Haddock comments, if the -haddock flag is
+-- set.
+
+"{-" / { isNormalComment } { nested_comment lexToken }
+
+-- Single-line comments are a bit tricky. Haskell 98 says that two or
+-- more dashes followed by a symbol should be parsed as a varsym, so we
+-- have to exclude those.
+
+-- Since Haddock comments aren't valid in every state, we need to rule them
+-- out here.
+
+-- The following two rules match comments that begin with two dashes, but
+-- continue with a different character. The rules test that this character
+-- is not a symbol (in which case we'd have a varsym), and that it's not a
+-- space followed by a Haddock comment symbol (docsym) (in which case we'd
+-- have a Haddock comment). The rules then munch the rest of the line.
+
+"-- " ~$docsym .* { lineCommentToken }
+"--" [^$symbol \ ] .* { lineCommentToken }
+
+-- Next, match Haddock comments if no -haddock flag
+
+"-- " $docsym .* / { alexNotPred (ifExtension HaddockBit) } { lineCommentToken }
+
+-- Now, when we've matched comments that begin with 2 dashes and continue
+-- with a different character, we need to match comments that begin with three
+-- or more dashes (which clearly can't be Haddock comments). We only need to
+-- make sure that the first non-dash character isn't a symbol, and munch the
+-- rest of the line.
+
+"---"\-* ~$symbol .* { lineCommentToken }
+
+-- Since the previous rules all match dashes followed by at least one
+-- character, we also need to match a whole line filled with just dashes.
+
+"--"\-* / { atEOL } { lineCommentToken }
+
+-- We need this rule since none of the other single line comment rules
+-- actually match this case.
+
+"-- " / { atEOL } { lineCommentToken }
+
+-- 'bol' state: beginning of a line. Slurp up all the whitespace (including
+-- blank lines) until we find a non-whitespace character, then do layout
+-- processing.
+--
+-- One slight wibble here: what if the line begins with {-#? In
+-- theory, we have to lex the pragma to see if it's one we recognise,
+-- and if it is, then we backtrack and do_bol, otherwise we treat it
+-- as a nested comment. We don't bother with this: if the line begins
+-- with {-#, then we'll assume it's a pragma we know about and go for do_bol.
+<bol> {
+ \n ;
+ ^\# line { begin line_prag1 }
+ ^\# / { followedByDigit } { begin line_prag1 }
+ ^\# pragma .* \n ; -- GCC 3.3 CPP generated, apparently
+ ^\# \! .* \n ; -- #!, for scripts
+ () { do_bol }
+}
+
+-- after a layout keyword (let, where, do, of), we begin a new layout
+-- context if the curly brace is missing.
+-- Careful! This stuff is quite delicate.
+<layout, layout_do, layout_if> {
+ \{ / { notFollowedBy '-' } { hopefully_open_brace }
+ -- we might encounter {-# here, but {- has been handled already
+ \n ;
+ ^\# (line)? { begin line_prag1 }
+}
+
+-- after an 'if', a vertical bar starts a layout context for MultiWayIf
+<layout_if> {
+ \| / { notFollowedBySymbol } { new_layout_context True dontGenerateSemic ITvbar }
+ () { pop }
+}
+
+-- do is treated in a subtly different way, see new_layout_context
+<layout> () { new_layout_context True generateSemic ITvocurly }
+<layout_do> () { new_layout_context False generateSemic ITvocurly }
+
+-- after a new layout context which was found to be to the left of the
+-- previous context, we have generated a '{' token, and we now need to
+-- generate a matching '}' token.
+<layout_left> () { do_layout_left }
+
+<0,option_prags> \n { begin bol }
+
+"{-#" $whitechar* $pragmachar+ / { known_pragma linePrags }
+ { dispatch_pragmas linePrags }
+
+-- single-line line pragmas, of the form
+-- # <line> "<file>" <extra-stuff> \n
+<line_prag1> {
+ @decimal $white_no_nl+ \" [$graphic \ ]* \" { setLineAndFile line_prag1a }
+ () { failLinePrag1 }
+}
+<line_prag1a> .* { popLinePrag1 }
+
+-- Haskell-style line pragmas, of the form
+-- {-# LINE <line> "<file>" #-}
+<line_prag2> {
+ @decimal $white_no_nl+ \" [$graphic \ ]* \" { setLineAndFile line_prag2a }
+}
+<line_prag2a> "#-}"|"-}" { pop }
+ -- NOTE: accept -} at the end of a LINE pragma, for compatibility
+ -- with older versions of GHC which generated these.
+
+-- Haskell-style column pragmas, of the form
+-- {-# COLUMN <column> #-}
+<column_prag> @decimal $whitechar* "#-}" { setColumn }
+
+<0,option_prags> {
+ "{-#" $whitechar* $pragmachar+
+ $whitechar+ $pragmachar+ / { known_pragma twoWordPrags }
+ { dispatch_pragmas twoWordPrags }
+
+ "{-#" $whitechar* $pragmachar+ / { known_pragma oneWordPrags }
+ { dispatch_pragmas oneWordPrags }
+
+ -- We ignore all these pragmas, but don't generate a warning for them
+ "{-#" $whitechar* $pragmachar+ / { known_pragma ignoredPrags }
+ { dispatch_pragmas ignoredPrags }
+
+ -- ToDo: should only be valid inside a pragma:
+ "#-}" { endPrag }
+}
+
+<option_prags> {
+ "{-#" $whitechar* $pragmachar+ / { known_pragma fileHeaderPrags }
+ { dispatch_pragmas fileHeaderPrags }
+}
+
+<0> {
+ -- In the "0" mode we ignore these pragmas
+ "{-#" $whitechar* $pragmachar+ / { known_pragma fileHeaderPrags }
+ { nested_comment lexToken }
+}
+
+<0,option_prags> {
+ "{-#" { warnThen Opt_WarnUnrecognisedPragmas (text "Unrecognised pragma")
+ (nested_comment lexToken) }
+}
+
+-- '0' state: ordinary lexemes
+
+-- Haddock comments
+
+<0,option_prags> {
+ "-- " $docsym / { ifExtension HaddockBit } { multiline_doc_comment }
+ "{-" \ ? $docsym / { ifExtension HaddockBit } { nested_doc_comment }
+}
+
+-- "special" symbols
+
+<0> {
+ "[|" / { ifExtension ThQuotesBit } { token (ITopenExpQuote NoE NormalSyntax) }
+ "[||" / { ifExtension ThQuotesBit } { token (ITopenTExpQuote NoE) }
+ "[e|" / { ifExtension ThQuotesBit } { token (ITopenExpQuote HasE NormalSyntax) }
+ "[e||" / { ifExtension ThQuotesBit } { token (ITopenTExpQuote HasE) }
+ "[p|" / { ifExtension ThQuotesBit } { token ITopenPatQuote }
+ "[d|" / { ifExtension ThQuotesBit } { layout_token ITopenDecQuote }
+ "[t|" / { ifExtension ThQuotesBit } { token ITopenTypQuote }
+ "|]" / { ifExtension ThQuotesBit } { token (ITcloseQuote NormalSyntax) }
+ "||]" / { ifExtension ThQuotesBit } { token ITcloseTExpQuote }
+
+ "[" @varid "|" / { ifExtension QqBit } { lex_quasiquote_tok }
+
+ -- qualified quasi-quote (#5555)
+ "[" @qvarid "|" / { ifExtension QqBit } { lex_qquasiquote_tok }
+
+ $unigraphic -- ⟦
+ / { ifCurrentChar '⟦' `alexAndPred`
+ ifExtension UnicodeSyntaxBit `alexAndPred`
+ ifExtension ThQuotesBit }
+ { token (ITopenExpQuote NoE UnicodeSyntax) }
+ $unigraphic -- ⟧
+ / { ifCurrentChar '⟧' `alexAndPred`
+ ifExtension UnicodeSyntaxBit `alexAndPred`
+ ifExtension ThQuotesBit }
+ { token (ITcloseQuote UnicodeSyntax) }
+}
+
+<0> {
+ "(|"
+ / { ifExtension ArrowsBit `alexAndPred`
+ notFollowedBySymbol }
+ { special (IToparenbar NormalSyntax) }
+ "|)"
+ / { ifExtension ArrowsBit }
+ { special (ITcparenbar NormalSyntax) }
+
+ $unigraphic -- ⦇
+ / { ifCurrentChar '⦇' `alexAndPred`
+ ifExtension UnicodeSyntaxBit `alexAndPred`
+ ifExtension ArrowsBit }
+ { special (IToparenbar UnicodeSyntax) }
+ $unigraphic -- ⦈
+ / { ifCurrentChar '⦈' `alexAndPred`
+ ifExtension UnicodeSyntaxBit `alexAndPred`
+ ifExtension ArrowsBit }
+ { special (ITcparenbar UnicodeSyntax) }
+}
+
+<0> {
+ \? @varid / { ifExtension IpBit } { skip_one_varid ITdupipvarid }
+}
+
+<0> {
+ "#" @varid / { ifExtension OverloadedLabelsBit } { skip_one_varid ITlabelvarid }
+}
+
+<0> {
+ "(#" / { ifExtension UnboxedTuplesBit `alexOrPred`
+ ifExtension UnboxedSumsBit }
+ { token IToubxparen }
+ "#)" / { ifExtension UnboxedTuplesBit `alexOrPred`
+ ifExtension UnboxedSumsBit }
+ { token ITcubxparen }
+}
+
+<0,option_prags> {
+ \( { special IToparen }
+ \) { special ITcparen }
+ \[ { special ITobrack }
+ \] { special ITcbrack }
+ \, { special ITcomma }
+ \; { special ITsemi }
+ \` { special ITbackquote }
+
+ \{ { open_brace }
+ \} { close_brace }
+}
+
+<0,option_prags> {
+ @qvarid { idtoken qvarid }
+ @qconid { idtoken qconid }
+ @varid { varid }
+ @conid { idtoken conid }
+}
+
+<0> {
+ @qvarid "#"+ / { ifExtension MagicHashBit } { idtoken qvarid }
+ @qconid "#"+ / { ifExtension MagicHashBit } { idtoken qconid }
+ @varid "#"+ / { ifExtension MagicHashBit } { varid }
+ @conid "#"+ / { ifExtension MagicHashBit } { idtoken conid }
+}
+
+-- Operators classified into prefix, suffix, tight infix, and loose infix.
+-- See Note [Whitespace-sensitive operator parsing]
+<0> {
+ @varsym / { precededByClosingToken `alexAndPred` followedByOpeningToken } { varsym_tight_infix }
+ @varsym / { followedByOpeningToken } { varsym_prefix }
+ @varsym / { precededByClosingToken } { varsym_suffix }
+ @varsym { varsym_loose_infix }
+}
+
+-- ToDo: - move `var` and (sym) into lexical syntax?
+-- - remove backquote from $special?
+<0> {
+ @qvarsym { idtoken qvarsym }
+ @qconsym { idtoken qconsym }
+ @consym { consym }
+}
+
+-- For the normal boxed literals we need to be careful
+-- when trying to be close to Haskell98
+
+-- Note [Lexing NumericUnderscores extension] (#14473)
+--
+-- NumericUnderscores extension allows underscores in numeric literals.
+-- Multiple underscores are represented with @numspc macro.
+-- To be simpler, we have only the definitions with underscores.
+-- And then we have a separate function (tok_integral and tok_frac)
+-- that validates the literals.
+-- If extensions are not enabled, check that there are no underscores.
+--
+<0> {
+ -- Normal integral literals (:: Num a => a, from Integer)
+ @decimal { tok_num positive 0 0 decimal }
+ 0[bB] @numspc @binary / { ifExtension BinaryLiteralsBit } { tok_num positive 2 2 binary }
+ 0[oO] @numspc @octal { tok_num positive 2 2 octal }
+ 0[xX] @numspc @hexadecimal { tok_num positive 2 2 hexadecimal }
+ @negative @decimal / { ifExtension NegativeLiteralsBit } { tok_num negative 1 1 decimal }
+ @negative 0[bB] @numspc @binary / { ifExtension NegativeLiteralsBit `alexAndPred`
+ ifExtension BinaryLiteralsBit } { tok_num negative 3 3 binary }
+ @negative 0[oO] @numspc @octal / { ifExtension NegativeLiteralsBit } { tok_num negative 3 3 octal }
+ @negative 0[xX] @numspc @hexadecimal / { ifExtension NegativeLiteralsBit } { tok_num negative 3 3 hexadecimal }
+
+ -- Normal rational literals (:: Fractional a => a, from Rational)
+ @floating_point { tok_frac 0 tok_float }
+ @negative @floating_point / { ifExtension NegativeLiteralsBit } { tok_frac 0 tok_float }
+ 0[xX] @numspc @hex_floating_point / { ifExtension HexFloatLiteralsBit } { tok_frac 0 tok_hex_float }
+ @negative 0[xX] @numspc @hex_floating_point
+ / { ifExtension HexFloatLiteralsBit `alexAndPred`
+ ifExtension NegativeLiteralsBit } { tok_frac 0 tok_hex_float }
+}
+
+<0> {
+ -- Unboxed ints (:: Int#) and words (:: Word#)
+ -- It's simpler (and faster?) to give separate cases to the negatives,
+ -- especially considering octal/hexadecimal prefixes.
+ @decimal \# / { ifExtension MagicHashBit } { tok_primint positive 0 1 decimal }
+ 0[bB] @numspc @binary \# / { ifExtension MagicHashBit `alexAndPred`
+ ifExtension BinaryLiteralsBit } { tok_primint positive 2 3 binary }
+ 0[oO] @numspc @octal \# / { ifExtension MagicHashBit } { tok_primint positive 2 3 octal }
+ 0[xX] @numspc @hexadecimal \# / { ifExtension MagicHashBit } { tok_primint positive 2 3 hexadecimal }
+ @negative @decimal \# / { ifExtension MagicHashBit } { tok_primint negative 1 2 decimal }
+ @negative 0[bB] @numspc @binary \# / { ifExtension MagicHashBit `alexAndPred`
+ ifExtension BinaryLiteralsBit } { tok_primint negative 3 4 binary }
+ @negative 0[oO] @numspc @octal \# / { ifExtension MagicHashBit } { tok_primint negative 3 4 octal }
+ @negative 0[xX] @numspc @hexadecimal \#
+ / { ifExtension MagicHashBit } { tok_primint negative 3 4 hexadecimal }
+
+ @decimal \# \# / { ifExtension MagicHashBit } { tok_primword 0 2 decimal }
+ 0[bB] @numspc @binary \# \# / { ifExtension MagicHashBit `alexAndPred`
+ ifExtension BinaryLiteralsBit } { tok_primword 2 4 binary }
+ 0[oO] @numspc @octal \# \# / { ifExtension MagicHashBit } { tok_primword 2 4 octal }
+ 0[xX] @numspc @hexadecimal \# \# / { ifExtension MagicHashBit } { tok_primword 2 4 hexadecimal }
+
+ -- Unboxed floats and doubles (:: Float#, :: Double#)
+ -- prim_{float,double} work with signed literals
+ @signed @floating_point \# / { ifExtension MagicHashBit } { tok_frac 1 tok_primfloat }
+ @signed @floating_point \# \# / { ifExtension MagicHashBit } { tok_frac 2 tok_primdouble }
+}
+
+-- Strings and chars are lexed by hand-written code. The reason is
+-- that even if we recognise the string or char here in the regex
+-- lexer, we would still have to parse the string afterward in order
+-- to convert it to a String.
+<0> {
+ \' { lex_char_tok }
+ \" { lex_string_tok }
+}
+
+-- Note [Whitespace-sensitive operator parsing]
+-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+-- In accord with GHC Proposal #229 https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0229-whitespace-bang-patterns.rst
+-- we classify operator occurrences into four categories:
+--
+-- a ! b -- a loose infix occurrence
+-- a!b -- a tight infix occurrence
+-- a !b -- a prefix occurrence
+-- a! b -- a suffix occurrence
+--
+-- The rules are a bit more elaborate than simply checking for whitespace, in
+-- order to accommodate the following use cases:
+--
+-- f (!a) = ... -- prefix occurrence
+-- g (a !) -- loose infix occurrence
+-- g (! a) -- loose infix occurrence
+--
+-- The precise rules are as follows:
+--
+-- * Identifiers, literals, and opening brackets (, (#, [, [|, [||, [p|, [e|,
+-- [t|, {, are considered "opening tokens". The function followedByOpeningToken
+-- tests whether the next token is an opening token.
+--
+-- * Identifiers, literals, and closing brackets ), #), ], |], },
+-- are considered "closing tokens". The function precededByClosingToken tests
+-- whether the previous token is a closing token.
+--
+-- * Whitespace, comments, separators, and other tokens, are considered
+-- neither opening nor closing.
+--
+-- * Any unqualified operator occurrence is classified as prefix, suffix, or
+-- tight/loose infix, based on preceding and following tokens:
+--
+-- precededByClosingToken | followedByOpeningToken | Occurrence
+-- ------------------------+------------------------+------------
+-- False | True | prefix
+-- True | False | suffix
+-- True | True | tight infix
+-- False | False | loose infix
+-- ------------------------+------------------------+------------
+--
+-- A loose infix occurrence is always considered an operator. Other types of
+-- occurrences may be assigned a special per-operator meaning override:
+--
+-- Operator | Occurrence | Token returned
+-- ----------+---------------+------------------------------------------
+-- ! | prefix | ITbang
+-- | | strictness annotation or bang pattern,
+-- | | e.g. f !x = rhs, data T = MkT !a
+-- | not prefix | ITvarsym "!"
+-- | | ordinary operator or type operator,
+-- | | e.g. xs ! 3, (! x), Int ! Bool
+-- ----------+---------------+------------------------------------------
+-- ~ | prefix | ITtilde
+-- | | laziness annotation or lazy pattern,
+-- | | e.g. f ~x = rhs, data T = MkT ~a
+-- | not prefix | ITvarsym "~"
+-- | | ordinary operator or type operator,
+-- | | e.g. xs ~ 3, (~ x), Int ~ Bool
+-- ----------+---------------+------------------------------------------
+-- $ $$ | prefix | ITdollar, ITdollardollar
+-- | | untyped or typed Template Haskell splice,
+-- | | e.g. $(f x), $$(f x), $$"str"
+-- | not prefix | ITvarsym "$", ITvarsym "$$"
+-- | | ordinary operator or type operator,
+-- | | e.g. f $ g x, a $$ b
+-- ----------+---------------+------------------------------------------
+-- @ | prefix | ITtypeApp
+-- | | type application, e.g. fmap @Maybe
+-- | tight infix | ITat
+-- | | as-pattern, e.g. f p@(a,b) = rhs
+-- | suffix | parse error
+-- | | e.g. f p@ x = rhs
+-- | loose infix | ITvarsym "@"
+-- | | ordinary operator or type operator,
+-- | | e.g. f @ g, (f @)
+-- ----------+---------------+------------------------------------------
+--
+-- Also, some of these overrides are guarded behind language extensions.
+-- According to the specification, we must determine the occurrence based on
+-- surrounding *tokens* (see the proposal for the exact rules). However, in
+-- the implementation we cheat a little and do the classification based on
+-- characters, for reasons of both simplicity and efficiency (see
+-- 'followedByOpeningToken' and 'precededByClosingToken')
+--
+-- When an operator is subject to a meaning override, it is mapped to special
+-- token: ITbang, ITtilde, ITat, ITdollar, ITdollardollar. Otherwise, it is
+-- returned as ITvarsym.
+--
+-- For example, this is how we process the (!):
+--
+-- precededByClosingToken | followedByOpeningToken | Token
+-- ------------------------+------------------------+-------------
+-- False | True | ITbang
+-- True | False | ITvarsym "!"
+-- True | True | ITvarsym "!"
+-- False | False | ITvarsym "!"
+-- ------------------------+------------------------+-------------
+--
+-- And this is how we process the (@):
+--
+-- precededByClosingToken | followedByOpeningToken | Token
+-- ------------------------+------------------------+-------------
+-- False | True | ITtypeApp
+-- True | False | parse error
+-- True | True | ITat
+-- False | False | ITvarsym "@"
+-- ------------------------+------------------------+-------------
+
+-- -----------------------------------------------------------------------------
+-- Alex "Haskell code fragment bottom"
+
+{
+
+-- -----------------------------------------------------------------------------
+-- The token type
+
+data Token
+ = ITas -- Haskell keywords
+ | ITcase
+ | ITclass
+ | ITdata
+ | ITdefault
+ | ITderiving
+ | ITdo
+ | ITelse
+ | IThiding
+ | ITforeign
+ | ITif
+ | ITimport
+ | ITin
+ | ITinfix
+ | ITinfixl
+ | ITinfixr
+ | ITinstance
+ | ITlet
+ | ITmodule
+ | ITnewtype
+ | ITof
+ | ITqualified
+ | ITthen
+ | ITtype
+ | ITwhere
+
+ | ITforall IsUnicodeSyntax -- GHC extension keywords
+ | ITexport
+ | ITlabel
+ | ITdynamic
+ | ITsafe
+ | ITinterruptible
+ | ITunsafe
+ | ITstdcallconv
+ | ITccallconv
+ | ITcapiconv
+ | ITprimcallconv
+ | ITjavascriptcallconv
+ | ITmdo
+ | ITfamily
+ | ITrole
+ | ITgroup
+ | ITby
+ | ITusing
+ | ITpattern
+ | ITstatic
+ | ITstock
+ | ITanyclass
+ | ITvia
+
+ -- Backpack tokens
+ | ITunit
+ | ITsignature
+ | ITdependency
+ | ITrequires
+
+ -- Pragmas, see note [Pragma source text] in BasicTypes
+ | ITinline_prag SourceText InlineSpec RuleMatchInfo
+ | ITspec_prag SourceText -- SPECIALISE
+ | ITspec_inline_prag SourceText Bool -- SPECIALISE INLINE (or NOINLINE)
+ | ITsource_prag SourceText
+ | ITrules_prag SourceText
+ | ITwarning_prag SourceText
+ | ITdeprecated_prag SourceText
+ | ITline_prag SourceText -- not usually produced, see 'UsePosPragsBit'
+ | ITcolumn_prag SourceText -- not usually produced, see 'UsePosPragsBit'
+ | ITscc_prag SourceText
+ | ITgenerated_prag SourceText
+ | ITcore_prag SourceText -- hdaume: core annotations
+ | ITunpack_prag SourceText
+ | ITnounpack_prag SourceText
+ | ITann_prag SourceText
+ | ITcomplete_prag SourceText
+ | ITclose_prag
+ | IToptions_prag String
+ | ITinclude_prag String
+ | ITlanguage_prag
+ | ITminimal_prag SourceText
+ | IToverlappable_prag SourceText -- instance overlap mode
+ | IToverlapping_prag SourceText -- instance overlap mode
+ | IToverlaps_prag SourceText -- instance overlap mode
+ | ITincoherent_prag SourceText -- instance overlap mode
+ | ITctype SourceText
+ | ITcomment_line_prag -- See Note [Nested comment line pragmas]
+
+ | ITdotdot -- reserved symbols
+ | ITcolon
+ | ITdcolon IsUnicodeSyntax
+ | ITequal
+ | ITlam
+ | ITlcase
+ | ITvbar
+ | ITlarrow IsUnicodeSyntax
+ | ITrarrow IsUnicodeSyntax
+ | ITdarrow IsUnicodeSyntax
+ | ITminus
+ | ITbang -- Prefix (!) only, e.g. f !x = rhs
+ | ITtilde -- Prefix (~) only, e.g. f ~x = rhs
+ | ITat -- Tight infix (@) only, e.g. f x@pat = rhs
+ | ITtypeApp -- Prefix (@) only, e.g. f @t
+ | ITstar IsUnicodeSyntax
+ | ITdot
+
+ | ITbiglam -- GHC-extension symbols
+
+ | ITocurly -- special symbols
+ | ITccurly
+ | ITvocurly
+ | ITvccurly
+ | ITobrack
+ | ITopabrack -- [:, for parallel arrays with -XParallelArrays
+ | ITcpabrack -- :], for parallel arrays with -XParallelArrays
+ | ITcbrack
+ | IToparen
+ | ITcparen
+ | IToubxparen
+ | ITcubxparen
+ | ITsemi
+ | ITcomma
+ | ITunderscore
+ | ITbackquote
+ | ITsimpleQuote -- '
+
+ | ITvarid FastString -- identifiers
+ | ITconid FastString
+ | ITvarsym FastString
+ | ITconsym FastString
+ | ITqvarid (FastString,FastString)
+ | ITqconid (FastString,FastString)
+ | ITqvarsym (FastString,FastString)
+ | ITqconsym (FastString,FastString)
+
+ | ITdupipvarid FastString -- GHC extension: implicit param: ?x
+ | ITlabelvarid FastString -- Overloaded label: #x
+
+ | ITchar SourceText Char -- Note [Literal source text] in BasicTypes
+ | ITstring SourceText FastString -- Note [Literal source text] in BasicTypes
+ | ITinteger IntegralLit -- Note [Literal source text] in BasicTypes
+ | ITrational FractionalLit
+
+ | ITprimchar SourceText Char -- Note [Literal source text] in BasicTypes
+ | ITprimstring SourceText ByteString -- Note [Literal source text] @BasicTypes
+ | ITprimint SourceText Integer -- Note [Literal source text] in BasicTypes
+ | ITprimword SourceText Integer -- Note [Literal source text] in BasicTypes
+ | ITprimfloat FractionalLit
+ | ITprimdouble FractionalLit
+
+ -- Template Haskell extension tokens
+ | ITopenExpQuote HasE IsUnicodeSyntax -- [| or [e|
+ | ITopenPatQuote -- [p|
+ | ITopenDecQuote -- [d|
+ | ITopenTypQuote -- [t|
+ | ITcloseQuote IsUnicodeSyntax -- |]
+ | ITopenTExpQuote HasE -- [|| or [e||
+ | ITcloseTExpQuote -- ||]
+ | ITdollar -- prefix $
+ | ITdollardollar -- prefix $$
+ | ITtyQuote -- ''
+ | ITquasiQuote (FastString,FastString,PsSpan)
+ -- ITquasiQuote(quoter, quote, loc)
+ -- represents a quasi-quote of the form
+ -- [quoter| quote |]
+ | ITqQuasiQuote (FastString,FastString,FastString,PsSpan)
+ -- ITqQuasiQuote(Qual, quoter, quote, loc)
+ -- represents a qualified quasi-quote of the form
+ -- [Qual.quoter| quote |]
+
+ -- Arrow notation extension
+ | ITproc
+ | ITrec
+ | IToparenbar IsUnicodeSyntax -- ^ @(|@
+ | ITcparenbar IsUnicodeSyntax -- ^ @|)@
+ | ITlarrowtail IsUnicodeSyntax -- ^ @-<@
+ | ITrarrowtail IsUnicodeSyntax -- ^ @>-@
+ | ITLarrowtail IsUnicodeSyntax -- ^ @-<<@
+ | ITRarrowtail IsUnicodeSyntax -- ^ @>>-@
+
+ | ITunknown String -- ^ Used when the lexer can't make sense of it
+ | ITeof -- ^ end of file token
+
+ -- Documentation annotations
+ | ITdocCommentNext String -- ^ something beginning @-- |@
+ | ITdocCommentPrev String -- ^ something beginning @-- ^@
+ | ITdocCommentNamed String -- ^ something beginning @-- $@
+ | ITdocSection Int String -- ^ a section heading
+ | ITdocOptions String -- ^ doc options (prune, ignore-exports, etc)
+ | ITlineComment String -- ^ comment starting by "--"
+ | ITblockComment String -- ^ comment in {- -}
+
+ deriving Show
+
+instance Outputable Token where
+ ppr x = text (show x)
+
+
+-- the bitmap provided as the third component indicates whether the
+-- corresponding extension keyword is valid under the extension options
+-- provided to the compiler; if the extension corresponding to *any* of the
+-- bits set in the bitmap is enabled, the keyword is valid (this setup
+-- facilitates using a keyword in two different extensions that can be
+-- activated independently)
+--
+reservedWordsFM :: UniqFM (Token, ExtsBitmap)
+reservedWordsFM = listToUFM $
+ map (\(x, y, z) -> (mkFastString x, (y, z)))
+ [( "_", ITunderscore, 0 ),
+ ( "as", ITas, 0 ),
+ ( "case", ITcase, 0 ),
+ ( "class", ITclass, 0 ),
+ ( "data", ITdata, 0 ),
+ ( "default", ITdefault, 0 ),
+ ( "deriving", ITderiving, 0 ),
+ ( "do", ITdo, 0 ),
+ ( "else", ITelse, 0 ),
+ ( "hiding", IThiding, 0 ),
+ ( "if", ITif, 0 ),
+ ( "import", ITimport, 0 ),
+ ( "in", ITin, 0 ),
+ ( "infix", ITinfix, 0 ),
+ ( "infixl", ITinfixl, 0 ),
+ ( "infixr", ITinfixr, 0 ),
+ ( "instance", ITinstance, 0 ),
+ ( "let", ITlet, 0 ),
+ ( "module", ITmodule, 0 ),
+ ( "newtype", ITnewtype, 0 ),
+ ( "of", ITof, 0 ),
+ ( "qualified", ITqualified, 0 ),
+ ( "then", ITthen, 0 ),
+ ( "type", ITtype, 0 ),
+ ( "where", ITwhere, 0 ),
+
+ ( "forall", ITforall NormalSyntax, 0),
+ ( "mdo", ITmdo, xbit RecursiveDoBit),
+ -- See Note [Lexing type pseudo-keywords]
+ ( "family", ITfamily, 0 ),
+ ( "role", ITrole, 0 ),
+ ( "pattern", ITpattern, xbit PatternSynonymsBit),
+ ( "static", ITstatic, xbit StaticPointersBit ),
+ ( "stock", ITstock, 0 ),
+ ( "anyclass", ITanyclass, 0 ),
+ ( "via", ITvia, 0 ),
+ ( "group", ITgroup, xbit TransformComprehensionsBit),
+ ( "by", ITby, xbit TransformComprehensionsBit),
+ ( "using", ITusing, xbit TransformComprehensionsBit),
+
+ ( "foreign", ITforeign, xbit FfiBit),
+ ( "export", ITexport, xbit FfiBit),
+ ( "label", ITlabel, xbit FfiBit),
+ ( "dynamic", ITdynamic, xbit FfiBit),
+ ( "safe", ITsafe, xbit FfiBit .|.
+ xbit SafeHaskellBit),
+ ( "interruptible", ITinterruptible, xbit InterruptibleFfiBit),
+ ( "unsafe", ITunsafe, xbit FfiBit),
+ ( "stdcall", ITstdcallconv, xbit FfiBit),
+ ( "ccall", ITccallconv, xbit FfiBit),
+ ( "capi", ITcapiconv, xbit CApiFfiBit),
+ ( "prim", ITprimcallconv, xbit FfiBit),
+ ( "javascript", ITjavascriptcallconv, xbit FfiBit),
+
+ ( "unit", ITunit, 0 ),
+ ( "dependency", ITdependency, 0 ),
+ ( "signature", ITsignature, 0 ),
+
+ ( "rec", ITrec, xbit ArrowsBit .|.
+ xbit RecursiveDoBit),
+ ( "proc", ITproc, xbit ArrowsBit)
+ ]
+
+{-----------------------------------
+Note [Lexing type pseudo-keywords]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+One might think that we wish to treat 'family' and 'role' as regular old
+varids whenever -XTypeFamilies and -XRoleAnnotations are off, respectively.
+But, there is no need to do so. These pseudo-keywords are not stolen syntax:
+they are only used after the keyword 'type' at the top-level, where varids are
+not allowed. Furthermore, checks further downstream (GHC.Tc.TyCl) ensure that
+type families and role annotations are never declared without their extensions
+on. In fact, by unconditionally lexing these pseudo-keywords as special, we
+can get better error messages.
+
+Also, note that these are included in the `varid` production in the parser --
+a key detail to make all this work.
+-------------------------------------}
+
+reservedSymsFM :: UniqFM (Token, IsUnicodeSyntax, ExtsBitmap)
+reservedSymsFM = listToUFM $
+ map (\ (x,w,y,z) -> (mkFastString x,(w,y,z)))
+ [ ("..", ITdotdot, NormalSyntax, 0 )
+ -- (:) is a reserved op, meaning only list cons
+ ,(":", ITcolon, NormalSyntax, 0 )
+ ,("::", ITdcolon NormalSyntax, NormalSyntax, 0 )
+ ,("=", ITequal, NormalSyntax, 0 )
+ ,("\\", ITlam, NormalSyntax, 0 )
+ ,("|", ITvbar, NormalSyntax, 0 )
+ ,("<-", ITlarrow NormalSyntax, NormalSyntax, 0 )
+ ,("->", ITrarrow NormalSyntax, NormalSyntax, 0 )
+ ,("=>", ITdarrow NormalSyntax, NormalSyntax, 0 )
+ ,("-", ITminus, NormalSyntax, 0 )
+
+ ,("*", ITstar NormalSyntax, NormalSyntax, xbit StarIsTypeBit)
+
+ -- For 'forall a . t'
+ ,(".", ITdot, NormalSyntax, 0 )
+
+ ,("-<", ITlarrowtail NormalSyntax, NormalSyntax, xbit ArrowsBit)
+ ,(">-", ITrarrowtail NormalSyntax, NormalSyntax, xbit ArrowsBit)
+ ,("-<<", ITLarrowtail NormalSyntax, NormalSyntax, xbit ArrowsBit)
+ ,(">>-", ITRarrowtail NormalSyntax, NormalSyntax, xbit ArrowsBit)
+
+ ,("∷", ITdcolon UnicodeSyntax, UnicodeSyntax, 0 )
+ ,("⇒", ITdarrow UnicodeSyntax, UnicodeSyntax, 0 )
+ ,("∀", ITforall UnicodeSyntax, UnicodeSyntax, 0 )
+ ,("→", ITrarrow UnicodeSyntax, UnicodeSyntax, 0 )
+ ,("←", ITlarrow UnicodeSyntax, UnicodeSyntax, 0 )
+
+ ,("⤙", ITlarrowtail UnicodeSyntax, UnicodeSyntax, xbit ArrowsBit)
+ ,("⤚", ITrarrowtail UnicodeSyntax, UnicodeSyntax, xbit ArrowsBit)
+ ,("⤛", ITLarrowtail UnicodeSyntax, UnicodeSyntax, xbit ArrowsBit)
+ ,("⤜", ITRarrowtail UnicodeSyntax, UnicodeSyntax, xbit ArrowsBit)
+
+ ,("★", ITstar UnicodeSyntax, UnicodeSyntax, xbit StarIsTypeBit)
+
+ -- ToDo: ideally, → and ∷ should be "specials", so that they cannot
+ -- form part of a large operator. This would let us have a better
+ -- syntax for kinds: ɑ∷*→* would be a legal kind signature. (maybe).
+ ]
+
+-- -----------------------------------------------------------------------------
+-- Lexer actions
+
+type Action = PsSpan -> StringBuffer -> Int -> P (PsLocated Token)
+
+special :: Token -> Action
+special tok span _buf _len = return (L span tok)
+
+token, layout_token :: Token -> Action
+token t span _buf _len = return (L span t)
+layout_token t span _buf _len = pushLexState layout >> return (L span t)
+
+idtoken :: (StringBuffer -> Int -> Token) -> Action
+idtoken f span buf len = return (L span $! (f buf len))
+
+skip_one_varid :: (FastString -> Token) -> Action
+skip_one_varid f span buf len
+ = return (L span $! f (lexemeToFastString (stepOn buf) (len-1)))
+
+skip_two_varid :: (FastString -> Token) -> Action
+skip_two_varid f span buf len
+ = return (L span $! f (lexemeToFastString (stepOn (stepOn buf)) (len-2)))
+
+strtoken :: (String -> Token) -> Action
+strtoken f span buf len =
+ return (L span $! (f $! lexemeToString buf len))
+
+begin :: Int -> Action
+begin code _span _str _len = do pushLexState code; lexToken
+
+pop :: Action
+pop _span _buf _len = do _ <- popLexState
+ lexToken
+-- See Note [Nested comment line pragmas]
+failLinePrag1 :: Action
+failLinePrag1 span _buf _len = do
+ b <- getBit InNestedCommentBit
+ if b then return (L span ITcomment_line_prag)
+ else lexError "lexical error in pragma"
+
+-- See Note [Nested comment line pragmas]
+popLinePrag1 :: Action
+popLinePrag1 span _buf _len = do
+ b <- getBit InNestedCommentBit
+ if b then return (L span ITcomment_line_prag) else do
+ _ <- popLexState
+ lexToken
+
+hopefully_open_brace :: Action
+hopefully_open_brace span buf len
+ = do relaxed <- getBit RelaxedLayoutBit
+ ctx <- getContext
+ (AI l _) <- getInput
+ let offset = srcLocCol (psRealLoc l)
+ isOK = relaxed ||
+ case ctx of
+ Layout prev_off _ : _ -> prev_off < offset
+ _ -> True
+ if isOK then pop_and open_brace span buf len
+ else addFatalError (mkSrcSpanPs span) (text "Missing block")
+
+pop_and :: Action -> Action
+pop_and act span buf len = do _ <- popLexState
+ act span buf len
+
+-- See Note [Whitespace-sensitive operator parsing]
+followedByOpeningToken :: AlexAccPred ExtsBitmap
+followedByOpeningToken _ _ _ (AI _ buf)
+ | atEnd buf = False
+ | otherwise =
+ case nextChar buf of
+ ('{', buf') -> nextCharIsNot buf' (== '-')
+ ('(', _) -> True
+ ('[', _) -> True
+ ('\"', _) -> True
+ ('\'', _) -> True
+ ('_', _) -> True
+ (c, _) -> isAlphaNum c
+
+-- See Note [Whitespace-sensitive operator parsing]
+precededByClosingToken :: AlexAccPred ExtsBitmap
+precededByClosingToken _ (AI _ buf) _ _ =
+ case prevChar buf '\n' of
+ '}' -> decodePrevNChars 1 buf /= "-"
+ ')' -> True
+ ']' -> True
+ '\"' -> True
+ '\'' -> True
+ '_' -> True
+ c -> isAlphaNum c
+
+{-# INLINE nextCharIs #-}
+nextCharIs :: StringBuffer -> (Char -> Bool) -> Bool
+nextCharIs buf p = not (atEnd buf) && p (currentChar buf)
+
+{-# INLINE nextCharIsNot #-}
+nextCharIsNot :: StringBuffer -> (Char -> Bool) -> Bool
+nextCharIsNot buf p = not (nextCharIs buf p)
+
+notFollowedBy :: Char -> AlexAccPred ExtsBitmap
+notFollowedBy char _ _ _ (AI _ buf)
+ = nextCharIsNot buf (== char)
+
+notFollowedBySymbol :: AlexAccPred ExtsBitmap
+notFollowedBySymbol _ _ _ (AI _ buf)
+ = nextCharIsNot buf (`elem` "!#$%&*+./<=>?@\\^|-~")
+
+followedByDigit :: AlexAccPred ExtsBitmap
+followedByDigit _ _ _ (AI _ buf)
+ = afterOptionalSpace buf (\b -> nextCharIs b (`elem` ['0'..'9']))
+
+ifCurrentChar :: Char -> AlexAccPred ExtsBitmap
+ifCurrentChar char _ (AI _ buf) _ _
+ = nextCharIs buf (== char)
+
+-- We must reject doc comments as being ordinary comments everywhere.
+-- In some cases the doc comment will be selected as the lexeme due to
+-- maximal munch, but not always, because the nested comment rule is
+-- valid in all states, but the doc-comment rules are only valid in
+-- the non-layout states.
+isNormalComment :: AlexAccPred ExtsBitmap
+isNormalComment bits _ _ (AI _ buf)
+ | HaddockBit `xtest` bits = notFollowedByDocOrPragma
+ | otherwise = nextCharIsNot buf (== '#')
+ where
+ notFollowedByDocOrPragma
+ = afterOptionalSpace buf (\b -> nextCharIsNot b (`elem` "|^*$#"))
+
+afterOptionalSpace :: StringBuffer -> (StringBuffer -> Bool) -> Bool
+afterOptionalSpace buf p
+ = if nextCharIs buf (== ' ')
+ then p (snd (nextChar buf))
+ else p buf
+
+atEOL :: AlexAccPred ExtsBitmap
+atEOL _ _ _ (AI _ buf) = atEnd buf || currentChar buf == '\n'
+
+ifExtension :: ExtBits -> AlexAccPred ExtsBitmap
+ifExtension extBits bits _ _ _ = extBits `xtest` bits
+
+alexNotPred p userState in1 len in2
+ = not (p userState in1 len in2)
+
+alexOrPred p1 p2 userState in1 len in2
+ = p1 userState in1 len in2 || p2 userState in1 len in2
+
+multiline_doc_comment :: Action
+multiline_doc_comment span buf _len = withLexedDocType (worker "")
+ where
+ worker commentAcc input docType checkNextLine = case alexGetChar' input of
+ Just ('\n', input')
+ | checkNextLine -> case checkIfCommentLine input' of
+ Just input -> worker ('\n':commentAcc) input docType checkNextLine
+ Nothing -> docCommentEnd input commentAcc docType buf span
+ | otherwise -> docCommentEnd input commentAcc docType buf span
+ Just (c, input) -> worker (c:commentAcc) input docType checkNextLine
+ Nothing -> docCommentEnd input commentAcc docType buf span
+
+ -- Check if the next line of input belongs to this doc comment as well.
+ -- A doc comment continues onto the next line when the following
+ -- conditions are met:
+ -- * The line starts with "--"
+ -- * The line doesn't start with "---".
+ -- * The line doesn't start with "-- $", because that would be the
+ -- start of a /new/ named haddock chunk (#10398).
+ checkIfCommentLine :: AlexInput -> Maybe AlexInput
+ checkIfCommentLine input = check (dropNonNewlineSpace input)
+ where
+ check input = do
+ ('-', input) <- alexGetChar' input
+ ('-', input) <- alexGetChar' input
+ (c, after_c) <- alexGetChar' input
+ case c of
+ '-' -> Nothing
+ ' ' -> case alexGetChar' after_c of
+ Just ('$', _) -> Nothing
+ _ -> Just input
+ _ -> Just input
+
+ dropNonNewlineSpace input = case alexGetChar' input of
+ Just (c, input')
+ | isSpace c && c /= '\n' -> dropNonNewlineSpace input'
+ | otherwise -> input
+ Nothing -> input
+
+lineCommentToken :: Action
+lineCommentToken span buf len = do
+ b <- getBit RawTokenStreamBit
+ if b then strtoken ITlineComment span buf len else lexToken
+
+{-
+ nested comments require traversing by hand, they can't be parsed
+ using regular expressions.
+-}
+nested_comment :: P (PsLocated Token) -> Action
+nested_comment cont span buf len = do
+ input <- getInput
+ go (reverse $ lexemeToString buf len) (1::Int) input
+ where
+ go commentAcc 0 input = do
+ setInput input
+ b <- getBit RawTokenStreamBit
+ if b
+ then docCommentEnd input commentAcc ITblockComment buf span
+ else cont
+ go commentAcc n input = case alexGetChar' input of
+ Nothing -> errBrace input (psRealSpan span)
+ Just ('-',input) -> case alexGetChar' input of
+ Nothing -> errBrace input (psRealSpan span)
+ Just ('\125',input) -> go ('\125':'-':commentAcc) (n-1) input -- '}'
+ Just (_,_) -> go ('-':commentAcc) n input
+ Just ('\123',input) -> case alexGetChar' input of -- '{' char
+ Nothing -> errBrace input (psRealSpan span)
+ Just ('-',input) -> go ('-':'\123':commentAcc) (n+1) input
+ Just (_,_) -> go ('\123':commentAcc) n input
+ -- See Note [Nested comment line pragmas]
+ Just ('\n',input) -> case alexGetChar' input of
+ Nothing -> errBrace input (psRealSpan span)
+ Just ('#',_) -> do (parsedAcc,input) <- parseNestedPragma input
+ go (parsedAcc ++ '\n':commentAcc) n input
+ Just (_,_) -> go ('\n':commentAcc) n input
+ Just (c,input) -> go (c:commentAcc) n input
+
+nested_doc_comment :: Action
+nested_doc_comment span buf _len = withLexedDocType (go "")
+ where
+ go commentAcc input docType _ = case alexGetChar' input of
+ Nothing -> errBrace input (psRealSpan span)
+ Just ('-',input) -> case alexGetChar' input of
+ Nothing -> errBrace input (psRealSpan span)
+ Just ('\125',input) ->
+ docCommentEnd input commentAcc docType buf span
+ Just (_,_) -> go ('-':commentAcc) input docType False
+ Just ('\123', input) -> case alexGetChar' input of
+ Nothing -> errBrace input (psRealSpan span)
+ Just ('-',input) -> do
+ setInput input
+ let cont = do input <- getInput; go commentAcc input docType False
+ nested_comment cont span buf _len
+ Just (_,_) -> go ('\123':commentAcc) input docType False
+ -- See Note [Nested comment line pragmas]
+ Just ('\n',input) -> case alexGetChar' input of
+ Nothing -> errBrace input (psRealSpan span)
+ Just ('#',_) -> do (parsedAcc,input) <- parseNestedPragma input
+ go (parsedAcc ++ '\n':commentAcc) input docType False
+ Just (_,_) -> go ('\n':commentAcc) input docType False
+ Just (c,input) -> go (c:commentAcc) input docType False
+
+-- See Note [Nested comment line pragmas]
+parseNestedPragma :: AlexInput -> P (String,AlexInput)
+parseNestedPragma input@(AI _ buf) = do
+ origInput <- getInput
+ setInput input
+ setExts (.|. xbit InNestedCommentBit)
+ pushLexState bol
+ lt <- lexToken
+ _ <- popLexState
+ setExts (.&. complement (xbit InNestedCommentBit))
+ postInput@(AI _ postBuf) <- getInput
+ setInput origInput
+ case unLoc lt of
+ ITcomment_line_prag -> do
+ let bytes = byteDiff buf postBuf
+ diff = lexemeToString buf bytes
+ return (reverse diff, postInput)
+ lt' -> panic ("parseNestedPragma: unexpected token" ++ (show lt'))
+
+{-
+Note [Nested comment line pragmas]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+We used to ignore cpp-preprocessor-generated #line pragmas if they were inside
+nested comments.
+
+Now, when parsing a nested comment, if we encounter a line starting with '#' we
+call parseNestedPragma, which executes the following:
+1. Save the current lexer input (loc, buf) for later
+2. Set the current lexer input to the beginning of the line starting with '#'
+3. Turn the 'InNestedComment' extension on
+4. Push the 'bol' lexer state
+5. Lex a token. Due to (2), (3), and (4), this should always lex a single line
+ or less and return the ITcomment_line_prag token. This may set source line
+ and file location if a #line pragma is successfully parsed
+6. Restore lexer input and state to what they were before we did all this
+7. Return control to the function parsing a nested comment, informing it of
+ what the lexer parsed
+
+Regarding (5) above:
+Every exit from the 'bol' lexer state (do_bol, popLinePrag1, failLinePrag1)
+checks if the 'InNestedComment' extension is set. If it is, that function will
+return control to parseNestedPragma by returning the ITcomment_line_prag token.
+
+See #314 for more background on the bug this fixes.
+-}
+
+withLexedDocType :: (AlexInput -> (String -> Token) -> Bool -> P (PsLocated Token))
+ -> P (PsLocated Token)
+withLexedDocType lexDocComment = do
+ input@(AI _ buf) <- getInput
+ case prevChar buf ' ' of
+ -- The `Bool` argument to lexDocComment signals whether or not the next
+ -- line of input might also belong to this doc comment.
+ '|' -> lexDocComment input ITdocCommentNext True
+ '^' -> lexDocComment input ITdocCommentPrev True
+ '$' -> lexDocComment input ITdocCommentNamed True
+ '*' -> lexDocSection 1 input
+ _ -> panic "withLexedDocType: Bad doc type"
+ where
+ lexDocSection n input = case alexGetChar' input of
+ Just ('*', input) -> lexDocSection (n+1) input
+ Just (_, _) -> lexDocComment input (ITdocSection n) False
+ Nothing -> do setInput input; lexToken -- eof reached, lex it normally
+
+-- RULES pragmas turn on the forall and '.' keywords, and we turn them
+-- off again at the end of the pragma.
+rulePrag :: Action
+rulePrag span buf len = do
+ setExts (.|. xbit InRulePragBit)
+ let !src = lexemeToString buf len
+ return (L span (ITrules_prag (SourceText src)))
+
+-- When 'UsePosPragsBit' is not set, it is expected that we emit a token instead
+-- of updating the position in 'PState'
+linePrag :: Action
+linePrag span buf len = do
+ usePosPrags <- getBit UsePosPragsBit
+ if usePosPrags
+ then begin line_prag2 span buf len
+ else let !src = lexemeToString buf len
+ in return (L span (ITline_prag (SourceText src)))
+
+-- When 'UsePosPragsBit' is not set, it is expected that we emit a token instead
+-- of updating the position in 'PState'
+columnPrag :: Action
+columnPrag span buf len = do
+ usePosPrags <- getBit UsePosPragsBit
+ let !src = lexemeToString buf len
+ if usePosPrags
+ then begin column_prag span buf len
+ else let !src = lexemeToString buf len
+ in return (L span (ITcolumn_prag (SourceText src)))
+
+endPrag :: Action
+endPrag span _buf _len = do
+ setExts (.&. complement (xbit InRulePragBit))
+ return (L span ITclose_prag)
+
+-- docCommentEnd
+-------------------------------------------------------------------------------
+-- This function is quite tricky. We can't just return a new token, we also
+-- need to update the state of the parser. Why? Because the token is longer
+-- than what was lexed by Alex, and the lexToken function doesn't know this, so
+-- it writes the wrong token length to the parser state. This function is
+-- called afterwards, so it can just update the state.
+
+docCommentEnd :: AlexInput -> String -> (String -> Token) -> StringBuffer ->
+ PsSpan -> P (PsLocated Token)
+docCommentEnd input commentAcc docType buf span = do
+ setInput input
+ let (AI loc nextBuf) = input
+ comment = reverse commentAcc
+ span' = mkPsSpan (psSpanStart span) loc
+ last_len = byteDiff buf nextBuf
+
+ span `seq` setLastToken span' last_len
+ return (L span' (docType comment))
+
+errBrace :: AlexInput -> RealSrcSpan -> P a
+errBrace (AI end _) span = failLocMsgP (realSrcSpanStart span) (psRealLoc end) "unterminated `{-'"
+
+open_brace, close_brace :: Action
+open_brace span _str _len = do
+ ctx <- getContext
+ setContext (NoLayout:ctx)
+ return (L span ITocurly)
+close_brace span _str _len = do
+ popContext
+ return (L span ITccurly)
+
+qvarid, qconid :: StringBuffer -> Int -> Token
+qvarid buf len = ITqvarid $! splitQualName buf len False
+qconid buf len = ITqconid $! splitQualName buf len False
+
+splitQualName :: StringBuffer -> Int -> Bool -> (FastString,FastString)
+-- takes a StringBuffer and a length, and returns the module name
+-- and identifier parts of a qualified name. Splits at the *last* dot,
+-- because of hierarchical module names.
+splitQualName orig_buf len parens = split orig_buf orig_buf
+ where
+ split buf dot_buf
+ | orig_buf `byteDiff` buf >= len = done dot_buf
+ | c == '.' = found_dot buf'
+ | otherwise = split buf' dot_buf
+ where
+ (c,buf') = nextChar buf
+
+ -- careful, we might get names like M....
+ -- so, if the character after the dot is not upper-case, this is
+ -- the end of the qualifier part.
+ found_dot buf -- buf points after the '.'
+ | isUpper c = split buf' buf
+ | otherwise = done buf
+ where
+ (c,buf') = nextChar buf
+
+ done dot_buf =
+ (lexemeToFastString orig_buf (qual_size - 1),
+ if parens -- Prelude.(+)
+ then lexemeToFastString (stepOn dot_buf) (len - qual_size - 2)
+ else lexemeToFastString dot_buf (len - qual_size))
+ where
+ qual_size = orig_buf `byteDiff` dot_buf
+
+varid :: Action
+varid span buf len =
+ case lookupUFM reservedWordsFM fs of
+ Just (ITcase, _) -> do
+ lastTk <- getLastTk
+ keyword <- case lastTk of
+ Just ITlam -> do
+ lambdaCase <- getBit LambdaCaseBit
+ unless lambdaCase $ do
+ pState <- getPState
+ addError (mkSrcSpanPs (last_loc pState)) $ text
+ "Illegal lambda-case (use LambdaCase)"
+ return ITlcase
+ _ -> return ITcase
+ maybe_layout keyword
+ return $ L span keyword
+ Just (keyword, 0) -> do
+ maybe_layout keyword
+ return $ L span keyword
+ Just (keyword, i) -> do
+ exts <- getExts
+ if exts .&. i /= 0
+ then do
+ maybe_layout keyword
+ return $ L span keyword
+ else
+ return $ L span $ ITvarid fs
+ Nothing ->
+ return $ L span $ ITvarid fs
+ where
+ !fs = lexemeToFastString buf len
+
+conid :: StringBuffer -> Int -> Token
+conid buf len = ITconid $! lexemeToFastString buf len
+
+qvarsym, qconsym :: StringBuffer -> Int -> Token
+qvarsym buf len = ITqvarsym $! splitQualName buf len False
+qconsym buf len = ITqconsym $! splitQualName buf len False
+
+-- See Note [Whitespace-sensitive operator parsing]
+varsym_prefix :: Action
+varsym_prefix = sym $ \exts s ->
+ if | TypeApplicationsBit `xtest` exts, s == fsLit "@"
+ -> return ITtypeApp
+ | ThQuotesBit `xtest` exts, s == fsLit "$"
+ -> return ITdollar
+ | ThQuotesBit `xtest` exts, s == fsLit "$$"
+ -> return ITdollardollar
+ | s == fsLit "!" -> return ITbang
+ | s == fsLit "~" -> return ITtilde
+ | otherwise -> return (ITvarsym s)
+
+-- See Note [Whitespace-sensitive operator parsing]
+varsym_suffix :: Action
+varsym_suffix = sym $ \_ s ->
+ if | s == fsLit "@"
+ -> failMsgP "Suffix occurrence of @. For an as-pattern, remove the leading whitespace."
+ | otherwise -> return (ITvarsym s)
+
+-- See Note [Whitespace-sensitive operator parsing]
+varsym_tight_infix :: Action
+varsym_tight_infix = sym $ \_ s ->
+ if | s == fsLit "@" -> return ITat
+ | otherwise -> return (ITvarsym s)
+
+-- See Note [Whitespace-sensitive operator parsing]
+varsym_loose_infix :: Action
+varsym_loose_infix = sym (\_ s -> return $ ITvarsym s)
+
+consym :: Action
+consym = sym (\_exts s -> return $ ITconsym s)
+
+sym :: (ExtsBitmap -> FastString -> P Token) -> Action
+sym con span buf len =
+ case lookupUFM reservedSymsFM fs of
+ Just (keyword, NormalSyntax, 0) ->
+ return $ L span keyword
+ Just (keyword, NormalSyntax, i) -> do
+ exts <- getExts
+ if exts .&. i /= 0
+ then return $ L span keyword
+ else L span <$!> con exts fs
+ Just (keyword, UnicodeSyntax, 0) -> do
+ exts <- getExts
+ if xtest UnicodeSyntaxBit exts
+ then return $ L span keyword
+ else L span <$!> con exts fs
+ Just (keyword, UnicodeSyntax, i) -> do
+ exts <- getExts
+ if exts .&. i /= 0 && xtest UnicodeSyntaxBit exts
+ then return $ L span keyword
+ else L span <$!> con exts fs
+ Nothing -> do
+ exts <- getExts
+ L span <$!> con exts fs
+ where
+ !fs = lexemeToFastString buf len
+
+-- Variations on the integral numeric literal.
+tok_integral :: (SourceText -> Integer -> Token)
+ -> (Integer -> Integer)
+ -> Int -> Int
+ -> (Integer, (Char -> Int))
+ -> Action
+tok_integral itint transint transbuf translen (radix,char_to_int) span buf len = do
+ numericUnderscores <- getBit NumericUnderscoresBit -- #14473
+ let src = lexemeToString buf len
+ when ((not numericUnderscores) && ('_' `elem` src)) $ do
+ pState <- getPState
+ addError (mkSrcSpanPs (last_loc pState)) $ text
+ "Use NumericUnderscores to allow underscores in integer literals"
+ return $ L span $ itint (SourceText src)
+ $! transint $ parseUnsignedInteger
+ (offsetBytes transbuf buf) (subtract translen len) radix char_to_int
+
+tok_num :: (Integer -> Integer)
+ -> Int -> Int
+ -> (Integer, (Char->Int)) -> Action
+tok_num = tok_integral $ \case
+ st@(SourceText ('-':_)) -> itint st (const True)
+ st@(SourceText _) -> itint st (const False)
+ st@NoSourceText -> itint st (< 0)
+ where
+ itint :: SourceText -> (Integer -> Bool) -> Integer -> Token
+ itint !st is_negative !val = ITinteger ((IL st $! is_negative val) val)
+
+tok_primint :: (Integer -> Integer)
+ -> Int -> Int
+ -> (Integer, (Char->Int)) -> Action
+tok_primint = tok_integral ITprimint
+
+
+tok_primword :: Int -> Int
+ -> (Integer, (Char->Int)) -> Action
+tok_primword = tok_integral ITprimword positive
+positive, negative :: (Integer -> Integer)
+positive = id
+negative = negate
+decimal, octal, hexadecimal :: (Integer, Char -> Int)
+decimal = (10,octDecDigit)
+binary = (2,octDecDigit)
+octal = (8,octDecDigit)
+hexadecimal = (16,hexDigit)
+
+-- readRational can understand negative rationals, exponents, everything.
+tok_frac :: Int -> (String -> Token) -> Action
+tok_frac drop f span buf len = do
+ numericUnderscores <- getBit NumericUnderscoresBit -- #14473
+ let src = lexemeToString buf (len-drop)
+ when ((not numericUnderscores) && ('_' `elem` src)) $ do
+ pState <- getPState
+ addError (mkSrcSpanPs (last_loc pState)) $ text
+ "Use NumericUnderscores to allow underscores in floating literals"
+ return (L span $! (f $! src))
+
+tok_float, tok_primfloat, tok_primdouble :: String -> Token
+tok_float str = ITrational $! readFractionalLit str
+tok_hex_float str = ITrational $! readHexFractionalLit str
+tok_primfloat str = ITprimfloat $! readFractionalLit str
+tok_primdouble str = ITprimdouble $! readFractionalLit str
+
+readFractionalLit :: String -> FractionalLit
+readFractionalLit str = ((FL $! (SourceText str)) $! is_neg) $! readRational str
+ where is_neg = case str of ('-':_) -> True
+ _ -> False
+readHexFractionalLit :: String -> FractionalLit
+readHexFractionalLit str =
+ FL { fl_text = SourceText str
+ , fl_neg = case str of
+ '-' : _ -> True
+ _ -> False
+ , fl_value = readHexRational str
+ }
+
+-- -----------------------------------------------------------------------------
+-- Layout processing
+
+-- we're at the first token on a line, insert layout tokens if necessary
+do_bol :: Action
+do_bol span _str _len = do
+ -- See Note [Nested comment line pragmas]
+ b <- getBit InNestedCommentBit
+ if b then return (L span ITcomment_line_prag) else do
+ (pos, gen_semic) <- getOffside
+ case pos of
+ LT -> do
+ --trace "layout: inserting '}'" $ do
+ popContext
+ -- do NOT pop the lex state, we might have a ';' to insert
+ return (L span ITvccurly)
+ EQ | gen_semic -> do
+ --trace "layout: inserting ';'" $ do
+ _ <- popLexState
+ return (L span ITsemi)
+ _ -> do
+ _ <- popLexState
+ lexToken
+
+-- certain keywords put us in the "layout" state, where we might
+-- add an opening curly brace.
+maybe_layout :: Token -> P ()
+maybe_layout t = do -- If the alternative layout rule is enabled then
+ -- we never create an implicit layout context here.
+ -- Layout is handled XXX instead.
+ -- The code for closing implicit contexts, or
+ -- inserting implicit semi-colons, is therefore
+ -- irrelevant as it only applies in an implicit
+ -- context.
+ alr <- getBit AlternativeLayoutRuleBit
+ unless alr $ f t
+ where f ITdo = pushLexState layout_do
+ f ITmdo = pushLexState layout_do
+ f ITof = pushLexState layout
+ f ITlcase = pushLexState layout
+ f ITlet = pushLexState layout
+ f ITwhere = pushLexState layout
+ f ITrec = pushLexState layout
+ f ITif = pushLexState layout_if
+ f _ = return ()
+
+-- Pushing a new implicit layout context. If the indentation of the
+-- next token is not greater than the previous layout context, then
+-- Haskell 98 says that the new layout context should be empty; that is
+-- the lexer must generate {}.
+--
+-- We are slightly more lenient than this: when the new context is started
+-- by a 'do', then we allow the new context to be at the same indentation as
+-- the previous context. This is what the 'strict' argument is for.
+new_layout_context :: Bool -> Bool -> Token -> Action
+new_layout_context strict gen_semic tok span _buf len = do
+ _ <- popLexState
+ (AI l _) <- getInput
+ let offset = srcLocCol (psRealLoc l) - len
+ ctx <- getContext
+ nondecreasing <- getBit NondecreasingIndentationBit
+ let strict' = strict || not nondecreasing
+ case ctx of
+ Layout prev_off _ : _ |
+ (strict' && prev_off >= offset ||
+ not strict' && prev_off > offset) -> do
+ -- token is indented to the left of the previous context.
+ -- we must generate a {} sequence now.
+ pushLexState layout_left
+ return (L span tok)
+ _ -> do setContext (Layout offset gen_semic : ctx)
+ return (L span tok)
+
+do_layout_left :: Action
+do_layout_left span _buf _len = do
+ _ <- popLexState
+ pushLexState bol -- we must be at the start of a line
+ return (L span ITvccurly)
+
+-- -----------------------------------------------------------------------------
+-- LINE pragmas
+
+setLineAndFile :: Int -> Action
+setLineAndFile code (PsSpan span _) buf len = do
+ let src = lexemeToString buf (len - 1) -- drop trailing quotation mark
+ linenumLen = length $ head $ words src
+ linenum = parseUnsignedInteger buf linenumLen 10 octDecDigit
+ file = mkFastString $ go $ drop 1 $ dropWhile (/= '"') src
+ -- skip everything through first quotation mark to get to the filename
+ where go ('\\':c:cs) = c : go cs
+ go (c:cs) = c : go cs
+ go [] = []
+ -- decode escapes in the filename. e.g. on Windows
+ -- when our filenames have backslashes in, gcc seems to
+ -- escape the backslashes. One symptom of not doing this
+ -- is that filenames in error messages look a bit strange:
+ -- C:\\foo\bar.hs
+ -- only the first backslash is doubled, because we apply
+ -- System.FilePath.normalise before printing out
+ -- filenames and it does not remove duplicate
+ -- backslashes after the drive letter (should it?).
+ resetAlrLastLoc file
+ setSrcLoc (mkRealSrcLoc file (fromIntegral linenum - 1) (srcSpanEndCol span))
+ -- subtract one: the line number refers to the *following* line
+ addSrcFile file
+ _ <- popLexState
+ pushLexState code
+ lexToken
+
+setColumn :: Action
+setColumn (PsSpan span _) buf len = do
+ let column =
+ case reads (lexemeToString buf len) of
+ [(column, _)] -> column
+ _ -> error "setColumn: expected integer" -- shouldn't happen
+ setSrcLoc (mkRealSrcLoc (srcSpanFile span) (srcSpanEndLine span)
+ (fromIntegral (column :: Integer)))
+ _ <- popLexState
+ lexToken
+
+alrInitialLoc :: FastString -> RealSrcSpan
+alrInitialLoc file = mkRealSrcSpan loc loc
+ where -- This is a hack to ensure that the first line in a file
+ -- looks like it is after the initial location:
+ loc = mkRealSrcLoc file (-1) (-1)
+
+-- -----------------------------------------------------------------------------
+-- Options, includes and language pragmas.
+
+lex_string_prag :: (String -> Token) -> Action
+lex_string_prag mkTok span _buf _len
+ = do input <- getInput
+ start <- getParsedLoc
+ tok <- go [] input
+ end <- getParsedLoc
+ return (L (mkPsSpan start end) tok)
+ where go acc input
+ = if isString input "#-}"
+ then do setInput input
+ return (mkTok (reverse acc))
+ else case alexGetChar input of
+ Just (c,i) -> go (c:acc) i
+ Nothing -> err input
+ isString _ [] = True
+ isString i (x:xs)
+ = case alexGetChar i of
+ Just (c,i') | c == x -> isString i' xs
+ _other -> False
+ err (AI end _) = failLocMsgP (realSrcSpanStart (psRealSpan span)) (psRealLoc end) "unterminated options pragma"
+
+
+-- -----------------------------------------------------------------------------
+-- Strings & Chars
+
+-- This stuff is horrible. I hates it.
+
+lex_string_tok :: Action
+lex_string_tok span buf _len = do
+ tok <- lex_string ""
+ (AI end bufEnd) <- getInput
+ let
+ tok' = case tok of
+ ITprimstring _ bs -> ITprimstring (SourceText src) bs
+ ITstring _ s -> ITstring (SourceText src) s
+ _ -> panic "lex_string_tok"
+ src = lexemeToString buf (cur bufEnd - cur buf)
+ return (L (mkPsSpan (psSpanStart span) end) tok')
+
+lex_string :: String -> P Token
+lex_string s = do
+ i <- getInput
+ case alexGetChar' i of
+ Nothing -> lit_error i
+
+ Just ('"',i) -> do
+ setInput i
+ let s' = reverse s
+ magicHash <- getBit MagicHashBit
+ if magicHash
+ then do
+ i <- getInput
+ case alexGetChar' i of
+ Just ('#',i) -> do
+ setInput i
+ when (any (> '\xFF') s') $ do
+ pState <- getPState
+ addError (mkSrcSpanPs (last_loc pState)) $ text
+ "primitive string literal must contain only characters <= \'\\xFF\'"
+ return (ITprimstring (SourceText s') (unsafeMkByteString s'))
+ _other ->
+ return (ITstring (SourceText s') (mkFastString s'))
+ else
+ return (ITstring (SourceText s') (mkFastString s'))
+
+ Just ('\\',i)
+ | Just ('&',i) <- next -> do
+ setInput i; lex_string s
+ | Just (c,i) <- next, c <= '\x7f' && is_space c -> do
+ -- is_space only works for <= '\x7f' (#3751, #5425)
+ setInput i; lex_stringgap s
+ where next = alexGetChar' i
+
+ Just (c, i1) -> do
+ case c of
+ '\\' -> do setInput i1; c' <- lex_escape; lex_string (c':s)
+ c | isAny c -> do setInput i1; lex_string (c:s)
+ _other -> lit_error i
+
+lex_stringgap :: String -> P Token
+lex_stringgap s = do
+ i <- getInput
+ c <- getCharOrFail i
+ case c of
+ '\\' -> lex_string s
+ c | c <= '\x7f' && is_space c -> lex_stringgap s
+ -- is_space only works for <= '\x7f' (#3751, #5425)
+ _other -> lit_error i
+
+
+lex_char_tok :: Action
+-- Here we are basically parsing character literals, such as 'x' or '\n'
+-- but we additionally spot 'x and ''T, returning ITsimpleQuote and
+-- ITtyQuote respectively, but WITHOUT CONSUMING the x or T part
+-- (the parser does that).
+-- So we have to do two characters of lookahead: when we see 'x we need to
+-- see if there's a trailing quote
+lex_char_tok span buf _len = do -- We've seen '
+ i1 <- getInput -- Look ahead to first character
+ let loc = psSpanStart span
+ case alexGetChar' i1 of
+ Nothing -> lit_error i1
+
+ Just ('\'', i2@(AI end2 _)) -> do -- We've seen ''
+ setInput i2
+ return (L (mkPsSpan loc end2) ITtyQuote)
+
+ Just ('\\', i2@(AI _end2 _)) -> do -- We've seen 'backslash
+ setInput i2
+ lit_ch <- lex_escape
+ i3 <- getInput
+ mc <- getCharOrFail i3 -- Trailing quote
+ if mc == '\'' then finish_char_tok buf loc lit_ch
+ else lit_error i3
+
+ Just (c, i2@(AI _end2 _))
+ | not (isAny c) -> lit_error i1
+ | otherwise ->
+
+ -- We've seen 'x, where x is a valid character
+ -- (i.e. not newline etc) but not a quote or backslash
+ case alexGetChar' i2 of -- Look ahead one more character
+ Just ('\'', i3) -> do -- We've seen 'x'
+ setInput i3
+ finish_char_tok buf loc c
+ _other -> do -- We've seen 'x not followed by quote
+ -- (including the possibility of EOF)
+ -- Just parse the quote only
+ let (AI end _) = i1
+ return (L (mkPsSpan loc end) ITsimpleQuote)
+
+finish_char_tok :: StringBuffer -> PsLoc -> Char -> P (PsLocated Token)
+finish_char_tok buf loc ch -- We've already seen the closing quote
+ -- Just need to check for trailing #
+ = do magicHash <- getBit MagicHashBit
+ i@(AI end bufEnd) <- getInput
+ let src = lexemeToString buf (cur bufEnd - cur buf)
+ if magicHash then do
+ case alexGetChar' i of
+ Just ('#',i@(AI end _)) -> do
+ setInput i
+ return (L (mkPsSpan loc end)
+ (ITprimchar (SourceText src) ch))
+ _other ->
+ return (L (mkPsSpan loc end)
+ (ITchar (SourceText src) ch))
+ else do
+ return (L (mkPsSpan loc end) (ITchar (SourceText src) ch))
+
+isAny :: Char -> Bool
+isAny c | c > '\x7f' = isPrint c
+ | otherwise = is_any c
+
+lex_escape :: P Char
+lex_escape = do
+ i0 <- getInput
+ c <- getCharOrFail i0
+ case c of
+ 'a' -> return '\a'
+ 'b' -> return '\b'
+ 'f' -> return '\f'
+ 'n' -> return '\n'
+ 'r' -> return '\r'
+ 't' -> return '\t'
+ 'v' -> return '\v'
+ '\\' -> return '\\'
+ '"' -> return '\"'
+ '\'' -> return '\''
+ '^' -> do i1 <- getInput
+ c <- getCharOrFail i1
+ if c >= '@' && c <= '_'
+ then return (chr (ord c - ord '@'))
+ else lit_error i1
+
+ 'x' -> readNum is_hexdigit 16 hexDigit
+ 'o' -> readNum is_octdigit 8 octDecDigit
+ x | is_decdigit x -> readNum2 is_decdigit 10 octDecDigit (octDecDigit x)
+
+ c1 -> do
+ i <- getInput
+ case alexGetChar' i of
+ Nothing -> lit_error i0
+ Just (c2,i2) ->
+ case alexGetChar' i2 of
+ Nothing -> do lit_error i0
+ Just (c3,i3) ->
+ let str = [c1,c2,c3] in
+ case [ (c,rest) | (p,c) <- silly_escape_chars,
+ Just rest <- [stripPrefix p str] ] of
+ (escape_char,[]):_ -> do
+ setInput i3
+ return escape_char
+ (escape_char,_:_):_ -> do
+ setInput i2
+ return escape_char
+ [] -> lit_error i0
+
+readNum :: (Char -> Bool) -> Int -> (Char -> Int) -> P Char
+readNum is_digit base conv = do
+ i <- getInput
+ c <- getCharOrFail i
+ if is_digit c
+ then readNum2 is_digit base conv (conv c)
+ else lit_error i
+
+readNum2 :: (Char -> Bool) -> Int -> (Char -> Int) -> Int -> P Char
+readNum2 is_digit base conv i = do
+ input <- getInput
+ read i input
+ where read i input = do
+ case alexGetChar' input of
+ Just (c,input') | is_digit c -> do
+ let i' = i*base + conv c
+ if i' > 0x10ffff
+ then setInput input >> lexError "numeric escape sequence out of range"
+ else read i' input'
+ _other -> do
+ setInput input; return (chr i)
+
+
+silly_escape_chars :: [(String, Char)]
+silly_escape_chars = [
+ ("NUL", '\NUL'),
+ ("SOH", '\SOH'),
+ ("STX", '\STX'),
+ ("ETX", '\ETX'),
+ ("EOT", '\EOT'),
+ ("ENQ", '\ENQ'),
+ ("ACK", '\ACK'),
+ ("BEL", '\BEL'),
+ ("BS", '\BS'),
+ ("HT", '\HT'),
+ ("LF", '\LF'),
+ ("VT", '\VT'),
+ ("FF", '\FF'),
+ ("CR", '\CR'),
+ ("SO", '\SO'),
+ ("SI", '\SI'),
+ ("DLE", '\DLE'),
+ ("DC1", '\DC1'),
+ ("DC2", '\DC2'),
+ ("DC3", '\DC3'),
+ ("DC4", '\DC4'),
+ ("NAK", '\NAK'),
+ ("SYN", '\SYN'),
+ ("ETB", '\ETB'),
+ ("CAN", '\CAN'),
+ ("EM", '\EM'),
+ ("SUB", '\SUB'),
+ ("ESC", '\ESC'),
+ ("FS", '\FS'),
+ ("GS", '\GS'),
+ ("RS", '\RS'),
+ ("US", '\US'),
+ ("SP", '\SP'),
+ ("DEL", '\DEL')
+ ]
+
+-- before calling lit_error, ensure that the current input is pointing to
+-- the position of the error in the buffer. This is so that we can report
+-- a correct location to the user, but also so we can detect UTF-8 decoding
+-- errors if they occur.
+lit_error :: AlexInput -> P a
+lit_error i = do setInput i; lexError "lexical error in string/character literal"
+
+getCharOrFail :: AlexInput -> P Char
+getCharOrFail i = do
+ case alexGetChar' i of
+ Nothing -> lexError "unexpected end-of-file in string/character literal"
+ Just (c,i) -> do setInput i; return c
+
+-- -----------------------------------------------------------------------------
+-- QuasiQuote
+
+lex_qquasiquote_tok :: Action
+lex_qquasiquote_tok span buf len = do
+ let (qual, quoter) = splitQualName (stepOn buf) (len - 2) False
+ quoteStart <- getParsedLoc
+ quote <- lex_quasiquote (psRealLoc quoteStart) ""
+ end <- getParsedLoc
+ return (L (mkPsSpan (psSpanStart span) end)
+ (ITqQuasiQuote (qual,
+ quoter,
+ mkFastString (reverse quote),
+ mkPsSpan quoteStart end)))
+
+lex_quasiquote_tok :: Action
+lex_quasiquote_tok span buf len = do
+ let quoter = tail (lexemeToString buf (len - 1))
+ -- 'tail' drops the initial '[',
+ -- while the -1 drops the trailing '|'
+ quoteStart <- getParsedLoc
+ quote <- lex_quasiquote (psRealLoc quoteStart) ""
+ end <- getParsedLoc
+ return (L (mkPsSpan (psSpanStart span) end)
+ (ITquasiQuote (mkFastString quoter,
+ mkFastString (reverse quote),
+ mkPsSpan quoteStart end)))
+
+lex_quasiquote :: RealSrcLoc -> String -> P String
+lex_quasiquote start s = do
+ i <- getInput
+ case alexGetChar' i of
+ Nothing -> quasiquote_error start
+
+ -- NB: The string "|]" terminates the quasiquote,
+ -- with absolutely no escaping. See the extensive
+ -- discussion on #5348 for why there is no
+ -- escape handling.
+ Just ('|',i)
+ | Just (']',i) <- alexGetChar' i
+ -> do { setInput i; return s }
+
+ Just (c, i) -> do
+ setInput i; lex_quasiquote start (c : s)
+
+quasiquote_error :: RealSrcLoc -> P a
+quasiquote_error start = do
+ (AI end buf) <- getInput
+ reportLexError start (psRealLoc end) buf "unterminated quasiquotation"
+
+-- -----------------------------------------------------------------------------
+-- Warnings
+
+warnTab :: Action
+warnTab srcspan _buf _len = do
+ addTabWarning (psRealSpan srcspan)
+ lexToken
+
+warnThen :: WarningFlag -> SDoc -> Action -> Action
+warnThen option warning action srcspan buf len = do
+ addWarning option (RealSrcSpan (psRealSpan srcspan) Nothing) warning
+ action srcspan buf len
+
+-- -----------------------------------------------------------------------------
+-- The Parse Monad
+
+-- | Do we want to generate ';' layout tokens? In some cases we just want to
+-- generate '}', e.g. in MultiWayIf we don't need ';'s because '|' separates
+-- alternatives (unlike a `case` expression where we need ';' to as a separator
+-- between alternatives).
+type GenSemic = Bool
+
+generateSemic, dontGenerateSemic :: GenSemic
+generateSemic = True
+dontGenerateSemic = False
+
+data LayoutContext
+ = NoLayout
+ | Layout !Int !GenSemic
+ deriving Show
+
+-- | The result of running a parser.
+data ParseResult a
+ = POk -- ^ The parser has consumed a (possibly empty) prefix
+ -- of the input and produced a result. Use 'getMessages'
+ -- to check for accumulated warnings and non-fatal errors.
+ PState -- ^ The resulting parsing state. Can be used to resume parsing.
+ a -- ^ The resulting value.
+ | PFailed -- ^ The parser has consumed a (possibly empty) prefix
+ -- of the input and failed.
+ PState -- ^ The parsing state right before failure, including the fatal
+ -- parse error. 'getMessages' and 'getErrorMessages' must return
+ -- a non-empty bag of errors.
+
+-- | Test whether a 'WarningFlag' is set
+warnopt :: WarningFlag -> ParserFlags -> Bool
+warnopt f options = f `EnumSet.member` pWarningFlags options
+
+-- | The subset of the 'DynFlags' used by the parser.
+-- See 'mkParserFlags' or 'mkParserFlags'' for ways to construct this.
+data ParserFlags = ParserFlags {
+ pWarningFlags :: EnumSet WarningFlag
+ , pThisPackage :: UnitId -- ^ key of package currently being compiled
+ , pExtsBitmap :: !ExtsBitmap -- ^ bitmap of permitted extensions
+ }
+
+data PState = PState {
+ buffer :: StringBuffer,
+ options :: ParserFlags,
+ -- This needs to take DynFlags as an argument until
+ -- we have a fix for #10143
+ messages :: DynFlags -> Messages,
+ tab_first :: Maybe RealSrcSpan, -- pos of first tab warning in the file
+ tab_count :: !Int, -- number of tab warnings in the file
+ last_tk :: Maybe Token,
+ last_loc :: PsSpan, -- pos of previous token
+ last_len :: !Int, -- len of previous token
+ loc :: PsLoc, -- current loc (end of prev token + 1)
+ context :: [LayoutContext],
+ lex_state :: [Int],
+ srcfiles :: [FastString],
+ -- Used in the alternative layout rule:
+ -- These tokens are the next ones to be sent out. They are
+ -- just blindly emitted, without the rule looking at them again:
+ alr_pending_implicit_tokens :: [PsLocated Token],
+ -- This is the next token to be considered or, if it is Nothing,
+ -- we need to get the next token from the input stream:
+ alr_next_token :: Maybe (PsLocated Token),
+ -- This is what we consider to be the location of the last token
+ -- emitted:
+ alr_last_loc :: PsSpan,
+ -- The stack of layout contexts:
+ alr_context :: [ALRContext],
+ -- Are we expecting a '{'? If it's Just, then the ALRLayout tells
+ -- us what sort of layout the '{' will open:
+ alr_expecting_ocurly :: Maybe ALRLayout,
+ -- Have we just had the '}' for a let block? If so, than an 'in'
+ -- token doesn't need to close anything:
+ alr_justClosedExplicitLetBlock :: Bool,
+
+ -- The next three are used to implement Annotations giving the
+ -- locations of 'noise' tokens in the source, so that users of
+ -- the GHC API can do source to source conversions.
+ -- See note [Api annotations] in GHC.Parser.Annotation
+ annotations :: [(ApiAnnKey,[RealSrcSpan])],
+ eof_pos :: Maybe RealSrcSpan,
+ comment_q :: [RealLocated AnnotationComment],
+ annotations_comments :: [(RealSrcSpan,[RealLocated AnnotationComment])]
+ }
+ -- last_loc and last_len are used when generating error messages,
+ -- and in pushCurrentContext only. Sigh, if only Happy passed the
+ -- current token to happyError, we could at least get rid of last_len.
+ -- Getting rid of last_loc would require finding another way to
+ -- implement pushCurrentContext (which is only called from one place).
+
+data ALRContext = ALRNoLayout Bool{- does it contain commas? -}
+ Bool{- is it a 'let' block? -}
+ | ALRLayout ALRLayout Int
+data ALRLayout = ALRLayoutLet
+ | ALRLayoutWhere
+ | ALRLayoutOf
+ | ALRLayoutDo
+
+-- | The parsing monad, isomorphic to @StateT PState Maybe@.
+newtype P a = P { unP :: PState -> ParseResult a }
+
+instance Functor P where
+ fmap = liftM
+
+instance Applicative P where
+ pure = returnP
+ (<*>) = ap
+
+instance Monad P where
+ (>>=) = thenP
+
+returnP :: a -> P a
+returnP a = a `seq` (P $ \s -> POk s a)
+
+thenP :: P a -> (a -> P b) -> P b
+(P m) `thenP` k = P $ \ s ->
+ case m s of
+ POk s1 a -> (unP (k a)) s1
+ PFailed s1 -> PFailed s1
+
+failMsgP :: String -> P a
+failMsgP msg = do
+ pState <- getPState
+ addFatalError (mkSrcSpanPs (last_loc pState)) (text msg)
+
+failLocMsgP :: RealSrcLoc -> RealSrcLoc -> String -> P a
+failLocMsgP loc1 loc2 str =
+ addFatalError (RealSrcSpan (mkRealSrcSpan loc1 loc2) Nothing) (text str)
+
+getPState :: P PState
+getPState = P $ \s -> POk s s
+
+withThisPackage :: (UnitId -> a) -> P a
+withThisPackage f = P $ \s@(PState{options = o}) -> POk s (f (pThisPackage o))
+
+getExts :: P ExtsBitmap
+getExts = P $ \s -> POk s (pExtsBitmap . options $ s)
+
+setExts :: (ExtsBitmap -> ExtsBitmap) -> P ()
+setExts f = P $ \s -> POk s {
+ options =
+ let p = options s
+ in p { pExtsBitmap = f (pExtsBitmap p) }
+ } ()
+
+setSrcLoc :: RealSrcLoc -> P ()
+setSrcLoc new_loc =
+ P $ \s@(PState{ loc = PsLoc _ buf_loc }) ->
+ POk s{ loc = PsLoc new_loc buf_loc } ()
+
+getRealSrcLoc :: P RealSrcLoc
+getRealSrcLoc = P $ \s@(PState{ loc=loc }) -> POk s (psRealLoc loc)
+
+getParsedLoc :: P PsLoc
+getParsedLoc = P $ \s@(PState{ loc=loc }) -> POk s loc
+
+addSrcFile :: FastString -> P ()
+addSrcFile f = P $ \s -> POk s{ srcfiles = f : srcfiles s } ()
+
+setEofPos :: RealSrcSpan -> P ()
+setEofPos span = P $ \s -> POk s{ eof_pos = Just span } ()
+
+setLastToken :: PsSpan -> Int -> P ()
+setLastToken loc len = P $ \s -> POk s {
+ last_loc=loc,
+ last_len=len
+ } ()
+
+setLastTk :: Token -> P ()
+setLastTk tk = P $ \s -> POk s { last_tk = Just tk } ()
+
+getLastTk :: P (Maybe Token)
+getLastTk = P $ \s@(PState { last_tk = last_tk }) -> POk s last_tk
+
+data AlexInput = AI PsLoc StringBuffer
+
+{-
+Note [Unicode in Alex]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Although newer versions of Alex support unicode, this grammar is processed with
+the old style '--latin1' behaviour. This means that when implementing the
+functions
+
+ alexGetByte :: AlexInput -> Maybe (Word8,AlexInput)
+ alexInputPrevChar :: AlexInput -> Char
+
+which Alex uses to take apart our 'AlexInput', we must
+
+ * return a latin1 character in the 'Word8' that 'alexGetByte' expects
+ * return a latin1 character in 'alexInputPrevChar'.
+
+We handle this in 'adjustChar' by squishing entire classes of unicode
+characters into single bytes.
+-}
+
+{-# INLINE adjustChar #-}
+adjustChar :: Char -> Word8
+adjustChar c = fromIntegral $ ord adj_c
+ where non_graphic = '\x00'
+ upper = '\x01'
+ lower = '\x02'
+ digit = '\x03'
+ symbol = '\x04'
+ space = '\x05'
+ other_graphic = '\x06'
+ uniidchar = '\x07'
+
+ adj_c
+ | c <= '\x07' = non_graphic
+ | c <= '\x7f' = c
+ -- Alex doesn't handle Unicode, so when Unicode
+ -- character is encountered we output these values
+ -- with the actual character value hidden in the state.
+ | otherwise =
+ -- NB: The logic behind these definitions is also reflected
+ -- in basicTypes/Lexeme.hs
+ -- Any changes here should likely be reflected there.
+
+ case generalCategory c of
+ UppercaseLetter -> upper
+ LowercaseLetter -> lower
+ TitlecaseLetter -> upper
+ ModifierLetter -> uniidchar -- see #10196
+ OtherLetter -> lower -- see #1103
+ NonSpacingMark -> uniidchar -- see #7650
+ SpacingCombiningMark -> other_graphic
+ EnclosingMark -> other_graphic
+ DecimalNumber -> digit
+ LetterNumber -> other_graphic
+ OtherNumber -> digit -- see #4373
+ ConnectorPunctuation -> symbol
+ DashPunctuation -> symbol
+ OpenPunctuation -> other_graphic
+ ClosePunctuation -> other_graphic
+ InitialQuote -> other_graphic
+ FinalQuote -> other_graphic
+ OtherPunctuation -> symbol
+ MathSymbol -> symbol
+ CurrencySymbol -> symbol
+ ModifierSymbol -> symbol
+ OtherSymbol -> symbol
+ Space -> space
+ _other -> non_graphic
+
+-- Getting the previous 'Char' isn't enough here - we need to convert it into
+-- the same format that 'alexGetByte' would have produced.
+--
+-- See Note [Unicode in Alex] and #13986.
+alexInputPrevChar :: AlexInput -> Char
+alexInputPrevChar (AI _ buf) = chr (fromIntegral (adjustChar pc))
+ where pc = prevChar buf '\n'
+
+-- backwards compatibility for Alex 2.x
+alexGetChar :: AlexInput -> Maybe (Char,AlexInput)
+alexGetChar inp = case alexGetByte inp of
+ Nothing -> Nothing
+ Just (b,i) -> c `seq` Just (c,i)
+ where c = chr $ fromIntegral b
+
+-- See Note [Unicode in Alex]
+alexGetByte :: AlexInput -> Maybe (Word8,AlexInput)
+alexGetByte (AI loc s)
+ | atEnd s = Nothing
+ | otherwise = byte `seq` loc' `seq` s' `seq`
+ --trace (show (ord c)) $
+ Just (byte, (AI loc' s'))
+ where (c,s') = nextChar s
+ loc' = advancePsLoc loc c
+ byte = adjustChar c
+
+-- This version does not squash unicode characters, it is used when
+-- lexing strings.
+alexGetChar' :: AlexInput -> Maybe (Char,AlexInput)
+alexGetChar' (AI loc s)
+ | atEnd s = Nothing
+ | otherwise = c `seq` loc' `seq` s' `seq`
+ --trace (show (ord c)) $
+ Just (c, (AI loc' s'))
+ where (c,s') = nextChar s
+ loc' = advancePsLoc loc c
+
+getInput :: P AlexInput
+getInput = P $ \s@PState{ loc=l, buffer=b } -> POk s (AI l b)
+
+setInput :: AlexInput -> P ()
+setInput (AI l b) = P $ \s -> POk s{ loc=l, buffer=b } ()
+
+nextIsEOF :: P Bool
+nextIsEOF = do
+ AI _ s <- getInput
+ return $ atEnd s
+
+pushLexState :: Int -> P ()
+pushLexState ls = P $ \s@PState{ lex_state=l } -> POk s{lex_state=ls:l} ()
+
+popLexState :: P Int
+popLexState = P $ \s@PState{ lex_state=ls:l } -> POk s{ lex_state=l } ls
+
+getLexState :: P Int
+getLexState = P $ \s@PState{ lex_state=ls:_ } -> POk s ls
+
+popNextToken :: P (Maybe (PsLocated Token))
+popNextToken
+ = P $ \s@PState{ alr_next_token = m } ->
+ POk (s {alr_next_token = Nothing}) m
+
+activeContext :: P Bool
+activeContext = do
+ ctxt <- getALRContext
+ expc <- getAlrExpectingOCurly
+ impt <- implicitTokenPending
+ case (ctxt,expc) of
+ ([],Nothing) -> return impt
+ _other -> return True
+
+resetAlrLastLoc :: FastString -> P ()
+resetAlrLastLoc file =
+ P $ \s@(PState {alr_last_loc = PsSpan _ buf_span}) ->
+ POk s{ alr_last_loc = PsSpan (alrInitialLoc file) buf_span } ()
+
+setAlrLastLoc :: PsSpan -> P ()
+setAlrLastLoc l = P $ \s -> POk (s {alr_last_loc = l}) ()
+
+getAlrLastLoc :: P PsSpan
+getAlrLastLoc = P $ \s@(PState {alr_last_loc = l}) -> POk s l
+
+getALRContext :: P [ALRContext]
+getALRContext = P $ \s@(PState {alr_context = cs}) -> POk s cs
+
+setALRContext :: [ALRContext] -> P ()
+setALRContext cs = P $ \s -> POk (s {alr_context = cs}) ()
+
+getJustClosedExplicitLetBlock :: P Bool
+getJustClosedExplicitLetBlock
+ = P $ \s@(PState {alr_justClosedExplicitLetBlock = b}) -> POk s b
+
+setJustClosedExplicitLetBlock :: Bool -> P ()
+setJustClosedExplicitLetBlock b
+ = P $ \s -> POk (s {alr_justClosedExplicitLetBlock = b}) ()
+
+setNextToken :: PsLocated Token -> P ()
+setNextToken t = P $ \s -> POk (s {alr_next_token = Just t}) ()
+
+implicitTokenPending :: P Bool
+implicitTokenPending
+ = P $ \s@PState{ alr_pending_implicit_tokens = ts } ->
+ case ts of
+ [] -> POk s False
+ _ -> POk s True
+
+popPendingImplicitToken :: P (Maybe (PsLocated Token))
+popPendingImplicitToken
+ = P $ \s@PState{ alr_pending_implicit_tokens = ts } ->
+ case ts of
+ [] -> POk s Nothing
+ (t : ts') -> POk (s {alr_pending_implicit_tokens = ts'}) (Just t)
+
+setPendingImplicitTokens :: [PsLocated Token] -> P ()
+setPendingImplicitTokens ts = P $ \s -> POk (s {alr_pending_implicit_tokens = ts}) ()
+
+getAlrExpectingOCurly :: P (Maybe ALRLayout)
+getAlrExpectingOCurly = P $ \s@(PState {alr_expecting_ocurly = b}) -> POk s b
+
+setAlrExpectingOCurly :: Maybe ALRLayout -> P ()
+setAlrExpectingOCurly b = P $ \s -> POk (s {alr_expecting_ocurly = b}) ()
+
+-- | For reasons of efficiency, boolean parsing flags (eg, language extensions
+-- or whether we are currently in a @RULE@ pragma) are represented by a bitmap
+-- stored in a @Word64@.
+type ExtsBitmap = Word64
+
+xbit :: ExtBits -> ExtsBitmap
+xbit = bit . fromEnum
+
+xtest :: ExtBits -> ExtsBitmap -> Bool
+xtest ext xmap = testBit xmap (fromEnum ext)
+
+-- | Various boolean flags, mostly language extensions, that impact lexing and
+-- parsing. Note that a handful of these can change during lexing/parsing.
+data ExtBits
+ -- Flags that are constant once parsing starts
+ = FfiBit
+ | InterruptibleFfiBit
+ | CApiFfiBit
+ | ArrowsBit
+ | ThBit
+ | ThQuotesBit
+ | IpBit
+ | OverloadedLabelsBit -- #x overloaded labels
+ | ExplicitForallBit -- the 'forall' keyword
+ | BangPatBit -- Tells the parser to understand bang-patterns
+ -- (doesn't affect the lexer)
+ | PatternSynonymsBit -- pattern synonyms
+ | HaddockBit-- Lex and parse Haddock comments
+ | MagicHashBit -- "#" in both functions and operators
+ | RecursiveDoBit -- mdo
+ | UnicodeSyntaxBit -- the forall symbol, arrow symbols, etc
+ | UnboxedTuplesBit -- (# and #)
+ | UnboxedSumsBit -- (# and #)
+ | DatatypeContextsBit
+ | MonadComprehensionsBit
+ | TransformComprehensionsBit
+ | QqBit -- enable quasiquoting
+ | RawTokenStreamBit -- producing a token stream with all comments included
+ | AlternativeLayoutRuleBit
+ | ALRTransitionalBit
+ | RelaxedLayoutBit
+ | NondecreasingIndentationBit
+ | SafeHaskellBit
+ | TraditionalRecordSyntaxBit
+ | ExplicitNamespacesBit
+ | LambdaCaseBit
+ | BinaryLiteralsBit
+ | NegativeLiteralsBit
+ | HexFloatLiteralsBit
+ | TypeApplicationsBit
+ | StaticPointersBit
+ | NumericUnderscoresBit
+ | StarIsTypeBit
+ | BlockArgumentsBit
+ | NPlusKPatternsBit
+ | DoAndIfThenElseBit
+ | MultiWayIfBit
+ | GadtSyntaxBit
+ | ImportQualifiedPostBit
+
+ -- Flags that are updated once parsing starts
+ | InRulePragBit
+ | InNestedCommentBit -- See Note [Nested comment line pragmas]
+ | UsePosPragsBit
+ -- ^ If this is enabled, '{-# LINE ... -#}' and '{-# COLUMN ... #-}'
+ -- update the internal position. Otherwise, those pragmas are lexed as
+ -- tokens of their own.
+ deriving Enum
+
+
+
+
+
+-- PState for parsing options pragmas
+--
+pragState :: DynFlags -> StringBuffer -> RealSrcLoc -> PState
+pragState dynflags buf loc = (mkPState dynflags buf loc) {
+ lex_state = [bol, option_prags, 0]
+ }
+
+{-# INLINE mkParserFlags' #-}
+mkParserFlags'
+ :: EnumSet WarningFlag -- ^ warnings flags enabled
+ -> EnumSet LangExt.Extension -- ^ permitted language extensions enabled
+ -> UnitId -- ^ key of package currently being compiled
+ -> Bool -- ^ are safe imports on?
+ -> Bool -- ^ keeping Haddock comment tokens
+ -> Bool -- ^ keep regular comment tokens
+
+ -> Bool
+ -- ^ If this is enabled, '{-# LINE ... -#}' and '{-# COLUMN ... #-}' update
+ -- the internal position kept by the parser. Otherwise, those pragmas are
+ -- lexed as 'ITline_prag' and 'ITcolumn_prag' tokens.
+
+ -> ParserFlags
+-- ^ Given exactly the information needed, set up the 'ParserFlags'
+mkParserFlags' warningFlags extensionFlags thisPackage
+ safeImports isHaddock rawTokStream usePosPrags =
+ ParserFlags {
+ pWarningFlags = warningFlags
+ , pThisPackage = thisPackage
+ , pExtsBitmap = safeHaskellBit .|. langExtBits .|. optBits
+ }
+ where
+ safeHaskellBit = SafeHaskellBit `setBitIf` safeImports
+ langExtBits =
+ FfiBit `xoptBit` LangExt.ForeignFunctionInterface
+ .|. InterruptibleFfiBit `xoptBit` LangExt.InterruptibleFFI
+ .|. CApiFfiBit `xoptBit` LangExt.CApiFFI
+ .|. ArrowsBit `xoptBit` LangExt.Arrows
+ .|. ThBit `xoptBit` LangExt.TemplateHaskell
+ .|. ThQuotesBit `xoptBit` LangExt.TemplateHaskellQuotes
+ .|. QqBit `xoptBit` LangExt.QuasiQuotes
+ .|. IpBit `xoptBit` LangExt.ImplicitParams
+ .|. OverloadedLabelsBit `xoptBit` LangExt.OverloadedLabels
+ .|. ExplicitForallBit `xoptBit` LangExt.ExplicitForAll
+ .|. BangPatBit `xoptBit` LangExt.BangPatterns
+ .|. MagicHashBit `xoptBit` LangExt.MagicHash
+ .|. RecursiveDoBit `xoptBit` LangExt.RecursiveDo
+ .|. UnicodeSyntaxBit `xoptBit` LangExt.UnicodeSyntax
+ .|. UnboxedTuplesBit `xoptBit` LangExt.UnboxedTuples
+ .|. UnboxedSumsBit `xoptBit` LangExt.UnboxedSums
+ .|. DatatypeContextsBit `xoptBit` LangExt.DatatypeContexts
+ .|. TransformComprehensionsBit `xoptBit` LangExt.TransformListComp
+ .|. MonadComprehensionsBit `xoptBit` LangExt.MonadComprehensions
+ .|. AlternativeLayoutRuleBit `xoptBit` LangExt.AlternativeLayoutRule
+ .|. ALRTransitionalBit `xoptBit` LangExt.AlternativeLayoutRuleTransitional
+ .|. RelaxedLayoutBit `xoptBit` LangExt.RelaxedLayout
+ .|. NondecreasingIndentationBit `xoptBit` LangExt.NondecreasingIndentation
+ .|. TraditionalRecordSyntaxBit `xoptBit` LangExt.TraditionalRecordSyntax
+ .|. ExplicitNamespacesBit `xoptBit` LangExt.ExplicitNamespaces
+ .|. LambdaCaseBit `xoptBit` LangExt.LambdaCase
+ .|. BinaryLiteralsBit `xoptBit` LangExt.BinaryLiterals
+ .|. NegativeLiteralsBit `xoptBit` LangExt.NegativeLiterals
+ .|. HexFloatLiteralsBit `xoptBit` LangExt.HexFloatLiterals
+ .|. PatternSynonymsBit `xoptBit` LangExt.PatternSynonyms
+ .|. TypeApplicationsBit `xoptBit` LangExt.TypeApplications
+ .|. StaticPointersBit `xoptBit` LangExt.StaticPointers
+ .|. NumericUnderscoresBit `xoptBit` LangExt.NumericUnderscores
+ .|. StarIsTypeBit `xoptBit` LangExt.StarIsType
+ .|. BlockArgumentsBit `xoptBit` LangExt.BlockArguments
+ .|. NPlusKPatternsBit `xoptBit` LangExt.NPlusKPatterns
+ .|. DoAndIfThenElseBit `xoptBit` LangExt.DoAndIfThenElse
+ .|. MultiWayIfBit `xoptBit` LangExt.MultiWayIf
+ .|. GadtSyntaxBit `xoptBit` LangExt.GADTSyntax
+ .|. ImportQualifiedPostBit `xoptBit` LangExt.ImportQualifiedPost
+ optBits =
+ HaddockBit `setBitIf` isHaddock
+ .|. RawTokenStreamBit `setBitIf` rawTokStream
+ .|. UsePosPragsBit `setBitIf` usePosPrags
+
+ xoptBit bit ext = bit `setBitIf` EnumSet.member ext extensionFlags
+
+ setBitIf :: ExtBits -> Bool -> ExtsBitmap
+ b `setBitIf` cond | cond = xbit b
+ | otherwise = 0
+
+-- | Extracts the flag information needed for parsing
+mkParserFlags :: DynFlags -> ParserFlags
+mkParserFlags =
+ mkParserFlags'
+ <$> DynFlags.warningFlags
+ <*> DynFlags.extensionFlags
+ <*> DynFlags.thisPackage
+ <*> safeImportsOn
+ <*> gopt Opt_Haddock
+ <*> gopt Opt_KeepRawTokenStream
+ <*> const True
+
+-- | Creates a parse state from a 'DynFlags' value
+mkPState :: DynFlags -> StringBuffer -> RealSrcLoc -> PState
+mkPState flags = mkPStatePure (mkParserFlags flags)
+
+-- | Creates a parse state from a 'ParserFlags' value
+mkPStatePure :: ParserFlags -> StringBuffer -> RealSrcLoc -> PState
+mkPStatePure options buf loc =
+ PState {
+ buffer = buf,
+ options = options,
+ messages = const emptyMessages,
+ tab_first = Nothing,
+ tab_count = 0,
+ last_tk = Nothing,
+ last_loc = mkPsSpan init_loc init_loc,
+ last_len = 0,
+ loc = init_loc,
+ context = [],
+ lex_state = [bol, 0],
+ srcfiles = [],
+ alr_pending_implicit_tokens = [],
+ alr_next_token = Nothing,
+ alr_last_loc = PsSpan (alrInitialLoc (fsLit "<no file>")) (BufSpan (BufPos 0) (BufPos 0)),
+ alr_context = [],
+ alr_expecting_ocurly = Nothing,
+ alr_justClosedExplicitLetBlock = False,
+ annotations = [],
+ eof_pos = Nothing,
+ comment_q = [],
+ annotations_comments = []
+ }
+ where init_loc = PsLoc loc (BufPos 0)
+
+-- | An mtl-style class for monads that support parsing-related operations.
+-- For example, sometimes we make a second pass over the parsing results to validate,
+-- disambiguate, or rearrange them, and we do so in the PV monad which cannot consume
+-- input but can report parsing errors, check for extension bits, and accumulate
+-- parsing annotations. Both P and PV are instances of MonadP.
+--
+-- MonadP grants us convenient overloading. The other option is to have separate operations
+-- for each monad: addErrorP vs addErrorPV, getBitP vs getBitPV, and so on.
+--
+class Monad m => MonadP m where
+ -- | Add a non-fatal error. Use this when the parser can produce a result
+ -- despite the error.
+ --
+ -- For example, when GHC encounters a @forall@ in a type,
+ -- but @-XExplicitForAll@ is disabled, the parser constructs @ForAllTy@
+ -- as if @-XExplicitForAll@ was enabled, adding a non-fatal error to
+ -- the accumulator.
+ --
+ -- Control flow wise, non-fatal errors act like warnings: they are added
+ -- to the accumulator and parsing continues. This allows GHC to report
+ -- more than one parse error per file.
+ --
+ addError :: SrcSpan -> SDoc -> m ()
+ -- | Add a warning to the accumulator.
+ -- Use 'getMessages' to get the accumulated warnings.
+ addWarning :: WarningFlag -> SrcSpan -> SDoc -> m ()
+ -- | Add a fatal error. This will be the last error reported by the parser, and
+ -- the parser will not produce any result, ending in a 'PFailed' state.
+ addFatalError :: SrcSpan -> SDoc -> m a
+ -- | Check if a given flag is currently set in the bitmap.
+ getBit :: ExtBits -> m Bool
+ -- | Given a location and a list of AddAnn, apply them all to the location.
+ addAnnotation :: SrcSpan -- SrcSpan of enclosing AST construct
+ -> AnnKeywordId -- The first two parameters are the key
+ -> SrcSpan -- The location of the keyword itself
+ -> m ()
+
+appendError
+ :: SrcSpan
+ -> SDoc
+ -> (DynFlags -> Messages)
+ -> (DynFlags -> Messages)
+appendError srcspan msg m =
+ \d ->
+ let (ws, es) = m d
+ errormsg = mkErrMsg d srcspan alwaysQualify msg
+ es' = es `snocBag` errormsg
+ in (ws, es')
+
+appendWarning
+ :: ParserFlags
+ -> WarningFlag
+ -> SrcSpan
+ -> SDoc
+ -> (DynFlags -> Messages)
+ -> (DynFlags -> Messages)
+appendWarning o option srcspan warning m =
+ \d ->
+ let (ws, es) = m d
+ warning' = makeIntoWarning (Reason option) $
+ mkWarnMsg d srcspan alwaysQualify warning
+ ws' = if warnopt option o then ws `snocBag` warning' else ws
+ in (ws', es)
+
+instance MonadP P where
+ addError srcspan msg
+ = P $ \s@PState{messages=m} ->
+ POk s{messages=appendError srcspan msg m} ()
+ addWarning option srcspan warning
+ = P $ \s@PState{messages=m, options=o} ->
+ POk s{messages=appendWarning o option srcspan warning m} ()
+ addFatalError span msg =
+ addError span msg >> P PFailed
+ getBit ext = P $ \s -> let b = ext `xtest` pExtsBitmap (options s)
+ in b `seq` POk s b
+ addAnnotation (RealSrcSpan l _) a (RealSrcSpan v _) = do
+ addAnnotationOnly l a v
+ allocateCommentsP l
+ addAnnotation _ _ _ = return ()
+
+addAnnsAt :: MonadP m => SrcSpan -> [AddAnn] -> m ()
+addAnnsAt l = mapM_ (\(AddAnn a v) -> addAnnotation l a v)
+
+addTabWarning :: RealSrcSpan -> P ()
+addTabWarning srcspan
+ = P $ \s@PState{tab_first=tf, tab_count=tc, options=o} ->
+ let tf' = if isJust tf then tf else Just srcspan
+ tc' = tc + 1
+ s' = if warnopt Opt_WarnTabs o
+ then s{tab_first = tf', tab_count = tc'}
+ else s
+ in POk s' ()
+
+mkTabWarning :: PState -> DynFlags -> Maybe ErrMsg
+mkTabWarning PState{tab_first=tf, tab_count=tc} d =
+ let middle = if tc == 1
+ then text ""
+ else text ", and in" <+> speakNOf (tc - 1) (text "further location")
+ message = text "Tab character found here"
+ <> middle
+ <> text "."
+ $+$ text "Please use spaces instead."
+ in fmap (\s -> makeIntoWarning (Reason Opt_WarnTabs) $
+ mkWarnMsg d (RealSrcSpan s Nothing) alwaysQualify message) tf
+
+-- | Get a bag of the errors that have been accumulated so far.
+-- Does not take -Werror into account.
+getErrorMessages :: PState -> DynFlags -> ErrorMessages
+getErrorMessages PState{messages=m} d =
+ let (_, es) = m d in es
+
+-- | Get the warnings and errors accumulated so far.
+-- Does not take -Werror into account.
+getMessages :: PState -> DynFlags -> Messages
+getMessages p@PState{messages=m} d =
+ let (ws, es) = m d
+ tabwarning = mkTabWarning p d
+ ws' = maybe ws (`consBag` ws) tabwarning
+ in (ws', es)
+
+getContext :: P [LayoutContext]
+getContext = P $ \s@PState{context=ctx} -> POk s ctx
+
+setContext :: [LayoutContext] -> P ()
+setContext ctx = P $ \s -> POk s{context=ctx} ()
+
+popContext :: P ()
+popContext = P $ \ s@(PState{ buffer = buf, options = o, context = ctx,
+ last_len = len, last_loc = last_loc }) ->
+ case ctx of
+ (_:tl) ->
+ POk s{ context = tl } ()
+ [] ->
+ unP (addFatalError (mkSrcSpanPs last_loc) (srcParseErr o buf len)) s
+
+-- Push a new layout context at the indentation of the last token read.
+pushCurrentContext :: GenSemic -> P ()
+pushCurrentContext gen_semic = P $ \ s@PState{ last_loc=loc, context=ctx } ->
+ POk s{context = Layout (srcSpanStartCol (psRealSpan loc)) gen_semic : ctx} ()
+
+-- This is only used at the outer level of a module when the 'module' keyword is
+-- missing.
+pushModuleContext :: P ()
+pushModuleContext = pushCurrentContext generateSemic
+
+getOffside :: P (Ordering, Bool)
+getOffside = P $ \s@PState{last_loc=loc, context=stk} ->
+ let offs = srcSpanStartCol (psRealSpan loc) in
+ let ord = case stk of
+ Layout n gen_semic : _ ->
+ --trace ("layout: " ++ show n ++ ", offs: " ++ show offs) $
+ (compare offs n, gen_semic)
+ _ ->
+ (GT, dontGenerateSemic)
+ in POk s ord
+
+-- ---------------------------------------------------------------------------
+-- Construct a parse error
+
+srcParseErr
+ :: ParserFlags
+ -> StringBuffer -- current buffer (placed just after the last token)
+ -> Int -- length of the previous token
+ -> MsgDoc
+srcParseErr options buf len
+ = if null token
+ then text "parse error (possibly incorrect indentation or mismatched brackets)"
+ else text "parse error on input" <+> quotes (text token)
+ $$ ppWhen (not th_enabled && token == "$") -- #7396
+ (text "Perhaps you intended to use TemplateHaskell")
+ $$ ppWhen (token == "<-")
+ (if mdoInLast100
+ then text "Perhaps you intended to use RecursiveDo"
+ else text "Perhaps this statement should be within a 'do' block?")
+ $$ ppWhen (token == "=" && doInLast100) -- #15849
+ (text "Perhaps you need a 'let' in a 'do' block?"
+ $$ text "e.g. 'let x = 5' instead of 'x = 5'")
+ $$ ppWhen (not ps_enabled && pattern == "pattern ") -- #12429
+ (text "Perhaps you intended to use PatternSynonyms")
+ where token = lexemeToString (offsetBytes (-len) buf) len
+ pattern = decodePrevNChars 8 buf
+ last100 = decodePrevNChars 100 buf
+ doInLast100 = "do" `isInfixOf` last100
+ mdoInLast100 = "mdo" `isInfixOf` last100
+ th_enabled = ThQuotesBit `xtest` pExtsBitmap options
+ ps_enabled = PatternSynonymsBit `xtest` pExtsBitmap options
+
+-- Report a parse failure, giving the span of the previous token as
+-- the location of the error. This is the entry point for errors
+-- detected during parsing.
+srcParseFail :: P a
+srcParseFail = P $ \s@PState{ buffer = buf, options = o, last_len = len,
+ last_loc = last_loc } ->
+ unP (addFatalError (mkSrcSpanPs last_loc) (srcParseErr o buf len)) s
+
+-- A lexical error is reported at a particular position in the source file,
+-- not over a token range.
+lexError :: String -> P a
+lexError str = do
+ loc <- getRealSrcLoc
+ (AI end buf) <- getInput
+ reportLexError loc (psRealLoc end) buf str
+
+-- -----------------------------------------------------------------------------
+-- This is the top-level function: called from the parser each time a
+-- new token is to be read from the input.
+
+lexer, lexerDbg :: Bool -> (Located Token -> P a) -> P a
+
+lexer queueComments cont = do
+ alr <- getBit AlternativeLayoutRuleBit
+ let lexTokenFun = if alr then lexTokenAlr else lexToken
+ (L span tok) <- lexTokenFun
+ --trace ("token: " ++ show tok) $ do
+
+ if (queueComments && isDocComment tok)
+ then queueComment (L (psRealSpan span) tok)
+ else return ()
+
+ if (queueComments && isComment tok)
+ then queueComment (L (psRealSpan span) tok) >> lexer queueComments cont
+ else cont (L (mkSrcSpanPs span) tok)
+
+-- Use this instead of 'lexer' in GHC.Parser to dump the tokens for debugging.
+lexerDbg queueComments cont = lexer queueComments contDbg
+ where
+ contDbg tok = trace ("token: " ++ show (unLoc tok)) (cont tok)
+
+lexTokenAlr :: P (PsLocated Token)
+lexTokenAlr = do mPending <- popPendingImplicitToken
+ t <- case mPending of
+ Nothing ->
+ do mNext <- popNextToken
+ t <- case mNext of
+ Nothing -> lexToken
+ Just next -> return next
+ alternativeLayoutRuleToken t
+ Just t ->
+ return t
+ setAlrLastLoc (getLoc t)
+ case unLoc t of
+ ITwhere -> setAlrExpectingOCurly (Just ALRLayoutWhere)
+ ITlet -> setAlrExpectingOCurly (Just ALRLayoutLet)
+ ITof -> setAlrExpectingOCurly (Just ALRLayoutOf)
+ ITlcase -> setAlrExpectingOCurly (Just ALRLayoutOf)
+ ITdo -> setAlrExpectingOCurly (Just ALRLayoutDo)
+ ITmdo -> setAlrExpectingOCurly (Just ALRLayoutDo)
+ ITrec -> setAlrExpectingOCurly (Just ALRLayoutDo)
+ _ -> return ()
+ return t
+
+alternativeLayoutRuleToken :: PsLocated Token -> P (PsLocated Token)
+alternativeLayoutRuleToken t
+ = do context <- getALRContext
+ lastLoc <- getAlrLastLoc
+ mExpectingOCurly <- getAlrExpectingOCurly
+ transitional <- getBit ALRTransitionalBit
+ justClosedExplicitLetBlock <- getJustClosedExplicitLetBlock
+ setJustClosedExplicitLetBlock False
+ let thisLoc = getLoc t
+ thisCol = srcSpanStartCol (psRealSpan thisLoc)
+ newLine = srcSpanStartLine (psRealSpan thisLoc) > srcSpanEndLine (psRealSpan lastLoc)
+ case (unLoc t, context, mExpectingOCurly) of
+ -- This case handles a GHC extension to the original H98
+ -- layout rule...
+ (ITocurly, _, Just alrLayout) ->
+ do setAlrExpectingOCurly Nothing
+ let isLet = case alrLayout of
+ ALRLayoutLet -> True
+ _ -> False
+ setALRContext (ALRNoLayout (containsCommas ITocurly) isLet : context)
+ return t
+ -- ...and makes this case unnecessary
+ {-
+ -- I think our implicit open-curly handling is slightly
+ -- different to John's, in how it interacts with newlines
+ -- and "in"
+ (ITocurly, _, Just _) ->
+ do setAlrExpectingOCurly Nothing
+ setNextToken t
+ lexTokenAlr
+ -}
+ (_, ALRLayout _ col : _ls, Just expectingOCurly)
+ | (thisCol > col) ||
+ (thisCol == col &&
+ isNonDecreasingIndentation expectingOCurly) ->
+ do setAlrExpectingOCurly Nothing
+ setALRContext (ALRLayout expectingOCurly thisCol : context)
+ setNextToken t
+ return (L thisLoc ITvocurly)
+ | otherwise ->
+ do setAlrExpectingOCurly Nothing
+ setPendingImplicitTokens [L lastLoc ITvccurly]
+ setNextToken t
+ return (L lastLoc ITvocurly)
+ (_, _, Just expectingOCurly) ->
+ do setAlrExpectingOCurly Nothing
+ setALRContext (ALRLayout expectingOCurly thisCol : context)
+ setNextToken t
+ return (L thisLoc ITvocurly)
+ -- We do the [] cases earlier than in the spec, as we
+ -- have an actual EOF token
+ (ITeof, ALRLayout _ _ : ls, _) ->
+ do setALRContext ls
+ setNextToken t
+ return (L thisLoc ITvccurly)
+ (ITeof, _, _) ->
+ return t
+ -- the other ITeof case omitted; general case below covers it
+ (ITin, _, _)
+ | justClosedExplicitLetBlock ->
+ return t
+ (ITin, ALRLayout ALRLayoutLet _ : ls, _)
+ | newLine ->
+ do setPendingImplicitTokens [t]
+ setALRContext ls
+ return (L thisLoc ITvccurly)
+ -- This next case is to handle a transitional issue:
+ (ITwhere, ALRLayout _ col : ls, _)
+ | newLine && thisCol == col && transitional ->
+ do addWarning Opt_WarnAlternativeLayoutRuleTransitional
+ (mkSrcSpanPs thisLoc)
+ (transitionalAlternativeLayoutWarning
+ "`where' clause at the same depth as implicit layout block")
+ setALRContext ls
+ setNextToken t
+ -- Note that we use lastLoc, as we may need to close
+ -- more layouts, or give a semicolon
+ return (L lastLoc ITvccurly)
+ -- This next case is to handle a transitional issue:
+ (ITvbar, ALRLayout _ col : ls, _)
+ | newLine && thisCol == col && transitional ->
+ do addWarning Opt_WarnAlternativeLayoutRuleTransitional
+ (mkSrcSpanPs thisLoc)
+ (transitionalAlternativeLayoutWarning
+ "`|' at the same depth as implicit layout block")
+ setALRContext ls
+ setNextToken t
+ -- Note that we use lastLoc, as we may need to close
+ -- more layouts, or give a semicolon
+ return (L lastLoc ITvccurly)
+ (_, ALRLayout _ col : ls, _)
+ | newLine && thisCol == col ->
+ do setNextToken t
+ let loc = psSpanStart thisLoc
+ zeroWidthLoc = mkPsSpan loc loc
+ return (L zeroWidthLoc ITsemi)
+ | newLine && thisCol < col ->
+ do setALRContext ls
+ setNextToken t
+ -- Note that we use lastLoc, as we may need to close
+ -- more layouts, or give a semicolon
+ return (L lastLoc ITvccurly)
+ -- We need to handle close before open, as 'then' is both
+ -- an open and a close
+ (u, _, _)
+ | isALRclose u ->
+ case context of
+ ALRLayout _ _ : ls ->
+ do setALRContext ls
+ setNextToken t
+ return (L thisLoc ITvccurly)
+ ALRNoLayout _ isLet : ls ->
+ do let ls' = if isALRopen u
+ then ALRNoLayout (containsCommas u) False : ls
+ else ls
+ setALRContext ls'
+ when isLet $ setJustClosedExplicitLetBlock True
+ return t
+ [] ->
+ do let ls = if isALRopen u
+ then [ALRNoLayout (containsCommas u) False]
+ else []
+ setALRContext ls
+ -- XXX This is an error in John's code, but
+ -- it looks reachable to me at first glance
+ return t
+ (u, _, _)
+ | isALRopen u ->
+ do setALRContext (ALRNoLayout (containsCommas u) False : context)
+ return t
+ (ITin, ALRLayout ALRLayoutLet _ : ls, _) ->
+ do setALRContext ls
+ setPendingImplicitTokens [t]
+ return (L thisLoc ITvccurly)
+ (ITin, ALRLayout _ _ : ls, _) ->
+ do setALRContext ls
+ setNextToken t
+ return (L thisLoc ITvccurly)
+ -- the other ITin case omitted; general case below covers it
+ (ITcomma, ALRLayout _ _ : ls, _)
+ | topNoLayoutContainsCommas ls ->
+ do setALRContext ls
+ setNextToken t
+ return (L thisLoc ITvccurly)
+ (ITwhere, ALRLayout ALRLayoutDo _ : ls, _) ->
+ do setALRContext ls
+ setPendingImplicitTokens [t]
+ return (L thisLoc ITvccurly)
+ -- the other ITwhere case omitted; general case below covers it
+ (_, _, _) -> return t
+
+transitionalAlternativeLayoutWarning :: String -> SDoc
+transitionalAlternativeLayoutWarning msg
+ = text "transitional layout will not be accepted in the future:"
+ $$ text msg
+
+isALRopen :: Token -> Bool
+isALRopen ITcase = True
+isALRopen ITif = True
+isALRopen ITthen = True
+isALRopen IToparen = True
+isALRopen ITobrack = True
+isALRopen ITocurly = True
+-- GHC Extensions:
+isALRopen IToubxparen = True
+isALRopen _ = False
+
+isALRclose :: Token -> Bool
+isALRclose ITof = True
+isALRclose ITthen = True
+isALRclose ITelse = True
+isALRclose ITcparen = True
+isALRclose ITcbrack = True
+isALRclose ITccurly = True
+-- GHC Extensions:
+isALRclose ITcubxparen = True
+isALRclose _ = False
+
+isNonDecreasingIndentation :: ALRLayout -> Bool
+isNonDecreasingIndentation ALRLayoutDo = True
+isNonDecreasingIndentation _ = False
+
+containsCommas :: Token -> Bool
+containsCommas IToparen = True
+containsCommas ITobrack = True
+-- John doesn't have {} as containing commas, but records contain them,
+-- which caused a problem parsing Cabal's Distribution.Simple.InstallDirs
+-- (defaultInstallDirs).
+containsCommas ITocurly = True
+-- GHC Extensions:
+containsCommas IToubxparen = True
+containsCommas _ = False
+
+topNoLayoutContainsCommas :: [ALRContext] -> Bool
+topNoLayoutContainsCommas [] = False
+topNoLayoutContainsCommas (ALRLayout _ _ : ls) = topNoLayoutContainsCommas ls
+topNoLayoutContainsCommas (ALRNoLayout b _ : _) = b
+
+lexToken :: P (PsLocated Token)
+lexToken = do
+ inp@(AI loc1 buf) <- getInput
+ sc <- getLexState
+ exts <- getExts
+ case alexScanUser exts inp sc of
+ AlexEOF -> do
+ let span = mkPsSpan loc1 loc1
+ setEofPos (psRealSpan span)
+ setLastToken span 0
+ return (L span ITeof)
+ AlexError (AI loc2 buf) ->
+ reportLexError (psRealLoc loc1) (psRealLoc loc2) buf "lexical error"
+ AlexSkip inp2 _ -> do
+ setInput inp2
+ lexToken
+ AlexToken inp2@(AI end buf2) _ t -> do
+ setInput inp2
+ let span = mkPsSpan loc1 end
+ let bytes = byteDiff buf buf2
+ span `seq` setLastToken span bytes
+ lt <- t span buf bytes
+ let lt' = unLoc lt
+ unless (isComment lt') (setLastTk lt')
+ return lt
+
+reportLexError :: RealSrcLoc -> RealSrcLoc -> StringBuffer -> [Char] -> P a
+reportLexError loc1 loc2 buf str
+ | atEnd buf = failLocMsgP loc1 loc2 (str ++ " at end of input")
+ | otherwise =
+ let c = fst (nextChar buf)
+ in if c == '\0' -- decoding errors are mapped to '\0', see utf8DecodeChar#
+ then failLocMsgP loc2 loc2 (str ++ " (UTF-8 decoding error)")
+ else failLocMsgP loc1 loc2 (str ++ " at character " ++ show c)
+
+lexTokenStream :: StringBuffer -> RealSrcLoc -> DynFlags -> ParseResult [Located Token]
+lexTokenStream buf loc dflags = unP go initState{ options = opts' }
+ where dflags' = gopt_set (gopt_unset dflags Opt_Haddock) Opt_KeepRawTokenStream
+ initState@PState{ options = opts } = mkPState dflags' buf loc
+ opts' = opts{ pExtsBitmap = complement (xbit UsePosPragsBit) .&. pExtsBitmap opts }
+ go = do
+ ltok <- lexer False return
+ case ltok of
+ L _ ITeof -> return []
+ _ -> liftM (ltok:) go
+
+linePrags = Map.singleton "line" linePrag
+
+fileHeaderPrags = Map.fromList([("options", lex_string_prag IToptions_prag),
+ ("options_ghc", lex_string_prag IToptions_prag),
+ ("options_haddock", lex_string_prag ITdocOptions),
+ ("language", token ITlanguage_prag),
+ ("include", lex_string_prag ITinclude_prag)])
+
+ignoredPrags = Map.fromList (map ignored pragmas)
+ where ignored opt = (opt, nested_comment lexToken)
+ impls = ["hugs", "nhc98", "jhc", "yhc", "catch", "derive"]
+ options_pragmas = map ("options_" ++) impls
+ -- CFILES is a hugs-only thing.
+ pragmas = options_pragmas ++ ["cfiles", "contract"]
+
+oneWordPrags = Map.fromList [
+ ("rules", rulePrag),
+ ("inline",
+ strtoken (\s -> (ITinline_prag (SourceText s) Inline FunLike))),
+ ("inlinable",
+ strtoken (\s -> (ITinline_prag (SourceText s) Inlinable FunLike))),
+ ("inlineable",
+ strtoken (\s -> (ITinline_prag (SourceText s) Inlinable FunLike))),
+ -- Spelling variant
+ ("notinline",
+ strtoken (\s -> (ITinline_prag (SourceText s) NoInline FunLike))),
+ ("specialize", strtoken (\s -> ITspec_prag (SourceText s))),
+ ("source", strtoken (\s -> ITsource_prag (SourceText s))),
+ ("warning", strtoken (\s -> ITwarning_prag (SourceText s))),
+ ("deprecated", strtoken (\s -> ITdeprecated_prag (SourceText s))),
+ ("scc", strtoken (\s -> ITscc_prag (SourceText s))),
+ ("generated", strtoken (\s -> ITgenerated_prag (SourceText s))),
+ ("core", strtoken (\s -> ITcore_prag (SourceText s))),
+ ("unpack", strtoken (\s -> ITunpack_prag (SourceText s))),
+ ("nounpack", strtoken (\s -> ITnounpack_prag (SourceText s))),
+ ("ann", strtoken (\s -> ITann_prag (SourceText s))),
+ ("minimal", strtoken (\s -> ITminimal_prag (SourceText s))),
+ ("overlaps", strtoken (\s -> IToverlaps_prag (SourceText s))),
+ ("overlappable", strtoken (\s -> IToverlappable_prag (SourceText s))),
+ ("overlapping", strtoken (\s -> IToverlapping_prag (SourceText s))),
+ ("incoherent", strtoken (\s -> ITincoherent_prag (SourceText s))),
+ ("ctype", strtoken (\s -> ITctype (SourceText s))),
+ ("complete", strtoken (\s -> ITcomplete_prag (SourceText s))),
+ ("column", columnPrag)
+ ]
+
+twoWordPrags = Map.fromList [
+ ("inline conlike",
+ strtoken (\s -> (ITinline_prag (SourceText s) Inline ConLike))),
+ ("notinline conlike",
+ strtoken (\s -> (ITinline_prag (SourceText s) NoInline ConLike))),
+ ("specialize inline",
+ strtoken (\s -> (ITspec_inline_prag (SourceText s) True))),
+ ("specialize notinline",
+ strtoken (\s -> (ITspec_inline_prag (SourceText s) False)))
+ ]
+
+dispatch_pragmas :: Map String Action -> Action
+dispatch_pragmas prags span buf len = case Map.lookup (clean_pragma (lexemeToString buf len)) prags of
+ Just found -> found span buf len
+ Nothing -> lexError "unknown pragma"
+
+known_pragma :: Map String Action -> AlexAccPred ExtsBitmap
+known_pragma prags _ (AI _ startbuf) _ (AI _ curbuf)
+ = isKnown && nextCharIsNot curbuf pragmaNameChar
+ where l = lexemeToString startbuf (byteDiff startbuf curbuf)
+ isKnown = isJust $ Map.lookup (clean_pragma l) prags
+ pragmaNameChar c = isAlphaNum c || c == '_'
+
+clean_pragma :: String -> String
+clean_pragma prag = canon_ws (map toLower (unprefix prag))
+ where unprefix prag' = case stripPrefix "{-#" prag' of
+ Just rest -> rest
+ Nothing -> prag'
+ canonical prag' = case prag' of
+ "noinline" -> "notinline"
+ "specialise" -> "specialize"
+ "constructorlike" -> "conlike"
+ _ -> prag'
+ canon_ws s = unwords (map canonical (words s))
+
+
+
+{-
+%************************************************************************
+%* *
+ Helper functions for generating annotations in the parser
+%* *
+%************************************************************************
+-}
+
+-- | Encapsulated call to addAnnotation, requiring only the SrcSpan of
+-- the AST construct the annotation belongs to; together with the
+-- AnnKeywordId, this is the key of the annotation map.
+--
+-- This type is useful for places in the parser where it is not yet
+-- known what SrcSpan an annotation should be added to. The most
+-- common situation is when we are parsing a list: the annotations
+-- need to be associated with the AST element that *contains* the
+-- list, not the list itself. 'AddAnn' lets us defer adding the
+-- annotations until we finish parsing the list and are now parsing
+-- the enclosing element; we then apply the 'AddAnn' to associate
+-- the annotations. Another common situation is where a common fragment of
+-- the AST has been factored out but there is no separate AST node for
+-- this fragment (this occurs in class and data declarations). In this
+-- case, the annotation belongs to the parent data declaration.
+--
+-- The usual way an 'AddAnn' is created is using the 'mj' ("make jump")
+-- function, and then it can be discharged using the 'ams' function.
+data AddAnn = AddAnn AnnKeywordId SrcSpan
+
+addAnnotationOnly :: RealSrcSpan -> AnnKeywordId -> RealSrcSpan -> P ()
+addAnnotationOnly l a v = P $ \s -> POk s {
+ annotations = ((l,a), [v]) : annotations s
+ } ()
+
+-- |Given a 'SrcSpan' that surrounds a 'HsPar' or 'HsParTy', generate
+-- 'AddAnn' values for the opening and closing bordering on the start
+-- and end of the span
+mkParensApiAnn :: SrcSpan -> [AddAnn]
+mkParensApiAnn (UnhelpfulSpan _) = []
+mkParensApiAnn (RealSrcSpan ss _) = [AddAnn AnnOpenP lo,AddAnn AnnCloseP lc]
+ where
+ f = srcSpanFile ss
+ sl = srcSpanStartLine ss
+ sc = srcSpanStartCol ss
+ el = srcSpanEndLine ss
+ ec = srcSpanEndCol ss
+ lo = RealSrcSpan (mkRealSrcSpan (realSrcSpanStart ss) (mkRealSrcLoc f sl (sc+1))) Nothing
+ lc = RealSrcSpan (mkRealSrcSpan (mkRealSrcLoc f el (ec - 1)) (realSrcSpanEnd ss)) Nothing
+
+queueComment :: RealLocated Token -> P()
+queueComment c = P $ \s -> POk s {
+ comment_q = commentToAnnotation c : comment_q s
+ } ()
+
+-- | Go through the @comment_q@ in @PState@ and remove all comments
+-- that belong within the given span
+allocateCommentsP :: RealSrcSpan -> P ()
+allocateCommentsP ss = P $ \s ->
+ let (comment_q', newAnns) = allocateComments ss (comment_q s) in
+ POk s {
+ comment_q = comment_q'
+ , annotations_comments = newAnns ++ (annotations_comments s)
+ } ()
+
+allocateComments
+ :: RealSrcSpan
+ -> [RealLocated AnnotationComment]
+ -> ([RealLocated AnnotationComment], [(RealSrcSpan,[RealLocated AnnotationComment])])
+allocateComments ss comment_q =
+ let
+ (before,rest) = break (\(L l _) -> isRealSubspanOf l ss) comment_q
+ (middle,after) = break (\(L l _) -> not (isRealSubspanOf l ss)) rest
+ comment_q' = before ++ after
+ newAnns = if null middle then []
+ else [(ss,middle)]
+ in
+ (comment_q', newAnns)
+
+
+commentToAnnotation :: RealLocated Token -> RealLocated AnnotationComment
+commentToAnnotation (L l (ITdocCommentNext s)) = L l (AnnDocCommentNext s)
+commentToAnnotation (L l (ITdocCommentPrev s)) = L l (AnnDocCommentPrev s)
+commentToAnnotation (L l (ITdocCommentNamed s)) = L l (AnnDocCommentNamed s)
+commentToAnnotation (L l (ITdocSection n s)) = L l (AnnDocSection n s)
+commentToAnnotation (L l (ITdocOptions s)) = L l (AnnDocOptions s)
+commentToAnnotation (L l (ITlineComment s)) = L l (AnnLineComment s)
+commentToAnnotation (L l (ITblockComment s)) = L l (AnnBlockComment s)
+commentToAnnotation _ = panic "commentToAnnotation"
+
+-- ---------------------------------------------------------------------
+
+isComment :: Token -> Bool
+isComment (ITlineComment _) = True
+isComment (ITblockComment _) = True
+isComment _ = False
+
+isDocComment :: Token -> Bool
+isDocComment (ITdocCommentNext _) = True
+isDocComment (ITdocCommentPrev _) = True
+isDocComment (ITdocCommentNamed _) = True
+isDocComment (ITdocSection _ _) = True
+isDocComment (ITdocOptions _) = True
+isDocComment _ = False
+}
diff --git a/compiler/GHC/Parser/PostProcess.hs b/compiler/GHC/Parser/PostProcess.hs
new file mode 100644
index 0000000000..7ce2f4fb9a
--- /dev/null
+++ b/compiler/GHC/Parser/PostProcess.hs
@@ -0,0 +1,3090 @@
+--
+-- (c) The University of Glasgow 2002-2006
+--
+
+-- Functions over HsSyn specialised to RdrName.
+
+{-# LANGUAGE CPP #-}
+{-# LANGUAGE FlexibleContexts #-}
+{-# LANGUAGE FlexibleInstances #-}
+{-# LANGUAGE TypeFamilies #-}
+{-# LANGUAGE MagicHash #-}
+{-# LANGUAGE ViewPatterns #-}
+{-# LANGUAGE GADTs #-}
+{-# LANGUAGE RankNTypes #-}
+{-# LANGUAGE LambdaCase #-}
+{-# LANGUAGE TypeApplications #-}
+{-# LANGUAGE GeneralizedNewtypeDeriving #-}
+
+{-# OPTIONS_GHC -Wno-incomplete-record-updates #-}
+
+module GHC.Parser.PostProcess (
+ mkHsOpApp,
+ mkHsIntegral, mkHsFractional, mkHsIsString,
+ mkHsDo, mkSpliceDecl,
+ mkRoleAnnotDecl,
+ mkClassDecl,
+ mkTyData, mkDataFamInst,
+ mkTySynonym, mkTyFamInstEqn,
+ mkStandaloneKindSig,
+ mkTyFamInst,
+ mkFamDecl, mkLHsSigType,
+ mkInlinePragma,
+ mkPatSynMatchGroup,
+ mkRecConstrOrUpdate, -- HsExp -> [HsFieldUpdate] -> P HsExp
+ mkTyClD, mkInstD,
+ mkRdrRecordCon, mkRdrRecordUpd,
+ setRdrNameSpace,
+ filterCTuple,
+
+ cvBindGroup,
+ cvBindsAndSigs,
+ cvTopDecls,
+ placeHolderPunRhs,
+
+ -- Stuff to do with Foreign declarations
+ mkImport,
+ parseCImport,
+ mkExport,
+ mkExtName, -- RdrName -> CLabelString
+ mkGadtDecl, -- [Located RdrName] -> LHsType RdrName -> ConDecl RdrName
+ mkConDeclH98,
+
+ -- Bunch of functions in the parser monad for
+ -- checking and constructing values
+ checkImportDecl,
+ checkExpBlockArguments,
+ checkPrecP, -- Int -> P Int
+ checkContext, -- HsType -> P HsContext
+ checkPattern, -- HsExp -> P HsPat
+ checkPattern_msg,
+ checkMonadComp, -- P (HsStmtContext GhcPs)
+ checkValDef, -- (SrcLoc, HsExp, HsRhs, [HsDecl]) -> P HsDecl
+ checkValSigLhs,
+ LRuleTyTmVar, RuleTyTmVar(..),
+ mkRuleBndrs, mkRuleTyVarBndrs,
+ checkRuleTyVarBndrNames,
+ checkRecordSyntax,
+ checkEmptyGADTs,
+ addFatalError, hintBangPat,
+ TyEl(..), mergeOps, mergeDataCon,
+ mkBangTy,
+
+ -- Help with processing exports
+ ImpExpSubSpec(..),
+ ImpExpQcSpec(..),
+ mkModuleImpExp,
+ mkTypeImpExp,
+ mkImpExpSubSpec,
+ checkImportSpec,
+
+ -- Token symbols
+ forallSym,
+ starSym,
+
+ -- Warnings and errors
+ warnStarIsType,
+ warnPrepositiveQualifiedModule,
+ failOpFewArgs,
+ failOpNotEnabledImportQualifiedPost,
+ failOpImportQualifiedTwice,
+
+ SumOrTuple (..),
+
+ -- Expression/command/pattern ambiguity resolution
+ PV,
+ runPV,
+ ECP(ECP, runECP_PV),
+ runECP_P,
+ DisambInfixOp(..),
+ DisambECP(..),
+ ecpFromExp,
+ ecpFromCmd,
+ PatBuilder
+ ) where
+
+import GhcPrelude
+import GHC.Hs -- Lots of it
+import GHC.Core.TyCon ( TyCon, isTupleTyCon, tyConSingleDataCon_maybe )
+import GHC.Core.DataCon ( DataCon, dataConTyCon )
+import GHC.Core.ConLike ( ConLike(..) )
+import GHC.Core.Coercion.Axiom ( Role, fsFromRole )
+import GHC.Types.Name.Reader
+import GHC.Types.Name
+import GHC.Types.Basic
+import GHC.Parser.Lexer
+import GHC.Utils.Lexeme ( isLexCon )
+import GHC.Core.Type ( TyThing(..), funTyCon )
+import GHC.Builtin.Types( cTupleTyConName, tupleTyCon, tupleDataCon,
+ nilDataConName, nilDataConKey,
+ listTyConName, listTyConKey, eqTyCon_RDR,
+ tupleTyConName, cTupleTyConNameArity_maybe )
+import GHC.Types.ForeignCall
+import GHC.Builtin.Names ( allNameStrings )
+import GHC.Types.SrcLoc
+import GHC.Types.Unique ( hasKey )
+import OrdList ( OrdList, fromOL )
+import Bag ( emptyBag, consBag )
+import Outputable
+import FastString
+import Maybes
+import Util
+import GHC.Parser.Annotation
+import Data.List
+import GHC.Driver.Session ( WarningFlag(..), DynFlags )
+import ErrUtils ( Messages )
+
+import Control.Monad
+import Text.ParserCombinators.ReadP as ReadP
+import Data.Char
+import qualified Data.Monoid as Monoid
+import Data.Data ( dataTypeOf, fromConstr, dataTypeConstrs )
+import Data.Kind ( Type )
+
+#include "HsVersions.h"
+
+
+{- **********************************************************************
+
+ Construction functions for Rdr stuff
+
+ ********************************************************************* -}
+
+-- | mkClassDecl builds a RdrClassDecl, filling in the names for tycon and
+-- datacon by deriving them from the name of the class. We fill in the names
+-- for the tycon and datacon corresponding to the class, by deriving them
+-- from the name of the class itself. This saves recording the names in the
+-- interface file (which would be equally good).
+
+-- Similarly for mkConDecl, mkClassOpSig and default-method names.
+
+-- *** See Note [The Naming story] in GHC.Hs.Decls ****
+
+mkTyClD :: LTyClDecl (GhcPass p) -> LHsDecl (GhcPass p)
+mkTyClD (L loc d) = L loc (TyClD noExtField d)
+
+mkInstD :: LInstDecl (GhcPass p) -> LHsDecl (GhcPass p)
+mkInstD (L loc d) = L loc (InstD noExtField d)
+
+mkClassDecl :: SrcSpan
+ -> Located (Maybe (LHsContext GhcPs), LHsType GhcPs)
+ -> Located (a,[LHsFunDep GhcPs])
+ -> OrdList (LHsDecl GhcPs)
+ -> P (LTyClDecl GhcPs)
+
+mkClassDecl loc (L _ (mcxt, tycl_hdr)) fds where_cls
+ = do { (binds, sigs, ats, at_defs, _, docs) <- cvBindsAndSigs where_cls
+ ; let cxt = fromMaybe (noLoc []) mcxt
+ ; (cls, tparams, fixity, ann) <- checkTyClHdr True tycl_hdr
+ ; addAnnsAt loc ann -- Add any API Annotations to the top SrcSpan
+ ; (tyvars,annst) <- checkTyVars (text "class") whereDots cls tparams
+ ; addAnnsAt loc annst -- Add any API Annotations to the top SrcSpan
+ ; return (L loc (ClassDecl { tcdCExt = noExtField, tcdCtxt = cxt
+ , tcdLName = cls, tcdTyVars = tyvars
+ , tcdFixity = fixity
+ , tcdFDs = snd (unLoc fds)
+ , tcdSigs = mkClassOpSigs sigs
+ , tcdMeths = binds
+ , tcdATs = ats, tcdATDefs = at_defs
+ , tcdDocs = docs })) }
+
+mkTyData :: SrcSpan
+ -> NewOrData
+ -> Maybe (Located CType)
+ -> Located (Maybe (LHsContext GhcPs), LHsType GhcPs)
+ -> Maybe (LHsKind GhcPs)
+ -> [LConDecl GhcPs]
+ -> HsDeriving GhcPs
+ -> P (LTyClDecl GhcPs)
+mkTyData loc new_or_data cType (L _ (mcxt, tycl_hdr))
+ ksig data_cons maybe_deriv
+ = do { (tc, tparams, fixity, ann) <- checkTyClHdr False tycl_hdr
+ ; addAnnsAt loc ann -- Add any API Annotations to the top SrcSpan
+ ; (tyvars, anns) <- checkTyVars (ppr new_or_data) equalsDots tc tparams
+ ; addAnnsAt loc anns -- Add any API Annotations to the top SrcSpan
+ ; defn <- mkDataDefn new_or_data cType mcxt ksig data_cons maybe_deriv
+ ; return (L loc (DataDecl { tcdDExt = noExtField,
+ tcdLName = tc, tcdTyVars = tyvars,
+ tcdFixity = fixity,
+ tcdDataDefn = defn })) }
+
+mkDataDefn :: NewOrData
+ -> Maybe (Located CType)
+ -> Maybe (LHsContext GhcPs)
+ -> Maybe (LHsKind GhcPs)
+ -> [LConDecl GhcPs]
+ -> HsDeriving GhcPs
+ -> P (HsDataDefn GhcPs)
+mkDataDefn new_or_data cType mcxt ksig data_cons maybe_deriv
+ = do { checkDatatypeContext mcxt
+ ; let cxt = fromMaybe (noLoc []) mcxt
+ ; return (HsDataDefn { dd_ext = noExtField
+ , dd_ND = new_or_data, dd_cType = cType
+ , dd_ctxt = cxt
+ , dd_cons = data_cons
+ , dd_kindSig = ksig
+ , dd_derivs = maybe_deriv }) }
+
+
+mkTySynonym :: SrcSpan
+ -> LHsType GhcPs -- LHS
+ -> LHsType GhcPs -- RHS
+ -> P (LTyClDecl GhcPs)
+mkTySynonym loc lhs rhs
+ = do { (tc, tparams, fixity, ann) <- checkTyClHdr False lhs
+ ; addAnnsAt loc ann -- Add any API Annotations to the top SrcSpan
+ ; (tyvars, anns) <- checkTyVars (text "type") equalsDots tc tparams
+ ; addAnnsAt loc anns -- Add any API Annotations to the top SrcSpan
+ ; return (L loc (SynDecl { tcdSExt = noExtField
+ , tcdLName = tc, tcdTyVars = tyvars
+ , tcdFixity = fixity
+ , tcdRhs = rhs })) }
+
+mkStandaloneKindSig
+ :: SrcSpan
+ -> Located [Located RdrName] -- LHS
+ -> LHsKind GhcPs -- RHS
+ -> P (LStandaloneKindSig GhcPs)
+mkStandaloneKindSig loc lhs rhs =
+ do { vs <- mapM check_lhs_name (unLoc lhs)
+ ; v <- check_singular_lhs (reverse vs)
+ ; return $ L loc $ StandaloneKindSig noExtField v (mkLHsSigType rhs) }
+ where
+ check_lhs_name v@(unLoc->name) =
+ if isUnqual name && isTcOcc (rdrNameOcc name)
+ then return v
+ else addFatalError (getLoc v) $
+ hang (text "Expected an unqualified type constructor:") 2 (ppr v)
+ check_singular_lhs vs =
+ case vs of
+ [] -> panic "mkStandaloneKindSig: empty left-hand side"
+ [v] -> return v
+ _ -> addFatalError (getLoc lhs) $
+ vcat [ hang (text "Standalone kind signatures do not support multiple names at the moment:")
+ 2 (pprWithCommas ppr vs)
+ , text "See https://gitlab.haskell.org/ghc/ghc/issues/16754 for details." ]
+
+mkTyFamInstEqn :: Maybe [LHsTyVarBndr GhcPs]
+ -> LHsType GhcPs
+ -> LHsType GhcPs
+ -> P (TyFamInstEqn GhcPs,[AddAnn])
+mkTyFamInstEqn bndrs lhs rhs
+ = do { (tc, tparams, fixity, ann) <- checkTyClHdr False lhs
+ ; return (mkHsImplicitBndrs
+ (FamEqn { feqn_ext = noExtField
+ , feqn_tycon = tc
+ , feqn_bndrs = bndrs
+ , feqn_pats = tparams
+ , feqn_fixity = fixity
+ , feqn_rhs = rhs }),
+ ann) }
+
+mkDataFamInst :: SrcSpan
+ -> NewOrData
+ -> Maybe (Located CType)
+ -> (Maybe ( LHsContext GhcPs), Maybe [LHsTyVarBndr GhcPs]
+ , LHsType GhcPs)
+ -> Maybe (LHsKind GhcPs)
+ -> [LConDecl GhcPs]
+ -> HsDeriving GhcPs
+ -> P (LInstDecl GhcPs)
+mkDataFamInst loc new_or_data cType (mcxt, bndrs, tycl_hdr)
+ ksig data_cons maybe_deriv
+ = do { (tc, tparams, fixity, ann) <- checkTyClHdr False tycl_hdr
+ ; addAnnsAt loc ann -- Add any API Annotations to the top SrcSpan
+ ; defn <- mkDataDefn new_or_data cType mcxt ksig data_cons maybe_deriv
+ ; return (L loc (DataFamInstD noExtField (DataFamInstDecl (mkHsImplicitBndrs
+ (FamEqn { feqn_ext = noExtField
+ , feqn_tycon = tc
+ , feqn_bndrs = bndrs
+ , feqn_pats = tparams
+ , feqn_fixity = fixity
+ , feqn_rhs = defn }))))) }
+
+mkTyFamInst :: SrcSpan
+ -> TyFamInstEqn GhcPs
+ -> P (LInstDecl GhcPs)
+mkTyFamInst loc eqn
+ = return (L loc (TyFamInstD noExtField (TyFamInstDecl eqn)))
+
+mkFamDecl :: SrcSpan
+ -> FamilyInfo GhcPs
+ -> LHsType GhcPs -- LHS
+ -> Located (FamilyResultSig GhcPs) -- Optional result signature
+ -> Maybe (LInjectivityAnn GhcPs) -- Injectivity annotation
+ -> P (LTyClDecl GhcPs)
+mkFamDecl loc info lhs ksig injAnn
+ = do { (tc, tparams, fixity, ann) <- checkTyClHdr False lhs
+ ; addAnnsAt loc ann -- Add any API Annotations to the top SrcSpan
+ ; (tyvars, anns) <- checkTyVars (ppr info) equals_or_where tc tparams
+ ; addAnnsAt loc anns -- Add any API Annotations to the top SrcSpan
+ ; return (L loc (FamDecl noExtField (FamilyDecl
+ { fdExt = noExtField
+ , fdInfo = info, fdLName = tc
+ , fdTyVars = tyvars
+ , fdFixity = fixity
+ , fdResultSig = ksig
+ , fdInjectivityAnn = injAnn }))) }
+ where
+ equals_or_where = case info of
+ DataFamily -> empty
+ OpenTypeFamily -> empty
+ ClosedTypeFamily {} -> whereDots
+
+mkSpliceDecl :: LHsExpr GhcPs -> HsDecl GhcPs
+-- If the user wrote
+-- [pads| ... ] then return a QuasiQuoteD
+-- $(e) then return a SpliceD
+-- but if she wrote, say,
+-- f x then behave as if she'd written $(f x)
+-- ie a SpliceD
+--
+-- Typed splices are not allowed at the top level, thus we do not represent them
+-- as spliced declaration. See #10945
+mkSpliceDecl lexpr@(L loc expr)
+ | HsSpliceE _ splice@(HsUntypedSplice {}) <- expr
+ = SpliceD noExtField (SpliceDecl noExtField (L loc splice) ExplicitSplice)
+
+ | HsSpliceE _ splice@(HsQuasiQuote {}) <- expr
+ = SpliceD noExtField (SpliceDecl noExtField (L loc splice) ExplicitSplice)
+
+ | otherwise
+ = SpliceD noExtField (SpliceDecl noExtField (L loc (mkUntypedSplice BareSplice lexpr))
+ ImplicitSplice)
+
+mkRoleAnnotDecl :: SrcSpan
+ -> Located RdrName -- type being annotated
+ -> [Located (Maybe FastString)] -- roles
+ -> P (LRoleAnnotDecl GhcPs)
+mkRoleAnnotDecl loc tycon roles
+ = do { roles' <- mapM parse_role roles
+ ; return $ L loc $ RoleAnnotDecl noExtField tycon roles' }
+ where
+ role_data_type = dataTypeOf (undefined :: Role)
+ all_roles = map fromConstr $ dataTypeConstrs role_data_type
+ possible_roles = [(fsFromRole role, role) | role <- all_roles]
+
+ parse_role (L loc_role Nothing) = return $ L loc_role Nothing
+ parse_role (L loc_role (Just role))
+ = case lookup role possible_roles of
+ Just found_role -> return $ L loc_role $ Just found_role
+ Nothing ->
+ let nearby = fuzzyLookup (unpackFS role)
+ (mapFst unpackFS possible_roles)
+ in
+ addFatalError loc_role
+ (text "Illegal role name" <+> quotes (ppr role) $$
+ suggestions nearby)
+
+ suggestions [] = empty
+ suggestions [r] = text "Perhaps you meant" <+> quotes (ppr r)
+ -- will this last case ever happen??
+ suggestions list = hang (text "Perhaps you meant one of these:")
+ 2 (pprWithCommas (quotes . ppr) list)
+
+{- **********************************************************************
+
+ #cvBinds-etc# Converting to @HsBinds@, etc.
+
+ ********************************************************************* -}
+
+-- | Function definitions are restructured here. Each is assumed to be recursive
+-- initially, and non recursive definitions are discovered by the dependency
+-- analyser.
+
+
+-- | Groups together bindings for a single function
+cvTopDecls :: OrdList (LHsDecl GhcPs) -> [LHsDecl GhcPs]
+cvTopDecls decls = go (fromOL decls)
+ where
+ go :: [LHsDecl GhcPs] -> [LHsDecl GhcPs]
+ go [] = []
+ go ((L l (ValD x b)) : ds)
+ = L l' (ValD x b') : go ds'
+ where (L l' b', ds') = getMonoBind (L l b) ds
+ go (d : ds) = d : go ds
+
+-- Declaration list may only contain value bindings and signatures.
+cvBindGroup :: OrdList (LHsDecl GhcPs) -> P (HsValBinds GhcPs)
+cvBindGroup binding
+ = do { (mbs, sigs, fam_ds, tfam_insts
+ , dfam_insts, _) <- cvBindsAndSigs binding
+ ; ASSERT( null fam_ds && null tfam_insts && null dfam_insts)
+ return $ ValBinds noExtField mbs sigs }
+
+cvBindsAndSigs :: OrdList (LHsDecl GhcPs)
+ -> P (LHsBinds GhcPs, [LSig GhcPs], [LFamilyDecl GhcPs]
+ , [LTyFamInstDecl GhcPs], [LDataFamInstDecl GhcPs], [LDocDecl])
+-- Input decls contain just value bindings and signatures
+-- and in case of class or instance declarations also
+-- associated type declarations. They might also contain Haddock comments.
+cvBindsAndSigs fb = go (fromOL fb)
+ where
+ go [] = return (emptyBag, [], [], [], [], [])
+ go ((L l (ValD _ b)) : ds)
+ = do { (bs, ss, ts, tfis, dfis, docs) <- go ds'
+ ; return (b' `consBag` bs, ss, ts, tfis, dfis, docs) }
+ where
+ (b', ds') = getMonoBind (L l b) ds
+ go ((L l decl) : ds)
+ = do { (bs, ss, ts, tfis, dfis, docs) <- go ds
+ ; case decl of
+ SigD _ s
+ -> return (bs, L l s : ss, ts, tfis, dfis, docs)
+ TyClD _ (FamDecl _ t)
+ -> return (bs, ss, L l t : ts, tfis, dfis, docs)
+ InstD _ (TyFamInstD { tfid_inst = tfi })
+ -> return (bs, ss, ts, L l tfi : tfis, dfis, docs)
+ InstD _ (DataFamInstD { dfid_inst = dfi })
+ -> return (bs, ss, ts, tfis, L l dfi : dfis, docs)
+ DocD _ d
+ -> return (bs, ss, ts, tfis, dfis, L l d : docs)
+ SpliceD _ d
+ -> addFatalError l $
+ hang (text "Declaration splices are allowed only" <+>
+ text "at the top level:")
+ 2 (ppr d)
+ _ -> pprPanic "cvBindsAndSigs" (ppr decl) }
+
+-----------------------------------------------------------------------------
+-- Group function bindings into equation groups
+
+getMonoBind :: LHsBind GhcPs -> [LHsDecl GhcPs]
+ -> (LHsBind GhcPs, [LHsDecl GhcPs])
+-- Suppose (b',ds') = getMonoBind b ds
+-- ds is a list of parsed bindings
+-- b is a MonoBinds that has just been read off the front
+
+-- Then b' is the result of grouping more equations from ds that
+-- belong with b into a single MonoBinds, and ds' is the depleted
+-- list of parsed bindings.
+--
+-- All Haddock comments between equations inside the group are
+-- discarded.
+--
+-- No AndMonoBinds or EmptyMonoBinds here; just single equations
+
+getMonoBind (L loc1 (FunBind { fun_id = fun_id1@(L _ f1)
+ , fun_matches =
+ MG { mg_alts = (L _ mtchs1) } }))
+ binds
+ | has_args mtchs1
+ = go mtchs1 loc1 binds []
+ where
+ go mtchs loc
+ ((L loc2 (ValD _ (FunBind { fun_id = (L _ f2)
+ , fun_matches =
+ MG { mg_alts = (L _ mtchs2) } })))
+ : binds) _
+ | f1 == f2 = go (mtchs2 ++ mtchs)
+ (combineSrcSpans loc loc2) binds []
+ go mtchs loc (doc_decl@(L loc2 (DocD {})) : binds) doc_decls
+ = let doc_decls' = doc_decl : doc_decls
+ in go mtchs (combineSrcSpans loc loc2) binds doc_decls'
+ go mtchs loc binds doc_decls
+ = ( L loc (makeFunBind fun_id1 (reverse mtchs))
+ , (reverse doc_decls) ++ binds)
+ -- Reverse the final matches, to get it back in the right order
+ -- Do the same thing with the trailing doc comments
+
+getMonoBind bind binds = (bind, binds)
+
+has_args :: [LMatch GhcPs (LHsExpr GhcPs)] -> Bool
+has_args [] = panic "GHC.Parser.PostProcess.has_args"
+has_args (L _ (Match { m_pats = args }) : _) = not (null args)
+ -- Don't group together FunBinds if they have
+ -- no arguments. This is necessary now that variable bindings
+ -- with no arguments are now treated as FunBinds rather
+ -- than pattern bindings (tests/rename/should_fail/rnfail002).
+
+{- **********************************************************************
+
+ #PrefixToHS-utils# Utilities for conversion
+
+ ********************************************************************* -}
+
+{- Note [Parsing data constructors is hard]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The problem with parsing data constructors is that they look a lot like types.
+Compare:
+
+ (s1) data T = C t1 t2
+ (s2) type T = C t1 t2
+
+Syntactically, there's little difference between these declarations, except in
+(s1) 'C' is a data constructor, but in (s2) 'C' is a type constructor.
+
+This similarity would pose no problem if we knew ahead of time if we are
+parsing a type or a constructor declaration. Looking at (s1) and (s2), a simple
+(but wrong!) rule comes to mind: in 'data' declarations assume we are parsing
+data constructors, and in other contexts (e.g. 'type' declarations) assume we
+are parsing type constructors.
+
+This simple rule does not work because of two problematic cases:
+
+ (p1) data T = C t1 t2 :+ t3
+ (p2) data T = C t1 t2 => t3
+
+In (p1) we encounter (:+) and it turns out we are parsing an infix data
+declaration, so (C t1 t2) is a type and 'C' is a type constructor.
+In (p2) we encounter (=>) and it turns out we are parsing an existential
+context, so (C t1 t2) is a constraint and 'C' is a type constructor.
+
+As the result, in order to determine whether (C t1 t2) declares a data
+constructor, a type, or a context, we would need unlimited lookahead which
+'happy' is not so happy with.
+
+To further complicate matters, the interpretation of (!) and (~) is different
+in constructors and types:
+
+ (b1) type T = C ! D
+ (b2) data T = C ! D
+ (b3) data T = C ! D => E
+
+In (b1) and (b3), (!) is a type operator with two arguments: 'C' and 'D'. At
+the same time, in (b2) it is a strictness annotation: 'C' is a data constructor
+with a single strict argument 'D'. For the programmer, these cases are usually
+easy to tell apart due to whitespace conventions:
+
+ (b2) data T = C !D -- no space after the bang hints that
+ -- it is a strictness annotation
+
+For the parser, on the other hand, this whitespace does not matter. We cannot
+tell apart (b2) from (b3) until we encounter (=>), so it requires unlimited
+lookahead.
+
+The solution that accounts for all of these issues is to initially parse data
+declarations and types as a reversed list of TyEl:
+
+ data TyEl = TyElOpr RdrName
+ | TyElOpd (HsType GhcPs)
+ | ...
+
+For example, both occurrences of (C ! D) in the following example are parsed
+into equal lists of TyEl:
+
+ data T = C ! D => C ! D results in [ TyElOpd (HsTyVar "D")
+ , TyElOpr "!"
+ , TyElOpd (HsTyVar "C") ]
+
+Note that elements are in reverse order. Also, 'C' is parsed as a type
+constructor (HsTyVar) even when it is a data constructor. We fix this in
+`tyConToDataCon`.
+
+By the time the list of TyEl is assembled, we have looked ahead enough to
+decide whether to reduce using `mergeOps` (for types) or `mergeDataCon` (for
+data constructors). These functions are where the actual job of parsing is
+done.
+
+-}
+
+-- | Reinterpret a type constructor, including type operators, as a data
+-- constructor.
+-- See Note [Parsing data constructors is hard]
+tyConToDataCon :: SrcSpan -> RdrName -> Either (SrcSpan, SDoc) (Located RdrName)
+tyConToDataCon loc tc
+ | isTcOcc occ || isDataOcc occ
+ , isLexCon (occNameFS occ)
+ = return (L loc (setRdrNameSpace tc srcDataName))
+
+ | otherwise
+ = Left (loc, msg)
+ where
+ occ = rdrNameOcc tc
+ msg = text "Not a data constructor:" <+> quotes (ppr tc)
+
+mkPatSynMatchGroup :: Located RdrName
+ -> Located (OrdList (LHsDecl GhcPs))
+ -> P (MatchGroup GhcPs (LHsExpr GhcPs))
+mkPatSynMatchGroup (L loc patsyn_name) (L _ decls) =
+ do { matches <- mapM fromDecl (fromOL decls)
+ ; when (null matches) (wrongNumberErr loc)
+ ; return $ mkMatchGroup FromSource matches }
+ where
+ fromDecl (L loc decl@(ValD _ (PatBind _
+ pat@(L _ (ConPatIn ln@(L _ name) details))
+ rhs _))) =
+ do { unless (name == patsyn_name) $
+ wrongNameBindingErr loc decl
+ ; match <- case details of
+ PrefixCon pats -> return $ Match { m_ext = noExtField
+ , m_ctxt = ctxt, m_pats = pats
+ , m_grhss = rhs }
+ where
+ ctxt = FunRhs { mc_fun = ln
+ , mc_fixity = Prefix
+ , mc_strictness = NoSrcStrict }
+
+ InfixCon p1 p2 -> return $ Match { m_ext = noExtField
+ , m_ctxt = ctxt
+ , m_pats = [p1, p2]
+ , m_grhss = rhs }
+ where
+ ctxt = FunRhs { mc_fun = ln
+ , mc_fixity = Infix
+ , mc_strictness = NoSrcStrict }
+
+ RecCon{} -> recordPatSynErr loc pat
+ ; return $ L loc match }
+ fromDecl (L loc decl) = extraDeclErr loc decl
+
+ extraDeclErr loc decl =
+ addFatalError loc $
+ text "pattern synonym 'where' clause must contain a single binding:" $$
+ ppr decl
+
+ wrongNameBindingErr loc decl =
+ addFatalError loc $
+ text "pattern synonym 'where' clause must bind the pattern synonym's name"
+ <+> quotes (ppr patsyn_name) $$ ppr decl
+
+ wrongNumberErr loc =
+ addFatalError loc $
+ text "pattern synonym 'where' clause cannot be empty" $$
+ text "In the pattern synonym declaration for: " <+> ppr (patsyn_name)
+
+recordPatSynErr :: SrcSpan -> LPat GhcPs -> P a
+recordPatSynErr loc pat =
+ addFatalError loc $
+ text "record syntax not supported for pattern synonym declarations:" $$
+ ppr pat
+
+mkConDeclH98 :: Located RdrName -> Maybe [LHsTyVarBndr GhcPs]
+ -> Maybe (LHsContext GhcPs) -> HsConDeclDetails GhcPs
+ -> ConDecl GhcPs
+
+mkConDeclH98 name mb_forall mb_cxt args
+ = ConDeclH98 { con_ext = noExtField
+ , con_name = name
+ , con_forall = noLoc $ isJust mb_forall
+ , con_ex_tvs = mb_forall `orElse` []
+ , con_mb_cxt = mb_cxt
+ , con_args = args
+ , con_doc = Nothing }
+
+mkGadtDecl :: [Located RdrName]
+ -> LHsType GhcPs -- Always a HsForAllTy
+ -> (ConDecl GhcPs, [AddAnn])
+mkGadtDecl names ty
+ = (ConDeclGADT { con_g_ext = noExtField
+ , con_names = names
+ , con_forall = L l $ isLHsForAllTy ty'
+ , con_qvars = mkHsQTvs tvs
+ , con_mb_cxt = mcxt
+ , con_args = args
+ , con_res_ty = res_ty
+ , con_doc = Nothing }
+ , anns1 ++ anns2)
+ where
+ (ty'@(L l _),anns1) = peel_parens ty []
+ (tvs, rho) = splitLHsForAllTyInvis ty'
+ (mcxt, tau, anns2) = split_rho rho []
+
+ split_rho (L _ (HsQualTy { hst_ctxt = cxt, hst_body = tau })) ann
+ = (Just cxt, tau, ann)
+ split_rho (L l (HsParTy _ ty)) ann
+ = split_rho ty (ann++mkParensApiAnn l)
+ split_rho tau ann
+ = (Nothing, tau, ann)
+
+ (args, res_ty) = split_tau tau
+
+ -- See Note [GADT abstract syntax] in GHC.Hs.Decls
+ split_tau (L _ (HsFunTy _ (L loc (HsRecTy _ rf)) res_ty))
+ = (RecCon (L loc rf), res_ty)
+ split_tau tau
+ = (PrefixCon [], tau)
+
+ peel_parens (L l (HsParTy _ ty)) ann = peel_parens ty
+ (ann++mkParensApiAnn l)
+ peel_parens ty ann = (ty, ann)
+
+
+setRdrNameSpace :: RdrName -> NameSpace -> RdrName
+-- ^ This rather gruesome function is used mainly by the parser.
+-- When parsing:
+--
+-- > data T a = T | T1 Int
+--
+-- we parse the data constructors as /types/ because of parser ambiguities,
+-- so then we need to change the /type constr/ to a /data constr/
+--
+-- The exact-name case /can/ occur when parsing:
+--
+-- > data [] a = [] | a : [a]
+--
+-- For the exact-name case we return an original name.
+setRdrNameSpace (Unqual occ) ns = Unqual (setOccNameSpace ns occ)
+setRdrNameSpace (Qual m occ) ns = Qual m (setOccNameSpace ns occ)
+setRdrNameSpace (Orig m occ) ns = Orig m (setOccNameSpace ns occ)
+setRdrNameSpace (Exact n) ns
+ | Just thing <- wiredInNameTyThing_maybe n
+ = setWiredInNameSpace thing ns
+ -- Preserve Exact Names for wired-in things,
+ -- notably tuples and lists
+
+ | isExternalName n
+ = Orig (nameModule n) occ
+
+ | otherwise -- This can happen when quoting and then
+ -- splicing a fixity declaration for a type
+ = Exact (mkSystemNameAt (nameUnique n) occ (nameSrcSpan n))
+ where
+ occ = setOccNameSpace ns (nameOccName n)
+
+setWiredInNameSpace :: TyThing -> NameSpace -> RdrName
+setWiredInNameSpace (ATyCon tc) ns
+ | isDataConNameSpace ns
+ = ty_con_data_con tc
+ | isTcClsNameSpace ns
+ = Exact (getName tc) -- No-op
+
+setWiredInNameSpace (AConLike (RealDataCon dc)) ns
+ | isTcClsNameSpace ns
+ = data_con_ty_con dc
+ | isDataConNameSpace ns
+ = Exact (getName dc) -- No-op
+
+setWiredInNameSpace thing ns
+ = pprPanic "setWiredinNameSpace" (pprNameSpace ns <+> ppr thing)
+
+ty_con_data_con :: TyCon -> RdrName
+ty_con_data_con tc
+ | isTupleTyCon tc
+ , Just dc <- tyConSingleDataCon_maybe tc
+ = Exact (getName dc)
+
+ | tc `hasKey` listTyConKey
+ = Exact nilDataConName
+
+ | otherwise -- See Note [setRdrNameSpace for wired-in names]
+ = Unqual (setOccNameSpace srcDataName (getOccName tc))
+
+data_con_ty_con :: DataCon -> RdrName
+data_con_ty_con dc
+ | let tc = dataConTyCon dc
+ , isTupleTyCon tc
+ = Exact (getName tc)
+
+ | dc `hasKey` nilDataConKey
+ = Exact listTyConName
+
+ | otherwise -- See Note [setRdrNameSpace for wired-in names]
+ = Unqual (setOccNameSpace tcClsName (getOccName dc))
+
+-- | Replaces constraint tuple names with corresponding boxed ones.
+filterCTuple :: RdrName -> RdrName
+filterCTuple (Exact n)
+ | Just arity <- cTupleTyConNameArity_maybe n
+ = Exact $ tupleTyConName BoxedTuple arity
+filterCTuple rdr = rdr
+
+
+{- Note [setRdrNameSpace for wired-in names]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+In GHC.Types, which declares (:), we have
+ infixr 5 :
+The ambiguity about which ":" is meant is resolved by parsing it as a
+data constructor, but then using dataTcOccs to try the type constructor too;
+and that in turn calls setRdrNameSpace to change the name-space of ":" to
+tcClsName. There isn't a corresponding ":" type constructor, but it's painful
+to make setRdrNameSpace partial, so we just make an Unqual name instead. It
+really doesn't matter!
+-}
+
+eitherToP :: Either (SrcSpan, SDoc) a -> P a
+-- Adapts the Either monad to the P monad
+eitherToP (Left (loc, doc)) = addFatalError loc doc
+eitherToP (Right thing) = return thing
+
+checkTyVars :: SDoc -> SDoc -> Located RdrName -> [LHsTypeArg GhcPs]
+ -> P ( LHsQTyVars GhcPs -- the synthesized type variables
+ , [AddAnn] ) -- action which adds annotations
+-- ^ Check whether the given list of type parameters are all type variables
+-- (possibly with a kind signature).
+checkTyVars pp_what equals_or_where tc tparms
+ = do { (tvs, anns) <- fmap unzip $ mapM check tparms
+ ; return (mkHsQTvs tvs, concat anns) }
+ where
+ check (HsTypeArg _ ki@(L loc _))
+ = addFatalError loc $
+ vcat [ text "Unexpected type application" <+>
+ text "@" <> ppr ki
+ , text "In the" <+> pp_what <+>
+ ptext (sLit "declaration for") <+> quotes (ppr tc)]
+ check (HsValArg ty) = chkParens [] ty
+ check (HsArgPar sp) = addFatalError sp $
+ vcat [text "Malformed" <+> pp_what
+ <+> text "declaration for" <+> quotes (ppr tc)]
+ -- Keep around an action for adjusting the annotations of extra parens
+ chkParens :: [AddAnn] -> LHsType GhcPs
+ -> P (LHsTyVarBndr GhcPs, [AddAnn])
+ chkParens acc (L l (HsParTy _ ty)) = chkParens (mkParensApiAnn l ++ acc) ty
+ chkParens acc ty = do
+ tv <- chk ty
+ return (tv, reverse acc)
+
+ -- Check that the name space is correct!
+ chk :: LHsType GhcPs -> P (LHsTyVarBndr GhcPs)
+ chk (L l (HsKindSig _ (L lv (HsTyVar _ _ (L _ tv))) k))
+ | isRdrTyVar tv = return (L l (KindedTyVar noExtField (L lv tv) k))
+ chk (L l (HsTyVar _ _ (L ltv tv)))
+ | isRdrTyVar tv = return (L l (UserTyVar noExtField (L ltv tv)))
+ chk t@(L loc _)
+ = addFatalError loc $
+ vcat [ text "Unexpected type" <+> quotes (ppr t)
+ , text "In the" <+> pp_what
+ <+> ptext (sLit "declaration for") <+> quotes tc'
+ , vcat[ (text "A" <+> pp_what
+ <+> ptext (sLit "declaration should have form"))
+ , nest 2
+ (pp_what
+ <+> tc'
+ <+> hsep (map text (takeList tparms allNameStrings))
+ <+> equals_or_where) ] ]
+
+ -- Avoid printing a constraint tuple in the error message. Print
+ -- a plain old tuple instead (since that's what the user probably
+ -- wrote). See #14907
+ tc' = ppr $ fmap filterCTuple tc
+
+
+
+whereDots, equalsDots :: SDoc
+-- Second argument to checkTyVars
+whereDots = text "where ..."
+equalsDots = text "= ..."
+
+checkDatatypeContext :: Maybe (LHsContext GhcPs) -> P ()
+checkDatatypeContext Nothing = return ()
+checkDatatypeContext (Just c)
+ = do allowed <- getBit DatatypeContextsBit
+ unless allowed $
+ addError (getLoc c)
+ (text "Illegal datatype context (use DatatypeContexts):"
+ <+> pprLHsContext c)
+
+type LRuleTyTmVar = Located RuleTyTmVar
+data RuleTyTmVar = RuleTyTmVar (Located RdrName) (Maybe (LHsType GhcPs))
+-- ^ Essentially a wrapper for a @RuleBndr GhcPs@
+
+-- turns RuleTyTmVars into RuleBnrs - this is straightforward
+mkRuleBndrs :: [LRuleTyTmVar] -> [LRuleBndr GhcPs]
+mkRuleBndrs = fmap (fmap cvt_one)
+ where cvt_one (RuleTyTmVar v Nothing) = RuleBndr noExtField v
+ cvt_one (RuleTyTmVar v (Just sig)) =
+ RuleBndrSig noExtField v (mkLHsSigWcType sig)
+
+-- turns RuleTyTmVars into HsTyVarBndrs - this is more interesting
+mkRuleTyVarBndrs :: [LRuleTyTmVar] -> [LHsTyVarBndr GhcPs]
+mkRuleTyVarBndrs = fmap (fmap cvt_one)
+ where cvt_one (RuleTyTmVar v Nothing) = UserTyVar noExtField (fmap tm_to_ty v)
+ cvt_one (RuleTyTmVar v (Just sig))
+ = KindedTyVar noExtField (fmap tm_to_ty v) sig
+ -- takes something in namespace 'varName' to something in namespace 'tvName'
+ tm_to_ty (Unqual occ) = Unqual (setOccNameSpace tvName occ)
+ tm_to_ty _ = panic "mkRuleTyVarBndrs"
+
+-- See note [Parsing explicit foralls in Rules] in GHC.Parser
+checkRuleTyVarBndrNames :: [LHsTyVarBndr GhcPs] -> P ()
+checkRuleTyVarBndrNames = mapM_ (check . fmap hsTyVarName)
+ where check (L loc (Unqual occ)) = do
+ when ((occNameString occ ==) `any` ["forall","family","role"])
+ (addFatalError loc (text $ "parse error on input "
+ ++ occNameString occ))
+ check _ = panic "checkRuleTyVarBndrNames"
+
+checkRecordSyntax :: (MonadP m, Outputable a) => Located a -> m (Located a)
+checkRecordSyntax lr@(L loc r)
+ = do allowed <- getBit TraditionalRecordSyntaxBit
+ unless allowed $ addError loc $
+ text "Illegal record syntax (use TraditionalRecordSyntax):" <+> ppr r
+ return lr
+
+-- | Check if the gadt_constrlist is empty. Only raise parse error for
+-- `data T where` to avoid affecting existing error message, see #8258.
+checkEmptyGADTs :: Located ([AddAnn], [LConDecl GhcPs])
+ -> P (Located ([AddAnn], [LConDecl GhcPs]))
+checkEmptyGADTs gadts@(L span (_, [])) -- Empty GADT declaration.
+ = do gadtSyntax <- getBit GadtSyntaxBit -- GADTs implies GADTSyntax
+ unless gadtSyntax $ addError span $ vcat
+ [ text "Illegal keyword 'where' in data declaration"
+ , text "Perhaps you intended to use GADTs or a similar language"
+ , text "extension to enable syntax: data T where"
+ ]
+ return gadts
+checkEmptyGADTs gadts = return gadts -- Ordinary GADT declaration.
+
+checkTyClHdr :: Bool -- True <=> class header
+ -- False <=> type header
+ -> LHsType GhcPs
+ -> P (Located RdrName, -- the head symbol (type or class name)
+ [LHsTypeArg GhcPs], -- parameters of head symbol
+ LexicalFixity, -- the declaration is in infix format
+ [AddAnn]) -- API Annotation for HsParTy when stripping parens
+-- Well-formedness check and decomposition of type and class heads.
+-- Decomposes T ty1 .. tyn into (T, [ty1, ..., tyn])
+-- Int :*: Bool into (:*:, [Int, Bool])
+-- returning the pieces
+checkTyClHdr is_cls ty
+ = goL ty [] [] Prefix
+ where
+ goL (L l ty) acc ann fix = go l ty acc ann fix
+
+ -- workaround to define '*' despite StarIsType
+ go lp (HsParTy _ (L l (HsStarTy _ isUni))) acc ann fix
+ = do { warnStarBndr l
+ ; let name = mkOccName tcClsName (starSym isUni)
+ ; return (L l (Unqual name), acc, fix, (ann ++ mkParensApiAnn lp)) }
+
+ go _ (HsTyVar _ _ ltc@(L _ tc)) acc ann fix
+ | isRdrTc tc = return (ltc, acc, fix, ann)
+ go _ (HsOpTy _ t1 ltc@(L _ tc) t2) acc ann _fix
+ | isRdrTc tc = return (ltc, HsValArg t1:HsValArg t2:acc, Infix, ann)
+ go l (HsParTy _ ty) acc ann fix = goL ty acc (ann ++mkParensApiAnn l) fix
+ go _ (HsAppTy _ t1 t2) acc ann fix = goL t1 (HsValArg t2:acc) ann fix
+ go _ (HsAppKindTy l ty ki) acc ann fix = goL ty (HsTypeArg l ki:acc) ann fix
+ go l (HsTupleTy _ HsBoxedOrConstraintTuple ts) [] ann fix
+ = return (L l (nameRdrName tup_name), map HsValArg ts, fix, ann)
+ where
+ arity = length ts
+ tup_name | is_cls = cTupleTyConName arity
+ | otherwise = getName (tupleTyCon Boxed arity)
+ -- See Note [Unit tuples] in GHC.Hs.Types (TODO: is this still relevant?)
+ go l _ _ _ _
+ = addFatalError l (text "Malformed head of type or class declaration:"
+ <+> ppr ty)
+
+-- | Yield a parse error if we have a function applied directly to a do block
+-- etc. and BlockArguments is not enabled.
+checkExpBlockArguments :: LHsExpr GhcPs -> PV ()
+checkCmdBlockArguments :: LHsCmd GhcPs -> PV ()
+(checkExpBlockArguments, checkCmdBlockArguments) = (checkExpr, checkCmd)
+ where
+ checkExpr :: LHsExpr GhcPs -> PV ()
+ checkExpr expr = case unLoc expr of
+ HsDo _ DoExpr _ -> check "do block" expr
+ HsDo _ MDoExpr _ -> check "mdo block" expr
+ HsLam {} -> check "lambda expression" expr
+ HsCase {} -> check "case expression" expr
+ HsLamCase {} -> check "lambda-case expression" expr
+ HsLet {} -> check "let expression" expr
+ HsIf {} -> check "if expression" expr
+ HsProc {} -> check "proc expression" expr
+ _ -> return ()
+
+ checkCmd :: LHsCmd GhcPs -> PV ()
+ checkCmd cmd = case unLoc cmd of
+ HsCmdLam {} -> check "lambda command" cmd
+ HsCmdCase {} -> check "case command" cmd
+ HsCmdIf {} -> check "if command" cmd
+ HsCmdLet {} -> check "let command" cmd
+ HsCmdDo {} -> check "do command" cmd
+ _ -> return ()
+
+ check :: Outputable a => String -> Located a -> PV ()
+ check element a = do
+ blockArguments <- getBit BlockArgumentsBit
+ unless blockArguments $
+ addError (getLoc a) $
+ text "Unexpected " <> text element <> text " in function application:"
+ $$ nest 4 (ppr a)
+ $$ text "You could write it with parentheses"
+ $$ text "Or perhaps you meant to enable BlockArguments?"
+
+-- | Validate the context constraints and break up a context into a list
+-- of predicates.
+--
+-- @
+-- (Eq a, Ord b) --> [Eq a, Ord b]
+-- Eq a --> [Eq a]
+-- (Eq a) --> [Eq a]
+-- (((Eq a))) --> [Eq a]
+-- @
+checkContext :: LHsType GhcPs -> P ([AddAnn],LHsContext GhcPs)
+checkContext (L l orig_t)
+ = check [] (L l orig_t)
+ where
+ check anns (L lp (HsTupleTy _ HsBoxedOrConstraintTuple ts))
+ -- (Eq a, Ord b) shows up as a tuple type. Only boxed tuples can
+ -- be used as context constraints.
+ = return (anns ++ mkParensApiAnn lp,L l ts) -- Ditto ()
+
+ check anns (L lp1 (HsParTy _ ty))
+ -- to be sure HsParTy doesn't get into the way
+ = check anns' ty
+ where anns' = if l == lp1 then anns
+ else (anns ++ mkParensApiAnn lp1)
+
+ -- no need for anns, returning original
+ check _anns t = checkNoDocs msg t *> return ([],L l [L l orig_t])
+
+ msg = text "data constructor context"
+
+-- | Check recursively if there are any 'HsDocTy's in the given type.
+-- This only works on a subset of types produced by 'btype_no_ops'
+checkNoDocs :: SDoc -> LHsType GhcPs -> P ()
+checkNoDocs msg ty = go ty
+ where
+ go (L _ (HsAppKindTy _ ty ki)) = go ty *> go ki
+ go (L _ (HsAppTy _ t1 t2)) = go t1 *> go t2
+ go (L l (HsDocTy _ t ds)) = addError l $ hsep
+ [ text "Unexpected haddock", quotes (ppr ds)
+ , text "on", msg, quotes (ppr t) ]
+ go _ = pure ()
+
+checkImportDecl :: Maybe (Located Token)
+ -> Maybe (Located Token)
+ -> P ()
+checkImportDecl mPre mPost = do
+ let whenJust mg f = maybe (pure ()) f mg
+
+ importQualifiedPostEnabled <- getBit ImportQualifiedPostBit
+
+ -- Error if 'qualified' found in postpositive position and
+ -- 'ImportQualifiedPost' is not in effect.
+ whenJust mPost $ \post ->
+ when (not importQualifiedPostEnabled) $
+ failOpNotEnabledImportQualifiedPost (getLoc post)
+
+ -- Error if 'qualified' occurs in both pre and postpositive
+ -- positions.
+ whenJust mPost $ \post ->
+ when (isJust mPre) $
+ failOpImportQualifiedTwice (getLoc post)
+
+ -- Warn if 'qualified' found in prepositive position and
+ -- 'Opt_WarnPrepositiveQualifiedModule' is enabled.
+ whenJust mPre $ \pre ->
+ warnPrepositiveQualifiedModule (getLoc pre)
+
+-- -------------------------------------------------------------------------
+-- Checking Patterns.
+
+-- We parse patterns as expressions and check for valid patterns below,
+-- converting the expression into a pattern at the same time.
+
+checkPattern :: Located (PatBuilder GhcPs) -> P (LPat GhcPs)
+checkPattern = runPV . checkLPat
+
+checkPattern_msg :: SDoc -> PV (Located (PatBuilder GhcPs)) -> P (LPat GhcPs)
+checkPattern_msg msg pp = runPV_msg msg (pp >>= checkLPat)
+
+checkLPat :: Located (PatBuilder GhcPs) -> PV (LPat GhcPs)
+checkLPat e@(L l _) = checkPat l e []
+
+checkPat :: SrcSpan -> Located (PatBuilder GhcPs) -> [LPat GhcPs]
+ -> PV (LPat GhcPs)
+checkPat loc (L l e@(PatBuilderVar (L _ c))) args
+ | isRdrDataCon c = return (L loc (ConPatIn (L l c) (PrefixCon args)))
+ | not (null args) && patIsRec c =
+ localPV_msg (\_ -> text "Perhaps you intended to use RecursiveDo") $
+ patFail l (ppr e)
+checkPat loc (L _ (PatBuilderApp f e)) args
+ = do p <- checkLPat e
+ checkPat loc f (p : args)
+checkPat loc (L _ e) []
+ = do p <- checkAPat loc e
+ return (L loc p)
+checkPat loc e _
+ = patFail loc (ppr e)
+
+checkAPat :: SrcSpan -> PatBuilder GhcPs -> PV (Pat GhcPs)
+checkAPat loc e0 = do
+ nPlusKPatterns <- getBit NPlusKPatternsBit
+ case e0 of
+ PatBuilderPat p -> return p
+ PatBuilderVar x -> return (VarPat noExtField x)
+
+ -- Overloaded numeric patterns (e.g. f 0 x = x)
+ -- Negation is recorded separately, so that the literal is zero or +ve
+ -- NB. Negative *primitive* literals are already handled by the lexer
+ PatBuilderOverLit pos_lit -> return (mkNPat (L loc pos_lit) Nothing)
+
+ -- n+k patterns
+ PatBuilderOpApp
+ (L nloc (PatBuilderVar (L _ n)))
+ (L _ plus)
+ (L lloc (PatBuilderOverLit lit@(OverLit {ol_val = HsIntegral {}})))
+ | nPlusKPatterns && (plus == plus_RDR)
+ -> return (mkNPlusKPat (L nloc n) (L lloc lit))
+
+ PatBuilderOpApp l (L cl c) r
+ | isRdrDataCon c -> do
+ l <- checkLPat l
+ r <- checkLPat r
+ return (ConPatIn (L cl c) (InfixCon l r))
+
+ PatBuilderPar e -> checkLPat e >>= (return . (ParPat noExtField))
+ _ -> patFail loc (ppr e0)
+
+placeHolderPunRhs :: DisambECP b => PV (Located b)
+-- The RHS of a punned record field will be filled in by the renamer
+-- It's better not to make it an error, in case we want to print it when
+-- debugging
+placeHolderPunRhs = mkHsVarPV (noLoc pun_RDR)
+
+plus_RDR, pun_RDR :: RdrName
+plus_RDR = mkUnqual varName (fsLit "+") -- Hack
+pun_RDR = mkUnqual varName (fsLit "pun-right-hand-side")
+
+checkPatField :: LHsRecField GhcPs (Located (PatBuilder GhcPs))
+ -> PV (LHsRecField GhcPs (LPat GhcPs))
+checkPatField (L l fld) = do p <- checkLPat (hsRecFieldArg fld)
+ return (L l (fld { hsRecFieldArg = p }))
+
+patFail :: SrcSpan -> SDoc -> PV a
+patFail loc e = addFatalError loc $ text "Parse error in pattern:" <+> ppr e
+
+patIsRec :: RdrName -> Bool
+patIsRec e = e == mkUnqual varName (fsLit "rec")
+
+---------------------------------------------------------------------------
+-- Check Equation Syntax
+
+checkValDef :: Located (PatBuilder GhcPs)
+ -> Maybe (LHsType GhcPs)
+ -> Located (a,GRHSs GhcPs (LHsExpr GhcPs))
+ -> P ([AddAnn],HsBind GhcPs)
+
+checkValDef lhs (Just sig) grhss
+ -- x :: ty = rhs parses as a *pattern* binding
+ = do lhs' <- runPV $ mkHsTySigPV (combineLocs lhs sig) lhs sig >>= checkLPat
+ checkPatBind lhs' grhss
+
+checkValDef lhs Nothing g@(L l (_,grhss))
+ = do { mb_fun <- isFunLhs lhs
+ ; case mb_fun of
+ Just (fun, is_infix, pats, ann) ->
+ checkFunBind NoSrcStrict ann (getLoc lhs)
+ fun is_infix pats (L l grhss)
+ Nothing -> do
+ lhs' <- checkPattern lhs
+ checkPatBind lhs' g }
+
+checkFunBind :: SrcStrictness
+ -> [AddAnn]
+ -> SrcSpan
+ -> Located RdrName
+ -> LexicalFixity
+ -> [Located (PatBuilder GhcPs)]
+ -> Located (GRHSs GhcPs (LHsExpr GhcPs))
+ -> P ([AddAnn],HsBind GhcPs)
+checkFunBind strictness ann lhs_loc fun is_infix pats (L rhs_span grhss)
+ = do ps <- mapM checkPattern pats
+ let match_span = combineSrcSpans lhs_loc rhs_span
+ -- Add back the annotations stripped from any HsPar values in the lhs
+ -- mapM_ (\a -> a match_span) ann
+ return (ann, makeFunBind fun
+ [L match_span (Match { m_ext = noExtField
+ , m_ctxt = FunRhs
+ { mc_fun = fun
+ , mc_fixity = is_infix
+ , mc_strictness = strictness }
+ , m_pats = ps
+ , m_grhss = grhss })])
+ -- The span of the match covers the entire equation.
+ -- That isn't quite right, but it'll do for now.
+
+makeFunBind :: Located RdrName -> [LMatch GhcPs (LHsExpr GhcPs)]
+ -> HsBind GhcPs
+-- Like GHC.Hs.Utils.mkFunBind, but we need to be able to set the fixity too
+makeFunBind fn ms
+ = FunBind { fun_ext = noExtField,
+ fun_id = fn,
+ fun_matches = mkMatchGroup FromSource ms,
+ fun_tick = [] }
+
+-- See Note [FunBind vs PatBind]
+checkPatBind :: LPat GhcPs
+ -> Located (a,GRHSs GhcPs (LHsExpr GhcPs))
+ -> P ([AddAnn],HsBind GhcPs)
+checkPatBind lhs (L match_span (_,grhss))
+ | BangPat _ p <- unLoc lhs
+ , VarPat _ v <- unLoc p
+ = return ([], makeFunBind v [L match_span (m v)])
+ where
+ m v = Match { m_ext = noExtField
+ , m_ctxt = FunRhs { mc_fun = L (getLoc lhs) (unLoc v)
+ , mc_fixity = Prefix
+ , mc_strictness = SrcStrict }
+ , m_pats = []
+ , m_grhss = grhss }
+
+checkPatBind lhs (L _ (_,grhss))
+ = return ([],PatBind noExtField lhs grhss ([],[]))
+
+checkValSigLhs :: LHsExpr GhcPs -> P (Located RdrName)
+checkValSigLhs (L _ (HsVar _ lrdr@(L _ v)))
+ | isUnqual v
+ , not (isDataOcc (rdrNameOcc v))
+ = return lrdr
+
+checkValSigLhs lhs@(L l _)
+ = addFatalError l ((text "Invalid type signature:" <+>
+ ppr lhs <+> text ":: ...")
+ $$ text hint)
+ where
+ hint | foreign_RDR `looks_like` lhs
+ = "Perhaps you meant to use ForeignFunctionInterface?"
+ | default_RDR `looks_like` lhs
+ = "Perhaps you meant to use DefaultSignatures?"
+ | pattern_RDR `looks_like` lhs
+ = "Perhaps you meant to use PatternSynonyms?"
+ | otherwise
+ = "Should be of form <variable> :: <type>"
+
+ -- A common error is to forget the ForeignFunctionInterface flag
+ -- so check for that, and suggest. cf #3805
+ -- Sadly 'foreign import' still barfs 'parse error' because
+ -- 'import' is a keyword
+ looks_like s (L _ (HsVar _ (L _ v))) = v == s
+ looks_like s (L _ (HsApp _ lhs _)) = looks_like s lhs
+ looks_like _ _ = False
+
+ foreign_RDR = mkUnqual varName (fsLit "foreign")
+ default_RDR = mkUnqual varName (fsLit "default")
+ pattern_RDR = mkUnqual varName (fsLit "pattern")
+
+checkDoAndIfThenElse
+ :: (Outputable a, Outputable b, Outputable c)
+ => Located a -> Bool -> b -> Bool -> Located c -> PV ()
+checkDoAndIfThenElse guardExpr semiThen thenExpr semiElse elseExpr
+ | semiThen || semiElse
+ = do doAndIfThenElse <- getBit DoAndIfThenElseBit
+ unless doAndIfThenElse $ do
+ addError (combineLocs guardExpr elseExpr)
+ (text "Unexpected semi-colons in conditional:"
+ $$ nest 4 expr
+ $$ text "Perhaps you meant to use DoAndIfThenElse?")
+ | otherwise = return ()
+ where pprOptSemi True = semi
+ pprOptSemi False = empty
+ expr = text "if" <+> ppr guardExpr <> pprOptSemi semiThen <+>
+ text "then" <+> ppr thenExpr <> pprOptSemi semiElse <+>
+ text "else" <+> ppr elseExpr
+
+isFunLhs :: Located (PatBuilder GhcPs)
+ -> P (Maybe (Located RdrName, LexicalFixity, [Located (PatBuilder GhcPs)],[AddAnn]))
+-- A variable binding is parsed as a FunBind.
+-- Just (fun, is_infix, arg_pats) if e is a function LHS
+isFunLhs e = go e [] []
+ where
+ go (L loc (PatBuilderVar (L _ f))) es ann
+ | not (isRdrDataCon f) = return (Just (L loc f, Prefix, es, ann))
+ go (L _ (PatBuilderApp f e)) es ann = go f (e:es) ann
+ go (L l (PatBuilderPar e)) es@(_:_) ann = go e es (ann ++ mkParensApiAnn l)
+ go (L loc (PatBuilderOpApp l (L loc' op) r)) es ann
+ | not (isRdrDataCon op) -- We have found the function!
+ = return (Just (L loc' op, Infix, (l:r:es), ann))
+ | otherwise -- Infix data con; keep going
+ = do { mb_l <- go l es ann
+ ; case mb_l of
+ Just (op', Infix, j : k : es', ann')
+ -> return (Just (op', Infix, j : op_app : es', ann'))
+ where
+ op_app = L loc (PatBuilderOpApp k
+ (L loc' op) r)
+ _ -> return Nothing }
+ go _ _ _ = return Nothing
+
+-- | Either an operator or an operand.
+data TyEl = TyElOpr RdrName | TyElOpd (HsType GhcPs)
+ | TyElKindApp SrcSpan (LHsType GhcPs)
+ -- See Note [TyElKindApp SrcSpan interpretation]
+ | TyElUnpackedness ([AddAnn], SourceText, SrcUnpackedness)
+ | TyElDocPrev HsDocString
+
+
+{- Note [TyElKindApp SrcSpan interpretation]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A TyElKindApp captures type application written in haskell as
+
+ @ Foo
+
+where Foo is some type.
+
+The SrcSpan reflects both elements, and there are AnnAt and AnnVal API
+Annotations attached to this SrcSpan for the specific locations of
+each within it.
+-}
+
+instance Outputable TyEl where
+ ppr (TyElOpr name) = ppr name
+ ppr (TyElOpd ty) = ppr ty
+ ppr (TyElKindApp _ ki) = text "@" <> ppr ki
+ ppr (TyElUnpackedness (_, _, unpk)) = ppr unpk
+ ppr (TyElDocPrev doc) = ppr doc
+
+-- | Extract a strictness/unpackedness annotation from the front of a reversed
+-- 'TyEl' list.
+pUnpackedness
+ :: [Located TyEl] -- reversed TyEl
+ -> Maybe ( SrcSpan
+ , [AddAnn]
+ , SourceText
+ , SrcUnpackedness
+ , [Located TyEl] {- remaining TyEl -})
+pUnpackedness (L l x1 : xs)
+ | TyElUnpackedness (anns, prag, unpk) <- x1
+ = Just (l, anns, prag, unpk, xs)
+pUnpackedness _ = Nothing
+
+pBangTy
+ :: LHsType GhcPs -- a type to be wrapped inside HsBangTy
+ -> [Located TyEl] -- reversed TyEl
+ -> ( Bool {- has a strict mark been consumed? -}
+ , LHsType GhcPs {- the resulting BangTy -}
+ , P () {- add annotations -}
+ , [Located TyEl] {- remaining TyEl -})
+pBangTy lt@(L l1 _) xs =
+ case pUnpackedness xs of
+ Nothing -> (False, lt, pure (), xs)
+ Just (l2, anns, prag, unpk, xs') ->
+ let bl = combineSrcSpans l1 l2
+ bt = addUnpackedness (prag, unpk) lt
+ in (True, L bl bt, addAnnsAt bl anns, xs')
+
+mkBangTy :: SrcStrictness -> LHsType GhcPs -> HsType GhcPs
+mkBangTy strictness =
+ HsBangTy noExtField (HsSrcBang NoSourceText NoSrcUnpack strictness)
+
+addUnpackedness :: (SourceText, SrcUnpackedness) -> LHsType GhcPs -> HsType GhcPs
+addUnpackedness (prag, unpk) (L _ (HsBangTy x bang t))
+ | HsSrcBang NoSourceText NoSrcUnpack strictness <- bang
+ = HsBangTy x (HsSrcBang prag unpk strictness) t
+addUnpackedness (prag, unpk) t
+ = HsBangTy noExtField (HsSrcBang prag unpk NoSrcStrict) t
+
+-- | Merge a /reversed/ and /non-empty/ soup of operators and operands
+-- into a type.
+--
+-- User input: @F x y + G a b * X@
+-- Input to 'mergeOps': [X, *, b, a, G, +, y, x, F]
+-- Output corresponds to what the user wrote assuming all operators are of the
+-- same fixity and right-associative.
+--
+-- It's a bit silly that we're doing it at all, as the renamer will have to
+-- rearrange this, and it'd be easier to keep things separate.
+--
+-- See Note [Parsing data constructors is hard]
+mergeOps :: [Located TyEl] -> P (LHsType GhcPs)
+mergeOps ((L l1 (TyElOpd t)) : xs)
+ | (_, t', addAnns, xs') <- pBangTy (L l1 t) xs
+ , null xs' -- We accept a BangTy only when there are no preceding TyEl.
+ = addAnns >> return t'
+mergeOps all_xs = go (0 :: Int) [] id all_xs
+ where
+ -- NB. When modifying clauses in 'go', make sure that the reasoning in
+ -- Note [Non-empty 'acc' in mergeOps clause [end]] is still correct.
+
+ -- clause [unpk]:
+ -- handle (NO)UNPACK pragmas
+ go k acc ops_acc ((L l (TyElUnpackedness (anns, unpkSrc, unpk))):xs) =
+ if not (null acc) && null xs
+ then do { acc' <- eitherToP $ mergeOpsAcc acc
+ ; let a = ops_acc acc'
+ strictMark = HsSrcBang unpkSrc unpk NoSrcStrict
+ bl = combineSrcSpans l (getLoc a)
+ bt = HsBangTy noExtField strictMark a
+ ; addAnnsAt bl anns
+ ; return (L bl bt) }
+ else addFatalError l unpkError
+ where
+ unpkSDoc = case unpkSrc of
+ NoSourceText -> ppr unpk
+ SourceText str -> text str <> text " #-}"
+ unpkError
+ | not (null xs) = unpkSDoc <+> text "cannot appear inside a type."
+ | null acc && k == 0 = unpkSDoc <+> text "must be applied to a type."
+ | otherwise =
+ -- See Note [Impossible case in mergeOps clause [unpk]]
+ panic "mergeOps.UNPACK: impossible position"
+
+ -- clause [doc]:
+ -- we do not expect to encounter any docs
+ go _ _ _ ((L l (TyElDocPrev _)):_) =
+ failOpDocPrev l
+
+ -- clause [opr]:
+ -- when we encounter an operator, we must have accumulated
+ -- something for its rhs, and there must be something left
+ -- to build its lhs.
+ go k acc ops_acc ((L l (TyElOpr op)):xs) =
+ if null acc || null (filter isTyElOpd xs)
+ then failOpFewArgs (L l op)
+ else do { acc' <- eitherToP (mergeOpsAcc acc)
+ ; go (k + 1) [] (\c -> mkLHsOpTy c (L l op) (ops_acc acc')) xs }
+ where
+ isTyElOpd (L _ (TyElOpd _)) = True
+ isTyElOpd _ = False
+
+ -- clause [opd]:
+ -- whenever an operand is encountered, it is added to the accumulator
+ go k acc ops_acc ((L l (TyElOpd a)):xs) = go k (HsValArg (L l a):acc) ops_acc xs
+
+ -- clause [tyapp]:
+ -- whenever a type application is encountered, it is added to the accumulator
+ go k acc ops_acc ((L _ (TyElKindApp l a)):xs) = go k (HsTypeArg l a:acc) ops_acc xs
+
+ -- clause [end]
+ -- See Note [Non-empty 'acc' in mergeOps clause [end]]
+ go _ acc ops_acc [] = do { acc' <- eitherToP (mergeOpsAcc acc)
+ ; return (ops_acc acc') }
+
+mergeOpsAcc :: [HsArg (LHsType GhcPs) (LHsKind GhcPs)]
+ -> Either (SrcSpan, SDoc) (LHsType GhcPs)
+mergeOpsAcc [] = panic "mergeOpsAcc: empty input"
+mergeOpsAcc (HsTypeArg _ (L loc ki):_)
+ = Left (loc, text "Unexpected type application:" <+> ppr ki)
+mergeOpsAcc (HsValArg ty : xs) = go1 ty xs
+ where
+ go1 :: LHsType GhcPs
+ -> [HsArg (LHsType GhcPs) (LHsKind GhcPs)]
+ -> Either (SrcSpan, SDoc) (LHsType GhcPs)
+ go1 lhs [] = Right lhs
+ go1 lhs (x:xs) = case x of
+ HsValArg ty -> go1 (mkHsAppTy lhs ty) xs
+ HsTypeArg loc ki -> let ty = mkHsAppKindTy loc lhs ki
+ in go1 ty xs
+ HsArgPar _ -> go1 lhs xs
+mergeOpsAcc (HsArgPar _: xs) = mergeOpsAcc xs
+
+{- Note [Impossible case in mergeOps clause [unpk]]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+This case should never occur. Let us consider all possible
+variations of 'acc', 'xs', and 'k':
+
+ acc xs k
+==============================
+ null | null 0 -- "must be applied to a type"
+ null | not null 0 -- "must be applied to a type"
+not null | null 0 -- successful parse
+not null | not null 0 -- "cannot appear inside a type"
+ null | null >0 -- handled in clause [opr]
+ null | not null >0 -- "cannot appear inside a type"
+not null | null >0 -- successful parse
+not null | not null >0 -- "cannot appear inside a type"
+
+The (null acc && null xs && k>0) case is handled in clause [opr]
+by the following check:
+
+ if ... || null (filter isTyElOpd xs)
+ then failOpFewArgs (L l op)
+
+We know that this check has been performed because k>0, and by
+the time we reach the end of the list (null xs), the only way
+for (null acc) to hold is that there was not a single TyElOpd
+between the operator and the end of the list. But this case is
+caught by the check and reported as 'failOpFewArgs'.
+-}
+
+{- Note [Non-empty 'acc' in mergeOps clause [end]]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+In clause [end] we need to know that 'acc' is non-empty to call 'mergeAcc'
+without a check.
+
+Running 'mergeOps' with an empty input list is forbidden, so we do not consider
+this possibility. This means we'll hit at least one other clause before we
+reach clause [end].
+
+* Clauses [unpk] and [doc] do not call 'go' recursively, so we cannot hit
+ clause [end] from there.
+* Clause [opd] makes 'acc' non-empty, so if we hit clause [end] after it, 'acc'
+ will be non-empty.
+* Clause [opr] checks that (filter isTyElOpd xs) is not null - so we are going
+ to hit clause [opd] at least once before we reach clause [end], making 'acc'
+ non-empty.
+* There are no other clauses.
+
+Therefore, it is safe to omit a check for non-emptiness of 'acc' in clause
+[end].
+
+-}
+
+pInfixSide :: [Located TyEl] -> Maybe (LHsType GhcPs, P (), [Located TyEl])
+pInfixSide ((L l (TyElOpd t)):xs)
+ | (True, t', addAnns, xs') <- pBangTy (L l t) xs
+ = Just (t', addAnns, xs')
+pInfixSide (el:xs1)
+ | Just t1 <- pLHsTypeArg el
+ = go [t1] xs1
+ where
+ go :: [HsArg (LHsType GhcPs) (LHsKind GhcPs)]
+ -> [Located TyEl] -> Maybe (LHsType GhcPs, P (), [Located TyEl])
+ go acc (el:xs)
+ | Just t <- pLHsTypeArg el
+ = go (t:acc) xs
+ go acc xs = case mergeOpsAcc acc of
+ Left _ -> Nothing
+ Right acc' -> Just (acc', pure (), xs)
+pInfixSide _ = Nothing
+
+pLHsTypeArg :: Located TyEl -> Maybe (HsArg (LHsType GhcPs) (LHsKind GhcPs))
+pLHsTypeArg (L l (TyElOpd a)) = Just (HsValArg (L l a))
+pLHsTypeArg (L _ (TyElKindApp l a)) = Just (HsTypeArg l a)
+pLHsTypeArg _ = Nothing
+
+pDocPrev :: [Located TyEl] -> (Maybe LHsDocString, [Located TyEl])
+pDocPrev = go Nothing
+ where
+ go mTrailingDoc ((L l (TyElDocPrev doc)):xs) =
+ go (mTrailingDoc `mplus` Just (L l doc)) xs
+ go mTrailingDoc xs = (mTrailingDoc, xs)
+
+orErr :: Maybe a -> b -> Either b a
+orErr (Just a) _ = Right a
+orErr Nothing b = Left b
+
+-- | Merge a /reversed/ and /non-empty/ soup of operators and operands
+-- into a data constructor.
+--
+-- User input: @C !A B -- ^ doc@
+-- Input to 'mergeDataCon': ["doc", B, !A, C]
+-- Output: (C, PrefixCon [!A, B], "doc")
+--
+-- See Note [Parsing data constructors is hard]
+mergeDataCon
+ :: [Located TyEl]
+ -> P ( Located RdrName -- constructor name
+ , HsConDeclDetails GhcPs -- constructor field information
+ , Maybe LHsDocString -- docstring to go on the constructor
+ )
+mergeDataCon all_xs =
+ do { (addAnns, a) <- eitherToP res
+ ; addAnns
+ ; return a }
+ where
+ -- We start by splitting off the trailing documentation comment,
+ -- if any exists.
+ (mTrailingDoc, all_xs') = pDocPrev all_xs
+
+ -- Determine whether the trailing documentation comment exists and is the
+ -- only docstring in this constructor declaration.
+ --
+ -- When true, it means that it applies to the constructor itself:
+ -- data T = C
+ -- A
+ -- B -- ^ Comment on C (singleDoc == True)
+ --
+ -- When false, it means that it applies to the last field:
+ -- data T = C -- ^ Comment on C
+ -- A -- ^ Comment on A
+ -- B -- ^ Comment on B (singleDoc == False)
+ singleDoc = isJust mTrailingDoc &&
+ null [ () | (L _ (TyElDocPrev _)) <- all_xs' ]
+
+ -- The result of merging the list of reversed TyEl into a
+ -- data constructor, along with [AddAnn].
+ res = goFirst all_xs'
+
+ -- Take the trailing docstring into account when interpreting
+ -- the docstring near the constructor.
+ --
+ -- data T = C -- ^ docstring right after C
+ -- A
+ -- B -- ^ trailing docstring
+ --
+ -- 'mkConDoc' must be applied to the docstring right after C, so that it
+ -- falls back to the trailing docstring when appropriate (see singleDoc).
+ mkConDoc mDoc | singleDoc = mDoc `mplus` mTrailingDoc
+ | otherwise = mDoc
+
+ -- The docstring for the last field of a data constructor.
+ trailingFieldDoc | singleDoc = Nothing
+ | otherwise = mTrailingDoc
+
+ goFirst [ L l (TyElOpd (HsTyVar _ _ (L _ tc))) ]
+ = do { data_con <- tyConToDataCon l tc
+ ; return (pure (), (data_con, PrefixCon [], mTrailingDoc)) }
+ goFirst ((L l (TyElOpd (HsRecTy _ fields))):xs)
+ | (mConDoc, xs') <- pDocPrev xs
+ , [ L l' (TyElOpd (HsTyVar _ _ (L _ tc))) ] <- xs'
+ = do { data_con <- tyConToDataCon l' tc
+ ; let mDoc = mTrailingDoc `mplus` mConDoc
+ ; return (pure (), (data_con, RecCon (L l fields), mDoc)) }
+ goFirst [L l (TyElOpd (HsTupleTy _ HsBoxedOrConstraintTuple ts))]
+ = return ( pure ()
+ , ( L l (getRdrName (tupleDataCon Boxed (length ts)))
+ , PrefixCon ts
+ , mTrailingDoc ) )
+ goFirst ((L l (TyElOpd t)):xs)
+ | (_, t', addAnns, xs') <- pBangTy (L l t) xs
+ = go addAnns Nothing [mkLHsDocTyMaybe t' trailingFieldDoc] xs'
+ goFirst (L l (TyElKindApp _ _):_)
+ = goInfix Monoid.<> Left (l, kindAppErr)
+ goFirst xs
+ = go (pure ()) mTrailingDoc [] xs
+
+ go addAnns mLastDoc ts [ L l (TyElOpd (HsTyVar _ _ (L _ tc))) ]
+ = do { data_con <- tyConToDataCon l tc
+ ; return (addAnns, (data_con, PrefixCon ts, mkConDoc mLastDoc)) }
+ go addAnns mLastDoc ts ((L l (TyElDocPrev doc)):xs) =
+ go addAnns (mLastDoc `mplus` Just (L l doc)) ts xs
+ go addAnns mLastDoc ts ((L l (TyElOpd t)):xs)
+ | (_, t', addAnns', xs') <- pBangTy (L l t) xs
+ , t'' <- mkLHsDocTyMaybe t' mLastDoc
+ = go (addAnns >> addAnns') Nothing (t'':ts) xs'
+ go _ _ _ ((L _ (TyElOpr _)):_) =
+ -- Encountered an operator: backtrack to the beginning and attempt
+ -- to parse as an infix definition.
+ goInfix
+ go _ _ _ (L l (TyElKindApp _ _):_) = goInfix Monoid.<> Left (l, kindAppErr)
+ go _ _ _ _ = Left malformedErr
+ where
+ malformedErr =
+ ( foldr combineSrcSpans noSrcSpan (map getLoc all_xs')
+ , text "Cannot parse data constructor" <+>
+ text "in a data/newtype declaration:" $$
+ nest 2 (hsep . reverse $ map ppr all_xs'))
+
+ goInfix =
+ do { let xs0 = all_xs'
+ ; (rhs_t, rhs_addAnns, xs1) <- pInfixSide xs0 `orErr` malformedErr
+ ; let (mOpDoc, xs2) = pDocPrev xs1
+ ; (op, xs3) <- case xs2 of
+ (L l (TyElOpr op)) : xs3 ->
+ do { data_con <- tyConToDataCon l op
+ ; return (data_con, xs3) }
+ _ -> Left malformedErr
+ ; let (mLhsDoc, xs4) = pDocPrev xs3
+ ; (lhs_t, lhs_addAnns, xs5) <- pInfixSide xs4 `orErr` malformedErr
+ ; unless (null xs5) (Left malformedErr)
+ ; let rhs = mkLHsDocTyMaybe rhs_t trailingFieldDoc
+ lhs = mkLHsDocTyMaybe lhs_t mLhsDoc
+ addAnns = lhs_addAnns >> rhs_addAnns
+ ; return (addAnns, (op, InfixCon lhs rhs, mkConDoc mOpDoc)) }
+ where
+ malformedErr =
+ ( foldr combineSrcSpans noSrcSpan (map getLoc all_xs')
+ , text "Cannot parse an infix data constructor" <+>
+ text "in a data/newtype declaration:" $$
+ nest 2 (hsep . reverse $ map ppr all_xs'))
+
+ kindAppErr =
+ text "Unexpected kind application" <+>
+ text "in a data/newtype declaration:" $$
+ nest 2 (hsep . reverse $ map ppr all_xs')
+
+---------------------------------------------------------------------------
+-- | Check for monad comprehensions
+--
+-- If the flag MonadComprehensions is set, return a 'MonadComp' context,
+-- otherwise use the usual 'ListComp' context
+
+checkMonadComp :: PV (HsStmtContext GhcRn)
+checkMonadComp = do
+ monadComprehensions <- getBit MonadComprehensionsBit
+ return $ if monadComprehensions
+ then MonadComp
+ else ListComp
+
+-- -------------------------------------------------------------------------
+-- Expression/command/pattern ambiguity.
+-- See Note [Ambiguous syntactic categories]
+--
+
+-- See Note [Parser-Validator]
+-- See Note [Ambiguous syntactic categories]
+--
+-- This newtype is required to avoid impredicative types in monadic
+-- productions. That is, in a production that looks like
+--
+-- | ... {% return (ECP ...) }
+--
+-- we are dealing with
+-- P ECP
+-- whereas without a newtype we would be dealing with
+-- P (forall b. DisambECP b => PV (Located b))
+--
+newtype ECP =
+ ECP { runECP_PV :: forall b. DisambECP b => PV (Located b) }
+
+runECP_P :: DisambECP b => ECP -> P (Located b)
+runECP_P p = runPV (runECP_PV p)
+
+ecpFromExp :: LHsExpr GhcPs -> ECP
+ecpFromExp a = ECP (ecpFromExp' a)
+
+ecpFromCmd :: LHsCmd GhcPs -> ECP
+ecpFromCmd a = ECP (ecpFromCmd' a)
+
+-- | Disambiguate infix operators.
+-- See Note [Ambiguous syntactic categories]
+class DisambInfixOp b where
+ mkHsVarOpPV :: Located RdrName -> PV (Located b)
+ mkHsConOpPV :: Located RdrName -> PV (Located b)
+ mkHsInfixHolePV :: SrcSpan -> PV (Located b)
+
+instance DisambInfixOp (HsExpr GhcPs) where
+ mkHsVarOpPV v = return $ L (getLoc v) (HsVar noExtField v)
+ mkHsConOpPV v = return $ L (getLoc v) (HsVar noExtField v)
+ mkHsInfixHolePV l = return $ L l hsHoleExpr
+
+instance DisambInfixOp RdrName where
+ mkHsConOpPV (L l v) = return $ L l v
+ mkHsVarOpPV (L l v) = return $ L l v
+ mkHsInfixHolePV l =
+ addFatalError l $ text "Invalid infix hole, expected an infix operator"
+
+-- | Disambiguate constructs that may appear when we do not know ahead of time whether we are
+-- parsing an expression, a command, or a pattern.
+-- See Note [Ambiguous syntactic categories]
+class b ~ (Body b) GhcPs => DisambECP b where
+ -- | See Note [Body in DisambECP]
+ type Body b :: Type -> Type
+ -- | Return a command without ambiguity, or fail in a non-command context.
+ ecpFromCmd' :: LHsCmd GhcPs -> PV (Located b)
+ -- | Return an expression without ambiguity, or fail in a non-expression context.
+ ecpFromExp' :: LHsExpr GhcPs -> PV (Located b)
+ -- | Disambiguate "\... -> ..." (lambda)
+ mkHsLamPV :: SrcSpan -> MatchGroup GhcPs (Located b) -> PV (Located b)
+ -- | Disambiguate "let ... in ..."
+ mkHsLetPV :: SrcSpan -> LHsLocalBinds GhcPs -> Located b -> PV (Located b)
+ -- | Infix operator representation
+ type InfixOp b
+ -- | Bring superclass constraints on InfixOp into scope.
+ -- See Note [UndecidableSuperClasses for associated types]
+ superInfixOp :: (DisambInfixOp (InfixOp b) => PV (Located b )) -> PV (Located b)
+ -- | Disambiguate "f # x" (infix operator)
+ mkHsOpAppPV :: SrcSpan -> Located b -> Located (InfixOp b) -> Located b -> PV (Located b)
+ -- | Disambiguate "case ... of ..."
+ mkHsCasePV :: SrcSpan -> LHsExpr GhcPs -> MatchGroup GhcPs (Located b) -> PV (Located b)
+ -- | Function argument representation
+ type FunArg b
+ -- | Bring superclass constraints on FunArg into scope.
+ -- See Note [UndecidableSuperClasses for associated types]
+ superFunArg :: (DisambECP (FunArg b) => PV (Located b)) -> PV (Located b)
+ -- | Disambiguate "f x" (function application)
+ mkHsAppPV :: SrcSpan -> Located b -> Located (FunArg b) -> PV (Located b)
+ -- | Disambiguate "if ... then ... else ..."
+ mkHsIfPV :: SrcSpan
+ -> LHsExpr GhcPs
+ -> Bool -- semicolon?
+ -> Located b
+ -> Bool -- semicolon?
+ -> Located b
+ -> PV (Located b)
+ -- | Disambiguate "do { ... }" (do notation)
+ mkHsDoPV :: SrcSpan -> Located [LStmt GhcPs (Located b)] -> PV (Located b)
+ -- | Disambiguate "( ... )" (parentheses)
+ mkHsParPV :: SrcSpan -> Located b -> PV (Located b)
+ -- | Disambiguate a variable "f" or a data constructor "MkF".
+ mkHsVarPV :: Located RdrName -> PV (Located b)
+ -- | Disambiguate a monomorphic literal
+ mkHsLitPV :: Located (HsLit GhcPs) -> PV (Located b)
+ -- | Disambiguate an overloaded literal
+ mkHsOverLitPV :: Located (HsOverLit GhcPs) -> PV (Located b)
+ -- | Disambiguate a wildcard
+ mkHsWildCardPV :: SrcSpan -> PV (Located b)
+ -- | Disambiguate "a :: t" (type annotation)
+ mkHsTySigPV :: SrcSpan -> Located b -> LHsType GhcPs -> PV (Located b)
+ -- | Disambiguate "[a,b,c]" (list syntax)
+ mkHsExplicitListPV :: SrcSpan -> [Located b] -> PV (Located b)
+ -- | Disambiguate "$(...)" and "[quasi|...|]" (TH splices)
+ mkHsSplicePV :: Located (HsSplice GhcPs) -> PV (Located b)
+ -- | Disambiguate "f { a = b, ... }" syntax (record construction and record updates)
+ mkHsRecordPV ::
+ SrcSpan ->
+ SrcSpan ->
+ Located b ->
+ ([LHsRecField GhcPs (Located b)], Maybe SrcSpan) ->
+ PV (Located b)
+ -- | Disambiguate "-a" (negation)
+ mkHsNegAppPV :: SrcSpan -> Located b -> PV (Located b)
+ -- | Disambiguate "(# a)" (right operator section)
+ mkHsSectionR_PV :: SrcSpan -> Located (InfixOp b) -> Located b -> PV (Located b)
+ -- | Disambiguate "(a -> b)" (view pattern)
+ mkHsViewPatPV :: SrcSpan -> LHsExpr GhcPs -> Located b -> PV (Located b)
+ -- | Disambiguate "a@b" (as-pattern)
+ mkHsAsPatPV :: SrcSpan -> Located RdrName -> Located b -> PV (Located b)
+ -- | Disambiguate "~a" (lazy pattern)
+ mkHsLazyPatPV :: SrcSpan -> Located b -> PV (Located b)
+ -- | Disambiguate "!a" (bang pattern)
+ mkHsBangPatPV :: SrcSpan -> Located b -> PV (Located b)
+ -- | Disambiguate tuple sections and unboxed sums
+ mkSumOrTuplePV :: SrcSpan -> Boxity -> SumOrTuple b -> PV (Located b)
+ -- | Validate infixexp LHS to reject unwanted {-# SCC ... #-} pragmas
+ rejectPragmaPV :: Located b -> PV ()
+
+
+{- Note [UndecidableSuperClasses for associated types]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+(This Note is about the code in GHC, not about the user code that we are parsing)
+
+Assume we have a class C with an associated type T:
+
+ class C a where
+ type T a
+ ...
+
+If we want to add 'C (T a)' as a superclass, we need -XUndecidableSuperClasses:
+
+ {-# LANGUAGE UndecidableSuperClasses #-}
+ class C (T a) => C a where
+ type T a
+ ...
+
+Unfortunately, -XUndecidableSuperClasses don't work all that well, sometimes
+making GHC loop. The workaround is to bring this constraint into scope
+manually with a helper method:
+
+ class C a where
+ type T a
+ superT :: (C (T a) => r) -> r
+
+In order to avoid ambiguous types, 'r' must mention 'a'.
+
+For consistency, we use this approach for all constraints on associated types,
+even when -XUndecidableSuperClasses are not required.
+-}
+
+{- Note [Body in DisambECP]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+There are helper functions (mkBodyStmt, mkBindStmt, unguardedRHS, etc) that
+require their argument to take a form of (body GhcPs) for some (body :: Type ->
+*). To satisfy this requirement, we say that (b ~ Body b GhcPs) in the
+superclass constraints of DisambECP.
+
+The alternative is to change mkBodyStmt, mkBindStmt, unguardedRHS, etc, to drop
+this requirement. It is possible and would allow removing the type index of
+PatBuilder, but leads to worse type inference, breaking some code in the
+typechecker.
+-}
+
+instance DisambECP (HsCmd GhcPs) where
+ type Body (HsCmd GhcPs) = HsCmd
+ ecpFromCmd' = return
+ ecpFromExp' (L l e) = cmdFail l (ppr e)
+ mkHsLamPV l mg = return $ L l (HsCmdLam noExtField mg)
+ mkHsLetPV l bs e = return $ L l (HsCmdLet noExtField bs e)
+ type InfixOp (HsCmd GhcPs) = HsExpr GhcPs
+ superInfixOp m = m
+ mkHsOpAppPV l c1 op c2 = do
+ let cmdArg c = L (getLoc c) $ HsCmdTop noExtField c
+ return $ L l $ HsCmdArrForm noExtField op Infix Nothing [cmdArg c1, cmdArg c2]
+ mkHsCasePV l c mg = return $ L l (HsCmdCase noExtField c mg)
+ type FunArg (HsCmd GhcPs) = HsExpr GhcPs
+ superFunArg m = m
+ mkHsAppPV l c e = do
+ checkCmdBlockArguments c
+ checkExpBlockArguments e
+ return $ L l (HsCmdApp noExtField c e)
+ mkHsIfPV l c semi1 a semi2 b = do
+ checkDoAndIfThenElse c semi1 a semi2 b
+ return $ L l (mkHsCmdIf c a b)
+ mkHsDoPV l stmts = return $ L l (HsCmdDo noExtField stmts)
+ mkHsParPV l c = return $ L l (HsCmdPar noExtField c)
+ mkHsVarPV (L l v) = cmdFail l (ppr v)
+ mkHsLitPV (L l a) = cmdFail l (ppr a)
+ mkHsOverLitPV (L l a) = cmdFail l (ppr a)
+ mkHsWildCardPV l = cmdFail l (text "_")
+ mkHsTySigPV l a sig = cmdFail l (ppr a <+> text "::" <+> ppr sig)
+ mkHsExplicitListPV l xs = cmdFail l $
+ brackets (fsep (punctuate comma (map ppr xs)))
+ mkHsSplicePV (L l sp) = cmdFail l (ppr sp)
+ mkHsRecordPV l _ a (fbinds, ddLoc) = cmdFail l $
+ ppr a <+> ppr (mk_rec_fields fbinds ddLoc)
+ mkHsNegAppPV l a = cmdFail l (text "-" <> ppr a)
+ mkHsSectionR_PV l op c = cmdFail l $
+ let pp_op = fromMaybe (panic "cannot print infix operator")
+ (ppr_infix_expr (unLoc op))
+ in pp_op <> ppr c
+ mkHsViewPatPV l a b = cmdFail l $
+ ppr a <+> text "->" <+> ppr b
+ mkHsAsPatPV l v c = cmdFail l $
+ pprPrefixOcc (unLoc v) <> text "@" <> ppr c
+ mkHsLazyPatPV l c = cmdFail l $
+ text "~" <> ppr c
+ mkHsBangPatPV l c = cmdFail l $
+ text "!" <> ppr c
+ mkSumOrTuplePV l boxity a = cmdFail l (pprSumOrTuple boxity a)
+ rejectPragmaPV _ = return ()
+
+cmdFail :: SrcSpan -> SDoc -> PV a
+cmdFail loc e = addFatalError loc $
+ hang (text "Parse error in command:") 2 (ppr e)
+
+instance DisambECP (HsExpr GhcPs) where
+ type Body (HsExpr GhcPs) = HsExpr
+ ecpFromCmd' (L l c) = do
+ addError l $ vcat
+ [ text "Arrow command found where an expression was expected:",
+ nest 2 (ppr c) ]
+ return (L l hsHoleExpr)
+ ecpFromExp' = return
+ mkHsLamPV l mg = return $ L l (HsLam noExtField mg)
+ mkHsLetPV l bs c = return $ L l (HsLet noExtField bs c)
+ type InfixOp (HsExpr GhcPs) = HsExpr GhcPs
+ superInfixOp m = m
+ mkHsOpAppPV l e1 op e2 = do
+ return $ L l $ OpApp noExtField e1 op e2
+ mkHsCasePV l e mg = return $ L l (HsCase noExtField e mg)
+ type FunArg (HsExpr GhcPs) = HsExpr GhcPs
+ superFunArg m = m
+ mkHsAppPV l e1 e2 = do
+ checkExpBlockArguments e1
+ checkExpBlockArguments e2
+ return $ L l (HsApp noExtField e1 e2)
+ mkHsIfPV l c semi1 a semi2 b = do
+ checkDoAndIfThenElse c semi1 a semi2 b
+ return $ L l (mkHsIf c a b)
+ mkHsDoPV l stmts = return $ L l (HsDo noExtField DoExpr stmts)
+ mkHsParPV l e = return $ L l (HsPar noExtField e)
+ mkHsVarPV v@(getLoc -> l) = return $ L l (HsVar noExtField v)
+ mkHsLitPV (L l a) = return $ L l (HsLit noExtField a)
+ mkHsOverLitPV (L l a) = return $ L l (HsOverLit noExtField a)
+ mkHsWildCardPV l = return $ L l hsHoleExpr
+ mkHsTySigPV l a sig = return $ L l (ExprWithTySig noExtField a (mkLHsSigWcType sig))
+ mkHsExplicitListPV l xs = return $ L l (ExplicitList noExtField Nothing xs)
+ mkHsSplicePV sp = return $ mapLoc (HsSpliceE noExtField) sp
+ mkHsRecordPV l lrec a (fbinds, ddLoc) = do
+ r <- mkRecConstrOrUpdate a lrec (fbinds, ddLoc)
+ checkRecordSyntax (L l r)
+ mkHsNegAppPV l a = return $ L l (NegApp noExtField a noSyntaxExpr)
+ mkHsSectionR_PV l op e = return $ L l (SectionR noExtField op e)
+ mkHsViewPatPV l a b = patSynErr "View pattern" l (ppr a <+> text "->" <+> ppr b) empty
+ mkHsAsPatPV l v e =
+ patSynErr "@-pattern" l (pprPrefixOcc (unLoc v) <> text "@" <> ppr e) $
+ text "Type application syntax requires a space before '@'"
+ mkHsLazyPatPV l e = patSynErr "Lazy pattern" l (text "~" <> ppr e) $
+ text "Did you mean to add a space after the '~'?"
+ mkHsBangPatPV l e = patSynErr "Bang pattern" l (text "!" <> ppr e) $
+ text "Did you mean to add a space after the '!'?"
+ mkSumOrTuplePV = mkSumOrTupleExpr
+ rejectPragmaPV (L _ (OpApp _ _ _ e)) =
+ -- assuming left-associative parsing of operators
+ rejectPragmaPV e
+ rejectPragmaPV (L l (HsPragE _ prag _)) =
+ addError l $
+ hang (text "A pragma is not allowed in this position:") 2 (ppr prag)
+ rejectPragmaPV _ = return ()
+
+patSynErr :: String -> SrcSpan -> SDoc -> SDoc -> PV (LHsExpr GhcPs)
+patSynErr item l e explanation =
+ do { addError l $
+ sep [text item <+> text "in expression context:",
+ nest 4 (ppr e)] $$
+ explanation
+ ; return (L l hsHoleExpr) }
+
+hsHoleExpr :: HsExpr (GhcPass id)
+hsHoleExpr = HsUnboundVar noExtField (mkVarOcc "_")
+
+-- | See Note [Ambiguous syntactic categories] and Note [PatBuilder]
+data PatBuilder p
+ = PatBuilderPat (Pat p)
+ | PatBuilderPar (Located (PatBuilder p))
+ | PatBuilderApp (Located (PatBuilder p)) (Located (PatBuilder p))
+ | PatBuilderOpApp (Located (PatBuilder p)) (Located RdrName) (Located (PatBuilder p))
+ | PatBuilderVar (Located RdrName)
+ | PatBuilderOverLit (HsOverLit GhcPs)
+
+instance Outputable (PatBuilder GhcPs) where
+ ppr (PatBuilderPat p) = ppr p
+ ppr (PatBuilderPar (L _ p)) = parens (ppr p)
+ ppr (PatBuilderApp (L _ p1) (L _ p2)) = ppr p1 <+> ppr p2
+ ppr (PatBuilderOpApp (L _ p1) op (L _ p2)) = ppr p1 <+> ppr op <+> ppr p2
+ ppr (PatBuilderVar v) = ppr v
+ ppr (PatBuilderOverLit l) = ppr l
+
+instance DisambECP (PatBuilder GhcPs) where
+ type Body (PatBuilder GhcPs) = PatBuilder
+ ecpFromCmd' (L l c) =
+ addFatalError l $
+ text "Command syntax in pattern:" <+> ppr c
+ ecpFromExp' (L l e) =
+ addFatalError l $
+ text "Expression syntax in pattern:" <+> ppr e
+ mkHsLamPV l _ = addFatalError l $
+ text "Lambda-syntax in pattern." $$
+ text "Pattern matching on functions is not possible."
+ mkHsLetPV l _ _ = addFatalError l $ text "(let ... in ...)-syntax in pattern"
+ type InfixOp (PatBuilder GhcPs) = RdrName
+ superInfixOp m = m
+ mkHsOpAppPV l p1 op p2 = return $ L l $ PatBuilderOpApp p1 op p2
+ mkHsCasePV l _ _ = addFatalError l $ text "(case ... of ...)-syntax in pattern"
+ type FunArg (PatBuilder GhcPs) = PatBuilder GhcPs
+ superFunArg m = m
+ mkHsAppPV l p1 p2 = return $ L l (PatBuilderApp p1 p2)
+ mkHsIfPV l _ _ _ _ _ = addFatalError l $ text "(if ... then ... else ...)-syntax in pattern"
+ mkHsDoPV l _ = addFatalError l $ text "do-notation in pattern"
+ mkHsParPV l p = return $ L l (PatBuilderPar p)
+ mkHsVarPV v@(getLoc -> l) = return $ L l (PatBuilderVar v)
+ mkHsLitPV lit@(L l a) = do
+ checkUnboxedStringLitPat lit
+ return $ L l (PatBuilderPat (LitPat noExtField a))
+ mkHsOverLitPV (L l a) = return $ L l (PatBuilderOverLit a)
+ mkHsWildCardPV l = return $ L l (PatBuilderPat (WildPat noExtField))
+ mkHsTySigPV l b sig = do
+ p <- checkLPat b
+ return $ L l (PatBuilderPat (SigPat noExtField p (mkLHsSigWcType sig)))
+ mkHsExplicitListPV l xs = do
+ ps <- traverse checkLPat xs
+ return (L l (PatBuilderPat (ListPat noExtField ps)))
+ mkHsSplicePV (L l sp) = return $ L l (PatBuilderPat (SplicePat noExtField sp))
+ mkHsRecordPV l _ a (fbinds, ddLoc) = do
+ r <- mkPatRec a (mk_rec_fields fbinds ddLoc)
+ checkRecordSyntax (L l r)
+ mkHsNegAppPV l (L lp p) = do
+ lit <- case p of
+ PatBuilderOverLit pos_lit -> return (L lp pos_lit)
+ _ -> patFail l (text "-" <> ppr p)
+ return $ L l (PatBuilderPat (mkNPat lit (Just noSyntaxExpr)))
+ mkHsSectionR_PV l op p = patFail l (pprInfixOcc (unLoc op) <> ppr p)
+ mkHsViewPatPV l a b = do
+ p <- checkLPat b
+ return $ L l (PatBuilderPat (ViewPat noExtField a p))
+ mkHsAsPatPV l v e = do
+ p <- checkLPat e
+ return $ L l (PatBuilderPat (AsPat noExtField v p))
+ mkHsLazyPatPV l e = do
+ p <- checkLPat e
+ return $ L l (PatBuilderPat (LazyPat noExtField p))
+ mkHsBangPatPV l e = do
+ p <- checkLPat e
+ let pb = BangPat noExtField p
+ hintBangPat l pb
+ return $ L l (PatBuilderPat pb)
+ mkSumOrTuplePV = mkSumOrTuplePat
+ rejectPragmaPV _ = return ()
+
+checkUnboxedStringLitPat :: Located (HsLit GhcPs) -> PV ()
+checkUnboxedStringLitPat (L loc lit) =
+ case lit of
+ HsStringPrim _ _ -- Trac #13260
+ -> addFatalError loc (text "Illegal unboxed string literal in pattern:" $$ ppr lit)
+ _ -> return ()
+
+mkPatRec ::
+ Located (PatBuilder GhcPs) ->
+ HsRecFields GhcPs (Located (PatBuilder GhcPs)) ->
+ PV (PatBuilder GhcPs)
+mkPatRec (unLoc -> PatBuilderVar c) (HsRecFields fs dd)
+ | isRdrDataCon (unLoc c)
+ = do fs <- mapM checkPatField fs
+ return (PatBuilderPat (ConPatIn c (RecCon (HsRecFields fs dd))))
+mkPatRec p _ =
+ addFatalError (getLoc p) $ text "Not a record constructor:" <+> ppr p
+
+{- Note [Ambiguous syntactic categories]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+There are places in the grammar where we do not know whether we are parsing an
+expression or a pattern without unlimited lookahead (which we do not have in
+'happy'):
+
+View patterns:
+
+ f (Con a b ) = ... -- 'Con a b' is a pattern
+ f (Con a b -> x) = ... -- 'Con a b' is an expression
+
+do-notation:
+
+ do { Con a b <- x } -- 'Con a b' is a pattern
+ do { Con a b } -- 'Con a b' is an expression
+
+Guards:
+
+ x | True <- p && q = ... -- 'True' is a pattern
+ x | True = ... -- 'True' is an expression
+
+Top-level value/function declarations (FunBind/PatBind):
+
+ f ! a -- TH splice
+ f ! a = ... -- function declaration
+
+ Until we encounter the = sign, we don't know if it's a top-level
+ TemplateHaskell splice where ! is used, or if it's a function declaration
+ where ! is bound.
+
+There are also places in the grammar where we do not know whether we are
+parsing an expression or a command:
+
+ proc x -> do { (stuff) -< x } -- 'stuff' is an expression
+ proc x -> do { (stuff) } -- 'stuff' is a command
+
+ Until we encounter arrow syntax (-<) we don't know whether to parse 'stuff'
+ as an expression or a command.
+
+In fact, do-notation is subject to both ambiguities:
+
+ proc x -> do { (stuff) -< x } -- 'stuff' is an expression
+ proc x -> do { (stuff) <- f -< x } -- 'stuff' is a pattern
+ proc x -> do { (stuff) } -- 'stuff' is a command
+
+There are many possible solutions to this problem. For an overview of the ones
+we decided against, see Note [Resolving parsing ambiguities: non-taken alternatives]
+
+The solution that keeps basic definitions (such as HsExpr) clean, keeps the
+concerns local to the parser, and does not require duplication of hsSyn types,
+or an extra pass over the entire AST, is to parse into an overloaded
+parser-validator (a so-called tagless final encoding):
+
+ class DisambECP b where ...
+ instance DisambECP (HsCmd GhcPs) where ...
+ instance DisambECP (HsExp GhcPs) where ...
+ instance DisambECP (PatBuilder GhcPs) where ...
+
+The 'DisambECP' class contains functions to build and validate 'b'. For example,
+to add parentheses we have:
+
+ mkHsParPV :: DisambECP b => SrcSpan -> Located b -> PV (Located b)
+
+'mkHsParPV' will wrap the inner value in HsCmdPar for commands, HsPar for
+expressions, and 'PatBuilderPar' for patterns (later transformed into ParPat,
+see Note [PatBuilder]).
+
+Consider the 'alts' production used to parse case-of alternatives:
+
+ alts :: { Located ([AddAnn],[LMatch GhcPs (LHsExpr GhcPs)]) }
+ : alts1 { sL1 $1 (fst $ unLoc $1,snd $ unLoc $1) }
+ | ';' alts { sLL $1 $> ((mj AnnSemi $1:(fst $ unLoc $2)),snd $ unLoc $2) }
+
+We abstract over LHsExpr GhcPs, and it becomes:
+
+ alts :: { forall b. DisambECP b => PV (Located ([AddAnn],[LMatch GhcPs (Located b)])) }
+ : alts1 { $1 >>= \ $1 ->
+ return $ sL1 $1 (fst $ unLoc $1,snd $ unLoc $1) }
+ | ';' alts { $2 >>= \ $2 ->
+ return $ sLL $1 $> ((mj AnnSemi $1:(fst $ unLoc $2)),snd $ unLoc $2) }
+
+Compared to the initial definition, the added bits are:
+
+ forall b. DisambECP b => PV ( ... ) -- in the type signature
+ $1 >>= \ $1 -> return $ -- in one reduction rule
+ $2 >>= \ $2 -> return $ -- in another reduction rule
+
+The overhead is constant relative to the size of the rest of the reduction
+rule, so this approach scales well to large parser productions.
+
+Note that we write ($1 >>= \ $1 -> ...), so the second $1 is in a binding
+position and shadows the previous $1. We can do this because internally
+'happy' desugars $n to happy_var_n, and the rationale behind this idiom
+is to be able to write (sLL $1 $>) later on. The alternative would be to
+write this as ($1 >>= \ fresh_name -> ...), but then we couldn't refer
+to the last fresh name as $>.
+-}
+
+
+{- Note [Resolving parsing ambiguities: non-taken alternatives]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Alternative I, extra constructors in GHC.Hs.Expr
+------------------------------------------------
+We could add extra constructors to HsExpr to represent command-specific and
+pattern-specific syntactic constructs. Under this scheme, we parse patterns
+and commands as expressions and rejig later. This is what GHC used to do, and
+it polluted 'HsExpr' with irrelevant constructors:
+
+ * for commands: 'HsArrForm', 'HsArrApp'
+ * for patterns: 'EWildPat', 'EAsPat', 'EViewPat', 'ELazyPat'
+
+(As of now, we still do that for patterns, but we plan to fix it).
+
+There are several issues with this:
+
+ * The implementation details of parsing are leaking into hsSyn definitions.
+
+ * Code that uses HsExpr has to panic on these impossible-after-parsing cases.
+
+ * HsExpr is arbitrarily selected as the extension basis. Why not extend
+ HsCmd or HsPat with extra constructors instead?
+
+Alternative II, extra constructors in GHC.Hs.Expr for GhcPs
+-----------------------------------------------------------
+We could address some of the problems with Alternative I by using Trees That
+Grow and extending HsExpr only in the GhcPs pass. However, GhcPs corresponds to
+the output of parsing, not to its intermediate results, so we wouldn't want
+them there either.
+
+Alternative III, extra constructors in GHC.Hs.Expr for GhcPrePs
+---------------------------------------------------------------
+We could introduce a new pass, GhcPrePs, to keep GhcPs pristine.
+Unfortunately, creating a new pass would significantly bloat conversion code
+and slow down the compiler by adding another linear-time pass over the entire
+AST. For example, in order to build HsExpr GhcPrePs, we would need to build
+HsLocalBinds GhcPrePs (as part of HsLet), and we never want HsLocalBinds
+GhcPrePs.
+
+
+Alternative IV, sum type and bottom-up data flow
+------------------------------------------------
+Expressions and commands are disjoint. There are no user inputs that could be
+interpreted as either an expression or a command depending on outer context:
+
+ 5 -- definitely an expression
+ x -< y -- definitely a command
+
+Even though we have both 'HsLam' and 'HsCmdLam', we can look at
+the body to disambiguate:
+
+ \p -> 5 -- definitely an expression
+ \p -> x -< y -- definitely a command
+
+This means we could use a bottom-up flow of information to determine
+whether we are parsing an expression or a command, using a sum type
+for intermediate results:
+
+ Either (LHsExpr GhcPs) (LHsCmd GhcPs)
+
+There are two problems with this:
+
+ * We cannot handle the ambiguity between expressions and
+ patterns, which are not disjoint.
+
+ * Bottom-up flow of information leads to poor error messages. Consider
+
+ if ... then 5 else (x -< y)
+
+ Do we report that '5' is not a valid command or that (x -< y) is not a
+ valid expression? It depends on whether we want the entire node to be
+ 'HsIf' or 'HsCmdIf', and this information flows top-down, from the
+ surrounding parsing context (are we in 'proc'?)
+
+Alternative V, backtracking with parser combinators
+---------------------------------------------------
+One might think we could sidestep the issue entirely by using a backtracking
+parser and doing something along the lines of (try pExpr <|> pPat).
+
+Turns out, this wouldn't work very well, as there can be patterns inside
+expressions (e.g. via 'case', 'let', 'do') and expressions inside patterns
+(e.g. view patterns). To handle this, we would need to backtrack while
+backtracking, and unbound levels of backtracking lead to very fragile
+performance.
+
+Alternative VI, an intermediate data type
+-----------------------------------------
+There are common syntactic elements of expressions, commands, and patterns
+(e.g. all of them must have balanced parentheses), and we can capture this
+common structure in an intermediate data type, Frame:
+
+data Frame
+ = FrameVar RdrName
+ -- ^ Identifier: Just, map, BS.length
+ | FrameTuple [LTupArgFrame] Boxity
+ -- ^ Tuple (section): (a,b) (a,b,c) (a,,) (,a,)
+ | FrameTySig LFrame (LHsSigWcType GhcPs)
+ -- ^ Type signature: x :: ty
+ | FramePar (SrcSpan, SrcSpan) LFrame
+ -- ^ Parentheses
+ | FrameIf LFrame LFrame LFrame
+ -- ^ If-expression: if p then x else y
+ | FrameCase LFrame [LFrameMatch]
+ -- ^ Case-expression: case x of { p1 -> e1; p2 -> e2 }
+ | FrameDo (HsStmtContext GhcRn) [LFrameStmt]
+ -- ^ Do-expression: do { s1; a <- s2; s3 }
+ ...
+ | FrameExpr (HsExpr GhcPs) -- unambiguously an expression
+ | FramePat (HsPat GhcPs) -- unambiguously a pattern
+ | FrameCommand (HsCmd GhcPs) -- unambiguously a command
+
+To determine which constructors 'Frame' needs to have, we take the union of
+intersections between HsExpr, HsCmd, and HsPat.
+
+The intersection between HsPat and HsExpr:
+
+ HsPat = VarPat | TuplePat | SigPat | ParPat | ...
+ HsExpr = HsVar | ExplicitTuple | ExprWithTySig | HsPar | ...
+ -------------------------------------------------------------------
+ Frame = FrameVar | FrameTuple | FrameTySig | FramePar | ...
+
+The intersection between HsCmd and HsExpr:
+
+ HsCmd = HsCmdIf | HsCmdCase | HsCmdDo | HsCmdPar
+ HsExpr = HsIf | HsCase | HsDo | HsPar
+ ------------------------------------------------
+ Frame = FrameIf | FrameCase | FrameDo | FramePar
+
+The intersection between HsCmd and HsPat:
+
+ HsPat = ParPat | ...
+ HsCmd = HsCmdPar | ...
+ -----------------------
+ Frame = FramePar | ...
+
+Take the union of each intersection and this yields the final 'Frame' data
+type. The problem with this approach is that we end up duplicating a good
+portion of hsSyn:
+
+ Frame for HsExpr, HsPat, HsCmd
+ TupArgFrame for HsTupArg
+ FrameMatch for Match
+ FrameStmt for StmtLR
+ FrameGRHS for GRHS
+ FrameGRHSs for GRHSs
+ ...
+
+Alternative VII, a product type
+-------------------------------
+We could avoid the intermediate representation of Alternative VI by parsing
+into a product of interpretations directly:
+
+ -- See Note [Parser-Validator]
+ type ExpCmdPat = ( PV (LHsExpr GhcPs)
+ , PV (LHsCmd GhcPs)
+ , PV (LHsPat GhcPs) )
+
+This means that in positions where we do not know whether to produce
+expression, a pattern, or a command, we instead produce a parser-validator for
+each possible option.
+
+Then, as soon as we have parsed far enough to resolve the ambiguity, we pick
+the appropriate component of the product, discarding the rest:
+
+ checkExpOf3 (e, _, _) = e -- interpret as an expression
+ checkCmdOf3 (_, c, _) = c -- interpret as a command
+ checkPatOf3 (_, _, p) = p -- interpret as a pattern
+
+We can easily define ambiguities between arbitrary subsets of interpretations.
+For example, when we know ahead of type that only an expression or a command is
+possible, but not a pattern, we can use a smaller type:
+
+ -- See Note [Parser-Validator]
+ type ExpCmd = (PV (LHsExpr GhcPs), PV (LHsCmd GhcPs))
+
+ checkExpOf2 (e, _) = e -- interpret as an expression
+ checkCmdOf2 (_, c) = c -- interpret as a command
+
+However, there is a slight problem with this approach, namely code duplication
+in parser productions. Consider the 'alts' production used to parse case-of
+alternatives:
+
+ alts :: { Located ([AddAnn],[LMatch GhcPs (LHsExpr GhcPs)]) }
+ : alts1 { sL1 $1 (fst $ unLoc $1,snd $ unLoc $1) }
+ | ';' alts { sLL $1 $> ((mj AnnSemi $1:(fst $ unLoc $2)),snd $ unLoc $2) }
+
+Under the new scheme, we have to completely duplicate its type signature and
+each reduction rule:
+
+ alts :: { ( PV (Located ([AddAnn],[LMatch GhcPs (LHsExpr GhcPs)])) -- as an expression
+ , PV (Located ([AddAnn],[LMatch GhcPs (LHsCmd GhcPs)])) -- as a command
+ ) }
+ : alts1
+ { ( checkExpOf2 $1 >>= \ $1 ->
+ return $ sL1 $1 (fst $ unLoc $1,snd $ unLoc $1)
+ , checkCmdOf2 $1 >>= \ $1 ->
+ return $ sL1 $1 (fst $ unLoc $1,snd $ unLoc $1)
+ ) }
+ | ';' alts
+ { ( checkExpOf2 $2 >>= \ $2 ->
+ return $ sLL $1 $> ((mj AnnSemi $1:(fst $ unLoc $2)),snd $ unLoc $2)
+ , checkCmdOf2 $2 >>= \ $2 ->
+ return $ sLL $1 $> ((mj AnnSemi $1:(fst $ unLoc $2)),snd $ unLoc $2)
+ ) }
+
+And the same goes for other productions: 'altslist', 'alts1', 'alt', 'alt_rhs',
+'ralt', 'gdpats', 'gdpat', 'exp', ... and so on. That is a lot of code!
+
+Alternative VIII, a function from a GADT
+----------------------------------------
+We could avoid code duplication of the Alternative VII by representing the product
+as a function from a GADT:
+
+ data ExpCmdG b where
+ ExpG :: ExpCmdG HsExpr
+ CmdG :: ExpCmdG HsCmd
+
+ type ExpCmd = forall b. ExpCmdG b -> PV (Located (b GhcPs))
+
+ checkExp :: ExpCmd -> PV (LHsExpr GhcPs)
+ checkCmd :: ExpCmd -> PV (LHsCmd GhcPs)
+ checkExp f = f ExpG -- interpret as an expression
+ checkCmd f = f CmdG -- interpret as a command
+
+Consider the 'alts' production used to parse case-of alternatives:
+
+ alts :: { Located ([AddAnn],[LMatch GhcPs (LHsExpr GhcPs)]) }
+ : alts1 { sL1 $1 (fst $ unLoc $1,snd $ unLoc $1) }
+ | ';' alts { sLL $1 $> ((mj AnnSemi $1:(fst $ unLoc $2)),snd $ unLoc $2) }
+
+We abstract over LHsExpr, and it becomes:
+
+ alts :: { forall b. ExpCmdG b -> PV (Located ([AddAnn],[LMatch GhcPs (Located (b GhcPs))])) }
+ : alts1
+ { \tag -> $1 tag >>= \ $1 ->
+ return $ sL1 $1 (fst $ unLoc $1,snd $ unLoc $1) }
+ | ';' alts
+ { \tag -> $2 tag >>= \ $2 ->
+ return $ sLL $1 $> ((mj AnnSemi $1:(fst $ unLoc $2)),snd $ unLoc $2) }
+
+Note that 'ExpCmdG' is a singleton type, the value is completely
+determined by the type:
+
+ when (b~HsExpr), tag = ExpG
+ when (b~HsCmd), tag = CmdG
+
+This is a clear indication that we can use a class to pass this value behind
+the scenes:
+
+ class ExpCmdI b where expCmdG :: ExpCmdG b
+ instance ExpCmdI HsExpr where expCmdG = ExpG
+ instance ExpCmdI HsCmd where expCmdG = CmdG
+
+And now the 'alts' production is simplified, as we no longer need to
+thread 'tag' explicitly:
+
+ alts :: { forall b. ExpCmdI b => PV (Located ([AddAnn],[LMatch GhcPs (Located (b GhcPs))])) }
+ : alts1 { $1 >>= \ $1 ->
+ return $ sL1 $1 (fst $ unLoc $1,snd $ unLoc $1) }
+ | ';' alts { $2 >>= \ $2 ->
+ return $ sLL $1 $> ((mj AnnSemi $1:(fst $ unLoc $2)),snd $ unLoc $2) }
+
+This encoding works well enough, but introduces an extra GADT unlike the
+tagless final encoding, and there's no need for this complexity.
+
+-}
+
+{- Note [PatBuilder]
+~~~~~~~~~~~~~~~~~~~~
+Unlike HsExpr or HsCmd, the Pat type cannot accommodate all intermediate forms,
+so we introduce the notion of a PatBuilder.
+
+Consider a pattern like this:
+
+ Con a b c
+
+We parse arguments to "Con" one at a time in the fexp aexp parser production,
+building the result with mkHsAppPV, so the intermediate forms are:
+
+ 1. Con
+ 2. Con a
+ 3. Con a b
+ 4. Con a b c
+
+In 'HsExpr', we have 'HsApp', so the intermediate forms are represented like
+this (pseudocode):
+
+ 1. "Con"
+ 2. HsApp "Con" "a"
+ 3. HsApp (HsApp "Con" "a") "b"
+ 3. HsApp (HsApp (HsApp "Con" "a") "b") "c"
+
+Similarly, in 'HsCmd' we have 'HsCmdApp'. In 'Pat', however, what we have
+instead is 'ConPatIn', which is very awkward to modify and thus unsuitable for
+the intermediate forms.
+
+We also need an intermediate representation to postpone disambiguation between
+FunBind and PatBind. Consider:
+
+ a `Con` b = ...
+ a `fun` b = ...
+
+How do we know that (a `Con` b) is a PatBind but (a `fun` b) is a FunBind? We
+learn this by inspecting an intermediate representation in 'isFunLhs' and
+seeing that 'Con' is a data constructor but 'f' is not. We need an intermediate
+representation capable of representing both a FunBind and a PatBind, so Pat is
+insufficient.
+
+PatBuilder is an extension of Pat that is capable of representing intermediate
+parsing results for patterns and function bindings:
+
+ data PatBuilder p
+ = PatBuilderPat (Pat p)
+ | PatBuilderApp (Located (PatBuilder p)) (Located (PatBuilder p))
+ | PatBuilderOpApp (Located (PatBuilder p)) (Located RdrName) (Located (PatBuilder p))
+ ...
+
+It can represent any pattern via 'PatBuilderPat', but it also has a variety of
+other constructors which were added by following a simple principle: we never
+pattern match on the pattern stored inside 'PatBuilderPat'.
+-}
+
+---------------------------------------------------------------------------
+-- Miscellaneous utilities
+
+-- | Check if a fixity is valid. We support bypassing the usual bound checks
+-- for some special operators.
+checkPrecP
+ :: Located (SourceText,Int) -- ^ precedence
+ -> Located (OrdList (Located RdrName)) -- ^ operators
+ -> P ()
+checkPrecP (L l (_,i)) (L _ ol)
+ | 0 <= i, i <= maxPrecedence = pure ()
+ | all specialOp ol = pure ()
+ | otherwise = addFatalError l (text ("Precedence out of range: " ++ show i))
+ where
+ specialOp op = unLoc op `elem` [ eqTyCon_RDR
+ , getRdrName funTyCon ]
+
+mkRecConstrOrUpdate
+ :: LHsExpr GhcPs
+ -> SrcSpan
+ -> ([LHsRecField GhcPs (LHsExpr GhcPs)], Maybe SrcSpan)
+ -> PV (HsExpr GhcPs)
+
+mkRecConstrOrUpdate (L l (HsVar _ (L _ c))) _ (fs,dd)
+ | isRdrDataCon c
+ = return (mkRdrRecordCon (L l c) (mk_rec_fields fs dd))
+mkRecConstrOrUpdate exp _ (fs,dd)
+ | Just dd_loc <- dd = addFatalError dd_loc (text "You cannot use `..' in a record update")
+ | otherwise = return (mkRdrRecordUpd exp (map (fmap mk_rec_upd_field) fs))
+
+mkRdrRecordUpd :: LHsExpr GhcPs -> [LHsRecUpdField GhcPs] -> HsExpr GhcPs
+mkRdrRecordUpd exp flds
+ = RecordUpd { rupd_ext = noExtField
+ , rupd_expr = exp
+ , rupd_flds = flds }
+
+mkRdrRecordCon :: Located RdrName -> HsRecordBinds GhcPs -> HsExpr GhcPs
+mkRdrRecordCon con flds
+ = RecordCon { rcon_ext = noExtField, rcon_con_name = con, rcon_flds = flds }
+
+mk_rec_fields :: [LHsRecField id arg] -> Maybe SrcSpan -> HsRecFields id arg
+mk_rec_fields fs Nothing = HsRecFields { rec_flds = fs, rec_dotdot = Nothing }
+mk_rec_fields fs (Just s) = HsRecFields { rec_flds = fs
+ , rec_dotdot = Just (L s (length fs)) }
+
+mk_rec_upd_field :: HsRecField GhcPs (LHsExpr GhcPs) -> HsRecUpdField GhcPs
+mk_rec_upd_field (HsRecField (L loc (FieldOcc _ rdr)) arg pun)
+ = HsRecField (L loc (Unambiguous noExtField rdr)) arg pun
+
+mkInlinePragma :: SourceText -> (InlineSpec, RuleMatchInfo) -> Maybe Activation
+ -> InlinePragma
+-- The (Maybe Activation) is because the user can omit
+-- the activation spec (and usually does)
+mkInlinePragma src (inl, match_info) mb_act
+ = InlinePragma { inl_src = src -- Note [Pragma source text] in GHC.Types.Basic
+ , inl_inline = inl
+ , inl_sat = Nothing
+ , inl_act = act
+ , inl_rule = match_info }
+ where
+ act = case mb_act of
+ Just act -> act
+ Nothing -> -- No phase specified
+ case inl of
+ NoInline -> NeverActive
+ _other -> AlwaysActive
+
+-----------------------------------------------------------------------------
+-- utilities for foreign declarations
+
+-- construct a foreign import declaration
+--
+mkImport :: Located CCallConv
+ -> Located Safety
+ -> (Located StringLiteral, Located RdrName, LHsSigType GhcPs)
+ -> P (HsDecl GhcPs)
+mkImport cconv safety (L loc (StringLiteral esrc entity), v, ty) =
+ case unLoc cconv of
+ CCallConv -> mkCImport
+ CApiConv -> mkCImport
+ StdCallConv -> mkCImport
+ PrimCallConv -> mkOtherImport
+ JavaScriptCallConv -> mkOtherImport
+ where
+ -- Parse a C-like entity string of the following form:
+ -- "[static] [chname] [&] [cid]" | "dynamic" | "wrapper"
+ -- If 'cid' is missing, the function name 'v' is used instead as symbol
+ -- name (cf section 8.5.1 in Haskell 2010 report).
+ mkCImport = do
+ let e = unpackFS entity
+ case parseCImport cconv safety (mkExtName (unLoc v)) e (L loc esrc) of
+ Nothing -> addFatalError loc (text "Malformed entity string")
+ Just importSpec -> returnSpec importSpec
+
+ -- currently, all the other import conventions only support a symbol name in
+ -- the entity string. If it is missing, we use the function name instead.
+ mkOtherImport = returnSpec importSpec
+ where
+ entity' = if nullFS entity
+ then mkExtName (unLoc v)
+ else entity
+ funcTarget = CFunction (StaticTarget esrc entity' Nothing True)
+ importSpec = CImport cconv safety Nothing funcTarget (L loc esrc)
+
+ returnSpec spec = return $ ForD noExtField $ ForeignImport
+ { fd_i_ext = noExtField
+ , fd_name = v
+ , fd_sig_ty = ty
+ , fd_fi = spec
+ }
+
+
+
+-- the string "foo" is ambiguous: either a header or a C identifier. The
+-- C identifier case comes first in the alternatives below, so we pick
+-- that one.
+parseCImport :: Located CCallConv -> Located Safety -> FastString -> String
+ -> Located SourceText
+ -> Maybe ForeignImport
+parseCImport cconv safety nm str sourceText =
+ listToMaybe $ map fst $ filter (null.snd) $
+ readP_to_S parse str
+ where
+ parse = do
+ skipSpaces
+ r <- choice [
+ string "dynamic" >> return (mk Nothing (CFunction DynamicTarget)),
+ string "wrapper" >> return (mk Nothing CWrapper),
+ do optional (token "static" >> skipSpaces)
+ ((mk Nothing <$> cimp nm) +++
+ (do h <- munch1 hdr_char
+ skipSpaces
+ mk (Just (Header (SourceText h) (mkFastString h)))
+ <$> cimp nm))
+ ]
+ skipSpaces
+ return r
+
+ token str = do _ <- string str
+ toks <- look
+ case toks of
+ c : _
+ | id_char c -> pfail
+ _ -> return ()
+
+ mk h n = CImport cconv safety h n sourceText
+
+ hdr_char c = not (isSpace c)
+ -- header files are filenames, which can contain
+ -- pretty much any char (depending on the platform),
+ -- so just accept any non-space character
+ id_first_char c = isAlpha c || c == '_'
+ id_char c = isAlphaNum c || c == '_'
+
+ cimp nm = (ReadP.char '&' >> skipSpaces >> CLabel <$> cid)
+ +++ (do isFun <- case unLoc cconv of
+ CApiConv ->
+ option True
+ (do token "value"
+ skipSpaces
+ return False)
+ _ -> return True
+ cid' <- cid
+ return (CFunction (StaticTarget NoSourceText cid'
+ Nothing isFun)))
+ where
+ cid = return nm +++
+ (do c <- satisfy id_first_char
+ cs <- many (satisfy id_char)
+ return (mkFastString (c:cs)))
+
+
+-- construct a foreign export declaration
+--
+mkExport :: Located CCallConv
+ -> (Located StringLiteral, Located RdrName, LHsSigType GhcPs)
+ -> P (HsDecl GhcPs)
+mkExport (L lc cconv) (L le (StringLiteral esrc entity), v, ty)
+ = return $ ForD noExtField $
+ ForeignExport { fd_e_ext = noExtField, fd_name = v, fd_sig_ty = ty
+ , fd_fe = CExport (L lc (CExportStatic esrc entity' cconv))
+ (L le esrc) }
+ where
+ entity' | nullFS entity = mkExtName (unLoc v)
+ | otherwise = entity
+
+-- Supplying the ext_name in a foreign decl is optional; if it
+-- isn't there, the Haskell name is assumed. Note that no transformation
+-- of the Haskell name is then performed, so if you foreign export (++),
+-- it's external name will be "++". Too bad; it's important because we don't
+-- want z-encoding (e.g. names with z's in them shouldn't be doubled)
+--
+mkExtName :: RdrName -> CLabelString
+mkExtName rdrNm = mkFastString (occNameString (rdrNameOcc rdrNm))
+
+--------------------------------------------------------------------------------
+-- Help with module system imports/exports
+
+data ImpExpSubSpec = ImpExpAbs
+ | ImpExpAll
+ | ImpExpList [Located ImpExpQcSpec]
+ | ImpExpAllWith [Located ImpExpQcSpec]
+
+data ImpExpQcSpec = ImpExpQcName (Located RdrName)
+ | ImpExpQcType (Located RdrName)
+ | ImpExpQcWildcard
+
+mkModuleImpExp :: Located ImpExpQcSpec -> ImpExpSubSpec -> P (IE GhcPs)
+mkModuleImpExp (L l specname) subs =
+ case subs of
+ ImpExpAbs
+ | isVarNameSpace (rdrNameSpace name)
+ -> return $ IEVar noExtField (L l (ieNameFromSpec specname))
+ | otherwise -> IEThingAbs noExtField . L l <$> nameT
+ ImpExpAll -> IEThingAll noExtField . L l <$> nameT
+ ImpExpList xs ->
+ (\newName -> IEThingWith noExtField (L l newName)
+ NoIEWildcard (wrapped xs) []) <$> nameT
+ ImpExpAllWith xs ->
+ do allowed <- getBit PatternSynonymsBit
+ if allowed
+ then
+ let withs = map unLoc xs
+ pos = maybe NoIEWildcard IEWildcard
+ (findIndex isImpExpQcWildcard withs)
+ ies = wrapped $ filter (not . isImpExpQcWildcard . unLoc) xs
+ in (\newName
+ -> IEThingWith noExtField (L l newName) pos ies [])
+ <$> nameT
+ else addFatalError l
+ (text "Illegal export form (use PatternSynonyms to enable)")
+ where
+ name = ieNameVal specname
+ nameT =
+ if isVarNameSpace (rdrNameSpace name)
+ then addFatalError l
+ (text "Expecting a type constructor but found a variable,"
+ <+> quotes (ppr name) <> text "."
+ $$ if isSymOcc $ rdrNameOcc name
+ then text "If" <+> quotes (ppr name)
+ <+> text "is a type constructor"
+ <+> text "then enable ExplicitNamespaces and use the 'type' keyword."
+ else empty)
+ else return $ ieNameFromSpec specname
+
+ ieNameVal (ImpExpQcName ln) = unLoc ln
+ ieNameVal (ImpExpQcType ln) = unLoc ln
+ ieNameVal (ImpExpQcWildcard) = panic "ieNameVal got wildcard"
+
+ ieNameFromSpec (ImpExpQcName ln) = IEName ln
+ ieNameFromSpec (ImpExpQcType ln) = IEType ln
+ ieNameFromSpec (ImpExpQcWildcard) = panic "ieName got wildcard"
+
+ wrapped = map (mapLoc ieNameFromSpec)
+
+mkTypeImpExp :: Located RdrName -- TcCls or Var name space
+ -> P (Located RdrName)
+mkTypeImpExp name =
+ do allowed <- getBit ExplicitNamespacesBit
+ unless allowed $ addError (getLoc name) $
+ text "Illegal keyword 'type' (use ExplicitNamespaces to enable)"
+ return (fmap (`setRdrNameSpace` tcClsName) name)
+
+checkImportSpec :: Located [LIE GhcPs] -> P (Located [LIE GhcPs])
+checkImportSpec ie@(L _ specs) =
+ case [l | (L l (IEThingWith _ _ (IEWildcard _) _ _)) <- specs] of
+ [] -> return ie
+ (l:_) -> importSpecError l
+ where
+ importSpecError l =
+ addFatalError l
+ (text "Illegal import form, this syntax can only be used to bundle"
+ $+$ text "pattern synonyms with types in module exports.")
+
+-- In the correct order
+mkImpExpSubSpec :: [Located ImpExpQcSpec] -> P ([AddAnn], ImpExpSubSpec)
+mkImpExpSubSpec [] = return ([], ImpExpList [])
+mkImpExpSubSpec [L _ ImpExpQcWildcard] =
+ return ([], ImpExpAll)
+mkImpExpSubSpec xs =
+ if (any (isImpExpQcWildcard . unLoc) xs)
+ then return $ ([], ImpExpAllWith xs)
+ else return $ ([], ImpExpList xs)
+
+isImpExpQcWildcard :: ImpExpQcSpec -> Bool
+isImpExpQcWildcard ImpExpQcWildcard = True
+isImpExpQcWildcard _ = False
+
+-----------------------------------------------------------------------------
+-- Warnings and failures
+
+warnPrepositiveQualifiedModule :: SrcSpan -> P ()
+warnPrepositiveQualifiedModule span =
+ addWarning Opt_WarnPrepositiveQualifiedModule span msg
+ where
+ msg = text "Found" <+> quotes (text "qualified")
+ <+> text "in prepositive position"
+ $$ text "Suggested fix: place " <+> quotes (text "qualified")
+ <+> text "after the module name instead."
+
+failOpNotEnabledImportQualifiedPost :: SrcSpan -> P ()
+failOpNotEnabledImportQualifiedPost loc = addError loc msg
+ where
+ msg = text "Found" <+> quotes (text "qualified")
+ <+> text "in postpositive position. "
+ $$ text "To allow this, enable language extension 'ImportQualifiedPost'"
+
+failOpImportQualifiedTwice :: SrcSpan -> P ()
+failOpImportQualifiedTwice loc = addError loc msg
+ where
+ msg = text "Multiple occurrences of 'qualified'"
+
+warnStarIsType :: SrcSpan -> P ()
+warnStarIsType span = addWarning Opt_WarnStarIsType span msg
+ where
+ msg = text "Using" <+> quotes (text "*")
+ <+> text "(or its Unicode variant) to mean"
+ <+> quotes (text "Data.Kind.Type")
+ $$ text "relies on the StarIsType extension, which will become"
+ $$ text "deprecated in the future."
+ $$ text "Suggested fix: use" <+> quotes (text "Type")
+ <+> text "from" <+> quotes (text "Data.Kind") <+> text "instead."
+
+warnStarBndr :: SrcSpan -> P ()
+warnStarBndr span = addWarning Opt_WarnStarBinder span msg
+ where
+ msg = text "Found binding occurrence of" <+> quotes (text "*")
+ <+> text "yet StarIsType is enabled."
+ $$ text "NB. To use (or export) this operator in"
+ <+> text "modules with StarIsType,"
+ $$ text " including the definition module, you must qualify it."
+
+failOpFewArgs :: Located RdrName -> P a
+failOpFewArgs (L loc op) =
+ do { star_is_type <- getBit StarIsTypeBit
+ ; let msg = too_few $$ starInfo star_is_type op
+ ; addFatalError loc msg }
+ where
+ too_few = text "Operator applied to too few arguments:" <+> ppr op
+
+failOpDocPrev :: SrcSpan -> P a
+failOpDocPrev loc = addFatalError loc msg
+ where
+ msg = text "Unexpected documentation comment."
+
+-----------------------------------------------------------------------------
+-- Misc utils
+
+data PV_Context =
+ PV_Context
+ { pv_options :: ParserFlags
+ , pv_hint :: SDoc -- See Note [Parser-Validator Hint]
+ }
+
+data PV_Accum =
+ PV_Accum
+ { pv_messages :: DynFlags -> Messages
+ , pv_annotations :: [(ApiAnnKey,[RealSrcSpan])]
+ , pv_comment_q :: [RealLocated AnnotationComment]
+ , pv_annotations_comments :: [(RealSrcSpan,[RealLocated AnnotationComment])]
+ }
+
+data PV_Result a = PV_Ok PV_Accum a | PV_Failed PV_Accum
+
+-- See Note [Parser-Validator]
+newtype PV a = PV { unPV :: PV_Context -> PV_Accum -> PV_Result a }
+
+instance Functor PV where
+ fmap = liftM
+
+instance Applicative PV where
+ pure a = a `seq` PV (\_ acc -> PV_Ok acc a)
+ (<*>) = ap
+
+instance Monad PV where
+ m >>= f = PV $ \ctx acc ->
+ case unPV m ctx acc of
+ PV_Ok acc' a -> unPV (f a) ctx acc'
+ PV_Failed acc' -> PV_Failed acc'
+
+runPV :: PV a -> P a
+runPV = runPV_msg empty
+
+runPV_msg :: SDoc -> PV a -> P a
+runPV_msg msg m =
+ P $ \s ->
+ let
+ pv_ctx = PV_Context
+ { pv_options = options s
+ , pv_hint = msg }
+ pv_acc = PV_Accum
+ { pv_messages = messages s
+ , pv_annotations = annotations s
+ , pv_comment_q = comment_q s
+ , pv_annotations_comments = annotations_comments s }
+ mkPState acc' =
+ s { messages = pv_messages acc'
+ , annotations = pv_annotations acc'
+ , comment_q = pv_comment_q acc'
+ , annotations_comments = pv_annotations_comments acc' }
+ in
+ case unPV m pv_ctx pv_acc of
+ PV_Ok acc' a -> POk (mkPState acc') a
+ PV_Failed acc' -> PFailed (mkPState acc')
+
+localPV_msg :: (SDoc -> SDoc) -> PV a -> PV a
+localPV_msg f m =
+ let modifyHint ctx = ctx{pv_hint = f (pv_hint ctx)} in
+ PV (\ctx acc -> unPV m (modifyHint ctx) acc)
+
+instance MonadP PV where
+ addError srcspan msg =
+ PV $ \ctx acc@PV_Accum{pv_messages=m} ->
+ let msg' = msg $$ pv_hint ctx in
+ PV_Ok acc{pv_messages=appendError srcspan msg' m} ()
+ addWarning option srcspan warning =
+ PV $ \PV_Context{pv_options=o} acc@PV_Accum{pv_messages=m} ->
+ PV_Ok acc{pv_messages=appendWarning o option srcspan warning m} ()
+ addFatalError srcspan msg =
+ addError srcspan msg >> PV (const PV_Failed)
+ getBit ext =
+ PV $ \ctx acc ->
+ let b = ext `xtest` pExtsBitmap (pv_options ctx) in
+ PV_Ok acc $! b
+ addAnnotation (RealSrcSpan l _) a (RealSrcSpan v _) =
+ PV $ \_ acc ->
+ let
+ (comment_q', new_ann_comments) = allocateComments l (pv_comment_q acc)
+ annotations_comments' = new_ann_comments ++ pv_annotations_comments acc
+ annotations' = ((l,a), [v]) : pv_annotations acc
+ acc' = acc
+ { pv_annotations = annotations'
+ , pv_comment_q = comment_q'
+ , pv_annotations_comments = annotations_comments' }
+ in
+ PV_Ok acc' ()
+ addAnnotation _ _ _ = return ()
+
+{- Note [Parser-Validator]
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When resolving ambiguities, we need to postpone failure to make a choice later.
+For example, if we have ambiguity between some A and B, our parser could be
+
+ abParser :: P (Maybe A, Maybe B)
+
+This way we can represent four possible outcomes of parsing:
+
+ (Just a, Nothing) -- definitely A
+ (Nothing, Just b) -- definitely B
+ (Just a, Just b) -- either A or B
+ (Nothing, Nothing) -- neither A nor B
+
+However, if we want to report informative parse errors, accumulate warnings,
+and add API annotations, we are better off using 'P' instead of 'Maybe':
+
+ abParser :: P (P A, P B)
+
+So we have an outer layer of P that consumes the input and builds the inner
+layer, which validates the input.
+
+For clarity, we introduce the notion of a parser-validator: a parser that does
+not consume any input, but may fail or use other effects. Thus we have:
+
+ abParser :: P (PV A, PV B)
+
+-}
+
+{- Note [Parser-Validator Hint]
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+A PV computation is parametrized by a hint for error messages, which can be set
+depending on validation context. We use this in checkPattern to fix #984.
+
+Consider this example, where the user has forgotten a 'do':
+
+ f _ = do
+ x <- computation
+ case () of
+ _ ->
+ result <- computation
+ case () of () -> undefined
+
+GHC parses it as follows:
+
+ f _ = do
+ x <- computation
+ (case () of
+ _ ->
+ result) <- computation
+ case () of () -> undefined
+
+Note that this fragment is parsed as a pattern:
+
+ case () of
+ _ ->
+ result
+
+We attempt to detect such cases and add a hint to the error messages:
+
+ T984.hs:6:9:
+ Parse error in pattern: case () of { _ -> result }
+ Possibly caused by a missing 'do'?
+
+The "Possibly caused by a missing 'do'?" suggestion is the hint that is passed
+as the 'pv_hint' field 'PV_Context'. When validating in a context other than
+'bindpat' (a pattern to the left of <-), we set the hint to 'empty' and it has
+no effect on the error messages.
+
+-}
+
+-- | Hint about bang patterns, assuming @BangPatterns@ is off.
+hintBangPat :: SrcSpan -> Pat GhcPs -> PV ()
+hintBangPat span e = do
+ bang_on <- getBit BangPatBit
+ unless bang_on $
+ addError span
+ (text "Illegal bang-pattern (use BangPatterns):" $$ ppr e)
+
+data SumOrTuple b
+ = Sum ConTag Arity (Located b)
+ | Tuple [Located (Maybe (Located b))]
+
+pprSumOrTuple :: Outputable b => Boxity -> SumOrTuple b -> SDoc
+pprSumOrTuple boxity = \case
+ Sum alt arity e ->
+ parOpen <+> ppr_bars (alt - 1) <+> ppr e <+> ppr_bars (arity - alt)
+ <+> parClose
+ Tuple xs ->
+ parOpen <> (fcat . punctuate comma $ map (maybe empty ppr . unLoc) xs)
+ <> parClose
+ where
+ ppr_bars n = hsep (replicate n (Outputable.char '|'))
+ (parOpen, parClose) =
+ case boxity of
+ Boxed -> (text "(", text ")")
+ Unboxed -> (text "(#", text "#)")
+
+mkSumOrTupleExpr :: SrcSpan -> Boxity -> SumOrTuple (HsExpr GhcPs) -> PV (LHsExpr GhcPs)
+
+-- Tuple
+mkSumOrTupleExpr l boxity (Tuple es) =
+ return $ L l (ExplicitTuple noExtField (map toTupArg es) boxity)
+ where
+ toTupArg :: Located (Maybe (LHsExpr GhcPs)) -> LHsTupArg GhcPs
+ toTupArg = mapLoc (maybe missingTupArg (Present noExtField))
+
+-- Sum
+mkSumOrTupleExpr l Unboxed (Sum alt arity e) =
+ return $ L l (ExplicitSum noExtField alt arity e)
+mkSumOrTupleExpr l Boxed a@Sum{} =
+ addFatalError l (hang (text "Boxed sums not supported:") 2
+ (pprSumOrTuple Boxed a))
+
+mkSumOrTuplePat :: SrcSpan -> Boxity -> SumOrTuple (PatBuilder GhcPs) -> PV (Located (PatBuilder GhcPs))
+
+-- Tuple
+mkSumOrTuplePat l boxity (Tuple ps) = do
+ ps' <- traverse toTupPat ps
+ return $ L l (PatBuilderPat (TuplePat noExtField ps' boxity))
+ where
+ toTupPat :: Located (Maybe (Located (PatBuilder GhcPs))) -> PV (LPat GhcPs)
+ toTupPat (L l p) = case p of
+ Nothing -> addFatalError l (text "Tuple section in pattern context")
+ Just p' -> checkLPat p'
+
+-- Sum
+mkSumOrTuplePat l Unboxed (Sum alt arity p) = do
+ p' <- checkLPat p
+ return $ L l (PatBuilderPat (SumPat noExtField p' alt arity))
+mkSumOrTuplePat l Boxed a@Sum{} =
+ addFatalError l (hang (text "Boxed sums not supported:") 2
+ (pprSumOrTuple Boxed a))
+
+mkLHsOpTy :: LHsType GhcPs -> Located RdrName -> LHsType GhcPs -> LHsType GhcPs
+mkLHsOpTy x op y =
+ let loc = getLoc x `combineSrcSpans` getLoc op `combineSrcSpans` getLoc y
+ in L loc (mkHsOpTy x op y)
+
+mkLHsDocTy :: LHsType GhcPs -> LHsDocString -> LHsType GhcPs
+mkLHsDocTy t doc =
+ let loc = getLoc t `combineSrcSpans` getLoc doc
+ in L loc (HsDocTy noExtField t doc)
+
+mkLHsDocTyMaybe :: LHsType GhcPs -> Maybe LHsDocString -> LHsType GhcPs
+mkLHsDocTyMaybe t = maybe t (mkLHsDocTy t)
+
+-----------------------------------------------------------------------------
+-- Token symbols
+
+starSym :: Bool -> String
+starSym True = "★"
+starSym False = "*"
+
+forallSym :: Bool -> String
+forallSym True = "∀"
+forallSym False = "forall"
diff --git a/compiler/GHC/Parser/PostProcess/Haddock.hs b/compiler/GHC/Parser/PostProcess/Haddock.hs
new file mode 100644
index 0000000000..a3d5e101d7
--- /dev/null
+++ b/compiler/GHC/Parser/PostProcess/Haddock.hs
@@ -0,0 +1,35 @@
+{-# OPTIONS_GHC -Wno-incomplete-record-updates #-}
+
+module GHC.Parser.PostProcess.Haddock where
+
+import GhcPrelude
+
+import GHC.Hs
+import GHC.Types.SrcLoc
+
+import Control.Monad
+
+-- -----------------------------------------------------------------------------
+-- Adding documentation to record fields (used in parsing).
+
+addFieldDoc :: LConDeclField a -> Maybe LHsDocString -> LConDeclField a
+addFieldDoc (L l fld) doc
+ = L l (fld { cd_fld_doc = cd_fld_doc fld `mplus` doc })
+
+addFieldDocs :: [LConDeclField a] -> Maybe LHsDocString -> [LConDeclField a]
+addFieldDocs [] _ = []
+addFieldDocs (x:xs) doc = addFieldDoc x doc : xs
+
+
+addConDoc :: LConDecl a -> Maybe LHsDocString -> LConDecl a
+addConDoc decl Nothing = decl
+addConDoc (L p c) doc = L p ( c { con_doc = con_doc c `mplus` doc } )
+
+addConDocs :: [LConDecl a] -> Maybe LHsDocString -> [LConDecl a]
+addConDocs [] _ = []
+addConDocs [x] doc = [addConDoc x doc]
+addConDocs (x:xs) doc = x : addConDocs xs doc
+
+addConDocFirst :: [LConDecl a] -> Maybe LHsDocString -> [LConDecl a]
+addConDocFirst [] _ = []
+addConDocFirst (x:xs) doc = addConDoc x doc : xs