diff options
author | Sylvain Henry <sylvain@haskus.fr> | 2020-04-05 17:39:13 +0200 |
---|---|---|
committer | Sylvain Henry <sylvain@haskus.fr> | 2020-04-18 20:04:46 +0200 |
commit | 15312bbb53f247c9ed2c5cf75100a9f44c1c7227 (patch) | |
tree | 8306dcc04a5b7c82464f903044dfdd589e7fdcd7 /compiler/GHC/Parser | |
parent | 3ca52151881451ce5b3a7740d003e811b586140d (diff) | |
download | haskell-15312bbb53f247c9ed2c5cf75100a9f44c1c7227.tar.gz |
Modules (#13009)
* SysTools
* Parser
* GHC.Builtin
* GHC.Iface.Recomp
* Settings
Update Haddock submodule
Metric Decrease:
Naperian
parsing001
Diffstat (limited to 'compiler/GHC/Parser')
-rw-r--r-- | compiler/GHC/Parser/Annotation.hs | 378 | ||||
-rw-r--r-- | compiler/GHC/Parser/CharClass.hs | 215 | ||||
-rw-r--r-- | compiler/GHC/Parser/Header.hs | 361 | ||||
-rw-r--r-- | compiler/GHC/Parser/Lexer.x | 3294 | ||||
-rw-r--r-- | compiler/GHC/Parser/PostProcess.hs | 3090 | ||||
-rw-r--r-- | compiler/GHC/Parser/PostProcess/Haddock.hs | 35 |
6 files changed, 7373 insertions, 0 deletions
diff --git a/compiler/GHC/Parser/Annotation.hs b/compiler/GHC/Parser/Annotation.hs new file mode 100644 index 0000000000..dbd1f79e23 --- /dev/null +++ b/compiler/GHC/Parser/Annotation.hs @@ -0,0 +1,378 @@ +{-# LANGUAGE DeriveDataTypeable #-} + +module GHC.Parser.Annotation ( + getAnnotation, getAndRemoveAnnotation, + getAnnotationComments,getAndRemoveAnnotationComments, + ApiAnns(..), + ApiAnnKey, + AnnKeywordId(..), + AnnotationComment(..), + IsUnicodeSyntax(..), + unicodeAnn, + HasE(..), + LRdrName -- Exists for haddocks only + ) where + +import GhcPrelude + +import GHC.Types.Name.Reader +import Outputable +import GHC.Types.SrcLoc +import qualified Data.Map as Map +import Data.Data + + +{- +Note [Api annotations] +~~~~~~~~~~~~~~~~~~~~~~ +Given a parse tree of a Haskell module, how can we reconstruct +the original Haskell source code, retaining all whitespace and +source code comments? We need to track the locations of all +elements from the original source: this includes keywords such as +'let' / 'in' / 'do' etc as well as punctuation such as commas and +braces, and also comments. We collectively refer to this +metadata as the "API annotations". + +Rather than annotate the resulting parse tree with these locations +directly (this would be a major change to some fairly core data +structures in GHC), we instead capture locations for these elements in a +structure separate from the parse tree, and returned in the +pm_annotations field of the ParsedModule type. + +The full ApiAnns type is + +> data ApiAnns = +> ApiAnns +> { apiAnnItems :: Map.Map ApiAnnKey [RealSrcSpan], +> apiAnnEofPos :: Maybe RealSrcSpan, +> apiAnnComments :: Map.Map RealSrcSpan [RealLocated AnnotationComment], +> apiAnnRogueComments :: [RealLocated AnnotationComment] +> } + +NON-COMMENT ELEMENTS + +Intuitively, every AST element directly contains a bag of keywords +(keywords can show up more than once in a node: a semicolon i.e. newline +can show up multiple times before the next AST element), each of which +needs to be associated with its location in the original source code. + +Consequently, the structure that records non-comment elements is logically +a two level map, from the RealSrcSpan of the AST element containing it, to +a map from keywords ('AnnKeyWord') to all locations of the keyword directly +in the AST element: + +> type ApiAnnKey = (RealSrcSpan,AnnKeywordId) +> +> Map.Map ApiAnnKey [RealSrcSpan] + +So + +> let x = 1 in 2 *x + +would result in the AST element + + L span (HsLet (binds for x = 1) (2 * x)) + +and the annotations + + (span,AnnLet) having the location of the 'let' keyword + (span,AnnEqual) having the location of the '=' sign + (span,AnnIn) having the location of the 'in' keyword + +For any given element in the AST, there is only a set number of +keywords that are applicable for it (e.g., you'll never see an +'import' keyword associated with a let-binding.) The set of allowed +keywords is documented in a comment associated with the constructor +of a given AST element, although the ground truth is in GHC.Parser +and GHC.Parser.PostProcess (which actually add the annotations; see #13012). + +COMMENT ELEMENTS + +Every comment is associated with a *located* AnnotationComment. +We associate comments with the lowest (most specific) AST element +enclosing them: + +> Map.Map RealSrcSpan [RealLocated AnnotationComment] + +PARSER STATE + +There are three fields in PState (the parser state) which play a role +with annotations. + +> annotations :: [(ApiAnnKey,[RealSrcSpan])], +> comment_q :: [RealLocated AnnotationComment], +> annotations_comments :: [(RealSrcSpan,[RealLocated AnnotationComment])] + +The 'annotations' and 'annotations_comments' fields are simple: they simply +accumulate annotations that will end up in 'ApiAnns' at the end +(after they are passed to Map.fromList). + +The 'comment_q' field captures comments as they are seen in the token stream, +so that when they are ready to be allocated via the parser they are +available (at the time we lex a comment, we don't know what the enclosing +AST node of it is, so we can't associate it with a RealSrcSpan in +annotations_comments). + +PARSER EMISSION OF ANNOTATIONS + +The parser interacts with the lexer using the function + +> addAnnotation :: RealSrcSpan -> AnnKeywordId -> RealSrcSpan -> P () + +which takes the AST element RealSrcSpan, the annotation keyword and the +target RealSrcSpan. + +This adds the annotation to the `annotations` field of `PState` and +transfers any comments in `comment_q` WHICH ARE ENCLOSED by +the RealSrcSpan of this element to the `annotations_comments` +field. (Comments which are outside of this annotation are deferred +until later. 'allocateComments' in 'Lexer' is responsible for +making sure we only attach comments that actually fit in the 'SrcSpan'.) + +The wiki page describing this feature is +https://gitlab.haskell.org/ghc/ghc/wikis/api-annotations + +-} +-- --------------------------------------------------------------------- + +-- If you update this, update the Note [Api annotations] above +data ApiAnns = + ApiAnns + { apiAnnItems :: Map.Map ApiAnnKey [RealSrcSpan], + apiAnnEofPos :: Maybe RealSrcSpan, + apiAnnComments :: Map.Map RealSrcSpan [RealLocated AnnotationComment], + apiAnnRogueComments :: [RealLocated AnnotationComment] + } + +-- If you update this, update the Note [Api annotations] above +type ApiAnnKey = (RealSrcSpan,AnnKeywordId) + + +-- | Retrieve a list of annotation 'SrcSpan's based on the 'SrcSpan' +-- of the annotated AST element, and the known type of the annotation. +getAnnotation :: ApiAnns -> RealSrcSpan -> AnnKeywordId -> [RealSrcSpan] +getAnnotation anns span ann = + case Map.lookup ann_key ann_items of + Nothing -> [] + Just ss -> ss + where ann_items = apiAnnItems anns + ann_key = (span,ann) + +-- | Retrieve a list of annotation 'SrcSpan's based on the 'SrcSpan' +-- of the annotated AST element, and the known type of the annotation. +-- The list is removed from the annotations. +getAndRemoveAnnotation :: ApiAnns -> RealSrcSpan -> AnnKeywordId + -> ([RealSrcSpan],ApiAnns) +getAndRemoveAnnotation anns span ann = + case Map.lookup ann_key ann_items of + Nothing -> ([],anns) + Just ss -> (ss,anns{ apiAnnItems = Map.delete ann_key ann_items }) + where ann_items = apiAnnItems anns + ann_key = (span,ann) + +-- |Retrieve the comments allocated to the current 'SrcSpan' +-- +-- Note: A given 'SrcSpan' may appear in multiple AST elements, +-- beware of duplicates +getAnnotationComments :: ApiAnns -> RealSrcSpan -> [RealLocated AnnotationComment] +getAnnotationComments anns span = + case Map.lookup span (apiAnnComments anns) of + Just cs -> cs + Nothing -> [] + +-- |Retrieve the comments allocated to the current 'SrcSpan', and +-- remove them from the annotations +getAndRemoveAnnotationComments :: ApiAnns -> RealSrcSpan + -> ([RealLocated AnnotationComment],ApiAnns) +getAndRemoveAnnotationComments anns span = + case Map.lookup span ann_comments of + Just cs -> (cs, anns{ apiAnnComments = Map.delete span ann_comments }) + Nothing -> ([], anns) + where ann_comments = apiAnnComments anns + +-- -------------------------------------------------------------------- + +-- | API Annotations exist so that tools can perform source to source +-- conversions of Haskell code. They are used to keep track of the +-- various syntactic keywords that are not captured in the existing +-- AST. +-- +-- The annotations, together with original source comments are made +-- available in the @'pm_annotations'@ field of @'GHC.ParsedModule'@. +-- Comments are only retained if @'Opt_KeepRawTokenStream'@ is set in +-- @'DynFlags.DynFlags'@ before parsing. +-- +-- The wiki page describing this feature is +-- https://gitlab.haskell.org/ghc/ghc/wikis/api-annotations +-- +-- Note: in general the names of these are taken from the +-- corresponding token, unless otherwise noted +-- See note [Api annotations] above for details of the usage +data AnnKeywordId + = AnnAnyclass + | AnnAs + | AnnAt + | AnnBang -- ^ '!' + | AnnBackquote -- ^ '`' + | AnnBy + | AnnCase -- ^ case or lambda case + | AnnClass + | AnnClose -- ^ '\#)' or '\#-}' etc + | AnnCloseB -- ^ '|)' + | AnnCloseBU -- ^ '|)', unicode variant + | AnnCloseC -- ^ '}' + | AnnCloseQ -- ^ '|]' + | AnnCloseQU -- ^ '|]', unicode variant + | AnnCloseP -- ^ ')' + | AnnCloseS -- ^ ']' + | AnnColon + | AnnComma -- ^ as a list separator + | AnnCommaTuple -- ^ in a RdrName for a tuple + | AnnDarrow -- ^ '=>' + | AnnDarrowU -- ^ '=>', unicode variant + | AnnData + | AnnDcolon -- ^ '::' + | AnnDcolonU -- ^ '::', unicode variant + | AnnDefault + | AnnDeriving + | AnnDo + | AnnDot -- ^ '.' + | AnnDotdot -- ^ '..' + | AnnElse + | AnnEqual + | AnnExport + | AnnFamily + | AnnForall + | AnnForallU -- ^ Unicode variant + | AnnForeign + | AnnFunId -- ^ for function name in matches where there are + -- multiple equations for the function. + | AnnGroup + | AnnHeader -- ^ for CType + | AnnHiding + | AnnIf + | AnnImport + | AnnIn + | AnnInfix -- ^ 'infix' or 'infixl' or 'infixr' + | AnnInstance + | AnnLam + | AnnLarrow -- ^ '<-' + | AnnLarrowU -- ^ '<-', unicode variant + | AnnLet + | AnnMdo + | AnnMinus -- ^ '-' + | AnnModule + | AnnNewtype + | AnnName -- ^ where a name loses its location in the AST, this carries it + | AnnOf + | AnnOpen -- ^ '(\#' or '{-\# LANGUAGE' etc + | AnnOpenB -- ^ '(|' + | AnnOpenBU -- ^ '(|', unicode variant + | AnnOpenC -- ^ '{' + | AnnOpenE -- ^ '[e|' or '[e||' + | AnnOpenEQ -- ^ '[|' + | AnnOpenEQU -- ^ '[|', unicode variant + | AnnOpenP -- ^ '(' + | AnnOpenS -- ^ '[' + | AnnDollar -- ^ prefix '$' -- TemplateHaskell + | AnnDollarDollar -- ^ prefix '$$' -- TemplateHaskell + | AnnPackageName + | AnnPattern + | AnnProc + | AnnQualified + | AnnRarrow -- ^ '->' + | AnnRarrowU -- ^ '->', unicode variant + | AnnRec + | AnnRole + | AnnSafe + | AnnSemi -- ^ ';' + | AnnSimpleQuote -- ^ ''' + | AnnSignature + | AnnStatic -- ^ 'static' + | AnnStock + | AnnThen + | AnnThIdSplice -- ^ '$' + | AnnThIdTySplice -- ^ '$$' + | AnnThTyQuote -- ^ double ''' + | AnnTilde -- ^ '~' + | AnnType + | AnnUnit -- ^ '()' for types + | AnnUsing + | AnnVal -- ^ e.g. INTEGER + | AnnValStr -- ^ String value, will need quotes when output + | AnnVbar -- ^ '|' + | AnnVia -- ^ 'via' + | AnnWhere + | Annlarrowtail -- ^ '-<' + | AnnlarrowtailU -- ^ '-<', unicode variant + | Annrarrowtail -- ^ '->' + | AnnrarrowtailU -- ^ '->', unicode variant + | AnnLarrowtail -- ^ '-<<' + | AnnLarrowtailU -- ^ '-<<', unicode variant + | AnnRarrowtail -- ^ '>>-' + | AnnRarrowtailU -- ^ '>>-', unicode variant + deriving (Eq, Ord, Data, Show) + +instance Outputable AnnKeywordId where + ppr x = text (show x) + +-- --------------------------------------------------------------------- + +data AnnotationComment = + -- Documentation annotations + AnnDocCommentNext String -- ^ something beginning '-- |' + | AnnDocCommentPrev String -- ^ something beginning '-- ^' + | AnnDocCommentNamed String -- ^ something beginning '-- $' + | AnnDocSection Int String -- ^ a section heading + | AnnDocOptions String -- ^ doc options (prune, ignore-exports, etc) + | AnnLineComment String -- ^ comment starting by "--" + | AnnBlockComment String -- ^ comment in {- -} + deriving (Eq, Ord, Data, Show) +-- Note: these are based on the Token versions, but the Token type is +-- defined in GHC.Parser.Lexer and bringing it in here would create a loop + +instance Outputable AnnotationComment where + ppr x = text (show x) + +-- | - 'ApiAnnotation.AnnKeywordId' : 'ApiAnnotation.AnnOpen', +-- 'ApiAnnotation.AnnClose','ApiAnnotation.AnnComma', +-- 'ApiAnnotation.AnnRarrow' +-- 'ApiAnnotation.AnnTilde' +-- - May have 'ApiAnnotation.AnnComma' when in a list +type LRdrName = Located RdrName + + +-- | Certain tokens can have alternate representations when unicode syntax is +-- enabled. This flag is attached to those tokens in the lexer so that the +-- original source representation can be reproduced in the corresponding +-- 'ApiAnnotation' +data IsUnicodeSyntax = UnicodeSyntax | NormalSyntax + deriving (Eq, Ord, Data, Show) + +-- | Convert a normal annotation into its unicode equivalent one +unicodeAnn :: AnnKeywordId -> AnnKeywordId +unicodeAnn AnnForall = AnnForallU +unicodeAnn AnnDcolon = AnnDcolonU +unicodeAnn AnnLarrow = AnnLarrowU +unicodeAnn AnnRarrow = AnnRarrowU +unicodeAnn AnnDarrow = AnnDarrowU +unicodeAnn Annlarrowtail = AnnlarrowtailU +unicodeAnn Annrarrowtail = AnnrarrowtailU +unicodeAnn AnnLarrowtail = AnnLarrowtailU +unicodeAnn AnnRarrowtail = AnnRarrowtailU +unicodeAnn AnnOpenB = AnnOpenBU +unicodeAnn AnnCloseB = AnnCloseBU +unicodeAnn AnnOpenEQ = AnnOpenEQU +unicodeAnn AnnCloseQ = AnnCloseQU +unicodeAnn ann = ann + + +-- | Some template haskell tokens have two variants, one with an `e` the other +-- not: +-- +-- > [| or [e| +-- > [|| or [e|| +-- +-- This type indicates whether the 'e' is present or not. +data HasE = HasE | NoE + deriving (Eq, Ord, Data, Show) diff --git a/compiler/GHC/Parser/CharClass.hs b/compiler/GHC/Parser/CharClass.hs new file mode 100644 index 0000000000..dc98d48f94 --- /dev/null +++ b/compiler/GHC/Parser/CharClass.hs @@ -0,0 +1,215 @@ +-- Character classification +{-# LANGUAGE CPP #-} +module GHC.Parser.CharClass + ( is_ident -- Char# -> Bool + , is_symbol -- Char# -> Bool + , is_any -- Char# -> Bool + , is_space -- Char# -> Bool + , is_lower -- Char# -> Bool + , is_upper -- Char# -> Bool + , is_digit -- Char# -> Bool + , is_alphanum -- Char# -> Bool + + , is_decdigit, is_hexdigit, is_octdigit, is_bindigit + , hexDigit, octDecDigit + ) where + +#include "HsVersions.h" + +import GhcPrelude + +import Data.Bits ( Bits((.&.),(.|.)) ) +import Data.Char ( ord, chr ) +import Data.Word +import Panic + +-- Bit masks + +cIdent, cSymbol, cAny, cSpace, cLower, cUpper, cDigit :: Word8 +cIdent = 1 +cSymbol = 2 +cAny = 4 +cSpace = 8 +cLower = 16 +cUpper = 32 +cDigit = 64 + +-- | The predicates below look costly, but aren't, GHC+GCC do a great job +-- at the big case below. + +{-# INLINABLE is_ctype #-} +is_ctype :: Word8 -> Char -> Bool +is_ctype mask c = (charType c .&. mask) /= 0 + +is_ident, is_symbol, is_any, is_space, is_lower, is_upper, is_digit, + is_alphanum :: Char -> Bool +is_ident = is_ctype cIdent +is_symbol = is_ctype cSymbol +is_any = is_ctype cAny +is_space = is_ctype cSpace +is_lower = is_ctype cLower +is_upper = is_ctype cUpper +is_digit = is_ctype cDigit +is_alphanum = is_ctype (cLower+cUpper+cDigit) + +-- Utils + +hexDigit :: Char -> Int +hexDigit c | is_decdigit c = ord c - ord '0' + | otherwise = ord (to_lower c) - ord 'a' + 10 + +octDecDigit :: Char -> Int +octDecDigit c = ord c - ord '0' + +is_decdigit :: Char -> Bool +is_decdigit c + = c >= '0' && c <= '9' + +is_hexdigit :: Char -> Bool +is_hexdigit c + = is_decdigit c + || (c >= 'a' && c <= 'f') + || (c >= 'A' && c <= 'F') + +is_octdigit :: Char -> Bool +is_octdigit c = c >= '0' && c <= '7' + +is_bindigit :: Char -> Bool +is_bindigit c = c == '0' || c == '1' + +to_lower :: Char -> Char +to_lower c + | c >= 'A' && c <= 'Z' = chr (ord c - (ord 'A' - ord 'a')) + | otherwise = c + +charType :: Char -> Word8 +charType c = case c of + '\0' -> 0 -- \000 + '\1' -> 0 -- \001 + '\2' -> 0 -- \002 + '\3' -> 0 -- \003 + '\4' -> 0 -- \004 + '\5' -> 0 -- \005 + '\6' -> 0 -- \006 + '\7' -> 0 -- \007 + '\8' -> 0 -- \010 + '\9' -> cSpace -- \t (not allowed in strings, so !cAny) + '\10' -> cSpace -- \n (ditto) + '\11' -> cSpace -- \v (ditto) + '\12' -> cSpace -- \f (ditto) + '\13' -> cSpace -- ^M (ditto) + '\14' -> 0 -- \016 + '\15' -> 0 -- \017 + '\16' -> 0 -- \020 + '\17' -> 0 -- \021 + '\18' -> 0 -- \022 + '\19' -> 0 -- \023 + '\20' -> 0 -- \024 + '\21' -> 0 -- \025 + '\22' -> 0 -- \026 + '\23' -> 0 -- \027 + '\24' -> 0 -- \030 + '\25' -> 0 -- \031 + '\26' -> 0 -- \032 + '\27' -> 0 -- \033 + '\28' -> 0 -- \034 + '\29' -> 0 -- \035 + '\30' -> 0 -- \036 + '\31' -> 0 -- \037 + '\32' -> cAny .|. cSpace -- + '\33' -> cAny .|. cSymbol -- ! + '\34' -> cAny -- " + '\35' -> cAny .|. cSymbol -- # + '\36' -> cAny .|. cSymbol -- $ + '\37' -> cAny .|. cSymbol -- % + '\38' -> cAny .|. cSymbol -- & + '\39' -> cAny .|. cIdent -- ' + '\40' -> cAny -- ( + '\41' -> cAny -- ) + '\42' -> cAny .|. cSymbol -- * + '\43' -> cAny .|. cSymbol -- + + '\44' -> cAny -- , + '\45' -> cAny .|. cSymbol -- - + '\46' -> cAny .|. cSymbol -- . + '\47' -> cAny .|. cSymbol -- / + '\48' -> cAny .|. cIdent .|. cDigit -- 0 + '\49' -> cAny .|. cIdent .|. cDigit -- 1 + '\50' -> cAny .|. cIdent .|. cDigit -- 2 + '\51' -> cAny .|. cIdent .|. cDigit -- 3 + '\52' -> cAny .|. cIdent .|. cDigit -- 4 + '\53' -> cAny .|. cIdent .|. cDigit -- 5 + '\54' -> cAny .|. cIdent .|. cDigit -- 6 + '\55' -> cAny .|. cIdent .|. cDigit -- 7 + '\56' -> cAny .|. cIdent .|. cDigit -- 8 + '\57' -> cAny .|. cIdent .|. cDigit -- 9 + '\58' -> cAny .|. cSymbol -- : + '\59' -> cAny -- ; + '\60' -> cAny .|. cSymbol -- < + '\61' -> cAny .|. cSymbol -- = + '\62' -> cAny .|. cSymbol -- > + '\63' -> cAny .|. cSymbol -- ? + '\64' -> cAny .|. cSymbol -- @ + '\65' -> cAny .|. cIdent .|. cUpper -- A + '\66' -> cAny .|. cIdent .|. cUpper -- B + '\67' -> cAny .|. cIdent .|. cUpper -- C + '\68' -> cAny .|. cIdent .|. cUpper -- D + '\69' -> cAny .|. cIdent .|. cUpper -- E + '\70' -> cAny .|. cIdent .|. cUpper -- F + '\71' -> cAny .|. cIdent .|. cUpper -- G + '\72' -> cAny .|. cIdent .|. cUpper -- H + '\73' -> cAny .|. cIdent .|. cUpper -- I + '\74' -> cAny .|. cIdent .|. cUpper -- J + '\75' -> cAny .|. cIdent .|. cUpper -- K + '\76' -> cAny .|. cIdent .|. cUpper -- L + '\77' -> cAny .|. cIdent .|. cUpper -- M + '\78' -> cAny .|. cIdent .|. cUpper -- N + '\79' -> cAny .|. cIdent .|. cUpper -- O + '\80' -> cAny .|. cIdent .|. cUpper -- P + '\81' -> cAny .|. cIdent .|. cUpper -- Q + '\82' -> cAny .|. cIdent .|. cUpper -- R + '\83' -> cAny .|. cIdent .|. cUpper -- S + '\84' -> cAny .|. cIdent .|. cUpper -- T + '\85' -> cAny .|. cIdent .|. cUpper -- U + '\86' -> cAny .|. cIdent .|. cUpper -- V + '\87' -> cAny .|. cIdent .|. cUpper -- W + '\88' -> cAny .|. cIdent .|. cUpper -- X + '\89' -> cAny .|. cIdent .|. cUpper -- Y + '\90' -> cAny .|. cIdent .|. cUpper -- Z + '\91' -> cAny -- [ + '\92' -> cAny .|. cSymbol -- backslash + '\93' -> cAny -- ] + '\94' -> cAny .|. cSymbol -- ^ + '\95' -> cAny .|. cIdent .|. cLower -- _ + '\96' -> cAny -- ` + '\97' -> cAny .|. cIdent .|. cLower -- a + '\98' -> cAny .|. cIdent .|. cLower -- b + '\99' -> cAny .|. cIdent .|. cLower -- c + '\100' -> cAny .|. cIdent .|. cLower -- d + '\101' -> cAny .|. cIdent .|. cLower -- e + '\102' -> cAny .|. cIdent .|. cLower -- f + '\103' -> cAny .|. cIdent .|. cLower -- g + '\104' -> cAny .|. cIdent .|. cLower -- h + '\105' -> cAny .|. cIdent .|. cLower -- i + '\106' -> cAny .|. cIdent .|. cLower -- j + '\107' -> cAny .|. cIdent .|. cLower -- k + '\108' -> cAny .|. cIdent .|. cLower -- l + '\109' -> cAny .|. cIdent .|. cLower -- m + '\110' -> cAny .|. cIdent .|. cLower -- n + '\111' -> cAny .|. cIdent .|. cLower -- o + '\112' -> cAny .|. cIdent .|. cLower -- p + '\113' -> cAny .|. cIdent .|. cLower -- q + '\114' -> cAny .|. cIdent .|. cLower -- r + '\115' -> cAny .|. cIdent .|. cLower -- s + '\116' -> cAny .|. cIdent .|. cLower -- t + '\117' -> cAny .|. cIdent .|. cLower -- u + '\118' -> cAny .|. cIdent .|. cLower -- v + '\119' -> cAny .|. cIdent .|. cLower -- w + '\120' -> cAny .|. cIdent .|. cLower -- x + '\121' -> cAny .|. cIdent .|. cLower -- y + '\122' -> cAny .|. cIdent .|. cLower -- z + '\123' -> cAny -- { + '\124' -> cAny .|. cSymbol -- | + '\125' -> cAny -- } + '\126' -> cAny .|. cSymbol -- ~ + '\127' -> 0 -- \177 + _ -> panic ("charType: " ++ show c) diff --git a/compiler/GHC/Parser/Header.hs b/compiler/GHC/Parser/Header.hs new file mode 100644 index 0000000000..e2373827f4 --- /dev/null +++ b/compiler/GHC/Parser/Header.hs @@ -0,0 +1,361 @@ +{-# LANGUAGE CPP #-} +{-# LANGUAGE ViewPatterns #-} +{-# LANGUAGE TypeFamilies #-} + +----------------------------------------------------------------------------- +-- +-- | Parsing the top of a Haskell source file to get its module name, +-- imports and options. +-- +-- (c) Simon Marlow 2005 +-- (c) Lemmih 2006 +-- +----------------------------------------------------------------------------- + +module GHC.Parser.Header + ( getImports + , mkPrelImports -- used by the renamer too + , getOptionsFromFile + , getOptions + , optionsErrorMsgs + , checkProcessArgsResult + ) +where + +#include "HsVersions.h" + +import GhcPrelude + +import GHC.Platform +import GHC.Driver.Types +import GHC.Parser ( parseHeader ) +import GHC.Parser.Lexer +import FastString +import GHC.Hs +import GHC.Types.Module +import GHC.Builtin.Names +import StringBuffer +import GHC.Types.SrcLoc +import GHC.Driver.Session +import ErrUtils +import Util +import Outputable +import Maybes +import Bag ( emptyBag, listToBag, unitBag ) +import MonadUtils +import Exception +import GHC.Types.Basic +import qualified GHC.LanguageExtensions as LangExt + +import Control.Monad +import System.IO +import System.IO.Unsafe +import Data.List + +------------------------------------------------------------------------------ + +-- | Parse the imports of a source file. +-- +-- Throws a 'SourceError' if parsing fails. +getImports :: DynFlags + -> StringBuffer -- ^ Parse this. + -> FilePath -- ^ Filename the buffer came from. Used for + -- reporting parse error locations. + -> FilePath -- ^ The original source filename (used for locations + -- in the function result) + -> IO (Either + ErrorMessages + ([(Maybe FastString, Located ModuleName)], + [(Maybe FastString, Located ModuleName)], + Located ModuleName)) + -- ^ The source imports and normal imports (with optional package + -- names from -XPackageImports), and the module name. +getImports dflags buf filename source_filename = do + let loc = mkRealSrcLoc (mkFastString filename) 1 1 + case unP parseHeader (mkPState dflags buf loc) of + PFailed pst -> + -- assuming we're not logging warnings here as per below + return $ Left $ getErrorMessages pst dflags + POk pst rdr_module -> fmap Right $ do + let _ms@(_warns, errs) = getMessages pst dflags + -- don't log warnings: they'll be reported when we parse the file + -- for real. See #2500. + ms = (emptyBag, errs) + -- logWarnings warns + if errorsFound dflags ms + then throwIO $ mkSrcErr errs + else + let hsmod = unLoc rdr_module + mb_mod = hsmodName hsmod + imps = hsmodImports hsmod + main_loc = srcLocSpan (mkSrcLoc (mkFastString source_filename) + 1 1) + mod = mb_mod `orElse` L main_loc mAIN_NAME + (src_idecls, ord_idecls) = partition (ideclSource.unLoc) imps + + -- GHC.Prim doesn't exist physically, so don't go looking for it. + ordinary_imps = filter ((/= moduleName gHC_PRIM) . unLoc + . ideclName . unLoc) + ord_idecls + + implicit_prelude = xopt LangExt.ImplicitPrelude dflags + implicit_imports = mkPrelImports (unLoc mod) main_loc + implicit_prelude imps + convImport (L _ i) = (fmap sl_fs (ideclPkgQual i), ideclName i) + in + return (map convImport src_idecls, + map convImport (implicit_imports ++ ordinary_imps), + mod) + +mkPrelImports :: ModuleName + -> SrcSpan -- Attribute the "import Prelude" to this location + -> Bool -> [LImportDecl GhcPs] + -> [LImportDecl GhcPs] +-- Construct the implicit declaration "import Prelude" (or not) +-- +-- NB: opt_NoImplicitPrelude is slightly different to import Prelude (); +-- because the former doesn't even look at Prelude.hi for instance +-- declarations, whereas the latter does. +mkPrelImports this_mod loc implicit_prelude import_decls + | this_mod == pRELUDE_NAME + || explicit_prelude_import + || not implicit_prelude + = [] + | otherwise = [preludeImportDecl] + where + explicit_prelude_import + = notNull [ () | L _ (ImportDecl { ideclName = mod + , ideclPkgQual = Nothing }) + <- import_decls + , unLoc mod == pRELUDE_NAME ] + + preludeImportDecl :: LImportDecl GhcPs + preludeImportDecl + = L loc $ ImportDecl { ideclExt = noExtField, + ideclSourceSrc = NoSourceText, + ideclName = L loc pRELUDE_NAME, + ideclPkgQual = Nothing, + ideclSource = False, + ideclSafe = False, -- Not a safe import + ideclQualified = NotQualified, + ideclImplicit = True, -- Implicit! + ideclAs = Nothing, + ideclHiding = Nothing } + +-------------------------------------------------------------- +-- Get options +-------------------------------------------------------------- + +-- | Parse OPTIONS and LANGUAGE pragmas of the source file. +-- +-- Throws a 'SourceError' if flag parsing fails (including unsupported flags.) +getOptionsFromFile :: DynFlags + -> FilePath -- ^ Input file + -> IO [Located String] -- ^ Parsed options, if any. +getOptionsFromFile dflags filename + = Exception.bracket + (openBinaryFile filename ReadMode) + (hClose) + (\handle -> do + opts <- fmap (getOptions' dflags) + (lazyGetToks dflags' filename handle) + seqList opts $ return opts) + where -- We don't need to get haddock doc tokens when we're just + -- getting the options from pragmas, and lazily lexing them + -- correctly is a little tricky: If there is "\n" or "\n-" + -- left at the end of a buffer then the haddock doc may + -- continue past the end of the buffer, despite the fact that + -- we already have an apparently-complete token. + -- We therefore just turn Opt_Haddock off when doing the lazy + -- lex. + dflags' = gopt_unset dflags Opt_Haddock + +blockSize :: Int +-- blockSize = 17 -- for testing :-) +blockSize = 1024 + +lazyGetToks :: DynFlags -> FilePath -> Handle -> IO [Located Token] +lazyGetToks dflags filename handle = do + buf <- hGetStringBufferBlock handle blockSize + unsafeInterleaveIO $ lazyLexBuf handle (pragState dflags buf loc) False blockSize + where + loc = mkRealSrcLoc (mkFastString filename) 1 1 + + lazyLexBuf :: Handle -> PState -> Bool -> Int -> IO [Located Token] + lazyLexBuf handle state eof size = do + case unP (lexer False return) state of + POk state' t -> do + -- pprTrace "lazyLexBuf" (text (show (buffer state'))) (return ()) + if atEnd (buffer state') && not eof + -- if this token reached the end of the buffer, and we haven't + -- necessarily read up to the end of the file, then the token might + -- be truncated, so read some more of the file and lex it again. + then getMore handle state size + else case unLoc t of + ITeof -> return [t] + _other -> do rest <- lazyLexBuf handle state' eof size + return (t : rest) + _ | not eof -> getMore handle state size + | otherwise -> return [L (mkSrcSpanPs (last_loc state)) ITeof] + -- parser assumes an ITeof sentinel at the end + + getMore :: Handle -> PState -> Int -> IO [Located Token] + getMore handle state size = do + -- pprTrace "getMore" (text (show (buffer state))) (return ()) + let new_size = size * 2 + -- double the buffer size each time we read a new block. This + -- counteracts the quadratic slowdown we otherwise get for very + -- large module names (#5981) + nextbuf <- hGetStringBufferBlock handle new_size + if (len nextbuf == 0) then lazyLexBuf handle state True new_size else do + newbuf <- appendStringBuffers (buffer state) nextbuf + unsafeInterleaveIO $ lazyLexBuf handle state{buffer=newbuf} False new_size + + +getToks :: DynFlags -> FilePath -> StringBuffer -> [Located Token] +getToks dflags filename buf = lexAll (pragState dflags buf loc) + where + loc = mkRealSrcLoc (mkFastString filename) 1 1 + + lexAll state = case unP (lexer False return) state of + POk _ t@(L _ ITeof) -> [t] + POk state' t -> t : lexAll state' + _ -> [L (mkSrcSpanPs (last_loc state)) ITeof] + + +-- | Parse OPTIONS and LANGUAGE pragmas of the source file. +-- +-- Throws a 'SourceError' if flag parsing fails (including unsupported flags.) +getOptions :: DynFlags + -> StringBuffer -- ^ Input Buffer + -> FilePath -- ^ Source filename. Used for location info. + -> [Located String] -- ^ Parsed options. +getOptions dflags buf filename + = getOptions' dflags (getToks dflags filename buf) + +-- The token parser is written manually because Happy can't +-- return a partial result when it encounters a lexer error. +-- We want to extract options before the buffer is passed through +-- CPP, so we can't use the same trick as 'getImports'. +getOptions' :: DynFlags + -> [Located Token] -- Input buffer + -> [Located String] -- Options. +getOptions' dflags toks + = parseToks toks + where + parseToks (open:close:xs) + | IToptions_prag str <- unLoc open + , ITclose_prag <- unLoc close + = case toArgs str of + Left _err -> optionsParseError str dflags $ -- #15053 + combineSrcSpans (getLoc open) (getLoc close) + Right args -> map (L (getLoc open)) args ++ parseToks xs + parseToks (open:close:xs) + | ITinclude_prag str <- unLoc open + , ITclose_prag <- unLoc close + = map (L (getLoc open)) ["-#include",removeSpaces str] ++ + parseToks xs + parseToks (open:close:xs) + | ITdocOptions str <- unLoc open + , ITclose_prag <- unLoc close + = map (L (getLoc open)) ["-haddock-opts", removeSpaces str] + ++ parseToks xs + parseToks (open:xs) + | ITlanguage_prag <- unLoc open + = parseLanguage xs + parseToks (comment:xs) -- Skip over comments + | isComment (unLoc comment) + = parseToks xs + parseToks _ = [] + parseLanguage ((L loc (ITconid fs)):rest) + = checkExtension dflags (L loc fs) : + case rest of + (L _loc ITcomma):more -> parseLanguage more + (L _loc ITclose_prag):more -> parseToks more + (L loc _):_ -> languagePragParseError dflags loc + [] -> panic "getOptions'.parseLanguage(1) went past eof token" + parseLanguage (tok:_) + = languagePragParseError dflags (getLoc tok) + parseLanguage [] + = panic "getOptions'.parseLanguage(2) went past eof token" + + isComment :: Token -> Bool + isComment c = + case c of + (ITlineComment {}) -> True + (ITblockComment {}) -> True + (ITdocCommentNext {}) -> True + (ITdocCommentPrev {}) -> True + (ITdocCommentNamed {}) -> True + (ITdocSection {}) -> True + _ -> False + +----------------------------------------------------------------------------- + +-- | Complain about non-dynamic flags in OPTIONS pragmas. +-- +-- Throws a 'SourceError' if the input list is non-empty claiming that the +-- input flags are unknown. +checkProcessArgsResult :: MonadIO m => DynFlags -> [Located String] -> m () +checkProcessArgsResult dflags flags + = when (notNull flags) $ + liftIO $ throwIO $ mkSrcErr $ listToBag $ map mkMsg flags + where mkMsg (L loc flag) + = mkPlainErrMsg dflags loc $ + (text "unknown flag in {-# OPTIONS_GHC #-} pragma:" <+> + text flag) + +----------------------------------------------------------------------------- + +checkExtension :: DynFlags -> Located FastString -> Located String +checkExtension dflags (L l ext) +-- Checks if a given extension is valid, and if so returns +-- its corresponding flag. Otherwise it throws an exception. + = if ext' `elem` supported + then L l ("-X"++ext') + else unsupportedExtnError dflags l ext' + where + ext' = unpackFS ext + supported = supportedLanguagesAndExtensions $ platformMini $ targetPlatform dflags + +languagePragParseError :: DynFlags -> SrcSpan -> a +languagePragParseError dflags loc = + throwErr dflags loc $ + vcat [ text "Cannot parse LANGUAGE pragma" + , text "Expecting comma-separated list of language options," + , text "each starting with a capital letter" + , nest 2 (text "E.g. {-# LANGUAGE TemplateHaskell, GADTs #-}") ] + +unsupportedExtnError :: DynFlags -> SrcSpan -> String -> a +unsupportedExtnError dflags loc unsup = + throwErr dflags loc $ + text "Unsupported extension: " <> text unsup $$ + if null suggestions then Outputable.empty else text "Perhaps you meant" <+> quotedListWithOr (map text suggestions) + where + supported = supportedLanguagesAndExtensions $ platformMini $ targetPlatform dflags + suggestions = fuzzyMatch unsup supported + + +optionsErrorMsgs :: DynFlags -> [String] -> [Located String] -> FilePath -> Messages +optionsErrorMsgs dflags unhandled_flags flags_lines _filename + = (emptyBag, listToBag (map mkMsg unhandled_flags_lines)) + where unhandled_flags_lines :: [Located String] + unhandled_flags_lines = [ L l f + | f <- unhandled_flags + , L l f' <- flags_lines + , f == f' ] + mkMsg (L flagSpan flag) = + ErrUtils.mkPlainErrMsg dflags flagSpan $ + text "unknown flag in {-# OPTIONS_GHC #-} pragma:" <+> text flag + +optionsParseError :: String -> DynFlags -> SrcSpan -> a -- #15053 +optionsParseError str dflags loc = + throwErr dflags loc $ + vcat [ text "Error while parsing OPTIONS_GHC pragma." + , text "Expecting whitespace-separated list of GHC options." + , text " E.g. {-# OPTIONS_GHC -Wall -O2 #-}" + , text ("Input was: " ++ show str) ] + +throwErr :: DynFlags -> SrcSpan -> SDoc -> a -- #15053 +throwErr dflags loc doc = + throw $ mkSrcErr $ unitBag $ mkPlainErrMsg dflags loc doc diff --git a/compiler/GHC/Parser/Lexer.x b/compiler/GHC/Parser/Lexer.x new file mode 100644 index 0000000000..17b6674c95 --- /dev/null +++ b/compiler/GHC/Parser/Lexer.x @@ -0,0 +1,3294 @@ +----------------------------------------------------------------------------- +-- (c) The University of Glasgow, 2006 +-- +-- GHC's lexer for Haskell 2010 [1]. +-- +-- This is a combination of an Alex-generated lexer [2] from a regex +-- definition, with some hand-coded bits. [3] +-- +-- Completely accurate information about token-spans within the source +-- file is maintained. Every token has a start and end RealSrcLoc +-- attached to it. +-- +-- References: +-- [1] https://www.haskell.org/onlinereport/haskell2010/haskellch2.html +-- [2] http://www.haskell.org/alex/ +-- [3] https://gitlab.haskell.org/ghc/ghc/wikis/commentary/compiler/parser +-- +----------------------------------------------------------------------------- + +-- ToDo / known bugs: +-- - parsing integers is a bit slow +-- - readRational is a bit slow +-- +-- Known bugs, that were also in the previous version: +-- - M... should be 3 tokens, not 1. +-- - pragma-end should be only valid in a pragma + +-- qualified operator NOTES. +-- +-- - If M.(+) is a single lexeme, then.. +-- - Probably (+) should be a single lexeme too, for consistency. +-- Otherwise ( + ) would be a prefix operator, but M.( + ) would not be. +-- - But we have to rule out reserved operators, otherwise (..) becomes +-- a different lexeme. +-- - Should we therefore also rule out reserved operators in the qualified +-- form? This is quite difficult to achieve. We don't do it for +-- qualified varids. + + +-- ----------------------------------------------------------------------------- +-- Alex "Haskell code fragment top" + +{ +{-# LANGUAGE BangPatterns #-} +{-# LANGUAGE LambdaCase #-} +{-# LANGUAGE MultiWayIf #-} + +{-# OPTIONS_GHC -funbox-strict-fields #-} +{-# OPTIONS_GHC -Wno-incomplete-uni-patterns #-} + +module GHC.Parser.Lexer ( + Token(..), lexer, lexerDbg, pragState, mkPState, mkPStatePure, PState(..), + P(..), ParseResult(..), mkParserFlags, mkParserFlags', ParserFlags(..), + appendWarning, + appendError, + allocateComments, + MonadP(..), + getRealSrcLoc, getPState, withThisPackage, + failMsgP, failLocMsgP, srcParseFail, + getErrorMessages, getMessages, + popContext, pushModuleContext, setLastToken, setSrcLoc, + activeContext, nextIsEOF, + getLexState, popLexState, pushLexState, + ExtBits(..), + xtest, + lexTokenStream, + AddAnn(..),mkParensApiAnn, + addAnnsAt, + commentToAnnotation + ) where + +import GhcPrelude + +-- base +import Control.Monad +import Data.Bits +import Data.Char +import Data.List +import Data.Maybe +import Data.Word + +import EnumSet (EnumSet) +import qualified EnumSet + +-- ghc-boot +import qualified GHC.LanguageExtensions as LangExt + +-- bytestring +import Data.ByteString (ByteString) + +-- containers +import Data.Map (Map) +import qualified Data.Map as Map + +-- compiler/utils +import Bag +import Outputable +import StringBuffer +import FastString +import GHC.Types.Unique.FM +import Util ( readRational, readHexRational ) + +-- compiler/main +import ErrUtils +import GHC.Driver.Session as DynFlags + +-- compiler/basicTypes +import GHC.Types.SrcLoc +import GHC.Types.Module +import GHC.Types.Basic ( InlineSpec(..), RuleMatchInfo(..), + IntegralLit(..), FractionalLit(..), + SourceText(..) ) + +-- compiler/parser +import GHC.Parser.CharClass + +import GHC.Parser.Annotation +} + +-- ----------------------------------------------------------------------------- +-- Alex "Character set macros" + +-- NB: The logic behind these definitions is also reflected in basicTypes/Lexeme.hs +-- Any changes here should likely be reflected there. +$unispace = \x05 -- Trick Alex into handling Unicode. See [Unicode in Alex]. +$nl = [\n\r\f] +$whitechar = [$nl\v\ $unispace] +$white_no_nl = $whitechar # \n -- TODO #8424 +$tab = \t + +$ascdigit = 0-9 +$unidigit = \x03 -- Trick Alex into handling Unicode. See [Unicode in Alex]. +$decdigit = $ascdigit -- for now, should really be $digit (ToDo) +$digit = [$ascdigit $unidigit] + +$special = [\(\)\,\;\[\]\`\{\}] +$ascsymbol = [\!\#\$\%\&\*\+\.\/\<\=\>\?\@\\\^\|\-\~\:] +$unisymbol = \x04 -- Trick Alex into handling Unicode. See [Unicode in Alex]. +$symbol = [$ascsymbol $unisymbol] # [$special \_\"\'] + +$unilarge = \x01 -- Trick Alex into handling Unicode. See [Unicode in Alex]. +$asclarge = [A-Z] +$large = [$asclarge $unilarge] + +$unismall = \x02 -- Trick Alex into handling Unicode. See [Unicode in Alex]. +$ascsmall = [a-z] +$small = [$ascsmall $unismall \_] + +$unigraphic = \x06 -- Trick Alex into handling Unicode. See [Unicode in Alex]. +$graphic = [$small $large $symbol $digit $special $unigraphic \"\'] + +$binit = 0-1 +$octit = 0-7 +$hexit = [$decdigit A-F a-f] + +$uniidchar = \x07 -- Trick Alex into handling Unicode. See [Unicode in Alex]. +$idchar = [$small $large $digit $uniidchar \'] + +$pragmachar = [$small $large $digit] + +$docsym = [\| \^ \* \$] + + +-- ----------------------------------------------------------------------------- +-- Alex "Regular expression macros" + +@varid = $small $idchar* -- variable identifiers +@conid = $large $idchar* -- constructor identifiers + +@varsym = ($symbol # \:) $symbol* -- variable (operator) symbol +@consym = \: $symbol* -- constructor (operator) symbol + +-- See Note [Lexing NumericUnderscores extension] and #14473 +@numspc = _* -- numeric spacer (#14473) +@decimal = $decdigit(@numspc $decdigit)* +@binary = $binit(@numspc $binit)* +@octal = $octit(@numspc $octit)* +@hexadecimal = $hexit(@numspc $hexit)* +@exponent = @numspc [eE] [\-\+]? @decimal +@bin_exponent = @numspc [pP] [\-\+]? @decimal + +@qual = (@conid \.)+ +@qvarid = @qual @varid +@qconid = @qual @conid +@qvarsym = @qual @varsym +@qconsym = @qual @consym + +@floating_point = @numspc @decimal \. @decimal @exponent? | @numspc @decimal @exponent +@hex_floating_point = @numspc @hexadecimal \. @hexadecimal @bin_exponent? | @numspc @hexadecimal @bin_exponent + +-- normal signed numerical literals can only be explicitly negative, +-- not explicitly positive (contrast @exponent) +@negative = \- +@signed = @negative ? + + +-- ----------------------------------------------------------------------------- +-- Alex "Identifier" + +haskell :- + + +-- ----------------------------------------------------------------------------- +-- Alex "Rules" + +-- everywhere: skip whitespace +$white_no_nl+ ; +$tab { warnTab } + +-- Everywhere: deal with nested comments. We explicitly rule out +-- pragmas, "{-#", so that we don't accidentally treat them as comments. +-- (this can happen even though pragmas will normally take precedence due to +-- longest-match, because pragmas aren't valid in every state, but comments +-- are). We also rule out nested Haddock comments, if the -haddock flag is +-- set. + +"{-" / { isNormalComment } { nested_comment lexToken } + +-- Single-line comments are a bit tricky. Haskell 98 says that two or +-- more dashes followed by a symbol should be parsed as a varsym, so we +-- have to exclude those. + +-- Since Haddock comments aren't valid in every state, we need to rule them +-- out here. + +-- The following two rules match comments that begin with two dashes, but +-- continue with a different character. The rules test that this character +-- is not a symbol (in which case we'd have a varsym), and that it's not a +-- space followed by a Haddock comment symbol (docsym) (in which case we'd +-- have a Haddock comment). The rules then munch the rest of the line. + +"-- " ~$docsym .* { lineCommentToken } +"--" [^$symbol \ ] .* { lineCommentToken } + +-- Next, match Haddock comments if no -haddock flag + +"-- " $docsym .* / { alexNotPred (ifExtension HaddockBit) } { lineCommentToken } + +-- Now, when we've matched comments that begin with 2 dashes and continue +-- with a different character, we need to match comments that begin with three +-- or more dashes (which clearly can't be Haddock comments). We only need to +-- make sure that the first non-dash character isn't a symbol, and munch the +-- rest of the line. + +"---"\-* ~$symbol .* { lineCommentToken } + +-- Since the previous rules all match dashes followed by at least one +-- character, we also need to match a whole line filled with just dashes. + +"--"\-* / { atEOL } { lineCommentToken } + +-- We need this rule since none of the other single line comment rules +-- actually match this case. + +"-- " / { atEOL } { lineCommentToken } + +-- 'bol' state: beginning of a line. Slurp up all the whitespace (including +-- blank lines) until we find a non-whitespace character, then do layout +-- processing. +-- +-- One slight wibble here: what if the line begins with {-#? In +-- theory, we have to lex the pragma to see if it's one we recognise, +-- and if it is, then we backtrack and do_bol, otherwise we treat it +-- as a nested comment. We don't bother with this: if the line begins +-- with {-#, then we'll assume it's a pragma we know about and go for do_bol. +<bol> { + \n ; + ^\# line { begin line_prag1 } + ^\# / { followedByDigit } { begin line_prag1 } + ^\# pragma .* \n ; -- GCC 3.3 CPP generated, apparently + ^\# \! .* \n ; -- #!, for scripts + () { do_bol } +} + +-- after a layout keyword (let, where, do, of), we begin a new layout +-- context if the curly brace is missing. +-- Careful! This stuff is quite delicate. +<layout, layout_do, layout_if> { + \{ / { notFollowedBy '-' } { hopefully_open_brace } + -- we might encounter {-# here, but {- has been handled already + \n ; + ^\# (line)? { begin line_prag1 } +} + +-- after an 'if', a vertical bar starts a layout context for MultiWayIf +<layout_if> { + \| / { notFollowedBySymbol } { new_layout_context True dontGenerateSemic ITvbar } + () { pop } +} + +-- do is treated in a subtly different way, see new_layout_context +<layout> () { new_layout_context True generateSemic ITvocurly } +<layout_do> () { new_layout_context False generateSemic ITvocurly } + +-- after a new layout context which was found to be to the left of the +-- previous context, we have generated a '{' token, and we now need to +-- generate a matching '}' token. +<layout_left> () { do_layout_left } + +<0,option_prags> \n { begin bol } + +"{-#" $whitechar* $pragmachar+ / { known_pragma linePrags } + { dispatch_pragmas linePrags } + +-- single-line line pragmas, of the form +-- # <line> "<file>" <extra-stuff> \n +<line_prag1> { + @decimal $white_no_nl+ \" [$graphic \ ]* \" { setLineAndFile line_prag1a } + () { failLinePrag1 } +} +<line_prag1a> .* { popLinePrag1 } + +-- Haskell-style line pragmas, of the form +-- {-# LINE <line> "<file>" #-} +<line_prag2> { + @decimal $white_no_nl+ \" [$graphic \ ]* \" { setLineAndFile line_prag2a } +} +<line_prag2a> "#-}"|"-}" { pop } + -- NOTE: accept -} at the end of a LINE pragma, for compatibility + -- with older versions of GHC which generated these. + +-- Haskell-style column pragmas, of the form +-- {-# COLUMN <column> #-} +<column_prag> @decimal $whitechar* "#-}" { setColumn } + +<0,option_prags> { + "{-#" $whitechar* $pragmachar+ + $whitechar+ $pragmachar+ / { known_pragma twoWordPrags } + { dispatch_pragmas twoWordPrags } + + "{-#" $whitechar* $pragmachar+ / { known_pragma oneWordPrags } + { dispatch_pragmas oneWordPrags } + + -- We ignore all these pragmas, but don't generate a warning for them + "{-#" $whitechar* $pragmachar+ / { known_pragma ignoredPrags } + { dispatch_pragmas ignoredPrags } + + -- ToDo: should only be valid inside a pragma: + "#-}" { endPrag } +} + +<option_prags> { + "{-#" $whitechar* $pragmachar+ / { known_pragma fileHeaderPrags } + { dispatch_pragmas fileHeaderPrags } +} + +<0> { + -- In the "0" mode we ignore these pragmas + "{-#" $whitechar* $pragmachar+ / { known_pragma fileHeaderPrags } + { nested_comment lexToken } +} + +<0,option_prags> { + "{-#" { warnThen Opt_WarnUnrecognisedPragmas (text "Unrecognised pragma") + (nested_comment lexToken) } +} + +-- '0' state: ordinary lexemes + +-- Haddock comments + +<0,option_prags> { + "-- " $docsym / { ifExtension HaddockBit } { multiline_doc_comment } + "{-" \ ? $docsym / { ifExtension HaddockBit } { nested_doc_comment } +} + +-- "special" symbols + +<0> { + "[|" / { ifExtension ThQuotesBit } { token (ITopenExpQuote NoE NormalSyntax) } + "[||" / { ifExtension ThQuotesBit } { token (ITopenTExpQuote NoE) } + "[e|" / { ifExtension ThQuotesBit } { token (ITopenExpQuote HasE NormalSyntax) } + "[e||" / { ifExtension ThQuotesBit } { token (ITopenTExpQuote HasE) } + "[p|" / { ifExtension ThQuotesBit } { token ITopenPatQuote } + "[d|" / { ifExtension ThQuotesBit } { layout_token ITopenDecQuote } + "[t|" / { ifExtension ThQuotesBit } { token ITopenTypQuote } + "|]" / { ifExtension ThQuotesBit } { token (ITcloseQuote NormalSyntax) } + "||]" / { ifExtension ThQuotesBit } { token ITcloseTExpQuote } + + "[" @varid "|" / { ifExtension QqBit } { lex_quasiquote_tok } + + -- qualified quasi-quote (#5555) + "[" @qvarid "|" / { ifExtension QqBit } { lex_qquasiquote_tok } + + $unigraphic -- ⟦ + / { ifCurrentChar '⟦' `alexAndPred` + ifExtension UnicodeSyntaxBit `alexAndPred` + ifExtension ThQuotesBit } + { token (ITopenExpQuote NoE UnicodeSyntax) } + $unigraphic -- ⟧ + / { ifCurrentChar '⟧' `alexAndPred` + ifExtension UnicodeSyntaxBit `alexAndPred` + ifExtension ThQuotesBit } + { token (ITcloseQuote UnicodeSyntax) } +} + +<0> { + "(|" + / { ifExtension ArrowsBit `alexAndPred` + notFollowedBySymbol } + { special (IToparenbar NormalSyntax) } + "|)" + / { ifExtension ArrowsBit } + { special (ITcparenbar NormalSyntax) } + + $unigraphic -- ⦇ + / { ifCurrentChar '⦇' `alexAndPred` + ifExtension UnicodeSyntaxBit `alexAndPred` + ifExtension ArrowsBit } + { special (IToparenbar UnicodeSyntax) } + $unigraphic -- ⦈ + / { ifCurrentChar '⦈' `alexAndPred` + ifExtension UnicodeSyntaxBit `alexAndPred` + ifExtension ArrowsBit } + { special (ITcparenbar UnicodeSyntax) } +} + +<0> { + \? @varid / { ifExtension IpBit } { skip_one_varid ITdupipvarid } +} + +<0> { + "#" @varid / { ifExtension OverloadedLabelsBit } { skip_one_varid ITlabelvarid } +} + +<0> { + "(#" / { ifExtension UnboxedTuplesBit `alexOrPred` + ifExtension UnboxedSumsBit } + { token IToubxparen } + "#)" / { ifExtension UnboxedTuplesBit `alexOrPred` + ifExtension UnboxedSumsBit } + { token ITcubxparen } +} + +<0,option_prags> { + \( { special IToparen } + \) { special ITcparen } + \[ { special ITobrack } + \] { special ITcbrack } + \, { special ITcomma } + \; { special ITsemi } + \` { special ITbackquote } + + \{ { open_brace } + \} { close_brace } +} + +<0,option_prags> { + @qvarid { idtoken qvarid } + @qconid { idtoken qconid } + @varid { varid } + @conid { idtoken conid } +} + +<0> { + @qvarid "#"+ / { ifExtension MagicHashBit } { idtoken qvarid } + @qconid "#"+ / { ifExtension MagicHashBit } { idtoken qconid } + @varid "#"+ / { ifExtension MagicHashBit } { varid } + @conid "#"+ / { ifExtension MagicHashBit } { idtoken conid } +} + +-- Operators classified into prefix, suffix, tight infix, and loose infix. +-- See Note [Whitespace-sensitive operator parsing] +<0> { + @varsym / { precededByClosingToken `alexAndPred` followedByOpeningToken } { varsym_tight_infix } + @varsym / { followedByOpeningToken } { varsym_prefix } + @varsym / { precededByClosingToken } { varsym_suffix } + @varsym { varsym_loose_infix } +} + +-- ToDo: - move `var` and (sym) into lexical syntax? +-- - remove backquote from $special? +<0> { + @qvarsym { idtoken qvarsym } + @qconsym { idtoken qconsym } + @consym { consym } +} + +-- For the normal boxed literals we need to be careful +-- when trying to be close to Haskell98 + +-- Note [Lexing NumericUnderscores extension] (#14473) +-- +-- NumericUnderscores extension allows underscores in numeric literals. +-- Multiple underscores are represented with @numspc macro. +-- To be simpler, we have only the definitions with underscores. +-- And then we have a separate function (tok_integral and tok_frac) +-- that validates the literals. +-- If extensions are not enabled, check that there are no underscores. +-- +<0> { + -- Normal integral literals (:: Num a => a, from Integer) + @decimal { tok_num positive 0 0 decimal } + 0[bB] @numspc @binary / { ifExtension BinaryLiteralsBit } { tok_num positive 2 2 binary } + 0[oO] @numspc @octal { tok_num positive 2 2 octal } + 0[xX] @numspc @hexadecimal { tok_num positive 2 2 hexadecimal } + @negative @decimal / { ifExtension NegativeLiteralsBit } { tok_num negative 1 1 decimal } + @negative 0[bB] @numspc @binary / { ifExtension NegativeLiteralsBit `alexAndPred` + ifExtension BinaryLiteralsBit } { tok_num negative 3 3 binary } + @negative 0[oO] @numspc @octal / { ifExtension NegativeLiteralsBit } { tok_num negative 3 3 octal } + @negative 0[xX] @numspc @hexadecimal / { ifExtension NegativeLiteralsBit } { tok_num negative 3 3 hexadecimal } + + -- Normal rational literals (:: Fractional a => a, from Rational) + @floating_point { tok_frac 0 tok_float } + @negative @floating_point / { ifExtension NegativeLiteralsBit } { tok_frac 0 tok_float } + 0[xX] @numspc @hex_floating_point / { ifExtension HexFloatLiteralsBit } { tok_frac 0 tok_hex_float } + @negative 0[xX] @numspc @hex_floating_point + / { ifExtension HexFloatLiteralsBit `alexAndPred` + ifExtension NegativeLiteralsBit } { tok_frac 0 tok_hex_float } +} + +<0> { + -- Unboxed ints (:: Int#) and words (:: Word#) + -- It's simpler (and faster?) to give separate cases to the negatives, + -- especially considering octal/hexadecimal prefixes. + @decimal \# / { ifExtension MagicHashBit } { tok_primint positive 0 1 decimal } + 0[bB] @numspc @binary \# / { ifExtension MagicHashBit `alexAndPred` + ifExtension BinaryLiteralsBit } { tok_primint positive 2 3 binary } + 0[oO] @numspc @octal \# / { ifExtension MagicHashBit } { tok_primint positive 2 3 octal } + 0[xX] @numspc @hexadecimal \# / { ifExtension MagicHashBit } { tok_primint positive 2 3 hexadecimal } + @negative @decimal \# / { ifExtension MagicHashBit } { tok_primint negative 1 2 decimal } + @negative 0[bB] @numspc @binary \# / { ifExtension MagicHashBit `alexAndPred` + ifExtension BinaryLiteralsBit } { tok_primint negative 3 4 binary } + @negative 0[oO] @numspc @octal \# / { ifExtension MagicHashBit } { tok_primint negative 3 4 octal } + @negative 0[xX] @numspc @hexadecimal \# + / { ifExtension MagicHashBit } { tok_primint negative 3 4 hexadecimal } + + @decimal \# \# / { ifExtension MagicHashBit } { tok_primword 0 2 decimal } + 0[bB] @numspc @binary \# \# / { ifExtension MagicHashBit `alexAndPred` + ifExtension BinaryLiteralsBit } { tok_primword 2 4 binary } + 0[oO] @numspc @octal \# \# / { ifExtension MagicHashBit } { tok_primword 2 4 octal } + 0[xX] @numspc @hexadecimal \# \# / { ifExtension MagicHashBit } { tok_primword 2 4 hexadecimal } + + -- Unboxed floats and doubles (:: Float#, :: Double#) + -- prim_{float,double} work with signed literals + @signed @floating_point \# / { ifExtension MagicHashBit } { tok_frac 1 tok_primfloat } + @signed @floating_point \# \# / { ifExtension MagicHashBit } { tok_frac 2 tok_primdouble } +} + +-- Strings and chars are lexed by hand-written code. The reason is +-- that even if we recognise the string or char here in the regex +-- lexer, we would still have to parse the string afterward in order +-- to convert it to a String. +<0> { + \' { lex_char_tok } + \" { lex_string_tok } +} + +-- Note [Whitespace-sensitive operator parsing] +-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +-- In accord with GHC Proposal #229 https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0229-whitespace-bang-patterns.rst +-- we classify operator occurrences into four categories: +-- +-- a ! b -- a loose infix occurrence +-- a!b -- a tight infix occurrence +-- a !b -- a prefix occurrence +-- a! b -- a suffix occurrence +-- +-- The rules are a bit more elaborate than simply checking for whitespace, in +-- order to accommodate the following use cases: +-- +-- f (!a) = ... -- prefix occurrence +-- g (a !) -- loose infix occurrence +-- g (! a) -- loose infix occurrence +-- +-- The precise rules are as follows: +-- +-- * Identifiers, literals, and opening brackets (, (#, [, [|, [||, [p|, [e|, +-- [t|, {, are considered "opening tokens". The function followedByOpeningToken +-- tests whether the next token is an opening token. +-- +-- * Identifiers, literals, and closing brackets ), #), ], |], }, +-- are considered "closing tokens". The function precededByClosingToken tests +-- whether the previous token is a closing token. +-- +-- * Whitespace, comments, separators, and other tokens, are considered +-- neither opening nor closing. +-- +-- * Any unqualified operator occurrence is classified as prefix, suffix, or +-- tight/loose infix, based on preceding and following tokens: +-- +-- precededByClosingToken | followedByOpeningToken | Occurrence +-- ------------------------+------------------------+------------ +-- False | True | prefix +-- True | False | suffix +-- True | True | tight infix +-- False | False | loose infix +-- ------------------------+------------------------+------------ +-- +-- A loose infix occurrence is always considered an operator. Other types of +-- occurrences may be assigned a special per-operator meaning override: +-- +-- Operator | Occurrence | Token returned +-- ----------+---------------+------------------------------------------ +-- ! | prefix | ITbang +-- | | strictness annotation or bang pattern, +-- | | e.g. f !x = rhs, data T = MkT !a +-- | not prefix | ITvarsym "!" +-- | | ordinary operator or type operator, +-- | | e.g. xs ! 3, (! x), Int ! Bool +-- ----------+---------------+------------------------------------------ +-- ~ | prefix | ITtilde +-- | | laziness annotation or lazy pattern, +-- | | e.g. f ~x = rhs, data T = MkT ~a +-- | not prefix | ITvarsym "~" +-- | | ordinary operator or type operator, +-- | | e.g. xs ~ 3, (~ x), Int ~ Bool +-- ----------+---------------+------------------------------------------ +-- $ $$ | prefix | ITdollar, ITdollardollar +-- | | untyped or typed Template Haskell splice, +-- | | e.g. $(f x), $$(f x), $$"str" +-- | not prefix | ITvarsym "$", ITvarsym "$$" +-- | | ordinary operator or type operator, +-- | | e.g. f $ g x, a $$ b +-- ----------+---------------+------------------------------------------ +-- @ | prefix | ITtypeApp +-- | | type application, e.g. fmap @Maybe +-- | tight infix | ITat +-- | | as-pattern, e.g. f p@(a,b) = rhs +-- | suffix | parse error +-- | | e.g. f p@ x = rhs +-- | loose infix | ITvarsym "@" +-- | | ordinary operator or type operator, +-- | | e.g. f @ g, (f @) +-- ----------+---------------+------------------------------------------ +-- +-- Also, some of these overrides are guarded behind language extensions. +-- According to the specification, we must determine the occurrence based on +-- surrounding *tokens* (see the proposal for the exact rules). However, in +-- the implementation we cheat a little and do the classification based on +-- characters, for reasons of both simplicity and efficiency (see +-- 'followedByOpeningToken' and 'precededByClosingToken') +-- +-- When an operator is subject to a meaning override, it is mapped to special +-- token: ITbang, ITtilde, ITat, ITdollar, ITdollardollar. Otherwise, it is +-- returned as ITvarsym. +-- +-- For example, this is how we process the (!): +-- +-- precededByClosingToken | followedByOpeningToken | Token +-- ------------------------+------------------------+------------- +-- False | True | ITbang +-- True | False | ITvarsym "!" +-- True | True | ITvarsym "!" +-- False | False | ITvarsym "!" +-- ------------------------+------------------------+------------- +-- +-- And this is how we process the (@): +-- +-- precededByClosingToken | followedByOpeningToken | Token +-- ------------------------+------------------------+------------- +-- False | True | ITtypeApp +-- True | False | parse error +-- True | True | ITat +-- False | False | ITvarsym "@" +-- ------------------------+------------------------+------------- + +-- ----------------------------------------------------------------------------- +-- Alex "Haskell code fragment bottom" + +{ + +-- ----------------------------------------------------------------------------- +-- The token type + +data Token + = ITas -- Haskell keywords + | ITcase + | ITclass + | ITdata + | ITdefault + | ITderiving + | ITdo + | ITelse + | IThiding + | ITforeign + | ITif + | ITimport + | ITin + | ITinfix + | ITinfixl + | ITinfixr + | ITinstance + | ITlet + | ITmodule + | ITnewtype + | ITof + | ITqualified + | ITthen + | ITtype + | ITwhere + + | ITforall IsUnicodeSyntax -- GHC extension keywords + | ITexport + | ITlabel + | ITdynamic + | ITsafe + | ITinterruptible + | ITunsafe + | ITstdcallconv + | ITccallconv + | ITcapiconv + | ITprimcallconv + | ITjavascriptcallconv + | ITmdo + | ITfamily + | ITrole + | ITgroup + | ITby + | ITusing + | ITpattern + | ITstatic + | ITstock + | ITanyclass + | ITvia + + -- Backpack tokens + | ITunit + | ITsignature + | ITdependency + | ITrequires + + -- Pragmas, see note [Pragma source text] in BasicTypes + | ITinline_prag SourceText InlineSpec RuleMatchInfo + | ITspec_prag SourceText -- SPECIALISE + | ITspec_inline_prag SourceText Bool -- SPECIALISE INLINE (or NOINLINE) + | ITsource_prag SourceText + | ITrules_prag SourceText + | ITwarning_prag SourceText + | ITdeprecated_prag SourceText + | ITline_prag SourceText -- not usually produced, see 'UsePosPragsBit' + | ITcolumn_prag SourceText -- not usually produced, see 'UsePosPragsBit' + | ITscc_prag SourceText + | ITgenerated_prag SourceText + | ITcore_prag SourceText -- hdaume: core annotations + | ITunpack_prag SourceText + | ITnounpack_prag SourceText + | ITann_prag SourceText + | ITcomplete_prag SourceText + | ITclose_prag + | IToptions_prag String + | ITinclude_prag String + | ITlanguage_prag + | ITminimal_prag SourceText + | IToverlappable_prag SourceText -- instance overlap mode + | IToverlapping_prag SourceText -- instance overlap mode + | IToverlaps_prag SourceText -- instance overlap mode + | ITincoherent_prag SourceText -- instance overlap mode + | ITctype SourceText + | ITcomment_line_prag -- See Note [Nested comment line pragmas] + + | ITdotdot -- reserved symbols + | ITcolon + | ITdcolon IsUnicodeSyntax + | ITequal + | ITlam + | ITlcase + | ITvbar + | ITlarrow IsUnicodeSyntax + | ITrarrow IsUnicodeSyntax + | ITdarrow IsUnicodeSyntax + | ITminus + | ITbang -- Prefix (!) only, e.g. f !x = rhs + | ITtilde -- Prefix (~) only, e.g. f ~x = rhs + | ITat -- Tight infix (@) only, e.g. f x@pat = rhs + | ITtypeApp -- Prefix (@) only, e.g. f @t + | ITstar IsUnicodeSyntax + | ITdot + + | ITbiglam -- GHC-extension symbols + + | ITocurly -- special symbols + | ITccurly + | ITvocurly + | ITvccurly + | ITobrack + | ITopabrack -- [:, for parallel arrays with -XParallelArrays + | ITcpabrack -- :], for parallel arrays with -XParallelArrays + | ITcbrack + | IToparen + | ITcparen + | IToubxparen + | ITcubxparen + | ITsemi + | ITcomma + | ITunderscore + | ITbackquote + | ITsimpleQuote -- ' + + | ITvarid FastString -- identifiers + | ITconid FastString + | ITvarsym FastString + | ITconsym FastString + | ITqvarid (FastString,FastString) + | ITqconid (FastString,FastString) + | ITqvarsym (FastString,FastString) + | ITqconsym (FastString,FastString) + + | ITdupipvarid FastString -- GHC extension: implicit param: ?x + | ITlabelvarid FastString -- Overloaded label: #x + + | ITchar SourceText Char -- Note [Literal source text] in BasicTypes + | ITstring SourceText FastString -- Note [Literal source text] in BasicTypes + | ITinteger IntegralLit -- Note [Literal source text] in BasicTypes + | ITrational FractionalLit + + | ITprimchar SourceText Char -- Note [Literal source text] in BasicTypes + | ITprimstring SourceText ByteString -- Note [Literal source text] @BasicTypes + | ITprimint SourceText Integer -- Note [Literal source text] in BasicTypes + | ITprimword SourceText Integer -- Note [Literal source text] in BasicTypes + | ITprimfloat FractionalLit + | ITprimdouble FractionalLit + + -- Template Haskell extension tokens + | ITopenExpQuote HasE IsUnicodeSyntax -- [| or [e| + | ITopenPatQuote -- [p| + | ITopenDecQuote -- [d| + | ITopenTypQuote -- [t| + | ITcloseQuote IsUnicodeSyntax -- |] + | ITopenTExpQuote HasE -- [|| or [e|| + | ITcloseTExpQuote -- ||] + | ITdollar -- prefix $ + | ITdollardollar -- prefix $$ + | ITtyQuote -- '' + | ITquasiQuote (FastString,FastString,PsSpan) + -- ITquasiQuote(quoter, quote, loc) + -- represents a quasi-quote of the form + -- [quoter| quote |] + | ITqQuasiQuote (FastString,FastString,FastString,PsSpan) + -- ITqQuasiQuote(Qual, quoter, quote, loc) + -- represents a qualified quasi-quote of the form + -- [Qual.quoter| quote |] + + -- Arrow notation extension + | ITproc + | ITrec + | IToparenbar IsUnicodeSyntax -- ^ @(|@ + | ITcparenbar IsUnicodeSyntax -- ^ @|)@ + | ITlarrowtail IsUnicodeSyntax -- ^ @-<@ + | ITrarrowtail IsUnicodeSyntax -- ^ @>-@ + | ITLarrowtail IsUnicodeSyntax -- ^ @-<<@ + | ITRarrowtail IsUnicodeSyntax -- ^ @>>-@ + + | ITunknown String -- ^ Used when the lexer can't make sense of it + | ITeof -- ^ end of file token + + -- Documentation annotations + | ITdocCommentNext String -- ^ something beginning @-- |@ + | ITdocCommentPrev String -- ^ something beginning @-- ^@ + | ITdocCommentNamed String -- ^ something beginning @-- $@ + | ITdocSection Int String -- ^ a section heading + | ITdocOptions String -- ^ doc options (prune, ignore-exports, etc) + | ITlineComment String -- ^ comment starting by "--" + | ITblockComment String -- ^ comment in {- -} + + deriving Show + +instance Outputable Token where + ppr x = text (show x) + + +-- the bitmap provided as the third component indicates whether the +-- corresponding extension keyword is valid under the extension options +-- provided to the compiler; if the extension corresponding to *any* of the +-- bits set in the bitmap is enabled, the keyword is valid (this setup +-- facilitates using a keyword in two different extensions that can be +-- activated independently) +-- +reservedWordsFM :: UniqFM (Token, ExtsBitmap) +reservedWordsFM = listToUFM $ + map (\(x, y, z) -> (mkFastString x, (y, z))) + [( "_", ITunderscore, 0 ), + ( "as", ITas, 0 ), + ( "case", ITcase, 0 ), + ( "class", ITclass, 0 ), + ( "data", ITdata, 0 ), + ( "default", ITdefault, 0 ), + ( "deriving", ITderiving, 0 ), + ( "do", ITdo, 0 ), + ( "else", ITelse, 0 ), + ( "hiding", IThiding, 0 ), + ( "if", ITif, 0 ), + ( "import", ITimport, 0 ), + ( "in", ITin, 0 ), + ( "infix", ITinfix, 0 ), + ( "infixl", ITinfixl, 0 ), + ( "infixr", ITinfixr, 0 ), + ( "instance", ITinstance, 0 ), + ( "let", ITlet, 0 ), + ( "module", ITmodule, 0 ), + ( "newtype", ITnewtype, 0 ), + ( "of", ITof, 0 ), + ( "qualified", ITqualified, 0 ), + ( "then", ITthen, 0 ), + ( "type", ITtype, 0 ), + ( "where", ITwhere, 0 ), + + ( "forall", ITforall NormalSyntax, 0), + ( "mdo", ITmdo, xbit RecursiveDoBit), + -- See Note [Lexing type pseudo-keywords] + ( "family", ITfamily, 0 ), + ( "role", ITrole, 0 ), + ( "pattern", ITpattern, xbit PatternSynonymsBit), + ( "static", ITstatic, xbit StaticPointersBit ), + ( "stock", ITstock, 0 ), + ( "anyclass", ITanyclass, 0 ), + ( "via", ITvia, 0 ), + ( "group", ITgroup, xbit TransformComprehensionsBit), + ( "by", ITby, xbit TransformComprehensionsBit), + ( "using", ITusing, xbit TransformComprehensionsBit), + + ( "foreign", ITforeign, xbit FfiBit), + ( "export", ITexport, xbit FfiBit), + ( "label", ITlabel, xbit FfiBit), + ( "dynamic", ITdynamic, xbit FfiBit), + ( "safe", ITsafe, xbit FfiBit .|. + xbit SafeHaskellBit), + ( "interruptible", ITinterruptible, xbit InterruptibleFfiBit), + ( "unsafe", ITunsafe, xbit FfiBit), + ( "stdcall", ITstdcallconv, xbit FfiBit), + ( "ccall", ITccallconv, xbit FfiBit), + ( "capi", ITcapiconv, xbit CApiFfiBit), + ( "prim", ITprimcallconv, xbit FfiBit), + ( "javascript", ITjavascriptcallconv, xbit FfiBit), + + ( "unit", ITunit, 0 ), + ( "dependency", ITdependency, 0 ), + ( "signature", ITsignature, 0 ), + + ( "rec", ITrec, xbit ArrowsBit .|. + xbit RecursiveDoBit), + ( "proc", ITproc, xbit ArrowsBit) + ] + +{----------------------------------- +Note [Lexing type pseudo-keywords] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +One might think that we wish to treat 'family' and 'role' as regular old +varids whenever -XTypeFamilies and -XRoleAnnotations are off, respectively. +But, there is no need to do so. These pseudo-keywords are not stolen syntax: +they are only used after the keyword 'type' at the top-level, where varids are +not allowed. Furthermore, checks further downstream (GHC.Tc.TyCl) ensure that +type families and role annotations are never declared without their extensions +on. In fact, by unconditionally lexing these pseudo-keywords as special, we +can get better error messages. + +Also, note that these are included in the `varid` production in the parser -- +a key detail to make all this work. +-------------------------------------} + +reservedSymsFM :: UniqFM (Token, IsUnicodeSyntax, ExtsBitmap) +reservedSymsFM = listToUFM $ + map (\ (x,w,y,z) -> (mkFastString x,(w,y,z))) + [ ("..", ITdotdot, NormalSyntax, 0 ) + -- (:) is a reserved op, meaning only list cons + ,(":", ITcolon, NormalSyntax, 0 ) + ,("::", ITdcolon NormalSyntax, NormalSyntax, 0 ) + ,("=", ITequal, NormalSyntax, 0 ) + ,("\\", ITlam, NormalSyntax, 0 ) + ,("|", ITvbar, NormalSyntax, 0 ) + ,("<-", ITlarrow NormalSyntax, NormalSyntax, 0 ) + ,("->", ITrarrow NormalSyntax, NormalSyntax, 0 ) + ,("=>", ITdarrow NormalSyntax, NormalSyntax, 0 ) + ,("-", ITminus, NormalSyntax, 0 ) + + ,("*", ITstar NormalSyntax, NormalSyntax, xbit StarIsTypeBit) + + -- For 'forall a . t' + ,(".", ITdot, NormalSyntax, 0 ) + + ,("-<", ITlarrowtail NormalSyntax, NormalSyntax, xbit ArrowsBit) + ,(">-", ITrarrowtail NormalSyntax, NormalSyntax, xbit ArrowsBit) + ,("-<<", ITLarrowtail NormalSyntax, NormalSyntax, xbit ArrowsBit) + ,(">>-", ITRarrowtail NormalSyntax, NormalSyntax, xbit ArrowsBit) + + ,("∷", ITdcolon UnicodeSyntax, UnicodeSyntax, 0 ) + ,("⇒", ITdarrow UnicodeSyntax, UnicodeSyntax, 0 ) + ,("∀", ITforall UnicodeSyntax, UnicodeSyntax, 0 ) + ,("→", ITrarrow UnicodeSyntax, UnicodeSyntax, 0 ) + ,("←", ITlarrow UnicodeSyntax, UnicodeSyntax, 0 ) + + ,("⤙", ITlarrowtail UnicodeSyntax, UnicodeSyntax, xbit ArrowsBit) + ,("⤚", ITrarrowtail UnicodeSyntax, UnicodeSyntax, xbit ArrowsBit) + ,("⤛", ITLarrowtail UnicodeSyntax, UnicodeSyntax, xbit ArrowsBit) + ,("⤜", ITRarrowtail UnicodeSyntax, UnicodeSyntax, xbit ArrowsBit) + + ,("★", ITstar UnicodeSyntax, UnicodeSyntax, xbit StarIsTypeBit) + + -- ToDo: ideally, → and ∷ should be "specials", so that they cannot + -- form part of a large operator. This would let us have a better + -- syntax for kinds: ɑ∷*→* would be a legal kind signature. (maybe). + ] + +-- ----------------------------------------------------------------------------- +-- Lexer actions + +type Action = PsSpan -> StringBuffer -> Int -> P (PsLocated Token) + +special :: Token -> Action +special tok span _buf _len = return (L span tok) + +token, layout_token :: Token -> Action +token t span _buf _len = return (L span t) +layout_token t span _buf _len = pushLexState layout >> return (L span t) + +idtoken :: (StringBuffer -> Int -> Token) -> Action +idtoken f span buf len = return (L span $! (f buf len)) + +skip_one_varid :: (FastString -> Token) -> Action +skip_one_varid f span buf len + = return (L span $! f (lexemeToFastString (stepOn buf) (len-1))) + +skip_two_varid :: (FastString -> Token) -> Action +skip_two_varid f span buf len + = return (L span $! f (lexemeToFastString (stepOn (stepOn buf)) (len-2))) + +strtoken :: (String -> Token) -> Action +strtoken f span buf len = + return (L span $! (f $! lexemeToString buf len)) + +begin :: Int -> Action +begin code _span _str _len = do pushLexState code; lexToken + +pop :: Action +pop _span _buf _len = do _ <- popLexState + lexToken +-- See Note [Nested comment line pragmas] +failLinePrag1 :: Action +failLinePrag1 span _buf _len = do + b <- getBit InNestedCommentBit + if b then return (L span ITcomment_line_prag) + else lexError "lexical error in pragma" + +-- See Note [Nested comment line pragmas] +popLinePrag1 :: Action +popLinePrag1 span _buf _len = do + b <- getBit InNestedCommentBit + if b then return (L span ITcomment_line_prag) else do + _ <- popLexState + lexToken + +hopefully_open_brace :: Action +hopefully_open_brace span buf len + = do relaxed <- getBit RelaxedLayoutBit + ctx <- getContext + (AI l _) <- getInput + let offset = srcLocCol (psRealLoc l) + isOK = relaxed || + case ctx of + Layout prev_off _ : _ -> prev_off < offset + _ -> True + if isOK then pop_and open_brace span buf len + else addFatalError (mkSrcSpanPs span) (text "Missing block") + +pop_and :: Action -> Action +pop_and act span buf len = do _ <- popLexState + act span buf len + +-- See Note [Whitespace-sensitive operator parsing] +followedByOpeningToken :: AlexAccPred ExtsBitmap +followedByOpeningToken _ _ _ (AI _ buf) + | atEnd buf = False + | otherwise = + case nextChar buf of + ('{', buf') -> nextCharIsNot buf' (== '-') + ('(', _) -> True + ('[', _) -> True + ('\"', _) -> True + ('\'', _) -> True + ('_', _) -> True + (c, _) -> isAlphaNum c + +-- See Note [Whitespace-sensitive operator parsing] +precededByClosingToken :: AlexAccPred ExtsBitmap +precededByClosingToken _ (AI _ buf) _ _ = + case prevChar buf '\n' of + '}' -> decodePrevNChars 1 buf /= "-" + ')' -> True + ']' -> True + '\"' -> True + '\'' -> True + '_' -> True + c -> isAlphaNum c + +{-# INLINE nextCharIs #-} +nextCharIs :: StringBuffer -> (Char -> Bool) -> Bool +nextCharIs buf p = not (atEnd buf) && p (currentChar buf) + +{-# INLINE nextCharIsNot #-} +nextCharIsNot :: StringBuffer -> (Char -> Bool) -> Bool +nextCharIsNot buf p = not (nextCharIs buf p) + +notFollowedBy :: Char -> AlexAccPred ExtsBitmap +notFollowedBy char _ _ _ (AI _ buf) + = nextCharIsNot buf (== char) + +notFollowedBySymbol :: AlexAccPred ExtsBitmap +notFollowedBySymbol _ _ _ (AI _ buf) + = nextCharIsNot buf (`elem` "!#$%&*+./<=>?@\\^|-~") + +followedByDigit :: AlexAccPred ExtsBitmap +followedByDigit _ _ _ (AI _ buf) + = afterOptionalSpace buf (\b -> nextCharIs b (`elem` ['0'..'9'])) + +ifCurrentChar :: Char -> AlexAccPred ExtsBitmap +ifCurrentChar char _ (AI _ buf) _ _ + = nextCharIs buf (== char) + +-- We must reject doc comments as being ordinary comments everywhere. +-- In some cases the doc comment will be selected as the lexeme due to +-- maximal munch, but not always, because the nested comment rule is +-- valid in all states, but the doc-comment rules are only valid in +-- the non-layout states. +isNormalComment :: AlexAccPred ExtsBitmap +isNormalComment bits _ _ (AI _ buf) + | HaddockBit `xtest` bits = notFollowedByDocOrPragma + | otherwise = nextCharIsNot buf (== '#') + where + notFollowedByDocOrPragma + = afterOptionalSpace buf (\b -> nextCharIsNot b (`elem` "|^*$#")) + +afterOptionalSpace :: StringBuffer -> (StringBuffer -> Bool) -> Bool +afterOptionalSpace buf p + = if nextCharIs buf (== ' ') + then p (snd (nextChar buf)) + else p buf + +atEOL :: AlexAccPred ExtsBitmap +atEOL _ _ _ (AI _ buf) = atEnd buf || currentChar buf == '\n' + +ifExtension :: ExtBits -> AlexAccPred ExtsBitmap +ifExtension extBits bits _ _ _ = extBits `xtest` bits + +alexNotPred p userState in1 len in2 + = not (p userState in1 len in2) + +alexOrPred p1 p2 userState in1 len in2 + = p1 userState in1 len in2 || p2 userState in1 len in2 + +multiline_doc_comment :: Action +multiline_doc_comment span buf _len = withLexedDocType (worker "") + where + worker commentAcc input docType checkNextLine = case alexGetChar' input of + Just ('\n', input') + | checkNextLine -> case checkIfCommentLine input' of + Just input -> worker ('\n':commentAcc) input docType checkNextLine + Nothing -> docCommentEnd input commentAcc docType buf span + | otherwise -> docCommentEnd input commentAcc docType buf span + Just (c, input) -> worker (c:commentAcc) input docType checkNextLine + Nothing -> docCommentEnd input commentAcc docType buf span + + -- Check if the next line of input belongs to this doc comment as well. + -- A doc comment continues onto the next line when the following + -- conditions are met: + -- * The line starts with "--" + -- * The line doesn't start with "---". + -- * The line doesn't start with "-- $", because that would be the + -- start of a /new/ named haddock chunk (#10398). + checkIfCommentLine :: AlexInput -> Maybe AlexInput + checkIfCommentLine input = check (dropNonNewlineSpace input) + where + check input = do + ('-', input) <- alexGetChar' input + ('-', input) <- alexGetChar' input + (c, after_c) <- alexGetChar' input + case c of + '-' -> Nothing + ' ' -> case alexGetChar' after_c of + Just ('$', _) -> Nothing + _ -> Just input + _ -> Just input + + dropNonNewlineSpace input = case alexGetChar' input of + Just (c, input') + | isSpace c && c /= '\n' -> dropNonNewlineSpace input' + | otherwise -> input + Nothing -> input + +lineCommentToken :: Action +lineCommentToken span buf len = do + b <- getBit RawTokenStreamBit + if b then strtoken ITlineComment span buf len else lexToken + +{- + nested comments require traversing by hand, they can't be parsed + using regular expressions. +-} +nested_comment :: P (PsLocated Token) -> Action +nested_comment cont span buf len = do + input <- getInput + go (reverse $ lexemeToString buf len) (1::Int) input + where + go commentAcc 0 input = do + setInput input + b <- getBit RawTokenStreamBit + if b + then docCommentEnd input commentAcc ITblockComment buf span + else cont + go commentAcc n input = case alexGetChar' input of + Nothing -> errBrace input (psRealSpan span) + Just ('-',input) -> case alexGetChar' input of + Nothing -> errBrace input (psRealSpan span) + Just ('\125',input) -> go ('\125':'-':commentAcc) (n-1) input -- '}' + Just (_,_) -> go ('-':commentAcc) n input + Just ('\123',input) -> case alexGetChar' input of -- '{' char + Nothing -> errBrace input (psRealSpan span) + Just ('-',input) -> go ('-':'\123':commentAcc) (n+1) input + Just (_,_) -> go ('\123':commentAcc) n input + -- See Note [Nested comment line pragmas] + Just ('\n',input) -> case alexGetChar' input of + Nothing -> errBrace input (psRealSpan span) + Just ('#',_) -> do (parsedAcc,input) <- parseNestedPragma input + go (parsedAcc ++ '\n':commentAcc) n input + Just (_,_) -> go ('\n':commentAcc) n input + Just (c,input) -> go (c:commentAcc) n input + +nested_doc_comment :: Action +nested_doc_comment span buf _len = withLexedDocType (go "") + where + go commentAcc input docType _ = case alexGetChar' input of + Nothing -> errBrace input (psRealSpan span) + Just ('-',input) -> case alexGetChar' input of + Nothing -> errBrace input (psRealSpan span) + Just ('\125',input) -> + docCommentEnd input commentAcc docType buf span + Just (_,_) -> go ('-':commentAcc) input docType False + Just ('\123', input) -> case alexGetChar' input of + Nothing -> errBrace input (psRealSpan span) + Just ('-',input) -> do + setInput input + let cont = do input <- getInput; go commentAcc input docType False + nested_comment cont span buf _len + Just (_,_) -> go ('\123':commentAcc) input docType False + -- See Note [Nested comment line pragmas] + Just ('\n',input) -> case alexGetChar' input of + Nothing -> errBrace input (psRealSpan span) + Just ('#',_) -> do (parsedAcc,input) <- parseNestedPragma input + go (parsedAcc ++ '\n':commentAcc) input docType False + Just (_,_) -> go ('\n':commentAcc) input docType False + Just (c,input) -> go (c:commentAcc) input docType False + +-- See Note [Nested comment line pragmas] +parseNestedPragma :: AlexInput -> P (String,AlexInput) +parseNestedPragma input@(AI _ buf) = do + origInput <- getInput + setInput input + setExts (.|. xbit InNestedCommentBit) + pushLexState bol + lt <- lexToken + _ <- popLexState + setExts (.&. complement (xbit InNestedCommentBit)) + postInput@(AI _ postBuf) <- getInput + setInput origInput + case unLoc lt of + ITcomment_line_prag -> do + let bytes = byteDiff buf postBuf + diff = lexemeToString buf bytes + return (reverse diff, postInput) + lt' -> panic ("parseNestedPragma: unexpected token" ++ (show lt')) + +{- +Note [Nested comment line pragmas] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +We used to ignore cpp-preprocessor-generated #line pragmas if they were inside +nested comments. + +Now, when parsing a nested comment, if we encounter a line starting with '#' we +call parseNestedPragma, which executes the following: +1. Save the current lexer input (loc, buf) for later +2. Set the current lexer input to the beginning of the line starting with '#' +3. Turn the 'InNestedComment' extension on +4. Push the 'bol' lexer state +5. Lex a token. Due to (2), (3), and (4), this should always lex a single line + or less and return the ITcomment_line_prag token. This may set source line + and file location if a #line pragma is successfully parsed +6. Restore lexer input and state to what they were before we did all this +7. Return control to the function parsing a nested comment, informing it of + what the lexer parsed + +Regarding (5) above: +Every exit from the 'bol' lexer state (do_bol, popLinePrag1, failLinePrag1) +checks if the 'InNestedComment' extension is set. If it is, that function will +return control to parseNestedPragma by returning the ITcomment_line_prag token. + +See #314 for more background on the bug this fixes. +-} + +withLexedDocType :: (AlexInput -> (String -> Token) -> Bool -> P (PsLocated Token)) + -> P (PsLocated Token) +withLexedDocType lexDocComment = do + input@(AI _ buf) <- getInput + case prevChar buf ' ' of + -- The `Bool` argument to lexDocComment signals whether or not the next + -- line of input might also belong to this doc comment. + '|' -> lexDocComment input ITdocCommentNext True + '^' -> lexDocComment input ITdocCommentPrev True + '$' -> lexDocComment input ITdocCommentNamed True + '*' -> lexDocSection 1 input + _ -> panic "withLexedDocType: Bad doc type" + where + lexDocSection n input = case alexGetChar' input of + Just ('*', input) -> lexDocSection (n+1) input + Just (_, _) -> lexDocComment input (ITdocSection n) False + Nothing -> do setInput input; lexToken -- eof reached, lex it normally + +-- RULES pragmas turn on the forall and '.' keywords, and we turn them +-- off again at the end of the pragma. +rulePrag :: Action +rulePrag span buf len = do + setExts (.|. xbit InRulePragBit) + let !src = lexemeToString buf len + return (L span (ITrules_prag (SourceText src))) + +-- When 'UsePosPragsBit' is not set, it is expected that we emit a token instead +-- of updating the position in 'PState' +linePrag :: Action +linePrag span buf len = do + usePosPrags <- getBit UsePosPragsBit + if usePosPrags + then begin line_prag2 span buf len + else let !src = lexemeToString buf len + in return (L span (ITline_prag (SourceText src))) + +-- When 'UsePosPragsBit' is not set, it is expected that we emit a token instead +-- of updating the position in 'PState' +columnPrag :: Action +columnPrag span buf len = do + usePosPrags <- getBit UsePosPragsBit + let !src = lexemeToString buf len + if usePosPrags + then begin column_prag span buf len + else let !src = lexemeToString buf len + in return (L span (ITcolumn_prag (SourceText src))) + +endPrag :: Action +endPrag span _buf _len = do + setExts (.&. complement (xbit InRulePragBit)) + return (L span ITclose_prag) + +-- docCommentEnd +------------------------------------------------------------------------------- +-- This function is quite tricky. We can't just return a new token, we also +-- need to update the state of the parser. Why? Because the token is longer +-- than what was lexed by Alex, and the lexToken function doesn't know this, so +-- it writes the wrong token length to the parser state. This function is +-- called afterwards, so it can just update the state. + +docCommentEnd :: AlexInput -> String -> (String -> Token) -> StringBuffer -> + PsSpan -> P (PsLocated Token) +docCommentEnd input commentAcc docType buf span = do + setInput input + let (AI loc nextBuf) = input + comment = reverse commentAcc + span' = mkPsSpan (psSpanStart span) loc + last_len = byteDiff buf nextBuf + + span `seq` setLastToken span' last_len + return (L span' (docType comment)) + +errBrace :: AlexInput -> RealSrcSpan -> P a +errBrace (AI end _) span = failLocMsgP (realSrcSpanStart span) (psRealLoc end) "unterminated `{-'" + +open_brace, close_brace :: Action +open_brace span _str _len = do + ctx <- getContext + setContext (NoLayout:ctx) + return (L span ITocurly) +close_brace span _str _len = do + popContext + return (L span ITccurly) + +qvarid, qconid :: StringBuffer -> Int -> Token +qvarid buf len = ITqvarid $! splitQualName buf len False +qconid buf len = ITqconid $! splitQualName buf len False + +splitQualName :: StringBuffer -> Int -> Bool -> (FastString,FastString) +-- takes a StringBuffer and a length, and returns the module name +-- and identifier parts of a qualified name. Splits at the *last* dot, +-- because of hierarchical module names. +splitQualName orig_buf len parens = split orig_buf orig_buf + where + split buf dot_buf + | orig_buf `byteDiff` buf >= len = done dot_buf + | c == '.' = found_dot buf' + | otherwise = split buf' dot_buf + where + (c,buf') = nextChar buf + + -- careful, we might get names like M.... + -- so, if the character after the dot is not upper-case, this is + -- the end of the qualifier part. + found_dot buf -- buf points after the '.' + | isUpper c = split buf' buf + | otherwise = done buf + where + (c,buf') = nextChar buf + + done dot_buf = + (lexemeToFastString orig_buf (qual_size - 1), + if parens -- Prelude.(+) + then lexemeToFastString (stepOn dot_buf) (len - qual_size - 2) + else lexemeToFastString dot_buf (len - qual_size)) + where + qual_size = orig_buf `byteDiff` dot_buf + +varid :: Action +varid span buf len = + case lookupUFM reservedWordsFM fs of + Just (ITcase, _) -> do + lastTk <- getLastTk + keyword <- case lastTk of + Just ITlam -> do + lambdaCase <- getBit LambdaCaseBit + unless lambdaCase $ do + pState <- getPState + addError (mkSrcSpanPs (last_loc pState)) $ text + "Illegal lambda-case (use LambdaCase)" + return ITlcase + _ -> return ITcase + maybe_layout keyword + return $ L span keyword + Just (keyword, 0) -> do + maybe_layout keyword + return $ L span keyword + Just (keyword, i) -> do + exts <- getExts + if exts .&. i /= 0 + then do + maybe_layout keyword + return $ L span keyword + else + return $ L span $ ITvarid fs + Nothing -> + return $ L span $ ITvarid fs + where + !fs = lexemeToFastString buf len + +conid :: StringBuffer -> Int -> Token +conid buf len = ITconid $! lexemeToFastString buf len + +qvarsym, qconsym :: StringBuffer -> Int -> Token +qvarsym buf len = ITqvarsym $! splitQualName buf len False +qconsym buf len = ITqconsym $! splitQualName buf len False + +-- See Note [Whitespace-sensitive operator parsing] +varsym_prefix :: Action +varsym_prefix = sym $ \exts s -> + if | TypeApplicationsBit `xtest` exts, s == fsLit "@" + -> return ITtypeApp + | ThQuotesBit `xtest` exts, s == fsLit "$" + -> return ITdollar + | ThQuotesBit `xtest` exts, s == fsLit "$$" + -> return ITdollardollar + | s == fsLit "!" -> return ITbang + | s == fsLit "~" -> return ITtilde + | otherwise -> return (ITvarsym s) + +-- See Note [Whitespace-sensitive operator parsing] +varsym_suffix :: Action +varsym_suffix = sym $ \_ s -> + if | s == fsLit "@" + -> failMsgP "Suffix occurrence of @. For an as-pattern, remove the leading whitespace." + | otherwise -> return (ITvarsym s) + +-- See Note [Whitespace-sensitive operator parsing] +varsym_tight_infix :: Action +varsym_tight_infix = sym $ \_ s -> + if | s == fsLit "@" -> return ITat + | otherwise -> return (ITvarsym s) + +-- See Note [Whitespace-sensitive operator parsing] +varsym_loose_infix :: Action +varsym_loose_infix = sym (\_ s -> return $ ITvarsym s) + +consym :: Action +consym = sym (\_exts s -> return $ ITconsym s) + +sym :: (ExtsBitmap -> FastString -> P Token) -> Action +sym con span buf len = + case lookupUFM reservedSymsFM fs of + Just (keyword, NormalSyntax, 0) -> + return $ L span keyword + Just (keyword, NormalSyntax, i) -> do + exts <- getExts + if exts .&. i /= 0 + then return $ L span keyword + else L span <$!> con exts fs + Just (keyword, UnicodeSyntax, 0) -> do + exts <- getExts + if xtest UnicodeSyntaxBit exts + then return $ L span keyword + else L span <$!> con exts fs + Just (keyword, UnicodeSyntax, i) -> do + exts <- getExts + if exts .&. i /= 0 && xtest UnicodeSyntaxBit exts + then return $ L span keyword + else L span <$!> con exts fs + Nothing -> do + exts <- getExts + L span <$!> con exts fs + where + !fs = lexemeToFastString buf len + +-- Variations on the integral numeric literal. +tok_integral :: (SourceText -> Integer -> Token) + -> (Integer -> Integer) + -> Int -> Int + -> (Integer, (Char -> Int)) + -> Action +tok_integral itint transint transbuf translen (radix,char_to_int) span buf len = do + numericUnderscores <- getBit NumericUnderscoresBit -- #14473 + let src = lexemeToString buf len + when ((not numericUnderscores) && ('_' `elem` src)) $ do + pState <- getPState + addError (mkSrcSpanPs (last_loc pState)) $ text + "Use NumericUnderscores to allow underscores in integer literals" + return $ L span $ itint (SourceText src) + $! transint $ parseUnsignedInteger + (offsetBytes transbuf buf) (subtract translen len) radix char_to_int + +tok_num :: (Integer -> Integer) + -> Int -> Int + -> (Integer, (Char->Int)) -> Action +tok_num = tok_integral $ \case + st@(SourceText ('-':_)) -> itint st (const True) + st@(SourceText _) -> itint st (const False) + st@NoSourceText -> itint st (< 0) + where + itint :: SourceText -> (Integer -> Bool) -> Integer -> Token + itint !st is_negative !val = ITinteger ((IL st $! is_negative val) val) + +tok_primint :: (Integer -> Integer) + -> Int -> Int + -> (Integer, (Char->Int)) -> Action +tok_primint = tok_integral ITprimint + + +tok_primword :: Int -> Int + -> (Integer, (Char->Int)) -> Action +tok_primword = tok_integral ITprimword positive +positive, negative :: (Integer -> Integer) +positive = id +negative = negate +decimal, octal, hexadecimal :: (Integer, Char -> Int) +decimal = (10,octDecDigit) +binary = (2,octDecDigit) +octal = (8,octDecDigit) +hexadecimal = (16,hexDigit) + +-- readRational can understand negative rationals, exponents, everything. +tok_frac :: Int -> (String -> Token) -> Action +tok_frac drop f span buf len = do + numericUnderscores <- getBit NumericUnderscoresBit -- #14473 + let src = lexemeToString buf (len-drop) + when ((not numericUnderscores) && ('_' `elem` src)) $ do + pState <- getPState + addError (mkSrcSpanPs (last_loc pState)) $ text + "Use NumericUnderscores to allow underscores in floating literals" + return (L span $! (f $! src)) + +tok_float, tok_primfloat, tok_primdouble :: String -> Token +tok_float str = ITrational $! readFractionalLit str +tok_hex_float str = ITrational $! readHexFractionalLit str +tok_primfloat str = ITprimfloat $! readFractionalLit str +tok_primdouble str = ITprimdouble $! readFractionalLit str + +readFractionalLit :: String -> FractionalLit +readFractionalLit str = ((FL $! (SourceText str)) $! is_neg) $! readRational str + where is_neg = case str of ('-':_) -> True + _ -> False +readHexFractionalLit :: String -> FractionalLit +readHexFractionalLit str = + FL { fl_text = SourceText str + , fl_neg = case str of + '-' : _ -> True + _ -> False + , fl_value = readHexRational str + } + +-- ----------------------------------------------------------------------------- +-- Layout processing + +-- we're at the first token on a line, insert layout tokens if necessary +do_bol :: Action +do_bol span _str _len = do + -- See Note [Nested comment line pragmas] + b <- getBit InNestedCommentBit + if b then return (L span ITcomment_line_prag) else do + (pos, gen_semic) <- getOffside + case pos of + LT -> do + --trace "layout: inserting '}'" $ do + popContext + -- do NOT pop the lex state, we might have a ';' to insert + return (L span ITvccurly) + EQ | gen_semic -> do + --trace "layout: inserting ';'" $ do + _ <- popLexState + return (L span ITsemi) + _ -> do + _ <- popLexState + lexToken + +-- certain keywords put us in the "layout" state, where we might +-- add an opening curly brace. +maybe_layout :: Token -> P () +maybe_layout t = do -- If the alternative layout rule is enabled then + -- we never create an implicit layout context here. + -- Layout is handled XXX instead. + -- The code for closing implicit contexts, or + -- inserting implicit semi-colons, is therefore + -- irrelevant as it only applies in an implicit + -- context. + alr <- getBit AlternativeLayoutRuleBit + unless alr $ f t + where f ITdo = pushLexState layout_do + f ITmdo = pushLexState layout_do + f ITof = pushLexState layout + f ITlcase = pushLexState layout + f ITlet = pushLexState layout + f ITwhere = pushLexState layout + f ITrec = pushLexState layout + f ITif = pushLexState layout_if + f _ = return () + +-- Pushing a new implicit layout context. If the indentation of the +-- next token is not greater than the previous layout context, then +-- Haskell 98 says that the new layout context should be empty; that is +-- the lexer must generate {}. +-- +-- We are slightly more lenient than this: when the new context is started +-- by a 'do', then we allow the new context to be at the same indentation as +-- the previous context. This is what the 'strict' argument is for. +new_layout_context :: Bool -> Bool -> Token -> Action +new_layout_context strict gen_semic tok span _buf len = do + _ <- popLexState + (AI l _) <- getInput + let offset = srcLocCol (psRealLoc l) - len + ctx <- getContext + nondecreasing <- getBit NondecreasingIndentationBit + let strict' = strict || not nondecreasing + case ctx of + Layout prev_off _ : _ | + (strict' && prev_off >= offset || + not strict' && prev_off > offset) -> do + -- token is indented to the left of the previous context. + -- we must generate a {} sequence now. + pushLexState layout_left + return (L span tok) + _ -> do setContext (Layout offset gen_semic : ctx) + return (L span tok) + +do_layout_left :: Action +do_layout_left span _buf _len = do + _ <- popLexState + pushLexState bol -- we must be at the start of a line + return (L span ITvccurly) + +-- ----------------------------------------------------------------------------- +-- LINE pragmas + +setLineAndFile :: Int -> Action +setLineAndFile code (PsSpan span _) buf len = do + let src = lexemeToString buf (len - 1) -- drop trailing quotation mark + linenumLen = length $ head $ words src + linenum = parseUnsignedInteger buf linenumLen 10 octDecDigit + file = mkFastString $ go $ drop 1 $ dropWhile (/= '"') src + -- skip everything through first quotation mark to get to the filename + where go ('\\':c:cs) = c : go cs + go (c:cs) = c : go cs + go [] = [] + -- decode escapes in the filename. e.g. on Windows + -- when our filenames have backslashes in, gcc seems to + -- escape the backslashes. One symptom of not doing this + -- is that filenames in error messages look a bit strange: + -- C:\\foo\bar.hs + -- only the first backslash is doubled, because we apply + -- System.FilePath.normalise before printing out + -- filenames and it does not remove duplicate + -- backslashes after the drive letter (should it?). + resetAlrLastLoc file + setSrcLoc (mkRealSrcLoc file (fromIntegral linenum - 1) (srcSpanEndCol span)) + -- subtract one: the line number refers to the *following* line + addSrcFile file + _ <- popLexState + pushLexState code + lexToken + +setColumn :: Action +setColumn (PsSpan span _) buf len = do + let column = + case reads (lexemeToString buf len) of + [(column, _)] -> column + _ -> error "setColumn: expected integer" -- shouldn't happen + setSrcLoc (mkRealSrcLoc (srcSpanFile span) (srcSpanEndLine span) + (fromIntegral (column :: Integer))) + _ <- popLexState + lexToken + +alrInitialLoc :: FastString -> RealSrcSpan +alrInitialLoc file = mkRealSrcSpan loc loc + where -- This is a hack to ensure that the first line in a file + -- looks like it is after the initial location: + loc = mkRealSrcLoc file (-1) (-1) + +-- ----------------------------------------------------------------------------- +-- Options, includes and language pragmas. + +lex_string_prag :: (String -> Token) -> Action +lex_string_prag mkTok span _buf _len + = do input <- getInput + start <- getParsedLoc + tok <- go [] input + end <- getParsedLoc + return (L (mkPsSpan start end) tok) + where go acc input + = if isString input "#-}" + then do setInput input + return (mkTok (reverse acc)) + else case alexGetChar input of + Just (c,i) -> go (c:acc) i + Nothing -> err input + isString _ [] = True + isString i (x:xs) + = case alexGetChar i of + Just (c,i') | c == x -> isString i' xs + _other -> False + err (AI end _) = failLocMsgP (realSrcSpanStart (psRealSpan span)) (psRealLoc end) "unterminated options pragma" + + +-- ----------------------------------------------------------------------------- +-- Strings & Chars + +-- This stuff is horrible. I hates it. + +lex_string_tok :: Action +lex_string_tok span buf _len = do + tok <- lex_string "" + (AI end bufEnd) <- getInput + let + tok' = case tok of + ITprimstring _ bs -> ITprimstring (SourceText src) bs + ITstring _ s -> ITstring (SourceText src) s + _ -> panic "lex_string_tok" + src = lexemeToString buf (cur bufEnd - cur buf) + return (L (mkPsSpan (psSpanStart span) end) tok') + +lex_string :: String -> P Token +lex_string s = do + i <- getInput + case alexGetChar' i of + Nothing -> lit_error i + + Just ('"',i) -> do + setInput i + let s' = reverse s + magicHash <- getBit MagicHashBit + if magicHash + then do + i <- getInput + case alexGetChar' i of + Just ('#',i) -> do + setInput i + when (any (> '\xFF') s') $ do + pState <- getPState + addError (mkSrcSpanPs (last_loc pState)) $ text + "primitive string literal must contain only characters <= \'\\xFF\'" + return (ITprimstring (SourceText s') (unsafeMkByteString s')) + _other -> + return (ITstring (SourceText s') (mkFastString s')) + else + return (ITstring (SourceText s') (mkFastString s')) + + Just ('\\',i) + | Just ('&',i) <- next -> do + setInput i; lex_string s + | Just (c,i) <- next, c <= '\x7f' && is_space c -> do + -- is_space only works for <= '\x7f' (#3751, #5425) + setInput i; lex_stringgap s + where next = alexGetChar' i + + Just (c, i1) -> do + case c of + '\\' -> do setInput i1; c' <- lex_escape; lex_string (c':s) + c | isAny c -> do setInput i1; lex_string (c:s) + _other -> lit_error i + +lex_stringgap :: String -> P Token +lex_stringgap s = do + i <- getInput + c <- getCharOrFail i + case c of + '\\' -> lex_string s + c | c <= '\x7f' && is_space c -> lex_stringgap s + -- is_space only works for <= '\x7f' (#3751, #5425) + _other -> lit_error i + + +lex_char_tok :: Action +-- Here we are basically parsing character literals, such as 'x' or '\n' +-- but we additionally spot 'x and ''T, returning ITsimpleQuote and +-- ITtyQuote respectively, but WITHOUT CONSUMING the x or T part +-- (the parser does that). +-- So we have to do two characters of lookahead: when we see 'x we need to +-- see if there's a trailing quote +lex_char_tok span buf _len = do -- We've seen ' + i1 <- getInput -- Look ahead to first character + let loc = psSpanStart span + case alexGetChar' i1 of + Nothing -> lit_error i1 + + Just ('\'', i2@(AI end2 _)) -> do -- We've seen '' + setInput i2 + return (L (mkPsSpan loc end2) ITtyQuote) + + Just ('\\', i2@(AI _end2 _)) -> do -- We've seen 'backslash + setInput i2 + lit_ch <- lex_escape + i3 <- getInput + mc <- getCharOrFail i3 -- Trailing quote + if mc == '\'' then finish_char_tok buf loc lit_ch + else lit_error i3 + + Just (c, i2@(AI _end2 _)) + | not (isAny c) -> lit_error i1 + | otherwise -> + + -- We've seen 'x, where x is a valid character + -- (i.e. not newline etc) but not a quote or backslash + case alexGetChar' i2 of -- Look ahead one more character + Just ('\'', i3) -> do -- We've seen 'x' + setInput i3 + finish_char_tok buf loc c + _other -> do -- We've seen 'x not followed by quote + -- (including the possibility of EOF) + -- Just parse the quote only + let (AI end _) = i1 + return (L (mkPsSpan loc end) ITsimpleQuote) + +finish_char_tok :: StringBuffer -> PsLoc -> Char -> P (PsLocated Token) +finish_char_tok buf loc ch -- We've already seen the closing quote + -- Just need to check for trailing # + = do magicHash <- getBit MagicHashBit + i@(AI end bufEnd) <- getInput + let src = lexemeToString buf (cur bufEnd - cur buf) + if magicHash then do + case alexGetChar' i of + Just ('#',i@(AI end _)) -> do + setInput i + return (L (mkPsSpan loc end) + (ITprimchar (SourceText src) ch)) + _other -> + return (L (mkPsSpan loc end) + (ITchar (SourceText src) ch)) + else do + return (L (mkPsSpan loc end) (ITchar (SourceText src) ch)) + +isAny :: Char -> Bool +isAny c | c > '\x7f' = isPrint c + | otherwise = is_any c + +lex_escape :: P Char +lex_escape = do + i0 <- getInput + c <- getCharOrFail i0 + case c of + 'a' -> return '\a' + 'b' -> return '\b' + 'f' -> return '\f' + 'n' -> return '\n' + 'r' -> return '\r' + 't' -> return '\t' + 'v' -> return '\v' + '\\' -> return '\\' + '"' -> return '\"' + '\'' -> return '\'' + '^' -> do i1 <- getInput + c <- getCharOrFail i1 + if c >= '@' && c <= '_' + then return (chr (ord c - ord '@')) + else lit_error i1 + + 'x' -> readNum is_hexdigit 16 hexDigit + 'o' -> readNum is_octdigit 8 octDecDigit + x | is_decdigit x -> readNum2 is_decdigit 10 octDecDigit (octDecDigit x) + + c1 -> do + i <- getInput + case alexGetChar' i of + Nothing -> lit_error i0 + Just (c2,i2) -> + case alexGetChar' i2 of + Nothing -> do lit_error i0 + Just (c3,i3) -> + let str = [c1,c2,c3] in + case [ (c,rest) | (p,c) <- silly_escape_chars, + Just rest <- [stripPrefix p str] ] of + (escape_char,[]):_ -> do + setInput i3 + return escape_char + (escape_char,_:_):_ -> do + setInput i2 + return escape_char + [] -> lit_error i0 + +readNum :: (Char -> Bool) -> Int -> (Char -> Int) -> P Char +readNum is_digit base conv = do + i <- getInput + c <- getCharOrFail i + if is_digit c + then readNum2 is_digit base conv (conv c) + else lit_error i + +readNum2 :: (Char -> Bool) -> Int -> (Char -> Int) -> Int -> P Char +readNum2 is_digit base conv i = do + input <- getInput + read i input + where read i input = do + case alexGetChar' input of + Just (c,input') | is_digit c -> do + let i' = i*base + conv c + if i' > 0x10ffff + then setInput input >> lexError "numeric escape sequence out of range" + else read i' input' + _other -> do + setInput input; return (chr i) + + +silly_escape_chars :: [(String, Char)] +silly_escape_chars = [ + ("NUL", '\NUL'), + ("SOH", '\SOH'), + ("STX", '\STX'), + ("ETX", '\ETX'), + ("EOT", '\EOT'), + ("ENQ", '\ENQ'), + ("ACK", '\ACK'), + ("BEL", '\BEL'), + ("BS", '\BS'), + ("HT", '\HT'), + ("LF", '\LF'), + ("VT", '\VT'), + ("FF", '\FF'), + ("CR", '\CR'), + ("SO", '\SO'), + ("SI", '\SI'), + ("DLE", '\DLE'), + ("DC1", '\DC1'), + ("DC2", '\DC2'), + ("DC3", '\DC3'), + ("DC4", '\DC4'), + ("NAK", '\NAK'), + ("SYN", '\SYN'), + ("ETB", '\ETB'), + ("CAN", '\CAN'), + ("EM", '\EM'), + ("SUB", '\SUB'), + ("ESC", '\ESC'), + ("FS", '\FS'), + ("GS", '\GS'), + ("RS", '\RS'), + ("US", '\US'), + ("SP", '\SP'), + ("DEL", '\DEL') + ] + +-- before calling lit_error, ensure that the current input is pointing to +-- the position of the error in the buffer. This is so that we can report +-- a correct location to the user, but also so we can detect UTF-8 decoding +-- errors if they occur. +lit_error :: AlexInput -> P a +lit_error i = do setInput i; lexError "lexical error in string/character literal" + +getCharOrFail :: AlexInput -> P Char +getCharOrFail i = do + case alexGetChar' i of + Nothing -> lexError "unexpected end-of-file in string/character literal" + Just (c,i) -> do setInput i; return c + +-- ----------------------------------------------------------------------------- +-- QuasiQuote + +lex_qquasiquote_tok :: Action +lex_qquasiquote_tok span buf len = do + let (qual, quoter) = splitQualName (stepOn buf) (len - 2) False + quoteStart <- getParsedLoc + quote <- lex_quasiquote (psRealLoc quoteStart) "" + end <- getParsedLoc + return (L (mkPsSpan (psSpanStart span) end) + (ITqQuasiQuote (qual, + quoter, + mkFastString (reverse quote), + mkPsSpan quoteStart end))) + +lex_quasiquote_tok :: Action +lex_quasiquote_tok span buf len = do + let quoter = tail (lexemeToString buf (len - 1)) + -- 'tail' drops the initial '[', + -- while the -1 drops the trailing '|' + quoteStart <- getParsedLoc + quote <- lex_quasiquote (psRealLoc quoteStart) "" + end <- getParsedLoc + return (L (mkPsSpan (psSpanStart span) end) + (ITquasiQuote (mkFastString quoter, + mkFastString (reverse quote), + mkPsSpan quoteStart end))) + +lex_quasiquote :: RealSrcLoc -> String -> P String +lex_quasiquote start s = do + i <- getInput + case alexGetChar' i of + Nothing -> quasiquote_error start + + -- NB: The string "|]" terminates the quasiquote, + -- with absolutely no escaping. See the extensive + -- discussion on #5348 for why there is no + -- escape handling. + Just ('|',i) + | Just (']',i) <- alexGetChar' i + -> do { setInput i; return s } + + Just (c, i) -> do + setInput i; lex_quasiquote start (c : s) + +quasiquote_error :: RealSrcLoc -> P a +quasiquote_error start = do + (AI end buf) <- getInput + reportLexError start (psRealLoc end) buf "unterminated quasiquotation" + +-- ----------------------------------------------------------------------------- +-- Warnings + +warnTab :: Action +warnTab srcspan _buf _len = do + addTabWarning (psRealSpan srcspan) + lexToken + +warnThen :: WarningFlag -> SDoc -> Action -> Action +warnThen option warning action srcspan buf len = do + addWarning option (RealSrcSpan (psRealSpan srcspan) Nothing) warning + action srcspan buf len + +-- ----------------------------------------------------------------------------- +-- The Parse Monad + +-- | Do we want to generate ';' layout tokens? In some cases we just want to +-- generate '}', e.g. in MultiWayIf we don't need ';'s because '|' separates +-- alternatives (unlike a `case` expression where we need ';' to as a separator +-- between alternatives). +type GenSemic = Bool + +generateSemic, dontGenerateSemic :: GenSemic +generateSemic = True +dontGenerateSemic = False + +data LayoutContext + = NoLayout + | Layout !Int !GenSemic + deriving Show + +-- | The result of running a parser. +data ParseResult a + = POk -- ^ The parser has consumed a (possibly empty) prefix + -- of the input and produced a result. Use 'getMessages' + -- to check for accumulated warnings and non-fatal errors. + PState -- ^ The resulting parsing state. Can be used to resume parsing. + a -- ^ The resulting value. + | PFailed -- ^ The parser has consumed a (possibly empty) prefix + -- of the input and failed. + PState -- ^ The parsing state right before failure, including the fatal + -- parse error. 'getMessages' and 'getErrorMessages' must return + -- a non-empty bag of errors. + +-- | Test whether a 'WarningFlag' is set +warnopt :: WarningFlag -> ParserFlags -> Bool +warnopt f options = f `EnumSet.member` pWarningFlags options + +-- | The subset of the 'DynFlags' used by the parser. +-- See 'mkParserFlags' or 'mkParserFlags'' for ways to construct this. +data ParserFlags = ParserFlags { + pWarningFlags :: EnumSet WarningFlag + , pThisPackage :: UnitId -- ^ key of package currently being compiled + , pExtsBitmap :: !ExtsBitmap -- ^ bitmap of permitted extensions + } + +data PState = PState { + buffer :: StringBuffer, + options :: ParserFlags, + -- This needs to take DynFlags as an argument until + -- we have a fix for #10143 + messages :: DynFlags -> Messages, + tab_first :: Maybe RealSrcSpan, -- pos of first tab warning in the file + tab_count :: !Int, -- number of tab warnings in the file + last_tk :: Maybe Token, + last_loc :: PsSpan, -- pos of previous token + last_len :: !Int, -- len of previous token + loc :: PsLoc, -- current loc (end of prev token + 1) + context :: [LayoutContext], + lex_state :: [Int], + srcfiles :: [FastString], + -- Used in the alternative layout rule: + -- These tokens are the next ones to be sent out. They are + -- just blindly emitted, without the rule looking at them again: + alr_pending_implicit_tokens :: [PsLocated Token], + -- This is the next token to be considered or, if it is Nothing, + -- we need to get the next token from the input stream: + alr_next_token :: Maybe (PsLocated Token), + -- This is what we consider to be the location of the last token + -- emitted: + alr_last_loc :: PsSpan, + -- The stack of layout contexts: + alr_context :: [ALRContext], + -- Are we expecting a '{'? If it's Just, then the ALRLayout tells + -- us what sort of layout the '{' will open: + alr_expecting_ocurly :: Maybe ALRLayout, + -- Have we just had the '}' for a let block? If so, than an 'in' + -- token doesn't need to close anything: + alr_justClosedExplicitLetBlock :: Bool, + + -- The next three are used to implement Annotations giving the + -- locations of 'noise' tokens in the source, so that users of + -- the GHC API can do source to source conversions. + -- See note [Api annotations] in GHC.Parser.Annotation + annotations :: [(ApiAnnKey,[RealSrcSpan])], + eof_pos :: Maybe RealSrcSpan, + comment_q :: [RealLocated AnnotationComment], + annotations_comments :: [(RealSrcSpan,[RealLocated AnnotationComment])] + } + -- last_loc and last_len are used when generating error messages, + -- and in pushCurrentContext only. Sigh, if only Happy passed the + -- current token to happyError, we could at least get rid of last_len. + -- Getting rid of last_loc would require finding another way to + -- implement pushCurrentContext (which is only called from one place). + +data ALRContext = ALRNoLayout Bool{- does it contain commas? -} + Bool{- is it a 'let' block? -} + | ALRLayout ALRLayout Int +data ALRLayout = ALRLayoutLet + | ALRLayoutWhere + | ALRLayoutOf + | ALRLayoutDo + +-- | The parsing monad, isomorphic to @StateT PState Maybe@. +newtype P a = P { unP :: PState -> ParseResult a } + +instance Functor P where + fmap = liftM + +instance Applicative P where + pure = returnP + (<*>) = ap + +instance Monad P where + (>>=) = thenP + +returnP :: a -> P a +returnP a = a `seq` (P $ \s -> POk s a) + +thenP :: P a -> (a -> P b) -> P b +(P m) `thenP` k = P $ \ s -> + case m s of + POk s1 a -> (unP (k a)) s1 + PFailed s1 -> PFailed s1 + +failMsgP :: String -> P a +failMsgP msg = do + pState <- getPState + addFatalError (mkSrcSpanPs (last_loc pState)) (text msg) + +failLocMsgP :: RealSrcLoc -> RealSrcLoc -> String -> P a +failLocMsgP loc1 loc2 str = + addFatalError (RealSrcSpan (mkRealSrcSpan loc1 loc2) Nothing) (text str) + +getPState :: P PState +getPState = P $ \s -> POk s s + +withThisPackage :: (UnitId -> a) -> P a +withThisPackage f = P $ \s@(PState{options = o}) -> POk s (f (pThisPackage o)) + +getExts :: P ExtsBitmap +getExts = P $ \s -> POk s (pExtsBitmap . options $ s) + +setExts :: (ExtsBitmap -> ExtsBitmap) -> P () +setExts f = P $ \s -> POk s { + options = + let p = options s + in p { pExtsBitmap = f (pExtsBitmap p) } + } () + +setSrcLoc :: RealSrcLoc -> P () +setSrcLoc new_loc = + P $ \s@(PState{ loc = PsLoc _ buf_loc }) -> + POk s{ loc = PsLoc new_loc buf_loc } () + +getRealSrcLoc :: P RealSrcLoc +getRealSrcLoc = P $ \s@(PState{ loc=loc }) -> POk s (psRealLoc loc) + +getParsedLoc :: P PsLoc +getParsedLoc = P $ \s@(PState{ loc=loc }) -> POk s loc + +addSrcFile :: FastString -> P () +addSrcFile f = P $ \s -> POk s{ srcfiles = f : srcfiles s } () + +setEofPos :: RealSrcSpan -> P () +setEofPos span = P $ \s -> POk s{ eof_pos = Just span } () + +setLastToken :: PsSpan -> Int -> P () +setLastToken loc len = P $ \s -> POk s { + last_loc=loc, + last_len=len + } () + +setLastTk :: Token -> P () +setLastTk tk = P $ \s -> POk s { last_tk = Just tk } () + +getLastTk :: P (Maybe Token) +getLastTk = P $ \s@(PState { last_tk = last_tk }) -> POk s last_tk + +data AlexInput = AI PsLoc StringBuffer + +{- +Note [Unicode in Alex] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Although newer versions of Alex support unicode, this grammar is processed with +the old style '--latin1' behaviour. This means that when implementing the +functions + + alexGetByte :: AlexInput -> Maybe (Word8,AlexInput) + alexInputPrevChar :: AlexInput -> Char + +which Alex uses to take apart our 'AlexInput', we must + + * return a latin1 character in the 'Word8' that 'alexGetByte' expects + * return a latin1 character in 'alexInputPrevChar'. + +We handle this in 'adjustChar' by squishing entire classes of unicode +characters into single bytes. +-} + +{-# INLINE adjustChar #-} +adjustChar :: Char -> Word8 +adjustChar c = fromIntegral $ ord adj_c + where non_graphic = '\x00' + upper = '\x01' + lower = '\x02' + digit = '\x03' + symbol = '\x04' + space = '\x05' + other_graphic = '\x06' + uniidchar = '\x07' + + adj_c + | c <= '\x07' = non_graphic + | c <= '\x7f' = c + -- Alex doesn't handle Unicode, so when Unicode + -- character is encountered we output these values + -- with the actual character value hidden in the state. + | otherwise = + -- NB: The logic behind these definitions is also reflected + -- in basicTypes/Lexeme.hs + -- Any changes here should likely be reflected there. + + case generalCategory c of + UppercaseLetter -> upper + LowercaseLetter -> lower + TitlecaseLetter -> upper + ModifierLetter -> uniidchar -- see #10196 + OtherLetter -> lower -- see #1103 + NonSpacingMark -> uniidchar -- see #7650 + SpacingCombiningMark -> other_graphic + EnclosingMark -> other_graphic + DecimalNumber -> digit + LetterNumber -> other_graphic + OtherNumber -> digit -- see #4373 + ConnectorPunctuation -> symbol + DashPunctuation -> symbol + OpenPunctuation -> other_graphic + ClosePunctuation -> other_graphic + InitialQuote -> other_graphic + FinalQuote -> other_graphic + OtherPunctuation -> symbol + MathSymbol -> symbol + CurrencySymbol -> symbol + ModifierSymbol -> symbol + OtherSymbol -> symbol + Space -> space + _other -> non_graphic + +-- Getting the previous 'Char' isn't enough here - we need to convert it into +-- the same format that 'alexGetByte' would have produced. +-- +-- See Note [Unicode in Alex] and #13986. +alexInputPrevChar :: AlexInput -> Char +alexInputPrevChar (AI _ buf) = chr (fromIntegral (adjustChar pc)) + where pc = prevChar buf '\n' + +-- backwards compatibility for Alex 2.x +alexGetChar :: AlexInput -> Maybe (Char,AlexInput) +alexGetChar inp = case alexGetByte inp of + Nothing -> Nothing + Just (b,i) -> c `seq` Just (c,i) + where c = chr $ fromIntegral b + +-- See Note [Unicode in Alex] +alexGetByte :: AlexInput -> Maybe (Word8,AlexInput) +alexGetByte (AI loc s) + | atEnd s = Nothing + | otherwise = byte `seq` loc' `seq` s' `seq` + --trace (show (ord c)) $ + Just (byte, (AI loc' s')) + where (c,s') = nextChar s + loc' = advancePsLoc loc c + byte = adjustChar c + +-- This version does not squash unicode characters, it is used when +-- lexing strings. +alexGetChar' :: AlexInput -> Maybe (Char,AlexInput) +alexGetChar' (AI loc s) + | atEnd s = Nothing + | otherwise = c `seq` loc' `seq` s' `seq` + --trace (show (ord c)) $ + Just (c, (AI loc' s')) + where (c,s') = nextChar s + loc' = advancePsLoc loc c + +getInput :: P AlexInput +getInput = P $ \s@PState{ loc=l, buffer=b } -> POk s (AI l b) + +setInput :: AlexInput -> P () +setInput (AI l b) = P $ \s -> POk s{ loc=l, buffer=b } () + +nextIsEOF :: P Bool +nextIsEOF = do + AI _ s <- getInput + return $ atEnd s + +pushLexState :: Int -> P () +pushLexState ls = P $ \s@PState{ lex_state=l } -> POk s{lex_state=ls:l} () + +popLexState :: P Int +popLexState = P $ \s@PState{ lex_state=ls:l } -> POk s{ lex_state=l } ls + +getLexState :: P Int +getLexState = P $ \s@PState{ lex_state=ls:_ } -> POk s ls + +popNextToken :: P (Maybe (PsLocated Token)) +popNextToken + = P $ \s@PState{ alr_next_token = m } -> + POk (s {alr_next_token = Nothing}) m + +activeContext :: P Bool +activeContext = do + ctxt <- getALRContext + expc <- getAlrExpectingOCurly + impt <- implicitTokenPending + case (ctxt,expc) of + ([],Nothing) -> return impt + _other -> return True + +resetAlrLastLoc :: FastString -> P () +resetAlrLastLoc file = + P $ \s@(PState {alr_last_loc = PsSpan _ buf_span}) -> + POk s{ alr_last_loc = PsSpan (alrInitialLoc file) buf_span } () + +setAlrLastLoc :: PsSpan -> P () +setAlrLastLoc l = P $ \s -> POk (s {alr_last_loc = l}) () + +getAlrLastLoc :: P PsSpan +getAlrLastLoc = P $ \s@(PState {alr_last_loc = l}) -> POk s l + +getALRContext :: P [ALRContext] +getALRContext = P $ \s@(PState {alr_context = cs}) -> POk s cs + +setALRContext :: [ALRContext] -> P () +setALRContext cs = P $ \s -> POk (s {alr_context = cs}) () + +getJustClosedExplicitLetBlock :: P Bool +getJustClosedExplicitLetBlock + = P $ \s@(PState {alr_justClosedExplicitLetBlock = b}) -> POk s b + +setJustClosedExplicitLetBlock :: Bool -> P () +setJustClosedExplicitLetBlock b + = P $ \s -> POk (s {alr_justClosedExplicitLetBlock = b}) () + +setNextToken :: PsLocated Token -> P () +setNextToken t = P $ \s -> POk (s {alr_next_token = Just t}) () + +implicitTokenPending :: P Bool +implicitTokenPending + = P $ \s@PState{ alr_pending_implicit_tokens = ts } -> + case ts of + [] -> POk s False + _ -> POk s True + +popPendingImplicitToken :: P (Maybe (PsLocated Token)) +popPendingImplicitToken + = P $ \s@PState{ alr_pending_implicit_tokens = ts } -> + case ts of + [] -> POk s Nothing + (t : ts') -> POk (s {alr_pending_implicit_tokens = ts'}) (Just t) + +setPendingImplicitTokens :: [PsLocated Token] -> P () +setPendingImplicitTokens ts = P $ \s -> POk (s {alr_pending_implicit_tokens = ts}) () + +getAlrExpectingOCurly :: P (Maybe ALRLayout) +getAlrExpectingOCurly = P $ \s@(PState {alr_expecting_ocurly = b}) -> POk s b + +setAlrExpectingOCurly :: Maybe ALRLayout -> P () +setAlrExpectingOCurly b = P $ \s -> POk (s {alr_expecting_ocurly = b}) () + +-- | For reasons of efficiency, boolean parsing flags (eg, language extensions +-- or whether we are currently in a @RULE@ pragma) are represented by a bitmap +-- stored in a @Word64@. +type ExtsBitmap = Word64 + +xbit :: ExtBits -> ExtsBitmap +xbit = bit . fromEnum + +xtest :: ExtBits -> ExtsBitmap -> Bool +xtest ext xmap = testBit xmap (fromEnum ext) + +-- | Various boolean flags, mostly language extensions, that impact lexing and +-- parsing. Note that a handful of these can change during lexing/parsing. +data ExtBits + -- Flags that are constant once parsing starts + = FfiBit + | InterruptibleFfiBit + | CApiFfiBit + | ArrowsBit + | ThBit + | ThQuotesBit + | IpBit + | OverloadedLabelsBit -- #x overloaded labels + | ExplicitForallBit -- the 'forall' keyword + | BangPatBit -- Tells the parser to understand bang-patterns + -- (doesn't affect the lexer) + | PatternSynonymsBit -- pattern synonyms + | HaddockBit-- Lex and parse Haddock comments + | MagicHashBit -- "#" in both functions and operators + | RecursiveDoBit -- mdo + | UnicodeSyntaxBit -- the forall symbol, arrow symbols, etc + | UnboxedTuplesBit -- (# and #) + | UnboxedSumsBit -- (# and #) + | DatatypeContextsBit + | MonadComprehensionsBit + | TransformComprehensionsBit + | QqBit -- enable quasiquoting + | RawTokenStreamBit -- producing a token stream with all comments included + | AlternativeLayoutRuleBit + | ALRTransitionalBit + | RelaxedLayoutBit + | NondecreasingIndentationBit + | SafeHaskellBit + | TraditionalRecordSyntaxBit + | ExplicitNamespacesBit + | LambdaCaseBit + | BinaryLiteralsBit + | NegativeLiteralsBit + | HexFloatLiteralsBit + | TypeApplicationsBit + | StaticPointersBit + | NumericUnderscoresBit + | StarIsTypeBit + | BlockArgumentsBit + | NPlusKPatternsBit + | DoAndIfThenElseBit + | MultiWayIfBit + | GadtSyntaxBit + | ImportQualifiedPostBit + + -- Flags that are updated once parsing starts + | InRulePragBit + | InNestedCommentBit -- See Note [Nested comment line pragmas] + | UsePosPragsBit + -- ^ If this is enabled, '{-# LINE ... -#}' and '{-# COLUMN ... #-}' + -- update the internal position. Otherwise, those pragmas are lexed as + -- tokens of their own. + deriving Enum + + + + + +-- PState for parsing options pragmas +-- +pragState :: DynFlags -> StringBuffer -> RealSrcLoc -> PState +pragState dynflags buf loc = (mkPState dynflags buf loc) { + lex_state = [bol, option_prags, 0] + } + +{-# INLINE mkParserFlags' #-} +mkParserFlags' + :: EnumSet WarningFlag -- ^ warnings flags enabled + -> EnumSet LangExt.Extension -- ^ permitted language extensions enabled + -> UnitId -- ^ key of package currently being compiled + -> Bool -- ^ are safe imports on? + -> Bool -- ^ keeping Haddock comment tokens + -> Bool -- ^ keep regular comment tokens + + -> Bool + -- ^ If this is enabled, '{-# LINE ... -#}' and '{-# COLUMN ... #-}' update + -- the internal position kept by the parser. Otherwise, those pragmas are + -- lexed as 'ITline_prag' and 'ITcolumn_prag' tokens. + + -> ParserFlags +-- ^ Given exactly the information needed, set up the 'ParserFlags' +mkParserFlags' warningFlags extensionFlags thisPackage + safeImports isHaddock rawTokStream usePosPrags = + ParserFlags { + pWarningFlags = warningFlags + , pThisPackage = thisPackage + , pExtsBitmap = safeHaskellBit .|. langExtBits .|. optBits + } + where + safeHaskellBit = SafeHaskellBit `setBitIf` safeImports + langExtBits = + FfiBit `xoptBit` LangExt.ForeignFunctionInterface + .|. InterruptibleFfiBit `xoptBit` LangExt.InterruptibleFFI + .|. CApiFfiBit `xoptBit` LangExt.CApiFFI + .|. ArrowsBit `xoptBit` LangExt.Arrows + .|. ThBit `xoptBit` LangExt.TemplateHaskell + .|. ThQuotesBit `xoptBit` LangExt.TemplateHaskellQuotes + .|. QqBit `xoptBit` LangExt.QuasiQuotes + .|. IpBit `xoptBit` LangExt.ImplicitParams + .|. OverloadedLabelsBit `xoptBit` LangExt.OverloadedLabels + .|. ExplicitForallBit `xoptBit` LangExt.ExplicitForAll + .|. BangPatBit `xoptBit` LangExt.BangPatterns + .|. MagicHashBit `xoptBit` LangExt.MagicHash + .|. RecursiveDoBit `xoptBit` LangExt.RecursiveDo + .|. UnicodeSyntaxBit `xoptBit` LangExt.UnicodeSyntax + .|. UnboxedTuplesBit `xoptBit` LangExt.UnboxedTuples + .|. UnboxedSumsBit `xoptBit` LangExt.UnboxedSums + .|. DatatypeContextsBit `xoptBit` LangExt.DatatypeContexts + .|. TransformComprehensionsBit `xoptBit` LangExt.TransformListComp + .|. MonadComprehensionsBit `xoptBit` LangExt.MonadComprehensions + .|. AlternativeLayoutRuleBit `xoptBit` LangExt.AlternativeLayoutRule + .|. ALRTransitionalBit `xoptBit` LangExt.AlternativeLayoutRuleTransitional + .|. RelaxedLayoutBit `xoptBit` LangExt.RelaxedLayout + .|. NondecreasingIndentationBit `xoptBit` LangExt.NondecreasingIndentation + .|. TraditionalRecordSyntaxBit `xoptBit` LangExt.TraditionalRecordSyntax + .|. ExplicitNamespacesBit `xoptBit` LangExt.ExplicitNamespaces + .|. LambdaCaseBit `xoptBit` LangExt.LambdaCase + .|. BinaryLiteralsBit `xoptBit` LangExt.BinaryLiterals + .|. NegativeLiteralsBit `xoptBit` LangExt.NegativeLiterals + .|. HexFloatLiteralsBit `xoptBit` LangExt.HexFloatLiterals + .|. PatternSynonymsBit `xoptBit` LangExt.PatternSynonyms + .|. TypeApplicationsBit `xoptBit` LangExt.TypeApplications + .|. StaticPointersBit `xoptBit` LangExt.StaticPointers + .|. NumericUnderscoresBit `xoptBit` LangExt.NumericUnderscores + .|. StarIsTypeBit `xoptBit` LangExt.StarIsType + .|. BlockArgumentsBit `xoptBit` LangExt.BlockArguments + .|. NPlusKPatternsBit `xoptBit` LangExt.NPlusKPatterns + .|. DoAndIfThenElseBit `xoptBit` LangExt.DoAndIfThenElse + .|. MultiWayIfBit `xoptBit` LangExt.MultiWayIf + .|. GadtSyntaxBit `xoptBit` LangExt.GADTSyntax + .|. ImportQualifiedPostBit `xoptBit` LangExt.ImportQualifiedPost + optBits = + HaddockBit `setBitIf` isHaddock + .|. RawTokenStreamBit `setBitIf` rawTokStream + .|. UsePosPragsBit `setBitIf` usePosPrags + + xoptBit bit ext = bit `setBitIf` EnumSet.member ext extensionFlags + + setBitIf :: ExtBits -> Bool -> ExtsBitmap + b `setBitIf` cond | cond = xbit b + | otherwise = 0 + +-- | Extracts the flag information needed for parsing +mkParserFlags :: DynFlags -> ParserFlags +mkParserFlags = + mkParserFlags' + <$> DynFlags.warningFlags + <*> DynFlags.extensionFlags + <*> DynFlags.thisPackage + <*> safeImportsOn + <*> gopt Opt_Haddock + <*> gopt Opt_KeepRawTokenStream + <*> const True + +-- | Creates a parse state from a 'DynFlags' value +mkPState :: DynFlags -> StringBuffer -> RealSrcLoc -> PState +mkPState flags = mkPStatePure (mkParserFlags flags) + +-- | Creates a parse state from a 'ParserFlags' value +mkPStatePure :: ParserFlags -> StringBuffer -> RealSrcLoc -> PState +mkPStatePure options buf loc = + PState { + buffer = buf, + options = options, + messages = const emptyMessages, + tab_first = Nothing, + tab_count = 0, + last_tk = Nothing, + last_loc = mkPsSpan init_loc init_loc, + last_len = 0, + loc = init_loc, + context = [], + lex_state = [bol, 0], + srcfiles = [], + alr_pending_implicit_tokens = [], + alr_next_token = Nothing, + alr_last_loc = PsSpan (alrInitialLoc (fsLit "<no file>")) (BufSpan (BufPos 0) (BufPos 0)), + alr_context = [], + alr_expecting_ocurly = Nothing, + alr_justClosedExplicitLetBlock = False, + annotations = [], + eof_pos = Nothing, + comment_q = [], + annotations_comments = [] + } + where init_loc = PsLoc loc (BufPos 0) + +-- | An mtl-style class for monads that support parsing-related operations. +-- For example, sometimes we make a second pass over the parsing results to validate, +-- disambiguate, or rearrange them, and we do so in the PV monad which cannot consume +-- input but can report parsing errors, check for extension bits, and accumulate +-- parsing annotations. Both P and PV are instances of MonadP. +-- +-- MonadP grants us convenient overloading. The other option is to have separate operations +-- for each monad: addErrorP vs addErrorPV, getBitP vs getBitPV, and so on. +-- +class Monad m => MonadP m where + -- | Add a non-fatal error. Use this when the parser can produce a result + -- despite the error. + -- + -- For example, when GHC encounters a @forall@ in a type, + -- but @-XExplicitForAll@ is disabled, the parser constructs @ForAllTy@ + -- as if @-XExplicitForAll@ was enabled, adding a non-fatal error to + -- the accumulator. + -- + -- Control flow wise, non-fatal errors act like warnings: they are added + -- to the accumulator and parsing continues. This allows GHC to report + -- more than one parse error per file. + -- + addError :: SrcSpan -> SDoc -> m () + -- | Add a warning to the accumulator. + -- Use 'getMessages' to get the accumulated warnings. + addWarning :: WarningFlag -> SrcSpan -> SDoc -> m () + -- | Add a fatal error. This will be the last error reported by the parser, and + -- the parser will not produce any result, ending in a 'PFailed' state. + addFatalError :: SrcSpan -> SDoc -> m a + -- | Check if a given flag is currently set in the bitmap. + getBit :: ExtBits -> m Bool + -- | Given a location and a list of AddAnn, apply them all to the location. + addAnnotation :: SrcSpan -- SrcSpan of enclosing AST construct + -> AnnKeywordId -- The first two parameters are the key + -> SrcSpan -- The location of the keyword itself + -> m () + +appendError + :: SrcSpan + -> SDoc + -> (DynFlags -> Messages) + -> (DynFlags -> Messages) +appendError srcspan msg m = + \d -> + let (ws, es) = m d + errormsg = mkErrMsg d srcspan alwaysQualify msg + es' = es `snocBag` errormsg + in (ws, es') + +appendWarning + :: ParserFlags + -> WarningFlag + -> SrcSpan + -> SDoc + -> (DynFlags -> Messages) + -> (DynFlags -> Messages) +appendWarning o option srcspan warning m = + \d -> + let (ws, es) = m d + warning' = makeIntoWarning (Reason option) $ + mkWarnMsg d srcspan alwaysQualify warning + ws' = if warnopt option o then ws `snocBag` warning' else ws + in (ws', es) + +instance MonadP P where + addError srcspan msg + = P $ \s@PState{messages=m} -> + POk s{messages=appendError srcspan msg m} () + addWarning option srcspan warning + = P $ \s@PState{messages=m, options=o} -> + POk s{messages=appendWarning o option srcspan warning m} () + addFatalError span msg = + addError span msg >> P PFailed + getBit ext = P $ \s -> let b = ext `xtest` pExtsBitmap (options s) + in b `seq` POk s b + addAnnotation (RealSrcSpan l _) a (RealSrcSpan v _) = do + addAnnotationOnly l a v + allocateCommentsP l + addAnnotation _ _ _ = return () + +addAnnsAt :: MonadP m => SrcSpan -> [AddAnn] -> m () +addAnnsAt l = mapM_ (\(AddAnn a v) -> addAnnotation l a v) + +addTabWarning :: RealSrcSpan -> P () +addTabWarning srcspan + = P $ \s@PState{tab_first=tf, tab_count=tc, options=o} -> + let tf' = if isJust tf then tf else Just srcspan + tc' = tc + 1 + s' = if warnopt Opt_WarnTabs o + then s{tab_first = tf', tab_count = tc'} + else s + in POk s' () + +mkTabWarning :: PState -> DynFlags -> Maybe ErrMsg +mkTabWarning PState{tab_first=tf, tab_count=tc} d = + let middle = if tc == 1 + then text "" + else text ", and in" <+> speakNOf (tc - 1) (text "further location") + message = text "Tab character found here" + <> middle + <> text "." + $+$ text "Please use spaces instead." + in fmap (\s -> makeIntoWarning (Reason Opt_WarnTabs) $ + mkWarnMsg d (RealSrcSpan s Nothing) alwaysQualify message) tf + +-- | Get a bag of the errors that have been accumulated so far. +-- Does not take -Werror into account. +getErrorMessages :: PState -> DynFlags -> ErrorMessages +getErrorMessages PState{messages=m} d = + let (_, es) = m d in es + +-- | Get the warnings and errors accumulated so far. +-- Does not take -Werror into account. +getMessages :: PState -> DynFlags -> Messages +getMessages p@PState{messages=m} d = + let (ws, es) = m d + tabwarning = mkTabWarning p d + ws' = maybe ws (`consBag` ws) tabwarning + in (ws', es) + +getContext :: P [LayoutContext] +getContext = P $ \s@PState{context=ctx} -> POk s ctx + +setContext :: [LayoutContext] -> P () +setContext ctx = P $ \s -> POk s{context=ctx} () + +popContext :: P () +popContext = P $ \ s@(PState{ buffer = buf, options = o, context = ctx, + last_len = len, last_loc = last_loc }) -> + case ctx of + (_:tl) -> + POk s{ context = tl } () + [] -> + unP (addFatalError (mkSrcSpanPs last_loc) (srcParseErr o buf len)) s + +-- Push a new layout context at the indentation of the last token read. +pushCurrentContext :: GenSemic -> P () +pushCurrentContext gen_semic = P $ \ s@PState{ last_loc=loc, context=ctx } -> + POk s{context = Layout (srcSpanStartCol (psRealSpan loc)) gen_semic : ctx} () + +-- This is only used at the outer level of a module when the 'module' keyword is +-- missing. +pushModuleContext :: P () +pushModuleContext = pushCurrentContext generateSemic + +getOffside :: P (Ordering, Bool) +getOffside = P $ \s@PState{last_loc=loc, context=stk} -> + let offs = srcSpanStartCol (psRealSpan loc) in + let ord = case stk of + Layout n gen_semic : _ -> + --trace ("layout: " ++ show n ++ ", offs: " ++ show offs) $ + (compare offs n, gen_semic) + _ -> + (GT, dontGenerateSemic) + in POk s ord + +-- --------------------------------------------------------------------------- +-- Construct a parse error + +srcParseErr + :: ParserFlags + -> StringBuffer -- current buffer (placed just after the last token) + -> Int -- length of the previous token + -> MsgDoc +srcParseErr options buf len + = if null token + then text "parse error (possibly incorrect indentation or mismatched brackets)" + else text "parse error on input" <+> quotes (text token) + $$ ppWhen (not th_enabled && token == "$") -- #7396 + (text "Perhaps you intended to use TemplateHaskell") + $$ ppWhen (token == "<-") + (if mdoInLast100 + then text "Perhaps you intended to use RecursiveDo" + else text "Perhaps this statement should be within a 'do' block?") + $$ ppWhen (token == "=" && doInLast100) -- #15849 + (text "Perhaps you need a 'let' in a 'do' block?" + $$ text "e.g. 'let x = 5' instead of 'x = 5'") + $$ ppWhen (not ps_enabled && pattern == "pattern ") -- #12429 + (text "Perhaps you intended to use PatternSynonyms") + where token = lexemeToString (offsetBytes (-len) buf) len + pattern = decodePrevNChars 8 buf + last100 = decodePrevNChars 100 buf + doInLast100 = "do" `isInfixOf` last100 + mdoInLast100 = "mdo" `isInfixOf` last100 + th_enabled = ThQuotesBit `xtest` pExtsBitmap options + ps_enabled = PatternSynonymsBit `xtest` pExtsBitmap options + +-- Report a parse failure, giving the span of the previous token as +-- the location of the error. This is the entry point for errors +-- detected during parsing. +srcParseFail :: P a +srcParseFail = P $ \s@PState{ buffer = buf, options = o, last_len = len, + last_loc = last_loc } -> + unP (addFatalError (mkSrcSpanPs last_loc) (srcParseErr o buf len)) s + +-- A lexical error is reported at a particular position in the source file, +-- not over a token range. +lexError :: String -> P a +lexError str = do + loc <- getRealSrcLoc + (AI end buf) <- getInput + reportLexError loc (psRealLoc end) buf str + +-- ----------------------------------------------------------------------------- +-- This is the top-level function: called from the parser each time a +-- new token is to be read from the input. + +lexer, lexerDbg :: Bool -> (Located Token -> P a) -> P a + +lexer queueComments cont = do + alr <- getBit AlternativeLayoutRuleBit + let lexTokenFun = if alr then lexTokenAlr else lexToken + (L span tok) <- lexTokenFun + --trace ("token: " ++ show tok) $ do + + if (queueComments && isDocComment tok) + then queueComment (L (psRealSpan span) tok) + else return () + + if (queueComments && isComment tok) + then queueComment (L (psRealSpan span) tok) >> lexer queueComments cont + else cont (L (mkSrcSpanPs span) tok) + +-- Use this instead of 'lexer' in GHC.Parser to dump the tokens for debugging. +lexerDbg queueComments cont = lexer queueComments contDbg + where + contDbg tok = trace ("token: " ++ show (unLoc tok)) (cont tok) + +lexTokenAlr :: P (PsLocated Token) +lexTokenAlr = do mPending <- popPendingImplicitToken + t <- case mPending of + Nothing -> + do mNext <- popNextToken + t <- case mNext of + Nothing -> lexToken + Just next -> return next + alternativeLayoutRuleToken t + Just t -> + return t + setAlrLastLoc (getLoc t) + case unLoc t of + ITwhere -> setAlrExpectingOCurly (Just ALRLayoutWhere) + ITlet -> setAlrExpectingOCurly (Just ALRLayoutLet) + ITof -> setAlrExpectingOCurly (Just ALRLayoutOf) + ITlcase -> setAlrExpectingOCurly (Just ALRLayoutOf) + ITdo -> setAlrExpectingOCurly (Just ALRLayoutDo) + ITmdo -> setAlrExpectingOCurly (Just ALRLayoutDo) + ITrec -> setAlrExpectingOCurly (Just ALRLayoutDo) + _ -> return () + return t + +alternativeLayoutRuleToken :: PsLocated Token -> P (PsLocated Token) +alternativeLayoutRuleToken t + = do context <- getALRContext + lastLoc <- getAlrLastLoc + mExpectingOCurly <- getAlrExpectingOCurly + transitional <- getBit ALRTransitionalBit + justClosedExplicitLetBlock <- getJustClosedExplicitLetBlock + setJustClosedExplicitLetBlock False + let thisLoc = getLoc t + thisCol = srcSpanStartCol (psRealSpan thisLoc) + newLine = srcSpanStartLine (psRealSpan thisLoc) > srcSpanEndLine (psRealSpan lastLoc) + case (unLoc t, context, mExpectingOCurly) of + -- This case handles a GHC extension to the original H98 + -- layout rule... + (ITocurly, _, Just alrLayout) -> + do setAlrExpectingOCurly Nothing + let isLet = case alrLayout of + ALRLayoutLet -> True + _ -> False + setALRContext (ALRNoLayout (containsCommas ITocurly) isLet : context) + return t + -- ...and makes this case unnecessary + {- + -- I think our implicit open-curly handling is slightly + -- different to John's, in how it interacts with newlines + -- and "in" + (ITocurly, _, Just _) -> + do setAlrExpectingOCurly Nothing + setNextToken t + lexTokenAlr + -} + (_, ALRLayout _ col : _ls, Just expectingOCurly) + | (thisCol > col) || + (thisCol == col && + isNonDecreasingIndentation expectingOCurly) -> + do setAlrExpectingOCurly Nothing + setALRContext (ALRLayout expectingOCurly thisCol : context) + setNextToken t + return (L thisLoc ITvocurly) + | otherwise -> + do setAlrExpectingOCurly Nothing + setPendingImplicitTokens [L lastLoc ITvccurly] + setNextToken t + return (L lastLoc ITvocurly) + (_, _, Just expectingOCurly) -> + do setAlrExpectingOCurly Nothing + setALRContext (ALRLayout expectingOCurly thisCol : context) + setNextToken t + return (L thisLoc ITvocurly) + -- We do the [] cases earlier than in the spec, as we + -- have an actual EOF token + (ITeof, ALRLayout _ _ : ls, _) -> + do setALRContext ls + setNextToken t + return (L thisLoc ITvccurly) + (ITeof, _, _) -> + return t + -- the other ITeof case omitted; general case below covers it + (ITin, _, _) + | justClosedExplicitLetBlock -> + return t + (ITin, ALRLayout ALRLayoutLet _ : ls, _) + | newLine -> + do setPendingImplicitTokens [t] + setALRContext ls + return (L thisLoc ITvccurly) + -- This next case is to handle a transitional issue: + (ITwhere, ALRLayout _ col : ls, _) + | newLine && thisCol == col && transitional -> + do addWarning Opt_WarnAlternativeLayoutRuleTransitional + (mkSrcSpanPs thisLoc) + (transitionalAlternativeLayoutWarning + "`where' clause at the same depth as implicit layout block") + setALRContext ls + setNextToken t + -- Note that we use lastLoc, as we may need to close + -- more layouts, or give a semicolon + return (L lastLoc ITvccurly) + -- This next case is to handle a transitional issue: + (ITvbar, ALRLayout _ col : ls, _) + | newLine && thisCol == col && transitional -> + do addWarning Opt_WarnAlternativeLayoutRuleTransitional + (mkSrcSpanPs thisLoc) + (transitionalAlternativeLayoutWarning + "`|' at the same depth as implicit layout block") + setALRContext ls + setNextToken t + -- Note that we use lastLoc, as we may need to close + -- more layouts, or give a semicolon + return (L lastLoc ITvccurly) + (_, ALRLayout _ col : ls, _) + | newLine && thisCol == col -> + do setNextToken t + let loc = psSpanStart thisLoc + zeroWidthLoc = mkPsSpan loc loc + return (L zeroWidthLoc ITsemi) + | newLine && thisCol < col -> + do setALRContext ls + setNextToken t + -- Note that we use lastLoc, as we may need to close + -- more layouts, or give a semicolon + return (L lastLoc ITvccurly) + -- We need to handle close before open, as 'then' is both + -- an open and a close + (u, _, _) + | isALRclose u -> + case context of + ALRLayout _ _ : ls -> + do setALRContext ls + setNextToken t + return (L thisLoc ITvccurly) + ALRNoLayout _ isLet : ls -> + do let ls' = if isALRopen u + then ALRNoLayout (containsCommas u) False : ls + else ls + setALRContext ls' + when isLet $ setJustClosedExplicitLetBlock True + return t + [] -> + do let ls = if isALRopen u + then [ALRNoLayout (containsCommas u) False] + else [] + setALRContext ls + -- XXX This is an error in John's code, but + -- it looks reachable to me at first glance + return t + (u, _, _) + | isALRopen u -> + do setALRContext (ALRNoLayout (containsCommas u) False : context) + return t + (ITin, ALRLayout ALRLayoutLet _ : ls, _) -> + do setALRContext ls + setPendingImplicitTokens [t] + return (L thisLoc ITvccurly) + (ITin, ALRLayout _ _ : ls, _) -> + do setALRContext ls + setNextToken t + return (L thisLoc ITvccurly) + -- the other ITin case omitted; general case below covers it + (ITcomma, ALRLayout _ _ : ls, _) + | topNoLayoutContainsCommas ls -> + do setALRContext ls + setNextToken t + return (L thisLoc ITvccurly) + (ITwhere, ALRLayout ALRLayoutDo _ : ls, _) -> + do setALRContext ls + setPendingImplicitTokens [t] + return (L thisLoc ITvccurly) + -- the other ITwhere case omitted; general case below covers it + (_, _, _) -> return t + +transitionalAlternativeLayoutWarning :: String -> SDoc +transitionalAlternativeLayoutWarning msg + = text "transitional layout will not be accepted in the future:" + $$ text msg + +isALRopen :: Token -> Bool +isALRopen ITcase = True +isALRopen ITif = True +isALRopen ITthen = True +isALRopen IToparen = True +isALRopen ITobrack = True +isALRopen ITocurly = True +-- GHC Extensions: +isALRopen IToubxparen = True +isALRopen _ = False + +isALRclose :: Token -> Bool +isALRclose ITof = True +isALRclose ITthen = True +isALRclose ITelse = True +isALRclose ITcparen = True +isALRclose ITcbrack = True +isALRclose ITccurly = True +-- GHC Extensions: +isALRclose ITcubxparen = True +isALRclose _ = False + +isNonDecreasingIndentation :: ALRLayout -> Bool +isNonDecreasingIndentation ALRLayoutDo = True +isNonDecreasingIndentation _ = False + +containsCommas :: Token -> Bool +containsCommas IToparen = True +containsCommas ITobrack = True +-- John doesn't have {} as containing commas, but records contain them, +-- which caused a problem parsing Cabal's Distribution.Simple.InstallDirs +-- (defaultInstallDirs). +containsCommas ITocurly = True +-- GHC Extensions: +containsCommas IToubxparen = True +containsCommas _ = False + +topNoLayoutContainsCommas :: [ALRContext] -> Bool +topNoLayoutContainsCommas [] = False +topNoLayoutContainsCommas (ALRLayout _ _ : ls) = topNoLayoutContainsCommas ls +topNoLayoutContainsCommas (ALRNoLayout b _ : _) = b + +lexToken :: P (PsLocated Token) +lexToken = do + inp@(AI loc1 buf) <- getInput + sc <- getLexState + exts <- getExts + case alexScanUser exts inp sc of + AlexEOF -> do + let span = mkPsSpan loc1 loc1 + setEofPos (psRealSpan span) + setLastToken span 0 + return (L span ITeof) + AlexError (AI loc2 buf) -> + reportLexError (psRealLoc loc1) (psRealLoc loc2) buf "lexical error" + AlexSkip inp2 _ -> do + setInput inp2 + lexToken + AlexToken inp2@(AI end buf2) _ t -> do + setInput inp2 + let span = mkPsSpan loc1 end + let bytes = byteDiff buf buf2 + span `seq` setLastToken span bytes + lt <- t span buf bytes + let lt' = unLoc lt + unless (isComment lt') (setLastTk lt') + return lt + +reportLexError :: RealSrcLoc -> RealSrcLoc -> StringBuffer -> [Char] -> P a +reportLexError loc1 loc2 buf str + | atEnd buf = failLocMsgP loc1 loc2 (str ++ " at end of input") + | otherwise = + let c = fst (nextChar buf) + in if c == '\0' -- decoding errors are mapped to '\0', see utf8DecodeChar# + then failLocMsgP loc2 loc2 (str ++ " (UTF-8 decoding error)") + else failLocMsgP loc1 loc2 (str ++ " at character " ++ show c) + +lexTokenStream :: StringBuffer -> RealSrcLoc -> DynFlags -> ParseResult [Located Token] +lexTokenStream buf loc dflags = unP go initState{ options = opts' } + where dflags' = gopt_set (gopt_unset dflags Opt_Haddock) Opt_KeepRawTokenStream + initState@PState{ options = opts } = mkPState dflags' buf loc + opts' = opts{ pExtsBitmap = complement (xbit UsePosPragsBit) .&. pExtsBitmap opts } + go = do + ltok <- lexer False return + case ltok of + L _ ITeof -> return [] + _ -> liftM (ltok:) go + +linePrags = Map.singleton "line" linePrag + +fileHeaderPrags = Map.fromList([("options", lex_string_prag IToptions_prag), + ("options_ghc", lex_string_prag IToptions_prag), + ("options_haddock", lex_string_prag ITdocOptions), + ("language", token ITlanguage_prag), + ("include", lex_string_prag ITinclude_prag)]) + +ignoredPrags = Map.fromList (map ignored pragmas) + where ignored opt = (opt, nested_comment lexToken) + impls = ["hugs", "nhc98", "jhc", "yhc", "catch", "derive"] + options_pragmas = map ("options_" ++) impls + -- CFILES is a hugs-only thing. + pragmas = options_pragmas ++ ["cfiles", "contract"] + +oneWordPrags = Map.fromList [ + ("rules", rulePrag), + ("inline", + strtoken (\s -> (ITinline_prag (SourceText s) Inline FunLike))), + ("inlinable", + strtoken (\s -> (ITinline_prag (SourceText s) Inlinable FunLike))), + ("inlineable", + strtoken (\s -> (ITinline_prag (SourceText s) Inlinable FunLike))), + -- Spelling variant + ("notinline", + strtoken (\s -> (ITinline_prag (SourceText s) NoInline FunLike))), + ("specialize", strtoken (\s -> ITspec_prag (SourceText s))), + ("source", strtoken (\s -> ITsource_prag (SourceText s))), + ("warning", strtoken (\s -> ITwarning_prag (SourceText s))), + ("deprecated", strtoken (\s -> ITdeprecated_prag (SourceText s))), + ("scc", strtoken (\s -> ITscc_prag (SourceText s))), + ("generated", strtoken (\s -> ITgenerated_prag (SourceText s))), + ("core", strtoken (\s -> ITcore_prag (SourceText s))), + ("unpack", strtoken (\s -> ITunpack_prag (SourceText s))), + ("nounpack", strtoken (\s -> ITnounpack_prag (SourceText s))), + ("ann", strtoken (\s -> ITann_prag (SourceText s))), + ("minimal", strtoken (\s -> ITminimal_prag (SourceText s))), + ("overlaps", strtoken (\s -> IToverlaps_prag (SourceText s))), + ("overlappable", strtoken (\s -> IToverlappable_prag (SourceText s))), + ("overlapping", strtoken (\s -> IToverlapping_prag (SourceText s))), + ("incoherent", strtoken (\s -> ITincoherent_prag (SourceText s))), + ("ctype", strtoken (\s -> ITctype (SourceText s))), + ("complete", strtoken (\s -> ITcomplete_prag (SourceText s))), + ("column", columnPrag) + ] + +twoWordPrags = Map.fromList [ + ("inline conlike", + strtoken (\s -> (ITinline_prag (SourceText s) Inline ConLike))), + ("notinline conlike", + strtoken (\s -> (ITinline_prag (SourceText s) NoInline ConLike))), + ("specialize inline", + strtoken (\s -> (ITspec_inline_prag (SourceText s) True))), + ("specialize notinline", + strtoken (\s -> (ITspec_inline_prag (SourceText s) False))) + ] + +dispatch_pragmas :: Map String Action -> Action +dispatch_pragmas prags span buf len = case Map.lookup (clean_pragma (lexemeToString buf len)) prags of + Just found -> found span buf len + Nothing -> lexError "unknown pragma" + +known_pragma :: Map String Action -> AlexAccPred ExtsBitmap +known_pragma prags _ (AI _ startbuf) _ (AI _ curbuf) + = isKnown && nextCharIsNot curbuf pragmaNameChar + where l = lexemeToString startbuf (byteDiff startbuf curbuf) + isKnown = isJust $ Map.lookup (clean_pragma l) prags + pragmaNameChar c = isAlphaNum c || c == '_' + +clean_pragma :: String -> String +clean_pragma prag = canon_ws (map toLower (unprefix prag)) + where unprefix prag' = case stripPrefix "{-#" prag' of + Just rest -> rest + Nothing -> prag' + canonical prag' = case prag' of + "noinline" -> "notinline" + "specialise" -> "specialize" + "constructorlike" -> "conlike" + _ -> prag' + canon_ws s = unwords (map canonical (words s)) + + + +{- +%************************************************************************ +%* * + Helper functions for generating annotations in the parser +%* * +%************************************************************************ +-} + +-- | Encapsulated call to addAnnotation, requiring only the SrcSpan of +-- the AST construct the annotation belongs to; together with the +-- AnnKeywordId, this is the key of the annotation map. +-- +-- This type is useful for places in the parser where it is not yet +-- known what SrcSpan an annotation should be added to. The most +-- common situation is when we are parsing a list: the annotations +-- need to be associated with the AST element that *contains* the +-- list, not the list itself. 'AddAnn' lets us defer adding the +-- annotations until we finish parsing the list and are now parsing +-- the enclosing element; we then apply the 'AddAnn' to associate +-- the annotations. Another common situation is where a common fragment of +-- the AST has been factored out but there is no separate AST node for +-- this fragment (this occurs in class and data declarations). In this +-- case, the annotation belongs to the parent data declaration. +-- +-- The usual way an 'AddAnn' is created is using the 'mj' ("make jump") +-- function, and then it can be discharged using the 'ams' function. +data AddAnn = AddAnn AnnKeywordId SrcSpan + +addAnnotationOnly :: RealSrcSpan -> AnnKeywordId -> RealSrcSpan -> P () +addAnnotationOnly l a v = P $ \s -> POk s { + annotations = ((l,a), [v]) : annotations s + } () + +-- |Given a 'SrcSpan' that surrounds a 'HsPar' or 'HsParTy', generate +-- 'AddAnn' values for the opening and closing bordering on the start +-- and end of the span +mkParensApiAnn :: SrcSpan -> [AddAnn] +mkParensApiAnn (UnhelpfulSpan _) = [] +mkParensApiAnn (RealSrcSpan ss _) = [AddAnn AnnOpenP lo,AddAnn AnnCloseP lc] + where + f = srcSpanFile ss + sl = srcSpanStartLine ss + sc = srcSpanStartCol ss + el = srcSpanEndLine ss + ec = srcSpanEndCol ss + lo = RealSrcSpan (mkRealSrcSpan (realSrcSpanStart ss) (mkRealSrcLoc f sl (sc+1))) Nothing + lc = RealSrcSpan (mkRealSrcSpan (mkRealSrcLoc f el (ec - 1)) (realSrcSpanEnd ss)) Nothing + +queueComment :: RealLocated Token -> P() +queueComment c = P $ \s -> POk s { + comment_q = commentToAnnotation c : comment_q s + } () + +-- | Go through the @comment_q@ in @PState@ and remove all comments +-- that belong within the given span +allocateCommentsP :: RealSrcSpan -> P () +allocateCommentsP ss = P $ \s -> + let (comment_q', newAnns) = allocateComments ss (comment_q s) in + POk s { + comment_q = comment_q' + , annotations_comments = newAnns ++ (annotations_comments s) + } () + +allocateComments + :: RealSrcSpan + -> [RealLocated AnnotationComment] + -> ([RealLocated AnnotationComment], [(RealSrcSpan,[RealLocated AnnotationComment])]) +allocateComments ss comment_q = + let + (before,rest) = break (\(L l _) -> isRealSubspanOf l ss) comment_q + (middle,after) = break (\(L l _) -> not (isRealSubspanOf l ss)) rest + comment_q' = before ++ after + newAnns = if null middle then [] + else [(ss,middle)] + in + (comment_q', newAnns) + + +commentToAnnotation :: RealLocated Token -> RealLocated AnnotationComment +commentToAnnotation (L l (ITdocCommentNext s)) = L l (AnnDocCommentNext s) +commentToAnnotation (L l (ITdocCommentPrev s)) = L l (AnnDocCommentPrev s) +commentToAnnotation (L l (ITdocCommentNamed s)) = L l (AnnDocCommentNamed s) +commentToAnnotation (L l (ITdocSection n s)) = L l (AnnDocSection n s) +commentToAnnotation (L l (ITdocOptions s)) = L l (AnnDocOptions s) +commentToAnnotation (L l (ITlineComment s)) = L l (AnnLineComment s) +commentToAnnotation (L l (ITblockComment s)) = L l (AnnBlockComment s) +commentToAnnotation _ = panic "commentToAnnotation" + +-- --------------------------------------------------------------------- + +isComment :: Token -> Bool +isComment (ITlineComment _) = True +isComment (ITblockComment _) = True +isComment _ = False + +isDocComment :: Token -> Bool +isDocComment (ITdocCommentNext _) = True +isDocComment (ITdocCommentPrev _) = True +isDocComment (ITdocCommentNamed _) = True +isDocComment (ITdocSection _ _) = True +isDocComment (ITdocOptions _) = True +isDocComment _ = False +} diff --git a/compiler/GHC/Parser/PostProcess.hs b/compiler/GHC/Parser/PostProcess.hs new file mode 100644 index 0000000000..7ce2f4fb9a --- /dev/null +++ b/compiler/GHC/Parser/PostProcess.hs @@ -0,0 +1,3090 @@ +-- +-- (c) The University of Glasgow 2002-2006 +-- + +-- Functions over HsSyn specialised to RdrName. + +{-# LANGUAGE CPP #-} +{-# LANGUAGE FlexibleContexts #-} +{-# LANGUAGE FlexibleInstances #-} +{-# LANGUAGE TypeFamilies #-} +{-# LANGUAGE MagicHash #-} +{-# LANGUAGE ViewPatterns #-} +{-# LANGUAGE GADTs #-} +{-# LANGUAGE RankNTypes #-} +{-# LANGUAGE LambdaCase #-} +{-# LANGUAGE TypeApplications #-} +{-# LANGUAGE GeneralizedNewtypeDeriving #-} + +{-# OPTIONS_GHC -Wno-incomplete-record-updates #-} + +module GHC.Parser.PostProcess ( + mkHsOpApp, + mkHsIntegral, mkHsFractional, mkHsIsString, + mkHsDo, mkSpliceDecl, + mkRoleAnnotDecl, + mkClassDecl, + mkTyData, mkDataFamInst, + mkTySynonym, mkTyFamInstEqn, + mkStandaloneKindSig, + mkTyFamInst, + mkFamDecl, mkLHsSigType, + mkInlinePragma, + mkPatSynMatchGroup, + mkRecConstrOrUpdate, -- HsExp -> [HsFieldUpdate] -> P HsExp + mkTyClD, mkInstD, + mkRdrRecordCon, mkRdrRecordUpd, + setRdrNameSpace, + filterCTuple, + + cvBindGroup, + cvBindsAndSigs, + cvTopDecls, + placeHolderPunRhs, + + -- Stuff to do with Foreign declarations + mkImport, + parseCImport, + mkExport, + mkExtName, -- RdrName -> CLabelString + mkGadtDecl, -- [Located RdrName] -> LHsType RdrName -> ConDecl RdrName + mkConDeclH98, + + -- Bunch of functions in the parser monad for + -- checking and constructing values + checkImportDecl, + checkExpBlockArguments, + checkPrecP, -- Int -> P Int + checkContext, -- HsType -> P HsContext + checkPattern, -- HsExp -> P HsPat + checkPattern_msg, + checkMonadComp, -- P (HsStmtContext GhcPs) + checkValDef, -- (SrcLoc, HsExp, HsRhs, [HsDecl]) -> P HsDecl + checkValSigLhs, + LRuleTyTmVar, RuleTyTmVar(..), + mkRuleBndrs, mkRuleTyVarBndrs, + checkRuleTyVarBndrNames, + checkRecordSyntax, + checkEmptyGADTs, + addFatalError, hintBangPat, + TyEl(..), mergeOps, mergeDataCon, + mkBangTy, + + -- Help with processing exports + ImpExpSubSpec(..), + ImpExpQcSpec(..), + mkModuleImpExp, + mkTypeImpExp, + mkImpExpSubSpec, + checkImportSpec, + + -- Token symbols + forallSym, + starSym, + + -- Warnings and errors + warnStarIsType, + warnPrepositiveQualifiedModule, + failOpFewArgs, + failOpNotEnabledImportQualifiedPost, + failOpImportQualifiedTwice, + + SumOrTuple (..), + + -- Expression/command/pattern ambiguity resolution + PV, + runPV, + ECP(ECP, runECP_PV), + runECP_P, + DisambInfixOp(..), + DisambECP(..), + ecpFromExp, + ecpFromCmd, + PatBuilder + ) where + +import GhcPrelude +import GHC.Hs -- Lots of it +import GHC.Core.TyCon ( TyCon, isTupleTyCon, tyConSingleDataCon_maybe ) +import GHC.Core.DataCon ( DataCon, dataConTyCon ) +import GHC.Core.ConLike ( ConLike(..) ) +import GHC.Core.Coercion.Axiom ( Role, fsFromRole ) +import GHC.Types.Name.Reader +import GHC.Types.Name +import GHC.Types.Basic +import GHC.Parser.Lexer +import GHC.Utils.Lexeme ( isLexCon ) +import GHC.Core.Type ( TyThing(..), funTyCon ) +import GHC.Builtin.Types( cTupleTyConName, tupleTyCon, tupleDataCon, + nilDataConName, nilDataConKey, + listTyConName, listTyConKey, eqTyCon_RDR, + tupleTyConName, cTupleTyConNameArity_maybe ) +import GHC.Types.ForeignCall +import GHC.Builtin.Names ( allNameStrings ) +import GHC.Types.SrcLoc +import GHC.Types.Unique ( hasKey ) +import OrdList ( OrdList, fromOL ) +import Bag ( emptyBag, consBag ) +import Outputable +import FastString +import Maybes +import Util +import GHC.Parser.Annotation +import Data.List +import GHC.Driver.Session ( WarningFlag(..), DynFlags ) +import ErrUtils ( Messages ) + +import Control.Monad +import Text.ParserCombinators.ReadP as ReadP +import Data.Char +import qualified Data.Monoid as Monoid +import Data.Data ( dataTypeOf, fromConstr, dataTypeConstrs ) +import Data.Kind ( Type ) + +#include "HsVersions.h" + + +{- ********************************************************************** + + Construction functions for Rdr stuff + + ********************************************************************* -} + +-- | mkClassDecl builds a RdrClassDecl, filling in the names for tycon and +-- datacon by deriving them from the name of the class. We fill in the names +-- for the tycon and datacon corresponding to the class, by deriving them +-- from the name of the class itself. This saves recording the names in the +-- interface file (which would be equally good). + +-- Similarly for mkConDecl, mkClassOpSig and default-method names. + +-- *** See Note [The Naming story] in GHC.Hs.Decls **** + +mkTyClD :: LTyClDecl (GhcPass p) -> LHsDecl (GhcPass p) +mkTyClD (L loc d) = L loc (TyClD noExtField d) + +mkInstD :: LInstDecl (GhcPass p) -> LHsDecl (GhcPass p) +mkInstD (L loc d) = L loc (InstD noExtField d) + +mkClassDecl :: SrcSpan + -> Located (Maybe (LHsContext GhcPs), LHsType GhcPs) + -> Located (a,[LHsFunDep GhcPs]) + -> OrdList (LHsDecl GhcPs) + -> P (LTyClDecl GhcPs) + +mkClassDecl loc (L _ (mcxt, tycl_hdr)) fds where_cls + = do { (binds, sigs, ats, at_defs, _, docs) <- cvBindsAndSigs where_cls + ; let cxt = fromMaybe (noLoc []) mcxt + ; (cls, tparams, fixity, ann) <- checkTyClHdr True tycl_hdr + ; addAnnsAt loc ann -- Add any API Annotations to the top SrcSpan + ; (tyvars,annst) <- checkTyVars (text "class") whereDots cls tparams + ; addAnnsAt loc annst -- Add any API Annotations to the top SrcSpan + ; return (L loc (ClassDecl { tcdCExt = noExtField, tcdCtxt = cxt + , tcdLName = cls, tcdTyVars = tyvars + , tcdFixity = fixity + , tcdFDs = snd (unLoc fds) + , tcdSigs = mkClassOpSigs sigs + , tcdMeths = binds + , tcdATs = ats, tcdATDefs = at_defs + , tcdDocs = docs })) } + +mkTyData :: SrcSpan + -> NewOrData + -> Maybe (Located CType) + -> Located (Maybe (LHsContext GhcPs), LHsType GhcPs) + -> Maybe (LHsKind GhcPs) + -> [LConDecl GhcPs] + -> HsDeriving GhcPs + -> P (LTyClDecl GhcPs) +mkTyData loc new_or_data cType (L _ (mcxt, tycl_hdr)) + ksig data_cons maybe_deriv + = do { (tc, tparams, fixity, ann) <- checkTyClHdr False tycl_hdr + ; addAnnsAt loc ann -- Add any API Annotations to the top SrcSpan + ; (tyvars, anns) <- checkTyVars (ppr new_or_data) equalsDots tc tparams + ; addAnnsAt loc anns -- Add any API Annotations to the top SrcSpan + ; defn <- mkDataDefn new_or_data cType mcxt ksig data_cons maybe_deriv + ; return (L loc (DataDecl { tcdDExt = noExtField, + tcdLName = tc, tcdTyVars = tyvars, + tcdFixity = fixity, + tcdDataDefn = defn })) } + +mkDataDefn :: NewOrData + -> Maybe (Located CType) + -> Maybe (LHsContext GhcPs) + -> Maybe (LHsKind GhcPs) + -> [LConDecl GhcPs] + -> HsDeriving GhcPs + -> P (HsDataDefn GhcPs) +mkDataDefn new_or_data cType mcxt ksig data_cons maybe_deriv + = do { checkDatatypeContext mcxt + ; let cxt = fromMaybe (noLoc []) mcxt + ; return (HsDataDefn { dd_ext = noExtField + , dd_ND = new_or_data, dd_cType = cType + , dd_ctxt = cxt + , dd_cons = data_cons + , dd_kindSig = ksig + , dd_derivs = maybe_deriv }) } + + +mkTySynonym :: SrcSpan + -> LHsType GhcPs -- LHS + -> LHsType GhcPs -- RHS + -> P (LTyClDecl GhcPs) +mkTySynonym loc lhs rhs + = do { (tc, tparams, fixity, ann) <- checkTyClHdr False lhs + ; addAnnsAt loc ann -- Add any API Annotations to the top SrcSpan + ; (tyvars, anns) <- checkTyVars (text "type") equalsDots tc tparams + ; addAnnsAt loc anns -- Add any API Annotations to the top SrcSpan + ; return (L loc (SynDecl { tcdSExt = noExtField + , tcdLName = tc, tcdTyVars = tyvars + , tcdFixity = fixity + , tcdRhs = rhs })) } + +mkStandaloneKindSig + :: SrcSpan + -> Located [Located RdrName] -- LHS + -> LHsKind GhcPs -- RHS + -> P (LStandaloneKindSig GhcPs) +mkStandaloneKindSig loc lhs rhs = + do { vs <- mapM check_lhs_name (unLoc lhs) + ; v <- check_singular_lhs (reverse vs) + ; return $ L loc $ StandaloneKindSig noExtField v (mkLHsSigType rhs) } + where + check_lhs_name v@(unLoc->name) = + if isUnqual name && isTcOcc (rdrNameOcc name) + then return v + else addFatalError (getLoc v) $ + hang (text "Expected an unqualified type constructor:") 2 (ppr v) + check_singular_lhs vs = + case vs of + [] -> panic "mkStandaloneKindSig: empty left-hand side" + [v] -> return v + _ -> addFatalError (getLoc lhs) $ + vcat [ hang (text "Standalone kind signatures do not support multiple names at the moment:") + 2 (pprWithCommas ppr vs) + , text "See https://gitlab.haskell.org/ghc/ghc/issues/16754 for details." ] + +mkTyFamInstEqn :: Maybe [LHsTyVarBndr GhcPs] + -> LHsType GhcPs + -> LHsType GhcPs + -> P (TyFamInstEqn GhcPs,[AddAnn]) +mkTyFamInstEqn bndrs lhs rhs + = do { (tc, tparams, fixity, ann) <- checkTyClHdr False lhs + ; return (mkHsImplicitBndrs + (FamEqn { feqn_ext = noExtField + , feqn_tycon = tc + , feqn_bndrs = bndrs + , feqn_pats = tparams + , feqn_fixity = fixity + , feqn_rhs = rhs }), + ann) } + +mkDataFamInst :: SrcSpan + -> NewOrData + -> Maybe (Located CType) + -> (Maybe ( LHsContext GhcPs), Maybe [LHsTyVarBndr GhcPs] + , LHsType GhcPs) + -> Maybe (LHsKind GhcPs) + -> [LConDecl GhcPs] + -> HsDeriving GhcPs + -> P (LInstDecl GhcPs) +mkDataFamInst loc new_or_data cType (mcxt, bndrs, tycl_hdr) + ksig data_cons maybe_deriv + = do { (tc, tparams, fixity, ann) <- checkTyClHdr False tycl_hdr + ; addAnnsAt loc ann -- Add any API Annotations to the top SrcSpan + ; defn <- mkDataDefn new_or_data cType mcxt ksig data_cons maybe_deriv + ; return (L loc (DataFamInstD noExtField (DataFamInstDecl (mkHsImplicitBndrs + (FamEqn { feqn_ext = noExtField + , feqn_tycon = tc + , feqn_bndrs = bndrs + , feqn_pats = tparams + , feqn_fixity = fixity + , feqn_rhs = defn }))))) } + +mkTyFamInst :: SrcSpan + -> TyFamInstEqn GhcPs + -> P (LInstDecl GhcPs) +mkTyFamInst loc eqn + = return (L loc (TyFamInstD noExtField (TyFamInstDecl eqn))) + +mkFamDecl :: SrcSpan + -> FamilyInfo GhcPs + -> LHsType GhcPs -- LHS + -> Located (FamilyResultSig GhcPs) -- Optional result signature + -> Maybe (LInjectivityAnn GhcPs) -- Injectivity annotation + -> P (LTyClDecl GhcPs) +mkFamDecl loc info lhs ksig injAnn + = do { (tc, tparams, fixity, ann) <- checkTyClHdr False lhs + ; addAnnsAt loc ann -- Add any API Annotations to the top SrcSpan + ; (tyvars, anns) <- checkTyVars (ppr info) equals_or_where tc tparams + ; addAnnsAt loc anns -- Add any API Annotations to the top SrcSpan + ; return (L loc (FamDecl noExtField (FamilyDecl + { fdExt = noExtField + , fdInfo = info, fdLName = tc + , fdTyVars = tyvars + , fdFixity = fixity + , fdResultSig = ksig + , fdInjectivityAnn = injAnn }))) } + where + equals_or_where = case info of + DataFamily -> empty + OpenTypeFamily -> empty + ClosedTypeFamily {} -> whereDots + +mkSpliceDecl :: LHsExpr GhcPs -> HsDecl GhcPs +-- If the user wrote +-- [pads| ... ] then return a QuasiQuoteD +-- $(e) then return a SpliceD +-- but if she wrote, say, +-- f x then behave as if she'd written $(f x) +-- ie a SpliceD +-- +-- Typed splices are not allowed at the top level, thus we do not represent them +-- as spliced declaration. See #10945 +mkSpliceDecl lexpr@(L loc expr) + | HsSpliceE _ splice@(HsUntypedSplice {}) <- expr + = SpliceD noExtField (SpliceDecl noExtField (L loc splice) ExplicitSplice) + + | HsSpliceE _ splice@(HsQuasiQuote {}) <- expr + = SpliceD noExtField (SpliceDecl noExtField (L loc splice) ExplicitSplice) + + | otherwise + = SpliceD noExtField (SpliceDecl noExtField (L loc (mkUntypedSplice BareSplice lexpr)) + ImplicitSplice) + +mkRoleAnnotDecl :: SrcSpan + -> Located RdrName -- type being annotated + -> [Located (Maybe FastString)] -- roles + -> P (LRoleAnnotDecl GhcPs) +mkRoleAnnotDecl loc tycon roles + = do { roles' <- mapM parse_role roles + ; return $ L loc $ RoleAnnotDecl noExtField tycon roles' } + where + role_data_type = dataTypeOf (undefined :: Role) + all_roles = map fromConstr $ dataTypeConstrs role_data_type + possible_roles = [(fsFromRole role, role) | role <- all_roles] + + parse_role (L loc_role Nothing) = return $ L loc_role Nothing + parse_role (L loc_role (Just role)) + = case lookup role possible_roles of + Just found_role -> return $ L loc_role $ Just found_role + Nothing -> + let nearby = fuzzyLookup (unpackFS role) + (mapFst unpackFS possible_roles) + in + addFatalError loc_role + (text "Illegal role name" <+> quotes (ppr role) $$ + suggestions nearby) + + suggestions [] = empty + suggestions [r] = text "Perhaps you meant" <+> quotes (ppr r) + -- will this last case ever happen?? + suggestions list = hang (text "Perhaps you meant one of these:") + 2 (pprWithCommas (quotes . ppr) list) + +{- ********************************************************************** + + #cvBinds-etc# Converting to @HsBinds@, etc. + + ********************************************************************* -} + +-- | Function definitions are restructured here. Each is assumed to be recursive +-- initially, and non recursive definitions are discovered by the dependency +-- analyser. + + +-- | Groups together bindings for a single function +cvTopDecls :: OrdList (LHsDecl GhcPs) -> [LHsDecl GhcPs] +cvTopDecls decls = go (fromOL decls) + where + go :: [LHsDecl GhcPs] -> [LHsDecl GhcPs] + go [] = [] + go ((L l (ValD x b)) : ds) + = L l' (ValD x b') : go ds' + where (L l' b', ds') = getMonoBind (L l b) ds + go (d : ds) = d : go ds + +-- Declaration list may only contain value bindings and signatures. +cvBindGroup :: OrdList (LHsDecl GhcPs) -> P (HsValBinds GhcPs) +cvBindGroup binding + = do { (mbs, sigs, fam_ds, tfam_insts + , dfam_insts, _) <- cvBindsAndSigs binding + ; ASSERT( null fam_ds && null tfam_insts && null dfam_insts) + return $ ValBinds noExtField mbs sigs } + +cvBindsAndSigs :: OrdList (LHsDecl GhcPs) + -> P (LHsBinds GhcPs, [LSig GhcPs], [LFamilyDecl GhcPs] + , [LTyFamInstDecl GhcPs], [LDataFamInstDecl GhcPs], [LDocDecl]) +-- Input decls contain just value bindings and signatures +-- and in case of class or instance declarations also +-- associated type declarations. They might also contain Haddock comments. +cvBindsAndSigs fb = go (fromOL fb) + where + go [] = return (emptyBag, [], [], [], [], []) + go ((L l (ValD _ b)) : ds) + = do { (bs, ss, ts, tfis, dfis, docs) <- go ds' + ; return (b' `consBag` bs, ss, ts, tfis, dfis, docs) } + where + (b', ds') = getMonoBind (L l b) ds + go ((L l decl) : ds) + = do { (bs, ss, ts, tfis, dfis, docs) <- go ds + ; case decl of + SigD _ s + -> return (bs, L l s : ss, ts, tfis, dfis, docs) + TyClD _ (FamDecl _ t) + -> return (bs, ss, L l t : ts, tfis, dfis, docs) + InstD _ (TyFamInstD { tfid_inst = tfi }) + -> return (bs, ss, ts, L l tfi : tfis, dfis, docs) + InstD _ (DataFamInstD { dfid_inst = dfi }) + -> return (bs, ss, ts, tfis, L l dfi : dfis, docs) + DocD _ d + -> return (bs, ss, ts, tfis, dfis, L l d : docs) + SpliceD _ d + -> addFatalError l $ + hang (text "Declaration splices are allowed only" <+> + text "at the top level:") + 2 (ppr d) + _ -> pprPanic "cvBindsAndSigs" (ppr decl) } + +----------------------------------------------------------------------------- +-- Group function bindings into equation groups + +getMonoBind :: LHsBind GhcPs -> [LHsDecl GhcPs] + -> (LHsBind GhcPs, [LHsDecl GhcPs]) +-- Suppose (b',ds') = getMonoBind b ds +-- ds is a list of parsed bindings +-- b is a MonoBinds that has just been read off the front + +-- Then b' is the result of grouping more equations from ds that +-- belong with b into a single MonoBinds, and ds' is the depleted +-- list of parsed bindings. +-- +-- All Haddock comments between equations inside the group are +-- discarded. +-- +-- No AndMonoBinds or EmptyMonoBinds here; just single equations + +getMonoBind (L loc1 (FunBind { fun_id = fun_id1@(L _ f1) + , fun_matches = + MG { mg_alts = (L _ mtchs1) } })) + binds + | has_args mtchs1 + = go mtchs1 loc1 binds [] + where + go mtchs loc + ((L loc2 (ValD _ (FunBind { fun_id = (L _ f2) + , fun_matches = + MG { mg_alts = (L _ mtchs2) } }))) + : binds) _ + | f1 == f2 = go (mtchs2 ++ mtchs) + (combineSrcSpans loc loc2) binds [] + go mtchs loc (doc_decl@(L loc2 (DocD {})) : binds) doc_decls + = let doc_decls' = doc_decl : doc_decls + in go mtchs (combineSrcSpans loc loc2) binds doc_decls' + go mtchs loc binds doc_decls + = ( L loc (makeFunBind fun_id1 (reverse mtchs)) + , (reverse doc_decls) ++ binds) + -- Reverse the final matches, to get it back in the right order + -- Do the same thing with the trailing doc comments + +getMonoBind bind binds = (bind, binds) + +has_args :: [LMatch GhcPs (LHsExpr GhcPs)] -> Bool +has_args [] = panic "GHC.Parser.PostProcess.has_args" +has_args (L _ (Match { m_pats = args }) : _) = not (null args) + -- Don't group together FunBinds if they have + -- no arguments. This is necessary now that variable bindings + -- with no arguments are now treated as FunBinds rather + -- than pattern bindings (tests/rename/should_fail/rnfail002). + +{- ********************************************************************** + + #PrefixToHS-utils# Utilities for conversion + + ********************************************************************* -} + +{- Note [Parsing data constructors is hard] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The problem with parsing data constructors is that they look a lot like types. +Compare: + + (s1) data T = C t1 t2 + (s2) type T = C t1 t2 + +Syntactically, there's little difference between these declarations, except in +(s1) 'C' is a data constructor, but in (s2) 'C' is a type constructor. + +This similarity would pose no problem if we knew ahead of time if we are +parsing a type or a constructor declaration. Looking at (s1) and (s2), a simple +(but wrong!) rule comes to mind: in 'data' declarations assume we are parsing +data constructors, and in other contexts (e.g. 'type' declarations) assume we +are parsing type constructors. + +This simple rule does not work because of two problematic cases: + + (p1) data T = C t1 t2 :+ t3 + (p2) data T = C t1 t2 => t3 + +In (p1) we encounter (:+) and it turns out we are parsing an infix data +declaration, so (C t1 t2) is a type and 'C' is a type constructor. +In (p2) we encounter (=>) and it turns out we are parsing an existential +context, so (C t1 t2) is a constraint and 'C' is a type constructor. + +As the result, in order to determine whether (C t1 t2) declares a data +constructor, a type, or a context, we would need unlimited lookahead which +'happy' is not so happy with. + +To further complicate matters, the interpretation of (!) and (~) is different +in constructors and types: + + (b1) type T = C ! D + (b2) data T = C ! D + (b3) data T = C ! D => E + +In (b1) and (b3), (!) is a type operator with two arguments: 'C' and 'D'. At +the same time, in (b2) it is a strictness annotation: 'C' is a data constructor +with a single strict argument 'D'. For the programmer, these cases are usually +easy to tell apart due to whitespace conventions: + + (b2) data T = C !D -- no space after the bang hints that + -- it is a strictness annotation + +For the parser, on the other hand, this whitespace does not matter. We cannot +tell apart (b2) from (b3) until we encounter (=>), so it requires unlimited +lookahead. + +The solution that accounts for all of these issues is to initially parse data +declarations and types as a reversed list of TyEl: + + data TyEl = TyElOpr RdrName + | TyElOpd (HsType GhcPs) + | ... + +For example, both occurrences of (C ! D) in the following example are parsed +into equal lists of TyEl: + + data T = C ! D => C ! D results in [ TyElOpd (HsTyVar "D") + , TyElOpr "!" + , TyElOpd (HsTyVar "C") ] + +Note that elements are in reverse order. Also, 'C' is parsed as a type +constructor (HsTyVar) even when it is a data constructor. We fix this in +`tyConToDataCon`. + +By the time the list of TyEl is assembled, we have looked ahead enough to +decide whether to reduce using `mergeOps` (for types) or `mergeDataCon` (for +data constructors). These functions are where the actual job of parsing is +done. + +-} + +-- | Reinterpret a type constructor, including type operators, as a data +-- constructor. +-- See Note [Parsing data constructors is hard] +tyConToDataCon :: SrcSpan -> RdrName -> Either (SrcSpan, SDoc) (Located RdrName) +tyConToDataCon loc tc + | isTcOcc occ || isDataOcc occ + , isLexCon (occNameFS occ) + = return (L loc (setRdrNameSpace tc srcDataName)) + + | otherwise + = Left (loc, msg) + where + occ = rdrNameOcc tc + msg = text "Not a data constructor:" <+> quotes (ppr tc) + +mkPatSynMatchGroup :: Located RdrName + -> Located (OrdList (LHsDecl GhcPs)) + -> P (MatchGroup GhcPs (LHsExpr GhcPs)) +mkPatSynMatchGroup (L loc patsyn_name) (L _ decls) = + do { matches <- mapM fromDecl (fromOL decls) + ; when (null matches) (wrongNumberErr loc) + ; return $ mkMatchGroup FromSource matches } + where + fromDecl (L loc decl@(ValD _ (PatBind _ + pat@(L _ (ConPatIn ln@(L _ name) details)) + rhs _))) = + do { unless (name == patsyn_name) $ + wrongNameBindingErr loc decl + ; match <- case details of + PrefixCon pats -> return $ Match { m_ext = noExtField + , m_ctxt = ctxt, m_pats = pats + , m_grhss = rhs } + where + ctxt = FunRhs { mc_fun = ln + , mc_fixity = Prefix + , mc_strictness = NoSrcStrict } + + InfixCon p1 p2 -> return $ Match { m_ext = noExtField + , m_ctxt = ctxt + , m_pats = [p1, p2] + , m_grhss = rhs } + where + ctxt = FunRhs { mc_fun = ln + , mc_fixity = Infix + , mc_strictness = NoSrcStrict } + + RecCon{} -> recordPatSynErr loc pat + ; return $ L loc match } + fromDecl (L loc decl) = extraDeclErr loc decl + + extraDeclErr loc decl = + addFatalError loc $ + text "pattern synonym 'where' clause must contain a single binding:" $$ + ppr decl + + wrongNameBindingErr loc decl = + addFatalError loc $ + text "pattern synonym 'where' clause must bind the pattern synonym's name" + <+> quotes (ppr patsyn_name) $$ ppr decl + + wrongNumberErr loc = + addFatalError loc $ + text "pattern synonym 'where' clause cannot be empty" $$ + text "In the pattern synonym declaration for: " <+> ppr (patsyn_name) + +recordPatSynErr :: SrcSpan -> LPat GhcPs -> P a +recordPatSynErr loc pat = + addFatalError loc $ + text "record syntax not supported for pattern synonym declarations:" $$ + ppr pat + +mkConDeclH98 :: Located RdrName -> Maybe [LHsTyVarBndr GhcPs] + -> Maybe (LHsContext GhcPs) -> HsConDeclDetails GhcPs + -> ConDecl GhcPs + +mkConDeclH98 name mb_forall mb_cxt args + = ConDeclH98 { con_ext = noExtField + , con_name = name + , con_forall = noLoc $ isJust mb_forall + , con_ex_tvs = mb_forall `orElse` [] + , con_mb_cxt = mb_cxt + , con_args = args + , con_doc = Nothing } + +mkGadtDecl :: [Located RdrName] + -> LHsType GhcPs -- Always a HsForAllTy + -> (ConDecl GhcPs, [AddAnn]) +mkGadtDecl names ty + = (ConDeclGADT { con_g_ext = noExtField + , con_names = names + , con_forall = L l $ isLHsForAllTy ty' + , con_qvars = mkHsQTvs tvs + , con_mb_cxt = mcxt + , con_args = args + , con_res_ty = res_ty + , con_doc = Nothing } + , anns1 ++ anns2) + where + (ty'@(L l _),anns1) = peel_parens ty [] + (tvs, rho) = splitLHsForAllTyInvis ty' + (mcxt, tau, anns2) = split_rho rho [] + + split_rho (L _ (HsQualTy { hst_ctxt = cxt, hst_body = tau })) ann + = (Just cxt, tau, ann) + split_rho (L l (HsParTy _ ty)) ann + = split_rho ty (ann++mkParensApiAnn l) + split_rho tau ann + = (Nothing, tau, ann) + + (args, res_ty) = split_tau tau + + -- See Note [GADT abstract syntax] in GHC.Hs.Decls + split_tau (L _ (HsFunTy _ (L loc (HsRecTy _ rf)) res_ty)) + = (RecCon (L loc rf), res_ty) + split_tau tau + = (PrefixCon [], tau) + + peel_parens (L l (HsParTy _ ty)) ann = peel_parens ty + (ann++mkParensApiAnn l) + peel_parens ty ann = (ty, ann) + + +setRdrNameSpace :: RdrName -> NameSpace -> RdrName +-- ^ This rather gruesome function is used mainly by the parser. +-- When parsing: +-- +-- > data T a = T | T1 Int +-- +-- we parse the data constructors as /types/ because of parser ambiguities, +-- so then we need to change the /type constr/ to a /data constr/ +-- +-- The exact-name case /can/ occur when parsing: +-- +-- > data [] a = [] | a : [a] +-- +-- For the exact-name case we return an original name. +setRdrNameSpace (Unqual occ) ns = Unqual (setOccNameSpace ns occ) +setRdrNameSpace (Qual m occ) ns = Qual m (setOccNameSpace ns occ) +setRdrNameSpace (Orig m occ) ns = Orig m (setOccNameSpace ns occ) +setRdrNameSpace (Exact n) ns + | Just thing <- wiredInNameTyThing_maybe n + = setWiredInNameSpace thing ns + -- Preserve Exact Names for wired-in things, + -- notably tuples and lists + + | isExternalName n + = Orig (nameModule n) occ + + | otherwise -- This can happen when quoting and then + -- splicing a fixity declaration for a type + = Exact (mkSystemNameAt (nameUnique n) occ (nameSrcSpan n)) + where + occ = setOccNameSpace ns (nameOccName n) + +setWiredInNameSpace :: TyThing -> NameSpace -> RdrName +setWiredInNameSpace (ATyCon tc) ns + | isDataConNameSpace ns + = ty_con_data_con tc + | isTcClsNameSpace ns + = Exact (getName tc) -- No-op + +setWiredInNameSpace (AConLike (RealDataCon dc)) ns + | isTcClsNameSpace ns + = data_con_ty_con dc + | isDataConNameSpace ns + = Exact (getName dc) -- No-op + +setWiredInNameSpace thing ns + = pprPanic "setWiredinNameSpace" (pprNameSpace ns <+> ppr thing) + +ty_con_data_con :: TyCon -> RdrName +ty_con_data_con tc + | isTupleTyCon tc + , Just dc <- tyConSingleDataCon_maybe tc + = Exact (getName dc) + + | tc `hasKey` listTyConKey + = Exact nilDataConName + + | otherwise -- See Note [setRdrNameSpace for wired-in names] + = Unqual (setOccNameSpace srcDataName (getOccName tc)) + +data_con_ty_con :: DataCon -> RdrName +data_con_ty_con dc + | let tc = dataConTyCon dc + , isTupleTyCon tc + = Exact (getName tc) + + | dc `hasKey` nilDataConKey + = Exact listTyConName + + | otherwise -- See Note [setRdrNameSpace for wired-in names] + = Unqual (setOccNameSpace tcClsName (getOccName dc)) + +-- | Replaces constraint tuple names with corresponding boxed ones. +filterCTuple :: RdrName -> RdrName +filterCTuple (Exact n) + | Just arity <- cTupleTyConNameArity_maybe n + = Exact $ tupleTyConName BoxedTuple arity +filterCTuple rdr = rdr + + +{- Note [setRdrNameSpace for wired-in names] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +In GHC.Types, which declares (:), we have + infixr 5 : +The ambiguity about which ":" is meant is resolved by parsing it as a +data constructor, but then using dataTcOccs to try the type constructor too; +and that in turn calls setRdrNameSpace to change the name-space of ":" to +tcClsName. There isn't a corresponding ":" type constructor, but it's painful +to make setRdrNameSpace partial, so we just make an Unqual name instead. It +really doesn't matter! +-} + +eitherToP :: Either (SrcSpan, SDoc) a -> P a +-- Adapts the Either monad to the P monad +eitherToP (Left (loc, doc)) = addFatalError loc doc +eitherToP (Right thing) = return thing + +checkTyVars :: SDoc -> SDoc -> Located RdrName -> [LHsTypeArg GhcPs] + -> P ( LHsQTyVars GhcPs -- the synthesized type variables + , [AddAnn] ) -- action which adds annotations +-- ^ Check whether the given list of type parameters are all type variables +-- (possibly with a kind signature). +checkTyVars pp_what equals_or_where tc tparms + = do { (tvs, anns) <- fmap unzip $ mapM check tparms + ; return (mkHsQTvs tvs, concat anns) } + where + check (HsTypeArg _ ki@(L loc _)) + = addFatalError loc $ + vcat [ text "Unexpected type application" <+> + text "@" <> ppr ki + , text "In the" <+> pp_what <+> + ptext (sLit "declaration for") <+> quotes (ppr tc)] + check (HsValArg ty) = chkParens [] ty + check (HsArgPar sp) = addFatalError sp $ + vcat [text "Malformed" <+> pp_what + <+> text "declaration for" <+> quotes (ppr tc)] + -- Keep around an action for adjusting the annotations of extra parens + chkParens :: [AddAnn] -> LHsType GhcPs + -> P (LHsTyVarBndr GhcPs, [AddAnn]) + chkParens acc (L l (HsParTy _ ty)) = chkParens (mkParensApiAnn l ++ acc) ty + chkParens acc ty = do + tv <- chk ty + return (tv, reverse acc) + + -- Check that the name space is correct! + chk :: LHsType GhcPs -> P (LHsTyVarBndr GhcPs) + chk (L l (HsKindSig _ (L lv (HsTyVar _ _ (L _ tv))) k)) + | isRdrTyVar tv = return (L l (KindedTyVar noExtField (L lv tv) k)) + chk (L l (HsTyVar _ _ (L ltv tv))) + | isRdrTyVar tv = return (L l (UserTyVar noExtField (L ltv tv))) + chk t@(L loc _) + = addFatalError loc $ + vcat [ text "Unexpected type" <+> quotes (ppr t) + , text "In the" <+> pp_what + <+> ptext (sLit "declaration for") <+> quotes tc' + , vcat[ (text "A" <+> pp_what + <+> ptext (sLit "declaration should have form")) + , nest 2 + (pp_what + <+> tc' + <+> hsep (map text (takeList tparms allNameStrings)) + <+> equals_or_where) ] ] + + -- Avoid printing a constraint tuple in the error message. Print + -- a plain old tuple instead (since that's what the user probably + -- wrote). See #14907 + tc' = ppr $ fmap filterCTuple tc + + + +whereDots, equalsDots :: SDoc +-- Second argument to checkTyVars +whereDots = text "where ..." +equalsDots = text "= ..." + +checkDatatypeContext :: Maybe (LHsContext GhcPs) -> P () +checkDatatypeContext Nothing = return () +checkDatatypeContext (Just c) + = do allowed <- getBit DatatypeContextsBit + unless allowed $ + addError (getLoc c) + (text "Illegal datatype context (use DatatypeContexts):" + <+> pprLHsContext c) + +type LRuleTyTmVar = Located RuleTyTmVar +data RuleTyTmVar = RuleTyTmVar (Located RdrName) (Maybe (LHsType GhcPs)) +-- ^ Essentially a wrapper for a @RuleBndr GhcPs@ + +-- turns RuleTyTmVars into RuleBnrs - this is straightforward +mkRuleBndrs :: [LRuleTyTmVar] -> [LRuleBndr GhcPs] +mkRuleBndrs = fmap (fmap cvt_one) + where cvt_one (RuleTyTmVar v Nothing) = RuleBndr noExtField v + cvt_one (RuleTyTmVar v (Just sig)) = + RuleBndrSig noExtField v (mkLHsSigWcType sig) + +-- turns RuleTyTmVars into HsTyVarBndrs - this is more interesting +mkRuleTyVarBndrs :: [LRuleTyTmVar] -> [LHsTyVarBndr GhcPs] +mkRuleTyVarBndrs = fmap (fmap cvt_one) + where cvt_one (RuleTyTmVar v Nothing) = UserTyVar noExtField (fmap tm_to_ty v) + cvt_one (RuleTyTmVar v (Just sig)) + = KindedTyVar noExtField (fmap tm_to_ty v) sig + -- takes something in namespace 'varName' to something in namespace 'tvName' + tm_to_ty (Unqual occ) = Unqual (setOccNameSpace tvName occ) + tm_to_ty _ = panic "mkRuleTyVarBndrs" + +-- See note [Parsing explicit foralls in Rules] in GHC.Parser +checkRuleTyVarBndrNames :: [LHsTyVarBndr GhcPs] -> P () +checkRuleTyVarBndrNames = mapM_ (check . fmap hsTyVarName) + where check (L loc (Unqual occ)) = do + when ((occNameString occ ==) `any` ["forall","family","role"]) + (addFatalError loc (text $ "parse error on input " + ++ occNameString occ)) + check _ = panic "checkRuleTyVarBndrNames" + +checkRecordSyntax :: (MonadP m, Outputable a) => Located a -> m (Located a) +checkRecordSyntax lr@(L loc r) + = do allowed <- getBit TraditionalRecordSyntaxBit + unless allowed $ addError loc $ + text "Illegal record syntax (use TraditionalRecordSyntax):" <+> ppr r + return lr + +-- | Check if the gadt_constrlist is empty. Only raise parse error for +-- `data T where` to avoid affecting existing error message, see #8258. +checkEmptyGADTs :: Located ([AddAnn], [LConDecl GhcPs]) + -> P (Located ([AddAnn], [LConDecl GhcPs])) +checkEmptyGADTs gadts@(L span (_, [])) -- Empty GADT declaration. + = do gadtSyntax <- getBit GadtSyntaxBit -- GADTs implies GADTSyntax + unless gadtSyntax $ addError span $ vcat + [ text "Illegal keyword 'where' in data declaration" + , text "Perhaps you intended to use GADTs or a similar language" + , text "extension to enable syntax: data T where" + ] + return gadts +checkEmptyGADTs gadts = return gadts -- Ordinary GADT declaration. + +checkTyClHdr :: Bool -- True <=> class header + -- False <=> type header + -> LHsType GhcPs + -> P (Located RdrName, -- the head symbol (type or class name) + [LHsTypeArg GhcPs], -- parameters of head symbol + LexicalFixity, -- the declaration is in infix format + [AddAnn]) -- API Annotation for HsParTy when stripping parens +-- Well-formedness check and decomposition of type and class heads. +-- Decomposes T ty1 .. tyn into (T, [ty1, ..., tyn]) +-- Int :*: Bool into (:*:, [Int, Bool]) +-- returning the pieces +checkTyClHdr is_cls ty + = goL ty [] [] Prefix + where + goL (L l ty) acc ann fix = go l ty acc ann fix + + -- workaround to define '*' despite StarIsType + go lp (HsParTy _ (L l (HsStarTy _ isUni))) acc ann fix + = do { warnStarBndr l + ; let name = mkOccName tcClsName (starSym isUni) + ; return (L l (Unqual name), acc, fix, (ann ++ mkParensApiAnn lp)) } + + go _ (HsTyVar _ _ ltc@(L _ tc)) acc ann fix + | isRdrTc tc = return (ltc, acc, fix, ann) + go _ (HsOpTy _ t1 ltc@(L _ tc) t2) acc ann _fix + | isRdrTc tc = return (ltc, HsValArg t1:HsValArg t2:acc, Infix, ann) + go l (HsParTy _ ty) acc ann fix = goL ty acc (ann ++mkParensApiAnn l) fix + go _ (HsAppTy _ t1 t2) acc ann fix = goL t1 (HsValArg t2:acc) ann fix + go _ (HsAppKindTy l ty ki) acc ann fix = goL ty (HsTypeArg l ki:acc) ann fix + go l (HsTupleTy _ HsBoxedOrConstraintTuple ts) [] ann fix + = return (L l (nameRdrName tup_name), map HsValArg ts, fix, ann) + where + arity = length ts + tup_name | is_cls = cTupleTyConName arity + | otherwise = getName (tupleTyCon Boxed arity) + -- See Note [Unit tuples] in GHC.Hs.Types (TODO: is this still relevant?) + go l _ _ _ _ + = addFatalError l (text "Malformed head of type or class declaration:" + <+> ppr ty) + +-- | Yield a parse error if we have a function applied directly to a do block +-- etc. and BlockArguments is not enabled. +checkExpBlockArguments :: LHsExpr GhcPs -> PV () +checkCmdBlockArguments :: LHsCmd GhcPs -> PV () +(checkExpBlockArguments, checkCmdBlockArguments) = (checkExpr, checkCmd) + where + checkExpr :: LHsExpr GhcPs -> PV () + checkExpr expr = case unLoc expr of + HsDo _ DoExpr _ -> check "do block" expr + HsDo _ MDoExpr _ -> check "mdo block" expr + HsLam {} -> check "lambda expression" expr + HsCase {} -> check "case expression" expr + HsLamCase {} -> check "lambda-case expression" expr + HsLet {} -> check "let expression" expr + HsIf {} -> check "if expression" expr + HsProc {} -> check "proc expression" expr + _ -> return () + + checkCmd :: LHsCmd GhcPs -> PV () + checkCmd cmd = case unLoc cmd of + HsCmdLam {} -> check "lambda command" cmd + HsCmdCase {} -> check "case command" cmd + HsCmdIf {} -> check "if command" cmd + HsCmdLet {} -> check "let command" cmd + HsCmdDo {} -> check "do command" cmd + _ -> return () + + check :: Outputable a => String -> Located a -> PV () + check element a = do + blockArguments <- getBit BlockArgumentsBit + unless blockArguments $ + addError (getLoc a) $ + text "Unexpected " <> text element <> text " in function application:" + $$ nest 4 (ppr a) + $$ text "You could write it with parentheses" + $$ text "Or perhaps you meant to enable BlockArguments?" + +-- | Validate the context constraints and break up a context into a list +-- of predicates. +-- +-- @ +-- (Eq a, Ord b) --> [Eq a, Ord b] +-- Eq a --> [Eq a] +-- (Eq a) --> [Eq a] +-- (((Eq a))) --> [Eq a] +-- @ +checkContext :: LHsType GhcPs -> P ([AddAnn],LHsContext GhcPs) +checkContext (L l orig_t) + = check [] (L l orig_t) + where + check anns (L lp (HsTupleTy _ HsBoxedOrConstraintTuple ts)) + -- (Eq a, Ord b) shows up as a tuple type. Only boxed tuples can + -- be used as context constraints. + = return (anns ++ mkParensApiAnn lp,L l ts) -- Ditto () + + check anns (L lp1 (HsParTy _ ty)) + -- to be sure HsParTy doesn't get into the way + = check anns' ty + where anns' = if l == lp1 then anns + else (anns ++ mkParensApiAnn lp1) + + -- no need for anns, returning original + check _anns t = checkNoDocs msg t *> return ([],L l [L l orig_t]) + + msg = text "data constructor context" + +-- | Check recursively if there are any 'HsDocTy's in the given type. +-- This only works on a subset of types produced by 'btype_no_ops' +checkNoDocs :: SDoc -> LHsType GhcPs -> P () +checkNoDocs msg ty = go ty + where + go (L _ (HsAppKindTy _ ty ki)) = go ty *> go ki + go (L _ (HsAppTy _ t1 t2)) = go t1 *> go t2 + go (L l (HsDocTy _ t ds)) = addError l $ hsep + [ text "Unexpected haddock", quotes (ppr ds) + , text "on", msg, quotes (ppr t) ] + go _ = pure () + +checkImportDecl :: Maybe (Located Token) + -> Maybe (Located Token) + -> P () +checkImportDecl mPre mPost = do + let whenJust mg f = maybe (pure ()) f mg + + importQualifiedPostEnabled <- getBit ImportQualifiedPostBit + + -- Error if 'qualified' found in postpositive position and + -- 'ImportQualifiedPost' is not in effect. + whenJust mPost $ \post -> + when (not importQualifiedPostEnabled) $ + failOpNotEnabledImportQualifiedPost (getLoc post) + + -- Error if 'qualified' occurs in both pre and postpositive + -- positions. + whenJust mPost $ \post -> + when (isJust mPre) $ + failOpImportQualifiedTwice (getLoc post) + + -- Warn if 'qualified' found in prepositive position and + -- 'Opt_WarnPrepositiveQualifiedModule' is enabled. + whenJust mPre $ \pre -> + warnPrepositiveQualifiedModule (getLoc pre) + +-- ------------------------------------------------------------------------- +-- Checking Patterns. + +-- We parse patterns as expressions and check for valid patterns below, +-- converting the expression into a pattern at the same time. + +checkPattern :: Located (PatBuilder GhcPs) -> P (LPat GhcPs) +checkPattern = runPV . checkLPat + +checkPattern_msg :: SDoc -> PV (Located (PatBuilder GhcPs)) -> P (LPat GhcPs) +checkPattern_msg msg pp = runPV_msg msg (pp >>= checkLPat) + +checkLPat :: Located (PatBuilder GhcPs) -> PV (LPat GhcPs) +checkLPat e@(L l _) = checkPat l e [] + +checkPat :: SrcSpan -> Located (PatBuilder GhcPs) -> [LPat GhcPs] + -> PV (LPat GhcPs) +checkPat loc (L l e@(PatBuilderVar (L _ c))) args + | isRdrDataCon c = return (L loc (ConPatIn (L l c) (PrefixCon args))) + | not (null args) && patIsRec c = + localPV_msg (\_ -> text "Perhaps you intended to use RecursiveDo") $ + patFail l (ppr e) +checkPat loc (L _ (PatBuilderApp f e)) args + = do p <- checkLPat e + checkPat loc f (p : args) +checkPat loc (L _ e) [] + = do p <- checkAPat loc e + return (L loc p) +checkPat loc e _ + = patFail loc (ppr e) + +checkAPat :: SrcSpan -> PatBuilder GhcPs -> PV (Pat GhcPs) +checkAPat loc e0 = do + nPlusKPatterns <- getBit NPlusKPatternsBit + case e0 of + PatBuilderPat p -> return p + PatBuilderVar x -> return (VarPat noExtField x) + + -- Overloaded numeric patterns (e.g. f 0 x = x) + -- Negation is recorded separately, so that the literal is zero or +ve + -- NB. Negative *primitive* literals are already handled by the lexer + PatBuilderOverLit pos_lit -> return (mkNPat (L loc pos_lit) Nothing) + + -- n+k patterns + PatBuilderOpApp + (L nloc (PatBuilderVar (L _ n))) + (L _ plus) + (L lloc (PatBuilderOverLit lit@(OverLit {ol_val = HsIntegral {}}))) + | nPlusKPatterns && (plus == plus_RDR) + -> return (mkNPlusKPat (L nloc n) (L lloc lit)) + + PatBuilderOpApp l (L cl c) r + | isRdrDataCon c -> do + l <- checkLPat l + r <- checkLPat r + return (ConPatIn (L cl c) (InfixCon l r)) + + PatBuilderPar e -> checkLPat e >>= (return . (ParPat noExtField)) + _ -> patFail loc (ppr e0) + +placeHolderPunRhs :: DisambECP b => PV (Located b) +-- The RHS of a punned record field will be filled in by the renamer +-- It's better not to make it an error, in case we want to print it when +-- debugging +placeHolderPunRhs = mkHsVarPV (noLoc pun_RDR) + +plus_RDR, pun_RDR :: RdrName +plus_RDR = mkUnqual varName (fsLit "+") -- Hack +pun_RDR = mkUnqual varName (fsLit "pun-right-hand-side") + +checkPatField :: LHsRecField GhcPs (Located (PatBuilder GhcPs)) + -> PV (LHsRecField GhcPs (LPat GhcPs)) +checkPatField (L l fld) = do p <- checkLPat (hsRecFieldArg fld) + return (L l (fld { hsRecFieldArg = p })) + +patFail :: SrcSpan -> SDoc -> PV a +patFail loc e = addFatalError loc $ text "Parse error in pattern:" <+> ppr e + +patIsRec :: RdrName -> Bool +patIsRec e = e == mkUnqual varName (fsLit "rec") + +--------------------------------------------------------------------------- +-- Check Equation Syntax + +checkValDef :: Located (PatBuilder GhcPs) + -> Maybe (LHsType GhcPs) + -> Located (a,GRHSs GhcPs (LHsExpr GhcPs)) + -> P ([AddAnn],HsBind GhcPs) + +checkValDef lhs (Just sig) grhss + -- x :: ty = rhs parses as a *pattern* binding + = do lhs' <- runPV $ mkHsTySigPV (combineLocs lhs sig) lhs sig >>= checkLPat + checkPatBind lhs' grhss + +checkValDef lhs Nothing g@(L l (_,grhss)) + = do { mb_fun <- isFunLhs lhs + ; case mb_fun of + Just (fun, is_infix, pats, ann) -> + checkFunBind NoSrcStrict ann (getLoc lhs) + fun is_infix pats (L l grhss) + Nothing -> do + lhs' <- checkPattern lhs + checkPatBind lhs' g } + +checkFunBind :: SrcStrictness + -> [AddAnn] + -> SrcSpan + -> Located RdrName + -> LexicalFixity + -> [Located (PatBuilder GhcPs)] + -> Located (GRHSs GhcPs (LHsExpr GhcPs)) + -> P ([AddAnn],HsBind GhcPs) +checkFunBind strictness ann lhs_loc fun is_infix pats (L rhs_span grhss) + = do ps <- mapM checkPattern pats + let match_span = combineSrcSpans lhs_loc rhs_span + -- Add back the annotations stripped from any HsPar values in the lhs + -- mapM_ (\a -> a match_span) ann + return (ann, makeFunBind fun + [L match_span (Match { m_ext = noExtField + , m_ctxt = FunRhs + { mc_fun = fun + , mc_fixity = is_infix + , mc_strictness = strictness } + , m_pats = ps + , m_grhss = grhss })]) + -- The span of the match covers the entire equation. + -- That isn't quite right, but it'll do for now. + +makeFunBind :: Located RdrName -> [LMatch GhcPs (LHsExpr GhcPs)] + -> HsBind GhcPs +-- Like GHC.Hs.Utils.mkFunBind, but we need to be able to set the fixity too +makeFunBind fn ms + = FunBind { fun_ext = noExtField, + fun_id = fn, + fun_matches = mkMatchGroup FromSource ms, + fun_tick = [] } + +-- See Note [FunBind vs PatBind] +checkPatBind :: LPat GhcPs + -> Located (a,GRHSs GhcPs (LHsExpr GhcPs)) + -> P ([AddAnn],HsBind GhcPs) +checkPatBind lhs (L match_span (_,grhss)) + | BangPat _ p <- unLoc lhs + , VarPat _ v <- unLoc p + = return ([], makeFunBind v [L match_span (m v)]) + where + m v = Match { m_ext = noExtField + , m_ctxt = FunRhs { mc_fun = L (getLoc lhs) (unLoc v) + , mc_fixity = Prefix + , mc_strictness = SrcStrict } + , m_pats = [] + , m_grhss = grhss } + +checkPatBind lhs (L _ (_,grhss)) + = return ([],PatBind noExtField lhs grhss ([],[])) + +checkValSigLhs :: LHsExpr GhcPs -> P (Located RdrName) +checkValSigLhs (L _ (HsVar _ lrdr@(L _ v))) + | isUnqual v + , not (isDataOcc (rdrNameOcc v)) + = return lrdr + +checkValSigLhs lhs@(L l _) + = addFatalError l ((text "Invalid type signature:" <+> + ppr lhs <+> text ":: ...") + $$ text hint) + where + hint | foreign_RDR `looks_like` lhs + = "Perhaps you meant to use ForeignFunctionInterface?" + | default_RDR `looks_like` lhs + = "Perhaps you meant to use DefaultSignatures?" + | pattern_RDR `looks_like` lhs + = "Perhaps you meant to use PatternSynonyms?" + | otherwise + = "Should be of form <variable> :: <type>" + + -- A common error is to forget the ForeignFunctionInterface flag + -- so check for that, and suggest. cf #3805 + -- Sadly 'foreign import' still barfs 'parse error' because + -- 'import' is a keyword + looks_like s (L _ (HsVar _ (L _ v))) = v == s + looks_like s (L _ (HsApp _ lhs _)) = looks_like s lhs + looks_like _ _ = False + + foreign_RDR = mkUnqual varName (fsLit "foreign") + default_RDR = mkUnqual varName (fsLit "default") + pattern_RDR = mkUnqual varName (fsLit "pattern") + +checkDoAndIfThenElse + :: (Outputable a, Outputable b, Outputable c) + => Located a -> Bool -> b -> Bool -> Located c -> PV () +checkDoAndIfThenElse guardExpr semiThen thenExpr semiElse elseExpr + | semiThen || semiElse + = do doAndIfThenElse <- getBit DoAndIfThenElseBit + unless doAndIfThenElse $ do + addError (combineLocs guardExpr elseExpr) + (text "Unexpected semi-colons in conditional:" + $$ nest 4 expr + $$ text "Perhaps you meant to use DoAndIfThenElse?") + | otherwise = return () + where pprOptSemi True = semi + pprOptSemi False = empty + expr = text "if" <+> ppr guardExpr <> pprOptSemi semiThen <+> + text "then" <+> ppr thenExpr <> pprOptSemi semiElse <+> + text "else" <+> ppr elseExpr + +isFunLhs :: Located (PatBuilder GhcPs) + -> P (Maybe (Located RdrName, LexicalFixity, [Located (PatBuilder GhcPs)],[AddAnn])) +-- A variable binding is parsed as a FunBind. +-- Just (fun, is_infix, arg_pats) if e is a function LHS +isFunLhs e = go e [] [] + where + go (L loc (PatBuilderVar (L _ f))) es ann + | not (isRdrDataCon f) = return (Just (L loc f, Prefix, es, ann)) + go (L _ (PatBuilderApp f e)) es ann = go f (e:es) ann + go (L l (PatBuilderPar e)) es@(_:_) ann = go e es (ann ++ mkParensApiAnn l) + go (L loc (PatBuilderOpApp l (L loc' op) r)) es ann + | not (isRdrDataCon op) -- We have found the function! + = return (Just (L loc' op, Infix, (l:r:es), ann)) + | otherwise -- Infix data con; keep going + = do { mb_l <- go l es ann + ; case mb_l of + Just (op', Infix, j : k : es', ann') + -> return (Just (op', Infix, j : op_app : es', ann')) + where + op_app = L loc (PatBuilderOpApp k + (L loc' op) r) + _ -> return Nothing } + go _ _ _ = return Nothing + +-- | Either an operator or an operand. +data TyEl = TyElOpr RdrName | TyElOpd (HsType GhcPs) + | TyElKindApp SrcSpan (LHsType GhcPs) + -- See Note [TyElKindApp SrcSpan interpretation] + | TyElUnpackedness ([AddAnn], SourceText, SrcUnpackedness) + | TyElDocPrev HsDocString + + +{- Note [TyElKindApp SrcSpan interpretation] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +A TyElKindApp captures type application written in haskell as + + @ Foo + +where Foo is some type. + +The SrcSpan reflects both elements, and there are AnnAt and AnnVal API +Annotations attached to this SrcSpan for the specific locations of +each within it. +-} + +instance Outputable TyEl where + ppr (TyElOpr name) = ppr name + ppr (TyElOpd ty) = ppr ty + ppr (TyElKindApp _ ki) = text "@" <> ppr ki + ppr (TyElUnpackedness (_, _, unpk)) = ppr unpk + ppr (TyElDocPrev doc) = ppr doc + +-- | Extract a strictness/unpackedness annotation from the front of a reversed +-- 'TyEl' list. +pUnpackedness + :: [Located TyEl] -- reversed TyEl + -> Maybe ( SrcSpan + , [AddAnn] + , SourceText + , SrcUnpackedness + , [Located TyEl] {- remaining TyEl -}) +pUnpackedness (L l x1 : xs) + | TyElUnpackedness (anns, prag, unpk) <- x1 + = Just (l, anns, prag, unpk, xs) +pUnpackedness _ = Nothing + +pBangTy + :: LHsType GhcPs -- a type to be wrapped inside HsBangTy + -> [Located TyEl] -- reversed TyEl + -> ( Bool {- has a strict mark been consumed? -} + , LHsType GhcPs {- the resulting BangTy -} + , P () {- add annotations -} + , [Located TyEl] {- remaining TyEl -}) +pBangTy lt@(L l1 _) xs = + case pUnpackedness xs of + Nothing -> (False, lt, pure (), xs) + Just (l2, anns, prag, unpk, xs') -> + let bl = combineSrcSpans l1 l2 + bt = addUnpackedness (prag, unpk) lt + in (True, L bl bt, addAnnsAt bl anns, xs') + +mkBangTy :: SrcStrictness -> LHsType GhcPs -> HsType GhcPs +mkBangTy strictness = + HsBangTy noExtField (HsSrcBang NoSourceText NoSrcUnpack strictness) + +addUnpackedness :: (SourceText, SrcUnpackedness) -> LHsType GhcPs -> HsType GhcPs +addUnpackedness (prag, unpk) (L _ (HsBangTy x bang t)) + | HsSrcBang NoSourceText NoSrcUnpack strictness <- bang + = HsBangTy x (HsSrcBang prag unpk strictness) t +addUnpackedness (prag, unpk) t + = HsBangTy noExtField (HsSrcBang prag unpk NoSrcStrict) t + +-- | Merge a /reversed/ and /non-empty/ soup of operators and operands +-- into a type. +-- +-- User input: @F x y + G a b * X@ +-- Input to 'mergeOps': [X, *, b, a, G, +, y, x, F] +-- Output corresponds to what the user wrote assuming all operators are of the +-- same fixity and right-associative. +-- +-- It's a bit silly that we're doing it at all, as the renamer will have to +-- rearrange this, and it'd be easier to keep things separate. +-- +-- See Note [Parsing data constructors is hard] +mergeOps :: [Located TyEl] -> P (LHsType GhcPs) +mergeOps ((L l1 (TyElOpd t)) : xs) + | (_, t', addAnns, xs') <- pBangTy (L l1 t) xs + , null xs' -- We accept a BangTy only when there are no preceding TyEl. + = addAnns >> return t' +mergeOps all_xs = go (0 :: Int) [] id all_xs + where + -- NB. When modifying clauses in 'go', make sure that the reasoning in + -- Note [Non-empty 'acc' in mergeOps clause [end]] is still correct. + + -- clause [unpk]: + -- handle (NO)UNPACK pragmas + go k acc ops_acc ((L l (TyElUnpackedness (anns, unpkSrc, unpk))):xs) = + if not (null acc) && null xs + then do { acc' <- eitherToP $ mergeOpsAcc acc + ; let a = ops_acc acc' + strictMark = HsSrcBang unpkSrc unpk NoSrcStrict + bl = combineSrcSpans l (getLoc a) + bt = HsBangTy noExtField strictMark a + ; addAnnsAt bl anns + ; return (L bl bt) } + else addFatalError l unpkError + where + unpkSDoc = case unpkSrc of + NoSourceText -> ppr unpk + SourceText str -> text str <> text " #-}" + unpkError + | not (null xs) = unpkSDoc <+> text "cannot appear inside a type." + | null acc && k == 0 = unpkSDoc <+> text "must be applied to a type." + | otherwise = + -- See Note [Impossible case in mergeOps clause [unpk]] + panic "mergeOps.UNPACK: impossible position" + + -- clause [doc]: + -- we do not expect to encounter any docs + go _ _ _ ((L l (TyElDocPrev _)):_) = + failOpDocPrev l + + -- clause [opr]: + -- when we encounter an operator, we must have accumulated + -- something for its rhs, and there must be something left + -- to build its lhs. + go k acc ops_acc ((L l (TyElOpr op)):xs) = + if null acc || null (filter isTyElOpd xs) + then failOpFewArgs (L l op) + else do { acc' <- eitherToP (mergeOpsAcc acc) + ; go (k + 1) [] (\c -> mkLHsOpTy c (L l op) (ops_acc acc')) xs } + where + isTyElOpd (L _ (TyElOpd _)) = True + isTyElOpd _ = False + + -- clause [opd]: + -- whenever an operand is encountered, it is added to the accumulator + go k acc ops_acc ((L l (TyElOpd a)):xs) = go k (HsValArg (L l a):acc) ops_acc xs + + -- clause [tyapp]: + -- whenever a type application is encountered, it is added to the accumulator + go k acc ops_acc ((L _ (TyElKindApp l a)):xs) = go k (HsTypeArg l a:acc) ops_acc xs + + -- clause [end] + -- See Note [Non-empty 'acc' in mergeOps clause [end]] + go _ acc ops_acc [] = do { acc' <- eitherToP (mergeOpsAcc acc) + ; return (ops_acc acc') } + +mergeOpsAcc :: [HsArg (LHsType GhcPs) (LHsKind GhcPs)] + -> Either (SrcSpan, SDoc) (LHsType GhcPs) +mergeOpsAcc [] = panic "mergeOpsAcc: empty input" +mergeOpsAcc (HsTypeArg _ (L loc ki):_) + = Left (loc, text "Unexpected type application:" <+> ppr ki) +mergeOpsAcc (HsValArg ty : xs) = go1 ty xs + where + go1 :: LHsType GhcPs + -> [HsArg (LHsType GhcPs) (LHsKind GhcPs)] + -> Either (SrcSpan, SDoc) (LHsType GhcPs) + go1 lhs [] = Right lhs + go1 lhs (x:xs) = case x of + HsValArg ty -> go1 (mkHsAppTy lhs ty) xs + HsTypeArg loc ki -> let ty = mkHsAppKindTy loc lhs ki + in go1 ty xs + HsArgPar _ -> go1 lhs xs +mergeOpsAcc (HsArgPar _: xs) = mergeOpsAcc xs + +{- Note [Impossible case in mergeOps clause [unpk]] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +This case should never occur. Let us consider all possible +variations of 'acc', 'xs', and 'k': + + acc xs k +============================== + null | null 0 -- "must be applied to a type" + null | not null 0 -- "must be applied to a type" +not null | null 0 -- successful parse +not null | not null 0 -- "cannot appear inside a type" + null | null >0 -- handled in clause [opr] + null | not null >0 -- "cannot appear inside a type" +not null | null >0 -- successful parse +not null | not null >0 -- "cannot appear inside a type" + +The (null acc && null xs && k>0) case is handled in clause [opr] +by the following check: + + if ... || null (filter isTyElOpd xs) + then failOpFewArgs (L l op) + +We know that this check has been performed because k>0, and by +the time we reach the end of the list (null xs), the only way +for (null acc) to hold is that there was not a single TyElOpd +between the operator and the end of the list. But this case is +caught by the check and reported as 'failOpFewArgs'. +-} + +{- Note [Non-empty 'acc' in mergeOps clause [end]] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +In clause [end] we need to know that 'acc' is non-empty to call 'mergeAcc' +without a check. + +Running 'mergeOps' with an empty input list is forbidden, so we do not consider +this possibility. This means we'll hit at least one other clause before we +reach clause [end]. + +* Clauses [unpk] and [doc] do not call 'go' recursively, so we cannot hit + clause [end] from there. +* Clause [opd] makes 'acc' non-empty, so if we hit clause [end] after it, 'acc' + will be non-empty. +* Clause [opr] checks that (filter isTyElOpd xs) is not null - so we are going + to hit clause [opd] at least once before we reach clause [end], making 'acc' + non-empty. +* There are no other clauses. + +Therefore, it is safe to omit a check for non-emptiness of 'acc' in clause +[end]. + +-} + +pInfixSide :: [Located TyEl] -> Maybe (LHsType GhcPs, P (), [Located TyEl]) +pInfixSide ((L l (TyElOpd t)):xs) + | (True, t', addAnns, xs') <- pBangTy (L l t) xs + = Just (t', addAnns, xs') +pInfixSide (el:xs1) + | Just t1 <- pLHsTypeArg el + = go [t1] xs1 + where + go :: [HsArg (LHsType GhcPs) (LHsKind GhcPs)] + -> [Located TyEl] -> Maybe (LHsType GhcPs, P (), [Located TyEl]) + go acc (el:xs) + | Just t <- pLHsTypeArg el + = go (t:acc) xs + go acc xs = case mergeOpsAcc acc of + Left _ -> Nothing + Right acc' -> Just (acc', pure (), xs) +pInfixSide _ = Nothing + +pLHsTypeArg :: Located TyEl -> Maybe (HsArg (LHsType GhcPs) (LHsKind GhcPs)) +pLHsTypeArg (L l (TyElOpd a)) = Just (HsValArg (L l a)) +pLHsTypeArg (L _ (TyElKindApp l a)) = Just (HsTypeArg l a) +pLHsTypeArg _ = Nothing + +pDocPrev :: [Located TyEl] -> (Maybe LHsDocString, [Located TyEl]) +pDocPrev = go Nothing + where + go mTrailingDoc ((L l (TyElDocPrev doc)):xs) = + go (mTrailingDoc `mplus` Just (L l doc)) xs + go mTrailingDoc xs = (mTrailingDoc, xs) + +orErr :: Maybe a -> b -> Either b a +orErr (Just a) _ = Right a +orErr Nothing b = Left b + +-- | Merge a /reversed/ and /non-empty/ soup of operators and operands +-- into a data constructor. +-- +-- User input: @C !A B -- ^ doc@ +-- Input to 'mergeDataCon': ["doc", B, !A, C] +-- Output: (C, PrefixCon [!A, B], "doc") +-- +-- See Note [Parsing data constructors is hard] +mergeDataCon + :: [Located TyEl] + -> P ( Located RdrName -- constructor name + , HsConDeclDetails GhcPs -- constructor field information + , Maybe LHsDocString -- docstring to go on the constructor + ) +mergeDataCon all_xs = + do { (addAnns, a) <- eitherToP res + ; addAnns + ; return a } + where + -- We start by splitting off the trailing documentation comment, + -- if any exists. + (mTrailingDoc, all_xs') = pDocPrev all_xs + + -- Determine whether the trailing documentation comment exists and is the + -- only docstring in this constructor declaration. + -- + -- When true, it means that it applies to the constructor itself: + -- data T = C + -- A + -- B -- ^ Comment on C (singleDoc == True) + -- + -- When false, it means that it applies to the last field: + -- data T = C -- ^ Comment on C + -- A -- ^ Comment on A + -- B -- ^ Comment on B (singleDoc == False) + singleDoc = isJust mTrailingDoc && + null [ () | (L _ (TyElDocPrev _)) <- all_xs' ] + + -- The result of merging the list of reversed TyEl into a + -- data constructor, along with [AddAnn]. + res = goFirst all_xs' + + -- Take the trailing docstring into account when interpreting + -- the docstring near the constructor. + -- + -- data T = C -- ^ docstring right after C + -- A + -- B -- ^ trailing docstring + -- + -- 'mkConDoc' must be applied to the docstring right after C, so that it + -- falls back to the trailing docstring when appropriate (see singleDoc). + mkConDoc mDoc | singleDoc = mDoc `mplus` mTrailingDoc + | otherwise = mDoc + + -- The docstring for the last field of a data constructor. + trailingFieldDoc | singleDoc = Nothing + | otherwise = mTrailingDoc + + goFirst [ L l (TyElOpd (HsTyVar _ _ (L _ tc))) ] + = do { data_con <- tyConToDataCon l tc + ; return (pure (), (data_con, PrefixCon [], mTrailingDoc)) } + goFirst ((L l (TyElOpd (HsRecTy _ fields))):xs) + | (mConDoc, xs') <- pDocPrev xs + , [ L l' (TyElOpd (HsTyVar _ _ (L _ tc))) ] <- xs' + = do { data_con <- tyConToDataCon l' tc + ; let mDoc = mTrailingDoc `mplus` mConDoc + ; return (pure (), (data_con, RecCon (L l fields), mDoc)) } + goFirst [L l (TyElOpd (HsTupleTy _ HsBoxedOrConstraintTuple ts))] + = return ( pure () + , ( L l (getRdrName (tupleDataCon Boxed (length ts))) + , PrefixCon ts + , mTrailingDoc ) ) + goFirst ((L l (TyElOpd t)):xs) + | (_, t', addAnns, xs') <- pBangTy (L l t) xs + = go addAnns Nothing [mkLHsDocTyMaybe t' trailingFieldDoc] xs' + goFirst (L l (TyElKindApp _ _):_) + = goInfix Monoid.<> Left (l, kindAppErr) + goFirst xs + = go (pure ()) mTrailingDoc [] xs + + go addAnns mLastDoc ts [ L l (TyElOpd (HsTyVar _ _ (L _ tc))) ] + = do { data_con <- tyConToDataCon l tc + ; return (addAnns, (data_con, PrefixCon ts, mkConDoc mLastDoc)) } + go addAnns mLastDoc ts ((L l (TyElDocPrev doc)):xs) = + go addAnns (mLastDoc `mplus` Just (L l doc)) ts xs + go addAnns mLastDoc ts ((L l (TyElOpd t)):xs) + | (_, t', addAnns', xs') <- pBangTy (L l t) xs + , t'' <- mkLHsDocTyMaybe t' mLastDoc + = go (addAnns >> addAnns') Nothing (t'':ts) xs' + go _ _ _ ((L _ (TyElOpr _)):_) = + -- Encountered an operator: backtrack to the beginning and attempt + -- to parse as an infix definition. + goInfix + go _ _ _ (L l (TyElKindApp _ _):_) = goInfix Monoid.<> Left (l, kindAppErr) + go _ _ _ _ = Left malformedErr + where + malformedErr = + ( foldr combineSrcSpans noSrcSpan (map getLoc all_xs') + , text "Cannot parse data constructor" <+> + text "in a data/newtype declaration:" $$ + nest 2 (hsep . reverse $ map ppr all_xs')) + + goInfix = + do { let xs0 = all_xs' + ; (rhs_t, rhs_addAnns, xs1) <- pInfixSide xs0 `orErr` malformedErr + ; let (mOpDoc, xs2) = pDocPrev xs1 + ; (op, xs3) <- case xs2 of + (L l (TyElOpr op)) : xs3 -> + do { data_con <- tyConToDataCon l op + ; return (data_con, xs3) } + _ -> Left malformedErr + ; let (mLhsDoc, xs4) = pDocPrev xs3 + ; (lhs_t, lhs_addAnns, xs5) <- pInfixSide xs4 `orErr` malformedErr + ; unless (null xs5) (Left malformedErr) + ; let rhs = mkLHsDocTyMaybe rhs_t trailingFieldDoc + lhs = mkLHsDocTyMaybe lhs_t mLhsDoc + addAnns = lhs_addAnns >> rhs_addAnns + ; return (addAnns, (op, InfixCon lhs rhs, mkConDoc mOpDoc)) } + where + malformedErr = + ( foldr combineSrcSpans noSrcSpan (map getLoc all_xs') + , text "Cannot parse an infix data constructor" <+> + text "in a data/newtype declaration:" $$ + nest 2 (hsep . reverse $ map ppr all_xs')) + + kindAppErr = + text "Unexpected kind application" <+> + text "in a data/newtype declaration:" $$ + nest 2 (hsep . reverse $ map ppr all_xs') + +--------------------------------------------------------------------------- +-- | Check for monad comprehensions +-- +-- If the flag MonadComprehensions is set, return a 'MonadComp' context, +-- otherwise use the usual 'ListComp' context + +checkMonadComp :: PV (HsStmtContext GhcRn) +checkMonadComp = do + monadComprehensions <- getBit MonadComprehensionsBit + return $ if monadComprehensions + then MonadComp + else ListComp + +-- ------------------------------------------------------------------------- +-- Expression/command/pattern ambiguity. +-- See Note [Ambiguous syntactic categories] +-- + +-- See Note [Parser-Validator] +-- See Note [Ambiguous syntactic categories] +-- +-- This newtype is required to avoid impredicative types in monadic +-- productions. That is, in a production that looks like +-- +-- | ... {% return (ECP ...) } +-- +-- we are dealing with +-- P ECP +-- whereas without a newtype we would be dealing with +-- P (forall b. DisambECP b => PV (Located b)) +-- +newtype ECP = + ECP { runECP_PV :: forall b. DisambECP b => PV (Located b) } + +runECP_P :: DisambECP b => ECP -> P (Located b) +runECP_P p = runPV (runECP_PV p) + +ecpFromExp :: LHsExpr GhcPs -> ECP +ecpFromExp a = ECP (ecpFromExp' a) + +ecpFromCmd :: LHsCmd GhcPs -> ECP +ecpFromCmd a = ECP (ecpFromCmd' a) + +-- | Disambiguate infix operators. +-- See Note [Ambiguous syntactic categories] +class DisambInfixOp b where + mkHsVarOpPV :: Located RdrName -> PV (Located b) + mkHsConOpPV :: Located RdrName -> PV (Located b) + mkHsInfixHolePV :: SrcSpan -> PV (Located b) + +instance DisambInfixOp (HsExpr GhcPs) where + mkHsVarOpPV v = return $ L (getLoc v) (HsVar noExtField v) + mkHsConOpPV v = return $ L (getLoc v) (HsVar noExtField v) + mkHsInfixHolePV l = return $ L l hsHoleExpr + +instance DisambInfixOp RdrName where + mkHsConOpPV (L l v) = return $ L l v + mkHsVarOpPV (L l v) = return $ L l v + mkHsInfixHolePV l = + addFatalError l $ text "Invalid infix hole, expected an infix operator" + +-- | Disambiguate constructs that may appear when we do not know ahead of time whether we are +-- parsing an expression, a command, or a pattern. +-- See Note [Ambiguous syntactic categories] +class b ~ (Body b) GhcPs => DisambECP b where + -- | See Note [Body in DisambECP] + type Body b :: Type -> Type + -- | Return a command without ambiguity, or fail in a non-command context. + ecpFromCmd' :: LHsCmd GhcPs -> PV (Located b) + -- | Return an expression without ambiguity, or fail in a non-expression context. + ecpFromExp' :: LHsExpr GhcPs -> PV (Located b) + -- | Disambiguate "\... -> ..." (lambda) + mkHsLamPV :: SrcSpan -> MatchGroup GhcPs (Located b) -> PV (Located b) + -- | Disambiguate "let ... in ..." + mkHsLetPV :: SrcSpan -> LHsLocalBinds GhcPs -> Located b -> PV (Located b) + -- | Infix operator representation + type InfixOp b + -- | Bring superclass constraints on InfixOp into scope. + -- See Note [UndecidableSuperClasses for associated types] + superInfixOp :: (DisambInfixOp (InfixOp b) => PV (Located b )) -> PV (Located b) + -- | Disambiguate "f # x" (infix operator) + mkHsOpAppPV :: SrcSpan -> Located b -> Located (InfixOp b) -> Located b -> PV (Located b) + -- | Disambiguate "case ... of ..." + mkHsCasePV :: SrcSpan -> LHsExpr GhcPs -> MatchGroup GhcPs (Located b) -> PV (Located b) + -- | Function argument representation + type FunArg b + -- | Bring superclass constraints on FunArg into scope. + -- See Note [UndecidableSuperClasses for associated types] + superFunArg :: (DisambECP (FunArg b) => PV (Located b)) -> PV (Located b) + -- | Disambiguate "f x" (function application) + mkHsAppPV :: SrcSpan -> Located b -> Located (FunArg b) -> PV (Located b) + -- | Disambiguate "if ... then ... else ..." + mkHsIfPV :: SrcSpan + -> LHsExpr GhcPs + -> Bool -- semicolon? + -> Located b + -> Bool -- semicolon? + -> Located b + -> PV (Located b) + -- | Disambiguate "do { ... }" (do notation) + mkHsDoPV :: SrcSpan -> Located [LStmt GhcPs (Located b)] -> PV (Located b) + -- | Disambiguate "( ... )" (parentheses) + mkHsParPV :: SrcSpan -> Located b -> PV (Located b) + -- | Disambiguate a variable "f" or a data constructor "MkF". + mkHsVarPV :: Located RdrName -> PV (Located b) + -- | Disambiguate a monomorphic literal + mkHsLitPV :: Located (HsLit GhcPs) -> PV (Located b) + -- | Disambiguate an overloaded literal + mkHsOverLitPV :: Located (HsOverLit GhcPs) -> PV (Located b) + -- | Disambiguate a wildcard + mkHsWildCardPV :: SrcSpan -> PV (Located b) + -- | Disambiguate "a :: t" (type annotation) + mkHsTySigPV :: SrcSpan -> Located b -> LHsType GhcPs -> PV (Located b) + -- | Disambiguate "[a,b,c]" (list syntax) + mkHsExplicitListPV :: SrcSpan -> [Located b] -> PV (Located b) + -- | Disambiguate "$(...)" and "[quasi|...|]" (TH splices) + mkHsSplicePV :: Located (HsSplice GhcPs) -> PV (Located b) + -- | Disambiguate "f { a = b, ... }" syntax (record construction and record updates) + mkHsRecordPV :: + SrcSpan -> + SrcSpan -> + Located b -> + ([LHsRecField GhcPs (Located b)], Maybe SrcSpan) -> + PV (Located b) + -- | Disambiguate "-a" (negation) + mkHsNegAppPV :: SrcSpan -> Located b -> PV (Located b) + -- | Disambiguate "(# a)" (right operator section) + mkHsSectionR_PV :: SrcSpan -> Located (InfixOp b) -> Located b -> PV (Located b) + -- | Disambiguate "(a -> b)" (view pattern) + mkHsViewPatPV :: SrcSpan -> LHsExpr GhcPs -> Located b -> PV (Located b) + -- | Disambiguate "a@b" (as-pattern) + mkHsAsPatPV :: SrcSpan -> Located RdrName -> Located b -> PV (Located b) + -- | Disambiguate "~a" (lazy pattern) + mkHsLazyPatPV :: SrcSpan -> Located b -> PV (Located b) + -- | Disambiguate "!a" (bang pattern) + mkHsBangPatPV :: SrcSpan -> Located b -> PV (Located b) + -- | Disambiguate tuple sections and unboxed sums + mkSumOrTuplePV :: SrcSpan -> Boxity -> SumOrTuple b -> PV (Located b) + -- | Validate infixexp LHS to reject unwanted {-# SCC ... #-} pragmas + rejectPragmaPV :: Located b -> PV () + + +{- Note [UndecidableSuperClasses for associated types] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +(This Note is about the code in GHC, not about the user code that we are parsing) + +Assume we have a class C with an associated type T: + + class C a where + type T a + ... + +If we want to add 'C (T a)' as a superclass, we need -XUndecidableSuperClasses: + + {-# LANGUAGE UndecidableSuperClasses #-} + class C (T a) => C a where + type T a + ... + +Unfortunately, -XUndecidableSuperClasses don't work all that well, sometimes +making GHC loop. The workaround is to bring this constraint into scope +manually with a helper method: + + class C a where + type T a + superT :: (C (T a) => r) -> r + +In order to avoid ambiguous types, 'r' must mention 'a'. + +For consistency, we use this approach for all constraints on associated types, +even when -XUndecidableSuperClasses are not required. +-} + +{- Note [Body in DisambECP] +~~~~~~~~~~~~~~~~~~~~~~~~~~~ +There are helper functions (mkBodyStmt, mkBindStmt, unguardedRHS, etc) that +require their argument to take a form of (body GhcPs) for some (body :: Type -> +*). To satisfy this requirement, we say that (b ~ Body b GhcPs) in the +superclass constraints of DisambECP. + +The alternative is to change mkBodyStmt, mkBindStmt, unguardedRHS, etc, to drop +this requirement. It is possible and would allow removing the type index of +PatBuilder, but leads to worse type inference, breaking some code in the +typechecker. +-} + +instance DisambECP (HsCmd GhcPs) where + type Body (HsCmd GhcPs) = HsCmd + ecpFromCmd' = return + ecpFromExp' (L l e) = cmdFail l (ppr e) + mkHsLamPV l mg = return $ L l (HsCmdLam noExtField mg) + mkHsLetPV l bs e = return $ L l (HsCmdLet noExtField bs e) + type InfixOp (HsCmd GhcPs) = HsExpr GhcPs + superInfixOp m = m + mkHsOpAppPV l c1 op c2 = do + let cmdArg c = L (getLoc c) $ HsCmdTop noExtField c + return $ L l $ HsCmdArrForm noExtField op Infix Nothing [cmdArg c1, cmdArg c2] + mkHsCasePV l c mg = return $ L l (HsCmdCase noExtField c mg) + type FunArg (HsCmd GhcPs) = HsExpr GhcPs + superFunArg m = m + mkHsAppPV l c e = do + checkCmdBlockArguments c + checkExpBlockArguments e + return $ L l (HsCmdApp noExtField c e) + mkHsIfPV l c semi1 a semi2 b = do + checkDoAndIfThenElse c semi1 a semi2 b + return $ L l (mkHsCmdIf c a b) + mkHsDoPV l stmts = return $ L l (HsCmdDo noExtField stmts) + mkHsParPV l c = return $ L l (HsCmdPar noExtField c) + mkHsVarPV (L l v) = cmdFail l (ppr v) + mkHsLitPV (L l a) = cmdFail l (ppr a) + mkHsOverLitPV (L l a) = cmdFail l (ppr a) + mkHsWildCardPV l = cmdFail l (text "_") + mkHsTySigPV l a sig = cmdFail l (ppr a <+> text "::" <+> ppr sig) + mkHsExplicitListPV l xs = cmdFail l $ + brackets (fsep (punctuate comma (map ppr xs))) + mkHsSplicePV (L l sp) = cmdFail l (ppr sp) + mkHsRecordPV l _ a (fbinds, ddLoc) = cmdFail l $ + ppr a <+> ppr (mk_rec_fields fbinds ddLoc) + mkHsNegAppPV l a = cmdFail l (text "-" <> ppr a) + mkHsSectionR_PV l op c = cmdFail l $ + let pp_op = fromMaybe (panic "cannot print infix operator") + (ppr_infix_expr (unLoc op)) + in pp_op <> ppr c + mkHsViewPatPV l a b = cmdFail l $ + ppr a <+> text "->" <+> ppr b + mkHsAsPatPV l v c = cmdFail l $ + pprPrefixOcc (unLoc v) <> text "@" <> ppr c + mkHsLazyPatPV l c = cmdFail l $ + text "~" <> ppr c + mkHsBangPatPV l c = cmdFail l $ + text "!" <> ppr c + mkSumOrTuplePV l boxity a = cmdFail l (pprSumOrTuple boxity a) + rejectPragmaPV _ = return () + +cmdFail :: SrcSpan -> SDoc -> PV a +cmdFail loc e = addFatalError loc $ + hang (text "Parse error in command:") 2 (ppr e) + +instance DisambECP (HsExpr GhcPs) where + type Body (HsExpr GhcPs) = HsExpr + ecpFromCmd' (L l c) = do + addError l $ vcat + [ text "Arrow command found where an expression was expected:", + nest 2 (ppr c) ] + return (L l hsHoleExpr) + ecpFromExp' = return + mkHsLamPV l mg = return $ L l (HsLam noExtField mg) + mkHsLetPV l bs c = return $ L l (HsLet noExtField bs c) + type InfixOp (HsExpr GhcPs) = HsExpr GhcPs + superInfixOp m = m + mkHsOpAppPV l e1 op e2 = do + return $ L l $ OpApp noExtField e1 op e2 + mkHsCasePV l e mg = return $ L l (HsCase noExtField e mg) + type FunArg (HsExpr GhcPs) = HsExpr GhcPs + superFunArg m = m + mkHsAppPV l e1 e2 = do + checkExpBlockArguments e1 + checkExpBlockArguments e2 + return $ L l (HsApp noExtField e1 e2) + mkHsIfPV l c semi1 a semi2 b = do + checkDoAndIfThenElse c semi1 a semi2 b + return $ L l (mkHsIf c a b) + mkHsDoPV l stmts = return $ L l (HsDo noExtField DoExpr stmts) + mkHsParPV l e = return $ L l (HsPar noExtField e) + mkHsVarPV v@(getLoc -> l) = return $ L l (HsVar noExtField v) + mkHsLitPV (L l a) = return $ L l (HsLit noExtField a) + mkHsOverLitPV (L l a) = return $ L l (HsOverLit noExtField a) + mkHsWildCardPV l = return $ L l hsHoleExpr + mkHsTySigPV l a sig = return $ L l (ExprWithTySig noExtField a (mkLHsSigWcType sig)) + mkHsExplicitListPV l xs = return $ L l (ExplicitList noExtField Nothing xs) + mkHsSplicePV sp = return $ mapLoc (HsSpliceE noExtField) sp + mkHsRecordPV l lrec a (fbinds, ddLoc) = do + r <- mkRecConstrOrUpdate a lrec (fbinds, ddLoc) + checkRecordSyntax (L l r) + mkHsNegAppPV l a = return $ L l (NegApp noExtField a noSyntaxExpr) + mkHsSectionR_PV l op e = return $ L l (SectionR noExtField op e) + mkHsViewPatPV l a b = patSynErr "View pattern" l (ppr a <+> text "->" <+> ppr b) empty + mkHsAsPatPV l v e = + patSynErr "@-pattern" l (pprPrefixOcc (unLoc v) <> text "@" <> ppr e) $ + text "Type application syntax requires a space before '@'" + mkHsLazyPatPV l e = patSynErr "Lazy pattern" l (text "~" <> ppr e) $ + text "Did you mean to add a space after the '~'?" + mkHsBangPatPV l e = patSynErr "Bang pattern" l (text "!" <> ppr e) $ + text "Did you mean to add a space after the '!'?" + mkSumOrTuplePV = mkSumOrTupleExpr + rejectPragmaPV (L _ (OpApp _ _ _ e)) = + -- assuming left-associative parsing of operators + rejectPragmaPV e + rejectPragmaPV (L l (HsPragE _ prag _)) = + addError l $ + hang (text "A pragma is not allowed in this position:") 2 (ppr prag) + rejectPragmaPV _ = return () + +patSynErr :: String -> SrcSpan -> SDoc -> SDoc -> PV (LHsExpr GhcPs) +patSynErr item l e explanation = + do { addError l $ + sep [text item <+> text "in expression context:", + nest 4 (ppr e)] $$ + explanation + ; return (L l hsHoleExpr) } + +hsHoleExpr :: HsExpr (GhcPass id) +hsHoleExpr = HsUnboundVar noExtField (mkVarOcc "_") + +-- | See Note [Ambiguous syntactic categories] and Note [PatBuilder] +data PatBuilder p + = PatBuilderPat (Pat p) + | PatBuilderPar (Located (PatBuilder p)) + | PatBuilderApp (Located (PatBuilder p)) (Located (PatBuilder p)) + | PatBuilderOpApp (Located (PatBuilder p)) (Located RdrName) (Located (PatBuilder p)) + | PatBuilderVar (Located RdrName) + | PatBuilderOverLit (HsOverLit GhcPs) + +instance Outputable (PatBuilder GhcPs) where + ppr (PatBuilderPat p) = ppr p + ppr (PatBuilderPar (L _ p)) = parens (ppr p) + ppr (PatBuilderApp (L _ p1) (L _ p2)) = ppr p1 <+> ppr p2 + ppr (PatBuilderOpApp (L _ p1) op (L _ p2)) = ppr p1 <+> ppr op <+> ppr p2 + ppr (PatBuilderVar v) = ppr v + ppr (PatBuilderOverLit l) = ppr l + +instance DisambECP (PatBuilder GhcPs) where + type Body (PatBuilder GhcPs) = PatBuilder + ecpFromCmd' (L l c) = + addFatalError l $ + text "Command syntax in pattern:" <+> ppr c + ecpFromExp' (L l e) = + addFatalError l $ + text "Expression syntax in pattern:" <+> ppr e + mkHsLamPV l _ = addFatalError l $ + text "Lambda-syntax in pattern." $$ + text "Pattern matching on functions is not possible." + mkHsLetPV l _ _ = addFatalError l $ text "(let ... in ...)-syntax in pattern" + type InfixOp (PatBuilder GhcPs) = RdrName + superInfixOp m = m + mkHsOpAppPV l p1 op p2 = return $ L l $ PatBuilderOpApp p1 op p2 + mkHsCasePV l _ _ = addFatalError l $ text "(case ... of ...)-syntax in pattern" + type FunArg (PatBuilder GhcPs) = PatBuilder GhcPs + superFunArg m = m + mkHsAppPV l p1 p2 = return $ L l (PatBuilderApp p1 p2) + mkHsIfPV l _ _ _ _ _ = addFatalError l $ text "(if ... then ... else ...)-syntax in pattern" + mkHsDoPV l _ = addFatalError l $ text "do-notation in pattern" + mkHsParPV l p = return $ L l (PatBuilderPar p) + mkHsVarPV v@(getLoc -> l) = return $ L l (PatBuilderVar v) + mkHsLitPV lit@(L l a) = do + checkUnboxedStringLitPat lit + return $ L l (PatBuilderPat (LitPat noExtField a)) + mkHsOverLitPV (L l a) = return $ L l (PatBuilderOverLit a) + mkHsWildCardPV l = return $ L l (PatBuilderPat (WildPat noExtField)) + mkHsTySigPV l b sig = do + p <- checkLPat b + return $ L l (PatBuilderPat (SigPat noExtField p (mkLHsSigWcType sig))) + mkHsExplicitListPV l xs = do + ps <- traverse checkLPat xs + return (L l (PatBuilderPat (ListPat noExtField ps))) + mkHsSplicePV (L l sp) = return $ L l (PatBuilderPat (SplicePat noExtField sp)) + mkHsRecordPV l _ a (fbinds, ddLoc) = do + r <- mkPatRec a (mk_rec_fields fbinds ddLoc) + checkRecordSyntax (L l r) + mkHsNegAppPV l (L lp p) = do + lit <- case p of + PatBuilderOverLit pos_lit -> return (L lp pos_lit) + _ -> patFail l (text "-" <> ppr p) + return $ L l (PatBuilderPat (mkNPat lit (Just noSyntaxExpr))) + mkHsSectionR_PV l op p = patFail l (pprInfixOcc (unLoc op) <> ppr p) + mkHsViewPatPV l a b = do + p <- checkLPat b + return $ L l (PatBuilderPat (ViewPat noExtField a p)) + mkHsAsPatPV l v e = do + p <- checkLPat e + return $ L l (PatBuilderPat (AsPat noExtField v p)) + mkHsLazyPatPV l e = do + p <- checkLPat e + return $ L l (PatBuilderPat (LazyPat noExtField p)) + mkHsBangPatPV l e = do + p <- checkLPat e + let pb = BangPat noExtField p + hintBangPat l pb + return $ L l (PatBuilderPat pb) + mkSumOrTuplePV = mkSumOrTuplePat + rejectPragmaPV _ = return () + +checkUnboxedStringLitPat :: Located (HsLit GhcPs) -> PV () +checkUnboxedStringLitPat (L loc lit) = + case lit of + HsStringPrim _ _ -- Trac #13260 + -> addFatalError loc (text "Illegal unboxed string literal in pattern:" $$ ppr lit) + _ -> return () + +mkPatRec :: + Located (PatBuilder GhcPs) -> + HsRecFields GhcPs (Located (PatBuilder GhcPs)) -> + PV (PatBuilder GhcPs) +mkPatRec (unLoc -> PatBuilderVar c) (HsRecFields fs dd) + | isRdrDataCon (unLoc c) + = do fs <- mapM checkPatField fs + return (PatBuilderPat (ConPatIn c (RecCon (HsRecFields fs dd)))) +mkPatRec p _ = + addFatalError (getLoc p) $ text "Not a record constructor:" <+> ppr p + +{- Note [Ambiguous syntactic categories] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +There are places in the grammar where we do not know whether we are parsing an +expression or a pattern without unlimited lookahead (which we do not have in +'happy'): + +View patterns: + + f (Con a b ) = ... -- 'Con a b' is a pattern + f (Con a b -> x) = ... -- 'Con a b' is an expression + +do-notation: + + do { Con a b <- x } -- 'Con a b' is a pattern + do { Con a b } -- 'Con a b' is an expression + +Guards: + + x | True <- p && q = ... -- 'True' is a pattern + x | True = ... -- 'True' is an expression + +Top-level value/function declarations (FunBind/PatBind): + + f ! a -- TH splice + f ! a = ... -- function declaration + + Until we encounter the = sign, we don't know if it's a top-level + TemplateHaskell splice where ! is used, or if it's a function declaration + where ! is bound. + +There are also places in the grammar where we do not know whether we are +parsing an expression or a command: + + proc x -> do { (stuff) -< x } -- 'stuff' is an expression + proc x -> do { (stuff) } -- 'stuff' is a command + + Until we encounter arrow syntax (-<) we don't know whether to parse 'stuff' + as an expression or a command. + +In fact, do-notation is subject to both ambiguities: + + proc x -> do { (stuff) -< x } -- 'stuff' is an expression + proc x -> do { (stuff) <- f -< x } -- 'stuff' is a pattern + proc x -> do { (stuff) } -- 'stuff' is a command + +There are many possible solutions to this problem. For an overview of the ones +we decided against, see Note [Resolving parsing ambiguities: non-taken alternatives] + +The solution that keeps basic definitions (such as HsExpr) clean, keeps the +concerns local to the parser, and does not require duplication of hsSyn types, +or an extra pass over the entire AST, is to parse into an overloaded +parser-validator (a so-called tagless final encoding): + + class DisambECP b where ... + instance DisambECP (HsCmd GhcPs) where ... + instance DisambECP (HsExp GhcPs) where ... + instance DisambECP (PatBuilder GhcPs) where ... + +The 'DisambECP' class contains functions to build and validate 'b'. For example, +to add parentheses we have: + + mkHsParPV :: DisambECP b => SrcSpan -> Located b -> PV (Located b) + +'mkHsParPV' will wrap the inner value in HsCmdPar for commands, HsPar for +expressions, and 'PatBuilderPar' for patterns (later transformed into ParPat, +see Note [PatBuilder]). + +Consider the 'alts' production used to parse case-of alternatives: + + alts :: { Located ([AddAnn],[LMatch GhcPs (LHsExpr GhcPs)]) } + : alts1 { sL1 $1 (fst $ unLoc $1,snd $ unLoc $1) } + | ';' alts { sLL $1 $> ((mj AnnSemi $1:(fst $ unLoc $2)),snd $ unLoc $2) } + +We abstract over LHsExpr GhcPs, and it becomes: + + alts :: { forall b. DisambECP b => PV (Located ([AddAnn],[LMatch GhcPs (Located b)])) } + : alts1 { $1 >>= \ $1 -> + return $ sL1 $1 (fst $ unLoc $1,snd $ unLoc $1) } + | ';' alts { $2 >>= \ $2 -> + return $ sLL $1 $> ((mj AnnSemi $1:(fst $ unLoc $2)),snd $ unLoc $2) } + +Compared to the initial definition, the added bits are: + + forall b. DisambECP b => PV ( ... ) -- in the type signature + $1 >>= \ $1 -> return $ -- in one reduction rule + $2 >>= \ $2 -> return $ -- in another reduction rule + +The overhead is constant relative to the size of the rest of the reduction +rule, so this approach scales well to large parser productions. + +Note that we write ($1 >>= \ $1 -> ...), so the second $1 is in a binding +position and shadows the previous $1. We can do this because internally +'happy' desugars $n to happy_var_n, and the rationale behind this idiom +is to be able to write (sLL $1 $>) later on. The alternative would be to +write this as ($1 >>= \ fresh_name -> ...), but then we couldn't refer +to the last fresh name as $>. +-} + + +{- Note [Resolving parsing ambiguities: non-taken alternatives] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Alternative I, extra constructors in GHC.Hs.Expr +------------------------------------------------ +We could add extra constructors to HsExpr to represent command-specific and +pattern-specific syntactic constructs. Under this scheme, we parse patterns +and commands as expressions and rejig later. This is what GHC used to do, and +it polluted 'HsExpr' with irrelevant constructors: + + * for commands: 'HsArrForm', 'HsArrApp' + * for patterns: 'EWildPat', 'EAsPat', 'EViewPat', 'ELazyPat' + +(As of now, we still do that for patterns, but we plan to fix it). + +There are several issues with this: + + * The implementation details of parsing are leaking into hsSyn definitions. + + * Code that uses HsExpr has to panic on these impossible-after-parsing cases. + + * HsExpr is arbitrarily selected as the extension basis. Why not extend + HsCmd or HsPat with extra constructors instead? + +Alternative II, extra constructors in GHC.Hs.Expr for GhcPs +----------------------------------------------------------- +We could address some of the problems with Alternative I by using Trees That +Grow and extending HsExpr only in the GhcPs pass. However, GhcPs corresponds to +the output of parsing, not to its intermediate results, so we wouldn't want +them there either. + +Alternative III, extra constructors in GHC.Hs.Expr for GhcPrePs +--------------------------------------------------------------- +We could introduce a new pass, GhcPrePs, to keep GhcPs pristine. +Unfortunately, creating a new pass would significantly bloat conversion code +and slow down the compiler by adding another linear-time pass over the entire +AST. For example, in order to build HsExpr GhcPrePs, we would need to build +HsLocalBinds GhcPrePs (as part of HsLet), and we never want HsLocalBinds +GhcPrePs. + + +Alternative IV, sum type and bottom-up data flow +------------------------------------------------ +Expressions and commands are disjoint. There are no user inputs that could be +interpreted as either an expression or a command depending on outer context: + + 5 -- definitely an expression + x -< y -- definitely a command + +Even though we have both 'HsLam' and 'HsCmdLam', we can look at +the body to disambiguate: + + \p -> 5 -- definitely an expression + \p -> x -< y -- definitely a command + +This means we could use a bottom-up flow of information to determine +whether we are parsing an expression or a command, using a sum type +for intermediate results: + + Either (LHsExpr GhcPs) (LHsCmd GhcPs) + +There are two problems with this: + + * We cannot handle the ambiguity between expressions and + patterns, which are not disjoint. + + * Bottom-up flow of information leads to poor error messages. Consider + + if ... then 5 else (x -< y) + + Do we report that '5' is not a valid command or that (x -< y) is not a + valid expression? It depends on whether we want the entire node to be + 'HsIf' or 'HsCmdIf', and this information flows top-down, from the + surrounding parsing context (are we in 'proc'?) + +Alternative V, backtracking with parser combinators +--------------------------------------------------- +One might think we could sidestep the issue entirely by using a backtracking +parser and doing something along the lines of (try pExpr <|> pPat). + +Turns out, this wouldn't work very well, as there can be patterns inside +expressions (e.g. via 'case', 'let', 'do') and expressions inside patterns +(e.g. view patterns). To handle this, we would need to backtrack while +backtracking, and unbound levels of backtracking lead to very fragile +performance. + +Alternative VI, an intermediate data type +----------------------------------------- +There are common syntactic elements of expressions, commands, and patterns +(e.g. all of them must have balanced parentheses), and we can capture this +common structure in an intermediate data type, Frame: + +data Frame + = FrameVar RdrName + -- ^ Identifier: Just, map, BS.length + | FrameTuple [LTupArgFrame] Boxity + -- ^ Tuple (section): (a,b) (a,b,c) (a,,) (,a,) + | FrameTySig LFrame (LHsSigWcType GhcPs) + -- ^ Type signature: x :: ty + | FramePar (SrcSpan, SrcSpan) LFrame + -- ^ Parentheses + | FrameIf LFrame LFrame LFrame + -- ^ If-expression: if p then x else y + | FrameCase LFrame [LFrameMatch] + -- ^ Case-expression: case x of { p1 -> e1; p2 -> e2 } + | FrameDo (HsStmtContext GhcRn) [LFrameStmt] + -- ^ Do-expression: do { s1; a <- s2; s3 } + ... + | FrameExpr (HsExpr GhcPs) -- unambiguously an expression + | FramePat (HsPat GhcPs) -- unambiguously a pattern + | FrameCommand (HsCmd GhcPs) -- unambiguously a command + +To determine which constructors 'Frame' needs to have, we take the union of +intersections between HsExpr, HsCmd, and HsPat. + +The intersection between HsPat and HsExpr: + + HsPat = VarPat | TuplePat | SigPat | ParPat | ... + HsExpr = HsVar | ExplicitTuple | ExprWithTySig | HsPar | ... + ------------------------------------------------------------------- + Frame = FrameVar | FrameTuple | FrameTySig | FramePar | ... + +The intersection between HsCmd and HsExpr: + + HsCmd = HsCmdIf | HsCmdCase | HsCmdDo | HsCmdPar + HsExpr = HsIf | HsCase | HsDo | HsPar + ------------------------------------------------ + Frame = FrameIf | FrameCase | FrameDo | FramePar + +The intersection between HsCmd and HsPat: + + HsPat = ParPat | ... + HsCmd = HsCmdPar | ... + ----------------------- + Frame = FramePar | ... + +Take the union of each intersection and this yields the final 'Frame' data +type. The problem with this approach is that we end up duplicating a good +portion of hsSyn: + + Frame for HsExpr, HsPat, HsCmd + TupArgFrame for HsTupArg + FrameMatch for Match + FrameStmt for StmtLR + FrameGRHS for GRHS + FrameGRHSs for GRHSs + ... + +Alternative VII, a product type +------------------------------- +We could avoid the intermediate representation of Alternative VI by parsing +into a product of interpretations directly: + + -- See Note [Parser-Validator] + type ExpCmdPat = ( PV (LHsExpr GhcPs) + , PV (LHsCmd GhcPs) + , PV (LHsPat GhcPs) ) + +This means that in positions where we do not know whether to produce +expression, a pattern, or a command, we instead produce a parser-validator for +each possible option. + +Then, as soon as we have parsed far enough to resolve the ambiguity, we pick +the appropriate component of the product, discarding the rest: + + checkExpOf3 (e, _, _) = e -- interpret as an expression + checkCmdOf3 (_, c, _) = c -- interpret as a command + checkPatOf3 (_, _, p) = p -- interpret as a pattern + +We can easily define ambiguities between arbitrary subsets of interpretations. +For example, when we know ahead of type that only an expression or a command is +possible, but not a pattern, we can use a smaller type: + + -- See Note [Parser-Validator] + type ExpCmd = (PV (LHsExpr GhcPs), PV (LHsCmd GhcPs)) + + checkExpOf2 (e, _) = e -- interpret as an expression + checkCmdOf2 (_, c) = c -- interpret as a command + +However, there is a slight problem with this approach, namely code duplication +in parser productions. Consider the 'alts' production used to parse case-of +alternatives: + + alts :: { Located ([AddAnn],[LMatch GhcPs (LHsExpr GhcPs)]) } + : alts1 { sL1 $1 (fst $ unLoc $1,snd $ unLoc $1) } + | ';' alts { sLL $1 $> ((mj AnnSemi $1:(fst $ unLoc $2)),snd $ unLoc $2) } + +Under the new scheme, we have to completely duplicate its type signature and +each reduction rule: + + alts :: { ( PV (Located ([AddAnn],[LMatch GhcPs (LHsExpr GhcPs)])) -- as an expression + , PV (Located ([AddAnn],[LMatch GhcPs (LHsCmd GhcPs)])) -- as a command + ) } + : alts1 + { ( checkExpOf2 $1 >>= \ $1 -> + return $ sL1 $1 (fst $ unLoc $1,snd $ unLoc $1) + , checkCmdOf2 $1 >>= \ $1 -> + return $ sL1 $1 (fst $ unLoc $1,snd $ unLoc $1) + ) } + | ';' alts + { ( checkExpOf2 $2 >>= \ $2 -> + return $ sLL $1 $> ((mj AnnSemi $1:(fst $ unLoc $2)),snd $ unLoc $2) + , checkCmdOf2 $2 >>= \ $2 -> + return $ sLL $1 $> ((mj AnnSemi $1:(fst $ unLoc $2)),snd $ unLoc $2) + ) } + +And the same goes for other productions: 'altslist', 'alts1', 'alt', 'alt_rhs', +'ralt', 'gdpats', 'gdpat', 'exp', ... and so on. That is a lot of code! + +Alternative VIII, a function from a GADT +---------------------------------------- +We could avoid code duplication of the Alternative VII by representing the product +as a function from a GADT: + + data ExpCmdG b where + ExpG :: ExpCmdG HsExpr + CmdG :: ExpCmdG HsCmd + + type ExpCmd = forall b. ExpCmdG b -> PV (Located (b GhcPs)) + + checkExp :: ExpCmd -> PV (LHsExpr GhcPs) + checkCmd :: ExpCmd -> PV (LHsCmd GhcPs) + checkExp f = f ExpG -- interpret as an expression + checkCmd f = f CmdG -- interpret as a command + +Consider the 'alts' production used to parse case-of alternatives: + + alts :: { Located ([AddAnn],[LMatch GhcPs (LHsExpr GhcPs)]) } + : alts1 { sL1 $1 (fst $ unLoc $1,snd $ unLoc $1) } + | ';' alts { sLL $1 $> ((mj AnnSemi $1:(fst $ unLoc $2)),snd $ unLoc $2) } + +We abstract over LHsExpr, and it becomes: + + alts :: { forall b. ExpCmdG b -> PV (Located ([AddAnn],[LMatch GhcPs (Located (b GhcPs))])) } + : alts1 + { \tag -> $1 tag >>= \ $1 -> + return $ sL1 $1 (fst $ unLoc $1,snd $ unLoc $1) } + | ';' alts + { \tag -> $2 tag >>= \ $2 -> + return $ sLL $1 $> ((mj AnnSemi $1:(fst $ unLoc $2)),snd $ unLoc $2) } + +Note that 'ExpCmdG' is a singleton type, the value is completely +determined by the type: + + when (b~HsExpr), tag = ExpG + when (b~HsCmd), tag = CmdG + +This is a clear indication that we can use a class to pass this value behind +the scenes: + + class ExpCmdI b where expCmdG :: ExpCmdG b + instance ExpCmdI HsExpr where expCmdG = ExpG + instance ExpCmdI HsCmd where expCmdG = CmdG + +And now the 'alts' production is simplified, as we no longer need to +thread 'tag' explicitly: + + alts :: { forall b. ExpCmdI b => PV (Located ([AddAnn],[LMatch GhcPs (Located (b GhcPs))])) } + : alts1 { $1 >>= \ $1 -> + return $ sL1 $1 (fst $ unLoc $1,snd $ unLoc $1) } + | ';' alts { $2 >>= \ $2 -> + return $ sLL $1 $> ((mj AnnSemi $1:(fst $ unLoc $2)),snd $ unLoc $2) } + +This encoding works well enough, but introduces an extra GADT unlike the +tagless final encoding, and there's no need for this complexity. + +-} + +{- Note [PatBuilder] +~~~~~~~~~~~~~~~~~~~~ +Unlike HsExpr or HsCmd, the Pat type cannot accommodate all intermediate forms, +so we introduce the notion of a PatBuilder. + +Consider a pattern like this: + + Con a b c + +We parse arguments to "Con" one at a time in the fexp aexp parser production, +building the result with mkHsAppPV, so the intermediate forms are: + + 1. Con + 2. Con a + 3. Con a b + 4. Con a b c + +In 'HsExpr', we have 'HsApp', so the intermediate forms are represented like +this (pseudocode): + + 1. "Con" + 2. HsApp "Con" "a" + 3. HsApp (HsApp "Con" "a") "b" + 3. HsApp (HsApp (HsApp "Con" "a") "b") "c" + +Similarly, in 'HsCmd' we have 'HsCmdApp'. In 'Pat', however, what we have +instead is 'ConPatIn', which is very awkward to modify and thus unsuitable for +the intermediate forms. + +We also need an intermediate representation to postpone disambiguation between +FunBind and PatBind. Consider: + + a `Con` b = ... + a `fun` b = ... + +How do we know that (a `Con` b) is a PatBind but (a `fun` b) is a FunBind? We +learn this by inspecting an intermediate representation in 'isFunLhs' and +seeing that 'Con' is a data constructor but 'f' is not. We need an intermediate +representation capable of representing both a FunBind and a PatBind, so Pat is +insufficient. + +PatBuilder is an extension of Pat that is capable of representing intermediate +parsing results for patterns and function bindings: + + data PatBuilder p + = PatBuilderPat (Pat p) + | PatBuilderApp (Located (PatBuilder p)) (Located (PatBuilder p)) + | PatBuilderOpApp (Located (PatBuilder p)) (Located RdrName) (Located (PatBuilder p)) + ... + +It can represent any pattern via 'PatBuilderPat', but it also has a variety of +other constructors which were added by following a simple principle: we never +pattern match on the pattern stored inside 'PatBuilderPat'. +-} + +--------------------------------------------------------------------------- +-- Miscellaneous utilities + +-- | Check if a fixity is valid. We support bypassing the usual bound checks +-- for some special operators. +checkPrecP + :: Located (SourceText,Int) -- ^ precedence + -> Located (OrdList (Located RdrName)) -- ^ operators + -> P () +checkPrecP (L l (_,i)) (L _ ol) + | 0 <= i, i <= maxPrecedence = pure () + | all specialOp ol = pure () + | otherwise = addFatalError l (text ("Precedence out of range: " ++ show i)) + where + specialOp op = unLoc op `elem` [ eqTyCon_RDR + , getRdrName funTyCon ] + +mkRecConstrOrUpdate + :: LHsExpr GhcPs + -> SrcSpan + -> ([LHsRecField GhcPs (LHsExpr GhcPs)], Maybe SrcSpan) + -> PV (HsExpr GhcPs) + +mkRecConstrOrUpdate (L l (HsVar _ (L _ c))) _ (fs,dd) + | isRdrDataCon c + = return (mkRdrRecordCon (L l c) (mk_rec_fields fs dd)) +mkRecConstrOrUpdate exp _ (fs,dd) + | Just dd_loc <- dd = addFatalError dd_loc (text "You cannot use `..' in a record update") + | otherwise = return (mkRdrRecordUpd exp (map (fmap mk_rec_upd_field) fs)) + +mkRdrRecordUpd :: LHsExpr GhcPs -> [LHsRecUpdField GhcPs] -> HsExpr GhcPs +mkRdrRecordUpd exp flds + = RecordUpd { rupd_ext = noExtField + , rupd_expr = exp + , rupd_flds = flds } + +mkRdrRecordCon :: Located RdrName -> HsRecordBinds GhcPs -> HsExpr GhcPs +mkRdrRecordCon con flds + = RecordCon { rcon_ext = noExtField, rcon_con_name = con, rcon_flds = flds } + +mk_rec_fields :: [LHsRecField id arg] -> Maybe SrcSpan -> HsRecFields id arg +mk_rec_fields fs Nothing = HsRecFields { rec_flds = fs, rec_dotdot = Nothing } +mk_rec_fields fs (Just s) = HsRecFields { rec_flds = fs + , rec_dotdot = Just (L s (length fs)) } + +mk_rec_upd_field :: HsRecField GhcPs (LHsExpr GhcPs) -> HsRecUpdField GhcPs +mk_rec_upd_field (HsRecField (L loc (FieldOcc _ rdr)) arg pun) + = HsRecField (L loc (Unambiguous noExtField rdr)) arg pun + +mkInlinePragma :: SourceText -> (InlineSpec, RuleMatchInfo) -> Maybe Activation + -> InlinePragma +-- The (Maybe Activation) is because the user can omit +-- the activation spec (and usually does) +mkInlinePragma src (inl, match_info) mb_act + = InlinePragma { inl_src = src -- Note [Pragma source text] in GHC.Types.Basic + , inl_inline = inl + , inl_sat = Nothing + , inl_act = act + , inl_rule = match_info } + where + act = case mb_act of + Just act -> act + Nothing -> -- No phase specified + case inl of + NoInline -> NeverActive + _other -> AlwaysActive + +----------------------------------------------------------------------------- +-- utilities for foreign declarations + +-- construct a foreign import declaration +-- +mkImport :: Located CCallConv + -> Located Safety + -> (Located StringLiteral, Located RdrName, LHsSigType GhcPs) + -> P (HsDecl GhcPs) +mkImport cconv safety (L loc (StringLiteral esrc entity), v, ty) = + case unLoc cconv of + CCallConv -> mkCImport + CApiConv -> mkCImport + StdCallConv -> mkCImport + PrimCallConv -> mkOtherImport + JavaScriptCallConv -> mkOtherImport + where + -- Parse a C-like entity string of the following form: + -- "[static] [chname] [&] [cid]" | "dynamic" | "wrapper" + -- If 'cid' is missing, the function name 'v' is used instead as symbol + -- name (cf section 8.5.1 in Haskell 2010 report). + mkCImport = do + let e = unpackFS entity + case parseCImport cconv safety (mkExtName (unLoc v)) e (L loc esrc) of + Nothing -> addFatalError loc (text "Malformed entity string") + Just importSpec -> returnSpec importSpec + + -- currently, all the other import conventions only support a symbol name in + -- the entity string. If it is missing, we use the function name instead. + mkOtherImport = returnSpec importSpec + where + entity' = if nullFS entity + then mkExtName (unLoc v) + else entity + funcTarget = CFunction (StaticTarget esrc entity' Nothing True) + importSpec = CImport cconv safety Nothing funcTarget (L loc esrc) + + returnSpec spec = return $ ForD noExtField $ ForeignImport + { fd_i_ext = noExtField + , fd_name = v + , fd_sig_ty = ty + , fd_fi = spec + } + + + +-- the string "foo" is ambiguous: either a header or a C identifier. The +-- C identifier case comes first in the alternatives below, so we pick +-- that one. +parseCImport :: Located CCallConv -> Located Safety -> FastString -> String + -> Located SourceText + -> Maybe ForeignImport +parseCImport cconv safety nm str sourceText = + listToMaybe $ map fst $ filter (null.snd) $ + readP_to_S parse str + where + parse = do + skipSpaces + r <- choice [ + string "dynamic" >> return (mk Nothing (CFunction DynamicTarget)), + string "wrapper" >> return (mk Nothing CWrapper), + do optional (token "static" >> skipSpaces) + ((mk Nothing <$> cimp nm) +++ + (do h <- munch1 hdr_char + skipSpaces + mk (Just (Header (SourceText h) (mkFastString h))) + <$> cimp nm)) + ] + skipSpaces + return r + + token str = do _ <- string str + toks <- look + case toks of + c : _ + | id_char c -> pfail + _ -> return () + + mk h n = CImport cconv safety h n sourceText + + hdr_char c = not (isSpace c) + -- header files are filenames, which can contain + -- pretty much any char (depending on the platform), + -- so just accept any non-space character + id_first_char c = isAlpha c || c == '_' + id_char c = isAlphaNum c || c == '_' + + cimp nm = (ReadP.char '&' >> skipSpaces >> CLabel <$> cid) + +++ (do isFun <- case unLoc cconv of + CApiConv -> + option True + (do token "value" + skipSpaces + return False) + _ -> return True + cid' <- cid + return (CFunction (StaticTarget NoSourceText cid' + Nothing isFun))) + where + cid = return nm +++ + (do c <- satisfy id_first_char + cs <- many (satisfy id_char) + return (mkFastString (c:cs))) + + +-- construct a foreign export declaration +-- +mkExport :: Located CCallConv + -> (Located StringLiteral, Located RdrName, LHsSigType GhcPs) + -> P (HsDecl GhcPs) +mkExport (L lc cconv) (L le (StringLiteral esrc entity), v, ty) + = return $ ForD noExtField $ + ForeignExport { fd_e_ext = noExtField, fd_name = v, fd_sig_ty = ty + , fd_fe = CExport (L lc (CExportStatic esrc entity' cconv)) + (L le esrc) } + where + entity' | nullFS entity = mkExtName (unLoc v) + | otherwise = entity + +-- Supplying the ext_name in a foreign decl is optional; if it +-- isn't there, the Haskell name is assumed. Note that no transformation +-- of the Haskell name is then performed, so if you foreign export (++), +-- it's external name will be "++". Too bad; it's important because we don't +-- want z-encoding (e.g. names with z's in them shouldn't be doubled) +-- +mkExtName :: RdrName -> CLabelString +mkExtName rdrNm = mkFastString (occNameString (rdrNameOcc rdrNm)) + +-------------------------------------------------------------------------------- +-- Help with module system imports/exports + +data ImpExpSubSpec = ImpExpAbs + | ImpExpAll + | ImpExpList [Located ImpExpQcSpec] + | ImpExpAllWith [Located ImpExpQcSpec] + +data ImpExpQcSpec = ImpExpQcName (Located RdrName) + | ImpExpQcType (Located RdrName) + | ImpExpQcWildcard + +mkModuleImpExp :: Located ImpExpQcSpec -> ImpExpSubSpec -> P (IE GhcPs) +mkModuleImpExp (L l specname) subs = + case subs of + ImpExpAbs + | isVarNameSpace (rdrNameSpace name) + -> return $ IEVar noExtField (L l (ieNameFromSpec specname)) + | otherwise -> IEThingAbs noExtField . L l <$> nameT + ImpExpAll -> IEThingAll noExtField . L l <$> nameT + ImpExpList xs -> + (\newName -> IEThingWith noExtField (L l newName) + NoIEWildcard (wrapped xs) []) <$> nameT + ImpExpAllWith xs -> + do allowed <- getBit PatternSynonymsBit + if allowed + then + let withs = map unLoc xs + pos = maybe NoIEWildcard IEWildcard + (findIndex isImpExpQcWildcard withs) + ies = wrapped $ filter (not . isImpExpQcWildcard . unLoc) xs + in (\newName + -> IEThingWith noExtField (L l newName) pos ies []) + <$> nameT + else addFatalError l + (text "Illegal export form (use PatternSynonyms to enable)") + where + name = ieNameVal specname + nameT = + if isVarNameSpace (rdrNameSpace name) + then addFatalError l + (text "Expecting a type constructor but found a variable," + <+> quotes (ppr name) <> text "." + $$ if isSymOcc $ rdrNameOcc name + then text "If" <+> quotes (ppr name) + <+> text "is a type constructor" + <+> text "then enable ExplicitNamespaces and use the 'type' keyword." + else empty) + else return $ ieNameFromSpec specname + + ieNameVal (ImpExpQcName ln) = unLoc ln + ieNameVal (ImpExpQcType ln) = unLoc ln + ieNameVal (ImpExpQcWildcard) = panic "ieNameVal got wildcard" + + ieNameFromSpec (ImpExpQcName ln) = IEName ln + ieNameFromSpec (ImpExpQcType ln) = IEType ln + ieNameFromSpec (ImpExpQcWildcard) = panic "ieName got wildcard" + + wrapped = map (mapLoc ieNameFromSpec) + +mkTypeImpExp :: Located RdrName -- TcCls or Var name space + -> P (Located RdrName) +mkTypeImpExp name = + do allowed <- getBit ExplicitNamespacesBit + unless allowed $ addError (getLoc name) $ + text "Illegal keyword 'type' (use ExplicitNamespaces to enable)" + return (fmap (`setRdrNameSpace` tcClsName) name) + +checkImportSpec :: Located [LIE GhcPs] -> P (Located [LIE GhcPs]) +checkImportSpec ie@(L _ specs) = + case [l | (L l (IEThingWith _ _ (IEWildcard _) _ _)) <- specs] of + [] -> return ie + (l:_) -> importSpecError l + where + importSpecError l = + addFatalError l + (text "Illegal import form, this syntax can only be used to bundle" + $+$ text "pattern synonyms with types in module exports.") + +-- In the correct order +mkImpExpSubSpec :: [Located ImpExpQcSpec] -> P ([AddAnn], ImpExpSubSpec) +mkImpExpSubSpec [] = return ([], ImpExpList []) +mkImpExpSubSpec [L _ ImpExpQcWildcard] = + return ([], ImpExpAll) +mkImpExpSubSpec xs = + if (any (isImpExpQcWildcard . unLoc) xs) + then return $ ([], ImpExpAllWith xs) + else return $ ([], ImpExpList xs) + +isImpExpQcWildcard :: ImpExpQcSpec -> Bool +isImpExpQcWildcard ImpExpQcWildcard = True +isImpExpQcWildcard _ = False + +----------------------------------------------------------------------------- +-- Warnings and failures + +warnPrepositiveQualifiedModule :: SrcSpan -> P () +warnPrepositiveQualifiedModule span = + addWarning Opt_WarnPrepositiveQualifiedModule span msg + where + msg = text "Found" <+> quotes (text "qualified") + <+> text "in prepositive position" + $$ text "Suggested fix: place " <+> quotes (text "qualified") + <+> text "after the module name instead." + +failOpNotEnabledImportQualifiedPost :: SrcSpan -> P () +failOpNotEnabledImportQualifiedPost loc = addError loc msg + where + msg = text "Found" <+> quotes (text "qualified") + <+> text "in postpositive position. " + $$ text "To allow this, enable language extension 'ImportQualifiedPost'" + +failOpImportQualifiedTwice :: SrcSpan -> P () +failOpImportQualifiedTwice loc = addError loc msg + where + msg = text "Multiple occurrences of 'qualified'" + +warnStarIsType :: SrcSpan -> P () +warnStarIsType span = addWarning Opt_WarnStarIsType span msg + where + msg = text "Using" <+> quotes (text "*") + <+> text "(or its Unicode variant) to mean" + <+> quotes (text "Data.Kind.Type") + $$ text "relies on the StarIsType extension, which will become" + $$ text "deprecated in the future." + $$ text "Suggested fix: use" <+> quotes (text "Type") + <+> text "from" <+> quotes (text "Data.Kind") <+> text "instead." + +warnStarBndr :: SrcSpan -> P () +warnStarBndr span = addWarning Opt_WarnStarBinder span msg + where + msg = text "Found binding occurrence of" <+> quotes (text "*") + <+> text "yet StarIsType is enabled." + $$ text "NB. To use (or export) this operator in" + <+> text "modules with StarIsType," + $$ text " including the definition module, you must qualify it." + +failOpFewArgs :: Located RdrName -> P a +failOpFewArgs (L loc op) = + do { star_is_type <- getBit StarIsTypeBit + ; let msg = too_few $$ starInfo star_is_type op + ; addFatalError loc msg } + where + too_few = text "Operator applied to too few arguments:" <+> ppr op + +failOpDocPrev :: SrcSpan -> P a +failOpDocPrev loc = addFatalError loc msg + where + msg = text "Unexpected documentation comment." + +----------------------------------------------------------------------------- +-- Misc utils + +data PV_Context = + PV_Context + { pv_options :: ParserFlags + , pv_hint :: SDoc -- See Note [Parser-Validator Hint] + } + +data PV_Accum = + PV_Accum + { pv_messages :: DynFlags -> Messages + , pv_annotations :: [(ApiAnnKey,[RealSrcSpan])] + , pv_comment_q :: [RealLocated AnnotationComment] + , pv_annotations_comments :: [(RealSrcSpan,[RealLocated AnnotationComment])] + } + +data PV_Result a = PV_Ok PV_Accum a | PV_Failed PV_Accum + +-- See Note [Parser-Validator] +newtype PV a = PV { unPV :: PV_Context -> PV_Accum -> PV_Result a } + +instance Functor PV where + fmap = liftM + +instance Applicative PV where + pure a = a `seq` PV (\_ acc -> PV_Ok acc a) + (<*>) = ap + +instance Monad PV where + m >>= f = PV $ \ctx acc -> + case unPV m ctx acc of + PV_Ok acc' a -> unPV (f a) ctx acc' + PV_Failed acc' -> PV_Failed acc' + +runPV :: PV a -> P a +runPV = runPV_msg empty + +runPV_msg :: SDoc -> PV a -> P a +runPV_msg msg m = + P $ \s -> + let + pv_ctx = PV_Context + { pv_options = options s + , pv_hint = msg } + pv_acc = PV_Accum + { pv_messages = messages s + , pv_annotations = annotations s + , pv_comment_q = comment_q s + , pv_annotations_comments = annotations_comments s } + mkPState acc' = + s { messages = pv_messages acc' + , annotations = pv_annotations acc' + , comment_q = pv_comment_q acc' + , annotations_comments = pv_annotations_comments acc' } + in + case unPV m pv_ctx pv_acc of + PV_Ok acc' a -> POk (mkPState acc') a + PV_Failed acc' -> PFailed (mkPState acc') + +localPV_msg :: (SDoc -> SDoc) -> PV a -> PV a +localPV_msg f m = + let modifyHint ctx = ctx{pv_hint = f (pv_hint ctx)} in + PV (\ctx acc -> unPV m (modifyHint ctx) acc) + +instance MonadP PV where + addError srcspan msg = + PV $ \ctx acc@PV_Accum{pv_messages=m} -> + let msg' = msg $$ pv_hint ctx in + PV_Ok acc{pv_messages=appendError srcspan msg' m} () + addWarning option srcspan warning = + PV $ \PV_Context{pv_options=o} acc@PV_Accum{pv_messages=m} -> + PV_Ok acc{pv_messages=appendWarning o option srcspan warning m} () + addFatalError srcspan msg = + addError srcspan msg >> PV (const PV_Failed) + getBit ext = + PV $ \ctx acc -> + let b = ext `xtest` pExtsBitmap (pv_options ctx) in + PV_Ok acc $! b + addAnnotation (RealSrcSpan l _) a (RealSrcSpan v _) = + PV $ \_ acc -> + let + (comment_q', new_ann_comments) = allocateComments l (pv_comment_q acc) + annotations_comments' = new_ann_comments ++ pv_annotations_comments acc + annotations' = ((l,a), [v]) : pv_annotations acc + acc' = acc + { pv_annotations = annotations' + , pv_comment_q = comment_q' + , pv_annotations_comments = annotations_comments' } + in + PV_Ok acc' () + addAnnotation _ _ _ = return () + +{- Note [Parser-Validator] +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +When resolving ambiguities, we need to postpone failure to make a choice later. +For example, if we have ambiguity between some A and B, our parser could be + + abParser :: P (Maybe A, Maybe B) + +This way we can represent four possible outcomes of parsing: + + (Just a, Nothing) -- definitely A + (Nothing, Just b) -- definitely B + (Just a, Just b) -- either A or B + (Nothing, Nothing) -- neither A nor B + +However, if we want to report informative parse errors, accumulate warnings, +and add API annotations, we are better off using 'P' instead of 'Maybe': + + abParser :: P (P A, P B) + +So we have an outer layer of P that consumes the input and builds the inner +layer, which validates the input. + +For clarity, we introduce the notion of a parser-validator: a parser that does +not consume any input, but may fail or use other effects. Thus we have: + + abParser :: P (PV A, PV B) + +-} + +{- Note [Parser-Validator Hint] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +A PV computation is parametrized by a hint for error messages, which can be set +depending on validation context. We use this in checkPattern to fix #984. + +Consider this example, where the user has forgotten a 'do': + + f _ = do + x <- computation + case () of + _ -> + result <- computation + case () of () -> undefined + +GHC parses it as follows: + + f _ = do + x <- computation + (case () of + _ -> + result) <- computation + case () of () -> undefined + +Note that this fragment is parsed as a pattern: + + case () of + _ -> + result + +We attempt to detect such cases and add a hint to the error messages: + + T984.hs:6:9: + Parse error in pattern: case () of { _ -> result } + Possibly caused by a missing 'do'? + +The "Possibly caused by a missing 'do'?" suggestion is the hint that is passed +as the 'pv_hint' field 'PV_Context'. When validating in a context other than +'bindpat' (a pattern to the left of <-), we set the hint to 'empty' and it has +no effect on the error messages. + +-} + +-- | Hint about bang patterns, assuming @BangPatterns@ is off. +hintBangPat :: SrcSpan -> Pat GhcPs -> PV () +hintBangPat span e = do + bang_on <- getBit BangPatBit + unless bang_on $ + addError span + (text "Illegal bang-pattern (use BangPatterns):" $$ ppr e) + +data SumOrTuple b + = Sum ConTag Arity (Located b) + | Tuple [Located (Maybe (Located b))] + +pprSumOrTuple :: Outputable b => Boxity -> SumOrTuple b -> SDoc +pprSumOrTuple boxity = \case + Sum alt arity e -> + parOpen <+> ppr_bars (alt - 1) <+> ppr e <+> ppr_bars (arity - alt) + <+> parClose + Tuple xs -> + parOpen <> (fcat . punctuate comma $ map (maybe empty ppr . unLoc) xs) + <> parClose + where + ppr_bars n = hsep (replicate n (Outputable.char '|')) + (parOpen, parClose) = + case boxity of + Boxed -> (text "(", text ")") + Unboxed -> (text "(#", text "#)") + +mkSumOrTupleExpr :: SrcSpan -> Boxity -> SumOrTuple (HsExpr GhcPs) -> PV (LHsExpr GhcPs) + +-- Tuple +mkSumOrTupleExpr l boxity (Tuple es) = + return $ L l (ExplicitTuple noExtField (map toTupArg es) boxity) + where + toTupArg :: Located (Maybe (LHsExpr GhcPs)) -> LHsTupArg GhcPs + toTupArg = mapLoc (maybe missingTupArg (Present noExtField)) + +-- Sum +mkSumOrTupleExpr l Unboxed (Sum alt arity e) = + return $ L l (ExplicitSum noExtField alt arity e) +mkSumOrTupleExpr l Boxed a@Sum{} = + addFatalError l (hang (text "Boxed sums not supported:") 2 + (pprSumOrTuple Boxed a)) + +mkSumOrTuplePat :: SrcSpan -> Boxity -> SumOrTuple (PatBuilder GhcPs) -> PV (Located (PatBuilder GhcPs)) + +-- Tuple +mkSumOrTuplePat l boxity (Tuple ps) = do + ps' <- traverse toTupPat ps + return $ L l (PatBuilderPat (TuplePat noExtField ps' boxity)) + where + toTupPat :: Located (Maybe (Located (PatBuilder GhcPs))) -> PV (LPat GhcPs) + toTupPat (L l p) = case p of + Nothing -> addFatalError l (text "Tuple section in pattern context") + Just p' -> checkLPat p' + +-- Sum +mkSumOrTuplePat l Unboxed (Sum alt arity p) = do + p' <- checkLPat p + return $ L l (PatBuilderPat (SumPat noExtField p' alt arity)) +mkSumOrTuplePat l Boxed a@Sum{} = + addFatalError l (hang (text "Boxed sums not supported:") 2 + (pprSumOrTuple Boxed a)) + +mkLHsOpTy :: LHsType GhcPs -> Located RdrName -> LHsType GhcPs -> LHsType GhcPs +mkLHsOpTy x op y = + let loc = getLoc x `combineSrcSpans` getLoc op `combineSrcSpans` getLoc y + in L loc (mkHsOpTy x op y) + +mkLHsDocTy :: LHsType GhcPs -> LHsDocString -> LHsType GhcPs +mkLHsDocTy t doc = + let loc = getLoc t `combineSrcSpans` getLoc doc + in L loc (HsDocTy noExtField t doc) + +mkLHsDocTyMaybe :: LHsType GhcPs -> Maybe LHsDocString -> LHsType GhcPs +mkLHsDocTyMaybe t = maybe t (mkLHsDocTy t) + +----------------------------------------------------------------------------- +-- Token symbols + +starSym :: Bool -> String +starSym True = "★" +starSym False = "*" + +forallSym :: Bool -> String +forallSym True = "∀" +forallSym False = "forall" diff --git a/compiler/GHC/Parser/PostProcess/Haddock.hs b/compiler/GHC/Parser/PostProcess/Haddock.hs new file mode 100644 index 0000000000..a3d5e101d7 --- /dev/null +++ b/compiler/GHC/Parser/PostProcess/Haddock.hs @@ -0,0 +1,35 @@ +{-# OPTIONS_GHC -Wno-incomplete-record-updates #-} + +module GHC.Parser.PostProcess.Haddock where + +import GhcPrelude + +import GHC.Hs +import GHC.Types.SrcLoc + +import Control.Monad + +-- ----------------------------------------------------------------------------- +-- Adding documentation to record fields (used in parsing). + +addFieldDoc :: LConDeclField a -> Maybe LHsDocString -> LConDeclField a +addFieldDoc (L l fld) doc + = L l (fld { cd_fld_doc = cd_fld_doc fld `mplus` doc }) + +addFieldDocs :: [LConDeclField a] -> Maybe LHsDocString -> [LConDeclField a] +addFieldDocs [] _ = [] +addFieldDocs (x:xs) doc = addFieldDoc x doc : xs + + +addConDoc :: LConDecl a -> Maybe LHsDocString -> LConDecl a +addConDoc decl Nothing = decl +addConDoc (L p c) doc = L p ( c { con_doc = con_doc c `mplus` doc } ) + +addConDocs :: [LConDecl a] -> Maybe LHsDocString -> [LConDecl a] +addConDocs [] _ = [] +addConDocs [x] doc = [addConDoc x doc] +addConDocs (x:xs) doc = x : addConDocs xs doc + +addConDocFirst :: [LConDecl a] -> Maybe LHsDocString -> [LConDecl a] +addConDocFirst [] _ = [] +addConDocFirst (x:xs) doc = addConDoc x doc : xs |