diff options
author | Sylvain Henry <sylvain@haskus.fr> | 2020-01-15 19:52:49 +0100 |
---|---|---|
committer | Marge Bot <ben+marge-bot@smart-cactus.org> | 2020-01-25 05:24:19 -0500 |
commit | 0a5e4f5f7d6cef16b6b11ac8d3a269b92016ed5d (patch) | |
tree | 9456f70f36a1d2fcc9ff6bf6bc6eab1c2beda862 /docs/users_guide/exts/primitives.rst | |
parent | b1a32170a1b8e7a31c2dd28eb1b02f1375ea3998 (diff) | |
download | haskell-0a5e4f5f7d6cef16b6b11ac8d3a269b92016ed5d.tar.gz |
Split glasgow_exts into several files (#17316)
Diffstat (limited to 'docs/users_guide/exts/primitives.rst')
-rw-r--r-- | docs/users_guide/exts/primitives.rst | 371 |
1 files changed, 371 insertions, 0 deletions
diff --git a/docs/users_guide/exts/primitives.rst b/docs/users_guide/exts/primitives.rst new file mode 100644 index 0000000000..1995646713 --- /dev/null +++ b/docs/users_guide/exts/primitives.rst @@ -0,0 +1,371 @@ +.. _primitives: + +Unboxed types and primitive operations +====================================== + +GHC is built on a raft of primitive data types and operations; +"primitive" in the sense that they cannot be defined in Haskell itself. +While you really can use this stuff to write fast code, we generally +find it a lot less painful, and more satisfying in the long run, to use +higher-level language features and libraries. With any luck, the code +you write will be optimised to the efficient unboxed version in any +case. And if it isn't, we'd like to know about it. + +All these primitive data types and operations are exported by the +library ``GHC.Prim``, for which there is +:ghc-prim-ref:`detailed online documentation <GHC.Prim.>`. (This +documentation is generated from the file ``compiler/prelude/primops.txt.pp``.) + +If you want to mention any of the primitive data types or operations in +your program, you must first import ``GHC.Prim`` to bring them into +scope. Many of them have names ending in ``#``, and to mention such names +you need the :extension:`MagicHash` extension. + +The primops make extensive use of `unboxed types <#glasgow-unboxed>`__ +and `unboxed tuples <#unboxed-tuples>`__, which we briefly summarise +here. + +.. _glasgow-unboxed: + +Unboxed types +------------- + +Most types in GHC are boxed, which means that values of that type are +represented by a pointer to a heap object. The representation of a +Haskell ``Int``, for example, is a two-word heap object. An unboxed +type, however, is represented by the value itself, no pointers or heap +allocation are involved. + +Unboxed types correspond to the “raw machine” types you would use in C: +``Int#`` (long int), ``Double#`` (double), ``Addr#`` (void \*), etc. The +*primitive operations* (PrimOps) on these types are what you might +expect; e.g., ``(+#)`` is addition on ``Int#``\ s, and is the +machine-addition that we all know and love—usually one instruction. + +Primitive (unboxed) types cannot be defined in Haskell, and are +therefore built into the language and compiler. Primitive types are +always unlifted; that is, a value of a primitive type cannot be bottom. +(Note: a "boxed" type means that a value is represented by a pointer to a heap +object; a "lifted" type means that terms of that type may be bottom. See +the next paragraph for an example.) +We use the convention (but it is only a convention) that primitive +types, values, and operations have a ``#`` suffix (see +:ref:`magic-hash`). For some primitive types we have special syntax for +literals, also described in the `same section <#magic-hash>`__. + +Primitive values are often represented by a simple bit-pattern, such as +``Int#``, ``Float#``, ``Double#``. But this is not necessarily the case: +a primitive value might be represented by a pointer to a heap-allocated +object. Examples include ``Array#``, the type of primitive arrays. Thus, +``Array#`` is an unlifted, boxed type. A +primitive array is heap-allocated because it is too big a value to fit +in a register, and would be too expensive to copy around; in a sense, it +is accidental that it is represented by a pointer. If a pointer +represents a primitive value, then it really does point to that value: +no unevaluated thunks, no indirections. Nothing can be at the other end +of the pointer than the primitive value. A numerically-intensive program +using unboxed types can go a *lot* faster than its “standard” +counterpart—we saw a threefold speedup on one example. + +Unboxed type kinds +------------------ + +Because unboxed types are represented without the use of pointers, we +cannot store them in use a polymorphic datatype at an unboxed type. +For example, the ``Just`` node +of ``Just 42#`` would have to be different from the ``Just`` node of +``Just 42``; the former stores an integer directly, while the latter +stores a pointer. GHC currently does not support this variety of ``Just`` +nodes (nor for any other datatype). Accordingly, the *kind* of an unboxed +type is different from the kind of a boxed type. + +The Haskell Report describes that ``*`` (spelled ``Type`` and imported from +``Data.Kind`` in the GHC dialect of Haskell) is the kind of ordinary datatypes, +such as ``Int``. Furthermore, type constructors can have kinds with arrows; for +example, ``Maybe`` has kind ``Type -> Type``. Unboxed types have a kind that +specifies their runtime representation. For example, the type ``Int#`` has kind +``TYPE 'IntRep`` and ``Double#`` has kind ``TYPE 'DoubleRep``. These kinds say +that the runtime representation of an ``Int#`` is a machine integer, and the +runtime representation of a ``Double#`` is a machine double-precision floating +point. In contrast, the kind ``Type`` is actually just a synonym for ``TYPE +'LiftedRep``. More details of the ``TYPE`` mechanisms appear in the `section +on runtime representation polymorphism <#runtime-rep>`__. + +Given that ``Int#``'s kind is not ``Type``, it then it follows that ``Maybe +Int#`` is disallowed. Similarly, because type variables tend to be of kind +``Type`` (for example, in ``(.) :: (b -> c) -> (a -> b) -> a -> c``, all the +type variables have kind ``Type``), polymorphism tends not to work over +primitive types. Stepping back, this makes some sense, because a polymorphic +function needs to manipulate the pointers to its data, and most primitive types +are unboxed. + +There are some restrictions on the use of primitive types: + +- You cannot define a newtype whose representation type (the argument + type of the data constructor) is an unboxed type. Thus, this is + illegal: + + :: + + newtype A = MkA Int# + + However, this restriction can be relaxed by enabling + :extension:`UnliftedNewtypes`. The `section on unlifted newtypes + <#unlifted-newtypes>`__ details the behavior of such types. + +- You cannot bind a variable with an unboxed type in a *top-level* + binding. + +- You cannot bind a variable with an unboxed type in a *recursive* + binding. + +- You may bind unboxed variables in a (non-recursive, non-top-level) + pattern binding, but you must make any such pattern-match strict. + (Failing to do so emits a warning :ghc-flag:`-Wunbanged-strict-patterns`.) + For example, rather than: + + :: + + data Foo = Foo Int Int# + + f x = let (Foo a b, w) = ..rhs.. in ..body.. + + you must write: + + :: + + data Foo = Foo Int Int# + + f x = let !(Foo a b, w) = ..rhs.. in ..body.. + + since ``b`` has type ``Int#``. + +.. _unboxed-tuples: + +Unboxed tuples +-------------- + +.. extension:: UnboxedTuples + :shortdesc: Enable the use of unboxed tuple syntax. + + :since: 6.8.1 + + +Unboxed tuples aren't really exported by ``GHC.Exts``; they are a +syntactic extension (:extension:`UnboxedTuples`). An +unboxed tuple looks like this: :: + + (# e_1, ..., e_n #) + +where ``e_1..e_n`` are expressions of any type (primitive or +non-primitive). The type of an unboxed tuple looks the same. + +Note that when unboxed tuples are enabled, ``(#`` is a single lexeme, so +for example when using operators like ``#`` and ``#-`` you need to write +``( # )`` and ``( #- )`` rather than ``(#)`` and ``(#-)``. + +Unboxed tuples are used for functions that need to return multiple +values, but they avoid the heap allocation normally associated with +using fully-fledged tuples. When an unboxed tuple is returned, the +components are put directly into registers or on the stack; the unboxed +tuple itself does not have a composite representation. Many of the +primitive operations listed in ``primops.txt.pp`` return unboxed tuples. +In particular, the ``IO`` and ``ST`` monads use unboxed tuples to avoid +unnecessary allocation during sequences of operations. + +There are some restrictions on the use of unboxed tuples: + +- The typical use of unboxed tuples is simply to return multiple + values, binding those multiple results with a ``case`` expression, + thus: + + :: + + f x y = (# x+1, y-1 #) + g x = case f x x of { (# a, b #) -> a + b } + + You can have an unboxed tuple in a pattern binding, thus + + :: + + f x = let (# p,q #) = h x in ..body.. + + If the types of ``p`` and ``q`` are not unboxed, the resulting + binding is lazy like any other Haskell pattern binding. The above + example desugars like this: + + :: + + f x = let t = case h x of { (# p,q #) -> (p,q) } + p = fst t + q = snd t + in ..body.. + + Indeed, the bindings can even be recursive. + +.. _unboxed-sums: + +Unboxed sums +------------ + +.. extension:: UnboxedSums + :shortdesc: Enable unboxed sums. + + :since: 8.2.1 + + Enable the use of unboxed sum syntax. + +`-XUnboxedSums` enables new syntax for anonymous, unboxed sum types. The syntax +for an unboxed sum type with N alternatives is :: + + (# t_1 | t_2 | ... | t_N #) + +where ``t_1`` ... ``t_N`` are types (which can be unlifted, including unboxed +tuples and sums). + +Unboxed tuples can be used for multi-arity alternatives. For example: :: + + (# (# Int, String #) | Bool #) + +The term level syntax is similar. Leading and preceding bars (`|`) indicate which +alternative it is. Here are two terms of the type shown above: :: + + (# (# 1, "foo" #) | #) -- first alternative + + (# | True #) -- second alternative + +The pattern syntax reflects the term syntax: :: + + case x of + (# (# i, str #) | #) -> ... + (# | bool #) -> ... + +Unboxed sums are "unboxed" in the sense that, instead of allocating sums in the +heap and representing values as pointers, unboxed sums are represented as their +components, just like unboxed tuples. These "components" depend on alternatives +of a sum type. Like unboxed tuples, unboxed sums are lazy in their lifted +components. + +The code generator tries to generate as compact layout as possible for each +unboxed sum. In the best case, size of an unboxed sum is size of its biggest +alternative plus one word (for a tag). The algorithm for generating the memory +layout for a sum type works like this: + +- All types are classified as one of these classes: 32bit word, 64bit word, + 32bit float, 64bit float, pointer. + +- For each alternative of the sum type, a layout that consists of these fields + is generated. For example, if an alternative has ``Int``, ``Float#`` and + ``String`` fields, the layout will have an 32bit word, 32bit float and + pointer fields. + +- Layout fields are then overlapped so that the final layout will be as compact + as possible. For example, suppose we have the unboxed sum: :: + + (# (# Word32#, String, Float# #) + | (# Float#, Float#, Maybe Int #) #) + + The final layout will be something like :: + + Int32, Float32, Float32, Word32, Pointer + + The first ``Int32`` is for the tag. There are two ``Float32`` fields because + floating point types can't overlap with other types, because of limitations of + the code generator that we're hoping to overcome in the future. The second + alternative needs two ``Float32`` fields: The ``Word32`` field is for the + ``Word32#`` in the first alternative. The ``Pointer`` field is shared between + ``String`` and ``Maybe Int`` values of the alternatives. + + As another example, this is the layout for the unboxed version of ``Maybe a`` + type, ``(# (# #) | a #)``: :: + + Int32, Pointer + + The ``Pointer`` field is not used when tag says that it's ``Nothing``. + Otherwise ``Pointer`` points to the value in ``Just``. As mentioned + above, this type is lazy in its lifted field. Therefore, the type :: + + data Maybe' a = Maybe' (# (# #) | a #) + + is *precisely* isomorphic to the type ``Maybe a``, although its memory + representation is different. + + In the degenerate case where all the alternatives have zero width, such + as the ``Bool``-like ``(# (# #) | (# #) #)``, the unboxed sum layout only + has an ``Int32`` tag field (i.e., the whole thing is represented by an integer). + +.. _unlifted-newtypes: + +Unlifted Newtypes +----------------- + +.. extension:: UnliftedNewtypes + :shortdesc: Enable unlifted newtypes. + + :since: 8.10.1 + + Enable the use of newtypes over types with non-lifted runtime representations. + +GHC implements an :extension:`UnliftedNewtypes` extension as specified in +`this GHC proposal <https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0013-unlifted-newtypes.rst>`_. +:extension:`UnliftedNewtypes` relaxes the restrictions around what types can appear inside +of a `newtype`. For example, the type :: + + newtype A = MkA Int# + +is accepted when this extension is enabled. This creates a type +``A :: TYPE 'IntRep`` and a data constructor ``MkA :: Int# -> A``. +Although the kind of ``A`` is inferred by GHC, there is nothing visually +distictive about this type that indicated that is it not of kind ``Type`` +like newtypes typically are. `GADTSyntax <#gadt-style>`__ can be used to +provide a kind signature for additional clarity :: + + newtype A :: TYPE 'IntRep where + MkA :: Int# -> A + +The ``Coercible`` machinery works with unlifted newtypes just like it does with +lifted types. In either of the equivalent formulations of ``A`` given above, +users would additionally have access to a coercion between ``A`` and ``Int#``. + +As a consequence of the +`levity-polymorphic binder restriction <#levity-polymorphic-restrictions>`__, +levity-polymorphic fields are disallowed in data constructors +of data types declared using ``data``. However, since ``newtype`` data +constructor application is implemented as a coercion instead of as function +application, this restriction does not apply to the field inside a ``newtype`` +data constructor. Thus, the type checker accepts :: + + newtype Identity# :: forall (r :: RuntimeRep). TYPE r -> TYPE r where + MkIdentity# :: forall (r :: RuntimeRep) (a :: TYPE r). a -> Identity# a + +And with `UnboxedSums <#unboxed-sums>`__ enabled :: + + newtype Maybe# :: forall (r :: RuntimeRep). TYPE r -> TYPE (SumRep '[r, TupleRep '[]]) where + MkMaybe# :: forall (r :: RuntimeRep) (a :: TYPE r). (# a | (# #) #) -> Maybe# a + +This extension also relaxes some of the restrictions around data family +instances. In particular, :extension:`UnliftedNewtypes` permits a +``newtype instance`` to be given a return kind of ``TYPE r``, not just +``Type``. For example, the following ``newtype instance`` declarations would be +permitted: :: + + class Foo a where + data FooKey a :: TYPE 'IntRep + class Bar (r :: RuntimeRep) where + data BarType r :: TYPE r + + instance Foo Bool where + newtype FooKey Bool = FooKeyBoolC Int# + instance Bar 'WordRep where + newtype BarType 'WordRep = BarTypeWordRepC Word# + +It is worth noting that :extension:`UnliftedNewtypes` is *not* required to give +the data families themselves return kinds involving ``TYPE``, such as the +``FooKey`` and ``BarType`` examples above. The extension is +only required for ``newtype instance`` declarations, such as ``FooKeyBoolC`` +and ``BarTypeWorkRepC`` above. + +This extension impacts the determination of whether or not a newtype has +a Complete User-Specified Kind Signature (CUSK). The exact impact is specified +`the section on CUSKs <#complete-kind-signatures>`__. + |