Diffstat (limited to 'docs/users_guide/primitives.xml')
-rw-r--r-- | docs/users_guide/primitives.xml | 1215 |
1 files changed, 1215 insertions, 0 deletions
diff --git a/docs/users_guide/primitives.xml b/docs/users_guide/primitives.xml new file mode 100644 index 0000000000..e41bb59ee1 --- /dev/null +++ b/docs/users_guide/primitives.xml @@ -0,0 +1,1215 @@ +<?xml version="1.0" encoding="iso-8859-1"?> +<!-- UNBOXED TYPES AND PRIMITIVE OPERATIONS --> + +<sect1 id="primitives"> + <title>Unboxed types and primitive operations</title> + <indexterm><primary>GHC.Exts module</primary></indexterm> + + <para>This chapter defines all the types which are primitive in + Glasgow Haskell, and the operations provided for them. You bring + them into scope by importing module <literal>GHC.Exts</literal>.</para> + + <para>Note: while you really can use this stuff to write fast code, + we generally find it a lot less painful, and more satisfying in the + long run, to use higher-level language features and libraries. With + any luck, the code you write will be optimised to the efficient + unboxed version in any case. And if it isn't, we'd like to know + about it.</para> + +<sect2 id="glasgow-unboxed"> +<title>Unboxed types +</title> + +<para> +<indexterm><primary>Unboxed types (Glasgow extension)</primary></indexterm> +</para> + +<para>Most types in GHC are <firstterm>boxed</firstterm>, which means +that values of that type are represented by a pointer to a heap +object. The representation of a Haskell <literal>Int</literal>, for +example, is a two-word heap object. An <firstterm>unboxed</firstterm> +type, however, is represented by the value itself, no pointers or heap +allocation are involved. +</para> + +<para> +Unboxed types correspond to the “raw machine” types you +would use in C: <literal>Int#</literal> (long int), +<literal>Double#</literal> (double), <literal>Addr#</literal> +(void *), etc. 
The <emphasis>primitive operations</emphasis> +(PrimOps) on these types are what you might expect; e.g., +<literal>(+#)</literal> is addition on +<literal>Int#</literal>s, and is the machine-addition that we all +know and love—usually one instruction. +</para> + +<para> +Primitive (unboxed) types cannot be defined in Haskell, and are +therefore built into the language and compiler. Primitive types are +always unlifted; that is, a value of a primitive type cannot be +bottom. We use the convention that primitive types, values, and +operations have a <literal>#</literal> suffix. +</para> + +<para> +Primitive values are often represented by a simple bit-pattern, such +as <literal>Int#</literal>, <literal>Float#</literal>, +<literal>Double#</literal>. But this is not necessarily the case: +a primitive value might be represented by a pointer to a +heap-allocated object. Examples include +<literal>Array#</literal>, the type of primitive arrays. A +primitive array is heap-allocated because it is too big a value to fit +in a register, and would be too expensive to copy around; in a sense, +it is accidental that it is represented by a pointer. If a pointer +represents a primitive value, then it really does point to that value: +no unevaluated thunks, no indirections…nothing can be at the +other end of the pointer than the primitive value. +</para> + +<para> +There are some restrictions on the use of primitive types, the main +one being that you can't pass a primitive value to a polymorphic +function or store one in a polymorphic data type. This rules out +things like <literal>[Int#]</literal> (i.e. lists of primitive +integers). The reason for this restriction is that polymorphic +arguments and constructor fields are assumed to be pointers: if an +unboxed integer is stored in one of these, the garbage collector would +attempt to follow it, leading to unpredictable space leaks. 
Or a +<function>seq</function> operation on the polymorphic component may +attempt to dereference the pointer, with disastrous results. Even +worse, the unboxed value might be larger than a pointer +(<literal>Double#</literal> for instance). +</para> + +<para> +Nevertheless, a numerically-intensive program using unboxed types can +go a <emphasis>lot</emphasis> faster than its “standard” +counterpart; we saw a threefold speedup on one example. +</para> + +</sect2> + +<sect2 id="unboxed-tuples"> +<title>Unboxed Tuples +</title> + +<para> +Unboxed tuples aren't really exported by <literal>GHC.Exts</literal>; +they're available by default with <option>-fglasgow-exts</option>. An +unboxed tuple looks like this: +</para> + +<para> + +<programlisting> +(# e_1, ..., e_n #) +</programlisting> + +</para> + +<para> +where <literal>e_1..e_n</literal> are expressions of any +type (primitive or non-primitive). The type of an unboxed tuple looks +the same. +</para> + +<para> +Unboxed tuples are used for functions that need to return multiple +values, but they avoid the heap allocation normally associated with +using fully-fledged tuples. When an unboxed tuple is returned, the +components are put directly into registers or on the stack; the +unboxed tuple itself does not have a composite representation. Many +of the primitive operations listed in this section return unboxed +tuples. +</para> + +<para> +There are some pretty stringent restrictions on the use of unboxed tuples: +</para> + +<para> + +<itemizedlist> +<listitem> + +<para> + Unboxed tuple types are subject to the same restrictions as +other unboxed types; i.e. they may not be stored in polymorphic data +structures or passed to polymorphic functions. + +</para> +</listitem> +<listitem> + +<para> + Unboxed tuples may only be constructed as the direct result of +a function, and may only be deconstructed with a <literal>case</literal> expression. +For example,
the following are valid: + + +<programlisting> +f x y = (# x+1, y-1 #) +g x = case f x x of { (# a, b #) -> a + b } +</programlisting> + + +but the following are invalid: + + +<programlisting> +f x y = g (# x, y #) +g (# x, y #) = x + y +</programlisting> + + +</para> +</listitem> +<listitem> + +<para> + No variable can have an unboxed tuple type. This is illegal: + + +<programlisting> +f :: (# Int, Int #) -> (# Int, Int #) +f x = x +</programlisting> + + +because <literal>x</literal> has an unboxed tuple type. + +</para> +</listitem> + +</itemizedlist> + +</para> + +<para> +Note: we may relax some of these restrictions in the future. +</para> + +<para> +The <literal>IO</literal> and <literal>ST</literal> monads use unboxed +tuples to avoid unnecessary allocation during sequences of operations. +</para> + +</sect2> + +<sect2> +<title>Character and numeric types</title> + +<indexterm><primary>character types, primitive</primary></indexterm> +<indexterm><primary>numeric types, primitive</primary></indexterm> +<indexterm><primary>integer types, primitive</primary></indexterm> +<indexterm><primary>floating point types, primitive</primary></indexterm> +<para> +There are the following obvious primitive types: +</para> + +<programlisting> +type Char# +type Int# +type Word# +type Addr# +type Float# +type Double# +type Int64# +type Word64# +</programlisting> + +<indexterm><primary><literal>Char#</literal></primary></indexterm> +<indexterm><primary><literal>Int#</literal></primary></indexterm> +<indexterm><primary><literal>Word#</literal></primary></indexterm> +<indexterm><primary><literal>Addr#</literal></primary></indexterm> +<indexterm><primary><literal>Float#</literal></primary></indexterm> +<indexterm><primary><literal>Double#</literal></primary></indexterm> +<indexterm><primary><literal>Int64#</literal></primary></indexterm> +<indexterm><primary><literal>Word64#</literal></primary></indexterm> + +<para> +If you really want to know their exact equivalents in C, see 
+<filename>ghc/includes/StgTypes.h</filename> in the GHC source tree. +</para> + +<para> +Literals for these types may be written as follows: +</para> + +<para> + +<programlisting> +1# an Int# +1.2# a Float# +1.34## a Double# +'a'# a Char#; for weird characters, use e.g. '\o<octal>'# +"a"# an Addr# (a `char *'); only characters '\0'..'\255' allowed +</programlisting> + +<indexterm><primary>literals, primitive</primary></indexterm> +<indexterm><primary>constants, primitive</primary></indexterm> +<indexterm><primary>numbers, primitive</primary></indexterm> +</para> + +</sect2> + +<sect2> +<title>Comparison operations</title> + +<para> +<indexterm><primary>comparisons, primitive</primary></indexterm> +<indexterm><primary>operators, comparison</primary></indexterm> +</para> + +<para> + +<programlisting> +{>,>=,==,/=,<,<=}# :: Int# -> Int# -> Bool + +{gt,ge,eq,ne,lt,le}Char# :: Char# -> Char# -> Bool + -- ditto for Word# and Addr# +</programlisting> + +<indexterm><primary><literal>>#</literal></primary></indexterm> +<indexterm><primary><literal>>=#</literal></primary></indexterm> +<indexterm><primary><literal>==#</literal></primary></indexterm> +<indexterm><primary><literal>/=#</literal></primary></indexterm> +<indexterm><primary><literal><#</literal></primary></indexterm> +<indexterm><primary><literal><=#</literal></primary></indexterm> +<indexterm><primary><literal>gt{Char,Word,Addr}#</literal></primary></indexterm> +<indexterm><primary><literal>ge{Char,Word,Addr}#</literal></primary></indexterm> +<indexterm><primary><literal>eq{Char,Word,Addr}#</literal></primary></indexterm> +<indexterm><primary><literal>ne{Char,Word,Addr}#</literal></primary></indexterm> +<indexterm><primary><literal>lt{Char,Word,Addr}#</literal></primary></indexterm> +<indexterm><primary><literal>le{Char,Word,Addr}#</literal></primary></indexterm> +</para> + +</sect2> + +<sect2> +<title>Primitive-character operations</title> + +<para> +<indexterm><primary>characters, primitive 
operations</primary></indexterm> +<indexterm><primary>operators, primitive character</primary></indexterm> +</para> + +<para> + +<programlisting> +ord# :: Char# -> Int# +chr# :: Int# -> Char# +</programlisting> + +<indexterm><primary><literal>ord#</literal></primary></indexterm> +<indexterm><primary><literal>chr#</literal></primary></indexterm> +</para> + +</sect2> + +<sect2> +<title>Primitive-<literal>Int</literal> operations</title> + +<para> +<indexterm><primary>integers, primitive operations</primary></indexterm> +<indexterm><primary>operators, primitive integer</primary></indexterm> +</para> + +<para> + +<programlisting> +{+,-,*,quotInt,remInt,gcdInt}# :: Int# -> Int# -> Int# +negateInt# :: Int# -> Int# + +iShiftL#, iShiftRA#, iShiftRL# :: Int# -> Int# -> Int# + -- shift left, right arithmetic, right logical + +addIntC#, subIntC#, mulIntC# :: Int# -> Int# -> (# Int#, Int# #) + -- add, subtract, multiply with carry +</programlisting> + +<indexterm><primary><literal>+#</literal></primary></indexterm> +<indexterm><primary><literal>-#</literal></primary></indexterm> +<indexterm><primary><literal>*#</literal></primary></indexterm> +<indexterm><primary><literal>quotInt#</literal></primary></indexterm> +<indexterm><primary><literal>remInt#</literal></primary></indexterm> +<indexterm><primary><literal>gcdInt#</literal></primary></indexterm> +<indexterm><primary><literal>iShiftL#</literal></primary></indexterm> +<indexterm><primary><literal>iShiftRA#</literal></primary></indexterm> +<indexterm><primary><literal>iShiftRL#</literal></primary></indexterm> +<indexterm><primary><literal>addIntC#</literal></primary></indexterm> +<indexterm><primary><literal>subIntC#</literal></primary></indexterm> +<indexterm><primary><literal>mulIntC#</literal></primary></indexterm> +<indexterm><primary>shift operations, integer</primary></indexterm> +</para> + +<para> +<emphasis>Note:</emphasis> No error/overflow checking! 
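As a minimal sketch of what the missing overflow check means in practice, the following program contrasts <literal>(+#)</literal> with <literal>addIntC#</literal>. It assumes a recent GHC, where the <literal>MagicHash</literal> and <literal>UnboxedTuples</literal> language extensions enable the syntax; the names <literal>plainAdd</literal> and <literal>checkedAdd</literal> are hypothetical and exist only for this illustration.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
module Main where

import GHC.Exts (Int (I#), (+#), addIntC#)

-- Plain primitive addition: wraps around silently on overflow.
plainAdd :: Int -> Int -> Int
plainAdd (I# x) (I# y) = I# (x +# y)

-- addIntC# returns the (possibly wrapped) sum together with a flag
-- that is non-zero when the addition overflowed.
checkedAdd :: Int -> Int -> Maybe Int
checkedAdd (I# x) (I# y) =
  case addIntC# x y of
    (# r, 0# #) -> Just (I# r)
    (# _, _ #)  -> Nothing

main :: IO ()
main = do
  print (plainAdd 2 3)
  print (checkedAdd 2 3)
  print (checkedAdd maxBound 1)
```

Any range checking has to be written by hand like this; the plain arithmetic primitives never do it for you.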
+</para> + +</sect2> + +<sect2> +<title>Primitive-<literal>Double</literal> and <literal>Float</literal> operations</title> + +<para> +<indexterm><primary>floating point numbers, primitive</primary></indexterm> +<indexterm><primary>operators, primitive floating point</primary></indexterm> +</para> + +<para> + +<programlisting> +{+,-,*,/}## :: Double# -> Double# -> Double# +{<,<=,==,/=,>=,>}## :: Double# -> Double# -> Bool +negateDouble# :: Double# -> Double# +double2Int# :: Double# -> Int# +int2Double# :: Int# -> Double# + +{plus,minus,times,divide}Float# :: Float# -> Float# -> Float# +{gt,ge,eq,ne,lt,le}Float# :: Float# -> Float# -> Bool +negateFloat# :: Float# -> Float# +float2Int# :: Float# -> Int# +int2Float# :: Int# -> Float# +</programlisting> + +</para> + +<para> +<indexterm><primary><literal>+##</literal></primary></indexterm> +<indexterm><primary><literal>-##</literal></primary></indexterm> +<indexterm><primary><literal>*##</literal></primary></indexterm> +<indexterm><primary><literal>/##</literal></primary></indexterm> +<indexterm><primary><literal><##</literal></primary></indexterm> +<indexterm><primary><literal><=##</literal></primary></indexterm> +<indexterm><primary><literal>==##</literal></primary></indexterm> +<indexterm><primary><literal>/=##</literal></primary></indexterm> +<indexterm><primary><literal>>=##</literal></primary></indexterm> +<indexterm><primary><literal>>##</literal></primary></indexterm> +<indexterm><primary><literal>negateDouble#</literal></primary></indexterm> +<indexterm><primary><literal>double2Int#</literal></primary></indexterm> +<indexterm><primary><literal>int2Double#</literal></primary></indexterm> +</para> + +<para> +<indexterm><primary><literal>plusFloat#</literal></primary></indexterm> +<indexterm><primary><literal>minusFloat#</literal></primary></indexterm> +<indexterm><primary><literal>timesFloat#</literal></primary></indexterm> +<indexterm><primary><literal>divideFloat#</literal></primary></indexterm>
+<indexterm><primary><literal>gtFloat#</literal></primary></indexterm> +<indexterm><primary><literal>geFloat#</literal></primary></indexterm> +<indexterm><primary><literal>eqFloat#</literal></primary></indexterm> +<indexterm><primary><literal>neFloat#</literal></primary></indexterm> +<indexterm><primary><literal>ltFloat#</literal></primary></indexterm> +<indexterm><primary><literal>leFloat#</literal></primary></indexterm> +<indexterm><primary><literal>negateFloat#</literal></primary></indexterm> +<indexterm><primary><literal>float2Int#</literal></primary></indexterm> +<indexterm><primary><literal>int2Float#</literal></primary></indexterm> +</para> + +<para> +And a full complement of trigonometric functions: +</para> + +<para> + +<programlisting> +expDouble# :: Double# -> Double# +logDouble# :: Double# -> Double# +sqrtDouble# :: Double# -> Double# +sinDouble# :: Double# -> Double# +cosDouble# :: Double# -> Double# +tanDouble# :: Double# -> Double# +asinDouble# :: Double# -> Double# +acosDouble# :: Double# -> Double# +atanDouble# :: Double# -> Double# +sinhDouble# :: Double# -> Double# +coshDouble# :: Double# -> Double# +tanhDouble# :: Double# -> Double# +powerDouble# :: Double# -> Double# -> Double# +</programlisting> + +<indexterm><primary>trigonometric functions, primitive</primary></indexterm> +</para> + +<para> +similarly for <literal>Float#</literal>. 
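To illustrate how the <literal>Double#</literal> arithmetic operators combine with one of the functions listed above, here is a small sketch that unwraps two boxed <literal>Double</literal>s, works on the raw values, and boxes the result again. It assumes a recent GHC with the <literal>MagicHash</literal> extension; <literal>hypotenuse</literal> is a hypothetical name chosen for this example.

```haskell
{-# LANGUAGE MagicHash #-}
module Main where

import GHC.Exts (Double (D#), (*##), (+##), sqrtDouble#)

-- Unbox both arguments, compute sqrt (x*x + y*y) on raw Double# values,
-- and re-box the result with the D# constructor.
hypotenuse :: Double -> Double -> Double
hypotenuse (D# x) (D# y) = D# (sqrtDouble# ((x *## x) +## (y *## y)))

main :: IO ()
main = print (hypotenuse 3 4)
```

In ordinary code the optimiser will usually produce something like this from the boxed version, so writing it by hand is rarely necessary.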
+</para> + +<para> +There are two coercion functions for <literal>Float#</literal>/<literal>Double#</literal>: +</para> + +<para> + +<programlisting> +float2Double# :: Float# -> Double# +double2Float# :: Double# -> Float# +</programlisting> + +<indexterm><primary><literal>float2Double#</literal></primary></indexterm> +<indexterm><primary><literal>double2Float#</literal></primary></indexterm> +</para> + +<para> +The primitive version of <function>decodeDouble</function> +(<function>encodeDouble</function> is implemented as an external C +function): +</para> + +<para> + +<programlisting> +decodeDouble# :: Double# -> PrelNum.ReturnIntAndGMP +</programlisting> + +<indexterm><primary><literal>encodeDouble#</literal></primary></indexterm> +<indexterm><primary><literal>decodeDouble#</literal></primary></indexterm> +</para> + +<para> +(And the same for <literal>Float#</literal>s.) +</para> + +</sect2> + +<sect2 id="integer-operations"> +<title>Operations on/for <literal>Integers</literal> (interface to GMP) +</title> + +<para> +<indexterm><primary>arbitrary precision integers</primary></indexterm> +<indexterm><primary>Integer, operations on</primary></indexterm> +</para> + +<para> +We implement <literal>Integers</literal> (arbitrary-precision +integers) using the GNU multiple-precision (GMP) package (version +2.0.2). +</para> + +<para> +The data type for <literal>Integer</literal> is either a small +integer, represented by an <literal>Int</literal>, or a large integer +represented using the pieces required by GMP's +<literal>MP_INT</literal> in <filename>gmp.h</filename> (see +<filename>gmp.info</filename> in +<filename>ghc/includes/runtime/gmp</filename>). 
It comes out as: +</para> + +<para> + +<programlisting> +data Integer = S# Int# -- small integers + | J# Int# ByteArray# -- large integers +</programlisting> + +<indexterm><primary>Integer type</primary></indexterm> The primitive +ops to support large <literal>Integers</literal> use the +“pieces” of the representation, and are as follows: +</para> + +<para> + +<programlisting> +negateInteger# :: Int# -> ByteArray# -> Integer + +{plus,minus,times}Integer#, gcdInteger#, + quotInteger#, remInteger#, divExactInteger# + :: Int# -> ByteArray# + -> Int# -> ByteArray# + -> (# Int#, ByteArray# #) + +cmpInteger# + :: Int# -> ByteArray# + -> Int# -> ByteArray# + -> Int# -- -1 for <; 0 for ==; +1 for > + +cmpIntegerInt# + :: Int# -> ByteArray# + -> Int# + -> Int# -- -1 for <; 0 for ==; +1 for > + +gcdIntegerInt# + :: Int# -> ByteArray# + -> Int# + -> Int# + +divModInteger#, quotRemInteger# + :: Int# -> ByteArray# + -> Int# -> ByteArray# + -> (# Int#, ByteArray#, + Int#, ByteArray# #) + +integer2Int# :: Int# -> ByteArray# -> Int# + +int2Integer# :: Int# -> Integer -- NB: no error-checking on these two! +word2Integer# :: Word# -> Integer + +addr2Integer# :: Addr# -> Integer + -- the Addr# is taken to be a `char *' string + -- to be converted into an Integer.
+</programlisting> + +<indexterm><primary><literal>negateInteger#</literal></primary></indexterm> +<indexterm><primary><literal>plusInteger#</literal></primary></indexterm> +<indexterm><primary><literal>minusInteger#</literal></primary></indexterm> +<indexterm><primary><literal>timesInteger#</literal></primary></indexterm> +<indexterm><primary><literal>quotInteger#</literal></primary></indexterm> +<indexterm><primary><literal>remInteger#</literal></primary></indexterm> +<indexterm><primary><literal>gcdInteger#</literal></primary></indexterm> +<indexterm><primary><literal>gcdIntegerInt#</literal></primary></indexterm> +<indexterm><primary><literal>divExactInteger#</literal></primary></indexterm> +<indexterm><primary><literal>cmpInteger#</literal></primary></indexterm> +<indexterm><primary><literal>divModInteger#</literal></primary></indexterm> +<indexterm><primary><literal>quotRemInteger#</literal></primary></indexterm> +<indexterm><primary><literal>integer2Int#</literal></primary></indexterm> +<indexterm><primary><literal>int2Integer#</literal></primary></indexterm> +<indexterm><primary><literal>word2Integer#</literal></primary></indexterm> +<indexterm><primary><literal>addr2Integer#</literal></primary></indexterm> +</para> + +</sect2> + +<sect2> +<title>Words and addresses</title> + +<para> +<indexterm><primary>word, primitive type</primary></indexterm> +<indexterm><primary>address, primitive type</primary></indexterm> +<indexterm><primary>unsigned integer, primitive type</primary></indexterm> +<indexterm><primary>pointer, primitive type</primary></indexterm> +</para> + +<para> +A <literal>Word#</literal> is used for bit-twiddling operations. +It is the same size as an <literal>Int#</literal>, but has no sign +nor any arithmetic operations. 
+ +<programlisting> +type Word# -- Same size/etc as Int# but *unsigned* +type Addr# -- A pointer from outside the "Haskell world" (from C, probably); + -- described under "arrays" +</programlisting> + +<indexterm><primary><literal>Word#</literal></primary></indexterm> +<indexterm><primary><literal>Addr#</literal></primary></indexterm> +</para> + +<para> +<literal>Word#</literal>s and <literal>Addr#</literal>s have +the usual comparison operations. Other +unboxed-<literal>Word</literal> ops (bit-twiddling and coercions): +</para> + +<para> + +<programlisting> +{gt,ge,eq,ne,lt,le}Word# :: Word# -> Word# -> Bool + +and#, or#, xor# :: Word# -> Word# -> Word# + -- standard bit ops. + +quotWord#, remWord# :: Word# -> Word# -> Word# + -- word (i.e. unsigned) versions are different from int + -- versions, so we have to provide these explicitly. + +not# :: Word# -> Word# + +shiftL#, shiftRL# :: Word# -> Int# -> Word# + -- shift left, right logical + +int2Word# :: Int# -> Word# -- just a cast, really +word2Int# :: Word# -> Int# +</programlisting> + +<indexterm><primary>bit operations, Word and Addr</primary></indexterm> +<indexterm><primary><literal>gtWord#</literal></primary></indexterm> +<indexterm><primary><literal>geWord#</literal></primary></indexterm> +<indexterm><primary><literal>eqWord#</literal></primary></indexterm> +<indexterm><primary><literal>neWord#</literal></primary></indexterm> +<indexterm><primary><literal>ltWord#</literal></primary></indexterm> +<indexterm><primary><literal>leWord#</literal></primary></indexterm> +<indexterm><primary><literal>and#</literal></primary></indexterm> +<indexterm><primary><literal>or#</literal></primary></indexterm> +<indexterm><primary><literal>xor#</literal></primary></indexterm> +<indexterm><primary><literal>not#</literal></primary></indexterm> +<indexterm><primary><literal>quotWord#</literal></primary></indexterm> +<indexterm><primary><literal>remWord#</literal></primary></indexterm> 
+<indexterm><primary><literal>shiftL#</literal></primary></indexterm> +<indexterm><primary><literal>shiftRA#</literal></primary></indexterm> +<indexterm><primary><literal>shiftRL#</literal></primary></indexterm> +<indexterm><primary><literal>int2Word#</literal></primary></indexterm> +<indexterm><primary><literal>word2Int#</literal></primary></indexterm> +</para> + +<para> +Unboxed-<literal>Addr</literal> ops (C casts, really): + +<programlisting> +{gt,ge,eq,ne,lt,le}Addr# :: Addr# -> Addr# -> Bool + +int2Addr# :: Int# -> Addr# +addr2Int# :: Addr# -> Int# +addr2Integer# :: Addr# -> (# Int#, ByteArray# #) +</programlisting> + +<indexterm><primary><literal>gtAddr#</literal></primary></indexterm> +<indexterm><primary><literal>geAddr#</literal></primary></indexterm> +<indexterm><primary><literal>eqAddr#</literal></primary></indexterm> +<indexterm><primary><literal>neAddr#</literal></primary></indexterm> +<indexterm><primary><literal>ltAddr#</literal></primary></indexterm> +<indexterm><primary><literal>leAddr#</literal></primary></indexterm> +<indexterm><primary><literal>int2Addr#</literal></primary></indexterm> +<indexterm><primary><literal>addr2Int#</literal></primary></indexterm> +<indexterm><primary><literal>addr2Integer#</literal></primary></indexterm> +</para> + +<para> +The casts between <literal>Int#</literal>, +<literal>Word#</literal> and <literal>Addr#</literal> +correspond to null operations at the machine level, but are required +to keep the Haskell type checker happy. +</para> + +<para> +Operations for indexing off of C pointers +(<literal>Addr#</literal>s) to snatch values are listed under +“arrays”. +</para> + +</sect2> + +<sect2> +<title>Arrays</title> + +<para> +<indexterm><primary>arrays, primitive</primary></indexterm> +</para> + +<para> +The type <literal>Array# elt</literal> is the type of primitive, +unpointed arrays of values of type <literal>elt</literal>. 
+</para> + +<para> + +<programlisting> +type Array# elt +</programlisting> + +<indexterm><primary><literal>Array#</literal></primary></indexterm> +</para> + +<para> +<literal>Array#</literal> is more primitive than a Haskell +array—indeed, the Haskell <literal>Array</literal> interface is +implemented using <literal>Array#</literal>—in that an +<literal>Array#</literal> is indexed only by +<literal>Int#</literal>s, starting at zero. It is also more +primitive by virtue of being unboxed. That doesn't mean that it isn't +a heap-allocated object—of course, it is. Rather, being unboxed +means that it is represented by a pointer to the array itself, and not +to a thunk which will evaluate to the array (or to bottom). The +components of an <literal>Array#</literal> are themselves boxed. +</para> + +<para> +The type <literal>ByteArray#</literal> is similar to +<literal>Array#</literal>, except that it contains just a string +of (non-pointer) bytes. +</para> + +<para> + +<programlisting> +type ByteArray# +</programlisting> + +<indexterm><primary><literal>ByteArray#</literal></primary></indexterm> +</para> + +<para> +Arrays of these types are useful when a Haskell program wishes to +construct a value to pass to a C procedure. It is also possible to use +them to build (say) arrays of unboxed characters for internal use in a +Haskell program. Given these uses, <literal>ByteArray#</literal> +is deliberately a bit vague about the type of its components. +Operations are provided to extract values of type +<literal>Char#</literal>, <literal>Int#</literal>, +<literal>Float#</literal>, <literal>Double#</literal>, and +<literal>Addr#</literal> from arbitrary offsets within a +<literal>ByteArray#</literal>. (For type +<literal>Foo#</literal>, the $i$th offset gets you the $i$th +<literal>Foo#</literal>, not the <literal>Foo#</literal> at +byte-position $i$. Mumble.) (If you want a +<literal>Word#</literal>, grab an <literal>Int#</literal>, +then coerce it.) 
+</para> + +<para> +Lastly, we have static byte-arrays, of type +<literal>Addr#</literal> [mentioned previously]. (Remember +the duality between arrays and pointers in C.) Arrays of this type +are represented by a pointer to an array in the world outside Haskell, +so this pointer is not followed by the garbage collector. In other +respects they are just like <literal>ByteArray#</literal>. They +are only needed in order to pass values from C to Haskell. +</para> + +</sect2> + +<sect2> +<title>Reading and writing</title> + +<para> +Primitive arrays are linear, and indexed starting at zero. +</para> + +<para> +The size and indices of a <literal>ByteArray#</literal>, <literal>Addr#</literal>, and +<literal>MutableByteArray#</literal> are all in bytes. It's up to the program to +calculate the correct byte offset from the start of the array. This +allows a <literal>ByteArray#</literal> to contain a mixture of values of different +types, which is often needed when preparing data for and unpicking +results from C. (Umm…not true of indices…WDP 95/09) +</para> + +<para> +<emphasis>Should we provide some <literal>sizeOfDouble#</literal> constants?</emphasis> +</para> + +<para> +Out-of-range errors on indexing should be caught by the code which +uses the primitive operation; the primitive operations themselves do +<emphasis>not</emphasis> check for out-of-range indexes. The intention is that the +primitive ops compile to one machine instruction or thereabouts. +</para> + +<para> +We use the terms “reading” and “writing” to refer to accessing +<emphasis>mutable</emphasis> arrays (see <xref linkend="sect-mutable"/>), and +“indexing” to refer to reading a value from an <emphasis>immutable</emphasis> +array.
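As a rough sketch of indexing in this sense, the program below reads characters out of a static byte array written as an <literal>Addr#</literal> literal, using <literal>indexCharOffAddr#</literal>. It assumes a recent GHC with the <literal>MagicHash</literal> extension; <literal>charAt</literal> is a hypothetical helper, and because the primitive performs no bounds check, the caller is responsible for staying within the five bytes of the literal.

```haskell
{-# LANGUAGE MagicHash #-}
module Main where

import GHC.Exts (Char (C#), Int (I#), indexCharOffAddr#)

-- Index into a static, immutable byte array written as an Addr# literal.
-- No bounds check is performed, so the caller must keep i within 0..4.
charAt :: Int -> Char
charAt (I# i) = C# (indexCharOffAddr# "hello"# i)

main :: IO ()
main = print (map charAt [0 .. 4])
```

Note that no <literal>State#</literal> token is threaded through: the array is immutable, so indexing is a pure operation.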
+</para> + +<para> +Immutable byte arrays are straightforward to index (all indices are in +units of the size of the object being read): + +<programlisting> +indexCharArray# :: ByteArray# -> Int# -> Char# +indexIntArray# :: ByteArray# -> Int# -> Int# +indexAddrArray# :: ByteArray# -> Int# -> Addr# +indexFloatArray# :: ByteArray# -> Int# -> Float# +indexDoubleArray# :: ByteArray# -> Int# -> Double# + +indexCharOffAddr# :: Addr# -> Int# -> Char# +indexIntOffAddr# :: Addr# -> Int# -> Int# +indexFloatOffAddr# :: Addr# -> Int# -> Float# +indexDoubleOffAddr# :: Addr# -> Int# -> Double# +indexAddrOffAddr# :: Addr# -> Int# -> Addr# + -- Get an Addr# from an Addr# offset +</programlisting> + +<indexterm><primary><literal>indexCharArray#</literal></primary></indexterm> +<indexterm><primary><literal>indexIntArray#</literal></primary></indexterm> +<indexterm><primary><literal>indexAddrArray#</literal></primary></indexterm> +<indexterm><primary><literal>indexFloatArray#</literal></primary></indexterm> +<indexterm><primary><literal>indexDoubleArray#</literal></primary></indexterm> +<indexterm><primary><literal>indexCharOffAddr#</literal></primary></indexterm> +<indexterm><primary><literal>indexIntOffAddr#</literal></primary></indexterm> +<indexterm><primary><literal>indexFloatOffAddr#</literal></primary></indexterm> +<indexterm><primary><literal>indexDoubleOffAddr#</literal></primary></indexterm> +<indexterm><primary><literal>indexAddrOffAddr#</literal></primary></indexterm> +</para> + +<para> +The last of these, <function>indexAddrOffAddr#</function>, extracts an <literal>Addr#</literal> using an offset +from another <literal>Addr#</literal>, thereby providing the ability to follow a chain of +C pointers. +</para> + +<para> +Something a bit more interesting goes on when indexing arrays of boxed +objects, because the result is simply the boxed object. So presumably +it should be entered—we never usually return an unevaluated +object! 
This is a pain: primitive ops aren't supposed to do +complicated things like enter objects. The current solution is to +return a single element unboxed tuple (see <xref linkend="unboxed-tuples">). +</para> + +<para> + +<programlisting> +indexArray# :: Array# elt -> Int# -> (# elt #) +</programlisting> + +<indexterm><primary><literal>indexArray#</literal></primary></indexterm> +</para> + +</sect2> + +<sect2> +<title>The state type</title> + +<para> +<indexterm><primary><literal>state, primitive type</literal></primary></indexterm> +<indexterm><primary><literal>State#</literal></primary></indexterm> +</para> + +<para> +The primitive type <literal>State#</literal> represents the state of a state +transformer. It is parameterised on the desired type of state, which +serves to keep states from distinct threads distinct from one another. +But the <emphasis>only</emphasis> effect of this parameterisation is in the type +system: all values of type <literal>State#</literal> are represented in the same way. +Indeed, they are all represented by nothing at all! The code +generator “knows” to generate no code, and allocate no registers +etc, for primitive states. +</para> + +<para> + +<programlisting> +type State# s +</programlisting> + +</para> + +<para> +The type <literal>GHC.RealWorld</literal> is truly opaque: there are no values defined +of this type, and no operations over it. It is “primitive” in that +sense - but it is <emphasis>not unlifted!</emphasis> Its only role in life is to be +the type which distinguishes the <literal>IO</literal> state transformer. +</para> + +<para> + +<programlisting> +data RealWorld +</programlisting> + +</para> + +</sect2> + +<sect2> +<title>State of the world</title> + +<para> +A single, primitive, value of type <literal>State# RealWorld</literal> is provided. 
+</para> + +<para> + +<programlisting> +realWorld# :: State# RealWorld +</programlisting> + +<indexterm><primary>realWorld# state object</primary></indexterm> +</para> + +<para> +(Note: in the compiler, not a <literal>PrimOp</literal>; just a mucho magic +<literal>Id</literal>. Exported from <literal>GHC</literal>, though). +</para> + +</sect2> + +<sect2 id="sect-mutable"> +<title>Mutable arrays</title> + +<para> +<indexterm><primary>mutable arrays</primary></indexterm> +<indexterm><primary>arrays, mutable</primary></indexterm> +Corresponding to <literal>Array#</literal> and <literal>ByteArray#</literal>, we have the types of +mutable versions of each. In each case, the representation is a +pointer to a suitable block of (mutable) heap-allocated storage. +</para> + +<para> + +<programlisting> +type MutableArray# s elt +type MutableByteArray# s +</programlisting> + +<indexterm><primary><literal>MutableArray#</literal></primary></indexterm> +<indexterm><primary><literal>MutableByteArray#</literal></primary></indexterm> +</para> + +<sect3> +<title>Allocation</title> + +<para> +<indexterm><primary>mutable arrays, allocation</primary></indexterm> +<indexterm><primary>arrays, allocation</primary></indexterm> +<indexterm><primary>allocation, of mutable arrays</primary></indexterm> +</para> + +<para> +Mutable arrays can be allocated. Only pointer-arrays are initialised; +arrays of non-pointers are filled in by “user code” rather than by +the array-allocation primitive. Reason: only the pointer case has to +worry about GC striking with a partly-initialised array. 
+</para> + +<para> + +<programlisting> +newArray# :: Int# -> elt -> State# s -> (# State# s, MutableArray# s elt #) + +newCharArray# :: Int# -> State# s -> (# State# s, MutableByteArray# s #) +newIntArray# :: Int# -> State# s -> (# State# s, MutableByteArray# s #) +newAddrArray# :: Int# -> State# s -> (# State# s, MutableByteArray# s #) +newFloatArray# :: Int# -> State# s -> (# State# s, MutableByteArray# s #) +newDoubleArray# :: Int# -> State# s -> (# State# s, MutableByteArray# s #) +</programlisting> + +<indexterm><primary><literal>newArray#</literal></primary></indexterm> +<indexterm><primary><literal>newCharArray#</literal></primary></indexterm> +<indexterm><primary><literal>newIntArray#</literal></primary></indexterm> +<indexterm><primary><literal>newAddrArray#</literal></primary></indexterm> +<indexterm><primary><literal>newFloatArray#</literal></primary></indexterm> +<indexterm><primary><literal>newDoubleArray#</literal></primary></indexterm> +</para> + +<para> +The size of a <literal>ByteArray#</literal> is given in bytes.
+</para> + +</sect3> + +<sect3> +<title>Reading and writing</title> + +<para> +<indexterm><primary>arrays, reading and writing</primary></indexterm> +</para> + +<para> + +<programlisting> +readArray# :: MutableArray# s elt -> Int# -> State# s -> (# State# s, elt #) +readCharArray# :: MutableByteArray# s -> Int# -> State# s -> (# State# s, Char# #) +readIntArray# :: MutableByteArray# s -> Int# -> State# s -> (# State# s, Int# #) +readAddrArray# :: MutableByteArray# s -> Int# -> State# s -> (# State# s, Addr# #) +readFloatArray# :: MutableByteArray# s -> Int# -> State# s -> (# State# s, Float# #) +readDoubleArray# :: MutableByteArray# s -> Int# -> State# s -> (# State# s, Double# #) + +writeArray# :: MutableArray# s elt -> Int# -> elt -> State# s -> State# s +writeCharArray# :: MutableByteArray# s -> Int# -> Char# -> State# s -> State# s +writeIntArray# :: MutableByteArray# s -> Int# -> Int# -> State# s -> State# s +writeAddrArray# :: MutableByteArray# s -> Int# -> Addr# -> State# s -> State# s +writeFloatArray# :: MutableByteArray# s -> Int# -> Float# -> State# s -> State# s +writeDoubleArray# :: MutableByteArray# s -> Int# -> Double# -> State# s -> State# s +</programlisting> + +<indexterm><primary><literal>readArray#</literal></primary></indexterm> +<indexterm><primary><literal>readCharArray#</literal></primary></indexterm> +<indexterm><primary><literal>readIntArray#</literal></primary></indexterm> +<indexterm><primary><literal>readAddrArray#</literal></primary></indexterm> +<indexterm><primary><literal>readFloatArray#</literal></primary></indexterm> +<indexterm><primary><literal>readDoubleArray#</literal></primary></indexterm> +<indexterm><primary><literal>writeArray#</literal></primary></indexterm> +<indexterm><primary><literal>writeCharArray#</literal></primary></indexterm> +<indexterm><primary><literal>writeIntArray#</literal></primary></indexterm> +<indexterm><primary><literal>writeAddrArray#</literal></primary></indexterm> 
+<indexterm><primary><literal>writeFloatArray#</literal></primary></indexterm> +<indexterm><primary><literal>writeDoubleArray#</literal></primary></indexterm> +</para> + +</sect3> + +<sect3> +<title>Equality</title> + +<para> +<indexterm><primary>arrays, testing for equality</primary></indexterm> +</para> + +<para> +One can take “equality” of mutable arrays. What is compared is the +<emphasis>name</emphasis> or reference to the mutable array, not its contents. +</para> + +<para> + +<programlisting> +sameMutableArray# :: MutableArray# s elt -> MutableArray# s elt -> Bool +sameMutableByteArray# :: MutableByteArray# s -> MutableByteArray# s -> Bool +</programlisting> + +<indexterm><primary><literal>sameMutableArray#</literal></primary></indexterm> +<indexterm><primary><literal>sameMutableByteArray#</literal></primary></indexterm> +</para> + +</sect3> + +<sect3> +<title>Freezing mutable arrays</title> + +<para> +<indexterm><primary>arrays, freezing mutable</primary></indexterm> +<indexterm><primary>freezing mutable arrays</primary></indexterm> +<indexterm><primary>mutable arrays, freezing</primary></indexterm> +</para> + +<para> +Only unsafe-freeze has a primitive. (Safe freeze is done directly in Haskell +by copying the array and then using <function>unsafeFreeze</function>.) 
+</para> + +<para> + +<programlisting> +unsafeFreezeArray# :: MutableArray# s elt -> State# s -> (# State# s, Array# elt #) +unsafeFreezeByteArray# :: MutableByteArray# s -> State# s -> (# State# s, ByteArray# #) +</programlisting> + +<indexterm><primary><literal>unsafeFreezeArray#</literal></primary></indexterm> +<indexterm><primary><literal>unsafeFreezeByteArray#</literal></primary></indexterm> +</para> + +</sect3> + +</sect2> + +<sect2> +<title>Synchronising variables (M-vars)</title> + +<para> +<indexterm><primary>synchronising variables (M-vars)</primary></indexterm> +<indexterm><primary>M-Vars</primary></indexterm> +</para> + +<para> +Synchronising variables are the primitive type used to implement +Concurrent Haskell's MVars (see the Concurrent Haskell paper for +the operational behaviour of these operations). +</para> + +<para> + +<programlisting> +type MVar# s elt -- primitive + +newMVar# :: State# s -> (# State# s, MVar# s elt #) +takeMVar# :: MVar# s elt -> State# s -> (# State# s, elt #) +putMVar# :: MVar# s elt -> elt -> State# s -> State# s +</programlisting> + +<indexterm><primary><literal>MVar#</literal></primary></indexterm> +<indexterm><primary><literal>newMVar#</literal></primary></indexterm> +<indexterm><primary><literal>takeMVar#</literal></primary></indexterm> +<indexterm><primary><literal>putMVar#</literal></primary></indexterm> +</para> + +</sect2> + +<sect2 id="glasgow-prim-arrays"> +<title>Primitive arrays, mutable and otherwise +</title> + +<para> +<indexterm><primary>primitive arrays (Glasgow extension)</primary></indexterm> +<indexterm><primary>arrays, primitive (Glasgow extension)</primary></indexterm> +</para> + +<para> +GHC knows about quite a few flavours of Large Swathes of Bytes. +</para> + +<para> +First, GHC distinguishes between primitive arrays of (boxed) Haskell +objects (type <literal>Array# obj</literal>) and primitive arrays of bytes (type +<literal>ByteArray#</literal>).
+</para> + +<para> +Second, it distinguishes between… +<variablelist> + +<varlistentry> +<term>Immutable:</term> +<listitem> +<para> +Arrays that do not change (as with “standard” Haskell arrays); you +can only read from them. Obviously, they do not need the care and +attention of the state-transformer monad. +</para> +</listitem> +</varlistentry> +<varlistentry> +<term>Mutable:</term> +<listitem> +<para> +Arrays that may be changed or “mutated.” All the operations on them +live within the state-transformer monad and the updates happen +<emphasis>in-place</emphasis>. +</para> +</listitem> +</varlistentry> +<varlistentry> +<term>“Static” (in C land):</term> +<listitem> +<para> +A C routine may pass an <literal>Addr#</literal> pointer back into Haskell land. There +are then primitive operations with which you may merrily grab values +over in C land, by indexing off the “static” pointer. +</para> +</listitem> +</varlistentry> +<varlistentry> +<term>“Stable” pointers:</term> +<listitem> +<para> +If, for some reason, you wish to hand a Haskell pointer (i.e., +<emphasis>not</emphasis> an unboxed value) to a C routine, you first make the +pointer “stable,” so that the garbage collector won't forget that it +exists. That is, GHC provides a safe way to pass Haskell pointers to +C. +</para> + +<para> +Please see the module <literal>Foreign.StablePtr</literal> in the +library documentation for more details. +</para> +</listitem> +</varlistentry> +<varlistentry> +<term>“Foreign objects”:</term> +<listitem> +<para> +A “foreign object” is a safe way to pass an external object (a +C-allocated pointer, say) to Haskell and have Haskell do the Right +Thing when it no longer references the object. So, for example, C +could pass a large bitmap over to Haskell and say “please free this +memory when you're done with it.” +</para> + +<para> +Please see the module <literal>Foreign.ForeignPtr</literal> in the library +documentation for more details.
+</para> +</listitem> +</varlistentry> +</variablelist> +</para> + +<para> +The library documentation gives more details on all these +“primitive array” types and the operations on them. +</para> + +</sect2> + +</sect1> + +<!-- Emacs stuff: + ;;; Local Variables: *** + ;;; mode: xml *** + ;;; sgml-parent-document: ("users_guide.xml" "book" "chapter" "sect1") *** + ;;; End: *** + --> |