summaryrefslogtreecommitdiff
path: root/ml/modules.txt
diff options
context:
space:
mode:
authormichele.simionato <devnull@localhost>2007-12-08 09:57:41 +0000
committermichele.simionato <devnull@localhost>2007-12-08 09:57:41 +0000
commitc16096239a55e74292a32fe16c834797f6854edd (patch)
tree9f06d660d4a144a3812e44eb4486c85d7f1ef084 /ml/modules.txt
parent3f58be1222e48ba71c030e3cd8230f590d4a01f3 (diff)
downloadmicheles-c16096239a55e74292a32fe16c834797f6854edd.tar.gz
Changed from numbered articles to named articles
Diffstat (limited to 'ml/modules.txt')
-rw-r--r--ml/modules.txt383
1 files changed, 383 insertions, 0 deletions
diff --git a/ml/modules.txt b/ml/modules.txt
new file mode 100644
index 0000000..43a5664
--- /dev/null
+++ b/ml/modules.txt
@@ -0,0 +1,383 @@
+Functional Programming For Dynamic Programmers - Part 3
+=======================================================
+
+:author: Michele Simionato
+:date: December 2007
+
+This is the third of a series of articles on functional programming in
+statically typed languages. It is intended for dynamic programmers, i.e.
+for programmers with a background in dynamically typed languages, such as Perl,
+Python, Ruby, or languages in the Lisp family. The approch is eminently practical
+and example-based; the main goal is to see if we can stole some good idea from
+statically typed languages. In order to be concrete, I will consider languages
+in the ML family, because they are pretty nice and much easier to understand
+that Haskell.
+
+Writing your first module
+------------------------------------------------------
+
+Usually programs start small, but they have an unfortunate tendency to
+grow, getting larger and larger and eventually to explode. To control
+this tendency, programmers have invented various techniques, which are
+essentially all variations of a same idea: large programs should be
+decomposed in smaller, more manageable, conceptually connected units
+of code. These units of code go under various names, depending on the
+language you are using; common terms are structures, classes,
+namespaces, modules, packages, libraries, components etc. In this
+article I will focus on the ML terminology and tecniques.
+
+In SML, the basic mechanism of code control is the structure, which
+takes the place of what is called a module in other languages. We
+already encountered structures before, such as the ``TextIO``
+structure, containing functions such as ``TextIO.openIn``,
+``TextIO.closeIn`` etc.
+
+It is also possible to define custom structures. For instance, suppose
+we want to implement Python-like file-iteration in SML in order to be
+able to write things like ``Iter.app print (Iter.file fname)`` to
+print all the lines in a text file. We can do so with the following
+structure::
+
+ - structure Iter = struct
+
+ exception StopIteration
+
+ fun app func iter =
+ (while true do func (iter ())) handle StopIteration => ()
+
+ fun map func iter =
+ fn () => func (iter ())
+
+ fun file fname = let
+ val inp = TextIO.openIn fname
+ in fn () =>
+ (case TextIO.inputLine inp
+ of NONE => raise StopIteration
+ | SOME line => line)
+ handle err => (TextIO.closeIn inp; raise err)
+ end
+ end;
+ structure Iter :
+ sig
+ exception StopIteration
+ val app : ('a -> 'b) -> (unit -> 'a) -> unit
+ val map : ('a -> 'b) -> (unit -> 'a) -> unit -> 'b
+ val file : string -> unit -> string
+ end
+
+We see many interesting things here. First of all, the REPL returns us
+a string representation of the so-called *signature* of the structure,
+i.e. a description of the types of the objects encapsulated by the
+structure. Iterators have been implemented as thunks, i.e. functions
+of type ``unit -> 'a``; in particular, ``file`` is a higher order
+function taking a filename and returning a string-value thunk: at each
+call of the closure``Iter.file fname`` we get a different line of the
+file. All types have been inferred correctly: ``Iter.app`` is an
+higher order function taking a generic function ``'a->'b`` (which
+means a function taking an unspecified type ``'a`` and returning an
+unspecified type ``'b``, possibly different from ``'a``) and
+converting it into a function ``iterator -> unit``, whereas
+``Iter.map`` applies a generic function to an iterator and returns an
+iterator. For instance, suppose we want to define an utility to
+convert to upper case a text file, by applying the function [#]_
+
+ ``- fun upper str = String.implode (map Char.toUpper (String.explode str));``
+
+to each line; we can test it by writing a file like the following::
+
+ $ cat three-lines.txt
+ line 1
+ line 2
+ line 3
+
+and by defining
+
+::
+
+ - val next = Iter.map upper (Iter.file "three-lines.txt");
+ val next : unit -> string = _fn
+
+With this definitions, we get the same behavior as in Python [#]_ ::
+
+ - next ();
+ val it : string = " LINE 1\n"
+ - next ();
+ val it : string = " LINE 2\n"
+ - next ();
+ val it : string = " LINE 3\n"
+ - next ();
+ Uncaught exception
+ StopIteration
+
+.. [#] In this example I am using the ``String`` and ``Char`` structures of the
+ SML standard library, documented here
+ http://www.standardml.org/Basis/string.html and here
+ http://www.standardml.org/Basis/char.html
+
+.. [#] In Python we would have
+
+::
+
+ >>> it = itertools.imap(str.upper, file("three-lines.txt"))
+ >>> it.next()
+ ' LINE 1\n'
+ >>> it.next()
+ ' LINE 2\n'
+ >>> it.next()
+ ' LINE 3\n'
+ >>> it.next()
+ Traceback (most recent call last):
+ File "<stdin>", line 1, in <module>
+ StopIteration
+
+Importing modules
+----------------------------------------------------
+
+Structures in the standard library are automatically available; on the other
+hand, if you write your own structure and save it into a file, you need to
+import the file before you can use it. The simplest way to import a file is
+through the ``use`` expression::
+
+ - use "<filename.aml>";
+
+This simply includes the imported file in the current program, and everything
+is compiled together. In most SML implementation it is also possible to compile
+a file as a standalone component; the details vary with the implementation.
+In Alice you can just save the above structure into a file called ``iter.aml``
+and compile it as
+
+ ``alicec iter.aml``
+
+This creates a bytecode compiled file called ``iter.alc``.
+The compiled structure can be imported in your programs with the line
+
+ ``import structure Iter from "iter"``
+
+assuming ``iter.alc`` is in the same directory as your main program.
+If you want to import in the program namespace all the objects defined
+inside the structure, you can open it:
+
+ ``- open Iter;``
+
+this is however actively discouraged, to prevent namespace pollution,
+i.e. name conflicts, since an exported name can shadow a pre-existent
+name. There is actually a good use case for opening a structure,
+i.e. inside another structure. In particular, it is possible to
+redefine structures, augmenting them with additional objects.
+For instance, we could supplement the ``Iter`` structure with
+a routine to read binary files in chunks::
+
+ - structure Iter = struct
+ open Iter
+ fun binfile (fname, chunksize) = let
+ val inp = BinIO.openIn fname
+ in
+ fn () => let
+ val vec = BinIO.inputN (inp, chunksize) (* read N bytes *)
+ in
+ if Word8Vector.length(vec) = 0 then raise StopIteration else vec
+ handle err => (BinIO.closeIn inp; raise err)
+ end
+ end
+ end;
+
+Notice that ``BinIO.inputN`` returns a vector
+of bytes, not a string; however, you can get a *bona fide* string by
+applying the standard library function Byte.bytesToString_ to
+the returned chunks. Here is an example::
+
+ - Byte.bytesToString (Iter.binfile("three-lines.txt", 40) ());
+ val it : string = " line 1\n line 2\n line 3\n"
+
+An importanting to notice is that *you can redefine even standard library
+structures*, augmenting them with new features, or replacing objects
+with others, even *with a different signature*. This is similar to what
+happens in Ruby, where you can add methods even to builtin classes,
+and should be done with care.
+
+.. _Byte.bytesToString: http://www.standardml.org/Basis/byte.html
+
+Structures are not first class values
+-----------------------------------------------------------
+
+The problem with structures is that they are not first class values, i.e. they cannot
+be passed to functions and they cannot be inspected. This is the reason why giving at
+the prompt the name of a structure does not work::
+
+ - TextIO;
+ 1.0-1.6: unknown value or constructor `TextIO'
+
+``TextIO`` is not recognized as the name of a know value. Structures live
+in a completely separate namespace [#]_, so that you can associate any value
+to the name ``TextIO`` and still you can access the contents of the structure
+without any problem::
+
+ - val TextIO = "astring";
+ 1.4-1.10: warning: value identifier `TextIO' violates standard naming conventions,
+ which suggest value names of the form foo, foo', fooBar
+ val TextIO : string = "astring"
+
+ - TextIO.print;
+ val it : string -> unit = _lazy
+
+Notice the warning about violating standard naming conventions, since
+a regular value such as a string should be denoted with a
+non-capitalized identitifier, whereas capitalized identitifiers should
+be reserved to structures.
+
+Structures can be given name and aliases via the ``structure`` declaration;
+for instance, if you want to give a shorter alias to ``TextIO`` you can define
+
+ ``- structure T = TextIO;``
+
+Structures can be arbitrarily nested, i.e they can contain
+substructures at any level of nesting. For instance, if you want to
+extract the substructure ``StreamIO`` from ``TextIO`` you can define
+
+ ``- structure S = TextIO.StreamIO;``
+
+The lack of first class structures is motivated by reasons of
+(premature) optimization. Suppose structure where first class values:
+then you could write things like
+
+ ``val S = if someRuntimeCondition() then struct ... end else struct ... end``
+
+and that would make life pretty hard for the compiler: basically, it would be
+impossible to compile the structure once and for all, and you would be forced
+to recompile it at runtime. In these days of just-in-time
+compilation this is less of an issue than in the past, and in particular
+Alice ML allows you to do that, by wrapping structures in packages.
+But in order to discuss that, we must first discuss signatures, which is
+the SML name for interfaces.
+
+.. [#] Readers familiar with Common Lisp will be familiar with the concept
+ of having separate namespaces for different kind of objects.
+
+Abstract signatures and abstract types
+---------------------------------------
+
+Strictly speaking, it is not necessary to define signatures for your
+own structures, since even if you do not specify a signature, the
+compiler is able to figure out the complete signature, thanks to type
+inference. The problem is exactly that: the compiler extracts the
+*complete* signature, a.k.a. the *principal* signature of you library,
+which is too much. Typically, you don't want to expose all the objects
+in your structure, for safety reason (your structure may contain
+private functions which should not be exported to client code), for
+sake of simplicity (you may want to implement the `facade pattern`_)
+or for code reuse. Using computer science buzzwords, we may say that
+`information hiding`_ is implemented in SML by defining *abstract
+signatures*, as opposed to the *concrete* (or principal) signatures we
+saw until now.
+
+To give a concrete example, let me consider the ``List`` and
+``Vector`` structures in the standard library: the ``List`` structure
+defines the type ``list``, whereas the ``Vector`` structure defines
+the type ``vector``; both types have a common alias ``t``.
+``list`` and ``vector`` are container types, in the sense that a
+list/vector may contain objects of any type, as long as all the elements
+are homegenous, so you may have a ``string list``, an ``int list``,
+a ``string list list``, etc [#]_::
+
+ - val lst = [0,1,2];
+ val lst : int list = [0, 1, 2]
+ - val vec = #[0,1,2];
+ val vec : int vector = #[0, 1, 2]
+ - val lst = [["hello", "world"]];
+ val lst : string list list = [["hello", "world"]]
+
+Both ``List`` and ``Vector`` provide a function ``length`` returning the
+number of elements of the list/vector, and a function ``sub`` return
+the i-th element of the list/vector::
+
+ - List.length(lst);
+ val it : int = 3
+ - Vector.length(vec);
+ val it : int = 3
+
+ -List.sub(lst, 1) (*) take the second element of the list
+ val it : int = 1
+ -Vector.sub(vec, 1) (*) take the second element of the vector
+ val it : int = 1
+
+We may abstract this common behavior by defining an abstract signature::
+
+ - signature SIMPLE_SEQUENCE = sig
+ type 'a t
+ val length: 'a t -> int
+ val sub: 'a t * int -> 'a
+ end
+ sig
+ type 'a t
+ val length : 'a t -> int
+ val sub : 'a t * int -> 'a
+ end
+
+Here the type t is *abstract*: it becomes concrete only if you *ascribe*
+the signature to a concrete structure::
+
+ - structure L=List:SIMPLE_SEQUENCE;
+ structure L :
+ sig
+ type t = list
+ val length : 'a List.t -> int
+ val sub : 'a List.t * int -> 'a
+ end = List
+
+ - structure V=Vector:SIMPLE_SEQUENCE;
+ structure V :
+ sig
+ type t = vector
+ val length : 'a Vector.t -> int
+ val sub : 'a Vector.t * int -> 'a
+ end = Vector
+
+i.e. the abstract type ``t`` is replaced by the concrete type ``list`` when the
+signature is ascribed to the ``List`` structure and by the concrete type ``vector``
+when the signature is ascribed to the ``Vector`` structure.
+Moreover, having ascribed the ``SIMPLE_SEQUENCE`` signature to the
+structures ``L`` and ``V``, we have automatically hidden all the additional
+functionality of the original structures, so that for instance ``L.map``
+and ``V.map`` are not accessible, even if ``List.map`` and ``Vector.map``
+do exists::
+
+ - L.map
+ 1.2-1.5: unknown value or constructor `map'
+ - V.map;
+ 1.2-1.5: unknown value or constructor `map'
+
+In a sense, you may see ascribing the signature as a way of importing a selected
+subset of the functionality of the original structure; you could even import a subset
+of the names defined in the original structures in the current namespace
+by opening the ascribed structure::
+
+ - open Vector:SIMPLE_SEQUENCE;
+ structure _id20 :
+ sig
+ type t = vector
+ val length : 'a Vector.t -> int
+ val sub : 'a Vector.t * int -> 'a
+ end = Vector
+ val sub : 'a Vector.t * int -> 'a = _fn
+ val length : 'a Vector.t -> int = _fn
+ type t = Vector.t
+
+Signatures allows much more than that, and the practical usage of abstract
+signatures in SML will become clear once we will introduce the concept of
+functors and packages. Nevertheless, I am sure that the
+expert object oriented programmer has already understood the point
+behind abstract signatures: they are nothing else than
+interfaces. Code written to work with a ``SIMPLE_SEQUENCE`` interface
+will automatically work with lists, vectors, and any data structure
+(builtin *and* custom) satisfying the interface.
+
+.. [#] Notice that since SML is a functional language, both lists and vectors are immutable.
+
+.. _information hiding: http://en.wikipedia.org/wiki/Information_hiding
+.. _facade pattern: http://en.wikipedia.org/wiki/Facade_pattern
+
+----
+
+ *divide et impera*
+
+ -- old Roman saying