summaryrefslogtreecommitdiff
path: root/ml/article4.txt
blob: 71bb24ba6783ecff263d9ec8d324087c2af82382 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
Functional Programming For Dynamic Programmers - Part 4
=======================================================

:author: Michele Simionato
:date: November 2007

This is the fourth of a series of articles on functional programming in
statically typed languages. It is intended for dynamic programmers, i.e. 
for programmers with a background in dynamically typed languages, such as Perl, 
Python, Ruby, or languages in the Lisp family. The approch is eminently practical
and example-based; the main goal is to see if we can stole some good idea from
statically typed languages. In order to be concrete, I will consider languages
in the ML family, because they are pretty nice and much easier to understand
that Haskell.

The Standard ML type system
-------------------------------------------------------------------

SML has a sophisticated type which is, however, completely different from
the type system of dynamic languages. Here I am assuming that you, the
dynamic programmer, are familiar with the system of languages such as  
Python and Ruby, or  Common Lisp and Smalltalk. 
In such languages you here often the mantra *everything is an
object* meaning that everything is a first class value which can created and
inspected at runtime. Moreover, any object has a class, and classes themselves
are objects, i.e. they are instances of metaclasses. In SML things are completely
different. There a lots of things which are not first class values (structures,
exceptions, signatures, ...) and even types are such: in SML any value has a type, 
but types itself are not values, i.e. they are not values. 
That means that passing the name of a type to the prompt gives an error::

  - string;
  1.0-1.6: unknown value or constructor `string'

SML types are not classes in any OO sense of the term,
you cannot introspect them, there are no methods, no inheritance,  and
they live in a separated namespace.  
The first step in order to learn the SML type system is to forget
everything you know about OOP. Having said that, you can certainly
program in an OO style in ML,  but you don't use types for OOP, types are used
internally by the compiler, but from the user's point of view they are just
labels attached to values.

The type system of SML is extensible, and the user can define new types (i.e.
new labels) in terms of primitive types. The labels are just that; but together
with a new label, the definition of an user defined type includes the definition
of one or more functions, the *constructors* of the type, which are first class
values and extremely useful for pattern matching.
Let me give a concrete example: an
``int_or_string`` datatype with two constructors ``INT`` and ``STR``, defined
as follows::

 - datatype int_or_string = INT of int | STR of string;
 datatype int_or_string = INT of int | STR of string

This definitions tells the compiler that the label ``int_or_string`` has to
be used when showing the signature of objects of the newly defined types;
but the important things are the constructors, which are able to convert (or *cast*)
primitive values to values of the new type::

 - val v1 = INT 1;
 val v1 : int_or_string = INT 1
 -  val v2 = STR "1";
 val v2 : int_or_string = STR "1"

The constructors are essential since you can use them in *pattern matching*.
For instance, suppose you want to define an utility function casting integer to strings.
You can do so as follows::

 - fun valueToString (INT x) = Int.toString x
        | valueToString (STR x) = x;
 val valueToString : int_or_string -> string = _fn

Let me check that it works:

 - valueToString (INT 1);
 val it : string = "1"
 - valueToString (STR "1");
 val it : string = "1"

In essence, we have just implemented runtime type dispatching: ``valueToString``
is now emulating a C++ overloaded function (or a Lisp generic function) which 
accepts both integers and strings and behaves differently according to the type.
This in spirit, but technically ``valueToString`` is just an ordinary SML function
which takes in input the ``int_or_string`` type, so we are somewhat cheating,
but it works ;)
Consider for instance the issue of defining heterogeneous collections, i.e.
collections containing different types; in SML we cannot define a list

- [1, "two"];
  1.0-1.2: ill-typed constructor argument:
    int * string list
 does not match argument type
    int * int list
 because type
    string
 does not unify with
    int

but we can define

- [INT 1, STR "two"];
val it : int_or_string list = [INT 1, STR "two"]

As Harper puts it, *heterogenity is a special case of homogenity*.

It is also possible to define polymorphic types, where the constructor(s) can
accept any type as argument. For instance, we can define a polymorphic 
(parametric) box type::

 - datatype 'a box = Box of 'a;
 datatype 'a box = Box of 'a

 - Box
 val it : 'a -> 'a box = _fn 

``box`` is a parametric type with constructor ``Box`` and parameter 'a 
(to be read *alpha*), which corresponds to a generic type, so that you
can define 

  - Box(1);
     val it : int box = Box 1
  - Box("hello");
     val it : string box = Box "hello"
  - Box(fn x =>  2*x);
  val it : (int -> int) box = Box (_fn)

In particular we may define composite types, like the following::

 - datatype string_int = STRING_INT of string * int;
 datatype string_int = STRING_INT of string * int
 -

The uppercase identifier is called the constructors of the datatype, and can
be used to make concrete instances of the type::

 - STRING_INT("hello",  1);
 val it : string_int = STRING_INT ("hello", 1)
 -

where the constructor is a first class value, being simply a function::

 - STRING_INT;
 val it  : string * int -> string_int = _fn


a *named_function* type corresponding
to a pair *(function, name)*, where *function* can be of any functions, whereas 
*name* is a string. 
We can do it as follows::

 - datatype ('a, 'b) named_function = NAMED_FUNCTION of ('a->'b) * string;
 datatype ('a, 'b) named_function = NAMED_FUNCTION of ('a -> 'b) * string
 -

*named_function* is a parametric type with parameter 'a (to be read *alpha*),
which corresponds to a generic type, and NAMED_FUNCTION is its associated 
constructor::

 - NAMED_FUNCTION;
 val it : ('a -> 'b) * string -> ('a, 'b) named_function = _fn
 -

In other words, NAMED_FUNCTION is a function converting a pair (value, name),
where *value* can be of any type, into a *named_function* parametric type.
Here is an example::

 - NAMED_FUNCTION (fn x=>2*x, "double");
  (* giving a  name to the function x=>2*x *)
 val it = NAMED_FUNCTION (fn,"double") : (int,int) named_function
 -

Finally, let me notice that SML also allows to define enumeration
types, like the following one::

 - datatype color = RED|GREEN|BLUE;
 datatype color = BLUE | GREEN | RED
 -

but for
enumeration types the name is rather improper, since they are just values::

 - RED;
 val it : color = RED
 -

..

 - GREEN;
 val it : color = GREEN
 -
..

 - BLUE;
 val it : color = BLUE
 -

Polymorphism
-------------------

 -  functor Sequence(ListOrVector:) = funct
     end


Two examples of simple structures in the standard library as List and Vector;
they act as modules with a mostly compatible interface, providing functions
to manipulate lists and vectors respectively. Since SML is a functional language,
both lists and vectors are immutable. Here are a few examples of usage::

 - val lst = [0,1,2];
 val lst : int list = [0, 1, 2]
 - val vec = #[0,1,2];
 val vec : int vector = #[0, 1, 2]

 - List.length(lst);
 val it : int = 3
 - Vector.length(vec);
 val it : int = 3

 -List.sub(lst, 1) (*) take the second element of the list
 val it : int = 1
 -Vector.sub(vec, 1) (*) take the second element of the vector
 val it : int = 1

signature HAS_LENGTH = sig
  val length: 
functor(F:HAS_LENGTH):
  
polyLength(List)


Functors
------------------------------------------------------

As we saw in the previous article, structures are not first class objects and 
they cannot be passed to and returned from regular functions.
To circumvent this restriction, the authors of ML invented
the concept of *functor*, which is basically a *sui generis* function
taking structures in input and to returning structures in output. Functors
are not-first-class objects themselves and therefore they require a specific
declaration, such as the following::

- functor Wrap(SimpleIO:SIMPLE_IO) = struct
    val inputAll = SimpleIO.inputAll
    output = SimpleIO.output
    fun fileIn manage fname = let
        val file = openIn fname
    in
         manage file finally closeIn file
    end
    fun fileOut manage fname = let
        val file = openOut fname
    in
         manage file finally closeOut file
    end
 end

The functor here is important, since it shows how it is possible to write
generic code in SML. In this case, I have just written a library which is
able to wrap *any* structure matching the ``SimpleIO`` interface, and I have
avoided a potentially infinite code duplication.
The specification of which specific structure to use will be the job of the client
code, not of the library author; in particular a particular user of the library
may be interested in specializing it both for ``TextIO`` and ``BinIO``
by writing::

 - structure T = Wrap(TextIO)
 - structure B = Wrap(BinIO)

The operation of specializing a functor is also called *functor instantiation*;
since it happens in a structure declaration it is performed by the compiler
*at compile time*. The advantage is that the compiler can generate different optimized
code for  the structures``T`` and ``B`` in the *client* program.

``do print (IO.fileIn IO.inputAll "three-lines.txt")``

Lists
---------------------------------------------------

Lists are the most common data structures in functional languages (they formed
the core of Lisp at its beginning) and there are many facilities to manage them and
to iterate over them. For instance, in the first article of this series, I showed the
``app`` builtin, to apply a function over the elements of a list; there is also
a ``map`` builtin to build a new list from an old one::

 - val doubles = map (fn x => 2*x) [1,2,3];
 val doubles : int list = [2, 4, 6]

and a ``filter`` function to extract the sublist of elements satisfying a given predicate::

 - fun isEven n = n mod 2 = 0;
 - val even = List.filter isEven [1,2,3];
 val even : int list = [2]

Moreover, you can append lists with the ``@`` operator::

 - [1] @ [2] @ [3,4];
 val it : int list = [1, 2, 3, 4]

There are other standard facilities and you can look at the documentation
http://www.standardml.org/Basis/list.html to find them all.

ML lists are linked lists in the same sense of Lisp or Scheme [#]_, however they 
are immutable. Just as in Scheme [#]_, where

  ``(list 1 2)``

is a shortcut for

 ``(cons 1 (cons 2 '()))``

in ML

  ``[1, 2]``

is a shortcut for

 ``1::2::[]``

and ``::`` is the *cons* operator (short for constructor).


.. [#] Python lists are actually arrays, not lists

.. [#] If you don't know Scheme, 

   [1, 2] (ML) => [1,[2,[]]] (Python)



It is also possible to apply a binary operator to a list, via the ``foldl`` and ``foldr``
functions::

 - val sum = foldl op+ 0 [1,2,3];
 val sum : int = 6


fun enum n = lazy n :: enum (n+1)

(for instance, a log file); how can we process it? The simplest possibility is
to convert it into a lazy list of lines, with the following code::

 fun textFileToList filename = let
     val file = TextIO.openIn filename
     fun lazy readLines () =
         case TextIO.inputLine file 
               handle ex => (TextIO.closeIn file; raise ex)
          of NONE => (TextIO.closeIn file; [])
           | SOME line => line :: readLines ()
 in
     readLines ()
 end


A smarter line counter
-----------------------------------------------------------

 - val countLines = length o textFileToList 




Python extended generators in Alice
---------------------------------------------------------

def consumer():
  result = []
   while True:
     inp = yield
     if inp = END:
       return result 

functor consumer(val ch: channel):
   while true do let
      val inp = Channel.gexst ch
      if inp = END:


In particular, what's the equivalent of the simple
Python one-liner ``for item in container: print item``?  The answer is
that there is no equivalent. The Python one-liner is completely
polymorphic, since it works for any kind of container and it is able
to print any kind of object. However, SML is a statically typed
language, and you must give to the compiler some information about the
types: it cannot predict the future, nor the type of the container and
the types of the items.  Here I will show how you can loop on
homogenous containers, i.e . containers where the items are all
instances of the same type;


 fun writeln (tostring, value)  =
    print(format (tostring o text "\n") value);


-  app (fn item => writeln(int, item)) [1,2,3];
1
2
3

fun sum lst = foldl op+ 0 lst

The thing to remember is that functions associate to the **right**, so

  ``a -> b -> c``

means

  ``a -> (b -> c)``

whereas type constructor associates to the **left**, so 

  ``int ref list``

means

   ``(int ref) list``


Timers
--------------------

fun timedFn f a = let
  val ct = Timer.startRealTimer ()
  val result = f a
in
  (result, Timer.checkRealTimer (ct))
end


 - structure A = struct
     type 'a aggr 
     fun length(a: 'a aggr) = 1
     fun sub(a: 'a aggr, i :int) = 1
   end;
 signature AGGREGATE_TYPE =
   sig
      type 'a aggr 
      val length : 'a aggr -> int
      val sub : 'a aggr * int -> int
   end
 -

ok

"Hello World", part II
-------------------------------------------------

On the other hand, print2 cannot accept anything different than strings::

 - print2("the answer is ", 42);
 1.0-1.28: mismatch on application: expression type
    string * int
 does not match function's argument type
    string * string
 because type
    int
 does not unify with
    string

To print a string and a number, we should define another function::

 - fun print_string_int(s, i)=(print s; print (Int.toString(i)));
 val print_string_int : string * int -> unit = _fn
 - print_string_int("The answer is ", 42);
 The answer is 42val it : unit = ()

This is clearly unpractical, since we can't define an infinite number
of ``print`` functions to be able to print all the potentially
infinite types we can have in our programs.  Fortunately there are
utility library for converting objects in strings (even if not
standard :-(). SML/NJ has the library FormatComb, to be used as
follows (from now on, for concision sake, I will omit to write down
the full output of the REPL)::

 - open FormatComb;
 - print (format (text "The value is " o int o text ".") 42);
 The value is 42

The syntax looks strange (especially the "o" operator) and if you
forget the parenthesis you will get funny error messages. To
understand what is going on, you will need to master SML to a much
more advanced level than this first paper is meant to, so you will
have to wait for "Hello World, Part II" ;) Here I will content myself
to notice that the format utility requires you to specify the types of
the arguments. SML is a *typeful* language: the compiler must know the
types of all your variables *before running the program*.  In a
dynamic language, instead, the types can be introspected at runtime
and there is no need to specify them. This is most of the times an
advantage, however, it also leads to type errors that cannot occur in
a statically typed language such as SML (also called *type safe*
languages). Moreover, SML compilers may perform optimization which are
impossible in a dynamic language (but this is of lesser relevance,
since any user of dynamic languages will use C libraries if speed is
an issue).


Higher order functions can also be used to implement poor man's object
systems; consider for instance this example::

 class Counter(object):
    def __init__(self, initial_value):
         self.initial_value = self.value = 0
    def incr(self):
         self.value += 1
    def show(self):
         print "The value is %d" % self.value 
    def reset(self):
         self.value = self.initial_value

We can get the same effect via a stateful higher order function (a.k.a. a closure)::

 exception MissingMethod

 fun makeCounter initialValue = let
    val value = ref initialValue
 in 
     fn "reset" => value := initialValue
      | "show" => (print The value is "; print (Int.toString (!value)); print "\n")
      | "incr" => value := !value + 1
      | _ => raise MissingMethod
 end

 val counter = makeCounter 0;

 do counter "incr";
 do counter "show"; (* The value is 1 *)
 do counter "reset";
 do counter "show"; (* The value is 0 *)

This example also shows the usage of pattern matching and of exceptions.

You can rewrite it in a functional way as follows:

----

*The venerable master Qc Na was walking with his student, Anton.  Hoping to
prompt the master into a discussion, Anton said "Master, I have heard that
objects are a very good thing - is this true?"  Qc Na looked pityingly at
his student and replied, "Foolish pupil - objects are merely a poor man's
closures."*

*Chastised, Anton took his leave from his master and returned to his cell,
intent on studying closures.  He carefully read the entire "Lambda: The
Ultimate..." series of papers and its cousins, and implemented a small
Scheme interpreter with a closure-based object system.  He learned much, and
looked forward to informing his master of his progress.*

*On his next walk with Qc Na, Anton attempted to impress his master by
saying "Master, I have diligently studied the matter, and now understand
that objects are truly a poor man's closures."  Qc Na responded by hitting
Anton with his stick, saying "When will you learn? Closures are a poor man's
object."  At that moment, Anton became enlightened.*

      -- Anton van Straaten