summaryrefslogtreecommitdiff
path: root/ml/loops.txt
blob: 5dc22c0c452b67b9b982ee28b1bd2c9a86a6b866 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
Functional Programming For Dynamic Programmers - Part 2
=======================================================

:author: Michele Simionato
:date: December 2007

This is the second of a series of articles on functional programming in
statically typed languages. It is intended for dynamic programmers, i.e. 
for programmers with a background in dynamically typed languages, such as Perl, 
Python, Ruby, or languages in the Lisp family. The approch is eminently practical
and example-based; the main goal is to see if we can stole some good idea from
statically typed languages. In order to be concrete, I will consider languages
in the ML family, because they are pretty nice and much easier to understand
that Haskell.

Functional programming and Input/Output
-----------------------------------------------------

Input/output is the black ship of functional programming and it is
often relegated at the end of tutorials and sometimes even omitted, as
if shameful. Sometime, it is not even standardized and
implementation-dependent. On the other hand, *all* programs must
perform some kind of input/output, or at least of output, otherwise we
would have no way to see what a program did. Therefore input/output
*must* be explained at the beginning of any tutorial, not at the
end. I always hated books not telling me how to read a file until the
appendix!  In this section I will explain how you can read a file in
SML. Reading a file is for some reason seen as "inpure" since it is
not functional.  Let me explain that a bit more: in mathematics, a
function takes an argument in input, and returns an output; if I call
twice the function with the same argument, I get twice the *same*
output. In programming, a procedure reading a line from a file returns
a *different* line at each invocation, even if called with the same
arguments (the arguments maybe simply the empty tuple), so it is not a
function in the mathematical sense. On the other hand, a procedure
writing a line into a file, is also impure, since it is modifying an
object, the file itself. 
In strict functional programming there is no
mutation, objects are declared once and for all times they do not
change. Functional programming in its purest form is therefore useless:
notice that even the paradigmatic "Hello world" program is not a functional 
program, since the "Hello world" string is printed via an imperative side 
effect. In other words, every functional programming language must find a 
non-functional way to manage input-output. Haskell tries very hard to hide
non-functional constructs into monads; SML just uses traditional
imperative constructs to perform input/output.  We already saw the
``print`` function to write on stdout; to read from stdin, you can
define the following utility::

 - fun input prompt = (print prompt; getOpt(TextIO.inputLine TextIO.stdIn, ""))
 val input : string -> string = _fn

The parenthesis here are needed since the right hand side of a function definition
must be an expression, and the parenthesis (plus the expression separator ";")
allows you to compose expressions in a single expression returning as value
the value of the last expression. ``TextIO`` is a module in the standard library
managing textual I/O and ``TextIO.inputLine inputFile`` is a function
returning a line wrapped into an option object or NONE if we arrived at the
end of the input file. In particular, in the case of standard input,
we arrive at the end when we give CTRL-D (in a Unix-like system); in
this case ``input`` will return the empty string, otherwise it returns the
input line, comprehensive of the newline character.

Here is an example of usage::

 - input "Enter a string: ";
 Enter a string: hello
 val it : string = "hello\n"
 - input "Enter CTRL-D ";
 - Enter CTRL-D 
 val it : string = ""

I we want to read a text file (assuming it fits in memory)
the simplest way is to use ``TextIO.inputAll``::

 - fun fileToString filename = let
     val file = TextIO.openIn filename
   in
     TextIO.inputAll file finally TextIO.closeIn file
   end

Conversely, if we want to write a text on a file we can define::

 - fun saveFile (fname, content) = let
      val out = TextIO.openOut fname
   in
      TextIO.output(out, content) finally TextIO.closeOut out
   end;

The examples here use the standard *let* expression, which has the form::

  let
     <bindings>
  in 
     <expression; .. ; expression>
  end

and returns as value the value of the last expression. Moreover, the examples
here use the *finally* construct, which is an Alice extension to SML, and
works as you would expect:: 

   <expression possibly raising exceptions> finally <cleanup expression>

the finally clause if executed always, even
if there is an exception in the first expression.

The examples here are very simple and do not address the issue or reading
or writing large files which do not fit in memory. But don't worry,
I will address those situations in the next issues. In order to do so,
however, I need to perform a digression on two of the most common
programming techniques in functional languages, recursion and higher order 
functions. You, as a dynamic programmer, will probably already know
both concepts, but in languages such are Python, Perl, or even Common Lisp
they are not as pervasive as in truly functional languages.

Loops and recursion
----------------------------------------------------

Perhaps the most common construct in imperative programming is the *for* loop;
in spite of that, *for* loops are usually missing in functional languages. In this
section I will explain why and the way to work around this omission.
To begin with a practical example, suppose we want to define a function
to count the number of lines in a text file, something akin to the following
Python code [#]_::

 def countLines(fname):
     counter = 0
     with file(fname) as f:
          for line in f:
              counter += 1
     return counter

.. [#] We are using here the ``with`` statement, available in Python 2.5 by
        importing it from the future: ``from __future__ import with_statement``.

How can we implement it without a *for* loop? 
One solution is to cheat and to use a *while* loop, which exists in SML::

 fun countLines fname = let
      val counter = ref 0
      val inp = TextIO.openIn fname
 in
      while not (TextIO.endOfStream inp) do  
          (TextIO.inputLine inp; counter := !counter + 1) 
      finally TextIO.closeIn inp;  
      !counter
 end

Some explanation about ML references is in order here.
ML does not allow to mutate a variable directly (as a matter of fact
most types in ML are immutable); if you want to increment an integer,
you must wrap it into a reference object, and you must mutate the
reference. Some simple experiment at the prompt should give you
the gist of what references are.

Put the integer 0 into a memory location and returns a reference to it::

 - val counter = ref 0; 
 val counter : int ref = ref 0

Return the value in the location pointed by the reference::

 - !counter; 
 val it : int = 0

Put in that location the value 0+1; return the empty tuple::

 - counter := !counter + 1;
 val it : unit = ()

Return the new value of the counter::

 - !counter
 val it : int = 1

From this explanation, it should be obvious what ``countLines`` does;
the implementation works, but it is very ugly and strongly discouraged.  I
show it here, to prove that SML can be used in an imperative way, not
to suggest you to code in this way.   However, it may be 
not obvious to re-implement ``countLines`` without mutation, unless
you have already coded in functional languages.

There is a standard tecnique to solve this kind of problem, i.e. the
conversion of an imperative loop into a functional recursion: *the
accumulator tecnique*.

The accumulator is the mutating parameter in the imperative loop, in
this case, the counter. The accumulator tecnique consists in rewriting
the loop as a recursive function, promoting the accumulator to the
rank of additional parameter in the function signature::

 - fun countLines' (file, counter) = 
       case TextIO.inputLine file
         of NONE => counter
          | SOME line => countLines'(file, counter + 1);
 val countLines' : TextIO.instream * int -> int = _fn

As you see, ``countLines'`` is a recursive function calling itself
with an incremented counter, until the file is read completely, then
it returns the final value of the accumulator.  You can then rewrite the
original function as

::

 - fun countLines (fname) = let
        val file = TextIO.openIn fname
    in
         countLines' (file, 0) 
         finally TextIO.closeIn file
    end;
 val countLines : string -> int = _fn

Recursion and accumulators are ubiquitous in functional programming, so the
time spent understanding this example is worth it. I will close this paragraph
by noting that the accumulator tecnique is also used to convert non-tail-call
recursive functions like the good old factorial::

 - fun fact 0 = 1
     | fact n = n*fact (n-1);

into a tail-call recursive function, i.e. a function returning either
a value without making a recursive call, or directly the result of a
recursive call::

 - fun fact n = fact' (n, 1) 
   and fact' (0, acc) = acc
     | fact' (n, acc) = fact' (n-1,  acc*n);

Here I have use the ``and`` syntax to define two or more conceptually connected
functions in a single step.

Tail-call recursive functions are equivalent to imperative loops, in this case to
the Python code::

  def fact(n):
         acc = 1
         while n > 0:
              acc = acc*n
              n = n-1
         return acc

Internally, the compiler of functional languages are able to translate tail-call
recursive functions back to imperative loops, so that the implementation is
very efficient. In a language like Python instead, which is not a truly functional
language, it is possible to write::

 def fact(n, acc):
       if n == 0: 
             return acc
       else: 
             return fact(n-1, acc*n)
     
but this function is not converted into a loop, you will have a loss
of performance and you will incur in the recursion limit if try to
compute the factorial of a large enough integer (say 1000) [#]_ .

.. [#] Notice that Python does not make tail call optimization *on
        purpose*, essentially for two reasons: 1) the language has a
        philosophy of preferring iteration over recursion; 2) keeping the
        recursive calls in the stack gives more informative tracebacks and
        helps debugging. Since I am at that, let me notice that Python error
        reporting and debugging features are infinitely superior to the ones
        of any SML or Scheme implementation I have seen. This is not because
        of lack of tail call optimization, it is because as a general
        philosophy Python does not care about speed but cares about
        programmer; moreover, despite the fact that SML
        and Scheme are twice as old as Python, a lot more of work went into
        Python than in SML/Scheme for what concerns practical issues.

Higher order functions
---------------------------------------------------------

Just as recursion is pervasive in functional programming, so are
higher order functions. You may judge how much functional is a
language by measuring how good is the support for recursion and for
higher order functions. In this respect, ML shines since it has a
particularly elegant syntax to define higher order functions
i.e. functions returning functions.

.. higher order functions: http://en.wikipedia.org/wiki/Higher_order_functions

The canonical example of higher order function is the adder::

 -  fun adder n = fn x => x+n;
 val adder : int -> int -> int = _fn

The notation ``fn x => x+n`` denotes what is usually called a lambda
function, according to the Lisp tradition; in this example it means
that the ``adder`` returns a function which adds *n* to any integer
number; for instance::

 - val add1 = adder 1; (* adds 1 to any number *)
 val add1 : int -> int = _fn
 -  add1 1;
 val it : int = 2

Notice that ML use the notation  ``int -> int -> int`` to denote the type of
the adder, which should be read as

    ``int -> (int -> int)`` 

i.e. *the arrow associates on the right* and should be interpreted as
"the adder is a function taking an *int* and returning a function *int->int*".
On the other hand, *function application associates on the left* and
`` adder 1 2`` should be read as

   ``(adder 1) 2``

An equivalent, but cleaner notation for the adder is

::

 -  fun adder n x = x+n;
 val adder : int -> int -> int = _fn

Notice the difference betwen ``adder(n, x)``, denoting a function of
two arguments, and ``adder n x`` denoting an higher order function,
so that ``adder n`` is a function itself. 

The ``adder`` is a simple example,
but it does not make justice to the usefulness of higher order functions.
To give a practical example, we may consider the problem of avoiding
the boilerplate when opening/closing a file. Lisp use for that a few
macros (``with-input-from-file``, ``with-output-to-file``) whereas
Python use a special syntax (the ``with`` statement); in ML it is quite
natural to define a higher order function such as the following::

 - fun ` manage fname = let
        val file = TextIO.openIn fname
    in
         manage file
         finally TextIO.closeIn file
    end;
  val ` : (TextIO.instream -> 'a) -> string -> 'a = _fn

Here ````` is just an identifier for the higher order function; I
could have used for it a longer name, such as
``higherOrderFunctionAddingAutomaticOpenCloseFunctionality``, but I am
lazy, and also I wanted to show that in ML one has a considerable
freedom in the choice of valid identifiers. The action of ````` is
clear: it takes a function which operates on TextIO.stream objects
(i.e. text files) and converts it into a function that operates on
strings (i.e. file names) and has opening/closing functionality
embedded. The type of the return value is not specified at this stage,
and this is indicated by the notation *'a* (to be read alpha), which
denotes a generic type.  Using ````` the ``fileToString`` function
defined in the first article of this series could be written as simply
as::

 - val fileToString = `TextIO.inputAll;
 val fileToString : string -> string = _fn

whereas ``countLines`` could be written as::

 - val countLines = `(fn file => countLines'(file, 0))
 val countLines : string -> int = _fn

You should begin to appreciate the power of higher order functions,
now ;) An evident weakness of the approach used here, is that it works
only for text files (actually only for files opened for input; we
would need to define a different higher order function for files
opened for output); if we wanted to wrap binary files, we would need
to define equivalent higher order functions using the ``BinIO``
library instead of ``TextIO``; then, if we wanted to use it for
sockets, we would need to define yet another higher order function; in
general there are infinite resources which can be opened and closed,
and we could define an infinite number of higher order functions doing
all the same thing.  This is bad; fortunately this potentially
infinite code duplication can be solved using functors, but you will
have to wait for the next articles to see how to do it ;)

----

*Debugging is twice as hard as writing the code in the first
place. Therefore, if you write the code as cleverly as possible, you
are, by definition, not smart enough to debug it.* -- Brian W. Kernighan