diff options
author | michele.simionato <devnull@localhost> | 2007-12-09 15:58:39 +0000 |
---|---|---|
committer | michele.simionato <devnull@localhost> | 2007-12-09 15:58:39 +0000 |
commit | 9b1b7aa5a6b762b1110391c79241a1b8dccbfa83 (patch) | |
tree | 42dd0820d6699d9ffb47519269c80f806c563633 /ml/text.txt | |
parent | c16096239a55e74292a32fe16c834797f6854edd (diff) | |
download | micheles-9b1b7aa5a6b762b1110391c79241a1b8dccbfa83.tar.gz |
Various enhancements
Diffstat (limited to 'ml/text.txt')
-rw-r--r-- | ml/text.txt | 319 |
1 files changed, 318 insertions, 1 deletions
diff --git a/ml/text.txt b/ml/text.txt index c53cc55..63a0874 100644 --- a/ml/text.txt +++ b/ml/text.txt @@ -1,5 +1,322 @@ -Text processing in SML +Functional Programming For Dynamic Programmers - Part 4 +======================================================= + +:author: Michele Simionato +:date: December 2007 + +This is the fourth of a series of articles on functional programming in +statically typed languages. It is intended for dynamic programmers, i.e. +for programmers with a background in dynamically typed languages, such as Perl, +Python, Ruby, or languages in the Lisp family. The approch is eminently practical +and example-based; the main goal is to see if we can stole some good idea from +statically typed languages. In order to be concrete, I will consider languages +in the ML family, because they are pretty nice and much easier to understand +that Haskell. + +Academic languages vs enterprise languages +----------------------------------------------------------- + +There is a widespread misconception that academic languages are clean but +unpractical, whereas enterprise languages are practical but dirty. I never +bought that argument, since I have seen examples of the opposite, and also +because I always believed that a language can be both clean *and* practical. + +For instance in my opinion both Python and Ruby are reasonably clean and +reasonably practical and this is part of the reason why they are having so +much success lately. I believe the reason why they are so clean is that from +the beginning they managed to stay away from optimization, the root of all evil. +On the other hand, I am also convinced that it possible to write a language +which is both practical, clean and fast. Unfortunately, such a language does +not exist yet. SML is fast and somewhat clean but it is definitely not practical. +Here by practical I mean enterprise-ready. + +There is an irriducible aporia_ between enterprise programmers and +academics: people working on enterprise are facing every day problems +that are already solved; to them, the importanting thing is to avoid +reinventing the wheel. Given two languages, one that provides a solution +to a problem they have, and one providing the tools to build the solution +themselves, they prefer the first one. This is natural since in an enterprise +environment one has deadlines and it is essential to be as productive +as possible; of course the best way to be productive is to *not write* code, +and to re-use code written by others. +On the other hand, in academia one has to do with new problems, or +with new techniques to solve old problems: the proof of concept is +more important than the practical implementation. Also, academics +are (rightly) interested in writing papers, not in writing code. For all these +reasons it is clear that you cannot face an +academic language such as SML with an enterprise mindset. +For instance, if you come from an enterprise environment you will be used to +expect ready availabity of a nearly infinite number +of libraries doing everything you may need and more. This is certainly +the situation for all enterprise-level languages such as C, C++, Java, C#, +Perl, Python and now even Ruby, which is younger but rapidly spreading. + +As I said many time, I assume that my readers are dynamic programmers, +well acquainted with scripting languages; I feel the urge warn them that the +situation in SML is very different than in Perl, Python or Ruby [#]_ . Here +are a few differences. + +1. + Scripting languages have a dominant (sometimes unique) + implementation; ML has dozens of implementations, each with + useful features which are not in the standard, so that + different implementations provides incompatible extensions. + +2. + Scripting languages have a BDFL_ (Benevolent Dictator For Life, + i.e. Guido van Rossum for Python, Larry Wall for Perl, Yukihiro + Matsumoto for Ruby) whereas SML is a language designed by + committee. That means that a scripting language can evolve much + *much* faster than a language designed by committee. Also, + languages designed by committee are compromises whereas + single-person languages are much more consistent. + +3. + SML has a very small community: the support you can have in newsgroup + and mailing lists is simply not comparable to the support you can get in + the scripting languages mailing list and newsgroups. + + if Andreas Rossberg won the first price + at the lottery and decided to leave for a tropical paradise, nobody would + provide support for Alice ML on the mailing list + +4. + SML is a deserted language, even compared to non-mainstream languages + as OCaml, Haskell, Scheme, Common Lisp, Smalltalk, Tcl, ...; it is perhaps the + least used language I have ever seen and this is a pity. + +5. + Generally speaking, if you want bindings for external libraries you have to write + it yourself; there are no standard bindings for GUI toolkits, database drivers, + scientific programming libraries. + +6. + Scripting languages have traditionally a strong support for Web programming + which is next to absent in SML. + + +All the issues I have just listed are just accidental, not +structural: in principle one could just solve them by writing code; in principle +a DHH_ could take an SML implementation and make it the next Ruby on Rails. +However that has not happened yet, so if you want to work with SML right now you +will be forced to reinvent the wheel. In this article I will reinvent a few wheels, +just to be able to give some meaninful example to the enterprise programmer +in the next articles. + +.. _aporia: http://en.wikipedia.org/wiki/Aporia +.. _BDFL: http://en.wikipedia.org/wiki/BDFL +.. _DHH: http://en.wikipedia.org/wiki/DHH + +.. [#] Notice that I put Perl, Python and Ruby in the mainstream + languages, since even if the number of dynamic programmers is + inferior the number of say C/C++, Java or .NET programmers, + in practice scripting programmers have available nearly everything the + other languages have available, and much better languages. + +String interpolation the practical way -------------------------------------------------------- +Every language has some builtin mechanism to do string interpolation: C +has ``sprintf``, Python has the ``%`` operator, Lisp has ``format``, but +SML has none. Therefore I will provide here a very simple interpolation library +for SML, so that I will be able to provide more interesting examples later. +The library is based on list processing. +Lists are the most common data structures in +functional languages (they formed the core of Lisp at its beginning) +and you simply cannot live without them. You should look at the `basis library`_ +to see what you can do with lists. Here I will just say that SML lists are +linked lists in the same sense of Lisp or Scheme (with the difference +that they are immutable) and not as in Python (Python lists are actually arrays, +Python is abusing the name). Just as in Scheme, where + + ``(list 1 2)`` + +is a shortcut for + + ``(cons 1 (cons 2 '()))`` + +in ML + + ``[1, 2]`` + +is a shortcut for + + ``1::2::[]`` + +and ``::`` is the *cons* operator (short for constructor). +If you don't know Scheme, a SML lists should be thought of as of a +nested Python list, i.e. + + ``[1, 2] (SML) => [1, [2, []]] (Python)`` + +.. _basis library: http://www.standardml.org/Basis/list.html + +Here is the code:: + + $ cat format.aml + .. include:: format.aml :literal: + +A few comments are in order. + +1. + We used the ability to define exceptions with a value: + ``exception ArityError of string`` means that ``ArityError`` accepts + a string argument (the error message). + +2. + We used the standard library ``String.fields`` utility, which splits a string + according to a delimiter; in our case ``String.fields (fn c => c = #"$") templ`` + splits the template into a list a strings using the character ``$`` as delimiter. + In SML characters are specified with a sharp notation, so the character ``$`` + is written ``#"$"``. + +3. + At the core of the algorithm is the recursive function + ``interp'(templN1, argsN)`` which takes a list of N+1 strings (the + splitted template) and a list of N strings (the arguments) and returns the + interpolated string. For instance, for the template ``The square of $ is $\n`` + which contains N=2 delimiters, + ``templN1`` would be ``["The square of ", " is ", "\n"]``. + +4. + The builtin ``concat`` concatenates a list of strings whereas ``rev`` reverts + a list; you are advised to read the documentation or any book on SML for more. + +String interpolation the mathematical way +------------------------------------------------- + +Functional languages have a reputation for abstractness +but in this series of articles I have focused solely on very concrete +earth-bound aspects. To correct the balance, in this section I will discuss +a few programming techniques which are quite common in +functional languages, but unheard of in "regular" languages, and that +require a mathematically-inclined mind to be understood. I am doing so +because these tricks are actually used a lot in functional languages, and +I don't want to give a false impression by ignoring them +and just discussing the easy things. + +For instance, you should not believe that functional +programming is just about functions; there is an higher order +approach to functional programming, in which the primary objects +are not functions, but operators operating on functions, the so +called *combinators*. For people with a background in Physics, I will +say this is exactly analogous to the switch from the +Schroedinger picture, in which the emphasis is on the wave functions, +to the Heisenberg picture, in which the emphasis is on the quantum +operators. + +Having warned my readers that this section is not for the faint of heart +(but you may safely skip it if you don't feel adventurous) +I will start from the concatenation operator + +:: + + ^ : string * string -> string + +which satifies the properties:: + + a ^ (b ^ c) = (a ^ b) ^ c = a ^ b ^ c + a ^ "" = "" ^ a = a + +Mathematically speaking, the set of strings in SML +is a monoid_ with respect to concatenation and the empty +string "" is the identity element. + +.. _monoid: http://en.wikipedia.org/wiki/Monoid +.. _group: http://en.wikipedia.org/wiki/Group_%28mathematics%29 +.. _group representations: http://en.wikipedia.org/wiki/Group_representation + +If you have a background in Mathematics or in Physics you will be familiar +with the theory of `group representations`_; in particular groups (and monoids +too) can be represented with operators operating on function spaces. +In the case at hand, I can define a lift transformation converting plain +strings into operators by preserving the composition law:: + + - fun L s f s' = f (s' ^ s); + val L : string -> (string -> 'a) -> string -> 'a = _fn + +(this is the same as ``fun L s = fn f => fn s' => f (s' ^ s)``). +In other words, ``L s`` is an operator (combinator) taking a function and +returning a function, with the property + + ``L (s1 ^ s2) = (L s1) o (L s2)`` + +where ``o`` denotes the composition of operators. Just as ``L`` is an upgrade +operation, promoving a plain simple string to the celestial world of operators +in function spaces, I can define a downgrade operation ``l``, demoving +celestial operators back to earthly strings:: + + - fun l O = O Fn. id ""; + val l : (('a -> 'a) -> string -> 'b) -> 'b = _fn + +``l`` takes the operator, applies it to the identity function and returns a function +which is then applied to the empty string, finally bringing back at home a +plain string; ``l`` is the inverse of ``L``, i.e. ``l o L`` is the identity +in the space of strings where ``L o l`` is the identity in the space +of operators. You can try yourself at the prompt that + +:: + + - l(L"a" o L"b"); + val it : string = "ab" + +so we succeded in making simple things hard and abstract. But not happy with +that, we are going to make things harder and more abstract, by defining +another combinator taking a function and returning an higher order function:: + + - fun str f s s' = f (s ^ s') + val str : (string -> 'a) -> string -> string -> 'a = _fn + +(this is the same as ``fun str f = fn s => fn s' => f ( s ^ s')``). +This combinator is so abstract than even when lift back to the mortal world +it is more than a mere string, it is actually the identity function on strings:: + + - l str; + val it : string -> string = _fn + - (l str) "hello"; + val it : string = "hello" + +The ``str`` combinator can be composed in the celestial word: when lift +back to the mortal world, it returns a higher order version of the concatenation +operator:: + + - l (str o str); + val it : string -> string -> string = _fn + + - l (str o str) "hello" " world"; + val it : string = "hello world" + +We can also compose ``str`` with other combinators and we can write things like:: + + - l( L"a" o str o L"b" o str o L"c") "?" ":"; + val it : string = "a?b:c" + +In other words, we have reinvented string interpolation the difficult way. +Still, there is at least an advantage of this approach with respect to the +approach we used in the previous paragraph: combinator-based string interpolation +happens at compile type and *mistakes are discovered by the compiler*, not +at runtime. Moreover, the approach can be easily extended to manage +different types: for instance, if we have to do with numbers, we +can define the combinators:: + + - fun int f s n = f (s ^ (Int.toString n)) + val str : (string -> 'a) -> string -> string -> 'a = _fn + + - fun real f s n = f (s ^ (Real.toString n)) + val real : (string -> 'a) -> string -> real -> 'a = _fn + +and use them as follows:: + + - print (l(int o L" / " o int o L" = " o real o L"\n") 5 2 2.5); + 5 / 2 = 2.5 + val it : unit = () + +If you make a mistake like using an int instead of a real, or if you forget an +argument, you will get a compile time type error. + +---- + +*Give a man a fish and he will eat for a day. Teach a man to fish and he will eat +for the rest of his life.* -- Chinese proverb |