The optparse module: writing command-line tools the easy way
=======================================================================

 :Status: Draft
 :Author: Michele Simionato
 :E-mail: michele.simionato@gmail.com
 :Date: May 2004

*The optparse module is a powerful, flexible, extensible, easy-to-use 
command-line parsing library for Python. Using optparse, you can add 
intelligent, sophisticated handling of command-line options to your 
scripts with very little overhead.* -- Greg Ward, optparse author

Introduction
-----------------------------------------------------------------------

Once upon a time, when graphical interfaces were still only a dream,
command-line tools were the body and soul of all programming
tools. Many years have passed since then, but some things have not
changed: command-line tools are still fast, efficient, portable, easy
to use and - more importantly - reliable. You can count on them.
You can expect command-line scripts to work in any situation:
during installation, during disaster recovery, when
your window manager breaks down and even on systems with severe
memory/hardware constraints. When you really need them, command-line
tools are always there.

Hence, it is important for a programming language - especially
one that wants to be called a "scripting" language - to provide
facilities that help the programmer write command-line
tools. For a long time, Python support for this kind of task was
provided by the ``getopt`` module. I have never
been particularly fond of ``getopt``, since it requires
a considerable amount of coding even to parse simple
command lines. However, with the coming of Python 2.3 the situation
has changed: thanks to the great job of Greg Ward (the author of
``optparse``, a.k.a. ``Optik``), the Python programmer now
has at her disposal (in the standard library and not as an
add-on module) a fully fledged object-oriented API for
command-line argument parsing, which makes writing Unix-style
command-line tools easy, efficient and fast.

The only disadvantage of ``optparse`` is that it is a
sophisticated tool, which takes some time to master fully.
The purpose of this paper is to help the reader quickly learn the
10% of the features of ``optparse`` that she will use in 90% of
cases. Taking as an example a real-life application - a search and
replace tool - I will guide the reader through (some of) the wonders
of ``optparse``. I will also show some tricks that will make your life
with ``optparse`` much happier.
This paper is intended for both Unix and
Windows programmers - actually I will argue that Windows programmers
need ``optparse`` even more than Unix programmers - and it does not
require any particular expertise to be fully appreciated.

A simple example
---------------------------------------

I will take as a pedagogical example a little tool I wrote some time ago,
a multiple-file search and replace tool. I needed it because I am
not always working under Unix, and I do not always have sed/awk or
even Emacs installed, so it made sense to have this
little Python script in my toolbox. It is only a few lines long,
it can always be modified and extended with minimal effort,
it works on every platform (including my PDA) and it has the advantage
of being completely command-line driven:
it does not require any graphics library to be installed
and I can use it when I work on a remote machine via ssh.

The tool takes a bunch of files and replaces a given regular expression
everywhere, in place; moreover, it saves a backup copy of the original
unmodified files and gives the option to recover
them when I want to. Of course, all of this can be done more efficiently
in the Unix world with specialized tools, but those tools are written
in C and they are not as easily customizable as a Python script, which
you can change on the spot to suit your needs. So, it makes sense
to write this kind of utility in Python (or in Perl, but I am writing on
Pyzine now ;)

As a final note, let me point out that I find ``optparse``
to be much more useful in the Windows world than in the Unix/Linux/Mac OS X
world. The reason is that the plethora
of pretty good command-line tools which are available under Unix are
missing in the Windows environment, or do not have a satisfactory
equivalent. Therefore,
it makes sense to write a personal collection of command-line scripts
for your most common tasks, if you need to work on many platforms and
portability is an important requirement.
Using Python and ``optparse``, you can write your own scripts
once and have them run on every platform running Python,
which means in practice any traditional platform and increasingly
more of the non-traditional ones - Python is spreading into the
embedded market too, including PDAs, cellular phones, and more.

The Unix philosophy for command-line arguments
-------------------------------------------------

In order to understand how ``optparse`` works, it is essential
to understand the Unix philosophy about command-line arguments.

As Greg Ward puts it:

*The purpose of optparse is to make it very easy to provide the 
most standard, obvious, straightforward, and user-friendly user 
interface for Unix command-line programs. The optparse philosophy 
is heavily influenced by the Unix and GNU toolkits ...*


Here is a brief summary of the terminology:
the arguments given to a command-line script - *i.e.* the arguments
that Python stores in the list ``sys.argv[1:]`` - are classified in
three groups: options, option arguments and positional arguments.
Options can be recognized because they are prefixed by a dash
or a double dash; options can have arguments or not
(there is at most one option argument right after each option);
options without arguments are called flags. Positional arguments
are what is left on the command line after you remove options
and option arguments.

In the example of the search/replace tool, 
I will need two options with an argument - I want 
to pass to the script a regular expression and a replacement string - 
and I will need a flag specifying whether or not a backup of the original 
files needs to be performed. Finally, I will need a number of positional
arguments to store the names of the files on which the search and
replace will act.

Consider - for the sake of the example - the following situation:
you have a bunch of text files in the current directory containing dates
in the European format DD-MM-YYYY, and you want to convert them to
the American format MM-DD-YYYY. If you are sure that all your dates
are in the correct format, you can match them with a simple regular
expression such as ``(\d\d)-(\d\d)-(\d\d\d\d)``.

In this particular example it is not so important to make a backup
copy of the original files, since to revert to the original
format it is enough to run the script again. So the syntax to use 
would be something like

 ::

  $> replace.py --nobackup --regx="(\d\d)-(\d\d)-(\d\d\d\d)" \
                           --repl="\2-\1-\3" *.txt


In order to emphasize portability, I have used a generic
``$>`` prompt, meaning that these examples work equally well on
both Unix and Windows (of course on Unix I could do the same
job with sed or awk, but these tools are not as flexible as
a Python script).

The syntax here has the advantage of being
quite clear, but the disadvantage of being quite verbose, so it is
handier to use abbreviations for the names of the options. For instance,
sensible abbreviations can be ``-x`` for ``--regx``, ``-r`` for ``--repl``
and ``-n`` for ``--nobackup``; moreover, the ``=`` sign can safely be
removed. Then the previous command reads

 ::

  $> replace.py -n -x"(\d\d)-(\d\d)-(\d\d\d\d)" -r"\2-\1-\3" *.txt

You see here the Unix convention at work: one-letter options
(a.k.a. short options) are prefixed with a single dash, whereas 
long options are prefixed with a double dash. The advantage of the 
convention is that short options can be composed: for instance

 ::

  $> replace.py -nx "(\d\d)-(\d\d)-(\d\d\d\d)" -r "\2-\1-\3" *.txt

means the same as the previous line, i.e. ``-nx`` is parsed as
``-n -x``.  You can also freely exchange the order of the options,
for instance in this way:

 ::

  $> replace.py -nr "\2-\1-\3" *.txt -x "(\d\d)-(\d\d)-(\d\d\d\d)"

This command will be parsed exactly as before, i.e. options and option 
arguments are not positional. 
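
The equivalence of these spellings can be checked directly, since
``parse_args`` accepts an explicit argument list instead of reading
``sys.argv``. The following is only a minimal sketch, written for Python 3
(unlike the Python 2.3 listings in this paper); the ``build_parser`` helper
and the file names ``a.txt`` and ``b.txt`` are assumptions made up for the
illustration.

```python
import optparse

def build_parser():
    # Hypothetical helper: the three options of the search and replace tool.
    parser = optparse.OptionParser("usage: %prog files [options]")
    parser.add_option("-x", "--regx", help="regular expression")
    parser.add_option("-r", "--repl", help="replacement string")
    parser.add_option("-n", "--nobackup", action="store_true",
                      help="do not make backup copies")
    return parser

REGX = r"(\d\d)-(\d\d)-(\d\d\d\d)"
REPL = r"\2-\1-\3"

# Each list is what sys.argv[1:] would contain once the shell has
# removed the quotes and expanded *.txt (the file names are made up).
command_lines = [
    ["--nobackup", "--regx=" + REGX, "--repl=" + REPL, "a.txt", "b.txt"],
    ["-n", "-x" + REGX, "-r" + REPL, "a.txt", "b.txt"],
    ["-nx", REGX, "-r", REPL, "a.txt", "b.txt"],
    ["-nr", REPL, "a.txt", "b.txt", "-x", REGX],
]

results = []
for argv in command_lines:
    option, files = build_parser().parse_args(argv)
    results.append((option.regx, option.repl, option.nobackup, files))

# Compare every spelling with the first one.
print(all(r == results[0] for r in results))
```

Passing the list explicitly keeps the check independent of the shell, whose
quoting rules differ between Unix and Windows.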

How does it work in practice?
-----------------------------

Having stated the requirements, we may start implementing our 
search and replace tool. The first step is to write down the
documentation string:

 ::

  #!/usr/bin/env python
  """
  Given a sequence of text files, replaces everywhere
  a regular expression x with a replacement string s.

    usage: %prog files [options]
    -x, --regx=REGX: regular expression
    -r, --repl=REPL: replacement string
    -n, --nobackup: do not make backup copies
  """

On Windows the first line is unnecessary, but it is good practice to have it
in the Unix world.

The next step is to write down a simple search and replace routine:

 ::
 
  import re

  def replace(regx, repl, files, backup_option=True):
      rx = re.compile(regx)
      for fname in files:
          txt = file(fname, "U").read() # quick & dirty
          if backup_option:
              print >> file(fname+".bak", "w"), txt,
          print >> file(fname, "w"), rx.sub(repl, txt),

This replace routine is entirely unsurprising; the only thing worth
noticing is the use of the "U" option in the line

 ::

  txt = file(fname, "U").read()

This is a new feature of Python 2.3. Text files opened with the "U"
option are read in "Universal" mode: this means that Python takes
care of the newline pain for you, i.e. this script will work
correctly everywhere, independently of the newline
conventions of your operating system. The script works by reading
the whole file into memory: this is bad practice, and here I am assuming
that you will use this script only on short files that fit in
memory; otherwise you should "massage" the code a bit.
Also, a full-fledged script would check whether the file exists
and can be read, and would do something sensible if it cannot.
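
One way the routine could be made a little more defensive is sketched
below. This is an illustration only, not the script described in this
paper: it assumes Python 3 (where ``open`` replaces ``file`` and text mode
already translates newlines), it copies the original file with
``shutil.copyfile`` instead of rewriting its text, and it simply skips and
reports files that cannot be read. It still reads each file into memory in
one go.

```python
import re
import shutil
import sys

def replace(regx, repl, files, backup_option=True):
    """Replace regx with repl in each file; return the names that were skipped."""
    rx = re.compile(regx)
    skipped = []
    for fname in files:
        try:
            # Python 3 text mode translates newlines when reading.
            with open(fname) as f:
                txt = f.read()  # still reads the whole file into memory
        except OSError as exc:
            # Missing or unreadable file: report it and go on.
            print("Skipping %s: %s" % (fname, exc), file=sys.stderr)
            skipped.append(fname)
            continue
        if backup_option:
            # Byte-for-byte copy of the original, made before overwriting it.
            shutil.copyfile(fname, fname + ".bak")
        with open(fname, "w") as f:
            f.write(rx.sub(repl, txt))
    return skipped
```

Returning the list of skipped names is a design choice of this sketch; a
real tool might prefer to stop at the first error.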

So, how does it work? It is quite simple, really.
First you need to instantiate a command-line parser from
the ``OptionParser`` class provided by ``optparse``:

 ::

  import optparse 
  parser = optparse.OptionParser("usage: %prog files [options]")

The string ``"usage: %prog files [options]"`` will be used to
print a customized usage message, where ``%prog`` will be replaced
by the name of the script (in this case ``replace.py``). You
may safely omit it and ``optparse`` will use a default
``"usage: %prog [options]"`` string.

Then, you tell the parser which options
it must recognize:

 ::

  parser.add_option("-x", "--regx",
                  help="regular expression")
  parser.add_option("-r", "--repl",
                  help="replacement string")
  parser.add_option("-n", "--nobackup",
                  action="store_true",
                  help="do not make backup copies")

The ``help`` keyword argument is intended to document the
intent of the given option; it is also used by ``optparse`` in the
usage message. The ``action="store_true"`` keyword argument is
used to distinguish flags from options with arguments: it tells
``optparse`` to set the flag ``nobackup`` to ``True`` if ``-n``
or ``--nobackup`` is given on the command line.

Finally, you tell the parser to do its job and parse the command line:

 ::

  option, files = parser.parse_args()

The ``.parse_args()`` method returns two values: ``option``,
which is an instance of the ``optparse.Values`` class, and ``files``,
which is a list of positional arguments.
The ``option`` object has attributes - called *destinations* in
``optparse`` terminology - corresponding to the given options.
In our example, ``option`` will have the attributes ``option.regx``,
``option.repl`` and ``option.nobackup``.

If no options are passed on the command line, all these attributes
are initialized to ``None``; otherwise they are initialized to
the corresponding option arguments. In particular, flag options are set to
``True`` if they are given, and to ``None`` otherwise. So, in our example,
``option.nobackup`` is ``True`` if the flag ``-n`` or ``--nobackup``
is given.
The list ``files`` contains the file names passed
on the command line (assuming you passed
the names of accessible text files on your system).
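
The behaviour of the destinations can be observed by handing the parser an
explicit argument list. The snippet below is a small sketch for Python 3;
the argument values ``foo``, ``a.txt`` and ``b.txt`` are made up for the
illustration.

```python
import optparse

parser = optparse.OptionParser("usage: %prog files [options]")
parser.add_option("-x", "--regx", help="regular expression")
parser.add_option("-r", "--repl", help="replacement string")
parser.add_option("-n", "--nobackup", action="store_true",
                  help="do not make backup copies")

# No options at all: every destination is left at None.
no_opts, no_opts_files = parser.parse_args(["a.txt"])
print(no_opts.regx, no_opts.repl, no_opts.nobackup, no_opts_files)

# Some options given: their destinations hold the option arguments,
# the flag becomes True, and the option that was not given stays None.
some_opts, some_files = parser.parse_args(["-n", "-x", "foo", "a.txt", "b.txt"])
print(some_opts.regx, some_opts.repl, some_opts.nobackup, some_files)

# The first returned value is an optparse.Values instance.
print(isinstance(some_opts, optparse.Values))
```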

The main logic can be as simple as the following:

 ::

      if not files:
          print "No files given!"
      elif option.regx and option.repl:
          replace(option.regx, option.repl, files, not option.nobackup)
      else:
          print "Missing options or unrecognized options."
          print __doc__ # documentation on how to use the script
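
The same checks could also be delegated to the parser itself:
``OptionParser.error`` prints the usage string together with a message on
standard error and exits with status 2. The following is a sketch under
that approach, assuming Python 3; ``parse_command_line`` is a hypothetical
helper name, not part of the script above.

```python
import optparse

def parse_command_line(argv):
    """Hypothetical helper: parse argv and enforce the required arguments."""
    parser = optparse.OptionParser("usage: %prog files [options]")
    parser.add_option("-x", "--regx", help="regular expression")
    parser.add_option("-r", "--repl", help="replacement string")
    parser.add_option("-n", "--nobackup", action="store_true",
                      help="do not make backup copies")
    option, files = parser.parse_args(argv)
    if not files:
        parser.error("no files given")  # exits the program
    if not (option.regx and option.repl):
        parser.error("both --regx and --repl are required")
    return option, files

option, files = parse_command_line(["-x", "foo", "-r", "bar", "a.txt"])
print(option.regx, option.repl, option.nobackup, files)
```

The trade-off is that the script terminates inside the helper, so this style
suits a top-level ``main`` better than a reusable library function.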

A nice feature of ``optparse`` is that a help option is automatically
created, so ``replace.py -h`` (or ``replace.py --help``) will work as
you may expect:

 ::

  $> replace.py --help
  usage: replace.py files [options]


  options:
    -h, --help           show this help message and exit
    -xREGX, --regx=REGX  regular expression
    -rREPL, --repl=REPL  replacement string
    -n, --nobackup       do not make backup copies


You may programmatically print the usage message by invoking  
``parser.print_help()``.

At this point you may test your script and see that it works as
advertised.

How to reduce verbosity and make your life with ``optparse`` happier
---------------------------------------------------------------------

The power of ``optparse`` comes with a penalty: using ``optparse`` in
the standard way, as I explained before, involves a certain amount of
verbosity and redundancy.

Suppose, for instance, that
I want to add the ability to restore the original file from the backup copy.
Then we have to change the script in three places: in the docstring,
in the ``add_option`` list, and in the ``if .. elif .. else ...``
statement. At least one of these is redundant.

The redundancy can be removed by parsing the docstring to infer the
options to be recognized. This avoids the boring task
of writing the ``parser.add_option`` lines by hand.
I implemented this idea in a cookbook recipe, by writing an
``optionparse`` module which is just a thin wrapper around ``optparse``.
For the sake of space, I cannot repeat it here, but you can find the code
and a short explanation in the Python Cookbook (see the references below).
It is really easy to use. For instance, the paper you are
reading now has been written by using ``optionparse``: I used it to
write a simple wrapper around docutils - the standard
Python tool which converts reStructuredText files to HTML pages.
It is also nice to note that internally
docutils itself uses ``optparse`` to do its job, so actually this
paper has been composed by using ``optparse`` twice!
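
To give a flavour of the idea, here is a rough sketch of how options might
be inferred from a usage docstring. It is *not* the cookbook recipe and it
does not reproduce the ``optionparse`` API: the ``parser_from_docstring``
helper, the ``OPTION_LINE`` pattern and the rule "no ``=METAVAR`` means
flag" are all assumptions made for this illustration, and the code targets
Python 3. The usage text is kept in a plain variable here; in a real script
it would be the module docstring ``__doc__``.

```python
import optparse
import re

USAGE_DOC = """
Given a sequence of text files, replaces everywhere
a regular expression x with a replacement string s.

  usage: %prog files [options]
  -x, --regx=REGX: regular expression
  -r, --repl=REPL: replacement string
  -n, --nobackup: do not make backup copies
"""

# One option per line, in the form "-s, --long[=METAVAR]: help text".
OPTION_LINE = re.compile(r"(-\w), (--\w+)(=\w+)?: (.*)")

def parser_from_docstring(doc):
    """Hypothetical helper: build an OptionParser from a usage docstring."""
    parser = optparse.OptionParser()
    for line in doc.splitlines():
        line = line.strip()
        if line.startswith("usage:"):
            parser.set_usage(line)
            continue
        match = OPTION_LINE.match(line)
        if match:
            short, long_, metavar, help_text = match.groups()
            # Assumption: an option without "=METAVAR" is a flag.
            action = "store" if metavar else "store_true"
            parser.add_option(short, long_, action=action, help=help_text)
    return parser

parser = parser_from_docstring(USAGE_DOC)
option, files = parser.parse_args(["-n", "-x", "foo", "-r", "bar", "a.txt"])
print(option.regx, option.repl, option.nobackup, files)
```

With this approach, adding a new option means adding one line to the
docstring, and the ``add_option`` calls follow automatically.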
 
Finally, you should keep in mind that this article only scratches the
surface of ``optparse``, which is quite sophisticated.
For instance, you can specify default values, different destinations,
a ``store_false`` action and much more, even if you often don't need
all this power. Still, it is handy to have the power at your disposal when
you need it. The serious user of ``optparse`` is strongly
encouraged to read the documentation in the standard library, which
is pretty good and detailed. I think that this article has fulfilled
its function as an "appetizer" for ``optparse`` if it has stimulated
the reader to learn more.
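
As a small taste of those features, the sketch below combines a default
value, an explicit destination and the ``store_false`` action, so that the
program can test a positive ``option.backup`` attribute instead of negating
``option.nobackup``. It assumes Python 3, and the choice of ``backup`` as
destination name and of the empty string as default replacement is made up
for the illustration.

```python
import optparse

parser = optparse.OptionParser("usage: %prog files [options]")
# A default value replaces the None that an omitted option would leave.
parser.add_option("-r", "--repl", default="", help="replacement string")
# dest chooses the attribute name; store_false makes -n switch the
# attribute from its default True to False.
parser.add_option("-n", "--nobackup", action="store_false",
                  dest="backup", default=True,
                  help="do not make backup copies")

with_backup, _ = parser.parse_args(["a.txt"])
without_backup, _ = parser.parse_args(["-n", "a.txt"])
print(with_backup.backup, without_backup.backup)
```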

References
--------------------------

- ``optparse``/``Optik`` is a SourceForge project in its own right:
  http://optik.sourceforge.net

- starting from Python 2.3, ``optparse`` is included in the standard library:
  http://www.python.org/doc/2.3.4/lib/module-optparse.html

- I wrote a Python Cookbook recipe about optparse:
  http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/278844