<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="generator" content="Docutils 0.3.9: http://docutils.sourceforge.net/" />
<title>The optparse module: writing command-line tools the easy way</title>
</head>
<body>
<div class="document" id="the-optparse-module-writing-command-line-tools-the-easy-way">
<h1 class="title">The optparse module: writing command-line tools the easy way</h1>
<blockquote>
<table class="docutils field-list" frame="void" rules="none">
<col class="field-name" />
<col class="field-body" />
<tbody valign="top">
<tr class="field"><th class="field-name">Status:</th><td class="field-body">Draft</td>
</tr>
<tr class="field"><th class="field-name">Author:</th><td class="field-body">Michele Simionato</td>
</tr>
<tr class="field"><th class="field-name">E-mail:</th><td class="field-body"><a class="reference" href="mailto:michele.simionato&#64;gmail.com">michele.simionato&#64;gmail.com</a></td>
</tr>
<tr class="field"><th class="field-name">Date:</th><td class="field-body">May 2004</td>
</tr>
</tbody>
</table>
</blockquote>
<p><em>The optparse module is a powerful, flexible, extensible, easy-to-use 
command-line parsing library for Python. Using optparse, you can add 
intelligent, sophisticated handling of command-line options to your 
scripts with very little overhead.</em> -- Greg Ward, optparse author</p>
<div class="section" id="introduction">
<h1><a name="introduction">Introduction</a></h1>
<p>Once upon a time, when graphical interfaces were still only a dream,
command-line tools were the body and soul of all programming
tools. Many years have passed since then, but some things have not 
changed: command-line tools are still fast, efficient, portable, easy 
to use and - more importantly - reliable. You can count on them.
You can expect command-line scripts to work in any situation: 
during the installation phase, during disaster recovery, when 
your window manager breaks down and even on systems with severe 
memory/hardware constraints. When you really need them, command-line 
tools are always there.</p>
<p>Hence, it is important for a programming language - especially
one that wants to be called a &quot;scripting&quot; language - to provide 
facilities to help the programmer in the task of writing command-line 
tools. For a long time Python support for this kind of task has 
been provided by the <tt class="docutils literal"><span class="pre">getopt</span></tt> module. I have never 
been particularly fond of <tt class="docutils literal"><span class="pre">getopt</span></tt>, since it required 
a considerable amount of coding even for the parsing of simple 
command lines. However, with the coming of Python 2.3 the situation 
has changed: thanks to the great job of Greg Ward (the author of 
<tt class="docutils literal"><span class="pre">optparse</span></tt> a.k.a. <tt class="docutils literal"><span class="pre">Optik</span></tt>) the Python programmer now 
has at her disposal (in the standard library and not as an 
add-on module) a fully fledged object-oriented API for 
command-line argument parsing, which makes writing Unix-style 
command-line tools easy, efficient and fast.</p>
<p>The only disadvantage of <tt class="docutils literal"><span class="pre">optparse</span></tt> is that it is a 
sophisticated tool, which requires some time to be fully mastered. 
The purpose of this paper is to help the reader quickly grasp the 
10% of the features of <tt class="docutils literal"><span class="pre">optparse</span></tt> that she will use in 90% of 
the cases. Taking as an example a real-life application - a search and 
replace tool - I will guide the reader through (some of) the wonders 
of <tt class="docutils literal"><span class="pre">optparse</span></tt>. I will also show some tricks that will make your life 
with <tt class="docutils literal"><span class="pre">optparse</span></tt> much happier. 
This paper is intended for both Unix and
Windows programmers - actually I will argue that Windows programmers 
need <tt class="docutils literal"><span class="pre">optparse</span></tt> even more than Unix programmers - and it does not
require any particular expertise to be fully appreciated.</p>
</div>
<div class="section" id="a-simple-example">
<h1><a name="a-simple-example">A simple example</a></h1>
<p>I will take as a pedagogical example a little tool I wrote some time ago,
a multiple-file search and replace tool. I needed it because I am 
not always working under Unix, and I do not always have sed/awk or
even Emacs installed, so it made sense to have this 
little Python script in my toolbox. It is only a few lines long,
it can always be modified and extended with minimal effort,
it works on every platform (including my PDA) and it has the advantage 
of being completely command-line driven: 
it does not require any graphics library to be installed
and I can use it when I work on a remote machine via ssh.</p>
<p>The tool takes a bunch of files and replaces a given regular expression 
everywhere in-place; moreover, it saves a backup copy of the original 
unmodified files and gives the option to recover 
them when I want to.  Of course, all of this can be done more efficiently 
in the Unix world with specialized tools, but those tools are written 
in C and they are not as easily customizable as a Python script, which 
you may change in real time to suit your needs. So, it makes sense 
to write this kind of utility in Python (or in Perl, but I am writing for 
Pyzine now ;)</p>
<p>As a final note, let me point out that I find <tt class="docutils literal"><span class="pre">optparse</span></tt> 
to be much more useful in the Windows world than in the Unix/Linux/Mac OS X
world. The reason is that the plethora 
of pretty good command-line tools which are available under Unix are 
missing in the Windows environment, or do not have a satisfactory 
equivalent. Therefore, 
it makes sense to write a personal collection of command-line scripts 
for your most common tasks, if you need to work on many platforms and
portability is an important requirement. 
Using Python and <tt class="docutils literal"><span class="pre">optparse</span></tt>, you may write your own scripts 
once and have them run on every platform running Python, 
which means in practice any traditional platform and an increasing
number of the non-traditional ones - Python is spreading into the 
embedded market too, including PDAs, cellular phones, and more.</p>
</div>
<div class="section" id="the-unix-philosophy-for-command-line-arguments">
<h1><a name="the-unix-philosophy-for-command-line-arguments">The Unix philosophy for command-line arguments</a></h1>
<p>In order to understand how <tt class="docutils literal"><span class="pre">optparse</span></tt> works, it is essential
to understand the Unix philosophy about command-line arguments.</p>
<p>As Greg Ward puts it:</p>
<p><em>The purpose of optparse is to make it very easy to provide the 
most standard, obvious, straightforward, and user-friendly user 
interface for Unix command-line programs. The optparse philosophy 
is heavily influenced by the Unix and GNU toolkits ...</em></p>
<p>Here is a brief summary of the terminology:
the arguments given to a command-line script - <em>i.e.</em>  the arguments
that Python stores in the list <tt class="docutils literal"><span class="pre">sys.argv[1:]</span></tt> - are classified in
three groups: options, option arguments and positional arguments. 
Options can be recognized because they are prefixed by a dash
or a double dash; options can have arguments or not 
(there is at most one option argument right after each option); 
options without arguments are called flags. Positional arguments 
are what is left on the command line after you remove options 
and option arguments.</p>
<p>In the example of the search/replace tool, 
I will need two options with an argument - I want 
to pass to the script a regular expression and a replacement string - 
and I will need a flag specifying whether or not a backup of the original 
files needs to be performed. Finally, I will need a number of positional
arguments to store the names of the files on which the search and
replace will act.</p>
<p>Consider - for the sake of the example - the following situation:
you have a bunch of text files in the current directory containing dates 
in the European format DD-MM-YYYY, and you want to convert them to
the American format MM-DD-YYYY. If you are sure that all your dates
are in the correct format, you can match them with a simple regular
expression such as <tt class="docutils literal"><span class="pre">(\d\d)-(\d\d)-(\d\d\d\d)</span></tt>.</p>
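<p>As a quick check of this regular expression outside the tool, here is a 
minimal sketch using the standard <tt class="docutils literal"><span class="pre">re</span></tt> module. It is written for a current 
Python 3 interpreter; the helper name <tt class="docutils literal"><span class="pre">swap_day_month</span></tt> and the sample 
sentence are made up for illustration, and the replacement string simply 
exchanges the first two groups.</p>
<blockquote>
<pre class="literal-block">
```python
import re

# The pattern from the text: two digits, two digits, four digits, dash-separated.
DATE = re.compile(r"(\d\d)-(\d\d)-(\d\d\d\d)")


def swap_day_month(text):
    """Exchange the first two groups of every date found in text."""
    return DATE.sub(r"\2-\1-\3", text)


sample = "Invoice dated 25-12-2003, due 31-01-2004."  # made-up sample text
converted = swap_day_month(sample)
print(converted)
# Swapping a second time gives back the original text.
print(swap_day_month(converted) == sample)
```
</pre>
</blockquote>
<p>Note that the pattern only checks the shape of a date, not its validity: 
any two-two-four group of digits separated by dashes will be swapped.</p>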
<p>In this particular example it is not so important to make a backup
copy of the original files, since to revert to the original
format it is enough to run the script again. So the syntax to use 
would be something like</p>
<blockquote>
<pre class="literal-block">
$&gt; replace.py --nobackup --regx=&quot;(\d\d)-(\d\d)-(\d\d\d\d)&quot; \
                         --repl=&quot;\2-\1-\3&quot; *.txt
</pre>
</blockquote>
<p>In order to emphasize the portability, I have used a generic 
<tt class="docutils literal"><span class="pre">$&gt;</span></tt> prompt, meaning that these examples work equally well on
both Unix and Windows (of course on Unix I could do the same 
job with sed or awk, but these tools are not as flexible as
a Python script).</p>
<p>The syntax here has the advantage of being
quite clear, but the disadvantage of being quite verbose, and it is
handier to use abbreviations for the name of the options. For instance, 
sensible abbreviations can be <tt class="docutils literal"><span class="pre">-x</span></tt> for <tt class="docutils literal"><span class="pre">--regx</span></tt>, <tt class="docutils literal"><span class="pre">-r</span></tt> for <tt class="docutils literal"><span class="pre">--repl</span></tt> 
and <tt class="docutils literal"><span class="pre">-n</span></tt> for <tt class="docutils literal"><span class="pre">--nobackup</span></tt>; moreover, the <tt class="docutils literal"><span class="pre">=</span></tt> sign can safely be
removed. Then the previous command reads</p>
<blockquote>
<pre class="literal-block">
$&gt; replace.py -n -x&quot;(\d\d)-(\d\d)-(\d\d\d\d)&quot; -r&quot;\2-\1-\3&quot; *.txt
</pre>
</blockquote>
<p>You see here the Unix convention at work: one-letter options
(a.k.a. short options) are prefixed with a single dash, whereas 
long options are prefixed with a double dash. The advantage of the 
convention is that short options can be composed: for instance</p>
<blockquote>
<pre class="literal-block">
$&gt; replace.py -nx &quot;(\d\d)-(\d\d)-(\d\d\d\d)&quot; -r &quot;\2-\1-\3&quot; *.txt
</pre>
</blockquote>
<p>means the same as the previous line, i.e. <tt class="docutils literal"><span class="pre">-nx</span></tt> is parsed as
<tt class="docutils literal"><span class="pre">-n</span> <span class="pre">-x</span></tt>.  You can also freely exchange the order of the options,
for instance in this way:</p>
<blockquote>
<pre class="literal-block">
$&gt; replace.py -nr &quot;\2-\1-\3&quot; *.txt -x &quot;(\d\d)-(\d\d)-(\d\d\d\d)&quot;
</pre>
</blockquote>
<p>This command will be parsed exactly as before, i.e. options and option 
arguments are not positional.</p>
</div>
<div class="section" id="how-does-it-work-in-practice">
<h1><a name="how-does-it-work-in-practice">How does it work in practice?</a></h1>
<p>Having stated the requirements, we may start implementing our 
search and replace tool. The first step is to write down the 
documentation string:</p>
<blockquote>
<pre class="literal-block">
#!/usr/bin/env python
&quot;&quot;&quot;
Given a sequence of text files, replaces everywhere
a regular expression x with a replacement string s.

  usage: %prog files [options]
  -x, --regx=REGX: regular expression
  -r, --repl=REPL: replacement string
  -n, --nobackup: do not make backup copies
&quot;&quot;&quot;
</pre>
</blockquote>
<p>On Windows the first line is unnecessary, but it is good practice to have it 
in the Unix world.</p>
<p>The next step is to write down a simple search and replace routine:</p>
<blockquote>
<pre class="literal-block">
import re

def replace(regx, repl, files, backup_option=True):
    rx = re.compile(regx)
    for fname in files:
        txt = file(fname, &quot;U&quot;).read() # quick &amp; dirty
        if backup_option:
            print &gt;&gt; file(fname+&quot;.bak&quot;, &quot;w&quot;), txt,
        print &gt;&gt; file(fname, &quot;w&quot;), rx.sub(repl, txt),
</pre>
</blockquote>
<p>This replace routine is entirely unsurprising; the only thing you
may notice is the use of the &quot;U&quot; option in the line</p>
<blockquote>
<pre class="literal-block">
txt=file(fname,&quot;U&quot;).read()
</pre>
</blockquote>
<p>This is a new feature of Python 2.3. Text files opened with the &quot;U&quot;
option are read in &quot;Universal&quot; mode: this means that Python takes
care of the newline pain for you, i.e. this script will work 
correctly everywhere, independently of the newline
conventions of your operating system. The script works by reading 
the whole file into memory: this is bad practice, and here I am assuming 
that you will use this script only on short files that will fit in 
your memory; otherwise you should &quot;massage&quot; the code a bit.
Also, a full-fledged script would check whether the file exists 
and can be read, and would do something sensible if it cannot.</p>
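<p>For readers on a current interpreter, the following is one possible sketch 
of a slightly more defensive routine, written for Python 3 rather than for 
Python 2.3. It is not the original script: the readability check, the use of 
<tt class="docutils literal"><span class="pre">shutil.copyfile</span></tt> for the backup and the returned list are my own 
assumptions, and the routine still reads each file whole. In Python 3 the 
default text mode already translates newlines on reading, so no &quot;U&quot; flag 
is used.</p>
<blockquote>
<pre class="literal-block">
```python
import os
import re
import shutil
import tempfile


def replace(regx, repl, files, backup_option=True):
    """Replace regx with repl in each readable file; return the files processed."""
    rx = re.compile(regx)
    processed = []
    for fname in files:
        # Skip anything that is not a readable regular file instead of crashing.
        if not (os.path.isfile(fname) and os.access(fname, os.R_OK)):
            print("skipping " + fname + ": not a readable file")
            continue
        if backup_option:
            shutil.copyfile(fname, fname + ".bak")  # keep the untouched original
        # Text mode translates any newline convention to "\n" on reading.
        with open(fname, "r") as infile:
            txt = infile.read()  # still reads the whole file into memory
        with open(fname, "w") as outfile:
            outfile.write(rx.sub(repl, txt))
        processed.append(fname)
    return processed


# Small self-check on a throwaway file in a temporary directory.
workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "dates.txt")
with open(target, "w") as f:
    f.write("start 25-12-2003 end\n")
missing = os.path.join(workdir, "missing.txt")  # deliberately never created
done = replace(r"(\d\d)-(\d\d)-(\d\d\d\d)", r"\2-\1-\3", [target, missing])
with open(target) as f:
    new_text = f.read()
backup_made = os.path.exists(target + ".bak")
shutil.rmtree(workdir)
print(new_text.strip(), backup_made, len(done))
```
</pre>
</blockquote>
<p>Skipping unreadable files with a message is only one choice; a stricter 
tool might prefer to stop at the first problem.</p>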
<p>So, how does it work? It is quite simple, really. 
First you need to instantiate a command-line parser from
the <tt class="docutils literal"><span class="pre">OptionParser</span></tt> class provided by <tt class="docutils literal"><span class="pre">optparse</span></tt>:</p>
<blockquote>
<pre class="literal-block">
import optparse 
parser = optparse.OptionParser(&quot;usage: %prog files [options]&quot;)
</pre>
</blockquote>
<p>The string <tt class="docutils literal"><span class="pre">&quot;usage:</span> <span class="pre">%prog</span> <span class="pre">files</span> <span class="pre">[options]&quot;</span></tt> will be used to
print a customized usage message,  where <tt class="docutils literal"><span class="pre">%prog</span></tt> will be replaced
by the name of the script (in this case <tt class="docutils literal"><span class="pre">replace.py</span></tt>). You
may safely omit it and <tt class="docutils literal"><span class="pre">optparse</span></tt> will use a default 
<tt class="docutils literal"><span class="pre">&quot;usage:</span> <span class="pre">%prog</span> <span class="pre">[options]&quot;</span></tt> string.</p>
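<p>The expansion of <tt class="docutils literal"><span class="pre">%prog</span></tt> can be observed directly. The sketch below 
assumes a current Python 3 interpreter and passes the <tt class="docutils literal"><span class="pre">prog</span></tt> keyword 
explicitly, only so that the result does not depend on how the example is 
launched; the exact capitalization of the &quot;usage&quot; prefix may differ between 
Python versions.</p>
<blockquote>
<pre class="literal-block">
```python
import optparse

# prog is passed explicitly so the output does not depend on how the
# example is launched; normally optparse derives it from sys.argv[0].
parser = optparse.OptionParser("usage: %prog files [options]", prog="replace.py")
print(parser.get_usage())

# Without a usage string, optparse falls back to its built-in default.
default_parser = optparse.OptionParser(prog="replace.py")
print(default_parser.get_usage())
```
</pre>
</blockquote>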
<p>Then, you tell the parser which options
it must recognize:</p>
<blockquote>
<pre class="literal-block">
parser.add_option(&quot;-x&quot;, &quot;--regx&quot;,
                help=&quot;regular expression&quot;)
parser.add_option(&quot;-r&quot;, &quot;--repl&quot;,
                help=&quot;replacement string&quot;)
parser.add_option(&quot;-n&quot;, &quot;--nobackup&quot;,
                action=&quot;store_true&quot;,
                help=&quot;do not make backup copies&quot;)
</pre>
</blockquote>
<p>The <tt class="docutils literal"><span class="pre">help</span></tt> keyword argument is intended to document the
intent of the given option; it is also used by <tt class="docutils literal"><span class="pre">optparse</span></tt> in the 
usage message. The <tt class="docutils literal"><span class="pre">action=&quot;store_true&quot;</span></tt> keyword argument is
used to distinguish flags from options with arguments: it tells
<tt class="docutils literal"><span class="pre">optparse</span></tt> to set the flag <tt class="docutils literal"><span class="pre">nobackup</span></tt> to <tt class="docutils literal"><span class="pre">True</span></tt> if <tt class="docutils literal"><span class="pre">-n</span></tt>
or <tt class="docutils literal"><span class="pre">--nobackup</span></tt> is given in the command line.</p>
<p>Finally, you tell the parser to do its job and parse the command line:</p>
<blockquote>
<pre class="literal-block">
option, files = parser.parse_args()
</pre>
</blockquote>
<p>The <tt class="docutils literal"><span class="pre">.parse_args()</span></tt> method returns two values: <tt class="docutils literal"><span class="pre">option</span></tt>, 
which is an instance of the <tt class="docutils literal"><span class="pre">optparse.Values</span></tt> class, and <tt class="docutils literal"><span class="pre">files</span></tt>,
which is a list of positional arguments.
The <tt class="docutils literal"><span class="pre">option</span></tt> object has attributes - called <em>destinations</em> in 
<tt class="docutils literal"><span class="pre">optparse</span></tt> terminology - corresponding to the given options.
In our example, <tt class="docutils literal"><span class="pre">option</span></tt> will have the attributes <tt class="docutils literal"><span class="pre">option.regx</span></tt>, 
<tt class="docutils literal"><span class="pre">option.repl</span></tt> and  <tt class="docutils literal"><span class="pre">option.nobackup</span></tt>.</p>
<p>If no options are passed to the command line, all these attributes
are initialized to <tt class="docutils literal"><span class="pre">None</span></tt>; otherwise they are set to 
the corresponding option argument. In particular, flag options are set to 
<tt class="docutils literal"><span class="pre">True</span></tt> if they are given, and to <tt class="docutils literal"><span class="pre">None</span></tt> otherwise. So, in our example 
<tt class="docutils literal"><span class="pre">option.nobackup</span></tt> is <tt class="docutils literal"><span class="pre">True</span></tt> if the flag <tt class="docutils literal"><span class="pre">-n</span></tt> or <tt class="docutils literal"><span class="pre">--nobackup</span></tt> 
is given.
The list <tt class="docutils literal"><span class="pre">files</span></tt> contains the files passed
to the command line (assuming you passed
the names of accessible text files in your system).</p>
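<p>These defaults are easy to verify by handing <tt class="docutils literal"><span class="pre">parse_args</span></tt> an explicit 
argument list instead of letting it read the real command line. The sketch 
below assumes a current Python 3 interpreter; <tt class="docutils literal"><span class="pre">a.txt</span></tt> and <tt class="docutils literal"><span class="pre">b.txt</span></tt> are 
made-up names and no file is opened.</p>
<blockquote>
<pre class="literal-block">
```python
import optparse

parser = optparse.OptionParser("usage: %prog files [options]")
parser.add_option("-x", "--regx", help="regular expression")
parser.add_option("-r", "--repl", help="replacement string")
parser.add_option("-n", "--nobackup", action="store_true",
                  help="do not make backup copies")

# Nothing on the command line: every destination keeps its default, None.
empty, no_files = parser.parse_args([])
print(empty.regx, empty.repl, empty.nobackup, no_files)

# a.txt and b.txt are made-up file names standing in for real files.
given, files = parser.parse_args(["-n", "-x", "foo", "-r", "bar", "a.txt", "b.txt"])
print(given.regx, given.repl, given.nobackup, files)
print(type(given).__name__)
```
</pre>
</blockquote>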
<p>The main logic can be as simple as the following:</p>
<blockquote>
<pre class="literal-block">
if not files:
    print &quot;No files given!&quot;
elif option.regx and option.repl:
    replace(option.regx, option.repl, files, not option.nobackup)
else:
    print &quot;Missing options or unrecognized options.&quot;
    print __doc__ # documentation on how to use the script
</pre>
</blockquote>
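<p>A variation on this logic, sketched here for Python 3, lets <tt class="docutils literal"><span class="pre">optparse</span></tt> 
report the problem through <tt class="docutils literal"><span class="pre">parser.error</span></tt>, which prints the usage line and 
the message to standard error and exits with status 2. The function name 
<tt class="docutils literal"><span class="pre">parse_command_line</span></tt> and the wording of the messages are my own 
assumptions; comparing against <tt class="docutils literal"><span class="pre">None</span></tt> is a deliberate change so that an 
empty replacement string remains legal.</p>
<blockquote>
<pre class="literal-block">
```python
import optparse


def parse_command_line(argv):
    """Return (regx, repl, files, backup) or exit with a usage error."""
    parser = optparse.OptionParser("usage: %prog files [options]",
                                   prog="replace.py")
    parser.add_option("-x", "--regx", help="regular expression")
    parser.add_option("-r", "--repl", help="replacement string")
    parser.add_option("-n", "--nobackup", action="store_true",
                      help="do not make backup copies")
    options, files = parser.parse_args(argv)
    if not files:
        parser.error("no files given")  # prints usage, exits with status 2
    # Compare with None so that an empty replacement string stays legal.
    if options.regx is None or options.repl is None:
        parser.error("both --regx and --repl are required")
    return options.regx, options.repl, files, not options.nobackup


# notes.txt is a made-up file name; nothing is opened here.
print(parse_command_line(["-x", "foo", "-r", "", "notes.txt"]))
```
</pre>
</blockquote>
<p>The returned tuple has the same order as the arguments of the 
<tt class="docutils literal"><span class="pre">replace</span></tt> routine, so it can be passed on directly.</p>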
<p>A nice feature of <tt class="docutils literal"><span class="pre">optparse</span></tt> is that a help option is automatically
created, so <tt class="docutils literal"><span class="pre">replace.py</span> <span class="pre">-h</span></tt> (or  <tt class="docutils literal"><span class="pre">replace.py</span> <span class="pre">--help</span></tt>) will work as
you may expect:</p>
<blockquote>
<pre class="literal-block">
$&gt; replace.py --help
usage: replace.py files [options]


options:
  -h, --help           show this help message and exit
  -xREGX, --regx=REGX  regular expression
  -rREPL, --repl=REPL  replacement string
  -n, --nobackup       do not make backup copies
</pre>
</blockquote>
<p>You may programmatically print the usage message by invoking  
<tt class="docutils literal"><span class="pre">parser.print_help()</span></tt>.</p>
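<p>If the help text is needed as a string rather than printed, 
<tt class="docutils literal"><span class="pre">parser.format_help()</span></tt> returns it. The following sketch assumes a current 
Python 3 interpreter, where the layout and capitalization of the message may 
differ slightly from the output shown above.</p>
<blockquote>
<pre class="literal-block">
```python
import optparse

parser = optparse.OptionParser("usage: %prog files [options]", prog="replace.py")
parser.add_option("-x", "--regx", help="regular expression")
parser.add_option("-r", "--repl", help="replacement string")
parser.add_option("-n", "--nobackup", action="store_true",
                  help="do not make backup copies")

# format_help() builds the text that print_help() would write out.
help_text = parser.format_help()
print(help_text)
```
</pre>
</blockquote>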
<p>At this point you may test your script and see that it works as
advertised.</p>
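<p>One way to test the parsing part without touching any file is to feed the 
parser the different spellings of the command line discussed earlier and 
compare the results. This is a sketch for Python 3; the helper names and the 
two file names, which stand in for the shell expansion of <tt class="docutils literal"><span class="pre">*.txt</span></tt>, are 
assumptions made for the example.</p>
<blockquote>
<pre class="literal-block">
```python
import optparse


def build_parser():
    parser = optparse.OptionParser("usage: %prog files [options]")
    parser.add_option("-x", "--regx", help="regular expression")
    parser.add_option("-r", "--repl", help="replacement string")
    parser.add_option("-n", "--nobackup", action="store_true",
                      help="do not make backup copies")
    return parser


def parse(argv):
    """Parse one argument list and return the pieces in a comparable form."""
    options, files = build_parser().parse_args(argv)
    return options.regx, options.repl, options.nobackup, files


REGX = r"(\d\d)-(\d\d)-(\d\d\d\d)"
REPL = r"\2-\1-\3"
FILES = ["a.txt", "b.txt"]  # stand-ins for what the shell makes of *.txt

variants = [
    ["--nobackup", "--regx=" + REGX, "--repl=" + REPL] + FILES,  # long options
    ["-n", "-x" + REGX, "-r" + REPL] + FILES,                    # short, attached
    ["-nx", REGX, "-r", REPL] + FILES,                           # combined flags
    ["-nr", REPL] + FILES + ["-x", REGX],                        # reordered
]
results = [parse(argv) for argv in variants]
print(all(result == results[0] for result in results))
```
</pre>
</blockquote>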
</div>
<div class="section" id="how-to-reduce-verbosity-and-make-your-life-with-optparse-happier">
<h1><a name="how-to-reduce-verbosity-and-make-your-life-with-optparse-happier">How to reduce verbosity and make your life with <tt class="docutils literal"><span class="pre">optparse</span></tt> happier</a></h1>
<p>The power of <tt class="docutils literal"><span class="pre">optparse</span></tt> comes with a penalty: using <tt class="docutils literal"><span class="pre">optparse</span></tt> in
the standard way, as I explained before, involves a certain amount of 
verbosity/redundancy.</p>
<p>Suppose for instance 
I want to add the ability to restore the original file from the backup copy.
Then, we have to change the script in three places: in the docstring,
in the <tt class="docutils literal"><span class="pre">add_option</span></tt> list, and in the <tt class="docutils literal"><span class="pre">if</span> <span class="pre">..</span> <span class="pre">elif</span> <span class="pre">..</span> <span class="pre">else</span> <span class="pre">...</span></tt> 
statement. At least one of these is redundant.</p>
<p>The redundancy can be removed by parsing the docstring to infer the 
options to be recognized. This avoids the boring task 
of writing by hand the <tt class="docutils literal"><span class="pre">parser.add_option</span></tt> lines. 
I implemented this idea in a cookbook recipe, by writing an 
<tt class="docutils literal"><span class="pre">optionparse</span></tt> module which is just a thin wrapper around <tt class="docutils literal"><span class="pre">optparse</span></tt>.
For the sake of space, I cannot repeat it here, but you can find the code
and a small explanation in the Python Cookbook (see the references below).
It is really easy to use. For instance, the paper you are 
reading now has been written by using <tt class="docutils literal"><span class="pre">optionparse</span></tt>: I used it to 
write a simple wrapper to docutils - the standard
Python tool which converts (restructured) text files to HTML pages. 
It is also worth noting that internally
docutils itself uses <tt class="docutils literal"><span class="pre">optparse</span></tt> to do its job, so actually this
paper has been composed by using <tt class="docutils literal"><span class="pre">optparse</span></tt> twice!</p>
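<p>To give a flavour of the idea, here is a minimal sketch for Python 3. It is 
not the Cookbook recipe and does not reproduce its interface: the function 
name <tt class="docutils literal"><span class="pre">parser_from_doc</span></tt>, the regular expression and the rule that an option 
without <tt class="docutils literal"><span class="pre">=NAME</span></tt> is a flag are assumptions of mine, tailored to the 
docstring layout used in this paper.</p>
<blockquote>
<pre class="literal-block">
```python
import optparse
import re

USAGE_DOC = """
Given a sequence of text files, replaces everywhere
a regular expression x with a replacement string s.

  usage: %prog files [options]
  -x, --regx=REGX: regular expression
  -r, --repl=REPL: replacement string
  -n, --nobackup: do not make backup copies
"""

# One option per line, in the form "-s, --long[=METAVAR]: help text".
OPTION_LINE = re.compile(r"^\s*(-\w), (--[\w-]+)(=\w+)?: (.*)$")


def parser_from_doc(doc, prog=None):
    """Build an OptionParser from the usage and option lines found in doc."""
    usage = None
    specs = []
    for line in doc.splitlines():
        if line.strip().startswith("usage:"):
            usage = line.strip()
            continue
        match = OPTION_LINE.match(line)
        if match:
            specs.append(match.groups())
    parser = optparse.OptionParser(usage, prog=prog)
    for short, long_name, metavar, help_text in specs:
        if metavar:  # "=NAME" present: the option takes an argument
            parser.add_option(short, long_name, metavar=metavar[1:], help=help_text)
        else:        # no "=NAME": treat the option as a flag
            parser.add_option(short, long_name, action="store_true", help=help_text)
    return parser


parser = parser_from_doc(USAGE_DOC, prog="replace.py")
options, files = parser.parse_args(["-n", "-x", "foo", "-r", "bar", "a.txt"])
print(options.regx, options.repl, options.nobackup, files)
```
</pre>
</blockquote>
<p>With this approach the docstring becomes the single place where options are 
declared, at the price of having to follow its layout strictly.</p>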
<p>Finally, you should keep in mind that this article only scratches the
surface of <tt class="docutils literal"><span class="pre">optparse</span></tt>, which is quite sophisticated. 
For instance you can specify default values, different destinations, 
a <tt class="docutils literal"><span class="pre">store_false</span></tt> action and much more, even if often you don't need 
all this power. Still, it is handy to have the power at your disposal when 
you need it.  The serious user of <tt class="docutils literal"><span class="pre">optparse</span></tt> is strongly 
encouraged to read the documentation in the standard library, which 
is pretty good and detailed. I think that this article has fulfilled
its function of &quot;appetizer&quot; to <tt class="docutils literal"><span class="pre">optparse</span></tt>, if it has stimulated 
the reader to learn more.</p>
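<p>As a small taste of those features, the sketch below (for Python 3) 
combines a default value, an explicit destination and the 
<tt class="docutils literal"><span class="pre">store_false</span></tt> action. Renaming the destination to <tt class="docutils literal"><span class="pre">backup</span></tt> and giving 
<tt class="docutils literal"><span class="pre">--repl</span></tt> an empty default are choices made for this illustration, not 
part of the script discussed above.</p>
<blockquote>
<pre class="literal-block">
```python
import optparse

parser = optparse.OptionParser("usage: %prog files [options]")
parser.add_option("-x", "--regx", help="regular expression")
# A default replaces None when the option is absent.
parser.add_option("-r", "--repl", default="",
                  help="replacement string (default: empty)")
# -n now clears a positively named "backup" attribute that defaults to True.
parser.add_option("-n", "--nobackup", action="store_false", dest="backup",
                  default=True, help="do not make backup copies")

defaults, _ = parser.parse_args([])
print(defaults.repl == "", defaults.backup)

no_backup, _ = parser.parse_args(["-n"])
print(no_backup.backup)
```
</pre>
</blockquote>
<p>With this arrangement the calling code can test <tt class="docutils literal"><span class="pre">options.backup</span></tt> 
directly instead of negating a <tt class="docutils literal"><span class="pre">nobackup</span></tt> flag.</p>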
</div>
<div class="section" id="references">
<h1><a name="references">References</a></h1>
<ul class="simple">
<li><tt class="docutils literal"><span class="pre">optparse/optik</span></tt> is a sourceforge project on its own:  
<a class="reference" href="http://optik.sourceforge.net">http://optik.sourceforge.net</a></li>
<li>starting from Python 2.3, <tt class="docutils literal"><span class="pre">optparse</span></tt> is included in the standard library:
<a class="reference" href="http://www.python.org/doc/2.3.4/lib/module-optparse.html">http://www.python.org/doc/2.3.4/lib/module-optparse.html</a></li>
<li>I wrote a Python Cookbook recipe about optparse:
<a class="reference" href="http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/278844">http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/278844</a></li>
</ul>
</div>
</div>
</body>
</html>