Diffstat (limited to 'pypers/optparse/paper2.html')
-rwxr-xr-x pypers/optparse/paper2.html 373
1 files changed, 373 insertions, 0 deletions
diff --git a/pypers/optparse/paper2.html b/pypers/optparse/paper2.html
new file mode 100755
index 0000000..67f132a
--- /dev/null
+++ b/pypers/optparse/paper2.html
@@ -0,0 +1,373 @@
+<?xml version="1.0" encoding="utf-8" ?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
+<head>
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+<meta name="generator" content="Docutils 0.3.9: http://docutils.sourceforge.net/" />
+<title>The optparse module: writing command-line tools the easy way</title>
+</head>
+<body>
+<div class="document" id="the-optparse-module-writing-command-line-tools-the-easy-way">
+<h1 class="title">The optparse module: writing command-line tools the easy way</h1>
+<blockquote>
+<table class="docutils field-list" frame="void" rules="none">
+<col class="field-name" />
+<col class="field-body" />
+<tbody valign="top">
+<tr class="field"><th class="field-name">Status:</th><td class="field-body">Draft</td>
+</tr>
+<tr class="field"><th class="field-name">Author:</th><td class="field-body">Michele Simionato</td>
+</tr>
+<tr class="field"><th class="field-name">E-mail:</th><td class="field-body"><a class="reference" href="mailto:michele.simionato&#64;gmail.com">michele.simionato&#64;gmail.com</a></td>
+</tr>
+<tr class="field"><th class="field-name">Date:</th><td class="field-body">May 2004</td>
+</tr>
+</tbody>
+</table>
+</blockquote>
+<p><em>The optparse module is a powerful, flexible, extensible, easy-to-use
+command-line parsing library for Python. Using optparse, you can add
+intelligent, sophisticated handling of command-line options to your
+scripts with very little overhead.</em> -- Greg Ward, optparse author</p>
+<div class="section" id="introduction">
+<h1><a name="introduction">Introduction</a></h1>
+<p>Once upon a time, when graphical interfaces were still only a dream,
+command-line tools were the body and soul of all programming
+tools. Many years have passed since then, but some things have not
+changed: command-line tools are still fast, efficient, portable, easy
+to use and - more importantly - reliable. You can count on them.
+You can expect command-line scripts to work in any situation:
+during the installation phase, during disaster recovery, when
+your window manager breaks down and even on systems with severe
+memory/hardware constraints. When you really need them, command-line
+tools are always there.</p>
+<p>Hence, it is important for a programming language - especially
+one that wants to be called a &quot;scripting&quot; language - to provide
+facilities that help the programmer write command-line
+tools. For a long time, Python support for this kind of task was
+provided by the <tt class="docutils literal"><span class="pre">getopt</span></tt> module. I have never
+been particularly fond of <tt class="docutils literal"><span class="pre">getopt</span></tt>, since it required
+a considerable amount of coding even to parse simple
+command lines. However, with the arrival of Python 2.3 the situation
+has changed: thanks to the great job of Greg Ward (the author of
+<tt class="docutils literal"><span class="pre">optparse</span></tt> a.k.a. <tt class="docutils literal"><span class="pre">Optik</span></tt>), the Python programmer now
+has at her disposal (in the standard library and not as an
+add-on module) a fully fledged object-oriented API for
+parsing command-line arguments, which makes writing Unix-style
+command-line tools easy, efficient and fast.</p>
+<p>The only disadvantage of <tt class="docutils literal"><span class="pre">optparse</span></tt> is that it is a
+sophisticated tool, which takes some time to master fully.
+The purpose of this paper is to help the reader quickly pick up the
+10% of the features of <tt class="docutils literal"><span class="pre">optparse</span></tt> that she will use in 90% of
+the cases. Taking as an example a real-life application - a search and
+replace tool - I will guide the reader through (some of) the wonders
+of <tt class="docutils literal"><span class="pre">optparse</span></tt>. I will also show some tricks that will make your life
+with <tt class="docutils literal"><span class="pre">optparse</span></tt> much happier.
+This paper is intended for both Unix and
+Windows programmers - actually I will argue that Windows programmers
+need <tt class="docutils literal"><span class="pre">optparse</span></tt> even more than Unix programmers - and it does not
+require any particular expertise to be fully appreciated.</p>
+</div>
+<div class="section" id="a-simple-example">
+<h1><a name="a-simple-example">A simple example</a></h1>
+<p>As a pedagogical example I will take a little tool I wrote some time ago,
+a multiple-file search and replace tool. I needed it because I am
+not always working under Unix, and I do not always have sed/awk or
+even Emacs installed, so it made sense to have this
+little Python script in my toolbox. It is only a few lines long,
+it can always be modified and extended with minimal effort,
+it works on every platform (including my PDA) and it has the advantage
+of being completely command-line driven:
+it does not require any graphics library to be installed
+and I can use it when I work on a remote machine via ssh.</p>
+<p>The tool takes a bunch of files and replaces a given regular expression
+everywhere in-place; moreover, it saves a backup copy of the original
+unmodified files and gives me the option to recover
+them when I want to. Of course, all of this can be done more efficiently
+in the Unix world with specialized tools, but those tools are written
+in C and they are not as easily customizable as a Python script, which
+you may change on the spot to suit your needs. So, it makes sense
+to write this kind of utility in Python (or in Perl, but I am writing on
+Pyzine now ;)</p>
+<p>As a final note, let me point out that I find <tt class="docutils literal"><span class="pre">optparse</span></tt>
+to be much more useful in the Windows world than in the Unix/Linux/Mac OS X
+world. The reason is that the plethora
+of pretty good command-line tools which are available under Unix are
+missing in the Windows environment, or do not have a satisfactory
+equivalent. Therefore,
+it makes sense to write a personal collection of command-line scripts
+for your most common tasks, if you need to work on many platforms and
+portability is an important requirement.
+Using Python and <tt class="docutils literal"><span class="pre">optparse</span></tt>, you may write your own scripts
+once and have them run on every platform running Python,
+which means in practice any traditional platform and increasingly
+more of the non-traditional ones - Python is spreading into the
+embedded market too, including PDAs, cellular phones, and more.</p>
+</div>
+<div class="section" id="the-unix-philosophy-for-command-line-arguments">
+<h1><a name="the-unix-philosophy-for-command-line-arguments">The Unix philosophy for command-line arguments</a></h1>
+<p>In order to understand how <tt class="docutils literal"><span class="pre">optparse</span></tt> works, it is essential
+to understand the Unix philosophy about command-line arguments.</p>
+<p>As Greg Ward puts it:</p>
+<p><em>The purpose of optparse is to make it very easy to provide the
+most standard, obvious, straightforward, and user-friendly user
+interface for Unix command-line programs. The optparse philosophy
+is heavily influenced by the Unix and GNU toolkits ...</em></p>
+<p>Here is a brief summary of the terminology:
+the arguments given to a command-line script - <em>i.e.</em> the arguments
+that Python stores in the list <tt class="docutils literal"><span class="pre">sys.argv[1:]</span></tt> - are classified into
+three groups: options, option arguments and positional arguments.
+Options can be recognized because they are prefixed by a dash
+or a double dash; options may or may not take an argument
+(there is at most one option argument right after each option);
+options without arguments are called flags. Positional arguments
+are what is left on the command line after you remove options
+and option arguments.</p>
+<p>In the example of the search/replace tool,
+I will need two options with an argument - I want
+to pass to the script a regular expression and a replacement string -
+and I will need a flag specifying whether or not a backup of the original
+files needs to be performed. Finally, I will need a number of positional
+arguments to store the names of the files on which the search and
+replace will act.</p>
+<p>Consider - for the sake of the example - the following situation:
+you have a bunch of text files in the current directory containing dates
+in the European format DD-MM-YYYY, and you want to convert them to
+the American format MM-DD-YYYY. If you are sure that all your dates
+are in the correct format, you can match them with a simple regular
+expression such as <tt class="docutils literal"><span class="pre">(\d\d)-(\d\d)-(\d\d\d\d)</span></tt>.</p>
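+<p>As a quick check of this regular expression outside the script, here is a
+minimal sketch using only the standard <tt class="docutils literal"><span class="pre">re</span></tt> module, written for a
+current Python interpreter. The helper name <tt class="docutils literal"><span class="pre">swap_day_month</span></tt> and the
+sample sentence are made up for the illustration; the replacement string
+<tt class="docutils literal"><span class="pre">\2-\1-\3</span></tt> is the one passed to the tool in this section.</p>

```python
import re

# The pattern from the text: two digits, two digits, four digits, dash-separated.
DATE_RX = re.compile(r"(\d\d)-(\d\d)-(\d\d\d\d)")

def swap_day_month(text):
    """Swap the first two groups of every matched date (hypothetical helper)."""
    return DATE_RX.sub(r"\2-\1-\3", text)

european = "Released on 25-12-2003, updated on 01-05-2004."
american = swap_day_month(european)
print(american)
```

+<p>Since the substitution only swaps the first two groups, applying it a
+second time gives back the original text.</p>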
+<p>In this particular example it is not so important to make a backup
+copy of the original files, since to revert to the original
+format it is enough to run the script again. So the syntax to use
+would be something like</p>
+<blockquote>
+<pre class="literal-block">
+$&gt; replace.py --nobackup --regx=&quot;(\d\d)-(\d\d)-(\d\d\d\d)&quot; \
+ --repl=&quot;\2-\1-\3&quot; *.txt
+</pre>
+</blockquote>
+<p>To emphasize portability, I have used a generic
+<tt class="docutils literal"><span class="pre">$&gt;</span></tt> prompt, meaning that these examples work equally well on
+both Unix and Windows (of course on Unix I could do the same
+job with sed or awk, but these tools are not as flexible as
+a Python script).</p>
+<p>The syntax here has the advantage of being
+quite clear, but the disadvantage of being quite verbose, and it is
+handier to use abbreviations for the name of the options. For instance,
+sensible abbreviations can be <tt class="docutils literal"><span class="pre">-x</span></tt> for <tt class="docutils literal"><span class="pre">--regx</span></tt>, <tt class="docutils literal"><span class="pre">-r</span></tt> for <tt class="docutils literal"><span class="pre">--repl</span></tt>
+and <tt class="docutils literal"><span class="pre">-n</span></tt> for <tt class="docutils literal"><span class="pre">--nobackup</span></tt>; moreover, the <tt class="docutils literal"><span class="pre">=</span></tt> sign can safely be
+removed. Then the previous command reads</p>
+<blockquote>
+<pre class="literal-block">
+$&gt; replace.py -n -x&quot;(\d\d)-(\d\d)-(\d\d\d\d)&quot; -r&quot;\2-\1-\3&quot; *.txt
+</pre>
+</blockquote>
+<p>You see here the Unix convention at work: one-letter options
+(a.k.a. short options) are prefixed with a single dash, whereas
+long options are prefixed with a double dash. The advantage of the
+convention is that short options can be composed: for instance</p>
+<blockquote>
+<pre class="literal-block">
+$&gt; replace.py -nx &quot;(\d\d)-(\d\d)-(\d\d\d\d)&quot; -r &quot;\2-\1-\3&quot; *.txt
+</pre>
+</blockquote>
+<p>means the same as the previous line, i.e. <tt class="docutils literal"><span class="pre">-nx</span></tt> is parsed as
+<tt class="docutils literal"><span class="pre">-n</span> <span class="pre">-x</span></tt>. You can also freely exchange the order of the options,
+for instance in this way:</p>
+<blockquote>
+<pre class="literal-block">
+$&gt; replace.py -nr &quot;\2-\1-\3&quot; *.txt -x &quot;(\d\d)-(\d\d)-(\d\d\d\d)&quot;
+</pre>
+</blockquote>
+<p>This command will be parsed exactly as before, i.e. options and option
+arguments are not positional.</p>
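+<p>The equivalence of these spellings can be checked without a shell by
+handing explicit argument lists to <tt class="docutils literal"><span class="pre">parse_args</span></tt>. The sketch below is
+written for a current Python interpreter and assumes the three option
+definitions used by the search and replace tool; the helper
+<tt class="docutils literal"><span class="pre">make_parser</span></tt> and the file names <tt class="docutils literal"><span class="pre">a.txt</span></tt> and <tt class="docutils literal"><span class="pre">b.txt</span></tt>
+(standing in for the expansion of <tt class="docutils literal"><span class="pre">*.txt</span></tt>) are placeholders.</p>

```python
import optparse

def make_parser():
    """Parser with the three options of the search and replace tool."""
    parser = optparse.OptionParser("usage: %prog files [options]")
    parser.add_option("-x", "--regx", help="regular expression")
    parser.add_option("-r", "--repl", help="replacement string")
    parser.add_option("-n", "--nobackup", action="store_true",
                      help="do not make backup copies")
    return parser

RX = r"(\d\d)-(\d\d)-(\d\d\d\d)"
REPL = r"\2-\1-\3"

# Four spellings of the same command line, as explicit argument lists.
variants = [
    ["--nobackup", "--regx=" + RX, "--repl=" + REPL, "a.txt", "b.txt"],
    ["-n", "-x" + RX, "-r" + REPL, "a.txt", "b.txt"],   # "=" dropped
    ["-nx", RX, "-r", REPL, "a.txt", "b.txt"],          # -n and -x composed
    ["-nr", REPL, "a.txt", "b.txt", "-x", RX],          # options reordered
]

results = []
for args in variants:
    options, files = make_parser().parse_args(args)
    results.append((options.nobackup, options.regx, options.repl, files))

for result in results:
    print(result)
```

+<p>All four argument lists should produce the same tuple, which is what
+&quot;options and option arguments are not positional&quot; means in practice.</p>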
+</div>
+<div class="section" id="how-does-it-work-in-practice">
+<h1><a name="how-does-it-work-in-practice">How does it work in practice?</a></h1>
+<p>Having stated the requirements, we may start implementing our
+search and replace tool. The first step is to write down the
+documentation string:</p>
+<blockquote>
+<pre class="literal-block">
+#!/usr/bin/env python
+&quot;&quot;&quot;
+Given a sequence of text files, replaces everywhere
+a regular expression x with a replacement string s.
+
+ usage: %prog files [options]
+ -x, --regx=REGX: regular expression
+ -r, --repl=REPL: replacement string
+ -n, --nobackup: do not make backup copies
+&quot;&quot;&quot;
+</pre>
+</blockquote>
+<p>On Windows the first line is unnecessary, but it is good practice to have it
+in the Unix world.</p>
+<p>The next step is to write down a simple search and replace routine:</p>
+<blockquote>
+<pre class="literal-block">
+import re
+
+def replace(regx, repl, files, backup_option=True):
+    rx = re.compile(regx)
+    for fname in files:
+        txt = file(fname, &quot;U&quot;).read() # quick &amp; dirty
+        if backup_option:
+            print &gt;&gt; file(fname+&quot;.bak&quot;, &quot;w&quot;), txt,
+        print &gt;&gt; file(fname, &quot;w&quot;), rx.sub(repl, txt),
+</pre>
+</blockquote>
+<p>This replace routine is entirely unsurprising; the only thing you
+may notice is the use of the &quot;U&quot; option in the line</p>
+<blockquote>
+<pre class="literal-block">
+txt=file(fname,&quot;U&quot;).read()
+</pre>
+</blockquote>
+<p>This is a new feature of Python 2.3. Text files opened with the &quot;U&quot;
+option are read in &quot;Universal&quot; mode: this means that Python takes
+care of the newline pain for you, i.e. this script will work
+correctly everywhere, independently of the newline
+conventions of your operating system. The script works by reading
+the whole file into memory: this is bad practice, and here I am assuming
+that you will use this script only on short files that fit in
+memory; otherwise you should &quot;massage&quot; the code a bit.
+Also, a full-fledged script would check that the file exists
+and can be read, and would do something sensible if it cannot.</p>
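+<p>One possible way to &quot;massage&quot; the routine is sketched below. It is not
+the code of the original tool: it is written in present-day Python syntax
+rather than that of Python 2.3, and the skip-and-report policy, the use of
+<tt class="docutils literal"><span class="pre">shutil.copyfile</span></tt> for the backup and the returned list of changed
+files are all assumptions made for the illustration. It still reads each
+file into memory in one go.</p>

```python
import os
import re
import shutil

def replace(regx, repl, fnames, backup=True):
    """Replace regx with repl in each file; return the names actually rewritten."""
    rx = re.compile(regx)
    changed = []
    for fname in fnames:
        if not os.path.isfile(fname):
            # Assumed policy: report the problem and carry on with the rest.
            print("skipping %s: not a regular file" % fname)
            continue
        with open(fname) as f:   # text mode normalizes newlines on read
            txt = f.read()
        if backup:
            # Byte-for-byte copy, so the backup keeps the original newlines.
            shutil.copyfile(fname, fname + ".bak")
        with open(fname, "w") as f:
            f.write(rx.sub(repl, txt))
        changed.append(fname)
    return changed
```

+<p>The <tt class="docutils literal"><span class="pre">with</span></tt> blocks close every file as soon as it has been read or
+written, which the quick and dirty version leaves to the interpreter.</p>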
+<p>So, how does it work? It is quite simple, really.
+First you need to instantiate a command-line parser from
+the <tt class="docutils literal"><span class="pre">OptionParser</span></tt> class provided by <tt class="docutils literal"><span class="pre">optparse</span></tt>:</p>
+<blockquote>
+<pre class="literal-block">
+import optparse
+parser = optparse.OptionParser(&quot;usage: %prog files [options]&quot;)
+</pre>
+</blockquote>
+<p>The string <tt class="docutils literal"><span class="pre">&quot;usage:</span> <span class="pre">%prog</span> <span class="pre">files</span> <span class="pre">[options]&quot;</span></tt> will be used to
+print a customized usage message, where <tt class="docutils literal"><span class="pre">%prog</span></tt> will be replaced
+by the name of the script (in this case <tt class="docutils literal"><span class="pre">replace.py</span></tt>). You
+may safely omit it and <tt class="docutils literal"><span class="pre">optparse</span></tt> will use a default
+<tt class="docutils literal"><span class="pre">&quot;usage:</span> <span class="pre">%prog</span> <span class="pre">[options]&quot;</span></tt> string.</p>
+<p>Then you tell the parser which options
+it must recognize:</p>
+<blockquote>
+<pre class="literal-block">
+parser.add_option(&quot;-x&quot;, &quot;--regx&quot;,
+ help=&quot;regular expression&quot;)
+parser.add_option(&quot;-r&quot;, &quot;--repl&quot;,
+ help=&quot;replacement string&quot;)
+parser.add_option(&quot;-n&quot;, &quot;--nobackup&quot;,
+ action=&quot;store_true&quot;,
+ help=&quot;do not make backup copies&quot;)
+</pre>
+</blockquote>
+<p>The <tt class="docutils literal"><span class="pre">help</span></tt> keyword argument is intended to document the
+intent of the given option; it is also used by <tt class="docutils literal"><span class="pre">optparse</span></tt> in the
+usage message. The <tt class="docutils literal"><span class="pre">action=&quot;store_true&quot;</span></tt> keyword argument is
+used to distinguish flags from options with arguments: it tells
+<tt class="docutils literal"><span class="pre">optparse</span></tt> to set the flag <tt class="docutils literal"><span class="pre">nobackup</span></tt> to <tt class="docutils literal"><span class="pre">True</span></tt> if <tt class="docutils literal"><span class="pre">-n</span></tt>
+or <tt class="docutils literal"><span class="pre">--nobackup</span></tt> is given on the command line.</p>
+<p>Finally, you tell the parser to do its job and parse the command line:</p>
+<blockquote>
+<pre class="literal-block">
+option, files = parser.parse_args()
+</pre>
+</blockquote>
+<p>The <tt class="docutils literal"><span class="pre">.parse_args()</span></tt> method returns two values: <tt class="docutils literal"><span class="pre">option</span></tt>,
+which is an instance of the <tt class="docutils literal"><span class="pre">optparse.Values</span></tt> class, and <tt class="docutils literal"><span class="pre">files</span></tt>,
+which is a list of positional arguments.
+The <tt class="docutils literal"><span class="pre">option</span></tt> object has attributes - called <em>destinations</em> in
+<tt class="docutils literal"><span class="pre">optparse</span></tt> terminology - corresponding to the given options.
+In our example, <tt class="docutils literal"><span class="pre">option</span></tt> will have the attributes <tt class="docutils literal"><span class="pre">option.regx</span></tt>,
+<tt class="docutils literal"><span class="pre">option.repl</span></tt> and <tt class="docutils literal"><span class="pre">option.nobackup</span></tt>.</p>
+<p>Attributes corresponding to options that were not given on the command line
+are initialized to <tt class="docutils literal"><span class="pre">None</span></tt>; the others are initialized to
+the corresponding option argument. In particular, flag options are set to
+<tt class="docutils literal"><span class="pre">True</span></tt> if they are given, and to <tt class="docutils literal"><span class="pre">None</span></tt> otherwise. So, in our example
+<tt class="docutils literal"><span class="pre">option.nobackup</span></tt> is <tt class="docutils literal"><span class="pre">True</span></tt> if the flag <tt class="docutils literal"><span class="pre">-n</span></tt> or <tt class="docutils literal"><span class="pre">--nobackup</span></tt>
+is given.
+The list <tt class="docutils literal"><span class="pre">files</span></tt> contains the files passed
+on the command line (assuming you passed
+the names of accessible text files on your system).</p>
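+<p>These rules are easy to verify interactively. The following sketch,
+written for a current Python interpreter, rebuilds the parser and calls
+<tt class="docutils literal"><span class="pre">parse_args</span></tt> on explicit argument lists instead of the real command
+line; the argument values <tt class="docutils literal"><span class="pre">foo</span></tt>, <tt class="docutils literal"><span class="pre">a.txt</span></tt> and <tt class="docutils literal"><span class="pre">b.txt</span></tt> are
+placeholders.</p>

```python
import optparse

parser = optparse.OptionParser("usage: %prog files [options]")
parser.add_option("-x", "--regx", help="regular expression")
parser.add_option("-r", "--repl", help="replacement string")
parser.add_option("-n", "--nobackup", action="store_true",
                  help="do not make backup copies")

# No options at all: every destination is left at None.
empty, no_files = parser.parse_args([])
print(empty.regx, empty.repl, empty.nobackup, no_files)

# A flag, one option with an argument, and two positional arguments.
given, files = parser.parse_args(["-n", "-x", "foo", "a.txt", "b.txt"])
print(given.regx, given.repl, given.nobackup, files)

# The first return value is a plain attribute container.
print(type(given).__name__)
```

+<p>Passing a list to <tt class="docutils literal"><span class="pre">parse_args</span></tt> is also a convenient way to test a
+script without touching <tt class="docutils literal"><span class="pre">sys.argv</span></tt>.</p>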
+<p>The main logic can be as simple as the following:</p>
+<blockquote>
+<pre class="literal-block">
+if not files:
+    print &quot;No files given!&quot;
+elif option.regx and option.repl:
+    replace(option.regx, option.repl, files, not option.nobackup)
+else:
+    print &quot;Missing options or unrecognized options.&quot;
+    print __doc__ # documentation on how to use the script
+</pre>
+</blockquote>
+<p>A nice feature of <tt class="docutils literal"><span class="pre">optparse</span></tt> is that a help option is automatically
+created, so <tt class="docutils literal"><span class="pre">replace.py</span> <span class="pre">-h</span></tt> (or <tt class="docutils literal"><span class="pre">replace.py</span> <span class="pre">--help</span></tt>) will work as
+you may expect:</p>
+<blockquote>
+<pre class="literal-block">
+$&gt; replace.py --help
+usage: replace.py files [options]
+
+
+options:
+  -h, --help           show this help message and exit
+  -xREGX, --regx=REGX  regular expression
+  -rREPL, --repl=REPL  replacement string
+  -n, --nobackup       do not make backup copies
+</pre>
+</blockquote>
+<p>You may programmatically print the usage message by invoking
+<tt class="docutils literal"><span class="pre">parser.print_help()</span></tt>.</p>
+<p>At this point you may test your script and see that it works as
+advertised.</p>
+</div>
+<div class="section" id="how-to-reduce-verbosity-and-make-your-life-with-optparse-happier">
+<h1><a name="how-to-reduce-verbosity-and-make-your-life-with-optparse-happier">How to reduce verbosity and make your life with <tt class="docutils literal"><span class="pre">optparse</span></tt> happier</a></h1>
+<p>The power of <tt class="docutils literal"><span class="pre">optparse</span></tt> comes with a penalty: using <tt class="docutils literal"><span class="pre">optparse</span></tt> in
+the standard way, as I explained before, involves a certain amount of
+verbosity/redundancy.</p>
+<p>Suppose for instance
+I want to add the ability to restore the original file from the backup copy.
+Then we have to change the script in three places: in the docstring,
+in the <tt class="docutils literal"><span class="pre">add_option</span></tt> list, and in the <tt class="docutils literal"><span class="pre">if</span> <span class="pre">..</span> <span class="pre">elif</span> <span class="pre">..</span> <span class="pre">else</span> <span class="pre">...</span></tt>
+statement. At least one of these is redundant.</p>
+<p>The redundancy can be removed by parsing the docstring to infer the
+options to be recognized. This avoids the boring task
+of writing the <tt class="docutils literal"><span class="pre">parser.add_option</span></tt> lines by hand.
+I implemented this idea in a cookbook recipe, by writing an
+<tt class="docutils literal"><span class="pre">optionparse</span></tt> module which is just a thin wrapper around <tt class="docutils literal"><span class="pre">optparse</span></tt>.
+For the sake of space, I cannot repeat it here, but you can find the code
+and a small explanation in the Python Cookbook (see the reference below).
+It is really easy to use. For instance, the paper you are
+reading now has been written by using <tt class="docutils literal"><span class="pre">optionparse</span></tt>: I used it to
+write a simple wrapper to docutils - the standard
+Python tool which converts (restructured) text files to HTML pages.
+It is also nice to note that internally
+docutils itself uses <tt class="docutils literal"><span class="pre">optparse</span></tt> to do its job, so actually this
+paper has been composed by using <tt class="docutils literal"><span class="pre">optparse</span></tt> twice!</p>
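+<p>To give a flavour of the idea, here is a rough sketch of how option
+definitions could be inferred from a docstring laid out like the one used
+in this paper. It is not the Cookbook recipe: the helper
+<tt class="docutils literal"><span class="pre">parser_from_doc</span></tt>, the <tt class="docutils literal"><span class="pre">OPTION_LINE</span></tt> pattern and the assumption
+that every option line reads <tt class="docutils literal"><span class="pre">-s,</span> <span class="pre">--long[=ARG]:</span> <span class="pre">help</span></tt> are mine, and the
+code targets a current Python interpreter.</p>

```python
import optparse
import re

DOC = """
Given a sequence of text files, replaces everywhere
a regular expression x with a replacement string s.

  usage: %prog files [options]
  -x, --regx=REGX: regular expression
  -r, --repl=REPL: replacement string
  -n, --nobackup: do not make backup copies
"""

# Assumed layout, one option per line: "-s, --long[=ARG]: help text".
OPTION_LINE = re.compile(r"^\s*(-\w), (--\w+)(=\w+)?: (.*)$")

def parser_from_doc(doc):
    """Build an OptionParser from a docstring (hypothetical helper)."""
    usage = None
    specs = []
    for line in doc.splitlines():
        stripped = line.strip()
        if stripped.startswith("usage:"):
            usage = stripped
            continue
        match = OPTION_LINE.match(line)
        if match:
            specs.append(match.groups())
    parser = optparse.OptionParser(usage)
    for short, long_, arg, help_text in specs:
        # A line without "=ARG" describes a flag.
        action = "store" if arg else "store_true"
        parser.add_option(short, long_, action=action, help=help_text)
    return parser

options, files = parser_from_doc(DOC).parse_args(["-n", "-x", "a+", "f.txt"])
print(options.regx, options.repl, options.nobackup, files)
```

+<p>With this approach the docstring is the single place where an option
+is declared, so adding a new option means adding one line of
+documentation plus the code that acts on it.</p>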
+<p>Finally, you should keep in mind that this article only scratches the
+surface of <tt class="docutils literal"><span class="pre">optparse</span></tt>, which is quite sophisticated.
+For instance you can specify default values, different destinations,
+a <tt class="docutils literal"><span class="pre">store_false</span></tt> action and much more, even if often you don't need
+all this power. Still, it is handy to have the power at your disposal when
+you need it. The serious user of <tt class="docutils literal"><span class="pre">optparse</span></tt> is strongly
+encouraged to read the documentation in the standard library, which
+is pretty good and detailed. I think that this article has fulfilled
+its function of &quot;appetizer&quot; to <tt class="docutils literal"><span class="pre">optparse</span></tt>, if it has stimulated
+the reader to learn more.</p>
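+<p>As a small taste of those extra features, the sketch below combines a
+default value, an explicit destination and the <tt class="docutils literal"><span class="pre">store_false</span></tt> action.
+The attribute name <tt class="docutils literal"><span class="pre">backup</span></tt> and the empty default for <tt class="docutils literal"><span class="pre">--repl</span></tt>
+are choices made for this illustration, not part of the original tool; the
+code targets a current Python interpreter.</p>

```python
import optparse

parser = optparse.OptionParser("usage: %prog files [options]")
parser.add_option("-x", "--regx", help="regular expression")
# default: the value used when the option is absent, instead of None.
parser.add_option("-r", "--repl", default="",
                  help="replacement string (empty by default)")
# dest + store_false: -n switches off a positively named attribute.
parser.add_option("-n", "--nobackup", action="store_false", dest="backup",
                  default=True, help="do not make backup copies")

defaults, _ = parser.parse_args([])
print(defaults.regx, repr(defaults.repl), defaults.backup)

options, files = parser.parse_args(["-n", "-x", "foo", "a.txt"])
print(options.regx, repr(options.repl), options.backup, files)
```

+<p>With this variant the caller can test <tt class="docutils literal"><span class="pre">options.backup</span></tt> directly
+instead of negating <tt class="docutils literal"><span class="pre">options.nobackup</span></tt>.</p>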
+</div>
+<div class="section" id="references">
+<h1><a name="references">References</a></h1>
+<ul class="simple">
+<li><tt class="docutils literal"><span class="pre">optparse/optik</span></tt> is a SourceForge project in its own right:
+<a class="reference" href="http://optik.sourceforge.net">http://optik.sourceforge.net</a></li>
+<li>starting from Python 2.3, <tt class="docutils literal"><span class="pre">optparse</span></tt> is included in the standard library:
+<a class="reference" href="http://www.python.org/doc/2.3.4/lib/module-optparse.html">http://www.python.org/doc/2.3.4/lib/module-optparse.html</a></li>
+<li>I wrote a Python Cookbook recipe about optparse:
+<a class="reference" href="http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/278844">http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/278844</a></li>
+</ul>
+</div>
+</div>
+</body>
+</html>