Diffstat (limited to 'pypers/optparse/paper2.html')
-rwxr-xr-x | pypers/optparse/paper2.html | 373 |
1 files changed, 373 insertions, 0 deletions
diff --git a/pypers/optparse/paper2.html b/pypers/optparse/paper2.html new file mode 100755 index 0000000..67f132a --- /dev/null +++ b/pypers/optparse/paper2.html @@ -0,0 +1,373 @@ +<?xml version="1.0" encoding="utf-8" ?> +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> +<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> +<head> +<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> +<meta name="generator" content="Docutils 0.3.9: http://docutils.sourceforge.net/" /> +<title>The optparse module: writing command-line tools the easy way</title> +</head> +<body> +<div class="document" id="the-optparse-module-writing-command-line-tools-the-easy-way"> +<h1 class="title">The optparse module: writing command-line tools the easy way</h1> +<blockquote> +<table class="docutils field-list" frame="void" rules="none"> +<col class="field-name" /> +<col class="field-body" /> +<tbody valign="top"> +<tr class="field"><th class="field-name">Status:</th><td class="field-body">Draft</td> +</tr> +<tr class="field"><th class="field-name">Author:</th><td class="field-body">Michele Simionato</td> +</tr> +<tr class="field"><th class="field-name">E-mail:</th><td class="field-body"><a class="reference" href="mailto:michele.simionato@gmail.com">michele.simionato@gmail.com</a></td> +</tr> +<tr class="field"><th class="field-name">Date:</th><td class="field-body">May 2004</td> +</tr> +</tbody> +</table> +</blockquote> +<p><em>The optparse module is a powerful, flexible, extensible, easy-to-use +command-line parsing library for Python. 
Using optparse, you can add +intelligent, sophisticated handling of command-line options to your +scripts with very little overhead.</em> -- Greg Ward, optparse author</p> +<div class="section" id="introduction"> +<h1><a name="introduction">Introduction</a></h1> +<p>Once upon a time, when graphical interfaces were still to be dreamed +about, command-line tools were the body and the soul of all programming +tools. Many years have passed since then, but some things have not +changed: command-line tools are still fast, efficient, portable, easy +to use and - more importantly - reliable. You can count on them. +You can expect command-line scripts to work in any situation, +during the installation phase, in a situation of disaster recovery, when +your window manager breaks down and even in systems with severe +memory/hardware constraints. When you really need them, command-line +tools are always there.</p> +<p>Hence, it is important for a programming language - especially +one that wants to be called a "scripting" language - to provide +facilities to help the programmer in the task of writing command-line +tools. For a long time Python's support for this kind of task has +been provided by the <tt class="docutils literal"><span class="pre">getopt</span></tt> module. I have never +been particularly fond of <tt class="docutils literal"><span class="pre">getopt</span></tt>, since it required +a considerable amount of coding even for the parsing of simple +command lines. However, with the coming of Python 2.3 the situation +has changed: thanks to the great work of Greg Ward (the author of +<tt class="docutils literal"><span class="pre">optparse</span></tt> a.k.a. 
<tt class="docutils literal"><span class="pre">Optik</span></tt>) now the Python programmer +has at her disposal (in the standard library and not as an +add-on module) a fully fledged object-oriented API for +command-line argument parsing, which makes writing Unix-style +command-line tools easy, efficient and fast.</p> +<p>The only disadvantage of <tt class="docutils literal"><span class="pre">optparse</span></tt> is that it is a +sophisticated tool, which requires some time to be fully mastered. +The purpose of this paper is to help the reader to rapidly get the +10% of the features of <tt class="docutils literal"><span class="pre">optparse</span></tt> that she will use in 90% of +the cases. Taking as an example a real life application - a search and +replace tool - I will guide the reader through (some of) the wonders +of <tt class="docutils literal"><span class="pre">optparse</span></tt>. Also, I will show some tricks that will make your life +with <tt class="docutils literal"><span class="pre">optparse</span></tt> much happier. +This paper is intended for both Unix and +Windows programmers - actually I will argue that Windows programmers +need <tt class="docutils literal"><span class="pre">optparse</span></tt> even more than Unix programmers; it does not +require any particular expertise to be fully appreciated.</p> +</div> +<div class="section" id="a-simple-example"> +<h1><a name="a-simple-example">A simple example</a></h1> +<p>I will take as a pedagogical example a little tool I wrote some time ago, +a multi-file search and replace tool. I needed it because I am +not always working under Unix, and I do not always have sed/awk or +even Emacs installed, so it made sense to have this +little Python script in my toolbox. 
It is only a few lines long, +it can always be modified and extended with minimal effort, +it works on every platform (including my PDA) and has the advantage +of being completely command-line driven: +it does not require any graphics library to be installed +and I can use it when I work on a remote machine via ssh.</p> +<p>The tool takes a bunch of files and replaces a given regular expression +everywhere in-place; moreover, it saves a backup copy of the original +un-modified files and gives the option to recover +them when I want to. Of course, all of this can be done more efficiently +in the Unix world with specialized tools, but those tools are written +in C and they are not as easily customizable as a Python script, which +you may change in real time to suit your needs. So, it makes sense +to write this kind of utility in Python (or in Perl, but I am writing for +Pyzine now ;)</p> +<p>As a final note, let me remark that I find <tt class="docutils literal"><span class="pre">optparse</span></tt> +to be much more useful in the Windows world than in the Unix/Linux/Mac OS X +world. The reason is that the plethora +of pretty good command-line tools available under Unix is +missing in the Windows environment, or has no satisfactory +equivalent. Therefore, +it makes sense to write a personal collection of command-line scripts +for your most common tasks, if you need to work on many platforms and +portability is an important requirement. 
+Using Python and <tt class="docutils literal"><span class="pre">optparse</span></tt>, you may write your own scripts +once and have them run on every platform running Python, +which means in practice any traditional platform and increasingly +more of the non-traditional ones - Python is spreading into the +embedded market too, including PDAs, cellular phones, and more.</p> +</div> +<div class="section" id="the-unix-philosophy-for-command-line-arguments"> +<h1><a name="the-unix-philosophy-for-command-line-arguments">The Unix philosophy for command-line arguments</a></h1> +<p>In order to understand how <tt class="docutils literal"><span class="pre">optparse</span></tt> works, it is essential +to understand the Unix philosophy about command-line arguments.</p> +<p>As Greg Ward puts it:</p> +<p><em>The purpose of optparse is to make it very easy to provide the +most standard, obvious, straightforward, and user-friendly user +interface for Unix command-line programs. The optparse philosophy +is heavily influenced by the Unix and GNU toolkits ...</em></p> +<p>Here is a brief summary of the terminology: +the arguments given to a command-line script - <em>i.e.</em> the arguments +that Python stores in the list <tt class="docutils literal"><span class="pre">sys.argv[1:]</span></tt> - are classified into +three groups: options, option arguments and positional arguments. +Options can be distinguished since they are prefixed by a dash +or a double dash; options can have arguments or not +(there is at most one option argument right after each option); +options without arguments are called flags. 
Positional arguments +are what is left in the command line after you remove options +and option arguments.</p> +<p>In the example of the search/replace tool, +I will need two options with an argument - I want +to pass to the script a regular expression and a replacement string - +and I will need a flag specifying whether or not a backup of the original +files needs to be made. Finally, I will need a number of positional +arguments to store the names of the files on which the search and +replace will act.</p> +<p>Consider - for the sake of the example - the following situation: +you have a bunch of text files in the current directory containing dates +in the European format DD-MM-YYYY, and you want to convert them to +the American format MM-DD-YYYY. If you are sure that all your dates +are in the correct format, you can match them with a simple regular +expression such as <tt class="docutils literal"><span class="pre">(\d\d)-(\d\d)-(\d\d\d\d)</span></tt>.</p> +<p>In this particular example it is not so important to make a backup +copy of the original files, since to revert to the original +format it is enough to run the script again. So the syntax to use +would be something like</p> +<blockquote> +<pre class="literal-block"> +$> replace.py --nobackup --regx="(\d\d)-(\d\d)-(\d\d\d\d)" \ + --repl="\2-\1-\3" *.txt +</pre> +</blockquote> +<p>In order to emphasize the portability, I have used a generic +<tt class="docutils literal"><span class="pre">$></span></tt> prompt, meaning that these examples work equally well on +both Unix and Windows (of course on Unix I could do the same +job with sed or awk, but these tools are not as flexible as +a Python script).</p> +<p>The syntax here has the advantage of being +quite clear, but the disadvantage of being quite verbose, and it is +handier to use abbreviations for the names of the options. 
For instance, +sensible abbreviations can be <tt class="docutils literal"><span class="pre">-x</span></tt> for <tt class="docutils literal"><span class="pre">--regx</span></tt>, <tt class="docutils literal"><span class="pre">-r</span></tt> for <tt class="docutils literal"><span class="pre">--repl</span></tt> +and <tt class="docutils literal"><span class="pre">-n</span></tt> for <tt class="docutils literal"><span class="pre">--nobackup</span></tt>; moreover, the <tt class="docutils literal"><span class="pre">=</span></tt> sign can safely be +removed. Then the previous command reads</p> +<blockquote> +<pre class="literal-block"> +$> replace.py -n -x"(\d\d)-(\d\d)-(\d\d\d\d)" -r"\2-\1-\3" *.txt +</pre> +</blockquote> +<p>You see here the Unix convention at work: one-letter options +(a.k.a. short options) are prefixed with a single dash, whereas +long options are prefixed with a double dash. The advantage of the +convention is that short options can be combined: for instance</p> +<blockquote> +<pre class="literal-block"> +$> replace.py -nx "(\d\d)-(\d\d)-(\d\d\d\d)" -r "\2-\1-\3" *.txt +</pre> +</blockquote> +<p>means the same as the previous line, i.e. <tt class="docutils literal"><span class="pre">-nx</span></tt> is parsed as +<tt class="docutils literal"><span class="pre">-n</span> <span class="pre">-x</span></tt>. You can also freely exchange the order of the options, +for instance in this way:</p> +<blockquote> +<pre class="literal-block"> +$> replace.py -nr "\2-\1-\3" *.txt -x "(\d\d)-(\d\d)-(\d\d\d\d)" +</pre> +</blockquote> +<p>This command will be parsed exactly as before, i.e. options and option +arguments are not positional.</p> +</div> +<div class="section" id="how-does-it-work-in-practice"> +<h1><a name="how-does-it-work-in-practice">How does it work in practice?</a></h1> +<p>Having stated the requirements, we may start implementing our +search and replace tool. 
The first step is to write down the +documentation string:</p> +<blockquote> +<pre class="literal-block"> +#!/usr/bin/env python +""" +Given a sequence of text files, replaces everywhere +a regular expression x with a replacement string s. + + usage: %prog files [options] + -x, --regx=REGX: regular expression + -r, --repl=REPL: replacement string + -n, --nobackup: do not make backup copies +""" +</pre> +</blockquote> +<p>On Windows the first line is unnecessary, but it is good practice to have it +in the Unix world.</p> +<p>The next step is to write down a simple search and replace routine:</p> +<blockquote> +<pre class="literal-block"> +import re + +def replace(regx, repl, files, backup_option=True): + rx = re.compile(regx) + for fname in files: + txt = file(fname, "U").read() # quick & dirty + if backup_option: + print >> file(fname+".bak", "w"), txt, + print >> file(fname, "w"), rx.sub(repl, txt), +</pre> +</blockquote> +<p>This replace routine is entirely unsurprising; the only thing you +may notice is the usage of the "U" option in the line</p> +<blockquote> +<pre class="literal-block"> +txt = file(fname, "U").read() +</pre> +</blockquote> +<p>This is a new feature of Python 2.3. Text files opened with the "U" +option are read in "Universal" mode: this means that Python takes +care of the newline pain for you, i.e. this script will work +correctly everywhere, independently of the newline +conventions of your operating system. The script works by reading +the whole file into memory: this is bad practice, and here I am assuming +that you will use this script only on short files that will fit in +your memory, otherwise you should "massage" the code a bit. +Also, a fully fledged script would check whether the file exists +and can be read, and would do something sensible if it cannot.</p> +<p>So, how does it work? It is quite simple, really. 
+First you need to instantiate a command-line parser from +the <tt class="docutils literal"><span class="pre">OptionParser</span></tt> class provided by <tt class="docutils literal"><span class="pre">optparse</span></tt>:</p> +<blockquote> +<pre class="literal-block"> +import optparse +parser = optparse.OptionParser("usage: %prog files [options]") +</pre> +</blockquote> +<p>The string <tt class="docutils literal"><span class="pre">"usage:</span> <span class="pre">%prog</span> <span class="pre">files</span> <span class="pre">[options]"</span></tt> will be used to +print a customized usage message, where <tt class="docutils literal"><span class="pre">%prog</span></tt> will be replaced +by the name of the script (in this case <tt class="docutils literal"><span class="pre">replace.py</span></tt>). You +may safely omit it and <tt class="docutils literal"><span class="pre">optparse</span></tt> will use a default +<tt class="docutils literal"><span class="pre">"usage:</span> <span class="pre">%prog</span> <span class="pre">[options]"</span></tt> string.</p> +<p>Then, you tell the parser which options +it must recognize:</p> +<blockquote> +<pre class="literal-block"> +parser.add_option("-x", "--regx", + help="regular expression") +parser.add_option("-r", "--repl", + help="replacement string") +parser.add_option("-n", "--nobackup", + action="store_true", + help="do not make backup copies") +</pre> +</blockquote> +<p>The <tt class="docutils literal"><span class="pre">help</span></tt> keyword argument is intended to document the +intent of the given option; it is also used by <tt class="docutils literal"><span class="pre">optparse</span></tt> in the +usage message. 
The <tt class="docutils literal"><span class="pre">action="store_true"</span></tt> keyword argument is +used to distinguish flags from options with arguments; it tells +<tt class="docutils literal"><span class="pre">optparse</span></tt> to set the flag <tt class="docutils literal"><span class="pre">nobackup</span></tt> to <tt class="docutils literal"><span class="pre">True</span></tt> if <tt class="docutils literal"><span class="pre">-n</span></tt> +or <tt class="docutils literal"><span class="pre">--nobackup</span></tt> is given in the command line.</p> +<p>Finally, you tell the parser to do its job and to parse the command line:</p> +<blockquote> +<pre class="literal-block"> +option, files = parser.parse_args() +</pre> +</blockquote> +<p>The <tt class="docutils literal"><span class="pre">.parse_args()</span></tt> method returns two values: <tt class="docutils literal"><span class="pre">option</span></tt>, +which is an instance of the <tt class="docutils literal"><span class="pre">optparse.Values</span></tt> class, and <tt class="docutils literal"><span class="pre">files</span></tt>, +which is a list of positional arguments. +The <tt class="docutils literal"><span class="pre">option</span></tt> object has attributes - called <em>destinations</em> in +<tt class="docutils literal"><span class="pre">optparse</span></tt> terminology - corresponding to the given options. +In our example, <tt class="docutils literal"><span class="pre">option</span></tt> will have the attributes <tt class="docutils literal"><span class="pre">option.regx</span></tt>, +<tt class="docutils literal"><span class="pre">option.repl</span></tt> and <tt class="docutils literal"><span class="pre">option.nobackup</span></tt>.</p> +<p>If no options are passed on the command line, all these attributes +are initialized to <tt class="docutils literal"><span class="pre">None</span></tt>, otherwise they are initialized to +the option argument. 
In particular, flag options are initialized to +<tt class="docutils literal"><span class="pre">True</span></tt> if they are given, to <tt class="docutils literal"><span class="pre">None</span></tt> otherwise. So, in our example +<tt class="docutils literal"><span class="pre">option.nobackup</span></tt> is <tt class="docutils literal"><span class="pre">True</span></tt> if the flag <tt class="docutils literal"><span class="pre">-n</span></tt> or <tt class="docutils literal"><span class="pre">--nobackup</span></tt> +is given. +The list <tt class="docutils literal"><span class="pre">files</span></tt> contains the files passed +on the command line (assuming you passed +the names of accessible text files on your system).</p> +<p>The main logic can be as simple as the following:</p> +<blockquote> +<pre class="literal-block"> +if not files: + print "No files given!" +elif option.regx and option.repl: + replace(option.regx, option.repl, files, not option.nobackup) +else: + print "Missing options or unrecognized options." 
+ print __doc__ # documentation on how to use the script +</pre> +</blockquote> +<p>A nice feature of <tt class="docutils literal"><span class="pre">optparse</span></tt> is that a help option is automatically +created, so <tt class="docutils literal"><span class="pre">replace.py</span> <span class="pre">-h</span></tt> (or <tt class="docutils literal"><span class="pre">replace.py</span> <span class="pre">--help</span></tt>) will work as +you may expect:</p> +<blockquote> +<pre class="literal-block"> +$> replace.py --help +usage: replace.py files [options] + + +options: + -h, --help show this help message and exit + -xREGX, --regx=REGX regular expression + -rREPL, --repl=REPL replacement string + -n, --nobackup do not make backup copies +</pre> +</blockquote> +<p>You may programmatically print the usage message by invoking +<tt class="docutils literal"><span class="pre">parser.print_help()</span></tt>.</p> +<p>At this point you may test your script and see that it works as +advertised.</p> +</div> +<div class="section" id="how-to-reduce-verbosity-and-make-your-life-with-optparse-happier"> +<h1><a name="how-to-reduce-verbosity-and-make-your-life-with-optparse-happier">How to reduce verbosity and make your life with <tt class="docutils literal"><span class="pre">optparse</span></tt> happier</a></h1> +<p>The power of <tt class="docutils literal"><span class="pre">optparse</span></tt> comes with a penalty: using <tt class="docutils literal"><span class="pre">optparse</span></tt> in +the standard way, as I explained before, involves a certain amount of +verbosity/redundancy.</p> +<p>Suppose, for instance, that +I want to add the ability to restore the original file from the backup copy. 
+Then, we have to change the script in three places: in the docstring, +in the <tt class="docutils literal"><span class="pre">add_option</span></tt> list, and in the <tt class="docutils literal"><span class="pre">if</span> <span class="pre">..</span> <span class="pre">elif</span> <span class="pre">..</span> <span class="pre">else</span> <span class="pre">...</span></tt> +statement. At least one of these is redundant.</p> +<p>The redundancy can be removed by parsing the docstring to infer the +options to be recognized. This avoids the boring task +of writing the <tt class="docutils literal"><span class="pre">parser.add_option</span></tt> lines by hand. +I implemented this idea in a cookbook recipe, by writing an +<tt class="docutils literal"><span class="pre">optionparse</span></tt> module which is just a thin wrapper around <tt class="docutils literal"><span class="pre">optparse</span></tt>. +For the sake of space, I cannot repeat it here, but you can find the code +and a small explanation in the Python Cookbook (see the reference below). +It is really easy to use. For instance, the paper you are +reading now has been written by using <tt class="docutils literal"><span class="pre">optionparse</span></tt>: I used it to +write a simple wrapper around docutils - the standard +Python tool which converts (restructured) text files to HTML pages. +It is also nice to note that internally +docutils itself uses <tt class="docutils literal"><span class="pre">optparse</span></tt> to do its job, so actually this +paper has been composed by using <tt class="docutils literal"><span class="pre">optparse</span></tt> twice!</p> +<p>Finally, you should keep in mind that this article only scratches the +surface of <tt class="docutils literal"><span class="pre">optparse</span></tt>, which is quite sophisticated. 
+For instance, you can specify default values, different destinations, +a <tt class="docutils literal"><span class="pre">store_false</span></tt> action and much more, even if often you don't need +all this power. Still, it is handy to have the power at your disposal when +you need it. The serious user of <tt class="docutils literal"><span class="pre">optparse</span></tt> is strongly +encouraged to read the documentation in the standard library, which +is pretty good and detailed. I think that this article has fulfilled +its function of "appetizer" to <tt class="docutils literal"><span class="pre">optparse</span></tt>, if it has stimulated +the reader to learn more.</p> +</div> +<div class="section" id="references"> +<h1><a name="references">References</a></h1> +<ul class="simple"> +<li><tt class="docutils literal"><span class="pre">optparse/optik</span></tt> is a SourceForge project on its own: +<a class="reference" href="http://optik.sourceforge.net">http://optik.sourceforge.net</a></li> +<li>starting from Python 2.3, <tt class="docutils literal"><span class="pre">optparse</span></tt> is included in the standard library: +<a class="reference" href="http://www.python.org/doc/2.3.4/lib/module-optparse.html">http://www.python.org/doc/2.3.4/lib/module-optparse.html</a></li> +<li>I wrote a Python Cookbook recipe about optparse: +<a class="reference" href="http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/278844">http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/278844</a></li> +</ul> +</div> +</div> +</body> +</html>
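The parser set up in "How does it work in practice?" can be tried out without touching any files. What follows is only a minimal sketch, assuming a Python 3 interpreter in which `optparse` is still available in the standard library (the article's own listings are Python 2). The names `build_parser` and `replace_text` are hypothetical helpers introduced here for illustration; they do not appear in the article. The sketch feeds an explicit argument list to `parse_args` rather than relying on `sys.argv[1:]`, using the date-conversion command from the Unix philosophy section.

```python
import optparse
import re


def build_parser():
    """Build a parser with the three options described in the article.

    build_parser is a hypothetical helper name, not part of the article.
    """
    parser = optparse.OptionParser("usage: %prog files [options]")
    parser.add_option("-x", "--regx", help="regular expression")
    parser.add_option("-r", "--repl", help="replacement string")
    parser.add_option("-n", "--nobackup", action="store_true",
                      help="do not make backup copies")
    return parser


def replace_text(regx, repl, text):
    """Apply the substitution to a string (hypothetical helper, no file I/O)."""
    return re.sub(regx, repl, text)


parser = build_parser()

# An explicit argument list stands in for sys.argv[1:].
# "-nx" combines the flag -n with the option -x, whose argument follows.
argv = ["-nx", r"(\d\d)-(\d\d)-(\d\d\d\d)", "-r", r"\2-\1-\3",
        "a.txt", "b.txt"]
options, files = parser.parse_args(argv)

# Options and positional arguments may be interleaved; this reordered
# command line is parsed into the same values.
reordered = ["-nr", r"\2-\1-\3", "a.txt", "b.txt",
             "-x", r"(\d\d)-(\d\d)-(\d\d\d\d)"]
options2, files2 = parser.parse_args(reordered)

# With an empty command line every destination is left at None.
empty, no_files = parser.parse_args([])

# European DD-MM-YYYY date rewritten in the American MM-DD-YYYY format.
converted = replace_text(options.regx, options.repl, "31-12-2004")
print(options.nobackup, files, converted)
```

Keeping the substitution in a function that works on strings is a design choice made for this sketch only: it lets the regular expression and the replacement string be checked before any file is overwritten.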