Diffstat (limited to 'README.python')
-rw-r--r-- | README.python | 87 |
1 files changed, 50 insertions, 37 deletions
diff --git a/README.python b/README.python
index a22f517..579034a 100644
--- a/README.python
+++ b/README.python
@@ -1,32 +1,43 @@
 README
 ------
 
-Most (nearly all) python packages use setuptools, for detailed information on
-setuptools see the setuptools docs[1]. If you're not familiar with setuptools
-you should read the docs[1][2] before continuing.
-
 Please note that this tool expects any python packages to be on pypi, you
 cannot currently import packages from other places.
 
-This import tool uses a combination of pypi metadata,
-pip and setuptools commands to extract dependency information
-to create a set of definitions useable with Baserock. This is not a stable
-process and will not work smoothly in many cases: because setup.py
-is just an ordinary Python script it's possible for a setup.py to do things that
-break the import tool's means to extract dependencies, for example, some packages
-bypass parts of setuptools and subclass parts of distutils's core instead.
-Another problem with importing python packages is that packages are uploaded
-to pypi as tarballs rather than as repositories and as a result the import tool
-generates a lot of tarball lorries which is the least desireable kind of lorry
-to use with Baserock. To avoid this the import tool looks through parts of the
-package metadata for links to real repos, this detection is currently extremely
-basic and will hopefully be improved in future to allow the tool to reduce the
-number of tarball lorries it generates. Some python packages
-only declare their dependency information in a human readable form within a
-README, this tool cannot do anything to extract dependency
-information that is not encoded in a machine readable fashion. At the time of
-writing numpy is an example of such a package: running the import tool on numpy
-will yield a stratum that contains numpy and none of its dependencies.
+This import tool uses PyPI metadata and setuptools commands to extract
+dependency information.
+
+To get runtime dependency information for a package, it sets up a 'virtualenv'
+environment, installs the package from PyPI using 'pip', and then uses 'pip
+freeze' to get a list of exactly what packages have been installed. This is
+pretty inefficient, in terms of computing resources: installation involves
+downloading and sometimes compiling C source code. However, it is the most
+reliable and maintainable approach we have found so far.
+
+Python packaging metadata is something of a free-for-all. Reusing the code of
+Pip is great because Pip is probably the most widely-tested consumer of Python
+packaging metadata. We did submit a patch to add an 'install
+--list-dependencies' mode to Pip, but this wasn't accepted. The current
+approach should work with pretty much any version of Pip, no special patches
+required.
+
+Most (nearly all) python packages use setuptools, for detailed information on
+setuptools see the setuptools docs[1].
+
+Python packages are uploaded to PyPI as tarballs, and PyPI doesn't have a
+standard way of indicating where the canonical source repository for a package
+is. The python.to_lorry import extension just generates .lorry files pointing
+to those tarballs, as this is the most reliable thing it can do. It would be
+possible to guess at where the source repo is, but this can have random failure
+cases and you end up mirroring a project's home page instead of its source code
+sometimes.
+
+Some python packages only declare their dependency information in a human
+readable form within a README, this tool cannot do anything to extract
+dependency information that is not encoded in a machine readable fashion. At
+the time of writing, 'numpy' is an example of such a package: running the import
+tool on 'numpy' will yield a stratum that contains 'numpy' and none of its
+dependencies.
 
 Python packages may require other packages to be present for
 build/installation to proceed, in setuptools these are called setup requirements.
@@ -34,26 +45,28 @@ Setup requirements naturally translate to Baserock build dependencies, in
 practice most python packages don't have any setup requirements, so the lists
 of build-depends for each chunk will generally be empty lists.
 
-Many python packages require additional (in addition to a python interpreter)
-packages to be present at runtime, in setuptools parlance these are install
-requirements. The import tool uses pip to recursively extract runtime
-dependency information for a given package, each dependency is added to the
-same stratum as the package we're trying to import. All packages implicitly
-depend on a python interpreter, the import tool encodes this by making all
-strata build depend on core, which at the time of writing contains cpython.
 
 
 Traps
 -----
 
 * Because pip executes setup.py commands to determine dependencies and some
 packages' setup.py files invoke compilers, the import tool may end up
-running compilers.
+running compilers. You can pass `--log=/dev/stdout` to get detailed progress
+information on the console, which will show you if this is happening.
 
-* pip puts errors on stdout, some import tool errors may be vague: if it's
-not clear what's going on you can check the log, if you're using
---log-level=debug then the import tool will log the output of all the commands
-it executes to obtain dependency information.
 
-[1]: https://pythonhosted.org/setuptools/
-[2]: https://pythonhosted.org/an_example_pypi_project/setuptools.html
+Good testcases
+--------------
+
+Here are some interesting test cases:
+
+ - ftw.blog (fails because needs .zip import)
+ - MySQL-python (fails because needs .zip import)
+ - nixtla (long, unnecessary compilation involved)
+ - python-keystoneclient (~40 deps, needs --force-stratum due to missing tag)
+ - rejester (~24 deps)
+ - requests (~26 deps, mostly not actually needed at runtime)
+
+
+[1]: https://pythonhosted.org/setuptools/
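
The runtime dependency approach described in the new README text (create an isolated environment, install the package with pip, then read back 'pip freeze') can be sketched roughly as below. This is an illustration only, not the import tool's actual code: the function names `normalise`, `parse_freeze` and `runtime_dependencies` are hypothetical, and the standard library `venv` module stands in for the 'virtualenv' tool named in the README.

```python
import os
import re
import subprocess
import tempfile
import venv


def normalise(name):
    """Compare project names case-insensitively, treating '-', '_' and '.' alike."""
    return re.sub(r'[-_.]+', '-', name).lower()


def parse_freeze(output, exclude=()):
    """Turn `pip freeze` output into a {name: version} dict.

    Only plain `name==version` lines are kept; comments, editable
    installs and direct URL references are skipped.
    """
    skip = {normalise(name) for name in exclude}
    deps = {}
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith('#') or line.startswith('-e '):
            continue
        if '==' not in line:
            continue
        name, version = line.split('==', 1)
        if normalise(name) not in skip:
            deps[name] = version
    return deps


def runtime_dependencies(package):
    """Install `package` into a throwaway environment and list what came with it.

    Hypothetical helper: it needs network access, and installation may
    download and compile C code, as the README warns.
    """
    with tempfile.TemporaryDirectory() as env_dir:
        # Assumption: stdlib venv is used here in place of 'virtualenv'.
        venv.create(env_dir, with_pip=True)
        bin_dir = 'Scripts' if os.name == 'nt' else 'bin'
        python = os.path.join(env_dir, bin_dir, 'python')
        subprocess.check_call([python, '-m', 'pip', 'install', package])
        frozen = subprocess.check_output(
            [python, '-m', 'pip', 'freeze'], universal_newlines=True)
    # The package itself is not one of its own dependencies.
    return parse_freeze(frozen, exclude=[package])
```

Calling `runtime_dependencies('requests')` would return a dict of whatever pip chose to install alongside the package at that moment, so the result depends on the state of PyPI. Keeping the parsing in its own function means it can be checked against canned 'pip freeze' output without touching the network.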