path: root/README.python
author    Richard Ipsum <richard.ipsum@codethink.co.uk>  2015-06-19 15:17:33 +0100
committer Richard Ipsum <richard.ipsum@codethink.co.uk>  2015-08-26 14:38:11 +0000
commit    5957c58e8113439e3ef36ff646eb52695085046d (patch)
tree      f1009b3a8a1a3cb7ef02d6cb7a1759b60796e095 /README.python
parent    ca1d08d765cc298bb4eeee9a2182aa67de657f5f (diff)
download  import-5957c58e8113439e3ef36ff646eb52695085046d.tar.gz
Use virtualenv and Pip to find runtime deps; remove searching for upstream
Co-authored-by: Sam Thursfield <sam.thursfield@codethink.co.uk>

Since upstream Pip does not want to merge
https://github.com/pypa/pip/pull/2371, we should avoid depending on that
pull request. To find runtime dependencies we now run 'pip install' inside
a virtualenv, then run 'pip freeze' to obtain the dependency set. This has
the advantage that nearly all the work is done by Pip.

Originally the Python extensions were designed to look for upstream Git
repos. In practice this is unreliable, and it won't be compatible with
obtaining dependencies using 'pip install'. The downside of this approach
is that all lorries will be tarballs; the upshot is that we can now
automatically import many packages that we couldn't import before.

Another upshot of this approach is that we may be able to consider removing
a lot of the spec processing and validation code, if we're willing to worry
less about build dependencies. We're not yet sure whether we should be
willing to worry less about build dependencies, though.

We've had encouraging results using this patch so far: we are now able to
import, without user intervention, packages that failed previously, such as
boto, persistent-pineapple, jmespath and coverage. requests also almost
imported successfully, but appears to require a release of pytest that is
uploaded as a zip.

Change-Id: I705c6f6bd722df041d17630287382f851008e97a
Diffstat (limited to 'README.python')
-rw-r--r--  README.python | 87
 1 file changed, 50 insertions(+), 37 deletions(-)
diff --git a/README.python b/README.python
index a22f517..579034a 100644
--- a/README.python
+++ b/README.python
@@ -1,32 +1,43 @@
README
------
-Most (nearly all) python packages use setuptools, for detailed information on
-setuptools see the setuptools docs[1]. If you're not familiar with setuptools
-you should read the docs[1][2] before continuing.
-
Please note that this tool expects any Python packages to be on PyPI; you
cannot currently import packages from other places.
-This import tool uses a combination of pypi metadata,
-pip and setuptools commands to extract dependency information
-to create a set of definitions useable with Baserock. This is not a stable
-process and will not work smoothly in many cases: because setup.py
-is just an ordinary Python script it's possible for a setup.py to do things that
-break the import tool's means to extract dependencies, for example, some packages
-bypass parts of setuptools and subclass parts of distutils's core instead.
-Another problem with importing python packages is that packages are uploaded
-to pypi as tarballs rather than as repositories and as a result the import tool
-generates a lot of tarball lorries which is the least desireable kind of lorry
-to use with Baserock. To avoid this the import tool looks through parts of the
-package metadata for links to real repos, this detection is currently extremely
-basic and will hopefully be improved in future to allow the tool to reduce the
-number of tarball lorries it generates. Some python packages
-only declare their dependency information in a human readable form within a
-README, this tool cannot do anything to extract dependency
-information that is not encoded in a machine readable fashion. At the time of
-writing numpy is an example of such a package: running the import tool on numpy
-will yield a stratum that contains numpy and none of its dependencies.
+This import tool uses PyPI metadata and setuptools commands to extract
+dependency information.
+
+To get runtime dependency information for a package, it sets up a 'virtualenv'
+environment, installs the package from PyPI using 'pip', and then uses 'pip
+freeze' to get a list of exactly what packages have been installed. This is
+pretty inefficient, in terms of computing resources: installation involves
+downloading and sometimes compiling C source code. However, it is the most
+reliable and maintainable approach we have found so far.
+
+Python packaging metadata is something of a free-for-all. Reusing the code of
+Pip is great because Pip is probably the most widely-tested consumer of Python
+packaging metadata. We did submit a patch to add an 'install
+--list-dependencies' mode to Pip, but this wasn't accepted. The current
+approach should work with pretty much any version of Pip, no special patches
+required.
+
+Most (nearly all) Python packages use setuptools; for detailed information
+on setuptools, see the setuptools docs[1].
+
+Python packages are uploaded to PyPI as tarballs, and PyPI doesn't have a
+standard way of indicating where the canonical source repository for a package
+is. The python.to_lorry import extension just generates .lorry files pointing
+to those tarballs, as this is the most reliable thing it can do. It would be
+possible to guess at where the source repo is, but this fails unpredictably:
+you can end up mirroring a project's home page instead of its source code.
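For illustration, a generated tarball .lorry file looks roughly like this (a
sketch only; the exact field names follow the Lorry tool's tarball format,
and the package name and URL here are made-up examples):

```json
{
    "python-packages/requests": {
        "type": "tarball",
        "url": "https://pypi.python.org/packages/source/r/requests/requests-2.7.0.tar.gz"
    }
}
```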
+
+Some Python packages only declare their dependency information in a human
+readable form within a README; this tool cannot extract dependency
+information that is not encoded in a machine readable fashion. At the time
+of writing, 'numpy' is an example of such a package: running the import tool
+on 'numpy' will yield a stratum that contains 'numpy' and none of its
+dependencies.
Python packages may require other packages to be present for
build/installation to proceed; in setuptools these are called setup requirements.
@@ -34,26 +45,28 @@ Setup requirements naturally translate to Baserock build dependencies; in
practice most Python packages don't have any setup requirements, so the lists
of build-depends for each chunk will generally be empty.
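As a hypothetical illustration (the field names approximate Baserock stratum
morphologies; the chunk name, repo and ref are invented), a chunk with no
setup requirements ends up with an empty build-depends list:

```yaml
name: requests
kind: stratum
chunks:
- name: requests
  repo: upstream:python-packages/requests
  ref: master
  build-depends: []   # no setup requirements, so nothing to build-depend on
```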
-Many python packages require additional (in addition to a python interpreter)
-packages to be present at runtime, in setuptools parlance these are install
-requirements. The import tool uses pip to recursively extract runtime
-dependency information for a given package, each dependency is added to the
-same stratum as the package we're trying to import. All packages implicitly
-depend on a python interpreter, the import tool encodes this by making all
-strata build depend on core, which at the time of writing contains cpython.
Traps
-----
* Because pip executes setup.py commands to determine dependencies
and some packages' setup.py files invoke compilers, the import tool may end up
-running compilers.
+running compilers. You can pass `--log=/dev/stdout` to get detailed progress
+information on the console, which will show you if this is happening.
-* pip puts errors on stdout, some import tool errors may be vague: if it's
-not clear what's going on you can check the log, if you're using
---log-level=debug then the import tool will log the output of all the commands
-it executes to obtain dependency information.
-[1]: https://pythonhosted.org/setuptools/
-[2]: https://pythonhosted.org/an_example_pypi_project/setuptools.html
+Good testcases
+--------------
+
+Here are some interesting test cases:
+
+ - ftw.blog (fails because needs .zip import)
+ - MySQL-python (fails because needs .zip import)
+ - nixtla (long, unnecessary compilation involved)
+ - python-keystoneclient (~40 deps, needs --force-stratum due to missing tag)
+ - rejester (~24 deps)
+ - requests (~26 deps, mostly not actually needed at runtime)
+
+
+[1]: https://pythonhosted.org/setuptools/