README
------

Please note that this tool expects all Python packages to be available on
PyPI; it cannot currently import packages from other sources.

This import tool uses PyPI metadata and setuptools commands to extract
dependency information.

To get runtime dependency information for a package, it sets up a 'virtualenv'
environment, installs the package from PyPI using 'pip', and then uses 'pip
freeze' to get a list of exactly which packages have been installed. This is
pretty inefficient in terms of computing resources: installation involves
downloading and sometimes compiling C source code. However, it is the most
reliable and maintainable approach we have found so far.

Python packaging metadata is something of a free-for-all. Reusing the code of
Pip is great because Pip is probably the most widely-tested consumer of Python
packaging metadata. We did submit a patch to add an 'install
--list-dependencies' mode to Pip, but this wasn't accepted. The current
approach should work with pretty much any version of Pip, with no special
patches required.

Most (nearly all) Python packages use setuptools; for detailed information on
setuptools, see the setuptools docs[1].

Python packages are uploaded to PyPI as tarballs, and PyPI doesn't have a
standard way of indicating where the canonical source repository for a package
is. The python.to_lorry import extension just generates .lorry files pointing
to those tarballs, as this is the most reliable thing it can do. It would be
possible to guess at where the source repo is, but that fails unpredictably:
you can end up mirroring a project's home page instead of its source code.
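For illustration, a generated .lorry file might look something like the
following. The package name, version, and URL are invented, and the field
layout is a sketch of the usual lorry tarball stanza rather than verbatim tool
output.

```json
{
    "python-packages/example": {
        "type": "tarball",
        "url": "https://pypi.python.org/packages/source/e/example/example-1.0.tar.gz"
    }
}
```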

Some Python packages only declare their dependency information in a
human-readable form within a README; this tool cannot extract dependency
information that is not encoded in a machine-readable fashion. At the time of
writing, 'numpy' is an example of such a package: running the import tool on
'numpy' will yield a stratum that contains 'numpy' and none of its
dependencies.

Python packages may require other packages to be present for
build/installation to proceed; in setuptools these are called setup
requirements. Setup requirements translate naturally to Baserock build
dependencies. In practice, most Python packages don't have any setup
requirements, so the build-depends list for each chunk will generally be
empty.
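As a sketch, the distinction looks like this in a hypothetical setup.py (the
package name and version specifiers are invented for illustration):

```python
from setuptools import setup

setup(
    name="example-package",  # hypothetical package
    version="1.0",
    # Runtime dependencies: picked up via 'pip install' + 'pip freeze',
    # and become stratum dependencies.
    install_requires=["six >= 1.9"],
    # Setup requirements: needed before setup.py itself can run; these
    # map to Baserock build-depends. Usually empty in practice.
    setup_requires=[],
)
```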


Traps
-----

* Because pip executes setup.py commands to determine dependencies, and some
  packages' setup.py files invoke compilers, the import tool may end up
  running compilers. You can pass `--log=/dev/stdout` to get detailed
  progress information on the console, which will show you if this is
  happening.



Good testcases
--------------

Here are some interesting test cases:

  - ftw.blog (fails because needs .zip import)
  - MySQL-python (fails because needs .zip import)
  - nixtla (long, unnecessary compilation involved)
  - python-keystoneclient (~40 deps, needs --force-stratum due to missing tag)
  - rejester (~24 deps)
  - requests (~26 deps, mostly not actually needed at runtime)


[1]: https://pythonhosted.org/setuptools/