-rw-r--r--  README.python                          87
-rw-r--r--  TODO.python                            65
-rw-r--r--  baserockimport/app.py                   3
-rwxr-xr-x  baserockimport/exts/python.find_deps  124
-rwxr-xr-x  baserockimport/exts/python.to_lorry    73
-rwxr-xr-x  baserockimport/exts/python_run_pip     32
6 files changed, 173 insertions, 211 deletions
diff --git a/README.python b/README.python
index a22f517..579034a 100644
--- a/README.python
+++ b/README.python
@@ -1,32 +1,43 @@
README
------
-Most (nearly all) python packages use setuptools, for detailed information on
-setuptools see the setuptools docs[1]. If you're not familiar with setuptools
-you should read the docs[1][2] before continuing.
-
Please note that this tool expects any python packages to be on pypi, you
cannot currently import packages from other places.
-This import tool uses a combination of pypi metadata,
-pip and setuptools commands to extract dependency information
-to create a set of definitions useable with Baserock. This is not a stable
-process and will not work smoothly in many cases: because setup.py
-is just an ordinary Python script it's possible for a setup.py to do things that
-break the import tool's means to extract dependencies, for example, some packages
-bypass parts of setuptools and subclass parts of distutils's core instead.
-Another problem with importing python packages is that packages are uploaded
-to pypi as tarballs rather than as repositories and as a result the import tool
-generates a lot of tarball lorries which is the least desireable kind of lorry
-to use with Baserock. To avoid this the import tool looks through parts of the
-package metadata for links to real repos, this detection is currently extremely
-basic and will hopefully be improved in future to allow the tool to reduce the
-number of tarball lorries it generates. Some python packages
-only declare their dependency information in a human readable form within a
-README, this tool cannot do anything to extract dependency
-information that is not encoded in a machine readable fashion. At the time of
-writing numpy is an example of such a package: running the import tool on numpy
-will yield a stratum that contains numpy and none of its dependencies.
+This import tool uses PyPI metadata and setuptools commands to extract
+dependency information.
+
+To get runtime dependency information for a package, it sets up a 'virtualenv'
+environment, installs the package from PyPI using 'pip', and then uses 'pip
+freeze' to get a list of exactly what packages have been installed. This is
+pretty inefficient, in terms of computing resources: installation involves
+downloading and sometimes compiling C source code. However, it is the most
+reliable and maintainable approach we have found so far.
+
+Python packaging metadata is something of a free-for-all. Reusing the code of
+Pip is great because Pip is probably the most widely-tested consumer of Python
+packaging metadata. We did submit a patch to add an 'install
+--list-dependencies' mode to Pip, but this wasn't accepted. The current
+approach should work with pretty much any version of Pip, no special patches
+required.
+
+Most (nearly all) Python packages use setuptools; for detailed information on
+setuptools, see the setuptools docs[1].
+
+Python packages are uploaded to PyPI as tarballs, and PyPI doesn't have a
+standard way of indicating where the canonical source repository for a package
+is. The python.to_lorry import extension just generates .lorry files pointing
+to those tarballs, as this is the most reliable thing it can do. It would be
+possible to guess at where the source repo is, but this can have random failure
+cases and you end up mirroring a project's home page instead of its source code
+sometimes.
+
+Some Python packages only declare their dependency information in a human
+readable form within a README; this tool cannot do anything to extract
+dependency information that is not encoded in a machine readable fashion. At
+the time of writing, 'numpy' is an example of such a package: running the import
+tool on 'numpy' will yield a stratum that contains 'numpy' and none of its
+dependencies.
Python packages may require other packages to be present for
build/installation to proceed, in setuptools these are called setup requirements.
@@ -34,26 +45,28 @@ Setup requirements naturally translate to Baserock build dependencies, in
practice most python packages don't have any setup requirements, so the lists
of build-depends for each chunk will generally be empty lists.
-Many python packages require additional (in addition to a python interpreter)
-packages to be present at runtime, in setuptools parlance these are install
-requirements. The import tool uses pip to recursively extract runtime
-dependency information for a given package, each dependency is added to the
-same stratum as the package we're trying to import. All packages implicitly
-depend on a python interpreter, the import tool encodes this by making all
-strata build depend on core, which at the time of writing contains cpython.
Traps
-----
* Because pip executes setup.py commands to determine dependencies
and some packages' setup.py files invoke compilers, the import tool may end up
-running compilers.
+running compilers. You can pass `--log=/dev/stdout` to get detailed progress
+information on the console, which will show you if this is happening.
-* pip puts errors on stdout, some import tool errors may be vague: if it's
-not clear what's going on you can check the log, if you're using
---log-level=debug then the import tool will log the output of all the commands
-it executes to obtain dependency information.
-[1]: https://pythonhosted.org/setuptools/
-[2]: https://pythonhosted.org/an_example_pypi_project/setuptools.html
+Good testcases
+--------------
+
+Here are some interesting test cases:
+
+ - ftw.blog (fails because needs .zip import)
+ - MySQL-python (fails because needs .zip import)
+ - nixtla (long, unnecessary compilation involved)
+ - python-keystoneclient (~40 deps, needs --force-stratum due to missing tag)
+ - rejester (~24 deps)
+ - requests (~26 deps, mostly not actually needed at runtime)
+
+
+[1]: https://pythonhosted.org/setuptools/
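
Note on the approach described in the README above: the last step, turning 'pip freeze' output into a dependency list with the root package removed, can be sketched as follows. This is a minimal Python 3 illustration, not the code in this patch, which feeds the output to pkg_resources.parse_requirements and only treats '_' and '-' as equivalent. The function names and the sample versions are hypothetical; the normalisation rule is the one PyPI uses for project names.

```python
import re


def normalise(name):
    """Treat runs of '-', '_' and '.' as equivalent, ignoring case."""
    return re.sub(r'[-_.]+', '-', name).lower()


def parse_freeze(freeze_output, root_package):
    """Map each pinned 'name==version' line to its version, minus the root."""
    deps = {}
    for line in freeze_output.splitlines():
        line = line.strip()
        # Skip blanks, comments and anything that is not an exact pin
        # (for example '-e' editable installs).
        if not line or line.startswith(('#', '-')) or '==' not in line:
            continue
        name, version = line.split('==', 1)
        if normalise(name) != normalise(root_package):
            deps[name] = version
    return deps


# Hypothetical freeze output for a virtualenv that installed Flask.
sample = "Flask==0.10.1\nJinja2==2.7.3\nMarkupSafe==0.23\n"
print(parse_freeze(sample, 'flask'))
# prints {'Jinja2': '2.7.3', 'MarkupSafe': '0.23'}
```

Because 'pip freeze' reports everything present in the virtualenv, the result is the full transitive set actually installed, which is what makes this approach robust against unusual setup.py files.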
diff --git a/TODO.python b/TODO.python
index 16b7889..6c276f7 100644
--- a/TODO.python
+++ b/TODO.python
@@ -25,68 +25,3 @@ this will be confusing, we should emit nice version numbers.
i.e. nixtla
* add a test runner
-
-* Importing python packages that use pbr fails, see
-https://bitbucket.org/pypa/setuptools/issue/73/typeerror-dist-must-be-a-distribution#comment-7267980
-The most sensible option would seem to be to make use of the sane environment
-that pbr provides: just read the dependency information from the text files
-that pbr projects provide, see, http://docs.openstack.org/developer/pbr/
-
-Results from running the import tool on various python packages follow:
-
-* Imports tested so far (stratum is generated)
- * SUCCEEDS
- * nixtla: fine but requires compilation
- * ryser
- * Twisted
- * Django
- * textdata
- * whale-agent
- * virtualenv
- * lxml
- * nose
- * six
- * simplejson
- * pika
- * MarkupSafe
- * zc.buildout
- * Paste
- * pycrypto
- * Jinja2
- * Flask
- * bcdoc
- * pymongo
-
- * FAILS
- * python-keystoneclient
- * All openstack stuff requires pbr, pbr does not play nicely with
- current setuptools see: [Issue 73](https://bitbucket.org/pypa/setuptoolsissue/73/typeerror-dist-must-be-a-distribution#comment-7267980)
- we can either fix setuptools/pbr or make use of the sane environment
- pbr provides.
- * persistent-pineapple
- * Git repo[1] has different layout to tarball[2] downloadeable from pypi,
- git repo's layout isn't 'installable' by pip, so dependencies can
- not be determined.
- [1]: https://github.com/JasonAUnrein/Persistent-Pineapple
- [2]: https://pypi.python.org/packages/source/p/persistent_pineapple/persistent_pineapple-1.0.0.tar.gz
- * ftw.blog
- * cannot satisfy dependencies
- * boto
- * cannot satisfy dependencies
- * jmespath
- * cannot satisfy dependencies
- * rejester
- * its setup.py subclasses distutils.core
- * requests
- * cannot satisfy dependencies
- * MySQL-python
- * egg_info blows up,
- * python setup.py install doesn't even work
- * maybe the user's expected to do some manual stuff first, who knows
- * rejester (its setup.py subclasses distutils.core)
- * redis-jobs (succeeded at first, no longer exists on pypi)
- * coverage (stratum couldn't be generated because some tags are missing)
-
-* Imports completely tested, built, deployed and executed successfully:
-
- * Flask
diff --git a/baserockimport/app.py b/baserockimport/app.py
index 5f3d435..ae95d58 100644
--- a/baserockimport/app.py
+++ b/baserockimport/app.py
@@ -227,7 +227,8 @@ class BaserockImportApplication(cliapp.Application):
loop = baserockimport.mainloop.ImportLoop(app=self,
goal_kind='python',
goal_name=package_name,
- goal_version=package_version)
+ goal_version=package_version,
+ ignore_version_field=True)
loop.enable_importer('python', strata=['strata/core.morph'],
package_comp_callback=comp)
loop.run()
diff --git a/baserockimport/exts/python.find_deps b/baserockimport/exts/python.find_deps
index 91a9e39..b173110 100755
--- a/baserockimport/exts/python.find_deps
+++ b/baserockimport/exts/python.find_deps
@@ -24,6 +24,7 @@
from __future__ import print_function
+import contextlib
import sys
import subprocess
import os
@@ -32,6 +33,7 @@ import tempfile
import logging
import select
import signal
+import shutil
import pkg_resources
import xmlrpclib
@@ -262,65 +264,109 @@ def find_build_deps(source, name, version=None):
return build_deps
+
+@contextlib.contextmanager
+def temporary_virtualenv():
+    tempdir = tempfile.mkdtemp()
+
+    logging.debug('Creating virtualenv in %s', tempdir)
+    p = subprocess.Popen(['virtualenv', tempdir], stdout=subprocess.PIPE,
+                         stderr=subprocess.STDOUT)
+
+    while True:
+        line = p.stdout.readline()
+        if line == '':
+            break
+
+        logging.debug(line.rstrip('\n'))
+
+    p.wait() # even with eof, wait for termination
+
+    try:
+        yield tempdir
+    finally:
+        logging.debug('Removing virtualenv at %s', tempdir)
+        shutil.rmtree(tempdir)
+
+
 def find_runtime_deps(source, name, version=None, use_requirements_file=False):
     logging.debug('Finding runtime dependencies for %s%s at %s'
                   % (name, ' %s' % version if version else '', source))
-    # Run our patched pip to get a list of installed deps
-    # Run pip install . --list-dependencies=instdeps.txt with cwd=source
-
     # Some temporary file needed for storing the requirements
     tmpfd, tmppath = tempfile.mkstemp()
     logging.debug('Writing install requirements to: %s', tmppath)
-    args = ['pip', 'install', '.', '--list-dependencies=%s' % tmppath]
-    if use_requirements_file:
-        args.insert(args.index('.') + 1, '-r')
-        args.insert(args.index('.') + 2, 'requirements.txt')
+    with temporary_virtualenv() as virtenv_path:
+        shutil.copytree(source, os.path.join(virtenv_path, 'source'))
-    logging.debug('Running pip, args: %s' % args)
+        pip_runner = os.path.join(os.path.dirname(os.path.abspath(__file__)),
+                                  'python_run_pip')
+        logging.debug('pip_runner: %s', pip_runner)
-    p = subprocess.Popen(args, cwd=source, stdout=subprocess.PIPE,
-                         stderr=subprocess.STDOUT)
+        subprocess_env = os.environ.copy()
+        subprocess_env['TMPDIR'] = os.path.join(virtenv_path, 'tmp')
+        logging.debug('Using %s as TMPDIR', subprocess_env['TMPDIR'])
+        p = subprocess.Popen(pip_runner, cwd=virtenv_path, env=subprocess_env,
+                             stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
-    output = []
-    while True:
-        line = p.stdout.readline()
-        if line == '':
-            break
+        output = []
+        while True:
+            line = p.stdout.readline()
+            if line == '':
+                break
-        logging.debug(line.rstrip('\n'))
-        output.append(line)
+            logging.debug(line.rstrip('\n'))
+            output.append(line)
-    p.wait() # even with eof, wait for termination
+        p.wait() # even with eof, wait for termination
-    logging.debug('pip exited with code: %d' % p.returncode)
+        logging.debug('pip exited with code: %d' % p.returncode)
-    if p.returncode != 0:
-        error('Failed to get runtime dependencies for %s%s at %s. Output from '
-              'Pip: %s' % (name, ' %s' % version if version else '', source,
-              ' '.join(output)))
+        if p.returncode != 0:
+            error('Failed to get runtime dependencies for %s%s at %s. Output '
+                  'from Pip: %s' % (name, ' %s' % version if version else '',
+                                    source, ' '.join(output)))
+
+        # Now run pip freeze
+        logging.debug('Running pip freeze')
+
+        p = subprocess.Popen(['/bin/bash', '-c',
+                              'source bin/activate; '
+                              'pip freeze --disable-pip-version-check'],
+                             cwd=virtenv_path,
+                             stdout=tmpfd, stderr=subprocess.PIPE)
- with os.fdopen(tmpfd) as tmpfile:
- ss = resolve_specs(pkg_resources.parse_requirements(tmpfile))
- logging.debug("Resolved specs for %s: %s" % (name, ss))
+ _, err = p.communicate()
+ os.close(tmpfd)
- logging.debug("Removing root package from specs")
+ if p.returncode != 0:
+ error('failed to get runtime dependencies for %s%s at %s: %s'
+ % (name, ' %s' % version if version else '', source, err))
- # filter out "root" package
- # hyphens and underscores are treated as equivalents
- # in distribution names
- specsets = {k: v for (k, v) in ss.iteritems()
- if k not in [name, name.replace('_', '-')]}
+ with open(tmppath) as f:
+ logging.debug(f.read())
- versions = resolve_versions(specsets)
- logging.debug('Resolved versions: %s' % versions)
+ with open(tmppath) as tmpfile:
+ ss = resolve_specs(pkg_resources.parse_requirements(tmpfile))
+ logging.debug("Resolved specs for %s: %s" % (name, ss))
- # Since any of the candidates in versions should satisfy
- # all specs, we just pick the first version we see
- runtime_deps = {name: vs[0] for (name, vs) in versions.iteritems()}
+ logging.debug("Removing root package from specs")
- os.remove(tmppath)
+ # filter out "root" package
+ # hyphens and underscores are treated as equivalents
+ # in distribution names
+ specsets = {k: v for (k, v) in ss.iteritems()
+ if k not in [name, name.replace('_', '-')]}
+
+ versions = resolve_versions(specsets)
+ logging.debug('Resolved versions: %s' % versions)
+
+ # Since any of the candidates in versions should satisfy
+ # all specs, we just pick the first version we see
+ runtime_deps = {name: vs[0] for (name, vs) in versions.iteritems()}
+
+ os.remove(tmppath)
if (len(runtime_deps) == 0 and not use_requirements_file
and os.path.isfile(os.path.join(source, 'requirements.txt'))):
@@ -360,6 +406,8 @@ class PythonFindDepsExtension(ImportExtension):
root = {'python': deps}
+ logging.debug('Returning %s', root)
+
print(json.dumps(root))
if __name__ == '__main__':
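
Note on temporary_virtualenv() in the hunk above: the directory is created and 'virtualenv' is started before the try block is entered, so an exception raised while launching 'virtualenv' would leave the temporary directory behind. One possible variant, sketched here in Python 3 under the assumption that any setup failure should still trigger cleanup, moves the setup inside the try. The names temporary_workdir and setup are hypothetical; setup stands in for the 'virtualenv' invocation so that the sketch runs without it.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def temporary_workdir(setup=None):
    """Yield a fresh directory and remove it afterwards, even on error.

    'setup' is a hypothetical hook standing in for the 'virtualenv' call.
    """
    tempdir = tempfile.mkdtemp()
    try:
        if setup is not None:
            setup(tempdir)  # a failure here still reaches the cleanup below
        yield tempdir
    finally:
        shutil.rmtree(tempdir, ignore_errors=True)


with temporary_workdir() as path:
    open(os.path.join(path, 'marker'), 'w').close()
    print(os.path.isdir(path))   # prints True
print(os.path.exists(path))      # prints False
```

The same shape would also be a natural place to check the exit status of the setup command, which the patch does not do for 'virtualenv'.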
diff --git a/baserockimport/exts/python.to_lorry b/baserockimport/exts/python.to_lorry
index ebde27a..30f38e7 100755
--- a/baserockimport/exts/python.to_lorry
+++ b/baserockimport/exts/python.to_lorry
@@ -37,58 +37,6 @@ from importer_python_common import *
from utils import warn, error
import utils
-
-def find_repo_type(url):
-
- debug_vcss = False
-
- # Don't bother with detection if we can't get a 200 OK
- logging.debug("Getting '%s' ..." % url)
-
- status_code = requests.get(url).status_code
- if status_code != 200:
- logging.debug('Got %d status code from %s, aborting repo detection'
- % (status_code, url))
- return None
-
- logging.debug('200 OK for %s' % url)
- logging.debug('Finding repo type for %s' % url)
-
- vcss = [('git', 'clone'), ('hg', 'clone'),
- ('svn', 'checkout'), ('bzr', 'branch')]
-
- for (vcs, vcs_command) in vcss:
- logging.debug('Trying %s %s' % (vcs, vcs_command))
- tempdir = tempfile.mkdtemp()
-
- p = subprocess.Popen([vcs, vcs_command, url], stdout=subprocess.PIPE,
- stderr=subprocess.STDOUT, stdin=subprocess.PIPE,
- cwd=tempdir)
-
- # We close stdin on parent side to prevent the child from blocking
- # if it reads on stdin
- p.stdin.close()
-
- while True:
- line = p.stdout.readline()
- if line == '':
- break
-
- if debug_vcss:
- logging.debug(line.rstrip('\n'))
-
- p.wait() # even with eof on both streams, we still wait
-
- shutil.rmtree(tempdir)
-
- if p.returncode == 0:
- logging.debug('%s is a %s repo' % (url, vcs))
- return vcs
-
- logging.debug("%s doesn't seem to be a repo" % url)
-
- return None
-
def filter_urls(urls):
allowed_extensions = ['tar.gz', 'tgz', 'tar.Z', 'tar.bz2', 'tbz2',
'tar.lzma', 'tar.xz', 'tlz', 'txz', 'tar']
@@ -101,7 +49,7 @@ def filter_urls(urls):
def get_releases(client, package_name):
try:
- releases = client.package_releases(package_name)
+ releases = client.package_releases(package_name, True)
except Exception as e:
error("Couldn't fetch release data:", e)
@@ -185,23 +133,8 @@ class PythonLorryExtension(ImportExtension):
logging.debug('Treating %s as %s' % (package_name, new_proj_name))
package_name = new_proj_name
- try:
- metadata = self.fetch_package_metadata(package_name)
- except Exception as e:
- error("Couldn't fetch package metadata: ", e)
-
- info = metadata.json()['info']
-
- repo_type = (find_repo_type(info['home_page'])
- if 'home_page' in info else None)
-
- if repo_type:
- # TODO: Don't hardcode extname here.
- print(utils.str_repo_lorry('python', lorry_prefix, package_name,
- repo_type, info['home_page']))
- else:
- print(generate_tarball_lorry(lorry_prefix, client,
- package_name, version))
+ print(generate_tarball_lorry(lorry_prefix, client,
+ package_name, version))
if __name__ == '__main__':
PythonLorryExtension().run()
diff --git a/baserockimport/exts/python_run_pip b/baserockimport/exts/python_run_pip
new file mode 100755
index 0000000..f3877b4
--- /dev/null
+++ b/baserockimport/exts/python_run_pip
@@ -0,0 +1,32 @@
+#!/bin/bash
+#
+# Copyright © 2015 Codethink Limited
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; version 2 of the License.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+set -e
+
+pip_install="pip install --disable-pip-version-check ."
+
+source bin/activate
+cd source
+
+if [[ -e requirements.txt ]]
+then
+    echo "Running $pip_install -r requirements.txt"
+    $pip_install -r requirements.txt
+else
+    echo "Running $pip_install"
+    $pip_install
+fi
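
Note on python_run_pip above: the branch it takes can be expressed as a small Python 3 function, which may be easier to unit test than the shell script. This is only a sketch of the command selection; it does not activate the virtualenv or run pip, and the name pip_install_args is hypothetical.

```python
import os
import tempfile


def pip_install_args(source_dir):
    """Return the pip argv that python_run_pip would run in 'source_dir'."""
    args = ['pip', 'install', '--disable-pip-version-check', '.']
    # Mirrors the script's '[[ -e requirements.txt ]]' existence test.
    if os.path.exists(os.path.join(source_dir, 'requirements.txt')):
        args += ['-r', 'requirements.txt']
    return args


with tempfile.TemporaryDirectory() as d:
    print(pip_install_args(d))
    # prints ['pip', 'install', '--disable-pip-version-check', '.']
    open(os.path.join(d, 'requirements.txt'), 'w').close()
    print(pip_install_args(d))
    # prints ['pip', 'install', '--disable-pip-version-check', '.', '-r', 'requirements.txt']
```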