author    Sam Thursfield <sam.thursfield@codethink.co.uk>  2014-10-14 16:41:16 +0100
committer Sam Thursfield <sam.thursfield@codethink.co.uk>  2014-10-14 16:41:16 +0100
commit    c11bcfcd39bd9c9e30184ea29d21ef52624d056a (patch)
tree      8b4fbe74ced0b68ced598e42c9f19182beea73ba
download  import-c11bcfcd39bd9c9e30184ea29d21ef52624d056a.tar.gz
Initial import of Baserock import tool for importing foreign packaging
-rw-r--r--  README              100
-rw-r--r--  README.omnibus       17
-rw-r--r--  README.rubygems      52
-rw-r--r--  importer_base.py     72
-rw-r--r--  importer_base.rb     81
-rw-r--r--  main.py             920
-rwxr-xr-x  omnibus.to_chunk    274
-rwxr-xr-x  omnibus.to_lorry     94
-rw-r--r--  omnibus.yaml          7
-rwxr-xr-x  rubygems.to_chunk   275
-rwxr-xr-x  rubygems.to_lorry   164
-rw-r--r--  rubygems.yaml        49
12 files changed, 2105 insertions, 0 deletions
diff --git a/README b/README
new file mode 100644
index 0000000..3ac7997
--- /dev/null
+++ b/README
@@ -0,0 +1,100 @@
+How to use the Baserock Import Tool
+===================================
+
+The tool helps you generate Baserock build instructions by importing metadata
+from a foreign packaging system.
+
+The process it follows is this:
+
+1. Pick a package from the processing queue.
+2. Find its source code, and generate a suitable .lorry file.
+3. Make it available as a local Git repo.
+4. Check out the commit corresponding to the requested version of the package.
+5. Analyse the source tree and generate a suitable chunk .morph to build the
+ requested package.
+6. Analyse the source tree and generate a list of dependencies for the package.
+7. Enqueue any new dependencies, and repeat.
+
+Once the queue is empty:
+
+8. Generate a stratum .morph for the package(s) the user requested.
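The loop above can be sketched as a simple worklist algorithm. This is an illustrative simplification only; the real implementation is in main.py below, which also tracks errors and build-dependency ordering:

```python
# Simplified sketch of the import loop: pick a package, import it, then
# enqueue any of its dependencies that have not been seen yet.
def import_loop(goal, import_package, find_dependencies):
    to_process = [goal]
    processed = set()
    while to_process:
        package = to_process.pop()         # step 1
        import_package(package)            # steps 2-5
        deps = find_dependencies(package)  # step 6
        for dep in deps:                   # step 7
            if dep not in processed and dep not in to_process:
                to_process.append(dep)
        processed.add(package)
    return processed
```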
+
+The tool is not magic. It can be taught the conventions for each packaging
+system, but these will not work in all cases. When an import fails it will
+continue to the next package, so that the first run does as many imports as
+possible.
+
+For imports that could not be done automatically, you will need to write an
+appropriate .lorry or .morph file manually and rerun the tool. It will resume
+processing where it left off.
+
+It's possible to teach the code about more conventions, but it is only
+worthwhile to do that for common patterns.
+
+
+Package-system specific code and data
+-------------------------------------
+
+For each supported packaging system, there should be an xxx.to_lorry program
+and an xxx.to_chunk program. These should write a .lorry file and a .morph
+file, respectively, to stdout.
+
+Each packaging system can have static data saved in a .yaml file, for known
+metadata that the programs cannot discover automatically.
+
+The following field should be honoured by most packaging systems:
+`known-source-uris`. It maps package name to source URI.
+
+
+Help with .lorry generation
+---------------------------
+
+The simplest fix is to add the source URI to the `known-source-uris` dict in
+the static metadata.
+
+If you write a .lorry file by hand, be sure to fill in the `x-products-YYY`
+field. The 'x' prefix means the field is an extension to the .lorry format,
+and YYY is the name of the packaging system, e.g. 'rubygems'. It should list
+the packages whose source code this repository contains.
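For example, a hand-written .lorry for a hypothetical 'hashie' Gem whose source lives in a repo of the same name might look like this (the URL is illustrative, not real):

```json
{
    "hashie": {
        "type": "git",
        "url": "https://github.com/example/hashie.git",
        "x-products-rubygems": ["hashie"]
    }
}
```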
+
+
+Help with linking package version to Git tag
+--------------------------------------------
+
+Some projects do not tag releases.
+
+Currently, you must create a tag in the local checkout for the tool to continue.
+In future, the Lorry tool should be extended to handle creation of missing
+tags, so that they are propagated to the project Trove. The .lorry file would
+need to contain a dict mapping product version number to commit SHA1.
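The tool looks for a tag named after the version: for package 'foo' version '1.2.3' it tries `1.2.3`, `v1.2.3` and `foo-1.2.3` in turn (see _checkout_source_version in main.py). So a missing tag can be supplied by hand with something like the following, where the path and SHA1 are placeholders for your checkout and the correct release commit:

```
cd checkouts/foo
git tag v1.2.3 <commit-sha1>
```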
+
+If you are in a hurry, you can use the `--use-master-if-no-tag` option. Instead
+of an error, the tool will use whatever is the `master` ref of the component
+repo.
+
+
+Help with chunk .morph generation
+---------------------------------
+
+If you create a chunk morph by hand, you must add some extra fields:
+
+ - `x-build-dependencies-YYY`
+ - `x-runtime-dependencies-YYY`
+
+These are a dict mapping dependency name to dependency version. For example:
+
+ x-build-dependencies-rubygems: {}
+ x-runtime-dependencies-rubygems:
+ hashie: 2.1.2
+ json: 1.8.1
+ mixlib-log: 1.6.0
+ rack: 1.5.2
+
+All dependencies will be included in the resulting stratum. Those which are build
+dependencies of other components will be added to the relevant 'build-depends'
+field.
+
+These fields are non-standard extensions to the morphology format.
+
+For more package-system specific information, see the relevant README file,
+e.g. README.rubygems for RubyGem imports.
diff --git a/README.omnibus b/README.omnibus
new file mode 100644
index 0000000..840bbab
--- /dev/null
+++ b/README.omnibus
@@ -0,0 +1,17 @@
+Omnibus import
+==============
+
+See 'README' for general information on the Baserock Import Tool.
+
+To use
+------
+
+First, clone the Git repository corresponding to the Omnibus project you want
+to import. For example, if you want to import the Chef Server, clone:
+<https://github.com/opscode/omnibus-chef-server>
+
+As per Omnibus' instructions, you should then run `bundle install --binstubs`
+in the checkout to fetch the various repos and Gems that the project
+definitions depend on.
+
+
diff --git a/README.rubygems b/README.rubygems
new file mode 100644
index 0000000..1afb62d
--- /dev/null
+++ b/README.rubygems
@@ -0,0 +1,52 @@
+Here is some information I have learned while importing RubyGem packages into
+Baserock.
+
+First, beware that RubyGem .gemspec files are actually normal Ruby programs,
+and are executed when loaded. A Bundler Gemfile is also a Ruby program, and
+could run arbitrary code when loaded.
+
+The Standard Case
+-----------------
+
+Most Ruby projects provide one or more .gemspec files, which describe the
+runtime and development dependencies of the Gem.
+
+Using the .gemspec file and the `gem build` command it is possible to create
+the .gem file. It can then be installed with `gem install`.
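In the standard case the two commands look something like this, where 'example' is a placeholder for the real Gem name and version:

```
gem build example.gemspec        # writes example-<version>.gem
gem install example-<version>.gem
```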
+
+Note that use of `gem build` is discouraged by its own help file in favour
+of using Rake, but there is much less standardisation among Rakefiles and they
+may introduce requirements on Hoe, rake-compiler, Jeweler or other tools.
+
+The 'development' dependencies include everything useful to test, document,
+and create a Gem of the project. All we want to do is create a Gem, which I'll
+refer to as 'building'.
+
+
+Gem with no .gemspec
+--------------------
+
+Some Gems choose not to include a .gemspec, like [Nokogiri]. In the case of
+Nokogiri, and others, [Hoe] is used, which adds Rake tasks that create the Gem.
+The `gem build` command cannot be used in these cases.
+
+You may be able to use the `rake gem` command instead of `gem build`.
+
+[Nokogiri]: https://github.com/sparklemotion/nokogiri/blob/master/Y_U_NO_GEMSPEC.md
+[Hoe]: http://www.zenspider.com/projects/hoe.html
+
+
+Signed Gems
+-----------
+
+It's possible for a Gem maintainer to sign their Gems. See:
+
+ - <http://blog.meldium.com/home/2013/3/3/signed-rubygems-part>
+ - <http://www.ruby-doc.org/stdlib-1.9.3/libdoc/rubygems/rdoc/Gem/Security.html>
+
+When building a Gem in Baserock, signing is unnecessary because it's not going
+to be shared except as part of the build system. The .gemspec may include a
+`signing_key` field, which will be a local path on the maintainer's system to
+their private key. Removing this field causes an unsigned Gem to be built.
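One way to automate the removal is a small filter applied to the gemspec text before building. This is a hypothetical helper for illustration, not part of this tool:

```python
def strip_signing_key(gemspec_text):
    # Remove any line that sets signing_key, so `gem build` produces an
    # unsigned Gem instead of failing to find the maintainer's private key.
    return '\n'.join(line for line in gemspec_text.splitlines()
                     if 'signing_key' not in line)
```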
+
+Known Gems that do this: 'net-ssh' and family.
diff --git a/importer_base.py b/importer_base.py
new file mode 100644
index 0000000..5def0dc
--- /dev/null
+++ b/importer_base.py
@@ -0,0 +1,72 @@
+# Base class for import tools written in Python.
+#
+# Copyright (C) 2014 Codethink Limited
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; version 2 of the License.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+
+import logging
+import os
+import sys
+
+
+class ImportException(Exception):
+ pass
+
+
+class ImportExtension(object):
+ '''A base class for import extensions.
+
+    Subclasses should add a ``process_args`` method.
+
+ Note that it is not necessary to subclass this class for import extensions.
+ This class is here just to collect common code.
+
+ '''
+
+ def __init__(self):
+ self.setup_logging()
+
+ def setup_logging(self):
+ '''Direct all logging output to MORPH_LOG_FD, if set.
+
+ This file descriptor is read by Morph and written into its own log
+ file.
+
+ This overrides cliapp's usual configurable logging setup.
+
+ '''
+ log_write_fd = int(os.environ.get('MORPH_LOG_FD', 0))
+
+ if log_write_fd == 0:
+ return
+
+ formatter = logging.Formatter('%(message)s')
+
+ handler = logging.StreamHandler(os.fdopen(log_write_fd, 'w'))
+ handler.setFormatter(formatter)
+
+ logger = logging.getLogger()
+ logger.addHandler(handler)
+ logger.setLevel(logging.DEBUG)
+
+ def process_args(self, args):
+ raise NotImplementedError()
+
+ def run(self):
+ try:
+ self.process_args(sys.argv[1:])
+ except ImportException as e:
+            sys.stderr.write('ERROR: %s\n' % e)
+ sys.exit(1)
diff --git a/importer_base.rb b/importer_base.rb
new file mode 100644
index 0000000..4e7a7b5
--- /dev/null
+++ b/importer_base.rb
@@ -0,0 +1,81 @@
+#!/usr/bin/env ruby
+#
+# Base class for importers written in Ruby
+#
+# Copyright (C) 2014 Codethink Limited
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; version 2 of the License.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+require 'json'
+require 'logger'
+require 'optparse'
+require 'yaml'
+
+module Importer
+ class Base
+ private
+
+ def create_option_parser(banner, description)
+ opts = OptionParser.new
+
+ opts.banner = banner
+
+ opts.on('-?', '--help', 'print this help') do
+ puts opts
+ print "\n", description
+ exit 255
+ end
+ end
+
+ def log
+ @logger ||= create_logger
+ end
+
+ def error(message)
+ log.error(message)
+ STDERR.puts(message)
+ end
+
+ def local_data_path(file)
+ # Return the path to 'file' relative to the currently running program.
+ # Used as a simple mechanism of finding local data files.
+ script_dir = File.dirname(__FILE__)
+ File.join(script_dir, file)
+ end
+
+ def write_lorry(file, lorry)
+ format_options = { :indent => ' ' }
+ file.puts(JSON.pretty_generate(lorry, format_options))
+ end
+
+ def write_morph(file, morph)
+ file.write(YAML.dump(morph))
+ end
+
+ def create_logger
+ # Use the logger that was passed in from the 'main' import process, if
+ # detected.
+ log_fd = ENV['MORPH_LOG_FD']
+ if log_fd
+ log_stream = IO.new(Integer(log_fd), 'w')
+ logger = Logger.new(log_stream)
+ logger.level = Logger::DEBUG
+ logger.formatter = proc { |severity, datetime, progname, msg| "#{msg}\n" }
+ else
+ logger = Logger.new('/dev/null')
+ end
+ logger
+ end
+ end
+end
diff --git a/main.py b/main.py
new file mode 100644
index 0000000..b5ebece
--- /dev/null
+++ b/main.py
@@ -0,0 +1,920 @@
+#!/usr/bin/python
+# Import foreign packaging systems into Baserock
+#
+# Copyright (C) 2014 Codethink Limited
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; version 2 of the License.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+
+import ansicolor
+import cliapp
+import morphlib
+import networkx
+import six
+
+import contextlib
+import copy
+import json
+import logging
+import os
+import pipes
+import sys
+import tempfile
+import time
+
+from logging import debug
+
+
+class LorrySet(object):
+ '''Manages a set of .lorry files.
+
+ The structure of .lorry files makes the code a little more confusing than
+ I would like. A lorry "entry" is a dict of one entry mapping name to info.
+ A lorry "file" is a dict of one or more of these entries merged together.
+ If it were a list of entries with 'name' fields, the code would be neater.
+
+ '''
+ def __init__(self, lorries_path):
+ self.path = lorries_path
+
+ if os.path.exists(lorries_path):
+ self.data = self.parse_all_lorries()
+ else:
+ os.makedirs(lorries_path)
+ self.data = {}
+
+ def all_lorry_files(self):
+ for dirpath, dirnames, filenames in os.walk(self.path):
+ for filename in filenames:
+ if filename.endswith('.lorry'):
+ yield os.path.join(dirpath, filename)
+
+ def parse_all_lorries(self):
+ lorry_set = {}
+ for lorry_file in self.all_lorry_files():
+ lorry = self.parse_lorry(lorry_file)
+
+ lorry_items = lorry.items()
+
+ for key, value in lorry_items:
+ if key in lorry_set:
+ raise Exception(
+ '%s: duplicates existing lorry %s' % (lorry_file, key))
+
+ lorry_set.update(lorry_items)
+
+ return lorry_set
+
+ def parse_lorry(self, lorry_file):
+ try:
+ with open(lorry_file, 'r') as f:
+ lorry = json.load(f)
+ return lorry
+ except ValueError as e:
+ raise cliapp.AppException(
+ "Error parsing %s: %s" % (lorry_file, e))
+
+ def get_lorry(self, name):
+ return {name: self.data[name]}
+
+ def find_lorry_for_package(self, kind, package_name):
+ key = 'x-products-%s' % kind
+ for name, lorry in self.data.iteritems():
+ products = lorry.get(key, [])
+ for entry in products:
+ if entry == package_name:
+ return {name: lorry}
+
+ return None
+
+ def _check_for_conflicts_in_standard_fields(self, existing, new):
+ '''Ensure that two lorries for the same project do actually match.'''
+ for field, value in existing.iteritems():
+ if field.startswith('x-'):
+ continue
+ if field == 'url':
+ # FIXME: need a much better way of detecting whether the URLs
+ # are equivalent ... right now HTTP vs. HTTPS will cause an
+ # error, for example!
+ matches = (value.rstrip('/') == new[field].rstrip('/'))
+ else:
+ matches = (value == new[field])
+ if not matches:
+ raise Exception(
+ 'Lorry %s conflicts with existing entry %s at field %s' %
+ (new, existing, field))
+
+ def _merge_products_fields(self, existing, new):
+ '''Merge the x-products- fields from new lorry into an existing one.'''
+ is_product_field = lambda x: x.startswith('x-products-')
+
+ existing_fields = [f for f in existing.iterkeys() if
+ is_product_field(f)]
+ new_fields = [f for f in new.iterkeys() if f not in existing_fields and
+ is_product_field(f)]
+
+ for field in existing_fields:
+ existing[field].extend(new[field])
+ existing[field] = list(set(existing[field]))
+
+ for field in new_fields:
+ existing[field] = new[field]
+
+ def add(self, filename, lorry_entry):
+ logging.debug('Adding %s to lorryset', filename)
+
+ filename = os.path.join(self.path, '%s.lorry' % filename)
+
+ assert len(lorry_entry) == 1
+
+ project_name = lorry_entry.keys()[0]
+ info = lorry_entry.values()[0]
+
+ if len(project_name) == 0:
+ raise cliapp.AppException(
+ 'Invalid lorry %s: %s' % (filename, lorry_entry))
+
+ if not isinstance(info.get('url'), six.string_types):
+ raise cliapp.AppException(
+ 'Invalid URL in lorry %s: %s' % (filename, info.get('url')))
+
+ if project_name in self.data:
+ stored_lorry = self.get_lorry(project_name)
+
+ self._check_for_conflicts_in_standard_fields(
+ stored_lorry[project_name], lorry_entry[project_name])
+ self._merge_products_fields(
+ stored_lorry[project_name], lorry_entry[project_name])
+ lorry_entry = stored_lorry
+ else:
+ self.data[project_name] = info
+
+ self._add_lorry_entry_to_lorry_file(filename, lorry_entry)
+
+ def _add_lorry_entry_to_lorry_file(self, filename, entry):
+ if os.path.exists(filename):
+ with open(filename) as f:
+ contents = json.load(f)
+ else:
+ contents = {}
+
+ contents.update(entry)
+
+ with morphlib.savefile.SaveFile(filename, 'w') as f:
+ json.dump(contents, f, indent=4, separators=(',', ': '),
+ sort_keys=True)
+
+
+class MorphologySet(morphlib.morphset.MorphologySet):
+ def __init__(self, path):
+ super(MorphologySet, self).__init__()
+
+ self.path = path
+ self.loader = morphlib.morphloader.MorphologyLoader()
+
+ if os.path.exists(path):
+ self.load_all_morphologies()
+ else:
+ os.makedirs(path)
+
+ def load_all_morphologies(self):
+ logging.info('Loading all .morph files under %s', self.path)
+
+ class FakeGitDir(morphlib.gitdir.GitDirectory):
+ '''Ugh
+
+ This is here because the default constructor will search up the
+        directory hierarchy until it finds a '.git' directory, but that
+ may be totally the wrong place for our purpose: we don't have a
+ Git directory at all.
+
+ '''
+ def __init__(self, path):
+ self.dirname = path
+ self._config = {}
+
+ gitdir = FakeGitDir(self.path)
+ finder = morphlib.morphologyfinder.MorphologyFinder(gitdir)
+ for filename in (f for f in finder.list_morphologies()
+ if not gitdir.is_symlink(f)):
+ text = finder.read_morphology(filename)
+ morph = self.loader.load_from_string(text, filename=filename)
+ morph.repo_url = None # self.root_repository_url
+ morph.ref = None # self.system_branch_name
+ self.add_morphology(morph)
+
+ def get_morphology(self, repo_url, ref, filename):
+ return self._get_morphology(repo_url, ref, filename)
+
+ def save_morphology(self, filename, morphology):
+ self.add_morphology(morphology)
+ morphology_to_save = copy.copy(morphology)
+ self.loader.unset_defaults(morphology_to_save)
+ filename = os.path.join(self.path, filename)
+ self.loader.save_to_file(filename, morphology_to_save)
+
+
+class GitDirectory(morphlib.gitdir.GitDirectory):
+ def __init__(self, dirname):
+ super(GitDirectory, self).__init__(dirname)
+
+ # Work around strange/unintentional behaviour in GitDirectory class
+ # when 'repopath' isn't a Git repo. If 'repopath' is contained
+ # within a Git repo then the GitDirectory will traverse up to the
+ # parent repo, which isn't what we want in this case.
+ if self.dirname != dirname:
+ logging.error(
+ 'Got git directory %s for %s!', self.dirname, dirname)
+ raise cliapp.AppException(
+ '%s is not the root of a Git repository' % dirname)
+
+ def has_ref(self, ref):
+ try:
+ self._rev_parse(ref)
+ return True
+ except morphlib.gitdir.InvalidRefError:
+ return False
+
+
+class BaserockImportException(cliapp.AppException):
+ pass
+
+
+class Package(object):
+ '''A package in the processing queue.
+
+ In order to provide helpful errors, this item keeps track of what
+ packages depend on it, and hence of why it was added to the queue.
+
+ '''
+ def __init__(self, kind, name, version):
+ self.kind = kind
+ self.name = name
+ self.version = version
+ self.required_by = []
+ self.morphology = None
+ self.is_build_dep = False
+ self.version_in_use = version
+
+ def __cmp__(self, other):
+ return cmp(self.name, other.name)
+
+ def __repr__(self):
+ return '<Package %s-%s>' % (self.name, self.version)
+
+ def __str__(self):
+ if len(self.required_by) > 0:
+ required_msg = ', '.join(self.required_by)
+ required_msg = ', required by: ' + required_msg
+ else:
+ required_msg = ''
+ return '%s-%s%s' % (self.name, self.version, required_msg)
+
+ def add_required_by(self, item):
+ self.required_by.append('%s-%s' % (item.name, item.version))
+
+ def match(self, name, version):
+ return (self.name==name and self.version==version)
+
+ def set_morphology(self, morphology):
+ self.morphology = morphology
+
+ def set_is_build_dep(self, is_build_dep):
+ self.is_build_dep = is_build_dep
+
+ def set_version_in_use(self, version_in_use):
+ self.version_in_use = version_in_use
+
+
+def find(iterable, match):
+ return next((x for x in iterable if match(x)), None)
+
+
+def run_extension(filename, args, cwd='.'):
+ output = []
+ errors = []
+
+ ext_logger = logging.getLogger(filename)
+
+ def report_extension_stdout(line):
+ output.append(line)
+
+ def report_extension_stderr(line):
+ errors.append(line)
+
+ def report_extension_logger(line):
+ ext_logger.debug(line)
+
+ ext = morphlib.extensions.ExtensionSubprocess(
+ report_stdout=report_extension_stdout,
+ report_stderr=report_extension_stderr,
+ report_logger=report_extension_logger,
+ )
+
+ # There are better ways of doing this, but it works for now.
+ main_path = os.path.dirname(os.path.realpath(__file__))
+ extension_path = os.path.join(main_path, filename)
+
+ logging.debug("Running %s %s with cwd %s" % (extension_path, args, cwd))
+ returncode = ext.run(extension_path, args, cwd, os.environ)
+
+ if returncode == 0:
+ ext_logger.info('succeeded')
+ else:
+ for line in errors:
+ ext_logger.error(line)
+ message = '%s failed with code %s: %s' % (
+ filename, returncode, '\n'.join(errors))
+ raise BaserockImportException(message)
+
+ return '\n'.join(output)
+
+
+class ImportLoop(object):
+ '''Import a package and all of its dependencies into Baserock.
+
+ This class holds the state for the processing loop.
+
+ '''
+
+ def __init__(self, app, goal_kind, goal_name, goal_version, extra_args=[]):
+ self.app = app
+ self.goal_kind = goal_kind
+ self.goal_name = goal_name
+ self.goal_version = goal_version
+ self.extra_args = extra_args
+
+ self.lorry_set = LorrySet(self.app.settings['lorries-dir'])
+ self.morph_set = MorphologySet(self.app.settings['definitions-dir'])
+
+ self.morphloader = morphlib.morphloader.MorphologyLoader()
+
+ self.importers = {}
+
+ def enable_importer(self, kind, extra_args=[]):
+ assert kind not in self.importers
+ self.importers[kind] = {
+ 'extra_args': extra_args
+ }
+
+ def run(self):
+ '''Process the goal package and all of its dependencies.'''
+ start_time = time.time()
+ start_displaytime = time.strftime('%x %X %Z', time.localtime())
+
+ self.app.status(
+ '%s: Import of %s %s started', start_displaytime, self.goal_kind,
+ self.goal_name)
+
+ if not self.app.settings['update-existing']:
+ self.app.status(
+ 'Not updating existing Git checkouts or existing definitions')
+
+ chunk_dir = os.path.join(self.morph_set.path, 'strata', self.goal_name)
+ if not os.path.exists(chunk_dir):
+ os.makedirs(chunk_dir)
+
+ goal = Package(self.goal_kind, self.goal_name, self.goal_version)
+ to_process = [goal]
+ processed = networkx.DiGraph()
+
+ errors = {}
+
+ while len(to_process) > 0:
+ current_item = to_process.pop()
+
+ try:
+ self._process_package(current_item)
+ error = False
+ except BaserockImportException as e:
+ self.app.status(str(e), error=True)
+ errors[current_item] = e
+ error = True
+
+ processed.add_node(current_item)
+
+ if not error:
+ self._process_dependencies_from_morphology(
+ current_item, current_item.morphology, to_process,
+ processed)
+
+ if len(errors) > 0:
+ self.app.status(
+ '\nErrors encountered, not generating a stratum morphology.')
+ self.app.status(
+ 'See the README files for guidance.')
+ else:
+ self._generate_stratum_morph_if_none_exists(
+ processed, self.goal_name)
+
+ duration = time.time() - start_time
+ end_displaytime = time.strftime('%x %X %Z', time.localtime())
+
+ self.app.status(
+ '%s: Import of %s %s ended (took %i seconds)', end_displaytime,
+ self.goal_kind, self.goal_name, duration)
+
+ def _process_package(self, package):
+ kind = package.kind
+ name = package.name
+ version = package.version
+
+ lorry = self._find_or_create_lorry_file(kind, name)
+ source_repo, url = self._fetch_or_update_source(lorry)
+
+ checked_out_version, ref = self._checkout_source_version(
+ source_repo, name, version)
+ package.set_version_in_use(checked_out_version)
+
+ chunk_morph = self._find_or_create_chunk_morph(
+ kind, name, checked_out_version, source_repo, url, ref)
+
+ if self.app.settings['use-local-sources']:
+ chunk_morph.repo_url = 'file://' + source_repo.dirname
+ else:
+ reponame = lorry.keys()[0]
+ chunk_morph.repo_url = 'upstream:%s' % reponame
+
+ package.set_morphology(chunk_morph)
+
+ def _process_dependencies_from_morphology(self, current_item, morphology,
+ to_process, processed):
+ '''Enqueue all dependencies of a package that are yet to be processed.
+
+ Dependencies are communicated using extra fields in morphologies,
+ currently.
+
+ '''
+ for key, value in morphology.iteritems():
+ if key.startswith('x-build-dependencies-'):
+ kind = key[len('x-build-dependencies-'):]
+ is_build_deps = True
+ elif key.startswith('x-runtime-dependencies-'):
+ kind = key[len('x-runtime-dependencies-'):]
+ is_build_deps = False
+ else:
+ continue
+
+ # We need to validate this field because it doesn't go through the
+ # normal MorphologyFactory validation, being an extension.
+ if not hasattr(value, 'iteritems'):
+ value_type = type(value).__name__
+ raise cliapp.AppException(
+ "Morphology for %s has invalid '%s': should be a dict, but "
+ "got a %s." % (morphology['name'], key, value_type))
+
+ self._process_dependency_list(
+ current_item, kind, value, to_process, processed, is_build_deps)
+
+ def _process_dependency_list(self, current_item, kind, deps, to_process,
+ processed, these_are_build_deps):
+ # All deps are added as nodes to the 'processed' graph. Runtime
+ # dependencies only need to appear in the stratum, but build
+ # dependencies have ordering constraints, so we add edges in
+ # the graph for build-dependencies too.
+
+ for dep_name, dep_version in deps.iteritems():
+ dep_package = find(
+ processed, lambda i: i.match(dep_name, dep_version))
+
+ if dep_package is None:
+ # Not yet processed
+ queue_item = find(
+ to_process, lambda i: i.match(dep_name, dep_version))
+ if queue_item is None:
+ queue_item = Package(kind, dep_name, dep_version)
+ to_process.append(queue_item)
+ dep_package = queue_item
+
+ dep_package.add_required_by(current_item)
+
+ if these_are_build_deps or current_item.is_build_dep:
+ # A runtime dep of a build dep becomes a build dep
+ # itself.
+ dep_package.set_is_build_dep(True)
+ processed.add_edge(dep_package, current_item)
+
+ def _find_or_create_lorry_file(self, kind, name):
+ # Note that the lorry file may already exist for 'name', but lorry
+ # files are named for project name rather than package name. In this
+ # case we will generate the lorry, and try to add it to the set, at
+ # which point LorrySet will notice the existing one and merge the two.
+ lorry = self.lorry_set.find_lorry_for_package(kind, name)
+
+ if lorry is None:
+ lorry = self._generate_lorry_for_package(kind, name)
+
+ if len(lorry) != 1:
+ raise Exception(
+ 'Expected generated lorry file with one entry.')
+
+ lorry_filename = lorry.keys()[0]
+
+ if '/' in lorry_filename:
+ # We try to be a bit clever and guess that if there's a prefix
+ # in the name, e.g. 'ruby-gems/chef' then it should go in a
+ # mega-lorry file, such as ruby-gems.lorry.
+ parts = lorry_filename.split('/', 1)
+ lorry_filename = parts[0]
+
+ if lorry_filename == '':
+ raise cliapp.AppException(
+ 'Invalid lorry data for %s: %s' % (name, lorry))
+
+ self.lorry_set.add(lorry_filename, lorry)
+ else:
+ lorry_filename = lorry.keys()[0]
+ logging.info(
+ 'Found existing lorry file for %s: %s', name, lorry_filename)
+
+ return lorry
+
+ def _generate_lorry_for_package(self, kind, name):
+ tool = '%s.to_lorry' % kind
+ if kind not in self.importers:
+ raise Exception('Importer for %s was not enabled.' % kind)
+ extra_args = self.importers[kind]['extra_args']
+ self.app.status('Calling %s to generate lorry for %s', tool, name)
+ lorry_text = run_extension(tool, extra_args + [name])
+ try:
+ lorry = json.loads(lorry_text)
+ except ValueError as e:
+ raise cliapp.AppException(
+ 'Invalid output from %s: %s' % (tool, lorry_text))
+ return lorry
+
+ def _run_lorry(self, lorry):
+ f = tempfile.NamedTemporaryFile(delete=False)
+ try:
+ logging.debug(json.dumps(lorry))
+ json.dump(lorry, f)
+ f.close()
+ cliapp.runcmd([
+ 'lorry', '--working-area',
+ self.app.settings['lorry-working-dir'], '--pull-only',
+ '--bundle', 'never', '--tarball', 'never', f.name])
+ finally:
+ os.unlink(f.name)
+
+ def _fetch_or_update_source(self, lorry):
+ assert len(lorry) == 1
+ lorry_name, lorry_entry = lorry.items()[0]
+
+ url = lorry_entry['url']
+ reponame = '_'.join(lorry_name.split('/'))
+ repopath = os.path.join(
+ self.app.settings['lorry-working-dir'], reponame, 'git')
+
+ checkoutpath = os.path.join(
+ self.app.settings['checkouts-dir'], reponame)
+
+ try:
+ already_lorried = os.path.exists(repopath)
+ if already_lorried:
+ if self.app.settings['update-existing']:
+ self.app.status('Updating lorry of %s', url)
+ self._run_lorry(lorry)
+ else:
+ self.app.status('Lorrying %s', url)
+ self._run_lorry(lorry)
+
+ if os.path.exists(checkoutpath):
+ repo = GitDirectory(checkoutpath)
+ repo.update_remotes()
+ else:
+ if already_lorried:
+ logging.warning(
+ 'Expected %s to exist, but will recreate it',
+ checkoutpath)
+ cliapp.runcmd(['git', 'clone', repopath, checkoutpath])
+ repo = GitDirectory(checkoutpath)
+ except cliapp.AppException as e:
+ raise BaserockImportException(e.msg.rstrip())
+
+ return repo, url
+
+ def _checkout_source_version(self, source_repo, name, version):
+ # FIXME: we need to be a bit smarter than this. Right now we assume
+ # that 'version' is a valid Git ref.
+
+ possible_names = [
+ version,
+ 'v%s' % version,
+ '%s-%s' % (name, version)
+ ]
+
+ for tag_name in possible_names:
+ if source_repo.has_ref(tag_name):
+ source_repo.checkout(tag_name)
+ ref = tag_name
+ break
+ else:
+ if self.app.settings['use-master-if-no-tag']:
+ logging.warning(
+                    "Couldn't find a tag for version %s in repo %s. "
+                    "Using 'master'.", version, source_repo)
+ source_repo.checkout('master')
+ ref = version = 'master'
+ else:
+ raise BaserockImportException(
+ 'Could not find ref for %s version %s.' % (name, version))
+
+ return version, ref
+
+ def _find_or_create_chunk_morph(self, kind, name, version, source_repo,
+ repo_url, named_ref):
+ morphology_filename = 'strata/%s/%s-%s.morph' % (
+ self.goal_name, name, version)
+ sha1 = source_repo.resolve_ref_to_commit(named_ref)
+
+ def generate_morphology():
+ morphology = self._generate_chunk_morph_for_package(
+ source_repo, kind, name, version, morphology_filename)
+ self.morph_set.save_morphology(morphology_filename, morphology)
+ return morphology
+
+ if self.app.settings['update-existing']:
+ morphology = generate_morphology()
+ else:
+ morphology = self.morph_set.get_morphology(
+ repo_url, sha1, morphology_filename)
+
+ if morphology is None:
+ # Existing chunk morphologies loaded from disk don't contain
+ # the repo and ref information. That's stored in the stratum
+ # morph. So the first time we touch a chunk morph we need to
+ # set this info.
+ logging.debug("Didn't find morphology for %s|%s|%s", repo_url,
+ sha1, morphology_filename)
+ morphology = self.morph_set.get_morphology(
+ None, None, morphology_filename)
+
+ if morphology is None:
+ logging.debug("Didn't find morphology for None|None|%s",
+ morphology_filename)
+ morphology = generate_morphology()
+
+ morphology.repo_url = repo_url
+ morphology.ref = sha1
+ morphology.named_ref = named_ref
+
+ return morphology
+
+ def _generate_chunk_morph_for_package(self, source_repo, kind, name,
+ version, filename):
+ tool = '%s.to_chunk' % kind
+
+ if kind not in self.importers:
+ raise Exception('Importer for %s was not enabled.' % kind)
+ extra_args = self.importers[kind]['extra_args']
+
+ self.app.status(
+ 'Calling %s to generate chunk morph for %s %s', tool, name,
+ version)
+
+ args = extra_args + [source_repo.dirname, name]
+ if version != 'master':
+ args.append(version)
+ text = run_extension(tool, args)
+
+ return self.morphloader.load_from_string(text, filename)
+
+ def _sort_chunks_by_build_order(self, graph):
+ order = reversed(sorted(graph.nodes()))
+ try:
+ return networkx.topological_sort(graph, nbunch=order)
+ except networkx.NetworkXUnfeasible as e:
+ # Cycle detected!
+ loop_subgraphs = networkx.strongly_connected_component_subgraphs(
+ graph, copy=False)
+ all_loops_str = []
+            for subgraph in loop_subgraphs:
+                if subgraph.number_of_nodes() > 1:
+                    loops_str = '->'.join(
+                        str(node) for node in subgraph.nodes())
+                    all_loops_str.append(loops_str)
+ raise cliapp.AppException(
+ 'One or more cycles detected in build graph: %s' %
+ (', '.join(all_loops_str)))
+
+ def _generate_stratum_morph_if_none_exists(self, graph, goal_name):
+ filename = os.path.join(
+ self.app.settings['definitions-dir'], 'strata', '%s.morph' %
+ goal_name)
+
+ if os.path.exists(filename) and not self.app.settings['update-existing']:
+ self.app.status(
+ msg='Found stratum morph for %s at %s, not overwriting' %
+ (goal_name, filename))
+ return
+
+ self.app.status(msg='Generating stratum morph for %s' % goal_name)
+
+ chunk_entries = []
+
+ for package in self._sort_chunks_by_build_order(graph):
+ m = package.morphology
+ if m is None:
+ raise cliapp.AppException('No morphology for %s' % package)
+
+ def format_build_dep(name, version):
+ dep_package = find(graph, lambda p: p.match(name, version))
+ return '%s-%s' % (name, dep_package.version_in_use)
+
+ build_depends = [
+ format_build_dep(name, version) for name, version in
+ m['x-build-dependencies-rubygems'].iteritems()
+ ]
+
+ entry = {
+ 'name': m['name'],
+ 'repo': m.repo_url,
+ 'ref': m.ref,
+ 'unpetrify-ref': m.named_ref,
+ 'morph': m.filename,
+ 'build-depends': build_depends,
+ }
+ chunk_entries.append(entry)
+
+ stratum_name = goal_name
+ stratum = {
+ 'name': stratum_name,
+ 'kind': 'stratum',
+ 'description': 'Autogenerated by Baserock import tool',
+ 'build-depends': [
+ {'morph': 'strata/ruby.morph'}
+ ],
+ 'chunks': chunk_entries,
+ }
+
+ morphology = self.morphloader.load_from_string(
+ json.dumps(stratum), filename=filename)
+ self.morphloader.unset_defaults(morphology)
+ self.morphloader.save_to_file(filename, morphology)
+
+
+class BaserockImportApplication(cliapp.Application):
+ def add_settings(self):
+ self.settings.string(['lorries-dir'],
+ "location for Lorry files",
+ metavar="PATH",
+ default=os.path.abspath('./lorries'))
+ self.settings.string(['definitions-dir'],
+ "location for morphology files",
+ metavar="PATH",
+ default=os.path.abspath('./definitions'))
+ self.settings.string(['checkouts-dir'],
+ "location for Git checkouts",
+ metavar="PATH",
+ default=os.path.abspath('./checkouts'))
+ self.settings.string(['lorry-working-dir'],
+ "Lorry working directory",
+ metavar="PATH",
+ default=os.path.abspath('./lorry-working-dir'))
+
+ self.settings.boolean(['update-existing'],
+ "update all the checked-out Git trees and "
+ "generated definitions",
+ default=False)
+ self.settings.boolean(['use-local-sources'],
+ "use file:/// URLs in the stratum 'repo' "
+ "fields, instead of upstream: URLs",
+ default=False)
+ self.settings.boolean(['use-master-if-no-tag'],
+ "if the correct tag for a version can't be "
+ "found, use 'master' instead of raising an "
+ "error",
+ default=False)
+
+ def _stream_has_colours(self, stream):
+ # http://blog.mathieu-leplatre.info/colored-output-in-console-with-python.html
+ if not hasattr(stream, "isatty"):
+ return False
+ if not stream.isatty():
+ return False # auto color only on TTYs
+ try:
+ import curses
+ curses.setupterm()
+ return curses.tigetnum("colors") > 2
+        except Exception:
+            # guess false in case of error
+            return False
+
+ def setup(self):
+ self.add_subcommand('omnibus', self.import_omnibus,
+ arg_synopsis='REPO PROJECT_NAME SOFTWARE_NAME')
+ self.add_subcommand('rubygems', self.import_rubygems,
+ arg_synopsis='GEM_NAME')
+
+ self.stdout_has_colours = self._stream_has_colours(sys.stdout)
+
+ def setup_logging_formatter_for_file(self):
+ root_logger = logging.getLogger()
+ root_logger.name = 'main'
+
+ # You need recent cliapp for this to work, with commit "Split logging
+ # setup into further overrideable methods".
+ return logging.Formatter("%(name)s: %(levelname)s: %(message)s")
+
+ def process_args(self, args):
+ if len(args) == 0:
+            # Cliapp's default is to just say "ERROR: must give subcommand"
+            # when no args are passed; showing the help instead is friendlier.
+            args = ['help']
+
+ super(BaserockImportApplication, self).process_args(args)
+
+ def status(self, msg, *args, **kwargs):
+ text = msg % args
+        if kwargs.get('error'):
+ logging.error(text)
+ if self.stdout_has_colours:
+ sys.stdout.write(ansicolor.red(text))
+ else:
+ sys.stdout.write(text)
+ else:
+ logging.info(text)
+ sys.stdout.write(text)
+ sys.stdout.write('\n')
+
+ def import_omnibus(self, args):
+ '''Import a software component from an Omnibus project.
+
+ Omnibus is a tool for generating application bundles for various
+ platforms. See <https://github.com/opscode/omnibus> for more
+ information.
+
+ '''
+ if len(args) != 3:
+ raise cliapp.AppException(
+ 'Please give the location of the Omnibus definitions repo, '
+ 'and the name of the project and the top-level software '
+ 'component.')
+
+ def running_inside_bundler():
+ return 'BUNDLE_GEMFILE' in os.environ
+
+ def command_to_run_python_in_directory(directory, args):
+ # Bundler requires that we run it from the Omnibus project
+ # directory. That messes up any relative paths the user may have
+ # passed on the commandline, so we do a bit of a hack to change
+ # back to the original directory inside the `bundle exec` process.
+ subshell_command = "(cd %s; exec python %s)" % \
+ (pipes.quote(directory), ' '.join(map(pipes.quote, args)))
+ shell_command = "sh -c %s" % pipes.quote(subshell_command)
+ return shell_command
+
+ def reexecute_self_with_bundler(path):
+ script = sys.argv[0]
+
+ logging.info('Reexecuting %s within Bundler, so that extensions '
+ 'use the correct dependencies for Omnibus and the '
+ 'Omnibus project definitions.', script)
+ command = command_to_run_python_in_directory(os.getcwd(), sys.argv)
+
+ logging.debug('Running: `bundle exec %s` in dir %s', command, path)
+ os.chdir(path)
+            os.execvp('bundle', ['bundle', 'exec', command])
+
+ # Omnibus definitions are spread across multiple repos, and there is
+ # no stability guarantee for the definition format. The official advice
+ # is to use Bundler to execute Omnibus, so let's do that.
+ if not running_inside_bundler():
+ reexecute_self_with_bundler(args[0])
+
+ definitions_dir = args[0]
+ project_name = args[1]
+
+ loop = ImportLoop(
+ app=self,
+ goal_kind='omnibus', goal_name=args[2], goal_version='master')
+ loop.enable_importer('omnibus',
+ extra_args=[definitions_dir, project_name])
+ loop.enable_importer('rubygems')
+ loop.run()
+
+ def import_rubygems(self, args):
+ '''Import one or more RubyGems.'''
+ if len(args) != 1:
+ raise cliapp.AppException(
+ 'Please pass the name of a RubyGem on the commandline.')
+
+ loop = ImportLoop(
+ app=self,
+ goal_kind='rubygems', goal_name=args[0], goal_version='master')
+ loop.enable_importer('rubygems')
+ loop.run()
+
+
+app = BaserockImportApplication(progname='import')
+app.run()
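
The version-to-ref heuristic used by `_checkout_source_version` above can be sketched in isolation. This is a minimal sketch in plain Python: `candidate_refs` mirrors the candidate list from the code, while `find_ref` and its `repo_refs` container are stand-ins for the `source_repo.has_ref()` interface, not part of the tool itself:

```python
def candidate_refs(name, version):
    """Ref names tried, in order, when checking out a package version.

    Mirrors _checkout_source_version: the bare version, a 'v'-prefixed
    tag, then a '<name>-<version>' tag.
    """
    return [version, 'v%s' % version, '%s-%s' % (name, version)]


def find_ref(repo_refs, name, version, use_master_if_no_tag=False):
    # repo_refs is a stand-in for source_repo.has_ref(): any container
    # supporting 'in' works here.
    for ref in candidate_refs(name, version):
        if ref in repo_refs:
            return ref
    if use_master_if_no_tag:
        # Same fallback as the --use-master-if-no-tag setting.
        return 'master'
    raise ValueError('Could not find ref for %s version %s.' % (name, version))
```

For example, a repo that tags releases as `v1.2.0` is matched by the second candidate, and an untagged repo falls back to `master` only when explicitly allowed.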
diff --git a/omnibus.to_chunk b/omnibus.to_chunk
new file mode 100755
index 0000000..1189199
--- /dev/null
+++ b/omnibus.to_chunk
@@ -0,0 +1,274 @@
+#!/usr/bin/env ruby
+#
+# Create a chunk morphology to integrate Omnibus software in Baserock
+#
+# Copyright (C) 2014 Codethink Limited
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; version 2 of the License.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+require 'bundler'
+require 'omnibus'
+
+require 'optparse'
+require 'rubygems/commands/build_command'
+require 'rubygems/commands/install_command'
+require 'shellwords'
+
+require_relative 'importer_base'
+
+BANNER = "Usage: omnibus.to_chunk PROJECT_DIR PROJECT_NAME SOURCE_DIR SOFTWARE_NAME"
+
+DESCRIPTION = <<-END
+Generate a .morph file for a given Omnibus software component.
+END
+
+class Omnibus::Builder
+ # It's possible to use `gem install` in build commands, which is a great
+ # way of subverting the dependency tracking Omnibus provides. It's done
+ # in `omnibus-chef/config/software/chefdk.rb`, for example.
+ #
+ # To handle this, here we extend the class that executes the build commands
+ # to detect when `gem install` is run. It uses the Gem library to turn the
+ # commandline back into a Bundler::Dependency object that we can use.
+ #
+ # We also trap `gem build` so we know when a software component is a RubyGem
+ # that should be handled by 'rubygems.to_chunk'.
+
+ class GemBuildCommandParser < Gem::Commands::BuildCommand
+ def gemspec_path(args)
+ handle_options args
+ if options[:args].length != 1
+ raise Exception, "Invalid `gem build` commandline: 1 argument " +
+ "expected, got #{options[:args]}."
+ end
+ options[:args][0]
+ end
+ end
+
+ class GemInstallCommandParser < Gem::Commands::InstallCommand
+ def dependency_list_from_commandline(args)
+ handle_options args
+
+ # `gem install foo*` is sometimes used when installing a locally built
+ # Gem, to avoid needing to know the exact version number that was built.
+ # We only care about remote Gems being installed, so anything with a '*'
+ # in its name can be ignored.
+ gem_names = options[:args].delete_if { |name| name.include?('*') }
+
+ gem_names.collect do |gem_name|
+ Bundler::Dependency.new(gem_name, options[:version])
+ end
+ end
+ end
+
+ def gem(command, options = {})
+ # This function re-implements the 'gem' function in the build-commands DSL.
+ if command.start_with? 'build'
+ parser = GemBuildCommandParser.new
+ args = Shellwords.split(command).drop(1)
+ if built_gemspec != nil
+ raise Exception, "More than one `gem build` command was run as part " +
+ "of the build process. The 'rubygems.to_chunk' " +
+ "program currently supports only one .gemspec " +
+ "build per chunk, so this can't be processed " +
+ "automatically."
+ end
+ @built_gemspec = parser.gemspec_path(args)
+ elsif command.start_with? 'install'
+ parser = GemInstallCommandParser.new
+ args = Shellwords.split(command).drop(1)
+ args_without_build_flags = args.take_while { |item| item != '--' }
+ gems = parser.dependency_list_from_commandline(args_without_build_flags)
+ manually_installed_rubygems.concat gems
+ end
+ end
+
+ def built_gemspec
+ @built_gemspec
+ end
+
+ def manually_installed_rubygems
+ @manually_installed_rubygems ||= []
+ end
+end
+
+class OmnibusChunkMorphologyGenerator < Importer::Base
+ def initialize
+ local_data = YAML.load_file(local_data_path("omnibus.yaml"))
+ @dependency_blacklist = local_data['dependency-blacklist']
+ end
+
+ def parse_options(arguments)
+ opts = create_option_parser(BANNER, DESCRIPTION)
+
+ parsed_arguments = opts.parse!(arguments)
+
+ if parsed_arguments.length != 4 and parsed_arguments.length != 5
+ STDERR.puts "Expected 4 or 5 arguments, got #{parsed_arguments}."
+ opts.parse(['-?'])
+ exit 255
+ end
+
+ project_dir, project_name, source_dir, software_name, expected_version = \
+ parsed_arguments
+ # Not yet implemented
+ #if expected_version != nil
+ # expected_version = Gem::Version.new(expected_version)
+ #end
+ [project_dir, project_name, source_dir, software_name, expected_version]
+ end
+
+ class SubprocessError < RuntimeError
+ end
+
+ def run_tool_capture_output(tool_name, *args)
+ tool_path = local_data_path(tool_name)
+
+    # FIXME: something breaks when we try to share this FD. It's not
+    # ideal that the subprocess doesn't log anything, though.
+    env_changes = {'MORPH_LOG_FD' => nil}
+
+ command = [[tool_path, tool_name], *args]
+ log.info("Running #{command.join(' ')} in #{scripts_dir}")
+
+ text = IO.popen(
+ env_changes, command, :chdir => scripts_dir, :err => [:child, :out]
+ ) do |io|
+ io.read
+ end
+
+    if $?.success?
+ text
+ else
+ raise SubprocessError, text
+ end
+ end
+
+ def generate_chunk_morph_for_rubygems_software(software, source_dir)
+    # The software's relative path seems to be a more reliable heuristic
+    # for the Gem's name than the software name itself.
+ gem_name = software.relative_path
+
+ text = run_tool_capture_output('rubygems.to_chunk', source_dir, gem_name)
+ log.debug("Text from output: #{text}, result #{$?}")
+
+ morphology = YAML::load(text)
+ return morphology
+ rescue SubprocessError => e
+ error "Tried to import #{software.name} as a RubyGem, got the " \
+ "following error from rubygems.to_chunk: #{e.message}"
+ exit 1
+ end
+
+ def resolve_rubygems_deps(requirements)
+ return {} if requirements.empty?
+
+ log.info('Resolving RubyGem requirements with Bundler')
+
+ fake_gemfile = Bundler::Dsl.new
+ fake_gemfile.source('https://rubygems.org')
+
+ requirements.each do |dep|
+ fake_gemfile.gem(dep.name, dep.requirement)
+ end
+
+ definition = fake_gemfile.to_definition('Gemfile.lock', true)
+ resolved_specs = definition.resolve_remotely!
+
+ Hash[resolved_specs.collect { |spec| [spec.name, spec.version.to_s]}]
+ end
+
+ def generate_chunk_morph_for_software(project, software, source_dir)
+ if software.builder.built_gemspec != nil
+ morphology = generate_chunk_morph_for_rubygems_software(software,
+ source_dir)
+ else
+ morphology = {
+ "name" => software.name,
+ "kind" => "chunk",
+ "description" => "Automatically generated by omnibus.to_chunk"
+ }
+ end
+
+ omnibus_deps = {}
+ rubygems_deps = {}
+
+    software.dependencies.each do |name|
+      dep_software = Omnibus::Software.load(project, name)
+      if @dependency_blacklist.member? name
+        log.info(
+          "Not adding #{name} as a dependency as it is marked to be ignored.")
+      elsif dep_software.fetcher.instance_of?(Omnibus::PathFetcher)
+        log.info(
+          "Not adding #{name} as a dependency: it's installed from " +
+          "a path, which probably means that it is package configuration, " +
+          "not a 3rd-party component to be imported.")
+      elsif dep_software.fetcher.instance_of?(Omnibus::NullFetcher)
+        if dep_software.builder.built_gemspec
+          log.info(
+            "Adding #{name} as a RubyGem dependency because it builds " +
+            "#{dep_software.builder.built_gemspec}")
+          rubygems_deps[name] = dep_software.version
+        else
+          log.info(
+            "Not adding #{name} as a dependency: no sources listed.")
+        end
+      else
+        omnibus_deps[name] = dep_software.version
+      end
+    end
+
+    gem_requirements = software.builder.manually_installed_rubygems
+    rubygems_deps.update(resolve_rubygems_deps(gem_requirements))
+
+ morphology.update({
+ # Possibly this tool should look at software.build and
+ # generate suitable configure, build and install-commands.
+ # For now: don't bother!
+
+ # FIXME: are these build or runtime dependencies? We'll assume both.
+ "x-build-dependencies-omnibus" => omnibus_deps,
+ "x-runtime-dependencies-omnibus" => omnibus_deps,
+
+ "x-build-dependencies-rubygems" => {},
+ "x-runtime-dependencies-rubygems" => rubygems_deps,
+ })
+
+ if software.description
+      morphology['description'] = software.description + "\n\n" +
+ morphology['description']
+ end
+
+ morphology
+ end
+
+ def run
+ project_dir, project_name, source_dir, software_name = parse_options(ARGV)
+
+ log.info("Creating chunk morph for #{software_name} from project " +
+ "#{project_name}, defined in #{project_dir}")
+
+ Dir.chdir(project_dir)
+
+ project = Omnibus::Project.load(project_name)
+
+    software = Omnibus::Software.load(project, software_name)
+
+ morph = generate_chunk_morph_for_software(project, software, source_dir)
+
+ write_morph(STDOUT, morph)
+ end
+end
+
+OmnibusChunkMorphologyGenerator.new.run
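
The dependency classification in `generate_chunk_morph_for_software` above boils down to a small decision table. A minimal sketch of the same branching in plain Python (fetcher kinds reduced to strings, and the function name is illustrative, not from the tool):

```python
def classify_dependency(name, fetcher_kind, builds_gemspec, blacklist):
    """Decide how an Omnibus software dependency should be recorded.

    Returns 'ignore', 'rubygem', or 'omnibus', mirroring the branches
    in generate_chunk_morph_for_software above.
    """
    if name in blacklist:
        return 'ignore'   # explicitly blacklisted (e.g. cacerts)
    if fetcher_kind == 'path':
        # Installed from a path: probably package configuration,
        # not a 3rd-party component to be imported.
        return 'ignore'
    if fetcher_kind == 'null':
        # No sources listed: only interesting if it builds a .gemspec,
        # in which case it is tracked as a RubyGem dependency.
        return 'rubygem' if builds_gemspec else 'ignore'
    return 'omnibus'      # a real fetched component (git, tarball, ...)
```

Everything fetched from a real upstream source ends up in the `x-*-dependencies-omnibus` fields, while `gem build`-producing components go through the RubyGems path instead.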
diff --git a/omnibus.to_lorry b/omnibus.to_lorry
new file mode 100755
index 0000000..256f924
--- /dev/null
+++ b/omnibus.to_lorry
@@ -0,0 +1,94 @@
+#!/usr/bin/env ruby
+#
+# Create a Baserock .lorry file for a given Omnibus software component
+#
+# Copyright (C) 2014 Codethink Limited
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; version 2 of the License.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+require 'bundler'
+require 'omnibus'
+
+require 'optparse'
+require 'rubygems/commands/install_command'
+require 'shellwords'
+
+require_relative 'importer_base'
+
+BANNER = "Usage: omnibus.to_lorry PROJECT_DIR PROJECT_NAME SOFTWARE_NAME"
+
+DESCRIPTION = <<-END
+Generate a .lorry file for a given Omnibus software component.
+END
+
+class OmnibusLorryGenerator < Importer::Base
+ def parse_options(arguments)
+ opts = create_option_parser(BANNER, DESCRIPTION)
+
+ parsed_arguments = opts.parse!(arguments)
+
+ if parsed_arguments.length != 3
+ STDERR.puts "Expected 3 arguments, got #{parsed_arguments}."
+ opts.parse(['-?'])
+ exit 255
+ end
+
+ project_dir, project_name, software_name = parsed_arguments
+ [project_dir, project_name, software_name]
+ end
+
+ def generate_lorry_for_software(software)
+ lorry_body = {
+ 'x-products-omnibus' => [software.name]
+ }
+
+ if software.source and software.source.member? :git
+ lorry_body.update({
+ 'type' => 'git',
+ 'url' => software.source[:git],
+ })
+ elsif software.source and software.source.member? :url
+ lorry_body.update({
+ 'type' => 'tarball',
+ 'url' => software.source[:url],
+ # lorry doesn't validate the checksum right now, but maybe it should.
+ 'x-md5' => software.source[:md5],
+ })
+ else
+ error "Couldn't generate lorry file from source '#{software.source.inspect}'"
+ exit 1
+ end
+
+ { software.name => lorry_body }
+ end
+
+ def run
+ project_dir, project_name, software_name = parse_options(ARGV)
+
+ log.info("Creating lorry for #{software_name} from project " +
+ "#{project_name}, defined in #{project_dir}")
+
+ Dir.chdir(project_dir)
+
+ project = Omnibus::Project.load(project_name)
+
+ software = Omnibus::Software.load(project, software_name)
+
+ lorry = generate_lorry_for_software(software)
+
+ write_lorry(STDOUT, lorry)
+ end
+end
+
+OmnibusLorryGenerator.new.run
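
The .lorry entries emitted by `generate_lorry_for_software` above take one of two shapes depending on the source type. A minimal sketch of the same branching (field names taken from the code above; the input dict is a stand-in for `software.source`, and the function name is illustrative):

```python
def lorry_body_for_source(name, source):
    """Build a .lorry entry for an Omnibus software source.

    'source' is a dict with either a 'git' key or a 'url' (tarball)
    key, mirroring the Ruby generator above.
    """
    body = {'x-products-omnibus': [name]}
    if 'git' in source:
        body.update({'type': 'git', 'url': source['git']})
    elif 'url' in source:
        # Lorry doesn't validate the checksum right now, but record it anyway.
        body.update({'type': 'tarball',
                     'url': source['url'],
                     'x-md5': source.get('md5')})
    else:
        raise ValueError(
            "Couldn't generate lorry body from source %r" % (source,))
    return {name: body}
```

Git sources are preferred when both are available, since Lorry can mirror them directly without converting a tarball into a repository.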
diff --git a/omnibus.yaml b/omnibus.yaml
new file mode 100644
index 0000000..2116f2a
--- /dev/null
+++ b/omnibus.yaml
@@ -0,0 +1,7 @@
+---
+
+dependency-blacklist:
+ # This is provided as a single downloadable .pem file, which isn't something
+ # Lorry can understand. Also, it's provided by the 'ca-certificates' chunk in
+ # Baserock already.
+ - cacerts
diff --git a/rubygems.to_chunk b/rubygems.to_chunk
new file mode 100755
index 0000000..796fe89
--- /dev/null
+++ b/rubygems.to_chunk
@@ -0,0 +1,275 @@
+#!/usr/bin/env ruby
+#
+# Create a chunk morphology to integrate a RubyGem in Baserock
+#
+# Copyright (C) 2014 Codethink Limited
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; version 2 of the License.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+require 'bundler'
+
+require_relative 'importer_base'
+
+class << Bundler
+ def default_gemfile
+ # This is a hack to make things not crash when there's no Gemfile
+ Pathname.new('.')
+ end
+end
+
+def spec_is_from_current_source_tree(spec, source_dir)
+ spec.source.instance_of? Bundler::Source::Path and
+ File.identical?(spec.source.path, source_dir)
+end
+
+BANNER = "Usage: rubygems.to_chunk SOURCE_DIR GEM_NAME [VERSION]"
+
+DESCRIPTION = <<-END
+This tool reads the Gemfile and optionally the Gemfile.lock from a Ruby project
+source tree in SOURCE_DIR. It outputs a chunk morphology for GEM_NAME on
+stdout. If VERSION is supplied, it is used to check that the build instructions
+will produce the expected version of the Gem.
+
+It is intended for use with the `baserock-import` tool.
+END
+
+class RubyGemChunkMorphologyGenerator < Importer::Base
+ def initialize
+ local_data = YAML.load_file(local_data_path("rubygems.yaml"))
+ @build_dependency_whitelist = local_data['build-dependency-whitelist']
+ end
+
+ def parse_options(arguments)
+ opts = create_option_parser(BANNER, DESCRIPTION)
+
+ parsed_arguments = opts.parse!(arguments)
+
+ if parsed_arguments.length != 2 && parsed_arguments.length != 3
+ STDERR.puts "Expected 2 or 3 arguments, got #{parsed_arguments}."
+ opts.parse(['-?'])
+ exit 255
+ end
+
+ source_dir, gem_name, expected_version = parsed_arguments
+ source_dir = File.absolute_path(source_dir)
+ if expected_version != nil
+ expected_version = Gem::Version.new(expected_version.dup)
+ end
+ [source_dir, gem_name, expected_version]
+ end
+
+ def load_local_gemspecs()
+ # Look for .gemspec files in the source repo.
+ #
+ # If there is no .gemspec, but you set 'name' and 'version' then
+ # inside Bundler::Source::Path.load_spec_files this call will create a
+ # fake gemspec matching that name and version. That's probably not useful.
+
+ dir = '.'
+
+ source = Bundler::Source::Path.new({
+ 'path' => dir,
+ })
+
+ log.info "Loaded #{source.specs.count} specs from source dir."
+ source.specs.each do |spec|
+ log.debug " * #{spec.inspect} #{spec.dependencies.inspect}"
+ end
+
+ source
+ end
+
+ def get_spec_for_gem(specs, gem_name)
+ found = specs[gem_name].select {|s| Gem::Platform.match(s.platform)}
+ if found.empty?
+ raise Exception,
+ "No Gemspecs found matching '#{gem_name}'"
+ elsif found.length != 1
+ raise Exception,
+ "Unsure which Gem to use for #{gem_name}, got #{found}"
+ end
+ found[0]
+ end
+
+ def chunk_name_for_gemspec(spec)
+ # Chunk names are the Gem's "full name" (name + version number), so
+ # that we don't break in the rare but possible case that two different
+ # versions of the same Gem are required for something to work. It'd be
+ # nicer to only use the full_name if we detect such a conflict.
+ spec.full_name
+ end
+
+ def is_signed_gem(spec)
+ spec.signing_key != nil
+ end
+
+ def generate_chunk_morph_for_gem(spec)
+ description = 'Automatically generated by rubygems.to_chunk'
+
+ bin_dir = "\"$DESTDIR/$PREFIX/bin\""
+ gem_dir = "\"$DESTDIR/$(gem environment home)\""
+
+ # There's more splitting to be done, but putting the docs in the
+ # correct artifact is the single biggest win for enabling smaller
+ # system images.
+ #
+ # Adding this to Morph's default ruleset is painful, because:
+ # - Changing the default split rules triggers a rebuild of everything.
+ # - The whole split rule code needs reworking to prevent overlaps and to
+ # make it possible to extend rules without creating overlaps. It's
+ # otherwise impossible to reason about.
+
+ split_rules = [
+ {
+ 'artifact' => "#{spec.full_name}-doc",
+ 'include' => [
+ 'usr/lib/ruby/gems/\d[\w.]*/doc/.*'
+ ]
+ }
+ ]
+
+ # It'd be rather tricky to include these build instructions as a
+ # BuildSystem implementation in Morph. The problem is that there's no
+ # way for the default commands to know what .gemspec file they should
+ # be building. It doesn't help that the .gemspec may be in a subdirectory
+ # (as in Rails, for example).
+ #
+ # Note that `gem help build` says the following:
+ #
+ # The best way to build a gem is to use a Rakefile and the
+ # Gem::PackageTask which ships with RubyGems.
+ #
+ # It's often possible to run `rake gem`, but this may require Hoe,
+ # rake-compiler, Jeweler or other assistance tools to be present at Gem
+ # construction time. It seems that many Ruby projects that use these tools
+ # also maintain an up-to-date generated .gemspec file, which means that we
+ # can get away with using `gem build` just fine in many cases.
+ #
+ # Were we to use `setup.rb install` or `rake install`, programs that loaded
+ # with the 'rubygems' library would complain that required Gems were not
+ # installed. We must have the Gem metadata available, and `gem build; gem
+ # install` seems the easiest way to achieve that.
+
+ configure_commands = []
+
+ if is_signed_gem(spec)
+ # This is a best-guess hack for allowing unsigned builds of Gems that are
+ # normally built signed. There's no value in building signed Gems when we
+ # control the build and deployment environment, and we obviously can't
+ # provide the private key of the Gem's maintainer.
+ configure_commands <<
+ "sed -e '/cert_chain\\s*=/d' -e '/signing_key\\s*=/d' -i " +
+ "#{spec.name}.gemspec"
+ end
+
+ build_commands = [
+ "gem build #{spec.name}.gemspec",
+ ]
+
+ install_commands = [
+ "mkdir -p #{gem_dir}",
+ "gem install --install-dir #{gem_dir} --bindir #{bin_dir} " +
+ "--ignore-dependencies --local ./#{spec.full_name}.gem"
+ ]
+
+ {
+ 'name' => chunk_name_for_gemspec(spec),
+ 'kind' => 'chunk',
+ 'description' => description,
+ 'build-system' => 'manual',
+ 'products' => split_rules,
+ 'configure-commands' => configure_commands,
+ 'build-commands' => build_commands,
+ 'install-commands' => install_commands,
+ }
+ end
+
+ def build_deps_for_gem(spec)
+    spec.dependencies.select do |d|
+ d.type == :development && @build_dependency_whitelist.member?(d.name)
+ end
+ end
+
+ def runtime_deps_for_gem(spec)
+ spec.dependencies.select {|d| d.type == :runtime}
+ end
+
+ def run
+ source_dir_name, gem_name, expected_version = parse_options(ARGV)
+
+ log.info("Creating chunk morph for #{gem_name} based on " +
+ "source code in #{source_dir_name}")
+
+ Dir.chdir(source_dir_name)
+
+ # Instead of reading the real Gemfile, invent one that simply includes the
+ # chosen .gemspec. If present, the Gemfile.lock will be honoured.
+ fake_gemfile = Bundler::Dsl.new
+ fake_gemfile.source('https://rubygems.org')
+ begin
+ fake_gemfile.gemspec({:name => gem_name})
+ rescue Bundler::InvalidOption
+ error "Did not find #{gem_name}.gemspec in #{source_dir_name}"
+ exit 1
+ end
+
+ definition = fake_gemfile.to_definition('Gemfile.lock', true)
+ resolved_specs = definition.resolve_remotely!
+
+ spec = get_spec_for_gem(resolved_specs, gem_name)
+
+ if not spec_is_from_current_source_tree(spec, source_dir_name)
+ error "Specified gem '#{spec.name}' doesn't live in the source in " +
+ "'#{source_dir_name}'"
+ log.debug "SPEC: #{spec.inspect} #{spec.source}"
+ exit 1
+ end
+
+ if expected_version != nil && spec.version != expected_version
+ # This check is brought to you by Coderay, which changes its version
+ # number based on an environment variable. Other Gems may do this too.
+ error "Source in #{source_dir_name} produces #{spec.full_name}, but " +
+ "the expected version was #{expected_version}."
+ exit 1
+ end
+
+ morph = generate_chunk_morph_for_gem(spec)
+
+  # One might think that you could use the Bundler::Dependency.groups
+  # field to filter, but it doesn't seem to be useful. Instead we go back
+  # to the Gem::Specification of the target Gem and use the dependencies
+  # field there. We look up each dependency in the resolved specs to find
+  # out what version of it Bundler has chosen.
+
+ def format_deps_for_morphology(specset, dep_list)
+ info = dep_list.collect do |dep|
+ spec = specset[dep][0]
+ [spec.name, spec.version.to_s]
+ end
+ Hash[info]
+ end
+
+ build_deps = format_deps_for_morphology(
+ resolved_specs, build_deps_for_gem(spec))
+ runtime_deps = format_deps_for_morphology(
+ resolved_specs, runtime_deps_for_gem(spec))
+
+ morph['x-build-dependencies-rubygems'] = build_deps
+ morph['x-runtime-dependencies-rubygems'] = runtime_deps
+
+ write_morph(STDOUT, morph)
+ end
+end
+
+RubyGemChunkMorphologyGenerator.new.run
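
The chunk morphology produced by `generate_chunk_morph_for_gem` above follows a fixed skeleton: an optional `sed` fix-up for signed Gems, then `gem build` plus `gem install`. A minimal sketch of that skeleton in plain Python (keys and commands taken from the code above; the function name is illustrative):

```python
def chunk_morph_for_gem(full_name, name, signed=False):
    """Skeleton of the chunk morphology built by rubygems.to_chunk.

    Chunk names use the Gem's full name (name + version) so that two
    versions of the same Gem can coexist in one stratum.
    """
    gem_dir = '"$DESTDIR/$(gem environment home)"'
    bin_dir = '"$DESTDIR/$PREFIX/bin"'

    configure = []
    if signed:
        # Strip signing directives so the Gem builds without the
        # maintainer's private key.
        configure.append(
            "sed -e '/cert_chain\\s*=/d' -e '/signing_key\\s*=/d' -i "
            '%s.gemspec' % name)

    return {
        'name': full_name,
        'kind': 'chunk',
        'build-system': 'manual',
        'configure-commands': configure,
        'build-commands': ['gem build %s.gemspec' % name],
        'install-commands': [
            'mkdir -p %s' % gem_dir,
            'gem install --install-dir %s --bindir %s '
            '--ignore-dependencies --local ./%s.gem'
            % (gem_dir, bin_dir, full_name),
        ],
    }
```

Installing with `--ignore-dependencies` works because the import tool records each dependency as its own chunk, so everything a Gem needs is built and installed separately.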
diff --git a/rubygems.to_lorry b/rubygems.to_lorry
new file mode 100755
index 0000000..7a00820
--- /dev/null
+++ b/rubygems.to_lorry
@@ -0,0 +1,164 @@
+#!/usr/bin/python
+#
+# Create a Baserock .lorry file for a given RubyGem
+#
+# Copyright (C) 2014 Codethink Limited
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; version 2 of the License.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License along
+# with this program; if not, write to the Free Software Foundation, Inc.,
+# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+
+import requests
+import requests_cache
+import yaml
+
+import logging
+import json
+import os
+import sys
+import urlparse
+
+from importer_base import ImportException, ImportExtension
+
+
+class GenerateLorryException(ImportException):
+ pass
+
+
+class RubyGemsWebServiceClient(object):
+ def __init__(self):
+ # Save hammering the rubygems.org API: 'requests' API calls are
+ # transparently cached in an SQLite database, instead.
+ requests_cache.install_cache('rubygems_api_cache')
+
+ def _request(self, url):
+ r = requests.get(url)
+ if r.ok:
+ return json.loads(r.text)
+ else:
+ raise GenerateLorryException(
+ 'Request to %s failed: %s' % (r.url, r.reason))
+
+ def get_gem_info(self, gem_name):
+ info = self._request(
+ 'http://rubygems.org/api/v1/gems/%s.json' % gem_name)
+
+ if info['name'] != gem_name:
+ # Sanity check
+            raise GenerateLorryException(
+                'Received info for Gem "%s", requested "%s"' % (
+                    info['name'], gem_name))
+
+ return info
+
+
+class RubyGemLorryGenerator(ImportExtension):
+ def __init__(self):
+ super(RubyGemLorryGenerator, self).__init__()
+
+ with open('rubygems.yaml', 'r') as f:
+            local_data = yaml.safe_load(f)
+
+ self.lorry_prefix = local_data['lorry-prefix']
+ self.known_source_uris = local_data['known-source-uris']
+
+ logging.debug(
+ "Loaded %i known source URIs from local metadata.", len(self.known_source_uris))
+
+ def process_args(self, args):
+ if len(args) != 1:
+ raise ImportException(
+ 'Please call me with the name of a RubyGem as an argument.\n')
+
+ gem_name = args[0]
+
+ lorry = self.generate_lorry_for_gem(gem_name)
+ self.write_lorry(sys.stdout, lorry)
+
+    def find_upstream_repo_for_gem(self, gem_name, gem_info):
+        source_code_uri = gem_info['source_code_uri']
+
+        if gem_name in self.known_source_uris:
+            logging.debug('Found %s in known-source-uris', gem_name)
+            known_uri = self.known_source_uris[gem_name]
+            if source_code_uri is not None and known_uri != source_code_uri:
+                sys.stderr.write(
+                    '%s: Hardcoded source URI %s doesn\'t match spec URI %s\n'
+                    % (gem_name, known_uri, source_code_uri))
+            return known_uri
+
+        if source_code_uri is not None and len(source_code_uri) > 0:
+            logging.debug('Got source_code_uri %s', source_code_uri)
+            if source_code_uri.endswith('/tree'):
+                source_code_uri = source_code_uri[:-len('/tree')]
+
+            return source_code_uri
+
+        homepage_uri = gem_info['homepage_uri']
+        if homepage_uri is not None and len(homepage_uri) > 0:
+            logging.debug('Got homepage_uri %s', homepage_uri)
+            netloc = urlparse.urlsplit(homepage_uri)[1]
+            if netloc == 'github.com':
+                return homepage_uri
+
+        # Further possible leads on locating source code.
+        # http://ruby-toolbox.com/projects/$gemname -> sometimes contains an
+        # upstream link, even if the gem info does not.
+        # https://github.com/search?q=$gemname -> often the first result is
+        # the correct one, but you can never know.
+
+        raise GenerateLorryException(
+            "Gem metadata for '%s' does not point to its source code "
+            "repository." % gem_name)
+
+    def project_name_from_repo(self, repo_url):
+        if repo_url.endswith('/tree/master'):
+            repo_url = repo_url[:-len('/tree/master')]
+        if repo_url.endswith('/'):
+            repo_url = repo_url[:-1]
+        if repo_url.endswith('.git'):
+            repo_url = repo_url[:-len('.git')]
+        return os.path.basename(repo_url)
+
+    def generate_lorry_for_gem(self, gem_name):
+        rubygems_client = RubyGemsWebServiceClient()
+
+        gem_info = rubygems_client.get_gem_info(gem_name)
+
+        gem_source_url = self.find_upstream_repo_for_gem(gem_name, gem_info)
+        logging.info('Got URL <%s> for %s', gem_source_url, gem_name)
+
+        project_name = self.project_name_from_repo(gem_source_url)
+        lorry_name = self.lorry_prefix + project_name
+
+        # One repo may produce multiple Gems. It's up to the caller to merge
+        # multiple .lorry files that get generated for the same repo.
+
+        lorry = {
+            lorry_name: {
+                'type': 'git',
+                'url': gem_source_url,
+                'x-products-rubygems': [gem_name]
+            }
+        }
+
+        return lorry
+
+    def write_lorry(self, stream, lorry):
+        json.dump(lorry, stream, indent=4)
+        # Needed so the morphlib.extensions code will pick up the last line.
+        stream.write('\n')
+
+
+if __name__ == '__main__':
+    RubyGemLorryGenerator().run()
diff --git a/rubygems.yaml b/rubygems.yaml
new file mode 100644
index 0000000..e1e6fcc
--- /dev/null
+++ b/rubygems.yaml
@@ -0,0 +1,49 @@
+---
+
+lorry-prefix: ruby-gems/
+
+# The :development dependency set is way too broad for our needs: for most Gems,
+# it includes test tools and development aids that aren't necessary for just
+# building the Gem. It's hard to even get a stratum if we include all these
+# tools because of the number of circular dependencies. Instead, only those
+# tools which are known to be required at Gem build time are listed as
+# build-dependencies, and any other :development dependencies are ignored.
+build-dependency-whitelist:
+  - hoe
+  # rake is bundled with Ruby, so it is not included in the whitelist.
+
+# The following Gems don't provide a source_code_uri in their Gem metadata.
+# Ideally they would; until then, their source repositories are listed here.
+known-source-uris:
+  appbundler: https://github.com/opscode/appbundler
+  ast: https://github.com/openSUSE/ast
+  brass: https://github.com/rubyworks/brass
+  coveralls: https://github.com/lemurheavy/coveralls-ruby
+  dep-selector-libgecode: https://github.com/opscode/dep-selector-libgecode
+  diff-lcs: https://github.com/halostatue/diff-lcs
+  erubis: https://github.com/kwatch/erubis
+  fog-brightbox: https://github.com/brightbox/fog-brightbox
+  highline: https://github.com/JEG2/highline
+  hoe: https://github.com/seattlerb/hoe
+  indexer: https://github.com/rubyworks/indexer
+  json: https://github.com/flori/json
+  method_source: https://github.com/banister/method_source
+  mixlib-authentication: https://github.com/opscode/mixlib-authentication
+  mixlib-cli: https://github.com/opscode/mixlib-cli
+  mixlib-log: https://github.com/opscode/mixlib-log
+  mixlib-shellout: http://github.com/opscode/mixlib-shellout
+  ohai: http://github.com/opscode/ohai
+  rack-cache: https://github.com/rtomayko/rack-cache
+  actionmailer: https://github.com/rails/rails
+  actionpack: https://github.com/rails/rails
+  actionview: https://github.com/rails/rails
+  activejob: https://github.com/rails/rails
+  activemodel: https://github.com/rails/rails
+  activerecord: https://github.com/rails/rails
+  activesupport: https://github.com/rails/rails
+  rails: https://github.com/rails/rails
+  railties: https://github.com/rails/rails
+  pg: https://github.com/ged/ruby-pg
+  sigar: https://github.com/hyperic/sigar
+  sprockets: https://github.com/sstephenson/sprockets
+  tins: https://github.com/flori/tins
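For reference, the lorry dict that generate_lorry_for_gem() assembles serializes to JSON as sketched below. This is a standalone illustration, not part of the imported files: the gem name 'chef' and its GitHub URL are hypothetical placeholders, and the 'ruby-gems/' prefix comes from the lorry-prefix setting in rubygems.yaml.

```python
import json
import sys

# Sketch of the .lorry output write_lorry() would produce for a
# hypothetical gem 'chef' whose upstream repository is on GitHub.
# The key is lorry_prefix + project_name; x-products-rubygems records
# which Gem(s) the repository produces.
lorry = {
    'ruby-gems/chef': {
        'type': 'git',
        'url': 'https://github.com/opscode/chef',
        'x-products-rubygems': ['chef'],
    }
}

# write_lorry() dumps the dict with indent=4 and appends a newline so
# that morphlib.extensions reads the final line.
json.dump(lorry, sys.stdout, indent=4)
sys.stdout.write('\n')
```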