From c11bcfcd39bd9c9e30184ea29d21ef52624d056a Mon Sep 17 00:00:00 2001 From: Sam Thursfield Date: Tue, 14 Oct 2014 16:41:16 +0100 Subject: Initial import of Baserock import tool for importing foreign packaging --- README | 100 ++++++ README.omnibus | 17 + README.rubygems | 52 +++ importer_base.py | 72 +++++ importer_base.rb | 81 +++++ main.py | 920 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ omnibus.to_chunk | 274 ++++++++++++++++ omnibus.to_lorry | 94 ++++++ omnibus.yaml | 7 + rubygems.to_chunk | 275 ++++++++++++++++ rubygems.to_lorry | 164 ++++++++++ rubygems.yaml | 49 +++ 12 files changed, 2105 insertions(+) create mode 100644 README create mode 100644 README.omnibus create mode 100644 README.rubygems create mode 100644 importer_base.py create mode 100644 importer_base.rb create mode 100644 main.py create mode 100755 omnibus.to_chunk create mode 100755 omnibus.to_lorry create mode 100644 omnibus.yaml create mode 100755 rubygems.to_chunk create mode 100755 rubygems.to_lorry create mode 100644 rubygems.yaml diff --git a/README b/README new file mode 100644 index 0000000..3ac7997 --- /dev/null +++ b/README @@ -0,0 +1,100 @@ +How to use the Baserock Import Tool +=================================== + +The tool helps you generate Baserock build instructions by importing metadata +from a foreign packaging system. + +The process it follows is this: + +1. Pick a package from the processing queue. +2. Find its source code, and generate a suitable .lorry file. +3. Make it available as a local Git repo. +4. Check out the commit corresponding to the requested version of the package. +5. Analyse the source tree and generate a suitable chunk .morph to build the + requested package. +6. Analyse the source tree and generate a list of dependencies for the package. +7. Enqueue any new dependencies, and repeat. + +Once the queue is empty: + +8. Generate a stratum .morph for the package(s) the user requested. + +The tool is not magic. 
It can be taught the conventions for each packaging +system, but these will not work in all cases. When an import fails it will +continue to the next package, so that the first run does as many imports as +possible. + +For imports that could not be done automatically, you will need to write an +appropriate .lorry or .morph file manually and rerun the tool. It will resume +processing where it left off. + +It's possible to teach the code about more conventions, but it is only +worthwhile to do that for common patterns. + + +Package-system specific code and data +------------------------------------- + +For each supported package system, there should be an xxx.to_lorry program, and +a xxx.to_chunk program. These should output on stdout a .lorry file and a .morph +file, respectively. + +Each packaging system can have static data saved in a .yaml file, for known +metadata that the programs cannot discover automatically. + +The following field should be honoured by most packaging systems: +`known-source-uris`. It maps package name to source URI. + + +Help with .lorry generation +--------------------------- + +The simplest fix is to add the source to the `known-source-uris` dict in the +static metadata. + +If you write a .lorry file by hand, be sure to fill in the `x-products-YYY` +field. 'x' means this field is an extension to the .lorry format. YYY is the +name of the packaging system, e.g. 'rubygems'. It should contain a list of +which packages this repository contains the source code for. + + +Help with linking package version to Git tag +-------------------------------------------- + +Some projects do not tag releases. + +Currently, you must create a tag in the local checkout for the tool to continue. +In future, the Lorry tool should be extended to handle creation of missing +tags, so that they are propagated to the project Trove. The .lorry file would +need to contain a dict mapping product version number to commit SHA1.
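The tag lookup described above can be sketched in Python. This is a minimal sketch modelled on the three naming conventions that `_checkout_source_version` in main.py tries (`VERSION`, `vVERSION`, `NAME-VERSION`). The function names and the use of a plain set of ref names in place of a real Git checkout are assumptions for illustration only.

```python
def candidate_tag_names(name, version):
    """Return the tag names tried, in order, for a package version.

    Mirrors the three conventions checked in main.py; this helper
    itself is hypothetical.
    """
    return [version, 'v%s' % version, '%s-%s' % (name, version)]


def find_ref_for_version(name, version, available_refs,
                         use_master_if_no_tag=False):
    """Pick the ref to check out for `version` of package `name`.

    `available_refs` is an assumed stand-in for the refs of the local
    Git checkout (the real tool queries Git instead).
    Returns a (version_in_use, ref) pair.
    """
    for tag_name in candidate_tag_names(name, version):
        if tag_name in available_refs:
            return version, tag_name
    if use_master_if_no_tag:
        # Equivalent of --use-master-if-no-tag: fall back to 'master'.
        return 'master', 'master'
    raise LookupError(
        'Could not find ref for %s version %s.' % (name, version))


refs = {'master', 'v1.6.0', 'json-1.8.1'}
print(find_ref_for_version('mixlib-log', '1.6.0', refs))  # ('1.6.0', 'v1.6.0')
print(find_ref_for_version('json', '1.8.1', refs))        # ('1.8.1', 'json-1.8.1')
print(find_ref_for_version('rack', '1.5.2', refs,
                           use_master_if_no_tag=True))    # ('master', 'master')
```

Creating a tag in the local checkout that matches any one of these candidates is enough for the lookup to succeed.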
+ +If you are in a hurry, you can use the `--use-master-if-no-tag` option. Instead +of raising an error, the tool will check out the `master` ref of the component +repo. + + +Help with chunk .morph generation +--------------------------------- + +If you create a chunk morph by hand, you must add some extra fields: + + - `x-build-dependencies-YYY` + - `x-runtime-dependencies-YYY` + +Each is a dict mapping dependency name to dependency version. For example: + + x-build-dependencies-rubygems: {} + x-runtime-dependencies-rubygems: + hashie: 2.1.2 + json: 1.8.1 + mixlib-log: 1.6.0 + rack: 1.5.2 + +All dependencies will be included in the resulting stratum. Those which are build +dependencies of other components will be added to the relevant 'build-depends' +field. + +These fields are non-standard extensions to the morphology format. + +For more package-system specific information, see the relevant README file, e.g. +README.rubygems for RubyGem imports. diff --git a/README.omnibus b/README.omnibus new file mode 100644 index 0000000..840bbab --- /dev/null +++ b/README.omnibus @@ -0,0 +1,17 @@ +Omnibus import +============== + +See 'README' for general information on the Baserock Import Tool. + +To use +------ + +First, clone the Git repository corresponding to the Omnibus project you want +to import. For example, if you want to import the Chef Server, clone: + + +As per Omnibus' instructions, you should then run `bundle install --binstubs` +in the checkout to make available the various dependent repos and Gems of the +project definitions. + + diff --git a/README.rubygems b/README.rubygems new file mode 100644 index 0000000..1afb62d --- /dev/null +++ b/README.rubygems @@ -0,0 +1,52 @@ +Here is some information I have learned while importing RubyGem packages into +Baserock. + +First, beware that RubyGem .gemspec files are actually normal Ruby programs, +and are executed when loaded. A Bundler Gemfile is also a Ruby program, and +could run arbitrary code when loaded.
+ +The Standard Case +----------------- + +Most Ruby projects provide one or more .gemspec files, which describe the +runtime and development dependencies of the Gem. + +Using the .gemspec file and the `gem build` command, it is possible to create +the .gem file. It can then be installed with `gem install`. + +Note that use of `gem build` is discouraged by its own help file in favour +of using Rake, but there is much less standardisation among Rakefiles and they +may introduce requirements on Hoe, rake-compiler, Jeweler or other tools. + +The 'development' dependencies include everything useful to test, document, +and create a Gem of the project. All we want to do is create a Gem, which I'll +refer to as 'building'. + + +Gem with no .gemspec +-------------------- + +Some Gems choose not to include a .gemspec, like [Nokogiri]. In the case of +Nokogiri, and others, [Hoe] is used, which adds Rake tasks that create the Gem. +The `gem build` command cannot be used in these cases. + +You may be able to use the `rake gem` command instead of `gem build`. + +[Nokogiri]: https://github.com/sparklemotion/nokogiri/blob/master/Y_U_NO_GEMSPEC.md +[Hoe]: http://www.zenspider.com/projects/hoe.html + + +Signed Gems +----------- + +It's possible for a Gem maintainer to sign their Gems. See: + + - + - + +When building a Gem in Baserock, signing is unnecessary because it's not going +to be shared except as part of the build system. The .gemspec may include a +`signing_key` field, which will be a local path on the maintainer's system to +their private key. Removing this field causes an unsigned Gem to be built. + +Known Gems that do this: 'net-ssh' and family. diff --git a/importer_base.py b/importer_base.py new file mode 100644 index 0000000..5def0dc --- /dev/null +++ b/importer_base.py @@ -0,0 +1,72 @@ +# Base class for import tools written in Python.
+# +# Copyright (C) 2014 Codethink Limited +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; version 2 of the License. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License along +# with this program; if not, write to the Free Software Foundation, Inc., +# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + + +import logging +import os +import sys + + +class ImportException(Exception): + pass + + +class ImportExtension(object): + '''A base class for import extensions. + + A subclass should subclass this class, and add a ``process_args`` method. + + Note that it is not necessary to subclass this class for import extensions. + This class is here just to collect common code. + + ''' + + def __init__(self): + self.setup_logging() + + def setup_logging(self): + '''Direct all logging output to MORPH_LOG_FD, if set. + + This file descriptor is read by Morph and written into its own log + file. + + This overrides cliapp's usual configurable logging setup. 
+ + ''' + log_write_fd = int(os.environ.get('MORPH_LOG_FD', 0)) + + if log_write_fd == 0: + return + + formatter = logging.Formatter('%(message)s') + + handler = logging.StreamHandler(os.fdopen(log_write_fd, 'w')) + handler.setFormatter(formatter) + + logger = logging.getLogger() + logger.addHandler(handler) + logger.setLevel(logging.DEBUG) + + def process_args(self, args): + raise NotImplementedError() + + def run(self): + try: + self.process_args(sys.argv[1:]) + except ImportException as e: + sys.stderr.write('ERROR: %s\n' % e.message) + sys.exit(1) diff --git a/importer_base.rb b/importer_base.rb new file mode 100644 index 0000000..4e7a7b5 --- /dev/null +++ b/importer_base.rb @@ -0,0 +1,81 @@ +#!/usr/bin/env ruby +# +# Base class for importers written in Ruby +# +# Copyright (C) 2014 Codethink Limited +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; version 2 of the License. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License along +# with this program; if not, write to the Free Software Foundation, Inc., +# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
+ +require 'json' +require 'logger' +require 'optparse' +require 'yaml' + +module Importer + class Base + private + + def create_option_parser(banner, description) + opts = OptionParser.new + + opts.banner = banner + + opts.on('-?', '--help', 'print this help') do + puts opts + print "\n", description + exit 255 + end + end + + def log + @logger ||= create_logger + end + + def error(message) + log.error(message) + STDERR.puts(message) + end + + def local_data_path(file) + # Return the path to 'file' relative to the currently running program. + # Used as a simple mechanism of finding local data files. + script_dir = File.dirname(__FILE__) + File.join(script_dir, file) + end + + def write_lorry(file, lorry) + format_options = { :indent => ' ' } + file.puts(JSON.pretty_generate(lorry, format_options)) + end + + def write_morph(file, morph) + file.write(YAML.dump(morph)) + end + + def create_logger + # Use the logger that was passed in from the 'main' import process, if + # detected. + log_fd = ENV['MORPH_LOG_FD'] + if log_fd + log_stream = IO.new(Integer(log_fd), 'w') + logger = Logger.new(log_stream) + logger.level = Logger::DEBUG + logger.formatter = proc { |severity, datetime, progname, msg| "#{msg}\n" } + else + logger = Logger.new('/dev/null') + end + logger + end + end +end diff --git a/main.py b/main.py new file mode 100644 index 0000000..b5ebece --- /dev/null +++ b/main.py @@ -0,0 +1,920 @@ +#!/usr/bin/python +# Import foreign packaging systems into Baserock +# +# Copyright (C) 2014 Codethink Limited +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; version 2 of the License. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. 
+# +# You should have received a copy of the GNU General Public License along +# with this program; if not, write to the Free Software Foundation, Inc., +# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + + +import ansicolor +import cliapp +import morphlib +import networkx +import six + +import contextlib +import copy +import json +import logging +import os +import pipes +import sys +import tempfile +import time + +from logging import debug + + +class LorrySet(object): + '''Manages a set of .lorry files. + + The structure of .lorry files makes the code a little more confusing than + I would like. A lorry "entry" is a dict of one entry mapping name to info. + A lorry "file" is a dict of one or more of these entries merged together. + If it were a list of entries with 'name' fields, the code would be neater. + + ''' + def __init__(self, lorries_path): + self.path = lorries_path + + if os.path.exists(lorries_path): + self.data = self.parse_all_lorries() + else: + os.makedirs(lorries_path) + self.data = {} + + def all_lorry_files(self): + for dirpath, dirnames, filenames in os.walk(self.path): + for filename in filenames: + if filename.endswith('.lorry'): + yield os.path.join(dirpath, filename) + + def parse_all_lorries(self): + lorry_set = {} + for lorry_file in self.all_lorry_files(): + lorry = self.parse_lorry(lorry_file) + + lorry_items = lorry.items() + + for key, value in lorry_items: + if key in lorry_set: + raise Exception( + '%s: duplicates existing lorry %s' % (lorry_file, key)) + + lorry_set.update(lorry_items) + + return lorry_set + + def parse_lorry(self, lorry_file): + try: + with open(lorry_file, 'r') as f: + lorry = json.load(f) + return lorry + except ValueError as e: + raise cliapp.AppException( + "Error parsing %s: %s" % (lorry_file, e)) + + def get_lorry(self, name): + return {name: self.data[name]} + + def find_lorry_for_package(self, kind, package_name): + key = 'x-products-%s' % kind + for name, lorry in self.data.iteritems(): + 
products = lorry.get(key, []) + for entry in products: + if entry == package_name: + return {name: lorry} + + return None + + def _check_for_conflicts_in_standard_fields(self, existing, new): + '''Ensure that two lorries for the same project do actually match.''' + for field, value in existing.iteritems(): + if field.startswith('x-'): + continue + if field == 'url': + # FIXME: need a much better way of detecting whether the URLs + # are equivalent ... right now HTTP vs. HTTPS will cause an + # error, for example! + matches = (value.rstrip('/') == new[field].rstrip('/')) + else: + matches = (value == new[field]) + if not matches: + raise Exception( + 'Lorry %s conflicts with existing entry %s at field %s' % + (new, existing, field)) + + def _merge_products_fields(self, existing, new): + '''Merge the x-products- fields from new lorry into an existing one.''' + is_product_field = lambda x: x.startswith('x-products-') + + existing_fields = [f for f in existing.iterkeys() if + is_product_field(f)] + new_fields = [f for f in new.iterkeys() if f not in existing_fields and + is_product_field(f)] + + for field in existing_fields: + existing[field].extend(new[field]) + existing[field] = list(set(existing[field])) + + for field in new_fields: + existing[field] = new[field] + + def add(self, filename, lorry_entry): + logging.debug('Adding %s to lorryset', filename) + + filename = os.path.join(self.path, '%s.lorry' % filename) + + assert len(lorry_entry) == 1 + + project_name = lorry_entry.keys()[0] + info = lorry_entry.values()[0] + + if len(project_name) == 0: + raise cliapp.AppException( + 'Invalid lorry %s: %s' % (filename, lorry_entry)) + + if not isinstance(info.get('url'), six.string_types): + raise cliapp.AppException( + 'Invalid URL in lorry %s: %s' % (filename, info.get('url'))) + + if project_name in self.data: + stored_lorry = self.get_lorry(project_name) + + self._check_for_conflicts_in_standard_fields( + stored_lorry[project_name], lorry_entry[project_name]) + 
+ self._merge_products_fields( + stored_lorry[project_name], lorry_entry[project_name]) + lorry_entry = stored_lorry + else: + self.data[project_name] = info + + self._add_lorry_entry_to_lorry_file(filename, lorry_entry) + + def _add_lorry_entry_to_lorry_file(self, filename, entry): + if os.path.exists(filename): + with open(filename) as f: + contents = json.load(f) + else: + contents = {} + + contents.update(entry) + + with morphlib.savefile.SaveFile(filename, 'w') as f: + json.dump(contents, f, indent=4, separators=(',', ': '), + sort_keys=True) + + +class MorphologySet(morphlib.morphset.MorphologySet): + def __init__(self, path): + super(MorphologySet, self).__init__() + + self.path = path + self.loader = morphlib.morphloader.MorphologyLoader() + + if os.path.exists(path): + self.load_all_morphologies() + else: + os.makedirs(path) + + def load_all_morphologies(self): + logging.info('Loading all .morph files under %s', self.path) + + class FakeGitDir(morphlib.gitdir.GitDirectory): + '''GitDirectory stand-in for a directory that is not a Git repo. + + This is here because the default constructor will search up the + directory hierarchy until it finds a '.git' directory, but that + may be totally the wrong place for our purpose: we don't have a + Git directory at all.
+ + ''' + def __init__(self, path): + self.dirname = path + self._config = {} + + gitdir = FakeGitDir(self.path) + finder = morphlib.morphologyfinder.MorphologyFinder(gitdir) + for filename in (f for f in finder.list_morphologies() + if not gitdir.is_symlink(f)): + text = finder.read_morphology(filename) + morph = self.loader.load_from_string(text, filename=filename) + morph.repo_url = None # self.root_repository_url + morph.ref = None # self.system_branch_name + self.add_morphology(morph) + + def get_morphology(self, repo_url, ref, filename): + return self._get_morphology(repo_url, ref, filename) + + def save_morphology(self, filename, morphology): + self.add_morphology(morphology) + morphology_to_save = copy.copy(morphology) + self.loader.unset_defaults(morphology_to_save) + filename = os.path.join(self.path, filename) + self.loader.save_to_file(filename, morphology_to_save) + + +class GitDirectory(morphlib.gitdir.GitDirectory): + def __init__(self, dirname): + super(GitDirectory, self).__init__(dirname) + + # Work around strange/unintentional behaviour in GitDirectory class + # when 'repopath' isn't a Git repo. If 'repopath' is contained + # within a Git repo then the GitDirectory will traverse up to the + # parent repo, which isn't what we want in this case. + if self.dirname != dirname: + logging.error( + 'Got git directory %s for %s!', self.dirname, dirname) + raise cliapp.AppException( + '%s is not the root of a Git repository' % dirname) + + def has_ref(self, ref): + try: + self._rev_parse(ref) + return True + except morphlib.gitdir.InvalidRefError: + return False + + +class BaserockImportException(cliapp.AppException): + pass + + +class Package(object): + '''A package in the processing queue. + + In order to provide helpful errors, this item keeps track of what + packages depend on it, and hence of why it was added to the queue. 
+ + ''' + def __init__(self, kind, name, version): + self.kind = kind + self.name = name + self.version = version + self.required_by = [] + self.morphology = None + self.is_build_dep = False + self.version_in_use = version + + def __cmp__(self, other): + return cmp(self.name, other.name) + + def __repr__(self): + return '<Package %s-%s>' % (self.name, self.version) + + def __str__(self): + if len(self.required_by) > 0: + required_msg = ', '.join(self.required_by) + required_msg = ', required by: ' + required_msg + else: + required_msg = '' + return '%s-%s%s' % (self.name, self.version, required_msg) + + def add_required_by(self, item): + self.required_by.append('%s-%s' % (item.name, item.version)) + + def match(self, name, version): + return (self.name==name and self.version==version) + + def set_morphology(self, morphology): + self.morphology = morphology + + def set_is_build_dep(self, is_build_dep): + self.is_build_dep = is_build_dep + + def set_version_in_use(self, version_in_use): + self.version_in_use = version_in_use + + +def find(iterable, match): + return next((x for x in iterable if match(x)), None) + + +def run_extension(filename, args, cwd='.'): + output = [] + errors = [] + + ext_logger = logging.getLogger(filename) + + def report_extension_stdout(line): + output.append(line) + + def report_extension_stderr(line): + errors.append(line) + + def report_extension_logger(line): + ext_logger.debug(line) + + ext = morphlib.extensions.ExtensionSubprocess( + report_stdout=report_extension_stdout, + report_stderr=report_extension_stderr, + report_logger=report_extension_logger, + ) + + # There are better ways of doing this, but it works for now.
+ main_path = os.path.dirname(os.path.realpath(__file__)) + extension_path = os.path.join(main_path, filename) + + logging.debug("Running %s %s with cwd %s" % (extension_path, args, cwd)) + returncode = ext.run(extension_path, args, cwd, os.environ) + + if returncode == 0: + ext_logger.info('succeeded') + else: + for line in errors: + ext_logger.error(line) + message = '%s failed with code %s: %s' % ( + filename, returncode, '\n'.join(errors)) + raise BaserockImportException(message) + + return '\n'.join(output) + + +class ImportLoop(object): + '''Import a package and all of its dependencies into Baserock. + + This class holds the state for the processing loop. + + ''' + + def __init__(self, app, goal_kind, goal_name, goal_version, extra_args=[]): + self.app = app + self.goal_kind = goal_kind + self.goal_name = goal_name + self.goal_version = goal_version + self.extra_args = extra_args + + self.lorry_set = LorrySet(self.app.settings['lorries-dir']) + self.morph_set = MorphologySet(self.app.settings['definitions-dir']) + + self.morphloader = morphlib.morphloader.MorphologyLoader() + + self.importers = {} + + def enable_importer(self, kind, extra_args=[]): + assert kind not in self.importers + self.importers[kind] = { + 'extra_args': extra_args + } + + def run(self): + '''Process the goal package and all of its dependencies.''' + start_time = time.time() + start_displaytime = time.strftime('%x %X %Z', time.localtime()) + + self.app.status( + '%s: Import of %s %s started', start_displaytime, self.goal_kind, + self.goal_name) + + if not self.app.settings['update-existing']: + self.app.status( + 'Not updating existing Git checkouts or existing definitions') + + chunk_dir = os.path.join(self.morph_set.path, 'strata', self.goal_name) + if not os.path.exists(chunk_dir): + os.makedirs(chunk_dir) + + goal = Package(self.goal_kind, self.goal_name, self.goal_version) + to_process = [goal] + processed = networkx.DiGraph() + + errors = {} + + while len(to_process) > 0: + 
current_item = to_process.pop() + + try: + self._process_package(current_item) + error = False + except BaserockImportException as e: + self.app.status(str(e), error=True) + errors[current_item] = e + error = True + + processed.add_node(current_item) + + if not error: + self._process_dependencies_from_morphology( + current_item, current_item.morphology, to_process, + processed) + + if len(errors) > 0: + self.app.status( + '\nErrors encountered, not generating a stratum morphology.') + self.app.status( + 'See the README files for guidance.') + else: + self._generate_stratum_morph_if_none_exists( + processed, self.goal_name) + + duration = time.time() - start_time + end_displaytime = time.strftime('%x %X %Z', time.localtime()) + + self.app.status( + '%s: Import of %s %s ended (took %i seconds)', end_displaytime, + self.goal_kind, self.goal_name, duration) + + def _process_package(self, package): + kind = package.kind + name = package.name + version = package.version + + lorry = self._find_or_create_lorry_file(kind, name) + source_repo, url = self._fetch_or_update_source(lorry) + + checked_out_version, ref = self._checkout_source_version( + source_repo, name, version) + package.set_version_in_use(checked_out_version) + + chunk_morph = self._find_or_create_chunk_morph( + kind, name, checked_out_version, source_repo, url, ref) + + if self.app.settings['use-local-sources']: + chunk_morph.repo_url = 'file://' + source_repo.dirname + else: + reponame = lorry.keys()[0] + chunk_morph.repo_url = 'upstream:%s' % reponame + + package.set_morphology(chunk_morph) + + def _process_dependencies_from_morphology(self, current_item, morphology, + to_process, processed): + '''Enqueue all dependencies of a package that are yet to be processed. + + Dependencies are communicated using extra fields in morphologies, + currently. 
+ + ''' + for key, value in morphology.iteritems(): + if key.startswith('x-build-dependencies-'): + kind = key[len('x-build-dependencies-'):] + is_build_deps = True + elif key.startswith('x-runtime-dependencies-'): + kind = key[len('x-runtime-dependencies-'):] + is_build_deps = False + else: + continue + + # We need to validate this field because it doesn't go through the + # normal MorphologyFactory validation, being an extension. + if not hasattr(value, 'iteritems'): + value_type = type(value).__name__ + raise cliapp.AppException( + "Morphology for %s has invalid '%s': should be a dict, but " + "got a %s." % (morphology['name'], key, value_type)) + + self._process_dependency_list( + current_item, kind, value, to_process, processed, is_build_deps) + + def _process_dependency_list(self, current_item, kind, deps, to_process, + processed, these_are_build_deps): + # All deps are added as nodes to the 'processed' graph. Runtime + # dependencies only need to appear in the stratum, but build + # dependencies have ordering constraints, so we add edges in + # the graph for build-dependencies too. + + for dep_name, dep_version in deps.iteritems(): + dep_package = find( + processed, lambda i: i.match(dep_name, dep_version)) + + if dep_package is None: + # Not yet processed + queue_item = find( + to_process, lambda i: i.match(dep_name, dep_version)) + if queue_item is None: + queue_item = Package(kind, dep_name, dep_version) + to_process.append(queue_item) + dep_package = queue_item + + dep_package.add_required_by(current_item) + + if these_are_build_deps or current_item.is_build_dep: + # A runtime dep of a build dep becomes a build dep + # itself. + dep_package.set_is_build_dep(True) + processed.add_edge(dep_package, current_item) + + def _find_or_create_lorry_file(self, kind, name): + # Note that the lorry file may already exist for 'name', but lorry + # files are named for project name rather than package name. 
In this + # case we will generate the lorry, and try to add it to the set, at + # which point LorrySet will notice the existing one and merge the two. + lorry = self.lorry_set.find_lorry_for_package(kind, name) + + if lorry is None: + lorry = self._generate_lorry_for_package(kind, name) + + if len(lorry) != 1: + raise Exception( + 'Expected generated lorry file with one entry.') + + lorry_filename = lorry.keys()[0] + + if '/' in lorry_filename: + # We try to be a bit clever and guess that if there's a prefix + # in the name, e.g. 'ruby-gems/chef' then it should go in a + # mega-lorry file, such as ruby-gems.lorry. + parts = lorry_filename.split('/', 1) + lorry_filename = parts[0] + + if lorry_filename == '': + raise cliapp.AppException( + 'Invalid lorry data for %s: %s' % (name, lorry)) + + self.lorry_set.add(lorry_filename, lorry) + else: + lorry_filename = lorry.keys()[0] + logging.info( + 'Found existing lorry file for %s: %s', name, lorry_filename) + + return lorry + + def _generate_lorry_for_package(self, kind, name): + tool = '%s.to_lorry' % kind + if kind not in self.importers: + raise Exception('Importer for %s was not enabled.' 
% kind) + extra_args = self.importers[kind]['extra_args'] + self.app.status('Calling %s to generate lorry for %s', tool, name) + lorry_text = run_extension(tool, extra_args + [name]) + try: + lorry = json.loads(lorry_text) + except ValueError as e: + raise cliapp.AppException( + 'Invalid output from %s: %s' % (tool, lorry_text)) + return lorry + + def _run_lorry(self, lorry): + f = tempfile.NamedTemporaryFile(delete=False) + try: + logging.debug(json.dumps(lorry)) + json.dump(lorry, f) + f.close() + cliapp.runcmd([ + 'lorry', '--working-area', + self.app.settings['lorry-working-dir'], '--pull-only', + '--bundle', 'never', '--tarball', 'never', f.name]) + finally: + os.unlink(f.name) + + def _fetch_or_update_source(self, lorry): + assert len(lorry) == 1 + lorry_name, lorry_entry = lorry.items()[0] + + url = lorry_entry['url'] + reponame = '_'.join(lorry_name.split('/')) + repopath = os.path.join( + self.app.settings['lorry-working-dir'], reponame, 'git') + + checkoutpath = os.path.join( + self.app.settings['checkouts-dir'], reponame) + + try: + already_lorried = os.path.exists(repopath) + if already_lorried: + if self.app.settings['update-existing']: + self.app.status('Updating lorry of %s', url) + self._run_lorry(lorry) + else: + self.app.status('Lorrying %s', url) + self._run_lorry(lorry) + + if os.path.exists(checkoutpath): + repo = GitDirectory(checkoutpath) + repo.update_remotes() + else: + if already_lorried: + logging.warning( + 'Expected %s to exist, but will recreate it', + checkoutpath) + cliapp.runcmd(['git', 'clone', repopath, checkoutpath]) + repo = GitDirectory(checkoutpath) + except cliapp.AppException as e: + raise BaserockImportException(e.msg.rstrip()) + + return repo, url + + def _checkout_source_version(self, source_repo, name, version): + # FIXME: we need to be a bit smarter than this. Right now we assume + # that 'version' is a valid Git ref. 
+ + possible_names = [ + version, + 'v%s' % version, + '%s-%s' % (name, version) + ] + + for tag_name in possible_names: + if source_repo.has_ref(tag_name): + source_repo.checkout(tag_name) + ref = tag_name + break + else: + if self.app.settings['use-master-if-no-tag']: + logging.warning( + "Couldn't find tag %s in repo %s. Using 'master'.", + tag_name, source_repo) + source_repo.checkout('master') + ref = version = 'master' + else: + raise BaserockImportException( + 'Could not find ref for %s version %s.' % (name, version)) + + return version, ref + + def _find_or_create_chunk_morph(self, kind, name, version, source_repo, + repo_url, named_ref): + morphology_filename = 'strata/%s/%s-%s.morph' % ( + self.goal_name, name, version) + sha1 = source_repo.resolve_ref_to_commit(named_ref) + + def generate_morphology(): + morphology = self._generate_chunk_morph_for_package( + source_repo, kind, name, version, morphology_filename) + self.morph_set.save_morphology(morphology_filename, morphology) + return morphology + + if self.app.settings['update-existing']: + morphology = generate_morphology() + else: + morphology = self.morph_set.get_morphology( + repo_url, sha1, morphology_filename) + + if morphology is None: + # Existing chunk morphologies loaded from disk don't contain + # the repo and ref information. That's stored in the stratum + # morph. So the first time we touch a chunk morph we need to + # set this info. 
+ logging.debug("Didn't find morphology for %s|%s|%s", repo_url, + sha1, morphology_filename) + morphology = self.morph_set.get_morphology( + None, None, morphology_filename) + + if morphology is None: + logging.debug("Didn't find morphology for None|None|%s", + morphology_filename) + morphology = generate_morphology() + + morphology.repo_url = repo_url + morphology.ref = sha1 + morphology.named_ref = named_ref + + return morphology + + def _generate_chunk_morph_for_package(self, source_repo, kind, name, + version, filename): + tool = '%s.to_chunk' % kind + + if kind not in self.importers: + raise Exception('Importer for %s was not enabled.' % kind) + extra_args = self.importers[kind]['extra_args'] + + self.app.status( + 'Calling %s to generate chunk morph for %s %s', tool, name, + version) + + args = extra_args + [source_repo.dirname, name] + if version != 'master': + args.append(version) + text = run_extension(tool, args) + + return self.morphloader.load_from_string(text, filename) + + def _sort_chunks_by_build_order(self, graph): + order = reversed(sorted(graph.nodes())) + try: + return networkx.topological_sort(graph, nbunch=order) + except networkx.NetworkXUnfeasible as e: + # Cycle detected! 
+ loop_subgraphs = networkx.strongly_connected_component_subgraphs( + graph, copy=False) + all_loops_str = [] + for graph in loop_subgraphs: + if graph.number_of_nodes() > 1: + loops_str = '->'.join(str(node) for node in graph.nodes()) + all_loops_str.append(loops_str) + raise cliapp.AppException( + 'One or more cycles detected in build graph: %s' % + (', '.join(all_loops_str))) + + def _generate_stratum_morph_if_none_exists(self, graph, goal_name): + filename = os.path.join( + self.app.settings['definitions-dir'], 'strata', '%s.morph' % + goal_name) + + if os.path.exists(filename) and not self.app.settings['update-existing']: + self.app.status( + msg='Found stratum morph for %s at %s, not overwriting' % + (goal_name, filename)) + return + + self.app.status(msg='Generating stratum morph for %s' % goal_name) + + chunk_entries = [] + + for package in self._sort_chunks_by_build_order(graph): + m = package.morphology + if m is None: + raise cliapp.AppException('No morphology for %s' % package) + + def format_build_dep(name, version): + dep_package = find(graph, lambda p: p.match(name, version)) + return '%s-%s' % (name, dep_package.version_in_use) + + build_depends = [ + format_build_dep(name, version) for name, version in + m['x-build-dependencies-rubygems'].iteritems() + ] + + entry = { + 'name': m['name'], + 'repo': m.repo_url, + 'ref': m.ref, + 'unpetrify-ref': m.named_ref, + 'morph': m.filename, + 'build-depends': build_depends, + } + chunk_entries.append(entry) + + stratum_name = goal_name + stratum = { + 'name': stratum_name, + 'kind': 'stratum', + 'description': 'Autogenerated by Baserock import tool', + 'build-depends': [ + {'morph': 'strata/ruby.morph'} + ], + 'chunks': chunk_entries, + } + + morphology = self.morphloader.load_from_string( + json.dumps(stratum), filename=filename) + self.morphloader.unset_defaults(morphology) + self.morphloader.save_to_file(filename, morphology) + + +class BaserockImportApplication(cliapp.Application): + def 
add_settings(self): + self.settings.string(['lorries-dir'], + "location for Lorry files", + metavar="PATH", + default=os.path.abspath('./lorries')) + self.settings.string(['definitions-dir'], + "location for morphology files", + metavar="PATH", + default=os.path.abspath('./definitions')) + self.settings.string(['checkouts-dir'], + "location for Git checkouts", + metavar="PATH", + default=os.path.abspath('./checkouts')) + self.settings.string(['lorry-working-dir'], + "Lorry working directory", + metavar="PATH", + default=os.path.abspath('./lorry-working-dir')) + + self.settings.boolean(['update-existing'], + "update all the checked-out Git trees and " + "generated definitions", + default=False) + self.settings.boolean(['use-local-sources'], + "use file:/// URLs in the stratum 'repo' " + "fields, instead of upstream: URLs", + default=False) + self.settings.boolean(['use-master-if-no-tag'], + "if the correct tag for a version can't be " + "found, use 'master' instead of raising an " + "error", + default=False) + + def _stream_has_colours(self, stream): + # http://blog.mathieu-leplatre.info/colored-output-in-console-with-python.html + if not hasattr(stream, "isatty"): + return False + if not stream.isatty(): + return False # auto color only on TTYs + try: + import curses + curses.setupterm() + return curses.tigetnum("colors") > 2 + except: + # guess false in case of error + return False + + def setup(self): + self.add_subcommand('omnibus', self.import_omnibus, + arg_synopsis='REPO PROJECT_NAME SOFTWARE_NAME') + self.add_subcommand('rubygems', self.import_rubygems, + arg_synopsis='GEM_NAME') + + self.stdout_has_colours = self._stream_has_colours(sys.stdout) + + def setup_logging_formatter_for_file(self): + root_logger = logging.getLogger() + root_logger.name = 'main' + + # You need recent cliapp for this to work, with commit "Split logging + # setup into further overrideable methods". 
+ return logging.Formatter("%(name)s: %(levelname)s: %(message)s") + + def process_args(self, args): + if len(args) == 0: + # Cliapp default is to just say "ERROR: must give subcommand" if + # no args are passed, I prefer this. + args = ['help'] + + super(BaserockImportApplication, self).process_args(args) + + def status(self, msg, *args, **kwargs): + text = msg % args + if kwargs.get('error') == True: + logging.error(text) + if self.stdout_has_colours: + sys.stdout.write(ansicolor.red(text)) + else: + sys.stdout.write(text) + else: + logging.info(text) + sys.stdout.write(text) + sys.stdout.write('\n') + + def import_omnibus(self, args): + '''Import a software component from an Omnibus project. + + Omnibus is a tool for generating application bundles for various + platforms. See for more + information. + + ''' + if len(args) != 3: + raise cliapp.AppException( + 'Please give the location of the Omnibus definitions repo, ' + 'and the name of the project and the top-level software ' + 'component.') + + def running_inside_bundler(): + return 'BUNDLE_GEMFILE' in os.environ + + def command_to_run_python_in_directory(directory, args): + # Bundler requires that we run it from the Omnibus project + # directory. That messes up any relative paths the user may have + # passed on the commandline, so we do a bit of a hack to change + # back to the original directory inside the `bundle exec` process. 
+ subshell_command = "(cd %s; exec python %s)" % \ + (pipes.quote(directory), ' '.join(map(pipes.quote, args))) + shell_command = "sh -c %s" % pipes.quote(subshell_command) + return shell_command + + def reexecute_self_with_bundler(path): + script = sys.argv[0] + + logging.info('Reexecuting %s within Bundler, so that extensions ' + 'use the correct dependencies for Omnibus and the ' + 'Omnibus project definitions.', script) + command = command_to_run_python_in_directory(os.getcwd(), sys.argv) + + logging.debug('Running: `bundle exec %s` in dir %s', command, path) + os.chdir(path) + os.execvp('bundle', [script, 'exec', command]) + + # Omnibus definitions are spread across multiple repos, and there is + # no stability guarantee for the definition format. The official advice + # is to use Bundler to execute Omnibus, so let's do that. + if not running_inside_bundler(): + reexecute_self_with_bundler(args[0]) + + definitions_dir = args[0] + project_name = args[1] + + loop = ImportLoop( + app=self, + goal_kind='omnibus', goal_name=args[2], goal_version='master') + loop.enable_importer('omnibus', + extra_args=[definitions_dir, project_name]) + loop.enable_importer('rubygems') + loop.run() + + def import_rubygems(self, args): + '''Import one or more RubyGems.''' + if len(args) != 1: + raise cliapp.AppException( + 'Please pass the name of a RubyGem on the commandline.') + + loop = ImportLoop( + app=self, + goal_kind='rubygems', goal_name=args[0], goal_version='master') + loop.enable_importer('rubygems') + loop.run() + + +app = BaserockImportApplication(progname='import') +app.run() diff --git a/omnibus.to_chunk b/omnibus.to_chunk new file mode 100755 index 0000000..1189199 --- /dev/null +++ b/omnibus.to_chunk @@ -0,0 +1,274 @@ +#!/usr/bin/env ruby +# +# Create a chunk morphology to integrate Omnibus software in Baserock +# +# Copyright (C) 2014 Codethink Limited +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU 
General Public License as published by +# the Free Software Foundation; version 2 of the License. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License along +# with this program; if not, write to the Free Software Foundation, Inc., +# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + +require 'bundler' +require 'omnibus' + +require 'optparse' +require 'rubygems/commands/build_command' +require 'rubygems/commands/install_command' +require 'shellwords' + +require_relative 'importer_base' + +BANNER = "Usage: omnibus.to_chunk PROJECT_DIR PROJECT_NAME SOURCE_DIR SOFTWARE_NAME" + +DESCRIPTION = <<-END +Generate a .morph file for a given Omnibus software component. +END + +class Omnibus::Builder + # It's possible to use `gem install` in build commands, which is a great + # way of subverting the dependency tracking Omnibus provides. It's done + # in `omnibus-chef/config/software/chefdk.rb`, for example. + # + # To handle this, here we extend the class that executes the build commands + # to detect when `gem install` is run. It uses the Gem library to turn the + # commandline back into a Bundler::Dependency object that we can use. + # + # We also trap `gem build` so we know when a software component is a RubyGem + # that should be handled by 'rubygems.to_chunk'. + + class GemBuildCommandParser < Gem::Commands::BuildCommand + def gemspec_path(args) + handle_options args + if options[:args].length != 1 + raise Exception, "Invalid `gem build` commandline: 1 argument " + + "expected, got #{options[:args]}." 
+ end + options[:args][0] + end + end + + class GemInstallCommandParser < Gem::Commands::InstallCommand + def dependency_list_from_commandline(args) + handle_options args + + # `gem install foo*` is sometimes used when installing a locally built + # Gem, to avoid needing to know the exact version number that was built. + # We only care about remote Gems being installed, so anything with a '*' + # in its name can be ignored. + gem_names = options[:args].delete_if { |name| name.include?('*') } + + gem_names.collect do |gem_name| + Bundler::Dependency.new(gem_name, options[:version]) + end + end + end + + def gem(command, options = {}) + # This function re-implements the 'gem' function in the build-commands DSL. + if command.start_with? 'build' + parser = GemBuildCommandParser.new + args = Shellwords.split(command).drop(1) + if built_gemspec != nil + raise Exception, "More than one `gem build` command was run as part " + + "of the build process. The 'rubygems.to_chunk' " + + "program currently supports only one .gemspec " + + "build per chunk, so this can't be processed " + + "automatically." + end + @built_gemspec = parser.gemspec_path(args) + elsif command.start_with? 
'install' + parser = GemInstallCommandParser.new + args = Shellwords.split(command).drop(1) + args_without_build_flags = args.take_while { |item| item != '--' } + gems = parser.dependency_list_from_commandline(args_without_build_flags) + manually_installed_rubygems.concat gems + end + end + + def built_gemspec + @built_gemspec + end + + def manually_installed_rubygems + @manually_installed_rubygems ||= [] + end +end + +class OmnibusChunkMorphologyGenerator < Importer::Base + def initialize + local_data = YAML.load_file(local_data_path("omnibus.yaml")) + @dependency_blacklist = local_data['dependency-blacklist'] + end + + def parse_options(arguments) + opts = create_option_parser(BANNER, DESCRIPTION) + + parsed_arguments = opts.parse!(arguments) + + if parsed_arguments.length != 4 and parsed_arguments.length != 5 + STDERR.puts "Expected 4 or 5 arguments, got #{parsed_arguments}." + opts.parse(['-?']) + exit 255 + end + + project_dir, project_name, source_dir, software_name, expected_version = \ + parsed_arguments + # Not yet implemented + #if expected_version != nil + # expected_version = Gem::Version.new(expected_version) + #end + [project_dir, project_name, source_dir, software_name, expected_version] + end + + class SubprocessError < RuntimeError + end + + def run_tool_capture_output(tool_name, *args) + tool_path = local_data_path(tool_name) + + # FIXME: something breaks when we try to share this FD, it's not + # ideal that the subprocess doesn't log anything, though. + env_changes = {'MORPH_LOG_FD' => nil} + + command = [[tool_path, tool_name], *args] + log.info("Running #{command.join(' ')} in #{scripts_dir}") + + text = IO.popen( + env_changes, command, :chdir => scripts_dir, :err => [:child, :out] + ) do |io| + io.read + end + + if $? 
== 0 + text + else + raise SubprocessError, text + end + end + + def generate_chunk_morph_for_rubygems_software(software, source_dir) + # This is a better heuristic for getting the name of the Gem + # than the software name, it seems ... + gem_name = software.relative_path + + text = run_tool_capture_output('rubygems.to_chunk', source_dir, gem_name) + log.debug("Text from output: #{text}, result #{$?}") + + morphology = YAML::load(text) + return morphology + rescue SubprocessError => e + error "Tried to import #{software.name} as a RubyGem, got the " \ + "following error from rubygems.to_chunk: #{e.message}" + exit 1 + end + + def resolve_rubygems_deps(requirements) + return {} if requirements.empty? + + log.info('Resolving RubyGem requirements with Bundler') + + fake_gemfile = Bundler::Dsl.new + fake_gemfile.source('https://rubygems.org') + + requirements.each do |dep| + fake_gemfile.gem(dep.name, dep.requirement) + end + + definition = fake_gemfile.to_definition('Gemfile.lock', true) + resolved_specs = definition.resolve_remotely! + + Hash[resolved_specs.collect { |spec| [spec.name, spec.version.to_s]}] + end + + def generate_chunk_morph_for_software(project, software, source_dir) + if software.builder.built_gemspec != nil + morphology = generate_chunk_morph_for_rubygems_software(software, + source_dir) + else + morphology = { + "name" => software.name, + "kind" => "chunk", + "description" => "Automatically generated by omnibus.to_chunk" + } + end + + omnibus_deps = {} + rubygems_deps = {} + + software.dependencies.each do |name; software| + software = Omnibus::Software.load(project, name) + if @dependency_blacklist.member?
name + log.info( + "Not adding #{name} as a dependency as it is marked to be ignored.") + elsif software.fetcher.instance_of?(Omnibus::PathFetcher) + log.info( + "Not adding #{name} as a dependency: it's installed from " + + "a path which probably means that it is package configuration, not " + + "a 3rd-party component to be imported.") + elsif software.fetcher.instance_of?(Omnibus::NullFetcher) + if software.builder.built_gemspec + log.info( + "Adding #{name} as a RubyGem dependency because it builds " + + "#{software.builder.built_gemspec}") + rubygems_deps[name] = software.version + else + log.info( + "Not adding #{name} as a dependency: no sources listed.") + end + else + omnibus_deps[name] = software.version + end + end + + gem_requirements = software.builder.manually_installed_rubygems + rubygems_deps = resolve_rubygems_deps(gem_requirements) + + morphology.update({ + # Possibly this tool should look at software.build and + # generate suitable configure, build and install-commands. + # For now: don't bother! + + # FIXME: are these build or runtime dependencies? We'll assume both. 
+ "x-build-dependencies-omnibus" => omnibus_deps, + "x-runtime-dependencies-omnibus" => omnibus_deps, + + "x-build-dependencies-rubygems" => {}, + "x-runtime-dependencies-rubygems" => rubygems_deps, + }) + + if software.description + morphology['description'] = software.description + "\n\n" + + morphology['description'] + end + + morphology + end + + def run + project_dir, project_name, source_dir, software_name = parse_options(ARGV) + + log.info("Creating chunk morph for #{software_name} from project " + + "#{project_name}, defined in #{project_dir}") + + Dir.chdir(project_dir) + + project = Omnibus::Project.load(project_name) + + software = Omnibus::Software.load(project, software_name) + + morph = generate_chunk_morph_for_software(project, software, source_dir) + + write_morph(STDOUT, morph) + end +end + +OmnibusChunkMorphologyGenerator.new.run diff --git a/omnibus.to_lorry b/omnibus.to_lorry new file mode 100755 index 0000000..256f924 --- /dev/null +++ b/omnibus.to_lorry @@ -0,0 +1,94 @@ +#!/usr/bin/env ruby +# +# Create a Baserock .lorry file for a given Omnibus software component +# +# Copyright (C) 2014 Codethink Limited +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; version 2 of the License. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License along +# with this program; if not, write to the Free Software Foundation, Inc., +# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ +require 'bundler' +require 'omnibus' + +require 'optparse' +require 'rubygems/commands/install_command' +require 'shellwords' + +require_relative 'importer_base' + +BANNER = "Usage: omnibus.to_lorry PROJECT_DIR PROJECT_NAME SOFTWARE_NAME" + +DESCRIPTION = <<-END +Generate a .lorry file for a given Omnibus software component. +END + +class OmnibusLorryGenerator < Importer::Base + def parse_options(arguments) + opts = create_option_parser(BANNER, DESCRIPTION) + + parsed_arguments = opts.parse!(arguments) + + if parsed_arguments.length != 3 + STDERR.puts "Expected 3 arguments, got #{parsed_arguments}." + opts.parse(['-?']) + exit 255 + end + + project_dir, project_name, software_name = parsed_arguments + [project_dir, project_name, software_name] + end + + def generate_lorry_for_software(software) + lorry_body = { + 'x-products-omnibus' => [software.name] + } + + if software.source and software.source.member? :git + lorry_body.update({ + 'type' => 'git', + 'url' => software.source[:git], + }) + elsif software.source and software.source.member? :url + lorry_body.update({ + 'type' => 'tarball', + 'url' => software.source[:url], + # lorry doesn't validate the checksum right now, but maybe it should. 
+ 'x-md5' => software.source[:md5], + }) + else + error "Couldn't generate lorry file from source '#{software.source.inspect}'" + exit 1 + end + + { software.name => lorry_body } + end + + def run + project_dir, project_name, software_name = parse_options(ARGV) + + log.info("Creating lorry for #{software_name} from project " + + "#{project_name}, defined in #{project_dir}") + + Dir.chdir(project_dir) + + project = Omnibus::Project.load(project_name) + + software = Omnibus::Software.load(project, software_name) + + lorry = generate_lorry_for_software(software) + + write_lorry(STDOUT, lorry) + end +end + +OmnibusLorryGenerator.new.run diff --git a/omnibus.yaml b/omnibus.yaml new file mode 100644 index 0000000..2116f2a --- /dev/null +++ b/omnibus.yaml @@ -0,0 +1,7 @@ +--- + +dependency-blacklist: + # This is provided as a single downloadable .pem file, which isn't something + # Lorry can understand. Also, it's provided by the 'ca-certificates' chunk in + # Baserock already. + - cacerts diff --git a/rubygems.to_chunk b/rubygems.to_chunk new file mode 100755 index 0000000..796fe89 --- /dev/null +++ b/rubygems.to_chunk @@ -0,0 +1,275 @@ +#!/usr/bin/env ruby +# +# Create a chunk morphology to integrate a RubyGem in Baserock +# +# Copyright (C) 2014 Codethink Limited +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; version 2 of the License. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License along +# with this program; if not, write to the Free Software Foundation, Inc., +# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
+ +require 'bundler' + +require_relative 'importer_base' + +class << Bundler + def default_gemfile + # This is a hack to make things not crash when there's no Gemfile + Pathname.new('.') + end +end + +def spec_is_from_current_source_tree(spec, source_dir) + spec.source.instance_of? Bundler::Source::Path and + File.identical?(spec.source.path, source_dir) +end + +BANNER = "Usage: rubygems.to_chunk SOURCE_DIR GEM_NAME [VERSION]" + +DESCRIPTION = <<-END +This tool reads the Gemfile and optionally the Gemfile.lock from a Ruby project +source tree in SOURCE_DIR. It outputs a chunk morphology for GEM_NAME on +stdout. If VERSION is supplied, it is used to check that the build instructions +will produce the expected version of the Gem. + +It is intended for use with the `baserock-import` tool. +END + +class RubyGemChunkMorphologyGenerator < Importer::Base + def initialize + local_data = YAML.load_file(local_data_path("rubygems.yaml")) + @build_dependency_whitelist = local_data['build-dependency-whitelist'] + end + + def parse_options(arguments) + opts = create_option_parser(BANNER, DESCRIPTION) + + parsed_arguments = opts.parse!(arguments) + + if parsed_arguments.length != 2 && parsed_arguments.length != 3 + STDERR.puts "Expected 2 or 3 arguments, got #{parsed_arguments}." + opts.parse(['-?']) + exit 255 + end + + source_dir, gem_name, expected_version = parsed_arguments + source_dir = File.absolute_path(source_dir) + if expected_version != nil + expected_version = Gem::Version.new(expected_version.dup) + end + [source_dir, gem_name, expected_version] + end + + def load_local_gemspecs() + # Look for .gemspec files in the source repo. + # + # If there is no .gemspec, but you set 'name' and 'version' then + # inside Bundler::Source::Path.load_spec_files this call will create a + # fake gemspec matching that name and version. That's probably not useful. + + dir = '.' 
+ + source = Bundler::Source::Path.new({ + 'path' => dir, + }) + + log.info "Loaded #{source.specs.count} specs from source dir." + source.specs.each do |spec| + log.debug " * #{spec.inspect} #{spec.dependencies.inspect}" + end + + source + end + + def get_spec_for_gem(specs, gem_name) + found = specs[gem_name].select {|s| Gem::Platform.match(s.platform)} + if found.empty? + raise Exception, + "No Gemspecs found matching '#{gem_name}'" + elsif found.length != 1 + raise Exception, + "Unsure which Gem to use for #{gem_name}, got #{found}" + end + found[0] + end + + def chunk_name_for_gemspec(spec) + # Chunk names are the Gem's "full name" (name + version number), so + # that we don't break in the rare but possible case that two different + # versions of the same Gem are required for something to work. It'd be + # nicer to only use the full_name if we detect such a conflict. + spec.full_name + end + + def is_signed_gem(spec) + spec.signing_key != nil + end + + def generate_chunk_morph_for_gem(spec) + description = 'Automatically generated by rubygems.to_chunk' + + bin_dir = "\"$DESTDIR/$PREFIX/bin\"" + gem_dir = "\"$DESTDIR/$(gem environment home)\"" + + # There's more splitting to be done, but putting the docs in the + # correct artifact is the single biggest win for enabling smaller + # system images. + # + # Adding this to Morph's default ruleset is painful, because: + # - Changing the default split rules triggers a rebuild of everything. + # - The whole split rule code needs reworking to prevent overlaps and to + # make it possible to extend rules without creating overlaps. It's + # otherwise impossible to reason about. + + split_rules = [ + { + 'artifact' => "#{spec.full_name}-doc", + 'include' => [ + 'usr/lib/ruby/gems/\d[\w.]*/doc/.*' + ] + } + ] + + # It'd be rather tricky to include these build instructions as a + # BuildSystem implementation in Morph. 
The problem is that there's no + # way for the default commands to know what .gemspec file they should + # be building. It doesn't help that the .gemspec may be in a subdirectory + # (as in Rails, for example). + # + # Note that `gem help build` says the following: + # + # The best way to build a gem is to use a Rakefile and the + # Gem::PackageTask which ships with RubyGems. + # + # It's often possible to run `rake gem`, but this may require Hoe, + # rake-compiler, Jeweler or other assistance tools to be present at Gem + # construction time. It seems that many Ruby projects that use these tools + # also maintain an up-to-date generated .gemspec file, which means that we + # can get away with using `gem build` just fine in many cases. + # + # Were we to use `setup.rb install` or `rake install`, programs that loaded + # with the 'rubygems' library would complain that required Gems were not + # installed. We must have the Gem metadata available, and `gem build; gem + # install` seems the easiest way to achieve that. + + configure_commands = [] + + if is_signed_gem(spec) + # This is a best-guess hack for allowing unsigned builds of Gems that are + # normally built signed. There's no value in building signed Gems when we + # control the build and deployment environment, and we obviously can't + # provide the private key of the Gem's maintainer. 
+ configure_commands << + "sed -e '/cert_chain\\s*=/d' -e '/signing_key\\s*=/d' -i " + + "#{spec.name}.gemspec" + end + + build_commands = [ + "gem build #{spec.name}.gemspec", + ] + + install_commands = [ + "mkdir -p #{gem_dir}", + "gem install --install-dir #{gem_dir} --bindir #{bin_dir} " + + "--ignore-dependencies --local ./#{spec.full_name}.gem" + ] + + { + 'name' => chunk_name_for_gemspec(spec), + 'kind' => 'chunk', + 'description' => description, + 'build-system' => 'manual', + 'products' => split_rules, + 'configure-commands' => configure_commands, + 'build-commands' => build_commands, + 'install-commands' => install_commands, + } + end + + def build_deps_for_gem(spec) + deps = spec.dependencies.select do |d| + d.type == :development && @build_dependency_whitelist.member?(d.name) + end + end + + def runtime_deps_for_gem(spec) + spec.dependencies.select {|d| d.type == :runtime} + end + + def run + source_dir_name, gem_name, expected_version = parse_options(ARGV) + + log.info("Creating chunk morph for #{gem_name} based on " + + "source code in #{source_dir_name}") + + Dir.chdir(source_dir_name) + + # Instead of reading the real Gemfile, invent one that simply includes the + # chosen .gemspec. If present, the Gemfile.lock will be honoured. + fake_gemfile = Bundler::Dsl.new + fake_gemfile.source('https://rubygems.org') + begin + fake_gemfile.gemspec({:name => gem_name}) + rescue Bundler::InvalidOption + error "Did not find #{gem_name}.gemspec in #{source_dir_name}" + exit 1 + end + + definition = fake_gemfile.to_definition('Gemfile.lock', true) + resolved_specs = definition.resolve_remotely! 
+ + spec = get_spec_for_gem(resolved_specs, gem_name) + + if not spec_is_from_current_source_tree(spec, source_dir_name) + error "Specified gem '#{spec.name}' doesn't live in the source in " + + "'#{source_dir_name}'" + log.debug "SPEC: #{spec.inspect} #{spec.source}" + exit 1 + end + + if expected_version != nil && spec.version != expected_version + # This check is brought to you by Coderay, which changes its version + # number based on an environment variable. Other Gems may do this too. + error "Source in #{source_dir_name} produces #{spec.full_name}, but " + + "the expected version was #{expected_version}." + exit 1 + end + + morph = generate_chunk_morph_for_gem(spec) + + # One might think that you could use the Bundler::Dependency.groups + # field to filter, but it doesn't seem to be useful. Instead we go back to + # the Gem::Specification of the target Gem and use the dependencies field + # there. We look up each dependency in resolved_specs to find out + # what version Bundler has chosen of it.
+ + def format_deps_for_morphology(specset, dep_list) + info = dep_list.collect do |dep| + spec = specset[dep][0] + [spec.name, spec.version.to_s] + end + Hash[info] + end + + build_deps = format_deps_for_morphology( + resolved_specs, build_deps_for_gem(spec)) + runtime_deps = format_deps_for_morphology( + resolved_specs, runtime_deps_for_gem(spec)) + + morph['x-build-dependencies-rubygems'] = build_deps + morph['x-runtime-dependencies-rubygems'] = runtime_deps + + write_morph(STDOUT, morph) + end +end + +RubyGemChunkMorphologyGenerator.new.run diff --git a/rubygems.to_lorry b/rubygems.to_lorry new file mode 100755 index 0000000..7a00820 --- /dev/null +++ b/rubygems.to_lorry @@ -0,0 +1,164 @@ +#!/usr/bin/python +# +# Create a Baserock .lorry file for a given RubyGem +# +# Copyright (C) 2014 Codethink Limited +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; version 2 of the License. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License along +# with this program; if not, write to the Free Software Foundation, Inc., +# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + + +import requests +import requests_cache +import yaml + +import logging +import json +import os +import sys +import urlparse + +from importer_base import ImportException, ImportExtension + + +class GenerateLorryException(ImportException): + pass + + +class RubyGemsWebServiceClient(object): + def __init__(self): + # Save hammering the rubygems.org API: 'requests' API calls are + # transparently cached in an SQLite database, instead. 
+ requests_cache.install_cache('rubygems_api_cache') + + def _request(self, url): + r = requests.get(url) + if r.ok: + return json.loads(r.text) + else: + raise GenerateLorryException( + 'Request to %s failed: %s' % (r.url, r.reason)) + + def get_gem_info(self, gem_name): + info = self._request( + 'http://rubygems.org/api/v1/gems/%s.json' % gem_name) + + if info['name'] != gem_name: + # Sanity check + raise GenerateLorryException( + 'Received info for Gem "%s", requested "%s"' % + (info['name'], gem_name)) + + return info + + +class RubyGemLorryGenerator(ImportExtension): + def __init__(self): + super(RubyGemLorryGenerator, self).__init__() + + with open('rubygems.yaml', 'r') as f: + local_data = yaml.load(f.read()) + + self.lorry_prefix = local_data['lorry-prefix'] + self.known_source_uris = local_data['known-source-uris'] + + logging.debug( + "Loaded %i known source URIs from local metadata.", len(self.known_source_uris)) + + def process_args(self, args): + if len(args) != 1: + raise ImportException( + 'Please call me with the name of a RubyGem as an argument.\n') + + gem_name = args[0] + + lorry = self.generate_lorry_for_gem(gem_name) + self.write_lorry(sys.stdout, lorry) + + def find_upstream_repo_for_gem(self, gem_name, gem_info): + source_code_uri = gem_info['source_code_uri'] + + if gem_name in self.known_source_uris: + logging.debug('Found %s in known-source-uris', gem_name) + known_uri = self.known_source_uris[gem_name] + if source_code_uri is not None and known_uri != source_code_uri: + sys.stderr.write( + '%s: Hardcoded source URI %s doesn\'t match spec URI %s\n' % + (gem_name, known_uri, source_code_uri)) + return known_uri + + if source_code_uri is not None and len(source_code_uri) > 0: + logging.debug('Got source_code_uri %s', source_code_uri) + if source_code_uri.endswith('/tree'): + source_code_uri = source_code_uri[:-len('/tree')] + + return source_code_uri + + homepage_uri = gem_info['homepage_uri'] + if homepage_uri is not None and
len(homepage_uri) > 0: + logging.debug('Got homepage_uri %s', homepage_uri) + netloc = urlparse.urlsplit(homepage_uri)[1] + if netloc == 'github.com': + return homepage_uri + + # Further possible leads on locating source code. + # http://ruby-toolbox.com/projects/$gemname -> sometimes contains an + # upstream link, even if the gem info does not. + # https://github.com/search?q=$gemname -> often the first result is + # the correct one, but you can never know. + + raise GenerateLorryException( + "Gem metadata for '%s' does not point to its source code " + "repository." % gem_name) + + def project_name_from_repo(self, repo_url): + if repo_url.endswith('/tree/master'): + repo_url = repo_url[:-len('/tree/master')] + if repo_url.endswith('/'): + repo_url = repo_url[:-1] + if repo_url.endswith('.git'): + repo_url = repo_url[:-len('.git')] + return os.path.basename(repo_url) + + def generate_lorry_for_gem(self, gem_name): + rubygems_client = RubyGemsWebServiceClient() + + gem_info = rubygems_client.get_gem_info(gem_name) + + gem_source_url = self.find_upstream_repo_for_gem(gem_name, gem_info) + logging.info('Got URL <%s> for %s', gem_source_url, gem_name) + + project_name = self.project_name_from_repo(gem_source_url) + lorry_name = self.lorry_prefix + project_name + + # One repo may produce multiple Gems. It's up to the caller to merge + # multiple .lorry files that get generated for the same repo. + + lorry = { + lorry_name: { + 'type': 'git', + 'url': gem_source_url, + 'x-products-rubygems': [gem_name] + } + } + + return lorry + + def write_lorry(self, stream, lorry): + json.dump(lorry, stream, indent=4) + # Needed so the morphlib.extensions code will pick up the last line.
+        stream.write('\n')
+
+
+if __name__ == '__main__':
+    RubyGemLorryGenerator().run()
diff --git a/rubygems.yaml b/rubygems.yaml
new file mode 100644
index 0000000..e1e6fcc
--- /dev/null
+++ b/rubygems.yaml
@@ -0,0 +1,49 @@
+---
+
+lorry-prefix: ruby-gems/
+
+# The :development dependency set is way too broad for our needs: for most Gems,
+# it includes test tools and development aids that aren't necessary for just
+# building the Gem. It's hard to even get a stratum if we include all these
+# tools because of the number of circular dependencies. Instead, only those
+# tools which are known to be required at Gem build time are listed as
+# build-dependencies, and any other :development dependencies are ignored.
+build-dependency-whitelist:
+  - hoe
+  # rake is bundled with Ruby, so it is not included in the whitelist.
+
+# The following Gems don't provide a source_code_uri in their Gem metadata.
+# Ideally ... they would do.
+known-source-uris:
+  appbundler: https://github.com/opscode/appbundler
+  ast: https://github.com/openSUSE/ast
+  brass: https://github.com/rubyworks/brass
+  coveralls: https://github.com/lemurheavy/coveralls-ruby
+  dep-selector-libgecode: https://github.com/opscode/dep-selector-libgecode
+  diff-lcs: https://github.com/halostatue/diff-lcs
+  erubis: https://github.com/kwatch/erubis
+  fog-brightbox: https://github.com/brightbox/fog-brightbox
+  highline: https://github.com/JEG2/highline
+  hoe: https://github.com/seattlerb/hoe
+  indexer: https://github.com/rubyworks/indexer
+  json: https://github.com/flori/json
+  method_source: https://github.com/banister/method_source
+  mixlib-authentication: https://github.com/opscode/mixlib-authentication
+  mixlib-cli: https://github.com/opscode/mixlib-cli
+  mixlib-log: https://github.com/opscode/mixlib-log
+  mixlib-shellout: http://github.com/opscode/mixlib-shellout
+  ohai: http://github.com/opscode/ohai
+  rack-cache: https://github.com/rtomayko/rack-cache
+  actionmailer: https://github.com/rails/rails
+  actionpack: https://github.com/rails/rails
+  actionview: https://github.com/rails/rails
+  activejob: https://github.com/rails/rails
+  activemodel: https://github.com/rails/rails
+  activerecord: https://github.com/rails/rails
+  activesupport: https://github.com/rails/rails
+  rails: https://github.com/rails/rails
+  railties: https://github.com/rails/rails
+  pg: https://github.com/ged/ruby-pg
+  sigar: https://github.com/hyperic/sigar
+  sprockets: https://github.com/sstephenson/sprockets
+  tins: https://github.com/flori/tins
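rubygems.to_lorry writes one .lorry per Gem and, per the comment in `generate_lorry_for_gem`, leaves merging the lorries generated for the same repository to its caller. The Rails Gems listed in rubygems.yaml, for example, all map to https://github.com/rails/rails. What follows is a minimal sketch of that merge step, assuming the layout `generate_lorry_for_gem` produces (a lorry name mapped to `type`, `url` and `x-products-rubygems`). `merge_lorries` is a hypothetical helper, not part of this patch, main.py or Lorry.

```python
import json


def merge_lorries(lorries, products_field='x-products-rubygems'):
    """Merge .lorry dicts, combining entries that share a lorry name.

    Hypothetical helper: rubygems.to_lorry leaves this step to its caller.
    Entries with the same name must point at the same URL. Products from
    later entries are appended only if they are not already listed.
    """
    merged = {}
    for lorry in lorries:
        for name, entry in lorry.items():
            products = entry.get(products_field, [])
            if name not in merged:
                # Copy the entry and its product list so the input dicts
                # are not modified when later entries are merged in.
                merged[name] = dict(entry)
                merged[name][products_field] = list(products)
                continue
            existing = merged[name]
            if existing.get('url') != entry.get('url'):
                # Refuse to guess which upstream is correct.
                raise ValueError(
                    'Lorry %s has conflicting URLs: %s and %s'
                    % (name, existing.get('url'), entry.get('url')))
            for product in products:
                if product not in existing[products_field]:
                    existing[products_field].append(product)
    return merged


if __name__ == '__main__':
    # Two Gems built from the same repository, as in rubygems.yaml.
    actionpack = {'ruby-gems/rails': {
        'type': 'git', 'url': 'https://github.com/rails/rails',
        'x-products-rubygems': ['actionpack']}}
    activerecord = {'ruby-gems/rails': {
        'type': 'git', 'url': 'https://github.com/rails/rails',
        'x-products-rubygems': ['activerecord']}}
    print(json.dumps(merge_lorries([actionpack, activerecord]), indent=4))
```

Raising on a URL mismatch, rather than silently keeping one of the URLs, leaves the conflict visible so it can be resolved by hand, for example by adding the Gem to `known-source-uris`.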