path: root/app/services/projects/import_service.rb
Commit message | Author | Age | Files | Lines
* Merge branch 'fix/use-shard-name-in-gitlab-projects-instead-of-shard-path' into 'master' | Sean McGivern | 2018-04-04 | 1 | -1/+1
|\
| |   Use shard name in Git::GitlabProjects instead of shard path
| |   Closes gitaly#1110
| |   See merge request gitlab-org/gitlab-ce!18015
| * Use shard name in Git::GitlabProjects instead of shard path | Ahmad Sherif | 2018-04-03 | 1 | -1/+1
| |   Closes gitaly#1110
* | Raise more descriptive errors when URLs are blocked | Douwe Maan | 2018-04-02 | 1 | -1/+5
|/
* Merge branch 'fj-15329-services-callbacks-ssrf' into 'security-10-6' | Douwe Maan | 2018-03-21 | 1 | -1/+1
|   Server Side Request Forgery in Services and Web Hooks
|   See merge request gitlab/gitlabhq!2337
* Rename fetch_refs to refmap | Douwe Maan | 2017-11-23 | 1 | -3/+3
|
* Clean up repository fetch and mirror methods | Douwe Maan | 2017-11-23 | 1 | -14/+6
|
* Prefer polymorphism over specific type checks in Import service [dm-import-service-polymorphism] | Douwe Maan | 2017-11-15 | 1 | -11/+11
|
* Replace old GH importer with the parallel importer [github-importer-refactor] | Yorick Peterse | 2017-11-07 | 1 | -4/+4
|
* Rewrite the GitHub importer from scratch | Yorick Peterse | 2017-11-07 | 1 | -1/+17
|   Prior to this MR there were two GitHub related importers:
|
|   * Github::Import: the main importer used for GitHub projects
|   * Gitlab::GithubImport: importer that's somewhat confusingly used for
|     importing Gitea projects (apparently they have a compatible API)
|
|   This MR renames the Gitea importer to Gitlab::LegacyGithubImport and
|   introduces a new GitHub importer in the Gitlab::GithubImport namespace.
|   This new GitHub importer uses Sidekiq for importing multiple resources in
|   parallel, though it also has the ability to import data sequentially
|   should this be necessary.
|
|   The new code is spread across the following directories:
|
|   * lib/gitlab/github_import: this directory contains most of the importer
|     code such as the classes used for importing resources.
|   * app/workers/gitlab/github_import: this directory contains the Sidekiq
|     workers, most of which simply use the code from the directory above.
|   * app/workers/concerns/gitlab/github_import: this directory provides a
|     few modules that are included in every GitHub importer worker.
|
|   == Stages
|
|   The import work is divided into separate stages, with each stage
|   importing a specific set of data. Stages will schedule the work that
|   needs to be performed, followed by scheduling a job for the
|   "AdvanceStageWorker" worker. This worker will periodically check if all
|   work is completed and schedule the next stage if this is the case. If
|   work is not yet completed this worker will reschedule itself.
|
|   Using this approach we don't have to block threads by calling `sleep()`,
|   as doing so for large projects could block the thread from doing any work
|   for many hours.
|
|   == Retrying Work
|
|   Workers will reschedule themselves whenever necessary. For example,
|   hitting the GitHub API's rate limit will result in jobs rescheduling
|   themselves. These jobs are not processed until the rate limit has been
|   reset.
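
The "Stages" approach in this commit message can be illustrated with a small sketch. It is not the importer's code: `FakeQueue` and `AdvanceStage` are hypothetical names, the queue is a plain in-memory array rather than Sidekiq, and rescheduling happens immediately instead of after a delay. It only shows the idea of a job that re-enqueues itself instead of calling `sleep()`.

```ruby
# Hypothetical in-memory stand-in for a job queue; this is not Sidekiq's API.
class FakeQueue
  def initialize
    @jobs = []
  end

  def push(job)
    @jobs << job
  end

  # Runs queued jobs in order until none remain; returns how many ran.
  def drain
    count = 0
    until @jobs.empty?
      @jobs.shift.call
      count += 1
    end
    count
  end
end

# Illustrative "advance stage" job. It never sleeps: it either starts the
# next stage or puts itself back on the queue to check again later.
class AdvanceStage
  def initialize(queue, pending, next_stage)
    @queue = queue
    @pending = pending       # hash holding the number of unfinished jobs
    @next_stage = next_stage # callable that starts the next stage
  end

  def call
    if @pending[:count].zero?
      @next_stage.call
    else
      @queue.push(self) # reschedule instead of blocking the thread
    end
  end
end

queue = FakeQueue.new
pending = { count: 2 }
log = []

# The advance job is queued first, so its first check finds work pending.
queue.push(AdvanceStage.new(queue, pending, -> { log << :next_stage }))
2.times { queue.push(-> { pending[:count] -= 1; log << :work }) }

ran = queue.drain
```

Here the advance job runs twice: once before the work is done (it re-enqueues itself) and once afterwards (it starts the next stage), so the thread stays free for other jobs in between.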
|
|   == User Lookups
|
|   Part of the importing process involves looking up user details in the
|   GitHub API so we can map them to GitLab users. The old importer used an
|   in-memory cache, but this obviously doesn't work when the work is spread
|   across different threads.
|
|   The new importer uses a Redis cache and makes sure we only perform
|   API/database calls if absolutely necessary. Frequently used keys are
|   refreshed, and lookup misses are also cached; removing the need for
|   performing API/database calls if we know we don't have the data we're
|   looking for.
|
|   == Performance & Models
|
|   The new importer in various places uses raw INSERT statements (as
|   generated by `Gitlab::Database.bulk_insert`) instead of using Rails
|   models. This allows us to bypass any validations and callbacks,
|   drastically reducing the number of SQL queries and Gitaly RPC calls
|   necessary to import projects.
|
|   To ensure the code produces valid data the corresponding tests check if
|   the produced rows are valid according to the model validation rules.
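
The "User Lookups" idea of caching misses as well as hits can be sketched as follows. This is an assumption-laden illustration, not the importer's implementation: `CachedUserFinder` and its `MISS` sentinel are hypothetical, a plain Hash stands in for Redis, and key refreshing and expiry are left out.

```ruby
# Hypothetical lookup helper; a Hash stands in for the Redis cache.
class CachedUserFinder
  MISS = :miss # sentinel stored when the lookup found nothing

  attr_reader :api_calls

  def initialize(cache, api)
    @cache = cache # shared cache (Redis in the commit message, a Hash here)
    @api = api     # callable: username -> user ID, or nil if unknown
    @api_calls = 0
  end

  def find(username)
    # A cached entry answers the lookup, including a cached miss.
    if @cache.key?(username)
      cached = @cache[username]
      return cached == MISS ? nil : cached
    end

    # Only reach the (expensive) API when nothing is cached at all.
    @api_calls += 1
    id = @api.call(username)
    @cache[username] = id.nil? ? MISS : id
    id
  end
end

known = { "alice" => 1 }
finder = CachedUserFinder.new({}, ->(name) { known[name] })

first = finder.find("alice")
second = finder.find("alice")
missing = [finder.find("ghost"), finder.find("ghost")]
```

Four lookups result in two API calls: the repeated hit and the repeated miss are both served from the cache, which is the saving the commit message describes.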
* Encapsulate git operations for mirroring in Gitlab::Git | Alejandro Rodríguez | 2017-11-03 | 1 | -1/+1
|
* Add explanation why we should return early for GitHub importer | Douglas Barbosa Alexandre | 2017-08-07 | 1 | -0/+2
|
* Does not fetch repository when importing from GitHub on import service | Douglas Barbosa Alexandre | 2017-08-07 | 1 | -2/+4
|
* Rename path_with_namespace -> disk_path when dealing with the filesystem | Gabriel Mazetto | 2017-08-01 | 1 | -1/+1
|
* Rename many path_with_namespace -> full_path | Gabriel Mazetto | 2017-08-01 | 1 | -1/+1
|
* Does not remove the GitHub remote when importing from GitHub | Douglas Barbosa Alexandre | 2017-04-18 | 1 | -1/+0
|
* Fix Rubocop offenses | Douglas Barbosa Alexandre | 2017-04-03 | 1 | -1/+1
|
* Refactoring Projects::ImportService | Douglas Barbosa Alexandre | 2017-04-03 | 1 | -28/+22
|
* Fetch GitHub project as a mirror to get all refs at once | Douglas Barbosa Alexandre | 2017-04-03 | 1 | -1/+24
|
* Merge branch 'ssrf' into 'security' | Douwe Maan | 2017-03-20 | 1 | -1/+2
|   Protect server against SSRF in project import URLs
|   See merge request !2068
* Enable and autocorrect the CustomErrorClass cop | Sean McGivern | 2017-03-01 | 1 | -1/+1
|
* Improve Gitlab::ImportSources | Rémy Coutable | 2016-12-19 | 1 | -30/+2
|   Signed-off-by: Rémy Coutable <remy@rymai.me>
* Rename Gogs to Gitea, DRY the controller and improve views | Rémy Coutable | 2016-12-19 | 1 | -6/+21
|   Signed-off-by: Rémy Coutable <remy@rymai.me>
* Gogs Importer | Kim "BKC" Carlbäcker | 2016-12-19 | 1 | -0/+1
|
* Check if repository already exists before trying to re-import it | Ahmad Sherif | 2016-10-28 | 1 | -1/+1
|
* fix broken repo 500 errors in UI and added relevant specs | James Lopez | 2016-09-29 | 1 | -0/+5
|
* fixes a few issues to do with import_url not being saved correctly for imports. | James Lopez | 2016-07-12 | 1 | -1/+1
|   This should prevent the import_data to be created when it should not and
|   output an error properly validating before creating it.
* Refactor repository paths handling to allow multiple git mount points | Alejandro Rodríguez | 2016-06-29 | 1 | -1/+1
|
* adapted current services stuff to use new project import, plus fixes a few issues, updated routes, etc... | James Lopez | 2016-06-14 | 1 | -10/+14
|
* lots of refactoring to reuse import service | James Lopez | 2016-06-14 | 1 | -2/+5
|
* fix empty message on shell error | James Lopez | 2016-06-02 | 1 | -1/+1
|
* Flush repository cache before import project data [fix-gh-pr-import] | Douglas Barbosa Alexandre | 2016-04-04 | 1 | -0/+2
|   GitHub Pull Requests importer handle with the repository while importing
|   data, we need to make sure that the cached values are valid.
* Move Gitlab::BitbucketImport::KeyDeleter to it's own importer | Douglas Barbosa Alexandre | 2016-01-26 | 1 | -4/+0
|
* Extract Projects::ImportService service from RepositoryImportWorker | Douglas Barbosa Alexandre | 2016-01-25 | 1 | -0/+71