diff options
Diffstat (limited to 'doc/development/github_importer.md')
-rw-r--r-- | doc/development/github_importer.md | 53 |
1 files changed, 27 insertions, 26 deletions
diff --git a/doc/development/github_importer.md b/doc/development/github_importer.md index 17b4a28f57d..73497c22b65 100644 --- a/doc/development/github_importer.md +++ b/doc/development/github_importer.md @@ -12,12 +12,13 @@ necessary to import GitHub projects into a GitLab instance. The GitHub importer offers two different types of importers: a sequential importer and a parallel importer. The Rake task `import:github` uses the -sequential importer, while everything else uses the parallel importer. The -difference between these two importers is quite simple: the sequential importer -does all work in a single thread, making it more useful for debugging purposes -or Rake tasks. The parallel importer, on the other hand, uses Sidekiq. +sequential importer, and everything else uses the parallel importer. The +difference between these two importers is: -## Requirements +- The sequential importer does all the work in a single thread, so it's more suited for debugging purposes or Rake tasks. +- The parallel importer uses Sidekiq. + +## Prerequisites - GitLab CE 10.2.0 or newer. - Sidekiq workers that process the `github_importer` and @@ -69,30 +70,38 @@ don't need to perform this work in parallel. This worker imports all pull requests. For every pull request a job for the `Gitlab::GithubImport::ImportPullRequestWorker` worker is scheduled. -### 5. Stage::ImportPullRequestsMergedByWorker +### 5. Stage::ImportCollaboratorsWorker + +This worker imports only direct repository collaborators who are not outside collaborators. +For every collaborator, we schedule a job for the `Gitlab::GithubImport::ImportCollaboratorWorker` worker. + +NOTE: +This stage is optional (controlled by `Gitlab::GithubImport::Settings`) and is selected by default. + +### 6. Stage::ImportPullRequestsMergedByWorker This worker imports the pull requests' _merged-by_ user information. The [_List pull requests_](https://docs.github.com/en/rest/pulls#list-pull-requests) API doesn't provide this information. Therefore, this stage must fetch each merged pull request individually to import this information. A -`Gitlab::GithubImport::ImportPullRequestMergedByWorker` job is scheduled for each fetched pull +`Gitlab::GithubImport::PullRequests::ImportMergedByWorker` job is scheduled for each fetched pull request. -### 6. Stage::ImportPullRequestsReviewRequestsWorker +### 7. Stage::ImportPullRequestsReviewRequestsWorker This worker imports assigned reviewers of pull requests. For each pull request, this worker: - Fetches all assigned review requests. - Schedules a `Gitlab::GithubImport::PullRequests::ImportReviewRequestWorker` job for each fetched review request. -### 7. Stage::ImportPullRequestsReviewsWorker +### 8. Stage::ImportPullRequestsReviewsWorker This worker imports reviews of pull requests. For each pull request, this worker: - Fetches all the pages of reviews. -- Schedules a `Gitlab::GithubImport::ImportPullRequestReviewWorker` job for each fetched review. +- Schedules a `Gitlab::GithubImport::PullRequests::ImportReviewWorker` job for each fetched review. -### 8. Stage::ImportIssuesAndDiffNotesWorker +### 9. Stage::ImportIssuesAndDiffNotesWorker This worker imports all issues and pull request comments. For every issue, we schedule a job for the `Gitlab::GithubImport::ImportIssueWorker` worker. For @@ -108,7 +117,7 @@ label links in the same worker removes the need for performing a separate crawl through the API data, reducing the number of API calls necessary to import a project. -### 9. Stage::ImportIssueEventsWorker +### 10. Stage::ImportIssueEventsWorker This worker imports all issues and pull request events. For every event, we schedule a job for the `Gitlab::GithubImport::ImportIssueEventWorker` worker. @@ -124,7 +133,7 @@ Therefore, both issues and pull requests have a common API for most related thin NOTE: This stage is optional and can consume significant extra import time (controlled by `Gitlab::GithubImport::Settings`). -### 10. Stage::ImportNotesWorker +### 11. Stage::ImportNotesWorker This worker imports regular comments for both issues and pull requests. For every comment, we schedule a job for the @@ -135,7 +144,7 @@ returns comments for both issues and pull requests. This means we have to wait for all issues and pull requests to be imported before we can import regular comments. -### 11. Stage::ImportAttachmentsWorker +### 12. Stage::ImportAttachmentsWorker This worker imports note attachments that are linked inside Markdown. For each entity with Markdown text in the project, we schedule a job of: @@ -154,7 +163,7 @@ Each job: NOTE: It's an optional stage that could consume significant extra import time (controlled by `Gitlab::GithubImport::Settings`). -### 12. Stage::ImportProtectedBranchesWorker +### 13. Stage::ImportProtectedBranchesWorker This worker imports protected branch rules. For every rule that exists on GitHub, we schedule a job of @@ -163,7 +172,7 @@ For every rule that exists on GitHub, we schedule a job of Each job compares the branch protection rules from GitHub and GitLab and applies the strictest of the rules to the branches in GitLab. -### 13. Stage::FinishImportWorker +### 14. Stage::FinishImportWorker This worker completes the import process by performing some housekeeping (such as flushing any caches) and by marking the import as completed. @@ -179,10 +188,10 @@ Advancing stages is done in one of two ways: The first approach should only be used by workers that perform all their work in a single thread, while `AdvanceStageWorker` should be used for everything else. -The way `AdvanceStageWorker` works is fairly simple. When scheduling a job it +When you schedule a job, `AdvanceStageWorker` is given a project ID, a list of Redis keys, and the name of the next stage. The Redis keys (produced by `Gitlab::JobWaiter`) are used to check if the -currently running stage has been completed or not. If the stage has not yet been +running stage has been completed or not. If the stage has not yet been completed `AdvanceStageWorker` reschedules itself. After a stage finishes `AdvanceStageworker` refreshes the import JID (more on this below) and schedule the worker of the next stage. @@ -324,14 +333,6 @@ The last log entry reports the number of objects fetched and imported: } ``` -## Errors when importing large projects - -The GitHub importer may encounter errors when importing large projects. For help with this, see the -documentation for the following use cases: - -- [Alternative way to import notes and diff notes](../user/project/import/github.md#alternative-way-to-import-notes-and-diff-notes) -- [Reduce GitHub API request objects per page](../user/project/import/github.md#reduce-github-api-request-objects-per-page) - ## Metrics dashboards To assess the GitHub importer health, the [GitHub importer dashboard](https://dashboards.gitlab.net/d/importers-github-importer/importers-github-importer) |