summaryrefslogtreecommitdiff
path: root/doc/development/github_importer.md
diff options
context:
space:
mode:
Diffstat (limited to 'doc/development/github_importer.md')
-rw-r--r--doc/development/github_importer.md53
1 files changed, 27 insertions, 26 deletions
diff --git a/doc/development/github_importer.md b/doc/development/github_importer.md
index 17b4a28f57d..73497c22b65 100644
--- a/doc/development/github_importer.md
+++ b/doc/development/github_importer.md
@@ -12,12 +12,13 @@ necessary to import GitHub projects into a GitLab instance.
The GitHub importer offers two different types of importers: a sequential
importer and a parallel importer. The Rake task `import:github` uses the
-sequential importer, while everything else uses the parallel importer. The
-difference between these two importers is quite simple: the sequential importer
-does all work in a single thread, making it more useful for debugging purposes
-or Rake tasks. The parallel importer, on the other hand, uses Sidekiq.
+sequential importer, and everything else uses the parallel importer. The
+difference between these two importers is:
-## Requirements
+- The sequential importer does all the work in a single thread, so it's more suited for debugging purposes or Rake tasks.
+- The parallel importer uses Sidekiq.
+
+## Prerequisites
- GitLab CE 10.2.0 or newer.
- Sidekiq workers that process the `github_importer` and
@@ -69,30 +70,38 @@ don't need to perform this work in parallel.
This worker imports all pull requests. For every pull request a job for the
`Gitlab::GithubImport::ImportPullRequestWorker` worker is scheduled.
-### 5. Stage::ImportPullRequestsMergedByWorker
+### 5. Stage::ImportCollaboratorsWorker
+
+This worker imports only direct repository collaborators who are not outside collaborators.
+For every collaborator, we schedule a job for the `Gitlab::GithubImport::ImportCollaboratorWorker` worker.
+
+NOTE:
+This stage is optional (controlled by `Gitlab::GithubImport::Settings`) and is selected by default.
+
+### 6. Stage::ImportPullRequestsMergedByWorker
This worker imports the pull requests' _merged-by_ user information. The
[_List pull requests_](https://docs.github.com/en/rest/pulls#list-pull-requests)
API doesn't provide this information. Therefore, this stage must fetch each merged pull request
individually to import this information. A
-`Gitlab::GithubImport::ImportPullRequestMergedByWorker` job is scheduled for each fetched pull
+`Gitlab::GithubImport::PullRequests::ImportMergedByWorker` job is scheduled for each fetched pull
request.
-### 6. Stage::ImportPullRequestsReviewRequestsWorker
+### 7. Stage::ImportPullRequestsReviewRequestsWorker
This worker imports assigned reviewers of pull requests. For each pull request, this worker:
- Fetches all assigned review requests.
- Schedules a `Gitlab::GithubImport::PullRequests::ImportReviewRequestWorker` job for each fetched review request.
-### 7. Stage::ImportPullRequestsReviewsWorker
+### 8. Stage::ImportPullRequestsReviewsWorker
This worker imports reviews of pull requests. For each pull request, this worker:
- Fetches all the pages of reviews.
-- Schedules a `Gitlab::GithubImport::ImportPullRequestReviewWorker` job for each fetched review.
+- Schedules a `Gitlab::GithubImport::PullRequests::ImportReviewWorker` job for each fetched review.
-### 8. Stage::ImportIssuesAndDiffNotesWorker
+### 9. Stage::ImportIssuesAndDiffNotesWorker
This worker imports all issues and pull request comments. For every issue, we
schedule a job for the `Gitlab::GithubImport::ImportIssueWorker` worker. For
@@ -108,7 +117,7 @@ label links in the same worker removes the need for performing a separate crawl
through the API data, reducing the number of API calls necessary to import a
project.
-### 9. Stage::ImportIssueEventsWorker
+### 10. Stage::ImportIssueEventsWorker
This worker imports all issues and pull request events. For every event, we
schedule a job for the `Gitlab::GithubImport::ImportIssueEventWorker` worker.
@@ -124,7 +133,7 @@ Therefore, both issues and pull requests have a common API for most related thin
NOTE:
This stage is optional and can consume significant extra import time (controlled by `Gitlab::GithubImport::Settings`).
-### 10. Stage::ImportNotesWorker
+### 11. Stage::ImportNotesWorker
This worker imports regular comments for both issues and pull requests. For
every comment, we schedule a job for the
@@ -135,7 +144,7 @@ returns comments for both issues and pull requests. This means we have to wait
for all issues and pull requests to be imported before we can import regular
comments.
-### 11. Stage::ImportAttachmentsWorker
+### 12. Stage::ImportAttachmentsWorker
This worker imports note attachments that are linked inside Markdown.
For each entity with Markdown text in the project, we schedule a job of:
@@ -154,7 +163,7 @@ Each job:
NOTE:
It's an optional stage that could consume significant extra import time (controlled by `Gitlab::GithubImport::Settings`).
-### 12. Stage::ImportProtectedBranchesWorker
+### 13. Stage::ImportProtectedBranchesWorker
This worker imports protected branch rules.
For every rule that exists on GitHub, we schedule a job of
@@ -163,7 +172,7 @@ For every rule that exists on GitHub, we schedule a job of
Each job compares the branch protection rules from GitHub and GitLab and applies
the strictest of the rules to the branches in GitLab.
-### 13. Stage::FinishImportWorker
+### 14. Stage::FinishImportWorker
This worker completes the import process by performing some housekeeping
(such as flushing any caches) and by marking the import as completed.
@@ -179,10 +188,10 @@ Advancing stages is done in one of two ways:
The first approach should only be used by workers that perform all their work in
a single thread, while `AdvanceStageWorker` should be used for everything else.
-The way `AdvanceStageWorker` works is fairly simple. When scheduling a job it
+When you schedule a job, `AdvanceStageWorker`
is given a project ID, a list of Redis keys, and the name of the next
stage. The Redis keys (produced by `Gitlab::JobWaiter`) are used to check if the
-currently running stage has been completed or not. If the stage has not yet been
+running stage has been completed or not. If the stage has not yet been
completed `AdvanceStageWorker` reschedules itself. After a stage finishes
`AdvanceStageworker` refreshes the import JID (more on this below) and
schedule the worker of the next stage.
@@ -324,14 +333,6 @@ The last log entry reports the number of objects fetched and imported:
}
```
-## Errors when importing large projects
-
-The GitHub importer may encounter errors when importing large projects. For help with this, see the
-documentation for the following use cases:
-
-- [Alternative way to import notes and diff notes](../user/project/import/github.md#alternative-way-to-import-notes-and-diff-notes)
-- [Reduce GitHub API request objects per page](../user/project/import/github.md#reduce-github-api-request-objects-per-page)
-
## Metrics dashboards
To assess the GitHub importer health, the [GitHub importer dashboard](https://dashboards.gitlab.net/d/importers-github-importer/importers-github-importer)