diff options
author | Achilleas Pipinellis <axilleas@axilleas.me> | 2016-09-25 12:44:09 +0200 |
---|---|---|
committer | Achilleas Pipinellis <axilleas@axilleas.me> | 2016-10-11 16:13:04 +0200 |
commit | d8f33c0a51d9106ece6cd4bae469e40734e05f85 (patch) | |
tree | f8168d158bb85d303e6dcab953ebf6a26c9dfdd2 /doc/administration | |
parent | 6602b917f7ffcb8c8e9134ee156ac49c24ed2a9b (diff) | |
download | gitlab-ce-d8f33c0a51d9106ece6cd4bae469e40734e05f85.tar.gz |
Move operations/ to new locationdocs/refactor-operations
[ci skip]
Diffstat (limited to 'doc/administration')
-rw-r--r-- | doc/administration/operations.md | 6 | ||||
-rw-r--r-- | doc/administration/operations/cleaning_up_redis_sessions.md | 52 | ||||
-rw-r--r-- | doc/administration/operations/moving_repositories.md | 180 | ||||
-rw-r--r-- | doc/administration/operations/sidekiq_memory_killer.md | 40 | ||||
-rw-r--r-- | doc/administration/operations/unicorn.md | 86 |
5 files changed, 364 insertions, 0 deletions
diff --git a/doc/administration/operations.md b/doc/administration/operations.md new file mode 100644 index 00000000000..4b582d16b64 --- /dev/null +++ b/doc/administration/operations.md @@ -0,0 +1,6 @@ +# GitLab operations + +- [Sidekiq MemoryKiller](operations/sidekiq_memory_killer.md) +- [Cleaning up Redis sessions](operations/cleaning_up_redis_sessions.md) +- [Understanding Unicorn and unicorn-worker-killer](operations/unicorn.md) +- [Moving repositories to a new location](operations/moving_repositories.md) diff --git a/doc/administration/operations/cleaning_up_redis_sessions.md b/doc/administration/operations/cleaning_up_redis_sessions.md new file mode 100644 index 00000000000..93521e976d5 --- /dev/null +++ b/doc/administration/operations/cleaning_up_redis_sessions.md @@ -0,0 +1,52 @@ +# Cleaning up stale Redis sessions + +Since version 6.2, GitLab stores web user sessions as key-value pairs in Redis. +Prior to GitLab 7.3, user sessions did not automatically expire from Redis. If +you have been running a large GitLab server (thousands of users) since before +GitLab 7.3 we recommend cleaning up stale sessions to compact the Redis +database after you upgrade to GitLab 7.3. You can also perform a cleanup while +still running GitLab 7.2 or older, but in that case new stale sessions will +start building up again after you clean up. + +In GitLab versions prior to 7.3.0, the session keys in Redis are 16-byte +hexadecimal values such as '976aa289e2189b17d7ef525a6702ace9'. Starting with +GitLab 7.3.0, the keys are +prefixed with 'session:gitlab:', so they would look like +'session:gitlab:976aa289e2189b17d7ef525a6702ace9'. Below we describe how to +remove the keys in the old format. + +First we define a shell function with the proper Redis connection details. + +``` +rcli() { + # This example works for Omnibus installations of GitLab 7.3 or newer. For an + # installation from source you will have to change the socket path and the + # path to redis-cli. + sudo /opt/gitlab/embedded/bin/redis-cli -s /var/opt/gitlab/redis/redis.socket "$@" +} + +# test the new shell function; the response should be PONG +rcli ping +``` + +Now we do a search to see if there are any session keys in the old format for +us to clean up. + +``` +# returns the number of old-format session keys in Redis +rcli keys '*' | grep '^[a-f0-9]\{32\}$' | wc -l +``` + +If the number is larger than zero, you can proceed to expire the keys from +Redis. If the number is zero there is nothing to clean up. + +``` +# Tell Redis to expire each matched key after 600 seconds. +rcli keys '*' | grep '^[a-f0-9]\{32\}$' | awk '{ print "expire", $0, 600 }' | rcli +# This will print '(integer) 1' for each key that gets expired. +``` + +Over the next 15 minutes (10 minutes expiry time plus 5 minutes Redis +background save interval) your Redis database will be compacted. If you are +still using GitLab 7.2, users who are not clicking around in GitLab during the +10 minute expiry window will be signed out of GitLab. diff --git a/doc/administration/operations/moving_repositories.md b/doc/administration/operations/moving_repositories.md new file mode 100644 index 00000000000..54adb99386a --- /dev/null +++ b/doc/administration/operations/moving_repositories.md @@ -0,0 +1,180 @@ +# Moving repositories managed by GitLab + +Sometimes you need to move all repositories managed by GitLab to +another filesystem or another server. In this document we will look +at some of the ways you can copy all your repositories from +`/var/opt/gitlab/git-data/repositories` to `/mnt/gitlab/repositories`. + +We will look at three scenarios: the target directory is empty, the +target directory contains an outdated copy of the repositories, and +how to deal with thousands of repositories. + +**Each of the approaches we list can/will overwrite data in the +target directory `/mnt/gitlab/repositories`. Do not mix up the +source and the target.** + +## Target directory is empty: use a tar pipe + +If the target directory `/mnt/gitlab/repositories` is empty the +simplest thing to do is to use a tar pipe. This method has low +overhead and tar is almost always already installed on your system. +However, it is not possible to resume an interrupted tar pipe: if +that happens then all data must be copied again. + +``` +# As the git user +tar -C /var/opt/gitlab/git-data/repositories -cf - -- . |\ + tar -C /mnt/gitlab/repositories -xf - +``` + +If you want to see progress, replace `-xf` with `-xvf`. + +### Tar pipe to another server + +You can also use a tar pipe to copy data to another server. If your +'git' user has SSH access to the newserver as 'git@newserver', you +can pipe the data through SSH. + +``` +# As the git user +tar -C /var/opt/gitlab/git-data/repositories -cf - -- . |\ + ssh git@newserver tar -C /mnt/gitlab/repositories -xf - +``` + +If you want to compress the data before it goes over the network +(which will cost you CPU cycles) you can replace `ssh` with `ssh -C`. + +## The target directory contains an outdated copy of the repositories: use rsync + +If the target directory already contains a partial / outdated copy +of the repositories it may be wasteful to copy all the data again +with tar. In this scenario it is better to use rsync. This utility +is either already installed on your system or easily installable +via apt, yum etc. + +``` +# As the 'git' user +rsync -a --delete /var/opt/gitlab/git-data/repositories/. \ + /mnt/gitlab/repositories +``` + +The `/.` in the command above is very important, without it you can +easily get the wrong directory structure in the target directory. +If you want to see progress, replace `-a` with `-av`. + +### Single rsync to another server + +If the 'git' user on your source system has SSH access to the target +server you can send the repositories over the network with rsync. + +``` +# As the 'git' user +rsync -a --delete /var/opt/gitlab/git-data/repositories/. \ + git@newserver:/mnt/gitlab/repositories +``` + +## Thousands of Git repositories: use one rsync per repository + +Every time you start an rsync job it has to inspect all files in +the source directory, all files in the target directory, and then +decide what files to copy or not. If the source or target directory +has many contents this startup phase of rsync can become a burden +for your GitLab server. In cases like this you can make rsync's +life easier by dividing its work in smaller pieces, and sync one +repository at a time. + +In addition to rsync we will use [GNU +Parallel](http://www.gnu.org/software/parallel/). This utility is +not included in GitLab so you need to install it yourself with apt +or yum. Also note that the GitLab scripts we used below were added +in GitLab 8.1. + +** This process does not clean up repositories at the target location that no +longer exist at the source. ** If you start using your GitLab instance with +`/mnt/gitlab/repositories`, you need to run `gitlab-rake gitlab:cleanup:repos` +after switching to the new repository storage directory. + +### Parallel rsync for all repositories known to GitLab + +This will sync repositories with 10 rsync processes at a time. We keep +track of progress so that the transfer can be restarted if necessary. + +First we create a new directory, owned by 'git', to hold transfer +logs. We assume the directory is empty before we start the transfer +procedure, and that we are the only ones writing files in it. + +``` +# Omnibus +sudo mkdir /var/opt/gitlab/transfer-logs +sudo chown git:git /var/opt/gitlab/transfer-logs + +# Source +sudo -u git -H mkdir /home/git/transfer-logs +``` + +We seed the process with a list of the directories we want to copy. + +``` +# Omnibus +sudo -u git sh -c 'gitlab-rake gitlab:list_repos > /var/opt/gitlab/transfer-logs/all-repos-$(date +%s).txt' + +# Source +cd /home/git/gitlab +sudo -u git -H sh -c 'bundle exec rake gitlab:list_repos > /home/git/transfer-logs/all-repos-$(date +%s).txt' +``` + +Now we can start the transfer. The command below is idempotent, and +the number of jobs done by GNU Parallel should converge to zero. If it +does not some repositories listed in all-repos-1234.txt may have been +deleted/renamed before they could be copied. + +``` +# Omnibus +sudo -u git sh -c ' +cat /var/opt/gitlab/transfer-logs/* | sort | uniq -u |\ + /usr/bin/env JOBS=10 \ + /opt/gitlab/embedded/service/gitlab-rails/bin/parallel-rsync-repos \ + /var/opt/gitlab/transfer-logs/success-$(date +%s).log \ + /var/opt/gitlab/git-data/repositories \ + /mnt/gitlab/repositories +' + +# Source +cd /home/git/gitlab +sudo -u git -H sh -c ' +cat /home/git/transfer-logs/* | sort | uniq -u |\ + /usr/bin/env JOBS=10 \ + bin/parallel-rsync-repos \ + /home/git/transfer-logs/success-$(date +%s).log \ + /home/git/repositories \ + /mnt/gitlab/repositories +` +``` + +### Parallel rsync only for repositories with recent activity + +Suppose you have already done one sync that started after 2015-10-1 12:00 UTC. +Then you might only want to sync repositories that were changed via GitLab +_after_ that time. You can use the 'SINCE' variable to tell 'rake +gitlab:list_repos' to only print repositories with recent activity. + +``` +# Omnibus +sudo gitlab-rake gitlab:list_repos SINCE='2015-10-1 12:00 UTC' |\ + sudo -u git \ + /usr/bin/env JOBS=10 \ + /opt/gitlab/embedded/service/gitlab-rails/bin/parallel-rsync-repos \ + success-$(date +%s).log \ + /var/opt/gitlab/git-data/repositories \ + /mnt/gitlab/repositories + +# Source +cd /home/git/gitlab +sudo -u git -H bundle exec rake gitlab:list_repos SINCE='2015-10-1 12:00 UTC' |\ + sudo -u git -H \ + /usr/bin/env JOBS=10 \ + bin/parallel-rsync-repos \ + success-$(date +%s).log \ + /home/git/repositories \ + /mnt/gitlab/repositories +``` diff --git a/doc/administration/operations/sidekiq_memory_killer.md b/doc/administration/operations/sidekiq_memory_killer.md new file mode 100644 index 00000000000..b5e78348989 --- /dev/null +++ b/doc/administration/operations/sidekiq_memory_killer.md @@ -0,0 +1,40 @@ +# Sidekiq MemoryKiller + +The GitLab Rails application code suffers from memory leaks. For web requests +this problem is made manageable using +[unicorn-worker-killer](https://github.com/kzk/unicorn-worker-killer) which +restarts Unicorn worker processes in between requests when needed. The Sidekiq +MemoryKiller applies the same approach to the Sidekiq processes used by GitLab +to process background jobs. + +Unlike unicorn-worker-killer, which is enabled by default for all GitLab +installations since GitLab 6.4, the Sidekiq MemoryKiller is enabled by default +_only_ for Omnibus packages. The reason for this is that the MemoryKiller +relies on Runit to restart Sidekiq after a memory-induced shutdown and GitLab +installations from source do not all use Runit or an equivalent. + +With the default settings, the MemoryKiller will cause a Sidekiq restart no +more often than once every 15 minutes, with the restart causing about one +minute of delay for incoming background jobs. + +## Configuring the MemoryKiller + +The MemoryKiller is controlled using environment variables. + +- `SIDEKIQ_MEMORY_KILLER_MAX_RSS`: if this variable is set, and its value is + greater than 0, then after each Sidekiq job, the MemoryKiller will check the + RSS of the Sidekiq process that executed the job. If the RSS of the Sidekiq + process (expressed in kilobytes) exceeds SIDEKIQ_MEMORY_KILLER_MAX_RSS, a + delayed shutdown is triggered. The default value for Omnibus packages is set + [in the omnibus-gitlab + repository](https://gitlab.com/gitlab-org/omnibus-gitlab/blob/master/files/gitlab-cookbooks/gitlab/attributes/default.rb). +- `SIDEKIQ_MEMORY_KILLER_GRACE_TIME`: defaults 900 seconds (15 minutes). When + a shutdown is triggered, the Sidekiq process will keep working normally for + another 15 minutes. +- `SIDEKIQ_MEMORY_KILLER_SHUTDOWN_WAIT`: defaults to 30 seconds. When the grace + time has expired, the MemoryKiller tells Sidekiq to stop accepting new jobs. + Existing jobs get 30 seconds to finish. After that, the MemoryKiller tells + Sidekiq to shut down, and an external supervision mechanism (e.g. Runit) must + restart Sidekiq. +- `SIDEKIQ_MEMORY_KILLER_SHUTDOWN_SIGNAL`: defaults to `SIGKILL`. The name of + the final signal sent to the Sidekiq process when we want it to shut down. diff --git a/doc/administration/operations/unicorn.md b/doc/administration/operations/unicorn.md new file mode 100644 index 00000000000..bad61151bda --- /dev/null +++ b/doc/administration/operations/unicorn.md @@ -0,0 +1,86 @@ +# Understanding Unicorn and unicorn-worker-killer + +## Unicorn + +GitLab uses [Unicorn](http://unicorn.bogomips.org/), a pre-forking Ruby web +server, to handle web requests (web browsers and Git HTTP clients). Unicorn is +a daemon written in Ruby and C that can load and run a Ruby on Rails +application; in our case the Rails application is GitLab Community Edition or +GitLab Enterprise Edition. + +Unicorn has a multi-process architecture to make better use of available CPU +cores (processes can run on different cores) and to have stronger fault +tolerance (most failures stay isolated in only one process and cannot take down +GitLab entirely). On startup, the Unicorn 'master' process loads a clean Ruby +environment with the GitLab application code, and then spawns 'workers' which +inherit this clean initial environment. The 'master' never handles any +requests, that is left to the workers. The operating system network stack +queues incoming requests and distributes them among the workers. + +In a perfect world, the master would spawn its pool of workers once, and then +the workers handle incoming web requests one after another until the end of +time. In reality, worker processes can crash or time out: if the master notices +that a worker takes too long to handle a request it will terminate the worker +process with SIGKILL ('kill -9'). No matter how the worker process ended, the +master process will replace it with a new 'clean' process again. Unicorn is +designed to be able to replace 'crashed' workers without dropping user +requests. + +This is what a Unicorn worker timeout looks like in `unicorn_stderr.log`. The +master process has PID 56227 below. + +``` +[2015-06-05T10:58:08.660325 #56227] ERROR -- : worker=10 PID:53009 timeout (61s > 60s), killing +[2015-06-05T10:58:08.699360 #56227] ERROR -- : reaped #<Process::Status: pid 53009 SIGKILL (signal 9)> worker=10 +[2015-06-05T10:58:08.708141 #62538] INFO -- : worker=10 spawned pid=62538 +[2015-06-05T10:58:08.708824 #62538] INFO -- : worker=10 ready +``` + +### Tunables + +The main tunables for Unicorn are the number of worker processes and the +request timeout after which the Unicorn master terminates a worker process. +See the [omnibus-gitlab Unicorn settings +documentation](https://gitlab.com/gitlab-org/omnibus-gitlab/blob/master/doc/settings/unicorn.md) +if you want to adjust these settings. + +## unicorn-worker-killer + +GitLab has memory leaks. These memory leaks manifest themselves in long-running +processes, such as Unicorn workers. (The Unicorn master process is not known to +leak memory, probably because it does not handle user requests.) + +To make these memory leaks manageable, GitLab comes with the +[unicorn-worker-killer gem](https://github.com/kzk/unicorn-worker-killer). This +gem [monkey-patches](https://en.wikipedia.org/wiki/Monkey_patch) the Unicorn +workers to do a memory self-check after every 16 requests. If the memory of the +Unicorn worker exceeds a pre-set limit then the worker process exits. The +Unicorn master then automatically replaces the worker process. + +This is a robust way to handle memory leaks: Unicorn is designed to handle +workers that 'crash' so no user requests will be dropped. The +unicorn-worker-killer gem is designed to only terminate a worker process _in +between requests_, so no user requests are affected. + +This is what a Unicorn worker memory restart looks like in unicorn_stderr.log. +You see that worker 4 (PID 125918) is inspecting itself and decides to exit. +The threshold memory value was 254802235 bytes, about 250MB. With GitLab this +threshold is a random value between 200 and 250 MB. The master process (PID +117565) then reaps the worker process and spawns a new 'worker 4' with PID +127549. + +``` +[2015-06-05T12:07:41.828374 #125918] WARN -- : #<Unicorn::HttpServer:0x00000002734770>: worker (pid: 125918) exceeds memory limit (256413696 bytes > 254802235 bytes) +[2015-06-05T12:07:41.828472 #125918] WARN -- : Unicorn::WorkerKiller send SIGQUIT (pid: 125918) alive: 23 sec (trial 1) +[2015-06-05T12:07:42.025916 #117565] INFO -- : reaped #<Process::Status: pid 125918 exit 0> worker=4 +[2015-06-05T12:07:42.034527 #127549] INFO -- : worker=4 spawned pid=127549 +[2015-06-05T12:07:42.035217 #127549] INFO -- : worker=4 ready +``` + +One other thing that stands out in the log snippet above, taken from +GitLab.com, is that 'worker 4' was serving requests for only 23 seconds. This +is a normal value for our current GitLab.com setup and traffic. + +The high frequency of Unicorn memory restarts on some GitLab sites can be a +source of confusion for administrators. Usually they are a [red +herring](https://en.wikipedia.org/wiki/Red_herring). |