Diffstat (limited to 'doc/administration/troubleshooting')
-rw-r--r-- | doc/administration/troubleshooting/debug.md | 110
-rw-r--r-- | doc/administration/troubleshooting/diagnostics_tools.md | 27
-rw-r--r-- | doc/administration/troubleshooting/elasticsearch.md | 345
-rw-r--r-- | doc/administration/troubleshooting/kubernetes_cheat_sheet.md | 261
-rw-r--r-- | doc/administration/troubleshooting/sidekiq.md | 118
5 files changed, 806 insertions, 55 deletions
diff --git a/doc/administration/troubleshooting/debug.md b/doc/administration/troubleshooting/debug.md index 8f7280d5128..604dff5983d 100644 --- a/doc/administration/troubleshooting/debug.md +++ b/doc/administration/troubleshooting/debug.md @@ -10,43 +10,43 @@ an SMTP server, but you're not seeing mail delivered. Here's how to check the se
1. Run a Rails console:
- ```sh
- sudo gitlab-rails console production
- ```
+ ```sh
+ sudo gitlab-rails console production
+ ```
- or for source installs:
+ or for source installs:
- ```sh
- bundle exec rails console production
- ```
+ ```sh
+ bundle exec rails console production
+ ```
1. Look at the ActionMailer `delivery_method` to make sure it matches what you intended. If you configured SMTP, it should say `:smtp`. If you're using Sendmail, it should say `:sendmail`:
- ```ruby
- irb(main):001:0> ActionMailer::Base.delivery_method
- => :smtp
- ```
+ ```ruby
+ irb(main):001:0> ActionMailer::Base.delivery_method
+ => :smtp
+ ```
1. If you're using SMTP, check the mail settings:
- ```ruby
- irb(main):002:0> ActionMailer::Base.smtp_settings
- => {:address=>"localhost", :port=>25, :domain=>"localhost.localdomain", :user_name=>nil, :password=>nil, :authentication=>nil, :enable_starttls_auto=>true}
- ```
+ ```ruby
+ irb(main):002:0> ActionMailer::Base.smtp_settings
+ => {:address=>"localhost", :port=>25, :domain=>"localhost.localdomain", :user_name=>nil, :password=>nil, :authentication=>nil, :enable_starttls_auto=>true}
+ ```
- In the example above, the SMTP server is configured for the local machine. If this is intended, you may need to check your local mail
- logs (e.g. `/var/log/mail.log`) for more details.
+ In the example above, the SMTP server is configured for the local machine. If this is intended, you may need to check your local mail
+ logs (e.g. `/var/log/mail.log`) for more details.
-1. Send a test message via the console.
+1. Send a test message via the console.
- ```ruby
- irb(main):003:0> Notify.test_email('youremail@email.com', 'Hello World', 'This is a test message').deliver_now
- ```
+ ```ruby
+ irb(main):003:0> Notify.test_email('youremail@email.com', 'Hello World', 'This is a test message').deliver_now
+ ```
- If you do not receive an e-mail and/or see an error message, then check
- your mail server settings.
+ If you do not receive an e-mail and/or see an error message, then check
+ your mail server settings.
@@ -103,37 +103,37 @@ downtime. Otherwise skip to the next section.
1. Run `sudo gdb -p <PID>` to attach to the unicorn process.
1. In the gdb window, type:
- ```
- call (void) rb_backtrace()
- ```
+ ```
+ call (void) rb_backtrace()
+ ```
1. This forces the process to generate a Ruby backtrace. Check `/var/log/gitlab/unicorn/unicorn_stderr.log` for the backtrace.
For example, you may see: - ```ruby - from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:33:in `block in start' - from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:33:in `loop' - from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:36:in `block (2 levels) in start' - from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:44:in `sample' - from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:68:in `sample_objects' - from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:68:in `each_with_object' - from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:68:in `each' - from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:69:in `block in sample_objects' - from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:69:in `name' - ``` + ```ruby + from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:33:in `block in start' + from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:33:in `loop' + from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:36:in `block (2 levels) in start' + from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:44:in `sample' + from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:68:in `sample_objects' + from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:68:in `each_with_object' + from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:68:in `each' + from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:69:in `block in sample_objects' + from /opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/metrics/sampler.rb:69:in `name' + ``` 1. To see the current threads, run: - ``` - thread apply all bt - ``` + ``` + thread apply all bt + ``` 1. Once you're done debugging with `gdb`, be sure to detach from the process and exit: - ``` - detach - exit - ``` + ``` + detach + exit + ``` Note that if the unicorn process terminates before you are able to run these commands, gdb will report an error. To buy more time, you can always raise the @@ -162,21 +162,21 @@ separate Rails process to debug the issue: 1. Create a Personal Access Token for your user (Profile Settings -> Access Tokens). 1. Bring up the GitLab Rails console. For omnibus users, run: - ``` - sudo gitlab-rails console - ``` + ``` + sudo gitlab-rails console + ``` 1. At the Rails console, run: - ```ruby - [1] pry(main)> app.get '<URL FROM STEP 2>/?private_token=<TOKEN FROM STEP 3>' - ``` + ```ruby + [1] pry(main)> app.get '<URL FROM STEP 2>/?private_token=<TOKEN FROM STEP 3>' + ``` - For example: + For example: - ```ruby - [1] pry(main)> app.get 'https://gitlab.com/gitlab-org/gitlab-ce/issues/1?private_token=123456' - ``` + ```ruby + [1] pry(main)> app.get 'https://gitlab.com/gitlab-org/gitlab-ce/issues/1?private_token=123456' + ``` 1. In a new window, run `top`. It should show this ruby process using 100% CPU. Write down the PID. 1. Follow step 2 from the previous section on using gdb. @@ -209,7 +209,7 @@ ps auwx | grep unicorn | awk '{ print " -p " $2}' | xargs strace -tt -T -f -s 10 The output in `/tmp/unicorn.txt` may help diagnose the root cause. 
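As a rough starting point for reading that trace, the following sketch pulls failed and slow syscalls out of the output file. It assumes the trace was written to `/tmp/unicorn.txt` as above; the 0.5-second threshold is an arbitrary example, not a recommendation:

```sh
# Syscalls that returned an error (strace prints these as "= -1 ENOENT (...)"):
grep ' = -1 ' /tmp/unicorn.txt | head

# With strace -T, each line ends with its duration, e.g. "<0.000123>".
# Print syscalls that took longer than 0.5 seconds:
awk '/<[0-9.]+>$/ { d = $NF; gsub(/[<>]/, "", d); if (d + 0 > 0.5) print }' /tmp/unicorn.txt
```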
-# More information
+## More information
- [Debugging Stuck Ruby Processes](https://blog.newrelic.com/2013/04/29/debugging-stuck-ruby-processes-what-to-do-before-you-kill-9/)
- [Cheatsheet of using gdb and ruby processes](gdb-stuck-ruby.txt)
diff --git a/doc/administration/troubleshooting/diagnostics_tools.md b/doc/administration/troubleshooting/diagnostics_tools.md new file mode 100644 index 00000000000..ab3b25f0e97 --- /dev/null +++ b/doc/administration/troubleshooting/diagnostics_tools.md @@ -0,0 +1,27 @@
+---
+type: reference
+---
+
+# Diagnostics tools
+
+These are some of the diagnostics tools the GitLab Support team uses during troubleshooting.
+They are listed here for transparency, and they may be useful for users with experience
+troubleshooting GitLab. If you are currently having an issue with GitLab, you
+may want to check your [support options](https://about.gitlab.com/support/) first,
+before attempting to use these tools.
+
+## gitlabsos
+
+The [gitlabsos](https://gitlab.com/gitlab-com/support/toolbox/gitlabsos/) utility
+provides a unified method of gathering info and logs from GitLab and the system it's
+running on.
+
+## strace-parser
+
+[strace-parser](https://gitlab.com/wchandler/strace-parser) is a small tool to analyze
+and summarize raw strace data.
+
+## Pritaly
+
+[Pritaly](https://gitlab.com/wchandler/pritaly) takes Gitaly logs and colorizes output
+or converts the logs to JSON.
diff --git a/doc/administration/troubleshooting/elasticsearch.md b/doc/administration/troubleshooting/elasticsearch.md new file mode 100644 index 00000000000..c4a7ba01fae --- /dev/null +++ b/doc/administration/troubleshooting/elasticsearch.md @@ -0,0 +1,345 @@
+# Troubleshooting ElasticSearch
+
+Troubleshooting ElasticSearch requires:
+
+- Knowledge of common terms.
+- Establishing within which category the problem fits.
+
+## Common terminology
+
+- **Lucene**: A full-text search library written in Java.
+- **Near Realtime (NRT)**: Refers to the slight latency from the time to index a
+  document to the time when it becomes searchable.
+- **Cluster**: A collection of one or more nodes that work together to hold all
+  the data, providing indexing and search capabilities.
+- **Node**: A single server that works as part of a cluster.
+- **Index**: A collection of documents that have somewhat similar characteristics.
+- **Document**: A basic unit of information that can be indexed.
+- **Shards**: Fully-functional and independent subdivisions of indices. Each shard is actually
+  a Lucene index.
+- **Replicas**: Failover mechanisms that duplicate indices.
+
+## Troubleshooting workflows
+
+The type of problem will determine what steps to take. The possible troubleshooting workflows are for:
+
+- Search results.
+- Indexing.
+- Integration.
+- Performance.
+
+### Search Results workflow
+
+The following workflow is for ElasticSearch search results issues:
+
+```mermaid
+graph TD;
+  B --> |No| B1
+  B --> |Yes| B4
+  B1 --> B2
+  B2 --> B3
+  B4 --> B5
+  B5 --> |Yes| B6
+  B5 --> |No| B7
+  B7 --> B8
+  B{Is GitLab using<br>ElasticSearch for<br>searching?}
+  B1[Check Admin Area > Integrations<br>to ensure the settings are correct]
+  B2[Perform a search via<br>the rails console]
+  B3[If all settings are correct<br>and it still doesn't show ElasticSearch<br>doing the searches, escalate<br>to GitLab support.]
+ B4[Perform<br>the same search via the<br>ElasticSearch API] + B5{Are the results<br>the same?} + B6[This means it is working as intended.<br>Speak with GitLab support<br>to confirm if the issue lies with<br>the filters.] + B7[Check the index status of the project<br>containing the missing search<br>results.] + B8(Indexing Troubleshooting) +``` + +### Indexing workflow + +The following workflow is for ElasticSearch indexing issues: + +```mermaid +graph TD; + C --> |Yes| C1 + C1 --> |Yes| C2 + C1 --> |No| C3 + C3 --> |Yes| C4 + C3 --> |No| C5 + C --> |No| C6 + C6 --> |No| C10 + C7 --> |GitLab| C8 + C7 --> |ElasticSearch| C9 + C6 --> |Yes| C7 + C10 --> |No| C12 + C10 --> |Yes| C11 + C12 --> |Yes| C13 + C12 --> |No| C14 + C14 --> |Yes| C15 + C14 --> |No| C16 + C{Is the problem with<br>creating an empty<br>index?} + C1{Does the gitlab-production<br>index exist on the<br>ElasticSearch instance?} + C2(Try to manually<br>delete the index on the<br>ElasticSearch instance and<br>retry creating an empty index.) + C3{Can indices be made<br>manually on the ElasticSearch<br>instance?} + C4(Retry the creation of an empty index) + C5(It is best to speak with an<br>ElasticSearch admin concerning the<br>instance's inability to create indices.) + C6{Is the indexer presenting<br>errors during indexing?} + C7{Is the error a GitLab<br>error or an ElasticSearch<br>error?} + C8[Escalate to<br>GitLab support] + C9[You will want<br>to speak with an<br>ElasticSearch admin.] + C10{Does the index status<br>show 100%?} + C11[Escalate to<br>GitLab support] + C12{Does re-indexing the project<br> present any GitLab errors?} + C13[Rectify the GitLab errors and<br>restart troubleshooting, or<br>escalate to GitLab support.] + C14{Does re-indexing the project<br>present errors on the <br>ElasticSearch instance?} + C15[It would be best<br>to speak with an<br>ElasticSearch admin.] + C16[This is likely a bug/issue<br>in GitLab and will require<br>deeper investigation. Escalate<br>to GitLab support.] +``` + +### Integration workflow + +The following workflow is for ElasticSearch integration issues: + +```mermaid +graph TD; + D --> |No| D1 + D --> |Yes| D2 + D2 --> |No| D3 + D2 --> |Yes| D4 + D4 --> |No| D5 + D4 --> |Yes| D6 + D{Is the error concerning<br>the beta indexer?} + D1[It would be best<br>to speak with an<br>ElasticSearch admin.] + D2{Is the ICU development<br>package installed?} + D3>This package is required.<br>Install the package<br>and retry.] + D4{Is the error stemming<br>from the indexer?} + D5[This would indicate an OS level<br> issue. It would be best to<br>contact your sysadmin.] + D6[This is likely a bug/issue<br>in GitLab and will require<br>deeper investigation. Escalate<br>to GitLab support.] +``` + +### Performance workflow + +The following workflow is for ElasticSearch performance issues: + +```mermaid +graph TD; + F --> |Yes| F1 + F --> |No| F2 + F2 --> |No| F3 + F2 --> |Yes| F4 + F4 --> F5 + F5 --> |No| F6 + F5 --> |Yes| F7 + F{Is the ElasticSearch instance<br>running on the same server<br>as the GitLab instance?} + F1(This is not advised and will cause issues.<br>We recommend moving the ElasticSearch<br>instance to a different server.) + F2{Does the ElasticSearch<br>server have at least 8<br>GB of RAM and 2 CPU<br>cores?} + F3(According to ElasticSearch, a non-prod<br>server needs these as a base requirement.<br>Production often requires more. We recommend<br>you increase the server specifications.) + F4(Obtain the <br>cluster health information) + F5(Does it show the<br>status as green?) 
+ F6(We recommend you speak with<br>an ElasticSearch admin<br>about implementing sharding.)
+ F7(Escalate to<br>GitLab support.)
+```
+
+## Troubleshooting walkthrough
+
+Most ElasticSearch troubleshooting can be broken down into four categories:
+
+- [Troubleshooting search results](#troubleshooting-search-results)
+- [Troubleshooting indexing](#troubleshooting-indexing)
+- [Troubleshooting integration](#troubleshooting-integration)
+- [Troubleshooting performance](#troubleshooting-performance)
+
+Generally speaking, if it does not fall into those four categories, it is either:
+
+- Something GitLab support needs to look into.
+- Not a true ElasticSearch issue.
+
+Exercise caution. Issues that appear to be ElasticSearch problems can be OS-level issues.
+
+### Troubleshooting search results
+
+Troubleshooting search result issues is rather straightforward on ElasticSearch.
+
+The first step is to confirm GitLab is using ElasticSearch for the search function.
+To do this:
+
+1. Confirm the integration is enabled in **Admin Area > Settings > Integrations**.
+1. Confirm searches utilize ElasticSearch by accessing the rails console
+   (`sudo gitlab-rails console`) and running the following commands:
+
+   ```ruby
+   u = User.find_by_email('email_of_user_doing_search')
+   s = SearchService.new(u, {:search => 'search_term'})
+   pp s.search_objects.class.name
+   ```
+
+The output from the last command is the key here. If it shows:
+
+- `ActiveRecord::Relation`, **it is not** using ElasticSearch.
+- `Kaminari::PaginatableArray`, **it is** using ElasticSearch.
+
+If all the settings look correct and it is still not using ElasticSearch for the search function, it is best to escalate to GitLab support. This could be a bug/issue.
+
+Moving past that, it is best to attempt the same search using the [ElasticSearch Search API](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html) and compare the results with what you see in GitLab.
+
+If the results:
+
+- Sync up, then there is not a technical "issue" per se. Instead, it might be a problem
+  with the ElasticSearch filters we are using. This can be complicated, so it is best to
+  escalate to GitLab support to check these and advise on whether a feature request is needed.
+- Do not match up, this indicates a problem with the documents generated from the
+  project. It is best to re-index that project and proceed with
+  [Troubleshooting indexing](#troubleshooting-indexing).
+
+### Troubleshooting indexing
+
+Troubleshooting indexing issues can be tricky. It can quickly escalate to either GitLab
+support or your ElasticSearch admin.
+
+The best place to start is to determine if the issue is with creating an empty index.
+If it is, check on the ElasticSearch side to determine if the `gitlab-production` index
+(the name for the GitLab index) exists. If it exists, manually delete it on the ElasticSearch
+side and attempt to recreate it from the
+[`create_empty_index`](../../integration/elasticsearch.md#gitlab-elasticsearch-rake-tasks)
+rake task.
+
+If you still encounter issues, try creating an index manually on the ElasticSearch
+instance. The details of the index aren't important here, as we want to test if indices
+can be made. If the indices:
+
+- Cannot be made, speak with your ElasticSearch admin.
+- Can be made, escalate this to GitLab support.
+
+If the issue is not with creating an empty index, the next step is to check for errors
+during the indexing of projects. If errors do occur, they will stem from the indexing either:
+
+- On the GitLab side. You need to rectify those. If they are not
+  something you are familiar with, contact GitLab support for guidance.
+- Within the ElasticSearch instance itself. See if the error is [documented and has a fix](../../integration/elasticsearch.md#troubleshooting). If not, speak with your ElasticSearch admin.
+
+If the indexing process does not present errors, you will want to check the status of the indexed projects. You can do this via the following rake tasks:
+
+- [`sudo gitlab-rake gitlab:elastic:index_projects_status`](../../integration/elasticsearch.md#gitlab-elasticsearch-rake-tasks) (shows the overall status)
+- [`sudo gitlab-rake gitlab:elastic:projects_not_indexed`](../../integration/elasticsearch.md#gitlab-elasticsearch-rake-tasks) (shows specific projects that are not indexed)
+
+If:
+
+- Everything is showing at 100%, escalate to GitLab support. This could be a potential
+  bug/issue.
+- You do see something not at 100%, attempt to reindex that project. To do this,
+  run `sudo gitlab-rake gitlab:elastic:index_projects ID_FROM=<project ID> ID_TO=<project ID>`.
+
+If reindexing the project shows:
+
+- Errors on the GitLab side, escalate those to GitLab support.
+- ElasticSearch errors or doesn't present any errors at all, reach out to your
+  ElasticSearch admin to check the instance.
+
+### Troubleshooting integration
+
+Troubleshooting integration tends to be pretty straightforward, as there really isn't
+much to "integrate" here.
+
+If the issue is:
+
+- Not concerning the beta indexer, it is almost always an
+  ElasticSearch-side issue. This means you should reach out to your ElasticSearch admin
+  regarding the error(s) you are seeing. If you are unsure here, it never hurts to reach
+  out to GitLab support.
+- With the beta indexer, check if the ICU development package is installed.
+  This is a required package, so make sure you install it.
+
+Beyond that, you will want to review the error. If it is:
+
+- Specifically from the indexer, this could be a bug/issue and should be escalated to
+  GitLab support.
+- An OS issue, you will want to reach out to your systems administrator.
+
+### Troubleshooting performance
+
+Troubleshooting performance can be difficult on ElasticSearch. There is a ton of tuning
+that *can* be done, but the majority of this falls on the shoulders of a skilled
+ElasticSearch administrator.
+
+Generally speaking, ensure:
+
+- The ElasticSearch server **is not** running on the same node as GitLab.
+- The ElasticSearch server has enough RAM and CPU cores.
+- That sharding **is** being used.
+
+Going into some more detail here, if ElasticSearch is running on the same server as GitLab, resource contention is **very** likely to occur. Ideally, ElasticSearch, which requires ample resources, should be running on its own server (perhaps coupled with Logstash and Kibana).
+
+When it comes to ElasticSearch, RAM is the key resource. The ElasticSearch documentation recommends:
+
+- **At least** 8 GB of RAM for a non-production instance.
+- **At least** 16 GB of RAM for a production instance.
+- Ideally, 64 GB of RAM.
+
+For CPU, ElasticSearch recommends at least 2 CPU cores, but ElasticSearch states common
+setups use up to 8 cores.
For more details on server specs, check out
+[ElasticSearch's hardware guide](https://www.elastic.co/guide/en/elasticsearch/guide/current/hardware.html).
+
+Beyond the obvious, sharding comes into play. Sharding is a core part of ElasticSearch.
+It allows for horizontal scaling of indices, which is helpful when you are dealing with
+a large amount of data.
+
+With the way GitLab does indexing, there is a **huge** number of documents being
+indexed. By utilizing sharding, you can speed up ElasticSearch's ability to locate
+data, since each shard is a Lucene index.
+
+If you are not using sharding, you are likely to hit issues when you start using
+ElasticSearch in a production environment.
+
+Keep in mind that an index with only one shard has **no scale factor** and will
+likely encounter issues when called upon with some frequency.
+
+If you need to know how many shards to use, read
+[ElasticSearch's documentation on capacity planning](https://www.elastic.co/guide/en/elasticsearch/guide/2.x/capacity-planning.html),
+as the answer is not straightforward.
+
+The easiest way to determine if sharding is in use is to check the output of the
+[ElasticSearch Health API](https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-health.html):
+
+- Red means the cluster is down.
+- Yellow means it is up, but some replica shards are unassigned (no replication).
+- Green means it is healthy (up, sharding, replicating).
+
+For production use, it should always be green.
+
+Beyond these steps, you get into some of the more complicated things to check,
+such as merges and caching. These can get complicated and it takes some time to
+learn them, so it is best to escalate/pair with an ElasticSearch expert if you need to
+dig further into these.
+
+Feel free to reach out to GitLab support, but this is likely to be something a skilled
+ElasticSearch admin has more experience with.
+
+## Common issues
+
+All common issues [should be documented](../../integration/elasticsearch.md#troubleshooting). If not,
+feel free to update that page with issues you encounter and solutions.
+
+## Replication
+
+Setting up ElasticSearch isn't too bad, but it can be a bit finicky and time-consuming.
+
+The easiest method is to spin up a Docker container with the required version and
+bind ports 9200/9300 so it can be used.
+
+The following is an example of running a Docker container of ElasticSearch v7.2.0:
+
+```bash
+docker pull docker.elastic.co/elasticsearch/elasticsearch:7.2.0
+docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.2.0
+```
+
+From here, you can:
+
+- Grab the IP of the Docker container (use `docker inspect <container_id>`).
+- Use `<IP.add.re.ss:9200>` to communicate with it.
+
+This is a quick method to test out ElasticSearch, but by no means is this a
+production solution. diff --git a/doc/administration/troubleshooting/kubernetes_cheat_sheet.md b/doc/administration/troubleshooting/kubernetes_cheat_sheet.md new file mode 100644 index 00000000000..260af333e8e --- /dev/null +++ b/doc/administration/troubleshooting/kubernetes_cheat_sheet.md @@ -0,0 +1,261 @@
+---
+type: reference
+---
+
+# Kubernetes, GitLab and You
+
+This is a list of useful information regarding Kubernetes that the GitLab Support
+Team sometimes uses while troubleshooting. GitLab is making this public, so that anyone
+can make use of the Support team's collected knowledge.
+
+CAUTION: **Caution:**
+These commands **can alter or break** your Kubernetes components, so use them at your own risk.
+
+If you are on a [paid tier](https://about.gitlab.com/pricing/) and are not sure how
+to use these commands, it is best to [contact Support](https://about.gitlab.com/support/)
+and they will assist you with any issues you are having.
+
+## Generic Kubernetes commands
+
+- How to authenticate to your GCP project (can be especially useful if you have projects
+  under different GCP accounts):
+
+  ```bash
+  gcloud auth login
+  ```
+
+- How to access the Kubernetes dashboard:
+
+  ```bash
+  # for minikube:
+  minikube dashboard --url
+  # for non-local installations if access via Kubectl is configured:
+  kubectl proxy
+  ```
+
+- How to SSH to a Kubernetes node and enter the container as root
+  <https://github.com/kubernetes/kubernetes/issues/30656>:
+
+  - For GCP, you may find the node name and run `gcloud compute ssh node-name`.
+  - List containers using `docker ps`.
+  - Enter the container using `docker exec --user root -ti container-id bash`.
+
+- How to copy a file from your local machine to a pod:
+
+  ```bash
+  kubectl cp file-name pod-name:./destination-path
+  ```
+
+- What to do with pods in `CrashLoopBackOff` status:
+
+  - Check logs via the Kubernetes dashboard.
+  - Check logs via Kubectl:
+
+    ```bash
+    kubectl logs <unicorn pod> -c dependencies
+    ```
+
+- How to tail all Kubernetes cluster events in real time:
+
+  ```bash
+  kubectl get events -w --all-namespaces
+  ```
+
+- How to get the logs of the previously terminated pod instance:
+
+  ```bash
+  kubectl logs <pod-name> --previous
+  ```
+
+  NOTE: **Note:**
+  No logs are kept in the containers/pods themselves. Everything is written to stdout.
+  This is a principle of Kubernetes; read [Twelve-factor app](https://12factor.net/)
+  for details.
+
+## GitLab-specific Kubernetes information
+
+- A minimal config that can be used to test a Kubernetes Helm chart can be found
+  [here](https://gitlab.com/charts/gitlab/issues/620).
+
+- Tailing the logs of a single pod. An example for a unicorn pod:
+
+  ```bash
+  kubectl logs gitlab-unicorn-7656fdd6bf-jqzfs -c unicorn
+  ```
+
+- Tail and follow all pods that share a label (in this case, `unicorn`):
+
+  ```bash
+  # all containers in the unicorn pods
+  kubectl logs -f -l app=unicorn --all-containers=true --max-log-requests=50
+
+  # only the unicorn containers in all unicorn pods
+  kubectl logs -f -l app=unicorn -c unicorn --max-log-requests=50
+  ```
+
+- One can stream logs from all containers at once, similar to the Omnibus
+  command `gitlab-ctl tail`:
+
+  ```bash
+  kubectl logs -f -l release=gitlab --all-containers=true --max-log-requests=100
+  ```
+
+- Check all events in the `gitlab` namespace (the namespace name can be different if you
+  specified a different one when deploying the Helm chart):
+
+  ```bash
+  kubectl get events -w --namespace=gitlab
+  ```
+
+- Most of the useful GitLab tools (console, rake tasks, etc.) are found in the task-runner
+  pod. You may enter it and run commands inside or run them from the outside:
+
+  ```bash
+  # find the pod
+  kubectl get pods | grep task-runner
+
+  # enter it
+  kubectl exec -it <task-runner-pod-name> bash
+
+  # open rails console
+  # rails console can be also called from other GitLab pods
+  /srv/gitlab/bin/rails console
+
+  # source-style commands should also work
+  cd /srv/gitlab && bundle exec rake gitlab:check RAILS_ENV=production
+
+  # run GitLab check.
+  # Note that the output can be confusing and invalid because of the specific
+  # structure of GitLab installed via the Helm chart
+  /usr/local/bin/gitlab-rake gitlab:check
+
+  # open console without entering pod
+  kubectl exec -it <task-runner-pod-name> /srv/gitlab/bin/rails console
+
+  # check the status of DB migrations
+  kubectl exec -it <task-runner-pod-name> /usr/local/bin/gitlab-rake db:migrate:status
+  ```
+
+  You can also use `gitlab-rake` instead of `/usr/local/bin/gitlab-rake`.
+
+- Troubleshooting the **Operations > Kubernetes** integration:
+
+  - Check the output of `kubectl get events -w --all-namespaces`.
+  - Check the logs of pods within the `gitlab-managed-apps` namespace.
+  - On the GitLab side, check the Sidekiq log and the Kubernetes log. When GitLab is installed
+    via the Helm chart, `kubernetes.log` can be found inside the Sidekiq pod.
+
+- How to get your initial admin password <https://docs.gitlab.com/charts/installation/deployment.html#initial-login>:
+
+  ```bash
+  # find the name of the secret containing the password
+  kubectl get secrets | grep initial-root
+  # decode it
+  kubectl get secret <secret-name> -ojsonpath={.data.password} | base64 --decode ; echo
+  ```
+
+- How to connect to a GitLab Postgres database:
+
+  ```bash
+  kubectl exec -it <task-runner-pod-name> -- /srv/gitlab/bin/rails dbconsole -p
+  ```
+
+- How to get info about Helm installation status:
+
+  ```bash
+  helm status name-of-installation
+  ```
+
+- How to update GitLab installed using the Helm chart:
+
+  ```bash
+  helm repo update
+
+  # get current values and redirect them to a yaml file (analogue of gitlab.rb values)
+  helm get values <release name> > gitlab.yaml
+
+  # run the upgrade itself
+  helm upgrade <release name> <chart path> -f gitlab.yaml
+  ```
+
+  After <https://canary.gitlab.com/charts/gitlab/issues/780> is fixed, it should
+  be possible to use [Updating GitLab using the Helm Chart](https://docs.gitlab.com/ee/install/kubernetes/gitlab_chart.html#updating-gitlab-using-the-helm-chart)
+  for upgrades.
+
+- How to apply changes to the GitLab config:
+
+  - Modify the `gitlab.yaml` file.
+  - Run the following command to apply the changes:
+
+    ```bash
+    helm upgrade <release name> <chart path> -f gitlab.yaml
+    ```
+
+## Installation of minimal GitLab config via Minikube on macOS
+
+This section is based on [Developing for Kubernetes with Minikube](https://gitlab.com/charts/gitlab/blob/master/doc/minikube/index.md)
+and [Helm](https://gitlab.com/charts/gitlab/blob/master/doc/helm/index.md). Refer
+to those documents for details.
+
+- Install Kubectl via Homebrew:
+
+  ```bash
+  brew install kubernetes-cli
+  ```
+
+- Install Minikube via Homebrew:
+
+  ```bash
+  brew cask install minikube
+  ```
+
+- Start Minikube and configure it. If Minikube cannot start, try running `minikube delete && minikube start`
+  and repeat the steps:
+
+  ```bash
+  minikube start --cpus 3 --memory 8192 # minimum amount for GitLab to work
+  minikube addons enable ingress
+  minikube addons enable kube-dns
+  ```
+
+- Install Helm via Homebrew and initialize it:
+
+  ```bash
+  brew install kubernetes-helm
+  helm init --service-account tiller
+  ```
+
+- Copy the file <https://gitlab.com/charts/gitlab/raw/master/examples/values-minikube-minimum.yaml>
+  to your workstation.
+
+- Find the IP address in the output of `minikube ip` and update the yaml file with
+  this IP address.
+
+- Install the GitLab Helm Chart:
+
+  ```bash
+  helm repo add gitlab https://charts.gitlab.io
+  helm install --name gitlab -f <path-to-yaml-file> gitlab/gitlab
+  ```
+
+  If you want to modify some GitLab settings, you can use the above-mentioned config
+  as a base and create your own yaml file.
+
+- Monitor the installation progress via `helm status gitlab` and `minikube dashboard`.
+  The installation could take up to 20-30 minutes depending on the amount of resources
+  on your workstation.
+
+- When all the pods show either a `Running` or `Completed` status, get the GitLab password as
+  described in [Initial login](https://docs.gitlab.com/ee/install/kubernetes/gitlab_chart.html#initial-login),
+  and log in to GitLab via the UI. It will be accessible via `https://gitlab.domain`
+  where `domain` is the value provided in the yaml file.
+
+<!-- ## Troubleshooting
+
+Include any troubleshooting steps that you can foresee. If you know beforehand what issues
+one might have when setting this up, or when something is changed, or on upgrading, it's
+important to describe those, too. Think of things that may go wrong and include them here.
+This is important to minimize requests for support, and to avoid doc comments with
+questions that you know someone might ask.
+
+Each scenario can be a third-level heading, e.g. `### Getting error message X`.
+If you have none to add when creating a doc, leave this section in place
+but commented out to help encourage others to add to it in the future. -->
diff --git a/doc/administration/troubleshooting/sidekiq.md b/doc/administration/troubleshooting/sidekiq.md index 7067958ecb4..9b016c64e29 100644 --- a/doc/administration/troubleshooting/sidekiq.md +++ b/doc/administration/troubleshooting/sidekiq.md @@ -169,3 +169,121 @@ The PostgreSQL wiki has details on the query you can run to see blocking queries. The query is different based on PostgreSQL version. See [Lock Monitoring](https://wiki.postgresql.org/wiki/Lock_Monitoring) for the query details.
+
+## Managing Sidekiq queues
+
+It is possible to use the [Sidekiq API](https://github.com/mperham/sidekiq/wiki/API)
+to perform a number of troubleshooting steps on Sidekiq.
+
+These are administrative commands and should only be used when the
+admin interface is not suitable due to the scale of the installation.
+
+All these commands should be run using `gitlab-rails console`.
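+
+For example, to get a quick overview before inspecting a specific queue, you can
+list every queue with its current size and latency. This is a minimal sketch using
+the standard Sidekiq API; the output format is only illustrative:
+
+```ruby
+# Print each queue with its size and its latency (the age, in seconds,
+# of the oldest job waiting in that queue).
+Sidekiq::Queue.all.each do |queue|
+  puts "#{queue.name}: size=#{queue.size} latency=#{queue.latency.round(2)}s"
+end
+```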
+
+### View the queue size
+
+```ruby
+Sidekiq::Queue.new("pipeline_processing:build_queue").size
+```
+
+### Enumerate all enqueued jobs
+
+```ruby
+queue = Sidekiq::Queue.new("chaos:chaos_sleep")
+queue.each do |job|
+  # job.klass # => 'MyWorker'
+  # job.args # => [1, 2, 3]
+  # job.jid # => jid
+  # job.queue # => chaos:chaos_sleep
+  # job["retry"] # => 3
+  # job.item # => {
+  #   "class"=>"Chaos::SleepWorker",
+  #   "args"=>[1000],
+  #   "retry"=>3,
+  #   "queue"=>"chaos:chaos_sleep",
+  #   "backtrace"=>true,
+  #   "queue_namespace"=>"chaos",
+  #   "jid"=>"39bc482b823cceaf07213523",
+  #   "created_at"=>1566317076.266069,
+  #   "correlation_id"=>"c323b832-a857-4858-b695-672de6f0e1af",
+  #   "enqueued_at"=>1566317076.26761
+  # }
+
+  # job.delete if job.jid == 'abcdef1234567890'
+end
+```
+
+### Enumerate currently running jobs
+
+```ruby
+workers = Sidekiq::Workers.new
+workers.each do |process_id, thread_id, work|
+  # process_id is a unique identifier per Sidekiq process
+  # thread_id is a unique identifier per thread
+  # work is a Hash which looks like:
+  # {"queue"=>"chaos:chaos_sleep",
+  #  "payload"=>
+  #   {"class"=>"Chaos::SleepWorker",
+  #    "args"=>[1000],
+  #    "retry"=>3,
+  #    "queue"=>"chaos:chaos_sleep",
+  #    "backtrace"=>true,
+  #    "queue_namespace"=>"chaos",
+  #    "jid"=>"b2a31e3eac7b1a99ff235869",
+  #    "created_at"=>1566316974.9215662,
+  #    "correlation_id"=>"e484fb26-7576-45f9-bf21-b99389e1c53c",
+  #    "enqueued_at"=>1566316974.9229589},
+  #  "run_at"=>1566316974}
+end
+```
+
+### Remove Sidekiq jobs for given parameters (destructive)
+
+```ruby
+# for jobs like this:
+# RepositoryImportWorker.new.perform_async(100)
+id_list = [100]
+
+queue = Sidekiq::Queue.new('repository_import')
+queue.each do |job|
+  job.delete if id_list.include?(job.args[0])
+end
+```
+
+### Remove a specific job ID (destructive)
+
+```ruby
+queue = Sidekiq::Queue.new('repository_import')
+queue.each do |job|
+  job.delete if job.jid == 'my-job-id'
+end
+```
+
+### Canceling running jobs (destructive)
+
+> Introduced in GitLab 12.3.
+
+This is a highly risky operation; use it only as a last resort.
+Doing so might result in data corruption, as the job
+is interrupted mid-execution and it is not guaranteed
+that proper rollback of transactions is implemented.
+
+```ruby
+Gitlab::SidekiqMonitor.cancel_job('job-id')
+```
+
+> This requires Sidekiq to be run with the `SIDEKIQ_MONITOR_WORKER=1`
+> environment variable.
+
+To perform the interrupt, we use `Thread.raise`, which
+has a number of drawbacks, as mentioned in [Why Ruby’s Timeout is dangerous (and Thread.raise is terrifying)](https://jvns.ca/blog/2015/11/27/why-rubys-timeout-is-dangerous-and-thread-dot-raise-is-terrifying/):
+
+> This is where the implications get interesting, and terrifying. This means that an exception can get raised:
+>
+> * during a network request (ok, as long as the surrounding code is prepared to catch Timeout::Error)
+> * during the cleanup for the network request
+> * during a rescue block
+> * while creating an object to save to the database afterwards
+> * in any of your code, regardless of whether it could have possibly raised an exception before
+>
+> Nobody writes code to defend against an exception being raised on literally any line. That’s not even possible. So Thread.raise is basically like a sneak attack on your code that could result in almost anything. It would probably be okay if it were pure-functional code that did not modify any state. But this is Ruby, so that’s unlikely :)
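+
+Since `cancel_job` needs a job ID, one way to find the `jid` of a currently running
+job is to combine it with the `Sidekiq::Workers` enumeration shown above. This is
+only a sketch: `Chaos::SleepWorker` is just the example class from the sample output
+earlier, and all of the warnings above about canceling jobs still apply.
+
+```ruby
+# Find running jobs for a given worker class (example class name) and cancel
+# them via the monitor. Use with extreme care.
+workers = Sidekiq::Workers.new
+workers.each do |process_id, thread_id, work|
+  payload = work["payload"] # a Hash shaped like the sample shown above
+  next unless payload["class"] == "Chaos::SleepWorker"
+
+  puts "canceling #{payload['jid']}"
+  Gitlab::SidekiqMonitor.cancel_job(payload["jid"])
+end
+```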