---
type: reference
---

# How to set up Consul **(PREMIUM ONLY)**

A Consul cluster consists of both
[server and client agents](https://www.consul.io/docs/agent).
The servers run on their own nodes and the clients run on other nodes that in
turn communicate with the servers.

GitLab Premium includes a bundled version of [Consul](https://www.consul.io/),
a service networking solution that you can manage by using `/etc/gitlab/gitlab.rb`.

## Configure the Consul nodes

NOTE: **Important:**
Before proceeding, refer to the
[available reference architectures](reference_architectures/index.md#available-reference-architectures)
to find out how many Consul server nodes you should have.

On **each** Consul server node, perform the following:

1. Follow the instructions to [install](https://about.gitlab.com/install/)
   GitLab by choosing your preferred platform, but do not supply the
   `EXTERNAL_URL` value when asked.
1. Edit `/etc/gitlab/gitlab.rb`, and add the following, replacing the values
   in the `retry_join` section. In the example below, there are three
   nodes: two identified by their IP addresses and one by its FQDN. You can use
   either notation:

   ```ruby
   # Disable all components except Consul
   roles ['consul_role']

   # Consul nodes: can be FQDNs or IPs, separated by whitespace
   consul['configuration'] = {
     server: true,
     retry_join: %w(10.10.10.1 consul1.gitlab.example.com 10.10.10.2)
   }

   # Disable auto migrations
   gitlab_rails['auto_migrate'] = false
   ```

1. [Reconfigure GitLab](restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes
   to take effect.
1. 
Run the following command to verify that Consul is configured correctly and
   that all server nodes are communicating:

   ```shell
   sudo /opt/gitlab/embedded/bin/consul members
   ```

   The output should be similar to:

   ```plaintext
   Node               Address               Status  Type    Build  Protocol  DC
   CONSUL_NODE_ONE    XXX.XXX.XXX.YYY:8301  alive   server  0.9.2  2         gitlab_consul
   CONSUL_NODE_TWO    XXX.XXX.XXX.YYY:8301  alive   server  0.9.2  2         gitlab_consul
   CONSUL_NODE_THREE  XXX.XXX.XXX.YYY:8301  alive   server  0.9.2  2         gitlab_consul
   ```

   If the results display any nodes with a status that isn't `alive`, or if any
   of the three nodes are missing, see the [Troubleshooting section](#troubleshooting-consul).

## Upgrade the Consul nodes

To upgrade your Consul nodes, upgrade the GitLab package.

Nodes should be:

- Members of a healthy cluster prior to upgrading the Omnibus GitLab package.
- Upgraded one node at a time.

Identify any existing health issues in the cluster by running the following command
on each node. The command returns an empty array if the cluster is healthy:

```shell
curl http://127.0.0.1:8500/v1/health/state/critical
```

Consul nodes communicate using the raft protocol. If the current leader goes
offline, a leader election must take place. A leader node must exist to facilitate
synchronization across the cluster. If too many nodes go offline at the same time,
the cluster loses quorum and cannot elect a leader due to
[broken consensus](https://www.consul.io/docs/internals/consensus.html).

Consult the [troubleshooting section](#troubleshooting-consul) if the cluster is not
able to recover after the upgrade. The [outage recovery](#outage-recovery) section may
be of particular interest.

NOTE: **Note:**
GitLab uses Consul to store only transient data that is easily regenerated. If
the bundled Consul is not used by any process other than GitLab itself, it is
safe to [rebuild the cluster from scratch](#recreate-from-scratch).
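Because nodes should be upgraded one at a time, it helps to confirm that the cluster still has an elected raft leader before moving on to the next node. The sketch below is an illustration, not part of the GitLab tooling: it assumes the Consul agent's HTTP API on the default port `8500`, and the `consul_leader_ok` helper is hypothetical.

```shell
# Hypothetical helper: decide from the leader endpoint's response whether it
# is safe to continue a rolling upgrade. The response would normally come from:
#   curl -s http://127.0.0.1:8500/v1/status/leader
consul_leader_ok() {
  # Consul returns the leader as a JSON string, for example "10.10.10.1:8300";
  # bare quotes ("") mean no leader is currently elected.
  if [ -z "$1" ] || [ "$1" = '""' ]; then
    echo "no leader elected - wait before upgrading the next node"
    return 1
  fi
  echo "leader is $1 - safe to continue"
}

consul_leader_ok '""' || true          # no leader elected - wait before upgrading the next node
consul_leader_ok '"10.10.10.1:8300"'   # leader is "10.10.10.1:8300" - safe to continue
```

On a live node you would feed the helper the actual endpoint response, for example `consul_leader_ok "$(curl -s http://127.0.0.1:8500/v1/status/leader)"`.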
## Troubleshooting Consul

Below are some useful operations should you need to debug any issues.
You can see any error logs by running:

```shell
sudo gitlab-ctl tail consul
```

### Check the cluster membership

To determine which nodes are part of the cluster, run the following on any member of the cluster:

```shell
sudo /opt/gitlab/embedded/bin/consul members
```

The output should be similar to:

```plaintext
Node      Address         Status  Type    Build  Protocol  DC
consul-b  XX.XX.X.Y:8301  alive   server  0.9.0  2         gitlab_consul
consul-c  XX.XX.X.Y:8301  alive   server  0.9.0  2         gitlab_consul
consul-c  XX.XX.X.Y:8301  alive   server  0.9.0  2         gitlab_consul
db-a      XX.XX.X.Y:8301  alive   client  0.9.0  2         gitlab_consul
db-b      XX.XX.X.Y:8301  alive   client  0.9.0  2         gitlab_consul
```

Ideally, all nodes have a `Status` of `alive`.

### Restart Consul

If it is necessary to restart Consul, it is important to do so in
a controlled manner to maintain quorum. If quorum is lost, you must follow the
Consul [outage recovery](#outage-recovery) process to recover the cluster.

To be safe, it's recommended that you only restart Consul on one node at a time to
ensure the cluster remains intact. For larger clusters, it is possible to restart
multiple nodes at a time. See the
[Consul consensus document](https://www.consul.io/docs/internals/consensus.html#deployment-table)
for the number of failures the cluster can tolerate; this is the number of
simultaneous restarts it can sustain.

To restart Consul:

```shell
sudo gitlab-ctl restart consul
```

### Consul nodes unable to communicate

By default, Consul attempts to
[bind](https://www.consul.io/docs/agent/options.html#_bind) to `0.0.0.0`, but
it advertises the first private IP address on the node for other Consul nodes
to communicate with it. If the other nodes cannot communicate with a node on
this address, the cluster has a failed status.
If you are running into this issue, you will see messages like the following in the `gitlab-ctl tail consul` output:

```plaintext
2017-09-25_19:53:39.90821     2017/09/25 19:53:39 [WARN] raft: no known peers, aborting election
2017-09-25_19:53:41.74356     2017/09/25 19:53:41 [ERR] agent: failed to sync remote state: No cluster leader
```

To fix this:

1. Pick an address on each node that all of the other nodes can reach this node through.
1. Update your `/etc/gitlab/gitlab.rb`:

   ```ruby
   consul['configuration'] = {
     ...
     bind_addr: 'IP ADDRESS'
   }
   ```

1. Reconfigure GitLab:

   ```shell
   gitlab-ctl reconfigure
   ```

If you still see the errors, you may have to
[erase the Consul database and reinitialize](#recreate-from-scratch) on the affected node.

### Consul does not start - multiple private IPs

If a node has multiple private IPs, Consul cannot determine which of the
private addresses to advertise, and immediately exits on start.

You will see messages like the following in the `gitlab-ctl tail consul` output:

```plaintext
2017-11-09_17:41:45.52876 ==> Starting Consul agent...
2017-11-09_17:41:45.53057 ==> Error creating agent: Failed to get advertise address: Multiple private IPs found. Please configure one.
```

To fix this:

1. Pick an address on the node that all of the other nodes can reach this node through.
1. Update your `/etc/gitlab/gitlab.rb`:

   ```ruby
   consul['configuration'] = {
     ...
     bind_addr: 'IP ADDRESS'
   }
   ```

1. Reconfigure GitLab:

   ```shell
   gitlab-ctl reconfigure
   ```

### Outage recovery

If you lose enough Consul nodes in the cluster to break quorum, the cluster
is considered failed and does not function without manual intervention.
In that case, you can either recreate the nodes from scratch or attempt a
recovery.

#### Recreate from scratch

By default, GitLab does not store anything in the Consul node that cannot be
recreated. 
To erase the Consul database and reinitialize:

```shell
sudo gitlab-ctl stop consul
sudo rm -rf /var/opt/gitlab/consul/data
sudo gitlab-ctl start consul
```

After this, the node should start back up, and the rest of the server agents rejoin.
Shortly after that, the client agents should rejoin as well.

#### Recover a failed node

If you have taken advantage of Consul to store other data and want to restore
the failed node, follow the
[Consul guide](https://learn.hashicorp.com/tutorials/consul/recovery-outage)
to recover a failed cluster.
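Whichever recovery path you take, it's worth confirming afterwards that all server agents rejoined before relying on the cluster again. A small illustrative sketch: the `count_alive_servers` helper is hypothetical, and the sample text stands in for live output from `sudo /opt/gitlab/embedded/bin/consul members`.

```shell
# Hypothetical helper: count server agents reporting "alive" in
# `consul members`-style output (column 3 is Status, column 4 is Type).
count_alive_servers() {
  awk '$3 == "alive" && $4 == "server" { n++ } END { print n+0 }'
}

# Sample output standing in for: sudo /opt/gitlab/embedded/bin/consul members
sample='Node      Address          Status  Type    Build  Protocol  DC
consul-a  10.10.10.1:8301  alive   server  0.9.2  2         gitlab_consul
consul-b  10.10.10.2:8301  alive   server  0.9.2  2         gitlab_consul
consul-c  10.10.10.3:8301  failed  server  0.9.2  2         gitlab_consul
db-a      10.10.10.4:8301  alive   client  0.9.2  2         gitlab_consul'

echo "$sample" | count_alive_servers   # prints: 2
```

If the count is lower than the number of server nodes in your reference architecture, return to the [troubleshooting steps](#troubleshooting-consul) before proceeding.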