summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorKyle Wiebers <kwiebers@gitlab.com>2019-09-11 12:31:10 -0500
committerKyle Wiebers <kwiebers@gitlab.com>2019-09-12 09:33:20 -0500
commit084234633e297ff1a7a68a964fa1cb6a6cb21c42 (patch)
tree9dcece5c7889a8485a8137828ba4bc71a8d84e06
parent6db9cbfe77a556ea2d321f0ea0a0c6b6e14a817c (diff)
downloadgitlab-ce-docs-add-review-app-troubleshooting.tar.gz
Add review app cluster health troubleshooting guidancedocs-add-review-app-troubleshooting
Add Diagnosing unhealthy review-app releases Update mermaid chart Add some cli tools under Other resources section
-rw-r--r--doc/development/testing_guide/img/k9s.pngbin0 -> 364038 bytes
-rw-r--r--doc/development/testing_guide/review_apps.md46
2 files changed, 39 insertions, 7 deletions
diff --git a/doc/development/testing_guide/img/k9s.png b/doc/development/testing_guide/img/k9s.png
new file mode 100644
index 00000000000..c4b222f0b64
--- /dev/null
+++ b/doc/development/testing_guide/img/k9s.png
Binary files differ
diff --git a/doc/development/testing_guide/review_apps.md b/doc/development/testing_guide/review_apps.md
index 13772cbe015..8698a1e4c2d 100644
--- a/doc/development/testing_guide/review_apps.md
+++ b/doc/development/testing_guide/review_apps.md
@@ -10,24 +10,30 @@ Review Apps are automatically deployed by each pipeline, both in
```mermaid
graph TD
- build-qa-image -.->|once the `prepare` stage is done| gitlab:assets:compile
- review-build-cng -->|triggers a CNG-mirror pipeline and wait for it to be done| CNG-mirror
- review-build-cng -.->|once the `test` stage is done| review-deploy
- review-deploy -.->|once the `review` stage is done| review-qa-smoke
+ build-qa-image -->|once the `prepare` stage is done| gitlab:assets:compile
+ gitlab:assets:compile -->|once the `gitlab:assets:compile` job is done| review-build-cng
+ review-build-cng -.->|triggers a CNG-mirror pipeline and wait for it to be done| CNG-mirror
+ CNG-mirror -.->|polls until completed| review-build-cng
+ review-build-cng -->|once the `review-build-cng` job is done| review-deploy
+ review-deploy -->|once the `review-deploy` job is done| review-qa-smoke
subgraph "1. gitlab-ce/ee `prepare` stage"
build-qa-image
end
subgraph "2. gitlab-ce/ee `test` stage"
- gitlab:assets:compile -->|plays dependent job once done| review-build-cng
+ gitlab:assets:compile
end
-subgraph "3. gitlab-ce/ee `review` stage"
+subgraph "3. gitlab-ce/ee `review-prepare` stage"
+ review-build-cng
+ end
+
+subgraph "4. gitlab-ce/ee `review` stage"
review-deploy["review-deploy<br><br>Helm deploys the Review App using the Cloud<br/>Native images built by the CNG-mirror pipeline.<br><br>Cloud Native images are deployed to the `review-apps-ce` or `review-apps-ee`<br>Kubernetes (GKE) cluster, in the GCP `gitlab-review-apps` project."]
end
-subgraph "4. gitlab-ce/ee `qa` stage"
+subgraph "5. gitlab-ce/ee `qa` stage"
review-qa-smoke[review-qa-smoke<br><br>gitlab-qa runs the smoke suite against the Review App.]
end
@@ -177,6 +183,25 @@ secure note named **gitlab-{ce,ee} Review App's root password**.
`review-qa-raise-e-12chm0-migrations.1-nqwtx`.
1. Click on the `Container logs` link.
+### Diagnosing unhealthy review-app releases
+
+If [Review App Stability](https://gitlab.com/gitlab-org/quality/team-tasks/issues/93) dips this may be a signal
+that the `review-apps-ce/ee` cluster is unhealthy. Leading indicators may be healthcheck failures leading to restarts or majority failure for Review App deployments.
+
+The following items may help diagnose this:
+
+- [Instance group CPU Utilization in GCP](https://console.cloud.google.com/compute/instanceGroups/details/us-central1-a/gke-review-apps-ce-preemp-n1-standard-a4c9571c-grp?project=gitlab-review-apps&tab=monitoring&graph=GCE_CPU&duration=PT12H) - helpful to identify if nodes are problematic or the entire cluster is trending towards unhealthy
+- [Instance Group size in GCP](https://console.cloud.google.com/compute/instanceGroups/details/us-central1-a/gke-review-apps-ce-preemp-n1-standard-a4c9571c-grp?project=gitlab-review-apps&tab=monitoring&graph=GCE_SIZE&duration=PT12H) - aids in identifying load spikes on the cluster. Kubernetes will add nodes up to 220 based on total resource requests.
+- `kubectl top nodes --sort-by=cpu` - can identify if node spikes are common or load on specific nodes which may get rebalanced by the Kubernetes scheduler.
+- `kubectl top pods --sort-by=cpu` -
+- [K9s] - K9s is a powerful command line dashboard which allows you to filter by labels. This can help identify trends with apps exceeding the [review-app resource requests](https://gitlab.com/gitlab-org/gitlab-ce/blob/master/scripts/review_apps/base-config.yaml). Kubernetes will schedule pods to nodes based on resource requests and allow for CPU usage up to the limits.
+ - In K9s you can sort or add filters by typing the `/` character
+ - `-lrelease=<review-app-slug>` - filters down to all pods for a release. This aids in determining what is having issues in a single deployment
+ - `-lapp=<app>` - filters down to all pods for a specific app. This aids in determining resource usage by app.
+ - You can scroll to a Kubernetes resource and hit `d`(describe), `s`(shell), `l`(logs) for a deeper inspection
+
+![K9s](img/k9s.png)
+
### Troubleshoot a pending `dns-gitlab-review-app-external-dns` Deployment
#### Finding the problem
@@ -266,6 +291,12 @@ find a way to limit it to only us.**
## Other resources
- [Review Apps integration for CE/EE (presentation)](https://docs.google.com/presentation/d/1QPLr6FO4LduROU8pQIPkX1yfGvD13GEJIBOenqoKxR8/edit?usp=sharing)
+- [Stability issues](https://gitlab.com/gitlab-org/quality/team-tasks/issues/212)
+
+### Helpful command line tools
+
+- [K9s] - enables CLI dashboard across pods and enabling filtering by labels
+- [Stern](https://github.com/wercker/stern) - enables cross pod log tailing based on label/field selectors
[charts-1068]: https://gitlab.com/gitlab-org/charts/gitlab/issues/1068
[gitlab-pipeline]: https://gitlab.com/gitlab-org/gitlab-ce/pipelines/44362587
@@ -285,6 +316,7 @@ find a way to limit it to only us.**
[gitlab-ci-yml]: https://gitlab.com/gitlab-org/gitlab-ce/blob/master/.gitlab-ci.yml
[gitlab-k8s-integration]: ../../user/project/clusters/index.md
[password-bug]: https://gitlab.com/gitlab-org/gitlab-ce/issues/53621
+[K9s]: https://github.com/derailed/k9s
---