| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
What
---
Make the retryableHTTP client introduced in
https://gitlab.com/gitlab-org/gitlab-shell/-/merge_requests/703 the
default HTTP client.
Why
---
In
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7979#note_1254964426
we've seen a 99% error reduction on `git` commands from `gitlab-shell`
when the retryableHTTP client is used.
This has been running in production for over 2 weeks in `us-east1-b` and
5 days fleet-wide so we should be confident that this client works as
expected.
Reference: https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7979
Signed-off-by: Steve Azzopardi <sazzopardi@gitlab.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
What
---
- Update the `client.HttpClient` fields to have `http.Client` and
`retryablehttp.Client`, one of them will be `nil` depending on the
feature flag toggle.
- Create new method `newRetryableRequest` which will create a
`retryablehttp.Request` and use that if the
`FF_GITLAB_SHELL_RETRYABLE_HTTP` feature flag is turned on.
- Add checks for `FF_GITLAB_SHELL_RETRYABLE_HTTP` everywhere we use the
http client to use the `retryablehttp.Client` or the default
`http.Client`
- New job `tests-integration-retryableHttp` to run the integraiton tests
with the new retryablehttp client. We didn't update go tests because
some assertions are different and will break table driven tests.
Why
---
As discussed in
https://gitlab.com/gitlab-org/gitlab-shell/-/merge_requests/703#note_1229645097
we want to put the client behind a feature flag, not just the retry
logic. This does bring extra risk for accessing a `nil` field but there
should be checks everytime we access `RetryableHTTP` and `HTTPClient`.
Reference: https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7979
Signed-off-by: Steve Azzopardi <sazzopardi@gitlab.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
What
---
Change the default `HTTP.Client` to
`github.com/hashicorp/go-retryablehttp.Client` to get automatic retries
and exponential backoff.
We retry the request 2 times resulting in 3 attempts of sending the
request, the min retry wait is 1 second, and the maximum is 15
seconds.
Hide the retry logic behind a temporary feature flag
`FF_GITLAB_SHELL_RETRYABLE_HTTP` to easily roll this out in GitLab.com.
When we verify that this works as expected we will remove
`FF_GITLAB_SHELL_RETRYABLE_HTTP` and have the retry logic as the default
logic.
Why
---
In https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7979 users
end up seeing the following errors when trying to `git-clone(1)` a
repository locally on in CI.
```shell
remote: ===============================
remote:
remote: ERROR: Internal API unreachable
remote:
remote: ================================
```
When we look at the application logs we see the following error:
```json
{ "err": "http://gitlab-webservice-git.gitlab.svc:8181/api/v4/internal/allowed":
dial tcp 10.69.184.120:8181: connect: connection refused", "msg":
"Internal API unreachable"}
```
In
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7979#note_1222670120
we've correlated these `connection refused` errors with infrastructure
events that remove the git pods that are hosting
`gitlab-webservice-git` service. We could try to make the underlying
infrastructure more reactive to these changes as suggested in
https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7979#note_1225164944
but we can still end up serving bad requests.
Implementing retry logic for 5xx or other errors would allow users to
still be able to `git-clone(1)` reposirories, although it being slower.
This is espically important during CI runs so users don't have to retry
jobs themselves.
Reference: https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7979
Closes: https://gitlab.com/gitlab-org/gitlab-shell/-/issues/604
Signed-off-by: Steve Azzopardi <sazzopardi@gitlab.com>
|
|
|
|
|
|
|
| |
When the shell environment includes SSH_CONNECTION, one spec fails as
the way we're stubbing the environment to the subprocess doesn't wipe
out the pre-existing variable. This commit changes how we do it so the
spec passes even in this environment.
|
|
|