Diffstat (limited to 'doc/operations/incident_management')
17 files changed, 611 insertions, 563 deletions
diff --git a/doc/operations/incident_management/alert_details.md b/doc/operations/incident_management/alert_details.md index 860e6d32ae4..459331ea0a5 100644 --- a/doc/operations/incident_management/alert_details.md +++ b/doc/operations/incident_management/alert_details.md @@ -1,200 +1,5 @@ --- -stage: Monitor -group: Health -info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers +redirect_to: alerts.md --- -# Alert details page - -Navigate to the Alert details view by visiting the -[Alert list](./alerts.md) and selecting an alert from the -list. You need least Developer [permissions](../../user/permissions.md) to access -alerts. - -TIP: **Tip:** -To review live examples of GitLab alerts, visit the -[alert list](https://gitlab.com/gitlab-examples/ops/incident-setup/everyone/tanuki-inc/-/alert_management) -for this demo project. Click any alert in the list to examine its alert details -page. - -Alerts provide **Overview** and **Alert details** tabs to give you the right -amount of information you need. - -## Alert overview tab - -The **Overview** tab provides basic information about the alert: - -![Alert Detail Overview](./img/alert_detail_overview_v13_1.png) - -## Alert details tab - -![Alert Full Details](./img/alert_detail_full_v13_1.png) - -### Update an alert's status - -The Alert detail view enables you to update the Alert Status. -See [Create and manage alerts in GitLab](./alerts.md) for more details. - -### Create an issue from an alert - -> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/217745) in GitLab 13.1. - -The Alert detail view enables you to create an issue with a -description automatically populated from an alert. To create the issue, -click the **Create Issue** button. You can then view the issue from the -alert by clicking the **View Issue** button. 
- -Closing a GitLab issue associated with an alert changes the alert's status to Resolved. -See [Create and manage alerts in GitLab](alerts.md) for more details about alert statuses. - -### Update an alert's assignee - -> [Introduced](https://gitlab.com/groups/gitlab-org/-/epics/3066) in GitLab 13.1. - -The Alert detail view allows users to update the Alert assignee. - -In large teams, where there is shared ownership of an alert, it can be difficult -to track who is investigating and working on it. The Alert detail view -enables you to update the Alert assignee: - -NOTE: **Note:** -GitLab currently only supports a single assignee per alert. - -1. To display the list of current alerts, click - **{cloud-gear}** **Operations > Alerts**: - - ![Alert List View Assignee(s)](./img/alert_list_assignees_v13_1.png) - -1. Select your desired alert to display its **Alert Details View**: - - ![Alert Details View Assignee(s)](./img/alert_details_assignees_v13_1.png) - -1. If the right sidebar is not expanded, click - **{angle-double-right}** **Expand sidebar** to expand it. -1. In the right sidebar, locate the **Assignee** and click **Edit**. From the - dropdown menu, select each user you want to assign to the alert. GitLab creates - a [to-do list item](../../user/todos.md) for each user. - - ![Alert Details View Assignee(s)](./img/alert_todo_assignees_v13_1.png) - -To remove an assignee, click **Edit** next to the **Assignee** dropdown menu and -deselect the user from the list of assignees, or click **Unassigned**. - -### Alert system notes - -> [Introduced](https://gitlab.com/groups/gitlab-org/-/epics/3066) in GitLab 13.1. - -When you take action on an alert, this is logged as a system note, -which is visible in the Alert Details view. This gives you a linear -timeline of the alert's investigation and assignment history. 
- -The following actions will result in a system note: - -- [Updating the status of an alert](#update-an-alerts-status) -- [Creating an issue based on an alert](#create-an-issue-from-an-alert) -- [Assignment of an alert to a user](#update-an-alerts-assignee) - -![Alert Details View System Notes](./img/alert_detail_system_notes_v13_1.png) - -### Create a to-do from an alert - -> [Introduced](https://gitlab.com/groups/gitlab-org/-/epics/3066) in GitLab 13.1. - -You can manually create [To-Do list items](../../user/todos.md) for yourself from the -Alert details screen, and view them later on your **To-Do List**. To add a to-do: - -1. To display the list of current alerts, click - **{cloud-gear}** **Operations > Alerts**. -1. Select your desired alert to display its **Alert Management Details View**. -1. Click the **Add a To-Do** button in the right sidebar: - - ![Alert Details Add A To Do](./img/alert_detail_add_todo_v13_1.png) - -Click the **To-Do** **{todo-done}** in the navigation bar to view your current to-do list. - -![Alert Details Added to Do](./img/alert_detail_added_todo_v13_1.png) - -### View an alert's metrics data - -> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/217768) in GitLab 13.2. - -To view the metrics for an alert: - - 1. Sign in as a user with Developer or higher [permissions](../../user/permissions.md). - 1. Navigate to **{cloud-gear}** **Operations > Alerts**. - 1. Click the alert you want to view. - 1. Below the title of the alert, click the **Metrics** tab. - -![Alert Metrics View](img/alert_detail_metrics_v13_2.png) - -For GitLab-managed Prometheus instances, metrics data is automatically available -for the alert, making it easy to see surrounding behavior. See -[Managed Prometheus instances](../metrics/alerts.md#managed-prometheus-instances) -for information on setting up alerts. - -For externally-managed Prometheus instances, you can configure your alerting rules to -display a chart in the alert. 
See -[Embedding metrics based on alerts in incident issues](../metrics/embed.md#embedding-metrics-based-on-alerts-in-incident-issues) -for information on how to appropriately configure your alerting rules. See -[External Prometheus instances](../metrics/alerts.md#external-prometheus-instances) -for information on setting up alerts for your self-managed Prometheus instance. - -## Use cases for assigning alerts - -Consider a team formed by different sections of monitoring, collaborating on a -single application. After an alert surfaces, it's extremely important to -route the alert to the team members who can address and resolve the alert. - -Assigning Alerts eases collaboration and delegation. All -assignees are shown in your team's work-flows, and all assignees receive -notifications, simplifying communication and ownership of the alert. - -After completing their portion of investigating or fixing the alert, users can -unassign their account from the alert when their role is complete. -The alert status can be updated on the [Alert list](./alerts.md) to -reflect if the alert has been resolved. - -## View an alert's logs - -> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/217768) in GitLab 13.3. - -To view the logs for an alert: - - 1. Sign in as a user with Developer or higher [permissions](../../user/permissions.md). - 1. Navigate to **{cloud-gear}** **Operations > Alerts**. - 1. Click the alert you want to view. - 1. Below the title of the alert, click the **Metrics** tab. - 1. Click the [menu](../metrics/dashboards/index.md#chart-context-menu) of the metric chart to view options. - 1. Click **View logs**. - -Read [View logs from metrics panel](#view-logs-from-metrics-panel) for additional information. - -## Embed metrics in incidents and issues - -You can embed metrics anywhere [GitLab Markdown](../../user/markdown.md) is used, such as descriptions, -comments on issues, and merge requests. 
Embedding metrics helps you share them -when discussing incidents or performance issues. You can output the dashboard directly -into any issue, merge request, epic, or any other Markdown text field in GitLab -by [copying and pasting the link to the metrics dashboard](../metrics/embed.md#embedding-gitlab-managed-kubernetes-metrics). - -You can embed both -[GitLab-hosted metrics](../metrics/embed.md) and -[Grafana metrics](../metrics/embed_grafana.md) -in incidents and issue templates. - -### Context menu - -You can view more details about an embedded metrics panel from the context menu. -To access the context menu, click the **{ellipsis_v}** **More actions** dropdown box -above the upper right corner of the panel. For a list of options, see -[Chart context menu](../metrics/dashboards/index.md#chart-context-menu). - -#### View logs from metrics panel - -> - [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/201846) in GitLab Ultimate 12.8. -> - [Moved](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/25455) to [GitLab Core](https://about.gitlab.com/pricing/) 12.9. - -Viewing logs from a metrics panel can be useful if you're triaging an application -incident and need to [explore logs](../metrics/dashboards/index.md#chart-context-menu) -from across your application. These logs help you understand what is affecting -your application's performance and resolve any problems. +This document was moved to [another location](alerts.md). 
diff --git a/doc/operations/incident_management/alert_integrations.md b/doc/operations/incident_management/alert_integrations.md new file mode 100644 index 00000000000..58c1e1eae76 --- /dev/null +++ b/doc/operations/incident_management/alert_integrations.md @@ -0,0 +1,163 @@ +--- +stage: Monitor +group: Health +info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers +--- + +# Alert integrations + +> - [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/13203) in [GitLab Ultimate](https://about.gitlab.com/pricing/) 12.4. +> - [Moved](https://gitlab.com/gitlab-org/gitlab/-/issues/42640) to [GitLab Core](https://about.gitlab.com/pricing/) in 12.8. + +GitLab can accept alerts from any source via a webhook receiver. This can be configured generically or, in GitLab versions 13.1 and greater, you can configure +[External Prometheus instances](../metrics/alerts.md#external-prometheus-instances) +to use this endpoint. + +## Integrations list + +> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/245331) in [GitLab Core](https://about.gitlab.com/pricing/) 13.5. + +With Maintainer or higher [permissions](../../user/permissions.md), you can view +the list of configured alerts integrations by navigating to +**Settings > Operations** in your project's sidebar menu, and expanding **Alerts** section. +The list displays the integration name, type, and status (enabled or disabled): + +![Current Integrations](img/integrations_list_v13_5.png) + +## Configuration + +You can either configure alerts to integrate with an [external Prometheus server](#external-prometheus-integration), +or provide a [generic HTTP endpoint](#generic-http-endpoint) to receive alerts +from other services. + +### Generic HTTP Endpoint + +Enabling the Generic HTTP Endpoint creates a unique HTTP endpoint that can receive alert payloads in JSON format. 
You can always +[customize the payload](#customizing-the-payload) to your liking. + +You will need to activate the endpoint and obtain credentials to set up this integration: + +1. Sign in to GitLab as a user with maintainer [permissions](../../user/permissions.md) + for a project. +1. Navigate to **Settings > Operations** in your project. +1. Expand the **Alerts** section, and in the **Integration** dropdown menu, select **Generic**. +1. Toggle the **Active** alert setting to display the **URL** and **Authorization Key** + for the webhook configuration. + +### External Prometheus integration + +For GitLab versions 13.1 and greater, please see [External Prometheus Instances](../metrics/alerts.md#external-prometheus-instances) to configure alerts for this integration. + +## Customizing the payload + +You can customize the payload by sending the following parameters. This applies to all types of integrations. All fields +other than `title` are optional: + +| Property | Type | Description | +| ------------------------- | --------------- | ----------- | +| `title` | String | The title of the incident. Required. | +| `description` | String | A high-level summary of the problem. | +| `start_time` | DateTime | The time of the incident. If none is provided, a timestamp of the issue will be used. | +| `end_time` | DateTime | For existing alerts only. When provided, the alert is resolved and the associated incident is closed. | +| `service` | String | The affected service. | +| `monitoring_tool` | String | The name of the associated monitoring tool. | +| `hosts` | String or Array | One or more hosts, as to where this incident occurred. | +| `severity` | String | The severity of the alert. Must be one of `critical`, `high`, `medium`, `low`, `info`, `unknown`. Default is `critical`. | +| `fingerprint` | String or Array | The unique identifier of the alert. This can be used to group occurrences of the same alert. 
| +| `gitlab_environment_name` | String | The name of the associated GitLab [environment](../../ci/environments/index.md). This can be used to associate your alert to your environment. | + +You can also add custom fields to the alert's payload. The values of extra +parameters aren't limited to primitive types (such as strings or numbers), but +can be a nested JSON object. For example: + +```json +{ "foo": { "bar": { "baz": 42 } } } +``` + +TIP: **Payload size:** +Ensure your requests are smaller than the [payload application limits](../../administration/instance_limits.md#generic-alert-json-payloads). + +Example request: + +```shell +curl --request POST \ + --data '{"title": "Incident title"}' \ + --header "Authorization: Bearer <authorization_key>" \ + --header "Content-Type: application/json" \ + <url> +``` + +The `<authorization_key>` and `<url>` values can be found when configuring an alert integration. + +Example payload: + +```json +{ + "title": "Incident title", + "description": "Short description of the incident", + "start_time": "2019-09-12T06:00:55Z", + "service": "service affected", + "monitoring_tool": "value", + "hosts": "value", + "severity": "high", + "fingerprint": "d19381d4e8ebca87b55cda6e8eee7385", + "foo": { + "bar": { + "baz": 42 + } + } +} +``` + +## Triggering test alerts + +> [Introduced](https://gitlab.com/groups/gitlab-org/-/epics/3066) in GitLab Core in 13.2. + +After a [project maintainer or owner](../../user/permissions.md) +configures an integration, you can trigger a test +alert to confirm your integration works properly. + +1. Sign in as a user with Developer or greater [permissions](../../user/permissions.md). +1. Navigate to **Settings > Operations** in your project. +1. Click **Alerts endpoint** to expand the section. +1. Enter a sample payload in **Alert test payload** (valid JSON is required). +1. Click **Test alert payload**. + +GitLab displays an error or success message, depending on the outcome of your test. 
+ +## Automatic grouping of identical alerts **(PREMIUM)** + +> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/214557) in [GitLab Premium](https://about.gitlab.com/pricing/) 13.2. + +In GitLab versions 13.2 and greater, GitLab groups alerts based on their +payload. When an incoming alert contains the same payload as another alert +(excluding the `start_time` and `hosts` attributes), GitLab groups these alerts +together and displays a counter on the [Alert Management List](./incidents.md) +and details pages. + +If the existing alert is already `resolved`, GitLab creates a new alert instead. + +![Alert Management List](./img/alert_list_v13_1.png) + +## Link to your Opsgenie Alerts + +> [Introduced](https://gitlab.com/groups/gitlab-org/-/epics/3066) in [GitLab Premium](https://about.gitlab.com/pricing/) 13.2. + +You can monitor alerts using a GitLab integration with [Opsgenie](https://www.atlassian.com/software/opsgenie). + +If you enable the Opsgenie integration, you can't have other GitLab alert +services, such as [Generic Alerts](generic_alerts.md) or Prometheus alerts, +active at the same time. + +To enable Opsgenie integration: + +1. Sign in as a user with Maintainer or Owner [permissions](../../user/permissions.md). +1. Navigate to **Operations > Alerts**. +1. In the **Integrations** select box, select **Opsgenie**. +1. Select the **Active** toggle. +1. In the **API URL** field, enter the base URL for your Opsgenie integration, + such as `https://app.opsgenie.com/alert/list`. +1. Select **Save changes**. + +After you enable the integration, navigate to the Alerts list page at +**Operations > Alerts**, and then select **View alerts in Opsgenie**. 
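Note on `alert_integrations.md`: the payload table and the grouping rule added in this file can be illustrated with a short sketch. This is not GitLab's implementation. `build_alert_payload` and `grouping_key` are hypothetical helpers introduced only for illustration. The severity values and the two attributes ignored for grouping (`start_time` and `hosts`) are taken from the added text; the use of SHA-256 over canonical JSON is an assumption, and how GitLab actually compares payloads is not stated in this diff.

```python
import hashlib
import json

# Severity values listed in the "Customizing the payload" table.
ALLOWED_SEVERITIES = {"critical", "high", "medium", "low", "info", "unknown"}

# Attributes the added docs say are excluded when grouping identical alerts.
IGNORED_FOR_GROUPING = {"start_time", "hosts"}


def build_alert_payload(title, **fields):
    """Hypothetical helper: build a payload dict, checking the documented rules."""
    if not title:
        # `title` is the only required property.
        raise ValueError("title is required")
    severity = fields.get("severity")
    if severity is not None and severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"unsupported severity: {severity!r}")
    payload = {"title": title}
    # Extra fields, including nested custom objects, are passed through as-is.
    payload.update(fields)
    return payload


def grouping_key(payload):
    """Hypothetical helper: hash the payload minus `start_time` and `hosts`.

    Assumption: a SHA-256 digest of canonical JSON stands in for whatever
    comparison GitLab performs internally.
    """
    relevant = {k: v for k, v in payload.items() if k not in IGNORED_FOR_GROUPING}
    canonical = json.dumps(relevant, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    first = build_alert_payload(
        "Incident title",
        severity="high",
        start_time="2019-09-12T06:00:55Z",
        hosts="web-1",
    )
    second = build_alert_payload(
        "Incident title",
        severity="high",
        start_time="2019-09-12T07:30:00Z",
        hosts="web-2",
    )
    # The JSON body could be sent with the curl request shown in the docs.
    print(json.dumps(first, indent=2))
    # Under the sketch's assumptions, these two alerts share a grouping key.
    print(grouping_key(first) == grouping_key(second))
```

The two sample payloads differ only in `start_time` and `hosts`, so under this sketch they would be counted as occurrences of the same alert. Changing any other field, such as `severity`, produces a different key.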
diff --git a/doc/operations/incident_management/alert_notifications.md b/doc/operations/incident_management/alert_notifications.md new file mode 100644 index 00000000000..130c4e82088 --- /dev/null +++ b/doc/operations/incident_management/alert_notifications.md @@ -0,0 +1,36 @@ +--- +stage: Monitor +group: Health +info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers +--- + +# Paging and notifications + +When there is a new alert or incident, it is important for a responder to be notified +immediately so they can triage and respond to the problem. Responders can receive +notifications using the methods described on this page. + +## Slack notifications + +> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/216326) in GitLab 13.1. + +Responders can be paged via Slack using the +[Slack Notifications Service](../../user/project/integrations/slack.md), which you +can configure for new alerts and new incidents. After configuring, responders +receive a **single** page via Slack. To set up Slack notifications on your mobile +device, make sure to enable notifications for the Slack app on your phone so +you never miss a page. + +## Email notifications + +Email notifications are available in projects that have been +[configured to create incidents automatically](incidents.md#create-incidents-automatically) +for triggered alerts. Project members with the **Owner** or **Maintainer** roles are +sent an email notification automatically. (This is not configurable.) To optionally +send additional email notifications to project members with the **Developer** role: + +1. Navigate to **Settings > Operations**. +1. Expand the **Incidents** section. +1. In the **Alert Integration** tab, select the **Send a separate email notification to Developers** + check box. +1. Select **Save changes**. 
diff --git a/doc/operations/incident_management/alerts.md b/doc/operations/incident_management/alerts.md index d908af63000..a6168386024 100644 --- a/doc/operations/incident_management/alerts.md +++ b/doc/operations/incident_management/alerts.md @@ -4,119 +4,261 @@ group: Health info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers --- -# Create and manage alerts in GitLab +# Alerts -Users with at least Developer [permissions](../../user/permissions.md) can access -the Alert Management list at **{cloud-gear}** **Operations > Alerts** in your -project's sidebar. The Alert Management list displays alerts sorted by start time, -but you can change the sort order by clicking the headers in the Alert Management list. +Alerts are a critical entity in your incident management workflow. They represent a notable event that might indicate a service outage or disruption. GitLab provides a list view for triage and a detail view for deeper investigation of what happened. + +## Alert List + +Users with at least Developer [permissions](../../user/permissions.md) can +access the Alert list at **Operations > Alerts** in your project's +sidebar. The Alert list displays alerts sorted by start time, but +you can change the sort order by clicking the headers in the Alert list. ([Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/217745) in GitLab 13.1.) The alert list displays the following information: -![Alert List](./img/alert_list_v13_1.png) +![Alert List](img/alert_list_v13_1.png) -- **Search** - The alert list supports a simple free text search on the title, +- **Search**: The alert list supports a simple free text search on the title, description, monitoring tool, and service fields. ([Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/213884) in GitLab 13.1.)
-- **Severity** - The current importance of a alert and how much attention it should - receive. For a listing of all statuses, read [Alert Management severity](#alert-severity). -- **Start time** - How long ago the alert fired. This field uses the standard - GitLab pattern of `X time ago`, but is supported by a granular date/time tooltip - depending on the user's locale. -- **Alert description** - The description of the alert, which attempts to capture the most meaningful data. -- **Event count** - The number of times that an alert has fired. -- **Issue** - A link to the incident issue that has been created for the alert. -- **Status** - The current status of the alert: +- **Severity**: The current importance of an alert and how much attention it + should receive. For a listing of all severity levels, read [Alert Management severity](#alert-severity). +- **Start time**: How long ago the alert fired. This field uses the standard + GitLab pattern of `X time ago`, but is supported by a granular date/time + tooltip depending on the user's locale. +- **Alert description**: The description of the alert, which attempts to + capture the most meaningful data. +- **Event count**: The number of times that an alert has fired. +- **Issue**: A link to the incident issue that has been created for the alert. +- **Status**: The current status of the alert: - **Triggered**: No one has begun investigation. - **Acknowledged**: Someone is actively investigating the problem. - **Resolved**: No further work is required. - + TIP: **Tip:** -Check out a [live example](https://gitlab.com/gitlab-examples/ops/incident-setup/everyone/tanuki-inc/-/alert_management) +Check out a live example available from the +[`tanuki-inc` project page](https://gitlab-examples-ops-incident-setup-everyone-tanuki-inc.34.69.64.147.nip.io/) in GitLab to examine alerts in action.
-## Enable Alerts +## Alert severity + +Each level of alert contains a uniquely shaped and color-coded icon to help +you identify the severity of a particular alert. These severity icons help you +immediately identify which alerts you should prioritize investigating: + +![Alert Management Severity System](img/alert_management_severity_v13_0.png) -NOTE: **Note:** -You need at least Maintainer [permissions](../../user/permissions.md) to enable -the Alerts feature. +Alerts contain one of the following icons: -There are several ways to accept alerts into your GitLab project. -Enabling any of these methods enables the Alert list. After configuring -alerts, visit **{cloud-gear}** **Operations > Alerts** in your project's sidebar -to view the list of alerts. +| Severity | Icon | Color (hexadecimal) | +|----------|-------------------------|---------------------| +| Critical | **{severity-critical}** | `#8b2615` | +| High | **{severity-high}** | `#c0341d` | +| Medium | **{severity-medium}** | `#fca429` | +| Low | **{severity-low}** | `#fdbc60` | +| Info | **{severity-info}** | `#418cd8` | +| Unknown | **{severity-unknown}** | `#bababa` | -### Enable GitLab-managed Prometheus alerts +## Alert details page -You can install the GitLab-managed Prometheus application on your Kubernetes -cluster. For more information, read -[Managed Prometheus on Kubernetes](../../user/project/integrations/prometheus.md#managed-prometheus-on-kubernetes). -When GitLab-managed Prometheus is installed, the [Alerts list](alerts.md) -is also enabled. +Navigate to the Alert details view by visiting the [Alert list](./alerts.md) +and selecting an alert from the list. You need at least Developer [permissions](../../user/permissions.md) +to access alerts. -To populate the alerts with data, read -[GitLab-Managed Prometheus instances](../metrics/alerts.md#managed-prometheus-instances).
+TIP: **Tip:** +To review live examples of GitLab alerts, visit the +[alert list](https://gitlab.com/gitlab-examples/ops/incident-setup/everyone/tanuki-inc/-/alert_management) +for this demo project. Select any alert in the list to examine its alert details +page. -### Enable external Prometheus alerts +Alerts provide **Overview** and **Alert details** tabs to give you the right +amount of information you need. -You can configure an externally-managed Prometheus instance to send alerts -to GitLab. To set up this configuration, read the [configuring Prometheus](../metrics/alerts.md#external-prometheus-instances) documentation. Activating the external Prometheus -configuration also enables the [Alerts list](./alerts.md). +### Alert details tab -To populate the alerts with data, read -[External Prometheus instances](../metrics/alerts.md#external-prometheus-instances). +The **Alert details** tab has two sections. The top section provides a short list of critical details such as the severity, start time, number of events, and originating monitoring tool. The second section displays the full alert payload. -### Enable a Generic Alerts endpoint +### Metrics tab -GitLab provides the Generic Alerts endpoint so you can accept alerts from a third-party -alerts service. Read the -[instructions for toggling generic alerts](generic_alerts.md#setting-up-generic-alerts) -to add this option. After configuring the endpoint, the -[Alerts list](./alerts.md) is enabled. +> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/217768) in GitLab 13.2. -To populate the alerts with data, read [Customizing the payload](./generic_alerts.md#customizing-the-payload) for requests to the alerts endpoint. +The **Metrics** tab will display a metrics chart for alerts coming from Prometheus. If the alert originated from any other tool, the **Metrics** tab will be empty.
To set up alerts for GitLab-managed Prometheus instances, see [Managed Prometheus instances](../metrics/alerts.md#managed-prometheus-instances). For externally-managed Prometheus instances, you will need to configure your alerting +rules to display a chart in the alert. For information about how to configure +your alerting rules, see [Embedding metrics based on alerts in incident issues](../metrics/embed.md#embedding-metrics-based-on-alerts-in-incident-issues). See +[External Prometheus instances](../metrics/alerts.md#external-prometheus-instances) +for information about setting up alerts for your self-managed Prometheus +instance. -### Opsgenie integration **(PREMIUM)** +To view the metrics for an alert: -> [Introduced](https://gitlab.com/groups/gitlab-org/-/epics/3066) in [GitLab Premium](https://about.gitlab.com/pricing/) 13.2. +1. Sign in as a user with Developer or higher [permissions](../../user/permissions.md). +1. Navigate to **Operations > Alerts**. +1. Select the alert you want to view. +1. Below the title of the alert, select the **Metrics** tab. -A new way of monitoring Alerts via a GitLab integration is with -[Opsgenie](https://www.atlassian.com/software/opsgenie). +![Alert Metrics View](img/alert_detail_metrics_v13_2.png) -NOTE: **Note:** -If you enable the Opsgenie integration, you can't have other GitLab alert services, -such as [Generic Alerts](./generic_alerts.md) or -Prometheus alerts, active at the same time. +#### View an alert's logs -To enable Opsgenie integration: +> - [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/201846) in GitLab Ultimate 12.8 and [improved](https://gitlab.com/gitlab-org/gitlab/-/issues/217768) in GitLab 13.3. +> - [Moved](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/25455) to [GitLab Core](https://about.gitlab.com/pricing/) 12.9. -1. Sign in as a user with Maintainer or Owner [permissions](../../user/permissions.md). -1. Navigate to **{cloud-gear}** **Operations > Alerts**. -1.
In the **Integrations** select box, select Opsgenie. -1. Click the **Active** toggle. -1. In the **API URL**, enter the base URL for your Opsgenie integration, such - as `https://app.opsgenie.com/alert/list`. -1. Click **Save changes**. +Viewing logs from a metrics panel can be useful if you're triaging an +application incident and need to [explore logs](../metrics/dashboards/index.md#chart-context-menu) +from across your application. These logs help you understand what's affecting +your application's performance and how to resolve any problems. -After enabling the integration, navigate to the Alerts list page at -**{cloud-gear}** **Operations > Alerts**, and click **View alerts in Opsgenie**. +To view the logs for an alert: -## Alert severity +1. Sign in as a user with Developer or higher [permissions](../../user/permissions.md). +1. Navigate to **Operations > Alerts**. +1. Select the alert you want to view. +1. Below the title of the alert, select the **Metrics** tab. +1. Select the [menu](../metrics/dashboards/index.md#chart-context-menu) of + the metric chart to view options. +1. Select **View logs**. -Each level of alert contains a uniquely shaped and color-coded icon to help -you identify the severity of a particular alert. These severity icons help you -immediately identify which alerts you should prioritize investigating: +### Activity feed tab -![Alert Management Severity System](./img/alert_management_severity_v13_0.png) +> [Introduced](https://gitlab.com/groups/gitlab-org/-/epics/3066) in GitLab 13.1. -Alerts contain one of the following icons: +The **Activity feed** tab is a log of activity on the alert. When you take action on an alert, this is logged as a system note. This gives you a linear +timeline of the alert's investigation and assignment history. 
+ +The following actions will result in a system note: + +- [Updating the status of an alert](#update-an-alerts-status) +- [Creating an incident based on an alert](#create-an-incident-from-an-alert) +- [Assignment of an alert to a user](#assign-an-alert) + +![Alert Details Activity Feed](img/alert_detail_activity_feed_v13_5.png) + +## Alert actions + +There are different actions available in GitLab to help triage and respond to alerts. + +### Update an alert's status + +The Alert detail view enables you to update the Alert Status. +See [Create and manage alerts in GitLab](./alerts.md) for more details. + +### Create an incident from an alert + +> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/217745) in GitLab 13.1. + +The Alert detail view enables you to create an issue with a +description populated from an alert. To create the issue, +select the **Create Issue** button. You can then view the issue from the +alert by selecting the **View Issue** button. + +Closing a GitLab issue associated with an alert changes the alert's status to +Resolved. See [Create and manage alerts in GitLab](alerts.md) for more details +about alert statuses. + +### Assign an alert + +> [Introduced](https://gitlab.com/groups/gitlab-org/-/epics/3066) in GitLab 13.1. + +In large teams, where there is shared ownership of an alert, it can be +difficult to track who is investigating and working on it. Assigning alerts eases collaboration and delegation by indicating which user owns the alert. GitLab supports only a single assignee per alert. + +To assign an alert: + +1. To display the list of current alerts, navigate to **Operations > Alerts**: + + ![Alert List View Assignee(s)](./img/alert_list_assignees_v13_1.png) + +1. Select your desired alert to display its **Alert Details View**: + + ![Alert Details View Assignee(s)](./img/alert_details_assignees_v13_1.png) + +1. If the right sidebar is not expanded, select + **{angle-double-right}** **Expand sidebar** to expand it. +1.
In the right sidebar, locate the **Assignee**, and then select **Edit**. + From the dropdown menu, select each user you want to assign to the alert. + GitLab creates a [to-do item](../../user/todos.md) for each user. + + ![Alert Details View Assignee(s)](./img/alert_todo_assignees_v13_1.png) + +After completing their portion of investigating or fixing the alert, users can +unassign themselves from the alert. To remove an assignee, select **Edit** next to the **Assignee** dropdown menu +and deselect the user from the list of assignees, or select **Unassigned**. + +### Create a to do from an alert + +> [Introduced](https://gitlab.com/groups/gitlab-org/-/epics/3066) in GitLab 13.1. + +You can manually create [To-Do list items](../../user/todos.md) for yourself +from the Alert details screen, and view them later on your **To-Do List**. To +add a to do: + +1. To display the list of current alerts, navigate to **Operations > Alerts**. +1. Select your desired alert to display its **Alert Management Details View**. +1. Select the **Add a To-Do** button in the right sidebar: + + ![Alert Details Add A To Do](./img/alert_detail_add_todo_v13_1.png) + +Select the **To-Do List** **{todo-done}** in the navigation bar to view your current to-do list. + +![Alert Details Added to do](./img/alert_detail_added_todo_v13_1.png) + +## Link runbooks to alerts + +> Runbook URLs [introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/39315) in GitLab 13.3. + +When creating alerts from the metrics dashboard for +[managed Prometheus instances](../metrics/alerts.md#managed-prometheus-instances), +you can link a runbook. 
When the alert triggers, you can access the runbook through +the [chart context menu](../metrics/dashboards/index.md#chart-context-menu) in the +upper-right corner of the metrics chart, making it easy for you to locate and access +the correct runbook: + +![Linked Runbook in charts](img/link_runbooks_to_alerts_v13_5.png) + +## View the environment that generated the alert + +> - [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/232492) in GitLab 13.5. +> - It's [deployed behind a feature flag](../../user/feature_flags.md), disabled by default. +> - It's disabled on GitLab.com. +> - It's not recommended for production use. +> - To use it in GitLab self-managed instances, ask a GitLab administrator to [enable it](#enable-or-disable-environment-link-in-alert-details). **(CORE ONLY)** + +CAUTION: **Warning:** +This feature might not be available to you. Check the **version history** note above for details. + +The environment information and the link are displayed in the [Alert Details tab](#alert-details-tab). + +### Enable or disable Environment Link in Alert Details **(CORE ONLY)** + +Viewing the environment is under development and not ready for production use. It is +deployed behind a feature flag that is **disabled by default**. +[GitLab administrators with access to the GitLab Rails console](../../administration/feature_flags.md) +can enable it. 
+ +To enable it: + +```ruby +Feature.enable(:expose_environment_path_in_alert_details) +``` + +To enable for just a particular project: + +```ruby +project = Project.find_by_full_path('your-group/your-project') +Feature.enable(:expose_environment_path_in_alert_details, project) +``` + +To disable it: + +```ruby +Feature.disable(:expose_environment_path_in_alert_details) +``` + +To disable for just a particular project: -| Severity | Icon | Color (hexadecimal) | -|---|---|---| -| Critical | **{severity-critical}** | `#8b2615` | -| High | **{severity-high}** | `#c0341d` | -| Medium | **{severity-medium}** | `#fca429` | -| Low | **{severity-low}** | `#fdbc60` | -| Info | **{severity-info}** | `#418cd8` | -| Unknown | **{severity-unknown}** | `#bababa` | +```ruby +project = Project.find_by_full_path('your-group/your-project') +Feature.disable(:expose_environment_path_in_alert_details, project) +``` diff --git a/doc/operations/incident_management/generic_alerts.md b/doc/operations/incident_management/generic_alerts.md index 11d4dbc6924..a8f2f9a58a6 100644 --- a/doc/operations/incident_management/generic_alerts.md +++ b/doc/operations/incident_management/generic_alerts.md @@ -1,126 +1,5 @@ --- -stage: Monitor -group: Health -info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers +redirect_to: alert_notifications.md --- -# Generic alerts integration - -> - [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/13203) in [GitLab Ultimate](https://about.gitlab.com/pricing/) 12.4. -> - [Moved](https://gitlab.com/gitlab-org/gitlab/-/issues/42640) to [GitLab Core](https://about.gitlab.com/pricing/) in 12.8. - -GitLab can accept alerts from any source via a generic webhook receiver. 
-When you set up the generic alerts integration, a unique endpoint will -be created which can receive a payload in JSON format, and will in turn -create an issue with the payload in the body of the issue. You can always -[customize the payload](#customizing-the-payload) to your liking. - -The entire payload will be posted in the issue discussion as a comment -authored by the GitLab Alert Bot. - -NOTE: **Note:** -In GitLab versions 13.1 and greater, you can configure -[External Prometheus instances](../metrics/alerts.md#external-prometheus-instances) -to use this endpoint. - -## Setting up generic alerts - -To obtain credentials for setting up a generic alerts integration: - -- Sign in to GitLab as a user with maintainer [permissions](../../user/permissions.md) for a project. -- Navigate to the **Operations** page for your project, depending on your installed version of GitLab: - - *In GitLab versions 13.1 and greater,* navigate to **Settings > Operations** in your project. - - *In GitLab versions prior to 13.1,* navigate to **Settings > Integrations** in your project. GitLab will display a banner encouraging you to enable the Alerts endpoint in **Settings > Operations** instead. -- Click **Alerts endpoint**. -- Toggle the **Active** alert setting to display the **URL** and **Authorization Key** for the webhook configuration. - -## Customizing the payload - -You can customize the payload by sending the following parameters. All fields other than `title` are optional: - -| Property | Type | Description | -| -------- | ---- | ----------- | -| `title` | String | The title of the incident. Required. | -| `description` | String | A high-level summary of the problem. | -| `start_time` | DateTime | The time of the incident. If none is provided, a timestamp of the issue will be used. | -| `end_time` | DateTime | For existing alerts only. When provided, the alert is resolved and the associated incident is closed. | -| `service` | String | The affected service. 
| -| `monitoring_tool` | String | The name of the associated monitoring tool. | -| `hosts` | String or Array | One or more hosts, as to where this incident occurred. | -| `severity` | String | The severity of the alert. Must be one of `critical`, `high`, `medium`, `low`, `info`, `unknown`. Default is `critical`. | -| `fingerprint` | String or Array | The unique identifier of the alert. This can be used to group occurrences of the same alert. | -| `gitlab_environment_name` | String | The name of the associated GitLab [environment](../../ci/environments/index.md). This can be used to associate your alert to your environment. | - -You can also add custom fields to the alert's payload. The values of extra parameters -are not limited to primitive types, such as strings or numbers, but can be a nested -JSON object. For example: - -```json -{ "foo": { "bar": { "baz": 42 } } } -``` - -TIP: **Payload size:** -Ensure your requests are smaller than the [payload application limits](../../administration/instance_limits.md#generic-alert-json-payloads). - -Example request: - -```shell -curl --request POST \ - --data '{"title": "Incident title"}' \ - --header "Authorization: Bearer <authorization_key>" \ - --header "Content-Type: application/json" \ - <url> -``` - -The `<authorization_key>` and `<url>` values can be found when [setting up generic alerts](#setting-up-generic-alerts). - -Example payload: - -```json -{ - "title": "Incident title", - "description": "Short description of the incident", - "start_time": "2019-09-12T06:00:55Z", - "service": "service affected", - "monitoring_tool": "value", - "hosts": "value", - "severity": "high", - "fingerprint": "d19381d4e8ebca87b55cda6e8eee7385", - "foo": { - "bar": { - "baz": 42 - } - } -} -``` - -## Triggering test alerts - -> [Introduced](https://gitlab.com/groups/gitlab-org/-/epics/3066) in GitLab Core in 13.2. 
- -After a [project maintainer or owner](#setting-up-generic-alerts) -[configures generic alerts](#setting-up-generic-alerts), you can trigger a -test alert to confirm your integration works properly. - -1. Sign in as a user with Developer or greater [permissions](../../user/permissions.md). -1. Navigate to **Settings > Operations** in your project. -1. Click **Alerts endpoint** to expand the section. -1. Enter a sample payload in **Alert test payload** (valid JSON is required). -1. Click **Test alert payload**. - -GitLab displays an error or success message, depending on the outcome of your test. - -## Automatic grouping of identical alerts **(PREMIUM)** - -> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/214557) in [GitLab Premium](https://about.gitlab.com/pricing/) 13.2. - -In GitLab versions 13.2 and greater, GitLab groups alerts based on their payload. -When an incoming alert contains the same payload as another alert (excluding the -`start_time` and `hosts` attributes), GitLab groups these alerts together and -displays a counter on the -[Alert Management List](./incidents.md) -and details pages. - -If the existing alert is already `resolved`, then a new alert will be created instead. - -![Alert Management List](./img/alert_list_v13_1.png) +This document was moved to [another location](alert_notifications.md). 
diff --git a/doc/operations/incident_management/img/alert_detail_activity_feed_v13_5.png b/doc/operations/incident_management/img/alert_detail_activity_feed_v13_5.png Binary files differ new file mode 100644 index 00000000000..2c1c4c39515 --- /dev/null +++ b/doc/operations/incident_management/img/alert_detail_activity_feed_v13_5.png diff --git a/doc/operations/incident_management/img/incident_highlight_bar_v13_5.png b/doc/operations/incident_management/img/incident_highlight_bar_v13_5.png Binary files differ new file mode 100644 index 00000000000..6a40e97820c --- /dev/null +++ b/doc/operations/incident_management/img/incident_highlight_bar_v13_5.png diff --git a/doc/operations/incident_management/img/incident_list_v13_4.png b/doc/operations/incident_management/img/incident_list_v13_4.png Binary files differ deleted file mode 100644 index bf00e630c67..00000000000 --- a/doc/operations/incident_management/img/incident_list_v13_4.png +++ /dev/null diff --git a/doc/operations/incident_management/img/incident_list_v13_5.png b/doc/operations/incident_management/img/incident_list_v13_5.png Binary files differ new file mode 100644 index 00000000000..88942a70e88 --- /dev/null +++ b/doc/operations/incident_management/img/incident_list_v13_5.png diff --git a/doc/operations/incident_management/img/incident_sla_settings_v13_5.png b/doc/operations/incident_management/img/incident_sla_settings_v13_5.png Binary files differ new file mode 100644 index 00000000000..94c8b840210 --- /dev/null +++ b/doc/operations/incident_management/img/incident_sla_settings_v13_5.png diff --git a/doc/operations/incident_management/img/integrations_list_v13_5.png b/doc/operations/incident_management/img/integrations_list_v13_5.png Binary files differ new file mode 100644 index 00000000000..babaa785ad6 --- /dev/null +++ b/doc/operations/incident_management/img/integrations_list_v13_5.png diff --git a/doc/operations/incident_management/img/link_runbooks_to_alerts_v13_5.png 
b/doc/operations/incident_management/img/link_runbooks_to_alerts_v13_5.png Binary files differ new file mode 100644 index 00000000000..a63001b4cde --- /dev/null +++ b/doc/operations/incident_management/img/link_runbooks_to_alerts_v13_5.png diff --git a/doc/operations/incident_management/img/timeline_view_toggle_v13_5.png b/doc/operations/incident_management/img/timeline_view_toggle_v13_5.png Binary files differ new file mode 100644 index 00000000000..542ca139f7e --- /dev/null +++ b/doc/operations/incident_management/img/timeline_view_toggle_v13_5.png diff --git a/doc/operations/incident_management/incidents.md b/doc/operations/incident_management/incidents.md index 3ff02b3dc6b..3d85fa0ebd8 100644 --- a/doc/operations/incident_management/incidents.md +++ b/doc/operations/incident_management/incidents.md @@ -4,16 +4,88 @@ group: Health info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers --- -# Create and manage incidents in GitLab +# Incidents -While no configuration is required to use the [manual features](#create-an-incident-manually) -of incident management, some simple [configuration](#configure-incidents) is needed to automate incident creation. +Incidents are critical entities in incident management workflows. They represent a service disruption or outage that needs to be restored urgently. GitLab provides tools for the triage, response, and remediation of incidents. -For users with at least Developer [permissions](../../user/permissions.md), the -Incident Management list is available at **Operations > Incidents** +## Incident Creation + +You can create an incident manually or automatically. + +### Create incidents manually + +If you have at least Guest [permissions](../../user/permissions.md), to create an Incident, you have two options to do this manually. 
+ +**From the Incidents List:** + +> [Moved](https://gitlab.com/gitlab-org/monitor/health/-/issues/24) to GitLab core in 13.3. + +- Navigate to **Operations > Incidents** and click **Create Incident**. +- Create a new issue using the `incident` template available when creating it. +- Create a new issue and assign the `incident` label to it. + +![Incident List Create](./img/incident_list_create_v13_3.png) + +**From the Issues List:** + +> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/230857) in GitLab 13.4. + +- Navigate to **Issues > List** and click **Create Issue**. +- Create a new issue using the `type` drop-down and select `Incident`. +- The page refreshes and the page only displays fields relevant to Incidents. + +![Incident List Create](./img/new_incident_create_v13_4.png) + +### Create incidents automatically + +> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/4925) in GitLab Ultimate 11.11. + +With Maintainer or higher [permissions](../../user/permissions.md), you can enable + GitLab to create incidents automatically whenever an alert is triggered: + +1. Navigate to **Settings > Operations > Incidents** and expand + **Incidents**: + + ![Incident Management Settings](./img/incident_management_settings_v13_3.png) + +1. Check the **Create an incident** + checkbox. +1. To customize the incident, select an [issue template](../../user/project/description_templates.md#creating-issue-templates). +1. To send [an email notification](alert_notifications.md#email-notifications) to users + with [Developer permissions](../../user/permissions.md), select + **Send a separate email notification to Developers**. Email notifications will also be sent to users with **Maintainer** and **Owner** permissions. +1. Click **Save changes**. + +### Create incidents via the PagerDuty webhook + +> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/119018) in GitLab 13.3. 
+ +You can set up a webhook with PagerDuty to automatically create a GitLab incident +for each PagerDuty incident. This configuration requires you to make changes +in both PagerDuty and GitLab: + +1. Sign in as a user with Maintainer [permissions](../../user/permissions.md). +1. Navigate to **Settings > Operations > Incidents** and expand **Incidents**. +1. Select the **PagerDuty integration** tab: + + ![PagerDuty incidents integration](./img/pagerduty_incidents_integration_v13_3.png) + +1. Activate the integration, and save the changes in GitLab. +1. Copy the value of **Webhook URL** for use in a later step. +1. Follow the steps described in the + [PagerDuty documentation](https://support.pagerduty.com/docs/webhooks) + to add the webhook URL to a PagerDuty webhook integration. + +To confirm the integration is successful, trigger a test incident from PagerDuty to +confirm that a GitLab incident is created from the incident. + +## Incident list + +For users with at least Guest [permissions](../../user/permissions.md), the +Incident list is available at **Operations > Incidents** in your project's sidebar. The list contains the following metrics: -![Incident List](img/incident_list_v13_4.png) +![Incident List](img/incident_list_v13_5.png) - **Status** - To filter incidents by their status, click **Open**, **Closed**, or **All** above the incident list. @@ -27,8 +99,8 @@ in your project's sidebar. The list contains the following metrics: - **{severity-low}** **Low - S4** - **{severity-unknown}** **Unknown** - NOTE: **Note:** - Editing incident severity on the incident details page was [introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/229402) in GitLab 13.4. + [Editing incident severity](#change-severity) on the incident details page was + [introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/229402) in GitLab 13.4. - **Incident** - The description of the incident, which attempts to capture the most meaningful data. 
@@ -44,113 +116,117 @@ The Incident list displays incidents sorted by incident created date. To see if a column is sortable, point your mouse at the header. Sortable columns display an arrow next to the column name. +Incidents share the [Issues API](../../user/project/issues/index.md). + TIP: **Tip:** For a live example of the incident list in action, visit this [demo project](https://gitlab.com/gitlab-examples/ops/incident-setup/everyone/tanuki-inc/-/incidents). -NOTE: **Note:** -Incidents share the [Issues API](../../user/project/issues/index.md). +## Incident details + +> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/230847) in GitLab 13.4. -## Configure incidents +Users with at least Reporter [permissions](../../user/permissions.md) can view +the Incident Details page. Navigate to **Operations > Incidents** in your project's +sidebar, and select an incident from the list. -> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/4925) in GitLab Ultimate 11.11. +When you take any of these actions on an incident, GitLab logs a system note and +displays it in the Incident Details view: -With Maintainer or higher [permissions](../../user/permissions.md), you can enable -or disable Incident Management features in the GitLab user interface -to create issues when alerts are triggered: +- Updating the severity of an incident + ([Introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/42358) in GitLab 13.5.) -1. Navigate to **Settings > Operations > Incidents** and expand - **Incidents**: +For live examples of GitLab incidents, visit the `tanuki-inc` project's +[incident list page](https://gitlab.com/gitlab-examples/ops/incident-setup/everyone/tanuki-inc/-/incidents). +Click any incident in the list to display its incident details page. - ![Incident Management Settings](./img/incident_management_settings_v13_3.png) +### Summary -1. 
For GitLab versions 11.11 and greater, you can select the **Create an issue** - checkbox to create an issue based on your own - [issue templates](../../user/project/description_templates.md#creating-issue-templates). - For more information, see - [Trigger actions from alerts](../metrics/alerts.md#trigger-actions-from-alerts) **(ULTIMATE)**. -1. To create issues from alerts, select the template in the **Issue Template** - select box. -1. To send [separate email notifications](index.md#notify-developers-of-alerts) to users - with [Developer permissions](../../user/permissions.md), select - **Send a separate email notification to Developers**. -1. Click **Save changes**. +The summary section for incidents provides both critical details about and the +contents of the issue template (if one was used). The highlighted bar at the top +of the incident displays from left to right: -Appropriately configured alerts include an -[embedded chart](../metrics/embed.md#embedding-metrics-based-on-alerts-in-incident-issues) -for the query corresponding to the alert. You can also configure GitLab to -[close issues](../metrics/alerts.md#trigger-actions-from-alerts) -when you receive notification that the alert is resolved. +- The link to the original alert. +- The alert start time. +- The event count. -## Create an incident manually +Beneath the highlight bar, GitLab displays a summary that includes the following fields: -If you have at least Developer [permissions](../../user/permissions.md), to create an Incident, you have two options. +- Start time +- Severity +- `full_query` +- Monitoring tool -### From the Incidents List +Comments are displayed in threads, but can be displayed chronologically +[in a timeline view](#timeline-view). -> [Moved](https://gitlab.com/gitlab-org/monitor/health/-/issues/24) to GitLab core in 13.3. +### Alert details -- Navigate to **Operations > Incidents** and click **Create Incident**. 
-- Create a new issue using the `incident` template available when creating it. -- Create a new issue and assign the `incident` label to it. +Incidents show the details of linked alerts in a separate tab. To populate this +tab, the incident must have been created with a linked alert. Incidents +created automatically from alerts have this +field populated. -![Incident List Create](./img/incident_list_create_v13_3.png) +![Incident alert details](./img/incident_alert_details_v13_4.png) -### From the Issues List +### Timeline view -> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/230857) in GitLab 13.4. +> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/227836) in [GitLab Premium](https://about.gitlab.com/pricing/) 13.5. -- Navigate to **Issues > List** and click **Create Issue**. -- Create a new issue using the `type` drop-down and select `Incident`. -- The page refreshes and the page only displays fields relevant to Incidents. +To quickly see the latest updates on an incident, click +**{comments}** **Turn timeline view on** in the comment bar to display comments +un-threaded and ordered chronologically, newest to oldest: -![Incident List Create](./img/new_incident_create_v13_4.png) +![Timeline view toggle](./img/timeline_view_toggle_v13_5.png) -## Configure PagerDuty integration +### Service Level Agreement countdown timer -> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/119018) in GitLab 13.3. +> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/241663) in [GitLab Premium](https://about.gitlab.com/pricing/) 13.5. -You can set up a webhook with PagerDuty to automatically create a GitLab issue -for each PagerDuty incident. This configuration requires you to make changes -in both PagerDuty and GitLab: +After enabling **Incident SLA** in the Incident Management configuration, newly-created +incidents display a SLA (Service Level Agreement) timer showing the time remaining before +the SLA period expires. 
If the incident is not closed before the SLA period ends, GitLab +adds a `missed::SLA` label to the incident. -1. Sign in as a user with Maintainer [permissions](../../user/permissions.md). -1. Navigate to **Settings > Operations > Incidents** and expand **Incidents**. -1. Select the **PagerDuty integration** tab: +## Incident Actions - ![PagerDuty incidents integration](./img/pagerduty_incidents_integration_v13_3.png) +There are different actions available to help triage and respond to incidents. -1. Activate the integration, and save the changes in GitLab. -1. Copy the value of **Webhook URL** for use in a later step. -1. Follow the steps described in the - [PagerDuty documentation](https://support.pagerduty.com/docs/webhooks) - to add the webhook URL to a PagerDuty webhook integration. +### Assign incidents -To confirm the integration is successful, trigger a test incident from PagerDuty to -confirm that a GitLab issue is created from the incident. +Assign incidents to users that are actively responding. Select **Edit** in the right-hand side bar to select or deselect assignees. -## Incident details +### Change severity -> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/230847) in GitLab 13.4. +See [Incident List](#incident-list) for a full description of the severities available. Select **Edit** in the right-hand side bar to change the severity of an incident. -### Summary +### Add a to do -The summary section for incidents provides both critical details about and the -contents of the issue template (if one was used). The highlighted bar at the top -of the incident displays from left to right: the link to the original alert, the -alert start time, and the event count. Beneath the highlight bar, GitLab -displays a summary that includes the following fields: +Add a to-do for incidents that you want to track in your to-do list. Click the **Add a to do** button at the top of the right-hand side bar to add a to do. 
-- Start time -- Severity -- `full_query` -- Monitoring tool +### Manage incidents from Slack -### Alert details +Slack slash commands allow you to control GitLab and view GitLab content without leaving Slack. -Incidents show the details of linked alerts in a separate tab. To populate this -tab, the incident must have been created with a linked alert. Incidents -[created automatically](#configure-incidents) from alerts will have this -field populated. +Learn how to [set up Slack slash commands](../../user/project/integrations/slack_slash_commands.md) +and how to [use the available slash commands](../../integration/slash_commands.md). -![Incident alert details](./img/incident_alert_details_v13_4.png) +### Associate Zoom calls + +GitLab enables you to [associate a Zoom meeting with an issue](../../user/project/issues/associate_zoom_meeting.md) +for synchronous communication during incident management. After starting a Zoom +call for an incident, you can associate the conference call with an issue. Your +team members can join the Zoom call without requesting a link. + +### Embed metrics in incidents + +You can embed metrics anywhere [GitLab Markdown](../../user/markdown.md) is +used, such as descriptions, comments on issues, and merge requests. Embedding +metrics helps you share them when discussing incidents or performance issues. +You can output the dashboard directly into any issue, merge request, epic, or +any other Markdown text field in GitLab by +[copying and pasting the link to the metrics dashboard](../metrics/embed.md#embedding-gitlab-managed-kubernetes-metrics). + +You can embed both [GitLab-hosted metrics](../metrics/embed.md) and +[Grafana metrics](../metrics/embed_grafana.md) in incidents and issue +templates. 
diff --git a/doc/operations/incident_management/index.md b/doc/operations/incident_management/index.md index 28e69a6bbfe..60571c03d74 100644 --- a/doc/operations/incident_management/index.md +++ b/doc/operations/incident_management/index.md @@ -8,74 +8,11 @@ info: To determine the technical writer assigned to the Stage/Group associated w > [Introduced](https://gitlab.com/groups/gitlab-org/-/epics/2877) in GitLab 13.0. -Incident Management enables developers to easily discover and view the alerts -generated by their application. By surfacing alert information where the code is -being developed, efficiency and awareness can be increased. - -GitLab offers solutions for handling incidents in your applications and services, -such as [setting up Prometheus alerts](#configure-prometheus-alerts), -[displaying metrics](./alert_details.md#embed-metrics-in-incidents-and-issues), and sending notifications. - -## Alert notifications - -### Slack Notifications - -> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/216326) in GitLab 13.1. - -You can be alerted via a Slack message when a new alert has been received. - -See the [Slack Notifications Service docs](../../user/project/integrations/slack.md) for information on how to set this up. - -### Notify developers of alerts - -GitLab can react to the alerts triggered from your applications and services -by creating issues and alerting developers through email. By default, GitLab -sends these emails to [owners and maintainers](../../user/permissions.md) of the project. -These emails contain details of the alert, and a link for more information. - -To send separate email notifications to users with -[Developer permissions](../../user/permissions.md), see -[Configure incidents](./incidents.md#configure-incidents). - -## Configure Prometheus alerts - -You can set up Prometheus alerts in: - -- [GitLab-managed Prometheus](../metrics/alerts.md) installations. 
-- [Self-managed Prometheus](../metrics/alerts.md#external-prometheus-instances) installations. - -Prometheus alerts are created by the special Alert Bot user. You can't remove this -user, but it does not count toward your license limit. - -## Configure external generic alerts - -GitLab can accept alerts from any source through a generic webhook receiver. -When [configuring the generic alerts integration](./generic_alerts.md), GitLab -creates a unique endpoint which receives a JSON-formatted, customizable payload. - -After configuration, you can manage your alerts using either the -[alerts section](./alerts.md) or the [alert details section](./alert_details.md). - -## Integrate incidents with Slack - -Slack slash commands allow you to control GitLab and view GitLab content without leaving Slack. - -Learn how to [set up Slack slash commands](../../user/project/integrations/slack_slash_commands.md) -and how to [use the available slash commands](../../integration/slash_commands.md). - -## Integrate issues with Zoom - -GitLab enables you to [associate a Zoom meeting with an issue](../../user/project/issues/associate_zoom_meeting.md) -for synchronous communication during incident management. After starting a Zoom -call for an incident, you can associate the conference call with an issue. Your -team members can join the Zoom call without requesting a link. - -## More information - -For information about GitLab and incident management, see: - -- [Generic alerts](generic_alerts.md) -- [Alerts](alerts.md) -- [Alert details](alert_details.md) -- [Incidents](incidents.md) -- [Status page](status_page.md) +Incident Management enables developers to easily triage and view the alerts and incidents +generated by their application. By surfacing alerts and incidents where the code is +being developed, efficiency and awareness can be increased. Check out the following sections for more information: + +- [Integrate your monitoring tools](alert_integrations.md). 
+- Receive [notifications](alert_notifications.md) for triggered alerts. +- Triage [Alerts](alerts.md) and [Incidents](incidents.md). +- Inform stakeholders with [Status Page](status_page.md). diff --git a/doc/operations/incident_management/integrations.md b/doc/operations/incident_management/integrations.md new file mode 100644 index 00000000000..9d4f32ab2bf --- /dev/null +++ b/doc/operations/incident_management/integrations.md @@ -0,0 +1,16 @@ +--- +stage: Monitor +group: Health +info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers +--- + +# Integrations + +> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/245331) in [GitLab Core](https://about.gitlab.com/pricing/) 13.5. + +With Maintainer or higher [permissions](../../user/permissions.md), you can view +the list of configured alert integrations by navigating to +**Settings > Operations** in your project's sidebar menu, and expanding the **Alerts** section. +The list displays the integration name, type, and status (enabled or disabled): + +![Current Integrations](img/integrations_list_v13_5.png) diff --git a/doc/operations/incident_management/status_page.md b/doc/operations/incident_management/status_page.md index 9db3593caec..e5d0ae1ddbb 100644 --- a/doc/operations/incident_management/status_page.md +++ b/doc/operations/incident_management/status_page.md @@ -4,7 +4,7 @@ group: Health info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers --- -# GitLab Status Page **(ULTIMATE)** +# Status Page > [Introduced](https://gitlab.com/groups/gitlab-org/-/epics/2479) in [GitLab Ultimate](https://about.gitlab.com/pricing/) 12.10. 
@@ -25,7 +25,7 @@ Clicking an incident displays a detail page with more information about a partic valid image extension. [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/205166) in GitLab 13.1. - A chronological ordered list of updates to the incident. -## Set up a GitLab Status Page +## Set up a Status Page To configure a GitLab Status Page you must: @@ -37,11 +37,10 @@ To configure a GitLab Status Page you must: ### Configure GitLab with cloud provider information -To provide GitLab with the AWS account information needed to push content to your Status Page: - -NOTE: **Note:** Only AWS S3 is supported as a deploy target. +To provide GitLab with the AWS account information needed to push content to your Status Page: + 1. Sign into GitLab as a user with Maintainer or greater [permissions](../../user/permissions.md). 1. Navigate to **{settings}** **Settings > Operations**. Next to **Status Page**, click **Expand**. @@ -74,8 +73,6 @@ the necessary CI/CD variables to deploy the Status Page to AWS S3: 1. Scroll to **Variables**, and click **Expand**. 1. Add the following variables from your Amazon Console: - `S3_BUCKET_NAME` - The name of the Amazon S3 bucket. - - NOTE: **Note:** If no bucket with the provided name exists, the first pipeline run creates one and configures it for [static website hosting](https://docs.aws.amazon.com/AmazonS3/latest/dev/HostingWebsiteOnS3Setup.html). @@ -128,10 +125,7 @@ To publish an incident: 1. Create an issue in the project you enabled the GitLab Status Page settings in. 1. A [project or group owner](../../user/permissions.md) must use the `/publish` [quick action](../../user/project/quick_actions.md) to publish the - issue to the GitLab Status Page. - - NOTE: **Note:** - Confidential issues can't be published. + issue to the GitLab Status Page. Confidential issues can't be published. A background worker publishes the issue onto the Status Page using the credentials you provided during setup. 
As part of publication, GitLab will: