Diffstat (limited to 'doc/administration/monitoring/prometheus/gitlab_metrics.md')
-rw-r--r-- | doc/administration/monitoring/prometheus/gitlab_metrics.md | 120 |
1 files changed, 75 insertions, 45 deletions
diff --git a/doc/administration/monitoring/prometheus/gitlab_metrics.md b/doc/administration/monitoring/prometheus/gitlab_metrics.md
index f725db9a039..f3084b1732e 100644
--- a/doc/administration/monitoring/prometheus/gitlab_metrics.md
+++ b/doc/administration/monitoring/prometheus/gitlab_metrics.md
@@ -6,30 +6,29 @@ info: To determine the technical writer assigned to the Stage/Group associated w
 
 # GitLab Prometheus metrics
 
->**Note:**
-Available since [Omnibus GitLab 9.3](https://gitlab.com/gitlab-org/gitlab-foss/issues/29118). For
-installations from source you'll have to configure it yourself.
-
 To enable the GitLab Prometheus metrics:
 
-1. Log into GitLab as an administrator.
-1. Navigate to **Admin Area > Settings > Metrics and profiling**.
+1. Log into GitLab as a user with [administrator permissions](../../../user/permissions.md).
+1. Navigate to **{admin}** **Admin Area > Settings > Metrics and profiling**.
 1. Find the **Metrics - Prometheus** section, and click **Enable Prometheus Metrics**.
 1. [Restart GitLab](../../restart_gitlab.md#omnibus-gitlab-restart) for the changes to take effect.
 
+NOTE: **Note:**
+For installations from source you'll have to configure it yourself.
+
 ## Collecting the metrics
 
 GitLab monitors its own internal service metrics, and makes them available at the
-`/-/metrics` endpoint. Unlike other [Prometheus](https://prometheus.io) exporters, in order to access
-it, the client IP needs to be [included in a whitelist](../ip_whitelist.md).
+`/-/metrics` endpoint. Unlike other [Prometheus](https://prometheus.io) exporters, to access
+it, the client IP address needs to be [explicitly allowed](../ip_whitelist.md).
 
-For Omnibus and Chart installations, these metrics are automatically enabled
-and collected as of [GitLab
-9.4](https://gitlab.com/gitlab-org/omnibus-gitlab/-/merge_requests/1702). For
-source installations or earlier versions, these metrics will need to be enabled
+For [Omnibus GitLab](https://docs.gitlab.com/omnibus/) and Chart installations,
+these metrics are enabled and collected as of
+[GitLab 9.4](https://gitlab.com/gitlab-org/omnibus-gitlab/-/merge_requests/1702).
+For source installations or earlier versions, these metrics must be enabled
 manually and collected by a Prometheus server.
 
-See also [Sidekiq metrics](#sidekiq-metrics) for how to enable and view metrics from Sidekiq nodes.
+For enabling and viewing metrics from Sidekiq nodes, see [Sidekiq metrics](#sidekiq-metrics).
 
 ## Metrics available
 
@@ -43,32 +42,33 @@ The following metrics are available:
 | `gitlab_cache_operation_duration_seconds` | Histogram | 10.2 | Cache access time | |
 | `gitlab_cache_operations_total` | Counter | 12.2 | Cache operations by controller/action | `controller`, `action`, `operation` |
 | `gitlab_ci_pipeline_creation_duration_seconds` | Histogram | 13.0 | Time in seconds it takes to create a CI/CD pipeline | |
+| `gitlab_ci_pipeline_size_builds` | Histogram | 13.1 | Total number of builds within a pipeline grouped by a pipeline source | `source` |
 | `job_waiter_started_total` | Counter | 12.9 | Number of batches of jobs started where a web request is waiting for the jobs to complete | `worker` |
 | `job_waiter_timeouts_total` | Counter | 12.9 | Number of batches of jobs that timed out where a web request is waiting for the jobs to complete | `worker` |
 | `gitlab_database_transaction_seconds` | Histogram | 12.1 | Time spent in database transactions, in seconds | |
 | `gitlab_method_call_duration_seconds` | Histogram | 10.2 | Method calls real duration | `controller`, `action`, `module`, `method` |
 | `gitlab_page_out_of_bounds` | Counter | 12.8 | Counter for the PageLimiter pagination limit being hit | `controller`, `action`, `bot` |
 | `gitlab_rails_queue_duration_seconds` | Histogram | 9.4 | Measures latency between GitLab Workhorse forwarding a request to Rails | |
-| `gitlab_sql_duration_seconds` | Histogram | 10.2 | SQL execution time, excluding SCHEMA operations and BEGIN / COMMIT | |
-| `gitlab_transaction_allocated_memory_bytes` | Histogram | 10.2 | Allocated memory for all transactions (gitlab_transaction_* metrics) | |
+| `gitlab_sql_duration_seconds` | Histogram | 10.2 | SQL execution time, excluding `SCHEMA` operations and `BEGIN` / `COMMIT` | |
+| `gitlab_transaction_allocated_memory_bytes` | Histogram | 10.2 | Allocated memory for all transactions (`gitlab_transaction_*` metrics) | |
 | `gitlab_transaction_cache_<key>_count_total` | Counter | 10.2 | Counter for total Rails cache calls (per key) | |
 | `gitlab_transaction_cache_<key>_duration_total` | Counter | 10.2 | Counter for total time (seconds) spent in Rails cache calls (per key) | |
 | `gitlab_transaction_cache_count_total` | Counter | 10.2 | Counter for total Rails cache calls (aggregate) | |
 | `gitlab_transaction_cache_duration_total` | Counter | 10.2 | Counter for total time (seconds) spent in Rails cache calls (aggregate) | |
 | `gitlab_transaction_cache_read_hit_count_total` | Counter | 10.2 | Counter for cache hits for Rails cache calls | `controller`, `action` |
 | `gitlab_transaction_cache_read_miss_count_total` | Counter | 10.2 | Counter for cache misses for Rails cache calls | `controller`, `action` |
-| `gitlab_transaction_duration_seconds` | Histogram | 10.2 | Duration for all transactions (gitlab_transaction_* metrics) | `controller`, `action` |
+| `gitlab_transaction_duration_seconds` | Histogram | 10.2 | Duration for all transactions (`gitlab_transaction_*` metrics) | `controller`, `action` |
 | `gitlab_transaction_event_build_found_total` | Counter | 9.4 | Counter for build found for API /jobs/request | |
 | `gitlab_transaction_event_build_invalid_total` | Counter | 9.4 | Counter for build invalid due to concurrency conflict for API /jobs/request | |
 | `gitlab_transaction_event_build_not_found_cached_total` | Counter | 9.4 | Counter for cached response of build not found for API /jobs/request | |
 | `gitlab_transaction_event_build_not_found_total` | Counter | 9.4 | Counter for build not found for API /jobs/request | |
 | `gitlab_transaction_event_change_default_branch_total` | Counter | 9.4 | Counter when default branch is changed for any repository | |
 | `gitlab_transaction_event_create_repository_total` | Counter | 9.4 | Counter when any repository is created | |
-| `gitlab_transaction_event_etag_caching_cache_hit_total` | Counter | 9.4 | Counter for etag cache hit. | `endpoint` |
-| `gitlab_transaction_event_etag_caching_header_missing_total` | Counter | 9.4 | Counter for etag cache miss - header missing | `endpoint` |
-| `gitlab_transaction_event_etag_caching_key_not_found_total` | Counter | 9.4 | Counter for etag cache miss - key not found | `endpoint` |
-| `gitlab_transaction_event_etag_caching_middleware_used_total` | Counter | 9.4 | Counter for etag middleware accessed | `endpoint` |
-| `gitlab_transaction_event_etag_caching_resource_changed_total` | Counter | 9.4 | Counter for etag cache miss - resource changed | `endpoint` |
+| `gitlab_transaction_event_etag_caching_cache_hit_total` | Counter | 9.4 | Counter for ETag cache hit. | `endpoint` |
+| `gitlab_transaction_event_etag_caching_header_missing_total` | Counter | 9.4 | Counter for ETag cache miss - header missing | `endpoint` |
+| `gitlab_transaction_event_etag_caching_key_not_found_total` | Counter | 9.4 | Counter for ETag cache miss - key not found | `endpoint` |
+| `gitlab_transaction_event_etag_caching_middleware_used_total` | Counter | 9.4 | Counter for ETag middleware accessed | `endpoint` |
+| `gitlab_transaction_event_etag_caching_resource_changed_total` | Counter | 9.4 | Counter for ETag cache miss - resource changed | `endpoint` |
 | `gitlab_transaction_event_fork_repository_total` | Counter | 9.4 | Counter for repository forks (RepositoryForkWorker). Only incremented when source repository exists | |
 | `gitlab_transaction_event_import_repository_total` | Counter | 9.4 | Counter for repository imports (RepositoryImportWorker) | |
 | `gitlab_transaction_event_push_branch_total` | Counter | 9.4 | Counter for all branch pushes | |
@@ -92,6 +92,13 @@ The following metrics are available:
 | `gitlab_view_rendering_duration_seconds` | Histogram | 10.2 | Duration for views (histogram) | `controller`, `action`, `view` |
 | `http_requests_total` | Counter | 9.4 | Rack request count | `method` |
 | `http_request_duration_seconds` | Histogram | 9.4 | HTTP response time from rack middleware | `method`, `status` |
+| `gitlab_transaction_db_count_total` | Counter | 13.1 | Counter for total number of sql calls | `controller`, `action` |
+| `gitlab_transaction_db_write_count_total` | Counter | 13.1 | Counter for total number of write sql calls | `controller`, `action` |
+| `gitlab_transaction_db_cached_count_total` | Counter | 13.1 | Counter for total number of cached sql calls | `controller`, `action` |
+| `http_redis_requests_duration_seconds` | Histogram | 13.1 | Redis requests duration during web transactions | `controller`, `action` |
+| `http_redis_requests_total` | Counter | 13.1 | Redis requests count during web transactions | `controller`, `action` |
+| `http_elasticsearch_requests_duration_seconds` **(STARTER)** | Histogram | 13.1 | Elasticsearch requests duration during web transactions | `controller`, `action` |
+| `http_elasticsearch_requests_total` **(STARTER)** | Counter | 13.1 | Elasticsearch requests count during web transactions | `controller`, `action` |
 | `pipelines_created_total` | Counter | 9.4 | Counter of pipelines created | |
 | `rack_uncaught_errors_total` | Counter | 9.4 | Rack connections handling uncaught errors count | |
 | `user_session_logins_total` | Counter | 9.4 | Counter of how many users have logged in | |
@@ -99,7 +106,7 @@ The following metrics are available:
 | `failed_login_captcha_total` | Gauge | 11.0 | Counter of failed CAPTCHA attempts during login | |
 | `successful_login_captcha_total` | Gauge | 11.0 | Counter of successful CAPTCHA attempts during login | |
 | `auto_devops_pipelines_completed_total` | Counter | 12.7 | Counter of completed Auto DevOps pipelines, labeled by status | |
-| `gitlab_metrics_dashboard_processing_time_ms` | Summary | 12.10 | Metrics dashboard processing time in milliseconds | service, stages |
+| `gitlab_metrics_dashboard_processing_time_ms` | Summary | 12.10 | Metrics dashboard processing time in milliseconds | service, stages |
 
 ## Metrics controlled by a feature flag
 
@@ -119,13 +126,17 @@ configuration option in `gitlab.yml`. These metrics are served from the
 
 | Metric | Type | Since | Description | Labels |
 |:---------------------------------------------- |:------- |:----- |:----------- |:------ |
-| `sidekiq_jobs_cpu_seconds` | Histogram | 12.4 | Seconds of cpu time to run Sidekiq job | `queue`, `boundary`, `external_dependencies`, `feature_category`, `job_status`, `urgency` |
+| `sidekiq_jobs_cpu_seconds` | Histogram | 12.4 | Seconds of CPU time to run Sidekiq job | `queue`, `boundary`, `external_dependencies`, `feature_category`, `job_status`, `urgency` |
 | `sidekiq_jobs_completion_seconds` | Histogram | 12.2 | Seconds to complete Sidekiq job | `queue`, `boundary`, `external_dependencies`, `feature_category`, `job_status`, `urgency` |
 | `sidekiq_jobs_db_seconds` | Histogram | 12.9 | Seconds of DB time to run Sidekiq job | `queue`, `boundary`, `external_dependencies`, `feature_category`, `job_status`, `urgency` |
 | `sidekiq_jobs_gitaly_seconds` | Histogram | 12.9 | Seconds of Gitaly time to run Sidekiq job | `queue`, `boundary`, `external_dependencies`, `feature_category`, `job_status`, `urgency` |
+| `sidekiq_redis_requests_duration_seconds` | Histogram | 13.1 | Duration in seconds that a Sidekiq job spent querying a Redis server | `queue`, `boundary`, `external_dependencies`, `feature_category`, `job_status`, `urgency` |
+| `sidekiq_elasticsearch_requests_duration_seconds` | Histogram | 13.1 | Duration in seconds that a Sidekiq job spent in requests to an Elasticsearch server | `queue`, `boundary`, `external_dependencies`, `feature_category`, `job_status`, `urgency` |
 | `sidekiq_jobs_queue_duration_seconds` | Histogram | 12.5 | Duration in seconds that a Sidekiq job was queued before being executed | `queue`, `boundary`, `external_dependencies`, `feature_category`, `urgency` |
 | `sidekiq_jobs_failed_total` | Counter | 12.2 | Sidekiq jobs failed | `queue`, `boundary`, `external_dependencies`, `feature_category`, `urgency` |
 | `sidekiq_jobs_retried_total` | Counter | 12.2 | Sidekiq jobs retried | `queue`, `boundary`, `external_dependencies`, `feature_category`, `urgency` |
+| `sidekiq_redis_requests_total` | Counter | 13.1 | Redis requests during a Sidekiq job execution | `queue`, `boundary`, `external_dependencies`, `feature_category`, `job_status`, `urgency` |
+| `sidekiq_elasticsearch_requests_total` | Counter | 13.1 | Elasticsearch requests during a Sidekiq job execution | `queue`, `boundary`, `external_dependencies`, `feature_category`, `job_status`, `urgency` |
 | `sidekiq_running_jobs` | Gauge | 12.2 | Number of Sidekiq jobs running | `queue`, `boundary`, `external_dependencies`, `feature_category`, `urgency` |
 | `sidekiq_concurrency` | Gauge | 12.5 | Maximum number of Sidekiq jobs | |
 | `geo_db_replication_lag_seconds` | Gauge | 10.2 | Database replication lag (seconds) | `url` |
@@ -172,7 +183,29 @@ The following metrics are available:
 
 | Metric | Type | Since | Description |
 |:--------------------------------- |:--------- |:------------------------------------------------------------- |:-------------------------------------- |
-| `db_load_balancing_hosts` | Gauge | [12.3](https://gitlab.com/gitlab-org/gitlab/issues/13630) | Current number of load balancing hosts |
+| `db_load_balancing_hosts` | Gauge | [12.3](https://gitlab.com/gitlab-org/gitlab/-/issues/13630) | Current number of load balancing hosts |
+
+## Connection pool metrics
+
+These metrics record the status of the database [connection pools](https://api.rubyonrails.org/classes/ActiveRecord/ConnectionAdapters/ConnectionPool.html).
+
+They all have these labels:
+
+1. `class` - the Ruby class being recorded.
+   1. `ActiveRecord::Base` is the main database connection.
+   1. `Geo::TrackingBase` is the connection to the Geo tracking database, if
+      enabled.
+1. `host` - the host name used to connect to the database.
+1. `port` - the port used to connect to the database.
+
+| Metric | Type | Since | Description |
+|:----------------------------------------------|:------|:------|:--------------------------------------------------|
+| `gitlab_database_connection_pool_size` | Gauge | 13.0 | Total connection pool capacity |
+| `gitlab_database_connection_pool_connections` | Gauge | 13.0 | Current connections in the pool |
+| `gitlab_database_connection_pool_busy` | Gauge | 13.0 | Connections in use where the owner is still alive |
+| `gitlab_database_connection_pool_dead` | Gauge | 13.0 | Connections in use where the owner is not alive |
+| `gitlab_database_connection_pool_idle` | Gauge | 13.0 | Connections not in use |
+| `gitlab_database_connection_pool_waiting` | Gauge | 13.0 | Threads currently waiting on this queue |
 
 ## Ruby metrics
 
@@ -205,31 +238,28 @@ Unicorn specific metrics, when Unicorn is used.
 
 When Puma is used instead of Unicorn, the following metrics are available:
 
-| Metric | Type | Since | Description |
-|:---------------------------------------------- |:------- |:----- |:----------- |
-| `puma_workers` | Gauge | 12.0 | Total number of workers |
-| `puma_running_workers` | Gauge | 12.0 | Number of booted workers |
-| `puma_stale_workers` | Gauge | 12.0 | Number of old workers |
-| `puma_running` | Gauge | 12.0 | Number of running threads |
-| `puma_queued_connections` | Gauge | 12.0 | Number of connections in that worker's "to do" set waiting for a worker thread |
-| `puma_active_connections` | Gauge | 12.0 | Number of threads processing a request |
-| `puma_pool_capacity` | Gauge | 12.0 | Number of requests the worker is capable of taking right now |
-| `puma_max_threads` | Gauge | 12.0 | Maximum number of worker threads |
-| `puma_idle_threads` | Gauge | 12.0 | Number of spawned threads which are not processing a request |
-| `puma_killer_terminations_total` | Gauge | 12.0 | Number of workers terminated by PumaWorkerKiller |
+| Metric | Type | Since | Description |
+|:--------------------------------- |:------- |:----- |:----------- |
+| `puma_workers` | Gauge | 12.0 | Total number of workers |
+| `puma_running_workers` | Gauge | 12.0 | Number of booted workers |
+| `puma_stale_workers` | Gauge | 12.0 | Number of old workers |
+| `puma_running` | Gauge | 12.0 | Number of running threads |
+| `puma_queued_connections` | Gauge | 12.0 | Number of connections in that worker's "to do" set waiting for a worker thread |
+| `puma_active_connections` | Gauge | 12.0 | Number of threads processing a request |
+| `puma_pool_capacity` | Gauge | 12.0 | Number of requests the worker is capable of taking right now |
+| `puma_max_threads` | Gauge | 12.0 | Maximum number of worker threads |
+| `puma_idle_threads` | Gauge | 12.0 | Number of spawned threads which are not processing a request |
+| `puma_killer_terminations_total` | Gauge | 12.0 | Number of workers terminated by PumaWorkerKiller |
 
 ## Metrics shared directory
 
 GitLab's Prometheus client requires a directory to store metrics data shared between multi-process services.
 Those files are shared among all instances running under Unicorn server.
-The directory needs to be accessible to all running Unicorn's processes otherwise
-metrics will not function correctly.
-
-For best performance its advisable that this directory will be located in `tmpfs`.
-
-Its location is configured using environment variable `prometheus_multiproc_dir`.
+The directory must be accessible to all running Unicorn's processes, or
+metrics won't function correctly.
 
-If GitLab is installed using Omnibus and `tmpfs` is available then metrics
-directory will be automatically configured.
+This directory's location is configured using environment variable `prometheus_multiproc_dir`.
+For best performance, create this directory in `tmpfs`.
 
-[← Back to the main Prometheus page](index.md)
+If GitLab is installed using [Omnibus GitLab](https://docs.gitlab.com/omnibus/)
+and `tmpfs` is available, then the metrics directory will be configured for you.
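For readers who want to check the metrics listed in this change, the following is a minimal sketch of reading counter samples from output like that served at the `/-/metrics` endpoint. It assumes the endpoint returns the standard Prometheus text exposition format. The helper names `parse_metrics` and `total_by_label` are hypothetical, and the sample values are invented for illustration; only the metric and label names come from the tables in the diff.

```python
import re
from collections import defaultdict

# One sample line of the Prometheus text exposition format:
#   metric_name{label="value",...} 123.0
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples, skipping comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and "# HELP" / "# TYPE" comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore anything that is not a sample line
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def total_by_label(samples, metric, label):
    """Sum the values of one metric, grouped by a single label."""
    totals = defaultdict(float)
    for name, labels, value in samples:
        if name == metric:
            totals[labels.get(label, "")] += value
    return dict(totals)


# Invented sample payload; real output depends on the instance.
SAMPLE = """\
# HELP http_requests_total Rack request count
# TYPE http_requests_total counter
http_requests_total{method="get"} 120
http_requests_total{method="post"} 30
gitlab_transaction_db_count_total{controller="ProjectsController",action="show"} 42
"""

samples = parse_metrics(SAMPLE)
print(total_by_label(samples, "http_requests_total", "method"))
```

In practice a Prometheus server does this scraping itself; a script like this is only useful for a quick manual check from a client whose IP address is allowed to reach the endpoint.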