Diffstat (limited to 'doc/administration/monitoring/prometheus/gitlab_metrics.md')
-rw-r--r-- doc/administration/monitoring/prometheus/gitlab_metrics.md | 72
1 files changed, 68 insertions, 4 deletions
diff --git a/doc/administration/monitoring/prometheus/gitlab_metrics.md b/doc/administration/monitoring/prometheus/gitlab_metrics.md
index 3bfcc9a289e..3dcd1593099 100644
--- a/doc/administration/monitoring/prometheus/gitlab_metrics.md
+++ b/doc/administration/monitoring/prometheus/gitlab_metrics.md
@@ -43,10 +43,52 @@ The following metrics are available:
| redis_ping_latency_seconds | Gauge | 9.4 | Round trip time of the redis ping |
| user_session_logins_total | Counter | 9.4 | Counter of how many users have logged in |
| upload_file_does_not_exist | Counter | 10.7 in EE, 11.5 in CE | Number of times an upload record could not find its file |
-| failed_login_captcha_total | Gauge | 11.0 | Counter of failed CAPTCHA attempts during login |
-| successful_login_captcha_total | Gauge | 11.0 | Counter of successful CAPTCHA attempts during login |
-| unicorn_active_connections | Gauge | 11.0 | The number of active Unicorn connections (workers) |
-| unicorn_queued_connections | Gauge | 11.0 | The number of queued Unicorn connections |
+| failed_login_captcha_total | Gauge | 11.0 | Counter of failed CAPTCHA attempts during login |
+| successful_login_captcha_total | Gauge | 11.0 | Counter of successful CAPTCHA attempts during login |
+| unicorn_active_connections | Gauge | 11.0 | The number of active Unicorn connections (workers) |
+| unicorn_queued_connections | Gauge | 11.0 | The number of queued Unicorn connections |
+| unicorn_workers | Gauge | 12.0 | The number of Unicorn workers |
+
+## Sidekiq Metrics available for Geo **[PREMIUM]**
+
+Sidekiq jobs may also gather metrics, and these metrics can be accessed if the Sidekiq exporter is enabled (for example, via
+the `monitoring.sidekiq_exporter` configuration option in `gitlab.yml`).
+
+| Metric | Type | Since | Description | Labels |
+|:-------------------------------------------- |:------- |:----- |:----------- |:------ |
+| geo_db_replication_lag_seconds | Gauge | 10.2 | Database replication lag (seconds) | url
+| geo_repositories | Gauge | 10.2 | Total number of repositories available on primary | url
+| geo_repositories_synced | Gauge | 10.2 | Number of repositories synced on secondary | url
+| geo_repositories_failed | Gauge | 10.2 | Number of repositories that failed to sync on secondary | url
+| geo_lfs_objects | Gauge | 10.2 | Total number of LFS objects available on primary | url
+| geo_lfs_objects_synced | Gauge | 10.2 | Number of LFS objects synced on secondary | url
+| geo_lfs_objects_failed | Gauge | 10.2 | Number of LFS objects that failed to sync on secondary | url
+| geo_attachments | Gauge | 10.2 | Total number of file attachments available on primary | url
+| geo_attachments_synced | Gauge | 10.2 | Number of attachments synced on secondary | url
+| geo_attachments_failed | Gauge | 10.2 | Number of attachments that failed to sync on secondary | url
+| geo_last_event_id | Gauge | 10.2 | Database ID of the latest event log entry on the primary | url
+| geo_last_event_timestamp | Gauge | 10.2 | UNIX timestamp of the latest event log entry on the primary | url
+| geo_cursor_last_event_id | Gauge | 10.2 | Last database ID of the event log processed by the secondary | url
+| geo_cursor_last_event_timestamp | Gauge | 10.2 | Last UNIX timestamp of the event log processed by the secondary | url
+| geo_status_failed_total | Counter | 10.2 | Number of times retrieving the status from the Geo Node failed | url
+| geo_last_successful_status_check_timestamp | Gauge | 10.2 | Last timestamp when the status was successfully updated | url
+| geo_lfs_objects_synced_missing_on_primary | Gauge | 10.7 | Number of LFS objects marked as synced due to the file missing on the primary | url
+| geo_job_artifacts_synced_missing_on_primary | Gauge | 10.7 | Number of job artifacts marked as synced due to the file missing on the primary | url
+| geo_attachments_synced_missing_on_primary | Gauge | 10.7 | Number of attachments marked as synced due to the file missing on the primary | url
+| geo_repositories_checksummed_count | Gauge | 10.7 | Number of repositories checksummed on primary | url
+| geo_repositories_checksum_failed_count | Gauge | 10.7 | Number of repositories for which checksum calculation failed on primary | url
+| geo_wikis_checksummed_count | Gauge | 10.7 | Number of wikis checksummed on primary | url
+| geo_wikis_checksum_failed_count | Gauge | 10.7 | Number of wikis for which checksum calculation failed on primary | url
+| geo_repositories_verified_count | Gauge | 10.7 | Number of repositories verified on secondary | url
+| geo_repositories_verification_failed_count | Gauge | 10.7 | Number of repositories that failed verification on secondary | url
+| geo_repositories_checksum_mismatch_count | Gauge | 10.7 | Number of repositories with a checksum mismatch on secondary | url
+| geo_wikis_verified_count | Gauge | 10.7 | Number of wikis verified on secondary | url
+| geo_wikis_verification_failed_count | Gauge | 10.7 | Number of wikis that failed verification on secondary | url
+| geo_wikis_checksum_mismatch_count | Gauge | 10.7 | Number of wikis with a checksum mismatch on secondary | url
+| geo_repositories_checked_count | Gauge | 11.1 | Number of repositories that have been checked via `git fsck` | url
+| geo_repositories_checked_failed_count | Gauge | 11.1 | Number of repositories that have a failure from `git fsck` | url
+| geo_repositories_retrying_verification_count | Gauge | 11.2 | Number of repository verification failures that Geo is actively trying to correct on secondary | url
+| geo_wikis_retrying_verification_count | Gauge | 11.2 | Number of wiki verification failures that Geo is actively trying to correct on secondary | url
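
As a rough illustration of how the Geo gauges above might be combined, the following sketch parses a few lines in the Prometheus text exposition format and derives the fraction of repositories synced. The sample values, the `url` label value, and the helper names `parse_metrics` and `sync_ratio` are assumptions made for this example and are not part of GitLab. The parser is deliberately simplified: it ignores labels and keeps one value per metric name.

```python
import re

# Hypothetical exporter output; the numbers and the URL are invented.
SAMPLE = """\
# HELP geo_repositories Total number of repositories available on primary
# TYPE geo_repositories gauge
geo_repositories{url="https://secondary.example.com"} 200
geo_repositories_synced{url="https://secondary.example.com"} 150
geo_repositories_failed{url="https://secondary.example.com"} 4
"""

# Matches "name{labels} value" or "name value"; labels are captured but unused.
SAMPLE_LINE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)"
)


def parse_metrics(text):
    """Return {metric_name: float_value}, skipping comments and blank lines."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if match:
            values[match.group("name")] = float(match.group("value"))
    return values


def sync_ratio(values, kind="repositories"):
    """Fraction of objects synced on the secondary, or None if the total is zero."""
    total = values.get(f"geo_{kind}", 0.0)
    if total == 0:
        return None
    return values.get(f"geo_{kind}_synced", 0.0) / total
```

Calling `sync_ratio(parse_metrics(SAMPLE))` on the sample above divides `geo_repositories_synced` by `geo_repositories`. Passing `kind="lfs_objects"` or `kind="attachments"` would apply the same calculation to the corresponding gauges, provided they are present in the input.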
### Ruby metrics
@@ -59,9 +101,31 @@ Some basic Ruby runtime metrics are available:
| ruby_file_descriptors | Gauge | 11.1 | File descriptors per process |
| ruby_memory_bytes | Gauge | 11.1 | Memory usage by process |
| ruby_sampler_duration_seconds_total | Counter | 11.1 | Time spent collecting stats |
+| ruby_process_cpu_seconds_total | Gauge | 12.0 | Total amount of CPU time per process |
+| ruby_process_max_fds | Gauge | 12.0 | Maximum number of open file descriptors per process |
+| ruby_process_resident_memory_bytes | Gauge | 12.0 | Memory usage by process, measured in bytes |
+| ruby_process_start_time_seconds | Gauge | 12.0 | Time elapsed between system boot and process start, measured in seconds |
[GC.stat]: https://ruby-doc.org/core-2.3.0/GC.html#method-c-stat
+## Puma Metrics **[EXPERIMENTAL]**
+
+When Puma is used instead of Unicorn, the following metrics are available:
+
+| Metric | Type | Since | Description |
+|:-------------------------------------------- |:------- |:----- |:----------- |
+| puma_workers | Gauge | 12.0 | Total number of workers |
+| puma_running_workers | Gauge | 12.0 | Number of booted workers |
+| puma_stale_workers | Gauge | 12.0 | Number of old workers |
+| puma_phase | Gauge | 12.0 | Phase number (increased during phased restarts) |
+| puma_running | Gauge | 12.0 | Number of running threads |
+| puma_queued_connections | Gauge | 12.0 | Number of connections in that worker's "todo" set waiting for a worker thread |
+| puma_active_connections | Gauge | 12.0 | Number of threads processing a request |
+| puma_pool_capacity | Gauge | 12.0 | Number of requests the worker is capable of taking right now |
+| puma_max_threads | Gauge | 12.0 | Maximum number of worker threads |
+| puma_idle_threads | Gauge | 12.0 | Number of spawned threads which are not processing a request |
+| rack_state_total | Gauge | 12.0 | Number of requests in a given rack state |
+
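
One possible use of the Puma gauges above is estimating how busy a worker is. The sketch below assumes that `puma_max_threads` minus `puma_pool_capacity` approximates the number of threads currently occupied. That derivation, and the function name `puma_busy_fraction`, are assumptions for illustration rather than something this document defines.

```python
def puma_busy_fraction(max_threads, pool_capacity):
    """Estimate the share of a worker's threads that are occupied.

    Assumes busy threads ~= max_threads - pool_capacity. The result is
    clamped to [0.0, 1.0] in case the two gauges were sampled at
    slightly different moments.
    """
    if max_threads <= 0:
        raise ValueError("max_threads must be positive")
    busy = max_threads - pool_capacity
    return min(max(busy / max_threads, 0.0), 1.0)
```

For example, a worker reporting `puma_max_threads` of 4 and `puma_pool_capacity` of 1 would be treated as three-quarters busy under this assumption.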
## Metrics shared directory
GitLab's Prometheus client requires a directory to store metrics data shared between multi-process services.