From c2cbfc5c4afbe8385659f97769db8450284639cf Mon Sep 17 00:00:00 2001
From: Kamil Trzciński
Date: Tue, 20 Aug 2019 17:25:04 +0200
Subject: Rework `Sidekiq::JobsThreads` into `Monitor`

This makes:

- `Middleware::Monitor` very shallow, so it only requests tracking of Sidekiq jobs,
- `SidekiqStatus::Monitor` responsible for maintaining a persistent connection
  to receive messages,
- `SidekiqStatus::Monitor` always use structured logging and instance variables.
---
 doc/administration/troubleshooting/sidekiq.md | 118 ++++++++++++++++++++++++++
 1 file changed, 118 insertions(+)

(limited to 'doc')

diff --git a/doc/administration/troubleshooting/sidekiq.md b/doc/administration/troubleshooting/sidekiq.md
index 7067958ecb4..9b016c64e29 100644
--- a/doc/administration/troubleshooting/sidekiq.md
+++ b/doc/administration/troubleshooting/sidekiq.md
@@ -169,3 +169,121 @@ The PostgreSQL wiki has details on the query you can run to see blocking
 queries. The query is different based on PostgreSQL version. See
 [Lock Monitoring](https://wiki.postgresql.org/wiki/Lock_Monitoring) for
 the query details.
+
+## Managing Sidekiq queues
+
+It is possible to use the [Sidekiq API](https://github.com/mperham/sidekiq/wiki/API)
+to perform a number of troubleshooting steps on Sidekiq.
+
+These are administrative commands and they should only be used when the
+admin interface is not suitable due to the scale of the installation.
+
+All of these commands should be run using `gitlab-rails console`.
+
+### View the queue size
+
+```ruby
+Sidekiq::Queue.new("pipeline_processing:build_queue").size
+```
+
+### Enumerate all enqueued jobs
+
+```ruby
+queue = Sidekiq::Queue.new("chaos:chaos_sleep")
+queue.each do |job|
+  # job.klass # => 'MyWorker'
+  # job.args # => [1, 2, 3]
+  # job.jid # => jid
+  # job.queue # => chaos:chaos_sleep
+  # job["retry"] # => 3
+  # job.item # => {
+  #   "class"=>"Chaos::SleepWorker",
+  #   "args"=>[1000],
+  #   "retry"=>3,
+  #   "queue"=>"chaos:chaos_sleep",
+  #   "backtrace"=>true,
+  #   "queue_namespace"=>"chaos",
+  #   "jid"=>"39bc482b823cceaf07213523",
+  #   "created_at"=>1566317076.266069,
+  #   "correlation_id"=>"c323b832-a857-4858-b695-672de6f0e1af",
+  #   "enqueued_at"=>1566317076.26761
+  # }
+
+  # job.delete if job.jid == 'abcdef1234567890'
+end
+```
+
+### Enumerate currently running jobs
+
+```ruby
+workers = Sidekiq::Workers.new
+workers.each do |process_id, thread_id, work|
+  # process_id is a unique identifier per Sidekiq process
+  # thread_id is a unique identifier per thread
+  # work is a Hash which looks like:
+  # {"queue"=>"chaos:chaos_sleep",
+  #  "payload"=>
+  #   {"class"=>"Chaos::SleepWorker",
+  #    "args"=>[1000],
+  #    "retry"=>3,
+  #    "queue"=>"chaos:chaos_sleep",
+  #    "backtrace"=>true,
+  #    "queue_namespace"=>"chaos",
+  #    "jid"=>"b2a31e3eac7b1a99ff235869",
+  #    "created_at"=>1566316974.9215662,
+  #    "correlation_id"=>"e484fb26-7576-45f9-bf21-b99389e1c53c",
+  #    "enqueued_at"=>1566316974.9229589},
+  #  "run_at"=>1566316974}
+end
+```
+
+### Remove Sidekiq jobs for given parameters (destructive)
+
+```ruby
+# For jobs enqueued like this:
+# RepositoryImportWorker.new.perform_async(100)
+id_list = [100]
+
+queue = Sidekiq::Queue.new('repository_import')
+queue.each do |job|
+  job.delete if id_list.include?(job.args[0])
+end
+```
+
+### Remove a specific job ID (destructive)
+
+```ruby
+queue = Sidekiq::Queue.new('repository_import')
+queue.each do |job|
+  job.delete if job.jid == 'my-job-id'
+end
+```
+
+## Canceling running jobs (destructive)
+
+> Introduced in GitLab 12.3.
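+
+To cancel a running job you need its job ID (`jid`). If you do not already
+know it, the following sketch reuses the `Sidekiq::Workers` enumeration shown
+above to list the `jid` of running jobs of a given worker class;
+`Chaos::SleepWorker` is only an example class name, replace it with the worker
+you are interested in:
+
+```ruby
+# List the jid of every currently running job of a given worker class.
+# "Chaos::SleepWorker" is only an example class name.
+Sidekiq::Workers.new.each do |process_id, thread_id, work|
+  payload = work["payload"]
+  if payload["class"] == "Chaos::SleepWorker"
+    puts "#{payload['jid']} (process #{process_id}, thread #{thread_id})"
+  end
+end
+```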
+
+Canceling a running job is a highly risky operation and should only be used
+as a last resort. Doing so might result in data corruption, as the job is
+interrupted mid-execution and there is no guarantee that it properly rolls
+back its transactions.
+
+```ruby
+Gitlab::SidekiqMonitor.cancel_job('job-id')
+```
+
+> This requires Sidekiq to be run with the `SIDEKIQ_MONITOR_WORKER=1`
+> environment variable.
+
+To perform the interrupt we use `Thread.raise`, which has a number of
+drawbacks, as mentioned in
+[Why Ruby’s Timeout is dangerous (and Thread.raise is terrifying)](https://jvns.ca/blog/2015/11/27/why-rubys-timeout-is-dangerous-and-thread-dot-raise-is-terrifying/):
+
+> This is where the implications get interesting, and terrifying. This means that an exception can get raised:
+>
+> * during a network request (ok, as long as the surrounding code is prepared to catch Timeout::Error)
+> * during the cleanup for the network request
+> * during a rescue block
+> * while creating an object to save to the database afterwards
+> * in any of your code, regardless of whether it could have possibly raised an exception before
+>
+> Nobody writes code to defend against an exception being raised on literally any line. That’s not even possible. So Thread.raise is basically like a sneak attack on your code that could result in almost anything. It would probably be okay if it were pure-functional code that did not modify any state. But this is Ruby, so that’s unlikely :)
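+
+After canceling a job, you can re-run the `Sidekiq::Workers` enumeration shown
+earlier to confirm that the job actually stopped. The sketch below is one way
+to do this; `my-job-id` is a placeholder for the `jid` of the job you canceled:
+
+```ruby
+# Check whether a given jid still appears in the set of running jobs.
+# 'my-job-id' is a placeholder - replace it with the jid you canceled.
+jid_to_check = 'my-job-id'
+still_running = false
+
+Sidekiq::Workers.new.each do |_process_id, _thread_id, work|
+  still_running = true if work["payload"]["jid"] == jid_to_check
+end
+
+puts still_running ? "#{jid_to_check} is still running" : "#{jid_to_check} is no longer running"
+```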