patches/rt-Increase-decrease-the-nr-of-migratory-tasks-when-.patch


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154

From: Daniel Bristot de Oliveira <bristot@redhat.com>
Date: Mon, 26 Jun 2017 17:07:15 +0200
Subject: rt: Increase/decrease the nr of migratory tasks when enabling/disabling migration

There is a problem in the migrate_disable()/enable() implementation
regarding the number of migratory tasks in the rt/dl RQs. The problem
is the following:

When a task is attached to the rt runqueue, it is checked if it either
can run in more than one CPU, or if it is with migration disable. If
either check is true, the rt_rq->rt_nr_migratory counter is not
increased. The counter increases otherwise.

When the task is detached, the same check is done. If either check is
true, the rt_rq->rt_nr_migratory counter is not decreased. The counter
decreases otherwise. The same check is done in the dl scheduler.

One important thing is that, migrate disable/enable does not touch this
counter for tasks attached to the rt rq. So suppose the following chain
of events.

Assumptions:
Task A is the only runnable task in A      Task B runs on the CPU B
Task A runs on CFS (non-rt)                Task B has RT priority
Thus, rt_nr_migratory is 0                 B is running
Task A can run on all CPUS.

Timeline:
        CPU A/TASK A                        CPU B/TASK B
A takes the rt mutex X                           .
A disables migration                             .
           .                          B tries to take the rt mutex X
           .                          As it is held by A {
           .                            A inherits the rt priority of B
           .                            A is dequeued from CFS RQ of CPU A
           .                            A is enqueued in the RT RQ of CPU A
           .                            As migration is disabled
           .                              rt_nr_migratory in A is not increased
           .
A enables migration
A releases the rt mutex X {
  A returns to its original priority
  A ask to be dequeued from RT RQ {
    As migration is now enabled and it can run on all CPUS {
       rt_nr_migratory should be decreased
       As rt_nr_migratory is 0, rt_nr_migratory under flows
    }
}

This variable is important because it notifies if there are more than one
runnable & migratory task in the runqueue. If there are more than one
tasks, the rt_rq is set as overloaded, and then tries to migrate some
tasks. This rule is important to keep the scheduler working conserving,
that is, in a system with M CPUs, the M highest priority tasks should be
running.

As rt_nr_migratory is unsigned, it will become > 0, notifying that the
RQ is overloaded, activating pushing mechanism without need.

This patch fixes this problem by decreasing/increasing the
rt/dl_nr_migratory in the migrate disable/enable operations.

Reported-by: Pei Zhang <pezhang@redhat.com>
Reported-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/sched/core.c |   49 ++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 44 insertions(+), 5 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6807,6 +6807,47 @@ const u32 sched_prio_to_wmult[40] = {
 
 #if defined(CONFIG_PREEMPT_COUNT) && defined(CONFIG_SMP)
 
+static inline void
+update_nr_migratory(struct task_struct *p, long delta)
+{
+	if (unlikely((p->sched_class == &rt_sched_class ||
+		      p->sched_class == &dl_sched_class) &&
+		      p->nr_cpus_allowed > 1)) {
+		if (p->sched_class == &rt_sched_class)
+			task_rq(p)->rt.rt_nr_migratory += delta;
+		else
+			task_rq(p)->dl.dl_nr_migratory += delta;
+	}
+}
+
+static inline void
+migrate_disable_update_cpus_allowed(struct task_struct *p)
+{
+	struct rq *rq;
+	struct rq_flags rf;
+
+	p->cpus_ptr = cpumask_of(smp_processor_id());
+
+	rq = task_rq_lock(p, &rf);
+	update_nr_migratory(p, -1);
+	p->nr_cpus_allowed = 1;
+	task_rq_unlock(rq, p, &rf);
+}
+
+static inline void
+migrate_enable_update_cpus_allowed(struct task_struct *p)
+{
+	struct rq *rq;
+	struct rq_flags rf;
+
+	p->cpus_ptr = &p->cpus_mask;
+
+	rq = task_rq_lock(p, &rf);
+	p->nr_cpus_allowed = cpumask_weight(&p->cpus_mask);
+	update_nr_migratory(p, 1);
+	task_rq_unlock(rq, p, &rf);
+}
+
 void migrate_disable(void)
 {
 	struct task_struct *p = current;
@@ -6827,10 +6868,9 @@ void migrate_disable(void)
 	}
 
 	preempt_disable();
-	p->migrate_disable = 1;
 
-	p->cpus_ptr = cpumask_of(smp_processor_id());
-	p->nr_cpus_allowed = 1;
+	migrate_disable_update_cpus_allowed(p);
+	p->migrate_disable = 1;
 
 	preempt_enable();
 }
@@ -6859,9 +6899,8 @@ void migrate_enable(void)
 
 	preempt_disable();
 
-	p->cpus_ptr = &p->cpus_mask;
-	p->nr_cpus_allowed = cpumask_weight(&p->cpus_mask);
 	p->migrate_disable = 0;
+	migrate_enable_update_cpus_allowed(p);
 
 	if (p->migrate_disable_update) {
 		struct rq *rq;