From 25c2d55c00c6097e6792ebf21e31342f23b9b768 Mon Sep 17 00:00:00 2001 From: Li Zefan Date: Wed, 24 Mar 2010 13:17:50 +0800 Subject: sched: Remove USER_SCHED from documentation USER_SCHED has been removed, so update the documentation accordingly. Signed-off-by: Li Zefan Signed-off-by: Peter Zijlstra Acked-by: Serge E. Hallyn LKML-Reference: <4BA9A07E.8070508@cn.fujitsu.com> Signed-off-by: Ingo Molnar --- Documentation/scheduler/sched-design-CFS.txt | 54 ++-------------------------- Documentation/scheduler/sched-rt-group.txt | 20 +++-------- 2 files changed, 7 insertions(+), 67 deletions(-) (limited to 'Documentation') diff --git a/Documentation/scheduler/sched-design-CFS.txt b/Documentation/scheduler/sched-design-CFS.txt index 6f33593e59e2..8239ebbcddce 100644 --- a/Documentation/scheduler/sched-design-CFS.txt +++ b/Documentation/scheduler/sched-design-CFS.txt @@ -211,7 +211,7 @@ provide fair CPU time to each such task group. For example, it may be desirable to first provide fair CPU time to each user on the system and then to each task belonging to a user. -CONFIG_GROUP_SCHED strives to achieve exactly that. It lets tasks to be +CONFIG_CGROUP_SCHED strives to achieve exactly that. It lets tasks to be grouped and divides CPU time fairly among such groups. CONFIG_RT_GROUP_SCHED permits to group real-time (i.e., SCHED_FIFO and @@ -220,38 +220,11 @@ SCHED_RR) tasks. CONFIG_FAIR_GROUP_SCHED permits to group CFS (i.e., SCHED_NORMAL and SCHED_BATCH) tasks. -At present, there are two (mutually exclusive) mechanisms to group tasks for -CPU bandwidth control purposes: - - - Based on user id (CONFIG_USER_SCHED) - - With this option, tasks are grouped according to their user id. - - - Based on "cgroup" pseudo filesystem (CONFIG_CGROUP_SCHED) - - This options needs CONFIG_CGROUPS to be defined, and lets the administrator + These options need CONFIG_CGROUPS to be defined, and let the administrator create arbitrary groups of tasks, using the "cgroup" pseudo filesystem. See Documentation/cgroups/cgroups.txt for more information about this filesystem. -Only one of these options to group tasks can be chosen and not both. - -When CONFIG_USER_SCHED is defined, a directory is created in sysfs for each new -user and a "cpu_share" file is added in that directory. - - # cd /sys/kernel/uids - # cat 512/cpu_share # Display user 512's CPU share - 1024 - # echo 2048 > 512/cpu_share # Modify user 512's CPU share - # cat 512/cpu_share # Display user 512's CPU share - 2048 - # - -CPU bandwidth between two users is divided in the ratio of their CPU shares. -For example: if you would like user "root" to get twice the bandwidth of user -"guest," then set the cpu_share for both the users such that "root"'s cpu_share -is twice "guest"'s cpu_share. - -When CONFIG_CGROUP_SCHED is defined, a "cpu.shares" file is created for each +When CONFIG_FAIR_GROUP_SCHED is defined, a "cpu.shares" file is created for each group created using the pseudo filesystem. See example steps below to create task groups and modify their CPU share using the "cgroups" pseudo filesystem. @@ -273,24 +246,3 @@ task groups and modify their CPU share using the "cgroups" pseudo filesystem. # #Launch gmplayer (or your favourite movie player) # echo > multimedia/tasks - -8. Implementation note: user namespaces - -User namespaces are intended to be hierarchical. But they are currently -only partially implemented. Each of those has ramifications for CFS. - -First, since user namespaces are hierarchical, the /sys/kernel/uids -presentation is inadequate. Eventually we will likely want to use sysfs -tagging to provide private views of /sys/kernel/uids within each user -namespace. - -Second, the hierarchical nature is intended to support completely -unprivileged use of user namespaces. So if using user groups, then -we want the users in a user namespace to be children of the user -who created it. - -That is currently unimplemented. So instead, every user in a new -user namespace will receive 1024 shares just like any user in the -initial user namespace. Note that at the moment creation of a new -user namespace requires each of CAP_SYS_ADMIN, CAP_SETUID, and -CAP_SETGID. diff --git a/Documentation/scheduler/sched-rt-group.txt b/Documentation/scheduler/sched-rt-group.txt index 86eabe6c3419..605b0d40329d 100644 --- a/Documentation/scheduler/sched-rt-group.txt +++ b/Documentation/scheduler/sched-rt-group.txt @@ -126,23 +126,12 @@ priority! 2.3 Basis for grouping tasks ---------------------------- -There are two compile-time settings for allocating CPU bandwidth. These are -configured using the "Basis for grouping tasks" multiple choice menu under -General setup > Group CPU Scheduler: - -a. CONFIG_USER_SCHED (aka "Basis for grouping tasks" = "user id") - -This lets you use the virtual files under -"/sys/kernel/uids//cpu_rt_runtime_us" to control he CPU time reserved for -each user . - -The other option is: - -.o CONFIG_CGROUP_SCHED (aka "Basis for grouping tasks" = "Control groups") +Enabling CONFIG_RT_GROUP_SCHED lets you explicitly allocate real +CPU bandwidth to task groups. This uses the /cgroup virtual file system and "/cgroup//cpu.rt_runtime_us" to control the CPU time reserved for each -control group instead. +control group. For more information on working with control groups, you should read Documentation/cgroups/cgroups.txt as well. @@ -161,8 +150,7 @@ For now, this can be simplified to just the following (but see Future plans): =============== There is work in progress to make the scheduling period for each group -("/sys/kernel/uids//cpu_rt_period_us" or -"/cgroup//cpu.rt_period_us" respectively) configurable as well. +("/cgroup//cpu.rt_period_us") configurable as well. The constraint on the period is that a subgroup must have a smaller or equal period to its parent. But realistically its not very useful _yet_ -- cgit v1.2.1 From 969c79215a35b06e5e3efe69b9412f858df7856c Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Thu, 6 May 2010 18:49:21 +0200 Subject: sched: replace migration_thread with cpu_stop Currently migration_thread is serving three purposes - migration pusher, context to execute active_load_balance() and forced context switcher for expedited RCU synchronize_sched. All three roles are hardcoded into migration_thread() and determining which job is scheduled is slightly messy. This patch kills migration_thread and replaces all three uses with cpu_stop. The three different roles of migration_thread() are splitted into three separate cpu_stop callbacks - migration_cpu_stop(), active_load_balance_cpu_stop() and synchronize_sched_expedited_cpu_stop() - and each use case now simply asks cpu_stop to execute the callback as necessary. synchronize_sched_expedited() was implemented with private preallocated resources and custom multi-cpu queueing and waiting logic, both of which are provided by cpu_stop. synchronize_sched_expedited_count is made atomic and all other shared resources along with the mutex are dropped. synchronize_sched_expedited() also implemented a check to detect cases where not all the callback got executed on their assigned cpus and fall back to synchronize_sched(). If called with cpu hotplug blocked, cpu_stop already guarantees that and the condition cannot happen; otherwise, stop_machine() would break. However, this patch preserves the paranoid check using a cpumask to record on which cpus the stopper ran so that it can serve as a bisection point if something actually goes wrong theree. Because the internal execution state is no longer visible, rcu_expedited_torture_stats() is removed. This patch also renames cpu_stop threads to from "stopper/%d" to "migration/%d". The names of these threads ultimately don't matter and there's no reason to make unnecessary userland visible changes. With this patch applied, stop_machine() and sched now share the same resources. stop_machine() is faster without wasting any resources and sched migration users are much cleaner. Signed-off-by: Tejun Heo Acked-by: Peter Zijlstra Cc: Ingo Molnar Cc: Dipankar Sarma Cc: Josh Triplett Cc: Paul E. McKenney Cc: Oleg Nesterov Cc: Dimitri Sivanich --- Documentation/RCU/torture.txt | 10 ---------- 1 file changed, 10 deletions(-) (limited to 'Documentation') diff --git a/Documentation/RCU/torture.txt b/Documentation/RCU/torture.txt index 0e50bc2aa1e2..5d9016795fd8 100644 --- a/Documentation/RCU/torture.txt +++ b/Documentation/RCU/torture.txt @@ -182,16 +182,6 @@ Similarly, sched_expedited RCU provides the following: sched_expedited-torture: Reader Pipe: 12660320201 95875 0 0 0 0 0 0 0 0 0 sched_expedited-torture: Reader Batch: 12660424885 0 0 0 0 0 0 0 0 0 0 sched_expedited-torture: Free-Block Circulation: 1090795 1090795 1090794 1090793 1090792 1090791 1090790 1090789 1090788 1090787 0 - state: -1 / 0:0 3:0 4:0 - -As before, the first four lines are similar to those for RCU. -The last line shows the task-migration state. The first number is --1 if synchronize_sched_expedited() is idle, -2 if in the process of -posting wakeups to the migration kthreads, and N when waiting on CPU N. -Each of the colon-separated fields following the "/" is a CPU:state pair. -Valid states are "0" for idle, "1" for waiting for quiescent state, -"2" for passed through quiescent state, and "3" when a race with a -CPU-hotplug event forces use of the synchronize_sched() primitive. USAGE -- cgit v1.2.1