summaryrefslogtreecommitdiff
Commit message (Collapse)AuthorAgeFilesLines
* x86: Add NMI types for kmap_atomic, fixPeter Zijlstra2009-06-152-6/+10
| | | | | | | | | | | | | | | | | | | | | I just realized this has a kmap_atomic bug in... The below would fix it - but it's complicating this code some more. Alternatively I would have to introduce something like pte_offset_map_irq() which would make the irq/nmi detection and leave the regular code paths alone, however that would mean either duplicating the gup_fast() pagewalk or passing down a pte function pointer, which would only duplicate the gup_pte_range() bit, neither is really attractive ... Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Nick Piggin <npiggin@suse.de> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf report: Fix 32-bit printf formatIngo Molnar2009-06-151-1/+1
| | | | | | | | | | | | | | | | | | Yong Wang reported the following compiler warning: builtin-report.c: In function 'process_overflow_event': builtin-report.c:984: error: cast to pointer from integer of different size Which happens because we try to print ->ips[] out with a limited format, losing the high 32 bits. Print it out using %016Lx instead. Reported-by: Yong Wang <yong.y.wang@linux.intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter: Make set_perf_counter_pending() declaration commonPaul Mackerras2009-06-154-6/+3
| | | | | | | | | | | | | | | | | | | At present, every architecture that supports perf_counters has to declare set_perf_counter_pending() in its arch-specific headers. This consolidates the declarations into a single declaration in one common place, include/linux/perf_counter.h. On powerpc, we continue to provide a static inline definition of set_perf_counter_pending() in the powerpc hw_irq.h. Also, this removes from the x86 perf_counter.h the unused null definitions of {test,clear}_perf_counter_pending. Reported-by: Mike Frysinger <vapier.adi@gmail.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: benh@kernel.crashing.org LKML-Reference: <18998.13388.920691.523227@cargo.ozlabs.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter: powerpc: Fix two compile warningsPaul Mackerras2009-06-151-1/+3
| | | | | | | | | | | | | | | | | | | This fixes a couple of compile warnings that crept into the powerpc perf_counter code recently: CC arch/powerpc/kernel/perf_counter.o arch/powerpc/kernel/perf_counter.c: In function 'record_and_restart': arch/powerpc/kernel/perf_counter.c:1016: warning: unused variable 'addr' arch/powerpc/kernel/perf_counter.c: In function 'hw_perf_counter_init': arch/powerpc/kernel/perf_counter.c:891: warning: 'ev' may be used uninitialized in this function Stephen Rothwell reported this against linux-next as well. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <18998.12884.787039.22202@cargo.ozlabs.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf report: Add per system call overhead histogramIngo Molnar2009-06-151-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Take advantage of call-graph percounter sampling/recording to display a non-trivial histogram: the true, collapsed/summarized cost measurement, on a per system call total overhead basis: aldebaran:~/linux/linux/tools/perf> ./perf record -g -a -f ~/hackbench 10 aldebaran:~/linux/linux/tools/perf> ./perf report -s symbol --syscalls | head -10 # # (3536 samples) # # Overhead Symbol # ........ ...... # 40.75% [k] sys_write 40.21% [k] sys_read 4.44% [k] do_nmi ... This is done by accounting each (reliable) call-chain that chains back to a given system call to that system call function. [ So in the above example we can see that hackbench spends about 40% of its total time somewhere in sys_write() and 40% somewhere in sys_read(), the rest of the time is spent in user-space. The time is not spent in sys_write() _itself_ but in one of its many child functions. ] Or, a recording of a (source files are already in the page-cache) kernel build: $ perf record -g -m 512 -f -- make -j32 kernel $ perf report -s s --syscalls | grep '\[k\]' | grep -v nmi 4.14% [k] do_page_fault 1.20% [k] sys_write 1.10% [k] sys_open 0.63% [k] sys_exit_group 0.48% [k] smp_apic_timer_interrupt 0.37% [k] sys_read 0.37% [k] sys_execve 0.20% [k] sys_mmap 0.18% [k] sys_close 0.14% [k] sys_munmap 0.13% [k] sys_poll 0.09% [k] sys_newstat 0.07% [k] sys_clone 0.06% [k] sys_newfstat 0.05% [k] sys_access 0.05% [k] schedule Shows the true total cost of each syscall variant that gets used during a kernel build. This profile reveals it that pagefaults are the costliest, followed by read()/write(). An interesting detail: timer interrupts cost 0.5% - or 0.5 seconds per 100 seconds of kernel build-time. (this was done with HZ=1000) The summary is done in 'perf report', i.e. in the post-processing stage - so once we have a good call-graph recording, this type of non-trivial high-level analysis becomes possible. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter: x86: Fix call-chain support to use NMI-safe methodsPeter Zijlstra2009-06-151-10/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __copy_from_user_inatomic() isn't NMI safe in that it can trigger the page fault handler which is another trap and its return path invokes IRET which will also close the NMI context. Therefore use a GUP based approach to copy the stack frames over. We tried an alternative solution as well: we used a forward ported version of Mathieu Desnoyers's "NMI safe INT3 and Page Fault" patch that modifies the exception return path to use an open-coded IRET with explicit stack unrolling and TF checking. This didnt work as it interacted with faulting user-space instructions, causing them not to restart properly, which corrupts user-space registers. Solving that would probably involve disassembling those instructions and backtracing the RIP. But even without that, the code was deemed rather complex to the already non-trivial x86 entry assembly code, so instead we went for this GUP based method that does a software-walk of the pagetables. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Nick Piggin <npiggin@suse.de> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Vegard Nossum <vegard.nossum@gmail.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* x86: Add NMI types for kmap_atomicPeter Zijlstra2009-06-152-3/+6
| | | | | | | | | | | | Two new kmap_atomic slots for NMI context. And teach pte_offset_map() about NMI context. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Nick Piggin <npiggin@suse.de> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* x86, mm: Add __get_user_pages_fast()Peter Zijlstra2009-06-152-0/+62
| | | | | | | | | | | | Introduce a gup_fast() variant which is usable from IRQ/NMI context. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Nick Piggin <npiggin@suse.de> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter: Fix ctx->mutex vs counter->mutex inversionPeter Zijlstra2009-06-151-23/+11
| | | | | | | | | | | | | | Simon triggered a lockdep inversion report about us taking ctx->mutex vs counter->mutex in inverse orders. Fix that up. Reported-by: Simon Holm Thøgersen <odie@cs.aau.dk> Tested-by: Simon Holm Thøgersen <odie@cs.aau.dk> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf record: Fix fast task-exit raceIngo Molnar2009-06-151-4/+12
| | | | | | | | | | | | | | | | | | | Recording with -a (or with -p) can race with tasks going away: couldn't open /proc/8440/maps Causing an early exit() and no recording done. Do not abort the recording session - instead just skip that task. Also, only print the warnings under -v. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter, x86: Fix kernel-space call-chainsIngo Molnar2009-06-151-13/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Kernel-space call-chains were trimmed at the first entry because we never processed anything beyond the first stack context. Allow the backtrace to jump from NMI to IRQ stack then to task stack and finally user-space stack. Also calculate the stack and bp variables correctly so that the stack walker does not exit early. We can get deep traces as a result, visible in perf report -D output: 0x32af0 [0xe0]: PERF_EVENT (IP, 5): 15134: 0xffffffff815225fd period: 1 ... chain: u:2, k:22, nr:24 ..... 0: 0xffffffff815225fd ..... 1: 0xffffffff810ac51c ..... 2: 0xffffffff81018e29 ..... 3: 0xffffffff81523939 ..... 4: 0xffffffff81524b8f ..... 5: 0xffffffff81524bd9 ..... 6: 0xffffffff8105e498 ..... 7: 0xffffffff8152315a ..... 8: 0xffffffff81522c3a ..... 9: 0xffffffff810d9b74 ..... 10: 0xffffffff810dbeec ..... 11: 0xffffffff810dc3fb This is a 22-entries kernel-space chain. (We still only record reliable stack entries.) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter, x86: Fix call-chain walkingIngo Molnar2009-06-141-2/+4
| | | | | | | | | | | | | | Fix the ptregs variant when we hit user-mode tasks. Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf record/report: Add call graph / call chain profilingIngo Molnar2009-06-142-12/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the first steps of call-graph profiling: - add the -c (--call-graph) option to perf record - parse the call-graph record and printout out under -D (--dump-trace) The call-graph data is not put into the histogram yet, but it can be seen that it's being processed correctly: 0x3ce0 [0x38]: event: 35 . . ... raw event: size 56 bytes . 0000: 23 00 00 00 05 00 38 00 d4 df 0e 81 ff ff ff ff #.....8........ . 0010: 60 0b 00 00 60 0b 00 00 03 00 00 00 01 00 02 00 `...`.......... . 0020: d4 df 0e 81 ff ff ff ff a0 61 ed 41 36 00 00 00 .........a.A6.. . 0030: 04 92 e6 41 36 00 00 00 .a.A6.. . 0x3ce0 [0x38]: PERF_EVENT (IP, 5): 2912: 0xffffffff810edfd4 period: 1 ... chain: u:2, k:1, nr:3 ..... 0: 0xffffffff810edfd4 ..... 1: 0x3641ed61a0 ..... 2: 0x3641e69204 ... thread: perf:2912 ...... dso: [kernel] This shows a 3-entry call-graph: with 1 kernel-space and two user-space entries Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf report: Print out raw events in hexaIngo Molnar2009-06-141-1/+35
| | | | | | | | | | | | | | | | | | | | | | | | | Print out events in hexa dump format, when -D is specified: 0x4868 [0x48]: event: 1 . . ... raw event: size 72 bytes . 0000: 01 00 00 00 00 00 48 00 d4 72 00 00 d4 72 00 00 ......H..r...r. . 0010: 00 00 40 f2 3e 00 00 00 00 30 01 00 00 00 00 00 ..@.>....0..... . 0020: 00 00 00 00 00 00 00 00 2f 75 73 72 2f 6c 69 62 ......../usr/li . 0030: 36 34 2f 6c 69 62 65 6c 66 2d 30 2e 31 34 31 2e 64/libelf-0.141 . 0040: 73 6f 00 00 00 00 00 00 f-0.141 . 0x4868 [0x48]: PERF_EVENT_MMAP 29396: [0x3ef2400000(0x13000) @ (nil)]: /usr/lib64/libelf-0.141.so This helps the debugging of mis-parsing of data files, and helps the addition of new sample/trace formats. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf annotate: Fixes for filename:line displaysFrederic Weisbecker2009-06-131-5/+6
| | | | | | | | | | | | | | - fix addr2line on userspace binary: don't only check kernel image. - fix string allocation size for path: missing ending null char room - fix overflow in symbol extra info Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1244907563-7820-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf stat: Enable raw data to be printedIngo Molnar2009-06-132-18/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If -vv (very verbose) is specified, print out raw data in the following format: $ perf stat -vv -r 3 ./loop_1b_instructions [ perf stat: executing run #1 ... ] [ perf stat: executing run #2 ... ] [ perf stat: executing run #3 ... ] debug: runtime[0]: 235871872 debug: walltime[0]: 236646752 debug: runtime_cycles[0]: 755150182 debug: counter/0[0]: 235871872 debug: counter/1[0]: 235871872 debug: counter/2[0]: 235871872 debug: scaled[0]: 0 debug: counter/0[1]: 2 debug: counter/1[1]: 235870662 debug: counter/2[1]: 235870662 debug: scaled[1]: 0 debug: counter/0[2]: 1 debug: counter/1[2]: 235870437 debug: counter/2[2]: 235870437 debug: scaled[2]: 0 debug: counter/0[3]: 140 debug: counter/1[3]: 235870298 debug: counter/2[3]: 235870298 debug: scaled[3]: 0 debug: counter/0[4]: 755150182 debug: counter/1[4]: 235870145 debug: counter/2[4]: 235870145 debug: scaled[4]: 0 debug: counter/0[5]: 1001411258 debug: counter/1[5]: 235868838 debug: counter/2[5]: 235868838 debug: scaled[5]: 0 debug: counter/0[6]: 27897 debug: counter/1[6]: 235868560 debug: counter/2[6]: 235868560 debug: scaled[6]: 0 debug: counter/0[7]: 2910 debug: counter/1[7]: 235868151 debug: counter/2[7]: 235868151 debug: scaled[7]: 0 debug: runtime[0]: 235980257 debug: walltime[0]: 236770942 debug: runtime_cycles[0]: 755114546 debug: counter/0[0]: 235980257 debug: counter/1[0]: 235980257 debug: counter/2[0]: 235980257 debug: scaled[0]: 0 debug: counter/0[1]: 3 debug: counter/1[1]: 235980049 debug: counter/2[1]: 235980049 debug: scaled[1]: 0 debug: counter/0[2]: 1 debug: counter/1[2]: 235979907 debug: counter/2[2]: 235979907 debug: scaled[2]: 0 debug: counter/0[3]: 135 debug: counter/1[3]: 235979780 debug: counter/2[3]: 235979780 debug: scaled[3]: 0 debug: counter/0[4]: 755114546 debug: counter/1[4]: 235979652 debug: counter/2[4]: 235979652 debug: scaled[4]: 0 debug: counter/0[5]: 1001439771 debug: counter/1[5]: 235979304 debug: counter/2[5]: 235979304 debug: scaled[5]: 0 debug: counter/0[6]: 23723 debug: counter/1[6]: 235979050 debug: counter/2[6]: 235979050 debug: scaled[6]: 0 debug: counter/0[7]: 2213 debug: counter/1[7]: 235978820 debug: counter/2[7]: 235978820 debug: scaled[7]: 0 debug: runtime[0]: 235888002 debug: walltime[0]: 236700533 debug: runtime_cycles[0]: 754881504 debug: counter/0[0]: 235888002 debug: counter/1[0]: 235888002 debug: counter/2[0]: 235888002 debug: scaled[0]: 0 debug: counter/0[1]: 2 debug: counter/1[1]: 235887793 debug: counter/2[1]: 235887793 debug: scaled[1]: 0 debug: counter/0[2]: 1 debug: counter/1[2]: 235887645 debug: counter/2[2]: 235887645 debug: scaled[2]: 0 debug: counter/0[3]: 135 debug: counter/1[3]: 235887499 debug: counter/2[3]: 235887499 debug: scaled[3]: 0 debug: counter/0[4]: 754881504 debug: counter/1[4]: 235887368 debug: counter/2[4]: 235887368 debug: scaled[4]: 0 debug: counter/0[5]: 1001401731 debug: counter/1[5]: 235887024 debug: counter/2[5]: 235887024 debug: scaled[5]: 0 debug: counter/0[6]: 24212 debug: counter/1[6]: 235886786 debug: counter/2[6]: 235886786 debug: scaled[6]: 0 debug: counter/0[7]: 1824 debug: counter/1[7]: 235886560 debug: counter/2[7]: 235886560 debug: scaled[7]: 0 Performance counter stats for '/home/mingo/loop_1b_instructions' (3 runs): 235.913377 task-clock-msecs # 0.997 CPUs ( +- 0.011% ) 2 context-switches # 0.000 M/sec ( +- 0.000% ) 1 CPU-migrations # 0.000 M/sec ( +- 0.000% ) 136 page-faults # 0.001 M/sec ( +- 0.730% ) 755048744 cycles # 3200.534 M/sec ( +- 0.009% ) 1001417586 instructions # 1.326 IPC ( +- 0.001% ) 25277 cache-references # 0.107 M/sec ( +- 3.988% ) 2315 cache-misses # 0.010 M/sec ( +- 9.845% ) 0.236706075 seconds time elapsed. This allows the summary stats to be validated. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf stat: Add feature to run and measure a command multiple timesIngo Molnar2009-06-131-65/+194
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the --repeat <n> feature to perf stat, which repeats a given command up to a 100 times, collects the stats and calculates an average and a stddev. For example, the following oneliner 'perf stat' command runs hackbench 5 times and prints a tabulated result of all metrics, with averages and noise levels (in percentage) printed: aldebaran:~/linux/linux/tools/perf> ./perf stat --repeat 5 ~/hackbench 10 Time: 0.117 Time: 0.108 Time: 0.089 Time: 0.088 Time: 0.100 Performance counter stats for '/home/mingo/hackbench 10' (5 runs): 1243.989586 task-clock-msecs # 10.460 CPUs ( +- 4.720% ) 47706 context-switches # 0.038 M/sec ( +- 19.706% ) 387 CPU-migrations # 0.000 M/sec ( +- 3.608% ) 17793 page-faults # 0.014 M/sec ( +- 0.354% ) 3770941606 cycles # 3031.329 M/sec ( +- 4.621% ) 1566372416 instructions # 0.415 IPC ( +- 2.703% ) 16783421 cache-references # 13.492 M/sec ( +- 5.202% ) 7128590 cache-misses # 5.730 M/sec ( +- 7.420% ) 0.118924455 seconds time elapsed. The goal of this feature is to allow the reliance on these accurate statistics and to know how many times a command has to be repeated for the noise to go down to an acceptable level. (The -v option can be used to see a line printed out as each run progresses.) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf stat: Reorganize outputIngo Molnar2009-06-132-29/+42
| | | | | | | | | | | | | | - use IPC for the instruction normalization output - CPUs for the CPU utilization factor value. - print out time elapsed like the other rows - tidy up the task-clocks/cpu-clocks printout Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter, x86: Update AMD hw caching related event tableJaswinder Singh Rajput2009-06-131-21/+15
| | | | | | | | | | | | | | | All AMD models share the same hw caching related event table. Also complete the table with more events. Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1244835381.2802.2.camel@ht.satnam> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter, x86: Check old-AMD performance monitoring supportJaswinder Singh Rajput2009-06-131-0/+4
| | | | | | | | | | | | | | AMD supports performance monitoring start from K7 (i.e. family 6), so disable it for earlier AMD CPUs. Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1244714289.6923.0.camel@ht.satnam> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter: Fix stack corruption in perf_read_hwMarti Raudsepp2009-06-131-1/+1
| | | | | | | | | | | With PERF_FORMAT_ID, perf_read_hw now needs space for up to 4 values. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf_counter: Fix atomic_set vs. atomic64_t type mismatchPaul Mackerras2009-06-131-1/+1
| | | | | | | | | | | Using atomic_set on an atomic64_t variable gives a compiler warning on powerpc, and won't give the desired result at runtime. This fixes an instance of this error in the perf_counter code. Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <18995.20490.979429.244883@cargo.ozlabs.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf annotate: Print a sorted summary of annotated overhead linesFrederic Weisbecker2009-06-131-21/+90
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's can be very annoying to scroll down perf annotated output until we find relevant overhead. Using the -l option, you can now have a small summary sorted per overhead in the beginning of the output. Example: ./perf annotate -l -k ../../vmlinux -s __lock_acquire Sorted summary for file ../../vmlinux ---------------------------------------------- 12.04 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1653 4.61 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1740 3.77 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1775 3.56 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1653 2.93 /home/fweisbec/linux/linux-2.6-tip/arch/x86/include/asm/irqflags.h:15 2.83 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:2545 2.30 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:2594 2.20 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:2388 2.20 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:730 2.09 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:730 2.09 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:138 1.88 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:2548 1.47 /home/fweisbec/linux/linux-2.6-tip/arch/x86/include/asm/irqflags.h:15 1.36 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:2594 1.36 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:730 1.26 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1654 1.26 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1653 1.15 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:2592 1.15 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1740 1.15 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1740 [...] Only overhead over 0.5% are summarized. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1244844682-12928-2-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* perf annotate: Print the filename:line for annotated colored linesFrederic Weisbecker2009-06-132-1/+98
| | | | | | | | | | | | | | | | | | | | | | When we have a colored line in perf annotate, ie a middle/high overhead one, it's sometimes useful to get the matching line and filename from the source file, especially this path prepares to another subsequent one which will print a sorted summary of midle/high overhead lines in the beginning of the output. Filename:Lines have the same color than the concerned ip lines. It can be slow because it relies on addr2line. We could also use objdump with -l but that implies we would have to bufferize objdump output and parse it to filter the relevant lines since we want to print a sorted summary in the beginning. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <1244844682-12928-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Merge branch 'upstream-linus' of ↵Linus Torvalds2009-06-123-59/+178
|\ | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/configfs * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/configfs: configfs: Rework configfs_depend_item() locking and make lockdep happy configfs: Silence lockdep on mkdir() and rmdir()
| * configfs: Rework configfs_depend_item() locking and make lockdep happyLouis Rilling2009-04-301-59/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | configfs_depend_item() recursively locks all inodes mutex from configfs root to the target item, which makes lockdep unhappy. The purpose of this recursive locking is to ensure that the item tree can be safely parsed and that the target item, if found, is not about to leave. This patch reworks configfs_depend_item() locking using configfs_dirent_lock. Since configfs_dirent_lock protects all changes to the configfs_dirent tree, and protects tagging of items to be removed, this lock can be used instead of the inodes mutex lock chain. This needs that the check for dependents be done atomically with CONFIGFS_USET_DROPPING tagging. Now lockdep looks happy with configfs. [ Lifted the setting of s_type into configfs_new_dirent() to satisfy the atomic setting of CONFIGFS_USET_CREATING -- Joel ] Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
| * configfs: Silence lockdep on mkdir() and rmdir()Louis Rilling2009-04-303-0/+131
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When attaching default groups (subdirs) of a new group (in mkdir() or in configfs_register()), configfs recursively takes inode's mutexes along the path from the parent of the new group to the default subdirs. This is needed to ensure that the VFS will not race with operations on these sub-dirs. This is safe for the following reasons: - the VFS allows one to lock first an inode and second one of its children (The lock subclasses for this pattern are respectively I_MUTEX_PARENT and I_MUTEX_CHILD); - from this rule any inode path can be recursively locked in descending order as long as it stays under a single mountpoint and does not follow symlinks. Unfortunately lockdep does not know (yet?) how to handle such recursion. I've tried to use Peter Zijlstra's lock_set_subclass() helper to upgrade i_mutexes from I_MUTEX_CHILD to I_MUTEX_PARENT when we know that we might recursively lock some of their descendant, but this usage does not seem to fit the purpose of lock_set_subclass() because it leads to several i_mutex locked with subclass I_MUTEX_PARENT by the same task. >From inside configfs it is not possible to serialize those recursive locking with a top-level one, because mkdir() and rmdir() are already called with inodes locked by the VFS. So using some mutex_lock_nest_lock() is not an option. I am proposing two solutions: 1) one that wraps recursive mutex_lock()s with lockdep_off()/lockdep_on(). 2) (as suggested earlier by Peter Zijlstra) one that puts the i_mutexes recursively locked in different classes based on their depth from the top-level config_group created. This induces an arbitrary limit (MAX_LOCK_DEPTH - 2 == 46) on the nesting of configfs default groups whenever lockdep is activated but this limit looks reasonably high. Unfortunately, this also isolates VFS operations on configfs default groups from the others and thus lowers the chances to detect locking issues. Nobody likes solution 1), which I can understand. This patch implements solution 2). However lockdep is still not happy with configfs_depend_item(). Next patch reworks the locking of configfs_depend_item() and finally makes lockdep happy. [ Note: This hides a few locking interactions with the VFS from lockdep. That was my big concern, because we like lockdep's protection. However, the current state always dumps a spurious warning. The locking is correct, so I tell people to ignore the warning and that we'll keep our eyes on the locking to make sure it stays correct. With this patch, we eliminate the warning. We do lose some of the lockdep protections, but this only means that we still have to keep our eyes on the locking. We're going to do that anyway. -- Joel ] Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
* | Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6Linus Torvalds2009-06-1257-338/+1322
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6: (30 commits) [S390] wire up sys_perf_counter_open [S390] wire up sys_rt_tgsigqueueinfo [S390] ftrace: add system call tracer support [S390] ftrace: add function graph tracer support [S390] ftrace: add function trace mcount test support [S390] ftrace: add dynamic ftrace support [S390] kprobes: use probe_kernel_write [S390] maccess: arch specific probe_kernel_write() implementation [S390] maccess: add weak attribute to probe_kernel_write [S390] profile_tick called twice [S390] dasd: forward internal errors to dasd_sleep_on caller [S390] dasd: sync after async probe [S390] dasd: check_characteristics cleanup [S390] dasd: no High Performance FICON in 31-bit mode [S390] dcssblk: revert devt conversion [S390] qdio: fix access beyond ARRAY_SIZE of irq_ptr->{in,out}put_qs [S390] vmalloc: add vmalloc kernel parameter support [S390] uaccess: use might_fault() instead of might_sleep() [S390] 3270: lock dependency fixes [S390] 3270: do not register with tty_register_device ...
| * | [S390] wire up sys_perf_counter_openHeiko Carstens2009-06-123-1/+12
| | | | | | | | | | | | | | | Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | [S390] wire up sys_rt_tgsigqueueinfoHeiko Carstens2009-06-123-1/+11
| | | | | | | | | | | | | | | Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | [S390] ftrace: add system call tracer supportHeiko Carstens2009-06-127-2/+73
| | | | | | | | | | | | | | | | | | | | | System call tracer support for s390. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | [S390] ftrace: add function graph tracer supportHeiko Carstens2009-06-129-12/+166
| | | | | | | | | | | | | | | | | | | | | Function graph tracer support for s390. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | [S390] ftrace: add function trace mcount test supportHeiko Carstens2009-06-122-12/+27
| | | | | | | | | | | | | | | | | | | | | | | | Add support for early test if the function tracer is enabled or disabled. Saves some extra function calls. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | [S390] ftrace: add dynamic ftrace supportHeiko Carstens2009-06-1210-29/+276
| | | | | | | | | | | | | | | | | | | | | Dynamic ftrace support for s390. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | [S390] kprobes: use probe_kernel_writeHeiko Carstens2009-06-121-29/+2
| | | | | | | | | | | | | | | | | | | | | Use proble_kernel_write() to patch the kernel. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | [S390] maccess: arch specific probe_kernel_write() implementationHeiko Carstens2009-06-122-1/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add an s390 specific probe_kernel_write() function which allows to write to the kernel text segment even if write protection is enabled. This is implemented using the lra (load real address) and stura (store using real address) instructions. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | [S390] maccess: add weak attribute to probe_kernel_writeHeiko Carstens2009-06-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | probe_kernel_write() gets used to write to the kernel address space. E.g. to patch the kernel (kgdb, ftrace, kprobes...). Some architectures however enable write protection for the kernel text section, so that writes to this region would fault. This patch allows to specify an architecture specific version of probe_kernel_write() which allows to handle and bypass write protection of the text segment. That way it is still possible to catch random writes to kernel text and explicitly allow writes via this interface. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | [S390] profile_tick called twiceMartin Schwidefsky2009-06-121-7/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | profile_tick is called twice for every clock comparator interrupt. The generic clock event code does it in tick_sched_timer and the s390 backend code in clock_comparator_work. That is one too many, remove the one in the arch backend code. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | [S390] dasd: forward internal errors to dasd_sleep_on callerStefan Weinhuber2009-06-124-9/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a DASD requests is started with dasd_sleep_on and fails, then the calling function may need to know the reason for the failure. In cases of hardware errors it can inspect the sense data in the irb, but when the reason is internal (e.g. start_IO failed) then it needs a meaningfull return code. Signed-off-by: Stefan Weinhuber <wein@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | [S390] dasd: sync after async probeSebastian Ott2009-06-122-2/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some functions called as a late_initcall depend on completely initialized devices. Since commit f3445a1a656bc26b07946cc6d20de1ef07c8d116 the dasd driver uses the new async framework and relies on the fact that synchronization is done in prepare_namespace which is called after the late_initcalls. Fix this by calling async_synchronize_full at the end of the related init functions. Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | [S390] dasd: check_characteristics cleanupSebastian Ott2009-06-124-19/+18
| | | | | | | | | | | | | | | | | | | | | | | | Fix a broken memset (sizeof pointer vs sizeof the underlying structure) by cleaning up the involved functions. Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | [S390] dasd: no High Performance FICON in 31-bit modeStefan Weinhuber2009-06-121-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | The High Performance FICON feature is not supported in 31-bit mode, no matter what the various flags say. So we need to check for the CONFIG_64BIT option as well. Signed-off-by: Stefan Weinhuber <wein@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | [S390] dcssblk: revert devt conversionGerald Schaefer2009-06-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git commit f331c0296f2a9fee0d396a70598b954062603015 changed users of ->first_minor to devt. This broke device handling in dcssblk, so that no additional devices could be added after the first one. This patch reverts the devt conversion to the previous ->first_minor handling. Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | [S390] qdio: fix access beyond ARRAY_SIZE of irq_ptr->{in,out}put_qsRoel Kluin2009-06-121-1/+1
| | | | | | | | | | | | | | | | | | | | | Do not go beyond ARRAY_SIZE of irq_ptr->{in,out}put_qs Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | [S390] vmalloc: add vmalloc kernel parameter supportHeiko Carstens2009-06-122-5/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the kernel parameter 'vmalloc=<size>' the size of the vmalloc area can be specified. This can be used to increase or decrease the size of the area. Works in the same way as on some other architectures. This can be useful for features which make excessive use of vmalloc and wouldn't work otherwise. The default sizes remain unchanged: 96MB for 31 bit kernels and 1GB for 64 bit kernels. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | [S390] uaccess: use might_fault() instead of might_sleep()Heiko Carstens2009-06-122-9/+9
| | | | | | | | | | | | | | | | | | | | | Adds more checking in case lockdep is turned on. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | [S390] 3270: lock dependency fixesMartin Schwidefsky2009-06-122-48/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lockdep found a problem with the lock order of the view lock and the ccw device lock. raw3270_activate_view/raw3270_deactivate_view first take the ccw device lock then call the activate/deactivate functions of the view which take view lock. The update functions of the con3270/tty3270 view will first take the view lock, then take the ccw device lock. To fix this the activate/deactivate functions are changed to avoid taking the view lock by moving the functions calls that modify the 3270 output buffer to the update function which is called by a timer. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | [S390] 3270: do not register with tty_register_deviceMartin Schwidefsky2009-06-121-15/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The tty3270_notifier that calls tty_register_device / tty_unregister_device is harmful in two ways: 1) the device node that is create is wrong because the minor numbers for 3270 tty start with 1 and tty_notifier passes the minor as index. 2) If 1) is corrected you'll get a warning: WARNING: at fs/sysfs/dir.c:462 sysfs_add_one+0x4c/0x60() sysfs: duplicate filename '227:1' can not be created The 227:1 link is already created by raw3270_create_attributes to refer to ../../class/tty/tty<devno>. There cannot be two links. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | [S390] qdio: inline qdio_perf_stat_incJan Glauber2009-06-122-16/+6
| | | | | | | | | | | | | | | | | | | | | | | | Move qdio_perf_stat_inc to the header file so it can be inlined. Remove unused qdio_perf_stat_dec. Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | [S390] qdio: simplify error handling in irq handlerJan Glauber2009-06-121-32/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | The check for the device status in qdio_establish_handle_irq() had dead code. Remove the unused code and simplify the error handling. Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>