summaryrefslogtreecommitdiff
Commit message (Collapse)AuthorAgeFilesLines
* tools/power x86_energy_perf_policy: close /proc/stat in for_every_cpu()Colin Ian King2012-11-301-1/+1
| | | | | | | | Instead of returning out of for_every_cpu() we should break out of the loop= which will then tidy up correctly by closing the file /proc/stat. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Len Brown <len.brown@intel.com>
* tools/power turbostat: v3.0: monitor Watts and TemperatureLen Brown2012-11-302-56/+690
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Show power in Watts and temperature in Celsius when hardware support is present. Intel's Sandy Bridge and Ivy Bridge processor generations support RAPL (Run-Time-Average-Power-Limiting). Per the Intel SDM (Intel® 64 and IA-32 Architectures Software Developer Manual) RAPL provides hardware energy counters and power control MSRs (Model Specific Registers). RAPL MSRs are designed primarily as a method to implement power capping. However, they are useful for monitoring system power whether or not power capping is used. In addition, Turbostat now shows temperature from DTS (Digital Thermal Sensor) and PTM (Package Thermal Monitor) hardware, if present. As before, turbostat reads MSRs, and never writes MSRs. New columns are present in turbostat output: The Pkg_W column shows Watts for each package (socket) in the system. On multi-socket systems, the system summary on the 1st row shows the sum for all sockets together. The Cor_W column shows Watts due to processors cores. Note that Core_W is included in Pkg_W. The optional GFX_W column shows Watts due to the graphics "un-core". Note that GFX_W is included in Pkg_W. The optional RAM_W column on server processors shows Watts due to DRAM DIMMS. As DRAM DIMMs are outside the processor package, RAM_W is not included in Pkg_W. The optional PKG_% and RAM_% columns on server processors shows the % of time in the measurement interval that RAPL power limiting is in effect on the package and on DRAM. Note that the RAPL energy counters have some limitations. First, hardware updates the counters about once every milli-second. This is fine for typical turbostat measurement intervals > 1 sec. However, when turbostat is used to measure events that approach 1ms, the counters are less useful. Second, the 32-bit energy counters are subject to wrapping. For example, a counter incrementing 15 micro-Joule units on a 130 Watt TDP server processor could (in theory) roll over in about 9 minutes. Turbostat detects and handles up to 1 counter overflow per measurement interval. But when the measurement interval exceeds the guaranteed counter range, we can't detect if more than 1 overflow occured. So in this case turbostat indicates that the results are in question by replacing the fractional part of the Watts in the output with "**": Pkg_W Cor_W GFX_W 3** 0** 0** Third, the RAPL counters are energy (Joule) counters -- they sum up weighted events in the package to estimate energy consumed. They are not analong power (Watt) meters. In practice, they tend to under-count because they don't cover every possible use of energy in the package. The accuracy of the RAPL counters will vary between product generations, and between SKU's in the same product generation, and with temperature. turbostat's -v (verbose) option now displays more power and thermal configuration information -- as shown on the turbostat.8 manual page. For example, it now displays the Package and DRAM Thermal Design Power (TDP): cpu0: MSR_PKG_POWER_INFO: 0x2f064001980410 (130 W TDP, RAPL 51 - 200 W, 0.045898 sec.) cpu0: MSR_DRAM_POWER_INFO,: 0x28025800780118 (35 W TDP, RAPL 15 - 75 W, 0.039062 sec.) cpu8: MSR_PKG_POWER_INFO: 0x2f064001980410 (130 W TDP, RAPL 51 - 200 W, 0.045898 sec.) cpu8: MSR_DRAM_POWER_INFO,: 0x28025800780118 (35 W TDP, RAPL 15 - 75 W, 0.039062 sec.) Signed-off-by: Len Brown <len.brown@intel.com>
* tools/power turbostat: fix output buffering issueLen Brown2012-11-301-0/+1
| | | | | | | | | | | In periodic mode, turbostat writes to stdout, but users were un-able to re-direct stdout, eg. turbostat > outputfile would result in an empty outputfile. Signed-off-by: Len Brown <len.brown@intel.com>
* tools/power turbostat: prevent infinite loop on migration error pathLen Brown2012-11-271-1/+10
| | | | | | | | | | | | | | | | Turbostat assumed if it can't migrate to a CPU, then the CPU must have gone off-line and turbostat should re-initialize with the new topology. But if turbostat can not migrate because it is restricted by a cpuset, then it will fail to migrate even after re-initialization, resulting in an infinite loop. Spit out a warning when we can't migrate and endure only 2 re-initialize cycles in a row before giving up and exiting. Signed-off-by: Len Brown <len.brown@intel.com>
* x86 power: define RAPL MSRsLen Brown2012-11-231-0/+23
| | | | | | | | | | | | | The Run Time Average Power Limiting interface is currently model specific, present on Sandy Bridge and Ivy Bridge processors. These #defines correspond to documentation in the latest "Intel® 64 and IA-32 Architectures Software Developer Manual", plus some typos in that document corrected. Signed-off-by: Len Brown <len.brown@intel.com> Cc: x86@kernel.org
* tools/power/x86/turbostat: share kernel MSR #definesLen Brown2012-11-233-19/+22
| | | | | | | | Now that turbostat is built in the kernel tree, it can share MSR #defines with the kernel. Signed-off-by: Len Brown <len.brown@intel.com> Cc: x86@kernel.org
* tools/power turbostat: graceful fail on garbage inputLen Brown2012-11-011-8/+18
| | | | | | | When invald MSR's are specified on the command line, turbostat should simply print an error and exit. Signed-off-by: Len Brown <len.brown@intel.com>
* tools/power turbostat: Repair Segmentation fault when using -i optionLen Brown2012-11-011-1/+1
| | | | | | | | Fix regression caused by commit 8e180f3cb6b7510a3bdf14e16ce87c9f5d86f102 (tools/power turbostat: add [-d MSR#][-D MSR#] options to print counter deltas) Signed-off-by: Len Brown <len.brown@intel.com>
* Merge tag 'gpio-fixes-v3.7-rc4' of ↵Linus Torvalds2012-10-305-7/+48
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO fixes from Linus Walleij: - Fix a potential bit wrap issue in the Timberdale driver - Fix up the buffer allocation size in the 74x164 driver - Set the value in direction_output() right in the mvebu driver - Return proper error codes for invalid GPIOs - Fix an off-mode bug for the OMAP - Don't initialize the mask_cach on the mvebu driver * tag 'gpio-fixes-v3.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: GPIO: mvebu-gpio: Don't initialize the mask_cache gpio/omap: fix off-mode bug: clear debounce settings on free/reset gpiolib: Don't return -EPROBE_DEFER to sysfs, or for invalid gpios gpio: mvebu: correctly set the value in direction_output() gpio-74x164: Fix buffer allocation size gpio-timberdale: fix a potential wrapping issue
| * GPIO: mvebu-gpio: Don't initialize the mask_cacheAndrew Lunn2012-10-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Due to the SMP nature of some of the chips, which have per CPU registers, the driver does not use the generic irq_gc_mask_set_bit() & irq_gc_mask_clr_bit() functions, which only support a single register. The driver has its own implementation of these functions, which can pick the correct register depending on the CPU being used. The functions do however use the gc->mask_cache value. The call to irq_setup_generic_chip() was passing IRQ_GC_INIT_MASK_CACHE, which caused the gc->mask_cache to be initialized to the contents of some random register. This resulted in unexpected interrupts been delivered from random GPIO lines. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Jamie Lentin <jm@lentin.co.uk> Acked-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Tested-by: Michael Walle <michael@walle.cc> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
| * gpio/omap: fix off-mode bug: clear debounce settings on free/resetJon Hunter2012-10-271-0/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change was originally titled "gpio/omap: fix off-mode bug: clear debounce clock enable mask on free/reset". The title has been updated slightly to reflect (what should be) the final fix. When a GPIO is freed or shutdown, we need to ensure that any debounce settings are cleared and if the GPIO is the only GPIO in the bank that is currently using debounce, then disable the debounce clock as well to save power. Currently, the debounce settings are not cleared on a GPIO free or shutdown and so during a context restore on subsequent off-mode transition, the previous debounce values are restored from the shadow copies (bank->context.debounce*) leading to mismatch state between driver state and hardware state. This was discovered when board code was doing gpio_request_one() gpio_set_debounce() gpio_free() which was leaving the GPIO debounce settings in a confused state. If that GPIO bank is subsequently used with off-mode enabled, bogus state would be restored, leaving GPIO debounce enabled which then prevented the CORE powerdomain from transitioning. To fix this, introduce a new function called _clear_gpio_debounce() to clear any debounce settings when the GPIO is freed or shutdown. If this GPIO is the last debounce-enabled GPIO in the bank, the debounce will also be cut. Please note that we cannot use _gpio_dbck_disable() to disable the debounce clock because this has been specifically created for the gpio suspend path and is intended to shutdown the debounce clock while debounce is enabled. Special thanks to Kevin Hilman for root causing the bug. This fix is a collaborative effort with inputs from Kevin Hilman, Grazvydas Ignotas and Santosh Shilimkar. Testing: - This has been unit tested on an OMAP3430 Beagle board, by requesting a gpio, enabling debounce and then freeing the gpio and checking the register contents, the saved register context and the debounce clock state. - Kevin Hilman tested on 37xx/EVM board which configures GPIO debounce for the ads7846 touchscreen in its board file using the above sequence, and so was failing off-mode tests in dynamic idle. Verified that off-mode tests are passing with this patch. V5 changes: - Corrected author Reported-by: Paul Walmsley <paul@pwsan.com> Cc: Igor Grinberg <grinberg@compulab.co.il> Cc: Grazvydas Ignotas <notasas@gmail.com> Cc: Jon Hunter <jon-hunter@ti.com> Signed-off-by: Jon Hunter <jon-hunter@ti.com> Reviewed-by: Kevin Hilman <khilman@ti.com> Tested-by: Kevin Hilman <khilman@ti.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
| * gpiolib: Don't return -EPROBE_DEFER to sysfs, or for invalid gpiosMathias Nyman2012-10-261-3/+7
| | | | | | | | | | | | | | | | | | gpios requested with invalid numbers, or gpios requested from userspace via sysfs should not try to be deferred on failure. Cc: stable@kernel.org Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
| * gpio: mvebu: correctly set the value in direction_output()Thomas Petazzoni2012-10-241-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | The ->direction_output() operation of gpio_chip is supposed to set the direction to output but also to set the GPIO to an initial value. Unfortunately, this last part was not done until now, causing for example the LEDs to not be properly set to their default initial value. This patch fixes this by calling the mvebu_gpio_set() function from mvebu_gpio_direction_output() before configuring the GPIO as an output GPIO. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
| * gpio-74x164: Fix buffer allocation sizeRoland Stigge2012-10-161-1/+1
| | | | | | | | | | | | | | | | The new registers handling in the gpio-74x164 driver allocates chip->registers * 8 bytes where only one byte per register is necessary. This patch fixes this. Signed-off-by: Roland Stigge <stigge@antcom.de> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
| * gpio-timberdale: fix a potential wrapping issueDan Carpenter2012-10-151-2/+2
| | | | | | | | | | | | | | | | | | ->last_ier is an unsigned long but the high bits can't be used int the original code because the shift wraps. Cc: stable@kernel.org Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
* | Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds2012-10-301-10/+9
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 bugfix from Ted Ts'o: "This fixes the root cause of the ext4 data corruption bug which raised a ruckus on LWN, Phoronix, and Slashdot. This bug only showed up when non-standard mount options (journal_async_commit and/or journal_checksum) were enabled, and when the file system was not cleanly unmounted, but the root cause was the inode bitmap modifications was not being properly journaled. This could potentially lead to minor file system corruptions (pass 5 complaints with the inode allocation bitmap) after an unclean shutdown under the wrong/unlucky workloads, but it turned into major failure if the journal_checksum and/or jouaral_async_commit was enabled." * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix unjournaled inode bitmap modification
| * | ext4: fix unjournaled inode bitmap modificationEric Sandeen2012-10-281-10/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 119c0d4460b001e44b41dcf73dc6ee794b98bd31 changed ext4_new_inode() such that the inode bitmap was being modified outside a transaction, which could lead to corruption, and was discovered when journal_checksum found a bad checksum in the journal during log replay. Nix ran into this when using the journal_async_commit mount option, which enables journal checksumming. The ensuing journal replay failures due to the bad checksums led to filesystem corruption reported as the now infamous "Apparent serious progressive ext4 data corruption bug" [ Changed by tytso to only call ext4_journal_get_write_access() only when we're fairly certain that we're going to allocate the inode. ] I've tested this by mounting with journal_checksum and running fsstress then dropping power; I've also tested by hacking DM to create snapshots w/o first quiescing, which allows me to test journal replay repeatedly w/o actually power-cycling the box. Without the patch I hit a journal checksum error every time. With this fix it survives many iterations. Reported-by: Nix <nix@esperi.org.uk> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
* | | Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds2012-10-3013-68/+113
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block driver update from Jens Axboe: "Distilled down variant, the rest will pass over to 3.8. I pulled it into the for-linus branch I had waiting for a pull request as well, in case you are wondering why there are new entries in here too. This also got rid of two reverts and the ones of the mtip32xx patches that went in later in the 3.6 cycle, so the series looks a bit cleaner." * 'for-linus' of git://git.kernel.dk/linux-block: loop: Make explicit loop device destruction lazy mtip32xx:Added appropriate timeout value for secure erase xen/blkback: Change xen_vbd's flush_support and discard_secure to have type unsigned int, rather than bool cciss: select CONFIG_CHECK_SIGNATURE cciss: remove unneeded memset() xen/blkback: use kmem_cache_zalloc instead of kmem_cache_alloc/memset pktcdvd: update MAINTAINERS floppy: remove dr, reuse drive on do_floppy_init floppy: use common function to check if floppies can be registered floppy: properly handle failure on add_disk loop floppy: do put_disk on current dr if blk_init_queue fails floppy: don't call alloc_ordered_workqueue inside the alloc_disk loop xen/blkback: Fix compile warning block: Add blk_rq_pos(rq) to sort rq when plushing drivers/block: remove CONFIG_EXPERIMENTAL block: remove CONFIG_EXPERIMENTAL vfs: fix: don't increase bio_slab_max if krealloc() fails blkcg: stop iteration early if root_rl is the only request list blkcg: Fix use-after-free of q->root_blkg and q->root_rl.blkg
| * | | loop: Make explicit loop device destruction lazyDave Chinner2012-10-301-2/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xfstests has always had random failures of tests due to loop devices failing to be torn down and hence leaving filesytems that cannot be unmounted. This causes test runs to immediately stop. Over the past 6 or 7 years we've added hacks like explicit unmount -d commands for loop mounts, losetup -d after unmount -d fails, etc, but still the problems persist. Recently, the frequency of loop related failures increased again to the point that xfstests 259 will reliably fail with a stray loop device that was not torn down. That is despite the fact the test is above as simple as it gets - loop 5 or 6 times running mkfs.xfs with different paramters: lofile=$(losetup -f) losetup $lofile "$testfile" "$MKFS_XFS_PROG" -b size=512 $lofile >/dev/null || echo "mkfs failed!" sync losetup -d $lofile And losteup -d $lofile is failing with EBUSY on 1-3 of these loops every time the test is run. Turns out that blkid is running simultaneously with losetup -d, and so it sees an elevated reference count and returns EBUSY. But why is blkid running? It's obvious, isn't it? udev has decided to try and find out what is on the block device as a result of a creation notification. And it is racing with mkfs, so might still be scanning the device when mkfs finishes and we try to tear it down. So, make losetup -d force autoremove behaviour. That is, when the last reference goes away, tear down the device. xfstests wants it *gone*, not causing random teardown failures when we know that all the operations the tests have specifically run on the device have completed and are no longer referencing the loop device. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | mtip32xx:Added appropriate timeout value for secure eraseSelvan Mani2012-10-302-4/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Added appropriate timeout value for secure erase based on identify device data Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | xen/blkback: Change xen_vbd's flush_support and discard_secure to have type ↵Oliver Chick2012-10-301-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | unsigned int, rather than bool Changing the type of bdev parameters to be unsigned int :1, rather than bool. This is more consistent with the types of other features in the block drivers. Signed-off-by: Oliver Chick <oliver.chick@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | cciss: select CONFIG_CHECK_SIGNATUREAkinobu Mita2012-10-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch cciss-use-check_signature.patch in -mm tree introduced a build error: drivers/built-in.o: In function `CISS_signature_present': drivers/block/cciss.c:4270: undefined reference to `check_signature' Add missing CONFIG_CHECK_SIGNATURE to fix this issue. Reported-by: Fengguang Wu <wfg@linux.intel.com> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Fengguang Wu <wfg@linux.intel.com> Cc: Mike Miller <mike.miller@hp.com> Cc: Jens Axboe <axboe@kernel.dk> Acked-by: "Stephen M. Cameron" <scameron@beardog.cce.hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | cciss: remove unneeded memset()Wei Yongjun2012-10-301-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The memory return by kzalloc() or kmem_cache_zalloc() has already be set to zero, so remove useless memset(0). spatch with a semantic match is used to found this problem. (http://coccinelle.lip6.fr/) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: Mike Miller <mike.miller@hp.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Stephen M. Cameron <scameron@beardog.cce.hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | xen/blkback: use kmem_cache_zalloc instead of kmem_cache_alloc/memsetWei Yongjun2012-10-301-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using kmem_cache_zalloc() instead of kmem_cache_alloc() and memset(). spatch with a semantic match is used to found this problem. (http://coccinelle.lip6.fr/) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | pktcdvd: update MAINTAINERSJiri Kosina2012-10-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Peter is not going to maintain the driver any more. I have the hardware. Acked-by: Peter Osterlund <petero2@telia.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
| * | | floppy: remove dr, reuse drive on do_floppy_initHerton Ronaldo Krzesinski2012-10-301-26/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a small cleanup, that also may turn error handling of unitialized disks more readable. We don't need a separate variable to track allocated disks, remove dr and reuse drive variable instead. Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | floppy: use common function to check if floppies can be registeredHerton Ronaldo Krzesinski2012-10-301-10/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The same checks to see if a drive can be or is registered are repeated through the code, factor out the checks in a common function and replace the repeated checks with it. Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | floppy: properly handle failure on add_disk loopHerton Ronaldo Krzesinski2012-10-301-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On floppy initialization, if something failed inside the loop we call add_disk, there was no cleanup of previous iterations in the error handling. Cc: stable@vger.kernel.org Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | floppy: do put_disk on current dr if blk_init_queue failsHerton Ronaldo Krzesinski2012-10-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If blk_init_queue fails, we do not call put_disk on the current dr (dr is decremented first in the error handling loop). Cc: stable@vger.kernel.org Reviewed-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | floppy: don't call alloc_ordered_workqueue inside the alloc_disk loopHerton Ronaldo Krzesinski2012-10-301-9/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 070ad7e ("floppy: convert to delayed work and single-thread wq"), we end up calling alloc_ordered_workqueue multiple times inside the loop, which shouldn't be intended. Besides the leak, other side effect in the current code is if blk_init_queue fails, we would end up calling unregister_blkdev even if we didn't call yet register_blkdev. Just moved the allocation of floppy_wq before the loop, and adjusted the code accordingly. Cc: stable@vger.kernel.org # 3.5+ Acked-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | xen/blkback: Fix compile warningKonrad Rzeszutek Wilk2012-10-301-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | drivers/block/xen-blkback/xenbus.c:260:5: warning: symbol 'xenvbd_sysfs_addif' was not declared. Should it be static? drivers/block/xen-blkback/xenbus.c:284:6: warning: symbol 'xenvbd_sysfs_delif' was not declared. Should it be static? Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | block: Add blk_rq_pos(rq) to sort rq when plushingJianpeng Ma2012-10-251-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | My workload is a raid5 which had 16 disks. And used our filesystem to write using direct-io mode. I used the blktrace to find those message: 8,16 0 6647 2.453665504 2579 M W 7493152 + 8 [md0_raid5] 8,16 0 6648 2.453672411 2579 Q W 7493160 + 8 [md0_raid5] 8,16 0 6649 2.453672606 2579 M W 7493160 + 8 [md0_raid5] 8,16 0 6650 2.453679255 2579 Q W 7493168 + 8 [md0_raid5] 8,16 0 6651 2.453679441 2579 M W 7493168 + 8 [md0_raid5] 8,16 0 6652 2.453685948 2579 Q W 7493176 + 8 [md0_raid5] 8,16 0 6653 2.453686149 2579 M W 7493176 + 8 [md0_raid5] 8,16 0 6654 2.453693074 2579 Q W 7493184 + 8 [md0_raid5] 8,16 0 6655 2.453693254 2579 M W 7493184 + 8 [md0_raid5] 8,16 0 6656 2.453704290 2579 Q W 7493192 + 8 [md0_raid5] 8,16 0 6657 2.453704482 2579 M W 7493192 + 8 [md0_raid5] 8,16 0 6658 2.453715016 2579 Q W 7493200 + 8 [md0_raid5] 8,16 0 6659 2.453715247 2579 M W 7493200 + 8 [md0_raid5] 8,16 0 6660 2.453721730 2579 Q W 7493208 + 8 [md0_raid5] 8,16 0 6661 2.453721974 2579 M W 7493208 + 8 [md0_raid5] 8,16 0 6662 2.453728202 2579 Q W 7493216 + 8 [md0_raid5] 8,16 0 6663 2.453728436 2579 M W 7493216 + 8 [md0_raid5] 8,16 0 6664 2.453734782 2579 Q W 7493224 + 8 [md0_raid5] 8,16 0 6665 2.453735019 2579 M W 7493224 + 8 [md0_raid5] 8,16 0 6666 2.453741401 2579 Q W 7493232 + 8 [md0_raid5] 8,16 0 6667 2.453741632 2579 M W 7493232 + 8 [md0_raid5] 8,16 0 6668 2.453748148 2579 Q W 7493240 + 8 [md0_raid5] 8,16 0 6669 2.453748386 2579 M W 7493240 + 8 [md0_raid5] 8,16 0 6670 2.453851843 2579 I W 7493144 + 104 [md0_raid5] 8,16 0 0 2.453853661 0 m N cfq2579 insert_request 8,16 0 6671 2.453854064 2579 I W 7493120 + 24 [md0_raid5] 8,16 0 0 2.453854439 0 m N cfq2579 insert_request 8,16 0 6672 2.453854793 2579 U N [md0_raid5] 2 8,16 0 0 2.453855513 0 m N cfq2579 Not idling.st->count:1 8,16 0 0 2.453855927 0 m N cfq2579 dispatch_insert 8,16 0 0 2.453861771 0 m N cfq2579 dispatched a request 8,16 0 0 2.453862248 0 m N cfq2579 activate rq,drv=1 8,16 0 6673 2.453862332 2579 D W 7493120 + 24 [md0_raid5] 8,16 0 0 2.453865957 0 m N cfq2579 Not idling.st->count:1 8,16 0 0 2.453866269 0 m N cfq2579 dispatch_insert 8,16 0 0 2.453866707 0 m N cfq2579 dispatched a request 8,16 0 0 2.453867061 0 m N cfq2579 activate rq,drv=2 8,16 0 6674 2.453867145 2579 D W 7493144 + 104 [md0_raid5] 8,16 0 6675 2.454147608 0 C W 7493120 + 24 [0] 8,16 0 0 2.454149357 0 m N cfq2579 complete rqnoidle 0 8,16 0 6676 2.454791505 0 C W 7493144 + 104 [0] 8,16 0 0 2.454794803 0 m N cfq2579 complete rqnoidle 0 8,16 0 0 2.454795160 0 m N cfq schedule dispatch From above messages,we can find rq[W 7493144 + 104] and rq[W 7493120 + 24] do not merge. Because the bio order is: 8,16 0 6638 2.453619407 2579 Q W 7493144 + 8 [md0_raid5] 8,16 0 6639 2.453620460 2579 G W 7493144 + 8 [md0_raid5] 8,16 0 6640 2.453639311 2579 Q W 7493120 + 8 [md0_raid5] 8,16 0 6641 2.453639842 2579 G W 7493120 + 8 [md0_raid5] The bio(7493144) first and bio(7493120) later.So the subsequent bios will be divided into two parts. When flushing plug-list,because elv_attempt_insert_merge only support backmerge,not supporting frontmerge. So rq[7493120 + 24] can't merge with rq[7493144 + 104]. From my test,i found those situation can count 25% in our system. Using this patch, there is no this situation. Signed-off-by: Jianpeng Ma <majianpeng@gmail.com> CC:Shaohua Li <shli@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | drivers/block: remove CONFIG_EXPERIMENTALKees Cook2012-10-231-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This config item has not carried much meaning for a while now and is almost always enabled by default. As agreed during the Linux kernel summit, remove it. CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org> CC: Asai Thambi S P <asamymuthupa@micron.com> CC: Pete Zaitcev <zaitcev@redhat.com> CC: Cong Wang <xiyou.wangcong@gmail.com> CC: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | block: remove CONFIG_EXPERIMENTALKees Cook2012-10-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This config item has not carried much meaning for a while now and is almost always enabled by default. As agreed during the Linux kernel summit, remove it. CC: Jens Axboe <axboe@kernel.dk> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | vfs: fix: don't increase bio_slab_max if krealloc() failsAnna Leuschner2012-10-221-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Without the patch, bio_slab_max, representing bio_slabs capacity, is increased before krealloc() of bio_slabs. If krealloc() fails, bio_slab_max is too high. Fix that by only updating bio_slab_max if krealloc() is successful. Signed-off-by: Anna Leuschner <anna.m.leuschner@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | blkcg: stop iteration early if root_rl is the only request listJun'ichi Nomura2012-10-221-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __blk_queue_next_rl() finds next request list based on blkg_list while skipping root_blkg in the list. OTOH, root_rl is special as it may exist even without root_blkg. Though the later part of the function handles such a case correctly, exiting early is good for readability of the code. Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | blkcg: Fix use-after-free of q->root_blkg and q->root_rl.blkgJun'ichi Nomura2012-10-221-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | blk_put_rl() does not call blkg_put() for q->root_rl because we don't take request list reference on q->root_blkg. However, if root_blkg is once attached then detached (freed), blk_put_rl() is confused by the bogus pointer in q->root_blkg. For example, with !CONFIG_BLK_DEV_THROTTLING && CONFIG_CFQ_GROUP_IOSCHED, switching IO scheduler from cfq to deadline will cause system stall after the following warning with 3.6: > WARNING: at /work/build/linux/block/blk-cgroup.h:250 > blk_put_rl+0x4d/0x95() > Modules linked in: bridge stp llc sunrpc acpi_cpufreq freq_table mperf > ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 > Pid: 0, comm: swapper/0 Not tainted 3.6.0 #1 > Call Trace: > <IRQ> [<ffffffff810453bd>] warn_slowpath_common+0x85/0x9d > [<ffffffff810453ef>] warn_slowpath_null+0x1a/0x1c > [<ffffffff811d5f8d>] blk_put_rl+0x4d/0x95 > [<ffffffff811d614a>] __blk_put_request+0xc3/0xcb > [<ffffffff811d71a3>] blk_finish_request+0x232/0x23f > [<ffffffff811d76c3>] ? blk_end_bidi_request+0x34/0x5d > [<ffffffff811d76d1>] blk_end_bidi_request+0x42/0x5d > [<ffffffff811d7728>] blk_end_request+0x10/0x12 > [<ffffffff812cdf16>] scsi_io_completion+0x207/0x4d5 > [<ffffffff812c6fcf>] scsi_finish_command+0xfa/0x103 > [<ffffffff812ce2f8>] scsi_softirq_done+0xff/0x108 > [<ffffffff811dcea5>] blk_done_softirq+0x8d/0xa1 > [<ffffffff810915d5>] ? > generic_smp_call_function_single_interrupt+0x9f/0xd7 > [<ffffffff8104cf5b>] __do_softirq+0x102/0x213 > [<ffffffff8108a5ec>] ? lock_release_holdtime+0xb6/0xbb > [<ffffffff8104d2b4>] ? raise_softirq_irqoff+0x9/0x3d > [<ffffffff81424dfc>] call_softirq+0x1c/0x30 > [<ffffffff81011beb>] do_softirq+0x4b/0xa3 > [<ffffffff8104cdb0>] irq_exit+0x53/0xd5 > [<ffffffff8102d865>] smp_call_function_single_interrupt+0x34/0x36 > [<ffffffff8142486f>] call_function_single_interrupt+0x6f/0x80 > <EOI> [<ffffffff8101800b>] ? mwait_idle+0x94/0xcd > [<ffffffff81018002>] ? mwait_idle+0x8b/0xcd > [<ffffffff81017811>] cpu_idle+0xbb/0x114 > [<ffffffff81401fbd>] rest_init+0xc1/0xc8 > [<ffffffff81401efc>] ? csum_partial_copy_generic+0x16c/0x16c > [<ffffffff81cdbd3d>] start_kernel+0x3d4/0x3e1 > [<ffffffff81cdb79e>] ? kernel_init+0x1f7/0x1f7 > [<ffffffff81cdb2dd>] x86_64_start_reservations+0xb8/0xbd > [<ffffffff81cdb3e3>] x86_64_start_kernel+0x101/0x110 This patch clears q->root_blkg and q->root_rl.blkg when root blkg is destroyed. Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: stable@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | Merge branch 'for-linus' of ↵Linus Torvalds2012-10-292-2/+6
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph fixes form Sage Weil: "There are two fixes in the messenger code, one that can trigger a NULL dereference, and one that error in refcounting (extra put). There is also a trivial fix that in the fs client code that is triggered by NFS reexport." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: fix dentry reference leak in encode_fh() libceph: avoid NULL kref_put when osd reset races with alloc_msg rbd: reset BACKOFF if unable to re-queue
| * | | | ceph: fix dentry reference leak in encode_fh()David Zafman2012-10-291-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Call to d_find_alias() needs a corresponding dput() This fixes http://tracker.newdream.net/issues/3271 Signed-off-by: David Zafman <david.zafman@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
| * | | | libceph: avoid NULL kref_put when osd reset races with alloc_msgSage Weil2012-10-241-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ceph_on_in_msg_alloc() method drops con->mutex while it allocates a message. If that races with a timeout that resends a zillion messages and resets the connection, and the ->alloc_msg() method returns a NULL message, it will call ceph_msg_put(NULL) and BUG. Fix by only calling put if msg is non-NULL. Fixes http://tracker.newdream.net/issues/3142 Signed-off-by: Sage Weil <sage@inktank.com>
| * | | | rbd: reset BACKOFF if unable to re-queueAlex Elder2012-10-091-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If ceph_fault() is unable to queue work after a delay, it sets the BACKOFF connection flag so con_work() will attempt to do so. In con_work(), when BACKOFF is set, if queue_delayed_work() doesn't result in newly-queued work, it simply ignores this condition and proceeds as if no backoff delay were desired. There are two problems with this--one of which is a bug. The first problem is simply that the intended behavior is to back off, and if we aren't able queue the work item to run after a delay we're not doing that. The only reason queue_delayed_work() won't queue work is if the provided work item is already queued. In the messenger, this means that con_work() is already scheduled to be run again. So if we simply set the BACKOFF flag again when this occurs, we know the next con_work() call will again attempt to hold off activity on the connection until after the delay. The second problem--the bug--is a leak of a reference count. If queue_delayed_work() returns 0 in con_work(), con->ops->put() drops the connection reference held on entry to con_work(). However, processing is (was) allowed to continue, and at the end of the function a second con->ops->put() is called. This patch fixes both problems. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
* | | | | Merge branch 'i2c-for-linus' of ↵Linus Torvalds2012-10-285-40/+40
|\ \ \ \ \ | |_|_|/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging Pull i2c subsystem fixes from Jean Delvare. * 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging: i2c-i801: Fix comment i2c-i801: Simplify dependency towards GPIOLIB i2c-stub: Move to drivers/i2c
| * | | | i2c-i801: Fix commentJean Delvare2012-10-281-1/+1
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Jean Delvare <khali@linux-fr.org>
| * | | | i2c-i801: Simplify dependency towards GPIOLIBJean Delvare2012-10-282-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Arbitrarily selecting GPIOLIB causes trouble on some architectures, so don't do that. Instead, just make the optional multiplexing code depend on CONFIG_I2C_MUX_GPIO instead of CONFIG_I2C_MUX for now. We can revisit if the i2c-i801 driver ever supports other multiplexing flavors. Also make that optional code depend on DMI, as it won't do anything without that. Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Fengguang Wu <fengguang.wu@intel.com>
| * | | | i2c-stub: Move to drivers/i2cJean Delvare2012-10-283-35/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the i2c-stub driver to drivers/i2c, to match the Kconfig entry. This is less confusing that way. I also fixed all checkpatch warnings and errors. Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Peter Huewe <peterhuewe@gmx.de>
* | | | | Linux 3.7-rc3v3.7-rc3Linus Torvalds2012-10-281-1/+1
| | | | |
* | | | | Merge tag 'ktest-v3.7-rc2' of ↵Linus Torvalds2012-10-281-2/+4
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest Pull ktest confusion fix from Steven Rostedt: "With the v3.7-rc2 kernel, the network cards on my target boxes were not being brought up. I found that the modules for the network was not being installed. This was due to the config CONFIG_MODULES_USE_ELF_RELA that came before CONFIG_MODULES, and confused ktest in thinking that CONFIG_MODULES=y was not found. Ktest needs to test all configs and not just stop if something starts with CONFIG_MODULES." * tag 'ktest-v3.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest: ktest: Fix ktest confusion with CONFIG_MODULES_USE_ELF_RELA
| * | | | | ktest: Fix ktest confusion with CONFIG_MODULES_USE_ELF_RELASteven Rostedt2012-10-261-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to decide if ktest should bother installing modules on the target box, it checks if the config file has CONFIG_MODULES=y. But it also checks if the '=y' part exists. It only will install modules if the config exists and is set with '=y'. But as the regex that was used tests: /^CONFIG_MODULES(=y)?/ this will also match: CONFIG_MODULES_USE_ELF_RELA as the '=y' part was optional and it did not test the rest of the line. When this happens, ktest will stop checking the rest of the configs but it will also think that no modules are needed to be installed. What it should do is only jump out of the loop if it actually found a CONFIG_MODULES that is set to true. Otherwise, ktest wont install the necessary modules needed for proper booting of the test target. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* | | | | | Merge tag 'spi-mxs' of ↵Linus Torvalds2012-10-281-1/+2
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc Pull minor spi MXS fixes from Mark Brown: "These fixes are both pretty minor ones and are driver local." * tag 'spi-mxs' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc: spi: mxs: Terminate DMA in case of DMA timeout spi: mxs: Assign message status after transfer finished
| * | | | | | spi: mxs: Terminate DMA in case of DMA timeoutMarek Vasut2012-10-171-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case the SPI DMA times out, the DMA might still be in some kind of inconsistent state. Issue dmaengine_terminate_all() on the particular channel to kill off all operations before continuing. Signed-off-by: Marek Vasut <marex@denx.de> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>