Commit message  Author  Age  Files  Lines
* md: move compat_ioctl handling into md.cArnd Bergmann2009-12-142-18/+23
| | | | | | | | | | | | The RAID ioctls are only implemented in md.c, so the handling for them should also be moved there from fs/compat_ioctl.c. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Neil Brown <neilb@suse.de> Cc: Andre Noll <maan@systemlinux.org> Cc: linux-raid@vger.kernel.org Signed-off-by: NeilBrown <neilb@suse.de>
* md: revise Kconfig help for MD_MULTIPATHNeilBrown2009-12-141-5/+4
| | | | | | | | Make it clear in the config message that MD_MULTIPATH is not under active development. Cc: Oren Held <orenhe@il.ibm.com> Signed-off-by: NeilBrown <neilb@suse.de>
* md: add MODULE_DESCRIPTION for all md related modules.NeilBrown2009-12-149-0/+9
| | | | | | Suggested by Oren Held <orenhe@il.ibm.com> Signed-off-by: NeilBrown <neilb@suse.de>
* raid: improve MD/raid10 handling of correctable read errors.Robert Becker2009-12-143-0/+112
| We've noticed severe lasting performance degradation of our raid arrays when we have drives that yield large amounts of media errors. The raid10 module will queue each failed read for retry, and will also attempt to call fix_read_error() to perform the read recovery. Read recovery is performed while the array is frozen, so repeated recovery attempts can degrade the performance of the array for extended periods of time.
|
| With this patch I propose adding a per md device max number of corrected read attempts. Each rdev will maintain a count of read correction attempts in the rdev->read_errors field (not used currently for raid10). When we enter fix_read_error() we'll check to see when the last read error occurred, and divide the read error count by 2 for every hour since the last read error. If at that point our read error count exceeds the read error threshold, we'll fail the raid device.
|
| In addition in this patch I add sysfs nodes (get/set) for the per md max_read_errors attribute, the rdev->read_errors attribute, and added some printk's to indicate when fix_read_error fails to repair an rdev.
|
| For testing I used debugfs->fail_make_request to inject IO errors to the rdev while doing IO to the raid array.
|
| Signed-off-by: Robert Becker <Rob.Becker@riverbed.com>
| Signed-off-by: NeilBrown <neilb@suse.de>
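The decay rule described in this commit message can be sketched in a few lines. This is only an illustration in Python, not the kernel code: the function names and the choice to count whole hours are assumptions, and the patch itself works on rdev->read_errors inside fix_read_error().

```python
def decayed_read_errors(read_errors: int, hours_since_last_error: int) -> int:
    """Halve the error count once for every full hour since the last error.

    Hypothetical helper; counting whole hours is an assumption of this sketch.
    """
    if read_errors < 0 or hours_since_last_error < 0:
        raise ValueError("counts and elapsed hours must be non-negative")
    # A right shift by n is integer division by 2, applied n times.
    return read_errors >> hours_since_last_error


def should_fail_device(read_errors: int, hours_since_last_error: int,
                       max_read_errors: int) -> bool:
    """Return True when the decayed count exceeds the per-array threshold."""
    return decayed_read_errors(read_errors, hours_since_last_error) > max_read_errors
```

With a threshold of 20, a device that had 40 corrected errors one hour ago decays to 20 and is kept, while one with 42 decays to 21 and would be failed.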
* md/raid10: print more useful messages on device failure.Robert Becker2009-12-141-3/+29
| | | | | | | | | When we get a read error on a device in a RAID10, and attempting to repair the error fails, print more useful messages about why it failed. Signed-off-by: Robert Becker <Rob.Becker@riverbed.com> Signed-off-by: NeilBrown <neilb@suse.de>
* md/bitmap: update dirty flag when bitmap bits are explicitly set.NeilBrown2009-12-141-0/+6
| There is a sysfs file which allows bits in the write-intent bitmap to be explicitly set - indicating that the block is thought to be 'dirty'. When this happens we should really set recovery_cp backwards to include the block to reflect this dirtiness.
|
| In particular, a 'resync' process will refuse to start if recovery_cp is beyond the end of the array, so this is needed to allow a resync to be triggered.
|
| Signed-off-by: NeilBrown <neilb@suse.de>
* md: Support write-intent bitmaps with externally managed metadata.NeilBrown2009-12-144-33/+137
| | | | | | | | | | | In this case, the metadata needs to not be in the same sector as the bitmap. md will not read/write any bitmap metadata. Config must be done via sysfs and when a recovery makes the array non-degraded again, writing 'true' to 'bitmap/can_clear' will allow bits in the bitmap to be cleared again. Signed-off-by: NeilBrown <neilb@suse.de>
* md/bitmap: move setting of daemon_lastrun out of bitmap_read_sbNeilBrown2009-12-141-1/+1
| Setting daemon_lastrun really has nothing to do with reading the bitmap superblock, it just happens to be needed at the same time. bitmap_read_sb is about to become optional, so move that code out to after the call to bitmap_read_sb.
|
| Signed-off-by: NeilBrown <neilb@suse.de>
* md: support updating bitmap parameters via sysfs.NeilBrown2009-12-144-2/+240
| | | | | | | | | | | | | | | | | | | | A new attribute directory 'bitmap' in 'md' is created which contains files for configuring the bitmap. 'location' identifies where the bitmap is, either 'none', or 'file' or 'sector offset from metadata'. Writing 'location' can create or remove a bitmap. Adding a 'file' bitmap this way is not yet supported. 'chunksize' and 'time_base' must be set before 'location' can be set. 'chunksize' can be set before creating a bitmap, but is currently always over-ridden by the bitmap superblock. 'time_base' and 'backlog' can be updated at any time. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Andre Noll <maan@systemlinux.org>
* md: factor out parsing of fixed-point numbersNeilBrown2009-12-142-23/+44
| | | | | | | | safe_delay_store can parse fixed point numbers (for fractions of a second). We will want to do that for another sysfs file soon, so factor out the code. Signed-off-by: NeilBrown <neilb@suse.de>
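As a rough illustration of what parsing a fixed-point number for a sysfs value involves, the sketch below converts a decimal string such as "1.25" into an integer count of scaled units (milliseconds when the scale is 3). The function name, the truncation of extra digits, and the error handling are assumptions of this sketch, not the helper that the patch factors out.

```python
def parse_fixed_point(text: str, scale: int) -> int:
    """Parse 'whole[.fraction]' into an integer scaled by 10**scale.

    Hypothetical helper: digits beyond `scale` are truncated, not rounded.
    """
    def all_ascii_digits(s: str) -> bool:
        return all(c in "0123456789" for c in s)

    text = text.strip()
    whole, _, frac = text.partition(".")
    if not whole and not frac:
        raise ValueError("no digits found")
    if not all_ascii_digits(whole) or not all_ascii_digits(frac):
        raise ValueError(f"not a fixed-point number: {text!r}")
    # Pad or truncate the fractional part to exactly `scale` digits.
    frac = frac[:scale].ljust(scale, "0")
    return int((whole or "0") + frac)
```

For example, with a scale of 3, the string "0.2" becomes 200 and "5" becomes 5000, so fractions of a second can be stored as whole milliseconds.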
* md: support bitmap offset appropriate for external-metadata arrays.NeilBrown2009-12-142-5/+15
| For md arrays where metadata is managed externally, the kernel does not know about a superblock so the superblock offset is 0. If we want to have a write-intent-bitmap near the end of the devices of such an array, we should support sector_t sized offset. We need the offset to be possibly negative for when the bitmap is before the metadata, so use loff_t instead.
|
| Also add sanity check that bitmap does not overlap with data.
|
| Signed-off-by: NeilBrown <neilb@suse.de>
* md: remove needless setting of thread->timeout in raid10_quiesceNeilBrown2009-12-142-7/+1
| | | | | | | | | | As bitmap_create and bitmap_destroy already set thread->timeout as appropriate, there is no need to do it in raid10_quiesce. There is a possible need to wake the thread after the timeout has been set low, but it is better to do that where the timeout is actually set low, in bitmap_create. Signed-off-by: NeilBrown <neilb@suse.de>
* md: change daemon_sleep to be in 'jiffies' rather than 'seconds'.NeilBrown2009-12-142-8/+8
| | | | | | This removes a lot of multiplications by HZ. Signed-off-by: NeilBrown <neilb@suse.de>
* md: move offset, daemon_sleep and chunksize out of bitmap structureNeilBrown2009-12-146-32/+40
| | | | | | | ... and into bitmap_info. These are all configuration parameters that need to be set before the bitmap is created. Signed-off-by: NeilBrown <neilb@suse.de>
* md: collect bitmap-specific fields into one structure.NeilBrown2009-12-143-56/+63
| | | | | | | | In preparation for making bitmap fields configurable via sysfs, start tidying up by making a single structure to contain the configuration fields. Signed-off-by: NeilBrown <neilb@suse.de>
* md/raid1: add takeover support for raid5->raid1NeilBrown2009-12-142-76/+120
| | | | | | A 2-device raid5 array can now be converted to raid1. Signed-off-by: NeilBrown <neilb@suse.de>
* md: add honouring of suspend_{lo,hi} to raid1.NeilBrown2009-12-141-0/+22
| | | | | | | | This will allow us to stop writeout to portions of the array while they are resynced by someone else - e.g. another node in a cluster. Signed-off-by: NeilBrown <neilb@suse.de>
* md/raid5: don't complete make_request on barrier until writes are scheduledNeilBrown2009-12-141-12/+39
| | | | | | | | | | | | | | The post-barrier-flush is sent by md as soon as make_request on the barrier write completes. For raid5, the data might not be in the per-device queues yet. So for barrier requests, wait for any pre-reading to be done so that the request will be in the per-device queues. We use the 'preread_active' count to check that nothing is still in the preread phase, and delay the decrement of this count until after write requests have been submitted to the underlying devices. Signed-off-by: NeilBrown <neilb@suse.de>
* md: support barrier requests on all personalities.NeilBrown2009-12-147-7/+126
| Previously barriers were only supported on RAID1. This is because other levels require synchronisation across all devices and so needed a different approach. Here is that approach.
|
| When a barrier arrives, we send a zero-length barrier to every active device. When that completes - and if the original request was not empty - we submit the barrier request itself (with the barrier flag cleared) and then submit a fresh load of zero length barriers.
|
| The barrier request itself is asynchronous, but any subsequent request will block until the barrier completes.
|
| The reason for clearing the barrier flag is that a barrier request is allowed to fail. If we pass a non-empty barrier through a striping raid level it is conceivable that part of it could succeed and part could fail. That would be way too hard to deal with. So if the first run of zero length barriers succeed, we assume all is sufficiently well that we send the request and ignore errors in the second run of barriers.
|
| RAID5 needs extra care as write requests may not have been submitted to the underlying devices yet. So we flush the stripe cache before proceeding with the barrier.
|
| Note that the second set of zero-length barriers are submitted immediately after the original request is submitted. Thus when a personality finds mddev->barrier to be set during make_request, it should not return from make_request until the corresponding per-device request(s) have been queued.
|
| That will be done in later patches.
|
| Signed-off-by: NeilBrown <neilb@suse.de>
| Reviewed-by: Andre Noll <maan@systemlinux.org>
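The ordering described in this commit message can be written down as a small planning function. This is a sketch of the sequence only, with hypothetical operation names; it does not model completion, failure handling, or the RAID5 stripe-cache flush.

```python
def barrier_plan(active_devices, request_is_empty):
    """List the operations issued for one barrier request, in order.

    Operation names are invented for this sketch; they are not kernel symbols.
    """
    # First run: a zero-length barrier to every active device.
    plan = [("zero_length_barrier", dev) for dev in active_devices]
    if not request_is_empty:
        # The payload is submitted with the barrier flag cleared, because a
        # partly failed barrier across stripes would be too hard to handle.
        plan.append(("submit_request_without_barrier_flag", None))
        # Second run: fresh zero-length barriers; errors here are ignored.
        plan += [("zero_length_barrier", dev) for dev in active_devices]
    return plan
```

For two active devices and a non-empty request the plan has five steps: two barriers, the request itself, then two more barriers.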
* md: don't reset curr_resync_completed after an interrupted resyncNeilBrown2009-12-141-1/+3
| | | | | | | | | If a resync/recovery/check/repair is interrupted for some reason, it can be useful to know exactly where it got up to. So in that case, do not clear curr_resync_completed. Initialise it when starting a resync/recovery/... instead. Signed-off-by: NeilBrown <neilb@suse.de>
* md: adjust resync_min usefully when resync aborts.NeilBrown2009-12-141-3/+7
| When a 'check' or 'repair' finishes we should clear resync_min so that a future check/repair will cover the whole array (by default). However if it is interrupted, we should update resync_min to where we got up to, so that when the check/repair continues it just does the remainder of the array.
|
| Signed-off-by: NeilBrown <neilb@suse.de>
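The rule in this message reduces to one decision, sketched below. The function and parameter names are hypothetical, and "clearing" resync_min is taken here to mean resetting it to 0.

```python
def next_resync_min(interrupted: bool, reached_sector: int) -> int:
    """Pick the starting point for the next check/repair run.

    Hypothetical helper: a completed run resets the start to 0 so the next
    run covers the whole array; an interrupted run resumes where it stopped.
    """
    if reached_sector < 0:
        raise ValueError("sector position must be non-negative")
    return reached_sector if interrupted else 0
```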
* md: remove sparse warning:symbol XXX was not declared.NeilBrown2009-12-142-19/+19
| | | | Signed-off-by: NeilBrown <neilb@suse.de>
* md/raid5: remove some sparse warnings.NeilBrown2009-12-141-2/+1
| | | | | | qd_idx is previously declared and given exactly the same value! Signed-off-by: NeilBrown <neilb@suse.de>
* md/bitmap: protect against bitmap removal while being updated.NeilBrown2009-12-144-8/+22
| | | | | | | | | | | | | | | | | | A write intent bitmap can be removed from an array while the array is active. When this happens, all IO is suspended and flushed before the bitmap is removed. However it is possible that bitmap_daemon_work is still running to clear old bits from the bitmap. If it is, it can dereference the bitmap after it has been freed. So introduce a new mutex to protect bitmap_daemon_work and get it before destroying a bitmap. This is suitable for any current -stable kernel. Signed-off-by: NeilBrown <neilb@suse.de> Cc: stable@kernel.org
* Merge branch 'ixp4xx' of git://git.kernel.org/pub/scm/linux/kernel/git/chris/linux-2.6Linus Torvalds2009-12-1238-911/+461
|\
| | * 'ixp4xx' of git://git.kernel.org/pub/scm/linux/kernel/git/chris/linux-2.6:
| |   IXP4xx: GTWX5715 platform only has two PCI IRQ lines, not four.
| |   IXP4xx: Introduce IXP4XX_GPIO_IRQ(n) macro and convert IXP4xx platform files.
| |   IXP4xx: move Gemtek GTWX5715 platform macros to the platform code.
| |   IXP4xx: Remove unused Motorola PrPMC1100 platform macros.
| |   IXP4xx: move FSG platform macros to the platform code.
| |   IXP4xx: move DSM G600 platform macros to the platform code.
| |   IXP4xx: move NAS100D platform macros to the platform code.
| |   IXP4xx: move NSLU2 platform macros to the platform code.
| |   IXP4xx: move Coyote platform macros to the platform code.
| |   IXP4xx: move AVILA platform macros to the platform code.
| |   IXP4xx: move IXDP425 platform macros to the platform code.
| |   IXP4xx: Extend PCI MMIO indirect address space to 1 GB.
| |   IXP4xx: Fix compilation failure with CONFIG_IXP4XX_INDIRECT_PCI.
| |   IXP4xx: Drop "__ixp4xx_" prefix from in/out/ioread/iowrite functions for clarity.
| |   IXP4xx: Rename indirect MMIO primitives from __ixp4xx_* to __indirect_*.
| |   IXP4xx: Ensure index is positive in irq_to_gpio() and npe_request().
| |   ARM: fix insl() and outsl() endianness on IXP4xx architecture.
| |   IXP4xx: Fix normally-disabled debugging text in drivers/net/arm/ixp4xx_eth.c.
| |   IXP4xx: change the timer base frequency to 66.666000 MHz.
| * IXP4xx: GTWX5715 platform only has two PCI IRQ lines, not four.Krzysztof Hałasa2009-12-051-21/+11
| | | | | | | | Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
| * IXP4xx: Introduce IXP4XX_GPIO_IRQ(n) macro and convert IXP4xx platform files.Krzysztof Hałasa2009-12-0510-234/+169
| | | | | | | | Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
| * IXP4xx: move Gemtek GTWX5715 platform macros to the platform code.Krzysztof Hałasa2009-12-053-121/+43
| | | | | | | | Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
| * IXP4xx: Remove unused Motorola PrPMC1100 platform macros.Krzysztof Hałasa2009-12-053-44/+0
| | | | | | | | | | | | | | PrPMC1100 is handled by IXDP425 platform code, there is no need for duplicate set of macros. Remove them. Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
| * IXP4xx: move FSG platform macros to the platform code.Krzysztof Hałasa2009-12-056-60/+25
| | | | | | | | Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
| * IXP4xx: move DSM G600 platform macros to the platform code.Krzysztof Hałasa2009-12-055-64/+35
| | | | | | | | Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
| * IXP4xx: move NAS100D platform macros to the platform code.Krzysztof Hałasa2009-12-055-64/+31
| | | | | | | | Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
| * IXP4xx: move NSLU2 platform macros to the platform code.Krzysztof Hałasa2009-12-055-65/+32
| | | | | | | | Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
| * IXP4xx: move Coyote platform macros to the platform code.Krzysztof Hałasa2009-12-055-43/+19
| | | | | | | | Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
| * IXP4xx: move AVILA platform macros to the platform code.Krzysztof Hałasa2009-12-055-51/+18
| | | | | | | | Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
| * IXP4xx: move IXDP425 platform macros to the platform code.Krzysztof Hałasa2009-12-055-52/+25
| | | | | | | | Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
| * IXP4xx: Extend PCI MMIO indirect address space to 1 GB.Krzysztof Hałasa2009-12-054-29/+29
| | | | | | | | | | | | | | | | | | IXP4xx CPUs can indirectly access the whole 4 GB PCI MMIO address space (using the non-prefetch registers). Previously the available space depended on the CPU variant, since one of the IXP43x platforms needed more than the usual 128 MB. 1 GB should be enough for everyone, and if not, we can trivially increase it. Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
| * IXP4xx: Fix compilation failure with CONFIG_IXP4XX_INDIRECT_PCI.Krzysztof Hałasa2009-12-051-8/+15
| | | | | | | | | | | | | | | | Instead of including the heavy linux/mm.h for VMALLOC_START, test the addresses against PCI MIN and MAX addresses. Indirect PCI uses 1:1 mapping for MMIO space making this change possible. Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
| * IXP4xx: Drop "__ixp4xx_" prefix from in/out/ioread/iowrite functions for clarity.Krzysztof Hałasa2009-12-051-95/+55
| | Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
| * IXP4xx: Rename indirect MMIO primitives from __ixp4xx_* to __indirect_*.Krzysztof Hałasa2009-12-051-57/+50
| | | | | | | | Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
| * IXP4xx: Ensure index is positive in irq_to_gpio() and npe_request().Roel Kluin2009-12-054-4/+4
| | | | | | | | | | | | | | The indexes were signed, so negatives were possible. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
| * ARM: fix insl() and outsl() endianness on IXP4xx architecture.Krzysztof Hałasa2009-12-051-3/+4
| | | | | | | | | | | | The repetitive in/out functions must preserve order, not value. Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
| * IXP4xx: Fix normally-disabled debugging text in drivers/net/arm/ixp4xx_eth.c.Krzysztof Hałasa2009-12-051-1/+1
| | | | | | | | Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
| * IXP4xx: change the timer base frequency to 66.666000 MHz.Krzysztof Hałasa2009-12-051-1/+1
| | Clock generators used by IXP4xx processors are usually 33.333 MHz, sometimes 33.33 MHz, and a few platforms use 33 MHz. The timers tick twice as fast, that is, at 66.666, 66.66 or 66 MHz. The current 66.666666 MHz is a 10 ppm offset from the usual 66.666 MHz.
| |
| | Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>
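The 10 ppm figure in this message follows from simple arithmetic, sketched here in Python. The function name is an assumption for illustration; frequencies are given in Hz.

```python
def ppm_offset(assumed_hz: int, actual_hz: int) -> float:
    """Relative error of an assumed clock rate, in parts per million."""
    return (assumed_hz - actual_hz) / actual_hz * 1_000_000


# The timers tick at twice the 33.333 MHz generator rate, i.e. 66.666 MHz.
actual_timer_hz = 2 * 33_333_000
old_assumed_hz = 66_666_666
offset = ppm_offset(old_assumed_hz, actual_timer_hz)  # roughly 10 ppm
```

A 666 Hz difference on a 66.666 MHz clock is about 10 parts per million, which is the offset the commit removes for the common 33.333 MHz generators.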
* | [BKL] add 'might_sleep()' to the outermost lock takerLinus Torvalds2009-12-121-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As shown by the previous patch (6698e3472: "tty: Fix BKL taken under a spinlock bug introduced in the BKL split") the BKL removal is prone to some subtle issues, where removing the BKL in one place may in fact make a previously nested BKL call the new outer call, and then prone to nasty deadlocks with other spinlocks. In general, we should never take the BKL while we're holding a spinlock, so let's just add a "might_sleep()" to it (even though the BKL doesn't technically sleep - at least not yet), and we'll get nice warnings the next time this kind of problem happens during BKL removal. Acked-and-Tested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | tty: Fix BKL taken under a spinlock bug introduced in the BKL splitAlan Cox2009-12-121-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The fasync path takes the BKL (it probably doesn't need to in fact) while holding the file_list spinlock. You can't do that with the kernel lock: it causes lock inversions and deadlocks. Leave the BKL over that bit for the moment. Identified by AKPM. Signed-off-by: Alan Cox <alan@linux.intel.com> Acked-and-Tested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpcLinus Torvalds2009-12-12234-2713/+13126
|\ \
| | | * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (151 commits)
| | |   powerpc: Fix usage of 64-bit instruction in 32-bit altivec code
| | |   MAINTAINERS: Add PowerPC patterns
| | |   powerpc/pseries: Track previous CPPR values to correctly EOI interrupts
| | |   powerpc/pseries: Correct pseries/dlpar.c build break without CONFIG_SMP
| | |   powerpc: Make "intspec" pointers in irq_host->xlate() const
| | |   powerpc/8xx: DTLB Miss cleanup
| | |   powerpc/8xx: Remove DIRTY pte handling in DTLB Error.
| | |   powerpc/8xx: Start using dcbX instructions in various copy routines
| | |   powerpc/8xx: Restore _PAGE_WRITETHRU
| | |   powerpc/8xx: Add missing Guarded setting in DTLB Error.
| | |   powerpc/8xx: Fixup DAR from buggy dcbX instructions.
| | |   powerpc/8xx: Tag DAR with 0x00f0 to catch buggy instructions.
| | |   powerpc/8xx: Update TLB asm so it behaves as linux mm expects.
| | |   powerpc/8xx: Invalidate non present TLBs
| | |   powerpc/pseries: Serialize cpu hotplug operations during deactivate Vs deallocate
| | |   pseries/pseries: Add code to online/offline CPUs of a DLPAR node
| | |   powerpc: stop_this_cpu: remove the cpu from the online map.
| | |   powerpc/pseries: Add kernel based CPU DLPAR handling
| | |   sysfs/cpu: Add probe/release files
| | |   powerpc/pseries: Kernel DLPAR Infrastructure
| | |   ...
| * | powerpc: Fix usage of 64-bit instruction in 32-bit altivec codeBenjamin Herrenschmidt2009-12-091-1/+1
| | | e821ea70f3b4873b50056a1e0f74befed1014c09 introduced a bug by copying some 64-bit originated code as-is to be used by both 32 and 64-bit but this code contains a 64-bit only "cmpdi" instruction. This changes it to cmpwi, which is fine since VRSAVE can only contain a 32-bit value anyway.
| | |
| | | Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| | | CC: <stable@kernel.org>
| * | Merge commit 'origin/master' into nextBenjamin Herrenschmidt2009-12-093651-114842/+248433
| |\ \ | | | | | | | | | | | | | | | | Conflicts: include/linux/kvm.h
| * | | MAINTAINERS: Add PowerPC patternsJoe Perches2009-12-091-0/+15
| | | | On Fri, 2009-12-04 at 20:59 +1100, Benjamin Herrenschmidt wrote:
| | | | > On Fri, 2009-12-04 at 10:34 +0100, Jean Delvare wrote:
| | | | > > I've sent it to linuxppc-dev@ozlabs.org on October 14th. This is the
| | | | > > address which is listed 22 times in MAINTAINERS. If it isn't correct,
| | | | > > then please update MAINTAINERS.
| | | | > No it's fine both shoul work. Your patches are there, just waiting for
| | | | > me to pick them up, I was just firing a reminder to the rest of the CC
| | | | > list :-) (and I do remember fwd'ing a couple of your patches to the
| | | | > list, for some reason they didn't make it to patchwork back then, that
| | | | > was a few month ago).
| | | | > Anyways, I've been stretched thin with all sort of stuff lately, so bear
| | | | > with me if I'm a bit slow at taking or testing stuff, I'm doing my best.
| | | |
| | | | Adding patterns to the PowerPC sections of MAINTAINERS is useful.
| | | |
| | | | Signed-off-by: Joe Perches <joe@perches.com>
| | | | Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
| | | | Acked-by: Grant Likely <grant.likely@secretlab.ca>
| | | | Acked-by: Olof Johansson <olof@lixom.net>
| | | | Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| | | | Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>