Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'v2.6.25-rc3-lockdep' of ↵Linus Torvalds2008-02-262-5/+5
|\ | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-lockdep * 'v2.6.25-rc3-lockdep' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-lockdep: Subject: lockdep: include all lock classes in all_lock_classes lockdep: increase MAX_LOCK_DEPTH
| * Subject: lockdep: include all lock classes in all_lock_classesDale Farnsworth2008-02-251-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | Add each lock class to the all_lock_classes list when it is first registered. Previously, lock classes were added to all_lock_classes when the lock class was first used. Since one of the uses of the list is to find unused locks, this didn't work well. Signed-off-by: Dale Farnsworth <dale@farnsworth.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * lockdep: increase MAX_LOCK_DEPTHPeter Zijlstra2008-02-251-1/+1
| | | | | | | | | | | | | | | | | | | | Some code paths exceed the current max lock depth (XFS), so increase this limit a bit. I looked at making this a dynamic allocated array, but we should not advocate insane lock depths, so stay with this as long as it works... Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | Merge branch 'for-linus' of ↵Linus Torvalds2008-02-267-109/+350
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6: firewire: fix NULL pointer deref. and resource leak Documentation: correction to debugging-via-ohci1394 ieee1394: sbp2: fix rescan-scsi-bus firewire: fw-sbp2: fix NULL pointer deref. in scsi_remove_device firewire: fw-sbp2: fix NULL pointer deref. in slave_alloc firewire: fw-sbp2: (try to) avoid I/O errors during reconnect firewire: fw-sbp2: enforce a retry of __scsi_add_device if bus generation changed firewire: fw-sbp2: sort includes firewire: fw-sbp2: logout and login after failed reconnect firewire: fw-sbp2: don't add scsi_device twice firewire: fw-sbp2: log bus_id at management request failures firewire: fw-sbp2: wait for completion of fetch agent reset ieee1394: sbp2: add INQUIRY delay workaround firewire: fw-sbp2: add INQUIRY delay workaround firewire: log GUID of new devices firewire: fw-sbp2: don't retry login or reconnect after unplug firewire: fix "kobject_add failed for fw* with -EEXIST" firewire: fw-sbp2: fix logout before login retry firewire: fw-sbp2: unsigned int vs. unsigned
| * | firewire: fix NULL pointer deref. and resource leakStefan Richter2008-02-211-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By supplying ioctl()s in the wrong order, a userspace client was able to trigger NULL pointer dereferences. Furthermore, by calling ioctl_create_iso_context more than once, new contexts could be created without ever freeing the previously created contexts. Thanks to Anders Blomdell for the report. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| * | Documentation: correction to debugging-via-ohci1394Stefan Richter2008-02-191-8/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rectify a factoid about firewire-ohci. Acked-by: Ingo Molnar <mingo@elte.hu> Also fix a typo spotted by Bernhard Kaindl. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| * | ieee1394: sbp2: fix rescan-scsi-busStefan Richter2008-02-191-0/+3
| | | | | | | | | | | | | | | | | | rescan-scsi-bus used to add SBP-2 targets which weren't there. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| * | firewire: fw-sbp2: fix NULL pointer deref. in scsi_remove_deviceStefan Richter2008-02-191-4/+4
| | | | | | | | | | | | | | | | | | | | | Fix a kernel bug when unplugging an SBP-2 device after having its scsi_device already removed via the "delete" sysfs attribute. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| * | firewire: fw-sbp2: fix NULL pointer deref. in slave_allocStefan Richter2008-02-191-0/+4
| | | | | | | | | | | | | | | | | | | | | Fix a kernel bug when running rescan-scsi-bus while a FireWire disk is connected: http://bugzilla.kernel.org/show_bug.cgi?id=10008 Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| * | firewire: fw-sbp2: (try to) avoid I/O errors during reconnectStefan Richter2008-02-191-4/+122
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While fw-sbp2 takes the necessary time to reconnect to a logical unit after bus reset, the SCSI core keeps sending new commands. They are all immediately completed with host busy status, and application clients or filesystems will break quickly. The SCSI device might even be taken offline: http://bugzilla.kernel.org/show_bug.cgi?id=9734 The only remedy seems to be to block the SCSI device until reconnect. Alas the SCSI core has no useful API to block only one logical unit i.e. the scsi_device, therefore we block the entire Scsi_Host. This currently corresponds to an SBP-2 target. In case of targets with multiple logical units, we need to satisfy the dependencies between logical units by carefully tracking the blocking state of the target and its units. We block all logical units of a target as soon as one of them needs to be blocked, and keep them blocked until all of them are ready to be unblocked. Furthermore, as the history of the old sbp2 driver has shown, the scsi_block_requests() API is a minefield with high potential of deadlocks. We therefore take extra measures to keep logical units unblocked during __scsi_add_device() and during shutdown. This avoids I/O errors during reconnect in many but alas not in all cases. There may still be errors after a re-login had to be performed. Also, some bridges have been seen to cease fetching management ORBs if I/O went on up until a bus reset. In these cases, all management ORBs time out after mgt_orb_timeout. The old sbp2 driver is less vulnerable or maybe not vulnerable to this, for as yet unknown reasons. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
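The blocking rule described in the message above (block the whole target as soon as one logical unit needs it, unblock only when every blocked unit is ready again) can be sketched with a per-target counter. This is a minimal illustration, not the fw-sbp2 code: `struct target`, `struct logical_unit`, `unit_block()` and `unit_unblock()` are hypothetical names, and the `host_blocked` flag merely stands in for the state that `scsi_block_requests()` would set on the Scsi_Host.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; not the real fw-sbp2 structures. */
struct target {
    int blocked_units;   /* number of logical units that currently need blocking */
    bool host_blocked;   /* stands in for the blocked state of the Scsi_Host */
};

struct logical_unit {
    struct target *tgt;
    bool blocked;
};

/* Called when a logical unit must wait for reconnect, e.g. after a bus reset. */
void unit_block(struct logical_unit *lu)
{
    if (lu->blocked)
        return;
    lu->blocked = true;
    if (++lu->tgt->blocked_units == 1)
        lu->tgt->host_blocked = true;    /* first blocked unit blocks the whole target */
}

/* Called when a logical unit has reconnected and is ready again. */
void unit_unblock(struct logical_unit *lu)
{
    if (!lu->blocked)
        return;
    lu->blocked = false;
    if (--lu->tgt->blocked_units == 0)
        lu->tgt->host_blocked = false;   /* last ready unit unblocks the target */
}
```

With two units on one target, blocking both and unblocking only one leaves the target blocked; it is released when the second unit is unblocked as well.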
| * | firewire: fw-sbp2: enforce a retry of __scsi_add_device if bus generation ↵Stefan Richter2008-02-161-14/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | changed fw-sbp2 is unable to reconnect while performing __scsi_add_device because there is only a single workqueue thread context available for both at the moment. This should be fixed eventually. An actual failure of __scsi_add_device is easy to handle, but an incomplete execution of __scsi_add_device with an sdev returned would remain undetected and leave the SBP-2 target unusable. Therefore we use a workaround: If there was a bus reset during __scsi_add_device (i.e. during the SCSI probe), we remove the new sdev immediately, log out, and attempt login and SCSI probe again. Tested-by: Jarod Wilson <jwilson@redhat.com> (earlier version) Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| * | firewire: fw-sbp2: sort includesStefan Richter2008-02-161-7/+7
| | | | | | | | | | | | Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| * | firewire: fw-sbp2: logout and login after failed reconnectStefan Richter2008-02-161-2/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If fw-sbp2 was too late with requesting the reconnect, the target would reject this. In this case, log out before attempting the reconnect. Else several firmwares will deny the re-login because they somehow didn't invalidate the old login. Also, don't retry reconnects in this situation. The retries won't succeed either. These changes improve chances for successful re-login and shorten the period during which the logical unit is inaccessible. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Jarod Wilson <jwilson@redhat.com>
| * | firewire: fw-sbp2: don't add scsi_device twiceStefan Richter2008-02-161-0/+6
| | | When a reconnect failed but re-login succeeded, __scsi_add_device was called again. In those cases, __scsi_add_device succeeded and returned the pointer to the existing scsi_device. fw-sbp2 then continued orderly, except that it failed to call sbp2_cancel_orbs. SCSI core would call fw-sbp2's eh_abort_handler eventually if there had been an outstanding command.
| | | 
| | | This patch avoids the needless lookups and temporary allocations in SCSI core, as well as the I/O stall and timeout until eh_abort_handler hits.
| | | 
| | | Also, __scsi_add_device tolerating calls for devices which already exist is undocumented behavior on which we shouldn't rely.
| | | 
| | | Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| | | Signed-off-by: Jarod Wilson <jwilson@redhat.com>
| * | firewire: fw-sbp2: log bus_id at management request failuresStefan Richter2008-02-161-33/+33
| | | | | | | | | | | | | | | | | | | | | for easier readable logs if more than one SBP-2 device is present. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Jarod Wilson <jwilson@redhat.com>
| * | firewire: fw-sbp2: wait for completion of fetch agent resetStefan Richter2008-02-161-11/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Like the old sbp2 driver, wait for the write transaction to the AGENT_RESET to complete before proceeding (after login, after reconnect, or in SCSI error handling). There is one occasion where AGENT_RESET is written to from atomic context when getting DEAD status for a command ORB. There we still continue without waiting for the transaction to complete because this is more difficult to fix... Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| * | ieee1394: sbp2: add INQUIRY delay workaroundStefan Richter2008-02-162-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | Add the same workaround as found in fw-sbp2 for feature parity and compatibility of the workarounds module parameter. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Jarod Wilson <jwilson@redhat.com>
| * | firewire: fw-sbp2: add INQUIRY delay workaroundStefan Richter2008-02-161-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Several different SBP-2 bridges accept a login early while the IDE device is still powering up. They are therefore unable to respond to SCSI INQUIRY immediately, and the SCSI core has to retry the INQUIRY. One of these retries is typically successful, and all is well. But in case of Momobay FX-3A, the INQUIRY retries tend to fail entirely. This can usually be avoided by waiting a little while after login before letting the SCSI core send the INQUIRY. The old sbp2 driver handles this more gracefully for as yet unknown reasons (perhaps because it waits for fetch agent resets to complete, unlike fw-sbp2 which quickly proceeds after requesting the agent reset). Therefore the workaround is not as much necessary for sbp2. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Jarod Wilson <jwilson@redhat.com>
| * | firewire: log GUID of new devicesStefan Richter2008-02-161-11/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This should help to interpret user reports. E.g. one can look up the vendor OUI (first three bytes of the GUID) and thus tell what is what. Also simplifies the math in the GUID sysfs attribute. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Jarod Wilson <jwilson@redhat.com>
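The OUI lookup mentioned in the message above amounts to taking the top 24 bits of the 64-bit GUID. A minimal sketch, assuming the GUID is held as a single 64-bit integer with the first byte most significant; `guid_to_oui()` is a hypothetical helper name, not a function from the firewire stack:

```c
#include <stdint.h>

/*
 * Return the 24-bit vendor OUI, i.e. the first three bytes of a
 * 64-bit GUID stored with the first byte in the most significant position.
 */
uint32_t guid_to_oui(uint64_t guid)
{
    return (uint32_t)(guid >> 40);
}
```

For a made-up GUID of 0x0001020304050607 this yields 0x000102, which is the value one would then look up in a vendor OUI list.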
| * | firewire: fw-sbp2: don't retry login or reconnect after unplugStefan Richter2008-02-161-5/+11
| | | If a device is being unplugged while fw-sbp2 had a login or reconnect on schedule, it would take about half a minute to shut the fw_unit down:
| | | 
| | |   Jan 27 18:34:54 stein firewire_sbp2: logged in to fw2.0 LUN 0000 (0 retries)
| | |   <unplug>
| | |   Jan 27 18:34:59 stein firewire_sbp2: sbp2_scsi_abort
| | |   Jan 27 18:34:59 stein scsi 25:0:0:0: Device offlined - not ready after error recovery
| | |   Jan 27 18:35:01 stein firewire_sbp2: orb reply timed out, rcode=0x11
| | |   Jan 27 18:35:06 stein firewire_sbp2: orb reply timed out, rcode=0x11
| | |   Jan 27 18:35:12 stein firewire_sbp2: orb reply timed out, rcode=0x11
| | |   Jan 27 18:35:17 stein firewire_sbp2: orb reply timed out, rcode=0x11
| | |   Jan 27 18:35:22 stein firewire_sbp2: orb reply timed out, rcode=0x11
| | |   Jan 27 18:35:27 stein firewire_sbp2: orb reply timed out, rcode=0x11
| | |   Jan 27 18:35:32 stein firewire_sbp2: orb reply timed out, rcode=0x11
| | |   Jan 27 18:35:32 stein firewire_sbp2: failed to login to fw2.0 LUN 0000
| | |   Jan 27 18:35:32 stein firewire_sbp2: released fw2.0
| | | 
| | | After this patch, typically only a few seconds spent in __scsi_add_device remain:
| | | 
| | |   Jan 27 19:05:50 stein firewire_sbp2: logged in to fw2.0 LUN 0000 (0 retries)
| | |   <unplug>
| | |   Jan 27 19:05:56 stein firewire_sbp2: sbp2_scsi_abort
| | |   Jan 27 19:05:56 stein scsi 33:0:0:0: Device offlined - not ready after error recovery
| | |   Jan 27 19:05:56 stein firewire_sbp2: released fw2.0
| | | 
| | | The benefit of this is less noise in the syslog. It furthermore avoids a few wasted CPU cycles and needlessly prolonged lifetime of a few driver objects.
| | | 
| | | Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| | | Signed-off-by: Jarod Wilson <jwilson@redhat.com>
| * | firewire: fix "kobject_add failed for fw* with -EEXIST"Stefan Richter2008-02-163-10/+20
| | | There is a race between shutdown and creation of devices: fw-core may attempt to add a device with the same name as an already existing device.
| | | 
| | | http://bugzilla.kernel.org/show_bug.cgi?id=9828
| | | 
| | | Impact of the bug: Happens rarely (when shutdown of a device coincides with creation of another), forces the user to unplug and replug the new device to get it working.
| | | 
| | | The fix is obvious: Free the minor number *after* instead of *before* device_unregister(). This requires taking an additional reference of the fw_device as long as the IDR tree points to it.
| | | 
| | | And while we are at it, we fix an additional race condition: fw_device_op_open() took its reference of the fw_device a little bit too late, hence was in danger of accessing an already invalid fw_device.
| | | 
| | | Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| * | firewire: fw-sbp2: fix logout before login retryStefan Richter2008-02-161-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes a "can't recognize device" kind of bug. If the SCSI INQUIRY failed and hence __scsi_add_device failed due to a bus reset, we tried a logout and then waited for the already scheduled login work to happen. So far so good, but the generation used for the logout was outdated, hence the logout never reached the target. The target might therefore deny the subsequent relogin attempt, which would also leave the target inaccessible. Therefore fetch a fresh device->generation for the logout. Use memory barriers to prevent our plan being foiled by compiler or hardware optimizations. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| * | firewire: fw-sbp2: unsigned int vs. unsignedStefan Richter2008-02-161-8/+6
| | | | | | | | | | | | | | | | | | | | | Standardize on "unsigned int" style. Sort some struct members thematically. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86Linus Torvalds2008-02-2624-160/+191
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86: (24 commits) x86: no robust/pi futex for real i386 CPUs x86: fix boot failure on 486 due to TSC breakage x86: fix build on non-C locales. x86: make c_idle.work have a static address. x86: don't save unreliable stack trace entries x86: don't make swapper_pg_pmd global x86: don't print a warning when MTRR are blank and running in KVM x86: fix execve with -fstack-protect x86: fix vsyscall wreckage x86: rename KERNEL_TEXT_SIZE => KERNEL_IMAGE_SIZE x86: fix spontaneous reboot with allyesconfig bzImage x86: remove double-checking empty zero pages debug x86: notsc is ignored on common configurations x86/mtrr: fix kernel-doc missing notation x86: handle BIOSes which terminate e820 with CF=1 and no SMAP x86: add comments for NOPs x86: don't use P6_NOPs if compiling with CONFIG_X86_GENERIC x86: require family >= 6 if we are using P6 NOPs x86: do not promote TM3x00/TM5x00 to i686-class x86: hpet fix docbook comment ...
| * | | x86: no robust/pi futex for real i386 CPUsThomas Gleixner2008-02-261-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Real i386 CPUs do not have cmpxchg instructions. Catch it before crashing on an invalid opcode. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | x86: fix boot failure on 486 due to TSC breakageMikael Pettersson2008-02-262-2/+2
| | | | > Diffing dmesg between git7 and git8 doesn't shed any light since
| | | | > git8 also removed the printouts of the x86 caps as they were being
| | | | > initialised and updated. I'm currently adding those printouts back
| | | | > in the hope of seeing where and when the caps get broken.
| | | | 
| | | | That turned out to be very illuminating:
| | | | 
| | | | --- dmesg-2.6.24-git7 2008-02-24 18:01:25.295851000 +0100
| | | | +++ dmesg-2.6.24-git8 2008-02-24 18:01:25.530358000 +0100
| | | | ...
| | | |  CPU: After generic identify, caps: 00000003 00000000 00000000 00000000 00000000 00000000 00000000 00000000
| | | |  CPU: After all inits, caps: 00000003 00000000 00000000 00000000 00000000 00000000 00000000 00000000
| | | | +CPU: After applying cleared_cpu_caps, caps: 00000013 00000000 00000000 00000000 00000000 00000000 00000000 00000000
| | | | 
| | | | Notice how the TSC cap bit goes from Off to On.
| | | | 
| | | | (The first two lines are printout loops from -git7 forward-ported to -git8, the third line is the same printout loop added just after the xor-with-cleared_cpu_caps[] loop.)
| | | | 
| | | | Here's how the breakage occurs:
| | | | 
| | | | 1. arch/x86/kernel/tsc_32.c:tsc_init() sees !cpu_has_tsc, so bails and calls setup_clear_cpu_cap(X86_FEATURE_TSC).
| | | | 2. include/asm-x86/cpufeature.h:setup_clear_cpu_cap(bit) clears the bit in boot_cpu_data and sets it in cleared_cpu_caps
| | | | 3. arch/x86/kernel/cpu/common.c:identify_cpu() XORs all caps in with cleared_cpu_caps
| | | | 
| | | | HOWEVER, at this point c->x86_capability correctly has TSC Off, cleared_cpu_caps has TSC On, so the XOR incorrectly sets TSC to On in c->x86_capability, with disastrous results.
| | | | 
| | | | The real bug is that clearing bits with XOR only works if the bits are known to be 1 prior to the XOR, and that's not true here.
| | | | 
| | | | A simple fix is to convert the XOR to AND-NOT instead. The following patch does that, and allows my 486 to boot 2.6.25-rc kernels again.
| | | | 
| | | | [ mingo@elte.hu: fixed a similar bug in setup_64.c as well. ]
| | | | 
| | | | The breakage was introduced via commit 7d851c8d3db0.
| | | | 
| | | | Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
| | | | Signed-off-by: Ingo Molnar <mingo@elte.hu>
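The difference between the two ways of clearing bits can be shown on a single capability word. This is a minimal sketch rather than the kernel code: `clear_caps_xor()` and `clear_caps_andnot()` are hypothetical helper names, and `FEATURE_TSC_BIT` is taken to be bit 4, which matches the 00000003 to 00000013 change visible in the dmesg diff above.

```c
#include <stdint.h>

/* Bit 4 of the first capability word; matches the 0x03 -> 0x13 change in the dmesg diff. */
#define FEATURE_TSC_BIT (1u << 4)

/* Buggy variant: XOR toggles the bit, so a bit that is already 0 becomes 1. */
uint32_t clear_caps_xor(uint32_t caps, uint32_t cleared)
{
    return caps ^ cleared;
}

/* Fixed variant: AND-NOT forces the bit to 0 regardless of its previous value. */
uint32_t clear_caps_andnot(uint32_t caps, uint32_t cleared)
{
    return caps & ~cleared;
}
```

Starting from caps 0x00000003 (TSC already off), the XOR variant returns 0x00000013 and so turns TSC on, while the AND-NOT variant leaves the word at 0x00000003. Both variants agree only when the bit was set to begin with.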
| * | | x86: fix build on non-C locales.Priit Laes2008-02-261-1/+1
| | | | For some locales the regex range [a-zA-Z] does not work as it is supposed to, so we have to use [:alnum:] and [:xdigit:] to make it work as intended.
| | | | 
| | | | [1] http://en.wikipedia.org/wiki/Estonian_alphabet
| | | | 
| | | | Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | x86: make c_idle.work have a static address.Glauber Costa2008-02-261-1/+1
| | | | Currently, c_idle is declared on the stack, and thus has no static address.
| | | | 
| | | | Peter Zijlstra points out this simple solution, in which c_idle.work is initialized separately. Note that the INIT_WORK macro has a static declaration of a key inside.
| | | | 
| | | | Signed-off-by: Glauber Costa <gcosta@redhat.com>
| | | | Acked-by: Peter Zijlstra <pzijlstr@redhat.com>
| | | | Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | x86: don't save unreliable stack trace entriesVegard Nossum2008-02-261-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, there is no way for print_stack_trace() to determine whether a given stack trace entry was deemed reliable or not, simply because save_stack_trace() does not record this information. (Perhaps needless to say, this makes the saved stack traces A LOT harder to read, and probably with no other benefits, since debugging features that use save_stack_trace() most likely also require frame pointers, etc.) This patch reverts to the old behaviour of only recording the reliable trace entries for saved stack traces. Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | x86: don't make swapper_pg_pmd globalAdrian Bunk2008-02-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | There doesn't seem to be any reason for swapper_pg_pmd being global. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | x86: don't print a warning when MTRR are blank and running in KVMJoerg Roedel2008-02-261-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Inside a KVM virtual machine the MTRRs are usually blank. This confuses Linux and causes a warning message at boot. This patch removes that warning message when running Linux as a KVM guest. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | x86: fix execve with -fstack-protectIngo Molnar2008-02-262-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pointed out by pageexec@freemail.hu: > what happens here is that gcc treats the argument area as owned by the > callee, not the caller and is allowed to do certain tricks. for ssp it > will make a copy of the struct passed by value into the local variable > area and pass *its* address down, and it won't copy it back into the > original instance stored in the argument area. > > so once sys_execve returns, the pt_regs passed by value hasn't at all > changed and its default content will cause a nice double fault (FWIW, > this part took me the longest to debug, being down with cold didn't > help it either ;). To fix this we pass in pt_regs by pointer. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
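The language-level half of the problem quoted above can be sketched in a few lines: a callee that receives a struct by value works on its own copy, whereas a callee that receives a pointer updates the caller's instance. This is only an illustration of that semantics. `struct regs` is a hypothetical stand-in for pt_regs, and both function names are invented. The old kernel code additionally relied on the argument area being the saved register frame itself, which plain C never promises.

```c
/* Hypothetical stand-in for pt_regs; only two fields for illustration. */
struct regs {
    long ip;
    long sp;
};

/* By value: the assignment changes the callee's private copy only. */
void start_by_value(struct regs r, long new_ip)
{
    r.ip = new_ip;
    (void)r;
}

/* By pointer: the assignment is visible to the caller after return. */
void start_by_pointer(struct regs *r, long new_ip)
{
    r->ip = new_ip;
}
```

After `start_by_value(r, 42)` the caller's `r.ip` is unchanged; after `start_by_pointer(&r, 42)` it is 42, which is the behaviour the fix relies on.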
| * | | x86: fix vsyscall wreckageThomas Gleixner2008-02-261-49/+3
| | | | Based on a report from Arne Georg Gleditsch about user-space apps misbehaving after toggling /proc/sys/kernel/vsyscall64, a review of the code revealed that the "NOP patching" done there is fundamentally unsafe for a number of reasons:
| | | | 
| | | | 1) the patching code runs without synchronizing other CPUs
| | | | 2) it inserts NOPs even if there is no clock source which provides vread
| | | | 3) when the clock source changes to one without vread we run into exactly the same problem as in #2
| | | | 4) if nobody toggles the proc entry from 1 to 0 and to 1 again, then the syscall is not patched out
| | | | 
| | | | As a result it is possible to break user-space via this patching. The only safe thing for now is to remove the patching.
| | | | 
| | | | This code was broken since v2.6.21.
| | | | 
| | | | Reported-by: Arne Georg Gleditsch <arne.gleditsch@dolphinics.no>
| | | | Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | | | Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | x86: rename KERNEL_TEXT_SIZE => KERNEL_IMAGE_SIZEIngo Molnar2008-02-262-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The KERNEL_TEXT_SIZE constant was mis-named, as we not only map the kernel text but data, bss and init sections as well. That name led me on the wrong path with the KERNEL_TEXT_SIZE regression, because i knew how big of _text_ my images have and i knew about the 40 MB "text" limit so i wrongly thought to be on the safe side of the 40 MB limit with my 29 MB of text, while the total image size was slightly above 40 MB. Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | x86: fix spontaneous reboot with allyesconfig bzImageIngo Molnar2008-02-263-12/+23
| | | | Recently the 64-bit allyesconfig bzImage kernel started spontaneously rebooting during early bootup.
| | | | 
| | | | After a few fun hours spent with early init debugging, it turns out that we've got this rather annoying limit on the size of the kernel image:
| | | | 
| | | |   #define KERNEL_TEXT_SIZE (40*1024*1024)
| | | | 
| | | | which limit my vmlinux just happened to pass:
| | | | 
| | | |       text    data     bss      dec     hex filename
| | | |   29703744 4222751 8646224 42572719 2899baf vmlinux
| | | | 
| | | | 40 MB is 41943040 bytes, so my vmlinux (42572719 bytes) was just 1.5% above this limit :-/
| | | | 
| | | | So it happily crashed right in head_64.S, which - as we all know - is the most debuggable code in the whole architecture ;-)
| | | | 
| | | | So increase the limit to allow an up to 128MB kernel image to be mapped. (should anyone be that crazy or lazy)
| | | | 
| | | | We have a full 4K of pagetable (level2_kernel_pgt) allocated for these mappings already, so there's no RAM overhead and the limit was rather pointless and arbitrary.
| | | | 
| | | | Signed-off-by: Ingo Molnar <mingo@elte.hu>
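The size arithmetic in the message above can be checked directly. The sketch below sums the text, data and bss figures from the quoted size output and compares the total against the old 40 MB limit and the new 128 MB one. `OLD_LIMIT`, `NEW_LIMIT`, `image_size()` and `fits()` are names chosen for illustration and do not appear in the kernel.

```c
#include <stdint.h>

#define OLD_LIMIT (40u * 1024 * 1024)    /* value of KERNEL_TEXT_SIZE quoted in the message */
#define NEW_LIMIT (128u * 1024 * 1024)   /* the 128MB limit the message raises it to */

/* Total mapped image size: text plus data plus bss, as in the size output. */
uint32_t image_size(uint32_t text, uint32_t data, uint32_t bss)
{
    return text + data + bss;
}

/* Nonzero if an image of the given size fits below the limit. */
int fits(uint32_t size, uint32_t limit)
{
    return size <= limit;
}
```

With the quoted figures the total is 42572719 bytes against an old limit of 41943040 bytes, an excess of 629679 bytes or roughly 1.5%, so the image does not fit under the old limit but fits comfortably under the new one.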
| * | | x86: remove double-checking empty zero pages debugYinghai Lu2008-02-261-8/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | so far no one complained about that. Signed-off-by: Yinghai Lu <yinghai.lu@sun.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | x86: notsc is ignored on common configurationsPavel Machek2008-02-261-1/+2
| | | | notsc is ignored in 32-bit kernels if CONFIG_X86_TSC is on, which is bad. Fix it.
| | | | 
| | | | Signed-off-by: Pavel Machek <pavel@suse.cz>
| | | | Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | x86/mtrr: fix kernel-doc missing notationRandy Dunlap2008-02-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix mtrr kernel-doc warning: Warning(linux-2.6.24-git12//arch/x86/kernel/cpu/mtrr/main.c:677): No description found for parameter 'end_pfn' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | x86: handle BIOSes which terminate e820 with CF=1 and no SMAPH. Peter Anvin2008-02-261-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The proper way to terminate the e820 chain is with %ebx == 0 on the last legitimate memory block. However, several BIOSes don't do that and instead return error (CF = 1) when trying to read off the end of the list. For this error return, %eax doesn't necessarily return the SMAP signature -- correctly so, since %ah should contain an error code in this case. To deal with some particularly broken BIOSes, we clear the entire e820 chain if the SMAP signature is missing in the middle, indicating a plain insane e820 implementation. However, we need to make the test for CF = 1 before the SMAP check. This fixes at least one HP laptop (nc6400) for which none of the memory-probing methods (e820, e801, 88) functioned fully according to spec. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
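The ordering of the two checks described above can be sketched as a loop over recorded BIOS replies. This is an illustration under stated assumptions, not the kernel's boot code: `struct bios_reply` and `count_e820_entries()` are hypothetical, and each array element stands for the register state after one e820 call.

```c
#include <stddef.h>
#include <stdint.h>

#define SMAP 0x534d4150u   /* ASCII "SMAP" signature expected in %eax */

/* Hypothetical record of the registers returned by one e820 call. */
struct bios_reply {
    int cf;          /* carry flag: 1 signals an error return */
    uint32_t eax;    /* SMAP signature on success */
    uint32_t ebx;    /* continuation value; 0 on the last legitimate entry */
};

/* Return how many entries are accepted from a sequence of replies. */
size_t count_e820_entries(const struct bios_reply *replies, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (replies[i].cf)
            break;              /* test CF first: end of list, keep what we have */
        if (replies[i].eax != SMAP)
            return 0;           /* signature missing mid-chain: discard everything */
        count++;
        if (replies[i].ebx == 0)
            break;              /* proper termination */
    }
    return count;
}
```

If the SMAP test came first, a BIOS that ends the list with CF = 1 and an error code in %eax would be mistaken for a broken implementation and every valid entry would be thrown away.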
| * | | x86: add comments for NOPsH. Peter Anvin2008-02-261-17/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add comments describing the various NOP sequences. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | x86: don't use P6_NOPs if compiling with CONFIG_X86_GENERICH. Peter Anvin2008-02-261-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | P6_NOPs are definitely not supported on some VIA CPUs, and possibly (unverified) on AMD K7s. It is also the only thing that prevents a 686 kernel from running on Transmeta TM3x00/5x00 (Crusoe) series. The performance benefit over generic NOPs is very small, so when building for generic consumption, avoid using them. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | x86: require family >= 6 if we are using P6 NOPsH. Peter Anvin2008-02-262-3/+6
| | | | The P6 family of NOPs is only available on family 6 or above, so enforce that in the boot code.
| | | | 
| | | | Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| | | | Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | | | Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | x86: do not promote TM3x00/TM5x00 to i686-classH. Peter Anvin2008-02-261-7/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have been promoting Transmeta TM3x00/TM5x00 chips to i686-class based on the notion that they contain all the user-space visible features of an i686-class chip. However, this is not actually true: they lack the EA-taking long NOPs (0F 1F /0). Since this is a userspace-visible incompatibility, downgrade these CPUs to the manufacturer-defined i586 level. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | x86: hpet fix docbook commentPavel Machek2008-02-261-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Pavel Machek <Pavel@suse.cz> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | x86: make DEBUG_PAGEALLOC and CPA more robustIngo Molnar2008-02-261-33/+51
| | | | Use PF_MEMALLOC to prevent recursive calls in the DEBUG_PAGEALLOC case. This makes the code simpler and more robust against allocation failures.
| | | | 
| | | | This fixes the following fallback to non-mmconfig:
| | | | 
| | | |   http://lkml.org/lkml/2008/2/20/551
| | | |   http://bugzilla.kernel.org/show_bug.cgi?id=10083
| | | | 
| | | | Also, for DEBUG_PAGEALLOC=n reduce the pool size to one page.
| | | | 
| | | | Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | | | Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | x86/lguest: fix pgdir pmd index calculationAhmed S. Darwish2008-02-261-1/+1
| | | |
| | | | Hi all,
| | | |
| | | | Beginning from commits close to v2.6.25-rc2, running lguest always
| | | | oopses the host kernel. Oops is at [1].
| | | |
| | | | Bisection led to the following commit:
| | | |
| | | | commit 37cc8d7f963ba2deec29c9b68716944516a3244f
| | | |
| | | |     x86/early_ioremap: don't assume we're using swapper_pg_dir
| | | |
| | | |     At the early stages of boot, before the kernel pagetable has been
| | | |     fully initialized, a Xen kernel will still be running off the
| | | |     Xen-provided pagetables rather than swapper_pg_dir[]. Therefore,
| | | |     read back cr3 to determine the base of the pagetable rather than
| | | |     assuming swapper_pg_dir[].
| | | |
| | | |  static inline pmd_t * __init early_ioremap_pmd(unsigned long addr)
| | | |  {
| | | | -	pgd_t *pgd = &swapper_pg_dir[pgd_index(addr)];
| | | | +	/* Don't assume we're using swapper_pg_dir at this point */
| | | | +	pgd_t *base = __va(read_cr3());
| | | | +	pgd_t *pgd = &base[pgd_index(addr)];
| | | | 	pud_t *pud = pud_offset(pgd, addr);
| | | | 	pmd_t *pmd = pmd_offset(pud, addr);
| | | |
| | | | Trying to analyze the problem, it seems that on the guest side of
| | | | lguest, %cr3 has a different value from &swapper_pg_dir (which is
| | | | AFAIK fine on a paravirt guest):
| | | |
| | | | Putting some debugging messages in early_ioremap_pmd:
| | | |
| | | | /* Appears 3 times */
| | | | [ 0.000000] ***************************
| | | | [ 0.000000] __va(%cr3) = c0000000, &swapper_pg_dir = c02cc000
| | | | [ 0.000000] ***************************
| | | |
| | | | After 8 hours of debugging and staring at lguest code, I noticed
| | | | something strange in the paravirt_ops->set_pmd hypercall invocation:
| | | |
| | | | static void lguest_set_pmd(pmd_t *pmdp, pmd_t pmdval)
| | | | {
| | | | 	*pmdp = pmdval;
| | | | 	lazy_hcall(LHCALL_SET_PMD, __pa(pmdp)&PAGE_MASK,
| | | | 		   (__pa(pmdp)&(PAGE_SIZE-1))/4, 0);
| | | | }
| | | |
| | | | The first hcall parameter is the global pgdir, which looks fine. The
| | | | second parameter is the pmd index in the pgdir, which looks
| | | | suspicious.
| | | |
| | | | AFAIK, calculating the index of the pmd does not need a division by
| | | | four. Removing the division made lguest work fine again. Patch is
| | | | at [2].
| | | |
| | | | I am not sure why the division by four existed in the first place. It
| | | | seems bogus; maybe the Xen patch just made the problem appear?
| | | |
| | | | [2]: The patch:
| | | |
| | | | [PATCH] lguest: fix pgdir pmd index calculation
| | | |
| | | | Remove an error in index calculation which leads to removing a
| | | | nonexistent shadow page table (leading to a NULL dereference).
| | | |
| | | | Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com>
| | | | Signed-off-by: Ingo Molnar <mingo@elte.hu>
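The two quantities discussed in the message above, the byte offset of an entry within its page and the index of that entry, can be made concrete with a small user-space sketch. It is not lguest code: the `SKETCH_*` constants and the helper names are hypothetical, and the 4 KiB page size and 4-byte entry size are assumptions (as on 32-bit non-PAE x86). Which of the two values the receiving side of the hypercall expects is not stated in the message, so the sketch only shows how they differ.

```c
#include <stdint.h>

/* Assumed for illustration only: 4 KiB pages and 4-byte table entries. */
#define SKETCH_PAGE_SIZE  4096u
#define SKETCH_PAGE_MASK  (~(uint32_t)(SKETCH_PAGE_SIZE - 1))
#define SKETCH_ENTRY_SIZE 4u

/* Physical address of the page that holds the entry (the "pgdir" argument). */
static uint32_t sketch_table_base(uint32_t entry_pa)
{
	return entry_pa & SKETCH_PAGE_MASK;
}

/* Byte offset of the entry inside that page. */
static uint32_t sketch_byte_offset(uint32_t entry_pa)
{
	return entry_pa & (SKETCH_PAGE_SIZE - 1);
}

/* Index of the entry inside that page: byte offset divided by entry size. */
static uint32_t sketch_entry_index(uint32_t entry_pa)
{
	return sketch_byte_offset(entry_pa) / SKETCH_ENTRY_SIZE;
}
```

For an entry at physical address 0x02cc0c00, the byte offset is 0xc00 and the index is 0x300, so the two values differ by a factor of the entry size. Whichever one is passed, caller and callee have to agree on it.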
| * | | lguest: fix build breakage  Tony Breeds  2008-02-26  1  -3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | [ mingo@elte.hu: merged to Rusty's patch ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | lguest: include function prototypes  Harvey Harrison  2008-02-26  2  -9/+12
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | Added a declaration to asm-x86/lguest.h and moved the extern arrays there as well. As an alternative to including asm/lguest.h directly, an include could be put in linux/lguest.h Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Cc: "rusty@rustcorp.com.au" <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched  Linus Torvalds  2008-02-26  5  -32/+27
|\ \ \
| | | | * git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched:
| | | |   latencytop: change /proc task_struct access method
| | | |   latencytop: fix memory leak on latency proc file
| | | |   latencytop: fix kernel panic while reading latency proc file
| | | |   sched: add declaration of sched_tail to sched.h
| | | |   sched: fix signedness warnings in sched.c
| | | |   sched: clean up __pick_last_entity() a bit
| | | |   sched: remove duplicate code from sched_fair.c
| | | |   sched: make early bootup sched_clock() use safer
| * | | latencytop: change /proc task_struct access method  Hiroshi Shimamoto  2008-02-25  1  -28/+12
| | | |
| | | | Get the task_struct via get_proc_task() at read or write time, and
| | | | return -ESRCH if get_proc_task() returns NULL. This is the same
| | | | behavior as other /proc files.
| | | |
| | | | Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
| | | | Signed-off-by: Ingo Molnar <mingo@elte.hu>
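The access pattern described in the latencytop commit above (look the task up at the time of each read or write, and fail with -ESRCH when it no longer exists) can be sketched in user-space C. This is only an illustration under stated assumptions: `sketch_get_task()` is a hypothetical stand-in for get_proc_task(), the task table is invented, and the reference counting that the kernel helper involves is left out.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical task record; not the kernel's task_struct. */
struct sketch_task {
	int pid;
	int alive;
};

static struct sketch_task sketch_tasks[] = {
	{ 1, 1 },	/* still running */
	{ 2, 0 },	/* already exited */
};

/* Stand-in for get_proc_task(): returns NULL if the task is gone. */
static struct sketch_task *sketch_get_task(int pid)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_tasks) / sizeof(sketch_tasks[0]); i++)
		if (sketch_tasks[i].pid == pid && sketch_tasks[i].alive)
			return &sketch_tasks[i];
	return NULL;
}

/* Look the task up on every access instead of caching a pointer at open. */
static int sketch_read(int pid, int *out)
{
	struct sketch_task *task = sketch_get_task(pid);

	if (!task)
		return -ESRCH;	/* no such process */
	*out = task->pid;
	return 0;
}
```

Resolving the task on each access means a reader never works with a pointer cached from an earlier open, at the cost of one lookup per call.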