summaryrefslogtreecommitdiff
Commit message (Collapse)AuthorAgeFilesLines
* PM: Make it possible to avoid races between wakeup and system sleepRafael J. Wysocki2010-07-1914-10/+375
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | One of the arguments during the suspend blockers discussion was that the mainline kernel didn't contain any mechanisms making it possible to avoid races between wakeup and system suspend. Generally, there are two problems in that area. First, if a wakeup event occurs exactly when /sys/power/state is being written to, it may be delivered to user space right before the freezer kicks in, so the user space consumer of the event may not be able to process it before the system is suspended. Second, if a wakeup event occurs after user space has been frozen, it is not generally guaranteed that the ongoing transition of the system into a sleep state will be aborted. To address these issues introduce a new global sysfs attribute, /sys/power/wakeup_count, associated with a running counter of wakeup events and three helper functions, pm_stay_awake(), pm_relax(), and pm_wakeup_event(), that may be used by kernel subsystems to control the behavior of this attribute and to request the PM core to abort system transitions into a sleep state already in progress. The /sys/power/wakeup_count file may be read from or written to by user space. Reads will always succeed (unless interrupted by a signal) and return the current value of the wakeup events counter. Writes, however, will only succeed if the written number is equal to the current value of the wakeup events counter. If a write is successful, it will cause the kernel to save the current value of the wakeup events counter and to abort the subsequent system transition into a sleep state if any wakeup events are reported after the write has returned. [The assumption is that before writing to /sys/power/state user space will first read from /sys/power/wakeup_count. Next, user space consumers of wakeup events will have a chance to acknowledge or veto the upcoming system transition to a sleep state. Finally, if the transition is allowed to proceed, /sys/power/wakeup_count will be written to and if that succeeds, /sys/power/state will be written to as well. Still, if any wakeup events are reported to the PM core by kernel subsystems after that point, the transition will be aborted.] Additionally, put a wakeup events counter into struct dev_pm_info and make these per-device wakeup event counters available via sysfs, so that it's possible to check the activity of various wakeup event sources within the kernel. To illustrate how subsystems can use pm_wakeup_event(), make the low-level PCI runtime PM wakeup-handling code use it. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Acked-by: markgross <markgross@thegnar.org> Reviewed-by: Alan Stern <stern@rowland.harvard.edu>
* PNPACPI: Add support for remote wakeupAlan Stern2010-07-193-0/+27
| | | | | | | | | | | This patch (as1354) adds remote-wakeup support to the pnpacpi driver. The new can_wakeup method also allows other PNP protocol drivers (pnpbios or iaspnp) to add wakeup support, but I don't know enough about how they work to actually do it. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reviewed-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* PM: describe kernel policy regarding wakeup defaults (v. 2)Alan Stern2010-07-191-3/+7
| | | | | | | | | | | | | | | This patch (as1381b) updates a comment describing the kernel's policy toward enabling wakeup by default. It also makes device_set_wakeup_capable() actually do something when CONFIG_PM isn't enabled. It's not clear this is necessary; however if it isn't then device_init_wakeup() and device_can_wakeup() should also be do-nothing routines. Furthermore, I don't expect this change to have any noticeable effect -- but if it does then clearly the old behavior was wrong. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* PM / Hibernate: Fix typos in comments in kernel/power/swap.cCesar Eduardo Barros2010-07-191-2/+2
| | | | | | | | There are a few typos in kernel/power/swap.c. Fix them. Signed-off-by: Cesar Eduardo Barros <cesarb@cesarb.net> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* Merge branch 'x86/kprobes' of ↵Linus Torvalds2010-07-181-1/+1
|\ | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland * 'x86/kprobes' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland: x86: kprobes: fix swapped segment registers in kretprobe
| * x86: kprobes: fix swapped segment registers in kretprobeRoland McGrath2010-07-181-1/+1
| | | | | | | | | | | | | | | | | | In commit f007ea26, the order of the %es and %ds segment registers got accidentally swapped, so synthesized 'struct pt_regs' frames have the two values inverted. It's almost sure that these values never matter, and that they also never differ. But wrong is wrong. Signed-off-by: Roland McGrath <roland@redhat.com>
* | Merge branch 'for-linus' of ↵Linus Torvalds2010-07-183-0/+34
|\ \ | |/ |/| | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: PCI: fall back to original BIOS BAR addresses
| * PCI: fall back to original BIOS BAR addressesBjorn Helgaas2010-07-163-0/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we fail to assign resources to a PCI BAR, this patch makes us try the original address from BIOS rather than leaving it disabled. Linux tries to make sure all PCI device BARs are inside the upstream PCI host bridge or P2P bridge apertures, reassigning BARs if necessary. Windows does similar reassignment. Before this patch, if we could not move a BAR into an aperture, we left the resource unassigned, i.e., at address zero. Windows leaves such BARs at the original BIOS addresses, and this patch makes Linux do the same. This is a bit ugly because we disable the resource long before we try to reassign it, so we have to keep track of the BIOS BAR address somewhere. For lack of a better place, I put it in the struct pci_dev. I think it would be cleaner to attempt the assignment immediately when the claim fails, so we could easily remember the original address. But we currently claim motherboard resources in the middle, after attempting to claim PCI resources and before assigning new PCI resources, and changing that is a fairly big job. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16263 Reported-by: Andrew <nitr0@seti.kr.ua> Tested-by: Andrew <nitr0@seti.kr.ua> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* | Merge branch 'upstream-linus' of ↵Linus Torvalds2010-07-1816-224/+504
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2: Silence gcc warning in ocfs2_write_zero_page(). jbd2/ocfs2: Fix block checksumming when a buffer is used in several transactions ocfs2/dlm: Remove BUG_ON from migration in the rare case of a down node ocfs2: Don't duplicate pages past i_size during CoW. ocfs2: tighten up strlen() checking ocfs2: Make xattr reflink work with new local alloc reservation. ocfs2: make xattr extension work with new local alloc reservation. ocfs2: Remove the redundant cpu_to_le64. ocfs2/dlm: don't access beyond bitmap size ocfs2: No need to zero pages past i_size. ocfs2: Zero the tail cluster when extending past i_size. ocfs2: When zero extending, do it by page. ocfs2: Limit default local alloc size within bitmap range. ocfs2: Move orphan scan work to ocfs2_wq. fs/ocfs2/dlm: Add missing spin_unlock
| * | ocfs2: Silence gcc warning in ocfs2_write_zero_page().Joel Becker2010-07-161-1/+1
| | | | | | | | | | | | | | | | | | | | | ocfs2_write_zero_page() has a loop that won't ever be skipped, but gcc doesn't know that. Set ret=0 just to make gcc happy. Signed-off-by: Joel Becker <joel.becker@oracle.com>
| * | jbd2/ocfs2: Fix block checksumming when a buffer is used in several transactionsJan Kara2010-07-154-28/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | OCFS2 uses t_commit trigger to compute and store checksum of the just committed blocks. When a buffer has b_frozen_data, checksum is computed for it instead of b_data but this can result in an old checksum being written to the filesystem in the following scenario: 1) transaction1 is opened 2) handle1 is opened 3) journal_access(handle1, bh) - This sets jh->b_transaction to transaction1 4) modify(bh) 5) journal_dirty(handle1, bh) 6) handle1 is closed 7) start committing transaction1, opening transaction2 8) handle2 is opened 9) journal_access(handle2, bh) - This copies off b_frozen_data to make it safe for transaction1 to commit. jh->b_next_transaction is set to transaction2. 10) jbd2_journal_write_metadata() checksums b_frozen_data 11) the journal correctly writes b_frozen_data to the disk journal 12) handle2 is closed - There was no dirty call for the bh on handle2, so it is never queued for any more journal operation 13) Checkpointing finally happens, and it just spools the bh via normal buffer writeback. This will write b_data, which was never triggered on and thus contains a wrong (old) checksum. This patch fixes the problem by calling the trigger at the moment data is frozen for journal commit - i.e., either when b_frozen_data is created by do_get_write_access or just before we write a buffer to the log if b_frozen_data does not exist. We also rename the trigger to t_frozen as that better describes when it is called. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
| * | ocfs2/dlm: Remove BUG_ON from migration in the rare case of a down nodeWengang Wang2010-07-151-8/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For migration, we are waiting for DLM_LOCK_RES_MIGRATING flag to be set before sending DLM_MIG_LOCKRES_MSG message to the target. We are using dlm_migration_can_proceed() for that purpose. However, if the node is down, dlm_migration_can_proceed() will also return "go ahead". In this rare case, the DLM_LOCK_RES_MIGRATING flag might not be set yet. Remove the BUG_ON() that trips over this condition. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
| * | ocfs2: Don't duplicate pages past i_size during CoW.Tao Ma2010-07-151-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | During CoW, the pages after i_size don't contain valid data, so there's no need to read and duplicate them. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
| * | ocfs2: tighten up strlen() checkingDan Carpenter2010-07-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This function is only called from one place and it's like this: dlm_register_domain(conn->cc_name, dlm_key, &fs_version); The "conn->cc_name" is 64 characters long. If strlen(conn->cc_name) were equal to O2NM_MAX_NAME_LEN (64) that would be a bug because strlen() doesn't count the NULL character. In fact, if you look how O2NM_MAX_NAME_LEN is used, it mostly describes 64 character buffers. The only exception is nd_name from struct o2nm_node. Anyway I looked into it and in this case the domain string comes from osb->uuid_str in ocfs2_setup_osb_uuid(). That's 32 characters and NULL which easily fits into O2NM_MAX_NAME_LEN. This patch doesn't change how the code works, but I think it makes the code a little cleaner. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
| * | ocfs2: Make xattr reflink work with new local alloc reservation.Tao Ma2010-07-121-36/+90
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new reservation code in local alloc has add the limitation that the caller should handle the case that the local alloc doesn't give use enough contiguous clusters. It make the old xattr reflink code broken. So this patch udpate the xattr reflink code so that it can handle the case that local alloc give us one cluster at a time. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
| * | ocfs2: make xattr extension work with new local alloc reservation.Tao Ma2010-07-121-29/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The old ocfs2_xattr_extent_allocation is too optimistic about the clusters we can get. So actually if the file system is too fragmented, ocfs2_add_clusters_in_btree will return us with EGAIN and we need to allocate clusters once again. So this patch change it to a while loop so that we can allocate clusters until we reach clusters_to_add. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Cc: stable@kernel.org
| * | ocfs2: Remove the redundant cpu_to_le64.Tao Ma2010-07-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In ocfs2_block_group_alloc, we set c_blkno by bg->bg_blkno. But actually bg->bg_blkno is already changed to little endian in ocfs2_block_group_fill. So remove the extra cpu_to_le64. Reported-by: Marcos Matsunaga <Marcos.Matsunaga@oracle.com> Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
| * | ocfs2/dlm: don't access beyond bitmap sizeWengang Wang2010-07-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dlm->recovery_map is defined as unsigned long recovery_map[BITS_TO_LONGS(O2NM_MAX_NODES)]; We should treat O2NM_MAX_NODES as the bit map size in bits. This patches fixes a bit operation that takes O2NM_MAX_NODES + 1 as bitmap size. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
| * | ocfs2: No need to zero pages past i_size.Joel Becker2010-07-121-4/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When ocfs2 fills a hole, it does so by allocating clusters. When a cluster is larger than the write, ocfs2 must zero the portions of the cluster outside of the write. If the clustersize is smaller than a pagecache page, this is handled by the normal pagecache mechanisms, but when the clustersize is larger than a page, ocfs2's write code will zero the pages adjacent to the write. This makes sure the entire cluster is zeroed correctly. Currently ocfs2 behaves exactly the same when writing past i_size. However, this means ocfs2 is writing zeroed pages for portions of a new cluster that are beyond i_size. The page writeback code isn't expecting this. It treats all pages past the one containing i_size as left behind due to a previous truncate operation. Thankfully, ocfs2 calculates the number of pages it will be working on up front. The rest of the write code merely honors the original calculation. We can simply trim the number of pages to only cover the actual file data. Signed-off-by: Joel Becker <joel.becker@oracle.com> Cc: stable@kernel.org
| * | ocfs2: Zero the tail cluster when extending past i_size.Joel Becker2010-07-086-54/+207
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ocfs2's allocation unit is the cluster. This can be larger than a block or even a memory page. This means that a file may have many blocks in its last extent that are beyond the block containing i_size. There also may be more unwritten extents after that. When ocfs2 grows a file, it zeros the entire cluster in order to ensure future i_size growth will see cleared blocks. Unfortunately, block_write_full_page() drops the pages past i_size. This means that ocfs2 is actually leaking garbage data into the tail end of that last cluster. This is a bug. We adjust ocfs2_write_begin_nolock() and ocfs2_extend_file() to detect when a write or truncate is past i_size. They will use ocfs2_zero_extend() to ensure the data is properly zeroed. Older versions of ocfs2_zero_extend() simply zeroed every block between i_size and the zeroing position. This presumes three things: 1) There is allocation for all of these blocks. 2) The extents are not unwritten. 3) The extents are not refcounted. (1) and (2) hold true for non-sparse filesystems, which used to be the only users of ocfs2_zero_extend(). (3) is another bug. Since we're now using ocfs2_zero_extend() for sparse filesystems as well, we teach ocfs2_zero_extend() to check every extent between i_size and the zeroing position. If the extent is unwritten, it is ignored. If it is refcounted, it is CoWed. Then it is zeroed. Signed-off-by: Joel Becker <joel.becker@oracle.com> Cc: stable@kernel.org
| * | ocfs2: When zero extending, do it by page.Joel Becker2010-07-082-64/+84
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ocfs2_zero_extend() does its zeroing block by block, but it calls a function named ocfs2_write_zero_page(). Let's have ocfs2_write_zero_page() handle the page level. From ocfs2_zero_extend()'s perspective, it is now page-at-a-time. Signed-off-by: Joel Becker <joel.becker@oracle.com> Cc: stable@kernel.org
| * | ocfs2: Limit default local alloc size within bitmap range.Tao Ma2010-06-151-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 6b82021b9e91cd689fdffadbcdb9a42597bbe764, we increase our local alloc size and calculate how much megabytes we can get according to group size and volume size. But we also need to check the maximum bits a local alloc block bitmap can have. With a bs=512, cs=32K, local volume with 160G, it calculate 96MB while the maximum local alloc size is only 76M. So the bitmap will overflow and corrupt the system truncate log file. See bug http://oss.oracle.com/bugzilla/show_bug.cgi?id=1262 Signed-off-by: Tao Ma <tao.ma@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
| * | ocfs2: Move orphan scan work to ocfs2_wq.Tao Ma2010-06-151-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We used to let orphan scan work in the default work queue, but there is a corner case which will make the system deadlock. The scenario is like this: 1. set heartbeat threadshold to 200. this will allow us to have a great chance to have a orphan scan work before our quorum decision. 2. mount node 1. 3. after 1~2 minutes, mount node 2(in order to make the bug easier to reproduce, better add maxcpus=1 to kernel command line). 4. node 1 do orphan scan work. 5. node 2 do orphan scan work. 6. node 1 do orphan scan work. After this, node 1 hold the orphan scan lock while node 2 know node 1 is the master. 7. ifdown eth2 in node 2(eth2 is what we do ocfs2 interconnection). Now when node 2 begins orphan scan, the system queue is blocked. The root cause is that both orphan scan work and quorum decision work will use the system event work queue. orphan scan has a chance of blocking the event work queue(in dlm_wait_for_node_death) so that there is no chance for quorum decision work to proceed. This patch resolve it by moving orphan scan work to ocfs2_wq. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
| * | fs/ocfs2/dlm: Add missing spin_unlockJulia Lawall2010-06-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a spin_unlock missing on the error path. Unlock as in the other code that leads to the leave label. The semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression E1; @@ * spin_lock(E1,...); <+... when != E1 if (...) { ... when != E1 * return ...; } ...+> * spin_unlock(E1,...); // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Joel Becker <joel.becker@oracle.com>
* | | drm/i915: add 'reclaimable' to i915 self-reclaimable page allocationsLinus Torvalds2010-07-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The hibernate issues that got fixed in commit 985b823b9192 ("drm/i915: fix hibernation since i915 self-reclaim fixes") turn out to have been incomplete. Vefa Bicakci tested lots of hibernate cycles, and without the __GFP_RECLAIMABLE flag the system eventually fails to resume. With the flag added, Vefa can apparently hibernate forever (or until he gets bored running his automated scripts, whichever comes first). The reclaimable flag was there originally, and was one of the flags that were dropped (unintentionally) by commit 4bdadb978569 ("drm/i915: Selectively enable self-reclaim") that introduced all these problems, but I didn't want to just blindly add back all the flags in commit 985b823b9192, and it looked like __GFP_RECLAIM wasn't necessary. It clearly was. I still suspect that there is some subtle reason we're missing that causes the problems, but __GFP_RECLAIMABLE is certainly not wrong to use in this context, and is what the code historically used. And we have no idea what the causes the corruption without it. Reported-and-tested-by: M. Vefa Bicakci <bicave@superonline.com> Cc: Dave Airlie <airlied@gmail.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds2010-07-165-29/+36
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: tracing: Add alignment to syscall metadata declarations perf: Sync callchains with period based hits perf: Resurrect flat callchains perf: Version String fix, for fallback if not from git perf: Version String fix, using kernel version
| * | | tracing: Add alignment to syscall metadata declarationsSteven Rostedt2010-07-091-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For some reason if we declare a static variable and then assign it later, and the assignment contains a __attribute__((__aligned__(#))), some versions of gcc will ignore it. This caused the syscall meta data to not be compact in its section and caused a kernel oops when the section was being read. The fix for these versions of gcc seems to be to add the aligned attribute to the declaration as well. This fixes the BZ regression: https://bugzilla.kernel.org/show_bug.cgi?id=16353 Reported-by: Zeev Tarantov <zeev.tarantov@gmail.com> Tested-by: Zeev Tarantov <zeev.tarantov@gmail.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <AANLkTinkKVmB0fpVeqUkMeqe3ZYeXJdI8xDuzJEOjYwh@mail.gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | perf: Sync callchains with period based hitsFrederic Weisbecker2010-07-083-19/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hists have their hits increased by the event period. And this period based counting is the foundation of all the stats in perf report. But callchains still use the raw number of hits, without taking the period into account. So when we compute the percentage, absolute based percentages are totally broken, and relative ones too in the first parent level. Because we pass the number of events muliplied by their period as the total number of hits to the callchain filtering, while callchains expect this number to be the number of raw hits. perf report -g graph was simply not working, showing no graph unless the min percent was zero. And even there the percentage of the branches was always 0. And may be fractal filtering was broken on the first branch level too. flat also was broken, but it was hidden because of other breakages. Anyway fix this by counting using periods on callchains. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org>
| * | | perf: Resurrect flat callchainsFrederic Weisbecker2010-07-081-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Initialize the callchain radix tree root correctly. When we walk through the parents, we must stop after the root, but since it wasn't well initialized, its parent pointer was random. Also the number of hits was random because uninitialized, hence it was part of the callchain while the root doesn't contain anything. This fixes segfaults and percentages followed by empty callchains while running: perf report -g flat Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: 2.6.31.x-2.6.34.x <stable@kernel.org>
| * | | perf: Version String fix, for fallback if not from gitThavidu Ranatunga2010-07-051-8/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This gets rid of the default version fallback for Perf and changes it so that it returns the version of the kernel from it's Makefile (if sources were not from git, ie. if it was downloaded from a tarball) Signed-off-by: Thavidu Ranatunga <tharan@au1.ibm.com> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1278316815-6099-2-git-send-email-tharan@au1.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | perf: Version String fix, using kernel versionThavidu Ranatunga2010-07-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes the Perf --version string such that it shows the kernel version as suggested by Ingo as follows: That way the perf that comes with v2.6.34 will be: perf version v2.6.34 while interim versions will have the version of the interim kernel - for example: perf version v2.6.35-rc4-70-g39ef13a This functionality was already in the perf version generator file except that it was looking for a .git in the perf directory instead of the kernel directory. Signed-off-by: Thavidu Ranatunga <tharan@au1.ibm.com> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1278316815-6099-1-git-send-email-tharan@au1.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixesLinus Torvalds2010-07-165-10/+23
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes: GFS2: rename causes kernel Oops GFS2: BUG in gfs2_adjust_quota GFS2: Fix kernel NULL pointer dereference by dlm_astd GFS2: recovery stuck on transaction lock GFS2: O_TRUNC not working on stuffed files across cluster
| * | | | GFS2: rename causes kernel OopsBob Peterson2010-07-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a kernel Oops in the GFS2 rename code. The problem was in the way the gfs2 directory code was trying to re-use sentinel directory entries. In the failing case, gfs2's rename function was renaming a file to another name that had the same non-trivial length. The file being renamed happened to be the first directory entry on the leaf block. First, the rename code (gfs2_rename in ops_inode.c) found the original directory entry and decided it could do its job by simply replacing the directory entry with another. Therefore it determined correctly that no block allocations were needed. Next, the rename code deleted the old directory entry prior to replacing it with the new name. Therefore, the soon-to-be replaced directory entry was temporarily made into a directory entry "sentinel" or a place holder at the start of a leaf block. Lastly, it went to re-add the replacement directory entry in that leaf block. However, when gfs2_dirent_find_space was looking for space in the leaf block, it used the wrong value for the sentinel. That threw off its calculations so later it decides it can't really re-use the sentinel and therefore must allocate a new leaf block. But because it previously decided to re-use the directory entry, it didn't waste the time to grab a new block allocation for the inode. Therefore, the inode's i_alloc pointer was still NULL and it crashes trying to reference it. In the case of sentinel directory entries, the entire dirent is reused, not just the "free space" portion of it, and therefore the function gfs2_dirent_find_space should use the value 0 rather than GFS2_DIRENT_SIZE(0) for the actual dirent size. Fixing this calculation enables the reproducer programs to work properly. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | | | GFS2: BUG in gfs2_adjust_quotaAbhijith Das2010-07-151-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | HighMem pages on i686 do not get mapped to the buffer_heads and this was causing a NULL pointer dereference when we were trying to memset page buffers to zero. We now use zero_user() that kmaps the page and directly manipulates page data. This patch also fixes a boundary condition that was incorrect. Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | | | GFS2: Fix kernel NULL pointer dereference by dlm_astdBob Peterson2010-07-151-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a problem in an error path when looking up dinodes. There are two sister-functions, gfs2_inode_lookup and gfs2_process_unlinked_inode. Both functions acquire and hold the i_iopen glock for the dinode being looked up. The last thing they try to do is hold the i_gl glock for the dinode. If that glock fails for some reason, the error path was incorrectly calling gfs2_glock_put for the i_iopen glock twice. This resulted in the glock being prematurely freed. The "minimum hold time" usually kept the glock in memory, but the lock interface to dlm (aka lock_dlm) freed its memory for the glock. In some circumstances, it would cause dlm's dlm_astd daemon to try to call the bast function for the freed lock_dlm memory, which resulted in a NULL pointer dereference. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | | | GFS2: recovery stuck on transaction lockBob Peterson2010-07-151-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes bugzilla bug #590878: GFS2: recovery stuck on transaction lock. We set the frozen flag on the glock when we receive a completion that cannot be delivered due to blocked locks. At that point we check to see whether the first waiting holder has the noexp flag set. If the noexp lock is queued later, then we need to unfreeze the glock at that point in time, namely, in the glock work function. This patch was originally written by Steve Whitehouse, but since he's on holiday, I'm submitting it. It's been well tested with a complex recovery test called revolver. Signed-off-by: Steve Whitehouse <swhiteho@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
| * | | | GFS2: O_TRUNC not working on stuffed files across clusterBob Peterson2010-07-151-0/+1
| | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch replaces a statement that got dropped out by accident. Without the patch, truncates on stuffed (very small) files cause those files to have an unpredictable size. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* | | | Merge branch 'for-linus' of ↵Linus Torvalds2010-07-163-7/+8
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: w90p910_ts - fix call to setup_timer() Input: synaptics - fix wrong dimensions check Input: i8042 - mark stubs in i8042.h "static inline"
| * | | | Input: w90p910_ts - fix call to setup_timer()Wan ZongShun2010-07-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | No need to take address, w90p910_ts is already a pointer. Signed-off-by: Wan ZongShun <mcuos.com@gmail.com> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
| * | | | Input: synaptics - fix wrong dimensions checkTakashi Iwai2010-07-141-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The commit 83ba9ea8a04b72dfee2515428c15e7414ba4fc61 ommitted the return line for the old synaptics model accidentally. This resulted in a wrong check, namely, the dimensions are checked for the old devices that don't support the query properly. This patch adds the return line back. Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
| * | | | Input: i8042 - mark stubs in i8042.h "static inline"Feng Tang2010-06-301-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Otherwise we may run into following: drivers/platform/built-in.o: In function `i8042_lock_chip': /home/test/ws2/projects/linux-2.6/include/linux/i8042.h:50: multiple definition of `i8042_lock_chip' drivers/input/serio/built-in.o:/home/test/ws2/projects/linux-2.6/include/linux/i8042.h:50: first defined here ... make[1]: *** [drivers/built-in.o] Error 1 make: *** [drivers] Error 2 Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
* | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds2010-07-151-1/+1
|\ \ \ \ \ | |_|/ / / |/| | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: skcipher - avoid NULL dereference
| * | | | crypto: skcipher - avoid NULL dereferenceJiri Slaby2010-06-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Stanse found a potential NULL dereference in ablkcipher_next_slow. Even though kmalloc fails, its retval is dereferenced later. Return from that function properly earlier. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | | | | Merge master.kernel.org:/home/rmk/linux-2.6-armLinus Torvalds2010-07-148-93/+243
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * master.kernel.org:/home/rmk/linux-2.6-arm: ARM: 6226/1: fix kprobe bug in ldr instruction emulation ARM: Update mach-types ARM: lockdep: fix unannotated irqs-on ARM: 6184/2: ux500: use neutral PRCMU base ARM: 6212/1: atomic ops: add memory constraints to inline asm ARM: 6211/1: atomic ops: fix register constraints for atomic64_add_unless ARM: 6210/1: Do not rely on reset defaults of L2X0_AUX_CTRL
| * | | | | ARM: 6226/1: fix kprobe bug in ldr instruction emulationNicolas Pitre2010-07-141-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | From: Bin Yang <bin.yang@marvell.com> Cc: stable@kernel.org Signed-off-by: Bin Yang <bin.yang@marvell.com> Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | | | | ARM: Update mach-typesRussell King2010-07-121-3/+149
| | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | | | | ARM: lockdep: fix unannotated irqs-onRussell King2010-07-102-19/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CPU: Testing write buffer coherency: ok ------------[ cut here ]------------ WARNING: at kernel/lockdep.c:3145 check_flags+0xcc/0x1dc() Modules linked in: [<c0035120>] (unwind_backtrace+0x0/0xf8) from [<c0355374>] (dump_stack+0x20/0x24) [<c0355374>] (dump_stack+0x20/0x24) from [<c0060c04>] (warn_slowpath_common+0x58/0x70) [<c0060c04>] (warn_slowpath_common+0x58/0x70) from [<c0060c3c>] (warn_slowpath_null+0x20/0x24) [<c0060c3c>] (warn_slowpath_null+0x20/0x24) from [<c008f224>] (check_flags+0xcc/0x1dc) [<c008f224>] (check_flags+0xcc/0x1dc) from [<c00945dc>] (lock_acquire+0x50/0x140) [<c00945dc>] (lock_acquire+0x50/0x140) from [<c0358434>] (_raw_spin_lock+0x50/0x88) [<c0358434>] (_raw_spin_lock+0x50/0x88) from [<c00fd114>] (set_task_comm+0x2c/0x60) [<c00fd114>] (set_task_comm+0x2c/0x60) from [<c007e184>] (kthreadd+0x30/0x108) [<c007e184>] (kthreadd+0x30/0x108) from [<c0030104>] (kernel_thread_exit+0x0/0x8) ---[ end trace 1b75b31a2719ed1c ]--- possible reason: unannotated irqs-on. irq event stamp: 3 hardirqs last enabled at (2): [<c0059bb0>] finish_task_switch+0x48/0xb0 hardirqs last disabled at (3): [<c002f0b0>] ret_slow_syscall+0xc/0x1c softirqs last enabled at (0): [<c005f3e0>] copy_process+0x394/0xe5c softirqs last disabled at (0): [<(null)>] (null) Fix this by ensuring that the lockdep interrupt state is manipulated in the appropriate places. We essentially treat userspace as an entirely separate environment which isn't relevant to lockdep (lockdep doesn't monitor userspace.) We don't tell lockdep that IRQs will be enabled in that environment. Instead, when creating kernel threads (which is a rare event compared to entering/leaving userspace) we have to update the lockdep state. Do this by starting threads with IRQs disabled, and in the kthread helper, tell lockdep that IRQs are enabled, and enable them. This provides lockdep with a consistent view of the current IRQ state in kernel space. This also revert portions of 0d928b0b616d1c5c5fe76019a87cba171ca91633 which didn't fix the problem. Tested-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | | | | ARM: 6184/2: ux500: use neutral PRCMU baseLinus Walleij2010-07-092-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The MTU wallclock timing fix-up patch was hardwired to the DB8500 causing a regression. This makes it work on the DB5500 as well. Signed-off-by: Linus Walleij <linus.walleij@stericsson.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | | | | ARM: 6212/1: atomic ops: add memory constraints to inline asmWill Deacon2010-07-091-66/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the 32-bit and 64-bit atomic operations on ARM do not include memory constraints in the inline assembly blocks. In the case of barrier-less operations [for example, atomic_add], this means that the compiler may constant fold values which have actually been modified by a call to an atomic operation. This issue can be observed in the atomic64_test routine in <kernel root>/lib/atomic64_test.c: 00000000 <test_atomic64>: 0: e1a0c00d mov ip, sp 4: e92dd830 push {r4, r5, fp, ip, lr, pc} 8: e24cb004 sub fp, ip, #4 c: e24dd008 sub sp, sp, #8 10: e24b3014 sub r3, fp, #20 14: e30d000d movw r0, #53261 ; 0xd00d 18: e3011337 movw r1, #4919 ; 0x1337 1c: e34c0001 movt r0, #49153 ; 0xc001 20: e34a1aa3 movt r1, #43683 ; 0xaaa3 24: e16300f8 strd r0, [r3, #-8]! 28: e30c0afe movw r0, #51966 ; 0xcafe 2c: e30b1eef movw r1, #48879 ; 0xbeef 30: e34d0eaf movt r0, #57007 ; 0xdeaf 34: e34d1ead movt r1, #57005 ; 0xdead 38: e1b34f9f ldrexd r4, [r3] 3c: e1a34f90 strexd r4, r0, [r3] 40: e3340000 teq r4, #0 44: 1afffffb bne 38 <test_atomic64+0x38> 48: e59f0004 ldr r0, [pc, #4] ; 54 <test_atomic64+0x54> 4c: e3a0101e mov r1, #30 50: ebfffffe bl 0 <__bug> 54: 00000000 .word 0x00000000 The atomic64_set (0x38-0x44) writes to the atomic64_t, but the compiler doesn't see this, assumes the test condition is always false and generates an unconditional branch to __bug. The rest of the test is optimised away. This patch adds suitable memory constraints to the atomic operations on ARM to ensure that the compiler is informed of the correct data hazards. We have to use the "Qo" constraints to avoid hitting the GCC anomaly described at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44492 , where the compiler makes assumptions about the writeback in the addressing mode used by the inline assembly. These constraints forbid the use of auto{inc,dec} addressing modes, so it doesn't matter if we don't use the operand exactly once. Cc: stable@kernel.org Reviewed-by: Nicolas Pitre <nicolas.pitre@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | | | | ARM: 6211/1: atomic ops: fix register constraints for atomic64_add_unlessWill Deacon2010-07-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The atomic64_add_unless function compares an atomic variable with a given value and, if they are not equal, adds another given value to the atomic variable. The function returns zero if the addition did not occur and non-zero otherwise. On ARM, the return value is initialised to 1 in C code. Inline assembly code then performs the atomic64_add_unless operation, setting the return value to 0 iff the addition does not occur. This means that when the addition *does* occur, the value of ret must be preserved across the inline assembly and therefore requires a "+r" constraint rather than the current one of "=&r". Thanks to Nicolas Pitre for helping to spot this. Cc: stable@kernel.org Reviewed-by: Nicolas Pitre <nicolas.pitre@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>