summaryrefslogtreecommitdiff
path: root/include
Commit message (Collapse)AuthorAgeFilesLines
* NVMe: Change the definition of nvme_user_ioMatthew Wilcox2011-11-041-5/+3
| | | | | | | | | | | | | | | | | | | | | | | The read and write commands don't define a 'result', so there's no need to copy it back to userspace. Remove the ability of the ioctl to submit commands to a different namespace; it's just asking for trouble, and the use case I have in mind will be addressed througha different ioctl in the future. That removes the need for both the block_shift and nsid arguments. Check that the opcode is one of 'read' or 'write'. Future opcodes may be added in the future, but we will need a different structure definition for them. The nblocks field is redefined to be 0-based. This allows the user to request the full 65536 blocks. Don't byteswap the reftag, apptag and appmask. Martin Petersen tells me these are calculated in big-endian and are transmitted to the device in big-endian. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Correct the definitions of two ioctlsMatthew Wilcox2011-11-041-2/+2
| | | | | | | | NVME_IOCTL_SUBMIT_IO has a struct nvme_user_io, not a struct nvme_rw_command as a parameter, and NVME_IOCTL_DOWNLOAD_FW is a Write, not a Read. Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Remove outdated commentsMatthew Wilcox2011-11-041-1/+0
| | | | | | | The head can never overrun the tail since we won't allocate enough command IDs to let that happen. The status codes are in sync with the spec. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Update admin opcodes to match the 1.0RC specKrzysztof Wierzbicki2011-11-041-7/+7
| | | | | | Signed-off-by: Krzysztof Wierzbicki <krzysztof.wierzbicki@intel.com> Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Update BAR structure to match the current specMatthew Wilcox2011-11-041-2/+4
| | | | | | | | | | Add two reserved registers in the middle of the BAR to match the 1.0 spec plus ECN 0002. Also rename IMC and ISC to INTMC and INTSC to conform with the spec. We still don't need to use them :-) Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Add download / activate firmware ioctlsMatthew Wilcox2011-11-041-6/+27
| | | | Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Add remaining status codesMatthew Wilcox2011-11-041-0/+15
| | | | Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Add NVME_IOCTL_SUBMIT_IOMatthew Wilcox2011-11-041-0/+18
| | | | | | Allow userspace to submit synchronous I/O like the SCSI sg interface does. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Make nvme_common_command more featurefulMatthew Wilcox2011-11-041-8/+12
| | | | | | | Add prp1, prp2 and the metadata prp to the common command, since the fields are generally used this way. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: New driverMatthew Wilcox2011-11-041-0/+343
| | | | | | This driver is for devices that follow the NVM Express standard Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* Merge branch 'nf' of git://1984.lsi.us.es/netDavid S. Miller2011-10-171-0/+1
|\
| * IPVS netns shutdown/startup dead-lockHans Schillstrom2011-10-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ip_vs_mutext is used by both netns shutdown code and startup and both implicit uses sk_lock-AF_INET mutex. cleanup CPU-1 startup CPU-2 ip_vs_dst_event() ip_vs_genl_set_cmd() sk_lock-AF_INET __ip_vs_mutex sk_lock-AF_INET __ip_vs_mutex * DEAD LOCK * A new mutex placed in ip_vs netns struct called sync_mutex is added. Comments from Julian and Simon added. This patch has been running for more than 3 month now and it seems to work. Ver. 3 IP_VS_SO_GET_DAEMON in do_ip_vs_get_ctl protected by sync_mutex instead of __ip_vs_mutex as sugested by Julian. Signed-off-by: Hans Schillstrom <hans@schillstrom.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | udplite: fast-path computation of checksum coverageGerrit Renker2011-10-171-32/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 903ab86d195cca295379699299c5fc10beba31c7 of 1 March this year ("udp: Add lockless transmit path") introduced a new fast TX path that broke the checksum coverage computation of UDP-lite, which so far depended on up->len (only set if the socket is locked and 0 in the fast path). Fixed by providing both fast- and slow-path computation of checksum coverage. The latter can be removed when UDP(-lite)v6 also uses a lockless transmit path. Reported-by: Thomas Volkert <thomas@homer-conferencing.com> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'for-linus' of http://people.redhat.com/agk/git/linux-dmLinus Torvalds2011-10-061-0/+5
|\ \ | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of http://people.redhat.com/agk/git/linux-dm: dm crypt: always disable discard_zeroes_data dm: raid fix write_mostly arg validation dm table: avoid crash if integrity profile changes dm: flakey fix corrupt_bio_byte error path
| * | dm crypt: always disable discard_zeroes_dataMilan Broz2011-09-251-0/+5
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | If optional discard support in dm-crypt is enabled, discards requests bypass the crypt queue and blocks of the underlying device are discarded. For the read path, discarded blocks are handled the same as normal ciphertext blocks, thus decrypted. So if the underlying device announces discarded regions return zeroes, dm-crypt must disable this flag because after decryption there is just random noise instead of zeroes. Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* | Merge git://github.com/davem330/netLinus Torvalds2011-10-041-3/+10
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://github.com/davem330/net: pch_gbe: Fixed the issue on which a network freezes pch_gbe: Fixed the issue on which PC was frozen when link was downed. make PACKET_STATISTICS getsockopt report consistently between ring and non-ring net: xen-netback: correctly restart Tx after a VM restore/migrate bonding: properly stop queuing work when requested can bcm: fix incomplete tx_setup fix RDSRDMA: Fix cleanup of rds_iw_mr_pool net: Documentation: Fix type of variables ibmveth: Fix oops on request_irq failure ipv6: nullify ipv6_ac_list and ipv6_fl_list when creating new socket cxgb4: Fix EEH on IBM P7IOC can bcm: fix tx_setup off-by-one errors MAINTAINERS: tehuti: Alexander Indenbaum's address bounces dp83640: reduce driver noise ptp: fix L2 event message recognition
| * | ptp: fix L2 event message recognitionRichard Cochran2011-09-291-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The IEEE 1588 standard defines two kinds of messages, event and general messages. Event messages require time stamping, and general do not. When using UDP transport, two separate ports are used for the two message types. The BPF designed to recognize event messages incorrectly classifies L2 general messages as event messages. This commit fixes the issue by extending the filter to check the message type field for L2 PTP packets. Event messages are be distinguished from general messages by testing the "general" bit. Signed-off-by: Richard Cochran <richard.cochran@omicron.at> Cc: <stable@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | PCI: Disable MPS configuration by defaultJon Mason2011-10-041-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the ability to disable PCI-E MPS turning and using the BIOS configured MPS defaults. Due to the number of issues recently discovered on some x86 chipsets, make this the default behavior. Also, add the option for peer to peer DMA MPS configuration. Peer to peer DMA is outside the scope of this patch, but MPS configuration could prevent it from working by having the MPS on one root port different than the MPS on another. To work around this, simply make the system wide MPS the smallest possible value (128B). Signed-off-by: Jon Mason <mason@myri.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| | |
| \ \
| \ \
| \ \
*---. \ \ Merge branches 'irq-urgent-for-linus', 'x86-urgent-for-linus' and ↵Linus Torvalds2011-10-012-1/+1
|\ \ \ \ \ | |_|_|/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'sched-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip * 'irq-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip: irq: Fix check for already initialized irq_domain in irq_domain_add irq: Add declaration of irq_domain_simple_ops to irqdomain.h * 'x86-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip: x86/rtc: Don't recursively acquire rtc_lock * 'sched-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip: posix-cpu-timers: Cure SMP wobbles sched: Fix up wchan borkage sched/rt: Migrate equal priority tasks to available CPUs
| | | * | posix-cpu-timers: Cure SMP wobblesPeter Zijlstra2011-09-301-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | David reported: Attached below is a watered-down version of rt/tst-cpuclock2.c from GLIBC. Just build it with "gcc -o test test.c -lpthread -lrt" or similar. Run it several times, and you will see cases where the main thread will measure a process clock difference before and after the nanosleep which is smaller than the cpu-burner thread's individual thread clock difference. This doesn't make any sense since the cpu-burner thread is part of the top-level process's thread group. I've reproduced this on both x86-64 and sparc64 (using both 32-bit and 64-bit binaries). For example: [davem@boricha build-x86_64-linux]$ ./test process: before(0.001221967) after(0.498624371) diff(497402404) thread: before(0.000081692) after(0.498316431) diff(498234739) self: before(0.001223521) after(0.001240219) diff(16698) [davem@boricha build-x86_64-linux]$ The diff of 'process' should always be >= the diff of 'thread'. I make sure to wrap the 'thread' clock measurements the most tightly around the nanosleep() call, and that the 'process' clock measurements are the outer-most ones. --- #include <unistd.h> #include <stdio.h> #include <stdlib.h> #include <time.h> #include <fcntl.h> #include <string.h> #include <errno.h> #include <pthread.h> static pthread_barrier_t barrier; static void *chew_cpu(void *arg) { pthread_barrier_wait(&barrier); while (1) __asm__ __volatile__("" : : : "memory"); return NULL; } int main(void) { clockid_t process_clock, my_thread_clock, th_clock; struct timespec process_before, process_after; struct timespec me_before, me_after; struct timespec th_before, th_after; struct timespec sleeptime; unsigned long diff; pthread_t th; int err; err = clock_getcpuclockid(0, &process_clock); if (err) return 1; err = pthread_getcpuclockid(pthread_self(), &my_thread_clock); if (err) return 1; pthread_barrier_init(&barrier, NULL, 2); err = pthread_create(&th, NULL, chew_cpu, NULL); if (err) return 1; err = pthread_getcpuclockid(th, &th_clock); if (err) return 1; pthread_barrier_wait(&barrier); err = clock_gettime(process_clock, &process_before); if (err) return 1; err = clock_gettime(my_thread_clock, &me_before); if (err) return 1; err = clock_gettime(th_clock, &th_before); if (err) return 1; sleeptime.tv_sec = 0; sleeptime.tv_nsec = 500000000; nanosleep(&sleeptime, NULL); err = clock_gettime(th_clock, &th_after); if (err) return 1; err = clock_gettime(my_thread_clock, &me_after); if (err) return 1; err = clock_gettime(process_clock, &process_after); if (err) return 1; diff = process_after.tv_nsec - process_before.tv_nsec; printf("process: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n", process_before.tv_sec, process_before.tv_nsec, process_after.tv_sec, process_after.tv_nsec, diff); diff = th_after.tv_nsec - th_before.tv_nsec; printf("thread: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n", th_before.tv_sec, th_before.tv_nsec, th_after.tv_sec, th_after.tv_nsec, diff); diff = me_after.tv_nsec - me_before.tv_nsec; printf("self: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n", me_before.tv_sec, me_before.tv_nsec, me_after.tv_sec, me_after.tv_nsec, diff); return 0; } This is due to us using p->se.sum_exec_runtime in thread_group_cputime() where we iterate the thread group and sum all data. This does not take time since the last schedule operation (tick or otherwise) into account. We can cure this by using task_sched_runtime() at the cost of having to take locks. This also means we can (and must) do away with thread_group_sched_runtime() since the modified thread_group_cputime() is now more accurate and would deadlock when called from thread_group_sched_runtime(). Aside of that it makes the function safe on 32 bit systems. The old code added t->se.sum_exec_runtime unprotected. sum_exec_runtime is a 64bit value and could be changed on another cpu at the same time. Reported-by: David Miller <davem@davemloft.net> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stable@kernel.org Link: http://lkml.kernel.org/r/1314874459.7945.22.camel@twins Tested-by: David Miller <davem@davemloft.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | irq: Add declaration of irq_domain_simple_ops to irqdomain.hRob Herring2011-09-201-0/+1
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | irq_domain_simple_ops is exported, but is not declared in irqdomain.h, so add it. Signed-off-by: Rob Herring <rob.herring@calxeda.com> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: marc.zyngier@arm.com Cc: thomas.abraham@linaro.org Cc: jamie@jamieiles.com Cc: b-cousson@ti.com Cc: shawn.guo@linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: devicetree-discuss@lists.ozlabs.org Link: http://lkml.kernel.org/r/1316017900-19918-2-git-send-email-robherring2@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | | | Merge branch 'writeback-for-linus' of git://github.com/fengguang/linuxLinus Torvalds2011-09-281-5/+5
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | * 'writeback-for-linus' of git://github.com/fengguang/linux: writeback: show raw dirtied_when in trace writeback_single_inode
| * | | | writeback: show raw dirtied_when in trace writeback_single_inodeWu Fengguang2011-08-311-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Save inode->dirtied_when in the raw trace output for reliable scripting, and to also show in formatted output the relative age in seconds for easy human reading. CC: Jan Kara <jack@suse.cz> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
* | | | | vfs: remove LOOKUP_NO_AUTOMOUNT flagLinus Torvalds2011-09-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | That flag no longer makes sense, since we don't look up automount points as eagerly any more. Additionally, it turns out that the NO_AUTOMOUNT handling was buggy to begin with: it would avoid automounting even for cases where we really *needed* to do the automount handling, and could return ENOENT for autofs entries that hadn't been instantiated yet. With our new non-eager automount semantics, one discussion has been about adding a AT_AUTOMOUNT flag to vfs_fstatat (and thus the newfstatat() and fstatat64() system calls), but it's probably not worth it: you can always force at least directory automounting by simply adding the final '/' to the filename, which works for *all* of the stat family system calls, old and new. So AT_NO_AUTOMOUNT (and thus LOOKUP_NO_AUTOMOUNT) really were just a result of our bad default behavior. Acked-by: Ian Kent <raven@themaw.net> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | vfs pathname lookup: Add LOOKUP_AUTOMOUNT flagLinus Torvalds2011-09-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since we've now turned around and made LOOKUP_FOLLOW *not* force an automount, we want to add the ability to force an automount event on lookup even if we don't happen to have one of the other flags that force it implicitly (LOOKUP_OPEN, LOOKUP_DIRECTORY, LOOKUP_PARENT..) Most cases will never want to use this, since you'd normally want to delay automounting as long as possible, which usually implies LOOKUP_OPEN (when we open a file or directory, we really cannot avoid the automount any more). But Trond argued sufficiently forcefully that at a minimum bind mounting a file and quotactl will want to force the automount lookup. Some other cases (like nfs_follow_remote_path()) could use it too, although LOOKUP_DIRECTORY would work there as well. This commit just adds the flag and logic, no users yet, though. It also doesn't actually touch the LOOKUP_NO_AUTOMOUNT flag that is related, and was made irrelevant by the same change that made us not follow on LOOKUP_FOLLOW. Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Ian Kent <raven@themaw.net> Cc: Jeff Layton <jlayton@redhat.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Greg KH <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6Linus Torvalds2011-09-221-0/+1
|\ \ \ \ \ | |_|_|_|/ |/| | | | | | | | | | | | | | | | | | | * 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6: [S390] kvm: extension capability for new address space layout [S390] kvm: fix address mode switching
| * | | | [S390] kvm: extension capability for new address space layoutChristian Borntraeger2011-09-201-0/+1
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 598841ca9919d008b520114d8a4378c4ce4e40a1 ([S390] use gmap address spaces for kvm guest images) changed kvm on s390 to use a separate address space for kvm guests. We can now put KVM guests anywhere in the user address mode with a size up to 8PB - as long as the memory is 1MB-aligned. This change was done without KVM extension capability bit. The change was added after 3.0, but we still have a chance to add a feature bit before 3.1 (keeping the releases in a sane state). We use number 71 to avoid collisions with other pending kvm patches as requested by Alexander Graf. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Avi Kivity <avi@redhat.com> Cc: Alexander Graf <agraf@suse.de> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
* | | | Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds2011-09-213-5/+4
|\ \ \ \ | |/ / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://git.kernel.dk/linux-block: floppy: use del_timer_sync() in init cleanup blk-cgroup: be able to remove the record of unplugged device block: Don't check QUEUE_FLAG_SAME_COMP in __blk_complete_request mm: Add comment explaining task state setting in bdi_forker_thread() mm: Cleanup clearing of BDI_pending bit in bdi_forker_thread() block: simplify force plug flush code a little bit block: change force plug flush call order block: Fix queue_flag update when rq_affinity goes from 2 to 1 block: separate priority boosting from REQ_META block: remove READ_META and WRITE_META xen-blkback: fixed indentation and comments xen-blkback: Don't disconnect backend until state switched to XenbusStateClosed.
| * | | block: simplify force plug flush code a little bitShaohua Li2011-08-241-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cleaning up the code a little bit. attempt_plug_merge() traverses the plug list anyway, we can do the request counting there, so stack size is reduced a little bit. The motivation here is I suspect if we should count the requests for each queue (task could handle multiple disks in the meantime), but my test doesn't show it's worthy doing. If somebody proves we should do it, below change will make that more easier. Signed-off-by: Shaohua Li <shli@kernel.org> Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | block: separate priority boosting from REQ_METAChristoph Hellwig2011-08-231-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new REQ_PRIO to let requests preempt others in the cfq I/O schedule, and lave REQ_META purely for marking requests as metadata in blktrace. All existing callers of REQ_META except for XFS are updated to also set REQ_PRIO for now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | block: remove READ_META and WRITE_METAChristoph Hellwig2011-08-231-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace all occurnanced of the undocumented READ_META with READ | REQ_META and remove the unused WRITE_META define. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* | | | Merge git://github.com/davem330/netLinus Torvalds2011-09-191-1/+18
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | * git://github.com/davem330/net: tcp: fix validation of D-SACK tcp: fix build error if !CONFIG_SYN_COOKIES
| * | | | tcp: fix build error if !CONFIG_SYN_COOKIESEric Dumazet2011-09-181-1/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 946cedccbd7387 (tcp: Change possible SYN flooding messages) added a build error if CONFIG_SYN_COOKIES=n Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | Merge branch 'for-linus' of git://git.infradead.org/users/sameo/mfd-2.6Linus Torvalds2011-09-181-1/+1
|\ \ \ \ \ | |/ / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://git.infradead.org/users/sameo/mfd-2.6: mfd: Fix omap-usb-host build failure mfd: Make omap-usb-host TLL mode work again mfd: Set MAX8997 irq pointer mfd: Fix initialisation of tps65910 interrupts mfd: Check for twl4030-madc NULL pointer mfd: Copy the device pointer to the twl4030-madc structure mfd: Rename wm8350 static gpio_set_debounce() mfd: Fix value of WM8994_CONFIGURE_GPIO
| * | | | mfd: Fix value of WM8994_CONFIGURE_GPIOMark Brown2011-09-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This needs to be an out of band value for the register and on this device registers are 16 bit so we must shift left one to the 17th bit. Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Cc: stable@kernel.org Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
* | | | | Merge git://github.com/davem330/netLinus Torvalds2011-09-187-4/+32
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://github.com/davem330/net: (62 commits) ipv6: don't use inetpeer to store metrics for routes. can: ti_hecc: include linux/io.h IRDA: Fix global type conflicts in net/irda/irsysctl.c v2 net: Handle different key sizes between address families in flow cache net: Align AF-specific flowi structs to long ipv4: Fix fib_info->fib_metrics leak caif: fix a potential NULL dereference sctp: deal with multiple COOKIE_ECHO chunks ibmveth: Fix checksum offload failure handling ibmveth: Checksum offload is always disabled ibmveth: Fix issue with DMA mapping failure ibmveth: Fix DMA unmap error pch_gbe: support ML7831 IOH pch_gbe: added the process of FIFO over run error pch_gbe: fixed the issue which receives an unnecessary packet. sfc: Use 64-bit writes for TX push where possible Revert "sfc: Use write-combining to reduce TX latency" and follow-ups bnx2x: Fix ethtool advertisement bnx2x: Fix 578xx link LED bnx2x: Fix XMAC loopback test ...
| * | | | | net: Handle different key sizes between address families in flow cachedpward2011-09-161-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the conversion of struct flowi to a union of AF-specific structs, some operations on the flow cache need to account for the exact size of the key. Signed-off-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | net: Align AF-specific flowi structs to longDavid Ward2011-09-161-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | AF-specific flowi structs are now passed to flow_key_compare, which must also be aligned to a long. Signed-off-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | sctp: deal with multiple COOKIE_ECHO chunksMax Matveev2011-09-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Attempt to reduce the number of IP packets emitted in response to single SCTP packet (2e3216cd) introduced a complication - if a packet contains two COOKIE_ECHO chunks and nothing else then SCTP state machine corks the socket while processing first COOKIE_ECHO and then loses the association and forgets to uncork the socket. To deal with the issue add new SCTP command which can be used to set association explictly. Use this new command when processing second COOKIE_ECHO chunk to restore the context for SCTP state machine. Signed-off-by: Max Matveev <makc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | Merge branch 'master' of ../netdev/David S. Miller2011-09-1614-27/+56
| |\ \ \ \ \
| | * | | | | net: relax PKTINFO non local ipv6 udp xmit checkMaciej Żenczykowski2011-08-301-0/+1
| | | |_|/ / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow transparent sockets to be less restrictive about the source ip of ipv6 udp packets being sent. Google-Bug-Id: 5018138 Signed-off-by: Maciej Żenczykowski <maze@google.com> CC: "Erik Kline" <ek@google.com> CC: "Lorenzo Colitti" <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | net: copy userspace buffers on device forwardingMichael S. Tsirkin2011-09-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dev_forward_skb loops an skb back into host networking stack which might hang on the memory indefinitely. In particular, this can happen in macvtap in bridged mode. Copy the userspace fragments to avoid blocking the sender in that case. As this patch makes skb_copy_ubufs extern now, I also added some documentation and made it clear the SKBTX_DEV_ZEROCOPY flag automatically instead of doing it in all callers. This can be made into a separate patch if people feel it's worth it. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | tcp: Change possible SYN flooding messagesEric Dumazet2011-09-153-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "Possible SYN flooding on port xxxx " messages can fill logs on servers. Change logic to log the message only once per listener, and add two new SNMP counters to track : TCPReqQFullDoCookies : number of times a SYNCOOKIE was replied to client TCPReqQFullDrop : number of times a SYN request was dropped because syncookies were not enabled. Based on a prior patch from Tom Herbert, and suggestions from David. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | drivers/gpio/gpio-generic.c: fix build errorsRussell King2011-09-141-10/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Building a kernel with hotplug disabled results in a link failure: `bgpio_remove' referenced in section `___ksymtab_gpl+bgpio_remove' of drivers/built-in.o: defined in discarded section `.devexit.text' of drivers/built-in.o This is because of bgpio_remove() is exported. It is illegal to export symbols which are discarded either at link time or as part of an init/exit section. Fix this by dropping the __devexit attributation from bgpio_remove(). Also drop the __devinit attributation from bgpio_init(). Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | memcg: Revert "memcg: add memory.vmscan_stat"Johannes Weiner2011-09-142-19/+6
| |_|_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Revert the post-3.0 commit 82f9d486e59f5 ("memcg: add memory.vmscan_stat"). The implementation of per-memcg reclaim statistics violates how memcg hierarchies usually behave: hierarchically. The reclaim statistics are accounted to child memcgs and the parent hitting the limit, but not to hierarchy levels in between. Usually, hierarchical statistics are perfectly recursive, with each level representing the sum of itself and all its children. Since this exports statistics to userspace, this may lead to confusion and problems with changing things after the release, so revert it now, we can try again later. Signed-off-by: Johannes Weiner <jweiner@redhat.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Michal Hocko <mhocko@suse.cz> Cc: Ying Han <yinghan@google.com> Cc: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | regulator: fix kernel-doc warning in consumer.hRandy Dunlap2011-09-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix kernel-doc warning about internal/private data by marking it as "private:" so that kernel-doc will ignore it. Warning(include/linux/regulator/consumer.h:128): No description found for parameter 'ret' Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | wireless: fix kernel-doc warning in net/cfg80211.hRandy Dunlap2011-09-081-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix kernel-doc warning in net/cfg80211.h: Warning(include/net/cfg80211.h:1884): No description found for parameter 'registered' Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | Merge branch 'perf-fixes-for-linus' of git://tesla.tglx.de/git/linux-2.6-tipLinus Torvalds2011-09-071-9/+15
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'perf-fixes-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip: x86, perf: Check that current->mm is alive before getting user callchain perf_event: Fix broken calc_timer_values() perf events: Fix slow and broken cgroup context switch code
| * | | | | perf events: Fix slow and broken cgroup context switch codeStephane Eranian2011-08-291-9/+15
| | |_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current cgroup context switch code was incorrect leading to bogus counts. Furthermore, as soon as there was an active cgroup event on a CPU, the context switch cost on that CPU would increase by a significant amount as demonstrated by a simple ping/pong example: $ ./pong Both processes pinned to CPU1, running for 10s 10684.51 ctxsw/s Now start a cgroup perf stat: $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 100 $ ./pong Both processes pinned to CPU1, running for 10s 6674.61 ctxsw/s That's a 37% penalty. Note that pong is not even in the monitored cgroup. The results shown by perf stat are bogus: $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 100 Performance counter stats for 'sleep 100': CPU1 <not counted> cycles test CPU1 16,984,189,138 cycles # 0.000 GHz The second 'cycles' event should report a count @ CPU clock (here 2.4GHz) as it is counting across all cgroups. The patch below fixes the bogus accounting and bypasses any cgroup switches in case the outgoing and incoming tasks are in the same cgroup. With this patch the same test now yields: $ ./pong Both processes pinned to CPU1, running for 10s 10775.30 ctxsw/s Start perf stat with cgroup: $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 10 Run pong outside the cgroup: $ /pong Both processes pinned to CPU1, running for 10s 10687.80 ctxsw/s The penalty is now less than 2%. And the results for perf stat are correct: $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 10 Performance counter stats for 'sleep 10': CPU1 <not counted> cycles test # 0.000 GHz CPU1 23,933,981,448 cycles # 0.000 GHz Now perf stat reports the correct counts for for the non cgroup event. If we run pong inside the cgroup, then we also get the correct counts: $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 10 Performance counter stats for 'sleep 10': CPU1 22,297,726,205 cycles test # 0.000 GHz CPU1 23,933,981,448 cycles # 0.000 GHz 10.001457237 seconds time elapsed Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110825135803.GA4697@quad Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | | | fs/9p: Use protocol-defined value for lock/getlock 'type' field.Jim Garlick2011-09-061-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Jim Garlick <garlick@llnl.gov> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>