summaryrefslogtreecommitdiff
path: root/fs/nfs
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'for-linus' of ↵Linus Torvalds2012-03-211-4/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile 1 from Al Viro: "This is _not_ all; in particular, Miklos' and Jan's stuff is not there yet." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (64 commits) ext4: initialization of ext4_li_mtx needs to be done earlier debugfs-related mode_t whack-a-mole hfsplus: add an ioctl to bless files hfsplus: change finder_info to u32 hfsplus: initialise userflags qnx4: new helper - try_extent() qnx4: get rid of qnx4_bread/qnx4_getblk take removal of PF_FORKNOEXEC to flush_old_exec() trim includes in inode.c um: uml_dup_mmap() relies on ->mmap_sem being held, but activate_mm() doesn't hold it um: embed ->stub_pages[] into mmu_context gadgetfs: list_for_each_safe() misuse ocfs2: fix leaks on failure exits in module_init ecryptfs: make register_filesystem() the last potential failure exit ntfs: forgets to unregister sysctls on register_filesystem() failure logfs: missing cleanup on register_filesystem() failure jfs: mising cleanup on register_filesystem() failure make configfs_pin_fs() return root dentry on success configfs: configfs_create_dir() has parent dentry in dentry->d_parent configfs: sanitize configfs_create() ...
| * switch open-coded instances of d_make_root() to new helperAl Viro2012-03-201-4/+2
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Merge branch 'next' of ↵Linus Torvalds2012-03-212-0/+2
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates for 3.4 from James Morris: "The main addition here is the new Yama security module from Kees Cook, which was discussed at the Linux Security Summit last year. Its purpose is to collect miscellaneous DAC security enhancements in one place. This also marks a departure in policy for LSM modules, which were previously limited to being standalone access control systems. Chromium OS is using Yama, and I believe there are plans for Ubuntu, at least. This patchset also includes maintenance updates for AppArmor, TOMOYO and others." Fix trivial conflict in <net/sock.h> due to the jumo_label->static_key rename. * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (38 commits) AppArmor: Fix location of const qualifier on generated string tables TOMOYO: Return error if fails to delete a domain AppArmor: add const qualifiers to string arrays AppArmor: Add ability to load extended policy TOMOYO: Return appropriate value to poll(). AppArmor: Move path failure information into aa_get_name and rename AppArmor: Update dfa matching routines. AppArmor: Minor cleanup of d_namespace_path to consolidate error handling AppArmor: Retrieve the dentry_path for error reporting when path lookup fails AppArmor: Add const qualifiers to generated string tables AppArmor: Fix oops in policy unpack auditing AppArmor: Fix error returned when a path lookup is disconnected KEYS: testing wrong bit for KEY_FLAG_REVOKED TOMOYO: Fix mount flags checking order. security: fix ima kconfig warning AppArmor: Fix the error case for chroot relative path name lookup AppArmor: fix mapping of META_READ to audit and quiet flags AppArmor: Fix underflow in xindex calculation AppArmor: Fix dropping of allowed operations that are force audited AppArmor: Add mising end of structure test to caps unpacking ...
| * | security: trim security.hAl Viro2012-02-141-0/+1
| | | | | | | | | | | | | | | | | | | | | Trim security.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: James Morris <jmorris@namei.org>
| * | Merge branch 'next-queue' into nextJames Morris2012-02-091-0/+1
| |\ \
| | * | KEYS: Allow special keyrings to be clearedDavid Howells2012-01-191-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kernel contains some special internal keyrings, for instance the DNS resolver keyring : 2a93faf1 I----- 1 perm 1f030000 0 0 keyring .dns_resolver: empty It would occasionally be useful to allow the contents of such keyrings to be flushed by root (cache invalidation). Allow a flag to be set on a keyring to mark that someone possessing the sysadmin capability can clear the keyring, even without normal write access to the keyring. Set this flag on the special keyrings created by the DNS resolver, the NFS identity mapper and the CIFS identity mapper. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Jeff Layton <jlayton@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
* | | | nfs: remove the second argument of k[un]map_atomic()Cong Wang2012-03-202-6/+6
| |_|/ |/| | | | | | | | Signed-off-by: Cong Wang <amwang@redhat.com>
* | | NFSv4: fix server_scope memory leakWeston Andros Adamson2012-02-171-6/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | server_scope would never be freed if nfs4_check_cl_exchange_flags() returned non-zero Signed-off-by: Weston Andros Adamson <dros@netapp.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | | NFSv4.1: Fix a NFSv4.1 session initialisation regressionTrond Myklebust2012-02-171-65/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit aacd553 (NFSv4.1: cleanup init and reset of session slot tables) introduces a regression in the session initialisation code. New tables now find their sequence ids initialised to 0, rather than the mandated value of 1 (see RFC5661). Fix the problem by merging nfs4_reset_slot_table() and nfs4_init_slot_table(). Since the tbl->max_slots is initialised to 0, the test in nfs4_reset_slot_table for max_reqs != tbl->max_slots will automatically pass for an empty table. Reported-by: Vitaliy Gusev <gusev.vitaliy@nexenta.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | | NFSv4: Ensure we throw out bad delegation stateids on NFS4ERR_BAD_STATEIDTrond Myklebust2012-02-091-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | To ensure that we don't just reuse the bad delegation when we attempt to recover the nfs4_state that received the bad stateid error. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
* | | NFSv4: Fix an Oops in the NFSv4 getacl codeTrond Myklebust2012-02-032-5/+8
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit bf118a342f10dafe44b14451a1392c3254629a1f (NFSv4: include bitmap in nfsv4 get acl data) introduces the 'acl_scratch' page for the case where we may need to decode multi-page data. However it fails to take into account the fact that the variable may be NULL (for the case where we're not doing multi-page decode), and it also attaches it to the encoding xdr_stream rather than the decoding one. The immediate result is an Oops in nfs4_xdr_enc_getacl due to the call to page_address() with a NULL page pointer. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Andy Adamson <andros@netapp.com> Cc: stable@vger.kernel.org
* | Merge tag 'nfs-for-3.3-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2012-01-167-178/+222
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NFS client bugfixes and cleanups for Linux 3.3 (pull 2) * tag 'nfs-for-3.3-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: pnfsblock: alloc short extent before submit bio pnfsblock: remove rpc_call_ops from struct parallel_io pnfsblock: move find lock page logic out of bl_write_pagelist pnfsblock: cleanup bl_mark_sectors_init pnfsblock: limit bio page count pnfsblock: don't spinlock when freeing block_dev pnfsblock: clean up _add_entry pnfsblock: set read/write tk_status to pnfs_error pnfsblock: acquire im_lock in _preload_range NFS4: fix compile warnings in nfs4proc.c nfs: check for integer overflow in decode_devicenotify_args() NFS: cleanup endian type in decode_ds_addr() NFS: add an endian notation
| * pnfsblock: alloc short extent before submit bioPeng Tao2012-01-123-37/+131
| | | | | | | | | | | | | | | | | | | | | | | | As discussed earlier, it is better for block client to allocate memory for tracking extents state before submitting bio. So the patch does it by allocating a short_extent for every INVALID extent touched by write pagelist and for every zeroing page we created, saving them in layout header. Then in end_io we can just use them to create commit list items and avoid memory allocation there. Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * pnfsblock: remove rpc_call_ops from struct parallel_ioPeng Tao2012-01-121-13/+0
| | | | | | | | | | | | | | | | block layout can just make use of generic read/write_done. Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * pnfsblock: move find lock page logic out of bl_write_pagelistPeng Tao2012-01-121-24/+54
| | | | | | | | | | | | | | | | Also avoid unnecessary lock_page if page is handled by others. Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * pnfsblock: cleanup bl_mark_sectors_initPeng Tao2012-01-123-77/+8
| | | | | | | | | | | | | | | | | | It does not need to manipulate on partial initialized blocks. Writeback code takes care of it. Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * pnfsblock: limit bio page countPeng Tao2012-01-121-6/+11
| | | | | | | | | | | | | | | | | | | | One bio can have at most BIO_MAX_PAGES pages. We should limit it bec otherwise bio_alloc will fail when there are many pages in one read/write_pagelist. Cc: <stable@vger.kernel.org> #3.1+ Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * pnfsblock: don't spinlock when freeing block_devPeng Tao2012-01-121-7/+4
| | | | | | | | | | | | | | | | | | | | bl_free_block_dev() may sleep. We can not call it with spinlock held. Besides, there is no need to take bm_lock as we are last user freeing bm_devlist. Cc: <stable@vger.kernel.org> #3.1+ Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * pnfsblock: clean up _add_entryPeng Tao2012-01-121-7/+1
| | | | | | | | | | | | | | | | | | It is wrong to kmalloc in _add_entry() as it is inside spinlock. memory should be already allocated _add_entry() is called. Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * pnfsblock: set read/write tk_status to pnfs_errorPeng Tao2012-01-121-1/+2
| | | | | | | | | | | | | | | | To pass the IO status to upper layer. Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * pnfsblock: acquire im_lock in _preload_rangePeng Tao2012-01-121-5/+6
| | | | | | | | | | | | | | | | | | | | When calling _add_entry, we should take the im_lock to protect agains other modifiers. Cc: <stable@vger.kernel.org> #3.1+ Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFS4: fix compile warnings in nfs4proc.cPeng Tao2012-01-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | compile in nfs-for-3.3 branch shows following warnings. Fix it here. fs/nfs/nfs4proc.c: In function ‘__nfs4_get_acl_uncached’: fs/nfs/nfs4proc.c:3589: warning: format ‘%ld’ expects type ‘long int’, but argument 4 has type ‘size_t’ fs/nfs/nfs4proc.c:3589: warning: format ‘%ld’ expects type ‘long int’, but argument 6 has type ‘size_t’ Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * nfs: check for integer overflow in decode_devicenotify_args()Dan Carpenter2012-01-121-0/+4
| | | | | | | | | | | | | | | | On 32 bit, if n is too large then "n * sizeof(*args->devs)" could overflow and args->devs would be smaller than expected. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFS: cleanup endian type in decode_ds_addr()Dan Carpenter2012-01-121-1/+1
| | | | | | | | | | | | | | | | port is supposed to be a __be16 here. The existing code should work fine, but this is a cleanup. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFS: add an endian notationDan Carpenter2012-01-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This function returns a big endian value. The implementation in fs/nfs/callback_proc.c is declared with "__be32" but the .h file uses "unsigned" instead. It makes sparse complain: fs/nfs/callback_proc.c:232:8: error: symbol 'nfs4_callback_layoutrecall' redeclared with different type (originally declared at fs/nfs/callback.h:165) - different base types Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | Merge tag 'for-linus' of git://github.com/rustyrussell/linuxLinus Torvalds2012-01-142-2/+2
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Autogenerated GPG tag for Rusty D1ADB8F1: 15EE 8D6C AB0E 7F0C F999 BFCB D920 0E6C D1AD B8F1 * tag 'for-linus' of git://github.com/rustyrussell/linux: module_param: check that bool parameters really are bool. intelfbdrv.c: bailearly is an int module_param paride/pcd: fix bool verbose module parameter. module_param: make bool parameters really bool (drivers & misc) module_param: make bool parameters really bool (arch) module_param: make bool parameters really bool (core code) kernel/async: remove redundant declaration. printk: fix unnecessary module_param_name. lirc_parallel: fix module parameter description. module_param: avoid bool abuse, add bint for special cases. module_param: check type correctness for module_param_array modpost: use linker section to generate table. modpost: use a table rather than a giant if/else statement. modules: sysfs - export: taint, coresize, initsize kernel/params: replace DEBUGP with pr_debug module: replace DEBUGP with pr_debug module: struct module_ref should contains long fields module: Fix performance regression on modules with large symbol tables module: Add comments describing how the "strmap" logic works Fix up conflicts in scripts/mod/file2alias.c due to the new linker- generated table approach to adding __mod_*_device_table entries. The ARM sa11x0 mcp bus needed to be converted to that too.
| * | module_param: make bool parameters really bool (drivers & misc)Rusty Russell2012-01-132-2/+2
| |/ | | | | | | | | | | | | | | | | | | | | | | module_param(bool) used to counter-intuitively take an int. In fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy trick. It's time to remove the int/unsigned int option. For this version it'll simply give a warning, but it'll break next kernel version. Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* | mm: compaction: introduce sync-light migration for use by compactionMel Gorman2012-01-122-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a lightweight sync migrate operation MIGRATE_SYNC_LIGHT mode that avoids writing back pages to backing storage. Async compaction maps to MIGRATE_ASYNC while sync compaction maps to MIGRATE_SYNC_LIGHT. For other migrate_pages users such as memory hotplug, MIGRATE_SYNC is used. This avoids sync compaction stalling for an excessive length of time, particularly when copying files to a USB stick where there might be a large number of dirty pages backed by a filesystem that does not support ->writepages. [aarcange@redhat.com: This patch is heavily based on Andrea's work] [akpm@linux-foundation.org: fix fs/nfs/write.c build] [akpm@linux-foundation.org: fix fs/btrfs/disk-io.c build] Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Dave Jones <davej@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Andy Isaacson <adi@hexapodia.org> Cc: Nai Xia <nai.xia@gmail.com> Cc: Johannes Weiner <jweiner@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | mm: compaction: determine if dirty pages can be migrated without blocking ↵Mel Gorman2012-01-122-3/+3
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | within ->migratepage Asynchronous compaction is used when allocating transparent hugepages to avoid blocking for long periods of time. Due to reports of stalling, there was a debate on disabling synchronous compaction but this severely impacted allocation success rates. Part of the reason was that many dirty pages are skipped in asynchronous compaction by the following check; if (PageDirty(page) && !sync && mapping->a_ops->migratepage != migrate_page) rc = -EBUSY; This skips over all mapping aops using buffer_migrate_page() even though it is possible to migrate some of these pages without blocking. This patch updates the ->migratepage callback with a "sync" parameter. It is the responsibility of the callback to fail gracefully if migration would block. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Dave Jones <davej@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Andy Isaacson <adi@hexapodia.org> Cc: Nai Xia <nai.xia@gmail.com> Cc: Johannes Weiner <jweiner@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'nfs-for-3.3' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2012-01-1017-233/+422
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'nfs-for-3.3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv4: Change the default setting of the nfs4_disable_idmapping parameter NFSv4: Save the owner/group name string when doing open NFS: Remove pNFS bloat from the generic write path pnfs-obj: Must return layout on IO error pnfs-obj: pNFS errors are communicated on iodata->pnfs_error NFS: Cache state owners after files are closed NFS: Clean up nfs4_find_state_owners_locked() NFSv4: include bitmap in nfsv4 get acl data nfs: fix a minor do_div portability issue NFSv4.1: cleanup comment and debug printk NFSv4.1: change nfs4_free_slot parameters for dynamic slots NFSv4.1: cleanup init and reset of session slot tables NFSv4.1: fix backchannel slotid off-by-one bug nfs: fix regression in handling of context= option in NFSv4 NFS - fix recent breakage to NFS error handling. NFS: Retry mounting NFSROOT SUNRPC: Clean up the RPCSEC_GSS service ticket requests
| * NFSv4: Change the default setting of the nfs4_disable_idmapping parameterTrond Myklebust2012-01-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that the use of numeric uids/gids is officially sanctioned in RFC3530bis, it is time to change the default here to 'enabled'. By doing so, we ensure that NFSv4 copies the behaviour of NFSv3 when we're using the default AUTH_SYS authentication (i.e. when the client uses the numeric uids/gids as authentication tokens), so that when new files are created, they will appear to have the correct user/group. It also fixes a number of backward compatibility issues when migrating from NFSv3 to NFSv4 on a platform where the server uses different uid/gid mappings than the client. Note also that this setting has been successfully tested against servers that do not support numeric uids/gids at several Connectathon/Bakeathon events at this point, and the fall back to using string names/groups has been shown to work well in all those test cases. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFSv4: Save the owner/group name string when doing openTrond Myklebust2012-01-074-59/+142
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ...so that we can do the uid/gid mapping outside the asynchronous RPC context. This fixes a bug in the current NFSv4 atomic open code where the client isn't able to determine what the true uid/gid fields of the file are, (because the asynchronous nature of the OPEN call denies it the ability to do an upcall) and so fills them with default values, marking the inode as needing revalidation. Unfortunately, in some cases, the VFS will do some additional sanity checks on the file, and may override the server's decision to allow the open because it sees the wrong owner/group fields. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFS: Remove pNFS bloat from the generic write pathTrond Myklebust2012-01-063-27/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | We have no business doing any this in the standard write release path. Get rid of it, and put it in the pNFS layer. Also, while we're at it, get rid of the completely bogus unlock/relock semantics that were present in nfs_writeback_release_full(). It is not only unnecessary, but actually dangerous to release the write lock just in order to take it again in nfs_page_async_flush(). Better just to open code the pgio operations in a pnfs helper. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * pnfs-obj: Must return layout on IO errorBoaz Harrosh2012-01-063-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As mandated by the standard. In case of an IO error, a pNFS objects layout driver must return it's layout. This is because all device errors are reported to the server as part of the layout return buffer. This is implemented the same way PNFS_LAYOUTRET_ON_SETATTR is done, through a bit flag on the pnfs_layoutdriver_type->flags member. The flag is set by the layout driver that wants a layout_return preformed at pnfs_ld_{write,read}_done in case of an error. (Though I have not defined a wrapper like pnfs_ld_layoutret_on_setattr because this code is never called outside of pnfs.c and pnfs IO paths) Without this patch 3.[0-2] Kernels leak memory and have an annoying WARN_ON after every IO error utilizing the pnfs-obj driver. [This patch is for 3.2 Kernel. 3.1/0 Kernels need a different patch] CC: Stable Tree <stable@kernel.org> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * pnfs-obj: pNFS errors are communicated on iodata->pnfs_errorBoaz Harrosh2012-01-061-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some time along the way pNFS IO errors were switched to communicate with a special iodata->pnfs_error member instead of the regular RPC members. But objlayout was not switched over. Fix that! Without this fix any IO error is hanged, because IO is not switched to MDS and pages are never cleared or read. [Applies to 3.2.0. Same bug different patch for 3.1/0 Kernels] CC: Stable Tree <stable@kernel.org> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFS: Cache state owners after files are closedChuck Lever2012-01-053-9/+91
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Servers have a finite amount of memory to store NFSv4 open and lock owners. Moreover, servers may have a difficult time determining when they can reap their state owner table, thanks to gray areas in the NFSv4 protocol specification. Thus clients should be careful to reuse state owners when possible. Currently Linux is not too careful. When a user has closed all her files on one mount point, the state owner's reference count goes to zero, and it is released. The next OPEN allocates a new one. A workload that serially opens and closes files can run through a large number of open owners this way. When a state owner's reference count goes to zero, slap it onto a free list for that nfs_server, with an expiry time. Garbage collect before looking for a state owner. This makes state owners for active users available for re-use. Now that there can be unused state owners remaining at umount time, purge the state owner free list when a server is destroyed. Also be sure not to reclaim unused state owners during state recovery. This change has benefits for the client as well. For some workloads, this approach drops the number of OPEN_CONFIRM calls from the same as the number of OPEN calls, down to just one. This reduces wire traffic and thus open(2) latency. Before this patch, untarring a kernel source tarball shows the OPEN_CONFIRM call counter steadily increasing through the test. With the patch, the OPEN_CONFIRM count remains at 1 throughout the entire untar. As long as the expiry time is kept short, I don't think garbage collection should be terribly expensive, although it does bounce the clp->cl_lock around a bit. [ At some point we should rationalize the use of the nfs_server ->destroy method. ] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> [Trond: Fixed a garbage collection race and a few efficiency issues] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFS: Clean up nfs4_find_state_owners_locked()Chuck Lever2012-01-051-12/+3
| | | | | | | | | | | | | | | | | | | | | | | | There's no longer a need to check the so_server field in the state owner, because nowadays the RB tree we search for state owners contains owners for that only server. Make nfs4_find_state_owners_locked() use the same tree searching logic as nfs4_insert_state_owner_locked(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFSv4: include bitmap in nfsv4 get acl dataAndy Adamson2012-01-052-47/+80
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The NFSv4 bitmap size is unbounded: a server can return an arbitrary sized bitmap in an FATTR4_WORD0_ACL request. Replace using the nfs4_fattr_bitmap_maxsz as a guess to the maximum bitmask returned by a server with the inclusion of the bitmap (xdr length plus bitmasks) and the acl data xdr length to the (cached) acl page data. This is a general solution to commit e5012d1f "NFSv4.1: update nfs4_fattr_bitmap_maxsz" and fixes hitting a BUG_ON in xdr_shrink_bufhead when getting ACLs. Fix a bug in decode_getacl that returned -EINVAL on ACLs > page when getxattr was called with a NULL buffer, preventing ACL > PAGE_SIZE from being retrieved. Cc: stable@kernel.org Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * nfs: fix a minor do_div portability issueChris Metcalf2012-01-051-4/+5
| | | | | | | | | | | | | | | | | | | | This change modifies filelayout_get_dense_offset() to use the functions in math64.h and thus avoid a 32-bit platform compile error trying to use do_div() on an s64 type. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> Reviewed-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFSv4.1: cleanup comment and debug printkAndy Adamson2012-01-051-5/+2
| | | | | | | | | | Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFSv4.1: change nfs4_free_slot parameters for dynamic slotsAndy Adamson2012-01-051-3/+2
| | | | | | | | | | Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFSv4.1: cleanup init and reset of session slot tablesAndy Adamson2012-01-051-37/+22
| | | | | | | | | | | | | | | | We are either initializing or resetting a session. Initialize or reset the session slot tables accordingly. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFSv4.1: fix backchannel slotid off-by-one bugAndy Adamson2012-01-051-1/+1
| | | | | | | | | | | | Cc:stable@kernel.org Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * nfs: fix regression in handling of context= option in NFSv4Jeff Layton2012-01-051-24/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Setting the security context of a NFSv4 mount via the context= mount option is currently broken. The NFSv4 codepath allocates a parsed options struct, and then parses the mount options to fill it. It eventually calls nfs4_remote_mount which calls security_init_mnt_opts. That clobbers the lsm_opts struct that was populated earlier. This bug also looks like it causes a small memory leak on each v4 mount where context= is used. Fix this by moving the initialization of the lsm_opts into nfs_alloc_parsed_mount_data. Also, add a destructor for nfs_parsed_mount_data to make it easier to free all of the allocations hanging off of it, and to ensure that the security_free_mnt_opts is called whenever security_init_mnt_opts is. I believe this regression was introduced quite some time ago, probably by commit c02d7adf. Cc: stable@vger.kernel.org Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * NFS - fix recent breakage to NFS error handling.NeilBrown2012-01-051-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | From c6d615d2b97fe305cbf123a8751ced859dca1d5e Mon Sep 17 00:00:00 2001 From: NeilBrown <neilb@suse.de> Date: Wed, 16 Nov 2011 09:39:05 +1100 Subject: [PATCH] NFS - fix recent breakage to NFS error handling. commit 02c24a82187d5a628c68edfe71ae60dc135cd178 made a small and presumably unintended change to write error handling in NFS. Previously an error from filemap_write_and_wait_range would only be of interest if nfs_file_fsync did not return an error. After this commit, an error from filemap_write_and_wait_range would mean that (the rest of) nfs_file_fsync would not even be called. This means that: 1/ you are more likely to see EIO than e.g. EDQUOT or ENOSPC. 2/ NFS_CONTEXT_ERROR_WRITE remains set for longer so more writes are synchronous. This patch restores previous behaviour. Cc: stable@kernel.org Cc: Josef Bacik <josef@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * SUNRPC: Clean up the RPCSEC_GSS service ticket requestsTrond Myklebust2012-01-051-1/+1
| | | | | | | | | | | | | | | | Instead of hacking specific service names into gss_encode_v1_msg, we should just allow the caller to specify the service name explicitly. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: J. Bruce Fields <bfields@redhat.com>
* | Merge branch 'pm-for-linus' of ↵Linus Torvalds2012-01-084-5/+9
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm * 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (76 commits) PM / Hibernate: Implement compat_ioctl for /dev/snapshot PM / Freezer: fix return value of freezable_schedule_timeout_killable() PM / shmobile: Allow the A4R domain to be turned off at run time PM / input / touchscreen: Make st1232 use device PM QoS constraints PM / QoS: Introduce dev_pm_qos_add_ancestor_request() PM / shmobile: Remove the stay_on flag from SH7372's PM domains PM / shmobile: Don't include SH7372's INTCS in syscore suspend/resume PM / shmobile: Add support for the sh7372 A4S power domain / sleep mode PM: Drop generic_subsys_pm_ops PM / Sleep: Remove forward-only callbacks from AMBA bus type PM / Sleep: Remove forward-only callbacks from platform bus type PM: Run the driver callback directly if the subsystem one is not there PM / Sleep: Make pm_op() and pm_noirq_op() return callback pointers PM/Devfreq: Add Exynos4-bus device DVFS driver for Exynos4210/4212/4412. PM / Sleep: Merge internal functions in generic_ops.c PM / Sleep: Simplify generic system suspend callbacks PM / Hibernate: Remove deprecated hibernation snapshot ioctls PM / Sleep: Fix freezer failures due to racy usermodehelper_is_disabled() ARM: S3C64XX: Implement basic power domain support PM / shmobile: Use common always on power domain governor ... Fix up trivial conflict in fs/xfs/xfs_buf.c due to removal of unused XBT_FORCE_SLEEP bit
| * \ Merge branch 'master' into pm-sleepRafael J. Wysocki2011-12-213-21/+38
| |\ \ | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * master: (848 commits) SELinux: Fix RCU deref check warning in sel_netport_insert() binary_sysctl(): fix memory leak mm/vmalloc.c: remove static declaration of va from __get_vm_area_node ipmi_watchdog: restore settings when BMC reset oom: fix integer overflow of points in oom_badness memcg: keep root group unchanged if creation fails nilfs2: potential integer overflow in nilfs_ioctl_clean_segments() nilfs2: unbreak compat ioctl cpusets: stall when updating mems_allowed for mempolicy or disjoint nodemask evm: prevent racing during tfm allocation evm: key must be set once during initialization mmc: vub300: fix type of firmware_rom_wait_states module parameter Revert "mmc: enable runtime PM by default" mmc: sdhci: remove "state" argument from sdhci_suspend_host x86, dumpstack: Fix code bytes breakage due to missing KERN_CONT IB/qib: Correct sense on freectxts increment and decrement RDMA/cma: Verify private data length cgroups: fix a css_set not found bug in cgroup_attach_proc oprofile: Fix uninitialized memory access when writing to writing to oprofilefs Revert "xen/pv-on-hvm kexec: add xs_reset_watches to shutdown watches from old kernel" ... Conflicts: kernel/cgroup_freezer.c
| * | Freezer / sunrpc / NFS: don't allow TASK_KILLABLE sleeps to block the freezerJeff Layton2011-12-064-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow the freezer to skip wait_on_bit_killable sleeps in the sunrpc layer. This should allow suspend and hibernate events to proceed, even when there are RPC's pending on the wire. Also, wrap the TASK_KILLABLE sleeps in NFS layer in freezer_do_not_count and freezer_count calls. This allows the freezer to skip tasks that are sleeping while looping on EJUKEBOX or NFS4ERR_DELAY sorts of errors. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* | | vfs: switch ->show_options() to struct dentry *Al Viro2012-01-061-3/+3
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>