summaryrefslogtreecommitdiff
path: root/fs/nfsd
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'nfsd-next' of ↵Stephen Rothwell2023-04-115-259/+204
|\ | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
| * NFSD: Watch for rq_pages bounds checking errors in nfsd_splice_actor()Chuck Lever2023-04-101-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There have been several bugs over the years where the NFSD splice actor has attempted to write outside the rq_pages array. This is a "should never happen" condition, but if for some reason the pipe splice actor should attempt to walk past the end of rq_pages, it needs to terminate the READ operation to prevent corruption of the pointer addresses in the fields just beyond the array. A server crash is thus prevented. Since the code is not behaving, the READ operation returns -EIO to the client. None of the READ payload data can be trusted if the splice actor isn't operating as expected. Suggested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
| * SUNRPC: return proper error from get_expiry()NeilBrown2023-04-102-11/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The get_expiry() function currently returns a timestamp, and uses the special return value of 0 to indicate an error. Unfortunately this causes a problem when 0 is the correct return value. On a system with no RTC it is possible that the boot time will be seen to be "3". When exportfs probes to see if a particular filesystem supports NFS export it tries to cache information with an expiry time of "3". The intention is for this to be "long in the past". Even with no RTC it will not be far in the future (at most a second or two) so this is harmless. But if the boot time happens to have been calculated to be "3", then get_expiry will fail incorrectly as it converts the number to "seconds since bootime" - 0. To avoid this problem we change get_expiry() to report the error quite separately from the expiry time. The error is now the return value. The expiry time is reported through a by-reference parameter. Reported-by: Jerry Zhang <jerry@skydio.com> Tested-by: Jerry Zhang <jerry@skydio.com> Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * NFSD: Convert filecache to rhltableChuck Lever2023-04-102-187/+133
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While we were converting the nfs4_file hashtable to use the kernel's resizable hashtable data structure, Neil Brown observed that the list variant (rhltable) would be better for managing nfsd_file items as well. The nfsd_file hash table will contain multiple entries for the same inode -- these should be kept together on a list. And, it could be possible for exotic or malicious client behavior to cause the hash table to resize itself on every insertion. A nice simplification is that rhltable_lookup() can return a list that contains only nfsd_file items that match a given inode, which enables us to eliminate specialized hash table helper functions and use the default functions provided by the rhashtable implementation). Since we are now storing nfsd_file items for the same inode on a single list, that effectively reduces the number of hash entries that have to be tracked in the hash table. The mininum bucket count is therefore lowered. Light testing with fstests generic/531 show no regressions. Suggested-by: Neil Brown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * nfsd: allow reaping files still under writebackJeff Layton2023-04-101-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On most filesystems, there is no reason to delay reaping an nfsd_file just because its underlying inode is still under writeback. nfsd just relies on client activity or the local flusher threads to do writeback. The main exception is NFS, which flushes all of its dirty data on last close. Add a new EXPORT_OP_FLUSH_ON_CLOSE flag to allow filesystems to signal that they do this, and only skip closing files under writeback on such filesystems. Also, remove a redundant NULL file pointer check in nfsd_file_check_writeback, and clean up nfs's export op flag definitions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * nfsd: update comment over __nfsd_file_cache_purgeJeff Layton2023-04-101-1/+2
| | | | | | | | | | Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * nfsd: don't take/put an extra reference when putting a fileJeff Layton2023-04-101-3/+1
| | | | | | | | | | | | | | | | The last thing that filp_close does is an fput, so don't bother taking and putting the extra reference. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * nfsd: simplify the delayed disposal list codeJeff Layton2023-04-101-42/+21
| | | | | | | | | | | | | | | | | | | | | | | | When queueing a dispose list to the appropriate "freeme" lists, it pointlessly queues the objects one at a time to an intermediate list. Remove a few helpers and just open code a list_move to make it more clear and efficient. Better document the resulting functions with kerneldoc comments. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * nfsd: add some comments to nfsd_file_do_acquireJeff Layton2023-04-101-0/+5
| | | | | | | | | | | | | | | | | | David Howells mentioned that he found this bit of code confusing, so sprinkle in some comments to clarify. Reported-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * nfsd: don't kill nfsd_files because of lease break errorJeff Layton2023-04-101-14/+15
| | | | | | | | | | | | | | | | | | An error from break_lease is non-fatal, so we needn't destroy the nfsd_file in that case. Just put the reference like we normally would and return the error. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * nfsd: simplify test_bit return in NFSD_FILE_KEY_FULL comparatorJeff Layton2023-04-101-1/+1
| | | | | | | | | | | | | | | | test_bit returns bool, so we can just compare the result of that to the key->gc value without the "!!". Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * nfsd: NFSD_FILE_KEY_INODE only needs to find GC'ed entriesJeff Layton2023-04-101-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | Since v4 files are expected to be long-lived, there's little value in closing them out of the cache when there is conflicting access. Change the comparator to also match the gc value in the key. Change both of the current users of that key to set the gc value in the key to "true". Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * nfsd: don't open-code clear_and_wake_up_bitJeff Layton2023-04-101-3/+1
| | | | | | | | | | Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | Merge branch 'for-next' of ↵Stephen Rothwell2023-04-111-2/+1
|\ \ | |/ |/| | | git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping.git
| * Merge branch 'vfs.misc.fixes' into for-nextChristian Brauner2023-03-312-2/+9
| |\ | | | | | | | | | Signed-off-by: Christian Brauner <brauner@kernel.org>
| * \ Merge branch 'fs.misc' into for-nextChristian Brauner2023-03-131-0/+2
| |\ \ | | | | | | | | | | | | Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
| * | | xattr: remove unused argumentChristian Brauner2023-03-061-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | his helpers is really just used to check for user.* xattr support so don't make it pointlessly generic. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
* | | | Merge tag 'nfsd-6.3-5' of ↵Linus Torvalds2023-04-043-9/+11
|\ \ \ \ | |_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Fix a crash and a resource leak in NFSv4 COMPOUND processing - Fix issues with AUTH_SYS credential handling - Try again to address an NFS/NFSD/SUNRPC build dependency regression * tag 'nfsd-6.3-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: NFSD: callback request does not use correct credential for AUTH_SYS NFS: Remove "select RPCSEC_GSS_KRB5 sunrpc: only free unix grouplist after RCU settles nfsd: call op_release, even when op_func returns an error NFSD: Avoid calling OPDESC() with ops->opnum == OP_ILLEGAL
| * | | NFSD: callback request does not use correct credential for AUTH_SYSDai Ngo2023-04-041-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently callback request does not use the credential specified in CREATE_SESSION if the security flavor for the back channel is AUTH_SYS. Problem was discovered by pynfs 4.1 DELEG5 and DELEG7 test with error: DELEG5 st_delegation.testCBSecParms : FAILURE expected callback with uid, gid == 17, 19, got 0, 0 Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Fixes: 8276c902bbe9 ("SUNRPC: remove uid and gid from struct auth_cred") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * | | nfsd: call op_release, even when op_func returns an errorJeff Layton2023-03-312-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For ops with "trivial" replies, nfsd4_encode_operation will shortcut most of the encoding work and skip to just marshalling up the status. One of the things it skips is calling op_release. This could cause a memory leak in the layoutget codepath if there is an error at an inopportune time. Have the compound processing engine always call op_release, even when op_func sets an error in op->status. With this change, we also need nfsd4_block_get_device_info_scsi to set the gd_device pointer to NULL on error to avoid a double free. Reported-by: Zhi Li <yieli@redhat.com> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2181403 Fixes: 34b1744c91cc ("nfsd4: define ->op_release for compound ops") Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * | | NFSD: Avoid calling OPDESC() with ops->opnum == OP_ILLEGALChuck Lever2023-03-311-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | OPDESC() simply indexes into nfsd4_ops[] by the op's operation number, without range checking that value. It assumes callers are careful to avoid calling it with an out-of-bounds opnum value. nfsd4_decode_compound() is not so careful, and can invoke OPDESC() with opnum set to OP_ILLEGAL, which is 10044 -- well beyond the end of nfsd4_ops[]. Reported-by: Jeff Layton <jlayton@kernel.org> Fixes: f4f9ef4a1b0a ("nfsd4: opdesc will be useful outside nfs4proc.c") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | | Merge tag 'nfsd-6.3-3' of ↵Linus Torvalds2023-03-212-2/+9
|\ \ \ \ | |/ / / | | | / | |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Fix a crash during NFS READs from certain client implementations - Address a minor kbuild regression in v6.3 * tag 'nfsd-6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: don't replace page in rq_pages if it's a continuation of last page NFS & NFSD: Update GSS dependencies
| * | nfsd: don't replace page in rq_pages if it's a continuation of last pageJeff Layton2023-03-171-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The splice read calls nfsd_splice_actor to put the pages containing file data into the svc_rqst->rq_pages array. It's possible however to get a splice result that only has a partial page at the end, if (e.g.) the filesystem hands back a short read that doesn't cover the whole page. nfsd_splice_actor will plop the partial page into its rq_pages array and return. Then later, when nfsd_splice_actor is called again, the remainder of the page may end up being filled out. At this point, nfsd_splice_actor will put the page into the array _again_ corrupting the reply. If this is done enough times, rq_next_page will overrun the array and corrupt the trailing fields -- the rq_respages and rq_next_page pointers themselves. If we've already added the page to the array in the last pass, don't add it to the array a second time when dealing with a splice continuation. This was originally handled properly in nfsd_splice_actor, but commit 91e23b1c3982 ("NFSD: Clean up nfsd_splice_actor()") removed the check for it. Fixes: 91e23b1c3982 ("NFSD: Clean up nfsd_splice_actor()") Cc: Al Viro <viro@zeniv.linux.org.uk> Reported-by: Dario Lesca <d.lesca@solinos.it> Tested-by: David Critch <dcritch@redhat.com> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2150630 Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * | NFS & NFSD: Update GSS dependenciesChuck Lever2023-03-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Geert reports that: > On v6.2, "make ARCH=m68k defconfig" gives you > CONFIG_RPCSEC_GSS_KRB5=m > On v6.3, it became builtin, due to dropping the dependencies on > the individual crypto modules. > > $ grep -E "CRYPTO_(MD5|DES|CBC|CTS|ECB|HMAC|SHA1|AES)" .config > CONFIG_CRYPTO_AES=y > CONFIG_CRYPTO_AES_TI=m > CONFIG_CRYPTO_DES=m > CONFIG_CRYPTO_CBC=m > CONFIG_CRYPTO_CTS=m > CONFIG_CRYPTO_ECB=m > CONFIG_CRYPTO_HMAC=m > CONFIG_CRYPTO_MD5=m > CONFIG_CRYPTO_SHA1=m This behavior is triggered by the "default y" in the definition of RPCSEC_GSS. The "default y" was added in 2010 by commit df486a25900f ("NFS: Fix the selection of security flavours in Kconfig"). However, svc_gss_principal was removed in 2012 by commit 03a4e1f6ddf2 ("nfsd4: move principal name into svc_cred"), so the 2010 fix is no longer necessary. We can safely change the NFS_V4 and NFSD_V4 dependencies back to RPCSEC_GSS_KRB5 to get the nicer v6.2 behavior back. Selecting KRB5 symbolically represents the true requirement here: that all spec-compliant NFSv4 implementations must have Kerberos available to use. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Fixes: dfe9a123451a ("SUNRPC: Enable rpcsec_gss_krb5.ko to be built without CRYPTO_DES") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | | Merge tag 'nfsd-6.3-2' of ↵Linus Torvalds2023-03-101-0/+2
|\ \ \ | |/ / | | / | |/ |/| | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Protect NFSD writes against filesystem freezing - Fix a potential memory leak during server shutdown * tag 'nfsd-6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: SUNRPC: Fix a server shutdown leak NFSD: Protect against filesystem freezing
| * NFSD: Protect against filesystem freezingChuck Lever2023-03-071-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Flole observes this WARNING on occasion: [1210423.486503] WARNING: CPU: 8 PID: 1524732 at fs/ext4/ext4_jbd2.c:75 ext4_journal_check_start+0x68/0xb0 Reported-by: <flole@flole.de> Suggested-by: Jan Kara <jack@suse.cz> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217123 Fixes: 73da852e3831 ("nfsd: use vfs_iter_read/write") Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
* | Merge tag 'nfsd-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds2023-02-2217-411/+286
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd updates from Chuck Lever: "Two significant security enhancements are part of this release: - NFSD's RPC header encoding and decoding, including RPCSEC GSS and gssproxy header parsing, has been overhauled to make it more memory-safe. - Support for Kerberos AES-SHA2-based encryption types has been added for both the NFS client and server. This provides a clean path for deprecating and removing insecure encryption types based on DES and SHA-1. AES-SHA2 is also FIPS-140 compliant, so that NFS with Kerberos may now be used on systems with fips enabled. In addition to these, NFSD is now able to handle crossing into an auto-mounted mount point on an exported NFS mount. A number of fixes have been made to NFSD's server-side copy implementation. RPC metrics have been converted to per-CPU variables. This helps reduce unnecessary cross-CPU and cross-node memory bus traffic, and significantly reduces noise when KCSAN is enabled" * tag 'nfsd-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (121 commits) NFSD: Clean up nfsd_symlink() NFSD: copy the whole verifier in nfsd_copy_write_verifier nfsd: don't fsync nfsd_files on last close SUNRPC: Fix occasional warning when destroying gss_krb5_enctypes nfsd: fix courtesy client with deny mode handling in nfs4_upgrade_open NFSD: fix problems with cleanup on errors in nfsd4_copy nfsd: fix race to check ls_layouts nfsd: don't hand out delegation on setuid files being opened for write SUNRPC: Remove ->xpo_secure_port() SUNRPC: Clean up the svc_xprt_flags() macro nfsd: remove fs/nfsd/fault_inject.c NFSD: fix leaked reference count of nfsd4_ssc_umount_item nfsd: clean up potential nfsd_file refcount leaks in COPY codepath nfsd: zero out pointers after putting nfsd_files on COPY setup error SUNRPC: Fix whitespace damage in svcauth_unix.c nfsd: eliminate __nfs4_get_fd nfsd: add some kerneldoc comments for stateid preprocessing functions nfsd: eliminate find_deleg_file_locked nfsd: don't take nfsd4_copy ref for OP_OFFLOAD_STATUS SUNRPC: Add encryption self-tests ...
| * NFSD: Clean up nfsd_symlink()Chuck Lever2023-02-201-8/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The pointer dentry is assigned a value that is never read, the assignment is redundant and can be removed. Cleans up clang-scan warning: fs/nfsd/nfsctl.c:1231:2: warning: Value stored to 'dentry' is never read [deadcode.DeadStores] dentry = ERR_PTR(ret); No need to initialize "int ret = -ENOMEM;" either. These are vestiges of nfsd_mkdir(), from whence I copied nfsd_symlink(). Reported-by: Colin Ian King <colin.i.king@gmail.com> Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * NFSD: copy the whole verifier in nfsd_copy_write_verifierChuck Lever2023-02-201-1/+1
| | | | | | | | | | | | | | | | | | Currently, we're only memcpy'ing the first __be32. Ensure we copy into both words. Fixes: 91d2e9b56cf5 ("NFSD: Clean up the nfsd_net::nfssvc_boot field") Reported-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * nfsd: don't fsync nfsd_files on last closeJeff Layton2023-02-202-63/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Most of the time, NFSv4 clients issue a COMMIT before the final CLOSE of an open stateid, so with NFSv4, the fsync in the nfsd_file_free path is usually a no-op and doesn't block. We have a customer running knfsd over very slow storage (XFS over Ceph RBD). They were using the "async" export option because performance was more important than data integrity for this application. That export option turns NFSv4 COMMIT calls into no-ops. Due to the fsync in this codepath however, their final CLOSE calls would still stall (since a CLOSE effectively became a COMMIT). I think this fsync is not strictly necessary. We only use that result to reset the write verifier. Instead of fsync'ing all of the data when we free an nfsd_file, we can just check for writeback errors when one is acquired and when it is freed. If the client never comes back, then it'll never see the error anyway and there is no point in resetting it. If an error occurs after the nfsd_file is removed from the cache but before the inode is evicted, then it will reset the write verifier on the next nfsd_file_acquire, (since there will be an unseen error). The only exception here is if something else opens and fsyncs the file during that window. Given that local applications work with this limitation today, I don't see that as an issue. Link: https://bugzilla.redhat.com/show_bug.cgi?id=2166658 Fixes: ac3a2585f018 ("nfsd: rework refcounting in filecache") Reported-and-tested-by: Pierguido Lambri <plambri@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * nfsd: fix courtesy client with deny mode handling in nfs4_upgrade_openJeff Layton2023-02-201-10/+11
| | | | | | | | | | | | | | | | | | | | | | | | The nested if statements here make no sense, as you can never reach "else" branch in the nested statement. Fix the error handling for when there is a courtesy client that holds a conflicting deny mode. Fixes: 3d6942715180 ("NFSD: add support for share reservation conflict to courteous server") Reported-by: 張智諺 <cc85nod@gmail.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * NFSD: fix problems with cleanup on errors in nfsd4_copyDai Ngo2023-02-202-6/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When nfsd4_copy fails to allocate memory for async_copy->cp_src, or nfs4_init_copy_state fails, it calls cleanup_async_copy to do the cleanup for the async_copy which causes page fault since async_copy is not yet initialized. This patche rearranges the order of initializing the fields in async_copy and adds checks in cleanup_async_copy to skip un-initialized fields. Fixes: ce0887ac96d3 ("NFSD add nfs4 inter ssc to nfsd4_copy") Fixes: 87689df69491 ("NFSD: Shrink size of struct nfsd4_copy") Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * nfsd: fix race to check ls_layoutsBenjamin Coddington2023-02-201-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Its possible for __break_lease to find the layout's lease before we've added the layout to the owner's ls_layouts list. In that case, setting ls_recalled = true without actually recalling the layout will cause the server to never send a recall callback. Move the check for ls_layouts before setting ls_recalled. Fixes: c5c707f96fc9 ("nfsd: implement pNFS layout recalls") Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * nfsd: don't hand out delegation on setuid files being opened for writeJeff Layton2023-02-201-0/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We had a bug report that xfstest generic/355 was failing on NFSv4.0. This test sets various combinations of setuid/setgid modes and tests whether DIO writes will cause them to be stripped. What I found was that the server did properly strip those bits, but the client didn't notice because it held a delegation that was not recalled. The recall didn't occur because the client itself was the one generating the activity and we avoid recalls in that case. Clearing setuid bits is an "implicit" activity. The client didn't specifically request that we do that, so we need the server to issue a CB_RECALL, or avoid the situation entirely by not issuing a delegation. The easiest fix here is to simply not give out a delegation if the file is being opened for write, and the mode has the setuid and/or setgid bit set. Note that there is a potential race between the mode and lease being set, so we test for this condition both before and after setting the lease. This patch fixes generic/355, generic/683 and generic/684 for me. (Note that 355 fails only on v4.0, and 683 and 684 require NFSv4.2 to run and fail). Reported-by: Boyang Xue <bxue@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * nfsd: remove fs/nfsd/fault_inject.cJeff Layton2023-02-201-142/+0
| | | | | | | | | | | | | | This file is no longer built at all. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * NFSD: fix leaked reference count of nfsd4_ssc_umount_itemDai Ngo2023-02-201-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The reference count of nfsd4_ssc_umount_item is not decremented on error conditions. This prevents the laundromat from unmounting the vfsmount of the source file. This patch decrements the reference count of nfsd4_ssc_umount_item on error. Fixes: f4e44b393389 ("NFSD: delay unmount source's export after inter-server copy completed.") Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * nfsd: clean up potential nfsd_file refcount leaks in COPY codepathJeff Layton2023-02-201-13/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are two different flavors of the nfsd4_copy struct. One is embedded in the compound and is used directly in synchronous copies. The other is dynamically allocated, refcounted and tracked in the client struture. For the embedded one, the cleanup just involves releasing any nfsd_files held on its behalf. For the async one, the cleanup is a bit more involved, and we need to dequeue it from lists, unhash it, etc. There is at least one potential refcount leak in this code now. If the kthread_create call fails, then both the src and dst nfsd_files in the original nfsd4_copy object are leaked. The cleanup in this codepath is also sort of weird. In the async copy case, we'll have up to four nfsd_file references (src and dst for both flavors of copy structure). They are both put at the end of nfsd4_do_async_copy, even though the ones held on behalf of the embedded one outlive that structure. Change it so that we always clean up the nfsd_file refs held by the embedded copy structure before nfsd4_copy returns. Rework cleanup_async_copy to handle both inter and intra copies. Eliminate nfsd4_cleanup_intra_ssc since it now becomes a no-op. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * nfsd: zero out pointers after putting nfsd_files on COPY setup errorJeff Layton2023-02-201-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | At first, I thought this might be a source of nfsd_file overputs, but the current callers seem to avoid an extra put when nfsd4_verify_copy returns an error. Still, it's "bad form" to leave the pointers filled out when we don't have a reference to them anymore, and that might lead to bugs later. Zero them out as a defensive coding measure. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * nfsd: eliminate __nfs4_get_fdJeff Layton2023-02-201-13/+7
| | | | | | | | | | | | | | This is wrapper is pointless, and just obscures what's going on. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * nfsd: add some kerneldoc comments for stateid preprocessing functionsJeff Layton2023-02-201-4/+25
| | | | | | | | | | Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * nfsd: eliminate find_deleg_file_lockedJeff Layton2023-02-201-10/+1
| | | | | | | | | | | | | | We really don't need an accessor function here. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * nfsd: don't take nfsd4_copy ref for OP_OFFLOAD_STATUSJeff Layton2023-02-202-13/+24
| | | | | | | | | | | | | | | | | | | | | | | | We're not doing any blocking operations for OP_OFFLOAD_STATUS, so taking and putting a reference is a waste of effort. Take the client lock, search for the copy and fetch the wr_bytes_written field and return. Also, make find_async_copy a static function. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * NFSD: Replace /proc/fs/nfsd/supported_krb5_enctypes with a symlinkChuck Lever2023-02-201-16/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that I've added a file under /proc/net/rpc that is managed by the SunRPC's Kerberos mechanism, replace NFSD's supported_krb5_enctypes file with a symlink to the new SunRPC proc file, which contains exactly the same content. Remarkably, commit b0b0c0a26e84 ("nfsd: add proc file listing kernel's gss_krb5 enctypes") added the nfsd_supported_krb5_enctypes file in 2011, but this file has never been documented in nfsd(7). Tested-by: Scott Mayhew <smayhew@redhat.com> Reviewed-by: Simo Sorce <simo@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * SUNRPC: Use per-CPU counters to tally server RPC countsChuck Lever2023-02-205-12/+16
| | | | | | | | | | | | | | - Improves counting accuracy - Reduces cross-CPU memory traffic Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * nfsd: move reply cache initialization into nfsd startupJeff Layton2023-02-202-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | There's no need to start the reply cache before nfsd is up and running, and doing so means that we register a shrinker for every net namespace instead of just the ones where nfsd is running. Move it to the per-net nfsd startup instead. Reported-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * SUNRPC: Refactor RPC server dispatch methodChuck Lever2023-02-203-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, svcauth_gss_accept() pre-reserves response buffer space for the RPC payload length and GSS sequence number before returning to the dispatcher, which then adds the header's accept_stat field. The problem is the accept_stat field is supposed to go before the length and seq_num fields. So svcauth_gss_release() has to relocate the accept_stat value (see svcauth_gss_prepare_to_wrap()). To enable these fields to be added to the response buffer in the correct (final) order, the pointer to the accept_stat has to be made available to svcauth_gss_accept() so that it can set it before reserving space for the length and seq_num fields. As a first step, move the pointer to the location of the accept_stat field into struct svc_rqst. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * SUNRPC: Push svcxdr_init_encode() into svc_process_common()Chuck Lever2023-02-202-7/+1
| | | | | | | | | | | | | | | | Now that all vs_dispatch functions invoke svcxdr_init_encode(), it is common code and can be pushed down into the generic RPC server. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * nfsd: fix potential race in nfs4_find_fileJeff Layton2023-02-201-6/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | The WARN_ON_ONCE check is not terribly useful. It also seems possible for nfs4_find_file to race with the destruction of an fi_deleg_file while trying to take a reference to it. Now that it's safe to pass nfs_get_file a NULL pointer, remove the WARN and NULL pointer check. Take the fi_lock when fetching fi_deleg_file. Cc: NeilBrown <neilb@suse.de> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * nfsd: allow nfsd_file_get to sanely handle a NULL pointerJeff Layton2023-02-202-6/+3
| | | | | | | | | | | | | | | | ...and remove some now-useless NULL pointer checks in its callers. Suggested-by: NeilBrown <neilb@suse.de> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| * NFSD: enhance inter-server copy cleanupDai Ngo2023-02-202-68/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently nfsd4_setup_inter_ssc returns the vfsmount of the source server's export when the mount completes. After the copy is done nfsd4_cleanup_inter_ssc is called with the vfsmount of the source server and it searches nfsd_ssc_mount_list for a matching entry to do the clean up. The problems with this approach are (1) the need to search the nfsd_ssc_mount_list and (2) the code has to handle the case where the matching entry is not found which looks ugly. The enhancement is instead of nfsd4_setup_inter_ssc returning the vfsmount, it returns the nfsd4_ssc_umount_item which has the vfsmount embedded in it. When nfsd4_cleanup_inter_ssc is called it's passed with the nfsd4_ssc_umount_item directly to do the clean up so no searching is needed and there is no need to handle the 'not found' case. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> [ cel: adjusted whitespace and variable/function names ] Reviewed-by: Olga Kornievskaia <kolga@netapp.com>