summaryrefslogtreecommitdiff
path: root/fs/gfs2
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'akpm-current/current'Stephen Rothwell2021-04-131-2/+1
|\
| * mm: introduce and use mapping_emptyMatthew Wilcox (Oracle)2021-04-121-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "Remove nrexceptional tracking", v2. We actually use nrexceptional for very little these days. It's a minor pain to keep in sync with nrpages, but the pain becomes much bigger with the THP patches because we don't know how many indices a shadow entry occupies. It's easier to just remove it than keep it accurate. Also, we save 8 bytes per inode which is nothing to sneeze at; on my laptop, it would improve shmem_inode_cache from 22 to 23 objects per 16kB, and inode_cache from 26 to 27 objects. Combined, that saves a megabyte of memory from a combined usage of 25MB for both caches. Unfortunately, ext4 doesn't cross a magic boundary, so it doesn't save any memory for ext4. This patch (of 4): Instead of checking the two counters (nrpages and nrexceptional), we can just check whether i_pages is empty. Link: https://lkml.kernel.org/r/20201026151849.24232-1-willy@infradead.org Link: https://lkml.kernel.org/r/20201026151849.24232-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Vishal Verma <vishal.l.verma@intel.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
* | Merge remote-tracking branch 'kspp/for-next/kspp'Stephen Rothwell2021-04-133-3/+5
|\ \
| * | treewide: Change list_sort to use const pointersSami Tolvanen2021-04-083-3/+5
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | list_sort() internally casts the comparison function passed to it to a different type with constant struct list_head pointers, and uses this pointer to call the functions, which trips indirect call Control-Flow Integrity (CFI) checking. Instead of removing the consts, this change defines the list_cmp_func_t type and changes the comparison function types of all list_sort() callers to use const pointers, thus avoiding type mismatches. Suggested-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210408182843.1754385-10-samitolvanen@google.com
* | Merge remote-tracking branch 'gfs2/for-next'Stephen Rothwell2021-04-1323-243/+309
|\ \
| * | gfs2: Fix a number of kernel-doc warningsLee Jones2021-04-0917-113/+102
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Building the kernel with W=1 results in a number of kernel-doc warnings like incorrect function names and parameter descriptions. Fix those, mostly by adding missing parameter descriptions, removing left-over descriptions, and demoting some less important kernel-doc comments into regular comments. Originally proposed by Lee Jones; improved and combined into a single patch by Andreas. Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | gfs2: Make gfs2_setattr_simple staticAndreas Gruenbacher2021-04-082-2/+1
| | | | | | | | | | | | | | | | | | This function is only used in inode.c. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | gfs2: Add new sysfs file for gfs2 statusBob Peterson2021-04-081-0/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a new file: /sys/fs/gfs2/*/status which will report the status of the file system. Catting this file dumps the current status of the file system according to various superblock variables. For example: Journal Checked: 1 Journal Live: 1 Journal ID: 0 Spectator: 0 Withdrawn: 0 No barriers: 0 No recovery: 0 Demote: 0 No Journal ID: 1 Mounted RO: 0 RO Recovery: 0 Skip DLM Unlock: 0 Force AIL Flush: 0 FS Frozen: 0 Withdrawing: 0 Withdraw In Prog: 0 Remote Withdraw: 0 Withdraw Recovery: 0 sd_log_error: 0 sd_log_flush_lock: 0 sd_log_num_revoke: 0 sd_log_in_flight: 0 sd_log_blks_needed: 0 sd_log_blks_free: 32768 sd_log_flush_head: 0 sd_log_flush_tail: 5384 sd_log_blks_reserved: 0 sd_log_revokes_available: 503 Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | gfs2: Silence possible null pointer dereference warningAndreas Gruenbacher2021-04-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In gfs2_rbm_find, rs is always NULL when minext is NULL, so gfs2_reservation_check_and_update will never be called on a NULL minext. This isn't innediately obvious though, so also check for a NULL minext for better code readability. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | gfs2: Turn gfs2_meta_indirect_buffer into gfs2_meta_bufferAndreas Gruenbacher2021-04-033-9/+8
| | | | | | | | | | | | | | | | | | | | | | | | Instead of only supporting GFS2_METATYPE_DI and GFS2_METATYPE_IN blocks, make the block type a parameter of gfs2_meta_indirect_buffer and rename the function to gfs2_meta_buffer. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | gfs2: Replace gfs2_lblk_to_dblk with gfs2_get_extentAndreas Gruenbacher2021-04-033-26/+5
| | | | | | | | | | | | | | | | | | | | | We don't need two very similar functions for mapping logical blocks to physical blocks. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | gfs2: Turn gfs2_extent_map into gfs2_{get,alloc}_extentAndreas Gruenbacher2021-04-035-33/+53
| | | | | | | | | | | | | | | | | | | | | | | | Convert gfs2_extent_map to iomap and split it into gfs2_get_extent and gfs2_alloc_extent. Instead of hardcoding the extent size, pass it in via the extlen parameter. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | gfs2: Add new gfs2_iomap_get helperAndreas Gruenbacher2021-04-033-35/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | Rename the current gfs2_iomap_get and gfs2_iomap_alloc functions to __*. Add a new gfs2_iomap_get helper that doesn't expose struct metapath. Rename gfs2_iomap_get_alloc to gfs2_iomap_alloc. Use the new helpers where they make sense. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | gfs2: Remove unused variable sb_formatAndreas Gruenbacher2021-04-032-2/+0
| | | | | | | | | | | | Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | gfs2: Fix dir.c function parameter descriptionsBob Peterson2021-04-031-17/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch simply fixes a bunch of function parameter comments in dir.c that were reported by the kernel test robot. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | gfs2: Eliminate gh parameter from go_xmote_bh funcBob Peterson2021-04-033-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | The only glock that uses go_xmote_bh glops function is the freeze glock which uses freeze_go_xmote_bh. It does not use its gh parameter, so this patch eliminates the unneeded parameter. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | gfs2: don't create empty buffers for NO_CREATEBob Peterson2021-04-031-3/+7
| |/ | | | | | | | | | | | | | | | | | | | | | | Before this patch, function gfs2_getbuf would create empty buffers when it was given the NO_CREATE directive from gfs2_journal_wipe. This is a waste of time: the buffer_head is only used by gfs2_remove_from_journal to determine if the buffer is pinned (which it won't be if it's newly created) and if there's an associated bd element (same story). This patch removes the useless buffer assignment. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* | Merge remote-tracking branch 'ovl/fileattr_v5' into for-nextAl Viro2021-04-0810-68/+59
|\ \
| * | gfs2: convert to fileattrMiklos Szeredi2021-04-073-43/+27
| |/ | | | | | | | | | | | | | | Use the fileattr API to let the VFS handle locking, permission checking and conversion. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: Andreas Gruenbacher <agruenba@redhat.com>
| * Merge tag 'gfs2-v5.12-rc2-fixes2' of ↵Linus Torvalds2021-04-031-5/+9
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 fixes from Andreas Gruenbacher: "Two more gfs2 fixes" * tag 'gfs2-v5.12-rc2-fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: report "already frozen/thawed" errors gfs2: Flag a withdraw if init_threads() fails
| | * gfs2: report "already frozen/thawed" errorsBob Peterson2021-03-251-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before this patch, gfs2's freeze function failed to report an error when the target file system was already frozen as it should (and as generic vfs function freeze_super does. Similarly, gfs2's thaw function failed to report an error when trying to thaw a file system that is not frozen, as vfs function thaw_super does. The errors were checked, but it always returned a 0 return code. This patch adds the missing error return codes to gfs2 freeze and thaw. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | * gfs2: Flag a withdraw if init_threads() failsAndrew Price2021-03-151-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Interrupting mount with ^C quickly enough can cause the kthread_run() calls in gfs2's init_threads() to fail and the error path leads to a deadlock on the s_umount rwsem. The abridged chain of events is: [mount path] get_tree_bdev() sget_fc() alloc_super() down_write_nested(&s->s_umount, SINGLE_DEPTH_NESTING); [acquired] gfs2_fill_super() gfs2_make_fs_rw() init_threads() kthread_run() ( Interrupted ) [Error path] gfs2_gl_hash_clear() flush_workqueue(glock_workqueue) wait_for_completion() [workqueue context] glock_work_func() run_queue() do_xmote() freeze_go_sync() freeze_super() down_write(&sb->s_umount) [deadlock] In freeze_go_sync() there is a gfs2_withdrawn() check that we can use to make sure freeze_super() is not called in the error path, so add a gfs2_withdraw_delayed() call when init_threads() fails. Ref: https://bugzilla.kernel.org/show_bug.cgi?id=212231 Reported-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | Merge tag 'block-5.12-2021-03-12-v2' of git://git.kernel.dk/linux-blockLinus Torvalds2021-03-121-1/+1
| |\ \ | | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block fixes from Jens Axboe: "Mostly just random fixes all over the map. The only odd-one-out change is finally getting the rename of BIO_MAX_PAGES to BIO_MAX_VECS done. This should've been done with the multipage bvec change, but it's been left. Do it now to avoid hassles around changes piling up for the next merge window. Summary: - NVMe pull request: - one more quirk (Dmitry Monakhov) - fix max_zone_append_sectors initialization (Chaitanya Kulkarni) - nvme-fc reset/create race fix (James Smart) - fix status code on aborts/resets (Hannes Reinecke) - fix the CSS check for ZNS namespaces (Chaitanya Kulkarni) - fix a use after free in a debug printk in nvme-rdma (Lv Yunlong) - Follow-up NVMe error fix for NULL 'id' (Christoph) - Fixup for the bd_size_lock being IRQ safe, now that the offending driver has been dropped (Damien). - rsxx probe failure error return (Jia-Ju) - umem probe failure error return (Wei) - s390/dasd unbind fixes (Stefan) - blk-cgroup stats summing fix (Xunlei) - zone reset handling fix (Damien) - Rename BIO_MAX_PAGES to BIO_MAX_VECS (Christoph) - Suppress uevent trigger for hidden devices (Daniel) - Fix handling of discard on busy device (Jan) - Fix stale cache issue with zone reset (Shin'ichiro)" * tag 'block-5.12-2021-03-12-v2' of git://git.kernel.dk/linux-block: nvme: fix the nsid value to print in nvme_validate_or_alloc_ns block: Discard page cache of zone reset target range block: Suppress uevent for hidden device when removed block: rename BIO_MAX_PAGES to BIO_MAX_VECS nvme-pci: add the DISABLE_WRITE_ZEROES quirk for a Samsung PM1725a nvme-rdma: Fix a use after free in nvmet_rdma_write_data_done nvme-core: check ctrl css before setting up zns nvme-fc: fix racing controller reset and create association nvme-fc: return NVME_SC_HOST_ABORTED_CMD when a command has been aborted nvme-fc: set NVME_REQ_CANCELLED in nvme_fc_terminate_exchange() nvme: add NVME_REQ_CANCELLED flag in nvme_cancel_request() nvme: simplify error logic in nvme_validate_ns() nvme: set max_zone_append_sectors nvme_revalidate_zones block: rsxx: fix error return code of rsxx_pci_probe() block: Fix REQ_OP_ZONE_RESET_ALL handling umem: fix error return code in mm_pci_probe() blk-cgroup: Fix the recursive blkg rwstat s390/dasd: fix hanging IO request during DASD driver unbind s390/dasd: fix hanging DASD driver unbind block: Try to handle busy underlying device on discard
| | * block: rename BIO_MAX_PAGES to BIO_MAX_VECSChristoph Hellwig2021-03-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ever since the addition of multipage bio_vecs BIO_MAX_PAGES has been horribly confusingly misnamed. Rename it to BIO_MAX_VECS to stop confusing users of the bio API. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20210311110137.1132391-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | gfs2: bypass log flush if the journal is not liveBob Peterson2021-03-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch fe3e397668775 ("gfs2: Rework the log space allocation logic") changed gfs2_log_flush to reserve a set of journal blocks in case no transaction is active. However, gfs2_log_flush also gets called in cases where we don't have an active journal, for example, for spectator mounts. In that case, trying to reserve blocks would sleep forever, but we want gfs2_log_flush to be a no-op instead. Fixes: fe3e397668775 ("gfs2: Rework the log space allocation logic") Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | gfs2: bypass signal_our_withdraw if no journalBob Peterson2021-03-121-5/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before this patch, function signal_our_withdraw referenced the journal inode immediately. But corrupt file systems may have some invalid journals, in which case our attempt to read it in will withdraw and the resulting signal_our_withdraw would dereference the NULL value. This patch adds a check to signal_our_withdraw so that if the journal has not yet been initialized, it simply returns and does the old-style withdraw. Thanks, Andy Price, for his analysis. Reported-by: syzbot+50a8a9cf8127f2c6f5df@syzkaller.appspotmail.com Fixes: 601ef0d52e96 ("gfs2: Force withdraw to replay journals and wait for it to finish") Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | gfs2: fix use-after-free in trans_drainBob Peterson2021-03-072-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds code to function trans_drain to remove drained bd elements from the ail lists, if queued, before freeing the bd. If we don't remove the bd from the ail, function ail_drain will try to reference the bd after it has been freed by trans_drain. Thanks to Andy Price for his analysis of the problem. Reported-by: Andy Price <anprice@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * | gfs2: make function gfs2_make_fs_ro() to void typeYang Li2021-03-074-13/+5
| |/ | | | | | | | | | | | | | | | | | | | | It fixes the following warning detected by coccinelle: ./fs/gfs2/super.c:592:5-10: Unneeded variable: "error". Return "0" on line 628 Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* | gfs2: be careful with inode refreshAl Viro2021-03-121-8/+14
|/ | | | | | | | | | | | 1) gfs2_dinode_in() should *not* touch ->i_rdev on live inodes; even "zero and immediately reread the same value from dinode" is broken - have it overlap with ->release() of char device and you can get all kinds of bogus behaviour. 2) mismatch on inode type on live inodes should be treated as fs corruption rather than blindly setting ->i_mode. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge branch 'work.misc' of ↵Linus Torvalds2021-02-271-2/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "Assorted stuff pile - no common topic here" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: whack-a-mole: don't open-code iminor/imajor 9p: fix misuse of sscanf() in v9fs_stat2inode() audit_alloc_mark(): don't open-code ERR_CAST() fs/inode.c: make inode_init_always() initialize i_ino to 0 vfs: don't unnecessarily clone write access for writable fds
| * whack-a-mole: don't open-code iminor/imajorAl Viro2021-02-231-2/+2
| | | | | | | | | | | | several instances creeped back into the tree... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Merge tag 'gfs2-for-5.12' of ↵Linus Torvalds2021-02-2325-660/+964
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Log space and revoke accounting rework to fix some failed asserts. - Local resource group glock sharing for better local performance. - Add support for version 1802 filesystems: trusted xattr support and '-o rgrplvb' mounts by default. - Actually synchronize on the inode glock's FREEING bit during withdraw ("gfs2: fix glock confusion in function signal_our_withdraw"). - Fix parallel recovery of multiple journals ("gfs2: keep bios separate for each journal"). - Various other bug fixes. * tag 'gfs2-for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (49 commits) gfs2: Don't get stuck with I/O plugged in gfs2_ail1_flush gfs2: Per-revoke accounting in transactions gfs2: Rework the log space allocation logic gfs2: Minor calc_reserved cleanup gfs2: Use resource group glock sharing gfs2: Allow node-wide exclusive glock sharing gfs2: Add local resource group locking gfs2: Add per-reservation reserved block accounting gfs2: Rename rs_{free -> requested} and rd_{reserved -> requested} gfs2: Check for active reservation in gfs2_release gfs2: Don't search for unreserved space twice gfs2: Only pass reservation down to gfs2_rbm_find gfs2: Also reflect single-block allocations in rgd->rd_extfail_pt gfs2: Recursive gfs2_quota_hold in gfs2_iomap_end gfs2: Add trusted xattr support gfs2: Enable rgrplvb for sb_fs_format 1802 gfs2: Don't skip dlm unlock if glock has an lvb gfs2: Lock imbalance on error path in gfs2_recover_one gfs2: Move function gfs2_ail_empty_tr gfs2: Get rid of current_tail() ...
| * | gfs2: Don't get stuck with I/O plugged in gfs2_ail1_flushBob Peterson2021-02-231-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In gfs2_ail1_flush, we're using I/O plugging to give the block layer a better chance of merging I/O requests. If we're too aggressive here, we can end up waiting on I/O to complete while still plugged. Fix that in a way similar to writeback_sb_inodes, except that we can't use blk_flush_plug because blk_flush_plug_list is not exported. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | |
| | \
| *-. \ Merge branches 'rgrp-glock-sharing' and 'gfs2-revoke' from ↵Andreas Gruenbacher2021-02-2319-549/+777
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git Merge the resource group glock sharing feature and the revoke accounting rework. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | | * | gfs2: Per-revoke accounting in transactionsAndreas Gruenbacher2021-02-227-42/+131
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the log, revokes are stored as a revoke descriptor (struct gfs2_log_descriptor), followed by zero or more additional revoke blocks (struct gfs2_meta_header). On filesystems with a blocksize of 4k, the revoke descriptor contains up to 503 revokes, and the metadata blocks contain up to 509 revokes each. We've so far been reserving space for revokes in transactions in block granularity, so a lot more space than necessary was being allocated and then released again. This patch switches to assigning revokes to transactions individually instead. Initially, space for the revoke descriptor is reserved and handed out to transactions. When more revokes than that are reserved, additional revoke blocks are added. When the log is flushed, the space for the additional revoke blocks is released, but we keep the space for the revoke descriptor block allocated. Transactions may still reserve more revokes than they will actually need in the end, but now we won't overshoot the target as much, and by only returning the space for excess revokes at log flush time, we further reduce the amount of contention between processes. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | | * | gfs2: Rework the log space allocation logicAndreas Gruenbacher2021-02-223-69/+104
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current log space allocation logic is hard to understand or extend. The principle it that when the log is flushed, we may or may not have a transaction active that has space allocated in the log. To deal with that, we set aside a magical number of blocks to be used in case we don't have an active transaction. It isn't clear that the pool will always be big enough. In addition, we can't return unused log space at the end of a transaction, so the number of blocks allocated must exactly match the number of blocks used. Simplify this as follows: * When transactions are allocated or merged, always reserve enough blocks to flush the transaction (err on the safe side). * In gfs2_log_flush, return any allocated blocks that haven't been used. * Maintain a pool of spare blocks big enough to do one log flush, as before. * In gfs2_log_flush, when we have no active transaction, allocate a suitable number of blocks. For that, use the spare pool when called from logd, and leave the pool alone otherwise. This means that when the log is almost full, logd will still be able to do one more log flush, which will result in more log space becoming available. This will make the log space allocator code easier to work with in the future. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | | * | gfs2: Minor calc_reserved cleanupAndreas Gruenbacher2021-02-221-8/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | No functional change. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | | * | gfs2: Move function gfs2_ail_empty_trAndreas Gruenbacher2021-02-031-17/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move this function further up in log.c so that we can use it in the next patch. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | | * | gfs2: Get rid of current_tail()Andreas Gruenbacher2021-02-033-37/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Keep the current value of the updated log tail in the super block as sb_log_flush_tail instead of computing it on the fly. This avoids unnecessary sd_ail_lock taking and cleans up the code. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | | * | gfs2: Use a tighter bound in gfs2_trans_beginAndreas Gruenbacher2021-02-031-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use a tighter bound for the number of blocks required by transactions in gfs2_trans_begin: in the worst case, we'll have mixed data and metadata, so we'll need a log desciptor for each type. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | | * | gfs2: Clean up gfs2_log_reserveAndreas Gruenbacher2021-02-032-35/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Wake up log waiters in gfs2_log_release when log space has actually become available. This is a much better place for the wakeup than gfs2_logd. Check if enough log space is immeditely available before anything else. If there isn't, use io_wait_event to wait instead of open-coding it. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | | * | gfs2: Don't wait for journal flush in clean_journalAndreas Gruenbacher2021-02-031-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 588bff95c94e added gfs2_write_log_header() and started using it in clean_journal(), with an additional call to log_flush_wait() at the end of gfs2_write_log_header() which is unnecessary for clean_journal(). Move that call out of gfs2_write_log_header() to restore the previous behavior. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | | * | gfs2: Move lock flush locking to gfs2_trans_{begin,end}Andreas Gruenbacher2021-02-033-33/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the read locking of sd_log_flush_lock from gfs2_log_reserve to gfs2_trans_begin, and its unlocking from gfs2_log_release to gfs2_trans_end. Use gfs2_log_release in two places in which it was open coded before. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | | * | gfs2: Get rid of sd_reserving_logAndreas Gruenbacher2021-02-036-18/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This counter and the associated wait queue are only used so that gfs2_make_fs_ro can efficiently wait for all pending log space allocations to fail after setting the filesystem to read-only. This comes at the cost of waking up that wait queue very frequently. Instead, when gfs2_log_reserve fails because the filesystem has become read-only, Wake up sd_log_waitq. In gfs2_make_fs_ro, set the file system read-only and then wait until all the log space has been released. Give up and report the problem after a while. With that, sd_reserving_log and sd_reserving_log_wait can be removed. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | | * | gfs2: Clean up on-stack transactionsAndreas Gruenbacher2021-02-035-42/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace the TR_ALLOCED flag by its inverse, TR_ONSTACK: that way, the flag only needs to be set in the exceptional case of on-stack transactions. Split off __gfs2_trans_begin from gfs2_trans_begin and use it to replace the open-coded version in gfs2_ail_empty_gl. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | | * | gfs2: Use sb_start_intwrite in gfs2_ail_empty_glAndreas Gruenbacher2021-02-032-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 2e60d7683c8d ("GFS2: update freeze code to use freeze/thaw_super on all nodes") optimized away the sb_start_intwrite ... sb_end_intwrite protection for the on-stack transactions in gfs2_ail_empty_gl with no explanation. I can't think of a valid reason for doing that, so revert that change. This simplifies the next commit. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | | * | gfs2: Clean up ail2_emptyAndreas Gruenbacher2021-01-191-17/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clean up the logic in ail2_empty (no functional change). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | | * | gfs2: Rename gfs2_{write => flush}_revokesAndreas Gruenbacher2021-01-193-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Function gfs2_write_revokes doesn't actually write any revokes; instead, it adds revokes to the system transaction during a flush. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | | * | gfs2: Minor debugging improvementAndreas Gruenbacher2021-01-191-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Split the assert in gfs2_trans_end into two parts. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| | | * | gfs2: Some documentation updatesAndreas Gruenbacher2021-01-191-13/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The calc_reserved description claims that buf_limit is 502 (on 4k filesystems), but it is actually 503. Fix / clarify the entire description. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>