| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
With an -rt kernel, and a heavy sync IO load, tasks can jam
up on journal locks without unplugging, which can lead to
terminal IO starvation. Unplug and schedule when waiting for space.
Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Theodore Tso <tytso@mit.edu>
Link: http://lkml.kernel.org/r/1341812414.7370.73.camel@marge.simpson.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
|
|
|
|
|
|
|
| |
Retry loops on RT might loop forever when the modifying side was
preempted. Use cpu_chill() instead of cpu_relax() to let the system
make progress.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
|
|
|
|
| |
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On Sat, 2007-10-27 at 11:44 +0200, Ingo Molnar wrote:
> * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>
> > > [10138.175796] [<c0105de3>] show_trace+0x12/0x14
> > > [10138.180291] [<c0105dfb>] dump_stack+0x16/0x18
> > > [10138.184769] [<c011609f>] native_smp_call_function_mask+0x138/0x13d
> > > [10138.191117] [<c0117606>] smp_call_function+0x1e/0x24
> > > [10138.196210] [<c012f85c>] on_each_cpu+0x25/0x50
> > > [10138.200807] [<c0115c74>] flush_tlb_all+0x1e/0x20
> > > [10138.205553] [<c016caaf>] kmap_high+0x1b6/0x417
> > > [10138.210118] [<c011ec88>] kmap+0x4d/0x4f
> > > [10138.214102] [<c026a9d8>] ntfs_end_buffer_async_read+0x228/0x2f9
> > > [10138.220163] [<c01a0e9e>] end_bio_bh_io_sync+0x26/0x3f
> > > [10138.225352] [<c01a2b09>] bio_endio+0x42/0x6d
> > > [10138.229769] [<c02c2a08>] __end_that_request_first+0x115/0x4ac
> > > [10138.235682] [<c02c2da7>] end_that_request_chunk+0x8/0xa
> > > [10138.241052] [<c0365943>] ide_end_request+0x55/0x10a
> > > [10138.246058] [<c036dae3>] ide_dma_intr+0x6f/0xac
> > > [10138.250727] [<c0366d83>] ide_intr+0x93/0x1e0
> > > [10138.255125] [<c015afb4>] handle_IRQ_event+0x5c/0xc9
> >
> > Looks like ntfs is kmap()ing from interrupt context. Should be using
> > kmap_atomic instead, I think.
>
> it's not atomic interrupt context but irq thread context - and -rt
> remaps kmap_atomic() to kmap() internally.
Hm. Looking at the change to mm/bounce.c, perhaps I should do this
instead?
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
|
|
| |
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Otherwise there will be warning on ARM like below:
WARNING: at build/linux/kernel/smp.c:459 smp_call_function_many+0x98/0x264()
Modules linked in:
[<c0013bb4>] (unwind_backtrace+0x0/0xe4) from [<c001be94>] (warn_slowpath_common+0x4c/0x64)
[<c001be94>] (warn_slowpath_common+0x4c/0x64) from [<c001bec4>] (warn_slowpath_null+0x18/0x1c)
[<c001bec4>] (warn_slowpath_null+0x18/0x1c) from [<c0053ff8>](smp_call_function_many+0x98/0x264)
[<c0053ff8>] (smp_call_function_many+0x98/0x264) from [<c0054364>] (smp_call_function+0x44/0x6c)
[<c0054364>] (smp_call_function+0x44/0x6c) from [<c0017d50>] (__new_context+0xbc/0x124)
[<c0017d50>] (__new_context+0xbc/0x124) from [<c009e49c>] (flush_old_exec+0x460/0x5e4)
[<c009e49c>] (flush_old_exec+0x460/0x5e4) from [<c00d61ac>] (load_elf_binary+0x2e0/0x11ac)
[<c00d61ac>] (load_elf_binary+0x2e0/0x11ac) from [<c009d060>] (search_binary_handler+0x94/0x2a4)
[<c009d060>] (search_binary_handler+0x94/0x2a4) from [<c009e8fc>] (do_execve+0x254/0x364)
[<c009e8fc>] (do_execve+0x254/0x364) from [<c0010e84>] (sys_execve+0x34/0x54)
[<c0010e84>] (sys_execve+0x34/0x54) from [<c000da00>] (ret_fast_syscall+0x0/0x30)
---[ end trace 0000000000000002 ]---
The reason is that ARM need irq enabled when doing activate_mm().
According to mm-protect-activate-switch-mm.patch, actually
preempt_[disable|enable]_rt() is sufficient.
Inspired-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1337061236-1766-1-git-send-email-yong.zhang0@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
|
|
|
|
| |
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
|
|
|
|
|
|
|
| |
On RT we cannot loop with preemption disabled here as
mnt_make_readonly() might have been preempted. We can safely enable
preemption while waiting for MNT_WRITE_HOLD to be cleared. Safe on !RT
as well.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
|
|
|
|
|
|
|
|
|
| |
If hrtimer_try_to_cancel() requires a retry, then depending on the
priority setting te retry loop might prevent timer callback completion
on RT. Prevent that by waiting for completion on RT, no change for a
non RT kernel.
Reported-by: Sankara Muthukrishnan <sankara.m@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
|
|
|
|
|
|
|
| |
Wrap the bit_spin_lock calls into a separate inline and add the RT
replacements with a real spinlock.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
|
|
| |
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 03b71c6ca6286625d8f1ed44aabab9b5bf5dac10 ]
The search ioctl skips items that are too large for a result buffer, but
inline items of a certain size occuring before any search result is
found would trigger an overflow and stop the search entirely.
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=57641
Cc: stable@vger.kernel.org
Signed-off-by: Gabriel de Perthuis <g2p.code+btrfs@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit e6155736ad76b2070652745f9e54cdea3f0d8567 ]
In the case where we are allocating for a non-extent file,
we must limit the groups we allocate from to those below
2^32 blocks, and ext4_mb_regular_allocator() attempts to
do this initially by putting a cap on ngroups for the
subsequent search loop.
However, the initial target group comes in from the
allocation context (ac), and it may already be beyond
the artificially limited ngroups. In this case,
the limit
if (group == ngroups)
group = 0;
at the top of the loop is never true, and the loop will
run away.
Catch this case inside the loop and reset the search to
start at group 0.
[sandeen@redhat.com: add commit msg & comments]
Signed-off-by: Lachlan McIlroy <lmcilroy@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 41b0fc42800569f63e029549b75c4c9cb63f2dfd ]
A user reported a panic while running a balance. What was happening was he was
relocating a block, which added the reference to the relocation tree. Then
relocation would walk through the relocation tree and drop that reference and
free that block, and then it would walk down a snapshot which referenced the
same block and add another ref to the block. The problem is this was all
happening in the same transaction, so the parent block was free'ed up when we
drop our reference which was immediately available for allocation, and then it
was used _again_ to add a reference for the same block from a different
snapshot. This resulted in something like this in the delayed ref tree
add ref to 90234880, parent=2067398656, ref_root 1766, level 1
del ref to 90234880, parent=2067398656, ref_root 18446744073709551608, level 1
add ref to 90234880, parent=2067398656, ref_root 1767, level 1
as you can see the ref_root's don't match, because when we inc the ref we use
the header owner, which is the original tree the block belonged to, instead of
the data reloc tree. Then when we remove the extent we use the reloc tree
objectid. But none of this matters, since it is a shared reference which means
only the parent matters. When the delayed ref stuff runs it adds all the
increments first, and then does all the drops, to make sure that we don't delete
the ref if we net a positive ref count. But tree blocks aren't allowed to have
multiple refs from the same block, so this panics when it tries to add the
second ref. We need the add and the drop to cancel each other out in memory so
we only do the final add.
So to fix this we need to adjust how the delayed refs are added to the tree.
Only the ref_root matters when it is a normal backref, and only the parent
matters when it is a shared backref. So make our decision based on what ref
type we have. This allows us to keep the ref_root in memory in case anybody
wants to use it for something else, and it allows the delayed refs to be merged
properly so we don't end up with this panic.
With this patch the users image no longer panics on mount, and it has a clean
fsck after a normal mount/umount cycle. Thanks,
Cc: stable@vger.kernel.org
Reported-by: Roman Mamedov <rm@romanrm.ru>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 3f8a6411fbada1fa482276591e037f3b1adcf55b ]
Addresses-Red-Hat-Bugzilla: #913245
Reported-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit ce8a5dbdf9e709bdaf4618d7ef8cceb91e8adc69 ]
When checking if an autofs mount point is busy it isn't sufficient to
only check if it's a mount point.
For example, if the mount of an offset mountpoint in a tree is denied
for this host by its export and the dentry becomes a process working
directory the check incorrectly returns the mount as not in use at
expire.
This can happen since the default when mounting within a tree is
nostrict, which means ingnore mount fails on mounts within the tree and
continue. The nostrict option is meant to allow mounting in this case.
Signed-off-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 7f3e3c7cfcec148ccca9c0dd2dbfd7b00b7ac10f ]
Fox the Kconfig documentation for CONFIG_EXT4_DEBUG to match the
change made by commit a0b30c1229: ext4: use module parameters instead
of debugfs for mballoc_debug
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit c5c72d814cf0f650010337c73638b25e6d14d2d4 ]
Commit fb0a387dcdc restricts block allocations for indirect-mapped
files to block groups less than s_blockfile_groups. However, the
online resizing code wasn't setting s_blockfile_groups, so the newly
added block groups were not available for non-extent mapped files.
Reported-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 171a7f21a76a0958c225b97c00a97a10390d40ee ]
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 5d3ee20855e28169d711b394857ee608a5023094 ]
It is incorrect to use list_for_each_entry_safe() for journal callback
traversial because ->next may be removed by other task:
->ext4_mb_free_metadata()
->ext4_mb_free_metadata()
->ext4_journal_callback_del()
This results in the following issue:
WARNING: at lib/list_debug.c:62 __list_del_entry+0x1c0/0x250()
Hardware name:
list_del corruption. prev->next should be ffff88019a4ec198, but was 6b6b6b6b6b6b6b6b
Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod
Pid: 16400, comm: jbd2/dm-1-8 Tainted: G W 3.8.0-rc3+ #107
Call Trace:
[<ffffffff8106fb0d>] warn_slowpath_common+0xad/0xf0
[<ffffffff8106fc06>] warn_slowpath_fmt+0x46/0x50
[<ffffffff813637e9>] ? ext4_journal_commit_callback+0x99/0xc0
[<ffffffff8148cae0>] __list_del_entry+0x1c0/0x250
[<ffffffff813637bf>] ext4_journal_commit_callback+0x6f/0xc0
[<ffffffff813ca336>] jbd2_journal_commit_transaction+0x23a6/0x2570
[<ffffffff8108aa42>] ? try_to_del_timer_sync+0x82/0xa0
[<ffffffff8108b491>] ? del_timer_sync+0x91/0x1e0
[<ffffffff813d3ecf>] kjournald2+0x19f/0x6a0
[<ffffffff810ad630>] ? wake_up_bit+0x40/0x40
[<ffffffff813d3d30>] ? bit_spin_lock+0x80/0x80
[<ffffffff810ac6be>] kthread+0x10e/0x120
[<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70
[<ffffffff818ff6ac>] ret_from_fork+0x7c/0xb0
[<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70
This patch fix the issue as follows:
- ext4_journal_commit_callback() make list truly traversial safe
simply by always starting from list_head
- fix race between two ext4_journal_callback_del() and
ext4_journal_callback_try_del()
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.com
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 794446c6946513c684d448205fbd76fa35f38b72 ]
The following race is possible:
[kjournald2] other_task
jbd2_journal_commit_transaction()
j_state = T_FINISHED;
spin_unlock(&journal->j_list_lock);
->jbd2_journal_remove_checkpoint()
->jbd2_journal_free_transaction();
->kmem_cache_free(transaction)
->j_commit_callback(journal, transaction);
-> USE_AFTER_FREE
WARNING: at lib/list_debug.c:62 __list_del_entry+0x1c0/0x250()
Hardware name:
list_del corruption. prev->next should be ffff88019a4ec198, but was 6b6b6b6b6b6b6b6b
Modules linked in: cpufreq_ondemand acpi_cpufreq freq_table mperf coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode sg xhci_hcd button sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic dm_mirror dm_region_hash dm_log dm_mod
Pid: 16400, comm: jbd2/dm-1-8 Tainted: G W 3.8.0-rc3+ #107
Call Trace:
[<ffffffff8106fb0d>] warn_slowpath_common+0xad/0xf0
[<ffffffff8106fc06>] warn_slowpath_fmt+0x46/0x50
[<ffffffff813637e9>] ? ext4_journal_commit_callback+0x99/0xc0
[<ffffffff8148cae0>] __list_del_entry+0x1c0/0x250
[<ffffffff813637bf>] ext4_journal_commit_callback+0x6f/0xc0
[<ffffffff813ca336>] jbd2_journal_commit_transaction+0x23a6/0x2570
[<ffffffff8108aa42>] ? try_to_del_timer_sync+0x82/0xa0
[<ffffffff8108b491>] ? del_timer_sync+0x91/0x1e0
[<ffffffff813d3ecf>] kjournald2+0x19f/0x6a0
[<ffffffff810ad630>] ? wake_up_bit+0x40/0x40
[<ffffffff813d3d30>] ? bit_spin_lock+0x80/0x80
[<ffffffff810ac6be>] kthread+0x10e/0x120
[<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70
[<ffffffff818ff6ac>] ret_from_fork+0x7c/0xb0
[<ffffffff810ac5b0>] ? __init_kthread_worker+0x70/0x70
In order to demonstrace this issue one should mount ext4 with mount -o
discard option on SSD disk. This makes callback longer and race
window becomes wider.
In order to fix this we should mark transaction as finished only after
callbacks have completed
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit d76a3a77113db020d9bb1e894822869410450bd9 ]
In the case where an inode has a very stale transaction id (tid) in
i_datasync_tid or i_sync_tid, it's possible that after a very large
(2**31) number of transactions, that the tid number space might wrap,
causing tid_geq()'s calculations to fail.
Commit deeeaf13 "jbd2: fix fsync() tid wraparound bug", later modified
by commit e7b04ac0 "jbd2: don't wake kjournald unnecessarily",
attempted to fix this problem, but it only avoided kjournald spinning
forever by fixing the logic in jbd2_log_start_commit().
Unfortunately, in the codepaths in fs/ext4/fsync.c and fs/ext4/inode.c
that might call jbd2_log_start_commit() with a stale tid, those
functions will subsequently call jbd2_log_wait_commit() with the same
stale tid, and then wait for a very long time. To fix this, we
replace the calls to jbd2_log_start_commit() and
jbd2_log_wait_commit() with a call to a new function,
jbd2_complete_transaction(), which will correctly handle stale tid's.
As a bonus, jbd2_complete_transaction() will avoid locking
j_state_lock for writing unless a commit needs to be started. This
should have a small (but probably not measurable) improvement for
ext4's scalability.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Reported-by: George Barnett <gbarnett@atlassian.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit bf8d909705e9d9bac31d9b8eac6734d2b51332a7 ]
The seconds field of an nfstime4 structure is 64bit, but we are assuming
that the first 32bits are zero-filled. So if the client tries to set
atime to a value before the epoch (touch -t 196001010101), then the
server will save the wrong value on disk.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit b022032e195ffca83d7002d6b84297d796ed443b ]
we should return error status directly when nfs4_preprocess_stateid_op
return error.
Signed-off-by: fanchaoting <fanchaoting@cn.fujitsu.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 0c7c3e67ab91ec6caa44bdf1fc89a48012ceb0c5 ]
Don't actually close any opens until we don't need them at all.
This means being left with write access when it's not really necessary,
but that's better than putting a file that might still have posix locks
held on it, as we have been.
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 8b6cc4d6f841d31f72fe7478453759166d366274 ]
A server shouldn't normally return NFS4ERR_GRACE if the client holds a
delegation, since no conflicting lock reclaims can be granted, however
the spec does not require the server to grant the open in this
instance
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 1dfd89af8697a299e7982ae740d4695ecd917eef ]
After a server reboot, the reclaimer thread will recover all the existing
locks. For locks that are blocked, however, it will change the value
of block->b_status to nlm_lck_denied_grace_period in order to signal that
they need to wake up and resend the original blocking lock request.
Due to a bug, however, the block->b_status never gets reset after the
blocked locks have been woken up, and so the process goes into an
infinite loop of resends until the blocked lock is satisfied.
Reported-by: Marc Eshel <eshel@us.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit e56fb2874015370e3b7f8d85051f6dce26051df9 ]
threadgroup_lock() takes signal->cred_guard_mutex to ensure that
thread_group_leader() is stable. This doesn't look nice, the scope of
this lock in do_execve() is huge.
And as Dave pointed out this can lead to deadlock, we have the
following dependencies:
do_execve: cred_guard_mutex -> i_mutex
cgroup_mount: i_mutex -> cgroup_mutex
attach_task_by_pid: cgroup_mutex -> cred_guard_mutex
Change de_thread() to take threadgroup_change_begin() around the
switch-the-leader code and change threadgroup_lock() to avoid
->cred_guard_mutex.
Note that de_thread() can't sleep with ->group_rwsem held, this can
obviously deadlock with the exiting leader if the writer is active, so it
does threadgroup_change_end() before schedule().
Reported-by: Dave Jones <davej@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 421348f1ca0bf17769dee0aed4d991845ae0536d ]
Call cond_resched() in shrink_dcache_parent() to maintain interactivity.
Before this patch:
void shrink_dcache_parent(struct dentry * parent)
{
while ((found = select_parent(parent, &dispose)) != 0)
shrink_dentry_list(&dispose);
}
select_parent() populates the dispose list with dentries which
shrink_dentry_list() then deletes. select_parent() carefully uses
need_resched() to avoid doing too much work at once. But neither
shrink_dcache_parent() nor its called functions call cond_resched(). So
once need_resched() is set select_parent() will return single dentry
dispose list which is then deleted by shrink_dentry_list(). This is
inefficient when there are a lot of dentry to process. This can cause
softlockup and hurts interactivity on non preemptable kernels.
This change adds cond_resched() in shrink_dcache_parent(). The benefit
of this is that need_resched() is quickly cleared so that future calls
to select_parent() are able to efficiently return a big batch of dentry.
These additional cond_resched() do not seem to impact performance, at
least for the workload below.
Here is a program which can cause soft lockup if other system activity
sets need_resched().
int main()
{
struct rlimit rlim;
int i;
int f[100000];
char buf[20];
struct timeval t1, t2;
double diff;
/* cleanup past run */
system("rm -rf x");
/* boost nfile rlimit */
rlim.rlim_cur = 200000;
rlim.rlim_max = 200000;
if (setrlimit(RLIMIT_NOFILE, &rlim))
err(1, "setrlimit");
/* make directory for files */
if (mkdir("x", 0700))
err(1, "mkdir");
if (gettimeofday(&t1, NULL))
err(1, "gettimeofday");
/* populate directory with open files */
for (i = 0; i < 100000; i++) {
snprintf(buf, sizeof(buf), "x/%d", i);
f[i] = open(buf, O_CREAT);
if (f[i] == -1)
err(1, "open");
}
/* close some of the files */
for (i = 0; i < 85000; i++)
close(f[i]);
/* unlink all files, even open ones */
system("rm -rf x");
if (gettimeofday(&t2, NULL))
err(1, "gettimeofday");
diff = (((double)t2.tv_sec * 1000000 + t2.tv_usec) -
((double)t1.tv_sec * 1000000 + t1.tv_usec));
printf("done: %g elapsed\n", diff/1e6);
return 0;
}
Signed-off-by: Greg Thelen <gthelen@google.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 04df32fa10ab9a6f0643db2949d42efc966bc844 ]
When we run the crackerjack testsuite, the inotify_add_watch test is
stalled.
This is caused by the invalid mask 0 - the task is waiting for the event
but it never comes. inotify_add_watch() should return -EINVAL as it did
before commit 676a0675cf92 ("inotify: remove broken mask checks causing
unmount to be EINVAL"). That commit removes the invalid mask check, but
that check is needed.
Check the mask's ALL_INOTIFY_BITS before the inotify_arg_to_mask() call.
If none are set, just return -EINVAL.
Because IN_UNMOUNT is in ALL_INOTIFY_BITS, this change will not trigger
the problem that above commit fixed.
[akpm@linux-foundation.org: fix build]
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Acked-by: Jim Somerville <Jim.Somerville@windriver.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Eric Paris <eparis@parisplace.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 6ee8630e02be6dd89926ca0fbc21af68b23dc087 ]
On architectures where a pgd entry may be shared between user and kernel
(e.g. ARM+LPAE), freeing page tables needs a ceiling other than 0.
This patch introduces a generic USER_PGTABLES_CEILING that arch code can
override. It is the responsibility of the arch code setting the ceiling
to ensure the complete freeing of the page tables (usually in
pgd_free()).
[catalin.marinas@arm.com: commit log; shift_arg_pages(), asm-generic/pgtables.h changes]
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: <stable@vger.kernel.org> [3.3+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit ec686c9239b4d472052a271c505d04dae84214cc ]
There is a kernel memory leak observed when the proc file
/proc/fs/fscache/stats is read.
The reason is that in fscache_stats_open, single_open is called and the
respective release function is not called during release. Hence fix
with correct release function - single_release().
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=57101
Signed-off-by: Anurup m <anurup.m@huawei.com>
Cc: shyju pv <shyju.pv@huawei.com>
Cc: Sanil kumar <sanil.kumar@huawei.com>
Cc: Nataraj m <nataraj.m@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: David Howells <dhowells@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit f7db5e7660b122142410dcf36ba903c73d473250 ]
The inode->i_mutex isn't hold when updating filp->f_pos
in read()/write(), so the filp->f_pos might be read as
0 or 1 in readdir() when there is concurrent read()/write()
on this same file, then may cause use after free in readdir().
The bug can be reproduced with Li Zefan's test code on the
link:
https://patchwork.kernel.org/patch/2160771/
This patch fixes the use after free under this situation.
Cc: stable <stable@vger.kernel.org>
Reported-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 91d80a84bbc8f28375cca7e65ec666577b4209ad ]
dprintk() shouldn't access @ring after it's unmapped.
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 4bc4bee4595662d8bff92180d5c32e3313a704b0 ]
While trying to track down a tree log replay bug I noticed that fsck was always
complaining about nbytes not being right for our fsynced file. That is because
the new fsync stuff doesn't wait for ordered extents to complete, so the inodes
nbytes are not necessarily updated properly when we log it. So to fix this we
need to set nbytes to whatever it is on the inode that is on disk, so when we
replay the extents we can just add the bytes that are being added as we replay
the extent. This makes it work for the case that we have the wrong nbytes or
the case that we logged everything and nbytes is actually correct. With this
I'm no longer getting nbytes errors out of btrfsck.
Cc: stable@vger.kernel.org
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 12f267a20aecf8b84a2a9069b9011f1661c779b4 ]
Change a u32 to loff_t hfsplus_file_truncate().
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 5b55d708335a9e3e4f61f2dadf7511502205ccd1 ]
Revert commit 62a3ddef6181 ("vfs: fix spinning prevention in prune_icache_sb").
This commit doesn't look right: since we are looking at the tail of the
list (sb->s_inode_lru.prev) if we want to skip an inode, we should put
it back at the head of the list instead of the tail, otherwise we will
keep spinning on it.
Discovered when investigating why prune_icache_sb came top in perf
reports of a swapping load.
Signed-off-by: Suleiman Souhlal <suleiman@google.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # v3.2+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit c369c9a4a7c82d33329d869cbaf93304cc7a0c40 ]
Fixes a regression in cifs_parse_mount_options where a password
which begins with a delimitor is parsed incorrectly as being a blank
password.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit c2952d202f710d326ac36a8ea6bd216b20615ec8 ]
When withdraw occurs, we need to continue to allow unlocks of fcntl
locks to occur, however these will only be local, since the node has
withdrawn from the cluster. This prevents triggering a VFS level
bug trap due to locks remaining when a file is closed.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 67e753ca41782913d805ff4a8a2b0f60b26b7915 ]
The UBIFS space fixup is a useful feature which allows to fixup the "broken"
flash space at the time of the first mount. The "broken" space is usually the
result of using a "dumb" industrial flasher which is not able to skip empty
NAND pages and just writes all 0xFFs to the empty space, which has grave
side-effects for UBIFS when UBIFS trise to write useful data to those empty
pages.
The fix-up feature works roughly like this:
1. mkfs.ubifs sets the fixup flag in UBIFS superblock when creating the image
(see -F option)
2. when the file-system is mounted for the first time, UBIFS notices the fixup
flag and re-writes the entire media atomically, which may take really a lot
of time.
3. UBIFS clears the fixup flag in the superblock.
This works fine when the file system is mounted R/W for the very first time.
But it did not really work in the case when we first mount the file-system R/O,
and then re-mount R/W. The reason was that we started the fixup procedure too
late, which we cannot really do because we have to fixup the space before it
starts being used.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Reported-by: Mark Jackson <mpfj-list@mimc.co.uk>
Cc: stable@vger.kernel.org # 3.0+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit f4881bc7a83eff263789dd524b7c269d138d4af5 ]
Dave reported a warning when running xfstest 275. We have been leaking delalloc
metadata space when our reservations fail. This is because we were improperly
calculating how much space to free for our checksum reservations. The problem
is we would sometimes free up space that had already been freed in another
thread and we would end up with negative usage for the delalloc space. This
patch fixes the problem by calculating how much space the other threads would
have already freed, and then calculate how much space we need to free had we not
done the reservation at all, and then freeing any excess space. This makes
xfstests 275 no longer have leaked space. Thanks
Cc: stable@vger.kernel.org
Reported-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 64a817cfbded8674f345d1117b117f942a351a69 ]
Since we only enforce an upper bound, not a lower bound, a "negative"
length can get through here.
The symptom seen was a warning when we attempt to a kmalloc with an
excessive size.
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit c1681bf8a7b1b98edee8b862a42c19c4e53205fd ]
struct block_device lifecycle is defined by its inode (see fs/block_dev.c) -
block_device allocated first time we access /dev/loopXX and deallocated on
bdev_destroy_inode. When we create the device "losetup /dev/loopXX afile"
we want that block_device stay alive until we destroy the loop device
with "losetup -d".
But because we do not hold /dev/loopXX inode its counter goes 0, and
inode/bdev can be destroyed at any moment. Usually it happens at memory
pressure or when user drops inode cache (like in the test below). When later in
loop_clr_fd() we want to use bdev we have use-after-free error with following
stack:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000280
bd_set_size+0x10/0xa0
loop_clr_fd+0x1f8/0x420 [loop]
lo_ioctl+0x200/0x7e0 [loop]
lo_compat_ioctl+0x47/0xe0 [loop]
compat_blkdev_ioctl+0x341/0x1290
do_filp_open+0x42/0xa0
compat_sys_ioctl+0xc1/0xf20
do_sys_open+0x16e/0x1d0
sysenter_dispatch+0x7/0x1a
To prevent use-after-free we need to grab the device in loop_set_fd()
and put it later in loop_clr_fd().
The issue is reprodusible on current Linus head and v3.3. Here is the test:
dd if=/dev/zero of=loop.file bs=1M count=1
while [ true ]; do
losetup /dev/loop0 loop.file
echo 2 > /proc/sys/vm/drop_caches
losetup -d /dev/loop0
done
[ Doing bdgrab/bput in loop_set_fd/loop_clr_fd is safe, because every
time we call loop_set_fd() we check that loop_device->lo_state is
Lo_unbound and set it to Lo_bound If somebody will try to set_fd again
it will get EBUSY. And if we try to loop_clr_fd() on unbound loop
device we'll get ENXIO.
loop_set_fd/loop_clr_fd (and any other loop ioctl) is called under
loop_device->lo_ctl_mutex. ]
Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit d8fe29e9dea8d7d61fd140d8779326856478fc62 ]
A user reported a panic where we were panicing somewhere in
tree_backref_for_extent from scrub_print_warning. He only captured the trace
but looking at scrub_print_warning we drop the path right before we mess with
the extent buffer to print out a bunch of stuff, which isn't right. So fix this
by dropping the path after we use the eb if we need to. Thanks,
Cc: stable@vger.kernel.org
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit fdf30d1c1b386e1b73116cc7e0fb14e962b763b0 ]
A user reported a problem where he was getting early ENOSPC with hundreds of
gigs of free data space and 6 gigs of free metadata space. This is because the
global block reserve was taking up the entire free metadata space. This is
ridiculous, we have infrastructure in place to throttle if we start using too
much of the global reserve, so instead of letting it get this huge just limit it
to 512mb so that users can still get work done. This allowed the user to
complete his rsync without issues. Thanks
Cc: stable@vger.kernel.org
Reported-and-tested-by: Stefan Priebe <s.priebe@profihost.ag>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 4adaa611020fa6ac65b0ac8db78276af4ec04e63 ]
Btrfs uses page_mkwrite to ensure stable pages during
crc calculations and mmap workloads. We call clear_page_dirty_for_io
before we do any crcs, and this forces any application with the file
mapped to wait for the crc to finish before it is allowed to change
the file.
With compression on, the clear_page_dirty_for_io step is happening after
we've compressed the pages. This means the applications might be
changing the pages while we are compressing them, and some of those
modifications might not hit the disk.
This commit adds the clear_page_dirty_for_io before compression starts
and makes sure to redirty the page if we have to fallback to
uncompressed IO as well.
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Reported-by: Alexandre Oliva <oliva@gnu.org>
cc: stable@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 9bf7a4890518186238d2579be16ecc5190a707c0 ]
We need to inc the nlink of deleted entries when running replay so we can do the
unlink on the fs_root and get everything cleaned up and then have the orphan
cleanup do the right thing. The problem is inc_nlink complains about this, even
thought it still does the right thing. So use set_nlink() if our i_nlink is 0
to keep users from seeing the warnings during log replay. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 3151527ee007b73a0ebd296010f1c0454a919c7d ]
Guarantee that the policy of which files may be access that is
established by setting the root directory will not be violated
by user namespaces by verifying that the root directory points
to the root of the mount namespace at the time of user namespace
creation.
Changing the root is a privileged operation, and as a matter of policy
it serves to limit unprivileged processes to files below the current
root directory.
For reasons of simplicity and comprehensibility the privilege to
change the root directory is gated solely on the CAP_SYS_CHROOT
capability in the user namespace. Therefore when creating a user
namespace we must ensure that the policy of which files may be access
can not be violated by changing the root directory.
Anyone who runs a processes in a chroot and would like to use user
namespace can setup the same view of filesystems with a mount
namespace instead. With this result that this is not a practical
limitation for using user namespaces.
Cc: stable@vger.kernel.org
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 7ea600b5314529f9d1b9d6d3c41cb26fce6a7a4a ]
... lest we get livelocks between path_is_under() and d_path() and friends.
The thing is, wrt fairness lglocks are more similar to rwsems than to rwlocks;
it is possible to have thread B spin on attempt to take lock shared while thread
A is already holding it shared, if B is on lower-numbered CPU than A and there's
a thread C spinning on attempt to take the same lock exclusive.
As the result, we need consistent ordering between vfsmount_lock (lglock) and
rename_lock (seq_lock), even though everything that takes both is going to take
vfsmount_lock only shared.
Spotted-by: Brad Spengler <spender@grsecurity.net>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 24956804349ca0eadcdde032d65e8c00b4214096 ]
Note that clearing NFS_INO_LAYOUTCOMMIT is tricky, since it requires
you to also clear the NFS_LSEG_LAYOUTCOMMIT bits from the layout
segments.
The only two sites that need to do this are the ones that call
pnfs_return_layout() without first doing a layout commit.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Benny Halevy <bhalevy@tonian.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|