| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
Ocfs2/move_extents: Validate moving goal after the adjustment.
Ocfs2/move_extents: Avoid doing division in extent moving.
|
| |\
| | |
| | |
| | | |
ocfs2-merge-window
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
though the goal_to_be_moved will be validated again in following moving, it's
still a good idea to validate it after adjustment at the very beginning, instead
of validating it before adjustment.
Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
It's not wise enough to do a 64bits division anywhere in kernside, replace it
with a decent helper or proper shifts.
Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
* git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus:
Squashfs: update email address
Squashfs: add extra sanity checks at mount time
Squashfs: add sanity checks to fragment reading at mount time
Squashfs: add sanity checks to lookup table reading at mount time
Squashfs: add sanity checks to id reading at mount time
Squashfs: add sanity checks to xattr reading at mount time
Squashfs: reverse order of filesystem table reading
Squashfs: move table allocation into squashfs_read_table()
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
My existing email address may stop working in a month or two, so update
email to one that will continue working.
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Add some extra sanity checks of the inode and directory structures.
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Fsfuzzer generates corrupted filesystems which throw a warn_on in
kmalloc. One of these is due to a corrupted superblock fragments field.
Fix this by checking that the number of bytes to be read (and allocated)
does not extend into the next filesystem structure.
Also add a couple of other sanity checks of the mount-time fragment table
structures.
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Fsfuzzer generates corrupted filesystems which throw a warn_on in
kmalloc. One of these is due to a corrupted superblock inodes field.
Fix this by checking that the number of bytes to be read (and allocated)
does not extend into the next filesystem structure.
Also add a couple of other sanity checks of the mount-time lookup table
structures.
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Fsfuzzer generates corrupted filesystems which throw a warn_on in
kmalloc. One of these is due to a corrupted superblock no_ids field.
Fix this by checking that the number of bytes to be read (and allocated)
does not extend into the next filesystem structure.
Also add a couple of other sanity checks of the mount-time id table
structures.
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
These checks add sanity checking of the mount-time xattr structures.
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Reverse order of table reading from mostly first to last in placement
order, to last to first. This is to enable extra superblock sanity
checks to be added in later patches.
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This eliminates a lot of duplicate code.
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The kernel automatically evaluates partition tables of storage devices.
The code for evaluating GUID partitions (in fs/partitions/efi.c) contains
a bug that causes a kernel oops on certain corrupted GUID partition
tables.
This bug has security impacts, because it allows, for example, to
prepare a storage device that crashes a kernel subsystem upon connecting
the device (e.g., a "USB Stick of (Partial) Death").
crc = efi_crc32((const unsigned char *) (*gpt), le32_to_cpu((*gpt)->header_size));
computes a CRC32 checksum over gpt covering (*gpt)->header_size bytes.
There is no validation of (*gpt)->header_size before the efi_crc32 call.
A corrupted partition table may have large values for (*gpt)->header_size.
In this case, the CRC32 computation access memory beyond the memory
allocated for gpt, which may cause a kernel heap overflow.
Validate value of GUID partition table header size.
[akpm@linux-foundation.org: fix layout and indenting]
Signed-off-by: Timo Warns <warns@pre-sense.de>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Cc: Eugene Teo <eugeneteo@kernel.sg>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The balloon driver in a Xen guest frees guest pages and marks them as
mmio. When the kernel crashes and the crash kernel attempts to read the
oldmem via /proc/vmcore a read from ballooned pages will generate 100%
load in dom0 because Xen asks qemu-dm for the page content. Since the
reads come in as 8byte requests each ballooned page is tried 512 times.
With this change a hook can be registered which checks wether the given
pfn is really ram. The hook has to return a value > 0 for ram pages, a
value < 0 on error (because the hypercall is not known) and 0 for non-ram
pages.
This will reduce the time to read /proc/vmcore. Without this change a
512M guest with 128M crashkernel region needs 200 seconds to read it, with
this change it takes just 2 seconds.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Currently, pagemap_read() has three error and/or corner case handling
mistake.
(1) If ppos parameter is wrong, mm refcount will be leak.
(2) If count parameter is 0, mm refcount will be leak too.
(3) If the current task is sleeping in kmalloc() and the system
is out of memory and oom-killer kill the proc associated task,
mm_refcount prevent the task free its memory. then system may
hang up.
<Quote Hugh's explain why we shold call kmalloc() before get_mm()>
check_mem_permission gets a reference to the mm. If we
__get_free_page after check_mem_permission, imagine what happens if the
system is out of memory, and the mm we're looking at is selected for
killing by the OOM killer: while we wait in __get_free_page for more
memory, no memory is freed from the selected mm because it cannot reach
exit_mmap while we hold that reference.
This patch fixes the above three.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jovi Zhang <bookjovi@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Stephen Wilson <wilsons@start.ca>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
It whould be better if put check_mem_permission after __get_free_page in
mem_write, to be same as function mem_read.
Hugh Dickins explained the reason.
check_mem_permission gets a reference to the mm. If we __get_free_page
after check_mem_permission, imagine what happens if the system is out
of memory, and the mm we're looking at is selected for killing by the
OOM killer: while we wait in __get_free_page for more memory, no memory
is freed from the selected mm because it cannot reach exit_mmap while
we hold that reference.
Reported-by: Jovi Zhang <bookjovi@gmail.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Stephen Wilson <wilsons@start.ca>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
There is a macro for the max size kmalloc can allocate, so use it instead
of a hardcoded number.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
No need for this local array to be writable, so mark it const.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Convert fs/proc/ from strict_strto*() to kstrto*() functions.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Now, exe_file is not proc FS dependent, so we can use it to name core
file. So we add %E pattern for core file name cration which extract path
from mm_struct->exe_file. Then it converts slashes to exclamation marks
and pastes the result to the core file name itself.
This is useful for environments where binary names are longer than 16
character (the current->comm limitation). Also where there are binaries
with same name but in a different path. Further in case the binery itself
changes its current->comm after exec.
So by doing (s/$/#/ -- # is treated as git comment):
$ sysctl kernel.core_pattern='core.%p.%e.%E'
$ ln /bin/cat cat45678901234567890
$ ./cat45678901234567890
^Z
$ rm cat45678901234567890
$ fg
^\Quit (core dumped)
$ ls core*
we now get:
core.2434.cat456789012345.!root!cat45678901234567890 (deleted)
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Reviewed-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Setup and cleanup of mm_struct->exe_file is currently done in fs/proc/.
This was because exe_file was needed only for /proc/<pid>/exe. Since we
will need the exe_file functionality also for core dumps (so core name can
contain full binary path), built this functionality always into the
kernel.
To achieve that move that out of proc FS to the kernel/ where in fact it
should belong. By doing that we can make dup_mm_exe_file static. Also we
can drop linux/proc_fs.h inclusion in fs/exec.c and kernel/fork.c.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Two new stats in per-memcg memory.stat which tracks the number of page
faults and number of major page faults.
"pgfault"
"pgmajfault"
They are different from "pgpgin"/"pgpgout" stat which count number of
pages charged/discharged to the cgroup and have no meaning of reading/
writing page to disk.
It is valuable to track the two stats for both measuring application's
performance as well as the efficiency of the kernel page reclaim path.
Counting pagefaults per process is useful, but we also need the aggregated
value since processes are monitored and controlled in cgroup basis in
memcg.
Functional test: check the total number of pgfault/pgmajfault of all
memcgs and compare with global vmstat value:
$ cat /proc/vmstat | grep fault
pgfault 1070751
pgmajfault 553
$ cat /dev/cgroup/memory.stat | grep fault
pgfault 1071138
pgmajfault 553
total_pgfault 1071142
total_pgmajfault 553
$ cat /dev/cgroup/A/memory.stat | grep fault
pgfault 199
pgmajfault 0
total_pgfault 199
total_pgmajfault 0
Performance test: run page fault test(pft) wit 16 thread on faulting in
15G anon pages in 16G container. There is no regression noticed on the
"flt/cpu/s"
Sample output from pft:
TAG pft:anon-sys-default:
Gb Thr CLine User System Wall flt/cpu/s fault/wsec
15 16 1 0.67s 233.41s 14.76s 16798.546 266356.260
+-------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 10 16682.962 17344.027 16913.524 16928.812 166.5362
+ 10 16695.568 16923.896 16820.604 16824.652 84.816568
No difference proven at 95.0% confidence
[akpm@linux-foundation.org: fix build]
[hughd@google.com: shmem fix]
Signed-off-by: Ying Han <yinghan@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Originally i_lastfrag was 32 bits but then we added support for handling
64 bit metadata and it became a 64 bit variable. That was during 2007, in
54fb996ac15c "[PATCH] ufs2 write: block allocation update". Unfortunately
these casts got left behind so the value got truncated to 32 bit again.
[akpm@linux-foundation.org: remove now-unneeded min_t/max_t casting]
Signed-off-by: Dan Carpenter <error27@gmail.com>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6
* 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6:
gfs2: Drop __TIME__ usage
isdn/diva: Drop __TIME__ usage
atm: Drop __TIME__ usage
dlm: Drop __TIME__ usage
wan/pc300: Drop __TIME__ usage
parport: Drop __TIME__ usage
hdlcdrv: Drop __TIME__ usage
baycom: Drop __TIME__ usage
pmcraid: Drop __DATE__ usage
edac: Drop __DATE__ usage
rio: Drop __DATE__ usage
scsi/wd33c93: Drop __TIME__ usage
scsi/in2000: Drop __TIME__ usage
aacraid: Drop __TIME__ usage
media/cx231xx: Drop __TIME__ usage
media/radio-maxiradio: Drop __TIME__ usage
nozomi: Drop __TIME__ usage
cyclades: Drop __TIME__ usage
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The kernel already prints its build timestamp during boot, no need to
repeat it in random drivers and produce different object files each
time.
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: cluster-devel@redhat.com
Signed-off-by: Michal Marek <mmarek@suse.cz>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The kernel already prints its build timestamp during boot, no need to
repeat it in random drivers and produce different object files each
time.
Cc: Christine Caulfield <ccaulfie@redhat.com>
Cc: David Teigland <teigland@redhat.com>
Cc: cluster-devel@redhat.com
Signed-off-by: Michal Marek <mmarek@suse.cz>
|
|\ \ \ \ \
| | |_|/ /
| |/| | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (28 commits)
Ocfs2: Teach local-mounted ocfs2 to handle unwritten_extents correctly.
ocfs2/dlm: Do not migrate resource to a node that is leaving the domain
ocfs2/dlm: Add new dlm message DLM_BEGIN_EXIT_DOMAIN_MSG
Ocfs2/move_extents: Set several trivial constraints for threshold.
Ocfs2/move_extents: Let defrag handle partial extent moving.
Ocfs2/move_extents: move/defrag extents within a certain range.
Ocfs2/move_extents: helper to calculate the defraging length in one run.
Ocfs2/move_extents: move entire/partial extent.
Ocfs2/move_extents: helpers to update the group descriptor and global bitmap inode.
Ocfs2/move_extents: helper to probe a proper region to move in an alloc group.
Ocfs2/move_extents: helper to validate and adjust moving goal.
Ocfs2/move_extents: find the victim alloc group, where the given #blk fits.
Ocfs2/move_extents: defrag a range of extent.
Ocfs2/move_extents: move a range of extent.
Ocfs2/move_extents: lock allocators and reserve metadata blocks and data clusters for extents moving.
Ocfs2/move_extents: Add basic framework and source files for extent moving.
Ocfs2/move_extents: Adding new ioctl code 'OCFS2_IOC_MOVE_EXT' to ocfs2.
Ocfs2/refcounttree: Publicize couple of funcs from refcounttree.c
Ocfs2: Add a new code 'OCFS2_INFO_FREEFRAG' for o2info ioctl.
Ocfs2: Add a new code 'OCFS2_INFO_FREEINODE' for o2info ioctl.
...
|
| |\ \ \ \
| | | |_|/
| | |/| |
| | | | |
| | | | |
| | | | |
| | | | | |
ocfs2-merge-window
Conflicts:
fs/ocfs2/ioctl.c
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The threshold should be greater than clustersize and less than i_size.
Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
We're going to support partial extent moving, which may split entire extent
movement into pieces to compromise the insuffice allocations, it eases the
'ENSPC' pain and makes the whole moving much less likely to fail, the downside
is it may make the fs even more fragmented before moving, just let the userspace
make a trade-off here.
Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
the basic logic of moving extents for a file is pretty like punching-hole
sequence, walk the extents within the range as user specified, calculating
an appropriate len to defrag/move, then let ocfs2_defrag/move_extent() to
do the actual moving.
This func ends up setting 'OCFS2_MOVE_EXT_FL_COMPLETE' to userpace if operation
gets done successfully.
Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The helper is to calculate the defrag length in one run according to a threshold,
it will proceed doing defragmentation until the threshold was meet, and skip a
LARGE extent if any.
Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
ocfs2_move_extent() logic will validate the goal_offset_in_block,
where extents to be moved, what's more, it also compromises a bit
to probe the appropriate region around given goal_offset when the
original goal is not able to fit the movement.
Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
inode.
These helpers were actually borrowed from alloc.c, which may be publicized
later.
Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Before doing the movement of extents, we'd better probe the alloc group from
'goal_blk' for searching a contiguous region to fit the wanted movement, we
even will have a best-effort try by compromising to a threshold around the
given goal.
Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
First best-effort attempt to validate and adjust the goal (physical address in
block), while it can't guarantee later operation can succeed all the time since
global_bitmap may change a bit over time.
Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This function tries locate the right alloc group, where a given physical block
resides, it returns the caller a buffer_head of victim group descriptor, and also
the offset of block in this group, by passing the block number.
Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
It's a relatively complete function to accomplish defragmentation for entire
or partial extent, one journal handle was kept during the operation, it was
logically doing one more thing than ocfs2_move_extent() acutally, yes, it's
claiming the new clusters itself;-)
Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The moving range of __ocfs2_move_extent() was within one extent always, it
consists following parts:
1. Duplicates the clusters in pages to new_blkoffset, where extent to be moved.
2. Split the original extent with new extent, coalecse the nearby extents if possible.
3. Append old clusters to truncate log, or decrease_refcount if the extent was refcounted.
Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
clusters for extents moving.
ocfs2_lock_allocators_move_extents() was like the common ocfs2_lock_allocators(),
to lock metadata and data alloctors during extents moving, reserve appropriate
metadata blocks and data clusters, also performa a best- effort to calculate the
credits for journal transaction in one run of movement.
Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Adding new files move_extents.[c|h] and fill it with nothing but
only a context structure.
Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Patch also manages to add a manipulative struture for this ioctl.
Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The original goal of commonizing these funcs is to benefit defraging/extent_moving
codes in the future, based on the fact that reflink and defragmentation having
the same Copy-On-Wrtie mechanism.
Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This new code is a bit more complicated than former ones, the goal is to
show user all statistics required to take a deep insight into filesystem
on how the disk is being fragmentaed.
The goal is achieved by scaning global bitmap from (cluster)group to group
to figure out following factors in the filesystem:
- How many free chunks in a fixed size as user requested.
- How many real free chunks in all size.
- Min/Max/Avg size(in) clusters of free chunks.
- How do free chunks distribute(in size) in terms of a histogram,
just like following:
---------------------------------------------------------
Extent Size Range : Free extents Free Clusters Percent
32K... 64K- : 1 1 0.00%
1M... 2M- : 9 288 0.03%
8M... 16M- : 2 831 0.09%
32M... 64M- : 1 2047 0.23%
128M... 256M- : 1 8191 0.92%
256M... 512M- : 2 21706 2.43%
512M... 1024M- : 27 858623 96.29%
---------------------------------------------------------
Userspace ioctl() call eventually gets the above info returned by passing
a 'struct ocfs2_info_freefrag' with the chunk_size being specified first.
Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The new code is dedicated to calculate free inodes number of all inode_allocs,
then return the info to userpace in terms of an array.
Specially, flag 'OCFS2_INFO_FL_NON_COHERENT', manipulated by '--cluster-coherent'
from userspace, is now going to be involved. setting the flag on means no cluster
coherency considered, usually, userspace tools choose none-coherency strategy by
default for the sake of performace.
Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
It just removes some macros for the sake of typechecking gains.
Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Oops, local-mounted of 'ocfs2_fops_no_plocks' is just missing the support
of unwritten_extents/punching-hole due to no func pointer was given correctly
to '.follocate' field.
Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
During dlm domain shutdown, o2dlm has to free all the lock resources. Ones that
have no locks and references are freed. Ones that have locks and/or references
are migrated to another node.
The first task in migration is finding a target. Currently we scan the lock
resource and find one node that either has a lock or a reference. This is not
very efficient in a parallel umount case as we might end up migrating the
lock resource to a node which itself may have to migrate it to a third node.
The patch scans the dlm->exit_domain_map to ensure the target node is not
leaving the domain. If no valid target node is found, o2dlm does not migrate
the resource but instead waits for the unlock and deref messages that will
allow it to free the resource.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This patch adds a new dlm message DLM_BEGIN_EXIT_DOMAIN_MSG and ups the dlm
protocol to 1.2.
o2dlm sends this new message in dlm_unregister_domain() to mark the beginning
of the exit domain. This message is sent to all nodes in the domain.
Currently o2dlm has no way of informing other nodes of its impending exit.
This information is useful as the other nodes could disregard the exiting
node in certain operations. For example, in resource migration. If two or
more nodes were umounting in parallel, it would be more efficient if o2dlm
were to choose a non-exiting node to be the new master node rather than an
exiting one.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
|