path: root/Monitor.c
Commit message | Author | Age | Files | Lines
* Monitor: handle v.quick removal of devices better. | NeilBrown | 2011-03-22 | 1 | -1/+1
  If a device fails and then is removed before Monitor sees the failure, GET_DISK_INFO returns nothing so Monitor relies on mdstat info where '_' is incorrectly interpreted as 'a spare'. We should treat '_' as 'removed' - that is safer.
  Without this, a v.quick fail+remove gets reported as 'Failed' then 'SpareActive'.
  Signed-off-by: NeilBrown <neilb@suse.de>
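The '_' handling described in this commit can be illustrated with a minimal sketch. It assumes an mdstat-style status pattern such as "UU_", where 'U' marks a working slot; the enum and function names are hypothetical and are not taken from Monitor.c.

```c
#include <string.h>

/* Hypothetical slot states for illustration; not mdadm's own names. */
enum slot_state { SLOT_ACTIVE, SLOT_REMOVED, SLOT_UNKNOWN };

/*
 * Interpret one character of an mdstat-style status pattern such as "UU_".
 * 'U' means the slot is up; '_' is treated as removed, never as a spare.
 */
enum slot_state slot_from_pattern(const char *pattern, size_t slot)
{
    if (pattern == NULL || slot >= strlen(pattern))
        return SLOT_UNKNOWN;
    switch (pattern[slot]) {
    case 'U':
        return SLOT_ACTIVE;
    case '_':
        return SLOT_REMOVED;
    default:
        return SLOT_UNKNOWN;
    }
}
```

Mapping '_' to the more conservative state means a quick fail+remove cannot be mistaken for a spare becoming active.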
* FIX: ping_monitor() usage causes memory leaks | Adam Kwolek | 2011-03-18 | 1 | -1/+1
  When devnum2devname() is used to produce the input for ping_monitor(), the returned string pointer should be passed to free() to release the memory. This is not done in several places. This use case should have its own function to avoid the memory leak.
  Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
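The leak pattern and the wrapper-style fix can be sketched as follows. The helpers `name_for_devnum` and `ping_by_name` are hypothetical stand-ins written for this example; they only mimic the "returns a malloc'ed string" and "consumes a name" roles and do not reproduce the real mdadm signatures.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in: returns a heap-allocated name the caller must free. */
char *name_for_devnum(int devnum)
{
    char *name = malloc(32);
    if (name != NULL)
        snprintf(name, 32, "md%d", devnum);
    return name;
}

/* Hypothetical stand-in for the call that only reads the name. */
int ping_by_name(const char *name)
{
    return strncmp(name, "md", 2) == 0 ? 0 : -1;
}

/* Wrapper that owns the temporary string, so no caller can forget free(). */
int ping_by_devnum(int devnum)
{
    char *name = name_for_devnum(devnum);
    int rv;

    if (name == NULL)
        return -1;
    rv = ping_by_name(name);
    free(name);
    return rv;
}
```

Callers then use `ping_by_devnum(n)` instead of nesting the two calls, which is where the temporary pointer would otherwise be lost.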
* Various compile fixes. | NeilBrown | 2011-02-01 | 1 | -1/+2
  Make "make everything" succeed. This fixed some real bugs.
  Signed-off-by: NeilBrown <neilb@suse.de>
* Allow domain_test to report that no domains were found. | NeilBrown | 2011-02-01 | 1 | -1/+1
  Sometime we will need to know the difference between no domains found and domains didn't match. So allow domain_test to return different values and fix up all callers to maintain current behaviour.
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: do not move partitions to external container | Czarnowska, Anna | 2011-02-01 | 1 | -0/+4
  Arrays on partitions are not supported for external metadata so do not take such spare from native array.
  Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: avoid adding too many spares to container | Czarnowska, Anna | 2011-01-28 | 1 | -3/+30
  Tests revealed that sometimes there are still more spares taken than needed. The reason for this is that after adding one spare to container with degraded subarray if between ioctl in main loop and load_container in try_spare_migration mdmon activates the spare we see active<raid but find no spares in parent container and so add an extra spare.
  To prevent such behaviour we count active disks in the list returned by getinfo_super_disks and compare it with subarray->active. If the number has increased it means new spare was added and activated so there is no need for more.
  Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
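The counting check described above can be sketched roughly as below. The `disk_sketch` list and both function names are assumptions made for illustration; the real code works on the list returned by getinfo_super_disks, whose layout is not shown in this log.

```c
#include <stddef.h>

/* Hypothetical disk list entry; the real mdadm structures differ. */
struct disk_sketch {
    int active;                 /* non-zero if the disk is an active member */
    struct disk_sketch *next;
};

/* Count the active disks in a singly linked list. */
int count_active(const struct disk_sketch *list)
{
    int n = 0;

    for (; list != NULL; list = list->next)
        if (list->active)
            n++;
    return n;
}

/*
 * A spare was already added and activated if the container now reports
 * more active disks than were recorded for the subarray earlier.
 */
int spare_already_activated(const struct disk_sketch *list, int recorded_active)
{
    return count_active(list) > recorded_active;
}
```

When `spare_already_activated` is true, the degraded reading is stale and no further spare needs to be moved.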
* fix: Monitor: min_size must be set to 0 | Czarnowska, Anna | 2011-01-17 | 1 | -1/+3
  Otherwise a random value will be used for comparison later for native and ddf metadata (until min_acceptable_spare_size is defined).
  Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* fix: segfault if subarray is monitored but container is not | Czarnowska, Anna | 2011-01-17 | 1 | -0/+5
  In this situation to->parent is null so "to" doesn't change to parent container and to->metadata is still null. This results in segmentation fault when checking to->metadata->ss->external. We should just skip this array as container is needed to move spares to.
  Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
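The guard described in the segfault fix might look roughly like this sketch. The structures are simplified assumptions (a subarray has no metadata of its own and points at its parent when the parent is monitored); field and function names are illustrative only.

```c
#include <stddef.h>

/* Simplified, hypothetical versions of the monitored-array records. */
struct metadata_sketch {
    int external;
};

struct array_sketch {
    struct array_sketch *parent;       /* container, if monitored */
    struct metadata_sketch *metadata;  /* NULL for subarrays */
};

/*
 * Return the array that spares should be moved to, or NULL when the
 * array must be skipped (subarray whose container is not monitored).
 */
struct array_sketch *spare_destination(struct array_sketch *to)
{
    if (to == NULL)
        return NULL;
    if (to->parent != NULL)
        to = to->parent;        /* move to the container, not the subarray */
    if (to->metadata == NULL)
        return NULL;            /* no container known: skip, do not dereference */
    return to;
}
```

The important step is testing `metadata` before any access through it, which is the dereference that crashed.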
* Monitor: skip array if error getting size | Anna Czarnowska | 2011-01-12 | 1 | -8/+13
  load_super tries to load container first anyway but if it fails, e.g. after physically removing a disk, then it tries to read metadata from container device. This will always fail and print confusing errors. So use load_container instead of load_super on container.
  On failure to read metadata we should skip this array. It will be dealt with the next time round.
  Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* move_spare function modified and moved to Manage.c | Anna Czarnowska | 2011-01-05 | 1 | -47/+4
  It will also be needed for Incremental.
  Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Use one function chosing spares from container | Anna Czarnowska | 2011-01-05 | 1 | -38/+13
  container_chose_spares in Monitor.c and get_spares_for_grow in super-intel.c do the same thing: search for spares in a container. Another version will also be needed for Incremental so a more general solution is presented here and applied in two previous contexts.
  Normally domlist==NULL would lead to an empty list but this is typically checked earlier so here it is interpreted as "do not test domains".
  Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: Check destination array domain early. | Marcin Labun | 2010-12-21 | 1 | -6/+8
  Destination arrays that do not have any domains are excluded from spare sharing. We can check it early, without searching for donor arrays.
  Signed-off-by: Marcin Labun <marcin.labun@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* fix: Monitor doesn't return after starting daemon | Anna Czarnowska | 2010-12-15 | 1 | -4/+12
  Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Allow --update=devicesize with --re-add | NeilBrown | 2010-12-09 | 1 | -3/+3
  This is useful with 1.1 and 1.2 metadata to update the metadata if the device size has changed. The same functionality can be achieved by writing to the device size in sysfs after re-adding normally, but in some cases this might be easier.
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: don't add more spares than needed | Anna Czarnowska | 2010-12-03 | 1 | -1/+22
  When we add a spare to a container it takes a while before it is noticed by mdmon and recovery starts. During this time the array remains degraded but we don't want to add any more spares to this container. Therefore we must check container with degraded array if it doesn't already have a suitable spare.
  container_choose_spare is reused with from=to. Domain check is not needed in this situation.
  Ping_manager after moving disk is needed to be able to see newly added disk in container after coming back through the loop.
  Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: only get min_size once | Anna Czarnowska | 2010-12-03 | 1 | -8/+8
  We may call chose_spare several times before we find a suitable one so it is better to get the size beforehand.
  Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: pass statelist reference when adding new arrays | Anna Czarnowska | 2010-12-03 | 1 | -5/+5
  Otherwise it will not get updated.
  Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: array that has disappeared doesn't need spares | Anna Czarnowska | 2010-11-29 | 1 | -1/+1
  If a degraded array disappears we still have it in statelist with active<raid but it is pointless to look for spares for it.
  Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: fix writing autorebuild.pid | Anna Czarnowska | 2010-11-29 | 1 | -10/+16
  If /var/run/mdadm doesn't exist we can never succeed writing so we should try to create it first. When we make sure it is there we write pid file as before.
  Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
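The "create the directory first, then write the pid file" order can be sketched as follows. This is an illustration under assumptions: `write_pid_file` is a hypothetical helper, the permission bits are arbitrary, and only a single directory level is created.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Create dir if it does not exist yet (one level only), then write pid
 * into dir/name. Returns 0 on success, -1 on failure.
 */
int write_pid_file(const char *dir, const char *name, long pid)
{
    char path[4096];
    FILE *f;
    int n;

    /* An already existing directory is fine; any other error is fatal. */
    if (mkdir(dir, 0755) != 0 && errno != EEXIST)
        return -1;

    n = snprintf(path, sizeof(path), "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    if (fprintf(f, "%ld\n", pid) < 0) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

A caller would pass the run directory and file name it uses, for example `write_pid_file("/var/run/mdadm", "autorebuild.pid", (long)getpid())`, with `getpid()` coming from `<unistd.h>`.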
* Monitor: reset dev when size too small | Anna Czarnowska | 2010-11-29 | 1 | -2/+3
  Cc: linux-raid@vger.kernel.org, Williams, Dan J <dan.j.williams@intel.com>, Ciechanowski, Ed <ed.ciechanowski@intel.com>
  Otherwise spare will be considered good anyway.
  Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: devid should be dev_t | Anna Czarnowska | 2010-11-29 | 1 | -7/+7
  For consistency with makedev(). int is not sufficient.
  Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
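A small sketch of keeping device ids in `dev_t` rather than `int`, assuming Linux with glibc, where `makedev()`, `major()` and `minor()` come from `<sys/sysmacros.h>` and `dev_t` may be wider than `int`. The helper names and the `MD_MAJOR_SKETCH` macro are made up for this example; 9 is the block major used by Linux md devices.

```c
#include <sys/types.h>
#include <sys/sysmacros.h>

/* Block major number of Linux md devices; macro name is illustrative. */
#define MD_MAJOR_SKETCH 9

/* Build a device id for /dev/mdN-style devices, stored as dev_t. */
dev_t md_devid(unsigned int minor_nr)
{
    return makedev(MD_MAJOR_SKETCH, minor_nr);
}

/* Compare two ids by their major and minor components. */
int same_device(dev_t a, dev_t b)
{
    return major(a) == major(b) && minor(a) == minor(b);
}
```

Storing the result of `makedev()` in the type it returns avoids any truncation or sign issues when ids are compared later.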
* Monitor: few bug fixes for spare migration | Anna Czarnowska | 2010-11-29 | 1 | -2/+11
  1. If array not changed we should still report any degraded - another array may have a new spare that we can move.
  2. Array with err=1 can't give a spare.
  3. We look for spares in "from" not "st" which is supertype and has devname=NULL.
  Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: choose spare correctly for external metadata. | NeilBrown | 2010-11-25 | 1 | -1/+62
  When metadata is managed externally - probably as a container - we need to examine that metadata to see which devices are spares. So use the getinfo_super_disk message and use the info returned.
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: separate 'choose_spare' out from 'move_spare' | NeilBrown | 2010-11-25 | 1 | -34/+42
  Choosing a spare from a container is more complicated than from a native array. So separate out choose_spare to make it easier to use an alternate implementation.
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: check spare group is non-NULL before adding to domain list | NeilBrown | 2010-11-23 | 1 | -1/+3
  ... otherwise we crash.
  Reported-by: "Labun, Marcin" <Marcin.Labun@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: Allow metadata to set minimum size for spare to migrate in. | Anna Czarnowska | 2010-11-22 | 1 | -1/+31
  Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: teach spare migration about containers | NeilBrown | 2010-11-22 | 1 | -5/+23
  When trying to move a spare, move to the container of a degraded array, not to the array itself. And don't try to move from a subarray, only from a native or container array. And don't move from a container which contains degraded subarrays.
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: policy based spare migration. | NeilBrown | 2010-11-22 | 1 | -23/+39
  Rather than only migrating between arrays with the same spare_group, we now migrate based on domains set in the policy. In order for spare_group to continue to work, we treat it as a domain of the destination array, and a domain of any device we might remove from a source array.
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: split out check_donor | NeilBrown | 2010-11-22 | 1 | -8/+16
  Checking compatibility between arrays for spare migration is going to become a little more complicated, so split it out into a separate function.
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: split out move_spare in spare migration. | NeilBrown | 2010-11-22 | 1 | -43/+55
  This is a simple refactoring with no functionality change.
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monior: create struct for holding alert info. | NeilBrown | 2010-11-22 | 1 | -67/+70
  Rather than passing mailaddr, mailfrom, cmd, dosyslog around in argument lists, create a structure to hold them all.
  Signed-off-by: NeilBrown <neilb@suse.de>
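Grouping the four alert parameters named in this commit into one structure might look like the sketch below. The struct and function names are hypothetical, and the field types are assumptions; only the four field names come from the commit message.

```c
#include <stddef.h>

/* Hypothetical grouping of the alert settings named in the commit. */
struct alert_info_sketch {
    const char *mailaddr;  /* where to send mail, or NULL */
    const char *mailfrom;  /* sender address, or NULL for a default */
    const char *cmd;       /* program to run on events, or NULL */
    int dosyslog;          /* non-zero to log events via syslog */
};

/* Count how many alert channels are configured. */
int alert_channel_count(const struct alert_info_sketch *info)
{
    int n = 0;

    if (info->mailaddr != NULL)
        n++;
    if (info->cmd != NULL)
        n++;
    if (info->dosyslog)
        n++;
    return n;
}
```

Functions then take a single `const struct alert_info_sketch *` instead of four separate arguments, so adding a fifth setting later does not change every signature.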
* Monitor: use calloc rather than malloc | NeilBrown | 2010-11-22 | 1 | -13/+3
  calloc zeros the memory allocated, which is safer, particularly as we add more things to struct state.
  Signed-off-by: NeilBrown <neilb@suse.de>
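The benefit of calloc here can be shown with a minimal sketch. `state_sketch` is a cut-down, hypothetical stand-in for the real struct state, and the example assumes a platform where all-zero bytes represent a null pointer, as is the case on common systems.

```c
#include <stdlib.h>

/* Cut-down, hypothetical stand-in for Monitor's per-array state. */
struct state_sketch {
    int err;
    int active;
    long utime;
    struct state_sketch *next;
};

/*
 * Allocate a zeroed state record. With calloc, fields added to the
 * struct later start out as zero without any extra assignments.
 */
struct state_sketch *new_state(void)
{
    return calloc(1, sizeof(struct state_sketch));
}
```

With malloc, each new field would need its own explicit initialisation at every allocation site, which is easy to forget.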
* Monitor: minor optimisation to spare migration. | NeilBrown | 2010-11-22 | 1 | -16/+21
  Only try spare migration if we know that at least one array is degraded.
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: link containers with subarrays in statelist | Marcin Labun | 2010-11-22 | 1 | -0/+36
  Each container has a list of its subarrays. Each subarray has a back link to its parent container.
  Signed-off-by: Marcin Labun <marcin.labun@intel.com>
  Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Break Monitor into smaller functions. | NeilBrown | 2010-11-22 | 1 | -397/+452
  Monitor() has become way too big. Break it up into multiple smaller functions that are all called from the main loop.
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: track metadata type or parent/container of arrays. | NeilBrown | 2010-11-22 | 1 | -0/+26
  For subarrays, record the devid of the parent. For other arrays, record the metadata type. This will be used in a subsequent patch to link related arrays together and allow spare migration between containers.
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: include containers in scan mode | Anna Czarnowska | 2010-11-22 | 1 | -3/+3
  Signed-off-by: Marcin Labun <marcin.labun@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: avoid skipping checks on external arrays | NeilBrown | 2010-11-22 | 1 | -2/+3
  utime is not correct for external metadata so we must not risk the observed time ever matching the old time.
  Reported-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* mdadm: added --no-sharing option for Monitor mode | Anna Czarnowska | 2010-11-22 | 1 | -2/+42
  --no-sharing option disables moving spares between arrays/containers. Without the option spares are moved if needed according to config rules.
  We only allow one process moving spares started with --scan option. If there is such process running and another instance of Monitor is starting without --scan, then we issue a warning but allow it to continue.
  Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: set err on arrays not in mdstat | Anna Czarnowska | 2010-11-22 | 1 | -11/+12
  mse can be NULL when the array was not in mdstat when we read it but existed in statelist and was recreated after reading mdstat. In this case we set err as we can't get full update on this array this time.
  If the same array is given twice in command line it appears twice in statelist. The first one will mark mse->devnum=INT_MAX so the second one can't find mse. We set err on the second one as it's not needed. Also if it becomes degraded we would look for spares twice for the same array.
  Signed-off-by: Anna Czarnowska <anna.czarnowska@intel.com>
  Signed-off-by: NeilBrown <neilb@suse.de>
* Improve type names for mddev_dev | NeilBrown | 2010-11-22 | 1 | -3/+3
  Remove the _t pointer typedef and remove the _s suffix for the structure. These things do not help readability.
  Signed-off-by: NeilBrown <neilb@suse.de>
* Improve mddev_ident type definitions. | NeilBrown | 2010-11-22 | 1 | -2/+2
  Remove the _t typedef and remove the _s suffix from the struct name. These things do not help readability.
  Signed-off-by: NeilBrown <neilb@suse.de>
* Compile with -Wextra by default | NeilBrown | 2010-08-05 | 1 | -7/+7
  This produced lots of warnings, some of which pointed to actual bugs.
  Signed-off-by: NeilBrown <neilb@suse.de>
* Add --test option to --re-add and similar | NeilBrown | 2010-07-06 | 1 | -3/+3
  --test can be given in Manage mode. This can be used when there is an attempt to fail or remove 'faulty', 'failed' or 'detached' devices, or to re-add 'missing' devices. If no devices were failed, removed, or re-added, then mdadm will exit with status '2'.
  Signed-off-by: NeilBrown <neilb@suse.de>
* Monitor: don't report the disappearance of a faulty device as SpareActive. | NeilBrown | 2010-05-18 | 1 | -0/+1
  Normally Monitor doesn't see faulty devices in active slots - they get moved away too quickly. But if it does, it reports the "faulty device disappeared" event (when it finally does get moved away) as SpareActive due to insufficient checking. So add a better check.
  Reported-by: Pierre Vignéras <pierre@vigneras.name>
* Monitor: add option to specify rebuild increments | Zdenek Behan | 2009-10-19 | 1 | -13/+13
  i.e. the percent increments after which a RebuildNN event is generated. This is particularly useful when using the --program option, rather than (only) syslog for alerts.
  Signed-off-by: Zdenek Behan <rain@matfyz.cz>
  Signed-off-by: NeilBrown <neilb@suse.de>
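One plausible reading of "percent increments after which a RebuildNN event is generated" is sketched below: an event fires when progress crosses a multiple of the increment. The function name and the exact rule are assumptions for illustration and may differ from what Monitor.c does.

```c
/*
 * Return the percentage to report (a multiple of increment) if rebuild
 * progress moved from old_percent to new_percent across an increment
 * boundary, or -1 if no event is due.
 */
int rebuild_event_percent(int old_percent, int new_percent, int increment)
{
    int old_step, new_step;

    if (increment <= 0 || old_percent < 0 || new_percent < 0)
        return -1;
    old_step = old_percent / increment;
    new_step = new_percent / increment;
    if (new_step > old_step)
        return new_step * increment;
    return -1;
}
```

With an increment of 20, progress from 15% to 22% would report 20, while progress from 21% to 25% would report nothing.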
* Monitor: use pclose rather than fclose | NeilBrown | 2009-07-10 | 1 | -1/+1
  Using pclose is probably the right thing to do seeing that we used popen, but as there is no clear need to wait for sendmail to finish, it isn't really important.
  Signed-off-by: NeilBrown <neilb@suse.de>
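The popen/pclose pairing mentioned here can be sketched as follows, assuming a POSIX system. `pipe_message` is a hypothetical helper, and the command is supplied by the caller rather than being any particular mailer.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/*
 * Send message to the standard input of command. A stream opened with
 * popen() is closed with pclose(), which also waits for the child and
 * returns its termination status (0 for a clean exit), or -1 on error.
 */
int pipe_message(const char *command, const char *message)
{
    FILE *p = popen(command, "w");

    if (p == NULL)
        return -1;
    if (fputs(message, p) == EOF) {
        pclose(p);
        return -1;
    }
    return pclose(p);
}
```

Using fclose() on such a stream would skip the wait, so the child's exit status is never collected by this call.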
* Merge branch 'master' into devel-3.0 | NeilBrown | 2009-06-02 | 1 | -5/+14
|\
| |  Conflicts: super0.c super1.c
| * Monitor: support spare-group manipulation for 1.x metadata. | NeilBrown | 2009-05-12 | 1 | -5/+14
| |  The code for moving spares around a spare-group currently only works for 0.90 metadata. Generalise it for 1.x metadata as well.
| |  Reported-by: "Garth Snyder" <garth@grsweb.us>
| |  Signed-off-by: NeilBrown <neilb@suse.de>
* | Move WaitClean from Monitor.c to sysfs.c | NeilBrown | 2009-06-02 | 1 | -104/+0
| |  That way mdmon doesn't need to include Monitor.o
| |  Signed-off-by: NeilBrown <neilb@suse.de>