delta/ceph.git - github.com: ceph/ceph.git

	Commit message	Author	Age	Files	Lines
*	osdc/Objecter: clean up redundant op assignments (wip-osd-alloc)	Sage Weil	2012-12-07	1	-8/+0
\|	add_op() sets the op code; the caller doesn't need to do it again. Signed-off-by: Sage Weil <sage@inktank.com>
*	osd: implement prealloc/fallocate object operation	Sage Weil	2012-12-07	7	-0/+97
\|	Implement a rados PREALLOC method that will call fallocate(2) to allocate disk blocks for a file, but not write to them. We choose the semantics that modify the file size so that the exposed object metadata will be less confusing. E.g., prealloc to 4MB will result in a 4MB object full of zeros (or whatever data was previously written). Include flags for only doing prealloc on object creation, and for only doing prealloc on an existing object. Signed-off-by: Sage Weil <sage@inktank.com>
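The size semantics described in this commit message can be illustrated outside of Ceph. The following is a minimal sketch, assuming a Linux host; it uses Python's `os.posix_fallocate` as a stand-in for the OSD's fallocate(2) call, and the helper names `prealloc` and `demo` are hypothetical, not part of the rados API.

```python
import os
import tempfile

# 4 MB, the size used in the commit message's example.
PREALLOC_SIZE = 4 * 1024 * 1024


def prealloc(path, length):
    """Reserve `length` bytes for `path` without writing data.

    Hypothetical helper: returns the file size reported after allocation.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Allocates blocks for [0, length) and extends the file size to match.
        os.posix_fallocate(fd, 0, length)
        return os.fstat(fd).st_size
    finally:
        os.close(fd)


def demo():
    """Preallocate a fresh file and read it back."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "object")
        size = prealloc(path, PREALLOC_SIZE)
        with open(path, "rb") as f:
            data = f.read()
    return size, data
```

Calling `demo()` on a freshly created file should report a 4 MB size with all-zero contents, which mirrors the "4MB object full of zeros" behaviour the message describes.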
*	rgw: document admin api web interface.	caleb miles	2012-12-07	1	-3/+1504
\|	Signed-off-by: caleb miles <caleb.miles@inktank.com>
*	doc/install/os-recommendations: fix syncfs notes	Sage Weil	2012-12-07	1	-10/+11
\|	For argonaut, squeeze and wheezy lack syncfs. For bobtail, only older kernels are problematic; we don't depend on glibc support. Signed-off-by: Sage Weil <sage@inktank.com>
*	doc: fix bobtail version in os-recommendations	Sage Weil	2012-12-07	1	-1/+1
\|	Signed-off-by: Sage Weil <sage@inktank.com>
*	Merge remote-tracking branch 'gh/wip_doc'	Sage Weil	2012-12-07	2	-16/+16
\|\
\| *	doc: write descriptions for the remaining msgr options	Greg Farnum	2012-12-04	1	-7/+7
\| \|	Signed-off-by: Greg Farnum <greg@inktank.com>
\| *	doc: added some descriptions in ms-ref and filestore-config-ref	Samuel Just	2012-12-04	2	-9/+9
\| \|	Signed-off-by: Samuel Just <sam.just@inktank.com>
* \|	doc: Change per doc request.	John Wilkins	2012-12-06	1	-2/+2
\| \|	Signed-off-by: John Wilkins <john.wilkins@inktank.com>
* \|	Merge branch 'next'	Dan Mick	2012-12-05	2	-6/+12
\|\ \
\| * \	Merge branch 'testing' into next	Dan Mick	2012-12-05	2	-6/+12
\| \|\ \
\| \| * \|	rbd: update manpage for import/export	Dan Mick	2012-12-05	2	-6/+12
\| \| \| \|	Signed-off-by: Dan Mick <dan.mick@inktank.com>
\| \| * \|	librbd: hold AioCompletion lock while modifying global state	Dan Mick	2012-12-05	1	-4/+5
\| \| \| \|	C_AioRead::finish needs to add in each chunk of a partial read request to the 'partial' map in the AioCompletion's state (in destriper, of type StripedReadResult). That map is global and must be protected from simultaneous access. Use the AioCompletion lock; could create a separate lock if contention is an issue. Fixes: #3567 Signed-off-by: Dan Mick <dan.mick@inktank.com> (cherry picked from commit a55700cc0aea0ff79e55c6bf78e9757b81fe9425)
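The locking pattern in this fix, several completion callbacks adding chunks to one shared map, can be sketched as follows. This is an illustration only, not librbd code: the class `StripedReadResultSketch` and the function `run` are hypothetical names, and a Python `dict` under the interpreter lock would tolerate these writes even without the explicit lock, so the lock here just shows the shape of the fix.

```python
import threading


class StripedReadResultSketch:
    """Hypothetical stand-in for the shared 'partial' map and its lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.partial = {}  # offset -> bytes for that chunk

    def add_partial_result(self, offset, chunk):
        # Every writer takes the same lock before touching the shared map.
        with self.lock:
            self.partial[offset] = chunk


def run(num_chunks=64, chunk_len=4):
    """Simulate many read completions finishing concurrently."""
    result = StripedReadResultSketch()
    threads = [
        threading.Thread(
            target=result.add_partial_result,
            args=(i * chunk_len, bytes([i % 256]) * chunk_len),
        )
        for i in range(num_chunks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result
```

The commit message notes that a separate lock could be introduced if contention becomes an issue; in this sketch that would only mean giving the map its own `threading.Lock` instead of reusing a wider one.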
\| \| * \|	librbd: handle parent change while async I/Os are in flight	Dan Mick	2012-12-05	1	-6/+38
\| \| \| \|	During a test_librbd_fsx run including flatten, ImageCtx->parent was being dereferenced while null. Between the time the parent overlap is calculated and the time the guard+write completes with ENOENT and submits the copyup+write, the parent image could have changed (by resize) or been made irrelevant (by child flatten) such that the parent overlap is now incorrect. Handle "no parent" by just sending the copyup+write; the copyup part will be a no-op. Move to WRITE_FLAT state in this case because there's no more child to deal with. Handle "overlap changed" by recalculating overlap before reading parent data; if none is left, don't read, but rather just clear m_object_image_extents, in which case the copyup will again be a no-op because it will be of zero length. However we still have a parent, so stay in WRITE_COPYUP state and come back through as usual. Signed-off-by: Dan Mick <dan.mick@inktank.com> Fixes: #3524 (cherry picked from commit 41e16a3b40efb80a5ed7a5587438569ca86c85a3)
\| \| * \|	Striper: use local variable inside if() that tested it	Dan Mick	2012-12-05	1	-1/+1
\| \| \| \|	Signed-off-by: Dan Mick <dan.mick@inktank.com> (cherry picked from commit 917a6f296323164f9d79df94916932722e66fc0a)
* \| \| \|	Merge branch 'next'	Dan Mick	2012-12-05	3	-11/+44
\|\ \ \ \ \| \|/ / /	Pull in fixes for #3567 and #3524
\| * \| \|	librbd: hold AioCompletion lock while modifying global state	Dan Mick	2012-12-05	1	-4/+5
\| \| \| \|	C_AioRead::finish needs to add in each chunk of a partial read request to the 'partial' map in the AioCompletion's state (in destriper, of type StripedReadResult). That map is global and must be protected from simultaneous access. Use the AioCompletion lock; could create a separate lock if contention is an issue. Fixes: #3567 Signed-off-by: Dan Mick <dan.mick@inktank.com>
\| * \| \|	librbd: handle parent change while async I/Os are in flight	Dan Mick	2012-12-05	1	-6/+38
\| \| \| \|	During a test_librbd_fsx run including flatten, ImageCtx->parent was being dereferenced while null. Between the time the parent overlap is calculated and the time the guard+write completes with ENOENT and submits the copyup+write, the parent image could have changed (by resize) or been made irrelevant (by child flatten) such that the parent overlap is now incorrect. Handle "no parent" by just sending the copyup+write; the copyup part will be a no-op. Move to WRITE_FLAT state in this case because there's no more child to deal with. Handle "overlap changed" by recalculating overlap before reading parent data; if none is left, don't read, but rather just clear m_object_image_extents, in which case the copyup will again be a no-op because it will be of zero length. However we still have a parent, so stay in WRITE_COPYUP state and come back through as usual. Signed-off-by: Dan Mick <dan.mick@inktank.com> Fixes: #3524
\| * \| \|	Striper: use local variable inside if() that tested it	Dan Mick	2012-12-05	1	-1/+1
\| \| \| \|	Signed-off-by: Dan Mick <dan.mick@inktank.com>
* \| \| \|	Merge branch 'next'	Josh Durgin	2012-12-05	24	-140/+405
\|\ \ \ \ \| \|/ / /
\| * \| \|	qa: add script for running xfstests in a vm	Josh Durgin	2012-12-05	1	-0/+7
\| \| \| \|	Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
\| * \| \|	OSD: ignore queries on now deleted pools	Samuel Just	2012-12-05	1	-0/+3
\| \| \| \|	Signed-off-by: Samuel Just <sam.just@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
\| * \| \|	Merge remote-tracking branch 'origin/wip-mds' into next	Greg Farnum	2012-12-04	13	-76/+209
\| \|\ \ \	Reviewed-by: Sage Weil <sage@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com>
\| \| * \| \|	mds: journal remote inode's projected parent	Yan, Zheng	2012-12-04	1	-7/+8
\| \| \| \| \|	Server::_rename_prepare() adds the remote inode's parent instead of its projected parent to the journal. So during journal replay, the journal entry for the rename operation will wrongly revert the remote inode's projected rename. This issue can be reproduced by: touch file1; ln file1 file2; rm file1; mv file2 file3. After journal replay, file1 reappears and the directory's fragstat gets corrupted. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
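For reference, the four reproduction commands have a well-defined outcome on any filesystem that handles hard links correctly. The sketch below replays them on a local temporary directory with Python's `os` module; it does not involve an MDS or journal replay, and the function name `rename_last_link` is hypothetical.

```python
import os
import tempfile


def rename_last_link(workdir):
    """Replay the commit message's reproduction steps inside `workdir`."""
    file1 = os.path.join(workdir, "file1")
    file2 = os.path.join(workdir, "file2")
    file3 = os.path.join(workdir, "file3")

    open(file1, "w").close()   # touch file1
    os.link(file1, file2)      # ln file1 file2
    os.remove(file1)           # rm file1
    os.rename(file2, file3)    # mv file2 file3

    # On a correct filesystem only file3 remains.
    return sorted(os.listdir(workdir))
```

The bug described above shows up as a deviation from this end state: after journal replay, `file1` is listed again.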
\| \| * \| \|	mds: don't create bloom filter for incomplete dir	Yan, Zheng	2012-12-04	2	-6/+12
\| \| \| \| \|	Creating a bloom filter for an incomplete dir that was added by log replay will confuse subsequent dir lookups and can create a null dentry for an existing file. The erroneous null dentry confuses the fragstat accounting and causes an undeletable empty directory. The fix is to check whether the dir is complete before creating the bloom filter. For the MDCache::trim_non_auth{,_subtree} cases, just do not call CDir::add_to_bloom, because a bloom filter is useless for a replica. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| \| * \| \|	Merge remote-tracking branch 'gh/wip-mds' into next	Sage Weil	2012-12-04	13	-76/+209
\| \| \|\ \ \
\| \| \| * \| \|	mds: fix freeze inode deadlock	Yan, Zheng	2012-12-01	9	-19/+124
\| \| \| \| \| \|	CInode::freeze_inode() is used in the case of cross authority rename. Server::handle_slave_rename_prep() calls it to wait for all other operations on the source inode to complete. This happens after all locks for the rename operation are acquired. But to acquire locks, we need to auth pin the locks' parent objects first. So there is an ABBA deadlock if someone auth pins the source inode after locks for the rename are acquired and before Server::handle_slave_rename_prep() is called. The fix is to freeze and auth pin the source inode at the same time. This patch introduces CInode::freeze_auth_pin(), which waits for all other MDRequests to release auth pins, then changes the inode to FROZENAUTHPIN state; this state prevents other MDRequests from getting new auth pins. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| \| \| * \| \|	mds: use rdlock_try() when checking NULL dentry	Yan, Zheng	2012-12-01	1	-3/+3
\| \| \| \| \| \|	Use rdlock_try() instead of can_read() when path_traverse encounters a NULL dentry. This can partly avoid waiting forever for the dentry to become readable when the dentry is a replica. Strictly speaking, using rdlock_try() is still not enough because the auth MDS may drop the REQRDLOCK message in some cases. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| \| \| * \| \|	mds: allow open_remote_ino() to open xlocked dentry	Yan, Zheng	2012-12-01	3	-27/+39
\| \| \| \| \| \|	discover_ino() has a parameter want_xlocked. The parameter indicates whether the remote discover handler can proceed when an xlocked dentry is encountered. open_remote_ino() uses discover_ino() to find a non-auth inode, but always sets 'want_xlocked' to false. This may cause a deadlock in some corner cases. For example: we rename an inode's primary dentry to one of its remote dentries and send a slave request to one witness MDS. But before the slave request reaches the witness MDS, the inode is trimmed from the witness MDS' cache. Then when the slave request arrives, open_remote_ino() will be called while traversing the destpath. open_remote_ino() calls discover_ino() with 'want_xlocked=false' to find the inode. discover_ino() sends an MDiscover message to the inode's authority MDS. The handler of the MDiscover message finds the inode's primary dentry is xlocked and it sleeps. The fix is to add a parameter 'want_xlocked' to open_remote_ino() and make open_remote_ino() pass the parameter to discover_ino(). Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| \| \| * \| \|	mds: fix assertion in handle_cache_expire	Yan, Zheng	2012-12-01	1	-0/+1
\| \| \| \| \| \|	During export, it's possible to get cache expire messages in the DISCOVERING, FREEZING and PREPPING states. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| \| \| * \| \|	mds: fix open_remote_inode race	Yan, Zheng	2012-12-01	1	-2/+3
\| \| \| \| \| \|	discover_ino() may return -ENOENT if it races with other FS activities. So use C_MDC_RetryOpenRemoteIno instead of C_MDC_OpenRemoteIno as the onfinish callback. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| \| \| * \| \|	mds: consider revoking caps in imported caps as issued	Yan, Zheng	2012-12-01	1	-2/+4
\| \| \| \| \| \|	The clients may have already sent the caps release message to the exporting MDS, so the importing MDS waits for the release message forever. Considering revoking caps as issued avoids this issue. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| \| \| * \| \|	mds: drop locks if requiring auth pinning new objects.	Yan, Zheng	2012-12-01	1	-1/+2
\| \| \| \| \| \|	Locker::acquire_locks() skips auth pinning a replica object if we only request a rdlock and the lock is read-lockable. To get all locks, we may call Locker::acquire_locks() several times, and locks in replica objects may become not read-lockable between calls. So it is possible that we need to auth pin new objects after already taking some locks. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| \| \| * \| \|	mds: don't forward client request from MDS	Yan, Zheng	2012-12-01	1	-6/+10
\| \| \| \| \| \|	Forwarding a client request that came from an MDS will trigger an assertion in MDS::forward_message_mds(). The MDS only sends client requests for stray migration/reintegration, so it's safe to drop them. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| \| \| * \| \|	mds: call eval() after caps are exported	Yan, Zheng	2012-12-01	1	-0/+1
\| \| \| \| \| \|	For an inode that just changed authority, the new auth MDS may want to change a lock in the inode from 'sync' to 'lock' state before caps are exported. The lock in the replica can be in 'sync->lock' state because client caps prevent it from transitioning to 'lock' state. So we should call eval() after clearing client caps. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| \| \| * \| \|	mds: clear lock flushed if replica is waiting for AC_LOCKFLUSHED	Yan, Zheng	2012-12-01	1	-1/+5
\| \| \| \| \| \|	So eval_gather() will not skip calling scatter_writebehind(), otherwise the replica lock may be in flushing state forever. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| \| \| * \| \|	mds: Don't acquire replica object's versionlock	Yan, Zheng	2012-12-01	2	-15/+17
\| \| \| \| \| \|	Both CInode and CDentry's versionlocks are of type LocalLock. Acquiring a LocalLock in a replica object is useless and problematic. For example, if two requests try acquiring a replica object's versionlock, the first request succeeds and the second request is added to the wait queue. Later, when the first request finishes, MDCache::request_drop_foreign_locks() finds the lock's parent is non-auth and skips waking requests in the wait queue. So the second request hangs. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| \| \| * \| \|	mds: allow try_eval to eval unstable locks in freezing object	Yan, Zheng	2012-12-01	1	-2/+2
\| \| \| \| \| \|	Unstable locks hold auth_pins on the object, which prevents the freezing object from becoming frozen and then unfreezing. So try_eval() should not wait for a freezing object. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
\| * \| \| \| \|	Merge branch 'wip-filestore' into next	Sage Weil	2012-12-04	3	-24/+30
\| \|\ \ \ \ \	Reviewed-by: Sam Just <sam.just@inktank.com>
\| \| * \| \| \| \|	os/JournalingObjectStore: applied_seq -> max_applied_seq	Sage Weil	2012-12-02	2	-10/+10
\| \| \| \| \| \| \|	Rename applied_seq to max_applied_seq, since it is a bound; there may be seq's < max_applied_seq that are not applied. This aligns the naming with max_applying_seq. Signed-off-by: Sage Weil <sage@inktank.com>
\| \| * \| \| \| \|	os/FileStore: only wait for applying ops to complete before commit	Sage Weil	2012-12-02	3	-19/+25
\| \| \| \| \| \| \|	We can have a large number of operations in the op_wq waiting to be applied to the fs. Currently, when we want to commit, we wait for them all to apply. This can take a very long time (the default queue length is 500 operations!). Instead, mark an Op as started ("applying") when the thread pool actually starts to apply it. At that point, only wait for applying ops to complete. We let any threads with an op seq < max_applying_seq begin as well so that we have a proper ordering/barrier. When those flush, applied_seq will == max_applying_seq, and that becomes the committing_seq value. Note that 'applied_seq' is still maintained, but serves no real purpose except to populate our asserts with sanity checks. max_applying_seq serves the purpose applied_seq used to. This removes one unnecessary source of latency associated with fs commits. Signed-off-by: Sage Weil <sage@inktank.com>
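The bookkeeping described in this message can be modelled in a few lines. What follows is a single-threaded sketch under simplifying assumptions (ops start strictly in sequence order, no real thread pool); the class `ApplyTrackerSketch` and its method names are hypothetical and only borrow the counter names from the commit message.

```python
class ApplyTrackerSketch:
    """Hypothetical model of queued vs. applying ops ahead of a commit."""

    def __init__(self):
        self.queued = []            # seqs still waiting in the op queue
        self.applying = set()       # seqs a worker has started but not finished
        self.max_applying_seq = 0
        self.max_applied_seq = 0

    def queue_op(self, seq):
        self.queued.append(seq)

    def start_apply(self):
        """A worker picks up the oldest queued op and marks it applying."""
        seq = self.queued.pop(0)
        self.applying.add(seq)
        self.max_applying_seq = max(self.max_applying_seq, seq)
        return seq

    def finish_apply(self, seq):
        self.applying.remove(seq)
        self.max_applied_seq = max(self.max_applied_seq, seq)

    def ops_blocking_commit(self):
        """A commit waits only for ops already applying, not the whole queue."""
        return sorted(self.applying)

    def committing_seq(self):
        """Sequence number a commit may use, or None while ops are in flight."""
        if self.applying:
            return None
        return self.max_applying_seq
```

With five ops queued and two started, `ops_blocking_commit()` lists only those two; once both finish, `committing_seq()` returns 2 even though three ops are still queued, which is the latency saving the message describes.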
\| * \| \| \| \| \|	Merge branch 'wip-msgr-delay-queue' into next	Sage Weil	2012-12-04	3	-20/+133
\| \|\ \ \ \ \ \
\| \| * \| \| \| \| \|	msg/Pipe: flush delayed messages when stealing/failing pipes	Sage Weil	2012-12-01	2	-2/+23
\| \| \| \| \| \| \| \|	If we are failing a pipe, flush the incoming messages before we try to reconnect. Similarly, flush queued messages on an existing pipe before we replace it. This ensures that when we get a socket failure and reconnect the delayed messages are handled in the normal fashion. Specifically, it fixes a situation like: - read msg, update in_seq etc. - delay msg - pipe faults - peer reconnects, we replace existing pipe, discard delayed msgs - peer resends msgs - we discard, because they are < in_seq Signed-off-by: Sage Weil <sage@inktank.com>
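The failure sequence listed in this message can be replayed with a toy model. This is a sketch only, not the msg/Pipe implementation: `PipeSketch`, `scenario`, and the `flush` switch are hypothetical names, and sequence handling is reduced to "drop anything at or below the highest sequence already read".

```python
class PipeSketch:
    """Hypothetical receiver with an in_seq counter and a delay queue."""

    def __init__(self):
        self.in_seq = 0       # highest sequence number read so far
        self.delayed = []     # messages read but not yet delivered
        self.delivered = []   # messages handed to the dispatcher

    def read_msg(self, seq):
        if seq <= self.in_seq:
            return False      # already seen: discarded as a duplicate
        self.in_seq = seq
        self.delayed.append(seq)
        return True

    def flush_delayed(self):
        self.delivered.extend(self.delayed)
        self.delayed.clear()

    def replace(self, flush):
        """Peer reconnected: flush the delay queue first, or drop it (old bug)."""
        if flush:
            self.flush_delayed()
        else:
            self.delayed.clear()


def scenario(flush):
    pipe = PipeSketch()
    for seq in (1, 2, 3):
        pipe.read_msg(seq)    # read msg, update in_seq, delay msg
    pipe.replace(flush)       # pipe faults, peer reconnects
    for seq in (1, 2, 3):
        pipe.read_msg(seq)    # peer resends; all are at or below in_seq
    pipe.flush_delayed()
    return pipe.delivered
```

In this model `scenario(False)` delivers nothing, because the delayed messages were dropped and the resends are rejected, while `scenario(True)` delivers all three messages.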
\| \| * \| \| \| \| \|	msg/Pipe: release dispatch throttle on delayed queue discard	Sage Weil	2012-11-29	2	-8/+12
\| \| \| \| \| \| \| \|	This avoids leaking into the throttle and deadlocking. Signed-off-by: Sage Weil <sage@inktank.com>
\| \| * \| \| \| \| \|	msg/Pipe: start delay thread after we know peer type	Sage Weil	2012-11-29	2	-4/+10
\| \| \| \| \| \| \| \|	At end of connect(), or end of accept(). Signed-off-by: Sage Weil <sage@inktank.com>
\| \| * \| \| \| \| \|	msg/Pipe: drop queue helpers	Sage Weil	2012-11-29	2	-35/+12
\| \| \| \| \| \| \| \|	There is a single caller; these only obfuscate. Signed-off-by: Sage Weil <sage@inktank.com>
\| \| * \| \| \| \| \|	msg/Pipe: refactor msgr delays	Sage Weil	2012-11-29	2	-83/+73
\| \| \| \| \| \| \| \|	- move all delay state into a single class - create thread once and only once per Pipe - adjust debug levels - discard messages at the appropriate times Signed-off-by: Sage Weil <sage@inktank.com>
\| \| * \| \| \| \| \|	msgr: add a delay_until queue that is used to delay deliveries.	Greg Farnum	2012-11-29	2	-5/+23
\| \| \| \| \| \| \| \|	Its life-cycle matches that of delay_queue, and the delayed_delivery function respects it. For now queue_received is just setting it to delay everything by 1 second. Signed-off-by: Greg Farnum <greg@inktank.com>
\| \| * \| \| \| \| \|	msgr: clear out the delay queue when stop()ing	Greg Farnum	2012-11-29	1	-0/+4
\| \| \| \| \| \| \| \|	After some brief thought, I believe deleting any messages in the delay queue is correct -- we are trying to simulate line delays in delivery and so anything still in the queue has supposedly not arrived yet. So delete them when we stop the Pipe for any reason. Signed-off-by: Greg Farnum <greg@inktank.com>
\| \| * \| \| \| \| \|	msgr: move the delay queue initialization into start_reader	Greg Farnum	2012-11-29	1	-9/+13
\| \| \| \| \| \| \| \|	The Pipe doesn't know the peer type in the constructor. It doesn't always know in start_reader either, so this needs more work, but at least it knows more frequently than it did. Signed-off-by: Greg Farnum <greg@inktank.com>