osd: flush peering queue (consume maps) prior to boot

If the OSD itself is behind on many maps during boot, it will get more maps and (as part of that) flush the peering wq to ensure the PGs consume them. However, it is possible for the OSD to have the latest/recent maps while its PGs are behind, and to jump directly to boot and join. The OSD is then laggy and unresponsive because the peering wq is way behind.

To avoid this, call consume_map() (kick the peering wq) at the end of init and flush it to ensure we are *internally* all caught up before we consider joining the cluster.

I'm pretty sure this is the root cause of #3905 and possibly #3995.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
author: Sage Weil <sage@inktank.com> 2013-02-07 10:21:49 -0800
committer: Sage Weil <sage@inktank.com> 2013-02-07 10:21:49 -0800
commit: af95d934b039d65d3667fc022e2ecaebba107b01 (patch)
tree: 7047bfff55a279013784fda698aae2f832ee929b
parent: 27fb0e63053a581b67a79718876e89fea0026d7a (diff)
download: ceph-af95d934b039d65d3667fc022e2ecaebba107b01.tar.gz
1 files changed, 5 insertions, 0 deletions
diff --git a/src/osd/OSD.cc b/src/osd/OSD.cc
index 6247c40a4c5..1bd27de323a 100644
--- a/src/osd/OSD.cc
+++ b/src/osd/OSD.cc
@@ -1032,6 +1032,11 @@ int OSD::init()
 
   osd_lock.Lock();
 
+  dout(10) << "ensuring pgs have consumed prior maps" << dendl;
+  consume_map();
+  peering_wq.drain();
+
+  dout(10) << "done with init, starting boot process" << dendl;
   state = STATE_BOOTING;
   start_boot();
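
The "kick the queue, then drain it before booting" pattern in the hunk above can be illustrated outside of Ceph with a toy work queue. This is only a sketch under stated assumptions: DrainableQueue, init_then_boot() and the integer per-PG epochs are hypothetical names made up for illustration, and none of this is the actual OSD, consume_map() or peering_wq implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a peering work queue: a single worker thread
// plus a drain() that blocks until nothing is queued or running.
class DrainableQueue {
public:
  DrainableQueue() : worker_([this] { run(); }) {}

  ~DrainableQueue() {
    {
      std::lock_guard<std::mutex> l(lock_);
      stop_ = true;
    }
    cond_.notify_all();
    worker_.join();
  }

  void queue(std::function<void()> job) {
    {
      std::lock_guard<std::mutex> l(lock_);
      jobs_.push_back(std::move(job));
    }
    cond_.notify_all();
  }

  // Block until the queue is empty and the worker is idle.
  void drain() {
    std::unique_lock<std::mutex> l(lock_);
    idle_.wait(l, [this] { return jobs_.empty() && !busy_; });
  }

private:
  void run() {
    std::unique_lock<std::mutex> l(lock_);
    for (;;) {
      cond_.wait(l, [this] { return stop_ || !jobs_.empty(); });
      if (jobs_.empty()) return;  // only reachable once stop_ is set
      auto job = std::move(jobs_.front());
      jobs_.pop_front();
      busy_ = true;
      l.unlock();
      job();                      // run the job without holding the lock
      l.lock();
      busy_ = false;
      if (jobs_.empty()) idle_.notify_all();
    }
  }

  std::mutex lock_;
  std::condition_variable cond_, idle_;
  std::deque<std::function<void()>> jobs_;
  bool busy_ = false, stop_ = false;
  std::thread worker_;            // declared last so it starts after the rest
};

// Hypothetical analogue of the end of init: queue every PG that is behind
// the OSD's map epoch, drain the queue, and only then proceed to "boot".
// Returns the oldest PG epoch seen at the point where boot would start.
int init_then_boot(int osd_epoch, std::vector<int> pg_epochs) {
  {
    DrainableQueue peering_wq;
    for (int& e : pg_epochs) {
      if (e < osd_epoch)
        peering_wq.queue([&e, osd_epoch] { while (e < osd_epoch) ++e; });
    }
    peering_wq.drain();           // all PGs are caught up past this point
  }
  if (pg_epochs.empty()) return osd_epoch;
  return *std::min_element(pg_epochs.begin(), pg_epochs.end());
}
```

In this sketch, calling init_then_boot(10, {3, 10, 7}) returns only after the lagging entries have been advanced to epoch 10. Without the drain() call the function could return while the worker is still catching up, which is the toy equivalent of an OSD joining the cluster with its peering wq way behind.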