stun: update timer timeout and retransmissions

This patch updates the STUN timing constants and gives the rationale for the new values, in the context of the ICE connection check algorithm.

One important value during the discovery state is the combination of the initial timeout and the number of retransmissions, because this state may only complete after the last STUN discovery binding request has timed out. With an initial timeout of 500ms and 3 retransmissions, the discovery state takes at most 2000ms to discover server reflexive and relay candidates. The retransmission delay doubles at each retransmission except for the last one. Generally, this state completes sooner, when all discovery requests get a reply before the timeout.

Another mechanism is used during the connection check, where a STUN request is sent with an initial timeout defined by:

  RTO = MAX(500ms, Ta * (number of in-progress + waiting pairs))
  with Ta = 20ms

The initial timeout is bounded by a minimum value, 500ms, and scales linearly with the number of in-progress and waiting pairs. The same number of retransmissions as in the discovery state is used during the connection check. The total time to wait for a pair to fail is then RTO + 2*RTO + RTO = 4*RTO with 3 retransmissions.

On a typical laptop setup, with a wired and a wifi interface with IPv4/IPv6 dual stack, a link-local and a link-global IPv6 address, a couple of virtual addresses, a server-reflexive address and a TURN relay one, we end up with a total of 90 local candidates for 2 streams and 2 components each. The connection check list includes up to 200 pairs when TCP pairs are discarded, with:

  <33 in-progress and waiting pairs in 50% of cases (RTO = 660ms),
  <55 in-progress and waiting pairs in 90% of cases (RTO = 1100ms),
  and up to 86 in-progress and waiting pairs (RTO = 1720ms)

Three retransmissions seem robust enough to handle sporadic packet loss, if we consider for example a typical packet loss rate of 1% of the overall packets transmitted.

A relatively large initial timeout is also worthwhile because it reduces the overall network overhead caused by the STUN requests and replies, measured at around 3KB/s during a connection check with 4 components.

Finally, the total time to wait until all retransmissions have completed and timed out (2000ms with an initial timeout of 500ms and 3 retransmissions) gives a bound on the worst network latency we can accept, when no packet is lost on the wire.
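
The arithmetic in the message above can be checked with a small standalone sketch. It is not the libnice code: sketch_conncheck_rto(), sketch_total_wait(), sketch_self_check() and SKETCH_TA_MS are hypothetical names introduced here. The backoff rule (double the delay at each step, halve it for the last one) is an assumption inferred from the 1 + 2 + 1 sum quoted in the patch.

```c
#include <assert.h>

/* Values quoted in the patch; SKETCH_TA_MS is a hypothetical name for Ta. */
#define SKETCH_DEFAULT_TIMEOUT_MS 500
#define SKETCH_MAX_RETRANSMISSIONS 3
#define SKETCH_TA_MS 20

/* RTO = MAX(500ms, Ta * (in-progress + waiting pairs)) */
static unsigned int
sketch_conncheck_rto (unsigned int ta_ms, unsigned int pairs)
{
  unsigned int rto = ta_ms * pairs;

  return rto > SKETCH_DEFAULT_TIMEOUT_MS ? rto : SKETCH_DEFAULT_TIMEOUT_MS;
}

/* Total wait before a transaction is declared timed out.
 * Assumed rule: the delay doubles at each step, except for the last
 * step, where it is half of the previous one (1 + 2 + 1 for 3). */
static unsigned int
sketch_total_wait (unsigned int initial_ms, unsigned int max_retransmissions)
{
  unsigned int delay = initial_ms;
  unsigned int total = initial_ms;
  unsigned int i;

  for (i = 1; i < max_retransmissions; i++) {
    if (i == max_retransmissions - 1)
      delay /= 2;
    else
      delay *= 2;
    total += delay;
  }
  return total;
}

/* Re-checks the figures given in the commit message. */
static void
sketch_self_check (void)
{
  unsigned int rto;

  /* Discovery: 500 * (1 + 2 + 1) = 2000ms */
  assert (sketch_total_wait (SKETCH_DEFAULT_TIMEOUT_MS,
      SKETCH_MAX_RETRANSMISSIONS) == 2000);

  /* Connection check RTO for 33, 55 and 86 pairs */
  assert (sketch_conncheck_rto (SKETCH_TA_MS, 33) == 660);
  assert (sketch_conncheck_rto (SKETCH_TA_MS, 55) == 1100);
  rto = sketch_conncheck_rto (SKETCH_TA_MS, 86);
  assert (rto == 1720);

  /* A pair fails after RTO + 2*RTO + RTO = 4*RTO */
  assert (sketch_total_wait (rto, SKETCH_MAX_RETRANSMISSIONS) == 4 * rto);
}
```

Calling sketch_self_check() from a test program confirms the figures above. With few pairs the 500ms floor applies, so a pair is declared failed after 2000ms, the same bound as in the discovery state.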
author: Fabrice Bellet <fabrice@bellet.info> 2020-04-14 17:25:24 +0200
committer: Olivier Crête <olivier.crete@ocrete.ca> 2020-05-07 23:42:48 +0000
commit: 010ecd50341b0f861f9e395f552cee94bab79825 (patch)
tree: b84cb1b61c14ad75c43da8c512003c793f5d7c1b
parent: e9cbb3dacb382c4ef14e7b2288b05f57ca26faad (diff)
download: libnice-010ecd50341b0f861f9e395f552cee94bab79825.tar.gz
2 files changed, 24 insertions, 17 deletions
diff --git a/agent/conncheck.c b/agent/conncheck.c
index c9b79fe..fc4b3eb 100644
--- a/agent/conncheck.c
+++ b/agent/conncheck.c
@@ -2762,13 +2762,10 @@ static unsigned int priv_compute_conncheck_timer (NiceAgent *agent, NiceStream *
 
   rto = agent->timer_ta  * waiting_and_in_progress;
 
-  /* RFC8445 indicates that the min rto value should be 500ms, but
-   * we prefer a lower value of 100ms, which should be overriden
-   * most of the time, when a significant number of pairs are handled.
-   */
   nice_debug ("Agent %p : timer set to %dms, "
-    "waiting+in_progress=%d", agent, MAX (rto, 100), waiting_and_in_progress);
-  return MAX (rto, 100);
+    "waiting+in_progress=%d", agent, MAX (rto, STUN_TIMER_DEFAULT_TIMEOUT),
+    waiting_and_in_progress);
+  return MAX (rto, STUN_TIMER_DEFAULT_TIMEOUT);
 }
 
 /*
diff --git a/stun/usages/timer.h b/stun/usages/timer.h
index 097e75b..17f3669 100644
--- a/stun/usages/timer.h
+++ b/stun/usages/timer.h
@@ -130,29 +130,39 @@ struct stun_timer_s {
  * STUN_TIMER_DEFAULT_TIMEOUT:
  *
  * The default intial timeout to use for the timer
- * RFC recommendds 500, but it's ridiculous, 50ms is known to work in most
- * cases as it is also what is used by SIP style VoIP when sending A-Law and
- * mu-Law audio, so 200ms should be hyper safe. With an initial timeout
- * of 200ms, a default of 7 transmissions, the last timeout will be
- * 16 * 200ms, and we expect to receive a response from the stun server
- * before (1 + 2 + 4 + 8 + 16 + 32 + 16) * 200ms = 15200 ms after the initial
- * stun request has been sent.
+ * This timeout is used for discovering server reflexive and relay
+ * candidates, and also for keepalives, and turn refreshes.
+ *
+ * This value is important because it defines how much time will be
+ * required to discover our local candidates, and this is an
+ * uncompressible delay before the agent signals that candidates
+ * gathering is done.
+ *
+ * The overall delay required for the discovery stun requests is
+ * computed as follow, with 3 retransmissions and an initial delay
+ * of 500ms :  500 * ( 1 + 2 + 1 ) = 2000 ms
+ * The timeout doubles at each retransmission, except for the last one.
  */
-#define STUN_TIMER_DEFAULT_TIMEOUT 200
+#define STUN_TIMER_DEFAULT_TIMEOUT 500
 
 /**
  * STUN_TIMER_DEFAULT_MAX_RETRANSMISSIONS:
  *
- * The default maximum retransmissions allowed before a timer decides to timeout
+ * The default maximum retransmissions before declaring that the
+ * transaction timed out.
  */
-#define STUN_TIMER_DEFAULT_MAX_RETRANSMISSIONS 7
+#define STUN_TIMER_DEFAULT_MAX_RETRANSMISSIONS 3
 
 /**
  * STUN_TIMER_DEFAULT_RELIABLE_TIMEOUT:
  *
  * The default intial timeout to use for a reliable timer
+ *
+ * The idea with this value is that stun request sent over udp or tcp
+ * should fail at the same time, with an initial default timeout set
+ * to 500ms.
  */
-#define STUN_TIMER_DEFAULT_RELIABLE_TIMEOUT 7900
+#define STUN_TIMER_DEFAULT_RELIABLE_TIMEOUT 2000
 
 /**
  * StunUsageTimerReturn: