summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorJames Socol <me@jamessocol.com>2018-08-21 11:37:27 -0400
committerJames Socol <me@jamessocol.com>2018-08-21 11:37:27 -0400
commit2ee0f5a0b01e344758a1caab07971534ce3718bb (patch)
treec72031e037ae0541f5243f2b3e4b872809bbe80a
parent5235d482d56f3a5baae4e9fbf4d34746763c4ef3 (diff)
parentdb576ab5f5a13b46e990f0fbd383e5b0af96da42 (diff)
downloadpystatsd-2ee0f5a0b01e344758a1caab07971534ce3718bb.tar.gz
Merge branch 'document-tags'
-rw-r--r--docs/tags.rst93
1 files changed, 93 insertions, 0 deletions
diff --git a/docs/tags.rst b/docs/tags.rst
new file mode 100644
index 0000000..88f1c6f
--- /dev/null
+++ b/docs/tags.rst
@@ -0,0 +1,93 @@
+.. _tags-chapter:
+
+============================
+Unsupported: Tagging Metrics
+============================
+
+Tagged metrics—such as those used by Datadog_ and `Telegraf`_—are
+explicitly outside the scope of this library. Alternatives_ exist and
+are recommended. This document lays out the reasons to avoid support for
+tags.
+
+
+Aggregating and Disaggregating Metrics
+======================================
+
+Given a simple metric, like a :ref:`counter <counter-type>` or
+:ref:`timer-type`, the very first operation StatsD will perform is an
+aggregation over time. For example, over a 30-second window, calculate
+the total number of events (a counter) or several aggregations like
+average, median, 90th percentile (a timer).
+
+A very common next step is for users to want to perform additional
+aggregations. For example, if we're timing a ``/widgets`` API endpoint
+for both ``GET`` and ``POST`` requests, we might want to know the median
+time across both HTTP methods.
+
+*Without* tags, we must start with the most disaggregated metrics,
+e.g.::
+
+ statsd.timing('api.widgets.GET', response_time)
+ statsd.timing('api.widgets.POST', response_time)
+
+We can then *aggregate* these metrics with wildcards (e.g. in
+Graphite)::
+
+ weightedAverage(api.widgets.*.mean, api.widgets.*.count)
+
+However, *with* tags, we have an alternative approach: to use a single,
+aggregated metric name, and *disaggregate* via tags, e.g.::
+
+ statsd.timing('api.widgets', response_time, {'method': 'GET'})
+ statsd.timing('api.widgets', response_time, {'method': 'POST'})
+
+By default, queries for the ``api.widgets`` timer will include all
+requests, but may be filtered to specific subsets with tags (e.g. in
+Datadog)::
+
+ api.widgets.mean{method:GET}
+
+
+Naming Metrics
+==============
+
+The examples above demonstrate that there is a fundamental change in how
+metrics must be named, particularly in the absence of tags, to avoid
+data loss. If tags are not supported, there is no way to disaggregate
+``api.widgets`` into its ``GET`` and ``POST`` subsets.
+
+Thus, it is incredibly important that an application be written with
+specific metrics capabilities in mind. If using a metrics system that
+does not support tags, like StatsD_ or StatsDaemon_, metric names must
+be disaggregated by default. If using a system that *does* support tags,
+like Datadog or Telegraf, metric names may be aggregated by default.
+
+If an application is expecting tags to work but they are not supported
+by the underlying metrics system, the best case scenario is a loss of
+data resolution. The worst case scenario is a complete loss of data, if
+the metrics system is incapable of correctly parsing the extended metric
+data.
+
+
+Explicit Opt-in
+===============
+
+Given that the best case scenario for a mismatch of application and
+metrics system is a form of data loss, the choice to use metrics with
+tags must be incredibly explicit.
+
+Technically, this library is capable of sending metrics to Datadog_ and
+Telegraf_, as well as StatsD_. However, to take advantage of these,
+you'll need to change your strategy for naming—and tagging—metrics.
+
+To avoid silently failing, this library forces you to make an explicit
+change to how you send metrics to these systems. At a minimum, you must
+touch every file that has ``import statsd``, but that's not really
+enough: you need to touch every metrics call.
+
+
+.. _Datadog: https://www.datadoghq.com/
+.. _Telegraf: https://github.com/influxdata/telegraf
+.. _Alternatives: https://pypi.org/project/statsd-tags/
+.. _StatsD: https://github.com/etsy/statsd
+.. _StatsDaemon: https://github.com/bitly/statsdaemon