Diffstat (limited to 'src/third_party/wiredtiger/src/docs/devdoc-optrack.dox')
 -rw-r--r--  src/third_party/wiredtiger/src/docs/devdoc-optrack.dox  351
 1 file changed, 351 insertions, 0 deletions
diff --git a/src/third_party/wiredtiger/src/docs/devdoc-optrack.dox b/src/third_party/wiredtiger/src/docs/devdoc-optrack.dox
new file mode 100644
index 00000000000..360b3b6de47
--- /dev/null
+++ b/src/third_party/wiredtiger/src/docs/devdoc-optrack.dox
@@ -0,0 +1,351 @@

/*! @page devdoc-optrack Operation Tracking

# Overview

This tutorial walks you through using operation tracking on one of the
`wtperf` workloads: from preparing the workload for the most effective data
collection, to gathering, visualizing and interpreting the execution logs.

## Why use operation tracking?

Analysis of WiredTiger JIRA tickets revealed that, when diagnosing performance
bugs, engineers rely mostly on information showing which functions were
executed over time and across threads. They look for correlations between
performance anomalies, such as long latencies or periods of low throughput,
and events on these timelines. The purpose of operation tracking is to
visualize the execution so that these correlations can be easily spotted.

## How does operation tracking work?

Operation tracking allows you to put a pair of macros, `WT_TRACK_OP_INIT` and
`WT_TRACK_OP_END`, into any WiredTiger function whose invocation time you want
measured and recorded. As your program runs, the `WT_TRACK_OP_INIT` macro
produces the function entry timestamp and the `WT_TRACK_OP_END` macro produces
the function exit timestamp. These timestamps, along with the function name,
are recorded in a set of log files. It is typical to put `WT_TRACK_OP_INIT` at
the beginning of the function and `WT_TRACK_OP_END` at every exit point from
the function, but you can put them anywhere within the function body. The only
condition required to correctly visualize the logs is that each
`WT_TRACK_OP_INIT` has a matching `WT_TRACK_OP_END`. It is totally okay to
insert tracking macros into nested functions.
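To make this concrete, here is a minimal, illustrative sketch of an
instrumented function. The function and its callee are hypothetical, and the
sketch assumes the macros take the session handle, which is how they are used
in the WiredTiger tree:

    // __example_operation --
    //     A hypothetical internal function instrumented for operation
    //     tracking.
    static int
    __example_operation(WT_SESSION_IMPL *session)
    {
        WT_DECL_RET;

        WT_TRACK_OP_INIT(session); // The entry timestamp is recorded here.

        // __do_real_work stands in for the function's actual body.
        if ((ret = __do_real_work(session)) != 0)
            goto err;

    err:
        WT_TRACK_OP_END(session); // The exit timestamp is recorded here.
        return (ret);
    }

Routing every early return through a single exit point, as above, is an easy
way to guarantee that each `WT_TRACK_OP_INIT` gets its matching
`WT_TRACK_OP_END`.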
## Will operation tracking introduce runtime overhead?

The answer depends on your workload and on how many functions you track. For
the `wtperf` suite, the largest overhead we observed was 30%, on the CPU-bound
workload `small-btree`. For I/O-bound workloads, such as
`evict-btree-stress-multi`, the overhead is negligible, unless you decide to
track short, frequently executed functions.

# Prerequisites

## Prepare your workload

Modify your benchmark so that it produces the desired behavior in as little
running time as possible. The longer the benchmark runs, the more data it will
produce, and the more difficult it will be to go over all of that data.

In our running example with the `evict-btree-stress-multi.wtperf` workload, we
separate the database creation phase from the work phase. We first create the
database with operation tracking disabled. Then we enable operation tracking
and run the work phase, so operation tracking logs are generated during the
work phase only. We also reduce the work phase to 30 seconds, to make sure
that the volume of information is manageable.

## Install the python modules

To generate the data visualizations, you will need the Python modules bokeh,
matplotlib and pandas installed. You can install them via the following
command (if you don't use pip, adjust it for your Python package manager):

    pip install bokeh matplotlib pandas

## Enable operation tracking

To enable operation tracking, add the following option to your connection
configuration string:

    operation_tracking=(enabled=true,path=.)

For example, if you are running `wtperf`, you would add it to the
`conn_config` string:

    ./wtperf -c conn_config="operation_tracking=(enabled=true,path=.)"

The `path` argument determines where the log files produced by operation
tracking will be stored. Now you can run your workload.
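If you are embedding WiredTiger in your own application rather than running
`wtperf`, the same option goes into the `wiredtiger_open` configuration
string. A minimal sketch, where the `WT_HOME` home directory is illustrative:

    #include <wiredtiger.h>

    int
    main(void)
    {
        WT_CONNECTION *conn;
        int ret;

        // Open a connection with operation tracking enabled; the log
        // files will be written to the current directory.
        if ((ret = wiredtiger_open("WT_HOME", NULL,
              "create,operation_tracking=(enabled=true,path=.)", &conn)) != 0)
            return (ret);

        // ... run the workload ...

        return (conn->close(conn, NULL));
    }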
# Working with log files

## Becoming familiar with the operation tracking log files

After the run is complete, the path directory will contain a number of files
of the following format:

    optrack.00000<pid>.000000000<sid>

and the file

    optrack-map.00000<pid>

where `pid` is the process ID and `sid` is the session ID.

Files of the first kind are the actual log files; the second is the map file.
Log files are generated in a binary format, because that is a lot more
efficient than writing strings, and the map file is needed to decode them.

## Processing the log files

Now the binary log files need to be converted into text format. To do that,
use the script `wt_optrack_decode.py` in the WiredTiger tree. We will refer to
the path to your WiredTiger tree as `WT`. Suppose that the process ID that
generated the operation tracking files is 25660. Then you'd run the decode
script like so:

    % WT/tools/optrack/wt_optrack_decode.py -m optrack-map.0000025660 optrack.0000025660.00000000*

As the script runs, you will see plenty of output reporting progress through
the parsing process. One kind of output you might see is something like this:

    TSC_NSEC ratio parsed: 2.8000

This is the clock-cycles-to-nanoseconds ratio encoded in the log files. In the
example above, we had 2.8 ticks per nanosecond, suggesting that the workload
was run on a 2.8 GHz processor. This ratio is used later to convert the
timestamps, which are recorded in clock cycles for efficiency, to nanoseconds.
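To make the conversion concrete, here is a small, self-contained sketch of the
arithmetic; the raw tick value is made up for illustration, and this is not
code from the tools:

    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
        const double tsc_nsec_ratio = 2.8;   // Ticks per nanosecond.
        const uint64_t raw_ticks = 28000000; // A hypothetical raw timestamp.

        // 28,000,000 ticks / 2.8 ticks-per-ns = 10,000,000 ns, i.e. 10 ms.
        printf("%.0f ns\n", (double)raw_ticks / tsc_nsec_ratio);
        return (0);
    }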
In the end, you should see lots of output files whose names look like the
optrack log files, but with the suffixes `-internal.txt` and `-external.txt`.
The "internal" files are the log files for WT internal sessions (such as
eviction threads). The "external" files are for the sessions created by the
client application.

Now that the files have been decoded, it's time to visualize the data. This
can be done using the script `find-latency-spikes.py` located in the
`tools/optrack` directory of the WiredTiger tree. To process the text files
from our example, you would run this script like so:

    % WT/tools/optrack/find-latency-spikes.py optrack.0000025660.00000000*.txt

As the script runs, you will probably see messages similar to this one:

    Processing file optrack.0000025660.0000000026-external.txt
    Your data may have errors. Check the file .optrack.0000025660.0000000026-external.txt.log for details.

This means that as the script was processing the log file, it found some
inconsistencies: for example, there was a function entry timestamp but no
corresponding function exit, or vice versa. Sometimes these errors are benign.
For example, the log file will never have an exit timestamp from a
session-closing function, because operation tracking is torn down as the
session closes. In other cases, an error may indicate a mistake in how the
operation-tracking macros were inserted. For example, the programmer may have
forgotten to insert a macro where there is an early return from a function.
Examining the log file mentioned in the error message will give you more
details on the error.

The data processing script will not fail or exit if it encounters these
errors; it will proceed despite them, attempting to visualize the portion of
the log that did not contain the errors. The function timestamps implicated
in the errors will simply be dropped.

At the very end of the processing, you will see messages like this:

    Normalizing data...
    Generating timeline charts... 99% complete
    Generating outlier histograms... 100% complete

This is how you know the processing has completed. The script takes a while
to run, but the work is parallelizable, so parallelizing it is one way it
could be sped up in the future.

# Looking at the data: Part 1

In the directory where you ran the data processing script, you will have the
file `WT-outliers.html` and the directory `BUCKET-FILES` to which that HTML
file refers. If you would like to look at the visualization on another
machine, transfer both the HTML file and the directory to that machine.
Otherwise, just open `WT-outliers.html` locally in your browser.

To get an idea of what kind of output you will see, take a look at [this
example data](http://www.ece.ubc.ca/~sasha/TUTORIAL-DEMO/) generated by the
workload described earlier in this tutorial. Please open the URL, because we
will now walk through what it shows.

The main page shows a number of outlier histograms. Each histogram corresponds
to a function that was tracked by the operation tracking system in WT. The
x-axis shows the execution timeline (in nanoseconds). The y-axis shows how
many abnormally long executions of the function occurred during each period
of the execution.

![](http://www.ece.ubc.ca/~sasha/TUTORIAL-DEMO-CONFIG/IMG/outlier_histograms.png)

You can click on outlier bars and look at a detailed visualization of the
period during which the abnormally long function invocations occurred. But
before we do that, let's talk about what is considered an "outlier" or an
"abnormally long execution" and how to control these thresholds. As the
annotations on the histograms show, by default a function invocation is
considered an outlier if its duration is more than two standard deviations
above the average. (The average is computed for this function over the entire
execution.) If you would like more precise control over outlier thresholds,
you can set them in a configuration file that is read by the processing
script.
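In other words, the default rule amounts to the following computation,
sketched here in C purely for illustration (the processing script is written
in Python, and this is not its actual code):

    #include <math.h>
    #include <stddef.h>

    // Sketch of the default outlier rule: an invocation is an outlier if
    // its duration exceeds the function's mean duration by more than two
    // standard deviations, both computed over the entire execution.
    static int
    is_outlier(const double *durations, size_t n, double duration)
    {
        double mean = 0.0, var = 0.0;
        size_t i;

        if (n == 0)
            return (0);
        for (i = 0; i < n; i++)
            mean += durations[i];
        mean /= (double)n;
        for (i = 0; i < n; i++)
            var += (durations[i] - mean) * (durations[i] - mean);
        var /= (double)n;

        return (duration > mean + 2.0 * sqrt(var));
    }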
Before we talk about other visualization screens, let's generate a
visualization with some more meaningful outlier thresholds than two standard
deviations. This way, it will be easier to navigate the data, and we will be
able to observe some very interesting behavior that gives us a clue as to why
eviction is sometimes slow. (If you can't wait to find out, skip over the next
step.)

## Getting more precise control over the outlier thresholds

Let's learn how to configure custom outlier thresholds. The easiest way to
begin is to grab the sample configuration file and edit it to your needs:

    % cp WT/tools/optrack/sample.config .

Here is how that file looks:

    # Units in which timestamps are recorded
    1000000000
    # Outlier threshold for individual functions
    __curfile_reset 100 ms
    __curfile_search 50 ms
    __curfile_update 3 ms
    __evict_lru_pages 75 ms
    __evict_lru_walk 20 ms
    __evict_page 75 ms
    __evict_walk 20 ms
    __wt_cache_eviction_worker 10 ms
    __wt_cond_wait_signal 5 s
    __wt_cursor_set_valuev 10 ms

In that file, all lines prefixed with a `#` are comments.

The first non-comment line specifies the units in which your timestamps are
recorded. The units are specified by providing a value indicating how many
time units there are in a second. By default, the operation tracking scripts
generate timestamps in nanoseconds, hence the value of 1000000000 on the first
line of our sample file. If your time units were, say, milliseconds, you'd
change that value to 1000, and so on.

The remaining non-comment lines provide the outlier thresholds for any
functions you care to give a custom threshold. The format is:

    <function name> <value> <unit>

For example, to specify a threshold of 15 milliseconds for the
`__wt_cache_eviction_worker` function, you'd insert the following line:

    __wt_cache_eviction_worker 15 ms

Valid units are nanoseconds (ns), microseconds (us), milliseconds (ms) and
seconds (s).

If you don't specify a threshold for a function, the default of two standard
deviations will be used. Let's configure some more meaningful outlier
thresholds in our `sample.config` file and re-run the visualization script.
Here is the modified `sample.config`:

    # Units in which timestamps are recorded
    1000000000
    # Outlier threshold for individual functions
    __curfile_search 30 ms
    __evict_lru_pages 250 ms
    __evict_lru_walk 1 ms
    __evict_page 15 ms
    __evict_walk 1 ms
    __wt_cache_eviction_worker 15 ms
    __wt_cond_wait_signal 10 s

We rerun the data processing script with the configuration file argument,
like so:

    % WT/tools/optrack/find-latency-spikes.py -c sample.config optrack.0000025660.00000000*.txt

The new charts can be accessed
[here](http://www.ece.ubc.ca/~sasha/TUTORIAL-DEMO-CONFIG/).

# Looking at the data: Part 2

Let's examine the visualized data in more detail. Please open [this
URL](http://www.ece.ubc.ca/~sasha/TUTORIAL-DEMO-CONFIG/). Although the
screenshots are included here, you will have more fun if you access the data
directly.

Now you can see the outlier charts complying with the configuration
parameters we supplied. For example, on the chart for
`__wt_cache_eviction_worker`, we see only the intervals where that function
took longer than 15 ms to complete.

![](http://www.ece.ubc.ca/~sasha/TUTORIAL-DEMO-CONFIG/IMG/wt_cache_eviction_worker_outliers.png)

Let's click on one of those intervals to examine what happened there. I am
going to click on the tallest bar in the chart, which takes me to a detailed
view of function call activity by thread for interval 137 at [this
URL](http://www.ece.ubc.ca/~sasha/TUTORIAL-DEMO-CONFIG/BUCKET-FILES/bucket-137.html).

There is a lot of graphics there, so the page might take a few seconds to
load.

At the very top of the page you see a small navigation bar. The bar tells you
where you are in the execution (the red marker). By clicking on other
intervals within the bar, you can navigate to them. For example, if you
wanted to look at the execution interval located after the current one, you
would just click on the white bar following the current, red-highlighted one.

![](http://www.ece.ubc.ca/~sasha/TUTORIAL-DEMO-CONFIG/IMG/interval_137_nav_bar.png)

Next you see a legend that shows all the functions that were called during
this execution interval and their corresponding colours.

![](http://www.ece.ubc.ca/~sasha/TUTORIAL-DEMO-CONFIG/IMG/interval_137_lgnd.png)

Then you will see the most interesting information: function calls across
threads that occurred during this period. Durations and the hierarchy of
function calls are preserved, meaning that longer functions are shown with
longer bars, and if function A called function B, B is shown on top of A.

By hovering over the colored bars corresponding to functions, you will see a
box appear telling you the name of the function, how long it took and its
original timestamp in your operation tracking log file. This way, if you are
suspicious about the correctness of your log, or want to go back to it for
some reason, you know what to look for.

In our example visualization, if we scroll down to the function sequences for
external threads, we quickly see a few instances where
`__wt_cache_eviction_worker` took longer than 15 ms, for example here:

![](http://www.ece.ubc.ca/~sasha/TUTORIAL-DEMO-CONFIG/IMG/wt_cache_eviction_worker_over_15ms.png)

As we can see, the threads are simply waiting on the condition variable inside
the eviction worker. To try to understand why, we might want to scroll up and
look at the activity of the internal threads during this period. Interestingly
enough, most of the internal threads are also waiting on the condition
variable during this period.

![](http://www.ece.ubc.ca/~sasha/TUTORIAL-DEMO-CONFIG/IMG/interval_137_threads_waiting.png)

Looking at the internal thread with session id 1 shows something suspicious.

![](http://www.ece.ubc.ca/~sasha/TUTORIAL-DEMO-CONFIG/IMG/session_1_dead_period.png)

During the period when all the threads are waiting, this thread shows no
activity at all, whereas before and after that "dead" period it was making
regular calls to `__evict_walk`. Perhaps that thread was descheduled, or
perhaps it stopped calling `__evict_walk` for other reasons. Perhaps other
threads were dependent on work performed within `__evict_walk` to make
forward progress.

To better understand why `__evict_walk` was interrupted for almost 10 ms, we
might want to go back and insert operation tracking macros inside the
functions called by `__evict_walk`, to see whether the thread was doing some
other work during this time or was simply descheduled.

# Summary

Operation tracking is great for debugging performance problems where you are
interested in understanding the activity of the workload over time and across
threads. To effectively navigate large quantities of data, it is best to
reproduce your performance problem in a short run.

*/