Diffstat (limited to 'src/third_party/wiredtiger/src/docs/devdoc-optrack.dox')
 -rw-r--r--  src/third_party/wiredtiger/src/docs/devdoc-optrack.dox  351
 1 file changed, 351 insertions, 0 deletions
diff --git a/src/third_party/wiredtiger/src/docs/devdoc-optrack.dox b/src/third_party/wiredtiger/src/docs/devdoc-optrack.dox
new file mode 100644
index 00000000000..360b3b6de47
--- /dev/null
+++ b/src/third_party/wiredtiger/src/docs/devdoc-optrack.dox
@@ -0,0 +1,351 @@

/*! @page devdoc-optrack Operation Tracking

# Overview

This tutorial walks you through using operation tracking on one of the
`wtperf` workloads: from preparing the workload for the most effective data
collection, to gathering, visualizing and interpreting the execution logs.

## Why use operation tracking?

Analysis of WiredTiger JIRA tickets revealed that, when diagnosing performance
bugs, engineers rely mostly on information showing which functions were
executed over time and across threads. They look for correlations between
performance anomalies, such as long latencies or periods of low throughput,
and events on these timelines. The purpose of operation tracking is to
visualize the execution so that these correlations can be easily spotted.

## How does operation tracking work?

Operation tracking allows you to put a pair of macros, `WT_TRACK_OP_INIT` and
`WT_TRACK_OP_END`, into any WiredTiger function whose invocation time you want
measured and recorded. As your program runs, the `WT_TRACK_OP_INIT` macro
produces the function entry timestamp and the `WT_TRACK_OP_END` macro produces
the function exit timestamp. These timestamps, along with the function name,
are recorded in a set of log files. It is typical to put `WT_TRACK_OP_INIT` at
the beginning of the function and `WT_TRACK_OP_END` at every exit point from
the function, but you can put them anywhere within the function body. The only
condition required to correctly visualize the logs is that each
`WT_TRACK_OP_INIT` has a matching `WT_TRACK_OP_END`. It is totally okay to
insert tracking macros into nested functions.
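To make this concrete, here is a minimal, illustrative sketch of an
instrumented function. The function and its callee are hypothetical, and the
sketch assumes the macros take the session handle, which is how they are used
in the WiredTiger tree:

    // __example_operation --
    //     A hypothetical internal function instrumented for operation
    //     tracking.
    static int
    __example_operation(WT_SESSION_IMPL *session)
    {
        WT_DECL_RET;

        WT_TRACK_OP_INIT(session); // The entry timestamp is recorded here.

        // __do_real_work stands in for the function's actual body.
        if ((ret = __do_real_work(session)) != 0)
            goto err;

    err:
        WT_TRACK_OP_END(session); // The exit timestamp is recorded here.
        return (ret);
    }

Routing every early return through a single exit point, as above, is an easy
way to guarantee that each `WT_TRACK_OP_INIT` gets its matching
`WT_TRACK_OP_END`.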
## Will operation tracking introduce runtime overhead?

The answer depends on your workload and on how many functions you track. For
the `wtperf` suite, the largest overhead we observed was 30%, on the CPU-bound
workload `small-btree`. For I/O-bound workloads, such as
`evict-btree-stress-multi`, the overhead is negligible, unless you decide to
track short, frequently executed functions.

# Prerequisites

## Prepare your workload

Modify your benchmark so that it produces the desired behavior in as little
running time as possible. The longer the benchmark runs, the more data it will
produce, and the more difficult it will be to go over all of that data.

In our running example with the `evict-btree-stress-multi.wtperf` workload, we
separate the database creation phase from the work phase. We first create the
database with operation tracking disabled. Then we enable operation tracking
and run the work phase, so operation tracking logs are generated during the
work phase only. We also reduce the work phase to 30 seconds, to make sure
that the volume of information is manageable.

## Install the python modules

To generate the data visualizations, you will need the Python modules bokeh,
matplotlib and pandas installed. You can install them via the following
command (if you don't use pip, adjust it for your Python package manager):

    pip install bokeh matplotlib pandas

## Enable operation tracking

To enable operation tracking, add the following option to your connection
configuration string:

    operation_tracking=(enabled=true,path=.)

For example, if you are running `wtperf`, you would add it to the
`conn_config` string:

    ./wtperf -c conn_config="operation_tracking=(enabled=true,path=.)"

The `path` argument determines where the log files produced by operation
tracking will be stored. Now you can run your workload.
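If you are embedding WiredTiger in your own application rather than running
`wtperf`, the same option goes into the `wiredtiger_open` configuration
string. A minimal sketch, where the `WT_HOME` home directory is illustrative:

    #include <wiredtiger.h>

    int
    main(void)
    {
        WT_CONNECTION *conn;
        int ret;

        // Open a connection with operation tracking enabled; the log
        // files will be written to the current directory.
        if ((ret = wiredtiger_open("WT_HOME", NULL,
              "create,operation_tracking=(enabled=true,path=.)", &conn)) != 0)
            return (ret);

        // ... run the workload ...

        return (conn->close(conn, NULL));
    }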
# Working with log files

## Becoming familiar with the operation tracking log files

After the run is complete, the path directory will contain a number of files
of the following format:

    optrack.00000<pid>.000000000<sid>

and the file

    optrack-map.00000<pid>

where `pid` is the process ID and `sid` is the session ID.

Files of the first kind are the actual log files; the second is the map file.
Log files are generated in a binary format, because that is a lot more
efficient than writing strings, and the map file is needed to decode them.

## Processing the log files

Now the binary log files need to be converted into text format. To do that,
use the script `wt_optrack_decode.py` in the WiredTiger tree. We will refer to
the path to your WiredTiger tree as `WT`. Suppose that the process ID that
generated the operation tracking files is 25660. Then you'd run the decode
script like so:

    % WT/tools/optrack/wt_optrack_decode.py -m optrack-map.0000025660 optrack.0000025660.00000000*

As the script runs, you will see plenty of output reporting progress through
the parsing process. One kind of output you might see is something like this:

    TSC_NSEC ratio parsed: 2.8000

This is the clock-cycles-to-nanoseconds ratio encoded in the log files. In the
example above, we had 2.8 ticks per nanosecond, suggesting that the workload
was run on a 2.8 GHz processor. This ratio is used later to convert the
timestamps, which are recorded in clock cycles for efficiency, to nanoseconds.
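To make the conversion concrete, here is a small, self-contained sketch of the
arithmetic; the raw tick value is made up for illustration, and this is not
code from the tools:

    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
        const double tsc_nsec_ratio = 2.8;   // Ticks per nanosecond.
        const uint64_t raw_ticks = 28000000; // A hypothetical raw timestamp.

        // 28,000,000 ticks / 2.8 ticks-per-ns = 10,000,000 ns, i.e. 10 ms.
        printf("%.0f ns\n", (double)raw_ticks / tsc_nsec_ratio);
        return (0);
    }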
In the end, you should see lots of output files whose names look like the
optrack log files, but with the suffixes `-internal.txt` and `-external.txt`.
The "internal" files are the log files for WT internal sessions (such as
eviction threads). The "external" files are for the sessions created by the
client application.

Now that the files have been decoded, it's time to visualize the data. This
can be done using the script `find-latency-spikes.py` located in the
`tools/optrack` directory of the WiredTiger tree. To process the text files
from our example, you would run this script like so:

    % WT/tools/optrack/find-latency-spikes.py optrack.0000025660.00000000*.txt

As the script runs, you will probably see messages similar to this one:

    Processing file optrack.0000025660.0000000026-external.txt
    Your data may have errors. Check the file .optrack.0000025660.0000000026-external.txt.log for details.

This means that as the script was processing the log file, it found some
inconsistencies: for example, there was a function entry timestamp but no
corresponding function exit, or vice versa. Sometimes these errors are benign.
For example, the log file will never have an exit timestamp from a
session-closing function, because operation tracking is torn down as the
session closes. In other cases, an error may indicate a mistake in how the
operation-tracking macros were inserted. For example, the programmer may have
forgotten to insert a macro where there is an early return from a function.
Examining the log file mentioned in the error message will give you more
details on the error.

The data processing script will not fail or exit if it encounters these
errors; it will proceed despite them, attempting to visualize the portion of
the log that did not contain the errors. The function timestamps implicated
in the errors will simply be dropped.

At the very end of the processing, you will see messages like this:

    Normalizing data...
    Generating timeline charts... 99% complete
    Generating outlier histograms... 100% complete

This is how you know the processing has completed. The script takes a while
to run, but the work is parallelizable, so parallelizing it is one way it
could be sped up in the future.

# Looking at the data: Part 1

In the directory where you ran the data processing script, you will have the
file `WT-outliers.html` and the directory `BUCKET-FILES` to which that HTML
file refers. If you would like to look at the visualization on another
machine, transfer both the HTML file and the directory to that machine.
Otherwise, just open `WT-outliers.html` locally in your browser.

To get an idea of what kind of output you will see, take a look at [this
example data](http://www.ece.ubc.ca/~sasha/TUTORIAL-DEMO/) generated by the
workload described earlier in this tutorial. Please open the URL, because we
will now walk through what it shows.

The main page shows a number of outlier histograms. Each histogram corresponds
to a function that was tracked by the operation tracking system in WT. The
x-axis shows the execution timeline (in nanoseconds). The y-axis shows how
many abnormally long executions of the function occurred during each period
of the execution.

![](http://www.ece.ubc.ca/~sasha/TUTORIAL-DEMO-CONFIG/IMG/outlier_histograms.png)

You can click on outlier bars and look at a detailed visualization of the
period during which the abnormally long function invocations occurred. But
before we do that, let's talk about what is considered an "outlier" or an
"abnormally long execution" and how to control these thresholds. As the
annotations on the histograms show, by default a function invocation is
considered an outlier if its duration is more than two standard deviations
above the average. (The average is computed for this function over the entire
execution.) If you would like more precise control over outlier thresholds,
you can set them in a configuration file that is read by the processing
script.
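In other words, the default rule amounts to the following computation,
sketched here in C purely for illustration (the processing script is written
in Python, and this is not its actual code):

    #include <math.h>
    #include <stddef.h>

    // Sketch of the default outlier rule: an invocation is an outlier if
    // its duration exceeds the function's mean duration by more than two
    // standard deviations, both computed over the entire execution.
    static int
    is_outlier(const double *durations, size_t n, double duration)
    {
        double mean = 0.0, var = 0.0;
        size_t i;

        if (n == 0)
            return (0);
        for (i = 0; i < n; i++)
            mean += durations[i];
        mean /= (double)n;
        for (i = 0; i < n; i++)
            var += (durations[i] - mean) * (durations[i] - mean);
        var /= (double)n;

        return (duration > mean + 2.0 * sqrt(var));
    }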
Before we talk about other visualization screens, let's generate a
visualization with some more meaningful outlier thresholds than two standard
deviations. This way, it will be easier to navigate the data, and we will be
able to observe some very interesting behavior that gives us a clue as to why
eviction is sometimes slow. (If you can't wait to find out, skip over the next
step.)

## Getting more precise control over the outlier thresholds

Let's learn how to configure custom outlier thresholds. The easiest way to
begin is to grab the sample configuration file and edit it to your needs:

    % cp WT/tools/optrack/sample.config .

Here is how that file looks:

    # Units in which timestamps are recorded
    1000000000
    # Outlier threshold for individual functions
    __curfile_reset 100 ms
    __curfile_search 50 ms
    __curfile_update 3 ms
    __evict_lru_pages 75 ms
    __evict_lru_walk 20 ms
    __evict_page 75 ms
    __evict_walk 20 ms
    __wt_cache_eviction_worker 10 ms
    __wt_cond_wait_signal 5 s
    __wt_cursor_set_valuev 10 ms

In that file, all lines prefixed with a `#` are comments.

The first non-comment line specifies the units in which your timestamps are
recorded. The units are specified by providing a value indicating how many
time units there are in a second. By default, the operation tracking scripts
generate timestamps in nanoseconds, hence the value of 1000000000 on the first
line of our sample file. If your time units were, say, milliseconds, you'd
change that value to 1000, and so on.

The remaining non-comment lines provide the outlier thresholds for any
functions you care to give a custom threshold. The format is:

    <function name> <value> <unit>

For example, to specify a threshold of 15 milliseconds for the
`__wt_cache_eviction_worker` function, you'd insert the following line:

    __wt_cache_eviction_worker 15 ms

Valid units are nanoseconds (ns), microseconds (us), milliseconds (ms) and
seconds (s).

If you don't specify a threshold for a function, the default of two standard
deviations will be used. Let's configure some more meaningful outlier
thresholds in our `sample.config` file and re-run the visualization script.
Here is the modified `sample.config`:

    # Units in which timestamps are recorded
    1000000000
    # Outlier threshold for individual functions
    __curfile_search 30 ms
    __evict_lru_pages 250 ms
    __evict_lru_walk 1 ms
    __evict_page 15 ms
    __evict_walk 1 ms
    __wt_cache_eviction_worker 15 ms
    __wt_cond_wait_signal 10 s

We rerun the data processing script with the configuration file argument,
like so:

    % WT/tools/optrack/find-latency-spikes.py -c sample.config optrack.0000025660.00000000*.txt

The new charts can be accessed
[here](http://www.ece.ubc.ca/~sasha/TUTORIAL-DEMO-CONFIG/).

# Looking at the data: Part 2

Let's examine the visualized data in more detail. Please open [this
URL](http://www.ece.ubc.ca/~sasha/TUTORIAL-DEMO-CONFIG/). Although the
screenshots are included here, you will have more fun if you access the data
directly.

Now you can see the outlier charts complying with the configuration
parameters we supplied. For example, on the chart for
`__wt_cache_eviction_worker`, we see only the intervals where that function
took longer than 15 ms to complete.

![](http://www.ece.ubc.ca/~sasha/TUTORIAL-DEMO-CONFIG/IMG/wt_cache_eviction_worker_outliers.png)

Let's click on one of those intervals to examine what happened there. I am
going to click on the tallest bar in the chart, which takes me to a detailed
view of function call activity by thread for interval 137 at [this
URL](http://www.ece.ubc.ca/~sasha/TUTORIAL-DEMO-CONFIG/BUCKET-FILES/bucket-137.html).

There is a lot of graphics there, so the page might take a few seconds to
load.

At the very top of the page you see a small navigation bar. The bar tells you
where you are in the execution (the red marker). By clicking on other
intervals within the bar, you can navigate to them. For example, if you
wanted to look at the execution interval located after the current one, you
would just click on the white bar following the current, red-highlighted one.

![](http://www.ece.ubc.ca/~sasha/TUTORIAL-DEMO-CONFIG/IMG/interval_137_nav_bar.png)

Next you see a legend that shows all the functions that were called during
this execution interval and their corresponding colours.

![](http://www.ece.ubc.ca/~sasha/TUTORIAL-DEMO-CONFIG/IMG/interval_137_lgnd.png)

Then you will see the most interesting information: function calls across
threads that occurred during this period. Durations and the hierarchy of
function calls are preserved, meaning that longer functions are shown with
longer bars, and if function A called function B, B is shown on top of A.

By hovering over the colored bars corresponding to functions, you will see a
box appear telling you the name of the function, how long it took and its
original timestamp in your operation tracking log file. This way, if you are
suspicious about the correctness of your log, or want to go back to it for
some reason, you know what to look for.

In our example visualization, if we scroll down to the function sequences for
external threads, we quickly see a few instances where
`__wt_cache_eviction_worker` took longer than 15 ms, for example here:

![](http://www.ece.ubc.ca/~sasha/TUTORIAL-DEMO-CONFIG/IMG/wt_cache_eviction_worker_over_15ms.png)

As we can see, the threads are simply waiting on the condition variable inside
the eviction worker. To try to understand why, we might want to scroll up and
look at the activity of the internal threads during this period. Interestingly
enough, most of the internal threads are also waiting on the condition
variable during this period.

![](http://www.ece.ubc.ca/~sasha/TUTORIAL-DEMO-CONFIG/IMG/interval_137_threads_waiting.png)

Looking at the internal thread with session id 1 shows something suspicious.

![](http://www.ece.ubc.ca/~sasha/TUTORIAL-DEMO-CONFIG/IMG/session_1_dead_period.png)

During the period when all the threads are waiting, this thread shows no
activity at all, whereas before and after that "dead" period it was making
regular calls to `__evict_walk`. Perhaps that thread was descheduled, or
perhaps it stopped calling `__evict_walk` for other reasons. Perhaps other
threads were dependent on work performed within `__evict_walk` to make
forward progress.

To better understand why `__evict_walk` was interrupted for almost 10 ms, we
might want to go back and insert operation tracking macros inside the
functions called by `__evict_walk`, to see whether the thread was doing some
other work during this time or was simply descheduled.

# Summary

Operation tracking is great for debugging performance problems where you are
interested in understanding the activity of the workload over time and across
threads. To effectively navigate large quantities of data, it is best to
reproduce your performance problem in a short run.

*/