summaryrefslogtreecommitdiff
path: root/src/third_party/wiredtiger/src/docs/wtperf.dox
blob: f521103f717516fe3bfb69cac8fd427e4d06ddda (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
/*! @page wtperf Simulating workloads with wtperf

The WiredTiger distribution includes a tool that can be used to simulate
workloads in WiredTiger, in the directory \c bench/wtperf.

The \c wtperf utility generally has two phases, the populate phase which
creates a database and then populates an object in that database, and a
workload phase, that does some set of operations on the object.

For example, the following configuration uses a single thread to
populate a file object with 500,000 records in a 500MB cache.  The
workload phase consists of 8 threads running for two minutes, all
reading from the file.

@code
conn_config="cache_size=500MB"
table_config="type=file"
icount=500000
run_time=120
populate_threads=1
threads=((count=8,reads=1))
@endcode

In most cases, where the workload is the only interesting phase, the
populate phase can be performed once and the workload phase run
repeatedly (for more information, see the wtperf \c create configuration
variable).

The \c conn_config configuration supports setting any WiredTiger
connection configuration value.  This is commonly used to configure
statistics with regular reports, to obtain more information from the
run:

@code
conn_config="cache_size=20G,statistics=(fast,clear),statistics_log=(wait=600)"
report_interval=5
@endcode

Note quoting must be used when passing values to Wiredtiger
configuration, as opposed to configuring the \c wtperf utility itself.

The \c table_config configuration supports setting any WiredTiger object
creation configuration value, for example, the above test can be
converted to using an LSM store instead of a B+tree store, with
additional LSM configuration, by changing \c conn_config to:

@code
table_config="lsm=(chunk_size=5MB),type=lsm,os_cache_dirty_max=16MB"
@endcode

More complex workloads can be configured by creating more threads doing
inserts and updates as well as reads.  For example, to configure two
inserting threads two threads doing a mixture of inserts, reads and
updates:

@code
threads=((count=2,inserts=1),(count=2,inserts=1,reads=1,updates=1))
@endcode

Example \c wtperf configuration files can be found in the
\c bench/wtperf/runners/ directory.

There are also a number of command line arguments that can be passed
to \c wtperf:
@par <code>-C config</code>
Specify configuration strings for the ::wiredtiger_open function.
This argument is additive to the \c conn_config parameter in the
configuration file.
@par <code>-h directory</code>
Specify a database home directory.  The default is \c ./WT_TEST.
@par <code>-m monitor_directory</code>
Specify a directory for all monitoring related files.
The default is the database home directory.
@par <code>-O config_file</code>
Specify the configuration file to run.
@par <code>-o config</code>
Specify configuration strings for the \c wtperf program.
This argument will override settings in the configuration file.
@par <code>-T config</code>
Specify configuration strings for the WT_SESSION::create function.
This argument is additive to the \c table_config parameter in the
configuration file.

@section monitor Monitoring wtperf

Like all WiredTiger applications, the \c wtperf command can be configured
with statistics logging.

In addition to statistics logging, \c wtperf can monitor performance and
operation latency times.  Monitoring is enabled using the \c sample_interval
configuration.  For example to record information every 10 seconds, set the
following on the command line or add it to the \c wtperf configuration file:

@code
sample_interval=10
@endcode

Enabling monitoring causes \c wtperf to create a file \c monitor in the
database home directory (or another directory as specified using the
\c -m option to \c wtperf).

The following example shows how to run the \c medium-btree.wtperf configuration
with monitoring enabled, and then generate a graph.

@code
# Change into the WiredTiger directory.
cd wiredtiger

# Configure and build WiredTiger if not already built.
./configure && make

# Remove and re-create the run directory.
rm -rf WTPERF_RUN && mkdir WTPERF_RUN

# Run the medium-btree.wtperf workload, sampling performance every 5 seconds.
bench/wtperf/wtperf \
    -h WTPERF_RUN \
    -o sample_interval=5 \
    -O bench/wtperf/runners/medium-btree.wtperf
@endcode

@section config Wtperf configuration options

The following is a list of the currently available \c wtperf
configuration options:

\if START_AUTO_GENERATED_WTPERF_CONFIGURATION
DO NOT EDIT: THIS PART OF THE FILE IS GENERATED BY dist/s_docs.
\endif

@par backup_interval (unsigned int, default=0)
backup the database every interval seconds during the workload phase, 0 to disable
@par checkpoint_interval (unsigned int, default=120)
checkpoint every interval seconds during the workload phase.
@par checkpoint_stress_rate (unsigned int, default=0)
checkpoint every rate operations during the populate phase in the populate thread(s), 0 to  disable
@par checkpoint_threads (unsigned int, default=0)
number of checkpoint threads
@par conn_config (string, default="create,statistics=(fast),statistics_log=(json,wait=1)")
connection configuration string
@par close_conn (boolean, default=true)
properly close connection at end of test. Setting to false does not sync data to disk and can  result in lost data after test exits.
@par compact (boolean, default=false)
post-populate compact for LSM merging activity
@par compression (string, default="none")
compression extension.  Allowed configuration values are: 'none', 'lz4', 'snappy', 'zlib',  'zstd'
@par create (boolean, default=true)
do population phase; false to use existing database
@par database_count (unsigned int, default=1)
number of WiredTiger databases to use. Each database will execute the workload using a separate  home directory and complete set of worker threads
@par drop_tables (boolean, default=false)
Whether to drop all tables at the end of the run, and report time taken to do the drop.
@par in_memory (boolean, default=false)
Whether to create the database in-memory.
@par icount (unsigned int, default=5000)
number of records to initially populate. If multiple tables are configured the count is spread  evenly across all tables.
@par max_idle_table_cycle (unsigned int, default=0)
Enable regular create and drop of idle tables. Value is the maximum number of seconds a create  or drop is allowed before aborting or printing a warning based on max_idle_table_cycle_fatal  setting.
@par max_idle_table_cycle_fatal (boolean, default=false)
print warning (false) or abort (true) of max_idle_table_cycle failure.
@par index (boolean, default=false)
Whether to create an index on the value field.
@par insert_rmw (boolean, default=false)
execute a read prior to each insert in workload phase
@par key_sz (unsigned int, default=20)
key size
@par log_partial (boolean, default=false)
perform partial logging on first table only.
@par log_like_table (boolean, default=false)
Append all modification operations to another shared table.
@par min_throughput (unsigned int, default=0)
notify if any throughput measured is less than this amount. Aborts or prints warning based on  min_throughput_fatal setting. Requires sample_interval to be configured
@par min_throughput_fatal (boolean, default=false)
print warning (false) or abort (true) of min_throughput failure.
@par max_latency (unsigned int, default=0)
notify if any latency measured exceeds this number of milliseconds.Aborts or prints warning  based on min_throughput_fatal setting. Requires sample_interval to be configured
@par max_latency_fatal (boolean, default=false)
print warning (false) or abort (true) of max_latency failure.
@par pareto (unsigned int, default=0)
use pareto distribution for random numbers. Zero to disable, otherwise a percentage indicating  how aggressive the distribution should be.
@par populate_ops_per_txn (unsigned int, default=0)
number of operations to group into each transaction in the populate phase, zero for auto-commit
@par populate_threads (unsigned int, default=1)
number of populate threads, 1 for bulk load
@par pre_load_data (boolean, default=false)
Scan all data prior to starting the workload phase to warm the cache
@par random_range (unsigned int, default=0)
if non zero choose a value from within this range as the key for insert operations
@par random_value (boolean, default=false)
generate random content for the value
@par range_partition (boolean, default=false)
partition data by range (vs hash)
@par read_range (unsigned int, default=0)
read a sequential range of keys upon each read operation. This value tells us how many keys  to read each time, or an upper bound on the number of keys read if read_range_random is set.
@par read_range_random (boolean, default=false)
if doing range reads, select the number of keys to read  in a range uniformly at random.
@par readonly (boolean, default=false)
reopen the connection between populate and workload phases in readonly mode.  Requires  reopen_connection turned on (default).  Requires that read be the only workload specified
@par reopen_connection (boolean, default=true)
close and reopen the connection between populate and workload phases
@par report_interval (unsigned int, default=2)
output throughput information every interval seconds, 0 to disable
@par run_ops (unsigned int, default=0)
total insert, modify, read and update workload operations
@par run_time (unsigned int, default=0)
total workload seconds
@par sample_interval (unsigned int, default=0)
performance logging every interval seconds, 0 to disable
@par sample_rate (unsigned int, default=50)
how often the latency of operations is measured. One for every operation, two for every second  operation, three for every third operation etc.
@par scan_icount (unsigned int, default=0)
number of records in scan tables to populate
@par scan_interval (unsigned int, default=0)
scan tables every interval seconds during the workload phase, 0 to disable
@par scan_pct (unsigned int, default=10)
percentage of entire data set scanned, if scan_interval is enabled
@par scan_table_count (unsigned int, default=0)
number of separate tables to be used for scanning. Zero indicates that tables are shared with  other operations
@par select_latest (boolean, default=false)
in workloads that involve inserts and another type of operation, select the recently inserted records with higher probability
@par sess_config (string, default="")
session configuration string
@par session_count_idle (unsigned int, default=0)
number of idle sessions to create. Default 0.
@par table_config (string, default="key_format=S,value_format=S,type=file,exclusive=true,leaf_value_max=64MB,memory_page_max=10m, split_pct=90,checksum=on")
table configuration string
@par table_count (unsigned int, default=1)
number of tables to run operations over. Keys are divided evenly over the tables. Cursors are  held open on all tables. Default 1, maximum 99999.
@par table_count_idle (unsigned int, default=0)
number of tables to create, that won't be populated. Default 0.
@par threads (string, default="")
workload configuration: each 'count' entry is the total number of threads, and the 'insert',  'modify', 'read' and 'update' entries are the ratios of insert, modify, read and update  operations done by each worker thread; If a throttle value is provided each thread will do a  maximum of that number of operations per second; multiple workload configurations may be  specified per threads configuration; for example, a more complex threads configuration might be  'threads=((count=2,reads=1)(count=8,reads=1,inserts=2,updates=1))' which would create 2 threads  doing nothing but reads and 8 threads each doing 50% inserts and 25% reads and updates.   Allowed configuration values are 'count', 'throttle', 'inserts', 'reads', 'read_range',  'modify', 'modify_delta', 'modify_distribute', 'modify_force_update', 'updates',  'update_delta', 'truncate', 'truncate_pct' and 'truncate_count'. There are also behavior  modifiers, supported modifiers are 'ops_per_txn'
@par tiered (string, default="none")
tiered extension.  Allowed configuration values are: 'none', 'dir_store', 's3'
@par tiered_flush_interval (unsigned int, default=0)
Call flush_tier every interval seconds during the workload phase.  We recommend this value be larger than the checkpoint_interval. 0 to disable. The  'tiered_extension' must be set to something other than 'none'.
@par transaction_config (string, default="")
WT_SESSION.begin_transaction configuration string, applied during the populate phase when  populate_ops_per_txn is nonzero
@par table_name (string, default="test")
table name
@par truncate_single_ops (boolean, default=false)
Implement truncate via cursor remove instead of session API
@par value_sz_max (unsigned int, default=1000)
maximum value size when delta updates/modify operations are present. Default disabled
@par value_sz_min (unsigned int, default=1)
minimum value size when delta updates/modify operations are present. Default disabled
@par value_sz (unsigned int, default=100)
value size
@par verbose (unsigned int, default=1)
verbosity
@par warmup (unsigned int, default=0)
How long to run the workload phase before starting measurements

\if STOP_AUTO_GENERATED_WTPERF_CONFIGURATION
DO NOT EDIT: THIS PART OF THE FILE IS GENERATED BY dist/s_docs.
\endif

*/