summaryrefslogtreecommitdiff
path: root/doc/source/config/container_server_config.rst
blob: 1684acfcaf395e74b6e52dbbefe1123bb7181027 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
.. _container-server-config:

------------------------------
Container Server Configuration
------------------------------

This document describes the configuration options available for the container
server. Documentation for other swift configuration options can be found at
:doc:`index`.

An example Container Server configuration can be found at
etc/container-server.conf-sample in the source code repository.

The following configuration sections are available:

* :ref:`[DEFAULT] <container_server_default_options>`
* `[container-server]`_
* `[container-replicator]`_
* `[container-sharder]`_
* `[container-updater]`_
* `[container-auditor]`_

.. _container_server_default_options:

*********
[DEFAULT]
*********

===============================  ==========  ============================================
Option                           Default     Description
-------------------------------  ----------  --------------------------------------------
swift_dir                        /etc/swift  Swift configuration directory
devices                          /srv/node   Parent directory of where devices are mounted
mount_check                      true        Whether or not check if the devices are
                                             mounted to prevent accidentally writing
                                             to the root device
bind_ip                          0.0.0.0     IP Address for server to bind to
bind_port                        6201        Port for server to bind to
keep_idle                        600         Value to set for socket TCP_KEEPIDLE
bind_timeout                     30          Seconds to attempt bind before giving up
backlog                          4096        Maximum number of allowed pending
                                             connections
workers                          auto        Override the number of pre-forked workers
                                             that will accept connections.  If set it
                                             should be an integer, zero means no fork.  If
                                             unset, it will try to default to the number
                                             of effective cpu cores and fallback to one.
                                             Increasing the number of workers may reduce
                                             the possibility of slow file system
                                             operations in one request from negatively
                                             impacting other requests.  See
                                             :ref:`general-service-tuning`.
max_clients                      1024        Maximum number of clients one worker can
                                             process simultaneously (it will actually
                                             accept(2) N + 1). Setting this to one (1)
                                             will only handle one request at a time,
                                             without accepting another request
                                             concurrently.
user                             swift       User to run as
disable_fallocate                false       Disable "fast fail" fallocate checks if the
                                             underlying filesystem does not support it.
log_name                         swift       Label used when logging
log_facility                     LOG_LOCAL0  Syslog log facility
log_level                        INFO        Logging level
log_address                      /dev/log    Logging directory
log_max_line_length              0           Caps the length of log lines to the
                                             value given; no limit if set to 0, the
                                             default.
log_custom_handlers              None        Comma-separated list of functions to call
                                             to setup custom log handlers.
log_udp_host                                 Override log_address
log_udp_port                     514         UDP log port
log_statsd_host                  None        Enables StatsD logging; IPv4/IPv6
                                             address or a hostname.  If a
                                             hostname resolves to an IPv4 and IPv6
                                             address, the IPv4 address will be
                                             used.
log_statsd_port                  8125
log_statsd_default_sample_rate   1.0
log_statsd_sample_rate_factor    1.0
log_statsd_metric_prefix
eventlet_debug                   false       If true, turn on debug logging for eventlet
fallocate_reserve                1%          You can set fallocate_reserve to the
                                             number of bytes or percentage of disk
                                             space you'd like fallocate to reserve,
                                             whether there is space for the given
                                             file size or not. Percentage will be used
                                             if the value ends with a '%'. This is
                                             useful for systems that behave badly when
                                             they completely run out of space; you can
                                             make the services pretend they're out of
                                             space early.
db_preallocation                 off         If you don't mind the extra disk space usage
                                             in overhead, you can turn this on to preallocate
                                             disk space with SQLite databases to decrease
                                             fragmentation.
nice_priority                    None        Scheduling priority of server processes.
                                             Niceness values range from -20 (most
                                             favorable to the process) to 19 (least
                                             favorable to the process). The default
                                             does not modify priority.
ionice_class                     None        I/O scheduling class of server processes.
                                             I/O niceness class values are IOPRIO_CLASS_RT
                                             (realtime), IOPRIO_CLASS_BE (best-effort),
                                             and IOPRIO_CLASS_IDLE (idle).
                                             The default does not modify class and
                                             priority. Linux supports io scheduling
                                             priorities and classes since 2.6.13
                                             with the CFQ io scheduler.
                                             Work only with ionice_priority.
ionice_priority                  None        I/O scheduling priority of server processes.
                                             I/O niceness priority is a number which
                                             goes from 0 to 7. The higher the value,
                                             the lower the I/O priority of the process.
                                             Work only with ionice_class.
                                             Ignored if IOPRIO_CLASS_IDLE is set.
===============================  ==========  ============================================

******************
[container-server]
******************

==============================  ================  ========================================
Option                          Default           Description
------------------------------  ----------------  ----------------------------------------
use                                               paste.deploy entry point for the
                                                  container server.  For most cases, this
                                                  should be ``egg:swift#container``.
set log_name                    container-server  Label used when logging
set log_facility                LOG_LOCAL0        Syslog log facility
set log_level                   INFO              Logging level
set log_requests                True              Whether or not to log each
                                                  request
set log_address                 /dev/log          Logging directory
node_timeout                    3                 Request timeout to external services
conn_timeout                    0.5               Connection timeout to external services
allow_versions                  false             Enable/Disable object versioning feature
replication_server                                Configure parameter for creating
                                                  specific server. To handle all verbs,
                                                  including replication verbs, do not
                                                  specify "replication_server"
                                                  (this is the default). To only
                                                  handle replication, set to a True
                                                  value (e.g. "True" or "1").
                                                  To handle only non-replication
                                                  verbs, set to "False". Unless you
                                                  have a separate replication network, you
                                                  should not specify any value for
                                                  "replication_server".
nice_priority                   None              Scheduling priority of server processes.
                                                  Niceness values range from -20 (most
                                                  favorable to the process) to 19 (least
                                                  favorable to the process). The default
                                                  does not modify priority.
ionice_class                    None              I/O scheduling class of server processes.
                                                  I/O niceness class values are
                                                  IOPRIO_CLASS_RT (realtime),
                                                  IOPRIO_CLASS_BE (best-effort),
                                                  and IOPRIO_CLASS_IDLE (idle).
                                                  The default does not modify class and
                                                  priority. Linux supports io scheduling
                                                  priorities and classes since 2.6.13 with
                                                  the CFQ io scheduler.
                                                  Work only with ionice_priority.
ionice_priority                 None              I/O scheduling priority of server
                                                  processes. I/O niceness priority is
                                                  a number which goes from 0 to 7.
                                                  The higher the value, the lower the I/O
                                                  priority of the process. Work only with
                                                  ionice_class.
                                                  Ignored if IOPRIO_CLASS_IDLE is set.
==============================  ================  ========================================

**********************
[container-replicator]
**********************

==================== ===========================  =============================
Option               Default                      Description
-------------------- ---------------------------  -----------------------------
log_name             container-replicator         Label used when logging
log_facility         LOG_LOCAL0                   Syslog log facility
log_level            INFO                         Logging level
log_address          /dev/log                     Logging directory
per_diff             1000                         Maximum number of database
                                                  rows that will be sync'd in a
                                                  single HTTP replication
                                                  request. Databases with less
                                                  than or equal to this number
                                                  of differing rows will always
                                                  be sync'd using an HTTP
                                                  replication request rather
                                                  than using rsync.
max_diffs            100                          Maximum number of HTTP
                                                  replication requests attempted
                                                  on each replication pass for
                                                  any one container. This caps
                                                  how long the replicator will
                                                  spend trying to sync a given
                                                  database per pass so the other
                                                  databases don't get starved.
concurrency          8                            Number of replication workers
                                                  to spawn
interval             30                           Time in seconds to wait
                                                  between replication passes
databases_per_second 50                           Maximum databases to process
                                                  per second.  Should be tuned
                                                  according to individual
                                                  system specs.  0 is unlimited.
node_timeout         10                           Request timeout to external
                                                  services
conn_timeout         0.5                          Connection timeout to external
                                                  services
reclaim_age          604800                       Time elapsed in seconds before
                                                  a container can be reclaimed
rsync_module         {replication_ip}::container  Format of the rsync module
                                                  where the replicator will send
                                                  data. The configuration value
                                                  can include some variables
                                                  that will be extracted from
                                                  the ring. Variables must
                                                  follow the format {NAME} where
                                                  NAME is one of: ip, port,
                                                  replication_ip,
                                                  replication_port, region,
                                                  zone, device, meta. See
                                                  etc/rsyncd.conf-sample for
                                                  some examples.
rsync_compress       no                           Allow rsync to compress data
                                                  which is transmitted to
                                                  destination node during sync.
                                                  However, this is applicable
                                                  only when destination node is
                                                  in a different region than the
                                                  local one. NOTE: Objects that
                                                  are already compressed (for
                                                  example: .tar.gz, mp3) might
                                                  slow down the syncing process.
recon_cache_path     /var/cache/swift             Path to recon cache
nice_priority        None                         Scheduling priority of server
                                                  processes. Niceness values
                                                  range from -20 (most favorable
                                                  to the process) to 19 (least
                                                  favorable to the process).
                                                  The default does not modify
                                                  priority.
ionice_class         None                         I/O scheduling class of server
                                                  processes. I/O niceness class
                                                  values are
                                                  IOPRIO_CLASS_RT (realtime),
                                                  IOPRIO_CLASS_BE (best-effort),
                                                  and IOPRIO_CLASS_IDLE (idle).
                                                  The default does not modify
                                                  class and priority. Linux
                                                  supports io scheduling
                                                  priorities and classes since
                                                  2.6.13 with the CFQ io
                                                  scheduler.
                                                  Work only with ionice_priority.
ionice_priority      None                         I/O scheduling priority of
                                                  server processes. I/O niceness
                                                  priority is a number which goes
                                                  from 0 to 7.
                                                  The higher the value, the lower
                                                  the I/O priority of the process.
                                                  Work only with ionice_class.
                                                  Ignored if IOPRIO_CLASS_IDLE
                                                  is set.
==================== ===========================  =============================

*******************
[container-sharder]
*******************

The container-sharder re-uses features of the container-replicator and inherits
the following configuration options defined for the `[container-replicator]`_:

* interval
* databases_per_second
* per_diff
* max_diffs
* concurrency
* node_timeout
* conn_timeout
* reclaim_age
* rsync_compress
* rsync_module
* recon_cache_path

Some config options in this section may also be used by the
:ref:`swift-manage-shard-ranges CLI tool <swift-manage-shard-ranges>`.

================================= =================  =======================================
Option                            Default            Description
--------------------------------- -----------------  ---------------------------------------
log_name                          container-sharder  Label used when logging
log_facility                      LOG_LOCAL0         Syslog log facility
log_level                         INFO               Logging level
log_address                       /dev/log           Logging directory


auto_shard                        false               If the auto_shard option
                                                      is true then the sharder
                                                      will automatically select
                                                      containers to shard, scan
                                                      for shard ranges, and
                                                      select shards to shrink.
                                                      Warning: auto-sharding is
                                                      still under development
                                                      and should not be used in
                                                      production; do not set
                                                      this option to true in a
                                                      production cluster.

shard_container_threshold         1000000             This defines the
                                                      object count at which a
                                                      container with
                                                      container-sharding
                                                      enabled will start to
                                                      shard. This also
                                                      indirectly determines the
                                                      the defaults for
                                                      rows_per_shard,
                                                      shrink_threshold and
                                                      expansion_limit.

rows_per_shard                    500000              This defines the initial
                                                      nominal size of shard
                                                      containers. The default
                                                      is shard_container_threshold // 2.

shrink_threshold                                      This defines the
                                                      object count below which
                                                      a 'donor' shard container
                                                      will be considered for
                                                      shrinking into another
                                                      'acceptor' shard
                                                      container. The default is
                                                      determined by
                                                      shard_shrink_point. If
                                                      set, shrink_threshold
                                                      will take precedence over
                                                      shard_shrink_point.

shard_shrink_point                10                  Deprecated: shrink_threshold
                                                      is recommended and if set
                                                      will take precedence over
                                                      shard_shrink_point.
                                                      This defines the
                                                      object count below which
                                                      a 'donor' shard container
                                                      will be considered for
                                                      shrinking into another
                                                      'acceptor' shard
                                                      container.
                                                      shard_shrink_point is a
                                                      percentage of
                                                      shard_container_threshold
                                                      e.g. the default value of
                                                      10 means 10% of the
                                                      shard_container_threshold.

expansion_limit                                       This defines the
                                                      maximum allowed size of
                                                      an acceptor shard
                                                      container after having a
                                                      donor merged into it. The
                                                      default is determined by
                                                      shard_shrink_merge_point.
                                                      If set, expansion_limit
                                                      will take precedence over
                                                      shard_shrink_merge_point.

shard_shrink_merge_point          75                  Deprecated: expansion_limit
                                                      is recommended and if set
                                                      will take precedence over
                                                      shard_shrink_merge_point.
                                                      This defines the
                                                      maximum allowed size of
                                                      an acceptor shard
                                                      container after having a
                                                      donor merged into it.
                                                      Shard_shrink_merge_point
                                                      is a percentage of
                                                      shard_container_threshold.
                                                      e.g. the default value of
                                                      75 means that the
                                                      projected sum of a donor
                                                      object count and acceptor
                                                      count must be less than
                                                      75% of shard_container_threshold
                                                      for the donor to be
                                                      allowed to merge into the
                                                      acceptor.

                                                      For example, if
                                                      shard_container_threshold
                                                      is 1 million,
                                                      shard_shrink_point is 10,
                                                      and shard_shrink_merge_point
                                                      is 75 then a shard will
                                                      be considered for
                                                      shrinking if it has less
                                                      than or equal to 100
                                                      thousand objects but will
                                                      only merge into an
                                                      acceptor if the combined
                                                      object count would be
                                                      less than or equal to 750
                                                      thousand objects.


shard_scanner_batch_size          10                  When auto-sharding is
                                                      enabled this defines the
                                                      maximum number of shard
                                                      ranges that will be found
                                                      each time the sharder
                                                      daemon visits a sharding
                                                      container. If necessary
                                                      the sharder daemon will
                                                      continue to search for
                                                      more shard ranges each
                                                      time it visits the
                                                      container.

cleave_batch_size                 2                   Defines the number of
                                                      shard ranges that will be
                                                      cleaved each time the
                                                      sharder daemon visits a
                                                      sharding container.

cleave_row_batch_size             10000               Defines the size of
                                                      batches of object rows
                                                      read from a sharding
                                                      container and merged to a
                                                      shard container during
                                                      cleaving.

shard_replication_quorum          auto                Defines the number of
                                                      successfully replicated
                                                      shard dbs required when
                                                      cleaving a previously
                                                      uncleaved shard range
                                                      before the sharder will
                                                      progress to the next
                                                      shard range. The value
                                                      should be less than or
                                                      equal to the container
                                                      ring replica count. The
                                                      default of 'auto' causes
                                                      the container ring quorum
                                                      value to be used. This
                                                      option only applies to
                                                      the container-sharder
                                                      replication and does not
                                                      affect the number of
                                                      shard container replicas
                                                      that will eventually be
                                                      replicated by the
                                                      container-replicator.


existing_shard_replication_quorum auto                Defines the number of
                                                      successfully replicated
                                                      shard dbs required when
                                                      cleaving a shard range
                                                      that has been previously
                                                      cleaved on another node
                                                      before the sharder will
                                                      progress to the next
                                                      shard range. The value
                                                      should be less than or
                                                      equal to the container
                                                      ring replica count. The
                                                      default of 'auto' causes
                                                      the shard_replication_quorum
                                                      value to be used. This
                                                      option only applies to
                                                      the container-sharder
                                                      replication and does not
                                                      affect the number of
                                                      shard container replicas
                                                      that will eventually be
                                                      replicated by the
                                                      container-replicator.

internal_client_conf_path         see description     The sharder uses an
                                                      internal client to create
                                                      and make requests to
                                                      containers. The absolute
                                                      path to the client config
                                                      file can be configured.
                                                      Defaults to
                                                      /etc/swift/internal-client.conf

request_tries                     3                   The number of time the
                                                      internal client will
                                                      retry requests.

recon_candidates_limit            5                   Each time the sharder
                                                      dumps stats to the recon
                                                      cache file it includes a
                                                      list of containers that
                                                      appear to need sharding
                                                      but are not yet sharding.
                                                      By default this list is
                                                      limited to the top 5
                                                      containers, ordered by
                                                      object count. The limit
                                                      may be changed by setting
                                                      recon_candidates_limit to
                                                      an integer value. A
                                                      negative value implies no
                                                      limit.

broker_timeout                    60                  Large databases tend to
                                                      take a while to work
                                                      with, but we want to make
                                                      sure we write down our
                                                      progress. Use a
                                                      larger-than-normal broker
                                                      timeout to make us less
                                                      likely to bomb out on a
                                                      LockTimeout.
================================= =================  =======================================

*******************
[container-updater]
*******************

========================  =================  ==================================
Option                    Default            Description
------------------------  -----------------  ----------------------------------
log_name                  container-updater  Label used when logging
log_facility              LOG_LOCAL0         Syslog log facility
log_level                 INFO               Logging level
log_address               /dev/log           Logging directory
interval                  300                Minimum time for a pass to take
concurrency               4                  Number of updater workers to spawn
node_timeout              3                  Request timeout to external
                                             services
conn_timeout              0.5                Connection timeout to external
                                             services
containers_per_second     50                 Maximum containers updated per second.
                                             Should be tuned according to individual
                                             system specs. 0 is unlimited.

slowdown                  0.01               Time in seconds to wait between
                                             containers. Deprecated in favor of
                                             containers_per_second.
account_suppression_time  60                 Seconds to suppress updating an
                                             account that has generated an
                                             error (timeout, not yet found,
                                             etc.)
recon_cache_path          /var/cache/swift   Path to recon cache
nice_priority             None               Scheduling priority of server
                                             processes. Niceness values range
                                             from -20 (most favorable to the
                                             process) to 19 (least favorable
                                             to the process). The default does
                                             not modify priority.
ionice_class              None               I/O scheduling class of server
                                             processes. I/O niceness class
                                             values are IOPRIO_CLASS_RT (realtime),
                                             IOPRIO_CLASS_BE (best-effort),
                                             and IOPRIO_CLASS_IDLE (idle).
                                             The default does not modify class and
                                             priority. Linux supports io scheduling
                                             priorities and classes since 2.6.13 with
                                             the CFQ io scheduler.
                                             Work only with ionice_priority.
ionice_priority           None               I/O scheduling priority of server
                                             processes. I/O niceness priority is
                                             a number which goes from 0 to 7.
                                             The higher the value, the lower
                                             the I/O priority of the process.
                                             Work only with ionice_class.
                                             Ignored if IOPRIO_CLASS_IDLE is set.
========================  =================  ==================================

*******************
[container-auditor]
*******************

=====================  =================  =======================================
Option                 Default            Description
---------------------  -----------------  ---------------------------------------
log_name               container-auditor  Label used when logging
log_facility           LOG_LOCAL0         Syslog log facility
log_level              INFO               Logging level
log_address            /dev/log           Logging directory
interval               1800               Minimum time for a pass to take
containers_per_second  200                Maximum containers audited per second.
                                          Should be tuned according to individual
                                          system specs. 0 is unlimited.
recon_cache_path       /var/cache/swift   Path to recon cache
nice_priority          None               Scheduling priority of server processes.
                                          Niceness values range from -20 (most
                                          favorable to the process) to 19 (least
                                          favorable to the process). The default
                                          does not modify priority.
ionice_class           None               I/O scheduling class of server processes.
                                          I/O niceness class values are
                                          IOPRIO_CLASS_RT (realtime),
                                          IOPRIO_CLASS_BE (best-effort),
                                          and IOPRIO_CLASS_IDLE (idle).
                                          The default does not modify class and
                                          priority. Linux supports io scheduling
                                          priorities and classes since 2.6.13 with
                                          the CFQ io scheduler.
                                          Work only with ionice_priority.
ionice_priority        None               I/O scheduling priority of server
                                          processes. I/O niceness priority is
                                          a number which goes from 0 to 7.
                                          The higher the value, the lower the I/O
                                          priority of the process. Work only with
                                          ionice_class.
                                          Ignored if IOPRIO_CLASS_IDLE is set.
=====================  =================  =======================================