author | Nick Vatamaniuc <vatamane@apache.org> | 2016-11-25 11:47:56 -0500 |
---|---|---|
committer | Nick Vatamaniuc <vatamane@apache.org> | 2017-04-28 17:35:50 -0400 |
commit | 4841774575fb5771a245e5f046a26eaa7ac8dbb4 (patch) | |
tree | 67d23b5b1e97bcb8d5f07798303c833e291460d2 /dev | |
parent | dcfa0902b62613f6c17062d756aa2c6e68871cfa (diff) | |
Stitch scheduling replicator together.
Glue together all the scheduling replicator pieces.
Scheduler is the main component. It can run a large number of replication jobs
by switching between them, stopping and starting some periodically. Jobs
which fail are backed off exponentially. Normal (non-continuous) jobs will be
allowed to run to completion to preserve their current semantics.
Scheduler behavior can be configured with these options in the
`[replicator]` section:
* `max_jobs` : Number of actively running replications. Making this too high
could cause performance issues. Making it too low could mean replication jobs
might not have enough time to make progress before getting unscheduled again.
This parameter can be adjusted at runtime and will take effect during the next
rescheduling cycle.
* `interval` : Scheduling interval in milliseconds. During each reschedule
cycle the scheduler might start or stop up to `max_churn` jobs.
* `max_churn` : Maximum number of replications to start and stop during
rescheduling. This parameter along with `interval` defines the rate of job
replacement. During startup, however, a much larger number of jobs could be
started (up to `max_jobs`) in a short period of time.
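As a rough illustration of how `max_jobs` and `max_churn` interact, one
rescheduling cycle could be sketched in Python as below. The function name
`plan_cycle`, the list ordering and the swap policy are assumptions made for
this example. The real scheduler is written in Erlang, lets normal
(non-continuous) jobs run to completion, and may start more jobs at startup.

```python
def plan_cycle(running, pending, max_jobs, max_churn):
    """Plan one reschedule cycle; returns (to_stop, to_start).

    running: job ids currently running, longest-running first (assumed order).
    pending: job ids waiting to run, longest-waiting first (assumed order).
    """
    # Free capacity can be filled directly; no running job has to stop for it.
    free = max(max_jobs - len(running), 0)
    fill = pending[:min(free, max_churn)]
    waiting = pending[len(fill):]

    # Any churn budget left is spent swapping the longest-running jobs
    # for the longest-waiting ones.
    swaps = min(max_churn - len(fill), len(waiting), len(running))
    to_stop = running[:swaps]
    to_start = fill + waiting[:swaps]
    return to_stop, to_start


if __name__ == "__main__":
    # At capacity (3 of 3 slots used), so up to max_churn jobs are swapped.
    print(plan_cycle(["a", "b", "c"], ["d", "e", "f"], max_jobs=3, max_churn=2))
```

In this sketch a smaller `max_churn` or a longer `interval` both lower the
rate at which waiting jobs replace running ones.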
Replication jobs are added to the scheduler by the document processor or by
the `couch_replicator:replicate/2` function when it is called from the
`_replicate` HTTP endpoint handler.
The document processor listens for updates via the couch_multidb_changes
module, then tries to add replication jobs to the scheduler. Sometimes
translating a document update to a replication job could fail, either
permanently (for example, if the document is malformed and missing some
expected fields) or temporarily (if it is a filtered replication and the
filter cannot be fetched). A failed filter fetch will be retried with an
exponential backoff.
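The exponential backoff mentioned above can be sketched as a delay that
doubles with each consecutive failure, up to a cap. The function name
`backoff_delay` and the `base_sec` and `max_sec` values are hypothetical and
not taken from this commit.

```python
def backoff_delay(consecutive_failures, base_sec=5, max_sec=3600):
    """Seconds to wait before the next retry (illustrative values only)."""
    if consecutive_failures <= 0:
        return 0
    # Double the delay for every failure after the first, then cap it.
    return min(base_sec * 2 ** (consecutive_failures - 1), max_sec)


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, backoff_delay(n))
```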
couch_replicator_clustering is in charge of monitoring cluster membership
changes. When membership changes, after a configurable quiet period, a rescan
will be initiated. The rescan will shuffle replication jobs to make sure each
replication job is running on only one node.
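One way to make every node agree on a single owner for a job is to hash the
job id over the sorted list of live nodes. The sketch below shows only that
general idea. The `owner` function and the hashing scheme are assumptions for
illustration and are not how this commit assigns jobs to nodes.

```python
import hashlib


def owner(job_id, nodes):
    """Pick one node for job_id; every node computes the same answer."""
    live = sorted(nodes)  # sort so node ordering does not affect the result
    digest = hashlib.sha256(job_id.encode("utf-8")).digest()
    return live[int.from_bytes(digest[:8], "big") % len(live)]


def should_run_here(job_id, this_node, nodes):
    # After a membership change, each node re-evaluates this during a rescan.
    return owner(job_id, nodes) == this_node


if __name__ == "__main__":
    nodes = ["node1@127.0.0.1", "node2@127.0.0.1", "node3@127.0.0.1"]
    print(owner("example-job-id", nodes))
```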
A new set of stats was added to introspect scheduler and doc processor
internals.
The top replication supervisor uses the `rest_for_one` restart strategy. This
means if a child crashes, all children to the "right" of it will be restarted
(if the supervisor hierarchy is visualized as an upside-down tree).
Clustering, connection pool and rate limiter are towards the "left" as they
are more fundamental; if the clustering child crashes, most other components
will be restarted. Doc processor and multi-db changes children are towards the
"right". If they crash, they can be safely restarted without affecting already
running replications or components like clustering or connection pool.
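The effect of `rest_for_one` can be modelled in a few lines of Python: the
crashed child and every child started after it are restarted, while children
started before it are left alone. The child names and their exact order below
are assumptions based on the description above, in particular where the
scheduler sits.

```python
# Hypothetical start order, "left" to "right".
CHILDREN = [
    "clustering",
    "connection_pool",
    "rate_limiter",
    "scheduler",
    "doc_processor",
    "multidb_changes",
]


def restarted_on_crash(children, crashed):
    """Children restarted under rest_for_one when `crashed` terminates."""
    idx = children.index(crashed)
    return children[idx:]


if __name__ == "__main__":
    print(restarted_on_crash(CHILDREN, "doc_processor"))
    print(restarted_on_crash(CHILDREN, "clustering"))
```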
Jira: COUCHDB-3324
Diffstat (limited to 'dev')
-rwxr-xr-x | dev/run | 14 |
1 files changed, 14 insertions, 0 deletions
@@ -125,6 +125,8 @@ def setup_argparse():
                       help='HAProxy port')
     parser.add_option('--node-number', dest="node_number", type=int, default=1,
                       help='The node number to seed them when creating the node(s)')
+    parser.add_option('-c', '--config-overrides', action="append", default=[],
+                      help='Optional key=val config overrides. Can be repeated')
     return parser.parse_args()


@@ -143,6 +145,7 @@ def setup_context(opts, args):
             'with_haproxy': opts.with_haproxy,
             'haproxy': opts.haproxy,
             'haproxy_port': opts.haproxy_port,
+            'config_overrides': opts.config_overrides,
             'procs': []}


@@ -190,6 +193,16 @@ def setup_configs(ctx):
         write_config(ctx, node, env)


+def apply_config_overrides(ctx, content):
+    for kv_str in ctx['config_overrides']:
+        key, val = kv_str.split('=')
+        key, val = key.strip(), val.strip()
+        match = "[;=]{0,2}%s.*" % key
+        repl = "%s = %s" % (key, val)
+        content = re.sub(match, repl, content)
+    return content
+
+
 def get_ports(idnode):
     assert idnode
     return ((10000 * idnode) + 5984, (10000 * idnode) + 5986)
@@ -214,6 +227,7 @@ def write_config(ctx, node, env):

         if base == "default.ini":
             content = hack_default_ini(ctx, node, content)
+            content = apply_config_overrides(ctx, content)
         elif base == "local.ini":
             content = hack_local_ini(ctx, content)
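The override helper in the diff rewrites `key = value` lines in the generated
`default.ini`, including commented-out ones. A stricter standalone variant is
sketched below for illustration. It is not the code from this commit:
`apply_overrides_strict` is a hypothetical name, it takes the override list
directly instead of `ctx`, splits on the first `=` only, escapes the key, and
anchors the match to whole lines.

```python
import re


def apply_overrides_strict(overrides, content):
    """Apply "key=val" overrides to ini text (illustrative variant)."""
    for kv_str in overrides:
        # Split on the first "=" only, so values may themselves contain "=".
        key, val = kv_str.split("=", 1)
        key, val = key.strip(), val.strip()
        # Match a whole "key = ..." line, optionally commented out with ";".
        pattern = r"^[; \t]*%s[ \t]*=.*$" % re.escape(key)
        replacement = "%s = %s" % (key, val)
        # A function replacement avoids backslash handling in the value.
        content = re.sub(pattern, lambda m: replacement, content,
                         flags=re.MULTILINE)
    return content


if __name__ == "__main__":
    ini = "[replicator]\n;max_jobs = 500\ninterval = 60000\n"
    print(apply_overrides_strict(["max_jobs=10", "interval = 5000"], ini))
```

With the option added in this commit, such overrides would be passed by
repeating `-c` (or `--config-overrides`) on the `dev/run` command line, once
per `key=val` pair.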