path: root/etc
Commits are listed as: commit message  (Author, Date; Files changed, Lines -removed/+added)
* wsgi: Add keepalive_timeout option  (Tim Burke, 2023-04-18; 1 file, -0/+6)

    Clients sometimes hold open connections "just in case" they might later
    pipeline requests. This can cause issues for proxies, especially if
    operators restrict max_clients in an effort to improve response times for
    the requests that *do* get serviced.

    Add a new keepalive_timeout option to give proxies a way to drop these
    established-but-idle connections without impacting active connections (as
    may happen when reducing client_timeout). Note that this requires
    eventlet 0.33.4 or later.

    Change-Id: Ib5bb84fa3f8a4b9c062d58c8d3689e7030d9feb3
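
    A minimal sketch of how this might look in proxy-server.conf; the option
    name comes from the commit, while the section placement and value shown
    are assumptions:

        [DEFAULT]
        # Drop keep-alive connections that have been idle this many seconds.
        keepalive_timeout = 5
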
* Merge "internal_client: Remove allow_modify_pipeline option"Zuul2023-04-141-0/+1
|\
| * internal_client: Remove allow_modify_pipeline option  (Matthew Oliver, 2023-04-14; 1 file, -0/+1)

    The internal client is supposed to be internal to the cluster, and as
    such we rely on it to not remove any headers we decide to send. However,
    if the allow_modify_pipeline option is set, the gatekeeper middleware is
    added to the internal client's proxy pipeline.

    So firstly, this patch removes the allow_modify_pipeline option from the
    internal client constructor, and when calling loadapp,
    allow_modify_pipeline is always passed as False.

    Further, an op could directly put the gatekeeper middleware into the
    internal client config. The internal client constructor will now check
    the pipeline and raise a ValueError if one has been placed in the
    pipeline. To do this, there is now a check_gatekeeper_loaded staticmethod
    that walks the pipeline and is called from the InternalClient.__init__
    method. To enable walking the pipeline, we now stash the wsgi pipeline in
    each filter so that we don't have to rely on 'app' naming conventions to
    iterate the pipeline.

    Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
    Change-Id: Idcca7ac0796935c8883de9084d612d64159d9f92
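
    For illustration, a minimal internal-client.conf pipeline without
    gatekeeper, along the lines of the shipped sample (treat the exact filter
    set as an assumption):

        [pipeline:main]
        # Note: no gatekeeper here; InternalClient now refuses to load one.
        pipeline = catch_errors proxy-logging cache proxy-server
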
* | quotas: Add account-level per-policy quotas  (Tim Burke, 2023-03-21; 1 file, -2/+2)

    Reseller admins can set new headers on accounts like

        X-Account-Quota-Bytes-Policy-<policy-name>: <quota>

    This may be done to limit consumption of a faster, all-flash policy, for
    example.

    This is independent of the existing X-Account-Meta-Quota-Bytes header,
    which continues to limit the total storage for an account across all
    policies.

    Change-Id: Ib25c2f667e5b81301f8c67375644981a13487cfe
* Merge "slo: Default allow_async_delete to true"Zuul2022-12-011-1/+1
|\
| * slo: Default allow_async_delete to true  (Tim Burke, 2021-12-21; 1 file, -1/+1)

    We've had this option for a year now, and it seems to help. Let's enable
    it for everyone. Note that Swift clients still need to opt into the async
    delete via a query param, while S3 clients get it for free.

    Change-Id: Ib4164f877908b855ce354cc722d9cb0be8be9921
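
    As a sketch, the corresponding proxy-server.conf knob (option name from
    the commit; section contents shown are otherwise typical defaults):

        [filter:slo]
        use = egg:swift#slo
        # Now defaults to true; set to false to restore the old behaviour.
        allow_async_delete = true
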
* | Sharder: warn when sharding appears to have stalled.  (Jianjian Huo, 2022-10-14; 1 file, -0/+6)

    This patch adds a configurable timeout after which the sharder will warn
    if a container DB has not completed sharding.

    The new config is container_sharding_timeout with a default of 172800
    seconds (2 days).

    Drive-by fix: recording sharding progress will cover the case of shard
    range shrinking too.

    Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
    Change-Id: I6ce299b5232a8f394e35f148317f9e08208a0c0f
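
    A minimal sketch for container-server.conf (option name and default from
    the commit; section placement assumed):

        [container-sharder]
        # Warn if a container DB is still sharding after this many seconds.
        container_sharding_timeout = 172800
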
* | Merge "proxy: Add a chance to skip memcache for get_*_info calls"Zuul2022-09-261-0/+2
|\ \
| * | proxy: Add a chance to skip memcache for get_*_info calls  (Tim Burke, 2022-08-30; 1 file, -0/+2)

    If you've got thousands of requests per second for objects in a single
    container, you basically NEVER want that container's info to ever fall
    out of memcache. If it *does*, all those clients are almost certainly
    going to overload the container.

    Avoid this by allowing some small fraction of requests to bypass and
    refresh the cache, pushing out the TTL as long as there continue to be
    requests to the container. The likelihood of skipping the cache is
    configurable, similar to what we did for shard range sets.

    Change-Id: If9249a42b30e2a2e7c4b0b91f947f24bf891b86f
    Closes-Bug: #1883324
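
    A sketch of what this could look like in proxy-server.conf; the commit
    does not name the options, so the option names below are assumptions
    modelled on the shard-range equivalents:

        [app:proxy-server]
        # Assumed option names: roughly 1 request in 1000 skips memcache and
        # refreshes the cached account/container info from the backend.
        account_existence_skip_cache_pct = 0.1
        container_existence_skip_cache_pct = 0.1
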
* | | Merge "Add note about rsync_bwlimit suffixes"Zuul2022-08-301-1/+2
|\ \ \
| |/ /
|/| |
| * | Add note about rsync_bwlimit suffixes  (Tim Burke, 2022-08-26; 1 file, -1/+2)

    Change-Id: I019451e118d3bd7263a52cf4bf354d0d0d2b4607
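
    For illustration, rsync_bwlimit in the object-replicator section accepts
    the same suffixes as rsync's --bwlimit; the value below is an assumption:

        [object-replicator]
        # Passed through to rsync --bwlimit; suffixes such as K/M/G are allowed.
        rsync_bwlimit = 100M
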
* | | Merge "Add backend rate limiting middleware"Zuul2022-08-303-3/+39
|\ \ \
| |/ /
|/| |
| * | Add backend rate limiting middleware  (Alistair Coles, 2022-05-20; 3 files, -3/+39)

    This is a fairly blunt tool: ratelimiting is per device and applied
    independently in each worker, but this at least provides some limit to
    disk IO on backend servers.

    GET, HEAD, PUT, POST, DELETE, UPDATE and REPLICATE methods may be
    rate-limited.

    Only requests with a path starting '<device>/<partition>', where
    <partition> can be cast to an integer, will be rate-limited. Other
    requests, including, for example, recon requests with paths such as
    'recon/version', are unconditionally forwarded to the next app in the
    pipeline.

    OPTIONS and SSYNC methods are not rate-limited. Note that SSYNC
    sub-requests are passed directly to the object server app and will not
    pass through this middleware.

    Change-Id: I78b59a081698a6bff0d74cbac7525e28f7b5d7c1
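
    A sketch of enabling this in object-server.conf; the filter name matches
    the middleware, while the pipeline shown and the rate option name are
    assumptions:

        [pipeline:main]
        pipeline = healthcheck recon backend_ratelimit object-server

        [filter:backend_ratelimit]
        use = egg:swift#backend_ratelimit
        # Assumed option name: maximum rate-limited requests per device per
        # second in each worker; 0 disables ratelimiting.
        requests_per_device_per_second = 50
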
* | | AUTHORS/CHANGELOG for 2.30.0  (tag: 2.30.0)  (Tim Burke, 2022-08-17; 1 file, -1/+1)

    Change-Id: If7c9e13fc62f8104ccb70a12b9c839f78e7e6e3e
* | | Merge "DB Replicator: Add handoff_delete option"Zuul2022-07-222-4/+22
|\ \ \
| * | | DB Replicator: Add handoff_delete option  (Matthew Oliver, 2022-07-21; 2 files, -4/+22)

    Currently the object-replicator has an option called `handoff_delete`
    which allows us to define the number of replicas which are ensured in
    swift. Once a handoff node gets that many successful responses it can go
    ahead and delete the handoff partition. By default it's 'auto', or rather
    the number of primary nodes, but this can be reduced. It's useful in
    draining full disks, but has to be used carefully.

    This patch adds the same option to the DB replicator and it works the
    same way, but instead of deleting a partition it's done at the per-DB
    level. Because it's done at the DB Replicator level, the option is now
    available to both the Account and Container replicators.

    Change-Id: Ide739a6d805bda20071c7977f5083574a5345a33
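
    For illustration, in account-server.conf or container-server.conf (option
    name from the commit; the value shown is an assumption):

        [account-replicator]
        # Delete a handoff DB once this many successful responses are seen;
        # 'auto' means the full replica count.
        handoff_delete = 2
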
* | | | Merge "Add ring_ip option to object services"Zuul2022-06-061-0/+6
|\ \ \ \
| * | | | Add ring_ip option to object services  (Clay Gerrard, 2022-06-02; 1 file, -0/+6)

    This will be used when finding their own devices in rings, defaulting to
    the bind_ip.

    Notably, this allows services to be containerized while servers_per_port
    is enabled:

    * For the object-server, the ring_ip should be set to the host ip and
      will be used to discover which ports need binding. Sockets will still
      be bound to the bind_ip (likely 0.0.0.0), with the assumption that the
      host will publish ports 1:1.

    * For the replicator and reconstructor, the ring_ip will be used to
      discover which devices should be replicated. While bind_ip could
      previously be used for this, it would have required a separate config
      from the object-server.

    Also rename the object daemon's bind_ip attribute to ring_ip so that it's
    more obvious wherever we're using the IP for ring lookups instead of
    socket binding.

    Co-Authored-By: Tim Burke <tim.burke@gmail.com>
    Change-Id: I1c9bb8086994f7930acd8cda8f56e766938c2218
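
    A sketch of a containerized object-server.conf using this (the IP value
    is a placeholder):

        [DEFAULT]
        # Bind inside the container...
        bind_ip = 0.0.0.0
        # ...but look ourselves up in the ring by the host's address.
        ring_ip = 192.0.2.10
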
* | | | | Merge "tempurl: Deprecate sha1 signatures"Zuul2022-06-011-1/+1
|\ \ \ \ \
| * | | | | tempurl: Deprecate sha1 signatures  (Tim Burke, 2022-04-22; 1 file, -1/+1)

    We've known this would eventually be necessary for a while [1], and way
    back in 2017 we started seeing SHA-1 collisions [2].

    [1] https://www.schneier.com/blog/archives/2012/10/when_will_we_se.html
    [2] https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html

    UpgradeImpact:
    ==============
    "sha1" has been removed from the default set of `allowed_digests` in the
    tempurl middleware config. If your cluster still has clients requiring
    the use of SHA-1,
    - explicitly configure `allowed_digests` to include "sha1" and
    - encourage your clients to move to more-secure algorithms.

    Depends-On: https://review.opendev.org/c/openstack/tempest/+/832771
    Change-Id: I6e6fa76671c860191a2ce921cb6caddc859b1066
    Related-Change: Ia9dd1a91cc3c9c946f5f029cdefc9e66bcf01046
    Closes-Bug: #1733634
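
    For operators who still need SHA-1, a sketch of the opt-in in
    proxy-server.conf (option name from the commit; the exact digest list is
    an assumption):

        [filter:tempurl]
        use = egg:swift#tempurl
        # Re-add sha1 only if legacy clients require it.
        allowed_digests = sha1 sha256 sha512
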
* | | | | | Merge "replicator: Log rsync file transfers less"Zuul2022-05-271-0/+7
|\ \ \ \ \ \
| |_|_|_|/ /
|/| | | | |
| * | | | | replicator: Log rsync file transfers less  (Tim Burke, 2022-04-28; 1 file, -0/+7)

    - Drop log level for successful rsyncs to debug; ops don't usually care.
    - Add an option to skip "send" lines entirely -- in a large cluster,
      during a meaningful expansion, there's too much information getting
      logged; it's just wasting disk space.

    Note that we already have similar filtering for directory creation;
    that's been present since the initial commit of Swift code.

    Drive-by: make it a little more clear that more than one suffix was
    likely replicated when logging about success.

    Change-Id: I02ba67e77e3378b2c2c8c682d5d230d31cd1bfa9
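
    The commit does not name the new option, so the name below is an
    assumption; the intent is an object-replicator knob for suppressing
    per-file "send" log lines:

        [object-replicator]
        # Assumed option name: set to false to stop logging rsync "send"
        # lines for each transferred file.
        log_rsync_transfers = false
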
* | | | | Merge "Add missing services to sample rsyslog.conf"Zuul2022-05-181-0/+5
|\ \ \ \ \
| * | | | | Add missing services to sample rsyslog.conf  (Takashi Kajinami, 2022-05-13; 1 file, -0/+5)

    The sample rsyslog.conf file doesn't include some container services and
    object services. This change adds these services so that all daemon
    services are listed.

    Change-Id: Ica45b86d5b4da4e3ffc334c86bd383bebe7e7d5d
* | | | | Merge "Rip out pickle support in our memcached client"Zuul2022-05-052-22/+0
|\ \ \ \ \
| |/ / / /
|/| | | |
| * | | | Rip out pickle support in our memcached client  (Tim Burke, 2022-04-27; 2 files, -22/+0)

    We said this would be going away back in 1.7.0 -- let's actually remove
    it.

    Change-Id: I9742dd907abea86da9259740d913924bb1ce73e7
    Related-Change: Id7d6d547b103b4f23ebf5be98b88f09ec6027ce4
* | | | Clarify that rsync_io_timeout is also used for contimeout  (Tim Burke, 2022-04-28; 1 file, -1/+1)

    Change-Id: I5e4a270add2a625e6d5cb0ae9468313ddc88a81b
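
    For illustration: per the commit, the same setting feeds both rsync's
    --timeout and --contimeout arguments (the value shown is an assumption):

        [object-replicator]
        # Used for rsync --timeout and --contimeout, in seconds.
        rsync_io_timeout = 30
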
* | | Merge "object-updater: defer ratelimited updates"Zuul2022-02-221-0/+6
|\ \ \
| * | | object-updater: defer ratelimited updates  (Alistair Coles, 2022-02-21; 1 file, -0/+6)

    Previously, object updates that could not be sent immediately due to
    per-container/bucket ratelimiting [1] would be skipped and re-tried
    during the next updater cycle. There could potentially be a period of
    time at the end of a cycle when the updater slept, having completed a
    sweep of the on-disk async pending files, despite having skipped updates
    during the cycle. Skipped updates would then be read from disk again
    during the next cycle.

    With this change the updater will defer skipped updates to an in-memory
    queue (up to a configurable maximum number) until the sweep of async
    pending files has completed, and then trickle out deferred updates until
    the cycle's interval expires. This increases the useful work done in the
    current cycle and reduces the amount of repeated disk IO during the next
    cycle.

    The deferrals queue is bounded in size and will evict least recently read
    updates in order to accept more recently read updates. This reduces the
    probability that a deferred update has been made obsolete by newer
    on-disk async pending files while waiting in the deferrals queue.

    The deferrals queue is implemented as a collection of per-bucket queues
    so that updates can be drained from the queues in the order that buckets
    cease to be ratelimited.

    [1] Related-Change: Idef25cd6026b02c1b5c10a9816c8c6cbe505e7ed

    Co-Authored-By: Clay Gerrard <clay.gerrard@gmail.com>
    Co-Authored-By: Matthew Oliver <matt@oliver.net.au>
    Change-Id: I95e58df9f15c5f9d552b8f4c4989a474f52262f4
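
    The commit only says the queue bound is configurable; the option name
    below is an assumption used for illustration:

        [object-updater]
        # Assumed option name: cap on the number of ratelimited updates held
        # in memory until the async-pending sweep finishes.
        max_deferred_updates = 10000
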
* | | | Merge "memcache: Add an item_size_warning_threshold option"Zuul2022-02-151-0/+6
|\ \ \ \
| * | | | memcache: Add an item_size_warning_threshold option  (Matthew Oliver, 2022-02-15; 1 file, -0/+6)

    Whenever an item is set which is larger than item_size_warning_threshold
    then a warning is logged in the form:

        'Item size larger than warning threshold: 2048576 (2Mi) >= 1000000 (977Ki)'

    Setting the value to -1 (default) will turn off the warning.

    Change-Id: I1fb50844d6b9571efaab8ac67705b2fc1fe93e25
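
    A sketch for memcache.conf (option name and default behaviour from the
    commit; section placement and the threshold value are assumptions):

        [memcache]
        # Log a warning when a cached item exceeds this many bytes;
        # -1 disables the warning.
        item_size_warning_threshold = 1000000
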
* | | | Trim sensitive information in the logs (CVE-2017-8761)  (Matthew Oliver, 2022-02-09; 1 file, -10/+14)

    Several headers and query params were previously revealed in logs but are
    now redacted:

    * X-Auth-Token header (previously redacted in the {auth_token} field, but
      not the {headers} field)
    * temp_url_sig query param (used by tempurl middleware)
    * Authorization header and X-Amz-Signature and Signature query parameters
      (used by s3api middleware)

    This patch adds some new middleware helper methods to track headers and
    query parameters that should be redacted by proxy-logging. While
    instantiating the middleware, authors can call either:

        register_sensitive_header('case-insensitive-header-name')
        register_sensitive_param('case-sensitive-query-param-name')

    to add items that should be redacted. The redaction uses proxy-logging's
    existing reveal_sensitive_prefix config option to determine how much to
    reveal.

    Note that query params will still be logged in their entirety if
    eventlet_debug is enabled.

    UpgradeImpact
    =============
    The reveal_sensitive_prefix config option now applies to more items;
    operators should review their currently-configured value to ensure it is
    appropriate for these new contexts. In particular, operators should
    consider reducing the value if it is more than 20 or so, even if that
    previously offered sufficient protection for auth tokens.

    Co-Authored-By: Tim Burke <tim.burke@gmail.com>
    Closes-Bug: #1685798
    Change-Id: I88b8cfd30292325e0870029058da6fb38026ae1a
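
    For illustration, the relevant proxy-logging knob in proxy-server.conf
    (the option is named in the commit; the value shown is an assumption):

        [filter:proxy-logging]
        use = egg:swift#proxy_logging
        # Reveal only this many leading characters of sensitive values in logs.
        reveal_sensitive_prefix = 12
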
* | | Merge "s3api: Allow multiple storage domains"Zuul2022-01-281-2/+2
|\ \ \
| * | | s3api: Allow multiple storage domains  (Tim Burke, 2022-01-24; 1 file, -2/+2)

    Sometimes a cluster might be accessible via more than one set of domain
    names. Allow operators to configure them such that virtual-host style
    requests work with all names.

    Change-Id: I83b2fded44000bf04f558e2deb6553565d54fd4a
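
    A sketch for proxy-server.conf; storage_domain is the existing s3api
    option, while the comma-separated form and the domains are assumptions:

        [filter:s3api]
        use = egg:swift#s3api
        # Virtual-host style requests are recognised for any listed domain.
        storage_domain = s3.example.com,s3.alt.example.com
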
* | | Merge "proxy: Add a chance to skip memcache when looking for shard ranges"Zuul2022-01-271-0/+9
|\ \ \
| * | | proxy: Add a chance to skip memcache when looking for shard ranges  (Tim Burke, 2022-01-26; 1 file, -0/+9)

    By having some small portion of calls skip cache and go straight to disk,
    we can ensure the cache is always kept fresh and never expires (at least,
    for active containers). Previously, when shard ranges fell out of cache
    there would frequently be a thundering herd that could overwhelm the
    container server, leading to 503s served to clients or an increase in
    async pendings.

    Include metrics for hit/miss/skip rates.

    Change-Id: I6d74719fb41665f787375a08184c1969c86ce2cf
    Related-Bug: #1883324
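
    A sketch for proxy-server.conf; the commit does not name the option, so
    the name below is an assumption:

        [app:proxy-server]
        # Assumed option name: fraction (in percent) of object updates that
        # skip the shard-range cache and refresh it from the container server.
        container_updating_shard_ranges_skip_cache_pct = 0.1
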
* | | Merge "Modify log_name in internal clients' pipeline configs"Zuul2022-01-261-1/+3
|\ \ \
| |/ /
|/| |
| * | Modify log_name in internal clients' pipeline configs  (Alistair Coles, 2022-01-12; 1 file, -1/+3)

    Modify the 'log_name' option in the InternalClient wsgi config for the
    following services: container-sharder, container-reconciler,
    container-deleter, container-sync and object-expirer.

    Previously the 'log_name' value for all internal client instances sharing
    a single internal-client.conf file took the value configured in the conf
    file, or would default to 'swift'. This resulted in no distinction
    between logs from each internal client, and no association with the
    service using a particular internal client.

    With this change the 'log_name' value will typically be <log_route>-ic
    where <log_route> is the service's conf file section name. For example,
    'container-sharder-ic'.

    Note: any 'log_name' value configured in an internal client conf file
    will now be ignored for these services unless the option key is preceded
    by 'set'.

    Note: by default, the logger's StatsdClient uses the log_name as its
    tail_prefix when composing metrics' names. However, the proxy-logging
    middleware overrides the tail_prefix with the hard-coded value
    'proxy-server'. This change to log_name therefore does not change the
    statsd metric names emitted by the internal client's proxy-logging.

    This patch does not change the logging of the services themselves, just
    their internal clients.

    Change-Id: I844381fb9e1f3462043d27eb93e3fa188b206d05
    Related-Change: Ida39ec7eb02a93cf4b2aa68fc07b7f0ae27b5439
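
    A sketch of forcing a specific log_name in internal-client.conf after
    this change; the 'set' prefix requirement is from the commit, the value
    is an assumption:

        [app:proxy-server]
        use = egg:swift#proxy
        # Without the 'set' prefix this would now be ignored for the listed
        # services.
        set log_name = my-internal-client
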
* | Finer grained ratelimit for update  (Clay Gerrard, 2022-01-06; 1 file, -0/+10)

    Throw our stream of async_pendings through a hash ring; if the virtual
    bucket gets hot just start leaving the updates on the floor and move on.

    It's off by default; and if you use it you're probably going to leave a
    bunch of async updates pointed at a small set of containers in the queue
    for the next sweep every sweep (so maybe turn it off at some point)

    Co-Authored-By: Alistair Coles <alistairncoles@gmail.com>
    Change-Id: Idef25cd6026b02c1b5c10a9816c8c6cbe505e7ed
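
    The commit does not name the options; the names below are assumptions
    based on the per-container ratelimiting it describes:

        [object-updater]
        # Assumed option names: cap updates per container per second, spread
        # across this many virtual buckets; 0 disables the ratelimit.
        max_objects_per_container_per_second = 0
        per_container_ratelimit_buckets = 1000
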
* reconstructor: restrict max objects per revert job  (Alistair Coles, 2021-12-03; 1 file, -0/+14)

    Previously the ssync Sender would attempt to revert all objects in a
    partition within a single SSYNC request. With this change the
    reconstructor daemon option max_objects_per_revert can be used to limit
    the number of objects reverted inside a single SSYNC request for revert
    type jobs, i.e. when reverting handoff partitions.

    If more than max_objects_per_revert are available, the remaining objects
    will remain in the sender partition and will not be reverted until the
    next call to ssync.Sender, which would currently be the next time the
    reconstructor visits that handoff partition.

    Note that the option only applies to handoff revert jobs, not to sync
    jobs.

    Change-Id: If81760c80a4692212e3774e73af5ce37c02e8aff
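
    A sketch for object-server.conf (option name from the commit; the value
    and default shown are assumptions):

        [object-reconstructor]
        # Limit objects reverted per SSYNC request for handoff revert jobs;
        # 0 means no limit.
        max_objects_per_revert = 0
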
* sharder: Make stats interval configurable  (Tim Burke, 2021-10-01; 1 file, -0/+3)

    Change-Id: Ia794a7e21794d2c1212be0e2d163004f85c2ab78
* Add a project scope read-only role to keystoneauth  (Pete Zaitcev, 2021-08-02; 1 file, -0/+5)

    This patch continues work for more of the "Consistent and Secure Default
    Policies". We already have system scope personas implemented, but the
    architecture people are asking for project scope now. At least we don't
    need domain scope.

    Change-Id: If7d39ac0dfbe991d835b76eb79ae978fc2fd3520
* Merge "container-reconciler: support multiple processes"Zuul2021-07-221-0/+10
|\
| * container-reconciler: support multiple processes  (Clay Gerrard, 2021-07-21; 1 file, -0/+10)

    This follows the same pattern of configuration used in the
    object-expirer. When the container-reconciler has a configuration value
    for processes it expects that many instances of the reconciler to be
    configured with a process value from [0, processes).

    Change-Id: Ie46bda37ca3f6e692ec31a4ddcd46f343fb1aeca
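
    For illustration, one of four reconciler instances in
    container-reconciler.conf (option names follow the object-expirer pattern
    described in the commit; the values are assumptions):

        [container-reconciler]
        # This instance handles its share of the work: process 0 of 4.
        processes = 4
        process = 0
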
* | reconstructor: retire nondurable_purge_delay option  (Alistair Coles, 2021-07-19; 1 file, -9/+7)

    The nondurable_purge_delay option was introduced in [1] to prevent the
    reconstructor removing non-durable data files on handoffs that were about
    to be made durable. The DiskFileManager commit_window option has since
    been introduced [2] which specifies a similar time window during which
    non-durable data files should not be removed. The commit_window option
    can be re-used by the reconstructor, making the nondurable_purge_delay
    option redundant.

    The nondurable_purge_delay option has not been available in any tagged
    release and is therefore removed with no backwards compatibility.

    [1] Related-Change: I0d519ebaaade35249fb7b17bd5f419ffdaa616c0
    [2] Related-Change: I5f3318a44af64b77a63713e6ff8d0fd3b6144f13

    Change-Id: I1589a7517b7375fcc21472e2d514f26986bf5079
* | diskfile: don't remove recently written non-durables  (Alistair Coles, 2021-07-19; 1 file, -0/+7)

    DiskFileManager will remove any stale files during
    cleanup_ondisk_files(): these include tombstones and nondurable EC data
    fragments whose timestamps are older than reclaim_age. It can usually be
    safely assumed that a non-durable data fragment older than reclaim_age is
    not going to become durable. However, if an agent PUTs objects with
    specified older X-Timestamps (for example the reconciler or
    container-sync) then there is a window of time during which the object
    server has written an old non-durable data file but has not yet committed
    it to make it durable.

    Previously, if another process (for example the reconstructor) called
    cleanup_ondisk_files during this window then the non-durable data file
    would be removed. The subsequent attempt to commit the data file would
    then result in a traceback due to there no longer being a data file to
    rename, and of course the data file is lost.

    This patch modifies cleanup_ondisk_files to not remove old, otherwise
    stale, non-durable data files that were only written to disk in the
    preceding 'commit_window' seconds. 'commit_window' is configurable for
    the object server and defaults to 60.0 seconds.

    Closes-Bug: #1936508
    Related-Change: I0d519ebaaade35249fb7b17bd5f419ffdaa616c0
    Change-Id: I5f3318a44af64b77a63713e6ff8d0fd3b6144f13
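
    A sketch for object-server.conf (option name and default from the commit;
    section placement assumed):

        [DEFAULT]
        # Non-durable data files younger than this many seconds are never
        # removed by cleanup, so an in-flight commit cannot lose them.
        commit_window = 60.0
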
* Add concurrency to reconciler  (Clay Gerrard, 2021-07-14; 1 file, -0/+2)

    Each reconciler process can now reconcile more than one queue entry at a
    time, up to the configured concurrency. By default concurrency is 1.
    There is no expected change to existing behavior; entries are processed
    serially, one at a time.

    Change-Id: I72e9601b58c2f20bb1294876bb39f2c78827d5f8
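
    For illustration, in container-reconciler.conf (option name from the
    commit; the value is an assumption):

        [container-reconciler]
        # Number of queue entries reconciled concurrently per process.
        concurrency = 4
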
* Merge "reconciler: PPI aware reconciler"Zuul2021-07-141-0/+1
|\
| * reconciler: PPI aware reconciler  (Matthew Oliver, 2021-07-13; 1 file, -0/+1)

    This patch makes the reconciler PPI aware. It does this by adding a
    helper method `can_reconcile_policy` that is used to check that the
    policies used for the source and destination aren't in the middle of a
    PPI (their ring doesn't have next_part_power set).

    In order to accomplish this the reconciler has had to include the
    POLICIES singleton and has grown swift_dir and ring_check_interval config
    options.

    Closes-Bug: #1934314
    Change-Id: I78a94dd1be90913a7a75d90850ec5ef4a85be4db
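
    A sketch for container-reconciler.conf (option names from the commit;
    values are assumptions matching common Swift defaults):

        [container-reconciler]
        # Where to find the rings, and how often to check them for a
        # part-power increase.
        swift_dir = /etc/swift
        ring_check_interval = 15
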
* | Merge "sharder: avoid small tail shards"Zuul2021-07-081-0/+6
|\ \