summaryrefslogtreecommitdiff
path: root/releasenotes
diff options
context:
space:
mode:
authorTim Burke <tim.burke@gmail.com>2021-06-24 10:42:12 -0700
committerTim Burke <tim.burke@gmail.com>2021-07-23 15:05:18 -0700
commita8f15128639f7e62a3e8d51c607b45c496206191 (patch)
tree9ba834fe8305344e9f3212a5c06c3561c0566139 /releasenotes
parent5759072d2544f30949eb41e0102e281aac192438 (diff)
downloadswift-a8f15128639f7e62a3e8d51c607b45c496206191.tar.gz
AUTHORS/CHANGELOG for 2.28.02.28.0
Change-Id: Ibbaaca2a0fd36aa6a893b0e1146dd35fafe1e634
Diffstat (limited to 'releasenotes')
-rw-r--r--releasenotes/notes/2_28_0_release-f2515e07fb61cd01.yaml235
1 files changed, 235 insertions, 0 deletions
diff --git a/releasenotes/notes/2_28_0_release-f2515e07fb61cd01.yaml b/releasenotes/notes/2_28_0_release-f2515e07fb61cd01.yaml
new file mode 100644
index 000000000..bd3fdde75
--- /dev/null
+++ b/releasenotes/notes/2_28_0_release-f2515e07fb61cd01.yaml
@@ -0,0 +1,235 @@
+---
+features:
+ - |
+ ``swift-manage-shard-ranges`` improvements:
+
+ * Exit codes are now applied more consistently:
+
+ - 0 for success
+ - 1 for an unexpected outcome
+ - 2 for invalid options
+ - 3 for user exit
+
+ As a result, some errors that previously resulted in exit code 2
+ will now exit with code 1.
+
+ * Added a new 'repair' command to automatically identify and
+ optionally resolve overlapping shard ranges.
+
+ * Added a new 'analyze' command to automatically identify overlapping
+ shard ranges and recommend a resolution based on a JSON listing
+ of shard ranges such as produced by the 'show' command.
+
+ * Added a ``--includes`` option for the 'show' command to only output
+ shard ranges that may include a given object name.
+
+ * Added a ``--dry-run`` option for the 'compact' command.
+
+ * The 'compact' command now outputs the total number of compactible
+ sequences.
+
+ - |
+ Partition power increase improvements:
+
+ * The relinker now spawns multiple subprocesses to process disks
+ in parallel. By default, one worker is spawned per disk; use the
+ new ``--workers`` option to control how many subprocesses are used.
+ Use ``--workers=0`` to maintain the previous behavior.
+
+ * The relinker can now target specific storage policies or
+ partitions by using the new ``--policy`` and ``--partition``
+ options.
+
+ - |
+ More daemons now support systemd notify sockets.
+
+ - |
+ The container-reconciler now scales out better with new ``processes``,
+ ``process``, and ``concurrency`` options, similar to the object-expirer.
+deprecations:
+ - |
+ Container sharding deprecations:
+
+ * Added a new config option, ``shrink_threshold``, to specify the
+ absolute size below which a shard will be considered for shrinking.
+ This overrides the ``shard_shrink_point`` configuration option, which
+ expressed this as a percentage of ``shard_container_threshold``.
+ ``shard_shrink_point`` is now deprecated.
+
+ * Similar to above, ``expansion_limit`` was added as an absolute-size
+ replacement for the now-deprecated ``shard_shrink_merge_point``
+ configuration option.
+fixes:
+ - |
+ Sharding improvements:
+
+ * When building a listing from shards, any failure to retrieve
+ listings will result in a 503 response. Previously, failures
+ fetching a partiucular shard would result in a gap in listings.
+
+ * Container-server logs now include the shard path in the referer
+ field when receiving stat updates.
+
+ * Added a new config option, ``rows_per_shard``, to specify how many
+ objects should be in each shard when scanning for ranges. The default
+ is ``shard_container_threshold / 2``, preserving existing behavior.
+
+ * Added a new config option, ``minimum_shard_size``. When scanning
+ for shard ranges, if the final shard would otherwise contain
+ fewer than this many objects, the previous shard will instead
+ be expanded to the end of the namespace (and so may contain up
+ to ``rows_per_shard + minimum_shard_size`` objects). This reduces
+ the number of small shards generated. The default value is
+ ``rows_per_shard / 5``.
+
+ * The sharder now correctly identifies and fails audits for shard
+ ranges that overlap exactly.
+
+ * The sharder and swift-manage-shard-ranges now consider total row
+ count (instead of just object count) when deciding whether a shard
+ is a candidate for shrinking.
+
+ * If the sharder encounters shard range gaps while cleaving, it will
+ now log an error and halt sharding progress. Previously, rows may
+ not have been moved properly, leading to data loss.
+
+ * Sharding cycle time and last-completion time are now available via
+ swift-recon.
+
+ * Fixed an issue where resolving overlapping shard ranges via shrinking
+ could prematurely mark created or cleaved shards as active.
+
+ - |
+ S3 API improvements:
+
+ * Added an option, ``ratelimit_as_client_error``, to return 429s for
+ rate-limited responses. Several clients/SDKs have seem to support
+ retries with backoffs on 429, and having it as a client error
+ cleans up logging and metrics. By default, Swift will respond 503,
+ matching AWS documentation.
+
+ * Fixed a server error in bucket listings when ``s3_acl`` is enabled
+ and staticweb is configured for the container.
+
+ * Fixed a server error when a client exceeds ``client_timeout`` during an
+ upload. Now, a ``RequestTimeout`` error is correctly returned.
+
+ * Fixed a server error when downloading multipart uploads/static large
+ objects that have missing or inaccessible segments. This is a state
+ that cannot arise in AWS, so a new ``BrokenMPU`` error is returned,
+ indicating that retrying the request is unlikely to succeed.
+
+ * Fixed several issues with the prefix, marker, and delimiter
+ parameters that would be mirrored back to clients when listing
+ buckets.
+
+ - |
+ Partition power increase fixes:
+
+ * The relinker now performs eventlet-hub selection the same way as
+ other daemons. In particular, ``epolls`` will no longer be selected,
+ as it seemed to cause occassional hangs.
+
+ * Partitions that encountered errors during relinking are no longer
+ marked as completed in the relinker state file. This ensures that
+ a subsequent relink will retry the failed partitions.
+
+ * Partition cleanup is more robust, decreasing the likelihood of
+ leaving behind mostly-empty partitions from the old partition
+ power.
+
+ * Improved relinker progress logging, and started collecting
+ progress information for swift-recon.
+
+ * Cleanup is more robust to files and directories being deleted by
+ another process.
+
+ * The relinker better handles data found from earlier partition power
+ increases.
+
+ * The relinker better handles tombstones found for the same object
+ but with different inodes.
+
+ * The reconciler now defers working on policies that have a partition
+ power increase in progress to avoid issues with concurrent writes.
+
+ - |
+ Erasure coding fixes:
+
+ * Added the ability to quarantine EC fragments that have no (or few)
+ other fragments in the cluster. A new configuration option,
+ ``quarantine_threshold``, in the reconstructor controls the point at
+ the fragment will be quarantined; the default (0) will never
+ quarantine. Only fragments older than ``quarantine_age`` (default:
+ ``reclaim_age``) may be quarantined. Before quarantining, the
+ reconstructor will attempt to fetch fragments from handoff nodes
+ in addition to the usual primary nodes; a new ``request_node_count``
+ option (default ``2 * replicas``) limits the total number of nodes to
+ contact.
+
+ * Added a delay before deleting non-durable data. A new configuration
+ option, ``commit_window`` in the ``[DEFAULT]`` section of
+ object-server.conf, adjusts this delay; the default is 60 seconds. This
+ improves the durability of both back-dated PUTs (from the reconciler or
+ container-sync, for example) and fresh writes to handoffs by preventing
+ the reconstructor from deleting data that the object-server was still
+ writing.
+
+ * Improved proxy-server and object-reconstructor logging when data
+ cannot be reconstructed.
+
+ * Fixed an issue where some but not all fragments having metadata
+ applied could prevent reconstruction of missing fragments.
+
+ * Server-side copying of erasure-coded data to a replicated policy no
+ longer copies EC sysmeta. The previous behavior had no material
+ effect, but could confuse operators examining data on disk.
+
+ - |
+ Python 3 fixes:
+
+ * Fixed a server error when performing a PUT authorized via
+ tempurl with some proxy pipelines.
+
+ * Fixed a server error during GET of a symlink with some proxy
+ pipelines.
+
+ * Fixed an issue with logging setup when /dev/log doesn't exist
+ or is not a UNIX socket.
+
+ - |
+ The dark-data audit watcher now skips objects younger than a new
+ configurable ``grace_age`` period. This avoids issues where data
+ could be flagged, quarantined, or deleted because of listing
+ consistency issues. The default is one week.
+
+ - |
+ The dark-data audit watcher now requires that all primary locations
+ for an object's container agree that the data does not appear in
+ listings to consider data "dark". Previously, a network partition
+ that left an object node isolated could cause it to quarantine or
+ delete all of its data.
+
+ - |
+ ``EPIPE`` errors no longer log tracebacks.
+
+ - |
+ The account and container auditors now log and update recon before
+ going to sleep.
+
+ - |
+ The object-expirer logs fewer client disconnects.
+
+ - |
+ ``swift-recon-cron`` now includes the last time it was run in the recon
+ information.
+
+ - |
+ ``EIO`` errors during read now cause object diskfiles to be quarantined.
+
+ - |
+ The formpost middleware now properly supports uploading multiple files
+ with different content-types.
+
+ - |
+ Various other minor bug fixes and improvements.