Commit message (author, date, files changed, lines removed/added)
* Remove dependency on rabbit_channel in classic queues [classic-queue-type-fixes_ff] (Karl Nilsson, 2022-08-11, 5 files, -46/+96)
|   Some queue -> channel messages were missed from classic queues when the queue type API was introduced. This commit fixes that which should make use of classic queues portable outside of the channel. This includes some refactoring to make more explicit that the stream_queue feature flag also enables queue types. More
* Feature flags: Support "auto-enable" feature flags (Jean-Sébastien Pédron, 2022-08-11, 3 files, -15/+140)
|   A feature flag can be marked as "auto-enable" by setting `auto_enable` to true in its properties.
|
|   An auto-enable feature flag is automatically enabled as soon as all nodes in the cluster support it. This is achieved by trying to enable it when RabbitMQ starts, when a plugin is enabled/disabled or when a node joins/re-joins a cluster. If the feature flag can't be enabled, the error is ignored. An auto-enable feature flag also implicitly depends on `feature_flags_v2`.
|
|   However, it should be used cautiously, especially if the feature flag has a migration function, because it might be enabled at an inappropriate time w.r.t. the user's workload.
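The auto-enable rule described in this commit message can be sketched as a toy model. The Python below is not RabbitMQ code: the `Node`, `FeatureFlag` and `try_auto_enable` names are hypothetical, and the migration function and the `feature_flags_v2` dependency are left out. It only illustrates the condition "enable once every node supports the flag, otherwise quietly do nothing".

```python
from dataclasses import dataclass, field


@dataclass
class FeatureFlag:
    # Hypothetical stand-in for a feature flag and its properties.
    name: str
    auto_enable: bool = False
    enabled: bool = False


@dataclass
class Node:
    # Hypothetical stand-in for a cluster node and the flags it knows about.
    name: str
    supported: set = field(default_factory=set)


def try_auto_enable(flag, cluster):
    """Enable an auto-enable flag once every node supports it.

    A failure to enable is not an error in this model: the function
    simply leaves the flag disabled and returns its current state.
    """
    if flag.enabled or not flag.auto_enable:
        return flag.enabled
    if all(flag.name in node.supported for node in cluster):
        flag.enabled = True
    return flag.enabled
```

In this model, `try_auto_enable` would be called at each trigger point the message lists: node start, plugin enable/disable, and a node joining or re-joining the cluster.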
* Revert "Set kernel param prevent_overlapping_partitions to true" (David Ansari, 2022-08-10, 7 files, -10/+82)
|   This reverts commit 8070344a38b5d3efb2e6687c73e0a163c12bd5aa.
|
|   We learnt during the last 6 days on master branch that RabbitMQ - as of today - is not compatible with kernel parameter `prevent_overlapping_partitions` set to `true`.
|
|   RabbitMQ explicitly disconnects nodes in at least two places:
|   1. rabbit_node_monitor to "promote" a partial network partition to a full partition, and
|   2. rabbit_mnesia after a node reset to disconnect it from the rest of the cluster.
|
|   There is no atomicity in the way we disconnect several nodes, because it's a simple loop. Therefore, remote nodes may/will detect disconnection at different times.
|
|   In global's new behavior behind prevent_overlapping_partitions, our attempt to disconnect all nodes in rabbit_mnesia creates a partial network partition from global's point of view, leading to a complete disconnection of the cluster.
|
|   For example, test
|   ```
|   make ct-clustering_management t=cluster_size_3:join_and_part_cluster
|   ```
|   was flaky and demonstrates the 2nd bullet point above where RabbitMQ interfering with Erlang distribution conflicts with global's prevent_overlapping_partitions.
|
|   When RabbitMQ resets a node, its last step is to loop over clustered nodes and disconnect from them one at a time. In this test with a 3-node cluster where we reset node A:
|   1. Node A instructs node B and C to remove node A from their view of the cluster
|   2. Node A disconnects from node B
|   3. global on node B gets a nodedown event for node A, but node C is still connected to node A
|   4. global on node B concludes there is a network partition and disconnects from node A and node C
|
|   At this point, each node is on its own. Nothing in RabbitMQ tries to restore the connection between nodes B and C.
|
|   The correct path forward is:
|   1. Get rid of Mnesia replacing it with Khepri.
|   2. Once mirrored classic queues are removed, get rid of rabbit_node_monitor.
|   3. Have a clear and consistent view of the nodes comprising a RabbitMQ Cluster: In other words, do not use different sources of truth like nodes(), Mnesia, Ra clusters, global monitor at different places in the code.
|
|   For the time being we live with `prevent_overlapping_partitions` set to `false` and with the workaround for global:sync/0 being stuck introduced in https://github.com/rabbitmq/rabbitmq-server/commit/9fcb31f348590a74fd526333cf881cfbe27241e6
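The node-reset sequence in this message can be replayed with a small simulation. This is a toy model in Python, not how `global` or RabbitMQ are implemented: all function names are hypothetical, connections are modelled as an undirected set of links, and the reaction in `on_nodedown` is only the behaviour as the commit message describes it (a node that loses a peer also drops every remaining peer that is still connected to the lost node).

```python
from itertools import combinations


def full_mesh(nodes):
    """All nodes start out connected to each other."""
    return {frozenset(pair) for pair in combinations(nodes, 2)}


def peers(links, node):
    """Nodes that currently share a link with `node`."""
    return {n for link in links if node in link for n in link if n != node}


def disconnect(links, a, b):
    links.discard(frozenset((a, b)))


def on_nodedown(links, observer, lost):
    """Toy reaction to a nodedown event with prevent_overlapping_partitions on:
    drop every peer that is still connected to the lost node."""
    for peer in list(peers(links, observer)):
        if lost in peers(links, peer):
            disconnect(links, observer, peer)


def reset_node(nodes, node, prevent_overlapping_partitions):
    """Disconnect `node` from the others one at a time (a simple loop)
    and return the links that remain afterwards."""
    links = full_mesh(nodes)
    for other in sorted(n for n in nodes if n != node):
        disconnect(links, node, other)
        if prevent_overlapping_partitions:
            on_nodedown(links, other, node)
    return links
```

With nodes A, B and C and a reset of A, the model leaves the B-C link intact when the parameter is off and leaves no links at all when it is on, which matches the "each node is on its own" outcome described above.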
* Bump dependencies in stream Java test (Arnaud Cogoluègnes, 2022-08-10, 4 files, -18/+20)
|
* Merge pull request #5479 from rabbitmq/mk-rabbitmq-stream-java-test-suite-interface-change (Arnaud Cogoluègnes, 2022-08-10, 2 files, -4/+4)
|\
| |   Streams: adapt tests to the latest Java stream client listener interface
| * Streams: adapt tests to the latest Java stream client listener interface (Michael Klishin, 2022-08-10, 2 files, -4/+4)
|/
* Merge pull request #5467 from rabbitmq/mk-upgrade-eetcd (Michael Klishin, 2022-08-10, 3 files, -5/+5)
|\
| |   Bump eetcd to 0.3.6
| * Bump eetcd to 0.3.6 (Michael Klishin, 2022-08-09, 3 files, -5/+5)
| |   See https://github.com/zhongwencool/eetcd/releases/tag/v0.3.6 for details
* | Fix failing test (David Ansari, 2022-08-09, 1 file, -20/+21)
|/
|   due to the changes in https://github.com/rabbitmq/ra/pull/298: the 'delivery' ra event is now received before the 'applied' ra event.
* Merge pull request #5427 from rabbitmq/rabbitmq-server-5412-stream-info-command (Arnaud Cogoluègnes, 2022-08-09, 9 files, -115/+284)
|\
| |   Add StreamStats command to stream protocol
| * Use atom_to_binary/1 instead of rabbit_data_coercion (Arnaud Cogoluègnes, 2022-08-08, 1 file, -2/+1)
| |
| * Rename StreamInfo to StreamStats (Arnaud Cogoluègnes, 2022-08-08, 7 files, -66/+59)
| |   Other changes: returns a map of int64, use the new osiris:get_stats/1 API.
| |
| |   References #5412
| * Keep stream_* return codes (Arnaud Cogoluègnes, 2022-08-03, 2 files, -3/+7)
| |   To keep compatibility with the Erlang client's users.
| |
| |   References #5412
| * Make process liveness check remote if necessary (Arnaud Cogoluègnes, 2022-08-03, 1 file, -2/+7)
| |   References #5412
| * Add StreamInfo command to stream protocol (Arnaud Cogoluègnes, 2022-08-03, 9 files, -110/+278)
| |   It returns general information on a stream, the first and committed offsets for now.
| |
| |   Fixes #5412
* | Merge pull request #5442 from rabbitmq/prevent-overlapping-partitions (Michael Klishin, 2022-08-08, 7 files, -82/+10)
|\ \
| | |   Set kernel param prevent_overlapping_partitions to true
| * | Set kernel param prevent_overlapping_partitions to true (David Ansari, 2022-08-08, 7 files, -82/+10)
|/ /
| |   This kernel parameter got introduced in Erlang 24.3. It is set to `false` by default in Erlang 24. It is set to `true` by default in Erlang 25.
| |
| |   This commit requires Erlang >= 24.3.
| |
| |   As described in commit message https://github.com/rabbitmq/rabbitmq-server/commit/4bf78d822d7496e03061119f4cb07c0b306e4c03 setting this flag to `true` will prevent global:sync/0 from hanging in the presence of network failures.
| |
| |   Instead of relying on our own workaround of global:sync/0 being stuck introduced in https://github.com/rabbitmq/rabbitmq-server/commit/9fcb31f348590a74fd526333cf881cfbe27241e6 let us instead rely on the official Erlang fix that comes by setting prevent_overlapping_partitions to true.
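The version rules stated in this message can be written down as a small lookup. This is only an illustrative Python sketch: the function name is hypothetical, version strings are assumed to be plain dotted integers, and treating releases after 25 the same as 25 is an assumption rather than something the message states.

```python
def default_prevent_overlapping_partitions(otp_version):
    """Return the default value described in the commit message,
    or None when the parameter does not exist yet (before 24.3)."""
    parts = tuple(int(p) for p in otp_version.split("."))
    # Pad so that "25" compares like "25.0", then keep major and minor only.
    major_minor = (parts + (0,))[:2]
    if major_minor < (24, 3):
        return None
    if major_minor < (25, 0):
        return False
    # Assumption: versions after 25 keep the Erlang 25 default.
    return True
```

For example, `default_prevent_overlapping_partitions("24.3.4")` yields `False`, which is why the commit has to set the parameter explicitly on Erlang 24.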
* | Enable a post-#4563 test (David Ansari, 2022-08-08, 1 file, -6/+7)
| |   When a non-mirrored durable classic queue is hosted on a node that goes down, prior to #4563 not only was the behaviour that the queue gets deleted from the rabbit_queue table, but also that its corresponding bindings get deleted.
| |
| |   The purpose of this test was to make sure that bindings also get properly deleted from the new rabbit_index_route table.
| |
| |   Given that the behaviour now changed in #4563, we can either delete this test or - as done in this commit - adapt this test.
* | Disable a test that needs revision post-#4563 (Michael Klishin, 2022-08-06, 1 file, -2/+4)
| |   How the behavior of this test should change is yet to be discussed with @dcorbacho @ansd @lhoguin
* | Merge pull request #5319 from NuwanSameera/feature-delete-connection-by-username (Michael Klishin, 2022-08-06, 5 files, -5/+110)
|\ \
| | |   HTTP API: allow connections to be listed and closed by username
| * | Add delete connection by username feature (Nuwan Sameera, 2022-08-05, 5 files, -5/+110)
|/ /
| |   Format code
| |   Fix whitespace, fix warning
| |   Update API docs
| |   Remove blank lines
| |   Add get all connections by username
| |   Fix method name issue
| |   Enable GET method to get connections by username
| |   Update API documentation
| |   Modify list all connections of username method
| |   Remove list_by_username method and modify get all connections of user API
| |   Code formatting, break up lines for readability
| |   Refactor code to use pattern matching more effectively
| |   Typo
* | Merge pull request #4563 from rabbitmq/loic-dont-delete-durable-queues-on-node-down (Michael Klishin, 2022-08-05, 1 file, -8/+25)
|\ \
| * | Don't delete durable CQs from rabbit_queue table on node down (Loïc Hoguin, 2022-04-15, 1 file, -8/+25)
| | |   This should help avoid issues where queues are no longer listed in rabbit_queue after a node has restarted, under load.
* | | Merge pull request #5450 from rabbitmq/dependabot/github_actions/master/actions/cache-3.0.6 (Michael Klishin, 2022-08-05, 8 files, -11/+11)
|\ \ \
| * | | Bump actions/cache from 3.0.5 to 3.0.6 (dependabot[bot], 2022-08-05, 8 files, -11/+11)
|/ / /
| | |   Bumps [actions/cache](https://github.com/actions/cache) from 3.0.5 to 3.0.6.
| | |   - [Release notes](https://github.com/actions/cache/releases)
| | |   - [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md)
| | |   - [Commits](https://github.com/actions/cache/compare/v3.0.5...v3.0.6)
| | |
| | |   ---
| | |   updated-dependencies:
| | |   - dependency-name: actions/cache
| | |     dependency-type: direct:production
| | |     update-type: version-update:semver-patch
| | |   ...
| | |
| | |   Signed-off-by: dependabot[bot] <support@github.com>
* | | Terraform: add AMIs for Fedora 34 and 35 (Michael Klishin, 2022-08-05, 1 file, -0/+4)
| | |   Pair: @the-mikedavis
* | | Merge pull request #5447 from rabbitmq/mk-require-erlang-24 (Michael Klishin, 2022-08-05, 2 files, -4/+5)
|\ \ \
| | | |   Require Erlang 24.3
| * | | Bump minimum required Erlang version to 24.3 (Michael Klishin, 2022-08-05, 1 file, -2/+2)
| | | |   we expect that 3.11 GA will require 25.0 but this would do for now
| * | | Terraform: Erlang packages for 24 and 25 have long been available (Michael Klishin, 2022-08-05, 1 file, -2/+3)
|/ / /
* | | Merge pull request #5445 from rabbitmq/mk-drop-erlang-23-from-actions (Michael Klishin, 2022-08-05, 4 files, -13/+8)
|\ \ \
| | | |   Drop Erlang 23 from Actions test matrix (in 3.12.x/master, 3.11.x)
| * | | Drop Erlang 23 from Actions test matrix (Michael Klishin, 2022-08-05, 4 files, -13/+8)
|/ / /
| | |   we still use it for the 3.8.x mixed version umbrella, for now
* | | Merge pull request #5438 from rabbitmq/global-sync (Jean-Sébastien Pédron, 2022-08-04, 2 files, -7/+7)
|\ \ \
| | | |   Prevent global:sync/0 from being stuck when hostname resolution is not available early on boot
| * | | Log at debug level when "global hang workaround" is applied (David Ansari, 2022-08-04, 1 file, -6/+6)
| | | |   because it outputs the whole process state of global_name_server.
| | | |
| | | |   Also, fix erroneous comments.
| * | | Prevent global:sync/0 from being stuck (David Ansari, 2022-08-04, 1 file, -1/+1)
| | | |   Prior to this commit, global:sync/0 sometimes gets stuck when either performing a rolling update on Kubernetes or when creating a new RabbitMQ cluster on Kubernetes.
| | | |
| | | |   When performing a rolling update, the node being booted will be stuck in:
| | | |   ```
| | | |   2022-07-26 10:49:58.891896+00:00 [debug] <0.226.0> == Plugins (prelaunch phase) ==
| | | |   2022-07-26 10:49:58.891908+00:00 [debug] <0.226.0> Setting plugins up
| | | |   2022-07-26 10:49:58.920915+00:00 [debug] <0.226.0> Loading the following plugins: [cowlib,cowboy,rabbitmq_web_dispatch,
| | | |   2022-07-26 10:49:58.920915+00:00 [debug] <0.226.0> rabbitmq_management_agent,amqp_client,
| | | |   2022-07-26 10:49:58.920915+00:00 [debug] <0.226.0> rabbitmq_management,quantile_estimator,
| | | |   2022-07-26 10:49:58.920915+00:00 [debug] <0.226.0> prometheus,rabbitmq_peer_discovery_common,
| | | |   2022-07-26 10:49:58.920915+00:00 [debug] <0.226.0> accept,rabbitmq_peer_discovery_k8s,
| | | |   2022-07-26 10:49:58.920915+00:00 [debug] <0.226.0> rabbitmq_prometheus]
| | | |   2022-07-26 10:49:58.926373+00:00 [debug] <0.226.0> Feature flags: REFRESHING after applications load...
| | | |   2022-07-26 10:49:58.926416+00:00 [debug] <0.372.0> Feature flags: registering controller globally before proceeding with task: refresh_after_app_load
| | | |   2022-07-26 10:49:58.926450+00:00 [debug] <0.372.0> Feature flags: [global sync] @ rabbit@r1-server-3.r1-nodes.default
| | | |   ```
| | | |
| | | |   During cluster creation, an example log of global:sync/0 being stuck can be found in bullet point 2 of https://github.com/rabbitmq/rabbitmq-server/pull/5331#pullrequestreview-1050715029
| | | |
| | | |   When global:sync/0 is stuck, it never receives a message in line https://github.com/erlang/otp/blob/bd05b07f973f11d73c4fc77d59b69f212f121c2d/lib/kernel/src/global.erl#L2942
| | | |
| | | |   This issue can be observed in both `kind` and GKE. `kind` uses CoreDNS, GKE uses kubedns.
| | | |
| | | |   CoreDNS does not resolve the hostname of RabbitMQ and its peers correctly for up to 30 seconds after node startup. This is because the default cache value of CoreDNS is 30 seconds and CoreDNS has a bug described in https://github.com/kubernetes/kubernetes/issues/92559
| | | |
| | | |   global:sync/0 is known to be buggy "in the presence of network failures" unless the kernel parameter `prevent_overlapping_partitions` is set to `true`.
| | | |
| | | |   When either:
| | | |   1. setting CoreDNS cache value to 1 second (see https://github.com/rabbitmq/rabbitmq-server/issues/5322#issuecomment-1195826135 on how to set this value), or
| | | |   2. setting the kernel parameter `prevent_overlapping_partitions` to `true`
| | | |   rolling updates do NOT get stuck anymore.
| | | |
| | | |   This means we are hitting here a combination of:
| | | |   1. Kubernetes DNS bug not updating DNS caches promptly for headless services with `publishNotReadyAddresses: true`, and
| | | |   2. Erlang bug which causes global:sync/0 to hang forever in the presence of network failures.
| | | |
| | | |   The Erlang bug is fixed by setting `prevent_overlapping_partitions` to `true` (default in Erlang/OTP 25).
| | | |
| | | |   In RabbitMQ however, we explicitly set `prevent_overlapping_partitions` to `false` because we fear other issues could arise if we set this parameter to `true`.
| | | |
| | | |   Luckily, to resolve this issue of global:sync/0 being stuck, we can just call function rabbit_node_monitor:global_sync/0 which provides a workaround. This function was introduced 8 years ago in https://github.com/rabbitmq/rabbitmq-server/commit/9fcb31f348590a74fd526333cf881cfbe27241e6
| | | |
| | | |   With this commit applied, rolling updates are not stuck anymore and we see in the debug log the workaround sometimes being applied.
* | | | Merge pull request #5433 from rabbitmq/loic-fix-cqv1-full-recovery-bug (Michael Klishin, 2022-08-04, 3 files, -6/+45)
|\ \ \ \
| |/ / /
|/| | |
| * | | CQv1: Fix failure to recover messages in rare cases (Loïc Hoguin, 2022-08-04, 3 files, -6/+45)
|/ / /
| | |   When a full recovery was done it was possible to lose messages for v1 queues when the queues only had a journal file and no segment files.
| | |
| | |   In practice it should be a rare event because it requires the queue (or maybe the node) to crash first and then the vhost or the node to be restarted gracefully.
* | | rabbitmqctl(8): correct add_vhost option dashes (Michael Klishin, 2022-08-02, 1 file, -1/+1)
| | |
* | | Merge pull request #5418 from rabbitmq/ik-list_vhost_information_items (Michael Klishin, 2022-08-02, 1 file, -3/+17)
|\ \ \
| | | |   rabbitmqctl(8): add new virtual host information items (follow-up to #5399)
| * | | rabbitmqctl(8): document optional args supported by add_vhost (Michael Klishin, 2022-08-02, 1 file, -3/+9)
| | | |
| * | | Follow-up for #5399. Add new vhost information items to the list (Iliia Khaprov - VMware, 2022-08-02, 1 file, -0/+8)
| | |/
| |/|
* | | Merge pull request #5416 from rabbitmq/dependabot/github_actions/master/erlef/setup-beam-1.12 (Michael Klishin, 2022-08-02, 3 files, -3/+3)
|\ \ \
| | | |   Bump erlef/setup-beam from 1.11 to 1.12
| * | | Bump erlef/setup-beam from 1.11 to 1.12 (dependabot[bot], 2022-08-01, 3 files, -3/+3)
| |/ /
| | |   Bumps [erlef/setup-beam](https://github.com/erlef/setup-beam) from 1.11 to 1.12.
| | |   - [Release notes](https://github.com/erlef/setup-beam/releases)
| | |   - [Commits](https://github.com/erlef/setup-beam/compare/v1.11...v1.12)
| | |
| | |   ---
| | |   updated-dependencies:
| | |   - dependency-name: erlef/setup-beam
| | |     dependency-type: direct:production
| | |     update-type: version-update:semver-minor
| | |   ...
| | |
| | |   Signed-off-by: dependabot[bot] <support@github.com>
* | | 3.10.7 shipped with Osiris 1.3.0 (Michael Klishin, 2022-08-02, 1 file, -1/+1)
|/ /
* | Merge pull request #5408 from rabbitmq/ik-import-vhost-default-queue-type-5399 (Michael Klishin, 2022-08-01, 3 files, -1/+4)
|\ \
| | |   Import default queue type when virtual host is imported
| * | close #5399, set default vhost queue type from import's metadata (Iliia Khaprov, 2022-08-01, 3 files, -1/+4)
| | |
* | | Merge pull request #5410 from rabbitmq/rabbitmq-server-5305-follow-up (Michael Klishin, 2022-08-01, 4 files, -5/+57)
|\ \ \
| |/ /
|/| |   ctl add_vhost: check if relevant feature flags are enabled
| * | Validate the feature flag behind user-provided queue type on the server end (Michael Klishin, 2022-08-01, 3 files, -4/+24)
| | |
| * | ctl add_vhost: check if relevant feature flags are enabled (Michael Klishin, 2022-08-01, 2 files, -2/+34)
|/ /
* | Merge pull request #5235 from rabbitmq/remove-quorum_queue-ff-compatibility-code (Jean-Sébastien Pédron, 2022-08-01, 33 files, -1481/+250)
|\ \
| | |   Remove pre-quorum-queue compatibility code
| * | Remove test code which depended on the `quorum_queue` feature flags (Jean-Sébastien Pédron, 2022-08-01, 23 files, -413/+142)
| | |   These checks are now irrelevant as the feature flag is required.