| Commit message | Author | Age | Files | Lines |
|
Some queue -> channel messages were missed for classic queues
when the queue type API was introduced. This commit fixes that,
which should make the use of classic queues portable outside of the
channel.
This includes some refactoring to make it more explicit that
the stream_queue feature flag also enables queue types.
|
A feature flag can be marked as "auto-enable" by setting `auto_enable` to
true in its properties.
An auto-enable feature flag is automatically enabled as soon as all
nodes in the cluster support it. This is achieved by trying to enable it
when RabbitMQ starts, when a plugin is enabled/disabled or when a node
joins/re-joins a cluster. If the feature flag can't be enabled, the
error is ignored.
An auto-enable feature flag also implicitly depends on
`feature_flags_v2`.
However, it should be used cautiously, especially if the feature flag
has a migration function, because it might be enabled at an
inappropriate time w.r.t. the user's workload.
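The rule described above can be sketched as a small, self-contained model. This is an illustrative Python sketch, not RabbitMQ's Erlang implementation: `FeatureFlag`, `enable` and `try_auto_enable` are hypothetical names, and cluster support is modeled as a plain mapping from node name to the set of flag names that node supports.

```python
from dataclasses import dataclass


@dataclass
class FeatureFlag:
    name: str
    auto_enable: bool = False  # mirrors the `auto_enable` property
    enabled: bool = False


def enable(flag, cluster_support):
    """Enable `flag`, or raise if some node does not support it."""
    missing = sorted(node for node, flags in cluster_support.items()
                     if flag.name not in flags)
    if missing:
        raise RuntimeError(f"{flag.name} unsupported on {missing}")
    flag.enabled = True


def try_auto_enable(flag, cluster_support):
    """Hypothetical hook run on boot, plugin change or cluster join."""
    if not flag.auto_enable or flag.enabled:
        return flag.enabled
    try:
        enable(flag, cluster_support)
    except RuntimeError:
        pass  # a failed auto-enable attempt is ignored
    return flag.enabled
```

In this model, calling `try_auto_enable` repeatedly is harmless: the flag stays disabled until every node reports support, and flags without `auto_enable` are never touched.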
|
This reverts commit 8070344a38b5d3efb2e6687c73e0a163c12bd5aa.
We learnt during the last 6 days on the master branch that RabbitMQ,
as of today, is not compatible with the kernel parameter
`prevent_overlapping_partitions` set to `true`.
RabbitMQ explicitly disconnects nodes in at least two places:
1. rabbit_node_monitor, to "promote" a partial network partition
to a full partition, and
2. rabbit_mnesia, after a node reset, to disconnect it from the
rest of the cluster.
There is no atomicity in the way we disconnect several nodes,
because it's a simple loop. Therefore, remote nodes will detect
the disconnection at different times. With global's new
behavior behind prevent_overlapping_partitions, our attempt to
disconnect all nodes in rabbit_mnesia creates a partial network
partition from global's point of view, leading to a complete
disconnection of the cluster.
For example, test
```
make ct-clustering_management t=cluster_size_3:join_and_part_cluster
```
was flaky and demonstrates the 2nd bullet point above, where RabbitMQ's
interference with Erlang distribution conflicts with global's
prevent_overlapping_partitions.
When RabbitMQ resets a node, its last step is to loop over
clustered nodes and disconnect from them one at a time.
In this test with a 3-node cluster where we reset node A:
1. Node A instructs node B and C to remove node A from their view
of the cluster
2. Node A disconnects from node B
3. global on node B gets a nodedown event for node A, but node C is
still connected to node A
4. global on node B concludes there is a network partition and
disconnects from node A and node C
At this point, each node is on its own.
Nothing in RabbitMQ tries to restore the connection between
nodes B and C.
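The sequence above can be replayed with a toy model. This is a deliberately simplified Python sketch, not the logic of Erlang's `global`: links are undirected pairs, and the only rule modeled is the one described in this message (on a nodedown event, a node also drops every peer that is still connected to the lost node). All function names are hypothetical.

```python
from itertools import combinations


def full_mesh(nodes):
    """All pairwise links of a fully connected cluster."""
    return {frozenset(pair) for pair in combinations(nodes, 2)}


def connected(links, a, b):
    return frozenset((a, b)) in links


def on_nodedown(links, observer, lost, nodes, prevent_overlapping_partitions):
    """Simplified reaction of `global` on `observer` after losing `lost`."""
    if not prevent_overlapping_partitions:
        return
    for peer in nodes:
        if peer in (observer, lost):
            continue
        # The peer still sees the lost node: the observer treats this
        # as a partial partition and disconnects from the peer too.
        if connected(links, peer, lost) and connected(links, observer, peer):
            links.discard(frozenset((observer, peer)))


def reset_node(nodes, resetting, prevent_overlapping_partitions):
    """Disconnect `resetting` from its peers one at a time (a simple loop)."""
    links = full_mesh(nodes)
    for peer in nodes:
        if peer == resetting:
            continue
        links.discard(frozenset((resetting, peer)))
        on_nodedown(links, peer, resetting, nodes,
                    prevent_overlapping_partitions)
    return links
```

With `reset_node(["A", "B", "C"], "A", True)` no link survives in this model, so each node ends up on its own, whereas with `False` the link between B and C remains.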
The correct path forward is:
1. Get rid of Mnesia by replacing it with Khepri.
2. Once mirrored classic queues are removed, get rid of rabbit_node_monitor.
3. Have a clear and consistent view of the nodes comprising a RabbitMQ cluster.
In other words, do not use different sources of truth like nodes(),
Mnesia, Ra clusters, and global monitor at different places in the code.
For the time being we live with `prevent_overlapping_partitions` set to `false`
and with the workaround for global:sync/0 being stuck introduced in
https://github.com/rabbitmq/rabbitmq-server/commit/9fcb31f348590a74fd526333cf881cfbe27241e6
|
rabbitmq/mk-rabbitmq-stream-java-test-suite-interface-change
Streams: adapt tests to the latest Java stream client listener interface
|
Bump eetcd to 0.3.6
|
See https://github.com/zhongwencool/eetcd/releases/tag/v0.3.6 for
details
|
due to the changes in https://github.com/rabbitmq/ra/pull/298,
the 'delivery' ra event is now received before the 'applied' ra event.
|
Add StreamStats command to stream protocol
|
Other changes: returns a map of int64, use the new osiris:get_stats/1 API.
References #5412
|
To keep compatibility with the Erlang client's users.
References #5412
|
References #5412
|
It returns general information on a stream, the first
and committed offsets for now.
Fixes #5412
|
Set kernel param prevent_overlapping_partitions to true
|
This kernel parameter was introduced in Erlang 24.3.
It is set to `false` by default in Erlang 24.
It is set to `true` by default in Erlang 25.
This commit requires Erlang >= 24.3.
As described in commit message
https://github.com/rabbitmq/rabbitmq-server/commit/4bf78d822d7496e03061119f4cb07c0b306e4c03
setting this flag to `true` will prevent global:sync/0 from hanging
in the presence of network failures.
Instead of relying on our own workaround for global:sync/0 being stuck,
introduced in
https://github.com/rabbitmq/rabbitmq-server/commit/9fcb31f348590a74fd526333cf881cfbe27241e6
let us rely on the official Erlang fix that comes with setting
prevent_overlapping_partitions to true.
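The version rules stated in this message can be summarised as a tiny lookup. This is only an illustrative Python sketch with a hypothetical function name. It encodes the defaults listed above and assumes that releases after Erlang 25 keep the Erlang 25 default.

```python
def default_prevent_overlapping_partitions(otp_version):
    """Default value for an Erlang/OTP (major, minor) version tuple.

    Returns None when the parameter does not exist yet (before 24.3).
    """
    if otp_version < (24, 3):
        return None
    # False throughout Erlang 24, True from Erlang 25 on (assumed for later).
    return otp_version >= (25, 0)
```

For example, `default_prevent_overlapping_partitions((24, 3))` yields `False`, which is why this commit has to set the parameter explicitly on Erlang 24.3.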
|
When a non-mirrored durable classic queue is hosted on a node
that goes down, the behaviour prior to #4563 was not only
that the queue got deleted from the rabbit_queue table,
but also that its corresponding bindings got deleted.
The purpose of this test was to make sure that bindings
also get properly deleted from the new rabbit_index_route
table.
Given that the behaviour has now changed with #4563, we can either
delete this test or, as done in this commit, adapt it.
|
How the behavior of this test should change
is yet to be discussed with @dcorbacho @ansd @lhoguin
|
HTTP API: allow connections to be listed and closed by username
|
Format code
Fix whitespace, fix warning
Update API docs
Remove blank lines
Add get all connections by username
Fix method name issue
Enable GET method to get connections by username
Update API documentation
Modify list all connections of username method
Remove list_by_username method and modify get all connections of user API
Code formatting, break up lines for readability
Refactor code to use pattern matching more effectively
Typo
|
rabbitmq/loic-dont-delete-durable-queues-on-node-down
|
This should help avoid issues where queues are no longer listed
in rabbit_queue after a node has restarted, under load.
|
rabbitmq/dependabot/github_actions/master/actions/cache-3.0.6
|
Bumps [actions/cache](https://github.com/actions/cache) from 3.0.5 to 3.0.6.
- [Release notes](https://github.com/actions/cache/releases)
- [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md)
- [Commits](https://github.com/actions/cache/compare/v3.0.5...v3.0.6)
---
updated-dependencies:
- dependency-name: actions/cache
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
|
Pair: @the-mikedavis
|
Require Erlang 24.3
|
we expect that 3.11 GA will require 25.0 but this would do
for now
|
Drop Erlang 23 from Actions test matrix (in 3.12.x/master, 3.11.x)
|
we still use it for the 3.8.x mixed version umbrella,
for now
|
Prevent global:sync/0 from being stuck when hostname resolution is not available early on boot
|
because it outputs the whole process state of global_name_server.
Also, fix erroneous comments.
|
Prior to this commit, global:sync/0 sometimes got stuck either when
performing a rolling update on Kubernetes or when creating a new
RabbitMQ cluster on Kubernetes.
When performing a rolling update, the node being booted will be stuck
in:
```
2022-07-26 10:49:58.891896+00:00 [debug] <0.226.0> == Plugins (prelaunch phase) ==
2022-07-26 10:49:58.891908+00:00 [debug] <0.226.0> Setting plugins up
2022-07-26 10:49:58.920915+00:00 [debug] <0.226.0> Loading the following plugins: [cowlib,cowboy,rabbitmq_web_dispatch,
2022-07-26 10:49:58.920915+00:00 [debug] <0.226.0> rabbitmq_management_agent,amqp_client,
2022-07-26 10:49:58.920915+00:00 [debug] <0.226.0> rabbitmq_management,quantile_estimator,
2022-07-26 10:49:58.920915+00:00 [debug] <0.226.0> prometheus,rabbitmq_peer_discovery_common,
2022-07-26 10:49:58.920915+00:00 [debug] <0.226.0> accept,rabbitmq_peer_discovery_k8s,
2022-07-26 10:49:58.920915+00:00 [debug] <0.226.0> rabbitmq_prometheus]
2022-07-26 10:49:58.926373+00:00 [debug] <0.226.0> Feature flags: REFRESHING after applications load...
2022-07-26 10:49:58.926416+00:00 [debug] <0.372.0> Feature flags: registering controller globally before proceeding with task: refresh_after_app_load
2022-07-26 10:49:58.926450+00:00 [debug] <0.372.0> Feature flags: [global sync] @ rabbit@r1-server-3.r1-nodes.default
```
During cluster creation, an example log of global:sync/0 being stuck can
be found in bullet point 2 of
https://github.com/rabbitmq/rabbitmq-server/pull/5331#pullrequestreview-1050715029
When global:sync/0 is stuck, it never receives a message in line
https://github.com/erlang/otp/blob/bd05b07f973f11d73c4fc77d59b69f212f121c2d/lib/kernel/src/global.erl#L2942
This issue can be observed in both `kind` and GKE.
`kind` uses CoreDNS, GKE uses kubedns.
CoreDNS does not resolve the hostname of RabbitMQ and its peers
correctly for up to 30 seconds after node startup.
This is because the default cache value of CoreDNS is 30 seconds and
CoreDNS has a bug described in
https://github.com/kubernetes/kubernetes/issues/92559
global:sync/0 is known to be buggy "in the presence of network failures"
unless the kernel parameter `prevent_overlapping_partitions` is set to
`true`.
When either:
1. setting CoreDNS cache value to 1 second (see
https://github.com/rabbitmq/rabbitmq-server/issues/5322#issuecomment-1195826135
on how to set this value), or
2. setting the kernel parameter `prevent_overlapping_partitions` to `true`
rolling updates do NOT get stuck anymore.
This means we are hitting a combination of:
1. a Kubernetes DNS bug where DNS caches are not updated promptly for headless
services with `publishNotReadyAddresses: true`, and
2. an Erlang bug which causes global:sync/0 to hang forever in the presence
of network failures.
The Erlang bug is fixed by setting `prevent_overlapping_partitions` to `true` (default in Erlang/OTP 25).
In RabbitMQ however, we explicitly set `prevent_overlapping_partitions`
to `false` because we fear other issues could arise if we set this parameter to `true`.
Luckily, to resolve this issue of global:sync/0 being stuck, we can just
call function rabbit_node_monitor:global_sync/0 which provides a
workaround. This function was introduced 8 years ago in
https://github.com/rabbitmq/rabbitmq-server/commit/9fcb31f348590a74fd526333cf881cfbe27241e6
With this commit applied, rolling updates are not stuck anymore and we
see in the debug log the workaround sometimes being applied.
|
When a full recovery was done it was possible to lose messages
for v1 queues when the queues only had a journal file and no
segment files.
In practice it should be a rare event because it requires the
queue (or maybe the node) to crash first and then the vhost or
the node to be restarted gracefully.
|
rabbitmqctl(8): add new virtual host information items (follow-up to #5399)
|
rabbitmq/dependabot/github_actions/master/erlef/setup-beam-1.12
Bump erlef/setup-beam from 1.11 to 1.12
|
Bumps [erlef/setup-beam](https://github.com/erlef/setup-beam) from 1.11 to 1.12.
- [Release notes](https://github.com/erlef/setup-beam/releases)
- [Commits](https://github.com/erlef/setup-beam/compare/v1.11...v1.12)
---
updated-dependencies:
- dependency-name: erlef/setup-beam
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
|
Import default queue type when virtual host is imported
|
ctl add_vhost: check if relevant feature flags are enabled
|
Remove pre-quorum-queue compatibility code
|
These checks are now irrelevant as the feature flag is required.
|