qpid/cpp/design_docs/old-cluster-issues.txt


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82

-*-org-*-
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

* Issues with the old design.

The cluster is based on virtual synchrony: each broker multicasts
events and the events from all brokers are serialized and delivered in
the same order to each broker.

In the current design raw byte buffers from client connections are
multicast, serialized and delivered in the same order to each broker.

Each broker has a replica of all queues, exchanges, bindings and also
all connections & sessions from every broker. Cluster code treats the
broker as a "black box", it "plays" the client data into the
connection objects and assumes that by giving the same input, each
broker will reach the same state.

A new broker joining the cluster receives a snapshot of the current
cluster state, and then follows the multicast conversation.

** Maintenance issues.

The entire state of each broker is replicated to every member:
connections, sessions, queues, messages, exchanges, management objects
etc. Any discrepancy in the state that affects how messages are
allocated to consumers can cause an inconsistency.

- Entire broker state must be faithfully updated to new members.
- Management model also has to be replicated.
- All queues are replicated, can't have unreplicated queues (e.g. for management)

Events that are not deterministically predictable from the client
input data stream can cause inconsistencies. In particular use of
timers/timestamps require cluster workarounds to synchronize.

A member that encounters an error which is not encounted by all other
members is considered inconsistent and will shut itself down. Such
errors can come from any area of the broker code, e.g. different
ACL files can cause inconsistent errors.

The following areas required workarounds to work in a cluster:

- Timers/timestamps in broker code: management, heartbeats, TTL
- Security: cluster must replicate *after* decryption by security layer.
- Management: not initially included in the replicated model, source of many inconsistencies.

It is very easy for someone adding a feature or fixing a bug in the
standalone broker to break the cluster by:
- adding new state that needs to be replicated in cluster updates.
- doing something in a timer or other non-connection thread.

It's very hard to test for such breaks. We need a looser coupling
and a more explicitly defined interface between cluster and standalone
broker code.

** Performance issues.

Virtual synchrony delivers all data from all clients in a single
stream to each broker.  The cluster must play this data thru the full
broker code stack: connections, sessions etc. in a single thread
context in order to get identical behavior on each broker. The cluster
has a pipelined design to get some concurrency but this is a severe
limitation on scalability in multi-core hosts compared to the
standalone broker which processes each connection in a separate thread
context.