REQUIREMENTS
------------
- It should be possible to run two systems with replication using the 
  same configuration file on both systems.

FEATURES TO IMPLEMENT
---------------------
- Fix so that execute and command use ExtSender.
  Neither of them should have its own signals; this should
  instead be abstracted to the RepStateRequest layer.
- Delete signals
    GSN_REP_INSERT_GCIBUFFER_CONF
    GSN_REP_INSERT_GCIBUFFER_REF
- Fix so that all ExtSenders are set at one point in the code only.
- Verify the following signals:
    GSN_REP_INSERT_GCIBUFFER_REQ
    GSN_REP_CLEAR_SS_GCIBUFFER_REQ
    GSN_REP_DROP_TABLE_REQ
- Fix all @todo items in the code
- Remove all #if 1, #if 0 etc.
- Use the DBUG package from the MySQL source code correctly.
- Add a system table storing all info about channels (see the sketch
  at the end of this list)
- Think about how channels, subscriptions etc. map to SUMA Subscriptions
- TableInfoPS must be preserved if SS REP is restarted while PS REP still
  has all log records needed to sync.  (This could be saved in a system
  table instead of in the struct.)

KNOWN BUGS AND LIMITATIONS
--------------------------
- REP#1: Inconsistency due to non-logging stop                     [LIMITATION]
  Problem:
  - Stopping replication in a state other than "Logging" can 
    lead to an inconsistent state of the destination database.
  Suggested solution:
  - Implement a cleanData flag (= false) that indicates that 
    this has happened.
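  A minimal sketch of the suggested flag (names are hypothetical,
  not the actual REP code):
    bool m_cleanData = true;   // false once a non-logging stop occurred
    void stopChannel(Channel::State s) {
      if (s != Channel::LOG)
        m_cleanData = false;   // destination DB may now be inconsistent
      // ... perform the stop ...
    }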

- REP#2: PS REP uses epochs from old subscription                         [BUG]
  The following scenario can lead to incorrect replication:
  - Start replication X 
  - Wait until replication is in "Logging" state
  - Kill SS REP 
  - Let PS REP be alive
  - Start new replication Y
  - Replication Y can use old PS REP epochs from replication X.
  Suggested solution:
  - Mark PS buffers with channel ids
  - Make sure that all epoch requests use channel number in the requests.
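  A sketch of the buffer tagging (the channelId fields are hypothetical):
    struct GCIBuffer {
      Uint32 channelId;     // owning channel, set when buffer is filled
      // ... epoch data ...
    };
    // When serving an epoch request, skip buffers from an old channel:
    if (buffer->channelId != req->channelId)
      return;               // stale buffer from replication X, ignore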

- REP#3: With two node groups, error 626 sometimes occurs               [FIXED]
  Problem:
  - Sometimes (when doing updates) error code 626 occurs when 
    using 2 node groups.
  - 626 = Tuple does not exist.
  - Current code in RepState.cpp is:
    if(s == Channel::App &&
       m_channel.getState() == Channel::DATASCAN_COMPLETED && 
       i.last() >= m_channel.getDataScanEpochs().last() &&
       i.last() >= m_channel.getMetaScanEpochs().last()) 
    {
      m_channel.setState(Channel::LOG);
      disableAutoStart();
    }
    When the system enters the LOG state, the force flag is turned off.
  Suggested solution:
  - During DATASCAN, force=true (i.e. updates are treated as writes, 
    delete errors due to non-existing tuples are ignored)
  - The code above must take ALL node groups into account; see the
    sketch below.
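  A sketch of a check covering all node groups (getNoOfNodeGroups and
  the per-node-group scan-epoch accessors are hypothetical):
    bool scansComplete = true;
    for (Uint32 ng = 0; ng < m_channel.getNoOfNodeGroups(); ng++) {
      if (i.last() < m_channel.getDataScanEpochs(ng).last() ||
          i.last() < m_channel.getMetaScanEpochs(ng).last())
        scansComplete = false;  // this node group has not caught up yet
    }
    if (s == Channel::App &&
        m_channel.getState() == Channel::DATASCAN_COMPLETED &&
        scansComplete)
    {
      m_channel.setState(Channel::LOG);
      disableAutoStart();
    }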

- REP#4: User requests sometimes vanish when DB node is down       [LIMITATION]
  Problem:
  - PS REP node does not always send a REF when no connection to GREP exists
  Suggested solution:
  - All REP->GREP signal sends should be checked.  If a send returns <0,
    then a REF signal should be returned to the user.
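  A sketch of the checked send (sendSignal/sendRef names are
  hypothetical, not the actual REP API):
    int res = m_grepSender->sendSignal(/* ... */);
    if (res < 0) {
      // No connection to GREP: answer the user with a REF instead
      // of silently dropping the request.
      sendRef(userRef, /* error = */ NO_CONNECTION);
    }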

- REP#5: User requests sometimes vanish when PS REP is down               [BUG]
  Scenario:
  - Execute "Start" with PS REP node down
  Solution:
  - When start is executed, the connect flag should be checked first.
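  A sketch of the check (m_connected is a hypothetical flag):
    if (!m_connected) {
      sendRef(userRef, /* error = */ PS_REP_DOWN);  // refuse, don't drop
      return;
    }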

- REP#6: No warning if table exists                                [Lars, BUG!]
  Problem:
  - There is no warning if a replicated table already exists in the
    database.  
  Suggested solution:
  - Print a warning
  - Set cleanData = false

- REP#7: Starting 2nd subscription crashes DB node (Grep.cpp:994)       [FIXED]
  Scenario:
  - Start replication 
  - Wait until replication is in "Logging" state
  - Kill SS REP
  - Let PS REP be alive
  - Start new replication
  - Now GREP crashes in Grep.cpp:994.
  Suggested fix:
  - If a new subscription is requested with the same subscriberData 
    as an existing one, then SUMA (or GREP) should send a REF signal
    indicating that SUMA does not allow a new subscription to be
    created.  [Currently no senderData is sent from REP.]
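  A sketch of the suggested duplicate check (names hypothetical):
    // In SUMA/GREP, when a subscription create request arrives:
    if (subscriptionExists(req->subscriberData)) {
      sendSubCreateRef(req, /* error = */ DUPLICATE_SUBSCRIPTION);
      return;   // refuse the duplicate instead of crashing
    }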

- REP#8: Dangling subscriptions in GREP/SUMA                 [Johan,LIMITATION]
  Problem:
  - If both REP nodes die, then there is no way to remove
    subscriptions from GREP/SUMA
  Suggested solution 1:
  - Fix so that GREP/SUMA can receive a subscription removal 
    signal with subid 0.  This means that ALL subscriptions are
    removed.  This meaning should be documented in the 
    signaldata class. 
  - A new user command "STOP ALL" is implemented that sends
    a request to delete all subscriptions.
  Suggested solution 2:
  - When GREP detects that ALL PS REP nodes associated with a
    subscription are killed, then that subscription should be
    deleted.
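  A sketch of suggested solution 1 (field names hypothetical):
    // In the subscription removal handler: subid 0 is documented in
    // the signaldata class to mean "remove ALL subscriptions" and is
    // used by the new STOP ALL command.
    if (req->subscriptionId == 0) {
      removeAllSubscriptions();
      sendRemoveAllConf(req->senderRef);
      return;
    }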