#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
#

Linear Store issues:

Current/pending:
================
 Q-JIRA RHBZ     Description / Comments
 ------ -------  ----------------------
   5359 -        Linearstore: Implement new management schema and wire into store
   5360 -        Linearstore: Evaluate and rework logging to produce a consistent log output
   5361 -        Linearstore: No tests for linearstore functionality currently exist
                   svn r.1564893 2014-02-05: Added tx-test-soak.sh
                   svn r.1564935 2014-02-05: Added license text to tx-test-soak.sh
                   * No existing tests for linearstore:
                   ** Basic broker-level tests for txn and non-txn recovery
                   ** Store-level tests which check write boundary conditions
                   ** EFP tests, including file recovery, error management
                   ** Unit tests
                   ** Basic performance tests
   5362 -        Linearstore: No store tools exist for examining the journals
                   svn r.1556888 2014-01-09: WIP checkin for linearstore version of qpid_qls_analyze. Needs testing and tidy-up.
                   svn r.1560530 2014-01-22: Bugfixes for qpid_qls_analyze
                   svn r.1561848 2014-01-27: Bugfixes and enhancements for qpid_qls_analyze
                   svn r.1564808 2014-02-05: Bugfixes and enhancements for qpid_qls_analyze
                   * Store analysis and status
                   * Recovery/reading of message content
                   * Empty file pool status and management
   5464 -        [linearstore] Incompletely created journal files accumulate in EFP
   5484 1035843  Slow performance for producers
                   svn r.1558592 2014-01-15: Fixes an issue with using /dev/random as a source of random numbers for journal serial numbers.
                   svn r.1558913 2014-01-16: Replaces use of /dev/urandom with several calls to rand() to construct a 64-bit random number.
                   * Recommend rebuilding and testing for performance again with these two fixes. Marked POST.
   -    1039522  Qpid crashes while recovering from linear store around qpid::linearstore::journal::JournalFile::getFqFileName() including enq_rec::decode() threw JERR_JREC_BADRECTAIL
                   * Possible dup of 1039525
                   * May be fixed by QPID-5483 - waiting for needinfo, recommend rebuilding with QPID-5483 fix and re-testing. Marked POST.
   -    1039525  Qpid crashes while recovering from linear store around qpid::linearstore::journal::jexception::format including enq_rec::decode() threw JERR_JREC_BADRECTAIL
                   * Possible dup of 1039522
                   * May be fixed by QPID-5483 - waiting for needinfo, recommend rebuilding with QPID-5483 fix and re-testing. Marked POST.
#  -    1049870  [LinearStore] auto-delete property does not survive restart
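
Note on QPID-5484: the rand()-based approach described for svn r.1558913 can be sketched roughly as follows. This is only an illustration, not the code from that checkin; the function name random64FromRand is hypothetical, and taking 15 bits per call is an assumption based on the C standard's minimum guarantee for RAND_MAX (32767).

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical helper, NOT the actual linearstore function.
// Builds a 64-bit value from several rand() calls. The C standard only
// guarantees RAND_MAX >= 32767, so only the low 15 bits of each call are used.
inline uint64_t random64FromRand() {
    uint64_t value = 0;
    for (int i = 0; i < 5; ++i) {  // 5 calls * 15 bits = 75 bits, enough to fill 64
        value = (value << 15) ^ (static_cast<uint64_t>(std::rand()) & 0x7FFF);
    }
    return value;
}
```

Unlike reading /dev/random, rand() never blocks, which is consistent with the performance complaint above. It is not cryptographically secure, so a scheme like this is only suitable for values such as serial numbers, and the caller is assumed to seed it once with srand().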

Fixed/closed (in commit order):
===============================
 Q-JIRA RHBZ     Description / Comments
 ------ -------  ----------------------
   5357 1052518  Linearstore: Empty file recycling not functional
                   svn r.1545563 2013-11-26: Proposed fix. VERIFIED
   5358 1052727  Linearstore: Checksums not implemented in record tail
                   svn r.1547601 2013-12-03: Proposed fix. NEEDINFO on algorithm
   5387 1036071  Linearstore: Segmentation fault when deleting queue
                   svn r.1547641 2013-12-03: Proposed fix. VERIFIED
   5388 1035802  Linearstore: Segmentation fault when recovering empty queue
                   svn r.1547921 2013-12-04: Proposed fix. VERIFIED
NO-JIRA -        Added missing Apache copyright/license text
                   svn r.1551304 2013-12-16: Proposed fix
   5425 1052445  Linearstore: Transaction Prepared List (TPL) fails with jexception 0x0402 AtomicCounter::addLimit() threw JERR_JNLF_FILEOFFSOVFL
                   svn r.1551361 2013-12-16: Proposed fix VERIFIED
   5442 1039949  Linearstore: Dtx recover test fails
                   svn r.1552772 2013-12-20: Proposed fix VERIFIED
   5444 1052775  Linearstore: Recovering from qpid-txtest fails with "Inconsistent TPL 2PC count" error message
                   svn r.1553148 2013-12-23: Proposed fix NEEDINFO on reproduction and testing
   -    1038599  [LinearStore] Abort when deleting used queue after restart
                   CLOSED-NOTABUG 2014-01-06
   5460 1051097  [linearstore] Recovery of store which contains prepared but incomplete transactions results in message loss
                   svn r.1556892 2014-01-09: Proposed fix VERIFIED
   5473 1051924  [linearstore] Recovery of journal in which last logical file contains truncated record causes crash
                   svn r.1557620 2014-01-12: Proposed fix MODIFIED
   5483 -        [linearstore] Recovery of journal with partly written record fails with "JERR_JREC_BADRECTAIL: Invalid data record tail" error message
                   svn r.1558589 2014-01-15: Proposed fix
                   * May be linked to RHBZ 1039522 - VERIFIED
                   * May be linked to RHBZ 1039525 - VERIFIED
   5487 1054448  [linearstore] Replace use of /dev/urandom with c random generator calls
                   svn r.1558913 2014-01-16: Proposed fix VERIFIED
   5479 1053701  [linearstore] Using recovered store results in "JERR_JNLF_FILEOFFSOVFL: Attempted to increase submitted offset past file size. (JournalFile::submittedDblkCount)" error message
                   * Probability: 2 of 600 (0.3%) using tx-test-soak.sh
                   * Fixed by checkin for QPID-5480, no longer able to reproduce. VERIFIED
   5480 1053749  [linearstore] Recovery of store failure with "JERR_MAP_NOTFOUND: Key not found in map." error message
                   svn r.1564877 2014-02-05: Proposed fix
                   * Probability: 6 of 600 (1.0%) using tx-test-soak.sh
                   * If broker is started a second time after failure, it starts correctly and test completes ok.
                   * Problem: File is being recycled to EFP with still-locked enqueues in it (i.e. dequeued transactionally).
                   * Problem: Record alignment check writes filler records to wrong file when decoding bad record moves across a file boundary
   5603 1063700  [linearstore] broker restart fails under stress test
                   svn r.1574513 2014-03-05: Proposed fix. POST
                   * jexception 0x0701 RecoveryManager::readNextRemainingRecord() threw JERR_JREC_BADRECTAIL
   5607 1064181  [linearstore] Qpidd closes transactional client session&connection with async_dequeue() failed
                   svn r.1575009 2014-03-06: Proposed fix. POST
                   * jexception 0x010b LinearFileController::getCurrentSerial() threw JERR_NULL
   -    1064230  [linearstore] Qpidd linearstore recovery sometimes fail to recover messages with recoverMessages() failed
                   * jexception 0x0701 RecoveryManager::readNextRemainingRecord() threw JERR_JREC_BADRECTAIL
                   * possible dup of 1063700
   -    1036026  [LinearStore] Qpid linear store unable to create durable queue - framing-error: Queue <q-name>: create() failed: jexception 0x0000
                   * UNABLE TO REPRODUCE - but Frantizek has additional info
                   * Retested after checkin 1575009, problem solved. VERIFIED

Ordered checkin list:
=====================
In order to port the linearstore changes from trunk to a branch, the following svn checkins need to be ported in order:

no.   svn r  Q-JIRA     RHBZ       Date
--- ------- ------- -------- ----------
 1. 1545563    5357  1052518 2013-11-26
 2. 1547601    5358  1052727 2013-12-03
 3. 1547641    5387  1036071 2013-12-03
 4. 1547921    5388  1035802 2013-12-04
 5. 1551304 NO-JIRA        - 2013-12-16
 6. 1551361    5425  1052445 2013-12-16
 7. 1552772    5442  1039949 2013-12-20
 8. 1553148    5444  1052775 2013-12-23
 9. 1556888    5362        - 2014-01-09
10. 1556892    5460  1051097 2014-01-09
11. 1557620    5473  1051924 2014-01-12
12. 1558589    5483        - 2014-01-15
13. 1558592    5484  1035843 2014-01-15
14. 1558913    5487  1054448 2014-01-16
15. 1560530    5362        - 2014-01-22
16. 1561848    5362        - 2014-01-27
17. 1564808    5362        - 2014-02-05
18. 1564877    5480  1053749 2014-02-05
19. 1564893    5361        - 2014-02-05
20. 1564935    5361        - 2014-02-05
21. 1574513    5603  1063700 2014-03-05
22. 1575009    5607  1064181 2014-03-06

See above sections for details on these checkins.
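
As a rough aid for the port, the helper below turns a list of revisions into one "svn merge -c <rev>" command per checkin, in ascending revision order. It is only a sketch: the function name mergeCommands is hypothetical, the source URL is a placeholder the caller must supply, and it assumes each checkin is cherry-picked individually with "svn merge -c".

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: returns one "svn merge -c <rev> <source>" command
// per revision, sorted ascending so the checkins are applied in order.
// sourceUrl is a placeholder for the trunk URL; it is not taken from this file.
inline std::vector<std::string> mergeCommands(std::vector<unsigned long> revisions,
                                              const std::string& sourceUrl) {
    std::sort(revisions.begin(), revisions.end());
    std::vector<std::string> commands;
    for (unsigned long rev : revisions) {
        commands.push_back("svn merge -c " + std::to_string(rev) + " " + sourceUrl);
    }
    return commands;
}
```

Each generated command would be run from the root of the branch working copy, resolving any conflicts before moving on to the next revision.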

Future work:
============
* One journal file lost when queue deleted. All files except for one are recycled back to the EFP.
* Complete exceptions - several exceptions thrown using jexception have no exception numbers
* Investigate ability of store to detect missing journal files, especially from logical end of a journal
* Investigate ability of store to handle file mix-ups (i.e. journal files from the EFP which are not zeroed, or which belong to other journals)
* Look at improving the efficiency of recovery - right now the entire store is read once, and then the xid and data of each recovered record are read again

Code tidy-up
------------
* Remove old comments
* Use C++ cast operators (static_cast, reinterpret_cast, etc.) instead of C-style (xxx)y casts
* Member names: use a trailing underscore (xxx_)
* Rename classes, functions and variables to camel-case
* Add Doxygen docs to classes
* Make fids consistent in name (fid, file_id, pfid) and format (hex vs decimal)
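
The cast item above amounts to changes of the following kind. The struct and function names are hypothetical and exist only for illustration; they are not taken from the linearstore sources.

```cpp
#include <cstdint>

// Hypothetical record header, for illustration only.
struct ExampleHeader {
    uint32_t magic;
    uint32_t flags;
};

// Before: return (uint32_t)serial;
// After:  the narrowing is explicit and easy to search for.
inline uint32_t lowWord(uint64_t serial) {
    return static_cast<uint32_t>(serial & 0xFFFFFFFFULL);
}

// Before: return (ExampleHeader*)buffer;   (this also silently dropped const)
// After:  static_cast from const void* keeps const-correctness.
inline const ExampleHeader* asHeader(const void* buffer) {
    return static_cast<const ExampleHeader*>(buffer);
}
```

A practical benefit of the named casts is that the compiler rejects conversions a C-style cast would silently allow, such as casting away const.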