author: unknown <guilhem@gbichot3.local> 2007-06-22 14:49:37 +0200
committer: unknown <guilhem@gbichot3.local> 2007-06-22 14:49:37 +0200
commit: 1a96259191b193b353387cbb70d7567009e3b247
tree: 27f19470e270f1546d4eb9ac1eaf51ff23ec8a08 /storage/maria
parent: fd9bd5802932b08da7484c54445ae14ee4e25385
- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's create_rename_lsn, which is needed for the correctness of Recovery (see function comment of _ma_repair_write_log_record() in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table like inside ALTER TABLE (I expect this to be a big win). There was already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table and log the "id->name" pair (LOGREC_FILE_ID); log LOGREC_LONG_TRANSACTION_ID; automatically store the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint when some dirty pages are unknown; capturing trn->rec_lsn, trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.

storage/maria/Makefile.am:
  more files to build
storage/maria/ha_maria.cc:
  - logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
  - ha_maria::data_file_type does not have to be set in every info() call, just do it once in open().
  - if caller said that transactionality can be disabled (like if caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we temporarily disable transactionality of the table in external_lock(); that will ensure that no REDOs/UNDOs are logged for this possibly massive write operation (they are not needed, as if any write fails, the table will be dropped). We re-enable in external_lock(F_UNLCK), which in ALTER TABLE happens before the tmp table replaces the original one (which is good, as thus the final table will have a REDO RENAME and a correct create_rename_lsn).
  - when we commit we also have to write a log record, so trnman_commit_trn() calls become ma_commit() calls
  - at end of engine's initialization, we are potentially entering a multi-threaded dangerous world (clients are going to be accepted) and so some assertions of mutex-owning become enforceable, for that we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
  new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
  - fixing comments according to discussion with Monty
  - if a table is transactional but temporarily non-transactional (like in ALTER TABLE), we need to give a sensible LSN to the pages (and, if we give 0, pagecache asserts).
  - translog_write_record() now takes care of storing the share's 2-byte-id in the log record
storage/maria/ma_blockrec.h:
  fixing comment according to discussion with Monty
storage/maria/ma_check.c:
  When REPAIR/OPTIMIZE modify the data/index file, if this is a transactional table, they must sync it; if they remove files or rename files, they must sync the directory, so that everything is durable. This is just applying to REPAIR/OPTIMIZE the logic already implemented in CREATE/DROP/RENAME a few months ago.
  Adding a function to write a LOGREC_REPAIR_TABLE at end of REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and to update the table's create_rename_lsn.
storage/maria/ma_close.c:
  fix for a future bug
storage/maria/ma_control_file.c:
  ensuring that if Maria is running in multi-threaded mode, anybody wanting to write to the control file and update last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
  see ma_control_file.c
storage/maria/ma_create.c:
  when creating a table:
  - sync it and its directory only if this is a transactional table and there is a log (no point in syncing in maria_chk)
  - decouple the two uses of linkname/linkname_ptr (for index file and for data file) into more variables, as we need to know all links until the moment we write the LOGREC_CREATE_TABLE.
  - set share.data_file_type early so that _ma_initialize_data_file() knows it (Monty's bugfix so that a table always has at least a bitmap page when it is created; so data-file is not 0 bytes anymore).
  - log a LOGREC_CREATE_TABLE; it contains the bytes which we have just written to the index file's header. Update table's create_rename_lsn.
  - syncing of kfile had been bugified in a previous merge, correcting
  - syncing of dfile is now needed as it's not empty anymore
  - in _ma_initialize_data_file(), use share's block_size and not the global one. This is a gratuitous change, both variables are equal, just that I find it more future-proof to use share-bound variable rather than global one.
storage/maria/ma_delete_all.c:
  log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows(); update create_rename_lsn then.
storage/maria/ma_delete_table.c:
  - logging LOGREC_DROP_TABLE; knowing if this is needed, requires knowing if the table is transactional, which requires opening the table.
  - we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
  questions
storage/maria/ma_init.c:
  when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
  - translog_inited has to be visible to ma_create() (see how it is used in ma_create())
  - checkpoint record will be a single record, not three
  - no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will log a REDO_CREATE)
  - adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by truncating the files), REPAIR.
  - MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
  - in translog_write_record(), if MARIA_SHARE does not yet have a 2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically store this short id into log records.
  - in translog_write_record(), if transaction has not logged its long trid, log LOGREC_LONG_TRANSACTION_ID.
  - For Checkpoint, we need to know the current end-of-log: adding translog_get_horizon().
  - For Control File, adding an assertion that the thread owns the log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
  Changes in log records (see ma_loghandler.c). new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
  adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn, where the most significant byte is used for flags.
storage/maria/ma_open.c:
  storing the create_rename_lsn in the index file's header (in the state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
  - my set_if_bigger was wrong, correcting it
  - if the first_in_switch list is not empty, it means that changed_blocks misses some dirty pages, so Checkpoint cannot run and needs to wait. A variable missing_blocks_in_changed_list is added to tell that (should it be named missing_blocks_in_changed_blocks?)
  - pagecache_collect_changed_blocks_with_lsn() now also tells the minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
  see ma_pagecache.c
storage/maria/ma_panic.c:
  comment
storage/maria/ma_range.c:
  comment
storage/maria/ma_rename.c:
  - logging LOGREC_RENAME_TABLE; knowing if this is needed, requires knowing if the table is transactional, which requires opening the table.
  - update create_rename_lsn
  - we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
  comment
storage/maria/ma_test_all.sh:
  - tip for Valgrind-ing ma_test_all
  - do "export maria_path=somepath" before calling ma_test_all, if you want to run ma_test_all out of storage/maria (useful to have parallel runs, like one normal and one Valgrind, they must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
  - state now contains, in memory and on disk, the create_rename_lsn
  - share now contains a 2-byte-id
storage/maria/trnman.c:
  preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn; minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
  using most significant byte of first_undo_lsn to hold miscellaneous flags, for now TRANSACTION_LOGGED_LONG_ID. dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
  dummy_transaction_object was declared in all files including trnman_public.h, while in fact it's a single object. new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
  update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
  update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
  update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
  update for new prototype
storage/maria/ma_commit.c:
  function which wraps:
  - writing a LOGREC_COMMIT record (==commit on disk)
  - calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
  new header file
.tree-is-private:
  this file is now needed to keep our tree private (don't push it to public trees). When 5.1 is merged into mysql-maria, we can abandon our maria-specific post-commit trigger; .tree_is_private will take care of keeping commit mails private. Don't push this file to public trees.
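
Illustration (not part of the patch): the commit message says the most significant byte of first_undo_lsn holds flags, for now TRANSACTION_LOGGED_LONG_ID. The sketch below shows one way such packing can work. It assumes a 64-bit value whose lower 56 bits carry the LSN; the type name, macro names and helper functions are hypothetical stand-ins, not the definitions from ma_loghandler_lsn.h or trnman.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for LSN_WITH_FLAGS: LSN in the low 56 bits,
   flags in the most significant byte (assumed layout). */
typedef uint64_t lsn_with_flags_t;

#define SKETCH_FLAGS_SHIFT 56
#define SKETCH_LSN_MASK ((UINT64_C(1) << SKETCH_FLAGS_SHIFT) - 1)
/* Hypothetical stand-in for TRANSACTION_LOGGED_LONG_ID */
#define SKETCH_LOGGED_LONG_ID (UINT64_C(1) << SKETCH_FLAGS_SHIFT)

/* Extract the LSN, dropping the flag byte. */
static uint64_t lsn_part(lsn_with_flags_t v)
{
  return v & SKETCH_LSN_MASK;
}

/* Has the transaction already logged its long id? */
static int has_logged_long_id(lsn_with_flags_t v)
{
  return (v & SKETCH_LOGGED_LONG_ID) != 0;
}

/* Set the flag without disturbing the stored LSN. */
static lsn_with_flags_t mark_logged_long_id(lsn_with_flags_t v)
{
  return v | SKETCH_LOGGED_LONG_ID;
}

/* Replace the LSN while preserving whatever flags are set. */
static lsn_with_flags_t set_lsn(lsn_with_flags_t v, uint64_t lsn)
{
  assert((lsn & ~SKETCH_LSN_MASK) == 0);  /* LSN must fit in 56 bits */
  return (v & ~SKETCH_LSN_MASK) | lsn;
}
```

With such a layout, code computing the minimum first_undo_lsn for the log's low-water mark would have to compare the masked LSN parts rather than the raw values, since a set flag would otherwise make a small LSN look large.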
Diffstat (limited to 'storage/maria')
-rw-r--r-- storage/maria/Makefile.am | 6
-rw-r--r-- storage/maria/ha_maria.cc | 50
-rw-r--r-- storage/maria/ha_maria.h | 5
-rw-r--r-- storage/maria/ma_blockrec.c | 140
-rw-r--r-- storage/maria/ma_blockrec.h | 2
-rw-r--r-- storage/maria/ma_check.c | 88
-rw-r--r-- storage/maria/ma_close.c | 15
-rw-r--r-- storage/maria/ma_commit.c | 71
-rw-r--r-- storage/maria/ma_commit.h | 18
-rw-r--r-- storage/maria/ma_control_file.c | 16
-rw-r--r-- storage/maria/ma_control_file.h | 2
-rw-r--r-- storage/maria/ma_create.c | 148
-rw-r--r-- storage/maria/ma_delete_all.c | 79
-rw-r--r-- storage/maria/ma_delete_table.c | 99
-rw-r--r-- storage/maria/ma_extra.c | 48
-rw-r--r-- storage/maria/ma_init.c | 2
-rw-r--r-- storage/maria/ma_loghandler.c | 355
-rw-r--r-- storage/maria/ma_loghandler.h | 35
-rw-r--r-- storage/maria/ma_loghandler_lsn.h | 10
-rw-r--r-- storage/maria/ma_open.c | 22
-rwxr-xr-x storage/maria/ma_pagecache.c | 179
-rw-r--r-- storage/maria/ma_pagecache.h | 1
-rw-r--r-- storage/maria/ma_panic.c | 7
-rw-r--r-- storage/maria/ma_range.c | 32
-rw-r--r-- storage/maria/ma_rename.c | 96
-rw-r--r-- storage/maria/ma_static.c | 8
-rwxr-xr-x storage/maria/ma_test_all.sh | 278
-rw-r--r-- storage/maria/maria_def.h | 6
-rw-r--r-- storage/maria/trnman.c | 173
-rw-r--r-- storage/maria/trnman.h | 5
-rw-r--r-- storage/maria/trnman_public.h | 7
-rw-r--r-- storage/maria/unittest/ma_test_loghandler-t.c | 14
-rw-r--r-- storage/maria/unittest/ma_test_loghandler_multigroup-t.c | 14
-rw-r--r-- storage/maria/unittest/ma_test_loghandler_multithread-t.c | 6
-rw-r--r-- storage/maria/unittest/ma_test_loghandler_pagecache-t.c | 2
35 files changed, 1407 insertions, 632 deletions
diff --git a/storage/maria/Makefile.am b/storage/maria/Makefile.am
index 9d8ab704541..fbb25584910 100644
--- a/storage/maria/Makefile.am
+++ b/storage/maria/Makefile.am
@@ -54,7 +54,8 @@ noinst_HEADERS = maria_def.h ma_rt_index.h ma_rt_key.h ma_rt_mbr.h \
ma_sp_defs.h ma_fulltext.h ma_ftdefs.h ma_ft_test1.h \
ma_ft_eval.h trnman.h lockman.h tablockman.h \
ma_control_file.h ha_maria.h ma_blockrec.h \
- ma_loghandler.h ma_loghandler_lsn.h ma_pagecache.h
+ ma_loghandler.h ma_loghandler_lsn.h ma_pagecache.h \
+ ma_commit.h
ma_test1_DEPENDENCIES= $(LIBRARIES)
ma_test1_LDADD= @CLIENT_EXTRA_LDFLAGS@ libmaria.a \
$(top_builddir)/storage/myisam/libmyisam.a \
@@ -112,7 +113,8 @@ libmaria_a_SOURCES = ma_init.c ma_open.c ma_extra.c ma_info.c ma_rkey.c \
ha_maria.cc trnman.c lockman.c tablockman.c \
ma_rt_index.c ma_rt_key.c ma_rt_mbr.c ma_rt_split.c \
ma_sp_key.c ma_control_file.c ma_loghandler.c \
- ma_pagecache.c ma_pagecaches.c
+ ma_pagecache.c ma_pagecaches.c \
+ ma_commit.c
CLEANFILES = test?.MA? FT?.MA? isam.log ma_test_all ma_rt_test.MA? sp_test.MA?
SUFFIXES = .sh
diff --git a/storage/maria/ha_maria.cc b/storage/maria/ha_maria.cc
index 288366675a7..e05f97a384d 100644
--- a/storage/maria/ha_maria.cc
+++ b/storage/maria/ha_maria.cc
@@ -30,6 +30,7 @@
#include "maria_def.h"
#include "ma_rt_index.h"
#include "ma_blockrec.h"
+#include "ma_commit.h"
#define MARIA_CANNOT_ROLLBACK HA_NO_TRANSACTIONS
#ifdef MARIA_CANNOT_ROLLBACK
@@ -690,7 +691,8 @@ int ha_maria::open(const char *name, int mode, uint test_if_locked)
info(HA_STATUS_NO_LOCK | HA_STATUS_VARIABLE | HA_STATUS_CONST);
if (!(test_if_locked & HA_OPEN_WAIT_IF_LOCKED))
VOID(maria_extra(file, HA_EXTRA_WAIT_LOCK, 0));
- if (file->s->data_file_type != STATIC_RECORD)
+ save_transactional= file->s->base.transactional;
+ if ((data_file_type= file->s->data_file_type) != STATIC_RECORD)
int_table_flags |= HA_REC_NOT_IN_SEQ;
if (file->s->options & (HA_OPTION_CHECKSUM | HA_OPTION_COMPRESS_RECORD))
int_table_flags |= HA_HAS_CHECKSUM;
@@ -1178,6 +1180,8 @@ int ha_maria::repair(THD *thd, HA_CHECK &param, bool do_optimize)
llstr(rows, llbuff),
llstr(file->state->records, llbuff2));
}
+ if (!error)
+ error= _ma_repair_write_log_record(&param, file);
}
else
{
@@ -1806,7 +1810,6 @@ int ha_maria::info(uint flag)
MY_APPEND_EXT | MY_UNPACK_FILENAME);
if (strcmp(name_buff, maria_info.index_file_name))
index_file_name=maria_info.index_file_name;
- data_file_type= maria_info.data_file_type;
}
if (flag & HA_STATUS_ERRKEY)
{
@@ -1860,7 +1863,7 @@ int ha_maria::external_lock(THD *thd, int lock_type)
{
TRN *trn= THD_TRN;
DBUG_ENTER("ha_maria::external_lock");
- if (!file->s->base.transactional)
+ if (!save_transactional)
goto skip_transaction;
if (!trn && lock_type != F_UNLCK) /* no transaction yet - open it now */
{
@@ -1884,6 +1887,19 @@ int ha_maria::external_lock(THD *thd, int lock_type)
trans_register_ha(thd, FALSE, maria_hton);
trnman_new_statement(trn);
}
+ if (!thd->transaction.on)
+ {
+ /*
+ No need to log REDOs/UNDOs. If this is an internal temporary table
+ which will be renamed to a permanent table (like in ALTER TABLE),
+ the rename happens after unlocking so will be durable (and the table
+ will get its create_rename_lsn).
+ Note: if we wanted to enable users to have an old backup and apply
+ tons of archived logs to roll-forward, we could then not disable
+ REDOs/UNDOs in this case.
+ */
+ file->s->base.transactional= FALSE;
+ }
}
else
{
@@ -1894,7 +1910,8 @@ int ha_maria::external_lock(THD *thd, int lock_type)
{
/* autocommit ? rollback a transaction */
#ifdef MARIA_CANNOT_ROLLBACK
- trnman_commit_trn(trn);
+ if (ma_commit(trn))
+ DBUG_RETURN(1);
THD_TRN= 0;
#else
if (!(thd->options & (OPTION_NOT_AUTOCOMMIT | OPTION_BEGIN)))
@@ -1906,6 +1923,7 @@ int ha_maria::external_lock(THD *thd, int lock_type)
#endif
}
}
+ file->s->base.transactional= save_transactional;
}
skip_transaction:
DBUG_RETURN(maria_lock_database(file, !table->s->tmp_table ?
@@ -1916,7 +1934,7 @@ skip_transaction:
int ha_maria::start_stmt(THD *thd, thr_lock_type lock_type)
{
TRN *trn= THD_TRN;
- if (file->s->base.transactional)
+ if (save_transactional)
{
DBUG_ASSERT(trn); // this may be called only after external_lock()
DBUG_ASSERT(trnman_has_locked_tables(trn));
@@ -2186,8 +2204,7 @@ static int maria_commit(handlerton *hton __attribute__ ((unused)),
DBUG_RETURN(0); // end of statement
DBUG_PRINT("info", ("THD_TRN set to 0x0"));
THD_TRN= 0;
- DBUG_RETURN(trnman_commit_trn(trn) ?
- HA_ERR_OUT_OF_MEM : 0); // end of transaction
+ DBUG_RETURN(ma_commit(trn)); // end of transaction
}
@@ -2212,6 +2229,7 @@ static int maria_rollback(handlerton *hton __attribute__ ((unused)),
static int ha_maria_init(void *p)
{
+ int res;
maria_hton= (handlerton *)p;
maria_hton->state= SHOW_OPTION_YES;
maria_hton->db_type= DB_TYPE_MARIA;
@@ -2223,14 +2241,16 @@ static int ha_maria_init(void *p)
maria_hton->flags= HTON_CAN_RECREATE | HTON_SUPPORT_LOG_TABLES;
bzero(maria_log_pagecache, sizeof(*maria_log_pagecache));
maria_data_root= mysql_real_data_home;
- return (test(maria_init() || ma_control_file_create_or_open() ||
- (init_pagecache(maria_log_pagecache,
- TRANSLOG_PAGECACHE_SIZE, 0, 0,
- TRANSLOG_PAGE_SIZE) == 0) ||
- translog_init(maria_data_root, TRANSLOG_FILE_SIZE,
- MYSQL_VERSION_ID, server_id, maria_log_pagecache,
- TRANSLOG_DEFAULT_FLAGS) ||
- trnman_init()));
+ res= maria_init() || ma_control_file_create_or_open() ||
+ (init_pagecache(maria_log_pagecache,
+ TRANSLOG_PAGECACHE_SIZE, 0, 0,
+ TRANSLOG_PAGE_SIZE) == 0) ||
+ translog_init(maria_data_root, TRANSLOG_FILE_SIZE,
+ MYSQL_VERSION_ID, server_id, maria_log_pagecache,
+ TRANSLOG_DEFAULT_FLAGS) ||
+ trnman_init();
+ maria_multi_threaded= TRUE;
+ return res;
}
diff --git a/storage/maria/ha_maria.h b/storage/maria/ha_maria.h
index dd0a9594ef3..a2f6b190657 100644
--- a/storage/maria/ha_maria.h
+++ b/storage/maria/ha_maria.h
@@ -39,6 +39,11 @@ class ha_maria :public handler
char *data_file_name, *index_file_name;
enum data_file_type data_file_type;
bool can_enable_indexes;
+ /**
+ @brief for temporarily disabling table's transactionality
+ (if THD::transaction::on is false), remember the original value here
+ */
+ bool save_transactional;
int repair(THD * thd, HA_CHECK &param, bool optimize);
public:
diff --git a/storage/maria/ma_blockrec.c b/storage/maria/ma_blockrec.c
index 39769507887..d2512f1e025 100644
--- a/storage/maria/ma_blockrec.c
+++ b/storage/maria/ma_blockrec.c
@@ -171,11 +171,14 @@
started and we can then delete TRANSID and VER_PTR from the row to
gain more space.
- If a row is deleted in Maria, we change TRANSID to current transid and
- change VER_PTR to point to the undo record for the delete. The undo
- record must contain the original TRANSID, so that another transaction
- can use this to check if they should use the found row or go to the
- previous row pointed to by the VER_PTR in the undo row.
+ If a row is deleted in Maria, we change TRANSID to the deleting
+ transaction's id, change VER_PTR to point to the undo record for the delete,
+ and add DELETE_TRANSID (the id of the transaction which last
+ inserted/updated the row before its deletion). DELETE_TRANSID allows an old
+ transaction to avoid reading the log to know if it can see the last version
+ before delete (in other words it reduces the probability of having to follow
+ VER_PTR). TODO: depending on a compilation option, evaluate the performance
+ impact of not storing DELETE_TRANSID (which would make the row smaller).
Description of the different parts:
@@ -391,7 +394,12 @@ my_bool _ma_once_end_block_record(MARIA_SHARE *share)
share->temporary ? FLUSH_IGNORE_CHANGED :
FLUSH_RELEASE))
res= 1;
- if (my_close(share->bitmap.file.file, MYF(MY_WME)))
+ /*
+ File must be synced as it is going out of the maria_open_list and so
+ becoming unknown to Checkpoint.
+ */
+ if (my_sync(share->bitmap.file.file, MYF(MY_WME)) ||
+ my_close(share->bitmap.file.file, MYF(MY_WME)))
res= 1;
/*
Trivial assignment to guard against multiple invocations
@@ -400,6 +408,8 @@ my_bool _ma_once_end_block_record(MARIA_SHARE *share)
*/
share->bitmap.file.file= -1;
}
+ if (share->id != 0)
+ translog_deassign_id_from_share(share);
return res;
}
@@ -573,7 +583,14 @@ void _ma_unpin_all_pages(MARIA_HA *info, LSN undo_lsn)
DBUG_ASSERT(undo_lsn != 0 || !info->s->base.transactional);
if (!info->s->base.transactional)
- undo_lsn= 0; /* Avoid assert in key cache */
+ {
+ /*
+ If this is a transactional table but with transactionality temporarily
+ disabled (like in ALTER TABLE) we need to give a sensible LSN to pages
+ and not 0. If this is not a transactional table it will reduce to 0.
+ */
+ undo_lsn= info->s->state.create_rename_lsn;
+ }
while (pinned_page-- != page_link)
pagecache_unlock_by_link(info->s->pagecache, pinned_page->link,
@@ -1133,7 +1150,6 @@ static my_bool write_tail(MARIA_HA *info,
LSN lsn;
/* Log REDO changes of tail page */
- fileid_store(log_data, info->dfile.file);
page_store(log_data+ FILEID_STORE_SIZE, block->page);
dirpos_store(log_data+ FILEID_STORE_SIZE + PAGE_STORE_SIZE,
row_pos.rownr);
@@ -1143,7 +1159,8 @@ static my_bool write_tail(MARIA_HA *info,
log_array[TRANSLOG_INTERNAL_PARTS + 1].length= length;
if (translog_write_record(&lsn, LOGREC_REDO_INSERT_ROW_TAIL,
info->trn, share, sizeof(log_data) + length,
- TRANSLOG_INTERNAL_PARTS + 2, log_array))
+ TRANSLOG_INTERNAL_PARTS + 2, log_array,
+ log_data))
DBUG_RETURN(1);
}
@@ -1388,7 +1405,6 @@ static my_bool free_full_pages(MARIA_HA *info, MARIA_ROW *row)
size_t extents_length= row->extents_count * ROW_EXTENT_SIZE;
DBUG_ENTER("free_full_pages");
- fileid_store(log_data, info->dfile.file);
pagerange_store(log_data + FILEID_STORE_SIZE,
row->extents_count);
log_array[TRANSLOG_INTERNAL_PARTS + 0].str= (char*) log_data;
@@ -1397,7 +1413,8 @@ static my_bool free_full_pages(MARIA_HA *info, MARIA_ROW *row)
log_array[TRANSLOG_INTERNAL_PARTS + 1].length= extents_length;
if (translog_write_record(&lsn, LOGREC_REDO_PURGE_BLOCKS, info->trn,
info->s, sizeof(log_data) + extents_length,
- TRANSLOG_INTERNAL_PARTS + 2, log_array))
+ TRANSLOG_INTERNAL_PARTS + 2, log_array,
+ log_data))
DBUG_RETURN(1);
DBUG_RETURN (_ma_bitmap_free_full_pages(info, row->extents,
@@ -1431,7 +1448,6 @@ static my_bool free_full_page_range(MARIA_HA *info, ulonglong page, uint count)
{
LSN lsn;
DBUG_ASSERT(info->trn->rec_lsn);
- fileid_store(log_data, info->dfile.file);
pagerange_store(log_data + FILEID_STORE_SIZE, 1);
int5store(log_data + FILEID_STORE_SIZE + PAGERANGE_STORE_SIZE,
page);
@@ -1442,7 +1458,8 @@ static my_bool free_full_page_range(MARIA_HA *info, ulonglong page, uint count)
if (translog_write_record(&lsn, LOGREC_REDO_PURGE_BLOCKS,
info->trn, info->s, sizeof(log_data),
- TRANSLOG_INTERNAL_PARTS + 1, log_array))
+ TRANSLOG_INTERNAL_PARTS + 1, log_array,
+ log_data))
res= 1;
}
@@ -1455,24 +1472,25 @@ static my_bool free_full_page_range(MARIA_HA *info, ulonglong page, uint count)
}
-/*
- Write a record to a (set of) pages
+/**
+ @brief Write a record to a (set of) pages
- SYNOPSIS
- write_block_record()
- info Maria handler
- old_record Orignal record in case of update; NULL in case of insert
- record Record we should write
- row Statistics about record (calculated by calc_record_size())
- map_blocks On which pages the record should be stored
- row_pos Position on head page where to put head part of record
+ @param info Maria handler
+ @param old_record Original record in case of update; NULL in case of
+ insert
+ @param record Record we should write
+ @param row Statistics about record (calculated by
+ calc_record_size())
+ @param map_blocks On which pages the record should be stored
+ @param row_pos Position on head page where to put head part of
+ record
- NOTES
- On return all pinned pages are released.
+ @note
+ On return all pinned pages are released.
- RETURN
- 0 ok
- 1 error
+ @return Operation status
+ @retval 0 OK
+ @retval 1 Error
*/
static my_bool write_block_record(MARIA_HA *info,
@@ -1940,7 +1958,6 @@ static my_bool write_block_record(MARIA_HA *info,
size_t data_length= (size_t) (data - row_pos->data);
/* Log REDO changes of head page */
- fileid_store(log_data, info->dfile.file);
page_store(log_data+ FILEID_STORE_SIZE, head_block->page);
dirpos_store(log_data+ FILEID_STORE_SIZE + PAGE_STORE_SIZE,
row_pos->rownr);
@@ -1950,7 +1967,8 @@ static my_bool write_block_record(MARIA_HA *info,
log_array[TRANSLOG_INTERNAL_PARTS + 1].length= data_length;
if (translog_write_record(&lsn, LOGREC_REDO_INSERT_ROW_HEAD, info->trn,
share, sizeof(log_data) + data_length,
- TRANSLOG_INTERNAL_PARTS + 2, log_array))
+ TRANSLOG_INTERNAL_PARTS + 2, log_array,
+ log_data))
goto disk_err;
}
@@ -2010,7 +2028,6 @@ static my_bool write_block_record(MARIA_HA *info,
NullS))
goto disk_err;
}
- fileid_store(log_data, info->dfile.file);
log_pos= log_data + FILEID_STORE_SIZE;
log_array_pos= log_array+ TRANSLOG_INTERNAL_PARTS+1;
@@ -2068,7 +2085,7 @@ static my_bool write_block_record(MARIA_HA *info,
error= translog_write_record(&lsn, LOGREC_REDO_INSERT_ROW_BLOBS,
info->trn, share, log_entry_length,
(uint) (log_array_pos - log_array),
- log_array);
+ log_array, log_data);
if (log_array != tmp_log_array)
my_free((gptr) log_array, MYF(0));
if (error)
@@ -2084,7 +2101,6 @@ static my_bool write_block_record(MARIA_HA *info,
/* LOGREC_UNDO_ROW_INSERT & LOGREC_UNDO_ROW_INSERT share same header */
lsn_store(log_data, info->trn->undo_lsn);
- fileid_store(log_data + LSN_STORE_SIZE, info->dfile.file);
page_store(log_data+ LSN_STORE_SIZE + FILEID_STORE_SIZE,
head_block->page);
dirpos_store(log_data+ LSN_STORE_SIZE + FILEID_STORE_SIZE +
@@ -2099,7 +2115,8 @@ static my_bool write_block_record(MARIA_HA *info,
/* Write UNDO log record for the INSERT */
if (translog_write_record(&lsn, LOGREC_UNDO_ROW_INSERT,
info->trn, share, sizeof(log_data),
- TRANSLOG_INTERNAL_PARTS + 1, log_array))
+ TRANSLOG_INTERNAL_PARTS + 1, log_array,
+ log_data + LSN_STORE_SIZE))
goto disk_err;
}
else
@@ -2114,7 +2131,7 @@ static my_bool write_block_record(MARIA_HA *info,
if (translog_write_record(&lsn, LOGREC_UNDO_ROW_UPDATE, info->trn,
share, sizeof(log_data) + row_length,
TRANSLOG_INTERNAL_PARTS + 1 + row_parts_count,
- log_array))
+ log_array, log_data + LSN_STORE_SIZE))
goto disk_err;
}
}
@@ -2164,6 +2181,15 @@ crashed:
my_errno= HA_ERR_WRONG_IN_RECORD;
disk_err:
+ /**
+ @todo RECOVERY we are going to let dirty pages go to disk while we have
+ logged UNDO, this violates WAL. If we have not written any full pages,
+ all dirty pages are pinned so we could just delete them from the
+ pagecache. Moreover, we have written some REDOs without a closing UNDO,
+ it's possible that a next operation by this transaction succeeds and then
+ Recovery would glue the "orphan REDOs" to the succeeded operation and
+ execute the failed REDOs.
+ */
/* Unpin all pinned pages to not cause problems for disk cache */
_ma_unpin_all_pages(info, 0);
@@ -2229,20 +2255,18 @@ my_bool _ma_write_block_record(MARIA_HA *info __attribute__ ((unused)),
}
-/*
- Remove row written by _ma_write_block_record
+/**
+ @brief Remove row written by _ma_write_block_record()
- SYNOPSIS
- _ma_abort_write_block_record()
- info Maria handler
+ @param info Maria handler
- INFORMATION
- This is called in case we got a duplicate unique key while
- writing keys.
+ @note
+ This is called in case we got a duplicate unique key while
+ writing keys.
- RETURN
- 0 ok
- 1 error
+ @return Operation status
+ @retval 0 OK
+ @retval 1 Error
*/
my_bool _ma_write_abort_block_record(MARIA_HA *info)
@@ -2288,16 +2312,19 @@ my_bool _ma_write_abort_block_record(MARIA_HA *info)
really undo a failed insert. Note that this UNDO will cause recover
to ignore the LOGREC_UNDO_ROW_INSERT that is the previous entry
in the UNDO chain.
- We will soon change that: we will here execute the UNDO records
- generated while we were trying to write the row; this will log some CLRs
- which will replace this LOGREC_UNDO_PURGE. RECOVERY TODO BUG.
+ */
+ /**
+ @todo RECOVERY BUG
+ We will soon change that: we will here execute the UNDO records
+ generated while we were trying to write the row; this will log some
+ CLRs which will replace this LOGREC_UNDO_PURGE.
*/
lsn_store(log_data, info->trn->undo_lsn);
log_array[TRANSLOG_INTERNAL_PARTS + 0].str= (char*) log_data;
log_array[TRANSLOG_INTERNAL_PARTS + 0].length= sizeof(log_data);
if (translog_write_record(&lsn, LOGREC_UNDO_ROW_PURGE,
- info->trn, info->s, sizeof(log_data),
- TRANSLOG_INTERNAL_PARTS + 1, log_array))
+ info->trn, NULL, sizeof(log_data),
+ TRANSLOG_INTERNAL_PARTS + 1, log_array, NULL))
res= 1;
}
_ma_unpin_all_pages(info, info->trn->undo_lsn);
@@ -2514,7 +2541,6 @@ static my_bool delete_head_or_tail(MARIA_HA *info,
DBUG_ASSERT(share->pagecache->block_size == block_size);
/* Log REDO data */
- fileid_store(log_data, info->dfile.file);
page_store(log_data+ FILEID_STORE_SIZE, page);
dirpos_store(log_data+ FILEID_STORE_SIZE + PAGE_STORE_SIZE,
record_number);
@@ -2524,7 +2550,8 @@ static my_bool delete_head_or_tail(MARIA_HA *info,
if (translog_write_record(&lsn, (head ? LOGREC_REDO_PURGE_ROW_HEAD :
LOGREC_REDO_PURGE_ROW_TAIL),
info->trn, share, sizeof(log_data),
- TRANSLOG_INTERNAL_PARTS + 1, log_array))
+ TRANSLOG_INTERNAL_PARTS + 1, log_array,
+ log_data))
DBUG_RETURN(1);
if (pagecache_write(share->pagecache,
&info->dfile, page, 0,
@@ -2545,7 +2572,6 @@ static my_bool delete_head_or_tail(MARIA_HA *info,
PAGE_STORE_SIZE + PAGERANGE_STORE_SIZE];
LEX_STRING log_array[TRANSLOG_INTERNAL_PARTS + 1];
- fileid_store(log_data, info->dfile.file);
pagerange_store(log_data + FILEID_STORE_SIZE, 1);
page_store(log_data+ FILEID_STORE_SIZE + PAGERANGE_STORE_SIZE, page);
pagerange_store(log_data + FILEID_STORE_SIZE + PAGERANGE_STORE_SIZE +
@@ -2554,7 +2580,8 @@ static my_bool delete_head_or_tail(MARIA_HA *info,
log_array[TRANSLOG_INTERNAL_PARTS + 0].length= sizeof(log_data);
if (translog_write_record(&lsn, LOGREC_REDO_PURGE_BLOCKS,
info->trn, share, sizeof(log_data),
- TRANSLOG_INTERNAL_PARTS + 1, log_array))
+ TRANSLOG_INTERNAL_PARTS + 1, log_array,
+ log_data))
DBUG_RETURN(1);
DBUG_ASSERT(empty_space >= info->s->bitmap.sizes[0]);
}
@@ -2631,7 +2658,6 @@ my_bool _ma_delete_block_record(MARIA_HA *info, const byte *record)
/* Write UNDO record */
lsn_store(log_data, info->trn->undo_lsn);
- fileid_store(log_data+ LSN_STORE_SIZE, info->dfile.file);
page_store(log_data+ LSN_STORE_SIZE + FILEID_STORE_SIZE, page);
dirpos_store(log_data+ LSN_STORE_SIZE + FILEID_STORE_SIZE +
PAGE_STORE_SIZE, record_number);
@@ -2645,7 +2671,7 @@ my_bool _ma_delete_block_record(MARIA_HA *info, const byte *record)
if (translog_write_record(&lsn, LOGREC_UNDO_ROW_DELETE, info->trn,
info->s, sizeof(log_data) + row_length,
TRANSLOG_INTERNAL_PARTS + 1 + row_parts_count,
- info->log_row_parts))
+ info->log_row_parts, log_data + LSN_STORE_SIZE))
goto err;
}
diff --git a/storage/maria/ma_blockrec.h b/storage/maria/ma_blockrec.h
index f45250ff39c..819d1c2e4d2 100644
--- a/storage/maria/ma_blockrec.h
+++ b/storage/maria/ma_blockrec.h
@@ -96,7 +96,7 @@ enum en_page_type { UNALLOCATED_PAGE, HEAD_PAGE, TAIL_PAGE, BLOB_PAGE, MAX_PAGE_
/******* defines that affects allocation (density) of data *******/
/*
- If the tail part (from the main block or a blob) uses more than 75 % of
+ If the tail part (from the main block or a blob) would use more than 75 % of
the size of page, store the tail on a full page instead of a shared
tail page.
*/
diff --git a/storage/maria/ma_check.c b/storage/maria/ma_check.c
index 8f10c98d0ee..0fc2b77304d 100644
--- a/storage/maria/ma_check.c
+++ b/storage/maria/ma_check.c
@@ -53,6 +53,7 @@
#endif
#include "ma_rt_index.h"
#include "ma_blockrec.h"
+#include "trnman_public.h"
/* Functions defined in this file */
@@ -2132,11 +2133,15 @@ err:
/* Replace the actual file with the temporary file */
if (new_file >= 0)
{
+ myf sync_dir= (share->base.transactional && !share->temporary) ?
+ MY_SYNC_DIR : 0;
my_close(new_file,MYF(0));
info->dfile.file= new_file= -1;
if (maria_change_to_newfile(share->data_file_name,MARIA_NAME_DEXT,
- DATA_TMP_EXT, (param->testflag & T_BACKUP_DATA ?
- MYF(MY_REDEL_MAKE_BACKUP): MYF(0))) ||
+ DATA_TMP_EXT,
+ MYF((param->testflag & T_BACKUP_DATA ?
+ MY_REDEL_MAKE_BACKUP : 0) |
+ sync_dir)) ||
_ma_open_datafile(info,share,-1))
got_error=1;
}
@@ -2328,6 +2333,8 @@ int maria_sort_index(HA_CHECK *param, register MARIA_HA *info, my_string name)
int old_lock;
MARIA_SHARE *share=info->s;
MARIA_STATE_INFO old_state;
+ myf sync_dir= (share->base.transactional && !share->temporary) ?
+ MY_SYNC_DIR : 0;
DBUG_ENTER("maria_sort_index");
/* cannot sort index files with R-tree indexes */
@@ -2388,7 +2395,7 @@ int maria_sort_index(HA_CHECK *param, register MARIA_HA *info, my_string name)
share->kfile.file = -1;
VOID(my_close(new_file,MYF(MY_WME)));
if (maria_change_to_newfile(share->index_file_name, MARIA_NAME_IEXT,
- INDEX_TMP_EXT, MYF(0)) ||
+ INDEX_TMP_EXT, sync_dir) ||
_ma_open_keyfile(share))
goto err2;
info->lock_type= F_UNLCK; /* Force maria_readinfo to lock */
@@ -2604,6 +2611,8 @@ int maria_repair_by_sort(HA_CHECK *param, register MARIA_HA *info,
char llbuff[22];
MARIA_SORT_INFO sort_info;
ulonglong key_map=share->state.key_map;
+ myf sync_dir= (share->base.transactional && !share->temporary) ?
+ MY_SYNC_DIR : 0;
DBUG_ENTER("maria_repair_by_sort");
start_records=info->state->records;
@@ -2922,8 +2931,9 @@ err:
info->dfile.file= new_file= -1;
if (maria_change_to_newfile(share->data_file_name,MARIA_NAME_DEXT,
DATA_TMP_EXT,
- (param->testflag & T_BACKUP_DATA ?
- MYF(MY_REDEL_MAKE_BACKUP): MYF(0))) ||
+ MYF((param->testflag & T_BACKUP_DATA ?
+ MY_REDEL_MAKE_BACKUP : 0) |
+ sync_dir)) ||
_ma_open_datafile(info,share,-1))
got_error=1;
}
@@ -3022,6 +3032,8 @@ int maria_repair_parallel(HA_CHECK *param, register MARIA_HA *info,
MARIA_SORT_INFO sort_info;
ulonglong key_map=share->state.key_map;
pthread_attr_t thr_attr;
+ myf sync_dir= (share->base.transactional && !share->temporary) ?
+ MY_SYNC_DIR : 0;
DBUG_ENTER("maria_repair_parallel");
start_records=info->state->records;
@@ -3445,8 +3457,9 @@ err:
info->dfile.file= new_file= -1;
if (maria_change_to_newfile(share->data_file_name,MARIA_NAME_DEXT,
DATA_TMP_EXT,
- (param->testflag & T_BACKUP_DATA ?
- MYF(MY_REDEL_MAKE_BACKUP): MYF(0))) ||
+ MYF((param->testflag & T_BACKUP_DATA ?
+ MY_REDEL_MAKE_BACKUP : 0) |
+ sync_dir)) ||
_ma_open_datafile(info,share,-1))
got_error=1;
}
@@ -5135,3 +5148,64 @@ static void restore_data_file_type(MARIA_SHARE *share)
share->data_file_type= share->state.header.data_file_type=
share->pack.header_length= 0;
}
+
+
+/**
+ @brief Writes a LOGREC_REDO_REPAIR_TABLE record and updates create_rename_lsn
+
+ REPAIR/OPTIMIZE have replaced the data/index file with a new file
+ and so, in this scenario:
+ @verbatim
+ CHECKPOINT - REDO_INSERT - COMMIT - ... - REPAIR - ... - crash
+ @endverbatim
+ we do not want Recovery to apply the REDO_INSERT to the table, as it would
+ then possibly wrongly extend the table. By updating create_rename_lsn at
+ the end of REPAIR, we know that REDO_INSERT will be skipped.
+
+ @param param description of the REPAIR operation
+ @param info table
+
+ @return Operation status
+ @retval 0 ok
+ @retval 1 error (disk problem)
+*/
+
+int _ma_repair_write_log_record(const HA_CHECK *param, MARIA_HA *info)
+{
+ MARIA_SHARE *share= info->s;
+ /* Only called from ha_maria.cc, not maria_check, so translog is inited */
+ if (share->base.transactional && !share->temporary)
+ {
+ /* For now this record is only informative */
+ LEX_STRING log_array[TRANSLOG_INTERNAL_PARTS + 1];
+ uchar log_data[LSN_STORE_SIZE];
+ compile_time_assert(LSN_STORE_SIZE >= (FILEID_STORE_SIZE + 4));
+ log_array[TRANSLOG_INTERNAL_PARTS + 0].str= (char*) log_data;
+ log_array[TRANSLOG_INTERNAL_PARTS + 0].length= FILEID_STORE_SIZE + 4;
+ /*
+ testflag gives an idea of what REPAIR did (in particular T_QUICK
+ or not: did it touch the data file or not?).
+ */
+ int4store(log_data + FILEID_STORE_SIZE, param->testflag);
+ if (unlikely(translog_write_record(&share->state.create_rename_lsn,
+ LOGREC_REDO_REPAIR_TABLE,
+ &dummy_transaction_object, share,
+ log_array[TRANSLOG_INTERNAL_PARTS +
+ 0].length,
+ sizeof(log_array)/sizeof(log_array[0]),
+ log_array, log_data)))
+ return 1;
+ /*
+ But this piece is really needed, to have the new table's content durable
+ and to not apply old REDOs to the new table. The table's existence was
+ made durable earlier (MY_SYNC_DIR passed to maria_change_to_newfile()).
+ */
+ lsn_store(log_data, share->state.create_rename_lsn);
+ DBUG_ASSERT(info->dfile.file >= 0);
+ DBUG_ASSERT(share->kfile.file >= 0);
+ return (my_pwrite(share->kfile.file, log_data, sizeof(log_data),
+ sizeof(share->state.header) + 2, MYF(MY_NABP)) ||
+ _ma_sync_table_files(info));
+ }
+ return 0;
+}
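
The function comment above relies on Recovery comparing each REDO record's LSN with the table's create_rename_lsn. The Recovery code is not part of this diff, so the following is only a minimal sketch of that comparison, assuming LSNs are totally ordered integers; `lsn_t` and `redo_applies()` are hypothetical names, not Maria's API.

```c
#include <stdint.h>

/* Hypothetical stand-in for Maria's LSN type: a position in the log that
   only grows as records are appended. */
typedef uint64_t lsn_t;

/*
  Returns 1 if a REDO record should be applied to the table, 0 if it should
  be skipped. Assumption: a record logged at or before the table's
  create_rename_lsn describes the file as it was before the last
  CREATE/RENAME/REPAIR/DELETE-all, so it must not be replayed.
*/
int redo_applies(lsn_t record_lsn, lsn_t create_rename_lsn)
{
  return record_lsn > create_rename_lsn;
}
```

In the scenario from the comment, the REDO_INSERT has a smaller LSN than the one stored at the end of REPAIR, so `redo_applies()` returns 0 for it, while a record logged after REPAIR would still be applied.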
diff --git a/storage/maria/ma_close.c b/storage/maria/ma_close.c
index dc60ce8aa83..34c1bfb4d6d 100644
--- a/storage/maria/ma_close.c
+++ b/storage/maria/ma_close.c
@@ -57,14 +57,6 @@ int maria_close(register MARIA_HA *info)
info->opt_flag&= ~(READ_CACHE_USED | WRITE_CACHE_USED);
}
flag= !--share->reopen;
- /*
- RECOVERY TODO:
- If "flag" is TRUE, in the line below we are going to make the table
- unknown to future checkpoints, so it needs to have fsync'ed itself
- entirely (bitmap, pages, etc) at this point.
- The flushing is currently done a few lines further (which is ok, as we
- still hold THR_LOCK_maria), but syncing is missing.
- */
maria_open_list=list_delete(maria_open_list,&info->open_list);
pthread_mutex_unlock(&share->intern_lock);
@@ -82,7 +74,12 @@ int maria_close(register MARIA_HA *info)
FLUSH_IGNORE_CHANGED :
FLUSH_RELEASE)))
error= my_errno;
-
+ /*
+ File must be synced as it is going out of the maria_open_list and so
+ becoming unknown to Checkpoint.
+ */
+ if (my_sync(share->kfile.file, MYF(MY_WME)))
+ error= my_errno;
/*
If we are crashed, we can safely flush the current state as it will
not change the crashed state.
diff --git a/storage/maria/ma_commit.c b/storage/maria/ma_commit.c
new file mode 100644
index 00000000000..88aaee0509f
--- /dev/null
+++ b/storage/maria/ma_commit.c
@@ -0,0 +1,71 @@
+/* Copyright (C) 2007 MySQL AB
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; version 2 of the License.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */
+
+#include "maria_def.h"
+#include "trnman.h"
+
+/**
+ @brief writes a COMMIT record to log and commits transaction in memory
+
+ @param trn transaction
+
+ @return Operation status
+ @retval 0 ok
+ @retval 1 error (disk error or out of memory)
+*/
+
+int ma_commit(TRN *trn)
+{
+ if (trn->undo_lsn == 0) /* no work done, rollback (cheaper than commit) */
+ return trnman_rollback_trn(trn);
+ /*
+ - if COMMIT record is written before trnman_commit_trn():
+ if Checkpoint comes in the middle it will see trn is not committed,
+ then if crash, Recovery might roll back trn (if min(rec_lsn) is after
+ COMMIT record) and this is not an issue as
+ * transaction's updates were not made visible to other transactions
+ * "commit ok" was not sent to client
+ Alternatively, Recovery might commit trn (if min(rec_lsn) is before COMMIT
+ record), which is ok too. All in all it means that "trn committed" is not
+ 100% equal to "COMMIT record written".
+ - if COMMIT record is written after trnman_commit_trn():
+ if crash happens between the two, trn will be rolled back which is an
+ issue (transaction's updates were made visible to other transactions).
+ So we need to go the first way.
+ */
+ /**
+ @todo RECOVERY share's state is written to disk only in
+ maria_lock_database(), so COMMIT record is not the last record of the
+ transaction! It is probably an issue. Recovery of the state is a problem
+ not yet solved.
+ */
+ LSN commit_lsn;
+ LEX_STRING log_array[TRANSLOG_INTERNAL_PARTS];
+ /*
+ We do not store "thd->transaction.xid_state.xid" for now, it will be
+ needed only when we support XA.
+ */
+ return
+ translog_write_record(&commit_lsn, LOGREC_COMMIT,
+ trn, NULL, 0,
+ sizeof(log_array)/sizeof(log_array[0]),
+ log_array, NULL) ||
+ translog_flush(commit_lsn) || trnman_commit_trn(trn);
+ /*
+ Note: if trnman_commit_trn() fails above, we have already
+ written the COMMIT record, so Checkpoint and Recovery will see the
+ transaction as committed.
+ */
+}
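
The ordering argued for in the comment of ma_commit() (write the COMMIT record, flush the log, only then commit in memory) depends on the short-circuit evaluation of `||`. Below is a small stand-alone sketch of that pattern; the three step functions and `struct commit_trace` are hypothetical stand-ins for translog_write_record(), translog_flush() and trnman_commit_trn(), and only record the order in which they ran.

```c
#include <string.h>

/* Hypothetical trace of the steps performed, for illustration only. */
struct commit_trace
{
  char order[8]; /* one letter per executed step, NUL-terminated */
  int fail_at;   /* 1-based number of the step that should fail, 0 = none */
  int n;         /* number of steps executed so far */
};

/* Records the step and returns non-zero if this step is the failing one. */
static int step(struct commit_trace *t, char tag)
{
  t->order[t->n++]= tag;
  t->order[t->n]= '\0';
  return t->fail_at == t->n;
}

int write_commit_record(struct commit_trace *t) { return step(t, 'W'); }
int flush_log(struct commit_trace *t)           { return step(t, 'F'); }
int commit_in_memory(struct commit_trace *t)    { return step(t, 'C'); }

/*
  Same shape as the return statement of ma_commit(): each step runs only if
  all previous ones returned 0, and the result is 0 (ok) or 1 (error).
*/
int commit_sketch(struct commit_trace *t)
{
  return write_commit_record(t) || flush_log(t) || commit_in_memory(t);
}
```

If the flush step fails, the in-memory commit is never attempted, so the transaction's updates are not made visible without a durable COMMIT record.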
diff --git a/storage/maria/ma_commit.h b/storage/maria/ma_commit.h
new file mode 100644
index 00000000000..2c57c73fd7a
--- /dev/null
+++ b/storage/maria/ma_commit.h
@@ -0,0 +1,18 @@
+/* Copyright (C) 2007 MySQL AB
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; version 2 of the License.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */
+
+C_MODE_START
+int ma_commit(TRN *trn);
+C_MODE_END
diff --git a/storage/maria/ma_control_file.c b/storage/maria/ma_control_file.c
index f53da8a5881..db5440dc873 100644
--- a/storage/maria/ma_control_file.c
+++ b/storage/maria/ma_control_file.c
@@ -50,6 +50,13 @@
LSN last_checkpoint_lsn;
uint32 last_logno;
+/**
+ @brief If log's lock should be asserted when writing to control file.
+
+ Can be re-used by any function which needs to be thread-safe except when
+ it is called at startup.
+*/
+my_bool maria_multi_threaded= FALSE;
/*
Control file is less than 512 bytes (a disk sector),
@@ -203,6 +210,8 @@ err:
the last_checkpoint_lsn and last_logno global variables.
Called when we have created a new log (after syncing this log's creation)
and when we have written a checkpoint (after syncing this log record).
+ Variables last_checkpoint_lsn and last_logno must be protected by caller
+ using log's lock, unless this function is called at startup.
SYNOPSIS
ma_control_file_write_and_force()
@@ -233,12 +242,14 @@ int ma_control_file_write_and_force(const LSN checkpoint_lsn, uint32 logno,
DBUG_ENTER("ma_control_file_write_and_force");
DBUG_ASSERT(control_file_fd >= 0); /* must be open */
+#ifndef DBUG_OFF
+ if (maria_multi_threaded)
+ translog_lock_assert_owner();
+#endif
memcpy(buffer + CONTROL_FILE_MAGIC_STRING_OFFSET,
CONTROL_FILE_MAGIC_STRING, CONTROL_FILE_MAGIC_STRING_SIZE);
- /* TODO: you need some protection to be able to read last_* global vars */
-
if (objs_to_write == CONTROL_FILE_UPDATE_ONLY_LSN)
update_checkpoint_lsn= TRUE;
else if (objs_to_write == CONTROL_FILE_UPDATE_ONLY_LOGNO)
@@ -270,7 +281,6 @@ int ma_control_file_write_and_force(const LSN checkpoint_lsn, uint32 logno,
my_sync(control_file_fd, MYF(MY_WME)))
DBUG_RETURN(1);
- /* TODO: you need some protection to be able to write last_* global vars */
if (update_checkpoint_lsn)
last_checkpoint_lsn= checkpoint_lsn;
if (update_logno)
diff --git a/storage/maria/ma_control_file.h b/storage/maria/ma_control_file.h
index 4728d719b2f..c974838684b 100644
--- a/storage/maria/ma_control_file.h
+++ b/storage/maria/ma_control_file.h
@@ -43,6 +43,8 @@ extern LSN last_checkpoint_lsn;
*/
extern uint32 last_logno;
+extern my_bool maria_multi_threaded;
+
typedef enum enum_control_file_error {
CONTROL_FILE_OK= 0,
CONTROL_FILE_TOO_SMALL,
diff --git a/storage/maria/ma_create.c b/storage/maria/ma_create.c
index d8660dd41cb..53e15deb74b 100644
--- a/storage/maria/ma_create.c
+++ b/storage/maria/ma_create.c
@@ -19,6 +19,7 @@
#include "ma_sp_defs.h"
#include <my_bit.h>
#include "ma_blockrec.h"
+#include "trnman_public.h"
#if defined(MSDOS) || defined(__WIN__)
#ifdef __WIN__
@@ -51,7 +52,8 @@ int maria_create(const char *name, enum data_file_type datafile_type,
unique_key_parts,fulltext_keys,offset, not_block_record_extra_length;
uint max_field_lengths, extra_header_size;
ulong reclength, real_reclength,min_pack_length;
- char filename[FN_REFLEN],linkname[FN_REFLEN], *linkname_ptr;
+ char filename[FN_REFLEN], dlinkname[FN_REFLEN], *dlinkname_ptr= NULL,
+ klinkname[FN_REFLEN], *klinkname_ptr= NULL;
ulong pack_reclength;
ulonglong tot_length,max_rows, tmp;
enum en_fieldtype type;
@@ -62,11 +64,12 @@ int maria_create(const char *name, enum data_file_type datafile_type,
HA_KEYSEG *keyseg,tmp_keyseg;
MARIA_COLUMNDEF *column, *end_column;
ulong *rec_per_key_part;
- my_off_t key_root[HA_MAX_POSSIBLE_KEY];
+ my_off_t key_root[HA_MAX_POSSIBLE_KEY], kfile_size_before_extension;
MARIA_CREATE_INFO tmp_create_info;
my_bool tmp_table= FALSE; /* cache for presence of HA_OPTION_TMP_TABLE */
my_bool forced_packed;
- myf sync_dir= MY_SYNC_DIR;
+ myf sync_dir= 0;
+ uchar *log_data= NULL;
DBUG_ENTER("maria_create");
DBUG_PRINT("enter", ("keys: %u columns: %u uniques: %u flags: %u",
keys, columns, uniques, flags));
@@ -250,8 +253,9 @@ int maria_create(const char *name, enum data_file_type datafile_type,
if (flags & HA_CREATE_TMP_TABLE)
{
options|= HA_OPTION_TMP_TABLE;
+ tmp_table= TRUE;
create_mode|= O_EXCL | O_NOFOLLOW;
- /* temp tables are not crash-safe (dropped at restart) */
+ /* "CREATE TEMPORARY" tables are not crash-safe (dropped at restart) */
ci->transactional= FALSE;
}
share.base.null_bytes= ci->null_bytes;
@@ -624,6 +628,7 @@ int maria_create(const char *name, enum data_file_type datafile_type,
share.state.dellink = HA_OFFSET_ERROR;
share.state.first_bitmap_with_space= 0;
+ share.state.create_rename_lsn= 0;
share.state.process= (ulong) getpid();
share.state.unique= (ulong) 0;
share.state.update_count=(ulong) 0;
@@ -671,11 +676,15 @@ int maria_create(const char *name, enum data_file_type datafile_type,
#endif
/* max_data_file_length and max_key_file_length are recalculated on open */
- if (options & HA_OPTION_TMP_TABLE)
- {
- tmp_table= TRUE;
- sync_dir= 0;
+ if (tmp_table)
share.base.max_data_file_length= (my_off_t) ci->data_file_length;
+ else if (ci->transactional && translog_inited)
+ {
+ /*
+ we have checked translog_inited above, because maria_chk may call us
+ (via maria_recreate_table()) and it does not have a log.
+ */
+ sync_dir= MY_SYNC_DIR;
}
if (datafile_type == BLOCK_RECORD)
@@ -712,9 +721,9 @@ int maria_create(const char *name, enum data_file_type datafile_type,
MY_UNPACK_FILENAME | (have_iext ? MY_REPLACE_EXT :
MY_APPEND_EXT));
}
- fn_format(linkname, name, "", MARIA_NAME_IEXT,
+ fn_format(klinkname, name, "", MARIA_NAME_IEXT,
MY_UNPACK_FILENAME|MY_APPEND_EXT);
- linkname_ptr=linkname;
+ klinkname_ptr= klinkname;
/*
Don't create the table if the link or file exists to ensure that one
doesn't accidentally destroy another table.
@@ -730,7 +739,6 @@ int maria_create(const char *name, enum data_file_type datafile_type,
(MY_UNPACK_FILENAME |
(flags & HA_DONT_TOUCH_DATA) ? MY_RETURN_REAL_PATH : 0) |
MY_APPEND_EXT);
- linkname_ptr=0;
/*
Replace the current file.
Don't sync dir now if the data file has the same path.
@@ -753,7 +761,7 @@ int maria_create(const char *name, enum data_file_type datafile_type,
goto err;
}
- if ((file= my_create_with_symlink(linkname_ptr, filename, 0, create_mode,
+ if ((file= my_create_with_symlink(klinkname_ptr, filename, 0, create_mode,
MYF(MY_WME|create_flag))) < 0)
goto err;
errpos=1;
@@ -780,24 +788,24 @@ int maria_create(const char *name, enum data_file_type datafile_type,
MY_UNPACK_FILENAME |
(have_dext ? MY_REPLACE_EXT : MY_APPEND_EXT));
}
- fn_format(linkname, name, "",MARIA_NAME_DEXT,
+ fn_format(dlinkname, name, "",MARIA_NAME_DEXT,
MY_UNPACK_FILENAME | MY_APPEND_EXT);
- linkname_ptr=linkname;
+ dlinkname_ptr= dlinkname;
create_flag=0;
}
else
{
fn_format(filename,name,"", MARIA_NAME_DEXT,
MY_UNPACK_FILENAME | MY_APPEND_EXT);
- linkname_ptr=0;
create_flag=MY_DELETE_OLD;
}
if ((dfile=
- my_create_with_symlink(linkname_ptr, filename, 0, create_mode,
+ my_create_with_symlink(dlinkname_ptr, filename, 0, create_mode,
MYF(MY_WME | create_flag | sync_dir))) < 0)
goto err;
errpos=3;
+ share.data_file_type= datafile_type;
if (_ma_initialize_data_file(dfile, &share))
goto err;
}
@@ -925,14 +933,82 @@ int maria_create(const char *name, enum data_file_type datafile_type,
goto err;
}
+ if ((kfile_size_before_extension= my_tell(file,MYF(0))) == MY_FILEPOS_ERROR)
+ goto err;
#ifndef DBUG_OFF
- if ((uint) my_tell(file,MYF(0)) != info_length)
+ if (kfile_size_before_extension != info_length)
+ DBUG_PRINT("warning",("info_length: %u != used_length: %u",
+ info_length, (uint)kfile_size_before_extension));
+#endif
+
+ if (sync_dir)
{
- uint pos= (uint) my_tell(file,MYF(0));
- DBUG_PRINT("warning",("info_length: %d != used_length: %d",
- info_length, pos));
+ /*
+ we log the first bytes and then the size to which we extend; this is
+ to not log 1 KB of mostly zeroes if this is a small table.
+ */
+ char empty_string[]= "";
+ LEX_STRING log_array[TRANSLOG_INTERNAL_PARTS + 3];
+ uint total_rec_length= 0;
+ uint i;
+ log_array[TRANSLOG_INTERNAL_PARTS + 0].length= 1 + 2 +
+ kfile_size_before_extension;
+ /* we are needing maybe 64 kB, so don't use the stack */
+ log_data= my_malloc(log_array[TRANSLOG_INTERNAL_PARTS + 0].length, MYF(0));
+ if ((log_data == NULL) ||
+ my_pread(file, 1 + 2 + log_data, kfile_size_before_extension,
+ 0, MYF(MY_NABP)))
+ goto err_no_lock;
+ /*
+ remember if the data file was created or not, to know if Recovery can
+ do it or not, in the future
+ */
+ log_data[0]= test(flags & HA_DONT_TOUCH_DATA);
+ int2store(log_data + 1, kfile_size_before_extension);
+ log_array[TRANSLOG_INTERNAL_PARTS + 0].str= log_data;
+ /* symlink description is also needed for re-creation by Recovery: */
+ log_array[TRANSLOG_INTERNAL_PARTS + 1].str=
+ dlinkname_ptr ? dlinkname : empty_string;
+ log_array[TRANSLOG_INTERNAL_PARTS + 1].length=
+ strlen(log_array[TRANSLOG_INTERNAL_PARTS + 1].str);
+ log_array[TRANSLOG_INTERNAL_PARTS + 2].str=
+ klinkname_ptr ? klinkname : empty_string;
+ log_array[TRANSLOG_INTERNAL_PARTS + 2].length=
+ strlen(log_array[TRANSLOG_INTERNAL_PARTS + 2].str);
+ for (i= TRANSLOG_INTERNAL_PARTS;
+ i < (sizeof(log_array)/sizeof(log_array[0])); i++)
+ total_rec_length+= log_array[i].length;
+ /*
+ For this record to be of any use for Recovery, we need the upper
+ MySQL layer to be crash-safe, which it is not now (that would require
+ work using the ddl_log of sql/sql_table.cc); when it is, we should
+ reconsider the moment of writing this log record (before or after op,
+ under THR_LOCK_maria or not...), how to use it in Recovery, and force
+ the log. For now this record is just informative.
+ Note that in case of TRUNCATE TABLE we also come here.
+ When in CREATE/TRUNCATE (or DROP or RENAME or REPAIR) we have not called
+ external_lock(), so have no TRN. It does not matter, as all these
+ operations are non-transactional and sync their files.
+ */
+ if (unlikely(translog_write_record(&share.state.create_rename_lsn,
+ LOGREC_REDO_CREATE_TABLE,
+ &dummy_transaction_object, NULL,
+ total_rec_length,
+ sizeof(log_array)/sizeof(log_array[0]),
+ log_array, NULL)))
+ goto err_no_lock;
+ /*
+ store LSN into file, needed for Recovery to not be confused if a
+ DROP+CREATE happened (applying REDOs to the wrong table).
+ If such direct my_pwrite() to a fixed offset is too "hackish", I can
+ call ma_state_info_write() again but it will be less efficient.
+ */
+ lsn_store(log_data, share.state.create_rename_lsn);
+ if (my_pwrite(file, log_data, LSN_STORE_SIZE,
+ sizeof(share.state.header) + 2, MYF(MY_NABP)))
+ goto err_no_lock;
+ my_free(log_data, MYF(0));
}
-#endif
/* Enlarge files */
DBUG_PRINT("info", ("enlarge to keystart: %lu",
@@ -940,38 +1016,25 @@ int maria_create(const char *name, enum data_file_type datafile_type,
if (my_chsize(file,(ulong) share.base.keystart,0,MYF(0)))
goto err;
+ if (sync_dir && my_sync(file, MYF(0)))
+ goto err;
+
if (! (flags & HA_DONT_TOUCH_DATA))
{
#ifdef USE_RELOC
if (my_chsize(dfile,share.base.min_pack_length*ci->reloc_rows,0,MYF(0)))
goto err;
- if (!tmp_table && my_sync(file, MYF(0)))
- goto err;
#endif
- /* if !USE_RELOC, there was no write to the file, no need to sync it */
errpos=2;
- if (my_close(dfile,MYF(0)))
+ if ((sync_dir && my_sync(dfile, MYF(0))) || my_close(dfile,MYF(0)))
goto err;
}
- errpos=0;
pthread_mutex_unlock(&THR_LOCK_maria);
res= 0;
+ my_free((char*) rec_per_key_part,MYF(0));
+ errpos=0;
if (my_close(file,MYF(0)))
res= my_errno;
- /*
- RECOVERY TODO
- Write a log record describing the CREATE operation (just the file
- names, link names, and the full header's content).
- For this record to be of any use for Recovery, we need the upper
- MySQL layer to be crash-safe, which it is not now (that would require work
- using the ddl_log of sql/sql_table.cc); when is is, we should reconsider
- the moment of writing this log record (before or after op, under
- THR_LOCK_maria or not...), how to use it in Recovery, and force the log.
- For now this record is just informative.
- If operation failed earlier, we clean up in "err:" and the MySQL layer
- will clean up the frm, so we needn't write anything to the log.
- */
- my_free((char*) rec_per_key_part,MYF(0));
DBUG_RETURN(res);
err:
@@ -996,6 +1059,7 @@ err_no_lock:
MY_UNPACK_FILENAME | MY_APPEND_EXT),
sync_dir);
}
+ my_free(log_data, MYF(MY_ALLOW_ZERO_PTR));
my_free((char*) rec_per_key_part, MYF(0));
DBUG_RETURN(my_errno=save_errno); /* return the fatal errno */
}
@@ -1086,9 +1150,9 @@ int _ma_initialize_data_file(File dfile, MARIA_SHARE *share)
{
if (share->data_file_type == BLOCK_RECORD)
{
- if (my_chsize(dfile, maria_block_size, 0, MYF(MY_WME)))
+ if (my_chsize(dfile, share->base.block_size, 0, MYF(MY_WME)))
return 1;
- share->state.state.data_file_length= maria_block_size;
+ share->state.state.data_file_length= share->base.block_size;
_ma_bitmap_delete_all(share);
}
return 0;
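
The first part of the LOGREC_REDO_CREATE_TABLE record built in maria_create() is laid out as one flag byte, a 2-byte length, and then the first bytes of the index file. The sketch below packs such a buffer outside of Maria; `build_create_record()` is a hypothetical helper, and it assumes the 2-byte length is stored low byte first, as int2store() does.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
  Builds "flag (1 byte) + length (2 bytes, low byte first) + header bytes".
  Returns a malloc()ed buffer that the caller must free(), or NULL if the
  header does not fit in 2 bytes or memory is exhausted.
  Hypothetical helper, not part of Maria.
*/
unsigned char *build_create_record(int data_file_untouched,
                                   const unsigned char *kfile_head,
                                   size_t head_len, size_t *out_len)
{
  unsigned char *buf;
  if (head_len > 0xFFFF)                   /* length must fit in 2 bytes */
    return NULL;
  if ((buf= malloc(1 + 2 + head_len)) == NULL)
    return NULL;
  buf[0]= data_file_untouched ? 1 : 0;     /* was the data file left alone? */
  buf[1]= (unsigned char) (head_len & 0xFF);
  buf[2]= (unsigned char) (head_len >> 8);
  memcpy(buf + 3, kfile_head, head_len);   /* copy of the index file start */
  *out_len= 3 + head_len;
  return buf;
}
```

Logging only the written part of the header plus its length keeps the record small for small tables, which is the point made in the comment of the hunk above.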
diff --git a/storage/maria/ma_delete_all.c b/storage/maria/ma_delete_all.c
index 2d85b347662..7286f540aa1 100644
--- a/storage/maria/ma_delete_all.c
+++ b/storage/maria/ma_delete_all.c
@@ -17,21 +17,38 @@
/* This clears the status information and truncates files */
#include "maria_def.h"
+#include "trnman_public.h"
+
+/**
+ @brief deletes all rows from a table
+
+ @param info Maria handler
+
+ @return Operation status
+ @retval 0 ok
+ @retval 1 error
+*/
int maria_delete_all_rows(MARIA_HA *info)
{
uint i;
MARIA_SHARE *share=info->s;
MARIA_STATE_INFO *state=&share->state;
+ my_bool log_record;
DBUG_ENTER("maria_delete_all_rows");
if (share->options & HA_OPTION_READ_ONLY_DATA)
{
DBUG_RETURN(my_errno=EACCES);
}
- /* LOCK TODO take X-lock on table here */
+ /**
+ @todo LOCK take X-lock on table here.
+ When we have versioning, if some other thread is looking at this table,
+ we cannot shrink the file like this.
+ */
if (_ma_readinfo(info,F_WRLCK,1))
DBUG_RETURN(my_errno);
+ log_record= share->base.transactional && !share->temporary;
if (_ma_mark_file_changed(info))
goto err;
@@ -54,27 +71,13 @@ int maria_delete_all_rows(MARIA_HA *info)
*/
flush_pagecache_blocks(share->pagecache, &share->kfile,
FLUSH_IGNORE_CHANGED);
- /*
- RECOVERY TODO Log the two chsize and header modifications and force the
- log. So that if crash between the two chsize, we finish the work at
- Recovery. For this scenario:
- "TRUNCATE TABLE t1; DROP TABLE t1; RENAME TABLE t2 to t1; crash;"
- Recovery mustn't truncate the new t1, so the log records of TRUNCATE
- should be applied only if t1 exists and its ZeroDirtyPagesLSN is smaller
- than the records'. See more comments below.
- */
if (my_chsize(info->dfile.file, 0, 0, MYF(MY_WME)) ||
my_chsize(share->kfile.file, share->base.keystart, 0, MYF(MY_WME)) )
goto err;
- if (_ma_initialize_data_file(info->dfile.file, info->s))
+ if (_ma_initialize_data_file(info->dfile.file, share))
goto err;
- /*
- RECOVERY TODO Consider updating ZeroDirtyPagesLSN here. It is
- not a necessity (it is one only in RENAME commands) but an optional
- optimization which will allow some REDO skipping at Recovery.
- */
VOID(_ma_writeinfo(info,WRITEINFO_UPDATE_KEYFILE));
#ifdef HAVE_MMAP
/* Resize mmaped area */
@@ -82,24 +85,48 @@ int maria_delete_all_rows(MARIA_HA *info)
_ma_remap_file(info, (my_off_t)0);
rw_unlock(&info->s->mmap_lock);
#endif
- /*
- RECOVERY TODO Until we have the TRUNCATE log record and take it into
- account for log-low-water-mark calculation and use it in Recovery, we need
- to sync.
- */
- if (_ma_sync_table_files(info))
- goto err;
+ if (log_record)
+ {
+ /* For now this record is only informative */
+ LEX_STRING log_array[TRANSLOG_INTERNAL_PARTS + 1];
+ uchar log_data[LSN_STORE_SIZE];
+ log_array[TRANSLOG_INTERNAL_PARTS + 0].str= (char*) log_data;
+ log_array[TRANSLOG_INTERNAL_PARTS + 0].length= FILEID_STORE_SIZE;
+ if (unlikely(translog_write_record(&share->state.create_rename_lsn,
+ LOGREC_REDO_DELETE_ALL,
+ info->trn, share, 0,
+ sizeof(log_array)/sizeof(log_array[0]),
+ log_array, log_data)))
+ goto err;
+ /*
+ store LSN into file. It is an optimization so that all old REDOs for
+ this table are ignored (scenario: checkpoint, INSERT1s, DELETE ALL;
+ INSERT2s, crash: then Recovery can skip INSERT1s). It also allows us to
+ ignore the present record at Recovery.
+ Note that storing the LSN could not be done by _ma_writeinfo() above as
+ the table is locked at this moment. So we need to do it by ourselves.
+ */
+ lsn_store(log_data, share->state.create_rename_lsn);
+ if (my_pwrite(share->kfile.file, log_data, sizeof(log_data),
+ sizeof(share->state.header) + 2, MYF(MY_NABP)) ||
+ _ma_sync_table_files(info))
+ goto err;
+ /**
+ @todo RECOVERY Until we take into account the log record above
+ for log-low-water-mark calculation and use it in Recovery, we need
+ to sync above.
+ */
+ }
allow_break(); /* Allow SIGHUP & SIGINT */
DBUG_RETURN(0);
err:
{
int save_errno=my_errno;
- /* RECOVERY TODO log the header modifications */
VOID(_ma_writeinfo(info,WRITEINFO_UPDATE_KEYFILE));
info->update|=HA_STATE_WRITTEN; /* Buffer changed */
- /* RECOVERY TODO until we log above we have to sync */
- if (_ma_sync_table_files(info) && !save_errno)
+ /** @todo RECOVERY until we use the log record above we have to sync */
+ if (log_record && _ma_sync_table_files(info) && !save_errno)
save_errno= my_errno;
allow_break(); /* Allow SIGHUP & SIGINT */
DBUG_RETURN(my_errno=save_errno);
diff --git a/storage/maria/ma_delete_table.c b/storage/maria/ma_delete_table.c
index aafe7a1dee9..990714043bf 100644
--- a/storage/maria/ma_delete_table.c
+++ b/storage/maria/ma_delete_table.c
@@ -13,11 +13,18 @@
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */
-/*
- deletes a table
-*/
-
#include "ma_fulltext.h"
+#include "trnman_public.h"
+
+/**
+ @brief drops (deletes) a table
+
+ @param name table's name
+
+ @return Operation status
+ @retval 0 ok
+ @retval 1 error
+*/
int maria_delete_table(const char *name)
{
@@ -25,56 +32,78 @@ int maria_delete_table(const char *name)
#ifdef USE_RAID
uint raid_type=0,raid_chunks=0;
#endif
+ MARIA_HA *info;
+ myf sync_dir;
DBUG_ENTER("maria_delete_table");
#ifdef EXTRA_DEBUG
_ma_check_table_is_closed(name,"delete");
#endif
- /* LOCK TODO take X-lock on table here */
+ /** @todo LOCK take X-lock on table */
+ /*
+ We need to know if this table is transactional.
+ When built with RAID support, we also need to determine if this table
+ makes use of the raid feature. If yes, we need to remove all raid
+ chunks. This is done with my_raid_delete(). Unfortunately it is
+ necessary to open the table just to check this. We use
+ 'open_for_repair' to be able to open even a crashed table. If even
+ this open fails, we assume no raid configuration for this table
+ and try to remove the normal data file only. This may however
+ leave the raid chunks behind.
+ */
+ if (!(info= maria_open(name, O_RDONLY, HA_OPEN_FOR_REPAIR)))
+ {
#ifdef USE_RAID
+ raid_type= 0;
+#endif
+ sync_dir= 0;
+ }
+ else
{
- MARIA_HA *info;
- /*
- When built with RAID support, we need to determine if this table
- makes use of the raid feature. If yes, we need to remove all raid
- chunks. This is done with my_raid_delete(). Unfortunately it is
- necessary to open the table just to check this. We use
- 'open_for_repair' to be able to open even a crashed table. If even
- this open fails, we assume no raid configuration for this table
- and try to remove the normal data file only. This may however
- leave the raid chunks behind.
- */
- if (!(info= maria_open(name, O_RDONLY, HA_OPEN_FOR_REPAIR)))
- raid_type= 0;
- else
- {
- raid_type= info->s->base.raid_type;
- raid_chunks= info->s->base.raid_chunks;
- maria_close(info);
- }
+#ifdef USE_RAID
+ raid_type= info->s->base.raid_type;
+ raid_chunks= info->s->base.raid_chunks;
+#endif
+ sync_dir= (info->s->base.transactional && !info->s->temporary) ?
+ MY_SYNC_DIR : 0;
+ maria_close(info);
}
+#ifdef USE_RAID
#ifdef EXTRA_DEBUG
_ma_check_table_is_closed(name,"delete");
#endif
#endif /* USE_RAID */
+ if (sync_dir)
+ {
+ /*
+ For this log record to be of any use for Recovery, we need the upper
+ MySQL layer to be crash-safe in DDLs; when it is we should reconsider
+ the moment of writing this log record, how to use it in Recovery, and
+ force the log. For now this record is only informative.
+ */
+ LSN lsn;
+ LEX_STRING log_array[TRANSLOG_INTERNAL_PARTS + 1];
+ log_array[TRANSLOG_INTERNAL_PARTS + 0].str= (char *)name;
+ log_array[TRANSLOG_INTERNAL_PARTS + 0].length= strlen(name);
+ if (unlikely(translog_write_record(&lsn, LOGREC_REDO_DROP_TABLE,
+ &dummy_transaction_object, NULL,
+ log_array[TRANSLOG_INTERNAL_PARTS +
+ 0].length,
+ sizeof(log_array)/sizeof(log_array[0]),
+ log_array, NULL)))
+ DBUG_RETURN(1);
+ }
+
fn_format(from,name,"",MARIA_NAME_IEXT,MY_UNPACK_FILENAME|MY_APPEND_EXT);
- /*
- RECOVERY TODO log the two deletes below.
- Then do the file deletions.
- For this log record to be of any use for Recovery, we need the upper MySQL
- layer to be crash-safe in DDLs; when it is we should reconsider the moment
- of writing this log record, how to use it in Recovery, and force the log.
- For now this record is only informative.
- */
- if (my_delete_with_symlink(from, MYF(MY_WME | MY_SYNC_DIR)))
+ if (my_delete_with_symlink(from, MYF(MY_WME | sync_dir)))
DBUG_RETURN(my_errno);
fn_format(from,name,"",MARIA_NAME_DEXT,MY_UNPACK_FILENAME|MY_APPEND_EXT);
#ifdef USE_RAID
if (raid_type)
- DBUG_RETURN(my_raid_delete(from, raid_chunks, MYF(MY_WME | MY_SYNC_DIR)) ?
+ DBUG_RETURN(my_raid_delete(from, raid_chunks, MYF(MY_WME | sync_dir)) ?
my_errno : 0);
#endif
- DBUG_RETURN(my_delete_with_symlink(from, MYF(MY_WME | MY_SYNC_DIR)) ?
+ DBUG_RETURN(my_delete_with_symlink(from, MYF(MY_WME | sync_dir)) ?
my_errno : 0);
}
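
The same rule appears in several hunks of this patch (ma_check.c, ma_delete_table.c): directories are synced only for tables that are transactional and not temporary. As a minimal sketch, the rule could be expressed once as a helper; `sketch_sync_dir_flag()` and `SKETCH_SYNC_DIR` are hypothetical, and the constant's value is chosen for illustration rather than taken from mysys.

```c
/* Hypothetical flag value, for illustration only; the real MY_SYNC_DIR is
   defined by mysys and may differ. */
#define SKETCH_SYNC_DIR 0x400

/*
  Returns the flag to pass to file create/delete/rename calls: the directory
  needs syncing only if the table is transactional and is not a temporary
  table (temporary tables are dropped at restart anyway).
*/
int sketch_sync_dir_flag(int transactional, int temporary)
{
  return (transactional && !temporary) ? SKETCH_SYNC_DIR : 0;
}
```

The result can be OR-ed into the other flags, in the way the patch combines `sync_dir` with MY_WME or MY_REDEL_MAKE_BACKUP.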
diff --git a/storage/maria/ma_extra.c b/storage/maria/ma_extra.c
index d6a0d2f4441..61eba165412 100644
--- a/storage/maria/ma_extra.c
+++ b/storage/maria/ma_extra.c
@@ -21,21 +21,20 @@
static void maria_extra_keyflag(MARIA_HA *info,
enum ha_extra_function function);
+/**
+ @brief Set options and buffers to optimize table handling
-/*
- Set options and buffers to optimize table handling
+ @param name table's name
+ @param info open table
+ @param function operation
+ @param extra_arg Pointer to extra argument (normally pointer to
+ ulong); used when function is one of:
+ HA_EXTRA_WRITE_CACHE
+ HA_EXTRA_CACHE
- SYNOPSIS
- maria_extra()
- info open table
- function operation
- extra_arg Pointer to extra argument (normally pointer to ulong)
- Used when function is one of:
- HA_EXTRA_WRITE_CACHE
- HA_EXTRA_CACHE
- RETURN VALUES
- 0 ok
- # error
+ @return Operation status
+ @retval 0 ok
+ @retval !=0 error
*/
int maria_extra(MARIA_HA *info, enum ha_extra_function function,
@@ -265,14 +264,24 @@ int maria_extra(MARIA_HA *info, enum ha_extra_function function,
pthread_mutex_unlock(&THR_LOCK_maria);
break;
case HA_EXTRA_PREPARE_FOR_DELETE:
+ /* QQ: suggest to rename it to "PREPARE_FOR_DROP" */
pthread_mutex_lock(&THR_LOCK_maria);
share->last_version= 0L; /* Impossible version */
#ifdef __WIN__
/* Close the isam and data files as Win32 can't drop an open table */
pthread_mutex_lock(&share->intern_lock);
+ /*
+ If this is Windows we remove blocks from pagecache. If not Windows we
+ don't do it, so these pages stay in the pagecache? So they may later be
+ flushed to a wrong file?
+ Or is it that this flush_pagecache_blocks() never finds any blocks? Then
+ why do we do it on Windows?
+ Don't we wait for all instances to be closed before dropping the table?
+ Do we ever do something useful here?
+ BUG?
+ */
if (flush_pagecache_blocks(share->pagecache, &share->kfile,
- (function == HA_EXTRA_FORCE_REOPEN ?
- FLUSH_RELEASE : FLUSH_IGNORE_CHANGED)))
+ FLUSH_IGNORE_CHANGED))
{
error=my_errno;
share->changed=1;
@@ -292,9 +301,11 @@ int maria_extra(MARIA_HA *info, enum ha_extra_function function,
info->lock_type = F_UNLCK;
}
if (share->kfile.file >= 0)
+ {
_ma_decrement_open_count(info);
- if (share->kfile.file >= 0 && my_close(share->kfile,MYF(0)))
- error=my_errno;
+ if (my_close(share->kfile,MYF(0)))
+ error=my_errno;
+ }
{
LIST *list_element ;
for (list_element=maria_open_list ;
@@ -304,6 +315,9 @@ int maria_extra(MARIA_HA *info, enum ha_extra_function function,
MARIA_HA *tmpinfo=(MARIA_HA*) list_element->data;
if (tmpinfo->s == info->s)
{
+ /**
+ @todo RECOVERY BUG: flush of bitmap and sync of dfile are missing
+ */
if (tmpinfo->dfile.file >= 0 &&
my_close(tmpinfo->dfile.file, MYF(0)))
error = my_errno;
diff --git a/storage/maria/ma_init.c b/storage/maria/ma_init.c
index ac4826a721d..8042c6d9873 100644
--- a/storage/maria/ma_init.c
+++ b/storage/maria/ma_init.c
@@ -53,7 +53,7 @@ void maria_end(void)
{
if (maria_inited)
{
- maria_inited= FALSE;
+ maria_inited= maria_multi_threaded= FALSE;
ft_free_stopwords();
trnman_destroy();
translog_destroy();
diff --git a/storage/maria/ma_loghandler.c b/storage/maria/ma_loghandler.c
index 474f50e1e2c..9ed1d4b9d93 100644
--- a/storage/maria/ma_loghandler.c
+++ b/storage/maria/ma_loghandler.c
@@ -17,6 +17,14 @@
#include "ma_blockrec.h"
#include "trnman.h"
+/**
+ @file
+ @brief Module which writes and reads to a transaction log
+
+ @todo LOG: in functions where the log's lock is required, a
+ translog_assert_owner() could be added.
+*/
+
/* number of opened log files in the pagecache (should be at least 2) */
#define OPENED_FILES_NUM 3
@@ -166,7 +174,7 @@ static struct st_translog_descriptor log_descriptor;
/* Marker for end of log */
static byte end_of_log= 0;
-static my_bool translog_inited;
+my_bool translog_inited= 0;
/* record classes */
enum record_class
@@ -218,7 +226,7 @@ struct st_log_record_type_descriptor
uint16 read_header_len;
/* HOOK for writing the record called before lock */
prewrite_rec_hook prewrite_hook;
- /* HOOK for writing the record called when LSN is known */
+ /* HOOK for writing the record called when LSN is known, inside lock */
inwrite_rec_hook inwrite_hook;
/* HOOK for reading headers */
read_rec_hook read_hook;
@@ -230,6 +238,13 @@ struct st_log_record_type_descriptor
};
+#include <my_atomic.h>
+/* an array that maps id of a MARIA_SHARE to this MARIA_SHARE */
+static MARIA_SHARE **id_to_share= NULL;
+#define SHARE_ID_MAX 65535 /* array's size */
+/* lock for id_to_share */
+static my_atomic_rwlock_t LOCK_id_to_share;
+
static my_bool write_hook_for_redo(enum translog_record_type type,
TRN *trn, LSN *lsn,
struct st_translog_parts *parts);
@@ -291,7 +306,9 @@ static LOG_DESC INIT_LOGREC_REDO_INSERT_ROW_HEAD=
write_hook_for_redo, NULL, 0};
static LOG_DESC INIT_LOGREC_REDO_INSERT_ROW_TAIL=
-{LOGRECTYPE_VARIABLE_LENGTH, 0, 8, NULL, NULL, NULL, 0};
+{LOGRECTYPE_VARIABLE_LENGTH, 0,
+ FILEID_STORE_SIZE + PAGE_STORE_SIZE + DIRPOS_STORE_SIZE, NULL,
+ write_hook_for_redo, NULL, 0};
static LOG_DESC INIT_LOGREC_REDO_INSERT_ROW_BLOB=
{LOGRECTYPE_VARIABLE_LENGTH, 0, 8, NULL, write_hook_for_redo, NULL, 0};
@@ -376,15 +393,9 @@ static LOG_DESC INIT_LOGREC_COMMIT=
static LOG_DESC INIT_LOGREC_COMMIT_WITH_UNDO_PURGE=
{LOGRECTYPE_PSEUDOFIXEDLENGTH, 5, 5, NULL, NULL, NULL, 1};
-static LOG_DESC INIT_LOGREC_CHECKPOINT_PAGE=
-{LOGRECTYPE_VARIABLE_LENGTH, 0, 6, NULL, NULL, NULL, 0};
-
-static LOG_DESC INIT_LOGREC_CHECKPOINT_TRAN=
+static LOG_DESC INIT_LOGREC_CHECKPOINT=
{LOGRECTYPE_VARIABLE_LENGTH, 0, 0, NULL, NULL, NULL, 0};
-static LOG_DESC INIT_LOGREC_CHECKPOINT_TABL=
-{LOGRECTYPE_VARIABLE_LENGTH, 0, 8, NULL, NULL, NULL, 0};
-
static LOG_DESC INIT_LOGREC_REDO_CREATE_TABLE=
{LOGRECTYPE_VARIABLE_LENGTH, 0, 0, NULL, NULL, NULL, 0};
@@ -394,8 +405,13 @@ static LOG_DESC INIT_LOGREC_REDO_RENAME_TABLE=
static LOG_DESC INIT_LOGREC_REDO_DROP_TABLE=
{LOGRECTYPE_VARIABLE_LENGTH, 0, 0, NULL, NULL, NULL, 0};
-static LOG_DESC INIT_LOGREC_REDO_TRUNCATE_TABLE=
-{LOGRECTYPE_VARIABLE_LENGTH, 0, 0, NULL, NULL, NULL, 0};
+static LOG_DESC INIT_LOGREC_REDO_DELETE_ALL=
+{LOGRECTYPE_FIXEDLENGTH, FILEID_STORE_SIZE, FILEID_STORE_SIZE,
+ NULL, NULL, NULL, 0};
+
+static LOG_DESC INIT_LOGREC_REDO_REPAIR_TABLE=
+{LOGRECTYPE_FIXEDLENGTH, FILEID_STORE_SIZE + 4, FILEID_STORE_SIZE + 4,
+ NULL, NULL, NULL, 0};
static LOG_DESC INIT_LOGREC_FILE_ID=
{LOGRECTYPE_VARIABLE_LENGTH, 0, 4, NULL, NULL, NULL, 0};
@@ -403,6 +419,7 @@ static LOG_DESC INIT_LOGREC_FILE_ID=
static LOG_DESC INIT_LOGREC_LONG_TRANSACTION_ID=
{LOGRECTYPE_FIXEDLENGTH, 6, 6, NULL, NULL, NULL, 0};
+const myf log_write_flags= MY_WME | MY_NABP | MY_WAIT_IF_FULL;
static void loghandler_init()
{
@@ -454,20 +471,18 @@ static void loghandler_init()
INIT_LOGREC_COMMIT;
log_record_type_descriptor[LOGREC_COMMIT_WITH_UNDO_PURGE]=
INIT_LOGREC_COMMIT_WITH_UNDO_PURGE;
- log_record_type_descriptor[LOGREC_CHECKPOINT_PAGE]=
- INIT_LOGREC_CHECKPOINT_PAGE;
- log_record_type_descriptor[LOGREC_CHECKPOINT_TRAN]=
- INIT_LOGREC_CHECKPOINT_TRAN;
- log_record_type_descriptor[LOGREC_CHECKPOINT_TABL]=
- INIT_LOGREC_CHECKPOINT_TABL;
+ log_record_type_descriptor[LOGREC_CHECKPOINT]=
+ INIT_LOGREC_CHECKPOINT;
log_record_type_descriptor[LOGREC_REDO_CREATE_TABLE]=
INIT_LOGREC_REDO_CREATE_TABLE;
log_record_type_descriptor[LOGREC_REDO_RENAME_TABLE]=
INIT_LOGREC_REDO_RENAME_TABLE;
log_record_type_descriptor[LOGREC_REDO_DROP_TABLE]=
INIT_LOGREC_REDO_DROP_TABLE;
- log_record_type_descriptor[LOGREC_REDO_TRUNCATE_TABLE]=
- INIT_LOGREC_REDO_TRUNCATE_TABLE;
+ log_record_type_descriptor[LOGREC_REDO_DELETE_ALL]=
+ INIT_LOGREC_REDO_DELETE_ALL;
+ log_record_type_descriptor[LOGREC_REDO_REPAIR_TABLE]=
+ INIT_LOGREC_REDO_REPAIR_TABLE;
log_record_type_descriptor[LOGREC_FILE_ID]=
INIT_LOGREC_FILE_ID;
log_record_type_descriptor[LOGREC_LONG_TRANSACTION_ID]=
@@ -554,6 +569,7 @@ static File open_logfile_by_number_no_cache(uint32 file_no)
DBUG_ENTER("open_logfile_by_number_no_cache");
/* TODO: add O_DIRECT to open flags (when buffer is aligned) */
+ /* TODO: use my_create() */
if ((file= my_open(translog_filename_by_fileno(file_no, path),
O_CREAT | O_BINARY | O_RDWR,
MYF(MY_WME))) < 0)
@@ -615,7 +631,7 @@ static my_bool translog_write_file_header()
bzero(page, sizeof(page_buff) - (page- page_buff));
DBUG_RETURN(my_pwrite(log_descriptor.log_file_num[0], page_buff,
- sizeof(page_buff), 0, MYF(MY_WME | MY_NABP)) != 0);
+ sizeof(page_buff), 0, log_write_flags) != 0);
}
@@ -1222,7 +1238,7 @@ static my_bool translog_buffer_next(TRANSLOG_ADDRESS *horizon,
/*
- Set max LSN send to file
+ Set max LSN sent to file
SYNOPSIS
translog_set_sent_to_file()
@@ -1512,7 +1528,7 @@ static my_bool translog_buffer_flush(struct st_translog_buffer *buffer)
}
if (my_pwrite(buffer->file, (char*) buffer->buffer,
buffer->size, LSN_OFFSET(buffer->offset),
- MYF(MY_WME | MY_NABP)))
+ log_write_flags))
{
UNRECOVERABLE_ERROR(("Can't write buffer (%lu,0x%lx) size %lu "
"to the disk (%d)",
@@ -2230,7 +2246,16 @@ my_bool translog_init(const char *directory,
*/
log_descriptor.flushed--; /* offset decreased */
log_descriptor.sent_to_file--; /* offset decreased */
-
+ /*
+ Log records will refer to a MARIA_SHARE by a unique 2-byte id; set up
+ structures for generating 2-byte ids:
+ */
+ my_atomic_rwlock_init(&LOCK_id_to_share);
+ id_to_share= (MARIA_SHARE **) my_malloc(SHARE_ID_MAX*sizeof(MARIA_SHARE*),
+ MYF(MY_WME|MY_ZEROFILL));
+ if (unlikely(!id_to_share))
+ DBUG_RETURN(1);
+ id_to_share--; /* min id is 1 */
translog_inited= 1;
DBUG_RETURN(0);
}
@@ -2303,6 +2328,8 @@ void translog_destroy()
}
pthread_mutex_destroy(&log_descriptor.sent_to_file_lock);
my_close(log_descriptor.directory_fd, MYF(MY_WME));
+ my_atomic_rwlock_destroy(&LOCK_id_to_share);
+ my_free((gptr)(id_to_share + 1), MYF(MY_ALLOW_ZERO_PTR));
translog_inited= 0;
}
DBUG_VOID_RETURN;
@@ -2362,6 +2389,14 @@ static inline my_bool translog_unlock()
}
+#define translog_buffer_lock_assert_owner(B) \
+ safe_mutex_assert_owner(&(B)->mutex)
+void translog_lock_assert_owner()
+{
+ translog_buffer_lock_assert_owner(log_descriptor.bc.buffer);
+}
+
+
/*
Start new page
@@ -4154,26 +4189,30 @@ err:
}
-/*
- Write the log record
-
- SYNOPSIS
- translog_write_record()
- lsn LSN of the record will be written here
- type the log record type
- trn Transaction structure pointer for hooks by
- record log type, for short_id
- share MARIA_SHARE of table or NULL
- rec_len record length or 0 (count it)
- part_no number of parts or 0 (count it)
- parts_data zero ended (in case of number of parts is 0)
- array of LEX_STRINGs (parts), first
- TRANSLOG_INTERNAL_PARTS positions in the log
- should be unused (need for loghandler)
-
- RETURN
- 0 OK
- 1 Error
+/**
+ @brief Writes the log record
+
+ If share has no 2-byte-id yet, gives an id to the share and logs
+ LOGREC_FILE_ID. If transaction has not logged LOGREC_LONG_TRANSACTION_ID
+ yet, logs it.
+
+ @param lsn LSN of the record will be written here
+ @param type the log record type
+ @param trn Transaction structure pointer for hooks by
+ record log type, for short_id
+ @param share MARIA_SHARE of table or NULL
+ @param rec_len record length or 0 (count it)
+ @param part_no number of parts or 0 (count it)
+ @param parts_data zero ended (in case of number of parts is 0)
+ array of LEX_STRINGs (parts), first
+ TRANSLOG_INTERNAL_PARTS positions in the log
+ should be unused (need for loghandler)
+ @param store_share_id if share!=NULL then share's id will automatically
+ be stored in the first two bytes it points to (so
+ pointer is assumed to be !=NULL)
+ @return Operation status
+ @retval 0 OK
+ @retval 1 Error
*/
my_bool translog_write_record(LSN *lsn,
@@ -4181,7 +4220,8 @@ my_bool translog_write_record(LSN *lsn,
TRN *trn, struct st_maria_share *share,
translog_size_t rec_len,
uint part_no,
- LEX_STRING *parts_data)
+ LEX_STRING *parts_data,
+ uchar *store_share_id)
{
struct st_translog_parts parts;
LEX_STRING *part;
@@ -4191,10 +4231,41 @@ my_bool translog_write_record(LSN *lsn,
DBUG_PRINT("enter", ("type: %u ShortTrID: %u",
(uint) type, (uint)short_trid));
- if (share && !share->base.transactional)
+ if (share)
{
- DBUG_PRINT("info", ("It is not transactional table"));
- DBUG_RETURN(0);
+ if (!share->base.transactional)
+ {
+ DBUG_PRINT("info", ("It is not transactional table"));
+ DBUG_RETURN(0);
+ }
+ if (unlikely(share->id == 0))
+ {
+ /*
+ First log write for this MARIA_SHARE; give it a short id.
+ When the lock manager is enabled and needs a short id, it should be
+ assigned in the lock manager (because row locks will be taken before
+ log records are written; for example SELECT FOR UPDATE takes locks but
+ writes no log record).
+ */
+ if (unlikely(translog_assign_id_to_share(share, trn)))
+ DBUG_RETURN(1);
+ }
+ fileid_store(store_share_id, share->id);
+ }
+ if (unlikely(!(trn->first_undo_lsn & TRANSACTION_LOGGED_LONG_ID)))
+ {
+ LSN lsn;
+ LEX_STRING log_array[TRANSLOG_INTERNAL_PARTS + 1];
+ uchar log_data[6];
+ int6store(log_data, trn->trid);
+ log_array[TRANSLOG_INTERNAL_PARTS + 0].str= (char*) log_data;
+ log_array[TRANSLOG_INTERNAL_PARTS + 0].length= sizeof(log_data);
+ trn->first_undo_lsn|= TRANSACTION_LOGGED_LONG_ID; /* no recursion */
+ if (unlikely(translog_write_record(&lsn, LOGREC_LONG_TRANSACTION_ID,
+ trn, NULL, sizeof(log_data),
+ sizeof(log_array)/sizeof(log_array[0]),
+ log_array, NULL)))
+ DBUG_RETURN(1);
}
parts.parts= parts_data;
@@ -4375,20 +4446,19 @@ void translog_free_record_header(TRANSLOG_HEADER_BUFFER *buff)
}
-/*
- Set current horizon in the scanner data structure
+/**
+ @brief Returns the current horizon at the end of the current log
- SYNOPSIS
- translog_scanner_set_horizon()
- scanner Information about current chunk during scanning
+ @return Horizon
*/
-static void translog_scanner_set_horizon(struct st_translog_scanner_data
- *scanner)
+TRANSLOG_ADDRESS translog_get_horizon()
{
+ TRANSLOG_ADDRESS res;
translog_lock();
- scanner->horizon= log_descriptor.horizon;
+ res= log_descriptor.horizon;
translog_unlock();
+ return res;
}
@@ -4446,7 +4516,7 @@ my_bool translog_init_scanner(LSN lsn,
scanner->fixed_horizon= fixed_horizon;
- translog_scanner_set_horizon(scanner);
+ scanner->horizon= translog_get_horizon();
DBUG_PRINT("info", ("horizon: (0x%lu,0x%lx)",
(ulong) LSN_FILE_NO(scanner->horizon),
(ulong) LSN_OFFSET(scanner->horizon)));
@@ -4499,7 +4569,7 @@ static my_bool translog_scanner_eol(TRANSLOG_SCANNER_DATA *scanner)
DBUG_PRINT("info", ("Horizon is fixed and reached"));
DBUG_RETURN(1);
}
- translog_scanner_set_horizon(scanner);
+ scanner->horizon= translog_get_horizon();
DBUG_PRINT("info",
("Horizon is re-read, EOL: %d",
scanner->horizon <= (scanner->page_addr +
@@ -5368,17 +5438,31 @@ static void translog_force_current_buffer_to_finish()
}
-/*
- Flush the log up to given LSN (included)
-
- SYNOPSIS
- translog_flush()
- lsn log record serial number up to which (inclusive)
- the log have to be flushed
-
- RETURN
- 0 OK
- 1 Error
+/**
+ @brief Flush the log up to given LSN (included)
+
+ @param lsn log record serial number up to which (inclusive)
+ the log has to be flushed
+
+ @return Operation status
+ @retval 0 OK
+ @retval 1 Error
+
+ @todo LOG: when a log write fails, we should not write to this log anymore
+ (if we add more log records to this log they will be unreadable: we will hit
+ the broken log record): all translog_flush() should be made to fail (because
+ translog_flush() is when a transaction wants something durable and we
+ cannot make anything durable as log is corrupted). For that, a "my_bool
+ st_translog_descriptor::write_error" could be set to 1 when a
+ translog_write_record() or translog_flush() fails, and translog_flush()
+ would test this var (and translog_write_record() could also test this var if
+ it wants, though it's not absolutely needed).
+ Then, either shut Maria down immediately, or switch to a new log (but if we
+ get write error after write error, that would create too many logs).
+ A popular open-source transactional engine intentionally crashes as soon as
+ a log flush fails (we however don't want to crash the entire mysqld, but
+ stopping all engine's operations immediately would make sense).
+ Same applies to translog_write_record().
*/
my_bool translog_flush(LSN lsn)
@@ -5469,24 +5553,55 @@ my_bool translog_flush(LSN lsn)
/* We sync file when we are closing it => do nothing if file closed */
}
log_descriptor.flushed= sent_to_file;
+ /** @todo LOG decide if syncing of directory is needed */
rc|= my_sync(log_descriptor.directory_fd, MYF(MY_WME));
translog_unlock();
DBUG_RETURN(rc);
}
+/**
+ @brief Sets transaction's rec_lsn if needed
+
+ A transaction sometimes writes a REDO even before the page is in the
+ pagecache (example: brand new head or tail pages; full pages). So, if
+ Checkpoint happens just after the REDO write, it needs to know that the
+ REDO phase must start before this REDO. Scanning the pagecache cannot
+ tell that as the page is not in the cache. So, transaction sets its rec_lsn
+ to the REDO's LSN or somewhere before, and Checkpoint reads the
+ transaction's rec_lsn.
+
+ @todo move it to a separate file
+
+ @return Operation status, always 0 (success)
+*/
+
static my_bool write_hook_for_redo(enum translog_record_type type
__attribute__ ((unused)),
TRN *trn, LSN *lsn,
struct st_translog_parts *parts
__attribute__ ((unused)))
{
+ /*
+ If the hook stays so simple, it would be faster to pass
+ !trn->rec_lsn ? &trn->rec_lsn : &some_dummy_lsn
+ to translog_write_record(), like Monty did in his original code, and not
+ have a hook. For now we keep it like this.
+ */
if (trn->rec_lsn == 0)
trn->rec_lsn= *lsn;
return 0;
}
+/**
+ @brief Sets transaction's undo_lsn, first_undo_lsn if needed
+
+ @todo move it to a separate file
+
+ @return Operation status, always 0 (success)
+*/
+
static my_bool write_hook_for_undo(enum translog_record_type type
__attribute__ ((unused)),
TRN *trn, LSN *lsn,
@@ -5494,11 +5609,109 @@ static my_bool write_hook_for_undo(enum translog_record_type type
__attribute__ ((unused)))
{
trn->undo_lsn= *lsn;
- if (trn->first_undo_lsn == 0)
- trn->first_undo_lsn= *lsn;
+ if (unlikely(LSN_WITH_FLAGS_TO_LSN(trn->first_undo_lsn) == 0))
+ trn->first_undo_lsn=
+ trn->undo_lsn | LSN_WITH_FLAGS_TO_FLAGS(trn->first_undo_lsn);
return 0;
/*
when we implement purging, we will specialize this hook: UNDO_PURGE
records will additionally set trn->undo_purge_lsn
*/
}
+
+
+/**
+ @brief Gives a 2-byte-id to MARIA_SHARE and logs this fact
+
+ If a MARIA_SHARE does not yet have a 2-byte-id (unique over all currently
+ open MARIA_SHAREs), give it one and record this assignment in the log
+ (LOGREC_FILE_ID log record).
+
+ @param share table
+ @param trn calling transaction
+
+ @return Operation status
+ @retval 0 OK
+ @retval 1 Error
+
+ @note Can be called even if share already has an id (then will do nothing)
+*/
+
+int translog_assign_id_to_share(MARIA_SHARE *share, TRN *trn)
+{
+ /*
+ If you give an id to a non-BLOCK_RECORD table, you also need to release
+ this id somewhere. Then you can change the assertion.
+ */
+ DBUG_ASSERT(share->data_file_type == BLOCK_RECORD);
+ /* re-check under mutex to avoid having 2 ids for the same share */
+ pthread_mutex_lock(&share->intern_lock);
+ if (likely(share->id == 0))
+ {
+ /* Inspired by set_short_trid() of trnman.c */
+ int i= share->kfile.file % SHARE_ID_MAX + 1;
+ my_atomic_rwlock_wrlock(&LOCK_id_to_share);
+ /**
+ @todo RECOVERY BUG: if all slots are used, and we're using rwlocks
+ above, we will never exit the loop. To be discussed with Serg.
+ */
+ for ( ; ; i= i % SHARE_ID_MAX + 1) /* the range is [1..SHARE_ID_MAX] */
+ {
+ void *tmp= NULL;
+ if (id_to_share[i] == NULL &&
+ my_atomic_casptr((void **)&id_to_share[i], &tmp, share))
+ break;
+ }
+ my_atomic_rwlock_wrunlock(&LOCK_id_to_share);
+ share->id= (uint16)i;
+ DBUG_PRINT("info", ("id_to_share: 0x%lx -> %u", (ulong)share, i));
+ LSN lsn;
+ LEX_STRING log_array[TRANSLOG_INTERNAL_PARTS + 2];
+ uchar log_data[FILEID_STORE_SIZE];
+ log_array[TRANSLOG_INTERNAL_PARTS + 0].str= (char*) log_data;
+ log_array[TRANSLOG_INTERNAL_PARTS + 0].length= sizeof(log_data);
+ /*
+ open_file_name is an unresolved name (symlinks are not resolved, datadir
+ is not realpath-ed, etc) which is good: the log can be moved to another
+ directory and continue working.
+ */
+ log_array[TRANSLOG_INTERNAL_PARTS + 1].str= share->open_file_name;
+ /**
+ @todo if we had the name's length in MARIA_SHARE we could avoid this
+ strlen()
+ */
+ log_array[TRANSLOG_INTERNAL_PARTS + 1].length=
+ strlen(share->open_file_name);
+ if (unlikely(translog_write_record(&lsn, LOGREC_FILE_ID, trn, share,
+ sizeof(log_data) +
+ log_array[TRANSLOG_INTERNAL_PARTS +
+ 1].length,
+ sizeof(log_array)/sizeof(log_array[0]),
+ log_array, log_data)))
+ return 1;
+ }
+ pthread_mutex_unlock(&share->intern_lock);
+ return 0;
+}
+
+
+/**
+ @brief Recycles a MARIA_SHARE's short id.
+
+ @param share table
+
+ @note Must be called only if share has an id (i.e. id != 0)
+*/
+
+void translog_deassign_id_from_share(MARIA_SHARE *share)
+{
+ DBUG_PRINT("info", ("id_to_share: 0x%lx id %u -> 0",
+ (ulong)share, share->id));
+ /*
+ We don't need any mutex as we are called only when closing the last
+ instance of the table: no writes can be happening.
+ */
+ my_atomic_rwlock_rdlock(&LOCK_id_to_share);
+ my_atomic_storeptr((void **)&id_to_share[share->id], 0);
+ my_atomic_rwlock_rdunlock(&LOCK_id_to_share);
+}
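The slot search in translog_assign_id_to_share() and the release in translog_deassign_id_from_share() can be illustrated with a small standalone sketch. It is not the code of this patch: it uses C11 `<stdatomic.h>` instead of my_atomic, the names `slots`, `assign_id` and `release_id` are hypothetical, slot 0 is simply left unused instead of decrementing the array pointer, and the search is bounded to one pass over the table (the patch loops forever when all slots are taken, as its @todo notes).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define ID_MAX 65535                       /* valid ids are [1..ID_MAX] */

/* Hypothetical table: slot 0 is never used, so id 0 can mean "no id yet". */
static _Atomic(void *) slots[ID_MAX + 1];

/*
  Claim a free slot for 'owner', starting the search at a caller-supplied
  hint (the patch derives the hint from the index file descriptor).
  Returns the id, or 0 if a full pass over the table found no free slot.
*/
static uint16_t assign_id(void *owner, unsigned start_hint)
{
  unsigned i= start_hint % ID_MAX + 1;
  unsigned tries;
  for (tries= 0; tries < ID_MAX; tries++, i= i % ID_MAX + 1)
  {
    void *expected= NULL;
    /* Succeeds only if the slot is still empty at the time of the swap. */
    if (atomic_compare_exchange_strong(&slots[i], &expected, owner))
      return (uint16_t) i;
  }
  return 0;
}

/* Give the id back; the caller must own it and no writer may still use it. */
static void release_id(uint16_t id)
{
  assert(id != 0);
  atomic_store(&slots[id], NULL);
}
```

Returning 0 on a full table lets the caller turn exhaustion into an error; whether that is the right policy for Maria is an open question in the patch itself.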
diff --git a/storage/maria/ma_loghandler.h b/storage/maria/ma_loghandler.h
index e9872e7bfb7..0a160a9bc53 100644
--- a/storage/maria/ma_loghandler.h
+++ b/storage/maria/ma_loghandler.h
@@ -86,13 +86,12 @@ enum translog_record_type
LOGREC_PREPARE_WITH_UNDO_PURGE,
LOGREC_COMMIT,
LOGREC_COMMIT_WITH_UNDO_PURGE,
- LOGREC_CHECKPOINT_PAGE,
- LOGREC_CHECKPOINT_TRAN,
- LOGREC_CHECKPOINT_TABL,
+ LOGREC_CHECKPOINT,
LOGREC_REDO_CREATE_TABLE,
LOGREC_REDO_RENAME_TABLE,
LOGREC_REDO_DROP_TABLE,
- LOGREC_REDO_TRUNCATE_TABLE,
+ LOGREC_REDO_DELETE_ALL,
+ LOGREC_REDO_REPAIR_TABLE,
LOGREC_FILE_ID,
LOGREC_LONG_TRANSACTION_ID,
LOGREC_RESERVED_FUTURE_EXTENSION= 63
@@ -181,9 +180,7 @@ struct st_translog_reader_data
};
struct st_transaction;
-#ifdef __cplusplus
-extern "C" {
-#endif
+C_MODE_START
/* Records types for unittests */
#define LOGREC_FIXED_RECORD_0LSN_EXAMPLE 1
@@ -199,13 +196,12 @@ extern my_bool translog_init(const char *directory, uint32 log_file_max_size,
uint32 server_version, uint32 server_id,
PAGECACHE *pagecache, uint flags);
-extern my_bool translog_write_record(LSN *lsn,
- enum translog_record_type type,
- struct st_transaction *trn,
- struct st_maria_share *share,
- translog_size_t rec_len,
- uint part_no,
- LEX_STRING *parts_data);
+extern my_bool
+translog_write_record(LSN *lsn, enum translog_record_type type,
+ struct st_transaction *trn,
+ struct st_maria_share *share,
+ translog_size_t rec_len, uint part_no,
+ LEX_STRING *parts_data, uchar *store_share_id);
extern void translog_destroy();
@@ -232,7 +228,10 @@ extern translog_size_t translog_read_next_record_header(TRANSLOG_SCANNER_DATA
*scanner,
TRANSLOG_HEADER_BUFFER
*buff);
-#ifdef __cplusplus
-}
-#endif
-
+extern void translog_lock_assert_owner();
+extern TRANSLOG_ADDRESS translog_get_horizon();
+extern int translog_assign_id_to_share(struct st_maria_share *share,
+ struct st_transaction *trn);
+extern void translog_deassign_id_from_share(struct st_maria_share *share);
+extern my_bool translog_inited;
+C_MODE_END
diff --git a/storage/maria/ma_loghandler_lsn.h b/storage/maria/ma_loghandler_lsn.h
index 1789d3ce61b..c641337e8ba 100644
--- a/storage/maria/ma_loghandler_lsn.h
+++ b/storage/maria/ma_loghandler_lsn.h
@@ -35,7 +35,7 @@ typedef TRANSLOG_ADDRESS LSN;
/* checks LSN */
#define LSN_VALID(L) DBUG_ASSERT((L) >= 0 && (L) < (uint64)0xFFFFFFFFFFFFFFLL)
-/* size of stored LSN on a disk */
+/* size of stored LSN on a disk, don't change it! */
#define LSN_STORE_SIZE 7
/* Puts LSN into buffer (dst) */
@@ -53,4 +53,12 @@ typedef TRANSLOG_ADDRESS LSN;
#define LSN_REPLACE_OFFSET(L, S) (LSN_FINE_NO_PART(L) | (S))
+/*
+ an 8-byte type whose most significant byte is used for "flags"; the 7
+ other bytes are an LSN.
+*/
+typedef LSN LSN_WITH_FLAGS;
+#define LSN_WITH_FLAGS_TO_LSN(x) ((x) & ULL(0x00FFFFFFFFFFFFFF))
+#define LSN_WITH_FLAGS_TO_FLAGS(x) ((x) & ULL(0xFF00000000000000))
+
#endif
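As a rough illustration of how LSN_WITH_FLAGS is used by write_hook_for_undo() (store the first UNDO LSN once, without losing the flag bits), here is a standalone sketch. The type and helper names are hypothetical, and the flag value `LOGGED_LONG_ID_FLAG` is an assumption: the actual value of TRANSACTION_LOGGED_LONG_ID is not shown in this part of the diff.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t lsn_with_flags_t;         /* hypothetical stand-in type */

#define LSN_MASK    UINT64_C(0x00FFFFFFFFFFFFFF)   /* low 7 bytes: the LSN */
#define FLAGS_MASK  UINT64_C(0xFF00000000000000)   /* top byte: flags */
/* Assumed flag bit, for illustration only. */
#define LOGGED_LONG_ID_FLAG UINT64_C(0x8000000000000000)

static uint64_t lsn_part(lsn_with_flags_t x)   { return x & LSN_MASK; }
static uint64_t flags_part(lsn_with_flags_t x) { return x & FLAGS_MASK; }

/*
  Mirror of the hook's logic: if no first UNDO LSN is recorded yet, record
  'undo_lsn' and keep whatever flags were already set; otherwise leave the
  value untouched.
*/
static lsn_with_flags_t set_first_undo(lsn_with_flags_t first_undo,
                                       uint64_t undo_lsn)
{
  assert((undo_lsn & FLAGS_MASK) == 0);    /* a real LSN fits in 7 bytes */
  if (lsn_part(first_undo) == 0)
    return undo_lsn | flags_part(first_undo);
  return first_undo;
}
```

Testing the LSN part rather than the whole value matters because a transaction can have its long-id flag set before it has written any UNDO record.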
diff --git a/storage/maria/ma_open.c b/storage/maria/ma_open.c
index b8ce6d123e7..4e72adf3b7e 100644
--- a/storage/maria/ma_open.c
+++ b/storage/maria/ma_open.c
@@ -919,12 +919,23 @@ static void setup_key_functions(register MARIA_KEYDEF *keyinfo)
}
-/*
- Function to save and store the header in the index file (.MYI)
+/**
+ @brief Function to save and store the header in the index file (.MYI)
+
+ @param file descriptor of the index file to write
+ @param state state information to write to the file
+ @param pWrite bitmap (determines the amount of information to
+ write, and if my_write() or my_pwrite() should be
+ used)
+
+ @return Operation status
+ @retval 0 OK
+ @retval 1 Error
*/
uint _ma_state_info_write(File file, MARIA_STATE_INFO *state, uint pWrite)
{
+ /** @todo RECOVERY write it only at checkpoint time */
uchar buff[MARIA_STATE_INFO_SIZE + MARIA_STATE_EXTRA_SIZE];
uchar *ptr=buff;
uint i, keys= (uint) state->header.keys;
@@ -935,6 +946,11 @@ uint _ma_state_info_write(File file, MARIA_STATE_INFO *state, uint pWrite)
/* open_count must be first because of _ma_mark_file_changed ! */
mi_int2store(ptr,state->open_count); ptr+= 2;
+ /*
+ if you change the offset of this LSN inside the file, fix
+ ma_create + ma_rename + ma_delete_all + backward-compatibility.
+ */
+ lsn_store(ptr, state->create_rename_lsn); ptr+= LSN_STORE_SIZE;
*ptr++= (uchar)state->changed;
*ptr++= state->sortkey;
mi_rowstore(ptr,state->state.records); ptr+= 8;
@@ -959,6 +975,7 @@ uint _ma_state_info_write(File file, MARIA_STATE_INFO *state, uint pWrite)
{
mi_sizestore(ptr,state->key_root[i]); ptr+= 8;
}
+ /** @todo RECOVERY key_del is a problem for recovery */
mi_sizestore(ptr,state->key_del); ptr+= 8;
if (pWrite & 2) /* From maria_chk */
{
@@ -994,6 +1011,7 @@ byte *_ma_state_info_read(byte *ptr, MARIA_STATE_INFO *state)
key_parts= mi_uint2korr(state->header.key_parts);
state->open_count = mi_uint2korr(ptr); ptr+= 2;
+ state->create_rename_lsn= lsn_korr(ptr); ptr+= LSN_STORE_SIZE;
state->changed= (my_bool) *ptr++;
state->sortkey= (uint) *ptr++;
state->state.records= mi_rowkorr(ptr); ptr+= 8;
diff --git a/storage/maria/ma_pagecache.c b/storage/maria/ma_pagecache.c
index 18c36fcfbd1..ae42f702b0a 100755
--- a/storage/maria/ma_pagecache.c
+++ b/storage/maria/ma_pagecache.c
@@ -114,6 +114,11 @@
/* TODO: put it to my_static.c */
my_bool my_disable_flush_pagecache_blocks= 0;
+/**
+ when flushing pages of a file, it can happen that we take some dirty blocks
+ out of changed_blocks[]; Checkpoint must not run at this moment.
+*/
+uint changed_blocks_is_incomplete= 0;
#define STRUCT_PTR(TYPE, MEMBER, a) \
(TYPE *) ((char *) (a) - offsetof(TYPE, MEMBER))
@@ -308,7 +313,7 @@ struct st_pagecache_block_link
enum pagecache_page_type type; /* type of the block */
uint hits_left; /* number of hits left until promotion */
ulonglong last_hit_time; /* timestamp of the last hit */
- LSN rec_lsn; /* LSN when first became dirty */
+ LSN rec_lsn; /**< LSN when first became dirty */
KEYCACHE_CONDVAR *condvar; /* condition variable for 'no readers' event */
};
@@ -2523,7 +2528,8 @@ void pagecache_unlock(PAGECACHE *pagecache,
{
DBUG_ASSERT(lock == PAGECACHE_LOCK_WRITE_UNLOCK);
DBUG_ASSERT(pin == PAGECACHE_UNPIN);
- set_if_bigger(block->rec_lsn, first_REDO_LSN_for_page);
+ if (block->rec_lsn == 0)
+ block->rec_lsn= first_REDO_LSN_for_page;
}
if (lsn != 0)
{
@@ -2685,7 +2691,8 @@ void pagecache_unlock_by_link(PAGECACHE *pagecache,
DBUG_ASSERT(lock == PAGECACHE_LOCK_WRITE_UNLOCK ||
lock == PAGECACHE_LOCK_READ_UNLOCK);
DBUG_ASSERT(pin == PAGECACHE_UNPIN);
- set_if_bigger(block->rec_lsn, first_REDO_LSN_for_page);
+ if (block->rec_lsn == 0)
+ block->rec_lsn= first_REDO_LSN_for_page;
}
if (lsn != 0)
{
@@ -3279,8 +3286,8 @@ restart:
if (need_lock_change)
{
/*
- RECOVERY TODO BUG We are doing an unlock here, so need to give the
- page its rec_lsn
+ We don't set rec_lsn of the block; this is ok as for the
+ Maria-block-record's pages, we always keep pages pinned here.
*/
if (make_lock_and_pin(pagecache, block,
write_lock_change_table[lock].unlock_lock,
@@ -3500,22 +3507,21 @@ static int flush_cached_blocks(PAGECACHE *pagecache,
}
-/*
- flush all key blocks for a file to disk, but don't do any mutex locks
+/**
+ @brief flush all key blocks for a file to disk but don't do any mutex locks
- flush_pagecache_blocks_int()
- pagecache pointer to a key cache data structure
- file handler for the file to flush to
- flush_type type of the flush
+ @param pagecache pointer to a pagecache data structure
+ @param file handler for the file to flush to
+ @param flush_type type of the flush
- NOTES
- This function doesn't do any mutex locks because it needs to be called
- both from flush_pagecache_blocks and flush_all_key_blocks (the later one
- does the mutex lock in the resize_pagecache() function).
+ @note
+ This function doesn't do any mutex locks because it needs to be called
+ both from flush_pagecache_blocks and flush_all_key_blocks (the latter one
+ does the mutex lock in the resize_pagecache() function).
- RETURN
- 0 ok
- 1 error
+ @return Operation status
+ @retval 0 OK
+ @retval 1 Error
*/
static int flush_pagecache_blocks_int(PAGECACHE *pagecache,
@@ -3547,6 +3553,7 @@ static int flush_pagecache_blocks_int(PAGECACHE *pagecache,
#if defined(PAGECACHE_DEBUG)
uint cnt= 0;
#endif
+ uint8 changed_blocks_is_incomplete_incremented= 0;
if (type != FLUSH_IGNORE_CHANGED)
{
@@ -3636,16 +3643,23 @@ restart:
else
{
/* Link the block into a list of blocks 'in switch' */
- /*
- RECOVERY TODO BUG this unlink_changed() is a serious problem for
- Maria's Checkpoint: it removes a page from the list of dirty
- pages, while it's still dirty. A solution is to abandon
- first_in_switch, just wait for this page to be
- flushed by somebody else, and loop. TODO: check all places
- where we remove a page from the list of dirty pages
- */
unlink_changed(block);
link_changed(block, &first_in_switch);
+ /*
+ We have just removed a page from the list of dirty pages
+ ("changed_blocks") though it's still dirty (the flush by another
+ thread has not yet happened). Checkpoint will miss the page and so
+ must be blocked until that flush has happened.
+ */
+ /**
+ @todo RECOVERY: check all places where we remove a page from the
+ list of dirty pages
+ */
+ if (unlikely(!changed_blocks_is_incomplete_incremented))
+ {
+ changed_blocks_is_incomplete_incremented= 1;
+ changed_blocks_is_incomplete++;
+ }
}
}
}
@@ -3683,6 +3697,8 @@ restart:
KEYCACHE_DBUG_ASSERT(cnt <= pagecache->blocks_used);
#endif
}
+ changed_blocks_is_incomplete-=
+ changed_blocks_is_incomplete_incremented;
/* The following happens very seldom */
if (! (type == FLUSH_KEEP || type == FLUSH_FORCE_WRITE))
{
@@ -3789,51 +3805,56 @@ int reset_pagecache_counters(const char *name, PAGECACHE *pagecache)
}
-/*
- Allocates a buffer and stores in it some information about all dirty pages
- of type PAGECACHE_LSN_PAGE.
-
- SYNOPSIS
- pagecache_collect_changed_blocks_with_lsn()
- pagecache pointer to the page cache
- str (OUT) pointer to a LEX_STRING where the allocated buffer, and
- its size, will be put
- max_lsn (OUT) pointer to a LSN where the maximum rec_lsn of all
- relevant dirty pages will be put
-
- DESCRIPTION
- Does the allocation because the caller cannot know the size itself.
- Memory freeing is to be done by the caller (if the "str" member of the
- LEX_STRING is not NULL).
- Ignores all pages of another type than PAGECACHE_LSN_PAGE, because they
- are not interesting for a checkpoint record.
- The caller has the intention of doing checkpoints.
-
- RETURN
- 0 on success
- 1 on error
+/**
+ @brief Allocates a buffer and stores in it some info about all dirty pages
+
+ Does the allocation because the caller cannot know the size itself.
+ Memory freeing is to be done by the caller (if the "str" member of the
+ LEX_STRING is not NULL).
+ Ignores all pages of another type than PAGECACHE_LSN_PAGE, because they
+ are not interesting for a checkpoint record.
+ The caller has the intention of doing checkpoints.
+
+ @param pagecache pointer to the page cache
+ @param[out] str pointer to where the allocated buffer, and
+ its size, will be put
+ @param[out] min_rec_lsn pointer to where the minimum rec_lsn of all
+ relevant dirty pages will be put
+ @param[out] max_rec_lsn pointer to where the maximum rec_lsn of all
+ relevant dirty pages will be put
+ @return Operation status
+ @retval 0 OK
+ @retval 1 Error
*/
+
my_bool pagecache_collect_changed_blocks_with_lsn(PAGECACHE *pagecache,
LEX_STRING *str,
- LSN *max_lsn)
+ LSN *min_rec_lsn,
+ LSN *max_rec_lsn)
{
my_bool error= 0;
ulong stored_list_size= 0;
uint file_hash;
char *ptr;
+ LSN minimum_rec_lsn= ULONGLONG_MAX, maximum_rec_lsn= 0;
DBUG_ENTER("pagecache_collect_changed_blocks_with_LSN");
- *max_lsn= 0;
DBUG_ASSERT(NULL == str->str);
/*
We lock the entire cache but will be quick, just reading/writing a few MBs
of memory at most.
- When we enter here, we must be sure that no "first_in_switch" situation
- is happening or will happen (either we have to get rid of
- first_in_switch in the code or, first_in_switch has to increment a
- "danger" counter for this function to know it has to wait). TODO.
*/
pagecache_pthread_mutex_lock(&pagecache->cache_lock);
+ while (changed_blocks_is_incomplete > 0)
+ {
+ /*
+ Some pages are more recent in memory than on disk (=dirty) and are not
+ in "changed_blocks" so we cannot know them. Wait.
+ */
+ pagecache_pthread_mutex_unlock(&pagecache->cache_lock);
+ sleep(1);
+ pagecache_pthread_mutex_lock(&pagecache->cache_lock);
+ }
/* Count how many dirty pages are interesting */
for (file_hash= 0; file_hash < PAGECACHE_CHANGED_BLOCKS_HASH; file_hash++)
@@ -3851,35 +3872,15 @@ my_bool pagecache_collect_changed_blocks_with_lsn(PAGECACHE *pagecache,
DBUG_ASSERT(block->status & PCBLOCK_CHANGED);
if (block->type != PAGECACHE_LSN_PAGE)
continue; /* no need to store it */
- /*
- In the current pagecache, rec_lsn is not set correctly:
- 1) it is set on pagecache_unlock(), too late (a page is dirty
- (PCBLOCK_CHANGED) since the first pagecache_write()). So in this
- scenario:
- thread1: thread2:
- write_REDO
- pagecache_write() checkpoint : reclsn not known
- pagecache_unlock(sets rec_lsn)
- commit
- crash,
- at recovery we will wrongly skip the REDO. It also affects the
- low-water mark's computation.
- 2) sometimes the unlocking can be an implicit action of
- pagecache_write(), without any call to pagecache_unlock(), then
- rec_lsn is not set.
- 1) and 2) are critical problems.
- TODO: fix this when Monty has explained how he writes BLOB pages.
- */
- if (block->rec_lsn == 0)
- {
- DBUG_ASSERT(0);
- goto err;
- }
stored_list_size++;
}
}
- str->length= 8+(4+4+8)*stored_list_size;
+ str->length= 8 + /* number of dirty pages */
+ (4 + /* file */
+ 4 + /* pageno */
+ LSN_STORE_SIZE /* rec_lsn */
+ ) * stored_list_size;
if (NULL == (str->str= my_malloc(str->length, MYF(MY_WME))))
goto err;
ptr= str->str;
@@ -3896,19 +3897,27 @@ my_bool pagecache_collect_changed_blocks_with_lsn(PAGECACHE *pagecache,
{
if (block->type != PAGECACHE_LSN_PAGE)
continue; /* no need to store it in the checkpoint record */
- DBUG_ASSERT((4 == sizeof(block->hash_link->file.file)));
- DBUG_ASSERT((4 == sizeof(block->hash_link->pageno)));
+ compile_time_assert((4 == sizeof(block->hash_link->file.file)));
+ compile_time_assert((4 == sizeof(block->hash_link->pageno)));
int4store(ptr, block->hash_link->file.file);
ptr+= 4;
int4store(ptr, block->hash_link->pageno);
ptr+= 4;
- int8store(ptr, (ulonglong) block->rec_lsn);
- ptr+= 8;
- set_if_bigger(*max_lsn, block->rec_lsn);
+ lsn_store(ptr, block->rec_lsn);
+ ptr+= LSN_STORE_SIZE;
+ if (block->rec_lsn != 0)
+ {
+ if (cmp_translog_addr(block->rec_lsn, minimum_rec_lsn) < 0)
+ minimum_rec_lsn= block->rec_lsn;
+ if (cmp_translog_addr(block->rec_lsn, maximum_rec_lsn) > 0)
+ maximum_rec_lsn= block->rec_lsn;
+ } /* otherwise, some trn->rec_lsn should hold the info */
}
}
end:
pagecache_pthread_mutex_unlock(&pagecache->cache_lock);
+ *min_rec_lsn= minimum_rec_lsn;
+ *max_rec_lsn= maximum_rec_lsn;
DBUG_RETURN(error);
err:
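The buffer sizing and the min/max rec_lsn bookkeeping in the hunk above can be illustrated in isolation. This is a minimal sketch, not the Maria implementation: `sketch_lsn`, `SKETCH_LSN_STORE_SIZE` (assumed here to be 7 bytes), `sketch_dirty_pages_buffer_size()` and `sketch_min_max_rec_lsn()` are hypothetical names, and plain integer comparison stands in for `cmp_translog_addr()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: a stored LSN takes 7 bytes; the real LSN_STORE_SIZE lives in Maria's headers. */
#define SKETCH_LSN_STORE_SIZE 7

/* Assumption: an LSN is modelled as a plain 64-bit integer. */
typedef uint64_t sketch_lsn;

/* Size of the dirty-pages buffer: an 8-byte page count, then one
   (file, pageno, rec_lsn) entry per stored page. */
static size_t sketch_dirty_pages_buffer_size(size_t stored_list_size)
{
  return 8 +                       /* number of dirty pages */
         (4 +                      /* file */
          4 +                      /* pageno */
          SKETCH_LSN_STORE_SIZE    /* rec_lsn */
         ) * stored_list_size;
}

/* Fold rec_lsn values into a minimum and a maximum, skipping pages whose
   rec_lsn is 0, as the patched loop does. With no non-zero input the
   results stay at their initial values (UINT64_MAX and 0). */
static void sketch_min_max_rec_lsn(const sketch_lsn *rec_lsns, size_t count,
                                   sketch_lsn *min_rec_lsn,
                                   sketch_lsn *max_rec_lsn)
{
  sketch_lsn minimum= UINT64_MAX, maximum= 0;
  size_t i;
  for (i= 0; i < count; i++)
  {
    if (rec_lsns[i] == 0)
      continue;                    /* rec_lsn not set for this page */
    if (rec_lsns[i] < minimum)
      minimum= rec_lsns[i];
    if (rec_lsns[i] > maximum)
      maximum= rec_lsns[i];
  }
  *min_rec_lsn= minimum;
  *max_rec_lsn= maximum;
}
```

A caller would size the buffer first and fold the LSNs while filling it; the sketch separates the two steps only to keep each one easy to check.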
diff --git a/storage/maria/ma_pagecache.h b/storage/maria/ma_pagecache.h
index ef14cd48cef..478f71161eb 100644
--- a/storage/maria/ma_pagecache.h
+++ b/storage/maria/ma_pagecache.h
@@ -239,6 +239,7 @@ extern my_bool pagecache_delete_pages(PAGECACHE *pagecache,
extern void end_pagecache(PAGECACHE *keycache, my_bool cleanup);
extern my_bool pagecache_collect_changed_blocks_with_lsn(PAGECACHE *pagecache,
LEX_STRING *str,
+ LSN *min_lsn,
LSN *max_lsn);
extern int reset_pagecache_counters(const char *name, PAGECACHE *pagecache);
diff --git a/storage/maria/ma_panic.c b/storage/maria/ma_panic.c
index b74403e6eb2..0394f630343 100644
--- a/storage/maria/ma_panic.c
+++ b/storage/maria/ma_panic.c
@@ -52,7 +52,12 @@ int maria_panic(enum ha_panic_function flag)
info=(MARIA_HA*) list_element->data;
switch (flag) {
case HA_PANIC_CLOSE:
- pthread_mutex_unlock(&THR_LOCK_maria); /* Not exactly right... */
+ /*
+ If bad luck (if some tables would be used now, which normally does not
+ happen in MySQL), as we release the mutex, the list may change and so
+ we may crash.
+ */
+ pthread_mutex_unlock(&THR_LOCK_maria);
if (maria_close(info))
error=my_errno;
pthread_mutex_lock(&THR_LOCK_maria);
diff --git a/storage/maria/ma_range.c b/storage/maria/ma_range.c
index f91a61259d7..b359868e8e4 100644
--- a/storage/maria/ma_range.c
+++ b/storage/maria/ma_range.c
@@ -29,25 +29,22 @@ static uint _ma_keynr(MARIA_HA *info, MARIA_KEYDEF *keyinfo, byte *page,
byte *keypos, uint *ret_max_key);
-/*
- Estimate how many records there is in a given range
+/**
+ @brief Estimate how many records there are in a given range
- SYNOPSIS
- maria_records_in_range()
- info MARIA handler
- inx Index to use
- min_key Min key. Is = 0 if no min range
- max_key Max key. Is = 0 if no max range
+ @param info MARIA handler
+ @param inx Index to use
+ @param min_key Min key. Is = 0 if no min range
+ @param max_key Max key. Is = 0 if no max range
- NOTES
- We should ONLY return 0 if there is no rows in range
+ @note
+ We should ONLY return 0 if there are no rows in range
- RETURN
- HA_POS_ERROR error (or we can't estimate number of rows)
- number Estimated number of rows
+ @return Estimated number of rows or error
+ @retval HA_POS_ERROR error (or we can't estimate number of rows)
+ @retval number Estimated number of rows
*/
-
ha_rows maria_records_in_range(MARIA_HA *info, int inx, key_range *min_key,
key_range *max_key)
{
@@ -115,6 +112,13 @@ ha_rows maria_records_in_range(MARIA_HA *info, int inx, key_range *min_key,
rw_unlock(&info->s->key_root_lock[inx]);
fast_ma_writeinfo(info);
+ /**
+ @todo LOCK
+ If res==0 (no rows), if we need to guarantee repeatability of the search,
+ we will need to set a next-key lock in this statement.
+ Also SELECT COUNT(*)...
+ */
+
DBUG_PRINT("info",("records: %ld",(ulong) (res)));
DBUG_RETURN(res);
}
diff --git a/storage/maria/ma_rename.c b/storage/maria/ma_rename.c
index a80bbcd398f..5224698c614 100644
--- a/storage/maria/ma_rename.c
+++ b/storage/maria/ma_rename.c
@@ -18,6 +18,18 @@
*/
#include "ma_fulltext.h"
+#include "trnman_public.h"
+
+/**
+ @brief renames a table
+
+ @param old_name current name of table
+ @param new_name table should be renamed to this name
+
+ @return Operation status
+ @retval 0 OK
+ @retval !=0 Error
+*/
int maria_rename(const char *old_name, const char *new_name)
{
@@ -26,22 +38,73 @@ int maria_rename(const char *old_name, const char *new_name)
#ifdef USE_RAID
uint raid_type=0,raid_chunks=0;
#endif
+ MARIA_HA *info;
+ MARIA_SHARE *share;
+ myf sync_dir;
DBUG_ENTER("maria_rename");
#ifdef EXTRA_DEBUG
_ma_check_table_is_closed(old_name,"rename old_table");
_ma_check_table_is_closed(new_name,"rename new table2");
#endif
- /* LOCK TODO take X-lock on table here */
+ /** @todo LOCK take X-lock on table */
+ if (!(info= maria_open(old_name, O_RDWR, HA_OPEN_FOR_REPAIR)))
+ DBUG_RETURN(my_errno);
+ share= info->s;
#ifdef USE_RAID
+ raid_type = share->base.raid_type;
+ raid_chunks = share->base.raid_chunks;
+#endif
+
+ sync_dir= (share->base.transactional && !share->temporary) ?
+ MY_SYNC_DIR : 0;
+ if (sync_dir)
{
- MARIA_HA *info;
- if (!(info=maria_open(old_name, O_RDONLY, 0)))
- DBUG_RETURN(my_errno);
- raid_type = info->s->base.raid_type;
- raid_chunks = info->s->base.raid_chunks;
- maria_close(info);
+ uchar log_data[LSN_STORE_SIZE];
+ LEX_STRING log_array[TRANSLOG_INTERNAL_PARTS + 3];
+ uint old_name_len= strlen(old_name), new_name_len= strlen(new_name);
+ int2store(log_data, old_name_len);
+ int2store(log_data + 2, new_name_len);
+ log_array[TRANSLOG_INTERNAL_PARTS + 0].str= log_data;
+ log_array[TRANSLOG_INTERNAL_PARTS + 0].length= 2 + 2;
+ log_array[TRANSLOG_INTERNAL_PARTS + 1].str= (char *)old_name;
+ log_array[TRANSLOG_INTERNAL_PARTS + 1].length= old_name_len;
+ log_array[TRANSLOG_INTERNAL_PARTS + 2].str= (char *)new_name;
+ log_array[TRANSLOG_INTERNAL_PARTS + 2].length= new_name_len;
+ /*
+ For this record to be of any use for Recovery, we need the upper
+ MySQL layer to be crash-safe, which it is not now (that would require
+ work using the ddl_log of sql/sql_table.cc); when it is, we should
+ reconsider the moment of writing this log record (before or after op,
+ under THR_LOCK_maria or not...), how to use it in Recovery, and force
+ the log. For now this record is just informative.
+ */
+ if (unlikely(translog_write_record(&share->state.create_rename_lsn,
+ LOGREC_REDO_RENAME_TABLE,
+ &dummy_transaction_object, NULL,
+ 2 + 2 + old_name_len + new_name_len,
+ sizeof(log_array)/sizeof(log_array[0]),
+ log_array, NULL)))
+ {
+ maria_close(info);
+ DBUG_RETURN(1);
+ }
+ /*
+ store LSN into file, needed for Recovery to not be confused if a
+ RENAME happened (applying REDOs to the wrong table).
+ */
+ lsn_store(log_data, share->state.create_rename_lsn);
+ if (my_pwrite(share->kfile.file, log_data, sizeof(log_data),
+ sizeof(share->state.header) + 2, MYF(MY_NABP)) ||
+ my_sync(share->kfile.file, MYF(MY_WME)))
+ {
+ maria_close(info);
+ DBUG_RETURN(1);
+ }
}
+
+ maria_close(info);
+#ifdef USE_RAID
#ifdef EXTRA_DEBUG
_ma_check_table_is_closed(old_name,"rename raidcheck");
#endif
@@ -49,29 +112,18 @@ int maria_rename(const char *old_name, const char *new_name)
fn_format(from,old_name,"",MARIA_NAME_IEXT,MY_UNPACK_FILENAME|MY_APPEND_EXT);
fn_format(to,new_name,"",MARIA_NAME_IEXT,MY_UNPACK_FILENAME|MY_APPEND_EXT);
- /*
- RECOVERY TODO log the two renames below. Update
- ZeroDirtyPagesLSN of the table on disk (=> sync the files), this is
- needed so that Recovery does not pick a wrong table.
- Then do the file renames.
- For this log record to be of any use for Recovery, we need the upper MySQL
- layer to be crash-safe in DDLs; when it is we should reconsider the moment
- of writing this log record, how to use it in Recovery, and force the log.
- For now this record is only informative. But ZeroDirtyPagesLSN is
- critically needed!
- */
- if (my_rename_with_symlink(from, to, MYF(MY_WME | MY_SYNC_DIR)))
+ if (my_rename_with_symlink(from, to, MYF(MY_WME | sync_dir)))
DBUG_RETURN(my_errno);
fn_format(from,old_name,"",MARIA_NAME_DEXT,MY_UNPACK_FILENAME|MY_APPEND_EXT);
fn_format(to,new_name,"",MARIA_NAME_DEXT,MY_UNPACK_FILENAME|MY_APPEND_EXT);
#ifdef USE_RAID
if (raid_type)
data_file_rename_error= my_raid_rename(from, to, raid_chunks,
- MYF(MY_WME | MY_SYNC_DIR));
+ MYF(MY_WME | sync_dir));
else
#endif
data_file_rename_error=
- my_rename_with_symlink(from, to, MYF(MY_WME | MY_SYNC_DIR));
+ my_rename_with_symlink(from, to, MYF(MY_WME | sync_dir));
if (data_file_rename_error)
{
/*
@@ -81,7 +133,7 @@ int maria_rename(const char *old_name, const char *new_name)
data_file_rename_error= my_errno;
fn_format(from, old_name, "", MARIA_NAME_IEXT, MYF(MY_UNPACK_FILENAME|MY_APPEND_EXT));
fn_format(to, new_name, "", MARIA_NAME_IEXT, MYF(MY_UNPACK_FILENAME|MY_APPEND_EXT));
- my_rename_with_symlink(to, from, MYF(MY_WME | MY_SYNC_DIR));
+ my_rename_with_symlink(to, from, MYF(MY_WME | sync_dir));
}
DBUG_RETURN(data_file_rename_error);
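The body of the REDO_RENAME_TABLE record built in the ma_rename.c hunk above is two 2-byte lengths followed by the old and new names. The sketch below lays that body out in one contiguous buffer for illustration; the patch itself passes the three parts separately through `log_array`. `sketch_store2()` and `sketch_build_rename_record()` are hypothetical helpers, and the low-byte-first store is meant to mirror `int2store()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a 16-bit value low byte first, as int2store() does. */
static void sketch_store2(unsigned char *dst, uint16_t value)
{
  dst[0]= (unsigned char) (value & 0xFF);
  dst[1]= (unsigned char) (value >> 8);
}

/* Lay out: 2-byte old name length, 2-byte new name length, old name,
   new name (no terminating NUL bytes).
   Returns the number of bytes written, or 0 if a name does not fit in
   16 bits or the buffer is too small. */
static size_t sketch_build_rename_record(unsigned char *buf, size_t buf_size,
                                         const char *old_name,
                                         const char *new_name)
{
  size_t old_name_len= strlen(old_name), new_name_len= strlen(new_name);
  size_t total= 2 + 2 + old_name_len + new_name_len;
  if (old_name_len > UINT16_MAX || new_name_len > UINT16_MAX ||
      total > buf_size)
    return 0;
  sketch_store2(buf, (uint16_t) old_name_len);
  sketch_store2(buf + 2, (uint16_t) new_name_len);
  memcpy(buf + 4, old_name, old_name_len);
  memcpy(buf + 4 + old_name_len, new_name, new_name_len);
  return total;
}
```

Storing both lengths up front lets a reader of the record split the two names without scanning for separators.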
diff --git a/storage/maria/ma_static.c b/storage/maria/ma_static.c
index c77f3f512fd..16bf0eca935 100644
--- a/storage/maria/ma_static.c
+++ b/storage/maria/ma_static.c
@@ -47,7 +47,13 @@ PAGECACHE *maria_pagecache= &maria_pagecache_var;
PAGECACHE maria_log_pagecache_var;
PAGECACHE *maria_log_pagecache= &maria_log_pagecache_var;
-/* For using maria externally */
+/**
+ @brief when transactionality does not matter we can use this transaction
+
+ Used in external programs like ma_test*, and also internally inside
+ libmaria when there is no transaction around and the operation isn't
+ transactional (CREATE/DROP/RENAME/OPTIMIZE/REPAIR).
+*/
TRN dummy_transaction_object;
/* Enough for comparing if number is zero */
diff --git a/storage/maria/ma_test_all.sh b/storage/maria/ma_test_all.sh
index 8ee326a9c69..76b6c32913f 100755
--- a/storage/maria/ma_test_all.sh
+++ b/storage/maria/ma_test_all.sh
@@ -3,10 +3,16 @@
# Execute some simple basic tests on the MyISAM library to check if things
# work at all.
+# If you want to run this in Valgrind, you should use --trace-children=yes,
+# so that it detects problems in ma_test* and not in the shell script
valgrind="valgrind --alignment=8 --leak-check=yes"
silent="-s"
suffix=""
#set -x -v -e
+if [ -z "$maria_path" ]
+then
+ maria_path="."
+fi
run_tests()
{
@@ -14,139 +20,139 @@ run_tests()
#
# First some simple tests
#
- ./ma_test1$suffix $silent $row_type
- ./maria_chk$suffix -se test1
- ./ma_test1$suffix $silent -N $row_type
- ./maria_chk$suffix -se test1
- ./ma_test1$suffix $silent -P --checksum $row_type
- ./maria_chk$suffix -se test1
- ./ma_test1$suffix $silent -P -N $row_type
- ./maria_chk$suffix -se test1
- ./ma_test1$suffix $silent -B -N -R2 $row_type
- ./maria_chk$suffix -sm test1
- ./ma_test1$suffix $silent -a -k 480 --unique $row_type
- ./maria_chk$suffix -sm test1
- ./ma_test1$suffix $silent -a -N -R1 $row_type
- ./maria_chk$suffix -sm test1
- ./ma_test1$suffix $silent -p $row_type
- ./maria_chk$suffix -sm test1
- ./ma_test1$suffix $silent -p -N --unique $row_type
- ./maria_chk$suffix -sm test1
- ./ma_test1$suffix $silent -p -N --key_length=127 --checksum $row_type
- ./maria_chk$suffix -sm test1
- ./ma_test1$suffix $silent -p -N --key_length=128 $row_type
- ./maria_chk$suffix -sm test1
- ./ma_test1$suffix $silent -p --key_length=480 $row_type
- ./maria_chk$suffix -sm test1
- ./ma_test1$suffix $silent -a -B $row_type
- ./maria_chk$suffix -sm test1
- ./ma_test1$suffix $silent -a -B --key_length=64 --unique $row_type
- ./maria_chk$suffix -sm test1
- ./ma_test1$suffix $silent -a -B -k 480 --checksum $row_type
- ./maria_chk$suffix -sm test1
- ./ma_test1$suffix $silent -a -B -k 480 -N --unique --checksum $row_type
- ./maria_chk$suffix -sm test1
- ./ma_test1$suffix $silent -a -m $row_type
- ./maria_chk$suffix -sm test1
- ./ma_test1$suffix $silent -a -m -P --unique --checksum $row_type
- ./maria_chk$suffix -sm test1
- ./ma_test1$suffix $silent -a -m -P --key_length=480 --key_cache $row_type
- ./maria_chk$suffix -sm test1
- ./ma_test1$suffix $silent -m -p $row_type
- ./maria_chk$suffix -sm test1
- ./ma_test1$suffix $silent -w --unique $row_type
- ./maria_chk$suffix -sm test1
- ./ma_test1$suffix $silent -a -w --key_length=64 --checksum $row_type
- ./maria_chk$suffix -sm test1
- ./ma_test1$suffix $silent -a -w -N --key_length=480 $row_type
- ./maria_chk$suffix -sm test1
- ./ma_test1$suffix $silent -a -w --key_length=480 --checksum $row_type
- ./maria_chk$suffix -sm test1
- ./ma_test1$suffix $silent -a -b -N $row_type
- ./maria_chk$suffix -sm test1
- ./ma_test1$suffix $silent -a -b --key_length=480 $row_type
- ./maria_chk$suffix -sm test1
- ./ma_test1$suffix $silent -p -B --key_length=480 $row_type
- ./maria_chk$suffix -sm test1
- ./ma_test1$suffix $silent --checksum --unique $row_type
- ./maria_chk$suffix -se test1
- ./ma_test1$suffix $silent --unique $row_type
- ./maria_chk$suffix -se test1
+ $maria_path/ma_test1$suffix $silent $row_type
+ $maria_path/maria_chk$suffix -se test1
+ $maria_path/ma_test1$suffix $silent -N $row_type
+ $maria_path/maria_chk$suffix -se test1
+ $maria_path/ma_test1$suffix $silent -P --checksum $row_type
+ $maria_path/maria_chk$suffix -se test1
+ $maria_path/ma_test1$suffix $silent -P -N $row_type
+ $maria_path/maria_chk$suffix -se test1
+ $maria_path/ma_test1$suffix $silent -B -N -R2 $row_type
+ $maria_path/maria_chk$suffix -sm test1
+ $maria_path/ma_test1$suffix $silent -a -k 480 --unique $row_type
+ $maria_path/maria_chk$suffix -sm test1
+ $maria_path/ma_test1$suffix $silent -a -N -R1 $row_type
+ $maria_path/maria_chk$suffix -sm test1
+ $maria_path/ma_test1$suffix $silent -p $row_type
+ $maria_path/maria_chk$suffix -sm test1
+ $maria_path/ma_test1$suffix $silent -p -N --unique $row_type
+ $maria_path/maria_chk$suffix -sm test1
+ $maria_path/ma_test1$suffix $silent -p -N --key_length=127 --checksum $row_type
+ $maria_path/maria_chk$suffix -sm test1
+ $maria_path/ma_test1$suffix $silent -p -N --key_length=128 $row_type
+ $maria_path/maria_chk$suffix -sm test1
+ $maria_path/ma_test1$suffix $silent -p --key_length=480 $row_type
+ $maria_path/maria_chk$suffix -sm test1
+ $maria_path/ma_test1$suffix $silent -a -B $row_type
+ $maria_path/maria_chk$suffix -sm test1
+ $maria_path/ma_test1$suffix $silent -a -B --key_length=64 --unique $row_type
+ $maria_path/maria_chk$suffix -sm test1
+ $maria_path/ma_test1$suffix $silent -a -B -k 480 --checksum $row_type
+ $maria_path/maria_chk$suffix -sm test1
+ $maria_path/ma_test1$suffix $silent -a -B -k 480 -N --unique --checksum $row_type
+ $maria_path/maria_chk$suffix -sm test1
+ $maria_path/ma_test1$suffix $silent -a -m $row_type
+ $maria_path/maria_chk$suffix -sm test1
+ $maria_path/ma_test1$suffix $silent -a -m -P --unique --checksum $row_type
+ $maria_path/maria_chk$suffix -sm test1
+ $maria_path/ma_test1$suffix $silent -a -m -P --key_length=480 --key_cache $row_type
+ $maria_path/maria_chk$suffix -sm test1
+ $maria_path/ma_test1$suffix $silent -m -p $row_type
+ $maria_path/maria_chk$suffix -sm test1
+ $maria_path/ma_test1$suffix $silent -w --unique $row_type
+ $maria_path/maria_chk$suffix -sm test1
+ $maria_path/ma_test1$suffix $silent -a -w --key_length=64 --checksum $row_type
+ $maria_path/maria_chk$suffix -sm test1
+ $maria_path/ma_test1$suffix $silent -a -w -N --key_length=480 $row_type
+ $maria_path/maria_chk$suffix -sm test1
+ $maria_path/ma_test1$suffix $silent -a -w --key_length=480 --checksum $row_type
+ $maria_path/maria_chk$suffix -sm test1
+ $maria_path/ma_test1$suffix $silent -a -b -N $row_type
+ $maria_path/maria_chk$suffix -sm test1
+ $maria_path/ma_test1$suffix $silent -a -b --key_length=480 $row_type
+ $maria_path/maria_chk$suffix -sm test1
+ $maria_path/ma_test1$suffix $silent -p -B --key_length=480 $row_type
+ $maria_path/maria_chk$suffix -sm test1
+ $maria_path/ma_test1$suffix $silent --checksum --unique $row_type
+ $maria_path/maria_chk$suffix -se test1
+ $maria_path/ma_test1$suffix $silent --unique $row_type
+ $maria_path/maria_chk$suffix -se test1
- ./ma_test1$suffix $silent --key_multiple -N -S $row_type
- ./maria_chk$suffix -sm test1
- ./ma_test1$suffix $silent --key_multiple -a -p --key_length=480 $row_type
- ./maria_chk$suffix -sm test1
- ./ma_test1$suffix $silent --key_multiple -a -B --key_length=480 $row_type
- ./maria_chk$suffix -sm test1
- ./ma_test1$suffix $silent --key_multiple -P -S $row_type
- ./maria_chk$suffix -sm test1
+ $maria_path/ma_test1$suffix $silent --key_multiple -N -S $row_type
+ $maria_path/maria_chk$suffix -sm test1
+ $maria_path/ma_test1$suffix $silent --key_multiple -a -p --key_length=480 $row_type
+ $maria_path/maria_chk$suffix -sm test1
+ $maria_path/ma_test1$suffix $silent --key_multiple -a -B --key_length=480 $row_type
+ $maria_path/maria_chk$suffix -sm test1
+ $maria_path/ma_test1$suffix $silent --key_multiple -P -S $row_type
+ $maria_path/maria_chk$suffix -sm test1
- ./maria_pack$suffix --force -s test1
- ./maria_chk$suffix -ess test1
+ $maria_path/maria_pack$suffix --force -s test1
+ $maria_path/maria_chk$suffix -ess test1
- ./ma_test2$suffix $silent -L -K -W -P $row_type
- ./maria_chk$suffix -sm test2
- ./ma_test2$suffix $silent -L -K -W -P -A $row_type
- ./maria_chk$suffix -sm test2
- ./ma_test2$suffix $silent -L -K -P -R3 -m50 -b1000000 $row_type
- ./maria_chk$suffix -sm test2
- ./ma_test2$suffix $silent -L -B $row_type
- ./maria_chk$suffix -sm test2
- ./ma_test2$suffix $silent -D -B -c $row_type
- ./maria_chk$suffix -sm test2
- ./ma_test2$suffix $silent -m10000 -e4096 -K $row_type
- ./maria_chk$suffix -sm test2
- ./ma_test2$suffix $silent -m10000 -e8192 -K $row_type
- ./maria_chk$suffix -sm test2
- ./ma_test2$suffix $silent -m10000 -e16384 -E16384 -K -L $row_type
- ./maria_chk$suffix -sm test2
+ $maria_path/ma_test2$suffix $silent -L -K -W -P $row_type
+ $maria_path/maria_chk$suffix -sm test2
+ $maria_path/ma_test2$suffix $silent -L -K -W -P -A $row_type
+ $maria_path/maria_chk$suffix -sm test2
+ $maria_path/ma_test2$suffix $silent -L -K -P -R3 -m50 -b1000000 $row_type
+ $maria_path/maria_chk$suffix -sm test2
+ $maria_path/ma_test2$suffix $silent -L -B $row_type
+ $maria_path/maria_chk$suffix -sm test2
+ $maria_path/ma_test2$suffix $silent -D -B -c $row_type
+ $maria_path/maria_chk$suffix -sm test2
+ $maria_path/ma_test2$suffix $silent -m10000 -e4096 -K $row_type
+ $maria_path/maria_chk$suffix -sm test2
+ $maria_path/ma_test2$suffix $silent -m10000 -e8192 -K $row_type
+ $maria_path/maria_chk$suffix -sm test2
+ $maria_path/ma_test2$suffix $silent -m10000 -e16384 -E16384 -K -L $row_type
+ $maria_path/maria_chk$suffix -sm test2
}
run_repair_tests()
{
row_type=$1
- ./ma_test1$suffix $silent --checksum $row_type
- ./maria_chk$suffix -se test1
- ./maria_chk$suffix -rs test1
- ./maria_chk$suffix -se test1
- ./maria_chk$suffix -rqs test1
- ./maria_chk$suffix -se test1
- ./maria_chk$suffix -rs --correct-checksum test1
- ./maria_chk$suffix -se test1
- ./maria_chk$suffix -rqs --correct-checksum test1
- ./maria_chk$suffix -se test1
- ./maria_chk$suffix -ros --correct-checksum test1
- ./maria_chk$suffix -se test1
- ./maria_chk$suffix -rqos --correct-checksum test1
- ./maria_chk$suffix -se test1
+ $maria_path/ma_test1$suffix $silent --checksum $row_type
+ $maria_path/maria_chk$suffix -se test1
+ $maria_path/maria_chk$suffix -rs test1
+ $maria_path/maria_chk$suffix -se test1
+ $maria_path/maria_chk$suffix -rqs test1
+ $maria_path/maria_chk$suffix -se test1
+ $maria_path/maria_chk$suffix -rs --correct-checksum test1
+ $maria_path/maria_chk$suffix -se test1
+ $maria_path/maria_chk$suffix -rqs --correct-checksum test1
+ $maria_path/maria_chk$suffix -se test1
+ $maria_path/maria_chk$suffix -ros --correct-checksum test1
+ $maria_path/maria_chk$suffix -se test1
+ $maria_path/maria_chk$suffix -rqos --correct-checksum test1
+ $maria_path/maria_chk$suffix -se test1
}
run_pack_tests()
{
row_type=$1
# check of maria_pack / maria_chk
- ./ma_test1$suffix $silent --checksum $row_type
- ./maria_pack$suffix --force -s test1
- ./maria_chk$suffix -ess test1
- ./maria_chk$suffix -rqs test1
- ./maria_chk$suffix -es test1
- ./maria_chk$suffix -rs test1
- ./maria_chk$suffix -es test1
- ./maria_chk$suffix -rus test1
- ./maria_chk$suffix -es test1
+ $maria_path/ma_test1$suffix $silent --checksum $row_type
+ $maria_path/maria_pack$suffix --force -s test1
+ $maria_path/maria_chk$suffix -ess test1
+ $maria_path/maria_chk$suffix -rqs test1
+ $maria_path/maria_chk$suffix -es test1
+ $maria_path/maria_chk$suffix -rs test1
+ $maria_path/maria_chk$suffix -es test1
+ $maria_path/maria_chk$suffix -rus test1
+ $maria_path/maria_chk$suffix -es test1
- ./ma_test1$suffix $silent --checksum -S $row_type
- ./maria_chk$suffix -se test1
- ./maria_chk$suffix -ros test1
- ./maria_chk$suffix -rqs test1
- ./maria_chk$suffix -se test1
+ $maria_path/ma_test1$suffix $silent --checksum -S $row_type
+ $maria_path/maria_chk$suffix -se test1
+ $maria_path/maria_chk$suffix -ros test1
+ $maria_path/maria_chk$suffix -rqs test1
+ $maria_path/maria_chk$suffix -se test1
- ./maria_pack$suffix --force -s test1
- ./maria_chk$suffix -rqs test1
- ./maria_chk$suffix -es test1
- ./maria_chk$suffix -rus test1
- ./maria_chk$suffix -es test1
+ $maria_path/maria_pack$suffix --force -s test1
+ $maria_path/maria_chk$suffix -rqs test1
+ $maria_path/maria_chk$suffix -es test1
+ $maria_path/maria_chk$suffix -rus test1
+ $maria_path/maria_chk$suffix -es test1
}
echo "Running tests with dynamic row format"
@@ -169,27 +175,27 @@ run_tests "-M -T"
# Tests that gives warnings
#
-./ma_test2$suffix $silent -L -K -W -P -S -R1 -m500
-./maria_chk$suffix -sm test2
+$maria_path/ma_test2$suffix $silent -L -K -W -P -S -R1 -m500
+$maria_path/maria_chk$suffix -sm test2
echo "ma_test2$suffix $silent -L -K -R1 -m2000 ; Should give error 135"
-./ma_test2$suffix $silent -L -K -R1 -m2000
-echo "./maria_chk$suffix -sm test2 will warn that 'Datafile is almost full'"
-./maria_chk$suffix -sm test2
-./maria_chk$suffix -ssm test2
+$maria_path/ma_test2$suffix $silent -L -K -R1 -m2000
+echo "$maria_path/maria_chk$suffix -sm test2 will warn that 'Datafile is almost full'"
+$maria_path/maria_chk$suffix -sm test2
+$maria_path/maria_chk$suffix -ssm test2
#
# Some timing tests
#
-time ./ma_test2$suffix $silent
-time ./ma_test2$suffix $silent -S
-time ./ma_test2$suffix $silent -M
-time ./ma_test2$suffix $silent -B
-time ./ma_test2$suffix $silent -L
-time ./ma_test2$suffix $silent -K
-time ./ma_test2$suffix $silent -K -B
-time ./ma_test2$suffix $silent -L -B
-time ./ma_test2$suffix $silent -L -K -B
-time ./ma_test2$suffix $silent -L -K -W -B
-time ./ma_test2$suffix $silent -L -K -W -B -S
-time ./ma_test2$suffix $silent -L -K -W -B -M
-time ./ma_test2$suffix $silent -D -K -W -B -S
+time $maria_path/ma_test2$suffix $silent
+time $maria_path/ma_test2$suffix $silent -S
+time $maria_path/ma_test2$suffix $silent -M
+time $maria_path/ma_test2$suffix $silent -B
+time $maria_path/ma_test2$suffix $silent -L
+time $maria_path/ma_test2$suffix $silent -K
+time $maria_path/ma_test2$suffix $silent -K -B
+time $maria_path/ma_test2$suffix $silent -L -B
+time $maria_path/ma_test2$suffix $silent -L -K -B
+time $maria_path/ma_test2$suffix $silent -L -K -W -B
+time $maria_path/ma_test2$suffix $silent -L -K -W -B -S
+time $maria_path/ma_test2$suffix $silent -L -K -W -B -M
+time $maria_path/ma_test2$suffix $silent -D -K -W -B -S
diff --git a/storage/maria/maria_def.h b/storage/maria/maria_def.h
index d9e31e800c4..740808c7bbe 100644
--- a/storage/maria/maria_def.h
+++ b/storage/maria/maria_def.h
@@ -93,6 +93,7 @@ typedef struct st_maria_state_info
uint sortkey; /* sorted by this key (not used) */
uint open_count;
uint8 changed; /* Changed since mariachk */
+ LSN create_rename_lsn; /**< LSN when table was last created/renamed */
/* the following isn't saved on disk */
uint state_diff_length; /* Should be 0 */
@@ -101,7 +102,8 @@ typedef struct st_maria_state_info
} MARIA_STATE_INFO;
-#define MARIA_STATE_INFO_SIZE (24 + 4 + 11*8 + 4*4 + 8 + 3*4 + 5*8)
+#define MARIA_STATE_INFO_SIZE \
+ (24 + LSN_STORE_SIZE + 4 + 11*8 + 4*4 + 8 + 3*4 + 5*8)
#define MARIA_STATE_KEY_SIZE 8
#define MARIA_STATE_KEYBLOCK_SIZE 8
#define MARIA_STATE_KEYSEG_SIZE 4
@@ -229,6 +231,7 @@ typedef struct st_maria_share
PAGECACHE *pagecache; /* ref to the current key cache */
MARIA_DECODE_TREE *decode_trees;
uint16 *decode_tables;
+ uint16 id; /**< 2-byte id by which log records refer to the table */
/* Called the first time the table instance is opened */
my_bool (*once_init)(struct st_maria_share *, File);
/* Called when the last instance of the table is closed */
@@ -889,6 +892,7 @@ volatile int *_ma_killed_ptr(HA_CHECK *param);
void _ma_check_print_error _VARARGS((HA_CHECK *param, const char *fmt, ...));
void _ma_check_print_warning _VARARGS((HA_CHECK *param, const char *fmt, ...));
void _ma_check_print_info _VARARGS((HA_CHECK *param, const char *fmt, ...));
+int _ma_repair_write_log_record(const HA_CHECK *param, MARIA_HA *info);
C_MODE_END
int _ma_flush_pending_blocks(MARIA_SORT_PARAM *param);
diff --git a/storage/maria/trnman.c b/storage/maria/trnman.c
index d6b35f071ea..83249ab328f 100644
--- a/storage/maria/trnman.c
+++ b/storage/maria/trnman.c
@@ -52,6 +52,7 @@ static my_atomic_rwlock_t LOCK_short_trid_to_trn, LOCK_pool;
/*
Simple interface functions
+ QQ: if they stay so simple, should we make them inline?
*/
uint trnman_increment_locked_tables(TRN *trn)
@@ -343,6 +344,9 @@ int trnman_end_trn(TRN *trn, my_bool commit)
LF_PINS *pins= trn->pins;
DBUG_ENTER("trnman_end_trn");
+ DBUG_ASSERT(trn->rec_lsn == 0);
+ /* if a rollback, all UNDO records should have been executed */
+ DBUG_ASSERT(commit || trn->undo_lsn == 0);
DBUG_PRINT("info", ("pthread_mutex_lock LOCK_trn_list"));
pthread_mutex_lock(&LOCK_trn_list);
@@ -379,8 +383,6 @@ int trnman_end_trn(TRN *trn, my_bool commit)
/*
if transaction is committed and it was not the only active transaction -
add it to the committed list (which is used for read-from relation)
- TODO check in the condition below that a transaction have made some
- changes, was not read-only. Something like '&& UndoLSN != 0'
*/
if (commit && active_list_min.next != &active_list_max)
{
@@ -390,6 +392,19 @@ int trnman_end_trn(TRN *trn, my_bool commit)
trnman_committed_transactions++;
res= lf_hash_insert(&trid_to_committed_trn, pins, &trn);
+ /*
+ By going on with life if res<0, we let other threads block on
+ our rows (because they will never see us committed in
+ trid_to_committed_trn) until they timeout. Though correct, this is not a
+ good situation:
+ - if connection reconnects and wants to check if its rows have been
+ committed, it will not be able to do that (it will just lock on them) so
+ connection stays permanently in doubt
+ - internal structures trid_to_committed_trn and committed_list are
+ desynchronized.
+ So we should take Maria down immediately, the two problems being
+ automatically solved at restart.
+ */
DBUG_ASSERT(res <= 0);
}
if (res)
@@ -526,71 +541,133 @@ void trnman_rollback_statement(TRN *trn __attribute__ ((unused)))
}
-/*
- Allocates two buffers and stores in them some information about transactions
- of the active list (into the first buffer) and of the committed list (into
- the second buffer).
-
- SYNOPSIS
- trnman_collect_transactions()
- str_act (OUT) pointer to a LEX_STRING where the allocated buffer, and
- its size, will be put
- str_com (OUT) pointer to a LEX_STRING where the allocated buffer, and
- its size, will be put
+/**
+ @brief Allocates buffers and stores in them some info about transactions
+ Does the allocation because the caller cannot know the size itself.
+ Memory freeing is to be done by the caller (if the "str" member of the
+ LEX_STRING is not NULL).
+ The caller has the intention of doing checkpoints.
- DESCRIPTION
- Does the allocation because the caller cannot know the size itself.
- Memory freeing is to be done by the caller (if the "str" member of the
- LEX_STRING is not NULL).
- The caller has the intention of doing checkpoints.
+ @param[out] str_act pointer to where the allocated buffer,
+ and its size, will be put; buffer will be filled
+ with info about active transactions
+ @param[out] str_com pointer to where the allocated buffer,
+ and its size, will be put; buffer will be filled
+ with info about committed transactions
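+ @param[out] min_rec_lsn pointer to where the minimum rec_lsn of all
+ active transactions will be put
+ @param[out] min_first_undo_lsn pointer to where the minimum
+ first_undo_lsn of all transactions will be put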
- RETURN
- 0 on success
- 1 on error
+ @return Operation status
+ @retval 0 OK
+ @retval 1 Error
*/
-my_bool trnman_collect_transactions(LEX_STRING *str_act, LEX_STRING *str_com)
+
+my_bool trnman_collect_transactions(LEX_STRING *str_act, LEX_STRING *str_com,
+ LSN *min_rec_lsn, LSN *min_first_undo_lsn)
{
my_bool error;
TRN *trn;
char *ptr;
+ uint stored_transactions= 0;
+ LSN minimum_rec_lsn= ULONGLONG_MAX, minimum_first_undo_lsn= ULONGLONG_MAX;
DBUG_ENTER("trnman_collect_transactions");
DBUG_ASSERT((NULL == str_act->str) && (NULL == str_com->str));
+ /* validate the use of read_non_atomic() in general: */
+ compile_time_assert((sizeof(LSN) == 8) && (sizeof(LSN_WITH_FLAGS) == 8));
+
DBUG_PRINT("info", ("pthread_mutex_lock LOCK_trn_list"));
pthread_mutex_lock(&LOCK_trn_list);
- str_act->length= 8+(6+2+7+7+7)*trnman_active_transactions;
- str_com->length= 8+(6+7+7)*trnman_committed_transactions;
+ str_act->length= 2 + /* number of active transactions */
+ LSN_STORE_SIZE + /* minimum of their rec_lsn */
+ (6 + /* long id */
+ 2 + /* short id */
+ LSN_STORE_SIZE + /* undo_lsn */
+#ifdef MARIA_VERSIONING /* not enabled yet */
+ LSN_STORE_SIZE + /* undo_purge_lsn */
+#endif
+ LSN_STORE_SIZE /* first_undo_lsn */
+ ) * trnman_active_transactions;
+ str_com->length= 8 + /* number of committed transactions */
+ (6 + /* long id */
+#ifdef MARIA_VERSIONING /* not enabled yet */
+ LSN_STORE_SIZE + /* undo_purge_lsn */
+#endif
+ LSN_STORE_SIZE /* first_undo_lsn */
+ ) * trnman_committed_transactions;
if ((NULL == (str_act->str= my_malloc(str_act->length, MYF(MY_WME)))) ||
(NULL == (str_com->str= my_malloc(str_com->length, MYF(MY_WME)))))
goto err;
/* First, the active transactions */
- ptr= str_act->str;
- int8store(ptr, (ulonglong)trnman_active_transactions);
- ptr+= 8;
+ ptr= str_act->str + 2 + LSN_STORE_SIZE;
for (trn= active_list_min.next; trn != &active_list_max; trn= trn->next)
{
/*
- trns with a short trid of 0 are not initialized; Recovery will recognize
- this and ignore them.
- State is not needed for now (only when we supported prepared trns).
- For LSNs, Sanja will soon push lsn7store.
+ trns with a short trid of 0 are not even initialized, we can ignore
+ them. trns with undo_lsn==0 have done no writes, we can ignore them
+ too. XID not needed now.
*/
+ uint sid;
+ LSN rec_lsn, undo_lsn, first_undo_lsn;
+ if ((sid= trn->short_id) == 0)
+ {
+ /*
+ Not even inited, has done nothing. Or it is the
+ dummy_transaction_object, which does only non-transactional
+ immediate-sync operations (CREATE/DROP/RENAME/REPAIR TABLE), and so
+ can be forgotten for Checkpoint.
+ */
+ continue;
+ }
+#ifndef MARIA_CHECKPOINT
+/*
+ in the checkpoint patch (not yet ready) we will have a real implementation
+ of lsn_read_non_atomic(); for now it's not needed
+*/
+#define lsn_read_non_atomic(A) (A)
+#endif
+ /* needed for low-water mark calculation */
+ if (((rec_lsn= lsn_read_non_atomic(trn->rec_lsn)) > 0) &&
+ (cmp_translog_addr(rec_lsn, minimum_rec_lsn) < 0))
+ minimum_rec_lsn= rec_lsn;
+ /*
+ trn may have logged REDOs but not yet UNDO, that's why we read rec_lsn
+ before deciding to ignore if undo_lsn==0.
+ */
+ if ((undo_lsn= trn->undo_lsn) == 0) /* trn can be forgotten */
+ continue;
+ stored_transactions++;
int6store(ptr, trn->trid);
ptr+= 6;
- int2store(ptr, trn->short_id);
+ int2store(ptr, sid);
ptr+= 2;
- /* needed for rollback */
- /* lsn7store(ptr, trn->undo_lsn); */
- ptr+= 7;
- /* needed for purge */
- /* lsn7store(ptr, trn->undo_purge_lsn); */
- ptr+= 7;
+ lsn_store(ptr, undo_lsn); /* needed for rollback */
+ ptr+= LSN_STORE_SIZE;
+#ifdef MARIA_VERSIONING /* not enabled yet */
+ /* to know where purging should start (last delete of this trn) */
+ lsn_store(ptr, trn->undo_purge_lsn);
+ ptr+= LSN_STORE_SIZE;
+#endif
/* needed for low-water mark calculation */
- /* lsn7store(ptr, read_non_atomic(&trn->first_undo_lsn)); */
- ptr+= 7;
+ if (((first_undo_lsn= lsn_read_non_atomic(trn->first_undo_lsn)) > 0) &&
+ (cmp_translog_addr(first_undo_lsn, minimum_first_undo_lsn) < 0))
+ minimum_first_undo_lsn= first_undo_lsn;
+ lsn_store(ptr, first_undo_lsn);
+ ptr+= LSN_STORE_SIZE;
+ /**
+ @todo RECOVERY: add a comment explaining why we can dirtily read some
+ vars, inspired by the text of "assumption 8" in WL#3072
+ */
}
+ str_act->length= ptr - str_act->str; /* as we maybe over-estimated */
+ ptr= str_act->str;
+ int2store(ptr, stored_transactions);
+ ptr+= 2;
+ /* this LSN influences how REDOs for any page can be ignored by Recovery */
+ lsn_store(ptr, minimum_rec_lsn);
+ /* one day there will also be a list of prepared transactions */
/* do the same for committed ones */
ptr= str_com->str;
int8store(ptr, (ulonglong)trnman_committed_transactions);
@@ -598,18 +675,26 @@ my_bool trnman_collect_transactions(LEX_STRING *str_act, LEX_STRING *str_com)
for (trn= committed_list_min.next; trn != &committed_list_max;
trn= trn->next)
{
+ LSN first_undo_lsn;
int6store(ptr, trn->trid);
ptr+= 6;
- /* mi_int7store(ptr, trn->undo_purge_lsn); */
- ptr+= 7;
- /* mi_int7store(ptr, read_non_atomic(&trn->first_undo_lsn)); */
- ptr+= 7;
+#ifdef MARIA_VERSIONING /* not enabled yet */
+ lsn_store(ptr, trn->undo_purge_lsn);
+ ptr+= LSN_STORE_SIZE;
+#endif
+ first_undo_lsn= LSN_WITH_FLAGS_TO_LSN(trn->first_undo_lsn);
+ if (cmp_translog_addr(first_undo_lsn, minimum_first_undo_lsn) < 0)
+ minimum_first_undo_lsn= first_undo_lsn;
+ lsn_store(ptr, first_undo_lsn);
+ ptr+= LSN_STORE_SIZE;
}
/*
TODO: if we see there exists no transaction (active and committed) we can
tell the lock-free structures to do some freeing (my_free()).
*/
error= 0;
+ *min_rec_lsn= minimum_rec_lsn;
+ *min_first_undo_lsn= minimum_first_undo_lsn;
goto end;
err:
error= 1;
diff --git a/storage/maria/trnman.h b/storage/maria/trnman.h
index 1e1550efb46..1a4423f2a11 100644
--- a/storage/maria/trnman.h
+++ b/storage/maria/trnman.h
@@ -45,12 +45,13 @@ struct st_transaction
LF_PINS *pins;
TrID trid, min_read_from, commit_trid;
TRN *next, *prev;
- LSN rec_lsn, undo_lsn, first_undo_lsn;
+ LSN rec_lsn, undo_lsn;
+ LSN_WITH_FLAGS first_undo_lsn;
uint locked_tables;
/* Note! if locks.loid is 0, trn is NOT initialized */
};
-TRN dummy_transaction_object;
+#define TRANSACTION_LOGGED_LONG_ID ULL(0x8000000000000000)
C_MODE_END
diff --git a/storage/maria/trnman_public.h b/storage/maria/trnman_public.h
index 4b3f8acb4b3..3e0a21c26a6 100644
--- a/storage/maria/trnman_public.h
+++ b/storage/maria/trnman_public.h
@@ -20,6 +20,8 @@
to include my_atomic.h in C++ code.
*/
+#include "ma_loghandler_lsn.h"
+
C_MODE_START
typedef uint64 TrID; /* our TrID is 6 bytes */
typedef struct st_transaction TRN;
@@ -27,6 +29,7 @@ typedef struct st_transaction TRN;
#define SHORT_TRID_MAX 65535
extern uint trnman_active_transactions, trnman_allocated_transactions;
+extern TRN dummy_transaction_object;
int trnman_init(void);
void trnman_destroy(void);
@@ -39,7 +42,9 @@ void trnman_free_trn(TRN *trn);
int trnman_can_read_from(TRN *trn, TrID trid);
void trnman_new_statement(TRN *trn);
void trnman_rollback_statement(TRN *trn);
-my_bool trnman_collect_transactions(LEX_STRING *str_act, LEX_STRING *str_com);
+my_bool trnman_collect_transactions(LEX_STRING *str_act, LEX_STRING *str_com,
+ LSN *min_rec_lsn,
+ LSN *min_first_undo_lsn);
uint trnman_increment_locked_tables(TRN *trn);
uint trnman_decrement_locked_tables(TRN *trn);
diff --git a/storage/maria/unittest/ma_test_loghandler-t.c b/storage/maria/unittest/ma_test_loghandler-t.c
index f05d58a784f..e31136d52ec 100644
--- a/storage/maria/unittest/ma_test_loghandler-t.c
+++ b/storage/maria/unittest/ma_test_loghandler-t.c
@@ -196,7 +196,7 @@ int main(int argc __attribute__((unused)), char *argv[])
if (translog_write_record(&lsn,
LOGREC_FIXED_RECORD_0LSN_EXAMPLE,
trn, NULL,
- 6, TRANSLOG_INTERNAL_PARTS + 1, parts))
+ 6, TRANSLOG_INTERNAL_PARTS + 1, parts, NULL))
{
fprintf(stderr, "Can't write record #%lu\n", (ulong) 0);
translog_destroy();
@@ -218,7 +218,7 @@ int main(int argc __attribute__((unused)), char *argv[])
parts[TRANSLOG_INTERNAL_PARTS + 1].str= NULL;
parts[TRANSLOG_INTERNAL_PARTS + 1].length= 0;
if (translog_write_record(&lsn, LOGREC_FIXED_RECORD_1LSN_EXAMPLE,
- trn, NULL, LSN_STORE_SIZE, 0, parts))
+ trn, NULL, LSN_STORE_SIZE, 0, parts, NULL))
{
fprintf(stderr, "1 Can't write reference defore record #%lu\n",
(ulong) i);
@@ -238,7 +238,7 @@ int main(int argc __attribute__((unused)), char *argv[])
if (translog_write_record(&lsn,
LOGREC_VARIABLE_RECORD_1LSN_EXAMPLE,
trn, NULL, 0, TRANSLOG_INTERNAL_PARTS + 2,
- parts))
+ parts, NULL))
{
fprintf(stderr, "1 Can't write var reference defore record #%lu\n",
(ulong) i);
@@ -257,7 +257,7 @@ int main(int argc __attribute__((unused)), char *argv[])
if (translog_write_record(&lsn,
LOGREC_FIXED_RECORD_2LSN_EXAMPLE,
trn, NULL,
- 23, TRANSLOG_INTERNAL_PARTS + 1, parts))
+ 23, TRANSLOG_INTERNAL_PARTS + 1, parts, NULL))
{
fprintf(stderr, "0 Can't write reference defore record #%lu\n",
(ulong) i);
@@ -277,7 +277,7 @@ int main(int argc __attribute__((unused)), char *argv[])
if (translog_write_record(&lsn,
LOGREC_VARIABLE_RECORD_2LSN_EXAMPLE,
trn, NULL, 14 + rec_len,
- TRANSLOG_INTERNAL_PARTS + 2, parts))
+ TRANSLOG_INTERNAL_PARTS + 2, parts, NULL))
{
fprintf(stderr, "0 Can't write var reference defore record #%lu\n",
(ulong) i);
@@ -294,7 +294,7 @@ int main(int argc __attribute__((unused)), char *argv[])
LOGREC_FIXED_RECORD_0LSN_EXAMPLE,
trn, NULL, 6,
TRANSLOG_INTERNAL_PARTS + 1,
- parts))
+ parts, NULL))
{
fprintf(stderr, "Can't write record #%lu\n", (ulong) i);
translog_destroy();
@@ -313,7 +313,7 @@ int main(int argc __attribute__((unused)), char *argv[])
LOGREC_VARIABLE_RECORD_0LSN_EXAMPLE,
trn, NULL, rec_len,
TRANSLOG_INTERNAL_PARTS + 1,
- parts))
+ parts, NULL))
{
fprintf(stderr, "Can't write variable record #%lu\n", (ulong) i);
translog_destroy();
diff --git a/storage/maria/unittest/ma_test_loghandler_multigroup-t.c b/storage/maria/unittest/ma_test_loghandler_multigroup-t.c
index 9ed57da8fec..1281ee425d8 100644
--- a/storage/maria/unittest/ma_test_loghandler_multigroup-t.c
+++ b/storage/maria/unittest/ma_test_loghandler_multigroup-t.c
@@ -192,7 +192,7 @@ int main(int argc __attribute__((unused)), char *argv[])
trn->short_id= 0;
if (translog_write_record(&lsn, LOGREC_FIXED_RECORD_0LSN_EXAMPLE,
trn, NULL,
- 6, TRANSLOG_INTERNAL_PARTS + 1, parts))
+ 6, TRANSLOG_INTERNAL_PARTS + 1, parts, NULL))
{
fprintf(stderr, "Can't write record #%lu\n", (ulong) 0);
translog_destroy();
@@ -214,7 +214,7 @@ int main(int argc __attribute__((unused)), char *argv[])
LOGREC_FIXED_RECORD_1LSN_EXAMPLE,
trn, NULL,
LSN_STORE_SIZE,
- TRANSLOG_INTERNAL_PARTS + 1, parts))
+ TRANSLOG_INTERNAL_PARTS + 1, parts, NULL))
{
fprintf(stderr, "1 Can't write reference before record #%lu\n",
(ulong) i);
@@ -234,7 +234,7 @@ int main(int argc __attribute__((unused)), char *argv[])
LOGREC_VARIABLE_RECORD_1LSN_EXAMPLE,
trn, NULL, LSN_STORE_SIZE + rec_len,
TRANSLOG_INTERNAL_PARTS + 2,
- parts))
+ parts, NULL))
{
fprintf(stderr, "1 Can't write var reference before record #%lu\n",
(ulong) i);
@@ -255,7 +255,7 @@ int main(int argc __attribute__((unused)), char *argv[])
LOGREC_FIXED_RECORD_2LSN_EXAMPLE,
trn, NULL, 23,
TRANSLOG_INTERNAL_PARTS + 1,
- parts))
+ parts, NULL))
{
fprintf(stderr, "0 Can't write reference before record #%lu\n",
(ulong) i);
@@ -276,7 +276,7 @@ int main(int argc __attribute__((unused)), char *argv[])
LOGREC_VARIABLE_RECORD_2LSN_EXAMPLE,
trn, NULL, LSN_STORE_SIZE * 2 + rec_len,
TRANSLOG_INTERNAL_PARTS + 2,
- parts))
+ parts, NULL))
{
fprintf(stderr, "0 Can't write var reference before record #%lu\n",
(ulong) i);
@@ -293,7 +293,7 @@ int main(int argc __attribute__((unused)), char *argv[])
if (translog_write_record(&lsn,
LOGREC_FIXED_RECORD_0LSN_EXAMPLE,
trn, NULL, 6,
- TRANSLOG_INTERNAL_PARTS + 1, parts))
+ TRANSLOG_INTERNAL_PARTS + 1, parts, NULL))
{
fprintf(stderr, "Can't write record #%lu\n", (ulong) i);
translog_destroy();
@@ -311,7 +311,7 @@ int main(int argc __attribute__((unused)), char *argv[])
if (translog_write_record(&lsn,
LOGREC_VARIABLE_RECORD_0LSN_EXAMPLE,
trn, NULL, rec_len,
- TRANSLOG_INTERNAL_PARTS + 1, parts))
+ TRANSLOG_INTERNAL_PARTS + 1, parts, NULL))
{
fprintf(stderr, "Can't write variable record #%lu\n", (ulong) i);
translog_destroy();
diff --git a/storage/maria/unittest/ma_test_loghandler_multithread-t.c b/storage/maria/unittest/ma_test_loghandler_multithread-t.c
index 688c1ec33be..ff966160acc 100644
--- a/storage/maria/unittest/ma_test_loghandler_multithread-t.c
+++ b/storage/maria/unittest/ma_test_loghandler_multithread-t.c
@@ -137,7 +137,7 @@ void writer(int num)
if (translog_write_record(&lsn,
LOGREC_FIXED_RECORD_0LSN_EXAMPLE,
&trn, NULL, 6, TRANSLOG_INTERNAL_PARTS + 1,
- parts))
+ parts, NULL))
{
fprintf(stderr, "Can't write LOGREC_FIXED_RECORD_0LSN_EXAMPLE record #%lu "
"thread %i\n", (ulong) i, num);
@@ -154,7 +154,7 @@ void writer(int num)
LOGREC_VARIABLE_RECORD_0LSN_EXAMPLE,
&trn, NULL,
len, TRANSLOG_INTERNAL_PARTS + 1,
- parts))
+ parts, NULL))
{
fprintf(stderr, "Can't write variable record #%lu\n", (ulong) i);
translog_destroy();
@@ -303,7 +303,7 @@ int main(int argc __attribute__((unused)),
LOGREC_FIXED_RECORD_0LSN_EXAMPLE,
&dummy_transaction_object, NULL, 6,
TRANSLOG_INTERNAL_PARTS + 1,
- parts))
+ parts, NULL))
{
fprintf(stderr, "Can't write the first record\n");
translog_destroy();
diff --git a/storage/maria/unittest/ma_test_loghandler_pagecache-t.c b/storage/maria/unittest/ma_test_loghandler_pagecache-t.c
index b43f0cfa98c..35e05f9c997 100644
--- a/storage/maria/unittest/ma_test_loghandler_pagecache-t.c
+++ b/storage/maria/unittest/ma_test_loghandler_pagecache-t.c
@@ -94,7 +94,7 @@ int main(int argc __attribute__((unused)), char *argv[])
LOGREC_FIXED_RECORD_0LSN_EXAMPLE,
&dummy_transaction_object, NULL, 6,
TRANSLOG_INTERNAL_PARTS + 1,
- parts))
+ parts, NULL))
{
fprintf(stderr, "Can't write record #%lu\n", (ulong) 0);
translog_destroy();