From 1a96259191b193b353387cbb70d7567009e3b247 Mon Sep 17 00:00:00 2001
From: unknown
Date: Fri, 22 Jun 2007 14:49:37 +0200
Subject: - WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's create_rename_lsn, which is needed for the correctness of Recovery (see function comment of _ma_repair_write_log_record() in ma_check.c)
- write a COMMIT record when transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table like inside ALTER TABLE (I expect this to be a big win). There was already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table and log the "id->name" pair (LOGREC_FILE_ID); log LOGREC_LONG_TRANSACTION_ID; automatically store the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint when some dirty pages are unknown; capturing trn->rec_lsn, trn->first_undo_lsn for Checkpoint and log's low-water-mark computing.
- assertions, comments.

storage/maria/Makefile.am:
  more files to build
storage/maria/ha_maria.cc:
  - logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
  - ha_maria::data_file_type does not have to be set in every info() call, just do it once in open().
  - if caller said that transactionality can be disabled (like if caller is ALTER TABLE) i.e. thd->transaction.on==FALSE, then we temporarily disable transactionality of the table in external_lock(); that will ensure that no REDOs/UNDOs are logged for this possibly massive write operation (they are not needed, as if any write fails, the table will be dropped).
    We re-enable in external_lock(F_UNLCK), which in ALTER TABLE happens before the tmp table replaces the original one (which is good, as thus the final table will have a REDO RENAME and a correct create_rename_lsn).
  - when we commit we also have to write a log record, so trnman_commit_trn() calls become ma_commit() calls
  - at end of engine's initialization, we are potentially entering a multi-threaded dangerous world (clients are going to be accepted) and so some assertions of mutex-owning become enforceable, for that we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
  new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
  - fixing comments according to discussion with Monty
  - if a table is transactional but temporarily non-transactional (like in ALTER TABLE), we need to give a sensible LSN to the pages (and, if we give 0, pagecache asserts).
  - translog_write_record() now takes care of storing the share's 2-byte-id in the log record
storage/maria/ma_blockrec.h:
  fixing comment according to discussion with Monty
storage/maria/ma_check.c:
  When REPAIR/OPTIMIZE modify the data/index file, if this is a transactional table, they must sync it; if they remove files or rename files, they must sync the directory, so that everything is durable.
  This is just applying to REPAIR/OPTIMIZE the logic already implemented in CREATE/DROP/RENAME a few months ago.
  Adding a function to write a LOGREC_REPAIR_TABLE at end of REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk), and to update the table's create_rename_lsn.
storage/maria/ma_close.c:
  fix for a future bug
storage/maria/ma_control_file.c:
  ensuring that if Maria is running in multi-threaded mode, anybody wanting to write to the control file and update last_checkpoint_lsn/last_logno owns the log's lock.
storage/maria/ma_control_file.h:
  see ma_control_file.c
storage/maria/ma_create.c:
  when creating a table:
  - sync it and its directory only if this is a transactional table and there is a log (no point in syncing in maria_chk)
  - decouple the two uses of linkname/linkname_ptr (for index file and for data file) into more variables, as we need to know all links until the moment we write the LOGREC_CREATE_TABLE.
  - set share.data_file_type early so that _ma_initialize_data_file() knows it (Monty's bugfix so that a table always has at least a bitmap page when it is created; so data-file is not 0 bytes anymore).
  - log a LOGREC_CREATE_TABLE; it contains the bytes which we have just written to the index file's header. Update table's create_rename_lsn.
  - syncing of kfile had been bugified in a previous merge, correcting
  - syncing of dfile is now needed as it's not empty anymore
  - in _ma_initialize_data_file(), use share's block_size and not the global one. This is a gratuitous change, both variables are equal, just that I find it more future-proof to use share-bound variable rather than global one.
storage/maria/ma_delete_all.c:
  log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows(); update create_rename_lsn then.
storage/maria/ma_delete_table.c:
  - logging LOGREC_DROP_TABLE; knowing if this is needed, requires knowing if the table is transactional, which requires opening the table.
  - we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
  questions
storage/maria/ma_init.c:
  when maria_end() is called, engine is not multithreaded
storage/maria/ma_loghandler.c:
  - translog_inited has to be visible to ma_create() (see how it is used in ma_create())
  - checkpoint record will be a single record, not three
  - no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will log a REDO_CREATE)
  - adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by truncating the files), REPAIR.
  - MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
  - in translog_write_record(), if MARIA_SHARE does not yet have a 2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically store this short id into log records.
  - in translog_write_record(), if transaction has not logged its long trid, log LOGREC_LONG_TRANSACTION_ID.
  - For Checkpoint, we need to know the current end-of-log: adding translog_get_horizon().
  - For Control File, adding an assertion that the thread owns the log's lock (control file is protected by this lock)
storage/maria/ma_loghandler.h:
  Changes in log records (see ma_loghandler.c). new prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
  adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn, where the most significant byte is used for flags.
storage/maria/ma_open.c:
  storing the create_rename_lsn in the index file's header (in the state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
  - my set_if_bigger was wrong, correcting it
  - if the first_in_switch list is not empty, it means that changed_blocks misses some dirty pages, so Checkpoint cannot run and needs to wait. A variable missing_blocks_in_changed_list is added to tell that (should it be named missing_blocks_in_changed_blocks?)
  - pagecache_collect_changed_blocks_with_lsn() now also tells the minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
  see ma_pagecache.c
storage/maria/ma_panic.c:
  comment
storage/maria/ma_range.c:
  comment
storage/maria/ma_rename.c:
  - logging LOGREC_RENAME_TABLE; knowing if this is needed, requires knowing if the table is transactional, which requires opening the table.
  - update create_rename_lsn
  - we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
  comment
storage/maria/ma_test_all.sh:
  - tip for Valgrind-ing ma_test_all
  - do "export maria_path=somepath" before calling ma_test_all, if you want to run ma_test_all out of storage/maria (useful to have parallel runs, like one normal and one Valgrind, they must not use the same tables so need to run in different directories)
storage/maria/maria_def.h:
  - state now contains, in memory and on disk, the create_rename_lsn
  - share now contains a 2-byte-id
storage/maria/trnman.c:
  preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn; minimum first_undo_lsn needed to know log's low-water-mark
storage/maria/trnman.h:
  using most significant byte of first_undo_lsn to hold miscellaneous flags, for now TRANSACTION_LOGGED_LONG_ID.
  dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
  dummy_transaction_object was declared in all files including trnman_public.h, while in fact it's a single object.
  new prototype
storage/maria/unittest/ma_test_loghandler-t.c:
  update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
  update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
  update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
  update for new prototype
storage/maria/ma_commit.c:
  function which wraps:
  - writing a LOGREC_COMMIT record (==commit on disk)
  - calling trnman_commit_trn() (=commit in memory)
storage/maria/ma_commit.h:
  new header file
.tree-is-private:
  this file is now needed to keep our tree private (don't push it to public trees). When 5.1 is merged into mysql-maria, we can abandon our maria-specific post-commit trigger; .tree_is_private will take care of keeping commit mails private. Don't push this file to public trees.
---
 storage/maria/ma_rename.c | 96 ++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 74 insertions(+), 22 deletions(-)

(limited to 'storage/maria/ma_rename.c')

diff --git a/storage/maria/ma_rename.c b/storage/maria/ma_rename.c
index a80bbcd398f..5224698c614 100644
--- a/storage/maria/ma_rename.c
+++ b/storage/maria/ma_rename.c
@@ -18,6 +18,18 @@
 */
 
 #include "ma_fulltext.h"
+#include "trnman_public.h"
+
+/**
+   @brief renames a table
+
+   @param  old_name        current name of table
+   @param  new_name        table should be renamed to this name
+
+   @return Operation status
+     @retval 0      OK
+     @retval !=0    Error
+*/
 
 int maria_rename(const char *old_name, const char *new_name)
 {
@@ -26,22 +38,73 @@ int maria_rename(const char *old_name, const char *new_name)
 #ifdef USE_RAID
   uint raid_type=0,raid_chunks=0;
 #endif
+  MARIA_HA *info;
+  MARIA_SHARE *share;
+  myf sync_dir;
   DBUG_ENTER("maria_rename");
 
 #ifdef EXTRA_DEBUG
   _ma_check_table_is_closed(old_name,"rename old_table");
   _ma_check_table_is_closed(new_name,"rename new table2");
 #endif
-  /* LOCK TODO take X-lock on table here */
+  /** @todo LOCK take X-lock on table */
+  if (!(info= maria_open(old_name, O_RDWR, HA_OPEN_FOR_REPAIR)))
+    DBUG_RETURN(my_errno);
+  share= info->s;
 #ifdef USE_RAID
+  raid_type = share->base.raid_type;
+  raid_chunks = share->base.raid_chunks;
+#endif
+
+  sync_dir= (share->base.transactional && !share->temporary) ?
+    MY_SYNC_DIR : 0;
+  if (sync_dir)
   {
-    MARIA_HA *info;
-    if (!(info=maria_open(old_name, O_RDONLY, 0)))
-      DBUG_RETURN(my_errno);
-    raid_type = info->s->base.raid_type;
-    raid_chunks = info->s->base.raid_chunks;
-    maria_close(info);
+    uchar log_data[LSN_STORE_SIZE];
+    LEX_STRING log_array[TRANSLOG_INTERNAL_PARTS + 3];
+    uint old_name_len= strlen(old_name), new_name_len= strlen(new_name);
+    int2store(log_data, old_name_len);
+    int2store(log_data + 2, new_name_len);
+    log_array[TRANSLOG_INTERNAL_PARTS + 0].str= log_data;
+    log_array[TRANSLOG_INTERNAL_PARTS + 0].length= 2 + 2;
+    log_array[TRANSLOG_INTERNAL_PARTS + 1].str= (char *)old_name;
+    log_array[TRANSLOG_INTERNAL_PARTS + 1].length= old_name_len;
+    log_array[TRANSLOG_INTERNAL_PARTS + 2].str= (char *)new_name;
+    log_array[TRANSLOG_INTERNAL_PARTS + 2].length= new_name_len;
+    /*
+      For this record to be of any use for Recovery, we need the upper
+      MySQL layer to be crash-safe, which it is not now (that would require
+      work using the ddl_log of sql/sql_table.cc); when it is, we should
+      reconsider the moment of writing this log record (before or after op,
+      under THR_LOCK_maria or not...), how to use it in Recovery, and force
+      the log. For now this record is just informative.
+    */
+    if (unlikely(translog_write_record(&share->state.create_rename_lsn,
+                                       LOGREC_REDO_RENAME_TABLE,
+                                       &dummy_transaction_object, NULL,
+                                       2 + 2 + old_name_len + new_name_len,
+                                       sizeof(log_array)/sizeof(log_array[0]),
+                                       log_array, NULL)))
+    {
+      maria_close(info);
+      DBUG_RETURN(1);
+    }
+    /*
+      store LSN into file, needed for Recovery to not be confused if a
+      RENAME happened (applying REDOs to the wrong table).
+    */
+    lsn_store(log_data, share->state.create_rename_lsn);
+    if (my_pwrite(share->kfile.file, log_data, sizeof(log_data),
+                  sizeof(share->state.header) + 2, MYF(MY_NABP)) ||
+        my_sync(share->kfile.file, MYF(MY_WME)))
+    {
+      maria_close(info);
+      DBUG_RETURN(1);
+    }
   }
+
+  maria_close(info);
+#ifdef USE_RAID
 #ifdef EXTRA_DEBUG
   _ma_check_table_is_closed(old_name,"rename raidcheck");
 #endif
@@ -49,29 +112,18 @@ int maria_rename(const char *old_name, const char *new_name)
 
   fn_format(from,old_name,"",MARIA_NAME_IEXT,MY_UNPACK_FILENAME|MY_APPEND_EXT);
   fn_format(to,new_name,"",MARIA_NAME_IEXT,MY_UNPACK_FILENAME|MY_APPEND_EXT);
-  /*
-    RECOVERY TODO log the two renames below. Update
-    ZeroDirtyPagesLSN of the table on disk (=> sync the files), this is
-    needed so that Recovery does not pick a wrong table.
-    Then do the file renames.
-    For this log record to be of any use for Recovery, we need the upper MySQL
-    layer to be crash-safe in DDLs; when it is we should reconsider the moment
-    of writing this log record, how to use it in Recovery, and force the log.
-    For now this record is only informative. But ZeroDirtyPagesLSN is
-    critically needed!
-  */
-  if (my_rename_with_symlink(from, to, MYF(MY_WME | MY_SYNC_DIR)))
+  if (my_rename_with_symlink(from, to, MYF(MY_WME | sync_dir)))
     DBUG_RETURN(my_errno);
   fn_format(from,old_name,"",MARIA_NAME_DEXT,MY_UNPACK_FILENAME|MY_APPEND_EXT);
   fn_format(to,new_name,"",MARIA_NAME_DEXT,MY_UNPACK_FILENAME|MY_APPEND_EXT);
 #ifdef USE_RAID
   if (raid_type)
     data_file_rename_error= my_raid_rename(from, to, raid_chunks,
-                                           MYF(MY_WME | MY_SYNC_DIR));
+                                           MYF(MY_WME | sync_dir));
   else
 #endif
     data_file_rename_error=
-      my_rename_with_symlink(from, to, MYF(MY_WME | MY_SYNC_DIR));
+      my_rename_with_symlink(from, to, MYF(MY_WME | sync_dir));
   if (data_file_rename_error)
   {
     /*
@@ -81,7 +133,7 @@ int maria_rename(const char *old_name, const char *new_name)
     data_file_rename_error= my_errno;
     fn_format(from, old_name, "", MARIA_NAME_IEXT, MYF(MY_UNPACK_FILENAME|MY_APPEND_EXT));
     fn_format(to, new_name, "", MARIA_NAME_IEXT, MYF(MY_UNPACK_FILENAME|MY_APPEND_EXT));
-    my_rename_with_symlink(to, from, MYF(MY_WME | MY_SYNC_DIR));
+    my_rename_with_symlink(to, from, MYF(MY_WME | sync_dir));
   }
   DBUG_RETURN(data_file_rename_error);
-- 
cgit v1.2.1
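
Reviewer note: the payload that the patch assembles for LOGREC_REDO_RENAME_TABLE is a 2-byte old-name length, a 2-byte new-name length, then the two names without terminators. Below is a minimal standalone sketch of that layout, for illustration only. The helper names (store_u16_le, load_u16_le, rename_payload_encode, rename_payload_decode) are hypothetical and not part of the Maria sources, and the sketch assumes the low-byte-first order that int2store() uses. It does not call translog_write_record() or touch any file.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical constant: 2 bytes per name length, as in "2 + 2" in the patch */
#define RENAME_HEADER_SIZE 4

/* Store a 16-bit value low byte first (assumed to match int2store()) */
static void store_u16_le(unsigned char *dst, uint16_t v)
{
  dst[0]= (unsigned char) (v & 0xFF);
  dst[1]= (unsigned char) (v >> 8);
}

/* Read back a 16-bit value stored low byte first */
static uint16_t load_u16_le(const unsigned char *src)
{
  return (uint16_t) (src[0] | (src[1] << 8));
}

/*
  Build the payload: [old_len:2][new_len:2][old_name][new_name].
  Returns the payload size, or 0 if a name does not fit in 16 bits
  or the buffer is too small.
*/
static size_t rename_payload_encode(unsigned char *buf, size_t buf_size,
                                    const char *old_name,
                                    const char *new_name)
{
  size_t old_len= strlen(old_name), new_len= strlen(new_name);
  size_t total= RENAME_HEADER_SIZE + old_len + new_len;
  if (old_len > 0xFFFF || new_len > 0xFFFF || total > buf_size)
    return 0;
  store_u16_le(buf, (uint16_t) old_len);
  store_u16_le(buf + 2, (uint16_t) new_len);
  memcpy(buf + RENAME_HEADER_SIZE, old_name, old_len);
  memcpy(buf + RENAME_HEADER_SIZE + old_len, new_name, new_len);
  return total;
}

/*
  Split a payload back into its two names. The returned pointers point
  into buf and are NOT NUL-terminated. Returns 0 on success, -1 if the
  payload is shorter than its header claims.
*/
static int rename_payload_decode(const unsigned char *buf, size_t size,
                                 const unsigned char **old_name,
                                 size_t *old_len,
                                 const unsigned char **new_name,
                                 size_t *new_len)
{
  if (size < RENAME_HEADER_SIZE)
    return -1;
  *old_len= load_u16_le(buf);
  *new_len= load_u16_le(buf + 2);
  if (size - RENAME_HEADER_SIZE < *old_len + *new_len)
    return -1;
  *old_name= buf + RENAME_HEADER_SIZE;
  *new_name= buf + RENAME_HEADER_SIZE + *old_len;
  return 0;
}
```

The explicit 16-bit range check is a choice of this sketch: the patch passes strlen() results straight to int2store(), so a reader of the record has to rely on names being shorter than 65536 bytes.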