author | Sergey Petrunya <psergey@askmonty.org> | 2009-09-15 14:46:35 +0400 |
---|---|---|
committer | Sergey Petrunya <psergey@askmonty.org> | 2009-09-15 14:46:35 +0400 |
commit | 151e5d586cc59afb3c664d56d47265c964fe7fb1 | |
tree | 1972ed60ad937e632cf111e7beafb2148a87efeb | |
parent | 2083b7ddfd85b2bab837b73cb8357afcc2e893d8 | |
parent | b495b3db2346a98650946ad5907ba3fe164eadc7 | |
Merge lp:maria -> lp:~maria-captains/maria/maria-5.1-merge
190 files changed, 9424 insertions, 3375 deletions
diff --git a/.bzrignore b/.bzrignore index ea3f83a4bd5..258a16fdeca 100644 --- a/.bzrignore +++ b/.bzrignore @@ -1920,3 +1920,4 @@ sql/share/swedish sql/share/ukrainian libmysqld/examples/mysqltest.cc extra/libevent/event-config.h +libmysqld/opt_table_elimination.cc diff --git a/BUILD/Makefile.am b/BUILD/Makefile.am index cd4b00ea731..8343f1e2f27 100644 --- a/BUILD/Makefile.am +++ b/BUILD/Makefile.am @@ -71,12 +71,16 @@ EXTRA_DIST = FINISH.sh \ compile-ppc-max \ compile-solaris-amd64 \ compile-solaris-amd64-debug \ + compile-solaris-amd64-debug-forte \ compile-solaris-amd64-forte \ - compile-solaris-amd64-forte-debug \ compile-solaris-sparc \ compile-solaris-sparc-debug \ compile-solaris-sparc-forte \ - compile-solaris-sparc-purify + compile-solaris-sparc-purify \ + compile-solaris-x86-32 \ + compile-solaris-x86-32-debug \ + compile-solaris-x86-32-debug-forte \ + compile-solaris-x86-forte-32 # Don't update the files from bitkeeper %::SCCS/s.% diff --git a/KNOWN_BUGS.txt b/KNOWN_BUGS.txt index 189c7dcd613..38472fc978c 100644 --- a/KNOWN_BUGS.txt +++ b/KNOWN_BUGS.txt @@ -1,86 +1,35 @@ -This file should contain all know fatal bugs in the Maria storage -engine for the last source or binary release. Minor bugs, extensions -and feature request and bugs found since this release can be find in the -MySQL bugs databases at: http://bugs.mysql.com/ (category "Maria -storage engine"). +This file should contain all know fatal bugs in the Mariadb and the +Maria storage engine for the last source or binary release. Minor +bugs, extensions and feature request and bugs found since this release +can be find in the MariaDB bugs database at: +https://bugs.launchpad.net/maria and in the MySQL bugs databases at: +http://bugs.mysql.com/ (category "Maria storage engine"). There shouldn't normally be any bugs that affects normal operations in -any Maria release. Still, there are always exceptions and edge cases +any MariaDB release. 
Still, there are always exceptions and edge cases and that's what this file is for. -For the first few Alpha releases of Maria there may be some edge cases -that crashes during recovery; We don't like that but we think it's -better to get the Maria alpha out early to get things tested and get -more developers on the code early than wait until these are fixed. We -do however think that the bugs are not seriously enough to stop anyone -from starting to test and even use Maria for real (as long as they are -prepared to upgrade to next MySQL-Maria release ASAP). - If you have found a bug that is not listed here, please add it to -http://bugs.mysql.com/ so that we can either fix it for next release -or in the worst case add it here for others to know! +http://bugs.launchpad.net/maria so that we can either fix it for next +release or in the worst case add it here for others to know! IMPORTANT: -If you have been using a MySQL-5.1-Maria-alpha build and upgrading to -MySQL-5.1-Maria-beta you MUST run maria_chk --recover on all your -Maria tables. This is because we made an incompatible change of how -transaction id is stored and old transaction id's must be reset! +If you have been using the Maria storage engine with +MySQL-5.1-Maria-alpha build and upgrading to a newer MariaDB you MUST +run maria_chk --recover on all your Maria tables. This is because we +made an incompatible change of how transaction id is stored and old +transaction id's must be reset! cd mysql-data-directory maria_chk --recover */*.MAI -As the Maria-1.5 engine is now in beta we will do our best to not +As the Maria storage engine is now in beta we will do our best to not introduce any incompatible changes in the data format for the Maria tables; If this would be ever be needed, we will, if possible, support both the old and the new version to make upgrades as easy as possible. 
-Known bugs that we are working on and will be fixed shortly -=========================================================== - -- We have some time ago some instabilities in log writing that is was - under investigation but we haven't been able to repeat in a while. - This causes mainly assert to triggers in the code and sometimes - the log handler doesn't start up after restart. - Most of this should now be fixed. - -- INSERT on a duplicate key against a key inserted by another connection - that has not yet ended will give a duplicate key error instead of - waiting for the other statement to end. - - -Known bugs that are planned to be fixed before Gamma/RC -======================================================= - -- If we get a write failure on disk (disk full or disk error) for the - log, we should stop all usage of transactional tables and mark all - transactional tables that are changed as crashed. - For the moment, if this happens, you have to take down mysqld, - remove all logs, restart mysqld and repair your tables. - - If you get the related error: - "Disk is full writing '/usr/local/mysql/var/maria_log.????????' (Errcode: 28) - Waiting for someone to free space..." - you should either free disk space, in which Maria will continue as before - or kill mysqld, remove logs and repair tables. - - -Known bugs that are planned to be fixed later -============================================= - -LOCK TABLES .. WRITE CONCURRENT is mainly done for testing MVCC. Don't -use this in production. - -Missing features that is planned to fix before Beta -=================================================== - -None - -Features planned for future releases -==================================== - -Most notable is full transaction support and multiple reader/writers -in Maria 2.0 - -http://forge.mysql.com/worklog/ -(you can enter "maria" in the "quick search" field there). 
+Note that for the MariaDB 5.1 release the Maria storage engine is +classified as 'beta'; It should work, but use it with caution. Please +report all bugs to https://bugs.launchpad.net/maria so that we can fix +them! @@ -1,10 +1,21 @@ -This is a release of MySQL, a dual-license SQL database server. +This is a release of MariaDB, a branch of MySQL. MySQL is brought to you by the MySQL team at Sun Microsystems, Inc. +MariaDB is a drop-in replacement of MySQL, with more features, less +bugs and better performance. + +MariaDB is brought to you by many of the original developers of MySQL, +that now work for Monty Program Ab, and by many people in the +community. + +MySQL, which is the base of MariaDB, is brought to you by Sun. + License information can be found in these files: - For GPL (free) distributions, see the COPYING file and the EXCEPTIONS-CLIENT file. -- For commercial distributions, see the LICENSE.mysql file. + +A description of the MariaDB project can be found at: +http://askmonty.org/wiki/index.php/MariaDB GPLv2 Disclaimer For the avoidance of doubt, except that if any license choice @@ -15,36 +26,34 @@ is made available with the language indicating that GPLv2 or any later version may be used, or where a choice of which version of the GPL is applied is otherwise unspecified. -For further information about MySQL or additional documentation, see: -- The latest information about MySQL: http://www.mysql.com -- The current MySQL documentation: http://dev.mysql.com/doc +The differences between MariaDB and MySQL can be found at: +http://askmonty.org/wiki/index.php/MariaDB_versus_MySQL + +Documentation about MySQL can be found at: +http://dev.mysql.com/doc + +For further information about MySQL documentation, see: +- The current MySQL documentation: Some manual sections of special interest: -- If you are migrating from an older version of MySQL, please read the - "Upgrading from..." section first! -- To see what MySQL can do, take a look at the features section. 
-- For installation instructions, see the Installing and Upgrading -chapter. -- For the new features/bugfix history, see the Change History appendix. -- For the currently known bugs/misfeatures (known errors) see the -Problems - and Common Errors appendix. + + - For a list of developers and other contributors, see the Credits appendix. A local copy of the MySQL Reference Manual can be found in the Docs directory in GNU Info format. You can also browse the manual online or -download it in any of several formats at the URL given earlier in this -file. +download it in any of several formats from +http://dev.mysql.com/doc ************************************************************ IMPORTANT: -Bug or error reports should be sent to http://bugs.mysql.com. - - +Bug or error reports regarding MariaDB should be sent to +https://bugs.launchpad.net/maria +Bugs in the MySQL code can also be sent to http://bugs.mysql.com *************************************************************************** %%The following software may be included in this product: diff --git a/config/ac-macros/misc.m4 b/config/ac-macros/misc.m4 index 1eec0e9e18c..996ac62e025 100644 --- a/config/ac-macros/misc.m4 +++ b/config/ac-macros/misc.m4 @@ -601,15 +601,15 @@ dnl --------------------------------------------------------------------------- dnl MYSQL_NEEDS_MYSYS_NEW AC_DEFUN([MYSQL_NEEDS_MYSYS_NEW], -[AC_CACHE_CHECK([needs mysys_new helpers], mysql_use_mysys_new, +[AC_CACHE_CHECK([needs mysys_new helpers], mysql_cv_use_mysys_new, [ AC_LANG_PUSH(C++) AC_TRY_LINK([], [ class A { public: int b; }; A *a=new A; a->b=10; delete a; -], mysql_use_mysys_new=no, mysql_use_mysys_new=yes) +], mysql_cv_use_mysys_new=no, mysql_cv_use_mysys_new=yes) AC_LANG_POP(C++) ]) -if test "$mysql_use_mysys_new" = "yes" +if test "$mysql_cv_use_mysys_new" = "yes" then AC_DEFINE([USE_MYSYS_NEW], [1], [Needs to use mysys_new helpers]) fi diff --git a/extra/yassl/include/yassl_int.hpp b/extra/yassl/include/yassl_int.hpp index 
d18dc41860c..12489468e6b 100644 --- a/extra/yassl/include/yassl_int.hpp +++ b/extra/yassl/include/yassl_int.hpp @@ -441,7 +441,7 @@ public: const Ciphers& GetCiphers() const; const DH_Parms& GetDH_Parms() const; const Stats& GetStats() const; - const VerifyCallback getVerifyCallback() const; + VerifyCallback getVerifyCallback() const; pem_password_cb GetPasswordCb() const; void* GetUserData() const; bool GetSessionCacheOff() const; diff --git a/extra/yassl/src/handshake.cpp b/extra/yassl/src/handshake.cpp index b4d9005af15..43f446ec76f 100644 --- a/extra/yassl/src/handshake.cpp +++ b/extra/yassl/src/handshake.cpp @@ -789,7 +789,7 @@ void processReply(SSL& ssl) { if (ssl.GetError()) return; - if (DoProcessReply(ssl)) + if (DoProcessReply(ssl)) { { // didn't complete process if (!ssl.getSocket().IsNonBlocking()) { @@ -874,7 +874,7 @@ void sendServerKeyExchange(SSL& ssl, BufferOutput buffer) // send change cipher void sendChangeCipher(SSL& ssl, BufferOutput buffer) { - if (ssl.getSecurity().get_parms().entity_ == server_end) + if (ssl.getSecurity().get_parms().entity_ == server_end) { { if (ssl.getSecurity().get_resuming()) ssl.verifyState(clientKeyExchangeComplete); diff --git a/extra/yassl/src/yassl_imp.cpp b/extra/yassl/src/yassl_imp.cpp index f079df8c7ce..9ced6493d8b 100644 --- a/extra/yassl/src/yassl_imp.cpp +++ b/extra/yassl/src/yassl_imp.cpp @@ -1304,7 +1304,7 @@ void ServerHello::Process(input_buffer&, SSL& ssl) else ssl.useSecurity().use_connection().sessionID_Set_ = false; - if (ssl.getSecurity().get_resuming()) + if (ssl.getSecurity().get_resuming()) { { if (memcmp(session_id_, ssl.getSecurity().get_resume().GetID(), ID_LEN) == 0) { diff --git a/extra/yassl/src/yassl_int.cpp b/extra/yassl/src/yassl_int.cpp index b7f91d72166..8e4a9aa95ec 100644 --- a/extra/yassl/src/yassl_int.cpp +++ b/extra/yassl/src/yassl_int.cpp @@ -1833,7 +1833,7 @@ SSL_CTX::GetCA_List() const } -const VerifyCallback SSL_CTX::getVerifyCallback() const +VerifyCallback 
SSL_CTX::getVerifyCallback() const { return verifyCallback_; } diff --git a/extra/yassl/taocrypt/include/modes.hpp b/extra/yassl/taocrypt/include/modes.hpp index 4575fe1414b..f377ff85651 100644 --- a/extra/yassl/taocrypt/include/modes.hpp +++ b/extra/yassl/taocrypt/include/modes.hpp @@ -95,7 +95,7 @@ inline void Mode_BASE::Process(byte* out, const byte* in, word32 sz) { if (mode_ == ECB) ECB_Process(out, in, sz); - else if (mode_ == CBC) + else if (mode_ == CBC) { { if (dir_ == ENCRYPTION) CBC_Encrypt(out, in, sz); diff --git a/extra/yassl/taocrypt/src/asn.cpp b/extra/yassl/taocrypt/src/asn.cpp index 78200841bda..530dd4bed04 100644 --- a/extra/yassl/taocrypt/src/asn.cpp +++ b/extra/yassl/taocrypt/src/asn.cpp @@ -1063,7 +1063,7 @@ word32 DecodeDSA_Signature(byte* decoded, const byte* encoded, word32 sz) return 0; } word32 rLen = GetLength(source); - if (rLen != 20) + if (rLen != 20) { { if (rLen == 21) { // zero at front, eat source.next(); @@ -1087,7 +1087,7 @@ word32 DecodeDSA_Signature(byte* decoded, const byte* encoded, word32 sz) return 0; } word32 sLen = GetLength(source); - if (sLen != 20) + if (sLen != 20) { { if (sLen == 21) { source.next(); // zero at front, eat diff --git a/include/my_global.h b/include/my_global.h index e808ce04015..19dfce67e42 100644 --- a/include/my_global.h +++ b/include/my_global.h @@ -926,8 +926,7 @@ typedef long long my_ptrdiff_t; #define MY_ALIGN(A,L) (((A) + (L) - 1) & ~((L) - 1)) #define ALIGN_SIZE(A) MY_ALIGN((A),sizeof(double)) /* Size to make adressable obj. 
*/ -#define ALIGN_PTR(A, t) ((t*) MY_ALIGN((A),sizeof(t))) - /* Offset of field f in structure t */ +#define ALIGN_PTR(A, t) ((t*) MY_ALIGN((A), sizeof(double))) #define OFFSET(t, f) ((size_t)(char *)&((t *)0)->f) #define ADD_TO_PTR(ptr,size,type) (type) ((uchar*) (ptr)+size) #define PTR_BYTE_DIFF(A,B) (my_ptrdiff_t) ((uchar*) (A) - (uchar*) (B)) diff --git a/libmysqld/Makefile.am b/libmysqld/Makefile.am index 7dc44e179c3..d1dacbaec0c 100644 --- a/libmysqld/Makefile.am +++ b/libmysqld/Makefile.am @@ -76,7 +76,7 @@ sqlsources = derror.cc field.cc field_conv.cc strfunc.cc filesort.cc \ rpl_filter.cc sql_partition.cc sql_builtin.cc sql_plugin.cc \ sql_tablespace.cc \ rpl_injector.cc my_user.c partition_info.cc \ - sql_servers.cc event_parse_data.cc + sql_servers.cc event_parse_data.cc opt_table_elimination.cc libmysqld_int_a_SOURCES= $(libmysqld_sources) nodist_libmysqld_int_a_SOURCES= $(libmysqlsources) $(sqlsources) diff --git a/mysql-test/lib/mtr_gcov.pl b/mysql-test/lib/mtr_gcov.pl index f531889b08d..ef1067bfd27 100644 --- a/mysql-test/lib/mtr_gcov.pl +++ b/mysql-test/lib/mtr_gcov.pl @@ -20,6 +20,8 @@ use strict; +our $basedir; + sub gcov_prepare ($) { my ($dir)= @_; print "Purging gcov information from '$dir'...\n"; @@ -42,7 +44,7 @@ sub gcov_collect ($$$) { # Get current directory to return to later. 
my $start_dir= cwd(); - print "Collecting source coverage info using '$gcov'...\n"; + print "Collecting source coverage info using '$gcov'...$basedir\n"; -f "$start_dir/$gcov_msg" and unlink("$start_dir/$gcov_msg"); -f "$start_dir/$gcov_err" and unlink("$start_dir/$gcov_err"); @@ -62,6 +64,7 @@ sub gcov_collect ($$$) { $dir_reported= 1; } system("$gcov $f 2>>$start_dir/$gcov_err >>$start_dir/$gcov_msg"); + system("perl $basedir/mysql-test/lib/process-purecov-annotations.pl $f.gcov"); } chdir($start_dir); } diff --git a/mysql-test/lib/process-purecov-annotations.pl b/mysql-test/lib/process-purecov-annotations.pl new file mode 100755 index 00000000000..843d1d2f130 --- /dev/null +++ b/mysql-test/lib/process-purecov-annotations.pl @@ -0,0 +1,64 @@ +#!/usr/bin/perl +# -*- cperl -*- + +# This script processes a .gcov coverage report to honor purecov +# annotations: lines marked as inspected or as deadcode are changed +# from looking like lines with code that was never executed to look +# like lines that have no executable code. + +use strict; +use warnings; + +foreach my $in_file_name ( @ARGV ) +{ + my $out_file_name=$in_file_name . 
".tmp"; + my $skipping=0; + + open(IN, "<", $in_file_name) || next; + open(OUT, ">", $out_file_name); + while(<IN>) + { + my $line= $_; + my $check= $line; + + # process purecov: start/end multi-blocks + my $started=0; + my $ended= 0; + while (($started=($check =~ s/purecov: *begin *(deadcode|inspected)//)) || + ($ended=($check =~ s/purecov: *end//))) + { + $skipping= $skipping + $started - $ended; + } + if ($skipping < 0) + { + print OUT "WARNING: #####: incorrect order of purecov begin/end annotations\n"; + $skipping= 0; + } + + # Besides purecov annotations, also remove uncovered code mark from cases + # like the following: + # + # -: 211:*/ + # -: 212:class Field_value : public Value_dep + # #####: 213:{ + # -: 214:public: + # + # I have no idea why would gcov think there is uncovered code there + # + my @arr= split(/:/, $line); + if ($skipping || $line =~ /purecov: *(inspected|deadcode)/ || + $arr[2] =~ m/^{ */) + { + # Change '####' to '-'. + $arr[0] =~ s/#####/ -/g; + $line= join(":", @arr); + } + print OUT $line; + } + close(IN); + close(OUT); + system("cat $out_file_name > $in_file_name"); + system("rm $out_file_name"); +} + + diff --git a/mysql-test/mysql-test-run.pl b/mysql-test/mysql-test-run.pl index e48bc1954cb..812e25ef351 100755 --- a/mysql-test/mysql-test-run.pl +++ b/mysql-test/mysql-test-run.pl @@ -170,6 +170,7 @@ our $opt_force; our $opt_mem= $ENV{'MTR_MEM'}; our $opt_gcov; +our $opt_gcov_src_dir; our $opt_gcov_exe= "gcov"; our $opt_gcov_err= "mysql-test-gcov.msg"; our $opt_gcov_msg= "mysql-test-gcov.err"; @@ -272,7 +273,7 @@ sub main { command_line_setup(); if ( $opt_gcov ) { - gcov_prepare($basedir); + gcov_prepare($basedir . "/" . $opt_gcov_src_dir); } if (!$opt_suites) { @@ -418,7 +419,7 @@ sub main { mtr_print_line(); if ( $opt_gcov ) { - gcov_collect($basedir, $opt_gcov_exe, + gcov_collect($basedir . "/" . 
$opt_gcov_src_dir, $opt_gcov_exe, $opt_gcov_msg, $opt_gcov_err); } @@ -890,6 +891,7 @@ sub command_line_setup { # Coverage, profiling etc 'gcov' => \$opt_gcov, + 'gcov-src-dir=s' => \$opt_gcov_src_dir, 'valgrind|valgrind-all' => \$opt_valgrind, 'valgrind-mysqltest' => \$opt_valgrind_mysqltest, 'valgrind-mysqld' => \$opt_valgrind_mysqld, @@ -1799,6 +1801,20 @@ sub tool_arguments ($$) { return mtr_args2str($exe, @$args); } +# This is not used to actually start a mysqld server, just to allow test +# scripts to run the mysqld binary to test invalid server startup options. +sub mysqld_client_arguments () { + my $default_mysqld= default_mysqld(); + my $exe = find_mysqld($basedir); + my $args; + mtr_init_args(\$args); + mtr_add_arg($args, "--no-defaults"); + mtr_add_arg($args, "--basedir=%s", $basedir); + mtr_add_arg($args, "--character-sets-dir=%s", $default_mysqld->value("character-sets-dir")); + mtr_add_arg($args, "--language=%s", $default_mysqld->value("language")); + return mtr_args2str($exe, @$args); +} + sub have_maria_support () { my $maria_var= $mysqld_variables{'maria'}; @@ -2008,6 +2024,7 @@ sub environment_setup { $ENV{'MYSQLADMIN'}= native_path($exe_mysqladmin); $ENV{'MYSQL_CLIENT_TEST'}= mysql_client_test_arguments(); $ENV{'MYSQL_FIX_SYSTEM_TABLES'}= mysql_fix_arguments(); + $ENV{'MYSQLD'}= mysqld_client_arguments(); $ENV{'EXE_MYSQL'}= $exe_mysql; # ---------------------------------------------------- @@ -5516,6 +5533,9 @@ Misc options actions. Disable facility with NUM=0. gcov Collect coverage information after the test. The result is a gcov file per source and header file. + gcov-src-dir=subdir Colllect coverage only within the given subdirectory. + For example, if you're only developing the SQL layer, + it makes sense to use --gcov-src-dir=sql experimental=<file> Refer to list of tests considered experimental; failures will be marked exp-fail instead of fail. 
report-features First run a "test" that reports mysql features diff --git a/mysql-test/r/log_slow.result b/mysql-test/r/log_slow.result new file mode 100644 index 00000000000..57d12a64f5a --- /dev/null +++ b/mysql-test/r/log_slow.result @@ -0,0 +1,60 @@ +select @@log_slow_filter; +@@log_slow_filter + +select @@log_slow_rate_limit; +@@log_slow_rate_limit +1 +select @@log_slow_verbosity; +@@log_slow_verbosity + +show variables like "log_slow%"; +Variable_name Value +log_slow_filter +log_slow_queries ON +log_slow_rate_limit 1 +log_slow_time 10.000000 +log_slow_verbosity +set @@log_slow_filter= "filesort,filesort_on_disk,full_join,full_scan,query_cache,query_cache_miss,tmp_table,tmp_table_on_disk,admin"; +select @@log_slow_filter; +@@log_slow_filter +admin,filesort,filesort_on_disk,full_join,full_scan,query_cache,query_cache_miss,tmp_table,tmp_table_on_disk +set @@log_slow_filter="admin,admin"; +select @@log_slow_filter; +@@log_slow_filter +admin +set @@log_slow_filter=7; +select @@log_slow_filter; +@@log_slow_filter +admin,filesort,filesort_on_disk +set @@log_slow_filter= "filesort,impossible,impossible2,admin"; +ERROR 42000: Variable 'log_slow_filter' can't be set to the value of 'impossible' +set @@log_slow_filter= "filesort, admin"; +ERROR 42000: Variable 'log_slow_filter' can't be set to the value of ' admin' +set @@log_slow_filter= 1<<31; +ERROR 42000: Variable 'log_slow_filter' can't be set to the value of '2147483648' +select @@log_slow_filter; +@@log_slow_filter +admin,filesort,filesort_on_disk +set @@log_slow_verbosity= "query_plan,innodb"; +select @@log_slow_verbosity; +@@log_slow_verbosity +innodb,query_plan +set @@log_slow_verbosity=1; +select @@log_slow_verbosity; +@@log_slow_verbosity +innodb +show fields from mysql.slow_log; +Field Type Null Key Default Extra +start_time timestamp NO CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP +user_host mediumtext NO NULL +query_time time NO NULL +lock_time time NO NULL +rows_sent int(11) NO NULL +rows_examined 
int(11) NO NULL +db varchar(512) NO NULL +last_insert_id int(11) NO NULL +insert_id int(11) NO NULL +server_id int(10) unsigned NO NULL +sql_text mediumtext NO NULL +set @@log_slow_filter=default; +set @@log_slow_verbosity=default; diff --git a/mysql-test/r/mysql-bug41486.result b/mysql-test/r/mysql-bug41486.result index 02777ab587f..62a6712eae1 100644 --- a/mysql-test/r/mysql-bug41486.result +++ b/mysql-test/r/mysql-bug41486.result @@ -3,6 +3,9 @@ SET @old_max_allowed_packet= @@global.max_allowed_packet; SET @@global.max_allowed_packet = 2 * 1024 * 1024 + 1024; CREATE TABLE t1(data LONGBLOB); INSERT INTO t1 SELECT REPEAT('1', 2*1024*1024); +SELECT COUNT(*) FROM t1; +COUNT(*) +1 SET @old_general_log = @@global.general_log; SET @@global.general_log = 0; SET @@global.general_log = @old_general_log; diff --git a/mysql-test/r/mysqld_option_err.result b/mysql-test/r/mysqld_option_err.result new file mode 100644 index 00000000000..255f109b788 --- /dev/null +++ b/mysql-test/r/mysqld_option_err.result @@ -0,0 +1,6 @@ +Test that unknown option is not silently ignored. +Test bad binlog format. +Test bad default storage engine. +Test non-numeric value passed to number option. +Test that bad value for plugin enum option is rejected correctly. +Done. 
diff --git a/mysql-test/r/ps_11bugs.result b/mysql-test/r/ps_11bugs.result index a298c552806..5c11163ab9e 100644 --- a/mysql-test/r/ps_11bugs.result +++ b/mysql-test/r/ps_11bugs.result @@ -121,8 +121,8 @@ insert into t1 values (1); explain select * from t1 where 3 in (select (1+1) union select 1); id select_type table type possible_keys key key_len ref rows Extra 1 PRIMARY NULL NULL NULL NULL NULL NULL NULL Impossible WHERE noticed after reading const tables -2 DEPENDENT SUBQUERY NULL NULL NULL NULL NULL NULL NULL No tables used -3 DEPENDENT UNION NULL NULL NULL NULL NULL NULL NULL No tables used +2 DEPENDENT SUBQUERY NULL NULL NULL NULL NULL NULL NULL Impossible HAVING +3 DEPENDENT UNION NULL NULL NULL NULL NULL NULL NULL Impossible HAVING NULL UNION RESULT <union2,3> ALL NULL NULL NULL NULL NULL select * from t1 where 3 in (select (1+1) union select 1); a diff --git a/mysql-test/r/select.result b/mysql-test/r/select.result index 50b5c3c13fb..6b9a6b7c185 100644 --- a/mysql-test/r/select.result +++ b/mysql-test/r/select.result @@ -3585,7 +3585,6 @@ INSERT INTO t2 VALUES (1,'a'),(2,'b'),(3,'c'); EXPLAIN SELECT t1.a FROM t1 LEFT JOIN t2 ON t2.b=t1.b WHERE t1.a=3; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t1 const PRIMARY PRIMARY 4 const 1 -1 SIMPLE t2 const b b 22 const 1 Using index DROP TABLE t1,t2; CREATE TABLE t1(id int PRIMARY KEY, b int, e int); CREATE TABLE t2(i int, a int, INDEX si(i), INDEX ai(a)); diff --git a/mysql-test/r/subselect.result b/mysql-test/r/subselect.result index cd4844456eb..1b8e31ebf78 100644 --- a/mysql-test/r/subselect.result +++ b/mysql-test/r/subselect.result @@ -4353,13 +4353,13 @@ id select_type table type possible_keys key key_len ref rows filtered Extra 1 PRIMARY t1 ALL NULL NULL NULL NULL 2 100.00 2 DEPENDENT SUBQUERY t1 ALL NULL NULL NULL NULL 2 100.00 Using temporary; Using filesort Warnings: -Note 1003 select 1 AS `1` from `test`.`t1` where <in_optimizer>(1,<exists>(select 1 AS `1` from 
`test`.`t1` group by `test`.`t1`.`a` having (<cache>(1) = <ref_null_helper>(1)))) +Note 1003 select 1 AS `1` from `test`.`t1` where <in_optimizer>(1,<exists>(select 1 AS `1` from `test`.`t1` group by `test`.`t1`.`a` having 1)) EXPLAIN EXTENDED SELECT 1 FROM t1 WHERE 1 IN (SELECT 1 FROM t1 WHERE a > 3 GROUP BY a); id select_type table type possible_keys key key_len ref rows filtered Extra 1 PRIMARY NULL NULL NULL NULL NULL NULL NULL NULL Impossible WHERE noticed after reading const tables 2 DEPENDENT SUBQUERY t1 ALL NULL NULL NULL NULL 2 100.00 Using where; Using temporary; Using filesort Warnings: -Note 1003 select 1 AS `1` from `test`.`t1` where <in_optimizer>(1,<exists>(select 1 AS `1` from `test`.`t1` where (`test`.`t1`.`a` > 3) group by `test`.`t1`.`a` having (<cache>(1) = <ref_null_helper>(1)))) +Note 1003 select 1 AS `1` from `test`.`t1` where <in_optimizer>(1,<exists>(select 1 AS `1` from `test`.`t1` where (`test`.`t1`.`a` > 3) group by `test`.`t1`.`a` having 1)) DROP TABLE t1; # # Bug#45061: Incorrectly market field caused wrong result. 
diff --git a/mysql-test/r/table_elim.result b/mysql-test/r/table_elim.result new file mode 100644 index 00000000000..ae117af3e32 --- /dev/null +++ b/mysql-test/r/table_elim.result @@ -0,0 +1,420 @@ +drop table if exists t0, t1, t2, t3; +drop view if exists v1, v2; +create table t1 (a int); +insert into t1 values (0),(1),(2),(3); +create table t0 as select * from t1; +create table t2 (a int primary key, b int) +as select a, a as b from t1 where a in (1,2); +create table t3 (a int primary key, b int) +as select a, a as b from t1 where a in (1,3); +# This will be eliminated: +explain select t1.a from t1 left join t2 on t2.a=t1.a; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 ALL NULL NULL NULL NULL 4 +explain extended select t1.a from t1 left join t2 on t2.a=t1.a; +id select_type table type possible_keys key key_len ref rows filtered Extra +1 SIMPLE t1 ALL NULL NULL NULL NULL 4 100.00 +Warnings: +Note 1003 select `test`.`t1`.`a` AS `a` from `test`.`t1` where 1 +select t1.a from t1 left join t2 on t2.a=t1.a; +a +0 +1 +2 +3 +# This will not be eliminated as t2.b is in in select list: +explain select * from t1 left join t2 on t2.a=t1.a; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 ALL NULL NULL NULL NULL 4 +1 SIMPLE t2 eq_ref PRIMARY PRIMARY 4 test.t1.a 1 +# This will not be eliminated as t2.b is in in order list: +explain select t1.a from t1 left join t2 on t2.a=t1.a order by t2.b; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 ALL NULL NULL NULL NULL 4 Using temporary; Using filesort +1 SIMPLE t2 eq_ref PRIMARY PRIMARY 4 test.t1.a 1 +# This will not be eliminated as t2.b is in group list: +explain select t1.a from t1 left join t2 on t2.a=t1.a group by t2.b; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 ALL NULL NULL NULL NULL 4 Using temporary; Using filesort +1 SIMPLE t2 eq_ref PRIMARY PRIMARY 4 test.t1.a 1 +# This will not be 
eliminated as t2.b is in the WHERE +explain select t1.a from t1 left join t2 on t2.a=t1.a where t2.b < 3 or t2.b is null; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 ALL NULL NULL NULL NULL 4 +1 SIMPLE t2 eq_ref PRIMARY PRIMARY 4 test.t1.a 1 Using where +# Elimination of multiple tables: +explain select t1.a from t1 left join (t2 join t3) on t2.a=t1.a and t3.a=t1.a; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 ALL NULL NULL NULL NULL 4 +# Elimination of multiple tables (2): +explain select t1.a from t1 left join (t2 join t3 on t2.b=t3.b) on t2.a=t1.a and t3.a=t1.a; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 ALL NULL NULL NULL NULL 4 +# Elimination when done within an outer join nest: +explain extended +select t0.* +from +t0 left join (t1 left join (t2 join t3 on t2.b=t3.b) on t2.a=t1.a and +t3.a=t1.a) on t0.a=t1.a; +id select_type table type possible_keys key key_len ref rows filtered Extra +1 SIMPLE t0 ALL NULL NULL NULL NULL 4 100.00 +1 SIMPLE t1 ALL NULL NULL NULL NULL 4 100.00 +Warnings: +Note 1003 select `test`.`t0`.`a` AS `a` from `test`.`t0` left join (`test`.`t1`) on((`test`.`t0`.`a` = `test`.`t1`.`a`)) where 1 +# Elimination with aggregate functions +explain select count(*) from t1 left join t2 on t2.a=t1.a; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 ALL NULL NULL NULL NULL 4 +explain select count(1) from t1 left join t2 on t2.a=t1.a; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 ALL NULL NULL NULL NULL 4 +explain select count(1) from t1 left join t2 on t2.a=t1.a group by t1.a; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 ALL NULL NULL NULL NULL 4 Using temporary; Using filesort +This must not use elimination: +explain select count(1) from t1 left join t2 on t2.a=t1.a group by t2.a; +id select_type table type possible_keys key key_len ref 
rows Extra +1 SIMPLE t1 ALL NULL NULL NULL NULL 4 Using temporary; Using filesort +1 SIMPLE t2 eq_ref PRIMARY PRIMARY 4 test.t1.a 1 Using index +drop table t0, t1, t2, t3; +create table t0 ( id integer, primary key (id)); +create table t1 ( +id integer, +attr1 integer, +primary key (id), +key (attr1) +); +create table t2 ( +id integer, +attr2 integer, +fromdate date, +primary key (id, fromdate), +key (attr2,fromdate) +); +insert into t0 values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9); +insert into t0 select A.id + 10*B.id from t0 A, t0 B where B.id > 0; +insert into t1 select id, id from t0; +insert into t2 select id, id, date_add('2009-06-22', interval id day) from t0; +insert into t2 select id, id+1, date_add('2008-06-22', interval id day) from t0; +create view v1 as +select +F.id, A1.attr1, A2.attr2 +from +t0 F +left join t1 A1 on A1.id=F.id +left join t2 A2 on A2.id=F.id and +A2.fromdate=(select MAX(fromdate) from +t2 where id=A2.id); +create view v2 as +select +F.id, A1.attr1, A2.attr2 +from +t0 F +left join t1 A1 on A1.id=F.id +left join t2 A2 on A2.id=F.id and +A2.fromdate=(select MAX(fromdate) from +t2 where id=F.id); +This should use one table: +explain select id from v1 where id=2; +id select_type table type possible_keys key key_len ref rows Extra +1 PRIMARY F const PRIMARY PRIMARY 4 const 1 Using index +This should use one table: +explain extended select id from v1 where id in (1,2,3,4); +id select_type table type possible_keys key key_len ref rows filtered Extra +1 PRIMARY F range PRIMARY PRIMARY 4 NULL 4 100.00 Using where; Using index +Warnings: +Note 1276 Field or reference 'test.A2.id' of SELECT #3 was resolved in SELECT #1 +Note 1003 select `F`.`id` AS `id` from `test`.`t0` `F` where (`F`.`id` in (1,2,3,4)) +This should use facts and A1 tables: +explain extended select id from v1 where attr1 between 12 and 14; +id select_type table type possible_keys key key_len ref rows filtered Extra +1 PRIMARY A1 range PRIMARY,attr1 attr1 5 NULL 2 100.00 Using 
where +1 PRIMARY F eq_ref PRIMARY PRIMARY 4 test.A1.id 1 100.00 Using index +Warnings: +Note 1276 Field or reference 'test.A2.id' of SELECT #3 was resolved in SELECT #1 +Note 1003 select `F`.`id` AS `id` from `test`.`t0` `F` join `test`.`t1` `A1` where ((`F`.`id` = `A1`.`id`) and (`A1`.`attr1` between 12 and 14)) +This should use facts, A2 and its subquery: +explain extended select id from v1 where attr2 between 12 and 14; +id select_type table type possible_keys key key_len ref rows filtered Extra +1 PRIMARY A2 range PRIMARY,attr2 attr2 5 NULL 5 100.00 Using where +1 PRIMARY F eq_ref PRIMARY PRIMARY 4 test.A2.id 1 100.00 Using index +3 DEPENDENT SUBQUERY t2 ref PRIMARY PRIMARY 4 test.A2.id 2 100.00 Using index +Warnings: +Note 1276 Field or reference 'test.A2.id' of SELECT #3 was resolved in SELECT #1 +Note 1003 select `F`.`id` AS `id` from `test`.`t0` `F` join `test`.`t2` `A2` where ((`F`.`id` = `A2`.`id`) and (`A2`.`attr2` between 12 and 14) and (`A2`.`fromdate` = (select max(`test`.`t2`.`fromdate`) AS `MAX(fromdate)` from `test`.`t2` where (`test`.`t2`.`id` = `A2`.`id`)))) +This should use one table: +explain select id from v2 where id=2; +id select_type table type possible_keys key key_len ref rows Extra +1 PRIMARY F const PRIMARY PRIMARY 4 const 1 Using index +This should use one table: +explain extended select id from v2 where id in (1,2,3,4); +id select_type table type possible_keys key key_len ref rows filtered Extra +1 PRIMARY F range PRIMARY PRIMARY 4 NULL 4 100.00 Using where; Using index +Warnings: +Note 1276 Field or reference 'test.F.id' of SELECT #3 was resolved in SELECT #1 +Note 1003 select `F`.`id` AS `id` from `test`.`t0` `F` where (`F`.`id` in (1,2,3,4)) +This should use facts and A1 tables: +explain extended select id from v2 where attr1 between 12 and 14; +id select_type table type possible_keys key key_len ref rows filtered Extra +1 PRIMARY A1 range PRIMARY,attr1 attr1 5 NULL 2 100.00 Using where +1 PRIMARY F eq_ref PRIMARY PRIMARY 4 
test.A1.id 1 100.00 Using index +Warnings: +Note 1276 Field or reference 'test.F.id' of SELECT #3 was resolved in SELECT #1 +Note 1003 select `F`.`id` AS `id` from `test`.`t0` `F` join `test`.`t1` `A1` where ((`F`.`id` = `A1`.`id`) and (`A1`.`attr1` between 12 and 14)) +This should use facts, A2 and its subquery: +explain extended select id from v2 where attr2 between 12 and 14; +id select_type table type possible_keys key key_len ref rows filtered Extra +1 PRIMARY A2 range PRIMARY,attr2 attr2 5 NULL 5 100.00 Using where +1 PRIMARY F eq_ref PRIMARY PRIMARY 4 test.A2.id 1 100.00 Using where; Using index +3 DEPENDENT SUBQUERY t2 ref PRIMARY PRIMARY 4 test.F.id 2 100.00 Using index +Warnings: +Note 1276 Field or reference 'test.F.id' of SELECT #3 was resolved in SELECT #1 +Note 1003 select `F`.`id` AS `id` from `test`.`t0` `F` join `test`.`t2` `A2` where ((`F`.`id` = `A2`.`id`) and (`A2`.`attr2` between 12 and 14) and (`A2`.`fromdate` = (select max(`test`.`t2`.`fromdate`) AS `MAX(fromdate)` from `test`.`t2` where (`test`.`t2`.`id` = `F`.`id`)))) +drop view v1, v2; +drop table t0, t1, t2; +create table t1 (a int); +insert into t1 values (0),(1),(2),(3); +create table t2 (pk1 int, pk2 int, pk3 int, col int, primary key(pk1, pk2, pk3)); +insert into t2 select a,a,a,a from t1; +This must use only t1: +explain select t1.* from t1 left join t2 on t2.pk1=t1.a and +t2.pk2=t2.pk1+1 and +t2.pk3=t2.pk2+1; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 ALL NULL NULL NULL NULL 4 +This must use only t1: +explain select t1.* from t1 left join t2 on t2.pk1=t1.a and +t2.pk3=t2.pk1+1 and +t2.pk2=t2.pk3+1; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 ALL NULL NULL NULL NULL 4 +This must use both: +explain select t1.* from t1 left join t2 on t2.pk1=t1.a and +t2.pk3=t2.pk1+1 and +t2.pk2=t2.pk3+t2.col; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 ALL NULL NULL NULL NULL 4 +1 SIMPLE t2 
ref PRIMARY PRIMARY 4 test.t1.a 1 +This must use only t1: +explain select t1.* from t1 left join t2 on t2.pk2=t1.a and +t2.pk1=t2.pk2+1 and +t2.pk3=t2.pk1; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 ALL NULL NULL NULL NULL 4 +drop table t1, t2; +create table t1 (pk int primary key, col int); +insert into t1 values (1,1),(2,2); +create table t2 like t1; +insert into t2 select * from t1; +create table t3 like t1; +insert into t3 select * from t1; +explain +select t1.* from t1 left join ( t2 left join t3 on t3.pk=t2.col) on t2.col=t1.col; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 ALL NULL NULL NULL NULL 2 +1 SIMPLE t2 ALL NULL NULL NULL NULL 2 +explain +select t1.*, t2.* from t1 left join (t2 left join t3 on t3.pk=t2.col) on t2.pk=t1.col; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 ALL NULL NULL NULL NULL 2 +1 SIMPLE t2 eq_ref PRIMARY PRIMARY 4 test.t1.col 1 +explain select t1.* +from +t1 left join ( t2 left join t3 on t3.pk=t2.col or t3.pk=t2.col) +on t2.col=t1.col or t2.col=t1.col; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 ALL NULL NULL NULL NULL 2 +1 SIMPLE t2 ALL NULL NULL NULL NULL 2 +explain select t1.*, t2.* +from +t1 left join +(t2 left join t3 on t3.pk=t2.col or t3.pk=t2.col) +on t2.pk=t1.col or t2.pk=t1.col; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 ALL NULL NULL NULL NULL 2 +1 SIMPLE t2 eq_ref PRIMARY PRIMARY 4 test.t1.col 1 +drop table t1, t2, t3; +# +# Check things that look like functional dependencies but really are not +# +create table t1 (a char(10) character set latin1 collate latin1_general_ci primary key); +insert into t1 values ('foo'); +insert into t1 values ('bar'); +create table t2 (a char(10) character set latin1 collate latin1_general_cs primary key); +insert into t2 values ('foo'); +insert into t2 values ('FOO'); +this must not use table elimination: 
+explain select t1.* from t1 left join t2 on t2.a='foo' collate latin1_general_ci; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 index NULL PRIMARY 10 NULL 2 Using index +1 SIMPLE t2 index PRIMARY PRIMARY 10 NULL 2 Using index +this must not use table elimination: +explain select t1.* from t1 left join t2 on t2.a=t1.a collate latin1_general_ci; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 index NULL PRIMARY 10 NULL 2 Using index +1 SIMPLE t2 index PRIMARY PRIMARY 10 NULL 2 Using index +drop table t1,t2; +create table t1 (a int primary key); +insert into t1 values (1),(2); +create table t2 (a char(10) primary key); +insert into t2 values ('1'),('1.0'); +this must not use table elimination: +explain select t1.* from t1 left join t2 on t2.a=1; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 index NULL PRIMARY 4 NULL 2 Using index +1 SIMPLE t2 index PRIMARY PRIMARY 10 NULL 2 Using index +this must not use table elimination: +explain select t1.* from t1 left join t2 on t2.a=t1.a; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 index NULL PRIMARY 4 NULL 2 Using index +1 SIMPLE t2 index PRIMARY PRIMARY 10 NULL 2 Using index +drop table t1, t2; +create table t1 (a char(10) primary key); +insert into t1 values ('foo'),('bar'); +create table t2 (a char(10), unique key(a(2))); +insert into t2 values ('foo'),('bar'); +explain select t1.* from t1 left join t2 on t2.a=t1.a; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 index NULL PRIMARY 10 NULL 2 Using index +1 SIMPLE t2 ref a a 3 test.t1.a 2 +drop table t1, t2; +# +# check UPDATE/DELETE that look like they could be eliminated +# +create table t1 (a int primary key, b int); +insert into t1 values (1,1),(2,2),(3,3); +create table t2 like t1; +insert into t2 select * from t1; +update t1 left join t2 using (a) set t2.a=t2.a+100; +select * from t1; +a b +1 1 +2 2 
+3 3 +select * from t2; +a b +101 1 +102 2 +103 3 +delete from t2; +insert into t2 select * from t1; +delete t2 from t1 left join t2 using (a); +select * from t1; +a b +1 1 +2 2 +3 3 +select * from t2; +a b +drop table t1, t2; +# +# Tests with various edge-case ON expressions +# +create table t1 (a int, b int, c int, d int); +insert into t1 values (0,0,0,0),(1,1,1,1),(2,2,2,2),(3,3,3,3); +create table t2 (pk int primary key, b int) +as select a as pk, a as b from t1 where a in (1,2); +create table t3 (pk1 int, pk2 int, b int, unique(pk1,pk2)); +insert into t3 select a as pk1, a as pk2, a as b from t1 where a in (1,3); +explain select t1.a from t1 left join t2 on t2.pk=t1.a and t2.b<t1.b; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 ALL NULL NULL NULL NULL 4 +explain select t1.a from t1 left join t2 on t2.pk=t1.a or t2.b<t1.b; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 ALL NULL NULL NULL NULL 4 +1 SIMPLE t2 ALL PRIMARY NULL NULL NULL 2 +explain select t1.a from t1 left join t2 on t2.b<t1.b or t2.pk=t1.a; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 ALL NULL NULL NULL NULL 4 +1 SIMPLE t2 ALL PRIMARY NULL NULL NULL 2 +explain select t1.a from t1 left join t2 on t2.pk between 10 and 20; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 ALL NULL NULL NULL NULL 4 +1 SIMPLE t2 index PRIMARY PRIMARY 4 NULL 2 Using index +explain select t1.a from t1 left join t2 on t2.pk between 0.5 and 1.5; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 ALL NULL NULL NULL NULL 4 +1 SIMPLE t2 index PRIMARY PRIMARY 4 NULL 2 Using index +explain select t1.a from t1 left join t2 on t2.pk between 10 and 10; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 ALL NULL NULL NULL NULL 4 +explain select t1.a from t1 left join t2 on t2.pk in (10); +id select_type table type possible_keys key key_len 
ref rows Extra +1 SIMPLE t1 ALL NULL NULL NULL NULL 4 +explain select t1.a from t1 left join t2 on t2.pk in (t1.a); +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 ALL NULL NULL NULL NULL 4 +explain select t1.a from t1 left join t2 on TRUE; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 ALL NULL NULL NULL NULL 4 +1 SIMPLE t2 index NULL PRIMARY 4 NULL 2 Using index +explain select t1.a from t1 left join t3 on t3.pk1=t1.a and t3.pk2 IS NULL; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 ALL NULL NULL NULL NULL 4 +drop table t1,t2,t3; +# +# Multi-equality tests +# +create table t1 (a int, b int, c int, d int); +insert into t1 values (0,0,0,0),(1,1,1,1),(2,2,2,2),(3,3,3,3); +create table t2 (pk int primary key, b int, c int); +insert into t2 select a,a,a from t1 where a in (1,2); +explain +select t1.* +from t1 left join t2 on t2.pk=t2.c and t2.b=t1.a and t1.a=t1.b and t2.c=t2.b +where t1.d=1; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 ALL NULL NULL NULL NULL 4 Using where +explain +select t1.* +from +t1 +left join +t2 +on (t2.pk=t2.c and t2.b=t1.a and t1.a=t1.b and t2.c=t2.b) or +(t2.pk=t2.c and t2.b=t1.a and t1.a=t1.b and t2.c=t2.b) +where t1.d=1; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 ALL NULL NULL NULL NULL 4 Using where +#This can't be eliminated: +explain +select t1.* +from +t1 +left join +t2 +on (t2.pk=t2.c and t2.b=t1.a and t2.c=t1.b) or +(t2.pk=t2.c and t1.a=t1.b and t2.c=t1.b) +where t1.d=1; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 ALL NULL NULL NULL NULL 4 Using where +1 SIMPLE t2 eq_ref PRIMARY PRIMARY 4 test.t1.b 1 +explain +select t1.* +from +t1 +left join +t2 +on (t2.pk=t2.c and t2.b=t1.a and t2.c=t1.b) or +(t2.pk=t2.c and t2.c=t1.b) +; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 ALL NULL NULL NULL 
NULL 4 +explain +select t1.* +from t1 left join t2 on t2.pk=3 or t2.pk= 4; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 ALL NULL NULL NULL NULL 4 +1 SIMPLE t2 index PRIMARY PRIMARY 4 NULL 2 Using index +explain +select t1.* +from t1 left join t2 on t2.pk=3 or t2.pk= 3; +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 ALL NULL NULL NULL NULL 4 +explain +select t1.* +from t1 left join t2 on (t2.pk=3 and t2.b=3) or (t2.pk= 4 and t2.b=3); +id select_type table type possible_keys key key_len ref rows Extra +1 SIMPLE t1 ALL NULL NULL NULL NULL 4 +1 SIMPLE t2 range PRIMARY PRIMARY 4 NULL 2 Using where +drop table t1, t2; diff --git a/mysql-test/r/union.result b/mysql-test/r/union.result index 44a3812725a..d81e80c96f9 100644 --- a/mysql-test/r/union.result +++ b/mysql-test/r/union.result @@ -522,7 +522,7 @@ id select_type table type possible_keys key key_len ref rows filtered Extra 2 UNION t2 const PRIMARY PRIMARY 4 const 1 100.00 NULL UNION RESULT <union1,2> ALL NULL NULL NULL NULL NULL NULL Warnings: -Note 1003 (select '1' AS `a`,'1' AS `b` from `test`.`t1` where ('1' = 1)) union (select '1' AS `a`,'10' AS `b` from `test`.`t2` where ('1' = 1)) +Note 1003 (select '1' AS `a`,'1' AS `b` from `test`.`t1` where 1) union (select '1' AS `a`,'10' AS `b` from `test`.`t2` where 1) (select * from t1 where a=5) union (select * from t2 where a=1); a b 1 10 diff --git a/mysql-test/r/variables.result b/mysql-test/r/variables.result index 558e6d98339..4582ccddd81 100644 --- a/mysql-test/r/variables.result +++ b/mysql-test/r/variables.result @@ -865,7 +865,7 @@ select @@query_prealloc_size = @test; @@query_prealloc_size = @test 1 set global sql_mode=repeat('a',80); -ERROR 42000: Variable 'sql_mode' can't be set to the value of 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' +ERROR 42000: Variable 'sql_mode' can't be set to the value of 
'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' End of 4.1 tests create table t1 (a int); select a into @x from t1; diff --git a/mysql-test/suite/pbxt/r/alter_table.result b/mysql-test/suite/pbxt/r/alter_table.result index 467bb54a2a9..7f9ad9665fe 100644 --- a/mysql-test/suite/pbxt/r/alter_table.result +++ b/mysql-test/suite/pbxt/r/alter_table.result @@ -126,23 +126,23 @@ key (n4, n1, n2, n3) ); alter table t1 disable keys; show keys from t1; Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment -t1 0 n1 1 n1 NULL 0 NULL NULL BTREE -t1 1 n1_2 1 n1 NULL NULL NULL NULL BTREE -t1 1 n1_2 2 n2 NULL NULL NULL NULL YES BTREE -t1 1 n1_2 3 n3 NULL NULL NULL NULL YES BTREE -t1 1 n1_2 4 n4 NULL NULL NULL NULL YES BTREE -t1 1 n2 1 n2 NULL NULL NULL NULL YES BTREE -t1 1 n2 2 n3 NULL NULL NULL NULL YES BTREE -t1 1 n2 3 n4 NULL NULL NULL NULL YES BTREE -t1 1 n2 4 n1 NULL NULL NULL NULL BTREE -t1 1 n3 1 n3 NULL NULL NULL NULL YES BTREE -t1 1 n3 2 n4 NULL NULL NULL NULL YES BTREE -t1 1 n3 3 n1 NULL NULL NULL NULL BTREE -t1 1 n3 4 n2 NULL NULL NULL NULL YES BTREE -t1 1 n4 1 n4 NULL NULL NULL NULL YES BTREE -t1 1 n4 2 n1 NULL NULL NULL NULL BTREE -t1 1 n4 3 n2 NULL NULL NULL NULL YES BTREE -t1 1 n4 4 n3 NULL NULL NULL NULL YES BTREE +t1 0 n1 1 n1 A 0 NULL NULL BTREE +t1 1 n1_2 1 n1 A 0 NULL NULL BTREE +t1 1 n1_2 2 n2 A 0 NULL NULL YES BTREE +t1 1 n1_2 3 n3 A 0 NULL NULL YES BTREE +t1 1 n1_2 4 n4 A 0 NULL NULL YES BTREE +t1 1 n2 1 n2 A 0 NULL NULL YES BTREE +t1 1 n2 2 n3 A 0 NULL NULL YES BTREE +t1 1 n2 3 n4 A 0 NULL NULL YES BTREE +t1 1 n2 4 n1 A 0 NULL NULL BTREE +t1 1 n3 1 n3 A 0 NULL NULL YES BTREE +t1 1 n3 2 n4 A 0 NULL NULL YES BTREE +t1 1 n3 3 n1 A 0 NULL NULL BTREE +t1 1 n3 4 n2 A 0 NULL NULL YES BTREE +t1 1 n4 1 n4 A 0 NULL NULL YES BTREE +t1 1 n4 2 n1 A 0 NULL NULL BTREE +t1 1 n4 3 n2 A 0 NULL NULL YES BTREE +t1 1 n4 4 n3 A 0 NULL NULL YES BTREE insert into t1 
values(10,RAND()*1000,RAND()*1000,RAND()); insert into t1 values(9,RAND()*1000,RAND()*1000,RAND()); insert into t1 values(8,RAND()*1000,RAND()*1000,RAND()); @@ -156,23 +156,23 @@ insert into t1 values(1,RAND()*1000,RAND()*1000,RAND()); alter table t1 enable keys; show keys from t1; Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment -t1 0 n1 1 n1 NULL 10 NULL NULL BTREE -t1 1 n1_2 1 n1 NULL NULL NULL NULL BTREE -t1 1 n1_2 2 n2 NULL NULL NULL NULL YES BTREE -t1 1 n1_2 3 n3 NULL NULL NULL NULL YES BTREE -t1 1 n1_2 4 n4 NULL NULL NULL NULL YES BTREE -t1 1 n2 1 n2 NULL NULL NULL NULL YES BTREE -t1 1 n2 2 n3 NULL NULL NULL NULL YES BTREE -t1 1 n2 3 n4 NULL NULL NULL NULL YES BTREE -t1 1 n2 4 n1 NULL NULL NULL NULL BTREE -t1 1 n3 1 n3 NULL NULL NULL NULL YES BTREE -t1 1 n3 2 n4 NULL NULL NULL NULL YES BTREE -t1 1 n3 3 n1 NULL NULL NULL NULL BTREE -t1 1 n3 4 n2 NULL NULL NULL NULL YES BTREE -t1 1 n4 1 n4 NULL NULL NULL NULL YES BTREE -t1 1 n4 2 n1 NULL NULL NULL NULL BTREE -t1 1 n4 3 n2 NULL NULL NULL NULL YES BTREE -t1 1 n4 4 n3 NULL NULL NULL NULL YES BTREE +t1 0 n1 1 n1 A 10 NULL NULL BTREE +t1 1 n1_2 1 n1 A 10 NULL NULL BTREE +t1 1 n1_2 2 n2 A 10 NULL NULL YES BTREE +t1 1 n1_2 3 n3 A 10 NULL NULL YES BTREE +t1 1 n1_2 4 n4 A 10 NULL NULL YES BTREE +t1 1 n2 1 n2 A 10 NULL NULL YES BTREE +t1 1 n2 2 n3 A 10 NULL NULL YES BTREE +t1 1 n2 3 n4 A 10 NULL NULL YES BTREE +t1 1 n2 4 n1 A 10 NULL NULL BTREE +t1 1 n3 1 n3 A 10 NULL NULL YES BTREE +t1 1 n3 2 n4 A 10 NULL NULL YES BTREE +t1 1 n3 3 n1 A 10 NULL NULL BTREE +t1 1 n3 4 n2 A 10 NULL NULL YES BTREE +t1 1 n4 1 n4 A 10 NULL NULL YES BTREE +t1 1 n4 2 n1 A 10 NULL NULL BTREE +t1 1 n4 3 n2 A 10 NULL NULL YES BTREE +t1 1 n4 4 n3 A 10 NULL NULL YES BTREE drop table t1; create table t1 (i int unsigned not null auto_increment primary key); alter table t1 rename t2; @@ -286,17 +286,17 @@ insert into t1 values(1,1), (2,1), (3, 1); alter table t1 add unique (a,b), add key 
(b); show keys from t1; Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment -t1 0 a 1 a A NULL NULL NULL YES BTREE -t1 0 a 2 b A NULL NULL NULL YES BTREE -t1 1 b 1 b A NULL NULL NULL YES BTREE +t1 0 a 1 a A 300 NULL NULL YES BTREE +t1 0 a 2 b A 300 NULL NULL YES BTREE +t1 1 b 1 b A 300 NULL NULL YES BTREE analyze table t1; Table Op Msg_type Msg_text test.t1 analyze status OK show keys from t1; Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment -t1 0 a 1 a A NULL NULL NULL YES BTREE -t1 0 a 2 b A NULL NULL NULL YES BTREE -t1 1 b 1 b A NULL NULL NULL YES BTREE +t1 0 a 1 a A 300 NULL NULL YES BTREE +t1 0 a 2 b A 300 NULL NULL YES BTREE +t1 1 b 1 b A 300 NULL NULL YES BTREE drop table t1; CREATE TABLE t1 (i int(10), index(i) ); ALTER TABLE t1 DISABLE KEYS; @@ -545,37 +545,37 @@ drop table if exists t1; create table t1 (a int, key(a)); show indexes from t1; Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment -t1 1 a 1 a A NULL NULL NULL YES BTREE +t1 1 a 1 a A 0 NULL NULL YES BTREE "this used not to disable the index" alter table t1 modify a int, disable keys; show indexes from t1; Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment -t1 1 a 1 a A NULL NULL NULL YES BTREE +t1 1 a 1 a A 0 NULL NULL YES BTREE alter table t1 enable keys; show indexes from t1; Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment -t1 1 a 1 a NULL NULL NULL NULL YES BTREE +t1 1 a 1 a A 0 NULL NULL YES BTREE alter table t1 modify a bigint, disable keys; show indexes from t1; Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment -t1 1 a 1 a A NULL NULL NULL YES BTREE +t1 1 a 1 a A 0 NULL NULL YES BTREE alter table t1 enable keys; show 
indexes from t1; Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment -t1 1 a 1 a NULL NULL NULL NULL YES BTREE +t1 1 a 1 a A 0 NULL NULL YES BTREE alter table t1 add b char(10), disable keys; show indexes from t1; Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment -t1 1 a 1 a A NULL NULL NULL YES BTREE +t1 1 a 1 a A 0 NULL NULL YES BTREE alter table t1 add c decimal(10,2), enable keys; show indexes from t1; Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment -t1 1 a 1 a A NULL NULL NULL YES BTREE +t1 1 a 1 a A 0 NULL NULL YES BTREE "this however did" alter table t1 disable keys; show indexes from t1; Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment -t1 1 a 1 a NULL NULL NULL NULL YES BTREE +t1 1 a 1 a A 0 NULL NULL YES BTREE desc t1; Field Type Null Key Default Extra a bigint(20) YES MUL NULL @@ -585,7 +585,7 @@ alter table t1 add d decimal(15,5); "The key should still be disabled" show indexes from t1; Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment -t1 1 a 1 a A NULL NULL NULL YES BTREE +t1 1 a 1 a A 0 NULL NULL YES BTREE drop table t1; "Now will test with one unique index" create table t1(a int, b char(10), unique(a)); @@ -595,7 +595,7 @@ t1 0 a 1 a A 0 NULL NULL YES BTREE alter table t1 disable keys; show indexes from t1; Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment -t1 0 a 1 a NULL 0 NULL NULL YES BTREE +t1 0 a 1 a A 0 NULL NULL YES BTREE alter table t1 enable keys; "If no copy on noop change, this won't touch the data file" "Unique index, no change" @@ -623,12 +623,12 @@ create table t1(a int, b char(10), unique(a), key(b)); show indexes from t1; Table Non_unique Key_name Seq_in_index 
Column_name Collation Cardinality Sub_part Packed Null Index_type Comment t1 0 a 1 a A 0 NULL NULL YES BTREE -t1 1 b 1 b A NULL NULL NULL YES BTREE +t1 1 b 1 b A 0 NULL NULL YES BTREE alter table t1 disable keys; show indexes from t1; Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment -t1 0 a 1 a NULL 0 NULL NULL YES BTREE -t1 1 b 1 b NULL NULL NULL NULL YES BTREE +t1 0 a 1 a A 0 NULL NULL YES BTREE +t1 1 b 1 b A 0 NULL NULL YES BTREE alter table t1 enable keys; "If no copy on noop change, this won't touch the data file" "The non-unique index will be disabled" @@ -636,31 +636,31 @@ alter table t1 modify a int, disable keys; show indexes from t1; Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment t1 0 a 1 a A 0 NULL NULL YES BTREE -t1 1 b 1 b A NULL NULL NULL YES BTREE +t1 1 b 1 b A 0 NULL NULL YES BTREE alter table t1 enable keys; show indexes from t1; Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment -t1 0 a 1 a NULL 0 NULL NULL YES BTREE -t1 1 b 1 b NULL NULL NULL NULL YES BTREE +t1 0 a 1 a A 0 NULL NULL YES BTREE +t1 1 b 1 b A 0 NULL NULL YES BTREE "Change the type implying data copy" "The non-unique index will be disabled" alter table t1 modify a bigint, disable keys; show indexes from t1; Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment t1 0 a 1 a A 0 NULL NULL YES BTREE -t1 1 b 1 b A NULL NULL NULL YES BTREE +t1 1 b 1 b A 0 NULL NULL YES BTREE "Change again the type, but leave the indexes as_is" alter table t1 modify a int; show indexes from t1; Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment t1 0 a 1 a A 0 NULL NULL YES BTREE -t1 1 b 1 b A NULL NULL NULL YES BTREE +t1 1 b 1 b A 0 NULL NULL YES BTREE "Try the same. 
When data is no copied on similar tables, this is noop" alter table t1 modify a int; show indexes from t1; Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment t1 0 a 1 a A 0 NULL NULL YES BTREE -t1 1 b 1 b A NULL NULL NULL YES BTREE +t1 1 b 1 b A 0 NULL NULL YES BTREE drop table t1; create database mysqltest; create table t1 (c1 int); @@ -697,11 +697,11 @@ DROP TABLE IF EXISTS bug24219_2; CREATE TABLE bug24219 (a INT, INDEX(a)); SHOW INDEX FROM bug24219; Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment -bug24219 1 a 1 a A NULL NULL NULL YES BTREE +bug24219 1 a 1 a A 0 NULL NULL YES BTREE ALTER TABLE bug24219 RENAME TO bug24219_2, DISABLE KEYS; SHOW INDEX FROM bug24219_2; Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment -bug24219_2 1 a 1 a A NULL NULL NULL YES BTREE +bug24219_2 1 a 1 a A 0 NULL NULL YES BTREE DROP TABLE bug24219_2; create table t1 (mycol int(10) not null); alter table t1 alter column mycol set default 0; @@ -882,7 +882,7 @@ int_field int(10) unsigned NO MUL NULL char_field char(10) YES NULL SHOW INDEXES FROM t1; Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment -t1 1 int_field 1 int_field A NULL NULL NULL BTREE +t1 1 int_field 1 int_field A 0 NULL NULL BTREE INSERT INTO t1 VALUES (1, "edno"), (1, "edno"), (2, "dve"), (3, "tri"), (5, "pet"); "Non-copy data change - new frm, but old data and index files" ALTER TABLE t1 diff --git a/mysql-test/suite/pbxt/r/analyze.result b/mysql-test/suite/pbxt/r/analyze.result index 4e769b6c5b5..d2e7fc29d3a 100644 --- a/mysql-test/suite/pbxt/r/analyze.result +++ b/mysql-test/suite/pbxt/r/analyze.result @@ -56,5 +56,5 @@ Table Op Msg_type Msg_text test.t1 analyze status OK show index from t1; Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality 
Sub_part Packed Null Index_type Comment -t1 1 a 1 a A NULL NULL NULL YES BTREE +t1 1 a 1 a A 5 NULL NULL YES BTREE drop table t1; diff --git a/mysql-test/suite/pbxt/r/auto_increment.result b/mysql-test/suite/pbxt/r/auto_increment.result index a945ecebbcc..51c272a4414 100644 --- a/mysql-test/suite/pbxt/r/auto_increment.result +++ b/mysql-test/suite/pbxt/r/auto_increment.result @@ -229,7 +229,8 @@ a b 204 7 delete from t1 where a=0; update t1 set a=NULL where b=6; -ERROR 23000: Column 'a' cannot be null +Warnings: +Warning 1048 Column 'a' cannot be null update t1 set a=300 where b=7; SET SQL_MODE=''; insert into t1(a,b)values(NULL,8); @@ -244,7 +245,7 @@ a b 1 1 200 2 201 4 -203 6 +0 6 300 7 301 8 400 9 @@ -260,7 +261,6 @@ a b 1 1 200 2 201 4 -203 6 300 7 301 8 400 9 @@ -271,20 +271,20 @@ a b 405 14 delete from t1 where a=0; update t1 set a=NULL where b=13; -ERROR 23000: Column 'a' cannot be null +Warnings: +Warning 1048 Column 'a' cannot be null update t1 set a=500 where b=14; select * from t1 order by b; a b 1 1 200 2 201 4 -203 6 300 7 301 8 400 9 401 10 402 11 -404 13 +0 13 500 14 drop table t1; create table t1 (a bigint); diff --git a/mysql-test/suite/pbxt/r/delete.result b/mysql-test/suite/pbxt/r/delete.result index 9d337a1ed34..eb4a4ae78d5 100644 --- a/mysql-test/suite/pbxt/r/delete.result +++ b/mysql-test/suite/pbxt/r/delete.result @@ -125,18 +125,19 @@ a b 0 11 2 12 delete ignore t11.*, t12.* from t11,t12 where t11.a = t12.a and t11.b <> (select b from t2 where t11.a < t2.a); -Warnings: -Error 1242 Subquery returns more than 1 row -Error 1242 Subquery returns more than 1 row +ERROR 21000: Subquery returns more than 1 row select * from t11; a b 0 10 1 11 +2 12 select * from t12; a b 33 10 0 11 +2 12 insert into t11 values (2, 12); +ERROR 23000: Duplicate entry '2' for key 'PRIMARY' delete from t11 where t11.b <> (select b from t2 where t11.a < t2.a); ERROR 21000: Subquery returns more than 1 row select * from t11; @@ -145,13 +146,12 @@ a b 1 11 2 12 delete 
ignore from t11 where t11.b <> (select b from t2 where t11.a < t2.a); -Warnings: -Error 1242 Subquery returns more than 1 row -Error 1242 Subquery returns more than 1 row +ERROR 21000: Subquery returns more than 1 row select * from t11; a b 0 10 1 11 +2 12 drop table t11, t12, t2; create table t1 (a int, b int, unique key (a), key (b)); insert into t1 values (3, 3), (7, 7); diff --git a/mysql-test/suite/pbxt/r/distinct.result b/mysql-test/suite/pbxt/r/distinct.result index 7da52ea4bd1..c3e8342a6e1 100644 --- a/mysql-test/suite/pbxt/r/distinct.result +++ b/mysql-test/suite/pbxt/r/distinct.result @@ -174,8 +174,8 @@ INSERT INTO t3 VALUES (1,'1'),(2,'2'),(1,'1'),(2,'2'); explain SELECT distinct t3.a FROM t3,t2,t1 WHERE t3.a=t1.b AND t1.a=t2.a; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t1 ALL PRIMARY NULL NULL NULL 4 Using temporary -1 SIMPLE t3 ref a a 5 test.t1.b 2 Using where; Using index -1 SIMPLE t2 index a a 4 NULL 5 Using where; Using index; Distinct; Using join buffer +1 SIMPLE t2 ref a a 4 test.t1.a 1 Using index +1 SIMPLE t3 ref a a 5 test.t1.b 1 Using where; Using index SELECT distinct t3.a FROM t3,t2,t1 WHERE t3.a=t1.b AND t1.a=t2.a; a 1 @@ -190,7 +190,7 @@ insert into t3 select * from t4; explain select distinct t1.a from t1,t3 where t1.a=t3.a; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t1 index PRIMARY PRIMARY 4 NULL 4 Using index; Using temporary -1 SIMPLE t3 ref a a 5 test.t1.a 11 Using where; Using index; Distinct +1 SIMPLE t3 ref a a 5 test.t1.a 1 Using where; Using index; Distinct select distinct t1.a from t1,t3 where t1.a=t3.a; a 1 @@ -212,7 +212,7 @@ id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t1 index NULL PRIMARY 4 NULL 1 Using index explain SELECT distinct a from t3 order by a desc limit 2; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t3 index NULL a 5 NULL 40 Using index +1 SIMPLE t3 index NULL a 5 NULL 2 Using 
index explain SELECT distinct a,b from t3 order by a+1; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t3 ALL NULL NULL NULL NULL 204 Using temporary; Using filesort diff --git a/mysql-test/suite/pbxt/r/func_group.result b/mysql-test/suite/pbxt/r/func_group.result index d5f804dee03..d1a0d09ad09 100644 --- a/mysql-test/suite/pbxt/r/func_group.result +++ b/mysql-test/suite/pbxt/r/func_group.result @@ -61,7 +61,7 @@ grp sum NULL NULL 1 7 2 20.25 -3 45.483163247594 +3 45.4831632475944 create table t2 (grp int, a bigint unsigned, c char(10)); insert into t2 select grp,max(a)+max(grp),max(c) from t1 group by grp; replace into t2 select grp, a, c from t1 limit 2,1; @@ -613,8 +613,8 @@ id select_type table type possible_keys key key_len ref rows Extra explain select max(t1.a3), min(t2.a2) from t1, t2 where t1.a2 = 2 and t1.a3 < 'MIN' and t2.a3 > 'CA'; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t2 range k1 k1 3 NULL 1 Using where; Using index -1 SIMPLE t1 range k1 k1 7 NULL 1 Using where; Using index; Using join buffer +1 SIMPLE t1 range k1 k1 7 NULL 1 Using where; Using index +1 SIMPLE t2 range k1 k1 3 NULL 1 Using where; Using index; Using join buffer explain select min(a4 - 0.01) from t1; id select_type table type possible_keys key key_len ref rows Extra @@ -1186,7 +1186,7 @@ std(s1/s2) 0.21325764 select std(o1/o2) from bug22555; std(o1/o2) -0.21325763586649 +0.213257635866493 select std(e1/e2) from bug22555; std(e1/e2) 0.21325764 @@ -1212,7 +1212,7 @@ round(std(s1/s2), 17) 0.21325763586649341 select std(o1/o2) from bug22555; std(o1/o2) -0.21325763586649 +0.213257635866493 select round(std(e1/e2), 17) from bug22555; round(std(e1/e2), 17) 0.21325763586649341 @@ -1237,7 +1237,7 @@ round(std(s1/s2), 17) 0.21325763586649341 select std(o1/o2) from bug22555; std(o1/o2) -0.21325763586649 +0.213257635866493 select round(std(e1/e2), 17) from bug22555; round(std(e1/e2), 17) 0.21325763586649341 diff --git 
a/mysql-test/suite/pbxt/r/func_math.result b/mysql-test/suite/pbxt/r/func_math.result index d1cdb7cb76e..d4ce452ef51 100644 --- a/mysql-test/suite/pbxt/r/func_math.result +++ b/mysql-test/suite/pbxt/r/func_math.result @@ -60,7 +60,7 @@ Warnings: Note 1003 select ln(exp(10)) AS `ln(exp(10))`,exp((ln(sqrt(10)) * 2)) AS `exp(ln(sqrt(10))*2)`,ln(-(1)) AS `ln(-1)`,ln(0) AS `ln(0)`,ln(NULL) AS `ln(NULL)` select log2(8),log2(15),log2(-2),log2(0),log2(NULL); log2(8) log2(15) log2(-2) log2(0) log2(NULL) -3 3.9068905956085 NULL NULL NULL +3 3.90689059560852 NULL NULL NULL explain extended select log2(8),log2(15),log2(-2),log2(0),log2(NULL); id select_type table type possible_keys key key_len ref rows filtered Extra 1 SIMPLE NULL NULL NULL NULL NULL NULL NULL NULL No tables used @@ -68,7 +68,7 @@ Warnings: Note 1003 select log2(8) AS `log2(8)`,log2(15) AS `log2(15)`,log2(-(2)) AS `log2(-2)`,log2(0) AS `log2(0)`,log2(NULL) AS `log2(NULL)` select log10(100),log10(18),log10(-4),log10(0),log10(NULL); log10(100) log10(18) log10(-4) log10(0) log10(NULL) -2 1.2552725051033 NULL NULL NULL +2 1.25527250510331 NULL NULL NULL explain extended select log10(100),log10(18),log10(-4),log10(0),log10(NULL); id select_type table type possible_keys key key_len ref rows filtered Extra 1 SIMPLE NULL NULL NULL NULL NULL NULL NULL NULL No tables used @@ -85,7 +85,7 @@ Note 1003 select pow(10,log10(10)) AS `pow(10,log10(10))`,pow(2,4) AS `power(2,4 set @@rand_seed1=10000000,@@rand_seed2=1000000; select rand(999999),rand(); rand(999999) rand() -0.014231365187309 0.028870999839968 +0.0142313651873091 0.028870999839968 explain extended select rand(999999),rand(); id select_type table type possible_keys key key_len ref rows filtered Extra 1 SIMPLE NULL NULL NULL NULL NULL NULL NULL NULL No tables used @@ -101,7 +101,7 @@ Warnings: Note 1003 select pi() AS `pi()`,format(sin((pi() / 2)),6) AS `format(sin(pi()/2),6)`,format(cos((pi() / 2)),6) AS `format(cos(pi()/2),6)`,format(abs(tan(pi())),6) AS 
`format(abs(tan(pi())),6)`,format((1 / tan(1)),6) AS `format(cot(1),6)`,format(asin(1),6) AS `format(asin(1),6)`,format(acos(0),6) AS `format(acos(0),6)`,format(atan(1),6) AS `format(atan(1),6)` select degrees(pi()),radians(360); degrees(pi()) radians(360) -180 6.2831853071796 +180 6.28318530717959 select format(atan(-2, 2), 6); format(atan(-2, 2), 6) -0.785398 diff --git a/mysql-test/suite/pbxt/r/func_str.result b/mysql-test/suite/pbxt/r/func_str.result index 59d7b23f9df..b783abb466f 100644 --- a/mysql-test/suite/pbxt/r/func_str.result +++ b/mysql-test/suite/pbxt/r/func_str.result @@ -1327,10 +1327,10 @@ cast(rtrim(ltrim(' 20.06 ')) as decimal(19,2)) 20.06 select conv("18383815659218730760",10,10) + 0; conv("18383815659218730760",10,10) + 0 -1.8383815659219e+19 +1.83838156592187e+19 select "18383815659218730760" + 0; "18383815659218730760" + 0 -1.8383815659219e+19 +1.83838156592187e+19 CREATE TABLE t1 (code varchar(10)); INSERT INTO t1 VALUES ('a12'), ('A12'), ('a13'); SELECT ASCII(code), code FROM t1 WHERE code='A12'; diff --git a/mysql-test/suite/pbxt/r/grant.result b/mysql-test/suite/pbxt/r/grant.result index 24a2b9b4d55..94f89f2fd87 100644 --- a/mysql-test/suite/pbxt/r/grant.result +++ b/mysql-test/suite/pbxt/r/grant.result @@ -457,7 +457,7 @@ Privilege Context Comment Alter Tables To alter the table Alter routine Functions,Procedures To alter or drop stored functions/procedures Create Databases,Tables,Indexes To create new databases and tables -Create routine Functions,Procedures To use CREATE FUNCTION/PROCEDURE +Create routine Databases To use CREATE FUNCTION/PROCEDURE Create temporary tables Databases To use CREATE TEMPORARY TABLE Create view Tables To create new views Create user Server Admin To create new users diff --git a/mysql-test/suite/pbxt/r/group_min_max.result b/mysql-test/suite/pbxt/r/group_min_max.result index e8bd3d98834..a1e2ce15743 100644 --- a/mysql-test/suite/pbxt/r/group_min_max.result +++ b/mysql-test/suite/pbxt/r/group_min_max.result @@ 
-133,34 +133,34 @@ Table Op Msg_type Msg_text test.t3 analyze status OK explain select a1, min(a2) from t1 group by a1; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 130 NULL 10 Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 130 NULL 129 Using index for group-by explain select a1, max(a2) from t1 group by a1; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 65 NULL 10 Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 65 NULL 129 Using index for group-by explain select a1, min(a2), max(a2) from t1 group by a1; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 130 NULL 10 Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 130 NULL 129 Using index for group-by explain select a1, a2, b, min(c), max(c) from t1 group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 147 NULL 10 Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 147 NULL 129 Using index for group-by explain select a1,a2,b,max(c),min(c) from t1 group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 147 NULL 10 Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 147 NULL 129 Using index for group-by explain select a1,a2,b,max(c),min(c) from t2 group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t2 range NULL idx_t2_1 # NULL # Using index for group-by explain select min(a2), a1, max(a2), min(a2), a1 from t1 group by a1; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 130 NULL 10 Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 130 NULL 129 Using index for group-by explain select a1, b, min(c), a1, max(c), b, a2, max(c), max(c) from t1 group by a1, a2, b; id select_type table type possible_keys 
key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 147 NULL 10 Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 147 NULL 129 Using index for group-by explain select min(a2) from t1 group by a1; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 130 NULL 10 Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 130 NULL 129 Using index for group-by explain select a2, min(c), max(c) from t1 group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 147 NULL 10 Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 147 NULL 129 Using index for group-by select a1, min(a2) from t1 group by a1; a1 min(a2) a a @@ -293,13 +293,13 @@ id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t1 range idx_t1_0,idx_t1_1,idx_t1_2 idx_t1_2 65 NULL 1 Using where explain select a1,a2,b, max(c) from t1 where a1 >= 'c' or a1 < 'b' group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range idx_t1_0,idx_t1_1,idx_t1_2 idx_t1_1 147 NULL 1 Using where; Using index for group-by +1 SIMPLE t1 range idx_t1_0,idx_t1_1,idx_t1_2 idx_t1_1 147 NULL 2 Using where; Using index for group-by explain select a1, max(c) from t1 where a1 >= 'c' or a1 < 'b' group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range idx_t1_0,idx_t1_1,idx_t1_2 idx_t1_1 147 NULL 1 Using where; Using index for group-by +1 SIMPLE t1 range idx_t1_0,idx_t1_1,idx_t1_2 idx_t1_1 147 NULL 2 Using where; Using index for group-by explain select a1,a2,b,min(c),max(c) from t1 where a1 >= 'c' or a2 < 'b' group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range idx_t1_0,idx_t1_1,idx_t1_2 idx_t1_1 147 NULL 10 Using where; Using index for group-by +1 SIMPLE t1 range idx_t1_0,idx_t1_1,idx_t1_2 idx_t1_1 147 NULL 129 Using where; Using index for group-by explain 
select a1,a2,b, max(c) from t1 where a1 = 'z' or a1 = 'b' or a1 = 'd' group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t1 range idx_t1_0,idx_t1_1,idx_t1_2 idx_t1_1 65 NULL 3 Using where; Using index @@ -669,40 +669,40 @@ d l421 d p422 explain select a1,a2,b,max(c),min(c) from t1 where (a2 = 'a') and (b = 'b') group by a1; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 147 NULL 10 Using where; Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 147 NULL 129 Using where; Using index for group-by explain select a1,max(c),min(c) from t1 where (a2 = 'a') and (b = 'b') group by a1; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 147 NULL 10 Using where; Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 147 NULL 129 Using where; Using index for group-by explain select a1,a2,b, max(c) from t1 where (b = 'b') group by a1,a2; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 147 NULL 10 Using where; Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 147 NULL 129 Using where; Using index for group-by explain select a1,a2,b,min(c),max(c) from t1 where (b = 'b') group by a1,a2; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 147 NULL 10 Using where; Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 147 NULL 129 Using where; Using index for group-by explain select a1,a2, max(c) from t1 where (b = 'b') group by a1,a2; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 147 NULL 10 Using where; Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 147 NULL 129 Using where; Using index for group-by explain select a1,a2,b,max(c),min(c) from t2 where (a2 = 'a') and (b = 'b') group by a1; id select_type table type possible_keys key key_len ref rows Extra -1 
SIMPLE t2 range NULL idx_t2_1 163 NULL 10 Using where; Using index for group-by +1 SIMPLE t2 range NULL idx_t2_1 163 NULL 165 Using where; Using index for group-by explain select a1,max(c),min(c) from t2 where (a2 = 'a') and (b = 'b') group by a1; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t2 range NULL idx_t2_1 163 NULL 10 Using where; Using index for group-by +1 SIMPLE t2 range NULL idx_t2_1 163 NULL 165 Using where; Using index for group-by explain select a1,a2,b, max(c) from t2 where (b = 'b') group by a1,a2; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t2 range NULL idx_t2_1 146 NULL 10 Using where; Using index for group-by +1 SIMPLE t2 range NULL idx_t2_1 146 NULL 165 Using where; Using index for group-by explain select a1,a2,b,min(c),max(c) from t2 where (b = 'b') group by a1,a2; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t2 range NULL idx_t2_1 163 NULL 10 Using where; Using index for group-by +1 SIMPLE t2 range NULL idx_t2_1 163 NULL 165 Using where; Using index for group-by explain select a1,a2, max(c) from t2 where (b = 'b') group by a1,a2; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t2 range NULL idx_t2_1 146 NULL 10 Using where; Using index for group-by +1 SIMPLE t2 range NULL idx_t2_1 146 NULL 165 Using where; Using index for group-by explain select a1,a2,b,max(c),min(c) from t3 where (a2 = 'a') and (b = 'b') group by a1; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t3 range NULL idx_t3_1 6 NULL 10 Using where; Using index for group-by +1 SIMPLE t3 range NULL idx_t3_1 6 NULL 193 Using where; Using index for group-by explain select a1,max(c),min(c) from t3 where (a2 = 'a') and (b = 'b') group by a1; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t3 range NULL idx_t3_1 6 NULL 10 Using where; Using index for group-by +1 SIMPLE t3 range NULL idx_t3_1 6 NULL 193 
Using where; Using index for group-by select a1,a2,b,max(c),min(c) from t1 where (a2 = 'a') and (b = 'b') group by a1; a1 a2 b max(c) min(c) a a b h112 e112 @@ -804,22 +804,22 @@ b h212 e212 c h312 e312 explain select a1,a2,b,min(c) from t2 where (a2 = 'a') and b is NULL group by a1; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t2 range NULL idx_t2_1 163 NULL 10 Using where; Using index for group-by +1 SIMPLE t2 range NULL idx_t2_1 163 NULL 165 Using where; Using index for group-by explain select a1,a2,b,max(c) from t2 where (a2 = 'a') and b is NULL group by a1; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t2 range NULL idx_t2_1 146 NULL 10 Using where; Using index for group-by +1 SIMPLE t2 range NULL idx_t2_1 146 NULL 165 Using where; Using index for group-by explain select a1,a2,b,min(c) from t2 where b is NULL group by a1,a2; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t2 range NULL idx_t2_1 163 NULL 10 Using where; Using index for group-by +1 SIMPLE t2 range NULL idx_t2_1 163 NULL 165 Using where; Using index for group-by explain select a1,a2,b,max(c) from t2 where b is NULL group by a1,a2; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t2 range NULL idx_t2_1 146 NULL 10 Using where; Using index for group-by +1 SIMPLE t2 range NULL idx_t2_1 146 NULL 165 Using where; Using index for group-by explain select a1,a2,b,min(c),max(c) from t2 where b is NULL group by a1,a2; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t2 range NULL idx_t2_1 163 NULL 10 Using where; Using index for group-by +1 SIMPLE t2 range NULL idx_t2_1 163 NULL 165 Using where; Using index for group-by explain select a1,a2,b,min(c),max(c) from t2 where b is NULL group by a1,a2; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t2 range NULL idx_t2_1 163 NULL 10 Using where; Using index for group-by +1 SIMPLE t2 range 
NULL idx_t2_1 163 NULL 165 Using where; Using index for group-by select a1,a2,b,min(c) from t2 where (a2 = 'a') and b is NULL group by a1; a1 a2 b min(c) a a NULL a777 @@ -849,49 +849,49 @@ id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t1 range NULL idx_t1_1 147 NULL # Using where; Using index for group-by explain select a1,a2,b,min(c),max(c) from t1 where (c > 'b1') group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 163 NULL 10 Using where; Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 163 NULL 129 Using where; Using index for group-by explain select a1,a2,b, max(c) from t1 where (c > 'f123') group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 147 NULL 10 Using where; Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 147 NULL 129 Using where; Using index for group-by explain select a1,a2,b,min(c),max(c) from t1 where (c > 'f123') group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 163 NULL 10 Using where; Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 163 NULL 129 Using where; Using index for group-by explain select a1,a2,b, max(c) from t1 where (c < 'a0') group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 163 NULL 10 Using where; Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 163 NULL 129 Using where; Using index for group-by explain select a1,a2,b,min(c),max(c) from t1 where (c < 'a0') group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 163 NULL 10 Using where; Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 163 NULL 129 Using where; Using index for group-by explain select a1,a2,b, max(c) from t1 where (c < 'k321') group by a1,a2,b; id select_type table 
type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 163 NULL 10 Using where; Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 163 NULL 129 Using where; Using index for group-by explain select a1,a2,b,min(c),max(c) from t1 where (c < 'k321') group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 163 NULL 10 Using where; Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 163 NULL 129 Using where; Using index for group-by explain select a1,a2,b, max(c) from t1 where (c < 'a0') or (c > 'b1') group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 163 NULL 10 Using where; Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 163 NULL 129 Using where; Using index for group-by explain select a1,a2,b,min(c),max(c) from t1 where (c < 'a0') or (c > 'b1') group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 163 NULL 10 Using where; Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 163 NULL 129 Using where; Using index for group-by explain select a1,a2,b, max(c) from t1 where (c > 'b1') or (c <= 'g1') group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 147 NULL 10 Using where; Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 147 NULL 129 Using where; Using index for group-by explain select a1,a2,b,min(c),max(c) from t1 where (c > 'b1') or (c <= 'g1') group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 147 NULL 10 Using where; Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 147 NULL 129 Using where; Using index for group-by explain select a1,a2,b,min(c),max(c) from t1 where (c > 'b111') and (c <= 'g112') group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra 
-1 SIMPLE t1 range NULL idx_t1_1 163 NULL 10 Using where; Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 163 NULL 129 Using where; Using index for group-by explain select a1,a2,b,min(c),max(c) from t1 where (c < 'c5') or (c = 'g412') or (c = 'k421') group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 163 NULL 10 Using where; Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 163 NULL 129 Using where; Using index for group-by explain select a1,a2,b,min(c),max(c) from t1 where ((c > 'b111') and (c <= 'g112')) or ((c > 'd000') and (c <= 'i110')) group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 163 NULL 10 Using where; Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 163 NULL 129 Using where; Using index for group-by explain select a1,a2,b,min(c),max(c) from t1 where (c between 'b111' and 'g112') or (c between 'd000' and 'i110') group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 163 NULL 10 Using where; Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 163 NULL 129 Using where; Using index for group-by explain select a1,a2,b, max(c) from t2 where (c > 'b1') group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t2 range NULL idx_t2_1 146 NULL # Using where; Using index for group-by @@ -1364,29 +1364,29 @@ explain select a1,a2,b,min(c),max(c) from t1 where exists ( select * from t2 where t2.c > 'b1' ) group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra -1 PRIMARY t1 range NULL idx_t1_1 147 NULL 10 Using index for group-by +1 PRIMARY t1 range NULL idx_t1_1 147 NULL 129 Using index for group-by 2 SUBQUERY t2 index NULL idx_t2_1 163 NULL 164 Using where; Using index explain select a1,a2,b,min(c),max(c) from t1 where (a1 >= 'c' or a2 < 'b') and (b > 'a') group by 
a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range idx_t1_0,idx_t1_1,idx_t1_2 idx_t1_1 147 NULL 10 Using where; Using index for group-by +1 SIMPLE t1 range idx_t1_0,idx_t1_1,idx_t1_2 idx_t1_1 147 NULL 129 Using where; Using index for group-by explain select a1,a2,b,min(c),max(c) from t1 where (a1 >= 'c' or a2 < 'b') and (c > 'b111') group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range idx_t1_0,idx_t1_1,idx_t1_2 idx_t1_1 163 NULL 10 Using where; Using index for group-by +1 SIMPLE t1 range idx_t1_0,idx_t1_1,idx_t1_2 idx_t1_1 163 NULL 129 Using where; Using index for group-by explain select a1,a2,b,min(c),max(c) from t1 where (a2 >= 'b') and (b = 'a') and (c > 'b111') group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 163 NULL 10 Using where; Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 163 NULL 129 Using where; Using index for group-by explain select a1,a2,b,min(c) from t1 where ((a1 > 'a') or (a1 < '9')) and ((a2 >= 'b') and (a2 < 'z')) and (b = 'a') and ((c < 'h112') or (c = 'j121') or (c > 'k121' and c < 'm122') or (c > 'o122')) group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range idx_t1_0,idx_t1_1,idx_t1_2 idx_t1_1 163 NULL 1 Using where; Using index for group-by +1 SIMPLE t1 range idx_t1_0,idx_t1_1,idx_t1_2 idx_t1_1 163 NULL 2 Using where; Using index for group-by explain select a1,a2,b,min(c) from t1 where ((a1 > 'a') or (a1 < '9')) and ((a2 >= 'b') and (a2 < 'z')) and (b = 'a') and ((c = 'j121') or (c > 'k121' and c < 'm122') or (c > 'o122') or (c < 'h112') or (c = 'c111')) group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range idx_t1_0,idx_t1_1,idx_t1_2 idx_t1_1 163 NULL 1 Using where; Using index for group-by +1 SIMPLE t1 range idx_t1_0,idx_t1_1,idx_t1_2 idx_t1_1 163 NULL 2 Using where; 
Using index for group-by explain select a1,a2,b,min(c) from t1 where (a1 > 'a') and (a2 > 'a') and (b = 'c') group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t1 range idx_t1_0,idx_t1_1,idx_t1_2 idx_t1_2 65 NULL 1 Using where explain select a1,a2,b,min(c) from t1 where (ord(a1) > 97) and (ord(a2) + ord(a1) > 194) and (b = 'c') group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 147 NULL 10 Using where; Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 147 NULL 129 Using where; Using index for group-by explain select a1,a2,b,min(c),max(c) from t2 where (a1 >= 'c' or a2 < 'b') and (b > 'a') group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t2 range idx_t2_0,idx_t2_1,idx_t2_2 idx_t2_1 163 NULL # Using where; Using index for group-by @@ -1491,13 +1491,13 @@ select a1,a2,b,min(c) from t2 where (a1 > 'a') and (a2 > 'a') and (b = 'c') grou a1 a2 b min(c) explain select a1,a2,b from t1 where (a1 >= 'c' or a2 < 'b') and (b > 'a') group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range idx_t1_0,idx_t1_1,idx_t1_2 idx_t1_1 147 NULL 10 Using where; Using index for group-by +1 SIMPLE t1 range idx_t1_0,idx_t1_1,idx_t1_2 idx_t1_1 147 NULL 129 Using where; Using index for group-by explain select a1,a2,b from t1 where (a2 >= 'b') and (b = 'a') group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 147 NULL 10 Using where; Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 147 NULL 129 Using where; Using index for group-by explain select a1,a2,b,c from t1 where (a2 >= 'b') and (b = 'a') and (c = 'i121') group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 163 NULL 10 Using where; Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 163 NULL 
129 Using where; Using index for group-by explain select a1,a2,b from t1 where (a1 > 'a') and (a2 > 'a') and (b = 'c') group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t1 range idx_t1_0,idx_t1_1,idx_t1_2 idx_t1_2 147 NULL 1 Using where @@ -1554,13 +1554,13 @@ select a1,a2,b from t2 where (a1 > 'a') and (a2 > 'a') and (b = 'c') group by a1 a1 a2 b explain select distinct a1,a2,b from t1; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 147 NULL 10 Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 147 NULL 129 Using index for group-by explain select distinct a1,a2,b from t1 where (a2 >= 'b') and (b = 'a'); id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 147 NULL 10 Using where; Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 147 NULL 129 Using where; Using index for group-by explain extended select distinct a1,a2,b,c from t1 where (a2 >= 'b') and (b = 'a') and (c = 'i121'); id select_type table type possible_keys key key_len ref rows filtered Extra -1 SIMPLE t1 range NULL idx_t1_1 163 NULL 10 100.00 Using where; Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 163 NULL 129 99.22 Using where; Using index for group-by Warnings: Note 1003 select distinct `test`.`t1`.`a1` AS `a1`,`test`.`t1`.`a2` AS `a2`,`test`.`t1`.`b` AS `b`,`test`.`t1`.`c` AS `c` from `test`.`t1` where ((`test`.`t1`.`c` = 'i121') and (`test`.`t1`.`b` = 'a') and (`test`.`t1`.`a2` >= 'b')) explain select distinct a1,a2,b from t1 where (a1 > 'a') and (a2 > 'a') and (b = 'c'); @@ -1577,7 +1577,7 @@ id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t2 range NULL idx_t2_2 146 NULL # Using where; Using index for group-by explain extended select distinct a1,a2,b,c from t2 where (a2 >= 'b') and (b = 'a') and (c = 'i121'); id select_type table type possible_keys key key_len ref rows filtered Extra -1 SIMPLE t2 
range NULL idx_t2_1 163 NULL 10 100.00 Using where; Using index for group-by +1 SIMPLE t2 range NULL idx_t2_1 163 NULL 165 99.39 Using where; Using index for group-by Warnings: Note 1003 select distinct `test`.`t2`.`a1` AS `a1`,`test`.`t2`.`a2` AS `a2`,`test`.`t2`.`b` AS `b`,`test`.`t2`.`c` AS `c` from `test`.`t2` where ((`test`.`t2`.`c` = 'i121') and (`test`.`t2`.`b` = 'a') and (`test`.`t2`.`a2` >= 'b')) explain select distinct a1,a2,b from t2 where (a1 > 'a') and (a2 > 'a') and (b = 'c'); @@ -1702,19 +1702,19 @@ c e d e explain select distinct a1,a2,b from t1; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 147 NULL 10 Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 147 NULL 129 Using index for group-by explain select distinct a1,a2,b from t1 where (a2 >= 'b') and (b = 'a') group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 147 NULL 10 Using where; Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 147 NULL 129 Using where; Using index for group-by explain select distinct a1,a2,b,c from t1 where (a2 >= 'b') and (b = 'a') and (c = 'i121') group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 163 NULL 10 Using where; Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 163 NULL 129 Using where; Using index for group-by explain select distinct a1,a2,b from t1 where (a1 > 'a') and (a2 > 'a') and (b = 'c') group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t1 range idx_t1_0,idx_t1_1,idx_t1_2 idx_t1_2 147 NULL 1 Using where explain select distinct b from t1 where (a2 >= 'b') and (b = 'a') group by a1,a2,b; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 147 NULL 10 Using where; Using index for group-by; Using temporary; Using filesort +1 SIMPLE t1 range NULL idx_t1_1 147 
NULL 129 Using where; Using index for group-by; Using temporary; Using filesort explain select distinct a1,a2,b from t2; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t2 range NULL idx_t2_2 146 NULL # Using index for group-by @@ -1846,7 +1846,7 @@ id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t1 range idx_t1_0,idx_t1_1,idx_t1_2 idx_t1_2 65 NULL 1 Using where explain select concat(ord(min(b)),ord(max(b))),min(b),max(b) from t1 group by a1,a2; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 147 NULL 10 Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 147 NULL 129 Using index for group-by select a1,a2,b, concat(min(c), max(c)) from t1 where a1 < 'd' group by a1,a2,b; a1 a2 b concat(min(c), max(c)) a a a a111d111 @@ -1985,7 +1985,7 @@ c d explain select a1 from t1 where a2 = 'b' group by a1; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 130 NULL 10 Using where; Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 130 NULL 129 Using where; Using index for group-by select a1 from t1 where a2 = 'b' group by a1; a1 a @@ -1994,7 +1994,7 @@ c d explain select distinct a1 from t1 where a2 = 'b'; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx_t1_1 130 NULL 10 Using where; Using index for group-by +1 SIMPLE t1 range NULL idx_t1_1 130 NULL 129 Using where; Using index for group-by select distinct a1 from t1 where a2 = 'b'; a1 a @@ -2188,7 +2188,7 @@ INSERT INTO t1 (a, b) VALUES (1,1), (1,2), (1,3), (1,4), (1,5), (2,2), (2,3), (2,1), (3,1), (4,1), (4,2), (4,3), (4,4), (4,5), (4,6); EXPLAIN SELECT max(b), a FROM t1 GROUP BY a; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL a 5 NULL 8 Using index for group-by +1 SIMPLE t1 index NULL a 10 NULL 15 Using index FLUSH STATUS; SELECT max(b), a FROM t1 GROUP BY a; max(b) 
a @@ -2202,7 +2202,7 @@ Handler_read_key 0 Handler_read_next 0 EXPLAIN SELECT max(b), a FROM t1 GROUP BY a; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL a 5 NULL 8 Using index for group-by +1 SIMPLE t1 index NULL a 10 NULL 15 Using index FLUSH STATUS; CREATE TABLE t2 SELECT max(b), a FROM t1 GROUP BY a; SHOW STATUS LIKE 'handler_read__e%'; @@ -2235,14 +2235,14 @@ Handler_read_next 0 EXPLAIN (SELECT max(b), a FROM t1 GROUP BY a) UNION (SELECT max(b), a FROM t1 GROUP BY a); id select_type table type possible_keys key key_len ref rows Extra -1 PRIMARY t1 range NULL a 5 NULL 8 Using index for group-by -2 UNION t1 range NULL a 5 NULL 8 Using index for group-by +1 PRIMARY t1 index NULL a 10 NULL 15 Using index +2 UNION t1 index NULL a 10 NULL 15 Using index NULL UNION RESULT <union1,2> ALL NULL NULL NULL NULL NULL EXPLAIN SELECT (SELECT max(b) FROM t1 GROUP BY a HAVING a < 2) x FROM t1 AS t1_outer; id select_type table type possible_keys key key_len ref rows Extra 1 PRIMARY t1_outer index NULL a 10 NULL 15 Using index -2 SUBQUERY t1 range NULL a 5 NULL 8 Using index for group-by +2 SUBQUERY t1 index NULL a 10 NULL 15 Using index EXPLAIN SELECT 1 FROM t1 AS t1_outer WHERE EXISTS (SELECT max(b) FROM t1 GROUP BY a HAVING a < 2); id select_type table type possible_keys key key_len ref rows Extra @@ -2252,7 +2252,7 @@ EXPLAIN SELECT 1 FROM t1 AS t1_outer WHERE (SELECT max(b) FROM t1 GROUP BY a HAVING a < 2) > 12; id select_type table type possible_keys key key_len ref rows Extra 1 PRIMARY NULL NULL NULL NULL NULL NULL NULL Impossible WHERE -2 SUBQUERY t1 range NULL a 5 NULL 8 Using index for group-by +2 SUBQUERY t1 index NULL a 10 NULL 15 Using index EXPLAIN SELECT 1 FROM t1 AS t1_outer WHERE a IN (SELECT max(b) FROM t1 GROUP BY a HAVING a < 2); id select_type table type possible_keys key key_len ref rows Extra @@ -2261,21 +2261,21 @@ id select_type table type possible_keys key key_len ref rows Extra EXPLAIN SELECT 1 FROM t1 AS 
t1_outer GROUP BY a HAVING a > (SELECT max(b) FROM t1 GROUP BY a HAVING a < 2); id select_type table type possible_keys key key_len ref rows Extra -1 PRIMARY t1_outer range NULL a 5 NULL 8 Using index for group-by -2 SUBQUERY t1 range NULL a 5 NULL 8 Using index for group-by +1 PRIMARY t1_outer index NULL a 10 NULL 15 Using index +2 SUBQUERY t1 index NULL a 10 NULL 15 Using index EXPLAIN SELECT 1 FROM t1 AS t1_outer1 JOIN t1 AS t1_outer2 ON t1_outer1.a = (SELECT max(b) FROM t1 GROUP BY a HAVING a < 2) AND t1_outer1.b = t1_outer2.b; id select_type table type possible_keys key key_len ref rows Extra 1 PRIMARY t1_outer1 ref a a 5 const 1 Using where; Using index 1 PRIMARY t1_outer2 index NULL a 10 NULL 15 Using where; Using index; Using join buffer -2 SUBQUERY t1 range NULL a 5 NULL 8 Using index for group-by +2 SUBQUERY t1 index NULL a 10 NULL 15 Using index EXPLAIN SELECT (SELECT (SELECT max(b) FROM t1 GROUP BY a HAVING a < 2) x FROM t1 AS t1_outer) x2 FROM t1 AS t1_outer2; id select_type table type possible_keys key key_len ref rows Extra 1 PRIMARY t1_outer2 index NULL a 10 NULL 15 Using index 2 SUBQUERY t1_outer index NULL a 10 NULL 15 Using index -3 SUBQUERY t1 range NULL a 5 NULL 8 Using index for group-by +3 SUBQUERY t1 index NULL a 10 NULL 15 Using index CREATE TABLE t3 LIKE t1; FLUSH STATUS; INSERT INTO t3 SELECT a,MAX(b) FROM t1 GROUP BY a; @@ -2312,7 +2312,7 @@ INSERT INTO t1 VALUES (4), (2), (1), (2), (2), (4), (1), (4); EXPLAIN SELECT DISTINCT(a) FROM t1; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx 5 NULL 9 Using index for group-by +1 SIMPLE t1 index NULL idx 5 NULL 16 Using index SELECT DISTINCT(a) FROM t1; a 1 @@ -2320,7 +2320,7 @@ a 4 EXPLAIN SELECT SQL_BIG_RESULT DISTINCT(a) FROM t1; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL idx 5 NULL 9 Using index for group-by +1 SIMPLE t1 index NULL idx 5 NULL 16 Using index SELECT SQL_BIG_RESULT DISTINCT(a) FROM t1; 
a 1 @@ -2345,7 +2345,7 @@ CREATE INDEX break_it ON t1 (a, b); EXPLAIN SELECT a, MIN(b), MAX(b) FROM t1 GROUP BY a ORDER BY a; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL break_it 10 NULL 7 Using index for group-by +1 SIMPLE t1 index NULL break_it 10 NULL 12 Using index SELECT a, MIN(b), MAX(b) FROM t1 GROUP BY a ORDER BY a; a MIN(b) MAX(b) 1 1 3 @@ -2355,7 +2355,7 @@ a MIN(b) MAX(b) EXPLAIN SELECT a, MIN(b), MAX(b) FROM t1 GROUP BY a ORDER BY a DESC; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 range NULL break_it 10 NULL 7 Using index for group-by; Using temporary; Using filesort +1 SIMPLE t1 index NULL break_it 10 NULL 12 Using index SELECT a, MIN(b), MAX(b) FROM t1 GROUP BY a ORDER BY a DESC; a MIN(b) MAX(b) 4 1 3 diff --git a/mysql-test/suite/pbxt/r/join.result b/mysql-test/suite/pbxt/r/join.result index 3adcb4fd27a..a74ee3d3b35 100644 --- a/mysql-test/suite/pbxt/r/join.result +++ b/mysql-test/suite/pbxt/r/join.result @@ -774,7 +774,7 @@ insert into t3 select * from t2 where a < 800; explain select * from t2,t3 where t2.a < 200 and t2.b=t3.b; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t2 range a,b a 5 NULL 1 Using where -1 SIMPLE t3 ref b b 5 test.t2.b 11 Using where +1 SIMPLE t3 ref b b 5 test.t2.b 1 Using where drop table t1, t2, t3; create table t1 (a int); insert into t1 values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9); diff --git a/mysql-test/suite/pbxt/r/join_nested.result b/mysql-test/suite/pbxt/r/join_nested.result index a2a2bc6299f..92dc4f8acf2 100644 --- a/mysql-test/suite/pbxt/r/join_nested.result +++ b/mysql-test/suite/pbxt/r/join_nested.result @@ -851,7 +851,7 @@ ON t3.a=1 AND t3.b=t2.b AND t2.b=t4.b; id select_type table type possible_keys key key_len ref rows filtered Extra 1 SIMPLE t3 ALL NULL NULL NULL NULL 2 100.00 1 SIMPLE t4 ALL NULL NULL NULL NULL 2 100.00 Using join buffer -1 SIMPLE t2 ref idx_b idx_b 5 test.t3.b 2 100.00 +1 
SIMPLE t2 ref idx_b idx_b 5 test.t3.b 1 100.00 1 SIMPLE t1 ALL NULL NULL NULL NULL 3 100.00 Warnings: Note 1003 select `test`.`t2`.`a` AS `a`,`test`.`t2`.`b` AS `b`,`test`.`t3`.`a` AS `a`,`test`.`t3`.`b` AS `b`,`test`.`t4`.`a` AS `a`,`test`.`t4`.`b` AS `b` from `test`.`t3` join `test`.`t4` left join (`test`.`t1` join `test`.`t2`) on(((`test`.`t3`.`a` = 1) and (`test`.`t3`.`b` = `test`.`t2`.`b`) and (`test`.`t2`.`b` = `test`.`t4`.`b`))) where 1 @@ -958,15 +958,15 @@ id select_type table type possible_keys key key_len ref rows filtered Extra 1 SIMPLE t0 ALL NULL NULL NULL NULL 3 100.00 Using where 1 SIMPLE t1 ALL NULL NULL NULL NULL 3 100.00 Using where; Using join buffer 1 SIMPLE t2 ALL NULL NULL NULL NULL 3 100.00 Using where +1 SIMPLE t4 ref idx_b idx_b 5 test.t2.b 1 100.00 1 SIMPLE t3 ALL NULL NULL NULL NULL 2 100.00 Using where -1 SIMPLE t4 ref idx_b idx_b 5 test.t2.b 2 100.00 Using where 1 SIMPLE t5 ALL idx_b NULL NULL NULL 3 100.00 Using where 1 SIMPLE t7 ALL NULL NULL NULL NULL 2 100.00 Using where 1 SIMPLE t6 ALL NULL NULL NULL NULL 3 100.00 Using where 1 SIMPLE t8 ALL NULL NULL NULL NULL 2 100.00 Using where 1 SIMPLE t9 ALL NULL NULL NULL NULL 3 100.00 Using where; Using join buffer Warnings: -Note 1003 select `test`.`t0`.`a` AS `a`,`test`.`t0`.`b` AS `b`,`test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b`,`test`.`t2`.`a` AS `a`,`test`.`t2`.`b` AS `b`,`test`.`t3`.`a` AS `a`,`test`.`t3`.`b` AS `b`,`test`.`t4`.`a` AS `a`,`test`.`t4`.`b` AS `b`,`test`.`t5`.`a` AS `a`,`test`.`t5`.`b` AS `b`,`test`.`t6`.`a` AS `a`,`test`.`t6`.`b` AS `b`,`test`.`t7`.`a` AS `a`,`test`.`t7`.`b` AS `b`,`test`.`t8`.`a` AS `a`,`test`.`t8`.`b` AS `b`,`test`.`t9`.`a` AS `a`,`test`.`t9`.`b` AS `b` from `test`.`t0` join `test`.`t1` left join (`test`.`t2` left join (`test`.`t3` join `test`.`t4`) on(((`test`.`t4`.`b` = `test`.`t2`.`b`) and (`test`.`t3`.`a` = 1))) join `test`.`t5` left join (`test`.`t6` join `test`.`t7` left join `test`.`t8` on(((`test`.`t8`.`b` = `test`.`t5`.`b`) and 
(`test`.`t6`.`b` < 10)))) on(((`test`.`t7`.`b` = `test`.`t5`.`b`) and (`test`.`t6`.`b` >= 2)))) on((((`test`.`t3`.`b` = 2) or isnull(`test`.`t3`.`c`)) and ((`test`.`t6`.`b` = 2) or isnull(`test`.`t6`.`c`)) and ((`test`.`t5`.`b` = `test`.`t0`.`b`) or isnull(`test`.`t3`.`c`) or isnull(`test`.`t6`.`c`) or isnull(`test`.`t8`.`c`)) and (`test`.`t1`.`a` <> 2))) join `test`.`t9` where ((`test`.`t9`.`a` = 1) and (`test`.`t1`.`b` = `test`.`t0`.`b`) and (`test`.`t0`.`a` = 1) and ((`test`.`t2`.`a` >= 4) or isnull(`test`.`t2`.`c`)) and ((`test`.`t3`.`a` < 5) or isnull(`test`.`t3`.`c`)) and ((`test`.`t4`.`b` = `test`.`t3`.`b`) or isnull(`test`.`t3`.`c`) or isnull(`test`.`t4`.`c`)) and ((`test`.`t5`.`a` >= 2) or isnull(`test`.`t5`.`c`)) and ((`test`.`t6`.`a` >= 4) or isnull(`test`.`t6`.`c`)) and ((`test`.`t7`.`a` <= 2) or isnull(`test`.`t7`.`c`)) and ((`test`.`t8`.`a` < 1) or isnull(`test`.`t8`.`c`)) and ((`test`.`t9`.`b` = `test`.`t8`.`b`) or isnull(`test`.`t8`.`c`))) +Note 1003 select `test`.`t0`.`a` AS `a`,`test`.`t0`.`b` AS `b`,`test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b`,`test`.`t2`.`a` AS `a`,`test`.`t2`.`b` AS `b`,`test`.`t3`.`a` AS `a`,`test`.`t3`.`b` AS `b`,`test`.`t4`.`a` AS `a`,`test`.`t4`.`b` AS `b`,`test`.`t5`.`a` AS `a`,`test`.`t5`.`b` AS `b`,`test`.`t6`.`a` AS `a`,`test`.`t6`.`b` AS `b`,`test`.`t7`.`a` AS `a`,`test`.`t7`.`b` AS `b`,`test`.`t8`.`a` AS `a`,`test`.`t8`.`b` AS `b`,`test`.`t9`.`a` AS `a`,`test`.`t9`.`b` AS `b` from `test`.`t0` join `test`.`t1` left join (`test`.`t2` left join (`test`.`t3` join `test`.`t4`) on(((`test`.`t4`.`b` = `test`.`t2`.`b`) and (`test`.`t3`.`a` = 1))) join `test`.`t5` left join (`test`.`t6` join `test`.`t7` left join `test`.`t8` on(((`test`.`t8`.`b` = `test`.`t5`.`b`) and (`test`.`t6`.`b` < 10)))) on(((`test`.`t7`.`b` = `test`.`t5`.`b`) and (`test`.`t6`.`b` >= 2)))) on((((`test`.`t3`.`b` = 2) or isnull(`test`.`t3`.`c`)) and ((`test`.`t6`.`b` = 2) or isnull(`test`.`t6`.`c`)) and ((`test`.`t5`.`b` = `test`.`t0`.`b`) or 
isnull(`test`.`t3`.`c`) or isnull(`test`.`t6`.`c`) or isnull(`test`.`t8`.`c`)) and (`test`.`t1`.`a` <> 2))) join `test`.`t9` where ((`test`.`t9`.`a` = 1) and (`test`.`t1`.`b` = `test`.`t0`.`b`) and (`test`.`t0`.`a` = 1) and ((`test`.`t2`.`a` >= 4) or isnull(`test`.`t2`.`c`)) and ((`test`.`t3`.`a` < 5) or isnull(`test`.`t3`.`c`)) and ((`test`.`t3`.`b` = `test`.`t4`.`b`) or isnull(`test`.`t3`.`c`) or isnull(`test`.`t4`.`c`)) and ((`test`.`t5`.`a` >= 2) or isnull(`test`.`t5`.`c`)) and ((`test`.`t6`.`a` >= 4) or isnull(`test`.`t6`.`c`)) and ((`test`.`t7`.`a` <= 2) or isnull(`test`.`t7`.`c`)) and ((`test`.`t8`.`a` < 1) or isnull(`test`.`t8`.`c`)) and ((`test`.`t9`.`b` = `test`.`t8`.`b`) or isnull(`test`.`t8`.`c`))) CREATE INDEX idx_b ON t8(b); EXPLAIN EXTENDED SELECT t0.a,t0.b,t1.a,t1.b,t2.a,t2.b,t3.a,t3.b,t4.a,t4.b, @@ -1008,14 +1008,14 @@ id select_type table type possible_keys key key_len ref rows filtered Extra 1 SIMPLE t1 ALL NULL NULL NULL NULL 3 100.00 Using where; Using join buffer 1 SIMPLE t2 ALL NULL NULL NULL NULL 3 100.00 Using where 1 SIMPLE t3 ALL NULL NULL NULL NULL 2 100.00 Using where -1 SIMPLE t4 ref idx_b idx_b 5 test.t2.b 2 100.00 Using where +1 SIMPLE t4 ref idx_b idx_b 5 test.t2.b 1 100.00 1 SIMPLE t5 ALL idx_b NULL NULL NULL 3 100.00 Using where -1 SIMPLE t7 ALL NULL NULL NULL NULL 2 100.00 Using where 1 SIMPLE t6 ALL NULL NULL NULL NULL 3 100.00 Using where -1 SIMPLE t8 ref idx_b idx_b 5 test.t5.b 2 100.00 Using where +1 SIMPLE t7 ALL NULL NULL NULL NULL 2 100.00 Using where +1 SIMPLE t8 ref idx_b idx_b 5 test.t5.b 1 100.00 Using where 1 SIMPLE t9 ALL NULL NULL NULL NULL 3 100.00 Using where; Using join buffer +Note 1003 select `test`.`t0`.`a` AS `a`,`test`.`t0`.`b` AS `b`,`test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b`,`test`.`t2`.`a` AS `a`,`test`.`t2`.`b` AS `b`,`test`.`t3`.`a` AS `a`,`test`.`t3`.`b` AS `b`,`test`.`t4`.`a` AS `a`,`test`.`t4`.`b` AS `b`,`test`.`t5`.`a` AS `a`,`test`.`t5`.`b` AS `b`,`test`.`t6`.`a` AS `a`,`test`.`t6`.`b` AS 
`b`,`test`.`t7`.`a` AS `a`,`test`.`t7`.`b` AS `b`,`test`.`t8`.`a` AS `a`,`test`.`t8`.`b` AS `b`,`test`.`t9`.`a` AS `a`,`test`.`t9`.`b` AS `b` from `test`.`t0` join `test`.`t1` left join (`test`.`t2` left join (`test`.`t3` join `test`.`t4`) on(((`test`.`t4`.`b` = `test`.`t2`.`b`) and (`test`.`t3`.`a` = 1))) join `test`.`t5` left join (`test`.`t6` join `test`.`t7` left join `test`.`t8` on(((`test`.`t8`.`b` = `test`.`t5`.`b`) and (`test`.`t6`.`b` < 10)))) on(((`test`.`t7`.`b` = `test`.`t5`.`b`) and (`test`.`t6`.`b` >= 2)))) on((((`test`.`t3`.`b` = 2) or isnull(`test`.`t3`.`c`)) and ((`test`.`t6`.`b` = 2) or isnull(`test`.`t6`.`c`)) and ((`test`.`t5`.`b` = `test`.`t0`.`b`) or isnull(`test`.`t3`.`c`) or isnull(`test`.`t6`.`c`) or isnull(`test`.`t8`.`c`)) and (`test`.`t1`.`a` <> 2))) join `test`.`t9` where ((`test`.`t9`.`a` = 1) and (`test`.`t1`.`b` = `test`.`t0`.`b`) and (`test`.`t0`.`a` = 1) and ((`test`.`t2`.`a` >= 4) or isnull(`test`.`t2`.`c`)) and ((`test`.`t3`.`a` < 5) or isnull(`test`.`t3`.`c`)) and ((`test`.`t3`.`b` = `test`.`t4`.`b`) or isnull(`test`.`t3`.`c`) or isnull(`test`.`t4`.`c`)) and ((`test`.`t5`.`a` >= 2) or isnull(`test`.`t5`.`c`)) and ((`test`.`t6`.`a` >= 4) or isnull(`test`.`t6`.`c`)) and ((`test`.`t7`.`a` <= 2) or isnull(`test`.`t7`.`c`)) and ((`test`.`t8`.`a` < 1) or isnull(`test`.`t8`.`c`)) and ((`test`.`t9`.`b` = `test`.`t8`.`b`) or isnull(`test`.`t8`.`c`))) Warnings: -Note 1003 select `test`.`t0`.`a` AS `a`,`test`.`t0`.`b` AS `b`,`test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b`,`test`.`t2`.`a` AS `a`,`test`.`t2`.`b` AS `b`,`test`.`t3`.`a` AS `a`,`test`.`t3`.`b` AS `b`,`test`.`t4`.`a` AS `a`,`test`.`t4`.`b` AS `b`,`test`.`t5`.`a` AS `a`,`test`.`t5`.`b` AS `b`,`test`.`t6`.`a` AS `a`,`test`.`t6`.`b` AS `b`,`test`.`t7`.`a` AS `a`,`test`.`t7`.`b` AS `b`,`test`.`t8`.`a` AS `a`,`test`.`t8`.`b` AS `b`,`test`.`t9`.`a` AS `a`,`test`.`t9`.`b` AS `b` from `test`.`t0` join `test`.`t1` left join (`test`.`t2` left join (`test`.`t3` join `test`.`t4`) 
on(((`test`.`t4`.`b` = `test`.`t2`.`b`) and (`test`.`t3`.`a` = 1))) join `test`.`t5` left join (`test`.`t6` join `test`.`t7` left join `test`.`t8` on(((`test`.`t8`.`b` = `test`.`t5`.`b`) and (`test`.`t6`.`b` < 10)))) on(((`test`.`t7`.`b` = `test`.`t5`.`b`) and (`test`.`t6`.`b` >= 2)))) on((((`test`.`t3`.`b` = 2) or isnull(`test`.`t3`.`c`)) and ((`test`.`t6`.`b` = 2) or isnull(`test`.`t6`.`c`)) and ((`test`.`t5`.`b` = `test`.`t0`.`b`) or isnull(`test`.`t3`.`c`) or isnull(`test`.`t6`.`c`) or isnull(`test`.`t8`.`c`)) and (`test`.`t1`.`a` <> 2))) join `test`.`t9` where ((`test`.`t9`.`a` = 1) and (`test`.`t1`.`b` = `test`.`t0`.`b`) and (`test`.`t0`.`a` = 1) and ((`test`.`t2`.`a` >= 4) or isnull(`test`.`t2`.`c`)) and ((`test`.`t3`.`a` < 5) or isnull(`test`.`t3`.`c`)) and ((`test`.`t4`.`b` = `test`.`t3`.`b`) or isnull(`test`.`t3`.`c`) or isnull(`test`.`t4`.`c`)) and ((`test`.`t5`.`a` >= 2) or isnull(`test`.`t5`.`c`)) and ((`test`.`t6`.`a` >= 4) or isnull(`test`.`t6`.`c`)) and ((`test`.`t7`.`a` <= 2) or isnull(`test`.`t7`.`c`)) and ((`test`.`t8`.`a` < 1) or isnull(`test`.`t8`.`c`)) and ((`test`.`t9`.`b` = `test`.`t8`.`b`) or isnull(`test`.`t8`.`c`))) CREATE INDEX idx_b ON t1(b); CREATE INDEX idx_a ON t0(a); EXPLAIN EXTENDED @@ -1055,17 +1055,17 @@ t0.b=t1.b AND (t9.a=1); id select_type table type possible_keys key key_len ref rows filtered Extra 1 SIMPLE t0 ref idx_a idx_a 5 const 1 100.00 Using where -1 SIMPLE t1 ref idx_b idx_b 5 test.t0.b 2 100.00 Using where +1 SIMPLE t1 ref idx_b idx_b 5 test.t0.b 1 100.00 Using where 1 SIMPLE t2 ALL NULL NULL NULL NULL 3 100.00 Using where 1 SIMPLE t3 ALL NULL NULL NULL NULL 2 100.00 Using where -1 SIMPLE t4 ref idx_b idx_b 5 test.t2.b 2 100.00 Using where +1 SIMPLE t4 ref idx_b idx_b 5 test.t2.b 1 100.00 1 SIMPLE t5 ALL idx_b NULL NULL NULL 3 100.00 Using where -1 SIMPLE t7 ALL NULL NULL NULL NULL 2 100.00 Using where 1 SIMPLE t6 ALL NULL NULL NULL NULL 3 100.00 Using where -1 SIMPLE t8 ref idx_b idx_b 5 test.t5.b 2 100.00 Using 
where +1 SIMPLE t7 ALL NULL NULL NULL NULL 2 100.00 Using where +1 SIMPLE t8 ref idx_b idx_b 5 test.t5.b 1 100.00 Using where 1 SIMPLE t9 ALL NULL NULL NULL NULL 3 100.00 Using where; Using join buffer +Note 1003 select `test`.`t0`.`a` AS `a`,`test`.`t0`.`b` AS `b`,`test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b`,`test`.`t2`.`a` AS `a`,`test`.`t2`.`b` AS `b`,`test`.`t3`.`a` AS `a`,`test`.`t3`.`b` AS `b`,`test`.`t4`.`a` AS `a`,`test`.`t4`.`b` AS `b`,`test`.`t5`.`a` AS `a`,`test`.`t5`.`b` AS `b`,`test`.`t6`.`a` AS `a`,`test`.`t6`.`b` AS `b`,`test`.`t7`.`a` AS `a`,`test`.`t7`.`b` AS `b`,`test`.`t8`.`a` AS `a`,`test`.`t8`.`b` AS `b`,`test`.`t9`.`a` AS `a`,`test`.`t9`.`b` AS `b` from `test`.`t0` join `test`.`t1` left join (`test`.`t2` left join (`test`.`t3` join `test`.`t4`) on(((`test`.`t4`.`b` = `test`.`t2`.`b`) and (`test`.`t3`.`a` = 1))) join `test`.`t5` left join (`test`.`t6` join `test`.`t7` left join `test`.`t8` on(((`test`.`t8`.`b` = `test`.`t5`.`b`) and (`test`.`t6`.`b` < 10)))) on(((`test`.`t7`.`b` = `test`.`t5`.`b`) and (`test`.`t6`.`b` >= 2)))) on((((`test`.`t3`.`b` = 2) or isnull(`test`.`t3`.`c`)) and ((`test`.`t6`.`b` = 2) or isnull(`test`.`t6`.`c`)) and ((`test`.`t5`.`b` = `test`.`t0`.`b`) or isnull(`test`.`t3`.`c`) or isnull(`test`.`t6`.`c`) or isnull(`test`.`t8`.`c`)) and (`test`.`t1`.`a` <> 2))) join `test`.`t9` where ((`test`.`t9`.`a` = 1) and (`test`.`t1`.`b` = `test`.`t0`.`b`) and (`test`.`t0`.`a` = 1) and ((`test`.`t2`.`a` >= 4) or isnull(`test`.`t2`.`c`)) and ((`test`.`t3`.`a` < 5) or isnull(`test`.`t3`.`c`)) and ((`test`.`t3`.`b` = `test`.`t4`.`b`) or isnull(`test`.`t3`.`c`) or isnull(`test`.`t4`.`c`)) and ((`test`.`t5`.`a` >= 2) or isnull(`test`.`t5`.`c`)) and ((`test`.`t6`.`a` >= 4) or isnull(`test`.`t6`.`c`)) and ((`test`.`t7`.`a` <= 2) or isnull(`test`.`t7`.`c`)) and ((`test`.`t8`.`a` < 1) or isnull(`test`.`t8`.`c`)) and ((`test`.`t9`.`b` = `test`.`t8`.`b`) or isnull(`test`.`t8`.`c`))) Warnings: -Note 1003 select `test`.`t0`.`a` AS 
`a`,`test`.`t0`.`b` AS `b`,`test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b`,`test`.`t2`.`a` AS `a`,`test`.`t2`.`b` AS `b`,`test`.`t3`.`a` AS `a`,`test`.`t3`.`b` AS `b`,`test`.`t4`.`a` AS `a`,`test`.`t4`.`b` AS `b`,`test`.`t5`.`a` AS `a`,`test`.`t5`.`b` AS `b`,`test`.`t6`.`a` AS `a`,`test`.`t6`.`b` AS `b`,`test`.`t7`.`a` AS `a`,`test`.`t7`.`b` AS `b`,`test`.`t8`.`a` AS `a`,`test`.`t8`.`b` AS `b`,`test`.`t9`.`a` AS `a`,`test`.`t9`.`b` AS `b` from `test`.`t0` join `test`.`t1` left join (`test`.`t2` left join (`test`.`t3` join `test`.`t4`) on(((`test`.`t4`.`b` = `test`.`t2`.`b`) and (`test`.`t3`.`a` = 1))) join `test`.`t5` left join (`test`.`t6` join `test`.`t7` left join `test`.`t8` on(((`test`.`t8`.`b` = `test`.`t5`.`b`) and (`test`.`t6`.`b` < 10)))) on(((`test`.`t7`.`b` = `test`.`t5`.`b`) and (`test`.`t6`.`b` >= 2)))) on((((`test`.`t3`.`b` = 2) or isnull(`test`.`t3`.`c`)) and ((`test`.`t6`.`b` = 2) or isnull(`test`.`t6`.`c`)) and ((`test`.`t5`.`b` = `test`.`t0`.`b`) or isnull(`test`.`t3`.`c`) or isnull(`test`.`t6`.`c`) or isnull(`test`.`t8`.`c`)) and (`test`.`t1`.`a` <> 2))) join `test`.`t9` where ((`test`.`t9`.`a` = 1) and (`test`.`t1`.`b` = `test`.`t0`.`b`) and (`test`.`t0`.`a` = 1) and ((`test`.`t2`.`a` >= 4) or isnull(`test`.`t2`.`c`)) and ((`test`.`t3`.`a` < 5) or isnull(`test`.`t3`.`c`)) and ((`test`.`t4`.`b` = `test`.`t3`.`b`) or isnull(`test`.`t3`.`c`) or isnull(`test`.`t4`.`c`)) and ((`test`.`t5`.`a` >= 2) or isnull(`test`.`t5`.`c`)) and ((`test`.`t6`.`a` >= 4) or isnull(`test`.`t6`.`c`)) and ((`test`.`t7`.`a` <= 2) or isnull(`test`.`t7`.`c`)) and ((`test`.`t8`.`a` < 1) or isnull(`test`.`t8`.`c`)) and ((`test`.`t9`.`b` = `test`.`t8`.`b`) or isnull(`test`.`t8`.`c`))) SELECT t0.a,t0.b,t1.a,t1.b,t2.a,t2.b,t3.a,t3.b,t4.a,t4.b, t5.a,t5.b,t6.a,t6.b,t7.a,t7.b,t8.a,t8.b,t9.a,t9.b FROM t0,t1 @@ -1102,21 +1102,21 @@ t0.b=t1.b AND (t9.a=1); a b a b a b a b a b a b a b a b a b a b 1 2 2 2 NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL 1 1 -1 2 3 2 4 
2 1 2 3 2 3 1 6 2 1 1 NULL NULL 1 1 -1 2 3 2 4 2 1 2 3 2 3 3 NULL NULL NULL NULL NULL NULL 1 1 -1 2 3 2 4 2 1 2 4 2 3 1 6 2 1 1 NULL NULL 1 1 -1 2 3 2 4 2 1 2 4 2 3 3 NULL NULL NULL NULL NULL NULL 1 1 -1 2 3 2 5 3 NULL NULL NULL NULL 3 1 6 2 1 1 NULL NULL 1 1 -1 2 3 2 5 3 NULL NULL NULL NULL 3 3 NULL NULL NULL NULL NULL NULL 1 1 1 2 2 2 NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL 1 2 -1 2 3 2 4 2 1 2 3 2 3 1 6 2 1 1 NULL NULL 1 2 1 2 3 2 4 2 1 2 3 2 2 2 6 2 2 2 0 2 1 2 +1 2 3 2 4 2 1 2 3 2 3 1 6 2 1 1 NULL NULL 1 1 +1 2 3 2 4 2 1 2 3 2 3 1 6 2 1 1 NULL NULL 1 2 +1 2 3 2 4 2 1 2 3 2 3 3 NULL NULL NULL NULL NULL NULL 1 1 1 2 3 2 4 2 1 2 3 2 3 3 NULL NULL NULL NULL NULL NULL 1 2 -1 2 3 2 4 2 1 2 4 2 3 1 6 2 1 1 NULL NULL 1 2 1 2 3 2 4 2 1 2 4 2 2 2 6 2 2 2 0 2 1 2 +1 2 3 2 4 2 1 2 4 2 3 1 6 2 1 1 NULL NULL 1 1 +1 2 3 2 4 2 1 2 4 2 3 1 6 2 1 1 NULL NULL 1 2 +1 2 3 2 4 2 1 2 4 2 3 3 NULL NULL NULL NULL NULL NULL 1 1 1 2 3 2 4 2 1 2 4 2 3 3 NULL NULL NULL NULL NULL NULL 1 2 -1 2 3 2 5 3 NULL NULL NULL NULL 3 1 6 2 1 1 NULL NULL 1 2 1 2 3 2 5 3 NULL NULL NULL NULL 2 2 6 2 2 2 0 2 1 2 +1 2 3 2 5 3 NULL NULL NULL NULL 3 1 6 2 1 1 NULL NULL 1 1 +1 2 3 2 5 3 NULL NULL NULL NULL 3 1 6 2 1 1 NULL NULL 1 2 +1 2 3 2 5 3 NULL NULL NULL NULL 3 3 NULL NULL NULL NULL NULL NULL 1 1 1 2 3 2 5 3 NULL NULL NULL NULL 3 3 NULL NULL NULL NULL NULL NULL 1 2 SELECT t2.a,t2.b FROM t2; @@ -1203,7 +1203,7 @@ EXPLAIN SELECT a, b, c FROM t1 LEFT JOIN (t2, t3) ON c < 3 and b = c; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t1 index NULL a 5 NULL 21 Using index 1 SIMPLE t3 index c c 5 NULL 6 Using index -1 SIMPLE t2 ref b b 5 test.t3.c 2 Using index +1 SIMPLE t2 ref b b 5 test.t3.c 1 Using index EXPLAIN SELECT a, b, c FROM t1 LEFT JOIN (t2, t3) ON b < 3 and b = c; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t1 index NULL a 5 NULL # Using index @@ -1484,8 +1484,8 @@ explain select * from t1 left join on (t1.a = 
t2.a); id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t1 ALL NULL NULL NULL NULL 10 -1 SIMPLE t2 ref a a 5 test.t1.a 11 -1 SIMPLE t3 ref a a 5 test.t2.a 11 +1 SIMPLE t2 ref a a 5 test.t1.a 1 +1 SIMPLE t3 ref a a 5 test.t2.a 1 drop table t1, t2, t3; CREATE TABLE t1 (id int NOT NULL PRIMARY KEY, type varchar(10)); CREATE TABLE t2 (pid int NOT NULL PRIMARY KEY, type varchar(10)); diff --git a/mysql-test/suite/pbxt/r/key.result b/mysql-test/suite/pbxt/r/key.result index 0b964a84a4b..d727394f616 100644 --- a/mysql-test/suite/pbxt/r/key.result +++ b/mysql-test/suite/pbxt/r/key.result @@ -153,7 +153,7 @@ t1 0 PRIMARY 1 d A 0 NULL NULL BTREE t1 0 a 1 a A 0 NULL NULL BTREE t1 0 e 1 e A 0 NULL NULL BTREE t1 0 b 1 b A 0 NULL NULL YES BTREE -t1 1 c 1 c A NULL NULL NULL YES BTREE +t1 1 c 1 c A 0 NULL NULL YES BTREE drop table t1; CREATE TABLE t1 (c CHAR(10) NOT NULL,i INT NOT NULL AUTO_INCREMENT, UNIQUE (c,i)); diff --git a/mysql-test/suite/pbxt/r/key_cache.result b/mysql-test/suite/pbxt/r/key_cache.result index 8d71c6ce930..5ff41bd29d7 100644 --- a/mysql-test/suite/pbxt/r/key_cache.result +++ b/mysql-test/suite/pbxt/r/key_cache.result @@ -122,7 +122,7 @@ i explain select count(*) from t1, t2 where t1.p = t2.i; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t1 index PRIMARY PRIMARY 4 NULL 2 Using index -1 SIMPLE t2 ref k1 k1 5 test.t1.p 2 Using where; Using index +1 SIMPLE t2 ref k1 k1 5 test.t1.p 1 Using where; Using index select count(*) from t1, t2 where t1.p = t2.i; count(*) 3 @@ -257,8 +257,6 @@ test.t2 assign_to_keycache note The storage engine for the table doesn't support drop table t1,t2,t3; set global keycache2.key_buffer_size=0; set global keycache3.key_buffer_size=100; -Warnings: -Warning 1292 Truncated incorrect key_buffer_size value: '100' set global keycache3.key_buffer_size=0; create table t1 (mytext text, FULLTEXT (mytext)) engine=myisam; insert t1 values ('aaabbb'); diff --git 
a/mysql-test/suite/pbxt/r/key_diff.result b/mysql-test/suite/pbxt/r/key_diff.result index 9d26bee4557..33bedfcc39e 100644 --- a/mysql-test/suite/pbxt/r/key_diff.result +++ b/mysql-test/suite/pbxt/r/key_diff.result @@ -36,7 +36,7 @@ a a a a explain select t1.*,t2.* from t1,t1 as t2 where t1.A=t2.B; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t1 ALL a NULL NULL NULL 5 -1 SIMPLE t2 ALL b NULL NULL NULL 5 Using where; Using join buffer +1 SIMPLE t2 ref b b 4 test.t1.a 1 Using where select t1.*,t2.* from t1,t1 as t2 where t1.A=t2.B order by binary t1.a,t2.a; a b a b A B a a diff --git a/mysql-test/suite/pbxt/r/lowercase_view.result b/mysql-test/suite/pbxt/r/lowercase_view.result index 0debf20108a..c37dc41c495 100644 --- a/mysql-test/suite/pbxt/r/lowercase_view.result +++ b/mysql-test/suite/pbxt/r/lowercase_view.result @@ -119,7 +119,7 @@ create table t1Aa (col1 int); create view v1Aa as select col1 from t1Aa as AaA; show create view v1AA; View Create View character_set_client collation_connection -v1aa CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW `v1aa` AS select `AaA`.`col1` AS `col1` from `t1aa` `AaA` latin1 latin1_swedish_ci +v1aa CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW `v1aa` AS select `aaa`.`col1` AS `col1` from `t1aa` `aaa` latin1 latin1_swedish_ci drop view v1AA; select Aaa.col1 from t1Aa as AaA; col1 @@ -128,7 +128,7 @@ drop view v1AA; create view v1Aa as select AaA.col1 from t1Aa as AaA; show create view v1AA; View Create View character_set_client collation_connection -v1aa CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW `v1aa` AS select `AaA`.`col1` AS `col1` from `t1aa` `AaA` latin1 latin1_swedish_ci +v1aa CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW `v1aa` AS select `aaa`.`col1` AS `col1` from `t1aa` `aaa` latin1 latin1_swedish_ci drop view v1AA; drop table t1Aa; CREATE TABLE t1 (a int, 
b int); @@ -142,7 +142,7 @@ CREATE OR REPLACE VIEW v1 AS select X.a from t1 AS X group by X.b having (X.a = 1); SHOW CREATE VIEW v1; View Create View character_set_client collation_connection -v1 CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW `v1` AS select `X`.`a` AS `a` from `t1` `X` group by `X`.`b` having (`X`.`a` = 1) latin1 latin1_swedish_ci +v1 CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW `v1` AS select `x`.`a` AS `a` from `t1` `x` group by `x`.`b` having (`x`.`a` = 1) latin1 latin1_swedish_ci SELECT * FROM v1; a DROP VIEW v1; diff --git a/mysql-test/suite/pbxt/r/mysqlshow.result b/mysql-test/suite/pbxt/r/mysqlshow.result index 0e1915dc47a..56b5d125ef3 100644 --- a/mysql-test/suite/pbxt/r/mysqlshow.result +++ b/mysql-test/suite/pbxt/r/mysqlshow.result @@ -107,7 +107,21 @@ Database: information_schema | TRIGGERS | | USER_PRIVILEGES | | VIEWS | +| INNODB_BUFFER_POOL_PAGES | | PBXT_STATISTICS | +| INNODB_CMP | +| INNODB_RSEG | +| XTRADB_ENHANCEMENTS | +| INNODB_BUFFER_POOL_PAGES_INDEX | +| INNODB_INDEX_STATS | +| INNODB_TRX | +| INNODB_CMP_RESET | +| INNODB_LOCK_WAITS | +| INNODB_CMPMEM_RESET | +| INNODB_LOCKS | +| INNODB_CMPMEM | +| INNODB_TABLE_STATS | +| INNODB_BUFFER_POOL_PAGES_BLOB | +---------------------------------------+ Database: INFORMATION_SCHEMA +---------------------------------------+ @@ -141,7 +155,21 @@ Database: INFORMATION_SCHEMA | TRIGGERS | | USER_PRIVILEGES | | VIEWS | +| INNODB_BUFFER_POOL_PAGES | | PBXT_STATISTICS | +| INNODB_CMP | +| INNODB_RSEG | +| XTRADB_ENHANCEMENTS | +| INNODB_BUFFER_POOL_PAGES_INDEX | +| INNODB_INDEX_STATS | +| INNODB_TRX | +| INNODB_CMP_RESET | +| INNODB_LOCK_WAITS | +| INNODB_CMPMEM_RESET | +| INNODB_LOCKS | +| INNODB_CMPMEM | +| INNODB_TABLE_STATS | +| INNODB_BUFFER_POOL_PAGES_BLOB | +---------------------------------------+ Wildcard: inf_rmation_schema +--------------------+ diff --git a/mysql-test/suite/pbxt/r/null.result 
b/mysql-test/suite/pbxt/r/null.result index 775d169a39a..036ba01ed1d 100644 --- a/mysql-test/suite/pbxt/r/null.result +++ b/mysql-test/suite/pbxt/r/null.result @@ -93,9 +93,11 @@ INSERT INTO t1 SET a = "", d= "2003-01-14 03:54:55"; Warnings: Warning 1265 Data truncated for column 'd' at row 1 UPDATE t1 SET d=1/NULL; -ERROR 23000: Column 'd' cannot be null +Warnings: +Warning 1265 Data truncated for column 'd' at row 1 UPDATE t1 SET d=NULL; -ERROR 23000: Column 'd' cannot be null +Warnings: +Warning 1048 Column 'd' cannot be null INSERT INTO t1 (a) values (null); ERROR 23000: Column 'a' cannot be null INSERT INTO t1 (a) values (1/null); @@ -130,7 +132,7 @@ Warning 1048 Column 'd' cannot be null Warning 1048 Column 'd' cannot be null select * from t1; a b c d - 0 0000-00-00 00:00:00 2003 + 0 0000-00-00 00:00:00 0 0 0000-00-00 00:00:00 0 0 0000-00-00 00:00:00 0 0 0000-00-00 00:00:00 0 diff --git a/mysql-test/suite/pbxt/r/null_key.result b/mysql-test/suite/pbxt/r/null_key.result index c01e0337b32..753ebf31f1c 100644 --- a/mysql-test/suite/pbxt/r/null_key.result +++ b/mysql-test/suite/pbxt/r/null_key.result @@ -407,8 +407,8 @@ EXPLAIN SELECT SQL_CALC_FOUND_ROWS * FROM t1 LEFT JOIN t2 ON t1.a=t2.a LEFT JOIN t3 ON t2.b=t3.b; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t1 ALL NULL NULL NULL NULL 4 -1 SIMPLE t2 ref idx idx 5 test.t1.a 2 -1 SIMPLE t3 ref idx idx 5 test.t2.b 186 Using index +1 SIMPLE t2 ref idx idx 5 test.t1.a 1 +1 SIMPLE t3 ref idx idx 5 test.t2.b 1 Using index FLUSH STATUS ; SELECT SQL_CALC_FOUND_ROWS * FROM t1 LEFT JOIN t2 ON t1.a=t2.a LEFT JOIN t3 ON t2.b=t3.b; diff --git a/mysql-test/suite/pbxt/r/partition_pruning.result b/mysql-test/suite/pbxt/r/partition_pruning.result index f003f62c163..9938d352ff7 100644 --- a/mysql-test/suite/pbxt/r/partition_pruning.result +++ b/mysql-test/suite/pbxt/r/partition_pruning.result @@ -338,12 +338,12 @@ select * from t1 X, t1 Y where X.b = Y.b and (X.a=1 or X.a=2) and (Y.a=2 or Y.a=3); id 
select_type table partitions type possible_keys key key_len ref rows Extra 1 SIMPLE X p1,p2 ALL a,b NULL NULL NULL 2 Using where -1 SIMPLE Y p2,p3 ref a,b b 4 test.X.b 2 Using where +1 SIMPLE Y p2,p3 ref a,b b 4 test.X.b 1 Using where explain partitions select * from t1 X, t1 Y where X.a = Y.a and (X.a=1 or X.a=2); id select_type table partitions type possible_keys key key_len ref rows Extra 1 SIMPLE X p1,p2 ALL a NULL NULL NULL 4 Using where -1 SIMPLE Y p1,p2 ref a a 4 test.X.a 2 +1 SIMPLE Y p1,p2 ref a a 4 test.X.a 1 drop table t1; create table t1 (a int) partition by hash(a) partitions 20; insert into t1 values (1),(2),(3); diff --git a/mysql-test/suite/pbxt/r/pbxt_bugs.result b/mysql-test/suite/pbxt/r/pbxt_bugs.result index 6ebb8459c75..a6db895d3d2 100644 --- a/mysql-test/suite/pbxt/r/pbxt_bugs.result +++ b/mysql-test/suite/pbxt/r/pbxt_bugs.result @@ -1218,3 +1218,59 @@ c1 c2 0 opq 1 jkl DROP TABLE t1; +create table parent (id int primary key); +create table child (id int PRIMARY KEY, FOREIGN KEY (id) REFERENCES parent(id)); +insert into parent values (2), (3), (4); +insert into child values (3), (4); +delete ignore from parent; +ERROR 23000: Cannot delete or update a parent row: a foreign key constraint fails (Constraint: `FOREIGN_1`) +select * from parent; +id +2 +3 +4 +drop table child, parent; +create schema test378222; +use test378222; +create table t1 (id int primary key); +create table t2 (id int primary key); +alter table t1 add constraint foreign key (id) references t2 (id); +alter table t2 add constraint foreign key (id) references t1 (id); +drop schema test378222; +create schema test378222a; +create schema test378222b; +create table test378222a.t1 (id int primary key); +create table test378222b.t2 (id int primary key); +alter table test378222a.t1 add constraint foreign key (id) references test378222b.t2 (id); +alter table test378222b.t2 add constraint foreign key (id) references test378222a.t1 (id); +set foreign_key_checks = 1; +drop schema 
test378222a; +ERROR 23000: Cannot delete or update a parent row: a foreign key constraint fails +drop schema test378222b; +ERROR 23000: Cannot delete or update a parent row: a foreign key constraint fails +set foreign_key_checks = 0; +drop schema test378222a; +drop schema test378222b; +set foreign_key_checks = 1; +use test; +CREATE TABLE t1(c1 TINYINT AUTO_INCREMENT NULL KEY ) AUTO_INCREMENT=10; +SHOW CREATE TABLE t1; +Table Create Table +t1 CREATE TABLE `t1` ( + `c1` tinyint(4) NOT NULL AUTO_INCREMENT, + PRIMARY KEY (`c1`) +) ENGINE=PBXT AUTO_INCREMENT=10 DEFAULT CHARSET=latin1 +INSERT INTO t1 VALUES(null); +INSERT INTO t1 VALUES(null); +INSERT INTO t1 VALUES(null); +SELECT * FROM t1; +c1 +10 +11 +12 +TRUNCATE TABLE t1; +INSERT INTO t1 VALUES(null); +SELECT * FROM t1; +c1 +1 +DROP TABLE t1; diff --git a/mysql-test/suite/pbxt/r/pbxt_ref_int.result b/mysql-test/suite/pbxt/r/pbxt_ref_int.result index cd86d122452..6f096f064ee 100644 --- a/mysql-test/suite/pbxt/r/pbxt_ref_int.result +++ b/mysql-test/suite/pbxt/r/pbxt_ref_int.result @@ -166,7 +166,7 @@ child CREATE TABLE `child` ( `parent_id` int(11) DEFAULT NULL, KEY `par_ind` (`parent_id`), KEY `child_ind` (`id`), - CONSTRAINT `FOREIGN_1` FOREIGN KEY (`parent_id`) REFERENCES `parent` (`id`) + CONSTRAINT `FOREIGN_1` FOREIGN KEY (`parent_id`) REFERENCES `test`.`parent` (`id`) ) ENGINE=PBXT DEFAULT CHARSET=latin1 drop index child_ind on child; show create table child; @@ -175,7 +175,7 @@ child CREATE TABLE `child` ( `id` int(11) DEFAULT NULL, `parent_id` int(11) DEFAULT NULL, KEY `par_ind` (`parent_id`), - CONSTRAINT `FOREIGN_1` FOREIGN KEY (`parent_id`) REFERENCES `parent` (`id`) + CONSTRAINT `FOREIGN_1` FOREIGN KEY (`parent_id`) REFERENCES `test`.`parent` (`id`) ) ENGINE=PBXT DEFAULT CHARSET=latin1 alter table parent add column c1 varchar(40); insert child values(2000, 2); @@ -243,7 +243,7 @@ child CREATE TABLE `child` ( `id` int(11) DEFAULT NULL, `parent_id` int(11) DEFAULT NULL, KEY `par_ind` (`parent_id`), - 
CONSTRAINT `FOREIGN_1` FOREIGN KEY (`parent_id`) REFERENCES `parent` (`id`) + CONSTRAINT `FOREIGN_1` FOREIGN KEY (`parent_id`) REFERENCES `test`.`parent` (`id`) ) ENGINE=PBXT DEFAULT CHARSET=latin1 alter table child add column c1 varchar(40); insert child values(400, 1, "asd"); @@ -284,7 +284,7 @@ child CREATE TABLE `child` ( `id` int(11) DEFAULT NULL, `parent_id` int(11) DEFAULT NULL, KEY `par_ind` (`parent_id`), - CONSTRAINT `FOREIGN_1` FOREIGN KEY (`parent_id`) REFERENCES `parent` (`id`) ON DELETE CASCADE + CONSTRAINT `FOREIGN_1` FOREIGN KEY (`parent_id`) REFERENCES `test`.`parent` (`id`) ON DELETE CASCADE ) ENGINE=PBXT DEFAULT CHARSET=latin1 insert parent values(1); insert child values(100, 1); diff --git a/mysql-test/suite/pbxt/r/preload.result b/mysql-test/suite/pbxt/r/preload.result index 285b58e210e..cdd19dec861 100644 --- a/mysql-test/suite/pbxt/r/preload.result +++ b/mysql-test/suite/pbxt/r/preload.result @@ -144,7 +144,7 @@ Key_reads 0 load index into cache t3, t2 key (primary,b) ; Table Op Msg_type Msg_text test.t3 preload_keys Error Table 'test.t3' doesn't exist -test.t3 preload_keys error Corrupt +test.t3 preload_keys status Operation failed test.t2 preload_keys note The storage engine for the table doesn't support preload_keys show status like "key_read%"; Variable_name Value @@ -159,7 +159,7 @@ Key_reads 0 load index into cache t3 key (b), t2 key (c) ; Table Op Msg_type Msg_text test.t3 preload_keys Error Table 'test.t3' doesn't exist -test.t3 preload_keys error Corrupt +test.t3 preload_keys status Operation failed test.t2 preload_keys note The storage engine for the table doesn't support preload_keys show status like "key_read%"; Variable_name Value diff --git a/mysql-test/suite/pbxt/r/ps_1general.result b/mysql-test/suite/pbxt/r/ps_1general.result index 6584274ecf6..baa944eebab 100644 --- a/mysql-test/suite/pbxt/r/ps_1general.result +++ b/mysql-test/suite/pbxt/r/ps_1general.result @@ -289,11 +289,11 @@ prepare stmt4 from ' show index from t2 from 
test '; execute stmt4; Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment t2 0 PRIMARY 1 a A 0 NULL NULL BTREE -t2 1 t2_idx 1 b A NULL NULL NULL YES BTREE +t2 1 t2_idx 1 b A 0 NULL NULL YES BTREE prepare stmt4 from ' show table status from test like ''t2%'' '; execute stmt4; Name Engine Version Row_format Rows Avg_row_length Data_length Max_data_length Index_length Data_free Auto_increment Create_time Update_time Check_time Collation Checksum Create_options Comment -t2 PBXT 10 Fixed 0 29 1 # 4096 0 NULL # # # latin1_swedish_ci NULL +t2 PBXT 10 Fixed 0 29 1024 # 4096 0 NULL # # # latin1_swedish_ci NULL prepare stmt4 from ' show table status from test like ''t9%'' '; execute stmt4; Name Engine Version Row_format Rows Avg_row_length Data_length Max_data_length Index_length Data_free Auto_increment Create_time Update_time Check_time Collation Checksum Create_options Comment @@ -447,7 +447,7 @@ def type 253 10 3 Y 0 31 8 def possible_keys 253 4096 0 Y 0 31 8 def key 253 64 0 Y 0 31 8 def key_len 253 4096 0 Y 0 31 8 -def ref 253 1024 0 Y 0 31 8 +def ref 253 2048 0 Y 0 31 8 def rows 8 10 1 Y 32928 0 63 def Extra 253 255 14 N 1 31 8 id select_type table type possible_keys key key_len ref rows Extra @@ -463,7 +463,7 @@ def type 253 10 5 Y 0 31 8 def possible_keys 253 4096 7 Y 0 31 8 def key 253 64 7 Y 0 31 8 def key_len 253 4096 1 Y 0 31 8 -def ref 253 1024 0 Y 0 31 8 +def ref 253 2048 0 Y 0 31 8 def rows 8 10 1 Y 32928 0 63 def Extra 253 255 27 N 1 31 8 id select_type table type possible_keys key key_len ref rows Extra diff --git a/mysql-test/suite/pbxt/r/range.result b/mysql-test/suite/pbxt/r/range.result index 758c2a064de..d9cf8ac704f 100644 --- a/mysql-test/suite/pbxt/r/range.result +++ b/mysql-test/suite/pbxt/r/range.result @@ -423,19 +423,19 @@ test.t2 analyze status OK explain select * from t1, t2 where t1.uid=t2.uid AND t1.uid > 0; id select_type table type possible_keys key key_len ref rows Extra 1 
SIMPLE t1 range uid_index uid_index 4 NULL 1 Using where -1 SIMPLE t2 ref uid_index uid_index 4 test.t1.uid 12 +1 SIMPLE t2 ref uid_index uid_index 4 test.t1.uid 1 explain select * from t1, t2 where t1.uid=t2.uid AND t2.uid > 0; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t1 range uid_index uid_index 4 NULL 1 Using where -1 SIMPLE t2 ref uid_index uid_index 4 test.t1.uid 12 +1 SIMPLE t2 ref uid_index uid_index 4 test.t1.uid 1 explain select * from t1, t2 where t1.uid=t2.uid AND t1.uid != 0; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t1 range uid_index uid_index 4 NULL 2 Using where -1 SIMPLE t2 ref uid_index uid_index 4 test.t1.uid 12 +1 SIMPLE t2 ref uid_index uid_index 4 test.t1.uid 1 explain select * from t1, t2 where t1.uid=t2.uid AND t2.uid != 0; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t1 range uid_index uid_index 4 NULL 2 Using where -1 SIMPLE t2 ref uid_index uid_index 4 test.t1.uid 12 +1 SIMPLE t2 ref uid_index uid_index 4 test.t1.uid 1 select * from t1, t2 where t1.uid=t2.uid AND t1.uid > 0; id name uid id name uid 1001 A 1 1001 A 1 diff --git a/mysql-test/suite/pbxt/r/schema.result b/mysql-test/suite/pbxt/r/schema.result index 564fb3626df..4167119d932 100644 --- a/mysql-test/suite/pbxt/r/schema.result +++ b/mysql-test/suite/pbxt/r/schema.result @@ -3,11 +3,13 @@ create schema foo; show create schema foo; Database Create Database foo CREATE DATABASE `foo` /*!40100 DEFAULT CHARACTER SET latin1 */ +create table t1 (id int) engine=pbxt; show schemas; Database information_schema foo mtr mysql +pbxt test drop schema foo; diff --git a/mysql-test/suite/pbxt/r/select.result b/mysql-test/suite/pbxt/r/select.result index 53127fb0dab..41137e9e8dd 100644 --- a/mysql-test/suite/pbxt/r/select.result +++ b/mysql-test/suite/pbxt/r/select.result @@ -604,15 +604,15 @@ id select_type table type possible_keys key key_len ref rows Extra explain select * from t3 as t1,t3 where 
t1.period=t3.period order by t3.period; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t1 ALL period NULL NULL NULL 41810 Using temporary; Using filesort -1 SIMPLE t3 ref period period 4 test.t1.period 18 +1 SIMPLE t3 ref period period 4 test.t1.period 1 explain select * from t3 as t1,t3 where t1.period=t3.period order by t3.period limit 10; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t3 index period period 4 NULL 1 -1 SIMPLE t1 ref period period 4 test.t3.period 18 +1 SIMPLE t3 index period period 4 NULL 10 +1 SIMPLE t1 ref period period 4 test.t3.period 1 explain select * from t3 as t1,t3 where t1.period=t3.period order by t1.period limit 10; id select_type table type possible_keys key key_len ref rows Extra -1 SIMPLE t1 index period period 4 NULL 1 -1 SIMPLE t3 ref period period 4 test.t1.period 18 +1 SIMPLE t1 index period period 4 NULL 10 +1 SIMPLE t3 ref period period 4 test.t1.period 1 select period from t1; period 9410 @@ -2095,7 +2095,7 @@ show keys from t2; Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment t2 0 PRIMARY 1 auto A 1199 NULL NULL BTREE t2 0 fld1 1 fld1 A 1199 NULL NULL BTREE -t2 1 fld3 1 fld3 A NULL NULL NULL BTREE +t2 1 fld3 1 fld3 A 1199 NULL NULL BTREE drop table t4, t3, t2, t1; DO 1; DO benchmark(100,1+1),1,1; @@ -2369,7 +2369,7 @@ insert into t2 values (1,3), (2,3), (3,4), (4,4); explain select * from t1 left join t2 on a=c where d in (4); id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t2 ref c,d d 5 const 1 Using where -1 SIMPLE t1 ref a a 5 test.t2.c 2 Using where +1 SIMPLE t1 ref a a 5 test.t2.c 1 Using where select * from t1 left join t2 on a=c where d in (4); a b c d 3 2 3 4 @@ -2377,7 +2377,7 @@ a b c d explain select * from t1 left join t2 on a=c where d = 4; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t2 ref c,d d 5 const 1 Using where -1 SIMPLE 
t1 ref a a 5 test.t2.c 2 Using where +1 SIMPLE t1 ref a a 5 test.t2.c 1 Using where select * from t1 left join t2 on a=c where d = 4; a b c d 3 2 3 4 @@ -2403,11 +2403,11 @@ INSERT INTO t2 VALUES ('one'),('two'),('three'),('four'),('five'); EXPLAIN SELECT * FROM t1 LEFT JOIN t2 USE INDEX (a) ON t1.a=t2.a; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t1 ALL NULL NULL NULL NULL 5 -1 SIMPLE t2 ref a a 23 test.t1.a 2 +1 SIMPLE t2 ref a a 23 test.t1.a 1 EXPLAIN SELECT * FROM t1 LEFT JOIN t2 FORCE INDEX (a) ON t1.a=t2.a; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t1 ALL NULL NULL NULL NULL 5 -1 SIMPLE t2 ref a a 23 test.t1.a 2 +1 SIMPLE t2 ref a a 23 test.t1.a 1 DROP TABLE t1, t2; CREATE TABLE t1 ( city char(30) ); INSERT INTO t1 VALUES ('London'); @@ -2792,26 +2792,26 @@ id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE NULL NULL NULL NULL NULL NULL NULL Select tables optimized away select max(key1) from t1 where key1 <= 0.6158; max(key1) -0.61580002307892 +0.615800023078918 select max(key2) from t2 where key2 <= 1.6158; max(key2) -1.6158000230789 +1.61580002307892 select min(key1) from t1 where key1 >= 0.3762; min(key1) -0.37619999051094 +0.376199990510941 select min(key2) from t2 where key2 >= 1.3762; min(key2) -1.3761999607086 +1.37619996070862 select max(key1), min(key2) from t1, t2 where key1 <= 0.6158 and key2 >= 1.3762; max(key1) min(key2) -0.61580002307892 1.3761999607086 +0.615800023078918 1.37619996070862 select max(key1) from t1 where key1 <= 0.6158 and rand() + 0.5 >= 0.5; max(key1) -0.61580002307892 +0.615800023078918 select min(key1) from t1 where key1 >= 0.3762 and rand() + 0.5 >= 0.5; min(key1) -0.37619999051094 +0.376199990510941 DROP TABLE t1,t2; CREATE TABLE t1 (i BIGINT UNSIGNED NOT NULL); INSERT INTO t1 VALUES (10); @@ -3454,7 +3454,7 @@ explain select * from t2 A, t2 B where A.a=5 and A.b=5 and A.C<5 and B.a=5 and B.b=A.e and (B.b =1 or B.b = 3 or B.b=5); id 
select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE A range PRIMARY PRIMARY 12 NULL 1 Using where -1 SIMPLE B ref PRIMARY PRIMARY 8 const,test.A.e 11 +1 SIMPLE B ref PRIMARY PRIMARY 8 const,test.A.e 1 drop table t1, t2; CREATE TABLE t1 (a int PRIMARY KEY, b int, INDEX(b)); INSERT INTO t1 VALUES (1, 3), (9,4), (7,5), (4,5), (6,2), @@ -3468,12 +3468,12 @@ EXPLAIN SELECT a, c, d, f FROM t1,t2 WHERE a=c AND b BETWEEN 4 AND 6; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t1 range PRIMARY,b b 5 NULL 1 Using where -1 SIMPLE t2 ref c c 5 test.t1.a 2 Using where +1 SIMPLE t2 ref c c 5 test.t1.a 1 Using where EXPLAIN SELECT a, c, d, f FROM t1,t2 WHERE a=c AND b BETWEEN 4 AND 6 AND a > 0; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t1 range PRIMARY,b PRIMARY 4 NULL 1 Using where -1 SIMPLE t2 ref c c 5 test.t1.a 2 Using where +1 SIMPLE t2 ref c c 5 test.t1.a 1 Using where DROP TABLE t1, t2; create table t1 ( a int unsigned not null auto_increment primary key, diff --git a/mysql-test/suite/pbxt/r/select_safe.result b/mysql-test/suite/pbxt/r/select_safe.result index 468eb6cc5a9..ea0c2156d55 100644 --- a/mysql-test/suite/pbxt/r/select_safe.result +++ b/mysql-test/suite/pbxt/r/select_safe.result @@ -70,12 +70,12 @@ insert into t1 values (null,"a"),(null,"a"),(null,"a"),(null,"a"),(null,"a"),(nu explain select STRAIGHT_JOIN * from t1,t1 as t2 where t1.b=t2.b; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t1 ALL b NULL NULL NULL 21 -1 SIMPLE t2 ref b b 21 test.t1.b 2 Using where +1 SIMPLE t2 ref b b 21 test.t1.b 1 Using where set MAX_SEEKS_FOR_KEY=1; explain select STRAIGHT_JOIN * from t1,t1 as t2 where t1.b=t2.b; id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE t1 ALL b NULL NULL NULL 21 -1 SIMPLE t2 ref b b 21 test.t1.b 2 Using where +1 SIMPLE t2 ref b b 21 test.t1.b 1 Using where SET MAX_SEEKS_FOR_KEY=DEFAULT; drop table t1; create table t1 
(a int); diff --git a/mysql-test/suite/pbxt/r/subselect.result b/mysql-test/suite/pbxt/r/subselect.result index 395770111ff..bae93f76b2c 100644 --- a/mysql-test/suite/pbxt/r/subselect.result +++ b/mysql-test/suite/pbxt/r/subselect.result @@ -1333,7 +1333,7 @@ a explain extended select * from t2 where t2.a in (select a from t1); id select_type table type possible_keys key key_len ref rows filtered Extra 1 PRIMARY t2 index NULL a 5 NULL 4 100.00 Using where; Using index -2 DEPENDENT SUBQUERY t1 index_subquery a a 5 func 1001 100.00 Using index; Using where +2 DEPENDENT SUBQUERY t1 index_subquery a a 5 func 1 100.00 Using index; Using where Warnings: Note 1003 select `test`.`t2`.`a` AS `a` from `test`.`t2` where <in_optimizer>(`test`.`t2`.`a`,<exists>(<index_lookup>(<cache>(`test`.`t2`.`a`) in t1 on a where (<cache>(`test`.`t2`.`a`) = `test`.`t1`.`a`)))) select * from t2 where t2.a in (select a from t1 where t1.b <> 30); @@ -1343,7 +1343,7 @@ a explain extended select * from t2 where t2.a in (select a from t1 where t1.b <> 30); id select_type table type possible_keys key key_len ref rows filtered Extra 1 PRIMARY t2 index NULL a 5 NULL 4 100.00 Using where; Using index -2 DEPENDENT SUBQUERY t1 index_subquery a a 5 func 1001 100.00 Using index; Using where +2 DEPENDENT SUBQUERY t1 index_subquery a a 5 func 1 100.00 Using index; Using where Warnings: Note 1003 select `test`.`t2`.`a` AS `a` from `test`.`t2` where <in_optimizer>(`test`.`t2`.`a`,<exists>(<index_lookup>(<cache>(`test`.`t2`.`a`) in t1 on a where ((`test`.`t1`.`b` <> 30) and (<cache>(`test`.`t2`.`a`) = `test`.`t1`.`a`))))) select * from t2 where t2.a in (select t1.a from t1,t3 where t1.b=t3.a); @@ -1353,8 +1353,8 @@ a explain extended select * from t2 where t2.a in (select t1.a from t1,t3 where t1.b=t3.a); id select_type table type possible_keys key key_len ref rows filtered Extra 1 PRIMARY t2 index NULL a 5 NULL 4 100.00 Using where; Using index -2 DEPENDENT SUBQUERY t1 ref a a 5 func 1001 100.00 Using where; 
Using index -2 DEPENDENT SUBQUERY t3 index a a 5 NULL 3 100.00 Using where; Using index; Using join buffer +2 DEPENDENT SUBQUERY t1 ref a a 5 func 1 100.00 Using where; Using index +2 DEPENDENT SUBQUERY t3 ref a a 5 test.t1.b 1 100.00 Using where; Using index Warnings: Note 1003 select `test`.`t2`.`a` AS `a` from `test`.`t2` where <in_optimizer>(`test`.`t2`.`a`,<exists>(select 1 AS `Not_used` from `test`.`t1` join `test`.`t3` where ((`test`.`t3`.`a` = `test`.`t1`.`b`) and (<cache>(`test`.`t2`.`a`) = `test`.`t1`.`a`)))) insert into t1 values (3,31); @@ -1370,7 +1370,7 @@ a explain extended select * from t2 where t2.a in (select a from t1 where t1.b <> 30); id select_type table type possible_keys key key_len ref rows filtered Extra 1 PRIMARY t2 index NULL a 5 NULL 4 100.00 Using where; Using index -2 DEPENDENT SUBQUERY t1 index_subquery a a 5 func 1001 100.00 Using index; Using where +2 DEPENDENT SUBQUERY t1 index_subquery a a 5 func 1 100.00 Using index; Using where Warnings: Note 1003 select `test`.`t2`.`a` AS `a` from `test`.`t2` where <in_optimizer>(`test`.`t2`.`a`,<exists>(<index_lookup>(<cache>(`test`.`t2`.`a`) in t1 on a where ((`test`.`t1`.`b` <> 30) and (<cache>(`test`.`t2`.`a`) = `test`.`t1`.`a`))))) drop table t1, t2, t3; @@ -3546,7 +3546,7 @@ ORDER BY t1.t DESC LIMIT 1); id select_type table type possible_keys key key_len ref rows Extra 1 PRIMARY t2 ALL NULL NULL NULL NULL 1 1 PRIMARY t1 index NULL PRIMARY 16 NULL 11 Using where; Using index; Using join buffer -2 DEPENDENT SUBQUERY t1 ref PRIMARY PRIMARY 8 test.t2.i1,const 2 Using where; Using index; Using filesort +2 DEPENDENT SUBQUERY t1 ref PRIMARY PRIMARY 8 test.t2.i1,const 1 Using where; Using index; Using filesort SELECT * FROM t1,t2 WHERE t1.t = (SELECT t1.t FROM t1 WHERE t1.t < t2.t AND t1.i2=1 AND t2.i1=t1.i1 @@ -4214,7 +4214,7 @@ CREATE INDEX I2 ON t1 (b); EXPLAIN SELECT a,b FROM t1 WHERE b IN (SELECT a FROM t1); id select_type table type possible_keys key key_len ref rows Extra 1 PRIMARY t1 ALL 
NULL NULL NULL NULL 2 Using where -2 DEPENDENT SUBQUERY t1 index_subquery I1 I1 2 func 2 Using index; Using where +2 DEPENDENT SUBQUERY t1 index_subquery I1 I1 2 func 1 Using index; Using where SELECT a,b FROM t1 WHERE b IN (SELECT a FROM t1); a b CREATE TABLE t2 (a VARCHAR(1), b VARCHAR(10)); @@ -4224,14 +4224,14 @@ CREATE INDEX I2 ON t2 (b); EXPLAIN SELECT a,b FROM t2 WHERE b IN (SELECT a FROM t2); id select_type table type possible_keys key key_len ref rows Extra 1 PRIMARY t2 ALL NULL NULL NULL NULL 2 Using where -2 DEPENDENT SUBQUERY t2 index_subquery I1 I1 4 func 2 Using index; Using where +2 DEPENDENT SUBQUERY t2 index_subquery I1 I1 4 func 1 Using index; Using where SELECT a,b FROM t2 WHERE b IN (SELECT a FROM t2); a b EXPLAIN SELECT a,b FROM t1 WHERE b IN (SELECT a FROM t1 WHERE LENGTH(a)<500); id select_type table type possible_keys key key_len ref rows Extra 1 PRIMARY t1 ALL NULL NULL NULL NULL 2 Using where -2 DEPENDENT SUBQUERY t1 index_subquery I1 I1 2 func 2 Using index; Using where +2 DEPENDENT SUBQUERY t1 index_subquery I1 I1 2 func 1 Using index; Using where SELECT a,b FROM t1 WHERE b IN (SELECT a FROM t1 WHERE LENGTH(a)<500); a b DROP TABLE t1,t2; diff --git a/mysql-test/suite/pbxt/r/type_enum.result b/mysql-test/suite/pbxt/r/type_enum.result index 7cfb227e6c5..a84cf0d4edd 100644 --- a/mysql-test/suite/pbxt/r/type_enum.result +++ b/mysql-test/suite/pbxt/r/type_enum.result @@ -1776,8 +1776,14 @@ t1 CREATE TABLE `t1` ( `russian_deviant` enum('E','F','EÿF','F,E') NOT NULL DEFAULT 'E' ) ENGINE=PBXT DEFAULT CHARSET=latin1 drop table t1; +select @@SESSION.sql_mode; +@@SESSION.sql_mode + +select @@GLOBAL.sql_mode; +@@GLOBAL.sql_mode + create table t1(exhausting_charset enum('ABCDEFGHIJKLMNOPQRSTUVWXYZ',' !"','#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~','xx\','yy\€','zz‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖרÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ')); -ERROR 
HY000: Can't create table 'test.t1' (errno: -1) +drop table t1; End of 5.1 tests diff --git a/mysql-test/suite/pbxt/r/type_ranges.result b/mysql-test/suite/pbxt/r/type_ranges.result index 34cabd64bcf..0674dcb4d28 100644 --- a/mysql-test/suite/pbxt/r/type_ranges.result +++ b/mysql-test/suite/pbxt/r/type_ranges.result @@ -70,19 +70,19 @@ flags set('one','two','tree') latin1_swedish_ci NO # show keys from t1; Table Non_unique Key_name Seq_in_index Column_name Collation Cardinality Sub_part Packed Null Index_type Comment t1 0 PRIMARY 1 auto A 0 NULL NULL BTREE -t1 1 utiny 1 utiny A NULL NULL NULL BTREE -t1 1 tiny 1 tiny A NULL NULL NULL BTREE -t1 1 short 1 short A NULL NULL NULL BTREE -t1 1 any_name 1 medium A NULL NULL NULL BTREE -t1 1 longlong 1 longlong A NULL NULL NULL BTREE -t1 1 real_float 1 real_float A NULL NULL NULL BTREE -t1 1 ushort 1 ushort A NULL NULL NULL BTREE -t1 1 umedium 1 umedium A NULL NULL NULL BTREE -t1 1 ulong 1 ulong A NULL NULL NULL BTREE -t1 1 ulonglong 1 ulonglong A NULL NULL NULL BTREE -t1 1 ulonglong 2 ulong A NULL NULL NULL BTREE -t1 1 options 1 options A NULL NULL NULL BTREE -t1 1 options 2 flags A NULL NULL NULL BTREE +t1 1 utiny 1 utiny A 0 NULL NULL BTREE +t1 1 tiny 1 tiny A 0 NULL NULL BTREE +t1 1 short 1 short A 0 NULL NULL BTREE +t1 1 any_name 1 medium A 0 NULL NULL BTREE +t1 1 longlong 1 longlong A 0 NULL NULL BTREE +t1 1 real_float 1 real_float A 0 NULL NULL BTREE +t1 1 ushort 1 ushort A 0 NULL NULL BTREE +t1 1 umedium 1 umedium A 0 NULL NULL BTREE +t1 1 ulong 1 ulong A 0 NULL NULL BTREE +t1 1 ulonglong 1 ulonglong A 0 NULL NULL BTREE +t1 1 ulonglong 2 ulong A 0 NULL NULL BTREE +t1 1 options 1 options A 0 NULL NULL BTREE +t1 1 options 2 flags A 0 NULL NULL BTREE CREATE UNIQUE INDEX test on t1 ( auto ) ; CREATE INDEX test2 on t1 ( ulonglong,ulong) ; CREATE INDEX test3 on t1 ( medium ) ; diff --git a/mysql-test/suite/pbxt/r/type_timestamp.result b/mysql-test/suite/pbxt/r/type_timestamp.result index 2e5a6f46276..47d2a996afd 100644 
--- a/mysql-test/suite/pbxt/r/type_timestamp.result +++ b/mysql-test/suite/pbxt/r/type_timestamp.result @@ -101,13 +101,13 @@ create table t1 (t2 timestamp(2), t4 timestamp(4), t6 timestamp(6), t8 timestamp(8), t10 timestamp(10), t12 timestamp(12), t14 timestamp(14)); Warnings: -Warning 1287 The syntax 'TIMESTAMP(2)' is deprecated and will be removed in MySQL 5.2. Please use 'TIMESTAMP' instead -Warning 1287 The syntax 'TIMESTAMP(4)' is deprecated and will be removed in MySQL 5.2. Please use 'TIMESTAMP' instead -Warning 1287 The syntax 'TIMESTAMP(6)' is deprecated and will be removed in MySQL 5.2. Please use 'TIMESTAMP' instead -Warning 1287 The syntax 'TIMESTAMP(8)' is deprecated and will be removed in MySQL 5.2. Please use 'TIMESTAMP' instead -Warning 1287 The syntax 'TIMESTAMP(10)' is deprecated and will be removed in MySQL 5.2. Please use 'TIMESTAMP' instead -Warning 1287 The syntax 'TIMESTAMP(12)' is deprecated and will be removed in MySQL 5.2. Please use 'TIMESTAMP' instead -Warning 1287 The syntax 'TIMESTAMP(14)' is deprecated and will be removed in MySQL 5.2. Please use 'TIMESTAMP' instead +Warning 1287 The syntax 'TIMESTAMP(2)' is deprecated and will be removed in MySQL 6.0. Please use 'TIMESTAMP' instead +Warning 1287 The syntax 'TIMESTAMP(4)' is deprecated and will be removed in MySQL 6.0. Please use 'TIMESTAMP' instead +Warning 1287 The syntax 'TIMESTAMP(6)' is deprecated and will be removed in MySQL 6.0. Please use 'TIMESTAMP' instead +Warning 1287 The syntax 'TIMESTAMP(8)' is deprecated and will be removed in MySQL 6.0. Please use 'TIMESTAMP' instead +Warning 1287 The syntax 'TIMESTAMP(10)' is deprecated and will be removed in MySQL 6.0. Please use 'TIMESTAMP' instead +Warning 1287 The syntax 'TIMESTAMP(12)' is deprecated and will be removed in MySQL 6.0. Please use 'TIMESTAMP' instead +Warning 1287 The syntax 'TIMESTAMP(14)' is deprecated and will be removed in MySQL 6.0. 
Please use 'TIMESTAMP' instead insert t1 values (0,0,0,0,0,0,0), ("1997-12-31 23:47:59", "1997-12-31 23:47:59", "1997-12-31 23:47:59", "1997-12-31 23:47:59", "1997-12-31 23:47:59", "1997-12-31 23:47:59", diff --git a/mysql-test/suite/pbxt/r/union.result b/mysql-test/suite/pbxt/r/union.result index 85fd7203488..04e5aaf6298 100644 --- a/mysql-test/suite/pbxt/r/union.result +++ b/mysql-test/suite/pbxt/r/union.result @@ -1301,12 +1301,14 @@ t3 CREATE TABLE `t3` ( `left(a,100000000)` longtext ) ENGINE=PBXT DEFAULT CHARSET=latin1 drop tables t1,t2,t3; +SELECT @tmp_max:= @@global.max_allowed_packet; +@tmp_max:= @@global.max_allowed_packet +1048576 +SET @@global.max_allowed_packet=25000000; CREATE TABLE t1 (a mediumtext); CREATE TABLE t2 (b varchar(20)); INSERT INTO t1 VALUES ('a'); CREATE TABLE t3 SELECT REPEAT(a,20000000) AS a FROM t1 UNION SELECT b FROM t2; -Warnings: -Warning 1301 Result of repeat() was larger than max_allowed_packet (1048576) - truncated SHOW CREATE TABLE t3; Table Create Table t3 CREATE TABLE `t3` ( @@ -1340,6 +1342,7 @@ t3 CREATE TABLE `t3` ( `a` varbinary(510) DEFAULT NULL ) ENGINE=PBXT DEFAULT CHARSET=latin1 DROP TABLES t1,t2,t3; +SET @@global.max_allowed_packet:= @tmp_max; create table t1 ( id int not null auto_increment, primary key (id), col1 int); insert into t1 (col1) values (2),(3),(4),(5),(6); select 99 union all select id from t1 order by 1; diff --git a/mysql-test/suite/pbxt/r/view_grant.result b/mysql-test/suite/pbxt/r/view_grant.result index 0847967eb87..5e2744c2933 100644 --- a/mysql-test/suite/pbxt/r/view_grant.result +++ b/mysql-test/suite/pbxt/r/view_grant.result @@ -28,7 +28,7 @@ create view v2 as select * from mysqltest.t2; ERROR 42000: ANY command denied to user 'mysqltest_1'@'localhost' for table 't2' show create view v1; View Create View character_set_client collation_connection -v1 CREATE ALGORITHM=UNDEFINED DEFINER=`mysqltest_1`@`localhost` SQL SECURITY DEFINER VIEW `test`.`v1` AS select `mysqltest`.`t1`.`a` AS 
`a`,`mysqltest`.`t1`.`b` AS `b` from `mysqltest`.`t1` latin1 latin1_swedish_ci +v1 CREATE ALGORITHM=UNDEFINED DEFINER=`mysqltest_1`@`localhost` SQL SECURITY DEFINER VIEW `v1` AS select `mysqltest`.`t1`.`a` AS `a`,`mysqltest`.`t1`.`b` AS `b` from `mysqltest`.`t1` latin1 latin1_swedish_ci grant create view,drop,select on test.* to mysqltest_1@localhost; use test; alter view v1 as select * from mysqltest.t1; @@ -309,7 +309,7 @@ grant create view,select on test.* to mysqltest_1@localhost; create view v1 as select * from mysqltest.t1; show create view v1; View Create View character_set_client collation_connection -v1 CREATE ALGORITHM=UNDEFINED DEFINER=`mysqltest_1`@`localhost` SQL SECURITY DEFINER VIEW `test`.`v1` AS select `mysqltest`.`t1`.`a` AS `a`,`mysqltest`.`t1`.`b` AS `b` from `mysqltest`.`t1` latin1 latin1_swedish_ci +v1 CREATE ALGORITHM=UNDEFINED DEFINER=`mysqltest_1`@`localhost` SQL SECURITY DEFINER VIEW `v1` AS select `mysqltest`.`t1`.`a` AS `a`,`mysqltest`.`t1`.`b` AS `b` from `mysqltest`.`t1` latin1 latin1_swedish_ci revoke select on mysqltest.t1 from mysqltest_1@localhost; select * from v1; ERROR HY000: View 'test.v1' references invalid table(s) or column(s) or function(s) or definer/invoker of view lack rights to use them diff --git a/mysql-test/suite/pbxt/t/auto_increment.test b/mysql-test/suite/pbxt/t/auto_increment.test index 1819e0cba1f..453d4d7658d 100644 --- a/mysql-test/suite/pbxt/t/auto_increment.test +++ b/mysql-test/suite/pbxt/t/auto_increment.test @@ -150,7 +150,6 @@ delete from t1 where a=0; update t1 set a=0 where b=5; select * from t1 order by b; delete from t1 where a=0; ---error 1048 update t1 set a=NULL where b=6; update t1 set a=300 where b=7; SET SQL_MODE=''; @@ -166,7 +165,6 @@ delete from t1 where a=0; update t1 set a=0 where b=12; select * from t1 order by b; delete from t1 where a=0; ---error 1048 update t1 set a=NULL where b=13; update t1 set a=500 where b=14; select * from t1 order by b; diff --git 
a/mysql-test/suite/pbxt/t/delete.test b/mysql-test/suite/pbxt/t/delete.test
index d774ed50cd1..913e7df3d3a 100644
--- a/mysql-test/suite/pbxt/t/delete.test
+++ b/mysql-test/suite/pbxt/t/delete.test
@@ -120,13 +120,16 @@ select * from t2;
 delete t11.*, t12.* from t11,t12 where t11.a = t12.a and t11.b <> (select b from t2 where t11.a < t2.a);
 select * from t11;
 select * from t12;
+--error 1242
 delete ignore t11.*, t12.* from t11,t12 where t11.a = t12.a and t11.b <> (select b from t2 where t11.a < t2.a);
 select * from t11;
 select * from t12;
+--error 1062
 insert into t11 values (2, 12);
 -- error 1242
 delete from t11 where t11.b <> (select b from t2 where t11.a < t2.a);
 select * from t11;
+--error 1242
 delete ignore from t11 where t11.b <> (select b from t2 where t11.a < t2.a);
 select * from t11;
 drop table t11, t12, t2;
diff --git a/mysql-test/suite/pbxt/t/join_nested.test b/mysql-test/suite/pbxt/t/join_nested.test
index 9a678b5c0f6..e90aa843042 100644
--- a/mysql-test/suite/pbxt/t/join_nested.test
+++ b/mysql-test/suite/pbxt/t/join_nested.test
@@ -546,6 +546,7 @@ SELECT t0.a,t0.b,t1.a,t1.b,t2.a,t2.b,t3.a,t3.b,t4.a,t4.b,
 CREATE INDEX idx_b ON t8(b);
+--sorted_result
 EXPLAIN EXTENDED
 SELECT t0.a,t0.b,t1.a,t1.b,t2.a,t2.b,t3.a,t3.b,t4.a,t4.b,
 t5.a,t5.b,t6.a,t6.b,t7.a,t7.b,t8.a,t8.b,t9.a,t9.b
@@ -585,6 +586,7 @@ SELECT t0.a,t0.b,t1.a,t1.b,t2.a,t2.b,t3.a,t3.b,t4.a,t4.b,
 CREATE INDEX idx_b ON t1(b);
 CREATE INDEX idx_a ON t0(a);
+--sorted_result
 EXPLAIN EXTENDED
 SELECT t0.a,t0.b,t1.a,t1.b,t2.a,t2.b,t3.a,t3.b,t4.a,t4.b,
 t5.a,t5.b,t6.a,t6.b,t7.a,t7.b,t8.a,t8.b,t9.a,t9.b
@@ -621,6 +623,7 @@ SELECT t0.a,t0.b,t1.a,t1.b,t2.a,t2.b,t3.a,t3.b,t4.a,t4.b,
 (t8.b=t9.b OR t8.c IS NULL) AND
 (t9.a=1);
+--sorted_result
 SELECT t0.a,t0.b,t1.a,t1.b,t2.a,t2.b,t3.a,t3.b,t4.a,t4.b,
 t5.a,t5.b,t6.a,t6.b,t7.a,t7.b,t8.a,t8.b,t9.a,t9.b
 FROM t0,t1
diff --git a/mysql-test/suite/pbxt/t/lowercase_table_grant-master.opt b/mysql-test/suite/pbxt/t/lowercase_table_grant-master.opt
new file mode 100644
index
00000000000..c718e2feb1b
--- /dev/null
+++ b/mysql-test/suite/pbxt/t/lowercase_table_grant-master.opt
@@ -0,0 +1 @@
+--lower_case_table_names
diff --git a/mysql-test/suite/pbxt/t/lowercase_table_qcache-master.opt b/mysql-test/suite/pbxt/t/lowercase_table_qcache-master.opt
new file mode 100644
index 00000000000..c718e2feb1b
--- /dev/null
+++ b/mysql-test/suite/pbxt/t/lowercase_table_qcache-master.opt
@@ -0,0 +1 @@
+--lower_case_table_names
diff --git a/mysql-test/suite/pbxt/t/lowercase_view-master.opt b/mysql-test/suite/pbxt/t/lowercase_view-master.opt
new file mode 100644
index 00000000000..62ab6dad1e0
--- /dev/null
+++ b/mysql-test/suite/pbxt/t/lowercase_view-master.opt
@@ -0,0 +1 @@
+--lower_case_table_names=1
diff --git a/mysql-test/suite/pbxt/t/null.test b/mysql-test/suite/pbxt/t/null.test
index a52e9d85b6c..63281133388 100644
--- a/mysql-test/suite/pbxt/t/null.test
+++ b/mysql-test/suite/pbxt/t/null.test
@@ -61,9 +61,7 @@ drop table t1;
 #
 CREATE TABLE t1 (a varchar(16) NOT NULL default '', b smallint(6) NOT NULL default 0, c datetime NOT NULL default '0000-00-00 00:00:00', d smallint(6) NOT NULL default 0);
 INSERT INTO t1 SET a = "", d= "2003-01-14 03:54:55";
---error 1048
 UPDATE t1 SET d=1/NULL;
---error 1048
 UPDATE t1 SET d=NULL;
 --error 1048
 INSERT INTO t1 (a) values (null);
diff --git a/mysql-test/suite/pbxt/t/pbxt_bugs.test b/mysql-test/suite/pbxt/t/pbxt_bugs.test
index 3976f44267c..b774e9e3034 100644
--- a/mysql-test/suite/pbxt/t/pbxt_bugs.test
+++ b/mysql-test/suite/pbxt/t/pbxt_bugs.test
@@ -926,7 +926,59 @@ LOAD DATA LOCAL INFILE 'suite/pbxt/t/load_unique_error1.inc' REPLACE INTO TABLE
 SELECT * FROM t1 ORDER BY c1;
 DROP TABLE t1;
+create table parent (id int primary key);
+create table child (id int PRIMARY KEY, FOREIGN KEY (id) REFERENCES parent(id));
+insert into parent values (2), (3), (4);
+insert into child values (3), (4);
+
+--error 1451
+delete ignore from parent;
+--sorted_result
+select * from parent;
+
+drop table child, parent;
+
+# bug
378222: Drop sakila causes error: Cannot delete or update a parent row: a foreign key constraint fails
+
+create schema test378222;
+use test378222;
+create table t1 (id int primary key);
+create table t2 (id int primary key);
+alter table t1 add constraint foreign key (id) references t2 (id);
+alter table t2 add constraint foreign key (id) references t1 (id);
+drop schema test378222;
+
+create schema test378222a;
+create schema test378222b;
+create table test378222a.t1 (id int primary key);
+create table test378222b.t2 (id int primary key);
+alter table test378222a.t1 add constraint foreign key (id) references test378222b.t2 (id);
+alter table test378222b.t2 add constraint foreign key (id) references test378222a.t1 (id);
+set foreign_key_checks = 1;
+--error 1217
+drop schema test378222a;
+--error 1217
+drop schema test378222b;
+set foreign_key_checks = 0;
+drop schema test378222a;
+drop schema test378222b;
+set foreign_key_checks = 1;
+use test;
+
+# bug 369086: Incosistent/Incorrect Truncate behavior
+CREATE TABLE t1(c1 TINYINT AUTO_INCREMENT NULL KEY ) AUTO_INCREMENT=10;
+SHOW CREATE TABLE t1;
+INSERT INTO t1 VALUES(null);
+INSERT INTO t1 VALUES(null);
+INSERT INTO t1 VALUES(null);
+SELECT * FROM t1;
+TRUNCATE TABLE t1;
+INSERT INTO t1 VALUES(null);
+SELECT * FROM t1;
+DROP TABLE t1;
+
 --disable_query_log
+
 DROP TABLE t2, t5;
 drop database pbxt;
 --enable_query_log
diff --git a/mysql-test/suite/pbxt/t/rename.test b/mysql-test/suite/pbxt/t/rename.test
index 1cd02ef8a65..670c68f7965 100644
--- a/mysql-test/suite/pbxt/t/rename.test
+++ b/mysql-test/suite/pbxt/t/rename.test
@@ -63,7 +63,28 @@ connection con2;
 # Wait for the the tables to be renamed
 # i.e the query below succeds
 let $query= select * from t2, t4;
-source include/wait_for_query_to_suceed.inc;
+# source include/wait_for_query_to_suceed.inc;
+let $counter= 100;
+
+disable_abort_on_error;
+disable_query_log;
+disable_result_log;
+eval $query;
+while ($mysql_errno)
+{
+ eval $query;
+ sleep 0.1;
+ dec
$counter; + + if (!$counter) + { + die("Waited too long for query to suceed"); + } +} +enable_abort_on_error; +enable_query_log; +enable_result_log; + show tables; diff --git a/mysql-test/suite/pbxt/t/schema.test b/mysql-test/suite/pbxt/t/schema.test index a08d9b38935..41e5231f690 100644 --- a/mysql-test/suite/pbxt/t/schema.test +++ b/mysql-test/suite/pbxt/t/schema.test @@ -10,5 +10,12 @@ drop database if exists mysqltest1; create schema foo; show create schema foo; +# force PBXT schema to be created +create table t1 (id int) engine=pbxt; show schemas; drop schema foo; + +--disable_query_log +drop table if exists t1; +drop database pbxt; +--enable_query_log diff --git a/mysql-test/suite/pbxt/t/type_enum.test b/mysql-test/suite/pbxt/t/type_enum.test index 10beff8fdc3..ec562f377f6 100644 --- a/mysql-test/suite/pbxt/t/type_enum.test +++ b/mysql-test/suite/pbxt/t/type_enum.test @@ -153,12 +153,19 @@ create table t1(russian_deviant enum('E','F','EÿF','F,E') NOT NULL DEFAULT'E'); show create table t1; drop table t1; -# ER_WRONG_FIELD_TERMINATORS ---error 1005 +# the following create statement sometimes fails like it would if NO_BACKSLASH_ESCAPES sql mode was on, +# we check sql mode here +select @@SESSION.sql_mode; +select @@GLOBAL.sql_mode; + +## ER_WRONG_FIELD_TERMINATORS +#--error 1005 create table t1(exhausting_charset enum('ABCDEFGHIJKLMNOPQRSTUVWXYZ',' !"','#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~','xx\','yy\€','zz‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖרÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ')); +drop table t1; + --disable_query_log drop database pbxt; --enable_query_log diff --git a/mysql-test/suite/pbxt/t/udf-master.opt b/mysql-test/suite/pbxt/t/udf-master.opt new file mode 100644 index 00000000000..7d8786c156a --- /dev/null +++ b/mysql-test/suite/pbxt/t/udf-master.opt @@ -0,0 +1 @@ +$UDF_EXAMPLE_LIB_OPT diff --git a/mysql-test/suite/pbxt/t/union.test 
b/mysql-test/suite/pbxt/t/union.test
index 07add11fe5f..02c73d4bb57 100644
--- a/mysql-test/suite/pbxt/t/union.test
+++ b/mysql-test/suite/pbxt/t/union.test
@@ -802,6 +802,10 @@ drop tables t1,t2,t3;
 # exceeds mediumtext maximum length
 #
+SELECT @tmp_max:= @@global.max_allowed_packet;
+SET @@global.max_allowed_packet=25000000;
+# switching connection to allow the new max_allowed_packet take effect
+--connect (newconn, localhost, root,,)
 CREATE TABLE t1 (a mediumtext);
 CREATE TABLE t2 (b varchar(20));
 INSERT INTO t1 VALUES ('a');
@@ -823,6 +827,9 @@ INSERT INTO t1 VALUES ('a');
 CREATE TABLE t3 SELECT REPEAT(a,2) AS a FROM t1 UNION SELECT b FROM t2;
 SHOW CREATE TABLE t3;
 DROP TABLES t1,t2,t3;
+--connection default
+SET @@global.max_allowed_packet:= @tmp_max;
+--disconnect newconn
 #
 # Bug #10032 Bug in parsing UNION with ORDER BY when one node does not use FROM
diff --git a/mysql-test/t/index_merge_myisam.test b/mysql-test/t/index_merge_myisam.test
index dccaecef20a..0c4b9c6886c 100644
--- a/mysql-test/t/index_merge_myisam.test
+++ b/mysql-test/t/index_merge_myisam.test
@@ -25,15 +25,19 @@ let $merge_table_support= 1;
 --echo # we get another @@optimizer_switch user)
 --echo #
+--replace_regex /,table_elimination=on//
 select @@optimizer_switch;
 set optimizer_switch='index_merge=off,index_merge_union=off';
+--replace_regex /,table_elimination=on//
 select @@optimizer_switch;
 set optimizer_switch='index_merge_union=on';
+--replace_regex /,table_elimination=on//
 select @@optimizer_switch;
 set optimizer_switch='default,index_merge_sort_union=off';
+--replace_regex /,table_elimination=on//
 select @@optimizer_switch;
 --error ER_WRONG_VALUE_FOR_VAR
@@ -71,17 +75,21 @@ set optimizer_switch='default,index_merge=on,index_merge=off,default';
 set optimizer_switch=default;
 set optimizer_switch='index_merge=off,index_merge_union=off,default';
+--replace_regex /,table_elimination=on//
 select @@optimizer_switch;
 set optimizer_switch=default;
 # Check setting defaults for global vars
+--replace_regex /,table_elimination=on//
 select @@global.optimizer_switch;
 set @@global.optimizer_switch=default;
+--replace_regex /,table_elimination=on//
 select @@global.optimizer_switch;
 --echo #
 --echo # Check index_merge's @@optimizer_switch flags
 --echo #
+--replace_regex /,table_elimination.on//
 select @@optimizer_switch;
 create table t0 (a int);
@@ -182,6 +190,7 @@ set optimizer_switch='default,index_merge_union=off';
 explain select * from t1 where a=10 and b=10 or c=10;
 set optimizer_switch=default;
+--replace_regex /,table_elimination.on//
 show variables like 'optimizer_switch';
 drop table t0, t1;
diff --git a/mysql-test/t/log_slow.test b/mysql-test/t/log_slow.test
new file mode 100644
index 00000000000..303d5bf8deb
--- /dev/null
+++ b/mysql-test/t/log_slow.test
@@ -0,0 +1,42 @@
+#
+# Testing of slow log query options
+#
+
+select @@log_slow_filter;
+select @@log_slow_rate_limit;
+select @@log_slow_verbosity;
+show variables like "log_slow%";
+
+# Some simple test to set log_slow_filter
+set @@log_slow_filter= "filesort,filesort_on_disk,full_join,full_scan,query_cache,query_cache_miss,tmp_table,tmp_table_on_disk,admin";
+select @@log_slow_filter;
+set @@log_slow_filter="admin,admin";
+select @@log_slow_filter;
+set @@log_slow_filter=7;
+select @@log_slow_filter;
+
+# Test of wrong values
+--error 1231
+set @@log_slow_filter= "filesort,impossible,impossible2,admin";
+--error 1231
+set @@log_slow_filter= "filesort, admin";
+--error 1231
+set @@log_slow_filter= 1<<31;
+select @@log_slow_filter;
+
+# Some simple test to set log_slow_verbosity
+set @@log_slow_verbosity= "query_plan,innodb";
+select @@log_slow_verbosity;
+set @@log_slow_verbosity=1;
+select @@log_slow_verbosity;
+
+#
+# Check which fields are in slow_log table
+#
+
+show fields from mysql.slow_log;
+
+# Reset used variables
+
+set @@log_slow_filter=default;
+set @@log_slow_verbosity=default;
diff --git a/mysql-test/t/mysql-bug41486.test b/mysql-test/t/mysql-bug41486.test
index
6e014bca7d1..e7b0acc1935 100644
--- a/mysql-test/t/mysql-bug41486.test
+++ b/mysql-test/t/mysql-bug41486.test
@@ -27,7 +27,8 @@ connect (con1, localhost, root,,);
 CREATE TABLE t1(data LONGBLOB);
 INSERT INTO t1 SELECT REPEAT('1', 2*1024*1024);
-
+# The following is to remove the race between end of insert and start of MYSQL_DUMP:
+SELECT COUNT(*) FROM t1;
 let $outfile= $MYSQLTEST_VARDIR/tmp/bug41486.sql;
 --error 0,1
 remove_file $outfile;
diff --git a/mysql-test/t/mysqld_option_err.test b/mysql-test/t/mysqld_option_err.test
new file mode 100644
index 00000000000..2b35a2eb38e
--- /dev/null
+++ b/mysql-test/t/mysqld_option_err.test
@@ -0,0 +1,47 @@
+#
+# Test error checks on mysqld command line option parsing.
+#
+# Call mysqld with different invalid options, and check that it fails in each case.
+#
+# This means that a test failure results in mysqld starting up, which is only
+# caught when the test case times out. This is not ideal, but I did not find an
+# easy way to have the server shut down after a successful startup.
+#
+
+--source include/not_embedded.inc
+
+# We have not run (and do not need) bootstrap of the server. We just
+# give it a dummy data directory (for log files etc).
+
+mkdir $MYSQLTEST_VARDIR/tmp/mysqld_option_err;
+
+
+--echo Test that unknown option is not silently ignored.
+--error 2
+--exec $MYSQLD --skip-networking --datadir=$MYSQLTEST_VARDIR/tmp/mysqld_option_err --skip-grant-tables --nonexistentoption >$MYSQLTEST_VARDIR/tmp/mysqld_option_err/mysqltest.log 2>&1
+
+
+--echo Test bad binlog format.
+--error 1
+--exec $MYSQLD --skip-networking --datadir=$MYSQLTEST_VARDIR/tmp/mysqld_option_err --skip-grant-tables --log-bin --binlog-format=badformat >>$MYSQLTEST_VARDIR/tmp/mysqld_option_err/mysqltest.log 2>&1
+
+
+--echo Test bad default storage engine.
+--error 1 +--exec $MYSQLD --skip-networking --datadir=$MYSQLTEST_VARDIR/tmp/mysqld_option_err --skip-grant-tables --default-storage-engine=nonexistentengine >>$MYSQLTEST_VARDIR/tmp/mysqld_option_err/mysqltest.log 2>&1 + + +--echo Test non-numeric value passed to number option. +--error 1 +--exec $MYSQLD --skip-networking --datadir=$MYSQLTEST_VARDIR/tmp/mysqld_option_err --skip-grant-tables --min-examined-row-limit=notanumber >>$MYSQLTEST_VARDIR/tmp/mysqld_option_err/mysqltest.log 2>&1 + + +# Test for MBug#423035: error in parsing enum value for plugin +# variable in mysqld command-line option. +# See also Bug#32034. +--echo Test that bad value for plugin enum option is rejected correctly. +--error 7 +--exec $MYSQLD --skip-networking --datadir=$MYSQLTEST_VARDIR/tmp/mysqld_option_err --skip-grant-tables $EXAMPLE_PLUGIN_OPT --plugin-load=EXAMPLE=ha_example.so --plugin-example-enum-var=noexist >>$MYSQLTEST_VARDIR/tmp/mysqld_option_err/mysqltest.log 2>&1 + + +--echo Done. diff --git a/mysql-test/t/table_elim.test b/mysql-test/t/table_elim.test new file mode 100644 index 00000000000..642c5d51d62 --- /dev/null +++ b/mysql-test/t/table_elim.test @@ -0,0 +1,338 @@ +# +# Table elimination (MWL#17) tests +# +--disable_warnings +drop table if exists t0, t1, t2, t3; +drop view if exists v1, v2; +--enable_warnings + +create table t1 (a int); +insert into t1 values (0),(1),(2),(3); +create table t0 as select * from t1; + +create table t2 (a int primary key, b int) + as select a, a as b from t1 where a in (1,2); + +create table t3 (a int primary key, b int) + as select a, a as b from t1 where a in (1,3); + +--echo # This will be eliminated: +explain select t1.a from t1 left join t2 on t2.a=t1.a; +explain extended select t1.a from t1 left join t2 on t2.a=t1.a; + +select t1.a from t1 left join t2 on t2.a=t1.a; + +--echo # This will not be eliminated as t2.b is in in select list: +explain select * from t1 left join t2 on t2.a=t1.a; + +--echo # This will not be eliminated as t2.b is 
in in order list: +explain select t1.a from t1 left join t2 on t2.a=t1.a order by t2.b; + +--echo # This will not be eliminated as t2.b is in group list: +explain select t1.a from t1 left join t2 on t2.a=t1.a group by t2.b; + +--echo # This will not be eliminated as t2.b is in the WHERE +explain select t1.a from t1 left join t2 on t2.a=t1.a where t2.b < 3 or t2.b is null; + +--echo # Elimination of multiple tables: +explain select t1.a from t1 left join (t2 join t3) on t2.a=t1.a and t3.a=t1.a; + +--echo # Elimination of multiple tables (2): +explain select t1.a from t1 left join (t2 join t3 on t2.b=t3.b) on t2.a=t1.a and t3.a=t1.a; + +--echo # Elimination when done within an outer join nest: +explain extended +select t0.* +from + t0 left join (t1 left join (t2 join t3 on t2.b=t3.b) on t2.a=t1.a and + t3.a=t1.a) on t0.a=t1.a; + +--echo # Elimination with aggregate functions +explain select count(*) from t1 left join t2 on t2.a=t1.a; +explain select count(1) from t1 left join t2 on t2.a=t1.a; +explain select count(1) from t1 left join t2 on t2.a=t1.a group by t1.a; + +--echo This must not use elimination: +explain select count(1) from t1 left join t2 on t2.a=t1.a group by t2.a; + +drop table t0, t1, t2, t3; + +# This will stand for elim_facts +create table t0 ( id integer, primary key (id)); + +# Attribute1, non-versioned +create table t1 ( + id integer, + attr1 integer, + primary key (id), + key (attr1) +); + +# Attribute2, time-versioned +create table t2 ( + id integer, + attr2 integer, + fromdate date, + primary key (id, fromdate), + key (attr2,fromdate) +); + +insert into t0 values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9); +insert into t0 select A.id + 10*B.id from t0 A, t0 B where B.id > 0; + +insert into t1 select id, id from t0; +insert into t2 select id, id, date_add('2009-06-22', interval id day) from t0; +insert into t2 select id, id+1, date_add('2008-06-22', interval id day) from t0; + +create view v1 as +select + F.id, A1.attr1, A2.attr2 +from + t0 F + 
left join t1 A1 on A1.id=F.id + left join t2 A2 on A2.id=F.id and + A2.fromdate=(select MAX(fromdate) from + t2 where id=A2.id); +create view v2 as +select + F.id, A1.attr1, A2.attr2 +from + t0 F + left join t1 A1 on A1.id=F.id + left join t2 A2 on A2.id=F.id and + A2.fromdate=(select MAX(fromdate) from + t2 where id=F.id); + +--echo This should use one table: +explain select id from v1 where id=2; +--echo This should use one table: +explain extended select id from v1 where id in (1,2,3,4); +--echo This should use facts and A1 tables: +explain extended select id from v1 where attr1 between 12 and 14; +--echo This should use facts, A2 and its subquery: +explain extended select id from v1 where attr2 between 12 and 14; + +# Repeat for v2: + +--echo This should use one table: +explain select id from v2 where id=2; +--echo This should use one table: +explain extended select id from v2 where id in (1,2,3,4); +--echo This should use facts and A1 tables: +explain extended select id from v2 where attr1 between 12 and 14; +--echo This should use facts, A2 and its subquery: +explain extended select id from v2 where attr2 between 12 and 14; + +drop view v1, v2; +drop table t0, t1, t2; + +# +# Tests for the code that uses t.keypartX=func(t.keypartY) equalities to +# make table elimination inferences +# +create table t1 (a int); +insert into t1 values (0),(1),(2),(3); + +create table t2 (pk1 int, pk2 int, pk3 int, col int, primary key(pk1, pk2, pk3)); +insert into t2 select a,a,a,a from t1; + +--echo This must use only t1: +explain select t1.* from t1 left join t2 on t2.pk1=t1.a and + t2.pk2=t2.pk1+1 and + t2.pk3=t2.pk2+1; + +--echo This must use only t1: +explain select t1.* from t1 left join t2 on t2.pk1=t1.a and + t2.pk3=t2.pk1+1 and + t2.pk2=t2.pk3+1; + +--echo This must use both: +explain select t1.* from t1 left join t2 on t2.pk1=t1.a and + t2.pk3=t2.pk1+1 and + t2.pk2=t2.pk3+t2.col; + +--echo This must use only t1: +explain select t1.* from t1 left join t2 on t2.pk2=t1.a 
and + t2.pk1=t2.pk2+1 and + t2.pk3=t2.pk1; + +drop table t1, t2; +# +# Check that equality propagation is taken into account +# +create table t1 (pk int primary key, col int); +insert into t1 values (1,1),(2,2); + +create table t2 like t1; +insert into t2 select * from t1; + +create table t3 like t1; +insert into t3 select * from t1; + +explain +select t1.* from t1 left join ( t2 left join t3 on t3.pk=t2.col) on t2.col=t1.col; + +explain +select t1.*, t2.* from t1 left join (t2 left join t3 on t3.pk=t2.col) on t2.pk=t1.col; + +explain select t1.* +from + t1 left join ( t2 left join t3 on t3.pk=t2.col or t3.pk=t2.col) + on t2.col=t1.col or t2.col=t1.col; + +explain select t1.*, t2.* +from + t1 left join + (t2 left join t3 on t3.pk=t2.col or t3.pk=t2.col) + on t2.pk=t1.col or t2.pk=t1.col; + +drop table t1, t2, t3; + +--echo # +--echo # Check things that look like functional dependencies but really are not +--echo # + +create table t1 (a char(10) character set latin1 collate latin1_general_ci primary key); +insert into t1 values ('foo'); +insert into t1 values ('bar'); + +create table t2 (a char(10) character set latin1 collate latin1_general_cs primary key); +insert into t2 values ('foo'); +insert into t2 values ('FOO'); + +-- echo this must not use table elimination: +explain select t1.* from t1 left join t2 on t2.a='foo' collate latin1_general_ci; + +-- echo this must not use table elimination: +explain select t1.* from t1 left join t2 on t2.a=t1.a collate latin1_general_ci; +drop table t1,t2; + +create table t1 (a int primary key); +insert into t1 values (1),(2); +create table t2 (a char(10) primary key); +insert into t2 values ('1'),('1.0'); +-- echo this must not use table elimination: +explain select t1.* from t1 left join t2 on t2.a=1; +-- echo this must not use table elimination: +explain select t1.* from t1 left join t2 on t2.a=t1.a; + +drop table t1, t2; +# partial unique keys do not work at the moment, although they are able to +# provide one-match 
guarantees: +create table t1 (a char(10) primary key); +insert into t1 values ('foo'),('bar'); + +create table t2 (a char(10), unique key(a(2))); +insert into t2 values ('foo'),('bar'); + +explain select t1.* from t1 left join t2 on t2.a=t1.a; + +drop table t1, t2; + +--echo # +--echo # check UPDATE/DELETE that look like they could be eliminated +--echo # +create table t1 (a int primary key, b int); +insert into t1 values (1,1),(2,2),(3,3); + +create table t2 like t1; +insert into t2 select * from t1; +update t1 left join t2 using (a) set t2.a=t2.a+100; +select * from t1; +select * from t2; + +delete from t2; +insert into t2 select * from t1; + +delete t2 from t1 left join t2 using (a); +select * from t1; +select * from t2; +drop table t1, t2; + +--echo # +--echo # Tests with various edge-case ON expressions +--echo # +create table t1 (a int, b int, c int, d int); +insert into t1 values (0,0,0,0),(1,1,1,1),(2,2,2,2),(3,3,3,3); + +create table t2 (pk int primary key, b int) + as select a as pk, a as b from t1 where a in (1,2); + +create table t3 (pk1 int, pk2 int, b int, unique(pk1,pk2)); +insert into t3 select a as pk1, a as pk2, a as b from t1 where a in (1,3); + +explain select t1.a from t1 left join t2 on t2.pk=t1.a and t2.b<t1.b; +explain select t1.a from t1 left join t2 on t2.pk=t1.a or t2.b<t1.b; +explain select t1.a from t1 left join t2 on t2.b<t1.b or t2.pk=t1.a; + +explain select t1.a from t1 left join t2 on t2.pk between 10 and 20; +explain select t1.a from t1 left join t2 on t2.pk between 0.5 and 1.5; +explain select t1.a from t1 left join t2 on t2.pk between 10 and 10; + +explain select t1.a from t1 left join t2 on t2.pk in (10); +explain select t1.a from t1 left join t2 on t2.pk in (t1.a); + +explain select t1.a from t1 left join t2 on TRUE; + +explain select t1.a from t1 left join t3 on t3.pk1=t1.a and t3.pk2 IS NULL; + +drop table t1,t2,t3; + +--echo # +--echo # Multi-equality tests +--echo # +create table t1 (a int, b int, c int, d int); +insert 
into t1 values (0,0,0,0),(1,1,1,1),(2,2,2,2),(3,3,3,3); + +create table t2 (pk int primary key, b int, c int); +insert into t2 select a,a,a from t1 where a in (1,2); + +explain +select t1.* +from t1 left join t2 on t2.pk=t2.c and t2.b=t1.a and t1.a=t1.b and t2.c=t2.b +where t1.d=1; + +explain +select t1.* +from + t1 + left join + t2 + on (t2.pk=t2.c and t2.b=t1.a and t1.a=t1.b and t2.c=t2.b) or + (t2.pk=t2.c and t2.b=t1.a and t1.a=t1.b and t2.c=t2.b) +where t1.d=1; + +--echo #This can't be eliminated: +explain +select t1.* +from + t1 + left join + t2 + on (t2.pk=t2.c and t2.b=t1.a and t2.c=t1.b) or + (t2.pk=t2.c and t1.a=t1.b and t2.c=t1.b) +where t1.d=1; + +explain +select t1.* +from + t1 + left join + t2 + on (t2.pk=t2.c and t2.b=t1.a and t2.c=t1.b) or + (t2.pk=t2.c and t2.c=t1.b) +; + +explain +select t1.* +from t1 left join t2 on t2.pk=3 or t2.pk= 4; + +explain +select t1.* +from t1 left join t2 on t2.pk=3 or t2.pk= 3; + +explain +select t1.* +from t1 left join t2 on (t2.pk=3 and t2.b=3) or (t2.pk= 4 and t2.b=3); + +drop table t1, t2; diff --git a/mysql-test/valgrind.supp b/mysql-test/valgrind.supp index efc40f8b942..57eee78bdaa 100644 --- a/mysql-test/valgrind.supp +++ b/mysql-test/valgrind.supp @@ -704,6 +704,75 @@ fun:inet_ntoa } + +# +# Some problem inside glibc on Ubuntu 9.04, x86 (but not amd64): +# +# ==5985== 19 bytes in 1 blocks are still reachable in loss record 1 of 6 +# ==5985== at 0x7AF3FDE: malloc (vg_replace_malloc.c:207) +# ... 11,12, or 13 functions w/o symbols ... +# ==5985== by 0x8717185: nptl_pthread_exit_hack_handler (my_thr_init.c:55) +# +# Since valgrind 3.3.0 doesn't support '...' 
multi-function pattern, using +# multiple suppressions: +# +{ + Mem loss inside nptl_pthread_exit_hack_handler + Memcheck:Leak + fun:* + fun:* + fun:* + fun:* + fun:* + fun:* + fun:* + fun:* + fun:* + fun:* + fun:* + fun:* + fun:nptl_pthread_exit_hack_handler +} + +{ + Mem loss inside nptl_pthread_exit_hack_handler + Memcheck:Leak + fun:* + fun:* + fun:* + fun:* + fun:* + fun:* + fun:* + fun:* + fun:* + fun:* + fun:* + fun:* + fun:* + fun:nptl_pthread_exit_hack_handler +} + +{ + Mem loss inside nptl_pthread_exit_hack_handler + Memcheck:Leak + fun:* + fun:* + fun:* + fun:* + fun:* + fun:* + fun:* + fun:* + fun:* + fun:* + fun:* + fun:* + fun:* + fun:* + fun:nptl_pthread_exit_hack_handler +} + # # BUG#45630 # Suppress valgrind failures within nptl_pthread_exit_hack_handler on Ubuntu 9.04, x86 (but not amd64) diff --git a/mysys/my_compress.c b/mysys/my_compress.c index 26626d70079..ade2742c4fc 100644 --- a/mysys/my_compress.c +++ b/mysys/my_compress.c @@ -81,12 +81,13 @@ my_bool my_compress(uchar *packet, size_t *len, size_t *complen) This fix is safe, since such memory is only used internally by zlib, so we will not hide any bugs in mysql this way. 
*/ -void *my_az_allocator(void *dummy, unsigned int items, unsigned int size) +void *my_az_allocator(void *dummy __attribute__((unused)), unsigned int items, + unsigned int size) { return my_malloc((size_t)items*(size_t)size, IF_VALGRIND(MY_ZEROFILL, MYF(0))); } -void my_az_free(void *dummy, void *address) +void my_az_free(void *dummy __attribute__((unused)), void *address) { my_free(address, MYF(MY_ALLOW_ZERO_PTR)); } diff --git a/mysys/my_getopt.c b/mysys/my_getopt.c index ceb99975cdb..46e07fda32e 100644 --- a/mysys/my_getopt.c +++ b/mysys/my_getopt.c @@ -611,6 +611,7 @@ static int setval(const struct my_option *opts, uchar* *value, char *argument, my_bool set_maximum_value) { int err= 0; + int pos; if (value && argument) { diff --git a/sql-bench/test-table-elimination.sh b/sql-bench/test-table-elimination.sh new file mode 100755 index 00000000000..338a7ceb4b5 --- /dev/null +++ b/sql-bench/test-table-elimination.sh @@ -0,0 +1,316 @@ +#!@PERL@ +# Test of table elimination feature + +use Cwd; +use DBI; +use Getopt::Long; +use Benchmark; + +$opt_loop_count=100000; +$opt_medium_loop_count=10000; +$opt_small_loop_count=100; + +$pwd = cwd(); $pwd = "." 
if ($pwd eq ''); +require "$pwd/bench-init.pl" || die "Can't read Configuration file: $!\n"; + +if ($opt_small_test) +{ + $opt_loop_count/=10; + $opt_medium_loop_count/=10; + $opt_small_loop_count/=10; +} + +print "Testing table elimination feature\n"; +print "The test table has $opt_loop_count rows.\n\n"; + +# A query to get the recent versions of all attributes: +$select_current_full_facts=" + select + F.id, A1.attr1, A2.attr2 + from + elim_facts F + left join elim_attr1 A1 on A1.id=F.id + left join elim_attr2 A2 on A2.id=F.id and + A2.fromdate=(select MAX(fromdate) from + elim_attr2 where id=A2.id); +"; +$select_current_full_facts=" + select + F.id, A1.attr1, A2.attr2 + from + elim_facts F + left join elim_attr1 A1 on A1.id=F.id + left join elim_attr2 A2 on A2.id=F.id and + A2.fromdate=(select MAX(fromdate) from + elim_attr2 where id=F.id); +"; +# TODO: same as above but for some given date also? +# TODO: + + +#### +#### Connect and start timeing +#### + +$dbh = $server->connect(); +$start_time=new Benchmark; + +#### +#### Create needed tables +#### + +goto select_test if ($opt_skip_create); + +print "Creating tables\n"; +$dbh->do("drop table elim_facts" . $server->{'drop_attr'}); +$dbh->do("drop table elim_attr1" . $server->{'drop_attr'}); +$dbh->do("drop table elim_attr2" . 
$server->{'drop_attr'}); + +# The facts table +do_many($dbh,$server->create("elim_facts", + ["id integer"], + ["primary key (id)"])); + +# Attribute1, non-versioned +do_many($dbh,$server->create("elim_attr1", + ["id integer", + "attr1 integer"], + ["primary key (id)", + "key (attr1)"])); + +# Attribute2, time-versioned +do_many($dbh,$server->create("elim_attr2", + ["id integer", + "attr2 integer", + "fromdate date"], + ["primary key (id, fromdate)", + "key (attr2,fromdate)"])); + +#NOTE: ignoring: if ($limits->{'views'}) +$dbh->do("drop view elim_current_facts"); +$dbh->do("create view elim_current_facts as $select_current_full_facts"); + +if ($opt_lock_tables) +{ + do_query($dbh,"LOCK TABLES elim_facts, elim_attr1, elim_attr2 WRITE"); +} + +if ($opt_fast && defined($server->{vacuum})) +{ + $server->vacuum(1,\$dbh); +} + +#### +#### Fill the facts table +#### +$n_facts= $opt_loop_count; + +if ($opt_fast && $server->{transactions}) +{ + $dbh->{AutoCommit} = 0; +} + +print "Inserting $n_facts rows into facts table\n"; +$loop_time=new Benchmark; + +$query="insert into elim_facts values ("; +for ($id=0; $id < $n_facts ; $id++) +{ + do_query($dbh,"$query $id)"); +} + +if ($opt_fast && $server->{transactions}) +{ + $dbh->commit; + $dbh->{AutoCommit} = 1; +} + +$end_time=new Benchmark; +print "Time to insert ($n_facts): " . + timestr(timediff($end_time, $loop_time),"all") . "\n\n"; + +#### +#### Fill attr1 table +#### +if ($opt_fast && $server->{transactions}) +{ + $dbh->{AutoCommit} = 0; +} + +print "Inserting $n_facts rows into attr1 table\n"; +$loop_time=new Benchmark; + +$query="insert into elim_attr1 values ("; +for ($id=0; $id < $n_facts ; $id++) +{ + $attr1= ceil(rand($n_facts)); + do_query($dbh,"$query $id, $attr1)"); +} + +if ($opt_fast && $server->{transactions}) +{ + $dbh->commit; + $dbh->{AutoCommit} = 1; +} + +$end_time=new Benchmark; +print "Time to insert ($n_facts): " . + timestr(timediff($end_time, $loop_time),"all") . 
"\n\n"; + +#### +#### Fill attr2 table +#### +if ($opt_fast && $server->{transactions}) +{ + $dbh->{AutoCommit} = 0; +} + +print "Inserting $n_facts rows into attr2 table\n"; +$loop_time=new Benchmark; + +for ($id=0; $id < $n_facts ; $id++) +{ + # Two values for each $id - current one and obsolete one. + $attr1= ceil(rand($n_facts)); + $query="insert into elim_attr2 values ($id, $attr1, now())"; + do_query($dbh,$query); + $query="insert into elim_attr2 values ($id, $attr1, '2009-01-01')"; + do_query($dbh,$query); +} + +if ($opt_fast && $server->{transactions}) +{ + $dbh->commit; + $dbh->{AutoCommit} = 1; +} + +$end_time=new Benchmark; +print "Time to insert ($n_facts): " . + timestr(timediff($end_time, $loop_time),"all") . "\n\n"; + +#### +#### Finalize the database population +#### + +if ($opt_lock_tables) +{ + do_query($dbh,"UNLOCK TABLES"); +} + +if ($opt_fast && defined($server->{vacuum})) +{ + $server->vacuum(0,\$dbh,["elim_facts", "elim_attr1", "elim_attr2"]); +} + +if ($opt_lock_tables) +{ + do_query($dbh,"LOCK TABLES elim_facts, elim_attr1, elim_attr2 WRITE"); +} + +#### +#### Do some selects on the table +#### + +select_test: + +# +# The selects will be: +# - N pk-lookups with all attributes +# - pk-attribute-based lookup +# - latest-attribute value based lookup. + + +### +### Bare facts select: +### +print "testing bare facts facts table\n"; +$loop_time=new Benchmark; +$rows=0; +for ($i=0 ; $i < $opt_medium_loop_count ; $i++) +{ + $val= ceil(rand($n_facts)); + $rows+=fetch_all_rows($dbh,"select * from elim_facts where id=$val"); +} +$count=$i; + +$end_time=new Benchmark; +print "time for select_bare_facts ($count:$rows): " . + timestr(timediff($end_time, $loop_time),"all") . 
"\n"; + + +### +### Full facts select, no elimination: +### +print "testing full facts facts table\n"; +$loop_time=new Benchmark; +$rows=0; +for ($i=0 ; $i < $opt_medium_loop_count ; $i++) +{ + $val= rand($n_facts); + $rows+=fetch_all_rows($dbh,"select * from elim_current_facts where id=$val"); +} +$count=$i; + +$end_time=new Benchmark; +print "time for select_two_attributes ($count:$rows): " . + timestr(timediff($end_time, $loop_time),"all") . "\n"; + +### +### Now with elimination: select only only one fact +### +print "testing selection of one attribute\n"; +$loop_time=new Benchmark; +$rows=0; +for ($i=0 ; $i < $opt_medium_loop_count ; $i++) +{ + $val= rand($n_facts); + $rows+=fetch_all_rows($dbh,"select id, attr1 from elim_current_facts where id=$val"); +} +$count=$i; + +$end_time=new Benchmark; +print "time for select_one_attribute ($count:$rows): " . + timestr(timediff($end_time, $loop_time),"all") . "\n"; + +### +### Now with elimination: select only only one fact +### +print "testing selection of one attribute\n"; +$loop_time=new Benchmark; +$rows=0; +for ($i=0 ; $i < $opt_medium_loop_count ; $i++) +{ + $val= rand($n_facts); + $rows+=fetch_all_rows($dbh,"select id, attr2 from elim_current_facts where id=$val"); +} +$count=$i; + +$end_time=new Benchmark; +print "time for select_one_attribute ($count:$rows): " . + timestr(timediff($end_time, $loop_time),"all") . "\n"; + + +; + +#### +#### End of benchmark +#### + +if ($opt_lock_tables) +{ + do_query($dbh,"UNLOCK TABLES"); +} +if (!$opt_skip_delete) +{ + do_query($dbh,"drop table elim_facts, elim_attr1, elim_attr2" . 
$server->{'drop_attr'}); +} + +if ($opt_fast && defined($server->{vacuum})) +{ + $server->vacuum(0,\$dbh); +} + +$dbh->disconnect; # close connection + +end_benchmark($start_time); + diff --git a/sql/CMakeLists.txt b/sql/CMakeLists.txt index 7ba0a4cceb2..58ba26782f9 100755 --- a/sql/CMakeLists.txt +++ b/sql/CMakeLists.txt @@ -25,7 +25,7 @@ INCLUDE_DIRECTORIES(${CMAKE_SOURCE_DIR}/include ${CMAKE_SOURCE_DIR}/sql ${CMAKE_SOURCE_DIR}/regex ${CMAKE_SOURCE_DIR}/zlib - ${CMAKE_SOURCE_DIR}/extra/libevent + ${CMAKE_SOURCE_DIR}/extra/libevent ) SET_SOURCE_FILES_PROPERTIES(${CMAKE_SOURCE_DIR}/sql/sql_yacc.h @@ -75,7 +75,7 @@ SET (SQL_SOURCE partition_info.cc rpl_utility.cc rpl_injector.cc sql_locale.cc rpl_rli.cc rpl_mi.cc sql_servers.cc sql_connect.cc scheduler.cc - sql_profile.cc event_parse_data.cc + sql_profile.cc event_parse_data.cc opt_table_elimination.cc ${PROJECT_SOURCE_DIR}/sql/sql_yacc.cc ${PROJECT_SOURCE_DIR}/sql/sql_yacc.h ${PROJECT_SOURCE_DIR}/include/mysqld_error.h @@ -129,7 +129,7 @@ ADD_CUSTOM_COMMAND( # Gen_lex_hash ADD_EXECUTABLE(gen_lex_hash gen_lex_hash.cc) -TARGET_LINK_LIBRARIES(gen_lex_hash debug dbug mysqlclient wsock32) +TARGET_LINK_LIBRARIES(gen_lex_hash debug dbug mysqlclient strings wsock32) GET_TARGET_PROPERTY(GEN_LEX_HASH_EXE gen_lex_hash LOCATION) ADD_CUSTOM_COMMAND( OUTPUT ${PROJECT_SOURCE_DIR}/sql/lex_hash.h diff --git a/sql/Makefile.am b/sql/Makefile.am index 937acc95403..00342f7034e 100644 --- a/sql/Makefile.am +++ b/sql/Makefile.am @@ -61,7 +61,7 @@ noinst_HEADERS = item.h item_func.h item_sum.h item_cmpfunc.h \ ha_partition.h rpl_constants.h \ opt_range.h protocol.h rpl_tblmap.h rpl_utility.h \ rpl_reporting.h \ - log.h sql_show.h rpl_rli.h rpl_mi.h \ + log.h log_slow.h sql_show.h rpl_rli.h rpl_mi.h \ sql_select.h structs.h table.h sql_udf.h hash_filo.h \ lex.h lex_symbol.h sql_acl.h sql_crypt.h \ sql_repl.h slave.h rpl_filter.h rpl_injector.h \ @@ -121,7 +121,8 @@ mysqld_SOURCES = sql_lex.cc sql_handler.cc sql_partition.cc \ 
event_queue.cc event_db_repository.cc events.cc \ sql_plugin.cc sql_binlog.cc \ sql_builtin.cc sql_tablespace.cc partition_info.cc \ - sql_servers.cc event_parse_data.cc + sql_servers.cc event_parse_data.cc \ + opt_table_elimination.cc nodist_mysqld_SOURCES = mini_client_errors.c pack.c client.c my_time.c my_user.c diff --git a/sql/event_data_objects.cc b/sql/event_data_objects.cc index d11aa67ac65..331b437b7ff 100644 --- a/sql/event_data_objects.cc +++ b/sql/event_data_objects.cc @@ -1450,8 +1450,7 @@ Event_job_data::execute(THD *thd, bool drop) DBUG_ASSERT(sphead); - if (thd->enable_slow_log) - sphead->m_flags|= sp_head::LOG_SLOW_STATEMENTS; + sphead->m_flags|= sp_head::LOG_SLOW_STATEMENTS; sphead->m_flags|= sp_head::LOG_GENERAL_LOG; sphead->set_info(0, 0, &thd->lex->sp_chistics, sql_mode); diff --git a/sql/events.cc b/sql/events.cc index 17b00490495..c4c00e09b4a 100644 --- a/sql/events.cc +++ b/sql/events.cc @@ -697,8 +697,7 @@ send_show_create_event(THD *thd, Event_timed *et, Protocol *protocol) field_list.push_back(new Item_empty_string("Event", NAME_CHAR_LEN)); - if (sys_var_thd_sql_mode::symbolic_mode_representation(thd, et->sql_mode, - &sql_mode)) + if (sys_var::make_set(thd, et->sql_mode, &sql_mode_typelib, &sql_mode)) DBUG_RETURN(TRUE); field_list.push_back(new Item_empty_string("sql_mode", (uint) sql_mode.length)); diff --git a/sql/filesort.cc b/sql/filesort.cc index f5fe17e6ff1..552ea27970f 100644 --- a/sql/filesort.cc +++ b/sql/filesort.cc @@ -188,6 +188,7 @@ ha_rows filesort(THD *thd, TABLE *table, SORT_FIELD *sortorder, uint s_length, { status_var_increment(thd->status_var.filesort_scan_count); } + thd->query_plan_flags|= QPLAN_FILESORT; #ifdef CAN_TRUST_RANGE if (select && select->quick && select->quick->records > 0L) { @@ -253,6 +254,7 @@ ha_rows filesort(THD *thd, TABLE *table, SORT_FIELD *sortorder, uint s_length, } else { + thd->query_plan_flags|= QPLAN_FILESORT_DISK; if (table_sort.buffpek && table_sort.buffpek_len < maxbuffer) { 
x_free(table_sort.buffpek); @@ -1199,6 +1201,7 @@ int merge_buffers(SORTPARAM *param, IO_CACHE *from_file, DBUG_ENTER("merge_buffers"); status_var_increment(current_thd->status_var.filesort_merge_passes); + current_thd->query_plan_fsort_passes++; if (param->not_killable) { killed= &not_killable; diff --git a/sql/item.cc b/sql/item.cc index ec42b2b5886..b2ab28a77fd 100644 --- a/sql/item.cc +++ b/sql/item.cc @@ -1922,6 +1922,15 @@ void Item_field::reset_field(Field *f) name= (char*) f->field_name; } + +bool Item_field::enumerate_field_refs_processor(uchar *arg) +{ + Field_enumerator *fe= (Field_enumerator*)arg; + fe->visit_field(field); + return FALSE; +} + + const char *Item_ident::full_name() const { char *tmp; @@ -3599,7 +3608,7 @@ static void mark_as_dependent(THD *thd, SELECT_LEX *last, SELECT_LEX *current, /* store pointer on SELECT_LEX from which item is dependent */ if (mark_item) mark_item->depended_from= last; - current->mark_as_dependent(last); + current->mark_as_dependent(last, resolved_item); if (thd->lex->describe & DESCRIBE_EXTENDED) { push_warning_printf(thd, MYSQL_ERROR::WARN_LEVEL_NOTE, diff --git a/sql/item.h b/sql/item.h index 2a9e5b00add..74c4ca701f7 100644 --- a/sql/item.h +++ b/sql/item.h @@ -731,7 +731,11 @@ public: virtual bool val_bool_result() { return val_bool(); } virtual bool is_null_result() { return is_null(); } - /* bit map of tables used by item */ + /* + Bitmap of tables used by item + (note: if you need to check dependencies on individual columns, check out + class Field_enumerator) + */ virtual table_map used_tables() const { return (table_map) 0L; } /* Return table map of tables that can't be NULL tables (tables that are @@ -888,6 +892,8 @@ public: virtual bool reset_query_id_processor(uchar *query_id_arg) { return 0; } virtual bool is_expensive_processor(uchar *arg) { return 0; } virtual bool register_field_in_read_map(uchar *arg) { return 0; } + virtual bool enumerate_field_refs_processor(uchar *arg) { return 0; } + virtual bool
mark_as_eliminated_processor(uchar *arg) { return 0; } /* Check if a partition function is allowed SYNOPSIS @@ -1012,6 +1018,29 @@ public: }; +/* + Class to be used to enumerate all field references in an item tree. + Suggested usage: + + class My_enumerator : public Field_enumerator + { + virtual void visit_field() { ... your actions ...} + } + + My_enumerator enumerator; + item->walk(Item::enumerate_field_refs_processor, ...,(uchar*)&enumerator); + + This is similar to Visitor pattern. +*/ + +class Field_enumerator +{ +public: + virtual void visit_field(Field *field)= 0; + virtual ~Field_enumerator() {}; /* purecov: inspected */ +}; + + class sp_head; @@ -1477,6 +1506,7 @@ public: bool find_item_in_field_list_processor(uchar *arg); bool register_field_in_read_map(uchar *arg); bool check_partition_func_processor(uchar *int_arg) {return FALSE;} + bool enumerate_field_refs_processor(uchar *arg); void cleanup(); bool result_as_longlong() { @@ -2203,6 +2233,10 @@ public: if (!depended_from) (*ref)->update_used_tables(); } + bool const_item() const + { + return (*ref)->const_item(); + } table_map not_null_tables() const { return (*ref)->not_null_tables(); } void set_result_field(Field *field) { result_field= field; } bool is_result_field() { return 1; } diff --git a/sql/item_cmpfunc.cc b/sql/item_cmpfunc.cc index bb391187fb9..b2e7be5ef09 100644 --- a/sql/item_cmpfunc.cc +++ b/sql/item_cmpfunc.cc @@ -5168,33 +5168,7 @@ void Item_equal::merge(Item_equal *item) void Item_equal::sort(Item_field_cmpfunc cmp, void *arg) { - bool swap; - List_iterator<Item_field> it(fields); - do - { - Item_field *item1= it++; - Item_field **ref1= it.ref(); - Item_field *item2; - - swap= FALSE; - while ((item2= it++)) - { - Item_field **ref2= it.ref(); - if (cmp(item1, item2, arg) < 0) - { - Item_field *item= *ref1; - *ref1= *ref2; - *ref2= item; - swap= TRUE; - } - else - { - item1= item2; - ref1= ref2; - } - } - it.rewind(); - } while (swap); + exchange_sort<Item_field>(&fields, cmp, arg); 
} diff --git a/sql/item_cmpfunc.h b/sql/item_cmpfunc.h index c2227fa04e0..23f505182dd 100644 --- a/sql/item_cmpfunc.h +++ b/sql/item_cmpfunc.h @@ -1578,6 +1578,7 @@ public: uint members(); bool contains(Field *field); Item_field* get_first() { return fields.head(); } + uint n_fields() { return fields.elements; } void merge(Item_equal *item); void update_const(); enum Functype functype() const { return MULT_EQUAL_FUNC; } diff --git a/sql/item_subselect.cc b/sql/item_subselect.cc index 6fa23b77f0d..7ee22fb3c1c 100644 --- a/sql/item_subselect.cc +++ b/sql/item_subselect.cc @@ -39,7 +39,7 @@ inline Item * and_items(Item* cond, Item *item) Item_subselect::Item_subselect(): Item_result_field(), value_assigned(0), thd(0), substitution(0), engine(0), old_engine(0), used_tables_cache(0), have_to_be_excluded(0), - const_item_cache(1), engine_changed(0), changed(0), is_correlated(FALSE) + const_item_cache(1), in_fix_fields(0), engine_changed(0), changed(0), is_correlated(FALSE) { with_subselect= 1; reset(); @@ -151,10 +151,14 @@ bool Item_subselect::fix_fields(THD *thd_param, Item **ref) DBUG_ASSERT(fixed == 0); engine->set_thd((thd= thd_param)); + if (!in_fix_fields) + refers_to.empty(); + eliminated= FALSE; if (check_stack_overrun(thd, STACK_MIN_SIZE, (uchar*)&res)) return TRUE; - + + in_fix_fields++; res= engine->prepare(); // all transformation is done (used by prepared statements) @@ -181,12 +185,14 @@ bool Item_subselect::fix_fields(THD *thd_param, Item **ref) if (!(*ref)->fixed) ret= (*ref)->fix_fields(thd, ref); thd->where= save_where; + in_fix_fields--; return ret; } // Is it one field subselect? 
if (engine->cols() > max_columns) { my_error(ER_OPERAND_COLUMNS, MYF(0), 1); + in_fix_fields--; return TRUE; } fix_length_and_dec(); @@ -203,11 +209,30 @@ bool Item_subselect::fix_fields(THD *thd_param, Item **ref) fixed= 1; err: + in_fix_fields--; thd->where= save_where; return res; } +bool Item_subselect::enumerate_field_refs_processor(uchar *arg) +{ + List_iterator<Item> it(refers_to); + Item *item; + while ((item= it++)) + { + if (item->walk(&Item::enumerate_field_refs_processor, FALSE, arg)) + return TRUE; + } + return FALSE; +} + +bool Item_subselect::mark_as_eliminated_processor(uchar *arg) +{ + eliminated= TRUE; + return FALSE; +} + bool Item_subselect::walk(Item_processor processor, bool walk_subquery, uchar *argument) { @@ -225,6 +250,7 @@ bool Item_subselect::walk(Item_processor processor, bool walk_subquery, if (lex->having && (lex->having)->walk(processor, walk_subquery, argument)) return 1; + /* TODO: why does this walk WHERE/HAVING but not ON expressions of outer joins? */ while ((item=li++)) { diff --git a/sql/item_subselect.h b/sql/item_subselect.h index d4aa621c083..19d58c65259 100644 --- a/sql/item_subselect.h +++ b/sql/item_subselect.h @@ -52,8 +52,16 @@ protected: bool have_to_be_excluded; /* cache of constant state */ bool const_item_cache; - + public: + /* + References from inside the subquery to the select that this predicate is + in. References to parent selects not included. + */ + List<Item> refers_to; + int in_fix_fields; + bool eliminated; + /* changed engine indicator */ bool engine_changed; /* subquery is transformed */ @@ -126,6 +134,8 @@ public: virtual void reset_value_registration() {} enum_parsing_place place() { return parsing_place; } bool walk(Item_processor processor, bool walk_subquery, uchar *arg); + bool mark_as_eliminated_processor(uchar *arg); + bool enumerate_field_refs_processor(uchar *arg); /** Get the SELECT_LEX structure associated with this Item. 
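Note on the Field_enumerator interface added in sql/item.h and wired into Item_field and Item_subselect by the hunks above: the idea is that a caller subclasses Field_enumerator, and the item tree reports every field reference it contains through visit_field(). The following is a minimal, self-contained sketch of that pattern, not the server code. Field, Item, Item_field and Item_func here are simplified stand-ins, and enumerate_fields(), Table_collector and tables_referenced() are hypothetical names used only for illustration; the real code goes through Item::walk() with enumerate_field_refs_processor and a uchar* argument.

```cpp
// Simplified stand-ins for illustration only. These are NOT the server's
// Field/Item classes, just enough structure to show the enumerator idea.
#include <set>
#include <string>
#include <utility>
#include <vector>

struct Field
{
  std::string table;   // name of the table the column belongs to
  std::string name;    // column name
};

// Same shape as the interface added in sql/item.h.
class Field_enumerator
{
public:
  virtual void visit_field(Field *field)= 0;
  virtual ~Field_enumerator() {}
};

class Item
{
public:
  virtual ~Item() {}
  // Hypothetical replacement for Item::walk(enumerate_field_refs_processor):
  // report every field referenced by this item to the enumerator.
  virtual void enumerate_fields(Field_enumerator *fe) { (void) fe; }
};

// Leaf node: a reference to one column.
class Item_field : public Item
{
  Field *field;
public:
  explicit Item_field(Field *f) : field(f) {}
  void enumerate_fields(Field_enumerator *fe) override
  {
    fe->visit_field(field);
  }
};

// Inner node: a function or operator over sub-expressions.
class Item_func : public Item
{
  std::vector<Item*> args;
public:
  explicit Item_func(std::vector<Item*> a) : args(std::move(a)) {}
  void enumerate_fields(Field_enumerator *fe) override
  {
    for (Item *arg : args)
      arg->enumerate_fields(fe);
  }
};

// Example enumerator: collect the set of tables an expression depends on.
class Table_collector : public Field_enumerator
{
public:
  std::set<std::string> tables;
  void visit_field(Field *field) override { tables.insert(field->table); }
};

// Convenience helper (hypothetical): which tables does expr refer to?
std::set<std::string> tables_referenced(Item *expr)
{
  Table_collector collector;
  expr->enumerate_fields(&collector);
  return collector.tables;
}
```

For an expression such as (t1.a = t2.a) AND t2.b built from these stand-ins, tables_referenced() would report t1 and t2. A per-column dependency check of this kind is what the new comment on Item::used_tables() points readers to Field_enumerator for.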
diff --git a/sql/item_sum.cc b/sql/item_sum.cc index 97779b6a2b7..ab2da503209 100644 --- a/sql/item_sum.cc +++ b/sql/item_sum.cc @@ -350,7 +350,7 @@ bool Item_sum::register_sum_func(THD *thd, Item **ref) sl= sl->master_unit()->outer_select() ) sl->master_unit()->item->with_sum_func= 1; } - thd->lex->current_select->mark_as_dependent(aggr_sel); + thd->lex->current_select->mark_as_dependent(aggr_sel, NULL); return FALSE; } @@ -542,11 +542,6 @@ void Item_sum::update_used_tables () args[i]->update_used_tables(); used_tables_cache|= args[i]->used_tables(); } - - used_tables_cache&= PSEUDO_TABLE_BITS; - - /* the aggregate function is aggregated into its local context */ - used_tables_cache |= (1 << aggr_sel->join->tables) - 1; } } diff --git a/sql/item_sum.h b/sql/item_sum.h index d991327d847..e884452d6e6 100644 --- a/sql/item_sum.h +++ b/sql/item_sum.h @@ -255,6 +255,12 @@ protected: */ Item **orig_args, *tmp_orig_args[2]; table_map used_tables_cache; + + /* + TRUE <=> We've managed to calculate the value of this Item in + opt_sum_query(), hence it can be considered constant at all subsequent + steps. + */ bool forced_const; public: @@ -341,6 +347,15 @@ public: virtual const char *func_name() const= 0; virtual Item *result_item(Field *field) { return new Item_field(field); } + /* + Return bitmap of tables that are needed to evaluate the item. + + The implementation takes into account the used strategy: items resolved + at optimization phase will report 0. + Items that depend on the number of join output records, but not columns + of any particular table (like COUNT(*)) will report 0 from used_tables(), + but will still return false from const_item(). 
+ */ table_map used_tables() const { return used_tables_cache; } void update_used_tables (); void cleanup() diff --git a/sql/log.cc b/sql/log.cc index 58d00819cf7..ea71e6caefd 100644 --- a/sql/log.cc +++ b/sql/log.cc @@ -983,7 +983,7 @@ bool LOGGER::slow_log_print(THD *thd, const char *query, uint query_length, /* fill in user_host value: the format is "%s[%s] @ %s [%s]" */ user_host_len= (strxnmov(user_host_buff, MAX_USER_HOST_SIZE, sctx->priv_user ? sctx->priv_user : "", "[", - sctx->user ? sctx->user : "", "] @ ", + sctx->user ? sctx->user : (thd->slave_thread ? "SQL_SLAVE" : ""), "] @ ", sctx->host ? sctx->host : "", " [", sctx->ip ? sctx->ip : "", "]", NullS) - user_host_buff); @@ -1006,6 +1006,17 @@ bool LOGGER::slow_log_print(THD *thd, const char *query, uint query_length, query_length= command_name[thd->command].length; } + if (!query_length) + { + /* + Not a real query; Reset counts for slow query logging + (QQ: Wonder if this is really needed) + */ + thd->sent_row_count= thd->examined_row_count= 0; + thd->query_plan_flags= QPLAN_INIT; + thd->query_plan_fsort_passes= 0; + } + for (current_handler= slow_log_handler_list; *current_handler ;) error= (*current_handler++)->log_slow(thd, current_time, thd->start_time, user_host_buff, user_host_len, @@ -2295,19 +2306,39 @@ bool MYSQL_QUERY_LOG::write(THD *thd, time_t current_time, if (my_b_write(&log_file, (uchar*) "\n", 1)) tmp_errno= errno; } + /* For slow query log */ sprintf(query_time_buff, "%.6f", ulonglong2double(query_utime)/1000000.0); sprintf(lock_time_buff, "%.6f", ulonglong2double(lock_utime)/1000000.0); if (my_b_printf(&log_file, - "# Query_time: %s Lock_time: %s" - " Rows_sent: %lu Rows_examined: %lu\n", + "# Thread_id: %lu Schema: %s QC_hit: %s\n" \ + "# Query_time: %s Lock_time: %s Rows_sent: %lu Rows_examined: %lu\n", + (ulong) thd->thread_id, (thd->db ? thd->db : ""), + ((thd->query_plan_flags & QPLAN_QC) ? 
"Yes" : "No"), query_time_buff, lock_time_buff, (ulong) thd->sent_row_count, - (ulong) thd->examined_row_count) == (uint) -1) + (ulong) thd->examined_row_count) == (size_t) -1) tmp_errno= errno; + if ((thd->variables.log_slow_verbosity & LOG_SLOW_VERBOSITY_QUERY_PLAN) && + (thd->query_plan_flags & + (QPLAN_FULL_SCAN | QPLAN_FULL_JOIN | QPLAN_TMP_TABLE | + QPLAN_TMP_DISK | QPLAN_FILESORT | QPLAN_FILESORT_DISK)) && + my_b_printf(&log_file, + "# Full_scan: %s Full_join: %s " + "Tmp_table: %s Tmp_table_on_disk: %s\n" + "# Filesort: %s Filesort_on_disk: %s Merge_passes: %lu\n", + ((thd->query_plan_flags & QPLAN_FULL_SCAN) ? "Yes" : "No"), + ((thd->query_plan_flags & QPLAN_FULL_JOIN) ? "Yes" : "No"), + ((thd->query_plan_flags & QPLAN_TMP_TABLE) ? "Yes" : "No"), + ((thd->query_plan_flags & QPLAN_TMP_DISK) ? "Yes" : "No"), + ((thd->query_plan_flags & QPLAN_FILESORT) ? "Yes" : "No"), + ((thd->query_plan_flags & QPLAN_FILESORT_DISK) ? + "Yes" : "No"), + thd->query_plan_fsort_passes) == (size_t) -1) + tmp_errno= errno; if (thd->db && strcmp(thd->db, db)) { // Database changed - if (my_b_printf(&log_file,"use %s;\n",thd->db) == (uint) -1) + if (my_b_printf(&log_file,"use %s;\n",thd->db) == (size_t) -1) tmp_errno= errno; strmov(db,thd->db); } diff --git a/sql/log_slow.h b/sql/log_slow.h new file mode 100644 index 00000000000..5559c002fde --- /dev/null +++ b/sql/log_slow.h @@ -0,0 +1,107 @@ +/* Copyright (C) 2009 Monty Program Ab + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; version 2 or later of the License. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. 
+ + You should have received a copy of the GNU General Public License + along with this program; if not, write to the Free Software + Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ + +/* Defining what to log to slow log */ + +#define LOG_SLOW_VERBOSITY_INIT 0 +#define LOG_SLOW_VERBOSITY_INNODB 1 << 0 +#define LOG_SLOW_VERBOSITY_QUERY_PLAN 1 << 1 + +#ifdef DEFINE_VARIABLES_LOG_SLOW + +/* Names here must be in the same order as the bits above */ +static const char *log_slow_verbosity_names[]= +{ + "innodb","query_plan", + NullS +}; + +static const unsigned int log_slow_verbosity_names_len[]= +{ + sizeof("innodb") -1, + sizeof("query_plan")-1 +}; + +TYPELIB log_slow_verbosity_typelib= +{ array_elements(log_slow_verbosity_names)-1,"", log_slow_verbosity_names, + (unsigned int *) log_slow_verbosity_names_len }; + +#else +extern TYPELIB log_slow_verbosity_typelib; +#endif /* DEFINE_VARIABLES_LOG_SLOW */ + +/* Defines for what kind of query plan was used and what to log */ + +/* + We init the used query plan with a bit that is always set and all 'no' bits + to enable easy testing of what to log in sql_log.cc +*/ +#define QPLAN_INIT (QPLAN_ALWAYS_SET | QPLAN_QC_NO) + +#define QPLAN_ADMIN 1 << 0 +#define QPLAN_FILESORT 1 << 1 +#define QPLAN_FILESORT_DISK 1 << 2 +#define QPLAN_FULL_JOIN 1 << 3 +#define QPLAN_FULL_SCAN 1 << 4 +#define QPLAN_QC 1 << 5 +#define QPLAN_QC_NO 1 << 6 +#define QPLAN_TMP_DISK 1 << 7 +#define QPLAN_TMP_TABLE 1 << 8 +/* ... 
*/ +#define QPLAN_MAX ((ulong) 1) << 31 /* reserved as placeholder */ +#define QPLAN_ALWAYS_SET QPLAN_MAX +#define QPLAN_VISIBLE_MASK (~(QPLAN_ALWAYS_SET)) + +#ifdef DEFINE_VARIABLES_LOG_SLOW +/* Names here must be in same order as the bit's above */ +static const char *log_slow_filter_names[]= +{ + "admin", + "filesort", + "filesort_on_disk", + "full_join", + "full_scan", + "query_cache", + "query_cache_miss", + "tmp_table", + "tmp_table_on_disk", + NullS +}; + +static const unsigned int log_slow_filter_names_len[]= +{ + sizeof("admin")-1, + sizeof("filesort")-1, + sizeof("filesort_on_disk")-1, + sizeof("full_join")-1, + sizeof("full_scan")-1, + sizeof("query_cache")-1, + sizeof("query_cache_miss")-1, + sizeof("tmp_table")-1, + sizeof("tmp_table_on_disk")-1 +}; + +TYPELIB log_slow_filter_typelib= +{ array_elements(log_slow_filter_names)-1,"", log_slow_filter_names, + (unsigned int *) log_slow_filter_names_len }; + +#else +extern TYPELIB log_slow_filter_typelib; +#endif /* DEFINE_VARIABLES_LOG_SLOW */ + +static inline ulong fix_log_slow_filter(ulong org_filter) +{ + return org_filter ? 
org_filter : QPLAN_ALWAYS_SET; +} diff --git a/sql/mysql_priv.h b/sql/mysql_priv.h index 3c46ba7ea3a..63d5621742e 100644 --- a/sql/mysql_priv.h +++ b/sql/mysql_priv.h @@ -43,6 +43,7 @@ #include "sql_array.h" #include "sql_plugin.h" #include "scheduler.h" +#include "log_slow.h" class Parser_state; @@ -535,14 +536,27 @@ protected: #define OPTIMIZER_SWITCH_INDEX_MERGE_UNION 2 #define OPTIMIZER_SWITCH_INDEX_MERGE_SORT_UNION 4 #define OPTIMIZER_SWITCH_INDEX_MERGE_INTERSECT 8 -#define OPTIMIZER_SWITCH_LAST 16 -/* The following must be kept in sync with optimizer_switch_str in mysqld.cc */ -#define OPTIMIZER_SWITCH_DEFAULT (OPTIMIZER_SWITCH_INDEX_MERGE | \ - OPTIMIZER_SWITCH_INDEX_MERGE_UNION | \ - OPTIMIZER_SWITCH_INDEX_MERGE_SORT_UNION | \ - OPTIMIZER_SWITCH_INDEX_MERGE_INTERSECT) +#ifdef DBUG_OFF +# define OPTIMIZER_SWITCH_LAST 16 +#else +# define OPTIMIZER_SWITCH_TABLE_ELIMINATION 16 +# define OPTIMIZER_SWITCH_LAST 32 +#endif +#ifdef DBUG_OFF +/* The following must be kept in sync with optimizer_switch_str in mysqld.cc */ +# define OPTIMIZER_SWITCH_DEFAULT (OPTIMIZER_SWITCH_INDEX_MERGE | \ + OPTIMIZER_SWITCH_INDEX_MERGE_UNION | \ + OPTIMIZER_SWITCH_INDEX_MERGE_SORT_UNION | \ + OPTIMIZER_SWITCH_INDEX_MERGE_INTERSECT) +#else +# define OPTIMIZER_SWITCH_DEFAULT (OPTIMIZER_SWITCH_INDEX_MERGE | \ + OPTIMIZER_SWITCH_INDEX_MERGE_UNION | \ + OPTIMIZER_SWITCH_INDEX_MERGE_SORT_UNION | \ + OPTIMIZER_SWITCH_INDEX_MERGE_INTERSECT | \ + OPTIMIZER_SWITCH_TABLE_ELIMINATION) +#endif /* Replication uses 8 bytes to store SQL_MODE in the binary log. 
The day you diff --git a/sql/mysqld.cc b/sql/mysqld.cc index 0cb8544b880..8988ad0671b 100644 --- a/sql/mysqld.cc +++ b/sql/mysqld.cc @@ -13,6 +13,7 @@ along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ +#define DEFINE_VARIABLES_LOG_SLOW // Declare variables in log_slow.h #include "mysql_priv.h" #include <m_ctype.h> #include <my_dir.h> @@ -297,9 +298,14 @@ TYPELIB sql_mode_typelib= { array_elements(sql_mode_names)-1,"", static const char *optimizer_switch_names[]= { - "index_merge","index_merge_union","index_merge_sort_union", - "index_merge_intersection", "default", NullS + "index_merge","index_merge_union","index_merge_sort_union", + "index_merge_intersection", +#ifndef DBUG_OFF + "table_elimination", +#endif + "default", NullS }; + /* Corresponding defines are named OPTIMIZER_SWITCH_XXX */ static const unsigned int optimizer_switch_names_len[]= { @@ -307,6 +313,9 @@ static const unsigned int optimizer_switch_names_len[]= sizeof("index_merge_union") - 1, sizeof("index_merge_sort_union") - 1, sizeof("index_merge_intersection") - 1, +#ifndef DBUG_OFF + sizeof("table_elimination") - 1, +#endif sizeof("default") - 1 }; TYPELIB optimizer_switch_typelib= { array_elements(optimizer_switch_names)-1,"", @@ -382,7 +391,12 @@ static const char *sql_mode_str= "OFF"; /* Text representation for OPTIMIZER_SWITCH_DEFAULT */ static const char *optimizer_switch_str="index_merge=on,index_merge_union=on," "index_merge_sort_union=on," - "index_merge_intersection=on"; + "index_merge_intersection=on" +#ifndef DBUG_OFF + ",table_elimination=on"; +#else + ; +#endif static char *mysqld_user, *mysqld_chroot, *log_error_file_ptr; static char *opt_init_slave, *language_ptr, *opt_init_connect; static char *default_character_set_name; @@ -1020,6 +1034,7 @@ static void close_connections(void) } +#ifdef HAVE_CLOSE_SERVER_SOCK static void close_socket(my_socket sock, const char *info) { DBUG_ENTER("close_socket"); 
@@ -1039,6 +1054,7 @@ static void close_socket(my_socket sock, const char *info) } DBUG_VOID_RETURN; } +#endif static void close_server_sock() @@ -5793,6 +5809,9 @@ enum options_mysqld OPT_DEADLOCK_SEARCH_DEPTH_LONG, OPT_DEADLOCK_TIMEOUT_SHORT, OPT_DEADLOCK_TIMEOUT_LONG, + OPT_LOG_SLOW_RATE_LIMIT, + OPT_LOG_SLOW_VERBOSITY, + OPT_LOG_SLOW_FILTER, OPT_GENERAL_LOG_FILE, OPT_SLOW_QUERY_LOG_FILE, OPT_IGNORE_BUILTIN_INNODB @@ -6135,7 +6154,7 @@ Disable with --skip-large-pages.", (uchar**) &opt_log_slave_updates, (uchar**) &opt_log_slave_updates, 0, GET_BOOL, NO_ARG, 0, 0, 0, 0, 0, 0}, {"log-slow-admin-statements", OPT_LOG_SLOW_ADMIN_STATEMENTS, - "Log slow OPTIMIZE, ANALYZE, ALTER and other administrative statements to the slow log if it is open.", + "Log slow OPTIMIZE, ANALYZE, ALTER and other administrative statements to the slow log if it is open. . Please note that this option is deprecated; see --log-slow-filter for filtering slow query log output", (uchar**) &opt_log_slow_admin_statements, (uchar**) &opt_log_slow_admin_statements, 0, GET_BOOL, NO_ARG, 0, 0, 0, 0, 0, 0}, @@ -6144,15 +6163,15 @@ Disable with --skip-large-pages.", (uchar**) &opt_log_slow_slave_statements, (uchar**) &opt_log_slow_slave_statements, 0, GET_BOOL, NO_ARG, 0, 0, 0, 0, 0, 0}, - {"log_slow_queries", OPT_SLOW_QUERY_LOG, + {"log-slow-queries", OPT_SLOW_QUERY_LOG, "Log slow queries to a table or log file. Defaults logging to table " "mysql.slow_log or hostname-slow.log if --log-output=file is used. " "Must be enabled to activate other slow log options. " "(deprecated option, use --slow_query_log/--slow_query_log_file instead)", (uchar**) &opt_slow_logname, (uchar**) &opt_slow_logname, 0, GET_STR, OPT_ARG, 0, 0, 0, 0, 0, 0}, - {"slow_query_log_file", OPT_SLOW_QUERY_LOG_FILE, - "Log slow queries to given log file. Defaults logging to hostname-slow.log. Must be enabled to activate other slow log options.", + {"slow-query-log-file", OPT_SLOW_QUERY_LOG_FILE, + "Log slow queries to given log file. 
Defaults logging to hostname-slow.log.", (uchar**) &opt_slow_logname, (uchar**) &opt_slow_logname, 0, GET_STR, REQUIRED_ARG, 0, 0, 0, 0, 0, 0}, {"log-tc", OPT_LOG_TC, @@ -6772,11 +6791,31 @@ log and this option does nothing anymore.", (uchar**) 0, 0, (GET_ULONG | GET_ASK_ADDR) , REQUIRED_ARG, 100, 1, 100, 0, 1, 0}, + {"log-slow-filter", OPT_LOG_SLOW_FILTER, + "Log only the queries that followed certain execution plan. Multiple flags allowed in a comma-separated string. [admin, filesort, filesort_on_disk, full_join, full_scan, query_cache, query_cache_miss, tmp_table, tmp_table_on_disk]. Sets log-slow-admin-command to ON", + 0, 0, 0, GET_STR, REQUIRED_ARG, 0, 0, 0, QPLAN_ALWAYS_SET, 0, 0}, + {"log-slow-rate_limit", OPT_LOG_SLOW_RATE_LIMIT, + "If set, only write to slow log every 'log_slow_rate_limit' query (use this to reduce output on slow query log)", + (uchar**) &global_system_variables.log_slow_rate_limit, + (uchar**) &max_system_variables.log_slow_rate_limit, 0, GET_ULONG, + REQUIRED_ARG, 1, 1, ~0L, 0, 1L, 0}, + {"log-slow-verbosity", OPT_LOG_SLOW_VERBOSITY, + "Choose how verbose the messages to your slow log will be. Multiple flags allowed in a comma-separated string. [query_plan, innodb]", + 0, 0, 0, GET_STR, REQUIRED_ARG, 0, 0, 0, 0, 0, 0 }, + {"log-slow-file", OPT_SLOW_QUERY_LOG_FILE, + "Log slow queries to given log file. Defaults logging to hostname-slow.log", + (uchar**) &opt_slow_logname, (uchar**) &opt_slow_logname, 0, GET_STR, + REQUIRED_ARG, 0, 0, 0, 0, 0, 0}, {"long_query_time", OPT_LONG_QUERY_TIME, "Log all queries that have taken more than long_query_time seconds to execute to file. " "The argument will be treated as a decimal value with microsecond precission.", (uchar**) &long_query_time, (uchar**) &long_query_time, 0, GET_DOUBLE, REQUIRED_ARG, 10, 0, LONG_TIMEOUT, 0, 0, 0}, + {"log-slow-time", OPT_LONG_QUERY_TIME, + "Log all queries that have taken more than long_query_time seconds to execute to file. 
" + "The argument will be treated as a decimal value with microsecond precission.", + (uchar**) &long_query_time, (uchar**) &long_query_time, 0, GET_DOUBLE, + REQUIRED_ARG, 10, 0, LONG_TIMEOUT, 0, 0, 0}, {"lower_case_table_names", OPT_LOWER_CASE_TABLE_NAMES, "If set to 1 table names are stored in lowercase on disk and table names will be case-insensitive. Should be set to 2 if you are using a case insensitive file system", (uchar**) &lower_case_table_names, @@ -6963,8 +7002,11 @@ The minimum value for this variable is 4096.", 0, GET_ULONG, OPT_ARG, MAX_TABLES+1, 0, MAX_TABLES+2, 0, 1, 0}, {"optimizer_switch", OPT_OPTIMIZER_SWITCH, "optimizer_switch=option=val[,option=val...], where option={index_merge, " - "index_merge_union, index_merge_sort_union, index_merge_intersection} and " - "val={on, off, default}.", + "index_merge_union, index_merge_sort_union, index_merge_intersection" +#ifndef DBUG_OFF + ", table_elimination" +#endif + "} and val={on, off, default}.", (uchar**) &optimizer_switch_str, (uchar**) &optimizer_switch_str, 0, GET_STR, REQUIRED_ARG, /*OPTIMIZER_SWITCH_DEFAULT*/0, 0, 0, 0, 0, 0}, @@ -7877,6 +7919,9 @@ static int mysql_init_variables(void) global_system_variables.old_passwords= 0; global_system_variables.old_alter_table= 0; global_system_variables.binlog_format= BINLOG_FORMAT_UNSPEC; + global_system_variables.log_slow_verbosity= LOG_SLOW_VERBOSITY_INIT; + global_system_variables.log_slow_filter= QPLAN_ALWAYS_SET; + /* Default behavior for 4.1 and 5.0 is to treat NULL values as unequal when collecting index statistics for MyISAM tables. 
@@ -8231,7 +8276,7 @@ mysqld_get_one_option(int optid, } #endif /* HAVE_REPLICATION */ case (int) OPT_SLOW_QUERY_LOG: - WARN_DEPRECATED(NULL, "7.0", "--log_slow_queries", "'--slow_query_log'/'--slow_query_log_file'"); + WARN_DEPRECATED(NULL, "7.0", "--log_slow_queries", "'--slow_query_log'/'--log-slow-file'"); opt_slow_log= 1; break; #ifdef WITH_CSV_STORAGE_ENGINE @@ -8384,6 +8429,25 @@ mysqld_get_one_option(int optid, case OPT_BOOTSTRAP: opt_noacl=opt_bootstrap=1; break; + case OPT_LOG_SLOW_FILTER: + global_system_variables.log_slow_filter= + find_bit_type_or_exit(argument, &log_slow_filter_typelib, + opt->name, &error); + /* + If we are using filters, we set opt_log_slow_admin_statements to be + always true so we can maintain everything with filters + */ + opt_log_slow_admin_statements= 1; + if (error) + return 1; + break; + case OPT_LOG_SLOW_VERBOSITY: + global_system_variables.log_slow_verbosity= + find_bit_type_or_exit(argument, &log_slow_verbosity_typelib, + opt->name, &error); + if (error) + return 1; + break; case OPT_SERVER_ID: server_id_supplied = 1; break; @@ -8692,6 +8756,8 @@ static int get_options(int *argc,char **argv) /* Set global slave_exec_mode from its option */ fix_slave_exec_mode(OPT_GLOBAL); + global_system_variables.log_slow_filter= + fix_log_slow_filter(global_system_variables.log_slow_filter); #ifndef EMBEDDED_LIBRARY if (mysqld_chroot) set_root(mysqld_chroot); diff --git a/sql/opt_range.h b/sql/opt_range.h index 225af276e55..6a0234e62fe 100644 --- a/sql/opt_range.h +++ b/sql/opt_range.h @@ -38,6 +38,12 @@ typedef struct st_key_part { } KEY_PART; +/* + A "MIN_TUPLE < tbl.key_tuple < MAX_TUPLE" interval. + + One of endpoints may be absent. 'flags' member has flags which tell whether + the endpoints are '<' or '<='. 
+*/ class QUICK_RANGE :public Sql_alloc { public: uchar *min_key,*max_key; diff --git a/sql/opt_table_elimination.cc b/sql/opt_table_elimination.cc new file mode 100644 index 00000000000..712572b07b5 --- /dev/null +++ b/sql/opt_table_elimination.cc @@ -0,0 +1,1853 @@ +/** + @file + + @brief + Table Elimination Module + + @defgroup Table_Elimination Table Elimination Module + @{ +*/ + +#ifdef USE_PRAGMA_IMPLEMENTATION +#pragma implementation // gcc: Class implementation +#endif + +#include "mysql_priv.h" +#include "my_bit.h" +#include "sql_select.h" + +/* + OVERVIEW + ======== + + This file contains the table elimination module. The idea behind table + elimination is as follows: suppose we have a left join + + SELECT * FROM t1 LEFT JOIN + (t2 JOIN t3) ON t2.primary_key=t1.col AND + t3.primary_key=t2.col + WHERE ... + + such that + * columns of the inner tables are not used anywhere outside the outer join + (not in WHERE, not in GROUP/ORDER BY clause, not in select list, etc.), + * inner side of the outer join is guaranteed to produce at most one matching + record combination for each record combination of outer tables. + + then the inner side of the outer join can be removed from the query, as it + will always produce only one record combination (either real or + null-complemented one) and we don't care about what that record combination + is. + + + MODULE INTERFACE + ================ + + The module has one entry point - the eliminate_tables() function, which one + needs to call (once) at some point before join optimization. + eliminate_tables() operates over the JOIN structures. Logically, it + removes the inner tables of an outer join operation together with the + operation itself. Physically, it changes the following members: + + * Eliminated tables are marked as constant and moved to the front of the + join order. + + * In addition to this, they are recorded in JOIN::eliminated_tables bitmap. 
+ + * Items that became disused because they were in the ON expression of an + eliminated outer join are notified by means of the Item tree walk which + calls Item::mark_as_eliminated_processor for every item + - At the moment the only Item that cares whether it was eliminated is + Item_subselect with its Item_subselect::eliminated flag which is used + by EXPLAIN code to check if the subquery should be shown in EXPLAIN. + + Table elimination is redone on every PS re-execution. + + + TABLE ELIMINATION ALGORITHM FOR ONE OUTER JOIN + ============================================== + + As described above, we can remove the inner side of an outer join if it + + 1. is not referred to from any other parts of the query + 2. always produces at most one matching record combination. + + We check #1 by doing a recursive descent down the join->join_list while + maintaining a union of used_tables() attribute of all Item expressions in + other parts of the query. When we encounter an outer join, we check if the + bitmap of tables on its inner side has intersection with tables that are used + elsewhere. No intersection means that inner side of the outer join could + potentially be eliminated. + + In order to check #2, one needs to prove that inner side of an outer join + is functionally dependent on the outside. The proof is constructed from + functional dependencies of intermediate objects: + + - Inner side of outer join is functionally dependent when each of its tables + is functionally dependent. (We assume a table is functionally dependent + when its dependencies allow us to uniquely identify one table record, or no + records). + + - Table is functionally dependent when it has a unique key whose columns + are functionally dependent. + + - A column is functionally dependent when we could locate an AND-part of a + certain ON clause in form + + tblX.columnY= expr + + where expr is functionally dependent. 
expr is functionally dependent when + all columns that it refers to are functionally dependent. + + These relationships are modeled as a bipartite directed graph that has + dependencies as edges and two kinds of nodes: + + Value nodes: + - Table column values (each is a value of tblX.columnY) + - Table values (each node represents a table inside the join nest we're + trying to eliminate). + A value has one attribute: it is either bound (i.e. functionally dependent) + or not. + + Module nodes: + - Modules representing tblX.colY=expr equalities. Equality module has + = incoming edges from columns used in expr + = outgoing edge to tblX.colY column. + - Nodes representing unique keys. Unique key has + = incoming edges from key component value nodes + = outgoing edge to key's table value node + - Inner side of outer join module. Outer join module has + = incoming edges from table value nodes + = No outgoing edges. Once we reach it, we know we can eliminate the + outer join. + A module may depend on multiple values, and hence its primary attribute is + the number of its arguments that are not bound. + + The algorithm starts with equality nodes that don't have any incoming edges + (their expressions are either constant or depend only on tables that are + outside of the outer join in question) and performs a breadth-first + traversal. If we reach the outer join nest node, it means outer join is + functionally dependent and can be eliminated. Otherwise it cannot be + eliminated. + + HANDLING MULTIPLE NESTED OUTER JOINS + ==================================== + + Outer joins that are not nested one within another are eliminated + independently. For nested outer joins we have the following considerations: + + 1. ON expressions from children outer joins must be taken into account + + Consider this example: + + SELECT t0.* + FROM + t0 + LEFT JOIN + (t1 LEFT JOIN t2 ON t2.primary_key=t1.col1) + ON + t1.primary_key=t0.col AND t2.col1=t1.col2 + + Here we cannot eliminate the "... 
LEFT JOIN t2 ON ..." part alone because the + ON clause of top level outer join has references to table t2. + We can eliminate the entire "... LEFT JOIN (t1 LEFT JOIN t2) ON .." part, + but in order to do that, we must look at both ON expressions. + + 2. ON expressions of parent outer joins are useless. + Consider an example: + + SELECT t0.* + FROM + t0 + LEFT JOIN + (t1 LEFT JOIN t2 ON some_expr) + ON + t2.primary_key=t1.col -- (*) + + Here the uppermost ON expression has a clause that gives us functional + dependency of table t2 on t1 and hence could be used to eliminate the + "... LEFT JOIN t2 ON..." part. + However, we would not actually encounter this situation, because before the + table elimination we run simplify_joins(), which, among other things, upon + seeing a functional dependency condition like (*) will convert the outer join + of + + "... LEFT JOIN t2 ON ..." + + into inner join and thus make table elimination not to consider eliminating + table t2. +*/ + +class Dep_value; + class Dep_value_field; + class Dep_value_table; + + +class Dep_module; + class Dep_module_expr; + class Dep_module_goal; + class Dep_module_key; + +class Dep_analysis_context; + + +/* + A value, something that can be bound or not bound. One can also iterate over + unbound modules that depend on this value +*/ + +class Dep_value : public Sql_alloc +{ +public: + Dep_value(): bound(FALSE) {} + virtual ~Dep_value(){} /* purecov: inspected */ /* stop compiler warnings */ + + bool is_bound() { return bound; } + void make_bound() { bound= TRUE; } + + /* Iteration over unbound modules that depend on this value */ + typedef char *Iterator; + virtual Iterator init_unbound_modules_iter(char *buf)=0; + virtual Dep_module* get_next_unbound_module(Dep_analysis_context *dac, + Iterator iter) = 0; + static const size_t iterator_size; +protected: + bool bound; +}; + + +/* + A table field value. 
There is exactly only one such object for any tblX.fieldY + - the field depends on its table and equalities + - expressions that use the field are its dependencies +*/ + +class Dep_value_field : public Dep_value +{ +public: + Dep_value_field(Dep_value_table *table_arg, Field *field_arg) : + table(table_arg), field(field_arg) + {} + + Dep_value_table *table; /* Table this field is from */ + Field *field; /* Field this object is representing */ + + /* Iteration over unbound modules that are our dependencies */ + Iterator init_unbound_modules_iter(char *buf); + Dep_module* get_next_unbound_module(Dep_analysis_context *dac, + Iterator iter); + + void make_unbound_modules_iter_skip_keys(Iterator iter); + + static const size_t iterator_size; +private: + /* + Field_deps that belong to one table form a linked list, ordered by + field_index + */ + Dep_value_field *next_table_field; + + /* + Offset to bits in Dep_analysis_context::expr_deps (see comment to that + member for semantics of the bits). + */ + uint bitmap_offset; + + class Module_iter + { + public: + /* if not null, return this and advance */ + Dep_module_key *key_dep; + /* Otherwise, this and advance */ + uint equality_no; + }; + friend class Dep_analysis_context; + friend class Field_dependency_recorder; + friend class Dep_value_table; +}; + +const size_t Dep_value_field::iterator_size= + ALIGN_SIZE(sizeof(Dep_value_field::Module_iter)); + + +/* + A table value. There is one Dep_value_table object for every table that can + potentially be eliminated. 
+ + Table becomes bound as soon as one of its unique keys becomes bound. + Once the table is bound: + - all of its fields are bound + - its embedding outer join has one less unknown argument +*/ + +class Dep_value_table : public Dep_value +{ +public: + Dep_value_table(TABLE *table_arg) : + table(table_arg), fields(NULL), keys(NULL) + {} + TABLE *table; /* Table this object is representing */ + /* Ordered list of fields that belong to this table */ + Dep_value_field *fields; + Dep_module_key *keys; /* Ordered list of Unique keys in this table */ + + /* Iteration over unbound modules that are our dependencies */ + Iterator init_unbound_modules_iter(char *buf); + Dep_module* get_next_unbound_module(Dep_analysis_context *dac, + Iterator iter); + static const size_t iterator_size; +private: + class Module_iter + { + public: + /* Space for field iterator */ + char buf[Dep_value_field::iterator_size]; + /* !NULL <=> iterating over dependent modules of this field */ + Dep_value_field *field_dep; + bool returned_goal; + }; +}; + + +const size_t Dep_value_table::iterator_size= + ALIGN_SIZE(sizeof(Dep_value_table::Module_iter)); + +const size_t Dep_value::iterator_size= + max(Dep_value_table::iterator_size, Dep_value_field::iterator_size); + + +/* + A 'module'. Module has unsatisfied dependencies, the number of which is + stored in unbound_args. Modules also can be linked together in a list. +*/ + +class Dep_module : public Sql_alloc +{ +public: + virtual ~Dep_module(){} /* purecov: inspected */ /* stop compiler warnings */ + + /* Mark as bound. Currently is non-virtual and does nothing */ + void make_bound() {}; + + /* + The final module will return TRUE here. When we see that TRUE was returned, + that will mean that functional dependency check succeeded. + */ + virtual bool is_final () { return FALSE; } + + /* + Increment number of bound arguments. This is expected to change + is_applicable() from false to true after sufficient set of arguments is + bound. 
+ */ + void touch() { unbound_args--; } + bool is_applicable() { return !test(unbound_args); } + + /* Iteration over unbound values that depend on this module */ + typedef char *Iterator; + virtual Iterator init_unbound_values_iter(char *buf)=0; + virtual Dep_value* get_next_unbound_value(Dep_analysis_context *dac, + Iterator iter)=0; + static const size_t iterator_size; +protected: + uint unbound_args; + + Dep_module() : unbound_args(0) {} + /* to bump unbound_args when constructing dependencies */ + friend class Field_dependency_recorder; + friend class Dep_analysis_context; +}; + + +/* + This represents either + - "tbl.column= expr" equality dependency, i.e. tbl.column depends on fields + used in the expression, or + - tbl1.col1=tbl2.col2=... multi-equality. +*/ + +class Dep_module_expr : public Dep_module +{ +public: + Dep_value_field *field; + Item *expr; + + List<Dep_value_field> *mult_equal_fields; + /* Used during condition analysis only, similar to KEYUSE::level */ + uint level; + + Iterator init_unbound_values_iter(char *buf); + Dep_value* get_next_unbound_value(Dep_analysis_context *dac, Iterator iter); + static const size_t iterator_size; +private: + class Value_iter + { + public: + Dep_value_field *field; + List_iterator<Dep_value_field> it; + }; +}; + +const size_t Dep_module_expr::iterator_size= + ALIGN_SIZE(sizeof(Dep_module_expr::Value_iter)); + + +/* + A Unique key module + - Unique key has all of its components as arguments + - Once unique key is bound, its table value is known +*/ + +class Dep_module_key: public Dep_module +{ +public: + Dep_module_key(Dep_value_table *table_arg, uint keyno_arg, uint n_parts_arg) : + table(table_arg), keyno(keyno_arg), next_table_key(NULL) + { + unbound_args= n_parts_arg; + } + Dep_value_table *table; /* Table this key is from */ + uint keyno; /* The index we're representing */ + /* Unique keys form a linked list, ordered by keyno */ + Dep_module_key *next_table_key; + + Iterator init_unbound_values_iter(char *buf); + Dep_value* 
get_next_unbound_value(Dep_analysis_context *dac, Iterator iter); + static const size_t iterator_size; +private: + class Value_iter + { + public: + Dep_value_table *table; + }; +}; + +const size_t Dep_module_key::iterator_size= + ALIGN_SIZE(sizeof(Dep_module_key::Value_iter)); + +const size_t Dep_module::iterator_size= + max(Dep_module_expr::iterator_size, Dep_module_key::iterator_size); + + +/* + A module that represents outer join that we're trying to eliminate. If we + manage to declare this module to be bound, then outer join can be eliminated. +*/ + +class Dep_module_goal: public Dep_module +{ +public: + Dep_module_goal(uint n_children) + { + unbound_args= n_children; + } + bool is_final() { return TRUE; } + /* + This is the goal module, so the running wave algorithm should terminate + once it sees that this module is applicable and should never try to apply + it, hence no use for unbound value iterator implementation. + */ + Iterator init_unbound_values_iter(char *buf) + { + DBUG_ASSERT(0); + return NULL; + } + Dep_value* get_next_unbound_value(Dep_analysis_context *dac, Iterator iter) + { + DBUG_ASSERT(0); + return NULL; + } +}; + + +/* + Functional dependency analyzer context +*/ +class Dep_analysis_context +{ +public: + bool setup_equality_modules_deps(List<Dep_module> *bound_modules); + bool run_wave(List<Dep_module> *new_bound_modules); + + /* Tables that we're looking at eliminating */ + table_map usable_tables; + + /* Array of equality dependencies */ + Dep_module_expr *equality_mods; + uint n_equality_mods; /* Number of elements in the array */ + uint n_equality_mods_alloced; + + /* tablenr -> Dep_value_table* mapping. */ + Dep_value_table *table_deps[MAX_KEY]; + + /* Element for the outer join we're attempting to eliminate */ + Dep_module_goal *outer_join_dep; + + /* + Bitmap of how expressions depend on bits. 
Given a Dep_value_field object, + one can check bitmap_is_set(expr_deps, field_val->bitmap_offset + expr_no) + to see if expression equality_mods[expr_no] depends on the given field. + */ + MY_BITMAP expr_deps; + + Dep_value_table *create_table_value(TABLE *table); + Dep_value_field *get_field_value(Field *field); + +#ifndef DBUG_OFF + void dbug_print_deps(); +#endif +}; + + +void eliminate_tables(JOIN *join); + +static bool +eliminate_tables_for_list(JOIN *join, + List<TABLE_LIST> *join_list, + table_map tables_in_list, + Item *on_expr, + table_map tables_used_elsewhere); +static +bool check_func_dependency(JOIN *join, + table_map dep_tables, + List_iterator<TABLE_LIST> *it, + TABLE_LIST *oj_tbl, + Item* cond); +static +void build_eq_mods_for_cond(Dep_analysis_context *dac, + Dep_module_expr **eq_mod, uint *and_level, + Item *cond); +static +void check_equality(Dep_analysis_context *dac, Dep_module_expr **eq_mod, + uint and_level, Item_func *cond, Item *left, Item *right); +static +Dep_module_expr *merge_eq_mods(Dep_module_expr *start, + Dep_module_expr *new_fields, + Dep_module_expr *end, uint and_level); +static void mark_as_eliminated(JOIN *join, TABLE_LIST *tbl); +static +void add_module_expr(Dep_analysis_context *dac, Dep_module_expr **eq_mod, + uint and_level, Dep_value_field *field_val, Item *right, + List<Dep_value_field>* mult_equal_fields); + + +/*****************************************************************************/ + +/* + Perform table elimination + + SYNOPSIS + eliminate_tables() + join Join to work on + + DESCRIPTION + This is the entry point for table elimination. Grep for MODULE INTERFACE + section in this file for calling convention. + + The idea behind table elimination is that if we have an outer join: + + SELECT * FROM t1 LEFT JOIN + (t2 JOIN t3) ON t2.primary_key=t1.col AND + t3.primary_key=t2.col + such that + + 1. 
columns of the inner tables are not used anywhere outside the outer + join (not in WHERE, not in GROUP/ORDER BY clause, not in select list + etc.), and + 2. inner side of the outer join is guaranteed to produce at most one + record combination for each record combination of outer tables. + + then the inner side of the outer join can be removed from the query. + This is because it will always produce one matching record (either a + real match or a NULL-complemented record combination), and since there + are no references to columns of the inner tables anywhere, it doesn't + matter which record combination it was. + + This function primarily handles checking #1. It collects a bitmap of + tables that are not used in select list/GROUP BY/ORDER BY/HAVING/etc and + thus can possibly be eliminated. + + After this, if #1 is met, the function calls eliminate_tables_for_list(), + which checks #2. + + SIDE EFFECTS + See the OVERVIEW section at the top of this file. + +*/ + +void eliminate_tables(JOIN *join) +{ + THD* thd= join->thd; + Item *item; + table_map used_tables; + DBUG_ENTER("eliminate_tables"); + + DBUG_ASSERT(join->eliminated_tables == 0); + + /* If there are no outer joins, we have nothing to eliminate: */ + if (!join->outer_join) + DBUG_VOID_RETURN; + +#ifndef DBUG_OFF + if (!optimizer_flag(thd, OPTIMIZER_SWITCH_TABLE_ELIMINATION)) + DBUG_VOID_RETURN; /* purecov: inspected */ +#endif + + /* Find the tables that are referred to from WHERE/HAVING */ + used_tables= (join->conds? join->conds->used_tables() : 0) | + (join->having?
join->having->used_tables() : 0); + + /* Add tables referred to from the select list */ + List_iterator<Item> it(join->fields_list); + while ((item= it++)) + used_tables |= item->used_tables(); + + /* Add tables referred to from ORDER BY and GROUP BY lists */ + ORDER *all_lists[]= { join->order, join->group_list}; + for (int i=0; i < 2; i++) + { + for (ORDER *cur_list= all_lists[i]; cur_list; cur_list= cur_list->next) + used_tables |= (*(cur_list->item))->used_tables(); + } + + if (join->select_lex == &thd->lex->select_lex) + { + + /* Multi-table UPDATE: don't eliminate tables referred from SET statement */ + if (thd->lex->sql_command == SQLCOM_UPDATE_MULTI) + { + /* Multi-table UPDATE and DELETE: don't eliminate the tables we modify: */ + used_tables |= thd->table_map_for_update; + List_iterator<Item> it2(thd->lex->value_list); + while ((item= it2++)) + used_tables |= item->used_tables(); + } + + if (thd->lex->sql_command == SQLCOM_DELETE_MULTI) + { + TABLE_LIST *tbl; + for (tbl= (TABLE_LIST*)thd->lex->auxiliary_table_list.first; + tbl; tbl= tbl->next_local) + { + used_tables |= tbl->table->map; + } + } + } + + table_map all_tables= join->all_tables_map(); + if (all_tables & ~used_tables) + { + /* There are some tables that we probably could eliminate. Try it. */ + eliminate_tables_for_list(join, join->join_list, all_tables, NULL, + used_tables); + } + DBUG_VOID_RETURN; +} + + +/* + Perform table elimination in a given join list + + SYNOPSIS + eliminate_tables_for_list() + join The join we're working on + join_list Join list to eliminate tables from (and if + on_expr !=NULL, then try eliminating join_list + itself) + list_tables Bitmap of tables embedded in the join_list. + on_expr ON expression, if the join list is the inner side + of an outer join. + NULL means it's not an outer join but rather a + top-level join list. + tables_used_elsewhere Bitmap of tables that are referred to from + somewhere outside of the join list (e.g. 
+ select list, HAVING, other ON expressions, etc). + + DESCRIPTION + Perform table elimination in a given join list: + - First, walk through join list members and try doing table elimination for + them. + - Then, if the join list itself is an inner side of outer join + (on_expr!=NULL), then try to eliminate the entire join list. + + See "HANDLING MULTIPLE NESTED OUTER JOINS" section at the top of this file + for more detailed description and justification. + + RETURN + TRUE The entire join list eliminated + FALSE Join list wasn't eliminated (but some of its child outer joins + possibly were) +*/ + +static bool +eliminate_tables_for_list(JOIN *join, List<TABLE_LIST> *join_list, + table_map list_tables, Item *on_expr, + table_map tables_used_elsewhere) +{ + TABLE_LIST *tbl; + List_iterator<TABLE_LIST> it(*join_list); + table_map tables_used_on_left= 0; + bool all_eliminated= TRUE; + + while ((tbl= it++)) + { + if (tbl->on_expr) + { + table_map outside_used_tables= tables_used_elsewhere | + tables_used_on_left; + if (tbl->nested_join) + { + /* This is "... LEFT JOIN (join_nest) ON cond" */ + if (eliminate_tables_for_list(join, + &tbl->nested_join->join_list, + tbl->nested_join->used_tables, + tbl->on_expr, + outside_used_tables)) + { + mark_as_eliminated(join, tbl); + } + else + all_eliminated= FALSE; + } + else + { + /* This is "... 
LEFT JOIN tbl ON cond" */ + if (!(tbl->table->map & outside_used_tables) && + check_func_dependency(join, tbl->table->map, NULL, tbl, + tbl->on_expr)) + { + mark_as_eliminated(join, tbl); + } + else + all_eliminated= FALSE; + } + tables_used_on_left |= tbl->on_expr->used_tables(); + } + else + { + DBUG_ASSERT(!tbl->nested_join); + } + } + + /* Try eliminating the nest we're called for */ + if (all_eliminated && on_expr && !(list_tables & tables_used_elsewhere)) + { + it.rewind(); + return check_func_dependency(join, list_tables & ~join->eliminated_tables, + &it, NULL, on_expr); + } + return FALSE; /* not eliminated */ +} + + +/* + Check if given condition makes given set of tables functionally dependent + + SYNOPSIS + check_func_dependency() + join Join we're processing + dep_tables Tables that we check to be functionally dependent (on + everything else) + it Iterator that enumerates these tables, or NULL if we're + checking one single table and it is specified in oj_tbl + parameter. + oj_tbl NULL, or one single table that we're checking + cond Condition to use to prove functional dependency + + DESCRIPTION + Check if we can use given condition to infer that the set of given tables + is functionally dependent on everything else. + + RETURN + TRUE - Yes, functionally dependent + FALSE - No, or error +*/ + +static +bool check_func_dependency(JOIN *join, + table_map dep_tables, + List_iterator<TABLE_LIST> *it, + TABLE_LIST *oj_tbl, + Item* cond) +{ + Dep_analysis_context dac; + + /* + Pre-alloc some Dep_module_expr structures. We don't need this to be + a guaranteed upper bound.
+ */ + dac.n_equality_mods_alloced= + join->thd->lex->current_select->max_equal_elems + + (join->thd->lex->current_select->cond_count+1)*2 + + join->thd->lex->current_select->between_count; + + bzero(dac.table_deps, sizeof(dac.table_deps)); + if (!(dac.equality_mods= new Dep_module_expr[dac.n_equality_mods_alloced])) + return FALSE; /* purecov: inspected */ + + Dep_module_expr* last_eq_mod= dac.equality_mods; + + /* Create Dep_value_table objects for all tables we're trying to eliminate */ + if (oj_tbl) + { + if (!dac.create_table_value(oj_tbl->table)) + return FALSE; /* purecov: inspected */ + } + else + { + TABLE_LIST *tbl; + while ((tbl= (*it)++)) + { + if (tbl->table && (tbl->table->map & dep_tables)) + { + if (!dac.create_table_value(tbl->table)) + return FALSE; /* purecov: inspected */ + } + } + } + dac.usable_tables= dep_tables; + + /* + Analyze the ON expression and create Dep_module_expr objects and + Dep_value_field objects for the used fields. + */ + uint and_level=0; + build_eq_mods_for_cond(&dac, &last_eq_mod, &and_level, cond); + if (!(dac.n_equality_mods= last_eq_mod - dac.equality_mods)) + return FALSE; /* No useful conditions */ + + List<Dep_module> bound_modules; + + if (!(dac.outer_join_dep= new Dep_module_goal(my_count_bits(dep_tables))) || + dac.setup_equality_modules_deps(&bound_modules)) + { + return FALSE; /* OOM, default to non-dependent */ /* purecov: inspected */ + } + + DBUG_EXECUTE("test", dac.dbug_print_deps(); ); + + return dac.run_wave(&bound_modules); +} + + +/* + Running wave functional dependency check algorithm + + SYNOPSIS + Dep_analysis_context::run_wave() + new_bound_modules List of bound modules to start the running wave from. + The list is destroyed during execution + + DESCRIPTION + This function uses the running wave algorithm to check if the join nest is + functionally-dependent. + We start from the provided list of bound modules, and then run the wave across + dependency edges, trying to reach the Dep_module_goal module.
If we manage + to reach it, then the join nest is functionally-dependent, otherwise it is + not. + + RETURN + TRUE Yes, functionally dependent + FALSE No. +*/ + +bool Dep_analysis_context::run_wave(List<Dep_module> *new_bound_modules) +{ + List<Dep_value> new_bound_values; + + Dep_value *value; + Dep_module *module; + + while (!new_bound_modules->is_empty()) + { + /* + The "wave" is in new_bound_modules list. Iterate over values that can be + reached from these modules but are not yet bound, and collect the next + wave generation in new_bound_values list. + */ + List_iterator<Dep_module> modules_it(*new_bound_modules); + while ((module= modules_it++)) + { + char iter_buf[Dep_module::iterator_size]; + Dep_module::Iterator iter; + iter= module->init_unbound_values_iter(iter_buf); + while ((value= module->get_next_unbound_value(this, iter))) + { + value->make_bound(); + new_bound_values.push_back(value); + } + } + new_bound_modules->empty(); + + /* + Now walk over list of values we've just found to be bound and check which + unbound modules can be reached from them. If there are some modules that + became bound, collect them in new_bound_modules list. + */ + List_iterator<Dep_value> value_it(new_bound_values); + while ((value= value_it++)) + { + char iter_buf[Dep_value::iterator_size]; + Dep_value::Iterator iter; + iter= value->init_unbound_modules_iter(iter_buf); + while ((module= value->get_next_unbound_module(this, iter))) + { + module->touch(); + if (!module->is_applicable()) + continue; + if (module->is_final()) + return TRUE; /* Functionally dependent */ + module->make_bound(); + new_bound_modules->push_back(module); + } + } + new_bound_values.empty(); + } + return FALSE; +} + + +/* + This is used to analyze expressions in "tbl.col=expr" dependencies so + that we can figure out which fields the expression depends on. 
+*/ + +class Field_dependency_recorder : public Field_enumerator +{ +public: + Field_dependency_recorder(Dep_analysis_context *ctx_arg): ctx(ctx_arg) + {} + + void visit_field(Field *field) + { + Dep_value_table *tbl_dep; + if ((tbl_dep= ctx->table_deps[field->table->tablenr])) + { + for (Dep_value_field *field_dep= tbl_dep->fields; field_dep; + field_dep= field_dep->next_table_field) + { + if (field->field_index == field_dep->field->field_index) + { + uint offs= field_dep->bitmap_offset + expr_offset; + if (!bitmap_is_set(&ctx->expr_deps, offs)) + ctx->equality_mods[expr_offset].unbound_args++; + bitmap_set_bit(&ctx->expr_deps, offs); + return; + } + } + /* + We got here if we didn't find this field. It's not a part of + a unique key, and/or there is no field=expr element for it. + Bump the dependency anyway; this will signal that this dependency + cannot be satisfied. + */ + ctx->equality_mods[expr_offset].unbound_args++; + } + else + visited_other_tables= TRUE; + } + + Dep_analysis_context *ctx; + /* Offset of the expression we're processing in the dependency bitmap */ + uint expr_offset; + + bool visited_other_tables; +}; + + + + +/* + Setup inbound dependency relationships for tbl.col=expr equalities + + SYNOPSIS + setup_equality_modules_deps() + bound_modules Put here modules that were found not to depend on + any non-bound columns. + + DESCRIPTION + Setup inbound dependency relationships for tbl.col=expr equalities: + - allocate a bitmap where we store such dependencies + - for each "tbl.col=expr" equality, analyze the expr part and find out + which fields it refers to and set appropriate dependencies. + + RETURN + FALSE OK + TRUE Out of memory +*/ + +bool Dep_analysis_context::setup_equality_modules_deps(List<Dep_module> + *bound_modules) +{ + DBUG_ENTER("setup_equality_modules_deps"); + + /* + Count Dep_value_field objects and assign each of them a unique + bitmap_offset value.
+ */ + uint offset= 0; + for (Dep_value_table **tbl_dep= table_deps; + tbl_dep < table_deps + MAX_TABLES; + tbl_dep++) + { + if (*tbl_dep) + { + for (Dep_value_field *field_dep= (*tbl_dep)->fields; + field_dep; + field_dep= field_dep->next_table_field) + { + field_dep->bitmap_offset= offset; + offset += n_equality_mods; + } + } + } + + void *buf; + if (!(buf= current_thd->alloc(bitmap_buffer_size(offset))) || + bitmap_init(&expr_deps, (my_bitmap_map*)buf, offset, FALSE)) + { + DBUG_RETURN(TRUE); /* purecov: inspected */ + } + bitmap_clear_all(&expr_deps); + + /* + Analyze all "field=expr" dependencies, and have expr_deps encode + dependencies of expressions from fields. + + Also collect a linked list of equalities that are bound. + */ + Field_dependency_recorder deps_recorder(this); + for (Dep_module_expr *eq_mod= equality_mods; + eq_mod < equality_mods + n_equality_mods; + eq_mod++) + { + deps_recorder.expr_offset= eq_mod - equality_mods; + deps_recorder.visited_other_tables= FALSE; + eq_mod->unbound_args= 0; + + if (eq_mod->field) + { + /* Regular tbl.col=expr(tblX1.col1, tblY1.col2, ...) */ + eq_mod->expr->walk(&Item::enumerate_field_refs_processor, FALSE, + (uchar*)&deps_recorder); + } + else + { + /* It's a multi-equality */ + eq_mod->unbound_args= !test(eq_mod->expr); + List_iterator<Dep_value_field> it(*eq_mod->mult_equal_fields); + Dep_value_field* field_val; + while ((field_val= it++)) + { + uint offs= field_val->bitmap_offset + eq_mod - equality_mods; + bitmap_set_bit(&expr_deps, offs); + } + } + + if (!eq_mod->unbound_args) + bound_modules->push_back(eq_mod); + } + + DBUG_RETURN(FALSE); +} + + +/* + Ordering that we're using whenever we need to maintain a no-duplicates list + of field value objects. 
+*/ + +static +int compare_field_values(Dep_value_field *a, Dep_value_field *b, void *unused) +{ + uint a_ratio= a->field->table->tablenr*MAX_FIELDS + + a->field->field_index; + + uint b_ratio= b->field->table->tablenr*MAX_FIELDS + + b->field->field_index; + return (a_ratio < b_ratio)? -1 : ((a_ratio == b_ratio)? 0 : 1); +} + + +/* + Produce Dep_module_expr elements for given condition. + + SYNOPSIS + build_eq_mods_for_cond() + ctx Table elimination context + eq_mod INOUT Put produced equality conditions here + and_level INOUT AND-level (like in add_key_fields) + cond Condition to process + + DESCRIPTION + Analyze the given condition and produce an array of Dep_module_expr + dependencies from it. The idea of analysis is as follows: + There are useful equalities that have form + + eliminable_tbl.field = expr (denote as useful_equality) + + The condition is composed of useful equalities and other conditions that + are combined together with AND and OR operators. We process the condition + in recursive fashion according to these basic rules: + + useful_equality1 AND useful_equality2 -> make array of two + Dep_module_expr objects + + useful_equality AND other_cond -> discard other_cond + + useful_equality OR other_cond -> discard everything + + useful_equality1 OR useful_equality2 -> check if both sides of OR are the + same equality. If yes, that's the + result, otherwise discard + everything. + + The rules are used to map the condition into an array Dep_module_expr + elements. The array will specify functional dependencies that logically + follow from the condition. 
+ + SEE ALSO + This function is modeled after add_key_fields() +*/ + +static +void build_eq_mods_for_cond(Dep_analysis_context *ctx, + Dep_module_expr **eq_mod, + uint *and_level, Item *cond) +{ + if (cond->type() == Item_func::COND_ITEM) + { + List_iterator_fast<Item> li(*((Item_cond*) cond)->argument_list()); + uint orig_offset= *eq_mod - ctx->equality_mods; + + /* AND/OR */ + if (((Item_cond*) cond)->functype() == Item_func::COND_AND_FUNC) + { + Item *item; + while ((item=li++)) + build_eq_mods_for_cond(ctx, eq_mod, and_level, item); + + for (Dep_module_expr *mod_exp= ctx->equality_mods + orig_offset; + mod_exp != *eq_mod ; mod_exp++) + { + mod_exp->level= *and_level; + } + } + else + { + Item *item; + (*and_level)++; + build_eq_mods_for_cond(ctx, eq_mod, and_level, li++); + while ((item=li++)) + { + Dep_module_expr *start_key_fields= *eq_mod; + (*and_level)++; + build_eq_mods_for_cond(ctx, eq_mod, and_level, item); + *eq_mod= merge_eq_mods(ctx->equality_mods + orig_offset, + start_key_fields, *eq_mod, + ++(*and_level)); + } + } + return; + } + + if (cond->type() != Item::FUNC_ITEM) + return; + + Item_func *cond_func= (Item_func*) cond; + Item **args= cond_func->arguments(); + + switch (cond_func->functype()) { + case Item_func::BETWEEN: + { + Item *fld; + if (!((Item_func_between*)cond)->negated && + (fld= args[0]->real_item())->type() == Item::FIELD_ITEM && + args[1]->eq(args[2], ((Item_field*)fld)->field->binary())) + { + check_equality(ctx, eq_mod, *and_level, cond_func, args[0], args[1]); + check_equality(ctx, eq_mod, *and_level, cond_func, args[1], args[0]); + } + break; + } + case Item_func::EQ_FUNC: + case Item_func::EQUAL_FUNC: + { + check_equality(ctx, eq_mod, *and_level, cond_func, args[0], args[1]); + check_equality(ctx, eq_mod, *and_level, cond_func, args[1], args[0]); + break; + } + case Item_func::ISNULL_FUNC: + { + Item *tmp=new Item_null; + if (tmp) + check_equality(ctx, eq_mod, *and_level, cond_func, args[0], tmp); + break; + } + case 
Item_func::MULT_EQUAL_FUNC: + { + /* + The condition is a + + tbl1.field1 = tbl2.field2 = tbl3.field3 [= const_expr] + + multiple-equality. Do two things: + - Collect List<Dep_value_field> of tblX.colY where tblX is one of the + tables we're trying to eliminate. + - Remember if there was a bound value, either const_expr or tblY.colZ + where tblY is not a table that we're trying to eliminate. + Store all collected information in a Dep_module_expr object. + */ + Item_equal *item_equal= (Item_equal*)cond; + List<Dep_value_field> *fvl; + if (!(fvl= new List<Dep_value_field>)) + break; /* purecov: inspected */ + + Item_equal_iterator it(*item_equal); + Item_field *item; + Item *bound_item= item_equal->get_const(); + while ((item= it++)) + { + if ((item->used_tables() & ctx->usable_tables)) + { + Dep_value_field *field_val; + if ((field_val= ctx->get_field_value(item->field))) + fvl->push_back(field_val); + } + else + { + if (!bound_item) + bound_item= item; + } + } + exchange_sort<Dep_value_field>(fvl, compare_field_values, NULL); + add_module_expr(ctx, eq_mod, *and_level, NULL, bound_item, fvl); + break; + } + default: + break; + } +} + + +/* + Perform an OR operation on two (adjacent) Dep_module_expr arrays. + + SYNOPSIS + merge_eq_mods() + start Start of left OR-part + new_fields Start of right OR-part + end End of right OR-part + and_level AND-level (like in add_key_fields) + + DESCRIPTION + This function is invoked for two adjacent arrays of Dep_module_expr elements: + + $LEFT_PART $RIGHT_PART + +-----------------------+-----------------------+ + start new_fields end + + The goal is to produce an array which would correspond to the combined + + $LEFT_PART OR $RIGHT_PART + + condition. This is achieved as follows: First, we apply the distributive law: + + (fdep_A_1 AND fdep_A_2 AND ...) OR (fdep_B_1 AND fdep_B_2 AND ...)
= + + = AND_ij (fdep_A_[i] OR fdep_B_[j]) + + Then we walk over the obtained "fdep_A_[i] OR fdep_B_[j]" pairs, and + - Discard those that have left and right parts referring to different + columns. We can't infer anything useful from "col1=expr1 OR col2=expr2". + - When left and right parts refer to the same column, we check if they are + essentially the same. + = If they are the same, we keep one copy + "t.col=expr OR t.col=expr" -> "t.col=expr" + = if they are different, then we discard both + "t.col=expr1 OR t.col=expr2" -> (nothing useful) + + (no per-table or for-index FUNC_DEPS exist yet at this phase). + + See also merge_key_fields(). + + RETURN + End of the result array +*/ + +static +Dep_module_expr *merge_eq_mods(Dep_module_expr *start, + Dep_module_expr *new_fields, + Dep_module_expr *end, uint and_level) +{ + if (start == new_fields) + return start; /* (nothing) OR (...) -> (nothing) */ + if (new_fields == end) + return start; /* (...) OR (nothing) -> (nothing) */ + + Dep_module_expr *first_free= new_fields; + + for (; new_fields != end ; new_fields++) + { + for (Dep_module_expr *old=start ; old != first_free ; old++) + { + if (old->field == new_fields->field) + { + if (!old->field) + { + /* + OR-ing two multiple equalities. We must compute an intersection of + used fields, and check the constants according to these rules: + + a=b=c=d OR a=c=e=f -> a=c (compute intersection) + a=const1 OR a=b -> (nothing) + a=const1 OR a=const1 -> a=const1 + a=const1 OR a=const2 -> (nothing) + + If we're performing an OR operation over multiple equalities, e.g. + + (a=b=c AND p=q) OR (a=b AND v=z) + + then we'll need to try combining each equality with each. ANDed + equalities are guaranteed to be disjoint, so we'll only get one + hit.
+ */ + Field *eq_field= old->mult_equal_fields->head()->field; + if (old->expr && new_fields->expr && + old->expr->eq_by_collation(new_fields->expr, eq_field->binary(), + eq_field->charset())) + { + /* Ok, keep */ + } + else + { + /* no single constant/bound item. */ + old->expr= NULL; + } + + List <Dep_value_field> *fv; + if (!(fv= new List<Dep_value_field>)) + break; /* purecov: inspected */ + + List_iterator<Dep_value_field> it1(*old->mult_equal_fields); + List_iterator<Dep_value_field> it2(*new_fields->mult_equal_fields); + Dep_value_field *lfield= it1++; + Dep_value_field *rfield= it2++; + /* Intersect two ordered lists */ + while (lfield && rfield) + { + if (lfield == rfield) + { + fv->push_back(lfield); + lfield=it1++; + rfield=it2++; + } + else + { + if (compare_field_values(lfield, rfield, NULL) < 0) + lfield= it1++; + else + rfield= it2++; + } + } + + if (fv->elements + test(old->expr) > 1) + { + old->mult_equal_fields= fv; + old->level= and_level; + } + } + else if (!new_fields->expr->const_item()) + { + /* + If the value matches, we can use the key reference. + If not, we keep it until we have examined all new values + */ + if (old->expr->eq(new_fields->expr, + old->field->field->binary())) + { + old->level= and_level; + } + } + else if (old->expr->eq_by_collation(new_fields->expr, + old->field->field->binary(), + old->field->field->charset())) + { + old->level= and_level; + } + else + { + /* The expressions are different. */ + if (old == --first_free) // If last item + break; + *old= *first_free; // Remove old value + old--; // Retry this value + } + } + } + } + + /* + Ok, the results are within the [start, first_free) range, and the useful + elements have level==and_level. 
Now, remove all unusable elements: + */ + for (Dep_module_expr *old=start ; old != first_free ;) + { + if (old->level != and_level) + { // Not used in all levels + if (old == --first_free) + break; + *old= *first_free; // Remove old value + continue; + } + old++; + } + return first_free; +} + + +/* + Add a Dep_module_expr element for left=right condition + + SYNOPSIS + check_equality() + ctx Table elimination context + eq_mod INOUT Store created Dep_module_expr here and increment ptr if + you do so + and_level AND-level (like in add_key_fields) + cond Condition we've inferred the left=right equality from. + left Left expression + right Right expression + usable_tables Create Dep_module_expr only if Left_expression's table + belongs to this set. + + DESCRIPTION + Check if the passed left=right equality is such that + - 'left' is an Item_field referring to a field in a table we're checking + to be functionally dependent, + - the equality allows us to conclude that 'left' expression is functionally + dependent on the 'right', + and if so, create a Dep_module_expr object. +*/ + +static +void check_equality(Dep_analysis_context *ctx, Dep_module_expr **eq_mod, + uint and_level, Item_func *cond, Item *left, Item *right) +{ + if ((left->used_tables() & ctx->usable_tables) && + !(right->used_tables() & RAND_TABLE_BIT) && + left->real_item()->type() == Item::FIELD_ITEM) + { + Field *field= ((Item_field*)left->real_item())->field; + if (field->result_type() == STRING_RESULT) + { + if (right->result_type() != STRING_RESULT) + { + if (field->cmp_type() != right->result_type()) + return; + } + else + { + /* + We can't assume there's a functional dependency if the effective + collation of the operation differs from the field collation.
+ */ + if (field->cmp_type() == STRING_RESULT && + ((Field_str*)field)->charset() != cond->compare_collation()) + return; + } + } + Dep_value_field *field_val; + if ((field_val= ctx->get_field_value(field))) + add_module_expr(ctx, eq_mod, and_level, field_val, right, NULL); + } +} + + +/* + Add a Dep_module_expr object with the specified parameters. + + DESCRIPTION + Add a Dep_module_expr object with the specified parameters. Re-allocate + the ctx->equality_mods array if it has no space left. +*/ + +static +void add_module_expr(Dep_analysis_context *ctx, Dep_module_expr **eq_mod, + uint and_level, Dep_value_field *field_val, + Item *right, List<Dep_value_field>* mult_equal_fields) +{ + if (*eq_mod == ctx->equality_mods + ctx->n_equality_mods_alloced) + { + /* + We've filled the entire equality_mods array. Replace it with a bigger + one. We do it somewhat inefficiently but it doesn't matter. + */ + /* purecov: begin inspected */ + Dep_module_expr *new_arr; + if (!(new_arr= new Dep_module_expr[ctx->n_equality_mods_alloced *2])) + return; + ctx->n_equality_mods_alloced *= 2; + for (int i= 0; i < *eq_mod - ctx->equality_mods; i++) + new_arr[i]= ctx->equality_mods[i]; + + ctx->equality_mods= new_arr; + *eq_mod= new_arr + (*eq_mod - ctx->equality_mods); + /* purecov: end */ + } + + (*eq_mod)->field= field_val; + (*eq_mod)->expr= right; + (*eq_mod)->level= and_level; + (*eq_mod)->mult_equal_fields= mult_equal_fields; + (*eq_mod)++; +} + + +/* + Create a Dep_value_table object for the given table + + SYNOPSIS + Dep_analysis_context::create_table_value() + table Table to create object for + + DESCRIPTION + Create a Dep_value_table object for the given table. Also create + Dep_module_key objects for all unique keys in the table. 
+ + RETURN + Created table value object + NULL if out of memory +*/ + +Dep_value_table *Dep_analysis_context::create_table_value(TABLE *table) +{ + Dep_value_table *tbl_dep; + if (!(tbl_dep= new Dep_value_table(table))) + return NULL; /* purecov: inspected */ + + Dep_module_key **key_list= &(tbl_dep->keys); + /* Add dependencies for unique keys */ + for (uint i=0; i < table->s->keys; i++) + { + KEY *key= table->key_info + i; + if ((key->flags & (HA_NOSAME | HA_END_SPACE_KEY)) == HA_NOSAME) + { + Dep_module_key *key_dep; + if (!(key_dep= new Dep_module_key(tbl_dep, i, key->key_parts))) + return NULL; + *key_list= key_dep; + key_list= &(key_dep->next_table_key); + } + } + return table_deps[table->tablenr]= tbl_dep; +} + + +/* + Get a Dep_value_field object for the given field, creating it if necessary + + SYNOPSIS + Dep_analysis_context::get_field_value() + field Field to create object for + + DESCRIPTION + Get a Dep_value_field object for the given field. First, we search for it + in the list of Dep_value_field objects we have already created. If we don't + find it, we create a new Dep_value_field and put it into the list of field + objects we have for the table. + + RETURN + Created field value object + NULL if out of memory +*/ + +Dep_value_field *Dep_analysis_context::get_field_value(Field *field) +{ + TABLE *table= field->table; + Dep_value_table *tbl_dep= table_deps[table->tablenr]; + + /* Try finding the field in field list */ + Dep_value_field **pfield= &(tbl_dep->fields); + while (*pfield && (*pfield)->field->field_index < field->field_index) + { + pfield= &((*pfield)->next_table_field); + } + if (*pfield && (*pfield)->field->field_index == field->field_index) + return *pfield; + + /* Create the field and insert it in the list */ + Dep_value_field *new_field= new Dep_value_field(tbl_dep, field); + new_field->next_table_field= *pfield; + *pfield= new_field; + + return new_field; +} + + +/* + Iteration over unbound modules that are our dependencies. 
+ for those we have: + - dependencies of our fields + - outer join we're in +*/ +char *Dep_value_table::init_unbound_modules_iter(char *buf) +{ + Module_iter *iter= ALIGN_PTR(my_ptrdiff_t(buf), Module_iter); + iter->field_dep= fields; + if (fields) + { + fields->init_unbound_modules_iter(iter->buf); + fields->make_unbound_modules_iter_skip_keys(iter->buf); + } + iter->returned_goal= FALSE; + return (char*)iter; +} + + +Dep_module* +Dep_value_table::get_next_unbound_module(Dep_analysis_context *dac, + char *iter) +{ + Module_iter *di= (Module_iter*)iter; + while (di->field_dep) + { + Dep_module *res; + if ((res= di->field_dep->get_next_unbound_module(dac, di->buf))) + return res; + if ((di->field_dep= di->field_dep->next_table_field)) + { + char *field_iter= ((Module_iter*)iter)->buf; + di->field_dep->init_unbound_modules_iter(field_iter); + di->field_dep->make_unbound_modules_iter_skip_keys(field_iter); + } + } + + if (!di->returned_goal) + { + di->returned_goal= TRUE; + return dac->outer_join_dep; + } + return NULL; +} + + +char *Dep_module_expr::init_unbound_values_iter(char *buf) +{ + Value_iter *iter= ALIGN_PTR(my_ptrdiff_t(buf), Value_iter); + iter->field= field; + if (!field) + { + new (&iter->it) List_iterator<Dep_value_field>(*mult_equal_fields); + } + return (char*)iter; +} + + +Dep_value* Dep_module_expr::get_next_unbound_value(Dep_analysis_context *dac, + char *buf) +{ + Dep_value *res; + if (field) + { + res= ((Value_iter*)buf)->field; + ((Value_iter*)buf)->field= NULL; + return (!res || res->is_bound())? 
NULL : res; + } + else + { + while ((res= ((Value_iter*)buf)->it++)) + { + if (!res->is_bound()) + return res; + } + return NULL; + } +} + + +char *Dep_module_key::init_unbound_values_iter(char *buf) +{ + Value_iter *iter= ALIGN_PTR(my_ptrdiff_t(buf), Value_iter); + iter->table= table; + return (char*)iter; +} + + +Dep_value* Dep_module_key::get_next_unbound_value(Dep_analysis_context *dac, + Dep_module::Iterator iter) +{ + Dep_value* res= ((Value_iter*)iter)->table; + ((Value_iter*)iter)->table= NULL; + return res; +} + + +Dep_value::Iterator Dep_value_field::init_unbound_modules_iter(char *buf) +{ + Module_iter *iter= ALIGN_PTR(my_ptrdiff_t(buf), Module_iter); + iter->key_dep= table->keys; + iter->equality_no= 0; + return (char*)iter; +} + + +void +Dep_value_field::make_unbound_modules_iter_skip_keys(Dep_value::Iterator iter) +{ + ((Module_iter*)iter)->key_dep= NULL; +} + + +Dep_module* Dep_value_field::get_next_unbound_module(Dep_analysis_context *dac, + Dep_value::Iterator iter) +{ + Module_iter *di= (Module_iter*)iter; + Dep_module_key *key_dep= di->key_dep; + + /* + First, enumerate all unique keys that are + - not yet applicable + - have this field as a part of them + */ + while (key_dep && (key_dep->is_applicable() || + !field->part_of_key.is_set(key_dep->keyno))) + { + key_dep= key_dep->next_table_key; + } + + if (key_dep) + { + di->key_dep= key_dep->next_table_key; + return key_dep; + } + else + di->key_dep= NULL; + + /* + Then walk through [multi]equalities and find those that + - depend on this field + - and are not bound yet. + */ + uint eq_no= di->equality_no; + while (eq_no < dac->n_equality_mods && + (!bitmap_is_set(&dac->expr_deps, bitmap_offset + eq_no) || + dac->equality_mods[eq_no].is_applicable())) + { + eq_no++; + } + + if (eq_no < dac->n_equality_mods) + { + di->equality_no= eq_no+1; + return &dac->equality_mods[eq_no]; + } + return NULL; +} + + +/* + Mark one table or the whole join nest as eliminated. 
+*/ + +static void mark_as_eliminated(JOIN *join, TABLE_LIST *tbl) +{ + TABLE *table; + /* + NOTE: there are TABLE_LIST objects that have + tbl->table!= NULL && tbl->nested_join!=NULL and + tbl->table == tbl->nested_join->join_list->element(..)->table + */ + if (tbl->nested_join) + { + TABLE_LIST *child; + List_iterator<TABLE_LIST> it(tbl->nested_join->join_list); + while ((child= it++)) + mark_as_eliminated(join, child); + } + else if ((table= tbl->table)) + { + JOIN_TAB *tab= tbl->table->reginfo.join_tab; + if (!(join->const_table_map & tab->table->map)) + { + DBUG_PRINT("info", ("Eliminated table %s", table->alias)); + tab->type= JT_CONST; + join->eliminated_tables |= table->map; + join->const_table_map|= table->map; + set_position(join, join->const_tables++, tab, (KEYUSE*)0); + } + } + + if (tbl->on_expr) + tbl->on_expr->walk(&Item::mark_as_eliminated_processor, FALSE, NULL); +} + + +#ifndef DBUG_OFF +/* purecov: begin inspected */ +void Dep_analysis_context::dbug_print_deps() +{ + DBUG_ENTER("dbug_print_deps"); + DBUG_LOCK_FILE; + + fprintf(DBUG_FILE,"deps {\n"); + + /* Start with printing equalities */ + for (Dep_module_expr *eq_mod= equality_mods; + eq_mod != equality_mods + n_equality_mods; eq_mod++) + { + char buf[128]; + String str(buf, sizeof(buf), &my_charset_bin); + str.length(0); + eq_mod->expr->print(&str, QT_ORDINARY); + if (eq_mod->field) + { + fprintf(DBUG_FILE, " equality%ld: %s -> %s.%s\n", + (long)(eq_mod - equality_mods), + str.c_ptr(), + eq_mod->field->table->table->alias, + eq_mod->field->field->field_name); + } + else + { + fprintf(DBUG_FILE, " equality%ld: multi-equality", + (long)(eq_mod - equality_mods)); + } + } + fprintf(DBUG_FILE,"\n"); + + /* Then tables and their fields */ + for (uint i=0; i < MAX_TABLES; i++) + { + Dep_value_table *table_dep; + if ((table_dep= table_deps[i])) + { + /* Print table */ + fprintf(DBUG_FILE, " table %s\n", table_dep->table->alias); + /* Print fields */ + for (Dep_value_field *field_dep= 
table_dep->fields; field_dep; + field_dep= field_dep->next_table_field) + { + fprintf(DBUG_FILE, " field %s.%s ->", table_dep->table->alias, + field_dep->field->field_name); + uint ofs= field_dep->bitmap_offset; + for (uint bit= ofs; bit < ofs + n_equality_mods; bit++) + { + if (bitmap_is_set(&expr_deps, bit)) + fprintf(DBUG_FILE, " equality%d ", bit - ofs); + } + fprintf(DBUG_FILE, "\n"); + } + } + } + fprintf(DBUG_FILE,"\n}\n"); + DBUG_UNLOCK_FILE; + DBUG_VOID_RETURN; +} +/* purecov: end */ + +#endif +/** + @} (end of group Table_Elimination) +*/ + diff --git a/sql/set_var.cc b/sql/set_var.cc index 3a19a4849d9..113e22c6aaa 100644 --- a/sql/set_var.cc +++ b/sql/set_var.cc @@ -147,6 +147,7 @@ static bool sys_update_general_log_path(THD *thd, set_var * var); static void sys_default_general_log_path(THD *thd, enum_var_type type); static bool sys_update_slow_log_path(THD *thd, set_var * var); static void sys_default_slow_log_path(THD *thd, enum_var_type type); +static void fix_sys_log_slow_filter(THD *thd, enum_var_type); /* Variable definition list @@ -359,6 +360,9 @@ static sys_var_bool_ptr static sys_var_thd_ulong sys_log_warnings(&vars, "log_warnings", &SV::log_warnings); static sys_var_microseconds sys_var_long_query_time(&vars, "long_query_time", &SV::long_query_time); +static sys_var_microseconds sys_var_long_query_time2(&vars, + "log_slow_time", + &SV::long_query_time); static sys_var_thd_bool sys_low_priority_updates(&vars, "low_priority_updates", &SV::low_priority_updates, fix_low_priority_updates); @@ -852,6 +856,20 @@ sys_var_thd_ulong sys_group_concat_max_len(&vars, "group_concat_ma sys_var_thd_time_zone sys_time_zone(&vars, "time_zone", sys_var::SESSION_VARIABLE_IN_BINLOG); +/* Unique variables for MariaDB */ +static sys_var_thd_ulong sys_log_slow_rate_limit(&vars, + "log_slow_rate_limit", + &SV::log_slow_rate_limit); +static sys_var_thd_set sys_log_slow_filter(&vars, "log_slow_filter", + &SV::log_slow_filter, + &log_slow_filter_typelib, + 
QPLAN_VISIBLE_MASK, + fix_sys_log_slow_filter); +static sys_var_thd_set sys_log_slow_verbosity(&vars, + "log_slow_verbosity", + &SV::log_slow_verbosity, + &log_slow_verbosity_typelib); + /* Global read-only variable containing hostname */ static sys_var_const_str sys_hostname(&vars, "hostname", glob_hostname); @@ -1850,11 +1868,17 @@ err: return 1; } +/** + Check validity of set + + Note that this sets 'save_result.ulong_value' for the update function, + which means that we don't need a separate sys_var::update() function +*/ bool sys_var::check_set(THD *thd, set_var *var, TYPELIB *enum_names) { bool not_used; - char buff[STRING_BUFFER_USUAL_SIZE], *error= 0; + char buff[256], *error= 0; uint error_len= 0; String str(buff, sizeof(buff) - 1, system_charset_info), *res; @@ -1866,8 +1890,7 @@ bool sys_var::check_set(THD *thd, set_var *var, TYPELIB *enum_names) goto err; } - if (!m_allow_empty_value && - res->length() == 0) + if (!m_allow_empty_value && res->length() == 0) { buff[0]= 0; goto err; @@ -1889,8 +1912,7 @@ bool sys_var::check_set(THD *thd, set_var *var, TYPELIB *enum_names) { ulonglong tmp= var->value->val_int(); - if (!m_allow_empty_value && - tmp == 0) + if (!m_allow_empty_value && tmp == 0) { buff[0]= '0'; buff[1]= 0; @@ -1917,6 +1939,49 @@ err: } +/** + Make string representation of set + + @param[in] thd thread handler + @param[in] val sql_mode value + @param[in] names names for the different bits + @param[out] rep Result string + + @return + 0 ok + 1 end of memory +*/ + +bool sys_var::make_set(THD *thd, ulonglong val, TYPELIB *names, + LEX_STRING *rep) +{ + /* Strings for typelib may be big; This is reallocated on demand */ + char buff[256]; + String tmp(buff, sizeof(buff) - 1, &my_charset_latin1); + bool error= 0; + + tmp.length(0); + for (uint i= 0; val; val>>= 1, i++) + { + if (val & 1) + { + error|= tmp.append(names->type_names[i], + names->type_lengths[i]); + error|= tmp.append(','); + } + } + + if (tmp.length()) + tmp.length(tmp.length() - 1); /* 
trim the trailing comma */ + + /* Allocate temporary copy of string */ + if (!(rep->str= thd->strmake(tmp.ptr(), tmp.length()))) + error= 1; + rep->length= tmp.length(); + return error; /* Error in case of out of memory */ +} + + CHARSET_INFO *sys_var::charset(THD *thd) { return is_os_charset ? thd->variables.character_set_filesystem : @@ -1952,6 +2017,16 @@ uchar *sys_var_thd_enum::value_ptr(THD *thd, enum_var_type type, return (uchar*) enum_names->type_names[tmp]; } +uchar *sys_var_thd_set::value_ptr(THD *thd, enum_var_type type, + LEX_STRING *base) +{ + LEX_STRING sql_mode; + ulong val= ((type == OPT_GLOBAL) ? global_system_variables.*offset : + thd->variables.*offset); + (void) make_set(thd, val & visible_value_mask, enum_names, &sql_mode); + return (uchar *) sql_mode.str; +} + bool sys_var_thd_bit::check(THD *thd, set_var *var) { return (check_enum(thd, var, &bool_typelib) || @@ -3700,6 +3775,24 @@ int set_var_password::update(THD *thd) } /**************************************************************************** + Functions to handle log_slow_filter +****************************************************************************/ + +/* Ensure that the proper bits are set for easy test of logging */ +static void fix_sys_log_slow_filter(THD *thd, enum_var_type type) +{ + /* Maintain everything with filters */ + opt_log_slow_admin_statements= 1; + if (type == OPT_GLOBAL) + global_system_variables.log_slow_filter= + fix_log_slow_filter(global_system_variables.log_slow_filter); + else + thd->variables.log_slow_filter= + fix_log_slow_filter(thd->variables.log_slow_filter); +} + + +/**************************************************************************** Functions to handle table_type ****************************************************************************/ @@ -3810,67 +3903,6 @@ bool sys_var_thd_table_type::update(THD *thd, set_var *var) Functions to handle sql_mode ****************************************************************************/ -/** - Make 
string representation of mode. - - @param[in] thd thread handler - @param[in] val sql_mode value - @param[out] len pointer on length of string - - @return - pointer to string with sql_mode representation -*/ - -bool -sys_var_thd_sql_mode:: -symbolic_mode_representation(THD *thd, ulonglong val, LEX_STRING *rep) -{ - char buff[STRING_BUFFER_USUAL_SIZE*8]; - String tmp(buff, sizeof(buff) - 1, &my_charset_latin1); - - tmp.length(0); - - for (uint i= 0; val; val>>= 1, i++) - { - if (val & 1) - { - tmp.append(sql_mode_typelib.type_names[i], - sql_mode_typelib.type_lengths[i]); - tmp.append(','); - } - } - - if (tmp.length()) - tmp.length(tmp.length() - 1); /* trim the trailing comma */ - - rep->str= thd->strmake(tmp.ptr(), tmp.length()); - - rep->length= rep->str ? tmp.length() : 0; - - return rep->length != tmp.length(); -} - - -uchar *sys_var_thd_sql_mode::value_ptr(THD *thd, enum_var_type type, - LEX_STRING *base) -{ - LEX_STRING sql_mode; - ulonglong val= ((type == OPT_GLOBAL) ? global_system_variables.*offset : - thd->variables.*offset); - (void) symbolic_mode_representation(thd, val, &sql_mode); - return (uchar *) sql_mode.str; -} - - -void sys_var_thd_sql_mode::set_default(THD *thd, enum_var_type type) -{ - if (type == OPT_GLOBAL) - global_system_variables.*offset= 0; - else - thd->variables.*offset= global_system_variables.*offset; -} - - void fix_sql_mode_var(THD *thd, enum_var_type type) { if (type == OPT_GLOBAL) diff --git a/sql/set_var.h b/sql/set_var.h index 10e6e0f9c35..f2c7c0ba30f 100644 --- a/sql/set_var.h +++ b/sql/set_var.h @@ -98,6 +98,8 @@ public: virtual bool check(THD *thd, set_var *var); bool check_enum(THD *thd, set_var *var, const TYPELIB *enum_names); bool check_set(THD *thd, set_var *var, TYPELIB *enum_names); + static bool make_set(THD *thd, ulonglong sql_mode, TYPELIB *names, + LEX_STRING *rep); bool is_written_to_binlog(enum_var_type type) { return (type == OPT_SESSION || type == OPT_DEFAULT) && @@ -532,6 +534,25 @@ public: }; +class 
sys_var_thd_set :public sys_var_thd_enum +{ + ulong visible_value_mask; /* Mask away internal bits */ +public: + sys_var_thd_set(sys_var_chain *chain, const char *name_arg, + ulong SV::*offset_arg, TYPELIB *typelib, + ulong value_mask= ~ (ulong) 0, + sys_after_update_func func= NULL) + :sys_var_thd_enum(chain, name_arg, offset_arg, typelib, + func), visible_value_mask(value_mask) + {} + bool check(THD *thd, set_var *var) + { + return check_set(thd, var, enum_names); + } + uchar *value_ptr(THD *thd, enum_var_type type, LEX_STRING *base); +}; + + class sys_var_thd_optimizer_switch :public sys_var_thd_enum { public: @@ -548,22 +569,14 @@ public: extern void fix_sql_mode_var(THD *thd, enum_var_type type); -class sys_var_thd_sql_mode :public sys_var_thd_enum +class sys_var_thd_sql_mode :public sys_var_thd_set { public: sys_var_thd_sql_mode(sys_var_chain *chain, const char *name_arg, ulong SV::*offset_arg) - :sys_var_thd_enum(chain, name_arg, offset_arg, &sql_mode_typelib, - fix_sql_mode_var) + :sys_var_thd_set(chain, name_arg, offset_arg, &sql_mode_typelib, + ~(ulong) 0, fix_sql_mode_var) {} - bool check(THD *thd, set_var *var) - { - return check_set(thd, var, enum_names); - } - void set_default(THD *thd, enum_var_type type); - uchar *value_ptr(THD *thd, enum_var_type type, LEX_STRING *base); - static bool symbolic_mode_representation(THD *thd, ulonglong sql_mode, - LEX_STRING *rep); }; @@ -1184,7 +1197,6 @@ public: bool update(THD *thd, set_var *var); }; - /** Handler for setting the system variable --read-only. 
*/ diff --git a/sql/slave.cc b/sql/slave.cc index 84aae6eb683..926db30be15 100644 --- a/sql/slave.cc +++ b/sql/slave.cc @@ -1756,6 +1756,7 @@ static int init_slave_thread(THD* thd, SLAVE_THD_TYPE thd_type) + MAX_LOG_EVENT_HEADER; /* note, incr over the global not session var */ thd->slave_thread = 1; thd->enable_slow_log= opt_log_slow_slave_statements; + thd->variables.log_slow_filter= global_system_variables.log_slow_filter; set_slave_thread_options(thd); thd->client_capabilities = CLIENT_LOCAL_FILES; pthread_mutex_lock(&LOCK_thread_count); diff --git a/sql/sp_head.cc b/sql/sp_head.cc index 0736e5fc2a8..e7310787a35 100644 --- a/sql/sp_head.cc +++ b/sql/sp_head.cc @@ -1846,7 +1846,7 @@ sp_head::execute_procedure(THD *thd, List<Item> *args) uint params = m_pcont->context_var_count(); sp_rcontext *save_spcont, *octx; sp_rcontext *nctx = NULL; - bool save_enable_slow_log= false; + bool save_enable_slow_log; bool save_log_general= false; DBUG_ENTER("sp_head::execute_procedure"); DBUG_PRINT("info", ("procedure %s", m_name.str)); @@ -1957,10 +1957,10 @@ sp_head::execute_procedure(THD *thd, List<Item> *args) DBUG_PRINT("info",(" %.*s: eval args done", (int) m_name.length, m_name.str)); } - if (!(m_flags & LOG_SLOW_STATEMENTS) && thd->enable_slow_log) + save_enable_slow_log= thd->enable_slow_log; + if (!(m_flags & LOG_SLOW_STATEMENTS) && save_enable_slow_log) { DBUG_PRINT("info", ("Disabling slow log for the execution")); - save_enable_slow_log= true; thd->enable_slow_log= FALSE; } if (!(m_flags & LOG_GENERAL_LOG) && !(thd->options & OPTION_LOG_OFF)) @@ -1983,8 +1983,7 @@ sp_head::execute_procedure(THD *thd, List<Item> *args) if (save_log_general) thd->options &= ~OPTION_LOG_OFF; - if (save_enable_slow_log) - thd->enable_slow_log= true; + thd->enable_slow_log= save_enable_slow_log; /* In the case when we weren't able to employ reuse mechanism for OUT/INOUT paranmeters, we should reallocate memory. 
This @@ -2398,8 +2397,7 @@ sp_head::show_create_routine(THD *thd, int type) if (check_show_routine_access(thd, this, &full_access)) DBUG_RETURN(TRUE); - sys_var_thd_sql_mode::symbolic_mode_representation( - thd, m_sql_mode, &sql_mode); + sys_var::make_set(thd, m_sql_mode, &sql_mode_typelib, &sql_mode); /* Send header. */ diff --git a/sql/sql_bitmap.h b/sql/sql_bitmap.h index 97accefe8aa..e07806a56ab 100644 --- a/sql/sql_bitmap.h +++ b/sql/sql_bitmap.h @@ -93,6 +93,34 @@ public: } }; +/* An iterator to quickly walk over bits in ulonglong bitmap. */ +class Table_map_iterator +{ + ulonglong bmp; + uint no; +public: + Table_map_iterator(ulonglong t) : bmp(t), no(0) {} + int next_bit() + { + static const char last_bit[16]= {32, 0, 1, 0, + 2, 0, 1, 0, + 3, 0, 1, 0, + 2, 0, 1, 0}; + uint bit; + while ((bit= last_bit[bmp & 0xF]) == 32) + { + no += 4; + bmp= bmp >> 4; + if (!bmp) + return BITMAP_END; + } + bmp &= ~(1LL << bit); + return no + bit; + } + int operator++(int) { return next_bit(); } + enum { BITMAP_END= 64 }; +}; + template <> class Bitmap<64> { ulonglong map; @@ -136,5 +164,10 @@ public: my_bool operator==(const Bitmap<64>& map2) const { return map == map2.map; } char *print(char *buf) const { longlong2str(map,buf,16); return buf; } ulonglong to_ulonglong() const { return map; } + class Iterator : public Table_map_iterator + { + public: + Iterator(Bitmap<64> &bmp) : Table_map_iterator(bmp.map) {} + }; }; diff --git a/sql/sql_cache.cc b/sql/sql_cache.cc index 66e5775b534..a97dcab0b55 100644 --- a/sql/sql_cache.cc +++ b/sql/sql_cache.cc @@ -1627,6 +1627,7 @@ def_week_frmt: %lu, in_trans: %d, autocommit: %d", thd->limit_found_rows = query->found_rows(); thd->status_var.last_query_cost= 0.0; + thd->query_plan_flags= (thd->query_plan_flags & ~QPLAN_QC_NO) | QPLAN_QC; thd->main_da.disable_status(); BLOCK_UNLOCK_RD(query_block); @@ -1635,6 +1636,10 @@ def_week_frmt: %lu, in_trans: %d, autocommit: %d", err_unlock: unlock(); err: + /* + query_plan_flags doesn't have to 
be changed here as it contains + QPLAN_QC_NO by default + */ DBUG_RETURN(0); // Query was not cached } diff --git a/sql/sql_class.cc b/sql/sql_class.cc index cdd1a360144..0805eda59e4 100644 --- a/sql/sql_class.cc +++ b/sql/sql_class.cc @@ -3084,6 +3084,7 @@ void THD::reset_sub_statement_state(Sub_statement_state *backup, backup->options= options; backup->in_sub_stmt= in_sub_stmt; backup->enable_slow_log= enable_slow_log; + backup->query_plan_flags= query_plan_flags; backup->limit_found_rows= limit_found_rows; backup->examined_row_count= examined_row_count; backup->sent_row_count= sent_row_count; @@ -3148,6 +3149,7 @@ void THD::restore_sub_statement_state(Sub_statement_state *backup) options= backup->options; in_sub_stmt= backup->in_sub_stmt; enable_slow_log= backup->enable_slow_log; + query_plan_flags= backup->query_plan_flags; first_successful_insert_id_in_prev_stmt= backup->first_successful_insert_id_in_prev_stmt; first_successful_insert_id_in_cur_stmt= diff --git a/sql/sql_class.h b/sql/sql_class.h index 3040b2080cf..60ab952a3ab 100644 --- a/sql/sql_class.h +++ b/sql/sql_class.h @@ -349,6 +349,10 @@ struct system_variables ulong trans_prealloc_size; ulong log_warnings; ulong group_concat_max_len; + /* Flags for slow log filtering */ + ulong log_slow_rate_limit; + ulong log_slow_filter; + ulong log_slow_verbosity; ulong ndb_autoincrement_prefetch_sz; ulong ndb_index_stat_cache_entries; ulong ndb_index_stat_update_freq; @@ -989,6 +993,7 @@ public: ulonglong limit_found_rows; ha_rows cuted_fields, sent_row_count, examined_row_count; ulong client_capabilities; + ulong query_plan_flags; uint in_sub_stmt; bool enable_slow_log; bool last_insert_id_used; @@ -1736,6 +1741,8 @@ public: create_sort_index(); may differ from examined_row_count. 
*/ ulong row_count; + ulong query_plan_flags; + ulong query_plan_fsort_passes; pthread_t real_id; /* For debugging */ my_thread_id thread_id; uint tmp_table, global_read_lock; diff --git a/sql/sql_lex.cc b/sql/sql_lex.cc index 54c06b1fb98..6d047197992 100644 --- a/sql/sql_lex.cc +++ b/sql/sql_lex.cc @@ -1839,8 +1839,9 @@ void st_select_lex_unit::exclude_tree() 'last' should be reachable from this st_select_lex_node */ -void st_select_lex::mark_as_dependent(st_select_lex *last) +void st_select_lex::mark_as_dependent(st_select_lex *last, Item *dependency) { + SELECT_LEX *next_to_last; /* Mark all selects from resolved to 1 before select where was found table as depended (of select where was found table) @@ -1848,6 +1849,7 @@ void st_select_lex::mark_as_dependent(st_select_lex *last) for (SELECT_LEX *s= this; s && s != last; s= s->outer_select()) + { if (!(s->uncacheable & UNCACHEABLE_DEPENDENT)) { // Select is dependent of outer select @@ -1863,8 +1865,12 @@ void st_select_lex::mark_as_dependent(st_select_lex *last) sl->uncacheable|= UNCACHEABLE_UNITED; } } + next_to_last= s; + } is_correlated= TRUE; this->master_unit()->item->is_correlated= TRUE; + if (dependency) + next_to_last->master_unit()->item->refers_to.push_back(dependency); } bool st_select_lex_node::set_braces(bool value) { return 1; } diff --git a/sql/sql_lex.h b/sql/sql_lex.h index 76fd5354c51..4b9b35819fe 100644 --- a/sql/sql_lex.h +++ b/sql/sql_lex.h @@ -743,7 +743,7 @@ public: return master_unit()->return_after_parsing(); } - void mark_as_dependent(st_select_lex *last); + void mark_as_dependent(st_select_lex *last, Item *dependency); bool set_braces(bool value); bool inc_in_sum_expr(); diff --git a/sql/sql_list.h b/sql/sql_list.h index 2ed4a8060c4..93cdd20c299 100644 --- a/sql/sql_list.h +++ b/sql/sql_list.h @@ -443,6 +443,43 @@ public: /* + Exchange sort algorithm for List<T>. 
+*/ +template <class T> +inline void exchange_sort(List<T> *list_to_sort, + int (*sort_func)(T *a, T *b, void *arg), void *arg) +{ + bool swap; + List_iterator<T> it(*list_to_sort); + do + { + T *item1= it++; + T **ref1= it.ref(); + T *item2; + + swap= FALSE; + while ((item2= it++)) + { + T **ref2= it.ref(); + if (sort_func(item1, item2, arg) < 0) + { + T *item= *ref1; + *ref1= *ref2; + *ref2= item; + swap= TRUE; + } + else + { + item1= item2; + ref1= ref2; + } + } + it.rewind(); + } while (swap); +} + + +/* A simple intrusive list which automaticly removes element from list on delete (for THD element) */ diff --git a/sql/sql_parse.cc b/sql/sql_parse.cc index d9bc07064c3..20033e23b93 100644 --- a/sql/sql_parse.cc +++ b/sql/sql_parse.cc @@ -972,6 +972,7 @@ bool dispatch_command(enum enum_server_command command, THD *thd, the slow log only if opt_log_slow_admin_statements is set. */ thd->enable_slow_log= TRUE; + thd->query_plan_flags= QPLAN_INIT; thd->lex->sql_command= SQLCOM_END; /* to avoid confusing VIEW detectors */ thd->set_time(); VOID(pthread_mutex_lock(&LOCK_thread_count)); @@ -1046,6 +1047,7 @@ bool dispatch_command(enum enum_server_command command, THD *thd, status_var_increment(thd->status_var.com_other); thd->enable_slow_log= opt_log_slow_admin_statements; + thd->query_plan_flags|= QPLAN_ADMIN; db.str= (char*) thd->alloc(db_len + tbl_len + 2); if (!db.str) { @@ -1401,6 +1403,7 @@ bool dispatch_command(enum enum_server_command command, THD *thd, status_var_increment(thd->status_var.com_other); thd->enable_slow_log= opt_log_slow_admin_statements; + thd->query_plan_flags|= QPLAN_ADMIN; if (check_global_access(thd, REPL_SLAVE_ACL)) break; @@ -1633,6 +1636,19 @@ void log_slow_statement(THD *thd) if (unlikely(thd->in_sub_stmt)) DBUG_VOID_RETURN; // Don't set time for sub stmt + /* Follow the slow log filter configuration. 
*/ + DBUG_ASSERT(thd->variables.log_slow_filter != 0); + if (!(thd->variables.log_slow_filter & thd->query_plan_flags)) + DBUG_VOID_RETURN; + + /* + If rate limiting of slow log writes is enabled, decide whether to log + this query to the log or not. + */ + if (thd->variables.log_slow_rate_limit > 1 && + (global_query_id % thd->variables.log_slow_rate_limit) != 0) + DBUG_VOID_RETURN; + /* Do not log administrative statements unless the appropriate option is set; do not log into slow log if reading from backup. @@ -2355,6 +2371,7 @@ mysql_execute_command(THD *thd) check_global_access(thd, FILE_ACL)) goto error; /* purecov: inspected */ thd->enable_slow_log= opt_log_slow_admin_statements; + thd->query_plan_flags|= QPLAN_ADMIN; res = mysql_backup_table(thd, first_table); select_lex->table_list.first= (uchar*) first_table; lex->query_tables=all_tables; @@ -2367,6 +2384,7 @@ mysql_execute_command(THD *thd) check_global_access(thd, FILE_ACL)) goto error; /* purecov: inspected */ thd->enable_slow_log= opt_log_slow_admin_statements; + thd->query_plan_flags|= QPLAN_ADMIN; res = mysql_restore_table(thd, first_table); select_lex->table_list.first= (uchar*) first_table; lex->query_tables=all_tables; @@ -2752,6 +2770,7 @@ end_with_restore_list: ALTER TABLE. */ thd->enable_slow_log= opt_log_slow_admin_statements; + thd->query_plan_flags|= QPLAN_ADMIN; bzero((char*) &create_info, sizeof(create_info)); create_info.db_type= 0; @@ -2871,6 +2890,7 @@ end_with_restore_list: } thd->enable_slow_log= opt_log_slow_admin_statements; + thd->query_plan_flags|= QPLAN_ADMIN; res= mysql_alter_table(thd, select_lex->db, lex->name.str, &create_info, first_table, @@ -2958,6 +2978,7 @@ end_with_restore_list: UINT_MAX, FALSE)) goto error; /* purecov: inspected */ thd->enable_slow_log= opt_log_slow_admin_statements; + thd->query_plan_flags|= QPLAN_ADMIN; res= mysql_repair_table(thd, first_table, &lex->check_opt); /* ! 
we write after unlocking the table */ if (!res && !lex->no_write_to_binlog) @@ -2978,6 +2999,7 @@ end_with_restore_list: UINT_MAX, FALSE)) goto error; /* purecov: inspected */ thd->enable_slow_log= opt_log_slow_admin_statements; + thd->query_plan_flags|= QPLAN_ADMIN; res = mysql_check_table(thd, first_table, &lex->check_opt); select_lex->table_list.first= (uchar*) first_table; lex->query_tables=all_tables; @@ -2990,6 +3012,7 @@ end_with_restore_list: UINT_MAX, FALSE)) goto error; /* purecov: inspected */ thd->enable_slow_log= opt_log_slow_admin_statements; + thd->query_plan_flags|= QPLAN_ADMIN; res= mysql_analyze_table(thd, first_table, &lex->check_opt); /* ! we write after unlocking the table */ if (!res && !lex->no_write_to_binlog) @@ -3011,6 +3034,7 @@ end_with_restore_list: UINT_MAX, FALSE)) goto error; /* purecov: inspected */ thd->enable_slow_log= opt_log_slow_admin_statements; + thd->query_plan_flags|= QPLAN_ADMIN; res= (specialflag & (SPECIAL_SAFE_MODE | SPECIAL_NO_NEW_FUNC)) ? mysql_recreate_table(thd, first_table) : mysql_optimize_table(thd, first_table, &lex->check_opt); @@ -5687,6 +5711,8 @@ void mysql_reset_thd_for_next_command(THD *thd) thd->total_warn_count=0; // Warnings for this query thd->rand_used= 0; thd->sent_row_count= thd->examined_row_count= 0; + thd->query_plan_flags= QPLAN_INIT; + thd->query_plan_fsort_passes= 0; /* Because we come here only for start of top-statements, binlog format is diff --git a/sql/sql_select.cc b/sql/sql_select.cc index ba38bb78ceb..014b63c057c 100644 --- a/sql/sql_select.cc +++ b/sql/sql_select.cc @@ -61,7 +61,6 @@ static bool update_ref_and_keys(THD *thd, DYNAMIC_ARRAY *keyuse, table_map table_map, SELECT_LEX *select_lex, st_sargable_param **sargables); static int sort_keyuse(KEYUSE *a,KEYUSE *b); -static void set_position(JOIN *join,uint index,JOIN_TAB *table,KEYUSE *key); static bool create_ref_for_key(JOIN *join, JOIN_TAB *j, KEYUSE *org_keyuse, table_map used_tables); static bool choose_plan(JOIN 
*join,table_map join_tables); @@ -116,7 +115,7 @@ static COND *simplify_joins(JOIN *join, List<TABLE_LIST> *join_list, COND *conds, bool top); static bool check_interleaving_with_nj(JOIN_TAB *next); static void restore_prev_nj_state(JOIN_TAB *last); -static void reset_nj_counters(List<TABLE_LIST> *join_list); +static uint reset_nj_counters(JOIN *join, List<TABLE_LIST> *join_list); static uint build_bitmap_for_nested_joins(List<TABLE_LIST> *join_list, uint first_unused); @@ -1013,7 +1012,7 @@ JOIN::optimize() DBUG_RETURN(1); } - reset_nj_counters(join_list); + reset_nj_counters(this, join_list); make_outerjoin_info(this); /* @@ -2390,6 +2389,13 @@ mysql_select(THD *thd, Item ***rref_pointer_array, } else { + /* + When in EXPLAIN, delay deleting the joins so that they are still + available when we're producing EXPLAIN EXTENDED warning text. + */ + if (select_options & SELECT_DESCRIBE) + free_join= 0; + if (!(join= new JOIN(thd, fields, select_options, result))) DBUG_RETURN(TRUE); thd_proc_info(thd, "init"); @@ -2655,24 +2661,31 @@ make_join_statistics(JOIN *join, TABLE_LIST *tables_arg, COND *conds, ~outer_join, join->select_lex, &sargables)) goto error; - /* Read tables with 0 or 1 rows (system tables) */ join->const_table_map= 0; + join->const_tables= const_count; + eliminate_tables(join); + const_count= join->const_tables; + found_const_table_map= join->const_table_map; + /* Read tables with 0 or 1 rows (system tables) */ for (POSITION *p_pos=join->positions, *p_end=p_pos+const_count; p_pos < p_end ; p_pos++) { - int tmp; s= p_pos->table; - s->type=JT_SYSTEM; - join->const_table_map|=s->table->map; - if ((tmp=join_read_const_table(s, p_pos))) + if (! 
(s->table->map & join->eliminated_tables)) { - if (tmp > 0) - goto error; // Fatal error + int tmp; + s->type=JT_SYSTEM; + join->const_table_map|=s->table->map; + if ((tmp=join_read_const_table(s, p_pos))) + { + if (tmp > 0) + goto error; // Fatal error + } + else + found_const_table_map|= s->table->map; } - else - found_const_table_map|= s->table->map; } /* loop until no more const tables are found */ @@ -2697,7 +2710,8 @@ make_join_statistics(JOIN *join, TABLE_LIST *tables_arg, COND *conds, substitution of a const table the key value happens to be null then we can state that there are no matches for this equi-join. */ - if ((keyuse= s->keyuse) && *s->on_expr_ref && !s->embedding_map) + if ((keyuse= s->keyuse) && *s->on_expr_ref && !s->embedding_map && + !(table->map & join->eliminated_tables)) { /* When performing an outer join operation if there are no matching rows @@ -2974,14 +2988,45 @@ typedef struct key_field_t { This is called for OR between different levels. - To be able to do 'ref_or_null' we merge a comparison of a column - and 'column IS NULL' to one test. This is useful for sub select queries - that are internally transformed to something like:. + That is, the function operates on an array of KEY_FIELD elements which has + two parts: + + $LEFT_PART $RIGHT_PART + +-----------------------+-----------------------+ + start new_fields end + + $LEFT_PART and $RIGHT_PART are arrays that have KEY_FIELD elements for two + parts of the OR condition. Our task is to produce an array of KEY_FIELD + elements that would correspond to "$LEFT_PART OR $RIGHT_PART". + + The rules for combining elements are as follows: + + (keyfieldA1 AND keyfieldA2 AND ...) OR (keyfieldB1 AND keyfieldB2 AND ...)= + + = AND_ij (keyfieldA_i OR keyfieldB_j) + + We discard all (keyfieldA_i OR keyfieldB_j) that refer to different + fields. 
For those referring to the same field, the logic is as follows: + + t.keycol=expr1 OR t.keycol=expr2 -> (since expr1 and expr2 are different + we can't produce a single equality, + so produce nothing) + + t.keycol=expr1 OR t.keycol=expr1 -> t.keycol=expr1 + + t.keycol=expr1 OR t.keycol IS NULL -> t.keycol=expr1, and also set + KEY_OPTIMIZE_REF_OR_NULL flag + + The last one is for ref_or_null access. We have handling for this special + because it's needed for evaluating IN subqueries that are internally + transformed into @code - SELECT * FROM t1 WHERE t1.key=outer_ref_field or t1.key IS NULL + EXISTS(SELECT * FROM t1 WHERE t1.key=outer_ref_field or t1.key IS NULL) @endcode + See add_key_fields() for discussion of what is and_level. + KEY_FIELD::null_rejecting is processed as follows: @n result has null_rejecting=true if it is set for both ORed references. for example: @@ -3343,6 +3388,25 @@ is_local_field (Item *field) } +/* + In this and other functions, and_level is a number that is ever-growing + and is different for the contents of every AND or OR clause. For example, + when processing clause + + (a AND b AND c) OR (x AND y) + + we'll have + * KEY_FIELD elements for (a AND b AND c) are assigned and_level=1 + * KEY_FIELD elements for (x AND y) are assigned and_level=2 + * OR operation is performed, and whatever elements are left after it are + assigned and_level=3. + + The primary reason for having and_level attribute is the OR operation which + uses and_level to mark KEY_FIELDs that should get into the result of the OR + operation + +*/ + static void add_key_fields(JOIN *join, KEY_FIELD **key_fields, uint *and_level, COND *cond, table_map usable_tables, @@ -4028,8 +4092,7 @@ add_group_and_distinct_keys(JOIN *join, JOIN_TAB *join_tab) /** Save const tables first as used tables. 
*/ -static void -set_position(JOIN *join,uint idx,JOIN_TAB *table,KEYUSE *key) +void set_position(JOIN *join,uint idx,JOIN_TAB *table,KEYUSE *key) { join->positions[idx].table= table; join->positions[idx].key=key; @@ -4630,7 +4693,7 @@ choose_plan(JOIN *join, table_map join_tables) DBUG_ENTER("choose_plan"); join->cur_embedding_map= 0; - reset_nj_counters(join->join_list); + reset_nj_counters(join, join->join_list); /* if (SELECT_STRAIGHT_JOIN option is set) reorder tables so dependent tables come after tables they depend @@ -5795,6 +5858,7 @@ JOIN::make_simple_join(JOIN *parent, TABLE *tmp_table) tables= 1; const_tables= 0; const_table_map= 0; + eliminated_tables= 0; tmp_table_param.field_count= tmp_table_param.sum_func_count= tmp_table_param.func_count= 0; tmp_table_param.copy_field= tmp_table_param.copy_field_end=0; @@ -6059,7 +6123,7 @@ make_outerjoin_info(JOIN *join) } if (!tab->first_inner) tab->first_inner= nested_join->first_nested; - if (++nested_join->counter < nested_join->join_list.elements) + if (++nested_join->counter < nested_join->n_tables) break; /* Table tab is the last inner table for nested join. 
*/ nested_join->first_nested->last_inner= tab; @@ -6585,7 +6649,10 @@ make_join_readinfo(JOIN *join, ulonglong options) { join->thd->server_status|=SERVER_QUERY_NO_INDEX_USED; if (statistics) + { status_var_increment(join->thd->status_var.select_scan_count); + join->thd->query_plan_flags|= QPLAN_FULL_SCAN; + } } } else @@ -6599,7 +6666,10 @@ make_join_readinfo(JOIN *join, ulonglong options) { join->thd->server_status|=SERVER_QUERY_NO_INDEX_USED; if (statistics) + { status_var_increment(join->thd->status_var.select_full_join_count); + join->thd->query_plan_flags|= QPLAN_FULL_JOIN; + } } } if (!table->no_keyread) @@ -8614,6 +8684,8 @@ simplify_joins(JOIN *join, List<TABLE_LIST> *join_list, COND *conds, bool top) conds= simplify_joins(join, &nested_join->join_list, conds, top); used_tables= nested_join->used_tables; not_null_tables= nested_join->not_null_tables; + /* The following two might become unequal after table elimination: */ + nested_join->n_tables= nested_join->join_list.elements; } else { @@ -8772,7 +8844,7 @@ static uint build_bitmap_for_nested_joins(List<TABLE_LIST> *join_list, with anything) 2. we could run out bits in nested_join_map otherwise. */ - if (nested_join->join_list.elements != 1) + if (nested_join->n_tables != 1) { nested_join->nj_map= (nested_join_map) 1 << first_unused++; first_unused= build_bitmap_for_nested_joins(&nested_join->join_list, @@ -8794,21 +8866,26 @@ static uint build_bitmap_for_nested_joins(List<TABLE_LIST> *join_list, tables which will be ignored. 
*/ -static void reset_nj_counters(List<TABLE_LIST> *join_list) +static uint reset_nj_counters(JOIN *join, List<TABLE_LIST> *join_list) { List_iterator<TABLE_LIST> li(*join_list); TABLE_LIST *table; DBUG_ENTER("reset_nj_counters"); + uint n=0; while ((table= li++)) { NESTED_JOIN *nested_join; if ((nested_join= table->nested_join)) { nested_join->counter= 0; - reset_nj_counters(&nested_join->join_list); + //nested_join->n_tables= my_count_bits(nested_join->used_tables & + // ~join->eliminated_tables); + nested_join->n_tables= reset_nj_counters(join, &nested_join->join_list); } + if (table->table && (table->table->map & ~join->eliminated_tables)) + n++; } - DBUG_VOID_RETURN; + DBUG_RETURN(n); } @@ -8933,7 +9010,7 @@ static bool check_interleaving_with_nj(JOIN_TAB *next_tab) join->cur_embedding_map |= next_emb->nested_join->nj_map; } - if (next_emb->nested_join->join_list.elements != + if (next_emb->nested_join->n_tables != next_emb->nested_join->counter) break; @@ -8967,7 +9044,7 @@ static void restore_prev_nj_state(JOIN_TAB *last) { if (!(--last_emb->nested_join->counter)) join->cur_embedding_map&= ~last_emb->nested_join->nj_map; - else if (last_emb->nested_join->join_list.elements-1 == + else if (last_emb->nested_join->n_tables-1 == last_emb->nested_join->counter) join->cur_embedding_map|= last_emb->nested_join->nj_map; else @@ -9763,6 +9840,7 @@ create_tmp_table(THD *thd,TMP_TABLE_PARAM *param,List<Item> &fields, (ulong) rows_limit,test(group))); status_var_increment(thd->status_var.created_tmp_tables); + thd->query_plan_flags|= QPLAN_TMP_TABLE; if (use_temp_pool && !(test_flags & TEST_KEEP_TMP_TABLES)) temp_pool_slot = bitmap_lock_set_next(&temp_pool); @@ -10649,6 +10727,7 @@ static bool create_internal_tmp_table(TABLE *table,TMP_TABLE_PARAM *param, goto err; } status_var_increment(table->in_use->status_var.created_tmp_disk_tables); + table->in_use->query_plan_flags|= QPLAN_TMP_DISK; share->db_record_offset= 1; DBUG_RETURN(0); err: @@ -16304,6 +16383,14 @@ static 
void select_describe(JOIN *join, bool need_tmp_table, bool need_order, tmp3.length(0); quick_type= -1; + + /* Don't show eliminated tables */ + if (table->map & join->eliminated_tables) + { + used_tables|=table->map; + continue; + } + item_list.empty(); /* id */ item_list.push_back(new Item_uint((uint32) @@ -16626,8 +16713,11 @@ static void select_describe(JOIN *join, bool need_tmp_table, bool need_order, unit; unit= unit->next_unit()) { - if (mysql_explain_union(thd, unit, result)) - DBUG_VOID_RETURN; + if (!(unit->item && unit->item->eliminated)) + { + if (mysql_explain_union(thd, unit, result)) + DBUG_VOID_RETURN; + } } DBUG_VOID_RETURN; } @@ -16668,7 +16758,6 @@ bool mysql_explain_union(THD *thd, SELECT_LEX_UNIT *unit, select_result *result) unit->fake_select_lex->options|= SELECT_DESCRIBE; if (!(res= unit->prepare(thd, result, SELECT_NO_UNLOCK | SELECT_DESCRIBE))) res= unit->exec(); - res|= unit->cleanup(); } else { @@ -16701,6 +16790,7 @@ bool mysql_explain_union(THD *thd, SELECT_LEX_UNIT *unit, select_result *result) */ static void print_join(THD *thd, + table_map eliminated_tables, String *str, List<TABLE_LIST> *tables, enum_query_type query_type) @@ -16716,12 +16806,33 @@ static void print_join(THD *thd, *t= ti++; DBUG_ASSERT(tables->elements >= 1); - (*table)->print(thd, str, query_type); + /* + Assert that the first table in the list isn't eliminated. This comes from + the fact that the first table can't be an inner table of an outer join. + */ + DBUG_ASSERT(!eliminated_tables || + !(((*table)->table && ((*table)->table->map & eliminated_tables)) || + ((*table)->nested_join && !((*table)->nested_join->used_tables & + ~eliminated_tables)))); + (*table)->print(thd, eliminated_tables, str, query_type); TABLE_LIST **end= table + tables->elements; for (TABLE_LIST **tbl= table + 1; tbl < end; tbl++) { TABLE_LIST *curr= *tbl; + /* + The "eliminated_tables &&" check guards against the case of + printing the query for CREATE VIEW.
We do that without having run + JOIN::optimize() and so will have nested_join->used_tables==0. + */ + if (eliminated_tables && + ((curr->table && (curr->table->map & eliminated_tables)) || + (curr->nested_join && !(curr->nested_join->used_tables & + ~eliminated_tables)))) + { + continue; + } + if (curr->outer_join) { /* MySQL converts right to left joins */ @@ -16731,7 +16842,7 @@ static void print_join(THD *thd, str->append(STRING_WITH_LEN(" straight_join ")); else str->append(STRING_WITH_LEN(" join ")); - curr->print(thd, str, query_type); + curr->print(thd, eliminated_tables, str, query_type); if (curr->on_expr) { str->append(STRING_WITH_LEN(" on(")); @@ -16785,12 +16896,13 @@ Index_hint::print(THD *thd, String *str) @param str string where table should be printed */ -void TABLE_LIST::print(THD *thd, String *str, enum_query_type query_type) +void TABLE_LIST::print(THD *thd, table_map eliminated_tables, String *str, + enum_query_type query_type) { if (nested_join) { str->append('('); - print_join(thd, str, &nested_join->join_list, query_type); + print_join(thd, eliminated_tables, str, &nested_join->join_list, query_type); str->append(')'); } else @@ -16932,7 +17044,7 @@ void st_select_lex::print(THD *thd, String *str, enum_query_type query_type) { str->append(STRING_WITH_LEN(" from ")); /* go through join tree */ - print_join(thd, str, &top_join_list, query_type); + print_join(thd, join? join->eliminated_tables: 0, str, &top_join_list, query_type); } else if (where) { diff --git a/sql/sql_select.h b/sql/sql_select.h index 5e97185a7b9..271c88ebf66 100644 --- a/sql/sql_select.h +++ b/sql/sql_select.h @@ -285,7 +285,15 @@ public: fetching data from a cursor */ bool resume_nested_loop; - table_map const_table_map,found_const_table_map; + table_map const_table_map; + /* + Constant tables for which we have found a row (as opposed to those for + which we didn't). + */ + table_map found_const_table_map; + + /* Tables removed by table elimination. 
Set to 0 before the elimination. */ + table_map eliminated_tables; /* Bitmap of all inner tables from outer joins */ @@ -425,6 +433,7 @@ public: table= 0; tables= 0; const_tables= 0; + eliminated_tables= 0; join_list= 0; sort_and_group= 0; first_record= 0; @@ -530,6 +539,10 @@ public: return (unit == &thd->lex->unit && (unit->fake_select_lex == 0 || select_lex == unit->fake_select_lex)); } + inline table_map all_tables_map() + { + return (table_map(1) << tables) - 1; + } private: bool make_simple_join(JOIN *join, TABLE *tmp_table); }; @@ -730,9 +743,12 @@ bool error_if_full_join(JOIN *join); int report_error(TABLE *table, int error); int safe_index_read(JOIN_TAB *tab); COND *remove_eq_conds(THD *thd, COND *cond, Item::cond_result *cond_value); +void set_position(JOIN *join,uint idx,JOIN_TAB *table,KEYUSE *key); inline bool optimizer_flag(THD *thd, uint flag) { return (thd->variables.optimizer_switch & flag); } +void eliminate_tables(JOIN *join); + diff --git a/sql/sql_show.cc b/sql/sql_show.cc index c8936be1ea4..18e83498f7d 100644 --- a/sql/sql_show.cc +++ b/sql/sql_show.cc @@ -4570,8 +4570,7 @@ static bool store_trigger(THD *thd, TABLE *table, LEX_STRING *db_name, table->field[14]->store(STRING_WITH_LEN("OLD"), cs); table->field[15]->store(STRING_WITH_LEN("NEW"), cs); - sys_var_thd_sql_mode::symbolic_mode_representation(thd, sql_mode, - &sql_mode_rep); + sys_var::make_set(thd, sql_mode, &sql_mode_typelib, &sql_mode_rep); table->field[17]->store(sql_mode_rep.str, sql_mode_rep.length, cs); table->field[18]->store(definer_buffer->str, definer_buffer->length, cs); table->field[19]->store(client_cs_name->str, client_cs_name->length, cs); @@ -5164,8 +5163,7 @@ copy_event_to_schema_table(THD *thd, TABLE *sch_table, TABLE *event_table) /* SQL_MODE */ { LEX_STRING sql_mode; - sys_var_thd_sql_mode::symbolic_mode_representation(thd, et.sql_mode, - &sql_mode); + sys_var::make_set(thd, et.sql_mode, &sql_mode_typelib, &sql_mode); sch_table->field[ISE_SQL_MODE]-> 
store(sql_mode.str, sql_mode.length, scs); } @@ -6870,9 +6868,7 @@ static bool show_create_trigger_impl(THD *thd, &trg_connection_cl_name, &trg_db_cl_name); - sys_var_thd_sql_mode::symbolic_mode_representation(thd, - trg_sql_mode, - &trg_sql_mode_str); + sys_var::make_set(thd, trg_sql_mode, &sql_mode_typelib, &trg_sql_mode_str); /* Resolve trigger client character set. */ diff --git a/sql/strfunc.cc b/sql/strfunc.cc index 5ff2efe2020..0153381f85b 100644 --- a/sql/strfunc.cc +++ b/sql/strfunc.cc @@ -45,6 +45,7 @@ ulonglong find_set(TYPELIB *lib, const char *str, uint length, CHARSET_INFO *strip= cs ? cs : &my_charset_latin1; const char *end= str + strip->cset->lengthsp(strip, str, length); ulonglong found= 0; + *err_pos= 0; // No error yet if (str != end) { @@ -74,9 +75,13 @@ ulonglong find_set(TYPELIB *lib, const char *str, uint length, find_type(lib, start, var_len, (bool) 0); if (!find) { - *err_pos= (char*) start; - *err_len= var_len; - *set_warning= 1; + /* Report first error */ + if (!*err_pos) + { + *err_pos= (char*) start; + *err_len= var_len; + *set_warning= 1; + } } else found|= ((longlong) 1 << (find - 1)); @@ -148,8 +153,10 @@ static uint parse_name(TYPELIB *lib, const char **strpos, const char *end, } } else - for (; pos != end && *pos != '=' && *pos !=',' ; pos++); - + { + for (; pos != end && *pos != '=' && *pos !=',' ; pos++) + ; + } uint var_len= (uint) (pos - start); /* Determine which flag it is */ uint find= cs ? 
find_type2(lib, start, var_len, cs) : diff --git a/sql/table.h b/sql/table.h index dad7762d63f..a7ae50b8e72 100644 --- a/sql/table.h +++ b/sql/table.h @@ -1374,7 +1374,8 @@ struct TABLE_LIST return (derived || view || schema_table || (create && !table->db_stat) || !table); } - void print(THD *thd, String *str, enum_query_type query_type); + void print(THD *thd, table_map eliminated_tables, String *str, + enum_query_type query_type); bool check_single_table(TABLE_LIST **table, table_map map, TABLE_LIST *view); bool set_insert_values(MEM_ROOT *mem_root); @@ -1623,7 +1624,11 @@ public: typedef struct st_nested_join { List<TABLE_LIST> join_list; /* list of elements in the nested join */ - table_map used_tables; /* bitmap of tables in the nested join */ + /* + Bitmap of tables within this nested join (including those embedded within + its children), including tables removed by table elimination. + */ + table_map used_tables; table_map not_null_tables; /* tables that rejects nulls */ struct st_join_table *first_nested;/* the first nested table in the plan */ /* @@ -1634,6 +1639,11 @@ typedef struct st_nested_join Before each use the counters are zeroed by reset_nj_counters. */ uint counter; + /* + Number of elements in join_list that were not (or contain table(s) that + weren't) removed by table elimination. 
+ */ + uint n_tables; nested_join_map nj_map; /* Bit used to identify this nested join*/ } NESTED_JOIN; diff --git a/storage/maria/CMakeLists.txt b/storage/maria/CMakeLists.txt index 16707c81ecb..c0c3a44c7cf 100644 --- a/storage/maria/CMakeLists.txt +++ b/storage/maria/CMakeLists.txt @@ -51,6 +51,8 @@ SET(MARIA_SOURCES ma_init.c ma_open.c ma_extra.c ma_info.c ma_rkey.c MYSQL_STORAGE_ENGINE(MARIA) IF(NOT SOURCE_SUBLIBS) + ADD_DEPENDENCIES(maria GenError) + ADD_EXECUTABLE(maria_ftdump maria_ftdump.c) TARGET_LINK_LIBRARIES(maria_ftdump maria myisam mysys dbug strings zlib wsock32) diff --git a/storage/maria/ma_rt_index.c b/storage/maria/ma_rt_index.c index caf4a65eab2..46334ad0a97 100644 --- a/storage/maria/ma_rt_index.c +++ b/storage/maria/ma_rt_index.c @@ -1133,6 +1133,8 @@ my_bool maria_rtree_real_delete(MARIA_HA *info, MARIA_KEY *key, { uint nod_flag; ulong i; + uchar *page_buf; + MARIA_PAGE page; MARIA_KEY tmp_key; uchar *page_buf; MARIA_PAGE page; diff --git a/storage/myisam/mi_extra.c b/storage/myisam/mi_extra.c index d798ef50d7e..239fdb3fbc4 100644 --- a/storage/myisam/mi_extra.c +++ b/storage/myisam/mi_extra.c @@ -260,9 +260,8 @@ int mi_extra(MI_INFO *info, enum ha_extra_function function, void *extra_arg) case HA_EXTRA_PREPARE_FOR_DROP: pthread_mutex_lock(&THR_LOCK_myisam); share->last_version= 0L; /* Impossible version */ -#ifdef __WIN__REMOVE_OBSOLETE_WORKAROUND - /* Close the isam and data files as Win32 can't drop an open table */ pthread_mutex_lock(&share->intern_lock); + /* Flush pages that we don't need anymore */ if (flush_key_blocks(share->key_cache, share->kfile, (function == HA_EXTRA_PREPARE_FOR_DROP ? 
FLUSH_IGNORE_CHANGED : FLUSH_RELEASE))) @@ -272,6 +271,8 @@ int mi_extra(MI_INFO *info, enum ha_extra_function function, void *extra_arg) mi_print_error(info->s, HA_ERR_CRASHED); mi_mark_crashed(info); /* Fatal error found */ } +#ifdef __WIN__REMOVE_OBSOLETE_WORKAROUND + /* Close the isam and data files as Win32 can't drop an open table */ if (info->opt_flag & (READ_CACHE_USED | WRITE_CACHE_USED)) { info->opt_flag&= ~(READ_CACHE_USED | WRITE_CACHE_USED); @@ -304,8 +305,8 @@ int mi_extra(MI_INFO *info, enum ha_extra_function function, void *extra_arg) } } share->kfile= -1; /* Files aren't open anymore */ - pthread_mutex_unlock(&share->intern_lock); #endif + pthread_mutex_unlock(&share->intern_lock); pthread_mutex_unlock(&THR_LOCK_myisam); break; case HA_EXTRA_FLUSH: diff --git a/storage/myisam/mi_locking.c b/storage/myisam/mi_locking.c index f3d9934ed8c..ebee8826c3b 100644 --- a/storage/myisam/mi_locking.c +++ b/storage/myisam/mi_locking.c @@ -582,7 +582,7 @@ int _mi_decrement_open_count(MI_INFO *info) { uint old_lock=info->lock_type; share->global_changed=0; - lock_error=mi_lock_database(info,F_WRLCK); + lock_error= my_disable_locking ? 0 : mi_lock_database(info,F_WRLCK); /* Its not fatal even if we couldn't get the lock ! */ if (share->state.open_count > 0) { @@ -592,7 +592,7 @@ int _mi_decrement_open_count(MI_INFO *info) sizeof(share->state.header), MYF(MY_NABP)); } - if (!lock_error) + if (!lock_error && !my_disable_locking) lock_error=mi_lock_database(info,old_lock); } return test(lock_error || write_error); diff --git a/storage/pbxt/ChangeLog b/storage/pbxt/ChangeLog index 958bcd81cd1..dabfea6c168 100644 --- a/storage/pbxt/ChangeLog +++ b/storage/pbxt/ChangeLog @@ -1,7 +1,79 @@ PBXT Release Notes ================== -------- 1.0.08 RC - Not yet released +------- 1.0.08d RC2 - 2009-09-02 + +RN267: Fixed a bug that caused MySQL to crash on shutdown, after an incorrect command line parameter was given. 
The crash occurred because the background recovery task was not cleaned up before the PBXT engine was de-initialized. + +------- 1.0.08c RC2 - 2009-08-18 + +RN266: Updated BLOB streaming glue, used with the PBMS engine. The glue code is now identical to the "1.0.08-rc-pbms" version of PBXT available from http://blobstreaming.org/download. + +RN265: Changed the sequential reading of data log files to skip gaps, instead of returning EOF. This ensures that extended data records are preserved even when something goes wrong with the way the file is written. + +RN264: Fixed a bug that caused a "Data log not found" error after an out of disk space error on a log file. This bug is similar to RN262 in that it allows "gaps" to appear in the data logs. + +RN263: Updated xtstat to compile on Windows/MS Visual C++. + +RN262: Merged changes for PBMS version 0.5.09. + +RN261: Concerning bug #377788: Cannot find index for FK. Fixed buffer overflow which occurred when the error was reported. + +RN260: Fixed bug #377788: Cannot find index for FK. PBXT now correctly uses a prefix of an index to support FK references (e.g. if key = (c1, c2) then an index on (c1, c2, c3) will work). Also fixed buffer overflow, which occurred when reporting the error. + +RN259: Fixed bug #309424: xtstat doesn't use my.cnf. You can now add an [xtstat] section to my.cnf, for use with xtstat. + +RN258: Updated xt_p_join implementation for Windows to check if a thread has already exited or has not yet started. + +RN257: Removed false assertion that could fail during restore if a transaction log page was zero-filled. + +RN256: Update datalog eof pointer only if write operations were successful. + +RN255: Added re-allocation of the filemap if allocating the new map failed. This often happens if there's not enough space on disk. + +RN254: When a table with a corrupted index is detected, PBXT creates a file called 'repair-pending' in the pbxt directory, with the name of the table in it.
Each table in the file is listed on a line by itself (the last line has no trailing \n). When the table is repaired (using the REPAIR TABLE command), this entry is removed from the file. + +RN253: Use fcntl(F_FULLFSYNC) instead of fsync on platforms that support it. Improper fsync operation was presumably the cause of index corruption on Mac OS X. + +RN252: Fixed bug #368692: PBXT not reporting data size correctly in information_schema. + +------- 1.0.08 RC2 - 2009-06-30 + +RN251: A Windows-specific test update; also removed a false assertion that failed on Windows. + +RN250: Fixed a bug that caused recovery to fail when the transaction log ID exceeded 255. The problem was a checksum failure in the log record. + +RN249: Fixed bug #313176: Test case timeout. This happened because record cache pages were not properly freed, and as soon as the cache filled up, performance degraded. + +RN248: PBXT now compiles and runs with MySQL 5.1.35. All tests pass. + +RN247: Fixed bug #369086: Incosistent/Incorrect Truncate behavior + +RN246: Fixed bug #378222: Drop sakila causes error: Cannot delete or update a parent row: a foreign key constraint fails + +RN245: Fixed bug #379315: Inconsistent behavior of DELETE IGNORE and FK constraint. + +RN244: Fixed a recovery problem: during the recovery of a "record modified" action, the table was updated before the old index entries were removed; xres_remove_index_entries was then supplied the new record, which led to an incorrect index update. + +RN243: Fixed a bug that caused a recovery failure if partitioned pbxt tables were present. This happened because the recovery used a MySQL function to open tables and the PBXT handler was not yet registered. + +RN242: Fixed a bug that caused a deadlock if pbxt initialization failed. This happened because pbxt cleanup was done from pbxt_init() with PLUGIN_lock being held by MySQL, which led to a deadlock in the freeer thread. + +RN241: Fixed a heap corruption bug (writing to a freed memory location).
It happened only when memory mapped files were used, leading to heap inconsistency and a program crash or termination by the heap checker. Likely to happen right after or during DROP TABLE but possible in other cases too. + +RN240: Load the record cache on read when not using memory mapped files. + +RN239: Added PBXT variable pbxt_max_threads. This is the maximum number of threads that can be created by PBXT. By default this value is set to 0, which means the number of threads is derived from the MySQL variable max_connections. The value used is max_connections+7. Under Drizzle the default value is 500. + +RN238: Added an option to wait for the sweeper to clean up old transactions on a particular connection. This prevents the sweeper from getting too far behind. + +RN237: Added an option to lazy delete fixed length index entries. This means the index entries are just marked for deletion, instead of removing the items from the index page. This has the advantage that an exclusive lock is not always required for deletion. + +RN236: Fixed bug #349177: a bug in configure.in script. + +RN235: Fixed bug #349176: a compiler warning. + +RN234: Completed Drizzle integration. All Drizzle tests now run with PBXT. RN233: Fixed bugs which occur when PBXT is used together with PBMS (BLOB Streaming engine).
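RN237 in the release notes above describes lazy deletion of fixed-length index entries: an entry is only marked as deleted, and the page layout is left alone. The following is a minimal sketch of that idea, assuming a simplified in-memory page. `Entry`, `Page`, `lazy_delete`, `contains` and `compact` are hypothetical names chosen for illustration; they are not PBXT's structures or API, and locking is left out entirely.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one fixed-length index entry.
struct Entry {
    std::uint32_t key;
    bool deleted;  // set by lazy_delete(), removed only by compact()
};

// Hypothetical stand-in for an index page (not PBXT's page format).
struct Page {
    std::vector<Entry> entries;
    std::size_t del_count = 0;  // number of entries currently marked deleted
};

// Mark the first live entry with this key as deleted. The number and
// position of entries on the page do not change.
bool lazy_delete(Page &page, std::uint32_t key) {
    for (Entry &e : page.entries) {
        if (e.key == key && !e.deleted) {
            e.deleted = true;
            ++page.del_count;
            return true;
        }
    }
    return false;
}

// Lookups have to skip entries that are only marked as deleted.
bool contains(const Page &page, std::uint32_t key) {
    for (const Entry &e : page.entries)
        if (e.key == key && !e.deleted)
            return true;
    return false;
}

// Physically remove the marked entries. This step does change the layout.
void compact(Page &page) {
    page.entries.erase(
        std::remove_if(page.entries.begin(), page.entries.end(),
                       [](const Entry &e) { return e.deleted; }),
        page.entries.end());
    page.del_count = 0;
}
```

In this sketch `lazy_delete` only flips a flag and bumps a counter, which is the property the release note points at when it says an exclusive lock is not always required. When and how a real engine compacts a page is a separate design decision that this sketch does not try to model.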
diff --git a/storage/pbxt/src/Makefile.am b/storage/pbxt/src/Makefile.am index 2272fe81464..e4abf5df492 100644 --- a/storage/pbxt/src/Makefile.am +++ b/storage/pbxt/src/Makefile.am @@ -19,7 +19,7 @@ noinst_HEADERS = bsearch_xt.h cache_xt.h ccutils_xt.h database_xt.h \ datadic_xt.h datalog_xt.h filesys_xt.h hashtab_xt.h \ ha_pbxt.h heap_xt.h index_xt.h linklist_xt.h \ memory_xt.h myxt_xt.h pthread_xt.h restart_xt.h \ - streaming_xt.h sortedlist_xt.h strutil_xt.h \ + pbms_enabled.h sortedlist_xt.h strutil_xt.h \ tabcache_xt.h table_xt.h trace_xt.h thread_xt.h \ util_xt.h xaction_xt.h xactlog_xt.h lock_xt.h \ systab_xt.h ha_xtsys.h discover_xt.h \ @@ -30,7 +30,7 @@ libpbxt_la_SOURCES = bsearch_xt.cc cache_xt.cc ccutils_xt.cc database_xt.cc \ datadic_xt.cc datalog_xt.cc filesys_xt.cc hashtab_xt.cc \ ha_pbxt.cc heap_xt.cc index_xt.cc linklist_xt.cc \ memory_xt.cc myxt_xt.cc pthread_xt.cc restart_xt.cc \ - streaming_xt.cc sortedlist_xt.cc strutil_xt.cc \ + sortedlist_xt.cc strutil_xt.cc \ tabcache_xt.cc table_xt.cc trace_xt.cc thread_xt.cc \ systab_xt.cc ha_xtsys.cc discover_xt.cc \ util_xt.cc xaction_xt.cc xactlog_xt.cc lock_xt.cc locklist_xt.cc @@ -49,4 +49,4 @@ libpbxt_a_SOURCES = $(libpbxt_la_SOURCES) libpbxt_a_CXXFLAGS = $(AM_CXXFLAGS) libpbxt_a_CFLAGS = $(AM_CFLAGS) -std=c99 -EXTRA_DIST = CMakeLists.txt +EXTRA_DIST = CMakeLists.txt pbms_enabled.cc diff --git a/storage/pbxt/src/cache_xt.cc b/storage/pbxt/src/cache_xt.cc index 0e15475f185..89b62c02282 100644 --- a/storage/pbxt/src/cache_xt.cc +++ b/storage/pbxt/src/cache_xt.cc @@ -23,6 +23,10 @@ #include "xt_config.h" +#ifdef DRIZZLED +#include <bitset> +#endif + #ifndef XT_WIN #include <unistd.h> #endif @@ -51,17 +55,22 @@ #define IDX_CAC_SEGMENT_COUNT ((off_t) 1 << XT_INDEX_CACHE_SEGMENT_SHIFTS) #define IDX_CAC_SEGMENT_MASK (IDX_CAC_SEGMENT_COUNT - 1) -//#define IDX_USE_SPINRWLOCK -#define IDX_USE_RWMUTEX +#ifdef XT_NO_ATOMICS +#define IDX_CAC_USE_PTHREAD_RW +#else +//#define IDX_CAC_USE_RWMUTEX //#define 
IDX_CAC_USE_PTHREAD_RW +//#define IDX_USE_SPINXSLOCK +#define IDX_CAC_USE_XSMUTEX +#endif -#ifdef IDX_CAC_USE_FASTWRLOCK -#define IDX_CAC_LOCK_TYPE XTFastRWLockRec -#define IDX_CAC_INIT_LOCK(s, i) xt_fastrwlock_init(s, &(i)->cs_lock) -#define IDX_CAC_FREE_LOCK(s, i) xt_fastrwlock_free(s, &(i)->cs_lock) -#define IDX_CAC_READ_LOCK(i, o) xt_fastrwlock_slock(&(i)->cs_lock, (o)) -#define IDX_CAC_WRITE_LOCK(i, o) xt_fastrwlock_xlock(&(i)->cs_lock, (o)) -#define IDX_CAC_UNLOCK(i, o) xt_fastrwlock_unlock(&(i)->cs_lock, (o)) +#ifdef IDX_CAC_USE_XSMUTEX +#define IDX_CAC_LOCK_TYPE XTXSMutexRec +#define IDX_CAC_INIT_LOCK(s, i) xt_xsmutex_init_with_autoname(s, &(i)->cs_lock) +#define IDX_CAC_FREE_LOCK(s, i) xt_xsmutex_free(s, &(i)->cs_lock) +#define IDX_CAC_READ_LOCK(i, o) xt_xsmutex_slock(&(i)->cs_lock, (o)->t_id) +#define IDX_CAC_WRITE_LOCK(i, o) xt_xsmutex_xlock(&(i)->cs_lock, (o)->t_id) +#define IDX_CAC_UNLOCK(i, o) xt_xsmutex_unlock(&(i)->cs_lock, (o)->t_id) #elif defined(IDX_CAC_USE_PTHREAD_RW) #define IDX_CAC_LOCK_TYPE xt_rwlock_type #define IDX_CAC_INIT_LOCK(s, i) xt_init_rwlock(s, &(i)->cs_lock) @@ -69,13 +78,20 @@ #define IDX_CAC_READ_LOCK(i, o) xt_slock_rwlock_ns(&(i)->cs_lock) #define IDX_CAC_WRITE_LOCK(i, o) xt_xlock_rwlock_ns(&(i)->cs_lock) #define IDX_CAC_UNLOCK(i, o) xt_unlock_rwlock_ns(&(i)->cs_lock) -#elif defined(IDX_USE_RWMUTEX) +#elif defined(IDX_CAC_USE_RWMUTEX) #define IDX_CAC_LOCK_TYPE XTRWMutexRec #define IDX_CAC_INIT_LOCK(s, i) xt_rwmutex_init_with_autoname(s, &(i)->cs_lock) #define IDX_CAC_FREE_LOCK(s, i) xt_rwmutex_free(s, &(i)->cs_lock) #define IDX_CAC_READ_LOCK(i, o) xt_rwmutex_slock(&(i)->cs_lock, (o)->t_id) #define IDX_CAC_WRITE_LOCK(i, o) xt_rwmutex_xlock(&(i)->cs_lock, (o)->t_id) #define IDX_CAC_UNLOCK(i, o) xt_rwmutex_unlock(&(i)->cs_lock, (o)->t_id) +#elif defined(IDX_CAC_USE_SPINXSLOCK) +#define IDX_CAC_LOCK_TYPE XTSpinXSLockRec +#define IDX_CAC_INIT_LOCK(s, i) xt_spinxslock_init_with_autoname(s, &(i)->cs_lock) +#define IDX_CAC_FREE_LOCK(s, 
i) xt_spinxslock_free(s, &(i)->cs_lock) +#define IDX_CAC_READ_LOCK(i, s) xt_spinxslock_slock(&(i)->cs_lock, (s)->t_id) +#define IDX_CAC_WRITE_LOCK(i, s) xt_spinxslock_xlock(&(i)->cs_lock, (s)->t_id) +#define IDX_CAC_UNLOCK(i, s) xt_spinxslock_unlock(&(i)->cs_lock, (s)->t_id) #endif #define ID_HANDLE_USE_SPINLOCK @@ -308,7 +324,8 @@ xtPublic XTIndHandlePtr xt_ind_get_handle(XTOpenTablePtr ot, XTIndexPtr ind, XTI hs = &ind_cac_globals.cg_handle_slot[iref->ir_block->cb_address % XT_HANDLE_SLOTS]; - ASSERT_NS(iref->ir_ulock == XT_UNLOCK_READ); + ASSERT_NS(iref->ir_xlock == FALSE); + ASSERT_NS(iref->ir_updated == FALSE); ID_HANDLE_LOCK(&hs->hs_handles_lock); #ifdef CHECK_HANDLE_STRUCTS ic_check_handle_structs(); @@ -337,10 +354,10 @@ xtPublic XTIndHandlePtr xt_ind_get_handle(XTOpenTablePtr ot, XTIndexPtr ind, XTI * at least an Slock on the index. * So this excludes anyone who is reading * cb_handle_count in the index. - * (all cache block writers, and a freeer). + * (all cache block writers, and the freeer). * * The increment is safe because I have the list - * lock, which is required by anyone else + * lock (hs_handles_lock), which is required by anyone else * who increments or decrements this value. */ iref->ir_block->cb_handle_count++; @@ -396,8 +413,11 @@ xtPublic void xt_ind_release_handle(XTIndHandlePtr handle, xtBool have_lock, XTT xblock = seg->cs_hash_table[hash_idx]; while (xblock) { if (block == xblock) { - /* Found the block... */ - xt_atomicrwlock_xlock(&block->cb_lock, thread->t_id); + /* Found the block... + * {HANDLE-COUNT-SLOCK} + * 04.05.2009, changed to slock. + */ + XT_IPAGE_READ_LOCK(&block->cb_lock); goto block_found; } xblock = xblock->cb_next; @@ -431,7 +451,18 @@ xtPublic void xt_ind_release_handle(XTIndHandlePtr handle, xtBool have_lock, XTT /* {HANDLE-COUNT-USAGE} * This is safe here because I have excluded * all readers by taking an Xlock on the - * cache block. + * cache block (CHANGED - see below). 
+ * + * {HANDLE-COUNT-SLOCK} + * 04.05.2009, changed to slock. + * Should be OK, because: + * I have the list lock (hs_handles_lock), + * which prevents concurrent updates to cb_handle_count. + * + * I also have a read lock on the cache block + * but not a lock on the index. As a result, we cannot + * exclude all index writers (and readers of + * cb_handle_count). */ block->cb_handle_count--; } @@ -466,7 +497,7 @@ xtPublic void xt_ind_release_handle(XTIndHandlePtr handle, xtBool have_lock, XTT ID_HANDLE_UNLOCK(&hs->hs_handles_lock); if (block) - xt_atomicrwlock_unlock(&block->cb_lock, TRUE); + XT_IPAGE_UNLOCK(&block->cb_lock, FALSE); } /* Call this function before a referenced cache block is modified! @@ -482,17 +513,28 @@ xtPublic xtBool xt_ind_copy_on_write(XTIndReferencePtr iref) hs = &ind_cac_globals.cg_handle_slot[iref->ir_block->cb_address % XT_HANDLE_SLOTS]; + ID_HANDLE_LOCK(&hs->hs_handles_lock); + /* {HANDLE-COUNT-USAGE} * This is only called by updaters of this index block, or * the freeer which holds an Xlock on the index block. - * * These are all mutually exclusive for the index block. + * + * {HANDLE-COUNT-SLOCK} + * Do this check again, after we have the list lock (hs_handles_lock). + * There is a small chance that the count has changed since we last + * checked, because xt_ind_release_handle() only holds + * an slock on the index page. + * + * An updater can sometimes have an XLOCK on the index and an slock + * on the cache block. In this case xt_ind_release_handle() + * could have run through.
*/ - ASSERT_NS(iref->ir_block->cb_handle_count); - if (!iref->ir_block->cb_handle_count) + if (!iref->ir_block->cb_handle_count) { + ID_HANDLE_UNLOCK(&hs->hs_handles_lock); return OK; + } - ID_HANDLE_LOCK(&hs->hs_handles_lock); #ifdef CHECK_HANDLE_STRUCTS ic_check_handle_structs(); #endif @@ -609,7 +651,7 @@ xtPublic void xt_ind_init(XTThreadPtr self, size_t cache_size) #endif for (u_int i=0; i<ind_cac_globals.cg_block_count; i++) { - xt_atomicrwlock_init_with_autoname(self, &block->cb_lock); + XT_IPAGE_INIT_LOCK(self, &block->cb_lock); block->cb_state = IDX_CAC_BLOCK_FREE; block->cb_next = ind_cac_globals.cg_free_list; #ifdef XT_USE_DIRECT_IO_ON_INDEX @@ -836,10 +878,10 @@ static xtBool ind_free_block(XTOpenTablePtr ot, XTIndBlockPtr block) while (xblock) { if (block == xblock) { /* Found the block... */ - xt_atomicrwlock_xlock(&block->cb_lock, ot->ot_thread->t_id); + XT_IPAGE_WRITE_LOCK(&block->cb_lock, ot->ot_thread->t_id); if (block->cb_state != IDX_CAC_BLOCK_CLEAN) { /* This block cannot be freeed: */ - xt_atomicrwlock_unlock(&block->cb_lock, TRUE); + XT_IPAGE_UNLOCK(&block->cb_lock, TRUE); IDX_CAC_UNLOCK(seg, ot->ot_thread); #ifdef DEBUG_CHECK_IND_CACHE xt_ind_check_cache(NULL); @@ -878,11 +920,12 @@ static xtBool ind_free_block(XTOpenTablePtr ot, XTIndBlockPtr block) if (block->cb_handle_count) { XTIndReferenceRec iref; - iref.ir_ulock = XT_UNLOCK_WRITE; + iref.ir_xlock = TRUE; + iref.ir_updated = FALSE; iref.ir_block = block; iref.ir_branch = (XTIdxBranchDPtr) block->cb_data; if (!xt_ind_copy_on_write(&iref)) { - xt_atomicrwlock_unlock(&block->cb_lock, TRUE); + XT_IPAGE_UNLOCK(&block->cb_lock, TRUE); return FALSE; } } @@ -918,7 +961,7 @@ static xtBool ind_free_block(XTOpenTablePtr ot, XTIndBlockPtr block) IDX_TRACE("%d- f%x\n", (int) XT_NODE_ID(address), (int) XT_GET_DISK_2(block->cb_data)); /* Unlock BEFORE the block is reused! 
*/ - xt_atomicrwlock_unlock(&block->cb_lock, TRUE); + XT_IPAGE_UNLOCK(&block->cb_lock, TRUE); xt_unlock_mutex_ns(&ind_cac_globals.cg_lock); @@ -1001,7 +1044,7 @@ static u_int ind_cac_free_lru_blocks(XTOpenTablePtr ot, u_int blocks_required, X * Fetch the block. Note, if we are about to write the block * then there is no need to read it from disk! */ -static XTIndBlockPtr ind_cac_fetch(XTOpenTablePtr ot, xtIndexNodeID address, DcSegmentPtr *ret_seg, xtBool read_data) +static XTIndBlockPtr ind_cac_fetch(XTOpenTablePtr ot, XTIndexPtr ind, xtIndexNodeID address, DcSegmentPtr *ret_seg, xtBool read_data) { register XTOpenFilePtr file = ot->ot_ind_file; register XTIndBlockPtr block, new_block; @@ -1110,6 +1153,7 @@ static XTIndBlockPtr ind_cac_fetch(XTOpenTablePtr ot, xtIndexNodeID address, DcS new_block->cb_state = IDX_CAC_BLOCK_CLEAN; new_block->cb_handle_count = 0; new_block->cp_flush_seq = 0; + new_block->cp_del_count = 0; new_block->cb_dirty_next = NULL; new_block->cb_dirty_prev = NULL; @@ -1172,6 +1216,13 @@ static XTIndBlockPtr ind_cac_fetch(XTOpenTablePtr ot, xtIndexNodeID address, DcS #endif xt_unlock_mutex_ns(&dcg->cg_lock); + /* {LAZY-DEL-INDEX-ITEMS} + * Conditionally count the number of deleted entries in the index: + * We do this before other threads can read the block. 
+ */ + if (ind->mi_lazy_delete && read_data) + xt_ind_count_deleted_items(ot->ot_table, ind, block); + /* Add to the hash table: */ block->cb_next = seg->cs_hash_table[hash_idx]; seg->cs_hash_table[hash_idx] = block; @@ -1221,10 +1272,10 @@ xtPublic xtBool xt_ind_write(XTOpenTablePtr ot, XTIndexPtr ind, xtIndexNodeID ad XTIndBlockPtr block; DcSegmentPtr seg; - if (!(block = ind_cac_fetch(ot, address, &seg, FALSE))) + if (!(block = ind_cac_fetch(ot, ind, address, &seg, FALSE))) return FAILED; - xt_atomicrwlock_xlock(&block->cb_lock, ot->ot_thread->t_id); + XT_IPAGE_WRITE_LOCK(&block->cb_lock, ot->ot_thread->t_id); ASSERT_NS(block->cb_state == IDX_CAC_BLOCK_CLEAN || block->cb_state == IDX_CAC_BLOCK_DIRTY); memcpy(block->cb_data, data, size); block->cp_flush_seq = ot->ot_table->tab_ind_flush_seq; @@ -1239,7 +1290,7 @@ xtPublic xtBool xt_ind_write(XTOpenTablePtr ot, XTIndexPtr ind, xtIndexNodeID ad xt_spinlock_unlock(&ind->mi_dirty_lock); block->cb_state = IDX_CAC_BLOCK_DIRTY; } - xt_atomicrwlock_unlock(&block->cb_lock, TRUE); + XT_IPAGE_UNLOCK(&block->cb_lock, TRUE); IDX_CAC_UNLOCK(seg, ot->ot_thread); #ifdef XT_TRACK_INDEX_UPDATES ot->ot_ind_changed++; @@ -1259,10 +1310,10 @@ xtPublic xtBool xt_ind_write_cache(XTOpenTablePtr ot, xtIndexNodeID address, siz return FAILED; if (block) { - xt_atomicrwlock_xlock(&block->cb_lock, ot->ot_thread->t_id); + XT_IPAGE_WRITE_LOCK(&block->cb_lock, ot->ot_thread->t_id); ASSERT_NS(block->cb_state == IDX_CAC_BLOCK_CLEAN || block->cb_state == IDX_CAC_BLOCK_DIRTY); memcpy(block->cb_data, data, size); - xt_atomicrwlock_unlock(&block->cb_lock, TRUE); + XT_IPAGE_UNLOCK(&block->cb_lock, TRUE); IDX_CAC_UNLOCK(seg, ot->ot_thread); } @@ -1277,7 +1328,7 @@ xtPublic xtBool xt_ind_clean(XTOpenTablePtr ot, XTIndexPtr ind, xtIndexNodeID ad if (!ind_cac_get(ot, address, &seg, &block)) return FAILED; if (block) { - xt_atomicrwlock_xlock(&block->cb_lock, ot->ot_thread->t_id); + XT_IPAGE_WRITE_LOCK(&block->cb_lock, ot->ot_thread->t_id); 
ASSERT_NS(block->cb_state == IDX_CAC_BLOCK_CLEAN || block->cb_state == IDX_CAC_BLOCK_DIRTY); if (block->cb_state == IDX_CAC_BLOCK_DIRTY) { @@ -1293,7 +1344,7 @@ xtPublic xtBool xt_ind_clean(XTOpenTablePtr ot, XTIndexPtr ind, xtIndexNodeID ad xt_spinlock_unlock(&ind->mi_dirty_lock); block->cb_state = IDX_CAC_BLOCK_CLEAN; } - xt_atomicrwlock_unlock(&block->cb_lock, TRUE); + XT_IPAGE_UNLOCK(&block->cb_lock, TRUE); IDX_CAC_UNLOCK(seg, ot->ot_thread); } @@ -1301,29 +1352,33 @@ xtPublic xtBool xt_ind_clean(XTOpenTablePtr ot, XTIndexPtr ind, xtIndexNodeID ad return OK; } -xtPublic xtBool xt_ind_read_bytes(XTOpenTablePtr ot, xtIndexNodeID address, size_t size, xtWord1 *data) +xtPublic xtBool xt_ind_read_bytes(XTOpenTablePtr ot, XTIndexPtr ind, xtIndexNodeID address, size_t size, xtWord1 *data) { XTIndBlockPtr block; DcSegmentPtr seg; - if (!(block = ind_cac_fetch(ot, address, &seg, TRUE))) + if (!(block = ind_cac_fetch(ot, ind, address, &seg, TRUE))) return FAILED; - xt_atomicrwlock_slock(&block->cb_lock); + XT_IPAGE_READ_LOCK(&block->cb_lock); memcpy(data, block->cb_data, size); - xt_atomicrwlock_unlock(&block->cb_lock, FALSE); + XT_IPAGE_UNLOCK(&block->cb_lock, FALSE); IDX_CAC_UNLOCK(seg, ot->ot_thread); return OK; } -xtPublic xtBool xt_ind_fetch(XTOpenTablePtr ot, xtIndexNodeID address, XTPageLockType ltype, XTIndReferencePtr iref) +xtPublic xtBool xt_ind_fetch(XTOpenTablePtr ot, XTIndexPtr ind, xtIndexNodeID address, XTPageLockType ltype, XTIndReferencePtr iref) { register XTIndBlockPtr block; DcSegmentPtr seg; xtWord2 branch_size; + xtBool xlock = FALSE; - ASSERT_NS(iref->ir_ulock == XT_UNLOCK_NONE); - if (!(block = ind_cac_fetch(ot, address, &seg, TRUE))) +#ifdef DEBUG + ASSERT_NS(iref->ir_xlock == 2); + ASSERT_NS(iref->ir_xlock == 2); +#endif + if (!(block = ind_cac_fetch(ot, ind, address, &seg, TRUE))) return NULL; branch_size = XT_GET_DISK_2(((XTIdxBranchDPtr) block->cb_data)->tb_size_2); @@ -1333,21 +1388,50 @@ xtPublic xtBool xt_ind_fetch(XTOpenTablePtr ot, 
xtIndexNodeID address, XTPageLoc return FAILED; } - if (ltype == XT_XLOCK_LEAF) { - if (XT_IS_NODE(branch_size)) - ltype = XT_LOCK_READ; - else - ltype = XT_LOCK_WRITE; + switch (ltype) { + case XT_LOCK_READ: + break; + case XT_LOCK_WRITE: + xlock = TRUE; + break; + case XT_XLOCK_LEAF: + if (!XT_IS_NODE(branch_size)) + xlock = TRUE; + break; + case XT_XLOCK_DEL_LEAF: + if (!XT_IS_NODE(branch_size)) { + if (ot->ot_table->tab_dic.dic_no_lazy_delete) + xlock = TRUE; + else { + /* + * {LAZY-DEL-INDEX-ITEMS} + * + * We are fetch a page for delete purpose. + * we decide here if we plan to do a lazy delete, + * Or if we plan to compact the node. + * + * A lazy delete just requires a shared lock. + * + */ + if (ind->mi_lazy_delete) { + /* If the number of deleted items is greater than + * half of the number of times that can fit in the + * page, then we will compact the node. + */ + if (!xt_idx_lazy_delete_on_leaf(ind, block, XT_GET_INDEX_BLOCK_LEN(branch_size))) + xlock = TRUE; + } + else + xlock = TRUE; + } + } + break; } - if (ltype == XT_LOCK_WRITE) { - xt_atomicrwlock_xlock(&block->cb_lock, ot->ot_thread->t_id); - iref->ir_ulock = XT_UNLOCK_WRITE; - } - else { - xt_atomicrwlock_slock(&block->cb_lock); - iref->ir_ulock = XT_UNLOCK_READ; - } + if ((iref->ir_xlock = xlock)) + XT_IPAGE_WRITE_LOCK(&block->cb_lock, ot->ot_thread->t_id); + else + XT_IPAGE_READ_LOCK(&block->cb_lock); IDX_CAC_UNLOCK(seg, ot->ot_thread); @@ -1358,18 +1442,31 @@ xtPublic xtBool xt_ind_fetch(XTOpenTablePtr ot, xtIndexNodeID address, XTPageLoc * As a result, we need to pass a pointer to both the * cache block and the cache block data: */ + iref->ir_updated = FALSE; iref->ir_block = block; iref->ir_branch = (XTIdxBranchDPtr) block->cb_data; return OK; } -xtPublic xtBool xt_ind_release(XTOpenTablePtr ot, XTIndexPtr ind, XTPageUnlockType XT_UNUSED(utype), XTIndReferencePtr iref) +xtPublic xtBool xt_ind_release(XTOpenTablePtr ot, XTIndexPtr ind, XTPageUnlockType XT_NDEBUG_UNUSED(utype), 
XTIndReferencePtr iref) { register XTIndBlockPtr block; block = iref->ir_block; - if (utype == XT_UNLOCK_R_UPDATE || utype == XT_UNLOCK_W_UPDATE) { +#ifdef DEBUG + ASSERT_NS(iref->ir_xlock != 2); + ASSERT_NS(iref->ir_updated != 2); + if (iref->ir_updated) + ASSERT_NS(utype == XT_UNLOCK_R_UPDATE || utype == XT_UNLOCK_W_UPDATE); + else + ASSERT_NS(utype == XT_UNLOCK_READ || utype == XT_UNLOCK_WRITE); + if (iref->ir_xlock) + ASSERT_NS(utype == XT_UNLOCK_WRITE || utype == XT_UNLOCK_W_UPDATE); + else + ASSERT_NS(utype == XT_UNLOCK_READ || utype == XT_UNLOCK_R_UPDATE); +#endif + if (iref->ir_updated) { /* The page was update: */ ASSERT_NS(block->cb_state == IDX_CAC_BLOCK_CLEAN || block->cb_state == IDX_CAC_BLOCK_DIRTY); block->cp_flush_seq = ot->ot_table->tab_ind_flush_seq; @@ -1386,16 +1483,10 @@ xtPublic xtBool xt_ind_release(XTOpenTablePtr ot, XTIndexPtr ind, XTPageUnlockTy } } + XT_IPAGE_UNLOCK(&block->cb_lock, iref->ir_xlock); #ifdef DEBUG - if (utype == XT_UNLOCK_W_UPDATE) - utype = XT_UNLOCK_WRITE; - else if (utype == XT_UNLOCK_R_UPDATE) - utype = XT_UNLOCK_READ; - ASSERT_NS(iref->ir_ulock == utype); -#endif - xt_atomicrwlock_unlock(&block->cb_lock, iref->ir_ulock == XT_UNLOCK_WRITE ? 
TRUE : FALSE); -#ifdef DEBUG - iref->ir_ulock = XT_UNLOCK_NONE; + iref->ir_xlock = 2; + iref->ir_updated = 2; #endif return OK; } @@ -1484,24 +1575,3 @@ xtPublic void xt_ind_unreserve(XTOpenTablePtr ot) xt_ind_free_reserved(ot); } -xtPublic void xt_load_indices(XTThreadPtr self, XTOpenTablePtr ot) -{ - register XTTableHPtr tab = ot->ot_table; - register XTIndBlockPtr block; - DcSegmentPtr seg; - xtIndexNodeID id; - - xt_lock_mutex_ns(&tab->tab_ind_flush_lock); - - for (id=1; id < XT_NODE_ID(tab->tab_ind_eof); id++) { - if (!(block = ind_cac_fetch(ot, id, &seg, TRUE))) { - xt_unlock_mutex_ns(&tab->tab_ind_flush_lock); - xt_throw(self); - } - IDX_CAC_UNLOCK(seg, ot->ot_thread); - } - - xt_unlock_mutex_ns(&tab->tab_ind_flush_lock); -} - - diff --git a/storage/pbxt/src/cache_xt.h b/storage/pbxt/src/cache_xt.h index d113bb2f907..fcd05a8fe34 100644 --- a/storage/pbxt/src/cache_xt.h +++ b/storage/pbxt/src/cache_xt.h @@ -45,8 +45,46 @@ struct XTIdxReadBuffer; #define IDX_CAC_BLOCK_CLEAN 1 #define IDX_CAC_BLOCK_DIRTY 2 -typedef enum XTPageLockType { XT_LOCK_READ, XT_LOCK_WRITE, XT_XLOCK_LEAF }; -typedef enum XTPageUnlockType { XT_UNLOCK_NONE, XT_UNLOCK_READ, XT_UNLOCK_WRITE, XT_UNLOCK_R_UPDATE, XT_UNLOCK_W_UPDATE }; +#ifdef XT_NO_ATOMICS +#define XT_IPAGE_USE_PTHREAD_RW +#else +//#define XT_IPAGE_USE_ATOMIC_RW +#define XT_IPAGE_USE_SPINXSLOCK +//#define XT_IPAGE_USE_SKEW_RW +#endif + +#ifdef XT_IPAGE_USE_ATOMIC_RW +#define XT_IPAGE_LOCK_TYPE XTAtomicRWLockRec +#define XT_IPAGE_INIT_LOCK(s, i) xt_atomicrwlock_init_with_autoname(s, i) +#define XT_IPAGE_FREE_LOCK(s, i) xt_atomicrwlock_free(s, i) +#define XT_IPAGE_READ_LOCK(i) xt_atomicrwlock_slock(i) +#define XT_IPAGE_WRITE_LOCK(i, o) xt_atomicrwlock_xlock(i, o) +#define XT_IPAGE_UNLOCK(i, x) xt_atomicrwlock_unlock(i, x) +#elif defined(XT_IPAGE_USE_PTHREAD_RW) +#define XT_IPAGE_LOCK_TYPE xt_rwlock_type +#define XT_IPAGE_INIT_LOCK(s, i) xt_init_rwlock(s, i) +#define XT_IPAGE_FREE_LOCK(s, i) xt_free_rwlock(i) +#define 
XT_IPAGE_READ_LOCK(i) xt_slock_rwlock_ns(i) +#define XT_IPAGE_WRITE_LOCK(i, s) xt_xlock_rwlock_ns(i) +#define XT_IPAGE_UNLOCK(i, x) xt_unlock_rwlock_ns(i) +#elif defined(XT_IPAGE_USE_SPINXSLOCK) +#define XT_IPAGE_LOCK_TYPE XTSpinXSLockRec +#define XT_IPAGE_INIT_LOCK(s, i) xt_spinxslock_init_with_autoname(s, i) +#define XT_IPAGE_FREE_LOCK(s, i) xt_spinxslock_free(s, i) +#define XT_IPAGE_READ_LOCK(i) xt_spinxslock_slock(i) +#define XT_IPAGE_WRITE_LOCK(i, o) xt_spinxslock_xlock(i, o) +#define XT_IPAGE_UNLOCK(i, x) xt_spinxslock_unlock(i, x) +#else // XT_IPAGE_USE_SKEW_RW +#define XT_IPAGE_LOCK_TYPE XTSkewRWLockRec +#define XT_IPAGE_INIT_LOCK(s, i) xt_skewrwlock_init_with_autoname(s, i) +#define XT_IPAGE_FREE_LOCK(s, i) xt_skewrwlock_free(s, i) +#define XT_IPAGE_READ_LOCK(i) xt_skewrwlock_slock(i) +#define XT_IPAGE_WRITE_LOCK(i, o) xt_skewrwlock_xlock(i, o) +#define XT_IPAGE_UNLOCK(i, x) xt_skewrwlock_unlock(i, x) +#endif + +enum XTPageLockType { XT_LOCK_READ, XT_LOCK_WRITE, XT_XLOCK_LEAF, XT_XLOCK_DEL_LEAF }; +enum XTPageUnlockType { XT_UNLOCK_NONE, XT_UNLOCK_READ, XT_UNLOCK_WRITE, XT_UNLOCK_R_UPDATE, XT_UNLOCK_W_UPDATE }; /* A block is X locked if it is being changed or freed. * A block is S locked if it is being read. @@ -64,10 +102,11 @@ typedef struct XTIndBlock { struct XTIndBlock *cb_mr_used; /* More recently used blocks. */ struct XTIndBlock *cb_lr_used; /* Less recently used blocks. */ /* Protected by cb_lock: */ - XTAtomicRWLockRec cb_lock; + XT_IPAGE_LOCK_TYPE cb_lock; xtWord1 cb_state; /* Block status. */ xtWord2 cb_handle_count; /* TRUE if this page is referenced by a handle. */ xtWord2 cp_flush_seq; + xtWord2 cp_del_count; /* Number of deleted entries. */ #ifdef XT_USE_DIRECT_IO_ON_INDEX xtWord1 *cb_data; #else @@ -76,16 +115,18 @@ typedef struct XTIndBlock { } XTIndBlockRec, *XTIndBlockPtr; typedef struct XTIndReference { - XTPageUnlockType ir_ulock; + xtBool ir_xlock; /* Set to TRUE if the cache block is X locked. 
*/ + xtBool ir_updated; /* Set to TRUE if the cache block is updated. */ XTIndBlockPtr ir_block; XTIdxBranchDPtr ir_branch; } XTIndReferenceRec, *XTIndReferencePtr; typedef struct XTIndFreeBlock { + XTDiskValue1 if_zero1_1; /* Must be set to zero. */ + XTDiskValue1 if_zero2_1; /* Must be set to zero. */ XTDiskValue1 if_status_1; XTDiskValue1 if_unused1_1; - XTDiskValue2 if_unused2_2; - XTDiskValue4 if_unused3_4; + XTDiskValue4 if_unused2_4; XTDiskValue8 if_next_block_8; } XTIndFreeBlockRec, *XTIndFreeBlockPtr; @@ -116,14 +157,13 @@ xtInt8 xt_ind_get_size(); xtBool xt_ind_write(struct XTOpenTable *ot, XTIndexPtr ind, xtIndexNodeID offset, size_t size, xtWord1 *data); xtBool xt_ind_write_cache(struct XTOpenTable *ot, xtIndexNodeID offset, size_t size, xtWord1 *data); xtBool xt_ind_clean(struct XTOpenTable *ot, XTIndexPtr ind, xtIndexNodeID offset); -xtBool xt_ind_read_bytes(struct XTOpenTable *ot, xtIndexNodeID offset, size_t size, xtWord1 *data); +xtBool xt_ind_read_bytes(struct XTOpenTable *ot, XTIndexPtr ind, xtIndexNodeID offset, size_t size, xtWord1 *data); void xt_ind_check_cache(XTIndexPtr ind); xtBool xt_ind_reserve(struct XTOpenTable *ot, u_int count, XTIdxBranchDPtr not_this); void xt_ind_free_reserved(struct XTOpenTable *ot); void xt_ind_unreserve(struct XTOpenTable *ot); -void xt_load_indices(XTThreadPtr self, struct XTOpenTable *ot); -xtBool xt_ind_fetch(struct XTOpenTable *ot, xtIndexNodeID node, XTPageLockType ltype, XTIndReferencePtr iref); +xtBool xt_ind_fetch(struct XTOpenTable *ot, XTIndexPtr ind, xtIndexNodeID node, XTPageLockType ltype, XTIndReferencePtr iref); xtBool xt_ind_release(struct XTOpenTable *ot, XTIndexPtr ind, XTPageUnlockType utype, XTIndReferencePtr iref); void xt_ind_lock_handle(XTIndHandlePtr handle); diff --git a/storage/pbxt/src/ccutils_xt.cc b/storage/pbxt/src/ccutils_xt.cc index 1d93e4c34b3..2f31061ac21 100644 --- a/storage/pbxt/src/ccutils_xt.cc +++ b/storage/pbxt/src/ccutils_xt.cc @@ -29,7 +29,7 @@ #include "ccutils_xt.h" 
#include "bsearch_xt.h" -static int ccu_compare_object(XTThreadPtr XT_UNUSED(self), register const void XT_UNUSED(*thunk), register const void *a, register const void *b) +static int ccu_compare_object(XTThreadPtr XT_UNUSED(self), register const void *XT_UNUSED(thunk), register const void *a, register const void *b) { XTObject *obj_ptr = (XTObject *) b; diff --git a/storage/pbxt/src/database_xt.cc b/storage/pbxt/src/database_xt.cc index d15ffc6d06e..c2231f6e854 100644 --- a/storage/pbxt/src/database_xt.cc +++ b/storage/pbxt/src/database_xt.cc @@ -23,6 +23,10 @@ #include "xt_config.h" +#ifdef DRIZZLED +#include <bitset> +#endif + #include <string.h> #include <stdio.h> @@ -240,7 +244,7 @@ static void db_hash_free(XTThreadPtr self, void *data) xt_heap_release(self, (XTDatabaseHPtr) data); } -static int db_cmp_db_id(struct XTThread XT_UNUSED(*self), register const void XT_UNUSED(*thunk), register const void *a, register const void *b) +static int db_cmp_db_id(struct XTThread *XT_UNUSED(self), register const void *XT_UNUSED(thunk), register const void *a, register const void *b) { xtDatabaseID db_id = *((xtDatabaseID *) a); XTDatabaseHPtr *db_ptr = (XTDatabaseHPtr *) b; @@ -346,7 +350,7 @@ static void db_finalize(XTThreadPtr self, void *x) } } -static void db_onrelease(XTThreadPtr self, void XT_UNUSED(*x)) +static void db_onrelease(XTThreadPtr self, void *XT_UNUSED(x)) { /* Signal threads waiting for exclusive use of the database: */ if (xt_db_open_databases) // The database may already be closed. @@ -612,7 +616,7 @@ xtPublic void xt_drop_database(XTThreadPtr self, XTDatabaseHPtr db) xtPublic void xt_open_database(XTThreadPtr self, char *path, xtBool multi_path) { XTDatabaseHPtr db; - + /* We cannot get a database, without unusing the current * first. The reason is that the restart process will * partially set the current database! 
@@ -621,7 +625,7 @@ xtPublic void xt_open_database(XTThreadPtr self, char *path, xtBool multi_path) db = xt_get_database(self, path, multi_path); pushr_(xt_heap_release, db); xt_use_database(self, db, XT_FOR_USER); - freer_(); // xt_heap_release(self, db); + freer_(); // xt_heap_release(self, db); } /* This function can only be called if you do not already have a database in @@ -638,6 +642,12 @@ xtPublic void xt_use_database(XTThreadPtr self, XTDatabaseHPtr db, int what_for) xt_heap_reference(self, db); self->st_database = db; +#ifdef XT_WAIT_FOR_CLEANUP + self->st_last_xact = 0; + for (int i=0; i<XT_MAX_XACT_BEHIND; i++) { + self->st_prev_xact[i] = db->db_xn_curr_id; + } +#endif xt_xn_init_thread(self, what_for); } @@ -1117,15 +1127,18 @@ xtPublic void xt_db_return_table_to_pool_ns(XTOpenTablePtr ot) XTDatabaseHPtr db = ot->ot_table->tab_db; xtBool flush_table = TRUE; + /* No open table returned to the pool should still + * have a cache handle! + */ + ASSERT_NS(!ot->ot_ind_rhandle); xt_lock_mutex_ns(&db->db_ot_pool.opt_lock); if (!(table_pool = db_get_open_table_pool(db, ot->ot_table->tab_id))) goto failed; if (table_pool->opt_locked && !table_pool->opt_flushing) { - table_pool->opt_total_open--; /* Table will be closed below: */ - if (table_pool->opt_total_open > 0) + if (table_pool->opt_total_open > 1) flush_table = FALSE; } else { @@ -1151,14 +1164,21 @@ xtPublic void xt_db_return_table_to_pool_ns(XTOpenTablePtr ot) ot = NULL; } + if (ot) { + xt_unlock_mutex_ns(&db->db_ot_pool.opt_lock); + xt_close_table(ot, flush_table, FALSE); + + /* assume that table_pool cannot be invalidated in between as we have table_pool->opt_total_open > 0 */ + xt_lock_mutex_ns(&db->db_ot_pool.opt_lock); + table_pool->opt_total_open--; + } + db_free_open_table_pool(NULL, table_pool); if (!xt_broadcast_cond_ns(&db->db_ot_pool.opt_cond)) goto failed; xt_unlock_mutex_ns(&db->db_ot_pool.opt_lock); - if (ot) - xt_close_table(ot, flush_table, FALSE); - + return; failed: diff --git 
a/storage/pbxt/src/datadic_xt.cc b/storage/pbxt/src/datadic_xt.cc index c2ac186cf6f..9e709b61c59 100644 --- a/storage/pbxt/src/datadic_xt.cc +++ b/storage/pbxt/src/datadic_xt.cc @@ -26,6 +26,10 @@ #include "xt_config.h" +#ifdef DRIZZLED +#include <bitset> +#endif + #include <ctype.h> #include <errno.h> @@ -433,7 +437,7 @@ class XTTokenizer { XTToken *nextToken(XTThreadPtr self, c_char *keyword, XTToken *tk); }; -void ri_free_token(XTThreadPtr self __attribute__((unused)), XTToken *tk) +void ri_free_token(XTThreadPtr XT_UNUSED(self), XTToken *tk) { delete tk; } @@ -524,6 +528,13 @@ XTToken *XTTokenizer::nextToken(XTThreadPtr self) break; tkn_curr_pos++; } + /* TODO: Unless sql_mode == 'NO_BACKSLASH_ESCAPES'!!! */ + if (*tkn_curr_pos == '\\') { + if (*(tkn_curr_pos+1) == quote) { + if (quote == '"' || quote == '\'') + tkn_curr_pos++; + } + } tkn_curr_pos++; } @@ -639,7 +650,7 @@ class XTParseTable : public XTObject { int parseKeyAction(XTThreadPtr self); void parseCreateTable(XTThreadPtr self); void parseAddTableItem(XTThreadPtr self); - void parseQualifiedName(XTThreadPtr self, char *name); + void parseQualifiedName(XTThreadPtr self, char *parent_name, char *name); void parseTableName(XTThreadPtr self, bool alterTable); void parseExpression(XTThreadPtr self, bool allow_reserved); void parseBrackets(XTThreadPtr self); @@ -667,53 +678,53 @@ class XTParseTable : public XTObject { memset(&pt_sbuffer, 0, sizeof(XTStringBufferRec)); } - virtual void finalize(XTThreadPtr self __attribute__((unused))) { + virtual void finalize(XTThreadPtr XT_UNUSED(self)) { if (pt_tokenizer) delete pt_tokenizer; xt_sb_set_size(NULL, &pt_sbuffer, 0); } // Hooks to receive output from the parser: - virtual void setTableName(XTThreadPtr self __attribute__((unused)), char *name __attribute__((unused)), bool alterTable __attribute__((unused))) { + virtual void setTableName(XTThreadPtr XT_UNUSED(self), char *XT_UNUSED(name), bool XT_UNUSED(alterTable)) { } - virtual void addColumn(XTThreadPtr 
self __attribute__((unused)), char *col_name __attribute__((unused)), char *old_col_name __attribute__((unused))) { + virtual void addColumn(XTThreadPtr XT_UNUSED(self), char *XT_UNUSED(col_name), char *XT_UNUSED(old_col_name)) { } virtual void setDataType(XTThreadPtr self, char *cstring) { if (cstring) xt_free(self, cstring); } - virtual void setNull(XTThreadPtr self __attribute__((unused)), bool nullOK __attribute__((unused))) { + virtual void setNull(XTThreadPtr XT_UNUSED(self), bool XT_UNUSED(nullOK)) { } - virtual void setAutoInc(XTThreadPtr self __attribute__((unused)), bool autoInc __attribute__((unused))) { + virtual void setAutoInc(XTThreadPtr XT_UNUSED(self), bool XT_UNUSED(autoInc)) { } /* Add a contraint. If lastColumn is TRUE then add the contraint * to the last column. If not, expect addListedColumn() to be called. */ - virtual void addConstraint(XTThreadPtr self __attribute__((unused)), char *name __attribute__((unused)), u_int type __attribute__((unused)), bool lastColumn __attribute__((unused))) { + virtual void addConstraint(XTThreadPtr XT_UNUSED(self), char *XT_UNUSED(name), u_int XT_UNUSED(type), bool XT_UNUSED(lastColumn)) { } /* Move the last column created. If symbol is NULL then move the column to the * first position, else move it to the position just after the given column. 
*/ - virtual void moveColumn(XTThreadPtr self __attribute__((unused)), char *col_name __attribute__((unused))) { + virtual void moveColumn(XTThreadPtr XT_UNUSED(self), char *XT_UNUSED(col_name)) { } - virtual void dropColumn(XTThreadPtr self __attribute__((unused)), char *col_name __attribute__((unused))) { + virtual void dropColumn(XTThreadPtr XT_UNUSED(self), char *XT_UNUSED(col_name)) { } - virtual void dropConstraint(XTThreadPtr self __attribute__((unused)), char *name __attribute__((unused)), u_int type __attribute__((unused))) { + virtual void dropConstraint(XTThreadPtr XT_UNUSED(self), char *XT_UNUSED(name), u_int XT_UNUSED(type)) { } - virtual void setIndexName(XTThreadPtr self __attribute__((unused)), char *name __attribute__((unused))) { + virtual void setIndexName(XTThreadPtr XT_UNUSED(self), char *XT_UNUSED(name)) { } - virtual void addListedColumn(XTThreadPtr self __attribute__((unused)), char *index_col_name __attribute__((unused))) { + virtual void addListedColumn(XTThreadPtr XT_UNUSED(self), char *XT_UNUSED(index_col_name)) { } - virtual void setReferencedTable(XTThreadPtr self __attribute__((unused)), char *ref_table __attribute__((unused))) { + virtual void setReferencedTable(XTThreadPtr XT_UNUSED(self), char *XT_UNUSED(ref_schema), char *XT_UNUSED(ref_table)) { } - virtual void addReferencedColumn(XTThreadPtr self __attribute__((unused)), char *index_col_name __attribute__((unused))) { + virtual void addReferencedColumn(XTThreadPtr XT_UNUSED(self), char *XT_UNUSED(index_col_name)) { } - virtual void setActions(XTThreadPtr self __attribute__((unused)), int on_delete __attribute__((unused)), int on_update __attribute__((unused))) { + virtual void setActions(XTThreadPtr XT_UNUSED(self), int XT_UNUSED(on_delete), int XT_UNUSED(on_update)) { } virtual void parseTable(XTThreadPtr self, bool convert, char *sql); @@ -859,7 +870,7 @@ void XTParseTable::parseAddTableItem(XTThreadPtr self) if (pt_current->isKeyWord("CONSTRAINT")) { pt_current = 
pt_tokenizer->nextToken(self); if (pt_current->isIdentifier()) - parseQualifiedName(self, name); + parseQualifiedName(self, NULL, name); } if (pt_current->isReservedWord(XT_TK_PRIMARY)) { @@ -974,13 +985,15 @@ void XTParseTable::parseMoveColumn(XTThreadPtr self) char name[XT_IDENTIFIER_NAME_SIZE]; pt_current = pt_tokenizer->nextToken(self); - parseQualifiedName(self, name); + parseQualifiedName(self, NULL, name); moveColumn(self, name); } } -void XTParseTable::parseQualifiedName(XTThreadPtr self, char *name) +void XTParseTable::parseQualifiedName(XTThreadPtr self, char *parent_name, char *name) { + if (parent_name) + parent_name[0] = '\0'; /* Should be an identifier by I have this example: * CREATE TABLE t1 ( comment CHAR(32) ASCII NOT NULL, koi8_ru_f CHAR(32) CHARACTER SET koi8r NOT NULL default '' ) CHARSET=latin5; * @@ -990,6 +1003,8 @@ void XTParseTable::parseQualifiedName(XTThreadPtr self, char *name) raiseError(self, pt_current, XT_ERR_ID_TOO_LONG); pt_current = pt_tokenizer->nextToken(self); while (pt_current->isKeyWord(".")) { + if (parent_name) + xt_strcpy(XT_IDENTIFIER_NAME_SIZE,parent_name, name); pt_current = pt_tokenizer->nextToken(self); /* Accept anything after the DOT! 
*/ if (pt_current->getString(name, XT_IDENTIFIER_NAME_SIZE) >= XT_IDENTIFIER_NAME_SIZE) @@ -1002,7 +1017,7 @@ void XTParseTable::parseTableName(XTThreadPtr self, bool alterTable) { char name[XT_IDENTIFIER_NAME_SIZE]; - parseQualifiedName(self, name); + parseQualifiedName(self, NULL, name); setTableName(self, name, alterTable); } @@ -1011,7 +1026,7 @@ void XTParseTable::parseColumnDefinition(XTThreadPtr self, char *old_col_name) char col_name[XT_IDENTIFIER_NAME_SIZE]; // column_definition - parseQualifiedName(self, col_name); + parseQualifiedName(self, NULL, col_name); addColumn(self, col_name, old_col_name); parseDataType(self); @@ -1111,7 +1126,7 @@ u_int XTParseTable::columnList(XTThreadPtr self, bool index_cols) pt_current->expectKeyWord(self, "("); do { pt_current = pt_tokenizer->nextToken(self); - parseQualifiedName(self, name); + parseQualifiedName(self, NULL, name); addListedColumn(self, name); cols++; if (index_cols) { @@ -1135,19 +1150,20 @@ void XTParseTable::parseReferenceDefinition(XTThreadPtr self, u_int req_cols) int on_delete = XT_KEY_ACTION_DEFAULT; int on_update = XT_KEY_ACTION_DEFAULT; char name[XT_IDENTIFIER_NAME_SIZE]; + char parent_name[XT_IDENTIFIER_NAME_SIZE]; u_int cols = 0; // REFERENCES tbl_name pt_current = pt_tokenizer->nextToken(self, "REFERENCES", pt_current); - parseQualifiedName(self, name); - setReferencedTable(self, name); + parseQualifiedName(self, parent_name, name); + setReferencedTable(self, parent_name[0] ? parent_name : NULL, name); // [ (index_col_name,...) 
] if (pt_current->isKeyWord("(")) { pt_current->expectKeyWord(self, "("); do { pt_current = pt_tokenizer->nextToken(self); - parseQualifiedName(self, name); + parseQualifiedName(self, NULL, name); addReferencedColumn(self, name); cols++; if (cols > req_cols) @@ -1219,7 +1235,7 @@ void XTParseTable::parseAlterTable(XTThreadPtr self) if (pt_current->isReservedWord(XT_TK_COLUMN)) pt_current = pt_tokenizer->nextToken(self); - parseQualifiedName(self, old_col_name); + parseQualifiedName(self, NULL, old_col_name); parseColumnDefinition(self, old_col_name); parseMoveColumn(self); } @@ -1251,7 +1267,7 @@ void XTParseTable::parseAlterTable(XTThreadPtr self) else { if (pt_current->isReservedWord(XT_TK_COLUMN)) pt_current = pt_tokenizer->nextToken(self); - parseQualifiedName(self, name); + parseQualifiedName(self, NULL, name); dropColumn(self, name); } } @@ -1259,7 +1275,7 @@ void XTParseTable::parseAlterTable(XTThreadPtr self) pt_current = pt_tokenizer->nextToken(self); if (pt_current->isKeyWord("TO")) pt_current = pt_tokenizer->nextToken(self); - parseQualifiedName(self, name); + parseQualifiedName(self, NULL, name); } else /* Just ignore the syntax until the next , */ @@ -1284,7 +1300,7 @@ void XTParseTable::parseCreateIndex(XTThreadPtr self) else if (pt_current->isKeyWord("SPACIAL")) pt_current = pt_tokenizer->nextToken(self); pt_current = pt_tokenizer->nextToken(self, "INDEX", pt_current); - parseQualifiedName(self, name); + parseQualifiedName(self, NULL, name); optionalIndexType(self); pt_current = pt_tokenizer->nextToken(self, "ON", pt_current); parseTableName(self, true); @@ -1299,7 +1315,7 @@ void XTParseTable::parseDropIndex(XTThreadPtr self) pt_current = pt_tokenizer->nextToken(self, "DROP", pt_current); pt_current = pt_tokenizer->nextToken(self, "INDEX", pt_current); - parseQualifiedName(self, name); + parseQualifiedName(self, NULL, name); pt_current = pt_tokenizer->nextToken(self, "ON", pt_current); parseTableName(self, true); dropConstraint(self, name, 
XT_DD_INDEX); @@ -1340,7 +1356,7 @@ class XTCreateTable : public XTParseTable { virtual void addConstraint(XTThreadPtr self, char *name, u_int type, bool lastColumn); virtual void dropConstraint(XTThreadPtr self, char *name, u_int type); virtual void addListedColumn(XTThreadPtr self, char *index_col_name); - virtual void setReferencedTable(XTThreadPtr self, char *ref_table); + virtual void setReferencedTable(XTThreadPtr self, char *ref_schema, char *ref_table); virtual void addReferencedColumn(XTThreadPtr self, char *index_col_name); virtual void setActions(XTThreadPtr self, int on_delete, int on_update); @@ -1535,23 +1551,31 @@ void XTCreateTable::addListedColumn(XTThreadPtr self, char *index_col_name) } } -void XTCreateTable::setReferencedTable(XTThreadPtr self, char *ref_table) +void XTCreateTable::setReferencedTable(XTThreadPtr self, char *ref_schema, char *ref_table) { XTDDForeignKey *fk = (XTDDForeignKey *) ct_curr_constraint; char path[PATH_MAX]; - xt_strcpy(PATH_MAX, path, ct_tab_path->ps_path); - xt_remove_last_name_of_path(path); - if (ct_convert) { - char buffer[XT_IDENTIFIER_NAME_SIZE]; - size_t len; - - myxt_static_convert_identifier(self, ct_charset, ref_table, buffer, XT_IDENTIFIER_NAME_SIZE); - len = strlen(path); - myxt_static_convert_table_name(self, buffer, &path[len], PATH_MAX - len); - } - else + if (ref_schema) { + xt_strcpy(PATH_MAX,path, "."); + xt_add_dir_char(PATH_MAX, path); + xt_strcat(PATH_MAX, path, ref_schema); + xt_add_dir_char(PATH_MAX, path); xt_strcat(PATH_MAX, path, ref_table); + } else { + xt_strcpy(PATH_MAX, path, ct_tab_path->ps_path); + xt_remove_last_name_of_path(path); + if (ct_convert) { + char buffer[XT_IDENTIFIER_NAME_SIZE]; + size_t len; + + myxt_static_convert_identifier(self, ct_charset, ref_table, buffer, XT_IDENTIFIER_NAME_SIZE); + len = strlen(path); + myxt_static_convert_table_name(self, buffer, &path[len], PATH_MAX - len); + } + else + xt_strcat(PATH_MAX, path, ref_table); + } fk->fk_ref_tab_name = (XTPathStrPtr) 
xt_dup_string(self, path); } @@ -1578,7 +1602,7 @@ void XTCreateTable::addReferencedColumn(XTThreadPtr self, char *index_col_name) fk->fk_ref_cols.clone(self, &fk->co_cols); } -void XTCreateTable::setActions(XTThreadPtr self __attribute__((unused)), int on_delete, int on_update) +void XTCreateTable::setActions(XTThreadPtr XT_UNUSED(self), int on_delete, int on_update) { XTDDForeignKey *fk = (XTDDForeignKey *) ct_curr_constraint; @@ -1711,8 +1735,8 @@ void XTDDConstraint::alterColumnName(XTThreadPtr self, char *from_name, char *to void XTDDConstraint::getColumnList(char *buffer, size_t size) { if (co_table->dt_table) { - xt_strcat(size, buffer, "`"); - xt_strcpy(size, buffer, co_table->dt_table->tab_name->ps_path); + xt_strcpy(size, buffer, "`"); + xt_strcat(size, buffer, co_table->dt_table->tab_name->ps_path); xt_strcat(size, buffer, "` (`"); } else @@ -1739,6 +1763,20 @@ bool XTDDConstraint::sameColumns(XTDDConstraint *co) return OK; } +bool XTDDConstraint::samePrefixColumns(XTDDConstraint *co) +{ + u_int i = 0; + + if (co_cols.size() > co->co_cols.size()) + return false; + while (i<co_cols.size()) { + if (myxt_strcasecmp(co_cols.itemAt(i)->cr_col_name, co->co_cols.itemAt(i)->cr_col_name) != 0) + return false; + i++; + } + return OK; +} + bool XTDDConstraint::attachColumns() { XTDDColumn *col; @@ -1773,6 +1811,7 @@ bool XTDDTableRef::checkReference(xtWord1 *before_buf, XTThreadPtr thread) XTIdxSearchKeyRec search_key; xtXactID xn_id; XTXactWaitRec xw; + bool ok = false; if (!(loc_ind = tr_fkey->getReferenceIndexPtr())) return false; @@ -1792,40 +1831,42 @@ bool XTDDTableRef::checkReference(xtWord1 *before_buf, XTThreadPtr thread) /* Search for the key in the child (referencing) table: */ if (!(ot = xt_db_open_table_using_tab(tr_fkey->co_table->dt_table, thread))) - goto failed; + return false; retry: if (!xt_idx_search(ot, ind, &search_key)) - goto failed; + goto done; while (ot->ot_curr_rec_id && search_key.sk_on_key) { switch (xt_tab_maybe_committed(ot, 
ot->ot_curr_rec_id, &xn_id, &ot->ot_curr_row_id, &ot->ot_curr_updated)) { case XT_MAYBE: xw.xw_xn_id = xn_id; if (!xt_xn_wait_for_xact(thread, &xw, NULL)) - goto failed; + goto done; goto retry; case XT_ERR: - goto failed; + goto done; case TRUE: /* We found a matching child: */ xt_register_ixterr(XT_REG_CONTEXT, XT_ERR_ROW_IS_REFERENCED, tr_fkey->co_name); - goto failed; - break; + goto done; case FALSE: if (!xt_idx_next(ot, ind, &search_key)) - goto failed; + goto done; break; } } /* No matching children, all OK: */ - xt_db_return_table_to_pool_ns(ot); - return true; + ok = true; - failed: + done: + if (ot->ot_ind_rhandle) { + xt_ind_release_handle(ot->ot_ind_rhandle, FALSE, thread); + ot->ot_ind_rhandle = NULL; + } xt_db_return_table_to_pool_ns(ot); - return false; + return ok; } /* @@ -1962,6 +2003,10 @@ bool XTDDTableRef::modifyRow(XTOpenTablePtr XT_UNUSED(ref_ot), xtWord1 *before_b } /* No matching children, all OK: */ + if (ot->ot_ind_rhandle) { + xt_ind_release_handle(ot->ot_ind_rhandle, FALSE, thread); + ot->ot_ind_rhandle = NULL; + } xt_db_return_table_to_pool_ns(ot); success: @@ -1971,6 +2016,10 @@ bool XTDDTableRef::modifyRow(XTOpenTablePtr XT_UNUSED(ref_ot), xtWord1 *before_b return true; failed_2: + if (ot->ot_ind_rhandle) { + xt_ind_release_handle(ot->ot_ind_rhandle, FALSE, thread); + ot->ot_ind_rhandle = NULL; + } xt_db_return_table_to_pool_ns(ot); failed: @@ -2055,8 +2104,13 @@ void XTDDForeignKey::finalize(XTThreadPtr self) void XTDDForeignKey::loadString(XTThreadPtr self, XTStringBufferPtr sb) { + char schema_name[XT_IDENTIFIER_NAME_SIZE]; + XTDDConstraint::loadString(self, sb); xt_sb_concat(self, sb, " REFERENCES `"); + xt_2nd_last_name_of_path(XT_IDENTIFIER_NAME_SIZE, schema_name, fk_ref_tab_name->ps_path); + xt_sb_concat(self, sb, schema_name); + xt_sb_concat(self, sb, "`.`"); xt_sb_concat(self, sb, xt_last_name_of_path(fk_ref_tab_name->ps_path)); xt_sb_concat(self, sb, "` "); @@ -2136,6 +2190,20 @@ bool 
XTDDForeignKey::sameReferenceColumns(XTDDConstraint *co) return OK; } +bool XTDDForeignKey::samePrefixReferenceColumns(XTDDConstraint *co) +{ + u_int i = 0; + + if (fk_ref_cols.size() > co->co_cols.size()) + return false; + while (i<fk_ref_cols.size()) { + if (myxt_strcasecmp(fk_ref_cols.itemAt(i)->cr_col_name, co->co_cols.itemAt(i)->cr_col_name) != 0) + return false; + i++; + } + return OK; +} + bool XTDDForeignKey::checkReferencedTypes(XTDDTable *dt) { XTDDColumn *col, *ref_col; @@ -2288,6 +2356,10 @@ bool XTDDForeignKey::insertRow(xtWord1 *before_buf, xtWord1 *rec_buf, XTThreadPt goto failed_2; case TRUE: /* We found a matching parent: */ + if (ot->ot_ind_rhandle) { + xt_ind_release_handle(ot->ot_ind_rhandle, FALSE, thread); + ot->ot_ind_rhandle = NULL; + } xt_db_return_table_to_pool_ns(ot); goto success; case FALSE: @@ -2300,6 +2372,10 @@ bool XTDDForeignKey::insertRow(xtWord1 *before_buf, xtWord1 *rec_buf, XTThreadPt xt_register_ixterr(XT_REG_CONTEXT, XT_ERR_NO_REFERENCED_ROW, co_name); failed_2: + if (ot->ot_ind_rhandle) { + xt_ind_release_handle(ot->ot_ind_rhandle, FALSE, thread); + ot->ot_ind_rhandle = NULL; + } xt_db_return_table_to_pool_ns(ot); failed: @@ -2672,16 +2748,24 @@ void XTDDTable::checkForeignKeys(XTThreadPtr self, bool temp_table) XTDDIndex *XTDDTable::findIndex(XTDDConstraint *co) { - XTDDIndex *ind; + XTDDIndex *ind = NULL; + XTDDIndex *cur_ind; + u_int index_size = UINT_MAX; for (u_int i=0; i<dt_indexes.size(); i++) { - ind = dt_indexes.itemAt(i); - if (co->sameColumns(ind)) - return ind; + cur_ind = dt_indexes.itemAt(i); + u_int sz = cur_ind->getIndexPtr()->mi_key_size; + if (sz < index_size && co->samePrefixColumns(cur_ind)) { + ind = cur_ind; + index_size = sz; + } } + + if (ind) + return ind; + { char buffer[XT_ERR_MSG_SIZE - 200]; - co->getColumnList(buffer, XT_ERR_MSG_SIZE - 200); xt_register_ixterr(XT_REG_CONTEXT, XT_ERR_NO_MATCHING_INDEX, buffer); } @@ -2690,16 +2774,24 @@ XTDDIndex *XTDDTable::findIndex(XTDDConstraint *co) 
XTDDIndex *XTDDTable::findReferenceIndex(XTDDForeignKey *fk) { - XTDDIndex *ind; + XTDDIndex *ind = NULL; + XTDDIndex *cur_ind; XTDDColumnRef *cr; u_int i; + u_int index_size = UINT_MAX; for (i=0; i<dt_indexes.size(); i++) { - ind = dt_indexes.itemAt(i); - if (fk->sameReferenceColumns(ind)) - return ind; + cur_ind = dt_indexes.itemAt(i); + u_int sz = cur_ind->getIndexPtr()->mi_key_size; + if (sz < index_size && fk->samePrefixReferenceColumns(cur_ind)) { + ind = cur_ind; + index_size = sz; + } } + if (ind) + return ind; + /* If the index does not exist, maybe the columns do not exist?! */ for (i=0; i<fk->fk_ref_cols.size(); i++) { cr = fk->fk_ref_cols.itemAt(i); @@ -2867,9 +2959,33 @@ bool XTDDTable::updateRow(XTOpenTablePtr ot, xtWord1 *before, xtWord1 *after) return ok; } -xtBool XTDDTable::checkCanDrop() +/* + * drop_db parameter is TRUE if we are dropping the schema of this table. In this case + * we return TRUE if the table has only refs to the tables from its own schema + */ +xtBool XTDDTable::checkCanDrop(xtBool drop_db) { /* no refs or references only itself */ - return (dt_trefs == NULL) || - (dt_trefs->tr_next == NULL) && (dt_trefs->tr_fkey->co_table == this); + if ((dt_trefs == NULL) || ((dt_trefs->tr_next == NULL) && (dt_trefs->tr_fkey->co_table == this))) + return TRUE; + + if (!drop_db) + return FALSE; + + const char *this_schema = xt_last_2_names_of_path(dt_table->tab_name->ps_path); + size_t this_schema_sz = xt_last_name_of_path(dt_table->tab_name->ps_path) - this_schema; + XTDDTableRef *tr = dt_trefs; + + while (tr) { + const char *tab_path = tr->tr_fkey->co_table->dt_table->tab_name->ps_path; + const char *tab_schema = xt_last_2_names_of_path(tab_path); + size_t tab_schema_sz = xt_last_name_of_path(tab_path) - tab_schema; + + if (this_schema_sz != tab_schema_sz || strncmp(this_schema, tab_schema, tab_schema_sz)) + return FALSE; + + tr = tr->tr_next; + } + + return TRUE; } diff --git a/storage/pbxt/src/datadic_xt.h b/storage/pbxt/src/datadic_xt.h 
index 825914b60f3..1e56561614d 100644 --- a/storage/pbxt/src/datadic_xt.h +++ b/storage/pbxt/src/datadic_xt.h @@ -137,6 +137,7 @@ class XTDDColumnRef : public XTObject { return new_obj; } + virtual void init(XTThreadPtr self) { XTObject::init(self); } virtual void init(XTThreadPtr self, XTObject *obj); virtual void finalize(XTThreadPtr self); }; @@ -156,6 +157,7 @@ class XTDDConstraint : public XTObject { co_ind_name(NULL) { } + virtual void init(XTThreadPtr self) { XTObject::init(self); } virtual void init(XTThreadPtr self, XTObject *obj); virtual void finalize(XTThreadPtr self) { if (co_name) @@ -169,6 +171,7 @@ class XTDDConstraint : public XTObject { virtual void alterColumnName(XTThreadPtr self, char *from_name, char *to_name); void getColumnList(char *buffer, size_t size); bool sameColumns(XTDDConstraint *co); + bool samePrefixColumns(XTDDConstraint *co); bool attachColumns(); }; @@ -198,6 +201,7 @@ class XTDDIndex : public XTDDConstraint { return new_obj; } + virtual void init(XTThreadPtr self) { XTDDConstraint::init(self); }; virtual void init(XTThreadPtr self, XTObject *obj); struct XTIndex *getIndexPtr(); }; @@ -230,12 +234,14 @@ class XTDDForeignKey : public XTDDIndex { return new_obj; } + virtual void init(XTThreadPtr self) { XTDDIndex::init(self); } virtual void init(XTThreadPtr self, XTObject *obj); virtual void finalize(XTThreadPtr self); virtual void loadString(XTThreadPtr self, XTStringBufferPtr sb); void getReferenceList(char *buffer, size_t size); struct XTIndex *getReferenceIndexPtr(); bool sameReferenceColumns(XTDDConstraint *co); + bool samePrefixReferenceColumns(XTDDConstraint *co); bool checkReferencedTypes(XTDDTable *dt); void removeReference(XTThreadPtr self); bool insertRow(xtWord1 *before, xtWord1 *after, XTThreadPtr thread); @@ -284,7 +290,7 @@ class XTDDTable : public XTObject { XTDDIndex *findReferenceIndex(XTDDForeignKey *fk); bool insertRow(struct XTOpenTable *rec_ot, xtWord1 *buffer); bool checkNoAction(struct XTOpenTable *ot, 
xtRecordID rec_id); - xtBool checkCanDrop(); + xtBool checkCanDrop(xtBool drop_db); bool deleteRow(struct XTOpenTable *rec_ot, xtWord1 *buffer); void deleteAllRows(XTThreadPtr self); bool updateRow(struct XTOpenTable *rec_ot, xtWord1 *before, xtWord1 *after); diff --git a/storage/pbxt/src/datalog_xt.cc b/storage/pbxt/src/datalog_xt.cc index dc9423e7eac..1a8a0a47085 100644 --- a/storage/pbxt/src/datalog_xt.cc +++ b/storage/pbxt/src/datalog_xt.cc @@ -69,6 +69,7 @@ xtBool XTDataSeqRead::sl_seq_init(struct XTDatabase *db, size_t buffer_size) sl_rec_log_id = 0; sl_rec_log_offset = 0; sl_record_len = 0; + sl_extra_garbage = 0; return sl_buffer != NULL; } @@ -130,8 +131,25 @@ xtBool XTDataSeqRead::sl_rnd_read(xtLogOffset log_offset, size_t size, xtWord1 * /* * Unlike the transaction log sequential reader, this function only returns * the header of a record. + * + * {SKIP-GAPS} + * This function now skips gaps. This should not be required, because in normal + * operation, no gaps should be created. + * + * However, if his happens there is a danger that a valid record after the + * gap will be lost. + * + * So, if we find an invalid record, we scan through the log to find the next + * valid record. Note, that there is still a danger that will will find + * data that looks like a valid record, but is not. + * + * In this case, this "pseudo record" may cause the function to actually skip + * valid records. + * + * Note, any such malfunction will eventually cause the record to be lost forever + * after the garbage collector has run. 
*/ -xtBool XTDataSeqRead::sl_seq_next(XTXactLogBufferDPtr *ret_entry, xtBool verify, struct XTThread *thread) +xtBool XTDataSeqRead::sl_seq_next(XTXactLogBufferDPtr *ret_entry, struct XTThread *thread) { XTXactLogBufferDPtr record; size_t tfer; @@ -140,10 +158,12 @@ xtBool XTDataSeqRead::sl_seq_next(XTXactLogBufferDPtr *ret_entry, xtBool verify, size_t max_rec_len; xtBool reread_from_buffer; xtWord4 size; + xtLogOffset gap_start = 0; /* Go to the next record (xseq_record_len must be initialized * to 0 for this to work. */ + retry: sl_rec_log_offset += sl_record_len; sl_record_len = 0; @@ -174,6 +194,8 @@ xtBool XTDataSeqRead::sl_seq_next(XTXactLogBufferDPtr *ret_entry, xtBool verify, record = (XTXactLogBufferDPtr) (sl_buffer + rec_offset); switch (record->xl.xl_status_1) { case XT_LOG_ENT_HEADER: + if (sl_rec_log_offset != 0) + goto scan_to_next_record; if (offsetof(XTXactLogHeaderDRec, xh_size_4) + 4 > max_rec_len) { reread_from_buffer = TRUE; goto read_more; @@ -183,33 +205,42 @@ xtBool XTDataSeqRead::sl_seq_next(XTXactLogBufferDPtr *ret_entry, xtBool verify, reread_from_buffer = TRUE; goto read_more; } - if (verify) { - if (record->xh.xh_checksum_1 != XT_CHECKSUM_1(sl_rec_log_id)) - goto return_empty; - if (XT_LOG_HEAD_MAGIC(record, len) != XT_LOG_FILE_MAGIC) + + if (record->xh.xh_checksum_1 != XT_CHECKSUM_1(sl_rec_log_id)) + goto return_empty; + if (XT_LOG_HEAD_MAGIC(record, len) != XT_LOG_FILE_MAGIC) + goto return_empty; + if (len > offsetof(XTXactLogHeaderDRec, xh_log_id_4) + 4) { + if (XT_GET_DISK_4(record->xh.xh_log_id_4) != sl_rec_log_id) goto return_empty; - if (len > offsetof(XTXactLogHeaderDRec, xh_log_id_4) + 4) { - if (XT_GET_DISK_4(record->xh.xh_log_id_4) != sl_rec_log_id) - goto return_empty; - } } break; case XT_LOG_ENT_EXT_REC_OK: case XT_LOG_ENT_EXT_REC_DEL: + if (gap_start) { + xt_logf(XT_NS_CONTEXT, XT_LOG_WARNING, "Gap in data log %lu, start: %llu, size: %llu\n", (u_long) sl_rec_log_id, (u_llong) gap_start, (u_llong) (sl_rec_log_offset - 
gap_start)); + gap_start = 0; + } len = offsetof(XTactExtRecEntryDRec, er_data); if (len > max_rec_len) { reread_from_buffer = TRUE; goto read_more; } size = XT_GET_DISK_4(record->er.er_data_size_4); - if (verify) { - if (sl_rec_log_offset + (xtLogOffset) offsetof(XTactExtRecEntryDRec, er_data) + size > sl_log_eof) - goto return_empty; - } + /* Verify the record as good as we can! */ + if (!size) + goto scan_to_next_record; + if (sl_rec_log_offset + (xtLogOffset) offsetof(XTactExtRecEntryDRec, er_data) + size > sl_log_eof) + goto scan_to_next_record; + if (!XT_GET_DISK_4(record->er.er_tab_id_4)) + goto scan_to_next_record; + if (!XT_GET_DISK_4(record->er.er_rec_id_4)) + goto scan_to_next_record; break; default: - ASSERT_NS(FALSE); - goto return_empty; + /* Note, we no longer assume EOF. + * Instead, we skip to the next value record. */ + goto scan_to_next_record; } if (len <= max_rec_len) { @@ -243,7 +274,20 @@ xtBool XTDataSeqRead::sl_seq_next(XTXactLogBufferDPtr *ret_entry, xtBool verify, *ret_entry = (XTXactLogBufferDPtr) sl_buffer; return OK; + scan_to_next_record: + if (!gap_start) { + gap_start = sl_rec_log_offset; + xt_logf(XT_NS_CONTEXT, XT_LOG_WARNING, "Gap found in data log %lu, starting at offset %llu\n", (u_long) sl_rec_log_id, (u_llong) gap_start); + } + sl_record_len = 1; + sl_extra_garbage++; + goto retry; + return_empty: + if (gap_start) { + xt_logf(XT_NS_CONTEXT, XT_LOG_WARNING, "Gap in data log %lu, start: %llu, size: %llu\n", (u_long) sl_rec_log_id, (u_llong) gap_start, (u_llong) (sl_rec_log_offset - gap_start)); + gap_start = 0; + } *ret_entry = NULL; return OK; } @@ -285,22 +329,54 @@ static xtBool dl_create_log_header(XTDataLogFilePtr data_log, XTOpenFilePtr of, return OK; } -static xtBool dl_write_log_header(XTDataLogFilePtr data_log, XTOpenFilePtr of, xtBool flush, XTThreadPtr thread) +static xtBool dl_write_garbage_level(XTDataLogFilePtr data_log, XTOpenFilePtr of, xtBool flush, XTThreadPtr thread) { XTXactLogHeaderDRec header; /* The 
header was not completely written, so write a new one: */ XT_SET_DISK_8(header.xh_free_space_8, data_log->dlf_garbage_count); - XT_SET_DISK_8(header.xh_file_len_8, data_log->dlf_log_eof); - XT_SET_DISK_8(header.xh_comp_pos_8, data_log->dlf_start_offset); - - if (!xt_pwrite_file(of, offsetof(XTXactLogHeaderDRec, xh_free_space_8), 24, (xtWord1 *) &header.xh_free_space_8, &thread->st_statistics.st_data, thread)) + if (!xt_pwrite_file(of, offsetof(XTXactLogHeaderDRec, xh_free_space_8), 8, (xtWord1 *) &header.xh_free_space_8, &thread->st_statistics.st_data, thread)) return FAILED; if (flush && !xt_flush_file(of, &thread->st_statistics.st_data, thread)) return FAILED; return OK; } +/* + * {SKIP-GAPS} + * Extra garbage is the amount of space skipped during recovery of the data + * log file. We assume this space has not be counted as garbage, + * and add it to the garbage count. + * + * This may mean that our estimate of garbaged is higher than it should + * be, but that is better than the other way around. + * + * The fact is, there should not be any gaps in the data log files, so + * this is actually an exeption which should not occur. 
+ */ +static xtBool dl_write_log_header(XTDataLogFilePtr data_log, XTOpenFilePtr of, xtLogOffset extra_garbage, XTThreadPtr thread) +{ + XTXactLogHeaderDRec header; + + XT_SET_DISK_8(header.xh_file_len_8, data_log->dlf_log_eof); + + if (extra_garbage) { + data_log->dlf_garbage_count += extra_garbage; + if (data_log->dlf_garbage_count > data_log->dlf_log_eof) + data_log->dlf_garbage_count = data_log->dlf_log_eof; + XT_SET_DISK_8(header.xh_free_space_8, data_log->dlf_garbage_count); + if (!xt_pwrite_file(of, offsetof(XTXactLogHeaderDRec, xh_free_space_8), 16, (xtWord1 *) &header.xh_free_space_8, &thread->st_statistics.st_data, thread)) + return FAILED; + } + else { + if (!xt_pwrite_file(of, offsetof(XTXactLogHeaderDRec, xh_file_len_8), 8, (xtWord1 *) &header.xh_file_len_8, &thread->st_statistics.st_data, thread)) + return FAILED; + } + if (!xt_flush_file(of, &thread->st_statistics.st_data, thread)) + return FAILED; + return OK; +} + static void dl_free_seq_read(XTThreadPtr self __attribute__((unused)), XTDataSeqReadPtr seq_read) { seq_read->sl_seq_exit(); @@ -318,7 +394,7 @@ static void dl_recover_log(XTThreadPtr self, XTDatabaseHPtr db, XTDataLogFilePtr seq_read.sl_seq_start(data_log->dlf_log_id, 0, FALSE); for (;;) { - if (!seq_read.sl_seq_next(&record, TRUE, self)) + if (!seq_read.sl_seq_next(&record, self)) xt_throw(self); if (!record) break; @@ -331,13 +407,18 @@ static void dl_recover_log(XTThreadPtr self, XTDatabaseHPtr db, XTDataLogFilePtr } } - if (!(data_log->dlf_log_eof = seq_read.sl_rec_log_offset)) { + ASSERT_NS(seq_read.sl_log_eof == seq_read.sl_rec_log_offset); + data_log->dlf_log_eof = seq_read.sl_rec_log_offset; + + if ((size_t) data_log->dlf_log_eof < sizeof(XTXactLogHeaderDRec)) { data_log->dlf_log_eof = sizeof(XTXactLogHeaderDRec); if (!dl_create_log_header(data_log, seq_read.sl_log_file, self)) xt_throw(self); } - if (!dl_write_log_header(data_log, seq_read.sl_log_file, TRUE, self)) - xt_throw(self); + else { + if (!dl_write_log_header(data_log, 
seq_read.sl_log_file, seq_read.sl_extra_garbage, self)) + xt_throw(self); + } freer_(); // dl_free_seq_read(&seq_read) } @@ -452,7 +533,7 @@ xtBool XTDataLogCache::dls_set_log_state(XTDataLogFilePtr data_log, int state) return FAILED; } -static int dl_cmp_log_id(XTThreadPtr XT_UNUSED(self), register const void XT_UNUSED(*thunk), register const void *a, register const void *b) +static int dl_cmp_log_id(XTThreadPtr XT_UNUSED(self), register const void *XT_UNUSED(thunk), register const void *a, register const void *b) { xtLogID log_id_a = *((xtLogID *) a); xtLogID log_id_b = *((xtLogID *) b); @@ -1110,7 +1191,6 @@ xtBool XTDataLogBuffer::dlb_get_log_offset(xtLogID *log_id, xtLogOffset *out_off *log_id = dlb_data_log->dlf_log_id; *out_offset = dlb_data_log->dlf_log_eof; - dlb_data_log->dlf_log_eof += req_size; return OK; } @@ -1149,7 +1229,7 @@ xtBool XTDataLogBuffer::dlb_flush_log(xtBool commit, XTThreadPtr thread) return OK; } -xtBool XTDataLogBuffer::dlb_write_thru_log(xtLogID log_id __attribute__((unused)), xtLogOffset log_offset, size_t size, xtWord1 *data, XTThreadPtr thread) +xtBool XTDataLogBuffer::dlb_write_thru_log(xtLogID XT_NDEBUG_UNUSED(log_id), xtLogOffset log_offset, size_t size, xtWord1 *data, XTThreadPtr thread) { ASSERT_NS(log_id == dlb_data_log->dlf_log_id); @@ -1158,6 +1238,11 @@ xtBool XTDataLogBuffer::dlb_write_thru_log(xtLogID log_id __attribute__((unused) if (!xt_pwrite_file(dlb_data_log->dlf_log_file, log_offset, size, data, &thread->st_statistics.st_data, thread)) return FAILED; + /* Increment of dlb_data_log->dlf_log_eof was moved here from dlb_get_log_offset() + * to ensure it is done after a successful update of the log, otherwise otherwise a + * gap occurs in the log which cause eof to be detected in middle of the log + */ + dlb_data_log->dlf_log_eof += size; #ifdef DEBUG if (log_offset + size > dlb_max_write_offset) dlb_max_write_offset = log_offset + size; @@ -1166,7 +1251,7 @@ xtBool XTDataLogBuffer::dlb_write_thru_log(xtLogID log_id 
__attribute__((unused) return OK; } -xtBool XTDataLogBuffer::dlb_append_log(xtLogID log_id __attribute__((unused)), xtLogOffset log_offset, size_t size, xtWord1 *data, XTThreadPtr thread) +xtBool XTDataLogBuffer::dlb_append_log(xtLogID XT_NDEBUG_UNUSED(log_id), xtLogOffset log_offset, size_t size, xtWord1 *data, XTThreadPtr thread) { ASSERT_NS(log_id == dlb_data_log->dlf_log_id); @@ -1179,10 +1264,12 @@ xtBool XTDataLogBuffer::dlb_append_log(xtLogID log_id __attribute__((unused)), x if (dlb_buffer_size >= dlb_buffer_len + size) { memcpy(dlb_log_buffer + dlb_buffer_len, data, size); dlb_buffer_len += size; + dlb_data_log->dlf_log_eof += size; return OK; } } - dlb_flush_log(FALSE, thread); + if (dlb_flush_log(FALSE, thread) != OK) + return FAILED; } ASSERT_NS(dlb_buffer_len == 0); @@ -1191,6 +1278,7 @@ xtBool XTDataLogBuffer::dlb_append_log(xtLogID log_id __attribute__((unused)), x dlb_buffer_offset = log_offset; dlb_buffer_len = size; memcpy(dlb_log_buffer, data, size); + dlb_data_log->dlf_log_eof += size; return OK; } @@ -1202,6 +1290,7 @@ xtBool XTDataLogBuffer::dlb_append_log(xtLogID log_id __attribute__((unused)), x dlb_max_write_offset = log_offset + size; #endif dlb_flush_required = TRUE; + dlb_data_log->dlf_log_eof += size; return OK; } @@ -1306,7 +1395,7 @@ xtBool XTDataLogBuffer::dlb_delete_log(xtLogID log_id, xtLogOffset log_offset, s xt_lock_mutex_ns(&dlb_db->db_datalogs.dlc_head_lock); dlb_data_log->dlf_garbage_count += offsetof(XTactExtRecEntryDRec, er_data) + size; ASSERT_NS(dlb_data_log->dlf_garbage_count < dlb_data_log->dlf_log_eof); - if (!dl_write_log_header(dlb_data_log, dlb_data_log->dlf_log_file, FALSE, thread)) { + if (!dl_write_garbage_level(dlb_data_log, dlb_data_log->dlf_log_file, FALSE, thread)) { xt_unlock_mutex_ns(&dlb_db->db_datalogs.dlc_head_lock); return FAILED; } @@ -1329,7 +1418,7 @@ xtBool XTDataLogBuffer::dlb_delete_log(xtLogID log_id, xtLogOffset log_offset, s xt_lock_mutex_ns(&dlb_db->db_datalogs.dlc_head_lock); 
data_log->dlf_garbage_count += offsetof(XTactExtRecEntryDRec, er_data) + size; ASSERT_NS(data_log->dlf_garbage_count < data_log->dlf_log_eof); - if (!dl_write_log_header(data_log, open_log->odl_log_file, FALSE, thread)) { + if (!dl_write_garbage_level(data_log, open_log->odl_log_file, FALSE, thread)) { xt_unlock_mutex_ns(&dlb_db->db_datalogs.dlc_head_lock); goto failed; } @@ -1357,7 +1446,7 @@ xtBool XTDataLogBuffer::dlb_delete_log(xtLogID log_id, xtLogOffset log_offset, s * Delete all the extended data belonging to a particular * table. */ -xtPublic void xt_dl_delete_ext_data(XTThreadPtr self, XTTableHPtr tab, xtBool missing_ok __attribute__((unused)), xtBool have_table_lock) +xtPublic void xt_dl_delete_ext_data(XTThreadPtr self, XTTableHPtr tab, xtBool XT_UNUSED(missing_ok), xtBool have_table_lock) { XTOpenTablePtr ot; xtRecordID page_rec_id, offs_rec_id; @@ -1674,7 +1763,7 @@ static xtBool dl_collect_garbage(XTThreadPtr self, XTDatabaseHPtr db, XTDataLogF xt_lock_mutex_ns(&db->db_datalogs.dlc_head_lock); data_log->dlf_garbage_count += garbage_count; ASSERT(data_log->dlf_garbage_count < data_log->dlf_log_eof); - if (!dl_write_log_header(data_log, cs.cs_seqread->sl_seq_open_file(), TRUE, self)) { + if (!dl_write_garbage_level(data_log, cs.cs_seqread->sl_seq_open_file(), TRUE, self)) { xt_unlock_mutex_ns(&db->db_datalogs.dlc_head_lock); xt_throw(self); } @@ -1683,7 +1772,7 @@ static xtBool dl_collect_garbage(XTThreadPtr self, XTDatabaseHPtr db, XTDataLogF freer_(); // dl_free_compactor_state(&cs) return FAILED; } - if (!cs.cs_seqread->sl_seq_next(&record, TRUE, self)) + if (!cs.cs_seqread->sl_seq_next(&record, self)) xt_throw(self); cs.cs_seqread->sl_seq_pos(&curr_log_id, &curr_log_offset); if (!record) { @@ -1809,7 +1898,7 @@ static xtBool dl_collect_garbage(XTThreadPtr self, XTDatabaseHPtr db, XTDataLogF xt_lock_mutex_ns(&db->db_datalogs.dlc_head_lock); data_log->dlf_garbage_count += garbage_count; ASSERT(data_log->dlf_garbage_count < data_log->dlf_log_eof); - if 
(!dl_write_log_header(data_log, cs.cs_seqread->sl_seq_open_file(), TRUE, self)) { + if (!dl_write_garbage_level(data_log, cs.cs_seqread->sl_seq_open_file(), TRUE, self)) { xt_unlock_mutex_ns(&db->db_datalogs.dlc_head_lock); xt_throw(self); } @@ -1926,7 +2015,8 @@ static void *dl_run_co_thread(XTThreadPtr self) int count; void *mysql_thread; - mysql_thread = myxt_create_thread(); + if (!(mysql_thread = myxt_create_thread())) + xt_throw(self); while (!self->t_quit) { try_(a) { @@ -1979,7 +2069,10 @@ static void *dl_run_co_thread(XTThreadPtr self) } } + /* + * {MYSQL-THREAD-KILL} myxt_destroy_thread(mysql_thread, TRUE); + */ return NULL; } diff --git a/storage/pbxt/src/datalog_xt.h b/storage/pbxt/src/datalog_xt.h index 245ebcbaeda..2eeba7bfab4 100644 --- a/storage/pbxt/src/datalog_xt.h +++ b/storage/pbxt/src/datalog_xt.h @@ -183,8 +183,8 @@ typedef struct XTSeqLogRead { virtual xtBool sl_rnd_read(xtLogOffset log_offset, size_t size, xtWord1 *data, size_t *read, struct XTThread *thread) { (void) log_offset; (void) size; (void) data; (void) read; (void) thread; return OK; }; - virtual xtBool sl_seq_next(XTXactLogBufferDPtr *entry, xtBool verify, struct XTThread *thread) { - (void) entry; (void) verify; (void) thread; return OK; + virtual xtBool sl_seq_next(XTXactLogBufferDPtr *entry, struct XTThread *thread) { + (void) entry; (void) thread; return OK; }; virtual void sl_seq_skip(size_t size) { (void) size; } } XTSeqLogReadRec, *XTSeqLogReadPtr; @@ -195,6 +195,7 @@ typedef struct XTDataSeqRead : public XTSeqLogRead { xtLogOffset sl_rec_log_offset; /* The current log read position. */ size_t sl_record_len; /* The length of the current record. */ xtLogOffset sl_log_eof; + xtLogOffset sl_extra_garbage; /* Garbage found during a scan. */ size_t sl_buffer_size; /* Size of the buffer. */ xtLogOffset sl_buf_log_offset; /* File offset of the buffer. 
*/ @@ -208,7 +209,7 @@ typedef struct XTDataSeqRead : public XTSeqLogRead { virtual void sl_seq_pos(xtLogID *log_id, xtLogOffset *log_offset); virtual xtBool sl_seq_start(xtLogID log_id, xtLogOffset log_offset, xtBool missing_ok); virtual xtBool sl_rnd_read(xtLogOffset log_offset, size_t size, xtWord1 *data, size_t *read, struct XTThread *thread); - virtual xtBool sl_seq_next(XTXactLogBufferDPtr *entry, xtBool verify, struct XTThread *thread); + virtual xtBool sl_seq_next(XTXactLogBufferDPtr *entry, struct XTThread *thread); virtual void sl_seq_skip(size_t size); virtual void sl_seq_skip_to(off_t offset); } XTDataSeqReadRec, *XTDataSeqReadPtr; diff --git a/storage/pbxt/src/discover_xt.cc b/storage/pbxt/src/discover_xt.cc index 074132d47cb..1bb6e874a1c 100644 --- a/storage/pbxt/src/discover_xt.cc +++ b/storage/pbxt/src/discover_xt.cc @@ -493,8 +493,8 @@ mysql_prepare_create_table(THD *thd, HA_CREATE_INFO *create_info, } /* Don't pack rows in old tables if the user has requested this */ if ((sql_field->flags & BLOB_FLAG) || - sql_field->sql_type == MYSQL_TYPE_VARCHAR && - create_info->row_type != ROW_TYPE_FIXED) + (sql_field->sql_type == MYSQL_TYPE_VARCHAR && + create_info->row_type != ROW_TYPE_FIXED)) (*db_options)|= HA_OPTION_PACK_RECORD; it2.rewind(); } @@ -963,7 +963,7 @@ mysql_prepare_create_table(THD *thd, HA_CREATE_INFO *create_info, sql_field->sql_type == MYSQL_TYPE_VARCHAR || sql_field->pack_flag & FIELDFLAG_BLOB))) { - if (column_nr == 0 && (sql_field->pack_flag & FIELDFLAG_BLOB) || + if ((column_nr == 0 && (sql_field->pack_flag & FIELDFLAG_BLOB)) || sql_field->sql_type == MYSQL_TYPE_VARCHAR) key_info->flags|= HA_BINARY_PACK_KEY | HA_VAR_LENGTH_KEY; else @@ -1282,9 +1282,11 @@ warn: #endif // LOCK_OPEN_HACK_REQUIRED //------------------------------ -int xt_create_table_frm(handlerton *hton, THD* thd, const char *db, const char *name, DT_FIELD_INFO *info, DT_KEY_INFO *keys __attribute__((unused)), xtBool skip_existing) +int xt_create_table_frm(handlerton 
*hton, THD* thd, const char *db, const char *name, DT_FIELD_INFO *info, DT_KEY_INFO *XT_UNUSED(keys), xtBool skip_existing) { #ifdef DRIZZLED + drizzled::message::Table table_proto; + static const char *ext = ".dfe"; static const int ext_len = 4; #else @@ -1329,8 +1331,7 @@ int xt_create_table_frm(handlerton *hton, THD* thd, const char *db, const char * info->field_flags, COLUMN_FORMAT_TYPE_FIXED, NULL /*default_value*/, NULL /*on_update_value*/, &comment, NULL /*change*/, - NULL /*interval_list*/, info->field_charset, - NULL /*vcol_info*/)) + NULL /*interval_list*/, info->field_charset)) #else if (add_field_to_list(thd, &field_name, info->field_type, field_length_ptr, info->field_decimal_length, info->field_flags, @@ -1365,7 +1366,10 @@ int xt_create_table_frm(handlerton *hton, THD* thd, const char *db, const char * /* Create an internal temp table */ #ifdef DRIZZLED - if (mysql_create_table_no_lock(thd, db, name, &mylex.create_info, &mylex.alter_info, 1, 0, false)) + table_proto.set_name(name); + table_proto.set_type(drizzled::message::Table::STANDARD); + + if (mysql_create_table_no_lock(thd, db, name, &mylex.create_info, &table_proto, &mylex.alter_info, 1, 0, false)) goto error; #else if (mysql_create_table_no_lock(thd, db, name, &mylex.create_info, &mylex.alter_info, 1, 0)) diff --git a/storage/pbxt/src/filesys_xt.cc b/storage/pbxt/src/filesys_xt.cc index 5ca36cd9244..2147626b20d 100644 --- a/storage/pbxt/src/filesys_xt.cc +++ b/storage/pbxt/src/filesys_xt.cc @@ -23,6 +23,10 @@ #include "xt_config.h" +#ifdef DRIZZLED +#include <bitset> +#endif + #ifndef XT_WIN #include <unistd.h> #include <dirent.h> @@ -50,6 +54,8 @@ //#define DEBUG_TRACE_IO //#define DEBUG_TRACE_MAP_IO //#define DEBUG_TRACE_FILES +//#define INJECT_WRITE_REMAP_ERROR +/* This is required to make testing on the Mac faster: */ #endif #ifdef DEBUG_TRACE_FILES @@ -57,6 +63,15 @@ #define PRINTF xt_trace #endif +#if defined(XT_MAC) && defined(F_FULLFSYNC) +#undef F_FULLFSYNC +#endif + +#ifdef 
INJECT_WRITE_REMAP_ERROR +#define INJECT_REMAP_FILE_SIZE 1000000 +#define INJECT_REMAP_FILE_TYPE "xtd" +#endif + /* ---------------------------------------------------------------------- * Globals */ @@ -127,11 +142,11 @@ static void fs_close_fmap(XTThreadPtr self, XTFileMemMapPtr mm) mm->mm_start = NULL; } #endif - xt_rwmutex_free(self, &mm->mm_lock); + FILE_MAP_FREE_LOCK(self, &mm->mm_lock); xt_free(self, mm); } -static void fs_free_file(XTThreadPtr self, void *thunk __attribute__((unused)), void *item) +static void fs_free_file(XTThreadPtr self, void *XT_UNUSED(thunk), void *item) { XTFilePtr file_ptr = *((XTFilePtr *) item); @@ -148,17 +163,13 @@ static void fs_free_file(XTThreadPtr self, void *thunk __attribute__((unused)), file_ptr->fil_filedes = XT_NULL_FD; } - if (file_ptr->fil_memmap) { - fs_close_fmap(self, file_ptr->fil_memmap); - file_ptr->fil_memmap = NULL; - } - #ifdef DEBUG_TRACE_FILES PRINTF("%s: free file: (%d) %s\n", self->t_name, (int) file_ptr->fil_id, file_ptr->fil_path ? 
xt_last_2_names_of_path(file_ptr->fil_path) : "?"); #endif if (!file_ptr->fil_ref_count) { + ASSERT_NS(!file_ptr->fil_handle_count); /* Flush any cache before this file is invalid: */ if (file_ptr->fil_path) { xt_free(self, file_ptr->fil_path); @@ -169,7 +180,7 @@ static void fs_free_file(XTThreadPtr self, void *thunk __attribute__((unused)), } } -static int fs_comp_file(XTThreadPtr self __attribute__((unused)), register const void *thunk __attribute__((unused)), register const void *a, register const void *b) +static int fs_comp_file(XTThreadPtr XT_UNUSED(self), register const void *XT_UNUSED(thunk), register const void *a, register const void *b) { char *file_name = (char *) a; XTFilePtr file_ptr = *((XTFilePtr *) b); @@ -177,7 +188,7 @@ static int fs_comp_file(XTThreadPtr self __attribute__((unused)), register const return strcmp(file_name, file_ptr->fil_path); } -static int fs_comp_file_ci(XTThreadPtr self __attribute__((unused)), register const void *thunk __attribute__((unused)), register const void *a, register const void *b) +static int fs_comp_file_ci(XTThreadPtr XT_UNUSED(self), register const void *XT_UNUSED(thunk), register const void *a, register const void *b) { char *file_name = (char *) a; XTFilePtr file_ptr = *((XTFilePtr *) b); @@ -868,11 +879,22 @@ xtPublic xtBool xt_flush_file(XTOpenFilePtr of, XTIOStatsPtr stat, XTThreadPtr X goto failed; } #else + /* Mac OS X has problems with fsync. We had several cases of index corruption presumably because + * fsync didn't really flush index pages to disk. fcntl(F_FULLFSYNC) is considered more effective + * in such case. 
+ */ +#ifdef F_FULLFSYNC + if (fcntl(of->of_filedes, F_FULLFSYNC, 0) == -1) { + xt_register_ferrno(XT_REG_CONTEXT, errno, xt_file_path(of)); + goto failed; + } +#else if (fsync(of->of_filedes) == -1) { xt_register_ferrno(XT_REG_CONTEXT, errno, xt_file_path(of)); goto failed; } #endif +#endif #ifdef DEBUG_TRACE_IO xt_trace("/* %s */ pbxt_file_sync(\"%s\");\n", xt_trace_clock_diff(timef, start), of->fr_file->fil_path); #endif @@ -938,6 +960,29 @@ xtBool xt_pread_file(XTOpenFilePtr of, off_t offset, size_t size, size_t min_siz return OK; } +xtPublic xtBool xt_lock_file_ptr(XTOpenFilePtr of, xtWord1 **data, off_t offset, size_t size, XTIOStatsPtr stat, XTThreadPtr thread) +{ + size_t red_size; + + if (!*data) { + if (!(*data = (xtWord1 *) xt_malloc_ns(size))) + return FAILED; + } + + if (!xt_pread_file(of, offset, size, 0, *data, &red_size, stat, thread)) + return FAILED; + + //if (red_size < size) + // memset(); + return OK; +} + +xtPublic void xt_unlock_file_ptr(XTOpenFilePtr XT_UNUSED(of), xtWord1 *data, XTThreadPtr XT_UNUSED(thread)) +{ + if (data) + xt_free_ns(data); +} + /* ---------------------------------------------------------------------- * Directory operations */ @@ -949,7 +994,13 @@ XTOpenDirPtr xt_dir_open(XTThreadPtr self, c_char *path, c_char *filter) { XTOpenDirPtr od; - pushsr_(od, xt_dir_close, (XTOpenDirPtr) xt_calloc(self, sizeof(XTOpenDirRec))); +#ifdef XT_SOLARIS + /* see the comment in filesys_xt.h */ + size_t sz = pathconf(path, _PC_NAME_MAX) + sizeof(XTOpenDirRec) + 1; +#else + size_t sz = sizeof(XTOpenDirRec); +#endif + pushsr_(od, xt_dir_close, (XTOpenDirPtr) xt_calloc(self, sz)); #ifdef XT_WIN size_t len; @@ -976,7 +1027,6 @@ XTOpenDirPtr xt_dir_open(XTThreadPtr self, c_char *path, c_char *filter) if (!od->od_dir) xt_throw_ferrno(XT_CONTEXT, errno, path); #endif - popr_(); // Discard xt_dir_close(od) return od; } @@ -1097,7 +1147,7 @@ xtBool xt_dir_next(XTThreadPtr self, XTOpenDirPtr od) } #endif -char *xt_dir_name(XTThreadPtr self 
__attribute__((unused)), XTOpenDirPtr od) +char *xt_dir_name(XTThreadPtr XT_UNUSED(self), XTOpenDirPtr od) { #ifdef XT_WIN return od->od_data.cFileName; @@ -1106,8 +1156,9 @@ char *xt_dir_name(XTThreadPtr self __attribute__((unused)), XTOpenDirPtr od) #endif } -xtBool xt_dir_is_file(XTThreadPtr self __attribute__((unused)), XTOpenDirPtr od) +xtBool xt_dir_is_file(XTThreadPtr self, XTOpenDirPtr od) { + (void) self; #ifdef XT_WIN if (od->od_data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) return FALSE; @@ -1156,6 +1207,15 @@ off_t xt_dir_file_size(XTThreadPtr self, XTOpenDirPtr od) static xtBool fs_map_file(XTFileMemMapPtr mm, XTFilePtr file, xtBool grow) { +#ifdef INJECT_WRITE_REMAP_ERROR + if (xt_is_extension(file->fil_path, INJECT_REMAP_FILE_TYPE)) { + if (mm->mm_length > INJECT_REMAP_FILE_SIZE) { + xt_register_ferrno(XT_REG_CONTEXT, 30, file->fil_path); + return FAILED; + } + } +#endif + ASSERT_NS(!mm->mm_start); #ifdef XT_WIN /* This will grow the file to the given size: */ @@ -1228,7 +1288,7 @@ xtPublic XTMapFilePtr xt_open_fmap(XTThreadPtr self, char *file, size_t grow_siz /* NULL is the value returned on error! 
*/ mm->mm_mapdes = NULL; #endif - xt_rwmutex_init_with_autoname(self, &mm->mm_lock); + FILE_MAP_INIT_LOCK(self, &mm->mm_lock); mm->mm_length = fs_seek_eof(self, map->fr_file->fil_filedes, map->fr_file); if (sizeof(size_t) == 4 && mm->mm_length >= (off_t) 0xFFFFFFFF) xt_throw_ixterr(XT_CONTEXT, XT_ERR_FILE_TOO_LONG, map->fr_file->fil_path); @@ -1257,21 +1317,19 @@ xtPublic XTMapFilePtr xt_open_fmap(XTThreadPtr self, char *file, size_t grow_siz xtPublic void xt_close_fmap(XTThreadPtr self, XTMapFilePtr map) { + ASSERT_NS(!map->mf_slock_count); if (map->fr_file) { - xt_fs_release_file(self, map->fr_file); - xt_sl_lock(self, fs_globals.fsg_open_files); - pushr_(xt_sl_unlock, fs_globals.fsg_open_files); - + pushr_(xt_sl_unlock, fs_globals.fsg_open_files); map->fr_file->fil_handle_count--; - if (!map->fr_file->fil_handle_count) - fs_free_file(self, NULL, &map->fr_file); - + if (!map->fr_file->fil_handle_count) { + fs_close_fmap(self, map->fr_file->fil_memmap); + map->fr_file->fil_memmap = NULL; + } freer_(); - + + xt_fs_release_file(self, map->fr_file); map->fr_file = NULL; - - } map->mf_memmap = NULL; xt_free(self, map); @@ -1346,14 +1404,23 @@ static xtBool fs_remap_file(XTMapFilePtr map, off_t offset, size_t size, XTIOSta } mm->mm_start = NULL; #ifdef XT_WIN - if (!CloseHandle(mm->mm_mapdes)) + /* It is possible that a previous remap attempt has failed: the map was closed + * but the new map was not allocated (e.g. because of insufficient disk space). + * In this case mm->mm_mapdes will be NULL. 
+ */ + if (mm->mm_mapdes && !CloseHandle(mm->mm_mapdes)) return xt_register_ferrno(XT_REG_CONTEXT, fs_get_win_error(), xt_file_path(map)); mm->mm_mapdes = NULL; #endif + off_t old_size = mm->mm_length; mm->mm_length = new_size; - if (!fs_map_file(mm, map->fr_file, TRUE)) + if (!fs_map_file(mm, map->fr_file, TRUE)) { + /* Try to restore old mapping */ + mm->mm_length = old_size; + fs_map_file(mm, map->fr_file, FALSE); return FAILED; + } } return OK; @@ -1367,16 +1434,19 @@ static xtBool fs_remap_file(XTMapFilePtr map, off_t offset, size_t size, XTIOSta xtPublic xtBool xt_pwrite_fmap(XTMapFilePtr map, off_t offset, size_t size, void *data, XTIOStatsPtr stat, XTThreadPtr thread) { XTFileMemMapPtr mm = map->mf_memmap; +#ifndef FILE_MAP_USE_PTHREAD_RW xtThreadID thd_id = thread->t_id; +#endif #ifdef DEBUG_TRACE_MAP_IO xt_trace("/* %s */ pbxt_fmap_writ(\"%s\", %lu, %lu);\n", xt_trace_clock_diff(NULL), map->fr_file->fil_path, (u_long) offset, (u_long) size); #endif - xt_rwmutex_slock(&mm->mm_lock, thd_id); + ASSERT_NS(!map->mf_slock_count); + FILE_MAP_READ_LOCK(&mm->mm_lock, thd_id); if (!mm->mm_start || offset + (off_t) size > mm->mm_length) { - xt_rwmutex_unlock(&mm->mm_lock, thd_id); + FILE_MAP_UNLOCK(&mm->mm_lock, thd_id); - xt_rwmutex_xlock(&mm->mm_lock, thd_id); + FILE_MAP_WRITE_LOCK(&mm->mm_lock, thd_id); if (!fs_remap_file(map, offset, size, stat)) goto failed; } @@ -1396,29 +1466,32 @@ xtPublic xtBool xt_pwrite_fmap(XTMapFilePtr map, off_t offset, size_t size, void memcpy(mm->mm_start + offset, data, size); #endif - xt_rwmutex_unlock(&mm->mm_lock, thd_id); + FILE_MAP_UNLOCK(&mm->mm_lock, thd_id); stat->ts_write += size; return OK; failed: - xt_rwmutex_unlock(&mm->mm_lock, thd_id); + FILE_MAP_UNLOCK(&mm->mm_lock, thd_id); return FAILED; } xtPublic xtBool xt_pread_fmap_4(XTMapFilePtr map, off_t offset, xtWord4 *value, XTIOStatsPtr stat, XTThreadPtr thread) { XTFileMemMapPtr mm = map->mf_memmap; +#ifndef FILE_MAP_USE_PTHREAD_RW xtThreadID thd_id = thread->t_id; 
+#endif #ifdef DEBUG_TRACE_MAP_IO xt_trace("/* %s */ pbxt_fmap_read_4(\"%s\", %lu, 4);\n", xt_trace_clock_diff(NULL), map->fr_file->fil_path, (u_long) offset); #endif - xt_rwmutex_slock(&mm->mm_lock, thd_id); + if (!map->mf_slock_count) + FILE_MAP_READ_LOCK(&mm->mm_lock, thd_id); if (!mm->mm_start) { - xt_rwmutex_unlock(&mm->mm_lock, thd_id); - xt_rwmutex_xlock(&mm->mm_lock, thd_id); + FILE_MAP_UNLOCK(&mm->mm_lock, thd_id); + FILE_MAP_WRITE_LOCK(&mm->mm_lock, thd_id); if (!fs_remap_file(map, 0, 0, stat)) { - xt_rwmutex_unlock(&mm->mm_lock, thd_id); + FILE_MAP_UNLOCK(&mm->mm_lock, thd_id); return FAILED; } } @@ -1436,7 +1509,7 @@ xtPublic xtBool xt_pread_fmap_4(XTMapFilePtr map, off_t offset, xtWord4 *value, } __except(EXCEPTION_EXECUTE_HANDLER) { - xt_rwmutex_unlock(&mm->mm_lock, thd_id); + FILE_MAP_UNLOCK(&mm->mm_lock, thd_id); return xt_register_ferrno(XT_REG_CONTEXT, GetExceptionCode(), xt_file_path(map)); } #else @@ -1444,7 +1517,8 @@ xtPublic xtBool xt_pread_fmap_4(XTMapFilePtr map, off_t offset, xtWord4 *value, #endif } - xt_rwmutex_unlock(&mm->mm_lock, thd_id); + if (!map->mf_slock_count) + FILE_MAP_UNLOCK(&mm->mm_lock, thd_id); stat->ts_read += 4; return OK; } @@ -1452,7 +1526,9 @@ xtPublic xtBool xt_pread_fmap_4(XTMapFilePtr map, off_t offset, xtWord4 *value, xtPublic xtBool xt_pread_fmap(XTMapFilePtr map, off_t offset, size_t size, size_t min_size, void *data, size_t *red_size, XTIOStatsPtr stat, XTThreadPtr thread) { XTFileMemMapPtr mm = map->mf_memmap; +#ifndef FILE_MAP_USE_PTHREAD_RW xtThreadID thd_id = thread->t_id; +#endif size_t tfer; #ifdef DEBUG_TRACE_MAP_IO @@ -1461,6 +1537,8 @@ xtPublic xtBool xt_pread_fmap(XTMapFilePtr map, off_t offset, size_t size, size_ /* NOTE!! The file map may already be locked, * by a call to xt_lock_fmap_ptr()! * + * 20.05.2009: This problem should be fixed now with mf_slock_count! 
+ * * This can occur during a sequential scan: * xt_pread_fmap() Line 1330 * XTTabCache::tc_read_direct() Line 361 @@ -1491,13 +1569,16 @@ xtPublic xtBool xt_pread_fmap(XTMapFilePtr map, off_t offset, size_t size, size_ * As a result, the slock must be able to handle * nested calls to lock/unlock. */ - xt_rwmutex_slock(&mm->mm_lock, thd_id); + if (!map->mf_slock_count) + FILE_MAP_READ_LOCK(&mm->mm_lock, thd_id); tfer = size; if (!mm->mm_start) { - xt_rwmutex_unlock(&mm->mm_lock, thd_id); - xt_rwmutex_xlock(&mm->mm_lock, thd_id); + FILE_MAP_UNLOCK(&mm->mm_lock, thd_id); + ASSERT_NS(!map->mf_slock_count); + FILE_MAP_WRITE_LOCK(&mm->mm_lock, thd_id); if (!fs_remap_file(map, 0, 0, stat)) { - xt_rwmutex_unlock(&mm->mm_lock, thd_id); + if (!map->mf_slock_count) + FILE_MAP_UNLOCK(&mm->mm_lock, thd_id); return FAILED; } } @@ -1514,7 +1595,8 @@ xtPublic xtBool xt_pread_fmap(XTMapFilePtr map, off_t offset, size_t size, size_ } __except(EXCEPTION_EXECUTE_HANDLER) { - xt_rwmutex_unlock(&mm->mm_lock, thd_id); + if (!map->mf_slock_count) + FILE_MAP_UNLOCK(&mm->mm_lock, thd_id); return xt_register_ferrno(XT_REG_CONTEXT, GetExceptionCode(), xt_file_path(map)); } #else @@ -1522,7 +1604,8 @@ xtPublic xtBool xt_pread_fmap(XTMapFilePtr map, off_t offset, size_t size, size_ #endif } - xt_rwmutex_unlock(&mm->mm_lock, thd_id); + if (!map->mf_slock_count) + FILE_MAP_UNLOCK(&mm->mm_lock, thd_id); if (tfer < min_size) return xt_register_ferrno(XT_REG_CONTEXT, ESPIPE, xt_file_path(map)); @@ -1535,18 +1618,23 @@ xtPublic xtBool xt_pread_fmap(XTMapFilePtr map, off_t offset, size_t size, size_ xtPublic xtBool xt_flush_fmap(XTMapFilePtr map, XTIOStatsPtr stat, XTThreadPtr thread) { XTFileMemMapPtr mm = map->mf_memmap; +#ifndef FILE_MAP_USE_PTHREAD_RW xtThreadID thd_id = thread->t_id; +#endif xtWord8 s; #ifdef DEBUG_TRACE_MAP_IO xt_trace("/* %s */ pbxt_fmap_sync(\"%s\");\n", xt_trace_clock_diff(NULL), map->fr_file->fil_path); #endif - xt_rwmutex_slock(&mm->mm_lock, thd_id); + if 
(!map->mf_slock_count) + FILE_MAP_READ_LOCK(&mm->mm_lock, thd_id); if (!mm->mm_start) { - xt_rwmutex_unlock(&mm->mm_lock, thd_id); - xt_rwmutex_xlock(&mm->mm_lock, thd_id); + FILE_MAP_UNLOCK(&mm->mm_lock, thd_id); + ASSERT_NS(!map->mf_slock_count); + FILE_MAP_WRITE_LOCK(&mm->mm_lock, thd_id); if (!fs_remap_file(map, 0, 0, stat)) { - xt_rwmutex_unlock(&mm->mm_lock, thd_id); + if (!map->mf_slock_count) + FILE_MAP_UNLOCK(&mm->mm_lock, thd_id); return FAILED; } } @@ -1562,7 +1650,8 @@ xtPublic xtBool xt_flush_fmap(XTMapFilePtr map, XTIOStatsPtr stat, XTThreadPtr t goto failed; } #endif - xt_rwmutex_unlock(&mm->mm_lock, thd_id); + if (!map->mf_slock_count) + FILE_MAP_UNLOCK(&mm->mm_lock, thd_id); s = stat->ts_flush_start; stat->ts_flush_start = 0; stat->ts_flush_time += xt_trace_clock() - s; @@ -1570,22 +1659,27 @@ xtPublic xtBool xt_flush_fmap(XTMapFilePtr map, XTIOStatsPtr stat, XTThreadPtr t return OK; failed: - xt_rwmutex_unlock(&mm->mm_lock, thd_id); + if (!map->mf_slock_count) + FILE_MAP_UNLOCK(&mm->mm_lock, thd_id); s = stat->ts_flush_start; stat->ts_flush_start = 0; stat->ts_flush_time += xt_trace_clock() - s; return FAILED; } -xtPublic xtWord1 *xt_lock_fmap_ptr(XTMapFilePtr map, off_t offset, size_t size, XTIOStatsPtr stat, XTThreadPtr XT_UNUSED(thread)) +xtPublic xtWord1 *xt_lock_fmap_ptr(XTMapFilePtr map, off_t offset, size_t size, XTIOStatsPtr stat, XTThreadPtr thread) { XTFileMemMapPtr mm = map->mf_memmap; +#ifndef FILE_MAP_USE_PTHREAD_RW xtThreadID thd_id = thread->t_id; +#endif - xt_rwmutex_slock(&mm->mm_lock, thd_id); + if (!map->mf_slock_count) + FILE_MAP_READ_LOCK(&mm->mm_lock, thd_id); + map->mf_slock_count++; if (!mm->mm_start) { - xt_rwmutex_unlock(&mm->mm_lock, thd_id); - xt_rwmutex_xlock(&mm->mm_lock, thd_id); + FILE_MAP_UNLOCK(&mm->mm_lock, thd_id); + FILE_MAP_WRITE_LOCK(&mm->mm_lock, thd_id); if (!fs_remap_file(map, 0, 0, stat)) goto failed; } @@ -1599,13 +1693,17 @@ xtPublic xtWord1 *xt_lock_fmap_ptr(XTMapFilePtr map, off_t offset, size_t size, 
return mm->mm_start + offset; failed: - xt_rwmutex_unlock(&mm->mm_lock, thd_id); + map->mf_slock_count--; + if (!map->mf_slock_count) + FILE_MAP_UNLOCK(&mm->mm_lock, thd_id); return NULL; } xtPublic void xt_unlock_fmap_ptr(XTMapFilePtr map, XTThreadPtr thread) { - xt_rwmutex_unlock(&map->mf_memmap->mm_lock, thread->t_id); + map->mf_slock_count--; + if (!map->mf_slock_count) + FILE_MAP_UNLOCK(&map->mf_memmap->mm_lock, thread->t_id); } /* ---------------------------------------------------------------------- diff --git a/storage/pbxt/src/filesys_xt.h b/storage/pbxt/src/filesys_xt.h index ebc4f474fc9..585ed9bd8d0 100644 --- a/storage/pbxt/src/filesys_xt.h +++ b/storage/pbxt/src/filesys_xt.h @@ -76,13 +76,60 @@ xtBool xt_fs_rename(struct XTThread *self, char *from_path, char *to_path); #define XT_NULL_FD (-1) #endif +/* Note, this lock must be re-entrant, + * The only lock that satifies this is + * FILE_MAP_USE_RWMUTEX! + * + * 20.05.2009: This problem should be fixed now with mf_slock_count! 
+ * + * The lock need no longer be re-entrant + */ +#ifdef XT_NO_ATOMICS +#define FILE_MAP_USE_PTHREAD_RW +#else +//#define FILE_MAP_USE_RWMUTEX +//#define FILE_MAP_USE_PTHREAD_RW +//#define IDX_USE_SPINXSLOCK +#define FILE_MAP_USE_XSMUTEX +#endif + +#ifdef FILE_MAP_USE_XSMUTEX +#define FILE_MAP_LOCK_TYPE XTXSMutexRec +#define FILE_MAP_INIT_LOCK(s, i) xt_xsmutex_init_with_autoname(s, i) +#define FILE_MAP_FREE_LOCK(s, i) xt_xsmutex_free(s, i) +#define FILE_MAP_READ_LOCK(i, o) xt_xsmutex_slock(i, o) +#define FILE_MAP_WRITE_LOCK(i, o) xt_xsmutex_xlock(i, o) +#define FILE_MAP_UNLOCK(i, o) xt_xsmutex_unlock(i, o) +#elif defined(FILE_MAP_USE_PTHREAD_RW) +#define FILE_MAP_LOCK_TYPE xt_rwlock_type +#define FILE_MAP_INIT_LOCK(s, i) xt_init_rwlock(s, i) +#define FILE_MAP_FREE_LOCK(s, i) xt_free_rwlock(i) +#define FILE_MAP_READ_LOCK(i, o) xt_slock_rwlock_ns(i) +#define FILE_MAP_WRITE_LOCK(i, o) xt_xlock_rwlock_ns(i) +#define FILE_MAP_UNLOCK(i, o) xt_unlock_rwlock_ns(i) +#elif defined(FILE_MAP_USE_RWMUTEX) +#define FILE_MAP_LOCK_TYPE XTRWMutexRec +#define FILE_MAP_INIT_LOCK(s, i) xt_rwmutex_init_with_autoname(s, i) +#define FILE_MAP_FREE_LOCK(s, i) xt_rwmutex_free(s, i) +#define FILE_MAP_READ_LOCK(i, o) xt_rwmutex_slock(i, o) +#define FILE_MAP_WRITE_LOCK(i, o) xt_rwmutex_xlock(i, o) +#define FILE_MAP_UNLOCK(i, o) xt_rwmutex_unlock(i, o) +#elif defined(FILE_MAP_USE_SPINXSLOCK) +#define FILE_MAP_LOCK_TYPE XTSpinXSLockRec +#define FILE_MAP_INIT_LOCK(s, i) xt_spinxslock_init_with_autoname(s, i) +#define FILE_MAP_FREE_LOCK(s, i) xt_spinxslock_free(s, i) +#define FILE_MAP_READ_LOCK(i, o) xt_spinxslock_slock(i, o) +#define FILE_MAP_WRITE_LOCK(i, o) xt_spinxslock_xlock(i, o) +#define FILE_MAP_UNLOCK(i, o) xt_spinxslock_unlock(i, o) +#endif + typedef struct XTFileMemMap { xtWord1 *mm_start; /* The in-memory start of the map. */ #ifdef XT_WIN HANDLE mm_mapdes; #endif off_t mm_length; /* The length of the file map. */ - XTRWMutexRec mm_lock; /* The file map R/W lock. 
*/ + FILE_MAP_LOCK_TYPE mm_lock; /* The file map R/W lock. */ size_t mm_grow_size; /* The amount by which the map file is increased. */ } XTFileMemMapRec, *XTFileMemMapPtr; @@ -127,6 +174,9 @@ xtBool xt_pwrite_file(XTOpenFilePtr of, off_t offset, size_t size, void *data, xtBool xt_pread_file(XTOpenFilePtr of, off_t offset, size_t size, size_t min_size, void *data, size_t *red_size, struct XTIOStats *timer, struct XTThread *thread); xtBool xt_flush_file(XTOpenFilePtr of, struct XTIOStats *timer, struct XTThread *thread); +xtBool xt_lock_file_ptr(XTOpenFilePtr of, xtWord1 **data, off_t offset, size_t size, struct XTIOStats *timer, struct XTThread *thread); +void xt_unlock_file_ptr(XTOpenFilePtr of, xtWord1 *data, struct XTThread *thread); + typedef struct XTOpenDir { char *od_path; #ifdef XT_WIN @@ -134,8 +184,14 @@ typedef struct XTOpenDir { WIN32_FIND_DATA od_data; #else char *od_filter; - struct dirent od_entry; DIR *od_dir; + /* WARNING: Solaris requires od_entry.d_name member to have size at least as returned + * by pathconf() function on per-directory basis. This makes it impossible to statically + * pre-set the size. So xt_dir_open on Solaris dynamically allocates space as needed. + * + * This also means that the od_entry member should always be last in the XTOpenDir structure. 
+ */ + struct dirent od_entry; #endif } XTOpenDirRec, *XTOpenDirPtr; @@ -147,6 +203,7 @@ xtBool xt_dir_is_file(struct XTThread *self, XTOpenDirPtr od); off_t xt_dir_file_size(struct XTThread *self, XTOpenDirPtr od); typedef struct XTMapFile : public XTFileRef { + u_int mf_slock_count; XTFileMemMapPtr mf_memmap; } XTMapFileRec, *XTMapFilePtr; diff --git a/storage/pbxt/src/ha_pbxt.cc b/storage/pbxt/src/ha_pbxt.cc index 86cea271e0d..315b7ff74d6 100644 --- a/storage/pbxt/src/ha_pbxt.cc +++ b/storage/pbxt/src/ha_pbxt.cc @@ -65,12 +65,13 @@ extern "C" char **session_query(Session *session); #include "heap_xt.h" #include "myxt_xt.h" #include "datadic_xt.h" -#ifdef XT_STREAMING -#include "streaming_xt.h" +#ifdef PBMS_ENABLED +#include "pbms_enabled.h" #endif #include "tabcache_xt.h" #include "systab_xt.h" #include "xaction_xt.h" +#include "restart_xt.h" #ifdef DEBUG //#define XT_USE_SYS_PAR_DEBUG_SIZES @@ -91,16 +92,16 @@ extern "C" char **session_query(Session *session); //#define PRINT_STATEMENTS #endif +#ifndef DRIZZLED static handler *pbxt_create_handler(handlerton *hton, TABLE_SHARE *table, MEM_ROOT *mem_root); static int pbxt_init(void *p); static int pbxt_end(void *p); -#ifndef DRIZZLED static int pbxt_panic(handlerton *hton, enum ha_panic_function flag); -#endif static void pbxt_drop_database(handlerton *hton, char *path); static int pbxt_close_connection(handlerton *hton, THD* thd); static int pbxt_commit(handlerton *hton, THD *thd, bool all); static int pbxt_rollback(handlerton *hton, THD *thd, bool all); +#endif static void ha_aquire_exclusive_use(XTThreadPtr self, XTSharePtr share, ha_pbxt *mine); static void ha_release_exclusive_use(XTThreadPtr self, XTSharePtr share); static void ha_close_open_tables(XTThreadPtr self, XTSharePtr share, ha_pbxt *mine); @@ -167,7 +168,7 @@ xtBool pbxt_crash_debug = FALSE; /* Variables for pbxt share methods */ static xt_mutex_type pbxt_database_mutex; // Prevent a database from being opened while it is being dropped static 
XTHashTabPtr pbxt_share_tables; // Hash used to track open tables -static XTDatabaseHPtr pbxt_database = NULL; // The global open database +XTDatabaseHPtr pbxt_database = NULL; // The global open database static char *pbxt_index_cache_size; static char *pbxt_record_cache_size; static char *pbxt_log_cache_size; @@ -178,6 +179,7 @@ static char *pbxt_checkpoint_frequency; static char *pbxt_data_log_threshold; static char *pbxt_data_file_grow_size; static char *pbxt_row_file_grow_size; +static int pbxt_max_threads; #ifdef DEBUG #define XT_SHARE_LOCK_WAIT 5000 @@ -454,14 +456,20 @@ xtPublic void xt_ha_close_global_database(XTThreadPtr self) * operation to make sure it does not occur while * some other thread is doing a "closeall". */ -xtPublic void xt_ha_open_database_of_table(XTThreadPtr self, XTPathStrPtr table_path __attribute__((unused))) +xtPublic void xt_ha_open_database_of_table(XTThreadPtr self, XTPathStrPtr XT_UNUSED(table_path)) { #ifdef XT_USE_GLOBAL_DB if (!self->st_database) { if (!pbxt_database) { xt_open_database(self, mysql_real_data_home, TRUE); - pbxt_database = self->st_database; - xt_heap_reference(self, pbxt_database); + /* {GLOBAL-DB} + * This can be done at the same time as the recovery thread, + * strictly speaking I need a lock. + */ + if (!pbxt_database) { + pbxt_database = self->st_database; + xt_heap_reference(self, pbxt_database); + } } else xt_use_database(self, pbxt_database, XT_FOR_USER); @@ -574,7 +582,7 @@ xtPublic XTThreadPtr xt_ha_thd_to_self(THD *thd) } /* The first bit is 1. 
*/ -static u_int ha_get_max_bit(MY_BITMAP *map) +static u_int ha_get_max_bit(MX_BITMAP *map) { my_bitmap_map *data_ptr = map->bitmap; my_bitmap_map *end_ptr = map->last_word_ptr; @@ -676,7 +684,7 @@ xtPublic int xt_ha_pbxt_to_mysql_error(int xt_err) return(-1); // Unknown error } -xtPublic int xt_ha_pbxt_thread_error_for_mysql(THD *thd __attribute__((unused)), const XTThreadPtr self, int ignore_dup_key) +xtPublic int xt_ha_pbxt_thread_error_for_mysql(THD *XT_UNUSED(thd), const XTThreadPtr self, int ignore_dup_key) { int xt_err = self->t_exception.e_xt_err; @@ -960,13 +968,15 @@ static void pbxt_call_exit(XTThreadPtr self) */ static void ha_exit(XTThreadPtr self) { + xt_xres_wait_for_recovery(self); + /* Wrap things up... */ xt_unuse_database(self, self); /* Just in case the main thread has a database in use (for testing)? */ /* This may cause the streaming engine to cleanup connections and * tables belonging to this engine. This in turn may require some of * the stuff below (like xt_create_thread() called from pbxt_close_table()! */ -#ifdef XT_STREAMING - xt_exit_streaming(); +#ifdef PBMS_ENABLED + pbms_finalize(); #endif pbxt_call_exit(self); xt_exit_threading(self); @@ -979,9 +989,13 @@ static void ha_exit(XTThreadPtr self) /* * Outout the PBXT status. Return FALSE on error. 
*/ -static bool pbxt_show_status(handlerton *hton __attribute__((unused)), THD* thd, +#ifdef DRIZZLED +bool PBXTStorageEngine::show_status(Session *thd, stat_print_fn *stat_print, enum ha_stat_type) +#else +static bool pbxt_show_status(handlerton *XT_UNUSED(hton), THD* thd, stat_print_fn* stat_print, - enum ha_stat_type stat_type __attribute__((unused))) + enum ha_stat_type XT_UNUSED(stat_type)) +#endif { XTThreadPtr self; int err = 0; @@ -997,6 +1011,9 @@ static bool pbxt_show_status(handlerton *hton __attribute__((unused)), THD* thd, xt_trace("// %s - dump\n", xt_trace_clock_diff(NULL)); xt_dump_trace(); #endif +#ifdef XT_TRACK_CONNECTIONS + xt_dump_conn_tracking(); +#endif try_(a) { myxt_get_status(self, &strbuf); @@ -1020,14 +1037,18 @@ static bool pbxt_show_status(handlerton *hton __attribute__((unused)), THD* thd, * * return 1 on error, else 0. */ +#ifdef DRIZZLED +static int pbxt_init(PluginRegistry &registry) +#else static int pbxt_init(void *p) +#endif { int init_err = 0; XT_TRACE_CALL(); if (sizeof(xtWordPS) != sizeof(void *)) { - printf("PBXT: This won't work, I require that sizeof(xtWordPS) != sizeof(void *)!\n"); + printf("PBXT: This won't work, I require that sizeof(xtWordPS) == sizeof(void *)!\n"); XT_RETURN(1); } @@ -1045,28 +1066,31 @@ static int pbxt_init(void *p) xt_p_mutex_init_with_autoname(&pbxt_database_mutex, NULL); +#ifdef DRIZZLED + pbxt_hton= new PBXTStorageEngine(std::string("PBXT")); + registry.add(pbxt_hton); +#else pbxt_hton = (handlerton *) p; pbxt_hton->state = SHOW_OPTION_YES; -#ifndef DRIZZLED pbxt_hton->db_type = DB_TYPE_PBXT; // Wow! I have my own! -#endif pbxt_hton->close_connection = pbxt_close_connection; /* close_connection, cleanup thread related data.
*/ pbxt_hton->commit = pbxt_commit; /* commit */ pbxt_hton->rollback = pbxt_rollback; /* rollback */ pbxt_hton->create = pbxt_create_handler; /* Create a new handler */ pbxt_hton->drop_database = pbxt_drop_database; /* Drop a database */ -#ifndef DRIZZLED pbxt_hton->panic = pbxt_panic; /* Panic call */ -#endif pbxt_hton->show_status = pbxt_show_status; pbxt_hton->flags = HTON_NO_FLAGS; /* HTON_CAN_RECREATE - Without this flags TRUNCATE uses delete_all_rows() */ - +#endif if (!xt_init_logging()) /* Initialize logging */ goto error_1; -#ifdef XT_STREAMING - if (!xt_init_streaming()) +#ifdef PBMS_ENABLED + PBMSResultRec result; + if (!pbms_initialize("PBXT", false, &result)) { + xt_logf(XT_NT_ERROR, "pbms_initialize() Error: %s", result.mr_message); goto error_2; + } #endif if (!xt_init_memory()) /* Initialize memory */ @@ -1082,9 +1106,13 @@ static int pbxt_init(void *p) * +1 Free'er thread * +1 Temporary thread (e.g. TempForClose, TempForEnd) */ - self = xt_init_threading(max_connections + 7); /* Create the main self: */ +#ifndef DRIZZLED + if (pbxt_max_threads == 0) + pbxt_max_threads = max_connections + 7; +#endif + self = xt_init_threading(pbxt_max_threads); /* Create the main self: */ if (!self) - goto error_4; + goto error_3; pbxt_inited = true; @@ -1111,7 +1139,7 @@ static int pbxt_init(void *p) ASSERT(!pbxt_database); { THD *curr_thd = current_thd; - THD *thd = curr_thd; + THD *thd = NULL; #ifndef DRIZZLED extern myxt_mutex_t LOCK_plugin; @@ -1148,21 +1176,23 @@ static int pbxt_init(void *p) xt_throw(self); } - xt_open_database(self, mysql_real_data_home, TRUE); - pbxt_database = self->st_database; - xt_heap_reference(self, pbxt_database); + xt_xres_start_database_recovery(self); } catch_(b) { - if (!curr_thd && thd) - myxt_destroy_thread(thd, FALSE); -#ifndef DRIZZLED - myxt_mutex_lock(&LOCK_plugin); -#endif - xt_throw(self); + /* It is possible that the error was reset by cleanup code. + * Set a generic error code in that case. 
+ */ + /* PMC - This is not necessary in because exceptions are + * now preserved, in exception handler cleanup. + */ + if (!self->t_exception.e_xt_err) + xt_register_error(XT_REG_CONTEXT, XT_SYSTEM_ERROR, 0, "Initialization failed"); + xt_log_exception(self, &self->t_exception, XT_LOG_DEFAULT); + init_err = 1; } cont_(b); - if (!curr_thd) + if (thd) myxt_destroy_thread(thd, FALSE); #ifndef DRIZZLED myxt_mutex_lock(&LOCK_plugin); @@ -1205,32 +1235,37 @@ static int pbxt_init(void *p) * I have to stop the freeer here because it was * started before opening the database. */ - pbxt_call_exit(self); - pbxt_inited = FALSE; - xt_exit_threading(self); - goto error_4; + + /* {FREEER-HANG-ON-INIT-ERROR} + * pbxt_init is called with LOCK_plugin and if it fails and tries to exit + * the freeer here it hangs because the freeer calls THD::~THD which tries + * to aquire the same lock and hangs. OTOH MySQL calls pbxt_end() after + * an unsuccessful call to pbxt_init, so we defer cleaup, except + * releasing 'self' + */ + xt_free_thread(self); + goto error_3; } xt_free_thread(self); } XT_RETURN(init_err); - error_4: - xt_exit_memory(); - error_3: -#ifdef XT_STREAMING - xt_exit_streaming(); +#ifdef PBMS_ENABLED + pbms_finalize(); error_2: #endif - xt_exit_logging(); error_1: - xt_p_mutex_destroy(&pbxt_database_mutex); XT_RETURN(1); } -static int pbxt_end(void *p __attribute__((unused))) +#ifdef DRIZZLED +static int pbxt_end(PluginRegistry &registry) +#else +static int pbxt_end(void *) +#endif { XTThreadPtr self; int err = 0; @@ -1241,7 +1276,7 @@ static int pbxt_end(void *p __attribute__((unused))) XTExceptionRec e; /* This flag also means "shutting down".
*/ - pbxt_inited = FALSE; + pbxt_inited = FALSE; self = xt_create_thread("TempForEnd", FALSE, TRUE, &e); if (self) { self->t_main = TRUE; @@ -1249,6 +1284,9 @@ static int pbxt_end(void *p __attribute__((unused))) } } +#ifdef DRIZZLED + registry.remove(pbxt_hton); +#endif XT_RETURN(err); } @@ -1262,12 +1300,15 @@ static int pbxt_panic(handlerton *hton, enum ha_panic_function flag) /* * Kill the PBXT thread associated with the MySQL thread. */ +#ifdef DRIZZLED +int PBXTStorageEngine::close_connection(Session *thd) +{ + PBXTStorageEngine * const hton = this; +#else static int pbxt_close_connection(handlerton *hton, THD* thd) { - XTThreadPtr self; -#ifdef XT_STREAMING - XTExceptionRec e; #endif + XTThreadPtr self; XT_TRACE_CALL(); if ((self = (XTThreadPtr) *thd_ha_data(thd, hton))) { @@ -1278,10 +1319,6 @@ static int pbxt_close_connection(handlerton *hton, THD* thd) xt_set_self(self); xt_free_thread(self); } -#ifdef XT_STREAMING - if (!xt_pbms_close_connection((void *) thd, &e)) - xt_log_exception(NULL, &e, XT_LOG_DEFAULT); -#endif return 0; } @@ -1290,7 +1327,11 @@ static int pbxt_close_connection(handlerton *hton, THD* thd) * when the last PBXT table was removed from the * database. */ -static void pbxt_drop_database(handlerton *hton __attribute__((unused)), char *path __attribute__((unused))) +#ifdef DRIZZLED +void PBXTStorageEngine::drop_database(char *) +#else +static void pbxt_drop_database(handlerton *XT_UNUSED(hton), char *XT_UNUSED(path)) +#endif { XT_TRACE_CALL(); } @@ -1317,8 +1358,14 @@ static void pbxt_drop_database(handlerton *hton __attribute__((unused)), char *p * pbxt_thr is a pointer the the PBXT thread structure. 
* */ +#ifdef DRIZZLED +int PBXTStorageEngine::commit(Session *thd, bool all) +{ + PBXTStorageEngine * const hton = this; +#else static int pbxt_commit(handlerton *hton, THD *thd, bool all) { +#endif int err = 0; XTThreadPtr self; @@ -1343,8 +1390,14 @@ static int pbxt_commit(handlerton *hton, THD *thd, bool all) return err; } +#ifdef DRIZZLED +int PBXTStorageEngine::rollback(Session *thd, bool all) +{ + PBXTStorageEngine * const hton = this; +#else static int pbxt_rollback(handlerton *hton, THD *thd, bool all) { +#endif int err = 0; XTThreadPtr self; @@ -1377,8 +1430,14 @@ static int pbxt_rollback(handlerton *hton, THD *thd, bool all) return 0; } +#ifdef DRIZZLED +handler *PBXTStorageEngine::create(TABLE_SHARE *table, MEM_ROOT *mem_root) +{ + PBXTStorageEngine * const hton = this; +#else static handler *pbxt_create_handler(handlerton *hton, TABLE_SHARE *table, MEM_ROOT *mem_root) { +#endif if (table && XTSystemTableShare::isSystemTable(table->path.str)) return new (mem_root) ha_xtsys(hton, table); else @@ -1513,7 +1572,11 @@ static void ha_close_open_tables(XTThreadPtr self, XTSharePtr share, ha_pbxt *mi freer_(); // xt_unlock_mutex(share->sh_ex_mutex) } -static void ha_release_exclusive_use(XTThreadPtr self __attribute__((unused)), XTSharePtr share) +#ifdef PBXT_ALLOW_PRINTING +static void ha_release_exclusive_use(XTThreadPtr self, XTSharePtr share) +#else +static void ha_release_exclusive_use(XTThreadPtr XT_UNUSED(self), XTSharePtr share) +#endif { XT_PRINT1(self, "ha_release_exclusive_use %s PBXT X UNLOCK\n", share->sh_table_path->ps_path); xt_lock_mutex_ns((xt_mutex_type *) share->sh_ex_mutex); @@ -1629,11 +1692,23 @@ ST_FIELD_INFO pbxt_statistics_fields_info[]= { 0, 0, MYSQL_TYPE_STRING, 0, 0, 0, SKIP_OPEN_TABLE} }; +#ifdef DRIZZLED +static InfoSchemaTable *pbxt_statistics_table; + +int pbxt_init_statitics(PluginRegistry &registry) +#else int pbxt_init_statitics(void *p) +#endif { - ST_SCHEMA_TABLE *schema = (ST_SCHEMA_TABLE *) p; - schema->fields_info = 
pbxt_statistics_fields_info; - schema->fill_table = pbxt_statistics_fill_table; +#ifdef DRIZZLED + pbxt_statistics_table = (InfoSchemaTable *)xt_calloc_ns(sizeof(InfoSchemaTable)); + pbxt_statistics_table->table_name= "PBXT_STATISTICS"; + registry.add(pbxt_statistics_table); +#else + ST_SCHEMA_TABLE *pbxt_statistics_table = (ST_SCHEMA_TABLE *) p; +#endif + pbxt_statistics_table->fields_info = pbxt_statistics_fields_info; + pbxt_statistics_table->fill_table = pbxt_statistics_fill_table; #if defined(XT_WIN) && defined(XT_COREDUMP) void register_crash_filter(); @@ -1645,8 +1720,16 @@ int pbxt_init_statitics(void *p) return 0; } -int pbxt_exit_statitics(void *p __attribute__((unused))) +#ifdef DRIZZLED +int pbxt_exit_statitics(PluginRegistry &registry) +#else +int pbxt_exit_statitics(void *XT_UNUSED(p)) +#endif { +#ifdef DRIZZLED + registry.remove(pbxt_statistics_table); + xt_free_ns(pbxt_statistics_table); +#endif return(0); } @@ -1765,7 +1848,7 @@ MX_TABLE_TYPES_T ha_pbxt::table_flags() const */ #define FLAGS_ARE_READ_DYNAMICALLY -MX_ULONG_T ha_pbxt::index_flags(uint inx __attribute__((unused)), uint part __attribute__((unused)), bool all_parts __attribute__((unused))) const +MX_ULONG_T ha_pbxt::index_flags(uint XT_UNUSED(inx), uint XT_UNUSED(part), bool XT_UNUSED(all_parts)) const { /* It would be nice if the dynamic version of this function works, * but it does not. MySQL loads this information when the table is openned, @@ -1876,7 +1959,7 @@ void ha_pbxt::internal_close(THD *thd, struct XTThread *self) * Called from handler.cc by handler::ha_open(). The server opens all tables by * calling ha_open() which then calls the handler specific open().
*/ -int ha_pbxt::open(const char *table_path, int mode __attribute__((unused)), uint test_if_locked __attribute__((unused))) +int ha_pbxt::open(const char *table_path, int XT_UNUSED(mode), uint XT_UNUSED(test_if_locked)) { THD *thd = current_thd; int err = 0; @@ -2104,9 +2187,9 @@ void ha_pbxt::init_auto_increment(xtWord8 min_auto_inc) } void ha_pbxt::get_auto_increment(MX_ULONGLONG_T offset, MX_ULONGLONG_T increment, - MX_ULONGLONG_T nb_desired_values __attribute__((unused)), + MX_ULONGLONG_T XT_UNUSED(nb_desired_values), MX_ULONGLONG_T *first_value, - MX_ULONGLONG_T *nb_reserved_values __attribute__((unused))) + MX_ULONGLONG_T *nb_reserved_values) { register XTTableHPtr tab; MX_ULONGLONG_T nr, nr_less_inc; @@ -2225,6 +2308,14 @@ int ha_pbxt::write_row(byte *buf) XT_PRINT1(pb_open_tab->ot_thread, "ha_pbxt::write_row %s\n", pb_share->sh_table_path->ps_path); XT_DISABLED_TRACE(("INSERT tx=%d val=%d\n", (int) pb_open_tab->ot_thread->st_xact_data->xd_start_xn_id, (int) XT_GET_DISK_4(&buf[1]))); //statistic_increment(ha_write_count,&LOCK_status); +#ifdef PBMS_ENABLED + PBMSResultRec result; + err = pbms_write_row_blobs(table, buf, &result); + if (err) { + xt_logf(XT_NT_ERROR, "pbms_write_row_blobs() Error: %s", result.mr_message); + return err; + } +#endif /* GOTCHA: I have a huge problem with the transaction statement. 
* It is not ALWAYS committed (I mean ha_commit_trans() is @@ -2256,7 +2347,8 @@ int ha_pbxt::write_row(byte *buf) int update_err = update_auto_increment(); if (update_err) { ha_log_pbxt_thread_error_for_mysql(pb_ignore_dup_key); - return update_err; + err = update_err; + goto done; } set_auto_increment(table->next_number_field); } @@ -2274,6 +2366,10 @@ int ha_pbxt::write_row(byte *buf) pb_open_tab->ot_thread->st_update_id++; } + done: +#ifdef PBMS_ENABLED + pbms_completed(table, (err == 0)); +#endif return err; } @@ -2347,6 +2443,21 @@ int ha_pbxt::update_row(const byte * old_data, byte * new_data) if (table->timestamp_field_type & TIMESTAMP_AUTO_SET_ON_UPDATE) table->timestamp_field->set_time(); +#ifdef PBMS_ENABLED + PBMSResultRec result; + + err = pbms_delete_row_blobs(table, old_data, &result); + if (err) { + xt_logf(XT_NT_ERROR, "update_row:pbms_delete_row_blobs() Error: %s", result.mr_message); + return err; + } + err = pbms_write_row_blobs(table, new_data, &result); + if (err) { + xt_logf(XT_NT_ERROR, "update_row:pbms_write_row_blobs() Error: %s", result.mr_message); + goto pbms_done; + } +#endif + /* GOTCHA: We need to check the auto-increment value on update * because of the following test (which fails for InnoDB) - * auto_increment.test: @@ -2369,6 +2480,11 @@ int ha_pbxt::update_row(const byte * old_data, byte * new_data) err = ha_log_pbxt_thread_error_for_mysql(pb_ignore_dup_key); pb_open_tab->ot_table->tab_locks.xt_remove_temp_lock(pb_open_tab, TRUE); + +#ifdef PBMS_ENABLED + pbms_done: + pbms_completed(table, (err == 0)); +#endif return err; } @@ -2392,6 +2508,16 @@ int ha_pbxt::delete_row(const byte * buf) XT_DISABLED_TRACE(("DELETE tx=%d val=%d\n", (int) pb_open_tab->ot_thread->st_xact_data->xd_start_xn_id, (int) XT_GET_DISK_4(&buf[1]))); //statistic_increment(ha_delete_count,&LOCK_status); +#ifdef PBMS_ENABLED + PBMSResultRec result; + + err = pbms_delete_row_blobs(table, buf, &result); + if (err) { + xt_logf(XT_NT_ERROR, "pbms_delete_row_blobs() 
Error: %s", result.mr_message); + return err; + } +#endif + if (!pb_open_tab->ot_thread->st_stat_trans) { trans_register_ha(pb_mysql_thd, FALSE, pbxt_hton); XT_PRINT0(pb_open_tab->ot_thread, "ha_pbxt::delete_row trans_register_ha all=FALSE\n"); @@ -2405,6 +2531,9 @@ int ha_pbxt::delete_row(const byte * buf) pb_open_tab->ot_table->tab_locks.xt_remove_temp_lock(pb_open_tab, TRUE); +#ifdef PBMS_ENABLED + pbms_completed(table, (err == 0)); +#endif return err; } @@ -2491,7 +2620,7 @@ int ha_pbxt::delete_row(const byte * buf) * commit; */ -int ha_pbxt::xt_index_in_range(register XTOpenTablePtr ot __attribute__((unused)), register XTIndexPtr ind, +int ha_pbxt::xt_index_in_range(register XTOpenTablePtr XT_UNUSED(ot), register XTIndexPtr ind, register XTIdxSearchKeyPtr search_key, xtWord1 *buf) { /* If search key is given, this means we want an exact match. */ @@ -2698,7 +2827,7 @@ int ha_pbxt::xt_index_prev_read(XTOpenTablePtr ot, XTIndexPtr ind, xtBool key_on return ha_log_pbxt_thread_error_for_mysql(FALSE); } -int ha_pbxt::index_init(uint idx, bool sorted __attribute__((unused))) +int ha_pbxt::index_init(uint idx, bool XT_UNUSED(sorted)) { XTIndexPtr ind; @@ -2715,7 +2844,8 @@ int ha_pbxt::index_init(uint idx, bool sorted __attribute__((unused))) /* The number of columns required: */ if (pb_open_tab->ot_is_modify) { - pb_open_tab->ot_cols_req = table->read_set->n_bits; + + pb_open_tab->ot_cols_req = table->read_set->MX_BIT_SIZE(); #ifdef XT_PRINT_INDEX_OPT ind = (XTIndexPtr) pb_share->sh_dic_keys[idx]; @@ -2764,10 +2894,10 @@ int ha_pbxt::index_init(uint idx, bool sorted __attribute__((unused))) * seem to have this problem! 
*/ ind = (XTIndexPtr) pb_share->sh_dic_keys[idx]; - if (bitmap_is_subset(table->read_set, &ind->mi_col_map)) + if (MX_BIT_IS_SUBSET(table->read_set, &ind->mi_col_map)) pb_key_read = TRUE; #ifdef XT_PRINT_INDEX_OPT - printf("index_init %s index %d cols req=%d/%d read_bits=%X write_bits=%X index_bits=%X converage=%d\n", pb_open_tab->ot_table->tab_name->ps_path, (int) idx, pb_open_tab->ot_cols_req, table->read_set->n_bits, (int) *table->read_set->bitmap, (int) *table->write_set->bitmap, (int) *ind->mi_col_map.bitmap, (int) (bitmap_is_subset(table->read_set, &ind->mi_col_map) != 0)); + printf("index_init %s index %d cols req=%d/%d read_bits=%X write_bits=%X index_bits=%X converage=%d\n", pb_open_tab->ot_table->tab_name->ps_path, (int) idx, pb_open_tab->ot_cols_req, table->read_set->MX_BIT_SIZE(), (int) *table->read_set->bitmap, (int) *table->write_set->bitmap, (int) *ind->mi_col_map.bitmap, (int) (MX_BIT_IS_SUBSET(table->read_set, &ind->mi_col_map) != 0)); #endif } @@ -2845,7 +2975,7 @@ void ha_return_row(XTOpenTablePtr ot, u_int index) } #endif -int ha_pbxt::index_read_xt(byte * buf, uint idx, const byte *key, uint key_len __attribute__((unused)), enum ha_rkey_function find_flag __attribute__((unused))) +int ha_pbxt::index_read_xt(byte * buf, uint idx, const byte *key, uint key_len, enum ha_rkey_function find_flag) { int err = 0; XTIndexPtr ind; @@ -2887,9 +3017,12 @@ int ha_pbxt::index_read_xt(byte * buf, uint idx, const byte *key, uint key_len _ xt_idx_prep_key(ind, &search_key, ((find_flag == HA_READ_AFTER_KEY) ? XT_SEARCH_AFTER_KEY : 0) | prefix, (xtWord1 *) key, key_len); if (!xt_idx_search(pb_open_tab, ind, &search_key)) err = ha_log_pbxt_thread_error_for_mysql(pb_ignore_dup_key); - else + else { err = xt_index_next_read(pb_open_tab, ind, pb_key_read, (find_flag == HA_READ_KEY_EXACT || find_flag == HA_READ_PREFIX) ? 
&search_key : NULL, buf); + if (err == HA_ERR_END_OF_FILE && find_flag == HA_READ_AFTER_KEY) + err = HA_ERR_KEY_NOT_FOUND; + } break; } @@ -2913,13 +3046,13 @@ int ha_pbxt::index_read_xt(byte * buf, uint idx, const byte *key, uint key_len _ * row if available. If the key value is null, begin at the first key of the * index. */ -int ha_pbxt::index_read(byte * buf, const byte * key, uint key_len __attribute__((unused)), enum ha_rkey_function find_flag __attribute__((unused))) +int ha_pbxt::index_read(byte * buf, const byte * key, uint key_len, enum ha_rkey_function find_flag) { //statistic_increment(ha_read_key_count,&LOCK_status); return index_read_xt(buf, active_index, key, key_len, find_flag); } -int ha_pbxt::index_read_idx(byte * buf, uint idx, const byte *key, uint key_len __attribute__((unused)), enum ha_rkey_function find_flag __attribute__((unused))) +int ha_pbxt::index_read_idx(byte * buf, uint idx, const byte *key, uint key_len, enum ha_rkey_function find_flag) { //statistic_increment(ha_read_key_count,&LOCK_status); return index_read_xt(buf, idx, key, key_len, find_flag); @@ -3147,9 +3280,24 @@ int ha_pbxt::rnd_init(bool scan) XT_PRINT1(pb_open_tab->ot_thread, "ha_pbxt::rnd_init %s\n", pb_share->sh_table_path->ps_path); XT_DISABLED_TRACE(("seq scan tx=%d\n", (int) pb_open_tab->ot_thread->st_xact_data->xd_start_xn_id)); + /* Call xt_tab_seq_exit() to make sure the resources used by the previous + * scan are freed. In particular make sure cache page ref count is decremented. + * This is needed as rnd_init() can be called mulitple times w/o matching calls + * to rnd_end(). Our experience is that currently this is done in queries like: + * + * SELECT t1.c1,t2.c1 FROM t1 LEFT JOIN t2 USING (c1); + * UPDATE t1 LEFT JOIN t2 USING (c1) SET t1.c1 = t2.c1 WHERE t1.c1 = t2.c1; + * + * when scanning inner tables. It is important to understand that in such case + * multiple calls to rnd_init() are not semantically equal to a new query. 
For + * example we cannot make row locks permanent as we do in rnd_end(), as + * ha_pbxt::unlock_row still can be called. + */ + xt_tab_seq_exit(pb_open_tab); + /* The number of columns required: */ if (pb_open_tab->ot_is_modify) - pb_open_tab->ot_cols_req = table->read_set->n_bits; + pb_open_tab->ot_cols_req = table->read_set->MX_BIT_SIZE(); else { pb_open_tab->ot_cols_req = ha_get_max_bit(table->read_set); @@ -3243,7 +3391,7 @@ int ha_pbxt::rnd_next(byte *buf) * * Called from filesort.cc, sql_select.cc, sql_delete.cc and sql_update.cc. */ -void ha_pbxt::position(const byte *record __attribute__((unused))) +void ha_pbxt::position(const byte *XT_UNUSED(record)) { XT_TRACE_CALL(); ASSERT_NS(pb_ex_in_use); @@ -3383,7 +3531,7 @@ int ha_pbxt::info(uint flag) if (flag & HA_STATUS_VARIABLE) { stats.deleted = ot->ot_table->tab_row_fnum; stats.records = (ha_rows) (ot->ot_table->tab_row_eof_id - 1 - stats.deleted); - stats.data_file_length = ot->ot_table->tab_rec_eof_id; + stats.data_file_length = xt_rec_id_to_rec_offset(ot->ot_table, ot->ot_table->tab_rec_eof_id); stats.index_file_length = xt_ind_node_to_offset(ot->ot_table, ot->ot_table->tab_ind_eof); stats.delete_length = ot->ot_table->tab_rec_fnum * ot->ot_rec_size; //check_time = info.check_time; @@ -3434,10 +3582,15 @@ int ha_pbxt::info(uint flag) #endif #endif // SAFE_MUTEX +#ifdef DRIZZLED + set_prefix(share->keys_in_use, share->keys); + share->keys_for_keyread&= share->keys_in_use; +#else share->keys_in_use.set_prefix(share->keys); //share->keys_in_use.intersect_extended(info.key_map); share->keys_for_keyread.intersect(share->keys_in_use); //share->db_record_offset = info.record_offset; +#endif for (u_int i = 0; i < share->keys; i++) { ind = pb_share->sh_dic_keys[i]; @@ -3445,7 +3598,7 @@ int ha_pbxt::info(uint flag) if (ind->mi_seg_count == 1 && (ind->mi_flags & HA_NOSAME)) rec_per_key = 1; else { - + rec_per_key = 1; } for (u_int j = 0; j < table->key_info[i].key_parts; j++) table->key_info[i].rec_per_key[j] = 
(ulong) rec_per_key; @@ -3570,6 +3723,8 @@ int ha_pbxt::extra(enum ha_extra_function operation) if (pb_open_tab) pb_open_tab->ot_table->tab_locks.xt_make_lock_permanent(pb_open_tab, &self->st_lock_list); } + if (pb_open_tab) + pb_open_tab->ot_for_update = 0; break; case HA_EXTRA_KEYREAD: /* This means we so not need to read the entire record. */ @@ -3706,6 +3861,12 @@ int ha_pbxt::delete_all_rows() */ ha_close_share(self, pb_share); + /* MySQL documentation requires us to reset auto increment value to 1 + * on truncate even if the table was created with a different value. + * This is also consistent with other engines. + */ + dic.dic_min_auto_inc = 1; + xt_create_table(self, (XTPathStrPtr) path, &dic); if (!pb_table_locked) freer_(); // ha_release_exclusive_use(pb_share) @@ -3737,7 +3898,7 @@ int ha_pbxt::delete_all_rows() * now agree with the MyISAM strategy. * */ -int ha_pbxt::analyze(THD *thd __attribute__((unused)), HA_CHECK_OPT *check_opt __attribute__((unused))) +int ha_pbxt::analyze(THD *thd, HA_CHECK_OPT *XT_UNUSED(check_opt)) { int err = 0; XTDatabaseHPtr db; @@ -3819,7 +3980,7 @@ int ha_pbxt::analyze(THD *thd __attribute__((unused)), HA_CHECK_OPT *check_opt _ XT_RETURN(err); } -int ha_pbxt::repair(THD *thd __attribute__((unused)), HA_CHECK_OPT *check_opt __attribute__((unused))) +int ha_pbxt::repair(THD *XT_UNUSED(thd), HA_CHECK_OPT *XT_UNUSED(check_opt)) { return(HA_ADMIN_TRY_ALTER); } @@ -3828,7 +3989,7 @@ int ha_pbxt::repair(THD *thd __attribute__((unused)), HA_CHECK_OPT *check_opt __ * This is mapped to "ALTER TABLE tablename TYPE=PBXT", which rebuilds * the table in MySQL. 
*/ -int ha_pbxt::optimize(THD *thd __attribute__((unused)), HA_CHECK_OPT *check_opt __attribute__((unused))) +int ha_pbxt::optimize(THD *XT_UNUSED(thd), HA_CHECK_OPT *XT_UNUSED(check_opt)) { return(HA_ADMIN_TRY_ALTER); } @@ -3837,7 +3998,7 @@ int ha_pbxt::optimize(THD *thd __attribute__((unused)), HA_CHECK_OPT *check_opt extern int pbxt_mysql_trace_on; #endif -int ha_pbxt::check(THD* thd, HA_CHECK_OPT* check_opt __attribute__((unused))) +int ha_pbxt::check(THD* thd, HA_CHECK_OPT* XT_UNUSED(check_opt)) { int err = 0; XTThreadPtr self; @@ -3993,8 +4154,10 @@ xtPublic int ha_pbxt::external_lock(THD *thd, int lock_type) * (or update statement) just saw. */ if (pb_open_tab) { - if (pb_open_tab->ot_for_update) + if (pb_open_tab->ot_for_update) { self->st_visible_time = self->st_database->db_xn_end_time; + pb_open_tab->ot_for_update = 0; + } if (pb_share->sh_recalc_selectivity) { if ((pb_share->sh_table->tab_row_eof_id - 1 - pb_share->sh_table->tab_row_fnum) >= 200) { @@ -4079,10 +4242,15 @@ xtPublic int ha_pbxt::external_lock(THD *thd, int lock_type) pb_open_tab->ot_is_modify = FALSE; if ((pb_open_tab->ot_for_update = (lock_type == F_WRLCK))) { switch ((int) thd_sql_command(thd)) { - case SQLCOM_UPDATE: - case SQLCOM_UPDATE_MULTI: case SQLCOM_DELETE: case SQLCOM_DELETE_MULTI: + /* turn DELETE IGNORE into normal DELETE. The IGNORE option causes problems because + * when a record is deleted we add an xlog record which we cannot "rollback" later + * when we find that an FK-constraint has failed. 
+ */ + thd->lex->ignore = false; + case SQLCOM_UPDATE: + case SQLCOM_UPDATE_MULTI: case SQLCOM_REPLACE: case SQLCOM_REPLACE_SELECT: case SQLCOM_INSERT: @@ -4290,7 +4458,9 @@ int ha_pbxt::start_stmt(THD *thd, thr_lock_type lock_type) pb_open_tab->ot_for_update = (lock_type != TL_READ && lock_type != TL_READ_WITH_SHARED_LOCKS && +#ifndef DRIZZLED lock_type != TL_READ_HIGH_PRIORITY && +#endif lock_type != TL_READ_NO_INSERT); pb_open_tab->ot_is_modify = FALSE; if (pb_open_tab->ot_for_update) { @@ -4557,9 +4727,12 @@ int ha_pbxt::delete_table(const char *table_path) { THD *thd = current_thd; int err = 0; - XTThreadPtr self; + XTThreadPtr self = NULL; XTSharePtr share; + STAT_TRACE(self, *thd_query(thd)); + XT_PRINT1(self, "ha_pbxt::delete_table %s\n", table_path); + if (XTSystemTableShare::isSystemTable(table_path)) return delete_system_table(table_path); @@ -4568,9 +4741,6 @@ int ha_pbxt::delete_table(const char *table_path) self->st_ignore_fkeys = (thd_test_options(thd, OPTION_NO_FOREIGN_KEY_CHECKS)) != 0; - STAT_TRACE(self, *thd_query(thd)); - XT_PRINT1(self, "ha_pbxt::delete_table %s\n", table_path); - try_(a) { xt_ha_open_database_of_table(self, (XTPathStrPtr) table_path); @@ -4586,16 +4756,23 @@ int ha_pbxt::delete_table(const char *table_path) pushr_(ha_release_exclusive_use, share); ha_close_open_tables(self, share, NULL); - xt_drop_table(self, (XTPathStrPtr) table_path); + xt_drop_table(self, (XTPathStrPtr) table_path, thd_sql_command(thd) == SQLCOM_DROP_DB); freer_(); // ha_release_exclusive_use(share) freer_(); // ha_unget_share(share) } catch_(b) { - /* If the table does not exist, just log the error and continue... */ + /* In MySQL if the table does not exist, just log the error and continue. This is + * needed to delete table in the case when CREATE TABLE fails and no PBXT disk + * structures were created. + * Drizzle unlike MySQL iterates over all handlers and tries to delete table. 
It + * stops after when a handler returns TRUE, so in Drizzle we need to report error. + */ +#ifndef DRIZZLED if (self->t_exception.e_xt_err == XT_ERR_TABLE_NOT_FOUND) xt_log_and_clear_exception(self); else +#endif throw_(); } cont_(b); @@ -4619,8 +4796,25 @@ int ha_pbxt::delete_table(const char *table_path) } catch_(a) { err = xt_ha_pbxt_thread_error_for_mysql(thd, self, pb_ignore_dup_key); +#ifdef DRIZZLED + if (err == HA_ERR_NO_SUCH_TABLE) + err = ENOENT; +#endif } cont_(a); + +#ifdef PBMS_ENABLED + /* Call pbms_delete_table_with_blobs() last because it cannot be undone. */ + if (!err) { + PBMSResultRec result; + + if (pbms_delete_table_with_blobs(table_path, &result)) { + xt_logf(XT_NT_WARNING, "pbms_delete_table_with_blobs() Error: %s", result.mr_message); + } + + pbms_completed(NULL, true); + } +#endif return err; } @@ -4681,6 +4875,16 @@ int ha_pbxt::rename_table(const char *from, const char *to) XT_PRINT2(self, "ha_pbxt::rename_table %s -> %s\n", from, to); +#ifdef PBMS_ENABLED + PBMSResultRec result; + + err = pbms_rename_table_with_blobs(from, to, &result); + if (err) { + xt_logf(XT_NT_ERROR, "pbms_rename_table_with_blobs() Error: %s", result.mr_message); + return err; + } +#endif + try_(a) { xt_ha_open_database_of_table(self, (XTPathStrPtr) to); to_db = self->st_database; @@ -4709,10 +4913,6 @@ int ha_pbxt::rename_table(const char *from, const char *to) freer_(); // ha_release_exclusive_use(share) freer_(); // ha_unget_share(share) -#ifdef XT_STREAMING - /* PBMS remove the table? 
*/ - xt_pbms_rename_table(from, to); -#endif /* * If there are no more PBXT tables in the database, we * "drop the database", which deletes all PBXT resources @@ -4732,11 +4932,15 @@ int ha_pbxt::rename_table(const char *from, const char *to) err = xt_ha_pbxt_thread_error_for_mysql(thd, self, pb_ignore_dup_key); } cont_(a); + +#ifdef PBMS_ENABLED + pbms_completed(NULL, (err == 0)); +#endif XT_RETURN(err); } -int ha_pbxt::rename_system_table(const char *from __attribute__((unused)), const char *to __attribute__((unused))) +int ha_pbxt::rename_system_table(const char *XT_UNUSED(from), const char *XT_UNUSED(to)) { return ER_NOT_SUPPORTED_YET; } @@ -4771,7 +4975,7 @@ double ha_pbxt::scan_time() /* * The next method will never be called if you do not implement indexes. */ -double ha_pbxt::read_time(uint index __attribute__((unused)), uint ranges, ha_rows rows) +double ha_pbxt::read_time(uint XT_UNUSED(index), uint ranges, ha_rows rows) { double result = rows2double(ranges+rows); return result; @@ -4945,7 +5149,7 @@ void ha_pbxt::free_foreign_key_create_info(char* str) xt_free(NULL, str); } -bool ha_pbxt::get_error_message(int error __attribute__((unused)), String *buf) +bool ha_pbxt::get_error_message(int XT_UNUSED(error), String *buf) { THD *thd = current_thd; int err = 0; @@ -5104,9 +5308,9 @@ struct st_mysql_sys_var #endif #ifdef USE_CONST_SAVE -static void pbxt_record_cache_size_func(THD *thd __attribute__((unused)), struct st_mysql_sys_var *var, void *tgt, const void *save) +static void pbxt_record_cache_size_func(THD *XT_UNUSED(thd), struct st_mysql_sys_var *var, void *tgt, const void *save) #else -static void pbxt_record_cache_size_func(THD *thd __attribute__((unused)), struct st_mysql_sys_var *var, void *tgt, void *save) +static void pbxt_record_cache_size_func(THD *XT_UNUSED(thd), struct st_mysql_sys_var *var, void *tgt, void *save) #endif { xtInt8 record_cache_size; @@ -5215,6 +5419,18 @@ static MYSQL_SYSVAR_INT(sweeper_priority, xt_db_sweeper_priority, 
"Determines the priority of the background sweeper process, 0 = low (default), 1 = normal (same as user threads), 2 = high.", NULL, NULL, XT_PRIORITY_LOW, XT_PRIORITY_LOW, XT_PRIORITY_HIGH, 1); +#ifdef DRIZZLED +static MYSQL_SYSVAR_INT(max_threads, pbxt_max_threads, + PLUGIN_VAR_OPCMDARG, + "The maximum number of threads used by PBXT", + NULL, NULL, 500, 20, 20000, 1); +#else +static MYSQL_SYSVAR_INT(max_threads, pbxt_max_threads, + PLUGIN_VAR_OPCMDARG, + "The maximum number of threads used by PBXT, 0 = set according to MySQL max_connections.", + NULL, NULL, 0, 0, 20000, 1); +#endif + static struct st_mysql_sys_var* pbxt_system_variables[] = { MYSQL_SYSVAR(index_cache_size), MYSQL_SYSVAR(record_cache_size), @@ -5231,6 +5447,7 @@ static struct st_mysql_sys_var* pbxt_system_variables[] = { MYSQL_SYSVAR(auto_increment_mode), MYSQL_SYSVAR(offline_log_function), MYSQL_SYSVAR(sweeper_priority), + MYSQL_SYSVAR(max_threads), NULL }; #endif @@ -5241,8 +5458,8 @@ drizzle_declare_plugin(pbxt) mysql_declare_plugin(pbxt) #endif { - MYSQL_STORAGE_ENGINE_PLUGIN, #ifndef DRIZZLED + MYSQL_STORAGE_ENGINE_PLUGIN, &pbxt_storage_engine, #endif "PBXT", @@ -5266,8 +5483,8 @@ mysql_declare_plugin(pbxt) NULL /* config options */ }, { - MYSQL_INFORMATION_SCHEMA_PLUGIN, #ifndef DRIZZLED + MYSQL_INFORMATION_SCHEMA_PLUGIN, &pbxt_statitics, #endif "PBXT_STATISTICS", diff --git a/storage/pbxt/src/ha_pbxt.h b/storage/pbxt/src/ha_pbxt.h index 6f6a194de12..d48bcd34147 100644 --- a/storage/pbxt/src/ha_pbxt.h +++ b/storage/pbxt/src/ha_pbxt.h @@ -28,7 +28,7 @@ #ifdef DRIZZLED #include <drizzled/common.h> #include <drizzled/handler.h> -#include <drizzled/handlerton.h> +#include <drizzled/plugin/storage_engine.h> #include <mysys/thr_lock.h> #else #include "mysql_priv.h" @@ -51,6 +51,25 @@ class ha_pbxt; +#ifdef DRIZZLED + +class PBXTStorageEngine : public StorageEngine { +public: + PBXTStorageEngine(std::string name_arg) + : StorageEngine(name_arg, HTON_NO_FLAGS) {} + + /* override */ int 
close_connection(Session *); + /* override */ int commit(Session *, bool); + /* override */ int rollback(Session *, bool); + /* override */ handler *create(TABLE_SHARE *, MEM_ROOT *); + /* override */ void drop_database(char *); + /* override */ bool show_status(Session *, stat_print_fn *, enum ha_stat_type); +}; + +typedef PBXTStorageEngine handlerton; + +#endif + extern handlerton *pbxt_hton; /* diff --git a/storage/pbxt/src/ha_xtsys.cc b/storage/pbxt/src/ha_xtsys.cc index 1c76d13379a..c76f60267be 100644 --- a/storage/pbxt/src/ha_xtsys.cc +++ b/storage/pbxt/src/ha_xtsys.cc @@ -75,7 +75,7 @@ const char **ha_xtsys::bas_ext() const return ha_pbms_exts; } -int ha_xtsys::open(const char *table_path, int mode __attribute__((unused)), uint test_if_locked __attribute__((unused))) +int ha_xtsys::open(const char *table_path, int XT_UNUSED(mode), uint XT_UNUSED(test_if_locked)) { THD *thd = current_thd; XTExceptionRec e; @@ -141,7 +141,7 @@ int ha_xtsys::close(void) return err; } -int ha_xtsys::rnd_init(bool scan __attribute__((unused))) +int ha_xtsys::rnd_init(bool XT_UNUSED(scan)) { int err = 0; @@ -185,7 +185,7 @@ int ha_xtsys::rnd_pos(byte * buf, byte *pos) return err; } -int ha_xtsys::info(uint flag __attribute__((unused))) +int ha_xtsys::info(uint XT_UNUSED(flag)) { return 0; } @@ -211,7 +211,7 @@ int ha_xtsys::external_lock(THD *thd, int lock_type) return err; } -THR_LOCK_DATA **ha_xtsys::store_lock(THD *thd __attribute__((unused)), THR_LOCK_DATA **to, enum thr_lock_type lock_type) +THR_LOCK_DATA **ha_xtsys::store_lock(THD *XT_UNUSED(thd), THR_LOCK_DATA **to, enum thr_lock_type lock_type) { if (lock_type != TL_IGNORE && ha_lock.type == TL_UNLOCK) ha_lock.type = lock_type; @@ -220,13 +220,13 @@ THR_LOCK_DATA **ha_xtsys::store_lock(THD *thd __attribute__((unused)), THR_LOCK_ } /* Note: ha_pbxt::delete_system_table is called instead. 
*/ -int ha_xtsys::delete_table(const char *table_path __attribute__((unused))) +int ha_xtsys::delete_table(const char *XT_UNUSED(table_path)) { /* Should never be called */ return 0; } -int ha_xtsys::create(const char *name __attribute__((unused)), TABLE *table_arg __attribute__((unused)), HA_CREATE_INFO *create_info __attribute__((unused))) +int ha_xtsys::create(const char *XT_UNUSED(name), TABLE *XT_UNUSED(table_arg), HA_CREATE_INFO *XT_UNUSED(create_info)) { /* Allow the table to be created. * This is required after a dump is restored. @@ -234,7 +234,7 @@ int ha_xtsys::create(const char *name __attribute__((unused)), TABLE *table_arg return 0; } -bool ha_xtsys::get_error_message(int error __attribute__((unused)), String *buf) +bool ha_xtsys::get_error_message(int XT_UNUSED(error), String *buf) { THD *thd = current_thd; XTExceptionRec e; diff --git a/storage/pbxt/src/ha_xtsys.h b/storage/pbxt/src/ha_xtsys.h index 66a4b5a5dfa..598abe0938f 100644 --- a/storage/pbxt/src/ha_xtsys.h +++ b/storage/pbxt/src/ha_xtsys.h @@ -59,7 +59,7 @@ public: const char *table_type() const { return "PBXT"; } - const char *index_type(uint inx __attribute__((unused))) { + const char *index_type(uint XT_UNUSED(inx)) { return "NONE"; } @@ -69,7 +69,7 @@ public: return HA_BINLOG_ROW_CAPABLE | HA_BINLOG_STMT_CAPABLE; } - MX_ULONG_T index_flags(uint inx __attribute__((unused)), uint part __attribute__((unused)), bool all_parts __attribute__((unused))) const { + MX_ULONG_T index_flags(uint XT_UNUSED(inx), uint XT_UNUSED(part), bool XT_UNUSED(all_parts)) const { return (HA_READ_NEXT | HA_READ_PREV | HA_READ_RANGE | HA_KEYREAD_ONLY); } uint max_supported_keys() const { return 512; } diff --git a/storage/pbxt/src/hashtab_xt.cc b/storage/pbxt/src/hashtab_xt.cc index 3708f071ac5..80ba86a5248 100644 --- a/storage/pbxt/src/hashtab_xt.cc +++ b/storage/pbxt/src/hashtab_xt.cc @@ -115,7 +115,7 @@ xtPublic void xt_ht_put(XTThreadPtr self, XTHashTabPtr ht, void *data) popr_(); } -xtPublic void 
*xt_ht_get(XTThreadPtr self __attribute__((unused)), XTHashTabPtr ht, void *key) +xtPublic void *xt_ht_get(XTThreadPtr XT_UNUSED(self), XTHashTabPtr ht, void *key) { XTHashItemPtr item; xtHashValue h; @@ -239,14 +239,14 @@ xtPublic void xt_ht_signal(XTThreadPtr self, XTHashTabPtr ht) xt_signal_cond(self, ht->ht_cond); } -xtPublic void xt_ht_enum(struct XTThread *self __attribute__((unused)), XTHashTabPtr ht, XTHashEnumPtr en) +xtPublic void xt_ht_enum(struct XTThread *XT_UNUSED(self), XTHashTabPtr ht, XTHashEnumPtr en) { en->he_i = 0; en->he_item = NULL; en->he_ht = ht; } -xtPublic void *xt_ht_next(struct XTThread *self __attribute__((unused)), XTHashEnumPtr en) +xtPublic void *xt_ht_next(struct XTThread *XT_UNUSED(self), XTHashEnumPtr en) { if (en->he_item) { en->he_item = en->he_item->hi_next; diff --git a/storage/pbxt/src/heap_xt.cc b/storage/pbxt/src/heap_xt.cc index a88df833a4a..dcfa1dae11f 100644 --- a/storage/pbxt/src/heap_xt.cc +++ b/storage/pbxt/src/heap_xt.cc @@ -31,7 +31,7 @@ #undef xt_heap_new #endif -#ifdef DEBUG +#ifdef DEBUG_MEMORY xtPublic XTHeapPtr xt_mm_heap_new(XTThreadPtr self, size_t size, XTFinalizeFunc finalize, u_int line, c_char *file, xtBool track) #else xtPublic XTHeapPtr xt_heap_new(XTThreadPtr self, size_t size, XTFinalizeFunc finalize) @@ -39,7 +39,7 @@ xtPublic XTHeapPtr xt_heap_new(XTThreadPtr self, size_t size, XTFinalizeFunc fin { volatile XTHeapPtr hp; -#ifdef DEBUG +#ifdef DEBUG_MEMORY hp = (XTHeapPtr) xt_mm_calloc(self, size, line, file); hp->h_track = track; if (track) @@ -65,21 +65,21 @@ xtPublic XTHeapPtr xt_heap_new(XTThreadPtr self, size_t size, XTFinalizeFunc fin return hp; } -xtPublic void xt_check_heap(XTThreadPtr self __attribute__((unused)), XTHeapPtr hp __attribute__((unused))) +xtPublic void xt_check_heap(XTThreadPtr XT_NDEBUG_UNUSED(self), XTHeapPtr XT_NDEBUG_UNUSED(hp)) { -#ifdef DEBUG +#ifdef DEBUG_MEMORY xt_mm_malloc_size(self, hp); #endif } -#ifdef DEBUG +#ifdef DEBUG_MEMORY xtPublic void 
xt_mm_heap_reference(XTThreadPtr self, XTHeapPtr hp, u_int line, c_char *file) #else xtPublic void xt_heap_reference(XTThreadPtr, XTHeapPtr hp) #endif { xt_spinlock_lock(&hp->h_lock); -#ifdef DEBUG +#ifdef DEBUG_MEMORY if (hp->h_track) printf("HEAP: +1 %d->%d %s:%d\n", (int) hp->h_ref_count, (int) hp->h_ref_count+1, file, (int) line); #endif @@ -91,7 +91,7 @@ xtPublic void xt_heap_release(XTThreadPtr self, XTHeapPtr hp) { if (!hp) return; -#ifdef DEBUG +#ifdef DEBUG_MEMORY xt_spinlock_lock(&hp->h_lock); ASSERT(hp->h_ref_count != 0); xt_spinlock_unlock(&hp->h_lock); @@ -100,7 +100,7 @@ xtPublic void xt_heap_release(XTThreadPtr self, XTHeapPtr hp) if (hp->h_onrelease) (*hp->h_onrelease)(self, hp); if (hp->h_ref_count > 0) { -#ifdef DEBUG +#ifdef DEBUG_MEMORY if (hp->h_track) printf("HEAP: -1 %d->%d\n", (int) hp->h_ref_count, (int) hp->h_ref_count-1); #endif @@ -116,12 +116,12 @@ xtPublic void xt_heap_release(XTThreadPtr self, XTHeapPtr hp) xt_spinlock_unlock(&hp->h_lock); } -xtPublic void xt_heap_set_release_callback(XTThreadPtr self __attribute__((unused)), XTHeapPtr hp, XTFinalizeFunc onrelease) +xtPublic void xt_heap_set_release_callback(XTThreadPtr XT_UNUSED(self), XTHeapPtr hp, XTFinalizeFunc onrelease) { hp->h_onrelease = onrelease; } -xtPublic u_int xt_heap_get_ref_count(struct XTThread *self __attribute__((unused)), XTHeapPtr hp) +xtPublic u_int xt_heap_get_ref_count(struct XTThread *XT_UNUSED(self), XTHeapPtr hp) { return hp->h_ref_count; } diff --git a/storage/pbxt/src/heap_xt.h b/storage/pbxt/src/heap_xt.h index afad132e1e3..db7a6909f05 100644 --- a/storage/pbxt/src/heap_xt.h +++ b/storage/pbxt/src/heap_xt.h @@ -25,6 +25,7 @@ #include "xt_defs.h" #include "lock_xt.h" +#include "memory_xt.h" struct XTThread; @@ -59,7 +60,7 @@ u_int xt_heap_get_ref_count(struct XTThread *self, XTHeapPtr mem); void xt_check_heap(struct XTThread *self, XTHeapPtr mem); -#ifdef DEBUG +#ifdef DEBUG_MEMORY #define xt_heap_new(t, s, f) xt_mm_heap_new(t, s, f, __LINE__, __FILE__, 
FALSE) #define xt_heap_new_track(t, s, f) xt_mm_heap_new(t, s, f, __LINE__, __FILE__, TRUE) #define xt_heap_reference(t, s) xt_mm_heap_reference(t, s, __LINE__, __FILE__) diff --git a/storage/pbxt/src/index_xt.cc b/storage/pbxt/src/index_xt.cc index 9cd9a966f74..91e03c3e424 100644 --- a/storage/pbxt/src/index_xt.cc +++ b/storage/pbxt/src/index_xt.cc @@ -23,6 +23,10 @@ #include "xt_config.h" +#ifdef DRIZZLED +#include <bitset> +#endif + #include <string.h> #include <stdio.h> #include <stddef.h> @@ -52,7 +56,7 @@ //#define CHECK_AND_PRINT //#define CHECK_NODE_REFERENCE //#define TRACE_FLUSH -//#define CHECK_PRINTS_RECORD_REFERENCES +#define CHECK_PRINTS_RECORD_REFERENCES #else #define MAX_SEARCH_DEPTH 100 #endif @@ -77,6 +81,7 @@ static u_int idx_check_index(XTOpenTablePtr ot, XTIndexPtr ind, xtBool with_lock #endif static xtBool idx_insert_node(XTOpenTablePtr ot, XTIndexPtr ind, IdxBranchStackPtr stack, XTIdxKeyValuePtr key_value, xtIndexNodeID branch); +static xtBool idx_remove_lazy_deleted_item_in_node(XTOpenTablePtr ot, XTIndexPtr ind, xtIndexNodeID current, XTIndReferencePtr iref, XTIdxKeyValuePtr key_value); #ifdef XT_TRACK_INDEX_UPDATES @@ -163,7 +168,7 @@ static void track_dump_all(u_int max_block) #endif -xtPublic void xt_ind_track_dump_block(XTTableHPtr tab __attribute__((unused)), xtIndexNodeID address __attribute__((unused))) +xtPublic void xt_ind_track_dump_block(XTTableHPtr XT_UNUSED(tab), xtIndexNodeID XT_UNUSED(address)) { #ifdef TRACK_ACTIVITY u_int i = XT_NODE_ID(address)-1; @@ -268,7 +273,7 @@ static xtBool idx_new_branch(XTOpenTablePtr ot, XTIndexPtr ind, xtIndexNodeID *a if ((XT_NODE_ID(wrote_pos) = XT_NODE_ID(tab->tab_ind_free))) { /* Use the block on the free list: */ - if (!xt_ind_read_bytes(ot, wrote_pos, sizeof(XTIndFreeBlockRec), (xtWord1 *) &free_block)) + if (!xt_ind_read_bytes(ot, ind, wrote_pos, sizeof(XTIndFreeBlockRec), (xtWord1 *) &free_block)) goto failed; XT_NODE_ID(tab->tab_ind_free) = (xtIndexNodeID) 
XT_GET_DISK_8(free_block.if_next_block_8); xt_unlock_mutex_ns(&tab->tab_ind_lock); @@ -343,7 +348,7 @@ static xtBool idx_free_branch(XTOpenTablePtr ot, XTIndexPtr ind, xtIndexNodeID n * Simple compare functions */ -xtPublic int xt_compare_2_int4(XTIndexPtr ind __attribute__((unused)), uint key_length, xtWord1 *key_value, xtWord1 *b_value) +xtPublic int xt_compare_2_int4(XTIndexPtr XT_UNUSED(ind), uint key_length, xtWord1 *key_value, xtWord1 *b_value) { int r; @@ -357,7 +362,7 @@ xtPublic int xt_compare_2_int4(XTIndexPtr ind __attribute__((unused)), uint key_ return r; } -xtPublic int xt_compare_3_int4(XTIndexPtr ind __attribute__((unused)), uint key_length, xtWord1 *key_value, xtWord1 *b_value) +xtPublic int xt_compare_3_int4(XTIndexPtr XT_UNUSED(ind), uint key_length, xtWord1 *key_value, xtWord1 *b_value) { int r; @@ -381,7 +386,7 @@ xtPublic int xt_compare_3_int4(XTIndexPtr ind __attribute__((unused)), uint key_ * Tree branch sanning (searching nodes and leaves) */ -xtPublic void xt_scan_branch_single(struct XTTable *tab __attribute__((unused)), XTIndexPtr ind, XTIdxBranchDPtr branch, register XTIdxKeyValuePtr value, register XTIdxResultRec *result) +xtPublic void xt_scan_branch_single(struct XTTable *XT_UNUSED(tab), XTIndexPtr ind, XTIdxBranchDPtr branch, register XTIdxKeyValuePtr value, register XTIdxResultRec *result) { XT_NODE_TEMP; u_int branch_size; @@ -522,7 +527,7 @@ xtPublic void xt_scan_branch_single(struct XTTable *tab __attribute__((unused)), * index (in the case of -1) or to the first value after the * the search key in the case of 1. 
*/ -xtPublic void xt_scan_branch_fix(struct XTTable *tab __attribute__((unused)), XTIndexPtr ind, XTIdxBranchDPtr branch, register XTIdxKeyValuePtr value, register XTIdxResultRec *result) +xtPublic void xt_scan_branch_fix(struct XTTable *XT_UNUSED(tab), XTIndexPtr ind, XTIdxBranchDPtr branch, register XTIdxKeyValuePtr value, register XTIdxResultRec *result) { XT_NODE_TEMP; u_int branch_size; @@ -619,7 +624,7 @@ xtPublic void xt_scan_branch_fix(struct XTTable *tab __attribute__((unused)), XT result->sr_item.i_item_offset = node_ref_size + i * full_item_size; } -xtPublic void xt_scan_branch_fix_simple(struct XTTable *tab __attribute__((unused)), XTIndexPtr ind, XTIdxBranchDPtr branch, register XTIdxKeyValuePtr value, register XTIdxResultRec *result) +xtPublic void xt_scan_branch_fix_simple(struct XTTable *XT_UNUSED(tab), XTIndexPtr ind, XTIdxBranchDPtr branch, register XTIdxKeyValuePtr value, register XTIdxResultRec *result) { XT_NODE_TEMP; u_int branch_size; @@ -720,7 +725,7 @@ xtPublic void xt_scan_branch_fix_simple(struct XTTable *tab __attribute__((unuse * Variable length key values are stored as a sorted list. Since each list item has a variable length, we * must scan the list sequentially in order to find a key. */ -xtPublic void xt_scan_branch_var(struct XTTable *tab __attribute__((unused)), XTIndexPtr ind, XTIdxBranchDPtr branch, register XTIdxKeyValuePtr value, register XTIdxResultRec *result) +xtPublic void xt_scan_branch_var(struct XTTable *XT_UNUSED(tab), XTIndexPtr ind, XTIdxBranchDPtr branch, register XTIdxKeyValuePtr value, register XTIdxResultRec *result) { XT_NODE_TEMP; u_int branch_size; @@ -816,7 +821,7 @@ xtPublic void xt_scan_branch_var(struct XTTable *tab __attribute__((unused)), XT } /* Go to the next item in the node. 
*/ -static void idx_next_branch_item(XTTableHPtr tab __attribute__((unused)), XTIndexPtr ind, XTIdxBranchDPtr branch, register XTIdxResultRec *result) +static void idx_next_branch_item(XTTableHPtr XT_UNUSED(tab), XTIndexPtr ind, XTIdxBranchDPtr branch, register XTIdxResultRec *result) { XT_NODE_TEMP; xtWord1 *bitem; @@ -834,7 +839,7 @@ static void idx_next_branch_item(XTTableHPtr tab __attribute__((unused)), XTInde result->sr_branch = IDX_GET_NODE_REF(tab, bitem, result->sr_item.i_node_ref_size); } -xtPublic void xt_prev_branch_item_fix(XTTableHPtr tab __attribute__((unused)), XTIndexPtr ind __attribute__((unused)), XTIdxBranchDPtr branch, register XTIdxResultRec *result) +xtPublic void xt_prev_branch_item_fix(XTTableHPtr XT_UNUSED(tab), XTIndexPtr XT_UNUSED(ind), XTIdxBranchDPtr branch, register XTIdxResultRec *result) { XT_NODE_TEMP; ASSERT_NS(result->sr_item.i_item_offset >= result->sr_item.i_item_size + result->sr_item.i_node_ref_size + result->sr_item.i_node_ref_size); @@ -843,7 +848,7 @@ xtPublic void xt_prev_branch_item_fix(XTTableHPtr tab __attribute__((unused)), X result->sr_branch = IDX_GET_NODE_REF(tab, branch->tb_data + result->sr_item.i_item_offset, result->sr_item.i_node_ref_size); } -xtPublic void xt_prev_branch_item_var(XTTableHPtr tab __attribute__((unused)), XTIndexPtr ind, XTIdxBranchDPtr branch, register XTIdxResultRec *result) +xtPublic void xt_prev_branch_item_var(XTTableHPtr XT_UNUSED(tab), XTIndexPtr ind, XTIdxBranchDPtr branch, register XTIdxResultRec *result) { XT_NODE_TEMP; xtWord1 *bitem; @@ -865,7 +870,20 @@ xtPublic void xt_prev_branch_item_var(XTTableHPtr tab __attribute__((unused)), X result->sr_item.i_item_offset = bitem - branch->tb_data; } -static void idx_first_branch_item(XTTableHPtr tab __attribute__((unused)), XTIndexPtr ind, XTIdxBranchDPtr branch, register XTIdxResultPtr result) +static void idx_reload_item_fix(XTIndexPtr XT_NDEBUG_UNUSED(ind), XTIdxBranchDPtr branch, register XTIdxResultPtr result) +{ + u_int branch_size; + 
+ branch_size = XT_GET_DISK_2(branch->tb_size_2); + ASSERT_NS(result->sr_item.i_node_ref_size == (XT_IS_NODE(branch_size) ? XT_NODE_REF_SIZE : 0)); + ASSERT_NS(result->sr_item.i_item_size == ind->mi_key_size + XT_RECORD_REF_SIZE); + result->sr_item.i_total_size = XT_GET_BRANCH_DATA_SIZE(branch_size); + if (result->sr_item.i_item_offset > result->sr_item.i_total_size) + result->sr_item.i_item_offset = result->sr_item.i_total_size; + xt_get_res_record_ref(&branch->tb_data[result->sr_item.i_item_offset + result->sr_item.i_item_size - XT_RECORD_REF_SIZE], result); +} + +static void idx_first_branch_item(XTTableHPtr XT_UNUSED(tab), XTIndexPtr ind, XTIdxBranchDPtr branch, register XTIdxResultPtr result) { XT_NODE_TEMP; u_int branch_size; @@ -903,7 +921,7 @@ static void idx_first_branch_item(XTTableHPtr tab __attribute__((unused)), XTInd /* * Last means different things for leaf or node! */ -xtPublic void xt_last_branch_item_fix(XTTableHPtr tab __attribute__((unused)), XTIndexPtr ind, XTIdxBranchDPtr branch, register XTIdxResultPtr result) +xtPublic void xt_last_branch_item_fix(XTTableHPtr XT_UNUSED(tab), XTIndexPtr ind, XTIdxBranchDPtr branch, register XTIdxResultPtr result) { XT_NODE_TEMP; u_int branch_size; @@ -935,7 +953,7 @@ xtPublic void xt_last_branch_item_fix(XTTableHPtr tab __attribute__((unused)), X } } -xtPublic void xt_last_branch_item_var(XTTableHPtr tab __attribute__((unused)), XTIndexPtr ind, XTIdxBranchDPtr branch, register XTIdxResultPtr result) +xtPublic void xt_last_branch_item_var(XTTableHPtr XT_UNUSED(tab), XTIndexPtr ind, XTIdxBranchDPtr branch, register XTIdxResultPtr result) { XT_NODE_TEMP; u_int branch_size; @@ -986,6 +1004,218 @@ xtPublic void xt_last_branch_item_var(XTTableHPtr tab __attribute__((unused)), X } } +xtPublic xtBool xt_idx_lazy_delete_on_leaf(XTIndexPtr ind, XTIndBlockPtr block, xtWord2 branch_size) +{ + ASSERT_NS(ind->mi_fix_key); + + /* Compact the leaf if more than half the items that fit on the page + * are deleted: */ + if 
(block->cp_del_count >= ind->mi_max_items/2) + return FALSE; + + /* Compact the page if there is only 1 (or less) valid item left: */ + if ((u_int) block->cp_del_count+1 >= ((u_int) branch_size - 2)/(ind->mi_key_size + XT_RECORD_REF_SIZE)) + return FALSE; + + return OK; +} + +static xtBool idx_lazy_delete_on_node(XTIndexPtr ind, XTIndBlockPtr block, register XTIdxItemPtr item) +{ + ASSERT_NS(ind->mi_fix_key); + + /* Compact the node if more than 1/4 of the items that fit on the page + * are deleted: */ + if (block->cp_del_count >= ind->mi_max_items/4) + return FALSE; + + /* Compact the page if there is only 1 (or less) valid item left: */ + if ((u_int) block->cp_del_count+1 >= (item->i_total_size - item->i_node_ref_size)/(item->i_item_size + item->i_node_ref_size)) + return FALSE; + + return OK; +} + +inline static xtBool idx_cmp_item_key_fix(XTIndReferencePtr iref, register XTIdxItemPtr item, XTIdxKeyValuePtr value) +{ + xtWord1 *data; + + data = &iref->ir_branch->tb_data[item->i_item_offset]; + return memcmp(data, value->sv_key, value->sv_length) == 0; +} + +inline static void idx_set_item_key_fix(XTIndReferencePtr iref, register XTIdxItemPtr item, XTIdxKeyValuePtr value) +{ + xtWord1 *data; + + data = &iref->ir_branch->tb_data[item->i_item_offset]; + memcpy(data, value->sv_key, value->sv_length); + xt_set_val_record_ref(data + value->sv_length, value); + iref->ir_updated = TRUE; +} + +inline static void idx_set_item_reference(XTIndReferencePtr iref, register XTIdxItemPtr item, xtRowID rec_id, xtRowID row_id) +{ + size_t offset; + xtWord1 *data; + + /* This is the offset of the reference in the item we found: */ + offset = item->i_item_offset +item->i_item_size - XT_RECORD_REF_SIZE; + data = &iref->ir_branch->tb_data[offset]; + + xt_set_record_ref(data, rec_id, row_id); + iref->ir_updated = TRUE; +} + +inline static void idx_set_item_row_id(XTIndReferencePtr iref, register XTIdxItemPtr item, xtRowID row_id) +{ + size_t offset; + xtWord1 *data; + + offset = + /* 
This is the offset of the reference in the item we found: */ + item->i_item_offset +item->i_item_size - XT_RECORD_REF_SIZE + + /* This is the offset of the row id in the reference: */ + XT_RECORD_ID_SIZE; + data = &iref->ir_branch->tb_data[offset]; + + /* This update does not change the structure of page, so we do it without + * copying the page before we write. + */ + XT_SET_DISK_4(data, row_id); + iref->ir_updated = TRUE; +} + +inline static xtBool idx_is_item_deleted(register XTIdxBranchDPtr branch, register XTIdxItemPtr item) +{ + xtWord1 *data; + + data = &branch->tb_data[item->i_item_offset + item->i_item_size - XT_RECORD_REF_SIZE + XT_RECORD_ID_SIZE]; + return XT_GET_DISK_4(data) == (xtRowID) -1; +} + +inline static void idx_set_item_deleted(XTIndReferencePtr iref, register XTIdxItemPtr item) +{ + idx_set_item_row_id(iref, item, (xtRowID) -1); + + /* This should be safe because there is only one thread, + * the sweeper, that does this! + * + * Threads that decrement this value have an xlock on + * the page, or the index. + */ + iref->ir_block->cp_del_count++; +} + +/* + * {LAZY-DEL-INDEX-ITEMS} + * Do a lazy delete of an item by just setting the Row ID + * to the delete indicator: row ID -1. + */ +static void idx_lazy_delete_branch_item(XTOpenTablePtr ot, XTIndexPtr ind, XTIndReferencePtr iref, register XTIdxItemPtr item) +{ + idx_set_item_deleted(iref, item); + xt_ind_release(ot, ind, iref->ir_xlock ? XT_UNLOCK_W_UPDATE : XT_UNLOCK_R_UPDATE, iref); +} + +/* + * This function compacts the leaf, but preserves the + * position of the item. + */ +static xtBool idx_compact_leaf(XTOpenTablePtr ot, XTIndexPtr ind, XTIndReferencePtr iref, register XTIdxItemPtr item) +{ + register XTIdxBranchDPtr branch = iref->ir_branch; + int item_idx, count, i, idx; + u_int size; + xtWord1 *s_data; + xtWord1 *d_data; + xtWord1 *data; + xtRowID row_id; + + if (iref->ir_block->cb_handle_count) { + if (!xt_ind_copy_on_write(iref)) { + xt_ind_release(ot, ind, iref->ir_xlock ? 
XT_UNLOCK_WRITE : XT_UNLOCK_READ, iref); + return FAILED; + } + } + + ASSERT_NS(!item->i_node_ref_size); + ASSERT_NS(ind->mi_fix_key); + size = item->i_item_size; + count = item->i_total_size / size; + item_idx = item->i_item_offset / size; + s_data = d_data = branch->tb_data; + idx = 0; + for (i=0; i<count; i++) { + data = s_data + item->i_item_size - XT_RECORD_REF_SIZE + XT_RECORD_ID_SIZE; + row_id = XT_GET_DISK_4(data); + if (row_id == (xtRowID) -1) { + if (idx < item_idx) + item_idx--; + } + else { + if (d_data != s_data) + memcpy(d_data, s_data, size); + d_data += size; + idx++; + } + s_data += size; + } + iref->ir_block->cp_del_count = 0; + item->i_total_size = d_data - branch->tb_data; + ASSERT_NS(idx * size == item->i_total_size); + item->i_item_offset = item_idx * size; + XT_SET_DISK_2(branch->tb_size_2, XT_MAKE_BRANCH_SIZE(item->i_total_size, 0)); + iref->ir_updated = TRUE; + return OK; +} + +static xtBool idx_lazy_remove_leaf_item_right(XTOpenTablePtr ot, XTIndexPtr ind, XTIndReferencePtr iref, register XTIdxItemPtr item) +{ + register XTIdxBranchDPtr branch = iref->ir_branch; + int item_idx, count, i; + u_int size; + xtWord1 *s_data; + xtWord1 *d_data; + xtWord1 *data; + xtRowID row_id; + + ASSERT_NS(!item->i_node_ref_size); + + if (iref->ir_block->cb_handle_count) { + if (!xt_ind_copy_on_write(iref)) { + xt_ind_release(ot, ind, XT_UNLOCK_WRITE, iref); + return FAILED; + } + } + + ASSERT_NS(ind->mi_fix_key); + size = item->i_item_size; + count = item->i_total_size / size; + item_idx = item->i_item_offset / size; + s_data = d_data = branch->tb_data; + for (i=0; i<count; i++) { + if (i == item_idx) + item->i_item_offset = d_data - branch->tb_data; + else { + data = s_data + item->i_item_size - XT_RECORD_REF_SIZE + XT_RECORD_ID_SIZE; + row_id = XT_GET_DISK_4(data); + if (row_id != (xtRowID) -1) { + if (d_data != s_data) + memcpy(d_data, s_data, size); + d_data += size; + } + } + s_data += size; + } + iref->ir_block->cp_del_count = 0; + item->i_total_size = 
d_data - branch->tb_data; + XT_SET_DISK_2(branch->tb_size_2, XT_MAKE_BRANCH_SIZE(item->i_total_size, 0)); + iref->ir_updated = TRUE; + xt_ind_release(ot, ind, XT_UNLOCK_W_UPDATE, iref); + return OK; +} + /* * Remove an item and save to disk. */ @@ -1003,8 +1233,14 @@ static xtBool idx_remove_branch_item_right(XTOpenTablePtr ot, XTIndexPtr ind, xt * an Xlock on the cache block. */ if (iref->ir_block->cb_handle_count) { - if (!xt_ind_copy_on_write(iref)) + if (!xt_ind_copy_on_write(iref)) { + xt_ind_release(ot, ind, item->i_node_ref_size ? XT_UNLOCK_READ : XT_UNLOCK_WRITE, iref); return FAILED; + } + } + if (ind->mi_lazy_delete) { + if (idx_is_item_deleted(branch, item)) + iref->ir_block->cp_del_count--; } /* Remove the node reference to the left of the item: */ memmove(&branch->tb_data[item->i_item_offset], @@ -1013,18 +1249,28 @@ static xtBool idx_remove_branch_item_right(XTOpenTablePtr ot, XTIndexPtr ind, xt item->i_total_size -= size; XT_SET_DISK_2(branch->tb_size_2, XT_MAKE_BRANCH_SIZE(item->i_total_size, item->i_node_ref_size)); IDX_TRACE("%d-> %x\n", (int) XT_NODE_ID(address), (int) XT_GET_DISK_2(branch->tb_size_2)); + iref->ir_updated = TRUE; xt_ind_release(ot, ind, item->i_node_ref_size ? XT_UNLOCK_R_UPDATE : XT_UNLOCK_W_UPDATE, iref); return OK; } -static xtBool idx_remove_branch_item_left(XTOpenTablePtr ot, XTIndexPtr ind, xtIndexNodeID, XTIndReferencePtr iref, register XTIdxItemPtr item) +static xtBool idx_remove_branch_item_left(XTOpenTablePtr ot, XTIndexPtr ind, xtIndexNodeID, XTIndReferencePtr iref, register XTIdxItemPtr item, xtBool *lazy_delete_cleanup_required) { register XTIdxBranchDPtr branch = iref->ir_branch; u_int size = item->i_item_size + item->i_node_ref_size; + ASSERT_NS(item->i_node_ref_size); if (iref->ir_block->cb_handle_count) { - if (!xt_ind_copy_on_write(iref)) + if (!xt_ind_copy_on_write(iref)) { + xt_ind_release(ot, ind, item->i_node_ref_size ? 
XT_UNLOCK_READ : XT_UNLOCK_WRITE, iref); return FAILED; + } + } + if (ind->mi_lazy_delete) { + if (idx_is_item_deleted(branch, item)) + iref->ir_block->cp_del_count--; + if (lazy_delete_cleanup_required) + *lazy_delete_cleanup_required = idx_lazy_delete_on_node(ind, iref->ir_block, item); } /* Remove the node reference to the left of the item: */ memmove(&branch->tb_data[item->i_item_offset - item->i_node_ref_size], @@ -1033,11 +1279,12 @@ static xtBool idx_remove_branch_item_left(XTOpenTablePtr ot, XTIndexPtr ind, xtI item->i_total_size -= size; XT_SET_DISK_2(branch->tb_size_2, XT_MAKE_BRANCH_SIZE(item->i_total_size, item->i_node_ref_size)); IDX_TRACE("%d-> %x\n", (int) XT_NODE_ID(address), (int) XT_GET_DISK_2(branch->tb_size_2)); + iref->ir_updated = TRUE; xt_ind_release(ot, ind, item->i_node_ref_size ? XT_UNLOCK_R_UPDATE : XT_UNLOCK_W_UPDATE, iref); return OK; } -static void idx_insert_leaf_item(XTIndexPtr ind __attribute__((unused)), XTIdxBranchDPtr leaf, XTIdxKeyValuePtr value, XTIdxResultPtr result) +static void idx_insert_leaf_item(XTIndexPtr XT_UNUSED(ind), XTIdxBranchDPtr leaf, XTIdxKeyValuePtr value, XTIdxResultPtr result) { xtWord1 *item; @@ -1053,7 +1300,7 @@ static void idx_insert_leaf_item(XTIndexPtr ind __attribute__((unused)), XTIdxBr XT_SET_DISK_2(leaf->tb_size_2, XT_MAKE_LEAF_SIZE(result->sr_item.i_total_size)); } -static void idx_insert_node_item(XTTableHPtr tab __attribute__((unused)), XTIndexPtr ind __attribute__((unused)), XTIdxBranchDPtr leaf, XTIdxKeyValuePtr value, XTIdxResultPtr result, xtIndexNodeID branch) +static void idx_insert_node_item(XTTableHPtr XT_UNUSED(tab), XTIndexPtr XT_UNUSED(ind), XTIdxBranchDPtr leaf, XTIdxKeyValuePtr value, XTIdxResultPtr result, xtIndexNodeID branch) { xtWord1 *item; @@ -1114,7 +1361,7 @@ static void idx_get_middle_branch_item(XTIndexPtr ind, XTIdxBranchDPtr branch, X } } -static size_t idx_write_branch_item(XTIndexPtr ind __attribute__((unused)), xtWord1 *item, XTIdxKeyValuePtr value) +static size_t 
idx_write_branch_item(XTIndexPtr XT_UNUSED(ind), xtWord1 *item, XTIdxKeyValuePtr value) { memcpy(item, value->sv_key, value->sv_length); xt_set_val_record_ref(item + value->sv_length, value); @@ -1133,23 +1380,38 @@ static xtBool idx_replace_node_key(XTOpenTablePtr ot, XTIndexPtr ind, IdxStackIt xtWord1 key_buf[XT_INDEX_MAX_KEY_SIZE]; #ifdef DEBUG - iref.ir_ulock = XT_UNLOCK_NONE; + iref.ir_xlock = 2; + iref.ir_updated = 2; #endif - if (!xt_ind_fetch(ot, current, XT_LOCK_WRITE, &iref)) + if (!xt_ind_fetch(ot, ind, current, XT_LOCK_WRITE, &iref)) return FAILED; if (iref.ir_block->cb_handle_count) { if (!xt_ind_copy_on_write(&iref)) goto failed_1; } + if (ind->mi_lazy_delete) { + ASSERT_NS(item_size == item->i_pos.i_item_size); + if (idx_is_item_deleted(iref.ir_branch, &item->i_pos)) + iref.ir_block->cp_del_count--; + } memmove(&iref.ir_branch->tb_data[item->i_pos.i_item_offset + item_size], &iref.ir_branch->tb_data[item->i_pos.i_item_offset + item->i_pos.i_item_size], item->i_pos.i_total_size - item->i_pos.i_item_offset - item->i_pos.i_item_size); memcpy(&iref.ir_branch->tb_data[item->i_pos.i_item_offset], item_buf, item_size); + if (ind->mi_lazy_delete) { + if (idx_is_item_deleted(iref.ir_branch, &item->i_pos)) + iref.ir_block->cp_del_count++; + } item->i_pos.i_total_size = item->i_pos.i_total_size + item_size - item->i_pos.i_item_size; XT_SET_DISK_2(iref.ir_branch->tb_size_2, XT_MAKE_NODE_SIZE(item->i_pos.i_total_size)); IDX_TRACE("%d-> %x\n", (int) XT_NODE_ID(current), (int) XT_GET_DISK_2(iref.ir_branch->tb_size_2)); + iref.ir_updated = TRUE; +#ifdef DEBUG + if (ind->mi_lazy_delete) + ASSERT_NS(item->i_pos.i_total_size <= XT_INDEX_PAGE_DATA_SIZE); +#endif if (item->i_pos.i_total_size <= XT_INDEX_PAGE_DATA_SIZE) return xt_ind_release(ot, ind, XT_UNLOCK_W_UPDATE, &iref); @@ -1184,6 +1446,7 @@ static xtBool idx_replace_node_key(XTOpenTablePtr ot, XTIndexPtr ind, IdxStackIt /* Change the size of the old branch: */ XT_SET_DISK_2(iref.ir_branch->tb_size_2, 
XT_MAKE_NODE_SIZE(result.sr_item.i_item_offset)); IDX_TRACE("%d-> %x\n", (int) XT_NODE_ID(current), (int) XT_GET_DISK_2(iref.ir_branch->tb_size_2)); + iref.ir_updated = TRUE; xt_ind_release(ot, ind, XT_UNLOCK_W_UPDATE, &iref); @@ -1237,7 +1500,8 @@ static xtBool idx_insert_node(XTOpenTablePtr ot, XTIndexPtr ind, IdxBranchStackP XTIdxBranchDPtr new_branch_ptr; #ifdef DEBUG - iref.ir_ulock = XT_UNLOCK_NONE; + iref.ir_xlock = 2; + iref.ir_updated = 2; #endif /* Insert a new branch (key, data)... */ if (!(stack_item = idx_pop(stack))) { @@ -1268,7 +1532,7 @@ static xtBool idx_insert_node(XTOpenTablePtr ot, XTIndexPtr ind, IdxBranchStackP * cache, and will remain in cache when we read again below for the * purpose of update. */ - if (!xt_ind_fetch(ot, current, XT_LOCK_READ, &iref)) + if (!xt_ind_fetch(ot, ind, current, XT_LOCK_READ, &iref)) goto failed; ASSERT_NS(XT_IS_NODE(XT_GET_DISK_2(iref.ir_branch->tb_size_2))); ind->mi_scan_branch(ot->ot_table, ind, iref.ir_branch, key_value, &result); @@ -1280,6 +1544,7 @@ static xtBool idx_insert_node(XTOpenTablePtr ot, XTIndexPtr ind, IdxBranchStackP } idx_insert_node_item(ot->ot_table, ind, iref.ir_branch, key_value, &result, branch); IDX_TRACE("%d-> %x\n", (int) XT_NODE_ID(current), (int) XT_GET_DISK_2(ot->ot_ind_wbuf.tb_size_2)); + iref.ir_updated = TRUE; ASSERT_NS(result.sr_item.i_total_size <= XT_INDEX_PAGE_DATA_SIZE); xt_ind_release(ot, ind, XT_UNLOCK_R_UPDATE, &iref); goto done_ok; @@ -1314,6 +1579,7 @@ static xtBool idx_insert_node(XTOpenTablePtr ot, XTIndexPtr ind, IdxBranchStackP goto failed_2; } memcpy(iref.ir_branch, &ot->ot_ind_wbuf, offsetof(XTIdxBranchDRec, tb_data) + result.sr_item.i_item_offset); + iref.ir_updated = TRUE; xt_ind_release(ot, ind, XT_UNLOCK_R_UPDATE, &iref); /* Insert the new branch into the parent node, using the new middle key value: */ @@ -1373,7 +1639,8 @@ static xtBool idx_check_duplicates(XTOpenTablePtr ot, XTIndexPtr ind, XTIdxKeyVa XTXactWaitRec xw; #ifdef DEBUG - iref.ir_ulock = 
XT_UNLOCK_NONE; + iref.ir_xlock = 2; + iref.ir_updated = 2; #endif retry: idx_newstack(&stack); @@ -1385,7 +1652,7 @@ static xtBool idx_check_duplicates(XTOpenTablePtr ot, XTIndexPtr ind, XTIdxKeyVa key_value->sv_flags = 0; while (XT_NODE_ID(current)) { - if (!xt_ind_fetch(ot, current, XT_LOCK_READ, &iref)) { + if (!xt_ind_fetch(ot, ind, current, XT_LOCK_READ, &iref)) { key_value->sv_flags = save_flags; return FAILED; } @@ -1422,7 +1689,7 @@ static xtBool idx_check_duplicates(XTOpenTablePtr ot, XTIndexPtr ind, XTIdxKeyVa while ((node = idx_pop(&stack))) { if (node->i_pos.i_item_offset < node->i_pos.i_total_size) { current = node->i_branch; - if (!xt_ind_fetch(ot, current, XT_LOCK_READ, &iref)) + if (!xt_ind_fetch(ot, ind, current, XT_LOCK_READ, &iref)) return FAILED; xt_get_res_record_ref(&iref.ir_branch->tb_data[node->i_pos.i_item_offset + node->i_pos.i_item_size - XT_RECORD_REF_SIZE], &result); result.sr_item = node->i_pos; @@ -1439,6 +1706,11 @@ static xtBool idx_check_duplicates(XTOpenTablePtr ot, XTIndexPtr ind, XTIdxKeyVa break; } + if (ind->mi_lazy_delete) { + if (result.sr_row_id == (xtRowID) -1) + goto next_item; + } + switch (xt_tab_maybe_committed(ot, result.sr_rec_id, &xn_id, NULL, NULL)) { case XT_MAYBE: /* Record is not committed, wait for the transaction. 
*/ @@ -1464,6 +1736,7 @@ static xtBool idx_check_duplicates(XTOpenTablePtr ot, XTIndexPtr ind, XTIdxKeyVa break; } + next_item: idx_next_branch_item(ot->ot_table, ind, iref.ir_branch, &result); if (result.sr_item.i_node_ref_size) { @@ -1473,7 +1746,7 @@ static xtBool idx_check_duplicates(XTOpenTablePtr ot, XTIndexPtr ind, XTIdxKeyVa if (!idx_push(&stack, current, &result.sr_item)) return FAILED; current = result.sr_branch; - if (!xt_ind_fetch(ot, current, XT_LOCK_READ, &iref)) + if (!xt_ind_fetch(ot, ind, current, XT_LOCK_READ, &iref)) return FAILED; idx_first_branch_item(ot->ot_table, ind, iref.ir_branch, &result); if (!result.sr_item.i_node_ref_size) @@ -1489,6 +1762,14 @@ static xtBool idx_check_duplicates(XTOpenTablePtr ot, XTIndexPtr ind, XTIdxKeyVa return FAILED; } +inline static void idx_still_on_key(XTIndexPtr ind, register XTIdxSearchKeyPtr search_key, register XTIdxBranchDPtr branch, register XTIdxItemPtr item) +{ + if (search_key && search_key->sk_on_key) { + search_key->sk_on_key = myxt_compare_key(ind, search_key->sk_key_value.sv_flags, search_key->sk_key_value.sv_length, + search_key->sk_key_value.sv_key, &branch->tb_data[item->i_item_offset]) == 0; + } +} + /* * Insert a value into the given index. Return FALSE if an error occurs. 
*/ @@ -1506,9 +1787,11 @@ xtPublic xtBool xt_idx_insert(XTOpenTablePtr ot, XTIndexPtr ind, xtRowID row_id, size_t new_size; xtBool check_for_dups = ind->mi_flags & (HA_UNIQUE_CHECK | HA_NOSAME) && !allow_dups; xtBool lock_structure = FALSE; + xtBool updated = FALSE; #ifdef DEBUG - iref.ir_ulock = XT_UNLOCK_NONE; + iref.ir_xlock = 2; + iref.ir_updated = 2; #endif #ifdef CHECK_AND_PRINT //idx_check_index(ot, ind, TRUE); @@ -1559,6 +1842,7 @@ xtPublic xtBool xt_idx_insert(XTOpenTablePtr ot, XTIndexPtr ind, xtRowID row_id, XT_INDEX_READ_LOCK(ind, ot); retry: + /* Create a root node if required: */ if (!(XT_NODE_ID(current) = XT_NODE_ID(ind->mi_root))) { /* Index is empty, create a new one: */ ASSERT_NS(lock_structure); @@ -1575,8 +1859,9 @@ xtPublic xtBool xt_idx_insert(XTOpenTablePtr ot, XTIndexPtr ind, xtRowID row_id, goto done_ok; } + /* Search down the tree for the insertion point. */ while (XT_NODE_ID(current)) { - if (!xt_ind_fetch(ot, current, XT_XLOCK_LEAF, &iref)) + if (!xt_ind_fetch(ot, ind, current, XT_XLOCK_LEAF, &iref)) goto failed; ind->mi_scan_branch(ot->ot_table, ind, iref.ir_branch, &key_value, &result); if (result.sr_duplicate) { @@ -1601,8 +1886,23 @@ xtPublic xtBool xt_idx_insert(XTOpenTablePtr ot, XTIndexPtr ind, xtRowID row_id, } } if (result.sr_found) { - /* Node found, can happen during recovery of indexes! */ - XTPageUnlockType utype; + /* Node found, can happen during recovery of indexes! + * We have found an exact match of both key and record. + */ + XTPageUnlockType utype; + xtBool overwrite = FALSE; + + /* {LAZY-DEL-INDEX-ITEMS} + * If the item has been lazy deleted, then just overwrite! + */ + if (result.sr_row_id == (xtRowID) -1) { + xtWord2 del_count; + + /* This is safe because we have an xlock on the leaf. 
*/ + if ((del_count = iref.ir_block->cp_del_count)) + iref.ir_block->cp_del_count = del_count-1; + overwrite = TRUE; + } if (!result.sr_row_id && row_id) { /* {INDEX-RECOV_ROWID} Set the row-id @@ -1610,20 +1910,11 @@ xtPublic xtBool xt_idx_insert(XTOpenTablePtr ot, XTIndexPtr ind, xtRowID row_id, * is not committed. * It will be removed later by the sweeper. */ - size_t offset; - xtWord1 *data; - - offset = - /* This is the offset of the reference in the item we found: */ - result.sr_item.i_item_offset + result.sr_item.i_item_size - XT_RECORD_REF_SIZE + - /* This is the offset of the row id in the reference: */ - 4; - data = &iref.ir_branch->tb_data[offset]; - - /* This update does not change the structure of page, so we do it without - * copying the page before we write. - */ - XT_SET_DISK_4(data, row_id); + overwrite = TRUE; + } + + if (overwrite) { + idx_set_item_row_id(&iref, &result.sr_item, row_id); utype = result.sr_item.i_node_ref_size ? XT_UNLOCK_R_UPDATE : XT_UNLOCK_W_UPDATE; } else @@ -1644,14 +1935,84 @@ xtPublic xtBool xt_idx_insert(XTOpenTablePtr ot, XTIndexPtr ind, xtRowID row_id, /* Must be a leaf!: */ ASSERT_NS(!result.sr_item.i_node_ref_size); + updated = FALSE; + if (ind->mi_lazy_delete && iref.ir_block->cp_del_count) { + /* There are a number of possibilities: + * - We could just replace a lazy deleted slot. + * - We could compact and insert. + * - We could just insert + */ + + if (result.sr_item.i_item_offset > 0) { + /* Check if it can go into the previous node: */ + XTIdxResultRec t_res; + + t_res.sr_item = result.sr_item; + xt_prev_branch_item_fix(ot->ot_table, ind, iref.ir_branch, &t_res); + if (t_res.sr_row_id != (xtRowID) -1) + goto try_current; + + /* Yup, it can, but first check to see if it would be + * better to put it in the current node. + * This is the case if the previous node key is not the + * same as the key we are adding... 
+ */ + if (result.sr_item.i_item_offset < result.sr_item.i_total_size && + result.sr_row_id == (xtRowID) -1) { + if (!idx_cmp_item_key_fix(&iref, &t_res.sr_item, &key_value)) + goto try_current; + } + + idx_set_item_key_fix(&iref, &t_res.sr_item, &key_value); + iref.ir_block->cp_del_count--; + xt_ind_release(ot, ind, XT_UNLOCK_W_UPDATE, &iref); + goto done_ok; + } + + try_current: + if (result.sr_item.i_item_offset < result.sr_item.i_total_size) { + if (result.sr_row_id == (xtRowID) -1) { + idx_set_item_key_fix(&iref, &result.sr_item, &key_value); + iref.ir_block->cp_del_count--; + xt_ind_release(ot, ind, XT_UNLOCK_W_UPDATE, &iref); + goto done_ok; + } + } + + /* Check if we must compact... + * It makes no sense to split as long as there are lazy deleted items + * in the page. So, delete them if a split would otherwise be required! + */ + ASSERT_NS(key_value.sv_length + XT_RECORD_REF_SIZE == result.sr_item.i_item_size); + if (result.sr_item.i_total_size + key_value.sv_length + XT_RECORD_REF_SIZE > XT_INDEX_PAGE_DATA_SIZE) { + if (!idx_compact_leaf(ot, ind, &iref, &result.sr_item)) + goto failed; + updated = TRUE; + } + + /* Fall through to the insert code... */ + /* NOTE: if there were no lazy deleted items in the leaf, then + * idx_compact_leaf is a NOP. This is the only case in which it may not + * fall through and do the insert below. + * + * Normally, if the cp_del_count is correct then the insert + * will work below, and the assertion here will not fail. + * + * In this case, the xt_ind_release() will correctly indicate an update. 
+ */ + ASSERT_NS(result.sr_item.i_total_size + key_value.sv_length + XT_RECORD_REF_SIZE <= XT_INDEX_PAGE_DATA_SIZE); + } + if (result.sr_item.i_total_size + key_value.sv_length + XT_RECORD_REF_SIZE <= XT_INDEX_PAGE_DATA_SIZE) { if (iref.ir_block->cb_handle_count) { if (!xt_ind_copy_on_write(&iref)) goto failed_1; } + idx_insert_leaf_item(ind, iref.ir_branch, &key_value, &result); IDX_TRACE("%d-> %x\n", (int) XT_NODE_ID(current), (int) XT_GET_DISK_2(ot->ot_ind_wbuf.tb_size_2)); ASSERT_NS(result.sr_item.i_total_size <= XT_INDEX_PAGE_DATA_SIZE); + iref.ir_updated = TRUE; xt_ind_release(ot, ind, XT_UNLOCK_W_UPDATE, &iref); goto done_ok; } @@ -1660,7 +2021,7 @@ xtPublic xtBool xt_idx_insert(XTOpenTablePtr ot, XTIndexPtr ind, xtRowID row_id, * Make sure we have a structural lock: */ if (!lock_structure) { - xt_ind_release(ot, ind, XT_UNLOCK_WRITE, &iref); + xt_ind_release(ot, ind, updated ? XT_UNLOCK_W_UPDATE : XT_UNLOCK_WRITE, &iref); XT_INDEX_UNLOCK(ind, ot); lock_structure = TRUE; goto lock_and_retry; @@ -1705,6 +2066,7 @@ xtPublic xtBool xt_idx_insert(XTOpenTablePtr ot, XTIndexPtr ind, xtRowID row_id, goto failed_2; } memcpy(iref.ir_branch, &ot->ot_ind_wbuf, offsetof(XTIdxBranchDRec, tb_data) + result.sr_item.i_item_offset); + iref.ir_updated = TRUE; xt_ind_release(ot, ind, XT_UNLOCK_W_UPDATE, &iref); /* Insert the new branch into the parent node, using the new middle key value: */ @@ -1732,7 +2094,7 @@ xtPublic xtBool xt_idx_insert(XTOpenTablePtr ot, XTIndexPtr ind, xtRowID row_id, idx_free_branch(ot, ind, new_branch); failed_1: - xt_ind_release(ot, ind, XT_UNLOCK_WRITE, &iref); + xt_ind_release(ot, ind, updated ? XT_UNLOCK_W_UPDATE : XT_UNLOCK_WRITE, &iref); failed: XT_INDEX_UNLOCK(ind, ot); @@ -1747,18 +2109,175 @@ xtPublic xtBool xt_idx_insert(XTOpenTablePtr ot, XTIndexPtr ind, xtRowID row_id, return FAILED; } + +/* Remove the given item in the node. + * This is done by going down the tree to find a replacement + * for the deleted item! 
+ */ +static xtBool idx_remove_item_in_node(XTOpenTablePtr ot, XTIndexPtr ind, IdxBranchStackPtr stack, XTIndReferencePtr iref, XTIdxKeyValuePtr key_value) +{ + IdxStackItemPtr delete_node; + XTIdxResultRec result; + xtIndexNodeID current; + xtBool lazy_delete_cleanup_required = FALSE; + IdxStackItemPtr current_top; + + delete_node = idx_top(stack); + current = delete_node->i_branch; + result.sr_item = delete_node->i_pos; + + /* Follow the branch after this item: */ + idx_next_branch_item(ot->ot_table, ind, iref->ir_branch, &result); + xt_ind_release(ot, ind, iref->ir_updated ? XT_UNLOCK_R_UPDATE : XT_UNLOCK_READ, iref); + + /* Go down the left-hand side until we reach a leaf: */ + while (XT_NODE_ID(current)) { + current = result.sr_branch; + if (!xt_ind_fetch(ot, ind, current, XT_XLOCK_LEAF, iref)) + return FAILED; + idx_first_branch_item(ot->ot_table, ind, iref->ir_branch, &result); + if (!result.sr_item.i_node_ref_size) + break; + xt_ind_release(ot, ind, XT_UNLOCK_READ, iref); + if (!idx_push(stack, current, &result.sr_item)) + return FAILED; + } + + ASSERT_NS(XT_NODE_ID(current)); + ASSERT_NS(!result.sr_item.i_node_ref_size); + + if (!xt_ind_reserve(ot, stack->s_top + 2, iref->ir_branch)) { + xt_ind_release(ot, ind, XT_UNLOCK_WRITE, iref); + return FAILED; + } + + /* This code removes lazy deleted items from the leaf, + * before we promote an item to a leaf. + * This is not essential, but prevents lazy deleted + * items from being propogated up the tree. + */ + if (ind->mi_lazy_delete) { + if (iref->ir_block->cp_del_count) { + if (!idx_compact_leaf(ot, ind, iref, &result.sr_item)) + return FAILED; + } + } + + /* Crawl back up the stack trace, looking for a key + * that can be used to replace the deleted key. + * + * Any empty nodes on the way up can be removed! 
+ */ + if (result.sr_item.i_total_size > 0) { + /* There is a key in the leaf, extract it, and put it in the node: */ + memcpy(key_value->sv_key, &iref->ir_branch->tb_data[result.sr_item.i_item_offset], result.sr_item.i_item_size); + /* This call also frees the iref.ir_branch page! */ + if (!idx_remove_branch_item_right(ot, ind, current, iref, &result.sr_item)) + return FAILED; + if (!idx_replace_node_key(ot, ind, delete_node, stack, result.sr_item.i_item_size, key_value->sv_key)) + return FAILED; + goto done_ok; + } + + xt_ind_release(ot, ind, iref->ir_updated ? XT_UNLOCK_W_UPDATE : XT_UNLOCK_WRITE, iref); + + for (;;) { + /* The current node/leaf is empty, remove it: */ + idx_free_branch(ot, ind, current); + + current_top = idx_pop(stack); + current = current_top->i_branch; + if (!xt_ind_fetch(ot, ind, current, XT_XLOCK_LEAF, iref)) + return FAILED; + + if (current_top == delete_node) { + /* All children have been removed. Delete the key and done: */ + if (!idx_remove_branch_item_right(ot, ind, current, iref, &current_top->i_pos)) + return FAILED; + goto done_ok; + } + + if (current_top->i_pos.i_total_size > current_top->i_pos.i_node_ref_size) { + /* Save the key: */ + memcpy(key_value->sv_key, &iref->ir_branch->tb_data[current_top->i_pos.i_item_offset], current_top->i_pos.i_item_size); + /* This function also frees the cache page: */ + if (!idx_remove_branch_item_left(ot, ind, current, iref, &current_top->i_pos, &lazy_delete_cleanup_required)) + return FAILED; + if (!idx_replace_node_key(ot, ind, delete_node, stack, current_top->i_pos.i_item_size, key_value->sv_key)) + return FAILED; + /* */ + if (lazy_delete_cleanup_required) { + if (!xt_ind_fetch(ot, ind, current, XT_LOCK_READ, iref)) + return FAILED; + if (!idx_remove_lazy_deleted_item_in_node(ot, ind, current, iref, key_value)) + return FAILED; + } + goto done_ok; + } + xt_ind_release(ot, ind, current_top->i_pos.i_node_ref_size ? 
XT_UNLOCK_READ : XT_UNLOCK_WRITE, iref); + } + + done_ok: +#ifdef XT_TRACK_INDEX_UPDATES + ASSERT_NS(ot->ot_ind_reserved >= ot->ot_ind_reads); +#endif + return OK; +} + +/* + * This function assumes we have a lock on the structure of the index. + */ +static xtBool idx_remove_lazy_deleted_item_in_node(XTOpenTablePtr ot, XTIndexPtr ind, xtIndexNodeID current, XTIndReferencePtr iref, XTIdxKeyValuePtr key_value) +{ + IdxBranchStackRec stack; + XTIdxResultRec result; + + /* Now remove all lazy deleted items in this node.... */ + idx_first_branch_item(ot->ot_table, ind, (XTIdxBranchDPtr) iref->ir_block->cb_data, &result); + + for (;;) { + while (result.sr_item.i_item_offset < result.sr_item.i_total_size) { + if (result.sr_row_id == (xtRowID) -1) + goto remove_item; + idx_next_branch_item(ot->ot_table, ind, (XTIdxBranchDPtr) iref->ir_block->cb_data, &result); + } + break; + + remove_item: + + idx_newstack(&stack); + if (!idx_push(&stack, current, &result.sr_item)) { + xt_ind_release(ot, ind, iref->ir_updated ? XT_UNLOCK_R_UPDATE : XT_UNLOCK_READ, iref); + return FAILED; + } + + if (!idx_remove_item_in_node(ot, ind, &stack, iref, key_value)) + return FAILED; + + /* Go back up to the node we are trying to + * free of things. + */ + if (!xt_ind_fetch(ot, ind, current, XT_LOCK_READ, iref)) + return FAILED; + /* Load the data again: */ + idx_reload_item_fix(ind, iref->ir_branch, &result); + } + + xt_ind_release(ot, ind, iref->ir_updated ? 
XT_UNLOCK_R_UPDATE : XT_UNLOCK_READ, iref); + return OK; +} + static xtBool idx_delete(XTOpenTablePtr ot, XTIndexPtr ind, XTIdxKeyValuePtr key_value) { IdxBranchStackRec stack; xtIndexNodeID current; XTIndReferenceRec iref; XTIdxResultRec result; - IdxStackItemPtr delete_node = NULL; - IdxStackItemPtr current_top = NULL; xtBool lock_structure = FALSE; #ifdef DEBUG - iref.ir_ulock = XT_UNLOCK_NONE; + iref.ir_xlock = 2; + iref.ir_updated = 2; #endif /* The index appears to have no root: */ if (!XT_NODE_ID(ind->mi_root)) @@ -1776,17 +2295,37 @@ static xtBool idx_delete(XTOpenTablePtr ot, XTIndexPtr ind, XTIdxKeyValuePtr key goto done_ok; while (XT_NODE_ID(current)) { - if (!xt_ind_fetch(ot, current, XT_XLOCK_LEAF, &iref)) + if (!xt_ind_fetch(ot, ind, current, XT_XLOCK_DEL_LEAF, &iref)) goto failed; ind->mi_scan_branch(ot->ot_table, ind, iref.ir_branch, key_value, &result); if (!result.sr_item.i_node_ref_size) { /* A leaf... */ if (result.sr_found) { - if (!idx_remove_branch_item_right(ot, ind, current, &iref, &result.sr_item)) - goto failed; + if (ind->mi_lazy_delete) { + /* If the we have a W lock, then fetch decided that we + * need to compact the page. + * The decision is made by xt_idx_lazy_delete_on_leaf() + */ + if (!iref.ir_xlock) + idx_lazy_delete_branch_item(ot, ind, &iref, &result.sr_item); + else { + if (!iref.ir_block->cp_del_count) { + if (!idx_remove_branch_item_right(ot, ind, current, &iref, &result.sr_item)) + goto failed; + } + else { + if (!idx_lazy_remove_leaf_item_right(ot, ind, &iref, &result.sr_item)) + goto failed; + } + } + } + else { + if (!idx_remove_branch_item_right(ot, ind, current, &iref, &result.sr_item)) + goto failed; + } } else - xt_ind_release(ot, ind, XT_UNLOCK_WRITE, &iref); + xt_ind_release(ot, ind, iref.ir_xlock ? 
XT_UNLOCK_WRITE : XT_UNLOCK_READ, &iref); goto done_ok; } if (!idx_push(&stack, current, &result.sr_item)) { @@ -1803,6 +2342,35 @@ static xtBool idx_delete(XTOpenTablePtr ot, XTIndexPtr ind, XTIdxKeyValuePtr key /* Must be a non-leaf!: */ ASSERT_NS(result.sr_item.i_node_ref_size); + if (ind->mi_lazy_delete) { + if (!idx_lazy_delete_on_node(ind, iref.ir_block, &result.sr_item)) { + /* We need to remove some items from this node: */ + + if (!lock_structure) { + xt_ind_release(ot, ind, XT_UNLOCK_READ, &iref); + XT_INDEX_UNLOCK(ind, ot); + lock_structure = TRUE; + goto lock_and_retry; + } + + idx_set_item_deleted(&iref, &result.sr_item); + if (!idx_remove_lazy_deleted_item_in_node(ot, ind, current, &iref, key_value)) + goto failed; + goto done_ok; + } + + if (!ot->ot_table->tab_dic.dic_no_lazy_delete) { + /* {LAZY-DEL-INDEX-ITEMS} + * We just set item to deleted, this is a significant time + * saver. + * But this item can only be cleaned up when all + * items on the node below are deleted. + */ + idx_lazy_delete_branch_item(ot, ind, &iref, &result.sr_item); + goto done_ok; + } + } + /* We will have to remove the key from a non-leaf node, * which means we are changing the structure of the index. 
* Make sure we have a structural lock: @@ -1815,86 +2383,8 @@ static xtBool idx_delete(XTOpenTablePtr ot, XTIndexPtr ind, XTIdxKeyValuePtr key } /* This is the item we will have to replace: */ - delete_node = idx_top(&stack); - - /* Follow the branch after this item: */ - idx_next_branch_item(ot->ot_table, ind, iref.ir_branch, &result); - ASSERT_NS(XT_NODE_ID(current)); - xt_ind_release(ot, ind, XT_UNLOCK_READ, &iref); - - /* Go down the left-hand side until we reach a leaf: */ - while (XT_NODE_ID(current)) { - current = result.sr_branch; - if (!xt_ind_fetch(ot, current, XT_XLOCK_LEAF, &iref)) - goto failed; - idx_first_branch_item(ot->ot_table, ind, iref.ir_branch, &result); - if (!result.sr_item.i_node_ref_size) - break; - xt_ind_release(ot, ind, XT_UNLOCK_READ, &iref); - if (!idx_push(&stack, current, &result.sr_item)) - goto failed; - } - - ASSERT_NS(XT_NODE_ID(current)); - ASSERT_NS(!result.sr_item.i_node_ref_size); - - if (!xt_ind_reserve(ot, stack.s_top + 2, iref.ir_branch)) { - xt_ind_release(ot, ind, XT_UNLOCK_WRITE, &iref); + if (!idx_remove_item_in_node(ot, ind, &stack, &iref, key_value)) goto failed; - } - - /* Crawl back up the stack trace, looking for a key - * that can be used to replace the deleted key. - * - * Any empty nodes on the way up can be removed! - */ - if (result.sr_item.i_total_size > 0) { - /* There is a key in the leaf, extract it, and put it in the node: */ - memcpy(key_value->sv_key, &iref.ir_branch->tb_data[result.sr_item.i_item_offset], result.sr_item.i_item_size); - /* This call also frees the iref.ir_branch page! 
*/ - if (!idx_remove_branch_item_right(ot, ind, current, &iref, &result.sr_item)) - goto failed; - if (!idx_replace_node_key(ot, ind, delete_node, &stack, result.sr_item.i_item_size, key_value->sv_key)) - goto failed; - goto done_ok_2; - } - - xt_ind_release(ot, ind, XT_UNLOCK_WRITE, &iref); - - for (;;) { - /* The current node/leaf is empty, remove it: */ - idx_free_branch(ot, ind, current); - - current_top = idx_pop(&stack); - current = current_top->i_branch; - if (!xt_ind_fetch(ot, current, XT_XLOCK_LEAF, &iref)) - goto failed; - - if (current_top == delete_node) { - /* All children have been removed. Delete the key and done: */ - if (!idx_remove_branch_item_right(ot, ind, current, &iref, &current_top->i_pos)) - goto failed; - goto done_ok_2; - } - - if (current_top->i_pos.i_total_size > current_top->i_pos.i_node_ref_size) { - /* Save the key: */ - memcpy(key_value->sv_key, &iref.ir_branch->tb_data[current_top->i_pos.i_item_offset], current_top->i_pos.i_item_size); - /* This function also frees the cache page: */ - if (!idx_remove_branch_item_left(ot, ind, current, &iref, &current_top->i_pos)) - goto failed; - if (!idx_replace_node_key(ot, ind, delete_node, &stack, current_top->i_pos.i_item_size, key_value->sv_key)) - goto failed; - goto done_ok_2; - } - xt_ind_release(ot, ind, current_top->i_pos.i_node_ref_size ?
XT_UNLOCK_READ : XT_UNLOCK_WRITE, &iref); - } - - - done_ok_2: -#ifdef XT_TRACK_INDEX_UPDATES - ASSERT_NS(ot->ot_ind_reserved >= ot->ot_ind_reads); -#endif done_ok: XT_INDEX_UNLOCK(ind, ot); @@ -1945,7 +2435,8 @@ xtPublic xtBool xt_idx_update_row_id(XTOpenTablePtr ot, XTIndexPtr ind, xtRecord xtWord1 key_buf[XT_INDEX_MAX_KEY_SIZE + XT_MAX_RECORD_REF_SIZE]; #ifdef DEBUG - iref.ir_ulock = XT_UNLOCK_NONE; + iref.ir_xlock = 2; + iref.ir_updated = 2; #endif #ifdef CHECK_AND_PRINT idx_check_index(ot, ind, TRUE); @@ -1989,7 +2480,7 @@ xtPublic xtBool xt_idx_update_row_id(XTOpenTablePtr ot, XTIndexPtr ind, xtRecord goto done_ok; while (XT_NODE_ID(current)) { - if (!xt_ind_fetch(ot, current, XT_LOCK_READ, &iref)) + if (!xt_ind_fetch(ot, ind, current, XT_LOCK_READ, &iref)) goto failed; ind->mi_scan_branch(ot->ot_table, ind, iref.ir_branch, &key_value, &result); if (result.sr_found || !result.sr_item.i_node_ref_size) @@ -1999,23 +2490,10 @@ xtPublic xtBool xt_idx_update_row_id(XTOpenTablePtr ot, XTIndexPtr ind, xtRecord } if (result.sr_found) { - size_t offset; - xtWord1 *data; - - offset = - /* This is the offset of the reference in the item we found: */ - result.sr_item.i_item_offset + result.sr_item.i_item_size - XT_RECORD_REF_SIZE + - /* This is the offset of the row id in the reference: */ - 4; - data = &iref.ir_branch->tb_data[offset]; - - /* This update does not change the structure of page, so we do it without - * copying the page before we write. - * - * TODO: Check that concurrent reads can handle this! + /* TODO: Check that concurrent reads can handle this! * assuming the write is not atomic. 
*/ - XT_SET_DISK_4(data, row_id); + idx_set_item_row_id(&iref, &result.sr_item, row_id); xt_ind_release(ot, ind, XT_UNLOCK_R_UPDATE, &iref); } else @@ -2076,7 +2554,8 @@ xtPublic xtBool xt_idx_search(XTOpenTablePtr ot, XTIndexPtr ind, register XTIdxS XTIdxResultRec result; #ifdef DEBUG - iref.ir_ulock = XT_UNLOCK_NONE; + iref.ir_xlock = 2; + iref.ir_updated = 2; #endif if (ot->ot_ind_rhandle) { xt_ind_release_handle(ot->ot_ind_rhandle, FALSE, ot->ot_thread); @@ -2110,7 +2589,7 @@ xtPublic xtBool xt_idx_search(XTOpenTablePtr ot, XTIndexPtr ind, register XTIdxS goto done_ok; while (XT_NODE_ID(current)) { - if (!xt_ind_fetch(ot, current, XT_LOCK_READ, &iref)) + if (!xt_ind_fetch(ot, ind, current, XT_LOCK_READ, &iref)) goto failed; ind->mi_scan_branch(ot->ot_table, ind, iref.ir_branch, &search_key->sk_key_value, &result); if (result.sr_found) @@ -2124,6 +2603,17 @@ xtPublic xtBool xt_idx_search(XTOpenTablePtr ot, XTIndexPtr ind, register XTIdxS current = result.sr_branch; } + if (ind->mi_lazy_delete) { + ignore_lazy_deleted_items: + while (result.sr_item.i_item_offset < result.sr_item.i_total_size) { + if (result.sr_row_id != (xtRowID) -1) { + idx_still_on_key(ind, search_key, iref.ir_branch, &result.sr_item); + break; + } + idx_next_branch_item(ot->ot_table, ind, iref.ir_branch, &result); + } + } + if (result.sr_item.i_item_offset == result.sr_item.i_total_size) { IdxStackItemPtr node; @@ -2134,12 +2624,39 @@ xtPublic xtBool xt_idx_search(XTOpenTablePtr ot, XTIndexPtr ind, register XTIdxS xt_ind_release(ot, ind, XT_UNLOCK_READ, &iref); while ((node = idx_pop(&stack))) { if (node->i_pos.i_item_offset < node->i_pos.i_total_size) { - xtRecordID rec_id; - - if (!xt_ind_fetch(ot, node->i_branch, XT_LOCK_READ, &iref)) + if (!xt_ind_fetch(ot, ind, node->i_branch, XT_LOCK_READ, &iref)) goto failed; - xt_get_record_ref(&iref.ir_branch->tb_data[node->i_pos.i_item_offset + node->i_pos.i_item_size - XT_RECORD_REF_SIZE], &rec_id, &ot->ot_curr_row_id); - ot->ot_curr_rec_id = 
rec_id; + xt_get_res_record_ref(&iref.ir_branch->tb_data[node->i_pos.i_item_offset + node->i_pos.i_item_size - XT_RECORD_REF_SIZE], &result); + + if (ind->mi_lazy_delete) { + result.sr_item = node->i_pos; + if (result.sr_row_id == (xtRowID) -1) { + /* If this node pointer is lazy deleted, then + * go down the next branch... + */ + idx_next_branch_item(ot->ot_table, ind, iref.ir_branch, &result); + + /* Go down to the bottom: */ + current = node->i_branch; + while (XT_NODE_ID(current)) { + xt_ind_release(ot, ind, XT_UNLOCK_READ, &iref); + if (!idx_push(&stack, current, &result.sr_item)) + goto failed; + current = result.sr_branch; + if (!xt_ind_fetch(ot, ind, current, XT_LOCK_READ, &iref)) + goto failed; + idx_first_branch_item(ot->ot_table, ind, iref.ir_branch, &result); + if (!result.sr_item.i_node_ref_size) + break; + } + + goto ignore_lazy_deleted_items; + } + idx_still_on_key(ind, search_key, iref.ir_branch, &result.sr_item); + } + + ot->ot_curr_rec_id = result.sr_rec_id; + ot->ot_curr_row_id = result.sr_row_id; ot->ot_ind_state = node->i_pos; /* Convert the pointer to a handle which can be used in later operations: */ @@ -2180,14 +2697,16 @@ xtPublic xtBool xt_idx_search(XTOpenTablePtr ot, XTIndexPtr ind, register XTIdxS //idx_check_index(ot, ind, TRUE); //idx_check_on_key(ot); #endif - ASSERT_NS(iref.ir_ulock == XT_UNLOCK_NONE); + ASSERT_NS(iref.ir_xlock == 2); + ASSERT_NS(iref.ir_updated == 2); return OK; failed: XT_INDEX_UNLOCK(ind, ot); if (idx_out_of_memory_failure(ot)) goto retry_after_oom; - ASSERT_NS(iref.ir_ulock == XT_UNLOCK_NONE); + ASSERT_NS(iref.ir_xlock == 2); + ASSERT_NS(iref.ir_updated == 2); return FAILED; } @@ -2199,7 +2718,8 @@ xtPublic xtBool xt_idx_search_prev(XTOpenTablePtr ot, XTIndexPtr ind, register X XTIdxResultRec result; #ifdef DEBUG - iref.ir_ulock = XT_UNLOCK_NONE; + iref.ir_xlock = 2; + iref.ir_updated = 2; #endif if (ot->ot_ind_rhandle) { xt_ind_release_handle(ot->ot_ind_rhandle, FALSE, ot->ot_thread); @@ -2232,7 +2752,7 @@ 
xtPublic xtBool xt_idx_search_prev(XTOpenTablePtr ot, XTIndexPtr ind, register X goto done_ok; while (XT_NODE_ID(current)) { - if (!xt_ind_fetch(ot, current, XT_LOCK_READ, &iref)) + if (!xt_ind_fetch(ot, ind, current, XT_LOCK_READ, &iref)) goto failed; ind->mi_scan_branch(ot->ot_table, ind, iref.ir_branch, &search_key->sk_key_value, &result); if (result.sr_found) @@ -2249,17 +2769,43 @@ xtPublic xtBool xt_idx_search_prev(XTOpenTablePtr ot, XTIndexPtr ind, register X if (result.sr_item.i_item_offset == 0) { IdxStackItemPtr node; - /* We are at the end of a leaf node. - * Go up the stack to find the start poition of the next key. + search_up_stack: + /* We are at the start of a leaf node. + * Go up the stack to find the start position of the next key. * If we find none, then we are the end of the index. */ xt_ind_release(ot, ind, XT_UNLOCK_READ, &iref); while ((node = idx_pop(&stack))) { if (node->i_pos.i_item_offset > node->i_pos.i_node_ref_size) { - if (!xt_ind_fetch(ot, node->i_branch, XT_LOCK_READ, &iref)) + if (!xt_ind_fetch(ot, ind, node->i_branch, XT_LOCK_READ, &iref)) goto failed; result.sr_item = node->i_pos; ind->mi_prev_item(ot->ot_table, ind, iref.ir_branch, &result); + + if (ind->mi_lazy_delete) { + if (result.sr_row_id == (xtRowID) -1) { + /* Go down to the bottom, in order to scan the leaf backwards: */ + current = node->i_branch; + while (XT_NODE_ID(current)) { + xt_ind_release(ot, ind, XT_UNLOCK_READ, &iref); + if (!idx_push(&stack, current, &result.sr_item)) + goto failed; + current = result.sr_branch; + if (!xt_ind_fetch(ot, ind, current, XT_LOCK_READ, &iref)) + goto failed; + ind->mi_last_item(ot->ot_table, ind, iref.ir_branch, &result); + if (!result.sr_item.i_node_ref_size) + break; + } + + /* If the leaf empty we have to go up the stack again... 
*/ + if (result.sr_item.i_total_size == 0) + goto search_up_stack; + + goto scan_back_in_leaf; + } + } + goto record_found; } } @@ -2269,6 +2815,16 @@ xtPublic xtBool xt_idx_search_prev(XTOpenTablePtr ot, XTIndexPtr ind, register X /* We must just step once to the left in this leaf node... */ ind->mi_prev_item(ot->ot_table, ind, iref.ir_branch, &result); + if (ind->mi_lazy_delete) { + scan_back_in_leaf: + while (result.sr_row_id == (xtRowID) -1) { + if (result.sr_item.i_item_offset == 0) + goto search_up_stack; + ind->mi_prev_item(ot->ot_table, ind, iref.ir_branch, &result); + } + idx_still_on_key(ind, search_key, iref.ir_branch, &result.sr_item); + } + record_found: ot->ot_curr_rec_id = result.sr_rec_id; ot->ot_curr_row_id = result.sr_row_id; @@ -2330,34 +2886,47 @@ xtPublic xtBool xt_idx_next(register XTOpenTablePtr ot, register XTIndexPtr ind, XTIndReferenceRec iref; #ifdef DEBUG - iref.ir_ulock = XT_UNLOCK_NONE; + iref.ir_xlock = 2; + iref.ir_updated = 2; #endif ASSERT_NS(ot->ot_ind_rhandle); xt_ind_lock_handle(ot->ot_ind_rhandle); - if (!ot->ot_ind_state.i_node_ref_size && - ot->ot_ind_state.i_item_offset < ot->ot_ind_state.i_total_size && + result.sr_item = ot->ot_ind_state; + if (!result.sr_item.i_node_ref_size && + result.sr_item.i_item_offset < result.sr_item.i_total_size && ot->ot_ind_rhandle->ih_cache_reference) { - key_value.sv_key = &ot->ot_ind_rhandle->ih_branch->tb_data[ot->ot_ind_state.i_item_offset]; - key_value.sv_length = ot->ot_ind_state.i_item_size - XT_RECORD_REF_SIZE; + XTIdxItemRec prev_item; + + key_value.sv_key = &ot->ot_ind_rhandle->ih_branch->tb_data[result.sr_item.i_item_offset]; + key_value.sv_length = result.sr_item.i_item_size - XT_RECORD_REF_SIZE; - result.sr_item = ot->ot_ind_state; + prev_item = result.sr_item; idx_next_branch_item(ot->ot_table, ind, ot->ot_ind_rhandle->ih_branch, &result); + + if (ind->mi_lazy_delete) { + while (result.sr_item.i_item_offset < result.sr_item.i_total_size) { + if (result.sr_row_id != (xtRowID) -1) 
+ break; + prev_item = result.sr_item; + idx_next_branch_item(ot->ot_table, ind, ot->ot_ind_rhandle->ih_branch, &result); + } + } + if (result.sr_item.i_item_offset < result.sr_item.i_total_size) { /* Still on key? */ - if (search_key && search_key->sk_on_key) { - search_key->sk_on_key = myxt_compare_key(ind, search_key->sk_key_value.sv_flags, search_key->sk_key_value.sv_length, - search_key->sk_key_value.sv_key, &ot->ot_ind_rhandle->ih_branch->tb_data[result.sr_item.i_item_offset]) == 0; - } + idx_still_on_key(ind, search_key, ot->ot_ind_rhandle->ih_branch, &result.sr_item); xt_ind_unlock_handle(ot->ot_ind_rhandle); goto checked_on_key; } + + result.sr_item = prev_item; } key_value.sv_flags = XT_SEARCH_WHOLE_KEY; - xt_get_record_ref(&ot->ot_ind_rhandle->ih_branch->tb_data[ot->ot_ind_state.i_item_offset + ot->ot_ind_state.i_item_size - XT_RECORD_REF_SIZE], &key_value.sv_rec_id, &key_value.sv_row_id); + xt_get_record_ref(&ot->ot_ind_rhandle->ih_branch->tb_data[result.sr_item.i_item_offset + result.sr_item.i_item_size - XT_RECORD_REF_SIZE], &key_value.sv_rec_id, &key_value.sv_row_id); key_value.sv_key = key_buf; - key_value.sv_length = ot->ot_ind_state.i_item_size - XT_RECORD_REF_SIZE; - memcpy(key_buf, &ot->ot_ind_rhandle->ih_branch->tb_data[ot->ot_ind_state.i_item_offset], key_value.sv_length); + key_value.sv_length = result.sr_item.i_item_size - XT_RECORD_REF_SIZE; + memcpy(key_buf, &ot->ot_ind_rhandle->ih_branch->tb_data[result.sr_item.i_item_offset], key_value.sv_length); xt_ind_release_handle(ot->ot_ind_rhandle, TRUE, ot->ot_thread); ot->ot_ind_rhandle = NULL; @@ -2375,7 +2944,7 @@ xtPublic xtBool xt_idx_next(register XTOpenTablePtr ot, register XTIndexPtr ind, } while (XT_NODE_ID(current)) { - if (!xt_ind_fetch(ot, current, XT_LOCK_READ, &iref)) + if (!xt_ind_fetch(ot, ind, current, XT_LOCK_READ, &iref)) goto failed; ind->mi_scan_branch(ot->ot_table, ind, iref.ir_branch, &key_value, &result); if (result.sr_item.i_node_ref_size) { @@ -2389,7 +2958,7 @@ xtPublic 
xtBool xt_idx_next(register XTOpenTablePtr ot, register XTIndexPtr ind, if (!idx_push(&stack, current, &result.sr_item)) goto failed; current = result.sr_branch; - if (!xt_ind_fetch(ot, current, XT_LOCK_READ, &iref)) + if (!xt_ind_fetch(ot, ind, current, XT_LOCK_READ, &iref)) goto failed; idx_first_branch_item(ot->ot_table, ind, iref.ir_branch, &result); if (!result.sr_item.i_node_ref_size) @@ -2416,6 +2985,15 @@ xtPublic xtBool xt_idx_next(register XTOpenTablePtr ot, register XTIndexPtr ind, current = result.sr_branch; } + if (ind->mi_lazy_delete) { + ignore_lazy_deleted_items: + while (result.sr_item.i_item_offset < result.sr_item.i_total_size) { + if (result.sr_row_id != (xtRowID) -1) + break; + idx_next_branch_item(NULL, ind, iref.ir_branch, &result); + } + } + /* Check the current position in a leaf: */ if (result.sr_item.i_item_offset == result.sr_item.i_total_size) { /* At the end: */ @@ -2428,10 +3006,37 @@ xtPublic xtBool xt_idx_next(register XTOpenTablePtr ot, register XTIndexPtr ind, xt_ind_release(ot, ind, XT_UNLOCK_READ, &iref); while ((node = idx_pop(&stack))) { if (node->i_pos.i_item_offset < node->i_pos.i_total_size) { - if (!xt_ind_fetch(ot, node->i_branch, XT_LOCK_READ, &iref)) + if (!xt_ind_fetch(ot, ind, node->i_branch, XT_LOCK_READ, &iref)) goto failed; result.sr_item = node->i_pos; xt_get_res_record_ref(&iref.ir_branch->tb_data[result.sr_item.i_item_offset + result.sr_item.i_item_size - XT_RECORD_REF_SIZE], &result); + + if (ind->mi_lazy_delete) { + if (result.sr_row_id == (xtRowID) -1) { + /* If this node pointer is lazy deleted, then + * go down the next branch... 
+ */ + idx_next_branch_item(ot->ot_table, ind, iref.ir_branch, &result); + + /* Go down to the bottom: */ + current = node->i_branch; + while (XT_NODE_ID(current)) { + xt_ind_release(ot, ind, XT_UNLOCK_READ, &iref); + if (!idx_push(&stack, current, &result.sr_item)) + goto failed; + current = result.sr_branch; + if (!xt_ind_fetch(ot, ind, current, XT_LOCK_READ, &iref)) + goto failed; + idx_first_branch_item(ot->ot_table, ind, iref.ir_branch, &result); + if (!result.sr_item.i_node_ref_size) + break; + } + + /* And scan the leaf... */ + goto ignore_lazy_deleted_items; + } + } + goto unlock_check_on_key; } } @@ -2503,32 +3108,39 @@ xtPublic xtBool xt_idx_prev(register XTOpenTablePtr ot, register XTIndexPtr ind, IdxStackItemPtr node; #ifdef DEBUG - iref.ir_ulock = XT_UNLOCK_NONE; + iref.ir_xlock = 2; + iref.ir_updated = 2; #endif ASSERT_NS(ot->ot_ind_rhandle); xt_ind_lock_handle(ot->ot_ind_rhandle); - if (!ot->ot_ind_state.i_node_ref_size && ot->ot_ind_state.i_item_offset > 0) { - key_value.sv_key = &ot->ot_ind_rhandle->ih_branch->tb_data[ot->ot_ind_state.i_item_offset]; - key_value.sv_length = ot->ot_ind_state.i_item_size - XT_RECORD_REF_SIZE; + result.sr_item = ot->ot_ind_state; + if (!result.sr_item.i_node_ref_size && result.sr_item.i_item_offset > 0) { + key_value.sv_key = &ot->ot_ind_rhandle->ih_branch->tb_data[result.sr_item.i_item_offset]; + key_value.sv_length = result.sr_item.i_item_size - XT_RECORD_REF_SIZE; - result.sr_item = ot->ot_ind_state; ind->mi_prev_item(ot->ot_table, ind, ot->ot_ind_rhandle->ih_branch, &result); - if (search_key && search_key->sk_on_key) { - search_key->sk_on_key = myxt_compare_key(ind, search_key->sk_key_value.sv_flags, search_key->sk_key_value.sv_length, - search_key->sk_key_value.sv_key, &ot->ot_ind_rhandle->ih_branch->tb_data[result.sr_item.i_item_offset]) == 0; + if (ind->mi_lazy_delete) { + while (result.sr_row_id == (xtRowID) -1) { + if (result.sr_item.i_item_offset == 0) + goto research; + ind->mi_prev_item(ot->ot_table, ind, 
ot->ot_ind_rhandle->ih_branch, &result); + } } + idx_still_on_key(ind, search_key, ot->ot_ind_rhandle->ih_branch, &result.sr_item); + xt_ind_unlock_handle(ot->ot_ind_rhandle); goto checked_on_key; } + research: key_value.sv_flags = XT_SEARCH_WHOLE_KEY; key_value.sv_rec_id = ot->ot_curr_rec_id; key_value.sv_row_id = 0; key_value.sv_key = key_buf; - key_value.sv_length = ot->ot_ind_state.i_item_size - XT_RECORD_REF_SIZE; - memcpy(key_buf, &ot->ot_ind_rhandle->ih_branch->tb_data[ot->ot_ind_state.i_item_offset], key_value.sv_length); + key_value.sv_length = result.sr_item.i_item_size - XT_RECORD_REF_SIZE; + memcpy(key_buf, &ot->ot_ind_rhandle->ih_branch->tb_data[result.sr_item.i_item_offset], key_value.sv_length); xt_ind_release_handle(ot->ot_ind_rhandle, TRUE, ot->ot_thread); ot->ot_ind_rhandle = NULL; @@ -2546,29 +3158,39 @@ xtPublic xtBool xt_idx_prev(register XTOpenTablePtr ot, register XTIndexPtr ind, } while (XT_NODE_ID(current)) { - if (!xt_ind_fetch(ot, current, XT_LOCK_READ, &iref)) + if (!xt_ind_fetch(ot, ind, current, XT_LOCK_READ, &iref)) goto failed; ind->mi_scan_branch(ot->ot_table, ind, iref.ir_branch, &key_value, &result); if (result.sr_item.i_node_ref_size) { if (result.sr_found) { /* If we have found the key in a node: */ + search_down_stack: /* Go down to the bottom: */ while (XT_NODE_ID(current)) { xt_ind_release(ot, ind, XT_UNLOCK_READ, &iref); if (!idx_push(&stack, current, &result.sr_item)) goto failed; current = result.sr_branch; - if (!xt_ind_fetch(ot, current, XT_LOCK_READ, &iref)) + if (!xt_ind_fetch(ot, ind, current, XT_LOCK_READ, &iref)) goto failed; ind->mi_last_item(ot->ot_table, ind, iref.ir_branch, &result); if (!result.sr_item.i_node_ref_size) break; } - /* Is the leaf not empty, then we are done... */ + /* If the leaf empty we have to go up the stack again... 
*/ if (result.sr_item.i_total_size == 0) break; + + if (ind->mi_lazy_delete) { + while (result.sr_row_id == (xtRowID) -1) { + if (result.sr_item.i_item_offset == 0) + goto search_up_stack; + ind->mi_prev_item(ot->ot_table, ind, iref.ir_branch, &result); + } + } + goto unlock_check_on_key; } } @@ -2580,6 +3202,15 @@ xtPublic xtBool xt_idx_prev(register XTOpenTablePtr ot, register XTIndexPtr ind, if (result.sr_item.i_item_offset == 0) break; ind->mi_prev_item(ot->ot_table, ind, iref.ir_branch, &result); + + if (ind->mi_lazy_delete) { + while (result.sr_row_id == (xtRowID) -1) { + if (result.sr_item.i_item_offset == 0) + goto search_up_stack; + ind->mi_prev_item(ot->ot_table, ind, iref.ir_branch, &result); + } + } + goto unlock_check_on_key; } xt_ind_release(ot, ind, XT_UNLOCK_READ, &iref); @@ -2588,6 +3219,7 @@ xtPublic xtBool xt_idx_prev(register XTOpenTablePtr ot, register XTIndexPtr ind, current = result.sr_branch; } + search_up_stack: /* We are at the start of a leaf node. * Go up the stack to find the start poition of the next key. * If we find none, then we are the end of the index. @@ -2595,10 +3227,18 @@ xtPublic xtBool xt_idx_prev(register XTOpenTablePtr ot, register XTIndexPtr ind, xt_ind_release(ot, ind, XT_UNLOCK_READ, &iref); while ((node = idx_pop(&stack))) { if (node->i_pos.i_item_offset > node->i_pos.i_node_ref_size) { - if (!xt_ind_fetch(ot, node->i_branch, XT_LOCK_READ, &iref)) + if (!xt_ind_fetch(ot, ind, node->i_branch, XT_LOCK_READ, &iref)) goto failed; result.sr_item = node->i_pos; ind->mi_prev_item(ot->ot_table, ind, iref.ir_branch, &result); + + if (ind->mi_lazy_delete) { + if (result.sr_row_id == (xtRowID) -1) { + current = node->i_branch; + goto search_down_stack; + } + } + goto unlock_check_on_key; } } @@ -2648,7 +3288,7 @@ xtPublic xtBool xt_idx_prev(register XTOpenTablePtr ot, register XTIndexPtr ind, } /* Return TRUE if the record matches the current index search! 
*/ -xtPublic xtBool xt_idx_match_search(register XTOpenTablePtr ot __attribute__((unused)), register XTIndexPtr ind, register XTIdxSearchKeyPtr search_key, xtWord1 *buf, int mode) +xtPublic xtBool xt_idx_match_search(register XTOpenTablePtr XT_UNUSED(ot), register XTIndexPtr ind, register XTIdxSearchKeyPtr search_key, xtWord1 *buf, int mode) { int r; xtWord1 key_buf[XT_INDEX_MAX_KEY_SIZE]; @@ -2666,7 +3306,7 @@ xtPublic xtBool xt_idx_match_search(register XTOpenTablePtr ot __attribute__((un return FALSE; } -static void idx_set_index_selectivity(XTThreadPtr self __attribute__((unused)), XTOpenTablePtr ot, XTIndexPtr ind) +static void idx_set_index_selectivity(XTThreadPtr self, XTOpenTablePtr ot, XTIndexPtr ind) { static const xtRecordID MAX_RECORDS = 100; @@ -2784,7 +3424,7 @@ static void idx_set_index_selectivity(XTThreadPtr self __attribute__((unused)), ot->ot_ind_rhandle = NULL; failed: - ot->ot_table->tab_dic.dic_disable_index = XT_INDEX_CORRUPTED; + xt_tab_disable_index(ot->ot_table, XT_INDEX_CORRUPTED); xt_log_and_clear_exception_ns(); return; } @@ -2834,10 +3474,11 @@ static u_int idx_check_node(XTOpenTablePtr ot, XTIndexPtr ind, int depth, xtInde XTIndReferenceRec iref; #ifdef DEBUG - iref.ir_ulock = XT_UNLOCK_NONE; + iref.ir_xlock = 2; + iref.ir_updated = 2; #endif ASSERT_NS(XT_NODE_ID(node) <= XT_NODE_ID(ot->ot_table->tab_ind_eof)); - if (!xt_ind_fetch(ot, node, XT_LOCK_READ, &iref)) + if (!xt_ind_fetch(ot, ind, node, XT_LOCK_READ, &iref)) return 0; idx_first_branch_item(ot->ot_table, ind, iref.ir_branch, &result); @@ -2974,7 +3615,7 @@ xtPublic void xt_check_indices(XTOpenTablePtr ot) track_block_exists(current); #endif printf("%d ", (int) XT_NODE_ID(current)); - if (!xt_ind_read_bytes(ot, current, sizeof(XTIndFreeBlockRec), (xtWord1 *) &free_block)) { + if (!xt_ind_read_bytes(ot, *ind, current, sizeof(XTIndFreeBlockRec), (xtWord1 *) &free_block)) { xt_log_and_clear_exception_ns(); break; } @@ -3000,6 +3641,88 @@ xtPublic void 
xt_check_indices(XTOpenTablePtr ot) /* * ----------------------------------------------------------------------- + * Load index + */ + +static void idx_load_node(XTThreadPtr self, XTOpenTablePtr ot, XTIndexPtr ind, xtIndexNodeID node) +{ + XTIdxResultRec result; + XTIndReferenceRec iref; + + ASSERT_NS(XT_NODE_ID(node) <= XT_NODE_ID(ot->ot_table->tab_ind_eof)); + if (!xt_ind_fetch(ot, ind, node, XT_LOCK_READ, &iref)) + xt_throw(self); + + idx_first_branch_item(ot->ot_table, ind, iref.ir_branch, &result); + if (result.sr_item.i_node_ref_size) + idx_load_node(self, ot, ind, result.sr_branch); + while (result.sr_item.i_item_offset < result.sr_item.i_total_size) { + idx_next_branch_item(ot->ot_table, ind, iref.ir_branch, &result); + if (result.sr_item.i_node_ref_size) + idx_load_node(self, ot, ind, result.sr_branch); + } + + xt_ind_release(ot, ind, XT_UNLOCK_READ, &iref); +} + +xtPublic void xt_load_indices(XTThreadPtr self, XTOpenTablePtr ot) +{ + register XTTableHPtr tab = ot->ot_table; + XTIndexPtr *ind_ptr; + XTIndexPtr ind; + xtIndexNodeID current; + + xt_lock_mutex(self, &tab->tab_ind_flush_lock); + pushr_(xt_unlock_mutex, &tab->tab_ind_flush_lock); + + ind_ptr = tab->tab_dic.dic_keys; + for (u_int k=0; k<tab->tab_dic.dic_key_count; k++, ind_ptr++) { + ind = *ind_ptr; + XT_INDEX_WRITE_LOCK(ind, ot); + if ((XT_NODE_ID(current) = XT_NODE_ID(ind->mi_root))) + idx_load_node(self, ot, ind, current); + XT_INDEX_UNLOCK(ind, ot); + } + + freer_(); // xt_unlock_mutex(&tab->tab_ind_flush_lock) +} + +/* + * ----------------------------------------------------------------------- + * Count the number of deleted entries in a node: + */ + +/* + * {LAZY-DEL-INDEX-ITEMS} + * + * Use this function to count the number of deleted items + * in a node when it is loaded. + * + * The count helps us decide of the node should be "packed". 
+ */ +xtPublic void xt_ind_count_deleted_items(XTTableHPtr tab, XTIndexPtr ind, XTIndBlockPtr block) +{ + XTIdxResultRec result; + int del_count = 0; + xtWord2 branch_size; + + branch_size = XT_GET_DISK_2(((XTIdxBranchDPtr) block->cb_data)->tb_size_2); + + /* This is possible when reading free pages. */ + if (XT_GET_INDEX_BLOCK_LEN(branch_size) < 2 || XT_GET_INDEX_BLOCK_LEN(branch_size) > XT_INDEX_PAGE_SIZE) + return; + + idx_first_branch_item(tab, ind, (XTIdxBranchDPtr) block->cb_data, &result); + while (result.sr_item.i_item_offset < result.sr_item.i_total_size) { + if (result.sr_row_id == (xtRowID) -1) + del_count++; + idx_next_branch_item(tab, ind, (XTIdxBranchDPtr) block->cb_data, &result); + } + block->cp_del_count = del_count; +} + +/* + * ----------------------------------------------------------------------- * Index consistant flush */ @@ -3408,7 +4131,7 @@ void XTIndexLogPool::ilp_init(struct XTThread *self, struct XTDatabase *db, size xt_throw(self); } -void XTIndexLogPool::ilp_close(struct XTThread *self __attribute__((unused)), xtBool lock) +void XTIndexLogPool::ilp_close(struct XTThread *XT_UNUSED(self), xtBool lock) { XTIndexLogPtr il; @@ -3570,7 +4293,7 @@ xtBool XTIndexLog::il_require_space(size_t bytes, XTThreadPtr thread) return OK; } -xtBool XTIndexLog::il_write_byte(struct XTOpenTable *ot __attribute__((unused)), xtWord1 byte) +xtBool XTIndexLog::il_write_byte(struct XTOpenTable *ot, xtWord1 byte) { if (!il_require_space(1, ot->ot_thread)) return FAILED; @@ -3579,7 +4302,7 @@ xtBool XTIndexLog::il_write_byte(struct XTOpenTable *ot __attribute__((unused)), return OK; } -xtBool XTIndexLog::il_write_word4(struct XTOpenTable *ot __attribute__((unused)), xtWord4 value) +xtBool XTIndexLog::il_write_word4(struct XTOpenTable *ot, xtWord4 value) { xtWord1 *buffer; @@ -3591,7 +4314,7 @@ xtBool XTIndexLog::il_write_word4(struct XTOpenTable *ot __attribute__((unused)) return OK; } -xtBool XTIndexLog::il_write_block(struct XTOpenTable *ot 
__attribute__((unused)), XTIndBlockPtr block) +xtBool XTIndexLog::il_write_block(struct XTOpenTable *ot, XTIndBlockPtr block) { XTIndPageDataDPtr page_data; xtIndexNodeID node_id; @@ -3618,7 +4341,7 @@ xtBool XTIndexLog::il_write_block(struct XTOpenTable *ot __attribute__((unused)) return OK; } -xtBool XTIndexLog::il_write_header(struct XTOpenTable *ot __attribute__((unused)), size_t head_size, xtWord1 *head_buf) +xtBool XTIndexLog::il_write_header(struct XTOpenTable *ot, size_t head_size, xtWord1 *head_buf) { XTIndHeadDataDPtr head_data; diff --git a/storage/pbxt/src/index_xt.h b/storage/pbxt/src/index_xt.h index b0b2813f5e1..ed4b9cef6ae 100644 --- a/storage/pbxt/src/index_xt.h +++ b/storage/pbxt/src/index_xt.h @@ -24,6 +24,7 @@ #define __xt_index_h__ #ifdef DRIZZLED +#include <drizzled/definitions.h> #include <mysys/my_bitmap.h> #else #include <mysql_version.h> @@ -34,7 +35,6 @@ #include "linklist_xt.h" #include "datalog_xt.h" #include "datadic_xt.h" -//#include "cache_xt.h" #ifndef MYSQL_VERSION_ID #error MYSQL_VERSION_ID must be defined! @@ -109,7 +109,7 @@ class Field; #define XT_MAX_RECORD_REF_SIZE 8 -#define XT_INDEX_PAGE_DATA_SIZE XT_INDEX_PAGE_SIZE - 2 /* NOTE: 2 == offsetof(XTIdxBranchDRec, tb_data) */ +#define XT_INDEX_PAGE_DATA_SIZE (XT_INDEX_PAGE_SIZE - 2) /* NOTE: 2 == offsetof(XTIdxBranchDRec, tb_data) */ #define XT_MAKE_LEAF_SIZE(x) ((x) + offsetof(XTIdxBranchDRec, tb_data)) @@ -218,7 +218,7 @@ typedef struct XTIndFreeList { * in 32 threads on smalltab: runTest(SMALL_INSERT_TEST, 32, dbUrl) */ /* - * XT_INDEX_USE_RW_MUTEX: + * XT_INDEX_USE_RWMUTEX: * But the RW mutex is a close second, if not just as fast. * If it is at least as fast, then it is better because read lock * overhead is then zero. @@ -240,17 +240,24 @@ typedef struct XTIndFreeList { * Latest test show that RW mutex is slightly faster: * 127460 to 123574 payment transactions. 
*/ -#define XT_INDEX_USE_RW_MUTEX + +#ifdef XT_NO_ATOMICS +#define XT_INDEX_USE_PTHREAD_RW +#else +//#define XT_INDEX_USE_RWMUTEX //#define XT_INDEX_USE_PTHREAD_RW +//#define XT_INDEX_SPINXSLOCK +#define XT_TAB_ROW_USE_XSMUTEX +#endif -#ifdef XT_INDEX_USE_FASTWRLOCK -#define XT_INDEX_LOCK_TYPE XTFastRWLockRec -#define XT_INDEX_INIT_LOCK(s, i) xt_fastrwlock_init(s, &(i)->mi_rwlock) -#define XT_INDEX_FREE_LOCK(s, i) xt_fastrwlock_free(s, &(i)->mi_rwlock) -#define XT_INDEX_READ_LOCK(i, o) xt_fastrwlock_slock(&(i)->mi_rwlock, (o)->ot_thread) -#define XT_INDEX_WRITE_LOCK(i, o) xt_fastrwlock_xlock(&(i)->mi_rwlock, (o)->ot_thread) -#define XT_INDEX_UNLOCK(i, o) xt_fastrwlock_unlock(&(i)->mi_rwlock, (o)->ot_thread) -#define XT_INDEX_HAVE_XLOCK(i, o) TRUE +#ifdef XT_TAB_ROW_USE_XSMUTEX +#define XT_INDEX_LOCK_TYPE XTXSMutexRec +#define XT_INDEX_INIT_LOCK(s, i) xt_xsmutex_init_with_autoname(s, &(i)->mi_rwlock) +#define XT_INDEX_FREE_LOCK(s, i) xt_xsmutex_free(s, &(i)->mi_rwlock) +#define XT_INDEX_READ_LOCK(i, o) xt_xsmutex_slock(&(i)->mi_rwlock, (o)->ot_thread->t_id) +#define XT_INDEX_WRITE_LOCK(i, o) xt_xsmutex_xlock(&(i)->mi_rwlock, (o)->ot_thread->t_id) +#define XT_INDEX_UNLOCK(i, o) xt_xsmutex_unlock(&(i)->mi_rwlock, (o)->ot_thread->t_id) +#define XT_INDEX_HAVE_XLOCK(i, o) ((i)->sxs_xlocker == (o)->ot_thread->t_id) #elif defined(XT_INDEX_USE_PTHREAD_RW) #define XT_INDEX_LOCK_TYPE xt_rwlock_type #define XT_INDEX_INIT_LOCK(s, i) xt_init_rwlock_with_autoname(s, &(i)->mi_rwlock) @@ -259,7 +266,15 @@ typedef struct XTIndFreeList { #define XT_INDEX_WRITE_LOCK(i, o) xt_xlock_rwlock_ns(&(i)->mi_rwlock) #define XT_INDEX_UNLOCK(i, o) xt_unlock_rwlock_ns(&(i)->mi_rwlock) #define XT_INDEX_HAVE_XLOCK(i, o) TRUE -#else // XT_INDEX_USE_RW_MUTEX +#elif defined(XT_INDEX_SPINXSLOCK) +#define XT_INDEX_LOCK_TYPE XTSpinXSLockRec +#define XT_INDEX_INIT_LOCK(s, i) xt_spinxslock_init_with_autoname(s, &(i)->mi_rwlock) +#define XT_INDEX_FREE_LOCK(s, i) xt_spinxslock_free(s, &(i)->mi_rwlock) 
+#define XT_INDEX_READ_LOCK(i, o) xt_spinxslock_slock(&(i)->mi_rwlock, (o)->ot_thread->t_id) +#define XT_INDEX_WRITE_LOCK(i, o) xt_spinxslock_xlock(&(i)->mi_rwlock, (o)->ot_thread->t_id) +#define XT_INDEX_UNLOCK(i, o) xt_spinxslock_unlock(&(i)->mi_rwlock, (o)->ot_thread->t_id) +#define XT_INDEX_HAVE_XLOCK(i, o) ((i)->mi_rwlock.nrw_xlocker == (o)->ot_thread->t_id) +#else // XT_INDEX_USE_RWMUTEX #define XT_INDEX_LOCK_TYPE XTRWMutexRec #define XT_INDEX_INIT_LOCK(s, i) xt_rwmutex_init_with_autoname(s, &(i)->mi_rwlock) #define XT_INDEX_FREE_LOCK(s, i) xt_rwmutex_free(s, &(i)->mi_rwlock) @@ -289,22 +304,24 @@ typedef struct XTIndex { XTIndFreeListPtr mi_free_list; /* List of free pages for this index. */ /* Protected by the mi_dirty_lock: */ - XTSpinLockRec mi_dirty_lock; /* Spin lock protecting the dirty & free lists. */ + XTSpinLockRec mi_dirty_lock; /* Spin lock protecting the dirty & free lists. */ struct XTIndBlock *mi_dirty_list; /* List of dirty pages for this index. */ u_int mi_dirty_blocks; /* Count of the dirty blocks. */ /* Index contants: */ u_int mi_flags; u_int mi_key_size; + u_int mi_max_items; /* The maximum number of items that can fit in a leaf node. */ xtBool mi_low_byte_first; xtBool mi_fix_key; + xtBool mi_lazy_delete; /* TRUE if index entries are "lazy deleted". */ u_int mi_single_type; /* Used when the index contains a single field. */ u_int mi_select_total; XTScanBranchFunc mi_scan_branch; XTPrevItemFunc mi_prev_item; XTLastItemFunc mi_last_item; XTSimpleCompFunc mi_simple_comp_key; - MY_BITMAP mi_col_map; /* Bit-map of columns in the index. */ + MX_BITMAP mi_col_map; /* Bit-map of columns in the index. */ u_int mi_subset_of; /* Indicates if this index is a complete subset of someother index. */ u_int mi_seg_count; XTIndexSegRec mi_seg[200]; @@ -344,6 +361,7 @@ typedef struct XTDictionary { Field **dic_blob_cols; /* MySQL related information. NULL when no tables are open from MySQL side! 
*/ + xtBool dic_no_lazy_delete; /* FALSE if lazy delete is OK. */ u_int dic_disable_index; /* Non-zero if the index cannot be used. */ u_int dic_index_ver; /* The version of the index. */ u_int dic_key_count; @@ -462,6 +480,8 @@ xtBool xt_idx_prev(register struct XTOpenTable *ot, register struct XTIndex *ind xtBool xt_idx_read(struct XTOpenTable *ot, struct XTIndex *ind, xtWord1 *rec_buf); void xt_ind_set_index_selectivity(XTThreadPtr self, struct XTOpenTable *ot); void xt_check_indices(struct XTOpenTable *ot); +void xt_load_indices(XTThreadPtr self, struct XTOpenTable *ot); +void xt_ind_count_deleted_items(struct XTTable *ot, struct XTIndex *ind, struct XTIndBlock *block); xtBool xt_flush_indices(struct XTOpenTable *ot, off_t *bytes_flushed, xtBool have_table_lock); void xt_ind_track_dump_block(struct XTTable *tab, xtIndexNodeID address); @@ -482,6 +502,7 @@ void xt_prev_branch_item_var(struct XTTable *tab, XTIndexPtr ind, XTIdxBranchDPt void xt_last_branch_item_fix(struct XTTable *tab, XTIndexPtr ind, XTIdxBranchDPtr branch, register XTIdxResultPtr result); void xt_last_branch_item_var(struct XTTable *tab, XTIndexPtr ind, XTIdxBranchDPtr branch, register XTIdxResultPtr result); +xtBool xt_idx_lazy_delete_on_leaf(XTIndexPtr ind, struct XTIndBlock *block, xtWord2 branch_size); //#define TRACK_ACTIVITY #ifdef TRACK_ACTIVITY diff --git a/storage/pbxt/src/lock_xt.cc b/storage/pbxt/src/lock_xt.cc index ec698bb81b2..6a2d7a5b0e3 100644 --- a/storage/pbxt/src/lock_xt.cc +++ b/storage/pbxt/src/lock_xt.cc @@ -25,6 +25,10 @@ #include "xt_config.h" +#ifdef DRIZZLED +#include <bitset> +#endif + #include <stdio.h> #include "lock_xt.h" @@ -40,6 +44,16 @@ #endif /* + * This function should never be called. It indicates a link + * error! 
+ */ +xtPublic void xt_log_atomic_error_and_abort(c_char *func, c_char *file, u_int line) +{ + xt_logf(NULL, func, file, line, XT_LOG_ERROR, "%s", "Atomic operations not supported\n"); + abort(); +} + +/* * ----------------------------------------------------------------------- * ROW LOCKS, LIST BASED */ @@ -715,7 +729,7 @@ xtBool xt_init_row_locks(XTRowLocksPtr rl) return OK; } -void xt_exit_row_locks(XTRowLocksPtr rl __attribute__((unused))) +void xt_exit_row_locks(XTRowLocksPtr rl) { for (int i=0; i<XT_ROW_LOCK_GROUP_COUNT; i++) { xt_spinlock_free(NULL, &rl->rl_groups[i].lg_lock); @@ -982,7 +996,7 @@ xtBool old_xt_init_row_locks(XTRowLocksPtr rl) return OK; } -void old_xt_exit_row_locks(XTRowLocksPtr rl __attribute__((unused))) +void old_xt_exit_row_locks(XTRowLocksPtr XT_UNUSED(rl)) { } @@ -1007,10 +1021,6 @@ xtPublic void xt_exit_row_lock_list(XTRowLockListPtr lock_list) * SPECIAL EXCLUSIVE/SHARED (XS) LOCK */ -#define XT_GET1(x) *(x) -#define XT_SET4(x, y) xt_atomic_set4(x, y) -#define XT_GET4(x) xt_atomic_get4(x) - #ifdef XT_THREAD_LOCK_INFO xtPublic void xt_rwmutex_init(struct XTThread *self, XTRWMutexPtr xsl, const char *n) #else @@ -1023,7 +1033,7 @@ xtPublic void xt_rwmutex_init(XTThreadPtr self, XTRWMutexPtr xsl) #endif xt_init_mutex_with_autoname(self, &xsl->xs_lock); xt_init_cond(self, &xsl->xs_cond); - XT_SET4(&xsl->xs_state, 0); + xt_atomic_set4(&xsl->xs_state, 0); xsl->xs_xlocker = 0; /* Must be aligned! */ ASSERT(xt_thr_maximum_threads == xt_align_size(xt_thr_maximum_threads, XT_XS_LOCK_ALIGN)); @@ -1068,7 +1078,7 @@ xtPublic xtBool xt_rwmutex_xlock(XTRWMutexPtr xsl, xtThreadID thd_id) } /* I am the locker (set state before locker!): */ - XT_SET4(&xsl->xs_state, 0); + xt_atomic_set4(&xsl->xs_state, 0); xsl->xs_xlocker = thd_id; /* Wait for all the read lockers: */ @@ -1078,7 +1088,7 @@ xtPublic xtBool xt_rwmutex_xlock(XTRWMutexPtr xsl, xtThreadID thd_id) * Just in case of this, we keep the wait time down! 
*/ if (!xt_timed_wait_cond_ns(&xsl->xs_cond, &xsl->xs_lock, 10)) { - XT_SET4(&xsl->xs_state, 0); + xt_atomic_set4(&xsl->xs_state, 0); xsl->xs_xlocker = 0; xt_unlock_mutex_ns(&xsl->xs_lock); return FAILED; @@ -1087,11 +1097,11 @@ xtPublic xtBool xt_rwmutex_xlock(XTRWMutexPtr xsl, xtThreadID thd_id) /* State can be incremented in parallel by a reader * thread! */ - XT_SET4(&xsl->xs_state, xsl->xs_state + 1); + xt_atomic_set4(&xsl->xs_state, xsl->xs_state + 1); } /* I have waited for all: */ - XT_SET4(&xsl->xs_state, xt_thr_maximum_threads); + xt_atomic_set4(&xsl->xs_state, xt_thr_maximum_threads); #ifdef XT_THREAD_LOCK_INFO xt_thread_lock_info_add_owner(&xsl->xs_lock_info); @@ -1107,7 +1117,7 @@ xtPublic xtBool xt_rwmutex_slock(XTRWMutexPtr xsl, xtThreadID thd_id) #endif ASSERT_NS(xt_get_self()->t_id == thd_id); - xt_flushed_inc1(&xsl->x.xs_rlock[thd_id]); + xt_atomic_inc1(&xsl->x.xs_rlock[thd_id]); if (xsl->x.xs_rlock[thd_id] > 1) return OK; @@ -1158,7 +1168,7 @@ xtPublic xtBool xt_rwmutex_unlock(XTRWMutexPtr xsl, xtThreadID thd_id) /* I have an X lock. */ ASSERT_NS(xsl->x.xs_rlock[thd_id] == XT_NO_LOCK); ASSERT_NS(xsl->xs_state == xt_thr_maximum_threads); - XT_SET4(&xsl->xs_state, 0); + xt_atomic_set4(&xsl->xs_state, 0); xsl->xs_xlocker = 0; xt_unlock_mutex_ns(&xsl->xs_lock); /* Wake up any other X or shared lockers: */ @@ -1201,7 +1211,7 @@ xtPublic xtBool xt_rwmutex_unlock(XTRWMutexPtr xsl, xtThreadID thd_id) return FAILED; } } - xt_flushed_dec1(&xsl->x.xs_rlock[thd_id]); + xt_atomic_dec1(&xsl->x.xs_rlock[thd_id]); xt_unlock_mutex_ns(&xsl->xs_lock); } else @@ -1213,7 +1223,7 @@ xtPublic xtBool xt_rwmutex_unlock(XTRWMutexPtr xsl, xtThreadID thd_id) * try to get the lock xs_lock, I could hand for the duration * of the X lock. 
*/ - xt_flushed_dec1(&xsl->x.xs_rlock[thd_id]); + xt_atomic_dec1(&xsl->x.xs_rlock[thd_id]); } } #ifdef XT_THREAD_LOCK_INFO @@ -1228,13 +1238,14 @@ xtPublic xtBool xt_rwmutex_unlock(XTRWMutexPtr xsl, xtThreadID thd_id) */ #ifdef XT_THREAD_LOCK_INFO -xtPublic void xt_spinlock_init(XTThreadPtr self __attribute__((unused)), XTSpinLockPtr spl, const char *n) +xtPublic void xt_spinlock_init(XTThreadPtr self, XTSpinLockPtr spl, const char *n) #else -xtPublic void xt_spinlock_init(XTThreadPtr self __attribute__((unused)), XTSpinLockPtr spl) +xtPublic void xt_spinlock_init(XTThreadPtr self, XTSpinLockPtr spl) #endif { + (void) self; spl->spl_lock = 0; -#ifdef XT_SPL_DEFAULT +#ifdef XT_NO_ATOMICS xt_init_mutex(self, &spl->spl_mutex); #endif #ifdef DEBUG @@ -1246,9 +1257,10 @@ xtPublic void xt_spinlock_init(XTThreadPtr self __attribute__((unused)), XTSpinL #endif } -xtPublic void xt_spinlock_free(XTThreadPtr self __attribute__((unused)), XTSpinLockPtr spl __attribute__((unused))) +xtPublic void xt_spinlock_free(XTThreadPtr XT_UNUSED(self), XTSpinLockPtr spl) { -#ifdef XT_SPL_DEFAULT + (void) spl; +#ifdef XT_NO_ATOMICS xt_free_mutex(&spl->spl_mutex); #endif #ifdef XT_THREAD_LOCK_INFO @@ -1266,7 +1278,7 @@ xtPublic xtBool xt_spinlock_spin(XTSpinLockPtr spl) if (!*lck) { /* Try to get the lock: */ if (!xt_spinlock_set(spl)) - return OK; + goto done_ok; } } @@ -1274,6 +1286,7 @@ xtPublic xtBool xt_spinlock_spin(XTSpinLockPtr spl) xt_critical_wait(); } + done_ok: return OK; } @@ -1400,147 +1413,96 @@ xtPublic void xt_fastlock_wakeup(XTFastLockPtr fal) /* * ----------------------------------------------------------------------- * READ/WRITE SPIN LOCK + * + * An extremely genius very fast read/write lock based on atomics! 
*/ #ifdef XT_THREAD_LOCK_INFO -xtPublic void xt_spinrwlock_init(struct XTThread *self, XTSpinRWLockPtr srw, const char *name) +xtPublic void xt_spinxslock_init(struct XTThread *XT_UNUSED(self), XTSpinXSLockPtr sxs, const char *name) #else -xtPublic void xt_spinrwlock_init(struct XTThread *self, XTSpinRWLockPtr srw) +xtPublic void xt_spinxslock_init(struct XTThread *XT_UNUSED(self), XTSpinXSLockPtr sxs) #endif { - xt_spinlock_init_with_autoname(self, &srw->srw_lock); - xt_spinlock_init_with_autoname(self, &srw->srw_state_lock); - srw->srw_state = 0; - srw->srw_xlocker = 0; - /* Must be aligned! */ - ASSERT(xt_thr_maximum_threads == xt_align_size(xt_thr_maximum_threads, XT_XS_LOCK_ALIGN)); - srw->x.srw_rlock = (xtWord1 *) xt_calloc(self, xt_thr_maximum_threads); + sxs->sxs_xlocked = 0; + sxs->sxs_rlock_count = 0; + sxs->sxs_wait_count = 0; +#ifdef DEBUG + sxs->sxs_locker = 0; +#endif #ifdef XT_THREAD_LOCK_INFO - srw->srw_name = name; - xt_thread_lock_info_init(&srw->srw_lock_info, srw); + sxs->sxs_name = name; + xt_thread_lock_info_init(&sxs->sxs_lock_info, sxs); #endif } -xtPublic void xt_spinrwlock_free(struct XTThread *self, XTSpinRWLockPtr srw) +xtPublic void xt_spinxslock_free(struct XTThread *XT_UNUSED(self), XTSpinXSLockPtr sxs) { - if (srw->x.srw_rlock) - xt_free(self, (void *) srw->x.srw_rlock); - xt_spinlock_free(self, &srw->srw_lock); - xt_spinlock_free(self, &srw->srw_state_lock); #ifdef XT_THREAD_LOCK_INFO - xt_thread_lock_info_free(&srw->srw_lock_info); + xt_thread_lock_info_free(&sxs->sxs_lock_info); +#else + (void) sxs; #endif } -xtPublic xtBool xt_spinrwlock_xlock(XTSpinRWLockPtr srw, xtThreadID thd_id) +xtPublic xtBool xt_spinxslock_xlock(XTSpinXSLockPtr sxs, xtThreadID XT_NDEBUG_UNUSED(thd_id)) { - xt_spinlock_lock(&srw->srw_lock); - ASSERT_NS(srw->x.srw_rlock[thd_id] == XT_NO_LOCK); - - xt_spinlock_lock(&srw->srw_state_lock); - - /* Set the state before xlocker (dirty read!) 
*/ - srw->srw_state = 0; - - /* I am the locker: */ - srw->srw_xlocker = thd_id; + register xtWord2 set; - /* Wait for all the read lockers: */ - while (srw->srw_state < xt_thr_current_max_threads) { - while (srw->x.srw_rlock[srw->srw_state]) { - xt_spinlock_unlock(&srw->srw_state_lock); - /* Wait for this reader, during this time, the reader - * himself, may increment the state. */ - xt_critical_wait(); - xt_spinlock_lock(&srw->srw_state_lock); - } - /* State can be incremented in parallel by a reader - * thread! - */ - srw->srw_state++; + /* Wait for exclusive locker: */ + for (;;) { + set = xt_atomic_tas2(&sxs->sxs_xlocked, 1); + if (!set) + break; + xt_yield(); } - /* I have waited for all: */ - srw->srw_state = xt_thr_maximum_threads; +#ifdef DEBUG + sxs->sxs_locker = thd_id; +#endif - xt_spinlock_unlock(&srw->srw_state_lock); + /* Wait for all the reader to wait! */ + while (sxs->sxs_wait_count < sxs->sxs_rlock_count) + xt_yield(); #ifdef XT_THREAD_LOCK_INFO - xt_thread_lock_info_add_owner(&srw->srw_lock_info); + xt_thread_lock_info_add_owner(&sxs->sxs_lock_info); #endif - return OK; } -xtPublic xtBool xt_spinrwlock_slock(XTSpinRWLockPtr srw, xtThreadID thd_id) +xtPublic xtBool xt_spinxslock_slock(XTSpinXSLockPtr sxs) { - ASSERT_NS(srw->x.srw_rlock[thd_id] == XT_NO_LOCK); - srw->x.srw_rlock[thd_id] = XT_WANT_LOCK; + xt_atomic_inc2(&sxs->sxs_rlock_count); + /* Check if there could be an X locker: */ - if (srw->srw_xlocker) { - /* There is an X locker. - * If srw_state < thd_id then the X locker will wait for me. - * So I should not wait! - */ - if (srw->srw_state >= thd_id) { - /* If srw->srw_state >= thd_id, then the locker may have, or - * has already checked me, and I will have to wait. - * - * Otherwise, srw_state <= thd_id, which means the - * X locker has not checked me, and will still wait for me (or - * is already waiting for me). In this case, I will have to - * take the mutex to make sure exactly how far he - * is with the checking. 
- */ - xt_spinlock_lock(&srw->srw_state_lock); - while (srw->srw_state > thd_id && srw->srw_xlocker) { - xt_spinlock_unlock(&srw->srw_state_lock); - xt_critical_wait(); - xt_spinlock_lock(&srw->srw_state_lock); - } - xt_spinlock_unlock(&srw->srw_state_lock); - } + if (sxs->sxs_xlocked) { + /* I am waiting... */ + xt_atomic_inc2(&sxs->sxs_wait_count); + while (sxs->sxs_xlocked) + xt_yield(); + xt_atomic_dec2(&sxs->sxs_wait_count); } - /* There is no exclusive locker, so we have the read lock: */ - srw->x.srw_rlock[thd_id] = XT_HAVE_LOCK; #ifdef XT_THREAD_LOCK_INFO - xt_thread_lock_info_add_owner(&srw->srw_lock_info); + xt_thread_lock_info_add_owner(&sxs->sxs_lock_info); #endif - return OK; } -xtPublic xtBool xt_spinrwlock_unlock(XTSpinRWLockPtr srw, xtThreadID thd_id) +xtPublic xtBool xt_spinxslock_unlock(XTSpinXSLockPtr sxs, xtBool xlocked) { - if (srw->srw_xlocker == thd_id) { - /* I have an X lock. */ - ASSERT_NS(srw->srw_state == xt_thr_maximum_threads); - srw->srw_state = 0; - srw->srw_xlocker = 0; - xt_spinlock_unlock(&srw->srw_lock); - } - else { - /* I have a shared lock: */ - ASSERT_NS(srw->x.srw_rlock[thd_id] == XT_HAVE_LOCK); - ASSERT_NS(srw->srw_state != xt_thr_maximum_threads); - srw->x.srw_rlock[thd_id] = XT_NO_LOCK; - if (srw->srw_xlocker && srw->srw_state == thd_id) { - xt_spinlock_lock(&srw->srw_state_lock); - if (srw->srw_xlocker && srw->srw_state == thd_id) { - /* If the X locker is waiting for me, - * then allow him to continue. 
- */ - srw->srw_state = thd_id+1; - } - xt_spinlock_unlock(&srw->srw_state_lock); - } + if (xlocked) { +#ifdef DEBUG + sxs->sxs_locker = 0; +#endif + sxs->sxs_xlocked = 0; } + else + xt_atomic_dec2(&sxs->sxs_rlock_count); #ifdef XT_THREAD_LOCK_INFO - xt_thread_lock_info_release_owner(&srw->srw_lock_info); + xt_thread_lock_info_release_owner(&sxs->sxs_lock_info); #endif - return OK; } @@ -1550,194 +1512,159 @@ xtPublic xtBool xt_spinrwlock_unlock(XTSpinRWLockPtr srw, xtThreadID thd_id) */ #ifdef XT_THREAD_LOCK_INFO -xtPublic void xt_fastrwlock_init(struct XTThread *self, XTFastRWLockPtr frw, const char *n) +xtPublic void xt_xsmutex_init(struct XTThread *self, XTXSMutexLockPtr xsm, const char *name) #else -xtPublic void xt_fastrwlock_init(struct XTThread *self, XTFastRWLockPtr frw) +xtPublic void xt_xsmutex_init(struct XTThread *self, XTXSMutexLockPtr xsm) #endif { - xt_fastlock_init_with_autoname(self, &frw->frw_lock); - frw->frw_xlocker = NULL; - xt_spinlock_init_with_autoname(self, &frw->frw_state_lock); - frw->frw_state = 0; - frw->frw_read_waiters = 0; - /* Must be aligned! 
*/ - ASSERT(xt_thr_maximum_threads == xt_align_size(xt_thr_maximum_threads, XT_XS_LOCK_ALIGN)); - frw->x.frw_rlock = (xtWord1 *) xt_calloc(self, xt_thr_maximum_threads); + xt_init_mutex_with_autoname(self, &xsm->xsm_lock); + xt_init_cond(self, &xsm->xsm_cond); + xt_init_cond(self, &xsm->xsm_cond_2); + xsm->xsm_xlocker = 0; + xsm->xsm_rlock_count = 0; + xsm->xsm_wait_count = 0; +#ifdef DEBUG + xsm->xsm_locker = 0; +#endif #ifdef XT_THREAD_LOCK_INFO - frw->frw_name = n; - xt_thread_lock_info_init(&frw->frw_lock_info, frw); + xsm->xsm_name = name; + xt_thread_lock_info_init(&xsm->xsm_lock_info, xsm); #endif } -xtPublic void xt_fastrwlock_free(struct XTThread *self, XTFastRWLockPtr frw) +xtPublic void xt_xsmutex_free(struct XTThread *XT_UNUSED(self), XTXSMutexLockPtr xsm) { - if (frw->x.frw_rlock) - xt_free(self, (void *) frw->x.frw_rlock); - xt_fastlock_free(self, &frw->frw_lock); - xt_spinlock_free(self, &frw->frw_state_lock); + xt_free_mutex(&xsm->xsm_lock); + xt_free_cond(&xsm->xsm_cond); + xt_free_cond(&xsm->xsm_cond_2); #ifdef XT_THREAD_LOCK_INFO - xt_thread_lock_info_free(&frw->frw_lock_info); + xt_thread_lock_info_free(&xsm->xsm_lock_info); #endif } -xtPublic xtBool xt_fastrwlock_xlock(XTFastRWLockPtr frw, struct XTThread *thread) +xtPublic xtBool xt_xsmutex_xlock(XTXSMutexLockPtr xsm, xtThreadID thd_id) { - xt_fastlock_lock(&frw->frw_lock, thread); - ASSERT_NS(frw->x.frw_rlock[thread->t_id] == XT_NO_LOCK); - - xt_spinlock_lock(&frw->frw_state_lock); - - /* Set the state before xlocker (dirty read!) */ - frw->frw_state = 0; + xt_lock_mutex_ns(&xsm->xsm_lock); - /* I am the locker: */ - frw->frw_xlocker = thread; - - /* Wait for all the read lockers: */ - while (frw->frw_state < xt_thr_current_max_threads) { - while (frw->x.frw_rlock[frw->frw_state]) { - xt_lock_thread(thread); - xt_spinlock_unlock(&frw->frw_state_lock); - /* Wait for this reader. We rely on the reader to free - * us from this wait! 
*/ - if (!xt_wait_thread(thread)) { - xt_unlock_thread(thread); - frw->frw_state = 0; - frw->frw_xlocker = NULL; - xt_fastlock_unlock(&frw->frw_lock, thread); - return FAILED; - } - xt_unlock_thread(thread); - xt_spinlock_lock(&frw->frw_state_lock); + /* Wait for exclusive locker: */ + while (xsm->xsm_xlocker) { + if (!xt_timed_wait_cond_ns(&xsm->xsm_cond, &xsm->xsm_lock, 10000)) { + xt_unlock_mutex_ns(&xsm->xsm_lock); + return FAILED; } - /* State can be incremented in parallel by a reader - * thread! - */ - frw->frw_state++; } - /* I have waited for all: */ - frw->frw_state = xt_thr_maximum_threads; - - xt_spinlock_unlock(&frw->frw_state_lock); + /* GOTCHA: You would think this is not necessary... + * But is does not always work, if a normal insert is used. + * The reason is, I guess, on MMP the assignment is not + * always immediately visible to other processors, because they + * have old versions of this variable in there cache. + * + * But this is required, because the locking mechanism is based + * on: + * Locker: sets xlocker, tests rlock_count + * Reader: incs rlock_count, tests xlocker + * + * The test, in both cases, may not read stale values. + * volatile does not help, because this just turns compiler + * optimisations off. + */ + xt_atomic_set4(&xsm->xsm_xlocker, thd_id); + + /* Wait for all the reader to wait! 
*/ + while (xsm->xsm_wait_count < xsm->xsm_rlock_count) { + /* {RACE-WR_MUTEX} Here as well: */ + if (!xt_timed_wait_cond_ns(&xsm->xsm_cond, &xsm->xsm_lock, 100)) { + xsm->xsm_xlocker = 0; + xt_unlock_mutex_ns(&xsm->xsm_lock); + return FAILED; + } + } #ifdef XT_THREAD_LOCK_INFO - xt_thread_lock_info_add_owner(&frw->frw_lock_info); + xt_thread_lock_info_add_owner(&xsm->xsm_lock_info); #endif - return OK; } -xtPublic xtBool xt_fastrwlock_slock(XTFastRWLockPtr frw, struct XTThread *thread) +xtPublic xtBool xt_xsmutex_slock(XTXSMutexLockPtr xsm, xtThreadID XT_UNUSED(thd_id)) { - xtThreadID thd_id = thread->t_id; + xt_atomic_inc2(&xsm->xsm_rlock_count); - ASSERT_NS(frw->x.frw_rlock[thd_id] == XT_NO_LOCK); - frw->x.frw_rlock[thd_id] = XT_WANT_LOCK; /* Check if there could be an X locker: */ - if (frw->frw_xlocker) { - /* There is an X locker. - * If frw_state < thd_id then the X locker will wait for me. - * So I should not wait! - */ - if (frw->frw_state >= thd_id) { - /* If frw->frw_state >= thd_id, then the locker may have, or - * has already checked me, and I will have to wait. - * - * Otherwise, frw_state <= thd_id, which means the - * X locker has not checked me, and will still wait for me (or - * is already waiting for me). In this case, I will have to - * take the mutex to make sure exactly how far he - * is with the checking. - */ - xt_spinlock_lock(&frw->frw_state_lock); - frw->frw_read_waiters++; - frw->x.frw_rlock[thd_id] = XT_WAITING; - while (frw->frw_state > thd_id && frw->frw_xlocker) { - xt_lock_thread(thread); - xt_spinlock_unlock(&frw->frw_state_lock); - if (!xt_wait_thread(thread)) { - xt_unlock_thread(thread); - xt_spinlock_lock(&frw->frw_state_lock); - frw->frw_read_waiters--; - frw->x.frw_rlock[thd_id] = XT_NO_LOCK; - xt_spinlock_unlock(&frw->frw_state_lock); - return FAILED; - } - xt_unlock_thread(thread); - xt_spinlock_lock(&frw->frw_state_lock); + if (xsm->xsm_xlocker) { + /* I am waiting... 
*/ + xt_lock_mutex_ns(&xsm->xsm_lock); + xsm->xsm_wait_count++; + /* Wake up the xlocker: */ + if (xsm->xsm_xlocker && xsm->xsm_wait_count == xsm->xsm_rlock_count) { + if (!xt_broadcast_cond_ns(&xsm->xsm_cond)) { + xsm->xsm_wait_count--; + xt_unlock_mutex_ns(&xsm->xsm_lock); + return FAILED; } - frw->x.frw_rlock[thd_id] = XT_HAVE_LOCK; - frw->frw_read_waiters--; - xt_spinlock_unlock(&frw->frw_state_lock); - return OK; } + while (xsm->xsm_xlocker) { + if (!xt_timed_wait_cond_ns(&xsm->xsm_cond_2, &xsm->xsm_lock, 10000)) { + xsm->xsm_wait_count--; + xt_unlock_mutex_ns(&xsm->xsm_lock); + return FAILED; + } + } + xsm->xsm_wait_count--; + xt_unlock_mutex_ns(&xsm->xsm_lock); } - /* There is no exclusive locker, so we have the read lock: */ - frw->x.frw_rlock[thd_id] = XT_HAVE_LOCK; #ifdef XT_THREAD_LOCK_INFO - xt_thread_lock_info_add_owner(&frw->frw_lock_info); + xt_thread_lock_info_add_owner(&xsm->xsm_lock_info); #endif - return OK; } -xtPublic xtBool xt_fastrwlock_unlock(XTFastRWLockPtr frw, struct XTThread *thread) +xtPublic xtBool xt_xsmutex_unlock(XTXSMutexLockPtr xsm, xtThreadID thd_id) { - xtThreadID thd_id = thread->t_id; - - if (frw->frw_xlocker == thread) { - /* I have an X lock. 
*/ - ASSERT_NS(frw->frw_state == xt_thr_maximum_threads); - frw->frw_state = 0; - frw->frw_xlocker = NULL; - - /* Wake up all read waiters: */ - if (frw->frw_read_waiters) { - xt_spinlock_lock(&frw->frw_state_lock); - if (frw->frw_read_waiters) { - XTThreadPtr target; - - for (u_int i=0; i<xt_thr_current_max_threads; i++) { - if (frw->x.frw_rlock[i] == XT_WAITING) { - if ((target = xt_thr_array[i])) { - xt_lock_thread(target); - xt_signal_thread(target); - xt_unlock_thread(target); - } - } - } + if (xsm->xsm_xlocker == thd_id) { + xsm->xsm_xlocker = 0; + if (xsm->xsm_wait_count) { + if (!xt_broadcast_cond_ns(&xsm->xsm_cond_2)) { + xt_unlock_mutex_ns(&xsm->xsm_lock); + return FAILED; } - xt_spinlock_unlock(&frw->frw_state_lock); } - xt_fastlock_unlock(&frw->frw_lock, thread); + else { + /* Wake up any other X or shared lockers: */ + if (!xt_broadcast_cond_ns(&xsm->xsm_cond)) { + xt_unlock_mutex_ns(&xsm->xsm_lock); + return FAILED; + } + } + xt_unlock_mutex_ns(&xsm->xsm_lock); } else { - /* I have a shared lock: */ - ASSERT_NS(frw->x.frw_rlock[thd_id] == XT_HAVE_LOCK); - ASSERT_NS(frw->frw_state != xt_thr_maximum_threads); - frw->x.frw_rlock[thd_id] = XT_NO_LOCK; - if (frw->frw_xlocker && frw->frw_state == thd_id) { - xt_spinlock_lock(&frw->frw_state_lock); - if (frw->frw_xlocker && frw->frw_state == thd_id) { + /* Taking the advice from {RACE-WR_MUTEX} I do the decrement + * after I have a lock! + */ + if (xsm->xsm_xlocker) { + xt_lock_mutex_ns(&xsm->xsm_lock); + xt_atomic_dec2(&xsm->xsm_rlock_count); + if (xsm->xsm_xlocker && xsm->xsm_wait_count == xsm->xsm_rlock_count) { /* If the X locker is waiting for me, * then allow him to continue. 
*/ - frw->frw_state = thd_id+1; - /* Wake him up: */ - xt_lock_thread(frw->frw_xlocker); - xt_signal_thread(frw->frw_xlocker); - xt_unlock_thread(frw->frw_xlocker); + if (!xt_broadcast_cond_ns(&xsm->xsm_cond)) { + xt_unlock_mutex_ns(&xsm->xsm_lock); + return FAILED; + } } - xt_spinlock_unlock(&frw->frw_state_lock); + xt_unlock_mutex_ns(&xsm->xsm_lock); } + else + xt_atomic_dec2(&xsm->xsm_rlock_count); } #ifdef XT_THREAD_LOCK_INFO - xt_thread_lock_info_release_owner(&frw->frw_lock_info); + xt_thread_lock_info_release_owner(&xsm->xsm_lock_info); #endif - return OK; } @@ -1747,9 +1674,9 @@ xtPublic xtBool xt_fastrwlock_unlock(XTFastRWLockPtr frw, struct XTThread *threa */ #ifdef XT_THREAD_LOCK_INFO -xtPublic void xt_atomicrwlock_init(struct XTThread XT_UNUSED(*self), XTAtomicRWLockPtr arw, const char *n) +xtPublic void xt_atomicrwlock_init(struct XTThread *XT_UNUSED(self), XTAtomicRWLockPtr arw, const char *n) #else -xtPublic void xt_atomicrwlock_init(struct XTThread XT_UNUSED(*self), XTAtomicRWLockPtr arw) +xtPublic void xt_atomicrwlock_init(struct XTThread *XT_UNUSED(self), XTAtomicRWLockPtr arw) #endif { arw->arw_reader_count = 0; @@ -1760,14 +1687,18 @@ xtPublic void xt_atomicrwlock_init(struct XTThread XT_UNUSED(*self), XTAtomicRWL #endif } +#ifdef XT_THREAD_LOCK_INFO +xtPublic void xt_atomicrwlock_free(struct XTThread *, XTAtomicRWLockPtr arw) +#else xtPublic void xt_atomicrwlock_free(struct XTThread *, XTAtomicRWLockPtr XT_UNUSED(arw)) +#endif { #ifdef XT_THREAD_LOCK_INFO xt_thread_lock_info_free(&arw->arw_lock_info); #endif } -xtPublic xtBool xt_atomicrwlock_xlock(XTAtomicRWLockPtr arw, xtThreadID XT_UNUSED(thr_id)) +xtPublic xtBool xt_atomicrwlock_xlock(XTAtomicRWLockPtr arw, xtThreadID XT_NDEBUG_UNUSED(thr_id)) { register xtWord2 set; @@ -1819,16 +1750,118 @@ xtPublic xtBool xt_atomicrwlock_slock(XTAtomicRWLockPtr arw) xtPublic xtBool xt_atomicrwlock_unlock(XTAtomicRWLockPtr arw, xtBool xlocked) { - if (xlocked) + if (xlocked) { +#ifdef DEBUG + 
arw->arw_locker = 0; +#endif arw->arw_xlock_set = 0; + } else xt_atomic_dec2(&arw->arw_reader_count); #ifdef XT_THREAD_LOCK_INFO xt_thread_lock_info_release_owner(&arw->arw_lock_info); #endif + + return OK; +} + +/* + * ----------------------------------------------------------------------- + * "SKEW" ATOMITC READ/WRITE LOCK (BASED ON ATOMIC OPERATIONS) + * + * This lock type favors writers. It only works if the proportion of readers + * to writer is high. + */ + +#ifdef XT_THREAD_LOCK_INFO +xtPublic void xt_skewrwlock_init(struct XTThread *XT_UNUSED(self), XTSkewRWLockPtr srw, const char *n) +#else +xtPublic void xt_skewrwlock_init(struct XTThread *XT_UNUSED(self), XTSkewRWLockPtr srw) +#endif +{ + srw->srw_reader_count = 0; + srw->srw_xlock_set = 0; +#ifdef XT_THREAD_LOCK_INFO + srw->srw_name = n; + xt_thread_lock_info_init(&srw->srw_lock_info, srw); +#endif +} + +#ifdef XT_THREAD_LOCK_INFO +xtPublic void xt_skewrwlock_free(struct XTThread *, XTSkewRWLockPtr srw) +#else +xtPublic void xt_skewrwlock_free(struct XTThread *, XTSkewRWLockPtr XT_UNUSED(srw)) +#endif +{ +#ifdef XT_THREAD_LOCK_INFO + xt_thread_lock_info_free(&srw->srw_lock_info); +#endif +} + +xtPublic xtBool xt_skewrwlock_xlock(XTSkewRWLockPtr srw, xtThreadID XT_NDEBUG_UNUSED(thr_id)) +{ + register xtWord2 set; + + /* First get an exclusive lock: */ + for (;;) { + set = xt_atomic_tas2(&srw->srw_xlock_set, 1); + if (!set) + break; + xt_yield(); + } + + /* Wait for the remaining readers: */ + while (srw->srw_reader_count) + xt_yield(); + +#ifdef DEBUG + srw->srw_locker = thr_id; +#endif + +#ifdef XT_THREAD_LOCK_INFO + xt_thread_lock_info_add_owner(&srw->srw_lock_info); +#endif + return OK; +} + +xtPublic xtBool xt_skewrwlock_slock(XTSkewRWLockPtr srw) +{ + /* Wait for an exclusive lock: */ + retry: + for (;;) { + if (!srw->srw_xlock_set) + break; + xt_yield(); + } + + /* Add a reader: */ + xt_atomic_inc2(&srw->srw_reader_count); + + /* Check for xlock again: */ + if (srw->srw_xlock_set) { + 
xt_atomic_dec2(&srw->srw_reader_count); + goto retry; + } + +#ifdef XT_THREAD_LOCK_INFO + xt_thread_lock_info_add_owner(&srw->srw_lock_info); +#endif + return OK; +} + +xtPublic xtBool xt_skewrwlock_unlock(XTSkewRWLockPtr srw, xtBool xlocked) +{ + if (xlocked) + srw->srw_xlock_set = 0; + else + xt_atomic_dec2(&srw->srw_reader_count); + +#ifdef XT_THREAD_LOCK_INFO + xt_thread_lock_info_release_owner(&srw->srw_lock_info); +#endif #ifdef DEBUG - arw->arw_locker = 0; + srw->srw_locker = 0; #endif return OK; @@ -1844,15 +1877,17 @@ xtPublic xtBool xt_atomicrwlock_unlock(XTAtomicRWLockPtr arw, xtBool xlocked) #define JOB_PRINT 3 #define JOB_INCREMENT 4 #define JOB_SNOOZE 5 +#define JOB_DOUBLE_INC 6 #define LOCK_PTHREAD_RW 1 #define LOCK_PTHREAD_MUTEX 2 -#define LOCK_FASTRW 3 +#define LOCK_RWMUTEX 3 #define LOCK_SPINLOCK 4 #define LOCK_FASTLOCK 5 -#define LOCK_SPINRWLOCK 6 -#define LOCK_FASTRWLOCK 7 +#define LOCK_SPINXSLOCK 6 +#define LOCK_XSMUTEX 7 #define LOCK_ATOMICRWLOCK 8 +#define LOCK_SKEWRWLOCK 9 typedef struct XSLockTest { u_int xs_interations; @@ -1864,18 +1899,19 @@ typedef struct XSLockTest { XTSpinLockRec xs_spinlock; xt_mutex_type xs_mutex; XTFastLockRec xs_fastlock; - XTSpinRWLockRec xs_spinrwlock; - XTFastRWLockRec xs_fastrwlock; + XTSpinXSLockRec xs_spinrwlock; + XTXSMutexRec xs_fastrwlock; XTAtomicRWLockRec xs_atomicrwlock; + XTSkewRWLockRec xs_skewrwlock; int xs_progress; xtWord4 xs_inc; } XSLockTestRec, *XSLockTestPtr; -static void lck_free_thread_data(XTThreadPtr self __attribute__((unused)), void *data __attribute__((unused))) +static void lck_free_thread_data(XTThreadPtr XT_UNUSED(self), void *XT_UNUSED(data)) { } -static void lck_do_job(XTThreadPtr self, int job, XSLockTestPtr data) +static void lck_do_job(XTThreadPtr self, int job, XSLockTestPtr data, xtBool reader) { char b1[2048], b2[2048]; @@ -1900,6 +1936,16 @@ static void lck_do_job(XTThreadPtr self, int job, XSLockTestPtr data) xt_sleep_milli_second(10); data->xs_inc++; break; + case 
JOB_DOUBLE_INC: + if (reader) { + if ((data->xs_inc & 1) != 0) + printf("Noooo!\n"); + } + else { + data->xs_inc++; + data->xs_inc++; + } + break; } } @@ -1929,29 +1975,34 @@ static void *lck_run_reader(XTThreadPtr self) printf("- %s %d\n", self->t_name, i+1); if (data->xs_which_lock == LOCK_PTHREAD_RW) { xt_slock_rwlock_ns(&data->xs_plock); - lck_do_job(self, data->xs_which_job, data); + lck_do_job(self, data->xs_which_job, data, TRUE); xt_unlock_rwlock_ns(&data->xs_plock); } - else if (data->xs_which_lock == LOCK_FASTRW) { + else if (data->xs_which_lock == LOCK_RWMUTEX) { xt_rwmutex_slock(&data->xs_lock, self->t_id); - lck_do_job(self, data->xs_which_job, data); + lck_do_job(self, data->xs_which_job, data, TRUE); xt_rwmutex_unlock(&data->xs_lock, self->t_id); } - else if (data->xs_which_lock == LOCK_SPINRWLOCK) { - xt_spinrwlock_slock(&data->xs_spinrwlock, self->t_id); - lck_do_job(self, data->xs_which_job, data); - xt_spinrwlock_unlock(&data->xs_spinrwlock, self->t_id); + else if (data->xs_which_lock == LOCK_SPINXSLOCK) { + xt_spinxslock_slock(&data->xs_spinrwlock); + lck_do_job(self, data->xs_which_job, data, TRUE); + xt_spinxslock_unlock(&data->xs_spinrwlock, FALSE); } - else if (data->xs_which_lock == LOCK_FASTRWLOCK) { - xt_fastrwlock_slock(&data->xs_fastrwlock, self); - lck_do_job(self, data->xs_which_job, data); - xt_fastrwlock_unlock(&data->xs_fastrwlock, self); + else if (data->xs_which_lock == LOCK_XSMUTEX) { + xt_xsmutex_slock(&data->xs_fastrwlock, self->t_id); + lck_do_job(self, data->xs_which_job, data, TRUE); + xt_xsmutex_unlock(&data->xs_fastrwlock, self->t_id); } else if (data->xs_which_lock == LOCK_ATOMICRWLOCK) { xt_atomicrwlock_slock(&data->xs_atomicrwlock); - lck_do_job(self, data->xs_which_job, data); + lck_do_job(self, data->xs_which_job, data, TRUE); xt_atomicrwlock_unlock(&data->xs_atomicrwlock, FALSE); } + else if (data->xs_which_lock == LOCK_SKEWRWLOCK) { + xt_skewrwlock_slock(&data->xs_skewrwlock); + lck_do_job(self, data->xs_which_job, 
data, TRUE); + xt_skewrwlock_unlock(&data->xs_skewrwlock, FALSE); + } else ASSERT(FALSE); } @@ -1971,29 +2022,34 @@ static void *lck_run_writer(XTThreadPtr self) printf("- %s %d\n", self->t_name, i+1); if (data->xs_which_lock == LOCK_PTHREAD_RW) { xt_xlock_rwlock_ns(&data->xs_plock); - lck_do_job(self, data->xs_which_job, data); + lck_do_job(self, data->xs_which_job, data, FALSE); xt_unlock_rwlock_ns(&data->xs_plock); } - else if (data->xs_which_lock == LOCK_FASTRW) { + else if (data->xs_which_lock == LOCK_RWMUTEX) { xt_rwmutex_xlock(&data->xs_lock, self->t_id); - lck_do_job(self, data->xs_which_job, data); + lck_do_job(self, data->xs_which_job, data, FALSE); xt_rwmutex_unlock(&data->xs_lock, self->t_id); } - else if (data->xs_which_lock == LOCK_SPINRWLOCK) { - xt_spinrwlock_xlock(&data->xs_spinrwlock, self->t_id); - lck_do_job(self, data->xs_which_job, data); - xt_spinrwlock_unlock(&data->xs_spinrwlock, self->t_id); + else if (data->xs_which_lock == LOCK_SPINXSLOCK) { + xt_spinxslock_xlock(&data->xs_spinrwlock, self->t_id); + lck_do_job(self, data->xs_which_job, data, FALSE); + xt_spinxslock_unlock(&data->xs_spinrwlock, TRUE); } - else if (data->xs_which_lock == LOCK_FASTRWLOCK) { - xt_fastrwlock_xlock(&data->xs_fastrwlock, self); - lck_do_job(self, data->xs_which_job, data); - xt_fastrwlock_unlock(&data->xs_fastrwlock, self); + else if (data->xs_which_lock == LOCK_XSMUTEX) { + xt_xsmutex_xlock(&data->xs_fastrwlock, self->t_id); + lck_do_job(self, data->xs_which_job, data, FALSE); + xt_xsmutex_unlock(&data->xs_fastrwlock, self->t_id); } else if (data->xs_which_lock == LOCK_ATOMICRWLOCK) { xt_atomicrwlock_xlock(&data->xs_atomicrwlock, self->t_id); - lck_do_job(self, data->xs_which_job, data); + lck_do_job(self, data->xs_which_job, data, FALSE); xt_atomicrwlock_unlock(&data->xs_atomicrwlock, TRUE); } + else if (data->xs_which_lock == LOCK_SKEWRWLOCK) { + xt_skewrwlock_xlock(&data->xs_skewrwlock, self->t_id); + lck_do_job(self, data->xs_which_job, data, FALSE); + 
xt_skewrwlock_unlock(&data->xs_skewrwlock, TRUE); + } else ASSERT(FALSE); } @@ -2011,7 +2067,7 @@ static void lck_print_test(XSLockTestRec *data) case LOCK_PTHREAD_MUTEX: printf("pthread mutex"); break; - case LOCK_FASTRW: + case LOCK_RWMUTEX: printf("fast read/write mutex"); break; case LOCK_SPINLOCK: @@ -2020,15 +2076,18 @@ static void lck_print_test(XSLockTestRec *data) case LOCK_FASTLOCK: printf("fast mutex"); break; - case LOCK_SPINRWLOCK: + case LOCK_SPINXSLOCK: printf("spin read/write lock"); break; - case LOCK_FASTRWLOCK: - printf("fast read/write lock"); + case LOCK_XSMUTEX: + printf("fast x/s mutex"); break; case LOCK_ATOMICRWLOCK: printf("atomic read/write lock"); break; + case LOCK_SKEWRWLOCK: + printf("skew read/write lock"); + break; } switch (data->xs_which_job) { @@ -2063,17 +2122,17 @@ static void *lck_run_mutex_locker(XTThreadPtr self) printf("- %s %d\n", self->t_name, i+1); if (data->xs_which_lock == LOCK_PTHREAD_MUTEX) { xt_lock_mutex_ns(&data->xs_mutex); - lck_do_job(self, data->xs_which_job, data); + lck_do_job(self, data->xs_which_job, data, FALSE); xt_unlock_mutex_ns(&data->xs_mutex); } else if (data->xs_which_lock == LOCK_SPINLOCK) { xt_spinlock_lock(&data->xs_spinlock); - lck_do_job(self, data->xs_which_job, data); + lck_do_job(self, data->xs_which_job, data, FALSE); xt_spinlock_unlock(&data->xs_spinlock); } else if (data->xs_which_lock == LOCK_FASTLOCK) { xt_fastlock_lock(&data->xs_fastlock, self); - lck_do_job(self, data->xs_which_job, data); + lck_do_job(self, data->xs_which_job, data, FALSE); xt_fastlock_unlock(&data->xs_fastlock, self); } else @@ -2164,15 +2223,19 @@ xtPublic void xt_unit_test_read_write_locks(XTThreadPtr self) memset(&data, 0, sizeof(data)); printf("TEST: xt_unit_test_read_write_locks\n"); + printf("size of XTXSMutexRec = %d\n", (int) sizeof(XTXSMutexRec)); + printf("size of pthread_cond_t = %d\n", (int) sizeof(pthread_cond_t)); + printf("size of pthread_mutex_t = %d\n", (int) sizeof(pthread_mutex_t)); 
xt_rwmutex_init_with_autoname(self, &data.xs_lock); xt_init_rwlock_with_autoname(self, &data.xs_plock); - xt_spinrwlock_init_with_autoname(self, &data.xs_spinrwlock); - xt_fastrwlock_init_with_autoname(self, &data.xs_fastrwlock); + xt_spinxslock_init_with_autoname(self, &data.xs_spinrwlock); + xt_xsmutex_init_with_autoname(self, &data.xs_fastrwlock); xt_atomicrwlock_init_with_autoname(self, &data.xs_atomicrwlock); + xt_skewrwlock_init_with_autoname(self, &data.xs_skewrwlock); /** data.xs_interations = 10; - data.xs_which_lock = LOCK_FASTRW; // LOCK_PTHREAD_RW, LOCK_FASTRW, LOCK_SPINRWLOCK, LOCK_FASTRWLOCK + data.xs_which_lock = LOCK_RWMUTEX; // LOCK_PTHREAD_RW, LOCK_RWMUTEX, LOCK_SPINXSLOCK, LOCK_XSMUTEX data.xs_which_job = JOB_PRINT; data.xs_debug_print = TRUE; data.xs_progress = 0; @@ -2184,7 +2247,7 @@ xtPublic void xt_unit_test_read_write_locks(XTThreadPtr self) /** data.xs_interations = 4000; - data.xs_which_lock = LOCK_FASTRW; // LOCK_PTHREAD_RW, LOCK_FASTRW, LOCK_SPINRWLOCK, LOCK_FASTRWLOCK + data.xs_which_lock = LOCK_RWMUTEX; // LOCK_PTHREAD_RW, LOCK_RWMUTEX, LOCK_SPINXSLOCK, LOCK_XSMUTEX data.xs_which_job = JOB_SLEEP; data.xs_debug_print = TRUE; data.xs_progress = 200; @@ -2194,37 +2257,52 @@ xtPublic void xt_unit_test_read_write_locks(XTThreadPtr self) lck_reader_writer_test(self, &data, 4, 2); **/ + // LOCK_PTHREAD_RW, LOCK_RWMUTEX, LOCK_SPINXSLOCK, LOCK_XSMUTEX, LOCK_ATOMICRWLOCK, LOCK_SKEWRWLOCK /**/ - data.xs_interations = 1000000; - data.xs_which_lock = LOCK_FASTRW; // LOCK_PTHREAD_RW, LOCK_FASTRW, LOCK_SPINRWLOCK, LOCK_FASTRWLOCK, LOCK_ATOMICRWLOCK - data.xs_which_job = JOB_INCREMENT; + data.xs_interations = 100000; + data.xs_which_lock = LOCK_XSMUTEX; + data.xs_which_job = JOB_DOUBLE_INC; // JOB_INCREMENT, JOB_DOUBLE_INC data.xs_debug_print = FALSE; data.xs_progress = 0; lck_reader_writer_test(self, &data, 10, 0); + data.xs_which_lock = LOCK_XSMUTEX; + lck_reader_writer_test(self, &data, 10, 0); + //lck_reader_writer_test(self, &data, 0, 5); + 
//lck_reader_writer_test(self, &data, 10, 0); + //lck_reader_writer_test(self, &data, 10, 5); /**/ - /** + /**/ data.xs_interations = 10000; - data.xs_which_lock = LOCK_FASTRW; // LOCK_PTHREAD_RW, LOCK_FASTRW, LOCK_SPINRWLOCK, LOCK_FASTRWLOCK + data.xs_which_lock = LOCK_XSMUTEX; data.xs_which_job = JOB_MEMCPY; data.xs_debug_print = FALSE; data.xs_progress = 0; - lck_reader_writer_test(self, &data, 10, 5); - **/ + lck_reader_writer_test(self, &data, 10, 0); + data.xs_which_lock = LOCK_XSMUTEX; + lck_reader_writer_test(self, &data, 10, 0); + //lck_reader_writer_test(self, &data, 0, 5); + //lck_reader_writer_test(self, &data, 10, 0); + //lck_reader_writer_test(self, &data, 10, 5); + /**/ - /** + /**/ data.xs_interations = 1000; - data.xs_which_lock = LOCK_FASTRW; // LOCK_PTHREAD_RW, LOCK_FASTRW, LOCK_SPINRWLOCK, LOCK_FASTRWLOCK - data.xs_which_job = JOB_SLEEP; + data.xs_which_lock = LOCK_XSMUTEX; + data.xs_which_job = JOB_SLEEP; // JOB_SLEEP, JOB_SNOOZE data.xs_debug_print = FALSE; data.xs_progress = 0; - lck_reader_writer_test(self, &data, 10, 5); - **/ + lck_reader_writer_test(self, &data, 10, 0); + data.xs_which_lock = LOCK_XSMUTEX; + lck_reader_writer_test(self, &data, 10, 0); + /**/ xt_rwmutex_free(self, &data.xs_lock); xt_free_rwlock(&data.xs_plock); - xt_spinrwlock_free(self, &data.xs_spinrwlock); - xt_fastrwlock_free(self, &data.xs_fastrwlock); + xt_spinxslock_free(self, &data.xs_spinrwlock); + xt_xsmutex_free(self, &data.xs_fastrwlock); + xt_atomicrwlock_free(self, &data.xs_atomicrwlock); + xt_skewrwlock_free(self, &data.xs_skewrwlock); } xtPublic void xt_unit_test_mutex_locks(XTThreadPtr self) diff --git a/storage/pbxt/src/lock_xt.h b/storage/pbxt/src/lock_xt.h index 214b00a849e..05ba9af244e 100644 --- a/storage/pbxt/src/lock_xt.h +++ b/storage/pbxt/src/lock_xt.h @@ -36,96 +36,16 @@ struct XTOpenTable; struct XTXactData; struct XTTable; -/* Possibilities are 2 = align 4 or 2 = align 8 */ -#define XT_XS_LOCK_SHIFT 2 -#define XT_XS_LOCK_ALIGN (1 << 
XT_XS_LOCK_SHIFT) - -/* This lock is fast for reads but slow for writes. - * Use this lock in situations where you have 99% reads, - * and then some potentially long writes. - */ -typedef struct XTRWMutex { -#ifdef DEBUG - struct XTThread *xs_lock_thread; - u_int xs_inited; -#endif -#ifdef XT_THREAD_LOCK_INFO - XTThreadLockInfoRec xs_lock_info; - const char *xs_name; -#endif - xt_mutex_type xs_lock; - xt_cond_type xs_cond; - volatile xtWord4 xs_state; - volatile xtThreadID xs_xlocker; - union { -#if XT_XS_LOCK_ALIGN == 4 - volatile xtWord4 *xs_rlock_align; -#else - volatile xtWord8 *xs_rlock_align; -#endif - volatile xtWord1 *xs_rlock; - } x; -} XTRWMutexRec, *XTRWMutexPtr; - -#ifdef XT_THREAD_LOCK_INFO -#define xt_rwmutex_init_with_autoname(a,b) xt_rwmutex_init(a,b,LOCKLIST_ARG_SUFFIX(b)) -void xt_rwmutex_init(struct XTThread *self, XTRWMutexPtr xsl, const char *name); -#else -#define xt_rwmutex_init_with_autoname(a,b) xt_rwmutex_init(a,b) -void xt_rwmutex_init(struct XTThread *self, XTRWMutexPtr xsl); -#endif -void xt_rwmutex_free(struct XTThread *self, XTRWMutexPtr xsl); -xtBool xt_rwmutex_xlock(XTRWMutexPtr xsl, xtThreadID thd_id); -xtBool xt_rwmutex_slock(XTRWMutexPtr xsl, xtThreadID thd_id); -xtBool xt_rwmutex_unlock(XTRWMutexPtr xsl, xtThreadID thd_id); - -#ifdef XT_WIN -#define XT_SPL_WIN32_ASM -#else -#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__)) -#define XT_SPL_GNUC_X86 -#else -#define XT_SPL_DEFAULT -#endif -#endif - -#ifdef XT_SOLARIS -/* Use Sun atomic operations library - * http://docs.sun.com/app/docs/doc/816-5168/atomic-ops-3c?a=view - */ -#define XT_SPL_SOLARIS_LIB -#endif - -#ifdef XT_SPL_SOLARIS_LIB +#ifdef XT_ATOMIC_SOLARIS_LIB #include <atomic.h> #endif -typedef struct XTSpinLock { - volatile xtWord4 spl_lock; -#ifdef XT_SPL_DEFAULT - xt_mutex_type spl_mutex; -#endif -#ifdef DEBUG - struct XTThread *spl_locker; -#endif -#ifdef XT_THREAD_LOCK_INFO - XTThreadLockInfoRec spl_lock_info; - const char *spl_name; -#endif -} 
XTSpinLockRec, *XTSpinLockPtr; +void xt_log_atomic_error_and_abort(c_char *func, c_char *file, u_int line); -#ifdef XT_THREAD_LOCK_INFO -#define xt_spinlock_init_with_autoname(a,b) xt_spinlock_init(a,b,LOCKLIST_ARG_SUFFIX(b)) -void xt_spinlock_init(struct XTThread *self, XTSpinLockPtr sp, const char *name); -#else -#define xt_spinlock_init_with_autoname(a,b) xt_spinlock_init(a,b) -void xt_spinlock_init(struct XTThread *self, XTSpinLockPtr sp); -#endif -void xt_spinlock_free(struct XTThread *self, XTSpinLockPtr sp); -xtBool xt_spinlock_spin(XTSpinLockPtr spl); -#ifdef DEBUG -void xt_spinlock_set_thread(XTSpinLockPtr spl); -#endif +/* + * ----------------------------------------------------------------------- + * ATOMIC OPERATIONS + */ /* * This macro is to remind me where it was safe @@ -137,37 +57,38 @@ void xt_spinlock_set_thread(XTSpinLockPtr spl); * is written atomically. * But the operations themselves are not atomic! */ -inline void xt_flushed_inc1(volatile xtWord1 *mptr) +inline void xt_atomic_inc1(volatile xtWord1 *mptr) { -#ifdef XT_SPL_WIN32_ASM +#ifdef XT_ATOMIC_WIN32_X86 __asm MOV ECX, mptr __asm MOV DL, BYTE PTR [ECX] __asm INC DL __asm XCHG DL, BYTE PTR [ECX] -#elif defined(XT_SPL_GNUC_X86) +#elif defined(XT_ATOMIC_GNUC_X86) xtWord1 val; asm volatile ("movb %1,%0" : "=r" (val) : "m" (*mptr) : "memory"); val++; asm volatile ("xchgb %1,%0" : "=r" (val) : "m" (*mptr), "0" (val) : "memory"); -#elif defined(XT_SPL_SOLARIS_LIB) +#elif defined(XT_ATOMIC_SOLARIS_LIB) atomic_inc_8(mptr); #else *mptr++; + xt_log_atomic_error_and_abort(__FUNC__, __FILE__, __LINE__); #endif } -inline xtWord1 xt_flushed_dec1(volatile xtWord1 *mptr) +inline xtWord1 xt_atomic_dec1(volatile xtWord1 *mptr) { xtWord1 val; -#ifdef XT_SPL_WIN32_ASM +#ifdef XT_ATOMIC_WIN32_X86 __asm MOV ECX, mptr __asm MOV DL, BYTE PTR [ECX] __asm DEC DL __asm MOV val, DL __asm XCHG DL, BYTE PTR [ECX] -#elif defined(XT_SPL_GNUC_X86) +#elif defined(XT_ATOMIC_GNUC_X86) xtWord1 val2; asm volatile ("movb %1, 
%0" : "=r" (val) : "m" (*mptr) : "memory"); @@ -176,55 +97,58 @@ inline xtWord1 xt_flushed_dec1(volatile xtWord1 *mptr) /* Should work, but compiler makes a mistake? * asm volatile ("xchgb %1, %0" : : "r" (val), "m" (*mptr) : "memory"); */ -#elif defined(XT_SPL_SOLARIS_LIB) +#elif defined(XT_ATOMIC_SOLARIS_LIB) val = atomic_dec_8_nv(mptr); #else val = --(*mptr); + xt_log_atomic_error_and_abort(__FUNC__, __FILE__, __LINE__); #endif return val; } inline void xt_atomic_inc2(volatile xtWord2 *mptr) { -#ifdef XT_SPL_WIN32_ASM +#ifdef XT_ATOMIC_WIN32_X86 __asm LOCK INC WORD PTR mptr -#elif defined(XT_SPL_GNUC_X86) - asm volatile ("lock; incw %0" : : "m" (*mptr) : "memory"); -#elif defined(__GNUC__) +#elif defined(XT_ATOMIC_GNUC_X86) + asm volatile ("lock; incw %0" : : "m" (*mptr) : "memory"); +#elif defined(XT_ATOMIC_GCC_OPS) __sync_fetch_and_add(mptr, 1); -#elif defined(XT_SPL_SOLARIS_LIB) +#elif defined(XT_ATOMIC_SOLARIS_LIB) atomic_inc_16_nv(mptr); #else (*mptr)++; + xt_log_atomic_error_and_abort(__FUNC__, __FILE__, __LINE__); #endif } inline void xt_atomic_dec2(volatile xtWord2 *mptr) { -#ifdef XT_SPL_WIN32_ASM +#ifdef XT_ATOMIC_WIN32_X86 __asm LOCK DEC WORD PTR mptr -#elif defined(XT_SPL_GNUC_X86) +#elif defined(XT_ATOMIC_GNUC_X86) asm volatile ("lock; decw %0" : : "m" (*mptr) : "memory"); -#elif defined(__GNUC__) +#elif defined(XT_ATOMIC_GCC_OPS) __sync_fetch_and_sub(mptr, 1); -#elif defined(XT_SPL_SOLARIS_LIB) +#elif defined(XT_ATOMIC_SOLARIS_LIB) atomic_dec_16_nv(mptr); #else --(*mptr); + xt_log_atomic_error_and_abort(__FUNC__, __FILE__, __LINE__); #endif } /* Atomic test and set 2 byte word! 
*/ inline xtWord2 xt_atomic_tas2(volatile xtWord2 *mptr, xtWord2 val) { -#ifdef XT_SPL_WIN32_ASM +#ifdef XT_ATOMIC_WIN32_X86 __asm MOV ECX, mptr __asm MOV DX, val __asm XCHG DX, WORD PTR [ECX] __asm MOV val, DX -#elif defined(XT_SPL_GNUC_X86) +#elif defined(XT_ATOMIC_GNUC_X86) asm volatile ("xchgw %1,%0" : "=r" (val) : "m" (*mptr), "0" (val) : "memory"); -#elif defined(XT_SPL_SOLARIS_LIB) +#elif defined(XT_ATOMIC_SOLARIS_LIB) val = atomic_swap_16(mptr, val); #else /* Yikes! */ @@ -232,43 +156,80 @@ inline xtWord2 xt_atomic_tas2(volatile xtWord2 *mptr, xtWord2 val) val = *mptr; *mptr = nval; + xt_log_atomic_error_and_abort(__FUNC__, __FILE__, __LINE__); #endif return val; } inline void xt_atomic_set4(volatile xtWord4 *mptr, xtWord4 val) { -#ifdef XT_SPL_WIN32_ASM +#ifdef XT_ATOMIC_WIN32_X86 __asm MOV ECX, mptr __asm MOV EDX, val __asm XCHG EDX, DWORD PTR [ECX] //__asm MOV DWORD PTR [ECX], EDX -#elif defined(XT_SPL_GNUC_X86) +#elif defined(XT_ATOMIC_GNUC_X86) asm volatile ("xchgl %1,%0" : "=r" (val) : "m" (*mptr), "0" (val) : "memory"); //asm volatile ("movl %0,%1" : "=r" (val) : "m" (*mptr) : "memory"); -#elif defined(XT_SPL_SOLARIS_LIB) +#elif defined(XT_ATOMIC_SOLARIS_LIB) atomic_swap_32(mptr, val); #else *mptr = val; + xt_log_atomic_error_and_abort(__FUNC__, __FILE__, __LINE__); #endif } -inline xtWord4 xt_atomic_get4(volatile xtWord4 *mptr) -{ - xtWord4 val; - -#ifdef XT_SPL_WIN32_ASM - __asm MOV ECX, mptr - __asm MOV EDX, DWORD PTR [ECX] - __asm MOV val, EDX -#elif defined(XT_SPL_GNUC_X86) - asm volatile ("movl %1,%0" : "=r" (val) : "m" (*mptr) : "memory"); +inline xtWord4 xt_atomic_tas4(volatile xtWord4 *mptr, xtWord4 val) +{ +#ifdef XT_ATOMIC_WIN32_X86 + __asm MOV ECX, mptr + __asm MOV EDX, val + __asm XCHG EDX, DWORD PTR [ECX] + __asm MOV val, EDX +#elif defined(XT_ATOMIC_GNUC_X86) + val = val; + asm volatile ("xchgl %1,%0" : "=r" (val) : "m" (*mptr), "0" (val) : "memory"); +#elif defined(XT_ATOMIC_SOLARIS_LIB) + val = atomic_swap_32(mptr, val); #else - val 
= *mptr; + *mptr = val; + xt_log_atomic_error_and_abort(__FUNC__, __FILE__, __LINE__); #endif return val; } +/* + * ----------------------------------------------------------------------- + * DIFFERENT TYPES OF LOCKS + */ + +typedef struct XTSpinLock { + volatile xtWord4 spl_lock; +#ifdef XT_NO_ATOMICS + xt_mutex_type spl_mutex; +#endif +#ifdef DEBUG + struct XTThread *spl_locker; +#endif +#ifdef XT_THREAD_LOCK_INFO + XTThreadLockInfoRec spl_lock_info; + const char *spl_name; +#endif +} XTSpinLockRec, *XTSpinLockPtr; + +#ifdef XT_THREAD_LOCK_INFO +#define xt_spinlock_init_with_autoname(a,b) xt_spinlock_init(a,b,LOCKLIST_ARG_SUFFIX(b)) +void xt_spinlock_init(struct XTThread *self, XTSpinLockPtr sp, const char *name); +#else +#define xt_spinlock_init_with_autoname(a,b) xt_spinlock_init(a,b) +void xt_spinlock_init(struct XTThread *self, XTSpinLockPtr sp); +#endif +void xt_spinlock_free(struct XTThread *self, XTSpinLockPtr sp); +xtBool xt_spinlock_spin(XTSpinLockPtr spl); +#ifdef DEBUG +void xt_spinlock_set_thread(XTSpinLockPtr spl); +#endif + /* Code for test and set is derived from code by Larry Zhou and * Google: http://code.google.com/p/google-perftools */ @@ -278,15 +239,15 @@ inline xtWord4 xt_spinlock_set(XTSpinLockPtr spl) volatile xtWord4 *lck; lck = &spl->spl_lock; -#ifdef XT_SPL_WIN32_ASM +#ifdef XT_ATOMIC_WIN32_X86 __asm MOV ECX, lck __asm MOV EDX, 1 __asm XCHG EDX, DWORD PTR [ECX] __asm MOV prv, EDX -#elif defined(XT_SPL_GNUC_X86) +#elif defined(XT_ATOMIC_GNUC_X86) prv = 1; asm volatile ("xchgl %1,%0" : "=r" (prv) : "m" (*lck), "0" (prv) : "memory"); -#elif defined(XT_SPL_SOLARIS_LIB) +#elif defined(XT_ATOMIC_SOLARIS_LIB) prv = atomic_swap_32(lck, 1); #else /* The default implementation just uses a mutex, and @@ -312,15 +273,15 @@ inline xtWord4 xt_spinlock_reset(XTSpinLockPtr spl) spl->spl_locker = NULL; #endif lck = &spl->spl_lock; -#ifdef XT_SPL_WIN32_ASM +#ifdef XT_ATOMIC_WIN32_X86 __asm MOV ECX, lck __asm MOV EDX, 0 __asm XCHG EDX, DWORD PTR [ECX] 
__asm MOV prv, EDX -#elif defined(XT_SPL_GNUC_X86) +#elif defined(XT_ATOMIC_GNUC_X86) prv = 0; asm volatile ("xchgl %1,%0" : "=r" (prv) : "m" (*lck), "0" (prv) : "memory"); -#elif defined(XT_SPL_SOLARIS_LIB) +#elif defined(XT_ATOMIC_SOLARIS_LIB) prv = atomic_swap_32(lck, 0); #else *lck = 0; @@ -359,9 +320,48 @@ inline void xt_spinlock_unlock(XTSpinLockPtr spl) #endif } -void xt_unit_test_read_write_locks(struct XTThread *self); -void xt_unit_test_mutex_locks(struct XTThread *self); -void xt_unit_test_create_threads(struct XTThread *self); +/* Possibilities are 2 = align 4 or 2 = align 8 */ +#define XT_XS_LOCK_SHIFT 2 +#define XT_XS_LOCK_ALIGN (1 << XT_XS_LOCK_SHIFT) + +/* This lock is fast for reads but slow for writes. + * Use this lock in situations where you have 99% reads, + * and then some potentially long writes. + */ +typedef struct XTRWMutex { +#ifdef DEBUG + struct XTThread *xs_lock_thread; + u_int xs_inited; +#endif +#ifdef XT_THREAD_LOCK_INFO + XTThreadLockInfoRec xs_lock_info; + const char *xs_name; +#endif + xt_mutex_type xs_lock; + xt_cond_type xs_cond; + volatile xtWord4 xs_state; + volatile xtThreadID xs_xlocker; + union { +#if XT_XS_LOCK_ALIGN == 4 + volatile xtWord4 *xs_rlock_align; +#else + volatile xtWord8 *xs_rlock_align; +#endif + volatile xtWord1 *xs_rlock; + } x; +} XTRWMutexRec, *XTRWMutexPtr; + +#ifdef XT_THREAD_LOCK_INFO +#define xt_rwmutex_init_with_autoname(a,b) xt_rwmutex_init(a,b,LOCKLIST_ARG_SUFFIX(b)) +void xt_rwmutex_init(struct XTThread *self, XTRWMutexPtr xsl, const char *name); +#else +#define xt_rwmutex_init_with_autoname(a,b) xt_rwmutex_init(a,b) +void xt_rwmutex_init(struct XTThread *self, XTRWMutexPtr xsl); +#endif +void xt_rwmutex_free(struct XTThread *self, XTRWMutexPtr xsl); +xtBool xt_rwmutex_xlock(XTRWMutexPtr xsl, xtThreadID thd_id); +xtBool xt_rwmutex_slock(XTRWMutexPtr xsl, xtThreadID thd_id); +xtBool xt_rwmutex_unlock(XTRWMutexPtr xsl, xtThreadID thd_id); #define XT_FAST_LOCK_MAX_WAIT 100 @@ -410,7 +410,7 @@ inline 
xtBool xt_fastlock_lock(XTFastLockPtr fal, struct XTThread *thread) #endif } -inline void xt_fastlock_unlock(XTFastLockPtr fal, struct XTThread *thread __attribute__((unused))) +inline void xt_fastlock_unlock(XTFastLockPtr fal, struct XTThread *XT_UNUSED(thread)) { if (fal->fal_wait_count) xt_fastlock_wakeup(fal); @@ -423,73 +423,61 @@ inline void xt_fastlock_unlock(XTFastLockPtr fal, struct XTThread *thread __attr #endif } -typedef struct XTSpinRWLock { - XTSpinLockRec srw_lock; - volatile xtThreadID srw_xlocker; - XTSpinLockRec srw_state_lock; - volatile u_int srw_state; - union { -#if XT_XS_LOCK_ALIGN == 4 - volatile xtWord4 *srw_rlock_align; -#else - volatile xtWord8 *srw_rlock_align; -#endif - volatile xtWord1 *srw_rlock; - } x; +#define XT_SXS_SLOCK_COUNT 2 +typedef struct XTSpinXSLock { + volatile xtWord2 sxs_xlocked; + volatile xtWord2 sxs_rlock_count; + volatile xtWord2 sxs_wait_count; /* The number of readers waiting for the xlocker. */ +#ifdef DEBUG + xtThreadID sxs_locker; +#endif #ifdef XT_THREAD_LOCK_INFO - XTThreadLockInfoRec srw_lock_info; - const char *srw_name; + XTThreadLockInfoRec sxs_lock_info; + const char *sxs_name; #endif - -} XTSpinRWLockRec, *XTSpinRWLockPtr; +} XTSpinXSLockRec, *XTSpinXSLockPtr; #ifdef XT_THREAD_LOCK_INFO -#define xt_spinrwlock_init_with_autoname(a,b) xt_spinrwlock_init(a,b,LOCKLIST_ARG_SUFFIX(b)) -void xt_spinrwlock_init(struct XTThread *self, XTSpinRWLockPtr xsl, const char *name); -#else -#define xt_spinrwlock_init_with_autoname(a,b) xt_spinrwlock_init(a,b) -void xt_spinrwlock_init(struct XTThread *self, XTSpinRWLockPtr xsl); -#endif -void xt_spinrwlock_free(struct XTThread *self, XTSpinRWLockPtr xsl); -xtBool xt_spinrwlock_xlock(XTSpinRWLockPtr xsl, xtThreadID thd_id); -xtBool xt_spinrwlock_slock(XTSpinRWLockPtr xsl, xtThreadID thd_id); -xtBool xt_spinrwlock_unlock(XTSpinRWLockPtr xsl, xtThreadID thd_id); - -typedef struct XTFastRWLock { - XTFastLockRec frw_lock; - struct XTThread *frw_xlocker; - XTSpinLockRec 
frw_state_lock; - volatile u_int frw_state; - u_int frw_read_waiters; - union { -#if XT_XS_LOCK_ALIGN == 4 - volatile xtWord4 *frw_rlock_align; +#define xt_spinxslock_init_with_autoname(a,b) xt_spinxslock_init(a,b,LOCKLIST_ARG_SUFFIX(b)) +void xt_spinxslock_init(struct XTThread *self, XTSpinXSLockPtr sxs, const char *name); #else - volatile xtWord8 *frw_rlock_align; +#define xt_spinxslock_init_with_autoname(a,b) xt_spinxslock_init(a,b) +void xt_spinxslock_init(struct XTThread *self, XTSpinXSLockPtr sxs); +#endif +void xt_spinxslock_free(struct XTThread *self, XTSpinXSLockPtr sxs); +xtBool xt_spinxslock_xlock(XTSpinXSLockPtr sxs, xtThreadID thd_id); +xtBool xt_spinxslock_slock(XTSpinXSLockPtr sxs); +xtBool xt_spinxslock_unlock(XTSpinXSLockPtr sxs, xtBool xlocked); + +typedef struct XTXSMutexLock { + xt_mutex_type xsm_lock; + xt_cond_type xsm_cond; + xt_cond_type xsm_cond_2; + volatile xtThreadID xsm_xlocker; + volatile xtWord2 xsm_rlock_count; + volatile xtWord2 xsm_wait_count; /* The number of readers waiting for the xlocker. 
*/ +#ifdef DEBUG + xtThreadID xsm_locker; #endif - volatile xtWord1 *frw_rlock; - } x; - #ifdef XT_THREAD_LOCK_INFO - XTThreadLockInfoRec frw_lock_info; - const char *frw_name; + XTThreadLockInfoRec xsm_lock_info; + const char *xsm_name; #endif - -} XTFastRWLockRec, *XTFastRWLockPtr; +} XTXSMutexRec, *XTXSMutexLockPtr; #ifdef XT_THREAD_LOCK_INFO -#define xt_fastrwlock_init_with_autoname(a,b) xt_fastrwlock_init(a,b,LOCKLIST_ARG_SUFFIX(b)) -void xt_fastrwlock_init(struct XTThread *self, XTFastRWLockPtr frw, const char *name); +#define xt_xsmutex_init_with_autoname(a,b) xt_xsmutex_init(a,b,LOCKLIST_ARG_SUFFIX(b)) +void xt_xsmutex_init(struct XTThread *self, XTXSMutexLockPtr xsm, const char *name); #else -#define xt_fastrwlock_init_with_autoname(a,b) xt_fastrwlock_init(a,b) -void xt_fastrwlock_init(struct XTThread *self, XTFastRWLockPtr frw); +#define xt_xsmutex_init_with_autoname(a,b) xt_xsmutex_init(a,b) +void xt_xsmutex_init(struct XTThread *self, XTXSMutexLockPtr xsm); #endif -void xt_fastrwlock_free(struct XTThread *self, XTFastRWLockPtr frw); -xtBool xt_fastrwlock_xlock(XTFastRWLockPtr frw, struct XTThread *thread); -xtBool xt_fastrwlock_slock(XTFastRWLockPtr frw, struct XTThread *thread); -xtBool xt_fastrwlock_unlock(XTFastRWLockPtr frw, struct XTThread *thread); +void xt_xsmutex_free(struct XTThread *self, XTXSMutexLockPtr xsm); +xtBool xt_xsmutex_xlock(XTXSMutexLockPtr xsm, xtThreadID thd_id); +xtBool xt_xsmutex_slock(XTXSMutexLockPtr xsm, xtThreadID thd_id); +xtBool xt_xsmutex_unlock(XTXSMutexLockPtr xsm, xtThreadID thd_id); typedef struct XTAtomicRWLock { volatile xtWord2 arw_reader_count; @@ -516,6 +504,35 @@ xtBool xt_atomicrwlock_xlock(XTAtomicRWLockPtr xsl, xtThreadID thr_id); xtBool xt_atomicrwlock_slock(XTAtomicRWLockPtr xsl); xtBool xt_atomicrwlock_unlock(XTAtomicRWLockPtr xsl, xtBool xlocked); +typedef struct XTSkewRWLock { + volatile xtWord2 srw_reader_count; + volatile xtWord2 srw_xlock_set; + +#ifdef XT_THREAD_LOCK_INFO + XTThreadLockInfoRec 
srw_lock_info; + const char *srw_name; +#endif +#ifdef DEBUG + xtThreadID srw_locker; +#endif +} XTSkewRWLockRec, *XTSkewRWLockPtr; + +#ifdef XT_THREAD_LOCK_INFO +#define xt_skewrwlock_init_with_autoname(a,b) xt_skewrwlock_init(a,b,LOCKLIST_ARG_SUFFIX(b)) +void xt_skewrwlock_init(struct XTThread *self, XTSkewRWLockPtr xsl, const char *name); +#else +#define xt_skewrwlock_init_with_autoname(a,b) xt_skewrwlock_init(a,b) +void xt_skewrwlock_init(struct XTThread *self, XTSkewRWLockPtr xsl); +#endif +void xt_skewrwlock_free(struct XTThread *self, XTSkewRWLockPtr xsl); +xtBool xt_skewrwlock_xlock(XTSkewRWLockPtr xsl, xtThreadID thr_id); +xtBool xt_skewrwlock_slock(XTSkewRWLockPtr xsl); +xtBool xt_skewrwlock_unlock(XTSkewRWLockPtr xsl, xtBool xlocked); + +void xt_unit_test_read_write_locks(struct XTThread *self); +void xt_unit_test_mutex_locks(struct XTThread *self); +void xt_unit_test_create_threads(struct XTThread *self); + /* * ----------------------------------------------------------------------- * ROW LOCKS diff --git a/storage/pbxt/src/locklist_xt.cc b/storage/pbxt/src/locklist_xt.cc index 0a3df584ba6..9a4aeb8f501 100644 --- a/storage/pbxt/src/locklist_xt.cc +++ b/storage/pbxt/src/locklist_xt.cc @@ -59,13 +59,13 @@ void xt_thread_lock_info_init(XTThreadLockInfoPtr ptr, xt_rwlock_struct *lock) ptr->li_lock_type = XTThreadLockInfo::RW_LOCK; } -void xt_thread_lock_info_init(XTThreadLockInfoPtr ptr, XTFastRWLock *lock) +void xt_thread_lock_info_init(XTThreadLockInfoPtr ptr, XTXSMutexLock *lock) { ptr->li_fast_rwlock = lock; ptr->li_lock_type = XTThreadLockInfo::FAST_RW_LOCK; } -void xt_thread_lock_info_init(XTThreadLockInfoPtr ptr, XTSpinRWLock *lock) +void xt_thread_lock_info_init(XTThreadLockInfoPtr ptr, XTSpinXSLock *lock) { ptr->li_spin_rwlock = lock; ptr->li_lock_type = XTThreadLockInfo::SPIN_RW_LOCK; @@ -77,6 +77,12 @@ void xt_thread_lock_info_init(XTThreadLockInfoPtr ptr, XTAtomicRWLock *lock) ptr->li_lock_type = XTThreadLockInfo::ATOMIC_RW_LOCK; } +void 
xt_thread_lock_info_init(XTThreadLockInfoPtr ptr, XTSkewRWLock *lock) +{ + ptr->li_skew_rwlock = lock; + ptr->li_lock_type = XTThreadLockInfo::SKEW_RW_LOCK; +} + void xt_thread_lock_info_free(XTThreadLockInfoPtr ptr) { /* TODO: check to see if it's present in a thread's list */ @@ -163,12 +169,12 @@ void xt_trace_thread_locks(XTThread *self) lock_name = li->li_fast_lock->fal_name; break; case XTThreadLockInfo::FAST_RW_LOCK: - lock_type = "XTFastRWLock"; - lock_name = li->li_fast_rwlock->frw_name; + lock_type = "XTXSMutexLock"; + lock_name = li->li_fast_rwlock->xsm_name; break; case XTThreadLockInfo::SPIN_RW_LOCK: lock_type = "XTSpinRWLock"; - lock_name = li->li_spin_rwlock->srw_name; + lock_name = li->li_spin_rwlock->sxs_name; break; case XTThreadLockInfo::ATOMIC_RW_LOCK: lock_type = "XTAtomicRWLock"; diff --git a/storage/pbxt/src/locklist_xt.h b/storage/pbxt/src/locklist_xt.h index f0f16009ca1..4170e220ae1 100644 --- a/storage/pbxt/src/locklist_xt.h +++ b/storage/pbxt/src/locklist_xt.h @@ -25,7 +25,7 @@ #define __xt_locklist_h__ #ifdef DEBUG -//#define XT_THREAD_LOCK_INFO +#define XT_THREAD_LOCK_INFO #ifndef XT_WIN /* We need DEBUG_LOCKING in order to enable pthread function wrappers */ #define DEBUG_LOCKING @@ -40,9 +40,10 @@ struct XTRWMutex; struct xt_mutex_struct; struct xt_rwlock_struct; struct XTFastLock; -struct XTFastRWLock; -struct XTSpinRWLock; +struct XTXSMutexLock; +struct XTSpinXSLock; struct XTAtomicRWLock; +struct XTSkewRWLock; #ifdef XT_THREAD_LOCK_INFO @@ -61,7 +62,7 @@ struct XTAtomicRWLock; */ typedef struct XTThreadLockInfo { - enum LockType { SPIN_LOCK, RW_MUTEX, MUTEX, RW_LOCK, FAST_LOCK, FAST_RW_LOCK, SPIN_RW_LOCK, ATOMIC_RW_LOCK }; + enum LockType { SPIN_LOCK, RW_MUTEX, MUTEX, RW_LOCK, FAST_LOCK, FAST_RW_LOCK, SPIN_RW_LOCK, ATOMIC_RW_LOCK, SKEW_RW_LOCK }; LockType li_lock_type; @@ -69,11 +70,12 @@ typedef struct XTThreadLockInfo { XTSpinLock *li_spin_lock; // SPIN_LOCK XTRWMutex *li_rw_mutex; // RW_MUTEX XTFastLock *li_fast_lock; // 
FAST_LOCK - XTFastRWLock *li_fast_rwlock; // FAST_RW_LOCK - XTSpinRWLock *li_spin_rwlock; // SPIN_RW_LOCK + XTXSMutexLock *li_fast_rwlock; // FAST_RW_LOCK + XTSpinXSLock *li_spin_rwlock; // SPIN_RW_LOCK XTAtomicRWLock *li_atomic_rwlock; // ATOMIC_RW_LOCK xt_mutex_struct *li_mutex; // MUTEX xt_rwlock_struct *li_rwlock; // RW_LOCK + XTSkewRWLock *li_skew_rwlock; // SKEW_RW_LOCK }; } XTThreadLockInfoRec, *XTThreadLockInfoPtr; @@ -81,11 +83,12 @@ XTThreadLockInfoRec, *XTThreadLockInfoPtr; void xt_thread_lock_info_init(XTThreadLockInfoPtr ptr, XTSpinLock *lock); void xt_thread_lock_info_init(XTThreadLockInfoPtr ptr, XTRWMutex *lock); void xt_thread_lock_info_init(XTThreadLockInfoPtr ptr, XTFastLock *lock); -void xt_thread_lock_info_init(XTThreadLockInfoPtr ptr, XTFastRWLock *lock); -void xt_thread_lock_info_init(XTThreadLockInfoPtr ptr, XTSpinRWLock *lock); +void xt_thread_lock_info_init(XTThreadLockInfoPtr ptr, XTXSMutexLock *lock); +void xt_thread_lock_info_init(XTThreadLockInfoPtr ptr, XTSpinXSLock *lock); void xt_thread_lock_info_init(XTThreadLockInfoPtr ptr, XTAtomicRWLock *lock); void xt_thread_lock_info_init(XTThreadLockInfoPtr ptr, xt_mutex_struct *lock); void xt_thread_lock_info_init(XTThreadLockInfoPtr ptr, xt_rwlock_struct *lock); +void xt_thread_lock_info_init(XTThreadLockInfoPtr ptr, XTSkewRWLock *lock); void xt_thread_lock_info_free(XTThreadLockInfoPtr ptr); void xt_thread_lock_info_add_owner (XTThreadLockInfoPtr ptr); diff --git a/storage/pbxt/src/memory_xt.cc b/storage/pbxt/src/memory_xt.cc index 6da8673cfcc..a920a47e29b 100644 --- a/storage/pbxt/src/memory_xt.cc +++ b/storage/pbxt/src/memory_xt.cc @@ -34,7 +34,7 @@ #include "trace_xt.h" #ifdef DEBUG -#define RECORD_MM +//#define RECORD_MM #endif #ifdef DEBUG @@ -117,7 +117,7 @@ xtPublic xtBool xt_realloc(XTThreadPtr self, void **ptr, size_t size) return OK; } -xtPublic void xt_free(XTThreadPtr self __attribute__((unused)), void *ptr) +xtPublic void xt_free(XTThreadPtr XT_UNUSED(self), void *ptr) { 
free(ptr); } @@ -186,7 +186,7 @@ xtPublic void xt_free_ns(void *ptr) free(ptr); } -#ifdef DEBUG +#ifdef DEBUG_MEMORY /* * ----------------------------------------------------------------------- @@ -678,7 +678,7 @@ void xt_mm_memset(void *block, void *dest, int value, size_t size) memset(dest, value, size); } -void *xt_mm_malloc(XTThreadPtr self, size_t size, u_int line __attribute__((unused)), c_char *file __attribute__((unused))) +void *xt_mm_malloc(XTThreadPtr self, size_t size, u_int line, c_char *file) { unsigned char *p; @@ -695,6 +695,8 @@ void *xt_mm_malloc(XTThreadPtr self, size_t size, u_int line __attribute__((unus *(p + size + MEM_DEBUG_HDR_SIZE) = MEM_TRAILER_BYTE; *(p + size + MEM_DEBUG_HDR_SIZE + 1L) = MEM_TRAILER_BYTE; + (void) line; + (void) file; #ifdef RECORD_MM xt_lock_mutex(self, &mm_mutex); mm_add_core_ptr(self, p + MEM_DEBUG_HDR_SIZE, 0, line, file); @@ -704,7 +706,7 @@ void *xt_mm_malloc(XTThreadPtr self, size_t size, u_int line __attribute__((unus return p + MEM_DEBUG_HDR_SIZE; } -void *xt_mm_calloc(XTThreadPtr self, size_t size, u_int line __attribute__((unused)), c_char *file __attribute__((unused))) +void *xt_mm_calloc(XTThreadPtr self, size_t size, u_int line, c_char *file) { unsigned char *p; @@ -719,6 +721,8 @@ void *xt_mm_calloc(XTThreadPtr self, size_t size, u_int line __attribute__((unus *(p + size + MEM_DEBUG_HDR_SIZE) = MEM_TRAILER_BYTE; *(p + size + MEM_DEBUG_HDR_SIZE + 1L) = MEM_TRAILER_BYTE; + (void) line; + (void) file; #ifdef RECORD_MM xt_lock_mutex(self, &mm_mutex); mm_add_core_ptr(self, p + MEM_DEBUG_HDR_SIZE, 0, line, file); @@ -849,7 +853,7 @@ void xt_mm_check_ptr(XTThreadPtr self, void *ptr) xtPublic xtBool xt_init_memory(void) { -#ifdef DEBUG +#ifdef DEBUG_MEMORY XTThreadPtr self = NULL; if (!xt_init_mutex_with_autoname(NULL, &mm_mutex)) @@ -875,7 +879,7 @@ xtPublic void debug_ik_sum(void); xtPublic void xt_exit_memory(void) { -#ifdef DEBUG +#ifdef DEBUG_MEMORY long mm; int i; @@ -919,7 +923,7 @@ xtPublic void 
xt_exit_memory(void) * MEMORY ALLOCATION UTILITIES */ -#ifdef DEBUG +#ifdef DEBUG_MEMORY char *xt_mm_dup_string(XTThreadPtr self, c_char *str, u_int line, c_char *file) #else char *xt_dup_string(XTThreadPtr self, c_char *str) @@ -931,7 +935,7 @@ char *xt_dup_string(XTThreadPtr self, c_char *str) if (!str) return NULL; len = strlen(str); -#ifdef DEBUG +#ifdef DEBUG_MEMORY new_str = (char *) xt_mm_malloc(self, len + 1, line, file); #else new_str = (char *) xt_malloc(self, len + 1); @@ -1020,7 +1024,7 @@ xtPublic xtBool xt_realloc(XTThreadPtr self, void **ptr, size_t size) return *ptr != NULL; } -xtPublic void xt_free(XTThreadPtr self __attribute__((unused)), void *ptr) +xtPublic void xt_free(XTThreadPtr XT_UNUSED(self), void *ptr) { char *old_ptr; xtWord4 size; diff --git a/storage/pbxt/src/memory_xt.h b/storage/pbxt/src/memory_xt.h index 3b4150df185..1785cd0bd51 100644 --- a/storage/pbxt/src/memory_xt.h +++ b/storage/pbxt/src/memory_xt.h @@ -30,6 +30,10 @@ struct XTThread; #ifdef DEBUG +#define DEBUG_MEMORY +#endif + +#ifdef DEBUG_MEMORY #define XT_MM_STACK_TRACE 200 #define XT_MM_TRACE_DEPTH 4 @@ -109,7 +113,7 @@ void xt_free_ns(void *ptr); #endif -#ifdef DEBUG +#ifdef DEBUG_MEMORY #define xt_dup_string(t, s) xt_mm_dup_string(t, s, __LINE__, __FILE__) char *xt_mm_dup_string(struct XTThread *self, const char *path, u_int line, const char *file); diff --git a/storage/pbxt/src/myxt_xt.cc b/storage/pbxt/src/myxt_xt.cc index ba9a72c87a3..fdcc078c957 100644 --- a/storage/pbxt/src/myxt_xt.cc +++ b/storage/pbxt/src/myxt_xt.cc @@ -52,12 +52,12 @@ extern pthread_key_t THR_Session; #include "myxt_xt.h" #include "strutil_xt.h" #include "database_xt.h" -#ifdef XT_STREAMING -#include "streaming_xt.h" -#endif #include "cache_xt.h" #include "datalog_xt.h" +static void myxt_bitmap_init(XTThreadPtr self, MX_BITMAP *map, u_int n_bits); +static void myxt_bitmap_free(XTThreadPtr self, MX_BITMAP *map); + #ifdef DRIZZLED #define swap_variables(TYPE, a, b) \ do { \ @@ -143,7 +143,7 @@ 
static void my_store_blob_length(byte *pos,uint pack_length,uint length) static int my_compare_text(MX_CONST_CHARSET_INFO *charset_info, uchar *a, uint a_length, uchar *b, uint b_length, my_bool part_key, - my_bool skip_end_space __attribute__((unused))) + my_bool XT_UNUSED(skip_end_space)) { if (!part_key) /* The last parameter is diff_if_only_endspace_difference, which means @@ -632,7 +632,6 @@ static char *mx_get_length_and_data(Field *field, char *dest, xtWord4 *len) case DRIZZLE_TYPE_DATE: case DRIZZLE_TYPE_NEWDECIMAL: case DRIZZLE_TYPE_ENUM: - case DRIZZLE_TYPE_VIRTUAL: #endif break; } @@ -751,7 +750,6 @@ static void mx_set_length_and_data(Field *field, char *dest, xtWord4 len, char * case DRIZZLE_TYPE_DATE: case DRIZZLE_TYPE_NEWDECIMAL: case DRIZZLE_TYPE_ENUM: - case DRIZZLE_TYPE_VIRTUAL: #endif break; } @@ -764,7 +762,7 @@ static void mx_set_length_and_data(Field *field, char *dest, xtWord4 len, char * bzero(from, field->pack_length()); } -xtPublic void myxt_set_null_row_from_key(XTOpenTablePtr ot __attribute__((unused)), XTIndexPtr ind, xtWord1 *record) +xtPublic void myxt_set_null_row_from_key(XTOpenTablePtr XT_UNUSED(ot), XTIndexPtr ind, xtWord1 *record) { register XTIndexSegRec *keyseg = ind->mi_seg; @@ -800,7 +798,7 @@ xtPublic void myxt_set_default_row_from_key(XTOpenTablePtr ot, XTIndexPtr ind, x } /* Derived from _mi_put_key_in_record */ -xtPublic xtBool myxt_create_row_from_key(XTOpenTablePtr ot __attribute__((unused)), XTIndexPtr ind, xtWord1 *b_value, u_int key_len, xtWord1 *dest_buff) +xtPublic xtBool myxt_create_row_from_key(XTOpenTablePtr XT_UNUSED(ot), XTIndexPtr ind, xtWord1 *b_value, u_int key_len, xtWord1 *dest_buff) { byte *record = (byte *) dest_buff; register byte *key; @@ -935,8 +933,8 @@ xtPublic xtBool myxt_create_row_from_key(XTOpenTablePtr ot __attribute__((unused #ifdef CHECK_KEYS err: -#endif return FAILED; /* Crashed row */ +#endif } /* @@ -1715,7 +1713,7 @@ xtPublic void myxt_get_column_as_string(XTOpenTablePtr ot, char 
*buffer, u_int c /* Required by store() - or an assertion will fail: */ if (table->read_set) - bitmap_set_bit(table->read_set, col_idx); + MX_BIT_SET(table->read_set, col_idx); save = field->ptr; xt_lock_mutex(self, &tab->tab_dic_field_lock); @@ -1743,7 +1741,7 @@ xtPublic xtBool myxt_set_column(XTOpenTablePtr ot, char *buffer, u_int col_idx, /* Required by store() - or an assertion will fail: */ if (table->write_set) - bitmap_set_bit(table->write_set, col_idx); + MX_BIT_SET(table->write_set, col_idx); mx_set_notnull_in_record(field, buffer); @@ -1875,7 +1873,12 @@ xtPublic void myxt_print_key(XTIndexPtr ind, xtWord1 *key_value) static void my_close_table(TABLE *table) { -#ifndef DRIZZLED +#ifdef DRIZZLED + TABLE_SHARE *share; + + share = (TABLE_SHARE *) ((char *) table + sizeof(TABLE)); + share->free_table_share(); +#else closefrm(table, 1); // TODO: Q, why did Stewart remove this? #endif xt_free_ns(table); @@ -1885,7 +1888,7 @@ static void my_close_table(TABLE *table) * This function returns NULL if the table cannot be opened * because this is not a MySQL thread. 
*/ -static TABLE *my_open_table(XTThreadPtr self, XTDatabaseHPtr db __attribute__((unused)), XTPathStrPtr tab_path) +static TABLE *my_open_table(XTThreadPtr self, XTDatabaseHPtr XT_UNUSED(db), XTPathStrPtr tab_path) { THD *thd = current_thd; char path_buffer[PATH_MAX]; @@ -1946,6 +1949,18 @@ static TABLE *my_open_table(XTThreadPtr self, XTDatabaseHPtr db __attribute__((u new_lex.current_select= NULL; lex_start(thd); +#ifdef DRIZZLED + share->init(db_name, 0, name, path); + if ((error = open_table_def(thd, share)) || + (error = open_table_from_share(thd, share, "", 0, (uint32_t) READ_ALL, 0, table, OTM_OPEN))) + { + xt_free(self, table); + lex_end(&new_lex); + thd->lex = old_lex; + xt_throw_sulxterr(XT_CONTEXT, XT_ERR_LOADING_MYSQL_DIC, tab_path->ps_path, (u_long) error); + return NULL; + } +#else #if MYSQL_VERSION_ID < 60000 #if MYSQL_VERSION_ID < 50123 init_tmp_table_share(share, db_name, 0, name, path); @@ -1960,11 +1975,23 @@ static TABLE *my_open_table(XTThreadPtr self, XTDatabaseHPtr db __attribute__((u #endif #endif + /* If MySQL shutsdown while we are just starting up, they + * they kill the plugin sub-system before calling + * shutdown for the engine! 
+ */ + if (!ha_resolve_by_legacy_type(thd, DB_TYPE_PBXT)) { + xt_free(self, table); + lex_end(&new_lex); + thd->lex = old_lex; + xt_throw_xterr(XT_CONTEXT, XT_ERR_MYSQL_SHUTDOWN); + return NULL; + } + if ((error = open_table_def(thd, share, 0))) { xt_free(self, table); lex_end(&new_lex); thd->lex = old_lex; - xt_throw_ulxterr(XT_CONTEXT, XT_ERR_LOADING_MYSQL_DIC, (u_long) error); + xt_throw_sulxterr(XT_CONTEXT, XT_ERR_LOADING_MYSQL_DIC, tab_path->ps_path, (u_long) error); return NULL; } @@ -1977,9 +2004,10 @@ static TABLE *my_open_table(XTThreadPtr self, XTDatabaseHPtr db __attribute__((u xt_free(self, table); lex_end(&new_lex); thd->lex = old_lex; - xt_throw_ulxterr(XT_CONTEXT, XT_ERR_LOADING_MYSQL_DIC, (u_long) error); + xt_throw_sulxterr(XT_CONTEXT, XT_ERR_LOADING_MYSQL_DIC, tab_path->ps_path, (u_long) error); return NULL; } +#endif lex_end(&new_lex); thd->lex = old_lex; @@ -1989,8 +2017,10 @@ static TABLE *my_open_table(XTThreadPtr self, XTDatabaseHPtr db __attribute__((u * plugin_shutdown() and reap_plugins() in sql_plugin.cc * from doing their job on shutdown! */ +#ifndef DRIZZLED plugin_unlock(NULL, table->s->db_plugin); table->s->db_plugin = NULL; +#endif return table; } @@ -2069,6 +2099,11 @@ static xtBool my_is_not_null_int4(XTIndexSegPtr seg) return (seg->type == HA_KEYTYPE_LONG_INT && !(seg->flag & HA_NULL_PART)); } +/* MY_BITMAP definition in Drizzle does not like if + * I use a NULL pointer to calculate the offset!? 
+ */ +#define MX_OFFSETOF(x, y) ((size_t)(&((x *) 8)->y) - 8) + /* Derived from ha_myisam::create and mi_create */ static XTIndexPtr my_create_index(XTThreadPtr self, TABLE *table_arg, u_int idx, KEY *index) { @@ -2084,7 +2119,7 @@ static XTIndexPtr my_create_index(XTThreadPtr self, TABLE *table_arg, u_int idx, enter_(); - pushsr_(ind, my_deref_index_data, (XTIndexPtr) xt_calloc(self, offsetof(XTIndexRec, mi_seg) + sizeof(XTIndexSegRec) * index->key_parts)); + pushsr_(ind, my_deref_index_data, (XTIndexPtr) xt_calloc(self, MX_OFFSETOF(XTIndexRec, mi_seg) + sizeof(XTIndexSegRec) * index->key_parts)); XT_INDEX_INIT_LOCK(self, ind); xt_init_mutex_with_autoname(self, &ind->mi_flush_lock); @@ -2235,7 +2270,7 @@ static XTIndexPtr my_create_index(XTThreadPtr self, TABLE *table_arg, u_int idx, /* NOTE: do not set if the field is only partially in the index!!! */ if (!partial_field) - bitmap_fast_test_and_set(&ind->mi_col_map, field->field_index); + MX_BIT_FAST_TEST_AND_SET(&ind->mi_col_map, field->field_index); } if (key_length > XT_INDEX_MAX_KEY_SIZE) @@ -2243,6 +2278,7 @@ static XTIndexPtr my_create_index(XTThreadPtr self, TABLE *table_arg, u_int idx, /* This is the maximum size of the index on disk: */ ind->mi_key_size = key_length; + ind->mi_max_items = (XT_INDEX_PAGE_SIZE-2) / (key_length+XT_RECORD_REF_SIZE); if (ind->mi_fix_key) { /* Special case for not-NULL 4 byte int value: */ @@ -2281,6 +2317,7 @@ static XTIndexPtr my_create_index(XTThreadPtr self, TABLE *table_arg, u_int idx, ind->mi_prev_item = xt_prev_branch_item_var; ind->mi_last_item = xt_last_branch_item_var; } + ind->mi_lazy_delete = ind->mi_fix_key && ind->mi_max_items >= 4; XT_NODE_ID(ind->mi_root) = 0; @@ -2344,6 +2381,10 @@ xtPublic void myxt_setup_dictionary(XTThreadPtr self, XTDictionaryPtr dic) KEY_PART_INFO *key_part; KEY_PART_INFO *key_part_end; +#ifndef XT_USE_LAZY_DELETE + dic->dic_no_lazy_delete = TRUE; +#endif + dic->dic_ind_cols_req = 0; for (uint i=0; i<TS(my_tab)->keys; i++) { index = 
&my_tab->key_info[i]; @@ -2602,7 +2643,7 @@ xtPublic void myxt_setup_dictionary(XTThreadPtr self, XTDictionaryPtr dic) dic->dic_mysql_rec_size = TS(my_tab)->reclength; } -static u_int my_get_best_superset(XTThreadPtr self __attribute__((unused)), XTDictionaryPtr dic, XTIndexPtr ind) +static u_int my_get_best_superset(XTThreadPtr XT_UNUSED(self), XTDictionaryPtr dic, XTIndexPtr ind) { XTIndexPtr super_ind; u_int super = 0; @@ -2762,7 +2803,7 @@ static void ha_create_dd_index(XTThreadPtr self, XTDDIndex *ind, KEY *key) } } -static char *my_type_to_string(XTThreadPtr self, Field *field, TABLE *my_tab __attribute__((unused))) +static char *my_type_to_string(XTThreadPtr self, Field *field, TABLE *XT_UNUSED(my_tab)) { char buffer[MAX_FIELD_WIDTH + 400], *ptr; String type((char *) buffer, sizeof(buffer), system_charset_info); @@ -2834,7 +2875,7 @@ xtPublic XTDDTable *myxt_create_table_from_table(XTThreadPtr self, TABLE *my_tab * MySQL CHARACTER UTILITIES */ -xtPublic void myxt_static_convert_identifier(XTThreadPtr self __attribute__((unused)), MX_CHARSET_INFO *cs, char *from, char *to, size_t to_len) +xtPublic void myxt_static_convert_identifier(XTThreadPtr XT_UNUSED(self), MX_CHARSET_INFO *cs, char *from, char *to, size_t to_len) { uint errors; @@ -2877,11 +2918,16 @@ xtPublic char *myxt_convert_table_name(XTThreadPtr self, char *from) return to; } -xtPublic void myxt_static_convert_table_name(XTThreadPtr self __attribute__((unused)), char *from, char *to, size_t to_len) +xtPublic void myxt_static_convert_table_name(XTThreadPtr XT_UNUSED(self), char *from, char *to, size_t to_len) { tablename_to_filename(from, to, to_len); } +xtPublic void myxt_static_convert_file_name(char *from, char *to, size_t to_len) +{ + filename_to_tablename(from, to, to_len); +} + xtPublic int myxt_strcasecmp(char * a, char *b) { return my_strcasecmp(&my_charset_utf8_general_ci, a, b); @@ -2913,90 +2959,11 @@ xtPublic MX_CHARSET_INFO *myxt_getcharset(bool convert) return 
&my_charset_utf8_general_ci; } -#ifdef XT_STREAMING -xtPublic xtBool myxt_use_blobs(XTOpenTablePtr ot, void **ret_pbms_table, xtWord1 *rec_buf) -{ - void *pbms_table; - XTTable *tab = ot->ot_table; - u_int idx = 0; - Field *field; - char *blob_ref; - xtWord4 len; - char in_url[PBMS_BLOB_URL_SIZE]; - char *out_url; - - if (!xt_pbms_open_table(&pbms_table, tab->tab_name->ps_path)) - return FAILED; - - for (idx=0; idx<tab->tab_dic.dic_blob_count; idx++) { - field = tab->tab_dic.dic_blob_cols[idx]; - if ((blob_ref = mx_get_length_and_data(field, (char *) rec_buf, &len)) && len) { - xt_strncpy(PBMS_BLOB_URL_SIZE, in_url, blob_ref, len); - - if (!xt_pbms_use_blob(pbms_table, &out_url, in_url, field->field_index)) { - xt_pbms_close_table(pbms_table); - return FAILED; - } - - if (out_url) { - len = strlen(out_url); - mx_set_length_and_data(field, (char *) rec_buf, len, out_url); - } - } - } - *ret_pbms_table = pbms_table; - return OK; -} - -xtPublic void myxt_unuse_blobs(XTOpenTablePtr ot __attribute__((unused)), void *pbms_table) -{ - xt_pbms_close_table(pbms_table); -} - -xtPublic xtBool myxt_retain_blobs(XTOpenTablePtr ot __attribute__((unused)), void *pbms_table, xtRecordID rec_id) -{ - xtBool ok; - PBMSEngineRefRec eng_ref; - - memset(&eng_ref, 0, sizeof(PBMSEngineRefRec)); - XT_SET_DISK_8(eng_ref.er_data, rec_id); - ok = xt_pbms_retain_blobs(pbms_table, &eng_ref); - xt_pbms_close_table(pbms_table); - return ok; -} - -xtPublic void myxt_release_blobs(XTOpenTablePtr ot, xtWord1 *rec_buf, xtRecordID rec_id) -{ - void *pbms_table; - XTTable *tab = ot->ot_table; - u_int idx = 0; - Field *field; - char *blob_ref; - xtWord4 len; - char in_url[PBMS_BLOB_URL_SIZE]; - PBMSEngineRefRec eng_ref; - - memset(&eng_ref, 0, sizeof(PBMSEngineRefRec)); - XT_SET_DISK_8(eng_ref.er_data, rec_id); - - if (!xt_pbms_open_table(&pbms_table, tab->tab_name->ps_path)) - return; - - for (idx=0; idx<tab->tab_dic.dic_blob_count; idx++) { - field = tab->tab_dic.dic_blob_cols[idx]; - if ((blob_ref = 
mx_get_length_and_data(field, (char *) rec_buf, &len)) && len) { - xt_strncpy(PBMS_BLOB_URL_SIZE, in_url, blob_ref, len); - - xt_pbms_release_blob(pbms_table, in_url, field->field_index, &eng_ref); - } - } - - xt_pbms_close_table(pbms_table); -} -#endif // XT_STREAMING - xtPublic void *myxt_create_thread() { +#ifdef DRIZZLED + return (void *) 1; +#else THD *new_thd; if (my_thread_init()) { @@ -3004,6 +2971,46 @@ xtPublic void *myxt_create_thread() return NULL; } + /* + * Unfortunately, if PBXT is the default engine, and we are shutting down + * then global_system_variables.table_plugin may be NULL. Which will cause + * a crash if we try to create a thread! + * + * The following call in plugin_shutdown() sets the global reference + * to NULL: + * + * unlock_variables(NULL, &global_system_variables); + * + * Later plugin_deinitialize() is called. + * + * The following stack is an example crash which occurs when I call + * myxt_create_thread() in ha_exit(), to force the error. + * + * if (pi->state & (PLUGIN_IS_READY | PLUGIN_IS_UNINITIALIZED)) + * pi is NULL! 
+ * #0 0x002ff684 in intern_plugin_lock at sql_plugin.cc:617 + * #1 0x0030296d in plugin_thdvar_init at sql_plugin.cc:2432 + * #2 0x000db4a4 in THD::init at sql_class.cc:756 + * #3 0x000e02ed in THD::THD at sql_class.cc:638 + * #4 0x00e2678d in myxt_create_thread at myxt_xt.cc:2990 + * #5 0x00e05d43 in ha_exit at ha_pbxt.cc:1011 + * #6 0x00e065c2 in pbxt_end at ha_pbxt.cc:1330 + * #7 0x00e065df in pbxt_panic at ha_pbxt.cc:1343 + * #8 0x0023e57d in ha_finalize_handlerton at handler.cc:392 + * #9 0x002ffc8b in plugin_deinitialize at sql_plugin.cc:816 + * #10 0x003037d9 in plugin_shutdown at sql_plugin.cc:1572 + * #11 0x000f7b2b in clean_up at mysqld.cc:1266 + * #12 0x000f7fca in unireg_end at mysqld.cc:1192 + * #13 0x000fa021 in kill_server at mysqld.cc:1134 + * #14 0x000fa6df in kill_server_thread at mysqld.cc:1155 + * #15 0x91fdb155 in _pthread_start + * #16 0x91fdb012 in thread_start + */ + if (!global_system_variables.table_plugin) { + xt_register_xterr(XT_REG_CONTEXT, XT_ERR_MYSQL_NO_THREAD); + return NULL; + } + if (!(new_thd = new THD())) { my_thread_end(); xt_register_error(XT_REG_CONTEXT, XT_ERR_MYSQL_ERROR, 0, "Unable to create MySQL thread (THD)"); @@ -3015,8 +3022,18 @@ xtPublic void *myxt_create_thread() lex_start(new_thd); return (void *) new_thd; +#endif } +#ifdef DRIZZLED +xtPublic void myxt_destroy_thread(void *, xtBool) +{ +} + +xtPublic void myxt_delete_remaining_thread() +{ +} +#else xtPublic void myxt_destroy_thread(void *thread, xtBool end_threads) { THD *thd = (THD *) thread; @@ -3044,6 +3061,15 @@ xtPublic void myxt_destroy_thread(void *thread, xtBool end_threads) my_thread_end(); } +xtPublic void myxt_delete_remaining_thread() +{ + THD *thd; + + if ((thd = current_thd)) + myxt_destroy_thread((void *) thd, TRUE); +} +#endif + xtPublic XTThreadPtr myxt_get_self() { THD *thd; @@ -3182,7 +3208,7 @@ xtPublic void myxt_get_status(XTThreadPtr self, XTStringBufferPtr strbuf) * MySQL Bit Maps */ -xtPublic void myxt_bitmap_init(XTThreadPtr self, 
MY_BITMAP *map, u_int n_bits) +static void myxt_bitmap_init(XTThreadPtr self, MX_BITMAP *map, u_int n_bits) { my_bitmap_map *buf; uint size_in_bytes = (((n_bits) + 31) / 32) * 4; @@ -3194,7 +3220,7 @@ xtPublic void myxt_bitmap_init(XTThreadPtr self, MY_BITMAP *map, u_int n_bits) bitmap_clear_all(map); } -xtPublic void myxt_bitmap_free(XTThreadPtr self, MY_BITMAP *map) +static void myxt_bitmap_free(XTThreadPtr self, MX_BITMAP *map) { if (map->bitmap) { xt_free(self, map->bitmap); diff --git a/storage/pbxt/src/myxt_xt.h b/storage/pbxt/src/myxt_xt.h index 4d33431088e..484440a15b6 100644 --- a/storage/pbxt/src/myxt_xt.h +++ b/storage/pbxt/src/myxt_xt.h @@ -70,6 +70,7 @@ XTDDTable *myxt_create_table_from_table(XTThreadPtr self, STRUCT_TABLE *my_tab); void myxt_static_convert_identifier(XTThreadPtr self, struct charset_info_st *cs, char *from, char *to, size_t to_len); char *myxt_convert_identifier(XTThreadPtr self, struct charset_info_st *cs, char *from); void myxt_static_convert_table_name(XTThreadPtr self, char *from, char *to, size_t to_len); +void myxt_static_convert_file_name(char *from, char *to, size_t to_len); char *myxt_convert_table_name(XTThreadPtr self, char *from); int myxt_strcasecmp(char * a, char *b); int myxt_isspace(struct charset_info_st *cs, char a); @@ -78,23 +79,14 @@ int myxt_isdigit(struct charset_info_st *cs, char a); struct charset_info_st *myxt_getcharset(bool convert); -#ifdef XT_STREAMING -xtBool myxt_use_blobs(XTOpenTablePtr ot, void **ret_pbms_table, xtWord1 *rec_buf); -void myxt_unuse_blobs(XTOpenTablePtr ot, void *pbms_table); -xtBool myxt_retain_blobs(XTOpenTablePtr ot, void *pbms_table, xtRecordID record); -void myxt_release_blobs(XTOpenTablePtr ot, xtWord1 *rec_buf, xtRecordID record); -#endif - void *myxt_create_thread(); void myxt_destroy_thread(void *thread, xtBool end_threads); +void myxt_delete_remaining_thread(); XTThreadPtr myxt_get_self(); int myxt_statistics_fill_table(XTThreadPtr self, void *th, void *ta, void *co, MX_CONST 
void *ch); void myxt_get_status(XTThreadPtr self, XTStringBufferPtr strbuf); -void myxt_bitmap_init(XTThreadPtr self, MY_BITMAP *map, u_int n_bits); -void myxt_bitmap_free(XTThreadPtr self, MY_BITMAP *map); - class XTDDColumnFactory { public: diff --git a/storage/pbxt/src/pbms.h b/storage/pbxt/src/pbms.h index bf29c63651e..28ed7e5b4df 100644 --- a/storage/pbxt/src/pbms.h +++ b/storage/pbxt/src/pbms.h @@ -16,7 +16,8 @@ * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA * - * Paul McCullagh + * Original author: Paul McCullagh + * Continued development: Barry Leslie * H&G2JCtL * * 2007-06-01 @@ -37,21 +38,26 @@ #include <dirent.h> #include <signal.h> #include <ctype.h> +#include <errno.h> + #ifdef USE_PRAGMA_INTERFACE #pragma interface /* gcc class implementation */ #endif +/* 2 10 1 10 20 10 10 20 20 + * Format: "~*"<db_id><'~' || '_'><tab_id>"-"<blob_id>"-"<auth_code>"-"<server_id>"-"<blob_ref_id>"-"<blob_size> + */ +//If URL_FMT changes do not forget to update couldBeURL() in this file. + +#define URL_FMT "~*%lu%c%lu-%llu-%lx-%lu-%llu-%llu" + #define MS_SHARED_MEMORY_MAGIC 0x7E9A120C #define MS_ENGINE_VERSION 1 -#define MS_CALLBACK_VERSION 1 -#define MS_SHARED_MEMORY_VERSION 1 -#define MS_ENGINE_LIST_SIZE 80 +#define MS_CALLBACK_VERSION 4 +#define MS_SHARED_MEMORY_VERSION 2 +#define MS_ENGINE_LIST_SIZE 10 #define MS_TEMP_FILE_PREFIX "pbms_temp_" -#define MS_TEMP_FILE_PREFIX "pbms_temp_" - -#define MS_RESULT_MESSAGE_SIZE 300 -#define MS_RESULT_STACK_SIZE 200 #define MS_BLOB_HANDLE_SIZE 300 @@ -68,146 +74,81 @@ #define MS_ERR_UNKNOWN_DB 8 #define MS_ERR_REMOVING_REPO 9 #define MS_ERR_DATABASE_DELETED 10 +#define MS_ERR_DUPLICATE 11 /* Attempt to insert a duplicate key into a system table. 
*/ +#define MS_ERR_INVALID_RECORD 12 +#define MS_ERR_RECOVERY_IN_PROGRESS 13 +#define MS_ERR_DUPLICATE_DB 14 +#define MS_ERR_DUPLICATE_DB_ID 15 +#define MS_ERR_INVALID_OPERATION 16 #define MS_LOCK_NONE 0 #define MS_LOCK_READONLY 1 #define MS_LOCK_READ_WRITE 2 -#define MS_XACT_NONE 0 -#define MS_XACT_BEGIN 1 -#define MS_XACT_COMMIT 2 -#define MS_XACT_ROLLBACK 3 - -#define PBMS_ENGINE_REF_LEN 8 -#define PBMS_BLOB_URL_SIZE 200 +#define PBMS_BLOB_URL_SIZE 120 #define PBMS_FIELD_COL_SIZE 128 #define PBMS_FIELD_COND_SIZE 300 +#define MS_RESULT_MESSAGE_SIZE 300 +#define MS_RESULT_STACK_SIZE 200 + +typedef struct PBMSResultRec { + int mr_code; /* Engine specific error code. */ + char mr_message[MS_RESULT_MESSAGE_SIZE]; /* Error message, required if non-zero return code. */ + char mr_stack[MS_RESULT_STACK_SIZE]; /* Trace information about where the error occurred. */ +} PBMSResultRec, *PBMSResultPtr; + + typedef struct PBMSBlobID { + u_int32_t bi_db_id; u_int64_t bi_blob_size; u_int64_t bi_blob_id; // or repo file offset if type = REPO + u_int64_t bi_blob_ref_id; u_int32_t bi_tab_id; // or repo ID if type = REPO u_int32_t bi_auth_code; u_int32_t bi_blob_type; } PBMSBlobIDRec, *PBMSBlobIDPtr; -typedef struct PBMSResultRec { - int mr_code; /* Engine specific error code. */ - char mr_message[MS_RESULT_MESSAGE_SIZE]; /* Error message, required if non-zero return code. */ - char mr_stack[MS_RESULT_STACK_SIZE]; /* Trace information about where the error occurred. */ -} PBMSResultRec, *PBMSResultPtr; - -typedef struct PBMSEngineRefRec { - unsigned char er_data[PBMS_ENGINE_REF_LEN]; -} PBMSEngineRefRec, *PBMSEngineRefPtr; - typedef struct PBMSBlobURL { char bu_data[PBMS_BLOB_URL_SIZE]; } PBMSBlobURLRec, *PBMSBlobURLPtr; -typedef struct PBMSFieldRef { - char fr_column[PBMS_FIELD_COL_SIZE]; - char fr_cond[PBMS_FIELD_COND_SIZE]; -} PBMSFieldRefRec, *PBMSFieldRefPtr; -/* - * The engine must free its resources for the given thread. 
- */ -typedef void (*MSCloseConnFunc)(void *thd); - -/* Before access BLOBs of a table, the streaming engine will open the table. - * Open tables are managed as a pool by the streaming engine. - * When a request is received, the streaming engine will ask all - * registered engine to open the table. The engine must return a NULL - * open_table pointer if it does not handle the table. - * A callback allows an engine to request all open tables to be - * closed by the streaming engine. - */ -typedef int (*MSOpenTableFunc)(void *thd, const char *table_url, void **open_table, PBMSResultPtr result); -typedef void (*MSCloseTableFunc)(void *thd, void *open_table); - -/* - * When the streaming engine wants to use an open table handle from the - * pool, it calls the lock table function. - */ -typedef int (*MSLockTableFunc)(void *thd, int *xact, void *open_table, int lock_type, PBMSResultPtr result); -typedef int (*MSUnlockTableFunc)(void *thd, int xact, void *open_table, PBMSResultPtr result); - -/* This function is used to locate and send a BLOB on the given stream. - */ -typedef int (*MSSendBLOBFunc)(void *thd, void *open_table, const char *blob_column, const char *blob_url, void *stream, PBMSResultPtr result); - -/* - * Lookup and engine reference, and return readable text. - */ -typedef int (*MSLookupRefFunc)(void *thd, void *open_table, unsigned short col_index, PBMSEngineRefPtr eng_ref, PBMSFieldRefPtr feild_ref, PBMSResultPtr result); - typedef struct PBMSEngineRec { int ms_version; /* MS_ENGINE_VERSION */ int ms_index; /* The index into the engine list. */ int ms_removing; /* TRUE (1) if the engine is being removed. 
*/ - const char *ms_engine_name; - void *ms_engine_info; - MSCloseConnFunc ms_close_conn; - MSOpenTableFunc ms_open_table; - MSCloseTableFunc ms_close_table; - MSLockTableFunc ms_lock_table; - MSUnlockTableFunc ms_unlock_table; - MSSendBLOBFunc ms_send_blob; - MSLookupRefFunc ms_lookup_ref; + int ms_internal; /* TRUE (1) if the engine is supported directly in the mysq/drizzle handler code . */ + char ms_engine_name[32]; } PBMSEngineRec, *PBMSEnginePtr; /* * This function should never be called directly, it is called * by deregisterEngine() below. */ -typedef void (*ECDeregisterdFunc)(PBMSEnginePtr engine); - -typedef void (*ECTableCloseAllFunc)(const char *table_url); - -typedef int (*ECSetContentLenFunc)(void *stream, off_t len, PBMSResultPtr result); - -typedef int (*ECWriteHeadFunc)(void *stream, PBMSResultPtr result); - -typedef int (*ECWriteStreamFunc)(void *stream, void *buffer, size_t len, PBMSResultPtr result); - -/* - * The engine should call this function from - * its own close connection function! - */ -typedef int (*ECCloseConnFunc)(void *thd, PBMSResultPtr result); +typedef void (*ECRegisterdFunc)(PBMSEnginePtr engine); -/* - * Call this function before retaining or releasing BLOBs in a row. - */ -typedef int (*ECOpenTableFunc)(void **open_table, char *table_path, PBMSResultPtr result); +typedef void (*ECDeregisterdFunc)(PBMSEnginePtr engine); /* - * Call this function when the operation is complete. + * Call this function to store a BLOB in the repository the BLOB's + * URL will be returned. The returned URL buffer is expected to be atleast + * PBMS_BLOB_URL_SIZE long. + * + * The BLOB URL must still be retained or it will automaticly be deleted after a timeout expires. 
*/ -typedef void (*ECCloseTableFunc)(void *open_table); +typedef int (*ECCreateBlobsFunc)(bool built_in, const char *db_name, const char *tab_name, char *blob, size_t blob_len, char *blob_url, unsigned short col_index, PBMSResultPtr result); /* * Call this function for each BLOB to be retained. When a BLOB is used, the - * URL may be changed. The returned URL is valid as long as the the - * table is open. + * URL may be changed. The returned URL buffer is expected to be atleast + * PBMS_BLOB_URL_SIZE long. * * The returned URL must be inserted into the row in place of the given * URL. */ -typedef int (*ECUseBlobFunc)(void *open_table, char **ret_blob_url, char *blob_url, unsigned short col_index, PBMSResultPtr result); - -/* - * Reference Blobs that has been uploaded to the streaming engine. - * - * All BLOBs specified by the use blob function are retained by - * this function. - * - * The engine reference is a (unaligned) 8 byte value which - * identifies the row that the BLOBs are in. - */ -typedef int (*ECRetainBlobsFunc)(void *open_table, PBMSEngineRefPtr eng_ref, PBMSResultPtr result); +typedef int (*ECRetainBlobsFunc)(bool built_in, const char *db_name, const char *tab_name, char *ret_blob_url, char *blob_url, unsigned short col_index, PBMSResultPtr result); /* * If a row containing a BLOB is deleted, then the BLOBs in the @@ -216,27 +157,24 @@ typedef int (*ECRetainBlobsFunc)(void *open_table, PBMSEngineRefPtr eng_ref, PBM * Note: if a table is dropped, all the BLOBs referenced by the * table are automatically released. 
*/ -typedef int (*ECReleaseBlobFunc)(void *open_table, char *blob_url, unsigned short col_index, PBMSEngineRefPtr eng_ref, PBMSResultPtr result); +typedef int (*ECReleaseBlobFunc)(bool built_in, const char *db_name, const char *tab_name, char *blob_url, PBMSResultPtr result); + +typedef int (*ECDropTable)(bool built_in, const char *db_name, const char *tab_name, PBMSResultPtr result); -typedef int (*ECDropTable)(const char *table_path, PBMSResultPtr result); +typedef int (*ECRenameTable)(bool built_in, const char *db_name, const char *from_table, const char *to_table, PBMSResultPtr result); -typedef int (*ECRenameTable)(const char *from_table, const char *to_table, PBMSResultPtr result); +typedef void (*ECCallCompleted)(bool built_in, bool ok); typedef struct PBMSCallbacksRec { int cb_version; /* MS_CALLBACK_VERSION */ + ECRegisterdFunc cb_register; ECDeregisterdFunc cb_deregister; - ECTableCloseAllFunc cb_table_close_all; - ECSetContentLenFunc cb_set_cont_len; - ECWriteHeadFunc cb_write_head; - ECWriteStreamFunc cb_write_stream; - ECCloseConnFunc cb_close_conn; - ECOpenTableFunc cb_open_table; - ECCloseTableFunc cb_close_table; - ECUseBlobFunc cb_use_blob; - ECRetainBlobsFunc cb_retain_blobs; + ECCreateBlobsFunc cb_create_blob; + ECRetainBlobsFunc cb_retain_blob; ECReleaseBlobFunc cb_release_blob; ECDropTable cb_drop_table; ECRenameTable cb_rename_table; + ECCallCompleted cb_completed; } PBMSCallbacksRec, *PBMSCallbacksPtr; typedef struct PBMSSharedMemoryRec { @@ -251,24 +189,18 @@ typedef struct PBMSSharedMemoryRec { PBMSEnginePtr sm_engine_list[MS_ENGINE_LIST_SIZE]; } PBMSSharedMemoryRec, *PBMSSharedMemoryPtr; -#ifndef PBMS_API -#ifndef PBMS_CLIENT_API -Please define he value of PBMS_API -#endif -#else +#ifdef PBMS_API class PBMS_API { private: const char *temp_prefix[3]; + bool built_in; public: PBMS_API(): sharedMemory(NULL) { int i = 0; temp_prefix[i++] = MS_TEMP_FILE_PREFIX; -#ifdef MS_TEMP_FILE_PREFIX - temp_prefix[i++] = MS_TEMP_FILE_PREFIX; -#endif 
temp_prefix[i++] = NULL; } @@ -276,6 +208,43 @@ public: ~PBMS_API() { } /* + * This method is called by the PBMS engine during startup. + */ + int PBMSStartup(PBMSCallbacksPtr callbacks, PBMSResultPtr result) { + int err; + + deleteTempFiles(); + err = getSharedMemory(true, result); + if (!err) + sharedMemory->sm_callbacks = callbacks; + + return err; + } + + /* + * This method is called by the PBMS engine during startup. + */ + void PBMSShutdown() { + + if (!sharedMemory) + return; + + lock(); + sharedMemory->sm_callbacks = NULL; + + bool empty = true; + for (int i=0; i<sharedMemory->sm_list_len && empty; i++) { + if (sharedMemory->sm_engine_list[i]) + empty = false; + } + + unlock(); + + if (empty) + removeSharedMemory(); + } + + /* * Register the engine with the Stream Engine. */ int registerEngine(PBMSEnginePtr engine, PBMSResultPtr result) { @@ -283,6 +252,7 @@ public: deleteTempFiles(); + // The first engine to register creates the shared memory. if ((err = getSharedMemory(true, result))) return err; @@ -292,6 +262,10 @@ public: engine->ms_index = i; if (i >= sharedMemory->sm_list_len) sharedMemory->sm_list_len = i+1; + if (sharedMemory->sm_callbacks) + sharedMemory->sm_callbacks->cb_register(engine); + + built_in = (engine->ms_internal == 1); return MS_OK; } } @@ -322,7 +296,7 @@ public: PBMSResultRec result; int err; - if ((err = getSharedMemory(true, &result))) + if ((err = getSharedMemory(false, &result))) return; lock(); @@ -342,207 +316,98 @@ public: unlock(); - if (empty) { - char temp_file[100]; - - sharedMemory->sm_magic = 0; - free(sharedMemory); - sharedMemory = NULL; - const char **prefix = temp_prefix; - while (*prefix) { - getTempFileName(temp_file, *prefix, getpid()); - unlink(temp_file); - prefix++; - } - } + if (empty) + removeSharedMemory(); } - void closeAllTables(const char *table_url) + void removeSharedMemory() { - PBMSResultRec result; - int err; - - if ((err = getSharedMemory(true, &result))) - return; + const char **prefix = 
temp_prefix; + char temp_file[100]; + // Do not remove the sharfed memory until after + // the PBMS engine has shutdown. if (sharedMemory->sm_callbacks) - sharedMemory->sm_callbacks->cb_table_close_all(table_url); - } - - int setContentLength(void *stream, off_t len, PBMSResultPtr result) - { - int err; - - if ((err = getSharedMemory(true, result))) - return err; - - return sharedMemory->sm_callbacks->cb_set_cont_len(stream, len, result); - } - - int writeHead(void *stream, PBMSResultPtr result) - { - int err; - - if ((err = getSharedMemory(true, result))) - return err; - - return sharedMemory->sm_callbacks->cb_write_head(stream, result); - } - - int writeStream(void *stream, void *buffer, size_t len, PBMSResultPtr result) - { - int err; - - if ((err = getSharedMemory(true, result))) - return err; - - return sharedMemory->sm_callbacks->cb_write_stream(stream, buffer, len, result); - } - - int closeConn(void *thd, PBMSResultPtr result) - { - int err; - - if ((err = getSharedMemory(true, result))) - return err; - - if (!sharedMemory->sm_callbacks) - return MS_OK; - - return sharedMemory->sm_callbacks->cb_close_conn(thd, result); - } - - int openTable(void **open_table, char *table_path, PBMSResultPtr result) - { - int err; - - if ((err = getSharedMemory(true, result))) - return err; - - if (!sharedMemory->sm_callbacks) { - *open_table = NULL; - return MS_OK; + return; + + sharedMemory->sm_magic = 0; + free(sharedMemory); + sharedMemory = NULL; + + while (*prefix) { + getTempFileName(temp_file, *prefix, getpid()); + unlink(temp_file); + prefix++; } - - return sharedMemory->sm_callbacks->cb_open_table(open_table, table_path, result); - } - - int closeTable(void *open_table, PBMSResultPtr result) - { - int err; - - if ((err = getSharedMemory(true, result))) - return err; - - if (sharedMemory->sm_callbacks && open_table) - sharedMemory->sm_callbacks->cb_close_table(open_table); - return MS_OK; } - - int couldBeURL(char *blob_url) - /* ~*test/~1-150-2b5e0a7-0[*<blob 
size>][.ext] */ - /* ~*test/_1-150-2b5e0a7-0[*<blob size>][.ext] */ - { - char *ptr; - size_t len; - bool have_blob_size = false; - - if (blob_url) { - if ((len = strlen(blob_url))) { - /* Too short: */ - if (len <= 10) - return 0; - - /* Required prefix: */ - /* NOTE: ~> is deprecated v0.5.4+, now use ~* */ - if (*blob_url != '~' || (*(blob_url + 1) != '>' && *(blob_url + 1) != '*')) - return 0; - - ptr = blob_url + len - 1; - - /* Allow for an optional extension: */ - if (!isdigit(*ptr)) { - while (ptr > blob_url && *ptr != '/' && *ptr != '.') - ptr--; - if (ptr == blob_url || *ptr != '.') - return 0; - if (ptr == blob_url || !isdigit(*ptr)) - return 0; - } - // field 1: server id OR blob size - do_again: - while (ptr > blob_url && isdigit(*ptr)) - ptr--; - - if (ptr != blob_url && *ptr == '*' && !have_blob_size) { - ptr--; - have_blob_size = true; - goto do_again; - } - - if (ptr == blob_url || *ptr != '-') - return 0; - - - // field 2: Authoration code - ptr--; - if (!isxdigit(*ptr)) - return 0; - - while (ptr > blob_url && isxdigit(*ptr)) - ptr--; - - if (ptr == blob_url || *ptr != '-') - return 0; - - // field 3:offset - ptr--; - if (!isxdigit(*ptr)) - return 0; - - while (ptr > blob_url && isdigit(*ptr)) - ptr--; - - if (ptr == blob_url || *ptr != '-') - return 0; - - - // field 4:Table id - ptr--; - if (!isdigit(*ptr)) - return 0; - - while (ptr > blob_url && isdigit(*ptr)) - ptr--; - - /* NOTE: ^ and : are deprecated v0.5.4+, now use ! 
and ~ */ - if (ptr == blob_url || (*ptr != '^' && *ptr != ':' && *ptr != '_' && *ptr != '~')) - return 0; - ptr--; - - if (ptr == blob_url || *ptr != '/') - return 0; - ptr--; - if (ptr == blob_url) - return 0; - return 1; + int couldBeURL(char *blob_url, int size) + { + if (blob_url && (size < PBMS_BLOB_URL_SIZE)) { + char buffer[PBMS_BLOB_URL_SIZE+1]; + u_int32_t db_id = 0; + u_int32_t tab_id = 0; + u_int64_t blob_id = 0; + u_int64_t blob_ref_id = 0; + u_int64_t blob_size = 0; + u_int32_t auth_code = 0; + u_int32_t server_id = 0; + char type, junk[5]; + int scanned; + + junk[0] = 0; + if (blob_url[size]) { // There is no guarantee that the URL will be null terminated. + memcpy(buffer, blob_url, size); + buffer[size] = 0; + blob_url = buffer; + } + + scanned = sscanf(blob_url, URL_FMT"%4s", &db_id, &type, &tab_id, &blob_id, &auth_code, &server_id, &blob_ref_id, &blob_size, junk); + if (scanned != 8) {// If junk is found at the end this will also result in an invalid URL. + printf("Bad URL \"%s\": scanned = %d, junk: %d, %d, %d, %d\n", blob_url, scanned, junk[0], junk[1], junk[2], junk[3]); + return 0; } + + if (junk[0] || (type != '~' && type != '_')) { + printf("Bad URL \"%s\": scanned = %d, junk: %d, %d, %d, %d\n", blob_url, scanned, junk[0], junk[1], junk[2], junk[3]); + return 0; + } + + return 1; } + return 0; } - - int useBlob(void *open_table, char **ret_blob_url, char *blob_url, unsigned short col_index, PBMSResultPtr result) + + int retainBlob(const char *db_name, const char *tab_name, char *ret_blob_url, char *blob_url, size_t blob_size, unsigned short col_index, PBMSResultPtr result) { int err; + char safe_url[PBMS_BLOB_URL_SIZE+1]; - if ((err = getSharedMemory(true, result))) + + if ((err = getSharedMemory(false, result))) return err; - if (!couldBeURL(blob_url)) { - *ret_blob_url = NULL; - return MS_OK; + if (!couldBeURL(blob_url, blob_size)) { + + if (!sharedMemory->sm_callbacks) { + *ret_blob_url = 0; + return MS_OK; + } + err = 
sharedMemory->sm_callbacks->cb_create_blob(built_in, db_name, tab_name, blob_url, blob_size, ret_blob_url, col_index, result); + if (err) + return err; + + blob_url = ret_blob_url; + } else { + // Make sure the url is a C string: + if (blob_url[blob_size]) { + memcpy(safe_url, blob_url, blob_size); + safe_url[blob_size] = 0; + blob_url = safe_url; + } } + if (!sharedMemory->sm_callbacks) { result->mr_code = MS_ERR_INCORRECT_URL; @@ -551,64 +416,71 @@ public: return MS_ERR_INCORRECT_URL; } - return sharedMemory->sm_callbacks->cb_use_blob(open_table, ret_blob_url, blob_url, col_index, result); + return sharedMemory->sm_callbacks->cb_retain_blob(built_in, db_name, tab_name, ret_blob_url, blob_url, col_index, result); } - int retainBlobs(void *open_table, PBMSEngineRefPtr eng_ref, PBMSResultPtr result) + int releaseBlob(const char *db_name, const char *tab_name, char *blob_url, size_t blob_size, PBMSResultPtr result) { int err; + char safe_url[PBMS_BLOB_URL_SIZE+1]; - if ((err = getSharedMemory(true, result))) + if ((err = getSharedMemory(false, result))) return err; if (!sharedMemory->sm_callbacks) return MS_OK; - return sharedMemory->sm_callbacks->cb_retain_blobs(open_table, eng_ref, result); + if (!couldBeURL(blob_url, blob_size)) + return MS_OK; + + if (blob_url[blob_size]) { + memcpy(safe_url, blob_url, blob_size); + safe_url[blob_size] = 0; + blob_url = safe_url; + } + + return sharedMemory->sm_callbacks->cb_release_blob(built_in, db_name, tab_name, blob_url, result); } - int releaseBlob(void *open_table, char *blob_url, unsigned short col_index, PBMSEngineRefPtr eng_ref, PBMSResultPtr result) + int dropTable(const char *db_name, const char *tab_name, PBMSResultPtr result) { int err; - if ((err = getSharedMemory(true, result))) + if ((err = getSharedMemory(false, result))) return err; if (!sharedMemory->sm_callbacks) return MS_OK; - - if (!couldBeURL(blob_url)) - return MS_OK; - - return sharedMemory->sm_callbacks->cb_release_blob(open_table, blob_url, col_index, 
eng_ref, result); + + return sharedMemory->sm_callbacks->cb_drop_table(built_in, db_name, tab_name, result); } - int dropTable(const char *table_path, PBMSResultPtr result) + int renameTable(const char *db_name, const char *from_table, const char *to_table, PBMSResultPtr result) { int err; - if ((err = getSharedMemory(true, result))) + if ((err = getSharedMemory(false, result))) return err; if (!sharedMemory->sm_callbacks) return MS_OK; - return sharedMemory->sm_callbacks->cb_drop_table(table_path, result); + return sharedMemory->sm_callbacks->cb_rename_table(built_in, db_name, from_table, to_table, result); } - int renameTable(const char *from_table, const char *to_table, PBMSResultPtr result) + void completed(int ok) { - int err; + PBMSResultRec result; - if ((err = getSharedMemory(true, result))) - return err; + if (getSharedMemory(false, &result)) + return; if (!sharedMemory->sm_callbacks) - return MS_OK; + return; - return sharedMemory->sm_callbacks->cb_rename_table(from_table, to_table, result); + sharedMemory->sm_callbacks->cb_completed(built_in, ok); } - + volatile PBMSSharedMemoryPtr sharedMemory; private: @@ -618,7 +490,6 @@ private: int r; char temp_file[100]; const char **prefix = temp_prefix; - void *tmp_p = NULL; if (sharedMemory) return MS_OK; @@ -644,8 +515,7 @@ private: } buffer[tfer] = 0; - sscanf(buffer, "%p", &tmp_p); - sharedMemory = (PBMSSharedMemoryPtr) tmp_p; + sscanf(buffer, "%p", &sharedMemory); if (!sharedMemory || sharedMemory->sm_magic != MS_SHARED_MEMORY_MAGIC) { if (!create) return MS_OK; @@ -661,9 +531,9 @@ private: return setOSResult(errno, "fseek", temp_file, result); } - sprintf(buffer, "%p", (void *) sharedMemory); + sprintf(buffer, "%p", sharedMemory); tfer = write(tmp_f, buffer, strlen(buffer)); - if (tfer != (ssize_t) strlen(buffer)) { + if (tfer != strlen(buffer)) { close(tmp_f); return setOSResult(errno, "write", temp_file, result); } @@ -782,19 +652,20 @@ private: void deleteTempFiles() { - struct dirent *entry; + struct 
dirent *entry; struct dirent *result; DIR *odir; int err; - size_t sz; + size_t sz; char temp_file[100]; -#ifdef XT_SOLARIS +#ifdef __sun sz = sizeof(struct dirent) + pathconf("/tmp/", _PC_NAME_MAX); // Solaris, see readdir(3C) #else sz = sizeof(struct dirent); #endif - entry = (struct dirent*)malloc(sz); + if (!(entry = (struct dirent *) malloc(sz))) + return; if (!(odir = opendir("/tmp/"))) return; err = readdir_r(odir, entry, &result); @@ -846,25 +717,25 @@ extern void PBMSDeinitBlobStreamingThread(void *v_bs_thread); extern void PBMSGetError(void *v_bs_thread, PBMSResultPtr result); /* -* PBMSCreateBlob():Creates a new blob in the database of the given size. cont_type can be NULL. +* PBMSCreateBlob():Creates a new blob in the database of the given size. */ -extern bool PBMSCreateBlob(PBMSBlobIDPtr blob_id, char *database_name, char *cont_type, u_int64_t size); +extern bool PBMSCreateBlob(PBMSBlobIDPtr blob_id, char *database_name, u_int64_t size); /* * PBMSWriteBlob():Write the data to the blob in one or more chunks. The total size of all the chuncks of * data written to the blob must match the size specified when the blob was created. */ -extern bool PBMSWriteBlob(PBMSBlobIDPtr blob_id, char *database_name, char *data, size_t size, size_t offset); +extern bool PBMSWriteBlob(PBMSBlobIDPtr blob_id, char *data, size_t size, size_t offset); /* * PBMSReadBlob():Read the blob data out of the blob in one or more chunks. */ -extern bool PBMSReadBlob(PBMSBlobIDPtr blob_id, char *database_name, char *buffer, size_t *size, size_t offset); +extern bool PBMSReadBlob(PBMSBlobIDPtr blob_id, char *buffer, size_t *size, size_t offset); /* * PBMSIDToURL():Convert a blob id to a blob URL. The 'url' buffer must be atleast PBMS_BLOB_URL_SIZE bytes in size. */ -extern bool PBMSIDToURL(PBMSBlobIDPtr blob_id, char *database_name, char *url); +extern bool PBMSIDToURL(PBMSBlobIDPtr blob_id, char *url); /* * PBMSIDToURL():Convert a blob URL to a blob ID. 
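Note on the pbms.h hunks above: couldBeURL(), retainBlob() and releaseBlob() now take an explicit size and copy the value into a local buffer when it is not NUL-terminated, testing blob_url[size] to decide whether a copy is needed. That test reads one byte past the size bytes the caller handed over. Whether that byte is always readable depends on how the caller allocated the BLOB buffer, which this diff does not show. A minimal sketch of a variant that copies unconditionally and so never touches that byte follows. The name copy_as_c_string and the buffer size used in the usage note are hypothetical and are not part of PBMS.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper for illustration only; it is not part of pbms.h.
// Copies `size` bytes of a length-delimited value into `out` and appends
// a NUL terminator so the result can be passed to sscanf()/strlen().
// Returns false if an argument is missing or the value does not fit.
// Only data[0..size-1] is read; data[size] is never inspected.
static bool copy_as_c_string(const char *data, std::size_t size,
                             char *out, std::size_t out_size)
{
    if (data == NULL || out == NULL)
        return false;
    if (size >= out_size)   // no room for the terminator
        return false;
    std::memcpy(out, data, size);
    out[size] = '\0';
    return true;
}
```

A caller would declare something like char safe_url[SOME_LIMIT + 1] (SOME_LIMIT standing in for the real URL size constant), call copy_as_c_string(blob, length, safe_url, sizeof(safe_url)), and parse safe_url only when the call returns true. The trade-off is one extra memcpy per value in exchange for never reading outside the caller's buffer.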
diff --git a/storage/pbxt/src/pbms_enabled.cc b/storage/pbxt/src/pbms_enabled.cc new file mode 100644 index 00000000000..df8b99b331b --- /dev/null +++ b/storage/pbxt/src/pbms_enabled.cc @@ -0,0 +1,238 @@ +/* Copyright (c) 2009 PrimeBase Technologies GmbH, Germany + * + * PrimeBase Media Stream for MySQL + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * Barry Leslie + * + * 2009-07-16 + * + * H&G2JCtL + * + * PBMS interface used to enable engines for use with the PBMS engine. + * + * For an example on how to build this into an engine have a look at the PBXT engine + * in file ha_pbxt.cc. Search for 'PBMS_ENABLED'. 
+ * + */ + +#define PBMS_API pbms_enabled_api + +#include "pbms_enabled.h" +#ifdef DRIZZLED +#include <sys/stat.h> +#include <drizzled/common_includes.h> +#include <drizzled/plugin.h> +#else +#include "mysql_priv.h" +#include <mysql/plugin.h> +#define session_alloc(sess, size) thd_alloc(sess, size); +#define current_session current_thd +#endif + +#define GET_BLOB_FIELD(t, i) (Field_blob *)(t->field[t->s->blob_field[i]]) +#define DB_NAME(f) (f->table->s->db.str) +#define TAB_NAME(f) (*(f->table_name)) + +static PBMS_API pbms_api; + +PBMSEngineRec enabled_engine = { + MS_ENGINE_VERSION +}; + +//==================== +bool pbms_initialize(const char *engine_name, bool isServer, PBMSResultPtr result) +{ + int err; + + strncpy(enabled_engine.ms_engine_name, engine_name, 32); + enabled_engine.ms_internal = isServer; + enabled_engine.ms_engine_name[31] = 0; + + err = pbms_api.registerEngine(&enabled_engine, result); + + return (err == 0); +} + + +//==================== +void pbms_finalize() +{ + pbms_api.deregisterEngine(&enabled_engine); +} + +//==================== +int pbms_write_row_blobs(TABLE *table, uchar *row_buffer, PBMSResultPtr result) +{ + Field_blob *field; + char *blob_rec, *blob; + size_t packlength, i, org_length, length; + char blob_url_buffer[PBMS_BLOB_URL_SIZE]; + int err; + String type_name; + + if (table->s->blob_fields == 0) + return 0; + + for (i= 0; i < table->s->blob_fields; i++) { + field = GET_BLOB_FIELD(table, i); + + // Note: field->type() always returns MYSQL_TYPE_BLOB regardless of the type of BLOB + field->sql_type(type_name); + if (strcasecmp(type_name.c_ptr(), "LongBlob")) + continue; + + // Get the blob record: + blob_rec = (char *)row_buffer + field->offset(field->table->record[0]); + packlength = field->pack_length() - field->table->s->blob_ptr_size; + + memcpy(&blob, blob_rec +packlength, sizeof(char*)); + org_length = field->get_length((uchar *)blob_rec); + + + // Signal PBMS to record a new reference to the BLOB. 
+ // If 'blob' is not a BLOB URL then it will be stored in the repositor as a new BLOB + // and a reference to it will be created. + err = pbms_api.retainBlob(DB_NAME(field), TAB_NAME(field), blob_url_buffer, blob, org_length, field->field_index, result); + if (err) + return err; + + // If the BLOB length changed reset it. + // This will happen if the BLOB data was replaced with a BLOB reference. + length = strlen(blob_url_buffer) +1; + if ((length != org_length) || memcmp(blob_url_buffer, blob, length)) { + if (length != org_length) { + field->store_length((uchar *)blob_rec, packlength, length); + } + + if (length > org_length) { + // This can only happen if the BLOB URL is actually larger than the BLOB itself. + blob = (char *) session_alloc(current_session, length); + memcpy(blob_rec+packlength, &blob, sizeof(char*)); + } + memcpy(blob, blob_url_buffer, length); + } + } + + return 0; +} + +//==================== +int pbms_delete_row_blobs(TABLE *table, const uchar *row_buffer, PBMSResultPtr result) +{ + Field_blob *field; + const char *blob_rec; + char *blob; + size_t packlength, i, length; + int err; + String type_name; + + if (table->s->blob_fields == 0) + return 0; + + for (i= 0; i < table->s->blob_fields; i++) { + field = GET_BLOB_FIELD(table, i); + + // Note: field->type() always returns MYSQL_TYPE_BLOB regardless of the type of BLOB + field->sql_type(type_name); + if (strcasecmp(type_name.c_ptr(), "LongBlob")) + continue; + + // Get the blob record: + blob_rec = (char *)row_buffer + field->offset(field->table->record[0]); + packlength = field->pack_length() - field->table->s->blob_ptr_size; + + length = field->get_length((uchar *)blob_rec); + memcpy(&blob, blob_rec +packlength, sizeof(char*)); + + // Signal PBMS to delete the reference to the BLOB. 
+ err = pbms_api.releaseBlob(DB_NAME(field), TAB_NAME(field), blob, length, result); + if (err) + return err; + } + + return 0; +} + +#define MAX_NAME_SIZE 64 +static void parse_table_path(const char *path, char *db_name, char *tab_name) +{ + const char *ptr = path + strlen(path) -1, *eptr; + int len; + + *db_name = *tab_name = 0; + + while ((ptr > path) && (*ptr != '/'))ptr --; + if (*ptr != '/') + return; + + strncpy(tab_name, ptr+1, MAX_NAME_SIZE); + tab_name[MAX_NAME_SIZE-1] = 0; + eptr = ptr; + ptr--; + + while ((ptr > path) && (*ptr != '/'))ptr --; + if (*ptr != '/') + return; + ptr++; + + len = eptr - ptr; + if (len >= MAX_NAME_SIZE) + len = MAX_NAME_SIZE-1; + + memcpy(db_name, ptr, len); + db_name[len] = 0; + +} + +//==================== +int pbms_rename_table_with_blobs(const char *old_table_path, const char *new_table_path, PBMSResultPtr result) +{ + char o_db_name[MAX_NAME_SIZE], n_db_name[MAX_NAME_SIZE], o_tab_name[MAX_NAME_SIZE], n_tab_name[MAX_NAME_SIZE]; + + parse_table_path(old_table_path, o_db_name, o_tab_name); + parse_table_path(new_table_path, n_db_name, n_tab_name); + + if (strcmp(o_db_name, n_db_name)) { + result->mr_code = MS_ERR_INVALID_OPERATION; + strcpy(result->mr_message, "PBMS does not support renaming tables across databases."); + strcpy(result->mr_stack, "pbms_rename_table_with_blobs()"); + return MS_ERR_INVALID_OPERATION; + } + + + return pbms_api.renameTable(o_db_name, o_tab_name, n_tab_name, result); +} + +//==================== +int pbms_delete_table_with_blobs(const char *table_path, PBMSResultPtr result) +{ + char db_name[MAX_NAME_SIZE], tab_name[MAX_NAME_SIZE]; + + parse_table_path(table_path, db_name, tab_name); + + return pbms_api.dropTable(db_name, tab_name, result); +} + +//==================== +void pbms_completed(TABLE *table, bool ok) +{ + if ((!table) || (table->s->blob_fields != 0)) + pbms_api.completed(ok) ; + + return ; +} + diff --git a/storage/pbxt/src/pbms_enabled.h b/storage/pbxt/src/pbms_enabled.h new file mode 
100644 index 00000000000..f389db1d3f3 --- /dev/null +++ b/storage/pbxt/src/pbms_enabled.h @@ -0,0 +1,110 @@ +/* Copyright (c) 2009 PrimeBase Technologies GmbH, Germany + * + * PrimeBase Media Stream for MySQL + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * Barry Leslie + * + * 2009-07-16 + * + * H&G2JCtL + * + * PBMS interface used to enable engines for use with the PBMS engine. + * + * For an example on how to build this into an engine have a look at the PBXT engine + * in file ha_pbxt.cc. Search for 'PBMS_ENABLED'. + * + */ + + +#ifndef __PBMS_ENABLED_H__ +#define __PBMS_ENABLED_H__ + +#include "pbms.h" + +#ifdef DRIZZLED +#include <drizzled/server_includes.h> +#define TABLE Table +#else +#include <mysql_priv.h> +#endif + +/* + * pbms_initialize() should be called from the engines plugIn's 'init()' function. + * The engine_name is the name of your engine, "PBXT" or "InnoDB" for example. + * + * The isServer flag indicates if this entire server is being enabled. This is only + * true if this is being built into the server's handler code above the engine level + * calls. + */ +extern bool pbms_initialize(const char *engine_name, bool isServer, PBMSResultPtr result); + +/* + * pbms_finalize() should be called from the engines plugIn's 'deinit()' function. 
+ */ +extern void pbms_finalize(); + +/* + * pbms_write_row_blobs() should be called from the engine's 'write_row' function. + * It can alter the row data so it must be called before any other function using the row data. + * It should also be called from the engine's 'update_row' function for the new row. + * + * pbms_completed() must be called after calling pbms_write_row_blobs() and just before + * returning from write_row() to indicate if the operation completed successfully. + */ +extern int pbms_write_row_blobs(TABLE *table, uchar *buf, PBMSResultPtr result); + +/* + * pbms_delete_row_blobs() should be called from the engine's 'delete_row' function. + * It should also be called from the engine's 'update_row' function for the old row. + * + * pbms_completed() must be called after calling pbms_delete_row_blobs() and just before + * returning from delete_row() to indicate if the operation completed successfully. + */ +extern int pbms_delete_row_blobs(TABLE *table, const uchar *buf, PBMSResultPtr result); + +/* + * pbms_rename_table_with_blobs() should be called from the engine's 'rename_table' function. + * + * NOTE: Renaming tables across databases is not supported. + * + * pbms_completed() must be called after calling pbms_rename_table_with_blobs() and just before + * returning from rename_table() to indicate if the operation completed successfully. + */ +extern int pbms_rename_table_with_blobs(const char *old_table_path, const char *new_table_path, PBMSResultPtr result); + +/* + * pbms_delete_table_with_blobs() should be called from the engine's 'delete_table' function. + * + * NOTE: Currently pbms_delete_table_with_blobs() cannot be undone so it should only + * be called after the host engine has successfully dropped its table. + * + * pbms_completed() must be called after calling pbms_delete_table_with_blobs() and just before + * returning from delete_table() to indicate if the operation completed successfully. 
+ */ +extern int pbms_delete_table_with_blobs(const char *table_path, PBMSResultPtr result); + +/* + * pbms_completed() must be called to indicate success or failure of a an operation after having + * called pbms_write_row_blobs(), pbms_delete_row_blobs(), pbms_rename_table_with_blobs(), or + * pbms_delete_table_with_blobs(). + * + * pbms_completed() has the effect of committing or rolling back the changes made if the session + * is in 'autocommit' mode. + */ +extern void pbms_completed(TABLE *table, bool ok); + +#endif diff --git a/storage/pbxt/src/pthread_xt.cc b/storage/pbxt/src/pthread_xt.cc index 0a9f4da2074..d87bbd31722 100755 --- a/storage/pbxt/src/pthread_xt.cc +++ b/storage/pbxt/src/pthread_xt.cc @@ -395,20 +395,31 @@ int xt_p_cond_timedwait(xt_cond_type *cond, xt_mutex_type *mt, struct timespec * int xt_p_join(pthread_t thread, void **value) { - switch (WaitForSingleObject(thread, INFINITE)) { - case WAIT_OBJECT_0: - case WAIT_TIMEOUT: - /* Don't do this! According to the Win docs: - * _endthread automatically closes the thread handle - * (whereas _endthreadex does not). Therefore, when using - * _beginthread and _endthread, do not explicitly close the - * thread handle by calling the Win32 CloseHandle API. - CloseHandle(thread); - */ - break; - case WAIT_FAILED: - return GetLastError(); + DWORD exitcode; + + while(1) { + switch (WaitForSingleObject(thread, 10000)) { + case WAIT_OBJECT_0: + return 0; + case WAIT_TIMEOUT: + /* Don't do this! According to the Win docs: + * _endthread automatically closes the thread handle + * (whereas _endthreadex does not). Therefore, when using + * _beginthread and _endthread, do not explicitly close the + * thread handle by calling the Win32 CloseHandle API. + CloseHandle(thread); + */ + /* This is done so that if the thread was not [yet] in the running + * state when this function was called we won't deadlock here. 
+ */ + if (GetExitCodeThread(thread, &exitcode) && (exitcode == STILL_ACTIVE)) + break; + return 0; + case WAIT_FAILED: + return GetLastError(); + } } + return 0; } diff --git a/storage/pbxt/src/restart_xt.cc b/storage/pbxt/src/restart_xt.cc index 50390bfb727..0e0e4306a2e 100644 --- a/storage/pbxt/src/restart_xt.cc +++ b/storage/pbxt/src/restart_xt.cc @@ -410,7 +410,7 @@ typedef struct XTOperation { xtLogOffset or_log_offset; } XTOperationRec, *XTOperationPtr; -static int xres_cmp_op_seq(struct XTThread *self __attribute__((unused)), register const void *thunk __attribute__((unused)), register const void *a, register const void *b) +static int xres_cmp_op_seq(struct XTThread *XT_UNUSED(self), register const void *XT_UNUSED(thunk), register const void *a, register const void *b) { xtOpSeqNo lf_op_seq = *((xtOpSeqNo *) a); XTOperationPtr lf_ptr = (XTOperationPtr) b; @@ -480,19 +480,6 @@ static xtBool xres_add_index_entries(XTOpenTablePtr ot, xtRowID row_id, xtRecord return OK; for (idx_cnt=0, ind=tab->tab_dic.dic_keys; idx_cnt<tab->tab_dic.dic_key_count; idx_cnt++, ind++) { - /* - key.sk_on_key = FALSE; - key.sk_key_value.sv_flags = XT_SEARCH_WHOLE_KEY; - key.sk_key_value.sv_rec_id = rec_offset; - key.sk_key_value.sv_key = key.sk_key_buf; - key.sk_key_value.sv_length = myxt_create_key_from_row(*ind, key.sk_key_buf, rec_data, NULL); - if (!xt_idx_search(ot, *ind, &key)) { - ot->ot_err_index_no = (*ind)->mi_index_no; - return FAILED; - } - if (!key.sk_on_key) { - } - */ if (!xt_idx_insert(ot, *ind, row_id, rec_id, rec_data, NULL, TRUE)) { /* Check the error, certain errors are recoverable! */ XTThreadPtr self = xt_get_self(); @@ -509,7 +496,7 @@ static xtBool xres_add_index_entries(XTOpenTablePtr ot, xtRowID row_id, xtRecord /* TODO: Write something to the index header to indicate that * it is corrupted. 
*/ - tab->tab_dic.dic_disable_index = XT_INDEX_CORRUPTED; + xt_tab_disable_index(ot->ot_table, XT_INDEX_CORRUPTED); xt_log_and_clear_exception_ns(); return OK; } @@ -642,6 +629,9 @@ static void xres_apply_change(XTThreadPtr self, XTOpenTablePtr ot, XTXactLogBuff xtWord1 *rec_data = NULL; XTTabRecFreeDPtr free_data; + if (tab->tab_dic.dic_key_count == 0) + check_index = FALSE; + switch (record->xl.xl_status_1) { case XT_LOG_ENT_REC_MODIFIED: case XT_LOG_ENT_UPDATE: @@ -651,20 +641,25 @@ static void xres_apply_change(XTThreadPtr self, XTOpenTablePtr ot, XTXactLogBuff case XT_LOG_ENT_INSERT_BG: case XT_LOG_ENT_DELETE_BG: rec_id = XT_GET_DISK_4(record->xu.xu_rec_id_4); + + /* This should be done before we apply change to table, as otherwise we lose + * the key value that we need to remove from index + */ + if (check_index && record->xl.xl_status_1 == XT_LOG_ENT_REC_MODIFIED) { + if ((rec_data = xres_load_record(self, ot, rec_id, NULL, 0, rec_buf, tab->tab_dic.dic_ind_cols_req))) + xres_remove_index_entries(ot, rec_id, rec_data); + } + len = (size_t) XT_GET_DISK_2(record->xu.xu_size_2); if (!XT_PWRITE_RR_FILE(ot->ot_rec_file, xt_rec_id_to_rec_offset(tab, rec_id), len, (xtWord1 *) &record->xu.xu_rec_type_1, &ot->ot_thread->st_statistics.st_rec, ot->ot_thread)) xt_throw(self); tab->tab_bytes_to_flush += len; - if (check_index && ot->ot_table->tab_dic.dic_key_count) { + if (check_index) { switch (record->xl.xl_status_1) { case XT_LOG_ENT_DELETE: case XT_LOG_ENT_DELETE_BG: break; - case XT_LOG_ENT_REC_MODIFIED: - if ((rec_data = xres_load_record(self, ot, rec_id, NULL, 0, rec_buf, tab->tab_dic.dic_ind_cols_req))) - xres_remove_index_entries(ot, rec_id, rec_data); - /* No break required: */ default: if ((rec_data = xres_load_record(self, ot, rec_id, &record->xu.xu_rec_type_1, len, rec_buf, tab->tab_dic.dic_ind_cols_req))) { row_id = XT_GET_DISK_4(record->xu.xu_row_id_4); @@ -859,9 +854,6 @@ static void xres_apply_change(XTThreadPtr self, XTOpenTablePtr ot, XTXactLogBuff goto 
do_rec_freed; record_loaded = TRUE; } -#ifdef XT_STREAMING - myxt_release_blobs(ot, rec_data, rec_id); -#endif } if (record->xl.xl_status_1 == XT_LOG_ENT_REC_REMOVED_EXT) { @@ -967,31 +959,12 @@ static void xres_apply_change(XTThreadPtr self, XTOpenTablePtr ot, XTXactLogBuff if (check_index) { cols_required = tab->tab_dic.dic_ind_cols_req; -#ifdef XT_STREAMING - if (tab->tab_dic.dic_blob_cols_req > cols_required) - cols_required = tab->tab_dic.dic_blob_cols_req; -#endif if (!(rec_data = xres_load_record(self, ot, rec_id, &record->rb.rb_rec_type_1, rec_size, rec_buf, cols_required))) goto go_on_to_free; record_loaded = TRUE; xres_remove_index_entries(ot, rec_id, rec_data); } -#ifdef XT_STREAMING - if (tab->tab_dic.dic_blob_count) { - if (!record_loaded) { - cols_required = tab->tab_dic.dic_blob_cols_req; - if (!(rec_data = xres_load_record(self, ot, rec_id, &record->rb.rb_rec_type_1, rec_size, rec_buf, cols_required))) - /* [(7)] REMOVE is followed by FREE: - goto get_rec_offset; - */ - goto go_on_to_free; - record_loaded = TRUE; - } - myxt_release_blobs(ot, rec_data, rec_id); - } -#endif - if (data_log_id && data_log_offset && log_over_size) { if (!ot->ot_thread->st_dlog_buf.dlb_delete_log(data_log_id, data_log_offset, log_over_size, tab->tab_id, rec_id, self)) { if (ot->ot_thread->t_exception.e_xt_err != XT_ERR_BAD_EXT_RECORD && @@ -1560,7 +1533,7 @@ static xtBool xres_delete_data_log(XTDatabaseHPtr db, xtLogID log_id) return OK; } -static int xres_comp_flush_tabs(XTThreadPtr self __attribute__((unused)), register const void *thunk __attribute__((unused)), register const void *a, register const void *b) +static int xres_comp_flush_tabs(XTThreadPtr XT_UNUSED(self), register const void *XT_UNUSED(thunk), register const void *a, register const void *b) { xtTableID tab_id = *((xtTableID *) a); XTCheckPointTablePtr cp_tab = (XTCheckPointTablePtr) b; @@ -1868,7 +1841,7 @@ void XTXactRestart::xres_init(XTThreadPtr self, XTDatabaseHPtr db, xtLogID *log_ exit_(); } -void 
XTXactRestart::xres_exit(XTThreadPtr self __attribute__((unused))) +void XTXactRestart::xres_exit(XTThreadPtr XT_UNUSED(self)) { } @@ -2578,7 +2551,8 @@ static void *xres_cp_run_thread(XTThreadPtr self) int count; void *mysql_thread; - mysql_thread = myxt_create_thread(); + if (!(mysql_thread = myxt_create_thread())) + xt_throw(self); while (!self->t_quit) { try_(a) { @@ -2615,7 +2589,10 @@ static void *xres_cp_run_thread(XTThreadPtr self) } } + /* + * {MYSQL-THREAD-KILL} myxt_destroy_thread(mysql_thread, TRUE); + */ return NULL; } @@ -2700,7 +2677,7 @@ xtPublic xtBool xt_begin_checkpoint(XTDatabaseHPtr db, xtBool have_table_lock, X XTXactSegPtr seg; seg = &db->db_xn_idx[i]; - XT_XACT_WRITE_LOCK(&seg->xs_tab_lock, self); + XT_XACT_READ_LOCK(&seg->xs_tab_lock, self); for (u_int j=0; j<XT_XN_HASH_TABLE_SIZE; j++) { XTXactDataPtr xact; @@ -2716,7 +2693,7 @@ xtPublic xtBool xt_begin_checkpoint(XTDatabaseHPtr db, xtBool have_table_lock, X xact = xact->xd_next_xact; } } - XT_XACT_UNLOCK(&seg->xs_tab_lock, self); + XT_XACT_UNLOCK(&seg->xs_tab_lock, self, FALSE); } #ifdef TRACE_CHECKPOINT @@ -3201,3 +3178,105 @@ xtPublic void xt_dump_xlogs(XTDatabaseHPtr db, xtLogID start_log) done: db->db_xlog.xlog_seq_exit(&seq); } + +/* ---------------------------------------------------------------------- + * D A T A B A S E R E C O V E R Y T H R E A D + */ + +extern XTDatabaseHPtr pbxt_database; +static XTThreadPtr xres_recovery_thread; + +static void *xn_xres_run_recovery_thread(XTThreadPtr self) +{ + THD *mysql_thread; + + if (!(mysql_thread = (THD *) myxt_create_thread())) + xt_throw(self); + + while (!xres_recovery_thread->t_quit && !ha_resolve_by_legacy_type(mysql_thread, DB_TYPE_PBXT)) + xt_sleep_milli_second(1); + + if (!xres_recovery_thread->t_quit) { + /* {GLOBAL-DB} + * It can happen that something will just get in before this + * thread and open/recover the database! 
+ */ + if (!pbxt_database) { + try_(a) { + xt_open_database(self, mysql_real_data_home, TRUE); + /* This can be done at the same time by a foreground thread, + * strictly speaking I need a lock. + */ + if (!pbxt_database) { + pbxt_database = self->st_database; + xt_heap_reference(self, pbxt_database); + } + } + catch_(a) { + xt_log_and_clear_exception(self); + } + cont_(a); + } + } + + /* + * {MYSQL-THREAD-KILL} + * Here is the problem with destroying the thread at this + * point. If we had an error started, then it can lead + * to a callback into pbxt: pbxt_panic(). + * + * This will shutdown things, making it impossible quite the + * thread and do a cleanup. Solution: + * + * Move the MySQL thread descruction to a later point! + * + * sql/mysqld --no-defaults --basedir=~/maria/trunk + * --character-sets-dir=~/maria/trunk/sql/share/charsets + * --language=~/maria/trunk/sql/share/english + * --skip-networking --datadir=/tmp/x --skip-grant-tables --nonexistentoption + * + * #0 0x003893f9 in xt_exit_databases at database_xt.cc:304 + * #1 0x0039dc7e in pbxt_end at ha_pbxt.cc:947 + * #2 0x0039dd27 in pbxt_panic at ha_pbxt.cc:1289 + * #3 0x001d619e in ha_finalize_handlerton at handler.cc:391 + * #4 0x00279d22 in plugin_deinitialize at sql_plugin.cc:816 + * #5 0x0027bcf5 in reap_plugins at sql_plugin.cc:904 + * #6 0x0027c38c in plugin_thdvar_cleanup at sql_plugin.cc:2513 + * #7 0x000c0db2 in THD::~THD at sql_class.cc:934 + * #8 0x003b025b in myxt_destroy_thread at myxt_xt.cc:2999 + * #9 0x003b66b5 in xn_xres_run_recovery_thread at restart_xt.cc:3196 + * #10 0x003cbfbb in thr_main at thread_xt.cc:1020 + * + myxt_destroy_thread(mysql_thread, TRUE); + */ + + xres_recovery_thread = NULL; + return NULL; +} + +xtPublic void xt_xres_start_database_recovery(XTThreadPtr self) +{ + char name[PATH_MAX]; + + sprintf(name, "DB-RECOVERY-%s", xt_last_directory_of_path(mysql_real_data_home)); + xt_remove_dir_char(name); + + xres_recovery_thread = xt_create_daemon(self, name); + 
xt_run_thread(self, xres_recovery_thread, xn_xres_run_recovery_thread); +} + +xtPublic void xt_xres_wait_for_recovery(XTThreadPtr self) +{ + XTThreadPtr thr_rec; + + /* {MYSQL-THREAD-KILL} + * Stack above shows that his is possible! + */ + if ((thr_rec = xres_recovery_thread) && (self != xres_recovery_thread)) { + xtThreadID tid = thr_rec->t_id; + + xt_terminate_thread(self, thr_rec); + + xt_wait_for_thread(tid, TRUE); + } +} diff --git a/storage/pbxt/src/restart_xt.h b/storage/pbxt/src/restart_xt.h index 259b3cbda90..bcef7191443 100644 --- a/storage/pbxt/src/restart_xt.h +++ b/storage/pbxt/src/restart_xt.h @@ -131,4 +131,7 @@ xtWord8 xt_bytes_since_last_checkpoint(struct XTDatabase *db, xtLogID curr_log_i void xt_print_log_record(xtLogID log, off_t offset, XTXactLogBufferDPtr record); void xt_dump_xlogs(struct XTDatabase *db, xtLogID start_log); +void xt_xres_start_database_recovery(XTThreadPtr self); +void xt_xres_wait_for_recovery(XTThreadPtr self); + #endif diff --git a/storage/pbxt/src/sortedlist_xt.cc b/storage/pbxt/src/sortedlist_xt.cc index b4c525dbb22..f1742b64330 100644 --- a/storage/pbxt/src/sortedlist_xt.cc +++ b/storage/pbxt/src/sortedlist_xt.cc @@ -234,7 +234,7 @@ xtPublic void xt_sl_delete_item_at(struct XTThread *self, XTSortedListPtr sl, si XT_MEMMOVE(sl->sl_data, &sl->sl_data[idx * sl->sl_item_size], &sl->sl_data[(idx+1) * sl->sl_item_size], (sl->sl_usage_count-idx) * sl->sl_item_size); } -xtPublic void xt_sl_remove_from_front(struct XTThread *self __attribute__((unused)), XTSortedListPtr sl, size_t items) +xtPublic void xt_sl_remove_from_front(struct XTThread *XT_UNUSED(self), XTSortedListPtr sl, size_t items) { if (sl->sl_usage_count <= items) xt_sl_set_size(sl, 0); diff --git a/storage/pbxt/src/streaming_xt.cc b/storage/pbxt/src/streaming_xt.cc deleted file mode 100755 index 710b3256d60..00000000000 --- a/storage/pbxt/src/streaming_xt.cc +++ /dev/null @@ -1,624 +0,0 @@ -/* Copyright (c) 2005 PrimeBase Technologies GmbH - * - * PrimeBase XT - * 
- * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA - * - * 2006-06-07 Paul McCullagh - * - * H&G2JCtL - * - * This file contains PBXT streaming interface. - */ - -#include "xt_config.h" - -#ifdef XT_STREAMING -#include "ha_pbxt.h" - -#include "thread_xt.h" -#include "strutil_xt.h" -#include "table_xt.h" -#include "myxt_xt.h" -#include "xaction_xt.h" -#include "database_xt.h" -#include "streaming_xt.h" - -extern PBMSEngineRec pbxt_engine; - -static PBMS_API pbxt_streaming; - -/* ---------------------------------------------------------------------- - * INIT & EXIT - */ - -xtPublic xtBool xt_init_streaming(void) -{ - XTThreadPtr self = NULL; - int err; - PBMSResultRec result; - - if ((err = pbxt_streaming.registerEngine(&pbxt_engine, &result))) { - xt_logf(XT_CONTEXT, XT_LOG_ERROR, "%s\n", result.mr_message); - return FAILED; - } - return OK; -} - -xtPublic void xt_exit_streaming(void) -{ - pbxt_streaming.deregisterEngine(&pbxt_engine); -} - -/* ---------------------------------------------------------------------- - * UTILITY FUNCTIONS - */ - -static void str_result_to_exception(XTExceptionPtr e, int r, PBMSResultPtr result) -{ - char *str, *end_str; - - e->e_xt_err = r; - e->e_sys_err = result->mr_code; - xt_strcpy(XT_ERR_MSG_SIZE, e->e_err_msg, result->mr_message); - - e->e_source_line = 0; - str = 
result->mr_stack; - if ((end_str = strchr(str, '('))) { - xt_strcpy_term(XT_MAX_FUNC_NAME_SIZE, e->e_func_name, str, '('); - str = end_str+1; - if ((end_str = strchr(str, ':'))) { - xt_strcpy_term(XT_SOURCE_FILE_NAME_SIZE, e->e_source_file, str, ':'); - str = end_str+1; - if ((end_str = strchr(str, ')'))) { - char number[40]; - - xt_strcpy_term(40, number, str, ')'); - e->e_source_line = atol(number); - str = end_str+1; - if (*str == '\n') - str++; - } - } - } - - if (e->e_source_line == 0) { - *e->e_func_name = 0; - *e->e_source_file = 0; - xt_strcpy(XT_ERR_MSG_SIZE, e->e_catch_trace, result->mr_stack); - } - else - xt_strcpy(XT_ERR_MSG_SIZE, e->e_catch_trace, str); -} - -static void str_exception_to_result(XTExceptionPtr e, PBMSResultPtr result) -{ - int len; - - if (e->e_sys_err) - result->mr_code = e->e_sys_err; - else - result->mr_code = e->e_xt_err; - xt_strcpy(MS_RESULT_MESSAGE_SIZE, result->mr_message, e->e_err_msg); - xt_strcpy(MS_RESULT_STACK_SIZE, result->mr_stack, e->e_func_name); - xt_strcat(MS_RESULT_STACK_SIZE, result->mr_stack, "("); - xt_strcat(MS_RESULT_STACK_SIZE, result->mr_stack, e->e_source_file); - xt_strcat(MS_RESULT_STACK_SIZE, result->mr_stack, ":"); - xt_strcati(MS_RESULT_STACK_SIZE, result->mr_stack, (int) e->e_source_line); - xt_strcat(MS_RESULT_STACK_SIZE, result->mr_stack, ")"); - len = strlen(result->mr_stack); - if (strncmp(result->mr_stack, e->e_catch_trace, len) == 0) - xt_strcat(MS_RESULT_STACK_SIZE, result->mr_stack, e->e_catch_trace + len); - else { - xt_strcat(MS_RESULT_STACK_SIZE, result->mr_stack, "\n"); - xt_strcat(MS_RESULT_STACK_SIZE, result->mr_stack, e->e_catch_trace); - } -} - -static XTIndexPtr str_find_index(XTTableHPtr tab, u_int *col_list, u_int col_cnt) -{ - u_int i, j; - XTIndexPtr *ind; /* MySQL/PBXT key description */ - - ind = tab->tab_dic.dic_keys; - for (i=0; i<tab->tab_dic.dic_key_count; i++) { - if ((*ind)->mi_seg_count == col_cnt) { - for (j=0; j<(*ind)->mi_seg_count; j++) { - if 
((*ind)->mi_seg[j].col_idx != col_list[j]) - goto loop; - } - return *ind; - } - - loop: - ind++; - } - return NULL; -} - -static XTThreadPtr str_set_current_thread(THD *thd, PBMSResultPtr result) -{ - XTThreadPtr self; - XTExceptionRec e; - - if (!(self = xt_ha_set_current_thread(thd, &e))) { - str_exception_to_result(&e, result); - return NULL; - } - return self; -} - -/* ---------------------------------------------------------------------- - * BLOB STREAMING INTERFACE - */ - -static void pbxt_close_conn(void *thread) -{ - xt_ha_close_connection((THD *) thread); -} - -static int pbxt_open_table(void *thread, const char *table_url, void **open_table, PBMSResultPtr result) -{ - THD *thd = (THD *) thread; - XTThreadPtr self; - XTTableHPtr tab = NULL; - XTOpenTablePtr ot = NULL; - int err = MS_OK; - - if (!(self = str_set_current_thread(thd, result))) - return MS_ERR_ENGINE; - - try_(a) { - xt_ha_open_database_of_table(self, (XTPathStrPtr) table_url); - if (!(tab = xt_use_table(self, (XTPathStrPtr) table_url, FALSE, TRUE, NULL))) { - err = MS_ERR_UNKNOWN_TABLE; - goto done; - } - if (!(ot = xt_open_table(tab))) - throw_(); - ot->ot_thread = self; - done:; - } - catch_(a) { - str_exception_to_result(&self->t_exception, result); - err = MS_ERR_ENGINE; - } - cont_(a); - if (tab) - xt_heap_release(self, tab); - *open_table = ot; - return err; -} - -static void pbxt_close_table(void *thread, void *open_table_ptr) -{ - THD *thd = (THD *) thread; - volatile XTThreadPtr self, new_self = NULL; - XTOpenTablePtr ot = (XTOpenTablePtr) open_table_ptr; - XTExceptionRec e; - - if (thd) { - if (!(self = xt_ha_set_current_thread(thd, &e))) { - xt_log_exception(NULL, &e, XT_LOG_DEFAULT); - return; - } - } - else if (!(self = xt_get_self())) { - if (!(new_self = xt_create_thread("TempForClose", FALSE, TRUE, &e))) { - xt_log_exception(NULL, &e, XT_LOG_DEFAULT); - return; - } - self = new_self; - } - - ot->ot_thread = self; - try_(a) { - xt_close_table(ot, TRUE, FALSE); - } - catch_(a) 
{ - xt_log_and_clear_exception(self); - } - cont_(a); - if (new_self) - xt_free_thread(self); -} - -static int pbxt_lock_table(void *thread, int *xact, void *open_table, int lock_type, PBMSResultPtr result) -{ - THD *thd = (THD *) thread; - XTThreadPtr self; - XTOpenTablePtr ot = (XTOpenTablePtr) open_table; - int err = MS_OK; - - if (!(self = str_set_current_thread(thd, result))) - return MS_ERR_ENGINE; - - if (lock_type != MS_LOCK_NONE) { - try_(a) { - xt_ha_open_database_of_table(self, ot->ot_table->tab_name); - ot->ot_thread = self; - } - catch_(a) { - str_exception_to_result(&self->t_exception, result); - err = MS_ERR_ENGINE; - } - cont_(a); - } - - if (!err && *xact == MS_XACT_BEGIN) { - if (self->st_xact_data) - *xact = MS_XACT_NONE; - else { - if (xt_xn_begin(self)) { - *xact = MS_XACT_COMMIT; - } - else { - str_exception_to_result(&self->t_exception, result); - err = MS_ERR_ENGINE; - } - } - } - - return err; -} - -static int pbxt_unlock_table(void *thread, int xact, void *open_table __attribute__((unused)), PBMSResultPtr result) -{ - THD *thd = (THD *) thread; - XTThreadPtr self = xt_ha_thd_to_self(thd); - int err = MS_OK; - - if (xact == MS_XACT_COMMIT) { - if (!xt_xn_commit(self)) { - str_exception_to_result(&self->t_exception, result); - err = MS_ERR_ENGINE; - } - } - else if (xact == MS_XACT_ROLLBACK) { - xt_xn_rollback(self); - } - - return err; -} - -static int pbxt_send_blob(void *thread, void *open_table, const char *blob_column, const char *blob_url_p, void *stream, PBMSResultPtr result) -{ - THD *thd = (THD *) thread; - XTThreadPtr self = xt_ha_thd_to_self(thd); - XTOpenTablePtr ot = (XTOpenTablePtr) open_table; - int err = MS_OK; - u_int blob_col_idx, col_idx; - char col_name[XT_IDENTIFIER_NAME_SIZE]; - XTStringBufferRec value; - u_int col_list[XT_MAX_COLS_PER_INDEX]; - u_int col_cnt; - char col_names[XT_ERR_MSG_SIZE - 200]; - XTIdxSearchKeyRec search_key; - XTIndexPtr ind; - char *blob_data; - size_t blob_len; - const char *blob_url = 
blob_url_p; - - memset(&value, 0, sizeof(value)); - - *col_names = 0; - - ot->ot_thread = self; - try_(a) { - if (ot->ot_row_wbuf_size < ot->ot_table->tab_dic.dic_mysql_buf_size) { - xt_realloc(self, (void **) &ot->ot_row_wbuffer, ot->ot_table->tab_dic.dic_mysql_buf_size); - ot->ot_row_wbuf_size = ot->ot_table->tab_dic.dic_mysql_buf_size; - } - - xt_strcpy_url(XT_IDENTIFIER_NAME_SIZE, col_name, blob_column); - if (!myxt_find_column(ot, &blob_col_idx, col_name)) - xt_throw_tabcolerr(XT_CONTEXT, XT_ERR_COLUMN_NOT_FOUND, ot->ot_table->tab_name, blob_column); - - /* Prepare a row for the condition: */ - const char *ptr; - - col_cnt = 0; - while (*blob_url) { - ptr = xt_strchr(blob_url, '='); - xt_strncpy_url(XT_IDENTIFIER_NAME_SIZE, col_name, blob_url, (size_t) (ptr - blob_url)); - if (!myxt_find_column(ot, &col_idx, col_name)) - xt_throw_tabcolerr(XT_CONTEXT, XT_ERR_COLUMN_NOT_FOUND, ot->ot_table->tab_name, col_name); - if (*col_names) - xt_strcat(sizeof(col_names), col_names, ", "); - xt_strcat(sizeof(col_names), col_names, col_name); - blob_url = ptr; - if (*blob_url == '=') - blob_url++; - ptr = xt_strchr(blob_url, '&'); - value.sb_len = 0; - xt_sb_concat_url_len(self, &value, blob_url, (size_t) (ptr - blob_url)); - blob_url = ptr; - if (*blob_url == '&') - blob_url++; - if (!myxt_set_column(ot, (char *) ot->ot_row_rbuffer, col_idx, value.sb_cstring, value.sb_len)) - xt_throw_tabcolerr(XT_CONTEXT, XT_ERR_CONVERSION, ot->ot_table->tab_name, col_name); - if (col_cnt < XT_MAX_COLS_PER_INDEX) { - col_list[col_cnt] = col_idx; - col_cnt++; - } - } - - /* Find a matching index: */ - if (!(ind = str_find_index(ot->ot_table, col_list, col_cnt))) - xt_throw_ixterr(XT_CONTEXT, XT_ERR_NO_MATCHING_INDEX, col_names); - - search_key.sk_key_value.sv_flags = 0; - search_key.sk_key_value.sv_rec_id = 0; - search_key.sk_key_value.sv_row_id = 0; - search_key.sk_key_value.sv_key = search_key.sk_key_buf; - search_key.sk_key_value.sv_length = myxt_create_key_from_row(ind, 
search_key.sk_key_buf, ot->ot_row_rbuffer, NULL); - search_key.sk_on_key = FALSE; - - if (!xt_idx_search(ot, ind, &search_key)) - xt_throw(self); - - if (!ot->ot_curr_rec_id) - xt_throw_taberr(XT_CONTEXT, XT_ERR_NO_ROWS, ot->ot_table->tab_name); - - while (ot->ot_curr_rec_id) { - if (!search_key.sk_on_key) - xt_throw_taberr(XT_CONTEXT, XT_ERR_NO_ROWS, ot->ot_table->tab_name); - - retry: - /* X TODO - Check if the write buffer is big enough here! */ - switch (xt_tab_read_record(ot, ot->ot_row_wbuffer)) { - case FALSE: - if (xt_idx_next(ot, ind, &search_key)) - break; - case XT_ERR: - xt_throw(self); - case XT_NEW: - if (xt_idx_match_search(ot, ind, &search_key, ot->ot_row_wbuffer, XT_S_MODE_MATCH)) - goto success; - if (!xt_idx_next(ot, ind, &search_key)) - xt_throw(self); - break; - case XT_RETRY: - goto retry; - default: - goto success; - } - } - - success: - myxt_get_column_data(ot, (char *) ot->ot_row_wbuffer, blob_col_idx, &blob_data, &blob_len); - - /* - * Write the content length, then write the HTTP - * header, and then the content. 
- */ - err = pbxt_streaming.setContentLength(stream, blob_len, result); - if (!err) - err = pbxt_streaming.writeHead(stream, result); - if (!err) - err = pbxt_streaming.writeStream(stream, (void *) blob_data, blob_len, result); - } - catch_(a) { - str_exception_to_result(&self->t_exception, result); - if (result->mr_code == XT_ERR_NO_ROWS) - err = MS_ERR_NOT_FOUND; - else - err = MS_ERR_ENGINE; - } - cont_(a); - xt_sb_set_size(NULL, &value, 0); - return err; -} - -int pbxt_lookup_ref(void *thread, void *open_table, unsigned short col_index, PBMSEngineRefPtr eng_ref, PBMSFieldRefPtr field_ref, PBMSResultPtr result) -{ - THD *thd = (THD *) thread; - XTThreadPtr self = xt_ha_thd_to_self(thd); - XTOpenTablePtr ot = (XTOpenTablePtr) open_table; - int err = MS_OK; - u_int i, len; - char *data; - XTIndexPtr ind = NULL; - - ot->ot_thread = self; - if (ot->ot_row_wbuf_size < ot->ot_table->tab_dic.dic_mysql_buf_size) { - xt_realloc(self, (void **) &ot->ot_row_wbuffer, ot->ot_table->tab_dic.dic_mysql_buf_size); - ot->ot_row_wbuf_size = ot->ot_table->tab_dic.dic_mysql_buf_size; - } - - ot->ot_curr_rec_id = (xtRecordID) XT_GET_DISK_8(eng_ref->er_data); - switch (xt_tab_dirty_read_record(ot, ot->ot_row_wbuffer)) { - case FALSE: - err = MS_ERR_ENGINE; - break; - default: - break; - } - - if (err) { - str_exception_to_result(&self->t_exception, result); - goto exit; - } - - myxt_get_column_name(ot, col_index, PBMS_FIELD_COL_SIZE, field_ref->fr_column); - - for (i=0; i<ot->ot_table->tab_dic.dic_key_count; i++) { - ind = ot->ot_table->tab_dic.dic_keys[i]; - if (ind->mi_flags & (HA_UNIQUE_CHECK | HA_NOSAME)) - break; - } - - if (ind) { - len = 0; - data = field_ref->fr_cond; - for (i=0; i<ind->mi_seg_count; i++) { - if (i > 0) { - xt_strcat(PBMS_FIELD_COND_SIZE, data, "&"); - len = strlen(data); - } - myxt_get_column_name(ot, ind->mi_seg[i].col_idx, PBMS_FIELD_COND_SIZE - len, data + len); - len = strlen(data); - xt_strcat(PBMS_FIELD_COND_SIZE, data, "="); - len = strlen(data); - 
myxt_get_column_as_string(ot, (char *) ot->ot_row_wbuffer, ind->mi_seg[i].col_idx, PBMS_FIELD_COND_SIZE - len, data + len); - len = strlen(data); - } - } - else - xt_strcpy(PBMS_FIELD_COND_SIZE, field_ref->fr_cond, "*no unique key*"); - - exit: - return err; -} - -PBMSEngineRec pbxt_engine = { - MS_ENGINE_VERSION, - 0, - FALSE, - "PBXT", - NULL, - pbxt_close_conn, - pbxt_open_table, - pbxt_close_table, - pbxt_lock_table, - pbxt_unlock_table, - pbxt_send_blob, - pbxt_lookup_ref -}; - -/* ---------------------------------------------------------------------- - * CALL IN FUNCTIONS - */ - -xtPublic void xt_pbms_close_all_tables(const char *table_url) -{ - pbxt_streaming.closeAllTables(table_url); -} - -xtPublic xtBool xt_pbms_close_connection(void *thd, XTExceptionPtr e) -{ - PBMSResultRec result; - int err; - - err = pbxt_streaming.closeConn(thd, &result); - if (err) { - str_result_to_exception(e, err, &result); - return FAILED; - } - return OK; -} - -xtPublic xtBool xt_pbms_open_table(void **open_table, char *table_path) -{ - PBMSResultRec result; - int err; - - err = pbxt_streaming.openTable(open_table, table_path, &result); - if (err) { - XTThreadPtr thread = xt_get_self(); - - str_result_to_exception(&thread->t_exception, err, &result); - return FAILED; - } - return OK; -} - -xtPublic void xt_pbms_close_table(void *open_table) -{ - PBMSResultRec result; - int err; - - err = pbxt_streaming.closeTable(open_table, &result); - if (err) { - XTThreadPtr thread = xt_get_self(); - - str_result_to_exception(&thread->t_exception, err, &result); - xt_log_exception(thread, &thread->t_exception, XT_LOG_DEFAULT); - } -} - -xtPublic xtBool xt_pbms_use_blob(void *open_table, char **ret_blob_url, char *blob_url, unsigned short col_index) -{ - PBMSResultRec result; - int err; - - err = pbxt_streaming.useBlob(open_table, ret_blob_url, blob_url, col_index, &result); - if (err) { - XTThreadPtr thread = xt_get_self(); - - str_result_to_exception(&thread->t_exception, err, &result); - 
return FAILED; - } - return OK; -} - -xtPublic xtBool xt_pbms_retain_blobs(void *open_table, PBMSEngineRefPtr eng_ref) -{ - PBMSResultRec result; - int err; - - err = pbxt_streaming.retainBlobs(open_table, eng_ref, &result); - if (err) { - XTThreadPtr thread = xt_get_self(); - - str_result_to_exception(&thread->t_exception, err, &result); - return FAILED; - } - return OK; -} - -xtPublic void xt_pbms_release_blob(void *open_table, char *blob_url, unsigned short col_index, PBMSEngineRefPtr eng_ref) -{ - PBMSResultRec result; - int err; - - err = pbxt_streaming.releaseBlob(open_table, blob_url, col_index, eng_ref, &result); - if (err) { - XTThreadPtr thread = xt_get_self(); - - str_result_to_exception(&thread->t_exception, err, &result); - xt_log_exception(thread, &thread->t_exception, XT_LOG_DEFAULT); - } -} - -xtPublic void xt_pbms_drop_table(const char *table_path) -{ - PBMSResultRec result; - int err; - - err = pbxt_streaming.dropTable(table_path, &result); - if (err) { - XTThreadPtr thread = xt_get_self(); - - str_result_to_exception(&thread->t_exception, err, &result); - xt_log_exception(thread, &thread->t_exception, XT_LOG_DEFAULT); - } -} - -xtPublic void xt_pbms_rename_table(const char *from_table, const char *to_table) -{ - PBMSResultRec result; - int err; - - err = pbxt_streaming.renameTable(from_table, to_table, &result); - if (err) { - XTThreadPtr thread = xt_get_self(); - - str_result_to_exception(&thread->t_exception, err, &result); - xt_log_exception(thread, &thread->t_exception, XT_LOG_DEFAULT); - } -} - -#endif // XT_STREAMING diff --git a/storage/pbxt/src/streaming_xt.h b/storage/pbxt/src/streaming_xt.h deleted file mode 100755 index 6fe36822383..00000000000 --- a/storage/pbxt/src/streaming_xt.h +++ /dev/null @@ -1,46 +0,0 @@ -/* Copyright (c) 2005 PrimeBase Technologies GmbH - * - * PrimeBase XT - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * 
the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA - * - * 2006-06-07 Paul McCullagh - * - * H&G2JCtL - * - * This file contains PBXT streaming interface. - */ - -#ifndef __streaming_xt_h__ -#define __streaming_xt_h__ - -#include "xt_defs.h" -#define PBMS_API pbms_api_PBXT -#include "pbms.h" - -xtBool xt_init_streaming(void); -void xt_exit_streaming(void); - -void xt_pbms_close_all_tables(const char *table_url); -xtBool xt_pbms_close_connection(void *thd, XTExceptionPtr e); -xtBool xt_pbms_open_table(void **open_table, char *table_path); -void xt_pbms_close_table(void *open_table); -xtBool xt_pbms_use_blob(void *open_table, char **ret_blob_url, char *blob_url, unsigned short col_index); -xtBool xt_pbms_retain_blobs(void *open_table, PBMSEngineRefPtr eng_ref); -void xt_pbms_release_blob(void *open_table, char *blob_url, unsigned short col_index, PBMSEngineRefPtr eng_ref); -void xt_pbms_drop_table(const char *table_path); -void xt_pbms_rename_table(const char *from_table, const char *to_table); - -#endif diff --git a/storage/pbxt/src/strutil_xt.cc b/storage/pbxt/src/strutil_xt.cc index 60e45c455d1..dee92c67df4 100644 --- a/storage/pbxt/src/strutil_xt.cc +++ b/storage/pbxt/src/strutil_xt.cc @@ -365,9 +365,10 @@ xtPublic void xt_int8_to_byte_size(xtInt8 value, char *string) sprintf(string, "%s %s (%"PRId64" bytes)", val_str, unit, value); } +/* Version number must also be set in configure.in! 
*/ xtPublic c_char *xt_get_version(void) { - return "1.0.08 RC"; + return "1.0.08d RC"; } /* Copy and URL decode! */ diff --git a/storage/pbxt/src/systab_xt.cc b/storage/pbxt/src/systab_xt.cc index 73ecc7a07cb..19d6c1d0ccb 100644 --- a/storage/pbxt/src/systab_xt.cc +++ b/storage/pbxt/src/systab_xt.cc @@ -217,7 +217,7 @@ bool XTLocationTable::seqScanNext(char *buf, bool *eof) uint32 len; Field *curr_field; byte *save; - MY_BITMAP *save_write_set; + MX_BITMAP *save_write_set; last_access = CS_GET_DISK_4(blob->rb_last_access_4); last_ref = CS_GET_DISK_4(blob->rb_last_ref_4); @@ -336,7 +336,6 @@ bool XTLocationTable::seqScanNext(char *buf, bool *eof) table->write_set = save_write_set; return true; #endif - return false; } void XTLocationTable::loadRow(char *buf, xtWord4 row_id) @@ -345,7 +344,7 @@ void XTLocationTable::loadRow(char *buf, xtWord4 row_id) Field *curr_field; XTTablePathPtr tp_ptr; byte *save; - MY_BITMAP *save_write_set; + MX_BITMAP *save_write_set; /* ASSERT_COLUMN_MARKED_FOR_WRITE is failing when * I use store()!?? 
@@ -386,7 +385,7 @@ void XTLocationTable::loadRow(char *buf, xtWord4 row_id) table->write_set = save_write_set; } -xtWord4 XTLocationTable::seqScanPos(xtWord1 *buf __attribute__((unused))) +xtWord4 XTLocationTable::seqScanPos(xtWord1 *XT_UNUSED(buf)) { return lt_index-1; } @@ -451,7 +450,7 @@ bool XTStatisticsTable::seqScanNext(char *buf, bool *eof) void XTStatisticsTable::loadRow(char *buf, xtWord4 rec_id) { TABLE *table = ost_my_table; - MY_BITMAP *save_write_set; + MX_BITMAP *save_write_set; Field *curr_field; byte *save; const char *stat_name; @@ -503,7 +502,7 @@ void XTStatisticsTable::loadRow(char *buf, xtWord4 rec_id) table->write_set = save_write_set; } -xtWord4 XTStatisticsTable::seqScanPos(xtWord1 *buf __attribute__((unused))) +xtWord4 XTStatisticsTable::seqScanPos(xtWord1 *XT_UNUSED(buf)) { return tt_index-1; } @@ -531,14 +530,14 @@ void st_path_to_table_name(size_t size, char *buffer, const char *path) *str = '.'; } -void XTSystemTableShare::startUp(XTThreadPtr self __attribute__((unused))) +void XTSystemTableShare::startUp(XTThreadPtr XT_UNUSED(self)) { thr_lock_init(&sys_location_lock); thr_lock_init(&sys_statistics_lock); sys_lock_inited = TRUE; } -void XTSystemTableShare::shutDown(XTThreadPtr self __attribute__((unused))) +void XTSystemTableShare::shutDown(XTThreadPtr XT_UNUSED(self)) { if (sys_lock_inited) { thr_lock_delete(&sys_location_lock); @@ -588,7 +587,7 @@ bool XTSystemTableShare::doesSystemTableExist() return false; } -void XTSystemTableShare::createSystemTables(XTThreadPtr self __attribute__((unused)), XTDatabaseHPtr db __attribute__((unused))) +void XTSystemTableShare::createSystemTables(XTThreadPtr XT_UNUSED(self), XTDatabaseHPtr XT_UNUSED(db)) { int i = 0; diff --git a/storage/pbxt/src/systab_xt.h b/storage/pbxt/src/systab_xt.h index e64bcc816f4..408a8749dd0 100644 --- a/storage/pbxt/src/systab_xt.h +++ b/storage/pbxt/src/systab_xt.h @@ -85,15 +85,15 @@ public: virtual bool use() { return true; } virtual bool unuse() { return true; } 
virtual bool seqScanInit() { return true; } - virtual bool seqScanNext(char *buf __attribute__((unused)), bool *eof) { + virtual bool seqScanNext(char *XT_UNUSED(buf), bool *eof) { *eof = true; return false; } virtual int getRefLen() { return 4; } - virtual xtWord4 seqScanPos(xtWord1 *buf __attribute__((unused))) { + virtual xtWord4 seqScanPos(xtWord1 *XT_UNUSED(buf)) { return 0; } - virtual bool seqScanRead(xtWord4 rec_id __attribute__((unused)), char *buf __attribute__((unused))) { + virtual bool seqScanRead(xtWord4 XT_UNUSED(rec_id), char *XT_UNUSED(buf)) { return true; } diff --git a/storage/pbxt/src/tabcache_xt.cc b/storage/pbxt/src/tabcache_xt.cc index b9f9ccd37e1..7e8cc9e1f6e 100644 --- a/storage/pbxt/src/tabcache_xt.cc +++ b/storage/pbxt/src/tabcache_xt.cc @@ -26,6 +26,10 @@ #include "xt_config.h" +#ifdef DRIZZLED +#include <bitset> +#endif + #include <signal.h> #include "pthread_xt.h" @@ -63,7 +67,7 @@ xtPublic void xt_tc_init(XTThreadPtr self, size_t cache_size) for (u_int i=0; i<XT_TC_SEGMENT_COUNT; i++) { xt_tab_cache.tcm_segment[i].tcs_cache_in_use = 0; xt_tab_cache.tcm_segment[i].tcs_hash_table = (XTTabCachePagePtr *) xt_calloc(self, xt_tab_cache.tcm_hash_size * sizeof(XTTabCachePagePtr)); - xt_rwmutex_init_with_autoname(self, &xt_tab_cache.tcm_segment[i].tcs_lock); + TAB_CAC_INIT_LOCK(self, &xt_tab_cache.tcm_segment[i].tcs_lock); } xt_init_mutex_with_autoname(self, &xt_tab_cache.tcm_lock); @@ -97,7 +101,7 @@ xtPublic void xt_tc_exit(XTThreadPtr self) xt_free(self, xt_tab_cache.tcm_segment[i].tcs_hash_table); xt_tab_cache.tcm_segment[i].tcs_hash_table = NULL; - xt_rwmutex_free(self, &xt_tab_cache.tcm_segment[i].tcs_lock); + TAB_CAC_FREE_LOCK(self, &xt_tab_cache.tcm_segment[i].tcs_lock); } } @@ -213,7 +217,7 @@ xtBool XTTabCache::xt_tc_write(XT_ROW_REC_FILE_PTR file, xtRefID ref_id, size_t page->tcp_dirty = TRUE; ASSERT_NS(page->tcp_db_id == tci_table->tab_db->db_id && page->tcp_tab_id == tci_table->tab_id); *op_seq = 
tci_table->tab_seq.ts_set_op_seq(page); - xt_rwmutex_unlock(&seg->tcs_lock, thread->t_id); + TAB_CAC_UNLOCK(&seg->tcs_lock, thread->t_id); return OK; } @@ -269,21 +273,36 @@ xtBool XTTabCache::xt_tc_write_cond(XTThreadPtr self, XT_ROW_REC_FILE_PTR file, page->tcp_dirty = TRUE; ASSERT(page->tcp_db_id == tci_table->tab_db->db_id && page->tcp_tab_id == tci_table->tab_id); *op_seq = tci_table->tab_seq.ts_set_op_seq(page); - xt_rwmutex_unlock(&seg->tcs_lock, self->t_id); + TAB_CAC_UNLOCK(&seg->tcs_lock, self->t_id); return TRUE; no_change: - xt_rwmutex_unlock(&seg->tcs_lock, self->t_id); + TAB_CAC_UNLOCK(&seg->tcs_lock, self->t_id); return FALSE; } xtBool XTTabCache::xt_tc_read(XT_ROW_REC_FILE_PTR file, xtRefID ref_id, size_t size, xtWord1 *data, XTThreadPtr thread) { +#ifdef XT_USE_ROW_REC_MMAP_FILES return tc_read_direct(file, ref_id, size, data, thread); +#else + size_t offset; + XTTabCachePagePtr page; + XTTabCacheSegPtr seg; + + if (!tc_fetch(file, ref_id, &seg, &page, &offset, TRUE, thread)) + return FAILED; + /* A read must be completely on a page: */ + ASSERT_NS(offset + size <= tci_page_size); + memcpy(data, page->tcp_data + offset, size); + TAB_CAC_UNLOCK(&seg->tcs_lock, thread->t_id); + return OK; +#endif } xtBool XTTabCache::xt_tc_read_4(XT_ROW_REC_FILE_PTR file, xtRefID ref_id, xtWord4 *value, XTThreadPtr thread) { +#ifdef XT_USE_ROW_REC_MMAP_FILES register u_int page_idx; register XTTabCachePagePtr page; register XTTabCacheSegPtr seg; @@ -300,7 +319,7 @@ xtBool XTTabCache::xt_tc_read_4(XT_ROW_REC_FILE_PTR file, xtRefID ref_id, xtWord seg = &dcg->tcm_segment[hash_idx & XT_TC_SEGMENT_MASK]; hash_idx = (hash_idx >> XT_TC_SEGMENT_SHIFTS) % dcg->tcm_hash_size; - xt_rwmutex_slock(&seg->tcs_lock, thread->t_id); + TAB_CAC_READ_LOCK(&seg->tcs_lock, thread->t_id); page = seg->tcs_hash_table[hash_idx]; while (page) { if (page->tcp_page_idx == page_idx && page->tcp_file_id == file->fr_id) { @@ -311,53 +330,60 @@ xtBool XTTabCache::xt_tc_read_4(XT_ROW_REC_FILE_PTR 
file, xtRefID ref_id, xtWord ASSERT_NS(offset + 4 <= this->tci_page_size); buffer = page->tcp_data + offset; *value = XT_GET_DISK_4(buffer); - xt_rwmutex_unlock(&seg->tcs_lock, thread->t_id); + TAB_CAC_UNLOCK(&seg->tcs_lock, thread->t_id); return OK; } page = page->tcp_next; } - xt_rwmutex_unlock(&seg->tcs_lock, thread->t_id); + TAB_CAC_UNLOCK(&seg->tcs_lock, thread->t_id); -#ifdef XT_USE_ROW_REC_MMAP_FILES return xt_pread_fmap_4(file, address, value, &thread->st_statistics.st_rec, thread); #else - xtWord1 data[4]; + size_t offset; + XTTabCachePagePtr page; + XTTabCacheSegPtr seg; + xtWord1 *data; - if (!XT_PREAD_RR_FILE(file, address, 4, 4, data, NULL, &thread->st_statistics.st_rec, thread)) + if (!tc_fetch(file, ref_id, &seg, &page, &offset, TRUE, thread)) return FAILED; + /* A read must be completely on a page: */ + ASSERT_NS(offset + 4 <= tci_page_size); + data = page->tcp_data + offset; *value = XT_GET_DISK_4(data); + TAB_CAC_UNLOCK(&seg->tcs_lock, thread->t_id); return OK; #endif } -xtBool XTTabCache::xt_tc_get_page(XT_ROW_REC_FILE_PTR file, xtRefID ref_id, XTTabCachePagePtr *ret_page, size_t *offset, XTThreadPtr thread) +xtBool XTTabCache::xt_tc_get_page(XT_ROW_REC_FILE_PTR file, xtRefID ref_id, xtBool load, XTTabCachePagePtr *ret_page, size_t *offset, XTThreadPtr thread) { XTTabCachePagePtr page; XTTabCacheSegPtr seg; -#ifdef XT_SEQ_SCAN_FROM_MEMORY - if (!tc_fetch_direct(file, ref_id, &seg, &page, offset, thread)) - return FAILED; - if (!seg) { - *ret_page = NULL; - return OK; + if (load) { + if (!tc_fetch(file, ref_id, &seg, &page, offset, TRUE, thread)) + return FAILED; + } + else { + if (!tc_fetch_direct(file, ref_id, &seg, &page, offset, thread)) + return FAILED; + if (!seg) { + *ret_page = NULL; + return OK; + } } -#else - if (!tc_fetch(file, ref_id, &seg, &page, offset, TRUE, thread)) - return FAILED; -#endif page->tcp_lock_count++; - xt_rwmutex_unlock(&seg->tcs_lock, thread->t_id); + TAB_CAC_UNLOCK(&seg->tcs_lock, thread->t_id); *ret_page = page; 
return OK; } -void XTTabCache::xt_tc_release_page(XT_ROW_REC_FILE_PTR file __attribute__((unused)), XTTabCachePagePtr page, XTThreadPtr thread) +void XTTabCache::xt_tc_release_page(XT_ROW_REC_FILE_PTR XT_UNUSED(file), XTTabCachePagePtr page, XTThreadPtr thread) { XTTabCacheSegPtr seg; seg = &xt_tab_cache.tcm_segment[page->tcp_seg]; - xt_rwmutex_xlock(&seg->tcs_lock, thread->t_id); + TAB_CAC_WRITE_LOCK(&seg->tcs_lock, thread->t_id); #ifdef DEBUG XTTabCachePagePtr lpage, ppage; @@ -379,7 +405,7 @@ void XTTabCache::xt_tc_release_page(XT_ROW_REC_FILE_PTR file __attribute__((unus if (page->tcp_lock_count > 0) page->tcp_lock_count--; - xt_rwmutex_unlock(&seg->tcs_lock, thread->t_id); + TAB_CAC_UNLOCK(&seg->tcs_lock, thread->t_id); } xtBool XTTabCache::xt_tc_read_page(XT_ROW_REC_FILE_PTR file, xtRefID ref_id, xtWord1 *data, XTThreadPtr thread) @@ -412,7 +438,7 @@ xtBool XTTabCache::tc_read_direct(XT_ROW_REC_FILE_PTR file, xtRefID ref_id, size seg = &dcg->tcm_segment[hash_idx & XT_TC_SEGMENT_MASK]; hash_idx = (hash_idx >> XT_TC_SEGMENT_SHIFTS) % dcg->tcm_hash_size; - xt_rwmutex_slock(&seg->tcs_lock, thread->t_id); + TAB_CAC_READ_LOCK(&seg->tcs_lock, thread->t_id); page = seg->tcs_hash_table[hash_idx]; while (page) { if (page->tcp_page_idx == page_idx && page->tcp_file_id == file->fr_id) { @@ -421,12 +447,12 @@ xtBool XTTabCache::tc_read_direct(XT_ROW_REC_FILE_PTR file, xtRefID ref_id, size offset = (ref_id % this->tci_rows_per_page) * this->tci_rec_size; ASSERT_NS(offset + size <= this->tci_page_size); memcpy(data, page->tcp_data + offset, size); - xt_rwmutex_unlock(&seg->tcs_lock, thread->t_id); + TAB_CAC_UNLOCK(&seg->tcs_lock, thread->t_id); return OK; } page = page->tcp_next; } - xt_rwmutex_unlock(&seg->tcs_lock, thread->t_id); + TAB_CAC_UNLOCK(&seg->tcs_lock, thread->t_id); if (!XT_PREAD_RR_FILE(file, address, size, 0, data, &red_size, &thread->st_statistics.st_rec, thread)) return FAILED; memset(data + red_size, 0, size - red_size); @@ -450,7 +476,7 @@ xtBool 
XTTabCache::tc_fetch_direct(XT_ROW_REC_FILE_PTR file, xtRefID ref_id, XTT seg = &dcg->tcm_segment[hash_idx & XT_TC_SEGMENT_MASK]; hash_idx = (hash_idx >> XT_TC_SEGMENT_SHIFTS) % dcg->tcm_hash_size; - xt_rwmutex_xlock(&seg->tcs_lock, thread->t_id); + TAB_CAC_WRITE_LOCK(&seg->tcs_lock, thread->t_id); page = seg->tcs_hash_table[hash_idx]; while (page) { if (page->tcp_page_idx == page_idx && page->tcp_file_id == file->fr_id) { @@ -460,7 +486,7 @@ xtBool XTTabCache::tc_fetch_direct(XT_ROW_REC_FILE_PTR file, xtRefID ref_id, XTT } page = page->tcp_next; } - xt_rwmutex_unlock(&seg->tcs_lock, thread->t_id); + TAB_CAC_UNLOCK(&seg->tcs_lock, thread->t_id); *ret_seg = NULL; *ret_page = NULL; return OK; @@ -492,7 +518,7 @@ xtBool XTTabCache::tc_fetch(XT_ROW_REC_FILE_PTR file, xtRefID ref_id, XTTabCache seg = &dcg->tcm_segment[hash_idx & XT_TC_SEGMENT_MASK]; hash_idx = (hash_idx >> XT_TC_SEGMENT_SHIFTS) % dcg->tcm_hash_size; - xt_rwmutex_slock(&seg->tcs_lock, thread->t_id); + TAB_CAC_READ_LOCK(&seg->tcs_lock, thread->t_id); page = seg->tcs_hash_table[hash_idx]; while (page) { if (page->tcp_page_idx == page_idx && page->tcp_file_id == file->fr_id) { @@ -528,7 +554,7 @@ xtBool XTTabCache::tc_fetch(XT_ROW_REC_FILE_PTR file, xtRefID ref_id, XTTabCache } page = page->tcp_next; } - xt_rwmutex_unlock(&seg->tcs_lock, thread->t_id); + TAB_CAC_UNLOCK(&seg->tcs_lock, thread->t_id); /* Page not found, allocate a new page: */ size_t page_size = offsetof(XTTabCachePageRec, tcp_data) + this->tci_page_size; @@ -674,7 +700,7 @@ xtBool XTTabCache::tc_fetch(XT_ROW_REC_FILE_PTR file, xtRefID ref_id, XTTabCache #endif /* Add the page to the cache! 
*/ - xt_rwmutex_xlock(&seg->tcs_lock, thread->t_id); + TAB_CAC_WRITE_LOCK(&seg->tcs_lock, thread->t_id); page = seg->tcs_hash_table[hash_idx]; while (page) { if (page->tcp_page_idx == page_idx && page->tcp_file_id == file->fr_id) { @@ -898,11 +924,11 @@ static size_t tabc_free_page(XTThreadPtr self, TCResourcePtr tc) } seg = &dcg->tcm_segment[page->tcp_seg]; - xt_rwmutex_xlock(&seg->tcs_lock, self->t_id); + TAB_CAC_WRITE_LOCK(&seg->tcs_lock, self->t_id); if (page->tcp_dirty) { if (!was_dirty) { - xt_rwmutex_unlock(&seg->tcs_lock, self->t_id); + TAB_CAC_UNLOCK(&seg->tcs_lock, self->t_id); goto retry_2; } @@ -923,7 +949,7 @@ static size_t tabc_free_page(XTThreadPtr self, TCResourcePtr tc) XTDatabaseHPtr db = tab->tab_db; rewait: - xt_rwmutex_unlock(&seg->tcs_lock, self->t_id); + TAB_CAC_UNLOCK(&seg->tcs_lock, self->t_id); /* Flush the log, in case this is holding up the * writer! @@ -963,7 +989,7 @@ static size_t tabc_free_page(XTThreadPtr self, TCResourcePtr tc) db->db_wr_freeer_waiting = FALSE; freer_(); // xt_unlock_mutex(&db->db_wr_lock) - xt_rwmutex_xlock(&seg->tcs_lock, self->t_id); + TAB_CAC_WRITE_LOCK(&seg->tcs_lock, self->t_id); if (XTTableSeq::xt_op_is_before(tab->tab_head_op_seq, page->tcp_op_seq)) goto rewait; } @@ -988,11 +1014,11 @@ static size_t tabc_free_page(XTThreadPtr self, TCResourcePtr tc) */ if ((page = page->tcp_mr_used)) { page_cnt++; - xt_rwmutex_unlock(&seg->tcs_lock, self->t_id); + TAB_CAC_UNLOCK(&seg->tcs_lock, self->t_id); goto retry_2; } } - xt_rwmutex_unlock(&seg->tcs_lock, self->t_id); + TAB_CAC_UNLOCK(&seg->tcs_lock, self->t_id); dcg->tcm_free_try_count++; /* Starting to spin, free the threads: */ @@ -1047,7 +1073,7 @@ static size_t tabc_free_page(XTThreadPtr self, TCResourcePtr tc) seg->tcs_cache_in_use -= freed_space; xt_free_ns(page); - xt_rwmutex_unlock(&seg->tcs_lock, self->t_id); + TAB_CAC_UNLOCK(&seg->tcs_lock, self->t_id); self->st_statistics.st_rec_cache_frees++; dcg->tcm_free_try_count = 0; return freed_space; @@ -1152,11 
+1178,14 @@ static void *tabc_fr_run_thread(XTThreadPtr self) } } + /* + * {MYSQL-THREAD-KILL} myxt_destroy_thread(mysql_thread, TRUE); + */ return NULL; } -static void tabc_fr_free_thread(XTThreadPtr self, void *data __attribute__((unused))) +static void tabc_fr_free_thread(XTThreadPtr self, void *XT_UNUSED(data)) { if (xt_tab_cache.tcm_freeer_thread) { xt_lock_mutex(self, &xt_tab_cache.tcm_freeer_lock); @@ -1238,7 +1267,7 @@ xtPublic void xt_load_pages(XTThreadPtr self, XTOpenTablePtr ot) while (rec_id<tab->tab_row_eof_id) { if (!tab->tab_rows.tc_fetch(ot->ot_row_file, rec_id, &seg, &page, &poffset, TRUE, self)) xt_throw(self); - xt_rwmutex_unlock(&seg->tcs_lock, self->t_id); + TAB_CAC_UNLOCK(&seg->tcs_lock, self->t_id); rec_id += tab->tab_rows.tci_rows_per_page; } @@ -1246,7 +1275,7 @@ xtPublic void xt_load_pages(XTThreadPtr self, XTOpenTablePtr ot) while (rec_id<tab->tab_rec_eof_id) { if (!tab->tab_recs.tc_fetch(ot->ot_rec_file, rec_id, &seg, &page, &poffset, TRUE, self)) xt_throw(self); - xt_rwmutex_unlock(&seg->tcs_lock, self->t_id); + TAB_CAC_UNLOCK(&seg->tcs_lock, self->t_id); rec_id += tab->tab_recs.tci_rows_per_page; } } diff --git a/storage/pbxt/src/tabcache_xt.h b/storage/pbxt/src/tabcache_xt.h index 694244835b4..46973037b49 100644 --- a/storage/pbxt/src/tabcache_xt.h +++ b/storage/pbxt/src/tabcache_xt.h @@ -125,11 +125,11 @@ typedef struct XTTableSeq { xt_init_mutex_with_autoname(self, &ts_ns_lock); } - void xt_op_seq_set(XTThreadPtr self __attribute__((unused)), xtOpSeqNo n) { + void xt_op_seq_set(XTThreadPtr XT_UNUSED(self), xtOpSeqNo n) { ts_next_seq = n; } - void xt_op_seq_exit(XTThreadPtr self __attribute__((unused))) { + void xt_op_seq_exit(XTThreadPtr XT_UNUSED(self)) { xt_free_mutex(&ts_ns_lock); } @@ -150,12 +150,50 @@ typedef struct XTTableSeq { #endif } XTTableSeqRec, *XTTableSeqPtr; +#ifdef XT_NO_ATOMICS +#define TAB_CAC_USE_PTHREAD_RW +#else +//#define TAB_CAC_USE_RWMUTEX +//#define TAB_CAC_USE_PTHREAD_RW +//#define IDX_USE_SPINXSLOCK 
+#define TAB_CAC_USE_XSMUTEX +#endif + +#ifdef TAB_CAC_USE_XSMUTEX +#define TAB_CAC_LOCK_TYPE XTXSMutexRec +#define TAB_CAC_INIT_LOCK(s, i) xt_xsmutex_init_with_autoname(s, i) +#define TAB_CAC_FREE_LOCK(s, i) xt_xsmutex_free(s, i) +#define TAB_CAC_READ_LOCK(i, o) xt_xsmutex_slock(i, o) +#define TAB_CAC_WRITE_LOCK(i, o) xt_xsmutex_xlock(i, o) +#define TAB_CAC_UNLOCK(i, o) xt_xsmutex_unlock(i, o) +#elif defined(TAB_CAC_USE_PTHREAD_RW) +#define TAB_CAC_LOCK_TYPE xt_rwlock_type +#define TAB_CAC_INIT_LOCK(s, i) xt_init_rwlock(s, i) +#define TAB_CAC_FREE_LOCK(s, i) xt_free_rwlock(i) +#define TAB_CAC_READ_LOCK(i, o) xt_slock_rwlock_ns(i) +#define TAB_CAC_WRITE_LOCK(i, o) xt_xlock_rwlock_ns(i) +#define TAB_CAC_UNLOCK(i, o) xt_unlock_rwlock_ns(i) +#elif defined(TAB_CAC_USE_RWMUTEX) +#define TAB_CAC_LOCK_TYPE XTRWMutexRec +#define TAB_CAC_INIT_LOCK(s, i) xt_rwmutex_init_with_autoname(s, i) +#define TAB_CAC_FREE_LOCK(s, i) xt_rwmutex_free(s, i) +#define TAB_CAC_READ_LOCK(i, o) xt_rwmutex_slock(i, o) +#define TAB_CAC_WRITE_LOCK(i, o) xt_rwmutex_xlock(i, o) +#define TAB_CAC_UNLOCK(i, o) xt_rwmutex_unlock(i, o) +#elif defined(TAB_CAC_USE_SPINXSLOCK) +#define TAB_CAC_LOCK_TYPE XTSpinXSLockRec +#define TAB_CAC_INIT_LOCK(s, i) xt_spinxslock_init_with_autoname(s, i) +#define TAB_CAC_FREE_LOCK(s, i) xt_spinxslock_free(s, i) +#define TAB_CAC_READ_LOCK(i, o) xt_spinxslock_slock(i, o) +#define TAB_CAC_WRITE_LOCK(i, o) xt_spinxslock_xlock(i, o) +#define TAB_CAC_UNLOCK(i, o) xt_spinxslock_unlock(i, o) +#endif + /* A disk cache segment. The cache is divided into a number of segments * to improve concurrency. */ typedef struct XTTabCacheSeg { - XTRWMutexRec tcs_lock; /* The cache segment read/write lock. */ - //xt_cond_type tcs_cond; + TAB_CAC_LOCK_TYPE tcs_lock; /* The cache segment read/write lock. 
*/ XTTabCachePagePtr *tcs_hash_table; size_t tcs_cache_in_use; } XTTabCacheSegRec, *XTTabCacheSegPtr; @@ -220,7 +258,7 @@ public: xtBool xt_tc_read(XT_ROW_REC_FILE_PTR file, xtRefID ref_id, size_t size, xtWord1 *data, XTThreadPtr thread); xtBool xt_tc_read_4(XT_ROW_REC_FILE_PTR file, xtRefID ref_id, xtWord4 *data, XTThreadPtr thread); xtBool xt_tc_read_page(XT_ROW_REC_FILE_PTR file, xtRefID ref_id, xtWord1 *data, XTThreadPtr thread); - xtBool xt_tc_get_page(XT_ROW_REC_FILE_PTR file, xtRefID ref_id, XTTabCachePagePtr *page, size_t *offset, XTThreadPtr thread); + xtBool xt_tc_get_page(XT_ROW_REC_FILE_PTR file, xtRefID ref_id, xtBool load, XTTabCachePagePtr *page, size_t *offset, XTThreadPtr thread); void xt_tc_release_page(XT_ROW_REC_FILE_PTR file, XTTabCachePagePtr page, XTThreadPtr thread); xtBool tc_fetch(XT_ROW_REC_FILE_PTR file, xtRefID ref_id, XTTabCacheSegPtr *ret_seg, XTTabCachePagePtr *ret_page, size_t *offset, xtBool read, XTThreadPtr thread); diff --git a/storage/pbxt/src/table_xt.cc b/storage/pbxt/src/table_xt.cc index fc9ae6156cb..2671f02c057 100644 --- a/storage/pbxt/src/table_xt.cc +++ b/storage/pbxt/src/table_xt.cc @@ -35,7 +35,7 @@ #include <drizzled/common.h> #include <mysys/thr_lock.h> #include <drizzled/dtcollation.h> -#include <drizzled/handlerton.h> +#include <drizzled/plugin/storage_engine.h> #else #include "mysql_priv.h" #endif @@ -47,9 +47,6 @@ #include "myxt_xt.h" #include "cache_xt.h" #include "trace_xt.h" -#ifdef XT_STREAMING -#include "streaming_xt.h" -#endif #include "index_xt.h" #include "restart_xt.h" #include "systab_xt.h" @@ -293,17 +290,17 @@ static void tab_get_row_file_name(char *table_name, char *name, xtTableID tab_id sprintf(table_name, "%s-%lu.xtr", name, (u_long) tab_id); } -static void tab_get_data_file_name(char *table_name, char *name, xtTableID tab_id __attribute__((unused))) +static void tab_get_data_file_name(char *table_name, char *name, xtTableID XT_UNUSED(tab_id)) { sprintf(table_name, "%s.xtd", name); } -static void 
tab_get_index_file_name(char *table_name, char *name, xtTableID tab_id __attribute__((unused))) +static void tab_get_index_file_name(char *table_name, char *name, xtTableID XT_UNUSED(tab_id)) { sprintf(table_name, "%s.xti", name); } -static void tab_free_by_id(XTThreadPtr self __attribute__((unused)), void *thunk __attribute__((unused)), void *item) +static void tab_free_by_id(XTThreadPtr self, void *XT_UNUSED(thunk), void *item) { XTTableEntryPtr te_ptr = (XTTableEntryPtr) item; @@ -315,7 +312,7 @@ static void tab_free_by_id(XTThreadPtr self __attribute__((unused)), void *thunk te_ptr->te_table = NULL; } -static int tab_comp_by_id(XTThreadPtr self __attribute__((unused)), register const void *thunk __attribute__((unused)), register const void *a, register const void *b) +static int tab_comp_by_id(XTThreadPtr XT_UNUSED(self), register const void *XT_UNUSED(thunk), register const void *a, register const void *b) { xtTableID te_id = *((xtTableID *) a); XTTableEntryPtr te_ptr = (XTTableEntryPtr) b; @@ -327,14 +324,14 @@ static int tab_comp_by_id(XTThreadPtr self __attribute__((unused)), register con return 1; } -static void tab_free_path(XTThreadPtr self __attribute__((unused)), void *thunk __attribute__((unused)), void *item) +static void tab_free_path(XTThreadPtr self, void *XT_UNUSED(thunk), void *item) { XTTablePathPtr tp_ptr = *((XTTablePathPtr *) item); xt_free(self, tp_ptr); } -static int tab_comp_path(XTThreadPtr self __attribute__((unused)), register const void *thunk __attribute__((unused)), register const void *a, register const void *b) +static int tab_comp_path(XTThreadPtr XT_UNUSED(self), register const void *XT_UNUSED(thunk), register const void *a, register const void *b) { char *path = (char *) a; XTTablePathPtr tp_ptr = *((XTTablePathPtr *) b); @@ -398,7 +395,7 @@ xtPublic xtBool xt_describe_tables_next(XTThreadPtr self, XTTableDescPtr td) return_(TRUE); } -xtPublic void xt_describe_tables_exit(XTThreadPtr self __attribute__((unused)), XTTableDescPtr 
td) +xtPublic void xt_describe_tables_exit(XTThreadPtr XT_UNUSED(self), XTTableDescPtr td) { if (td->td_open_dir) xt_dir_close(NULL, td->td_open_dir); @@ -410,8 +407,11 @@ xtPublic void xt_tab_init_db(XTThreadPtr self, XTDatabaseHPtr db) { XTTableDescRec desc; XTTableEntryRec te_tab; + XTTableEntryPtr te_ptr; XTTablePathPtr db_path; + char pbuf[PATH_MAX]; int len; + u_int edx; enter_(); pushr_(xt_tab_exit_db, db); @@ -425,7 +425,6 @@ xtPublic void xt_tab_init_db(XTThreadPtr self, XTDatabaseHPtr db) if (db->db_multi_path) { XTOpenFilePtr of; char *buffer, *ptr, *path; - char pbuf[PATH_MAX]; xt_strcpy(PATH_MAX, pbuf, db->db_main_path); xt_add_location_file(PATH_MAX, pbuf); @@ -490,6 +489,27 @@ xtPublic void xt_tab_init_db(XTThreadPtr self, XTDatabaseHPtr db) } freer_(); // xt_describe_tables_exit(&desc) + /* + * The purpose of this code is to ensure that all tables are opened and cached, + * which is actually only required if tables have foreign key references. + * + * In other words, a side affect of this code is that FK references between tables + * are registered, and checked. + * + * Unfortunately we don't know if a table is referenced by a FK, so we have to open + * all tables. 
+ * + * Cannot open tables in the loop above because db->db_table_by_id which is built + * above is used by xt_use_table_no_lock() + */ + xt_enum_tables_init(&edx); + while ((te_ptr = xt_enum_tables_next(self, db, &edx))) { + xt_strcpy(PATH_MAX, pbuf, te_ptr->te_tab_path->tp_path); + xt_add_dir_char(PATH_MAX, pbuf); + xt_strcat(PATH_MAX, pbuf, te_ptr->te_tab_name); + xt_heap_release(self, xt_use_table_no_lock(self, db, (XTPathStrPtr)pbuf, FALSE, FALSE, NULL, NULL)); + } + popr_(); // Discard xt_tab_exit_db(db) exit_(); } @@ -605,8 +625,9 @@ xtPublic void xt_tab_exit_db(XTThreadPtr self, XTDatabaseHPtr db) } } -static void tab_check_table(XTThreadPtr self __attribute__((unused)), XTTableHPtr tab __attribute__((unused))) +static void tab_check_table(XTThreadPtr self, XTTableHPtr XT_UNUSED(tab)) { + (void) self; enter_(); exit_(); } @@ -661,7 +682,7 @@ xtPublic void xt_enum_tables_init(u_int *edx) *edx = 0; } -xtPublic XTTableEntryPtr xt_enum_tables_next(XTThreadPtr self __attribute__((unused)), XTDatabaseHPtr db, u_int *edx) +xtPublic XTTableEntryPtr xt_enum_tables_next(XTThreadPtr XT_UNUSED(self), XTDatabaseHPtr db, u_int *edx) { XTTableEntryPtr en_ptr; @@ -727,6 +748,12 @@ static xtBool tab_find_table(XTThreadPtr self, XTDatabaseHPtr db, XTPathStrPtr n return FALSE; } +xtPublic void xt_tab_disable_index(XTTableHPtr tab, u_int ind_error) +{ + tab->tab_dic.dic_disable_index = ind_error; + xt_tab_set_table_repair_pending(tab); +} + xtPublic void xt_tab_set_index_error(XTTableHPtr tab) { switch (tab->tab_dic.dic_disable_index) { @@ -803,22 +830,39 @@ static void tab_load_index_header(XTThreadPtr self, XTTableHPtr tab, XTOpenFileP tab->tab_index_page_size = XT_GET_DISK_4(index_fmt->if_page_size_4); } +#ifdef XT_USE_LAZY_DELETE + if (tab->tab_dic.dic_index_ver <= XT_IND_NO_LAZY_DELETE) + tab->tab_dic.dic_no_lazy_delete = TRUE; + else + tab->tab_dic.dic_no_lazy_delete = FALSE; +#else + tab->tab_dic.dic_no_lazy_delete = TRUE; +#endif + /* Incorrect version of index is 
handled by allowing a sequential scan, but no index access. * Recovery with the wrong index type will not recover the indexes, a REPAIR TABLE * will be required! */ if (tab->tab_dic.dic_index_ver != XT_IND_CURRENT_VERSION) { - if (tab->tab_dic.dic_index_ver != XT_IND_CURRENT_VERSION) - tab->tab_dic.dic_disable_index = XT_INDEX_TOO_OLD; - else - tab->tab_dic.dic_disable_index = XT_INDEX_TOO_NEW; + switch (tab->tab_dic.dic_index_ver) { + case XT_IND_NO_LAZY_DELETE: + case XT_IND_LAZY_DELETE_OK: + /* I can handle this type of index. */ + break; + default: + if (tab->tab_dic.dic_index_ver < XT_IND_CURRENT_VERSION) + xt_tab_disable_index(tab, XT_INDEX_TOO_OLD); + else + xt_tab_disable_index(tab, XT_INDEX_TOO_NEW); + break; + } } else if (tab->tab_index_page_size != XT_INDEX_PAGE_SIZE) - tab->tab_dic.dic_disable_index = XT_INDEX_BAD_BLOCK; + xt_tab_disable_index(tab, XT_INDEX_BAD_BLOCK); } else { memset(tab->tab_index_head, 0, XT_INDEX_HEAD_SIZE); - tab->tab_dic.dic_disable_index = XT_INDEX_MISSING; + xt_tab_disable_index(tab, XT_INDEX_MISSING); tab->tab_index_header_size = XT_INDEX_HEAD_SIZE; tab->tab_index_page_size = XT_INDEX_PAGE_SIZE; tab->tab_dic.dic_index_ver = 0; @@ -1089,6 +1133,8 @@ static int tab_new_handle(XTThreadPtr self, XTTableHPtr *r_tab, XTDatabaseHPtr d xt_heap_set_release_callback(self, tab, tab_onrelease); + tab->tab_repair_pending = xt_tab_is_table_repair_pending(tab); + popr_(); // Discard xt_heap_release(tab) xt_ht_put(self, db->db_tables, tab); @@ -1216,11 +1262,6 @@ static XTOpenTablePoolPtr tab_lock_table(XTThreadPtr self, XTPathStrPtr name, xt return_(NULL); } -#ifdef XT_STREAMING - /* Tell PBMS to close all open tables of this sort: */ - xt_pbms_close_all_tables(name->ps_path); -#endif - /* Wait for all open tables to close: */ xt_db_wait_for_open_tables(self, table_pool); @@ -1297,9 +1338,6 @@ xtPublic void xt_create_table(XTThreadPtr self, XTPathStrPtr name, XTDictionaryP /* Remove the PBMS table: */ ASSERT(xt_get_self() == self); -#ifdef 
XT_STREAMING - xt_pbms_drop_table(name->ps_path); -#endif /* Remove the table from the directory. It will get a new * ID so the handle in the directory will no longer be valid. @@ -1572,7 +1610,7 @@ xtPublic void xt_create_table(XTThreadPtr self, XTPathStrPtr name, XTDictionaryP exit_(); } -xtPublic void xt_drop_table(XTThreadPtr self, XTPathStrPtr tab_name) +xtPublic void xt_drop_table(XTThreadPtr self, XTPathStrPtr tab_name, xtBool drop_db) { XTDatabaseHPtr db = self->st_database; XTOpenTablePoolPtr table_pool; @@ -1596,8 +1634,16 @@ xtPublic void xt_drop_table(XTThreadPtr self, XTPathStrPtr tab_name) tab_id = tab->tab_id; /* tab is not null if returned table_pool is not null */ /* check if other tables refer this */ if (!self->st_ignore_fkeys) - can_drop = tab->tab_dic.dic_table->checkCanDrop(); + can_drop = tab->tab_dic.dic_table->checkCanDrop(drop_db); + } +#ifdef DRIZZLED + /* See the comment in ha_pbxt::delete_table regarding different implmentation of DROP TABLE + * in MySQL and Drizzle + */ + else { + xt_throw_xterr(XT_CONTEXT, XT_ERR_TABLE_NOT_FOUND); } +#endif if (can_drop) { if (tab_id) { @@ -1614,9 +1660,6 @@ xtPublic void xt_drop_table(XTThreadPtr self, XTPathStrPtr tab_name) tab_delete_table_files(self, tab_name, tab_id); ASSERT(xt_get_self() == self); -#ifdef XT_STREAMING - xt_pbms_drop_table(tab_name->ps_path); -#endif if ((te_ptr = (XTTableEntryPtr) xt_sl_find(self, db->db_table_by_id, &tab_id))) { tab_remove_table_path(self, db, te_ptr->te_tab_path); xt_sl_delete(self, db->db_table_by_id, &tab_id); @@ -1733,6 +1776,7 @@ xtPublic void xt_check_table(XTThreadPtr self, XTOpenTablePtr ot) u_llong max_comp_rec_len = 0; size_t rec_size; size_t row_size; + u_llong ext_data_len = 0; #if defined(DUMP_CHECK_TABLE) || defined(CHECK_TABLE_STATS) printf("\nCHECK TABLE: %s\n", tab->tab_name->ps_path); @@ -1832,6 +1876,7 @@ xtPublic void xt_check_table(XTThreadPtr self, XTOpenTablePtr ot) printf("record-X "); #endif alloc_rec_count++; + ext_data_len += 
XT_GET_DISK_4(rec_buf->re_log_dat_siz_4); row_size = XT_GET_DISK_4(rec_buf->re_log_dat_siz_4) + ot->ot_rec_size - XT_REC_EXT_HEADER_SIZE; alloc_rec_bytes += row_size; if (!min_comp_rec_len || row_size < min_comp_rec_len) @@ -1887,6 +1932,9 @@ xtPublic void xt_check_table(XTThreadPtr self, XTOpenTablePtr ot) } #ifdef CHECK_TABLE_STATS + if (!tab->tab_dic.dic_rec_fixed) + printf("Extendend data length = %llu\n", ext_data_len); + if (alloc_rec_count) { printf("Minumum comp. rec. len. = %llu\n", (u_llong) min_comp_rec_len); printf("Average comp. rec. len. = %llu\n", (u_llong) ((double) alloc_rec_bytes / (double) alloc_rec_count + (double) 0.5)); @@ -2055,6 +2103,8 @@ xtPublic void xt_rename_table(XTThreadPtr self, XTPathStrPtr old_name, XTPathStr popr_(); // Discard xt_free(te_new_name); tab = xt_use_table_no_lock(self, db, new_name, FALSE, FALSE, &dic, NULL); + /* All renamed tables are considered repaired! */ + xt_tab_table_repaired(tab); xt_heap_release(self, tab); freer_(); // myxt_free_dictionary(&dic) @@ -2306,6 +2356,9 @@ xtPublic XTOpenTablePtr tab_open_table(XTTableHPtr tab) return NULL; memset(ot, 0, offsetof(XTOpenTableRec, ot_ind_wbuf)); + ot->ot_seq_page = NULL; + ot->ot_seq_data = NULL; + self = xt_get_self(); try_(a) { xt_heap_reference(self, tab); @@ -3353,6 +3406,16 @@ xtPublic int xt_tab_dirty_read_record(register XTOpenTablePtr ot, xtWord1 *buffe return OK; } +#ifdef XT_USE_ROW_REC_MMAP_FILES +/* Loading into cache is not required, + * Instead we copy the memory map to load the + * data. + */ +#define TAB_ROW_LOAD_CACHE FALSE +#else +#define TAB_ROW_LOAD_CACHE TRUE +#endif + /* * Pull the entire row pointer file into memory. 
*/ @@ -3376,7 +3439,7 @@ xtPublic void xt_tab_load_row_pointers(XTThreadPtr self, XTOpenTablePtr ot) end_offset = xt_row_id_to_row_offset(tab, eof_rec_id); rec_id = 1; while (rec_id < eof_rec_id) { - if (!tab->tab_rows.xt_tc_get_page(ot->ot_row_file, rec_id, &page, &poffset, self)) + if (!tab->tab_rows.xt_tc_get_page(ot->ot_row_file, rec_id, TAB_ROW_LOAD_CACHE, &page, &poffset, self)) xt_throw(self); if (page) tab->tab_rows.xt_tc_release_page(ot->ot_row_file, page, self); @@ -3392,7 +3455,7 @@ xtPublic void xt_tab_load_row_pointers(XTThreadPtr self, XTOpenTablePtr ot) XT_LOCK_MEMORY_PTR(buff_ptr, ot->ot_row_file, offset, tfer, &self->st_statistics.st_rec, self); if (buff_ptr) { memcpy(buffer, buff_ptr, tfer); - XT_UNLOCK_MEMORY_PTR(ot->ot_row_file, self); + XT_UNLOCK_MEMORY_PTR(ot->ot_row_file, buff_ptr, TRUE, self); } } rec_id += tab->tab_rows.tci_rows_per_page; @@ -3521,7 +3584,7 @@ static void tab_restore_exception(XTExceptionPtr e) * FALSE if the record has already been freed. * */ -xtPublic int xt_tab_remove_record(XTOpenTablePtr ot, xtRecordID rec_id, xtWord1 *rec_data, xtRecordID *prev_var_id, xtBool clean_delete, xtRowID row_id, xtXactID xn_id __attribute__((unused))) +xtPublic int xt_tab_remove_record(XTOpenTablePtr ot, xtRecordID rec_id, xtWord1 *rec_data, xtRecordID *prev_var_id, xtBool clean_delete, xtRowID row_id, xtXactID XT_UNUSED(xn_id)) { register XTTableHPtr tab = ot->ot_table; size_t rec_size; @@ -3664,49 +3727,6 @@ xtPublic int xt_tab_remove_record(XTOpenTablePtr ot, xtRecordID rec_id, xtWord1 } } -#ifdef XT_STREAMING - if (tab->tab_dic.dic_blob_count) { - /* If the record contains any LONGBLOB then check how much - * space we need. - */ - size_t blob_size; - - switch (old_rec_type) { - case XT_TAB_STATUS_DELETE: - case XT_TAB_STATUS_DEL_CLEAN: - break; - case XT_TAB_STATUS_FIXED: - case XT_TAB_STATUS_FIX_CLEAN: - /* Should not be the case, record with LONGBLOB can never be fixed! 
*/ - break; - case XT_TAB_STATUS_VARIABLE: - case XT_TAB_STATUS_VAR_CLEAN: - cols_req = tab->tab_dic.dic_blob_cols_req; - cols_in_buffer = cols_req; - blob_size = myxt_load_row_length(ot, rec_size - XT_REC_FIX_HEADER_SIZE, ot->ot_row_rbuffer + XT_REC_FIX_HEADER_SIZE, &cols_in_buffer); - if (cols_in_buffer < cols_req) - blob_size = tab->tab_dic.dic_rec_size; - else - blob_size += XT_REC_FIX_HEADER_SIZE; - if (blob_size > rec_size) - rec_size = blob_size; - break; - case XT_TAB_STATUS_EXT_DLOG: - case XT_TAB_STATUS_EXT_CLEAN: - cols_req = tab->tab_dic.dic_blob_cols_req; - cols_in_buffer = cols_req; - blob_size = myxt_load_row_length(ot, rec_size - XT_REC_EXT_HEADER_SIZE, ot->ot_row_rbuffer + XT_REC_EXT_HEADER_SIZE, &cols_in_buffer); - if (cols_in_buffer < cols_req) - blob_size = tab->tab_dic.dic_rec_size; - else - blob_size += XT_REC_EXT_HEADER_SIZE; - if (blob_size > rec_size) - rec_size = blob_size; - break; - } - } -#endif - set_removed: if (XT_REC_IS_EXT_DLOG(old_rec_type)) { /* {LOCK-EXT-REC} Lock, and read again to make sure that the @@ -3810,7 +3830,7 @@ static xtRowID tab_new_row(XTOpenTablePtr ot, XTTableHPtr tab) xt_unlock_mutex_ns(&tab->tab_row_lock); return 0; } - xt_rwmutex_unlock(&seg->tcs_lock, ot->ot_thread->t_id); + TAB_CAC_UNLOCK(&seg->tcs_lock, ot->ot_thread->t_id); } tab->tab_row_eof_id++; } @@ -4343,15 +4363,6 @@ xtPublic xtBool xt_tab_new_record(XTOpenTablePtr ot, xtWord1 *rec_buf) xtRowID row_id; u_int idx_cnt = 0; XTIndexPtr *ind; -#ifdef XT_STREAMING - void *pbms_table; - - /* PBMS: Reference BLOBs!? 
*/ - if (tab->tab_dic.dic_blob_count) { - if (!myxt_use_blobs(ot, &pbms_table, rec_buf)) - return FAILED; - } -#endif if (!myxt_store_row(ot, &rec_info, (char *) rec_buf)) goto failed_0; @@ -4386,17 +4397,6 @@ xtPublic xtBool xt_tab_new_record(XTOpenTablePtr ot, xtWord1 *rec_buf) } } -#ifdef XT_STREAMING - /* Reference the BLOBs in the row: */ - if (tab->tab_dic.dic_blob_count) { - if (!myxt_retain_blobs(ot, pbms_table, rec_info.ri_rec_id)) { - pbms_table = NULL; - goto failed_2; - } - pbms_table = NULL; - } -#endif - /* Do the foreign key stuff: */ if (ot->ot_table->tab_dic.dic_table->dt_fkeys.size() > 0) { if (!ot->ot_table->tab_dic.dic_table->insertRow(ot, rec_buf)) @@ -4417,10 +4417,6 @@ xtPublic xtBool xt_tab_new_record(XTOpenTablePtr ot, xtWord1 *rec_buf) tab_free_row_on_fail(ot, tab, row_id); failed_0: -#ifdef XT_STREAMING - if (tab->tab_dic.dic_blob_count && pbms_table) - myxt_unuse_blobs(ot, pbms_table); -#endif return FAILED; } @@ -4524,15 +4520,6 @@ static xtBool tab_overwrite_record(XTOpenTablePtr ot, xtWord1 *before_buf, xtWor xtLogOffset log_offset; xtBool prev_ext_rec; -#ifdef XT_STREAMING - void *pbms_table; - - if (tab->tab_dic.dic_blob_count) { - if (!myxt_use_blobs(ot, &pbms_table, after_buf)) - return FAILED; - } -#endif - if (!myxt_store_row(ot, &rec_info, (char *) after_buf)) goto failed_0; @@ -4596,16 +4583,6 @@ static xtBool tab_overwrite_record(XTOpenTablePtr ot, xtWord1 *before_buf, xtWor if (prev_ext_rec) tab_free_ext_record_on_fail(ot, rec_id, &prev_rec_head, TRUE); -#ifdef XT_STREAMING - if (tab->tab_dic.dic_blob_count) { - /* Retain the BLOBs new record: */ - if (!myxt_retain_blobs(ot, pbms_table, rec_id)) - return FAILED; - /* Release the BLOBs in the old record: */ - myxt_release_blobs(ot, before_buf, rec_id); - } -#endif - return OK; failed_2: @@ -4648,11 +4625,6 @@ static xtBool tab_overwrite_record(XTOpenTablePtr ot, xtWord1 *before_buf, xtWor tab_free_ext_record_on_fail(ot, rec_id, &prev_rec_head, TRUE); failed_0: -#ifdef 
XT_STREAMING - /* Unuse the BLOBs of the new record: */ - if (tab->tab_dic.dic_blob_count && pbms_table) - myxt_unuse_blobs(ot, pbms_table); -#endif return FAILED; } @@ -4666,10 +4638,6 @@ xtPublic xtBool xt_tab_update_record(XTOpenTablePtr ot, xtWord1 *before_buf, xtW u_int idx_cnt = 0; XTIndexPtr *ind; -#ifdef XT_STREAMING - void *pbms_table; -#endif - /* * Originally only the flag ot->ot_curr_updated was checked, and if it was on, then * tab_overwrite_record() was called, but this caused crashes in some cases like: @@ -4709,14 +4677,6 @@ xtPublic xtBool xt_tab_update_record(XTOpenTablePtr ot, xtWord1 *before_buf, xtW row_id = ot->ot_curr_row_id; self = ot->ot_thread; -#ifdef XT_STREAMING - /* PBMS: Reference BLOBs!? */ - if (tab->tab_dic.dic_blob_count) { - if (!myxt_use_blobs(ot, &pbms_table, after_buf)) - return FAILED; - } -#endif - if (!myxt_store_row(ot, &rec_info, (char *) after_buf)) goto failed_0; @@ -4766,17 +4726,6 @@ xtPublic xtBool xt_tab_update_record(XTOpenTablePtr ot, xtWord1 *before_buf, xtW } } -#ifdef XT_STREAMING - /* Reference the BLOBs in the row: */ - if (tab->tab_dic.dic_blob_count) { - if (!myxt_retain_blobs(ot, pbms_table, rec_info.ri_rec_id)) { - pbms_table = NULL; - goto failed_2; - } - pbms_table = NULL; - } -#endif - if (ot->ot_table->tab_dic.dic_table->dt_trefs || ot->ot_table->tab_dic.dic_table->dt_fkeys.size() > 0) { if (!ot->ot_table->tab_dic.dic_table->updateRow(ot, before_buf, after_buf)) goto failed_2; @@ -4793,10 +4742,6 @@ xtPublic xtBool xt_tab_update_record(XTOpenTablePtr ot, xtWord1 *before_buf, xtW XT_TAB_ROW_UNLOCK(&tab->tab_row_rwlock[row_id % XT_ROW_RWLOCKS], ot->ot_thread); failed_0: -#ifdef XT_STREAMING - if (tab->tab_dic.dic_blob_count && pbms_table) - myxt_unuse_blobs(ot, pbms_table); -#endif return FAILED; } @@ -4906,6 +4851,7 @@ xtPublic xtBool xt_tab_seq_init(XTOpenTablePtr ot) register XTTableHPtr tab = ot->ot_table; ot->ot_seq_page = NULL; + ot->ot_seq_data = NULL; ot->ot_on_page = FALSE; ot->ot_seq_offset = 
0; @@ -4958,6 +4904,7 @@ xtPublic void xt_tab_seq_reset(XTOpenTablePtr ot) ot->ot_seq_rec_id = 0; ot->ot_seq_eof_id = 0; ot->ot_seq_page = NULL; + ot->ot_seq_data = NULL; ot->ot_on_page = FALSE; ot->ot_seq_offset = 0; } @@ -4970,23 +4917,40 @@ xtPublic void xt_tab_seq_exit(XTOpenTablePtr ot) tab->tab_recs.xt_tc_release_page(ot->ot_rec_file, ot->ot_seq_page, ot->ot_thread); ot->ot_seq_page = NULL; } + if (ot->ot_seq_data) + XT_UNLOCK_MEMORY_PTR(ot->ot_rec_file, ot->ot_seq_data, TRUE, ot->ot_thread); ot->ot_on_page = FALSE; } +#ifdef XT_USE_ROW_REC_MMAP_FILES +#define TAB_SEQ_LOAD_CACHE FALSE +#else +#ifdef XT_SEQ_SCAN_LOADS_CACHE +#define TAB_SEQ_LOAD_CACHE TRUE +#else +#define TAB_SEQ_LOAD_CACHE FALSE +#endif +#endif + xtPublic xtBool xt_tab_seq_next(XTOpenTablePtr ot, xtWord1 *buffer, xtBool *eof) { register XTTableHPtr tab = ot->ot_table; register size_t rec_size = tab->tab_dic.dic_rec_size; xtWord1 *buff_ptr; xtRecordID new_rec_id; - xtBool ptr_locked; xtRecordID invalid_rec = 0; - XTTabRecHeadDRec rec_head; next_page: if (!ot->ot_on_page) { - if (!(ot->ot_on_page = tab->tab_recs.xt_tc_get_page(ot->ot_rec_file, ot->ot_seq_rec_id, &ot->ot_seq_page, &ot->ot_seq_offset, ot->ot_thread))) + if (!(ot->ot_on_page = tab->tab_recs.xt_tc_get_page(ot->ot_rec_file, ot->ot_seq_rec_id, TAB_SEQ_LOAD_CACHE, &ot->ot_seq_page, &ot->ot_seq_offset, ot->ot_thread))) return FAILED; + if (!ot->ot_seq_page) { + XT_LOCK_MEMORY_PTR(ot->ot_seq_data, ot->ot_rec_file, xt_rec_id_to_rec_offset(tab, ot->ot_seq_rec_id), tab->tab_rows.tci_page_size, &ot->ot_thread->st_statistics.st_rec, ot->ot_thread); + if (!ot->ot_seq_data) + return FAILED; + ot->ot_on_page = TRUE; + ot->ot_seq_offset = 0; + } } next_record: @@ -5001,22 +4965,19 @@ xtPublic xtBool xt_tab_seq_next(XTOpenTablePtr ot, xtWord1 *buffer, xtBool *eof) tab->tab_recs.xt_tc_release_page(ot->ot_rec_file, ot->ot_seq_page, ot->ot_thread); ot->ot_seq_page = NULL; } + if (ot->ot_seq_data) + /* NULL here means that in the case of non-memory 
mapped + * files we "keep" the lock. + */ + XT_UNLOCK_MEMORY_PTR(ot->ot_rec_file, ot->ot_seq_data, FALSE, ot->ot_thread); ot->ot_on_page = FALSE; goto next_page; } - if (ot->ot_seq_page) { - ptr_locked = FALSE; + if (ot->ot_seq_page) buff_ptr = ot->ot_seq_page->tcp_data + ot->ot_seq_offset; - } - else { - size_t red_size; - - ptr_locked = TRUE; - if (!xt_pread_fmap(ot->ot_rec_file, xt_rec_id_to_rec_offset(tab, ot->ot_seq_rec_id), sizeof(XTTabRecHeadDRec), sizeof(XTTabRecHeadDRec), &rec_head, &red_size, &ot->ot_thread->st_statistics.st_rec, ot->ot_thread)) - return FAILED; - buff_ptr = (xtWord1 *) &rec_head; - } + else + buff_ptr = ot->ot_seq_data + ot->ot_seq_offset; /* This is the current record: */ ot->ot_curr_rec_id = ot->ot_seq_rec_id; @@ -5033,7 +4994,6 @@ xtPublic xtBool xt_tab_seq_next(XTOpenTablePtr ot, xtWord1 *buffer, xtBool *eof) case XT_ERR: goto failed; case XT_NEW: - ptr_locked = FALSE; buff_ptr = ot->ot_row_rbuffer; if (!xt_tab_get_rec_data(ot, new_rec_id, rec_size, ot->ot_row_rbuffer)) return XT_ERR; @@ -5066,8 +5026,6 @@ xtPublic xtBool xt_tab_seq_next(XTOpenTablePtr ot, xtWord1 *buffer, xtBool *eof) invalid_rec = 0; goto next_record; default: - if (ptr_locked) - XT_LOCK_MEMORY_PTR(buff_ptr, ot->ot_rec_file, xt_rec_id_to_rec_offset(tab, ot->ot_curr_rec_id), tab->tab_rows.tci_page_size, &ot->ot_thread->st_statistics.st_rec, ot->ot_thread); break; } @@ -5099,17 +5057,176 @@ xtPublic xtBool xt_tab_seq_next(XTOpenTablePtr ot, xtWord1 *buffer, xtBool *eof) break; } } - if (ptr_locked) - XT_UNLOCK_MEMORY_PTR(ot->ot_rec_file, ot->ot_thread); *eof = FALSE; return OK; failed_1: - if (ptr_locked) - XT_UNLOCK_MEMORY_PTR(ot->ot_rec_file, ot->ot_thread); failed: return FAILED; } +/* + * ----------------------------------------------------------------------- + * REPAIR TABLE + */ + +#define REP_FIND 0 +#define REP_ADD 1 +#define REP_DEL 2 + +static xtBool tab_exec_repair_pending(XTDatabaseHPtr db, int what, char *table_name) +{ + XTThreadPtr thread = 
xt_get_self(); + char file_path[PATH_MAX]; + XTOpenFilePtr of = NULL; + int len; + char *buffer = NULL, *ptr, *name; + char ch; + xtBool found = FALSE; + + xt_strcpy(PATH_MAX, file_path, db->db_main_path); + xt_add_pbxt_file(PATH_MAX, file_path, "repair-pending"); + + if (what == REP_ADD) { + if (!xt_open_file_ns(&of, file_path, XT_FS_CREATE | XT_FS_MAKE_PATH)) + return FALSE; + } + else { + if (!xt_open_file_ns(&of, file_path, XT_FS_DEFAULT)) + return FALSE; + } + if (!of) + return FALSE; + + len = (int) xt_seek_eof_file(NULL, of); + + if (!(buffer = (char *) xt_malloc_ns(len + 1))) + goto failed; + + if (!xt_pread_file(of, 0, len, len, buffer, NULL, &thread->st_statistics.st_x, thread)) + goto failed; + + buffer[len] = 0; + ptr = buffer; + for(;;) { + name = ptr; + while (*ptr && *ptr != '\n' && *ptr != '\r') + ptr++; + if (ptr > name) { + ch = *ptr; + *ptr = 0; + if (xt_tab_compare_names(name, table_name) == 0) { + *ptr = ch; + found = TRUE; + break; + } + *ptr = ch; + } + if (!*ptr) + break; + ptr++; + } + + switch (what) { + case REP_ADD: + if (!found) { + /* Remove any trailing empty lines: */ + while (len > 0) { + if (buffer[len-1] != '\n' && buffer[len-1] != '\r') + break; + len--; + } + if (len > 0) { + if (!xt_pwrite_file(of, len, 1, (void *) "\n", &thread->st_statistics.st_x, thread)) + goto failed; + len++; + } + if (!xt_pwrite_file(of, len, strlen(table_name), table_name, &thread->st_statistics.st_x, thread)) + goto failed; + len += strlen(table_name); + if (!xt_set_eof_file(NULL, of, len)) + goto failed; + } + break; + case REP_DEL: + if (found) { + if (*ptr != '\0') + ptr++; + memmove(name, ptr, len - (ptr - buffer)); + len = len - (ptr - name); + + /* Remove trailing empty lines: */ + while (len > 0) { + if (buffer[len-1] != '\n' && buffer[len-1] != '\r') + break; + len--; + } + + if (len > 0) { + if (!xt_pwrite_file(of, 0, len, buffer, &thread->st_statistics.st_x, thread)) + goto failed; + if (!xt_set_eof_file(NULL, of, len)) + goto failed; + } + } 
+ break; + } + + xt_close_file_ns(of); + xt_free_ns(buffer); + + if (len == 0) + xt_fs_delete(NULL, file_path); + return found; + + failed: + if (of) + xt_close_file_ns(of); + if (buffer) + xt_free_ns(buffer); + xt_log_and_clear_exception(thread); + return FALSE; +} + +xtPublic void tab_make_table_name(XTTableHPtr tab, char *table_name, size_t size) +{ + char name_buf[XT_IDENTIFIER_NAME_SIZE*3+3]; + + xt_2nd_last_name_of_path(sizeof(name_buf), name_buf, tab->tab_name->ps_path); + myxt_static_convert_file_name(name_buf, table_name, size); + xt_strcat(size, table_name, "."); + myxt_static_convert_file_name(xt_last_name_of_path(tab->tab_name->ps_path), name_buf, sizeof(name_buf)); + xt_strcat(size, table_name, name_buf); +} + +xtPublic xtBool xt_tab_is_table_repair_pending(XTTableHPtr tab) +{ + char table_name[XT_IDENTIFIER_NAME_SIZE*3+3]; + + tab_make_table_name(tab, table_name, sizeof(table_name)); + return tab_exec_repair_pending(tab->tab_db, REP_FIND, table_name); +} + +xtPublic void xt_tab_table_repaired(XTTableHPtr tab) +{ + if (tab->tab_repair_pending) { + char table_name[XT_IDENTIFIER_NAME_SIZE*3+3]; + + tab->tab_repair_pending = FALSE; + tab_make_table_name(tab, table_name, sizeof(table_name)); + tab_exec_repair_pending(tab->tab_db, REP_DEL, table_name); + } +} + +xtPublic void xt_tab_set_table_repair_pending(XTTableHPtr tab) +{ + if (!tab->tab_repair_pending) { + char table_name[XT_IDENTIFIER_NAME_SIZE*3+3]; + + tab->tab_repair_pending = TRUE; + tab_make_table_name(tab, table_name, sizeof(table_name)); + tab_exec_repair_pending(tab->tab_db, REP_ADD, table_name); + } +} diff --git a/storage/pbxt/src/table_xt.h b/storage/pbxt/src/table_xt.h index 6ba3eadd2ae..8378534ce77 100644 --- a/storage/pbxt/src/table_xt.h +++ b/storage/pbxt/src/table_xt.h @@ -45,7 +45,17 @@ struct XTTablePath; #define XT_TAB_INCOMPATIBLE_VERSION 4 #define XT_TAB_CURRENT_VERSION 5 -#define XT_IND_CURRENT_VERSION 3 +/* This version of the index does not have lazy + * delete. 
The new version is compatible with + * this and maintains the old format. + */ +#define XT_IND_NO_LAZY_DELETE 3 +#define XT_IND_LAZY_DELETE_OK 4 +#ifdef XT_USE_LAZY_DELETE +#define XT_IND_CURRENT_VERSION XT_IND_LAZY_DELETE_OK +#else +#define XT_IND_CURRENT_VERSION XT_IND_NO_LAZY_DELETE +#endif #define XT_HEAD_BUFFER_SIZE 1024 @@ -100,15 +110,21 @@ struct XTTablePath; #define XT_TAB_POOL_CLOSED 3 /* Cannot open table at the moment, the pool is closed. */ #define XT_TAB_FAILED 4 -#define XT_TAB_ROW_USE_RW_MUTEX +#ifdef XT_NO_ATOMICS +#define XT_TAB_ROW_USE_PTHREAD_RW +#else +//#define XT_TAB_ROW_USE_RWMUTEX +//#define XT_TAB_ROW_USE_SPINXSLOCK +#define XT_TAB_ROW_USE_XSMUTEX +#endif -#ifdef XT_TAB_ROW_USE_FASTWRLOCK -#define XT_TAB_ROW_LOCK_TYPE XTFastRWLockRec -#define XT_TAB_ROW_INIT_LOCK(s, i) xt_fastrwlock_init(s, i) -#define XT_TAB_ROW_FREE_LOCK(s, i) xt_fastrwlock_free(s, i) -#define XT_TAB_ROW_READ_LOCK(i, s) xt_fastrwlock_slock(i, s) -#define XT_TAB_ROW_WRITE_LOCK(i, s) xt_fastrwlock_xlock(i, s) -#define XT_TAB_ROW_UNLOCK(i, s) xt_fastrwlock_unlock(i, s) +#ifdef XT_TAB_ROW_USE_XSMUTEX +#define XT_TAB_ROW_LOCK_TYPE XTXSMutexRec +#define XT_TAB_ROW_INIT_LOCK(s, i) xt_xsmutex_init_with_autoname(s, i) +#define XT_TAB_ROW_FREE_LOCK(s, i) xt_xsmutex_free(s, i) +#define XT_TAB_ROW_READ_LOCK(i, s) xt_xsmutex_slock(i, (s)->t_id) +#define XT_TAB_ROW_WRITE_LOCK(i, s) xt_xsmutex_xlock(i, (s)->t_id) +#define XT_TAB_ROW_UNLOCK(i, s) xt_xsmutex_unlock(i, (s)->t_id) #elif defined(XT_TAB_ROW_USE_PTHREAD_RW) #define XT_TAB_ROW_LOCK_TYPE xt_rwlock_type #define XT_TAB_ROW_INIT_LOCK(s, i) xt_init_rwlock(s, i) @@ -116,16 +132,23 @@ struct XTTablePath; #define XT_TAB_ROW_READ_LOCK(i, s) xt_slock_rwlock_ns(i) #define XT_TAB_ROW_WRITE_LOCK(i, s) xt_xlock_rwlock_ns(i) #define XT_TAB_ROW_UNLOCK(i, s) xt_unlock_rwlock_ns(i) -#elif defined(XT_TAB_ROW_USE_RW_MUTEX) +#elif defined(XT_TAB_ROW_USE_RWMUTEX) #define XT_TAB_ROW_LOCK_TYPE XTRWMutexRec #define XT_TAB_ROW_INIT_LOCK(s, i) 
xt_rwmutex_init_with_autoname(s, i) #define XT_TAB_ROW_FREE_LOCK(s, i) xt_rwmutex_free(s, i) #define XT_TAB_ROW_READ_LOCK(i, s) xt_rwmutex_slock(i, (s)->t_id) #define XT_TAB_ROW_WRITE_LOCK(i, s) xt_rwmutex_xlock(i, (s)->t_id) #define XT_TAB_ROW_UNLOCK(i, s) xt_rwmutex_unlock(i, (s)->t_id) +#elif defined(XT_TAB_ROW_USE_SPINXSLOCK) +#define XT_TAB_ROW_LOCK_TYPE XTSpinXSLockRec +#define XT_TAB_ROW_INIT_LOCK(s, i) xt_spinxslock_init_with_autoname(s, i) +#define XT_TAB_ROW_FREE_LOCK(s, i) xt_spinxslock_free(s, i) +#define XT_TAB_ROW_READ_LOCK(i, s) xt_spinxslock_slock(i, (s)->t_id) +#define XT_TAB_ROW_WRITE_LOCK(i, s) xt_spinxslock_xlock(i, (s)->t_id) +#define XT_TAB_ROW_UNLOCK(i, s) xt_spinxslock_unlock(i, (s)->t_id) #else #define XT_TAB_ROW_LOCK_TYPE XTSpinLockRec -#define XT_TAB_ROW_INIT_LOCK(s, i) xt_spinlock_init(s, i) +#define XT_TAB_ROW_INIT_LOCK(s, i) xt_spinlock_init_with_autoname(s, i) #define XT_TAB_ROW_FREE_LOCK(s, i) xt_spinlock_free(s, i) #define XT_TAB_ROW_READ_LOCK(i, s) xt_spinlock_lock(i) #define XT_TAB_ROW_WRITE_LOCK(i, s) xt_spinlock_lock(i) @@ -310,6 +333,7 @@ typedef struct XTTable : public XTHeap { /* Values that belong in the header when flushed! */ xtBool tab_flush_pending; /* TRUE if the table needs to be flushed */ xtBool tab_recovery_done; /* TRUE if the table has been recovered */ + xtBool tab_repair_pending; /* TRUE if the table has been marked for repair */ xtBool tab_temporary; /* TRUE if this is a temporary table {TEMP-TABLES}. */ off_t tab_bytes_to_flush; /* Number of bytes of the record/row files to flush. */ @@ -441,6 +465,9 @@ typedef struct XTOpenTable { xtRecordID ot_seq_rec_id; /* Current position of a sequential scan. */ xtRecordID ot_seq_eof_id; /* The EOF at the start of the sequential scan. */ XTTabCachePagePtr ot_seq_page; /* If ot_seq_buffer is non-NULL, then a page has been locked! 
*/ + xtWord1 *ot_seq_data; /* Non-NULL if the data references memory mapped memory, or if it was + * allocated if no memory mapping is being used. + */ xtBool ot_on_page; size_t ot_seq_offset; /* Offset on the current page. */ } XTOpenTableRec, *XTOpenTablePtr; @@ -488,7 +515,7 @@ XTTableHPtr xt_use_table_no_lock(XTThreadPtr self, struct XTDatabase *db, XTPa int xt_use_table_by_id(struct XTThread *self, XTTableHPtr *tab, struct XTDatabase *db, xtTableID tab_id); XTOpenTablePtr xt_open_table(XTTableHPtr tab); void xt_close_table(XTOpenTablePtr ot, xtBool flush, xtBool have_table_lock); -void xt_drop_table(struct XTThread *self, XTPathStrPtr name); +void xt_drop_table(struct XTThread *self, XTPathStrPtr name, xtBool drop_db); void xt_check_table(XTThreadPtr self, XTOpenTablePtr tab); void xt_rename_table(struct XTThread *self, XTPathStrPtr old_name, XTPathStrPtr new_name); @@ -536,8 +563,13 @@ xtBool xt_tab_put_eof_rec_data(XTOpenTablePtr ot, xtRecordID rec_id, size_t s xtBool xt_tab_put_log_op_rec_data(XTOpenTablePtr ot, u_int status, xtRecordID free_rec_id, xtRecordID rec_id, size_t size, xtWord1 *buffer); xtBool xt_tab_put_log_rec_data(XTOpenTablePtr ot, u_int status, xtRecordID free_rec_id, xtRecordID rec_id, size_t size, xtWord1 *buffer, xtOpSeqNo *op_seq); xtBool xt_tab_get_rec_data(register XTOpenTablePtr ot, xtRecordID rec_id, size_t size, xtWord1 *buffer); +void xt_tab_disable_index(XTTableHPtr tab, u_int ind_error); void xt_tab_set_index_error(XTTableHPtr tab); +xtBool xt_tab_is_table_repair_pending(XTTableHPtr tab); +void xt_tab_table_repaired(XTTableHPtr tab); +void xt_tab_set_table_repair_pending(XTTableHPtr tab); + inline off_t xt_row_id_to_row_offset(register XTTableHPtr tab, xtRowID row_id) { return (off_t) tab->tab_rows.tci_header_size + (off_t) (row_id - 1) * (off_t) tab->tab_rows.tci_rec_size; @@ -605,3 +637,4 @@ inline xtIndexNodeID xt_ind_offset_to_node(register XTTableHPtr tab, off_t ind_o while (0) #endif + diff --git 
a/storage/pbxt/src/thread_xt.cc b/storage/pbxt/src/thread_xt.cc index bd7dca31cb5..9509df6184d 100644 --- a/storage/pbxt/src/thread_xt.cc +++ b/storage/pbxt/src/thread_xt.cc @@ -23,6 +23,10 @@ #include "xt_config.h" +#ifdef DRIZZLED +#include <bitset> +#endif + #ifndef XT_WIN #include <unistd.h> #include <sys/time.h> @@ -177,7 +181,7 @@ static void thr_log_newline(XTThreadPtr self, c_char *func, c_char *file, u_int #endif #endif -void xt_log_flush(XTThreadPtr self __attribute__((unused))) +void xt_log_flush(XTThreadPtr XT_UNUSED(self)) { fflush(log_file); } @@ -466,7 +470,7 @@ static void thr_free_resources(XTThreadPtr self, XTResourcePtr top) } } -xtPublic void xt_bug(XTThreadPtr self __attribute__((unused))) +xtPublic void xt_bug(XTThreadPtr XT_UNUSED(self)) { static int *bug_ptr = NULL; @@ -532,7 +536,11 @@ xtPublic void xt_throw_error(XTThreadPtr self, c_char *func, c_char *file, u_int #define XT_SYS_ERR_SIZE 300 -static c_char *thr_get_sys_error(int err, char *err_msg __attribute__((unused))) +#ifdef XT_WIN +static c_char *thr_get_sys_error(int err, char *err_msg) +#else +static c_char *thr_get_sys_error(int err, char *XT_UNUSED(err_msg)) +#endif { #ifdef XT_WIN char *ptr; @@ -617,7 +625,7 @@ static c_char *thr_get_err_string(int xt_err) case XT_ERR_NO_REFERENCED_ROW: str = "Constraint: `%s`"; break; // "Foreign key '%s', referenced row does not exist" case XT_ERR_ROW_IS_REFERENCED: str = "Constraint: `%s`"; break; // "Foreign key '%s', has a referencing row" case XT_ERR_BAD_DICTIONARY: str = "Internal dictionary does not match MySQL dictionary"; break; - case XT_ERR_LOADING_MYSQL_DIC: str = "Error %s loading MySQL .frm file"; break; + case XT_ERR_LOADING_MYSQL_DIC: str = "Error loading %s.frm file, MySQL error: %s"; break; case XT_ERR_COLUMN_IS_NOT_NULL: str = "Column `%s` is NOT NULL"; break; case XT_ERR_INCORRECT_NO_OF_COLS: str = "Incorrect number of columns near %s"; break; case XT_ERR_FK_ON_TEMP_TABLE: str = "Cannot create foreign key on temporary 
table"; break; @@ -638,7 +646,7 @@ static c_char *thr_get_err_string(int xt_err) case XT_ERR_INDEX_CORRUPTED: str = "Table `%s` index is corrupted, REPAIR TABLE required"; break; case XT_ERR_NO_INDEX_CACHE: str = "Not enough index cache memory to handle concurrent updates"; break; case XT_ERR_INDEX_LOG_CORRUPT: str = "Index log corrupt: '%s'"; break; - case XT_ERR_TOO_MANY_THREADS: str = "Too many threads: %s, increase max_connections"; break; + case XT_ERR_TOO_MANY_THREADS: str = "Too many threads: %s, increase pbxt_max_threads"; break; case XT_ERR_TOO_MANY_WAITERS: str = "Too many waiting threads: %s"; break; case XT_ERR_INDEX_OLD_VERSION: str = "Table `%s` index created by an older version, REPAIR TABLE required"; break; case XT_ERR_PBXT_TABLE_EXISTS: str = "System table cannot be dropped because PBXT table still exists"; break; @@ -648,6 +656,8 @@ static c_char *thr_get_err_string(int xt_err) case XT_ERR_NEW_TYPE_OF_XLOG: str = "Transaction log %s, is using a newer format, upgrade required"; break; case XT_ERR_NO_BEFORE_IMAGE: str = "Internal error: no before image"; break; case XT_ERR_FK_REF_TEMP_TABLE: str = "Foreign key may not reference temporary table"; break; + case XT_ERR_MYSQL_SHUTDOWN: str = "Cannot open table, MySQL has shutdown"; break; + case XT_ERR_MYSQL_NO_THREAD: str = "Cannot create thread, MySQL has shutdown"; break; default: str = "Unknown XT error"; break; } return str; @@ -869,13 +879,18 @@ xtPublic void xt_log_errno(XTThreadPtr self, c_char *func, c_char *file, u_int l * ----------------------------------------------------------------------- * Assertions and failures (one breakpoints for all failures) */ +//#define CRASH_ON_ASSERT -xtPublic xtBool xt_assert(XTThreadPtr self __attribute__((unused)), c_char *expr, c_char *func, c_char *file, u_int line) +xtPublic xtBool xt_assert(XTThreadPtr self, c_char *expr, c_char *func, c_char *file, u_int line) { + (void) self; #ifdef DEBUG //xt_set_fflush(TRUE); //xt_dump_trace(); printf("%s(%s:%d) 
%s\n", func, file, (int) line, expr); +#ifdef CRASH_ON_ASSERT + abort(); +#endif #ifdef XT_WIN FatalAppExit(0, "Assertion Failed!"); #endif @@ -981,11 +996,13 @@ static xtBool thr_setup_signals(void) } #endif -static void *thr_main(void *data) +typedef void *(*ThreadMainFunc)(XTThreadPtr self); + +extern "C" void *thr_main(void *data) { ThreadDataPtr td = (ThreadDataPtr) data; XTThreadPtr self = td->td_thr; - void *(*start_routine)(XTThreadPtr); + ThreadMainFunc start_routine; void *return_data; enter_(); @@ -1011,6 +1028,11 @@ static void *thr_main(void *data) outer_(); xt_free_thread(self); + + /* {MYSQL-THREAD-KILL} + * Clean up any remaining MySQL thread! + */ + myxt_delete_remaining_thread(); return return_data; } @@ -1857,7 +1879,7 @@ xtPublic void xt_signal_thread(XTThreadPtr target) xt_broadcast_cond_ns(&target->t_cond); } -xtPublic void xt_terminate_thread(XTThreadPtr self __attribute__((unused)), XTThreadPtr target) +xtPublic void xt_terminate_thread(XTThreadPtr XT_UNUSED(self), XTThreadPtr target) { target->t_quit = TRUE; target->t_delayed_signal = SIGTERM; diff --git a/storage/pbxt/src/thread_xt.h b/storage/pbxt/src/thread_xt.h index 4344c5335b9..69f828758a1 100644 --- a/storage/pbxt/src/thread_xt.h +++ b/storage/pbxt/src/thread_xt.h @@ -44,14 +44,6 @@ * Macros and defines */ -#ifdef XT_WIN -#define __FUNC__ __FUNCTION__ -#elif defined(XT_SOLARIS) -#define __FUNC__ "__func__" -#else -#define __FUNC__ __PRETTY_FUNCTION__ -#endif - #define XT_ERR_MSG_SIZE (PATH_MAX + 200) #ifdef DEBUG @@ -291,6 +283,12 @@ typedef struct XTThread { xtBool st_xact_long_running; /* TRUE if this is a long running writer transaction. */ xtWord4 st_visible_time; /* Transactions committed before this time are visible. */ XTDataLogBufferRec st_dlog_buf; + + /* A list of the last 10 transactions run by this connection: */ +#ifdef XT_WAIT_FOR_CLEANUP + u_int st_last_xact; + xtXactID st_prev_xact[XT_MAX_XACT_BEHIND]; +#endif int st_xact_mode; /* The transaction mode. 
*/ xtBool st_ignore_fkeys; /* TRUE if we must ignore foreign keys. */ diff --git a/storage/pbxt/src/trace_xt.cc b/storage/pbxt/src/trace_xt.cc index e5881cb6d12..a913e2ae385 100644 --- a/storage/pbxt/src/trace_xt.cc +++ b/storage/pbxt/src/trace_xt.cc @@ -72,6 +72,12 @@ xtPublic xtBool xt_init_trace(void) trace_log_offset = 0; trace_log_end = 0; trace_stat_count = 0; + +#ifdef XT_TRACK_CONNECTIONS + for (int i=0; i<XT_TRACK_MAX_CONNS; i++) + xt_track_conn_info[i].cu_t_id = i; +#endif + return TRUE; } @@ -343,3 +349,45 @@ xtPublic void xt_ftracef(char *fmt, ...) va_end(ap); } +/* + * ----------------------------------------------------------------------- + * CONNECTION TRACKING + */ + +#ifdef XT_TRACK_CONNECTIONS +XTConnInfoRec xt_track_conn_info[XT_TRACK_MAX_CONNS]; + +static int trace_comp_conn_info(const void *a, const void *b) +{ + XTConnInfoPtr ci_a = (XTConnInfoPtr) a, ci_b = (XTConnInfoPtr) b; + + if (ci_a->ci_curr_xact_id > ci_b->ci_curr_xact_id) + return 1; + if (ci_a->ci_curr_xact_id < ci_b->ci_curr_xact_id) + return -1; + return 0; +} + +xtPublic void xt_dump_conn_tracking(void) +{ + XTConnInfoRec conn_info[XT_TRACK_MAX_CONNS]; + XTConnInfoPtr ptr; + + memcpy(conn_info, xt_track_conn_info, sizeof(xt_track_conn_info)); + qsort(conn_info, XT_TRACK_MAX_CONNS, sizeof(XTConnInfoRec), trace_comp_conn_info); + + ptr = conn_info; + for (int i=0; i<XT_TRACK_MAX_CONNS; i++) { + if (ptr->ci_curr_xact_id || ptr->ci_prev_xact_id) { + printf("%3d curr=%d prev=%d prev-time=%ld\n", (int) ptr->cu_t_id, (int) ptr->ci_curr_xact_id, (int) ptr->ci_prev_xact_id, (long) ptr->ci_prev_xact_time); + if (i+1<XT_TRACK_MAX_CONNS) { + printf(" diff=%d\n", (int) (ptr+1)->ci_curr_xact_id - (int) ptr->ci_curr_xact_id); + } + } + ptr++; + } +} + +#endif + + diff --git a/storage/pbxt/src/trace_xt.h b/storage/pbxt/src/trace_xt.h index 44cfa9945f1..9b00a5a05c7 100644 --- a/storage/pbxt/src/trace_xt.h +++ b/storage/pbxt/src/trace_xt.h @@ -46,4 +46,29 @@ void xt_ftracef(char *fmt, ...); 
//#define PBXT_HANDLER_TRACE #endif +/* + * ----------------------------------------------------------------------- + * CONNECTION TRACKING + */ + +#define XT_TRACK_CONNECTIONS + +#ifdef XT_TRACK_CONNECTIONS +#define XT_TRACK_MAX_CONNS 500 + +typedef struct XTConnInfo { + xtThreadID cu_t_id; + xtXactID ci_curr_xact_id; + xtWord8 ci_xact_start; + + xtXactID ci_prev_xact_id; + xtWord8 ci_prev_xact_time; +} XTConnInfoRec, *XTConnInfoPtr; + +extern XTConnInfoRec xt_track_conn_info[XT_TRACK_MAX_CONNS]; + +void xt_dump_conn_tracking(void); + +#endif + #endif diff --git a/storage/pbxt/src/util_xt.cc b/storage/pbxt/src/util_xt.cc index 6e1db1f5f73..7e805e6e044 100644 --- a/storage/pbxt/src/util_xt.cc +++ b/storage/pbxt/src/util_xt.cc @@ -61,7 +61,7 @@ xtPublic xtWord8 xt_time_now(void) return ms; } -xtPublic void xt_free_nothing(struct XTThread XT_UNUSED(*thr), void XT_UNUSED(*x)) +xtPublic void xt_free_nothing(struct XTThread *XT_UNUSED(thread), void *XT_UNUSED(x)) { } diff --git a/storage/pbxt/src/xaction_xt.cc b/storage/pbxt/src/xaction_xt.cc index be3cb650bcb..82039ace2b3 100644 --- a/storage/pbxt/src/xaction_xt.cc +++ b/storage/pbxt/src/xaction_xt.cc @@ -23,6 +23,10 @@ #include "xt_config.h" +#ifdef DRIZZLED +#include <bitset> +#endif + #include <time.h> #include <signal.h> @@ -48,7 +52,7 @@ #endif static void xn_sw_wait_for_xact(XTThreadPtr self, XTDatabaseHPtr db, u_int hsecs); -static xtBool xn_get_xact_details(XTDatabaseHPtr db, xtXactID xn_id, XTThreadPtr thread __attribute__((unused)), int *flags, xtXactID *start, xtXactID *end, xtThreadID *thd_id); +static xtBool xn_get_xact_details(XTDatabaseHPtr db, xtXactID xn_id, XTThreadPtr XT_UNUSED(thread), int *flags, xtXactID *start, xtXactID *end, xtThreadID *thd_id); static xtBool xn_get_xact_pointer(XTDatabaseHPtr db, xtXactID xn_id, XTXactDataPtr *xact_ptr); /* ============================================================================================== */ @@ -203,7 +207,7 @@ typedef struct XNWaitFor { xtXactID 
wf_for_me_xn_id; /* The transaction we are waiting for. */ } XNWaitForRec, *XNWaitForPtr; -static int xn_compare_wait_for(XTThreadPtr XT_UNUSED(self), register const void XT_UNUSED(*thunk), register const void *a, register const void *b) +static int xn_compare_wait_for(XTThreadPtr XT_UNUSED(self), register const void *XT_UNUSED(thunk), register const void *a, register const void *b) { xtXactID *x = (xtXactID *) a; XNWaitForPtr y = (XNWaitForPtr) b; @@ -215,7 +219,7 @@ static int xn_compare_wait_for(XTThreadPtr XT_UNUSED(self), register const void return 1; } -static void xn_free_wait_for(XTThreadPtr XT_UNUSED(self), void XT_UNUSED(*thunk), void XT_UNUSED(*item)) +static void xn_free_wait_for(XTThreadPtr XT_UNUSED(self), void *XT_UNUSED(thunk), void *XT_UNUSED(item)) { } @@ -446,7 +450,9 @@ xtPublic xtBool xt_xn_wait_for_xact(XTThreadPtr thread, XTXactWaitPtr xw, XTLock xt_timed_wait_cond_ns(&my_wt->wt_cond, &my_wt->wt_lock, WAIT_FOR_XACT_TIME); } + /* Unreachable xt_unlock_mutex_ns(&my_wt->wt_lock); + */ } if (xw) { @@ -753,12 +759,13 @@ xtPublic xtXactID xt_xn_get_curr_id(XTDatabaseHPtr db) return curr_xn_id; } -xtPublic XTXactDataPtr xt_xn_add_old_xact(XTDatabaseHPtr db, xtXactID xn_id, XTThreadPtr thread __attribute__((unused))) +xtPublic XTXactDataPtr xt_xn_add_old_xact(XTDatabaseHPtr db, xtXactID xn_id, XTThreadPtr thread) { register XTXactDataPtr xact; register XTXactSegPtr seg; register XTXactDataPtr *hash; + (void) thread; seg = &db->db_xn_idx[xn_id & XT_XN_SEGMENT_MASK]; XT_XACT_WRITE_LOCK(&seg->xs_tab_lock, thread); hash = &seg->xs_table[(xn_id >> XT_XN_SEGMENT_SHIFTS) % XT_XN_HASH_TABLE_SIZE]; @@ -778,7 +785,7 @@ xtPublic XTXactDataPtr xt_xn_add_old_xact(XTDatabaseHPtr db, xtXactID xn_id, XTT */ db->db_sw_faster |= XT_SW_NO_MORE_XACT_SLOTS; if (!(xact = (XTXactDataPtr) xt_malloc_ns(sizeof(XTXactDataRec)))) { - XT_XACT_UNLOCK(&seg->xs_tab_lock, thread); + XT_XACT_UNLOCK(&seg->xs_tab_lock, thread, TRUE); return NULL; } } @@ -797,7 +804,7 @@ xtPublic 
XTXactDataPtr xt_xn_add_old_xact(XTDatabaseHPtr db, xtXactID xn_id, XTT seg->xs_last_xn_id = xn_id; done_ok: - XT_XACT_UNLOCK(&seg->xs_tab_lock, thread); + XT_XACT_UNLOCK(&seg->xs_tab_lock, thread, TRUE); #ifdef HIGH_X tot_alloced++; if (tot_alloced > high_alloced) @@ -806,12 +813,13 @@ xtPublic XTXactDataPtr xt_xn_add_old_xact(XTDatabaseHPtr db, xtXactID xn_id, XTT return xact; } -static XTXactDataPtr xn_add_new_xact(XTDatabaseHPtr db, xtXactID xn_id, XTThreadPtr thread __attribute__((unused))) +static XTXactDataPtr xn_add_new_xact(XTDatabaseHPtr db, xtXactID xn_id, XTThreadPtr thread) { register XTXactDataPtr xact; register XTXactSegPtr seg; register XTXactDataPtr *hash; + (void) thread; seg = &db->db_xn_idx[xn_id & XT_XN_SEGMENT_MASK]; XT_XACT_WRITE_LOCK(&seg->xs_tab_lock, thread); hash = &seg->xs_table[(xn_id >> XT_XN_SEGMENT_SHIFTS) % XT_XN_HASH_TABLE_SIZE]; @@ -825,7 +833,7 @@ static XTXactDataPtr xn_add_new_xact(XTDatabaseHPtr db, xtXactID xn_id, XTThread */ db->db_sw_faster |= XT_SW_NO_MORE_XACT_SLOTS; if (!(xact = (XTXactDataPtr) xt_malloc_ns(sizeof(XTXactDataRec)))) { - XT_XACT_UNLOCK(&seg->xs_tab_lock, thread); + XT_XACT_UNLOCK(&seg->xs_tab_lock, thread, TRUE); return NULL; } } @@ -841,7 +849,7 @@ static XTXactDataPtr xn_add_new_xact(XTDatabaseHPtr db, xtXactID xn_id, XTThread xact->xd_flags = 0; seg->xs_last_xn_id = xn_id; - XT_XACT_UNLOCK(&seg->xs_tab_lock, thread); + XT_XACT_UNLOCK(&seg->xs_tab_lock, thread, TRUE); #ifdef HIGH_X tot_alloced++; if (tot_alloced > high_alloced) @@ -850,7 +858,7 @@ static XTXactDataPtr xn_add_new_xact(XTDatabaseHPtr db, xtXactID xn_id, XTThread return xact; } -static xtBool xn_get_xact_details(XTDatabaseHPtr db, xtXactID xn_id, XTThreadPtr thread __attribute__((unused)), int *flags, xtXactID *start, xtWord4 *end, xtThreadID *thd_id) +static xtBool xn_get_xact_details(XTDatabaseHPtr db, xtXactID xn_id, XTThreadPtr XT_UNUSED(thread), int *flags, xtXactID *start, xtWord4 *end, xtThreadID *thd_id) { register XTXactSegPtr seg; 
register XTXactDataPtr xact; @@ -874,7 +882,7 @@ static xtBool xn_get_xact_details(XTDatabaseHPtr db, xtXactID xn_id, XTThreadPtr } xact = xact->xd_next_xact; } - XT_XACT_UNLOCK(&seg->xs_tab_lock, thread); + XT_XACT_UNLOCK(&seg->xs_tab_lock, thread, FALSE); return found; } @@ -900,11 +908,11 @@ static xtBool xn_get_xact_pointer(XTDatabaseHPtr db, xtXactID xn_id, XTXactDataP } xact = xact->xd_next_xact; } - XT_XACT_UNLOCK(&seg->xs_tab_lock, thread); + XT_XACT_UNLOCK(&seg->xs_tab_lock, thread, FALSE); return found; } -static xtBool xn_get_xact_start(XTDatabaseHPtr db, xtXactID xn_id, XTThreadPtr thread __attribute__((unused)), xtLogID *log_id, xtLogOffset *log_offset) +static xtBool xn_get_xact_start(XTDatabaseHPtr db, xtXactID xn_id, XTThreadPtr XT_UNUSED(thread), xtLogID *log_id, xtLogOffset *log_offset) { register XTXactSegPtr seg; register XTXactDataPtr xact; @@ -922,12 +930,12 @@ static xtBool xn_get_xact_start(XTDatabaseHPtr db, xtXactID xn_id, XTThreadPtr t } xact = xact->xd_next_xact; } - XT_XACT_UNLOCK(&seg->xs_tab_lock, thread); + XT_XACT_UNLOCK(&seg->xs_tab_lock, thread, FALSE); return found; } /* NOTE: this function may only be used by the sweeper or the recovery process. */ -xtPublic XTXactDataPtr xt_xn_get_xact(XTDatabaseHPtr db, xtXactID xn_id, XTThreadPtr thread __attribute__((unused))) +xtPublic XTXactDataPtr xt_xn_get_xact(XTDatabaseHPtr db, xtXactID xn_id, XTThreadPtr XT_UNUSED(thread)) { register XTXactSegPtr seg; register XTXactDataPtr xact; @@ -940,7 +948,7 @@ xtPublic XTXactDataPtr xt_xn_get_xact(XTDatabaseHPtr db, xtXactID xn_id, XTThrea break; xact = xact->xd_next_xact; } - XT_XACT_UNLOCK(&seg->xs_tab_lock, thread); + XT_XACT_UNLOCK(&seg->xs_tab_lock, thread, FALSE); return xact; } @@ -948,11 +956,12 @@ xtPublic XTXactDataPtr xt_xn_get_xact(XTDatabaseHPtr db, xtXactID xn_id, XTThrea * Delete a transaction, return TRUE if the transaction * was found. 
*/ -xtPublic xtBool xt_xn_delete_xact(XTDatabaseHPtr db, xtXactID xn_id, XTThreadPtr thread __attribute__((unused))) +xtPublic xtBool xt_xn_delete_xact(XTDatabaseHPtr db, xtXactID xn_id, XTThreadPtr thread) { XTXactDataPtr xact, pxact = NULL; XTXactSegPtr seg; + (void) thread; seg = &db->db_xn_idx[xn_id & XT_XN_SEGMENT_MASK]; XT_XACT_WRITE_LOCK(&seg->xs_tab_lock, thread); xact = seg->xs_table[(xn_id >> XT_XN_SEGMENT_SHIFTS) % XT_XN_HASH_TABLE_SIZE]; @@ -963,13 +972,13 @@ xtPublic xtBool xt_xn_delete_xact(XTDatabaseHPtr db, xtXactID xn_id, XTThreadPtr else seg->xs_table[(xn_id >> XT_XN_SEGMENT_SHIFTS) % XT_XN_HASH_TABLE_SIZE] = xact->xd_next_xact; xn_free_xact(db, seg, xact); - XT_XACT_UNLOCK(&seg->xs_tab_lock, thread); + XT_XACT_UNLOCK(&seg->xs_tab_lock, thread, TRUE); return TRUE; } pxact = xact; xact = xact->xd_next_xact; } - XT_XACT_UNLOCK(&seg->xs_tab_lock, thread); + XT_XACT_UNLOCK(&seg->xs_tab_lock, thread, TRUE); return FALSE; } @@ -1253,6 +1262,10 @@ xtPublic xtBool xt_xn_begin(XTThreadPtr self) #ifdef TRACE_TRANSACTION xt_ttracef(self, "BEGIN T%lu\n", (u_long) self->st_xact_data->xd_start_xn_id); #endif +#ifdef XT_TRACK_CONNECTIONS + xt_track_conn_info[self->t_id].ci_curr_xact_id = self->st_xact_data->xd_start_xn_id; + xt_track_conn_info[self->t_id].ci_xact_start = xt_trace_clock(); +#endif return OK; } @@ -1375,6 +1388,13 @@ static xtBool xn_end_xact(XTThreadPtr thread, u_int status) thread->st_xact_data = NULL; +#ifdef XT_TRACK_CONNECTIONS + xt_track_conn_info[thread->t_id].ci_prev_xact_id = xt_track_conn_info[thread->t_id].ci_curr_xact_id; + xt_track_conn_info[thread->t_id].ci_prev_xact_time = xt_trace_clock() - xt_track_conn_info[thread->t_id].ci_xact_start; + xt_track_conn_info[thread->t_id].ci_curr_xact_id = 0; + xt_track_conn_info[thread->t_id].ci_xact_start = 0; +#endif + xt_xn_wakeup_waiting_threads(thread); /* {WAKE-SW} Waking the sweeper @@ -1401,6 +1421,19 @@ static xtBool xn_end_xact(XTThreadPtr thread, u_int status) /* Don't get too far ahead 
of the sweeper! */ if (writer) { +#ifdef XT_WAIT_FOR_CLEANUP + xtXactID wait_xn_id; + + /* This is the transaction that was committed 3 transactions ago: */ + wait_xn_id = thread->st_prev_xact[thread->st_last_xact]; + thread->st_prev_xact[thread->st_last_xact] = xn_id; + /* This works because XT_MAX_XACT_BEHIND == 2! */ + ASSERT_NS((thread->st_last_xact + 1) % XT_MAX_XACT_BEHIND == thread->st_last_xact ^ 1); + thread->st_last_xact ^= 1; + while (xt_xn_is_before(db->db_xn_to_clean_id, wait_xn_id) && (db->db_sw_faster & XT_SW_TOO_FAR_BEHIND)) { + xt_critical_wait(); + } +#else if ((db->db_sw_faster & XT_SW_TOO_FAR_BEHIND) != 0) { xtWord8 then = xt_trace_clock() + (xtWord8) 20000; @@ -1412,6 +1445,7 @@ static xtBool xn_end_xact(XTThreadPtr thread, u_int status) break; } } +#endif } } return ok; @@ -1854,7 +1888,7 @@ static xtBool xn_sw_cleanup_done(XTThreadPtr self, XTOpenTablePtr ot, xtRecordID return FALSE; } -static void xn_sw_clean_indices(XTThreadPtr self __attribute__((unused)), XTOpenTablePtr ot, xtRecordID rec_id, xtRowID row_id, xtWord1 *rec_data, xtWord1 *rec_buffer) +static void xn_sw_clean_indices(XTThreadPtr XT_NDEBUG_UNUSED(self), XTOpenTablePtr ot, xtRecordID rec_id, xtRowID row_id, xtWord1 *rec_data, xtWord1 *rec_buffer) { XTTableHPtr tab = ot->ot_table; u_int cols_req; @@ -2495,7 +2529,8 @@ static void *xn_sw_run_thread(XTThreadPtr self) int count; void *mysql_thread; - mysql_thread = myxt_create_thread(); + if (!(mysql_thread = myxt_create_thread())) + xt_throw(self); while (!self->t_quit) { try_(a) { @@ -2552,7 +2587,10 @@ static void *xn_sw_run_thread(XTThreadPtr self) db->db_sw_idle = XT_THREAD_BUSY; } + /* + * {MYSQL-THREAD-KILL} myxt_destroy_thread(mysql_thread, TRUE); + */ return NULL; } @@ -2599,7 +2637,13 @@ xtPublic void xt_wait_for_sweeper(XTThreadPtr self, XTDatabaseHPtr db, int abort if (db->db_sw_thread) { then = time(NULL); - while (!xt_xn_is_before(xt_xn_get_curr_id(db), db->db_xn_to_clean_id)) { // was db->db_xn_to_clean_id <= 
xt_xn_get_curr_id(db) + /* Changed xt_xn_get_curr_id(db) to db->db_xn_curr_id, + * This should work because we are not concerned about the difference + * between xt_xn_get_curr_id(db) and db->db_xn_curr_id, + * Which is just a matter of when transactions we can expect to find + * in memory (see {GAP-INC-ADD-XACT}) + */ + while (!xt_xn_is_before(db->db_xn_curr_id, db->db_xn_to_clean_id)) { // was db->db_xn_to_clean_id <= xt_xn_get_curr_id(db) xt_lock_mutex(self, &db->db_sw_lock); pushr_(xt_unlock_mutex, &db->db_sw_lock); xt_wakeup_sweeper(db); diff --git a/storage/pbxt/src/xaction_xt.h b/storage/pbxt/src/xaction_xt.h index 9a651fc2532..7bcc46dfcf3 100644 --- a/storage/pbxt/src/xaction_xt.h +++ b/storage/pbxt/src/xaction_xt.h @@ -36,28 +36,48 @@ struct XTOpenTable; #ifdef XT_USE_XACTION_DEBUG_SIZES -#define XT_XN_DATA_ALLOC_COUNT 400 -#define XT_XN_SEGMENT_SHIFTS 1 -#define XT_XN_HASH_TABLE_SIZE 31 #define XT_TN_NUMBER_INCREMENT 20 #define XT_TN_MAX_TO_FREE 20 #define XT_TN_MAX_TO_FREE_WASTE 3 #define XT_TN_MAX_TO_FREE_CHECK 3 #define XT_TN_MAX_TO_FREE_INC 3 +#define XT_XN_SEGMENT_SHIFTS 1 + #else -#define XT_XN_DATA_ALLOC_COUNT 1250 // Number of pre-allocated transaction data structures per segment -#define XT_XN_SEGMENT_SHIFTS 5 // (32) -#define XT_XN_HASH_TABLE_SIZE 1279 // This is a prime number!
#define XT_TN_NUMBER_INCREMENT 100 // The increment of the transaction number on restart #define XT_TN_MAX_TO_FREE 800 // The maximum size of the "to free" list #define XT_TN_MAX_TO_FREE_WASTE 400 #define XT_TN_MAX_TO_FREE_CHECK 100 // Once we have exceeded the limit, we only try in intervals #define XT_TN_MAX_TO_FREE_INC 100 +//#define XT_XN_SEGMENT_SHIFTS 5 // (32) +//#define XT_XN_SEGMENT_SHIFTS 6 // (64) +//#define XT_XN_SEGMENT_SHIFTS 7 // (128) +#define XT_XN_SEGMENT_SHIFTS 8 // (256) +//#define XT_XN_SEGMENT_SHIFTS 9 // (512) + #endif +/* The hash table size (a prime number) */ +#if XT_XN_SEGMENT_SHIFTS == 1 // (1) +#define XT_XN_HASH_TABLE_SIZE 1301 +#elif XT_XN_SEGMENT_SHIFTS == 5 // (32) +#define XT_XN_HASH_TABLE_SIZE 1009 +#elif XT_XN_SEGMENT_SHIFTS == 6 // (64) +#define XT_XN_HASH_TABLE_SIZE 503 +#elif XT_XN_SEGMENT_SHIFTS == 7 // (128) +#define XT_XN_HASH_TABLE_SIZE 251 +#elif XT_XN_SEGMENT_SHIFTS == 8 // (256) +#define XT_XN_HASH_TABLE_SIZE 127 +#elif XT_XN_SEGMENT_SHIFTS == 9 // (512) +#define XT_XN_HASH_TABLE_SIZE 67 +#endif + +/* Number of pre-allocated transaction data structures per segment */ +#define XT_XN_DATA_ALLOC_COUNT XT_XN_HASH_TABLE_SIZE + #define XT_XN_NO_OF_SEGMENTS (1 << XT_XN_SEGMENT_SHIFTS) #define XT_XN_SEGMENT_MASK (XT_XN_NO_OF_SEGMENTS - 1) @@ -94,36 +114,34 @@ typedef struct XTXactData { } XTXactDataRec, *XTXactDataPtr; -#define XT_XACT_USE_SPINLOCK +#ifdef XT_NO_ATOMICS +#define XT_XACT_USE_PTHREAD_RW +#else +//#define XT_XACT_USE_SKEWRWLOCK +#define XT_XACT_USE_SPINXSLOCK +#endif -#ifdef XT_XACT_USE_FASTWRLOCK -#define XT_XACT_LOCK_TYPE XTFastRWLockRec -#define XT_XACT_INIT_LOCK(s, i) xt_fastrwlock_init(s, i) -#define XT_XACT_FREE_LOCK(s, i) xt_fastrwlock_free(s, i) -#define XT_XACT_READ_LOCK(i, s) xt_fastrwlock_slock(i, s) -#define XT_XACT_WRITE_LOCK(i, s) xt_fastrwlock_xlock(i, s) -#define XT_XACT_UNLOCK(i, s) xt_fastrwlock_unlock(i, s) -#elif defined(XT_XACT_USE_PTHREAD_RW) +#if defined(XT_XACT_USE_PTHREAD_RW) #define 
XT_XACT_LOCK_TYPE xt_rwlock_type #define XT_XACT_INIT_LOCK(s, i) xt_init_rwlock(s, i) #define XT_XACT_FREE_LOCK(s, i) xt_free_rwlock(i) #define XT_XACT_READ_LOCK(i, s) xt_slock_rwlock_ns(i) #define XT_XACT_WRITE_LOCK(i, s) xt_xlock_rwlock_ns(i) -#define XT_XACT_UNLOCK(i, s) xt_unlock_rwlock_ns(i) -#elif defined(XT_XACT_USE_RW_MUTEX) -#define XT_XACT_LOCK_TYPE XTRWMutexRec -#define XT_XACT_INIT_LOCK(s, i) xt_rwmutex_init(s, i) -#define XT_XACT_FREE_LOCK(s, i) xt_rwmutex_free(s, i) -#define XT_XACT_READ_LOCK(i, s) xt_rwmutex_slock(i, (s)->t_id) -#define XT_XACT_WRITE_LOCK(i, s) xt_rwmutex_xlock(i, (s)->t_id) -#define XT_XACT_UNLOCK(i, s) xt_rwmutex_unlock(i, (s)->t_id) +#define XT_XACT_UNLOCK(i, s, b) xt_unlock_rwlock_ns(i) +#elif defined(XT_XACT_USE_SPINXSLOCK) +#define XT_XACT_LOCK_TYPE XTSpinXSLockRec +#define XT_XACT_INIT_LOCK(s, i) xt_spinxslock_init_with_autoname(s, i) +#define XT_XACT_FREE_LOCK(s, i) xt_spinxslock_free(s, i) +#define XT_XACT_READ_LOCK(i, s) xt_spinxslock_slock(i) +#define XT_XACT_WRITE_LOCK(i, s) xt_spinxslock_xlock(i, (s)->t_id) +#define XT_XACT_UNLOCK(i, s, b) xt_spinxslock_unlock(i, b) #else -#define XT_XACT_LOCK_TYPE XTSpinLockRec -#define XT_XACT_INIT_LOCK(s, i) xt_spinlock_init_with_autoname(s, i) -#define XT_XACT_FREE_LOCK(s, i) xt_spinlock_free(s, i) -#define XT_XACT_READ_LOCK(i, s) xt_spinlock_lock(i) -#define XT_XACT_WRITE_LOCK(i, s) xt_spinlock_lock(i) -#define XT_XACT_UNLOCK(i, s) xt_spinlock_unlock(i) +#define XT_XACT_LOCK_TYPE XTSkewRWLockRec +#define XT_XACT_INIT_LOCK(s, i) xt_skewrwlock_init_with_autoname(s, i) +#define XT_XACT_FREE_LOCK(s, i) xt_skewrwlock_free(s, i) +#define XT_XACT_READ_LOCK(i, s) xt_skewrwlock_slock(i) +#define XT_XACT_WRITE_LOCK(i, s) xt_skewrwlock_xlock(i, (s)->t_id) +#define XT_XACT_UNLOCK(i, s, b) xt_skewrwlock_unlock(i, b) #endif /* We store the transactions in a number of segments, each diff --git a/storage/pbxt/src/xactlog_xt.cc b/storage/pbxt/src/xactlog_xt.cc index 82c0d85b770..fac32ea0a09 100644 
--- a/storage/pbxt/src/xactlog_xt.cc +++ b/storage/pbxt/src/xactlog_xt.cc @@ -28,6 +28,10 @@ #include "xt_config.h" +#ifdef DRIZZLED +#include <bitset> +#endif + #include <signal.h> #include "xactlog_xt.h" @@ -600,7 +604,12 @@ void XTDatabaseLog::xlog_setup(XTThreadPtr self, XTDatabaseHPtr db, off_t inp_lo xt_init_mutex_with_autoname(self, &xl_write_lock); xt_init_cond(self, &xl_write_cond); +#ifdef XT_XLOG_WAIT_SPINS xt_writing = 0; + xt_waiting = 0; +#else + xt_writing = FALSE; +#endif xl_log_id = 0; xl_log_file = 0; @@ -752,6 +761,7 @@ xtBool XTDatabaseLog::xlog_append(XTThreadPtr thread, size_t size1, xtWord1 *dat xtLogOffset req_flush_log_offset; size_t part_size; xtWord8 flush_time; + xtWord2 sum; if (!size1) { /* Just flush the buffer... */ @@ -790,13 +800,13 @@ xtBool XTDatabaseLog::xlog_append(XTThreadPtr thread, size_t size1, xtWord1 *dat * enough space in the buffer, or a flush * is required. */ + xtWord8 then; /* * The objective of the following code is to * pick one writer, out of all threads. - * The others rest will wait for the writer. + * The rest will wait for the writer. 
*/ - xtBool i_am_writer; if (write_reason == WR_FLUSH) { /* Before we flush, check if we should wait for running @@ -805,8 +815,7 @@ xtBool XTDatabaseLog::xlog_append(XTThreadPtr thread, size_t size1, xtWord1 *dat if (xl_db->db_xn_writer_count - xl_db->db_xn_writer_wait_count - xl_db->db_xn_long_running_count > 0 && xl_last_flush_time) { /* Wait for about as long as the last flush took, * the idea is to saturate the disk with flushing...: */ - xtWord8 then = xt_trace_clock() + (xtWord8) xl_last_flush_time; - + then = xt_trace_clock() + (xtWord8) xl_last_flush_time; for (;;) { xt_critical_wait(); /* If a thread leaves this loop because times up, or @@ -831,6 +840,55 @@ xtBool XTDatabaseLog::xlog_append(XTThreadPtr thread, size_t size1, xtWord1 *dat } } +#ifdef XT_XLOG_WAIT_SPINS + /* Spin for 1/1000s: */ + then = xt_trace_clock() + (xtWord8) 1000; + for (;;) { + if (!xt_atomic_tas4(&xt_writing, 1)) + break; + + /* If I am not the writer, then I just waited for the + * writer. So it may be that my requirements have now + * been met! + */ + if (write_reason == WR_FLUSH) { + /* If the reason was to flush, then + * check the last flush sequence, maybe it is passed + * our required sequence. + */ + if (xt_comp_log_pos(req_flush_log_id, req_flush_log_offset, xl_flush_log_id, xl_flush_log_offset) <= 0) { + /* The required flush position of the log is before + * or equal to the actual flush position. This means the condition + * for this thread have been satified (via group commit). + * Nothing more to do! 
+ */ + ASSERT_NS(xt_comp_log_pos(xl_write_log_id, xl_write_log_offset, xl_append_log_id, xl_append_log_offset) <= 0); + return OK; + } + } + else { + /* It may be that there is now space in the append buffer: */ + if (xl_append_buf_pos + size1 + size2 <= xl_size_of_buffers) + goto copy_to_log_buffer; + } + + if (xt_trace_clock() >= then) { + xt_lock_mutex_ns(&xl_write_lock); + xt_waiting++; + if (!xt_timed_wait_cond_ns(&xl_write_cond, &xl_write_lock, 500)) { + xt_waiting--; + xt_unlock_mutex_ns(&xl_write_lock); + return FALSE; + } + xt_waiting--; + xt_unlock_mutex_ns(&xl_write_lock); + } + else + xt_critical_wait(); + } +#else + xtBool i_am_writer; + i_am_writer = FALSE; xt_lock_mutex_ns(&xl_write_lock); if (xt_writing) { @@ -873,6 +931,7 @@ xtBool XTDatabaseLog::xlog_append(XTThreadPtr thread, size_t size1, xtWord1 *dat goto write_log_to_file; } +#endif /* I am the writer, check the conditions, again: */ if (write_reason == WR_FLUSH) { @@ -881,8 +940,14 @@ xtBool XTDatabaseLog::xlog_append(XTThreadPtr thread, size_t size1, xtWord1 *dat /* The writers required flush position is before or equal * to the actual position, so the writer is done... 
*/ +#ifdef XT_XLOG_WAIT_SPINS + xt_writing = 0; + if (xt_waiting) + xt_cond_wakeall(&xl_write_cond); +#else xt_writing = FALSE; xt_cond_wakeall(&xl_write_cond); +#endif ASSERT_NS(xt_comp_log_pos(xl_write_log_id, xl_write_log_offset, xl_append_log_id, xl_append_log_offset) <= 0); return OK; } @@ -923,8 +988,14 @@ xtBool XTDatabaseLog::xlog_append(XTThreadPtr thread, size_t size1, xtWord1 *dat xt_unlock_mutex_ns(&xl_db->db_wr_lock); } } +#ifdef XT_XLOG_WAIT_SPINS + xt_writing = 0; + if (xt_waiting) + xt_cond_wakeall(&xl_write_cond); +#else xt_writing = FALSE; xt_cond_wakeall(&xl_write_cond); +#endif ASSERT_NS(xt_comp_log_pos(xl_write_log_id, xl_write_log_offset, xl_append_log_id, xl_append_log_offset) <= 0); return ok; } @@ -934,8 +1005,14 @@ xtBool XTDatabaseLog::xlog_append(XTThreadPtr thread, size_t size1, xtWord1 *dat * to copy our data into the buffer: */ if (xl_append_buf_pos + size1 + size2 <= xl_size_of_buffers) { +#ifdef XT_XLOG_WAIT_SPINS + xt_writing = 0; + if (xt_waiting) + xt_cond_wakeall(&xl_write_cond); +#else xt_writing = FALSE; xt_cond_wakeall(&xl_write_cond); +#endif goto copy_to_log_buffer; } } @@ -1055,6 +1132,7 @@ xtBool XTDatabaseLog::xlog_append(XTThreadPtr thread, size_t size1, xtWord1 *dat /* [(8)] Flush the compactor log. 
*/ xt_lock_mutex_ns(&xl_db->db_co_dlog_lock); if (!xl_db->db_co_thread->st_dlog_buf.dlb_flush_log(TRUE, thread)) { + xl_log_bytes_written -= part_size; xt_unlock_mutex_ns(&xl_db->db_co_dlog_lock); goto write_failed; } @@ -1063,8 +1141,10 @@ xtBool XTDatabaseLog::xlog_append(XTThreadPtr thread, size_t size1, xtWord1 *dat /* And flush if required: */ flush_time = thread->st_statistics.st_xlog.ts_flush_time; - if (!xt_flush_file(xl_log_file, &thread->st_statistics.st_xlog, thread)) + if (!xt_flush_file(xl_log_file, &thread->st_statistics.st_xlog, thread)) { + xl_log_bytes_written -= part_size; goto write_failed; + } xl_last_flush_time = (u_int) (thread->st_statistics.st_xlog.ts_flush_time - flush_time); xl_log_bytes_flushed = xl_log_bytes_written; @@ -1085,8 +1165,14 @@ xtBool XTDatabaseLog::xlog_append(XTThreadPtr thread, size_t size1, xtWord1 *dat * position, continue writing: */ goto rewrite; +#ifdef XT_XLOG_WAIT_SPINS + xt_writing = 0; + if (xt_waiting) + xt_cond_wakeall(&xl_write_cond); +#else xt_writing = FALSE; xt_cond_wakeall(&xl_write_cond); +#endif ASSERT_NS(xt_comp_log_pos(xl_write_log_id, xl_write_log_offset, xl_append_log_id, xl_append_log_offset) <= 0); return OK; } @@ -1100,8 +1186,14 @@ xtBool XTDatabaseLog::xlog_append(XTThreadPtr thread, size_t size1, xtWord1 *dat if (xl_append_buf_pos + size1 + size2 > xl_size_of_buffers) goto rewrite; +#ifdef XT_XLOG_WAIT_SPINS + xt_writing = 0; + if (xt_waiting) + xt_cond_wakeall(&xl_write_cond); +#else xt_writing = FALSE; xt_cond_wakeall(&xl_write_cond); +#endif } copy_to_log_buffer: @@ -1146,8 +1238,6 @@ xtBool XTDatabaseLog::xlog_append(XTThreadPtr thread, size_t size1, xtWord1 *dat case XT_LOG_ENT_DELETE_BG: case XT_LOG_ENT_DELETE_FL: case XT_LOG_ENT_DELETE_FL_BG: - xtWord2 sum; - sum = XT_GET_DISK_2(record->xu.xu_checksum_2) ^ XT_CHECKSUM_2(xl_append_log_id); XT_SET_DISK_2(record->xu.xu_checksum_2, sum); @@ -1158,6 +1248,10 @@ xtBool XTDatabaseLog::xlog_append(XTThreadPtr thread, size_t size1, xtWord1 *dat 
xl_db->db_xn_total_writer_count++; } break; + case XT_LOG_ENT_REC_REMOVED_BI: + sum = XT_GET_DISK_2(record->xu.xu_checksum_2) ^ XT_CHECKSUM_2(xl_append_log_id); + XT_SET_DISK_2(record->xu.xu_checksum_2, sum); + break; case XT_LOG_ENT_ROW_NEW: case XT_LOG_ENT_ROW_NEW_FL: record->xl.xl_checksum_1 ^= XT_CHECKSUM_1(xl_append_log_id); @@ -1209,8 +1303,14 @@ xtBool XTDatabaseLog::xlog_append(XTThreadPtr thread, size_t size1, xtWord1 *dat return OK; write_failed: +#ifdef XT_XLOG_WAIT_SPINS + xt_writing = 0; + if (xt_waiting) + xt_cond_wakeall(&xl_write_cond); +#else xt_writing = FALSE; xt_cond_wakeall(&xl_write_cond); +#endif return FAILED; } @@ -1595,7 +1695,7 @@ void XTDatabaseLog::xlog_seq_close(XTXactSeqReadPtr seq) seq->xseq_log_eof = 0; } -xtBool XTDatabaseLog::xlog_seq_start(XTXactSeqReadPtr seq, xtLogID log_id, xtLogOffset log_offset, xtBool missing_ok __attribute__((unused))) +xtBool XTDatabaseLog::xlog_seq_start(XTXactSeqReadPtr seq, xtLogID log_id, xtLogOffset log_offset, xtBool XT_UNUSED(missing_ok)) { if (seq->xseq_rec_log_id != log_id) { seq->xseq_rec_log_id = log_id; @@ -2094,7 +2194,9 @@ xtBool XTDatabaseLog::xlog_seq_next(XTXactSeqReadPtr seq, XTXactLogBufferDPtr *r goto return_empty; } default: - ASSERT_NS(FALSE); + /* It is possible to land here after a crash, if the + * log was not completely written. + */ seq->xseq_record_len = 0; goto return_empty; } @@ -2304,7 +2406,13 @@ static void xlog_wr_wait_for_log_flush(XTThreadPtr self, XTDatabaseHPtr db) * the wait, and the sweeper has nothing to do, and the checkpointer. 
*/ if (db->db_xn_curr_id == last_xn_id && - xt_xn_is_before(xt_xn_get_curr_id(db), db->db_xn_to_clean_id) && // db->db_xn_curr_id < db->db_xn_to_clean_id + /* Changed xt_xn_get_curr_id(db) to db->db_xn_curr_id, + * This should work because we are not concerned about the difference + * between xt_xn_get_curr_id(db) and db->db_xn_curr_id, + * Which is just a matter of when transactions we can expect ot find + * in memory (see {GAP-INC-ADD-XACT}) + */ + xt_xn_is_before(db->db_xn_curr_id, db->db_xn_to_clean_id) && // db->db_xn_curr_id < db->db_xn_to_clean_id !db->db_restart.xres_is_checkpoint_pending(db->db_xlog.xl_write_log_id, db->db_xlog.xl_write_log_offset)) { /* There seems to be no activity at the moment. * this might be a good time to write the log data. @@ -2409,9 +2517,6 @@ static void xlog_wr_main(XTThreadPtr self) if (!record) { break; } - /* Count the number of bytes read from the log: */ - db->db_xlog.xl_log_bytes_read += ws->ws_seqread.xseq_record_len; - switch (record->xl.xl_status_1) { case XT_LOG_ENT_HEADER: break; @@ -2435,6 +2540,8 @@ static void xlog_wr_main(XTThreadPtr self) xt_xres_apply_in_order(self, ws, ws->ws_seqread.xseq_rec_log_id, ws->ws_seqread.xseq_rec_log_offset, record); break; } + /* Count the number of bytes read from the log: */ + db->db_xlog.xl_log_bytes_read += ws->ws_seqread.xseq_record_len; } } @@ -2503,7 +2610,10 @@ static void *xlog_wr_run_thread(XTThreadPtr self) db->db_wr_idle = XT_THREAD_BUSY; } + /* + * {MYSQL-THREAD-KILL} myxt_destroy_thread(mysql_thread, TRUE); + */ return NULL; } diff --git a/storage/pbxt/src/xactlog_xt.h b/storage/pbxt/src/xactlog_xt.h index 391b646b53f..2db11898ec2 100644 --- a/storage/pbxt/src/xactlog_xt.h +++ b/storage/pbxt/src/xactlog_xt.h @@ -373,6 +373,13 @@ typedef struct XTXactLogFile { /* * The transaction log. Each database has one. */ + +/* Does not seem to make much difference... 
*/ +#ifndef XT_NO_ATOMICS +/* This function uses atomic ops: */ +//#define XT_XLOG_WAIT_SPINS +#endif + typedef struct XTDatabaseLog { struct XTDatabase *xl_db; @@ -390,7 +397,12 @@ typedef struct XTDatabaseLog { /* The writer log buffer: */ xt_mutex_type xl_write_lock; xt_cond_type xl_write_cond; +#ifdef XT_XLOG_WAIT_SPINS + xtWord4 xt_writing; /* 1 if a thread is writing. */ + xtWord4 xt_waiting; /* Count of the threads waiting on the xl_write_cond. */ +#else xtBool xt_writing; /* TRUE if a thread is writing. */ +#endif xtLogID xl_log_id; /* The number of the write log. */ XTOpenFilePtr xl_log_file; /* The open write log. */ diff --git a/storage/pbxt/src/xt_config.h b/storage/pbxt/src/xt_config.h index 6571ebdaebe..399789da043 100644 --- a/storage/pbxt/src/xt_config.h +++ b/storage/pbxt/src/xt_config.h @@ -81,7 +81,8 @@ const int max_connections = 500; #define DEBUG #endif // _DEBUG #else -#define XT_STREAMING +// Paul suggested to disable PBMS in MariaDB for now. +// #define PBMS_ENABLED #endif #ifdef __FreeBSD__ @@ -96,4 +97,22 @@ const int max_connections = 500; #define XT_SOLARIS #endif +/* + * Definition of which atomic operations to use: + */ +#ifdef XT_WIN +/* MS Studio style embedded assembler for x86 */ +#define XT_ATOMIC_WIN32_X86 +#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__)) +/* Use GNU style embedded assembler for x86 */ +#define XT_ATOMIC_GNUC_X86 +#elif defined(XT_SOLARIS) +/* Use Sun atomic operations library + * http://docs.sun.com/app/docs/doc/816-5168/atomic-ops-3c?a=view + */ +#define XT_ATOMIC_SOLARIS_LIB +#else +#define XT_NO_ATOMICS +#endif + #endif diff --git a/storage/pbxt/src/xt_defs.h b/storage/pbxt/src/xt_defs.h index 16981ddc672..98867f746c9 100644 --- a/storage/pbxt/src/xt_defs.h +++ b/storage/pbxt/src/xt_defs.h @@ -187,7 +187,16 @@ typedef struct XTPathStr { char ps_path[XT_VAR_LENGTH]; } *XTPathStrPtr; -#define XT_UNUSED(x) x __attribute__((__unused__)) +//#define XT_UNUSED(x) x __attribute__((__unused__)) 
+#define XT_UNUSED(x) + +/* Only used when DEBUG is on: */ +#ifdef DEBUG +#define XT_NDEBUG_UNUSED(x) x +#else +//#define XT_NDEBUG_UNUSED(x) x __attribute__((__unused__)) +#define XT_NDEBUG_UNUSED(x) +#endif /* ---------------------------------------------------------------------- * MAIN CONSTANTS @@ -267,8 +276,10 @@ typedef struct XTPathStr { * the row list is scanned. * * For more details see [(9)]. + * 223, 1019, 3613 */ -#define XT_ROW_RWLOCKS 223 +#define XT_ROW_RWLOCKS 1019 +//#define XT_ROW_RWLOCKS 223 /* * These are the number of row lock "slots" per table. @@ -292,31 +303,20 @@ typedef struct XTPathStr { */ #define XT_OPEN_TABLE_FREE_TIME 30 -#ifdef XT_USE_GLOBAL_DEBUG_SIZES -/* - * DEBUG SIZES! - * Reduce the thresholds to make things happen faster. +/* Define this in order to use memory mapped files + * (record and row pointer files only). + * + * This makes no difference in sysbench R/W performance + * test. */ +//#define XT_USE_ROW_REC_MMAP_FILES -//#undef XT_ROW_RWLOCKS -//#define XT_ROW_RWLOCKS 2 - -//#undef XT_TAB_MIN_VAR_REC_LENGTH -//#define XT_TAB_MIN_VAR_REC_LENGTH 20 - -//#undef XT_ROW_LOCK_COUNT -//#define XT_ROW_LOCK_COUNT (XT_ROW_RWLOCKS * 2) - -//#undef XT_INDEX_PAGE_SHIFTS -//#define XT_INDEX_PAGE_SHIFTS 12 - -//#undef XT_INDEX_WRITE_BUFFER_SIZE -//#define XT_INDEX_WRITE_BUFFER_SIZE (40 * 1024) - -#endif - -/* Define this in order to use memory mapped files: */ -#define XT_USE_ROW_REC_MMAP_FILES +/* Define this if sequential scan should load data into the + * record cache. + * + * This is the way InnoDB behaves. + */ +#define XT_SEQ_SCAN_LOADS_CACHE /* Define this in order to use direct I/O on index files: */ /* NOTE: DO NOT ENABLE! 
@@ -326,32 +326,34 @@ typedef struct XTPathStr { */ //#define XT_USE_DIRECT_IO_ON_INDEX -#ifdef XT_USE_ROW_REC_MMAP_FILES - -#define XT_SEQ_SCAN_FROM_MEMORY -#define XT_ROW_REC_FILE_PTR XTMapFilePtr -#define XT_PWRITE_RR_FILE xt_pwrite_fmap -#define XT_PREAD_RR_FILE xt_pread_fmap -#define XT_FLUSH_RR_FILE xt_flush_fmap -#define XT_CLOSE_RR_FILE_NS xt_close_fmap_ns - -#else - -#define XT_ROW_REC_FILE_PTR XTOpenFilePtr -#define XT_PWRITE_RR_FILE xt_pwrite_file -#define XT_PREAD_RR_FILE xt_pread_file -#define XT_FLUSH_RR_FILE xt_flush_file -#define XT_CLOSE_RR_FILE_NS xt_close_file_ns +/* + * Define this variable if PBXT should do lazy deleting in indexes + * Note, even if the variable is not defined, PBXT will handle + * lazy deleted items in an index. + * + * NOTE: This can cause significant degrade of index scan speed. + * 25% on sysbench readonly index scan tests. + */ +//#define XT_USE_LAZY_DELETE -#endif +/* + * Define this variable if a connection should wait for the + * sweeper to clean up previous transactions executed by the + * connection, before continuing. + * + * The number of transactions that the sweeper is aload to + * lag can be dynamic, but there is a limit (XT_MAX_XACT_BEHIND) + */ +#define XT_WAIT_FOR_CLEANUP -#ifdef XT_SEQ_SCAN_FROM_MEMORY -#define XT_LOCK_MEMORY_PTR(x, f, a, s, v, c) do { x = xt_lock_fmap_ptr(f, a, s, v, c); } while (0) -#define XT_UNLOCK_MEMORY_PTR(f, v) xt_unlock_fmap_ptr(f, v); -#else -#define XT_LOCK_MEMORY_PTR(x, f, a, v, c) -#define XT_UNLOCK_MEMORY_PTR(f, v) -#endif +/* + * This seems to be the optimal value, at least according to + * sysbench/sysbench run --test=oltp --num-threads=128 --max-requests=50000 --mysql-user=root + * --oltp-table-size=100000 --oltp-table-name=sb_pbxt --mysql-engine-trx=yes + * + * Using 8, 16 and 128 threads. 
+ */ +#define XT_MAX_XACT_BEHIND 2 /* {NO-ACTION-BUG} * Define this to implement NO ACTION correctly @@ -405,6 +407,60 @@ typedef struct XTPathStr { #define XT_ADD_PTR(p, l) ((void *) ((char *) (p) + (l))) /* ---------------------------------------------------------------------- + * DEFINES DEPENDENT ON CONSTANTS + */ + +#ifdef XT_USE_ROW_REC_MMAP_FILES + +#define XT_ROW_REC_FILE_PTR XTMapFilePtr +#define XT_PWRITE_RR_FILE xt_pwrite_fmap +#define XT_PREAD_RR_FILE xt_pread_fmap +#define XT_FLUSH_RR_FILE xt_flush_fmap +#define XT_CLOSE_RR_FILE_NS xt_close_fmap_ns + +#define XT_LOCK_MEMORY_PTR(x, f, a, s, v, c) do { x = xt_lock_fmap_ptr(f, a, s, v, c); } while (0) +#define XT_UNLOCK_MEMORY_PTR(f, d, e, v) do { xt_unlock_fmap_ptr(f, v); d = NULL; } while (0) + +#else + +#define XT_ROW_REC_FILE_PTR XTOpenFilePtr +#define XT_PWRITE_RR_FILE xt_pwrite_file +#define XT_PREAD_RR_FILE xt_pread_file +#define XT_FLUSH_RR_FILE xt_flush_file +#define XT_CLOSE_RR_FILE_NS xt_close_file_ns + +#define XT_LOCK_MEMORY_PTR(x, f, a, s, v, c) do { if (!xt_lock_file_ptr(f, &x, a, s, v, c)) x = NULL; } while (0) +#define XT_UNLOCK_MEMORY_PTR(f, d, e, v) do { if (e) { xt_unlock_file_ptr(f, d, v); d = NULL; } } while (0) + +#endif + +/* ---------------------------------------------------------------------- + * DEBUG SIZES! + * Reduce the thresholds to make things happen faster. 
+ */ + +#ifdef XT_USE_GLOBAL_DEBUG_SIZES + +//#undef XT_ROW_RWLOCKS +//#define XT_ROW_RWLOCKS 2 + +//#undef XT_TAB_MIN_VAR_REC_LENGTH +//#define XT_TAB_MIN_VAR_REC_LENGTH 20 + +//#undef XT_ROW_LOCK_COUNT +//#define XT_ROW_LOCK_COUNT (XT_ROW_RWLOCKS * 2) + +//#undef XT_INDEX_PAGE_SHIFTS +//#define XT_INDEX_PAGE_SHIFTS 8 // 256 +//#undef XT_BLOCK_SIZE_FOR_DIRECT_IO +//#define XT_BLOCK_SIZE_FOR_DIRECT_IO 256 + +//#undef XT_INDEX_WRITE_BUFFER_SIZE +//#define XT_INDEX_WRITE_BUFFER_SIZE (40 * 1024) + +#endif + +/* ---------------------------------------------------------------------- * BYTE ORDER */ @@ -645,6 +701,14 @@ typedef struct xtIndexNodeID { #define XT_XACT_ID_SIZE 4 #define XT_CHECKSUM4_XACT(x) (x) +#ifdef XT_WIN +#define __FUNC__ __FUNCTION__ +#elif defined(XT_SOLARIS) +#define __FUNC__ "__func__" +#else +#define __FUNC__ __PRETTY_FUNCTION__ +#endif + /* ---------------------------------------------------------------------- * GLOBAL VARIABLES */ @@ -669,6 +733,7 @@ extern xtBool pbxt_crash_debug; #define MYSQL_THD Session * #define THR_THD THR_Session #define STRUCT_TABLE class Table +#define TABLE_SHARE TableShare #define MYSQL_TYPE_STRING DRIZZLE_TYPE_VARCHAR #define MYSQL_TYPE_VARCHAR DRIZZLE_TYPE_VARCHAR @@ -687,6 +752,7 @@ extern xtBool pbxt_crash_debug; #define mx_tmp_use_all_columns(x, y) (x)->use_all_columns(y) #define mx_tmp_restore_column_map(x, y) (x)->restore_column_map(y) +#define MX_BIT_FAST_TEST_AND_SET(x, y) bitmap_test_and_set(x, y) #define MX_TABLE_TYPES_T handler::Table_flags #define MX_UINT8_T uint8_t @@ -696,6 +762,7 @@ extern xtBool pbxt_crash_debug; #define MX_CHARSET_INFO struct charset_info_st #define MX_CONST_CHARSET_INFO const struct charset_info_st #define MX_CONST const + #define my_bool bool #define int16 int16_t #define int32 int32_t @@ -712,6 +779,9 @@ extern xtBool pbxt_crash_debug; #define HA_CAN_SQL_HANDLER 0 #define HA_CAN_INSERT_DELAYED 0 +#define HA_BINLOG_ROW_CAPABLE 0 +#define HA_BINLOG_STMT_CAPABLE 0 +#define 
HA_CACHE_TBL_TRANSACT 0 #define max cmax #define min cmin @@ -734,6 +804,7 @@ extern xtBool pbxt_crash_debug; #define thd_tablespace_op session_tablespace_op #define thd_alloc session_alloc #define thd_make_lex_string session_make_lex_string +#define column_bitmaps_signal() #define my_pthread_setspecific_ptr(T, V) pthread_setspecific(T, (void*) (V)) @@ -750,6 +821,9 @@ extern xtBool pbxt_crash_debug; (((uint32_t) (((const unsigned char*) (A))[1])) << 16) +\ (((uint32_t) (((const unsigned char*) (A))[0])) << 24))) +class PBXTStorageEngine; +typedef PBXTStorageEngine handlerton; + #else // DRIZZLED /* The MySQL case: */ #if MYSQL_VERSION_ID >= 60008 @@ -760,6 +834,7 @@ extern xtBool pbxt_crash_debug; #define mx_tmp_use_all_columns dbug_tmp_use_all_columns #define mx_tmp_restore_column_map(x, y) dbug_tmp_restore_column_map((x)->read_set, y) +#define MX_BIT_FAST_TEST_AND_SET(x, y) bitmap_fast_test_and_set(x, y) #define MX_TABLE_TYPES_T ulonglong #define MX_UINT8_T uint8 @@ -772,6 +847,11 @@ extern xtBool pbxt_crash_debug; #endif // DRIZZLED +#define MX_BITMAP MY_BITMAP +#define MX_BIT_SIZE() n_bits +#define MX_BIT_IS_SUBSET(x, y) bitmap_is_subset(x, y) +#define MX_BIT_SET(x, y) bitmap_set_bit(x, y) + #ifndef XT_SCAN_CORE_DEFINED #define XT_SCAN_CORE_DEFINED xtBool xt_mm_scan_core(void); diff --git a/storage/pbxt/src/xt_errno.h b/storage/pbxt/src/xt_errno.h index 4d74589efe3..60e43d5fdfa 100644 --- a/storage/pbxt/src/xt_errno.h +++ b/storage/pbxt/src/xt_errno.h @@ -117,6 +117,8 @@ #define XT_ERR_NEW_TYPE_OF_XLOG -93 #define XT_ERR_NO_BEFORE_IMAGE -94 #define XT_ERR_FK_REF_TEMP_TABLE -95 +#define XT_ERR_MYSQL_SHUTDOWN -98 +#define XT_ERR_MYSQL_NO_THREAD -99 #ifdef XT_WIN #define XT_ENOMEM ERROR_NOT_ENOUGH_MEMORY diff --git a/storage/xtradb/Makefile.am b/storage/xtradb/Makefile.am index 8f64aedb9b0..3b5826f33e6 100644 --- a/storage/xtradb/Makefile.am +++ b/storage/xtradb/Makefile.am @@ -131,7 +131,8 @@ noinst_HEADERS= include/btr0btr.h include/btr0btr.ic \ 
include/ut0list.ic include/ut0wqueue.h \ include/ha_prototypes.h handler/ha_innodb.h \ include/handler0alter.h \ - handler/i_s.h handler/innodb_patch_info.h + handler/i_s.h handler/innodb_patch_info.h \ + handler/handler0vars.h EXTRA_LIBRARIES= libinnobase.a noinst_LIBRARIES= @plugin_innobase_static_target@ diff --git a/storage/xtradb/buf/buf0flu.c b/storage/xtradb/buf/buf0flu.c index 752380d116c..3122cbabee7 100644 --- a/storage/xtradb/buf/buf0flu.c +++ b/storage/xtradb/buf/buf0flu.c @@ -1120,6 +1120,10 @@ retry_lock_1: /* Try to flush also all the neighbors */ page_count += buf_flush_try_neighbors( space, offset, flush_type, srv_flush_neighbor_pages); + mutex_t* block_mutex; + buf_page_t* bpage_tmp; + block_mutex = buf_page_get_mutex(bpage); + bpage_tmp = buf_page_hash_get(space, offset); /* fprintf(stderr, "Flush type %lu, page no %lu, neighb %lu\n", flush_type, offset, @@ -1234,7 +1238,8 @@ buf_flush_LRU_recommendation(void) + BUF_FLUSH_EXTRA_MARGIN) && (distance < BUF_LRU_FREE_SEARCH_LEN)) { - if (!bpage->in_LRU_list) { + mutex_t* block_mutex; + if (!bpage->in_LRU_list) { /* reatart. 
but it is very optimistic */ bpage = UT_LIST_GET_LAST(buf_pool->LRU); continue; diff --git a/storage/xtradb/fil/fil0fil.c b/storage/xtradb/fil/fil0fil.c index ba46bfc9478..c60136e27c0 100644 --- a/storage/xtradb/fil/fil0fil.c +++ b/storage/xtradb/fil/fil0fil.c @@ -45,7 +45,9 @@ Created 10/25/1995 Heikki Tuuri #include "trx0trx.h" #include "trx0sys.h" #include "pars0pars.h" +#include "row0row.h" #include "row0mysql.h" +#include "que0que.h" /* @@ -3131,13 +3133,13 @@ skip_info: mem_heap_t* heap = NULL; ulint offsets_[REC_OFFS_NORMAL_SIZE]; ulint* offsets = offsets_; - ib_int64_t offset; + ib_int64_t offset; size = (ulint) (size_bytes / UNIV_PAGE_SIZE); /* over write space id of all pages */ rec_offs_init(offsets_); - fprintf(stderr, "InnoDB: Progress in %:"); + fprintf(stderr, "%s", "InnoDB: Progress in %:"); for (offset = 0; offset < size_bytes; offset += UNIV_PAGE_SIZE) { success = os_file_read(file, page, diff --git a/storage/xtradb/handler/i_s.cc b/storage/xtradb/handler/i_s.cc index 89f40bb7151..e45f2b4dac0 100644 --- a/storage/xtradb/handler/i_s.cc +++ b/storage/xtradb/handler/i_s.cc @@ -763,7 +763,7 @@ i_s_innodb_buffer_pool_pages_index_fill( dict_index_t* index; dulint index_id; - char *p; + const char *p; char db_name_raw[NAME_LEN*5+1], db_name[NAME_LEN+1]; char table_name_raw[NAME_LEN*5+1], table_name[NAME_LEN+1]; diff --git a/storage/xtradb/mtr/mtr0mtr.c b/storage/xtradb/mtr/mtr0mtr.c index 703f9b18eed..1f811a6eea3 100644 --- a/storage/xtradb/mtr/mtr0mtr.c +++ b/storage/xtradb/mtr/mtr0mtr.c @@ -32,6 +32,7 @@ Created 11/26/1995 Heikki Tuuri #include "page0types.h" #include "mtr0log.h" #include "log0log.h" +#include "buf0flu.h" /********************************************************************* Releases the item in the slot given. 
*/ diff --git a/storage/xtradb/srv/srv0srv.c b/storage/xtradb/srv/srv0srv.c index c81a5e17f55..d261a69991c 100644 --- a/storage/xtradb/srv/srv0srv.c +++ b/storage/xtradb/srv/srv0srv.c @@ -1816,7 +1816,6 @@ srv_printf_innodb_monitor( ulint btr_search_sys_subtotal; ulint lock_sys_subtotal; ulint recv_sys_subtotal; - ulint io_counter_subtotal; ulint i; trx_t* trx; @@ -2696,6 +2695,8 @@ loop: ib_uint64_t level, bpl; buf_page_t* bpage; + mutex_exit(&(log_sys->mutex)); + mutex_exit(&(log_sys->mutex)); mutex_enter(&flush_list_mutex); diff --git a/storage/xtradb/srv/srv0start.c b/storage/xtradb/srv/srv0start.c index 9239af0cad5..7140b59964d 100644 --- a/storage/xtradb/srv/srv0start.c +++ b/storage/xtradb/srv/srv0start.c @@ -122,20 +122,6 @@ static char* srv_monitor_file_name; #define SRV_MAX_N_PENDING_SYNC_IOS 100 -/* Avoid warnings when using purify */ - -#ifdef HAVE_valgrind -static int inno_bcmp(register const char *s1, register const char *s2, - register uint len) -{ - while ((len-- != 0) && (*s1++ == *s2++)) - ; - - return(len + 1); -} -#define memcmp(A,B,C) inno_bcmp((A),(B),(C)) -#endif - static char* srv_parse_megabytes( diff --git a/strings/decimal.c b/strings/decimal.c index 136ce98c12b..90834a90dbe 100644 --- a/strings/decimal.c +++ b/strings/decimal.c @@ -30,7 +30,7 @@ integer that determines the number of significant digits in a particular radix R, where R is either 2 or 10. S is a non-negative integer. Every value of an exact numeric type of scale S is of the - form n*10^{-S}, where n is an integer such that -R^P <= n <= R^P. + form n*10^{-S}, where n is an integer such that �-R^P <= n <= R^P. [...] 
@@ -306,7 +306,7 @@ int decimal_actual_fraction(decimal_t *from) { for (i= DIG_PER_DEC1 - ((frac - 1) % DIG_PER_DEC1); *buf0 % powers10[i++] == 0; - frac--) ; + frac--) {} } return frac; } @@ -500,7 +500,7 @@ static void digits_bounds(decimal_t *from, int *start_result, int *end_result) stop= (int) ((buf_end - from->buf + 1) * DIG_PER_DEC1); i= 1; } - for (; *buf_end % powers10[i++] == 0; stop--) ; + for (; *buf_end % powers10[i++] == 0; stop--) {} *end_result= stop; /* index of position after last decimal digit (from 0) */ } @@ -1011,7 +1011,7 @@ static int ull2dec(ulonglong from, decimal_t *to) sanity(to); - for (intg1=1; from >= DIG_BASE; intg1++, from/=DIG_BASE) ; + for (intg1=1; from >= DIG_BASE; intg1++, from/=DIG_BASE) {} if (unlikely(intg1 > to->len)) { intg1=to->len; diff --git a/support-files/ccfilter b/support-files/ccfilter new file mode 100644 index 00000000000..e2957cd3228 --- /dev/null +++ b/support-files/ccfilter @@ -0,0 +1,104 @@ +#! /usr/bin/perl + +# Post-processor for compiler output to filter out warnings matched in +# support-files/compiler_warnings.supp. This makes it easier to check +# that no new warnings are introduced without needing to submit a build +# for Buildbot. +# +# Use by setting CC="ccfilter gcc" CXX="ccfilter gcc" before ./configure. +# +# By default, just filters the output for suppressed warnings. If the +# FAILONWARNING environment variable is set, then instead will fail the +# compile on encountering a non-suppressed warnings. + +use strict; +use warnings; + +my $suppressions; + +open STDOUT_COPY, ">&STDOUT" + or die "Failed to dup stdout: $!]n"; + +my $pid= open(PIPE, '-|'); + +if (!defined($pid)) { + die "Error: Cannot fork(): $!\n"; +} elsif (!$pid) { + # Child. + # actually want to send the STDERR to the parent, not the STDOUT. + # So shuffle things around a bit. 
+ open STDERR, ">&STDOUT" + or die "Child: Failed to dup pipe to parent: $!\n"; + open STDOUT, ">&STDOUT_COPY" + or die "Child: Failed to dup parent stdout: $!\n"; + close STDOUT_COPY; + exec { $ARGV[0] } @ARGV; + die "Child: exec() failed: $!\n"; +} else { + # Parent. + close STDOUT_COPY; + my $cwd= qx(pwd); + chomp($cwd); + while (<PIPE>) { + my $line= $_; + if (/^(.*?):([0-9]+): [Ww]arning: (.*)$/) { + my ($file, $lineno, $msg)= ($1, $2, $3); + $file= "$cwd/$file"; + + next + if check_if_suppressed($file, $lineno, $msg); + die "$line\nGot warning, terminating.\n" + if $ENV{FAILONWARNING}; + print STDERR $line; + next; + } + + print STDERR $line; + } + close(PIPE); +} + +exit 0; + +sub check_if_suppressed { + my ($file, $lineno, $msg)= @_; + load_suppressions() unless defined($suppressions); + for my $s (@$suppressions) { + my ($file_re, $msg_re, $start, $end)= @$s; + if ($file =~ /$file_re/ && + $msg =~ /$msg_re/ && + (!defined($start) || $start <= $lineno) && + (!defined($end) || $end >= $lineno)) { + return 1; + } + } + return undef; +} + +sub load_suppressions { + # First find the suppressions file, might be we need to move up to + # the base directory. + my $path = "support-files/compiler_warnings.supp"; + my $exists; + for (1..10) { + $exists= -f $path; + last if $exists; + $path= '../'. $path; + } + die "Error: Could not find suppression file (out of source dir?).\n" + unless $exists; + + $suppressions= []; + open "F", "<", $path + or die "Error: Could not read suppression file '$path': $!\n"; + while (<F>) { + # Skip comment and empty lines. 
+ next if /^\s*(\#.*)?$/; + die "Invalid syntax in suppression file '$path', line $.:\n$_" + unless /^\s*(.+?)\s*:\s*(.+?)\s*(?:[:]\s*([0-9]+)(?:-([0-9]+))?\s*)?$/; + my ($file_re, $line_re, $start, $end)= ($1, $2, $3, $4); + $end = $start + if defined($start) && !defined($end); + push @$suppressions, [$file_re, $line_re, $start, $end]; + } +} diff --git a/support-files/compiler_warnings.supp b/support-files/compiler_warnings.supp index fbe4b3d21c9..535b8666ec1 100644 --- a/support-files/compiler_warnings.supp +++ b/support-files/compiler_warnings.supp @@ -25,6 +25,9 @@ sql_yacc.cc : .*switch statement contains 'default' but no 'case' labels.* pars0grm.tab.c: .*'yyerrorlab' : unreferenced label.* _flex_tmp.c: .*not enough actual parameters for macro 'yywrap'.* pars0lex.l: .*conversion from 'ulint' to 'int', possible loss of data.* +btr/btr0cur\.c: .*value computed is not used.*: 3175-3375 +include/buf0buf\.ic: unused parameter ‘mtr’ +fil/fil0fil\.c: comparison between signed and unsigned : 3100-3199 # # bdb is not critical to keep up to date @@ -41,6 +44,12 @@ db_vrfy.c : .*comparison is always false due to limited range of data type.* .*/cmd-line-utils/readline/.* : .* # +# Ignore some warnings in libevent, which is not maintained by us. +# +.*/extra/libevent/.* : .*unused parameter.* +.*/extra/libevent/select\.c : .*comparison between signed and unsigned.* : 270-280 + +# # Ignore all conversion warnings on windows 64 # (Is safe as we are not yet supporting strings >= 2G) # @@ -75,6 +84,17 @@ db_vrfy.c : .*comparison is always false due to limited range of data type.* storage/maria/ma_pagecache.c: .*'info_check_pin' defined but not used # +# I think these are due to mix of C and C++. +# +storage/pbxt/ : typedef.*was ignored in this declaration + + +# +# Groff warnings on OpenSUSE. +# +.*/dbug/.*(groff|<standard input>) : .* + +# # Unexplanable (?) 
stuff # listener.cc : .*conversion from 'SOCKET' to 'int'.* diff --git a/unittest/mysys/waiting_threads-t.c b/unittest/mysys/waiting_threads-t.c index 809c25657e5..d6c8dc31025 100644 --- a/unittest/mysys/waiting_threads-t.c +++ b/unittest/mysys/waiting_threads-t.c @@ -263,7 +263,7 @@ void do_tests() kill_strategy=X; \ do_one_test(); #else - #define test_kill_strategy(X) \ +#define test_kill_strategy(X) \ diag("kill strategy: " #X); \ DBUG_PRINT("info", ("kill strategy: " #X)); \ kill_strategy=X; \ |
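Note on the XT_XLOG_WAIT_SPINS path in the storage/pbxt/src/xactlog_xt.cc hunks above: one thread is elected as the log writer with `xt_atomic_tas4(&xt_writing, 1)`, and every exit path releases the role by setting `xt_writing = 0`. The following is a minimal sketch of that election step only, written with C11 atomics rather than PBXT's own primitives. It assumes `xt_atomic_tas4` returns the previous value of the word, as the `if (!xt_atomic_tas4(...)) break;` test suggests. The names `writer_flag`, `try_become_writer` and `release_writer` are hypothetical and do not appear in the patch. The spin timeout, the `xt_waiting` counter and the condition-variable wakeup are deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the xt_writing word in XTDatabaseLog. */
typedef struct {
    atomic_uint writing;    /* 1 while some thread holds the writer role */
} writer_flag;

/* Atomically set the flag to 1 and report whether it was 0 before.
 * Exactly one of several racing callers sees the old value 0 and
 * becomes the writer; the others must spin or wait.
 */
static bool try_become_writer(writer_flag *f)
{
    return atomic_exchange(&f->writing, 1u) == 0u;
}

/* Give up the writer role. Releasing a role that is not held
 * indicates a logic error, so it is asserted in debug builds.
 */
static void release_writer(writer_flag *f)
{
    assert(atomic_load(&f->writing) == 1u);
    atomic_store(&f->writing, 0u);
}
```

In the patch the release is always paired with `if (xt_waiting) xt_cond_wakeall(&xl_write_cond);`, so threads that gave up spinning and blocked on the condition variable are woken. A fuller sketch would need that second step as well.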