author | unknown <jcole@tetra.spaceapes.com> | 2001-08-02 14:56:14 -0500
committer | unknown <jcole@tetra.spaceapes.com> | 2001-08-02 14:56:14 -0500
commit | 6fe5b189cd2b73a8bb4eef2e7ad1470af778a257 (patch)
tree | 06ec7a44bfd362ee1795db042d56af9c2be22327 /Docs
parent | aee57a52b8af28b45ce0962cf4976187d950795c (diff)
parent | 89e1029b2711753ee54aa6893b867e46380bd231 (diff)
download | mariadb-git-6fe5b189cd2b73a8bb4eef2e7ad1470af778a257.tar.gz
Merge jcole@work.mysql.com:/home/bk/mysql
into tetra.spaceapes.com:/home/jcole/bk/mysql
Docs/manual.texi:
Auto merged
Diffstat (limited to 'Docs')
-rw-r--r-- | Docs/manual.texi | 5857 |
1 file changed, 3001 insertions, 2856 deletions
diff --git a/Docs/manual.texi b/Docs/manual.texi index 769b898f8e0..ab8e1815c5e 100644 --- a/Docs/manual.texi +++ b/Docs/manual.texi @@ -113,11 +113,10 @@ distribution for that version. * Installing:: Installing @strong{MySQL} * Tutorial:: @strong{MySQL} Tutorial * MySQL Database Administration:: +* MySQL Optimization:: * Reference:: @strong{MySQL} language reference * Table types:: @strong{MySQL} table types * Fulltext Search:: Fulltext Search -* Performance:: Getting maximum performance from @strong{MySQL} -* MySQL Benchmarks:: The @strong{MySQL} benchmark suite * Maintenance:: Maintaining a @strong{MySQL} installation * Adding functions:: Adding new functions to @strong{MySQL} * Adding procedures:: Adding new procedures to @strong{MySQL} @@ -320,7 +319,8 @@ us. you are looking for, you should give it a try. @strong{MySQL} also has a very practical set of features developed in very close cooperation with our users. You can find a performance comparison of @strong{MySQL} -to some other database managers on our benchmark page. @xref{Benchmarks}. +to some other database managers on our benchmark page. +@xref{MySQL Benchmarks}. @strong{MySQL} was originally developed to handle very large databases much faster than existing solutions and has been successfully used in @@ -4488,7 +4488,7 @@ For a list of all supported limits, functions, and types, see the @item Performance For a true comparison of speed, consult the growing @strong{MySQL} benchmark -suite. @xref{Benchmarks}. +suite. @xref{MySQL Benchmarks}. Because there is no thread creation overhead, a small parser, few features, and simple security, @code{mSQL} should be quicker at: @@ -4547,7 +4547,7 @@ slower than @strong{MySQL} was seen. This is due to @code{mSQL}'s lack of a join optimizer to order tables in the optimal order. However, if you put the tables in exactly the right order in @code{mSQL}2 and the @code{WHERE} is simple and uses index columns, the join will be relatively fast! -@xref{Benchmarks}. +@xref{MySQL Benchmarks}. @item @code{ORDER BY} and @code{GROUP BY}. @item @@ -7250,11 +7250,11 @@ hundreds of megabytes of data. @item The @strong{MySQL} benchmark suite This runs a range of common queries. It is also a test to see whether the latest batch of optimizations actually made the code faster. -@xref{Benchmarks}. +@xref{MySQL Benchmarks}. @item The @code{crash-me} test This tries to determine what features the database supports and what its -capabilities and limitations are. @xref{Benchmarks}. +capabilities and limitations are. @xref{MySQL Benchmarks}. @end table Another test is that we use the newest @strong{MySQL} version in our internal @@ -14280,7 +14280,7 @@ FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '\\' @end example -@node MySQL Database Administration, Reference, Tutorial, Top +@node MySQL Database Administration, MySQL Optimization, Tutorial, Top @chapter MySQL Database Administration @menu @@ -19851,7 +19851,7 @@ be run at the same time. @item @code{thread_stack} The stack size for each thread. Many of the limits detected by the @code{crash-me} test are dependent on this value. The default is -large enough for normal operation. @xref{Benchmarks}. +large enough for normal operation. @xref{MySQL Benchmarks}. @item @code{timezone} The timezone for the server. @@ -23664,8 +23664,2988 @@ isolate it into a separate test case first. Then report the problem to @email{bugs@@lists.mysql.com} with as much info as possible. 
+@node MySQL Optimization, Reference, MySQL Database Administration, Top +@chapter MySQL Optimization -@node Reference, Table types, MySQL Database Administration, Top +@menu +* Optimize Overview:: +* Query Speed:: +* Locking Issues:: +* Optimizing Database Structure:: +* Optimizing the Server:: +* Disk issues:: +@end menu + + +Optimization is a complicated task because it ultimately requires +understanding of the whole system. While it may be possible to do some +local optimizations with little knowledge of your system or application, +the more optimal you want your system to become, the more you will have +to know about it. + +This chapter will try to explain and give some examples of different +ways to optimize @strong{MySQL}. Remember, however, that there are +always some (increasingly harder) additional ways to make the system +even faster. + + +@node Optimize Overview, Query Speed, MySQL Optimization, MySQL Optimization +@section Optimization Overview + +The most important factor in making a system fast is of course the basic +design. You also need to know what kinds of things your system will be +doing, and what your bottlenecks are. + +The most common bottlenecks are: +@itemize @bullet +@item Disk seeks. +It takes time for the disk to find a piece of data. With modern disks in +1999, the mean time for this is usually less than 10ms, so we can in +theory do about 1000 seeks a second. This time improves slowly with new +disks and is very hard to optimize for a single table. The way to +optimize this is to spread the data over more than one disk. + +@item Disk reading/writing. +When the disk is at the correct position we need to read the data. With +modern disks in 1999, one disk delivers something like 10-20Mb/s. This +is easier to optimize than seeks because you can read in parallel from +multiple disks. + +@item CPU cycles. +When we have the data in main memory (or if it was already +there) we need to process it to get to our result. Tables that are small +compared to the available memory are the most common case where this is +the limiting factor. But then, with small tables, speed is usually not the +problem. + +@item Memory bandwidth. +When the CPU needs more data than can fit in the CPU cache, the main +memory bandwidth becomes a bottleneck. This is an uncommon bottleneck +for most systems, but one should be aware of it. +@end itemize + + +@menu +* Design Limitations:: +* Portability:: +* Internal use:: +* MySQL Benchmarks:: +* Custom Benchmarks:: +@end menu + +@node Design Limitations, Portability, Optimize Overview, Optimize Overview +@subsection MySQL Design Limitations/Tradeoffs + +@cindex design, limitations +@cindex limitations, design + +Because @strong{MySQL} uses extremely fast table locking (multiple readers / +single writer), the biggest remaining problem is a mix of a steady stream of +inserts and slow selects on the same table. + +We believe that for a huge number of systems the extremely fast +performance in other cases makes this choice a win. This case is usually +also possible to solve by having multiple copies of the table, but that +takes more effort and hardware. + +We are also working on some extensions to solve this problem for some +common application niches. + + +@node Portability, Internal use, Design Limitations, Optimize Overview +@subsection Portability + +@cindex portability +@cindex crash-me program +@cindex programs, crash-me + +Because all SQL servers implement different parts of SQL, it takes work to +write portable SQL applications.
For very simple selects and inserts it is +very easy, but the more features you need, the harder it gets. If you want an +application that is fast with many databases, it becomes even harder! + +To make a complex application portable you need to choose a number of +SQL servers that it should work with. + +You can use the @strong{MySQL} crash-me program/web-page +@uref{http://www.mysql.com/information/crash-me.php} to find functions, +types, and limits you can use with a selection of database +servers. Crash-me is still far from testing everything possible, but it +is still comprehensive, with about 450 things tested. + +For example, you shouldn't have column names longer than 18 characters +if you want to be able to use Informix or DB2. + +Both the @strong{MySQL} benchmarks and crash-me programs are very +database-independent. By taking a look at how we have handled this, you +can get a feeling for what you have to do to make your own application +database-independent. The benchmarks themselves can be found in the +@file{sql-bench} directory in the @strong{MySQL} source +distribution. They are written in Perl, using the DBI database interface +(which solves the access part of the problem). + +See @uref{http://www.mysql.com/information/benchmarks.html} for the results +from this benchmark. + +As you can see in these results, all databases have some weak points. That +is, they have different design compromises that lead to different +behavior. + +If you strive for database independence, you need to get a good feeling +for each SQL server's bottlenecks. @strong{MySQL} is VERY fast in +retrieving and updating things, but will have a problem with mixing slow +readers and writers on the same table. Oracle, on the other hand, has a big +problem when you try to access rows that you have recently updated +(until they are flushed to disk). Transaction databases in general are +not very good at generating summary tables from log tables, as in this +case row locking is almost useless. + +To get your application @emph{really} database-independent, you need to define +an easily extendable interface through which you manipulate your data. As +C++ is available on most systems, it makes sense to use a C++ class +interface to the databases. + +If you use some feature specific to one database (like the +@code{REPLACE} command in @strong{MySQL}), you should code a method for +the other SQL servers that implements the same feature (but more slowly). With +@strong{MySQL} you can use the @code{/*! */} syntax to add +@strong{MySQL}-specific keywords to a query. The code inside +@code{/**/} will be treated as a comment (ignored) by most other SQL +servers. + +If REALLY high performance is more important than exactness, as in some +Web applications, a possibility is to create an application layer that +caches all results to give you even higher performance. By letting +old results 'expire' after a while, you can keep the cache reasonably +fresh. This is quite nice under extremely high load, in which case +you can dynamically increase the cache and set the expire timeout higher +until things get back to normal. + +In this case the table creation information should contain information +about the initial size of the cache and how often the table should normally +be refreshed. + +@node Internal use, MySQL Benchmarks, Portability, Optimize Overview +@subsection What Have We Used MySQL For? + +@cindex uses, of MySQL +@cindex customers, of MySQL + +During @strong{MySQL}'s initial development, the features of @strong{MySQL} +were made to fit our largest customer.
They handle data warehousing for a +couple of the biggest retailers in Sweden. + +From all stores, we get weekly summaries of all bonus card transactions, +and we are expected to provide useful information for the store owners +to help them find out how their advertising campaigns are affecting their +customers. + +The data set is quite large (about 7 million summary transactions per month), +and we have data for 4-10 years that we need to present to the users. +We get weekly requests from the customers for +'instant' access to new reports from this data. + +We solved this by storing all information per month in compressed +'transaction' tables. We have a set of simple macros (scripts) that +generate summary tables grouped by different criteria (product group, +customer id, store ...) from the transaction tables. The reports are +Web pages that are dynamically generated by a small Perl script that +parses a Web page, executes the SQL statements in it, and inserts the +results. We would have used PHP or mod_perl instead, but they were +not available at that time. + +For graphical data we wrote a simple tool in @code{C} that can produce +GIFs based on the result of a SQL query (with some processing of the +result). This is also dynamically executed from the Perl script that +parses the @code{HTML} files. + +In most cases a new report can simply be done by copying an existing +script and modifying the SQL query in it. In some cases, we will need to +add more fields to an existing summary table or generate a new one, but +this is also quite simple, as we keep all transaction tables on disk. +(Currently we have at least 50G of transaction tables and 200G of other +customer data.) + +We also let our customers access the summary tables directly with ODBC +so that advanced users can experiment with the data themselves. + +We haven't had any problems handling this with a quite modest Sun Ultra +SPARCstation (2x200 MHz). We recently upgraded one of our servers to a 2 +CPU 400 MHz UltraSPARC, and we are now planning to start handling +transactions on the product level, which would mean a ten-fold increase +in data. We think we can keep up with this by just adding more disks to +our systems. + +We are also experimenting with Intel-Linux to be able to get more CPU +power more cheaply. Now that we have the binary portable database format (new +in Version 3.23), we will start to use this for some parts of the application. + +Our initial feelings are that Linux will perform much better under +low-to-medium load and Solaris will perform better when you start to get +high load because of extreme disk IO, but we don't yet have anything +conclusive about this. After some discussion with a Linux kernel +developer, we suspect this might be a side effect of Linux giving so many +resources to the batch job that the interactive performance gets very low. +This makes the machine feel very slow and unresponsive while big batch jobs +are running. Hopefully this will be handled better in future Linux kernels. + + +@node MySQL Benchmarks, Custom Benchmarks, Internal use, Optimize Overview +@subsection The MySQL Benchmark Suite + +@cindex benchmark suite +@cindex crash-me program + +This should contain a technical description of the @strong{MySQL} +benchmark suite (and @code{crash-me}), but that description is not +written yet. Currently, you can get a good idea of the benchmark by +looking at the code and results in the @file{sql-bench} directory in any +@strong{MySQL} source distribution.
+ +This benchmark suite is meant to be a benchmark that will tell any user +what things a given SQL implementation performs well or poorly at. + +Note that this benchmark is single-threaded, so it measures the minimum +time for the operations. We plan to add a lot of +multi-threaded tests to the benchmark suite in the future. + +For example (run on the same NT 4.0 machine): + +@multitable @columnfractions .6 .2 .2 +@strong{Reading 2000000 rows by index} @tab @strong{Seconds} @tab @strong{Seconds} +@item mysql @tab 367 @tab 249 +@item mysql_odbc @tab 464 +@item db2_odbc @tab 1206 +@item informix_odbc @tab 121126 +@item ms-sql_odbc @tab 1634 +@item oracle_odbc @tab 20800 +@item solid_odbc @tab 877 +@item sybase_odbc @tab 17614 +@end multitable + +@multitable @columnfractions .6 .2 .2 +@strong{Inserting (350768) rows} @tab @strong{Seconds} @tab @strong{Seconds} +@item mysql @tab 381 @tab 206 +@item mysql_odbc @tab 619 +@item db2_odbc @tab 3460 +@item informix_odbc @tab 2692 +@item ms-sql_odbc @tab 4012 +@item oracle_odbc @tab 11291 +@item solid_odbc @tab 1801 +@item sybase_odbc @tab 4802 +@end multitable + +In the above test @strong{MySQL} was run with an 8M index cache. + +We have gathered some more benchmark results at +@uref{http://www.mysql.com/information/benchmarks.html}. + +Note that Oracle is not included because they asked to be removed. All +Oracle benchmarks have to be approved by Oracle! We believe that makes +Oracle benchmarks @strong{VERY} biased because the above benchmarks are +supposed to show what a standard installation can do for a single +client. + +To run the benchmark suite, you have to download a @strong{MySQL} source +distribution, install the Perl DBI driver and the Perl DBD driver for the +database you want to test, and then do: + +@example +cd sql-bench +perl run-all-tests --server=# +@end example + +where # is one of the supported servers. You can get a list of all options +and supported servers by doing @code{run-all-tests --help}. + +@cindex crash-me +@code{crash-me} tries to determine what features a database supports and +what its capabilities and limitations are by actually running +queries. For example, it determines: + +@itemize @bullet +@item +What column types are supported +@item +How many indexes are supported +@item +What functions are supported +@item +How big a query can be +@item +How big a @code{VARCHAR} column can be +@end itemize + +You can find the results from crash-me for a lot of different databases at +@uref{http://www.mysql.com/information/crash-me.php}. + + +@node Custom Benchmarks, , MySQL Benchmarks, Optimize Overview +@subsection Using Your Own Benchmarks + +@cindex benchmarks +@cindex performance, benchmarks + +You should definitely benchmark your application and database to find +out where the bottlenecks are. By fixing a bottleneck (or by replacing it +with a 'dummy module'), you can then easily identify the next +bottleneck (and so on). Even if the overall performance for your +application is sufficient, you should at least make a plan for each +bottleneck, and decide how to solve it if someday you really need the +extra performance. + +For an example of portable benchmark programs, look at the @strong{MySQL} +benchmark suite. @xref{MySQL Benchmarks, , @strong{MySQL} Benchmarks}. You +can take any program from this suite and modify it for your needs. By doing +this, you can try different solutions to your problem and test which is really +the fastest solution for you.
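+ +As a minimal sketch of such a comparison (the table @code{t1} and the indexed column @code{col1} here are hypothetical), you can run two candidate forms of a query from the @code{mysql} client and compare the times the client reports: + +@example +mysql> SELECT COUNT(*) FROM t1 WHERE col1=42;    # can use the index on col1 +mysql> SELECT COUNT(*) FROM t1 WHERE col1+0=42;  # the expression prevents index use +@end example + +For anything serious, run each query several times on realistic data to even out caching effects.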
+ +It is very common for problems to occur only when the system is very +heavily loaded. We have had many customers contact us when they +have a (tested) system in production and have encountered load problems. In +every one of these cases so far, the problems have been with basic design +(table scans are NOT good under high load) or OS/library issues. Most of +this would be a @strong{LOT} easier to fix if the systems were not +already in production. + +To avoid problems like this, you should put some effort into benchmarking +your whole application under the worst possible load! You can use +Super Smack for this, and it is available at: +@uref{http://www.mysql.com/Downloads/super-smack/super-smack-1.0.tar.gz}. +As the name suggests, it can bring your system to its knees if you ask it to, +so make sure to use it only on your development systems. + + +@node Query Speed, Locking Issues, Optimize Overview, MySQL Optimization +@section Optimizing @code{SELECT}s and Other Queries + +@cindex queries, speed of +@cindex permission checks, effect on speed +@cindex speed, of queries + +First, one thing that affects all queries: The more complex your permission +system setup is, the more overhead you get. + +If you have not issued any @code{GRANT} statements, @strong{MySQL} will +optimize the permission checking somewhat. So if you have very high +volume, it may be worth the time to avoid grants. Otherwise, more +permission checks result in larger overhead. + +If your problem is with some explicit @strong{MySQL} function, you can +always time this in the @strong{MySQL} client: + +@example +mysql> select benchmark(1000000,1+1); ++------------------------+ +| benchmark(1000000,1+1) | ++------------------------+ +| 0 | ++------------------------+ +1 row in set (0.32 sec) +@end example + +The above shows that @strong{MySQL} can execute 1,000,000 @code{+} +expressions in 0.32 seconds on a @code{PentiumII 400MHz}. + +All @strong{MySQL} functions should be well optimized, but there may be +some exceptions, and @code{benchmark(loop_count,expression)} is a +great tool for finding out whether a particular function is a problem in +your query. + +@menu +* EXPLAIN:: +* Estimating performance:: Estimating query performance +* SELECT speed:: Speed of @code{SELECT} queries +* Where optimizations:: How MySQL optimizes @code{WHERE} clauses +* DISTINCT optimization:: How MySQL Optimizes @code{DISTINCT} +* LEFT JOIN optimization:: How MySQL optimizes @code{LEFT JOIN} +* LIMIT optimization:: How MySQL optimizes @code{LIMIT} +* Insert speed:: Speed of @code{INSERT} queries +* Update speed:: Speed of @code{UPDATE} queries +* Delete speed:: Speed of @code{DELETE} queries +* Tips:: +@end menu + + +@node EXPLAIN, Estimating performance, Query Speed, Query Speed +@subsection @code{EXPLAIN} Syntax (Get Information About a @code{SELECT}) + +@findex EXPLAIN +@findex SELECT, optimizing + +@example + EXPLAIN tbl_name +or EXPLAIN SELECT select_options +@end example + +@code{EXPLAIN tbl_name} is a synonym for @code{DESCRIBE tbl_name} or +@code{SHOW COLUMNS FROM tbl_name}. + +When you precede a @code{SELECT} statement with the keyword @code{EXPLAIN}, +@strong{MySQL} explains how it would process the @code{SELECT}, providing +information about how tables are joined and in which order. + +With the help of @code{EXPLAIN}, you can see when you must add indexes +to tables to get a faster @code{SELECT} that uses indexes to find the +records. You can also see if the optimizer joins the tables in an optimal +order. To force the optimizer to use a specific join order for a +@code{SELECT} statement, add a @code{STRAIGHT_JOIN} clause.
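+ +For example, the following forces the tables to be joined in the order they are listed in the @code{FROM} clause (the tables @code{t1} and @code{t2} are hypothetical): + +@example +mysql> EXPLAIN SELECT STRAIGHT_JOIN t1.a, t2.b FROM t1, t2 WHERE t1.a=t2.a; +@end example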
+ +For non-simple joins, @code{EXPLAIN} returns a row of information for each +table used in the @code{SELECT} statement. The tables are listed in the order +they would be read. @strong{MySQL} resolves all joins using a single-sweep +multi-join method. This means that @strong{MySQL} reads a row from the first +table, then finds a matching row in the second table, then in the third table +and so on. When all tables are processed, it outputs the selected columns and +backtracks through the table list until a table is found for which there are +more matching rows. The next row is read from this table and the process +continues with the next table. + +Output from @code{EXPLAIN} includes the following columns: + +@table @code +@item table +The table to which the row of output refers. + +@item type +The join type. Information about the various types is given below. + +@item possible_keys +The @code{possible_keys} column indicates which indexes @strong{MySQL} +could use to find the rows in this table. Note that this column is +totally independent of the order of the tables. That means that some of +the keys in possible_keys may not be usable in practice with the +generated table order. + +If this column is empty, there are no relevant indexes. In this case, +you may be able to improve the performance of your query by examining +the @code{WHERE} clause to see if it refers to some column or columns +that would be suitable for indexing. If so, create an appropriate index +and check the query with @code{EXPLAIN} again. @xref{ALTER TABLE}. + +To see what indexes a table has, use @code{SHOW INDEX FROM tbl_name}. + +@item key +The @code{key} column indicates the key that @strong{MySQL} actually +decided to use. The key is @code{NULL} if no index was chosen. If +@strong{MySQL} chooses the wrong index, you can probably force +@strong{MySQL} to use another index by using @code{myisamchk --analyze}, +@xref{myisamchk syntax}, or by using @code{USE INDEX/IGNORE INDEX}. +@xref{JOIN}. + +@item key_len +The @code{key_len} column indicates the length of the key that +@strong{MySQL} decided to use. The length is @code{NULL} if the +@code{key} is @code{NULL}. Note that this tells us how many parts of a +multi-part key @strong{MySQL} will actually use. + +@item ref +The @code{ref} column shows which columns or constants are used with the +@code{key} to select rows from the table. + +@item rows +The @code{rows} column indicates the number of rows @strong{MySQL} +believes it must examine to execute the query. + +@item Extra +This column contains additional information about how @strong{MySQL} will +resolve the query. Here is an explanation of the different text +strings that can be found in this column: + +@table @code +@item Distinct +@strong{MySQL} will not continue searching for more rows for the current row +combination after it has found the first matching row. + +@item Not exists +@strong{MySQL} was able to do a @code{LEFT JOIN} optimization on the +query and will not examine more rows in this table for the previous row +combination after it finds one row that matches the @code{LEFT JOIN} criteria. + +Here is an example: + +@example +SELECT * FROM t1 LEFT JOIN t2 ON t1.id=t2.id WHERE t2.id IS NULL; +@end example + +Assume that @code{t2.id} is defined with @code{NOT NULL}. In this case +@strong{MySQL} will scan @code{t1} and look up the rows in @code{t2} +through @code{t1.id}.
If @strong{MySQL} finds a matching row in +@code{t2}, it knows that @code{t2.id} can never be @code{NULL}, and will +not scan through the rest of the rows in @code{t2} that have the same +@code{id}. In other words, for each row in @code{t1}, @strong{MySQL} +only needs to do a single lookup in @code{t2}, independent of how many +matching rows there are in @code{t2}. + +@item @code{range checked for each record (index map: #)} +@strong{MySQL} didn't find a really good index to use. It will, instead, for +each row combination in the preceding tables, do a check on which index to +use (if any), and use this index to retrieve the rows from the table. This +isn't very fast but is faster than having to do a join without +an index. + +@item Using filesort +@strong{MySQL} will need to do an extra pass to find out how to retrieve +the rows in sorted order. The sort is done by going through all rows +according to the @code{join type} and storing the sort key + pointer to +the row for all rows that match the @code{WHERE}. Then the keys are +sorted. Finally the rows are retrieved in sorted order. + +@item Using index +The column information is retrieved from the table using only +information in the index tree without having to do an additional seek to +read the actual row. This can be done when all the used columns for +the table are part of the same index. + +@item Using temporary +To resolve the query @strong{MySQL} will need to create a +temporary table to hold the result. This typically happens if you do an +@code{ORDER BY} on a different column set than you did a @code{GROUP +BY} on. + +@item Where used +A @code{WHERE} clause will be used to restrict which rows will be +matched against the next table or sent to the client. If you don't have +this information and the table is of type @code{ALL} or @code{index}, +you may have something wrong in your query (if you don't intend to +fetch/examine all rows from the table). +@end table + +If you want to get your queries as fast as possible, you should look out for +@code{Using filesort} and @code{Using temporary}. +@end table + +The different join types are listed below, ordered from best to worst: + +@cindex system table +@cindex tables, system +@table @code +@item system +The table has only one row (= system table). This is a special case of +the @code{const} join type. + +@cindex constant table +@cindex tables, constant +@item const +The table has at most one matching row, which will be read at the start +of the query. Because there is only one row, values from the column in +this row can be regarded as constants by the rest of the +optimizer. @code{const} tables are very fast as they are read only once! + +@item eq_ref +One row will be read from this table for each combination of rows from +the previous tables. This is the best possible join type, other than the +@code{const} types. It is used when all parts of an index are used by +the join and the index is @code{UNIQUE} or a @code{PRIMARY KEY}. + +@item ref +All rows with matching index values will be read from this table for each +combination of rows from the previous tables. @code{ref} is used if the join +uses only a leftmost prefix of the key, or if the key is not @code{UNIQUE} +or a @code{PRIMARY KEY} (in other words, if the join cannot select a single +row based on the key value). If the key that is used matches only a few rows, +this join type is good. + +@item range +Only rows that are in a given range will be retrieved, using an index to +select the rows.
The @code{key} column indicates which index is used. +The @code{key_len} contains the longest key part that was used. +The @code{ref} column will be NULL for this type. + +@item index +This is the same as @code{ALL}, except that only the index tree is +scanned. This is usually faster than @code{ALL}, as the index file is usually +smaller than the data file. + +@item ALL +A full table scan will be done for each combination of rows from the +previous tables. This is normally not good if the table is the first +table not marked @code{const}, and usually @strong{very} bad in all other +cases. You normally can avoid @code{ALL} by adding more indexes, so that +the row can be retrieved based on constant values or column values from +earlier tables. +@end table + +You can get a good indication of how good a join is by multiplying all values +in the @code{rows} column of the @code{EXPLAIN} output. This should tell you +roughly how many rows @strong{MySQL} must examine to execute the query. This +number is also used when you restrict queries with the @code{max_join_size} +variable. +@xref{Server parameters}. + +The following example shows how a @code{JOIN} can be optimized progressively +using the information provided by @code{EXPLAIN}. + +Suppose you have the @code{SELECT} statement shown below, that you examine +using @code{EXPLAIN}: + +@example +EXPLAIN SELECT tt.TicketNumber, tt.TimeIn, + tt.ProjectReference, tt.EstimatedShipDate, + tt.ActualShipDate, tt.ClientID, + tt.ServiceCodes, tt.RepetitiveID, + tt.CurrentProcess, tt.CurrentDPPerson, + tt.RecordVolume, tt.DPPrinted, et.COUNTRY, + et_1.COUNTRY, do.CUSTNAME + FROM tt, et, et AS et_1, do + WHERE tt.SubmitTime IS NULL + AND tt.ActualPC = et.EMPLOYID + AND tt.AssignedPC = et_1.EMPLOYID + AND tt.ClientID = do.CUSTNMBR; +@end example + +For this example, assume that: + +@itemize @bullet +@item +The columns being compared have been declared as follows: + +@multitable @columnfractions .1 .2 .7 +@item @strong{Table} @tab @strong{Column} @tab @strong{Column type} +@item @code{tt} @tab @code{ActualPC} @tab @code{CHAR(10)} +@item @code{tt} @tab @code{AssignedPC} @tab @code{CHAR(10)} +@item @code{tt} @tab @code{ClientID} @tab @code{CHAR(10)} +@item @code{et} @tab @code{EMPLOYID} @tab @code{CHAR(15)} +@item @code{do} @tab @code{CUSTNMBR} @tab @code{CHAR(15)} +@end multitable + +@item +The tables have the indexes shown below: + +@multitable @columnfractions .1 .9 +@item @strong{Table} @tab @strong{Index} +@item @code{tt} @tab @code{ActualPC} +@item @code{tt} @tab @code{AssignedPC} +@item @code{tt} @tab @code{ClientID} +@item @code{et} @tab @code{EMPLOYID} (primary key) +@item @code{do} @tab @code{CUSTNMBR} (primary key) +@end multitable + +@item +The @code{tt.ActualPC} values aren't evenly distributed. +@end itemize + +Initially, before any optimizations have been performed, the @code{EXPLAIN} +statement produces the following information: + +@example +table type possible_keys key key_len ref rows Extra +et ALL PRIMARY NULL NULL NULL 74 +do ALL PRIMARY NULL NULL NULL 2135 +et_1 ALL PRIMARY NULL NULL NULL 74 +tt ALL AssignedPC,ClientID,ActualPC NULL NULL NULL 3872 + range checked for each record (key map: 35) +@end example + +Because @code{type} is @code{ALL} for each table, this output indicates that +@strong{MySQL} is doing a full join for all tables! This will take quite a +long time, as the product of the number of rows in each table must be +examined! For the case at hand, this is @code{74 * 2135 * 74 * 3872 = +45,268,558,720} rows. 
If the tables were bigger, you can only imagine how +long it would take. + +One problem here is that @strong{MySQL} can't (yet) use indexes on columns +efficiently if they are declared differently. In this context, +@code{VARCHAR} and @code{CHAR} are the same unless they are declared as +different lengths. Because @code{tt.ActualPC} is declared as @code{CHAR(10)} +and @code{et.EMPLOYID} is declared as @code{CHAR(15)}, there is a length +mismatch. + +To fix this disparity between column lengths, use @code{ALTER TABLE} to +lengthen @code{ActualPC} from 10 characters to 15 characters: + +@example +mysql> ALTER TABLE tt MODIFY ActualPC VARCHAR(15); +@end example + +Now @code{tt.ActualPC} and @code{et.EMPLOYID} are both @code{VARCHAR(15)}. +Executing the @code{EXPLAIN} statement again produces this result: + +@example +table type possible_keys key key_len ref rows Extra +tt ALL AssignedPC,ClientID,ActualPC NULL NULL NULL 3872 where used +do ALL PRIMARY NULL NULL NULL 2135 + range checked for each record (key map: 1) +et_1 ALL PRIMARY NULL NULL NULL 74 + range checked for each record (key map: 1) +et eq_ref PRIMARY PRIMARY 15 tt.ActualPC 1 +@end example + +This is not perfect, but is much better (the product of the @code{rows} +values is now less by a factor of 74). This version is executed in a couple +of seconds. + +A second alteration can be made to eliminate the column length mismatches +for the @code{tt.AssignedPC = et_1.EMPLOYID} and @code{tt.ClientID = +do.CUSTNMBR} comparisons: + +@example +mysql> ALTER TABLE tt MODIFY AssignedPC VARCHAR(15), + MODIFY ClientID VARCHAR(15); +@end example + +Now @code{EXPLAIN} produces the output shown below: + +@example +table type possible_keys key key_len ref rows Extra +et ALL PRIMARY NULL NULL NULL 74 +tt ref AssignedPC,ClientID,ActualPC ActualPC 15 et.EMPLOYID 52 where used +et_1 eq_ref PRIMARY PRIMARY 15 tt.AssignedPC 1 +do eq_ref PRIMARY PRIMARY 15 tt.ClientID 1 +@end example + +This is almost as good as it can get. + +The remaining problem is that, by default, @strong{MySQL} assumes that values +in the @code{tt.ActualPC} column are evenly distributed, and that isn't the +case for the @code{tt} table. Fortunately, it is easy to tell @strong{MySQL} +about this: + +@example +shell> myisamchk --analyze PATH_TO_MYSQL_DATABASE/tt +shell> mysqladmin refresh +@end example + +Now the join is perfect, and @code{EXPLAIN} produces this result: + +@example +table type possible_keys key key_len ref rows Extra +tt ALL AssignedPC,ClientID,ActualPC NULL NULL NULL 3872 where used +et eq_ref PRIMARY PRIMARY 15 tt.ActualPC 1 +et_1 eq_ref PRIMARY PRIMARY 15 tt.AssignedPC 1 +do eq_ref PRIMARY PRIMARY 15 tt.ClientID 1 +@end example + +Note that the @code{rows} column in the output from @code{EXPLAIN} is an +educated guess from the @strong{MySQL} join optimizer. To optimize a +query, you should check if the numbers are even close to the truth. If not, +you may get better performance by using @code{STRAIGHT_JOIN} in your +@code{SELECT} statement and trying to list the tables in a different order in +the @code{FROM} clause. + + +@node Estimating performance, SELECT speed, EXPLAIN, Query Speed +@subsection Estimating Query Performance + +@cindex estimating, query performance +@cindex queries, estimating performance +@cindex performance, estimating + +In most cases you can estimate the performance by counting disk seeks. +For small tables, you can usually find the row in 1 disk seek (as the +index is probably cached). 
For bigger tables, you can estimate that +(using B-tree indexes) you will need: @code{log(row_count) / +log(index_block_length / 3 * 2 / (index_length + data_pointer_length)) + +1} seeks to find a row. + +In @strong{MySQL} an index block is usually 1024 bytes and the data +pointer is usually 4 bytes. A 500,000 row table with an +index length of 3 (medium integer) gives you: +@code{log(500,000)/log(1024/3*2/(3+4)) + 1} = 4 seeks. + +As the above index would require about 500,000 * 7 * 3/2 = 5.2M +(assuming that the index buffers are filled to 2/3, which is typical), +you will probably have much of the index in memory and you will probably +only need 1-2 calls to read data from the OS to find the row. + +For writes, however, you will need 4 seek requests (as above) to find +where to place the new index value and normally 2 seeks to update the index +and write the row. + +Note that the above doesn't mean that your application will slowly +degrade by N log N! As long as everything is cached by the OS or the SQL +server, things will only get marginally slower as the table grows. +After the data gets too big to be cached, things will start to +go much slower until your application is bound only by disk seeks +(which increase by N log N). To avoid this, increase the index cache as +the data grows. @xref{Server parameters}. + + +@node SELECT speed, Where optimizations, Estimating performance, Query Speed +@subsection Speed of @code{SELECT} Queries + +@findex SELECT speed + +@cindex speed, of queries + +In general, when you want to make a slow @code{SELECT ... WHERE} faster, the +first thing to check is whether or not you can add an index. @xref{MySQL +indexes, , @strong{MySQL} indexes}. All references between different tables +should usually be done with indexes. You can use the @code{EXPLAIN} command +to determine which indexes are used for a @code{SELECT}. +@xref{EXPLAIN, , @code{EXPLAIN}}. + +Some general tips: + +@itemize @bullet +@item +To help @strong{MySQL} optimize queries better, run @code{myisamchk +--analyze} on a table after it has been loaded with relevant data. This +updates a value for each index part that indicates the average number of +rows that have the same value. (For unique indexes, this is always 1, +of course.) @strong{MySQL} will use this to decide which index to +choose when you join two tables with 'a non-constant expression'. +You can check the result from the @code{analyze} run by doing @code{SHOW +INDEX FROM table_name} and examining the @code{Cardinality} column. + +@item +To sort an index and data according to an index, use @code{myisamchk +--sort-index --sort-records=1} (if you want to sort on index 1). If you +have a unique index from which you want to read all records in order +according to that index, this is a good way to make that faster. Note, +however, that this sorting isn't written optimally and will take a long +time for a large table! +@end itemize + + +@node Where optimizations, DISTINCT optimization, SELECT speed, Query Speed +@subsection How MySQL Optimizes @code{WHERE} Clauses + +@findex WHERE + +@cindex optimizations + +The @code{WHERE} optimizations are put in the @code{SELECT} part here because +they are mostly used with @code{SELECT}, but the same optimizations apply for +@code{WHERE} in @code{DELETE} and @code{UPDATE} statements. + +Also note that this section is incomplete. @strong{MySQL} does many +optimizations, and we have not had time to document them all.
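+ +Many of these optimizations can be observed with @code{EXPLAIN}. As a minimal sketch (the table @code{t1} is hypothetical), a @code{WHERE} clause that can never be true should be detected without reading any rows: + +@example +mysql> EXPLAIN SELECT * FROM t1 WHERE 1=0; +@end example + +The output should contain a note such as @code{Impossible WHERE} instead of a normal table row.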
+ +Some of the optimizations performed by @strong{MySQL} are listed below: + +@itemize @bullet +@item +Removal of unnecessary parentheses: +@example + ((a AND b) AND c OR (((a AND b) AND (c AND d)))) +-> (a AND b AND c) OR (a AND b AND c AND d) +@end example +@item +Constant folding: +@example + (a<b AND b=c) AND a=5 +-> b>5 AND b=c AND a=5 +@end example +@item +Constant condition removal (needed because of constant folding): +@example + (B>=5 AND B=5) OR (B=6 AND 5=5) OR (B=7 AND 5=6) +-> B=5 OR B=6 +@end example +@item +Constant expressions used by indexes are evaluated only once. +@item +@code{COUNT(*)} on a single table without a @code{WHERE} is retrieved +directly from the table information. This is also done for any @code{NOT NULL} +expression when used with only one table. +@item +Early detection of invalid constant expressions. @strong{MySQL} quickly +detects that some @code{SELECT} statements are impossible and returns no rows. +@item +@code{HAVING} is merged with @code{WHERE} if you don't use @code{GROUP BY} +or group functions (@code{COUNT()}, @code{MIN()}...). +@item +For each sub-join, a simpler @code{WHERE} is constructed to get a fast +@code{WHERE} evaluation for each sub-join and also to skip records as +soon as possible. +@cindex constant table +@cindex tables, constant +@item +All constant tables are read first, before any other tables in the query. +A constant table is: +@itemize @minus +@item +An empty table or a table with 1 row. +@item +A table that is used with a @code{WHERE} clause on a @code{UNIQUE} +index, or a @code{PRIMARY KEY}, where all index parts are used with constant +expressions and the index parts are defined as @code{NOT NULL}. +@end itemize +All the following tables are used as constant tables: +@example +mysql> SELECT * FROM t WHERE primary_key=1; +mysql> SELECT * FROM t1,t2 + WHERE t1.primary_key=1 AND t2.primary_key=t1.id; +@end example + +@item +The best join combination to join the tables is found by trying all +possibilities. If all columns in @code{ORDER BY} and in @code{GROUP +BY} come from the same table, then this table is preferred first when +joining. +@item +If there is an @code{ORDER BY} clause and a different @code{GROUP BY} +clause, or if the @code{ORDER BY} or @code{GROUP BY} contains columns +from tables other than the first table in the join queue, a temporary +table is created. +@item +If you use @code{SQL_SMALL_RESULT}, @strong{MySQL} will use an in-memory +temporary table. +@item +Each table index is queried, and the best index that spans fewer than 30% of +the rows is used. If no such index can be found, a quick table scan is used. +@item +In some cases, @strong{MySQL} can read rows from the index without even +consulting the data file. If all columns used from the index are numeric, +then only the index tree is used to resolve the query. +@item +Before each record is output, those that do not match the @code{HAVING} clause +are skipped. +@end itemize + +Some examples of queries that are very fast: + +@example +mysql> SELECT COUNT(*) FROM tbl_name; +mysql> SELECT MIN(key_part1),MAX(key_part1) FROM tbl_name; +mysql> SELECT MAX(key_part2) FROM tbl_name + WHERE key_part_1=constant; +mysql> SELECT ... FROM tbl_name + ORDER BY key_part1,key_part2,... LIMIT 10; +mysql> SELECT ... FROM tbl_name + ORDER BY key_part1 DESC,key_part2 DESC,... 
LIMIT 10; +@end example + +The following queries are resolved using only the index tree (assuming +the indexed columns are numeric): + +@example +mysql> SELECT key_part1,key_part2 FROM tbl_name WHERE key_part1=val; +mysql> SELECT COUNT(*) FROM tbl_name + WHERE key_part1=val1 AND key_part2=val2; +mysql> SELECT key_part2 FROM tbl_name GROUP BY key_part1; +@end example + +The following queries use indexing to retrieve the rows in sorted +order without a separate sorting pass: + +@example +mysql> SELECT ... FROM tbl_name ORDER BY key_part1,key_part2,... ; +mysql> SELECT ... FROM tbl_name ORDER BY key_part1 DESC,key_part2 DESC,... ; +@end example + +@node DISTINCT optimization, LEFT JOIN optimization, Where optimizations, Query Speed +@subsection How MySQL Optimizes @code{DISTINCT} + +@findex DISTINCT + +@cindex optimizing, DISTINCT + +@code{DISTINCT} is converted to a @code{GROUP BY} on all columns. +@code{DISTINCT} combined with @code{ORDER BY} will in many cases also +need a temporary table. + +When combining @code{LIMIT #} with @code{DISTINCT}, @strong{MySQL} will stop +as soon as it finds @code{#} unique rows. + +If you don't use columns from all used tables, @strong{MySQL} will stop +scanning the unused tables as soon as it has found the first match. + +@example +SELECT DISTINCT t1.a FROM t1,t2 WHERE t1.a=t2.a; +@end example + +In this case, assuming t1 is used before t2 (check with @code{EXPLAIN}), +@strong{MySQL} will stop reading from t2 (for that particular row in t1) +when the first row in t2 is found. + + +@node LEFT JOIN optimization, LIMIT optimization, DISTINCT optimization, Query Speed +@subsection How MySQL Optimizes @code{LEFT JOIN} and @code{RIGHT JOIN} + +@findex LEFT JOIN + +@cindex optimizing, LEFT JOIN + +@code{A LEFT JOIN B} in @strong{MySQL} is implemented as follows: + +@itemize @bullet +@item +The table @code{B} is set to be dependent on table @code{A} and all tables +that @code{A} is dependent on. +@item +The table @code{A} is set to be dependent on all tables (except @code{B}) +that are used in the @code{LEFT JOIN} condition. +@item +All @code{LEFT JOIN} conditions are moved to the @code{WHERE} clause. +@item +All standard join optimizations are done, with the exception that a table is +always read after all tables it is dependent on. If there is a circular +dependency then @strong{MySQL} will issue an error. +@item +All standard @code{WHERE} optimizations are done. +@item +If there is a row in @code{A} that matches the @code{WHERE} clause, but there +wasn't any row in @code{B} that matched the @code{LEFT JOIN} condition, +then an extra @code{B} row is generated with all columns set to @code{NULL}. +@item +If you use @code{LEFT JOIN} to find rows that don't exist in some +table and you have the following test: @code{column_name IS NULL} in the +@code{WHERE} part, where column_name is a column that is declared as +@code{NOT NULL}, then @strong{MySQL} will stop searching for more rows +(for a particular key combination) after it has found one row that +matches the @code{LEFT JOIN} condition. +@end itemize + +@code{RIGHT JOIN} is implemented analogously to @code{LEFT JOIN}. + +The table read order forced by @code{LEFT JOIN} and @code{STRAIGHT_JOIN} +will help the join optimizer (which calculates in which order tables +should be joined) to do its work much more quickly, as there are fewer +table permutations to check.
+ +Note that the above means that if you do a query of this type: + +@example +SELECT * FROM a,b LEFT JOIN c ON (c.key=a.key) LEFT JOIN d ON (d.key=a.key) WHERE b.key=d.key +@end example + +@strong{MySQL} will do a full scan on @code{b} as the @code{LEFT +JOIN} will force it to be read before @code{d}. + +The fix in this case is to change the query to: + +@example +SELECT * FROM b,a LEFT JOIN c ON (c.key=a.key) LEFT JOIN d ON (d.key=a.key) WHERE b.key=d.key +@end example + + +@node LIMIT optimization, Insert speed, LEFT JOIN optimization, Query Speed +@subsection How MySQL Optimizes @code{LIMIT} + +@findex LIMIT + +@cindex optimizing, LIMIT + +In some cases @strong{MySQL} will handle the query differently when you are +using @code{LIMIT #} and not using @code{HAVING}: + +@itemize @bullet +@item +If you are selecting only a few rows with @code{LIMIT}, @strong{MySQL} +will use indexes in some cases when it normally would prefer to do a +full table scan. +@item +If you use @code{LIMIT #} with @code{ORDER BY}, @strong{MySQL} will end the +sorting as soon as it has found the first @code{#} lines instead of sorting +the whole table. +@item +When combining @code{LIMIT #} with @code{DISTINCT}, @strong{MySQL} will stop +as soon as it finds @code{#} unique rows. +@item +In some cases a @code{GROUP BY} can be resolved by reading the key in order +(or doing a sort on the key) and then calculating summaries until the +key value changes. In this case @code{LIMIT #} will not calculate any +unnecessary @code{GROUP BY} values. +@item +As soon as @strong{MySQL} has sent the first @code{#} rows to the client, it +will abort the query. +@item +@code{LIMIT 0} will always quickly return an empty set. This is useful +to check the query and to get the column types of the result columns. +@item +@strong{MySQL} uses @code{LIMIT #} to calculate how much space is needed +for the temporary tables used to resolve the query. +@end itemize + +@node Insert speed, Update speed, LIMIT optimization, Query Speed +@subsection Speed of @code{INSERT} Queries + +@findex INSERT + +@cindex speed, inserting +@cindex inserting, speed of + +The time to insert a record consists approximately of: + +@itemize @bullet +@item +Connect: (3) +@item +Sending query to server: (2) +@item +Parsing query: (2) +@item +Inserting record: (1 x size of record) +@item +Inserting indexes: (1 x number of indexes) +@item +Close: (1) +@end itemize + +where the numbers are somewhat proportional to the overall time. This +does not take into consideration the initial overhead to open tables +(which is done once for each concurrently running query). + +The size of the table slows down the insertion of indexes by N log N +(B-trees). + +Some ways to speed up inserts: + +@itemize @bullet +@item +If you are inserting many rows from the same client at the same time, use +@code{INSERT} statements with multiple value lists. This is much faster (many +times faster in some cases) than using separate @code{INSERT} statements. +@item +If you are inserting a lot of rows from different clients, you can get +higher speed by using the @code{INSERT DELAYED} statement. @xref{INSERT, +, @code{INSERT}}. +@item +Note that with @code{MyISAM} you can insert rows at the same time +@code{SELECT}s are running if there are no deleted rows in the tables. +@item +When loading a table from a text file, use @code{LOAD DATA INFILE}. This +is usually 20 times faster than using a lot of @code{INSERT} statements. +@xref{LOAD DATA, , @code{LOAD DATA}}.
+@item +It is possible with some extra work to make @code{LOAD DATA INFILE} run even +faster when the table has many indexes. Use the following procedure: + +@enumerate +@item +Optionally create the table with @code{CREATE TABLE}. For example, using +@code{mysql} or Perl DBI. + +@item +Execute a @code{FLUSH TABLES} statement or the shell command @code{mysqladmin +flush-tables}. + +@item +Use @code{myisamchk --keys-used=0 -rq /path/to/db/tbl_name}. This will +remove all use of indexes from the table. + +@item +Insert data into the table with @code{LOAD DATA INFILE}. This will not +update any indexes and will therefore be very fast. + +@item +If you are only going to read the table in the future, run @code{myisampack} +on it to make it smaller. @xref{Compressed format}. + +@item +Re-create the indexes with @code{myisamchk -r -q +/path/to/db/tbl_name}. This will create the index tree in memory before +writing it to disk, which is much faster because it avoids lots of disk +seeks. The resulting index tree is also perfectly balanced. + +@item +Execute a @code{FLUSH TABLES} statement or the shell command @code{mysqladmin +flush-tables}. +@end enumerate + +This procedure will be built into @code{LOAD DATA INFILE} in some future +version of @strong{MySQL}. +@item +You can speed up insertions by locking your tables: + +@example +mysql> LOCK TABLES a WRITE; +mysql> INSERT INTO a VALUES (1,23),(2,34),(4,33); +mysql> INSERT INTO a VALUES (8,26),(6,29); +mysql> UNLOCK TABLES; +@end example + +The main speed difference is that the index buffer is flushed to disk only +once, after all @code{INSERT} statements have completed. Normally there would +be as many index buffer flushes as there are different @code{INSERT} +statements. Locking is not needed if you can insert all rows with a single +statement. + +Locking will also lower the total time of multi-connection tests, but the +maximum wait time for some threads will go up (because they wait for +locks). For example: + +@example +thread 1 does 1000 inserts +threads 2, 3, and 4 do 1 insert each +thread 5 does 1000 inserts +@end example + +If you don't use locking, 2, 3, and 4 will finish before 1 and 5. If you +use locking, 2, 3, and 4 probably will not finish before 1 or 5, but the +total time should be about 40% faster. + +As @code{INSERT}, @code{UPDATE}, and @code{DELETE} operations are very +fast in @strong{MySQL}, you will obtain better overall performance by +adding locks around everything that does more than about 5 inserts or +updates in a row. If you do very many inserts in a row, you could do a +@code{LOCK TABLES} followed by an @code{UNLOCK TABLES} once in a while +(about every 1000 rows) to allow other threads access to the table. This +would still result in a nice performance gain. + +Of course, @code{LOAD DATA INFILE} is much faster for loading data. +@end itemize + +To get some more speed for both @code{LOAD DATA INFILE} and +@code{INSERT}, enlarge the key buffer. @xref{Server parameters}. + + +@node Update speed, Delete speed, Insert speed, Query Speed +@subsection Speed of @code{UPDATE} Queries + +An update query is optimized like a @code{SELECT} query, with the additional +overhead of a write. The speed of the write is dependent on the size of +the data that is being updated and the number of indexes that are +updated. Indexes that are not changed will not be updated. + +Another way to get fast updates is to delay updates and then do +many updates in a row later. Doing many updates in a row is much quicker +than doing one at a time if you lock the table.
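+ +A minimal sketch of this batching pattern (the table and column names are hypothetical): + +@example +mysql> LOCK TABLES t1 WRITE; +mysql> UPDATE t1 SET counter=counter+1 WHERE id=1; +mysql> UPDATE t1 SET counter=counter+1 WHERE id=2; +mysql> UNLOCK TABLES; +@end example + +As with the batched inserts above, the index buffer needs to be flushed to disk only once, after the lock is released.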
+ +Note that, with dynamic record format, updating a record to +a longer total length may split the record. So if you do this often, +it is very important to run @code{OPTIMIZE TABLE} occasionally. +@xref{OPTIMIZE TABLE, , @code{OPTIMIZE TABLE}}. + + +@node Delete speed, Tips, Update speed, Query Speed +@subsection Speed of @code{DELETE} Queries + +If you want to delete all rows in the table, you should use +@code{TRUNCATE TABLE table_name}. @xref{TRUNCATE}. + +The time to delete a record is exactly proportional to the number of +indexes. To delete records more quickly, you can increase the size of +the index cache. @xref{Server parameters}. + + +@node Tips, , Delete speed, Query Speed +@subsection Other Optimization Tips + +@cindex optimization, tips +@cindex tips, optimization + +Unsorted tips for faster systems: + +@itemize @bullet +@item +Use persistent connections to the database to avoid the connection +overhead. If you can't use persistent connections and you are doing a +lot of new connections to the database, you may want to change the value +of the @code{thread_cache_size} variable. @xref{Server parameters}. +@item +Always check that all your queries really use the indexes you have created +in the tables. In @strong{MySQL} you can do this with the @code{EXPLAIN} +command. @xref{EXPLAIN, Explain, Explain, manual}. +@item +Try to avoid complex @code{SELECT} queries on tables that are updated a +lot. This is to avoid problems with table locking. +@item +The new @code{MyISAM} tables can insert rows in a table without deleted +rows at the same time another thread is reading from it. If this is important +for you, you should consider methods where you don't have to delete rows +or run @code{OPTIMIZE TABLE} after you have deleted a lot of rows. +@item +Use @code{ALTER TABLE ... ORDER BY expr1,expr2...} if you mostly +retrieve rows in expr1,expr2... order. By using this option after big +changes to the table, you may be able to get higher performance. +@item +In some cases it may make sense to introduce a column that is 'hashed' +based on information from other columns. If this column is short and +reasonably unique it may be much faster than a big index on many +columns. In @strong{MySQL} it's very easy to use this extra column: +@code{SELECT * FROM table_name WHERE hash=MD5(CONCAT(col1,col2)) +AND col1='constant' AND col2='constant'} +@item +For tables that change a lot you should try to avoid all @code{VARCHAR} +or @code{BLOB} columns. You will get dynamic row length as soon as you +are using a single @code{VARCHAR} or @code{BLOB} column. @xref{Table +types}. +@item +It's not normally useful to split a table into different tables just +because the rows get 'big'. To access a row, the biggest performance +hit is the disk seek to find the first byte of the row. After finding +the data, most new disks can read the whole row fast enough for most +applications. The only cases where it really matters to split up a table are +if it's a dynamic row size table (see above) that you can change to a fixed +row size, or if you very often need to scan the table and don't need +most of the columns. @xref{Table types}. +@item +If you very often need to calculate things based on information from a +lot of rows (like counts of things), it's probably much better to +introduce a new table and update the counter in real time.
An update of +type @code{UPDATE table SET count=count+1 WHERE index_column=constant} +is very fast! + +This is really important when you use databases like @strong{MySQL} that +only have table locking (multiple readers / single writer). This will +also give better performance with most databases, as the row locking +manager in this case will have less to do. +@item +If you need to collect statistics from big log tables, use summary tables +instead of scanning the whole table. Maintaining the summaries should be +much faster than trying to do statistics 'live'. It's much faster to +regenerate new summary tables from the logs when things change +(depending on business decisions) than to have to change the running +application! +@item +If possible, one should classify reports as 'live' or 'statistical', +where the data needed for statistical reports is generated only from +summary tables that are generated from the actual data. +@item +Take advantage of the fact that columns have default values. Insert +values explicitly only when the value to be inserted differs from the +default. This reduces the parsing that @strong{MySQL} needs to do and +improves the insert speed. +@item +In some cases it's convenient to pack and store data into a blob. In this +case you have to add some extra code in your application to pack/unpack +things in the blob, but this may save a lot of accesses at some stage. +This is practical when you have data that doesn't conform to a static +table structure. +@item +Normally you should try to keep all data non-redundant (what +is called 3rd normal form in database theory), but you should not be +afraid of duplicating things or creating summary tables if you need these +to gain more speed. +@item +Stored procedures or UDF (user-defined functions) may be a good way to +get more performance. In this case you should, however, always have a way +to do this in some other (slower) way if you use a database that doesn't +support them. +@item +You can always gain something by caching queries/answers in your +application and trying to do many inserts/updates at the same time. If +your database supports table locking (like @strong{MySQL} and Oracle), +this should help to ensure that the index cache is only flushed once +after all updates. +@item +Use @code{INSERT /*! DELAYED */} when you do not need to know when your +data is written. This speeds things up because many records can be written +with a single disk write. +@item +Use @code{INSERT /*! LOW_PRIORITY */} when you want your selects to be +more important. +@item +Use @code{SELECT /*! HIGH_PRIORITY */} to get selects that jump the +queue. That is, the select is done even if there is somebody waiting to +do a write. +@item +Use the multi-line @code{INSERT} statement to store many rows with one +SQL command (many SQL servers support this). +@item +Use @code{LOAD DATA INFILE} to load bigger amounts of data. This is +faster than normal inserts and will be even faster when @code{myisamchk} +is integrated in @code{mysqld}. +@item +Use @code{AUTO_INCREMENT} columns to make unique values. +@item +Use @code{OPTIMIZE TABLE} once in a while to avoid fragmentation when +using a dynamic table format. @xref{OPTIMIZE TABLE, , @code{OPTIMIZE TABLE}}. + +@item +Use @code{HEAP} tables to get more speed when possible. @xref{Table +types}. +@item +When using a normal Web server setup, images should be stored as +files. That is, store only a file reference in the database.
The main +reason for this is that a normal Web server is much better at caching +files than database contents. So it's much easier to get a fast +system if you are using files. +@item +Use in-memory tables for non-critical data that are accessed often (like +information about the last shown banner for users that don't have +cookies). +@item +Columns with identical information in different tables should be +declared identical and have identical names. Before Version 3.23 you +got slow joins otherwise. + +Try to keep the names simple (use @code{name} instead of +@code{customer_name} in the customer table). To make your names portable +to other SQL servers you should keep them shorter than 18 characters. +@item +If you need REALLY high speed, you should take a look at the low-level +interfaces for data storage that the different SQL servers support! For +example, by accessing the @strong{MySQL} @code{MyISAM} tables directly, you could +get a speed increase of 2-5 times compared to using the SQL interface. +To be able to do this the data must be on the same server as +the application, and usually it should only be accessed by one process +(because external file locking is really slow). One could eliminate the +above problems by introducing low-level @code{MyISAM} commands in the +@strong{MySQL} server (this could be one easy way to get more +performance if needed). By carefully designing the database interface, +it should be quite easy to support these types of optimization. +@item +In many cases it's faster to access data from a database (using a live +connection) than to access a text file, just because the database is +likely to be more compact than the text file (if you are using numerical +data), and this will involve fewer disk accesses. You will also save +code because you don't have to parse your text files to find line and +column boundaries. +@item +You can also use replication to speed things up. @xref{Replication}. +@item +Declaring a table with @code{DELAY_KEY_WRITE=1} will make the updating of +indexes faster, as these are not logged to disk until the file is closed. +The downside is that you should run @code{myisamchk} on these tables before +you start @code{mysqld} to ensure that they are okay if something killed +@code{mysqld} in the middle. As the key information can always be generated +from the data, you should not lose anything by using @code{DELAY_KEY_WRITE}. +@end itemize + + +@node Locking Issues, Optimizing Database Structure, Query Speed, MySQL Optimization +@section Locking Issues + + +@menu +* Internal locking:: +* Table locking:: +@end menu + +@node Internal locking, Table locking, Locking Issues, Locking Issues +@subsection How MySQL Locks Tables + +@cindex internal locking +@cindex locking, tables +@cindex tables, locking + +You can find a discussion about different locking methods in the appendix. +@xref{Locking methods}. + +All locking in @strong{MySQL} is deadlock-free. This is managed by always +requesting all needed locks at once at the beginning of a query and always +locking the tables in the same order. + +The locking method @strong{MySQL} uses for @code{WRITE} locks works as follows: + +@itemize @bullet +@item +If there are no locks on the table, put a write lock on it. +@item +Otherwise, put the lock request in the write lock queue. +@end itemize + +The locking method @strong{MySQL} uses for @code{READ} locks works as follows: + +@itemize @bullet +@item +If there are no write locks on the table, put a read lock on it.
+@item +Otherwise, put the lock request in the read lock queue. +@end itemize + +When a lock is released, the lock is made available to the threads +in the write lock queue, then to the threads in the read lock queue. + +This means that if you have many updates on a table, @code{SELECT} +statements will wait until there are no more updates. + +To work around this for the case where you want to do many @code{INSERT} and +@code{SELECT} operations on a table, you can insert rows in a temporary +table and update the real table with the records from the temporary table +once in a while. + +This can be done with the following code: +@example +mysql> LOCK TABLES real_table WRITE, insert_table WRITE; +mysql> INSERT INTO real_table SELECT * FROM insert_table; +mysql> TRUNCATE TABLE insert_table; +mysql> UNLOCK TABLES; +@end example + +You can use the @code{LOW_PRIORITY} option with @code{INSERT}, +@code{UPDATE} or @code{DELETE}, or @code{HIGH_PRIORITY} with +@code{SELECT}, if you want to prioritize retrieval in some specific +cases. You can also start @code{mysqld} with @code{--low-priority-updates} +to get the same behavior. + +Using @code{SQL_BUFFER_RESULT} can also help to make table locks shorter. +@xref{SELECT}. + +You could also change the locking code in @file{mysys/thr_lock.c} to use a +single queue. In this case, write locks and read locks would have the same +priority, which might help some applications. + + +@node Table locking, , Internal locking, Locking Issues +@subsection Table Locking Issues + +@cindex problems, table locking + +The table locking code in @strong{MySQL} is deadlock free. + +@strong{MySQL} uses table locking (instead of row locking or column +locking) on all table types, except @code{BDB} tables, to achieve a very +high lock speed. For large tables, table locking is MUCH better than +row locking for most applications, but there are, of course, some +pitfalls. + +For @code{BDB} and @code{InnoDB} tables, @strong{MySQL} only uses table +locking if you explicitly lock the table with @code{LOCK TABLES} or +execute a command that will modify every row in the table, like +@code{ALTER TABLE}. For these table types we recommend that you not use +@code{LOCK TABLES} at all. + +In @strong{MySQL} Version 3.23.7 and above, you can insert rows into +@code{MyISAM} tables at the same time other threads are reading from the +table. Note that currently this only works if there are no holes after +deleted rows in the table at the time the insert is made. When all holes +have been filled with new data, concurrent inserts will automatically be +enabled again. + +Table locking enables many threads to read from a table at the same +time, but if a thread wants to write to a table, it must first get +exclusive access. During the update, all other threads that want to +access this particular table will wait until the update is done. + +As updates to tables normally are considered to be more important than +@code{SELECT} statements, all statements that update a table have higher priority +than statements that retrieve information from a table. This should +ensure that updates are not 'starved' just because someone issues a lot of heavy +queries against a specific table. (You can change this by using +@code{LOW_PRIORITY} with the statement that does the update or +@code{HIGH_PRIORITY} with the @code{SELECT} statement.)
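+ +For example, a minimal sketch of both attributes (the table and column names here are hypothetical): + +@example +mysql> UPDATE LOW_PRIORITY tbl_name SET counter=counter+1 WHERE id=10; +mysql> SELECT HIGH_PRIORITY * FROM tbl_name WHERE id=10; +@end example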
+ +Starting from @strong{MySQL} Version 3.23.7 one can use the +@code{max_write_lock_count} variable to force @strong{MySQL} to +temporarily give all @code{SELECT} statements that are waiting for a table a +higher priority after a specific number of inserts on a table. + +Table locking is, however, not very good under the following scenario: + +@itemize @bullet +@item +A client issues a @code{SELECT} that takes a long time to run. +@item +Another client then issues an @code{UPDATE} on the same table. This client +will wait until the @code{SELECT} is finished. +@item +Another client issues another @code{SELECT} statement on the same table. As +@code{UPDATE} has higher priority than @code{SELECT}, this @code{SELECT} +will wait for the @code{UPDATE} to finish. It will also wait for the first +@code{SELECT} to finish! +@item +A thread is waiting for something like @code{full disk}, in which case all +threads that want to access the problem table will also be put in a waiting +state until more disk space is made available. +@end itemize + +Some possible solutions to this problem are: + +@itemize @bullet +@item +Try to get the @code{SELECT} statements to run faster. You may have to create +some summary tables to do this. + +@item +Start @code{mysqld} with @code{--low-priority-updates}. This will give +all statements that update (modify) a table lower priority than a @code{SELECT} +statement. In this case the last @code{SELECT} statement in the previous +scenario would execute before the @code{UPDATE} statement. + +@item +You can give a specific @code{INSERT}, @code{UPDATE}, or @code{DELETE} +statement lower priority with the @code{LOW_PRIORITY} attribute. + +@item +Start @code{mysqld} with a low value for @code{max_write_lock_count} to give +@code{READ} locks after a certain number of @code{WRITE} locks. + +@item +You can specify that all updates from a specific thread should be done with +low priority by using the SQL command: @code{SET SQL_LOW_PRIORITY_UPDATES=1}. +@xref{SET OPTION, , @code{SET OPTION}}. + +@item +You can specify that a specific @code{SELECT} is very important with the +@code{HIGH_PRIORITY} attribute. @xref{SELECT, , @code{SELECT}}. + +@item +If you have problems with @code{INSERT} combined with @code{SELECT}, +switch to the new @code{MyISAM} tables, as these support concurrent +@code{SELECT}s and @code{INSERT}s. + +@item +If you mainly mix @code{INSERT} and @code{SELECT} statements, the +@code{DELAYED} attribute to @code{INSERT} will probably solve your problems. +@xref{INSERT, , @code{INSERT}}. + +@item +If you have problems with @code{SELECT} and @code{DELETE}, the @code{LIMIT} +option to @code{DELETE} may help. @xref{DELETE, , @code{DELETE}}. +@end itemize + + +@node Optimizing Database Structure, Optimizing the Server, Locking Issues, MySQL Optimization +@section Optimizing Database Structure + + +@menu +* Design:: +* Data size:: +* MySQL indexes:: +* Indexes:: +* Multiple-column indexes:: +* Table cache:: +* Creating many tables:: +* Open tables:: +@end menu + +@node Design, Data size, Optimizing Database Structure, Optimizing Database Structure +@subsection Design Choices + +@cindex design, choices +@cindex database design +@cindex storage of data + +@strong{MySQL} keeps row data and index data in separate files. Many (almost +all) other databases mix row and index data in the same file. We believe that +the @strong{MySQL} choice is better for a very wide range of modern systems.
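+ +For example, each @code{MyISAM} table is stored as three separate files; a sketch of the layout (the @code{test} database, the @code{tbl_name} table, and the path are hypothetical): + +@example +shell> ls /usr/local/mysql/data/test/ +tbl_name.frm (table definition) +tbl_name.MYD (data file) +tbl_name.MYI (index file) +@end example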
+ +Another way to store the row data is to keep the information for each +column in a separate area (examples are SDBM and Focus). This will cause a +performance hit for every query that accesses more than one column. Because +performance degenerates so quickly when more than one column is accessed, +we believe that this model is not good for general-purpose databases. + +The more common case is that the index and data are stored together +(like in Oracle/Sybase et al). In this case you will find the row +information at the leaf page of the index. The good thing about this +layout is that, in many cases, depending on how well the index is +cached, it saves a disk read. The drawbacks of this layout are: + +@itemize @bullet +@item +Table scanning is much slower because you have to read through the indexes +to get at the data. +@item +You can't use only the index table to retrieve data for a query. +@item +You lose a lot of space, as you must duplicate indexes from the nodes +(as you can't store the row in the nodes). +@item +Deletes will degenerate the table over time (as indexes in nodes are +usually not updated on delete). +@item +It's harder to cache ONLY the index data. +@end itemize + + +@node Data size, MySQL indexes, Design, Optimizing Database Structure +@subsection Get Your Data as Small as Possible + +@cindex data, size +@cindex reducing, data size +@cindex storage space, minimizing +@cindex tables, improving performance +@cindex performance, improving + +One of the most basic optimizations is to get your data (and indexes) to +take as little space on the disk (and in memory) as possible. This can +give huge improvements because disk reads are faster and normally less +main memory will be used. Indexing also takes fewer resources if +done on smaller columns. + +@strong{MySQL} supports a lot of different table types and row formats. +Choosing the right table format may give you a big performance gain. +@xref{Table types}. + +You can get better performance on a table and minimize storage space +by using the techniques listed below: + +@itemize @bullet +@item +Use the most efficient (smallest) types possible. @strong{MySQL} has +many specialized types that save disk space and memory. + +@item +Use the smaller integer types if possible to get smaller tables. For +example, @code{MEDIUMINT} is often better than @code{INT}. + +@item +Declare columns to be @code{NOT NULL} if possible. It makes everything +faster and you save one bit per column. Note that if you really need +@code{NULL} in your application you should definitely use it. Just avoid +having it on all columns by default. + +@item +If you don't have any variable-length columns (@code{VARCHAR}, +@code{TEXT}, or @code{BLOB} columns), a fixed-size record format is +used. This is faster but unfortunately may waste some space. +@xref{MyISAM table formats}. + +@item +The primary index of a table should be as short as possible. This makes +identification of one row easy and efficient. + +@item +For each table, you have to decide which storage/index method to +use. @xref{Table types}. + +@item +Only create the indexes that you really need. Indexes are good for +retrieval but bad when you need to store things fast. If you mostly +access a table by searching on a combination of columns, make an index +on them. The first index part should be the most used column. If you are +ALWAYS using many columns, you should use the column with more duplicates +first to get better compression of the index.
+ +@item +If it is very likely that a column has a unique prefix on the first few +characters, it's better to index only this prefix. @strong{MySQL} +supports an index on a part of a character column. Shorter indexes are +faster not only because they take less disk space but also because they +will give you more hits in the index cache and thus fewer disk +seeks. @xref{Server parameters}. + +@item +In some circumstances it can be beneficial to split a table that is +scanned very often into two. This is especially true if it is a dynamic +format table and it is possible to use a smaller static format table that +can be used to find the relevant rows when scanning the table. +@end itemize + + +@node MySQL indexes, Indexes, Data size, Optimizing Database Structure +@subsection How MySQL Uses Indexes + +@cindex indexes, use of + +Indexes are used to quickly find rows with a specific value in one +column. Without an index @strong{MySQL} has to start with the first record +and then read through the whole table until it finds the relevant +rows. The bigger the table, the more this costs. If the table has an index +for the columns in question, @strong{MySQL} can quickly get a position to +seek to in the middle of the data file without having to look at all the +data. If a table has 1000 rows, this is at least 100 times faster than +reading sequentially. Note that if you need to access almost all 1000 +rows it is faster to read sequentially because we then avoid disk seeks. + +All @strong{MySQL} indexes (@code{PRIMARY}, @code{UNIQUE}, and +@code{INDEX}) are stored in B-trees. Strings are automatically prefix- +and end-space compressed. @xref{CREATE INDEX, , @code{CREATE INDEX}}. + +Indexes are used to: +@itemize @bullet +@item +Quickly find the rows that match a @code{WHERE} clause. + +@item +Retrieve rows from other tables when performing joins. + +@item +Find the @code{MAX()} or @code{MIN()} value for a specific indexed +column. This is optimized by a preprocessor that checks if you are +using @code{WHERE} key_part_# = constant on all key parts < N. In this case +@strong{MySQL} will do a single key lookup and replace the @code{MIN()} +expression with a constant. If all expressions are replaced with +constants, the query will return at once: + +@example +SELECT MIN(key_part2),MAX(key_part2) FROM table_name where key_part1=10 +@end example + +@item +Sort or group a table if the sorting or grouping is done on a leftmost +prefix of a usable key (for example, @code{ORDER BY key_part1,key_part2}). The +key is read in reverse order if all key parts are followed by @code{DESC}. + +The index can also be used even if the @code{ORDER BY} doesn't match the index +exactly, as long as all of the unused index parts and all the extra +@code{ORDER BY} columns are constants in the @code{WHERE} clause. The +following queries will use the index to resolve the @code{ORDER BY} part: + +@example +SELECT * FROM foo ORDER BY key_part1,key_part2,key_part3; +SELECT * FROM foo WHERE column=constant ORDER BY column, key_part1; +SELECT * FROM foo WHERE key_part1=const GROUP BY key_part2; +@end example + +@item +In some cases a query can be optimized to retrieve values without +consulting the data file.
If all used columns for some table are numeric +and form a leftmost prefix for some key, the values may be retrieved +from the index tree for greater speed: + +@example +SELECT key_part3 FROM table_name WHERE key_part1=1 +@end example + +@end itemize + +Suppose you issue the following @code{SELECT} statement: + +@example +mysql> SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2; +@end example + +If a multiple-column index exists on @code{col1} and @code{col2}, the +appropriate rows can be fetched directly. If separate single-column +indexes exist on @code{col1} and @code{col2}, the optimizer tries to +find the most restrictive index by deciding which index will find fewer +rows and using that index to fetch the rows. + +@cindex indexes, leftmost prefix of +@cindex leftmost prefix of indexes +If the table has a multiple-column index, any leftmost prefix of the +index can be used by the optimizer to find rows. For example, if you +have a three-column index on @code{(col1,col2,col3)}, you have indexed +search capabilities on @code{(col1)}, @code{(col1,col2)}, and +@code{(col1,col2,col3)}. + +@strong{MySQL} can't use a partial index if the columns don't form a +leftmost prefix of the index. Suppose you have the @code{SELECT} +statements shown below: + +@example +mysql> SELECT * FROM tbl_name WHERE col1=val1; +mysql> SELECT * FROM tbl_name WHERE col2=val2; +mysql> SELECT * FROM tbl_name WHERE col2=val2 AND col3=val3; +@end example + +If an index exists on @code{(col1,col2,col3)}, only the first query +shown above uses the index. The second and third queries do involve +indexed columns, but @code{(col2)} and @code{(col2,col3)} are not +leftmost prefixes of @code{(col1,col2,col3)}. + +@findex LIKE, and indexes +@findex LIKE, and wildcards +@cindex indexes, and @code{LIKE} +@cindex wildcards, and @code{LIKE} +@strong{MySQL} also uses indexes for @code{LIKE} comparisons if the argument +to @code{LIKE} is a constant string that doesn't start with a wild-card +character. For example, the following @code{SELECT} statements use indexes: + +@example +mysql> select * from tbl_name where key_col LIKE "Patrick%"; +mysql> select * from tbl_name where key_col LIKE "Pat%_ck%"; +@end example + +In the first statement, only rows with @code{"Patrick" <= key_col < +"Patricl"} are considered. In the second statement, only rows with +@code{"Pat" <= key_col < "Pau"} are considered. + +The following @code{SELECT} statements will not use indexes: +@example +mysql> select * from tbl_name where key_col LIKE "%Patrick%"; +mysql> select * from tbl_name where key_col LIKE other_col; +@end example + +In the first statement, the @code{LIKE} value begins with a wild-card +character. In the second statement, the @code{LIKE} value is not a +constant. + +@findex IS NULL, and indexes +@cindex indexes, and @code{IS NULL} +Searching using @code{column_name IS NULL} will use indexes if +@code{column_name} is indexed. + +@strong{MySQL} normally uses the index that finds the smallest number of rows. An +index is used for columns that you compare with the following operators: +@code{=}, @code{>}, @code{>=}, @code{<}, @code{<=}, @code{BETWEEN}, and a +@code{LIKE} with a non-wild-card prefix like @code{'something%'}. + +Any index that doesn't span all @code{AND} levels in the @code{WHERE} clause +is not used to optimize the query. In other words: To be able to use an +index, a prefix of the index must be used in every @code{AND} group. + +The following @code{WHERE} clauses use indexes: +@example +...
WHERE index_part1=1 AND index_part2=2 AND other_column=3 +... WHERE index=1 OR A=10 AND index=2 /* index = 1 OR index = 2 */ +... WHERE index_part1='hello' AND index_part_3=5 + /* optimized like "index_part1='hello'" */ +... WHERE index1=1 and index2=2 or index1=3 and index3=3; + /* Can use index on index1 but not on index2 or index 3 */ +@end example + +These @code{WHERE} clauses do @strong{NOT} use indexes: +@example +... WHERE index_part2=1 AND index_part3=2 /* index_part1 is not used */ +... WHERE index=1 OR A=10 /* Index is not used in both AND parts */ +... WHERE index_part1=1 OR index_part2=10 /* No index spans all rows */ +@end example + +Note that in some cases @strong{MySQL} will not use an index, even if one +would be available. Some of the cases where this happens are: + +@itemize @bullet +@item +If the use of the index would require @strong{MySQL} to access more +than 30 % of the rows in the table. (In this case a table scan is +probably much faster, as it will require far fewer seeks.) +Note that if such a query uses @code{LIMIT} to only retrieve +part of the rows, @strong{MySQL} will use an index anyway, as it can +much more quickly find the few rows to return in the result. +@end itemize + + +@menu +* Indexes:: +* Multiple-column indexes:: +* Table cache:: +* Creating many tables:: +* Open tables:: +@end menu + +@node Indexes, Multiple-column indexes, MySQL indexes, Optimizing Database Structure +@subsection Column Indexes + +@cindex indexes, columns +@cindex columns, indexes +@cindex keys + +All @strong{MySQL} column types can be indexed. Use of indexes on the +relevant columns is the best way to improve the performance of @code{SELECT} +operations. + +The maximum number of keys and the maximum index length are defined per +table handler. @xref{Table types}. With all table handlers, you can have +at least 16 keys and a total index length of at least 256 bytes. + +For @code{CHAR} and @code{VARCHAR} columns, you can index a prefix of a +column. This is much faster and requires less disk space than indexing the +whole column. The syntax to use in the @code{CREATE TABLE} statement to +index a column prefix looks like this: + +@example +KEY index_name (col_name(length)) +@end example + +The example below creates an index for the first 10 characters of the +@code{name} column: + +@example +mysql> CREATE TABLE test ( + name CHAR(200) NOT NULL, + KEY index_name (name(10))); +@end example + +For @code{BLOB} and @code{TEXT} columns, you must index a prefix of the +column. You cannot index the entire column. + +In @strong{MySQL} Version 3.23.23 or later, you can also create special +@code{FULLTEXT} indexes. They are used for full-text search. Only the +@code{MyISAM} table type supports @code{FULLTEXT} indexes. They can be +created only from @code{VARCHAR} and @code{TEXT} columns. +Indexing always happens over the entire column and partial indexing is not +supported. See @ref{Fulltext Search} for details. + +@node Multiple-column indexes, Table cache, Indexes, Optimizing Database Structure +@subsection Multiple-Column Indexes + +@cindex multi-column indexes +@cindex indexes, multi-column +@cindex keys, multi-column + +@strong{MySQL} can create indexes on multiple columns. An index may +consist of up to 15 columns. (On @code{CHAR} and @code{VARCHAR} columns you +can also use a prefix of the column as a part of an index). + +A multiple-column index can be considered a sorted array containing values +that are created by concatenating the values of the indexed columns.
+ +@strong{MySQL} uses multiple-column indexes in such a way that queries are +fast when you specify a known quantity for the first column of the index in a +@code{WHERE} clause, even if you don't specify values for the other columns. + +Suppose a table is created using the following specification: + +@example +mysql> CREATE TABLE test ( + id INT NOT NULL, + last_name CHAR(30) NOT NULL, + first_name CHAR(30) NOT NULL, + PRIMARY KEY (id), + INDEX name (last_name,first_name)); +@end example + +Then the index @code{name} is an index over @code{last_name} and +@code{first_name}. The index will be used for queries that specify +values in a known range for @code{last_name}, or for both @code{last_name} +and @code{first_name}. +Therefore, the @code{name} index will be used in the following queries: + +@example +mysql> SELECT * FROM test WHERE last_name="Widenius"; + +mysql> SELECT * FROM test WHERE last_name="Widenius" + AND first_name="Michael"; + +mysql> SELECT * FROM test WHERE last_name="Widenius" + AND (first_name="Michael" OR first_name="Monty"); + +mysql> SELECT * FROM test WHERE last_name="Widenius" + AND first_name >="M" AND first_name < "N"; +@end example + +However, the @code{name} index will NOT be used in the following queries: + +@example +mysql> SELECT * FROM test WHERE first_name="Michael"; + +mysql> SELECT * FROM test WHERE last_name="Widenius" + OR first_name="Michael"; +@end example + +For more information on the manner in which @strong{MySQL} uses indexes to +improve query performance, see @ref{MySQL indexes, , @strong{MySQL} +indexes}. + + +@node Table cache, Creating many tables, Multiple-column indexes, Optimizing Database Structure +@subsection How MySQL Opens and Closes Tables + +@findex table_cache + +@cindex tables, opening +@cindex tables, closing +@cindex opening, tables +@cindex closing, tables +@cindex table cache + +@code{table_cache}, @code{max_connections}, and @code{max_tmp_tables} +affect the maximum number of files the server keeps open. If you +increase one or more of these values, you may run up against a limit +imposed by your operating system on the per-process number of open file +descriptors. However, you can increase the limit on many systems. +Consult your OS documentation to find out how to do this, because the +method for changing the limit varies widely from system to system. + +@code{table_cache} is related to @code{max_connections}. For example, +for 200 concurrently running connections, you should have a table cache of +at least @code{200 * n}, where @code{n} is the maximum number of tables +in a join. You also need to reserve some extra file descriptors for +temporary tables and files. + +The cache of open tables can grow to a maximum of @code{table_cache} +(default 64; this can be changed with the @code{-O table_cache=#} +option to @code{mysqld}). A table is never closed, except when the +cache is full and another thread tries to open a table or if you use +@code{mysqladmin refresh} or @code{mysqladmin flush-tables}. + +When the table cache fills up, the server uses the following procedure +to locate a cache entry to use: + +@itemize @bullet +@item +Tables that are not currently in use are released, in least-recently-used +order. + +@item +If the cache is full and no tables can be released, but a new table needs to +be opened, the cache is temporarily extended as necessary. + +@item +If the cache is in a temporarily-extended state and a table goes from in-use +to not-in-use state, the table is closed and released from the cache.
+@end itemize + +A table is opened for each concurrent access. This means that +if you have two threads accessing the same table or access the table +twice in the same query (with @code{AS}), the table needs to be opened twice. +The first open of any table takes two file descriptors; each additional +use of the table takes only one file descriptor. The extra descriptor +for the first open is used for the index file; this descriptor is shared +among all threads. + +You can check if your table cache is too small by checking the @code{mysqld} +variable @code{Opened_tables}. If this is quite big, even if you +haven't done a lot of @code{FLUSH TABLES}, you should increase your table +cache. @xref{SHOW STATUS}. + + +@node Creating many tables, Open tables, Table cache, Optimizing Database Structure +@subsection Drawbacks to Creating Large Numbers of Tables in the Same Database + +@cindex tables, too many + +If you have many files in a directory, open, close, and create operations will +be slow. If you execute @code{SELECT} statements on many different tables, +there will be a little overhead when the table cache is full, because for +every table that has to be opened, another must be closed. You can reduce +this overhead by making the table cache larger. + + +@node Open tables, , Creating many tables, Optimizing Database Structure +@subsection Why So Many Open Tables? + +@cindex tables, open +@cindex open tables + +When you run @code{mysqladmin status}, you'll see something like this: + +@example +Uptime: 426 Running threads: 1 Questions: 11082 Reloads: 1 Open tables: 12 +@end example + +This can be somewhat perplexing if you only have 6 tables. + +@strong{MySQL} is multithreaded, so it may have many queries on the same +table simultaneously. To minimize the problem with two threads having +different states on the same file, the table is opened independently by +each concurrent thread. This takes some memory and one extra file +descriptor for the data file. The index file descriptor is shared +between all threads. + + +@node Optimizing the Server, Disk issues, Optimizing Database Structure, MySQL Optimization +@section Optimizing the MySQL Server + + +@menu +* System:: +* Server parameters:: +* Compile and link options:: +* Memory use:: +* DNS:: +* SET OPTION:: +@end menu + +@node System, Server parameters, Optimizing the Server, Optimizing the Server +@subsection System/Compile Time and Startup Parameter Tuning + +@cindex compiling, optimizing +@cindex system optimization +@cindex startup parameters, tuning + +We start with the system-level things, since some of these decisions have +to be made very early. In other cases, a quick look at this part may +suffice, because it is not that important for the big gains. However, it is always +nice to have a feeling for how much one could gain by changing things +at this level. + +The default OS to use is really important! To get the most out of +multiple-CPU machines, one should use Solaris (because the thread +implementation works really well) or Linux (because the 2.2 kernel has really good SMP +support). Also, on 32-bit machines Linux has a 2G file-size limit by +default. Hopefully this will be fixed soon when new filesystems are +released (XFS/ReiserFS). If you have a desperate need for files bigger +than 2G on Linux-Intel 32-bit, you should get the LFS patch for the ext2 +file system. + +Because we have not run @strong{MySQL} in production on that many platforms, we +advise you to test your intended platform before choosing it, if possible.
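+ +One way to do such a test is to run the @strong{MySQL} benchmark suite on the candidate platform; a hypothetical invocation (the exact options depend on your setup): + +@example +shell> cd sql-bench +shell> perl run-all-tests --server=mysql +@end example + +@xref{MySQL Benchmarks}.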
+ +@cindex locking +Other tips: +@itemize @bullet +@item +If you have enough RAM, you could remove all swap devices. Some +operating systems will use a swap device in some contexts even if you +have free memory. +@item +Use the @code{--skip-locking} @strong{MySQL} option to avoid external +locking. Note that this will not impact @strong{MySQL}'s functionality as +long as you only run one server. Just remember to take down the server (or +lock relevant parts) before you run @code{myisamchk}. On some systems +this switch is mandatory because the external locking does not work in any +case. + +The @code{--skip-locking} option is on by default when compiling with +MIT-pthreads, because @code{flock()} isn't fully supported by +MIT-pthreads on all platforms. It's also on by default for Linux, +as Linux file locking is not yet safe. + +The only case when you can't use @code{--skip-locking} is if you run +multiple @strong{MySQL} @emph{servers} (not clients) on the same data, +or run @code{myisamchk} on a table without flushing and locking +the @code{mysqld} server tables first. + +You can still use @code{LOCK TABLES}/@code{UNLOCK TABLES} even if you +are using @code{--skip-locking}. +@end itemize + + +@node Server parameters, Compile and link options, System, Optimizing the Server +@subsection Tuning Server Parameters + +@cindex parameters, server +@cindex @code{mysqld} server, buffer sizes +@cindex buffer sizes, @code{mysqld} server +@cindex startup parameters + +You can get the default buffer sizes used by the @code{mysqld} server +with this command: + +@example +shell> mysqld --help +@end example + +@cindex @code{mysqld} options +@cindex variables, @code{mysqld} +This command produces a list of all @code{mysqld} options and configurable +variables. The output includes the default values and looks something +like this: + +@example +Possible variables for option --set-variable (-O) are: +back_log current value: 5 +bdb_cache_size current value: 1048540 +binlog_cache_size current value: 32768 +connect_timeout current value: 5 +delayed_insert_timeout current value: 300 +delayed_insert_limit current value: 100 +delayed_queue_size current value: 1000 +flush_time current value: 0 +interactive_timeout current value: 28800 +join_buffer_size current value: 131072 +key_buffer_size current value: 1048540 +lower_case_table_names current value: 0 +long_query_time current value: 10 +max_allowed_packet current value: 1048576 +max_binlog_cache_size current value: 4294967295 +max_connections current value: 100 +max_connect_errors current value: 10 +max_delayed_threads current value: 20 +max_heap_table_size current value: 16777216 +max_join_size current value: 4294967295 +max_sort_length current value: 1024 +max_tmp_tables current value: 32 +max_write_lock_count current value: 4294967295 +myisam_sort_buffer_size current value: 8388608 +net_buffer_length current value: 16384 +net_retry_count current value: 10 +net_read_timeout current value: 30 +net_write_timeout current value: 60 +query_buffer_size current value: 0 +record_buffer current value: 131072 +slow_launch_time current value: 2 +sort_buffer current value: 2097116 +table_cache current value: 64 +thread_concurrency current value: 10 +tmp_table_size current value: 1048576 +thread_stack current value: 131072 +wait_timeout current value: 28800 +@end example + +If there is a @code{mysqld} server currently running, you can see what +values it is actually using for the variables by executing this command: + +@example +shell> mysqladmin variables +@end example +
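+ +You can also inspect a single variable or status counter from the @code{mysql} client; a small sketch (the patterns are only examples): + +@example +mysql> SHOW VARIABLES LIKE "key_buffer_size"; +mysql> SHOW STATUS LIKE "Opened_tables"; +@end example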
+You can find a full description for all variables in the @code{SHOW VARIABLES} +section in this manual. @xref{SHOW VARIABLES}. + +You can also see some statistics from a running server by issuing the command +@code{SHOW STATUS}. @xref{SHOW STATUS}. + +@strong{MySQL} uses algorithms that are very scalable, so you can usually +run with very little memory. If you, however, give @strong{MySQL} more +memory, you will normally also get better performance. + +When tuning a @strong{MySQL} server, the two most important variables to tune +are @code{key_buffer_size} and @code{table_cache}. You should first feel +confident that you have these right before trying to change any of the +other variables. + +If you have much memory (>=256M) and many tables and want maximum performance +with a moderate number of clients, you should use something like this: + +@example +shell> safe_mysqld -O key_buffer=64M -O table_cache=256 \ + -O sort_buffer=4M -O record_buffer=1M & +@end example + +If you have only 128M and only a few tables, but you still do a lot of +sorting, you can use something like: + +@example +shell> safe_mysqld -O key_buffer=16M -O sort_buffer=1M +@end example + +If you have little memory and lots of connections, use something like this: + +@example +shell> safe_mysqld -O key_buffer=512k -O sort_buffer=100k \ + -O record_buffer=100k & +@end example + +or even: + +@example +shell> safe_mysqld -O key_buffer=512k -O sort_buffer=16k \ + -O table_cache=32 -O record_buffer=8k -O net_buffer=1K & +@end example + +When you have installed @strong{MySQL}, the @file{support-files} directory will +contain some example @file{my.cnf} files, @file{my-huge.cnf}, +@file{my-large.cnf}, @file{my-medium.cnf}, and @file{my-small.cnf}, that you can +use as a base to optimize your system. + +If there are very many connections, ``swapping problems'' may occur unless +@code{mysqld} has been configured to use very little memory for each +connection. @code{mysqld} performs better if you have enough memory for all +connections, of course. + +Note that if you change an option to @code{mysqld}, it remains in effect only +for that instance of the server. + +To see the effects of a parameter change, do something like this: + +@example +shell> mysqld -O key_buffer=32m --help +@end example + +Make sure that the @code{--help} option is last; otherwise, the effect of any +options listed after it on the command line will not be reflected in the +output. + + +@node Compile and link options, Memory use, Server parameters, Optimizing the Server +@subsection How Compiling and Linking Affects the Speed of MySQL + +@cindex linking, speed +@cindex compiling, speed +@cindex speed, compiling +@cindex speed, linking + +Most of the following tests are done on Linux with the +@strong{MySQL} benchmarks, but they should give some indication for +other operating systems and workloads. + +You get the fastest executable when you link with @code{-static}. + +On Linux, you will get the fastest code when compiling with @code{pgcc} +and @code{-O3}. To compile @file{sql_yacc.cc} with these options, you +need about 200M memory because @code{gcc/pgcc} needs a lot of memory to +make all functions inline. You should also set @code{CXX=gcc} when +configuring @strong{MySQL} to avoid inclusion of the @code{libstdc++} +library (it is not needed).
Note that with some versions of @code{pgcc}, +the resulting code will only run on true Pentium processors, even if you +use the compiler option saying that you want the resulting code to work on +all x586-type processors (like AMD). + +By just using a better compiler and/or better compiler options you can +get a 10-30 % speed increase in your application. This is particularly +important if you compile the SQL server yourself! + +We have tested both the Cygnus CodeFusion and Fujitsu compilers, but +when we tested them, neither was sufficiently bug free to allow +@strong{MySQL} to be compiled with optimizations on. + +When you compile @strong{MySQL} you should only include support for the +character sets that you are going to use. (Option @code{--with-charset=xxx}). +The standard @strong{MySQL} binary distributions are compiled with support +for all character sets. + +Here is a list of some measurements that we have done: +@itemize @bullet +@item +If you use @code{pgcc} and compile everything with @code{-O6}, the +@code{mysqld} server is 1% faster than with @code{gcc} 2.95.2. + +@item +If you link dynamically (without @code{-static}), the result is 13% +slower on Linux. Note that you still can use a dynamically linked +@strong{MySQL} library. It is only the server that is critical for +performance. + +@item +If you strip your @code{mysqld} binary with @code{strip libexec/mysqld}, +the resulting binary can be up to 4 % faster. + +@item +If you connect using TCP/IP rather than Unix sockets, the result is 7.5% +slower on the same computer. (If you are connecting to @code{localhost}, +@strong{MySQL} will, by default, use sockets). + +@item +If you connect using TCP/IP from another computer over a 100M Ethernet, +things will be 8-11 % slower. + +@item +If you compile with @code{--with-debug=full}, then you will lose 20 % +for most queries, but some queries may take substantially longer (the +@strong{MySQL} benchmarks ran 35 % slower). +If you use @code{--with-debug}, then you will only lose 15 %. +By starting a @code{mysqld} version compiled with @code{--with-debug=full} +with @code{--skip-safemalloc} the end result should be close to when +configuring with @code{--with-debug}. + +@item +On a Sun SPARCstation 20, SunPro C++ 4.2 is 5 % faster than @code{gcc} 2.95.2. + +@item +Compiling with @code{gcc} 2.95.2 for UltraSPARC with the option +@code{-mcpu=v8 -Wa,-xarch=v8plusa} gives 4 % more performance. + +@item +On Solaris 2.5.1, MIT-pthreads is 8-12% slower than Solaris native +threads on a single processor. With more load/CPUs the difference should +get bigger. + +@item +Running with @code{--log-bin} makes @strong{MySQL} 1 % slower. + +@item +Compiling on Linux-x86 using @code{gcc} without frame pointers +(@code{-fomit-frame-pointer} or @code{-fomit-frame-pointer -ffixed-ebp}) +makes @code{mysqld} 1-4% faster. +@end itemize + +The @strong{MySQL}-Linux distribution provided by @strong{MySQL AB} used +to be compiled with @code{pgcc}, but we had to go back to regular gcc +because of a bug in @code{pgcc} that would generate code that does +not run on AMD. We will continue using gcc until that bug is resolved. +In the meantime, if you have a non-AMD machine, you can get a faster +binary by compiling with @code{pgcc}. The standard @strong{MySQL} +Linux binary is linked statically to make it faster and more portable.
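+ +As an illustration, a statically linked server could be configured along these lines (a sketch only; the compiler choice and flags are assumptions, not the exact options used for the distributed binaries): + +@example +shell> CC=pgcc CXX=gcc CXXFLAGS=-O3 ./configure \ + --prefix=/usr/local/mysql \ + --with-charset=latin1 \ + --disable-shared --with-mysqld-ldflags=-all-static +@end example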
+ + +@node Memory use, DNS, Compile and link options, Optimizing the Server +@subsection How MySQL Uses Memory + +@cindex memory use + +The list below indicates some of the ways that the @code{mysqld} server +uses memory. Where applicable, the name of the server variable relevant +to the memory use is given: + +@itemize @bullet +@item +The key buffer (variable @code{key_buffer_size}) is shared by all +threads; other buffers used by the server are allocated as +needed. @xref{Server parameters}. + +@item +Each connection uses some thread-specific space: A stack (default 64K, +variable @code{thread_stack}), a connection buffer (variable +@code{net_buffer_length}), and a result buffer (variable +@code{net_buffer_length}). The connection buffer and result buffer are +dynamically enlarged up to @code{max_allowed_packet} when needed. When +a query is running, a copy of the current query string is also allocated. + +@item +All threads share the same base memory. + +@item +Only compressed @code{ISAM}/@code{MyISAM} tables are memory mapped. This is +because the 32-bit memory space of 4GB is not large enough for most +big tables. When systems with a 64-bit address space become more +common, we may add general support for memory mapping. + +@item +Each request doing a sequential scan over a table allocates a read buffer +(variable @code{record_buffer}). + +@item +All joins are done in one pass, and most joins can be done without even +using a temporary table. Most temporary tables are memory-based (HEAP) +tables. Temporary tables with a big record length (calculated as the +sum of all column lengths) or that contain @code{BLOB} columns are +stored on disk. + +One problem in @strong{MySQL} versions before Version 3.23.2 is that if a HEAP table +exceeds the size of @code{tmp_table_size}, you get the error @code{The +table tbl_name is full}. In newer versions this is handled by +automatically changing the in-memory (HEAP) table to a disk-based +(MyISAM) table as necessary. To work around this problem, you can +increase the temporary table size by setting the @code{tmp_table_size} +option to @code{mysqld}, or by setting the SQL option +@code{SQL_BIG_TABLES} in the client program. @xref{SET OPTION, , +@code{SET OPTION}}. In @strong{MySQL} Version 3.20, the maximum size of the +temporary table was @code{record_buffer*16}, so if you are using this +version, you have to increase the value of @code{record_buffer}. You can +also start @code{mysqld} with the @code{--big-tables} option to always +store temporary tables on disk. However, this will affect the speed of +many complicated queries. + +@item +Most requests doing a sort allocate a sort buffer and zero to two temporary +files, depending on the result set size. @xref{Temporary files}. + +@item +Almost all parsing and calculating is done in a local memory store. No +memory overhead is needed for small items, and the normal slow memory +allocation and freeing is avoided. Memory is allocated only for +unexpectedly large strings (this is done with @code{malloc()} and +@code{free()}). + +@item +Each index file is opened once and the data file is opened once for each +concurrently running thread. For each concurrent thread, a table structure, +column structures for each column, and a buffer of size @code{3 * n} is +allocated (where @code{n} is the maximum row length, not counting @code{BLOB} +columns). A @code{BLOB} uses 5 to 8 bytes plus the length of the @code{BLOB} +data. The @code{ISAM}/@code{MyISAM} table handlers will use one extra row +buffer for internal usage.
+ +@item +For each table having @code{BLOB} columns, a buffer is enlarged dynamically +to read in larger @code{BLOB} values. If you scan a table, a buffer as large +as the largest @code{BLOB} value is allocated. + +@item +Table handlers for all in-use tables are saved in a cache and managed as a +FIFO. Normally the cache has 64 entries. If a table has been used by two +running threads at the same time, the cache contains two entries for the +table. @xref{Table cache}. + +@item +A @code{mysqladmin flush-tables} command closes all tables that are not in +use and marks all in-use tables to be closed when the currently executing +thread finishes. This will effectively free most in-use memory. +@end itemize + +@code{ps} and other system status programs may report that @code{mysqld} +uses a lot of memory. This may be caused by thread stacks on different +memory addresses. For example, the Solaris version of @code{ps} counts +the unused memory between stacks as used memory. You can verify this by +checking available swap with @code{swap -s}. We have tested +@code{mysqld} with commercial memory-leakage detectors, so there should +be no memory leaks. + + +@node DNS, SET OPTION, Memory use, Optimizing the Server +@subsection How MySQL uses DNS + +@cindex DNS +@cindex hostname caching + +When a new client connects to @code{mysqld}, @code{mysqld} will spawn a +new thread to handle the request. This thread will first check if the +hostname is in the hostname cache. If not, the thread will call +@code{gethostbyaddr_r()} and @code{gethostbyname_r()} to resolve the +hostname. + +If the operating system doesn't support the above thread-safe calls, the +thread will lock a mutex and call @code{gethostbyaddr()} and +@code{gethostbyname()} instead. Note that in this case no other thread +can resolve hostnames that are not in the hostname cache until the +first thread is ready. + +You can disable DNS host lookup by starting @code{mysqld} with +@code{--skip-name-resolve}. In this case you can, however, only use IP +numbers in the @strong{MySQL} privilege tables. + +If you have a very slow DNS and many hosts, you can get more performance by +either disabling DNS lookup with @code{--skip-name-resolve} or by +increasing the @code{HOST_CACHE_SIZE} define (default: 128) and recompiling +@code{mysqld}. + +You can disable the hostname cache with @code{--skip-host-cache}. You +can clear the hostname cache with @code{FLUSH HOSTS} or @code{mysqladmin +flush-hosts}. + +If you don't want to allow connections over @code{TCP/IP}, you can do this +by starting @code{mysqld} with @code{--skip-networking}. + + +@node SET OPTION, , DNS, Optimizing the Server +@subsection @code{SET} Syntax + +@findex SET OPTION + +@example +SET [OPTION] SQL_VALUE_OPTION= value, ... +@end example + +@code{SET OPTION} sets various options that affect the operation of the +server or your client. Any option you set remains in effect until the +current session ends, or until you set the option to a different value. + +@table @code +@item CHARACTER SET character_set_name | DEFAULT +This maps all strings from and to the client with the given mapping. +Currently the only option for @code{character_set_name} is +@code{cp1251_koi8}, but you can easily add new mappings by editing the +@file{sql/convert.cc} file in the @strong{MySQL} source distribution. The +default mapping can be restored by using a @code{character_set_name} value of +@code{DEFAULT}.
+ +Note that the syntax for setting the @code{CHARACTER SET} option differs +from the syntax for setting the other options. + +@item PASSWORD = PASSWORD('some password') +@cindex passwords, setting +Set the password for the current user. Any non-anonymous user can change his +own password! + +@item PASSWORD FOR user = PASSWORD('some password') +Set the password for a specific user on the current server host. Only a user +with access to the @code{mysql} database can do this. The user should be +given in @code{user@@hostname} format, where @code{user} and @code{hostname} +are exactly as they are listed in the @code{User} and @code{Host} columns of +the @code{mysql.user} table entry. For example, if you had an entry with +@code{User} and @code{Host} fields of @code{'bob'} and @code{'%.loc.gov'}, +you would write: + +@example +mysql> SET PASSWORD FOR bob@@"%.loc.gov" = PASSWORD("newpass"); + +or + +mysql> UPDATE mysql.user SET password=PASSWORD("newpass") WHERE user="bob" AND host="%.loc.gov"; +@end example + +@item SQL_AUTO_IS_NULL = 0 | 1 +If set to @code{1} (default) then one can find the last inserted row +for a table with an @code{AUTO_INCREMENT} column with the following construct: +@code{WHERE auto_increment_column IS NULL}. This is used by some +ODBC programs like Access. + +@item AUTOCOMMIT= 0 | 1 +If set to @code{1}, all changes to a table will be done at once. To start +a multi-command transaction, you have to use the @code{BEGIN} +statement. @xref{COMMIT}. If set to @code{0}, you have to use @code{COMMIT} / +@code{ROLLBACK} to accept/revoke that transaction. @xref{COMMIT}. Note +that when you change from non-@code{AUTOCOMMIT} mode to +@code{AUTOCOMMIT} mode, @strong{MySQL} will do an automatic +@code{COMMIT} on any open transactions. + +@item SQL_BIG_TABLES = 0 | 1 +@cindex table is full +If set to @code{1}, all temporary tables are stored on disk rather than in +memory. This will be a little slower, but you will not get the error +@code{The table tbl_name is full} for big @code{SELECT} operations that +require a large temporary table. The default value for a new connection is +@code{0} (that is, use in-memory temporary tables). + +@item SQL_BIG_SELECTS = 0 | 1 +If set to @code{0}, @strong{MySQL} will abort if a @code{SELECT} is attempted +that probably will take a very long time. This is useful when an inadvisable +@code{WHERE} statement has been issued. A big query is defined as a +@code{SELECT} that probably will have to examine more than +@code{max_join_size} rows. The default value for a new connection is +@code{1} (which will allow all @code{SELECT} statements). + +@item SQL_BUFFER_RESULT = 0 | 1 +@code{SQL_BUFFER_RESULT} will force the result from @code{SELECT}s +to be put into a temporary table. This will help @strong{MySQL} free the +table locks early and will help in cases where it takes a long time to +send the result set to the client. + +@item SQL_LOW_PRIORITY_UPDATES = 0 | 1 +If set to @code{1}, all @code{INSERT}, @code{UPDATE}, @code{DELETE}, +and @code{LOCK TABLE WRITE} statements wait until there is no pending +@code{SELECT} or @code{LOCK TABLE READ} on the affected table. + +@item SQL_MAX_JOIN_SIZE = value | DEFAULT +Don't allow @code{SELECT}s that will probably need to examine more than +@code{value} row combinations. By setting this value, you can catch +@code{SELECT}s where keys are not used properly and that would probably +take a long time. Setting this to a value other than @code{DEFAULT} will reset +the @code{SQL_BIG_SELECTS} flag.
If you set the @code{SQL_BIG_SELECTS} +flag again, the @code{SQL_MAX_JOIN_SIZE} variable will be ignored. +You can set a default value for this variable by starting @code{mysqld} with +@code{-O max_join_size=#}. + +@item SQL_SAFE_MODE = 0 | 1 +If set to @code{1}, @strong{MySQL} will abort if an @code{UPDATE} or +@code{DELETE} is attempted that doesn't use a key or @code{LIMIT} in the +@code{WHERE} clause. This makes it possible to catch wrong updates +when creating SQL commands by hand. + +@item SQL_SELECT_LIMIT = value | DEFAULT +The maximum number of records to return from @code{SELECT} statements. If +a @code{SELECT} has a @code{LIMIT} clause, the @code{LIMIT} takes precedence +over the value of @code{SQL_SELECT_LIMIT}. The default value for a new +connection is ``unlimited.'' If you have changed the limit, the default value +can be restored by using a @code{SQL_SELECT_LIMIT} value of @code{DEFAULT}. + +@item SQL_LOG_OFF = 0 | 1 +If set to @code{1}, no logging will be done to the standard log for this +client, if the client has the @strong{process} privilege. This does not +affect the update log! + +@item SQL_LOG_UPDATE = 0 | 1 +If set to @code{0}, no logging will be done to the update log for the client, +if the client has the @strong{process} privilege. This does not affect the +standard log! + +@item SQL_QUOTE_SHOW_CREATE = 0 | 1 +If set to @code{1}, @code{SHOW CREATE TABLE} will quote +table and column names. This is @strong{on} by default, +so that replication of tables with fancy column names will work. +@xref{SHOW CREATE TABLE, , @code{SHOW CREATE TABLE}}. + +@item TIMESTAMP = timestamp_value | DEFAULT +Set the time for this client. This is used to get the original timestamp if +you use the update log to restore rows. @code{timestamp_value} should be a +UNIX Epoch timestamp, not a @strong{MySQL} timestamp. + +@item LAST_INSERT_ID = # +Set the value to be returned from @code{LAST_INSERT_ID()}. This is stored in +the update log when you use @code{LAST_INSERT_ID()} in a command that updates +a table. + +@item INSERT_ID = # +Set the value to be used by the following @code{INSERT} or @code{ALTER TABLE} +command when inserting an @code{AUTO_INCREMENT} value. This is mainly used +with the update log. +@end table + + +@menu +* SET TRANSACTION:: +@end menu + +@node SET TRANSACTION, , SET OPTION, SET OPTION +@subsubsection @code{SET TRANSACTION} Syntax + +@findex ISOLATION LEVEL + +@example +SET [GLOBAL | SESSION] TRANSACTION ISOLATION LEVEL +[READ UNCOMMITTED | READ COMMITTED | REPEATABLE READ | SERIALIZABLE] +@end example + +Sets the transaction isolation level globally, for the whole session, or +for the next transaction. + +The default behavior is to set the isolation level for the next (not yet +started) transaction. + +If you use the @code{GLOBAL} keyword, it will affect all newly created threads. +You will need the @code{PROCESS} privilege to do this. + +Using the @code{SESSION} keyword will affect the following and all +future transactions. + +You can set the default isolation level for @code{mysqld} with +@code{--transaction-isolation=...}. @xref{Command-line options}. + + +@node Disk issues, , Optimizing the Server, MySQL Optimization +@section Disk Issues + +@cindex disk issues +@cindex performance, disk issues + +@itemize @bullet +@item +As mentioned before, disk seeks are a big performance bottleneck. This +problem gets more and more apparent when the data starts to grow so +large that effective caching becomes impossible.
For large databases, +where you access data more or less randomly, you can be sure that you +will need at least one disk seek to read and a couple of disk seeks to +write things. To minimize this problem, use disks with low seek times. + +@item +Increase the number of available disk spindles (and thereby reduce +the seek overhead) by either symlinking files to different disks or striping +the disks. + +@table @strong +@item Using symbolic links +This means that you symlink the index and/or data file(s) from the +normal data directory to another disk (that may also be striped). This +makes both the seek and read times better (if the disks are not used for +other things). @xref{Symbolic links}. + +@cindex striping, defined +@item Striping +Striping means that you have many disks and put the first block on the +first disk, the second block on the second disk, and the Nth block on the +(N mod number_of_disks) disk, and so on. This means if your normal data +size is less than the stripe size (or perfectly aligned) you will get +much better performance. Note that striping is very dependent on the OS +and the stripe size. So benchmark your application with different +stripe sizes. @xref{Custom Benchmarks}. + +Note that the speed difference for striping is @strong{very} dependent +on the parameters. Depending on how you set the striping parameters and +number of disks you may get a difference in orders of magnitude. Note that +you have to choose to optimize for random or sequential access. +@end table + +@item +For reliability you may want to use RAID 0+1 (striping + mirroring), but +in this case you will need 2*N drives to hold N drives of data. This is +probably the best option if you have the money for it! You may, however, +also have to invest in some volume-management software to handle it +efficiently. + +@item +A good option is to have semi-important data (that can be regenerated) +on a RAID 0 disk while storing really important data (like host information +and logs) on a RAID 0+1 or RAID N disk. RAID N can be a problem if you +have many writes because of the time required to update the parity bits. + +@item +You may also set the parameters for the file system that the database +uses. One easy change is to mount the file system with the noatime +option. That makes it skip the updating of the last access time in the +inode and thereby avoids some disk seeks. + +@item +On Linux, you can get much more performance (up to 100 % under load is +not uncommon) by using @code{hdparm} to configure your disk's interface! The +following should be quite good @code{hdparm} options for @strong{MySQL} (and +probably many other applications): + +@example +hdparm -m 16 -d 1 +@end example + +Note that the performance/reliability when using the above depends on +your hardware, so we strongly suggest that you test your system +thoroughly after using @code{hdparm}! Please consult the @code{hdparm} +man page for more information! If @code{hdparm} is not used wisely, +filesystem corruption may result. Back up everything before experimenting! + +@item +On many operating systems you can mount the disks with the 'async' flag to +set the file system to be updated asynchronously. If your computer is +reasonably stable, this should give you more performance without sacrificing +too much reliability. (This flag is on by default on Linux.) + +@item +If you don't need to know when a file was last accessed (which is not +really useful on a database server), you can mount your file systems +with the noatime flag.
+
+@menu
+* Symbolic links::
+@end menu
+
+@node Symbolic links, , Disk issues, Disk issues
+@subsection Using Symbolic Links
+
+@cindex symbolic links
+@cindex links, symbolic
+
+You can move tables and databases from the database directory to other
+locations and replace them with symbolic links to the new locations.
+You might want to do this, for example, to move a database to a file
+system with more free space, or to increase the speed of your system by
+spreading your tables over different disks.
+
+The recommended way to do this is to symlink databases to different
+disks, and to symlink tables only as a last resort.
+
+@cindex databases, symbolic links
+@menu
+* Symbolic links to databases::
+* Symbolic links to tables::
+@end menu
+
+
+@node Symbolic links to databases, Symbolic links to tables, Symbolic links, Symbolic links
+@subsubsection Using Symbolic Links for Databases
+
+The way to symlink a database is to first create a directory on some
+disk where you have free space and then create a symlink to it from
+the @strong{MySQL} database directory.
+
+@example
+shell> mkdir /dr1/databases/test
+shell> ln -s /dr1/databases/test mysqld-datadir
+@end example
+
+@strong{MySQL} doesn't support linking one directory to multiple
+databases. Replacing a database directory with a symbolic link will
+work fine as long as you don't make a symbolic link between databases.
+Suppose you have a database @code{db1} under the @strong{MySQL} data
+directory, and then make a symlink @code{db2} that points to @code{db1}:
+
+@example
+shell> cd /path/to/datadir
+shell> ln -s db1 db2
+@end example
+
+Now, for any table @code{tbl_a} in @code{db1}, there also appears to be
+a table @code{tbl_a} in @code{db2}. If one thread updates @code{db1.tbl_a}
+and another thread updates @code{db2.tbl_a}, there will be problems.
+
+If you really need this, you must change the following code in
+@file{mysys/mf_format.c}:
+
+@example
+if (flag & 32 || (!lstat(to,&stat_buff) && S_ISLNK(stat_buff.st_mode)))
+@end example
+
+to
+
+@example
+if (1)
+@end example
+
+On Windows you can use internal symbolic links to directories by compiling
+@strong{MySQL} with @code{-DUSE_SYMDIR}. This allows you to put different
+databases on different disks. @xref{Windows symbolic links}.
+
+
+@node Symbolic links to tables, , Symbolic links to databases, Symbolic links
+@subsubsection Using Symbolic Links for Tables
+
+@cindex tables, symbolic links
+
+Before @strong{MySQL} 4.0 you should not symlink tables unless you are
+very careful with them. The problem is that if you run @code{ALTER
+TABLE}, @code{REPAIR TABLE}, or @code{OPTIMIZE TABLE} on a symlinked
+table, the symlinks will be removed and replaced by the original
+files. This happens because these commands work by creating a
+temporary file in the database directory and, when the command is
+complete, replacing the original file with the temporary file.
+
+You should not symlink tables on systems that don't have a fully
+working @code{realpath()} call. (At least Linux and Solaris support
+@code{realpath()}.)
+
+In @strong{MySQL} 4.0, symlinks are fully supported only for @code{MyISAM}
+tables. For other table types you will probably get strange problems
+when executing any of the above-mentioned commands.
+
+The handling of symbolic links in @strong{MySQL} 4.0 works as follows
+(this is mostly relevant only for @code{MyISAM} tables):
+
+@itemize @bullet
+@item
+In the data directory you will always have the table definition file
+and the data/index files.
+
+@item
+You can symlink the index file and the data file to different directories,
+independently of each other.
+
+@item
+The symlinking can be done from the operating system (if @code{mysqld} is
+not running) or with the @code{INDEX/DATA DIRECTORY="path-to-dir"} options
+in @code{CREATE TABLE} (see the example following this list).
+@xref{CREATE TABLE}.
+
+@item
+@code{myisamchk} will not replace a symlink with the index or data file,
+but will work directly on the file the symlink points to. Any temporary
+files will be created in the same directory as the data or index file.
+
+@item
+When you drop a table that is using symlinks, both the symlink and the
+file the symlink points to are dropped. This is a good reason why you
+should NOT run @code{mysqld} as root or allow anyone to have write
+access to the @strong{MySQL} database directories.
+
+@item
+If you rename a table with @code{ALTER TABLE RENAME} and don't change
+databases, the symlink in the database directory will be renamed to the
+new name and the data/index file will be renamed accordingly.
+
+@item
+If you use @code{ALTER TABLE RENAME} to move a table to another database,
+then the table will be moved to the other database directory and the old
+symlinks and the files they pointed to will be deleted.
+
+@item
+If you are not using symlinks, you should use the @code{--skip-symlink}
+option to @code{mysqld} to ensure that no one can drop or rename a file
+outside of the @code{mysqld} data directory.
+@end itemize
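+
+For example, a @code{MyISAM} table whose data and index files should be
+placed on different disks could be created with something like the
+following sketch (the table name and paths are only illustrative):
+
+@example
+mysql> CREATE TABLE t1 (num INT)
+       DATA DIRECTORY="/disk2/mysql-data"
+       INDEX DIRECTORY="/disk3/mysql-index";
+@end example
+
+The data and index files are then created in the given directories, and
+the database directory contains symlinks pointing to them.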
+
+Things that are not yet supported:
+
+@cindex TODO, symlinks
+@itemize @bullet
+@item
+@code{ALTER TABLE} ignores all @code{INDEX/DATA DIRECTORY="path"} options.
+@item
+@code{CREATE TABLE} doesn't report if the table has symbolic links.
+@item
+@code{mysqldump} doesn't include the symbolic link information in the output.
+@item
+@code{BACKUP TABLE} and @code{RESTORE TABLE} don't use symbolic links.
+@end itemize
+
+
+
+
+@node Reference, Table types, MySQL Optimization, Top
@chapter MySQL Language Reference

@menu
@@ -23688,12 +26668,9 @@ isolate it into a separate test case first. Then report the problem to
* LOAD DATA:: @code{LOAD DATA INFILE} syntax
* UPDATE:: @code{UPDATE} syntax
* USE:: @code{USE} syntax
-* EXPLAIN:: @code{EXPLAIN} syntax (Get information about a @code{SELECT})
* DESCRIBE:: @code{DESCRIBE} syntax (Get information about names of columns)
* COMMIT:: @code{BEGIN/COMMIT/ROLLBACK} syntax
* LOCK TABLES:: @code{LOCK TABLES/UNLOCK TABLES} syntax
-* SET OPTION:: @code{SET OPTION} syntax
-* SET TRANSACTION:: @code{SET TRANSACTION} syntax
* CREATE INDEX:: @code{CREATE INDEX} syntax
* DROP INDEX:: @code{DROP INDEX} syntax
* Comments:: Comment syntax
@@ -24490,8 +27467,6 @@ be chosen from the list of values @code{'value1'}, @code{'value2'},
* Date and time types:: Date and time types
* String types:: String types
* Choosing types:: Choosing the right type for a column
-* Indexes:: Column indexes
-* Multiple-column indexes:: Multiple-column indexes
* Other-vendor column types:: Using column types from other database engines
@end menu
@@ -25548,7 +28523,7 @@ the @code{SET} definition in the second column.
@cindex types, columns @cindex choosing types -@node Choosing types, Indexes, String types, Column types +@node Choosing types, Other-vendor column types, String types, Column types @subsection Choosing the Right Type for a Column For the most efficient use of storage, try to use the most precise type in @@ -25565,113 +28540,13 @@ For high precision, you can always convert to a fixed-point type stored in a @code{BIGINT}. This allows you to do all calculations with integers and convert results back to floating-point values only when necessary. -@cindex indexes, columns -@cindex columns, indexes -@cindex keys -@node Indexes, Multiple-column indexes, Choosing types, Column types -@subsection Column Indexes - -All @strong{MySQL} column types can be indexed. Use of indexes on the -relevant columns is the best way to improve the performance of @code{SELECT} -operations. - -The maximum number of keys and the maximum index length is defined per -table handler. @xref{Table types}. You can with all table handlers have -at least 16 keys and a total index length of at least 256 bytes. - -For @code{CHAR} and @code{VARCHAR} columns, you can index a prefix of a -column. This is much faster and requires less disk space than indexing the -whole column. The syntax to use in the @code{CREATE TABLE} statement to -index a column prefix looks like this: - -@example -KEY index_name (col_name(length)) -@end example - -The example below creates an index for the first 10 characters of the -@code{name} column: - -@example -mysql> CREATE TABLE test ( - name CHAR(200) NOT NULL, - KEY index_name (name(10))); -@end example - -For @code{BLOB} and @code{TEXT} columns, you must index a prefix of the -column. You cannot index the entire column. - -In @strong{MySQL} Version 3.23.23 or later, you can also create special -@strong{FULLTEXT} indexes. They are used for full-text search. Only the -@code{MyISAM} table type supports @code{FULLTEXT} indexes. They can be -created only from @code{VARCHAR} and @code{TEXT} columns. -Indexing always happens over the entire column and partial indexing is not -supported. See @ref{Fulltext Search} for details. - -@cindex multi-column indexes -@cindex indexes, multi-column -@cindex keys, multi-column -@node Multiple-column indexes, Other-vendor column types, Indexes, Column types -@subsection Multiple-column Indexes - -@strong{MySQL} can create indexes on multiple columns. An index may -consist of up to 15 columns. (On @code{CHAR} and @code{VARCHAR} columns you -can also use a prefix of the column as a part of an index). - -A multiple-column index can be considered a sorted array containing values -that are created by concatenating the values of the indexed columns. - -@strong{MySQL} uses multiple-column indexes in such a way that queries are -fast when you specify a known quantity for the first column of the index in a -@code{WHERE} clause, even if you don't specify values for the other columns. - -Suppose a table is created using the following specification: - -@example -mysql> CREATE TABLE test ( - id INT NOT NULL, - last_name CHAR(30) NOT NULL, - first_name CHAR(30) NOT NULL, - PRIMARY KEY (id), - INDEX name (last_name,first_name)); -@end example - -Then the index @code{name} is an index over @code{last_name} and -@code{first_name}. The index will be used for queries that specify -values in a known range for @code{last_name}, or for both @code{last_name} -and @code{first_name}. 
-Therefore, the @code{name} index will be used in the following queries: - -@example -mysql> SELECT * FROM test WHERE last_name="Widenius"; - -mysql> SELECT * FROM test WHERE last_name="Widenius" - AND first_name="Michael"; - -mysql> SELECT * FROM test WHERE last_name="Widenius" - AND (first_name="Michael" OR first_name="Monty"); - -mysql> SELECT * FROM test WHERE last_name="Widenius" - AND first_name >="M" AND first_name < "N"; -@end example - -However, the @code{name} index will NOT be used in the following queries: - -@example -mysql> SELECT * FROM test WHERE first_name="Michael"; - -mysql> SELECT * FROM test WHERE last_name="Widenius" - OR first_name="Michael"; -@end example -For more information on the manner in which @strong{MySQL} uses indexes to -improve query performance, see @ref{MySQL indexes, , @strong{MySQL} -indexes}. +@node Other-vendor column types, , Choosing types, Column types +@subsection Using Column Types from Other Database Engines @cindex types, portability @cindex portability, types @cindex columns, other types -@node Other-vendor column types, , Multiple-column indexes, Column types -@subsection Using Column Types from Other Database Engines To make it easier to use code written for SQL implementations from other vendors, @strong{MySQL} maps column types as shown in the table below. These @@ -30702,7 +33577,7 @@ In @strong{MySQL} Version 3.23, you can use @code{LIMIT #} to ensure that only a given number of rows are changed. @findex USE -@node USE, EXPLAIN, UPDATE, Reference +@node USE, DESCRIBE, UPDATE, Reference @section @code{USE} Syntax @example @@ -30737,364 +33612,10 @@ mysql> SELECT author_name,editor_name FROM author,db2.editor The @code{USE} statement is provided for Sybase compatibility. -@findex EXPLAIN -@findex SELECT, optimizing -@node EXPLAIN, DESCRIBE, USE, Reference -@section @code{EXPLAIN} Syntax (Get Information About a @code{SELECT}) - -@example - EXPLAIN tbl_name -or EXPLAIN SELECT select_options -@end example - -@code{EXPLAIN tbl_name} is a synonym for @code{DESCRIBE tbl_name} or -@code{SHOW COLUMNS FROM tbl_name}. - -When you precede a @code{SELECT} statement with the keyword @code{EXPLAIN}, -@strong{MySQL} explains how it would process the @code{SELECT}, providing -information about how tables are joined and in which order. - -With the help of @code{EXPLAIN}, you can see when you must add indexes -to tables to get a faster @code{SELECT} that uses indexes to find the -records. You can also see if the optimizer joins the tables in an optimal -order. To force the optimizer to use a specific join order for a -@code{SELECT} statement, add a @code{STRAIGHT_JOIN} clause. - -For non-simple joins, @code{EXPLAIN} returns a row of information for each -table used in the @code{SELECT} statement. The tables are listed in the order -they would be read. @strong{MySQL} resolves all joins using a single-sweep -multi-join method. This means that @strong{MySQL} reads a row from the first -table, then finds a matching row in the second table, then in the third table -and so on. When all tables are processed, it outputs the selected columns and -backtracks through the table list until a table is found for which there are -more matching rows. The next row is read from this table and the process -continues with the next table. - -Output from @code{EXPLAIN} includes the following columns: - -@table @code -@item table -The table to which the row of output refers. - -@item type -The join type. Information about the various types is given below. 
- -@item possible_keys -The @code{possible_keys} column indicates which indexes @strong{MySQL} -could use to find the rows in this table. Note that this column is -totally independent of the order of the tables. That means that some of -the keys in possible_keys may not be usable in practice with the -generated table order. - -If this column is empty, there are no relevant indexes. In this case, -you may be able to improve the performance of your query by examining -the @code{WHERE} clause to see if it refers to some column or columns -that would be suitable for indexing. If so, create an appropriate index -and check the query with @code{EXPLAIN} again. @xref{ALTER TABLE}. - -To see what indexes a table has, use @code{SHOW INDEX FROM tbl_name}. - -@item key -The @code{key} column indicates the key that @strong{MySQL} actually -decided to use. The key is @code{NULL} if no index was chosen. If -@strong{MySQL} chooses the wrong index, you can probably force -@strong{MySQL} to use another index by using @code{myisamchk --analyze}, -@xref{myisamchk syntax}, or by using @code{USE INDEX/IGNORE INDEX}. -@xref{JOIN}. - -@item key_len -The @code{key_len} column indicates the length of the key that -@strong{MySQL} decided to use. The length is @code{NULL} if the -@code{key} is @code{NULL}. Note that this tells us how many parts of a -multi-part key @strong{MySQL} will actually use. - -@item ref -The @code{ref} column shows which columns or constants are used with the -@code{key} to select rows from the table. - -@item rows -The @code{rows} column indicates the number of rows @strong{MySQL} -believes it must examine to execute the query. - -@item Extra -This column contains additional information of how @strong{MySQL} will -resolve the query. Here is an explanation of the different text -strings that can be found in this column: - -@table @code -@item Distinct -@strong{MySQL} will not continue searching for more rows for the current row -combination after it has found the first matching row. - -@item Not exists -@strong{MySQL} was able to do a @code{LEFT JOIN} optimization on the -query and will not examine more rows in this table for the previous row -combination after it finds one row that matches the @code{LEFT JOIN} criteria. - -Here is an example for this: - -@example -SELECT * FROM t1 LEFT JOIN t2 ON t1.id=t2.id WHERE t2.id IS NULL; -@end example - -Assume that @code{t2.id} is defined with @code{NOT NULL}. In this case -@strong{MySQL} will scan @code{t1} and look up the rows in @code{t2} -through @code{t1.id}. If @strong{MySQL} finds a matching row in -@code{t2}, it knows that @code{t2.id} can never be @code{NULL}, and will -not scan through the rest of the rows in @code{t2} that has the same -@code{id}. In other words, for each row in @code{t1}, @strong{MySQL} -only needs to do a single lookup in @code{t2}, independent of how many -matching rows there are in @code{t2}. - -@item @code{range checked for each record (index map: #)} -@strong{MySQL} didn't find a real good index to use. It will, instead, for -each row combination in the preceding tables, do a check on which index to -use (if any), and use this index to retrieve the rows from the table. This -isn't very fast but is faster than having to do a join without -an index. - -@item Using filesort -@strong{MySQL} will need to do an extra pass to find out how to retrieve -the rows in sorted order. 
The sort is done by going through all rows -according to the @code{join type} and storing the sort key + pointer to -the row for all rows that match the @code{WHERE}. Then the keys are -sorted. Finally the rows are retrieved in sorted order. - -@item Using index -The column information is retrieved from the table using only -information in the index tree without having to do an additional seek to -read the actual row. This can be done when all the used columns for -the table are part of the same index. - -@item Using temporary -To resolve the query @strong{MySQL} will need to create a -temporary table to hold the result. This typically happens if you do an -@code{ORDER BY} on a different column set than you did a @code{GROUP -BY} on. - -@item Where used -A @code{WHERE} clause will be used to restrict which rows will be -matched against the next table or sent to the client. If you don't have -this information and the table is of type @code{ALL} or @code{index}, -you may have something wrong in your query (if you don't intend to -fetch/examine all rows from the table). -@end table - -If you want to get your queries as fast as possible, you should look out for -@code{Using filesort} and @code{Using temporary}. -@end table - -The different join types are listed below, ordered from best to worst type: - -@cindex system table -@cindex tables, system -@table @code -@item system -The table has only one row (= system table). This is a special case of -the @code{const} join type. - -@cindex constant table -@cindex tables, constant -@item const -The table has at most one matching row, which will be read at the start -of the query. Because there is only one row, values from the column in -this row can be regarded as constants by the rest of the -optimizer. @code{const} tables are very fast as they are read only once! - -@item eq_ref -One row will be read from this table for each combination of rows from -the previous tables. This is the best possible join type, other than the -@code{const} types. It is used when all parts of an index are used by -the join and the index is @code{UNIQUE} or a @code{PRIMARY KEY}. - -@item ref -All rows with matching index values will be read from this table for each -combination of rows from the previous tables. @code{ref} is used if the join -uses only a leftmost prefix of the key, or if the key is not @code{UNIQUE} -or a @code{PRIMARY KEY} (in other words, if the join cannot select a single -row based on the key value). If the key that is used matches only a few rows, -this join type is good. - -@item range -Only rows that are in a given range will be retrieved, using an index to -select the rows. The @code{key} column indicates which index is used. -The @code{key_len} contains the longest key part that was used. -The @code{ref} column will be NULL for this type. - -@item index -This is the same as @code{ALL}, except that only the index tree is -scanned. This is usually faster than @code{ALL}, as the index file is usually -smaller than the data file. - -@item ALL -A full table scan will be done for each combination of rows from the -previous tables. This is normally not good if the table is the first -table not marked @code{const}, and usually @strong{very} bad in all other -cases. You normally can avoid @code{ALL} by adding more indexes, so that -the row can be retrieved based on constant values or column values from -earlier tables. 
-@end table - -You can get a good indication of how good a join is by multiplying all values -in the @code{rows} column of the @code{EXPLAIN} output. This should tell you -roughly how many rows @strong{MySQL} must examine to execute the query. This -number is also used when you restrict queries with the @code{max_join_size} -variable. -@xref{Server parameters}. - -The following example shows how a @code{JOIN} can be optimized progressively -using the information provided by @code{EXPLAIN}. - -Suppose you have the @code{SELECT} statement shown below, that you examine -using @code{EXPLAIN}: - -@example -EXPLAIN SELECT tt.TicketNumber, tt.TimeIn, - tt.ProjectReference, tt.EstimatedShipDate, - tt.ActualShipDate, tt.ClientID, - tt.ServiceCodes, tt.RepetitiveID, - tt.CurrentProcess, tt.CurrentDPPerson, - tt.RecordVolume, tt.DPPrinted, et.COUNTRY, - et_1.COUNTRY, do.CUSTNAME - FROM tt, et, et AS et_1, do - WHERE tt.SubmitTime IS NULL - AND tt.ActualPC = et.EMPLOYID - AND tt.AssignedPC = et_1.EMPLOYID - AND tt.ClientID = do.CUSTNMBR; -@end example - -For this example, assume that: - -@itemize @bullet -@item -The columns being compared have been declared as follows: - -@multitable @columnfractions .1 .2 .7 -@item @strong{Table} @tab @strong{Column} @tab @strong{Column type} -@item @code{tt} @tab @code{ActualPC} @tab @code{CHAR(10)} -@item @code{tt} @tab @code{AssignedPC} @tab @code{CHAR(10)} -@item @code{tt} @tab @code{ClientID} @tab @code{CHAR(10)} -@item @code{et} @tab @code{EMPLOYID} @tab @code{CHAR(15)} -@item @code{do} @tab @code{CUSTNMBR} @tab @code{CHAR(15)} -@end multitable - -@item -The tables have the indexes shown below: - -@multitable @columnfractions .1 .9 -@item @strong{Table} @tab @strong{Index} -@item @code{tt} @tab @code{ActualPC} -@item @code{tt} @tab @code{AssignedPC} -@item @code{tt} @tab @code{ClientID} -@item @code{et} @tab @code{EMPLOYID} (primary key) -@item @code{do} @tab @code{CUSTNMBR} (primary key) -@end multitable - -@item -The @code{tt.ActualPC} values aren't evenly distributed. -@end itemize - -Initially, before any optimizations have been performed, the @code{EXPLAIN} -statement produces the following information: - -@example -table type possible_keys key key_len ref rows Extra -et ALL PRIMARY NULL NULL NULL 74 -do ALL PRIMARY NULL NULL NULL 2135 -et_1 ALL PRIMARY NULL NULL NULL 74 -tt ALL AssignedPC,ClientID,ActualPC NULL NULL NULL 3872 - range checked for each record (key map: 35) -@end example - -Because @code{type} is @code{ALL} for each table, this output indicates that -@strong{MySQL} is doing a full join for all tables! This will take quite a -long time, as the product of the number of rows in each table must be -examined! For the case at hand, this is @code{74 * 2135 * 74 * 3872 = -45,268,558,720} rows. If the tables were bigger, you can only imagine how -long it would take. - -One problem here is that @strong{MySQL} can't (yet) use indexes on columns -efficiently if they are declared differently. In this context, -@code{VARCHAR} and @code{CHAR} are the same unless they are declared as -different lengths. Because @code{tt.ActualPC} is declared as @code{CHAR(10)} -and @code{et.EMPLOYID} is declared as @code{CHAR(15)}, there is a length -mismatch. - -To fix this disparity between column lengths, use @code{ALTER TABLE} to -lengthen @code{ActualPC} from 10 characters to 15 characters: - -@example -mysql> ALTER TABLE tt MODIFY ActualPC VARCHAR(15); -@end example - -Now @code{tt.ActualPC} and @code{et.EMPLOYID} are both @code{VARCHAR(15)}. 
-Executing the @code{EXPLAIN} statement again produces this result: - -@example -table type possible_keys key key_len ref rows Extra -tt ALL AssignedPC,ClientID,ActualPC NULL NULL NULL 3872 where used -do ALL PRIMARY NULL NULL NULL 2135 - range checked for each record (key map: 1) -et_1 ALL PRIMARY NULL NULL NULL 74 - range checked for each record (key map: 1) -et eq_ref PRIMARY PRIMARY 15 tt.ActualPC 1 -@end example - -This is not perfect, but is much better (the product of the @code{rows} -values is now less by a factor of 74). This version is executed in a couple -of seconds. - -A second alteration can be made to eliminate the column length mismatches -for the @code{tt.AssignedPC = et_1.EMPLOYID} and @code{tt.ClientID = -do.CUSTNMBR} comparisons: - -@example -mysql> ALTER TABLE tt MODIFY AssignedPC VARCHAR(15), - MODIFY ClientID VARCHAR(15); -@end example - -Now @code{EXPLAIN} produces the output shown below: - -@example -table type possible_keys key key_len ref rows Extra -et ALL PRIMARY NULL NULL NULL 74 -tt ref AssignedPC,ClientID,ActualPC ActualPC 15 et.EMPLOYID 52 where used -et_1 eq_ref PRIMARY PRIMARY 15 tt.AssignedPC 1 -do eq_ref PRIMARY PRIMARY 15 tt.ClientID 1 -@end example - -This is almost as good as it can get. - -The remaining problem is that, by default, @strong{MySQL} assumes that values -in the @code{tt.ActualPC} column are evenly distributed, and that isn't the -case for the @code{tt} table. Fortunately, it is easy to tell @strong{MySQL} -about this: - -@example -shell> myisamchk --analyze PATH_TO_MYSQL_DATABASE/tt -shell> mysqladmin refresh -@end example - -Now the join is perfect, and @code{EXPLAIN} produces this result: - -@example -table type possible_keys key key_len ref rows Extra -tt ALL AssignedPC,ClientID,ActualPC NULL NULL NULL 3872 where used -et eq_ref PRIMARY PRIMARY 15 tt.ActualPC 1 -et_1 eq_ref PRIMARY PRIMARY 15 tt.AssignedPC 1 -do eq_ref PRIMARY PRIMARY 15 tt.ClientID 1 -@end example - -Note that the @code{rows} column in the output from @code{EXPLAIN} is an -educated guess from the @strong{MySQL} join optimizer. To optimize a -query, you should check if the numbers are even close to the truth. If not, -you may get better performance by using @code{STRAIGHT_JOIN} in your -@code{SELECT} statement and trying to list the tables in a different order in -the @code{FROM} clause. @findex DESC @findex DESCRIBE -@node DESCRIBE, COMMIT, EXPLAIN, Reference +@node DESCRIBE, COMMIT, USE, Reference @section @code{DESCRIBE} Syntax (Get Information About Columns) @example @@ -31179,7 +33700,7 @@ You can change the isolation level for transactions with @findex LOCK TABLES @findex UNLOCK TABLES -@node LOCK TABLES, SET OPTION, COMMIT, Reference +@node LOCK TABLES, CREATE INDEX, COMMIT, Reference @section @code{LOCK TABLES/UNLOCK TABLES} Syntax @example @@ -31313,181 +33834,15 @@ that can take snapshots in time. automatically commit any active transactions before attempting to lock the tables. -@findex SET OPTION -@node SET OPTION, SET TRANSACTION, LOCK TABLES, Reference -@section @code{SET} Syntax - -@example -SET [OPTION] SQL_VALUE_OPTION= value, ... -@end example - -@code{SET OPTION} sets various options that affect the operation of the -server or your client. Any option you set remains in effect until the -current session ends, or until you set the option to a different value. - -@table @code -@item CHARACTER SET character_set_name | DEFAULT -This maps all strings from and to the client with the given mapping. 
-Currently the only option for @code{character_set_name} is -@code{cp1251_koi8}, but you can easily add new mappings by editing the -@file{sql/convert.cc} file in the @strong{MySQL} source distribution. The -default mapping can be restored by using a @code{character_set_name} value of -@code{DEFAULT}. - -Note that the syntax for setting the @code{CHARACTER SET} option differs -from the syntax for setting the other options. - -@item PASSWORD = PASSWORD('some password') -@cindex passwords, setting -Set the password for the current user. Any non-anonymous user can change his -own password! - -@item PASSWORD FOR user = PASSWORD('some password') -Set the password for a specific user on the current server host. Only a user -with access to the @code{mysql} database can do this. The user should be -given in @code{user@@hostname} format, where @code{user} and @code{hostname} -are exactly as they are listed in the @code{User} and @code{Host} columns of -the @code{mysql.user} table entry. For example, if you had an entry with -@code{User} and @code{Host} fields of @code{'bob'} and @code{'%.loc.gov'}, -you would write: - -@example -mysql> SET PASSWORD FOR bob@@"%.loc.gov" = PASSWORD("newpass"); - -or - -mysql> UPDATE mysql.user SET password=PASSWORD("newpass") where user="bob' and host="%.loc.gov"; -@end example - -@item SQL_AUTO_IS_NULL = 0 | 1 -If set to @code{1} (default) then one can find the last inserted row -for a table with an auto_increment row with the following construct: -@code{WHERE auto_increment_column IS NULL}. This is used by some -ODBC programs like Access. - -@item AUTOCOMMIT= 0 | 1 -If set to @code{1} all changes to a table will be done at once. To start -a multi-command transaction, you have to use the @code{BEGIN} -statement. @xref{COMMIT}. If set to @code{0} you have to use @code{COMMIT} / -@code{ROLLBACK} to accept/revoke that transaction. @xref{COMMIT}. Note -that when you change from not @code{AUTOCOMMIT} mode to -@code{AUTOCOMMIT} mode, @strong{MySQL} will do an automatic -@code{COMMIT} on any open transactions. - -@item SQL_BIG_TABLES = 0 | 1 -@cindex table is full -If set to @code{1}, all temporary tables are stored on disk rather than in -memory. This will be a little slower, but you will not get the error -@code{The table tbl_name is full} for big @code{SELECT} operations that -require a large temporary table. The default value for a new connection is -@code{0} (that is, use in-memory temporary tables). - -@item SQL_BIG_SELECTS = 0 | 1 -If set to @code{0}, @strong{MySQL} will abort if a @code{SELECT} is attempted -that probably will take a very long time. This is useful when an inadvisable -@code{WHERE} statement has been issued. A big query is defined as a -@code{SELECT} that probably will have to examine more than -@code{max_join_size} rows. The default value for a new connection is -@code{1} (which will allow all @code{SELECT} statements). - -@item SQL_BUFFER_RESULT = 0 | 1 -@code{SQL_BUFFER_RESULT} will force the result from @code{SELECT}'s -to be put into a temporary table. This will help @strong{MySQL} free the -table locks early and will help in cases where it takes a long time to -send the result set to the client. - -@item SQL_LOW_PRIORITY_UPDATES = 0 | 1 -If set to @code{1}, all @code{INSERT}, @code{UPDATE}, @code{DELETE}, and -and @code{LOCK TABLE WRITE} statements wait until there is no pending -@code{SELECT} or @code{LOCK TABLE READ} on the affected table. 
- -@item SQL_MAX_JOIN_SIZE = value | DEFAULT -Don't allow @code{SELECT}s that will probably need to examine more than -@code{value} row combinations. By setting this value, you can catch -@code{SELECT}s where keys are not used properly and that would probably -take a long time. Setting this to a value other than @code{DEFAULT} will reset -the @code{SQL_BIG_SELECTS} flag. If you set the @code{SQL_BIG_SELECTS} -flag again, the @code{SQL_MAX_JOIN_SIZE} variable will be ignored. -You can set a default value for this variable by starting @code{mysqld} with -@code{-O max_join_size=#}. - -@item SQL_SAFE_MODE = 0 | 1 -If set to @code{1}, @strong{MySQL} will abort if an @code{UPDATE} or -@code{DELETE} is attempted that doesn't use a key or @code{LIMIT} in the -@code{WHERE} clause. This makes it possible to catch wrong updates -when creating SQL commands by hand. - -@item SQL_SELECT_LIMIT = value | DEFAULT -The maximum number of records to return from @code{SELECT} statements. If -a @code{SELECT} has a @code{LIMIT} clause, the @code{LIMIT} takes precedence -over the value of @code{SQL_SELECT_LIMIT}. The default value for a new -connection is ``unlimited.'' If you have changed the limit, the default value -can be restored by using a @code{SQL_SELECT_LIMIT} value of @code{DEFAULT}. - -@item SQL_LOG_OFF = 0 | 1 -If set to @code{1}, no logging will be done to the standard log for this -client, if the client has the @strong{process} privilege. This does not -affect the update log! - -@item SQL_LOG_UPDATE = 0 | 1 -If set to @code{0}, no logging will be done to the update log for the client, -if the client has the @strong{process} privilege. This does not affect the -standard log! - -@item SQL_QUOTE_SHOW_CREATE = 0 | 1 -If set to @code{1}, @code{SHOW CREATE TABLE} will quote -table and column names. This is @strong{on} by default, -for replication of tables with fancy column names to work. -@ref{SHOW CREATE TABLE, , @code{SHOW CREATE TABLE}}. - -@item TIMESTAMP = timestamp_value | DEFAULT -Set the time for this client. This is used to get the original timestamp if -you use the update log to restore rows. @code{timestamp_value} should be a -UNIX Epoch timestamp, not a @strong{MySQL} timestamp. - -@item LAST_INSERT_ID = # -Set the value to be returned from @code{LAST_INSERT_ID()}. This is stored in -the update log when you use @code{LAST_INSERT_ID()} in a command that updates -a table. - -@item INSERT_ID = # -Set the value to be used by the following @code{INSERT} or @code{ALTER TABLE} -command when inserting an @code{AUTO_INCREMENT} value. This is mainly used -with the update log. -@end table - -@findex ISOLATION LEVEL -@node SET TRANSACTION, CREATE INDEX, SET OPTION, Reference -@section @code{SET TRANSACTION} Syntax - -@example -SET [GLOBAL | SESSION] TRANSACTION ISOLATION LEVEL -[READ UNCOMMITTED | READ COMMITTED | REPEATABLE READ | SERIALIZABLE] -@end example - -Sets the transaction isolation level for the global, whole session or -the next transaction. - -The default behavior is to set the isolation level for the next (not started) -transaction. - -If you set the @code{GLOBAL} privilege it will affect all new created threads. -You will need the @code{PROCESS} privilege to do do this. - -Setting the @code{SESSION} privilege will affect the following and all -future transactions. - -You can set the default isolation level for @code{mysqld} with -@code{--transaction-isolation=...}. @xref{Command-line options}. 
+@node CREATE INDEX, DROP INDEX, LOCK TABLES, Reference +@section @code{CREATE INDEX} Syntax +@findex CREATE INDEX @cindex indexes @cindex indexes, multi-part @cindex multi-part index -@findex CREATE INDEX -@node CREATE INDEX, DROP INDEX, SET TRANSACTION, Reference -@section @code{CREATE INDEX} Syntax @example CREATE [UNIQUE|FULLTEXT] INDEX index_name ON tbl_name (col_name[(length)],... ) @@ -34175,7 +36530,7 @@ not trivial). @end itemize -@node Fulltext Search, Performance, Table types, Top +@node Fulltext Search, Maintenance, Table types, Top @chapter MySQL Full-text Search @cindex searching, full-text @@ -34418,2216 +36773,8 @@ the user wants to treat as words, examples are "C++", "AS/400", "TCP/IP", etc. parameters to @code{FULLTEXT} in @code{CREATE/ALTER TABLE}). @end itemize -@cindex performance, maximizing -@cindex optimization -@node Performance, MySQL Benchmarks, Fulltext Search, Top -@chapter Getting Maximum Performance from MySQL - -Optimization is a complicated task because it ultimately requires -understanding of the whole system. While it may be possible to do some -local optimizations with small knowledge of your system/application, the -more optimal you want your system to become the more you will have to -know about it. - -So this chapter will try to explain and give some examples of different -ways to optimize @strong{MySQL}. But remember that there are always some -(increasingly harder) additional ways to make the system even faster. - -@menu -* Optimize Basics:: Optimization overview -* System:: System/Compile time and startup parameter tuning -* Data size:: Get your data as small as possible -* MySQL indexes:: How @strong{MySQL} uses indexes -* Query Speed:: Speed of queries that access or update data -* Tips:: Other optimization tips -* Benchmarks:: Using your own benchmarks -* Design:: Design choices -* Design Limitations:: MySQL design limitations/tradeoffs -* Portability:: Portability -* Internal use:: What have we used MySQL for? -@end menu - -@node Optimize Basics, System, Performance, Performance -@section Optimization Overview - -The most important part for getting a system fast is of course the basic -design. You also need to know what kinds of things your system will be -doing, and what your bottlenecks are. - -The most common bottlenecks are: -@itemize @bullet -@item Disk seeks. -It takes time for the disk to find a piece of data. With modern disks in -1999, the mean time for this is usually lower than 10ms, so we can in -theory do about 1000 seeks a second. This time improves slowly with new -disks and is very hard to optimize for a single table. The way to -optimize this is to spread the data on more than one disk. -@item Disk reading/writing. -When the disk is at the correct position we need to read the data. With -modern disks in 1999, one disk delivers something like 10-20Mb/s. This -is easier to optimize than seeks because you can read in parallel from -multiple disks. -@item CPU cycles. -When we have the data in main memory (or if it already were -there) we need to process it to get to our result. Having small -tables compared to the memory is the most common limiting -factor. But then, with small tables speed is usually not the problem. -@item Memory bandwidth. -When the CPU needs more data than can fit in the CPU cache the main -memory bandwidth becomes a bottleneck. This is an uncommon bottleneck -for most systems, but one should be aware of it. 
-@end itemize - -@cindex compiling, optimizing -@cindex system optimization -@cindex startup parameters, tuning -@node System, Data size, Optimize Basics, Performance -@section System/Compile Time and Startup Parameter Tuning - -We start with the system level things since some of these decisions have -to be made very early. In other cases a fast look at this part may -suffice because it not that important for the big gains. However, it is always -nice to have a feeling about how much one could gain by changing things -at this level. - -The default OS to use is really important! To get the most use of -multiple CPU machines one should use Solaris (because the threads works -really nice) or Linux (because the 2.2 kernel has really good SMP -support). Also on 32-bit machines Linux has a 2G file size limit by -default. Hopefully this will be fixed soon when new filesystems are -released (XFS/Reiserfs). If you have a desperate need for files bigger -than 2G on Linux-intel 32 bit, you should get the LFS patch for the ext2 -file system. - -Because we have not run @strong{MySQL} in production on that many platforms, we -advice you to test your intended platform before choosing it, if possible. - -@cindex locking -Other tips: -@itemize @bullet -@item -If you have enough RAM, you could remove all swap devices. Some -operating systems will use a swap device in some contexts even if you -have free memory. -@item -Use the @code{--skip-locking} @strong{MySQL} option to avoid external -locking. Note that this will not impact @strong{MySQL}'s functionality as -long as you only run one server. Just remember to take down the server (or -lock relevant parts) before you run @code{myisamchk}. On some system -this switch is mandatory because the external locking does not work in any -case. - -The @code{--skip-locking} option is on by default when compiling with -MIT-pthreads, because @code{flock()} isn't fully supported by -MIT-pthreads on all platforms. It's also on default for Linux -as Linux file locking are not yet safe. - -The only case when you can't use @code{--skip-locking} is if you run -multiple @strong{MySQL} @emph{servers} (not clients) on the same data, -or run @code{myisamchk} on the table without first flushing and locking -the @code{mysqld} server tables first. - -You can still use @code{LOCK TABLES}/@code{UNLOCK TABLES} even if you -are using @code{--skip-locking} -@end itemize - -@menu -* Compile and link options:: How compiling and linking affects the speed of MySQL -* Disk issues:: Disk issues -* Symbolic links:: Using Symbolic Links -* Server parameters:: Tuning server parameters -* Table cache:: How MySQL opens and closes tables -* Creating many tables:: Drawbacks of creating large numbers of tables in the same database -* Open tables:: Why so many open tables? -* Memory use:: How MySQL uses memory -* Internal locking:: How MySQL locks tables -* Table locking:: Table locking issues -* DNS:: -@end menu - -@node Compile and link options, Disk issues, System, System -@subsection How Compiling and Linking Affects the Speed of MySQL - -Most of the following tests are done on Linux with the -@strong{MySQL} benchmarks, but they should give some indication for -other operating systems and workloads. - -@cindex linking, speed -@cindex compiling, speed -@cindex speed, compiling -@cindex speed, linking - -You get the fastest executable when you link with @code{-static}. - -On Linux, you will get the fastest code when compiling with @code{pgcc} -and @code{-O3}. 
To compile @file{sql_yacc.cc} with these options, you -need about 200M memory because @code{gcc/pgcc} needs a lot of memory to -make all functions inline. You should also set @code{CXX=gcc} when -configuring @strong{MySQL} to avoid inclusion of the @code{libstdc++} -library (it is not needed). Note that with some versions of @code{pgcc}, -the resulting code will only run on true Pentium processors, even if you -use the compiler option that you want the resulting code to be working on -all x586 type processors (like AMD). - -By just using a better compiler and/or better compiler options you can -get a 10-30 % speed increase in your application. This is particularly -important if you compile the SQL server yourself! - -We have tested both the Cygnus CodeFusion and Fujitsu compilers, but -when we tested them, neither was sufficiently bug free to allow -@strong{MySQL} to be compiled with optimizations on. - -When you compile @strong{MySQL} you should only include support for the -character sets that you are going to use. (Option @code{--with-charset=xxx}). -The standard @strong{MySQL} binary distributions are compiled with support -for all character sets. - -Here is a list of some measurements that we have done: -@itemize @bullet -@item -If you use @code{pgcc} and compile everything with @code{-O6}, the -@code{mysqld} server is 1% faster than with @code{gcc} 2.95.2. - -@item -If you link dynamically (without @code{-static}), the result is 13% -slower on Linux. Note that you still can use a dynamic linked -@strong{MySQL} library. It is only the server that is critical for -performance. - -@item -If you strip your @code{mysqld} binary with @code{strip libexec/mysqld}, -the resulting binary can be up to 4 % faster. - -@item -If you connect using TCP/IP rather than Unix sockets, the result is 7.5% -slower on the same computer. (If you are connection to @code{localhost}, -@strong{MySQL} will, by default, use sockets). - -@item -If you connect using TCP/IP from another computer over a 100M Ethernet, -things will be 8-11 % slower. - -@item -If you compile with @code{--with-debug=full}, then you will loose 20 % -for most queries, but some queries may take substantially longer (The -@strong{MySQL} benchmarks ran 35 % slower) -If you use @code{--with-debug}, then you will only loose 15 %. -By starting a @code{mysqld} version compiled with @code{--with-debug=full} -with @code{--skip-safemalloc} the end result should be close to when -configuring with @code{--with-debug}. - -@item -On a Sun SPARCstation 20, SunPro C++ 4.2 is 5 % faster than @code{gcc} 2.95.2. - -@item -Compiling with @code{gcc} 2.95.2 for ultrasparc with the option -@code{-mcpu=v8 -Wa,-xarch=v8plusa} gives 4 % more performance. - -@item -On Solaris 2.5.1, MIT-pthreads is 8-12% slower than Solaris native -threads on a single processor. With more load/CPUs the difference should -get bigger. - -@item -Running with @code{--log-bin} makes @strong{[MySQL} 1 % slower. - -@item -Compiling on Linux-x86 using gcc without frame pointers -@code{-fomit-frame-pointer} or @code{-fomit-frame-pointer -ffixed-ebp} -@code{mysqld} 1-4% faster. -@end itemize - -The @strong{MySQL}-Linux distribution provided by @strong{MySQL AB} used -to be compiled with @code{pgcc}, but we had to go back to regular gcc -because of a bug in @code{pgcc} that would generate the code that does -not run on AMD. We will continue using gcc until that bug is resolved. -In the meantime, if you have a non-AMD machine, you can get a faster -binary by compiling with @code{pgcc}. 
The standard @strong{MySQL} -Linux binary is linked statically to get it faster and more portable. - -@cindex disk issues -@cindex performance, disk issues -@node Disk issues, Symbolic links, Compile and link options, System -@subsection Disk Issues - -@itemize @bullet -@item -As mentioned before, disks seeks are a big performance bottleneck. This -problems gets more and more apparent when the data starts to grow so -large that effective caching becomes impossible. For large databases, -where you access data more or less randomly, you can be sure that you -will need at least one disk seek to read and a couple of disk seeks to -write things. To minimize this problem, use disks with low seek times. -@item -Increase the number of available disk spindles (and thereby reduce -the seek overhead) by either symlink files to different disks or striping -the disks. -@table @strong -@item Using symbolic links -This means that you symlink the index and/or data file(s) from the -normal data directory to another disk (that may also be striped). This -makes both the seek and read times better (if the disks are not used for -other things). @xref{Symbolic links}. -@cindex striping, defined -@item Striping -Striping means that you have many disks and put the first block on the -first disk, the second block on the second disk, and the Nth on the -(N mod number_of_disks) disk, and so on. This means if your normal data -size is less than the stripe size (or perfectly aligned) you will get -much better performance. Note that striping is very dependent on the OS -and stripe-size. So benchmark your application with different -stripe-sizes. @xref{Benchmarks}. - -Note that the speed difference for striping is @strong{very} dependent -on the parameters. Depending on how you set the striping parameters and -number of disks you may get a difference in orders of magnitude. Note that -you have to choose to optimize for random or sequential access. -@end table -@item -For reliability you may want to use RAID 0+1 (striping + mirroring), but -in this case you will need 2*N drives to hold N drives of data. This is -probably the best option if you have the money for it! You may, however, -also have to invest in some volume-management software to handle it -efficiently. -@item -A good option is to have semi-important data (that can be regenerated) -on RAID 0 disk while storing really important data (like host information -and logs) on a RAID 0+1 or RAID N disk. RAID N can be a problem if you -have many writes because of the time to update the parity bits. -@item -You may also set the parameters for the file system that the database -uses. One easy change is to mount the file system with the noatime -option. That makes it skip the updating of the last access time in the -inode and by this will avoid some disk seeks. -@item -On Linux, you can get much more performance (up to 100 % under load is -not uncommon) by using hdpram to configure your disk's interface! The -following should be quite good hdparm options for @strong{MySQL} (and -probably many other applications): -@example -hdparm -m 16 -d 1 - -@end example - -Note that the performance/reliability when using the above depends on -your hardware, so we strongly suggest that you test your system -thoroughly after using @code{hdparm}! Please consult the @code{hdparm} -man page for more information! If @code{hdparm} is not used wisely, -filesystem corruption may result. Backup everything before experimenting! 
-@item -On many operating systems you can mount the disks with the 'async' flag to set the file -system to be updated asynchronously. If your computer is reasonable stable, -this should give you more performance without sacrificing too much reliability. -(This flag is on by default on Linux.) -@item -If you don't need to know when a file was last accessed (which is not -really useful on a database server), you can mount your file systems -with the noatime flag. -@end itemize - -@cindex symbolic links -@cindex links, symbolic -@node Symbolic links, Server parameters, Disk issues, System -@subsection Using Symbolic Links - -You can move tables and databases from the database directory to other -locations and replace them with symbolic links to the new locations. -You might want to do this, for example, to move a database to a file -system with more free space or increase the speed of your system by -spreading your tables to different disk. - -The recommended may to do this, is to just symlink databases to different -disk and only symlink tables as a last resort. -. - -@cindex databases, symbolic links -@menu -* Symbolic links to database:: -* Symbolic links to tables:: -@end menu - -@node Symbolic links to database, Symbolic links to tables, Symbolic links, Symbolic links -@subsubsection Using Symbolic Links for Databases - -The way to symlink a database is to first create a directory on some -disk where you have free space and then create a symlink to it from -the @strong{MySQL} database directory. - -@example -shell> mkdir /dr1/databases/test -shell> ln -s /dr1/databases/test mysqld-datadir -@end example - -@strong{MySQL} doesn't support that you link one directory to multiple -databases. Replacing a database directory with a symbolic link will -work fine as long as you don't make a symbolic link between databases. -Suppose you have a database @code{db1} under the @strong{MySQL} data -directory, and then make a symlink @code{db2} that points to @code{db1}: - -@example -shell> cd /path/to/datadir -shell> ln -s db1 db2 -@end example - -Now, for any table @code{tbl_a} in @code{db1}, there also appears to be -a table @code{tbl_a} in @code{db2}. If one thread updates @code{db1.tbl_a} -and another thread updates @code{db2.tbl_a}, there will be problems. - -If you really need this, you must change the following code in -@file{mysys/mf_format.c}: - -@example -if (flag & 32 || (!lstat(to,&stat_buff) && S_ISLNK(stat_buff.st_mode))) -@end example - -to - -@example -if (1) -@end example - -On Windows you can use internal symbolic links to directories by compiling -@strong{MySQL} with @code{-DUSE_SYMDIR}. This allows you to put different -databases on different disks. @xref{Windows symbolic links}. - -@cindex databases, symbolic links -@node Symbolic links to tables, , Symbolic links to database, Symbolic links -@subsubsection Using Symbolic Links for Tables - -Before @strong{MySQL} 4.0 you should not symlink tables, if you are not -very carefully with them. The problem is that if you run @code{ALTER -TABLE}, @code{REPAIR TABLE} or @code{OPTIMIZE TABLE} on a symlinked -table, the symlinks will be removed and replaced by the original -files. This happens because the above command works by creating a -temporary file in the database directory and when the command is -complete, replace the original file with the temporary file. - -You should not symlink tables on system that doesn't have a fully -working @code{realpath()} call. 
(At least Linux and Solaris support -@code{realpath()}) - -In @strong{MySQL} 4.0 symlinks is only fully supported for @code{MyISAM} -tables. For other table types you will probably get strange problems -when doing any of the above mentioned commands. - -The handling of symbolic links in @strong{MySQL} 4.0 works the following -way (this is mostly relevant only for @code{MyISAM} tables). - -@itemize @bullet -@item -In the data directory you will always have the table definition file -and the data/index files. -@item -You can symlink the index file and the data file to different directories -independent of the other. -@item -The symlinking can be done from the operating system (if @code{mysqld} is -not running) or with the @code{INDEX/DATA DIRECTORY="path-to-dir"} command -in @code{CREATE TABLE}. @xref{CREATE TABLE}. -@item -@code{myisamchk} will not replace a symlink with the index/file but -work directly on the files the symlinks points to. Any temporary files -will be created in the same directory where the data/index file is. -@item -When you drop a table that is using symlinks, both the symlink and the -file the symlink points to is dropped. This is a good reason to why you -should NOT run @code{mysqld} as root and not allow persons to have write -access to the @strong{MySQL} database directories. -@item -If you rename a table with @code{ALTER TABLE RENAME} and you don't change -database, the symlink in the database directory will be renamed to the new -name and the data/index file will be renamed accordingly. -@item -If you use @code{ALTER TABLE RENAME} to move a table to another database, -then the table will be moved to the other database directory and the old -symlinks and the files they pointed to will be deleted. -@item -If you are not using symlinks you should use the @code{--skip-symlink} -option to @code{mysqld} to ensure that no one can drop or rename a file -outside of the @code{mysqld} data directory. -@end itemize - -Things that are not yet supported: - -@cindex TODO, symlinks -@itemize @bullet -@item -@code{ALTER TABLE} ignores all @code{INDEX/DATA DIRECTORY="path"} options. -@item -@code{CREATE TABLE} doesn't report if the table has symbolic links. -@item -@code{mysqldump} doesn't include the symbolic links information in the output. -@item -@code{BACKUP TABLE} and @code{RESTORE TABLE} doesn't use symbolic links. -@end itemize - -@cindex parameters, server -@cindex @code{mysqld} server, buffer sizes -@cindex buffer sizes, @code{mysqld} server -@cindex startup parameters -@node Server parameters, Table cache, Symbolic links, System -@subsection Tuning Server Parameters - -You can get the default buffer sizes used by the @code{mysqld} server -with this command: - -@example -shell> mysqld --help -@end example - -@cindex @code{mysqld} options -@cindex variables, @code{mysqld} -This command produces a list of all @code{mysqld} options and configurable -variables. 
The output includes the default values and looks something -like this: - -@example -Possible variables for option --set-variable (-O) are: -back_log current value: 5 -bdb_cache_size current value: 1048540 -binlog_cache_size current_value: 32768 -connect_timeout current value: 5 -delayed_insert_timeout current value: 300 -delayed_insert_limit current value: 100 -delayed_queue_size current value: 1000 -flush_time current value: 0 -interactive_timeout current value: 28800 -join_buffer_size current value: 131072 -key_buffer_size current value: 1048540 -lower_case_table_names current value: 0 -long_query_time current value: 10 -max_allowed_packet current value: 1048576 -max_binlog_cache_size current_value: 4294967295 -max_connections current value: 100 -max_connect_errors current value: 10 -max_delayed_threads current value: 20 -max_heap_table_size current value: 16777216 -max_join_size current value: 4294967295 -max_sort_length current value: 1024 -max_tmp_tables current value: 32 -max_write_lock_count current value: 4294967295 -myisam_sort_buffer_size current value: 8388608 -net_buffer_length current value: 16384 -net_retry_count current value: 10 -net_read_timeout current value: 30 -net_write_timeout current value: 60 -query_buffer_size current value: 0 -record_buffer current value: 131072 -slow_launch_time current value: 2 -sort_buffer current value: 2097116 -table_cache current value: 64 -thread_concurrency current value: 10 -tmp_table_size current value: 1048576 -thread_stack current value: 131072 -wait_timeout current value: 28800 -@end example - -If there is a @code{mysqld} server currently running, you can see what -values it actually is using for the variables by executing this command: - -@example -shell> mysqladmin variables -@end example - -You can find a full description for all variables in the @code{SHOW VARIABLES} -section in this manual. @xref{SHOW VARIABLES}. - -You can also see some statistics from a running server by issuing the command -@code{SHOW STATUS}. @xref{SHOW STATUS}. - -@strong{MySQL} uses algorithms that are very scalable, so you can usually -run with very little memory. If you, however, give @strong{MySQL} more -memory, you will normally also get better performance. - -When tuning a @strong{MySQL} server, the two most important variables to use -are @code{key_buffer_size} and @code{table_cache}. You should first feel -confident that you have these right before trying to change any of the -other variables. 
If you have much memory (>=256M) and many tables and want maximum performance
with a moderate number of clients, you should use something like this:

@example
shell> safe_mysqld -O key_buffer=64M -O table_cache=256 \
           -O sort_buffer=4M -O record_buffer=1M &
@end example

If you have only 128M and only a few tables, but you still do a lot of
sorting, you can use something like:

@example
shell> safe_mysqld -O key_buffer=16M -O sort_buffer=1M
@end example

If you have little memory and lots of connections, use something like this:

@example
shell> safe_mysqld -O key_buffer=512k -O sort_buffer=100k \
           -O record_buffer=100k &
@end example

or even:

@example
shell> safe_mysqld -O key_buffer=512k -O sort_buffer=16k \
           -O table_cache=32 -O record_buffer=8k -O net_buffer=1K &
@end example

When you have installed @strong{MySQL}, the @file{support-files} directory
will contain some @code{my.cnf} example files: @file{my-huge.cnf},
@file{my-large.cnf}, @file{my-medium.cnf}, and @file{my-small.cnf}. You can
use these as a base to optimize your system.

If there are very many connections, ``swapping problems'' may occur unless
@code{mysqld} has been configured to use very little memory for each
connection. @code{mysqld} performs better if you have enough memory for all
connections, of course.

Note that if you change an option to @code{mysqld}, it remains in effect only
for that instance of the server.

To see the effects of a parameter change, do something like this:

@example
shell> mysqld -O key_buffer=32m --help
@end example

Make sure that the @code{--help} option is last; otherwise, the effect of any
options listed after it on the command line will not be reflected in the
output.

@cindex tables, opening
@cindex tables, closing
@cindex opening, tables
@cindex closing, tables
@cindex table cache
@findex table_cache
@node Table cache, Creating many tables, Server parameters, System
@subsection How MySQL Opens and Closes Tables

@code{table_cache}, @code{max_connections}, and @code{max_tmp_tables}
affect the maximum number of files the server keeps open. If you
increase one or more of these values, you may run up against a limit
imposed by your operating system on the per-process number of open file
descriptors. However, you can increase the limit on many systems.
Consult your OS documentation to find out how to do this, because the
method for changing the limit varies widely from system to system.

@code{table_cache} is related to @code{max_connections}. For example,
for 200 concurrently running connections, you should have a table cache of
at least @code{200 * n}, where @code{n} is the maximum number of tables
in a join. You also need to reserve some extra file descriptors for
temporary tables and files.

The cache of open tables can grow to a maximum of @code{table_cache}
(default 64; this can be changed with the @code{-O table_cache=#}
option to @code{mysqld}). A table is never closed, except when the
cache is full and another thread tries to open a table, or if you use
@code{mysqladmin refresh} or @code{mysqladmin flush-tables}.

When the table cache fills up, the server uses the following procedure
to locate a cache entry to use:

@itemize @bullet
@item
Tables that are not currently in use are released, in least-recently-used
order.
@item
If the cache is full and no tables can be released, but a new table needs to
be opened, the cache is temporarily extended as necessary.
@item
If the cache is in a temporarily-extended state and a table goes from an
in-use to a not-in-use state, the table is closed and released from the cache.
@end itemize

A table is opened for each concurrent access. This means that
if you have two threads accessing the same table, or access the table
twice in the same query (with @code{AS}), the table needs to be opened twice.
The first open of any table takes two file descriptors; each additional
use of the table takes only one file descriptor. The extra descriptor
for the first open is used for the index file; this descriptor is shared
among all threads.

You can check whether your table cache is too small by checking the
@code{mysqld} variable @code{opened_tables}. If this is quite big, even
though you haven't done a lot of @code{FLUSH TABLES}, you should increase
your table cache. @xref{SHOW STATUS}.

@cindex tables, too many
@node Creating many tables, Open tables, Table cache, System
@subsection Drawbacks to Creating Large Numbers of Tables in the Same Database

If you have many files in a directory, open, close, and create operations will
be slow. If you execute @code{SELECT} statements on many different tables,
there will be a little overhead when the table cache is full, because for
every table that has to be opened, another must be closed. You can reduce
this overhead by making the table cache larger.

@cindex tables, open
@cindex open tables
@node Open tables, Memory use, Creating many tables, System
@subsection Why So Many Open Tables?

When you run @code{mysqladmin status}, you'll see something like this:

@example
Uptime: 426 Running threads: 1 Questions: 11082 Reloads: 1 Open tables: 12
@end example

This can be somewhat perplexing if you only have 6 tables.

@strong{MySQL} is multithreaded, so it may have many queries on the same
table simultaneously. To minimize the problem of two threads having
different states on the same file, the table is opened independently by
each concurrent thread. This takes some memory and one extra file
descriptor for the data file. The index file descriptor is shared
among all threads.

@cindex memory use
@node Memory use, Internal locking, Open tables, System
@subsection How MySQL Uses Memory

The list below indicates some of the ways that the @code{mysqld} server
uses memory. Where applicable, the name of the server variable relevant
to the memory use is given:

@itemize @bullet
@item
The key buffer (variable @code{key_buffer_size}) is shared by all
threads; other buffers used by the server are allocated as
needed. @xref{Server parameters}.

@item
Each connection uses some thread-specific space: a stack (default 64K,
variable @code{thread_stack}), a connection buffer (variable
@code{net_buffer_length}), and a result buffer (variable
@code{net_buffer_length}). The connection buffer and result buffer are
dynamically enlarged up to @code{max_allowed_packet} when needed. While
a query is running, a copy of the current query string is also allocated.

@item
All threads share the same base memory.

@item
Only compressed ISAM/MyISAM tables are memory mapped. This is
because the 32-bit memory space of 4GB is not large enough for most
big tables. When systems with a 64-bit address space become more
common, we may add general support for memory mapping.

@item
Each request doing a sequential scan over a table allocates a read buffer
(variable @code{record_buffer}).

@item
All joins are done in one pass, and most joins can be done without even
using a temporary table. Most temporary tables are memory-based (HEAP)
tables. Temporary tables with a big record length (calculated as the
sum of all column lengths) or that contain @code{BLOB} columns are
stored on disk.

One problem in @strong{MySQL} versions before Version 3.23.2 is that if a
HEAP table exceeds the size of @code{tmp_table_size}, you get the error
@code{The table tbl_name is full}. In newer versions this is handled by
automatically changing the in-memory (HEAP) table to a disk-based
(MyISAM) table as necessary. To work around this problem, you can
increase the temporary table size by setting the @code{tmp_table_size}
option to @code{mysqld}, or by setting the SQL option
@code{SQL_BIG_TABLES} in the client program. @xref{SET OPTION, ,
@code{SET OPTION}}. In @strong{MySQL} Version 3.20, the maximum size of the
temporary table was @code{record_buffer*16}; if you are using this
version, you have to increase the value of @code{record_buffer}. You can
also start @code{mysqld} with the @code{--big-tables} option to always
store temporary tables on disk. However, this will affect the speed of
many complicated queries.

@item
Most requests doing a sort allocate a sort buffer and 0-2 temporary
files depending on the result set size. @xref{Temporary files}.

@item
Almost all parsing and calculating is done in a local memory store. No
memory overhead is needed for small items, and the normal slow memory
allocation and freeing is avoided. Memory is allocated only for
unexpectedly large strings (this is done with @code{malloc()} and
@code{free()}).

@item
Each index file is opened once and the data file is opened once for each
concurrently running thread. For each concurrent thread, a table structure,
column structures for each column, and a buffer of size @code{3 * n} are
allocated (where @code{n} is the maximum row length, not counting @code{BLOB}
columns). A @code{BLOB} uses 5 to 8 bytes plus the length of the @code{BLOB}
data. The @code{ISAM}/@code{MyISAM} table handlers use one extra row
buffer for internal usage.

@item
For each table having @code{BLOB} columns, a buffer is enlarged dynamically
to read in larger @code{BLOB} values. If you scan a table, a buffer as large
as the largest @code{BLOB} value is allocated.

@item
Table handlers for all in-use tables are saved in a cache and managed as a
FIFO. Normally the cache has 64 entries. If a table has been used by two
running threads at the same time, the cache contains two entries for the
table. @xref{Table cache}.

@item
A @code{mysqladmin flush-tables} command closes all tables that are not in
use and marks all in-use tables to be closed when the currently executing
thread finishes. This will effectively free most in-use memory.
@end itemize

@code{ps} and other system status programs may report that @code{mysqld}
uses a lot of memory. This may be caused by thread stacks at different
memory addresses. For example, the Solaris version of @code{ps} counts
the unused memory between stacks as used memory. You can verify this by
checking available swap with @code{swap -s}. We have tested
@code{mysqld} with commercial memory-leakage detectors, so there should
be no memory leaks.
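As an illustration of the @code{SQL_BIG_TABLES} workaround mentioned above,
a client that knows in advance that a particular query needs a big temporary
table can enable disk-based temporary tables for just that query. A minimal
sketch (the table and column names are hypothetical):

@example
mysql> SET OPTION SQL_BIG_TABLES=1;
mysql> SELECT customer, COUNT(*) FROM big_log GROUP BY customer;
mysql> SET OPTION SQL_BIG_TABLES=0;
@end example

This trades some speed for the guarantee that the query will not fail with
@code{The table tbl_name is full}.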
@cindex internal locking
@cindex locking, tables
@cindex tables, locking
@node Internal locking, Table locking, Memory use, System
@subsection How MySQL Locks Tables

You can find a discussion of the different locking methods in the appendix.
@xref{Locking methods}.

All locking in @strong{MySQL} is deadlock-free. This is managed by always
requesting all needed locks at once at the beginning of a query and always
locking the tables in the same order.

The locking method @strong{MySQL} uses for @code{WRITE} locks works as
follows:

@itemize @bullet
@item
If there are no locks on the table, put a write lock on it.
@item
Otherwise, put the lock request in the write lock queue.
@end itemize

The locking method @strong{MySQL} uses for @code{READ} locks works as
follows:

@itemize @bullet
@item
If there are no write locks on the table, put a read lock on it.
@item
Otherwise, put the lock request in the read lock queue.
@end itemize

When a lock is released, the lock is made available first to the threads
in the write lock queue, then to the threads in the read lock queue.

This means that if you have many updates on a table, @code{SELECT}
statements will wait until there are no more updates.

To work around this in the case where you want to do many @code{INSERT} and
@code{SELECT} operations on a table, you can insert rows into a temporary
table and update the real table with the records from the temporary table
once in a while.

This can be done with the following code:

@example
mysql> LOCK TABLES real_table WRITE, insert_table WRITE;
mysql> INSERT INTO real_table SELECT * FROM insert_table;
mysql> TRUNCATE TABLE insert_table;
mysql> UNLOCK TABLES;
@end example

You can use the @code{LOW_PRIORITY} option with @code{INSERT},
@code{UPDATE}, or @code{DELETE}, or @code{HIGH_PRIORITY} with
@code{SELECT}, if you want to prioritize retrieval in some specific
cases. You can also start @code{mysqld} with @code{--low-priority-updates}
to get the same behavior.

Using @code{SQL_BUFFER_RESULT} can also help make table locks shorter.
@xref{SELECT}.

You could also change the locking code in @file{mysys/thr_lock.c} to use a
single queue. In this case, write locks and read locks would have the same
priority, which might help some applications.

@cindex problems, table locking
@node Table locking, DNS, Internal locking, System
@subsection Table Locking Issues

The table locking code in @strong{MySQL} is deadlock-free.

@strong{MySQL} uses table locking (instead of row locking or column
locking) on all table types, except @code{BDB} tables, to achieve a very
high lock speed. For large tables, table locking is MUCH better than
row locking for most applications, but there are, of course, some
pitfalls.

For @code{BDB} and @code{InnoDB} tables, @strong{MySQL} only uses table
locking if you explicitly lock the table with @code{LOCK TABLES} or
execute a command that will modify every row in the table, like
@code{ALTER TABLE}. For these table types we recommend that you not use
@code{LOCK TABLES} at all.

In @strong{MySQL} Version 3.23.7 and above, you can insert rows into
@code{MyISAM} tables at the same time other threads are reading from the
table. Note that currently this only works if there are no holes after
deleted rows in the table at the time the insert is made. When all holes
have been filled with new data, concurrent inserts will automatically be
enabled again.
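If a @code{MyISAM} table does have holes from deleted rows and you depend on
concurrent inserts, you can defragment it to make concurrent inserts possible
again; a minimal sketch (the table name is hypothetical):

@example
mysql> OPTIMIZE TABLE log_table;
@end example

Note that @code{OPTIMIZE TABLE} locks the table while it runs, so this is
best done during quiet periods. @xref{OPTIMIZE TABLE, , @code{OPTIMIZE
TABLE}}.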
Table locking enables many threads to read from a table at the same
time, but if a thread wants to write to a table, it must first get
exclusive access. During the update, all other threads that want to
access this particular table will wait until the update is ready.

As updates on tables are normally considered to be more important than
@code{SELECT}, all statements that update a table have higher priority
than statements that retrieve information from a table. This should
ensure that updates are not 'starved' just because someone issues a lot
of heavy queries against a specific table. (You can change this by using
@code{LOW_PRIORITY} with the statement that does the update or
@code{HIGH_PRIORITY} with the @code{SELECT} statement.)

Starting from @strong{MySQL} Version 3.23.7, you can use the
@code{max_write_lock_count} variable to force @strong{MySQL} to
temporarily give all @code{SELECT} statements that are waiting for a table
a higher priority after a specific number of inserts on the table.

Table locking is, however, not very good under the following scenario:

@itemize @bullet
@item
A client issues a @code{SELECT} that takes a long time to run.
@item
Another client then issues an @code{UPDATE} on the used table. This client
will wait until the @code{SELECT} is finished.
@item
Another client issues another @code{SELECT} statement on the same table. As
@code{UPDATE} has higher priority than @code{SELECT}, this @code{SELECT}
will wait for the @code{UPDATE} to finish. It will also wait for the first
@code{SELECT} to finish!
@item
A thread is waiting for something like @code{full disk}, in which case all
threads that want to access the problem table will also be put in a waiting
state until more disk space is made available.
@end itemize

Some possible solutions to this problem are:

@itemize @bullet
@item
Try to get the @code{SELECT} statements to run faster. You may have to create
some summary tables to do this.

@item
Start @code{mysqld} with @code{--low-priority-updates}. This will give
all statements that update (modify) a table lower priority than a
@code{SELECT} statement. In this case the last @code{SELECT} statement in
the previous scenario would execute before the @code{UPDATE} statement.

@item
You can give a specific @code{INSERT}, @code{UPDATE}, or @code{DELETE}
statement lower priority with the @code{LOW_PRIORITY} attribute.

@item
Start @code{mysqld} with a low value for @code{max_write_lock_count} to give
@code{READ} locks after a certain number of @code{WRITE} locks.

@item
You can specify that all updates from a specific thread should be done with
low priority by using the SQL command @code{SET SQL_LOW_PRIORITY_UPDATES=1}.
@xref{SET OPTION, , @code{SET OPTION}}.

@item
You can specify that a specific @code{SELECT} is very important with the
@code{HIGH_PRIORITY} attribute. @xref{SELECT, , @code{SELECT}}.

@item
If you have problems with @code{INSERT} combined with @code{SELECT},
switch to the new @code{MyISAM} tables, as these support concurrent
@code{SELECT}s and @code{INSERT}s.

@item
If you mainly mix @code{INSERT} and @code{SELECT} statements, the
@code{DELAYED} attribute to @code{INSERT} will probably solve your problems.
@xref{INSERT, , @code{INSERT}}.

@item
If you have problems with @code{SELECT} and @code{DELETE}, the @code{LIMIT}
option to @code{DELETE} may help. @xref{DELETE, , @code{DELETE}}.
@end itemize

@cindex DNS
@cindex hostname caching
@node DNS, , Table locking, System
@subsection How MySQL Uses DNS

When a new client connects to @code{mysqld}, @code{mysqld} will spawn a
new thread to handle the request. This thread will first check whether the
hostname is in the hostname cache. If not, the thread will call
@code{gethostbyaddr_r()} and @code{gethostbyname_r()} to resolve the
hostname.

If the operating system doesn't support the above thread-safe calls, the
thread will lock a mutex and call @code{gethostbyaddr()} and
@code{gethostbyname()} instead. Note that in this case no other thread
can resolve hostnames that are not in the hostname cache until the
first thread is ready.

You can disable DNS host lookups by starting @code{mysqld} with
@code{--skip-name-resolve}. In this case, however, you can only use IP
numbers in the @strong{MySQL} privilege tables.

If you have a very slow DNS and many hosts, you can get more performance by
either disabling DNS lookups with @code{--skip-name-resolve} or by
increasing the @code{HOST_CACHE_SIZE} define (default: 128) and recompiling
@code{mysqld}.

You can disable the hostname cache with @code{--skip-host-cache}. You
can clear the hostname cache with @code{FLUSH HOSTS} or @code{mysqladmin
flush-hosts}.

If you don't want to allow connections over @code{TCP/IP}, you can do this
by starting @code{mysqld} with @code{--skip-networking}.

@cindex data, size
@cindex reducing, data size
@cindex storage space, minimizing
@cindex tables, improving performance
@cindex performance, improving
@node Data size, MySQL indexes, System, Performance
@section Get Your Data as Small as Possible

One of the most basic optimizations is to get your data (and indexes) to
take as little space on the disk (and in memory) as possible. This can
give huge improvements because disk reads are faster and normally less
main memory will be used. Indexing also takes fewer resources if
done on smaller columns.

@strong{MySQL} supports a lot of different table types and row formats.
Choosing the right table format may give you a big performance gain.
@xref{Table types}.

You can get better performance on a table and minimize storage space
using the techniques listed below:

@itemize @bullet
@item
Use the most efficient (smallest) types possible. @strong{MySQL} has
many specialized types that save disk space and memory.
@item
Use the smaller integer types if possible to get smaller tables. For
example, @code{MEDIUMINT} is often better than @code{INT}.
@item
Declare columns to be @code{NOT NULL} if possible. It makes everything
faster and you save one bit per column. Note that if you really need
@code{NULL} in your application you should definitely use it. Just avoid
having it on all columns by default.
@item
If you don't have any variable-length columns (@code{VARCHAR},
@code{TEXT}, or @code{BLOB} columns), a fixed-size record format is
used. This is faster but unfortunately may waste some space.
@xref{MyISAM table formats}.
@item
The primary index of a table should be as short as possible. This makes
identification of one row easy and efficient.
@item
For each table, you have to decide which storage/index method to
use. @xref{Table types}.
@item
Only create the indexes that you really need. Indexes are good for
retrieval but bad when you need to store things fast. If you mostly
access a table by searching on a combination of columns, make an index
on them. The first index part should be the most-used column. If you are
ALWAYS using many columns, you should use the column with more duplicates
first to get better compression of the index.
@item
If it's very likely that a column has a unique prefix on the first few
characters, it's better to index only this prefix. @strong{MySQL}
supports an index on a part of a character column. Shorter indexes are
faster not only because they take less disk space, but also because they
will give you more hits in the index cache and thus fewer disk
seeks. @xref{Server parameters}.
@item
In some circumstances it can be beneficial to split a table that is
scanned very often into two. This is especially true if it is a
dynamic-format table and it is possible to use a smaller static-format
table that can be used to find the relevant rows when scanning the table.
@end itemize

@cindex indexes, uses for
@node MySQL indexes, Query Speed, Data size, Performance
@section How MySQL Uses Indexes

Indexes are used to find rows with a specific value of one column
quickly. Without an index, @strong{MySQL} has to start with the first record
and then read through the whole table until it finds the relevant
rows. The bigger the table, the more this costs. If the table has an index
for the columns in question, @strong{MySQL} can quickly get a position to
seek to in the middle of the data file without having to look at all the
data. If a table has 1000 rows, this is at least 100 times faster than
reading sequentially. Note that if you need to access almost all 1000
rows, it is faster to read sequentially, because that avoids disk seeks.

All @strong{MySQL} indexes (@code{PRIMARY}, @code{UNIQUE}, and
@code{INDEX}) are stored in B-trees. Strings are automatically prefix-
and end-space compressed. @xref{CREATE INDEX, , @code{CREATE INDEX}}.

Indexes are used to:

@itemize @bullet
@item
Quickly find the rows that match a @code{WHERE} clause.

@item
Retrieve rows from other tables when performing joins.

@item
Find the @code{MAX()} or @code{MIN()} value for a specific indexed
column. This is optimized by a preprocessor that checks whether you are
using @code{WHERE} key_part_# = constant on all key parts < N. In this case
@strong{MySQL} will do a single key lookup and replace the @code{MIN()}
expression with a constant. If all expressions are replaced with
constants, the query will return at once:

@example
SELECT MIN(key_part2),MAX(key_part2) FROM table_name WHERE key_part1=10
@end example

@item
Sort or group a table if the sorting or grouping is done on a leftmost
prefix of a usable key (for example, @code{ORDER BY key_part_1,key_part_2}).
The key is read in reverse order if all key parts are followed by
@code{DESC}.

The index can also be used even if the @code{ORDER BY} doesn't match the
index exactly, as long as all the unused index parts and all the extra
@code{ORDER BY} columns are constants in the @code{WHERE} clause. The
following queries will use the index to resolve the @code{ORDER BY} part:

@example
SELECT * FROM foo ORDER BY key_part1,key_part2,key_part3;
SELECT * FROM foo WHERE column=constant ORDER BY column, key_part1;
SELECT * FROM foo WHERE key_part1=const GROUP BY key_part2;
@end example

@item
In some cases a query can be optimized to retrieve values without
consulting the data file. If all used columns for some table are numeric
and form a leftmost prefix for some key, the values may be retrieved
from the index tree for greater speed:

@example
SELECT key_part3 FROM table_name WHERE key_part1=1
@end example
@end itemize

Suppose you issue the following @code{SELECT} statement:

@example
mysql> SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;
@end example

If a multiple-column index exists on @code{col1} and @code{col2}, the
appropriate rows can be fetched directly. If separate single-column
indexes exist on @code{col1} and @code{col2}, the optimizer tries to
find the most restrictive index by deciding which index will find fewer
rows, and uses that index to fetch the rows.

@cindex indexes, leftmost prefix of
@cindex leftmost prefix of indexes
If the table has a multiple-column index, any leftmost prefix of the
index can be used by the optimizer to find rows. For example, if you
have a three-column index on @code{(col1,col2,col3)}, you have indexed
search capabilities on @code{(col1)}, @code{(col1,col2)}, and
@code{(col1,col2,col3)}.

@strong{MySQL} can't use a partial index if the columns don't form a
leftmost prefix of the index. Suppose you have the @code{SELECT}
statements shown below:

@example
mysql> SELECT * FROM tbl_name WHERE col1=val1;
mysql> SELECT * FROM tbl_name WHERE col2=val2;
mysql> SELECT * FROM tbl_name WHERE col2=val2 AND col3=val3;
@end example

If an index exists on @code{(col1,col2,col3)}, only the first query
shown above uses the index. The second and third queries do involve
indexed columns, but @code{(col2)} and @code{(col2,col3)} are not
leftmost prefixes of @code{(col1,col2,col3)}.

@findex LIKE, and indexes
@findex LIKE, and wildcards
@cindex indexes, and @code{LIKE}
@cindex wildcards, and @code{LIKE}
@strong{MySQL} also uses indexes for @code{LIKE} comparisons if the argument
to @code{LIKE} is a constant string that doesn't start with a wildcard
character. For example, the following @code{SELECT} statements use indexes:

@example
mysql> select * from tbl_name where key_col LIKE "Patrick%";
mysql> select * from tbl_name where key_col LIKE "Pat%_ck%";
@end example

In the first statement, only rows with @code{"Patrick" <= key_col <
"Patricl"} are considered. In the second statement, only rows with
@code{"Pat" <= key_col < "Pau"} are considered.

The following @code{SELECT} statements will not use indexes:

@example
mysql> select * from tbl_name where key_col LIKE "%Patrick%";
mysql> select * from tbl_name where key_col LIKE other_col;
@end example

In the first statement, the @code{LIKE} value begins with a wildcard
character. In the second statement, the @code{LIKE} value is not a
constant.

@findex IS NULL, and indexes
@cindex indexes, and @code{IS NULL}
Searching using @code{column_name IS NULL} will use indexes if
@code{column_name} is indexed.

@strong{MySQL} normally uses the index that finds the smallest number of
rows. An index is used for columns that you compare with the following
operators: @code{=}, @code{>}, @code{>=}, @code{<}, @code{<=},
@code{BETWEEN}, and a @code{LIKE} with a non-wildcard prefix like
@code{'something%'}.

Any index that doesn't span all @code{AND} levels in the @code{WHERE} clause
is not used to optimize the query. In other words: to be able to use an
index, a prefix of the index must be used in every @code{AND} group.

The following @code{WHERE} clauses use indexes:

@example
... WHERE index_part1=1 AND index_part2=2 AND other_column=3
... WHERE index=1 OR A=10 AND index=2      /* index = 1 OR index = 2 */
... WHERE index_part1='hello' AND index_part_3=5
          /* optimized like "index_part1='hello'" */
... WHERE index1=1 AND index2=2 OR index1=3 AND index3=3;
          /* Can use index on index1 but not on index2 or index3 */
@end example

These @code{WHERE} clauses do @strong{NOT} use indexes:

@example
... WHERE index_part2=1 AND index_part3=2  /* index_part_1 is not used */
... WHERE index=1 OR A=10                  /* Index is not used in both AND parts */
... WHERE index_part1=1 OR index_part2=10  /* No index spans all rows */
@end example

Note that in some cases @strong{MySQL} will not use an index, even if one
is available. Some of the cases where this happens are:

@itemize @bullet
@item
If use of the index would require @strong{MySQL} to access more
than 30% of the rows in the table. (In this case a table scan is
probably much faster, as it requires far fewer seeks.)
Note that if such a query uses @code{LIMIT} to retrieve only
part of the rows, @strong{MySQL} will use an index anyway, as it can
much more quickly find the few rows to return in the result.
@end itemize

@cindex queries, speed of
@cindex permission checks, effect on speed
@cindex speed, of queries
@node Query Speed, Tips, MySQL indexes, Performance
@section Speed of Queries that Access or Update Data

First, one thing that affects all queries: the more complex your permission
system setup is, the more overhead you get.

If you have not issued any @code{GRANT} statements, @strong{MySQL} will
optimize the permission checking somewhat. So if you have a very high
volume, it may be worth the time to avoid grants. Otherwise, more
permission checks result in larger overhead.

If your problem is with some specific @strong{MySQL} function, you can
always time it in the @strong{MySQL} client:

@example
mysql> select benchmark(1000000,1+1);
+------------------------+
| benchmark(1000000,1+1) |
+------------------------+
|                      0 |
+------------------------+
1 row in set (0.32 sec)
@end example

The above shows that @strong{MySQL} can execute 1,000,000 @code{+}
expressions in 0.32 seconds on a @code{PentiumII 400MHz}.

All @strong{MySQL} functions should be highly optimized, but there may be
some exceptions, and @code{benchmark(loop_count,expression)} is a
great tool to find out whether this is a problem with your query.

@menu
* Estimating performance::      Estimating query performance
* SELECT speed::                Speed of @code{SELECT} queries
* Where optimizations::         How MySQL optimizes @code{WHERE} clauses
* DISTINCT optimization::       How MySQL Optimizes @code{DISTINCT}
* LEFT JOIN optimization::      How MySQL optimizes @code{LEFT JOIN}
* LIMIT optimization::          How MySQL optimizes @code{LIMIT}
* Insert speed::                Speed of @code{INSERT} queries
* Update speed::                Speed of @code{UPDATE} queries
* Delete speed::                Speed of @code{DELETE} queries
@end menu

@cindex estimating, query performance
@cindex queries, estimating performance
@cindex performance, estimating
@node Estimating performance, SELECT speed, Query Speed, Query Speed
@subsection Estimating Query Performance

In most cases you can estimate the performance by counting disk seeks.
For small tables, you can usually find the row in one disk seek (as the
index is probably cached). For bigger tables, you can estimate that
(using B-tree indexes) you will need
@code{log(row_count) / log(index_block_length / 3 * 2 / (index_length +
data_pointer_length)) + 1} seeks to find a row.

In @strong{MySQL}, an index block is usually 1024 bytes and the data
pointer is usually 4 bytes. For a 500,000-row table with an
index length of 3 (medium integer), this gives:
@code{log(500,000)/log(1024/3*2/(3+4)) + 1} = 4 seeks.

As the above index would require about 500,000 * 7 * 3/2 = 5.2M
(assuming that the index buffers are filled to 2/3, which is typical),
you will probably have much of the index in memory, and you will probably
need only 1-2 calls to read data from the OS to find the row.

For writes, however, you will need 4 seek requests (as above) to find
where to place the new index value, and normally 2 seeks to update the
index and write the row.

Note that the above doesn't mean that your application's performance will
slowly degenerate by N log N! As long as everything is cached by the OS or
SQL server, things will get only marginally slower as the table gets
bigger. After the data gets too big to be cached, things will start to
go much slower until your application is bound only by disk seeks
(which increase by N log N). To avoid this, increase the index cache as
the data grows. @xref{Server parameters}.

@cindex speed, of queries
@findex SELECT speed
@node SELECT speed, Where optimizations, Estimating performance, Query Speed
@subsection Speed of @code{SELECT} Queries

In general, when you want to make a slow @code{SELECT ... WHERE} faster, the
first thing to check is whether or not you can add an index. @xref{MySQL
indexes, , @strong{MySQL} indexes}. All references between different tables
should usually be done with indexes. You can use the @code{EXPLAIN} command
to determine which indexes are used for a @code{SELECT}.
@xref{EXPLAIN, , @code{EXPLAIN}}.

Some general tips:

@itemize @bullet
@item
To help @strong{MySQL} optimize queries better, run @code{myisamchk
--analyze} on a table after it has been loaded with relevant data. This
updates a value for each index part that indicates the average number of
rows that have the same value. (For unique indexes, this is always 1,
of course.) @strong{MySQL} will use this to decide which index to
choose when you join two tables with 'a non-constant expression'.
You can check the result of the @code{analyze} run by doing @code{SHOW
INDEX FROM table_name} and examining the @code{Cardinality} column.

@item
To sort an index and data according to an index, use @code{myisamchk
--sort-index --sort-records=1} (if you want to sort on index 1). If you
have a unique index from which you want to read all records in order
according to that index, this is a good way to make that faster. Note,
however, that this sorting isn't written optimally and will take a long
time for a large table!
@end itemize

@cindex optimizations
@findex WHERE
@node Where optimizations, DISTINCT optimization, SELECT speed, Query Speed
@subsection How MySQL Optimizes @code{WHERE} Clauses

The @code{WHERE} optimizations are put in the @code{SELECT} part here because
they are mostly used with @code{SELECT}, but the same optimizations apply for
@code{WHERE} in @code{DELETE} and @code{UPDATE} statements.

Also note that this section is incomplete. @strong{MySQL} does many
optimizations, and we have not had time to document them all.
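You can watch many of the optimizations listed below in action with
@code{EXPLAIN}. As a minimal sketch (the table @code{t}, with a primary key
on @code{id}, is hypothetical), an impossible @code{WHERE} is detected
without even touching the table data:

@example
mysql> EXPLAIN SELECT * FROM t WHERE id=1 AND id=2;
@end example

Instead of a table access plan, @strong{MySQL} will report something like
@code{Impossible WHERE}, illustrating the early detection of invalid
constant expressions described below.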
Some of the optimizations performed by @strong{MySQL} are listed below:

@itemize @bullet
@item
Removal of unnecessary parentheses:
@example
   ((a AND b) AND c OR (((a AND b) AND (c AND d))))
-> (a AND b AND c) OR (a AND b AND c AND d)
@end example
@item
Constant folding:
@example
   (a<b AND b=c) AND a=5
-> b>5 AND b=c AND a=5
@end example
@item
Constant condition removal (needed because of constant folding):
@example
   (B>=5 AND B=5) OR (B=6 AND 5=5) OR (B=7 AND 5=6)
-> B=5 OR B=6
@end example
@item
Constant expressions used by indexes are evaluated only once.
@item
@code{COUNT(*)} on a single table without a @code{WHERE} is retrieved
directly from the table information. This is also done for any @code{NOT
NULL} expression when used with only one table.
@item
Early detection of invalid constant expressions. @strong{MySQL} quickly
detects that some @code{SELECT} statements are impossible and returns no
rows.
@item
@code{HAVING} is merged with @code{WHERE} if you don't use @code{GROUP BY}
or group functions (@code{COUNT()}, @code{MIN()}, ...).
@item
For each sub-join, a simpler @code{WHERE} is constructed to get a fast
@code{WHERE} evaluation for each sub-join and also to skip records as
soon as possible.
@cindex constant table
@cindex tables, constant
@item
All constant tables are read first, before any other tables in the query.
A constant table is:
@itemize @minus
@item
An empty table or a table with 1 row.
@item
A table that is used with a @code{WHERE} clause on a @code{UNIQUE}
index or a @code{PRIMARY KEY}, where all index parts are used with constant
expressions and the index parts are defined as @code{NOT NULL}.
@end itemize
All tables in the following queries are used as constant tables:
@example
mysql> SELECT * FROM t WHERE primary_key=1;
mysql> SELECT * FROM t1,t2
           WHERE t1.primary_key=1 AND t2.primary_key=t1.id;
@end example

@item
The best join combination to join the tables is found by trying all
possibilities. If all columns in @code{ORDER BY} and in @code{GROUP
BY} come from the same table, then this table is preferred first when
joining.
@item
If there is an @code{ORDER BY} clause and a different @code{GROUP BY}
clause, or if the @code{ORDER BY} or @code{GROUP BY} contains columns
from tables other than the first table in the join queue, a temporary
table is created.
@item
If you use @code{SQL_SMALL_RESULT}, @strong{MySQL} will use an in-memory
temporary table.
@item
Each table's indexes are queried, and the best index that spans fewer than
30% of the rows is used. If no such index can be found, a quick table scan
is used.
@item
In some cases, @strong{MySQL} can read rows from the index without even
consulting the data file. If all columns used from the index are numeric,
only the index tree is used to resolve the query.
@item
Before each record is output, those that do not match the @code{HAVING}
clause are skipped.
@end itemize

Some examples of queries that are very fast:

@example
mysql> SELECT COUNT(*) FROM tbl_name;
mysql> SELECT MIN(key_part1),MAX(key_part1) FROM tbl_name;
mysql> SELECT MAX(key_part2) FROM tbl_name
           WHERE key_part_1=constant;
mysql> SELECT ... FROM tbl_name
           ORDER BY key_part1,key_part2,... LIMIT 10;
mysql> SELECT ... FROM tbl_name
           ORDER BY key_part1 DESC,key_part2 DESC,... LIMIT 10;
@end example

The following queries are resolved using only the index tree (assuming
the indexed columns are numeric):

@example
mysql> SELECT key_part1,key_part2 FROM tbl_name WHERE key_part1=val;
mysql> SELECT COUNT(*) FROM tbl_name
           WHERE key_part1=val1 AND key_part2=val2;
mysql> SELECT key_part2 FROM tbl_name GROUP BY key_part1;
@end example

The following queries use indexing to retrieve the rows in sorted
order without a separate sorting pass:

@example
mysql> SELECT ... FROM tbl_name ORDER BY key_part1,key_part2,... ;
mysql> SELECT ... FROM tbl_name ORDER BY key_part1 DESC,key_part2 DESC,... ;
@end example

@findex DISTINCT
@cindex optimizing, DISTINCT
@node DISTINCT optimization, LEFT JOIN optimization, Where optimizations, Query Speed
@subsection How MySQL Optimizes @code{DISTINCT}

@code{DISTINCT} is converted to a @code{GROUP BY} on all columns.
@code{DISTINCT} combined with @code{ORDER BY} will in many cases also
need a temporary table.

When combining @code{LIMIT #} with @code{DISTINCT}, @strong{MySQL} will stop
as soon as it finds @code{#} unique rows.

If you don't use columns from all the tables used, @strong{MySQL} will stop
scanning the unused tables as soon as it has found the first match:

@example
SELECT DISTINCT t1.a FROM t1,t2 WHERE t1.a=t2.a;
@end example

In this case, assuming t1 is used before t2 (check with @code{EXPLAIN}),
@strong{MySQL} will stop reading from t2 (for that particular row in t1)
when the first row in t2 is found.

@findex LEFT JOIN
@cindex optimizing, LEFT JOIN
@node LEFT JOIN optimization, LIMIT optimization, DISTINCT optimization, Query Speed
@subsection How MySQL Optimizes @code{LEFT JOIN} and @code{RIGHT JOIN}

@code{A LEFT JOIN B} in @strong{MySQL} is implemented as follows:

@itemize @bullet
@item
The table @code{B} is set to be dependent on table @code{A} and all tables
that @code{A} is dependent on.
@item
The table @code{A} is set to be dependent on all tables (except @code{B})
that are used in the @code{LEFT JOIN} condition.
@item
All @code{LEFT JOIN} conditions are moved to the @code{WHERE} clause.
@item
All standard join optimizations are done, with the exception that a table is
always read after all tables it is dependent on. If there is a circular
dependency, @strong{MySQL} will issue an error.
@item
All standard @code{WHERE} optimizations are done.
@item
If there is a row in @code{A} that matches the @code{WHERE} clause, but there
wasn't any row in @code{B} that matched the @code{LEFT JOIN} condition,
then an extra @code{B} row is generated with all columns set to @code{NULL}.
@item
If you use @code{LEFT JOIN} to find rows that don't exist in some
table, and you have the test @code{column_name IS NULL} in the
@code{WHERE} part, where column_name is a column that is declared as
@code{NOT NULL}, then @strong{MySQL} will stop searching for more rows
(for a particular key combination) after it has found one row that
matches the @code{LEFT JOIN} condition.
@end itemize

@code{RIGHT JOIN} is implemented analogously to @code{LEFT JOIN}.

The table read order forced by @code{LEFT JOIN} and @code{STRAIGHT JOIN}
helps the join optimizer (which calculates in which order tables
should be joined) do its work much more quickly, as there are fewer
table permutations to check.
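If you know more about the data distribution than the optimizer does, you
can force a read order yourself. A minimal sketch (the tables @code{t1} and
@code{t2} are hypothetical) that forces @code{t1} to be read before
@code{t2}:

@example
SELECT * FROM t1 STRAIGHT_JOIN t2 WHERE t1.a=t2.a;
@end example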
- -Note that the above means that if you do a query of type: - -@example -SELECT * FROM a,b LEFT JOIN c ON (c.key=a.key) LEFT JOIN d (d.key=a.key) WHERE b.key=d.key -@end example - -@strong{MySQL} will do a full scan on @code{b} as the @code{LEFT -JOIN} will force it to be read before @code{d}. - -The fix in this case is to change the query to: - -@example -SELECT * FROM b,a LEFT JOIN c ON (c.key=a.key) LEFT JOIN d (d.key=a.key) WHERE b.key=d.key -@end example - -@cindex optimizing, LIMIT -@findex LIMIT -@node LIMIT optimization, Insert speed, LEFT JOIN optimization, Query Speed -@subsection How MySQL Optimizes @code{LIMIT} - -In some cases @strong{MySQL} will handle the query differently when you are -using @code{LIMIT #} and not using @code{HAVING}: - -@itemize @bullet -@item -If you are selecting only a few rows with @code{LIMIT}, @strong{MySQL} -will use indexes in some cases when it normally would prefer to do a -full table scan. -@item -If you use @code{LIMIT #} with @code{ORDER BY}, @strong{MySQL} will end the -sorting as soon as it has found the first @code{#} lines instead of sorting -the whole table. -@item -When combining @code{LIMIT #} with @code{DISTINCT}, @strong{MySQL} will stop -as soon as it finds @code{#} unique rows. -@item -In some cases a @code{GROUP BY} can be resolved by reading the key in order -(or do a sort on the key) and then calculate summaries until the -key value changes. In this case @code{LIMIT #} will not calculate any -unnecessary @code{GROUP BY}'s. -@item -As soon as @strong{MySQL} has sent the first @code{#} rows to the client, it -will abort the query. -@item -@code{LIMIT 0} will always quickly return an empty set. This is useful -to check the query and to get the column types of the result columns. -@item -The size of temporary tables uses the @code{LIMIT #} to calculate how much -space is needed to resolve the query. -@end itemize - -@cindex speed, inserting -@cindex inserting, speed of -@node Insert speed, Update speed, LIMIT optimization, Query Speed -@subsection Speed of @code{INSERT} Queries - -The time to insert a record consists approximately of: - -@itemize @bullet -@item -Connect: (3) -@item -Sending query to server: (2) -@item -Parsing query: (2) -@item -Inserting record: (1 x size of record) -@item -Inserting indexes: (1 x number of indexes) -@item -Close: (1) -@end itemize - -where the numbers are somewhat proportional to the overall time. This -does not take into consideration the initial overhead to open tables -(which is done once for each concurrently running query). - -The size of the table slows down the insertion of indexes by N log N -(B-trees). - -Some ways to speed up inserts: - -@itemize @bullet -@item -If you are inserting many rows from the same client at the same time, use -multiple value lists @code{INSERT} statements. This is much faster (many -times in some cases) than using separate @code{INSERT} statements. -@item -If you are inserting a lot of rows from different clients, you can get -higher speed by using the @code{INSERT DELAYED} statement. @xref{INSERT, -, @code{INSERT}}. -@item -Note that with @code{MyISAM} you can insert rows at the same time -@code{SELECT}s are running if there are no deleted rows in the tables. -@item -When loading a table from a text file, use @code{LOAD DATA INFILE}. This -is usually 20 times faster than using a lot of @code{INSERT} statements. -@xref{LOAD DATA, , @code{LOAD DATA}}. 
-@item -It is possible with some extra work to make @code{LOAD DATA INFILE} run even -faster when the table has many indexes. Use the following procedure: - -@enumerate -@item -Optionally create the table with @code{CREATE TABLE}. For example, using -@code{mysql} or Perl-DBI. - -@item -Execute a @code{FLUSH TABLES} statement or the shell command @code{mysqladmin -flush-tables}. - -@item -Use @code{myisamchk --keys-used=0 -rq /path/to/db/tbl_name}. This will -remove all usage of all indexes from the table. - -@item -Insert data into the table with @code{LOAD DATA INFILE}. This will not -update any indexes and will therefore be very fast. - -@item -If you are going to only read the table in the future, run @code{myisampack} -on it to make it smaller. @xref{Compressed format}. - -@item -Re-create the indexes with @code{myisamchk -r -q -/path/to/db/tbl_name}. This will create the index tree in memory before -writing it to disk, which is much faster because it avoids lots of disk -seeks. The resulting index tree is also perfectly balanced. - -@item -Execute a @code{FLUSH TABLES} statement or the shell command @code{mysqladmin -flush-tables}. -@end enumerate - -This procedure will be built into @code{LOAD DATA INFILE} in some future -version of @strong{MySQL}. -@item -You can speed up insertions by locking your tables: - -@example -mysql> LOCK TABLES a WRITE; -mysql> INSERT INTO a VALUES (1,23),(2,34),(4,33); -mysql> INSERT INTO a VALUES (8,26),(6,29); -mysql> UNLOCK TABLES; -@end example - -The main speed difference is that the index buffer is flushed to disk only -once, after all @code{INSERT} statements have completed. Normally there would -be as many index buffer flushes as there are different @code{INSERT} -statements. Locking is not needed if you can insert all rows with a single -statement. - -Locking will also lower the total time of multi-connection tests, but the -maximum wait time for some threads will go up (because they wait for -locks). For example: - -@example -thread 1 does 1000 inserts -thread 2, 3, and 4 does 1 insert -thread 5 does 1000 inserts -@end example - -If you don't use locking, 2, 3, and 4 will finish before 1 and 5. If you -use locking, 2, 3, and 4 probably will not finish before 1 or 5, but the -total time should be about 40% faster. - -As @code{INSERT}, @code{UPDATE}, and @code{DELETE} operations are very -fast in @strong{MySQL}, you will obtain better overall performance by -adding locks around everything that does more than about 5 inserts or -updates in a row. If you do very many inserts in a row, you could do a -@code{LOCK TABLES} followed by an @code{UNLOCK TABLES} once in a while -(about each 1000 rows) to allow other threads access to the table. This -would still result in a nice performance gain. - -Of course, @code{LOAD DATA INFILE} is much faster for loading data. -@end itemize - -To get some more speed for both @code{LOAD DATA INFILE} and -@code{INSERT}, enlarge the key buffer. @xref{Server parameters}. - -@node Update speed, Delete speed, Insert speed, Query Speed -@subsection Speed of @code{UPDATE} Queries - -Update queries are optimized as a @code{SELECT} query with the additional -overhead of a write. The speed of the write is dependent on the size of -the data that is being updated and the number of indexes that are -updated. Indexes that are not changed will not be updated. - -Also, another way to get fast updates is to delay updates and then do -many updates in a row later. 
Doing many updates in a row is much quicker -than doing one at a time if you lock the table. - -Note that, with dynamic record format, updating a record to -a longer total length may split the record. So if you do this often, -it is very important to @code{OPTIMIZE TABLE} sometimes. -@xref{OPTIMIZE TABLE, , @code{OPTIMIZE TABLE}}. - -@node Delete speed, , Update speed, Query Speed -@subsection Speed of @code{DELETE} Queries - -If you want to delete all rows in the table, you should use -@code{TRUNCATE TABLE table_name}. @xref{TRUNCATE}. - -The time to delete a record is exactly proportional to the number of -indexes. To delete records more quickly, you can increase the size of -the index cache. @xref{Server parameters}. - -@cindex optimization, tips -@cindex tips, optimization -@node Tips, Benchmarks, Query Speed, Performance -@section Other Optimization Tips - -Unsorted tips for faster systems: - -@itemize @bullet -@item -Use persistent connections to the database to avoid the connection -overhead. If you can't use persistent connections and you are doing a -lot of new connections to the database, you may want to change the value -of the @code{thread_cache_size} variable. @xref{Server parameters}. -@item -Always check that all your queries really use the indexes you have created -in the tables. In @strong{MySQL} you can do this with the @code{EXPLAIN} -command. @xref{EXPLAIN, Explain, Explain, manual}. -@item -Try to avoid complex @code{SELECT} queries on tables that are updated a -lot. This is to avoid problems with table locking. -@item -The new @code{MyISAM} tables can insert rows in a table without deleted -rows at the same time another table is reading from it. If this is important -for you, you should consider methods where you don't have to delete rows -or run @code{OPTIMIZE TABLE} after you have deleted a lot of rows. -@item -Use @code{ALTER TABLE ... ORDER BY expr1,expr2...} if you mostly -retrieve rows in expr1,expr2.. order. By using this option after big -changes to the table, you may be able to get higher performance. -@item -In some cases it may make sense to introduce a column that is 'hashed' -based on information from other columns. If this column is short and -reasonably unique it may be much faster than a big index on many -columns. In @strong{MySQL} it's very easy to use this extra column: -@code{SELECT * FROM table_name WHERE hash=MD5(concat(col1,col2)) -AND col_1='constant' AND col_2='constant'} -@item -For tables that change a lot you should try to avoid all @code{VARCHAR} -or @code{BLOB} columns. You will get dynamic row length as soon as you -are using a single @code{VARCHAR} or @code{BLOB} column. @xref{Table -types}. -@item -It's not normally useful to split a table into different tables just -because the rows gets 'big'. To access a row, the biggest performance -hit is the disk seek to find the first byte of the row. After finding -the data most new disks can read the whole row fast enough for most -applications. The only cases where it really matters to split up a table is if -it's a dynamic row size table (see above) that you can change to a fixed -row size, or if you very often need to scan the table and don't need -most of the columns. @xref{Table types}. -@item -If you very often need to calculate things based on information from a -lot of rows (like counts of things), it's probably much better to -introduce a new table and update the counter in real time. An update of -type @code{UPDATE table set count=count+1 where index_column=constant} -is very fast! 
- -This is really important when you use databases like @strong{MySQL} that -only have table locking (multiple readers / single writers). This will -also give better performance with most databases, as the row locking -manager in this case will have less to do. -@item -If you need to collect statistics from big log tables, use summary tables -instead of scanning the whole table. Maintaining the summaries should be -much faster than trying to do statistics 'live'. It's much faster to -regenerate new summary tables from the logs when things change -(depending on business decisions) than to have to change the running -application! -@item -If possible, one should classify reports as 'live' or 'statistical', -where data needed for statistical reports are only generated based on -summary tables that are generated from the actual data. -@item -Take advantage of the fact that columns have default values. Insert -values explicitly only when the value to be inserted differs from the -default. This reduces the parsing that @strong{MySQL} need to do and -improves the insert speed. -@item -In some cases it's convenient to pack and store data into a blob. In this -case you have to add some extra code in your application to pack/unpack -things in the blob, but this may save a lot of accesses at some stage. -This is practical when you have data that doesn't conform to a static -table structure. -@item -Normally you should try to keep all data non-redundant (what -is called 3rd normal form in database theory), but you should not be -afraid of duplicating things or creating summary tables if you need these -to gain more speed. -@item -Stored procedures or UDF (user-defined functions) may be a good way to -get more performance. In this case you should, however, always have a way -to do this some other (slower) way if you use some database that doesn't -support this. -@item -You can always gain something by caching queries/answers in your -application and trying to do many inserts/updates at the same time. If -your database supports lock tables (like @strong{MySQL} and Oracle), -this should help to ensure that the index cache is only flushed once -after all updates. -@item -Use @code{INSERT /*! DELAYED */} when you do not need to know when your -data is written. This speeds things up because many records can be written -with a single disk write. -@item -Use @code{INSERT /*! LOW_PRIORITY */} when you want your selects to be -more important. -@item -Use @code{SELECT /*! HIGH_PRIORITY */} to get selects that jump the -queue. That is, the select is done even if there is somebody waiting to -do a write. -@item -Use the multi-line @code{INSERT} statement to store many rows with one -SQL command (many SQL servers supports this). -@item -Use @code{LOAD DATA INFILE} to load bigger amounts of data. This is -faster than normal inserts and will be even faster when @code{myisamchk} -is integrated in @code{mysqld}. -@item -Use @code{AUTO_INCREMENT} columns to make unique values. -@item -Use @code{OPTIMIZE TABLE} once in a while to avoid fragmentation when -using dynamic table format. @xref{OPTIMIZE TABLE, , @code{OPTIMIZE TABLE}}. - -@item -Use @code{HEAP} tables to get more speed when possible. @xref{Table -types}. -@item -When using a normal Web server setup, images should be stored as -files. That is, store only a file reference in the database. The main -reason for this is that a normal Web server is much better at caching -files than database contents. 
So it it's much easier to get a fast -system if you are using files. -@item -Use in memory tables for non-critical data that are accessed often (like -information about the last shown banner for users that don't have -cookies). -@item -Columns with identical information in different tables should be -declared identical and have identical names. Before Version 3.23 you -got slow joins otherwise. - -Try to keep the names simple (use @code{name} instead of -@code{customer_name} in the customer table). To make your names portable -to other SQL servers you should keep them shorter than 18 characters. -@item -If you need REALLY high speed, you should take a look at the low-level -interfaces for data storage that the different SQL servers support! For -example, by accessing the @strong{MySQL} @code{MyISAM} directly, you could -get a speed increase of 2-5 times compared to using the SQL interface. -To be able to do this the data must be on the same server as -the application, and usually it should only be accessed by one process -(because external file locking is really slow). One could eliminate the -above problems by introducing low-level @code{MyISAM} commands in the -@strong{MySQL} server (this could be one easy way to get more -performance if needed). By carefully designing the database interface, -it should be quite easy to support this types of optimization. -@item -In many cases it's faster to access data from a database (using a live -connection) than accessing a text file, just because the database is -likely to be more compact than the text file (if you are using numerical -data), and this will involve fewer disk accesses. You will also save -code because you don't have to parse your text files to find line and -column boundaries. -@item -You can also use replication to speed things up. @xref{Replication}. -@item -Declaring a table with @code{DELAY_KEY_WRITE=1} will make the updating of -indexes faster, as these are not logged to disk until the file is closed. -The downside is that you should run @code{myisamchk} on these tables before -you start @code{mysqld} to ensure that they are okay if something killed -@code{mysqld} in the middle. As the key information can always be generated -from the data, you should not lose anything by using @code{DELAY_KEY_WRITE}. -@end itemize - -@cindex benchmarks -@cindex performance, benchmarks -@node Benchmarks, Design, Tips, Performance -@section Using Your Own Benchmarks - -You should definitely benchmark your application and database to find -out where the bottlenecks are. By fixing it (or by replacing the -bottleneck with a 'dummy module') you can then easily identify the next -bottleneck (and so on). Even if the overall performance for your -application is sufficient, you should at least make a plan for each -bottleneck, and decide how to solve it if someday you really need the -extra performance. - -For an example of portable benchmark programs, look at the @strong{MySQL} -benchmark suite. @xref{MySQL Benchmarks, , @strong{MySQL} Benchmarks}. You -can take any program from this suite and modify it for your needs. By doing this, -you can try different solutions to your problem and test which is really the -fastest solution for you. - -It is very common that some problems only occur when the system is very -heavily loaded. We have had many customers who contact us when they -have a (tested) system in production and have encountered load problems. 
-
-To avoid problems like these, you should put some effort into
-benchmarking your whole application under the worst possible load! You
-can use Sasha's recent hack for this -
-@uref{http://www.mysql.com/Downloads/super-smack/super-smack-1.0.tar.gz,
-super-smack}.
-As the name suggests, it can bring your system to its knees if you ask
-it to, so make sure to use it only on your development systems.
-
-@cindex design, choices
-@cindex database design
-@cindex storage of data
-@node Design, Design Limitations, Benchmarks, Performance
-@section Design Choices
-
-@strong{MySQL} keeps row data and index data in separate files. Many (almost
-all) other databases mix row and index data in the same file. We believe that
-the @strong{MySQL} choice is better for a very wide range of modern systems.
-
-Another way to store the row data is to keep the information for each
-column in a separate area (examples are SDBM and Focus). This causes a
-performance hit for every query that accesses more than one column. Because
-this degenerates so quickly when more than one column is accessed,
-we believe that this model is not good for general-purpose databases.
-
-The more common case is that the index and data are stored together
-(as in Oracle/Sybase et al). In this case you will find the row
-information in the leaf page of the index. The good thing about this
-layout is that it, in many cases (depending on how well the index is
-cached), saves a disk read. The bad things about this layout are:
-
-@itemize @bullet
-@item
-Table scanning is much slower because you have to read through the indexes
-to get at the data.
-@item
-You can't use only the index to retrieve the data for a query.
-@item
-You lose a lot of space, as you must duplicate indexes from the nodes
-(because you can't store the rows in the nodes).
-@item
-Deletes will degenerate the table over time (as indexes in nodes are
-usually not updated on delete).
-@item
-It's harder to cache ONLY the index data.
-@end itemize
-
-@cindex design, limitations
-@node Design Limitations, Portability, Design, Performance
-@section MySQL Design Limitations/Tradeoffs
-
-Because @strong{MySQL} uses extremely fast table locking (multiple readers /
-single writers), the biggest remaining problem is a mix of a steady stream of
-inserts and slow selects on the same table.
-
-We believe that for a huge number of systems the extremely fast
-performance in other cases makes this choice a win. This case can
-usually also be solved by keeping multiple copies of the table, but that
-takes more effort and hardware.
-
-We are also working on some extensions to solve this problem for some
-common application niches.
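-
-In the meantime, @code{INSERT DELAYED} can ease this reader/writer mix
-for log-style tables, because the client does not wait for the insert
-to complete. A minimal sketch, using a hypothetical table @code{log_t}:
-
-@example
-INSERT DELAYED INTO log_t (user, action) VALUES ('joe', 'login');
-@end example
-
-Note that delayed rows are queued in memory until the table is free, so
-a server crash can lose them; use this only for data you can afford to
-lose.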
-
-@cindex portability
-@cindex crash-me program
-@cindex programs, crash-me
-@node Portability, Internal use, Design Limitations, Performance
-@section Portability
-
-Because all SQL servers implement different parts of SQL, it takes work to
-write portable SQL applications. For very simple selects/inserts it is
-very easy, but the more features you need, the harder it gets. If you
-want an application that is fast with many databases, it becomes even
-harder!
-
-To make a complex application portable, you need to choose a number of
-SQL servers that it should work with.
-
-You can use the @strong{MySQL} crash-me program/web-page
-@uref{http://www.mysql.com/information/crash-me.php} to find the functions,
-types, and limits you can use with a selection of database
-servers. Crash-me is far from testing everything possible, but it
-is still comprehensive, with about 450 things tested.
-
-For example, you shouldn't have column names longer than 18 characters
-if you want to be able to use Informix or DB2.
-
-Both the @strong{MySQL} benchmark programs and crash-me are very
-database independent. By taking a look at how we have handled this, you
-can get a feeling for what you have to do to make your own application
-database independent. The benchmarks themselves can be found in the
-@file{sql-bench} directory of the @strong{MySQL} source
-distribution. They are written in Perl, using the DBI database interface
-(which solves the access part of the problem).
-
-See @uref{http://www.mysql.com/information/benchmarks.html} for the results
-from this benchmark.
-
-As you can see in these results, all databases have some weak points. That
-is, they have made different design compromises that lead to different
-behavior.
-
-If you strive for database independence, you need to get a good feeling
-for each SQL server's bottlenecks. @strong{MySQL} is VERY fast in
-retrieving and updating records, but has a problem with mixing slow
-readers and writers on the same table. Oracle, on the other hand, has a
-big problem when you try to access rows that you have recently updated
-(until they are flushed to disk). Transactional databases in general are
-not very good at generating summary tables from log tables, because in
-this case row locking is almost useless.
-
-To make your application @emph{really} database independent, you need to
-define an easily extensible interface through which you manipulate your
-data. As C++ is available on most systems, it makes sense to use a C++
-class interface to the databases.
-
-If you use a feature specific to some database (like the
-@code{REPLACE} command in @strong{MySQL}), you should code a method that
-implements the same feature (more slowly) for the other SQL servers. With
-@strong{MySQL} you can use the @code{/*! */} syntax to add
-@strong{MySQL}-specific keywords to a query. The code inside
-@code{/**/} will be treated as a comment (and ignored) by most other SQL
-servers.
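-
-For example, the following statement uses the @strong{MySQL}-specific
-@code{LOW_PRIORITY} keyword, but still parses as a plain @code{INSERT}
-on servers that treat @code{/*! */} as a comment (the table is the same
-hypothetical @code{log_t} as above):
-
-@example
-INSERT /*! LOW_PRIORITY */ INTO log_t (user, action)
-       VALUES ('joe', 'logout');
-@end example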
-
-If REALLY high performance is more important than exactness, as it is in
-some Web applications, a possibility is to create an application layer
-that caches all results to give you even higher performance. By letting
-old results 'expire' after a while, you can keep the cache reasonably
-fresh. This is quite nice under extremely high load, because you can
-dynamically increase the cache size and set the expiry timeout higher
-until things get back to normal.
-
-In this case the table creation information should include the initial
-size of the cache and how often the table should normally be refreshed.
-
-@cindex uses, of MySQL
-@cindex customers, of MySQL
-@node Internal use, , Portability, Performance
-@section What Have We Used MySQL For?
-
-During @strong{MySQL}'s initial development, its features were made to fit
-our largest customer, which handles data warehousing for a couple of the
-biggest retailers in Sweden.
-
-From all stores, we get weekly summaries of all bonus card transactions,
-and we are expected to provide useful information for the store owners
-to help them find out how their advertising campaigns are affecting their
-customers.
-
-The amount of data is quite large (about 7 million summary transactions
-per month), and we have data for 4-10 years that we need to present to
-the users. We get weekly requests from the customers, who want 'instant'
-access to new reports based on this data.
-
-We solved this by storing all information per month in compressed
-'transaction' tables. We have a set of simple macros (scripts) that
-generate summary tables grouped by different criteria (product group,
-customer id, store ...) from the transaction tables. The reports are
-Web pages that are dynamically generated by a small Perl script that
-parses a Web page, executes the SQL statements in it, and inserts the
-results. We would have used PHP or mod_perl instead, but they were
-not available at that time.
-
-For graphical data we wrote a simple tool in @code{C} that can produce
-GIFs based on the result of an SQL query (with some processing of the
-result). This is also dynamically executed from the Perl script that
-parses the @code{HTML} files.
-
-In most cases a new report can be created simply by copying an existing
-script and modifying the SQL query in it. In some cases, we need to
-add more fields to an existing summary table or generate a new one, but
-this is also quite simple, as we keep all transaction tables on disk.
-(Currently we have at least 50G of transaction tables and 200G of other
-customer data.)
-
-We also let our customers access the summary tables directly with ODBC
-so that advanced users can experiment with the data themselves.
-
-We haven't had any problems handling this with a quite modest Sun Ultra
-SPARCstation (2 x 200 MHz). We recently upgraded one of our servers to a
-two-CPU 400 MHz UltraSPARC, and we are now planning to start handling
-transactions on the product level, which would mean a tenfold increase
-in data. We think we can keep up with this just by adding more disks to
-our systems.
-
-We are also experimenting with Intel-Linux to be able to get more CPU
-power more cheaply. Now that we have the binary portable database format
-(new in Version 3.23), we will start to use this for some parts of the
-application.
-
-Our initial feeling is that Linux will perform much better at
-low-to-medium load and Solaris will perform better when you start to get
-high load because of extreme disk I/O, but we don't yet have anything
-conclusive about this. After some discussion with a Linux kernel
-developer, we suspect this might be a side effect of Linux giving so
-many resources to the batch job that interactive performance gets very
-low. This makes the machine feel very slow and unresponsive while big
-batches are running. Hopefully this will be handled better in future
-Linux kernels.
-
-@cindex benchmark suite
-@cindex crash-me program
-@node MySQL Benchmarks, Maintenance, Performance, Top
-@chapter The MySQL Benchmark Suite
-
-This chapter should contain a technical description of the @strong{MySQL}
-benchmark suite (and @code{crash-me}), but that description has not been
-written yet. Currently, you can get a good idea of the benchmarks by
-looking at the code and results in the @file{sql-bench} directory of any
-@strong{MySQL} source distribution.
-
-This benchmark suite is meant to tell any user what operations a given
-SQL implementation performs well or poorly at.
-
-Note that this benchmark is single-threaded, so it measures the minimum
-time for the operations. In the future, we plan to add many
-multi-threaded tests to the benchmark suite.
-
-For example (all run on the same NT 4.0 machine):
-
-@multitable @columnfractions .6 .2 .2
-@strong{Reading 2000000 rows by index} @tab @strong{Seconds} @tab @strong{Seconds}
-@item mysql @tab 367 @tab 249
-@item mysql_odbc @tab 464
-@item db2_odbc @tab 1206
-@item informix_odbc @tab 121126
-@item ms-sql_odbc @tab 1634
-@item oracle_odbc @tab 20800
-@item solid_odbc @tab 877
-@item sybase_odbc @tab 17614
-@end multitable
-
-@multitable @columnfractions .6 .2 .2
-@strong{Inserting (350768) rows} @tab @strong{Seconds} @tab @strong{Seconds}
-@item mysql @tab 381 @tab 206
-@item mysql_odbc @tab 619
-@item db2_odbc @tab 3460
-@item informix_odbc @tab 2692
-@item ms-sql_odbc @tab 4012
-@item oracle_odbc @tab 11291
-@item solid_odbc @tab 1801
-@item sybase_odbc @tab 4802
-@end multitable
-
-In the above tests @strong{MySQL} was run with an 8M index cache.
-
-We have gathered some more benchmark results at
-@uref{http://www.mysql.com/information/benchmarks.html}.
-
-Note that Oracle is not included because they asked to be removed. All
-Oracle benchmarks have to be approved by Oracle! We believe that makes
-Oracle benchmarks @strong{VERY} biased, because the above benchmarks are
-supposed to show what a standard installation can do for a single
-client.
-
-To run the benchmark suite, you have to download a @strong{MySQL} source
-distribution, install the Perl DBI module and the Perl DBD driver for the
-database you want to test, and then do:
-
-@example
-cd sql-bench
-perl run-all-tests --server=#
-@end example
-
-where # is one of the supported servers. You can get a list of all options
-and supported servers by doing @code{run-all-tests --help}.
-
-@cindex crash-me
-@code{crash-me} tries to determine what features a database supports and
-what its capabilities and limitations are by actually running
-queries. For example, it determines:
-
-@itemize @bullet
-@item
-What column types are supported
-@item
-How many indexes are supported
-@item
-What functions are supported
-@item
-How big a query can be
-@item
-How big a @code{VARCHAR} column can be
-@end itemize
-
-You can find the results from @code{crash-me} for a lot of different
-databases at @uref{http://www.mysql.com/information/crash-me.php}.
-
-
-@node Maintenance, Adding functions, MySQL Benchmarks, Top
+@node Maintenance, Adding functions, Fulltext Search, Top
 @chapter Maintaining a MySQL Installation
 
 @cindex installation maintenance
@@ -36648,8 +36795,6 @@ This chapter covers what you should know about maintaining
 a @strong{MySQL} distribution. You will learn how to care for your tables
 on a regular basis, and what to do when disaster strikes.
-
-
 @node Using mysqlcheck, , Maintenance, Maintenance
 @section Using @code{mysqlcheck} for Table Maintenance and Crash Recovery
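 
 For example, a typical invocation (a sketch only; the exact options
 supported may vary between versions) checks all tables in all databases,
 and a second pass repairs any that turn out to be corrupted:
 
 @example
 shell> mysqlcheck --check --all-databases
 shell> mysqlcheck --auto-repair --all-databases
 @end example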