diff options
author | unknown <lenz@mysql.com> | 2003-05-27 18:47:50 +0200 |
---|---|---|
committer | unknown <lenz@mysql.com> | 2003-05-27 18:47:50 +0200 |
commit | d272a9e4d623775e6963e40b01f0515319e2efa6 (patch) | |
tree | 848b67f2e72fa098c4faadcc429362721347a30d /Docs | |
parent | da8f7ecadd2abcf4a5908cd2a729f7e794138ef6 (diff) | |
download | mariadb-git-d272a9e4d623775e6963e40b01f0515319e2efa6.tar.gz |
- removed internals.texi (has been moved to mysqldoc BK tree)
BitKeeper/deleted/.del-internals.texi~62b6f580a41c2a43:
Delete: Docs/internals.texi
Docs/Makefile.am:
- removed all traces of internals.texi
Diffstat (limited to 'Docs')
-rw-r--r-- | Docs/Makefile.am | 47 | ||||
-rw-r--r-- | Docs/internals.texi | 5707 |
2 files changed, 0 insertions, 5754 deletions
diff --git a/Docs/Makefile.am b/Docs/Makefile.am index 00eb936c408..f3df055a7dd 100644 --- a/Docs/Makefile.am +++ b/Docs/Makefile.am @@ -158,53 +158,6 @@ manual_letter.de.ps: manual.de.texi include.texi # -# Internals Manual -# - -# GNU Info -internals.info: internals.texi include.texi - cd $(srcdir) && $(MAKEINFO) --no-split -I $(srcdir) $< - -# Plain Text -internals.txt: internals.texi include.texi - cd $(srcdir) && \ - $(MAKEINFO) -I $(srcdir) --no-headers --no-split --output $@ $< - -# HTML, all in one file -internals.html: internals.texi include.texi $(srcdir)/Support/texi2html - cd $(srcdir) && @PERL@ $(srcdir)/Support/texi2html $(TEXI2HTML_FLAGS) $< -internals_toc.html: internals.html - -# PDF, Portable Document Format -internals.pdf: internals.texi - sed -e 's|@image{[^}]*} *||g' <$< >internals-tmp.texi - pdftex --interaction=nonstopmode internals-tmp.texi - texindex internals-tmp.?? - pdftex --interaction=nonstopmode internals-tmp.texi - texindex internals-tmp.?? - pdftex --interaction=nonstopmode internals-tmp.texi - mv internals-tmp.pdf $@ - rm -f internals-tmp.* - touch $@ - -# Postscript, A4 Paper -internals_a4.ps: internals.texi include.texi - TEXINPUTS=$(srcdir):$$TEXINPUTS \ - MAKEINFO='$(MAKEINFO) -I $(srcdir)' \ - $(TEXI2DVI) --batch --texinfo --quiet '@afourpaper' $< - $(DVIPS) -t a4 internals.dvi -o $@ - touch $@ - -# Postscript, US Letter Paper -internals_letter.ps: internals.texi include.texi - TEXINPUTS=$(srcdir):$$TEXINPUTS \ - MAKEINFO='$(MAKEINFO) -I $(srcdir)' \ - $(TEXI2DVI) --batch $< - $(DVIPS) -t letter internals.dvi -o $@ - touch $@ - - -# # Miscellaneous # diff --git a/Docs/internals.texi b/Docs/internals.texi deleted file mode 100644 index a54f5098e5d..00000000000 --- a/Docs/internals.texi +++ /dev/null @@ -1,5707 +0,0 @@ -\input texinfo @c -*-texinfo-*- -@c Copyright 2002 MySQL AB -@c -@c %**start of header -@setfilename internals.info - -@c We want the types in the same index -@synindex cp fn - -@iftex -@afourpaper -@end iftex - -@c Get version and other info -@include include.texi - -@ifclear tex-debug -@c This removes the black squares in the right margin -@finalout -@end ifclear - -@c Set background for HTML -@set _body_tags BGCOLOR=#FFFFFF TEXT=#000000 LINK=#101090 VLINK=#7030B0 -@settitle @strong{MySQL} Internals Manual for version @value{mysql_version}. -@setchapternewpage odd -@paragraphindent 0 - -@c %**end of header - -@ifinfo -@format -START-INFO-DIR-ENTRY -* mysql-internals: (mysql-internals). @strong{MySQL} internals. -END-INFO-DIR-ENTRY -@end format -@end ifinfo - -@titlepage -@sp 10 -@center @titlefont{@strong{MySQL} Internals Manual} -@sp 10 -@center Copyright @copyright{} 1998-2002 MySQL AB -@page -@end titlepage - -@node Top, coding guidelines, (dir), (dir) - -@ifinfo -This is a manual about @strong{MySQL} internals. -@end ifinfo - -@menu -* coding guidelines:: Coding Guidelines -* caching:: How MySQL Handles Caching -* join_buffer_size:: -* flush tables:: How MySQL Handles @code{FLUSH TABLES} -* Algorithms:: -* mysys functions:: Functions In The @code{mysys} Library -* DBUG:: DBUG Tags To Use -* protocol:: MySQL Client/Server Protocol -* Fulltext Search:: Fulltext Search in MySQL -* MyISAM Record Structure:: MyISAM Record Structure -* InnoDB Record Structure:: InnoDB Record Structure -* InnoDB Page Structure:: InnoDB Page Structure -* Files in MySQL Sources:: Annotated List Of Files in the MySQL Source Code Distribution -* Files in InnoDB Sources:: Annotated List Of Files in the InnoDB Source Code Distribution -@end menu - - -@node coding guidelines, caching, Top, Top -@chapter Coding Guidelines - -@itemize @bullet - -@item -We use @uref{http://www.bitkeeper.com/, BitKeeper} for source management. - -@item -You should use the @strong{MySQL} 4.1 source for all developments. - -@item -If you have any questions about the @strong{MySQL} source, you can post these -to @email{internals@@mysql.com} and we will answer them. - -@item -Try to write code in a lot of black boxes that can be reused or use at -least a clean, easy to change interface. - -@item -Reuse code; There is already a lot of algorithms in MySQL for list handling, -queues, dynamic and hashed arrays, sorting, etc. that can be reused. - -@item -Use the @code{my_*} functions like @code{my_read()}/@code{my_write()}/ -@code{my_malloc()} that you can find in the @code{mysys} library instead -of the direct system calls; This will make your code easier to debug and -more portable. - -@item -Try to always write optimized code, so that you don't have to -go back and rewrite it a couple of months later. It's better to -spend 3 times as much time designing and writing an optimal function than -having to do it all over again later on. - -@item -Avoid CPU wasteful code, even where it does not matter, so that -you will not develop sloppy coding habits. - -@item -If you can write it in fewer lines, do it (as long as the code will not -be slower or much harder to read). - -@item -Don't use two commands on the same line. - -@item -Do not check the same pointer for @code{NULL} more than once. - -@item -Use long function and variable names in English. This makes your code -easier to read. - -@item -Use @code{my_var} as opposed to @code{myVar} or @code{MyVar} (@samp{_} -rather than dancing SHIFT to seperate words in identifiers). - -@item -Think assembly - make it easier for the compiler to optimize your code. - -@item -Comment your code when you do something that someone else may think -is not ``trivial''. - -@item -Use @code{libstring} functions (in the @file{strings} directory) -instead of standard @code{libc} string functions whenever possible. - -@item -Avoid using @code{malloc()} (its REAL slow); For memory allocations -that only need to live for the lifetime of one thread, one should use -@code{sql_alloc()} instead. - -@item -Before making big design decisions, please first post a summary of -what you want to do, why you want to do it, and how you plan to do -it. This way we can easily provide you with feedback and also -easily discuss it thoroughly if some other developer thinks there is better -way to do the same thing! - -@item -Class names start with a capital letter. - -@item -Structure types are @code{typedef}'ed to an all-caps identifier. - -@item -Any @code{#define}'s are in all-caps. - -@item -Matching @samp{@{} are in the same column. - -@item -Put the @samp{@{} after a @code{switch} on the same line, as this gives -better overall indentation for the switch statement: - -@example -switch (arg) @{ -@end example - -@item -In all other cases, @samp{@{} and @samp{@}} should be on their own line, except -if there is nothing inside @samp{@{} and @samp{@}}. - -@item -Have a space after @code{if} - -@item -Put a space after @samp{,} for function arguments - -@item -Functions return @samp{0} on success, and non-zero on error, so you can do: - -@example -if(a() || b() || c()) @{ error("something went wrong"); @} -@end example - -@item -Using @code{goto} is okay if not abused. - -@item -Avoid default variable initalizations, use @code{LINT_INIT()} if the -compiler complains after making sure that there is really no way -the variable can be used uninitialized. - -@item -Do not instantiate a class if you do not have to. - -@item -Use pointers rather than array indexing when operating on strings. - -@end itemize - -Suggested mode in emacs: - -@example -(load "cc-mode") -(setq c-mode-common-hook '(lambda () - (turn-on-font-lock) - (setq comment-column 48))) -(setq c-style-alist - (cons - '("MY" - (c-basic-offset . 2) - (c-comment-only-line-offset . 0) - (c-offsets-alist . ((statement-block-intro . +) - (knr-argdecl-intro . 0) - (substatement-open . 0) - (label . -) - (statement-cont . +) - (arglist-intro . c-lineup-arglist-intro-after-paren) - (arglist-close . c-lineup-arglist) - )) - ) - c-style-alist)) -(c-set-style "MY") -(setq c-default-style "MY") -@end example - -@node caching, join_buffer_size, coding guidelines, Top -@chapter How MySQL Handles Caching - -@strong{MySQL} has the following caches: -(Note that the some of the filename have a wrong spelling of cache. :) - -@table @strong - -@item Key Cache -A shared cache for all B-tree index blocks in the different NISAM -files. Uses hashing and reverse linked lists for quick caching of the -last used blocks and quick flushing of changed entries for a specific -table. (@file{mysys/mf_keycash.c}) - -@item Record Cache -This is used for quick scanning of all records in a table. -(@file{mysys/mf_iocash.c} and @file{isam/_cash.c}) - -@item Table Cache -This holds the last used tables. (@file{sql/sql_base.cc}) - -@item Hostname Cache -For quick lookup (with reverse name resolving). Is a must when one has a -slow DNS. -(@file{sql/hostname.cc}) - -@item Privilege Cache -To allow quick change between databases the last used privileges are -cached for each user/database combination. -(@file{sql/sql_acl.cc}) - -@item Heap Table Cache -Many use of @code{GROUP BY} or @code{DISTINCT} caches all found rows in -a @code{HEAP} table. (This is a very quick in-memory table with hash index.) - -@item Join buffer Cache -For every full join in a @code{SELECT} statement (a full join here means -there were no keys that one could use to find the next table in a list), -the found rows are cached in a join cache. One @code{SELECT} query can -use many join caches in the worst case. -@end table - -@node join_buffer_size, flush tables, caching, Top -@chapter How MySQL uses the join_buffer cache - -Basic information about @code{join_buffer_size}: - -@itemize @bullet -@item -It's only used in the case when join type is of type @code{ALL} or -@code{index}; In other words: no possible keys can be used. -@item -A join buffer is never allocated for the first not-const table, -even it it would be of type @code{ALL}/@code{index}. -@item -The buffer is allocated when we need to do a each full join between two -tables and freed after the query is done. -@item -Accepted row combinations of tables before the @code{ALL}/@code{index} -able is stored in the cache and is used to compare against each read -row in the @code{ALL} table. -@item -We only store the used fields in the join_buffer cache, not the -whole rows. -@end itemize - -Assume you have the following join: - -@example -Table name Type -t1 range -t2 ref -t3 @code{ALL} -@end example - -The join is then done as follows: - -@example -- While rows in t1 matching range - - Read through all rows in t2 according to reference key - - Store used fields form t1,t2 in cache - - If cache is full - - Read through all rows in t3 - - Compare t3 row against all t1,t2 combination in cache - - If rows satisfying join condition, send it to client - - Empty cache - -- Read through all rows in t3 - - Compare t3 row against all stored t1,t2 combinations in cache - - If rows satisfying join condition, send it to client -@end example - -The above means that table t3 is scanned - -@example -(size-of-stored-row(t1,t2) * accepted-row-cominations(t1,t2))/ -join_buffer_size+1 -@end example -times. - -Some conclusions: - -@itemize @bullet -@item -The larger the join_buff_size, the fewer scans of t3. -If @code{join_buff_size} is already large enough to hold all previous row -combinations then there is no speed to gain by making it bigger. -@item -If there is several tables of @code{ALL}/@code{index} then the we -allocate one @code{join_buffer_size buffer} for each of them and use the -same algorithm described above to handle it. (In other words, we store -the same row combination several times into different buffers) -@end itemize - -@node flush tables, Algorithms, join_buffer_size, Top -@chapter How MySQL Handles @code{FLUSH TABLES} - -@itemize @bullet - -@item -Flush tables is handled in @file{sql/sql_base.cc::close_cached_tables()}. - -@item -The idea of flush tables is to force all tables to be closed. This -is mainly to ensure that if someone adds a new table outside of -@strong{MySQL} (for example with @code{cp}) all threads will start using -the new table. This will also ensure that all table changes are flushed -to disk (but of course not as optimally as simple calling a sync on -all tables)! - -@item -When one does a @code{FLUSH TABLES}, the variable @code{refresh_version} -will be incremented. Every time a thread releases a table it checks if -the refresh version of the table (updated at open) is the same as -the current @code{refresh_version}. If not it will close it and broadcast -a signal on @code{COND_refresh} (to wait any thread that is waiting for -all instanses of a table to be closed). - -@item -The current @code{refresh_version} is also compared to the open -@code{refresh_version} after a thread gets a lock on a table. If the -refresh version is different the thread will free all locks, reopen the -table and try to get the locks again; This is just to quickly get all -tables to use the newest version. This is handled by -@file{sql/lock.cc::mysql_lock_tables()} and -@file{sql/sql_base.cc::wait_for_tables()}. - -@item -When all tables has been closed @code{FLUSH TABLES} will return an ok -to client. - -@item -If the thread that is doing @code{FLUSH TABLES} has a lock on some tables, -it will first close the locked tables, then wait until all other threads -have also closed them, and then reopen them and get the locks. -After this it will give other threads a chance to open the same tables. - -@end itemize - -@node Algorithms, mysys functions, flush tables, Top -@chapter Different algoritms used in MySQL - -MySQL uses a lot of different algorithms. This chapter tries to describe -some of these: - -@menu -* filesort:: -* bulk-insert:: -@end menu - -@node filesort, bulk-insert, Algorithms, Algorithms -@section How MySQL Does Sorting (@code{filesort}) - -@itemize @bullet - -@item -Read all rows according to key or by table scanning. - -@item -Store the sort-key in a buffer (@code{sort_buffer}). - -@item -When the buffer gets full, run a @code{qsort} on it and store the result -in a temporary file. Save a pointer to the sorted block. - -@item -Repeat the above until all rows have been read. - -@item -Repeat the following until there is less than @code{MERGEBUFF2} (15) -blocks left. - -@item -Do a multi-merge of up to @code{MERGEBUFF} (7) regions to one block in -another temporary file. Repeat until all blocks from the first file -are in the second file. - -@item -On the last multi-merge, only the pointer to the row (last part of -the sort-key) is written to a result file. - -@item -Now the code in @file{sql/records.cc} will be used to read through them -in sorted order by using the row pointers in the result file. -To optimize this, we read in a big block of row pointers, sort these -and then we read the rows in the sorted order into a row buffer -(@code{record_buffer}). - -@end itemize - -@node bulk-insert, , filesort, Algorithms -@section Bulk insert - -Logic behind bulk insert optimisation is simple. - -Instead of writing each key value to b-tree (that is to keycache, but -bulk insert code doesn't know about keycache) keys are stored in -balanced binary (red-black) tree, in memory. When this tree reaches its -memory limit it's writes all keys to disk (to keycache, that is). But -as key stream coming from the binary tree is already sorted inserting -goes much faster, all the necessary pages are already in cache, disk -access is minimized, etc. - -@node mysys functions, DBUG, Algorithms, Top -@chapter Functions In The @code{mysys} Library - -Functions in @code{mysys}: (For flags see @file{my_sys.h}) - -@table @code -@item int my_copy _A((const char *from, const char *to, myf MyFlags)); -Copy file from @code{from} to @code{to}. - -@item int my_delete _A((const char *name, myf MyFlags)); -Delete file @code{name}. - -@item int my_getwd _A((string buf, uint size, myf MyFlags)); -@item int my_setwd _A((const char *dir, myf MyFlags)); -Get and set working directory. - -@item string my_tempnam _A((const char *pfx, myf MyFlags)); -Make a unique temporary file name by using dir and adding something after -@code{pfx} to make name unique. The file name is made by adding a unique -six character string and @code{TMP_EXT} after @code{pfx}. -Returns pointer to @code{malloc()}'ed area for filename. Should be freed by -@code{free()}. - -@item File my_open _A((const char *FileName,int Flags,myf MyFlags)); -@item File my_create _A((const char *FileName, int CreateFlags, int AccsesFlags, myf MyFlags)); -@item int my_close _A((File Filedes, myf MyFlags)); -@item uint my_read _A((File Filedes, byte *Buffer, uint Count, myf MyFlags)); -@item uint my_write _A((File Filedes, const byte *Buffer, uint Count, myf MyFlags)); -@item ulong my_seek _A((File fd,ulong pos,int whence,myf MyFlags)); -@item ulong my_tell _A((File fd,myf MyFlags)); -Use instead of open, open-with-create-flag, close, read, and write -to get automatic error messages (flag @code{MYF_WME}) and only have -to test for != 0 if error (flag @code{MY_NABP}). - -@item int my_rename _A((const char *from, const char *to, myf MyFlags)); -Rename file from @code{from} to @code{to}. - -@item FILE *my_fopen _A((const char *FileName,int Flags,myf MyFlags)); -@item FILE *my_fdopen _A((File Filedes,int Flags,myf MyFlags)); -@item int my_fclose _A((FILE *fd,myf MyFlags)); -@item uint my_fread _A((FILE *stream,byte *Buffer,uint Count,myf MyFlags)); -@item uint my_fwrite _A((FILE *stream,const byte *Buffer,uint Count, myf MyFlags)); -@item ulong my_fseek _A((FILE *stream,ulong pos,int whence,myf MyFlags)); -@item ulong my_ftell _A((FILE *stream,myf MyFlags)); -Same read-interface for streams as for files. - -@item gptr _mymalloc _A((uint uSize,const char *sFile,uint uLine, myf MyFlag)); -@item gptr _myrealloc _A((string pPtr,uint uSize,const char *sFile,uint uLine, myf MyFlag)); -@item void _myfree _A((gptr pPtr,const char *sFile,uint uLine)); -@item int _sanity _A((const char *sFile,unsigned int uLine)); -@item gptr _myget_copy_of_memory _A((const byte *from,uint length,const char *sFile, uint uLine,myf MyFlag)); -@code{malloc(size,myflag)} is mapped to these functions if not compiled -with @code{-DSAFEMALLOC}. - -@item void TERMINATE _A((void)); -Writes @code{malloc()} info on @code{stdout} if compiled with -@code{-DSAFEMALLOC}. - -@item int my_chsize _A((File fd, ulong newlength, myf MyFlags)); -Change size of file @code{fd} to @code{newlength}. - -@item void my_error _D((int nr, myf MyFlags, ...)); -Writes message using error number (see @file{mysys/errors.h}) on @code{stdout}, -or using curses, if @code{MYSYS_PROGRAM_USES_CURSES()} has been called. - -@item void my_message _A((const char *str, myf MyFlags)); -Writes @code{str} on @code{stdout}, or using curses, if -@code{MYSYS_PROGRAM_USES_CURSES()} has been called. - -@item void my_init _A((void )); -Start each program (in @code{main()}) with this. - -@item void my_end _A((int infoflag)); -Gives info about program. -If @code{infoflag & MY_CHECK_ERROR}, prints if some files are left open. -If @code{infoflag & MY_GIVE_INFO}, prints timing info and malloc info -about program. - -@item int my_redel _A((const char *from, const char *to, int MyFlags)); -Delete @code{from} before rename of @code{to} to @code{from}. Copies state -from old file to new file. If @code{MY_COPY_TIME} is set, sets old time. - -@item int my_copystat _A((const char *from, const char *to, int MyFlags)); -Copy state from old file to new file. If @code{MY_COPY_TIME} is set, -sets old time. - -@item string my_filename _A((File fd)); -Returns filename of open file. - -@item int dirname _A((string to, const char *name)); -Copy name of directory from filename. - -@item int test_if_hard_path _A((const char *dir_name)); -Test if @code{dir_name} is a hard path (starts from root). - -@item void convert_dirname _A((string name)); -Convert dirname according to system. -In MSDOS, changes all characters to capitals and changes @samp{/} to @samp{\}. - -@item string fn_ext _A((const char *name)); -Returns pointer to extension in filename. - -@item string fn_format _A((string to,const char *name,const char *dsk,const char *form,int flag)); - format a filename with replace of library and extension and - converts between different systems. - params to and name may be identicall - function dosn't change name if name != to - Flag may be: 1 force replace filnames library with 'dsk' - 2 force replace extension with 'form' */ - 4 force Unpack filename (replace ~ with home) - 8 Pack filename as short as possibly for output to - user. - All open requests should allways use at least: - "open(fn_format(temp_buffe,name,"","",4),...)" to unpack home and - convert filename to system-form. - -@item string fn_same _A((string toname, const char *name, int flag)); -Copys directory and extension from @code{name} to @code{toname} if neaded. -Copying can be forced by same flags used in @code{fn_format()}. - -@item int wild_compare _A((const char *str, const char *wildstr)); -Compare if @code{str} matches @code{wildstr}. @code{wildstr} can contain -@samp{*} and @samp{?} as wildcard characters. -Returns 0 if @code{str} and @code{wildstr} match. - -@item void get_date _A((string to, int timeflag)); -Get current date in a form ready for printing. - -@item void soundex _A((string out_pntr, string in_pntr)) -Makes @code{in_pntr} to a 5 char long string. All words that sound -alike have the same string. - -@item int init_key_cache _A((ulong use_mem, ulong leave_this_much_mem)); -Use caching of keys in MISAM, PISAM, and ISAM. -@code{KEY_CACHE_SIZE} is a good size. -Remember to lock databases for optimal caching. - -@item void end_key_cache _A((void)); -End key caching. -@end table - - - -@node DBUG, protocol, mysys functions, Top -@chapter DBUG Tags To Use - -Here is some of the tags we now use: -(We should probably add a couple of new ones) - -@table @code -@item enter -Arguments to the function. - -@item exit -Results from the function. - -@item info -Something that may be interesting. - -@item warning -When something doesn't go the usual route or may be wrong. - -@item error -When something went wrong. - -@item loop -Write in a loop, that is probably only useful when debugging -the loop. These should normally be deleted when one is -satisfied with the code and it has been in real use for a while. -@end table - -Some specific to mysqld, because we want to watch these carefully: - -@table @code -@item trans -Starting/stopping transactions. - -@item quit -@code{info} when mysqld is preparing to die. - -@item query -Print query. -@end table - - -@node protocol, Fulltext Search, DBUG, Top -@chapter MySQL Client/Server Protocol - -@menu -* raw packet without compression:: -* raw packet with compression:: -* basic packets:: -* communication:: -* fieldtype codes:: -* protocol functions:: -* protocol version 2:: -* 4.1 protocol changes:: -* 4.1 field packet:: -* 4.1 field desc:: -* 4.1 ok packet:: -* 4.1 end packet:: -* 4.1 error packet:: -* 4.1 prep init:: -* 4.1 long data:: -* 4.1 execute:: -* 4.1 binary result:: -@end menu - -@node raw packet without compression, raw packet with compression, protocol, protocol -@section Raw Packet Without Compression - -@example -+-----------------------------------------------+ -| Packet Length | Packet no | Data | -| 3 Bytes | 1 Byte | n Bytes | -+-----------------------------------------------+ -@end example - -@table @asis -@item 3 Byte packet length -The length is calculated with int3store -See include/global.h for details. -The max packetsize can be 16 MB. - -@item 1 Byte packet no -If no compression is used the first 4 bytes of each packet is the header -of the packet. The packet number is incremented for each sent packet. -The first packet starts with 0. -@item n Byte data - -@end table - -The packet length can be recalculated with: - -@example -length = byte1 + (256 * byte2) + (256 * 256 * byte3) -@end example - - -@node raw packet with compression, basic packets, raw packet without compression, protocol -@section Raw Packet With Compression - -@example -+---------------------------------------------------+ -| Packet Length | Packet no | Uncomp. Packet Length | -| 3 Bytes | 1 Byte | 3 Bytes | -+---------------------------------------------------+ -@end example - -@table @asis -@item 3 Byte packet length -The length is calculated with int3store -See include/global.h for details. -The max packetsize can be 16 MB. - -@item 1 Byte packet no -@item 3 Byte uncompressed packet length -@end table - -If compression is used the first 7 bytes of each packet -is the header of the packet. - - -@node basic packets, communication, raw packet with compression, protocol -@section Basic Packets - -@menu -* ok packet:: -* error packet:: -@end menu - - -@node ok packet, error packet, basic packets, basic packets -@subsection OK Packet - -For details, see @file{sql/net_pkg.cc::send_ok()}. - -@example -+-----------------------------------------------+ -| Header | No of Rows | Affected Rows | -| | 1 Byte | 1-8 Byte | -|-----------------------------------------------| -| ID (last_insert_id) | Status | Length | -| 1-8 Byte | 2 Byte | 1-8 Byte | -|-----------------------------------------------| -| Messagetext | -| n Byte | -+-----------------------------------------------+ -@end example - -@table @asis -@item Header -@item 1 byte number of rows ? (always 0 ?) -@item 1-8 bytes affected rows -@item 1-8 byte id (last_insert_id) -@item 2 byte Status (usually 0) -@item If the OK-packege includes a message: -@item 1-8 bytes length of message -@item n bytes messagetext -@end table - - -@node error packet, , ok packet, basic packets -@subsection Error Packet - -@example -+-----------------------------------------------+ -| Header | Status code | Error no | -| | 1 Byte | 2 Byte | -|-----------------------------------------------| -| Messagetext | 0x00 | -| n Byte | 1 Byte | -+-----------------------------------------------+ -@end example - -@table @asis -@item Header -@item 1 byte status code (0xFF = ERROR) -@item 2 byte error number (is only sent to new 3.23 clients. -@item n byte errortext -@item 1 byte 0x00 -@end table - - -@node communication, fieldtype codes, basic packets, protocol -@section Communication - -@example -> Packet from server to client -< Paket from client tor server - - Login - ------ - > 1. packet - Header - 1 byte protocolversion - n byte serverversion - 1 byte 0x00 - 4 byte threadnumber - 8 byte crypt seed - 1 byte 0x00 - 2 byte CLIENT_xxx options (see include/mysql_com.h - that is supported by the server - 1 byte number of current server charset - 2 byte server status variables (SERVER_STATUS_xxx flags) - 13 byte 0x00 (not used yet). - - < 2. packet - Header - 2 byte CLIENT_xxx options - 3 byte max_allowed_packet for the client - n byte username - 1 byte 0x00 - 8 byte crypted password - 1 byte 0x00 - n byte databasename - 1 byte 0x00 - - > 3. packet - OK-packet - - - Command - -------- - < 1. packet - Header - 1 byte command type (e.g.0x03 = query) - n byte query - - Result set (after command) - -------------------------- - > 2. packet - Header - 1-8 byte field_count (packed with net_store_length()) - - If field_count == 0 (command): - 1-8 byte affected rows - 1-8 byte insert id - 2 bytes server_status (SERVER_STATUS_xx) - - If field_count == NULL_LENGTH (251) - LOAD DATA LOCAL INFILE - - If field_count > 0 Result Set: - - > n packets - Header Info - Column description: 5 data object /column - (See code in unpack_fields()) - - Columninfo for each column: - 1 data block table_name - 1 byte length of block - n byte data - 1 data block field_name - 1 byte length of block... - n byte data - 1 data block display length of field - 1 byte length of block - 3 bytes display length of filed - 1 data block type field of type (enum_field_types) - 1 byte length of block - 1 bytexs field of type - 1 data block flags - 1 byte length of block - 2 byte flags for the columns (NOT_NULL_FLAG, ZEROFILL_FLAG....) - 1 byte decimals - - if table definition: - 1 data block default value - - Actual result (one packet per row): - 4 byte header - 1-8 byte length of data - n data -@end example - -@node fieldtype codes, protocol functions, communication, protocol -@section Fieldtype Codes - -@example - display_length |enum_field_type |flags - ---------------------------------------------------- -Blob 03 FF FF 00 |01 FC |03 90 00 00 -Mediumblob 03 FF FF FF |01 FC |03 90 00 00 -Tinyblob 03 FF 00 00 |01 FC |03 90 00 00 -Text 03 FF FF 00 |01 FC |03 10 00 00 -Mediumtext 03 FF FF FF |01 FC |03 10 00 00 -Tinytext 03 FF 00 00 |01 FC |03 10 00 00 -Integer 03 0B 00 00 |01 03 |03 03 42 00 -Mediumint 03 09 00 00 |01 09 |03 00 00 00 -Smallint 03 06 00 00 |01 02 |03 00 00 00 -Tinyint 03 04 00 00 |01 01 |03 00 00 00 -Varchar 03 XX 00 00 |01 FD |03 00 00 00 -Enum 03 05 00 00 |01 FE |03 00 01 00 -Datetime 03 13 00 00 |01 0C |03 00 00 00 -Timestamp 03 0E 00 00 |01 07 |03 61 04 00 -Time 03 08 00 00 |01 0B |03 00 00 00 -Date 03 0A 00 00 |01 0A |03 00 00 00 -@end example - -@node protocol functions, protocol version 2, fieldtype codes, protocol -@section Functions used to implement the protocol - -@c This should be merged with the above one and changed to texi format - -@example - -Raw packets ------------ - -- The my_net_xxxx() functions handles the packaging of a stream of data - into a raw packet that contains a packet number, length and data. - -- This is implemented for the server in sql/net_serv.cc. - The client file, libmysql/net.c, is symlinked to this file - -The important functions are: - -my_net_write() Store a packet (= # number of bytes) to be sent -net_flush() Send the packets stored in the buffer -net_write_command() Send a command (1 byte) + packet to the server. -my_net_read() Read a packet - - -Include files -------------- - -- include/mysql.h is included by all MySQL clients. It includes the - MYSQL and MYSQL_RES structures. -- include/mysql_com.h is include by mysql.h and mysql_priv.h (the - server) and includes a lot of common functions and structures to - handle the client/server protocol. - - -Packets from server to client: ------------------------------ - -sql/net_pkg.cc: - - - Sending of error packets - - Sending of OK packets (= end of data) - - Storing of values in a packet - - -sql/sql_base.cc: - - - Function send_fields() sends the field description to the client. - -sql/sql_show.cc: - - - Sends results for a lot of SHOW commands, including: - SHOW DATABASES [like 'wildcard'] - SHOW TABLES [like 'wildcard'] - - -Packets from client to server: ------------------------------- - -This is done in libmysql/libmysql.c - -The important ones are: - -- mysql_real_connect() Connects to a mysqld server -- mysql_real_query() Sends a query to the server and - reads the ok packet or columns header. -- mysql_store_result() Read a result set from the server to memory -- mysql_use_result() Read a result set row by row from the server. - -- net_safe_read() Read a packet from the server with - error handling. -- net_field_length() Reads the length of a packet string. -- simple_command() Sends a command/query to the server. - - - -Connecting to mysqld (the MySQL server) ---------------------------------------- - -- On the client side: libmysql/libmysql.c::mysql_real_connect(). -- On the server side: sql/sql_parse.cc::check_connections() - -The packets sent during a connection are as follows - -Server: Send greeting package (includes server capabilites, server - version and a random string of bytes to be used to scramble - the password. -Client: Sends package with client capabilites, user name, scrambled - password, database name - -Server: Sends ok package or error package. - -Client: If init command specified, send it t the server and read - ok/error package. - - -Password functions ------------------- - -The passwords are scrambled to a random number and are stored in hex -format on the server. - -The password handling is done in sql/password.c. The important -function is 'scramble()', which takes the a password in clear text -and uses this to 'encrypt' the random string sent by the server -to a new message. - -The encrypted message is sent to the server which uses the stored -random number password to encrypt the random string sent to the -client. If this is equal to the new message the client sends to the -server then the password is accepted. -@end example - -@node protocol version 2, 4.1 protocol changes, protocol functions, protocol -@section Another description of the protocol - -@c This should be merged with the above one and changed to texi format. - -@example -***************************** -* -* PROTOCOL OVERVIEW -* -***************************** - -The MySQL protocol is relatively simple, and is designed for high performance -through minimisation of overhead, and extensibility through versioning and -options flags. It is a request-response protocol, and does not allow -multitasking or multiplexing over a single connection. There are two packet -formats, 'raw' and 'compressed' (which is used when both client and -server support zlib compression, and the client requests that data be -compressed): - -* RAW PACKET, shorter than 16 M * - -+-----------------------------------------------+ -| Packet Length | Packet no | Data | -| 3 Bytes | 1 Byte | n Bytes | -+-----------------------------------------------+ -^ ^ -| 'HEADER' | -+-------------------------------+ - - - * Packet Length: Calculated with int3store. See include/global.h for - details. The basic computation is length = byte1 + - (256 * byte2) + (256 * 256 * byte3). The max packetsize - can be 16 MB. - - * Packet no: The packet number is incremented for each sent packet. - The first packet for each query from the client - starts with 0. - - * Data: Specific to the operation being performed. Most often - used to send string data, such as a SQL query. - -* COMPRESSED PACKET * - -+---------------------------------------------------+-----------------+ -| Packet Length | Packet no | Uncomp. Packet Length | Compressed Data | -| 3 Bytes | 1 Byte | 3 Bytes | n bytes | -+---------------------------------------------------+-----------------+ -^ ^ -| 'HEADER' | -+---------------------------------------------------+ - - * Packet Length: Calculated with int3store. See include/my_global.h for - details. The basic computation is length = byte1 + - (256 * byte2) + (256 * 256 * byte3). The max packetsize - can be 16 MB. - - * Packet no: The packet number is incremented for each sent packet. - The first packet starts with 0. - - * Uncomp. Packet Length: The length of the original, uncompressed packet - If this is zero then the data is not compressed. - - * Compressed Data: The original packet, compressed with zlib compression - - -When using the compressed protocol, the client/server will only compress -send packets where the new packet is smaller than the not compressed one. -In other words, some packets may be compressed while others will not. - -The 'compressed data' is one or more packets in *RAW PACKET* format. - -***************************** -* -* FLOW OF EVENTS -* -***************************** - -To understand how a client communicates with a MySQL server, it is easiest -to start with a high-level flow of events. Each event section will then be -followed by details of the exact contents of each type of packet involved -in the event flow. - -* * -* CONNECTION ESTABLISHMENT * -* * - -Clients connect to the server via a TCP/IP socket (port 3306 by default), a -Unix Domain Socket, or named pipes (on Windows). Once connected, the -following connection establishment sequence is followed: - -+--------+ +--------+ -| Client | | Server | -+--------+ +--------+ - | | - | Handshake initialisation, including MySQL server version, | - | protocol version and options supported, as well as the seed | - | for the password hash | - | | - | <-------------------------------------------------------------- | - | | - | Client options supported, max packet size for client | - | username, password crypted with seed from server, database | - | name. | - | | - | --------------------------------------------------------------> | - | | - | 'OK' packet if authentication succeeds, 'ERROR' packet if | - | authentication fails. | - | | - | <-------------------------------------------------------------- | - | | - - - -* HANDSHAKE INITIALISATION PACKET * - - -+--------------------------------------------------------------------+ -| Header | Prot. Version | Server Version String | 0x00 | -| | 1 Byte | n bytes | 1 byte | -|--------------------------------------------------------------------| -| Thread Number | Crypt Seed | 0x00 | CLIENT_xxx options | -| | | | supported by server | -| 4 Bytes | 8 Bytes | 1 Byte | 2 Bytes | -|--------------------------------------------------------------------| -| Server charset no. | Server status variables | 0x00 padding | -| 1 Byte | 2 Bytes | 13 bytes | -+--------------------------------------------------------------------+ - - * Protocol version (currently '10') - * Server Version String (e.g. '4.0.5-beta-log'). Can be any length as - it's followed by a 0 byte. - * Thread Number - ID of server thread handling this connection - * Crypt seed - seed used to crypt password in auth packet from client - * CLIENT_xxx options - see include/mysql_com.h - * Server charset no. - Index of charset in use by server - * Server status variables - see include/mysql_com.h - * The padding bytes are reserverd for future extensions to the protocol - -* CLIENT AUTH PACKET * - - -+--------------------------------------------------------------------+ -| Header | CLIENT_xxx options supported | max_allowed_packet | -| | by client | for client | -| | 2 Bytes | 3 bytes | -|--------------------------------------------------------------------| -| User Name | 0x00 | Crypted Password | 0x00 | Database Name | -| n Bytes | 1 Byte | 8 Bytes | 1 Byte | n Bytes | -|--------------------------------------------------------------------| -| 0x00 | -| 1 Byte | -+--------------------------------------------------------------------+ - - * CLIENT_xxx options that this client supports: - -#define CLIENT_LONG_PASSWORD 1 /* new more secure passwords */ -#define CLIENT_FOUND_ROWS 2 /* Found instead of affected rows */ -#define CLIENT_LONG_FLAG 4 /* Get all column flags */ -#define CLIENT_CONNECT_WITH_DB 8 /* One can specify db on connect */ -#define CLIENT_NO_SCHEMA 16 /* Don't allow database.table.column */ -#define CLIENT_COMPRESS 32 /* Can use compression protocol */ -#define CLIENT_ODBC 64 /* Odbc client */ -#define CLIENT_LOCAL_FILES 128 /* Can use LOAD DATA LOCAL */ -#define CLIENT_IGNORE_SPACE 256 /* Ignore spaces before '(' */ -#define CLIENT_INTERACTIVE 1024 /* This is an interactive client */ -#define CLIENT_SSL 2048 /* Switch to SSL after handshake */ -#define CLIENT_IGNORE_SIGPIPE 4096 /* IGNORE sigpipes */ -#define CLIENT_TRANSACTIONS 8192 /* Client knows about transactions */ - - * max_allowed_packet for the client (in 'int3store' form) - * User Name - user to authenticate as. Is followed by a null byte. - * Crypted Password - password crypted with seed given in packet from - server, see scramble() in sql/password.c - * Database name (optional) - initial database to use once connected - Is followed by a null byte - -At the end of every client/server exchange there is either an 'OK' packet -or an 'ERROR' packet sent from the server. To determine whether a packet is -an 'OK' packet, or an 'ERROR' packet, check if the first byte (after the -header) is 0xFF. If it has the value of 0xFF, the packet is an 'ERROR' -packet. - - -* OK PACKET * - -For details, see sql/net_pkg.cc::send_ok() - -+-----------------------------------------------+ -| Header | No of Rows | Affected Rows | -| | 1 Byte | 1-9 Byte | -|-----------------------------------------------| -| ID (last_insert_id) | Status | Length | -| 1-9 Byte | 2 Byte | 1-9 Byte | -|-----------------------------------------------| -| Messagetext | -| n Byte | -+-----------------------------------------------+ - - * Number of rows, always 0 - * Affected rows - * ID (last_insert_id) - value for auto_increment column (if any) - * Status (usually 0) - -In general, in the MySQL protocol, fields in a packet that that -represent numeric data, such as lengths, that are labeled as '1-9' -bytes can be decoded by the following logic: - - If the first byte is '251', the - corresponding column value is NULL (only appropriate in - 'ROW DATA' packets). - - If the first byte is '252', the value stored can be read - from the following 2 bytes as a 16-bit integer. - - - If the first byte is '253' the value stored can be read - from the following 4 bytes as a 32-bit long integer - - - If the first byte is '254', the value stored can be read - from the following 8 bytes as a 64-byte long - - Otherwise (values 0-250), the value stored is the value of the - first byte itself. - - -If the OK-packet includes a message: - - * Length of message - * Message Text - - -* ERROR PACKET * - -+-----------------------------------------------+ -| Header | Status code | Error no | -| | 1 Byte | 2 Byte | -|-----------------------------------------------| -| Messagetext | | -| n Byte | | -+-----------------------------------------------+ - - * Status code (0xFF = ERROR) - * Error number (is only sent to 3.23 and newer clients) - * Error message text (ends at end of packet) - -Note that the error message is not null terminated. -The client code can however assume that the packet ends with a null -as my_net_read() will always add an end-null to all read packets to -make things easier for the client. - -Example: - -Packet dump of client connecting to server: - -+------------------------- Protocol Version (10) -| -| +---------------------- Server Version String (0x00 terminated) -| | -| | -0a 34 2e 30 2e 35 2d 62 . 4 . 0 . 5 - b -65 74 61 2d 6c 6f 67 00 e t a - l o g . -15 00 00 00 2b 5a 65 6c . . . . + Z e l - | | - | +------------ First 4 bytes of crypt seed - | - +------------------------ Thread Number - -+------------------------- Last 4 bytes of crypt seed -| -| +-------- CLIENT_XXX Options supported by server -| | -| +-+--+ +--- Server charset index -| | | | -6f 69 41 46 00 2c 28 08 o i A F . , ( . -02 00 00 00 00 00 00 00 . . . . . . . . -| | -| +---------------------- 0x00 padding begins -| -+------------------------- Server status (0x02 = - SERVER_STATUS_AUTOCOMMIT) - -00 00 00 00 00 00 00 00 . . . . . . . . - -* Client Authentication Response (Username 'test', no database - selected) * - - +--------------------- Packet Length (0x13 = 19 bytes) - | - | +--------------- Packet Sequence # - | | - | | +----------- CLIENT_XXX Options supported by client - | | -+---+---+ | +-+-+ -| | | | | -13 00 00 01 03 00 1e 00 . . . . . . . . -00 74 65 73 74 00 48 5e . t e s t . H ^ - | | | - +----+-----+ +------- Scrambled password, 0x00 terminated - | - +----------------- Username, 0x00 terminated - -57 4a 4e 41 4a 4e 00 00 W J N A J N . . -00 . - - ->From this point on, the server waits for 'commands' from the client -which include queries, database shutdown, quit, change user, etc (see -the COM_xxxx values in include/mysql_com.h for the latest -command codes). - -* * -* COMMAND PROCESSING * -* * - -+--------+ +--------+ -| Client | | Server | -+--------+ +--------+ - | | - | A command packet, with a command code, and string data | - | when appropriate (e.g. a query), (see the COM_xxxx values | - | in include/mysql_com.h for the command codes) | - | | - | --------------------------------------------------------------> | - | | - | A 'RESULT' packet if the command completed successfully, | - | an 'ERROR' packet if the command failed. 'RESULT' packets | - | take different forms (see the details following this chart) | - | depending on whether or not the command returns rows. | - | | - | <-------------------------------------------------------------- | - | | - | n 'FIELD PACKET's (if rows are returned) | - | | - | <-------------------------------------------------------------- | - | | - | 'LAST DATA' packet | - | | - | <-------------------------------------------------------------- | - | | - | n 'ROW PACKET's (if rows are returned) | - | | - | <-------------------------------------------------------------- | - | | - | 'LAST DATA' packet | - | | - | <-------------------------------------------------------------- | - | | - - -* Command Packet * - -+------------------------------------------------------+ -| Header | Command type | Query (if applicable) | -| | 1 Byte | n Bytes | -+------------------------------------------------------+ - - * Command type: (e.g.0x03 = query, see the COM_xxxx values in - include/mysql_com.h) - * Query (if applicable) - -Note that my_net_read() null-terminates all packets on the -receiving side of the channel to make it easier for the code -examining the packets. - -The current command codes are: - - 0x00 COM_SLEEP - 0x01 COM_QUIT - 0x02 COM_INIT_DB - 0x03 COM_QUERY - 0x04 COM_FIELD_LIST - 0x05 COM_CREATE_DB - 0x06 COM_DROP_DB - 0x07 COM_REFRESH - 0x08 COM_SHUTDOWN - 0x09 COM_STATISTICS - 0x0a COM_PROCESS_INFO - 0x0b COM_CONNECT - 0x0c COM_PROCESS_KILL - 0x0d COM_DEBUG - 0x0e COM_PING - 0x0f COM_TIME - 0x10 COM_DELAYED_INSERT - 0x11 COM_CHANGE_USER - 0x12 COM_BINLOG_DUMP - 0x13 COM_TABLE_DUMP - 0x14 COM_CONNECT_OUT - 0x15 COM_REGISTER_SLAVE - -* Result Packet * - -Result packet for a command returning _no_ rows: - -+-----------------------------------------------+ -| Header | Field Count | Affected Rows | -| | 1-9 Bytes | 1-9 Bytes | -|-----------------------------------------------| -| ID (last_insert_id) | Server Status | -| 1-9 Bytes | 2 Bytes | -+-----------------------------------------------+ - - * Field Count: Has value of '0' for commands returning _no_ rows - * Affected rows: Count of rows affected by INSERT/UPDATE/DELETE, etc. - * ID: value of auto_increment column in row (if any). 0 if - * Server Status: Usually 0 - -Result packet for a command returning rows: - -+-------------------------------+ -| Header | Field Count | -| | 1-9 Bytes | -+-------------------------------+ - - * Field Count: number of columns/fields in result set, - (packed with net_store_length() in sql/net_pkg.cc) - -This is followed by as many packets as the number of fields ('Field Count') -that contain the metadata for each column/field (see unpack_fields() in -libmysql/libmysql.c): - - -* FIELD PACKET * - -+-----------------------------------------------+ -| Header | Table Name | -| | length-coded-string | -|-----------------------------------------------| -| Field Name | -| length-code-string | -|-----------------------------------------------| -| Display length of field -| length-coded-binary (4 bytes) | -|-----------------------------------------------| -| Field Type (enum_field_types in mysql_com.h) | -| length-coded-binary (2 bytes) | -|-----------------------------------------------| -| Field Flags | Decimal Places| -| length-coded-binary (3 bytes) | 1 Byte | -+--------------+-------------+------------------+ - - * A length coded string is a string where we first have a packet - length (1-9 bytes, packed_with net_store_length()) followed - by a string. - * A length coded binary is a length (1 byte) followed by an integer - value in low-byte-first order. For the moment this type is always - fixed length in this packet. - - * Table Name - the name of the table the column comes from - * Field Name - the name of the column/field - * Display length of field - length of field - * Field Type - Type of field, see enum_field_types in - include/mysql_com.h - - Current field types are: - - 0x00 FIELD_TYPE_DECIMAL - 0x01 FIELD_TYPE_TINY - 0x02 FIELD_TYPE_SHORT - 0x03 FIELD_TYPE_LONG - 0x04 FIELD_TYPE_FLOAT - 0x05 FIELD_TYPE_DOUBLE - 0x06 FIELD_TYPE_NULL - 0x07 FIELD_TYPE_TIMESTAMP - 0x08 FIELD_TYPE_LONGLONG - 0x09 FIELD_TYPE_INT24 - 0x0a FIELD_TYPE_DATE - 0x0b FIELD_TYPE_TIME - 0x0c FIELD_TYPE_DATETIME - 0x0d FIELD_TYPE_YEAR - 0x0e FIELD_TYPE_NEWDATE - 0xf7 FIELD_TYPE_ENUM - 0xf8 FIELD_TYPE_SET - 0xf9 FIELD_TYPE_TINY_BLOB - 0xfa FIELD_TYPE_MEDIUM_BLOB - 0xfb FIELD_TYPE_LONG_BLOB - 0xfc FIELD_TYPE_BLOB - 0xfd FIELD_TYPE_VAR_STRING - 0xfe FIELD_TYPE_STRING - 0xff FIELD_TYPE_GEOMETRY - - * Field Flags - NOT_NULL_FLAG, PRI_KEY_FLAG, xxx_FLAG in - include/mysql_com.h - - -Note that the packet format in 4.1 has slightly changed to allow more values. - - -* ROW PACKET * - -+-----------------------------------------------+ -| Header | Data Length | Column Data | ....for each column -| | 1-9 Bytes | n Bytes | -+-----------------------------------------------+ - - * Data Length: (packed with net_store_length() in sql/net_pkg.cc) - - If 'Data Length' == 0, this is an 'ERROR PACKET'. - - * Column Data: String representation of data. MySQL always sends result set - data as strings. - -* LAST DATA PACKET * - -Packet length is < 9 bytes, and first byte is 0xFE - -+--------+ -| 0xFE | -| 1 Byte | -+--------+ - -Examples: - -*********** -* -* INITDB Command -* -*********** - -A client issuing an 'INITDB' (select the database to use) command, -followed by an 'OK' packet with no rows and no affected rows from -the server: - -* INITDB (select database to use) 'COMMAND' Packet * - - +--------------------- Packet Length (5 bytes) - | - | +--------------- Packet Sequence # - | | - | | +------------ Command # (INITDB = 0x02) - | | -+---+---+ | | +---------- Beginning of query data -| | | | | -05 00 00 00 02 74 65 73 . . . . . t e s -74 t - -* 'OK' Packet with no rows, and no rows affected * - - +--------------------- Packet Length (3 bytes) - | - | +--------------- Packet Sequence # - | | -+---+---+ | -| | | -03 00 00 01 00 00 00 . . . . . . . - - -*********** -* -* SELECT query example -* -*********** - -Client issuing a 'SELECT *' query on the following table: - - CREATE TABLE number_test (minBigInt bigint, - maxBigInt bigint, - testBigInt bigint) - -* 'COMMAND' Packet with QUERY (select ...) * - - +--------------------- Packet Length (26) - | - | +--------------- Packet Sequence # - | | - | | +------------ Command # (QUERY = 0x03) - | | -+---+---+ | | +---------- Beginning of query data -| | | | | -1a 00 00 00 03 53 45 4c . . . . . S E L -45 43 54 20 2a 20 66 72 E C T . * . f r -6f 6d 20 6e 75 6d 62 65 o m . n u m b e -72 5f 74 65 73 74 r _ t e s t - - -and receiving an 'OK' packet with a 'FIELD COUNT' of 3 - - -* 'OK' Packet with 3 fields * - - +--------------------- Packet Length (3 bytes) - | - | +--------------- Packet Sequence # - | | -+---+---+ | -| | | -01 00 00 01 03 . . . . . - -Followed immediately by 3 'FIELD' Packets. Note, the individual packets -are delimitted by =======, so that all fields can be annotated in the first -'FIELD' packet example: - -============================================================= - - +--------------------- Packet Length (0x1f = 31 bytes) - | - | +--------------- Packet Sequence # - | | - | | +------------ Block Length (0x0b = 11 bytes) - | | | -+---+---+ | | +--------- Table Name (11 bytes long) -| | | | | -1f 00 00 02 0b 6e 75 6d . . . . . n u m -62 65 72 5f 74 65 73 74 b e r _ t e s t - - +------------------------ Block Length (9 bytes) - | - | +--------------------- Column Name (9 bytes long) - | | -09 6d 69 6e 42 69 67 49 . m i n B i g I -6e 74 03 14 00 00 01 08 n t . . . . . . - | | | | | - | +---+---+ | +--- Field Type (0x08 = FIELD_TYPE_LONGLONG) - | | | - | | +------ Block Length (1) - | | - | +--------------- Display Length (0x14 = 20 chars) - | - +------------------ Block Length (3) - - +------------------------ Block Length (2) - | - | +-------------------- Field Flags (0 - no flags set) - | | - | +---+ +--------------- Decimal Places (0) - | | | | -02 00 00 00 . . . . - -============================================================= - -'FIELD' packet for the 'number_Test.maxBigInt' column - -1f 00 00 03 0b 6e 75 6d . . . . . n u m -62 65 72 5f 74 65 73 74 b e r _ t e s t -09 6d 61 78 42 69 67 49 . m a x B i g I -6e 74 03 14 00 00 01 08 n t . . . . . . -02 00 00 00 . . . . - -============================================================= - -'FIELD' packet for the 'number_test.testBigInt' column - -20 00 00 04 0b 6e 75 6d . . . . . n u m -62 65 72 5f 74 65 73 74 b e r _ t e s t -0a 74 65 73 74 42 69 67 . t e st B i g -49 6e 74 03 14 00 00 01 I n t . . . . . -08 02 00 00 00 . . . . . -============================================================= - -Followed immediately by one 'LAST DATA' packet: - -fe 00 . . - -Followed immediately by 'n' row packets (in this case, only -one packet is sent from the server, for simplicity's sake): - - - +--------------------- Packet Length (0x52 = 82 bytes) - | - | +--------------- Packet Sequence # - | | - | | +------------ Data Length (0x14 = 20 bytes) - | | | -+---+---+ | | +--------- String Data '-9223372036854775808' -| | | | | (repeat Data Length/Data sequence) - -52 00 00 06 14 2d 39 32 . . . . . - 9 2 -32 33 33 37 32 30 33 36 2 3 3 7 2 0 3 6 -38 35 34 37 37 35 38 30 8 5 4 7 7 5 8 0 -38 13 39 32 32 33 33 37 8 . 9 2 2 3 3 7 -32 30 33 36 38 35 34 37 2 0 3 6 8 5 4 7 -37 35 38 30 37 0a 36 31 7 5 8 0 7 . 6 1 -34 37 34 38 33 36 34 37 4 7 4 8 3 6 4 7 - -Followed immediately by one 'LAST DATA' packet: - -fe 00 . . -@end example - - -@c The Index was empty, and ugly, so I removed it. (jcole, Sep 7, 2000) - -@c @node Index -@c @unnumbered Index - -@c @printindex fn - -@c @node 4.1 protocol,,, -@c @chapter MySQL 4.1 protocol - -@node 4.1 protocol changes, 4.1 field packet, protocol version 2, protocol -@section Changes to 4.0 protocol in 4.1 - -All basic packet handling is identical to 4.0. When communication -with an old 4.0 or 3.x client we will use the old protocol. - -The new things that we support with 4.1 are: - -@itemize @bullet -@item -Warnings -@item -Prepared statements -@item -Binary protocol (will be faster than the current protocol that -converts everything to strings) -@end itemize - - -What has changed in 4.1 are: - -@itemize @bullet -@item -A lot of new field information (database, real table name etc) -@item -The 'ok' packet has more status fields -@item -The 'end' packet (send last for each result set) now contains some -extra information -@item -New protocol for prepared statements. In this case all parameters and -results will sent as binary (low-byte-first). -@end itemize - - -@node 4.1 field packet, 4.1 field desc, 4.1 protocol changes, protocol -@section 4.1 field description packet - -The field description packet is sent as a response to a query that -contains a result set. It can be distinguished from a ok packet by -the fact that the first byte can't be 0 for a field packet. -@xref{4.1 ok packet}. - -The header packet has the following structure: - -@multitable @columnfractions .10 .90 -@item Size @tab Comment -@item 1-9 @tab Number of columns in result set (never 0) -@item 1-9 @tab Extra information sent be some command (SHOW COLUMNS -uses this to send the number of rows in the table) -@end multitable - -This packet is always followed by a field description set. -@xref{4.1 field desc}. - -@node 4.1 field desc, 4.1 ok packet, 4.1 field packet, protocol -@section 4.1 field description result set - -The field description result set contains the meta info for a result set. - -@multitable @columnfractions .20 .80 -@item Type @tab Comment -@item string @tab Database name -@item string @tab Table name alias (or table name if no alias) -@item string @tab Real table name -@item string @tab Alias for column name (or column name if not used) -@item 3 byte int @tab Length of column definition -@item 1 byte int @tab Enum value for field type -@item 3 byte int @tab 2 byte column flags (NOT_NULL_FLAG etc..) + 1 byte number of decimals. -@item string int @tab Default value, only set when using mysql_list_fields(). -@end multitable - - -@node 4.1 ok packet, 4.1 end packet, 4.1 field desc, protocol -@section 4.1 ok packet - -The ok packet is the first that is sent as an response for a query -that didn't return a result set. - -The ok packet has the following structure: - -@multitable @columnfractions .10 .90 -@item Size @tab Comment -@item 1 @tab 0 ; Marker for ok packet -@item 1-9 @tab Affected rows -@item 1-9 @tab Last insert id (0 if one wasn't used) -@item 2 @tab Server status; Can be used by client to check if we are inside an transaction -@item 2 @tab Warning count -@item 1-9 @tab Message length (optional) -@item xxx @tab Message (optional) -@end multitable - -Size 1-9 means that the parameter is packed in to 1-9 bytes depending on -the value. (See function sql/net_pkg.cc::net_store_length). - -The message is optional. For example for multi line INSERT it -contains a string for how many rows was inserted / deleted. - - -@node 4.1 end packet, 4.1 error packet, 4.1 ok packet, protocol -@section 4.1 end packet - -The end packet is sent as the last packet for - -@itemize @bullet -@item -End of field information -@item -End of parameter type information -@item -End of result set -@end itemize - -The end packet has the following structure: - -@multitable @columnfractions .10 .90 -@item Size @tab Comment -@item 1 @tab 254 ; Marker for EOF packet -@item 2 @tab Warning count -@item 2 @tab Status flags (For flags like SERVER_STATUS_MORE_RESULTS) -@end multitable - -Note that a normal packet may start with byte 254, which means -'length stored in 9 bytes'. One can different between these cases -by checking the packet length < 9 bytes (in which case it's and end -packet). - - -@node 4.1 error packet, 4.1 prep init, 4.1 end packet, protocol -@section 4.1 error packet. - -The error packet is sent when something goes wrong. -The error packet has the following structure: - -@multitable @columnfractions .10 .90 -@item Size @tab Comment -@item 1 @tab 255 Error packet marker -@item 2 @tab Error code -@item 1-255 @tab Null terminated error message -@end multitable - -The client/server protocol is designed in such a way that a packet -can only start with 255 if it's an error packet. - - -@node 4.1 prep init, 4.1 long data, 4.1 error packet, protocol -@section 4.1 prepared statement init packet - -This is the return packet when one sends a query with the COM_PREPARE -command. - -@multitable @columnfractions .10 .90 -@item Size @tab Comment -@item 4 @tab Statement handler id -@item 2 @tab Number of columns in result set -@item 2 @tab Number of parameters in query -@end multitable - -After this, there is a packet that contains the following for each -parameter in the query: - -@multitable @columnfractions .10 .90 -@item Size @tab Comment -@item 2 @tab Enum value for field type. (MYSQL_TYPE_UNKNOWN if not known) -@item 2 @tab 2 byte column flags (NOT_NULL_FLAG etc) -@item 1 @tab Number of decimals -@item 4 @tab Max column length. -@end multitable - -Note that the above is not yet in 4.1 but will be added this month. - -As MySQL can have a parameter 'anywhere' it will in many cases not be -able to provide the optimal information for all parameters. - -If number of columns, in the header packet, is not 0 then the -prepared statement will contain a result set. In this case the packet -is followed by a field description result set. @xref{4.1 field desc}. - - -@node 4.1 long data, 4.1 execute, 4.1 prep init, protocol -@section 4.1 long data handling - -This is used by mysql_send_long_data() to set any parameter to a string -value. One can call mysql_send_long_data() multiple times for the -same parameter; The server will concatenate the results to a one big -string. - -The server will not require an end packet for the string. -mysql_send_long_data() is responsible updating a flag that all data -has been sent. (Ie; That the last call to mysql_send_long_data() has -the 'last_data' flag set). - -This packet is sent from client -> server: - -@multitable @columnfractions .10 .90 -@item Size @tab Comment -@item 4 @tab Statement handler -@item 2 @tab Parameter number -@item 2 @tab Type of parameter (not used at this point) -@item # @tab data (Rest of packet) -@end multitable - -The server will NOT send an @code{ok} or @code{error} packet in -responce for this. If there is any errors (like to big string), one -will get the error when calling execute. - -@node 4.1 execute, 4.1 binary result, 4.1 long data, protocol -@section 4.1 execute - -On execute we send all parameters to the server in a COM_EXECUTE -packet. - -The packet contains the following information: - -@multitable @columnfractions .30 .70 -@item Size @tab Comment -@item (param_count+9)/8 @tab Null bit map (2 bits reserved for protocol) -@item 1 @tab new_parameter_bound flag. Is set to 1 for first -execute or if one has rebound the parameters. -@item 2*param_count @tab Type of parameters (only given if new_parameter_bound flag is 1) -@item # @tab Parameter data, repeated for each parameter that are -NOT NULL and not used with mysql_send_long_data(). -@end multitable - -The null-bit-map is for all parameters (including parameters sent with -'mysql_send_long_data). If parameter 0 is NULL, then bit 0 in the -null-bit-map should be 1 (ie: first byte should be 1) - -The parameters are stored the following ways: - -@multitable @columnfractions .20 .10 .70 -@item Type @tab Size @tab Comment -@item tinyint @tab 1 @tab One byte integer -@item short @tab 2 @tab -@item int @tab 4 @tab -@item longlong @tab 8 @tab -@item float @tab 4 @tab -@item double @tab 8 @tab -@item string @tab 1-9 + # @tab Packed string length + string -@end multitable - -The result for this will be either an ok packet or a binary result -set. - -@node 4.1 binary result, , 4.1 execute, protocol -@section 4.1 binary result set - -A binary result are sent the following way. - -For each result row: - -@itemize -@item -null bit map with first two bits set to 01 (bit 0,1 value 1) -@item -parameter data, repeated for each not null result column. -@end itemize - -The idea with the reserving two bits in the null map is that we can -use standard error (first byte 255) and ok packets (first byte 0) -to end a result sets. - -Except that the null-bit-map is shifted two steps, the server is -sending the data to the client the same way that the server is sending -bound parameters to the client. The server is always sending the data -as type given for 'column type' for respective column. It's up to the -client to convert the parameter to the requested type. - -DATETIME, DATE and TIME are sent to the server in a binary format as follows: - -@multitable @columnfractions .20 .10 .70 -@item Type @tab Size @tab Comment -@item date @tab 1 + 0-11 @tab Length + 2 byte year, 1 byte MMDDHHMMSS, 4 byte billionth of a second -@item datetime @tab 1 + 0-11 @tab Length + 2 byte year, 1 byte MMDDHHMMSS, 4 byte billionth of a second -@item time @tab 1 + 0-14 @tab Length + sign (0 = pos, 1= neg), 4 byte days, 1 byte HHMMDD, 4 byte billionth of a second -@end multitable - -The first byte is a length byte and then comes all parameters that are -not 0. (Always counted from the beginning). - -@node Fulltext Search, MyISAM Record Structure, protocol, Top -@chapter Fulltext Search in MySQL - -Hopefully, sometime there will be complete description of -fulltext search algorithms. -Now it's just unsorted notes. - -@menu -* Weighting in boolean mode:: -@end menu - -@node Weighting in boolean mode, , Fulltext Search, Fulltext Search -@section Weighting in boolean mode - -The basic idea is as follows: in expression -@code{A or B or (C and D and E)}, either @code{A} or @code{B} alone -is enough to match the whole expression. While @code{C}, -@code{D}, and @code{E} should @strong{all} match. So it's -reasonable to assign weight 1 to @code{A}, @code{B}, and -@code{(C and D and E)}. And @code{C}, @code{D}, and @code{E} -should get a weight of 1/3. - -Things become more complicated when considering boolean -operators, as used in MySQL FTB. Obvioulsy, @code{+A +B} -should be treated as @code{A and B}, and @code{A B} - -as @code{A or B}. The problem is, that @code{+A B} can @strong{not} -be rewritten in and/or terms (that's the reason why this - extended - -set of operators was chosen). Still, aproximations can be used. -@code{+A B C} can be approximated as @code{A or (A and (B or C))} -or as @code{A or (A and B) or (A and C) or (A and B and C)}. -Applying the above logic (and omitting mathematical -transformations and normalization) one gets that for -@code{+A_1 +A_2 ... +A_N B_1 B_2 ... B_M} the weights -should be: @code{A_i = 1/N}, @code{B_j=1} if @code{N==0}, and, -otherwise, in the first rewritting approach @code{B_j = 1/3}, -and in the second one - @code{B_j = (1+(M-1)*2^M)/(M*(2^(M+1)-1))}. - -The second expression gives somewhat steeper increase in total -weight as number of matched B's increases, because it assigns -higher weights to individual B's. Also the first expression in -much simplier. So it is the first one, that is implemented in MySQL. - - -@node MyISAM Record Structure, InnoDB Record Structure, Fulltext Search, Top -@chapter MyISAM Record Structure - -@section Introduction - -When you say: -@* - -@strong{CREATE TABLE Table1 ...} -@* - -MySQL creates files named Table1.MYD ("MySQL Data"), Table1.MYI -("MySQL Index"), and Table1.frm ("Format"). These files will be in the -directory: @* -/<datadir>/<database>/ -@* - -For example, if you use Linux, you might find the files here (assume -your database name is "test"): @* -/usr/local/var/test -@* - -And if you use Windows, you might find the files in this directory: @* -\mysql\data\test\ -@*@* - -Let's look at the .MYD Data file (MyISAM SQL Data file) more closely. -There are three possible formats -- fixed, dynamic, and packed. First, -let's discuss the fixed format. - - -@table @strong -@item Page Size -Unlike most DBMSs, MySQL doesn't store on disk using pages. Therefore -you will not see filler space between rows. (Reminder: This does not -refer to BDB and InnoDB tables, which do use pages). -@* - -@item Record Header -The minimal record header is a set of flags: -@itemize @bullet -@item -"X bit" = 0 if row is deleted, = 1 if row is not deleted -@item -"Null Bits" = 0 if column is not NULL, = 1 if column is NULL -@item -"Filler Bits" = 1 -@end itemize -@end table -@* - -The length of the record header is thus:@* -(1 + number of NULL columns + 7) / 8 bytes@* -After the header, all columns are stored in -the order that they were created, which is the -same order that you would get from SHOW COLUMNS. - -Here's an example. Suppose you say: -@* - -@strong{CREATE TABLE Table1 (column1 CHAR(1), column2 CHAR(1), column3 CHAR(1))} -@* - -@strong{INSERT INTO Table1 VALUES ('a', 'b', 'c')} -@* - -@strong{INSERT INTO Table1 VALUES ('d', NULL, 'e')} -@* - -A CHAR(1) column takes precisely one byte (plus one bit of overhead -that is assigned to every column -- I'll describe the details of -column storage later). So the file Table1.MYD looks like this: -@* - -@strong{Hexadecimal Display of Table1.MYD file}@* -@code{ -F1 61 62 63 00 F5 64 00 66 00 ... .abc..d e. -} -@* - -Here's how to read this hexadecimal-dump display:@* -@itemize @bullet -@item -The hexadecimal numbers @code{F1 61 62 63 00 F5 64 20 66 00} are byte -values and the column on the right is an attempt to show the -same bytes in ASCII. -@item -The @code{F1} byte means that there are no null fields in the first row. -@item -The @code{F5} byte means that the second column of the second row is NULL. -@end itemize - -(It's probably easier to understand the flag setting if you restate -@code{F5} as @code{11110101 binary}, and (a) notice that the third flag bit from the -right is @code{on}, and (b) remember that the first flag bit is the X bit.) -@* - -There are complications -- the record header is more complex if there -are variable-length fields -- but the simple display shown in the -example is exactly what you'd see if you looked at the MySQL Data file -with a debugger or a hexadecimal file dumper. -@* - -So much for the fixed format. Now, let's discuss the dynamic format. -@* - -The dynamic file format is necessary if rows can vary in size. That will -be the case if there are BLOB columns, or "true" VARCHAR columns. (Remember -that MySQL may treat VARCHAR columns as if they're CHAR columns, in which -case the fixed format is used.) A dynamic row has more fields in the header. -The important ones are "the actual length", "the unused length", and "the -overflow pointer". The actual length is the total number of bytes in all the -columns. The unused length is the total number of bytes between one physical -record and the next one. The overflow pointer is the location of the rest of -the record if there are multiple parts. -@* - -For example, here is a dynamic row: -@* -@example -03, 00 start of header -04 actual length -0c unused length -01, fc flags + overflow pointer -**** data in the row -************ unused bytes - <-- next row starts here) -@end example - -In the example, the actual length and the unused length -are short (one byte each) because the table definition -says that the columns are short -- if the columns were -potentially large, then the actual length and the unused -length could be two bytes each, three bytes each, and so -on. In this case, actual length plus unused length is 10 -hexadecimal (sixteen decimal), which is a minimum. - -As for the third format -- packed -- we will only say -briefly that: -@itemize @bullet -@item -Numeric values are stored in a form that depends on the -range (start/end values) for the data type. -@item -All columns are packed using either Huffman or enum coding. -@end itemize - -For details, see the source files /myisam/mi_statrec.c -(for fixed format), /myisam/mi_dynrec.c (for dynamic -format), and /myisam/mi_packrec.c (for packed format). - -Note: Internally, MySQL uses a format much like the fixed format -which it uses for disk storage. The main differences are: -@enumerate -@item -BLOBs have a length and a memory pointer rather than being stored inline. -@item -"True VARCHAR" (a column storage which will be fully implemented in -version 5.0) will have a 16-bit length plus the data. -@item -All integer or floating-point numbers are stored with the low byte first. -Point (3) does not apply for ISAM storage or internals. -@end enumerate -@* - - -@section Physical Attributes of Columns - -Next I'll describe the physical attributes of each column in a row. -The format depends entirely on the data type and the size of the -column, so, for every data type, I'll give a description and an example. -@* - -@table @strong -@item The character data types - -@strong{CHAR} -@itemize @bullet -@item -Storage: fixed-length string with space padding on the right. -@item -Example: a CHAR(5) column containing the value 'A' looks like:@* -@code{hexadecimal 41 20 20 20 20} -- (length = 5, value = @code{'A '}) -@end itemize - -@strong{VARCHAR} -@itemize @bullet -@item -Storage: variable-length string with a preceding length. -@item -Example: a VARCHAR(7) column containing 'A' looks like:@* -@code{hexadecimal 01 41} -- (length = 1, value = @code{'A'}) -@end itemize - -@item The numeric data types - -Important: MySQL almost always stores multi-byte binary numbers with -the low byte first. This is called "little-endian" numeric storage; -it's normal on Intel x86 machines; MySQL uses it even for non-Intel -machines so that databases will be portable. -@* - -@strong{TINYINT} -@itemize @bullet -@item -Storage: fixed-length binary, always one byte. -@item -Example: a TINYINT column containing 65 looks like:@* -@code{hexadecimal 41} -- (length = 1, value = 65) -@end itemize - -@strong{SMALLINT} -@itemize @bullet -@item -Storage: fixed-length binary, always two bytes. -@item -Example: a SMALLINT column containing 65 looks like:@* -@code{hexadecimal 41 00} -- (length = 2, value = 65) -@end itemize - -@strong{MEDIUMINT} -@itemize @bullet -@item -Storage: fixed-length binary, always three bytes. -@item -Example: a MEDIUMINT column containing 65 looks like:@* -@code{hexadecimal 41 00 00} -- (length = 3, value = 65) -@end itemize - -@strong{INT} -@itemize @bullet -@item -Storage: fixed-length binary, always four bytes. -@item -Example: an INT column containing 65 looks like:@* -@code{hexadecimal 41 00 00 00} -- (length = 4, value = 65) -@end itemize - -@strong{BIGINT} -@itemize @bullet -@item -Storage: fixed-length binary, always eight bytes. -@item -Example: a BIGINT column containing 65 looks like:@* -@code{hexadecimal 41 00 00 00 00 00 00 00} -- (length = 8, value = 65) -@end itemize - -@strong{FLOAT} -@itemize @bullet -@item -Storage: fixed-length binary, always four bytes. -@item -Example: a FLOAT column containing approximately 65 looks like:@* -@code{hexadecimal 00 00 82 42} -- (length = 4, value = 65) -@end itemize - -@strong{DOUBLE PRECISION} -@itemize @bullet -@item -Storage: fixed-length binary, always eight bytes. -@item -Example: a DOUBLE PRECISION column containing approximately 65 looks like:@* -@code{hexadecimal 00 00 00 00 00 40 50 40} -- (length = 8, value = 65) -@end itemize - -@strong{REAL} -@itemize @bullet -@item -Storage: same as FLOAT, or same as DOUBLE PRECISION, depending on setting of the --ansi switch. -@end itemize - -@strong{DECIMAL} -@itemize @bullet -@item -Storage: fixed-length string, with a leading byte for the sign, if any. -@item -Example: a DECIMAL(2) column containing 65 looks like:@* -@code{hexadecimal 20 36 35} -- (length = 3, value = @code{' 65'}) -@item -Example: a DECIMAL(2) UNSIGNED column containing 65 looks like:@* -@code{hexadecimal 36 35} -- (length = 2, value = @code{'65'}) -@item -Example: a DECIMAL(4,2) UNSIGNED column containing 65 looks like:@* -@code{hexadecimal 36 35 2E 30 30} -- (length = 5, value = @code{'65.00'}) -@end itemize - -@strong{NUMERIC} -@itemize @bullet -@item -Storage: same as DECIMAL. -@end itemize - -@strong{BOOL} -@itemize @bullet -@item -Storage: same as TINYINT. -@end itemize - -@item The temporal data types - -@strong{DATE} -@itemize @bullet -@item -Storage: 3 byte integer, low byte first. -Packed as: 'day + month*32 + year*16*32' -@item -Example: a DATE column containing '1962-01-02' looks like:@* -@code{hexadecimal 22 54 0F} -@end itemize - -@strong{DATETIME} -@itemize @bullet -@item -Storage: eight bytes. -@item -Part 1 is a 32-bit integer containing year*10000 + month*100 + day. -@item -Part 2 is a 32-bit integer containing hour*10000 + minute*100 + second. -@item -Example: a DATETIME column for '0001-01-01 01:01:01' looks like:@* -@code{hexadecimal B5 2E 11 5A 02 00 00 00} -@end itemize - -@strong{TIME} -@itemize @bullet -@item -Storage: 3 bytes, low byte first. -This is stored as seconds: days*24*3600+hours*3600+minutes*60+seconds -@item -Example: a TIME column containing '1 02:03:04' (1 day 2 hour 3 minutes and 4 seconds) looks like:@* -@code{hexadecimal 58 6E 01} -@end itemize - -@strong{TIMESTAMP} -@itemize @bullet -@item -Storage: 4 bytes, low byte first. -Stored as unix @code{time()}, which is seconds since the Epoch -(00:00:00 UTC, January 1, 1970). -@item -Example: a TIMESTAMP column containing '2003-01-01 01:01:01' looks like:@* -@code{hexadecimal 4D AE 12 23} -@end itemize - -@strong{YEAR} -@itemize @bullet -@item -Storage: same as unsigned TINYINT with a base value of 0 = 1901. -@end itemize - -@item Others - -@strong{SET} -@itemize @bullet -@item -Storage: one byte for each eight members in the set. -@item -Maximum length: eight bytes (for maximum 64 members). -@item -This is a bit list. The least significant bit corresponds to the -first listed member of the set. -@item -Example: a SET('A','B','C') column containing 'A' looks like:@* -@code{01} -- (length = 1, value = 'A') -@end itemize - -@strong{ENUM} -@itemize @bullet -@item -Storage: one byte if less than 256 alternatives, else two bytes. -@item -This is an index. The value 1 corresponds to the first listed -alternative. (Note: ENUM always reserves 0 for an erroneous value. This -explains why 'A' is 1 instead of 0.) -@item -Example: an ENUM('A','B','C') column containing 'A' looks like:@* -@code{01} -- (length = 1, value = 'A') -@end itemize - -@item The Large-Object data types - -Warning: Because TINYBLOB's preceding length is one byte long (the -size of a TINYINT) and MEDIUMBLOB's preceding length is three bytes -long (the size of a MEDIUMINT), it's easy to think there's some sort -of correspondence between the BLOB and the INT types. There isn't -- a -BLOB's preceding length is not four bytes long (the size of an INT). -@* - -@strong{TINYBLOB} -@itemize @bullet -@item -Storage: variable-length string with a preceding one-byte length. -@item -Example: a TINYBLOB column containing 'A' looks like:@* -@code{hexadecimal 01 41} -- (length = 2, value = 'A') -@end itemize - -@strong{TINYTEXT} -@itemize @bullet -@item -Storage: same as TINYBLOB. -@end itemize - -@strong{BLOB} -@itemize @bullet -@item -Storage: variable-length string with a preceding two-byte length. -@item -Example: a BLOB column containing 'A' looks like:@* -@code{hexadecimal 01 00 41} -- (length = 2, value = 'A') -@end itemize - -@strong{TEXT} -@itemize @bullet -@item -Storage: same as BLOB. -@end itemize - -@strong{MEDIUMBLOB} -@itemize @bullet -@item -Storage: variable-length string with a preceding length. -@item -Example: a MEDIUMBLOB column containing 'A' looks like:@* -@code{hexadecimal 01 00 00 41} -- (length = 4, value = 'A') -@end itemize - -@strong{MEDIUMTEXT} -@itemize @bullet -@item -Storage: same as MEDIUMBLOB. -@end itemize - -@strong{LONGBLOB} -@itemize @bullet -@item -Storage: variable-length string with a preceding four-byte length. -@item -Example: a LONGBLOB column containing 'A' looks like:@* -@code{hexadecimal 01 00 00 00 41} -- (length = 5, value = 'A') -@end itemize - -@strong{LONGTEXT} -@itemize @bullet -@item -Storage: same as LONGBLOB. -@end itemize - -@end table - -@section Where to Look For More Information - -@strong{References:} @* -Most of the formatting work for MyISAM columns is visible -in the program /sql/field.cc in the source code directory. -And in the MyISAM directory, the files that do formatting -work for different record formats are: /myisam/mi_statrec.c, -/myisam/mi_dynrec.c, and /myisam/mi_packrec.c. -@* - -@node InnoDB Record Structure, InnoDB Page Structure, MyISAM Record Structure, Top -@chapter InnoDB Record Structure - -This page contains: -@itemize @bullet -@item -A high-altitude "summary" picture of the parts of a MySQL/InnoDB -record structure. -@item -A description of each part. -@item -An example. -@end itemize - -After reading this page, you will know how MySQL/InnoDB stores a -physical record. -@* - -@section High-Altitude Picture - -The chart below shows the three parts of a physical record. - -@multitable @columnfractions .10 .35 - -@item @strong{Name} @tab @strong{Size} -@item Field Start Offsets -@tab (F*1) or (F*2) bytes -@item Extra Bytes -@tab 6 bytes -@item Field Contents -@tab depends on content - -@end multitable - -Legend: The letter 'F' stands for 'Number Of Fields'. - -The meaning of the parts is as follows: -@itemize @bullet -@item -The FIELD START OFFSETS is a list of numbers containing the -information "where a field starts". -@item -The EXTRA BYTES is a fixed-size header. -@item -The FIELD CONTENTS contains the actual data. -@end itemize - -@strong{An Important Note About The Word "Origin"}@* -The "Origin" or "Zero Point" of a record is the first byte of the -Field Contents -- not the first byte of the Field Start Offsets. If -there is a pointer to a record, that pointer is pointing to the -Origin. Therefore the first two parts of the record are addressed by -subtracting from the pointer, and only the third part is addressed by -adding to the pointer. - -@subsection FIELD START OFFSETS - -The Field Start Offsets is a list in which each entry is the -position, relative to the Origin, of the start of the next field. The -entries are in reverse order, that is, the first field's offset is at -the end of the list. -@* - -An example: suppose there are three columns. The first column's length -is 1, the second column's length is 2, and the third column's length is 4. -In this case, the offset values are, respectively, 1, 3 (1+2), and 7 (1+2+4). -Because values are reversed, a core dump of the Field Start Offsets -would look like this: @code{07,03,01}. -@* - -There are two complications for special cases: -@itemize @bullet -@item -Complication #1: The size of each offset can be either one byte or -two bytes. One-byte offsets are only usable if the total record size -is less than 127. There is a flag in the "Extra Bytes" part which will -tell you whether the size is one byte or two bytes. -@item -Complication #2: The most significant bits of an offset may contain -flag values. The next two paragraphs explain what the contents are. -@end itemize - -@strong{When The Size Of Each Offset Is One Byte} -@itemize @bullet -@item -1 bit = 0 if field is non-NULL, = 1 if field is NULL -@item -7 bits = the actual offset, a number between 0 and 127 -@end itemize - -@strong{When The Size Of Each Offset Is Two Bytes} -@itemize @bullet -@item -1 bit = 0 if field is non-NULL, = 1 if field is NULL -@item -1 bit = 0 if field is on same page as offset, = 1 if field and offset are on different pages -@item -14 bits = the actual offset, a number between 0 and 16383 -@end itemize - -It is unlikely that the "field and offset are on different pages" -unless the record contains a large BLOB. - -@subsection EXTRA BYTES - -The Extra Bytes are a fixed six-byte header. - -@multitable @columnfractions .10 .25 .35 - -@item @strong{Name} @tab @strong{Size} @tab @strong{Description} -@item @strong{info_bits:} -@item () -@tab 1 bit -@tab unused or unknown -@item () -@tab 1 bit -@tab unused or unknown -@item deleted_flag -@tab 1 bit -@tab 1 if record is deleted -@item min_rec_flag -@tab 1 bit -@tab 1 if record is predefined minimum record -@item n_owned -@tab 4 bits -@tab number of records owned by this record -@item heap_no -@tab 13 bits -@tab record's order number in heap of index page -@item n_fields -@tab 10 bits -@tab number of fields in this record, 1 to 1023 -@item 1byte_offs_flag -@tab 1 bit -@tab 1 if each Field Start Offsets is 1 byte long (this item is also called the "short" flag) -@item @strong{next 16 bits} -@tab 16 bits -@tab pointer to next record in page -@item @strong{TOTAL} -@tab 48 bits - -@end multitable - -Total size is 48 bits, which is six bytes. -@* - -If you're just trying to read the record, the key bit in the Extra -Bytes is 1byte_offs_flag -- you need to know if 1byte_offs_flag is 1 -(i.e.: "short 1-byteoffsets") or 0 (i.e.: "2-byte offsets"). -@* - -Given a pointer to the Origin, InnoDB finds the start of the record as follows: -@itemize @bullet -@item -Let X = n_fields (the number of fields is by definition equal to the -number of entries in the Field Start Offsets Table). -@item -If 1byte_offs_flag equals 0, then let X = X * 2 because there are -two bytes for each entry instead of just one. -@item -Let X = X + 6, because the fixed size of Extra Bytes is 6. -@item -The start of the record is at (pointer value minus X). -@end itemize - -@subsection FIELD CONTENTS - -The Field Contents part of the record has all the data. Fields are -stored in the order they were defined in. -@* - -There are no markers between fields, and there is no marker or filler -at the end of a record. -@* - -Here's an example. -@itemize @bullet -@item -I made a table with this definition: -@*@* - -@strong{CREATE TABLE T - (FIELD1 VARCHAR(3), FIELD2 VARCHAR(3), FIELD3 VARCHAR(3)) - Type=InnoDB;} -@*@* - -To understand what follows, you must know that table T has six columns --- not three -- because InnoDB automatically added three "system -columns" at the start for its own housekeeping. It happens that these -system columns are the row ID, the transaction ID, and the rollback -pointer, but their values don't matter now. Regard them as three black -boxes. -@*@* - -@item -I put some rows in the table. My last three INSERTs were: -@*@* - -@strong{INSERT INTO T VALUES ('PP', 'PP', 'PP')} -@*@* - -@strong{INSERT INTO T VALUES ('Q', 'Q', 'Q')} -@*@* - -@strong{INSERT INTO T VALUES ('R', NULL, NULL)} -@*@* - -@item -I ran Borland's TDUMP to get a hexadecimal dump of -the contents of \mysql\data\ibdata1, which (in my case) is the -MySQL/InnoDB data file (on Windows). -@end itemize - -Here is an extract of the dump: - -@multitable @columnfractions .05 .95 - -@item @strong{Address Values In Hexadecimal} @tab @strong{Values In ASCII} -@item @code{0D4280: 00 00 2D 00 84 4F 4F 4F 4F 4F 4F 4F 4F 4F 19 17} -@tab @code{..-..OOOOOOOOO..} -@item @code{0D4290: 15 13 0C 06 00 00 78 0D 02 BF 00 00 00 00 04 21} -@tab @code{......x........!} -@item @code{0D42A0: 00 00 00 00 09 2A 80 00 00 00 2D 00 84 50 50 50} -@tab @code{.....*....-..PPP} -@item @code{0D42B0: 50 50 50 16 15 14 13 0C 06 00 00 80 0D 02 E1 00} -@tab @code{PPP.............} -@item @code{0D42C0: 00 00 00 04 22 00 00 00 00 09 2B 80 00 00 00 2D} -@tab @code{....".....+....-} -@item @code{0D42D0: 00 84 51 51 51 94 94 14 13 0C 06 00 00 88 0D 00} -@tab @code{..QQQ...........} -@item @code{0D42E0: 74 00 00 00 00 04 23 00 00 00 00 09 2C 80 00 00} -@tab @code{t.....#.....,...} -@item @code{0D42F0: 00 2D 00 84 52 00 00 00 00 00 00 00 00 00 00 00} -@tab @code{.-..R...........} - -@end multitable - -A reformatted version of the dump, showing only the relevant bytes, -looks like this (I've put a line break after each field and added labels): - -@strong{Reformatted Hexadecimal Dump}@* -@code{ - 19 17 15 13 0C 06 Field Start Offsets /* First Row */@* - 00 00 78 0D 02 BF Extra Bytes@* - 00 00 00 00 04 21 System Column #1@* - 00 00 00 00 09 2A System Column #2@* - 80 00 00 00 2D 00 84 System Column #3@* - 50 50 Field1 'PP'@* - 50 50 Field2 'PP'@* - 50 50 Field3 'PP'}@* - -@code{ - 16 15 14 13 0C 06 Field Start Offsets /* Second Row */@* - 00 00 80 0D 02 E1 Extra Bytes@* - 00 00 00 00 04 22 System Column #1@* - 00 00 00 00 09 2B 80 System Column #2@* - 00 00 00 2D 00 84 System Column #3@* - 51 Field1 'Q'@* - 51 Field2 'Q'@* - 51 Field3 'Q'}@* - -@code{ - 94 94 14 13 0C 06 Field Start Offsets /* Third Row */@* - 00 00 88 0D 00 74 Extra Bytes@* - 00 00 00 00 04 23 System Column #1@* - 00 00 00 00 09 2C System Column #2@* - 80 00 00 00 2D 00 84 System Column #3@* - 52 Field1 'R'}@* -@* - -You won't need explanation if you followed everything I've said, but -I'll add helpful notes for the three trickiest details. -@itemize @bullet -@item -Helpful Notes About "Field Start Offsets": @* -Notice that the sizes of the record's fields, in forward order, are: -6, 6, 7, 2, 2, 2. Since each offset is for the start of the "next" -field, the hexadecimal offsets are 06, 0c (6+6), 13 (6+6+7), 15 -(6+6+7+2), 17 (6+6+7+2+2), 19 (6+6+7+2+2+2). Reversing the order, the -Field Start Offsets of the first record are: @code{19,17,15,13,0c,06}. -@item -Helpful Notes About "Extra Bytes": @* -Look at the Extra Bytes of the first record: @code{00 00 78 0D 02 BF}. The -fourth byte is @code{0D hexadecimal}, which is @code{1101 binary} ... the 110 is the -last bits of n_fields (@code{110 binary} is 6 which is indeed the number of -fields in the record) and the final 1 bit is 1byte_offs_flag. The -fifth and sixth bytes, which contain @code{02 BF}, constitute the "next" -field. Looking at the original hexadecimal dump, at address -@code{0D42BF} (which is position @code{02BF} within the page), you'll see the beginning bytes of -System Column #1 of the second row. In other words, the "next" field -points to the "Origin" of the following row. -@item -Helpful Notes About NULLs:@* -For the third row, I inserted NULLs in FIELD2 and FIELD3. Therefore in -the Field Start Offsets the top bit is @code{on} for these fields (the -values are @code{94 hexadecimal}, @code{94 hexadecimal}, instead of -@code{14 hexadecimal}, @code{14 hexadecimal}). And the row is -shorter because the NULLs take no space. -@end itemize - -@section Where to Look For More Information - -@strong{References:} @* -The most relevant InnoDB source-code files are rem0rec.c, rem0rec.ic, -and rem0rec.h in the rem ("Record Manager") directory. - -@node InnoDB Page Structure, Files in MySQL Sources, InnoDB Record Structure, Top -@chapter InnoDB Page Structure - -InnoDB stores all records inside a fixed-size unit which is commonly called a -"page" (though InnoDB sometimes calls it a "block" instead). -Currently all pages are the same size, 16KB. -@* - -A page contains records, but it also contains headers and trailers. -I'll start this description with a high-altitude view of a page's parts, -then I'll describe each part of a page. Finally, I'll show an example. This -discussion deals only with the most common format, for the leaf page of a data file. -@* - -@section High-Altitude View - -An InnoDB page has seven parts: -@itemize @bullet -@item -Fil Header -@item -Page Header -@item -Infimum + Supremum Records -@item -User Records -@item -Free Space -@item -Page Directory -@item -Fil Trailer -@end itemize - -As you can see, a page has two header/trailer pairs. The inner pair, "Page Header" and -"Page Directory", are mostly the concern of the \page program group, -while the outer pair, "Fil Header" and "Fil Trailer", are mostly the -concern of the \fil program group. The "Fil" header also goes goes by -the name of "File Page Header". -@* - -Sandwiched between the headers and trailers, are the records and -the free (unused) space. A page always begins with two unchanging -records called the Infimum and the Supremum. Then come the user -records. Between the user records (which grow downwards) and the page -directory (which grows upwards) there is space for new records. -@* - -@subsection Fil Header - -The Fil Header has eight parts, as follows: - -@multitable @columnfractions .10 .30 .35 - -@item @strong{Name} @tab @strong{Size} @tab @strong{Remarks} -@item FIL_PAGE_SPACE -@tab 4 -@tab 4 ID of the space the page is in -@item FIL_PAGE_OFFSET -@tab 4 -@tab ordinal page number from start of space -@item FIL_PAGE_PREV -@tab 4 -@tab offset of previous page in key order -@item FIL_PAGE_NEXT -@tab 4 -@tab offset of next page in key order -@item FIL_PAGE_LSN -@tab 8 -@tab log serial number of page's latest log record -@item FIL_PAGE_TYPE -@tab 2 -@tab current defined types are: FIL_PAGE_INDEX, FIL_PAGE_UNDO_LOG, FIL_PAGE_INODE, FIL_PAGE_IBUF_FREE_LIST -@item FIL_PAGE_FILE_FLUSH_LSN -@tab 8 -@tab "the file has been flushed to disk at least up to this lsn" (log serial number), - valid only on the first page of the file -@item FIL_PAGE_ARCH_LOG_NO -@tab 4 -@tab the latest archived log file number at the time that FIL_PAGE_FILE_FLUSH_LSN was written (in the log) -@end multitable - -@itemize -@item -FIL_PAGE_SPACE is a necessary identifier because different pages might belong to -different (table) spaces within the same file. The word -"space" is generic jargon for either "log" or "tablespace". -@*@* - -@item -FIL_PAGE_PREV and FIL_PAGE_NEXT are the page's "backward" and -"forward" pointers. To show what they're about, I'll draw a two-level -B-tree. -@*@* - -@example - -------- - - root - - -------- - | - ---------------------- - | | - | | - -------- -------- - - leaf - <--> - leaf - - -------- -------- -@end example -@* - -Everyone has seen a B-tree and knows that the entries in the root page -point to the leaf pages. (I indicate those pointers with vertical '|' -bars in the drawing.) But sometimes people miss the detail that leaf -pages can also point to each other (I indicate those pointers with a horizontal -two-way pointer '<-->' in the drawing). This feature allows InnoDB to navigate from -leaf to leaf without having to back up to the root level. This is a -sophistication which you won't find in the classic B-tree, which is -why InnoDB should perhaps be called a B+-tree instead. -@*@* - -@item -The fields FIL_PAGE_FILE_FLUSH_LSN, FIL_PAGE_PREV, and FIL_PAGE_NEXT -all have to do with logs, so I'll refer you to my article "How Logs -Work With MySQL And InnoDB" on devarticles.com. -@*@* - -@item -FIL_PAGE_FILE_FLUSH_LSN and FIL_PAGE_ARCH_LOG_NO are only valid for -the first page of a data file. -@end itemize - -@subsection Page Header - -The Page Header has 14 parts, as follows: -@*@* - -@multitable @columnfractions .10 .20 .30 - -@item @strong{Name} @tab @strong{Size} @tab @strong{Remarks} -@item PAGE_N_DIR_SLOTS -@tab 2 -@tab number of directory slots in the Page Directory part; initial value = 2 -@item PAGE_HEAP_TOP -@tab 2 -@tab record pointer to first record in heap -@item PAGE_N_HEAP -@tab 2 -@tab number of heap records; initial value = 2 -@item PAGE_FREE -@tab 2 -@tab record pointer to first free record -@item PAGE_GARBAGE -@tab 2 -@tab "number of bytes in deleted records" -@item PAGE_LAST_INSERT -@tab 2 -@tab record pointer to the last inserted record -@item PAGE_DIRECTION -@tab 2 -@tab either PAGE_LEFT, PAGE_RIGHT, or PAGE_NO_DIRECTION -@item PAGE_N_DIRECTION -@tab 2 -@tab number of consecutive inserts in the same direction, e.g. "last 5 were all to the left" -@item PAGE_N_RECS -@tab 2 -@tab number of user records -@item PAGE_MAX_TRX_ID -@tab 8 -@tab the highest ID of a transaction which might have changed a record on the page (only set for secondary indexes) -@item PAGE_LEVEL -@tab 2 -@tab level within the index (0 for a leaf page) -@item PAGE_INDEX_ID -@tab 8 -@tab identifier of the index the page belongs to -@item PAGE_BTR_SEG_LEAF -@tab 10 -@tab "file segment header for the leaf pages in a B-tree" (this is irrelevant here) -@item PAGE_BTR_SEG_TOP -@tab 10 -@tab "file segment header for the non-leaf pages in a B-tree" (this is irrelevant here) - -@end multitable -@* - -(Note: I'll clarify what a "heap" is when I discuss the User Records part of the page.) -@*@* - -Some of the Page Header parts require further explanation: -@itemize @bullet -@item -PAGE_FREE: @* -Records which have been freed (due to deletion or migration) are in a -one-way linked list. The PAGE_FREE pointer in the page header points -to the first record in the list. The "next" pointer in the record -header (specifically, in the record's Extra Bytes) points to the next -record in the list. -@item -PAGE_DIRECTION and PAGE_N_DIRECTION: @* -It's useful to know whether inserts are coming in a constantly -ascending sequence. That can affect InnoDB's efficiency. -@item -PAGE_HEAP_TOP and PAGE_FREE and PAGE_LAST_INSERT: @* -Warning: Like all record pointers, these point not to the beginning of the -record but to its Origin (see the earlier discussion of Record -Structure). -@item -PAGE_BTR_SEG_LEAF and PAGE_BTR_SEG_TOP: @* -These variables contain information (space ID, page number, and byte offset) about -index node file segments. InnoDB uses the information for allocating new pages. -There are two different variables because InnoDB allocates separately for leaf -pages and upper-level pages. -@end itemize - -@subsection The Infimum And Supremum Records - -"Infimum" and "supremum" are real English words but they are found -only in arcane mathematical treatises, and in InnoDB comments. To -InnoDB, an infimum is lower than the the lowest possible real value -(negative infinity) and a supremum is greater than the greatest -possible real value (positive infinity). InnoDB sets up an infimum -record and a supremum record automatically at page-create time, and -never deletes them. They make a useful barrier to navigation so that -"get-prev" won't pass the beginning and "get-next" won't pass the end. -Also, the infimum record can be a dummy target for temporary record -locks. -@*@* - -The InnoDB code comments distinguish between "the infimum and supremum -records" and the "user records" (all other kinds). -@*@* - -It's sometimes unclear whether InnoDB considers the infimum and -supremum to be part of the header or not. Their size is fixed and -their position is fixed, so I guess so. - -@subsection User Records - -In the User Records part of a page, you'll find all the records that the user -inserted. -@*@* - -There are two ways to navigate through the user records, depending -whether you want to think of their organization as an unordered or an -ordered list. -@*@* - -An unordered list is often called a "heap". If you make a pile of -stones by saying "whichever one I happen to pick up next will go on -top" -- rather than organizing them according to size and colour -- -then you end up with a heap. Similarly, InnoDB does not want to insert -new rows according to the B-tree's key order (that would involve -expensive shifting of large amounts of data), so it inserts new rows -right after the end of the existing rows (at the -top of the Free Space part) or wherever there's space left by a -deleted row. -@*@* - -But by definition the records of a B-tree must be accessible in order -by key value, so there is a record pointer in each record (the "next" -field in the Extra Bytes) which points to the next record in key -order. In other words, the records are a one-way linked list. So -InnoDB can access rows in key order when searching. - -@subsection Free Space - -I think it's clear what the Free Space part of a page is, from the discussion of -other parts. - -@subsection Page Directory - -The Page Directory part of a page has a variable number of record pointers. -Sometimes the record pointers are called "slots" or "directory slots". -Unlike other DBMSs, InnoDB does not have a slot for every record in -the page. Instead it keeps a sparse directory. In a fullish page, -there will be one slot for every six records. -@*@* - -The slots track the records' logical order (the order by key rather -than the order by placement on the heap). Therefore, if the records -are @code{'A' 'B' 'F' 'D'} the slots will be @code{(pointer to 'A') (pointer to -'B') (pointer to 'D') (pointer to 'F')}. Because the slots are in key -order, and each slot has a fixed size, it's easy to do a binary -search of the records on the page via the slots. -@*@* - -(Since the Page Directory does not have a slot for every record, -binary search can only give a rough position and then InnoDB must -follow the "next" record pointers. InnoDB's "sparse slots" policy also -accounts for the n_owned field in the Extra Bytes part of a record: -n_owned indicates how many more records must be gone through because -they don't have their own slots.) - -@subsection Fil Trailer - -The Fil Trailer has one part, as follows: -@*@* - -@multitable @columnfractions .10 .35 .40 - -@item @strong{Name} @tab @strong{Size} @tab @strong{Remarks} -@item FIL_PAGE_END_LSN -@tab 8 -@tab low 4 bytes = checksum of page, last 4 bytes = same as FIL_PAGE_LSN -@end multitable -@* - -The final part of a page, the fil trailer (or File Page Trailer), -exists because InnoDB's architect worried about integrity. It's -impossible for a page to be only half-written, or corrupted by -crashes, because the log-recovery mechanism restores to a consistent -state. But if something goes really wrong, then it's nice to have a -checksum, and to have a value at the very end of the page which must -be the same as a value at the very beginning of the page. - -@section Example - -For this example, I used Borland's TDUMP again, as I did for the earlier chapter on -Record Format. This is what a page looked like: -@*@* - -@multitable @columnfractions .05 .95 - -@item @strong{Address Values In Hexadecimal} @tab @strong{Values In ASCII} -@item @code{0D4000: 00 00 00 00 00 00 00 35 FF FF FF FF FF FF FF FF} -@tab @code{.......5........} -@item @code{0D4010: 00 00 00 00 00 00 E2 64 45 BF 00 00 00 00 00 00} -@tab @code{.......dE.......} -@item @code{0D4020: 00 00 00 00 00 00 00 05 02 F5 00 12 00 00 00 00} -@tab @code{................} -@item @code{0D4030: 02 E1 00 02 00 0F 00 10 00 00 00 00 00 00 00 00} -@tab @code{................} -@item @code{0D4040: 00 00 00 00 00 00 00 00 00 14 00 00 00 00 00 00} -@tab @code{................} -@item @code{0D4050: 00 02 16 B2 00 00 00 00 00 00 00 02 15 F2 08 01} -@tab @code{................} -@item @code{0D4060: 00 00 03 00 89 69 6E 66 69 6D 75 6D 00 09 05 00} -@tab @code{.....infimum....} -@item @code{0D4070: 08 03 00 00 73 75 70 72 65 6D 75 6D 00 22 1D 18} -@tab @code{....supremum."..} -@item @code{0D4080: 13 0C 06 00 00 10 0D 00 B7 00 00 00 00 04 14 00} -@tab @code{................} -@item @code{0D4090: 00 00 00 09 1D 80 00 00 00 2D 00 84 41 41 41 41} -@tab @code{.........-..AAAA} -@item @code{0D40A0: 41 41 41 41 41 41 41 41 41 41 41 1F 1B 17 13 0C} -@tab @code{AAAAAAAAAAA.....} -@item @code{ ... } -@item @code{ ... } -@item @code{0D7FE0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 74} -@tab @code{...............t} -@item @code{0D7FF0: 02 47 01 AA 01 0A 00 65 3A E0 AA 71 00 00 E2 64} -@tab @code{.G.....e:..q...d} -@end multitable -@*@* - -Let's skip past the first 38 bytes, which are Fil Header. The bytes -of the Page Header start at location @code{0d4026 hexadecimal}: -@*@* - -@multitable @columnfractions .10 .45 .60 - -@item @strong{Location} @tab @strong{Name} @tab @strong{Description} -@item @code{00 05} -@tab PAGE_N_DIR_SLOTS -@tab There are 5 directory slots. -@item @code{02 F5} -@tab PAGE_HEAP_TOP -@tab At location @code{0402F5}, not shown, is the beginning of free space. -Maybe a better name would have been PAGE_HEAP_END -@item @code{00 12} -@tab PAGE_N_HEAP -@tab There are 18 (hexadecimal 12) records in the page. -@item @code{00 00} -@tab PAGE_FREE -@tab There are zero free (deleted) records. -@item @code{00 00} -@tab PAGE_GARBAGE -@tab There are zero bytes in deleted records. -@item @code{02 E1} -@tab PAGE_LAST_INSERT -@tab The last record was inserted at location @code{02E1}, not shown, within the page. -@item @code{00 02} -@tab PAGE_DIRECTION -@tab A glance at page0page.h will tell you that 2 is the #defined value for PAGE_RIGHT. -@item @code{00 0F} -@tab PAGE_N_DIRECTION -@tab The last 15 (hexadecimal 0F) inserts were all done "to the right" -because I was inserting in ascending order. -@item @code{00 10} -@tab PAGE_N_RECS -@tab There are 16 (hexadecimal 10) user records. Notice that PAGE_N_RECS is -smaller than the earlier field, PAGE_N_HEAP. -@item @code{00 00 00 00 00 00 00} -@tab PAGE_MAX_TRX_ID -@item @code{00 00} -@tab PAGE_LEVEL -@tab Zero because this is a leaf page. -@item @code{00 00 00 00 00 00 00 14} -@tab PAGE_INDEX_ID -@tab This is index number 20. -@item @code{00 00 00 00 00 00 00 02 16 B2} -@tab PAGE_BTR_SEG_LEAF -@item @code{00 00 00 00 00 00 00 02 15 F2} -@tab PAGE_BTR_SEG_TOP -@end multitable -@* - -Immediately after the page header are the infimum and supremum -records. Looking at the "Values In ASCII" column in the hexadecimal -dump, you will see that the contents are in fact the words "infimum" -and "supremum" respectively. -@*@* - -Skipping past the User Records and the Free Space, many bytes later, -is the end of the 16KB page. The values shown there are the two trailers. -@itemize @bullet -@item -The first trailer (@code{00 74, 02 47, 01 AA, 01 0A, 00 65}) is the page -directory. It has 5 entries, because the header field PAGE_N_DIR_SLOTS -says there are 5. -@item -The next trailer (@code{3A E0 AA 71, 00 00 E2 64}) is the fil trailer. Notice -that the last four bytes, @code{00 00 E2 64}, appeared before in the fil -header. -@end itemize - -@section Where to Look For More Information - -@strong{References:} @* -The most relevant InnoDB source-code files are page0page.c, -page0page.ic, and page0page.h in \page directory. - -@node Files in MySQL Sources, Files in InnoDB Sources, InnoDB Page Structure, Top -@chapter Annotated List Of Files in the MySQL Source Code Distribution - -This is a description of the files that you get when you download the -source code of MySQL. This description begins with a list -of the main directories and a short comment about each one. Then, for -each directory, in alphabetical order, a longer description is -supplied. When a directory contains significant program files, a list of each C -program is given along with an explanation of its intended function. - -@section Directory Listing - -@strong{Directory -- Short Comment} -@itemize @bullet -@item -bdb -- The Berkeley Database table handler -@item -BitKeeper -- BitKeeper administration (not part of the source distribution) -@item -BUILD -- Frequently used build scripts -@item -Build-tools -- Build tools (not part of the source distribution) -@item -client -- Client library -@item -cmd-line-utils -- Command-line utilities (libedit and readline) -@item -dbug -- Fred Fish's dbug library -@item -Docs -- Preliminary documents about internals and new modules; will eventually be moved to the mysqldoc repository -@item -extra -- Some minor standalone utility programs -@item -heap -- The HEAP table handler -@item -include -- Header (*.h) files for most libraries; includes all header files distributed with the MySQL binary distribution -@item -innobase -- The Innobase (InnoDB) table handler -@item -libmysql -- For producing MySQL as a library (e.g. a Windows .DLL) -@item -libmysql_r -- For building a thread-safe libmysql library -@item -libmysqld -- The MySQL Server as an embeddable library -@item -man -- Some user-contributed manual pages -@item -myisam -- The MyISAM table handler -@item -myisammrg -- The MyISAM Merge table handler -@item -mysql-test -- A test suite for mysqld -@item -mysys -- MySQL system library (Low level routines for file access etc.) -@item -netware -- Files related to the Novell NetWare version of MySQL -@item -NEW-RPMS -- Directory to place RPMs while making a distribution -@item -os2 -- Routines for working with the OS/2 operating system -@item -pstack -- Process stack display (not currently used) -@item -regex -- Henry Spencer's Regular Expression library for support of REGEXP function -@item -SCCS -- Source Code Control System (not part of source distribution) -@item -scripts -- SQL batches, e.g. mysqlbug and mysql_install_db -@item -sql -- Programs for handling SQL commands; the "core" of MySQL -@item -sql-bench -- The MySQL benchmarks -@item -SSL -- Secure Sockets Layer; includes an example certification one can use to test an SSL (secure) database connection -@item -strings -- Library for C string routines, e.g. atof, strchr -@item -support-files -- Files used to build MySQL on different systems -@item -tests -- Tests in Perl and in C -@item -tools -- mysqlmanager.c (tool under development, not yet useful) -@item -VC++Files -- Includes this entire directory, repeated for VC++ (Windows) use -@item -vio -- Virtual I/O Library -@item -zlib -- Data compression library, used on Windows -@end itemize - -@subsection bdb - -The Berkeley Database table handler. -@*@* - -The Berkeley Database (BDB) is maintained by Sleepycat Software. -MySQL AB maintains only a few small patches to make BDB work -better with MySQL. -@*@* - -The documentation for BDB is available at -http://www.sleepycat.com/docs/. Since it's reasonably thorough -documentation, a description of the BDB program files is not included -in this document. -@*@* - -@subsection BitKeeper - -BitKeeper administration. -@*@* - -Bitkeeper administration is not part of the source distribution. This -directory may be present if you downloaded the MySQL source using -BitKeeper rather than via the mysql.com site. The files in the -BitKeeper directory are for maintenance purposes only -- they are not -part of the MySQL package. -@*@* - -The MySQL Reference Manual explains how to use Bitkeeper to get the -MySQL source. Please see @url{http://www.mysql.com/doc/en/Installing_source_tree.html} -for more information. -@*@* - -@subsection BUILD - -Frequently used build scripts. -@*@* - -This directory contains the build switches for compilation on various -platforms. There is a subdirectory for each set of options. The main -ones are: -@itemize @bullet -@item -alpha -@item -ia64 -@item -pentium (with and without debug or bdb, etc.) -@item -solaris -@end itemize -@*@* - -@subsection Build-tools - -Build tools. -@*@* - -Build-tools is not part of the source distribution. This directory -contains batch files for extracting, making directories, and making -programs from source files. There are several subdirectories with -different scripts -- for building Linux executables, for compiling, -for performing all build steps, and so on. -@*@* - -@subsection client - -Client library. -@*@* - -The client library includes mysql.cc (the source of the 'mysql' -executable) and other utilities. Most of the utilities are mentioned -in the MySQL Reference Manual. Generally these are standalone C -programs which one runs in "client mode", that is, they call the -server. -@*@* - -The C program files in the directory are: -@itemize @bullet -@item -get_password.c -- ask for a password from the console -@item -mysql.cc -- "The MySQL command tool" -@item -mysqladmin.c -- maintenance of MySQL databases -@item -mysqlcheck.c -- check all databases, check connect, etc. -@item -mysqldump.c -- dump table's contents as SQL statements, suitable to backup a MySQL database -@item -mysqlimport.c -- import text files in different formats into tables -@item -mysqlmanager-pwgen.c -- pwgen stands for "password generation" (not currently maintained) -@item -mysqlmanagerc.c -- entry point for mysql manager (not currently maintained) -@item -mysqlshow.c -- show databases, tables or columns -@item -mysqltest.c -- test program used by the mysql-test suite, mysql-test-run -@item -password.c -- password checking routines (version 4.1 and up) -@end itemize -@*@* - -@subsection cmd-line-utils - -Command-line utilities (libedit and readline). -@*@* - -There are two subdirectories: \readline and \libedit. All the files -here are "non-MySQL" files, in the sense that MySQL AB didn't produce -them, it just uses them. It should be unnecessary to study the -programs in these files unless you are writing or debugging a tty-like -client for MySQL, such as mysql.exe. -@*@* - -The \readline subdirectory contains the files of the GNU Readline -Library, "a library for reading lines of text with interactive input -and history editing". The programs are copyrighted by the Free -Software Foundation. -@*@* - -The \libedit (library of edit functions) subdirectory has files -written by Christos Zoulas. They are distributed and modifed under -the BSD License. These files are for editing the line contents. -@*@* - -These are the program files in the \libedit subdirectory: -@itemize @bullet -@item -chared.c -- character editor -@item -common.c -- common editor functions -@item -el.c -- editline interface functions -@item -emacs.c -- emacs functions -@item -fgetln.c -- get line -@item -hist.c -- history access functions -@item -history.c -- more history access functions -@item -key.c -- procedures for maintaining the extended-key map -@item -map.c -- editor function definitions -@item -parse.c -- parse an editline extended command -@item -prompt.c -- prompt printing functions -@item -read.c -- terminal read functions -@item -readline.c -- read line -@item -refresh.c -- "lower level screen refreshing functions" -@item -search.c -- "history and character search functions" -@item -sig.c -- for signal handling -@item -strlcpy.c -- string copy -@item -term.c -- "editor/termcap-curses interface" -@item -tokenizer.c -- Bourne shell line tokenizer -@item -tty.c -- for a tty interface -@item -vi.c -- commands used when in the vi (editor) mode -@end itemize -@*@* - -@subsection dbug - -Fred Fish's dbug library. -@*@* - -This is not really part of the MySQL package. Rather, it's a set of -public-domain routines which are useful for debugging MySQL programs. -The MySQL Server and all .c and .cc programs support the use of this -package. -@*@* - -How it works: One inserts a function call that begins with DBUG_* in -one of the regular MYSQL programs. For example, in get_password.c, you -will find this line: @* -DBUG_ENTER("get_tty_password"); @* -at the start of a routine, and this line: @* -DBUG_RETURN(my_strdup(to,MYF(MY_FAE))); @* -at the end of the routine. These lines don't affect production code. -Features of the dbug library include extensive reporting and profiling -(the latter has not been used by the MySQL team). -@*@* - -The C programs in this directory are: -@itemize @bullet -@item -dbug.c -- The main module -@item -dbug_analyze.c -- Reads a file produced by trace functions -@item -example1.c -- A tiny example -@item -example2.c -- A tiny example -@item -example3.c -- A tiny example -@item -factorial.c -- A tiny example -@item -main.c -- A tiny example -@item -sanity.c -- Declaration of a variable -@end itemize -@*@* - -@subsection Docs - -Preliminary documents about internals and new modules, which will eventually -be moved to the mysqldoc repository. -@*@* - -This directory doesn't have much at present that's very useful to the -student, but the plan is that some documentation related to the source -files and the internal workings of MySQL, including perhaps some -documentation from developers themselves, will be placed here. Files in -this directory will eventually be moved to the MySQL documentation repository. -@*@* - -These sub-directories are part of this directory: -@itemize @bullet -@item -books -- .gif images and empty .txt files; no real information -@item -flags -- images of flags of countries -@item -images -- flag backgrounds and the MySQL dolphin logo -@item -mysql-logos -- more MySQL-related logos, some of them moving -@item -raw-flags -- more country flags, all .gif files -@item -support -- various files for generating texinfo/docbook documentation -@item -to-be-included... -- contains a MySQL-for-dummies file -@item -translations -- some Portuguese myodbc documentation -@end itemize -@*@* - -In the main directory, you'll find some .txt files related to the -methods that MySQL uses to produce its printed and html documents, odd -bits in various languages, and the single file in the directory which -has any importance -- internals.texi -- The "MySQL Internals" -document. -@*@* - -Despite the name, internals.texi is not yet much of a description of MySQL -internals although work is in progress to make it so. However, there is -some useful description of the functions in the mysys directory (see below), -and of the structure of client/server messages (doubtless very useful for -eople who want to make their own JDBC drivers, or just sniff). -@*@* - -@subsection extra - -Some minor standalone utility programs. -@*@* - -These programs are all standalone utilities, that is, they have -a main() function and their main role is to show information that the -MySQL server needs or produces. Most are unimportant. They are as -follows: -@itemize @bullet -@item -my_print_defaults.c -- print parameters from my.ini files. Can also be used in scripts to enable processing of my.ini files. -@item -mysql_waitpid.c -- wait for a program to terminate. Useful for shell scripts when one needs to wait until a process terminates. -@item -perror.c -- "print error" -- given error number, display message -@item -replace.c -- replace strings in text files or pipe -@item -resolve_stack_dump.c -- show symbolic information from a MySQL stack dump, normally found in the mysql.err file -@item -resolveip.c -- convert an IP address to a hostname, or vice versa -@end itemize -@*@* - -@subsection heap - -The HEAP table handler. -@*@* - -All the MySQL table handlers (i.e. the handlers that MySQL itself -produces) have files with similar names and functions. Thus, this -(heap) directory contains a lot of duplication of the myisam directory -(for the MyISAM table handler). Such duplicates have been marked with -an "*" in the following list. For example, you will find that -\heap\hp_extra.c has a close equivalent in the myisam directory -(\myisam\mi_extra.c) with the same descriptive comment. (Some of the -differences arise because HEAP has different structures. HEAP does not -need to use the sort of B-tree indexing that ISAM and MyISAM use; instead -there is a hash index. Most importantly, HEAP is entirely in memory. -File-I/O routines lose some of their vitality in such a context.) -@*@* - -@itemize -@item -hp_block.c -- Read/write a block (i.e. a page) -@item -hp_clear.c -- Remove all records in the table -@item -hp_close.c -- * close database -@item -hp_create.c -- * create a table -@item -hp_delete.c -- * delete a row -@item -hp_extra.c -- * for setting options and buffer sizes when optimizing -@item -hp_hash.c -- Hash functions used for saving keys -@item -hp_info.c -- * Information about database status -@item -hp_open.c -- * open database -@item -hp_panic.c -- * the hp_panic routine, for shutdowns and flushes -@item -hp_rename.c -- * rename a table -@item -hp_rfirst.c -- * read first row through a specific key (very short) -@item -hp_rkey.c -- * read record using a key -@item -hp_rlast.c -- * read last row with same key as previously-read row -@item -hp_rnext.c -- * read next row with same key as previously-read row -@item -hp_rprev.c -- * read previous row with same key as previously-read row -@item -hp_rrnd.c -- * read a row based on position -@item -hp_rsame.c -- * find current row using positional read or key-based -read -@item -hp_scan.c -- * read all rows sequentially -@item -hp_static.c -- * static variables (very short) -@item -hp_test1.c -- * testing basic functions -@item -hp_test2.c -- * testing database and storing results -@item -hp_update.c -- * update an existing row -@item -hp_write.c -- * insert a new row -@end itemize -@*@* - -There are fewer files in the heap directory than in the myisam -directory, because fewer are necessary. For example, there is no need -for a \myisam\mi_cache.c equivalent (to cache reads) or a -\myisam\log.c equivalent (to log statements). -@*@* - -@subsection include - -Header (*.h) files for most libraries; includes all header files distributed -with the MySQL binary distribution. -@*@* - -These files may be included in C program files. Note that each -individual directory will also have its own *.h files, for including -in its own *.c programs. The *.h files in the include directory are -ones that might be included from more than one place. -@*@* - -For example, the mysys directory contains a C file named rijndael.c, -but does not include rijndael.h. The include directory contains -rijndael.h. Looking further, you'll find that rijndael.h is also -included in other places: by my_aes.c and my_aes.h. -@*@* - -The include directory contains 51 *.h (header) files. -@*@* - -@subsection innobase - -The Innobase (InnoDB) table handler. -@*@* - -A full description of these files can be found elsewhere in this -document. -@*@* - -@subsection libmysql - -The MySQL Library, Part 1. -@*@* - -The files here are for producing MySQL as a library (e.g. a Windows -DLL). The idea is that, instead of producing separate mysql (client) -and mysqld (server) programs, one produces a library. Instead of -sending messages, the client part merely calls the server part. -@*@* - -The libmysql files are split into three directories: libmysql (this -one), libmysql_r (the next one), and libmysqld (the next one after -that). -@*@* - -The "library of mysql" has some client-connection -modules. For example, as described in an earlier -section of this manual, there is a discussion of -libmysql/libmysql.c which sends packets from the -client to the server. Many of the entries in the -libmysql directory (and in the following libmysqld -directory) are 'symlinks' on Linux, that is, they -are in fact pointers to files in other directories. -@*@* - -The program files on this directory are: -@itemize @bullet -@item -conf_to_src.c -- has to do with charsets -@item -dll.c -- initialization of the dll library -@item -errmsg.c -- English error messages, compare \mysys\errors.c -@item -get_password.c -- get password -@item -libmysql.c -- the code that implements the MySQL API, i.e. the functions a client that wants to connect to MySQL will call -@item -manager.c -- initialize/connect/fetch with MySQL manager -@end itemize -@*@* - -@subsection libmysql_r - -The MySQL Library, Part 2. -@*@* - -There is only one file here, used to build a thread-safe libmysql library: -@itemize @bullet -@item -makefile.am -@end itemize -@*@* - -@subsection libmysqld - -The MySQL library, Part 3. -@*@* - -The Embedded MySQL Server Library. The product of libmysqld -is not a client/server affair, but a library. There is a wrapper -to emulate the client calls. The program files on this directory -are: -@itemize @bullet -@item -libmysqld.c -- The called side, compare the mysqld.exe source -@item -lib_vio.c -- Emulate the vio directory's communication buffer -@end itemize -@*@* - -@subsection man - -Some user-contributed manual pages -@*@* - -These are user-contributed "man" (manual) pages in a special markup -format. The format is described in a document with a heading like -"man page for man" or "macros to format man pages" which you can find -in a Linux directory or on the Internet. -@*@* - -@subsection myisam - -The MyISAM table handler. -@*@* - -The C files in this subdirectory come in six main groups: -@itemize @bullet -@item -ft*.c files -- ft stands for "Full Text", code contributed by Sergei Golubchik -@item -mi*.c files -- mi stands for "My Isam", these are the main programs for Myisam -@item -myisam*.c files -- for example, "myisamchk" utility routine functions source -@item -rt*.c files -- rt stands for "rtree", some code was written by Alexander Barkov -@item -sp*.c files -- sp stands for "spatial", some code was written by Ramil Kalimullin -@item -sort.c -- this is a single file that sorts keys for index-create purposes -@end itemize -@*@* - -The "full text" and "rtree" and "spatial" program sets are for special -purposes, so this document focuses only on the mi*.c "myisam" C -programs. They are: -@itemize @bullet -@item -mi_cache.c -- for reading records from a cache -@item -mi_changed.c -- a single routine for setting a "changed" flag (very short) -@item -mi_check.c -- for checking and repairing tables. Used by the myisamchk program and by the MySQL server. -@item -mi_checksum.c -- calculates a checksum for a row -@item -mi_close.c -- close database -@item -mi_create.c -- create a table -@item -mi_dbug.c -- support routines for use with "dbug" (see \dbug description) -@item -mi_delete.c -- delete a row -@item -mi_delete_all.c -- delete all rows -@item -mi_delete_table.c -- delete a table (very short) -@item -mi_dynrec.c -- functions to handle space-packed records and blobs -@item -mi_extra.c -- setting options and buffer sizes when optimizing -@item -mi_info.c -- return useful base information for an open table -@item -mi_key.c -- for handling keys -@item -mi_locking.c -- lock database -@item -mi_log.c -- save commands in a log file which myisamlog program can read. Can be used to exactly replay a set of changes to a table. -@item -mi_open.c -- open database -@item -mi_packrec.c -- read from a data file compresed with myisampack -@item -mi_page.c -- read and write pages containing keys -@item -mi_panic.c -- the mi_panic routine, probably for sudden shutdowns -@item -mi_range.c -- approximate count of how many records lie between two keys -@item -mi_rename.c -- rename a table -@item -mi_rfirst.c -- read first row through a specific key (very short) -@item -mi_rkey.c -- read a record using a key -@item -mi_rlast.c -- read last row with same key as previously-read row -@item -mi_rnext.c -- read next row with same key as previously-read row -@item -mi_rnext_same.c -- same as mi_rnext.c, but abort if the key changes -@item -mi_rprev.c -- read previous row with same key as previously-read row -@item -mi_rrnd.c -- read a row based on position -@item -mi_rsame.c -- find current row using positional read or key-based read -@item -mi_rsamepos.c -- positional read -@item -mi_scan.c -- read all rows sequentially -@item -mi_search.c -- key-handling functions -@item -mi_static.c -- static variables (very short) -@item -mi_statrec.c -- functions to handle fixed-length records -@item -mi_test1.c -- testing basic functions -@item -mi_test2.c -- testing database and storing results -@item -mi_test3.c -- testing locking -@item -mi_unique.c -- functions to check if a row is unique -@item -mi_update.c -- update an existing row -@item -mi_write.c -- insert a new row -@end itemize -@*@* - -@subsection myisammrg - -MyISAM Merge table handler. -@*@* - -As with other table handlers, you'll find that the *.c files in the -myissammrg directory have counterparts in the myisam directory. In -fact, this general description of a myisammrg program is almost always -true: The myisammrg -function checks an argument, the myisammrg function formulates an -expression for passing to a myisam function, the myisammrg calls a -myisam function, the myisammrg function returns. -@*@* - -These are the 21 files in the myisammrg directory, with notes about -the myisam functions or programs they're connected with: -@itemize @bullet -@item -myrg_close.c -- mi_close.c -@item -myrg_create.c -- mi_create.c -@item -myrg_delete.c -- mi_delete.c / delete last-read record -@item -myrg_extra.c -- mi_extra.c / "extra functions we want to do ..." -@item -myrg_info.c -- mi_info.c / display information about a mymerge file -@item -myrg_locking.c -- mi_locking.c / lock databases -@item -myrg_open.c -- mi_open.c / open a MyISAM MERGE table -@item -myrg_panic.c -- mi_panic.c / close in a hurry -@item -myrg_queue.c -- read record based on a key -@item -myrg_range.c -- mi_range.c / find records in a range -@item -myrg_rfirst.c -- mi_rfirst.c / read first record according to -specific key -@item -myrg_rkey.c -- mi_rkey.c / read record based on a key -@item -myrg_rlast.c -- mi_rlast.c / read last row with same key as previous -read -@item -myrg_rnext.c -- mi_rnext.c / read next row with same key as previous -read -@item -myrg_rnext_same.c -- mi_rnext_same.c / read next row with same key -@item -myrg_rprev.c -- mi_rprev.c / read previous row with same key -@item -myrg_rrnd.c -- mi_rrnd.c / read record with random access -@item -myrg_rsame.c -- mi_rsame.c / call mi_rsame function, see -\myisam\mi_rsame.c -@item -myrg_static.c -- mi_static.c / static variable declaration -@item -myrg_update.c -- mi_update.c / call mi_update function, see -\myisam\mi_update.c -@item -myrg_write.c -- mi_write.c / call mi_write function, see -\myisam\mi_write.c -@end itemize -@*@* - -@subsection mysql-test - -A test suite for mysqld. -@*@* - -The directory has a README file which explains how to run the tests, -how to make new tests (in files with the filename extension "*.test"), -and how to report errors. -@*@* - -There are four subdirectories: -@itemize @bullet -@item -\misc -- contains one minor Perl program -@item -\r -- contains *.result, i.e. "what happened" files and -*.required, i.e. "what should happen" file -@item -\std_data -- contains standard data for input to tests -@item -\t -- contains tests -@end itemize -@*@* - -There are 186 *.test files in the \t subdirectory. Primarily these are -SQL scripts which try out a feature, output a result, and compare the -result with what's required. Some samples of what the test files check -are: latin1_de comparisons, date additions, the HAVING clause, outer -joins, openSSL, load data, logging, truncate, and UNION. -@*@* - -There are other tests in these directories: -@itemize @bullet -@item -sql-bench -@item -tests -@end itemize - -@subsection mysys - -MySQL system library. Low level routines for file access and so on. -@*@* - -There are 115 *.c programs in this directory: -@itemize @bullet -@item -array.c -- Dynamic array handling -@item -charset.c -- Using dynamic character sets, set default character set, ... -@item -charset2html.c -- Check what character set a browser is using -@item -checksum.c -- Calculate checksum for a memory block, used for pack_isam -@item -default.c -- Find defaults from *.cnf or *.ini files -@item -errors.c -- English text of global errors -@item -hash.c -- Hash search/compare/free functions "for saving keys" -@item -list.c -- Double-linked lists -@item -make-conf.c -- "Make a charset .conf file out of a ctype-charset.c file" -@item -md5.c -- MD5 ("Message Digest 5") algorithm from RSA Data Security -@item -mf_brkhant.c -- Prevent user from doing a Break during critical execution (not used in MySQL; can be used by standalone MyISAM applications) -@item -mf_cache.c -- "Open a temporary file and cache it with io_cache" -@item -mf_dirname.c -- Parse/convert directory names -@item -mf_fn_ext.c -- Get filename extension -@item -mf_format.c -- Format a filename -@item -mf_getdate.c -- Get date, return in yyyy-mm-dd hh:mm:ss format -@item -mf_iocache.c -- Cached read/write of files in fixed-size units -@item -mf_iocache2.c -- Continuation of mf_iocache.c -@item -mf_keycache.c -- Key block caching for certain file types -@item -mf_loadpath.c -- Return full path name (no ..\ stuff) -@item -mf_pack.c -- Packing/unpacking directory names for create purposes -@item -mf_path.c -- Determine where a program can find its files -@item -mf_qsort.c -- Quicksort -@item -mf_qsort2.c -- Quicksort, part 2 (allows the passing of an extra argument to the sort-compare routine) -@item -mf_radix.c -- Radix sort -@item -mf_same.c -- Determine whether filenames are the same -@item -mf_sort.c -- Sort with choice of Quicksort or Radix sort -@item -mf_soundex.c -- Soundex algorithm derived from EDN Nov. 14, 1985 (pg. 36) -@item -mf_strip.c -- Strip trail spaces from a string -@item -mf_tempdir.c -- Initialize/find/free temporary directory -@item -mf_tempfile.c -- Create a temporary file -@item -mf_unixpath.c -- Convert filename to UNIX-style filename -@item -mf_util.c -- Routines, #ifdef'd, which may be missing on some -machines -@item -mf_wcomp.c -- Comparisons with wildcards -@item -mf_wfile.c -- Finding files with wildcards -@item -mulalloc.c -- Malloc many pointers at the same time -@item -my_aes.c -- AES encryption -@item -my_alarm.c -- Set a variable value when an alarm is received -@item -my_alloc.c -- malloc of results which will be freed simultaneously -@item -my_append.c -- one file to another -@item -my_bit.c -- smallest X where 2^X >= value, maybe useful for -divisions -@item -my_bitmap.c -- Handle uchar arrays as large bitmaps -@item -my_chsize.c -- Truncate file if shorter, else fill with a filler -character -@item -my_clock.c -- Time-of-day ("clock()") function, with OS-dependent -#ifdef's -@item -my_compress.c -- Compress packet (see also description of \zlib -directory) -@item -my_copy.c -- Copy files -@item -my_create.c -- Create file -@item -my_delete.c -- Delete file -@item -my_div.c -- Get file's name -@item -my_dup.c -- Open a duplicated file -@item -my_error.c -- Return formatted error to user -@item -my_fopen.c -- File open -@item -my_fstream.c -- Streaming file read/write -@item -my_getwd.c -- Get working directory -@item -my_gethostbyname.c -- Thread-safe version of standard net -gethostbyname() func -@item -my_getopt.c -- Find out what options are in effect -@item -my_handler.c -- Compare two keys in various possible formats -@item -my_init.c -- Initialize variables and functions in the mysys library -@item -my_lib.c -- Compare/convert directory names and file names -@item -my_lock.c -- Lock part of a file -@item -my_lockmem.c -- "Allocate a block of locked memory" -@item -my_lread.c -- Read a specified number of bytes from a file into -memory -@item -my_lwrite.c -- Write a specified number of bytes from memory into a -file -@item -my_malloc.c -- Malloc (memory allocate) and dup functions -@item -my_messnc.c -- Put out a message on stderr with "no curses" -@item -my_mkdir.c -- Make directory -@item -my_net.c -- Thread-safe version of net inet_ntoa function -@item -my_netware.c -- Functions used only with the Novell Netware version -of MySQL -@item -my_once.c -- Allocation / duplication for "things we don't need to -free" -@item -my_open.c -- Open a file -@item -my_os2cond.c -- OS2-specific: "A simple implementation of posix conditions" -@item -my_os2dirsrch.c -- OS2-specific: Emulate a Win32 directory search -@item -my_os2dlfcn.c -- OS2-specific: Emulate UNIX dynamic loading -@item -my_os2file64.c -- OS2-specific: For File64bit setting -@item -my_os2mutex.c -- OS2-specific: For mutex handling -@item -my_os2thread.c -- OS2-specific: For thread handling -@item -my_os2tls.c -- OS2-specific: For thread-local storage -@item -my_port.c -- OS/machine-dependent porting functions, e.g. AIX-specific my_ulonglong2double() -@item -my_pread.c -- Read a specified number of bytes from a file -@item -my_pthread.c -- A wrapper for thread-handling functions in different OSs -@item -my_quick.c -- Read/write (labelled a "quicker" interface, perhaps -obsolete) -@item -my_read.c -- Read a specified number of bytes from a file, possibly -retry -@item -my_realloc.c -- Reallocate memory allocated with my_alloc.c -(probably) -@item -my_redel.c -- Rename and delete file -@item -my_rename.c -- Rename without delete -@item -my_seek.c -- Seek, i.e. point to a spot within a file -@item -my_semaphore.c -- Semaphore routines, for use on OS that doesn't support them -@item -my_sleep.c -- Wait n microseconds -@item -my_static.c -- Static variables used by the mysys library -@item -my_symlink.c -- Read a symbolic link (symlinks are a UNIX thing, I guess) -@item -my_symlink2.c -- Part 2 of my_symlink.c -@item -my_tempnam.c -- Obsolete temporary-filename routine used by ISAM table handler -@item -my_thr_init.c -- initialize/allocate "all mysys & debug thread variables" -@item -my_wincond.c -- Windows-specific: emulate Posix conditions -@item -my_winsem.c -- Windows-specific: emulate Posix threads -@item -my_winthread.c -- Windows-specific: emulate Posix threads -@item -my_write.c -- Write a specified number of bytes to a file -@item -ptr_cmp.c -- Point to an optimal byte-comparison function -@item -queues.c -- Handle priority queues as in Robert Sedgewick's book -@item -raid2.c -- RAID support (the true implementation is in raid.cc) -@item -rijndael.c -- "Optimised ANSI C code for the Rijndael cipher (now AES") -@item -safemalloc.c -- A version of the standard malloc() with safety checking -@item -sha1.c -- Implementation of Secure Hashing Algorithm 1 -@item -string.c -- Initialize/append/free dynamically-sized strings; see also sql_string.cc in the /sql directory -@item -testhash.c -- Standalone program: test the hash library routines -@item -test_charset.c -- Standalone program: display character set information -@item -test_dir.c -- Standalone program: placeholder for "test all functions" idea -@item -test_fn.c -- Standalone program: apparently tests a function -@item -test_xml.c -- Standalone program: test XML routines -@item -thr_alarm.c -- Thread alarms and signal handling -@item -thr_lock.c -- "Read and write locks for Posix threads" -@item -thr_mutex.c -- A wrapper for mutex functions -@item -thr_rwlock.c -- Synchronizes the readers' thread locks with the writer's lock -@item -tree.c -- Initialize/search/free binary trees -@item -typelib.c -- Find a string in a set of strings; returns the offset to the string found -@end itemize -@*@* - -You can find documentation for the main functions in these files -elsewhere in this document. For example, the main functions in my_getwd.c -are described thus: -@*@* - -@example -"int my_getwd _A((string buf, uint size, myf MyFlags)); @* - int my_setwd _A((const char *dir, myf MyFlags)); @* - Get and set working directory." @* -@end example - -@subsection netware - -Files related to the Novell NetWare version of MySQL. -@*@* - -There are 39 files on this directory. Most have filename extensions of -*.def, *.sql, or *.c. -@*@* - -The twenty-five *.def files are all from Novell Inc. They contain import or -export symbols. (".def" is a common filename extension for -"definition".) -@*@* - -The two *.sql files are short scripts of SQL statements used in -testing. -@*@* - -These are the five *.c files, all from Novell Inc.: -@itemize @bullet -@item -libmysqlmain.c -- Only one function: init_available_charsets() -@item -my_manage.c -- Standalone management utility -@item -mysql_install_db.c -- Compare \scripts\mysql_install_db.sh -@item -mysql_test_run.c -- Short test program -@item -mysqld_safe.c -- Compare \scripts\mysqld_safe.sh -@end itemize - -Perhaps the most important file is: -@itemize @bullet -@item -netware.patch -- NetWare-specific build instructions and switches -(compare \mysql-4.1\ltmain.sh) -@end itemize -@*@* - -For instructions about basic installation, see "Deployment Guide For -NetWare AMP" at: -@url{http://developer.novell.com/ndk/whitepapers/namp.htm} -@* - -@subsection NEW-RPMS - -Directory to place RPMs while making a distribution. -@*@* - -This directory is not part of the Windows distribution. It is -a temporary directory used during RPM builds with Linux distributions. -@*@* - -@subsection os2 - -Routines for working with the OS2 operating system. -@*@* - -The files in this directory are the product of the efforts of three -people from outside MySQL: Yuri Dario, Timo Maier, and John M -Alfredsson. There are no .C program files in this directory. -@*@* - -The contents of \os2 are: -@itemize @bullet -@item -A Readme.Txt file -@item -An \include subdirectory containing .h files which are for OS/2 only -@item -Files used in the build process (configuration, switches, and one -.obj) -@end itemize -@*@* - -The README file refers to MySQL version 3.23, which suggests that -there have been no updates for MySQL 4.0 for this section. -@*@* - -@subsection pstack - -Process stack display (not currently used). -@*@* - -This is a set of publicly-available debugging aids which all do pretty -well the same thing: display the contents of the stack, along with -symbolic information, for a running process. There are versions for -various object file formats (such as ELF and IEEE-695). Most of the -programs are copyrighted by the Free Software Foundation and are -marked as "part of GNU Binutils". -@*@* - -In other words, the pstack files are not really part of the MySQL -library. They are merely useful when you re-program some MYSQL code -and it crashes. -@*@* - -@subsection regex - -Henry Spencer's Regular Expression library for support of REGEXP function. -@*@* - -This is the copyrighted product of Henry Spencer from the University -of Toronto. It's a fairly-well-known implementation of the -requirements of POSIX 1003.2 Section 2.8. The library is bundled with -Apache and is the default implementation for regular-expression -handling in BSD Unix. MySQL's Monty Widenius has made minor changes in -three programs (debug.c, engine.c, regexec.c) but this is not a MySQL -package. MySQL calls it only in order to support two MySQL functions: -REGEXP and RLIKE. -@*@* - -Some of Mr Spencer's documentation for the regex library can be found -in the README and WHATSNEW files. -@*@* - -One MySQL program which uses regex is \cmd-line-utils\libedit\search.c -@*@* - -This program calls the 'regcomp' function, which is the entry point in -\regex\regexp.c. -@*@* - -@subsection SCCS - -Source Code Control System (not part of source distribution). -@*@* - -You will see this directory if and only if you used BitKeeper for -downloading the source. The files here are for BitKeeper -administration and are not of interest to application programmers. -@*@* - -@subsection scripts - -SQL batches, e.g. mysqlbug and mysql_install_db. -@*@* - -The *.sh filename extension stands for "shell script". Linux -programmers use it where Windows programmers would use a *.bat -(batch filename extension). -@*@* - -The *.sh files on this directory are: -@itemize @bullet -@item -fill_help_tables.sh -- Create help-information tables and insert -@item -make_binary_distribution.sh -- Get configure information, make, produce tar -@item -msql2mysql.sh -- Convert (partly) mSQL programs and scripts to MySQL -@item -mysqlbug.sh -- Create a bug report and mail it -@item -mysqld_multi.sh -- Start/stop any number of mysqld instances -@item -mysqld_safe-watch.sh -- Start/restart in safe mode -@item -mysqld_safe.sh -- Start/restart in safe mode -@item -mysqldumpslow.sh -- Parse and summarize the slow query log -@item -mysqlhotcopy.sh -- Hot backup -@item -mysql_config.sh -- Get configuration information that might be needed to compile a client -@item -mysql_convert_table_format.sh -- Conversion, e.g. from ISAM to MyISAM -@item -mysql_explain_log.sh -- Put a log (made with --log) into a MySQL table -@item -mysql_find_rows.sh -- Search for queries containing <regexp> -@item -mysql_fix_extensions.sh -- Renames some file extensions, not recommended -@item -mysql_fix_privilege_tables.sh -- Fix mysql.user etc. when upgrading. Can be safely run during any upgrade to get the newest -MySQL privilege tables -@item -mysql_install_db.sh -- Create privilege tables and func table -@item -mysql_secure_installation.sh -- Disallow remote root login, eliminate test, etc. -@item -mysql_setpermission.sh -- Aid to add users or databases, sets privileges -@item -mysql_tableinfo.sh -- Puts info re MySQL tables into a MySQL table -@item -mysql_zap.sh -- Kill processes which match pattern -@end itemize -@*@* - -@subsection sql - -Programs for handling SQL commands. The "core" of MySQL. -@*@* - -These are the .c and .cc files in the sql directory: -@itemize @bullet -@item -convert.cc -- convert tables between different character sets -@item -derror.cc -- read language-dependent message file -@item -des_key_file.cc -- load DES keys from plaintext file -@item -field.cc -- "implement classes defined in field.h" (long); defines all storage methods MySQL uses to store field information -into records that are then passed to handlers -@item -field_conv.cc -- functions to copy data between fields -@item -filesort.cc -- sort a result set, using memory or temporary files -@item -frm_crypt.cc -- contains only one short function: get_crypt_for_frm -@item -gen_lex_hash.cc -- Knuth's algorithm from Vol 3 Sorting and Searching, Chapter 6.3; used to search for SQL keywords in a query -@item -gstream.cc -- GTextReadStream, used to read GIS objects -@item -handler.cc -- handler-calling functions -@item -hash_filo.cc -- static-sized hash tables, used to store info like hostname -> ip tables in a FIFO manner -@item -ha_berkeley.cc -- Handler: BDB -@item -ha_heap.cc -- Handler: Heap -@item -ha_innodb.cc -- Handler: InnoDB -@item -ha_isam.cc -- Handler: ISAM -@item -ha_isammrg.cc -- Handler: (ISAM MERGE) -@item -ha_myisam.cc -- Handler: MyISAM -@item -ha_myisammrg.cc -- Handler: (MyISAM MERGE) -@item -hostname.cc -- Given IP, return hostname -@item -init.cc -- Init and dummy functions for interface with unireg -@item -item.cc -- Item functions -@item -item_buff.cc -- Buffers to save and compare item values -@item -item_cmpfunc.cc -- Definition of all compare functions -@item -item_create.cc -- Create an item. Used by lex.h. -@item -item_func.cc -- Numerical functions -@item -item_row.cc -- Row items for comparing rows and for IN on rows -@item -item_sum.cc -- Set functions (SUM, AVG, etc.) -@item -item_strfunc.cc -- String functions -@item -item_subselect.cc -- Item subselect -@item -item_timefunc.cc -- Date/time functions, e.g. week of year -@item -item_uniq.cc -- Empty file, here for compatibility reasons -@item -key.cc -- Functions to create keys from records and compare a key to a key in a record -@item -lock.cc -- Locks -@item -log.cc -- Logs -@item -log_event.cc -- Log event (a binary log consists of a stream of log events) -@item -matherr.c -- Handling overflow, underflow, etc. -@item -mf_iocache.cc -- Caching of (sequential) reads and writes -@item -mini_client.cc -- Client included in server for server-server messaging; used by the replication code -@item -mysqld.cc -- Source of mysqld.exe; includes the main() program that starts mysqld, handling of signals and connections -@item -my_lock.c -- Lock part of a file (like /mysys/my_lock.c, but with timeout handling for threads) -@item -net_serv.cc -- Read/write of packets on a network socket -@item -nt_servc.cc -- Initialize/register/remove an NT service -@item -opt_ft.cc -- Create a FT or QUICK RANGE based on a key (very short) -@item -opt_range.cc -- Range of keys -@item -opt_sum.cc -- Optimize functions in presence of (implied) GROUP BY -@item -password.c -- Password checking -@item -procedure.cc -- Procedure interface, as used in SELECT * FROM Table_name PROCEDURE ANALYSE -@item -protocol.cc -- Low level functions for PACKING data that is sent to client; actual sending done with net_serv.cc -@item -records.cc -- Functions for easy reading of records, possible through a cache -@item -repl_failsafe.cc -- Replication fail-save (not yet implemented) -@item -set_var.cc -- Set and retrieve MySQL user variables -@item -slave.cc -- Procedures for a slave in a master/slave (replication) relation -@item -spatial.cc -- Geometry stuff (lines, points, etc.) -@item -sql_acl.cc -- Functions related to ACL security; checks, stores, retrieves, and deletes MySQL user level privileges -@item -sql_analyse.cc -- Implements the PROCEDURE analyse, which analyses a query result and returns the 'optimal' data type for each result column -@item -sql_base.cc -- Basic functions needed by many modules, like opening and closing tables with table cache management -@item -sql_cache.cc -- SQL query cache, with long comments about how caching works -@item -sql_class.cc -- SQL class; implements the SQL base classes, of which THD (THREAD object) is the most important -@item -sql_crypt.cc -- Encode / decode, very short -@item -sql_db.cc -- Create / drop database -@item -sql_delete.cc -- The DELETE statement -@item -sql_derived.cc -- Derived tables, with long comments -@item -sql_do.cc -- The DO statement -@item -sql_error.cc -- Errors and warnings -@item -sql_handler.cc -- Implements the HANDLER interface, which gives direct access to rows in MyISAM and InnoDB -@item -sql_help.cc -- The HELP statement -@item -sql_insert.cc -- The INSERT statement -@item -sql_lex.cc -- Does lexical analysis of a query; i.e. breaks a query string into pieces and determines the basic type (number, -string, keyword, etc.) of each piece -@item -sql_list.cc -- Only list_node_end_of_list, short (the rest of the list class is implemented in sql_list.h) -@item -sql_load.cc -- The LOAD DATA statement -@item -sql_map.cc -- Memory-mapped files (not yet in use) -@item -sql_manager.cc -- Maintenance tasks, e.g. flushing the buffers periodically; used with BDB table logs -@item -sql_olap.cc -- ROLLUP -@item -sql_parse.cc -- Parse an SQL statement; do initial checks and then jump to the function that should execute the statement -@item -sql_prepare.cc -- Prepare an SQL statement, or use a prepared statement -@item -sql_repl.cc -- Replication -@item -sql_rename.cc -- Rename table -@item -sql_select.cc -- Select and join optimisation -@item -sql_show.cc -- The SHOW statement -@item -sql_string.cc -- String functions: alloc, realloc, copy, convert, etc. -@item -sql_table.cc -- The DROP TABLE and ALTER TABLE statements -@item -sql_test.cc -- Some debugging information -@item -sql_udf.cc -- User-defined functions -@item -sql_union.cc -- The UNION operator -@item -sql_update.cc -- The UPDATE statement -@item -stacktrace.c -- Display stack trace (Linux/Intel only) -@item -table.cc -- Table metadata retrieval; read the table definition from a .frm file and store it in a TABLE object -@item -thr_malloc.cc -- Thread-safe interface to /mysys/my_alloc.c -@item -time.cc -- Date and time functions -@item -udf_example.cc -- Example file of user-defined functions -@item -uniques.cc -- Function to handle quick removal of duplicates -@item -unireg.cc -- Create a unireg form file (.frm) from a FIELD and field-info struct -@end itemize -@*@* - -@subsection sql-bench - -The MySQL Benchmarks. -@*@* - -This directory has the programs and input files which MySQL uses for -its comparisons of MySQL, PostgreSQL, mSQL, Solid, etc. Since MySQL -publishes the comparative results, it's only right that it should make -available all the material necessary to reproduce all the tests. -@*@* - -There are five subdirectories and sub-subdirectories: -@itemize @bullet -@item -\Comments -- Comments about results from tests of Access, Adabas, etc. -@item -\Data\ATIS -- .txt files containing input data for the "ATIS" tests -@item -\Data\Wisconsin -- .txt files containing input data for the "Wisconsin" tests -@item -\Results -- old test results -@item -\Results-win32 -- old test results from Windows 32-bit tests -@end itemize -@*@* - -There are twenty-four *.sh (shell script) files, which involve Perl -programs. -@*@* - -There are three *.bat (batch) files. -@*@* - -There is one README file and one TODO file. -@*@* - -@subsection SSL - -Secure Sockets Layer; includes an example certification one can use -test an SSL (secure) database connection. -@*@* - -This isn't a code directory. It contains a short note from Tonu Samuel -(the NOTES file) and seven *.pem files. PEM stands for "Privacy -Enhanced Mail" and is an Internet standard for adding security to -electronic mail. Finally, there are two short scripts for running -clients and servers over SSL connections. -@*@* - -@subsection strings - -The string library. -@*@* - -Many of the files in this subdirectory are equivalent to well-known -functions that appear in most C string libraries. For those, there is -documentation available in most compiler handbooks. -@*@* - -On the other hand, some of the files are MySQL additions or -improvements. Often the MySQL changes are attempts to optimize the -standard libraries. It doesn't seem that anyone tried to optimize for -recent Pentium class processors, though. -@*@* - -The .C files are: -@itemize @bullet -@item -atof.c -- ascii-to-float, MySQL version -@item -bchange.c -- short replacement routine written by Monty Widenius in -1987 -@item -bcmp.c -- binary compare, rarely used -@item -bcopy-duff.c -- block copy: attempt to copy memory blocks faster -than cmemcpy -@item -bfill.c -- byte fill, to fill a buffer with (length) copies of a -byte -@item -bmove.c -- block move -@item -bmove512.c -- "should be the fastest way to move a multiple of 512 -bytes" -@item -bmove_upp.c -- bmove.c variant, starting with last byte -@item -bzero.c -- something like bfill with an argument of 0 -@item -conf_to_src.c -- reading a configuration file -@item -ctype*.c -- string handling programs for each char type MySQL -handles -@item -do_ctype.c -- display case-conversion and sort-conversion tables -@item -int2str.c -- integer-to-string -@item -is_prefix.c -- checks whether string1 starts with string2 -@item -llstr.c -- convert long long to temporary-buffer string, return -pointer -@item -longlong2str.c -- ditto, but to argument-buffer -@item -memcmp.c -- memory compare -@item -memset.c -- memory set -@item -my_vsnprintf.c -- variant of printf -@item -r_strinstr.c -- see if one string is within another -@item -str2int.c -- convert string to integer -@item -strappend.c -- fill up a string to n characters -@item -strcat.c -- concatenate strings -@item -strcend.c -- point to where a character C occurs within str, or NULL -@item -strchr.c -- point to first place in string where character occurs -@item -strcmp.c -- compare two strings -@item -strcont.c -- point to where any one of a set of characters appears -@item -strend.c -- point to the '\0' byte which terminates str -@item -strfill.c -- fill a string with n copies of a byte -@item -strinstr.c -- find string within string -@item -strlen.c -- return length of string in bytes -@item -strmake.c -- create new string from old string with fixed length, append end \0 if needed -@item -strmov.c -- move source to dest and return pointer to end -@item -strnlen.c -- return min(length of string, n) -@item -strnmov.c -- move source to dest for source size, or for n bytes -@item -strrchr.c -- find a character within string, searching from end -@item -strstr.c -- find an instance of pattern within source -@item -strto.c -- string to long, to long long, to unsigned long, etc. -@item -strtol.c -- string to long -@item -strtoll.c -- string to long long -@item -strtoul.c -- string to unsigned long -@item -strtoull.c -- string to unsigned long long -@item -strxmov.c -- move a series of concatenated source strings to dest -@item -strxnmov.c -- like strxmov.c but with a maximum length n -@item -str_test.c -- test of all the string functions encoded in assembler -@item -udiv.c -- unsigned long divide, for operating systems that don't support these -@item -xml.c -- read and parse XML strings; used to read character definition information stored in /sql/share/charsets -@end itemize -@*@* - -There are also four .ASM files -- macros.asm, ptr_cmp.asm, -strings.asm, and strxmov.asm -- which can replace some of the -C-program functions. But again, they look like optimizations for old -members of the Intel processor family. -@*@* - -@subsection support-files - -Files used to build MySQL on different systems. -@*@* - -The files here are for building ("making") MySQL given a package -manager, compiler, linker, and other build tools. The support files -provide instructions and switches for the build processes. They -include example my.cnf files one can use as a default setup for -MySQL. -@*@* - -@subsection tests - -Tests in Perl and in C. -@*@* - -The files in this directory are test programs that can be used -as a base to write a program to simulate problems in MySQL in various -scenarios: forks, locks, big records, exporting, truncating, and so on. -Some examples are: -@itemize @bullet -@item -connect_test.c -- test that a connect is possible -@item -insert_test.c -- test that an insert is possible -@item -list_test.c -- test that a select is possible -@item -select_test.c -- test that a select is possible -@item -showdb_test.c -- test that a show-databases is possible -@item -ssl_test.c -- test that SSL is possible -@item -thread_test.c -- test that threading is possible -@end itemize -@*@* - -@subsection tools - -Tools -- well, actually, one tool. -@*@* - -The only file is: -@itemize @bullet -@item -mysqlmanager.c -- A "server management daemon" by Sasha Pachev. This -is a tool under development and is not yet useful. Related to fail-safe -replication. -@end itemize -@*@* - -@subsection VC++Files - -Visual C++ Files. -@*@* - -Includes this entire directory, repeated for VC++ (Windows) use. -@*@* - -VC++Files includes a complete environment to compile MySQL with the VC++ -compiler. To use it, just copy the files on this directory; the make_win_src_distribution.sh -script uses these files to create a Windows source installation. -@*@* - -This directory has subdirectories which are copies of the main directories. -For example, there is a subdirectory \VC++Files\heap, which has the Microsoft -developer studio project file to compile \heap with VC++. So for a description -of the files in \VC++Files\heap, see the description of the files in \heap. The -same applies for almost all of VC++Files's subdirectories (bdb, client, -isam, libmysql, etc.). The difference is that the \VC++Files variants -are specifically for compilation with Microsoft Visual C++ in 32-bit -Windows environments. -@*@* - -In addition to the "subdirectories which are duplicates of -directories", VC++Files contains these subdirectories, which are not -duplicates: -@itemize @bullet -@item -comp_err -- (nearly empty) -@item -contrib -- (nearly empty) -@item -InstallShield -- script files -@item -isamchk -- (nearly empty) -@item -libmysqltest -- one small non-MySQL test program: mytest.c -@item -myisamchk -- (nearly empty) -@item -myisamlog -- (nearly empty) -@item -myisammrg -- (nearly empty) -@item -mysqlbinlog -- (nearly empty) -@item -mysqlmanager -- MFC foundation class files created by AppWizard -@item -mysqlserver -- (nearly empty) -@item -mysqlshutdown -- one short program, mysqlshutdown.c -@item -mysqlwatch.c -- Windows service initialization and monitoring -@item -my_print_defaults -- (nearly empty) -@item -pack_isam -- (nearly empty) -@item -perror -- (nearly empty) -@item -prepare -- (nearly empty) -@item -replace -- (nearly empty) -@item -SCCS -- source code control system -@item -test1 -- tests connecting via X threads -@item -thr_insert_test -- (nearly empty) -@item -thr_test -- one short program used to test for memory-allocation bug -@item -winmysqladmin -- the winmysqladmin.exe source -@end itemize -@*@* - -The "nearly empty" subdirectories noted above (e.g. comp_err and isamchk) -are needed because VC++ requires one directory per project (i.e. executable). -We are trying to keep to the MySQL standard source layout and compile only -to different directories. -@*@* - -@subsection vio - -Virtual I/O Library. -@*@* - -The VIO routines are wrappers for the various network I/O calls that -happen with different protocols. The idea is that in the main modules -one won't have to write separate bits of code for each protocol. Thus -vio's purpose is somewhat like the purpose of Microsoft's winsock -library. -@*@* - -The underlying protocols at this moment are: TCP/IP, Named Pipes (for -WindowsNT), Shared Memory, and Secure Sockets (SSL). -@*@* - -The C programs are: -@itemize @bullet -@item -test-ssl.c -- Short standalone test program: SSL -@item -test-sslclient.c -- Short standalone test program: clients -@item -test-sslserver.c -- Short standalone test program: server -@item -vio.c -- Declarations + open/close functions -@item -viosocket.c -- Send/retrieve functions -@item -viossl.c -- SSL variations for the above -@item -viosslfactories.c -- Certification / Verification -@item -viotest.cc -- Short standalone test program: general -@item -viotest-ssl.c -- Short standalone test program: SSL -@item -viotest-sslconnect.cc -- Short standalone test program: SSL connect -@end itemize -@*@* - -The older functions -- raw_net_read, raw_net_write -- are now -obsolete. -@*@* - -@subsection zlib - -Data compression library, used on Windows. -@*@* - -zlib is a data compression library used to support the compressed -protocol and the COMPRESS/UNCOMPRESS functions under Windows. -On Unix, MySQL uses the system libgz.a library for this purpose. -@*@* - -Zlib -- which presumably stands for "Zip Library" -- is not a MySQL -package. It was produced by the GNU Zip (gzip.org) people. Zlib is a -variation of the famous "Lempel-Ziv" method, which is also used by -"Zip". The method for reducing the size of any arbitrary string of -bytes is as follows: -@itemize @bullet -@item -Find a substring which occurs twice in the string. -@item -Replace the second occurrence of the substring with (a) a pointer to -the first occurrence, plus (b) an indication of the length of the -first occurrence. -@end itemize - -There is a full description of the library's functions in the gzip -manual at: @* -@url{http://www.gzip.org/zlib/manual.html} @* -There is therefore no need to list the modules in this document. -@*@* - -The MySQL program \mysys\my_compress.c uses zlib for packet compression. -The client sends messages to the server which are compressed by zlib. -See also: \sql\net_serv.cc. - -@node Files in InnoDB Sources, , Files in MySQL Sources, Top -@chapter Annotated List Of Files in the InnoDB Source Code Distribution - -ERRATUM BY HEIKKI TUURI (START) -@*@* - -Errata about InnoDB row locks:@*@* - -@example - #define LOCK_S 4 /* shared */ - #define LOCK_X 5 /* exclusive */ -... -@strong{/* Waiting lock flag */} - #define LOCK_WAIT 256 -/* this wait bit should be so high that it can be ORed to the lock -mode and type; when this bit is set, it means that the lock has not -yet been granted, it is just waiting for its turn in the wait queue */ -... -@strong{/* Precise modes */} - #define LOCK_ORDINARY 0 -/* this flag denotes an ordinary next-key lock in contrast to LOCK_GAP -or LOCK_REC_NOT_GAP */ - #define LOCK_GAP 512 -/* this gap bit should be so high that it can be ORed to the other -flags; when this bit is set, it means that the lock holds only on the -gap before the record; for instance, an x-lock on the gap does not -give permission to modify the record on which the bit is set; locks of -this type are created when records are removed from the index chain of -records */ - #define LOCK_REC_NOT_GAP 1024 -/* this bit means that the lock is only on the index record and does -NOT block inserts to the gap before the index record; this is used in -the case when we retrieve a record with a unique key, and is also used -in locking plain SELECTs (not part of UPDATE or DELETE) when the user -has set the READ COMMITTED isolation level */ - #define LOCK_INSERT_INTENTION 2048 -/* this bit is set when we place a waiting gap type record lock -request in order to let an insert of an index record to wait until -there are no conflicting locks by other transactions on the gap; note -that this flag remains set when the waiting lock is granted, or if the -lock is inherited to a neighboring record */ -@end example -@* - -ERRATUM BY HEIKKI TUURI (END) -@*@* - -The InnoDB source files are the best place to look for information -about internals of the file structure that MySQLites can optionally -use for transaction support. But when you first look at all the -subdirectories and file names you'll wonder: Where Do I Start? It can -be daunting. -@*@* - -Well, I've been through that phase, so I'll pass on what I had to -learn on the first day that I looked at InnoDB source files. I am very -sure that this will help you grasp, in overview, the organization of -InnoDB modules. I'm also going to add comments about what is going on --- which you should mistrust! These comments are reasonable working -hypotheses; nevertheless, they have not been subjected to expert peer -review. -@*@* - -Here's how I'm going to organize the discussion. I'll take each of the -32 InnoDB subdirectories that come with the MySQL 4.0 source code in -\mysql\innobase (on my Windows directory). The format of each section -will be like this every time: -@*@* - -@strong{\subdirectory-name (LONGER EXPLANATORY NAME)}@* -@multitable @columnfractions .10 .20 .40 .50 -@item @strong{File Name} @tab @strong{What Name Stands For} @tab @strong{Size} @tab @strong{Comment Inside File} -@item file-name -@tab my-own-guess -@tab in-bytes -@tab from-the-file-itself -@end multitable -...@* -My-Comments@* -@* - -For example: @* -@example -" -@strong{\ha (HASHING)} - File Name What Name Stands For Size Comment Inside File - --------- -------------------- ------ ------------------- - ha0ha.c Hashing/Hashing 7,452 Hash table with external chains - - Comments about hashing will be here. -" -@end example -@* - -The "Comment Inside File" column is a direct copy from the first /* -comment */ line inside the file. All other comments are mine. After -I've discussed each directory, I'll finish with some notes about -naming conventions and a short list of URLs that you can use for -further reference. -@*@* - -Now let's begin. -@*@* - -@example - -@strong{\ha (HASHING)} - File Name What Name Stands For Size Comment Inside File - --------- -------------------- ------ ------------------- - ha0ha.c Hashing / Hashing 7,452 Hash table with external chains - -I'll hold my comments until the next section, \hash (HASHING). - -@strong{\hash (HASHING)} - File Name What Name Stands For Size Comment Inside File - --------- -------------------- ------ ------------------- - hash0hash.c Hashing / Hashing 3,257 Simple hash table utility - -The two C programs in the \ha and \hashing directories -- ha0ha.c and -hash0hash.c -- both refer to a "hash table" but hash0hash.c is -specialized, it is mostly about accessing points in the table under -mutex control. - -When a "database" is so small that InnoDB can load it all into memory -at once, it's more efficient to access it via a hash table. After all, -no disk i/o can be saved by using an index lookup, if there's no disk. - -@strong{\os (OPERATING SYSTEM)} - File Name What Name Stands For Size Comment Inside File - --------- -------------------- ------ ------------------- - os0shm.c OS / Shared Memory 3,150 To shared memory primitives - os0file.c OS / File 64,412 To i/o primitives - os0thread.c OS / Thread 6,827 To thread control primitives - os0proc.c OS / Process 3,700 To process control primitives - os0sync.c OS / Synchronization 10,208 To synchronization primitives - -This is a group of utilities that other modules may call whenever they -want to use an operating-system resource. For example, in os0file.c -there is a public InnoDB function named os_file_create_simple(), which -simply calls the Windows-API function CreateFile. Naturally the -contents of this group are somewhat different for other operating systems. - -The "Shared Memory" functions in os0shm.c are only called from the -communications program com0shm.c (see \com COMMUNICATIONS). The i/o -and thread-control primitives are called extensively. The word -"synchronization" in this context refers to the mutex-create and -mutex-wait functionality. - -@strong{\ut (UTILITIES)} - File Name What Name Stands For Size Comment Inside File - --------- -------------------- ------ ------------------- - ut0ut.c Utilities / Utilities 7,041 Various utilities - ut0byte.c Utilities / Debug 1,856 Byte utilities - ut0rnd.c Utilities / Random 1,475 Random numbers and hashing - ut0mem.c Utilities / Memory 5,530 Memory primitives - ut0dbg.c Utilities / Debug 642 Debug utilities - -The two functions in ut0byte.c are just for lower/upper case -conversion and comparison. The single function in ut0rnd.c is for -finding a prime slightly greater than the given argument, which is -useful for hash functions, but unrelated to randomness. The functions -in ut0mem.c are wrappers for "malloc" and "free" calls -- for the -real "memory" module see section \mem (MEMORY). Finally, the -functions in ut0ut.c are a miscellany that didn't fit better elsewhere: -get_high_bytes, clock, time, difftime, get_year_month_day, and "sprintf" -for various diagnostic purposes. - -In short: the \ut group is trivial. - -@strong{\buf (BUFFERING)} - File Name What Name Stands For Size Comment Inside File - --------- -------------------- ------ ------------------- - buf0buf.c Buffering / Buffering 53,246 The database buffer buf_pool - buf0flu.c Buffering / Flush 23,711 ... flush algorithm - buf0lru.c / least-recently-used 20,245 ... replacement algorithm - buf0rea.c Buffering / read 17,399 ... read - -There is a separate file group (\mem MEMORY) which handles memory -requests in general.A "buffer" usually has a more specific -definition, as a memory area which contains copies of pages that -ordinarily are in the main data file. The "buffer pool" is the set -of all buffers (there are lots of them because InnoDB doesn't -depend on the OS's caching to make things faster). - -The pool size is fixed (at the time of this writing) but the rest of -the buffering architecture is sophisticated, involving a host of -control structures. In general: when InnoDB needs to access a new page -it looks first in the buffer pool; InnoDB reads from disk to a new -buffer when the page isn't there; InnoDB chucks old buffers (basing -its decision on a conventional Least-Recently-Used algorithm) when it -has to make space for a new buffer. - -There are routines for checking a page's validity, and for read-ahead. -An example of "read-ahead" use: if a sequential scan is going on, then -a DBMS can read more than one page at a time, which is efficient -because reading 32,768 bytes (two pages) takes less than twice as long -as reading 16,384 bytes (one page). - -@strong{\btr (B-TREE)} - File Name What Name Stands For Size Comment Inside File - --------- -------------------- ------ ------------------- - btr0btr.c B-tree / B-tree 74,255 B-tree - btr0cur.c B-tree / Cursor 94,950 index tree cursor - btr0sea.c B-tree / Search 36,580 index tree adaptive search - btr0pcur.c B-tree / persistent cursor 14,548 index tree persistent cursor - -If you total up the sizes of the C files, you'll see that \btr is the -second-largest file group in InnoDB. This is understandable because -maintaining a B-tree is a relatively complex task. Luckily, there has -been a lot of work done to describe efficient management of B-tree and -B+-tree structures, much of it open-source or public-domain, since -their original invention over thirty years ago. - -InnoDB likes to put everything in B-trees. This is what I'd call a -"distinguishing characteristic" because in all the major DBMSs (like -IBM DB2, Microsoft SQL Server, and Oracle), the main or default or -classic structure is the heap-and-index. In InnoDB the main structure -is just the index. To put it another way: InnoDB keeps the rows in the -leaf node of the index, rather than in a separate file. Compare -Oracle's Index Organized Tables, and Microsoft SQL Server's Clustered -Indexes. - -This, by the way, has some consequences. For example, you may as well -have a primary key since otherwise InnoDB will make one anyway. And -that primary key should be the shortest of the candidate keys, since -InnoDB -will use it as a pointer if there are secondary indexes. - -Most importantly, it means that rows have no fixed address. Therefore -the routines for managing file pages should be good. We'll see about -that when we look at the \row (ROW) program group later. - -@strong{\com (COMMUNCATION)} - File Name What Name Stands For Size Comment Inside File - --------- -------------------- ------ ------------------- - com0com.c Communication 6,913 Communication primitives - com0shm.c Communication / 24,633 ... through shared memory - Shared Memory - -The communication primitives in com0com.c are said to be modelled -after the ones in Microsoft's winsock library (the Windows Sockets -interface). The communication primitives in com0shm.c are at a -slightly lower level, and are called from the routines in com0com.c. - -I was interested in seeing how InnoDB would handle inter-process -communication, since there are many options -- named pipes, TCP/IP, -Windows messaging, and Shared Memory being the main ones that come to -mind. It appears that InnoDB prefers Shared Memory. The main idea is: -there is an area of memory which two different processes (or threads, -of course) can both access. To communicate, a thread gets an -appropriate mutex, puts in a request, and waits for a response. Thread -interaction is also a subject for the os0thread.c program in another -program group, \os (OPERATING SYSTEM). - -@strong{\dyn (DYNAMICALLY ALLOCATED ARRAY)} - File Name What Name Stands For Size Comment Inside File - --------- -------------------- ------ ------------------- - dyn0dyn.c Dynamic / Dynamic 994 dynamically allocated array - -There is a single function in the dyn0dyn.c program, for adding a -block to the dynamically allocated array. InnoDB might use the array -for managing concurrency between threads. - -At the moment, the \dyn program group is trivial. - -@strong{\fil (FILE)} - File Name What Name Stands For Size Comment Inside File - --------- -------------------- ------ ------------------- - fil0fil.c File / File 39,725 The low-level file system - -The reads and writes to the database files happen here, in -co-ordination with the low-level file i/o routines (see os0file.h in -the \os program group). - -Briefly: a table's contents are in pages, which are in files, which -are in tablespaces. Files do not grow; instead one can add new files -to the tablespace. As we saw earlier (discussing the \btr program group) -the pages are nodes of B-trees. Since that's the case, new additions can -happen at various places in the logical file structure, not -necessarily at the end. Reads and writes are asynchronous, and go into -buffers, which are set up by routines in the \buf program group. - -@strong{\fsp (FILE SPACE)} - File Name What Name Stands For Size Comment Inside File - --------- -------------------- ------ ------------------- - fsp0fsp.c File Space Management 100,271 File space management - -I would have thought that the \fil (FILE) and \fsp (FILE SPACE) -MANAGEMENT programs would fit together in the same program group; -however, I guess the InnoDB folk are splitters rather than lumpers. - -It's in fsp0fsp.c that one finds some of the descriptions and comments -of extents, segments, and headers. For example, the "descriptor bitmap -of the pages in the extent" is in here, and you can find as well how -the free-page list is maintained, what's in the bitmaps, and what -various header fields' contents are. - -@strong{\fut (FILE UTILITY)} - File Name What Name Stands For Size Comment Inside File - --------- -------------------- ------ ------------------- - fut0fut.c File Utility / Utility 293 File-based utilities - fut0lst.c File Utility / List 14,129 File-based list utilities - -Mainly these small programs affect only file-based lists, so maybe -saying "File Utility" is too generic. The real work with data files -goes on in the \fsp program group. - -@strong{\log (LOGGING)} - File Name What Name Stands For Size Comment Inside File - --------- -------------------- ------ ------------------- - log0log.c Logging / Logging 77,834 Database log - log0recv.c Logging / Recovery 80,701 Recovery - -I've already written about the \log program group, so here's a link to -my previous article: "How Logs work with MySQL and InnoDB": -@url{http://www.devarticles.com/art/1/181/2} - -@strong{\mem (MEMORY)} - File Name What Name Stands For Size Comment Inside File - --------- -------------------- ------ ------------------- - mem0mem.c Memory / Memory 9,971 The memory management - mem0dbg.c Memory / Debug 21,297 ... the debug code - mem0pool.c Memory / Pool 16,293 ... the lowest level - -There is a long comment at the start of the mem0pool.c program, which -explains what the memory-consumers are, and how InnoDB tries to -satisfy them. The main thing to know is that there are really three -pools: the buffer pool (see the \buf program group), the log pool (see the \log -program group), and the common pool, which is where everything that's -not in the buffer or log pools goes (for example the parsed SQL -statements and the data dictionary cache). - -@strong{\mtr (MINI-TRANSACTION)} - File Name What Name Stands For Size Comment Inside File - --------- -------------------- ------ ------------------- - mtr0mtr.c Mini-transaction / 12,433 Mini-transaction buffer - mtr0log.c Mini-transaction / Log 8,180 ... log routines - -The mini-transaction routines are called from most of the other -program groups. I'd describe this as a low-level utility set. - -@strong{\que (QUERY GRAPH)} - File Name What Name Stands For Size Comment Inside File - --------- -------------------- ------ ------------------- - que0que.c Query Graph / Query 35,964 Query graph - -The program que0que.c ostensibly is about the execution of stored -procedures which contain commit/rollback statements. I took it that -this has little importance for the average MySQL user. - -@strong{\rem (RECORD MANAGER)} - File Name What Name Stands For Size Comment Inside File - --------- -------------------- ------ ------------------- - rem0rec.c Record Manager 14,961 Record Manager - rem0cmp.c Record Manager / 25,263 Comparison services for records - Comparison - -There's an extensive comment near the start of rem0rec.c title -"Physical Record" and it's recommended reading. At some point you'll -ask what are all those bits that surround the data in the rows on a page, -and this is where you'll find the answer. - -@strong{\row (ROW)} - File Name What Name Stands For Size Comment Inside File - --------- -------------------- ------ ------------------- - row0row.c Row / Row 16,764 General row routines - row0uins.c Row / Undo Insert 7,199 Fresh insert undo - row0umod.c Row / Undo Modify 17,147 Undo modify of a row - row0undo.c Row / Undo 10,254 Row undo - row0vers.c Row / Version 12,288 Row versions - row0mysql.c Row / MySQL 63,556 Interface [to MySQL] - row0ins.c Row / Insert 42,829 Insert into a table - row0sel.c Row / Select 85,923 Select - row0upd.c Row / Update 44,456 Update of a row - row0purge.c Row / Purge 14,961 Purge obsolete records - -Rows can be selected, inserted, updated/deleted, or purged (a -maintenance activity). These actions have ancillary actions, for -example after insert there can be an index-update test, but it seems -to me that sometimes the ancillary action has no MySQL equivalent (yet) -and so is inoperative. - -Speaking of MySQL, notice that one of the larger programs in the \row -program group is the "interface between Innobase row operations and -MySQL" (row0mysql.c) -- information interchange happens at this level -because rows in InnoDB and in MySQL are analogous, something which -can't be said for pages and other levels. - -@strong{\srv (Server)} - File Name What Name Stands For Size Comment Inside File - --------- -------------------- ------ ------------------- - srv0srv.c Server / Server 79,058 Server main program - srv0que.c Server / Query 2,361 Server query execution - srv0start.c Server / Start 34,586 Starts the server - -This is where the server reads the initial configuration files, splits -up the threads, and gets going. There is a long comment deep in the -program (you might miss it at first glance) titled "IMPLEMENTATION OF -THE SERVER MAIN PROGRAM" in which you'll find explanations about -thread priority, and about what the responsibiities are for various -thread types. - -InnoDB has many threads, for example "user threads" (which wait for -client requests and reply to them), "parallel communication threads" -(which take part of a user thread's job if a query process can be -split), "utility threads" (background priority), and a "master thread" -(high priority, usually asleep). - -@strong{\thr (Thread Local Storage)} - File Name What Name Stands For Size Comment Inside File - --------- -------------------- ------ ------------------- - thr0loc.c Thread / Local 5,261 The thread local storage - -InnoDB doesn't use the Windows-API thread-local-storage functions, -perhaps because they're not portable enough. - -@strong{\trx (Transaction)} - File Name What Name Stands For Size Comment Inside File - --------- -------------------- ------ ------------------- - trx0trx.c Transaction / 37,447 The transaction - trx0purge.c Transaction / Purge 26,782 ... Purge old versions - trx0rec.c Transaction / Record 36,525 ... Undo log record - trx0sys.c Transaction / System 20,671 ... System - trx0rseg.c / Rollback segment 6,214 ... Rollback segment - trx0undo.c Transaction / Undo 46,595 ... Undo log - -InnoDB's transaction management is supposedly "in the style of Oracle" -and that's close to true but can mislead you. -@itemize -@item -First: InnoDB uses rollback segments like Oracle8i does -- but -Oracle9i uses a different name -@item -Second: InnoDB uses multi-versioning like Oracle does -- but I see -nothing that looks like an Oracle ITL being stored in the InnoDB data -pages. -@item -Third: InnoDB and Oracle both have short (back-to-statement-start) -versioning for the READ COMMITTED isolation level and long -(back-to-transaction-start) versioning for higher levels -- but InnoDB -and Oracle have different "default" isolation levels. -@item -Finally: InnoDB's documentation says it has to lock "the gaps before -index keys" to prevent phantoms -- but any Oracle user will tell you that -phantoms are impossible anyway at the SERIALIZABLE isolation level, so -key-locks are unnecessary. -@end itemize - -The main idea, though, is that InnoDB has multi-versioning. So does -Oracle. This is very different from the way that DB2 and SQL Server do -things. - -@strong{\usr (USER)} - File Name What Name Stands For Size Comment Inside File - --------- -------------------- ------ ------------------- - usr0sess.c User / Session 27,415 Sessions - -One user can have multiple sessions (the session being all the things -that happen betweeen a connect and disconnect). This is where InnoDB -tracks session IDs, and server/client messaging. It's another of those -items which is usually MySQL's job, though. - -@strong{\data (DATA)} - File Name What Name Stands For Size Comment Inside File - --------- -------------------- ------ ------------------- - data0data.c Data / Data 26,002 SQL data field and tuple - data0type.c Data / Type 2,122 Data types - -This is a collection of minor utility routines affecting rows. - -@strong{\dict (DICTIONARY)} - File Name What Name Stands For Size Comment Inside File - --------- -------------------- ------ ------------------- - dict0dict.c Dictionary / Dictionary 84,667 Data dictionary system - dict0boot.c Dictionary / boot 12,134 ... creation and booting - dict0load.c Dictionary / load 26,546 ... load to memory cache - dict0mem.c Dictionary / memory 8,221 ... memory object creation - -The data dictionary (known in some circles as the catalog) has the -metadata information about objects in the database -- column sizes, -table names, and the like. - -@strong{\eval (EVALUATING)} - File Name What Name Stands For Size Comment Inside File - --------- -------------------- ------ ------------------- - eval0eval.c Evaluating/Evaluating 15,682 SQL evaluator - eval0proc.c Evaluating/Procedures 5,000 Executes SQL procedures - -The evaluating step is a late part of the process of interpreting an -SQL statement -- parsing has already occurred during \pars (PARSING). - -The ability to execute SQL stored procedures is an InnoDB feature, but -not a MySQL feature, so the eval0proc.c program is unimportant. - -@strong{\ibuf (INSERT BUFFER)} - File Name What Name Stands For Size Comment Inside File - --------- -------------------- ------ ------------------- - ibuf0ibuf.c Insert Buffer / 69,884 Insert buffer - -The words "Insert Buffer" mean not "buffer used for INSERT" but -"insertion of a buffer into the buffer pool" (see the \buf BUFFER -program group description). The matter is complex due to possibilities -for deadlocks, a problem to which the comments in the ibuf0ibuf.c -program devote considerable attention. - -@strong{\mach (MACHINE FORMAT)} - File Name What Name Stands For Size Comment Inside File - --------- -------------------- ------ ------------------- - mach0data.c Machine/Data 2,319 Utilities for converting - -The mach0data.c program has two small routines for reading compressed -ulints (unsigned long integers). - -@strong{\lock (LOCKING)} - File Name What Name Stands For Size Comment Inside File - --------- -------------------- ------ ------------------- - lock0lock.c Lock / Lock 127,646 The transaction lock system - -If you've used DB2 or SQL Server, you might think that locks have their -own in-memory table, that row locks might need occasional escalation to -table locks, and that there are three lock types: Shared, Update, Exclusive. - -All those things are untrue with InnoDB! Locks are kept in the database -pages. A bunch of row locks can't be rolled together into a single table -lock. And most importantly there's only one lock type. I call this type -"Update" because it has the characteristics of DB2 / SQL Server Update -locks, that is, it blocks other updates but doesn't block reads. -Unfortunately, InnoDB comments refer to them as "x-locks" etc. - -To sum it up: if your background is Oracle you won't find too much -surprising, but if your background is DB2 or SQL Server the locking -concepts and terminology will probably confuse you at first. - -You can find an online article about the differences between -Oracle-style and DB2/SQL-Server-style locks at: -@url{http://dbazine.com/gulutzan6.html} - -@strong{\odbc (ODBC)} - File Name What Name Stands For Size Comment Inside File - --------- -------------------- ------ ------------------- - odbc0odbc.c ODBC / ODBC 16,865 ODBC client library - -The odbc0odbc.c program has a small selection of old ODBC-API -functions: SQLAllocEnv, SQLAllocConnect, SQLAllocStmt, SQLConnect, -SQLError, SQLPrepare, SQLBindParameter, SQLExecute. - -@strong{\page (PAGE)} - File Name What Name Stands For Size Comment Inside File - --------- -------------------- ------ ------------------- - page0page.c Page / Page 44,309 Index page routines - page0cur.c Page / Cursor 30,305 The page cursor - -It's in the page0page.c program that you'll learn as follows: index -pages start with a header, entries in the page are in order, at the -end of the page is a sparse "page directory" (what I would have called -a slot table) which makes binary searches easier. - -Incidentally, the program comments refer to "a page size of 8 kB" -which seems obsolete. In univ.i (a file containing universal -constants) the page size is now #defined as 16KB. - -@strong{\pars (PARSING)} - File Name What Name Stands For Size Comment Inside File - --------- -------------------- ------ ------------------- - pars0pars.c Parsing/Parsing 49,947 SQL parser - pars0grm.c Parsing/Grammar 62,685 A Bison parser - pars0opt.c Parsing/Optimizer 30,809 Simple SQL Optimizer - pars0sym.c Parsing/Symbol Table 5,541 SQL parser symbol table - lexyy.c ?/Lexer 59,948 Lexical scanner - -The job is to input a string containing an SQL statement and output an -in-memory parse tree. The EVALUATING (subdirectory \eval) programs -will use the tree. - -As is common practice, the Bison and Flex tools were used -- pars0grm.c -is what the Bison parser produced from an original file named pars0grm.y -(not supplied), and lexyy.c is what Flex produced. - -Since InnoDB is a DBMS by itself, it's natural to find SQL parsing in -it. But in the MySQL/InnoDB combination, MySQL handles most of the -parsing. These files are unimportant. - -@strong{\read (READ)} - File Name What Name Stands For Size Comment Inside File - --------- -------------------- ------ ------------------- - read0read.c Read / Read 6,244 Cursor read - -The read0read.c program opens a "read view" of a query result, using -some functions in the \trx program group. - -@strong{\sync (SYNCHRONIZATION)} - File Name What Name Stands For Size Comment Inside File - --------- -------------------- ------ ------------------- - sync0sync.c Synchronization / 35,918 Mutex, the basic sync primitive - sync0arr.c ... / array 26,461 Wait array used in primitives - sync0ipm.c ... / interprocess 4,027 for interprocess sync - sync0rw.c ... / read-write 22,220 read-write lock for thread sync - -A mutex (Mutual Exclusion) is an object which only one thread/process -can hold at a time. Any modern operating system API has some functions -for mutexes; however, as the comments in the sync0sync.c code indicate, it -can be faster to write one's own low-level mechanism. In fact the old -assembly-language XCHG trick is in here -- this is the only program -that contains any assembly code. -@end example -@* -@* - -This is the end of the section-by-section account of InnoDB -subdirectories. -@*@* - -@strong{A Note About File Naming} @*@* - -There appears to be a naming convention. The first letters of the file -name are the same as the subdirectory name, then there is a '0' -separator, then there is an individual name. For the main program in a -subdirectory, the individual name may be a repeat of the subdirectory -name. For example, there is a file named ha0ha.c (the first two -letters ha mean "it's in in subdirectory ..\ha", the next letter 0 -means "0 separator", the next two letters mean "this is the main ha -program"). This naming convention is not strict, though: for example -the file lexyy.c is in the \pars subdirectory. -@*@* - -@strong{A Note About Copyrights} @*@* - -Most of the files begin with a copyright notice or a creation date, -for example "Created 10/25/1995 Heikki Tuuri". I don't know a great -deal about the history of InnoDB, but found it interesting that most -creation dates were between 1994 and 1998. -@*@* - -@strong{References} @*@* - -Ryan Bannon, Alvin Chin, Faryaaz Kassam and Andrew Roszko @* -"InnoDB Concrete Architecture" @* -@url{http://www.swen.uwaterloo.ca/~mrbannon/cs798/assignment_02/innodb.pdf} - -A student paper. It's an interesting attempt to figure out InnoDB's -architecture using tools, but I didn't end up using it for the specific -purposes of this article. -@*@* - -Peter Gulutzan @* -"How Logs Work With MySQL And InnoDB" @* -@url{http://www.devarticles.com/art/1/181/2} -@*@* - -Heikki Tuuri @* -"InnoDB Engine in MySQL-Max-3.23.54 / MySQL-4.0.9: The Up-to-Date -Reference Manual of InnoDB" @* -@url{http://www.innodb.com/ibman.html} - -This is the natural starting point for all InnoDB information. Mr -Tuuri also appears frequently on MySQL forums. -@*@* - -@summarycontents -@contents - -@bye |