diff options
author | unknown <serg@serg.mysql.com> | 2001-03-26 22:39:03 +0200 |
---|---|---|
committer | unknown <serg@serg.mysql.com> | 2001-03-26 22:39:03 +0200 |
commit | 50007b28e9d472c363ec3e516975769567fb99ef (patch) | |
tree | 331ff235df46a888bb91b6914b658b60932765cd | |
parent | 4bf00b00b74e11b801d6ad4ade5de9a07b88e1cf (diff) | |
download | mariadb-git-50007b28e9d472c363ec3e516975769567fb99ef.tar.gz |
Fulltext search section moved
-rw-r--r-- | Docs/manual.texi | 431 |
1 files changed, 215 insertions, 216 deletions
diff --git a/Docs/manual.texi b/Docs/manual.texi index 417b4ebf8a0..f37d00ae219 100644 --- a/Docs/manual.texi +++ b/Docs/manual.texi @@ -120,6 +120,7 @@ version see the relevant distribution. * Tutorial:: @strong{MySQL} Tutorial * Server:: @strong{MySQL} Server * Replication:: Replication +* Fulltext Search:: Fulltext Search * Performance:: Getting maximum performance from @strong{MySQL} * MySQL Benchmarks:: The @strong{MySQL} benchmark suite * Tools:: @strong{MySQL} Utilities @@ -600,6 +601,13 @@ Replication in MySQL * Replication FAQ:: Frequently Asked Questions about replication * Replication Problems:: Troubleshooting Replication. +MySQL Full-text Search + +* Fulltext Search:: +* Fulltext Fine-tuning:: +* Fulltext Features to Appear in MySQL 4.0:: +* Fulltext TODO:: + Getting Maximum Performance from MySQL * Optimize Basics:: Optimization overview @@ -868,15 +876,8 @@ How MySQL Compares to @code{mSQL} MySQL Internals * MySQL threads:: MySQL threads -* MySQL full-text search:: MySQL full-text search * MySQL test suite:: MySQL test suite -MySQL Full-text Search - -* Fulltext Fine-tuning:: -* Fulltext features to appear in MySQL 4.0:: -* Fulltext TODO:: - Credits * Developers:: @@ -15379,7 +15380,7 @@ In @strong{MySQL} Version 3.23.23 or later, you can also create special @code{MyISAM} table type supports @code{FULLTEXT} indexes. They can be created only from @code{VARCHAR} and @code{TEXT} columns. Indexing always happens over the entire column and partial indexing is not -supported. See @ref{MySQL full-text search} for details. +supported. See @ref{Fulltext Search} for details. @cindex multi-column indexes @cindex indexes, multi-column @@ -16122,7 +16123,7 @@ For @code{MATCH ... AGAINST()} to work, a @strong{FULLTEXT} index must be created first. @xref{CREATE TABLE, , @code{CREATE TABLE}}. @code{MATCH ... AGAINST()} is available in @strong{MySQL} Version 3.23.23 or later. For details and usage examples -@pxref{MySQL full-text search}. +@pxref{Fulltext Search}. @end table @findex casts @@ -18496,7 +18497,7 @@ In @strong{MySQL} Version 3.23.23 or later, you can also create special @code{MyISAM} table type supports @code{FULLTEXT} indexes. They can be created only from @code{VARCHAR} and @code{TEXT} columns. Indexing always happens over the entire column, partial indexing is not -supported. See @ref{MySQL full-text search} for details of operation. +supported. See @ref{Fulltext Search} for details of operation. @item The @code{FOREIGN KEY}, @code{CHECK}, and @code{REFERENCES} clauses don't @@ -22675,7 +22676,7 @@ For more information about how @strong{MySQL} uses indexes, see @code{FULLTEXT} indexes can index only @code{VARCHAR} and @code{TEXT} columns, and only in @code{MyISAM} tables. @code{FULLTEXT} indexes are available in @strong{MySQL} Version 3.23.23 and later. -@ref{MySQL full-text search}. +@ref{Fulltext Search}. @findex DROP INDEX @node DROP INDEX, Comments, CREATE INDEX, Reference @@ -26913,7 +26914,7 @@ tables}. @cindex increasing, speed @cindex speed, increasing @cindex databases, replicating -@node Replication, Performance, Server, Top +@node Replication, Fulltext Search, Server, Top @chapter Replication in MySQL @menu * Replication Intro:: Introduction @@ -27871,10 +27872,208 @@ Once you have collected the evidence on the phantom problem, try hard to isolate it into a separate test case first. Then report the problem to @email{bugs@@lists.mysql.com} with as much info as possible. +@cindex searching, full-text +@cindex full-text search +@cindex FULLTEXT +@node Fulltext Search, Performance, Replication, Top +@chapter MySQL Full-text Search + +Since Version 3.23.23, @strong{MySQL} has support for full-text indexing +and searching. Full-text indexes in @strong{MySQL} are an index of type +@code{FULLTEXT}. @code{FULLTEXT} indexes can be created from @code{VARCHAR} +and @code{TEXT} columns at @code{CREATE TABLE} time or added later with +@code{ALTER TABLE} or @code{CREATE INDEX}. For large datasets, adding +@code{FULLTEXT} index with @code{ALTER TABLE} (or @code{CREATE INDEX}) would +be much faster than inserting rows into the empty table with a @code{FULLTEXT} +index. + +Full-text search is performed with the @code{MATCH} function. + +@example +mysql> CREATE TABLE t (a VARCHAR(200), b TEXT, FULLTEXT (a,b)); +Query OK, 0 rows affected (0.00 sec) + +mysql> INSERT INTO t VALUES + -> ('MySQL has now support', 'for full-text search'), + -> ('Full-text indexes', 'are called collections'), + -> ('Only MyISAM tables','support collections'), + -> ('Function MATCH ... AGAINST()','is used to do a search'), + -> ('Full-text search in MySQL', 'implements vector space model'); +Query OK, 5 rows affected (0.00 sec) +Records: 5 Duplicates: 0 Warnings: 0 + +mysql> SELECT * FROM t WHERE MATCH (a,b) AGAINST ('MySQL'); ++---------------------------+-------------------------------+ +| a | b | ++---------------------------+-------------------------------+ +| MySQL has now support | for full-text search | +| Full-text search in MySQL | implements vector-space-model | ++---------------------------+-------------------------------+ +2 rows in set (0.00 sec) + +mysql> SELECT *,MATCH a,b AGAINST ('collections support') as x FROM t; ++------------------------------+-------------------------------+--------+ +| a | b | x | ++------------------------------+-------------------------------+--------+ +| MySQL has now support | for full-text search | 0.3834 | +| Full-text indexes | are called collections | 0.3834 | +| Only MyISAM tables | support collections | 0.7668 | +| Function MATCH ... AGAINST() | is used to do a search | 0 | +| Full-text search in MySQL | implements vector space model | 0 | ++------------------------------+-------------------------------+--------+ +5 rows in set (0.00 sec) +@end example + +The function @code{MATCH} matches a natural language query @code{AGAINST} +a text collection (which is simply the columns that are covered by a +@code{FULLTEXT} index). For every row in a table it returns relevance - +a similarity measure between the text in that row (in the columns that are +part of the collection) and the query. When it is used in a @code{WHERE} +clause (see example above) the rows returned are automatically sorted with +relevance decreasing. Relevance is a non-negative floating-point number. +Zero relevance means no similarity. Relevance is computed based on the +number of words in the row, the number of unique words in that row, the +total number of words in the collection, and the number of documents (rows) +that contain a particular word. + +MySQL uses a very simple parser to split text into words. A ``word'' is +any sequence of letters, numbers, @samp{'}, and @samp{_}. Any ``word'' +that is present in the stopword list or just too short (3 characters +or less) is ignored. + +Every correct word in the collection and in the query is weighted, +according to its significance in the query or collection. This way, a +word that is present in many documents will have lower weight (and may +even have a zero weight), because it has lower semantic value in this +particular collection. Otherwise, if the word is rare, it will receive a +higher weight. The weights of the words are then combined to compute the +relevance of the row. + +Such a technique works best with large collections (in fact, it was +carefully tuned this way). For very small tables, word distribution +does not reflect adequately their semantical value, and this model +may sometimes produce bizarre results. + +For example, search for the word "search" will produce no results in the +above example. Word "search" is present in more than half of rows, and +as such, is effectively treated as a stopword (that is, with semantical value +zero). It is, really, the desired behavior - a natural language query +should not return every other row in 1GB table. + +A word that matches half of rows in a table is less likely to locate relevant +documents. In fact, it will most likely find plenty of irrelevant documents. +We all know this happens far too often when we are trying to find something on +the Internet with a search engine. It is with this reasoning that such rows +have been assigned a low semantical value in @strong{a particular dataset}. + +@menu +* Fulltext Fine-tuning:: +* Fulltext Features to Appear in MySQL 4.0:: +* Fulltext TODO:: +@end menu + +@node Fulltext Fine-tuning, Fulltext Features to Appear in MySQL 4.0, , Fulltext Search +@section Fine-tuning MySQL Full-text Search + +Unfortunately, full-text search has no user-tunable parameters yet, +although adding some is very high on the TODO. However, if you have a +@strong{MySQL} source distribution (@xref{Installing source}.), you can +somewhat alter the full-text search behavior. + +Note that full-text search was carefully tuned for the best searching +effectiveness. Modifying the default behavior will, in most cases, +only make the search results worse. Do not alter the @strong{MySQL} sources +unless you know what you are doing! + +@itemize + +@item +Minimal length of word to be indexed is defined in +@code{myisam/ftdefs.h} file by the line +@example +#define MIN_WORD_LEN 4 +@end example +Change it to the value you prefer, recompile @strong{MySQL}, and rebuild +your @code{FULLTEXT} indexes. + +@item +The stopword list is defined in @code{myisam/ft_static.c} +Modify it to your taste, recompile @strong{MySQL} and rebuild +your @code{FULLTEXT} indexes. + +@item +The 50% threshold is caused by the particular weighting scheme chosen. To +disable it, change the following line in @code{myisam/ftdefs.h}: +@example +#define GWS_IN_USE GWS_PROB +@end example +to +@example +#define GWS_IN_USE GWS_FREQ +@end example +and recompile @strong{MySQL}. +There is no need to rebuild the indexes in this case. + +@end itemize + +@node Fulltext Features to Appear in MySQL 4.0, Fulltext TODO, Fulltext Fine-tuning, Fulltext Search +@section New Features of Full-text Search to Appear in MySQL 4.0 + +This section includes a list of the fulltext features that are already +implemented in the 4.0 tree. It explains +@strong{More functions for full-text search} entry of @ref{TODO MySQL 4.0}. + +@itemize @bullet +@item @code{REPAIR TABLE} with @code{FULLTEXT} indexes, +@code{ALTER TABLE} with @code{FULLTEXT} indexes, and +@code{OPTIMIZE TABLE} with @code{FULLTEXT} indexes are now +up to 100 times faster. + +@item @code{MATCH ... AGAINST} now supports the following +@strong{boolean operators}: + +@itemize @bullet +@item @code{+}word means the that word @strong{must} be present in every +row returned. +@item @code{-}word means the that word @strong{must not} be present in every +row returned. +@item @code{<} and @code{>} can be used to decrease and increase word +weight in the query. +@item @code{~} can be used to assign a @strong{negative} weight to a noise +word. +@item @code{*} is a truncation operator. +@end itemize + +Boolean search utilizes a more simplistic way of calculating the relevance, +that does not have a 50% threshold. + +@item Searches are now up to 2 times faster due to optimized search algorithm. + +@item Utility program @code{ft_dump} added for low-level @code{FULLTEXT} +index operations (querying/dumping/statistics). + +@end itemize + +@node Fulltext TODO, , Fulltext Features to Appear in MySQL 4.0, Fulltext Search +@section Full-text Search TODO + +@itemize @bullet +@item Make all operations with @code{FULLTEXT} index @strong{faster}. +@item Support for braces @code{()} in boolean full-text search. +@item Support for "always-index words". They could be any strings +the user wants to treat as words, examples are "C++", "AS/400", "TCP/IP", etc. +@item Support for full-text search in @code{MERGE} tables. +@item Support for multi-byte charsets. +@item Make stopword list to depend of the language of the data. +@item Stemming (dependent of the language of the data, of course). +@item Generic user-supplyable UDF (?) preparser. +@item Make the model more flexible (by adding some adjustable +parameters to @code{FULLTEXT} in @code{CREATE/ALTER TABLE}). +@end itemize @cindex performance, maximizing @cindex optimization -@node Performance, MySQL Benchmarks, Replication, Top +@node Performance, MySQL Benchmarks, Fulltext Search, Top @chapter Getting Maximum Performance from MySQL Optimization is a complicated task because it ultimately requires @@ -40160,11 +40359,10 @@ This is a relatively low traffic list, in comparison with @menu * MySQL threads:: MySQL threads -* MySQL full-text search:: MySQL full-text search * MySQL test suite:: MySQL test suite @end menu -@node MySQL threads, MySQL full-text search, MySQL internals, MySQL internals +@node MySQL threads, MySQL test suite, , MySQL internals @section MySQL Threads The @strong{MySQL} server creates the following threads: @@ -40211,208 +40409,9 @@ started to read and apply updates from the master. @code{mysqladmin processlist} only shows the connection, @code{INSERT DELAYED}, and replication threads. -@cindex searching, full-text -@cindex full-text search -@cindex FULLTEXT -@node MySQL full-text search, MySQL test suite, MySQL threads, MySQL internals -@section MySQL Full-text Search - -Since Version 3.23.23, @strong{MySQL} has support for full-text indexing -and searching. Full-text indexes in @strong{MySQL} are an index of type -@code{FULLTEXT}. @code{FULLTEXT} indexes can be created from @code{VARCHAR} -and @code{TEXT} columns at @code{CREATE TABLE} time or added later with -@code{ALTER TABLE} or @code{CREATE INDEX}. For large datasets, adding -@code{FULLTEXT} index with @code{ALTER TABLE} (or @code{CREATE INDEX}) would -be much faster than inserting rows into the empty table with a @code{FULLTEXT} -index. - -Full-text search is performed with the @code{MATCH} function. - -@example -mysql> CREATE TABLE t (a VARCHAR(200), b TEXT, FULLTEXT (a,b)); -Query OK, 0 rows affected (0.00 sec) - -mysql> INSERT INTO t VALUES - -> ('MySQL has now support', 'for full-text search'), - -> ('Full-text indexes', 'are called collections'), - -> ('Only MyISAM tables','support collections'), - -> ('Function MATCH ... AGAINST()','is used to do a search'), - -> ('Full-text search in MySQL', 'implements vector space model'); -Query OK, 5 rows affected (0.00 sec) -Records: 5 Duplicates: 0 Warnings: 0 - -mysql> SELECT * FROM t WHERE MATCH (a,b) AGAINST ('MySQL'); -+---------------------------+-------------------------------+ -| a | b | -+---------------------------+-------------------------------+ -| MySQL has now support | for full-text search | -| Full-text search in MySQL | implements vector-space-model | -+---------------------------+-------------------------------+ -2 rows in set (0.00 sec) - -mysql> SELECT *,MATCH a,b AGAINST ('collections support') as x FROM t; -+------------------------------+-------------------------------+--------+ -| a | b | x | -+------------------------------+-------------------------------+--------+ -| MySQL has now support | for full-text search | 0.3834 | -| Full-text indexes | are called collections | 0.3834 | -| Only MyISAM tables | support collections | 0.7668 | -| Function MATCH ... AGAINST() | is used to do a search | 0 | -| Full-text search in MySQL | implements vector space model | 0 | -+------------------------------+-------------------------------+--------+ -5 rows in set (0.00 sec) -@end example - -The function @code{MATCH} matches a natural language query @code{AGAINST} -a text collection (which is simply the columns that are covered by a -@strong{FULLTEXT} index). For every row in a table it returns relevance - -a similarity measure between the text in that row (in the columns that are -part of the collection) and the query. When it is used in a @code{WHERE} -clause (see example above) the rows returned are automatically sorted with -relevance decreasing. Relevance is a non-negative floating-point number. -Zero relevance means no similarity. Relevance is computed based on the -number of words in the row, the number of unique words in that row, the -total number of words in the collection, and the number of documents (rows) -that contain a particular word. - -MySQL uses a very simple parser to split text into words. A ``word'' is -any sequence of letters, numbers, @samp{'}, and @samp{_}. Any ``word'' -that is present in the stopword list or just too short (3 characters -or less) is ignored. - -Every correct word in the collection and in the query is weighted, -according to its significance in the query or collection. This way, a -word that is present in many documents will have lower weight (and may -even have a zero weight), because it has lower semantic value in this -particular collection. Otherwise, if the word is rare, it will receive a -higher weight. The weights of the words are then combined to compute the -relevance of the row. - -Such a technique works best with large collections (in fact, it was -carefully tuned this way). For very small tables, word distribution -does not reflect adequately their semantical value, and this model -may sometimes produce bizarre results. - -For example, search for the word "search" will produce no results in the -above example. Word "search" is present in more than half of rows, and -as such, is effectively treated as a stopword (that is, with semantical value -zero). It is, really, the desired behavior - a natural language query -should not return every other row in 1GB table. - -A word that matches half of rows in a table is less likely to locate relevant -documents. In fact, it will most likely find plenty of irrelevant documents. -We all know this happens far too often when we are trying to find something on -the Internet with a search engine. It is with this reasoning that such rows -have been assigned a low semantical value in @strong{a particular dataset}. - -@menu -* Fulltext Fine-tuning:: -* Fulltext features to appear in MySQL 4.0:: -* Fulltext TODO:: -@end menu - -@node Fulltext Fine-tuning, Fulltext features to appear in MySQL 4.0, MySQL full-text search, MySQL full-text search -@subsection Fine-tuning MySQL Full-text Search - -Unfortunately, full-text search has no user-tunable parameters yet, -although adding some is very high on the TODO. However, if you have a -@strong{MySQL} source distribution (@xref{Installing source}.), you can -somewhat alter the full-text search behavior. - -Note that full-text search was carefully tuned for the best searching -effectiveness. Modifying the default behavior will, in most cases, -only make the search results worse. Do not alter the @strong{MySQL} sources -unless you know what you are doing! - -@itemize - -@item -Minimal length of word to be indexed is defined in -@code{myisam/ftdefs.h} file by the line -@example -#define MIN_WORD_LEN 4 -@end example -Change it to the value you prefer, recompile @strong{MySQL}, and rebuild -your @code{FULLTEXT} indexes. - -@item -The stopword list is defined in @code{myisam/ft_static.c} -Modify it to your taste, recompile @strong{MySQL} and rebuild -your @code{FULLTEXT} indexes. - -@item -The 50% threshold is caused by the particular weighting scheme chosen. To -disable it, change the following line in @code{myisam/ftdefs.h}: -@example -#define GWS_IN_USE GWS_PROB -@end example -to -@example -#define GWS_IN_USE GWS_FREQ -@end example -and recompile @strong{MySQL}. -There is no need to rebuild the indexes in this case. - -@end itemize - -@node Fulltext features to appear in MySQL 4.0, Fulltext TODO, Fulltext Fine-tuning, MySQL full-text search -@subsection New Features of Full-text Search to Appear in MySQL 4.0 - -This section includes a list of the fulltext features that are already -implemented in the 4.0 tree. It explains -@strong{More functions for full-text search} entry of @ref{TODO MySQL 4.0}. - -@itemize @bullet -@item @code{REPAIR TABLE} with @code{FULLTEXT} indexes, -@code{ALTER TABLE} with @code{FULLTEXT} indexes, and -@code{OPTIMIZE TABLE} with @code{FULLTEXT} indexes are now -up to 100 times faster. - -@item @code{MATCH ... AGAINST} now supports the following -@strong{boolean operators}: - -@itemize @bullet -@item @code{+}word means the that word @strong{must} be present in every -row returned. -@item @code{-}word means the that word @strong{must not} be present in every -row returned. -@item @code{<} and @code{>} can be used to decrease and increase word -weight in the query. -@item @code{~} can be used to assign a @strong{negative} weight to a noise -word. -@item @code{*} is a truncation operator. -@end itemize - -Boolean search utilizes a more simplistic way of calculating the relevance, -that does not have a 50% threshold. - -@item Searches are now up to 2 times faster due to optimized search algorithm. - -@item Utility program @code{ft_dump} added for low-level @code{FULLTEXT} -index operations (querying/dumping/statistics). - -@end itemize - -@node Fulltext TODO, , Fulltext features to appear in MySQL 4.0, MySQL full-text search -@subsection Full-text Search TODO - -@itemize @bullet -@item Make all operations with @code{FULLTEXT} index @strong{faster}. -@item Support for braces @code{()} in boolean fulltext search. -@item Support for "always-index words". They could be any strings -the user wants to treat as words, examples are "C++", "AS/400", "TCP/IP", etc. -@item Support for fulltext search in @code{MERGE} tables. -@item Support for multi-byte charsets. -@item Make stopword list to depend of the language of the data. -@item Stemming (dependent of the language of the data, of course). -@item Generic user-supplyable UDF (?) preparser. -@item Make the model more flexible (by adding some adjustable -parameters to @code{FULLTEXT} in @code{CREATE/ALTER TABLE}). -@end itemize - @cindex mysqltest, MySQL Test Suite @cindex testing mysqld, mysqltest -@node MySQL test suite, , MySQL full-text search, MySQL internals +@node MySQL test suite, , MySQL threads, MySQL internals @section MySQL Test Suite Until recently, our main full-coverage test suite was based on proprietary @@ -47563,7 +47562,7 @@ the @code{.MYD} file. Better replication. @item More functions for full-text search. -@xref{Fulltext features to appear in MySQL 4.0}. +@xref{Fulltext Features to Appear in MySQL 4.0}. @item Character set casts and syntax for handling multiple character sets. @item |