summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorunknown <serg@serg.mysql.com>2001-03-26 22:39:03 +0200
committerunknown <serg@serg.mysql.com>2001-03-26 22:39:03 +0200
commit50007b28e9d472c363ec3e516975769567fb99ef (patch)
tree331ff235df46a888bb91b6914b658b60932765cd
parent4bf00b00b74e11b801d6ad4ade5de9a07b88e1cf (diff)
downloadmariadb-git-50007b28e9d472c363ec3e516975769567fb99ef.tar.gz
Fulltext search section moved
-rw-r--r--Docs/manual.texi431
1 files changed, 215 insertions, 216 deletions
diff --git a/Docs/manual.texi b/Docs/manual.texi
index 417b4ebf8a0..f37d00ae219 100644
--- a/Docs/manual.texi
+++ b/Docs/manual.texi
@@ -120,6 +120,7 @@ version see the relevant distribution.
* Tutorial:: @strong{MySQL} Tutorial
* Server:: @strong{MySQL} Server
* Replication:: Replication
+* Fulltext Search:: Fulltext Search
* Performance:: Getting maximum performance from @strong{MySQL}
* MySQL Benchmarks:: The @strong{MySQL} benchmark suite
* Tools:: @strong{MySQL} Utilities
@@ -600,6 +601,13 @@ Replication in MySQL
* Replication FAQ:: Frequently Asked Questions about replication
* Replication Problems:: Troubleshooting Replication.
+MySQL Full-text Search
+
+* Fulltext Search::
+* Fulltext Fine-tuning::
+* Fulltext Features to Appear in MySQL 4.0::
+* Fulltext TODO::
+
Getting Maximum Performance from MySQL
* Optimize Basics:: Optimization overview
@@ -868,15 +876,8 @@ How MySQL Compares to @code{mSQL}
MySQL Internals
* MySQL threads:: MySQL threads
-* MySQL full-text search:: MySQL full-text search
* MySQL test suite:: MySQL test suite
-MySQL Full-text Search
-
-* Fulltext Fine-tuning::
-* Fulltext features to appear in MySQL 4.0::
-* Fulltext TODO::
-
Credits
* Developers::
@@ -15379,7 +15380,7 @@ In @strong{MySQL} Version 3.23.23 or later, you can also create special
@code{MyISAM} table type supports @code{FULLTEXT} indexes. They can be
created only from @code{VARCHAR} and @code{TEXT} columns.
Indexing always happens over the entire column and partial indexing is not
-supported. See @ref{MySQL full-text search} for details.
+supported. See @ref{Fulltext Search} for details.
@cindex multi-column indexes
@cindex indexes, multi-column
@@ -16122,7 +16123,7 @@ For @code{MATCH ... AGAINST()} to work, a @strong{FULLTEXT} index
must be created first. @xref{CREATE TABLE, , @code{CREATE TABLE}}.
@code{MATCH ... AGAINST()} is available in @strong{MySQL} Version
3.23.23 or later. For details and usage examples
-@pxref{MySQL full-text search}.
+@pxref{Fulltext Search}.
@end table
@findex casts
@@ -18496,7 +18497,7 @@ In @strong{MySQL} Version 3.23.23 or later, you can also create special
@code{MyISAM} table type supports @code{FULLTEXT} indexes. They can be created
only from @code{VARCHAR} and @code{TEXT} columns.
Indexing always happens over the entire column, partial indexing is not
-supported. See @ref{MySQL full-text search} for details of operation.
+supported. See @ref{Fulltext Search} for details of operation.
@item
The @code{FOREIGN KEY}, @code{CHECK}, and @code{REFERENCES} clauses don't
@@ -22675,7 +22676,7 @@ For more information about how @strong{MySQL} uses indexes, see
@code{FULLTEXT} indexes can index only @code{VARCHAR} and
@code{TEXT} columns, and only in @code{MyISAM} tables. @code{FULLTEXT} indexes
are available in @strong{MySQL} Version 3.23.23 and later.
-@ref{MySQL full-text search}.
+@ref{Fulltext Search}.
@findex DROP INDEX
@node DROP INDEX, Comments, CREATE INDEX, Reference
@@ -26913,7 +26914,7 @@ tables}.
@cindex increasing, speed
@cindex speed, increasing
@cindex databases, replicating
-@node Replication, Performance, Server, Top
+@node Replication, Fulltext Search, Server, Top
@chapter Replication in MySQL
@menu
* Replication Intro:: Introduction
@@ -27871,10 +27872,208 @@ Once you have collected the evidence on the phantom problem, try hard to
isolate it into a separate test case first. Then report the problem to
@email{bugs@@lists.mysql.com} with as much info as possible.
+@cindex searching, full-text
+@cindex full-text search
+@cindex FULLTEXT
+@node Fulltext Search, Performance, Replication, Top
+@chapter MySQL Full-text Search
+
+Since Version 3.23.23, @strong{MySQL} has support for full-text indexing
+and searching. Full-text indexes in @strong{MySQL} are an index of type
+@code{FULLTEXT}. @code{FULLTEXT} indexes can be created from @code{VARCHAR}
+and @code{TEXT} columns at @code{CREATE TABLE} time or added later with
+@code{ALTER TABLE} or @code{CREATE INDEX}. For large datasets, adding
+@code{FULLTEXT} index with @code{ALTER TABLE} (or @code{CREATE INDEX}) would
+be much faster than inserting rows into the empty table with a @code{FULLTEXT}
+index.
+
+Full-text search is performed with the @code{MATCH} function.
+
+@example
+mysql> CREATE TABLE t (a VARCHAR(200), b TEXT, FULLTEXT (a,b));
+Query OK, 0 rows affected (0.00 sec)
+
+mysql> INSERT INTO t VALUES
+ -> ('MySQL has now support', 'for full-text search'),
+ -> ('Full-text indexes', 'are called collections'),
+ -> ('Only MyISAM tables','support collections'),
+ -> ('Function MATCH ... AGAINST()','is used to do a search'),
+ -> ('Full-text search in MySQL', 'implements vector space model');
+Query OK, 5 rows affected (0.00 sec)
+Records: 5 Duplicates: 0 Warnings: 0
+
+mysql> SELECT * FROM t WHERE MATCH (a,b) AGAINST ('MySQL');
++---------------------------+-------------------------------+
+| a | b |
++---------------------------+-------------------------------+
+| MySQL has now support | for full-text search |
+| Full-text search in MySQL | implements vector-space-model |
++---------------------------+-------------------------------+
+2 rows in set (0.00 sec)
+
+mysql> SELECT *,MATCH a,b AGAINST ('collections support') as x FROM t;
++------------------------------+-------------------------------+--------+
+| a | b | x |
++------------------------------+-------------------------------+--------+
+| MySQL has now support | for full-text search | 0.3834 |
+| Full-text indexes | are called collections | 0.3834 |
+| Only MyISAM tables | support collections | 0.7668 |
+| Function MATCH ... AGAINST() | is used to do a search | 0 |
+| Full-text search in MySQL | implements vector space model | 0 |
++------------------------------+-------------------------------+--------+
+5 rows in set (0.00 sec)
+@end example
+
+The function @code{MATCH} matches a natural language query @code{AGAINST}
+a text collection (which is simply the columns that are covered by a
+@code{FULLTEXT} index). For every row in a table it returns relevance -
+a similarity measure between the text in that row (in the columns that are
+part of the collection) and the query. When it is used in a @code{WHERE}
+clause (see example above) the rows returned are automatically sorted with
+relevance decreasing. Relevance is a non-negative floating-point number.
+Zero relevance means no similarity. Relevance is computed based on the
+number of words in the row, the number of unique words in that row, the
+total number of words in the collection, and the number of documents (rows)
+that contain a particular word.
+
+MySQL uses a very simple parser to split text into words. A ``word'' is
+any sequence of letters, numbers, @samp{'}, and @samp{_}. Any ``word''
+that is present in the stopword list or just too short (3 characters
+or less) is ignored.
+
+Every correct word in the collection and in the query is weighted,
+according to its significance in the query or collection. This way, a
+word that is present in many documents will have lower weight (and may
+even have a zero weight), because it has lower semantic value in this
+particular collection. Otherwise, if the word is rare, it will receive a
+higher weight. The weights of the words are then combined to compute the
+relevance of the row.
+
+Such a technique works best with large collections (in fact, it was
+carefully tuned this way). For very small tables, word distribution
+does not reflect adequately their semantical value, and this model
+may sometimes produce bizarre results.
+
+For example, search for the word "search" will produce no results in the
+above example. Word "search" is present in more than half of rows, and
+as such, is effectively treated as a stopword (that is, with semantical value
+zero). It is, really, the desired behavior - a natural language query
+should not return every other row in 1GB table.
+
+A word that matches half of rows in a table is less likely to locate relevant
+documents. In fact, it will most likely find plenty of irrelevant documents.
+We all know this happens far too often when we are trying to find something on
+the Internet with a search engine. It is with this reasoning that such rows
+have been assigned a low semantical value in @strong{a particular dataset}.
+
+@menu
+* Fulltext Fine-tuning::
+* Fulltext Features to Appear in MySQL 4.0::
+* Fulltext TODO::
+@end menu
+
+@node Fulltext Fine-tuning, Fulltext Features to Appear in MySQL 4.0, , Fulltext Search
+@section Fine-tuning MySQL Full-text Search
+
+Unfortunately, full-text search has no user-tunable parameters yet,
+although adding some is very high on the TODO. However, if you have a
+@strong{MySQL} source distribution (@xref{Installing source}.), you can
+somewhat alter the full-text search behavior.
+
+Note that full-text search was carefully tuned for the best searching
+effectiveness. Modifying the default behavior will, in most cases,
+only make the search results worse. Do not alter the @strong{MySQL} sources
+unless you know what you are doing!
+
+@itemize
+
+@item
+Minimal length of word to be indexed is defined in
+@code{myisam/ftdefs.h} file by the line
+@example
+#define MIN_WORD_LEN 4
+@end example
+Change it to the value you prefer, recompile @strong{MySQL}, and rebuild
+your @code{FULLTEXT} indexes.
+
+@item
+The stopword list is defined in @code{myisam/ft_static.c}
+Modify it to your taste, recompile @strong{MySQL} and rebuild
+your @code{FULLTEXT} indexes.
+
+@item
+The 50% threshold is caused by the particular weighting scheme chosen. To
+disable it, change the following line in @code{myisam/ftdefs.h}:
+@example
+#define GWS_IN_USE GWS_PROB
+@end example
+to
+@example
+#define GWS_IN_USE GWS_FREQ
+@end example
+and recompile @strong{MySQL}.
+There is no need to rebuild the indexes in this case.
+
+@end itemize
+
+@node Fulltext Features to Appear in MySQL 4.0, Fulltext TODO, Fulltext Fine-tuning, Fulltext Search
+@section New Features of Full-text Search to Appear in MySQL 4.0
+
+This section includes a list of the fulltext features that are already
+implemented in the 4.0 tree. It explains
+@strong{More functions for full-text search} entry of @ref{TODO MySQL 4.0}.
+
+@itemize @bullet
+@item @code{REPAIR TABLE} with @code{FULLTEXT} indexes,
+@code{ALTER TABLE} with @code{FULLTEXT} indexes, and
+@code{OPTIMIZE TABLE} with @code{FULLTEXT} indexes are now
+up to 100 times faster.
+
+@item @code{MATCH ... AGAINST} now supports the following
+@strong{boolean operators}:
+
+@itemize @bullet
+@item @code{+}word means the that word @strong{must} be present in every
+row returned.
+@item @code{-}word means the that word @strong{must not} be present in every
+row returned.
+@item @code{<} and @code{>} can be used to decrease and increase word
+weight in the query.
+@item @code{~} can be used to assign a @strong{negative} weight to a noise
+word.
+@item @code{*} is a truncation operator.
+@end itemize
+
+Boolean search utilizes a more simplistic way of calculating the relevance,
+that does not have a 50% threshold.
+
+@item Searches are now up to 2 times faster due to optimized search algorithm.
+
+@item Utility program @code{ft_dump} added for low-level @code{FULLTEXT}
+index operations (querying/dumping/statistics).
+
+@end itemize
+
+@node Fulltext TODO, , Fulltext Features to Appear in MySQL 4.0, Fulltext Search
+@section Full-text Search TODO
+
+@itemize @bullet
+@item Make all operations with @code{FULLTEXT} index @strong{faster}.
+@item Support for braces @code{()} in boolean full-text search.
+@item Support for "always-index words". They could be any strings
+the user wants to treat as words, examples are "C++", "AS/400", "TCP/IP", etc.
+@item Support for full-text search in @code{MERGE} tables.
+@item Support for multi-byte charsets.
+@item Make stopword list to depend of the language of the data.
+@item Stemming (dependent of the language of the data, of course).
+@item Generic user-supplyable UDF (?) preparser.
+@item Make the model more flexible (by adding some adjustable
+parameters to @code{FULLTEXT} in @code{CREATE/ALTER TABLE}).
+@end itemize
@cindex performance, maximizing
@cindex optimization
-@node Performance, MySQL Benchmarks, Replication, Top
+@node Performance, MySQL Benchmarks, Fulltext Search, Top
@chapter Getting Maximum Performance from MySQL
Optimization is a complicated task because it ultimately requires
@@ -40160,11 +40359,10 @@ This is a relatively low traffic list, in comparison with
@menu
* MySQL threads:: MySQL threads
-* MySQL full-text search:: MySQL full-text search
* MySQL test suite:: MySQL test suite
@end menu
-@node MySQL threads, MySQL full-text search, MySQL internals, MySQL internals
+@node MySQL threads, MySQL test suite, , MySQL internals
@section MySQL Threads
The @strong{MySQL} server creates the following threads:
@@ -40211,208 +40409,9 @@ started to read and apply updates from the master.
@code{mysqladmin processlist} only shows the connection, @code{INSERT DELAYED},
and replication threads.
-@cindex searching, full-text
-@cindex full-text search
-@cindex FULLTEXT
-@node MySQL full-text search, MySQL test suite, MySQL threads, MySQL internals
-@section MySQL Full-text Search
-
-Since Version 3.23.23, @strong{MySQL} has support for full-text indexing
-and searching. Full-text indexes in @strong{MySQL} are an index of type
-@code{FULLTEXT}. @code{FULLTEXT} indexes can be created from @code{VARCHAR}
-and @code{TEXT} columns at @code{CREATE TABLE} time or added later with
-@code{ALTER TABLE} or @code{CREATE INDEX}. For large datasets, adding
-@code{FULLTEXT} index with @code{ALTER TABLE} (or @code{CREATE INDEX}) would
-be much faster than inserting rows into the empty table with a @code{FULLTEXT}
-index.
-
-Full-text search is performed with the @code{MATCH} function.
-
-@example
-mysql> CREATE TABLE t (a VARCHAR(200), b TEXT, FULLTEXT (a,b));
-Query OK, 0 rows affected (0.00 sec)
-
-mysql> INSERT INTO t VALUES
- -> ('MySQL has now support', 'for full-text search'),
- -> ('Full-text indexes', 'are called collections'),
- -> ('Only MyISAM tables','support collections'),
- -> ('Function MATCH ... AGAINST()','is used to do a search'),
- -> ('Full-text search in MySQL', 'implements vector space model');
-Query OK, 5 rows affected (0.00 sec)
-Records: 5 Duplicates: 0 Warnings: 0
-
-mysql> SELECT * FROM t WHERE MATCH (a,b) AGAINST ('MySQL');
-+---------------------------+-------------------------------+
-| a | b |
-+---------------------------+-------------------------------+
-| MySQL has now support | for full-text search |
-| Full-text search in MySQL | implements vector-space-model |
-+---------------------------+-------------------------------+
-2 rows in set (0.00 sec)
-
-mysql> SELECT *,MATCH a,b AGAINST ('collections support') as x FROM t;
-+------------------------------+-------------------------------+--------+
-| a | b | x |
-+------------------------------+-------------------------------+--------+
-| MySQL has now support | for full-text search | 0.3834 |
-| Full-text indexes | are called collections | 0.3834 |
-| Only MyISAM tables | support collections | 0.7668 |
-| Function MATCH ... AGAINST() | is used to do a search | 0 |
-| Full-text search in MySQL | implements vector space model | 0 |
-+------------------------------+-------------------------------+--------+
-5 rows in set (0.00 sec)
-@end example
-
-The function @code{MATCH} matches a natural language query @code{AGAINST}
-a text collection (which is simply the columns that are covered by a
-@strong{FULLTEXT} index). For every row in a table it returns relevance -
-a similarity measure between the text in that row (in the columns that are
-part of the collection) and the query. When it is used in a @code{WHERE}
-clause (see example above) the rows returned are automatically sorted with
-relevance decreasing. Relevance is a non-negative floating-point number.
-Zero relevance means no similarity. Relevance is computed based on the
-number of words in the row, the number of unique words in that row, the
-total number of words in the collection, and the number of documents (rows)
-that contain a particular word.
-
-MySQL uses a very simple parser to split text into words. A ``word'' is
-any sequence of letters, numbers, @samp{'}, and @samp{_}. Any ``word''
-that is present in the stopword list or just too short (3 characters
-or less) is ignored.
-
-Every correct word in the collection and in the query is weighted,
-according to its significance in the query or collection. This way, a
-word that is present in many documents will have lower weight (and may
-even have a zero weight), because it has lower semantic value in this
-particular collection. Otherwise, if the word is rare, it will receive a
-higher weight. The weights of the words are then combined to compute the
-relevance of the row.
-
-Such a technique works best with large collections (in fact, it was
-carefully tuned this way). For very small tables, word distribution
-does not reflect adequately their semantical value, and this model
-may sometimes produce bizarre results.
-
-For example, search for the word "search" will produce no results in the
-above example. Word "search" is present in more than half of rows, and
-as such, is effectively treated as a stopword (that is, with semantical value
-zero). It is, really, the desired behavior - a natural language query
-should not return every other row in 1GB table.
-
-A word that matches half of rows in a table is less likely to locate relevant
-documents. In fact, it will most likely find plenty of irrelevant documents.
-We all know this happens far too often when we are trying to find something on
-the Internet with a search engine. It is with this reasoning that such rows
-have been assigned a low semantical value in @strong{a particular dataset}.
-
-@menu
-* Fulltext Fine-tuning::
-* Fulltext features to appear in MySQL 4.0::
-* Fulltext TODO::
-@end menu
-
-@node Fulltext Fine-tuning, Fulltext features to appear in MySQL 4.0, MySQL full-text search, MySQL full-text search
-@subsection Fine-tuning MySQL Full-text Search
-
-Unfortunately, full-text search has no user-tunable parameters yet,
-although adding some is very high on the TODO. However, if you have a
-@strong{MySQL} source distribution (@xref{Installing source}.), you can
-somewhat alter the full-text search behavior.
-
-Note that full-text search was carefully tuned for the best searching
-effectiveness. Modifying the default behavior will, in most cases,
-only make the search results worse. Do not alter the @strong{MySQL} sources
-unless you know what you are doing!
-
-@itemize
-
-@item
-Minimal length of word to be indexed is defined in
-@code{myisam/ftdefs.h} file by the line
-@example
-#define MIN_WORD_LEN 4
-@end example
-Change it to the value you prefer, recompile @strong{MySQL}, and rebuild
-your @code{FULLTEXT} indexes.
-
-@item
-The stopword list is defined in @code{myisam/ft_static.c}
-Modify it to your taste, recompile @strong{MySQL} and rebuild
-your @code{FULLTEXT} indexes.
-
-@item
-The 50% threshold is caused by the particular weighting scheme chosen. To
-disable it, change the following line in @code{myisam/ftdefs.h}:
-@example
-#define GWS_IN_USE GWS_PROB
-@end example
-to
-@example
-#define GWS_IN_USE GWS_FREQ
-@end example
-and recompile @strong{MySQL}.
-There is no need to rebuild the indexes in this case.
-
-@end itemize
-
-@node Fulltext features to appear in MySQL 4.0, Fulltext TODO, Fulltext Fine-tuning, MySQL full-text search
-@subsection New Features of Full-text Search to Appear in MySQL 4.0
-
-This section includes a list of the fulltext features that are already
-implemented in the 4.0 tree. It explains
-@strong{More functions for full-text search} entry of @ref{TODO MySQL 4.0}.
-
-@itemize @bullet
-@item @code{REPAIR TABLE} with @code{FULLTEXT} indexes,
-@code{ALTER TABLE} with @code{FULLTEXT} indexes, and
-@code{OPTIMIZE TABLE} with @code{FULLTEXT} indexes are now
-up to 100 times faster.
-
-@item @code{MATCH ... AGAINST} now supports the following
-@strong{boolean operators}:
-
-@itemize @bullet
-@item @code{+}word means the that word @strong{must} be present in every
-row returned.
-@item @code{-}word means the that word @strong{must not} be present in every
-row returned.
-@item @code{<} and @code{>} can be used to decrease and increase word
-weight in the query.
-@item @code{~} can be used to assign a @strong{negative} weight to a noise
-word.
-@item @code{*} is a truncation operator.
-@end itemize
-
-Boolean search utilizes a more simplistic way of calculating the relevance,
-that does not have a 50% threshold.
-
-@item Searches are now up to 2 times faster due to optimized search algorithm.
-
-@item Utility program @code{ft_dump} added for low-level @code{FULLTEXT}
-index operations (querying/dumping/statistics).
-
-@end itemize
-
-@node Fulltext TODO, , Fulltext features to appear in MySQL 4.0, MySQL full-text search
-@subsection Full-text Search TODO
-
-@itemize @bullet
-@item Make all operations with @code{FULLTEXT} index @strong{faster}.
-@item Support for braces @code{()} in boolean fulltext search.
-@item Support for "always-index words". They could be any strings
-the user wants to treat as words, examples are "C++", "AS/400", "TCP/IP", etc.
-@item Support for fulltext search in @code{MERGE} tables.
-@item Support for multi-byte charsets.
-@item Make stopword list to depend of the language of the data.
-@item Stemming (dependent of the language of the data, of course).
-@item Generic user-supplyable UDF (?) preparser.
-@item Make the model more flexible (by adding some adjustable
-parameters to @code{FULLTEXT} in @code{CREATE/ALTER TABLE}).
-@end itemize
-
@cindex mysqltest, MySQL Test Suite
@cindex testing mysqld, mysqltest
-@node MySQL test suite, , MySQL full-text search, MySQL internals
+@node MySQL test suite, , MySQL threads, MySQL internals
@section MySQL Test Suite
Until recently, our main full-coverage test suite was based on proprietary
@@ -47563,7 +47562,7 @@ the @code{.MYD} file.
Better replication.
@item
More functions for full-text search.
-@xref{Fulltext features to appear in MySQL 4.0}.
+@xref{Fulltext Features to Appear in MySQL 4.0}.
@item
Character set casts and syntax for handling multiple character sets.
@item