summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorserg@serg.mysql.com <>2001-04-15 20:15:58 +0200
committerserg@serg.mysql.com <>2001-04-15 20:15:58 +0200
commit491dc47f8cfff898978a0fdf545e5c83253fc368 (patch)
tree240442b4bbcb545ede618e29dd7eaf34c6e7e31e
parent6991ee78677cc076c288332baa97befcf83d9d35 (diff)
downloadmariadb-git-491dc47f8cfff898978a0fdf545e5c83253fc368.tar.gz
bad auto-merge fixed
-rw-r--r--Docs/manual.texi199
1 files changed, 0 insertions, 199 deletions
diff --git a/Docs/manual.texi b/Docs/manual.texi
index 2b8a6c20001..0f240040bca 100644
--- a/Docs/manual.texi
+++ b/Docs/manual.texi
@@ -40814,205 +40814,6 @@ started to read and apply updates from the master.
@code{mysqladmin processlist} only shows the connection, @code{INSERT DELAYED},
and replication threads.
-@cindex searching, full-text
-@cindex full-text search
-@cindex FULLTEXT
-@node MySQL full-text search, MySQL test suite, MySQL threads, MySQL internals
-@section MySQL Full-text Search
-
-Since Version 3.23.23, @strong{MySQL} has support for full-text indexing
-and searching. Full-text indexes in @strong{MySQL} are an index of type
-@code{FULLTEXT}. @code{FULLTEXT} indexes can be created from @code{VARCHAR}
-and @code{TEXT} columns at @code{CREATE TABLE} time or added later with
-@code{ALTER TABLE} or @code{CREATE INDEX}. For large datasets, adding
-@code{FULLTEXT} index with @code{ALTER TABLE} (or @code{CREATE INDEX}) would
-be much faster than inserting rows into the empty table with a @code{FULLTEXT}
-index.
-
-Full-text search is performed with the @code{MATCH} function.
-
-@example
-mysql> CREATE TABLE t (a VARCHAR(200), b TEXT, FULLTEXT (a,b));
-Query OK, 0 rows affected (0.00 sec)
-
-mysql> INSERT INTO t VALUES
- -> ('MySQL has now support', 'for full-text search'),
- -> ('Full-text indexes', 'are called collections'),
- -> ('Only MyISAM tables','support collections'),
- -> ('Function MATCH ... AGAINST()','is used to do a search'),
- -> ('Full-text search in MySQL', 'implements vector space model');
-Query OK, 5 rows affected (0.00 sec)
-Records: 5 Duplicates: 0 Warnings: 0
-
-mysql> SELECT * FROM t WHERE MATCH (a,b) AGAINST ('MySQL');
-+---------------------------+-------------------------------+
-| a | b |
-+---------------------------+-------------------------------+
-| MySQL has now support | for full-text search |
-| Full-text search in MySQL | implements vector-space-model |
-+---------------------------+-------------------------------+
-2 rows in set (0.00 sec)
-
-mysql> SELECT *,MATCH a,b AGAINST ('collections support') as x FROM t;
-+------------------------------+-------------------------------+--------+
-| a | b | x |
-+------------------------------+-------------------------------+--------+
-| MySQL has now support | for full-text search | 0.3834 |
-| Full-text indexes | are called collections | 0.3834 |
-| Only MyISAM tables | support collections | 0.7668 |
-| Function MATCH ... AGAINST() | is used to do a search | 0 |
-| Full-text search in MySQL | implements vector space model | 0 |
-+------------------------------+-------------------------------+--------+
-5 rows in set (0.00 sec)
-@end example
-
-The function @code{MATCH} matches a natural language query @code{AGAINST}
-a text collection (which is simply the columns that are covered by a
-@strong{FULLTEXT} index). For every row in a table it returns relevance -
-a similarity measure between the text in that row (in the columns that are
-part of the collection) and the query. When it is used in a @code{WHERE}
-clause (see example above) the rows returned are automatically sorted with
-relevance decreasing. Relevance is a non-negative floating-point number.
-Zero relevance means no similarity. Relevance is computed based on the
-number of words in the row, the number of unique words in that row, the
-total number of words in the collection, and the number of documents (rows)
-that contain a particular word.
-
-MySQL uses a very simple parser to split text into words. A ``word'' is
-any sequence of letters, numbers, @samp{'}, and @samp{_}. Any ``word''
-that is present in the stopword list or just too short (3 characters
-or less) is ignored.
-
-Every correct word in the collection and in the query is weighted,
-according to its significance in the query or collection. This way, a
-word that is present in many documents will have lower weight (and may
-even have a zero weight), because it has lower semantic value in this
-particular collection. Otherwise, if the word is rare, it will receive a
-higher weight. The weights of the words are then combined to compute the
-relevance of the row.
-
-Such a technique works best with large collections (in fact, it was
-carefully tuned this way). For very small tables, word distribution
-does not reflect adequately their semantical value, and this model
-may sometimes produce bizarre results.
-
-For example, search for the word "search" will produce no results in the
-above example. Word "search" is present in more than half of rows, and
-as such, is effectively treated as a stopword (that is, with semantical value
-zero). It is, really, the desired behavior - a natural language query
-should not return every other row in 1GB table.
-
-A word that matches half of rows in a table is less likely to locate relevant
-documents. In fact, it will most likely find plenty of irrelevant documents.
-We all know this happens far too often when we are trying to find something on
-the Internet with a search engine. It is with this reasoning that such rows
-have been assigned a low semantical value in @strong{a particular dataset}.
-
-@menu
-* Fulltext Fine-tuning::
-* Fulltext features to appear in MySQL 4.0::
-* Fulltext TODO::
-@end menu
-
-@node Fulltext Fine-tuning, Fulltext features to appear in MySQL 4.0, MySQL full-text search, MySQL full-text search
-@subsection Fine-tuning MySQL Full-text Search
-
-Unfortunately, full-text search has no user-tunable parameters yet,
-although adding some is very high on the TODO. However, if you have a
-@strong{MySQL} source distribution (@xref{Installing source}.), you can
-somewhat alter the full-text search behavior.
-
-Note that full-text search was carefully tuned for the best searching
-effectiveness. Modifying the default behavior will, in most cases,
-only make the search results worse. Do not alter the @strong{MySQL} sources
-unless you know what you are doing!
-
-@itemize
-
-@item
-Minimal length of word to be indexed is defined in
-@code{myisam/ftdefs.h} file by the line
-@example
-#define MIN_WORD_LEN 4
-@end example
-Change it to the value you prefer, recompile @strong{MySQL}, and rebuild
-your @code{FULLTEXT} indexes.
-
-@item
-The stopword list is defined in @code{myisam/ft_static.c}
-Modify it to your taste, recompile @strong{MySQL} and rebuild
-your @code{FULLTEXT} indexes.
-
-@item
-The 50% threshold is caused by the particular weighting scheme chosen. To
-disable it, change the following line in @code{myisam/ftdefs.h}:
-@example
-#define GWS_IN_USE GWS_PROB
-@end example
-to
-@example
-#define GWS_IN_USE GWS_FREQ
-@end example
-and recompile @strong{MySQL}.
-There is no need to rebuild the indexes in this case.
-
-@end itemize
-
-@node Fulltext features to appear in MySQL 4.0, Fulltext TODO, Fulltext Fine-tuning, MySQL full-text search
-@subsection New Features of Full-text Search to Appear in MySQL 4.0
-
-This section includes a list of the fulltext features that are already
-implemented in the 4.0 tree. It explains
-@strong{More functions for full-text search} entry of @ref{TODO MySQL 4.0}.
-
-@itemize @bullet
-@item @code{REPAIR TABLE} with @code{FULLTEXT} indexes,
-@code{ALTER TABLE} with @code{FULLTEXT} indexes, and
-@code{OPTIMIZE TABLE} with @code{FULLTEXT} indexes are now
-up to 100 times faster.
-
-@item @code{MATCH ... AGAINST} now supports the following
-@strong{boolean operators}:
-
-@itemize @bullet
-@item @code{+}word means the that word @strong{must} be present in every
-row returned.
-@item @code{-}word means the that word @strong{must not} be present in every
-row returned.
-@item @code{<} and @code{>} can be used to decrease and increase word
-weight in the query.
-@item @code{~} can be used to assign a @strong{negative} weight to a noise
-word.
-@item @code{*} is a truncation operator.
-@end itemize
-
-Boolean search utilizes a more simplistic way of calculating the relevance,
-that does not have a 50% threshold.
-
-@item Searches are now up to 2 times faster due to optimized search algorithm.
-
-@item Utility program @code{ft_dump} added for low-level @code{FULLTEXT}
-index operations (querying/dumping/statistics).
-
-@end itemize
-
-@node Fulltext TODO, , Fulltext features to appear in MySQL 4.0, MySQL full-text search
-@subsection Full-text Search TODO
-
-@itemize @bullet
-@item Make all operations with @code{FULLTEXT} index @strong{faster}.
-@item Support for braces @code{()} in boolean fulltext search.
-@item Support for "always-index words". They could be any strings
-the user wants to treat as words, examples are "C++", "AS/400", "TCP/IP", etc.
-@item Support for fulltext search in @code{MERGE} tables.
-@item Support for multi-byte charsets.
-@item Make stopword list to depend of the language of the data.
-@item Stemming (dependent of the language of the data, of course).
-@item Generic user-suppied UDF (?) preparser.
-@item Make the model more flexible (by adding some adjustable
-parameters to @code{FULLTEXT} in @code{CREATE/ALTER TABLE}).
-@end itemize
-
@cindex mysqltest, MySQL Test Suite
@cindex testing mysqld, mysqltest
@node MySQL test suite, , MySQL threads, MySQL internals