2 files changed, 46 insertions, 1 deletions
diff --git a/Docs/internals.texi b/Docs/internals.texi
index 8f358982ded..871e51c50bd 100644
--- a/Docs/internals.texi
+++ b/Docs/internals.texi
@@ -57,6 +57,7 @@ This is a manual about @strong{MySQL} internals.
 * mysys functions::             Functions In The @code{mysys} Library
 * DBUG::                        DBUG Tags To Use
 * protocol::                    MySQL Client/Server Protocol
+* Fulltext Search::             Fulltext Search in MySQL
 @end menu
 
 
@@ -535,7 +536,7 @@ Print query.
 @end table
 
 
-@node protocol,  , DBUG, Top
+@node protocol, Fulltext Search, DBUG, Top
 @chapter MySQL Client/Server Protocol
 
 @menu
@@ -785,6 +786,48 @@ Date            03 0A 00 00     |01 0A                  |03 00 00 00
 
 @c @printindex fn
 
+@node Fulltext Search,  , protocol, Top
+@chapter Fulltext Search in MySQL
+
+Hopefully, sometime there will be complete description of
+fulltext search algorithms.
+Now it's just unsorted notes.
+
+@menu
+* Weighting in boolean mode::  
+@end menu
+
+@node Weighting in boolean mode, , , Fulltext Search
+@section Weighting in boolean mode
+
+The basic idea is as follows: in expression
+@code{A or B or (C and D and E)}, either @code{A} or @code{B} alone
+is enough to match the whole expression. While @code{C},
+@code{D}, and @code{E} should @strong{all} match. So it's
+reasonable to assign weight 1 to @code{A}, @code{B}, and
+@code{(C and D and E)}. And @code{C}, @code{D}, and @code{E}
+should get a weight of 1/3.
+
+Things become more complicated when considering boolean
+operators, as used in MySQL FTB. Obvioulsy, @code{+A +B}
+should be treated as @code{A and B}, and @code{A B} -
+as @code{A or B}. The problem is, that @code{+A B} can @strong{not}
+be rewritten in and/or terms (that's the reason why this - extended -
+set of operators was chosen). Still, aproximations can be used.
+@code{+A B C} can be approximated as @code{A or (A and (B or C))}
+or as @code{A or (A and B) or (A and C) or (A and B and C)}.
+Applying the above logic (and omitting mathematical
+transformations and normalization) one gets that for
+@code{+A_1 +A_2 ... +A_N B_1 B_2 ... B_M} the weights
+should be: @code{A_i = 1/N}, @code{B_j=1} if @code{N==0}, and,
+otherwise, in the first rewritting approach @code{B_j = 1/3},
+and in the second one - @code{B_j = (1+(M-1)*2^M)/(M*(2^(M+1)-1))}.
+
+The second expression gives somewhat steeper increase in total
+weight as number of matched B's increases, because it assigns
+higher weights to individual B's. Also the first expression in
+much simplier. So it is the first one, that is implemented in MySQL.
+
 @summarycontents
 @contents
 
diff --git a/Docs/manual.texi b/Docs/manual.texi
index a3cc6ffd799..47082b839ba 100644
--- a/Docs/manual.texi
+++ b/Docs/manual.texi
@@ -48933,6 +48933,8 @@ Our TODO section contains what we plan to have in 4.0. @xref{TODO MySQL 4.0}.
 
 @itemize @bullet
 @item
+Boolean fulltext search weighting scheme changed to something more reasonable.
+@item
 Fixed bug in boolean fulltext search, that caused MySQL to ignore queries of
 @code{ft_min_word_len} characters.
 @item