blob: a2fcf4c9f84d3e0c08c56e97f54bee7d0a150614 (
plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
|
# README
## Name
groonga-normalizer-mysql
## Description
Groonga-normalizer-mysql is a Groonga plugin. It provides MySQL
compatible normalizers and a custom normalizer to Groonga.
MySQL compatible normalizers are `NormalizerMySQLGeneralCI` and
`NormalizerMySQLUnicodeCI`. `NormalizerMySQLGeneralCI` corresponds to
`utf8mb4_general_ci`. `NormalizerMySQLUnicodeCI` corresponds to
`utf8mb4_unicode_ci`.
A custom normalizer is
`NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark`. It is
self-descriptive name but long. It is a variant normalizer of
`NormalizerMySQLUnicode`. It has different behaviors. The followings
are the different behaviors.
* `NormalizerMySQLUnicode` normalizes all small Hiragana such as `ぁ`,
`っ` to Hiragana such as `あ`, `つ`.
`NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark`
doesn't normalize `ぁ` to `あ` nor `っ` to `つ`. `ぁ` and `あ` are
different characters. `っ` and `つ` are also different characters.
This behavior is described by `ExceptKanaCI` in the long name. This
following behaviors ared described by
`ExceptKanaWithVoicedSoundMark` in the long name.
* `NormalizerMySQLUnicode` normalizes all Hiragana with voiced sound
mark such as `が` to Hiragana without voiced sound mark such as `か`.
`NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark` doesn't
normalize `が` to `か`. `が` and `か` are different characters.
* `NormalizerMySQLUnicode` normalizes all Hiragana with semi-voiced sound
mark such as `ぱ` to Hiragana without semi-voiced sound mark such as `は`.
`NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark` doesn't
normalize `ぱ` to `は`. `ぱ` and `は` are different characters.
* `NormalizerMySQLUnicode` normalizes all Katakana with voiced sound
mark such as `ガ` to Katakana without voiced sound mark such as `カ`.
`NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark` doesn't
normalize `ガ` to `カ`. `ガ` and `カ` are different characters.
* `NormalizerMySQLUnicode` normalizes all Katakana with semi-voiced sound
mark such as `パ` to Hiragana without semi-voiced sound mark such as `ハ`.
`NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark` doesn't
normalize `パ` to `ハ`. `パ` and `ハ` are different characters.
* `NormalizerMySQLUnicode` normalizes all halfwidth Katakana with
voiced sound mark such as `ガ` to halfwidth Katakana without voiced
sound mark such as `カ`.
`NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark`
normalizes all halfwidth Katakana with voided sound mark such as `ガ`
to fullwidth Katakana with voiced sound mark such as `ガ`.
`NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark` is MySQL
incompatible normalizer but it is useful for Japanese text. For
example, `ふらつく` and `ブラック` has different
means. `NormalizerMySQLUnicodeCI` identifies `ふらつく` with `ブラック
` but `NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark
doesn't identify them.
## Install
### Debian GNU/Linux
[Add apt-line for the Groonga deb package repository](http://groonga.org/docs/install/debian.html)
and install `groonga-normalizer-mysql` package:
% sudo aptitude -V -D -y install groonga-normalizer-mysql
### Ubuntu
[Add apt-line for the Groonga deb package repository](http://groonga.org/docs/install/ubuntu.html)
and install `groonga-normalizer-mysql` package:
% sudo aptitude -V -D -y install groonga-normalizer-mysql
### CentOS
Install `groonga-repository` package:
% sudo rpm -ivh http://packages.groonga.org/centos/groonga-release-1.1.0-1.noarch.rpm
% sudo yum makecache
Then install `groonga-normalizer-mysql` package:
% sudo yum install -y groonga-normalizer-mysql
### Fedora
Install `groonga-repository` package:
% sudo rpm -ivh http://packages.groonga.org/fedora/groonga-release-1.1.0-1.noarch.rpm
% sudo yum makecache
Then install `groonga-normalizer-mysql` package:
% sudo yum install -y groonga-normalizer-mysql
### OS X - Homebrew
Install `groonga-normalizer-mysql` package:
% brew install groonga-normalizer-mysql
### Windows
You need to build from source. Here are build instructions.
#### Build system
Install the following build tools:
* [Microsoft Visual Studio 2010 Express](http://www.microsoft.com/japan/msdn/vstudio/express/): 2012 isn't tested yet.
* [CMake](http://www.cmake.org/)
#### Build Groonga
Download the latest Groonga source from [packages.groonga.org](http://packages.groonga.org/source/groonga/). Source file name is formatted as `groonga-X.Y.Z.zip`.
Extract the source and move to the source folder:
> cd ...\groonga-X.Y.Z
groonga-X.Y.Z>
Run CMake. Here is a command line to install Groonga to `C:\groonga` folder:
groonga-X.Y.Z> cmake . -G "Visual Studio 10 Win64" -DCMAKE_INSTALL_PREFIX=C:\groonga
Build:
groonga-X.Y.Z> cmake --build . --config Release
Install:
groonga-X.Y.Z> cmake --build . --config Release --target Install
#### Build groonga-normalizer-mysql
Download the latest groonga-normalizer-mysql source from [packages.groonga.org](http://packages.groonga.org/source/groonga-normalizer-mysql/). Source file name is formatted as `groonga-normalizer-X.Y.Z.zip`.
Extract the source and move to the source folder:
> cd ...\groonga-normalizer-mysql-X.Y.Z
groonga-normalizer-mysql-X.Y.Z>
IMPORTANT!!!: Set `PKG_CONFIG_PATH` environment variable:
groonga-normalizer-mysql-X.Y.Z> set PKG_CONFIG_PATH=C:\groongalocal\lib\pkgconfig
Run CMake. Here is a command line to install Groonga to `C:\groonga` folder:
groonga-normalizer-mysql-X.Y.Z> cmake . -G "Visual Studio 10 Win64" -DCMAKE_INSTALL_PREFIX=C:\groonga
Build:
groonga-normalizer-mysql-X.Y.Z> cmake --build . --config Release
Install:
groonga-normalizer-mysql-X.Y.Z> cmake --build . --config Release --target Install
## Usage
First, you need to register `normalizers/mysql` plugin:
groonga> register normalizers/mysql
Then, you can use `NormalizerMySQLGeneralCI` and
`NormalizerMySQLUnicodeCI` as normalizers:
groonga> table_create Lexicon TABLE_PAT_KEY --default_tokenizer TokenBigram --normalizer NormalizerMySQLGeneralCI
## Dependencies
* Groonga >= 3.0.3
## Mailing list
* English: [groonga-talk@lists.sourceforge.net](https://lists.sourceforge.net/lists/listinfo/groonga-talk)
* Japanese: [groonga-dev@lists.sourceforge.jp](http://lists.sourceforge.jp/mailman/listinfo/groonga-dev)
## Thanks
* Alexander Barkov \<bar@udm.net\>: The author of
`MYSQL_SOURCE/strings/ctype-utf8.c`.
* ...
## Authors
* Kouhei Sutou \<kou@clear-code.com\>
## License
LGPLv2 only. See doc/text/lgpl-2.0.txt for details.
This program uses normalization table defined in MySQL source code. So
this program is derived work of
`MYSQL_SOURCE/strings/ctype-utf8.c`. This program is the same license
as `MYSQL_SOURCE/strings/ctype-utf8.c` and it is licensed under LGPLv2
only.
|