1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
|
Text::Soundex Version 3.02
NOTE: Users of Text::Soundex Version 2.x should consult the 'History'
section at the end of this document before installing this module.
The interface has been simplified, and existing code that takes
advantages of Version 2.x features may need to be altered to function
properly.
This is a perl 5 module implementing the Soundex algorithm described by
Knuth. The algorithm is used quite often for locating a person by name
where the actual spelling of the name is not known.
This version directly supercedes the version of Text::Soundex that can be
found in the core from Perl 5.8.0 and down. (This version is a drop-in
replacement)
The algorithm used by soundex() is NOT fully compatible with the
algorithm used to index names for US Censuses. Use the soundex_nara()
subroutine to return codes for this purpose.
Basic Usage:
Soundex is used to do a one way transformation of a name, converting
a character string given as input into a set of codes representing
the identifiable sounds those characters might make in the output.
For example:
use Text::Soundex;
print soundex("Mark"), "\n"; # prints: M620
print soundex("Marc"), "\n"; # prints: M620
print soundex("Hansen"), "\n"; # prints: H525
print soundex("Hanson"), "\n"; # prints: H525
print soundex("Henson"), "\n"; # prints: H525
In many situations, code such as the following:
if ($name1 eq $name2) {
...
}
Can be substituted with:
if (soundex($name1) eq soundex($name2)) {
...
}
Installation:
Once the archive has been unpacked then the following steps are needed
to build, test and install the module (to be done in the directory which
contains the Makefile.PL)
perl Makefile.PL
make
make test
If the make test succeeds then the next step may need to be run as root
(on a Unix-like system) or with special privileges on other systems.
make install
If you do not want to use the XS code (for whatever reason) do the following
instead of the above:
perl Makefile.PL --no-xs
make
make test
make install
If any of the tests report 'not ok' and you are running perl 5.6.0 or later
then please contact Mark Mielke <mark@mielke.cc>
History:
Version 3.02:
3.01 and 3.00 used the 'U8' type incorrectly causing some strict
compilers to complain or refuse to compile the XS code. Also, unicode
support did not work properly for Perl 5.6.x. Both of these problems
are now fixed.
Version 3.01:
A bug with non-UTF 8 strings that contain non-ASCII alphabetic characters
was fixed. The soundex_unicode() and soundex_nara_unicode() wrapper
routines were included and the documentation refers the user to the
excellent Text::Unidecode module to perform soundex encodings using
unicode strings. The Perl versions of the routines have been further
optimized, and correct a border case involving non-alphabetic characters
at the beginning of the string.
Version 3.00:
Support for UTF-8 strings (unicode strings) is now in place. Note
that this allows UTF-8 strings to be passed to the XS version of
the soundex() routine. The Soundex algorithm treats characters
outside the ascii range (0x00 - 0x7F) as if they were not
alphabetical.
The interface has been simplified. In order to explicitly use the
non-XS implementation of soundex():
use Text::Soundex ();
$code = Text::Soundex::soundex_noxs($name);
In order to use the NARA soundex algorithm:
use Text::Soundex 'soundex_nara';
$code = soundex_nara($name);
Use of the ':NARA-Ruleset' import directive is now obsolete. To
emulate the old behaviour:
use Text::Soundex ();
*soundex = \&Text::Soundex::soundex_nara;
$code = soundex($name);
Version 2.20:
This version includes support for the algorithm used to index
the U.S. Federal Censuses. There is a slight descrepancy in the
definition for a soundex code which is not commonly known or
recognized involved similar sounding letters being seperated
by the characters H or W. This is defined as the NARA ruleset,
as this descrepency was discovered by them. (Calling it "the
US Census ruleset" was too unwieldy...)
NARA can be found at:
http://www.nara.gov/genealogy/
The algorithm requested by NARA can be found at:
http://home.utah-inter.net/kinsearch/Soundex.html
Ways to use it in your code:
Transparently change existing code like this:
=============================================
use Text::Soundex qw(:NARA-Ruleset);
... soundex(...) ...
--
Make the change visibly distinct like this:
===========================================
use Text::Soundex qw(soundex_nara);
... soundex_nara(...) ...
Version 2.00:
This version is a full re-write of the 1.0 engine by Mark Mielke.
The goal was for speed... and this was achieved. There is an optional
XS module which can be used completely transparently by the user
which offers a further speed increase of a factor of more than 7.5X.
Version 1.00:
This version can be found in the perl core distribution from at
least Perl 5.8.0 and down. It was written by Mike Stok. It can be
identified by the fact that it does not contain a $VERSION
in the beginning of the module, and as well it uses an RCS
tag with a version of 1.x. This version, before some perl5'ish
packaging was introduced, was actually written for perl4.
|