author Karl Williamson <public@khwilliamson.com> 2010-08-02 16:42:25 -0600
committer Rafael Garcia-Suarez <rgs@consttype.org> 2010-08-13 14:36:20 +0200
commit e4b4d0cc90ea801f118be8e27b7e78e05b398467
tree 3fe31814babf1d7e9a764d853692ea9f8b6260b1 /lib
parent a28b016699ec02a4ee4b83e23d8f22df52e18a44
charnames.t: tweak amount of testing of CJK chars
Actually, this tweaks the amount of testing of characters whose names are algorithmically determinable, most of which are CJK characters. This patch changes the testing to test not 1% of them, but to test 1 in each block, no matter what the block size. We really don't need to test many of these to be confident the algorithm is working. It also adds some comments to clarify what happens if one tweaks the block size.
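As a rough check of the arithmetic behind "1 in each block", the sketch below recomputes the rate from the values in the patch. The variable `$expected_per_block` is hypothetical and not part of charnames.t; it assumes the percentage is applied uniformly across the code points of a block, so the result is an average, not a guarantee of exactly one test per block.

```perl
use strict;
use warnings;

# Values as they appear in the patch.
my $block_size_bits = 7;
my $block_size      = 2**$block_size_bits;     # 128 code points per block
my $percentage_of_algorithmic_names = 100 / $block_size;

# Hypothetical helper value (not in charnames.t): the average number of
# algorithmic names tested per block at this rate.
my $expected_per_block
    = $block_size * $percentage_of_algorithmic_names / 100;

printf "block size: %d\n",               $block_size;
printf "algorithmic rate: %.5f%%\n",     $percentage_of_algorithmic_names;
printf "expected tests per block: %g\n", $expected_per_block;
```

With a block size of 128 the rate works out to 0.78125%, slightly below the previous fixed 1%, and it tracks the block size automatically if `$block_size_bits` is changed.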
Diffstat (limited to 'lib')
-rw-r--r-- lib/charnames.t 19
1 files changed, 13 insertions, 6 deletions
diff --git a/lib/charnames.t b/lib/charnames.t
index 75a3aaccb1..74b0999242 100644
--- a/lib/charnames.t
+++ b/lib/charnames.t
@@ -780,20 +780,27 @@ is("\N{U+1D0C5}", "\N{BYZANTINE MUSICAL SYMBOL FTHORA SKLIRON CHROMA VASIS}");
defined $ENV{PERL_TEST_CHARNAMES_SEED};
$seed = srand($seed);
+ # We will look at the data grouped in "blocks" of the following
+ # size.
+ my $block_size_bits = 7; # above 16 is not sensible
+ my $block_size = 2**$block_size_bits;
+
# There are the regular names, like "SPACE", plus the ones
# that are algorithmically determinable, such as "CKJ UNIFIED
# IDEOGRAPH-hhhh" where the hhhh is the actual hex code point number
# of the character. The percentage of each type to test is
# independently settable.
my $percentage_of_regular_names = 25;
- my $percentage_of_algorithmic_names = 1;
+ my $percentage_of_algorithmic_names = 100 / $block_size; # 1 test/block
- my @names; # The names of every code point.
+ # Changing the block size doesn't change anything with regards to
+ # testing the regular names, but will affect the algorithmic names.
+ # If you make the size too big so that blocks include both regular
+ # names and algorithmic, the whole block will be sampled at the sum
+ # of the two rates. If you make it too small, then more algorithmic
+ # names will be tested than you probably intended.
- # We will look at the data grouped in "blocks" of the following
- # size.
- my $block_size_bits = 7; # above 16 is not sensible
- my $block_size = 2**$block_size_bits;
+ my @names; # The names of every code point.
# We look at one block past the Unicode maximum, to verify there are
# no names in it.