charnames.t: tweak amount of testing of CJK chars

Actually, this tweaks the amount of testing of characters whose names are algorithmically determinable, most of which are CJK characters. Instead of testing 1% of them, it now tests one in each block, regardless of the block size. We really don't need to test many of these to be confident that the algorithm is working. It also adds comments clarifying what happens if one changes the block size.
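Why does `100 / $block_size` give one test per block? The following is a minimal sketch, not code from charnames.t. It assumes the number of algorithmic names tested in a block is the block's name count times the percentage divided by 100. The helpers `tests_per_block` and `total_tests`, and the 2**14-code-point run, are hypothetical. With the patch's `$block_size_bits = 7`, the rate is 100/128 = 0.78125%, slightly below the previous 1%. A fully algorithmic block always yields exactly one test, so shrinking the block size raises the total number of tests.

```perl
use strict;
use warnings;

# Hypothetical helper: tests contributed by one fully algorithmic block,
# assuming per-block count = (names in block) * percentage / 100.
sub tests_per_block {
    my ($block_size_bits) = @_;
    my $block_size = 2**$block_size_bits;
    my $percentage_of_algorithmic_names = 100 / $block_size;   # as in the patch
    return $block_size * $percentage_of_algorithmic_names / 100;
}

# Hypothetical helper: total tests over a run of $n_names consecutive
# algorithmic code points. It is one per block, so smaller blocks mean more tests.
sub total_tests {
    my ($block_size_bits, $n_names) = @_;
    my $block_size = 2**$block_size_bits;
    return ($n_names / $block_size) * tests_per_block($block_size_bits);
}

for my $bits (4, 7, 10) {
    printf "bits=%2d block=%5d pct=%.8f per_block=%g total=%g\n",
        $bits, 2**$bits, 100 / 2**$bits,
        tests_per_block($bits), total_tests($bits, 2**14);
}
```

Because every block size here is a power of two, `100 / $block_size` and the multiplication back are exact in floating point, so the per-block count comes out as exactly 1.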
author: Karl Williamson <public@khwilliamson.com> 2010-08-02 16:42:25 -0600
committer: Rafael Garcia-Suarez <rgs@consttype.org> 2010-08-13 14:36:20 +0200
commit: e4b4d0cc90ea801f118be8e27b7e78e05b398467 (patch)
tree: 3fe31814babf1d7e9a764d853692ea9f8b6260b1 /lib/charnames.t
parent: a28b016699ec02a4ee4b83e23d8f22df52e18a44 (diff)
download: perl-e4b4d0cc90ea801f118be8e27b7e78e05b398467.tar.gz
1 files changed, 13 insertions, 6 deletions
diff --git a/lib/charnames.t b/lib/charnames.t
index 75a3aaccb1..74b0999242 100644
--- a/lib/charnames.t
+++ b/lib/charnames.t
@@ -780,20 +780,27 @@ is("\N{U+1D0C5}", "\N{BYZANTINE MUSICAL SYMBOL FTHORA SKLIRON CHROMA VASIS}");
                                     defined $ENV{PERL_TEST_CHARNAMES_SEED};
     $seed = srand($seed);
 
+    # We will look at the data grouped in "blocks" of the following
+    # size.
+    my $block_size_bits = 7;   # above 16 is not sensible
+    my $block_size = 2**$block_size_bits;
+
     # There are the regular names, like "SPACE", plus the ones
     # that are algorithmically determinable, such as "CKJ UNIFIED
     # IDEOGRAPH-hhhh" where the hhhh is the actual hex code point number
     # of the character.  The percentage of each type to test is
     # independently settable.
     my $percentage_of_regular_names = 25;
-    my $percentage_of_algorithmic_names = 1;
+    my $percentage_of_algorithmic_names = 100 / $block_size; # 1 test/block
 
-    my @names;  # The names of every code point.
+    # Changing the block size doesn't change anything with regards to
+    # testing the regular names, but will affect the algorithmic names.
+    # If you make the size too big so that blocks include both regular
+    # names and algorithmic, the whole block will be sampled at the sum
+    # of the two rates.  If you make it too small, then more algorithmic
+    # names will be tested than you probably intended.
 
-    # We will look at the data grouped in "blocks" of the following
-    # size.
-    my $block_size_bits = 7;   # above 16 is not sensible
-    my $block_size = 2**$block_size_bits;
+    my @names;  # The names of every code point.
 
     # We look at one block past the Unicode maximum, to verify there are
     # no names in it.
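The new comment in the diff warns that an oversized block containing both regular and algorithmic names is sampled at the sum of the two rates. The following is a minimal sketch of that arithmetic using the patch's values ($block_size_bits = 7, 25% regular). The `block_rate` helper is hypothetical and models only the behaviour the comment describes. It does not reproduce the test file's actual sampling loop.

```perl
use strict;
use warnings;

my $block_size_bits = 7;
my $block_size      = 2**$block_size_bits;
my $percentage_of_regular_names     = 25;
my $percentage_of_algorithmic_names = 100 / $block_size;   # 1 test/block

# Hypothetical: a block's sampling rate is the sum of the rates for each
# kind of name it contains, as the patch's comment describes.
sub block_rate {
    my ($has_regular, $has_algorithmic) = @_;
    my $rate = 0;
    $rate += $percentage_of_regular_names     if $has_regular;
    $rate += $percentage_of_algorithmic_names if $has_algorithmic;
    return $rate;
}

printf "regular only:     %.5f%%\n", block_rate(1, 0);
printf "algorithmic only: %.5f%%\n", block_rate(0, 1);
printf "mixed:            %.5f%%\n", block_rate(1, 1);
```

In a mixed block, the algorithmic names are then tested at roughly 25.8% instead of about 0.8%. This is why the comment advises against block sizes large enough to straddle both kinds of names.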