author | Karl Williamson <khw@cpan.org> | 2016-01-18 14:25:02 -0700 |
---|---|---|
committer | Karl Williamson <khw@cpan.org> | 2016-01-19 15:08:59 -0700 |
commit | 6b659339f976d014a1a53731d86cedd01f5921ec (patch) | |
tree | 852b02830e8b19bf700af95791214485d1f4e2e8 /lib/unicore/mktables | |
parent | ca8226cfa2cc0ddcc50f60505c42078df8e3b766 (diff) | |
Add qr/\b{lb}/

This adds the final Unicode boundary type previously missing from core
Perl: the LineBreak one. The feature is already available in the
Unicode::LineBreak module, but I've been told that the module has
portability and some other issues. What's added here is a lightweight
version that lacks the module's customizable features.

This implements the default Line Breaking algorithm, plus the
customizations that Unicode expects everybody to add, since its test
file tests for them. In other words, this passes Unicode's fairly
extensive furnished tests, which it would not do without certain
customizations that Unicode specifies beyond the basic algorithm.

The implementation looks up the characters surrounding a boundary in a
table to see whether it is a suitable place to break a line. In a few
cases more context needs to be taken into account, so there is code in
addition to the look-up table to handle those.
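
To illustrate the table-driven idea, here is a minimal sketch in Perl. It is not the code this commit adds: the three classes, the `%may_break` table, and the helpers `toy_class` and `toy_break_offsets` are all hypothetical, and the table entries are a toy that does not reproduce Unicode's rules. It only shows how the pair of classes on either side of a position can decide whether a break is allowed there.

```perl
use strict;
use warnings;

# Toy classes (assumption): space is SP, hyphen is HY, everything else AL.
my %class_of = (' ' => 'SP', '-' => 'HY');

# Hypothetical pair table: $may_break{class_before}{class_after}.
my %may_break = (
    AL => { AL => 0, SP => 0, HY => 0 },
    SP => { AL => 1, SP => 0, HY => 1 },
    HY => { AL => 1, SP => 0, HY => 0 },
);

# Map a single character to its toy class.
sub toy_class {
    my ($ch) = @_;
    return $class_of{$ch} // 'AL';
}

# Return the offsets inside $text at which the toy table allows a break.
sub toy_break_offsets {
    my ($text) = @_;
    my @offsets;
    for my $i (1 .. length($text) - 1) {
        my $before = toy_class(substr($text, $i - 1, 1));
        my $after  = toy_class(substr($text, $i, 1));
        push @offsets, $i if $may_break{$before}{$after};
    }
    return @offsets;
}

print join(',', toy_break_offsets("well-known fact")), "\n";
```

A pure pair table cannot express rules that depend on characters further away from the boundary, which is why extra code is needed for the few context-sensitive cases.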

This should meet the line-breaking needs of many applications without
their having to load the module.
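
As a rough usage sketch, assuming a perl that includes this change (the feature shipped in 5.24), text can be split at every line-break opportunity without loading any module. The helper name `break_pieces` is made up for illustration.

```perl
use 5.024;
use strict;
use warnings;

# Split $text at each position where \b{lb} allows a line break.
# Each piece keeps its trailing space, since breaks fall after spaces.
sub break_pieces {
    my ($text) = @_;
    return split /\b{lb}/, $text;
}

say join '|', break_pieces("The quick brown fox");
```

A line wrapper could then accumulate these pieces until the next one would exceed the target width.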

Like the other boundary types, the algorithm is somewhat independent of
the Unicode version. This code needs changing only if new rules are
added or existing ones are modified. Otherwise, running
regen/mk_invlists.pl when a new Unicode release comes out should be
sufficient to keep it up-to-date, again like the other Unicode boundary
types.
Diffstat (limited to 'lib/unicore/mktables')
-rw-r--r-- | lib/unicore/mktables | 21 |
1 files changed, 20 insertions, 1 deletions

diff --git a/lib/unicore/mktables b/lib/unicore/mktables
index 4f05062b0f..de97363f91 100644
--- a/lib/unicore/mktables
+++ b/lib/unicore/mktables
@@ -1389,6 +1389,7 @@ my $has_hangul_syllables = 0;
 my $needing_code_points_ending_in_code_point = 0;
 
 my @backslash_X_tests;     # List of tests read in for testing \X
+my @LB_tests;              # List of tests read in for testing \b{lb}
 my @SB_tests;              # List of tests read in for testing \b{sb}
 my @WB_tests;              # List of tests read in for testing \b{wb}
 my @unhandled_properties;  # Will contain a list of properties found in
@@ -12051,6 +12052,18 @@ sub process_GCB_test {
     return;
 }
 
+sub process_LB_test {
+
+    my $file = shift;
+    Carp::carp_extra_args(\@_) if main::DEBUG && @_;
+
+    while ($file->next_line) {
+        push @LB_tests, $_;
+    }
+
+    return;
+}
+
 sub process_SB_test {
 
     my $file = shift;
@@ -18628,6 +18641,7 @@ sub make_property_test_script() {
                         <DATA>,
                         @output,
                         (map {"Test_GCB('$_');\n"} @backslash_X_tests),
+                        (map {"Test_LB('$_');\n"} @LB_tests),
                         (map {"Test_SB('$_');\n"} @SB_tests),
                         (map {"Test_WB('$_');\n"} @WB_tests),
                         "Finished();\n"
@@ -19099,7 +19113,7 @@ my @input_file_objects = (
                     . 'F<NamedSequences.txt> and recompile perl',
                     ),
     Input_file->new("$AUXILIARY/LBTest.txt", v5.1.0,
-                    Skip => $Validation,
+                    Handler => \&process_LB_test,
                     ),
     Input_file->new("$AUXILIARY/LineBreakTest.html", v5.1.0,
                     Skip => $Validation_Documentation,
@@ -19893,6 +19907,10 @@ sub Test_GCB($) {
     _test_break(shift, 'gcb');
 }
 
+sub Test_LB($) {
+    _test_break(shift, 'lb');
+}
+
 sub Test_SB($) {
     _test_break(shift, 'sb');
 }
@@ -19917,3 +19935,4 @@ Expect(1, 0xFF10, '\p{XDigit}', ""); # Bug # 71726
 # official suite, and gets modified to check for the perl tailoring by
 # Test_WB()
 Test_WB("$breakable 0020 $breakable 0020 $breakable 0308 $breakable");
+Test_LB("$nobreak 200B $nobreak 0020 $nobreak 0020 $breakable 2060 $breakable");