path: root/lib/charnames.t
Commit message | Author | Age | Files | Lines
* charnames: :alias alone implies :full | Karl Williamson | 2013-01-04 | 1 | -6/+7
    The documentation says this is how it should behave, but only one of the three paths in the code did it, and in fact there was a test to the contrary. I'm only adding a test for one of the two fixed paths, as the other one appears to require a weird file name.
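
A minimal sketch of the behaviour described above, assuming a Perl recent enough to include this fix. The alias name `e_ACUTE` is only an illustration; the point is that a full Unicode name still resolves although `:full` was never listed.

```perl
use strict;
use warnings;

# Only :alias is given; per the commit message, :full should be implied.
use charnames ":alias" => { e_ACUTE => "LATIN SMALL LETTER E WITH ACUTE" };

# The custom alias resolves through the full name it maps to.
my $via_alias = "\N{e_ACUTE}";

# A full Unicode name is used directly, without an explicit :full.
my $via_full = "\N{LATIN SMALL LETTER E WITH ACUTE}";

printf "alias -> U+%04X, full name -> U+%04X\n", ord $via_alias, ord $via_full;
```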
* Make \N{unknown char} a syntax error | Karl Williamson | 2012-10-24 | 1 | -13/+26
    Previously, it was a warning with the REPLACEMENT CHARACTER substituted. Unicode recommends that it be a syntax error, and any code that used this had to be buggy since the REPLACEMENT CHARACTER has no other use in Unicode.
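
One way to observe this change is to compile the offending string inside a string eval, so the failure is trapped instead of aborting the script. This is a sketch only; the name `NO SUCH CHARACTER NAME` is made up, and the exact error text depends on the Perl version.

```perl
use strict;
use warnings;

# q{} leaves \N{...} untouched, so the name is only looked up when the
# eval compiles the code at runtime.
my $compiled = eval q{ my $s = "\N{NO SUCH CHARACTER NAME}"; 1 };

if ($compiled) {
    # Behaviour before this commit: a warning plus U+FFFD substituted.
    print "compiled\n";
}
else {
    # Behaviour after this commit: the code does not compile at all.
    print "rejected at compile time: $@";
}
```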
* charnames.t: Fix erroneous interpolation of \N{} | Karl Williamson | 2012-10-24 | 1 | -1/+1
    This is supposed to print as-is, not interpolate.
* mktables: Convert to BELL meaning U+1F514 | Karl Williamson | 2012-06-02 | 1 | -11/+11
    As a result of the Unicode 6.0 mistake of using "BELL" to refer to a different code point, Perl has deprecated use of this name for 2 major release cycles, while not fully implementing Unicode in the interim, to allow any affected code to migrate to the new name. This commit now switches to the new meaning of BELL.
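
A short sketch of the resulting name mapping, assuming a Perl that includes this switch and an ASCII platform. `charnames::vianame` is the documented runtime lookup; nothing else here comes from the commit itself.

```perl
use strict;
use warnings;
use charnames ();   # load the functions without importing any options

# After the switch, BELL names the character added in Unicode 6.0 ...
my $bell = charnames::vianame("BELL");

# ... and ALERT is the name for the traditional control character.
my $alert = charnames::vianame("ALERT");

printf "BELL  -> U+%04X\n", $bell;
printf "ALERT -> U+%04X\n", $alert;
```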
* charnames.t: White-space only | Karl Williamson | 2012-06-02 | 1 | -13/+13
    Indent newly formed block.
* charnames.t: Fix to work on Unicodes without NameAliases | Karl Williamson | 2012-06-02 | 1 | -3/+44
    This is a recent addition. Use alternate means if the file doesn't exist in the Unicode release, or is for a non-ASCII platform (as the alternate means should take care of the translation in that case).
* charnames.t: Skip hangul syllable testing for early Unicodes | Karl Williamson | 2012-06-02 | 1 | -0/+3
    If the Unicode release doesn't contain hangul syllables, just skip those tests.
* charnames.t: Indent newly formed block | Karl Williamson | 2012-06-02 | 1 | -13/+13
* charnames.t: Skip testing named sequences if they don't exist | Karl Williamson | 2012-06-02 | 1 | -2/+7
    Instead of dying when applied to a Unicode version that doesn't have named sequences, skip them.
* charnames.t: viacode doesn't return Unicode_1 name always | Karl Williamson | 2012-02-13 | 1 | -1/+7
    There are now four characters which have a different preferred name.
* mktables: viacode() return unparenthesized names for 4 controls | Karl Williamson | 2012-02-13 | 1 | -0/+13
    This commit changes the viacode() returned name for four control characters, as follows:

        Code point  Old Name              New Name
        U+000A      LINE FEED (LF)        LINE FEED
        U+000C      FORM FEED (FF)        FORM FEED
        U+000D      CARRIAGE RETURN (CR)  CARRIAGE RETURN
        U+0085      NEXT LINE (NEL)       NEXT LINE

    Only the return from viacode is affected. All the names are accepted as input, as they always have been.

    Unicode 6.1 now has official names for all the controls, and the new names match those. The old names were the ones that were recommended by TR18 prior to 6.1, and still are, sort of. This change uses the official names in preference to the TR18 ones. We probably wouldn't bother except that the old names were problematic--the only names in the whole universe of names containing parentheses, and not matching traditional usage. The new names have always been accepted as inputs by Perl.

    I actually doubt that Unicode ever grokked that they were recommending these ugly names, and they haven't paid much attention to TR18 anyway, breaking it in version 6.0 by encoding one of the recommended names (BELL) as an official name for another code point, and without realizing it. TR18 now is in limbo, still wrongly recommending BELL, with a rewrite being promised for many months now. It's unclear what will happen with it.

    It was agreed on p5p to go with the cleaner, now official names, instead of the older, likely obsolete, TR18 names. I did a search of CPAN; it was unclear if this change (which again is only for viacode()) mattered to any code there or not. There were a few instances of the old names, but none of those were apparently associated with viacode().
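
The effect on viacode() can be checked with a few lines, sketched here for an ASCII platform. The loop prints whatever names the installed Perl returns. The lookup of the old parenthesized form is written defensively: the commit says old names remain accepted, but the snippet checks that rather than assuming it.

```perl
use strict;
use warnings;
use charnames ();

# Names that viacode() reports for the four controls in the table above.
my %reported;
for my $code_point (0x0A, 0x0C, 0x0D, 0x85) {
    $reported{$code_point} = charnames::viacode($code_point);
    printf "U+%04X  %s\n", $code_point, $reported{$code_point};
}

# The old parenthesized spelling as input; vianame returns undef for a
# name it does not recognize.
my $old_form = do {
    no warnings;   # an unknown name may warn as well as return undef
    charnames::vianame("LINE FEED (LF)");
};
print defined $old_form
    ? sprintf("LINE FEED (LF) -> U+%04X\n", $old_form)
    : "LINE FEED (LF) is not recognized by this Perl\n";
```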
* Unicode 6.1 | Karl Williamson | 2012-02-04 | 1 | -4/+13
    This commit delivers the official Unicode character database files for release 6.1, plus the final bits needed to cope with the changes in them from release 6.0, including documentation.
* charnames.t: Skip null name test | Karl Williamson | 2011-12-29 | 1 | -1/+1
    In versions of Unicode earlier than 6.1, there was no possibility of a name being empty here; but 6.1 will make that happen, so guard against it.
* charnames.t: Add test names | Karl Williamson | 2011-12-29 | 1 | -419/+433
    This adds test names to nearly all the ones that were missing. Most were done via a global substitution in a text editor.
* charnames.t: Fix test that passed whether or not it should | Karl Williamson | 2011-12-29 | 1 | -2/+3
    This test called grep followed by the comma operator, and the non-empty string after the comma operator caused it to always succeed.
* charnames tests: Add names to some more tests | Karl Williamson | 2011-12-20 | 1 | -4/+5
* Autoload charnames for \N{name} | Karl Williamson | 2011-12-20 | 1 | -3/+2
    This autoloads charnames.pm when needed. It uses the :full and :short options. :loose is not used because of its relative unfamiliarity in the Perl community, and because it is slower. (If someone later added a typical "use charnames qw(:full)", things that previously matched under :loose would start to fail, causing confusion. If :loose does become more common, we can change this in the future to use it; the converse isn't true.)

    The callable functions in the module are not automatically loaded. To access them, an explicit "use charnames" must be provided.

    Thanks to Tony Cook for doing a code inspection and finding a missing SPAGAIN.
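
A sketch of what autoloading allows, assuming a Perl that includes this change and an ASCII platform. There is deliberately no "use charnames" line; both the :full and the :short (script:letter) forms are expected to resolve.

```perl
use strict;
use warnings;

# No "use charnames" here: \N{} loads the module on demand.
my $alpha = "\N{GREEK SMALL LETTER ALPHA}";   # :full form
my $beta  = "\N{Greek:beta}";                 # :short form, script:letter

printf "alpha -> U+%04X, beta -> U+%04X\n", ord $alpha, ord $beta;
```

Calling functions such as charnames::viacode() still calls for an explicit "use charnames ()" in the program, as the commit message notes.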
* charnames.t: Rmv extra blank in comment | Karl Williamson | 2011-12-20 | 1 | -1/+1
* charnames: Add :loose matching | Karl Williamson | 2011-06-15 | 1 | -5/+69
    This adds the capability to charnames to use Unicode loose name look-ups, via ":loose" being specified in the pragma. The number of tests per code point is doubled in the .t, so to preserve the same amount of elapsed test time, the number of code points tested in each run is halved.
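
A minimal sketch of loose matching, assuming an ASCII platform. Only case-insensitivity is shown here; the full loose-matching rules are in the charnames documentation.

```perl
use strict;
use warnings;
use charnames ":loose";

# The official spelling still works under :loose ...
my $official = "\N{GREEK SMALL LETTER ALPHA}";

# ... and so does a lower-case spelling of the same name.
my $lower_case = "\N{greek small letter alpha}";

printf "official -> U+%04X, lower case -> U+%04X\n",
    ord $official, ord $lower_case;
```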
* charnames.t: Rmv duplicated test | Karl Williamson | 2011-06-15 | 1 | -1/+0
* charnames: Quote metachars in script names | Karl Williamson | 2011-06-15 | 1 | -0/+4
    "use charnames qw(.*)" will match any script; it should match none.
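
The general problem can be illustrated without charnames at all. The sketch below is not the module's actual code; the name list and the variable `$script` are hypothetical. It shows how interpolating an unquoted string into a pattern lets `.*` match everything, and how `\Q...\E` prevents that.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for character names and a user-supplied script.
my @names  = ("GREEK SMALL LETTER ALPHA", "CYRILLIC SMALL LETTER A");
my $script = ".*";

# Unquoted: ".*" acts as a regex and matches every name.
my @unquoted = grep { /^$script / } @names;

# Quoted: the metacharacters are literal, so nothing matches.
my @quoted = grep { /^\Q$script\E / } @names;

printf "unquoted matches: %d, quoted matches: %d\n",
    scalar @unquoted, scalar @quoted;
```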
* charnames: Abbreviations wrong on certain C1 controls | Karl Williamson | 2011-06-13 | 1 | -4/+4
    The abbreviations for 4 of the C1 controls have a trailing blank. Unfortunately so did the tests for them.
* Fix typos (spelling errors) in lib/* | Peter J. Acklam (via RT) | 2011-01-07 | 1 | -1/+1
    # New Ticket Created by (Peter J. Acklam)
    # Please include the string: [perl #81890]
    # in the subject line of all future correspondence about this issue.
    # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=81890 >

    Signed-off-by: Abigail <abigail@abigail.be>
* Work-around Uni 6.0 issues with 'BELL' | Karl Williamson | 2010-11-18 | 1 | -0/+11
    Unicode version 6.0 has co-opted the name BELL for a different character than traditionally used in Perl. This patch works around that by adding ALERT as a synonym for BELL, and causing a deprecated warning for uses of the old name. The new Unicode character will be nameless in Perl 5.14, unless I can (unlikely) get Unicode to grant a synonym that they will support.
* charnames.t: indent newly formed block | Karl Williamson | 2010-11-18 | 1 | -12/+13
    This is a white-space only patch to indent the code that was put into an if block by the previous commit.
* charnames.t: PERL_RUN_SLOW_TESTS runs more tests | Karl Williamson | 2010-11-18 | 1 | -1/+15
    This patch makes this .t look for this environment variable and, if it is set, run more tests. There are two levels of setting, as explained in the comments.
* charnames::viacode returning less correct name | Karl Williamson | 2010-10-21 | 1 | -0/+8
    There are several cases where more than one name is valid for a code point. This happens usually when the original name was published with a typo in it. It's best for viacode to return the revised name, though the original remains valid.

    The names data is in a table generated by mktables exclusively for charnames, including vianame (and its kin) and viacode. The fix is to mktables to put the more correct name first in the table, so that it is found first and returned by viacode().

    When I originally designed this code, I thought the correct name should come last in the tables, so someone looping and reading it could just overwrite the less correct one with the more correct one. But to save memory we have the same table shared by viacode and vianame, and vianame has to recognize both names, so both entries are needed. viacode could do an rindex to find the more correct name, but experiments show that that was twice as slow as going the other direction. Therefore, this patch is for speed. If the tables for vianame and viacode were ever to be split, this patch could be reverted, if desired, to put things back to the reverse order.
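
The ordering argument can be sketched with a toy table. Everything below is hypothetical: the table layout, `toy_viacode` and `toy_vianame` are illustrations, not the contents of the generated file or the module's real lookup code. The corrected name is listed first, so a forward `index` returns it, while a by-name search still finds either entry.

```perl
use strict;
use warnings;

# Toy shared table: one "HEX<TAB>NAME" entry per line, with the corrected
# name first for each code point. A leading newline simplifies searching.
my $table = "\n" . join("\n",
    "001A2\tLATIN CAPITAL LETTER GHA",   # corrected name
    "001A2\tLATIN CAPITAL LETTER OI",    # original name, still valid
    "001A3\tLATIN SMALL LETTER GHA",
    "001A3\tLATIN SMALL LETTER OI",
) . "\n";

# Code point to name: the first entry found is the preferred one.
sub toy_viacode {
    my ($code_point) = @_;
    my $key   = sprintf "\n%05X\t", $code_point;
    my $start = index $table, $key;
    return if $start < 0;
    $start += length $key;
    my $end = index $table, "\n", $start;
    return substr $table, $start, $end - $start;
}

# Name to code point: both the corrected and the original name are found.
sub toy_vianame {
    my ($name) = @_;
    my $tab = index $table, "\t$name\n";
    return if $tab < 0;
    my $line_start = rindex($table, "\n", $tab) + 1;
    return hex substr $table, $line_start, $tab - $line_start;
}

printf "U+01A2 -> %s\n", toy_viacode(0x1A2);
printf "%s -> U+%04X\n", $_, toy_vianame($_)
    for "LATIN CAPITAL LETTER OI", "LATIN CAPITAL LETTER GHA";
```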
* charnames.t: Make sure code point aliases are right | Karl Williamson | 2010-10-12 | 1 | -0/+4
    Some code points have two (possibly more) names. This makes sure that all work.
* charnames.t: Extract common code to subroutine | Karl Williamson | 2010-10-12 | 1 | -13/+15
* Teach Perl about Unicode named character sequences | Karl Williamson | 2010-09-25 | 1 | -4/+84
    mktables is changed to process the Unicode named sequence file. charnames.pm is changed to cache the looked-up values in utf8. A new function, string_vianame, is created that can handle named sequences, as the interface for vianame cannot. The subroutine lookup_name() is slightly refactored to do almost all of the common work for \N{} and the vianame routines. It now understands named sequences as created by mktables. Tests and documentation are added. In the randomized testing section, half use vianame() and half string_vianame().
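
A brief sketch of string_vianame() with a named sequence. The name used is the one the charnames documentation gives as an example of an officially named sequence; the snippet prints whatever code points the installed Perl returns rather than asserting them.

```perl
use strict;
use warnings;
use charnames ();

# A named sequence stands for more than one code point, so it can only
# be returned as a string, not as a single ordinal.
my $name     = "LATIN CAPITAL LETTER E WITH VERTICAL LINE BELOW";
my $sequence = charnames::string_vianame($name);

my @code_points = map { sprintf("U+%04X", ord $_) } split //, $sequence;
print "$name -> @code_points\n";
```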
* charnames.t: Add output message | Karl Williamson | 2010-09-25 | 1 | -1/+1
* charnames.t: Clarify message | Karl Williamson | 2010-09-25 | 1 | -1/+1
* charnames.t: Clarify value is hex | Karl Williamson | 2010-09-25 | 1 | -1/+1
* charnames.t: Add tests for NameAliases | Karl Williamson | 2010-09-25 | 1 | -0/+10
* charnames.t: Add code so can test 100% of names | Karl Williamson | 2010-09-25 | 1 | -1/+12
    If the percentage of characters to test is changed to 100%, add code to make the block size 1. This guarantees each character gets tested in spite of randomness.
* charnames.t: clarify comments | Karl Williamson | 2010-09-25 | 1 | -4/+7
* charnames.t: Don't call srand(undef) | Karl Williamson | 2010-09-25 | 1 | -3/+9
    srand(undef) is the same as srand(0). The code is trying to get random seeds, not a fixed one.
* Tests using t/lib/common.pl need to run in separate directories. | Nicholas Clark | 2010-09-01 | 1 | -2/+2
    Commit 8f776eae73090661 turned out to be a bit optimistic with "should be capable of running in parallel", as the temporary files and modules written out by the various test scripts have clashing names. Hence run each test in a private subdirectory.
* charnames.t: tweak amount of testing of CJK chars | Karl Williamson | 2010-08-13 | 1 | -6/+13
    Actually, this tweaks the amount of testing of characters whose names are algorithmically determinable, most of which are CJK characters. This patch changes the testing to test not 1% of them, but to test 1 in each block, no matter what the block size. We really don't need to test many of these to be confident the algorithm is working.

    It also adds some comments to clarify what happens if one tweaks the block size.
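
The "one in each block" idea can be sketched as follows. The function name, range and block size are hypothetical and are not taken from charnames.t; the sketch only shows the sampling scheme.

```perl
use strict;
use warnings;

# Pick one random code point from every block of $block_size code points
# in the inclusive range $first .. $last. The final block may be shorter.
sub sample_one_per_block {
    my ($first, $last, $block_size) = @_;
    my @picked;
    for (my $start = $first; $start <= $last; $start += $block_size) {
        my $end = $start + $block_size - 1;
        $end = $last if $end > $last;
        push @picked, $start + int(rand($end - $start + 1));
    }
    return @picked;
}

# Four blocks of 0x40 code points each, so four samples.
my @sample = sample_one_per_block(0x4E00, 0x4EFF, 0x40);
print join(" ", map { sprintf("U+%04X", $_) } @sample), "\n";
```

With a block size of 1 the same routine returns every code point in the range, which is how exhaustive testing falls out of the scheme.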
* charnames.t: Change message to fit in 80 columns | Karl Williamson | 2010-08-13 | 1 | -3/+3
    This is an important message. Better not to wrap it.
* charnames.t: Guard against empty lines in __DATA__ | Karl Williamson | 2010-08-13 | 1 | -0/+1
    Somehow an empty line got inserted at the end of the file, and got interpreted as 0's which caused the test for NULL to fail. This guards against that. I removed the empty line, but I'm not sure git has picked that up.
* charnames.t: use srand's seed | Karl Williamson | 2010-08-13 | 1 | -5/+5
    Don't calculate our own seed.
* [perl 71764] Extend charnames to all of Unicode | Karl Williamson | 2010-07-13 | 1 | -1/+11360
    This patch causes \N{}, vianame, and viacode to know the names of all Unicode code points. Previously the names that are algorithmically determinable were not handled. These include the Hangul syllables and many CJK characters.

    It simply adds using the routines that mktables inserts into Name.pl that handle these characters. mktables generates these algorithms from data in the Unicode data base. The routines have been there since 11/2009 in anticipation of this change, but have been unused until now. They probably have not been reviewed thoroughly.

    The major change to this is the .t file. Now that all code points are understood, the .t tests them all. But this would take too long each time, so it tests a random sample. If there is a failure, the seed is output so that the test can be reproduced. This idea came from Michael Schwern, and is the same he uses in Test::Sims. Various parameters about the sampling are easily adjustable.
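
A sketch of seed-reproducible sampling, under stated assumptions: the environment variable `DEMO_SEED` and the function `draw_sample` are hypothetical, and this is not the code in charnames.t. It relies on srand() returning the seed it chose, which holds from Perl 5.14 on.

```perl
use strict;
use warnings;

# Draw $count random code points after seeding the generator, so the
# same seed always yields the same sample.
sub draw_sample {
    my ($seed, $count) = @_;
    srand($seed);
    return map { int(rand(0x110000)) } 1 .. $count;
}

# Reuse a seed supplied by the user, otherwise let srand() pick one.
my $seed   = $ENV{DEMO_SEED} // srand();
my @sample = draw_sample($seed, 5);

# Reporting the seed is what makes a failing run repeatable.
print "# seed: $seed\n";
print join(" ", map { sprintf("U+%04X", $_) } @sample), "\n";
```

Rerunning with `DEMO_SEED` set to the printed value reproduces the same five code points on the same Perl build.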
* charnames.t: Test that can have string "0x..." | Karl Williamson | 2010-07-13 | 1 | -1/+1
    The form "0x..." is supposed to evaluate as if it weren't a string. Make sure that is tested.
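
A small sketch of the input forms viacode() accepts, assuming an ASCII platform: a plain number, a "0x..." string, and a "U+..." string, all meaning the same code point.

```perl
use strict;
use warnings;
use charnames ();

# The numeric literal 0x41 reaches viacode() as the number 65; the two
# strings are parsed as hexadecimal by viacode() itself.
my @names = map { charnames::viacode($_) } (0x41, "0x41", "U+0041");

print "$_\n" for @names;
```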
* charnames: Change so :short syntax can have spaces | Karl Williamson | 2010-07-13 | 1 | -10/+10
    The syntax for name look ups under :short is 'script:letter'. Allow spaces adjacent to the colon (and while we're at it) at the beginning and end.
* charnames: Fix scoping bugs | Karl Williamson | 2010-07-13 | 1 | -1/+116
    This was done by moving what could to %^H. Because data structures in %^H get stringified at runtime, new serialized entries for them had to be created and then unserialized on each runtime call.

    Also, because %^H is read-only at runtime, some data structures couldn't be moved to it. Things were set up so that these contain only things invariant under scoping, and looked at only when the same scoped options are in effect as when they were created. Further comments at declaration of %full_names_cache.

    I was well into this patch when it dawned on me that it was doing unnecessary tests, so that

        if (! a) { conditionally set a }
        if (! a) {}

    could be implemented more efficiently as

        if (! a) { conditionally set a
            if (! a) {}
        }

    so I changed it, which messes up leading indentation for the diffs.
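
The %^H mechanism mentioned above can be sketched with a tiny lexically scoped pragma. `Demo::Pragma` and its key name are hypothetical and unrelated to the charnames internals; the sketch follows the general pattern from the perlpragma documentation, storing only a simple scalar because richer data structures do not survive in %^H.

```perl
use strict;
use warnings;

package Demo::Pragma;   # hypothetical pragma, for illustration only

# Called at compile time; the entry is scoped to the enclosing block.
sub import   { $^H{"Demo::Pragma/enabled"} = 1 }
sub unimport { $^H{"Demo::Pragma/enabled"} = 0 }

# At runtime the hints in effect at the call site come from caller().
sub in_effect {
    my $hints = (caller(0))[10];
    return $hints && $hints->{"Demo::Pragma/enabled"} ? 1 : 0;
}

package main;

my ($inside, $outside);
{
    # Equivalent to "use Demo::Pragma" for a pragma defined in this file.
    BEGIN { Demo::Pragma->import }
    $inside = Demo::Pragma::in_effect();
}
$outside = Demo::Pragma::in_effect();

print "inside block: $inside, outside block: $outside\n";
```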
* charnames.t update because of rebase | Karl Williamson | 2010-07-04 | 1 | -59/+6
    Use of t/lib/common.pl caused some glitches; some behaviors of the underlying is() functions changed, so the .t was revised to work under this scheme.
* charnames: check for use bytes in vianame; efficiency | Karl Williamson | 2010-07-04 | 1 | -2/+7
    When vianame returns a chr, it now verifies that it is legal under 'use bytes'. Update the .t.

    An instance of taking a substr of a huge string is needed only in an error leg. Move it to that leg for performance. And make the message a subroutine so it will be identical whenever raised.
* Clean up viacode, accept large aliases | Karl Williamson | 2010-07-04 | 1 | -0/+1
    This changes viacode to accept aliases that the user has defined beyond the Unicode range.
* Extend \N{} enhancements to vianame() | Karl Williamson | 2010-07-04 | 1 | -2/+10
    This patch refactors charnames so that vianame and \N call the same common subroutine so that they have as identical behavior as possible.