path: root/lib/Unicode
Commit message | Author | Age | Files | Lines
* Change mktables output for some tables to use hex | Karl Williamson | 2013-10-16 | 2 | -6/+13
| This makes all the tables in the lib/unicore/To directory that map from code point to code point be formatted so that the mapped-to code point is expressed as hexadecimal. This allows for uniform treatment of these tables in utf8.c, and removes the final use of strtol() in the (non-CPAN) core. strtol() should be avoided because it is subject to locale rules, and some older libc implementations have been buggy. It was used because Perl doesn't have an efficient way of parsing a decimal number and advancing the parse pointer to beyond it; we do have such a method for hex numbers. The input to mktables published by Unicode is also in hex, so this now conforms to that convention. This also will facilitate the new work currently being done to read in the tables that find the closing bracket given an opening one.
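A minimal sketch of the idea in the commit above, written in Python purely for illustration (the real parser is C code in utf8.c and is not shown here). It assumes a hypothetical two-column, tab-separated line layout; the actual layout that mktables emits may differ. The point is only that when both fields are hexadecimal, one hex parser handles every field uniformly.

```python
def parse_mapping_line(line):
    """Parse one 'FROM<TAB>TO' line; both fields are hexadecimal code points.

    The tab-separated layout is an assumption made for this sketch.
    """
    from_field, to_field = line.rstrip("\n").split("\t")
    return int(from_field, 16), int(to_field, 16)


def parse_table(text):
    """Return {code point: mapped-to code point}, skipping blanks and # comments."""
    mapping = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        src, dst = parse_mapping_line(line)
        mapping[src] = dst
    return mapping


# Hypothetical excerpt: LATIN CAPITAL LETTER A/B mapped to their lowercase forms.
sample = "# hypothetical excerpt\n0041\t0061\n0042\t0062\n"
table = parse_table(sample)
```

Here `table` maps 0x41 to 0x61 and 0x42 to 0x62; no decimal parsing is involved at any step.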
* lib/Unicode/UCD.t: Do tests in deterministic order | Karl Williamson | 2013-10-16 | 1 | -5/+5
| I needed this in order to compare successive runs of this test
* Bump Unicode version | Karl Williamson | 2013-10-04 | 1 | -1/+1
| In commit a9c9e371c40cf388593577cf577494e91793f62a, I forgot to update the Unicode version in the file that states it.
* Unicode::UCD: Work on non-ASCII platforms | Karl Williamson | 2013-08-29 | 1 | -33/+83
| Now that mktables generates native tables, it is a fairly simple matter to get Unicode::UCD to work on those platforms.
* typo fixes for Unicode UCD | David Steinbrunner | 2013-05-26 | 1 | -7/+7
* Make Unicode::UCD::search_invlist() available | Karl Williamson | 2013-05-22 | 2 | -11/+63
| This commit documents this function, removing the initial underscore from its name. (And it hardens input checking.)
* Unicode::UCD Clarifications in pod | Karl Williamson | 2013-05-22 | 1 | -7/+8
| There are no "missing" values in inversion maps; there is a default value returned for each one. So change the example variables' names. Plus another sentence rewording for clarity.
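The point about inversion maps having no "missing" values can be illustrated with a small sketch, in Python for illustration only. The data, the names `starts`, `values`, and `lookup`, and the "other" default are all hypothetical; the sketch assumes the list of range starts begins at code point 0, so that every code point falls into some range and therefore always has a value.

```python
from bisect import bisect_right

# Hypothetical inversion map. Range i covers code points
# starts[i] .. starts[i + 1] - 1; the last range is open-ended.
starts = [0x00, 0x41, 0x5B, 0x61, 0x7B]
# One value per range. "other" plays the role of the default value that
# fills every range not given a more specific value.
values = ["other", "upper", "other", "lower", "other"]


def lookup(code_point):
    """Return the value of the range containing code_point."""
    # bisect_right gives the count of range starts <= code_point,
    # so subtracting 1 yields the index of the containing range.
    index = bisect_right(starts, code_point) - 1
    return values[index]
```

Because the first range starts at 0, `lookup` never has to report "no value": a code point outside the interesting ranges simply gets the default.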
* Unicode::UCD: Move function in file. | Karl Williamson | 2013-05-22 | 1 | -59/+59
| This is in preparation for making this function public, and it should be listed in the pod later than it otherwise would be.
* Unicode::UCD: Correct wrong pod info | Karl Williamson | 2013-05-22 | 1 | -15/+36
| This was erroneous. Extra clarifications are also added.
* Unicode/UCD.pm: Fix undef bug | Karl Williamson | 2013-02-25 | 1 | -2/+2
| This only happens should Perl be compiled on the very first Unicode release, which is extremely unlikely, but fix it anyway.
* Unicode::UCD: Add examples to pod | Karl Williamson | 2013-02-16 | 1 | -3/+11
* lib/Unicode/UCD.pm: Clarify pod | Karl Williamson | 2013-02-15 | 1 | -2/+3
* Fix various minor pod issues | Karl Williamson | 2013-01-24 | 1 | -3/+3
| These were all uncovered by the new Pod::Checker, not yet in core. Fixing these will speed up debugging the new Checker.
* Unicode::UCD.pm: Fix bugs in undocumented binary search function | Karl Williamson | 2012-11-19 | 1 | -3/+7
| This function is undocumented mostly because I was afraid it would be buggy, as many such implementations are. See: http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html (I recommend reading that link; it is instructive, entertaining, and humbling.) And it turns out I was right that I was wrong (in my original code). A test was inexplicably reversed, and another missing.
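For readers unfamiliar with the kind of search involved, here is a minimal sketch of a binary search over an inversion list, in Python for illustration. It is not the Perl code from this commit; the function names, the sample list, and the return convention (index of the containing range, or `None` when the code point precedes the first element) are assumptions of the sketch.

```python
def invlist_index(invlist, code_point):
    """Return i such that invlist[i] <= code_point < invlist[i + 1].

    Returns None if the list is empty or code_point precedes invlist[0].
    The final range is treated as open-ended.
    """
    if not invlist or code_point < invlist[0]:
        return None
    low, high = 0, len(invlist) - 1
    # Invariant: invlist[low] <= code_point, and the answer lies in [low, high].
    while low < high:
        # Upper midpoint, so the 'low = mid' branch always makes progress.
        # Written as low + ... rather than (low + high) // 2, the form that
        # cannot overflow in languages with fixed-width integers.
        mid = low + (high - low + 1) // 2
        if invlist[mid] <= code_point:
            low = mid
        else:
            high = mid - 1
    return low


def in_set(invlist, code_point):
    """Even-indexed ranges are inside the set; odd-indexed ranges are outside."""
    index = invlist_index(invlist, code_point)
    return index is not None and index % 2 == 0


# Hypothetical inversion list for the ASCII letters: A-Z and a-z.
ascii_letters = [0x41, 0x5B, 0x61, 0x7B]
```

With this list, `in_set(ascii_letters, 0x41)` is true and `in_set(ascii_letters, 0x60)` is false. The comparison direction and the boundary checks are exactly the kind of detail that is easy to get wrong.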
* Unicode/UCD.pm: Clarify pod | Karl Williamson | 2012-09-13 | 1 | -2/+3
* Use new Unicode 6.2 beta | Karl Williamson | 2012-08-26 | 1 | -1/+1
| These supposedly are the final data files for 6.2. Earlier changes originally proposed for 6.2 have been deferred until a later release. Thus there is no change in the general category of ASCII characters in these files from what they were in 6.1 and earlier, unlike what had been proposed. Unlike the previous experimental beta, code is now in place in Perl to handle the revised definition of \X in 6.2. The current working draft of that definition is at http://unicode.org/draft/reports/tr29/tr29.html
* Revert "Experimentally Use Unicode 6.2 beta" | Karl Williamson | 2012-08-26 | 1 | -1/+1
| This reverts commit 5435c3759c4567a1bb51384f6641c04822ec6391. A new beta has been released, and so we should use that instead.
* Unicode::UCD::prop_invlist() Allow to return internal property | Karl Williamson | 2012-08-02 | 1 | -3/+7
| This creates an optional undocumented parameter to this function to allow it to return the inversion list of an internal-only Perl property. This will be used by other functions in Perl, but should not be documented, as we don't want to encourage the use of internal-only properties, which are subject to change or removal without notice.
* Unicode::UCD: typo and incorrect recipe in pod | Karl Williamson | 2012-06-09 | 1 | -2/+2
| An extra word is removed, and the recipe should not eval lines with NaN in them and expect the answer to be NaN
* Experimentally Use Unicode 6.2 beta | Karl Williamson | 2012-06-08 | 1 | -1/+1
| Unicode 6.2 is proposing some changes that may very well break some CPAN modules. The timing of this nicely coincides with Perl's being early in the release cycle. This commit takes the current beta 6.2, adds the proposed changes that aren't yet in it, and subtracts the changes that would affect \X processing, as those turn out to have errors, and may have to be rethought. Unicode has been notified of these problems. This commit is to gather data as to whether or not the proposed changes cause us problems. These will be presented to Unicode to aid in their final decision as to whether or not to go forward with the changes. This commit should be reverted at some point, and the final 6.2 used instead.
* UCD.t: Cope with $/ being set. | Karl Williamson | 2012-06-07 | 1 | -2/+20
| These tests fail in some earlier Unicode versions because $/ is being set. Reset it to $/ around the reads and chomps this .t does.
* Revert "UCD.t: Don't use BEL for $/" | Karl Williamson | 2012-06-07 | 1 | -0/+2
| This reverts commit 9a8e4a54f8b3668a7ebd8f229cdfb405a1dce77c. It turns out that the reason this was there was to add a stress test. Outside setting of $/ should not break Unicode::UCD
* Unicode::UCD: Cope with early Unicodes for casespec() | Karl Williamson | 2012-06-02 | 1 | -2/+12
* UCD.t: Allow to test earlier Unicodes | Karl Williamson | 2012-06-02 | 1 | -7/+10
| In Unicode 6.1, the only property that is stored in hex format that wasn't handled elsewhere is the bmg property, but earlier Unicodes had some of the Unihan (if they are being compiled) ones stored that way too. So make it more general.
* UCD.pm: Fix grammar in comment | Karl Williamson | 2012-06-02 | 1 | -1/+1
* Add all_casefolds() | Karl Williamson | 2012-06-02 | 1 | -1/+53
| This function returns the entire structure that casefold() builds. It is useful for a .t.
* Unicode::UCD: Allow some fncs to work under minitest | Karl Williamson | 2012-06-02 | 1 | -12/+41
| Some of the functions defined in this module are needed for minitest, where dclone is not available. This defines and uses a substitute dclone when Storable::dclone is not available. It also conditionally loads Unicode::Normalize. The function that uses that module is not executed in minitest.
* Unicode::UCD::casefold(): Don't use .txt file for source | Karl Williamson | 2012-06-02 | 1 | -45/+79
| This converts this function to using the outputs of prop_invmap() to get its casefolding definitions. This allows it to work on versions of Unicode which don't have this file, allows the file to not have to be installed, and removes this function from having to be different on EBCDIC platforms (which wasn't coded anyway).
* UCD.t: Don't use BEL for $/ | Karl Williamson | 2012-06-02 | 1 | -2/+0
| This causes failures on early Unicode releases, and is not necessary
* UCD.t: Skip PropValueAliases tests on early Unicodes | Karl Williamson | 2012-06-02 | 1 | -0/+4
* UCD.t: Skip tests for PropertyAlias on early Unicodes | Karl Williamson | 2012-06-02 | 1 | -0/+4
* UCD.t: Use v-string for easier version comparison | Karl Williamson | 2012-06-02 | 1 | -4/+4
* UCD.t: white-space only | Karl Williamson | 2012-06-02 | 1 | -5/+5
| Indent because a previous commit surrounded this with an 'if'
* Unicode::UCD: Fix blocks to work on early Unicodes | Karl Williamson | 2012-06-02 | 1 | -2/+9
| Not all Unicode releases supported blocks
* Unicode::UCD: Fix to work on Unicodes without script property | Karl Williamson | 2012-06-02 | 2 | -2/+13
* Unicode::UCD::compexcl(): Fix to work on early Unicodes | Karl Williamson | 2012-06-02 | 1 | -1/+6
* Unicode::UCD::charinfo(): Fix to handle decomps in early Unicode releases | Karl Williamson | 2012-06-02 | 1 | -1/+2
| There are no hangul syllables in early releases.
* Unicode::UCD::prop_invmap(): Fix so handles dm in earlier Unicodes | Karl Williamson | 2012-06-02 | 1 | -3/+12
| Some versions of Unicode did not have hangul syllables; and there is a bug in handling them that doesn't show up in the latest versions.
* Unicode::UCD::prop_invmap: Fix so works on very early Unicode | Karl Williamson | 2012-06-02 | 1 | -1/+1
| Some versions of Unicode don't have the AHex property. Instead use [:xdigit:] which is defined in all versions.
* Unicode::UCD::prop_invmap(): Fix to work on early Unicodes | Karl Williamson | 2012-06-02 | 1 | -2/+6
| The scf property was originally known as the sfc property. This handles both possibilities.
* Unicode::UCD::num(): Fix so works on early Unicode releases | Karl Williamson | 2012-06-02 | 1 | -10/+29
| This has to do extra work for releases prior to 6.0.
* Unicode::UCD::charinfo(): get ISO comment for earlier Unicodes | Karl Williamson | 2012-06-02 | 1 | -3/+11
| This field had meaning in earlier Unicode versions.
* Unicode::UCD: Store v-string Unicode version. | Karl Williamson | 2012-06-02 | 1 | -1/+3
| This value will be used in future commits to make version comparisons easier.
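A short sketch of why a structured version value makes comparisons easier than a plain version string. It is written in Python for illustration, with tuples standing in for Perl v-strings; the helper name `parse_version` and the version numbers used are hypothetical and do not come from the module.

```python
def parse_version(text):
    """Turn a dotted version such as '6.1.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


unicode_version = parse_version("6.1.0")

# Component-wise comparison gives the intended ordering.
supports_feature = unicode_version >= (3, 1, 0)  # hypothetical threshold

# Comparing the raw strings does not: '1' sorts before '6', so the string
# "10.0.0" compares as older than "6.1.0".
string_order_is_wrong = "10.0.0" < "6.1.0"
tuple_order_is_right = parse_version("10.0.0") > parse_version("6.1.0")
```

Storing the parsed form once, as the commit above does for the Unicode version, means each later check is a single comparison rather than ad hoc string handling.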
* mktables: Handle typo in Unicode 6.1 data file | Karl Williamson | 2012-05-23 | 1 | -0/+3
| Unicode has published a correction to their data files for version 6.1. This patch applies that correction.
* Unicode::UCD.pm: Bump version | Karl Williamson | 2012-04-04 | 1 | -1/+1
* Unicode::UCD::prop_invmap(): Return 's' not 'i' format | Karl Williamson | 2012-04-04 | 2 | -1/+7
| The 'i' is an earlier name, and I overlooked changing it when the other formats were changed. In Unicode 6.1, the only property that is affected is Bmg.
* Unicode::UCD::prop_invmap: Fix returned format | Karl Williamson | 2012-03-19 | 1 | -1/+1
| The type of an 'a' table should not be changed to 's'. This bug happened currently only if someone changed mktables to output one of the optional files.
* Unicode::UCD: typos in error messages | Karl Williamson | 2012-03-19 | 1 | -4/+4
| These concatenated the package name with the beginning of the text with no intervening punctuation. Add also the function within the package
* Unicode::UCD: pod clarifications, corrections | Karl Williamson | 2012-03-16 | 1 | -9/+46
* UCD.t: white-space only | Karl Williamson | 2012-02-10 | 1 | -13/+13
| This outdents some statements that are no longer enclosed in a block