path: root/lib/Unicode
Commit message | Author | Age | Files | Lines
* UCD.pm: All code points are in some block | Karl Williamson | 2011-03-03 | 2 | -8/+8
    Code points that are not in a block are considered to be in the pseudo-block 'No_Block' by the Unicode standard; so change to do that instead of 'undef'
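The fallback this commit describes can be sketched as a range lookup that returns 'No_Block' instead of an undefined value. This is a minimal illustration, not the UCD.pm code: `BLOCKS` is a hypothetical, heavily abridged table and `block_of` is an assumed name.

```python
from bisect import bisect_right

# Hypothetical, heavily abridged block table: (first, last, name).
# The full list of ranges lives in the Unicode Character Database.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0370, 0x03FF, "Greek and Coptic"),
]
STARTS = [first for first, _, _ in BLOCKS]


def block_of(code_point):
    """Return the block name, or 'No_Block' when no range in the table matches."""
    # Find the last range whose start is <= code_point.
    i = bisect_right(STARTS, code_point) - 1
    if i >= 0:
        first, last, name = BLOCKS[i]
        if code_point <= last:
            return name
    # Never return None: every code point gets some answer.
    return "No_Block"


print(block_of(0x41))    # covered by the "Basic Latin" range
print(block_of(0x2FE0))  # not covered by this abridged table
```

Because the table is abridged, many code points that do have a real block also fall through to 'No_Block' here; only the fallback behaviour is the point of the sketch.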
* UCD.pm: All code points have a script | Karl Williamson | 2011-03-03 | 2 | -2/+2
    Unassigned code points have the script 'Unknown'; not undef
* UCD.pm: Nits in pod | Karl Williamson | 2011-03-03 | 1 | -5/+4
* UCD.pm: Fix typos in pod | Karl Williamson | 2011-03-03 | 1 | -3/+3
* UCD.pm: Remove reliance on UnicodeData.txt | Karl Williamson | 2011-03-03 | 2 | -127/+147
    In doing so, there were a number of bug fixes made, as it now relies on files processed by mktables, which has intelligence to fix a number of problems with UnicodeData.txt.

    This is essentially a rewrite of charinfo(). It previously had hard-coded the ranges in UnicodeData.txt, instead of examining the file to see what was there. This had not been updated for some time, and was out-of-date, with the result that the newer ranges (all CJK) were quite wrong. The new code does not have such reliance, and so new versions of Unicode should not break this, like they previously would.

    This may be slower than what was previously there, as it reads several smaller files instead of one very large one. But the principal reason to do this work was to save disk space.

    It was previously thought that the function could continue to use UnicodeData.txt if it exists on the machine, but this would have required fixing all the bugs that this automatically fixes by using the processed files.
* UCD.pm: Use subclassed warnings | Karl Williamson | 2011-03-03 | 1 | -1/+2
    5.14 subclasses some UTF8 warnings, so that they can be turned off more precisely.
* UCD.pm: Use traditional casing for script names | Karl Williamson | 2011-03-03 | 1 | -0/+1
    For some reason UCD.pm has lowercased the first letters of the non-first word in script names. For backwards compatibility, continue to do so.
* UCD.t: Add test for non-Unicode code point | Karl Williamson | 2011-03-03 | 1 | -1/+3
* UCD.pm: remove no longer used variable | Karl Williamson | 2011-03-03 | 1 | -1/+0
* UCD.t: Fix a test description | Karl Williamson | 2011-03-03 | 1 | -1/+1
* UCD.pm: Nits in pod and comment | Karl Williamson | 2011-03-03 | 1 | -3/+12
* UCD.pm: nits in comments and pod | Karl Williamson | 2011-03-02 | 1 | -7/+5
* UCD.pm: Convert charscript to use mktables tables | Karl Williamson | 2011-03-01 | 2 | -27/+13
    This removes the need for Scripts.txt
* UCD.pm: Bump version | Karl Williamson | 2011-03-01 | 1 | -1/+1
* UCD.pm: Convert num() to use new fcn | Karl Williamson | 2011-03-01 | 2 | -13/+19
    A new function that reads mktables files has been created. Switch to use this. A test is added to make sure it's working right
* UCD.pm: Add internal fcn for reading mktables file | Karl Williamson | 2011-03-01 | 1 | -0/+33
* Unicode::UCD::num(): Remove definitions for irrationals | Karl Williamson | 2011-02-17 | 1 | -11/+21
    We decided it was not a good idea to have definitions for these three code points that are not officially defined as such in the Unicode standard.
* Unicode::UCD::num() clarify pod text | Karl Williamson | 2011-02-17 | 1 | -3/+3
* Add UCD::num() to get safe numeric value of a string | Karl Williamson | 2011-02-15 | 2 | -3/+128
    This function will return the numeric value of the string passed it, and undef if the entire string has no safe numeric value. To be safe, a string must be a single character which has a numeric value, or consist entirely of characters that match \d, coming from the same Unicode block of digits. Thus, a mix of Bengali and Western digits would be considered unsafe, as well as a mix of half- and full-width digits.
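The safety rule in this message can be sketched roughly as follows. This is an illustration only, not the Perl implementation: `safe_num` is an assumed name, and "the same Unicode block of digits" is approximated here as "the same run of ten consecutive decimal digits", checked by comparing the code point each character's zero would have.

```python
import unicodedata


def safe_num(text):
    """Return the numeric value of text, or None if it has no safe value."""
    if not text:
        return None
    if len(text) == 1:
        # A single character is safe if it has any numeric value at all.
        return unicodedata.numeric(text, None)
    zero = None   # code point of the zero digit for the first character's run
    value = 0
    for ch in text:
        digit = unicodedata.decimal(ch, None)
        if digit is None:
            return None          # not a decimal digit
        ch_zero = ord(ch) - digit
        if zero is None:
            zero = ch_zero
        elif ch_zero != zero:
            return None          # digits drawn from different runs
        value = value * 10 + digit
    return value


print(safe_num("123"))            # Western digits only
print(safe_num("\u09e7\u09e8"))   # Bengali digits only
print(safe_num("1\u09e8"))        # Western and Bengali mixed
print(safe_num("\uff11" + "2"))   # full-width and half-width mixed
```

The two mixed inputs are the unsafe cases the message names; the sketch rejects them because the implied zero code points differ.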
* Remove Mac OS classic code from tests in lib. | Nicholas Clark | 2011-01-18 | 1 | -1/+0
    Including all @INC setting boilerplate from lib/Tie/ExtraHash.t, which TestInit now performs.
* Correct test count in UCD.t | Father Chrysostomos | 2010-11-20 | 1 | -1/+1
* Increase Unicode'UCD::s version | Father Chrysostomos | 2010-11-20 | 1 | -1/+1
* UCD.pm: Add info about named sequence alternatives | Karl Williamson | 2010-11-20 | 1 | -0/+6
    The namedseq function is essentially obsolete, as the core has better incorporated its abilities. This adds documentation as to the alternatives.
* UCD.pm: Don't use CompositionExclusions.txt | Karl Williamson | 2010-11-20 | 2 | -28/+26
    The motivation for this patch was to remove dependence of UCD on another Unicode DB .txt file. But the subroutine that uses it is out-of-date, now that this property, and an even more convenient one, are accessible from the core. So the documentation is also updated to educate people. Instead of using the file, the routine just uses the core's access method.
* UCD.pm: Don't use NamedSequences.txt, saves disk | Karl Williamson | 2010-11-20 | 1 | -8/+22
    This changes UCD to not use this file. Instead it takes advantage of the recent addition of named sequences being accessible through the \N{} construct. In one case where it returns a hash of all the named sequences, it uses the same .pl file that \N{} does. My guess is that this routine's usefulness is now past, as named sequences are now incorporated into the core.
* Unicode 6.0 DB | Karl Williamson | 2010-11-18 | 1 | -2/+2
* Bump module version numbers | David Golden | 2010-07-19 | 1 | -1/+1
* PATCH: [perl #76502] Fix UCD.pm doc | Karl Williamson | 2010-07-14 | 1 | -1/+1
    Thank you for your bug report. Change <lower> to <upper> as the report showed.

    Signed-off-by: David Golden <dagolden@cpan.org>
* Bump versions of charnames and Unicode::UCD after last patches | Rafael Garcia-Suarez | 2010-04-25 | 1 | -1/+1
* Adapt plan after last patch | Rafael Garcia-Suarez | 2010-04-25 | 1 | -1/+1
* PATCH [perl #72624] charnames::viacode(0) returns undef | Karl Williamson | 2010-04-25 | 2 | -1/+21
    The viacode() function contained the code from the _getcode() function from Unicode::UCD, unchanged. However, the rest of viacode() requires that the result be specially formatted to do a string match with leading zeros inserted to bring the length up to 4 if less than that. The original function only needs to get the number right, as a numerical comparison is done, so it doesn't do this.

    This showed up with calling viacode with 0, but the bug also affected any input that looked like a hex number, or a U+ number, such as 'BEE' or 'U+EF'. These need to be massaged into '0BEE' and '00EF' for the pattern match later in the routine to succeed.

    The patch also adds a test case to Unicode::UCD to verify that it really does work ok on 0.
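The normalization this message describes (zero-padding the hex form to at least four digits so a later string match succeeds) might look like the sketch below. It is not the code from charnames or Unicode::UCD: `format_code_point` is an assumed name, and the parsing rule (integers taken as numbers, strings taken as hex with an optional `U+` or `0x` prefix) is a simplifying assumption.

```python
import re


def format_code_point(arg):
    """Return the code point as upper-case hex, zero-padded to at least 4 digits.

    Returns None when the argument does not look like a code point.
    """
    if isinstance(arg, int):
        value = arg
    else:
        # Assumed rule: strings are hex, optionally prefixed with U+ or 0x.
        m = re.fullmatch(r"(?:[Uu]\+|0[xX])?([0-9A-Fa-f]+)", str(arg))
        if m is None:
            return None
        value = int(m.group(1), 16)
    # A purely numeric result would lose the padding needed for string matching.
    return f"{value:04X}"


print(format_code_point(0))       # → 0000
print(format_code_point("BEE"))   # → 0BEE
print(format_code_point("U+EF"))  # → 00EF
```

Code points above U+FFFF keep their natural width, since the format pads to a minimum of four digits rather than truncating.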
* Unicode 5.2 | Karl Williamson | 2009-12-03 | 1 | -1/+1
* Move Unicode::Collate from lib to ext. | Nicholas Clark | 2009-09-13 | 22 | -24152/+0
* PATCH small documentation change for UCD.pm | karl williamson | 2009-06-26 | 1 | -1/+5
    From 47005e45e9738044f28ea250c17120bfa04a09b1 Mon Sep 17 00:00:00 2001
    From: Karl Williamson <khw@khw-desktop.(none)>
    Date: Fri, 26 Jun 2009 12:11:05 -0600
    Subject: [PATCH] Small documentation change

    Signed-off-by: H.Merijn Brand <h.m.brand@xs4all.nl>
* Change documentation for UCD::casespec() to match reality | karl williamson | 2009-01-15 | 1 | -12/+32
* Fixed spelling of 'uncondtional', as reported by Ronald J Kimball | Abigail | 2009-01-08 | 1 | -1/+1
    in 20090108160007.GA85010@penkwe.pair.com.
* PATCH [perl #58430] Unicode::UCD::casefold() does not work as documented, | karl williamson | 2008-12-06 | 2 | -193/+516
    Message-ID: <493745CA.6070300@khwilliamson.com>
    And bump version to 0.27
    p4raw-id: //depot/perl@35036
* [perl #58428][PATCH] Unicode::UCD::charinfo() does not work on 21 Han codepoints | Renee Baecker | 2008-11-17 | 1 | -4/+32
    Message-Id: <20080831093545.A15C4120011@rserv16.sitepush.net>
    p4raw-id: //depot/perl@34867
* Missed updated a test description, as spotted by vincent. | Nicholas Clark | 2008-04-06 | 1 | -1/+1
    p4raw-id: //depot/perl@33649
* UCD 5.1.0 | Nicholas Clark | 2008-04-05 | 1 | -2/+2
    p4raw-id: //depot/perl@33648
* Fix "grep in void context" warning | Rafael Garcia-Suarez | 2008-01-06 | 1 | -1/+1
    p4raw-id: //depot/perl@32877
* Unicode::UCD: add general category and bidi type interfaces | Jarkko Hietaniemi | 2007-05-18 | 2 | -7/+135
    Message-Id: <200705180045.l4I0jTeI221780@kosh.hut.fi>
    p4raw-id: //depot/perl@31237
* Add the Default Unicode Collation Element Table for UCD 5.0.0 | Rafael Garcia-Suarez | 2007-04-15 | 1 | -0/+18191
    to Unicode::Collate
    p4raw-id: //depot/perl@30957
* UCD 5.0.0 | Jarkko Hietaniemi | 2006-09-06 | 1 | -2/+2
    Message-ID: <44FDC219.8010006@iki.fi>
    p4raw-id: //depot/perl@28788
* Bump $VERSION in many modules that have changed. | Nicholas Clark | 2006-01-12 | 1 | -1/+1
    p4raw-id: //depot/perl@26804
* Upgrade to Unicode-Collate-0.52 | Steve Peters | 2005-10-14 | 4 | -28/+67
    p4raw-id: //depot/perl@25756
* Typos in *.p[lm] | Piotr Fusik | 2005-08-02 | 1 | -2/+2
    From: "Piotr Fusik" <pfusik@op.pl>
    Message-ID: <001401c595bd$dccb5d80$0bd34dd5@piec>
    p4raw-id: //depot/perl@25261
* Upgrade to Unicode::Collate 0.51 | Rafael Garcia-Suarez | 2005-06-24 | 3 | -26/+70
    p4raw-id: //depot/perl@24978
* Upgrade to Unicode::Collate 0.50 | Rafael Garcia-Suarez | 2005-05-09 | 20 | -455/+736
    p4raw-id: //depot/perl@24426
* Unicode 4.1.0 | Jarkko Hietaniemi | 2005-04-02 | 2 | -7/+85
    Message-ID: <424E584D.5000508@iki.fi>
    Date: Sat, 02 Apr 2005 11:31:09 +0300
    p4raw-id: //depot/perl@24134