| Commit message (Collapse) | Author | Age | Files | Lines |
|
The Unicode standard considers code points that are not in a block to be
in the pseudo-block 'No_Block', so return that instead of undef.
|
Unassigned code points have the script 'Unknown', not undef.
|
In doing so, a number of bugs were fixed, as it now relies
on files processed by mktables, which has the intelligence to fix a
number of problems with UnicodeData.txt.

This is essentially a rewrite of charinfo(). It previously
hard-coded the ranges in UnicodeData.txt, instead of examining the file
to see what was there. This had not been updated for some time, and was
out-of-date, with the result that the newer ranges (all CJK) were quite
wrong. The new code has no such reliance, so new versions
of Unicode should not break it, as they previously would have.

This may be slower than what was previously there, as it reads several
smaller files instead of one very large one. But the principal reason
to do this work was to save disk space. It was previously thought that
the function could continue to use UnicodeData.txt if it exists on the
machine, but this would have required fixing all the bugs that this
automatically fixes by using the processed files.
|
5.14 subclasses some UTF8 warnings, so that they can be turned off
more precisely.
|
For some reason UCD.pm has lowercased the first letter of every word
after the first in script names. For backwards compatibility, continue
to do so.
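The casing rule described above can be sketched as follows. This is a minimal illustration in Python, not the module's Perl; the function name `legacy_script_name` is hypothetical, and it assumes that the words of a script name are separated by underscores.

```python
def legacy_script_name(name):
    """Lowercase the first letter of every word after the first.

    Assumption: words are separated by underscores, as in 'Old_Italic'.
    """
    first, *rest = name.split("_")
    # Leave the first word alone; lowercase only the initial letter of the others.
    rest = [word[:1].lower() + word[1:] for word in rest]
    return "_".join([first] + rest)


print(legacy_script_name("Old_Italic"))
print(legacy_script_name("Latin"))
```

Single-word names pass through unchanged, so only multi-word script names are affected by the compatibility rule.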
|
This removes the need for Scripts.txt.
|
A new function that reads mktables files has been created. Switch to
use this.
A test is added to make sure it's working right.
|
We decided it was not a good idea to have definitions for these three
code points that are not officially defined as such in the Unicode
standard.
|
This function will return the numeric value of the string passed to it,
and undef if the entire string has no safe numeric value.
To be safe, a string must be a single character which has a numeric
value, or consist entirely of characters that match \d, coming from the
same Unicode block of digits. Thus, a mix of Bengali and Western
digits would be considered unsafe, as well as a mix of half- and
full-width digits.
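The safety rule above can be sketched roughly as follows. This is a Python illustration, not the module's Perl implementation; the name `safe_num` is hypothetical, and "the same Unicode block of digits" is approximated here by requiring every digit to belong to the same run of ten consecutive digit code points.

```python
import unicodedata


def safe_num(text):
    """Return the numeric value of text, or None if it has no safe value."""
    if len(text) == 1:
        # A single character is safe if it has any numeric value at all.
        return unicodedata.numeric(text, None)

    value = 0
    zero_base = None
    for ch in text:
        digit = unicodedata.decimal(ch, None)
        if digit is None:
            return None  # not a decimal digit
        # Code point of the zero digit in this character's run of ten digits
        # (assumption: digits 0-9 of one set are consecutive code points).
        base = ord(ch) - digit
        if zero_base is None:
            zero_base = base
        elif base != zero_base:
            return None  # digits from different sets are mixed
        value = value * 10 + digit
    # An empty string never sets zero_base, so it has no numeric value.
    return value if zero_base is not None else None
```

Under this sketch, a string of Bengali digits yields a value, while a Bengali digit next to a Western one, or a full-width digit next to a half-width one, yields `None`.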
|
Including all @INC setting boilerplate from lib/Tie/ExtraHash.t, which TestInit
now performs.
|
The namedseq function is essentially obsolete, as the core has better
incorporated its abilities. This adds documentation as to the
alternatives.
|
The motivation for this patch was to remove the dependence of UCD on
another Unicode DB .txt file.
But the subroutine that uses it is out-of-date, now that this property,
and an even more convenient one, are accessible from the core. So the
documentation is also updated to educate people.
Instead of using the file, the routine just uses the core's access
method.
|
This changes UCD to not use this file. Instead it takes advantage of
the recent addition of named sequences being accessible through the \N{}
construct. In one case where it returns a hash of all the named
sequences, it uses the same .pl file that \N{} does. My guess is that
this routine's usefulness is now past, as named sequences are now
incorporated into the core.
|
Thank you for your bug report. Change <lower> to <upper> as the report
showed.
Signed-off-by: David Golden <dagolden@cpan.org>
|
The viacode() function contained the code from the _getcode() function from
Unicode::UCD, unchanged. However, the rest of viacode() requires that
the result be specially formatted to do a string match with leading
zeros inserted to bring the length up to 4 if less than that. The
original function only needs to get the number right, as a numerical
comparison is done, so it doesn't do this. This showed up with calling
viacode with 0, but the bug also affected any input that looked like a
hex number, or a U+ number, such as 'BEE' or 'U+EF'. These need to be
massaged into '0BEE' and '00EF' for the pattern match later in the
routine to succeed.
The patch also adds a test case to Unicode::UCD to verify that it really
does work ok on 0.
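The normalization this fix needs can be sketched as follows. This is a Python illustration, not the actual Perl code in viacode(); the function name `normalize_code` is hypothetical, and the rule that input consisting only of decimal digits with no leading zero is read as a decimal number is an assumption made for the sketch.

```python
import re


def normalize_code(arg):
    """Return the code point as uppercase hex, zero-padded to at least 4 digits."""
    s = str(arg)
    if re.fullmatch(r"[1-9][0-9]*", s):
        # Assumption: plain decimal digits (no leading zero) are a decimal number.
        value = int(s)
    else:
        # Otherwise accept hex digits with an optional 'U+' or '0x' prefix.
        m = re.fullmatch(r"(?:[Uu]\+|0[xX])?([0-9A-Fa-f]+)", s)
        if m is None:
            return None
        value = int(m.group(1), 16)
    # Pad with leading zeros so that a later string match can succeed.
    return format(value, "04X")


print(normalize_code(0))
print(normalize_code("BEE"))
print(normalize_code("U+EF"))
```

A numeric comparison would accept the value without padding, but a string match against four-or-more-digit hex keys needs the leading zeros.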
|
From 47005e45e9738044f28ea250c17120bfa04a09b1 Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@khw-desktop.(none)>
Date: Fri, 26 Jun 2009 12:11:05 -0600
Subject: [PATCH] Small documentation change
Signed-off-by: H.Merijn Brand <h.m.brand@xs4all.nl>
|
in 20090108160007.GA85010@penkwe.pair.com.
|
Message-ID: <493745CA.6070300@khwilliamson.com>
And bump version to 0.27
p4raw-id: //depot/perl@35036
|
Message-Id: <20080831093545.A15C4120011@rserv16.sitepush.net>
p4raw-id: //depot/perl@34867
|
p4raw-id: //depot/perl@33649
|
p4raw-id: //depot/perl@33648
|
p4raw-id: //depot/perl@32877
|
Message-Id: <200705180045.l4I0jTeI221780@kosh.hut.fi>
p4raw-id: //depot/perl@31237
|
to Unicode::Collate
p4raw-id: //depot/perl@30957
|
Message-ID: <44FDC219.8010006@iki.fi>
p4raw-id: //depot/perl@28788
|
p4raw-id: //depot/perl@26804
|
p4raw-id: //depot/perl@25756
|
From: "Piotr Fusik" <pfusik@op.pl>
Message-ID: <001401c595bd$dccb5d80$0bd34dd5@piec>
p4raw-id: //depot/perl@25261
|
p4raw-id: //depot/perl@24978
|
p4raw-id: //depot/perl@24426
|
Message-ID: <424E584D.5000508@iki.fi>
Date: Sat, 02 Apr 2005 11:31:09 +0300
p4raw-id: //depot/perl@24134
|