path: root/lib/charnames.pm
Commit message | Author | Age | Files | Lines
* charnames: pod nit | Karl Williamson | 2011-05-18 | 1 | -3/+3
* Version bumps for non-dual-life pragmas identified by | Jesse Vincent | 2011-01-20 | 1 | -1/+1
  ./perl -Ilib Porting/cmpVERSION.pl -xd . v5.13.8
* Fix typos (spelling errors) in lib/* | (Peter J. Acklam) (via RT) | 2011-01-07 | 1 | -3/+3
  # New Ticket Created by (Peter J. Acklam)
  # Please include the string: [perl #81890]
  # in the subject line of all future correspondence about this issue.
  # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=81890 >
  Signed-off-by: Abigail <abigail@abigail.be>
* Work-around Uni 6.0 issues with 'BELL' | Karl Williamson | 2010-11-18 | 1 | -2/+5
  Unicode version 6.0 has co-opted the name BELL for a different character than the one traditionally meant in Perl. This patch works around that by adding ALERT as a synonym for BELL and raising a deprecation warning for uses of the old name. The new Unicode character will be nameless in Perl 5.14, unless I can get Unicode to grant a synonym that it will support (which is unlikely).
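A minimal sketch of the workaround from a user's point of view, assuming a current perl: the control character U+0007 is spelled \N{ALERT}. On later perls the name BELL refers to the Unicode 6.0 character instead (my understanding is that this switch happened in 5.18), so code that means the control character should not rely on \N{BELL}.

```perl
use strict;
use warnings;
use charnames ':full';

my $alert = "\N{ALERT}";                 # the control character U+0007
printf "ALERT is U+%04X\n", ord $alert;  # prints "ALERT is U+0007"
print "same as \\a\n" if $alert eq "\a";
```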
* Revert "Bump charnames’ version" | Father Chrysostomos | 2010-10-27 | 1 | -1/+1
  This reverts commit fc7c69e2154d208e45cf3f5c98b74ed035d8f50c.
* Bump charnames’ version | Father Chrysostomos | 2010-10-21 | 1 | -1/+1
* charnames.pm: reformat comments | Karl Williamson | 2010-09-25 | 1 | -8/+6
  Now that the code has less indentation, the comments don't need so many lines. The only change in this commit is that several blocks of comments are reflowed to occupy more of each line. No wording changes are involved.
* charnames.pm: indent less to fit in 80 columns | Karl Williamson | 2010-09-25 | 1 | -415/+415
  This patch changes white space only. It lessens the indent of certain lines that were made longer in an earlier commit, and now most of them fit into 80 columns.
* Teach Perl about Unicode named character sequences | Karl Williamson | 2010-09-25 | 1 | -434/+560
  mktables is changed to process the Unicode named sequence file. charnames.pm is changed to cache the looked-up values in utf8. A new function, string_vianame(), is created that can handle named sequences, which the vianame() interface cannot. The subroutine lookup_name() is slightly refactored to do almost all of the work common to \N{} and the vianame routines. It now understands named sequences as created by mktables. Tests and documentation are added. In the randomized testing section, half the tests use vianame() and half use string_vianame().
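A sketch of why a second function was needed, assuming Perl 5.14 or later. vianame() returns a single ordinal, while string_vianame() returns a string and can therefore hold a named sequence, which expands to more than one code point. The sequence name used below is taken from the Unicode named-sequences data as I recall it, so treat it as an assumption.

```perl
use strict;
use warnings;
use charnames ();   # load the functions without importing anything

# vianame() returns an ordinal, so it can only describe a single code point.
my $sigma_ord = charnames::vianame("GREEK SMALL LETTER SIGMA");

# string_vianame() returns a string, so it also works for named sequences,
# which map to more than one code point.
my $seq = charnames::string_vianame("LATIN CAPITAL LETTER A WITH MACRON AND GRAVE");

printf "U+%04X\n", $sigma_ord;                                          # U+03C3
print join(" ", map { sprintf "U+%04X", ord } split //, $seq), "\n";   # U+0100 U+0300
```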
* charnames.pm: Nits in pod | Karl Williamson | 2010-09-25 | 1 | -13/+16
* charnames.pm: Clarify comments | Karl Williamson | 2010-09-25 | 1 | -6/+6
* charnames.pm: Change variable name | Karl Williamson | 2010-09-25 | 1 | -13/+13
  This is an intermediate commit in preparation for handling named sequences.
* charnames: Remove unnecessary \t in Name.pl | Karl Williamson | 2010-09-25 | 1 | -9/+9
  The double \t\t is unnecessary, and so we can remove one of them, shortening the table.
* charnames.pm: Small performance enhancements | Karl Williamson | 2010-09-25 | 1 | -22/+8
  mktables is changed to output 5-digit code points, which means that charnames doesn't have to go looking for the boundaries. This gives a slight performance enhancement.
* use charnames (); fails | Karl Williamson | 2010-08-13 | 1 | -7/+34
  If "use charnames ()" is the only use of this pragma in a program, the pragma fails, because its import method never gets called. This fixes that.
* [perl 71764] Extend charnames to all of Unicode | Karl Williamson | 2010-07-13 | 1 | -22/+45
  This patch causes \N{}, vianame, and viacode to know the names of all Unicode code points. Previously, the names that are determined algorithmically were not handled; these include the Hangul syllables and many CJK characters. The patch simply calls the routines that mktables inserts into Name.pl to handle these characters. mktables generates these algorithms from data in the Unicode database. The routines have been there since 11/2009 in anticipation of this change, but have been unused until now. They probably have not been reviewed thoroughly.
  The major change is to the .t file. Now that all code points are understood, the .t could test them all, but that would take too long on every run, so it tests a random sample instead. If there is a failure, the seed is output so that the test can be reproduced. This idea came from Michael Schwern, and is the same one he uses in Test::Sims. Various parameters of the sampling are easily adjustable.
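The Hangul syllable names are one of the algorithmic cases this commit mentions. The sketch below is my own transcription of the Hangul name-generation rule in Chapter 3 of the Unicode Standard, plus the simple hex-suffix rule for CJK unified ideographs. It is not the routine that mktables writes into Name.pl, and the function names and the single CJK range are illustrative assumptions.

```perl
use strict;
use warnings;

# Hangul syllable name generation (Unicode Standard, Chapter 3).
my $S_BASE  = 0xAC00;
my $V_COUNT = 21;
my $T_COUNT = 28;
my $N_COUNT = $V_COUNT * $T_COUNT;   # 588 syllables per leading consonant
my $S_COUNT = 19 * $N_COUNT;         # 11172 precomposed syllables in all

# Short names of the leading consonants, vowels, and trailing consonants.
my @L = ('G', 'GG', 'N', 'D', 'DD', 'R', 'M', 'B', 'BB', 'S', 'SS', '',
         'J', 'JJ', 'C', 'K', 'T', 'P', 'H');
my @V = qw(A AE YA YAE EO E YEO YE O WA WAE OE YO U WEO WE WI YU EU YI I);
my @T = ('', qw(G GG GS N NJ NH D L LG LM LB LS LT LP LH
                M B BS S SS NG J C K T P H));

# Hypothetical helper: name of a precomposed Hangul syllable, or undef.
sub hangul_syllable_name {
    my ($cp) = @_;
    my $s = $cp - $S_BASE;
    return if $s < 0 || $s >= $S_COUNT;     # not a precomposed Hangul syllable
    my $l = int($s / $N_COUNT);
    my $v = int(($s % $N_COUNT) / $T_COUNT);
    my $t = $s % $T_COUNT;
    return "HANGUL SYLLABLE $L[$l]$V[$v]$T[$t]";
}

# Hypothetical helper, illustrative range only: the original CJK block.
sub cjk_ideograph_name {
    my ($cp) = @_;
    return unless $cp >= 0x4E00 && $cp <= 0x9FFF;
    return sprintf "CJK UNIFIED IDEOGRAPH-%04X", $cp;
}

print hangul_syllable_name(0xAC00), "\n";   # HANGUL SYLLABLE GA
print hangul_syllable_name(0xD7A3), "\n";   # HANGUL SYLLABLE HIH
print cjk_ideograph_name(0x4E00), "\n";     # CJK UNIFIED IDEOGRAPH-4E00
```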
* charnames.pm: More refactoring for performance | Karl Williamson | 2010-07-13 | 1 | -32/+33
  I realized after the last commit that it might be faster to use a trie when a letter could be in any of several scripts, instead of searching the table once per script. When there were 6 possible scripts and the letter was found in the final one, the speed-up was a factor of 5.
  This also simplified things. The list of scripts can be stored as a string like A|B|C instead of a stringified array, and the code just gets simpler.
  Also, the code had complications to keep from zapping the input name, in case it was needed for an error message. But I realized that instead of using shift to get the name, the code can just copy it from $_[0]; on the error leg that needs the original, it is still in $_[0]. If a user-defined alias maps to a character name that must itself be looked up, and that name is invalid, we want to output the invalid name, so a further variable, $save_input, is used to hold it.
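A sketch of the single-match idea, under stated assumptions: the miniature table, its HEX<TAB>NAME layout, and the function name are hypothetical. Interpolating a string like A|B|C into one pattern lets the regex engine try all scripts in one pass; perls from 5.10 on normally compile an alternation of literal words like this into a trie.

```perl
use strict;
use warnings;

# Hypothetical miniature name table, one "HEX<TAB>NAME" record per line.
# The real Name.pl layout differs; this is only for illustration.
my $table = "0391\tGREEK CAPITAL LETTER ALPHA\n"
          . "03B1\tGREEK SMALL LETTER ALPHA\n"
          . "0410\tCYRILLIC CAPITAL LETTER A\n"
          . "05D0\tHEBREW LETTER ALEF\n";

# Look up "<script> CAPITAL LETTER <letter>" for any script in $scripts,
# where $scripts is a string like "LATIN|CYRILLIC|GREEK".
sub capital_letter_in_scripts {
    my ($scripts, $letter) = @_;
    # One match over the whole table instead of one search per script.
    # Note that the first matching record in table order wins.
    return unless $table =~
        /^([0-9A-F]+)\t(?:$scripts) CAPITAL LETTER \Q$letter\E$/m;
    return hex $1;
}

printf "U+%04X\n", capital_letter_in_scripts("LATIN|CYRILLIC|GREEK", "A");  # U+0410
```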
* charnames.pm: refactor so complex re is used once | Karl Williamson | 2010-07-13 | 1 | -24/+27
  The :short option, whose names look like "greek:letter", is just a special case of the option where a list of possible scripts is set up in the pragma call; here, greek is the single script to look up. It also turns out that, contrary to what the prior code assumed, :short is effectively mutually exclusive with checking through that list of scripts. That is, if "greek:letter" didn't match under the :short option, it won't match any script option either, because ':' is not a legal character in a name. So there is no need to execute both, and I refactored the code into an if-then-else.
  Both branches also use the same complicated regex, which I may have to change in future patches, so I refactored the code to share a single copy of it. Finally, I added a goto to eliminate a test.
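A usage sketch of the two lookup styles this commit unifies, assuming a perl recent enough to have these options (5.14 or later should do). As I understand the documented rules, a letter name containing an uppercase character selects the CAPITAL form and an all-lowercase one selects the SMALL form.

```perl
use strict;
use warnings;
use charnames qw(:full :short greek);

# :short form "script:letter"; the case of the letter picks CAPITAL or SMALL.
my $small = "\N{greek:alpha}";   # GREEK SMALL LETTER ALPHA
my $cap   = "\N{greek:Sigma}";   # GREEK CAPITAL LETTER SIGMA

# Script-list form: "greek" was given in the pragma call,
# so a bare letter name is searched for in that script.
my $listed = "\N{alpha}";        # GREEK SMALL LETTER ALPHA

printf "U+%04X U+%04X U+%04X\n", ord $small, ord $cap, ord $listed;
```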
* charnames.pm: Change variable's name | Karl Williamson | 2010-07-13 | 1 | -8/+4
  It makes more sense to me in light of patches coming up.
* charnames.pm: expand tabs | Karl Williamson | 2010-07-13 | 1 | -66/+66
  Working with tabs was getting painful, so expand them to blanks.
* Yet another comment change | Karl Williamson | 2010-07-13 | 1 | -2/+2
* charnames: Change so :short syntax can have spaces | Karl Williamson | 2010-07-13 | 1 | -1/+2
  The syntax for name lookups under :short is 'script:letter'. Allow spaces adjacent to the colon and, while we're at it, at the beginning and end of the name.
* charnames: Fix scoping bugs | Karl Williamson | 2010-07-13 | 1 | -91/+167
  This was done by moving whatever could be moved to %^H. Because data structures in %^H get stringified at runtime, new serialized entries for them had to be created and then unserialized on each runtime call. Also, because %^H is read-only at runtime, some data structures couldn't be moved to it. These were set up so that they contain only things that are invariant under scoping, and are looked at only when the same scoped options are in effect as when they were created. Further comments are at the declaration of %full_names_cache.
  I was well into this patch when it dawned on me that it was doing unnecessary tests, so that

      if (! a) {
          conditionally set a
      }
      if (! a) {}

  could be implemented more efficiently as

      if (! a) {
          conditionally set a
          if (! a) {}
      }

  so I changed it, which messes up the leading indentation in the diffs.
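The mechanism behind this fix is the lexically scoped hints hash %^H, read back at runtime through (caller)[10]. The sketch below is modeled on the example in perlpragma; the package and key names are hypothetical, and a BEGIN block stands in for a separate module file so the example fits in one script. As the commit says, values read back this way are stringified, which is why charnames had to serialize its data structures.

```perl
use strict;
use warnings;

package Demo::Pragma;   # hypothetical pragma

# Compile time: record the option in the lexically scoped hints hash.
sub import   { $^H{'Demo::Pragma/on'} = 1 }
sub unimport { $^H{'Demo::Pragma/on'} = 0 }

# Runtime: inspect the hints that were in effect where our caller was compiled.
sub in_effect {
    my $hints = (caller(0))[10];
    return ($hints && $hints->{'Demo::Pragma/on'}) ? 1 : 0;
}

package main;

my @seen;
{
    BEGIN { Demo::Pragma->import }          # acts like the import half of "use Demo::Pragma;"
    push @seen, Demo::Pragma::in_effect();  # 1: option set in this block
}
push @seen, Demo::Pragma::in_effect();      # 0: option went out of scope
print "@seen\n";                            # prints "1 0"
```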
* charnames.pm: A couple more comments | Karl Williamson | 2010-07-13 | 1 | -2/+2
* charnames.pm: More comment fixes | Karl Williamson | 2010-07-13 | 1 | -5/+3
* charnames: clean up pod | Karl Williamson | 2010-07-13 | 1 | -12/+21
* charnames.pm: return ord not chr | Karl Williamson | 2010-07-13 | 1 | -1/+1
  An error leg in charnames.pm was returning the wrong type: a character instead of an ordinal. This fixes it. A later commit will change the .t to add "use warnings" so that this fix will be noticed.
* charnames.pm clarify comments | Karl Williamson | 2010-07-13 | 1 | -17/+7
* Speed up viacode | Karl Williamson | 2010-07-04 | 1 | -3/+7
  Capturing parentheses greatly slow down regexes, at least here. On my machine, viacode took 27 seconds for the 22K Unicode names without capturing parentheses, and 45 seconds with them.
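A sketch of a lookup that avoids capturing parentheses, assuming a hypothetical miniature table with fixed-width 5-digit hex code points (the format the 2010-09-25 "Small performance enhancements" entry mentions). The match only locates the record; the name is then sliced out using the end offset of the match, which Perl records in @+.

```perl
use strict;
use warnings;

# Hypothetical miniature table: a 5-digit hex code point, a tab, then the
# name, one record per line. (Illustration only, not the real Name.pl.)
my $table = "00041\tLATIN CAPITAL LETTER A\n"
          . "00042\tLATIN CAPITAL LETTER B\n"
          . "003C3\tGREEK SMALL LETTER SIGMA\n";

# Hypothetical helper: code point -> name, or undef if not in the table.
sub name_for_code {
    my ($code) = @_;
    my $hex = sprintf "%05X", $code;    # fixed width, so no boundary search
    # No capturing parentheses: just locate the start of the record.
    return unless $table =~ /^$hex\t/m;
    my $start = $+[0];                  # offset just past the tab
    my $end   = index $table, "\n", $start;
    return substr $table, $start, $end - $start;
}

print name_for_code(0x3C3), "\n";       # GREEK SMALL LETTER SIGMA
```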
* Add vi hint for non-std format of charnames.pm | Karl Williamson | 2010-07-04 | 1 | -0/+2
* More charnames pod updates | Karl Williamson | 2010-07-04 | 1 | -24/+39
* charnames: check for use bytes in vianame; efficiency | Karl Williamson | 2010-07-04 | 1 | -5/+14
  When vianame returns a chr, it now verifies that the result is legal under 'use bytes'. The .t is updated.
  Taking a substr of a huge string is needed only in an error leg, so it is moved to that leg for performance. The message is also made a subroutine, so that it will be identical wherever it is raised.
* Clean up charnames pod, including new changes | Karl Williamson | 2010-07-04 | 1 | -70/+86
  This patch brings the charnames pod up to date, and rewords it to hopefully be more clear.
* Clean up viacode, accept large aliases | Karl Williamson | 2010-07-04 | 1 | -12/+17
  This changes viacode to accept aliases that the user has defined beyond the Unicode range.
* Extend \N{} enhancements to vianame() | Karl Williamson | 2010-07-04 | 1 | -56/+66
  This patch refactors charnames so that vianame and \N call the same common subroutine, so that they behave as identically as possible.
* Bump version; some pod cleanup | Karl Williamson | 2010-07-04 | 1 | -21/+36
* charnames: add CORE:: to hex() | Karl Williamson | 2010-07-04 | 1 | -2/+2
  Other programs do this; I don't know why just hex() needs to be protected from user override, but I'm just copying prior art.
* Abandon plans to change viacode's return of unassigned | Karl Williamson | 2010-07-04 | 1 | -6/+5
  The BUGS section of the charnames pod said that it was a bug to return undef for unassigned characters, whereas the real Unicode name is the empty string. demerphq noted that undef stringifies to the empty string, so we are in fact in compliance with the standard. This clarifies the pod wording and removes the text from the BUGS section.
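A small sketch of the point demerphq raised, assuming U+0378 is still unassigned (it is reserved in current Unicode versions as far as I know): viacode() returns undef, and undef in string context is the empty string.

```perl
use strict;
use warnings;
use charnames ();

my $name = charnames::viacode(0x378);   # U+0378: an unassigned code point
print defined $name ? "name: $name\n" : "undef\n";   # prints "undef"

# In string context undef becomes the empty string, the Unicode "name" of an
# unassigned code point. The warning that stringifying undef would normally
# trigger is silenced here just for the demonstration.
my $as_string = do { no warnings 'uninitialized'; "$name" };
print length($as_string), "\n";   # prints 0
```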
* Allow defining custom charnames to ordinals | Karl Williamson | 2010-07-04 | 1 | -24/+76
  This adds the ability for a user to create a custom alias that maps to a numeric ordinal value instead of an official Unicode name.
  The number of hashes went up, so it is better to refer to them by name than by number, and I renamed them.
  Also, viacode will return any user-defined alias for an otherwise unnamed code point.
  This change is principally so that private use characters can be named, making them more convenient to use in Perl.
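A sketch of the :alias syntax with both kinds of target, assuming Perl 5.14 or later. The alias names are made up for illustration; U+E000 is the first code point of the Basic Multilingual Plane's Private Use Area, the kind of character the commit says this feature is meant for.

```perl
use strict;
use warnings;
use charnames ':full', ':alias' => {
    e_ACUTE        => "LATIN SMALL LETTER E WITH ACUTE",  # alias to an official name
    MY_PRIVATE_ONE => 0xE000,                              # alias to an ordinal
};

my $e  = "\N{e_ACUTE}";
my $pu = "\N{MY_PRIVATE_ONE}";
printf "U+%04X U+%04X\n", ord $e, ord $pu;   # U+00E9 U+E000
```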
* Reword feedback request. | Karl Williamson | 2010-07-04 | 1 | -1/+1
  It's not clear to me what should be done about the problem of vianame being bipolar.
* Add a number of abbrs and variants to \N{} | Karl Williamson | 2010-07-04 | 1 | -60/+479
  This patch adds the standard abbreviations for the control characters (such as ACK, BEL, etc.) to the repertoire that \N{} knows about. It also adds a few common variants of their full names, and the old names for the 4 controls for which Unicode has chosen not to have any names at all.
  The patch also adds all the abbreviations that Unicode 5.2 lists for characters with longer names, such as NBSP, SHY, LRE, ...
  To preserve complete backward compatibility for these and future changes, user-defined aliases are now checked first, before these.
  As a performance enhancement, these aliases are mapped to their actual code values instead of to their full names, which then had to be looked up in the large table. Now that lookup is avoided, and the table is not loaded at all until a name is encountered that is not one of these aliases.
  The pod and .t are updated.
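A sketch showing a few of the abbreviations in use, assuming a perl that includes this change. The code points in the comments are the ones Unicode assigns to these abbreviations.

```perl
use strict;
use warnings;
use charnames ':full';

my %abbrev = (
    ACK  => "\N{ACK}",    # U+0006 ACKNOWLEDGE
    BEL  => "\N{BEL}",    # U+0007 (control-character abbreviation)
    NBSP => "\N{NBSP}",   # U+00A0 NO-BREAK SPACE
    SHY  => "\N{SHY}",    # U+00AD SOFT HYPHEN
    LRE  => "\N{LRE}",    # U+202A LEFT-TO-RIGHT EMBEDDING
);
printf "%-4s U+%04X\n", $_, ord $abbrev{$_} for sort keys %abbrev;
```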
* Remove BUG report from pod that is now fixed | Karl Williamson | 2010-06-28 | 1 | -2/+0
  viacode now works correctly for 0.
* Fix charnames::viacode not accepting U+... param | Karl Williamson | 2010-06-28 | 1 | -1/+1
  The commit e10d7780a27dcfeb9c50ab28b66f2df8763d8016 introduced a bug in which a parameter to viacode of the form U+... no longer worked. This is fixed, and tests are added.
* don't use $[ in library code | Zefram | 2010-04-27 | 1 | -3/+3
  Remove all uses of $[, both reads and writes, from library code. Test code (which must test behaviour of $[) is unchanged, as is the actual implementation of $[. Uses in CPAN libraries are also untouched: I've opened tickets at rt.cpan.org regarding them.
* Bump versions of charnames and Unicode::UCD after last patches | Rafael Garcia-Suarez | 2010-04-25 | 1 | -1/+1
* PATCH [perl #72624] charnames::viacode(0) returns undef | Karl Williamson | 2010-04-25 | 1 | -3/+6
  The viacode() function contained the code from the _getcode() function in Unicode::UCD, unchanged. However, the rest of viacode() requires the result to be specially formatted for a string match, with leading zeros inserted to bring the length up to 4 if it is shorter. The original function only needs to get the number right, because it does a numeric comparison, so it doesn't do this.
  The bug showed up when calling viacode with 0, but it also affected any input that looked like a hex number or a U+ number, such as 'BEE' or 'U+EF'. These need to be massaged into '0BEE' and '00EF' for the later pattern match in the routine to succeed.
  The patch also adds a test case to Unicode::UCD to verify that it really does work on 0.
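A sketch of the normalization this commit describes. The function name is hypothetical, and the parsing rules (decimal if the input starts with a nonzero digit, otherwise hex with an optional 0x, or U+ notation) are my approximation of the _getcode() behavior, not the actual code. The key step is the zero-padding to at least four hex digits, which is what the later string match needs.

```perl
use strict;
use warnings;

# Hypothetical helper: turn viacode-style input into a zero-padded hex key.
sub code_point_key {
    my ($arg) = @_;
    my $cp;
    if ($arg =~ /^[1-9][0-9]*\z/) {                    # decimal number
        $cp = $arg;
    }
    elsif ($arg =~ /^(?:0[xX])?([0-9A-Fa-f]+)\z/) {    # hex, including "0" and "BEE"
        $cp = CORE::hex($1);
    }
    elsif ($arg =~ /^[Uu]\+([0-9A-Fa-f]+)\z/) {        # U+ notation
        $cp = CORE::hex($1);
    }
    else {
        return;                                        # unrecognized input
    }
    return sprintf "%04X", $cp;   # pad to 4 digits: 0 -> "0000", 0xEF -> "00EF"
}

print join(" ", map { code_point_key($_) } 0, "BEE", "U+EF", 65), "\n";
# prints "0000 0BEE 00EF 0041"
```

Note the ambiguity this inherits: a string like "41" is read as decimal, not hex.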
* Update documentation | Karl Williamson | 2010-02-28 | 1 | -1/+4
  List known bugs, mention new meaning of \N
* Update charnames documentations for \N changes, bugs | Karl Williamson | 2010-02-28 | 1 | -6/+26
  \N has a possible new meaning, and mention bug reports filed against charnames
* bump versions of non-dual-life modules that | David Mitchell | 2009-07-03 | 1 | -1/+1
  * differ between 5.10.0 and maint-5.10, or
  * differ between 5.8.9 and maint-5.10
* Update comments and documentation dealing with utf | Karl | 2008-12-26 | 1 | -0/+5