| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
| |
At some point enough files were installed that it was possible to
rebuild perl's Unicode databases outside the source tree. This is no
longer possible. (171f12bc in 2003 seems to have stopped installing
Makefiles under lib/ so this doc is very outdated.)
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
If the string which contains the name of a user-defined character property
function is tainted, then die rather than calling that function.
See [perl #82616].
|
|
|
|
| |
In fact the code is such that changing an A to a control does work
|
|
|
|
|
|
|
|
|
|
| |
This is for security as well as performance. It allows Unicode properties to
be matched case-insensitively. As a result, the swash inversion hash is
converted from having utf8 keys to having numeric (code point) keys.
It also fixes, for the first time, the bug where /i doesn't work when a code
point that is not at the end of a range in a bracketed character class has a
multi-character fold.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
| |
# New Ticket Created by (Peter J. Acklam)
# Please include the string: [perl #81906]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=81906 >
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
| |
This patch adds two functions for setting the ANYOF node bitmaps. The
one for dealing with folds knows what to do if Unicode
semantics are in effect.
Together with previous commits, this fixes the Unicode bug for bracketed
character classes, as far as known bugs go, so the pods are updated as well.
|
| |
|
| |
|
|
|
|
|
|
|
| |
This patch is part of fixing the Unicode bug. The /u regex modifier now
applies to POSIX character classes. This resolves [perl #18281].
The TODO tests in reg_posixcc.t have all been made non-TODO.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit causes the regex sequences \b, \s, and \w (and their complements)
to match in the Latin1 range in the scope of feature 'unicode_strings' or
with the /u regex modifier.
It uses the previously unused flags field in the respective regnodes to
indicate the type of matching, and regexec.c uses that field to decide
which of the handy.h macros to use, native or Latin1.
I chose this for now rather than creating new nodes for each type of
match. An earlier version of this patch did that, and in every case the
switch case: statements were adjacent, offering no performance
advantage. If regexec were modified to use in-line functions or more
macros for various short sections of it, then it would be faster to have
new nodes rather than using the flags field. But using that field
simplified things, as this change flies under the radar in a number of
places where it would not if separate nodes were used.
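The behavioural difference described above can be illustrated outside Perl. The following is only an analogy, not Perl's implementation: it assumes that Python's re.ASCII flag is a fair stand-in for matching without /u, and that Python's default Unicode matching stands in for matching under /u. The helper names are hypothetical.

```python
import re

# Two characters in the Latin-1 range (U+0080..U+00FF), outside ASCII.
E_ACUTE = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE, a word character
NBSP = "\u00a0"     # NO-BREAK SPACE, a whitespace character


def matches(pattern: str, text: str, ascii_only: bool) -> bool:
    """Return True if the whole of text matches pattern under the chosen rules."""
    flags = re.ASCII if ascii_only else 0
    return re.fullmatch(pattern, text, flags) is not None


def has_word_boundary(text: str, ascii_only: bool) -> bool:
    """Return True if \\b matches anywhere in text under the chosen rules."""
    flags = re.ASCII if ascii_only else 0
    return re.search(r"\b", text, flags) is not None


if __name__ == "__main__":
    for label, ascii_only in (("ASCII rules", True), ("Unicode rules", False)):
        print(
            label,
            "| \\w matches e-acute:", matches(r"\w", E_ACUTE, ascii_only),
            "| \\s matches NBSP:", matches(r"\s", NBSP, ascii_only),
            "| \\b found next to e-acute:", has_word_boundary(E_ACUTE, ascii_only),
        )
```

Under the ASCII rules none of the three sequences recognises the Latin-1 characters; under the Unicode rules all three do, which is the kind of change the commit describes for \b, \s, and \w.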
|
|
|
|
| |
Oooops!
|
|
|
|
|
|
| |
I ran some experiments and found out that the user-defined casing worked
in ways that surprised me. And thus, this brutally lays out its
shortcomings.
|
|
|
|
|
|
|
|
|
| |
There was some misleading or, less charitably, wrong text in this pod about
user-defined casing. And it jumped the gun, presuming that 5.14 would
fix something for which no patch has been submitted yet.
And I realized there was a way around having to figure out the utf8 for
a character.
|
|
|
|
|
| |
Mention the POSIX character classes as being affected by the Unicode
bug.
|
|
|
|
| |
This reverts commit d67647f5f40a7e78bffc92ff8600c67f95d3d7b0.
|
|
|
|
|
| |
Mention the POSIX character classes as being affected by the Unicode
bug.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch causes \N{}, vianame, and viacode to know the names of all
Unicode code points. Previously the names that are algorithmically
determinable were not handled. These include the Hangul syllables and
many CJK characters.
It simply adds calls to the routines that mktables inserts into Name.pl
to handle these characters. mktables generates these algorithms from
data in the Unicode database. The routines have been there since
November 2009 in anticipation of this change, but have been unused until now.
They probably have not been reviewed thoroughly.
The major change here is to the .t file. Now that all code points are
understood, the .t tests them all. But this would take too long each
time, so it tests a random sample. If there is a failure, the seed is
output so that the test can be reproduced. This idea came from Michael
Schwern, and is the same one he uses in Test::Sims. Various parameters
about the sampling are easily adjustable.
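As an illustration of the two ideas above (algorithmically derived names, and a seeded random sample that can be replayed on failure), here is a minimal Python sketch. It is not the code that mktables generates. It applies the Hangul syllable name rule from the Unicode Standard and assumes Python's unicodedata module as an independent reference; the function names and the sample size are hypothetical.

```python
import random
import unicodedata

# Constants and Jamo short names for the Hangul syllable naming rule.
S_BASE, V_COUNT, T_COUNT = 0xAC00, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = 19 * N_COUNT        # 11172 syllables in total
JAMO_L = ["G", "GG", "N", "D", "DD", "R", "M", "B", "BB", "S", "SS", "",
          "J", "JJ", "C", "K", "T", "P", "H"]
JAMO_V = ["A", "AE", "YA", "YAE", "EO", "E", "YEO", "YE", "O", "WA", "WAE",
          "OE", "YO", "U", "WEO", "WE", "WI", "YU", "EU", "YI", "I"]
JAMO_T = ["", "G", "GG", "GS", "N", "NJ", "NH", "D", "L", "LG", "LM", "LB",
          "LS", "LT", "LP", "LH", "M", "B", "BS", "S", "SS", "NG", "J", "C",
          "K", "T", "P", "H"]


def hangul_syllable_name(cp: int) -> str:
    """Derive the name of a precomposed Hangul syllable from its code point."""
    index = cp - S_BASE
    if not 0 <= index < S_COUNT:
        raise ValueError(f"U+{cp:04X} is not a Hangul syllable")
    lead = JAMO_L[index // N_COUNT]
    vowel = JAMO_V[(index % N_COUNT) // T_COUNT]
    trail = JAMO_T[index % T_COUNT]
    return "HANGUL SYLLABLE " + lead + vowel + trail


def check_sample(seed: int, sample_size: int = 200) -> list:
    """Check a seeded random sample of syllables; return the failing code points."""
    rng = random.Random(seed)
    sample = rng.sample(range(S_BASE, S_BASE + S_COUNT), sample_size)
    return [cp for cp in sample
            if hangul_syllable_name(cp) != unicodedata.name(chr(cp))]


if __name__ == "__main__":
    seed = random.randrange(2**32)
    failures = check_sample(seed)
    if failures:
        # Reporting the seed lets the same sample be replayed later.
        print(f"FAILED for {len(failures)} code points; rerun with seed {seed}")
    else:
        print("ok")
```

Calling check_sample again with a reported seed draws exactly the same code points, so a failure seen once can be reproduced.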
|
|
|
|
| |
And add a .t file to verify that it works.
|
|
|
|
| |
This is suitable for 5.12.2, but not many people use this feature.
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
The module encoding::warnings can be used to warn when two strings are
concatenated where one is utf8 and the other is not and contains
non-ASCII.
Note the existence of this in the pod documentation.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I looked at all the instances of spaces around -- and in most cases
converted the sentences to use more appropriate punctuation. In
general, the -- in the perl docs seem to be there only to make
really complicated and really long sentences.
I didn't look at the closed em-dashes. They probably have the same
sentence-complexity problem.
I left some open em-dashes in place. Those are the ones used in
lists.
|
|
|
|
|
|
|
|
|
|
| |
The Unicode Standard defines (as a recommendation) that Print be based on
graphical characters and blank characters (minus controls). Perl's definition
has been based on space rather than blank. The only practical effect this has
is that Perl erroneously matches the LINE SEPARATOR and PARAGRAPH SEPARATOR,
which clearly are not printable characters.
Signed-off-by: Abigail <abigail@abigail.be>
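The difference between the blank-based and space-based definitions of Print can be checked with a small sketch. This is not Perl's implementation. It assumes the property definitions recommended in Unicode Technical Standard #18 (graph, blank, cntrl), a hard-coded copy of the White_Space code points, and Python's unicodedata for general categories; all function names are hypothetical.

```python
import unicodedata

# Code points with the Unicode White_Space property (hard-coded assumption).
WHITE_SPACE = frozenset(
    list(range(0x0009, 0x000E))
    + [0x0020, 0x0085, 0x00A0, 0x1680]
    + list(range(0x2000, 0x200B))
    + [0x2028, 0x2029, 0x202F, 0x205F, 0x3000]
)


def category(cp: int) -> str:
    return unicodedata.category(chr(cp))


def is_cntrl(cp: int) -> bool:
    return category(cp) == "Cc"


def is_space(cp: int) -> bool:
    return cp in WHITE_SPACE


def is_blank(cp: int) -> bool:
    # Space separators plus CHARACTER TABULATION.
    return cp == 0x0009 or category(cp) == "Zs"


def is_graph(cp: int) -> bool:
    # Everything except white space, controls, surrogates, and unassigned.
    return not is_space(cp) and category(cp) not in ("Cc", "Cs", "Cn")


def is_print_blank_based(cp: int) -> bool:
    """Print as recommended: graph plus blank, minus controls."""
    return (is_graph(cp) or is_blank(cp)) and not is_cntrl(cp)


def is_print_space_based(cp: int) -> bool:
    """Print built on space instead of blank, the older behaviour described above."""
    return (is_graph(cp) or is_space(cp)) and not is_cntrl(cp)


if __name__ == "__main__":
    for cp in (0x0041, 0x0020, 0x0009, 0x2028, 0x2029):
        print(f"U+{cp:04X}",
              "blank-based:", is_print_blank_based(cp),
              "space-based:", is_print_space_based(cp))
```

Under these assumptions the two definitions disagree only on U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR, which is the practical effect the commit message describes.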
|
|
|
|
|
|
|
|
|
|
|
| |
Attached
From 75bb462da5f7ea844447dfdd7d9aadfe15f6dcf3 Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@khw-desktop.(none)>
Date: Tue, 29 Dec 2009 13:08:28 -0700
Subject: [PATCH] Correct grammatical error in perlunicode.pod
Signed-off-by: H.Merijn Brand <h.m.brand@xs4all.nl>
|
|
|
|
|
|
|
|
|
|
|
| |
This also changes some C<> constructs.
From d01b049b3aa9bc3a394adb30d6db735f5dd52321 Mon Sep 17 00:00:00 2001
From: Karl Williamson <khw@khw-desktop.(none)>
Date: Mon, 28 Dec 2009 09:14:48 -0700
Subject: [PATCH] Document all perl Unicode \p extensions
Signed-off-by: H.Merijn Brand <h.m.brand@xs4all.nl>
|
|
|
|
| |
Signed-off-by: Abigail <abigail@abigail.be>
|
|
|
|
| |
Signed-off-by: Abigail <abigail@abigail.be>
|
| |
|