path: root/utf8.c
Commit message | Author | Age | Files | Lines
* Avoid compiler warnings in Perl_foldEQ_utf8, spotted by Jerry D. Hedden. | Nicholas Clark | 2010-06-17 | 1 | -2/+6
|
* Change name of ibcmp to foldEQ | Karl Williamson | 2010-06-05 | 1 | -8/+8
|
| As discussed on p5p, ibcmp has different semantics from other cmp
| functions in that it is a binary instead of ternary function. It is
| less confusing then to have a name that implies true/false.
|
| There are three functions affected: ibcmp, ibcmp_locale and ibcmp_utf8.
|
| ibcmp is actually equivalent to foldNE, but for the same reason that
| things like 'unless' and 'until' are cautioned against, I changed the
| functions to foldEQ, so that the existing names, like ibcmp_utf8, are
| defined as macros as being the complement of foldEQ.
|
| This patch also changes the one file where turning ibcmp into a macro
| causes problems. It changes it to use the new name. It also documents
| for the first time ibcmp, ibcmp_locale and their new names.
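The renaming scheme described in this commit can be sketched in a few lines of C. This is only an illustration under stated assumptions: `my_foldEQ` and `my_ibcmp` are hypothetical stand-ins, not the real Perl functions, and the comparison folds ASCII case only via `tolower()`.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for foldEQ: returns 1 if the first len bytes of
 * s1 and s2 are equal when ASCII case is ignored, 0 otherwise. */
static int my_foldEQ(const char *s1, const char *s2, size_t len)
{
    assert(s1 != NULL);
    assert(s2 != NULL);

    for (size_t i = 0; i < len; i++) {
        if (tolower((unsigned char)s1[i]) != tolower((unsigned char)s2[i]))
            return 0;
    }
    return 1;
}

/* The old name survives as a macro that is the complement of the new
 * function, so it is true when the strings differ. */
#define my_ibcmp(s1, s2, len) (!my_foldEQ((s1), (s2), (len)))
```

With this arrangement, `my_foldEQ("Perl", "pERL", 4)` is true while `my_ibcmp("Perl", "pERL", 4)` is false, which is the "binary, not ternary" behaviour the commit message wants the name to convey.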
* utf8.c: further doc tweaks | Karl Williamson | 2010-06-05 | 1 | -6/+11
|
* utf8.c: Modify doc comment; change whitespace | Karl Williamson | 2010-06-05 | 1 | -75/+74
|
| This removes the comment about the function name, and converts tabs to
| blanks throughout the function, as so much of it is changing already.
| It also removes trailing whitespace in other lines of the file.
* Revamp ibcmp_utf8 for performance and clarity | Karl Williamson | 2010-06-05 | 1 | -108/+152
|
| I had a hard time understanding how this routine worked; there were no
| comments. In figuring it out, I discovered it could be made more
| efficient. This routine is called over and over in the innermost loops
| in regex matching, so efficiency is a concern.
|
| Setup is done once before the main while loop so that it now has two
| conditions instead of eight. The loop was rearranged slightly to be
| smaller and a couple of unneeded assignments to temporaries were
| removed, and recomputation of some values was avoided. Several other
| small efficiency changes were made.
|
| Several asserts had been commented out, saying that they make tests
| fail. But they no longer do, at least on my platform. There was a
| reason that they were asserts to begin with, and that is they denoted
| an insane or trivial condition. Apparently there have been fixes to
| the other code calling this, so I re-enabled them.
|
| The names of several variables were changed to be less confusing;
| hence f1 means the fold buffer for string 1 whereas it used to mean
| its goal, which is now g1.
|
| The leading indent was changed from 5 to 4 blanks.
|
| I made enough other changes that I didn't submit this as a separate
| commit.
* Clarify some documentation | Karl Williamson | 2010-06-05 | 1 | -3/+5
|
* PATCH: user defined special casing for non utf8 | Karl Williamson | 2010-05-26 | 1 | -2/+1
|
| Users can define their own case changing mappings to replace the
| standard ones. Prior to this patch, any mappings on characters whose
| ordinals are 0-222, 224-255 that resulted in multiple characters were
| ignored.
|
| Note that there still is a deficiency in that the mappings will be
| applied only to strings in utf8 format.
* PATCH: [perl #72998] regex looping | Karl Williamson | 2010-04-15 | 1 | -1/+2
|
| If a character folds to multiple ones in case-insensitive matching,
| it should not match just one of those, or the regular expression can
| loop. For example, \N{LATIN SMALL LIGATURE FF} folds to 'ff', and so
|     "\N{LATIN SMALL LIGATURE FF}" =~ /f+/i
| should match. Prior to this patch, this function returned that there
| is a match, but left the matching string pointer at the beginning of
| the "\N{LATIN SMALL LIGATURE FF}" because it doesn't make sense to
| match just half a character, and at this level it doesn't know about
| the '+'. This leaves things in an inconsistent state, with the
| reporting of a match, but the input pointer unchanged, the result of
| which is a loop.
|
| I don't know how to fix this so that it correctly matches, and there
| are semantic issues with doing so. For example, if
|     "\N{LATIN SMALL LIGATURE FF}" =~ /ff/i
| matches, then one would think that so should
|     "\N{LATIN SMALL LIGATURE FF}" =~ /(f)(f)/i
| But $1 and $2 don't really make sense here, since they both refer to
| the half of the same character.
|
| So this patch just returns failure if only a partial character is
| matched. That leaves things consistent, and solves the problem of
| looping, so that Perl doesn't hang on such a construct, but leaves the
| ultimate solution for another day.
* [perl #73174] swash_init() wasn't saving %^H | David Mitchell | 2010-03-02 | 1 | -2/+1
|
* change non-char warning message from malformed | Karl Williamson | 2009-12-20 | 1 | -45/+49
|
* qr/\X/ expansion | Karl Williamson | 2009-12-05 | 1 | -3/+120
|
* Perl_utf16_to_utf8() should treat "\0" like every other odd-length input. | Nicholas Clark | 2009-10-22 | 1 | -6/+0
|
| The "be understanding" bodge to not panic, introduced in
| 1de9afcdf18cf98b, is no longer needed now that c28d61051c446453 fixes
| the underlying problem.
* Perl_utf16_to_utf8() should return the correct length when being "understanding" | Nicholas Clark | 2009-10-21 | 1 | -1/+1
|
| ("be understanding" being a bodge added in 1de9afcdf18cf98b, which
| will soon go when I fix the underlying cause of the bugs it works
| around.)
* somewhat fix failing regex tests. but break lots of other stuff at the same time | Yves Orton | 2009-10-19 | 1 | -1/+31
|
* In utf16_to_utf8(), fix off-by-one errors for the range of valid surrogates. | Nicholas Clark | 2009-10-18 | 1 | -2/+2
|
| Both high ends were one too low.
* utf16_to_utf8() should croak on encountering a bare low surrogate. | Nicholas Clark | 2009-10-18 | 1 | -0/+2
|
* utf16_to_utf8() should croak if the buffer ends without the second surrogate. | Nicholas Clark | 2009-10-18 | 1 | -4/+8
|
* utf16_to_utf8_reversed() should croak early when passed an odd byte length. | Nicholas Clark | 2009-10-18 | 1 | -0/+4
|
| Rather than transposing n + 1 bytes, including 1 it was not passed,
| before calling utf16_to_utf8() and having that croak.
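The four surrogate-related fixes above (inclusive range ends, bare low surrogate, missing second surrogate, odd byte length) can be illustrated with a small stand-alone converter. This is a sketch under stated assumptions, not the code in utf8.c: the function name, the status enum and the choice to return an error code instead of croaking are all hypothetical, and the input is assumed to be big-endian UTF-16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes; the real functions croak instead. */
enum u16_status {
    U16_OK = 0,
    U16_ODD_LENGTH,          /* byte length is not a multiple of 2      */
    U16_BARE_LOW_SURROGATE,  /* low surrogate with no preceding high    */
    U16_TRUNCATED_PAIR,      /* buffer ends after a high surrogate      */
    U16_BAD_PAIR             /* high surrogate not followed by a low    */
};

/* Converts big-endian UTF-16 in src[0..srclen) to UTF-8 in dst.
 * dst must have room for 3 * (srclen / 2) bytes. On success *dstlen is
 * the number of bytes written. */
static enum u16_status
sketch_utf16be_to_utf8(const uint8_t *src, size_t srclen,
                       uint8_t *dst, size_t *dstlen)
{
    assert(src != NULL);
    assert(dst != NULL);
    assert(dstlen != NULL);

    uint8_t *d = dst;
    *dstlen = 0;

    if (srclen & 1)
        return U16_ODD_LENGTH;        /* reject before touching any data */

    for (size_t i = 0; i < srclen; i += 2) {
        uint32_t cp = ((uint32_t)src[i] << 8) | src[i + 1];

        /* Both ranges are inclusive at the high end. */
        if (cp >= 0xDC00 && cp <= 0xDFFF)
            return U16_BARE_LOW_SURROGATE;

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            if (i + 4 > srclen)
                return U16_TRUNCATED_PAIR;
            uint32_t low = ((uint32_t)src[i + 2] << 8) | src[i + 3];
            if (low < 0xDC00 || low > 0xDFFF)
                return U16_BAD_PAIR;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
            i += 2;                   /* consume the second code unit */
        }

        if (cp < 0x80) {
            *d++ = (uint8_t)cp;
        } else if (cp < 0x800) {
            *d++ = (uint8_t)(0xC0 | (cp >> 6));
            *d++ = (uint8_t)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            *d++ = (uint8_t)(0xE0 | (cp >> 12));
            *d++ = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
            *d++ = (uint8_t)(0x80 | (cp & 0x3F));
        } else {
            *d++ = (uint8_t)(0xF0 | (cp >> 18));
            *d++ = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
            *d++ = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
            *d++ = (uint8_t)(0x80 | (cp & 0x3F));
        }
    }

    *dstlen = (size_t)(d - dst);
    return U16_OK;
}
```

The pair 0xDBFF 0xDFFF is a useful boundary case: it is only accepted when both upper bounds are inclusive, and it decodes to U+10FFFF.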
* Add Perl_ck_warner_d(), which combines Perl_ckwarn_d() and Perl_warner(). | Nicholas Clark | 2009-10-12 | 1 | -7/+5
|
| Replace ckWARN_d{,2,3,4}() && Perl_warner() with it, which trades
| reduced code size for 1 more function call if warnings are not
| enabled.
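The trade-off in this commit (one combined call at every call site, at the cost of an extra function call when the warning is disabled) can be sketched as follows. All names here are hypothetical stand-ins rather than the Perl API, and the sketch does not model the "enabled by default" semantics of the `_d` variants; it only shows the check being folded into the warning function.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical global state standing in for the interpreter's warning
 * bits and output channel. */
static unsigned sketch_enabled_mask = 0;
static char sketch_last_warning[128];
static int sketch_warnings_emitted = 0;

/* Stand-in for the "is this category enabled?" check. */
static int sketch_ckwarn(unsigned category)
{
    return (sketch_enabled_mask & category) != 0;
}

/* Stand-in for the unconditional warning routine. */
static void sketch_vwarner(const char *fmt, va_list ap)
{
    vsnprintf(sketch_last_warning, sizeof sketch_last_warning, fmt, ap);
    sketch_warnings_emitted++;
}

/* Combined form: callers write one call instead of
 *     if (sketch_ckwarn(cat)) sketch_warner(...);
 * so each call site shrinks, but the call happens even when disabled. */
static void sketch_ck_warner(unsigned category, const char *fmt, ...)
{
    va_list ap;

    assert(fmt != NULL);

    if (!sketch_ckwarn(category))
        return;

    va_start(ap, fmt);
    sketch_vwarner(fmt, ap);
    va_end(ap);
}
```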
* Change warning "Unicode character is illegal" to more accurate description | Rafael Garcia-Suarez | 2009-10-03 | 1 | -1/+1
|
| That now reads "Unicode non-character is illegal in interchange" and
| the perldiag documentation is expanded a bit.
* Remove obsolete functions is_uni_alnumc, is_uni_alnumc_lc, is_utf8_alnumc | Rafael Garcia-Suarez | 2009-09-13 | 1 | -24/+0
|
* Don't pass the interpreter to is_ascii_string(), is_utf8_char(), is_utf8_string(), is_utf8_string_loclen() as they don't need it | Vincent Pit | 2009-08-27 | 1 | -8/+4
|
* In C<use utf8; a=>'b'>, do not set utf8 flag on 'a' [perl #68812] | Chip Salzenberg | 2009-08-26 | 1 | -1/+34
|
* Faster utf8_length method -- fixes [RT#50250] | Alex Vandiver | 2009-06-06 | 1 | -13/+15
|
| UTF8SKIP appears to be a rather slow call; use UTF8_IS_INVARIANT to
| skip it whenever possible. We also move the malformed utf8 check
| until after the loop, since it can be checked after the termination
| condition, instead of at every pass through the loop.
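The two optimisations named in this commit can be sketched in plain C. This is an approximation under stated assumptions: `sketch_utf8_skip` is a hypothetical stand-in for UTF8SKIP (which is a table lookup in Perl), the `b < 0x80` test stands in for UTF8_IS_INVARIANT on ASCII platforms, and returning `(size_t)-1` for a truncated final character is this sketch's choice, not Perl's behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical equivalent of UTF8SKIP: the sequence length implied by
 * a start byte. */
static size_t sketch_utf8_skip(uint8_t b)
{
    if (b < 0x80)           return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 1;   /* continuation or invalid start byte: step one byte */
}

/* Counts the characters in s[0..n). Returns (size_t)-1 if the last
 * character claims more bytes than the buffer holds. */
static size_t sketch_utf8_length(const uint8_t *s, size_t n)
{
    size_t i = 0;
    size_t len = 0;

    assert(s != NULL);

    while (i < n) {
        if (s[i] < 0x80)
            i++;                          /* invariant: skip the lookup */
        else
            i += sketch_utf8_skip(s[i]);
        len++;
    }

    /* Malformation check hoisted out of the loop: an overrun can only
     * be observed once the loop has terminated. */
    if (i != n)
        return (size_t)-1;

    return len;
}
```

Using an index rather than a moving pointer keeps the overrun case well defined in C, since the index may exceed `n` without forming an out-of-range pointer.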
* Update the documentation of get_hv() to note that it calls Perl_gv_fetchpv(), | Nicholas Clark | 2009-01-21 | 1 | -1/+1
|
| and hence the 'create' argument is actually 'flags'. Fix code and
| documentation that used TRUE or FALSE to use 0 or GV_ADD.
* Subject: PATCH 5.10 documentation | Steve Peters | 2008-12-19 | 1 | -9/+13
|
| From: karl williamson <public@khwilliamson.com>
| Date: Tue, 16 Dec 2008 16:00:34 -0700
| Message-ID: <49483312.80804@khwilliamson.com>
* PATCH: Large omnibus patch to clean up the JRRT quotes | Tom Christiansen | 2008-11-02 | 1 | -4/+11
|
| Message-ID: <25940.1225611819@chthon>
| Date: Sun, 02 Nov 2008 01:43:39 -0600
| p4raw-id: //depot/perl@34698
* Use pvs macros instead of pvn where possible. | Marcus Holland-Moritz | 2008-10-29 | 1 | -2/+2
|
| p4raw-id: //depot/perl@34653
* Remove redundant API definitions from '=for apidoc' sections. | Marcus Holland-Moritz | 2008-10-29 | 1 | -24/+24
|
| Those are already in embed.fnc, and most of them were already
| outdated. This also fixes the docs for pv_escape and pv_pretty.
| p4raw-id: //depot/perl@34642
* Eliminate (HV *) casts in u*.c. | Nicholas Clark | 2008-10-28 | 1 | -3/+3
|
| p4raw-id: //depot/perl@34624
* Update copyright years. | Nicholas Clark | 2008-10-25 | 1 | -1/+1
|
| p4raw-id: //depot/perl@34585
* pv_uni_display () omitted backslash in output string | H.Merijn Brand | 2008-09-25 | 1 | -0/+1
|
| p4raw-id: //depot/perl@34416
* assert() that every NN argument is not NULL. Otherwise we have the | Nicholas Clark | 2008-02-12 | 1 | -2/+125
|
| ability to create landmines that will explode under someone in the
| future when they upgrade their compiler to one with better
| optimisation. We've already done this at least twice.
| (Yes, some of the assertions are after code that would already have
| SEGVd because it already dereferences a pointer, but they are put in
| to make it easier to automate checking that each and every case is
| covered.)
| Add a tool, checkARGS_ASSERT.pl, to check that every case is covered.
| p4raw-id: //depot/perl@33291
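The convention this commit introduces, one assertion per pointer parameter that must not be NULL, gathered in a per-function macro so a script can check coverage, might look roughly like this. The macro and function names below are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-function assertion macro: one assert for each
 * pointer parameter that is declared as "must not be NULL". Keeping it
 * in a uniformly named macro makes it easy for a tool to verify that
 * every such function invokes its macro. */
#define SKETCH_ARGS_ASSERT_COUNT_BYTE  assert(s != NULL)

/* Counts occurrences of byte c in s[0..len). s must not be NULL. */
static size_t sketch_count_byte(const char *s, size_t len, char c)
{
    size_t n = 0;

    SKETCH_ARGS_ASSERT_COUNT_BYTE;

    for (size_t i = 0; i < len; i++) {
        if (s[i] == c)
            n++;
    }
    return n;
}
```

In a debugging build a NULL argument then fails loudly at the function entry, instead of depending on what an optimising compiler decides to do with a pointer it was told can never be NULL.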
* Add macros mPUSHs() and mXPUSHs() for pushing SVs on the stack | Marcus Holland-Moritz | 2008-01-04 | 1 | -4/+4
|
| and mortalizing them. Use these macros where possible.
| And also mX?PUSH[inpu] where possible.
| p4raw-id: //depot/perl@32821
* Add newSVpvs_flags() as a wrapper to newSVpvn_flags(), and rework | Nicholas Clark | 2008-01-03 | 1 | -1/+1
|
| sv_2mortal(newSVpvs(...)) constructions to use it.
| p4raw-id: //depot/perl@32819
* Extend newSVpvn_flags() to also call sv_2mortal() if SVs_TEMP is set in | Nicholas Clark | 2008-01-03 | 1 | -2/+2
|
| the flags. Move its implementation just ahead of sv_2mortal()'s for
| CPU cache locality. Refactor all code that can be to use this.
| p4raw-id: //depot/perl@32818
* Fix various bugs in regex engine with mixed utf8/latin pattern and strings. Related to [perl #36207] among others | Yves Orton | 2007-12-17 | 1 | -0/+5
|
| Message-ID: <9b18b3110712170621h41de2c76k331971e3660abcb0@mail.gmail.com>
| p4raw-id: //depot/perl@32628
* Re: several compilation problems on VMS in perl@32039 | Craig A. Berry | 2007-10-06 | 1 | -2/+2
|
| From: "Craig A. Berry" <craig.a.berry@gmail.com>
| Message-ID: <c9ab31fc0710061147x3ee7f9bdg2b1bac3acd018bb2@mail.gmail.com>
| Date: Sat, 6 Oct 2007 13:47:03 -0500
| p4raw-id: //depot/perl@32058
* newSV(size) and SvPOK_on() will be more efficient than newSVpvs("") | Nicholas Clark | 2007-10-06 | 1 | -2/+2
|
| followed by SvGROW(size+1)
| p4raw-id: //depot/perl@32045
* Revert one hunk of change 32034 that had the possibility of being buggy | Nicholas Clark | 2007-10-06 | 1 | -0/+1
|
| (the sprintf "%c" code will work correctly when the SV is UTF-8).
| Audit all the rest for UTF-8 correctness, and force SvUTF8_off() in
| utf8.c to ensure correctness. (The string is reset to "", so this
| will not be a behaviour change.)
| p4raw-id: //depot/perl@32040
* Eliminate most *printf-like calls that use a simple "%c" format, | Nicholas Clark | 2007-10-05 | 1 | -2/+4
|
| replacing them with constructions that are more efficient because
| they avoid the overhead of the *printf format parser and interpreter
| code.
| p4raw-id: //depot/perl@32034
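The kind of substitution this commit describes can be shown outside of Perl with an ordinary character buffer. The sketch below is an assumption-laden analogy, not the actual change: `struct sketch_buf` and both functions are hypothetical, and the real code operates on SVs rather than a fixed array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical fixed-size string buffer that tracks its own length. */
struct sketch_buf {
    char   data[32];
    size_t len;
};

/* Before: a single character still goes through the format parser. */
static void sketch_append_printf(struct sketch_buf *b, char c)
{
    assert(b != NULL);
    assert(b->len + 1 < sizeof b->data);

    b->len += (size_t)snprintf(b->data + b->len,
                               sizeof b->data - b->len, "%c", c);
}

/* After: store the byte directly and keep the buffer NUL-terminated. */
static void sketch_append_direct(struct sketch_buf *b, char c)
{
    assert(b != NULL);
    assert(b->len + 1 < sizeof b->data);

    b->data[b->len++] = c;
    b->data[b->len] = '\0';
}
```

Both functions leave the buffer in the same state; the second simply avoids parsing a format string to copy one byte.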
* s/\bunicode\b/Unicode/; # For everything not dual life | Nicholas Clark | 2007-06-24 | 1 | -1/+1
|
| p4raw-id: //depot/perl@31455
* move PL_tokenbuf into the PL_parser struct | Dave Mitchell | 2007-05-21 | 1 | -8/+0
|
| p4raw-id: //depot/perl@31252
* move PL_in_my and PL_in_my_stash into the PL_parser struct | Dave Mitchell | 2007-05-12 | 1 | -3/+0
|
| p4raw-id: //depot/perl@31203
* Avoid the need for 2 casts added in 31055 by using a better type for | Nicholas Clark | 2007-04-25 | 1 | -0/+1
|
| the local variable. Add an assertion that another cast is not a data
| loss (and that there is no buffer overflow)
| p4raw-id: //depot/perl@31069
* Silence 5 "possible loss of data" warnings from VC6 | Steve Hay | 2007-04-24 | 1 | -1/+1
|
| p4raw-id: //depot/perl@31055
* Fix problems caused by downsizing in change 31017. (Which don't show | Nicholas Clark | 2007-04-22 | 1 | -1/+1
|
| up until you test on a "real" architecture)
| p4raw-id: //depot/perl@31023
* The last parameter to gv_stashpv/gv_stashpvn/gv_stashsv is a bitmask | Nicholas Clark | 2007-01-25 | 1 | -1/+1
|
| of flags, not a boolean, so correct the documentation and callers.
| p4raw-id: //depot/perl@29977
* Update copyright years in .c files | Rafael Garcia-Suarez | 2007-01-05 | 1 | -1/+1
|
| p4raw-id: //depot/perl@29696
* 4th patch from: | Marcus Holland-Moritz | 2007-01-04 | 1 | -1/+1
|
| Subject: [PATCH] Cleanup SVf arguments (2nd try)
| Message-ID: <20070101201613.4120d9ef@r2d2>
|
| Introduce an SVfARG() macro for %SVf (%-p here) arguments to perl's
| printf
| p4raw-id: //depot/perl@29687