path: root/lib/utf8_heavy.pl
Commit message | Author | Age | Files | Lines
* utf8_heavy.pl: Improve debug output | Karl Williamson | 2011-02-06 | 1 | -1/+3
| Often, when DEBUG is set, an uninitialized variable message gets printed as well. This fixes that.
* utf8_heavy: Use new mktables caseless feature | Karl Williamson | 2011-02-02 | 1 | -3/+13
| This patch causes utf8_heavy.pl to know about the new data structure that mktables now generates to indicate what substitute table to use for one that has different results under /i matching. Note that regcomp.c, as of this commit, does not generate the names that would exercise this code.
* restrict \p{IsUserDefined} to In\w+ and Is\w+ | David Mitchell | 2011-01-16 | 1 | -1/+1
| In L<perlunicode/"User-Defined Character Properties">, it says you can create custom properties by defining subroutines whose names begin with "In" or "Is". However, perl doesn't actually enforce that naming restriction, so \p{foo::bar} will call foo::bar() if it exists. This commit finally enforces this convention. Note that this broke a number of existing tests for properties, since they didn't always use an Is/In prefix.
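A minimal sketch of the naming convention this commit enforces, assuming the user-defined property interface described in perlunicode. The sub name InKana and its code point ranges are illustrative only and are not taken from the commit.

```perl
use strict;
use warnings;

# Hypothetical user-defined property. The sub name begins with "In",
# so it satisfies the In\w+ / Is\w+ restriction.
sub InKana {
    # Each line is a range: two hex code points separated by a tab.
    return "3040\t309F\n30A0\t30FF\n";
}

my $hiragana_a = "\x{3042}";    # HIRAGANA LETTER A
print "kana\n"     if $hiragana_a =~ /\p{InKana}/;
print "not kana\n" if 'a' !~ /\p{InKana}/;
```

A sub named without the In or Is prefix would, after this commit, no longer be picked up as a property.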
* utf8_heavy.pl: Remove unused variable declaration | Michael Parker | 2011-01-07 | 1 | -1/+0
|
* utf8_heavy: Guard against infinite recursion | Karl Williamson | 2010-11-22 | 1 | -8/+48
| If things aren't just so, it could be that utf8_heavy calls something which requires a pattern, such as split or just a pattern match that ends up calling utf8_heavy again, ad infinitum. When this happens, memory gets eaten up and the machine grinds to a halt, likely requiring a manual forced reboot. To prevent this undesirable situation, utf8_heavy now stacks all its calls in progress, and if any is a repeat, panics.
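The guard described above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the commit: the names guarded_lookup and @in_progress are hypothetical, and the real check lives inside utf8_heavy.pl's own lookup routine.

```perl
use strict;
use warnings;

# Hypothetical stack of the lookups currently in progress.
our @in_progress;

# guarded_lookup() is an illustrative stand-in, not the real SWASHNEW.
sub guarded_lookup {
    my ($name, $work) = @_;

    # A name that is already on the stack means we re-entered ourselves.
    die "panic: recursive lookup of '$name' (in progress: @in_progress)\n"
        if grep { $_ eq $name } @in_progress;

    # local restores the previous stack when this call returns or dies.
    local @in_progress = (@in_progress, $name);
    return $work->();
}

# A lookup whose work asks for the same name again trips the guard.
my $result = eval {
    guarded_lookup('IsWord', sub { guarded_lookup('IsWord', sub { 1 }) });
};
print "caught: $@" if !defined $result;
```

Using local for the stack means the entry is removed even when the inner call dies, so one failed lookup does not poison later ones.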
* Avoid a run-time miniperl check every time SWASHNEW is called | Father Chrysostomos | 2010-11-05 | 1 | -1/+7
|
* utf8_heavy.pl: Make callable during Perl's compilation | Karl Williamson | 2010-11-05 | 1 | -3/+16
| It's possible for this to be called during the compilation phase of Perl by miniperl before the Unicode tables have been built. This patch checks if dynamic loading is available, and if not evals the require needed to gain access to the tables. If it succeeds, the tables have been built; if it doesn't, instead of dying, just return empty tables, as currently the things being built don't require information outside the ASCII range, which is hard-coded into Perl without needing the tables. In the future, that may not be the case, and then likely the tables will have to be shipped with Perl, and make regen would be done to rebuild them.
* mktables revamp | Karl Williamson | 2009-11-21 | 1 | -254/+430
|
* Localize $@ and $! before loading a file in SWASHNEW | Rafael Garcia-Suarez | 2009-09-16 | 1 | -0/+2
| This fixes a bug where a spurious error was reported from utf8_heavy. This was found by Salvador Ortiz Garcia, who suggested localizing $@; I merely added $!.
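A small sketch of why localizing matters, assuming a loader that may fail quietly. The helper quiet_load and the path are hypothetical and are not the SWASHNEW code itself.

```perl
use strict;
use warnings;

# quiet_load() is a hypothetical helper, not the actual SWASHNEW code.
sub quiet_load {
    my ($path) = @_;

    # Keep any failure inside this sub from clobbering the caller's
    # $@ and $!; both are restored when the sub returns.
    local $@;
    local $!;
    return do $path;    # undef if the file is missing or dies
}

$@ = "earlier error\n";    # pretend the caller already holds this value
my $table = quiet_load('/no/such/dir/table.pl');
print "caller still sees: $@";
```

Without the two local statements, a failed load would leave its own error state behind, which is the kind of spurious report the commit describes.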
* Make lc/uc/lcfirst/ucfirst warn when passed undef. | Rafael Garcia-Suarez | 2008-01-28 | 1 | -0/+2
| Naive implementation.
| p4raw-id: //depot/perl@33088
* Tels' patch to defer overloading of hex and oct, | Rafael Garcia-Suarez | 2007-06-23 | 1 | -4/+4
| to avoid magic leaking and smoke failures under utf-8 locales
| p4raw-id: //depot/perl@31450
* [perl #42839] Swatch hash cache has key mismatch | Jonathan Steinert | 2007-05-02 | 1 | -3/+4
| From: Jonathan Steinert (via RT) <perlbug-followup@perl.org>
| Message-ID: <rt-3.6.HEAD-30557-1178021932-1416.42839-75-0@perl.org>
| p4raw-id: //depot/perl@31119
* do $file; won't propagate errors from die, as do is an implicit eval. | Nicholas Clark | 2007-01-08 | 1 | -1/+1
| So need to propagate errors with $@.
| p4raw-id: //depot/perl@29723
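The behaviour this commit works around can be shown with a short sketch. The helper load_or_die and the temporary file are assumptions made for illustration; only the do FILE semantics come from Perl itself.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempfile);

# load_or_die() is a hypothetical helper showing the pattern.
sub load_or_die {
    my ($path) = @_;
    my $value = do $path;
    die $@ if $@;                                   # the file itself died
    die "cannot load $path: $!\n" unless defined $value;
    return $value;
}

# Throwaway file whose code dies when it is run.
my ($fh, $file) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print {$fh} qq{die "table load failed\\n";\n};
close $fh or die "close failed: $!";
$file = File::Spec->rel2abs($file);    # do FILE needs a path, not an \@INC search

eval { load_or_die($file) };
print "propagated: $@";
```

Because do FILE traps the die like an eval would, the caller only learns about the failure if $@ is checked and rethrown.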
* Clarification and cleanup of the XS SWASHGET code | SADAHIRO Tomoyuki | 2005-12-05 | 1 | -19/+33
| Subject: Re: XS-assisted SWASHGET (esp. for t/uni/class.t speedup)
| Message-Id: <20051204162508.D726.BQW10602@nifty.com>
| p4raw-id: //depot/perl@26255
* Re: XS-assisted SWASHGET (esp. for t/uni/class.t speedup) | SADAHIRO Tomoyuki | 2005-11-30 | 1 | -7/+1
| Message-Id: <20051127170016.A786.BQW10602@nifty.com>
| p4raw-id: //depot/perl@26229
* XS-assisted SWASHGET (esp. for t/uni/class.t speedup) | SADAHIRO Tomoyuki | 2005-11-23 | 1 | -138/+3
| Message-Id: <20051123175603.FFD5.BQW10602@nifty.com>
| And: Message-Id: <20051123202935.4D9D.BQW10602@nifty.com>
| with some nits to use U8 instead of char more consistently
| p4raw-id: //depot/perl@26199
* replace the run time code in lib/utf8_pva.pl with data generated | Nicholas Clark | 2004-05-31 | 1 | -1/+1
| at build by mktables, stored in lib/unicore/PVA.pl
| p4raw-id: //depot/perl@22881
* Don't need to require utf8_pva.pl at top of file | Nicholas Clark | 2004-05-31 | 1 | -2/+1
| p4raw-id: //depot/perl@22880
* candidate for TR18 compliance | Jeff Pinyan | 2004-04-27 | 1 | -28/+54
| Date: Thu, 22 Apr 2004 14:31:30 -0400 (EDT)
| Message-ID: <Pine.LNX.4.44.0404221429040.10466-101000@perlmonk.org>
| Date: Mon, 26 Apr 2004 12:37:21 -0400 (EDT)
| Message-ID: <Pine.LNX.4.44.0404261222320.7154-400000@perlmonk.org>
| p4raw-id: //depot/perl@22744
* lib/utf8_heavy.pl -- cascading classes and '&' support | Jeff Pinyan | 2004-04-14 | 1 | -4/+20
| Message-ID: <Pine.LNX.4.44.0404122011160.3038-200000@perlmonk.org>
| p4raw-id: //depot/perl@22693
* For characters beyond the BMP the $bits will be undef, | Jarkko Hietaniemi | 2003-06-22 | 1 | -1/+1
| which will cause utf8_heavy.pl noise (reported by Daniel Yacob, analysis and fix from SADAHIRO Tomoyuki)
| p4raw-id: //depot/perl@19835
* Integrate from the maint-5.8/ branch : | Rafael Garcia-Suarez | 2002-12-10 | 1 | -15/+41
| changes 18219, 18236, 18242-3, 18247-8, 18253-5, 18257, 18273-6
| p4raw-id: //depot/perl@18280
| p4raw-branched: from //depot/maint-5.8/perl@18279 'branch in' t/op/lc_user.t
| p4raw-integrated: from //depot/maint-5.8/perl@18279 'copy in' lib/File/Copy.pm (@17645..) lib/utf8_heavy.pl pod/perlsec.pod (@18080..) hints/irix_6.sh (@18173..) t/uni/tr_utf8.t (@18197..) pod/perlunicode.pod (@18242..) t/op/pat.t (@18248..) t/op/split.t (@18274..) 'edit in' pod/perlguts.pod (@18242..) 'merge in' pp.c (@18126..) MANIFEST (@18234..)
| p4raw-integrated: from //depot/maint-5.8/perl@18254 'merge in' pod/perldiag.pod (@18234..)
* Re: [perl #17951] Strange UTF error | Jarkko Hietaniemi | 2002-10-20 | 1 | -2/+4
| Message-ID: <20021016155051.GB268437@lyta.hut.fi>
| p4raw-id: //depot/perl@18035
* perl #17453 | Jarkko Hietaniemi | 2002-09-26 | 1 | -15/+14
| Message-ID: <20020920142245.GG280265@lyta.hut.fi>
| p4raw-id: //depot/perl@17933
* Integrate #16353 from macperl; | Jarkko Hietaniemi | 2002-05-02 | 1 | -2/+2
| "fix" for utf8_heavy.pl, lexical UTF8 var crashed in test 92 of run/fresh_perl.t on MacOS (as pudge rightfully points out, this is voodoo programming at its best, the real bug is somewhere else, now we just happened to shake the chicken the right way)
| p4raw-id: //depot/perl@16355
| p4raw-integrated: from //depot/macperl@16354 'merge in' lib/utf8_heavy.pl (@16123..)
* Re: Encode, charnames and utf8heavy | Dan Kogai | 2002-05-02 | 1 | -1/+1
| Message-Id: <539D985A-5D1A-11D6-BB19-00039301D480@dan.co.jp>
| (plus a respective perlunicode tweak)
| p4raw-id: //depot/perl@16354
* Make writing user-defined character properties nicer. | Jarkko Hietaniemi | 2002-04-21 | 1 | -1/+7
| p4raw-id: //depot/perl@16054
* User-defined character properties were unintentionally | Jarkko Hietaniemi | 2002-04-20 | 1 | -13/+36
| removed, noticed by Dan Kogai.
| p4raw-id: //depot/perl@16012
* A little bit better error message for \pq, still | Jarkko Hietaniemi | 2002-03-28 | 1 | -1/+3
| not good because the script context is not shown.
| p4raw-id: //depot/perl@15581
* Jeffrey's Unicode adventure continues: unify the In/*.pl | Jarkko Hietaniemi | 2002-01-16 | 1 | -131/+58
| and Is/*.pl to lib/*.pl, remove In.pl and Is.pl, introduce Canonical.pl and Exact.pl.
| p4raw-id: //depot/perl@14294
* Additional utf8_heavy.pl tweak from Jeffrey. | Jarkko Hietaniemi | 2002-01-15 | 1 | -4/+11
| p4raw-id: //depot/perl@14272
* Big mktables rewrite from Jeffrey; | Jarkko Hietaniemi | 2002-01-14 | 1 | -84/+170
| documentation not yet updated.
| p4raw-id: //depot/perl@14254
* Future-proofing from Jeffrey Friedl (for conflicting | Jarkko Hietaniemi | 2002-01-13 | 1 | -2/+2
| In* and Is* names).
| p4raw-id: //depot/perl@14242
* RESENT - [PATCH] utf8_heavy.pl | Jeffrey Friedl | 2001-12-16 | 1 | -2/+2
| Message-Id: <200112160355.fBG3t1t84835@ventrue.corp.yahoo.com>
| p4raw-id: //depot/perl@13710
* Support \p{All}, \p{IsAssigned}, \p{IsUnassigned}. | Jarkko Hietaniemi | 2001-12-15 | 1 | -0/+2
| p4raw-id: //depot/perl@13706
* Unicode categories continue: | Jarkko Hietaniemi | 2001-10-19 | 1 | -6/+13
| implement Category=, Script=, Block= (these are based on an upcoming update of TR#18)
| Fix a bug where we got two In categories named "old italic", and another where shortcut for the Is categories wasn't taken.
| p4raw-id: //depot/perl@12500
* Document the problem with the swash_fetch() API that affects | Jarkko Hietaniemi | 2001-10-16 | 1 | -0/+1
| more complex case conversions.
| p4raw-id: //depot/perl@12450
* Rewrite mktables from scratch. | Jarkko Hietaniemi | 2001-10-13 | 1 | -29/+64
| - Cleaner.
| - Faster: 15-20 seconds as opposed to several minutes.
| - More dynamic: the names of the various categories such as the linebreak ones are dynamic, not static.
| - Is.pl: long names for the general category properties are now available.
| - Ranges (<... ,First>, <..., Last>) from the general categories work now.
| - No more mktables.PL because the mktables.PL is not and never has been run to create a mktables.
| - syllables.txt and Is/Syl*.pl removed: non-standard (not part of the Unicode), and the whole concept is being reworked (http://syllabary.sourceforge.net/), the old way wouldn't even work with the new Syllables.txt (it would result in 1000+ new categories)
| p4raw-id: //depot/perl@12427
* Enable more debugging. | Jarkko Hietaniemi | 2001-10-09 | 1 | -5/+5
| p4raw-id: //depot/perl@12373
* Unicode properties saga continues. | Jarkko Hietaniemi | 2001-10-04 | 1 | -1/+1
| p4raw-id: //depot/perl@12335
* Yet more Unicode properties. | Jarkko Hietaniemi | 2001-10-04 | 1 | -3/+4
| p4raw-id: //depot/perl@12334
* Unicode properties: fix L& (the #12319 didn't allow L&, | Jarkko Hietaniemi | 2001-10-03 | 1 | -2/+2
| only IsL&) and Inherited (negative lookahead good); add tests for Common, Inherited, and L&.
| p4raw-id: //depot/perl@12320
* Unicode properties: support \p{(?:Is)?L&} as an alias for \pL. | Jarkko Hietaniemi | 2001-10-03 | 1 | -7/+8
| (The Unicode standard uses L& quite often.)
| p4raw-id: //depot/perl@12319
* Further tweaks to the Unicode properties. | Jarkko Hietaniemi | 2001-10-01 | 1 | -0/+1
| p4raw-id: //depot/perl@12286
* Cleanup utf8_heavy; allow dropping the In prefix from | Jarkko Hietaniemi | 2001-09-30 | 1 | -41/+43
| Unicode script/block properties.
| p4raw-id: //depot/perl@12281
* #12272 wasn't right, it introduced an extra (). | Jarkko Hietaniemi | 2001-09-30 | 1 | -1/+1
| p4raw-id: //depot/perl@12278
* Nasty recursion trap if one would match Unicode. | Jarkko Hietaniemi | 2001-09-29 | 1 | -1/+1
| p4raw-id: //depot/perl@12272
* More leniency to the \p and \P: now can have whitespace | Jarkko Hietaniemi | 2001-09-29 | 1 | -1/+1
| between the property definition and the curlies; now can invert the property by having a caret between the open curly and the property.
| p4raw-id: //depot/perl@12269
* Allow for more flexibility in the \p{In...} names, now | Jarkko Hietaniemi | 2001-09-29 | 1 | -4/+13
| case doesn't matter, and any space or dash can be matched by any space, dash, underbar, or empty. (may be going too far on leniency)
| p4raw-id: //depot/perl@12264
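One way to picture the lenient name matching is a normalizer that reduces every spelling to the same key. This is a sketch under assumptions: loose_key and %block_by_key are hypothetical names, and the commit's own code may match names differently (for example with a generated pattern rather than a normalized key).

```perl
use strict;
use warnings;

# loose_key() is a hypothetical normalizer, not the code from this commit.
sub loose_key {
    my ($name) = @_;
    my $key = lc $name;         # case doesn't matter
    $key =~ s/[\s_-]+//g;       # spaces, underbars, and dashes are ignored
    return $key;
}

# Illustrative lookup table keyed by the normalized block name.
my %block_by_key = (loose_key('Old Italic') => 'Old Italic');

for my $spelling ('old-italic', 'OLD_ITALIC', 'OldItalic', 'old italic') {
    my $found = $block_by_key{ loose_key($spelling) } // 'no match';
    print "$spelling => $found\n";
}
```

Normalizing both the stored name and the requested name keeps the comparison a plain hash lookup.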
* Rename lib/unicode files to lib/unicore to avoid | Jarkko Hietaniemi | 2001-08-09 | 1 | -1/+1
| conflicts between core lib/unicode and Unicode:: files in case-ignoring filesystems.
| p4raw-id: //depot/perl@11623