path: root/regcharclass.h
Commit message | Author | Age | Files | Lines
* fix incorrect vi filetype declarations in generated files | Lukas Mai | 2023-03-24 | 1 | -1/+1
    Vim's filetype declarations are case sensitive. The correct types for Perl, C, and Pod are perl, c, and pod, respectively.
* manually triggered generated files - add file type data to modeline | Yves Orton | 2023-02-19 | 1 | -2/+2
    so that GitHub syntax-highlights them properly
* mktables: flush output immediately when debugging | Karl Williamson | 2022-09-28 | 1 | -1/+1
    Helps disentangle mixed up output
* Support Unicode 15.0 | Unicode Consortium | 2022-09-28 | 1 | -50/+50
* mktables: Skip some new 15.0 files | Karl Williamson | 2022-09-28 | 1 | -1/+1
    These are newly delivered by Unicode. I haven't had time to analyze them for use for potential new properties. They deal with security issues of characters that look alike. I'm not adding them to the list of files under git, but they are explicitly mentioned in mktables to indicate their not being used.
* mktables: Skip some code for Unicode 15 | Karl Williamson | 2022-09-28 | 1 | -1/+1
    As it becomes obsolete
* mktables: Revise Version line search in inputs | Karl Williamson | 2022-09-28 | 1 | -1/+1
    Unicode 15.0 is revising the heading format for non-UCD files; fix mktables to be able to parse that.
* mktables: Accept multiple @missing lines in input files | Karl Williamson | 2022-09-28 | 1 | -1/+1
    Unicode 15.0 will now use this approach to deal with ranges of code points that have a different default for unassigned code points than the table at large. For example, a table may have one default, but all Ideographic character ranges have something else. Prior to this new mechanism, the files had entries for each unassigned code point that had a different default than the global one. So this saves some lines in the files that Unicode delivers that were otherwise useless. Not all files in 15.0 have been converted to use the new scheme, for whatever reason.
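To make the range-specific defaults concrete, here is a minimal C sketch of reading `@missing` lines and resolving the default for a code point. It is not mktables' code (mktables is Perl). The struct and function names are hypothetical, only the simple `# @missing: START..END; VALUE` form is handled, and the rule that a later line overrides an earlier one is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one "@missing" line (not mktables' own structure). */
struct missing_line {
    unsigned long start;   /* first code point of the range */
    unsigned long end;     /* last code point of the range  */
    char value[64];        /* default property value        */
};

/* Parse a line such as "# @missing: 0000..10FFFF; NA".
 * Returns 1 on success, 0 if the line does not have that shape. */
static int parse_missing(const char *line, struct missing_line *out)
{
    int n = sscanf(line, "# @missing: %lx..%lx; %63[^;#\n]",
                   &out->start, &out->end, out->value);
    if (n != 3)
        return 0;

    /* Trim trailing blanks left in front of a comment or line end. */
    size_t len = strlen(out->value);
    while (len > 0 && (out->value[len - 1] == ' ' || out->value[len - 1] == '\t'))
        out->value[--len] = '\0';
    return 1;
}

/* Resolve the default for cp. Assumption: when several ranges contain cp,
 * the line that appears later in the file wins. Returns NULL if none match. */
static const char *default_for(unsigned long cp,
                               const struct missing_line *lines, size_t n)
{
    const char *value = NULL;
    for (size_t i = 0; i < n; i++)
        if (cp >= lines[i].start && cp <= lines[i].end)
            value = lines[i].value;
    return value;
}
```

With a table-wide line followed by a narrower one, `default_for` returns the narrower value inside the narrower range and the table-wide value everywhere else.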
* mktables: Multi_Default now accepts multiple defaults per property | Karl Williamson | 2022-09-28 | 1 | -1/+1
    Unicode 15.0 may have multiple @missing lines for a single property that should use this class. This commit converts the storage into an array to accommodate that need.
* mktables: Add two methods to Multi_Default class | Karl Williamson | 2022-09-28 | 1 | -1/+1
    These are so that you don't have to know everything at construction time. The constructor function changes to call these with whatever it does get passed.
* mktables: More closely examine @missing lines | Karl Williamson | 2022-09-28 | 1 | -1/+1
    These lines have all had the same range (all of Unicode). But in Unicode 15.0, there will be some with different ranges. This commit changes to save those values (which are currently still unused).
* mktables: Standardize value aliases | Karl Williamson | 2022-09-28 | 1 | -1/+1
    As stated in the code comments added by this commit, Unicode has various spellings for the same property value. For example in some places it uses 'W', and in others 'Wide'. The legal spellings are listed in PropValueAliases.txt, which is processed early in the construction. So we can standardize things on input, which makes it easier later. This commit produces minimal changes in the generated tables, so that the algorithm can be verified by inspection of the results. And no other code that has hard-coded in expected spellings needs to be changed. Prior to this commit, we standardized the default value for properties that have a default value.
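The idea of standardizing spellings on input can be sketched in C as a small alias lookup. This is only an illustration: the table holds a few East_Asian_Width short/long pairs, the names `value_alias` and `standardize` are hypothetical, and the choice of the long name as the canonical spelling is an assumption of the sketch, not something the commit message states. Matching here is exact, which is also a simplification.

```c
#include <stddef.h>
#include <string.h>

/* One short/long spelling pair for a property value (hypothetical type). */
struct value_alias {
    const char *short_name;
    const char *long_name;
};

/* A few East_Asian_Width pairs, in the spirit of PropValueAliases.txt. */
static const struct value_alias ea_aliases[] = {
    { "A",  "Ambiguous" },
    { "F",  "Fullwidth" },
    { "H",  "Halfwidth" },
    { "N",  "Neutral"   },
    { "Na", "Narrow"    },
    { "W",  "Wide"      },
};

/* Map any listed spelling to one canonical spelling (assumed here to be
 * the long name). Returns NULL for a spelling that is not in the table. */
static const char *standardize(const char *spelling)
{
    size_t count = sizeof ea_aliases / sizeof ea_aliases[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(spelling, ea_aliases[i].short_name) == 0 ||
            strcmp(spelling, ea_aliases[i].long_name) == 0)
            return ea_aliases[i].long_name;
    }
    return NULL;
}
```

Once every input value has passed through such a function, later code only ever has to compare against one spelling.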
* mktables: Add/Fix comments, white-space | Karl Williamson | 2022-09-28 | 1 | -1/+1
    This includes indenting a block of code in anticipation of a future commit which will form a conditional block around it
* mktables: Use intermed variable to shorten name | Karl Williamson | 2022-09-28 | 1 | -1/+1
    This changes an inside-out hash reference to have a shorthand for it, making for better readability
* mktables: Convert array to hash | Karl Williamson | 2022-09-28 | 1 | -1/+1
    Prior to this commit we had a two element array, and it was known that element 0 contained a particular thing; and element 1 contained the other. But a future commit will add several elements, so keeping track of which is which will become more problematic. Solve this by using a hash instead, with the elements appropriately named.
* mktables: Fix some function signatures | Karl Williamson | 2022-09-28 | 1 | -1/+1
    These functions were missed or broken by 4fe9356b250. They're used only in debugging, so it wasn't noticed until now.
* mktables: Reorder some code | Karl Williamson | 2022-09-28 | 1 | -1/+1
    This moves some code ahead of other code so that the end of the sub all works on a single related issue. This is in preparation for 15.0, where that issue becomes moot, so we can then change to return early from the sub.
* mktables: Stop infinite loop with invalid input | Karl Williamson | 2022-09-28 | 1 | -1/+1
    This failed to exit when the file handle was exhausted.
* mktables: Don't generate pod for Name.pm | Karl Williamson | 2022-07-02 | 1 | -1/+1
    This is a relic from long ago. mktables creates lib/unicore/Name.pm. In that file, which is for internal core use only, it was creating the beginnings of some pod, but quite incomplete; this was confusing buildtoc, which perhaps could be hardened against such inputs.
* Update checksums in some generated files | Karl Williamson | 2022-06-06 | 1 | -0/+1
    These use checksums to see if the generated data could be out of date. The new NormTest.pl wasn't counted in this, and needn't be, but excluding it and other similar ones is more trouble than it's worth, so make a comment to that effect and update to include the NormTest.pl digest value.
* Bump \p{nv=} precision from 2 to 3 | Karl Williamson | 2022-04-12 | 1 | -1/+1
    This closes #19603. Unicode has various characters whose numeric value is rational non-integer. These can be specified in \p{nv=...} constructs either by the rational form or by the value it evaluates to. The number of significant digits that must match is kept to a minimum to allow for variances in different platforms' floating-point lengths and rounding decisions. Previously that number was 2 digits, but that is no longer always sufficient for all platforms. This commit changes it to 3.
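As an illustration of what matching to a fixed number of significant digits means, the following C sketch compares two doubles after rounding both to the same number of significant digits. It is a stand-in technique (formatting with `%e` and comparing the text), not the comparison Perl itself performs, and `same_to_digits` is a hypothetical name. It assumes `digits` is at least 1.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if a and b print identically when each is rounded to `digits`
 * significant digits, 0 otherwise. "%.*e" with precision digits-1 prints
 * one digit before the decimal point and digits-1 after it. */
static int same_to_digits(double a, double b, int digits)
{
    char sa[32], sb[32];
    snprintf(sa, sizeof sa, "%.*e", digits - 1, a);
    snprintf(sb, sizeof sb, "%.*e", digits - 1, b);
    return strcmp(sa, sb) == 0;
}
```

Under this sketch, 0.33 is accepted as one third at 2 digits but rejected at 3, while 0.333 is accepted at 3.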
* Add is_XPERLSPACE_utf8_safe_backwards() | Karl Williamson | 2022-03-19 | 1 | -1/+88
    This macro starts from the right side and matches UTF-8 white space characters.
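A hand-written C sketch of the same idea follows. It is not the generated macro: the function name, its arguments, and its return convention (byte length of the matched character, or 0) are assumptions made for illustration, and it covers ASCII/UTF-8 platforms only. It inspects the bytes that end just before `e` and never reads before `s`.

```c
#include <stddef.h>

/* Return the byte length (1 to 3) of the white space character whose UTF-8
 * encoding ends just before e, or 0 if there is none. Never reads before s.
 * Character set: \t \n \v \f \r, space, U+0085, U+00A0, U+1680,
 * U+2000..U+200A, U+2028, U+2029, U+202F, U+205F, U+3000. */
static size_t xperlspace_len_backwards(const unsigned char *s,
                                       const unsigned char *e)
{
    if (e <= s)
        return 0;

    unsigned char last = e[-1];

    /* One byte: ASCII control white space and space. */
    if ((last >= 0x09 && last <= 0x0D) || last == 0x20)
        return 1;

    /* Two bytes: U+0085 (C2 85) and U+00A0 (C2 A0). */
    if (e - s >= 2 && e[-2] == 0xC2 && (last == 0x85 || last == 0xA0))
        return 2;

    /* Three bytes. */
    if (e - s >= 3) {
        unsigned char b0 = e[-3], b1 = e[-2];

        if (b0 == 0xE1 && b1 == 0x9A && last == 0x80)      /* U+1680 */
            return 3;
        if (b0 == 0xE2 && b1 == 0x80 &&
            ((last >= 0x80 && last <= 0x8A) ||             /* U+2000..U+200A */
             last == 0xA8 || last == 0xA9 ||               /* U+2028, U+2029 */
             last == 0xAF))                                /* U+202F */
            return 3;
        if (b0 == 0xE2 && b1 == 0x81 && last == 0x9F)      /* U+205F */
            return 3;
        if (b0 == 0xE3 && b1 == 0x80 && last == 0x80)      /* U+3000 */
            return 3;
    }
    return 0;
}
```

A caller trimming trailing white space could call this repeatedly, moving `e` left by the returned length until it returns 0.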
* Remove 'no warnings experimental::signatures' from support files | Paul "LeoNerd" Evans | 2022-02-20 | 1 | -1/+1
* Fix lib/unicore/mktables for experimental::builtin warnings | Paul "LeoNerd" Evans | 2022-01-25 | 1 | -1/+1
* Remove remaining uses of @_ in signatured subs in lib/unicore/mktables | Paul "LeoNerd" Evans | 2022-01-24 | 1 | -1/+1
* Add missing aliases for \p{Present_In} | Karl Williamson | 2022-01-05 | 1 | -1/+1
    \p{Present_In} is a Perl extension of the Unicode Age property, added because knowing the exact Unicode version in which a code point became assigned is rarely what you want; much more frequently you want to know if the code point exists in the version or not. (Since this extension was added, Unicode changed their language to declare that the Age property should be interpreted in pattern matching, not as described, but as Perl's Present_In is. But I chose to not change Age, to avoid backwards compatibility issues, and this way, a coder can choose which thing s/he wanted.) Unicode typically has synonyms (aliases) for each value a property can take on, so \p{Age=6.1} and \p{Age=V6_1} mean the same thing. Prior to this commit, neither \p{Present_In=1_1} nor \p{Present_In=NA} worked.
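The difference between the two properties can be sketched in C as two predicates over version numbers. The types and function names are hypothetical and the sketch ignores unassigned code points: Age asks whether a code point was assigned in exactly the given version, while Present_In asks whether it was assigned in that version or any earlier one.

```c
/* A Unicode version such as 6.1 (hypothetical type for this sketch). */
struct uversion {
    int major;
    int minor;
};

/* Negative, zero, or positive as a is older than, equal to, or newer than b. */
static int version_cmp(struct uversion a, struct uversion b)
{
    if (a.major != b.major)
        return a.major < b.major ? -1 : 1;
    if (a.minor != b.minor)
        return a.minor < b.minor ? -1 : 1;
    return 0;
}

/* Age: the code point was first assigned in exactly `wanted`. */
static int matches_age(struct uversion assigned, struct uversion wanted)
{
    return version_cmp(assigned, wanted) == 0;
}

/* Present_In: the code point already exists in `wanted`, that is, it was
 * first assigned in `wanted` or in any earlier version. */
static int matches_present_in(struct uversion assigned, struct uversion wanted)
{
    return version_cmp(assigned, wanted) <= 0;
}
```

A code point first assigned in 5.2 therefore fails the Age test for 6.1 but passes the Present_In test for 6.1.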
* mktables: Use builtin::refaddr | Karl Williamson | 2021-12-13 | 1 | -1/+1
    Now that this function is available in miniperl, mktables can use it to avoid a bunch of visually distracting 'no overloading' calls.
* mktables: Don't calculate some unused values | Karl Williamson | 2021-12-13 | 1 | -1/+1
    These apparently were once needed, but no longer.
* mktables: Use mnemonic variable names | Karl Williamson | 2021-12-07 | 1 | -1/+1
    Spotted by Dagfinn Ilmari Mannsåker
* Fix unicore/mktables to avoid any @_ accesses in signatured subs | Paul "LeoNerd" Evans | 2021-12-07 | 1 | -1/+1
* mktables: Remove relics of removed legacy tables | Karl Williamson | 2021-09-15 | 1 | -1/+1
    These mentions of the tables removed in b852e1da77b497e086508451bebff00541073fb1 were missed in that commit.
* Support Unicode 14.0 | Unicode Consortium | 2021-09-15 | 1 | -52/+52
* mktables: Split a Line Break equivalence class | Karl Williamson | 2021-09-15 | 1 | -1/+1
    This is used for \b{lb}, and the rule is changing in Unicode 14.0
* mktables: Reorder some comments, white-space | Karl Williamson | 2021-09-15 | 1 | -1/+1
    Move comments closer to the action
* mktables: Rename variable, and hoist calc from loop | Karl Williamson | 2021-09-15 | 1 | -1/+1
* Unicode::UCD: Don't depend on a file current syntax | Karl Williamson | 2021-08-31 | 1 | -1/+1
    This generated file will be changed in a future commit. This shouldn't have been relying on its syntax anyway, but on the value it returns.
* Unicode::UCD: Fix typo in pod | Karl Williamson | 2021-08-31 | 1 | -1/+1
* Remove deprecated Unicode files | Karl Williamson | 2021-09-01 | 1 | -1/+1
    These files were once apparently intended for use by modules to supplement the core Unicode handling. They contain tables, suitable for use by Perl code, of the portions of the Unicode character database about changing the case of characters and finding the numeric value of a given \d character. In particular, they were designed for fast access using the swash mechanism that has since been removed. Unicode::UCD now contains more convenient methods of accessing the data these contain, and the use of these files has been deprecated since 5.16. I could not figure out a way to force a message should someone open and read one of these files, but each of their texts says that the file may be removed without notice at any time. I did not find any uses of them on CPAN. Unicode is adding new properties that the format of these files will not be able to handle. Consequently I'm coming up with a new format. Though these files don't contain the new properties, their existence means having the burden of having to maintain two separate mechanisms. Better to have just one mechanism, suitable for going forward.
* mktables: Generate =head1 NAME line in Name.pm | Karl Williamson | 2021-08-15 | 1 | -1/+1
    All .pm files are supposed to have this line. So far this hasn't been necessary for this file, but future commits will require it.
* lib/unicore/mktables: correct sub signatures in 2 locations | James E Keenan | 2021-08-14 | 1 | -1/+1
    Then, re-run regen/mk_invlists.pl and regen/regcharclass.pl and commit changes in headers.
* utf8.c: Rmv an EBCDIC dependency | Karl Williamson | 2021-08-14 | 1 | -1/+1
    This is now generated by regcharclass.pl
* mktables: Change "null string" to "empty string" | Karl Williamson | 2021-08-11 | 1 | -1/+1
    The latter phrase makes more sense
* mktables: Add, fix comments | Karl Williamson | 2021-08-11 | 1 | -1/+1
* mktables: Fix debugging issues | Karl Williamson | 2021-08-11 | 1 | -1/+1
    Commit 4fe9356b250 changed the signatures on subroutines, and didn't do these correctly. The result was that perl would croak when using the mktables debugging facility.
* mktables: Fix table output | Karl Williamson | 2021-08-09 | 1 | -1/+1
    Commit 4fe9356b250 changed the signatures on subroutines, and didn't do this one correctly. The result was that the comments in the generated files had duplicate text and were slightly garbled.
* regcharclass.pl: Add fast surrogate UTF-8 trie | Karl Williamson | 2021-08-07 | 1 | -1/+13
    This will be used in the next commit. It requires only the first two bytes to determine if a UTF-8 or UTF-EBCDIC sequence is for a surrogate.
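Why two bytes are enough can be seen on the UTF-8 side: the surrogates U+D800..U+DFFF encode as ED A0 80 through ED BF BF, so the lead byte and the first continuation byte already decide the question. The C sketch below shows that test. It is hand-written for ASCII/UTF-8 platforms only, the function name is hypothetical, and it does not cover the UTF-EBCDIC case the generated code also handles.

```c
#include <stddef.h>

/* Return 1 if the UTF-8 sequence starting at s (and ending before e)
 * begins a surrogate code point, judging by its first two bytes only. */
static int is_surrogate_utf8(const unsigned char *s, const unsigned char *e)
{
    return (e - s) >= 2
        && s[0] == 0xED                 /* lead byte shared by U+D000..U+DFFF */
        && s[1] >= 0xA0 && s[1] <= 0xBF; /* second byte selects U+D800..U+DFFF */
}
```

The neighbouring code point U+D7FF encodes as ED 9F BF, so it has the same lead byte but falls outside the second-byte range and is correctly rejected.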
* regcharclass.pl: Further improve EBCDIC code | Karl Williamson | 2021-08-07 | 1 | -23/+23
    A commit a couple of commits ago improved the generated output of this script. This builds on that. The improvements were to try a transform that could lead to fewer conditionals, as bytes were grouped in fewer ranges. But that introduced a useless transformation for the single element ranges that remain. This commit removes the transformation if not needed.
* regcharclass.pl: Make 2 locals into global hashes | Karl Williamson | 2021-08-07 | 1 | -1/+1
    This is in preparation for a future commit
* regcharclass.pl: Improve generated code for EBCDIC | Karl Williamson | 2021-08-07 | 1 | -151/+139
    UTF-8 has some desirable characteristics not shared by UTF-EBCDIC. One example is all the continuation bytes are in a single range. By transforming a UTF-EBCDIC byte into I8 (similar to UTF-8), we gain those characteristics, and may be able to save a conditional or three. This commit creates a 2nd pass over the bytes that are to be matched, transforming them into I8. If that pass results in fewer conditionals than the traditional, native, generated code, use the result with fewer. This saves quite a bit in some of the generated code, enabling the quotemeta macro to be represented in a single part; previously it had to be split to avoid compiler macro size limits.
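One way to picture the comparison between the two passes is to count how many contiguous ranges a set of bytes falls into, on the assumption that each range costs roughly one range test in the generated code. The C sketch below computes that count for a 256-entry membership table. The function name and the cost model are assumptions of this sketch; the commit message does not say how regcharclass.pl actually measures conditionals.

```c
/* Count the contiguous runs of set entries in a 256-entry membership table.
 * member[b] is nonzero when byte value b belongs to the set being matched. */
static int count_ranges(const unsigned char member[256])
{
    int ranges = 0;
    int in_range = 0;

    for (int b = 0; b < 256; b++) {
        if (member[b] && !in_range)
            ranges++;                 /* a new run starts at this byte */
        in_range = member[b] != 0;
    }
    return ranges;
}
```

The UTF-8 continuation bytes 0x80..0xBF give a count of 1, whereas a set of bytes scattered across the code page gives a higher count; a generator could build the table both ways and keep whichever representation yields the lower number.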
* regcharclass.pl: White-space comment only | Karl Williamson | 2021-08-07 | 1 | -1/+1
    A future commit will put a block around this; indent now.