A code pattern `p + enclen(enc, p, pend)` may lead to a buffer overrun
if an incomplete UTF-8 character (a truncated byte sequence) is placed at
the end of a string. Because this pattern is used in several places in
Onigmo, this change fixes the issue on the `enclen` side: the function
should never return a number larger than `pend - p`.
Co-Authored-By: Nobuyoshi Nakada <nobu@ruby-lang.org>
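The idea of fixing the overrun inside `enclen` can be sketched as a clamp on the computed length. This is a minimal, UTF-8-only illustration with hypothetical names (`utf8_len_from_lead`, `clamped_enclen`, `count_chars`); it is not Onigmo's actual `enclen`, and it assumes the caller guarantees `p < pend`.

```c
#include <stddef.h>

/* Hypothetical helper: expected UTF-8 sequence length from the lead byte.
 * Continuation or invalid lead bytes are treated as length 1. */
static int utf8_len_from_lead(unsigned char c)
{
    if (c < 0x80) return 1;           /* 0xxxxxxx */
    if ((c & 0xE0) == 0xC0) return 2; /* 110xxxxx */
    if ((c & 0xF0) == 0xE0) return 3; /* 1110xxxx */
    if ((c & 0xF8) == 0xF0) return 4; /* 11110xxx */
    return 1;
}

/* Sketch of a clamped enclen (assumes p < pend): never report more bytes
 * than remain in [p, pend), so `p + len` cannot step past the end. */
static int clamped_enclen(const unsigned char *p, const unsigned char *pend)
{
    int len = utf8_len_from_lead(*p);
    ptrdiff_t rest = pend - p;
    return len > rest ? (int)rest : len;
}

/* Typical caller pattern: advance by the character length until pend.
 * With the clamp, p lands exactly on pend even for a truncated tail. */
static size_t count_chars(const unsigned char *p, const unsigned char *pend)
{
    size_t n = 0;
    while (p < pend) {
        p += clamped_enclen(p, pend);
        n++;
    }
    return n;
}
```

Putting the clamp in the length function, rather than at each call site, means every `p + enclen(...)` loop stays in bounds without being rewritten, which matches the reasoning in the commit message.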
A regexp that ends with an escaped, incomplete UTF-8 character (a
backslash followed by a truncated multibyte sequence) might cause a
buffer overrun. Found by OSS-Fuzz.
```
$ valgrind ./miniruby -e 'Regexp.new("\\u2d73\\0\\0\\0\\0 \\\xE6".b)'
==296213== Memcheck, a memory error detector
==296213== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==296213== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==296213== Command: ./miniruby -e Regexp.new("\\\\u2d73\\\\0\\\\0\\\\0\\\\0\ \ \ \ \ \ \ \ \ \ \\\\\\xE6".b)
==296213==
==296213== Warning: client switching stacks? SP change: 0x1ffe8020e0 --> 0x1ffeffff10
==296213== to suppress, use: --max-stackframe=8379952 or greater
==296213== Invalid read of size 1
==296213== at 0x484EA10: memmove (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==296213== by 0x339568: memcpy (string_fortified.h:29)
==296213== by 0x339568: onig_strcpy (regparse.c:271)
==296213== by 0x339568: onig_node_str_cat (regparse.c:1413)
==296213== by 0x33CBA0: parse_exp (regparse.c:6198)
==296213== by 0x33EDE4: parse_branch (regparse.c:6511)
==296213== by 0x33EEA2: parse_subexp (regparse.c:6544)
==296213== by 0x34019C: parse_regexp (regparse.c:6593)
==296213== by 0x34019C: onig_parse_make_tree (regparse.c:6638)
==296213== by 0x32782D: onig_compile_ruby (regcomp.c:5779)
==296213== by 0x313EFA: onig_new_with_source (re.c:876)
==296213== by 0x313EFA: make_regexp (re.c:900)
==296213== by 0x313EFA: rb_reg_initialize (re.c:3136)
==296213== by 0x318555: rb_reg_initialize_str (re.c:3170)
==296213== by 0x318555: rb_reg_init_str (re.c:3205)
==296213== by 0x31A669: rb_reg_initialize_m (re.c:3856)
==296213== by 0x3E5165: vm_call0_cfunc_with_frame (vm_eval.c:150)
==296213== by 0x3E5165: vm_call0_cfunc (vm_eval.c:164)
==296213== by 0x3E5165: vm_call0_body (vm_eval.c:210)
==296213== by 0x3E89BD: vm_call0_cc (vm_eval.c:87)
==296213== by 0x3E89BD: rb_call0 (vm_eval.c:551)
==296213== Address 0x9d45b10 is 0 bytes after a block of size 32 alloc'd
==296213== at 0x4844899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==296213== by 0x20FA7B: objspace_xmalloc0 (gc.c:12146)
==296213== by 0x35F8C9: str_buf_cat4.part.0 (string.c:3132)
==296213== by 0x31359D: unescape_escaped_nonascii (re.c:2690)
==296213== by 0x313A9D: unescape_nonascii (re.c:2869)
==296213== by 0x313A9D: rb_reg_preprocess (re.c:2992)
==296213== by 0x313DFC: rb_reg_initialize (re.c:3109)
==296213== by 0x318555: rb_reg_initialize_str (re.c:3170)
==296213== by 0x318555: rb_reg_init_str (re.c:3205)
==296213== by 0x31A669: rb_reg_initialize_m (re.c:3856)
==296213== by 0x3E5165: vm_call0_cfunc_with_frame (vm_eval.c:150)
==296213== by 0x3E5165: vm_call0_cfunc (vm_eval.c:164)
==296213== by 0x3E5165: vm_call0_body (vm_eval.c:210)
==296213== by 0x3E89BD: vm_call0_cc (vm_eval.c:87)
==296213== by 0x3E89BD: rb_call0 (vm_eval.c:551)
==296213== by 0x3E957B: rb_call (vm_eval.c:877)
==296213== by 0x3E957B: rb_funcallv_kw (vm_eval.c:1074)
==296213== by 0x2A4123: rb_class_new_instance_pass_kw (object.c:1991)
==296213==
==296213==
==296213== HEAP SUMMARY:
==296213== in use at exit: 35,476,538 bytes in 9,489 blocks
==296213== total heap usage: 14,893 allocs, 5,404 frees, 37,517,821 bytes allocated
==296213==
==296213== LEAK SUMMARY:
==296213== definitely lost: 316,081 bytes in 2,989 blocks
==296213== indirectly lost: 136,808 bytes in 2,361 blocks
==296213== possibly lost: 1,048,624 bytes in 3 blocks
==296213== still reachable: 33,975,025 bytes in 4,136 blocks
==296213== suppressed: 0 bytes in 0 blocks
==296213== Rerun with --leak-check=full to see details of leaked memory
==296213==
==296213== For lists of detected and suppressed errors, rerun with: -s
==296213== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
```
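In the trace above, the invalid read is "0 bytes after a block of size 32", i.e. one byte past the end of the buffer, triggered by the trailing `\` + `0xE6`. A parser could detect such a truncated tail before trusting the length implied by the lead byte. The sketch below is a hypothetical check (`ends_with_incomplete_utf8` is not a Ruby or Onigmo function) that only covers UTF-8.

```c
#include <stddef.h>

/* Hypothetical check: does the buffer end in the middle of a UTF-8
 * sequence, e.g. a lone 0xE6 lead byte with no continuation bytes? */
static int ends_with_incomplete_utf8(const unsigned char *s, size_t n)
{
    size_t i = n, back = 0;
    unsigned char lead;
    size_t need;

    /* Step back over at most 3 trailing continuation bytes (10xxxxxx). */
    while (i > 0 && back < 3 && (s[i - 1] & 0xC0) == 0x80) {
        i--;
        back++;
    }
    if (i == 0) return 0;          /* no lead byte: invalid, not truncated */

    lead = s[i - 1];
    if (lead < 0x80) need = 1;
    else if ((lead & 0xE0) == 0xC0) need = 2;
    else if ((lead & 0xF0) == 0xE0) need = 3;
    else if ((lead & 0xF8) == 0xF0) need = 4;
    else return 0;                 /* invalid lead byte: handled elsewhere */

    /* Truncated if fewer bytes follow the lead than it announces. */
    return (back + 1) < need;
}
```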
* Fix some UBSAN false positives.
* ruby tool/update-deps --fix
http://rubyci.s3.amazonaws.com/arch/ruby-master/log/20220613T030003Z.log.html.gz
```
regparse.c:264:15: warning: array subscript 56 is outside array bounds of ‘Node[1]’ {aka ‘struct _Node[1]’} [-Warray-bounds]
```
and
```
/usr/include/bits/string_fortified.h:29:10: warning: ‘__builtin_memcpy’ pointer overflow between offset 32 and size [9223372036854775792, 9223372036854775807] [-Warray-bounds]
```
Also make the format string compatible with literal strings which
are const arrays of "plain" chars.
Quantifier reduction for `+?)*` and `+?)+` should not be done,
as it changes which text is matched.
This removes the need for the RQ_PQ_Q ReduceType, so remove the
enum entry and related switch case.
Test that these are the only two patterns affected by testing all
quantifier reduction tuples for both the captured and uncaptured
cases and making sure the matched text is the same for both.
Fixes [Bug #17341]
Default to ONIGERR_INVALID_CHAR_PROPERTY_NAME in
fetch_char_property_to_ctype, and only set a different result once the
closing `}` is found.
Fixes [Bug #17340]
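The "pessimistic default" pattern described above can be sketched as follows. The result codes and `fetch_property_name` are hypothetical stand-ins, not Onigmo's real `fetch_char_property_to_ctype`; the point is only that the success value is assigned after the closing `}` has been seen, so an unterminated `\p{Name` falls through to the error.

```c
#include <stddef.h>

/* Hypothetical result codes standing in for Onigmo's error constants. */
enum { PROP_OK = 0, PROP_ERR_INVALID_NAME = -1 };

/* Scan the name in "\p{Name}", starting just after '{'.
 * The result defaults to an error and changes only on the closing '}'. */
static int fetch_property_name(const char *p, const char *end,
                               const char **name, size_t *name_len)
{
    int r = PROP_ERR_INVALID_NAME;   /* pessimistic default */
    const char *start = p;

    while (p < end) {
        if (*p == '}') {
            *name = start;
            *name_len = (size_t)(p - start);
            r = PROP_OK;             /* only now is the name complete */
            break;
        }
        p++;
    }
    return r;                        /* ran off the end: still an error */
}
```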
Fixed misspellings reported at [Bug #16437], only in ruby and rubyspec.
After 5e86b005c0f2ef30df2f9906c7e2f3abefe286a2, I now think ANYARGS is
dangerous and should be extinct. This commit deletes ANYARGS from
st_foreach. I strongly believe that this commit should have come
with b0af0592fdd9e9d4e4b863fde006d67ccefeac21, which added an extra
parameter to st_foreach callbacks.
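Removing ANYARGS means the callback gets a full prototype, so the compiler can reject a callback with the wrong number or types of parameters. The sketch below is loosely modeled on `st_foreach`; `data_t`, `table_foreach`, and `struct pair` are hypothetical names, not Ruby's st.h API.

```c
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t data_t;            /* hypothetical stand-in for st_data_t */

/* Callback return values, loosely modeled on ST_CONTINUE / ST_STOP. */
enum foreach_ret { FOREACH_CONTINUE, FOREACH_STOP };

/* A fully prototyped callback type instead of "any args": every callback
 * must take (key, value, arg), and mismatches are compile-time errors. */
typedef int foreach_callback_func(data_t key, data_t value, data_t arg);

struct pair { data_t key, value; };

static void table_foreach(const struct pair *entries, size_t n,
                          foreach_callback_func *func, data_t arg)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (func(entries[i].key, entries[i].value, arg) == FOREACH_STOP)
            break;
    }
}

/* Example callback: add every value into the long pointed to by arg. */
static int sum_values(data_t key, data_t value, data_t arg)
{
    (void)key;
    *(long *)arg += (long)value;
    return FOREACH_CONTINUE;
}
```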
* string.c (get_reg_grapheme_cluster): make regexp from properly
  encoded sources for wide-char encodings. [Bug #15965]
* regparse.c (node_extended_grapheme_cluster): suppress false
duplicated range warning for the time being.
In regparse.c, in function node_extended_grapheme_cluster,
we used a raw if() with exit(1) as a cross-check for our
length calculations for the common node array. Convert this
to an assertion and comment it out because it is not needed
for active code.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66269 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
In file regparse.c, in function node_extended_grapheme_cluster(),
eliminate code duplication of CRLF and '.' (any character). This
uses the fact that both for Unicode encodings and for non-Unicode
encodings, the first alternative is CRLF, and the last alternative
is '.' (any character). This puts all of the pieces into forward
order (the order of the code follows the order of the syntax
definition).
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66267 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66240 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
regparse.c: In function node_extended_grapheme_cluster(), use function-global
array node_common and use it for list and alternate construction. This is done
so that in case of error, all nodes that have already been constructed can be
correctly freed in a single for loop. Document the layout structure.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66239 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
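The cleanup strategy described above (one function-wide array, initialized to NULL, freed in a single loop on error) can be sketched like this. `Node`, `new_leaf`, `build_nodes`, and the `fail_at` parameter used to simulate an allocation failure are all hypothetical; this is not the regparse.c code.

```c
#include <stdlib.h>

typedef struct Node { int kind; } Node;   /* hypothetical stand-in */

#define NODE_ARRAY_SIZE 4

static Node *new_leaf(void)
{
    return calloc(1, sizeof(Node));
}

/* Every node is created into one array that starts out all NULL.
 * On any failure, a single loop frees whatever has been built so far. */
static int build_nodes(int fail_at, Node **out)
{
    Node *nodes[NODE_ARRAY_SIZE] = { NULL };
    int i;

    for (i = 0; i < NODE_ARRAY_SIZE; i++) {
        nodes[i] = (i == fail_at) ? NULL : new_leaf();  /* simulated OOM */
        if (nodes[i] == NULL)
            goto err;
    }
    for (i = 0; i < NODE_ARRAY_SIZE; i++)
        out[i] = nodes[i];          /* ownership passes to the caller */
    return 0;

err:
    for (i = 0; i < NODE_ARRAY_SIZE; i++)
        free(nodes[i]);             /* free(NULL) is a no-op */
    return -1;
}
```

Because unused slots stay NULL, the error path does not need to know how far construction got.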
In regparse.c:
* Reduce code duplication by merging the almost identical functions
create_sequence_node and create_alternate_node into a new function
create_node_from_array, adding a parameter that distinguishes between
creating a list and creating an alternative.
* Streamline variable/function naming. Unicode UAX #29 uses 'sequence', but
the regular expression library uses 'list' for the same concept. Keep
'sequence' in the comments that are taken from UAX #29, but use 'list'
in variable names.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66234 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
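Merging two near-identical builders into one function with a "kind" parameter can be sketched as below. The `Node` layout, `NodeType`, `new_cons`, and this signature of `create_node_from_array` are hypothetical simplifications; the real regparse.c function differs. The array is NULL-terminated, as with `create_sequence_node()`.

```c
#include <stdlib.h>

/* Hypothetical minimal node: LEAF holds a char; LIST/ALT are cons cells. */
typedef enum { LEAF, LIST, ALT } NodeType;
typedef struct Node {
    NodeType type;
    char ch;                  /* LEAF only */
    struct Node *car, *cdr;   /* LIST/ALT only */
} Node;

static Node *new_cons(NodeType type, Node *car, Node *cdr)
{
    Node *n = calloc(1, sizeof(Node));
    if (n) { n->type = type; n->car = car; n->cdr = cdr; }
    return n;
}

/* One builder for both shapes: kind selects LIST (sequence) or ALT
 * (alternation). On success the array slots are cleared because the
 * tree now owns the nodes; on failure the leaves stay in the array. */
static Node *create_node_from_array(NodeType kind, Node **array)
{
    Node *head = NULL;
    int n = 0, i;

    while (array[n] != NULL) n++;

    /* Build back to front so each new cell points at the one after it. */
    for (i = n - 1; i >= 0; i--) {
        Node *cell = new_cons(kind, array[i], head);
        if (cell == NULL) {
            /* Undo: free only the cons cells built so far. */
            while (head) { Node *next = head->cdr; free(head); head = next; }
            return NULL;
        }
        head = cell;
    }
    for (i = 0; i < n; i++) array[i] = NULL;
    return head;
}
```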
* unicode.c: Remove the arrays onigenc_unicode_GCB_ranges_GAZ,
onigenc_unicode_GCB_ranges_E_Base, and onigenc_unicode_GCB_ranges_Emoji,
because they are not needed anymore for Unicode 11.0.0.
* regparse.c: Remove external declarations for above arrays.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66232 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66218 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66217 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66214 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
- common.mk: Change Unicode version to 11.0.0, and Emoji version to 11.0
- test/ruby/enc/test_emoji_breaks.rb: update hard-coded Emoji version
- enc/unicode/11.0.0, enc/unicode/11.0.0/casefold.h, enc/unicode/name2ctype.h:
Add generated files. Files for Unicode 10.0.0 will be removed once we are
sure 11.0.0 works.
- lib/unicode_normalize/tables.rb: Updated table.
- regparse.c: Almost completely reimplement grapheme cluster detection in
function node_extended_grapheme_cluster().
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66213 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Remove unnecessary settings of node_array elements to NULL_NODE.
We can do this because we initialize the whole array to NULL_NODEs
and set everything again to NULL_NODEs when creating a sequence or
alternative node.
Also, fix an index error in the initialization of node_array.
(issue #15343)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66139 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66138 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66137 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
regparse.c: In function node_extended_grapheme_cluster(), introduce function-global
array node_array and use it for sequence and alternate construction. This is done
so that in case of error, all nodes that have already been constructed can be
correctly freed. (issue #15343)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66135 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66132 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66123 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66122 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Remove code that tries to remove CR and LF from Grapheme_Cluster_Break=Control.
This code is unnecessary because Grapheme_Cluster_Break=Control already excludes
CR and LF.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66116 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66115 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Introduce new function create_alternate_node() to create an alternative node
from a list of nodes in one go. Use it once (two more uses expected).
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66114 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66113 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66072 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66071 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Four more uses of create_sequence_node() in node_extended_grapheme_cluster
(a few more to come).
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66070 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
One more use of create_sequence_node() in node_extended_grapheme_cluster
(several more to come).
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66063 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Introduce a new preprocessor macro R_ERR to visually reduce repetitive code
checking for return values and going to the err: label at the end of the
function node_extended_grapheme_cluster().
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66057 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
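A macro of the kind described (store a return value, jump to the shared `err:` label on failure) might look like the sketch below. This definition of `R_ERR` and the helpers `step` and `run_steps` are hypothetical illustrations, not copied from regparse.c; the function using the macro must declare `r` and an `err:` label.

```c
/* Hypothetical R_ERR-style macro: evaluate a call, store its result in
 * the enclosing function's `r`, and jump to `err:` if it is non-zero. */
#define R_ERR(call) do { r = (call); if (r != 0) goto err; } while (0)

/* Hypothetical step: fails with -1 on negative input. */
static int step(int value)
{
    return value < 0 ? -1 : 0;
}

static int run_steps(int a, int b, int c)
{
    int r;

    R_ERR(step(a));
    R_ERR(step(b));
    R_ERR(step(c));
    return 0;

err:
    /* shared cleanup for all failure paths would go here */
    return r;
}
```

The `do { ... } while (0)` wrapper keeps the macro safe to use as a single statement, for example in an unbraced `if`.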
There are only four patterns of the last two arguments to quantify_property_node().
By replacing the lower/upper arguments with a single char, we get more expressive
calls, the last argument directly corresponding to the quantifier that we want to
use (except for '2', which means exactly two).
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66052 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
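Replacing a lower/upper pair with one quantifier character amounts to a small lookup. The sketch assumes the four patterns are `?`, `*`, `+`, and `2` (the commit names only `2` explicitly); `quantifier_bounds` and `REPEAT_INFINITE` are hypothetical names for illustration.

```c
/* Hypothetical marker for an unbounded upper limit. */
#define REPEAT_INFINITE (-1)

/* Map the single-char quantifier argument to the (lower, upper) pair it
 * replaces. Returns 0 on success, -1 for an unsupported character. */
static int quantifier_bounds(char q, int *lower, int *upper)
{
    switch (q) {
    case '?': *lower = 0; *upper = 1;               return 0;
    case '*': *lower = 0; *upper = REPEAT_INFINITE; return 0;
    case '+': *lower = 1; *upper = REPEAT_INFINITE; return 0;
    case '2': *lower = 2; *upper = 2;               return 0; /* exactly two */
    default:  return -1;
    }
}
```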
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66048 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66047 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
node_extended_grapheme_cluster
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66046 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
In function node_extended_grapheme_cluster(), store and test
return value from create_sequence_node(). Never forget this!
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66045 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66044 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
In function node_extended_grapheme_cluster(),
move the declaration up so that the block encompasses all of the regular expression
creation that finally makes up the sequence. Having blocks like this will
be great because it directly shows the extent of code belonging to each
subexpression of the regular expression being created.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66043 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
We make sure that the newly created tree and all remaining nodes passed in
via node_array are freed.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66042 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
../regparse.c:5908:28: error: initializer for aggregate is not a compile-time constant [-Werror,-Wc99-extensions]
Node* sequence[] = { np1, np2, np3, ((Node* )0) };
^~~
https://travis-ci.org/ruby/ruby/jobs/460197620
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66034 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
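The error above comes from initializing an automatic array with non-constant values, which C89 does not allow. A C89-compatible workaround is to declare the array and assign its elements afterwards. The helpers below (`count_nodes`, `make_sequence`) and the `Node` stand-in are hypothetical; only the declare-then-assign pattern is the point.

```c
#include <stddef.h>

typedef struct Node { int id; } Node;   /* hypothetical stand-in */

static size_t count_nodes(Node **seq)
{
    size_t n = 0;
    while (seq[n] != NULL) n++;
    return n;
}

static size_t make_sequence(Node *np1, Node *np2, Node *np3)
{
    /* C89-compatible: declare first, then assign the runtime values.
     * `Node* sequence[] = { np1, np2, np3, NULL };` is rejected under
     * -Wc99-extensions because the initializers are not constants. */
    Node *sequence[4];
    sequence[0] = np1;
    sequence[1] = np2;
    sequence[2] = np3;
    sequence[3] = NULL;   /* NULL terminator, like NULL_NODE */
    return count_nodes(sequence);
}
```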
The new function create_sequence_node() uses its second argument
(an array of Node*, from left to right, ending with NULL_NODE)
to create a sequence of expressions using node_new_list().
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66033 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66032 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
The new function quantify_property_node() combines the functions
create_property_node() and quantify_node(), which frequently appear together.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66031 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66030 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
"Grapheme_Cluster_Break=Extend"
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66021 b2dd03c8-39d4-4d8f-98ff-823fe69b080e