| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
| |
Correction to RDoc for Regexp.new
| |
argument
Previously, only certain values of the 3rd argument triggered a
deprecation warning.
First step for fix for bug #18797. Support for the 3rd argument
will be removed after the release of Ruby 3.2.
Fix minor fallout discovered by the tests.
Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
| |
Calling `String#scan` without a block creates an incomplete MatchData
object whose `RMATCH(match)->str` is Qfalse. Usually this object is not
leaked, but it was possible to pull it by using ObjectSpace.each_object.
This change hides the internal MatchData object by using rb_obj_hide.
Fixes [Bug #19159]
|
| |
|
|
|
|
|
|
|
| |
GCC [Bug 99578] seems triggered by calling `rb_reg_last_match` before
`match_check(match)`, probably by `NIL_P(match)` in `rb_reg_nth_match`.
[Bug 99578]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99578
|
| |
|
| |
| |
Fix per-instance Regexp timeout
This makes it follow what was decided in [Bug #19055]:
* `Regexp.new(str, timeout: nil)` should respect the global timeout
* `Regexp.new(str, timeout: huge_val)` should use the maximum value that
can be represented in the internal representation
* `Regexp.new(str, timeout: 0 or negative value)` should raise an error
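The three rules above can be illustrated with a minimal sketch. It assumes Ruby 3.2 or later, where `Regexp.timeout=` and the `timeout:` keyword are available; the pattern and the timeout values are arbitrary placeholders.

```ruby
# Sketch only: assumes Ruby 3.2+ (Regexp.timeout= and the timeout: keyword).
previous = Regexp.timeout
Regexp.timeout = 5.0 # global timeout, in seconds

# timeout: nil sets no per-instance value, so the global timeout applies.
inherits = Regexp.new("a+", timeout: nil)
# An explicit positive value becomes the per-instance timeout.
own = Regexp.new("a+", timeout: 1.0)

p inherits.timeout # => nil
p own.timeout      # => 1.0

# Zero and negative values are rejected when the Regexp is created.
outcomes = [0, -1].map do |bad|
  begin
    Regexp.new("a+", timeout: bad)
    :accepted
  rescue ArgumentError
    :rejected
  end
end
p outcomes

Regexp.timeout = previous # restore the global setting
```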
|
| |
|
| |
|
|
|
|
|
| |
Tabs were expanded because the file did not have any tab indentation in unedited lines.
Please update your editor config, and use misc/expand_tabs.rb in the pre-commit hook.
|
| |
|
| |
|
| |
| |
This patch speeds up setting the backref match object by avoiding some
memcopies. Take the following code for example:
```ruby
"hello world" =~ /hello/
p $~
```
When the RE matches the string, we have to set the Match object in the
backref global. So we would allocate a match object[^1] and use
`rb_reg_region_copy`[^2] to make a deep copy of the stack-allocated
`re_registers` struct[^3] into the newly created Ruby object. This
could possibly trigger GC[^4], and would allocate new memory.
This patch makes a shallow copy of the `re_registers` struct onto the
Match object, allowing the match object to manage the `re_registers`
pointer and also avoiding some calls to `xmalloc` and some manual
memcopy.
Benchmark looks like this:
```ruby
require "benchmark/ips"

def test_re thing
  thing =~ /hello/
end

Benchmark.ips do |x|
  x.report("re hit") do
    test_re "hello world"
  end

  x.report("re miss") do
    test_re "world"
  end
end
```
Before this patch:
```
$ ruby -v test.rb
ruby 3.2.0dev (2022-07-27T22:29:00Z master 4ad69899b7) [arm64-darwin21]
Ignoring bcrypt-3.1.16 because its extensions are not built. Try: gem pristine bcrypt --version 3.1.16
Warming up --------------------------------------
re hit 345.401k i/100ms
re miss 673.584k i/100ms
Calculating -------------------------------------
re hit 3.452M (± 0.5%) i/s - 17.270M in 5.002535s
re miss 6.736M (± 0.4%) i/s - 34.353M in 5.099593s
```
After this patch:
```
$ ./ruby -v test.rb
ruby 3.2.0dev (2022-08-01T21:24:12Z less-memcpy 0ff2a56606) [arm64-darwin21]
Warming up --------------------------------------
re hit 419.578k i/100ms
re miss 673.251k i/100ms
Calculating -------------------------------------
re hit 4.201M (± 0.7%) i/s - 21.398M in 5.093593s
re miss 6.716M (± 0.4%) i/s - 33.663M in 5.012756s
```
Matches get faster and misses maintain the same speed.
[^1]: https://github.com/ruby/ruby/blob/24204d54ab730791bfbd0cd66b8e12f0bd62ca5d/re.c#L1737
[^2]: https://github.com/ruby/ruby/blob/24204d54ab730791bfbd0cd66b8e12f0bd62ca5d/re.c#L1738
[^3]: https://github.com/ruby/ruby/blob/24204d54ab730791bfbd0cd66b8e12f0bd62ca5d/re.c#L1686
[^4]: https://github.com/ruby/ruby/blob/24204d54ab730791bfbd0cd66b8e12f0bd62ca5d/re.c#L981
|
|
|
|
| |
[Misc #18891]
|
| |
|
|
|
|
| |
Related to [Feature #18838]
|
|
|
|
| |
Co-Authored-By: Janosch Müller <janosch.mueller@betterplace.org>
|
|
|
|
|
| |
`Regexp.new` now supports passing the regexp flags not only as an
`Integer`, but also as a `String`. Unknown flags raise errors.
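A short sketch of the behavior described here, assuming Ruby 3.2 or later. The pattern `"abc"` and the flag letter `"z"` are arbitrary choices for illustration.

```ruby
# Sketch only: assumes Ruby 3.2+, where the flags may be given as a String.
by_integer = Regexp.new("abc", Regexp::IGNORECASE | Regexp::EXTENDED)
by_string  = Regexp.new("abc", "ix")

# Both spellings should yield the same option bits.
same_options = by_integer.options == by_string.options
p same_options

# An unrecognized flag letter raises instead of being silently ignored.
unknown_flag_error =
  begin
    Regexp.new("abc", "z")
    nil
  rescue ArgumentError => e
    e
  end
p unknown_flag_error.class
```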
|
|
|
|
|
| |
The second argument should now be `true`, `false`, `nil`, or an Integer.
This flag is sometimes confused with the third argument.
|
| |
|
| |
|
| |
| |
Invalid escapes are handled at multiple levels. The first level
is in parse.y, so skip invalid unicode escape checks for regexps
in parse.y.
Make rb_reg_preprocess and unescape_nonascii accept the regexp
options. In unescape_nonascii, if the regexp is an extended
regexp, when "#" is encountered, ignore all characters until the
end of line or end of regexp.
Unfortunately, in extended regexps, you can use "#" as a non-comment
character inside a character class, so also parse "[" and "]"
specially for extended regexps, and only skip comments if "#" is
not inside a character class. Handle nested character classes as well.
This issue doesn't just affect extended regexps, it also affects
"(?#" comments inside all regexps. So for those comments, scan
until trailing ")" and ignore content inside.
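The three contexts discussed above (a "#" comment in an extended regexp, a literal "#" inside a character class, and an inline comment group) can be seen in a minimal sketch. The patterns are illustrative only and are not taken from the patch or its tests.

```ruby
# In extended mode, "#" outside a character class starts a comment.
commented = /
  a   # ignored up to the end of the line
  b
/x

# Inside a character class, "#" is an ordinary character, even with the x flag.
hash_in_class = /[#]+/x

# An inline comment group is ignored in any regexp, extended or not.
inline_comment = /a(?#this text is ignored)b/

p commented.match?("ab")
p hash_in_class.match("a##b")[0]
p inline_comment.match?("ab")
```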
I'm not sure if there are other corner cases not handled. A
better fix would be to redesign the regexp parser so that it
unescapes during parsing instead of before parsing, so you already
know the current parsing state.
Fixes [Bug #18294]
Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
| |
Treats:
#to_s
#named_captures
#string
#inspect
#hash
#==
|
|
|
|
|
|
| |
Treats:
#[]
#values_at
|
|
|
|
|
|
|
|
| |
Treats:
#pre_match
#post_match
#to_a
#captures
|
|
|
|
|
|
|
|
| |
Treats:
#begin
#end
#match
#match_length
|
|
|
|
|
|
|
|
| |
Treats:
#regexp
#names
#size
#offset
| |
Treats:
::new
::escape
::try_convert
::union
::last_match
|
| |
Treats:
#fixed_encoding?
#hash
#==
#=~
#match
#match?
Also, in regexp.rdoc:
Changes heading from 'Special Global Variables' to 'Regexp Global Variables'.
Adds tiny section 'Regexp Interpolation'.
| |
Treats:
#source
#inspect
#to_s
#casefold?
#options
#names
#named_captures
|
| |
|
|
|
|
| |
[Bug #18669]
|
|
|
|
| |
Currently it has only one function prototype.
|
| |
|
| |
|
|
|
|
| |
[Feature #17837]
|
|
|
|
|
|
| |
* Add String#byteindex, String#byterindex, and MatchData#byteoffset [Feature #13110]
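As a rough illustration of how the byte-based methods differ from their character-based counterparts, here is a sketch assuming Ruby 3.2 or later and a UTF-8 string. The sample text is an arbitrary choice.

```ruby
# Sketch only: assumes Ruby 3.2+ and UTF-8 source encoding.
text = "あいう" # each of these characters occupies 3 bytes in UTF-8

char_index  = text.index("い")      # position counted in characters
byte_index  = text.byteindex("い")  # position counted in bytes
byte_rindex = text.byterindex("う") # last occurrence, counted in bytes

match = /い/.match(text)
char_offset = match.offset(0)     # [start, end] in characters
byte_offset = match.byteoffset(0) # [start, end] in bytes

p char_index, byte_index, byte_rindex
p char_offset, byte_offset
```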
Co-authored-by: NARUSE, Yui <naruse@airemix.jp>
|
|
|
|
| |
https://github.com/ruby/ruby/pull/5518#discussion_r809645406
|
| |
|
| |
|
|
|
|
| |
* Adding links to literals and Kernel
|