path: root/re.c
Commit message | Author | Age | Files | Lines
* [DOC] Correction to RDoc for Regexp.new (#7130) | Burdette Lamar | 2023-01-16 | 1 | -0/+2
| Correction to RDoc for Regexp.new
* Always issue deprecation warning when calling Regexp.new with 3rd positional argument | Jeremy Evans | 2022-12-22 | 1 | -14/+10
| Previously, only certain values of the 3rd argument triggered a deprecation warning.
|
| First step for fix for bug #18797. Support for the 3rd argument will be removed after the release of Ruby 3.2.
|
| Fix minor fallout discovered by the tests.
|
| Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
* Refactor `reg_extract_args` to return regexp if given | Nobuyoshi Nakada | 2022-12-22 | 1 | -12/+9
|
* Share argument parsing in `Regexp#initialize` and `Regexp.linear_time?` | Nobuyoshi Nakada | 2022-12-22 | 1 | -20/+41
|
* typo in doc [ci skip] | 卜部昌平 | 2022-12-19 | 1 | -1/+1
|
* Note about Regexp.linear_time? [ci skip] | 卜部昌平 | 2022-12-19 | 1 | -0/+10
|
* Add `Regexp.linear_time?` (#6901) | TSUYUSATO Kitsune | 2022-12-14 | 1 | -0/+34
|
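A rough illustration of the `Regexp.linear_time?` check added above. This is a sketch that assumes Ruby 3.2 or later; the two patterns are illustrative picks and do not come from the commit itself.

```ruby
# Assumes Ruby 3.2+, where Regexp.linear_time? is available.

# A plain pattern with no backreferences can be matched in time
# linear in the length of the input.
simple = Regexp.linear_time?(/a+b/)

# A backreference (\1) is assumed here to rule out the linear-time guarantee.
backref = Regexp.linear_time?(/(a)\1/)

puts "simple: #{simple}, backref: #{backref}"
```

A `false` result does not mean the pattern is slow on every input, only that linear-time matching is not guaranteed for it.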
* Introduce encoding check macro | S-H-GAMELINKS | 2022-12-02 | 1 | -1/+2
|
* Prevent segfault in String#scan with ObjectSpace.each_object | Yusuke Endoh | 2022-12-01 | 1 | -0/+7
| Calling `String#scan` without a block creates an incomplete MatchData object whose `RMATCH(match)->str` is Qfalse. Usually this object is not leaked, but it was possible to pull it by using ObjectSpace.each_object.
|
| This change hides the internal MatchData object by using rb_obj_hide.
|
| Fixes [Bug #19159]
* Using UNDEF_P macro | S-H-GAMELINKS | 2022-11-16 | 1 | -2/+2
|
* Suppress false warning by a bug of gcc | Nobuyoshi Nakada | 2022-11-08 | 1 | -4/+5
| GCC [Bug 99578] seems triggered by calling `rb_reg_last_match` before `match_check(match)`, probably by `NIL_P(match)` in `rb_reg_nth_match`.
|
| [Bug 99578]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99578
* Refactor timeout-setting code to a function | Yusuke Endoh | 2022-10-24 | 1 | -13/+12
|
* Refactor timeout-related code in re.c a little | Yusuke Endoh | 2022-10-24 | 1 | -9/+9
|
* Fix per-instance Regexp timeout (#6621) | Yusuke Endoh | 2022-10-24 | 1 | -2/+8
| Fix per-instance Regexp timeout
|
| This makes it follow what was decided in [Bug #19055]:
|
| * `Regexp.new(str, timeout: nil)` should respect the global timeout
| * `Regexp.new(str, timeout: huge_val)` should use the maximum value that can be represented in the internal representation
| * `Regexp.new(str, timeout: 0 or negative value)` should raise an error
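The per-instance timeout rules listed above can be sketched as follows. This assumes Ruby 3.2 or later; the pattern `"a+"` and the variable names are placeholders chosen for illustration.

```ruby
# Assumes Ruby 3.2+, where Regexp.new accepts the timeout: keyword.

# timeout: nil sets no per-instance limit, so the global Regexp.timeout applies.
inherits_global = Regexp.new("a+", timeout: nil)

# A positive value sets a per-instance limit in seconds.
one_second = Regexp.new("a+", timeout: 1.0)

# Zero (or a negative value) is rejected.
rejected = begin
  Regexp.new("a+", timeout: 0)
  false
rescue ArgumentError
  true
end

puts inherits_global.timeout.inspect
puts one_second.timeout.inspect
puts rejected
```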
* Fix argument & Remove enum | S-H-GAMELINKS | 2022-10-23 | 1 | -9/+3
|
* Introduce rb_memsearch_with_char_size function | S-H-GAMELINKS | 2022-10-23 | 1 | -10/+14
|
* * expand tabs. [ci skip] | git | 2022-10-10 | 1 | -2/+2
| Tabs were expanded because the file did not have any tab indentation in unedited lines.
| Please update your editor config, and use misc/expand_tabs.rb in the pre-commit hook.
* Should use dedicated function `Check_Type` | Nobuyoshi Nakada | 2022-10-10 | 1 | -12/+4
|
* Add MatchData#deconstruct/deconstruct_keys | Vladimir Dementyev | 2022-10-10 | 1 | -0/+85
|
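A short sketch of what `MatchData#deconstruct` and `MatchData#deconstruct_keys` enable, assuming Ruby 3.2 or later. The date-like pattern and the variable names are illustrative only.

```ruby
# Assumes Ruby 3.2+, where MatchData responds to deconstruct and deconstruct_keys.
m = /(?<year>\d{4})-(?<month>\d{2})/.match("2022-10")

# Array-style view of the captures.
positional = m.deconstruct

# Hash-style view, restricted to the requested named groups.
by_name = m.deconstruct_keys([:year])

# These two hooks are what `case ... in` relies on for pattern matching.
label =
  case m
  in { year:, month: }
    "#{month}/#{year}"
  end

puts positional.inspect, by_name.inspect, label
```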
* [DOC] `offset` argument of Regexp#match | Nobuyoshi Nakada | 2022-08-18 | 1 | -1/+6
|
* Speed up setting the backref match object | Aaron Patterson | 2022-08-02 | 1 | -3/+1
| This patch speeds up setting the backref match object by avoiding some memcopies.
|
| Take the following code for example:
|
| ```ruby
| "hello world" =~ /hello/
| p $~
| ```
|
| When the RE matches the string, we have to set the Match object in the backref global. So we would allocate a match object[^1] and use `rb_reg_region_copy`[^2] to make a deep copy of the stack allocated `re_registers` struct[^3] in to the newly created Ruby object. This could possibly trigger GC[^4], and would allocate new memory.
|
| This patch makes a shallow copy of the `re_registers` struct on to the Match object allowing the match object to manage the `re_registers` pointer and also avoiding some calls to `xmalloc` and some manual memcopy.
|
| Benchmark looks like this:
|
| ```ruby
| require "benchmark/ips"
|
| def test_re thing
|   thing =~ /hello/
| end
|
| Benchmark.ips do |x|
|   x.report("re hit") do
|     test_re "hello world"
|   end
|
|   x.report("re miss") do
|     test_re "world"
|   end
| end
| ```
|
| Before this patch:
|
| ```
| $ ruby -v test.rb
| ruby 3.2.0dev (2022-07-27T22:29:00Z master 4ad69899b7) [arm64-darwin21]
| Ignoring bcrypt-3.1.16 because its extensions are not built. Try: gem pristine bcrypt --version 3.1.16
| Warming up --------------------------------------
|               re hit   345.401k i/100ms
|              re miss   673.584k i/100ms
| Calculating -------------------------------------
|               re hit      3.452M (± 0.5%) i/s -     17.270M in   5.002535s
|              re miss      6.736M (± 0.4%) i/s -     34.353M in   5.099593s
| ```
|
| After this patch:
|
| ```
| $ ./ruby -v test.rb
| ruby 3.2.0dev (2022-08-01T21:24:12Z less-memcpy 0ff2a56606) [arm64-darwin21]
| Warming up --------------------------------------
|               re hit   419.578k i/100ms
|              re miss   673.251k i/100ms
| Calculating -------------------------------------
|               re hit      4.201M (± 0.7%) i/s -     21.398M in   5.093593s
|              re miss      6.716M (± 0.4%) i/s -     33.663M in   5.012756s
| ```
|
| Matches get faster and misses maintain the same speed
|
| [^1]: https://github.com/ruby/ruby/blob/24204d54ab730791bfbd0cd66b8e12f0bd62ca5d/re.c#L1737
| [^2]: https://github.com/ruby/ruby/blob/24204d54ab730791bfbd0cd66b8e12f0bd62ca5d/re.c#L1738
| [^3]: https://github.com/ruby/ruby/blob/24204d54ab730791bfbd0cd66b8e12f0bd62ca5d/re.c#L1686
| [^4]: https://github.com/ruby/ruby/blob/24204d54ab730791bfbd0cd66b8e12f0bd62ca5d/re.c#L981
* Expand tabs [ci skip] | Takashi Kokubun | 2022-07-21 | 1 | -636/+636
| [Misc #18891]
* [DOC] Fix a typo [ci skip] | Kazuhiro NISHIYAMA | 2022-06-26 | 1 | -1/+1
|
* Document that Regexp#source does not retain lexer escapes | Jeremy Evans | 2022-06-20 | 1 | -1/+5
| Related to [Feature #18838]
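To make the distinction behind the `Regexp#source` note concrete, here is a small sketch. The two literals are illustrative, and the behavior shown is as I understand it for current Ruby versions.

```ruby
# Regexp escape sequences survive in #source unchanged.
kept = /\x20\+/.source

# A backslash that only escapes the literal's own delimiter is a lexer
# escape, so it is dropped before the Regexp is built.
dropped = /\//.source

puts kept     # the backslashes are still present
puts dropped  # only the slash remains
```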
* [Feature #18788] [DOC] String options to `Regexp.new` | Nobuyoshi Nakada | 2022-06-20 | 1 | -0/+5
| Co-Authored-By: Janosch Müller <janosch.mueller@betterplace.org>
* [Feature #18788] Support options as `String` to `Regexp.new` | Nobuyoshi Nakada | 2022-06-20 | 1 | -0/+21
| `Regexp.new` now supports passing the regexp flags not only as an `Integer`, but also as a `String`. Unknown flags raise errors.
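A minimal sketch of the String form of the options argument, assuming Ruby 3.2 or later. The flag letter `"z"` is simply an arbitrary letter assumed not to be a valid option.

```ruby
# Assumes Ruby 3.2+, where the options argument may be a String of flag letters.
by_string  = Regexp.new("abc", "i")
by_integer = Regexp.new("abc", Regexp::IGNORECASE)

# An unknown flag letter is expected to raise instead of being ignored.
unknown_rejected = begin
  Regexp.new("abc", "z")
  false
rescue ArgumentError
  true
end

puts by_string == by_integer
puts unknown_rejected
```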
* Warn suspicious flag to `Regexp.new` | Nobuyoshi Nakada | 2022-06-20 | 1 | -1/+3
| Now the second argument should be `true`, `false`, `nil` or an Integer. This flag is sometimes confused with the third argument.
* [DOC] Refine Regexp.new argument descriptions | Nobuyoshi Nakada | 2022-06-20 | 1 | -6/+19
|
* [DOC] Regexp timeout is float or nil | Nobuyoshi Nakada | 2022-06-20 | 1 | -3/+3
|
* [DOC] Fixed omissions in Regexp.new arguments | Nobuyoshi Nakada | 2022-06-20 | 1 | -2/+6
|
* Ignore invalid escapes in regexp comments | Jeremy Evans | 2022-06-06 | 1 | -8/+63
| Invalid escapes are handled at multiple levels. The first level is in parse.y, so skip invalid unicode escape checks for regexps in parse.y.
|
| Make rb_reg_preprocess and unescape_nonascii accept the regexp options. In unescape_nonascii, if the regexp is an extended regexp, when "#" is encountered, ignore all characters until the end of line or end of regexp.
|
| Unfortunately, in extended regexps, you can use "#" as a non-comment character inside a character class, so also parse "[" and "]" specially for extended regexps, and only skip comments if "#" is not inside a character class. Handle nested character classes as well.
|
| This issue doesn't just affect extended regexps, it also affects "(?#" comments inside all regexps. So for those comments, scan until trailing ")" and ignore content inside.
|
| I'm not sure if there are other corner cases not handled. A better fix would be to redesign the regexp parser so that it unescaped during parsing instead of before parsing, so you already know the current parsing state.
|
| Fixes [Bug #18294]
|
| Co-authored-by: Nobuyoshi Nakada <nobu@ruby-lang.org>
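The effect of the comment handling described above can be sketched like this, assuming Ruby 3.2 or later (a release containing the fix for Bug #18294). The pattern text is made up for illustration.

```ruby
# Assumes Ruby 3.2+.
# In extended mode, "#" starts a comment that runs to the end of the line
# (or the end of the regexp), so the incomplete "\u" escape in "C:\users"
# sits inside a comment and is skipped rather than rejected.
in_comment = Regexp.new("abc # see C:\\users", Regexp::EXTENDED)

# Outside a comment the same text is still an invalid Unicode escape.
rejected_outside_comment = begin
  Regexp.new("C:\\users")
  false
rescue RegexpError
  true
end

puts in_comment.match?("abc")
puts rejected_outside_comment
```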
* [DOC] Enhanced RDoc for MatchData (#5822) | Burdette Lamar | 2022-04-18 | 1 | -50/+69
| Treats: #to_s, #named_captures, #string, #inspect, #hash, #==
* Enhanced RDoc for MatchData (#5821) | Burdette Lamar | 2022-04-18 | 1 | -32/+41
| Treats: #[], #values_at
* Enhanced RDoc for MatchData (#5820) | Burdette Lamar | 2022-04-18 | 1 | -33/+41
| Treats: #pre_match, #post_match, #to_a, #captures
* [DOC] Enhanced RDoc for MatchData (#5819) | Burdette Lamar | 2022-04-18 | 1 | -45/+47
| Treats: #begin, #end, #match, #match_length
* [DOC] Enhanced RDoc for MatchData (#5818) | Burdette Lamar | 2022-04-18 | 1 | -31/+32
| Treats: #regexp, #names, #size, #offset
* [DOC] Enhanced RDoc for Regexp (#5815) | Burdette Lamar | 2022-04-18 | 1 | -100/+136
| Treats: ::new, ::escape, ::try_convert, ::union, ::last_match
* [DOC] Enhanced RDoc for Regexp (#5812) | Burdette Lamar | 2022-04-16 | 1 | -91/+105
| Treats: #fixed_encoding?, #hash, #==, #=~, #match, #match?
|
| Also, in regexp.rdoc:
| Changes heading from 'Special Global Variables' to 'Regexp Global Variables'.
| Add tiny section 'Regexp Interpolation'.
* [DOC] Enhanced RDoc for Regexp (#5807) | Burdette Lamar | 2022-04-15 | 1 | -78/+84
| Treats: #source, #inspect, #to_s, #casefold?, #options, #names, #named_captures
* Return only captured range in `MatchData` [Bug #18670] | Nobuyoshi Nakada | 2022-03-31 | 1 | -1/+1
|
* re.c: stop a wrong warning of "flags ignored" on Regexp.new(//) | Yusuke Endoh | 2022-03-31 | 1 | -1/+1
| [Bug #18669]
* internal/ractor.h: Added | Yusuke Endoh | 2022-03-30 | 1 | -1/+1
| Currently it has only one function prototype.
* re.c: raise Regexp::TimeoutError instead of RuntimeError | Yusuke Endoh | 2022-03-30 | 1 | -2/+3
|
* re.c: Add `timeout` keyword for Regexp.new and Regexp#timeout | Yusuke Endoh | 2022-03-30 | 1 | -14/+63
|
* re.c: Add Regexp.timeout= and Regexp.timeout | Yusuke Endoh | 2022-03-30 | 1 | -0/+88
| [Feature #17837]
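A hedged sketch of the global timeout introduced by the three commits above, assuming Ruby 3.2 or later. The 0.1 second limit and the backtracking-heavy pattern are illustrative choices, picked only so the demonstration finishes quickly.

```ruby
# Assumes Ruby 3.2+, where Regexp.timeout= and Regexp::TimeoutError exist.
previous = Regexp.timeout   # nil unless a global limit was already set
Regexp.timeout = 0.1        # seconds; small only to keep the demo short

timed_out = begin
  # The backreference keeps the engine on its backtracking path, so this
  # input is expected to take far longer than 0.1 seconds to reject.
  /^a*b?a*()\1$/ =~ "a" * 50_000 + "x"
  false
rescue Regexp::TimeoutError
  true
ensure
  Regexp.timeout = previous  # restore the process-wide setting
end

puts timed_out
```

Because `Regexp.timeout=` is process-wide, restoring the previous value in `ensure` keeps the sketch from affecting unrelated matches.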
* Add String#byteindex, String#byterindex, and MatchData#byteoffset (#5518) | Shugo Maeda | 2022-02-19 | 1 | -0/+33
| * Add String#byteindex, String#byterindex, and MatchData#byteoffset [Feature #13110]
|
| Co-authored-by: NARUSE, Yui <naruse@airemix.jp>
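The difference between character-based and byte-based positions can be sketched as follows, assuming Ruby 3.2 or later. The sample string is illustrative.

```ruby
# Assumes Ruby 3.2+. "é" takes two bytes in UTF-8, so byte and character
# positions diverge after it.
s = "caf\u00e9 au lait"

char_pos = s.index("au")       # counted in characters
byte_pos = s.byteindex("au")   # counted in bytes
last_a   = s.byterindex("a")   # last "a", counted in bytes

m = /au/.match(s)
char_span = m.offset(0)        # [start, end) in characters
byte_span = m.byteoffset(0)    # [start, end) in bytes

puts "chars: #{char_pos} #{char_span.inspect}"
puts "bytes: #{byte_pos} #{byte_span.inspect} last a: #{last_a}"
```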
* LONG2NUM() should be used for rmatch_offset::{beg,end} | Shugo Maeda | 2022-02-18 | 1 | -4/+4
| https://github.com/ruby/ruby/pull/5518#discussion_r809645406
* [DOC] Fix broken links to literals.rdoc | Nobuyoshi Nakada | 2022-02-08 | 1 | -1/+1
|
* Replace to RBOOL macro | S-H-GAMELINKS | 2022-01-17 | 1 | -4/+1
|
* Adding links to literals and Kernel (#5192) | Burdette Lamar | 2021-12-03 | 1 | -0/+4
| * Adding links to literals and Kernel