path: root/string.c
Commit message | Author | Age | Files | Lines
* Fix inspect for unicode codepoint 0x85 | Jeremy Evans | 2022-08-11 | 1 | -1/+9
| This is an inelegant hack, by manually checking for this specific code point in rb_str_inspect. Some testing indicates that this is the only code point affected. It's possible a better fix would be inside of lower-level encoding code, such that rb_enc_isprint would return false and not true for codepoint 0x85.
|
| Fixes [Bug #16842]
* Adjust indent [ci skip] | Nobuyoshi Nakada | 2022-07-26 | 1 | -10/+10
|
* Cheaply derive code range for String#b return value | Kevin Menard | 2022-07-26 | 1 | -1/+17
| The result of String#b is a string with an ASCII_8BIT/BINARY encoding. That encoding is ASCII-compatible and has no byte sequences that are invalid for the encoding. If we know the receiver's code range, we can derive the resulting string's code range without needing to perform a full code range scan.
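
The code range is an internal flag and is not directly visible from Ruby, but the public predicates `ascii_only?` and `valid_encoding?` depend on it. The following is a minimal sketch of the observable behaviour that the commit message describes, not the C implementation; the variable names are illustrative only.

```ruby
# Observable effects of String#b. The variable names are hypothetical.
ascii = "hello"        # UTF-8, ASCII-only
multi = "h\u00e9llo"   # UTF-8, contains one multibyte character

ascii_b = ascii.b      # copy with the binary encoding
multi_b = multi.b

puts ascii_b.encoding          # the binary encoding (its printed name varies by Ruby version)
puts ascii_b.ascii_only?       # an ASCII-only receiver stays ASCII-only
puts multi_b.ascii_only?       # bytes >= 0x80 survive, so this is not ASCII-only
puts multi_b.valid_encoding?   # every byte sequence is valid in the binary encoding
puts multi_b.bytes == multi.bytes
```

Only the encoding tag changes; the bytes are untouched, which is why the result's code range can be inferred from the receiver's.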
* rb_str_buf_append: add a fast path for ENC_CODERANGE_VALID | Jean Boussier | 2022-07-25 | 1 | -3/+18
| If the RHS has valid encoding, and both strings have the same encoding, we can use the fast path. However we need to update the LHS coderange.
|
| ```
| compare-ruby: ruby 3.2.0dev (2022-07-21T14:46:32Z master cdbb9b8555) [arm64-darwin21]
| built-ruby: ruby 3.2.0dev (2022-07-25T07:25:41Z string-concat-vali.. 11a2772bdd) [arm64-darwin21]
| warming up...
|
| |                    |compare-ruby|built-ruby|
| |:-------------------|-----------:|---------:|
| |binary_concat_7bit  |    554.816k|  556.460k|
| |                    |           -|     1.00x|
| |utf8_concat_7bit    |    556.367k|  555.101k|
| |                    |       1.00x|         -|
| |utf8_concat_UTF8    |    412.555k|  556.824k|
| |                    |           -|     1.35x|
| ```
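
The need to update the LHS code range can be seen from Ruby: appending a valid but non-ASCII string to an ASCII-only string of the same encoding must change what `ascii_only?` reports. This is a small sketch of that visible effect, assuming UTF-8 source encoding; it does not show the C fast path itself.

```ruby
# Same-encoding concat where the RHS is valid but not 7-bit.
lhs = +"abc"        # unfrozen UTF-8 string, ASCII-only
rhs = "d\u00e9f"    # UTF-8, valid, contains a multibyte character

before = lhs.ascii_only?
lhs << rhs
after = lhs.ascii_only?

puts before               # the LHS started out ASCII-only
puts after                # after the append it no longer is
puts lhs.valid_encoding?  # the result is still valid UTF-8
```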
* Expand tabs [ci skip] | Takashi Kokubun | 2022-07-21 | 1 | -2861/+2861
| [Misc #18891]
* Make String#each_line work correctly with paragraph separator and chomp | Jeremy Evans | 2022-07-21 | 1 | -2/+7
| Previously, it was including one newline when chomp was used, which is inconsistent with IO#each_line behavior. This makes behavior consistent with IO#each_line, chomping all paragraph separators (multiple consecutive newlines), but not single newlines.
|
| Partially Fixes [Bug #18768]
* string.c: use str_enc_fastpath in TERM_LEN | Jean Boussier | 2022-07-21 | 1 | -15/+15
| Not having to fetch the rb_encoding saves a significant amount of time. Additionally, even when we have to fetch it, we can do it faster using `ENCODING_GET` rather than `rb_enc_get`.
|
| ```
| compare-ruby: ruby 3.2.0dev (2022-07-19T08:41:40Z master cb9fd920a3) [arm64-darwin21]
| built-ruby: ruby 3.2.0dev (2022-07-21T11:16:16Z faster-buffer-conc.. 4f001f0748) [arm64-darwin21]
| warming up...
|
| |                      |compare-ruby|built-ruby|
| |:---------------------|-----------:|---------:|
| |binary_concat_utf8    |    510.580k|  565.600k|
| |                      |           -|     1.11x|
| |binary_concat_binary  |    512.653k|  571.483k|
| |                      |           -|     1.11x|
| |utf8_concat_utf8      |    511.396k|  566.879k|
| |                      |           -|     1.11x|
| ```
* str_buf_cat: preserve coderange when going through fastpath | Jean Boussier | 2022-07-19 | 1 | -6/+14
| rb_str_modify clears the coderange, which in this case isn't necessary.
|
| ```
| compare-ruby: ruby 3.2.0dev (2022-07-12T15:01:11Z master 71aec68566) [arm64-darwin21]
| built-ruby: ruby 3.2.0dev (2022-07-19T07:17:01Z faster-buffer-conc.. 3cad62aab4) [arm64-darwin21]
| warming up...
|
| |                      |compare-ruby|built-ruby|
| |:---------------------|-----------:|---------:|
| |binary_concat_utf8    |    360.617k|  605.091k|
| |                      |           -|     1.68x|
| |binary_concat_binary  |    446.650k|  605.053k|
| |                      |           -|     1.35x|
| |utf8_concat_utf8      |    454.166k|  597.311k|
| |                      |           -|     1.32x|
| ```
|
| ```
| |            |compare-ruby|built-ruby|
| |:-----------|-----------:|---------:|
| |erb_render  |      1.790M|    2.045M|
| |            |           -|     1.14x|
| ```
* rb_str_buf_append: fastpath to str_buf_cat | Jean Boussier | 2022-07-19 | 1 | -3/+19
| If the LHS is ASCII compatible and the RHS is 7BIT we can directly concat without being concerned about anything else.
|
| Benchmark:
|
| ```
| compare-ruby: ruby 3.2.0dev (2022-07-12T15:01:11Z master 71aec68566) [arm64-darwin21]
| built-ruby: ruby 3.2.0dev (2022-07-13T10:13:53Z faster-buffer-conc.. a04c10476d) [arm64-darwin21]
| warming up...
|
| |                      |compare-ruby|built-ruby|
| |:---------------------|-----------:|---------:|
| |binary_append_utf8    |    385.315k|  573.663k|
| |                      |           -|     1.49x|
| |binary_append_binary  |    446.579k|  574.898k|
| |                      |           -|     1.29x|
| |utf8_append_utf8      |    430.936k|  573.394k|
| |                      |           -|     1.33x|
| ```
|
| Note that in the benchmark, the RHS always has a precomputed coderange, so the benchmark never enters the slow path of having to scan the RHS. However, it's extremely likely that we'll end up scanning it anyway in rb_enc_cr_str_buf_cat.
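
The rule that a 7-bit RHS can be appended to any ASCII-compatible LHS is also what Ruby code observes: such appends succeed across encodings and keep the LHS encoding, while two non-ASCII strings in different encodings are rejected. The sketch below illustrates that language-level behaviour under the assumption of UTF-8 source encoding; it says nothing about which internal path a given call takes.

```ruby
# 7-bit RHS appended to ASCII-compatible LHS strings of other encodings.
utf8  = +"caf\u00e9"                        # UTF-8, not ASCII-only
ascii = "abc".encode(Encoding::US_ASCII)    # US-ASCII, 7-bit
utf8 << ascii
puts utf8.encoding                          # the LHS encoding is kept

binary = "\xFF".b                           # binary string with a high byte
binary << "abc"                             # 7-bit UTF-8 RHS
puts binary.encoding

# Contrast: both sides non-ASCII and in different encodings.
error = begin
  (+"caf\u00e9") << "\xFF".b
  nil
rescue Encoding::CompatibilityError => e
  e
end
puts error.class
```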
* Rename ENCINDEX_ASCII to ENCINDEX_ASCII_8BIT | Jean Boussier | 2022-07-19 | 1 | -2/+2
| Otherwise it's way too easy to confuse it with US_ASCII.
* [DOC] Correct call-seq directive in string.c (#6131) | Burdette Lamar | 2022-07-13 | 1 | -1/+1
| Correct call-seq directive in string.c
* Using is_ascii_string to check encoding | S-H-GAMELINKS | 2022-06-17 | 1 | -3/+3
|
* Remove unused and accidentally public rb_str_shared_root_p() | Alan Wu | 2022-06-16 | 1 | -6/+0
| This function was added to a public header in [1] probably unintentionally since it's not used anywhere, exposes implementation details, and isn't related to the goals of that pull request.
|
| [1]: 56cc3e99b6b9ec004255280337f6b8353f5e5b06
* Add placeholder to let braces match | Nobuyoshi Nakada | 2022-06-14 | 1 | -6/+6
|
* Move String RVALUES between pools | Matt Valentine-House | 2022-06-13 | 1 | -4/+73
| And re-embed any strings that can now fit inside the slot they've been moved to
* [DOC] Fix markup for `String` (#5984) | Alexander Ilyin | 2022-06-09 | 1 | -1/+1
| * Add missing space for `String#start_with?`.
| * Add missing pluses for `String#tr` and `Methods for Converting to New String` label.
| * Move quote into the tag for `Whitespace in Strings` label.
* Revert "error.c: Let Exception#inspect inspect its message" | Yusuke Endoh | 2022-06-07 | 1 | -1/+1
| This reverts commit 9d927204e7b86eb00bfd07a060a6383139edf741.
* error.c: Let Exception#inspect inspect its message | Yusuke Endoh | 2022-06-07 | 1 | -1/+1
| ... only when the message string has a newline.
|
| `p StandardError.new("foo\nbar")` now prints `#<StandardError: "foo\nbar">` instead of: #<StandardError: bar>
|
| [Bug #18170]
* [Feature #18595] Alias String#-@ as String#dedup | Jean Boussier | 2022-05-20 | 1 | -1/+4
|
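
As a brief illustration of what the alias refers to: `String#-@` returns a frozen, deduplicated string, so equal contents map to the same object. The sketch guards the `dedup` call with `respond_to?`, since the alias only exists on Ruby builds that include this change.

```ruby
# String#-@ returns a frozen, interned copy; equal contents share one object.
a = -"hello"
b = -("hel" + "lo")
same_object = a.equal?(b)

puts a.frozen?
puts same_object

# The alias is only available where this feature has landed.
dedup_same = nil
if "".respond_to?(:dedup)
  dedup_same = ("hel" + "lo").dedup.equal?(a)
  puts dedup_same
end
```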
* [DOC] Move the documentations of moved Symbol methods | Nobuyoshi Nakada | 2022-04-14 | 1 | -45/+2
|
* [DOC] Enhanced RDoc for Symbol (#5796) | Burdette Lamar | 2022-04-13 | 1 | -41/+31
| Treats:
|   #[]
|   #length
|   #empty?
|   #upcase
|   #downcase
|   #capitalize
|   #swapcase
|   #start_with?
|   #end_with?
|   #encoding
|   ::all_symbols
* Enforce literals on the second arguments | Nobuyoshi Nakada | 2022-04-13 | 1 | -1/+1
|
* Enhanced RDoc for Symbol (#5795) | Burdette Lamar | 2022-04-12 | 1 | -92/+80
| Treats:
|   #==
|   #inspect
|   #name
|   #to_s
|   #to_sym
|   #to_proc
|   #succ
|   #<=>
|   #casecmp
|   #casecmp?
|   #=~
|   #match
|   #match?
* Fix some RDoc links (#5778) | Burdette Lamar | 2022-04-08 | 1 | -11/+11
|
* All-in-one RDoc for class String (#5777) | Burdette Lamar | 2022-04-07 | 1 | -416/+0
|
* [DOC] Enhanced RDoc for string slices (#5769) | Burdette Lamar | 2022-04-06 | 1 | -107/+35
| Creates file doc/string/slices.rdoc that the string slicing methods can link to.
* Enhanced RDoc for String#index (#5759) | Burdette Lamar | 2022-04-04 | 1 | -30/+1
|
* [DOC] Enhanced RDoc for String (#5753) | Burdette Lamar | 2022-04-03 | 1 | -12/+2
| Treats:
|   #length
|   #bytesize
* [DOC] Enhanced RDoc for String (#5751) | Burdette Lamar | 2022-04-02 | 1 | -3/+1
| Adds to doc for String.new, also making it compliant with documentation_guide.rdoc.
|
| Fixes some broken links in io.c (that I failed to correct yesterday).
* [DOC] Enhanced RDoc for String (#5742) | Burdette Lamar | 2022-03-31 | 1 | -69/+42
| Treats:
|   #force_encoding
|   #b
|   #valid_encoding?
|   #ascii_only?
|   #scrub
|   #scrub!
|   #unicode_normalized?
|
| Plus a couple of minor tweaks.
* Repaired What's Here sections for Range, String, Symbol, Struct (#5735) | Burdette Lamar | 2022-03-30 | 1 | -184/+185
| Repaired What's Here sections for Range, String, Symbol, Struct.
* [DOC] Enhanced RDoc for String (#5730) | Burdette Lamar | 2022-03-29 | 1 | -34/+14
| Treats:
|   #start_with?
|   #end_with?
|   #delete_prefix
|   #delete_prefix!
|   #delete_suffix
|   #delete_suffix!
* [DOC] Enhanced RDoc for String (#5726) | Burdette Lamar | 2022-03-28 | 1 | -54/+16
| Treats:
|   #ljust
|   #rjust
|   #center
|   #partition
|   #rpartition
* [DOC] Enhanced RDoc for String (#5724) | Burdette Lamar | 2022-03-27 | 1 | -46/+57
| Treats:
|   #scan
|   #hex
|   #oct
|   #crypt
|   #ord
|   #sum
* [DOC] Fix references to unary operator | Nobuyoshi Nakada | 2022-03-27 | 1 | -4/+4
|
* Enhanced RDoc for String (#5723) | Burdette Lamar | 2022-03-26 | 1 | -47/+57
| Treats:
|   #lstrip
|   #lstrip!
|   #rstrip
|   #rstrip!
|   #strip
|   #strip!
|
| Adds section Whitespace in Strings.
* [DOC] Use simple references to operator methods | Nobuyoshi Nakada | 2022-03-26 | 1 | -6/+5
| Method references can not only be marked up as code, but also reflect the `--show-hash` option.
|
| The bug that prevented the old rdoc from correctly parsing these methods was fixed last month.
* [DOC] Enhanced RDoc for String (#5707) | Burdette Lamar | 2022-03-24 | 1 | -38/+15
| Treated:
|   #chomp
|   #chomp!
|   #chop
|   #chop!
* [DOC] Enhanced RDoc for String (#5685) | Burdette Lamar | 2022-03-22 | 1 | -59/+25
| Treats:
|   #chars
|   #codepoints
|   #each_char
|   #each_codepoint
|   #each_grapheme_cluster
|   #grapheme_clusters
|
| Also, corrects a passage in #unicode_normalize that mentioned module UnicodeNormalize, whose doc (:nodoc:, actually) says not to mention it.
* [DOC] Use RDoc inclusions in string.c (#5683) | Burdette Lamar | 2022-03-21 | 1 | -5/+42
| As @peterzhu2118 and @duerst have pointed out, putting string method's RDoc into doc/ (which allows non-ASCII in examples) makes the "click to toggle source" feature not work for that method.
|
| This PR moves the primary method doc back into string.c, then includes RDoc from doc/string/*.rdoc, and also removes doc/string.rdoc.
|
| The affected methods are:
|   ::new
|   #bytes
|   #each_byte
|   #each_line
|   #split
|
| The call-seq is in string.c because it works there; it did not work when the call-seq is in doc/string/*.rdoc.
|
| This PR also updates the relevant guidance in doc/documentation_guide.rdoc.
* [DOC] Enhanced RDoc for String (#5675) | Burdette Lamar | 2022-03-18 | 1 | -86/+7
| Treats:
|   #split
|   #each_line
|   #lines
|   #each_byte
|   #bytes
* Add String#bytesplice | Shugo Maeda | 2022-03-18 | 1 | -0/+71
|
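
The commit carries no description, so the following is only a hedged usage sketch based on the method's documented `(index, length, str)` form, which replaces a byte range in place. The call is guarded with `respond_to?` because the method is absent from Ruby builds that predate this commit, and the return value is deliberately not relied on.

```ruby
# In-place replacement of a byte range; offsets are in bytes, not characters.
s = +"hello"
t = +"caf\u00e9!"            # "\u00e9" occupies bytes 3..4 in UTF-8

if s.respond_to?(:bytesplice)
  s.bytesplice(0, 2, "HE")   # replace the first two bytes
  t.bytesplice(3, 2, "e")    # replace the two-byte character with one byte
end

p s
p t
```

The byte range has to start and end on character boundaries, which is why the second call covers both bytes of the accented character.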
* [DOC] Enhanced RDoc for String#split (#5644) | Burdette Lamar | 2022-03-16 | 1 | -52/+1
| * Enhanced RDoc for String#split
| * Enhanced RDoc for String#split
| * Enhanced RDoc for String#split
| * Enhanced RDoc for String#split
| * Enhanced RDoc for String#split
* Initialize mutex for crypt(3) statically | Nobuyoshi Nakada | 2022-03-16 | 1 | -24/+1
| Assuming that all platforms, where only `crypt` is available but not `crypt_r`, are POSIX-based.
* [DOC] Enhanced RDoc for String (#5635) | Burdette Lamar | 2022-03-09 | 1 | -25/+24
| Treats:
|   #count
|   #delete
|   #delete!
|   #squeeze
|   #squeeze!
|
| Adds section "Multiple Character Selectors" to doc/character_selectors.rdoc.
|
| Co-authored-by: Peter Zhu <peter@peterzhu.ca>
* [DOC] Enhanced RDoc for String (#5633) | Burdette Lamar | 2022-03-09 | 1 | -38/+32
| Treats:
|   #tr (revised to link to "Character Selectors" document)
|   #tr!
|   #tr_s
|   #tr_s!
|
| Also renames doc/character_selector.rdoc to match its title.
* [DOC] Fix default offset of String#byterindex | Kazuhiro NISHIYAMA | 2022-03-09 | 1 | -2/+2
|
* [DOC] Enhanced RDoc for String #tr and #tr! (#5626) | Burdette Lamar | 2022-03-07 | 1 | -27/+41
|
* [DOC] mark `rb_str_init` as `:nodoc:` | Nobuyoshi Nakada | 2022-03-03 | 1 | -0/+1
| Otherwise, an empty entry will be generated as `String::new` along with the one from doc/string.rb.
* [DOC] Fix String#getbyte doc | Mau Magnaguagno | 2022-03-01 | 1 | -5/+6
| * String#getbyte returns `nil` if `index` is out of range.
| * Add String#getbyte example with nil output.
| * Modify String#getbyte example to use negative index.
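
The three points in this documentation fix correspond to behaviour that is easy to check directly. A short sketch, with illustrative variable names:

```ruby
# String#getbyte: byte value at an index, nil when the index is out of range.
s = "abc"

first   = s.getbyte(0)    # => 97  (byte for "a")
last    = s.getbyte(-1)   # => 99  (negative index counts from the end)
past    = s.getbyte(3)    # => nil (one past the last byte)
too_low = s.getbyte(-4)   # => nil (before the first byte)

p [first, last, past, too_low]
```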