path: root/benchmark
Commit message | Author | Date | Files | Lines (-/+)
* Optimize method_missing calls | Jeremy Evans | 2023-04-25 | 1 | -0/+62

CALLER_ARG_SPLAT is not necessary for method_missing. We just need to
unshift the method name into the arguments.

This optimizes all method_missing calls:

* mm(recv) ~9%
* mm(recv, *args) ~215% for args.length == 200
* mm(recv, *args, **kw) ~55% for args.length == 200
* mm(recv, **kw) ~22%
* mm(recv, kw: 1) ~100%

Note that empty argument splats do get slower with this approach, by about
30-40%. Other than empty argument splats, argument splats are faster, with
the speedup depending on the number of arguments.

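A minimal sketch of the call shapes listed above; the `Receiver` class and the
`mm` method name are hypothetical, not the benchmark file added by this commit:

```ruby
# Every call below is dispatched through method_missing, with the argument
# shapes measured above ("mm(recv, ...)" = calling mm on recv).
class Receiver
  def method_missing(name, *args, **kw)
    [name, args.size, kw.size]
  end

  def respond_to_missing?(name, include_private = false)
    true
  end
end

recv = Receiver.new
args = Array.new(200) { |i| i }
kw   = { kw: 1 }

recv.mm                # mm(recv)
recv.mm(*args)         # mm(recv, *args)
recv.mm(*args, **kw)   # mm(recv, *args, **kw)
recv.mm(**kw)          # mm(recv, **kw)
recv.mm(kw: 1)         # mm(recv, kw: 1)
```
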
* Optimize symproc calls | Jeremy Evans | 2023-04-25 | 1 | -0/+83

Similar to the bmethod/send optimization, this avoids using CALLER_ARG_SPLAT
if not necessary. As long as the receiver argument can be shifted off, other
arguments are passed through as-is.

This optimizes the following types of calls:

* symproc.(recv) ~5%
* symproc.(recv, *args) ~65% for args.length == 200
* symproc.(recv, *args, **kw) ~45% for args.length == 200
* symproc.(recv, **kw) ~30%
* symproc.(recv, kw: 1) ~100%

Note that empty argument splats do get slower with this approach, by about
2-3%. This is probably because iseq argument setup is slower for empty
argument splats than CALLER_SETUP_ARG is. Other than empty argument splats,
argument splats are faster, with the speedup depending on the number of
arguments.

The following types of calls are not optimized:

* symproc.(*args)
* symproc.(*args, **kw)

This is because you cannot shift the receiver argument off without first
splatting the argument array.

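For illustration, a hypothetical example (not the committed benchmark) of how a
Symbol proc takes its receiver as the first argument, which is why the
`symproc.(*args)` shape cannot be optimized:

```ruby
symproc = :upcase.to_proc

# A Symbol proc treats its first argument as the receiver.
symproc.call("abc")        # symproc.(recv)        => "ABC"
symproc.("abc")            # same call via .()

concat = :concat.to_proc
args = %w[b c d]
concat.call("a", *args)    # symproc.(recv, *args) => "abcd"

# Here the receiver is buried inside the splatted array, so the VM cannot
# shift it off without splatting first; this shape stays unoptimized.
concat.(*["a", *args])     # symproc.(*args)       => "abcd"
```
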
* Optimize send calls | Jeremy Evans | 2023-04-25 | 1 | -0/+77

Similar to the bmethod optimization, this avoids using CALLER_ARG_SPLAT if
not necessary. As long as the method argument can be shifted off, other
arguments are passed through as-is.

This optimizes the following types of calls:

* send(meth, arg) ~5%
* send(meth, *args) ~75% for args.length == 200
* send(meth, *args, **kw) ~50% for args.length == 200
* send(meth, **kw) ~25%
* send(meth, kw: 1) ~115%

Note that empty argument splats do get slower with this approach, by about
20%. This is probably because iseq argument setup is slower for empty
argument splats than CALLER_SETUP_ARG is. Other than empty argument splats,
argument splats are faster, with the speedup depending on the number of
arguments.

The following types of calls are not optimized:

* send(*args)
* send(*args, **kw)

This is because you cannot shift the method argument off without first
splatting the argument array.

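A hypothetical sketch of the send shapes above; `target` is an assumed method
name, not part of the commit:

```ruby
def target(*args, **kw)
  [args.size, kw.size]
end

args = Array.new(200) { |i| i }
kw   = { kw: 1 }

send(:target, 1)             # send(meth, arg)
send(:target, *args)         # send(meth, *args)
send(:target, *args, **kw)   # send(meth, *args, **kw)
send(:target, **kw)          # send(meth, **kw)
send(:target, kw: 1)         # send(meth, kw: 1)

# Unoptimized shapes: the method name itself is inside the splat, so it
# cannot be shifted off without splatting the array first.
send(*[:target, *args])
send(*[:target, *args], **kw)
```
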
* Optimize cfunc calls for f(*a) and f(*a, **kw) if kw is empty | Jeremy Evans | 2023-04-25 | 1 | -2/+13

This optimizes the following calls:

* ~10-15% for f(*a) when a does not end with a flagged keywords hash
* ~10-15% for f(*a) when a ends with an empty flagged keywords hash
* ~35-40% for f(*a, **kw) if kw is empty

This still copies the array contents to the VM stack, but avoids some
overhead. It would be faster to use the array pointer directly, but that
could cause problems if the array was modified during the call to the
function. You could do that optimization for frozen arrays, but as splatting
frozen arrays is uncommon, and the speedup is minimal (<5%), it doesn't seem
worth it.

The vm_send_cfunc benchmark has been updated to test additional cfunc call
types, and the numbers above were taken from the benchmark results.

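As a rough illustration (assuming `Array#push` as an arbitrary C-implemented
method), these are the call shapes that would hit the optimized path:

```ruby
ary = []
a   = [1, 2, 3]
kw  = {}

# Array#push is implemented in C (a cfunc); these are the splat shapes
# the commit optimizes.
ary.push(*a)         # f(*a), where a does not end in a flagged keywords hash
ary.push(*a, **kw)   # f(*a, **kw) with kw empty: no keywords are passed
```
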
* Speed up calling iseq bmethods | Jeremy Evans | 2023-04-25 | 1 | -0/+37

Currently, bmethod arguments are copied from the VM stack to the C stack in
vm_call_bmethod, then copied from the C stack to the VM stack later in
invoke_iseq_block_from_c. This is inefficient.

This adds vm_call_iseq_bmethod and vm_call_noniseq_bmethod.
vm_call_iseq_bmethod is an optimized method that skips stack copies (though
there is one copy to remove the receiver from the stack), and avoids calling
vm_call_bmethod_body, rb_vm_invoke_bmethod, invoke_block_from_c_proc,
invoke_iseq_block_from_c, and vm_yield_setup_args.

The vm_call_iseq_bmethod argument handling is similar to the way normal iseq
methods are called, and allows for similar performance optimizations when
using splats or keywords. However, even in the no argument case it's still
significantly faster.

A benchmark is added for bmethod calling. In my environment, it improves
bmethod calling performance by 38-59% for simple bmethod calls, and up to
180% for bmethod calls passing literal keywords on both sides.

```
./miniruby-iseq-bmethod:  18159792.6 i/s
./miniruby-m:             13174419.1 i/s - 1.38x  slower

bmethod_simple_1
./miniruby-iseq-bmethod:  15890745.4 i/s
./miniruby-m:             10008972.7 i/s - 1.59x  slower

bmethod_simple_0_splat
./miniruby-iseq-bmethod:  13142804.3 i/s
./miniruby-m:             11168595.2 i/s - 1.18x  slower

bmethod_simple_1_splat
./miniruby-iseq-bmethod:  12375791.0 i/s
./miniruby-m:              8491140.1 i/s - 1.46x  slower

bmethod_no_splat
./miniruby-iseq-bmethod:  10151258.8 i/s
./miniruby-m:              8716664.1 i/s - 1.16x  slower

bmethod_0_splat
./miniruby-iseq-bmethod:   8138802.5 i/s
./miniruby-m:              7515600.2 i/s - 1.08x  slower

bmethod_1_splat
./miniruby-iseq-bmethod:   8028372.7 i/s
./miniruby-m:              5947658.6 i/s - 1.35x  slower

bmethod_10_splat
./miniruby-iseq-bmethod:   6953514.1 i/s
./miniruby-m:              4840132.9 i/s - 1.44x  slower

bmethod_100_splat
./miniruby-iseq-bmethod:   5287288.4 i/s
./miniruby-m:              2243218.4 i/s - 2.36x  slower

bmethod_kw
./miniruby-iseq-bmethod:   8931358.2 i/s
./miniruby-m:              3185818.6 i/s - 2.80x  slower

bmethod_no_kw
./miniruby-iseq-bmethod:  12281287.4 i/s
./miniruby-m:             10041727.9 i/s - 1.22x  slower

bmethod_kw_splat
./miniruby-iseq-bmethod:   5618956.8 i/s
./miniruby-m:              3657549.5 i/s - 1.54x  slower
```

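For context, a simplified, assumed example of what an iseq bmethod is (a method
whose body is a Ruby block passed to define_method); `Foo` and its methods are
hypothetical:

```ruby
class Foo
  # An "iseq bmethod": the method body is a Ruby block, so the new
  # vm_call_iseq_bmethod path applies.
  define_method(:simple) { |a| a }

  # Literal keywords on both the definition and the call side
  # (the case with the largest speedup above).
  define_method(:kw) { |a: 1| a }

  # A non-iseq bmethod would wrap a non-Ruby proc, e.g.:
  #   define_method(:name, :to_s.to_proc)
end

foo = Foo.new
foo.simple(1)   # => 1
foo.kw(a: 2)    # => 2
```
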
* Remove MJIT-specific benchmarks | Takashi Kokubun | 2023-03-06 | 7 | -149/+0

* Fix spelling (#7405) | John Bampton | 2023-02-28 | 1 | -1/+1

* Benchmark String interpolation across size pools | Matt Valentine-House | 2023-01-13 | 1 | -0/+6

* Rewrite Kernel#loop in Ruby (#6983) | Takashi Kokubun | 2022-12-25 | 1 | -1/+1

* Rewrite Kernel#loop in Ruby

* Use enum_for(:loop) { Float::INFINITY }

  Co-authored-by: Ufuk Kayserilioglu <ufuk@paralaus.com>

* Limit the scope to rescue StopIteration

  Co-authored-by: Ufuk Kayserilioglu <ufuk@paralaus.com>

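A simplified sketch of the approach described above; `my_loop` is a
hypothetical stand-in, and the real definition in kernel.rb differs in details:

```ruby
module Kernel
  # Return an infinite-size Enumerator when no block is given, and rescue
  # StopIteration only around the yield itself.
  def my_loop
    return enum_for(:my_loop) { Float::INFINITY } unless block_given?

    while true
      begin
        yield
      rescue StopIteration => e
        return e.result
      end
    end
  end
end

my_loop { break }   # runs until the block breaks
my_loop.size        # => Float::INFINITY
```
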
* [Feature #18033] Make Time.new parse time strings | Nobuyoshi Nakada | 2022-12-16 | 1 | -0/+2

`Time.new` now parses strings such as the result of `Time#inspect` and
restricted ISO-8601 formats.

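A short usage example of the behavior described above:

```ruby
# Time.new now accepts a string like the output of Time#inspect; previously
# the first positional argument had to be the year.
t = Time.new("2022-12-16 12:34:56 +09:00")
t.year        # => 2022
t.utc_offset  # => 32400
```
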
* Introduce BOP_CMP for optimized comparison | Daniel Colson | 2022-12-06 | 4 | -0/+57

Prior to this commit the `OPTIMIZED_CMP` macro relied on a method lookup to
determine whether `<=>` was overridden. The result of the lookup was cached,
but only for the duration of the specific method that initialized the
cmp_opt_data cache structure.

With this method lookup, `[x,y].max` is slower than doing `x > y ? x : y`
even though there's an optimized instruction for "new array max". (John
noticed somebody proposed a micro-optimization based on this fact in
https://github.com/mastodon/mastodon/pull/19903.)

```rb
a, b = 1, 2

Benchmark.ips do |bm|
  bm.report('conditional') { a > b ? a : b }
  bm.report('method') { [a, b].max }
  bm.compare!
end
```

Before:

```
Comparison:
         conditional: 22603733.2 i/s
              method: 19820412.7 i/s - 1.14x  (± 0.00) slower
```

This commit replaces the method lookup with a new CMP basic op, which gives
the examples above equivalent performance.

After:

```
Comparison:
              method: 24022466.5 i/s
         conditional: 23851094.2 i/s - same-ish: difference falls within error
```

Relevant benchmarks show an improvement to Array#max and Array#min when not
using the optimized newarray_max instruction as well. They are noticeably
faster for small arrays with the relevant types, and the same or maybe a
touch faster on larger arrays.

```
$ make benchmark COMPARE_RUBY=<master@5958c305> ITEM=array_min
$ make benchmark COMPARE_RUBY=<master@5958c305> ITEM=array_max
```

The benchmarks added in this commit also look generally improved.

Co-authored-by: John Hawthorn <jhawthorn@github.com>

* Rename --mjit-min-calls to --mjit-call-threshold (#6731) | Takashi Kokubun | 2022-11-14 | 1 | -1/+1

for consistency with YJIT

* Improve HTML escape benchmarks | Takashi Kokubun | 2022-11-04 | 2 | -23/+45

* Improve performance of some `Integer` and `Float` methods [Feature #19085] (#6638) | S.H | 2022-10-27 | 1 | -0/+16

* Improve some Integer and Float methods

* Use alias and remove unnecessary code

* Remove commented-out code

* Add several new methods for getting and setting buffer contents. (#6434) | Samuel Williams | 2022-09-26 | 2 | -5/+47

* Adds a benchmark to measure freezing objects | Jemma Issroff | 2022-09-22 | 1 | -0/+6

* avoid extra dup and pop in compile_op_asgn2 | HParker | 2022-09-22 | 1 | -0/+8

Co-authored-by: John Hawthorn <jhawthorn@github.com>

* Fix style on vm_ivar benchmarks (#6379) | Jemma Issroff | 2022-09-15 | 3 | -27/+27

* Add vm_ivar get, get_unitialized, and lazy_set benchmarks | Jemma Issroff | 2022-09-14 | 3 | -0/+61

* rb_str_concat_literals: use rb_str_buf_append | Jean Boussier | 2022-09-08 | 1 | -0/+10

That's about 1.30x faster.

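rb_str_concat_literals backs the VM's concatstrings instruction, so (assuming
that mapping) interpolated strings with several segments are the kind of Ruby
code this speeds up; the snippet below is illustrative only:

```ruby
name = "world"

# Interpolation with several segments ("hello ", name.to_s, "!") compiles to
# a single concatstrings instruction that joins all parts at once.
greeting = "hello #{name}!"

# The instruction is visible in the disassembly:
puts RubyVM::InstructionSequence.compile('"hello #{name}!"').disasm
```
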
* New constant caching insn: opt_getconstant_path | John Hawthorn | 2022-09-01 | 1 | -0/+6

Previously YARV bytecode implemented constant caching by having a pair of
instructions, opt_getinlinecache and opt_setinlinecache, wrapping a series of
getconstant calls (with putobject providing supporting arguments).

This commit replaces that pattern with a new instruction,
opt_getconstant_path, handling both getting/setting the inline cache and
fetching the constant on a cache miss.

This is implemented by storing the full constant path as a null-terminated
array of IDs inside of the IC structure. idNULL is used to signal an absolute
constant reference.

    $ ./miniruby --dump=insns -e '::Foo::Bar::Baz'
    == disasm: #<ISeq:<main>@-e:1 (1,0)-(1,13)> (catch: FALSE)
    0000 opt_getconstant_path    <ic:0 ::Foo::Bar::Baz>    (   1)[Li]
    0002 leave

The motivation for this is that we had increasingly found the need to
disassemble the instructions between the opt_getinlinecache and
opt_setinlinecache in order to determine the constant we are fetching, or
otherwise store metadata.

This disassembly was done:

* In opt_setinlinecache, to register the IC against the constant names it is
  using for granular invalidation.
* In rb_iseq_free, to unregister the IC from the invalidation table.
* In YJIT, to find the position of an opt_getinlinecache instruction in order
  to invalidate it when the cache is populated.
* In YJIT, to register the constant names being used for invalidation.

With this change we no longer need disassembly for these (in fact
rb_iseq_each is now unused), as the list of constant names being referenced
is held in the IC. This should also make it possible to make more
optimizations in the future.

This may also reduce the size of iseqs, as previously 32 bytes (on 64-bit
platforms) were required for each constant segment. This implementation only
stores one ID per segment.

There should be no significant performance change between this and the
previous implementation. Previously opt_getinlinecache was a "leaf"
instruction, but it included a jump (almost always to a separate cache line).
Now opt_getconstant_path is a non-leaf (it may raise/autoload/call
const_missing) but it does not jump. These seem to even out.

* Remove mjit_exec benchmarks | Takashi Kokubun | 2022-08-21 | 4 | -255/+0

Now that mjit_exec doesn't exist, those files feel old. I'll probably change
how I benchmark it when I add benchmarks for it again.

* Rename mjit_compile.c to mjit_compiler.c | Takashi Kokubun | 2022-08-21 | 1 | -1/+1

I'm planning to introduce mjit_compiler.rb, and I want to make this
consistent with it. Consistency with compile.c doesn't seem important for
MJIT anyway.

* Rename mjit_exec to jit_exec (#6262) | Takashi Kokubun | 2022-08-19 | 1 | -3/+3

* Rename mjit_exec to jit_exec

* Rename mjit_exec_slowpath to mjit_check_iseq

* Remove mjit_exec references from comments

* Make benchmark indentation consistent | Takashi Kokubun | 2022-08-19 | 2 | -55/+55

Related to https://github.com/Shopify/yjit-bench/pull/109

* Added vm setivar benchmark from yjit-bench | Jemma Issroff | 2022-08-17 | 1 | -0/+35

* Optimize Marshal dump/load for large (> 31-bit) FIXNUM (#6229) | John Hawthorn | 2022-08-15 | 1 | -0/+22

* Optimize Marshal dump of large fixnum

  Marshal's FIXNUM type only supports 31-bit fixnums, so on 64-bit platforms
  the 63-bit fixnums need to be represented in Marshal's BIGNUM.

  Previously this was done by converting to a bignum and serializing the
  bignum object. This commit avoids allocating the intermediate bignum
  object, instead outputting the T_FIXNUM directly to a Marshal bignum. This
  maintains the same representation as the previous implementation, including
  not using LINKs for these large fixnums (an artifact of the previous
  implementation always allocating a new BIGNUM).

  This commit also avoids unnecessary st_lookups on immediate values, which
  we know will not be in that table.

* Fastpath for loading FIXNUM from Marshal bignum

* Run update-deps

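A small illustration of the values affected, assuming a 64-bit platform; the
snippet is not part of the commit:

```ruby
# 2**40 fits in a 64-bit platform's Fixnum range but not in Marshal's
# 31-bit FIXNUM type, so it is written in Marshal's BIGNUM format
# (now without allocating an intermediate bignum object).
big    = 2**40
dumped = Marshal.dump(big)
Marshal.load(dumped) == big                    # => true

# Small values still use the compact FIXNUM type.
Marshal.dump(123).bytesize < dumped.bytesize   # => true
```
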
* Update multiple assignment benchmarks to include non-literal array cases | Jeremy Evans | 2022-08-09 | 1 | -1/+25

This allows them to show the effect of the previous newarray/expandarray to
swap/opt_reverse optimization. This shows a 35-83% performance improvement in
the four multiple assignment benchmarks that use this optimization.

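A hypothetical illustration of the literal versus non-literal distinction (an
assumed reading of the commit; the real benchmark file may differ):

```ruby
# Literal elements on the right-hand side:
a, b = 1, 2

# Non-literal elements (locals, method calls), the kind of case the updated
# benchmarks add so that the newarray/expandarray to swap/opt_reverse
# rewrite is actually exercised:
x, y = 3, 4
a, b = x, y
a, b = x.succ, y.succ
```
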
* Update IO::Buffer#get_value benchmark | Jean Boussier | 2022-08-08 | 1 | -8/+9

- The method was renamed from `get` to `get_value`
- Comparing to `String#unpack` isn't quite equivalent, `unpack1` is closer.
- Use frozen_string_literal to avoid allocating a format string every time.
- Use `N` format which is equivalent to `:U32` (`uint_32_t` big-endian).
- Disable experimental warnings to not mess up the output.

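A sketch of the comparison the updated benchmark makes, written to the points
listed above; the exact values and layout are assumed:

```ruby
# frozen_string_literal: true

Warning[:experimental] = false   # IO::Buffer warnings would clutter output

string = [1234].pack("N")        # 32-bit big-endian, same layout as :U32
buffer = IO::Buffer.for(string)

buffer.get_value(:U32, 0)        # => 1234 (the method renamed from #get)
string.unpack1("N")              # => 1234 (closest String equivalent)
```
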
* rb_str_buf_append: add a fast path for ENC_CODERANGE_VALID | Jean Boussier | 2022-07-25 | 1 | -13/+13

If the RHS has valid encoding, and both strings have the same encoding, we
can use the fast path. However we need to update the LHS coderange.

```
compare-ruby: ruby 3.2.0dev (2022-07-21T14:46:32Z master cdbb9b8555) [arm64-darwin21]
built-ruby: ruby 3.2.0dev (2022-07-25T07:25:41Z string-concat-vali.. 11a2772bdd) [arm64-darwin21]
warming up...

|                    |compare-ruby|built-ruby|
|:-------------------|-----------:|---------:|
|binary_concat_7bit  |    554.816k|  556.460k|
|                    |           -|     1.00x|
|utf8_concat_7bit    |    556.367k|  555.101k|
|                    |       1.00x|         -|
|utf8_concat_UTF8    |    412.555k|  556.824k|
|                    |           -|     1.35x|
```

* string.c: use str_enc_fastpath in TERM_LEN | Jean Boussier | 2022-07-21 | 1 | -3/+4

Not having to fetch the rb_encoding saves a significant amount of time.
Additionally, even when we have to fetch it, we can do it faster using
`ENCODING_GET` rather than `rb_enc_get`.

```
compare-ruby: ruby 3.2.0dev (2022-07-19T08:41:40Z master cb9fd920a3) [arm64-darwin21]
built-ruby: ruby 3.2.0dev (2022-07-21T11:16:16Z faster-buffer-conc.. 4f001f0748) [arm64-darwin21]
warming up...

|                      |compare-ruby|built-ruby|
|:---------------------|-----------:|---------:|
|binary_concat_utf8    |    510.580k|  565.600k|
|                      |           -|     1.11x|
|binary_concat_binary  |    512.653k|  571.483k|
|                      |           -|     1.11x|
|utf8_concat_utf8      |    511.396k|  566.879k|
|                      |           -|     1.11x|
```

* rb_str_buf_append: fastpath to str_buf_cat | Jean Boussier | 2022-07-19 | 1 | -2/+23

If the LHS is ASCII compatible and the RHS is 7BIT we can directly concat
without being concerned about anything else.

Benchmark:

```
compare-ruby: ruby 3.2.0dev (2022-07-12T15:01:11Z master 71aec68566) [arm64-darwin21]
built-ruby: ruby 3.2.0dev (2022-07-13T10:13:53Z faster-buffer-conc.. a04c10476d) [arm64-darwin21]
warming up...

|                      |compare-ruby|built-ruby|
|:---------------------|-----------:|---------:|
|binary_append_utf8    |    385.315k|  573.663k|
|                      |           -|     1.49x|
|binary_append_binary  |    446.579k|  574.898k|
|                      |           -|     1.29x|
|utf8_append_utf8      |    430.936k|  573.394k|
|                      |           -|     1.33x|
```

Note that in the benchmark, the RHS always has a precomputed coderange, so
the benchmark never enters the slow path of having to scan the RHS. However,
it's extremely likely that we'll end up scanning it anyway in
rb_enc_cr_str_buf_cat.

* Add benchmarks for setting / getting ivars on generics | Jemma Issroff | 2022-07-15 | 2 | -0/+31

* Fixes ivar benchmarks to not depend on object allocation | Jemma Issroff | 2022-07-15 | 3 | -7/+14

Prior to this change, we were measuring object allocation as well as setting
instance variables within ivar benchmarks. With this change, we now only
measure setting instance variables within ivar benchmarks.

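A generic before/after sketch of the change described above (hypothetical code,
not the actual benchmark files):

```ruby
# Before (simplified): the object allocation is inside the measured code,
# so the benchmark times allocation as well as the ivar write.
1_000.times { Object.new.instance_variable_set(:@a, 1) }

# After (simplified): allocate once up front and measure only the ivar write.
obj = Object.new
1_000.times { obj.instance_variable_set(:@a, 1) }
```
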
* vm_opt_ltlt: call rb_str_buf_append directly if RHS is a String | Jean Boussier | 2022-07-06 | 1 | -0/+13

`rb_str_concat` does a lot of type checking we can easily bypass.

```
|               |compare-ruby|built-ruby|
|:--------------|-----------:|---------:|
|string_concat  |    362.007k|  398.965k|
|               |           -|     1.10x|
```

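An illustrative example, assuming `buf` as an arbitrary String, of which `<<`
calls take the new fast path:

```ruby
buf = +"hello"

# String RHS: handled by the opt_ltlt instruction, now appending directly
# via rb_str_buf_append.
buf << " world"

# Integer RHS appends a codepoint instead, so it still goes through the
# generic rb_str_concat path.
buf << 33    # buf is now "hello world!"
```
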
* Added vm_ivar benchmark for initializing an embedded obj | Jemma Issroff | 2022-06-16 | 2 | -1/+13

* Update the help message on /benchmark | Takashi Kokubun | 2022-06-07 | 1 | -3/+5

I wanted to point out there's --output=all.

* Add IO write throughput/locking overhead benchmark. | Samuel Williams | 2022-05-28 | 1 | -0/+22

* Finer-grained constant cache invalidation (take 2) | Kevin Newton | 2022-04-01 | 1 | -0/+22

This commit reintroduces finer-grained constant cache invalidation. After
8008fb7 got merged, it was causing issues on token-threaded builds (such as
on Windows).

The issue was that when you're iterating through instruction sequences and
using the translator functions to get back the instruction structs, you're
either using `rb_vm_insn_null_translator` or `rb_vm_insn_addr2insn2`,
depending on whether it's a direct-threading build. `rb_vm_insn_addr2insn2`
does some normalization to always return to you the non-trace version of
whatever instruction you're looking at. `rb_vm_insn_null_translator` does
not do that normalization.

This means that when you're looping through the instructions and trying to
do an opcode comparison, the result can change depending on the type of
threading that you're using. This can be very confusing.

So, this commit creates a new translator function
`rb_vm_insn_normalizing_translator` to always return the non-trace version
so that opcode comparisons don't have to worry about different
configurations.

[Feature #18589]

* Revert "Finer-grained inline constant cache invalidation"Nobuyoshi Nakada2022-03-251-22/+0
| | | | | | | | | | | | This reverts commits for [Feature #18589]: * 8008fb7352abc6fba433b99bf20763cf0d4adb38 "Update formatting per feedback" * 8f6eaca2e19828e92ecdb28b0fe693d606a03f96 "Delete ID from constant cache table if it becomes empty on ISEQ free" * 629908586b4bead1103267652f8b96b1083573a8 "Finer-grained inline constant cache invalidation" MSWin builds on AppVeyor have been crashing since the merger.
* Finer-grained inline constant cache invalidation | Kevin Newton | 2022-03-24 | 1 | -0/+22

Current behavior - caches depend on a global counter. All constant mutations
cause caches to be invalidated.

```ruby
class A
  B = 1
end

def foo
  A::B # inline cache depends on global counter
end

foo # populate inline cache
foo # hit inline cache

C = 1 # global counter increments, all caches are invalidated

foo # misses inline cache due to `C = 1`
```

Proposed behavior - caches depend on name components. Only constant
mutations with corresponding names will invalidate the cache.

```ruby
class A
  B = 1
end

def foo
  A::B # inline cache depends on constants named "A" and "B"
end

foo # populate inline cache
foo # hit inline cache

C = 1 # caches that depend on the name "C" are invalidated

foo # hits inline cache because IC only depends on "A" and "B"
```

Examples of breaking the new cache:

```ruby
module C
  # Breaks `foo` cache because "A" constant is set and the cache in foo
  # depends on "A" and "B"
  class A; end
end

B = 1
```

We expect the new cache scheme to be invalidated less often because names
aren't frequently reused. With the cache being invalidated less, we can rely
on its stability more to keep our constant references fast and reduce the
need to throw away generated code in YJIT.

* Constant time class to class ancestor lookup | John Hawthorn | 2022-02-23 | 1 | -0/+27

Previously when checking ancestors, we would walk all the way up the ancestry
chain checking each parent for a matching class or module.

I believe this was especially unfriendly to CPU cache since for each step we
need to check two cache lines (the class and class ext).

This check is used quite often in:

* case statements
* rescue statements
* Calling protected methods
* Class#is_a?
* Module#===
* Module#<=>

I believe it's most common to check a class against a parent class, so this
commit aims to improve that (unfortunately it does not help checking for an
included Module).

This is done by storing on each class the number and an array of all parent
classes, in order (BasicObject is at index 0). Using this we can check
whether a class is a subclass of another in constant time since we know the
location to expect it in the hierarchy.

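Illustrative examples (hypothetical classes) of the checks listed above:

```ruby
class Animal; end
class Dog < Animal; end

dog = Dog.new

dog.is_a?(Animal)   # Class#is_a?
Animal === dog      # Module#===, also what case/when uses
Dog <=> Animal      # => -1, Module#<=> walks the ancestry too

case dog
when Animal then :matched   # class-to-class ancestor check
end

begin
  raise ArgumentError, "boom"
rescue StandardError        # rescue matches the error class the same way
  :rescued
end
```
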
* Speed up and avoid kwarg hash alloc in Time.now | John Hawthorn | 2022-01-12 | 1 | -0/+3

Previously Time.now was switched to use Time.new as it added support for the
in: argument. Unfortunately because Class#new is a cfunc this requires always
allocating a Hash.

This commit switches Time.now back to using a builtin time_s_now. This avoids
the extra Hash allocation and is about 3x faster.

```
$ benchmark-driver -e './ruby;3.1::~/.rubies/ruby-3.1.0/bin/ruby;3.0::~/.rubies/ruby-3.0.2/bin/ruby' benchmark/time_now.yml
Warming up --------------------------------------
              Time.now     6.704M i/s -      6.710M times in 1.000814s (149.16ns/i, 328clocks/i)
Time.now(in: "+09:00")     2.003M i/s -      2.112M times in 1.054330s (499.31ns/i)
Calculating -------------------------------------
                             ./ruby        3.1        3.0
              Time.now     7.693M     2.763M     6.394M i/s -   20.113M times in 2.614428s 7.278710s 3.145572s
Time.now(in: "+09:00")     2.030M     1.260M     1.617M i/s -    6.008M times in 2.960132s 4.769378s 3.716537s

Comparison:
              Time.now
                ./ruby:   7693129.7 i/s
                   3.0:   6394109.2 i/s - 1.20x  slower
                   3.1:   2763282.5 i/s - 2.78x  slower

Time.now(in: "+09:00")
                ./ruby:   2029757.4 i/s
                   3.0:   1616652.3 i/s - 1.26x  slower
                   3.1:   1259776.2 i/s - 1.61x  slower
```

* Prepare for removing RubyVM::JIT (#5262) | Takashi Kokubun | 2021-12-13 | 2 | -7/+7

* Optimize dynamic string interpolation for symbol/true/false/nil/0-9 | Jeremy Evans | 2021-11-18 | 10 | -0/+67

This provides a significant speedup for symbol, true, false, nil, and 0-9,
class/module, and a small speedup in most other cases.

Speedups (using included benchmarks):

:symbol        :: 60%
0-9            :: 50%
Class/Module   :: 50%
nil/true/false :: 20%
integer        :: 10%
[]             :: 10%
""             :: 3%

One reason this approach is faster is it reduces the number of VM
instructions for each interpolated value.

Initial idea, approach, and benchmarks from Eric Wong. I applied the same
approach against the master branch, updating it to handle the significant
internal changes since this was first proposed 4 years ago (such as
CALL_INFO/CALL_CACHE -> CALL_DATA). I also expanded it to optimize
true/false/nil/0-9/class/module, and added handling of missing methods,
refined methods, and RUBY_DEBUG.

This renames the tostring insn to anytostring, and adds an objtostring insn
that implements the optimization. This requires making a few functions
non-static, and adding some non-static functions.

This disables 4 YJIT tests. Those tests should be reenabled after YJIT
optimizes the new objtostring insn.

Implements [Feature #13715]

Co-authored-by: Eric Wong <e@80x24.org>
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
Co-authored-by: Yusuke Endoh <mame@ruby-lang.org>
Co-authored-by: Koichi Sasada <ko1@atdot.net>

* Skip string allocation in benchmark/time_at.yml | Takashi Kokubun | 2021-11-14 | 2 | -1/+2

and also drop a weird newline from benchmark/array_sample.yml.

* add benchmark/time_at.yml | Koichi Sasada | 2021-11-15 | 1 | -0/+5

```
                          ruby_2_6   ruby_2_7   ruby_3_0    master   modified
Time.at(0)                 12.362M    11.015M     9.499M    6.615M     9.000M i/s - 32.115M times in 2.597946s 2.915517s 3.380725s 4.854651s 3.568234s
Time.at(0, 500)             7.542M     7.136M     8.252M    5.707M     5.646M i/s - 20.713M times in 2.746279s 2.902556s 2.510166s 3.629644s 3.668854s
Time.at(0, in: "+09:00")    1.426M     1.346M     1.565M    1.674M     1.667M i/s -  4.240M times in 2.974049s 3.149753s 2.709416s 2.533043s 2.542853s
```

```
ruby_2_6: ruby 2.6.7p150 (2020-12-09 revision 67888) [x86_64-linux]
ruby_2_7: ruby 2.7.3p140 (2020-12-09 revision 9b884df6dd) [x86_64-linux]
ruby_3_0: ruby 3.0.3p150 (2021-11-06 revision 6d540c1b98) [x86_64-linux]
master: ruby 3.1.0dev (2021-11-13T20:48:57Z master fc456adc6a) [x86_64-linux]
modified: ruby 3.1.0dev (2021-11-15T01:12:51Z mandatory_only_bui.. b0228446db) [x86_64-linux]
```

* add benchmark/array_sample.yml | Koichi Sasada | 2021-11-15 | 1 | -0/+5

```
                 ruby_2_6   ruby_2_7   ruby_3_0     master   modified
ary.sample        32.113M    30.146M    11.162M    10.539M    26.620M i/s - 64.882M times in 2.020428s 2.152296s 5.812981s 6.156398s 2.437325s
ary.sample(2)      9.420M     8.987M     7.500M     6.973M     7.191M i/s - 25.170M times in 2.672085s 2.800616s 3.355896s 3.609534s 3.500108s
```

```
ruby_2_6: ruby 2.6.7p150 (2020-12-09 revision 67888) [x86_64-linux]
ruby_2_7: ruby 2.7.3p140 (2020-12-09 revision 9b884df6dd) [x86_64-linux]
ruby_3_0: ruby 3.0.3p150 (2021-11-06 revision 6d540c1b98) [x86_64-linux]
master: ruby 3.1.0dev (2021-11-13T20:48:57Z master fc456adc6a) [x86_64-linux]
modified: ruby 3.1.0dev (2021-11-15T01:12:51Z mandatory_only_bui.. b0228446db) [x86_64-linux]
```

* IO::Buffer for scheduler interface. | Samuel Williams | 2021-11-10 | 1 | -0/+9

* add vm_ivar_of_class_set | Koichi Sasada | 2021-10-23 | 1 | -0/+11

benchmark for a class's ivar setter

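A hypothetical sketch of the shape such a benchmark measures (a setter for an
instance variable of the class object itself); `Config` is an assumed name:

```ruby
class Config
  # Setter for an instance variable of the class object itself,
  # not of Config instances.
  def self.default=(value)
    @default = value
  end

  def self.default
    @default
  end
end

Config.default = 42   # the write such a benchmark would measure
Config.default        # => 42
```
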