path: root/benchmark
* Support tracing of attr_reader and attr_writer (Jeremy Evans, 2021-08-29, 1 file, -0/+29)

In vm_call_method_each_type, check for c_call and c_return events before dispatching to vm_call_ivar and vm_call_attrset. With this approach, the call cache will still dispatch directly to those functions, so this change will only decrease performance for the first (uncached) call, and even then, the performance decrease is very minimal.

This approach requires that we clear the call caches when tracing is enabled or disabled. The approach currently switches all vm_call_ivar and vm_call_attrset call caches to vm_call_general any time tracing is enabled or disabled. So it could theoretically result in a slowdown for code that constantly enables or disables tracing.

This approach does not handle targeted tracepoints, but from my testing, c_call and c_return events are not supported for targeted tracepoints, so that shouldn't matter.

This includes a benchmark showing the performance decrease is minimal, if detectable at all.

Fixes [Bug #16383]
Fixes [Bug #10470]

Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
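As a rough illustration (not part of the commit; the Point class below is invented), c_call and c_return TracePoint events now fire for methods defined with attr_reader/attr_writer:

```ruby
class Point
  attr_reader :x
  attr_writer :y

  def initialize(x, y)
    @x = x
    @y = y
  end
end

point = Point.new(1, 2)

# c_call/c_return events now also cover attr_reader/attr_writer methods.
trace = TracePoint.new(:c_call, :c_return) do |tp|
  puts "#{tp.event} #{tp.method_id}"
end

trace.enable do
  point.x       # traced read
  point.y = 3   # traced write
end
```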
* Prefer qualified names under Thread (Nobuyoshi Nakada, 2021-06-29, 2 files, -9/+9)
* Add a cache for class variables (eileencodes, 2021-06-18, 1 file, -0/+20)

Redo of 34a2acdac788602c14bf05fb616215187badd504 and 931138b00696419945dc03e10f033b1f53cd50f3 which were reverted. GitHub PR #4340.

This change implements a cache for class variables. Previously there was no cache for cvars. Cvar access is slow due to needing to travel all the way up the ancestor tree before returning the cvar value. The deeper the ancestor tree, the slower cvar access will be.

The benefits of the cache are more visible with a higher number of included modules due to the way Ruby looks up class variables. The benchmark here includes 26 modules and shows that with the cache, this branch is 6.5x faster when accessing class variables.

```
compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master 9e5105c) [x86_64-darwin19]
built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be009) [x86_64-darwin19]

|        |compare-ruby|built-ruby|
|:-------|-----------:|---------:|
|vm_cvar |      5.681M|   36.980M|
|        |           -|     6.51x|
```

Benchmark.ips calling `ActiveRecord::Base.logger` from within a Rails application. ActiveRecord::Base.logger has 71 ancestors. The more ancestors a tree has, the clearer the speed increase. I.e., if Base had only one ancestor we'd see no improvement. This benchmark is run on a vanilla Rails application.

Benchmark code:

```ruby
require "benchmark/ips"
require_relative "config/environment"

Benchmark.ips do |x|
  x.report "logger" do
    ActiveRecord::Base.logger
  end
end
```

Ruby 3.0 master / Rails 6.1:

```
Warming up --------------------------------------
              logger   155.251k i/100ms
Calculating -------------------------------------
```

Ruby 3.0 with cvar cache / Rails 6.1:

```
Warming up --------------------------------------
              logger     1.546M i/100ms
Calculating -------------------------------------
              logger     14.857M (± 4.8%) i/s -     74.198M in   5.006202s
```

Lastly we ran a benchmark to demonstrate the difference between master and our cache when the number of modules increases. This benchmark measures 1 ancestor, 30 ancestors, and 100 ancestors.

Ruby 3.0 master:

```
Warming up --------------------------------------
            1 module     1.231M i/100ms
          30 modules   432.020k i/100ms
         100 modules   145.399k i/100ms
Calculating -------------------------------------
            1 module     12.210M (± 2.1%) i/s -     61.553M in   5.043400s
          30 modules      4.354M (± 2.7%) i/s -     22.033M in   5.063839s
         100 modules      1.434M (± 2.9%) i/s -      7.270M in   5.072531s

Comparison:
            1 module: 12209958.3 i/s
          30 modules:  4354217.8 i/s - 2.80x  (± 0.00) slower
         100 modules:  1434447.3 i/s - 8.51x  (± 0.00) slower
```

Ruby 3.0 with cvar cache:

```
Warming up --------------------------------------
            1 module     1.641M i/100ms
          30 modules     1.655M i/100ms
         100 modules     1.620M i/100ms
Calculating -------------------------------------
            1 module     16.279M (± 3.8%) i/s -     82.038M in   5.046923s
          30 modules     15.891M (± 3.9%) i/s -     79.459M in   5.007958s
         100 modules     16.087M (± 3.6%) i/s -     81.005M in   5.041931s

Comparison:
            1 module: 16279458.0 i/s
         100 modules: 16087484.6 i/s - same-ish: difference falls within error
          30 modules: 15891406.2 i/s - same-ish: difference falls within error
```

Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
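The logger script is included above, but the N-modules comparison is not; a sketch of its assumed shape (class construction and method names are invented for illustration):

```ruby
require "benchmark/ips"

# Build an instance whose class has `module_count` included modules in its
# ancestry, reading a class variable through that chain.
def build_reader(module_count)
  klass = Class.new
  module_count.times { klass.include(Module.new) }
  klass.class_variable_set(:@@cached, :value)
  klass.class_eval("def read_cvar; @@cached; end")
  klass.new
end

one, thirty, hundred = build_reader(1), build_reader(30), build_reader(100)

Benchmark.ips do |x|
  x.report("1 module")    { one.read_cvar }
  x.report("30 modules")  { thirty.read_cvar }
  x.report("100 modules") { hundred.read_cvar }
  x.compare!
end
```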
* Improve performance for Integer#size method [Feature #17135] (#3476) (S.H, 2021-06-04, 1 file, -0/+2)

  * Improve performance for Integer#size method [Feature #17135]
  * re-run ci
  * Let MJIT frame skip work for Integer#size

Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
* Implement some NilClass methods in Ruby code, which is faster [Feature #17054] (#3366) (S.H, 2021-06-02, 1 file, -0/+6)
* compile.c: Emit send for === calls in when statements (Alan Wu, 2021-05-28, 1 file, -0/+9)

The checkmatch instruction with VM_CHECKMATCH_TYPE_CASE calls === without a call cache. Emit a send instruction to make the call instead. It includes a call cache.

The call cache improves throughput of using when statements to check the class of a given object. This is useful for, say, JSON serialization.

Use of a regular send instead of checkmatch also avoids taking the VM lock every time, which is good for multi-ractor workloads.

    Calculating -------------------------------------
                          master        post
    vm_case_classes      11.013M     16.172M i/s -      6.000M times in 0.544795s 0.371009s
    vm_case_lit            2.296       2.263 i/s -       1.000 times in 0.435606s 0.441826s
    vm_case              74.098M     64.338M i/s -      6.000M times in 0.080974s 0.093257s

    Comparison:
      vm_case_classes
                 post:  16172114.4 i/s
               master:  11013316.9 i/s - 1.47x slower
      vm_case_lit
               master:         2.3 i/s
                 post:         2.3 i/s - 1.01x slower
      vm_case
               master:  74097858.6 i/s
                 post:  64338333.9 i/s - 1.15x slower

The vm_case benchmark is a bit slower post patch, possibly due to the larger instruction sequence. The benchmark dispatches using opt_case_dispatch so it was not running checkmatch and does not make the === call post patch.
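As a hedged illustration of the pattern the vm_case_classes benchmark exercises (the serialize method and its branches are invented for this example), each `when` clause below now compiles to a regular, cached `send` of `===`:

```ruby
# Class-based dispatch of the kind used for, say, JSON serialization.
def serialize(obj)
  case obj
  when Integer then obj.to_s
  when String  then obj.dump
  when Symbol  then obj.to_s.dump
  when nil     then "null"
  else              obj.to_s
  end
end

# Repeated calls benefit from the call cache on each ===.
100_000.times { serialize(42); serialize("foo"); serialize(:bar) }
```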
* Revert "Filling cache values on cvar write"Aaron Patterson2021-05-111-20/+0
| | | | | This reverts commit 08de37f9fa3469365e6b5c964689ae2bae0eb9f3. This reverts commit e8ae922b62adb00a80d3d4c49f7d7b0e6026eaba.
* Add a cache for class variables (eileencodes, 2021-05-11, 1 file, -0/+20)

This change implements a cache for class variables. Previously there was no cache for cvars. Cvar access is slow due to needing to travel all the way up the ancestor tree before returning the cvar value. The deeper the ancestor tree, the slower cvar access will be.

The benefits of the cache are more visible with a higher number of included modules due to the way Ruby looks up class variables. The benchmark here includes 26 modules and shows that with the cache, this branch is 6.5x faster when accessing class variables.

```
compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master 9e5105ca45) [x86_64-darwin19]
built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be0093ae) [x86_64-darwin19]

|        |compare-ruby|built-ruby|
|:-------|-----------:|---------:|
|vm_cvar |      5.681M|   36.980M|
|        |           -|     6.51x|
```

Benchmark.ips calling `ActiveRecord::Base.logger` from within a Rails application. ActiveRecord::Base.logger has 71 ancestors. The more ancestors a tree has, the clearer the speed increase. I.e., if Base had only one ancestor we'd see no improvement. This benchmark is run on a vanilla Rails application.

Benchmark code:

```ruby
require "benchmark/ips"
require_relative "config/environment"

Benchmark.ips do |x|
  x.report "logger" do
    ActiveRecord::Base.logger
  end
end
```

Ruby 3.0 master / Rails 6.1:

```
Warming up --------------------------------------
              logger   155.251k i/100ms
Calculating -------------------------------------
```

Ruby 3.0 with cvar cache / Rails 6.1:

```
Warming up --------------------------------------
              logger     1.546M i/100ms
Calculating -------------------------------------
              logger     14.857M (± 4.8%) i/s -     74.198M in   5.006202s
```

Lastly we ran a benchmark to demonstrate the difference between master and our cache when the number of modules increases. This benchmark measures 1 ancestor, 30 ancestors, and 100 ancestors.

Ruby 3.0 master:

```
Warming up --------------------------------------
            1 module     1.231M i/100ms
          30 modules   432.020k i/100ms
         100 modules   145.399k i/100ms
Calculating -------------------------------------
            1 module     12.210M (± 2.1%) i/s -     61.553M in   5.043400s
          30 modules      4.354M (± 2.7%) i/s -     22.033M in   5.063839s
         100 modules      1.434M (± 2.9%) i/s -      7.270M in   5.072531s

Comparison:
            1 module: 12209958.3 i/s
          30 modules:  4354217.8 i/s - 2.80x  (± 0.00) slower
         100 modules:  1434447.3 i/s - 8.51x  (± 0.00) slower
```

Ruby 3.0 with cvar cache:

```
Warming up --------------------------------------
            1 module     1.641M i/100ms
          30 modules     1.655M i/100ms
         100 modules     1.620M i/100ms
Calculating -------------------------------------
            1 module     16.279M (± 3.8%) i/s -     82.038M in   5.046923s
          30 modules     15.891M (± 3.9%) i/s -     79.459M in   5.007958s
         100 modules     16.087M (± 3.6%) i/s -     81.005M in   5.041931s

Comparison:
            1 module: 16279458.0 i/s
         100 modules: 16087484.6 i/s - same-ish: difference falls within error
          30 modules: 15891406.2 i/s - same-ish: difference falls within error
```

Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
* Eagerly allocate instance variable tables along with object (Aaron Patterson, 2021-05-03, 1 file, -0/+23)

This allows us to allocate the right size for the object in advance, meaning that we don't have to pay the cost of ivar table extension later. The idea is that if an object type ever became "extended" at some point, then it is very likely it will become extended again. So we may as well allocate the ivar table up front.
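A minimal sketch (invented example, not the committed benchmark) of the kind of object this helps: a class whose instances always grow past the initially embedded ivar slots, so every instance used to pay for a table extension mid-initialize:

```ruby
class Config
  def initialize
    # More instance variables than fit in the embedded slots, so the ivar
    # table previously had to be extended for every new instance.
    @a = 1; @b = 2; @c = 3; @d = 4
    @e = 5; @f = 6; @g = 7; @h = 8
  end
end

100_000.times { Config.new }
```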
* [Doc] Fix a typo s/visilibity/visibility/ (wonda-tea-coffee, 2021-04-25, 1 file, -1/+1)
* Evaluate multiple assignment left hand side before right hand side (Jeremy Evans, 2021-04-21, 1 file, -0/+29)

In regular assignment, Ruby evaluates the left hand side before the right hand side. For example:

```ruby
foo[0] = bar
```

Calls `foo`, then `bar`, then `[]=` on the result of `foo`.

Previously, multiple assignment didn't work this way. If you did:

```ruby
abc.def, foo[0] = bar, baz
```

Ruby would previously call `bar`, then `baz`, then `abc`, then `def=` on the result of `abc`, then `foo`, then `[]=` on the result of `foo`.

This change makes multiple assignment similar to single assignment, changing the evaluation order of the above multiple assignment code to calling `abc`, then `foo`, then `bar`, then `baz`, then `def=` on the result of `abc`, then `[]=` on the result of `foo`.

Implementing this is challenging with the stack-based virtual machine. We need to keep track of all of the left hand side attribute setter receivers and setter arguments, and then keep track of the stack level while handling the assignment processing, so we can issue the appropriate topn instructions to get the receiver. Here's an example of how the multiple assignment is executed, showing the stack and instructions:

```
self                                      # putself
abc                                       # send
abc, self                                 # putself
abc, foo                                  # send
abc, foo, 0                               # putobject 0
abc, foo, 0, [bar, baz]                   # evaluate RHS
abc, foo, 0, [bar, baz], baz, bar         # expandarray
abc, foo, 0, [bar, baz], baz, bar, abc    # topn 5
abc, foo, 0, [bar, baz], baz, abc, bar    # swap
abc, foo, 0, [bar, baz], baz, def=        # send
abc, foo, 0, [bar, baz], baz              # pop
abc, foo, 0, [bar, baz], baz, foo         # topn 3
abc, foo, 0, [bar, baz], baz, foo, 0      # topn 3
abc, foo, 0, [bar, baz], baz, foo, 0, baz # topn 2
abc, foo, 0, [bar, baz], baz, []=         # send
abc, foo, 0, [bar, baz], baz              # pop
abc, foo, 0, [bar, baz]                   # pop
[bar, baz], foo, 0, [bar, baz]            # setn 3
[bar, baz], foo, 0                        # pop
[bar, baz], foo                           # pop
[bar, baz]                                # pop
```

As multiple assignment must deal with splats, post args, and any level of nesting, it gets quite a bit more complex than this in non-trivial cases. To handle this, struct masgn_state is added to keep track of the overall state of the mass assignment, which stores a linked list of struct masgn_attrasgn, one for each assigned attribute.

This adds a new optimization that replaces a topn 1/pop instruction combination with a single swap instruction for multiple assignment to non-aref attributes.

This new approach isn't compatible with one of the optimizations previously used, in the case where the multiple assignment return value was not needed, there was no lhs splat, and one of the left hand sides used an attribute setter. This removes that optimization. Removing the optimization allowed for removing the POP_ELEMENT and adjust_stack functions.

This adds a benchmark to measure how much slower multiple assignment is with the correct evaluation order. This benchmark shows:

  * 4-9% decrease for attribute sets
  * 14-23% decrease for array member sets
  * Basically same speed for local variable sets

Importantly, it shows no significant difference between the popped (where return value of the multiple assignment is not needed) and !popped (where return value of the multiple assignment is needed) cases for attribute and array member sets. This indicates the previous optimization, which was dropped in the evaluation order fix and only affected the popped case, is not important to performance.

Fixes [Bug #4443]
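A small runnable sketch (not from the commit; the lambdas and the `order` log are invented) that makes the new left-to-right evaluation order visible:

```ruby
order = []

target = Object.new
target.define_singleton_method(:value=) { |v| order << :value= }

ary = []

get_target = -> { order << :abc; target }  # plays the role of `abc`
get_ary    = -> { order << :foo; ary }     # plays the role of `foo`
rhs1       = -> { order << :bar; :bar }
rhs2       = -> { order << :baz; :baz }

get_target.call.value, get_ary.call[0] = rhs1.call, rhs2.call

p order
# => [:abc, :foo, :bar, :baz, :value=]
# LHS receivers first, then the RHS, then the setters.
```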
* st.c: skip all deleted entries [Bug #17779] (tompng (tomoya ishida), 2021-04-11, 1 file, -0/+11)

Update the start entry, skipping all already deleted entries. Fixes a performance issue of `Hash#first` in a certain case.
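A sketch of the pathological shape this addresses (assumed, not the committed benchmark file): a hash whose leading slots have all been deleted:

```ruby
h = {}
1_000_000.times { |i| h[i] = true }
999_999.times { |i| h.delete(i) }   # only the last entry remains

# Before the fix, Hash#first had to skip every deleted slot on each call;
# now the start index is advanced past them once.
p h.first  # => [999999, true]
```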
* Improve Enumerable#tally performance (Nobuyoshi Nakada, 2021-03-16, 1 file, -0/+4)

Iteration per second (i/s)

|       |compare-ruby|built-ruby|
|:------|-----------:|---------:|
|tally  |      52.814|   114.936|
|       |           -|     2.18x|
* Add a benchmark for RubyVM::InstructionSequence.load_from_binary (Jean Boussier, 2021-03-10, 1 file, -0/+25)
* proc.c: make bind_call use existing callable method entry when possible (Jean Boussier, 2021-03-10, 1 file, -0/+16)

The most common use case for `bind_call` is to protect from core methods being redefined, for instance a typical use:

```ruby
UNBOUND_METHOD_MODULE_NAME = Module.instance_method(:name)
def real_mod_name(mod)
  UNBOUND_METHOD_MODULE_NAME.bind_call(mod)
end
```

But it's extremely common that the method wasn't actually redefined. In such cases we can avoid creating a new callable method entry, and simply delegate to the receiver.

This results in a 1.5-2x speed-up for the fast path, and little to no impact on the slow path:

```
compare-ruby: ruby 3.1.0dev (2021-02-05T06:33:00Z master b2674c1fd7) [x86_64-darwin19]
built-ruby: ruby 3.1.0dev (2021-02-15T10:35:17Z bind-call-fastpath d687e06615) [x86_64-darwin19]

|          |compare-ruby|built-ruby|
|:---------|-----------:|---------:|
|fastpath  |     11.325M|   16.393M|
|          |           -|     1.45x|
|slowpath  |     10.488M|   10.242M|
|          |       1.02x|         -|
```
* Improve performance of some Numeric methods [Feature #17632] (#4190) (S.H, 2021-02-19, 1 file, -0/+13)
* Do not run File.write while Ractors are running (Takashi Kokubun, 2021-02-11, 1 file, -12/+15)

Also make sure all local variables have the __bmdv_ prefix.
* Add a benchmark-driver runner for Ractor (#4172) (Takashi Kokubun, 2021-02-10, 3 files, -0/+131)

  * Add a benchmark-driver runner for Ractor
  * Process.clock_gettime(Process::CLOCK_MONOTONIC) could be slow in Ruby 3.0 Ractor
  * Fetching Time could also be slow
  * Fix a comment
  * Assert overriding a private method
* Simple benchmark of Float#to_s (Nobuyoshi Nakada, 2021-02-10, 1 file, -0/+7)
* Improve performance of Float#positive? and Float#negative? [Feature #17614] (#4160) (S.H, 2021-02-08, 1 file, -0/+8)
* Rename RubyVM::MJIT to RubyVM::JIT (Takashi Kokubun, 2021-01-13, 2 files, -7/+7)

This is because the name "MJIT" is an internal code name, it's inconsistent with --jit even though they are related to each other, and I want to discourage future JIT implementation-specific (e.g. MJIT-specific) APIs by this rename. [Feature #17490]
* Improve performance of some Float methods [Feature #17498] (#4018) (S.H, 2021-01-01, 1 file, -0/+14)
* Allow inlining Integer#-@ and #~ (Takashi Kokubun, 2020-12-22, 1 file, -0/+4)

```
$ benchmark-driver -v --rbenv 'before --jit;after --jit' benchmark/mjit_integer.yml --filter '(comp|uminus)'
before --jit: ruby 3.0.0dev (2020-12-23T05:41:44Z master 0dd4896175) +JIT [x86_64-linux]
after --jit: ruby 3.0.0dev (2020-12-23T06:25:41Z master 8887d78992) +JIT [x86_64-linux] last_commit=Allow inlining Integer#-@ and #~
Calculating -------------------------------------
                      before --jit  after --jit
      mjit_comp(1)          44.006M      70.417M i/s - 40.000M times in 0.908967s 0.568042s
    mjit_uminus(1)          44.333M      68.422M i/s - 40.000M times in 0.902255s 0.584603s

Comparison:
             mjit_comp(1)
              after --jit:  70417331.4 i/s
             before --jit:  44005980.4 i/s - 1.60x  slower

           mjit_uminus(1)
              after --jit:  68422468.8 i/s
             before --jit:  44333371.0 i/s - 1.54x  slower
```
* fix duplicated name (Koichi Sasada, 2020-12-16, 1 file, -1/+1)
* Guard all accesses to RubyVM::MJIT with defined?(RubyVM::MJIT) && (Benoit Daloze, 2020-12-04, 1 file, -3/+3)

  * Otherwise those tests, etc. cannot run on alternative Ruby implementations.
* Set allocator on class creation (Alan Wu, 2020-11-16, 1 file, -0/+21)

Allocating an instance of a class uses the allocator for the class. When the class has no allocator set, Ruby looks for it in the super class (see rb_get_alloc_func()).

It's uncommon for classes created from Ruby code to ever have an allocator set, so it's common during the allocation process to search all the way to BasicObject from the class with which the allocation is being performed. This makes creating instances of classes that have long ancestry chains more expensive than creating instances of classes that have shorter ancestry chains.

Setting the allocator at class creation time removes the need to perform a search for the allocator during allocation.

This is a breaking change for C-extensions that assume that classes created from Ruby code have no allocator set. Libraries that set up a class hierarchy in Ruby code and then set the allocator on some parent class, for example, can experience breakage. This seems like an unusual use case and hopefully it is rare or non-existent in practice.

Rails has many classes that have upwards of 60 elements in the ancestry chain and the benchmark shows a significant improvement for allocating with a class that includes 64 modules.

```
pre:  ruby 3.0.0dev (2020-11-12T14:39:27Z master 6325866421)
post: ruby 3.0.0dev (2020-11-12T20:15:30Z cut-allocator-lookup)

Comparison:
             allocate_8_deep
                post: 10336985.6 i/s
                 pre:  8691873.1 i/s - 1.19x  slower

            allocate_32_deep
                post: 10423181.2 i/s
                 pre:  6264879.1 i/s - 1.66x  slower

            allocate_64_deep
                post: 10541851.2 i/s
                 pre:  4936321.5 i/s - 2.14x  slower

           allocate_128_deep
                post: 10451505.0 i/s
                 pre:  3031313.5 i/s - 3.45x  slower
```
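An illustrative sketch (assumed shape, not the committed benchmark file) of allocation through a deep ancestry chain like allocate_64_deep above:

```ruby
klass = Class.new
64.times { klass.include(Module.new) }   # 64 iclasses between klass and Object

# Before this change, every .new walked up the ancestry chain looking for an
# allocator; now the allocator is installed when the class is created.
1_000_000.times { klass.new }
```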
* Add a benchmark for polymorphic ivar setting (Aaron Patterson, 2020-11-09, 1 file, -0/+17)

This benchmark demonstrates the performance of setting an instance variable when the type of object is constantly changing. This benchmark should give us an idea of the performance of ivar setting in a polymorphic environment.
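Assumed shape of such a polymorphic ivar-set benchmark (illustrative only; the classes and loop are invented): the same writer call site keeps seeing instances of different classes:

```ruby
classes = Array.new(20) { Class.new { attr_writer :payload } }
objects = classes.map(&:new)

1_000_000.times do |i|
  # The inline cache at this call site never settles on a single class.
  objects[i % objects.size].payload = i
end
```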
* eagerly initialize ivar table when index is small enough (Aaron Patterson, 2020-11-09, 1 file, -0/+14)

When the inline cache is written, the iv table will contain an entry for the instance variable. If we get an inline cache hit, then we know the iv table must contain a value for the index written to the inline cache.

If the index in the inline cache is larger than the list on the object, but *smaller* than the iv index table on the class, then we can just eagerly allocate the iv list to be the same size as the iv index table. This avoids duplicate work of checking frozen as well as looking up the index for the particular instance variable name.
* Added benchmark of vm_send by variable [ci skip] (Nobuyoshi Nakada, 2020-10-28, 1 file, -0/+3)
* Improve the performance of super (eileencodes, 2020-09-23, 1 file, -0/+20)

This PR improves the performance of `super` calls. While working on some Rails optimizations jhawthorn discovered that `super` calls were slower than expected.

The changes here do the following:

  1) Adds a check for whether the call frame is not equal to the method entry iseq. This avoids the `rb_obj_is_kind_of` check on the next line which is quite slow. If the current call frame is equal to the method entry we know we can't have an instance eval, etc.
  2) Changes `FL_TEST` to `FL_TEST_RAW`. This is safe because we've already done the check for `T_ICLASS` above.
  3) Adds a benchmark for `T_ICLASS` super calls.
  4) Note: makes a change for `method_entry_cref` to use `const`.

On master the benchmarks showed that `super` is 1.76x slower. Our changes improved the performance so that it is now only 1.36x slower.

Benchmark IPS:

```
Warming up --------------------------------------
               super   244.918k i/100ms
         method call   383.007k i/100ms
Calculating -------------------------------------
               super      2.280M (± 6.7%) i/s -     11.511M in   5.071758s
         method call      3.834M (± 4.9%) i/s -     19.150M in   5.008444s

Comparison:
         method call:  3833648.3 i/s
               super:  2279837.9 i/s - 1.68x  (± 0.00) slower
```

With changes:

```
Warming up --------------------------------------
               super   308.777k i/100ms
         method call   375.051k i/100ms
Calculating -------------------------------------
               super      2.951M (± 5.4%) i/s -     14.821M in   5.039592s
         method call      3.551M (± 4.9%) i/s -     18.002M in   5.081695s

Comparison:
         method call:  3551372.7 i/s
               super:  2950557.9 i/s - 1.20x  (± 0.00) slower
```

Ruby VM benchmarks also showed an improvement.

Existing `vm_super` benchmark:

```
$ make benchmark ITEM=vm_super

|          |compare-ruby|built-ruby|
|:---------|-----------:|---------:|
|vm_super  |     21.555M|   37.819M|
|          |           -|     1.75x|
```

New `vm_iclass_super` benchmark:

```
$ make benchmark ITEM=vm_iclass_super

|                 |compare-ruby|built-ruby|
|:----------------|-----------:|---------:|
|vm_iclass_super  |      1.669M|    3.683M|
|                 |           -|     2.21x|
```

This is the benchmark script used for the benchmark-ips benchmarks:

```ruby
require "benchmark/ips"

class Foo
  def zuper; end
  def top; end
  last_method = "top"

  ("A".."M").each do |module_name|
    eval <<-EOM
      module #{module_name}
        def zuper; super; end
        def #{module_name.downcase}
          #{last_method}
        end
      end
      prepend #{module_name}
    EOM
    last_method = module_name.downcase
  end
end

foo = Foo.new

Benchmark.ips do |x|
  x.report "super" do
    foo.zuper
  end

  x.report "method call" do
    foo.m
  end
  x.compare!
end
```

Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
Co-authored-by: John Hawthorn <john@hawthorn.email>
* Optimize ObjectSpace.dump_all (Jean Boussier, 2020-09-09, 1 file, -0/+13)

The two main optimizations are:

  - buffer writes for improved performance
  - avoid formatting functions when possible

```
|                   |compare-ruby|built-ruby|
|:------------------|-----------:|---------:|
|dump_all_string    |       1.038|   195.925|
|                   |           -|   188.77x|
|dump_all_file      |      33.453|   139.645|
|                   |           -|     4.17x|
|dump_all_dev_null  |      44.030|   278.552|
|                   |           -|     6.33x|
```
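For reference, the three cases above correspond to the output targets ObjectSpace.dump_all accepts (it requires the objspace extension); a small usage sketch, with the file name invented:

```ruby
require "objspace"

# dump_all_string: build the whole heap dump as a JSON-lines String
json = ObjectSpace.dump_all(output: :string)

# dump_all_file: stream the dump into a regular file
File.open("heap.json", "w") { |f| ObjectSpace.dump_all(output: f) }

# dump_all_dev_null: write the dump to the null device (pure serialization cost)
ObjectSpace.dump_all(output: File.open(File::NULL, "w"))
```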
* Improved Enumerable::Lazy#zip (Nobuyoshi Nakada, 2020-07-23, 1 file, -0/+22)

|                    |compare-ruby|built-ruby|
|:-------------------|-----------:|---------:|
|first_ary           |    290.514k|  296.331k|
|                    |           -|     1.02x|
|first_nonary        |    166.954k|  169.178k|
|                    |           -|     1.01x|
|first_noarg         |    299.547k|  305.358k|
|                    |           -|     1.02x|
|take3_ary           |    129.388k|  188.360k|
|                    |           -|     1.46x|
|take3_nonary        |     90.684k|  112.688k|
|                    |           -|     1.24x|
|take3_noarg         |    131.940k|  189.471k|
|                    |           -|     1.44x|
|chain-first_ary     |    195.913k|  286.194k|
|                    |           -|     1.46x|
|chain-first_nonary  |    127.483k|  168.716k|
|                    |           -|     1.32x|
|chain-first_noarg   |    201.252k|  298.562k|
|                    |           -|     1.48x|
|chain-take3_ary     |    101.189k|  183.188k|
|                    |           -|     1.81x|
|chain-take3_nonary  |     75.381k|  112.301k|
|                    |           -|     1.49x|
|chain-take3_noarg   |    101.483k|  192.148k|
|                    |           -|     1.89x|
|block               |    296.696k|  292.877k|
|                    |       1.01x|         -|
* Improved Enumerable::Lazy#flat_map (Nobuyoshi Nakada, 2020-07-23, 1 file, -0/+16)

|        |compare-ruby|built-ruby|
|:-------|-----------:|---------:|
|num3    |     96.333k|  160.732k|
|        |           -|     1.67x|
|num10   |     96.615k|  159.150k|
|        |           -|     1.65x|
|ary2    |    103.836k|  172.787k|
|        |           -|     1.66x|
|ary10   |    109.249k|  177.252k|
|        |           -|     1.62x|
|ary20   |    106.628k|  177.371k|
|        |           -|     1.66x|
|ary50   |    107.135k|  162.282k|
|        |           -|     1.51x|
|ary100  |    106.513k|  177.626k|
|        |           -|     1.67x|
* Optimize Array#min (#3324) (Kenta Murata, 2020-07-18, 1 file, -0/+31)

The benchmark result is below:

|                |compare-ruby|built-ruby|
|:---------------|-----------:|---------:|
|ary2.min        |     39.105M|   39.442M|
|                |           -|     1.01x|
|ary10.min       |     23.995M|   30.762M|
|                |           -|     1.28x|
|ary100.min      |      6.249M|   10.783M|
|                |           -|     1.73x|
|ary500.min      |      1.408M|    2.714M|
|                |           -|     1.93x|
|ary1000.min     |    828.397k|    1.465M|
|                |           -|     1.77x|
|ary2000.min     |    332.256k|  570.504k|
|                |           -|     1.72x|
|ary3000.min     |    338.079k|  573.868k|
|                |           -|     1.70x|
|ary5000.min     |    168.217k|  286.114k|
|                |           -|     1.70x|
|ary10000.min    |     85.512k|  143.551k|
|                |           -|     1.68x|
|ary20000.min    |     43.264k|   71.935k|
|                |           -|     1.66x|
|ary50000.min    |     17.317k|   29.107k|
|                |           -|     1.68x|
|ary100000.min   |      9.072k|   14.540k|
|                |           -|     1.60x|
|ary1000000.min  |     872.930|    1.436k|
|                |           -|     1.64x|

compare-ruby is 9f4b7fc82e.
* Optimize Array#max (#3325) (Kenta Murata, 2020-07-18, 3 files, -0/+91)

The benchmark result is below:

|                |compare-ruby|built-ruby|
|:---------------|-----------:|---------:|
|ary2.max        |     38.837M|   40.830M|
|                |           -|     1.05x|
|ary10.max       |     23.035M|   32.626M|
|                |           -|     1.42x|
|ary100.max      |      5.490M|   11.020M|
|                |           -|     2.01x|
|ary500.max      |      1.324M|    2.679M|
|                |           -|     2.02x|
|ary1000.max     |    699.167k|    1.403M|
|                |           -|     2.01x|
|ary2000.max     |    284.321k|  570.446k|
|                |           -|     2.01x|
|ary3000.max     |    282.613k|  571.683k|
|                |           -|     2.02x|
|ary5000.max     |    145.120k|  285.546k|
|                |           -|     1.97x|
|ary10000.max    |     72.102k|  142.831k|
|                |           -|     1.98x|
|ary20000.max    |     36.065k|   72.077k|
|                |           -|     2.00x|
|ary50000.max    |     14.343k|   29.139k|
|                |           -|     2.03x|
|ary100000.max   |      7.586k|   14.472k|
|                |           -|     1.91x|
|ary1000000.max  |     726.915|    1.495k|
|                |           -|     2.06x|
* Inline builtin struct aref (Takashi Kokubun, 2020-07-06, 1 file, -0/+10)

We don't do this for aset because it might raise a FrozenError.

```
$ benchmark-driver -v --rbenv 'before;after;before --jit;after --jit' benchmark/mjit_struct_aref.yml --repeat-count=4
before: ruby 2.8.0dev (2020-07-06T01:47:11Z master d94ef7c6b6) [x86_64-linux]
after: ruby 2.8.0dev (2020-07-06T07:11:51Z master 85425168f4) [x86_64-linux] last_commit=Inline builtin struct aref
before --jit: ruby 2.8.0dev (2020-07-06T01:47:11Z master d94ef7c6b6) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-07-06T07:11:51Z master 85425168f4) +JIT [x86_64-linux] last_commit=Inline builtin struct aref
Calculating -------------------------------------
                                before       after  before --jit  after --jit
mjit_struct_aref(struct)       34.783M     34.810M       48.321M      58.378M i/s - 40.000M times in 1.149996s 1.149097s 0.827794s 0.685192s

Comparison:
             mjit_struct_aref(struct)
              after --jit:  58377836.7 i/s
             before --jit:  48321205.7 i/s - 1.21x  slower
                    after:  34809935.5 i/s - 1.68x  slower
                   before:  34782736.5 i/s - 1.68x  slower
```
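Illustrative only (the Struct and loop are invented): the benchmark exercises reading a Struct member, which is the operation this commit lets MJIT inline:

```ruby
Point = Struct.new(:x, :y)
point = Point.new(1, 2)

40_000_000.times { point.x }   # aref; aset (point.x = ...) is not inlined
                               # because it can raise FrozenError
```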
* Make Kernel#then, #yield_self, #frozen? builtin (#3283) (Takashi Kokubun, 2020-07-03, 2 files, -0/+15)

  * Make Kernel#then, #yield_self, #frozen? builtin
  * Fix test_jit
* Rewrite Kernel#tap with Ruby (#3281) (Takashi Kokubun, 2020-07-03, 1 file, -0/+6)

  * Rewrite Kernel#tap with Ruby

    This was good for the VM too, but of course my intention is to unblock JIT's inlining of a block over yield (inlining invokeyield has not been committed though).

  * Fix test_settracefunc

    About the :tap deletions, the :tap events are actually traced (we already have a TracePoint test for builtin methods), but it's filtered out by tp.path == "xyzzy" (it became "<internal:kernel>"). We could trace tp.path == "<internal:kernel>" cases too, but the lineno is impacted by kernel.rb changes and I didn't want to make it fragile for kernel.rb lineno changes.
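A minimal sketch of what a Ruby-level #tap amounts to (an approximation for illustration, not the exact committed code; the real definition lives in kernel.rb and reports its path as <internal:kernel>):

```ruby
module Kernel
  def tap
    yield(self)
    self
  end
end

"hello".tap { |s| puts s.upcase }  # prints HELLO, returns "hello"
```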
* Mark some Integer methods as inline (#3264) (Takashi Kokubun, 2020-06-27, 2 files, -21/+26)
* Add pattern matching with arrays benchmark (Vladimir Dementyev, 2020-06-27, 1 file, -0/+19)
* Decide JIT-ed insn based on cached cfunc (Takashi Kokubun, 2020-06-25, 1 file, -0/+2)

For opt_* insns: opt_eq handles rb_obj_equal inside opt_eq, and all other cfuncs are handled by opt_send_without_block. Therefore we can't decide which insn should be generated by checking whether it's a cfunc cc or not.

```
$ benchmark-driver -v --rbenv 'before --jit;after --jit' benchmark/mjit_opt_cc_insns.yml --repeat-count=4
before --jit: ruby 2.8.0dev (2020-06-26T05:21:43Z master 9dbc2294a6) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-06-26T06:30:18Z master 75cece1b0b) +JIT [x86_64-linux] last_commit=Decide JIT-ed insn based on cached cfunc
Calculating -------------------------------------
                      before --jit  after --jit
     mjit_nil?(1)          73.878M      74.021M i/s - 40.000M times in 0.541432s 0.540391s
      mjit_not(1)          72.635M      74.601M i/s - 40.000M times in 0.550702s 0.536187s
  mjit_eq(1, nil)           7.331M       7.445M i/s -  8.000M times in 1.091211s 1.074596s
  mjit_eq(nil, 1)          49.450M      64.711M i/s -  8.000M times in 0.161781s 0.123627s

Comparison:
             mjit_nil?(1)
              after --jit:  74020528.4 i/s
             before --jit:  73878185.9 i/s - 1.00x  slower

              mjit_not(1)
              after --jit:  74600882.0 i/s
             before --jit:  72634507.6 i/s - 1.03x  slower

          mjit_eq(1, nil)
              after --jit:   7444657.4 i/s
             before --jit:   7331304.3 i/s - 1.02x  slower

          mjit_eq(nil, 1)
              after --jit:  64710790.6 i/s
             before --jit:  49449507.4 i/s - 1.31x  slower
```
* Annotate Kernel#class as inline (#3250) (Takashi Kokubun, 2020-06-23, 2 files, -7/+11)

```
$ benchmark-driver -v --rbenv 'before;after;before --jit;after --jit' benchmark/mjit_class.yml --repeat-count=4
before: ruby 2.8.0dev (2020-06-23T07:09:54Z master 37a2e48d76) [x86_64-linux]
after: ruby 2.8.0dev (2020-06-23T17:29:56Z inline-class 0ff147c007) [x86_64-linux]
before --jit: ruby 2.8.0dev (2020-06-23T07:09:54Z master 37a2e48d76) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-06-23T17:29:56Z inline-class 0ff147c007) +JIT [x86_64-linux]
Calculating -------------------------------------
                          before       after  before --jit  after --jit
    mjit_class(self)     39.219M     40.060M       53.502M      69.202M i/s - 40.000M times in 1.019915s 0.998495s 0.747631s 0.578021s
       mjit_class(1)     39.567M     41.242M       52.100M      68.895M i/s - 40.000M times in 1.010935s 0.969885s 0.767749s 0.580591s

Comparison:
         mjit_class(self)
              after --jit:  69201690.7 i/s
             before --jit:  53502336.4 i/s - 1.29x  slower
                    after:  40060289.1 i/s - 1.73x  slower
                   before:  39218939.2 i/s - 1.76x  slower

            mjit_class(1)
              after --jit:  68895358.6 i/s
             before --jit:  52100353.0 i/s - 1.32x  slower
                    after:  41241993.6 i/s - 1.67x  slower
                   before:  39567314.0 i/s - 1.74x  slower
```
* Compile opt_send for opt_* only when cc has ISeq (Takashi Kokubun, 2020-06-22, 1 file, -0/+25)

This is because opt_nil/opt_not/opt_eq populate cc even when they don't fall back to opt_send_without_block, due to vm_method_cfunc_is.

```
$ benchmark-driver -v --rbenv 'before --jit;after --jit' benchmark/mjit_opt_cc_insns.yml --repeat-count=4
before --jit: ruby 2.8.0dev (2020-06-22T08:11:24Z master d231b8f95b) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-06-22T08:53:27Z master e1125879ed) +JIT [x86_64-linux] last_commit=Compile opt_send for opt_* only when cc has ISeq
Calculating -------------------------------------
                      before --jit  after --jit
     mjit_nil?(1)          54.106M      73.693M i/s - 40.000M times in 0.739288s 0.542795s
      mjit_not(1)          53.398M      74.477M i/s - 40.000M times in 0.749090s 0.537075s
  mjit_eq(1, nil)           7.427M       6.497M i/s -  8.000M times in 1.077136s 1.231326s

Comparison:
             mjit_nil?(1)
              after --jit:  73692594.3 i/s
             before --jit:  54106108.4 i/s - 1.36x  slower

              mjit_not(1)
              after --jit:  74477487.9 i/s
             before --jit:  53398125.0 i/s - 1.39x  slower

          mjit_eq(1, nil)
             before --jit:   7427105.9 i/s
              after --jit:   6497063.0 i/s - 1.14x  slower
```

Actually opt_eq becomes slower by this. Maybe it's indeed using opt_send_without_block, but I'll approach that one in another commit.
* Share warmup logic across MJIT benchmarks (Takashi Kokubun, 2020-06-22, 5 files, -30/+38)
* The RUBYOPT= comment is no longer needed (Takashi Kokubun, 2020-06-22, 3 files, -6/+0)
* Stop relying on `make benchmark`'s `-I$(srcdir)/benchmark/lib` (Takashi Kokubun, 2020-06-22, 3 files, -3/+3)

These days I don't use `make benchmark`. The YAML files should be executable with the bare `benchmark-driver` CLI without passing `RUBYOPT=-Ibenchmark/lib`.
* Introduce Primitive.attr! to annotate 'inline' (#3242) (Takashi Kokubun, 2020-06-20, 1 file, -0/+36)

[Feature #15589]
* Make Integer#zero? a separated method and builtin (#3226) (Takashi Kokubun, 2020-06-20, 1 file, -0/+8)

A prerequisite to fix https://bugs.ruby-lang.org/issues/15589 with JIT. This commit alone doesn't make a significant difference yet, but I thought this commit should be committed independently.

This method override was discussed in [Misc #16961].
* Fix `make benchmark` example (Ryuta Kamizono, 2020-06-07, 1 file, -1/+1)

`make benchmark ARGS=../benchmark/erb_render.yml` does not work.

```
% make benchmark ARGS=../benchmark/erb_render.yml
/Users/kamipo/.rbenv/shims/ruby --disable=gems -rrubygems -I./benchmark/lib ./benchmark/benchmark-driver/exe/benchmark-driver \
            --executables="compare-ruby::/Users/kamipo/.rbenv/shims/ruby --disable=gems -I.ext/common --disable-gem" \
            --executables="built-ruby::./miniruby -I./lib -I. -I.ext/common ./tool/runruby.rb --extout=.ext -- --disable-gems --disable-gem" \
            ../benchmark/erb_render.yml
Traceback (most recent call last):
        6: from ./benchmark/benchmark-driver/exe/benchmark-driver:112:in `<main>'
        5: from ./benchmark/benchmark-driver/exe/benchmark-driver:112:in `flat_map'
        4: from ./benchmark/benchmark-driver/exe/benchmark-driver:112:in `each'
        3: from ./benchmark/benchmark-driver/exe/benchmark-driver:122:in `block in <main>'
        2: from /Users/kamipo/.rbenv/versions/2.6.6/lib/ruby/2.6.0/psych.rb:577:in `load_file'
        1: from /Users/kamipo/.rbenv/versions/2.6.6/lib/ruby/2.6.0/psych.rb:577:in `open'
/Users/kamipo/.rbenv/versions/2.6.6/lib/ruby/2.6.0/psych.rb:577:in `initialize': No such file or directory @ rb_sysopen - ../benchmark/erb_render.yml (Errno::ENOENT)
make: *** [benchmark] Error 1

% make benchmark ARGS=benchmark/erb_render.yml
/Users/kamipo/.rbenv/shims/ruby --disable=gems -rrubygems -I./benchmark/lib ./benchmark/benchmark-driver/exe/benchmark-driver \
            --executables="compare-ruby::/Users/kamipo/.rbenv/shims/ruby --disable=gems -I.ext/common --disable-gem" \
            --executables="built-ruby::./miniruby -I./lib -I. -I.ext/common ./tool/runruby.rb --extout=.ext -- --disable-gems --disable-gem" \
            benchmark/erb_render.yml
Calculating -------------------------------------
             compare-ruby  built-ruby
 erb_render      825.454k    783.664k i/s - 1.500M times in 1.817181s 1.914086s

Comparison:
  erb_render
    compare-ruby:    825454.4 i/s
      built-ruby:    783663.8 i/s - 1.05x slower
```
* add benchmark for different block handlers (卜部昌平, 2020-06-03, 1 file, -0/+27)
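An assumed sketch of what "different block handlers" covers here (the method and values are invented): the same call site receiving a block in the different forms the VM distinguishes:

```ruby
def call_block
  yield 1
end

blk  = proc { |x| x + 1 }
meth = 2.method(:+)

call_block { |x| x + 1 }   # literal block (iseq block handler)
call_block(&blk)           # Proc object passed as the block
call_block(&:succ)         # Symbol converted to a block
call_block(&meth)          # Method object converted to a block
```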