| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When an object becomes "too complex" (in other words it has too many
variations in the shape tree), we transition it to use a "too complex"
shape and use a hash for storing instance variables.
Without this patch, there were rare cases where shape tree growth could
"explode" and cause performance degradation on what would otherwise have
been cached fast paths.
This patch puts a limit on shape tree growth, and gracefully degrades in
the rare case where there could be a factorial growth in the shape tree.
For example:
```ruby
class NG; end
HUGE_NUMBER.times do
NG.new.instance_variable_set(:"@unique_ivar_#{_1}", 1)
end
```
We consider objects to be "too complex" when the object's class has more
than SHAPE_MAX_VARIATIONS (currently 8) leaf nodes in the shape tree and
the object introduces a new variation (a new leaf node) associated with
that class.
For example, new variations on instances of the following class would be
considered "too complex" because those instances create more than 8
leaves in the shape tree:
```ruby
class Foo; end
9.times { Foo.new.instance_variable_set(":@uniq_#{_1}", 1) }
```
However, the following class is *not* too complex because it only has
one leaf in the shape tree:
```ruby
class Foo
def initialize
@a = @b = @c = @d = @e = @f = @g = @h = @i = nil
end
end
9.times { Foo.new }
``
This case is rare, so we don't expect this change to impact performance
of most applications, but it needs to be handled.
Co-Authored-By: Aaron Patterson <tenderlove@ruby-lang.org>
|
| |
|
|
|
|
| |
I plan to rebuild MJIT metrics later, not using debug counters.
|
|
|
|
|
|
| |
The structure and readability of jit_exec is messed up right now. I'd
like to help the current situation by this for now. I'll resurrect
them when I need it again.
|
|
|
|
| |
This reverts commit 9a6803c90b817f70389cae10d60b50ad752da48f.
|
|
|
|
| |
This reverts commit 68bc9e2e97d12f80df0d113e284864e225f771c2.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Object Shapes is used for accessing instance variables and representing the
"frozenness" of objects. Object instances have a "shape" and the shape
represents some attributes of the object (currently which instance variables are
set and the "frozenness"). Shapes form a tree data structure, and when a new
instance variable is set on an object, that object "transitions" to a new shape
in the shape tree. Each shape has an ID that is used for caching. The shape
structure is independent of class, so objects of different types can have the
same shape.
For example:
```ruby
class Foo
def initialize
# Starts with shape id 0
@a = 1 # transitions to shape id 1
@b = 1 # transitions to shape id 2
end
end
class Bar
def initialize
# Starts with shape id 0
@a = 1 # transitions to shape id 1
@b = 1 # transitions to shape id 2
end
end
foo = Foo.new # `foo` has shape id 2
bar = Bar.new # `bar` has shape id 2
```
Both `foo` and `bar` instances have the same shape because they both set
instance variables of the same name in the same order.
This technique can help to improve inline cache hits as well as generate more
efficient machine code in JIT compilers.
This commit also adds some methods for debugging shapes on objects. See
`RubyVM::Shape` for more details.
For more context on Object Shapes, see [Feature: #18776]
Co-Authored-By: Aaron Patterson <tenderlove@ruby-lang.org>
Co-Authored-By: Eileen M. Uchitelle <eileencodes@gmail.com>
Co-Authored-By: John Hawthorn <john@hawthorn.email>
|
|
|
|
|
|
|
|
|
|
| |
Revert "* expand tabs. [ci skip]"
This reverts commit 830b5b5c351c5c6efa5ad461ae4ec5085e5f0275.
Revert "This commit implements the Object Shapes technique in CRuby."
This reverts commit 9ddfd2ca004d1952be79cf1b84c52c79a55978f4.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Object Shapes is used for accessing instance variables and representing the
"frozenness" of objects. Object instances have a "shape" and the shape
represents some attributes of the object (currently which instance variables are
set and the "frozenness"). Shapes form a tree data structure, and when a new
instance variable is set on an object, that object "transitions" to a new shape
in the shape tree. Each shape has an ID that is used for caching. The shape
structure is independent of class, so objects of different types can have the
same shape.
For example:
```ruby
class Foo
def initialize
# Starts with shape id 0
@a = 1 # transitions to shape id 1
@b = 1 # transitions to shape id 2
end
end
class Bar
def initialize
# Starts with shape id 0
@a = 1 # transitions to shape id 1
@b = 1 # transitions to shape id 2
end
end
foo = Foo.new # `foo` has shape id 2
bar = Bar.new # `bar` has shape id 2
```
Both `foo` and `bar` instances have the same shape because they both set
instance variables of the same name in the same order.
This technique can help to improve inline cache hits as well as generate more
efficient machine code in JIT compilers.
This commit also adds some methods for debugging shapes on objects. See
`RubyVM::Shape` for more details.
For more context on Object Shapes, see [Feature: #18776]
Co-Authored-By: Aaron Patterson <tenderlove@ruby-lang.org>
Co-Authored-By: Eileen M. Uchitelle <eileencodes@gmail.com>
Co-Authored-By: John Hawthorn <john@hawthorn.email>
|
|
|
|
|
|
|
| |
* Rename mjit_exec to jit_exec
* Rename mjit_exec_slowpath to mjit_check_iseq
* Remove mjit_exec references from comments
|
|
|
|
|
| |
Introduce new optimized method type
`OPTIMIZED_METHOD_TYPE_STRUCT_AREF/ASET` with index information.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Redo of 34a2acdac788602c14bf05fb616215187badd504 and
931138b00696419945dc03e10f033b1f53cd50f3 which were reverted.
GitHub PR #4340.
This change implements a cache for class variables. Previously there was
no cache for cvars. Cvar access is slow due to needing to travel all the
way up th ancestor tree before returning the cvar value. The deeper the
ancestor tree the slower cvar access will be.
The benefits of the cache are more visible with a higher number of
included modules due to the way Ruby looks up class variables. The
benchmark here includes 26 modules and shows with the cache, this branch
is 6.5x faster when accessing class variables.
```
compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master 9e5105c) [x86_64-darwin19]
built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be009) [x86_64-darwin19]
| |compare-ruby|built-ruby|
|:--------|-----------:|---------:|
|vm_cvar | 5.681M| 36.980M|
| | -| 6.51x|
```
Benchmark.ips calling `ActiveRecord::Base.logger` from within a Rails
application. ActiveRecord::Base.logger has 71 ancestors. The more
ancestors a tree has, the more clear the speed increase. IE if Base had
only one ancestor we'd see no improvement. This benchmark is run on a
vanilla Rails application.
Benchmark code:
```ruby
require "benchmark/ips"
require_relative "config/environment"
Benchmark.ips do |x|
x.report "logger" do
ActiveRecord::Base.logger
end
end
```
Ruby 3.0 master / Rails 6.1:
```
Warming up --------------------------------------
logger 155.251k i/100ms
Calculating -------------------------------------
```
Ruby 3.0 with cvar cache / Rails 6.1:
```
Warming up --------------------------------------
logger 1.546M i/100ms
Calculating -------------------------------------
logger 14.857M (± 4.8%) i/s - 74.198M in 5.006202s
```
Lastly we ran a benchmark to demonstate the difference between master
and our cache when the number of modules increases. This benchmark
measures 1 ancestor, 30 ancestors, and 100 ancestors.
Ruby 3.0 master:
```
Warming up --------------------------------------
1 module 1.231M i/100ms
30 modules 432.020k i/100ms
100 modules 145.399k i/100ms
Calculating -------------------------------------
1 module 12.210M (± 2.1%) i/s - 61.553M in 5.043400s
30 modules 4.354M (± 2.7%) i/s - 22.033M in 5.063839s
100 modules 1.434M (± 2.9%) i/s - 7.270M in 5.072531s
Comparison:
1 module: 12209958.3 i/s
30 modules: 4354217.8 i/s - 2.80x (± 0.00) slower
100 modules: 1434447.3 i/s - 8.51x (± 0.00) slower
```
Ruby 3.0 with cvar cache:
```
Warming up --------------------------------------
1 module 1.641M i/100ms
30 modules 1.655M i/100ms
100 modules 1.620M i/100ms
Calculating -------------------------------------
1 module 16.279M (± 3.8%) i/s - 82.038M in 5.046923s
30 modules 15.891M (± 3.9%) i/s - 79.459M in 5.007958s
100 modules 16.087M (± 3.6%) i/s - 81.005M in 5.041931s
Comparison:
1 module: 16279458.0 i/s
100 modules: 16087484.6 i/s - same-ish: difference falls within error
30 modules: 15891406.2 i/s - same-ish: difference falls within error
```
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
|
|
|
|
|
| |
This reverts commit 08de37f9fa3469365e6b5c964689ae2bae0eb9f3.
This reverts commit e8ae922b62adb00a80d3d4c49f7d7b0e6026eaba.
|
|
|
|
|
|
| |
Instead of on read. Once it's in the inline cache we never have to make
one again. We want to eventually put the value into the cache, and the
best opportunity to do that is when you write the value.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change implements a cache for class variables. Previously there was
no cache for cvars. Cvar access is slow due to needing to travel all the
way up th ancestor tree before returning the cvar value. The deeper the
ancestor tree the slower cvar access will be.
The benefits of the cache are more visible with a higher number of
included modules due to the way Ruby looks up class variables. The
benchmark here includes 26 modules and shows with the cache, this branch
is 6.5x faster when accessing class variables.
```
compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master 9e5105ca45) [x86_64-darwin19]
built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be0093ae) [x86_64-darwin19]
| |compare-ruby|built-ruby|
|:--------|-----------:|---------:|
|vm_cvar | 5.681M| 36.980M|
| | -| 6.51x|
```
Benchmark.ips calling `ActiveRecord::Base.logger` from within a Rails
application. ActiveRecord::Base.logger has 71 ancestors. The more
ancestors a tree has, the more clear the speed increase. IE if Base had
only one ancestor we'd see no improvement. This benchmark is run on a
vanilla Rails application.
Benchmark code:
```ruby
require "benchmark/ips"
require_relative "config/environment"
Benchmark.ips do |x|
x.report "logger" do
ActiveRecord::Base.logger
end
end
```
Ruby 3.0 master / Rails 6.1:
```
Warming up --------------------------------------
logger 155.251k i/100ms
Calculating -------------------------------------
```
Ruby 3.0 with cvar cache / Rails 6.1:
```
Warming up --------------------------------------
logger 1.546M i/100ms
Calculating -------------------------------------
logger 14.857M (± 4.8%) i/s - 74.198M in 5.006202s
```
Lastly we ran a benchmark to demonstate the difference between master
and our cache when the number of modules increases. This benchmark
measures 1 ancestor, 30 ancestors, and 100 ancestors.
Ruby 3.0 master:
```
Warming up --------------------------------------
1 module 1.231M i/100ms
30 modules 432.020k i/100ms
100 modules 145.399k i/100ms
Calculating -------------------------------------
1 module 12.210M (± 2.1%) i/s - 61.553M in 5.043400s
30 modules 4.354M (± 2.7%) i/s - 22.033M in 5.063839s
100 modules 1.434M (± 2.9%) i/s - 7.270M in 5.072531s
Comparison:
1 module: 12209958.3 i/s
30 modules: 4354217.8 i/s - 2.80x (± 0.00) slower
100 modules: 1434447.3 i/s - 8.51x (± 0.00) slower
```
Ruby 3.0 with cvar cache:
```
Warming up --------------------------------------
1 module 1.641M i/100ms
30 modules 1.655M i/100ms
100 modules 1.620M i/100ms
Calculating -------------------------------------
1 module 16.279M (± 3.8%) i/s - 82.038M in 5.046923s
30 modules 15.891M (± 3.9%) i/s - 79.459M in 5.007958s
100 modules 16.087M (± 3.6%) i/s - 81.005M in 5.041931s
Comparison:
1 module: 16279458.0 i/s
100 modules: 16087484.6 i/s - same-ish: difference falls within error
30 modules: 15891406.2 i/s - same-ish: difference falls within error
```
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
rb_funcall* (rb_funcall(), rb_funcallv(), ...) functions invokes
Ruby's method with given receiver. Ruby 2.7 introduced inline method
cache with static memory area. However, Ruby 3.0 reimplemented the
method cache data structures and the inline cache was removed.
Without inline cache, rb_funcall* searched methods everytime.
Most of cases per-Class Method Cache (pCMC) will be helped but
pCMC requires VM-wide locking and it hurts performance on
multi-Ractor execution, especially all Ractors calls methods
with rb_funcall*.
This patch introduced Global Call-Cache Cache Table (gccct) for
rb_funcall*. Call-Cache was introduced from Ruby 3.0 to manage
method cache entry atomically and gccct enables method-caching
without VM-wide locking. This table solves the performance issue
on multi-ractor execution.
[Bug #17497]
Ruby-level method invocation does not use gccct because it has
inline-method-cache and the table size is limited. Basically
rb_funcall* is not used frequently, so 1023 entries can be enough.
We will revisit the table size if it is not enough.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
constant cache `IC` is accessed by non-atomic manner and there are
thread-safety issues, so Ruby 3.0 disables to use const cache on
non-main ractors.
This patch enables it by introducing `imemo_constcache` and allocates
it by every re-fill of const cache like `imemo_callcache`.
[Bug #17510]
Now `IC` only has one entry `IC::entry` and it points to
`iseq_inline_constant_cache_entry`, managed by T_IMEMO object.
`IC` is atomic data structure so `rb_mjit_before_vm_ic_update()` and
`rb_mjit_after_vm_ic_update()` is not needed.
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
* vm_sync_lock
* vm_sync_lock_enter
* vm_sync_lock_enter_nb
* vm_sync_lock_enter_cr
* vm_sync_barrier
|
|
|
|
|
|
|
| |
add cc_found_in_ccs (renamed from cc_found_ccs), cc_not_found_in_ccs,
call0_public, call0_other debug counters to measure more details.
also it contains several modification.
|
|
|
|
|
| |
mc_inline_miss_same_def is added to check same method or not.
Also the mc_inline_miss_same_cc calculation was fixed.
|
|
|
|
| |
ccs_not_found to count not found in ccs table.
|
| |
|
|
|
|
| |
counts for invalidating negative cache.
|
|
|
|
|
|
| |
This commit adds a debug counter for the case where the inline cache
*missed* but the ivar index table has an entry for that ivar. This is a
case where a polymorphic cache could help
|
| |
|
| |
|
| |
|
|
|
|
| |
To fix build failures.
|
|
|
|
| |
This shall fix compile errors.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
when there's no need to call CALLER_SETUP_ARG and CALLER_REMOVE_EMPTY_KW_SPLAT
(i.e. !rb_splat_or_kwargs_p(ci) && !calling->kw_splat).
Micro benchmark:
```
$ benchmark-driver -v --rbenv 'before;after' benchmark/vm_send_cfunc.yml --repeat-count=4
before: ruby 2.8.0dev (2020-04-13T23:45:05Z master b9d3ceee8f) [x86_64-linux]
after: ruby 2.8.0dev (2020-04-14T00:48:52Z no-splat-fastpath 418d363722) [x86_64-linux]
Calculating -------------------------------------
before after
vm_send_cfunc 69.585M 88.724M i/s - 100.000M times in 1.437097s 1.127096s
Comparison:
vm_send_cfunc
after: 88723605.2 i/s
before: 69584737.1 i/s - 1.28x slower
```
Optcarrot:
```
$ benchmark-driver -v --rbenv 'before;after' benchmark.yml --repeat-count=12 --output=all
before: ruby 2.8.0dev (2020-04-13T23:45:05Z master b9d3ceee8f) [x86_64-linux]
after: ruby 2.8.0dev (2020-04-14T00:48:52Z no-splat-fastpath 418d363722) [x86_64-linux]
Calculating -------------------------------------
before after
Optcarrot Lan_Master.nes 50.76119601545175 42.73858236484051 fps
50.76388649761503 51.04211379912850
50.80930672252514 51.39455790755538
50.90236000778749 51.75656936556145
51.01744746340430 51.86875277356489
51.06495279015112 51.88692482485558
51.07785337168974 51.93429603190578
51.20163525187862 51.95768145071314
51.34671771913112 52.45577266040274
51.35918340835583 52.53163888762858
51.46641337418146 52.62172484121034
51.50835463462257 52.85064021113239
```
|
|
|
| |
Split ruby.h
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
JIT support of dd723771c11.
$ benchmark-driver -v --rbenv 'before;before --jit;after --jit' benchmark/mjit_exivar.yml --repeat-count=4
before: ruby 2.8.0dev (2020-03-30T12:32:26Z master e5db3da9d3) [x86_64-linux]
before --jit: ruby 2.8.0dev (2020-03-30T12:32:26Z master e5db3da9d3) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-03-31T05:57:24Z mjit-exivar 128625baec) +JIT [x86_64-linux]
Calculating -------------------------------------
before before --jit after --jit
mjit_exivar 57.944M 53.579M 54.471M i/s - 200.000M times in 3.451588s 3.732772s 3.671687s
Comparison:
mjit_exivar
before: 57944345.1 i/s
after --jit: 54470876.7 i/s - 1.06x slower
before --jit: 53579483.4 i/s - 1.08x slower
|
| |
|
|
|
|
| |
changing add_iseq_to_process's debug counter name as well for comparison
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch contains several ideas:
(1) Disposable inline method cache (IMC) for race-free inline method cache
* Making call-cache (CC) as a RVALUE (GC target object) and allocate new
CC on cache miss.
* This technique allows race-free access from parallel processing
elements like RCU.
(2) Introduce per-Class method cache (pCMC)
* Instead of fixed-size global method cache (GMC), pCMC allows flexible
cache size.
* Caching CCs reduces CC allocation and allow sharing CC's fast-path
between same call-info (CI) call-sites.
(3) Invalidate an inline method cache by invalidating corresponding method
entries (MEs)
* Instead of using class serials, we set "invalidated" flag for method
entry itself to represent cache invalidation.
* Compare with using class serials, the impact of method modification
(add/overwrite/delete) is small.
* Updating class serials invalidate all method caches of the class and
sub-classes.
* Proposed approach only invalidate the method cache of only one ME.
See [Feature #16614] for more details.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now, rb_call_info contains how to call the method with tuple of
(mid, orig_argc, flags, kwarg). Most of cases, kwarg == NULL and
mid+argc+flags only requires 64bits. So this patch packed
rb_call_info to VALUE (1 word) on such cases. If we can not
represent it in VALUE, then use imemo_callinfo which contains
conventional callinfo (rb_callinfo, renamed from rb_call_info).
iseq->body->ci_kw_size is removed because all of callinfo is VALUE
size (packed ci or a pointer to imemo_callinfo).
To access ci information, we need to use these functions:
vm_ci_mid(ci), _flag(ci), _argc(ci), _kwarg(ci).
struct rb_call_info_kw_arg is renamed to rb_callinfo_kwarg.
rb_funcallv_with_cc() and rb_method_basic_definition_p_with_cc()
is temporary removed because cd->ci should be marked.
|
|
|
|
| |
Include what is necessary.
|
|
|
|
| |
These functions are enabled only on USE_DEBUG_COUNTER=1.
|
| |
|
|
|
|
|
| |
Fixed misspellings reported at [Bug #16437], missed and a new
typo.
|
| |
|
| |
|
|
|
|
|
| |
Implemented fine-grained inspection of cache misshits. Handy for
counting the reasons why an inline method cache was evicted.
|
|
|
|
|
| |
Introduce a new debug counter `obj_ary_extracapa` which counts
arrays which are `len < capa`.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
for debug counters
```
../include/ruby/intern.h:1175:137: warning: passing argument 3 of 'rb_define_singleton_method0' from incompatible pointer type [-Wincompatible-pointer-types]
#define rb_define_singleton_method(klass, mid, func, arity) rb_define_singleton_method_choose_prototypem3((arity),(func))((klass),(mid),(func),(arity));
^
../vm.c:2958:5: note: in expansion of macro 'rb_define_singleton_method'
rb_define_singleton_method(rb_cRubyVM, "show_debug_counters", rb_debug_counter_show, 0);
^~~~~~~~~~~~~~~~~~~~~~~~~~
../include/ruby/intern.h:1139:99: note: expected 'VALUE (*)(VALUE) {aka long unsigned int (*)(long unsigned int)}' but argument is of type 'VALUE (*)(void) {aka long unsigned int (*)(void)}'
__attribute__((__unused__,__weakref__("rb_define_singleton_method"),__nonnull__(2,3)))static void rb_define_singleton_method0 (VALUE,const char*,VALUE(*)(VALUE),int);
```
|
|
|
|
|
|
|
|
| |
I am trying to study debug counters inside a Rails application.
Accessing debug counters by killing the process is hard because child
processes don't get the same TRAP as the parent, and Rails seems to
intercept calls to `exit`. Adding this method lets me print the debug
counters when I want (at the end of requests for example)
|
|
|
|
| |
add debug_counters to check the Hash object statistics.
|