summaryrefslogtreecommitdiff
path: root/vm_insnhelper.c
Commit message (Collapse)AuthorAgeFilesLines
* Try to fix MJIT symbol clash with cargo cultMaxime Chevalier-Boisvert2021-10-201-2/+2
|
* Implement defined bytecode (#39)Maxime Chevalier-Boisvert2021-10-201-0/+6
|
* Implement setivar with a plain old function call (#34)Maxime Chevalier-Boisvert2021-10-201-0/+6
| | | | | * Implement setivar with a plain old function call * Remove return
* Implement opt_aset as interpreter handler callMaxime Chevalier-Boisvert2021-10-201-0/+6
|
* Implement opt_mod as call to interpreter function (#29)Maxime Chevalier-Boisvert2021-10-201-0/+6
|
* Implement opt_eq by calling interpreter function (#28)Maxime Chevalier-Boisvert2021-10-201-0/+6
|
* Implement send with alias method (#23)Maxime Chevalier-Boisvert2021-10-201-0/+6
| | | | | * Implement send with alias method * Add alias_method tests
* Implement calls to methods with simple optional paramsAlan Wu2021-10-201-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | * Implement calls to methods with simple optional params * Remove unnecessary MJIT_STATIC See comment for MJIT_STATIC. I added it not knowing whether it's required because the function next to it has it. Don't use it and wait for problems to come up instead. * Better naming, some comments * Count bailing on kw only iseqs On railsbench: ``` opt_send_without_block exit reasons: bmethod 59729 (27.7%) optimized_method 59137 (27.5%) iseq_complex_callee 41362 (19.2%) alias_method 33346 (15.5%) callsite_not_simple 19170 ( 8.9%) iseq_only_keywords 1300 ( 0.6%) kw_splat 1299 ( 0.6%) cfunc_ruby_array_varg 18 ( 0.0%) ```
* YJIT: Fancier opt_getinlinecacheAlan Wu2021-10-201-0/+3
| | | | | | | Make sure `opt_getinlinecache` is in a block all on its own, and invalidate it from the interpreter when `opt_setinlinecache`. It will recompile with a filled cache the second time around. This lets YJIT runs well when the IC for constant is cold.
* YJIT: lazy polymorphic getinstancevariableAlan Wu2021-10-201-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | Lazily compile out a chain of checks for different known classes and whether `self` embeds its ivars or not. * Remove trailing whitespaces * Get proper addresss in Capstone disassembly * Lowercase address in Capstone disassembly Capstone uses lowercase for jump targets in generated listings. Let's match it. * Use the same successor in getivar guard chains Cuts down on duplication * Address reviews * Fix copypasta error * Add a comment
* WIP JIT-to-JIT returnsMaxime Chevalier-Boisvert2021-10-201-0/+1
|
* Remove useless castsNobuyoshi Nakada2021-10-191-2/+2
|
* Get rid of type-punning castNobuyoshi Nakada2021-10-191-1/+3
|
* Using NIL_P macro instead of `== Qnil`S.H2021-10-031-2/+2
|
* Introduce rb_vm_call_with_refinements to DRY up a few callsJeremy Evans2021-10-011-27/+4
|
* Make Array#min/max optimization respect refined methodsJeremy Evans2021-09-301-4/+18
| | | | | | | | | Pass in ec to vm_opt_newarray_{max,min}. Avoids having to call GET_EC inside the functions, for better performance. While here, add a test for Array#min/max being redefined to test_optimization.rb. Fixes [Bug #18180]
* Extract hook macro for attributesNobuyoshi Nakada2021-09-191-28/+24
|
* Remove printf family from the mjit headerNobuyoshi Nakada2021-09-111-14/+13
| | | | | Linking printf family functions makes mjit objects to link unnecessary code.
* Support tracing of attr_reader and attr_writerJeremy Evans2021-08-291-4/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | In vm_call_method_each_type, check for c_call and c_return events before dispatching to vm_call_ivar and vm_call_attrset. With this approach, the call cache will still dispatch directly to those functions, so this change will only decrease performance for the first (uncached) call, and even then, the performance decrease is very minimal. This approach requires that we clear the call caches when tracing is enabled or disabled. The approach currently switches all vm_call_ivar and vm_call_attrset call caches to vm_call_general any time tracing is enabled or disabled. So it could theoretically result in a slowdown for code that constantly enables or disables tracing. This approach does not handle targeted tracepoints, but from my testing, c_call and c_return events are not supported for targeted tracepoints, so that shouldn't matter. This includes a benchmark showing the performance decrease is minimal if detectable at all. Fixes [Bug #16383] Fixes [Bug #10470] Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
* Get rid of type-punning pointer casts [Bug #18062]Nobuyoshi Nakada2021-08-111-2/+5
|
* Using RBOOL macroS.H2021-08-021-40/+20
|
* Refactor class variable cache functionsNobuyoshi Nakada2021-06-231-43/+26
| | | | | | | Extracted repeated code as update_classvariable_cache. When cvc table is not set in getclassvariable, an empty table was created but it has no id and would cause [BUG], so made the code same as setclassvariable.
* Add a cache for class variableseileencodes2021-06-181-2/+82
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Redo of 34a2acdac788602c14bf05fb616215187badd504 and 931138b00696419945dc03e10f033b1f53cd50f3 which were reverted. GitHub PR #4340. This change implements a cache for class variables. Previously there was no cache for cvars. Cvar access is slow due to needing to travel all the way up th ancestor tree before returning the cvar value. The deeper the ancestor tree the slower cvar access will be. The benefits of the cache are more visible with a higher number of included modules due to the way Ruby looks up class variables. The benchmark here includes 26 modules and shows with the cache, this branch is 6.5x faster when accessing class variables. ``` compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master 9e5105c) [x86_64-darwin19] built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be009) [x86_64-darwin19] | |compare-ruby|built-ruby| |:--------|-----------:|---------:| |vm_cvar | 5.681M| 36.980M| | | -| 6.51x| ``` Benchmark.ips calling `ActiveRecord::Base.logger` from within a Rails application. ActiveRecord::Base.logger has 71 ancestors. The more ancestors a tree has, the more clear the speed increase. IE if Base had only one ancestor we'd see no improvement. This benchmark is run on a vanilla Rails application. Benchmark code: ```ruby require "benchmark/ips" require_relative "config/environment" Benchmark.ips do |x| x.report "logger" do ActiveRecord::Base.logger end end ``` Ruby 3.0 master / Rails 6.1: ``` Warming up -------------------------------------- logger 155.251k i/100ms Calculating ------------------------------------- ``` Ruby 3.0 with cvar cache / Rails 6.1: ``` Warming up -------------------------------------- logger 1.546M i/100ms Calculating ------------------------------------- logger 14.857M (± 4.8%) i/s - 74.198M in 5.006202s ``` Lastly we ran a benchmark to demonstate the difference between master and our cache when the number of modules increases. This benchmark measures 1 ancestor, 30 ancestors, and 100 ancestors. Ruby 3.0 master: ``` Warming up -------------------------------------- 1 module 1.231M i/100ms 30 modules 432.020k i/100ms 100 modules 145.399k i/100ms Calculating ------------------------------------- 1 module 12.210M (± 2.1%) i/s - 61.553M in 5.043400s 30 modules 4.354M (± 2.7%) i/s - 22.033M in 5.063839s 100 modules 1.434M (± 2.9%) i/s - 7.270M in 5.072531s Comparison: 1 module: 12209958.3 i/s 30 modules: 4354217.8 i/s - 2.80x (± 0.00) slower 100 modules: 1434447.3 i/s - 8.51x (± 0.00) slower ``` Ruby 3.0 with cvar cache: ``` Warming up -------------------------------------- 1 module 1.641M i/100ms 30 modules 1.655M i/100ms 100 modules 1.620M i/100ms Calculating ------------------------------------- 1 module 16.279M (± 3.8%) i/s - 82.038M in 5.046923s 30 modules 15.891M (± 3.9%) i/s - 79.459M in 5.007958s 100 modules 16.087M (± 3.6%) i/s - 81.005M in 5.041931s Comparison: 1 module: 16279458.0 i/s 100 modules: 16087484.6 i/s - same-ish: difference falls within error 30 modules: 15891406.2 i/s - same-ish: difference falls within error ``` Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
* Revert "Filling cache values on cvar write"Aaron Patterson2021-05-111-82/+2
| | | | | This reverts commit 08de37f9fa3469365e6b5c964689ae2bae0eb9f3. This reverts commit e8ae922b62adb00a80d3d4c49f7d7b0e6026eaba.
* Filling cache values on cvar writeeileencodes2021-05-111-7/+38
| | | | | | Instead of on read. Once it's in the inline cache we never have to make one again. We want to eventually put the value into the cache, and the best opportunity to do that is when you write the value.
* Add a cache for class variableseileencodes2021-05-111-2/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change implements a cache for class variables. Previously there was no cache for cvars. Cvar access is slow due to needing to travel all the way up th ancestor tree before returning the cvar value. The deeper the ancestor tree the slower cvar access will be. The benefits of the cache are more visible with a higher number of included modules due to the way Ruby looks up class variables. The benchmark here includes 26 modules and shows with the cache, this branch is 6.5x faster when accessing class variables. ``` compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master 9e5105ca45) [x86_64-darwin19] built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be0093ae) [x86_64-darwin19] | |compare-ruby|built-ruby| |:--------|-----------:|---------:| |vm_cvar | 5.681M| 36.980M| | | -| 6.51x| ``` Benchmark.ips calling `ActiveRecord::Base.logger` from within a Rails application. ActiveRecord::Base.logger has 71 ancestors. The more ancestors a tree has, the more clear the speed increase. IE if Base had only one ancestor we'd see no improvement. This benchmark is run on a vanilla Rails application. Benchmark code: ```ruby require "benchmark/ips" require_relative "config/environment" Benchmark.ips do |x| x.report "logger" do ActiveRecord::Base.logger end end ``` Ruby 3.0 master / Rails 6.1: ``` Warming up -------------------------------------- logger 155.251k i/100ms Calculating ------------------------------------- ``` Ruby 3.0 with cvar cache / Rails 6.1: ``` Warming up -------------------------------------- logger 1.546M i/100ms Calculating ------------------------------------- logger 14.857M (± 4.8%) i/s - 74.198M in 5.006202s ``` Lastly we ran a benchmark to demonstate the difference between master and our cache when the number of modules increases. This benchmark measures 1 ancestor, 30 ancestors, and 100 ancestors. Ruby 3.0 master: ``` Warming up -------------------------------------- 1 module 1.231M i/100ms 30 modules 432.020k i/100ms 100 modules 145.399k i/100ms Calculating ------------------------------------- 1 module 12.210M (± 2.1%) i/s - 61.553M in 5.043400s 30 modules 4.354M (± 2.7%) i/s - 22.033M in 5.063839s 100 modules 1.434M (± 2.9%) i/s - 7.270M in 5.072531s Comparison: 1 module: 12209958.3 i/s 30 modules: 4354217.8 i/s - 2.80x (± 0.00) slower 100 modules: 1434447.3 i/s - 8.51x (± 0.00) slower ``` Ruby 3.0 with cvar cache: ``` Warming up -------------------------------------- 1 module 1.641M i/100ms 30 modules 1.655M i/100ms 100 modules 1.620M i/100ms Calculating ------------------------------------- 1 module 16.279M (± 3.8%) i/s - 82.038M in 5.046923s 30 modules 15.891M (± 3.9%) i/s - 79.459M in 5.007958s 100 modules 16.087M (± 3.6%) i/s - 81.005M in 5.041931s Comparison: 1 module: 16279458.0 i/s 100 modules: 16087484.6 i/s - same-ish: difference falls within error 30 modules: 15891406.2 i/s - same-ish: difference falls within error ``` Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
* Protoized old pre-ANSI K&R style declarations and definitionsNobuyoshi Nakada2021-05-071-2/+2
|
* Add back checks for empty kw splat with tests (#4405)Alan Wu2021-04-231-0/+2
| | | | | This reverts commit a224ce8150f2bc687cf79eb415c931d87a4cd247. Turns out the checks are needed to handle splatting an array with an empty ruby2 keywords hash.
* Remove part of comment that is no longer accurateJeremy Evans2021-04-231-5/+0
| | | | | In Ruby 2.7, empty keyword splats could be added back for backwards compatibility. However, that stopped in Ruby 3.0.
* Remove unnecessary checks for empty kw splatAlan Wu2021-04-231-2/+0
| | | | These two checks are surrounded by an if that ensures the call site is not a kw splat call site.
* fix return from orphan Proc in lambdaKoichi Sasada2021-04-021-7/+31
| | | | | | | | | | | | | | | A "return" statement in a Proc in a lambda like: `lambda{ proc{ return }.call }` should return outer lambda block. However, the inner Proc can become orphan Proc from the lambda block. This "return" escape outer-scope like method, but this behavior was decieded as a bug. [Bug #17105] This patch raises LocalJumpError by checking the proc is orphan or not from lambda blocks before escaping by "return". Most of tests are written by Jeremy Evans https://github.com/ruby/ruby/pull/4223
* return bool instead of VALUEAaron Patterson2021-03-171-7/+7
|
* Refactor vm_defined to return a booleanAaron Patterson2021-03-171-34/+14
| | | | | | We just need this function to return whether or not the thing we're looking for is defined. If it's defined, return something true, otherwise false.
* Stop calling `rb_iseq_defined_string` in vm_definedAaron Patterson2021-03-171-8/+3
| | | | | We already have access to the string from the iseqs, so we can stop calling this function.
* Remove DEFINED_IVAR2 from enumJohn Hawthorn2021-03-101-3/+0
| | | | This version of defined? doesn't seem to be possible to emit anymore.
* opt_equality_by_mid for rb_equal_optKoichi Sasada2021-02-131-36/+63
| | | | | | | | | | | | | | This patch improves the performance of sequential and parallel execution of rb_equal() (and rb_eql()). [Bug #17497] rb_equal_opt (and rb_eql_opt) does not have own cd and it waste a time to initialize cd. This patch introduces opt_equality_by_mid() to check equality without cd. Furthermore, current master uses "static" cd on rb_equal_opt (and rb_eql_opt) and it hurts CPU caches on multi-thread execution. Now they are gone so there are no bottleneck on parallel execution.
* global call-cache cache table for rb_funcall*Koichi Sasada2021-01-291-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | rb_funcall* (rb_funcall(), rb_funcallv(), ...) functions invokes Ruby's method with given receiver. Ruby 2.7 introduced inline method cache with static memory area. However, Ruby 3.0 reimplemented the method cache data structures and the inline cache was removed. Without inline cache, rb_funcall* searched methods everytime. Most of cases per-Class Method Cache (pCMC) will be helped but pCMC requires VM-wide locking and it hurts performance on multi-Ractor execution, especially all Ractors calls methods with rb_funcall*. This patch introduced Global Call-Cache Cache Table (gccct) for rb_funcall*. Call-Cache was introduced from Ruby 3.0 to manage method cache entry atomically and gccct enables method-caching without VM-wide locking. This table solves the performance issue on multi-ractor execution. [Bug #17497] Ruby-level method invocation does not use gccct because it has inline-method-cache and the table size is limited. Basically rb_funcall* is not used frequently, so 1023 entries can be enough. We will revisit the table size if it is not enough.
* Suppress the warning for the invalid method_explorer caseNobuyoshi Nakada2021-01-171-1/+1
|
* Revert "[Bug #11213] let defined?(super) call respond_to_missing?"Nobuyoshi Nakada2021-01-131-2/+2
| | | | | | This reverts commit fac2498e0299f13dffe4f09a7dd7657fb49bf643 for now, due to [Bug #17509], the breakage in the case `super` is called in `respond_to?`.
* simplify assertionKoichi Sasada2021-01-071-4/+1
| | | | searched_cme is used only this line so the variable is not needed.
* Fix broken JIT of getinlinecacheTakashi Kokubun2021-01-041-9/+16
| | | | | | | | e7fc353f04 reverted vm_ic_hit_p's signature change made in 53babf35ef, which broke JIT compilation of getinlinecache. To make sure it doesn't happen again, I separated vm_inlined_ic_hit_p to make the intention clear.
* enable constant cache on ractorsKoichi Sasada2021-01-051-12/+20
| | | | | | | | | | | | | | | | constant cache `IC` is accessed by non-atomic manner and there are thread-safety issues, so Ruby 3.0 disables to use const cache on non-main ractors. This patch enables it by introducing `imemo_constcache` and allocates it by every re-fill of const cache like `imemo_callcache`. [Bug #17510] Now `IC` only has one entry `IC::entry` and it points to `iseq_inline_constant_cache_entry`, managed by T_IMEMO object. `IC` is atomic data structure so `rb_mjit_before_vm_ic_update()` and `rb_mjit_after_vm_ic_update()` is not needed.
* Fixed leaked global symbolsNobuyoshi Nakada2020-12-261-3/+4
|
* separate rb_ractor_pub from rb_ractor_tKoichi Sasada2020-12-221-1/+1
| | | | | | | | | separate some fields from rb_ractor_t to rb_ractor_pub and put it at the beggining of rb_ractor_t and declare it in vm_core.h so vm_core.h can access rb_ractor_pub fields. Now rb_ec_ractor_hooks() is a complete inline function and no MJIT related issue.
* TracePoint.new(&block) should be ractor-localKoichi Sasada2020-12-221-1/+1
| | | | | TracePoint should be ractor-local because the Proc can violate the Ractor-safe.
* Introduce Ractor::IsolationErrorKoichi Sasada2020-12-211-1/+1
| | | | | | | | | | | Ractor has several restrictions to keep each ractor being isolated and some operation such as `CONST="foo"` in non-main ractor raises an exception. This kind of operation raises an error but there is confusion (some code raises RuntimeError and some code raises NameError). To make clear we introduce Ractor::IsolationError which is raised when the isolation between ractors is violated.
* ALWAYS_INLINE implies inlineNobuyoshi Nakada2020-12-191-1/+1
|
* Fix vm_search_invokeblockTakashi Kokubun2020-12-191-1/+8
| | | | call_ needs to be vm_invokeblock_i, and flags is also not empty.
* discourage inlining for vm_sendish()Takashi Kokubun2020-12-191-6/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | reversing 9213771817 only for JIT, because it made JIT slower. $ benchmark-driver -v --rbenv 'before;after;before --jit;after --jit' --repeat-count=36 --alternate --output=all benchmark.yml before: ruby 3.0.0dev (2020-12-19T07:38:17Z master a139318538) [x86_64-linux] after: ruby 3.0.0dev (2020-12-19T07:52:01Z master ce9faaeff5) [x86_64-linux] last_commit=discourage inlining for vm_sendish() before --jit: ruby 3.0.0dev (2020-12-19T07:38:17Z master a139318538) +JIT [x86_64-linux] after --jit: ruby 3.0.0dev (2020-12-19T07:52:01Z master ce9faaeff5) +JIT [x86_64-linux] last_commit=discourage inlining for vm_sendish() Calculating ------------------------------------- before after before --jit after --jit Optcarrot Lan_Master.nes 42.83365858987760 42.68912456143848 76.50136803552716 65.74704713379785 fps 42.87724738609940 42.89045158177300 79.72624911659534 81.26221749201044 43.34963955708526 42.95431841174180 80.18085951039328 82.86458983313545 43.56786038452823 43.57563008888242 80.45933051716041 83.09150550702445 43.83219269706004 43.60748924115331 80.67164125046142 83.39458202043882 43.99035062888973 43.62050459554573 80.93204435712701 83.56303651352751 44.25176047881120 44.04822899344536 81.15051082548314 83.58166141398522 44.41978060794512 44.06521657912991 81.35651907376140 83.80036752456826 44.46864790591856 44.09325484326153 81.53456531520031 83.87502933718609 45.54712020644544 44.70693952869038 81.97738413452767 83.95818356402224 45.84292299382878 44.77704345873913 82.35118338199700 83.95966387450966 45.89411137280815 45.41425773286726 83.01052538434648 84.12812994632024 45.93130099197283 46.16884439916935 83.50833510120576 84.26276094927231 46.13648038236674 46.66645417860622 84.88757531920830 85.41732546800056 46.74873798919658 46.71790568883760 84.90953097036886 85.56340808970482 47.11273577214855 46.74581938882115 84.93196765297411 85.57603396455576 47.17870777128640 46.82414166607185 84.97178445888456 86.63510466280221 47.19338055580042 46.83645774240446 85.43536447262163 86.74129103462393 47.25761413477774 46.86834469505590 85.59822430471097 86.85376073363715 47.53327847102834 46.90228589364909 85.76446609620548 87.26108400015282 47.64308771617673 47.02814519551055 85.79904863600991 87.72293541243303 47.80286861846863 47.44672838168050 85.88640862064263 87.86803587836525 47.86455937950740 47.65301489003541 85.88750199172448 88.16881051171814 47.90065455321760 47.73425082354376 85.94295700508701 88.71267004066843 47.90727961241468 47.86377917424705 85.94674546805844 88.77726627283683 47.93243954623904 47.88720812998766 86.51872778134982 88.78993962536994 47.95062952008558 47.88774830879015 86.63116771614249 88.88085054889298 47.95097849989396 47.89825669442417 86.77387990931732 89.72021826461126 48.04730571166697 47.89981045730949 86.95084011077047 89.75804193954582 48.08042611622322 48.03246661737583 87.87239147980547 90.05949240088842 48.08999523258601 48.15253490344558 88.31289344498016 90.36439442190294 48.25670456430854 48.26904755214532 88.33999433286937 90.54253266759406 48.25947200597002 48.41894159956091 88.35502296938638 90.72591894564106 48.30826210577268 48.43125201523194 88.58311746582939 90.77173035874087 48.31514124187375 48.53932287546499 88.89099681179805 91.07747476133886 48.44349281318267 48.58969411593706 89.34043973691581 91.08545627378257
* encourage inlining for vm_sendish()Koichi Sasada2020-12-171-34/+28
| | | | | | | | | Some tunings. * add `inline` for vm_sendish() * pass enum instead of func ptr to vm_sendish() * reorder initial order of `calling` struct. * add ALWAYS_INLINE for vm_search_method_fastpath() * call vm_search_method_fastpath() from vm_sendish()