path: root/compile.c
Commit message | Author | Date | Files | Lines
...
* Refactor uJIT code into more files for readability | Maxime Chevalier-Boisvert | 2021-10-20 | 1 | -1/+0
* MicroJIT: compile after ten calls | Alan Wu | 2021-10-20 | 1 | -36/+6
* Implement the --disable-ujit command line option | Alan Wu | 2021-10-20 | 1 | -2/+2
* Avoid triggering GC while translating threaded code | Alan Wu | 2021-10-20 | 1 | -7/+20
* Avoid recompiling overlapping instruction sequences in ujit | Maxime Chevalier-Boisvert | 2021-10-20 | 1 | -7/+15
* Generate multiple copies of native code for `pop` | Alan Wu | 2021-10-20 | 1 | -1/+1
    Insert generated addresses into st_table for mapping native code
    addresses back to info about VM instructions. Export `encoded_insn_data`
    to do this. Also some style fixes.
* Add new files, ujit_compile.c, ujit_compile.h | Maxime Chevalier-Boisvert | 2021-10-20 | 1 | -8/+7
* Added shift instructions | Maxime Chevalier-Boisvert | 2021-10-20 | 1 | -6/+12
* Yeah, this actually works! | Alan Wu | 2021-10-20 | 1 | -0/+5
* Cast to void pointer for `%p` in commented out code [ci skip] | Nobuyoshi Nakada | 2021-10-20 | 1 | -4/+4
* Dump outer variables tables when dumping an iseq to binary | Aaron Patterson | 2021-10-07 | 1 | -1/+54
    This commit dumps the outer variables table when dumping an iseq to
    binary. This fixes a case where Ractors aren't able to tell what outer
    variables belong to a lambda after the lambda is loaded via
    ISeq.load_from_binary. [Bug #18232] [ruby-core:105504]
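A hedged sketch of the round trip this fix concerns (illustrative only; the compiled snippet and inspect output are made up for the example):

```ruby
# A lambda capturing the outer local `x` is compiled, dumped to binary,
# and reloaded. With this fix the reloaded iseq still carries the table
# describing which outer variables the lambda refers to.
iseq = RubyVM::InstructionSequence.compile(<<~SRC)
  x = 1
  -> { x }
SRC

reloaded = RubyVM::InstructionSequence.load_from_binary(iseq.to_binary)
p reloaded  # inspect output varies by Ruby version
```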
* Using NIL_P macro instead of `== Qnil` | S.H | 2021-10-03 | 1 | -1/+1
* Using RB_FLOAT_TYPE_P macro | S-H-GAMELINKS | 2021-09-12 | 1 | -2/+2
* Using SYMBOL_P macro | S-H-GAMELINKS | 2021-09-11 | 1 | -2/+2
* Remove unused argument | Nobuyoshi Nakada | 2021-09-10 | 1 | -1/+1
* suppress GCC's -Wsuggest-attribute=format | 卜部昌平 | 2021-09-10 | 1 | -2/+2
    I was not aware of this because I use clang these days.
* Replace RBOOL macro | S-H-GAMELINKS | 2021-09-05 | 1 | -2/+2
* Extract compile_attrasgn from iseq_compile_each0 | Nobuyoshi Nakada | 2021-09-01 | 1 | -84/+89
* Extract compile_kw_arg from iseq_compile_each0 | Nobuyoshi Nakada | 2021-09-01 | 1 | -32/+37
* Extract compile_errinfo from iseq_compile_each0 | Nobuyoshi Nakada | 2021-09-01 | 1 | -24/+30
* Extract compile_dots from iseq_compile_each0 | Nobuyoshi Nakada | 2021-09-01 | 1 | -23/+30
* Extract compile_colon3 from iseq_compile_each0 | Nobuyoshi Nakada | 2021-09-01 | 1 | -25/+32
* Extract compile_colon2 from iseq_compile_each0 | Nobuyoshi Nakada | 2021-09-01 | 1 | -42/+49
* Extract compile_match from iseq_compile_each0 | Nobuyoshi Nakada | 2021-09-01 | 1 | -34/+40
* Extract compile_yield from iseq_compile_each0 | Nobuyoshi Nakada | 2021-09-01 | 1 | -48/+46
* Extract compile_super from iseq_compile_each0 | Nobuyoshi Nakada | 2021-09-01 | 1 | -132/+140
* Extract compile_op_log from iseq_compile_each0 | Nobuyoshi Nakada | 2021-09-01 | 1 | -39/+46
* Extract compile_op_cdecl from iseq_compile_each0 | Nobuyoshi Nakada | 2021-09-01 | 1 | -66/+73
* Extract compile_op_asgn2 from iseq_compile_each0 | Nobuyoshi Nakada | 2021-09-01 | 1 | -104/+111
* Extract compile_op_asgn1 from iseq_compile_each0 | Nobuyoshi Nakada | 2021-09-01 | 1 | -134/+139
* Remove no longer used variable line_node | Nobuyoshi Nakada | 2021-08-31 | 1 | -258/+257
* Extract compile_block from iseq_compile_each0 | Nobuyoshi Nakada | 2021-08-31 | 1 | -12/+18
    And constify `node` argument of `iseq_compile_each0`.
* Constify line_node in iseq_compile_each0 | Nobuyoshi Nakada | 2021-08-31 | 1 | -1/+1
* Allow tracing of optimized methods | Jeremy Evans | 2021-08-21 | 1 | -0/+1
    This updates the trace instructions to directly dispatch to
    opt_send_without_block. So this should cause no slowdown in non-trace
    mode. To enable the tracing of the optimized methods, RUBY_EVENT_C_CALL
    and RUBY_EVENT_C_RETURN are added as events to the specialized
    instructions. Fixes [Bug #14870]
    Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
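A hedged illustration (not taken from the commit): with this change, a TracePoint on C-call events should be able to observe methods such as Integer#+ that are normally handled by specialized instructions.

```ruby
# Sketch under the assumption above: trace C-call events around an
# optimized operator. Before [Bug #14870] was fixed, the specialized
# instruction for `+` did not fire :c_call / :c_return events.
trace = TracePoint.new(:c_call, :c_return) do |tp|
  puts "#{tp.event} #{tp.defined_class}##{tp.method_id}"
end
trace.enable { 1 + 2 }  # expected to report Integer#+ with this change
```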
* Show verbose error messages when single pattern match fails | Kazuki Tsujimoto | 2021-08-15 | 1 | -95/+361
    [0] => [0, *, a]
    #=> [0] length mismatch (given 1, expected 2+) (NoMatchingPatternError)
    Ignore test failures of typeprof caused by this change for now.
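A hedged, runnable version of the example above (assuming a Ruby build that includes this change; the exact message wording may differ):

```ruby
# The rightward pattern match below cannot succeed: the pattern needs at
# least two elements but the array has one. With this change the raised
# error explains the mismatch instead of only naming the failing value.
begin
  [0] => [0, *, a]
rescue NoMatchingPatternError => e
  puts e.message  # e.g. "[0] length mismatch (given 1, expected 2+)"
end
```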
* Fix use-after-free on -DUSE_EMBED_CI=0 | Alan Wu | 2021-07-29 | 1 | -2/+2
    On -DUSE_EMBED_CI=0, there are more GC allocations and the old code
    didn't keep old_operands[0] reachable while allocating. On a Debian
    based system, I get a crash requiring erb under GC stress mode. On
    macOS, tool/transcode-tblgen.rb runs incorrectly if I put
    GC.stress=true as the first line.
* Add pattern matching pin support for instance/class/global variables | Jeremy Evans | 2021-07-15 | 1 | -0/+3
    Pin matching for local variables and constants is already supported,
    and it is fairly simple to add support for these variable types. Note
    that pin matching for method calls is still not supported without
    wrapping in parentheses (pin expressions). I think that's for the best
    as method calls are far more complex (arguments/blocks).
    Implements [Feature #17724]
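A hedged sketch of the new pin targets described above (the variable names are made up for illustration):

```ruby
# Pinning an instance variable in a pattern; previously only local
# variables and constants could be pinned directly.
@expected = 42

case 42
in ^@expected
  puts "matched the pinned instance variable"
else
  puts "no match"
end
```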
* Store the dup'd CDHASH in the object list during IBF load | Aaron Patterson | 2021-07-06 | 1 | -0/+5
    Since b2fc592c304 nothing was holding a reference to the dup'd CDHASH
    during IBF loading. If a GC happened to run during IBF load then the
    copied hash wouldn't have anything to keep it alive. We don't really
    want to keep the originally loaded CDHASH hash, so this patch just
    overwrites the original hash with the copied / modified hash.
    [Bug #17984] [ruby-core:104259]
* Check type of instruction - can be INSN or ADJUST | eileencodes | 2021-06-23 | 1 | -0/+1
    If the type is ADJUST we don't want to treat it like an INSN so we
    have to check the type before reading from `insn_info.events`.
    [Bug #18001] [ruby-core:104371]
    Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
* Add a cache for class variables | eileencodes | 2021-06-18 | 1 | -4/+6
    Redo of 34a2acdac788602c14bf05fb616215187badd504 and
    931138b00696419945dc03e10f033b1f53cd50f3 which were reverted.
    GitHub PR #4340.

    This change implements a cache for class variables. Previously there
    was no cache for cvars. Cvar access is slow due to needing to travel
    all the way up the ancestor tree before returning the cvar value. The
    deeper the ancestor tree the slower cvar access will be.

    The benefits of the cache are more visible with a higher number of
    included modules due to the way Ruby looks up class variables. The
    benchmark here includes 26 modules and shows with the cache, this
    branch is 6.5x faster when accessing class variables.

    ```
    compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master 9e5105c) [x86_64-darwin19]
    built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be009) [x86_64-darwin19]

    |         | compare-ruby | built-ruby |
    |:--------|-------------:|-----------:|
    | vm_cvar |       5.681M |    36.980M |
    |         |            - |      6.51x |
    ```

    Benchmark.ips calling `ActiveRecord::Base.logger` from within a Rails
    application. ActiveRecord::Base.logger has 71 ancestors. The more
    ancestors a tree has, the more clear the speed increase. IE if Base
    had only one ancestor we'd see no improvement. This benchmark is run
    on a vanilla Rails application.

    Benchmark code:

    ```ruby
    require "benchmark/ips"
    require_relative "config/environment"

    Benchmark.ips do |x|
      x.report "logger" do
        ActiveRecord::Base.logger
      end
    end
    ```

    Ruby 3.0 master / Rails 6.1:

    ```
    Warming up --------------------------------------
                  logger   155.251k i/100ms
    Calculating -------------------------------------
    ```

    Ruby 3.0 with cvar cache / Rails 6.1:

    ```
    Warming up --------------------------------------
                  logger     1.546M i/100ms
    Calculating -------------------------------------
                  logger    14.857M (± 4.8%) i/s - 74.198M in 5.006202s
    ```

    Lastly we ran a benchmark to demonstrate the difference between master
    and our cache when the number of modules increases. This benchmark
    measures 1 ancestor, 30 ancestors, and 100 ancestors.

    Ruby 3.0 master:

    ```
    Warming up --------------------------------------
                1 module     1.231M i/100ms
              30 modules   432.020k i/100ms
             100 modules   145.399k i/100ms
    Calculating -------------------------------------
                1 module    12.210M (± 2.1%) i/s - 61.553M in 5.043400s
              30 modules     4.354M (± 2.7%) i/s - 22.033M in 5.063839s
             100 modules     1.434M (± 2.9%) i/s -  7.270M in 5.072531s

    Comparison:
                1 module:  12209958.3 i/s
              30 modules:   4354217.8 i/s - 2.80x (± 0.00) slower
             100 modules:   1434447.3 i/s - 8.51x (± 0.00) slower
    ```

    Ruby 3.0 with cvar cache:

    ```
    Warming up --------------------------------------
                1 module     1.641M i/100ms
              30 modules     1.655M i/100ms
             100 modules     1.620M i/100ms
    Calculating -------------------------------------
                1 module    16.279M (± 3.8%) i/s - 82.038M in 5.046923s
              30 modules    15.891M (± 3.9%) i/s - 79.459M in 5.007958s
             100 modules    16.087M (± 3.6%) i/s - 81.005M in 5.041931s

    Comparison:
                1 module: 16279458.0 i/s
             100 modules: 16087484.6 i/s - same-ish: difference falls within error
              30 modules: 15891406.2 i/s - same-ish: difference falls within error
    ```

    Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
* Enable USE_ISEQ_NODE_ID by default | Yusuke Endoh | 2021-06-18 | 1 | -5/+5
    ... which was formerly called EXPERIMENTAL_ISEQ_NODE_ID.
    See also ff69ef27b06eed1ba750e7d9cab8322f351ed245.
    https://bugs.ruby-lang.org/issues/17930
* Make it possible to get AST::Node from Thread::Backtrace::Location | Yusuke Endoh | 2021-06-18 | 1 | -5/+23
    RubyVM::AST.of(Thread::Backtrace::Location) returns a node that
    corresponds to the location. Typically, the node is a method call, but
    not always. This change also includes iseq's dump/load support of
    node_ids for each instruction.
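A hedged usage sketch of the API named above (RubyVM::AbstractSyntaxTree.of, abbreviated RubyVM::AST in the message); it assumes the code runs from a readable script file so the node can be resolved.

```ruby
# Map the location of a raised error back to the AST node that raised it.
def risky
  Integer("not a number")
rescue ArgumentError => e
  loc  = e.backtrace_locations.first
  node = RubyVM::AbstractSyntaxTree.of(loc)
  p node&.type  # typically the failing call's node; nil if unresolvable
end

risky
```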
* node.h: Reduce struct size to fit with Ruby object size (five VALUEs) | Yusuke Endoh | 2021-06-18 | 1 | -2/+1
    by merging `rb_ast_body_t#line_count` and `#script_lines`. Fortunately
    `line_count == RARRAY_LEN(script_lines)` was always satisfied. When
    script_lines is saved, it has an array of lines, and when not saved,
    it has a Fixnum that represents the old line_count.
* ast.rb: RubyVM::AST.parse and .of accepts `save_script_lines: true` | Yusuke Endoh | 2021-06-18 | 1 | -0/+1
    This option makes the parser keep the original source as an array of
    the original code lines. This feature exploits the mechanism of
    `SCRIPT_LINES__` but records only the specified code that is passed to
    RubyVM::AST.of or .parse, instead of recording all parsed program
    texts.
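A hedged sketch of the option as named in this commit (save_script_lines; the keyword was later renamed before release, so the exact name depends on the Ruby build):

```ruby
# Parse a snippet while asking the parser to retain the original source
# lines alongside the AST instead of discarding them after parsing.
src = <<~RUBY
  def add(a, b)
    a + b
  end
RUBY

ast = RubyVM::AbstractSyntaxTree.parse(src, save_script_lines: true)
p ast.type  # => :SCOPE; the retained lines allow recovering source text
```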
* Adjust styles [ci skip] | Nobuyoshi Nakada | 2021-06-17 | 1 | -4/+6
    * --braces-after-func-def-line
    * --dont-cuddle-else
    * --procnames-start-lines
    * --space-after-for
    * --space-after-if
    * --space-after-while
* Warn more duplicate literal hash keys | Nobuyoshi Nakada | 2021-06-03 | 1 | -0/+5
    Following non-special_const literals:
    * T_REGEXP
* Warn more duplicate literal hash keys | Nobuyoshi Nakada | 2021-06-03 | 1 | -8/+8
    Following non-special_const literals:
    * T_BIGNUM
    * T_FLOAT (non-flonum)
    * T_RATIONAL
    * T_COMPLEX
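A hedged illustration of the behavior these two commits extend (the exact warning text is assumed and may differ slightly):

```ruby
# With warnings enabled (ruby -w), duplicated literal keys that are not
# special constants, such as the Float keys below, now trigger the
# duplicate-key warning as well.
h = { 1.5 => "a", 1.5 => "b" }
# warning: key 1.5 is duplicated and overwritten on line 4
p h  # => {1.5=>"b"}
```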
* Refactor rb_vm_insn_addr2insn calls | Takashi Kokubun | 2021-06-02 | 1 | -5/+1
    There have been way too many ifdefs around these calls.
* compile.c: Emit send for === calls in when statements | Alan Wu | 2021-05-28 | 1 | -3/+3
    The checkmatch instruction with VM_CHECKMATCH_TYPE_CASE calls ===
    without a call cache. Emit a send instruction to make the call
    instead. It includes a call cache.

    The call cache improves throughput of using when statements to check
    the class of a given object. This is useful for, say, JSON
    serialization.

    Use of a regular send instead of checkmatch also avoids taking the VM
    lock every time, which is good for multi-ractor workloads.

        Calculating -------------------------------------
                                  master        post
        vm_case_classes          11.013M     16.172M i/s - 6.000M times in 0.544795s 0.371009s
        vm_case_lit                2.296       2.263 i/s -  1.000 times in 0.435606s 0.441826s
        vm_case                  74.098M     64.338M i/s - 6.000M times in 0.080974s 0.093257s

        Comparison:
              vm_case_classes
                 post:  16172114.4 i/s
               master:  11013316.9 i/s - 1.47x slower

                  vm_case_lit
               master:         2.3 i/s
                 post:         2.3 i/s - 1.01x slower

                      vm_case
               master:  74097858.6 i/s
                 post:  64338333.9 i/s - 1.15x slower

    The vm_case benchmark is a bit slower post patch, possibly due to the
    larger instruction sequence. The benchmark dispatches using
    opt_case_dispatch so was not running checkmatch and does not make the
    === call post patch.
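A hedged example of the kind of code this affects: each `when` clause below performs a `Module#===` check, which after this change goes through a regular send (with a call cache) rather than the cache-less checkmatch instruction. The serializer below is made up for illustration.

```ruby
# Class-based dispatch in a case/when, typical of serializers.
def serialize(value)
  case value
  when String  then value.inspect
  when Integer then value.to_s
  when Array   then "[#{value.map { |v| serialize(v) }.join(', ')}]"
  else value.to_s
  end
end

puts serialize([1, "two", 3])  # => [1, "two", 3]
```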
* Make range literal peephole optimization target "newrange" | Alan Wu | 2021-05-28 | 1 | -4/+3
    It looks for "checkmatch", but it could be applied to anything that
    has "newrange". Making the optimization target more ranges might only
    be fair play when all ranges are frozen. So I'm putting a reference to
    the ticket that froze all ranges. [Feature #15504]
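A hedged illustration of the code the broadened optimization can now cover (literal ranges anywhere, not only as case/when patterns where checkmatch appears); the method names are made up for the example:

```ruby
# Both ranges below are literals with literal endpoints. A peephole
# optimization keyed on "newrange" can reuse such a literal as a frozen
# constant instead of rebuilding the Range on every call.
def percentage?(x)
  (0..100).cover?(x)       # range used outside a when clause
end

def grade(score)
  case score
  when 90..100 then "A"    # range used as a when pattern
  else "lower"
  end
end

p percentage?(42)  # => true
p grade(95)        # => "A"
```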