delta/ruby.git - github.com: ruby/ruby.git

	Commit message	Author	Age	Files	Lines
*	Enhance keep_tokens option for RubyVM::AbstractSyntaxTree parsing methods	yui-knk	2022-11-21	1	-1/+23
	Implementations of the Language Server Protocol (LSP) sometimes need token information. For example, `m(1)` and `m(1, )` have the same AST structure apart from node locations, so it is impossible to detect the `,` from the AST alone. In the latter case, however, it might be better to suggest a list of variables for the second argument. Token information is important in such cases.

	This commit adds these methods:

	* Add `keep_tokens` option for `RubyVM::AbstractSyntaxTree.parse`, `.parse_file` and `.of`
	* Add `RubyVM::AbstractSyntaxTree::Node#tokens`, which returns the tokens for the node, including the tokens of its descendant nodes.
	* Add `RubyVM::AbstractSyntaxTree::Node#all_tokens`, which returns all tokens for the input script regardless of the receiver node.

	[Feature #19070]

	Impacts on memory usage and performance are below:

	Memory usage:

	```
	$ cat test.rb
	root = RubyVM::AbstractSyntaxTree.parse_file(File.expand_path('../test/ruby/test_keyword.rb', __FILE__), keep_tokens: true)

	$ /usr/bin/time -f %Mkb /usr/local/bin/ruby -v
	ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
	11408kb

	# keep_tokens :false
	$ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
	17508kb

	# keep_tokens :true
	$ /usr/bin/time -f %Mkb /usr/local/bin/ruby test.rb
	30960kb
	```

	Performance:

	```
	$ cat ../ast_keep_tokens.yml
	prelude: |
	  src = <<~SRC
	    module M
	      class C
	        def m1(a, b)
	          1 + a + b
	        end
	      end
	    end
	  SRC
	benchmark:
	  without_keep_tokens: |
	    RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: false)
	  with_keep_tokens: |
	    RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: true)

	$ make benchmark COMPARE_RUBY="./ruby" ARGS=../ast_keep_tokens.yml
	/home/kaneko.y/.rbenv/shims/ruby --disable=gems -rrubygems -I../benchmark/lib ../benchmark/benchmark-driver/exe/benchmark-driver \
	            --executables="compare-ruby::./ruby -I.ext/common --disable-gem" \
	            --executables="built-ruby::./miniruby -I../lib -I. -I.ext/common ../tool/runruby.rb --extout=.ext -- --disable-gems --disable-gem" \
	            --output=markdown --output-compare -v ../ast_keep_tokens.yml
	compare-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
	built-ruby: ruby 3.2.0dev (2022-11-19T09:41:54Z 19070-keep_tokens d3af1b8057) [x86_64-linux]
	warming up..

	|                     |compare-ruby|built-ruby|
	|:--------------------|-----------:|---------:|
	|without_keep_tokens  |     21.659k|   21.303k|
	|                     |       1.02x|         -|
	|with_keep_tokens     |      6.220k|    5.691k|
	|                     |       1.09x|         -|
	```
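The `m(1)` versus `m(1, )` case from the commit message can be tried roughly as follows. This is a minimal sketch, assuming a Ruby build that includes this commit (3.2 or later) and assuming each token is an array of the form `[id, type, text, location]`; the helper name `comma_token?` is hypothetical and not part of the commit.

```ruby
# Hypothetical helper: reports whether the source contains a "," token.
# Assumes each token looks like [id, type, text, [line, col, line, col]].
def comma_token?(src)
  root = RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: true)
  root.all_tokens.any? { |_id, _type, text, _loc| text == "," }
end

without_comma = comma_token?("m(1)")    # no "," in the token stream
with_comma    = comma_token?("m(1, )")  # trailing "," is kept as a token

puts "m(1)   has comma token: #{without_comma}"
puts "m(1, ) has comma token: #{with_comma}"
```

A language server could use a check like this to decide whether to offer completions for a second argument, which the AST alone cannot tell it.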
*	Move `error` from top_stmts and top_stmt to stmt	yui-knk	2022-10-08	1	-0/+3
	With this change, syntax errors are recovered in smaller units. In the case below, "DEFN :bar" is now at the same level as "CLASS :Foo".

	```
	module Z
	  class Foo
	    foo.
	  end

	  def bar
	  end
	end
	```

	[Feature #19013]
*	Initialize node_id	Wolf	2022-08-01	1	-0/+1
	In some cases node_id might have been left uninitialized, leading to undefined behavior on access. So always set it to -1, so that it holds some valid value.
*	Expand tabs [ci skip]	Takashi Kokubun	2022-07-21	1	-706/+706
	[Misc #18891]
*	Remove `NODE_DASGN_CURR` [Feature #18406]	Nobuyoshi Nakada	2021-12-13	1	-8/+1
	This `NODE` type was used in the pre-YARV implementation to improve the performance of assignment to a dynamic local variable defined at the innermost scope. It no longer differs from `NODE_DASGN` in any practical way, except in the node dump.
*	Add `nd_type_p` macro	S.H	2021-12-04	1	-3/+3
*	Refactor hacky ID tables to struct rb_ast_id_table_t	Yusuke Endoh	2021-11-21	1	-12/+35
	Local variable tables were represented as `ID*`, but this was very hacky: the first element is not an ID but the size of the table, and the last element is (sometimes) a link to the next local table, but only when the ID tables form a linked list. This change converts the hacky implementation to a normal struct.
*	node.c (dump_node): update format explanation for NODE_ARGS	Yusuke Endoh	2021-11-17	1	-2/+2
*	node.c (dump_node): trivial refactoring	Yusuke Endoh	2021-11-17	1	-3/+1
*	Show node IDs in dump	Nobuyoshi Nakada	2021-07-12	1	-2/+2
*	ast.rb: RubyVM::AST.parse and .of accepts `save_script_lines: true`	Yusuke Endoh	2021-06-18	1	-0/+1
	This option makes the parser keep the original source as an array of the original code lines. This feature exploits the mechanism of `SCRIPT_LINES__` but records only the specified code that is passed to RubyVM::AST.of or .parse, instead of recording all parsed program texts.
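A minimal sketch of using this option. It assumes Ruby 3.1 or later, where (to my knowledge) the option was released under the name `keep_script_lines` rather than the `save_script_lines` name used in this commit, and where the kept lines are read back through `Node#script_lines`.

```ruby
src = "x = 1\ny = x + 2\n"

# Assumption: the released keyword is keep_script_lines (Ruby 3.1+),
# not the save_script_lines name used in the commit title.
root = RubyVM::AbstractSyntaxTree.parse(src, keep_script_lines: true)

# Only the code passed to .parse is recorded, one entry per source line.
lines = root.script_lines
lines.each_with_index do |line, i|
  puts "#{i + 1}: #{line}"
end
```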
*	Partially revert 2c7d3b3a722c4636ab1e9d289cbca47ddd168d3e	Yusuke Endoh	2021-04-27	1	-1/+1
	to make imemo_ast WB-protected again. Only the test is kept.
*	node.c (rb_ast_new): imemo_ast is WB-unprotected	Yusuke Endoh	2021-04-26	1	-1/+1
	Previously imemo_ast was handled as WB-protected, which caused a segfault with the following code:

	    # shareable_constant_value: literal
	    M0 = {}
	    M1 = {}
	    ...
	    M100000 = {}

	My analysis is here: `shareable_constant_value: literal` creates many Hash instances during parsing and adds them to the node_buffer of imemo_ast. However, the contents are missed because imemo_ast is incorrectly WB-protected.

	This changeset makes imemo_ast WB-unprotected.
*	NODE markability should not change by nd_set_type	Nobuyoshi Nakada	2021-01-14	1	-6/+25
*	Change NODE layout for pattern matching	Kazuki Tsujimoto	2020-11-01	1	-2/+6
	I prefer pconst to be the first element of NODE.

	Before:

	       | ARYPTN | FNDPTN | HSHPTN
	    ---+--------+--------+-----------
	    u1 | imemo  | imemo  | pkwargs
	    u2 | pconst | pconst | pconst
	    u3 | apinfo | fpinfo | pkwrestarg

	After:

	       | ARYPTN | FNDPTN | HSHPTN
	    ---+--------+--------+-----------
	    u1 | pconst | pconst | pconst
	    u2 | imemo  | imemo  | pkwargs
	    u3 | apinfo | fpinfo | pkwrestarg
*	Dump FrozenCore specially	Nobuyoshi Nakada	2020-10-20	1	-1/+20
*	Unfreeze string-literal-only interpolated string-literal	Nobuyoshi Nakada	2020-09-30	1	-0/+1
	[Feature #17104]
*	rb_{ary,fnd}_pattern_info: Remove imemo member to reduce memory usage	Kazuki Tsujimoto	2020-08-02	1	-24/+4
	This is a partial revert commit of 8f096226e1b76f95f4d853d3dea2bc75eeeb5244.

	NODE layout:

	Before:

	       | ARYPTN | FNDPTN | HSHPTN
	    ---+--------+--------+-----------
	    u1 | pconst | pconst | pconst
	    u2 | unused | unused | pkwargs
	    u3 | apinfo | fpinfo | pkwrestarg

	After:

	       | ARYPTN | FNDPTN | HSHPTN
	    ---+--------+--------+-----------
	    u1 | imemo  | imemo  | pkwargs
	    u2 | pconst | pconst | pconst
	    u3 | apinfo | fpinfo | pkwrestarg
*	NODE_MATCH needs reference updating	Aaron Patterson	2020-07-30	1	-0/+1
*	Use a linked list to eliminate imemo tmp bufs for managing local tables	Aaron Patterson	2020-07-27	1	-19/+17
	This patch changes local table memory to be managed by a linked list rather than via the garbage collector. It reduces allocations from the GC and also fixes a use-after-free bug in the concurrent-with-sweep compactor I'm working on.
*	Use ID instead of GENTRY for gvars. (#3278)	Koichi Sasada	2020-07-03	1	-1/+1
	Global variables are compiled into GENTRY (a pointer to struct rb_global_entry). This patch replaces GENTRY with ID and makes the code simpler. We now need to look up the GENTRY from the ID every time (st_lookup), so some additional overhead is introduced. However, the performance of accessing global variables is not important nowadays, and this simplicity helps Ractor development.
*	Introduce find pattern [Feature #16828]	Kazuki Tsujimoto	2020-06-14	1	-0/+34
*	decouple internal.h headers	卜部昌平	2019-12-26	1	-0/+3
	Makes committers' daily life easier by no longer #include-ing everything from internal.h; each file now includes what it needs instead. This should significantly speed up incremental builds.

	We take the following inclusion order in this changeset:

	1. "ruby/config.h", where _GNU_SOURCE is defined (must be the very first thing among everything).
	2. RUBY_EXTCONF_H if any.
	3. Standard C headers, sorted alphabetically.
	4. Other system headers, maybe guarded by #ifdef
	5. Everything else, sorted alphabetically.

	Exceptions are the win32-related headers, which tend not to be self-contained (the headers have inclusion order dependencies).
*	Revert "Method reference operator"	Nobuyoshi Nakada	2019-11-12	1	-9/+0
	This reverts commit 67c574736912003c377218153f9d3b9c0c96a17b. [Feature #16275]
*	Use an identity hash for pinning Ripper objects	Aaron Patterson	2019-11-05	1	-6/+6
	Ripper reuses parse.y for its implementation. Ripper changes the grammar productions to sometimes return Ruby objects. These Ruby objects are put onto the parser's stack, so they must be kept alive. This is where the "mark_ary" comes in. The mark array ensures that Ruby objects created and pushed on the stack during the course of parsing will stay alive for the life of the parsing functions.

	Unfortunately, Arrays do not prevent their contents from moving. If the compactor runs, objects on the parser stack could move because the array won't prevent them from moving. But the GC doesn't know about the parser stack, so it can't update references in that stack (it will update them in the array).

	This commit changes the mark array to be an identity hash. Since the identity hash relies on memory addresses for the definition of identity, the GC will not allow keys in an identity hash to move. We can prevent movement of objects in the parser stack by sticking them in an identity hash.
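The change itself is in C, but the idea of an identity hash can be illustrated at the Ruby level with `Hash#compare_by_identity`. This is only a sketch of the lookup semantics (keys match by object identity, not by content); the pinning described above happens inside the GC and is not observable here. The variable names are illustrative.

```ruby
# Two distinct String objects with equal content.
a = String.new("token")
b = String.new("token")

by_value    = {}                      # ordinary hash: keys compared with eql?/hash
by_identity = {}.compare_by_identity  # identity hash: keys compared by object identity

by_value[a]    = :seen
by_identity[a] = :seen

puts by_value.key?(b)     # equal content, so the ordinary hash finds it
puts by_identity.key?(b)  # a different object, so the identity hash does not
puts by_identity.key?(a)  # the very same object is found
```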
*	avoid overflow in integer multiplication	卜部昌平	2019-10-09	1	-3/+8
	This changeset basically replaces `ruby_xmalloc(x * y)` with `ruby_xmalloc2(x, y)`. Some convenience functions are also provided, for instance `rb_xmalloc_mul_add(x, y, z)`, which allocates x * y + z bytes.
*	Adjusted spaces [ci skip]	Nobuyoshi Nakada	2019-09-27	1	-43/+43
*	Add compaction support to `rb_ast_t`	Aaron Patterson	2019-09-26	1	-4/+53
	This commit adds compaction support to `rb_ast_t`.
*	`NODE_MATCH` needs to be marked / allocated from marking bucket	Aaron Patterson	2019-09-10	1	-1/+3
	Fixes a test in RubySpec
*	Revert "Reverting node marking until I can fix GC problem."	Aaron Patterson	2019-09-09	1	-17/+140
	This reverts commit 092f31e7e23c0ee04df987f0c0f979d036971804.
*	Rename NODE_ARRAY to NODE_LIST to reflect its actual use cases	Yusuke Endoh	2019-09-07	1	-5/+5
	and NODE_ZARRAY to NODE_ZLIST.

	NODE_ARRAY is used not only by an Array literal, but also the contents of Hash literals, method call arguments, dynamic string literals, etc. In addition, the structure of NODE_ARRAY is a linked list, not an array. This is very confusing, so I believe `NODE_LIST` is a better name.
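The shared use of this node type can be observed from Ruby through `RubyVM::AbstractSyntaxTree`. This is a rough sketch, assuming Ruby 2.7 or later (where the rename is in effect and the node type is reported as `:LIST`); the helper `node_types` is hypothetical.

```ruby
# Hypothetical helper: collects the type of every node in the tree.
# Children may also be nil, symbols, or other non-node values, which are skipped.
def node_types(node, acc = [])
  return acc unless node.is_a?(RubyVM::AbstractSyntaxTree::Node)
  acc << node.type
  node.children.each { |child| node_types(child, acc) }
  acc
end

array_types = node_types(RubyVM::AbstractSyntaxTree.parse("[a, b]"))
call_types  = node_types(RubyVM::AbstractSyntaxTree.parse("m(a, b)"))

# Both an Array literal and method call arguments use the same list node.
puts "array literal: #{array_types.inspect}"
puts "method call:   #{call_types.inspect}"
```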
*	Reverting node marking until I can fix GC problem.	Aaron Patterson	2019-09-05	1	-140/+17
	Looks like we're getting WB misses during stressful GC on startup. I am investigating.
*	I forgot to add `break` in my case statements	Aaron Patterson	2019-09-05	1	-0/+2
	Give me a break.
*	Stash tmpbuffer inside internal structs	Aaron Patterson	2019-09-05	1	-2/+10
	I guess those AST nodes were actually used for something, so we'd better not touch them. Instead this commit just puts the tmpbuffer inside a different internal struct so that we can mark them.
*	add debugging code to the mark function	Aaron Patterson	2019-09-05	1	-0/+2
*	lazily allocate the mark array	Aaron Patterson	2019-09-05	1	-3/+4
*	Create two buckets for allocating NODE structs	Aaron Patterson	2019-09-05	1	-16/+65
	This commit adds two buckets for allocating NODE structs, then allocates "markable" NODE objects from one bucket. The reason to do this is so that when the AST mark function scans nodes for VALUE objects to mark, we only scan NODE objects that we know reference VALUE objects. If we did not divide the objects, the mark function would spend too much time scanning objects that don't contain any references.
*	Stash the imemo buf at the end of the ID list	Aaron Patterson	2019-09-05	1	-1/+9
	Now we can reach the ID table buffer from the id table itself, so when SCOPE nodes are marked we can keep the buffers alive. This eliminates the need for the "mark array" during normal parse / compile (IOW not Ripper).
*	Mark some tmpbufs via node objects	Aaron Patterson	2019-09-05	1	-0/+3
	This way we don't need to add the tmpbufs to a Ruby array for marking
*	Directly mark node objects instead of using a mark array	Aaron Patterson	2019-09-05	1	-0/+50
	This patch changes the AST mark function so that it will walk through nodes in the NODE buffer marking Ruby objects rather than using a mark array to guarantee liveness. The reason I want to do this is so that when compaction happens on major GCs, node objects will have their references pinned (or possibly we can update them correctly).
*	Make pattern matching support **nil syntax	Kazuki Tsujimoto	2019-09-01	1	-1/+6
*	Directly mark compile options from the AST object	Aaron Patterson	2019-08-27	1	-0/+1
	`rb_ast_t` holds a reference to this object, so it should mark the object. Currently it is relying on the `mark_ary` on `node_buffer` to ensure that the object stays alive. But since the array internals can move, this could cause a segv if compaction impacts the array.
*	Let memory sizes of the various IMEMO object types be reflected correctly	Lourens Naudé	2019-07-23	1	-2/+21
	[Feature #15805] Closes: https://github.com/ruby/ruby/pull/2140
*	Fix grammar of macro name: ECCESSED -> ECCESSIVE	Martin Dürst	2019-06-05	1	-1/+1
	Fix the name of the macro variable introduced in 0872ea5330 from NODE_SPECIAL_EXCESSED_COMMA to NODE_SPECIAL_EXCESSIVE_COMMA.
*	* expand tabs.	git	2019-06-04	1	-6/+6
*	node.h: Avoid a magic number to represent excessed comma	Yusuke Endoh	2019-06-04	1	-1/+8
	`(ID)1` was assigned to NODE_ARGS#rest_arg for `{|x,| }`. This change removes the magic number by introducing an explicit macro variable for it: NODE_SPECIAL_EXCESSED_COMMA.
*	* expand tabs.	git	2019-06-04	1	-1/+1
*	node.c: Show the ID of internal variable	Yusuke Endoh	2019-06-04	1	-1/+1
*	Fix description of NODE_IN	Kazuki Tsujimoto	2019-04-27	1	-1/+1
*	Avoid usage of the dummy empty BEGIN node	ktsj	2019-04-20	1	-1/+6
	Use NODE_SPECIAL_NO_NAME_REST instead.

	git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67629 b2dd03c8-39d4-4d8f-98ff-823fe69b080e