| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
warning: src/cmd/5g/reg.c:461 format mismatch d VLONG, arg 5
warning: src/cmd/6g/reg.c:396 format mismatch d VLONG, arg 5
warning: src/cmd/9g/reg.c:440 format mismatch d VLONG, arg 5
LGTM=minux
R=rsc, minux
CC=golang-codereviews
https://codereview.appspot.com/179300043
|
|\
| |
| |
| |
| |
| | |
TBR=austin
CC=golang-codereviews
https://codereview.appspot.com/179040044
|
| |\
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This will be the last dev.power64 merge; we'll finish on dev.cc.
TBR=austin
CC=golang-codereviews
https://codereview.appspot.com/175420043
|
| | |\
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This is to reduce the delta between dev.cc and dev.garbage to just garbage collector changes.
These are the files that had merge conflicts and have been edited by hand:
malloc.go
mem_linux.go
mgc.go
os1_linux.go
proc1.go
panic1.go
runtime1.go
LGTM=austin
R=austin
CC=golang-codereviews
https://codereview.appspot.com/174180043
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Theses were very helpful in understanding the regions and
register selection when porting regopt to 9g. Add them to the
other compilers (and improve 9g's successor debug print).
LGTM=rsc
R=rsc
CC=golang-codereviews
https://codereview.appspot.com/174130043
|
| |/ /
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
I added several comments to the regopt-related structures when
porting it to 9g. Synchronize those comments back in to the
other compilers.
LGTM=rsc
R=rsc
CC=golang-codereviews
https://codereview.appspot.com/175720043
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Replace a bit-wise AND with a logical one. This happened to
work before because bany returns 0 or 1, but the intent here
is clearly logical (and this makes 5g match with 6g and 8g).
LGTM=rsc
R=rsc
CC=golang-codereviews
https://codereview.appspot.com/172850043
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
For OITAB nodes, 5g's naddr was setting the wrong etype and
failing to set the width of the resulting Addr.
LGTM=rsc
R=rsc
CC=golang-codereviews
https://codereview.appspot.com/171220043
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The etype of references to strings was being incorrectly set
to TINT32 on all platforms. Change it to TSTRING. It seems
this doesn't matter for compilation, since x86 uses LEA
instructions to load string addresses and arm and power64
disassemble the string into its constituent pieces (with the
correct types), but it helps when debugging.
LGTM=rsc
R=rsc
CC=golang-codereviews
https://codereview.appspot.com/170100043
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
So far all of our architectures have had at most 32 registers,
so we've been able to use entry 0 in the Bits uint32 array
directly as a register mask. Power64 has 64 registers, so
this converts Bits to a uint64 array so we can continue to use
entry 0 directly as a register mask on Power64.
LGTM=rsc
R=rsc
CC=golang-codereviews
https://codereview.appspot.com/169060043
|
| |/
|/|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This patch is based only on reading the code. I have not
tried to construct a test case.
Fixes issue 9077.
LGTM=minux
R=minux
CC=golang-codereviews
https://codereview.appspot.com/172110043
|
|/
|
|
|
|
|
|
|
|
|
|
| |
The test intended to skip direct calls when creating
registerization variables was testing p->to.type instead of
p->to.name, so it always failed, causing regopt to create
unnecessary variables for these names.
LGTM=rsc
R=rsc
CC=golang-codereviews
https://codereview.appspot.com/169110043
|
|
|
|
|
|
|
|
|
|
| |
There's no point in continuing. We will only get confused.
6g already makes this fatal.
LGTM=dave, minux, iant
R=iant, dave, minux
CC=golang-codereviews
https://codereview.appspot.com/140660043
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The argsize PCDATA was specifying the number of
bytes passed to a function call, so that if the function
did not specify its argument count, the garbage collector
could use the call site information to scan those bytes
conservatively. We don't do that anymore, so stop
generating the information.
LGTM=khr
R=khr
CC=golang-codereviews
https://codereview.appspot.com/139530043
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This CL adjusts code referring to src/pkg to refer to src.
Immediately after submitting this CL, I will submit
a change doing 'hg mv src/pkg/* src'.
That change will be too large to review with Rietveld
but will contain only the 'hg mv'.
This CL will break the build.
The followup 'hg mv' will fix it.
For more about the move, see golang.org/s/go14nopkg.
LGTM=r
R=r
CC=golang-codereviews
https://codereview.appspot.com/134570043
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before, a slice with cap=0 or a string with len=0 might have its
base pointer pointing beyond the actual slice/string data into
the next block. The collector had to ignore slices and strings with
cap=0 in order to avoid misinterpreting the base pointer.
Now, a slice with cap=0 or a string with len=0 still has a base
pointer pointing into the actual slice/string data, no matter what.
The collector can now always scan the pointer, which means
strings and slices are no longer special.
Fixes issue 8404.
LGTM=khr, josharian
R=josharian, khr, dvyukov
CC=golang-codereviews
https://codereview.appspot.com/112570044
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Update issue 8525
Some temporary variables that were fully registerized nevertheless had stack space allocated for them because the Addrs were still marked as having associated nodes.
Distribution of stack space reserved for temporary variables while running make.bash (6g):
BEFORE
40.89% 7026 allocauto: 0 to 0
7.83% 1346 allocauto: 0 to 24
7.22% 1241 allocauto: 0 to 8
6.30% 1082 allocauto: 0 to 16
4.96% 853 allocauto: 0 to 56
4.59% 789 allocauto: 0 to 32
2.97% 510 allocauto: 0 to 40
2.32% 399 allocauto: 0 to 48
2.10% 360 allocauto: 0 to 64
1.91% 328 allocauto: 0 to 72
AFTER
48.49% 8332 allocauto: 0 to 0
9.52% 1635 allocauto: 0 to 16
5.28% 908 allocauto: 0 to 48
4.80% 824 allocauto: 0 to 32
4.73% 812 allocauto: 0 to 8
3.38% 581 allocauto: 0 to 24
2.35% 404 allocauto: 0 to 40
2.32% 399 allocauto: 0 to 64
1.65% 284 allocauto: 0 to 56
1.34% 230 allocauto: 0 to 72
LGTM=rsc
R=rsc
CC=dave, dvyukov, golang-codereviews, minux
https://codereview.appspot.com/126160043
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
cmd/6g has been doing this for a long time.
Arrays are still problematic on 5g because the addressing
for t[0] where local var t has type [3]uintptr takes the address of t.
That's issue 8125.
Fixes issue 8123.
LGTM=josharian
R=josharian, dave
CC=golang-codereviews
https://codereview.appspot.com/102890046
|
|
|
|
|
|
|
| |
LGTM=bradfitz, dave
R=rsc, bradfitz, dave
CC=golang-codereviews
https://codereview.appspot.com/115070043
|
|
|
|
|
|
|
|
|
|
|
|
| |
They do not, but pretend that they do.
The immediate need is that it breaks the new GC because
these are weird symbols as if with pointers but not necessary
pointer aligned.
LGTM=rsc
R=golang-codereviews, dave, josharian, khr, rsc
CC=golang-codereviews, iant, khr, rlh
https://codereview.appspot.com/116060043
|
|
|
|
|
|
|
| |
LGTM=dave, rsc
R=rsc, iant, dave
CC=golang-codereviews
https://codereview.appspot.com/108360043
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The main changes fall into a few patterns:
1. Replace #define with enum.
2. Add /*c2go */ comment giving effect of #define.
This is necessary for function-like #defines and
non-enum-able #defined constants.
(Not all compilers handle negative or large enums.)
3. Add extra braces in struct initializer.
(c2go does not implement the full rules.)
This is enough to let c2go typecheck the source tree.
There may be more changes once it is doing
other semantic analyses.
LGTM=minux, iant
R=minux, dave, iant
CC=golang-codereviews
https://codereview.appspot.com/106860045
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This CL forces the optimizer to preserve some memory stores
that would be redundant except that a stack scan due to garbage
collection or stack copying might look at them during a function call.
As such, it forces additional memory writes and therefore slows
down the execution of some programs, especially garbage-heavy
programs that are already limited by memory bandwidth.
The slowdown can be as much as 7% for end-to-end benchmarks.
These numbers are from running go1.test -test.benchtime=5s three times,
taking the best (lowest) ns/op for each benchmark. I am excluding
benchmarks with time/op < 10us to focus on macro effects.
All benchmarks are on amd64.
Comparing tip (a27f34c771cb) against this CL on an Intel Core i5 MacBook Pro:
benchmark old ns/op new ns/op delta
BenchmarkBinaryTree17 3876500413 3856337341 -0.52%
BenchmarkFannkuch11 2965104777 2991182127 +0.88%
BenchmarkGobDecode 8563026 8788340 +2.63%
BenchmarkGobEncode 5050608 5267394 +4.29%
BenchmarkGzip 431191816 434168065 +0.69%
BenchmarkGunzip 107873523 110563792 +2.49%
BenchmarkHTTPClientServer 85036 86131 +1.29%
BenchmarkJSONEncode 22143764 22501647 +1.62%
BenchmarkJSONDecode 79646916 85658808 +7.55%
BenchmarkMandelbrot200 4720421 4700108 -0.43%
BenchmarkGoParse 4651575 4712247 +1.30%
BenchmarkRegexpMatchMedium_1K 71986 73490 +2.09%
BenchmarkRegexpMatchHard_1K 111018 117495 +5.83%
BenchmarkRevcomp 648798723 659352759 +1.63%
BenchmarkTemplate 112673009 112819078 +0.13%
Comparing tip (a27f34c771cb) against this CL on an Intel Xeon E5520:
BenchmarkBinaryTree17 5461110720 5393104469 -1.25%
BenchmarkFannkuch11 4314677151 4327177615 +0.29%
BenchmarkGobDecode 11065853 11235272 +1.53%
BenchmarkGobEncode 6500065 6959837 +7.07%
BenchmarkGzip 647478596 671769097 +3.75%
BenchmarkGunzip 139348579 141096376 +1.25%
BenchmarkHTTPClientServer 69376 73610 +6.10%
BenchmarkJSONEncode 30172320 31796106 +5.38%
BenchmarkJSONDecode 113704905 114239137 +0.47%
BenchmarkMandelbrot200 6032730 6003077 -0.49%
BenchmarkGoParse 6775251 6405995 -5.45%
BenchmarkRegexpMatchMedium_1K 111832 113895 +1.84%
BenchmarkRegexpMatchHard_1K 161112 168420 +4.54%
BenchmarkRevcomp 876363406 892319935 +1.82%
BenchmarkTemplate 146273096 148998339 +1.86%
Just to get a sense of where we are compared to the previous release,
here are the same benchmarks comparing Go 1.2 to this CL.
Comparing Go 1.2 against this CL on an Intel Core i5 MacBook Pro:
BenchmarkBinaryTree17 4370077662 3856337341 -11.76%
BenchmarkFannkuch11 3347052657 2991182127 -10.63%
BenchmarkGobDecode 8791384 8788340 -0.03%
BenchmarkGobEncode 4968759 5267394 +6.01%
BenchmarkGzip 437815669 434168065 -0.83%
BenchmarkGunzip 94604099 110563792 +16.87%
BenchmarkHTTPClientServer 87798 86131 -1.90%
BenchmarkJSONEncode 22818243 22501647 -1.39%
BenchmarkJSONDecode 97182444 85658808 -11.86%
BenchmarkMandelbrot200 4733516 4700108 -0.71%
BenchmarkGoParse 5054384 4712247 -6.77%
BenchmarkRegexpMatchMedium_1K 67612 73490 +8.69%
BenchmarkRegexpMatchHard_1K 107321 117495 +9.48%
BenchmarkRevcomp 733270055 659352759 -10.08%
BenchmarkTemplate 109304977 112819078 +3.21%
Comparing Go 1.2 against this CL on an Intel Xeon E5520:
BenchmarkBinaryTree17 5986953594 5393104469 -9.92%
BenchmarkFannkuch11 4861139174 4327177615 -10.98%
BenchmarkGobDecode 11830997 11235272 -5.04%
BenchmarkGobEncode 6608722 6959837 +5.31%
BenchmarkGzip 661875826 671769097 +1.49%
BenchmarkGunzip 138630019 141096376 +1.78%
BenchmarkHTTPClientServer 71534 73610 +2.90%
BenchmarkJSONEncode 30393609 31796106 +4.61%
BenchmarkJSONDecode 139645860 114239137 -18.19%
BenchmarkMandelbrot200 5988660 6003077 +0.24%
BenchmarkGoParse 6974092 6405995 -8.15%
BenchmarkRegexpMatchMedium_1K 111331 113895 +2.30%
BenchmarkRegexpMatchHard_1K 165961 168420 +1.48%
BenchmarkRevcomp 995049292 892319935 -10.32%
BenchmarkTemplate 145623363 148998339 +2.32%
Fixes issue 8036.
LGTM=khr
R=golang-codereviews, josharian, khr
CC=golang-codereviews, iant, r
https://codereview.appspot.com/99660044
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[Same as CL 102820043 except applied changes to 6g/gsubr.c
also to 5g/gsubr.c and 8g/gsubr.c. The problem I had last night
trying to do that was that 8g's copy of nodarg has different
(but equivalent) control flow and I was pasting the new code
into the wrong place.]
Description from CL 102820043:
The 'nodarg' function is used to obtain a Node*
representing a function argument or result.
It returned a brand new Node*, but that violates
the guarantee in most places in the compiler that
two Node*s refer to the same variable if and only if
they are the same Node* pointer. Reestablish that
invariant by making nodarg return a preexisting
named variable if present.
Having fixed that, avoid any copy during x=x in
componentgen, because the VARDEF we emit
before the copy marks the lhs x as dead incorrectly.
The change in walk.c avoids modifying the result
of nodarg. This was the only place in the compiler
that did so.
Fixes issue 8097.
LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews, iant, khr, r
https://codereview.appspot.com/103750043
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Breaks 386 and arm builds.
The obvious reason is that this CL only edited 6g/gsubr.c
and failed to edit 5g/gsubr.c and 8g/gsubr.c.
However, the obvious CL applying the same edit to those
files (CL 101900043) causes mysterious build failures
in various of the standard package tests, usually involving
reflect. Something deep and subtle is broken but only on
the 32-bit systems.
Undo this CL for now.
??? original CL description
cmd/gc: fix x=x crash
The 'nodarg' function is used to obtain a Node*
representing a function argument or result.
It returned a brand new Node*, but that violates
the guarantee in most places in the compiler that
two Node*s refer to the same variable if and only if
they are the same Node* pointer. Reestablish that
invariant by making nodarg return a preexisting
named variable if present.
Having fixed that, avoid any copy during x=x in
componentgen, because the VARDEF we emit
before the copy marks the lhs x as dead incorrectly.
The change in walk.c avoids modifying the result
of nodarg. This was the only place in the compiler
that did so.
Fixes issue 8097.
LGTM=r, khr
R=golang-codereviews, r, khr
CC=golang-codereviews, iant
https://codereview.appspot.com/102820043
???
TBR=r
CC=golang-codereviews, khr
https://codereview.appspot.com/95660043
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The 'nodarg' function is used to obtain a Node*
representing a function argument or result.
It returned a brand new Node*, but that violates
the guarantee in most places in the compiler that
two Node*s refer to the same variable if and only if
they are the same Node* pointer. Reestablish that
invariant by making nodarg return a preexisting
named variable if present.
Having fixed that, avoid any copy during x=x in
componentgen, because the VARDEF we emit
before the copy marks the lhs x as dead incorrectly.
The change in walk.c avoids modifying the result
of nodarg. This was the only place in the compiler
that did so.
Fixes issue 8097.
LGTM=r, khr
R=golang-codereviews, r, khr
CC=golang-codereviews, iant
https://codereview.appspot.com/102820043
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Globals, function arguments, and results are special cases in
registerization.
Globals must be flushed aggressively, because nearly any
operation can cause a panic, and the recovery code must see
the latest values. Globals also must be loaded aggressively,
because nearly any store through a pointer might be updating a
global: the compiler cannot see all the "address of"
operations on globals, especially exported globals. To
accomplish this, mark all globals as having their address
taken, which effectively disables registerization.
If a function contains a defer statement, the function results
must be flushed aggressively, because nearly any operation can
cause a panic, and the deferred code may call recover, causing
the original function to return the current values of its
function results. To accomplish this, mark all function
results as having their address taken if the function contains
any defer statements. This causes not just aggressive flushing
but also aggressive loading. The aggressive loading is
overkill but the best we can do in the current code.
Function arguments must be considered live at all safe points
in a function, because garbage collection always preserves
them: they must be up-to-date in order to be preserved
correctly. Accomplish this by marking them live at all call
sites. An earlier attempt at this marked function arguments as
having their address taken, which disabled registerization
completely, making programs slower. This CL's solution allows
registerization while preserving safety. The benchmark speedup
is caused by being able to registerize again (the earlier CL
lost the same amount).
benchmark old ns/op new ns/op delta
BenchmarkEqualPort32 61.4 56.0 -8.79%
benchmark old MB/s new MB/s speedup
BenchmarkEqualPort32 521.56 570.97 1.09x
Fixes issue 1304. (again)
Fixes issue 7944. (again)
Fixes issue 7984.
Fixes issue 7995.
LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews, iant, r
https://codereview.appspot.com/97500044
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The inputs to a function are marked live at all times in the
liveness bitmaps, so that the garbage collector will not free
the things they point at and reuse the pointers, so that the
pointers shown in stack traces are guaranteed not to have
been recycled.
Unfortunately, no one told the register optimizer that the
inputs need to be preserved at all call sites. If a function
is done with a particular input value, the optimizer will stop
preserving it across calls. For single-word values this just
means that the value recorded might be stale. For multi-word
values like slices, the value recorded could be only partially stale:
it can happen that, say, the cap was updated but not the len,
or that the len was updated but not the base pointer.
Either of these possibilities (and others) would make the
garbage collector misinterpret memory, leading to memory
corruption.
This came up in a real program, in which the garbage collector's
'slice len ? slice cap' check caught the inconsistency.
Fixes issue 7944.
LGTM=iant
R=golang-codereviews, iant
CC=golang-codereviews, khr
https://codereview.appspot.com/100370045
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is joint work with Daniel Morsing.
In order for the register allocator to alias two variables, they must have the same width, stack offset, and etype. Code generation was altering a variable's etype in a few places. This prevented the variable from being moved to a register, which in turn prevented peephole optimization. This failure to alias was very common, with almost 23,000 instances just running make.bash.
This phenomenon was not visible in the register allocation debug output because the variables that failed to alias had the same name. The debugging-only change to bits.c fixes this by printing the variable number with its name.
This CL fixes the source of all etype mismatches for 6g, all but one case for 8g, and depressingly few cases for 5g. (I believe that extending CL 6819083 to 5g is a prerequisite.) Fixing the remaining cases in 8g and 5g is work for the future.
The etype mismatch fixes are:
* [gc] Slicing changed the type of the base pointer into a uintptr in order to perform arithmetic on it. Instead, support addition directly on pointers.
* [*g] OSPTR was giving type uintptr to slice base pointers; undo that. This arose, for example, while compiling copy(dst, src).
* [8g] 64 bit float conversion was assigning int64 type during codegen, overwriting the existing uint64 type.
Note that some etype mismatches are appropriate, such as a struct with a single field or an array with a single element.
With these fixes, the number of registerizations that occur while running make.bash for 6g increases ~10%. Hello world binary size shrinks ~1.5%. Running all benchmarks in the standard library show performance improvements ranging from nominal to substantive (>10%); a full comparison using 6g on my laptop is available at https://gist.github.com/josharian/8f9b5beb46667c272064. The microbenchmarks must be taken with a grain of salt; see issue 7920. The few benchmarks that show real regressions are likely due to issue 7920. I manually examined the generated code for the top few regressions and none had any assembly output changes. The few benchmarks that show extraordinary improvements are likely also due to issue 7920.
Performance results from 8g appear similar to 6g.
5g shows no performance improvements. This is not surprising, given the discussion above.
Update issue 7316
LGTM=rsc
R=rsc, daniel.morsing, bradfitz
CC=dave, golang-codereviews
https://codereview.appspot.com/91850043
Committer: Russ Cox <rsc@golang.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The register allocator decides which variables should be placed into registers by charging for each load/store and crediting for each use, and then selecting an allocation with minimal cost. NOPs will be eliminated, however, so using a variable in a NOP should not generate credit.
Issue 7867 arises from attempted registerization of multi-word variables because they are used in NOPs. By not crediting for that use, they will no longer be considered for registerization.
This fix could theoretically lead to better register allocation, but NOPs are rare relative to other instructions.
Fixes issue 7867.
LGTM=rsc
R=rsc
CC=golang-codereviews
https://codereview.appspot.com/94810044
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
benchmark old ns/op new ns/op delta
BenchmarkCopyFat512 1307 329 -74.83%
BenchmarkCopyFat256 666 169 -74.62%
BenchmarkCopyFat1024 2617 671 -74.36%
BenchmarkCopyFat128 343 89.0 -74.05%
BenchmarkCopyFat64 182 48.9 -73.13%
BenchmarkCopyFat32 103 28.8 -72.04%
BenchmarkClearFat128 102 46.6 -54.31%
BenchmarkClearFat512 344 167 -51.45%
BenchmarkClearFat64 50.5 26.5 -47.52%
BenchmarkClearFat256 147 87.2 -40.68%
BenchmarkClearFat32 22.7 16.4 -27.75%
BenchmarkClearFat1024 511 662 +29.55%
Fixes issue 7624
LGTM=rsc
R=golang-codereviews, khr, bradfitz, josharian, dave, rsc
CC=golang-codereviews
https://codereview.appspot.com/92760044
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In large functions with many variables, the register optimizer
may give up and choose not to track certain variables at all.
In this case, the "nextinnode" information linking together
all the words from a given variable will be incomplete, and
the result may be that only some of a multiword value is
preserved across a call. That confuses the garbage collector,
so don't do that. Instead, mark those variables as having
their address taken, so that they will be preserved at all
calls. It's overkill, but correct.
Tested by hand using the 6g -S output to see that it does fix
the buggy generated code leading to the issue 7726 failure.
There is no automated test because I managed to break the
compiler while writing a test (see issue 7727). I will check
in a test along with the fix to issue 7727.
Fixes issue 7726.
LGTM=khr
R=khr, bradfitz, dave
CC=golang-codereviews
https://codereview.appspot.com/85200043
|
|
|
|
|
|
|
|
| |
Botched during CL 83090046.
TBR=khr
CC=golang-codereviews
https://codereview.appspot.com/83070046
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. In functions with heap-allocated result variables or with
defer statements, the return sequence requires more than
just a single RET instruction. There is an optimization that
arranges for all returns to jump to a single copy of the return
epilogue in this case. Unfortunately, that optimization is
fundamentally incompatible with PC-based liveness information:
it takes PCs at many different points in the function and makes
them all land at one PC, making the combined liveness information
at that target PC a mess. Disable this optimization, so that each
return site gets its own copy of the 'call deferreturn' and the
copying of result variables back from the heap.
This removes quite a few spurious 'ambiguously live' variables.
2. Let orderexpr allocate temporaries that are passed by address
to a function call and then die on return, so that we can arrange
an appropriate VARKILL.
2a. Do this for ... slices.
2b. Do this for closure structs.
2c. Do this for runtime.concatstring, which is the implementation
of large string additions. Change representation of OADDSTR to
an explicit list in typecheck to avoid reconstructing list in both
walk and order.
3. Let orderexpr allocate the temporary variable copies used for
range loops, so that they can be killed when the loop is over.
Similarly, let it allocate the temporary holding the map iterator.
CL 81940043 reduced the number of ambiguously live temps
in the godoc binary from 860 to 711.
This CL reduces the number to 121. Still more to do, but another
good checkpoint.
Update issue 7345
LGTM=khr
R=khr
CC=golang-codereviews
https://codereview.appspot.com/83090046
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The new channel and map runtime routines take pointers
to values, typically temporaries. Without help, the compiler
cannot tell when those temporaries stop being needed,
because it isn't sure what happened to the pointer.
Arrange to insert explicit VARKILL instructions for these
temporaries so that the liveness analysis can avoid seeing
them as "ambiguously live".
The change is made in order.c, which was already in charge of
introducing temporaries to preserve the order-of-evaluation
guarantees. Now its job has expanded to include introducing
temporaries as needed by runtime routines, and then also
inserting the VARKILL annotations for all these temporaries,
so that their lifetimes can be shortened.
In order to do its job for the map runtime routines, order.c arranges
that all map lookups or map assignments have the form:
x = m[k]
x, y = m[k]
m[k] = x
where x, y, and k are simple variables (often temporaries).
Likewise, receiving from a channel is now always:
x = <-c
In order to provide the map guarantee, order.c is responsible for
rewriting x op= y into x = x op y, so that m[k] += z becomes
t = m[k]
t2 = t + z
m[k] = t2
While here, fix a few bugs in order.c's traversal: it was failing to
walk into select and switch case bodies, so order of evaluation
guarantees were not preserved in those situations.
Added tests to test/reorder2.go.
Fixes issue 7671.
In gc/popt's temporary-merging optimization, allow merging
of temporaries with their address taken as long as the liveness
ranges do not intersect. (There is a good chance of that now
that we have VARKILL annotations to limit the liveness range.)
Explicitly killing temporaries cuts the number of ambiguously
live temporaries that must be zeroed in the godoc binary from
860 to 711, or -17%. There is more work to be done, but this
is a good checkpoint.
Update issue 7345
LGTM=khr
R=khr
CC=golang-codereviews
https://codereview.appspot.com/81940043
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. On entry to a function, only zero the ambiguously live stack variables.
Before, we were zeroing all stack variables containing pointers.
The zeroing is pretty inefficient right now (issue 7624), but there are also
too many stack variables detected as ambiguously live (issue 7345),
and that must be addressed before deciding how to improve the zeroing code.
(Changes in 5g/ggen.c, 6g/ggen.c, 8g/ggen.c, gc/pgen.c)
Fixes issue 7647.
2. Make the regopt word-based liveness analysis preserve the
whole-variable liveness property expected by the garbage collection
bitmap liveness analysis. That is, if the regopt liveness decides that
one word in a struct needs to be preserved, make sure it preserves
the entire struct. This is particularly important for multiword values
such as strings, slices, and interfaces, in which all the words need
to be present in order to understand the meaning.
(Changes in 5g/reg.c, 6g/reg.c, 8g/reg.c.)
Fixes issue 7591.
3. Make the regopt word-based liveness analysis treat a variable
as having its address taken - which makes it preserved across
all future calls - whenever n->addrtaken is set, for consistency
with the gc bitmap liveness analysis, even if there is no machine
instruction actually taking the address. In this case n->addrtaken
is incorrect (a nicer way to put it is overconservative), and ideally
there would be no such cases, but they can happen and the two
analyses need to agree.
(Changes in 5g/reg.c, 6g/reg.c, 8g/reg.c; test in bug484.go.)
Fixes crashes found by turning off "zero everything" in step 1.
4. Remove spurious VARDEF annotations. As the comment in
gc/pgen.c explains, the VARDEF must immediately precede
the initialization. It cannot be too early, and it cannot be too late.
In particular, if a function call sits between the VARDEF and the
actual machine instructions doing the initialization, the variable
will be treated as live during that function call even though it is
uninitialized, leading to problems.
(Changes in gc/gen.c; test in live.go.)
Fixes crashes found by turning off "zero everything" in step 1.
5. Do not treat loading the address of a wide value as a signal
that the value must be initialized. Instead depend on the existence
of a VARDEF or the first actual read/write of a word in the value.
If the load is in order to pass the address to a function that does
the actual initialization, treating the load as an implicit VARDEF
causes the same problems as described in step 4.
The alternative is to arrange to zero every such value before
passing it to the real initialization function, but this is a much
easier and more efficient change.
(Changes in gc/plive.c.)
Fixes crashes found by turning off "zero everything" in step 1.
6. Treat wide input parameters with their address taken as
initialized on entry to the function. Otherwise they look
"ambiguously live" and we will try to emit code to zero them.
(Changes in gc/plive.c.)
Fixes crashes found by turning off "zero everything" in step 1.
7. An array of length 0 has no pointers, even if the element type does.
Without this change, the zeroing code complains when asked to
clear a 0-length array.
(Changes in gc/reflect.c.)
LGTM=khr
R=khr
CC=golang-codereviews
https://codereview.appspot.com/80160044
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Replaces CL 70000043.
Introduce linkarchinit() from cmd/ld.
For cmd/6g, switch to the amd64p32 linker model if we are building under nacl/amd64p32.
LGTM=rsc
R=rsc
CC=golang-codereviews
https://codereview.appspot.com/71330045
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For non-closure functions, the context register is uninitialized
on entry and will not be used, but morestack saves it and then the
garbage collector treats it as live. This can be a source of memory
leaks if the context register points at otherwise dead memory.
Avoid this by introducing a parallel set of morestack functions
that clear the context register, and use those for the non-closure functions.
I hope this will help with some of the finalizer flakiness, but it probably won't.
Fixes issue 7244.
LGTM=dvyukov
R=khr, dvyukov
CC=golang-codereviews
https://codereview.appspot.com/71030044
|
|
|
|
|
|
|
|
|
|
|
| |
maxstksize is superfluous and appears to be vestigial. 6g does not use it.
c >= 4 cannot occur; c = w % 4.
LGTM=rsc
R=rsc
CC=golang-codereviews
https://codereview.appspot.com/68750043
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
See golang.org/s/go13nacl for design overview.
This CL is the mostly mechanical changes from rsc's Go 1.2 based NaCl branch, specifically 39cb35750369 to 500771b477cf from https://code.google.com/r/rsc-go13nacl. This CL does not include working NaCl support, there are probably two or three more large merges to come.
CL 15750044 is not included as it involves more invasive changes to the linker which will need to be merged separately.
The exact change lists included are
15050047: syscall: support for Native Client
15360044: syscall: unzip implementation for Native Client
15370044: syscall: Native Client SRPC implementation
15400047: cmd/dist, cmd/go, go/build, test: support for Native Client
15410048: runtime: support for Native Client
15410049: syscall: file descriptor table for Native Client
15410050: syscall: in-memory file system for Native Client
15440048: all: update +build lines for Native Client port
15540045: cmd/6g, cmd/8g, cmd/gc: support for Native Client
15570045: os: support for Native Client
15680044: crypto/..., hash/crc32, reflect, sync/atomic: support for amd64p32
15690044: net: support for Native Client
15690048: runtime: support for fake time like on Go Playground
15690051: build: disable various tests on Native Client
LGTM=rsc
R=rsc
CC=golang-codereviews
https://codereview.appspot.com/68150047
Committer: Russ Cox <rsc@golang.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The code here is being restored after its deletion in CL 14430048.
I restored the copy in cmd/6g in CL 56430043 but neglected the
other two.
This is the reason that enabling precisestack only worked on amd64.
LGTM=r
R=r
CC=golang-codereviews
https://codereview.appspot.com/66170043
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The VARDEF placement must be before the initialization
but after any final use. If you have something like s = ... using s ...
the rhs must be evaluated, then the VARDEF, then the lhs
assigned.
There is a large comment in pgen.c on gvardef explaining
this in more detail.
This CL also includes Ian's suggestions from earlier CLs,
namely commenting the use of mode in link.h and fixing
the precedence of the ~r check in dcl.c.
This CL enables the check that if liveness analysis decides
a variable is live on entry to the function, that variable must
be a function parameter (not a result, and not a local variable).
If this check fails, it indicates a bug in the liveness analysis or
in the generated code being analyzed.
The race detector generates invalid code for append(x, y...).
The code declares a temporary t and then uses cap(t) before
initializing t. The new liveness check catches this bug and
stops the compiler from writing out the buggy code.
Consequently, this CL disables the race detector tests in
run.bash until the race detector bug can be fixed
(golang.org/issue/7334).
Except for the race detector bug, the liveness analysis check
does not detect any problems (this CL and the previous CLs
fixed all the detected problems).
The net test still fails with GOGC=0 but the rest of the tests
now pass or time out (because GOGC=0 is so slow).
TBR=iant
CC=golang-codereviews
https://codereview.appspot.com/64170043
|
|
|
|
|
|
|
|
|
|
|
|
| |
Any initialization of a variable by a block copy or block zeroing
or by multiple assignments (componentwise copying or zeroing
of a multiword variable) needs to emit a VARDEF. These cases were not.
Fixes issue 7205.
TBR=iant
CC=golang-codereviews
https://codereview.appspot.com/63650044
|
|
|
|
|
|
|
|
|
|
|
|
| |
The test added in CL 63630043 fails on 5g and 8g because they
were not emitting the VARDEF instruction when clearing a fat
value by clearing the components. 6g had the call in the right place.
Hooray tests.
TBR=iant
CC=golang-codereviews
https://codereview.appspot.com/63660043
|
|
|
|
|
|
|
|
|
|
|
| |
The "fat" referred to being used for multiword values only.
We're going to use it for non-fat values sometimes too.
No change other than the renaming.
TBR=iant
CC=golang-codereviews
https://codereview.appspot.com/63650043
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
copyau1 was assuming that it could deduce the type of the
middle register p->reg from the type of the left or right
argument: in CMPF F1, F2, the p->reg==2 must be a D_FREG
because p->from is F1, and in CMP R1, R2, the p->reg==2 must
be a D_REG because p->from is R1.
This heuristic fails for CMP $0, R2, which was causing copyau1
not to recognize p->reg==2 as a reference to R2, which was
keeping it from properly renaming the register use when
substituting registers.
cmd/5c has the right approach: look at the opcode p->as to
decide the kind of register. It is unclear where 5g's copyau1
came from; perhaps it was an attempt to avoid expanding 5c's
a2type to include new instructions used only by 5g.
Copy a2type from cmd/5c, expand to include additional instructions,
and make it crash the compiler if asked about an instruction
it does not understand (avoid silent bugs in the future if new
instructions are added).
Should fix current arm build breakage.
While we're here, fix the print statements dumping the pred and
succ info in the asm listing to pass an int arg to %.4ud
(Prog.pc is a vlong now, due to the liblink merge).
TBR=ken2
CC=golang-codereviews
https://codereview.appspot.com/62730043
|
|
|
|
|
|
|
|
|
| |
Fixes issue 7294.
LGTM=minux.ma, dave, bradfitz
R=golang-codereviews, minux.ma, dave, bradfitz
CC=golang-codereviews
https://codereview.appspot.com/61370043
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We now use the %A, %D, %P, and %R routines from liblink
across the board.
Fixes issue 7178.
Fixes issue 7055.
LGTM=iant
R=golang-codereviews, gobot, rsc, dave, iant, remyoudompheng
CC=golang-codereviews
https://codereview.appspot.com/49170043
Committer: Russ Cox <rsc@golang.org>
|
|
|
|
|
|
|
|
|
|
|
| |
The UNDEF instruction was listed in the instruction data as having the next instruction in the stream as its successor. This confused the optimizer into adding a load where it wasn't needed, in turn confusing the liveness analysis pass for GC bitmaps into thinking that the variable was live.
Fixes issue 7229.
LGTM=iant, rsc
R=golang-codereviews, bradfitz, iant, dave, rsc
CC=golang-codereviews
https://codereview.appspot.com/56910045
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Eventually we will want to bypass DATA for everything,
but the relocations are not standardized well enough across
architectures to make that possible.
This did not help as much as I expected, but it is definitely better.
It shaves maybe 1-2% off all.bash depending on how much you
trust the timings of a single run:
Before: 241.139r 362.702u 112.967s
After: 234.339r 359.623u 111.045s
R=golang-codereviews, gobot, r, iant
CC=golang-codereviews
https://codereview.appspot.com/44650043
|