| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
This patch changes the `Process` struct to only provide the functions
expected to be visible by the interface. So, now we only export the
open, reset, and size query functions. This prevents users of the
interface from messing with the internals of the process, so now the
only existing failure mode is mismatched send and recieve calls.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D150703
|
|
|
|
|
|
|
|
|
|
| |
This function is used to add unit test and hermetic test framework libraries.
It avoids the duplicated code to add compile options to each every test
framework libraries.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D150727
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 65429b9af6a2c99d340ab2dcddd41dab201f399c.
Broke several projects, see https://reviews.llvm.org/D144509#4347562 onwards.
Also reverts follow-up commit "[OpenMP] Compile assembly files as ASM, not C"
This reverts commit 4072c8aee4c89c4457f4f30d01dc9bb4dfa52559.
Also reverts fix attempt "[cmake] Set CMP0091 to fix Windows builds after the cmake_minimum_required bump"
This reverts commit 7d47dac5f828efd1d378ba44a97559114f00fb64.
|
|
|
|
|
|
|
|
|
|
|
|
| |
LIBC_INLINE was doubly defined in two headers. Define it only in
one place. Also update a few uses to make sure it's always placed
where a function attribute is valid and is used consistently on
every declaration of the same function in case the attributes used
in its definition must match on declarations and definitions.
Reviewed By: abrachet
Differential Revision: https://reviews.llvm.org/D150731
|
|
|
|
|
|
| |
Few hermetic tests are failing as they are running out of memory.
Differential Revision: https://reviews.llvm.org/D150724
|
|
|
|
|
|
|
|
|
|
|
| |
There are not tests currently which use the main test framework but not
the `main` function from LibcTestMain.cpp. So, this change essentially
simplifies by merging the *TestMain libraries with the main test
libraries.
Reviewed By: michaelrj, jhuber6
Differential Revision: https://reviews.llvm.org/D150698
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently we use a template parameter called `InvertInbox` to invert the
inbox when we load it. This is more easily understood as a static check
on whether or not the process running it is the server. Inverting the
inbox makes the states 1 0 and 0 1 own the buffer, so it's easier to
simply say that the server own the buffer if in != out. Also clean up some of
the comments.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D150365
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds two versions of `bcmp` optimized for architectures where unaligned accesses are either illegal or extremely slow.
It is currently enabled for RISCV 64 and RISCV 32 but it could be used for ARM 32 architectures as well.
Here is the before / after output of `libc.benchmarks.memory_functions.opt_host --benchmark_filter=BM_memcmp` on a quad core Linux starfive RISCV 64 board running at 1.5GHz.
Before
```
Run on (4 X 1500 MHz CPU s)
CPU Caches:
L1 Instruction 32 KiB (x4)
L1 Data 32 KiB (x4)
L2 Unified 2048 KiB (x1)
----------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
----------------------------------------------------------------------
BM_Memcmp/0/0 110 ns 66.4 ns 10404864 bytes_per_cycle=0.107646/s bytes_per_second=153.989M/s items_per_second=15.071M/s __llvm_libc::memcmp,memcmp Google A
BM_Memcmp/1/0 318 ns 211 ns 3026944 bytes_per_cycle=0.131539/s bytes_per_second=188.167M/s items_per_second=4.73691M/s __llvm_libc::memcmp,memcmp Google B
BM_Memcmp/2/0 204 ns 115 ns 6118400 bytes_per_cycle=0.121675/s bytes_per_second=174.058M/s items_per_second=8.70241M/s __llvm_libc::memcmp,memcmp Google D
BM_Memcmp/3/0 143 ns 99.6 ns 7013376 bytes_per_cycle=0.117974/s bytes_per_second=168.763M/s items_per_second=10.0437M/s __llvm_libc::memcmp,memcmp Google L
BM_Memcmp/4/0 81.3 ns 58.2 ns 11426816 bytes_per_cycle=0.101125/s bytes_per_second=144.661M/s items_per_second=17.1805M/s __llvm_libc::memcmp,memcmp Google M
BM_Memcmp/5/0 177 ns 118 ns 5952512 bytes_per_cycle=0.120612/s bytes_per_second=172.537M/s items_per_second=8.45549M/s __llvm_libc::memcmp,memcmp Google Q
BM_Memcmp/6/0 342 ns 220 ns 3483648 bytes_per_cycle=0.132004/s bytes_per_second=188.834M/s items_per_second=4.54739M/s __llvm_libc::memcmp,memcmp Google S
BM_Memcmp/7/0 208 ns 130 ns 5681152 bytes_per_cycle=0.12468/s bytes_per_second=178.356M/s items_per_second=7.6674M/s __llvm_libc::memcmp,memcmp Google U
BM_Memcmp/8/0 123 ns 79.1 ns 8387584 bytes_per_cycle=0.110593/s bytes_per_second=158.204M/s items_per_second=12.6439M/s __llvm_libc::memcmp,memcmp Google W
BM_Memcmp/9/0 20707 ns 10643 ns 67584 bytes_per_cycle=0.142401/s bytes_per_second=203.707M/s items_per_second=93.9559k/s __llvm_libc::memcmp,uniform 384 to 4096
```
After
```
BM_Memcmp/0/0 80.4 ns 55.8 ns 12648448 bytes_per_cycle=0.132703/s bytes_per_second=189.834M/s items_per_second=17.9256M/s __llvm_libc::memcmp,memcmp Google A
BM_Memcmp/1/0 140 ns 80.5 ns 8230912 bytes_per_cycle=0.337273/s bytes_per_second=482.474M/s items_per_second=12.4165M/s __llvm_libc::memcmp,memcmp Google B
BM_Memcmp/2/0 101 ns 66.4 ns 10571776 bytes_per_cycle=0.208539/s bytes_per_second=298.317M/s items_per_second=15.0687M/s __llvm_libc::memcmp,memcmp Google D
BM_Memcmp/3/0 118 ns 67.6 ns 10533888 bytes_per_cycle=0.176822/s bytes_per_second=252.946M/s items_per_second=14.7946M/s __llvm_libc::memcmp,memcmp Google L
BM_Memcmp/4/0 106 ns 53.0 ns 12722176 bytes_per_cycle=0.111141/s bytes_per_second=158.988M/s items_per_second=18.8591M/s __llvm_libc::memcmp,memcmp Google M
BM_Memcmp/5/0 141 ns 70.2 ns 10436608 bytes_per_cycle=0.26032/s bytes_per_second=372.39M/s items_per_second=14.2458M/s __llvm_libc::memcmp,memcmp Google Q
BM_Memcmp/6/0 144 ns 79.3 ns 8932352 bytes_per_cycle=0.353168/s bytes_per_second=505.211M/s items_per_second=12.612M/s __llvm_libc::memcmp,memcmp Google S
BM_Memcmp/7/0 123 ns 71.7 ns 9945088 bytes_per_cycle=0.22143/s bytes_per_second=316.758M/s items_per_second=13.9421M/s __llvm_libc::memcmp,memcmp Google U
BM_Memcmp/8/0 97.0 ns 56.2 ns 12509184 bytes_per_cycle=0.160526/s bytes_per_second=229.635M/s items_per_second=17.7784M/s __llvm_libc::memcmp,memcmp Google W
BM_Memcmp/9/0 1840 ns 989 ns 676864 bytes_per_cycle=1.4894/s bytes_per_second=2.08067G/s items_per_second=1010.92k/s __llvm_libc::memcmp,uniform 384 to 4096
```
glibc
```
BM_Memcmp/0/0 72.6 ns 51.7 ns 12963840 bytes_per_cycle=0.141261/s bytes_per_second=202.075M/s items_per_second=19.3246M/s glibc::memcmp,memcmp Google A
BM_Memcmp/1/0 118 ns 75.2 ns 9280512 bytes_per_cycle=0.354054/s bytes_per_second=506.478M/s items_per_second=13.3046M/s glibc::memcmp,memcmp Google B
BM_Memcmp/2/0 114 ns 62.9 ns 11152384 bytes_per_cycle=0.222675/s bytes_per_second=318.539M/s items_per_second=15.8943M/s glibc::memcmp,memcmp Google D
BM_Memcmp/3/0 84.0 ns 63.5 ns 11030528 bytes_per_cycle=0.186353/s bytes_per_second=266.581M/s items_per_second=15.7378M/s glibc::memcmp,memcmp Google L
BM_Memcmp/4/0 93.5 ns 51.2 ns 13462528 bytes_per_cycle=0.119215/s bytes_per_second=170.539M/s items_per_second=19.5384M/s glibc::memcmp,memcmp Google M
BM_Memcmp/5/0 123 ns 61.7 ns 11376640 bytes_per_cycle=0.225262/s bytes_per_second=322.239M/s items_per_second=16.1993M/s glibc::memcmp,memcmp Google Q
BM_Memcmp/6/0 122 ns 71.6 ns 9967616 bytes_per_cycle=0.380844/s bytes_per_second=544.802M/s items_per_second=13.9579M/s glibc::memcmp,memcmp Google S
BM_Memcmp/7/0 118 ns 65.6 ns 10555392 bytes_per_cycle=0.238677/s bytes_per_second=341.43M/s items_per_second=15.2334M/s glibc::memcmp,memcmp Google U
BM_Memcmp/8/0 90.4 ns 54.0 ns 12920832 bytes_per_cycle=0.161987/s bytes_per_second=231.724M/s items_per_second=18.5169M/s glibc::memcmp,memcmp Google W
BM_Memcmp/9/0 1045 ns 601 ns 1195008 bytes_per_cycle=2.53677/s bytes_per_second=3.54383G/s items_per_second=1.66423M/s glibc::memcmp,uniform 384 to 4096
```
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D150663
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[libc] Add optimized bcmp for RISCV
This patch adds two versions of bcmp optimized for architectures where unaligned accesses are either illegal or extremely slow.
It is currently enabled for RISCV 64 and RISCV 32 but it could be used for ARM 32 architectures as well.
Here is the before / after output of libc.benchmarks.memory_functions.opt_host --benchmark_filter=BM_Bcmp on a quad core Linux starfive RISCV 64 board running at 1.5GHz.
Before
```
Run on (4 X 1500 MHz CPU s)
CPU Caches:
L1 Instruction 32 KiB (x4)
L1 Data 32 KiB (x4)
L2 Unified 2048 KiB (x1)
Load Average: 7.03, 5.98, 3.71
----------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
----------------------------------------------------------------------
BM_Bcmp/0/0 102 ns 60.5 ns 11662336 bytes_per_cycle=0.122696/s bytes_per_second=175.518M/s items_per_second=16.5258M/s __llvm_libc::bcmp,memcmp Google A
BM_Bcmp/1/0 328 ns 172 ns 3737600 bytes_per_cycle=0.15256/s bytes_per_second=218.238M/s items_per_second=5.80575M/s __llvm_libc::bcmp,memcmp Google B
BM_Bcmp/2/0 199 ns 99.7 ns 7019520 bytes_per_cycle=0.141897/s bytes_per_second=202.986M/s items_per_second=10.032M/s __llvm_libc::bcmp,memcmp Google D
BM_Bcmp/3/0 173 ns 86.5 ns 8361984 bytes_per_cycle=0.13863/s bytes_per_second=198.312M/s items_per_second=11.5669M/s __llvm_libc::bcmp,memcmp Google L
BM_Bcmp/4/0 105 ns 51.8 ns 13213696 bytes_per_cycle=0.116399/s bytes_per_second=166.51M/s items_per_second=19.2931M/s __llvm_libc::bcmp,memcmp Google M
BM_Bcmp/5/0 167 ns 93.9 ns 7853056 bytes_per_cycle=0.139432/s bytes_per_second=199.459M/s items_per_second=10.6503M/s __llvm_libc::bcmp,memcmp Google Q
BM_Bcmp/6/0 262 ns 165 ns 3931136 bytes_per_cycle=0.151516/s bytes_per_second=216.745M/s items_per_second=6.07091M/s __llvm_libc::bcmp,memcmp Google S
BM_Bcmp/7/0 168 ns 105 ns 6665216 bytes_per_cycle=0.143159/s bytes_per_second=204.791M/s items_per_second=9.52163M/s __llvm_libc::bcmp,memcmp Google U
BM_Bcmp/8/0 108 ns 68.0 ns 10175488 bytes_per_cycle=0.125504/s bytes_per_second=179.535M/s items_per_second=14.701M/s __llvm_libc::bcmp,memcmp Google W
BM_Bcmp/9/0 15371 ns 9007 ns 78848 bytes_per_cycle=0.166128/s bytes_per_second=237.648M/s items_per_second=111.031k/s __llvm_libc::bcmp,uniform 384 to 4096
```
After
```
BM_Bcmp/0/0 74.2 ns 49.7 ns 14306304 bytes_per_cycle=0.148927/s bytes_per_second=213.042M/s items_per_second=20.1101M/s __llvm_libc::bcmp,memcmp Google A
BM_Bcmp/1/0 108 ns 68.1 ns 10350592 bytes_per_cycle=0.411197/s bytes_per_second=588.222M/s items_per_second=14.6849M/s __llvm_libc::bcmp,memcmp Google B
BM_Bcmp/2/0 80.2 ns 56.0 ns 12386304 bytes_per_cycle=0.258588/s bytes_per_second=369.912M/s items_per_second=17.8585M/s __llvm_libc::bcmp,memcmp Google D
BM_Bcmp/3/0 92.4 ns 55.7 ns 12555264 bytes_per_cycle=0.206835/s bytes_per_second=295.88M/s items_per_second=17.943M/s __llvm_libc::bcmp,memcmp Google L
BM_Bcmp/4/0 79.3 ns 46.8 ns 14288896 bytes_per_cycle=0.125872/s bytes_per_second=180.061M/s items_per_second=21.3611M/s __llvm_libc::bcmp,memcmp Google M
BM_Bcmp/5/0 98.0 ns 57.9 ns 12232704 bytes_per_cycle=0.268815/s bytes_per_second=384.543M/s items_per_second=17.2711M/s __llvm_libc::bcmp,memcmp Google Q
BM_Bcmp/6/0 132 ns 65.5 ns 10474496 bytes_per_cycle=0.417246/s bytes_per_second=596.875M/s items_per_second=15.2673M/s __llvm_libc::bcmp,memcmp Google S
BM_Bcmp/7/0 101 ns 60.9 ns 11505664 bytes_per_cycle=0.253733/s bytes_per_second=362.968M/s items_per_second=16.4202M/s __llvm_libc::bcmp,memcmp Google U
BM_Bcmp/8/0 72.5 ns 50.2 ns 14082048 bytes_per_cycle=0.183262/s bytes_per_second=262.158M/s items_per_second=19.9271M/s __llvm_libc::bcmp,memcmp Google W
BM_Bcmp/9/0 852 ns 803 ns 854016 bytes_per_cycle=1.85028/s bytes_per_second=2.58481G/s items_per_second=1.24597M/s __llvm_libc::bcmp,uniform 384 to 4096
```
For comparison with glibc
```
BM_Bcmp/0/0 106 ns 52.6 ns 12906496 bytes_per_cycle=0.142072/s bytes_per_second=203.235M/s items_per_second=19.0271M/s glibc::bcmp,memcmp Google A
BM_Bcmp/1/0 132 ns 77.1 ns 8905728 bytes_per_cycle=0.365072/s bytes_per_second=522.239M/s items_per_second=12.9782M/s glibc::bcmp,memcmp Google B
BM_Bcmp/2/0 122 ns 62.3 ns 10909696 bytes_per_cycle=0.222667/s bytes_per_second=318.527M/s items_per_second=16.0563M/s glibc::bcmp,memcmp Google D
BM_Bcmp/3/0 99.5 ns 64.2 ns 11074560 bytes_per_cycle=0.185126/s bytes_per_second=264.825M/s items_per_second=15.5674M/s glibc::bcmp,memcmp Google L
BM_Bcmp/4/0 86.6 ns 50.2 ns 13488128 bytes_per_cycle=0.117941/s bytes_per_second=168.717M/s items_per_second=19.9053M/s glibc::bcmp,memcmp Google M
BM_Bcmp/5/0 106 ns 61.4 ns 11344896 bytes_per_cycle=0.248968/s bytes_per_second=356.151M/s items_per_second=16.284M/s glibc::bcmp,memcmp Google Q
BM_Bcmp/6/0 145 ns 71.9 ns 10046464 bytes_per_cycle=0.389814/s bytes_per_second=557.633M/s items_per_second=13.9019M/s glibc::bcmp,memcmp Google S
BM_Bcmp/7/0 119 ns 65.6 ns 10718208 bytes_per_cycle=0.243756/s bytes_per_second=348.696M/s items_per_second=15.2329M/s glibc::bcmp,memcmp Google U
BM_Bcmp/8/0 86.4 ns 54.5 ns 13250560 bytes_per_cycle=0.154831/s bytes_per_second=221.488M/s items_per_second=18.3532M/s glibc::bcmp,memcmp Google W
BM_Bcmp/9/0 1090 ns 604 ns 1186816 bytes_per_cycle=2.53848/s bytes_per_second=3.54622G/s items_per_second=1.65598M/s glibc::bcmp,uniform 384 to 4096
```
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D150567
|
|
|
|
|
|
|
|
|
| |
UInt<T>"
This reverts commit b663993067ffb5800632ad41ea7f2f92caab1093.
This caused a regression on aarch64:
https://lab.llvm.org/buildbot#builders/138/builds/43983
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch is similar to 86fe88c8d9 and adds several explicit
constructor calls (bool(...), uint64_t(...), uint8_t(...)) that are
needed when we use UInt<T> (in my case UInt<128> in riscv32).
This patch also adds two operators to UInt<T>:
* operator/= required by printf_core/float_hex_converter.h:148
* operator-- required by FPUtil/ManipulationFunctions.h:166
Reviewed By: sivachandra, lntue
Differential Revision: https://reviews.llvm.org/D149594
|
|
|
|
|
|
|
|
|
| |
We do a lot of arithmetic on void pointers here, so include a helper and
make some more consistent names. Changes no functionality.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D150576
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds another variable to cache cases where we know that we
own the buffer. This allows us to skip the atomic load on the inbox
because we already know its state. This is legal immediately after
opening a port, or when sending immediately after a recieve. This
caching nets a significant (~17%) speedup for the basic open, send,
recieve combination.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D150516
|
|
|
|
|
|
|
|
|
|
| |
We use a simple bump ptr in the `libc` tests. If we run out of data we
can currently return other static memory and have weird failure cases.
We should fail more explicitly here by returning a null pointer instead.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D150529
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds two versions of `memset` optimized for architectures where unaligned accesses are either illegal or extremely slow.
It is currently enabled for RISCV 64 and RISCV 32 but it could be used for ARM 32 architectures as well.
Here is the before / after output of libc.benchmarks.memory_functions.opt_host --benchmark_filter=BM_Memset on a quad core Linux starfive RISCV 64 board running at 1.5GHz.
Before
```
Run on (4 X 1500 MHz CPU s)
CPU Caches:
L1 Instruction 32 KiB (x4)
L1 Data 32 KiB (x4)
L2 Unified 2048 KiB (x1)
------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
------------------------------------------------------------------------
BM_Memset/0/0 506 ns 252 ns 2883584 bytes_per_cycle=0.238312/s bytes_per_second=340.908M/s items_per_second=3.96043M/s __llvm_libc::memset,memset Google A
BM_Memset/1/0 296 ns 189 ns 2900992 bytes_per_cycle=0.234589/s bytes_per_second=335.583M/s items_per_second=5.29382M/s __llvm_libc::memset,memset Google B
BM_Memset/2/0 2110 ns 1049 ns 678912 bytes_per_cycle=0.24687/s bytes_per_second=353.151M/s items_per_second=953.527k/s __llvm_libc::memset,memset Google D
BM_Memset/3/0 397 ns 254 ns 3055616 bytes_per_cycle=0.238479/s bytes_per_second=341.147M/s items_per_second=3.93224M/s __llvm_libc::memset,memset Google L
BM_Memset/4/0 1119 ns 621 ns 1079296 bytes_per_cycle=0.244925/s bytes_per_second=350.368M/s items_per_second=1.61047M/s __llvm_libc::memset,memset Google M
BM_Memset/5/0 605 ns 349 ns 1644544 bytes_per_cycle=0.241364/s bytes_per_second=345.274M/s items_per_second=2.8614M/s __llvm_libc::memset,memset Google Q
BM_Memset/6/0 472 ns 271 ns 2310144 bytes_per_cycle=0.238615/s bytes_per_second=341.341M/s items_per_second=3.68799M/s __llvm_libc::memset,memset Google S
BM_Memset/7/0 262 ns 143 ns 3956736 bytes_per_cycle=0.225812/s bytes_per_second=323.026M/s items_per_second=7.0087M/s __llvm_libc::memset,memset Google U
BM_Memset/8/0 454 ns 261 ns 2940928 bytes_per_cycle=0.238883/s bytes_per_second=341.725M/s items_per_second=3.82716M/s __llvm_libc::memset,memset Google W
BM_Memset/9/0 8768 ns 5998 ns 115712 bytes_per_cycle=0.249196/s bytes_per_second=356.478M/s items_per_second=166.724k/s __llvm_libc::memset,uniform 384 to 4096
```
After
```
BM_Memset/0/0 117 ns 69.5 ns 9761792 bytes_per_cycle=0.935152/s bytes_per_second=1.30639G/s items_per_second=14.3834M/s __llvm_libc::memset,memset Google A
BM_Memset/1/0 97.8 ns 58.5 ns 13002752 bytes_per_cycle=0.892814/s bytes_per_second=1.24725G/s items_per_second=17.0848M/s __llvm_libc::memset,memset Google B
BM_Memset/2/0 326 ns 163 ns 5156864 bytes_per_cycle=1.54408/s bytes_per_second=2.15706G/s items_per_second=6.1192M/s __llvm_libc::memset,memset Google D
BM_Memset/3/0 132 ns 65.4 ns 11455488 bytes_per_cycle=0.876411/s bytes_per_second=1.22433G/s items_per_second=15.2803M/s __llvm_libc::memset,memset Google L
BM_Memset/4/0 222 ns 120 ns 6405120 bytes_per_cycle=1.44398/s bytes_per_second=2.01722G/s items_per_second=8.30758M/s __llvm_libc::memset,memset Google M
BM_Memset/5/0 119 ns 79.2 ns 8930304 bytes_per_cycle=1.13327/s bytes_per_second=1.58317G/s items_per_second=12.6189M/s __llvm_libc::memset,memset Google Q
BM_Memset/6/0 123 ns 64.0 ns 11609088 bytes_per_cycle=1.008/s bytes_per_second=1.40817G/s items_per_second=15.6365M/s __llvm_libc::memset,memset Google S
BM_Memset/7/0 85.9 ns 52.1 ns 12423168 bytes_per_cycle=0.641164/s bytes_per_second=917.192M/s items_per_second=19.1937M/s __llvm_libc::memset,memset Google U
BM_Memset/8/0 114 ns 67.1 ns 10347520 bytes_per_cycle=0.911968/s bytes_per_second=1.274G/s items_per_second=14.9015M/s __llvm_libc::memset,memset Google W
BM_Memset/9/0 1326 ns 785 ns 907264 bytes_per_cycle=1.89716/s bytes_per_second=2.6503G/s items_per_second=1.27348M/s __llvm_libc::memset,uniform 384 to 4096
```
Again not as good as current glibc but it's a first step in the right direction.
```
BM_Memset/0/0 108 ns 53.6 ns 12894208 bytes_per_cycle=1.02858/s bytes_per_second=1.4369G/s items_per_second=18.668M/s glibc::memset,memset Google A
BM_Memset/1/0 84.6 ns 47.6 ns 14284800 bytes_per_cycle=1.00197/s bytes_per_second=1.39974G/s items_per_second=21.0256M/s glibc::memset,memset Google B
BM_Memset/2/0 160 ns 85.8 ns 8927232 bytes_per_cycle=3.30805/s bytes_per_second=4.62129G/s items_per_second=11.6596M/s glibc::memset,memset Google D
BM_Memset/3/0 78.9 ns 53.6 ns 13326336 bytes_per_cycle=1.14058/s bytes_per_second=1.59338G/s items_per_second=18.674M/s glibc::memset,memset Google L
BM_Memset/4/0 99.2 ns 60.8 ns 11460608 bytes_per_cycle=2.54751/s bytes_per_second=3.55884G/s items_per_second=16.4587M/s glibc::memset,memset Google M
BM_Memset/5/0 93.0 ns 56.1 ns 12219392 bytes_per_cycle=1.73379/s bytes_per_second=2.42207G/s items_per_second=17.8157M/s glibc::memset,memset Google Q
BM_Memset/6/0 89.4 ns 47.2 ns 14692352 bytes_per_cycle=1.34846/s bytes_per_second=1.88377G/s items_per_second=21.1795M/s glibc::memset,memset Google S
BM_Memset/7/0 84.0 ns 50.0 ns 14468096 bytes_per_cycle=0.911198/s bytes_per_second=1.27293G/s items_per_second=19.994M/s glibc::memset,memset Google U
BM_Memset/8/0 93.4 ns 52.8 ns 13063168 bytes_per_cycle=1.06642/s bytes_per_second=1.48977G/s items_per_second=18.9524M/s glibc::memset,memset Google W
BM_Memset/9/0 438 ns 241 ns 2853888 bytes_per_cycle=6.1185/s bytes_per_second=8.54744G/s items_per_second=4.15064M/s glibc::memset,uniform 384 to 4096
```
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D150433
|
|
|
|
|
|
| |
The owner of the last two failing buildbots updated CMake.
This reverts commit e8e8707b4aa6e4cc04c0cffb2de01d2de71165fc.
|
|
|
|
|
|
|
|
| |
This is to improve a performance bottleneck of printf for long double.
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D150475
|
|
|
|
|
|
|
|
|
|
|
|
| |
We support asynchronous sends, that means that the kernel can issue a
send, then exit the kernel as we do with the `EXIT` syscall. Because of
the condition it's therefore possible for the kernel to exit and break
from the loop before we check the server again. This can potentially
cause us to ignore an `EXIT` call from the GPU.
Reviewed By: JonChesterfield, lntue
Differential Revision: https://reviews.llvm.org/D150456
|
|
|
|
|
| |
Summary:
We need this function from the test.cpp but need to declare it manually.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently we provide the `send_n` and `recv_n` functions. These were
somewhat divergent and not tested on the GPU. This patch changes the
support to be more common. We do this my making the CPU provide an array
equal the to at least the lane size while the GPU can rely on the
private memory address of its stack variables. This allows us to send
data back and forth generically.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D150379
|
|
|
|
|
|
| |
I forgot that we still used these variables in the loaders.
Differential Revision: https://reviews.llvm.org/D150362
|
|
|
|
|
|
|
| |
Small cleanup of the server code and fixes a constant name not following
the naming convention.
Differential Revision: https://reviews.llvm.org/D150361
|
|
|
|
| |
Being able to link statically depends on other CMake options and choice of libc.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch makes sure:
- we pass the correct compiler options when building Google benchmarks,
- we only import the C++ version of the memory functions.
The change in libc/cmake/modules/LLVMLibCTestRules.cmake is here to make sure CMake can generate the right command line in the presence of the CMAKE_CROSSCOMPILING_EMULATOR option.
Relevant documentation:
https://cmake.org/cmake/help/latest/variable/CMAKE_CROSSCOMPILING_EMULATOR.html
https://cmake.org/cmake/help/latest/command/add_custom_command.html#command:add_custom_command
"
If COMMAND specifies an executable target name (created by the `add_executable()` command), it will automatically be replaced by the location of the executable created at build time if either of the following is true:
- The target is not being cross-compiled (i.e. the CMAKE_CROSSCOMPILING variable is not set to true).
- New in version 3.6: The target is being cross-compiled and an emulator is provided (i.e. its CROSSCOMPILING_EMULATOR target property is set). In this case, the contents of CROSSCOMPILING_EMULATOR will be prepended to the command before the location of the target executable.
"
Reviewed By: gchatelet
Differential Revision: https://reviews.llvm.org/D150200
|
|
|
|
|
|
|
|
|
|
| |
Allows moving the pointer swap between server and client into reset.
Single allocation simplifies whatever allocates the client/server, currently
the libc loaders.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D150337
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The interface exported by the RPC library allows users to simply send
and recieve fixed sized packets without worrying about the data motion
underneath. However, this was broken in the current implementation. We
can think of the send and recieve implementations in terms of waiting
for ownership of the buffer, using the buffer, and posting ownership to
the other side. Our implementation of `recv` was incorrect in the
following scenarios.
recv -> send // we still own the buffer and should give away ownership
recv -> close // The other side is not waiting for data, this will
result in multiple openings of the same port
This patch attempts to fix this with an admittedly hacky fix where we
track if the previous implementation was a recv and post conditionally.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D150327
|
|
|
|
|
|
|
|
|
|
|
| |
Replaces the globals currently used. Worth changing to a bitmap
before allowing runtime number of ports >> 64. One bit per port is likely
to be cheap enough that sizing for the worst case is always fine, otherwise
in the future we can change to dynamically allocating it.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D150309
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The Port type has stipuations that the same exact mask used to open it
needs to close it. This can currently be violated by calling its move
constructor to put it somewhere else. We still need the move constructor
to handle the open and closing functions. So, we simply make these
constructors private and only allow a few classes to have move
priviledges on it.
Reviewed By: JonChesterfield, lntue
Differential Revision: https://reviews.llvm.org/D150118
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds two versions of memcpy optimized for architectures where unaligned accesses are either illegal or extremely slow.
It is currently enabled for RISCV 64 and RISCV 32 but it could be used for ARM 32 architectures as well.
Here is the before / after output of `libc.benchmarks.memory_functions.opt_host --benchmark_filter=BM_Memcpy` on a quad core Linux starfive RISCV 64 board running at 1.5GHz.
Before:
```
Run on (4 X 1500 MHz CPU s)
CPU Caches:
L1 Instruction 32 KiB (x4)
L1 Data 32 KiB (x4)
L2 Unified 2048 KiB (x1)
------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
------------------------------------------------------------------------
BM_Memcpy/0/0 474 ns 474 ns 1483776 bytes_per_cycle=0.243492/s bytes_per_second=348.318M/s items_per_second=2.11097M/s __llvm_libc::memcpy,memcpy Google A
BM_Memcpy/1/0 210 ns 209 ns 3649536 bytes_per_cycle=0.233819/s bytes_per_second=334.481M/s items_per_second=4.77519M/s __llvm_libc::memcpy,memcpy Google B
BM_Memcpy/2/0 1814 ns 1814 ns 396288 bytes_per_cycle=0.247899/s bytes_per_second=354.622M/s items_per_second=551.402k/s __llvm_libc::memcpy,memcpy Google D
BM_Memcpy/3/0 89.3 ns 89.2 ns 7459840 bytes_per_cycle=0.217415/s bytes_per_second=311.014M/s items_per_second=11.2071M/s __llvm_libc::memcpy,memcpy Google L
BM_Memcpy/4/0 134 ns 134 ns 3815424 bytes_per_cycle=0.226584/s bytes_per_second=324.131M/s items_per_second=7.44567M/s __llvm_libc::memcpy,memcpy Google M
BM_Memcpy/5/0 52.8 ns 52.6 ns 11001856 bytes_per_cycle=0.194893/s bytes_per_second=278.797M/s items_per_second=19.0284M/s __llvm_libc::memcpy,memcpy Google Q
BM_Memcpy/6/0 180 ns 180 ns 4101120 bytes_per_cycle=0.231884/s bytes_per_second=331.713M/s items_per_second=5.55957M/s __llvm_libc::memcpy,memcpy Google S
BM_Memcpy/7/0 195 ns 195 ns 3906560 bytes_per_cycle=0.232951/s bytes_per_second=333.239M/s items_per_second=5.1217M/s __llvm_libc::memcpy,memcpy Google U
BM_Memcpy/8/0 152 ns 152 ns 4789248 bytes_per_cycle=0.227507/s bytes_per_second=325.452M/s items_per_second=6.58187M/s __llvm_libc::memcpy,memcpy Google W
BM_Memcpy/9/0 6036 ns 6033 ns 118784 bytes_per_cycle=0.249158/s bytes_per_second=356.423M/s items_per_second=165.75k/s __llvm_libc::memcpy,uniform 384 to 4096
```
After:
```
BM_Memcpy/0/0 126 ns 126 ns 5770240 bytes_per_cycle=1.04707/s bytes_per_second=1.46273G/s items_per_second=7.9385M/s __llvm_libc::memcpy,memcpy Google A
BM_Memcpy/1/0 75.1 ns 75.0 ns 10204160 bytes_per_cycle=0.691143/s bytes_per_second=988.687M/s items_per_second=13.3289M/s __llvm_libc::memcpy,memcpy Google B
BM_Memcpy/2/0 333 ns 333 ns 2174976 bytes_per_cycle=1.39297/s bytes_per_second=1.94596G/s items_per_second=3.00002M/s __llvm_libc::memcpy,memcpy Google D
BM_Memcpy/3/0 49.6 ns 49.5 ns 16092160 bytes_per_cycle=0.710161/s bytes_per_second=1015.89M/s items_per_second=20.1844M/s __llvm_libc::memcpy,memcpy Google L
BM_Memcpy/4/0 57.7 ns 57.7 ns 11213824 bytes_per_cycle=0.561557/s bytes_per_second=803.314M/s items_per_second=17.3228M/s __llvm_libc::memcpy,memcpy Google M
BM_Memcpy/5/0 48.0 ns 47.9 ns 16437248 bytes_per_cycle=0.346708/s bytes_per_second=495.97M/s items_per_second=20.8571M/s __llvm_libc::memcpy,memcpy Google Q
BM_Memcpy/6/0 67.5 ns 67.5 ns 10616832 bytes_per_cycle=0.614173/s bytes_per_second=878.582M/s items_per_second=14.8142M/s __llvm_libc::memcpy,memcpy Google S
BM_Memcpy/7/0 84.7 ns 84.6 ns 10480640 bytes_per_cycle=0.819077/s bytes_per_second=1.14424G/s items_per_second=11.8174M/s __llvm_libc::memcpy,memcpy Google U
BM_Memcpy/8/0 61.7 ns 61.6 ns 11191296 bytes_per_cycle=0.550078/s bytes_per_second=786.893M/s items_per_second=16.2279M/s __llvm_libc::memcpy,memcpy Google W
BM_Memcpy/9/0 981 ns 981 ns 703488 bytes_per_cycle=1.52333/s bytes_per_second=2.12807G/s items_per_second=1019.81k/s __llvm_libc::memcpy,uniform 384 to 4096
```
It is not as good as glibc for now so there's room for improvement. I suspect a path pumping 16 bytes at once given the doubled numbers for large copies.
```
BM_Memcpy/0/1 146 ns 82.5 ns 8576000 bytes_per_cycle=1.35236/s bytes_per_second=1.88922G/s items_per_second=12.1169M/s glibc memcpy,memcpy Google A
BM_Memcpy/1/1 112 ns 63.7 ns 10634240 bytes_per_cycle=0.628018/s bytes_per_second=898.387M/s items_per_second=15.702M/s glibc memcpy,memcpy Google B
BM_Memcpy/2/1 315 ns 180 ns 4079616 bytes_per_cycle=2.65229/s bytes_per_second=3.7052G/s items_per_second=5.54764M/s glibc memcpy,memcpy Google D
BM_Memcpy/3/1 85.3 ns 43.1 ns 15854592 bytes_per_cycle=0.774164/s bytes_per_second=1107.45M/s items_per_second=23.2249M/s glibc memcpy,memcpy Google L
BM_Memcpy/4/1 105 ns 54.3 ns 13427712 bytes_per_cycle=0.7793/s bytes_per_second=1114.8M/s items_per_second=18.4109M/s glibc memcpy,memcpy Google M
BM_Memcpy/5/1 77.1 ns 43.2 ns 16476160 bytes_per_cycle=0.279808/s bytes_per_second=400.269M/s items_per_second=23.1428M/s glibc memcpy,memcpy Google Q
BM_Memcpy/6/1 112 ns 62.7 ns 11236352 bytes_per_cycle=0.676078/s bytes_per_second=967.137M/s items_per_second=15.9387M/s glibc memcpy,memcpy Google S
BM_Memcpy/7/1 131 ns 65.5 ns 11751424 bytes_per_cycle=0.965616/s bytes_per_second=1.34895G/s items_per_second=15.2762M/s glibc memcpy,memcpy Google U
BM_Memcpy/8/1 104 ns 55.0 ns 12314624 bytes_per_cycle=0.583336/s bytes_per_second=834.468M/s items_per_second=18.1937M/s glibc memcpy,memcpy Google W
BM_Memcpy/9/1 932 ns 466 ns 1480704 bytes_per_cycle=3.17342/s bytes_per_second=4.43321G/s items_per_second=2.14679M/s glibc memcpy,uniform 384 to 4096
```
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D150202
|
|
|
|
|
|
| |
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D150088
|
|
|
|
|
|
| |
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D150065
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently the opcode is only valid if it is the same between all of the
ports. This is possible to violate if the opcode is places into a memory
location and then read in a non-uniform manner by the warp / wavefront.
Moving this to a compile time constant makes it impossible to break this
invariant.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D150115
|
|
|
|
|
|
|
|
|
|
|
|
| |
The exact set of supported values is determined by the <errno.h>
and <signal.h> headers, which don't (yet) come from llvm-libc on
Fuchsia. The mappings of SIG* and E* codes to psignal/strsignal
and perror/strerror text used in Fuchsia libc today is the same
as for Linux.
Reviewed By: abrachet
Differential Revision: https://reviews.llvm.org/D150026
|
| |
|
|
|
|
|
|
| |
Unfortunatly not all buildbots are updated.
This reverts commit ffb807ab5375b3f78df198dc5d4302b3b552242f.
|
|
|
|
|
|
| |
All build bots should be updated now.
This reverts commit 44d38022ab29a3156349602733b3459df5beef93.
|
|
|
|
|
|
|
| |
Summary;
This was changed a long time ago to drop the `LLVM_` prefix.
Differential Revision: https://reviews.llvm.org/D150012
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch uses the changed interface in D149972 to make these classes
move-only. The `Port` class could be further refined to be
construct-only in a future patch, but for now this makes it more
difficult to misuse the interface.
Depends on D149972
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D149974
|
|
|
|
|
|
|
|
|
|
|
| |
This patch replaces the existing `cpp::optional` type with a newer
version that has more features. This class is heavily inspired by the
old `llvm::Optional` class. Currently the limitations of this class is
that we only handle types with trivial constructors or operators.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D149972
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we used a single port to implement the RPC. This was
sufficient for single threaded tests but can potentially cause deadlocks
when using multiple threads. The reason for this is that GPUs make no
forward progress guarantees. Therefore one group of threads waiting on
another group of threads can spin forever because there is no guarantee
that the other threads will continue executing. The typical workaround
for this is to allocate enough memory that a sufficiently large number
of work groups can make progress. As long as this number is somewhat
close to the amount of total concurrency we can obtain reliable
execution around a shared resource.
This patch enables using multiple ports by widening the arrays to a
predetermined size and indexes into them. Empty ports are currently
obtained via a trivial linker scan. This should be imporoved in the
future for performance reasons. Portions of D148191 were applied to
achieve parallel support.
Depends on D149581
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D149598
|
|
|
|
|
|
|
|
|
| |
Previously this wasn't implemented because it's effectively a no-op.
However, this should be safe to emit on sm_60 architectures. It's
important because it carries semantic importance for whether or not
something can be moved. So we should always emit this instrinsic.
Differential Revision: https://reviews.llvm.org/D149923
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The GPU has a different execution model to standard `_start`
implementations. On the GPU, all threads are active at the start of a
kernel. In order to correctly intitialize and call the constructors we
want single threaded semantics. Previously, this was done using a
makeshift global barrier with atomics. However, it should be easier to
simply put the portions of the code that must be single threaded in
separate kernels and then call those with only one thread. Generally,
mixing global state between kernel launches makes optimizations more
difficult, similarly to calling a function outside of the TU, but for
testing it is better to be correct.
Depends on D149527 D148943
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D149581
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The execution model of the GPU expects that groups of threads will
execute in lock-step in SIMD fashion. It's both important for
performance and correctness that we treat this as the smallest possible
granularity for an RPC operation. Thus, we map multiple threads to a
single larger buffer and ship that across the wire.
This patch makes the necessary changes to support executing the RPC on
the GPU with multiple threads. This requires some workarounds to mimic
the model when handling the protocol from the CPU. I'm not completely
happy with some of the workarounds required, but I think it should work.
Uses some of the implementation details from D148191.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D148943
|
|
|
|
|
|
|
|
|
|
| |
This patch updates the struct dirent to be on par with glibc (by adding
a missing d_type member) and update the readdir call to use SYS_getdents64
instead of SYS_getdents.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D147738
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Carefully work around not knowing the thread mask that nvptx intrinsic
functions require.
If the warp is converged when calling try_lock, a single rpc call will handle
all lanes within it. Otherwise more than one rpc call with thread masks that
compose to the unknown one will occur.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D149897
|
|
|
|
| |
This reverts commit b1323738649e96aac943f3773ec7336df110eea5.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Carefully work around not knowing the thread mask that nvptx intrinsic
functions require.
If the warp is converged when calling try_lock, a single rpc call will handle
all lanes within it. Otherwise more than one rpc call with thread masks that
compose to the unknown one will occur.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D149897
|
|
|
|
| |
This reverts commit 09ceb4729f1ca8781718d41b7876b68820baadba.
|
| |
|
|
|
|
|
|
|
| |
This test fails on sm_60 because of the atomics codegen. We test atomics
indirectly with the `rpc` so we still have coverage.
Differential Revision: https://reviews.llvm.org/D149887
|