path: root/libc
Commit message | Author | Age | Files | Lines
* [libc] Restrict access to the RPC Process internals | Joseph Huber | 2023-05-17 | 1 | -0/+5

This patch changes the `Process` struct to only provide the functions expected to be visible by the interface. So, now we only export the open, reset, and size query functions. This prevents users of the interface from messing with the internals of the process, so now the only existing failure mode is mismatched send and receive calls.

Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D150703
* [libc] Add a convenience CMake function `add_unittest_framework_library`. | Siva Chandra Reddy | 2023-05-17 | 3 | -118/+153

This function is used to add unit test and hermetic test framework libraries. It avoids duplicating the code that adds compile options to each test framework library.

Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D150727
* Revert "Reland "[CMake] Bumps minimum version to 3.20.0."" | Nico Weber | 2023-05-17 | 2 | -2/+2

This reverts commit 65429b9af6a2c99d340ab2dcddd41dab201f399c.

Broke several projects, see https://reviews.llvm.org/D144509#4347562 onwards.

Also reverts follow-up commit "[OpenMP] Compile assembly files as ASM, not C"

This reverts commit 4072c8aee4c89c4457f4f30d01dc9bb4dfa52559.

Also reverts fix attempt "[cmake] Set CMP0091 to fix Windows builds after the cmake_minimum_required bump"

This reverts commit 7d47dac5f828efd1d378ba44a97559114f00fb64.
* [libc] Fix definition and use of LIBC_INLINE macro | Roland McGrath | 2023-05-16 | 3 | -5/+4

LIBC_INLINE was defined twice, in two different headers. Define it in only one place. Also update a few uses to make sure it is always placed where a function attribute is valid and is used consistently on every declaration of the same function, in case the attributes used in its definition must match on declarations and definitions.

Reviewed By: abrachet
Differential Revision: https://reviews.llvm.org/D150731
* [libc][Obvious] Bump hermetic alloc space to 64KB. | Siva Chandra Reddy | 2023-05-16 | 1 | -1/+1

A few hermetic tests are failing because they are running out of memory.

Differential Revision: https://reviews.llvm.org/D150724
* [libc] Remove *TestMain libraries and combine them with the main test libraries. | Siva Chandra Reddy | 2023-05-16 | 2 | -27/+10

There are currently no tests which use the main test framework but not the `main` function from LibcTestMain.cpp. So, this change simplifies things by merging the *TestMain libraries with the main test libraries.

Reviewed By: michaelrj, jhuber6
Differential Revision: https://reviews.llvm.org/D150698
* [libc][NFC] Simplifly inbox and outbox state handling | Joseph Huber | 2023-05-16 | 1 | -37/+20

Currently we use a template parameter called `InvertInbox` to invert the inbox when we load it. This is more easily understood as a static check on whether or not the process running it is the server. Inverting the inbox makes the states 1 0 and 0 1 own the buffer, so it's easier to simply say that the server owns the buffer if in != out. Also clean up some of the comments.

Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D150365
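The ownership rule described in this commit can be sketched as a small predicate. This is a minimal illustration, not the code from the patch: the name `owns_buffer` and the `IsServer` template parameter are hypothetical, and the client-side rule (`in == out`) is an assumption inferred as the complement of the server rule stated in the message.

```cpp
#include <cstdint>

// Hypothetical helper, not the actual libc RPC code.
// Server rule (from the commit message): owns the buffer when in != out.
// Client rule (assumed here as the complement): owns it when in == out.
template <bool IsServer>
constexpr bool owns_buffer(std::uint32_t in, std::uint32_t out) {
  return IsServer ? (in != out) : (in == out);
}
```

With this formulation exactly one side owns the buffer for any pair of mailbox states, which is the property the static server check is meant to express.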
* [libc] Add optimized memcmp for RISCV | Guillaume Chatelet | 2023-05-16 | 1 | -3/+68

This patch adds two versions of `memcmp` optimized for architectures where unaligned accesses are either illegal or extremely slow. It is currently enabled for RISCV 64 and RISCV 32 but it could be used for ARM 32 architectures as well.

Here is the before / after output of `libc.benchmarks.memory_functions.opt_host --benchmark_filter=BM_memcmp` on a quad core Linux starfive RISCV 64 board running at 1.5GHz.

Before
```
Run on (4 X 1500 MHz CPU s)
CPU Caches:
  L1 Instruction 32 KiB (x4)
  L1 Data 32 KiB (x4)
  L2 Unified 2048 KiB (x1)
----------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
----------------------------------------------------------------------
BM_Memcmp/0/0 110 ns 66.4 ns 10404864 bytes_per_cycle=0.107646/s bytes_per_second=153.989M/s items_per_second=15.071M/s __llvm_libc::memcmp,memcmp Google A
BM_Memcmp/1/0 318 ns 211 ns 3026944 bytes_per_cycle=0.131539/s bytes_per_second=188.167M/s items_per_second=4.73691M/s __llvm_libc::memcmp,memcmp Google B
BM_Memcmp/2/0 204 ns 115 ns 6118400 bytes_per_cycle=0.121675/s bytes_per_second=174.058M/s items_per_second=8.70241M/s __llvm_libc::memcmp,memcmp Google D
BM_Memcmp/3/0 143 ns 99.6 ns 7013376 bytes_per_cycle=0.117974/s bytes_per_second=168.763M/s items_per_second=10.0437M/s __llvm_libc::memcmp,memcmp Google L
BM_Memcmp/4/0 81.3 ns 58.2 ns 11426816 bytes_per_cycle=0.101125/s bytes_per_second=144.661M/s items_per_second=17.1805M/s __llvm_libc::memcmp,memcmp Google M
BM_Memcmp/5/0 177 ns 118 ns 5952512 bytes_per_cycle=0.120612/s bytes_per_second=172.537M/s items_per_second=8.45549M/s __llvm_libc::memcmp,memcmp Google Q
BM_Memcmp/6/0 342 ns 220 ns 3483648 bytes_per_cycle=0.132004/s bytes_per_second=188.834M/s items_per_second=4.54739M/s __llvm_libc::memcmp,memcmp Google S
BM_Memcmp/7/0 208 ns 130 ns 5681152 bytes_per_cycle=0.12468/s bytes_per_second=178.356M/s items_per_second=7.6674M/s __llvm_libc::memcmp,memcmp Google U
BM_Memcmp/8/0 123 ns 79.1 ns 8387584 bytes_per_cycle=0.110593/s bytes_per_second=158.204M/s items_per_second=12.6439M/s __llvm_libc::memcmp,memcmp Google W
BM_Memcmp/9/0 20707 ns 10643 ns 67584 bytes_per_cycle=0.142401/s bytes_per_second=203.707M/s items_per_second=93.9559k/s __llvm_libc::memcmp,uniform 384 to 4096
```

After
```
BM_Memcmp/0/0 80.4 ns 55.8 ns 12648448 bytes_per_cycle=0.132703/s bytes_per_second=189.834M/s items_per_second=17.9256M/s __llvm_libc::memcmp,memcmp Google A
BM_Memcmp/1/0 140 ns 80.5 ns 8230912 bytes_per_cycle=0.337273/s bytes_per_second=482.474M/s items_per_second=12.4165M/s __llvm_libc::memcmp,memcmp Google B
BM_Memcmp/2/0 101 ns 66.4 ns 10571776 bytes_per_cycle=0.208539/s bytes_per_second=298.317M/s items_per_second=15.0687M/s __llvm_libc::memcmp,memcmp Google D
BM_Memcmp/3/0 118 ns 67.6 ns 10533888 bytes_per_cycle=0.176822/s bytes_per_second=252.946M/s items_per_second=14.7946M/s __llvm_libc::memcmp,memcmp Google L
BM_Memcmp/4/0 106 ns 53.0 ns 12722176 bytes_per_cycle=0.111141/s bytes_per_second=158.988M/s items_per_second=18.8591M/s __llvm_libc::memcmp,memcmp Google M
BM_Memcmp/5/0 141 ns 70.2 ns 10436608 bytes_per_cycle=0.26032/s bytes_per_second=372.39M/s items_per_second=14.2458M/s __llvm_libc::memcmp,memcmp Google Q
BM_Memcmp/6/0 144 ns 79.3 ns 8932352 bytes_per_cycle=0.353168/s bytes_per_second=505.211M/s items_per_second=12.612M/s __llvm_libc::memcmp,memcmp Google S
BM_Memcmp/7/0 123 ns 71.7 ns 9945088 bytes_per_cycle=0.22143/s bytes_per_second=316.758M/s items_per_second=13.9421M/s __llvm_libc::memcmp,memcmp Google U
BM_Memcmp/8/0 97.0 ns 56.2 ns 12509184 bytes_per_cycle=0.160526/s bytes_per_second=229.635M/s items_per_second=17.7784M/s __llvm_libc::memcmp,memcmp Google W
BM_Memcmp/9/0 1840 ns 989 ns 676864 bytes_per_cycle=1.4894/s bytes_per_second=2.08067G/s items_per_second=1010.92k/s __llvm_libc::memcmp,uniform 384 to 4096
```

glibc
```
BM_Memcmp/0/0 72.6 ns 51.7 ns 12963840 bytes_per_cycle=0.141261/s bytes_per_second=202.075M/s items_per_second=19.3246M/s glibc::memcmp,memcmp Google A
BM_Memcmp/1/0 118 ns 75.2 ns 9280512 bytes_per_cycle=0.354054/s bytes_per_second=506.478M/s items_per_second=13.3046M/s glibc::memcmp,memcmp Google B
BM_Memcmp/2/0 114 ns 62.9 ns 11152384 bytes_per_cycle=0.222675/s bytes_per_second=318.539M/s items_per_second=15.8943M/s glibc::memcmp,memcmp Google D
BM_Memcmp/3/0 84.0 ns 63.5 ns 11030528 bytes_per_cycle=0.186353/s bytes_per_second=266.581M/s items_per_second=15.7378M/s glibc::memcmp,memcmp Google L
BM_Memcmp/4/0 93.5 ns 51.2 ns 13462528 bytes_per_cycle=0.119215/s bytes_per_second=170.539M/s items_per_second=19.5384M/s glibc::memcmp,memcmp Google M
BM_Memcmp/5/0 123 ns 61.7 ns 11376640 bytes_per_cycle=0.225262/s bytes_per_second=322.239M/s items_per_second=16.1993M/s glibc::memcmp,memcmp Google Q
BM_Memcmp/6/0 122 ns 71.6 ns 9967616 bytes_per_cycle=0.380844/s bytes_per_second=544.802M/s items_per_second=13.9579M/s glibc::memcmp,memcmp Google S
BM_Memcmp/7/0 118 ns 65.6 ns 10555392 bytes_per_cycle=0.238677/s bytes_per_second=341.43M/s items_per_second=15.2334M/s glibc::memcmp,memcmp Google U
BM_Memcmp/8/0 90.4 ns 54.0 ns 12920832 bytes_per_cycle=0.161987/s bytes_per_second=231.724M/s items_per_second=18.5169M/s glibc::memcmp,memcmp Google W
BM_Memcmp/9/0 1045 ns 601 ns 1195008 bytes_per_cycle=2.53677/s bytes_per_second=3.54383G/s items_per_second=1.66423M/s glibc::memcmp,uniform 384 to 4096
```

Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D150663
* [libc] Add optimized bcmp for RISCV | Guillaume Chatelet | 2023-05-16 | 2 | -5/+63

This patch adds two versions of bcmp optimized for architectures where unaligned accesses are either illegal or extremely slow. It is currently enabled for RISCV 64 and RISCV 32 but it could be used for ARM 32 architectures as well.

Here is the before / after output of libc.benchmarks.memory_functions.opt_host --benchmark_filter=BM_Bcmp on a quad core Linux starfive RISCV 64 board running at 1.5GHz.

Before
```
Run on (4 X 1500 MHz CPU s)
CPU Caches:
  L1 Instruction 32 KiB (x4)
  L1 Data 32 KiB (x4)
  L2 Unified 2048 KiB (x1)
Load Average: 7.03, 5.98, 3.71
----------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
----------------------------------------------------------------------
BM_Bcmp/0/0 102 ns 60.5 ns 11662336 bytes_per_cycle=0.122696/s bytes_per_second=175.518M/s items_per_second=16.5258M/s __llvm_libc::bcmp,memcmp Google A
BM_Bcmp/1/0 328 ns 172 ns 3737600 bytes_per_cycle=0.15256/s bytes_per_second=218.238M/s items_per_second=5.80575M/s __llvm_libc::bcmp,memcmp Google B
BM_Bcmp/2/0 199 ns 99.7 ns 7019520 bytes_per_cycle=0.141897/s bytes_per_second=202.986M/s items_per_second=10.032M/s __llvm_libc::bcmp,memcmp Google D
BM_Bcmp/3/0 173 ns 86.5 ns 8361984 bytes_per_cycle=0.13863/s bytes_per_second=198.312M/s items_per_second=11.5669M/s __llvm_libc::bcmp,memcmp Google L
BM_Bcmp/4/0 105 ns 51.8 ns 13213696 bytes_per_cycle=0.116399/s bytes_per_second=166.51M/s items_per_second=19.2931M/s __llvm_libc::bcmp,memcmp Google M
BM_Bcmp/5/0 167 ns 93.9 ns 7853056 bytes_per_cycle=0.139432/s bytes_per_second=199.459M/s items_per_second=10.6503M/s __llvm_libc::bcmp,memcmp Google Q
BM_Bcmp/6/0 262 ns 165 ns 3931136 bytes_per_cycle=0.151516/s bytes_per_second=216.745M/s items_per_second=6.07091M/s __llvm_libc::bcmp,memcmp Google S
BM_Bcmp/7/0 168 ns 105 ns 6665216 bytes_per_cycle=0.143159/s bytes_per_second=204.791M/s items_per_second=9.52163M/s __llvm_libc::bcmp,memcmp Google U
BM_Bcmp/8/0 108 ns 68.0 ns 10175488 bytes_per_cycle=0.125504/s bytes_per_second=179.535M/s items_per_second=14.701M/s __llvm_libc::bcmp,memcmp Google W
BM_Bcmp/9/0 15371 ns 9007 ns 78848 bytes_per_cycle=0.166128/s bytes_per_second=237.648M/s items_per_second=111.031k/s __llvm_libc::bcmp,uniform 384 to 4096
```

After
```
BM_Bcmp/0/0 74.2 ns 49.7 ns 14306304 bytes_per_cycle=0.148927/s bytes_per_second=213.042M/s items_per_second=20.1101M/s __llvm_libc::bcmp,memcmp Google A
BM_Bcmp/1/0 108 ns 68.1 ns 10350592 bytes_per_cycle=0.411197/s bytes_per_second=588.222M/s items_per_second=14.6849M/s __llvm_libc::bcmp,memcmp Google B
BM_Bcmp/2/0 80.2 ns 56.0 ns 12386304 bytes_per_cycle=0.258588/s bytes_per_second=369.912M/s items_per_second=17.8585M/s __llvm_libc::bcmp,memcmp Google D
BM_Bcmp/3/0 92.4 ns 55.7 ns 12555264 bytes_per_cycle=0.206835/s bytes_per_second=295.88M/s items_per_second=17.943M/s __llvm_libc::bcmp,memcmp Google L
BM_Bcmp/4/0 79.3 ns 46.8 ns 14288896 bytes_per_cycle=0.125872/s bytes_per_second=180.061M/s items_per_second=21.3611M/s __llvm_libc::bcmp,memcmp Google M
BM_Bcmp/5/0 98.0 ns 57.9 ns 12232704 bytes_per_cycle=0.268815/s bytes_per_second=384.543M/s items_per_second=17.2711M/s __llvm_libc::bcmp,memcmp Google Q
BM_Bcmp/6/0 132 ns 65.5 ns 10474496 bytes_per_cycle=0.417246/s bytes_per_second=596.875M/s items_per_second=15.2673M/s __llvm_libc::bcmp,memcmp Google S
BM_Bcmp/7/0 101 ns 60.9 ns 11505664 bytes_per_cycle=0.253733/s bytes_per_second=362.968M/s items_per_second=16.4202M/s __llvm_libc::bcmp,memcmp Google U
BM_Bcmp/8/0 72.5 ns 50.2 ns 14082048 bytes_per_cycle=0.183262/s bytes_per_second=262.158M/s items_per_second=19.9271M/s __llvm_libc::bcmp,memcmp Google W
BM_Bcmp/9/0 852 ns 803 ns 854016 bytes_per_cycle=1.85028/s bytes_per_second=2.58481G/s items_per_second=1.24597M/s __llvm_libc::bcmp,uniform 384 to 4096
```

For comparison with glibc
```
BM_Bcmp/0/0 106 ns 52.6 ns 12906496 bytes_per_cycle=0.142072/s bytes_per_second=203.235M/s items_per_second=19.0271M/s glibc::bcmp,memcmp Google A
BM_Bcmp/1/0 132 ns 77.1 ns 8905728 bytes_per_cycle=0.365072/s bytes_per_second=522.239M/s items_per_second=12.9782M/s glibc::bcmp,memcmp Google B
BM_Bcmp/2/0 122 ns 62.3 ns 10909696 bytes_per_cycle=0.222667/s bytes_per_second=318.527M/s items_per_second=16.0563M/s glibc::bcmp,memcmp Google D
BM_Bcmp/3/0 99.5 ns 64.2 ns 11074560 bytes_per_cycle=0.185126/s bytes_per_second=264.825M/s items_per_second=15.5674M/s glibc::bcmp,memcmp Google L
BM_Bcmp/4/0 86.6 ns 50.2 ns 13488128 bytes_per_cycle=0.117941/s bytes_per_second=168.717M/s items_per_second=19.9053M/s glibc::bcmp,memcmp Google M
BM_Bcmp/5/0 106 ns 61.4 ns 11344896 bytes_per_cycle=0.248968/s bytes_per_second=356.151M/s items_per_second=16.284M/s glibc::bcmp,memcmp Google Q
BM_Bcmp/6/0 145 ns 71.9 ns 10046464 bytes_per_cycle=0.389814/s bytes_per_second=557.633M/s items_per_second=13.9019M/s glibc::bcmp,memcmp Google S
BM_Bcmp/7/0 119 ns 65.6 ns 10718208 bytes_per_cycle=0.243756/s bytes_per_second=348.696M/s items_per_second=15.2329M/s glibc::bcmp,memcmp Google U
BM_Bcmp/8/0 86.4 ns 54.5 ns 13250560 bytes_per_cycle=0.154831/s bytes_per_second=221.488M/s items_per_second=18.3532M/s glibc::bcmp,memcmp Google W
BM_Bcmp/9/0 1090 ns 604 ns 1186816 bytes_per_cycle=2.53848/s bytes_per_second=3.54622G/s items_per_second=1.65598M/s glibc::bcmp,uniform 384 to 4096
```

Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D150567
* Revert "[libc] Add explicit constructor calls to fix compilation when using UInt<T>" | Mikhail R. Gadelha | 2023-05-16 | 9 | -46/+26

This reverts commit b663993067ffb5800632ad41ea7f2f92caab1093.

This caused a regression on aarch64: https://lab.llvm.org/buildbot#builders/138/builds/43983
* [libc] Add explicit constructor calls to fix compilation when using UInt<T> | Mikhail R. Gadelha | 2023-05-16 | 9 | -26/+46

This patch is similar to 86fe88c8d9 and adds several explicit constructor calls (bool(...), uint64_t(...), uint8_t(...)) that are needed when we use UInt<T> (in my case UInt<128> in riscv32).

This patch also adds two operators to UInt<T>:
* operator/= required by printf_core/float_hex_converter.h:148
* operator-- required by FPUtil/ManipulationFunctions.h:166

Reviewed By: sivachandra, lntue
Differential Revision: https://reviews.llvm.org/D149594
* [libc][NFC] Clean up the memory buffer handling for RPC | Joseph Huber | 2023-05-15 | 2 | -38/+43

We do a lot of arithmetic on void pointers here, so add a helper and use more consistent names. No functional change.

Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D150576
* [libc] Cache ownership of the shared buffer in the port | Joseph Huber | 2023-05-15 | 1 | -7/+12

This patch adds another variable to cache cases where we know that we own the buffer. This allows us to skip the atomic load on the inbox because we already know its state. This is legal immediately after opening a port, or when sending immediately after a receive. This caching nets a significant (~17%) speedup for the basic open, send, receive combination.

Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D150516
* [libc] Make the bump pointer explicitly return null on buffer oveerrun | Joseph Huber | 2023-05-15 | 2 | -4/+6

We use a simple bump ptr in the `libc` tests. If we run out of data we can currently return other static memory and have weird failure cases. We should fail more explicitly here by returning a null pointer instead.

Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D150529
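A minimal sketch of a bump allocator that fails explicitly when it runs out of space, as the message describes. The class name `BumpAllocator`, its template parameter, and the rounding policy are assumptions for illustration; the test allocator in the patch may be structured differently.

```cpp
#include <cstddef>

// Hypothetical fixed-capacity bump allocator (not the code from the patch).
template <std::size_t Capacity>
class BumpAllocator {
  static constexpr std::size_t Align = alignof(std::max_align_t);
  static_assert(Capacity % Align == 0,
                "capacity must be a multiple of the alignment");

public:
  // Returns a pointer to `size` bytes, or nullptr if the request would run
  // past the end of the buffer.
  void *allocate(std::size_t size) {
    if (size > Capacity - used_)
      return nullptr; // Explicit failure instead of handing out other memory.
    void *result = buffer_ + used_;
    // Round the bump up so the next allocation stays suitably aligned.
    // Because Capacity is a multiple of Align, used_ never exceeds Capacity.
    used_ += (size + Align - 1) / Align * Align;
    return result;
  }

private:
  alignas(std::max_align_t) unsigned char buffer_[Capacity];
  std::size_t used_ = 0;
};
```

The check happens before the pointer is advanced, so a failed request leaves the allocator state unchanged and later, smaller requests can still succeed.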
* [libc] Add optimized memset for RISCV | Guillaume Chatelet | 2023-05-15 | 1 | -3/+34

This patch adds two versions of `memset` optimized for architectures where unaligned accesses are either illegal or extremely slow. It is currently enabled for RISCV 64 and RISCV 32 but it could be used for ARM 32 architectures as well.

Here is the before / after output of libc.benchmarks.memory_functions.opt_host --benchmark_filter=BM_Memset on a quad core Linux starfive RISCV 64 board running at 1.5GHz.

Before
```
Run on (4 X 1500 MHz CPU s)
CPU Caches:
  L1 Instruction 32 KiB (x4)
  L1 Data 32 KiB (x4)
  L2 Unified 2048 KiB (x1)
------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
------------------------------------------------------------------------
BM_Memset/0/0 506 ns 252 ns 2883584 bytes_per_cycle=0.238312/s bytes_per_second=340.908M/s items_per_second=3.96043M/s __llvm_libc::memset,memset Google A
BM_Memset/1/0 296 ns 189 ns 2900992 bytes_per_cycle=0.234589/s bytes_per_second=335.583M/s items_per_second=5.29382M/s __llvm_libc::memset,memset Google B
BM_Memset/2/0 2110 ns 1049 ns 678912 bytes_per_cycle=0.24687/s bytes_per_second=353.151M/s items_per_second=953.527k/s __llvm_libc::memset,memset Google D
BM_Memset/3/0 397 ns 254 ns 3055616 bytes_per_cycle=0.238479/s bytes_per_second=341.147M/s items_per_second=3.93224M/s __llvm_libc::memset,memset Google L
BM_Memset/4/0 1119 ns 621 ns 1079296 bytes_per_cycle=0.244925/s bytes_per_second=350.368M/s items_per_second=1.61047M/s __llvm_libc::memset,memset Google M
BM_Memset/5/0 605 ns 349 ns 1644544 bytes_per_cycle=0.241364/s bytes_per_second=345.274M/s items_per_second=2.8614M/s __llvm_libc::memset,memset Google Q
BM_Memset/6/0 472 ns 271 ns 2310144 bytes_per_cycle=0.238615/s bytes_per_second=341.341M/s items_per_second=3.68799M/s __llvm_libc::memset,memset Google S
BM_Memset/7/0 262 ns 143 ns 3956736 bytes_per_cycle=0.225812/s bytes_per_second=323.026M/s items_per_second=7.0087M/s __llvm_libc::memset,memset Google U
BM_Memset/8/0 454 ns 261 ns 2940928 bytes_per_cycle=0.238883/s bytes_per_second=341.725M/s items_per_second=3.82716M/s __llvm_libc::memset,memset Google W
BM_Memset/9/0 8768 ns 5998 ns 115712 bytes_per_cycle=0.249196/s bytes_per_second=356.478M/s items_per_second=166.724k/s __llvm_libc::memset,uniform 384 to 4096
```

After
```
BM_Memset/0/0 117 ns 69.5 ns 9761792 bytes_per_cycle=0.935152/s bytes_per_second=1.30639G/s items_per_second=14.3834M/s __llvm_libc::memset,memset Google A
BM_Memset/1/0 97.8 ns 58.5 ns 13002752 bytes_per_cycle=0.892814/s bytes_per_second=1.24725G/s items_per_second=17.0848M/s __llvm_libc::memset,memset Google B
BM_Memset/2/0 326 ns 163 ns 5156864 bytes_per_cycle=1.54408/s bytes_per_second=2.15706G/s items_per_second=6.1192M/s __llvm_libc::memset,memset Google D
BM_Memset/3/0 132 ns 65.4 ns 11455488 bytes_per_cycle=0.876411/s bytes_per_second=1.22433G/s items_per_second=15.2803M/s __llvm_libc::memset,memset Google L
BM_Memset/4/0 222 ns 120 ns 6405120 bytes_per_cycle=1.44398/s bytes_per_second=2.01722G/s items_per_second=8.30758M/s __llvm_libc::memset,memset Google M
BM_Memset/5/0 119 ns 79.2 ns 8930304 bytes_per_cycle=1.13327/s bytes_per_second=1.58317G/s items_per_second=12.6189M/s __llvm_libc::memset,memset Google Q
BM_Memset/6/0 123 ns 64.0 ns 11609088 bytes_per_cycle=1.008/s bytes_per_second=1.40817G/s items_per_second=15.6365M/s __llvm_libc::memset,memset Google S
BM_Memset/7/0 85.9 ns 52.1 ns 12423168 bytes_per_cycle=0.641164/s bytes_per_second=917.192M/s items_per_second=19.1937M/s __llvm_libc::memset,memset Google U
BM_Memset/8/0 114 ns 67.1 ns 10347520 bytes_per_cycle=0.911968/s bytes_per_second=1.274G/s items_per_second=14.9015M/s __llvm_libc::memset,memset Google W
BM_Memset/9/0 1326 ns 785 ns 907264 bytes_per_cycle=1.89716/s bytes_per_second=2.6503G/s items_per_second=1.27348M/s __llvm_libc::memset,uniform 384 to 4096
```

Again not as good as current glibc but it's a first step in the right direction.
```
BM_Memset/0/0 108 ns 53.6 ns 12894208 bytes_per_cycle=1.02858/s bytes_per_second=1.4369G/s items_per_second=18.668M/s glibc::memset,memset Google A
BM_Memset/1/0 84.6 ns 47.6 ns 14284800 bytes_per_cycle=1.00197/s bytes_per_second=1.39974G/s items_per_second=21.0256M/s glibc::memset,memset Google B
BM_Memset/2/0 160 ns 85.8 ns 8927232 bytes_per_cycle=3.30805/s bytes_per_second=4.62129G/s items_per_second=11.6596M/s glibc::memset,memset Google D
BM_Memset/3/0 78.9 ns 53.6 ns 13326336 bytes_per_cycle=1.14058/s bytes_per_second=1.59338G/s items_per_second=18.674M/s glibc::memset,memset Google L
BM_Memset/4/0 99.2 ns 60.8 ns 11460608 bytes_per_cycle=2.54751/s bytes_per_second=3.55884G/s items_per_second=16.4587M/s glibc::memset,memset Google M
BM_Memset/5/0 93.0 ns 56.1 ns 12219392 bytes_per_cycle=1.73379/s bytes_per_second=2.42207G/s items_per_second=17.8157M/s glibc::memset,memset Google Q
BM_Memset/6/0 89.4 ns 47.2 ns 14692352 bytes_per_cycle=1.34846/s bytes_per_second=1.88377G/s items_per_second=21.1795M/s glibc::memset,memset Google S
BM_Memset/7/0 84.0 ns 50.0 ns 14468096 bytes_per_cycle=0.911198/s bytes_per_second=1.27293G/s items_per_second=19.994M/s glibc::memset,memset Google U
BM_Memset/8/0 93.4 ns 52.8 ns 13063168 bytes_per_cycle=1.06642/s bytes_per_second=1.48977G/s items_per_second=18.9524M/s glibc::memset,memset Google W
BM_Memset/9/0 438 ns 241 ns 2853888 bytes_per_cycle=6.1185/s bytes_per_second=8.54744G/s items_per_second=4.15064M/s glibc::memset,uniform 384 to 4096
```

Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D150433
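A rough sketch of the general idea behind a `memset` for targets without fast unaligned access: write single bytes until the destination is word-aligned, write whole words while enough bytes remain, then finish the tail byte by byte. The function name `aligned_memset` and the 8-byte word size are assumptions for illustration; the message does not describe the two versions the patch actually adds.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical sketch, not the libc implementation.
void *aligned_memset(void *dst, int value, std::size_t count) {
  auto *p = static_cast<unsigned char *>(dst);
  const auto byte = static_cast<unsigned char>(value);

  // Head: byte stores until p is aligned to the word size.
  while (count > 0 &&
         reinterpret_cast<std::uintptr_t>(p) % sizeof(std::uint64_t) != 0) {
    *p++ = byte;
    --count;
  }

  // Body: p is now aligned, so each iteration writes one full word.
  // memcpy keeps the store free of strict-aliasing issues; a real
  // implementation would also tell the compiler about the alignment.
  const std::uint64_t word = 0x0101010101010101ULL * byte;
  while (count >= sizeof(word)) {
    std::memcpy(p, &word, sizeof(word));
    p += sizeof(word);
    count -= sizeof(word);
  }

  // Tail: remaining bytes that do not fill a whole word.
  while (count > 0) {
    *p++ = byte;
    --count;
  }
  return dst;
}
```

The head loop is what makes the word stores legal on strict-alignment hardware; the cost is at most seven extra byte stores per call.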
* Reland "[CMake] Bumps minimum version to 3.20.0." | Mark de Wever | 2023-05-13 | 2 | -2/+2

The owner of the last two failing buildbots updated CMake.

This reverts commit e8e8707b4aa6e4cc04c0cffb2de01d2de71165fc.
* [libc][math] Implement fast division / modulus for UInt / (uint32_t * 2^e). | Tue Ly | 2023-05-12 | 2 | -0/+147

This is to improve a performance bottleneck of printf for long double.

Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D150475
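One way to divide a wide unsigned integer by a divisor of the form `d * 2^e` is to use the identity floor(x / (d * 2^e)) = floor(floor(x / 2^e) / d): shift right by `e` bits, then do a schoolbook short division by the 32-bit value `d`. The sketch below assumes that approach; the `Limbs` type and the function names are hypothetical, and the algorithm in the patch may well differ.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// A 128-bit unsigned value as four 32-bit limbs, least significant first.
// Hypothetical stand-in for a wide integer type such as UInt<128>.
using Limbs = std::array<std::uint32_t, 4>;

// Computes floor(x / 2^e) for e < 128.
Limbs shift_right(const Limbs &x, unsigned e) {
  assert(e < 128);
  const std::size_t word = e / 32;
  const unsigned bit = e % 32;
  Limbs r{};
  for (std::size_t i = 0; i + word < x.size(); ++i) {
    const std::uint64_t lo = x[i + word];
    const std::uint64_t hi = (i + word + 1 < x.size()) ? x[i + word + 1] : 0;
    r[i] = static_cast<std::uint32_t>(((hi << 32) | lo) >> bit);
  }
  return r;
}

// Computes floor(x / (d * 2^e)) for d != 0 and e < 128.
Limbs div_u32_pow2(const Limbs &x, std::uint32_t d, unsigned e) {
  assert(d != 0);
  Limbs q = shift_right(x, e);
  std::uint64_t rem = 0;
  // Short division from the most significant limb down. rem < d < 2^32, so
  // (rem << 32) | q[i] fits in 64 bits and each quotient limb fits in 32.
  for (std::size_t i = q.size(); i-- > 0;) {
    const std::uint64_t cur = (rem << 32) | q[i];
    q[i] = static_cast<std::uint32_t>(cur / d);
    rem = cur % d;
  }
  return q;
}
```

Each limb costs one 64-by-32-bit division, which is far cheaper than a general wide-by-wide division. The modulus can be recovered from the final `rem` and the low `e` bits of `x`, which this sketch leaves out.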
* [libc] Check the RPC server once again after the kernel exits | Joseph Huber | 2023-05-12 | 2 | -0/+8

We support asynchronous sends, which means that the kernel can issue a send and then exit, as we do with the `EXIT` syscall. Because of the loop condition, it is therefore possible for the kernel to exit and for us to break from the loop before we check the server again. This can potentially cause us to ignore an `EXIT` call from the GPU.

Reviewed By: JonChesterfield, lntue
Differential Revision: https://reviews.llvm.org/D150456
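The fix can be pictured as one extra poll after the wait loop ends. The sketch below is an illustration under assumptions: `run_server_loop`, `kernel_done`, and `handle_server` are hypothetical names, not the loader's actual interface.

```cpp
// Hypothetical sketch of the loader-side loop, not the actual loader code.
// kernel_done(): returns true once the GPU kernel has exited.
// handle_server(): services any pending RPC requests.
template <typename KernelDone, typename HandleServer>
void run_server_loop(KernelDone kernel_done, HandleServer handle_server) {
  while (!kernel_done())
    handle_server();
  // A send issued just before the kernel exited may not have been seen by
  // the loop above, so check the server one more time.
  handle_server();
}
```

Without the final call, a kernel that finishes before the first poll would leave its last request unhandled.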
* [libc] Fix undeclared 'free' function in stream test | Joseph Huber | 2023-05-11 | 1 | -0/+1

Summary: We need this function from the test.cpp but need to declare it manually.
* [libc] Implement a generic streaming interface in the RPC | Joseph Huber | 2023-05-11 | 5 | -55/+137

Currently we provide the `send_n` and `recv_n` functions. These were somewhat divergent and not tested on the GPU. This patch changes the support to be more common. We do this by making the CPU provide an array at least as large as the lane size, while the GPU can rely on the private memory address of its stack variables. This allows us to send data back and forth generically.

Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D150379
* [libc][obvious] Fix undefined variable after name change | Joseph Huber | 2023-05-11 | 4 | -4/+4

I forgot that we still used these variables in the loaders.

Differential Revision: https://reviews.llvm.org/D150362
* [libc][NFC] Clean up some code in the RPC implementation. | Joseph Huber | 2023-05-11 | 2 | -40/+27

Small cleanup of the server code, plus a fix for a constant name that did not follow the naming convention.

Differential Revision: https://reviews.llvm.org/D150361
* [libc][benchmark] Do not force static linking | Guillaume Chatelet | 2023-05-11 | 1 | -1/+0

Being able to link statically depends on other CMake options and choice of libc.
* [libc] Allows cross compilation of membenchmarks | Guillaume Chatelet | 2023-05-11 | 2 | -9/+18

This patch makes sure:
- we pass the correct compiler options when building Google benchmarks,
- we only import the C++ version of the memory functions.

The change in libc/cmake/modules/LLVMLibCTestRules.cmake is here to make sure CMake can generate the right command line in the presence of the CMAKE_CROSSCOMPILING_EMULATOR option.

Relevant documentation:
https://cmake.org/cmake/help/latest/variable/CMAKE_CROSSCOMPILING_EMULATOR.html
https://cmake.org/cmake/help/latest/command/add_custom_command.html#command:add_custom_command

"
If COMMAND specifies an executable target name (created by the `add_executable()` command), it will automatically be replaced by the location of the executable created at build time if either of the following is true:
- The target is not being cross-compiled (i.e. the CMAKE_CROSSCOMPILING variable is not set to true).
- New in version 3.6: The target is being cross-compiled and an emulator is provided (i.e. its CROSSCOMPILING_EMULATOR target property is set). In this case, the contents of CROSSCOMPILING_EMULATOR will be prepended to the command before the location of the target executable.
"

Reviewed By: gchatelet
Differential Revision: https://reviews.llvm.org/D150200
* [libc][rpc] Allocate a single block of shared memory instead of three | Jon Chesterfield | 2023-05-11 | 6 | -61/+78

Allows moving the pointer swap between server and client into reset. A single allocation simplifies whatever allocates the client/server, which is currently the libc loaders.

Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D150337
* [libc] Fix RPC interface when sending and recieving aribtrary packets | Joseph Huber | 2023-05-10 | 4 | -3/+103

The interface exported by the RPC library allows users to simply send and receive fixed sized packets without worrying about the data motion underneath. However, this was broken in the current implementation. We can think of the send and receive implementations in terms of waiting for ownership of the buffer, using the buffer, and posting ownership to the other side. Our implementation of `recv` was incorrect in the following scenarios.

recv -> send // we still own the buffer and should give away ownership
recv -> close // The other side is not waiting for data, this will result in multiple openings of the same port

This patch attempts to fix this with an admittedly hacky fix where we track if the previous operation was a recv and post conditionally.

Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D150327
* [libc][rpc] Allocate locks array within process | Jon Chesterfield | 2023-05-11 | 6 | -24/+15

Replaces the globals currently used. It is worth changing this to a bitmap before allowing a runtime number of ports >> 64. One bit per port is likely to be cheap enough that sizing for the worst case is always fine; otherwise, in the future we can change to dynamically allocating it.

Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D150309
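The bitmap mentioned above is a possible future change, not something this commit implements. As a minimal sketch of what one lock bit per port could look like for up to 64 ports, assuming a hypothetical `PortLocks` class built on a single atomic word:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical sketch: one lock bit per port, limited to 64 ports.
class PortLocks {
public:
  // Tries to take the lock for `port`. Returns true if the bit was clear
  // before this call, meaning the caller now holds the lock.
  bool try_lock(unsigned port) {
    assert(port < 64);
    const std::uint64_t mask = std::uint64_t{1} << port;
    return (bits_.fetch_or(mask, std::memory_order_acquire) & mask) == 0;
  }

  // Releases the lock for `port` by clearing its bit.
  void unlock(unsigned port) {
    assert(port < 64);
    const std::uint64_t mask = std::uint64_t{1} << port;
    bits_.fetch_and(~mask, std::memory_order_release);
  }

private:
  std::atomic<std::uint64_t> bits_{0};
};
```

Supporting more than 64 ports would need an array of such words, indexed by `port / 64`, which is where the sizing question raised in the message comes in.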
* [libc] Prevent changing ownership of the port once opened | Joseph Huber | 2023-05-10 | 1 | -2/+8

The Port type has stipulations that the same exact mask used to open it needs to close it. This can currently be violated by calling its move constructor to put it somewhere else. We still need the move constructor to handle the open and closing functions. So, we simply make these constructors private and only allow a few classes to have move privileges on it.

Reviewed By: JonChesterfield, lntue
Differential Revision: https://reviews.llvm.org/D150118
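The pattern described here, private constructors plus a short list of friends, can be sketched as follows. The `Port` and `Client` classes below are simplified stand-ins with hypothetical members, not the real RPC types.

```cpp
#include <cstdint>
#include <type_traits>

class Client;

// Hypothetical, simplified Port: only friends may construct or move it.
class Port {
public:
  Port(const Port &) = delete;
  Port &operator=(const Port &) = delete;

  std::uint64_t mask() const { return mask_; }

private:
  friend class Client; // The only class allowed to create or move a Port.

  explicit Port(std::uint64_t mask) : mask_(mask) {}
  Port(Port &&) = default;

  std::uint64_t mask_;
};

// Hypothetical owner of the open operation.
class Client {
public:
  Port open(std::uint64_t mask) {
    Port port(mask);
    return port; // Uses the private move constructor, allowed via friendship.
  }
};

// Code outside the friend list cannot move a Port somewhere else.
static_assert(!std::is_move_constructible<Port>::value,
              "Port must not be movable by arbitrary code");
```

Callers can still receive a `Port` from `open` and use it in place, but an attempt to `std::move` it into another object fails to compile.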
* [libc] Add optimized memcpy for RISCVGuillaume Chatelet2023-05-105-8/+191
| This patch adds two versions of memcpy optimized for architectures where unaligned accesses are either illegal or extremely slow.
| It is currently enabled for RISCV 64 and RISCV 32 but it could be used for ARM 32 architectures as well.
|
| Here is the before / after output of `libc.benchmarks.memory_functions.opt_host --benchmark_filter=BM_Memcpy` on a quad core Linux starfive RISCV 64 board running at 1.5GHz.
|
| Before:
| ```
| Run on (4 X 1500 MHz CPU s)
| CPU Caches:
|   L1 Instruction 32 KiB (x4)
|   L1 Data 32 KiB (x4)
|   L2 Unified 2048 KiB (x1)
| ------------------------------------------------------------------------
| Benchmark Time CPU Iterations UserCounters...
| ------------------------------------------------------------------------
| BM_Memcpy/0/0 474 ns 474 ns 1483776 bytes_per_cycle=0.243492/s bytes_per_second=348.318M/s items_per_second=2.11097M/s __llvm_libc::memcpy,memcpy Google A
| BM_Memcpy/1/0 210 ns 209 ns 3649536 bytes_per_cycle=0.233819/s bytes_per_second=334.481M/s items_per_second=4.77519M/s __llvm_libc::memcpy,memcpy Google B
| BM_Memcpy/2/0 1814 ns 1814 ns 396288 bytes_per_cycle=0.247899/s bytes_per_second=354.622M/s items_per_second=551.402k/s __llvm_libc::memcpy,memcpy Google D
| BM_Memcpy/3/0 89.3 ns 89.2 ns 7459840 bytes_per_cycle=0.217415/s bytes_per_second=311.014M/s items_per_second=11.2071M/s __llvm_libc::memcpy,memcpy Google L
| BM_Memcpy/4/0 134 ns 134 ns 3815424 bytes_per_cycle=0.226584/s bytes_per_second=324.131M/s items_per_second=7.44567M/s __llvm_libc::memcpy,memcpy Google M
| BM_Memcpy/5/0 52.8 ns 52.6 ns 11001856 bytes_per_cycle=0.194893/s bytes_per_second=278.797M/s items_per_second=19.0284M/s __llvm_libc::memcpy,memcpy Google Q
| BM_Memcpy/6/0 180 ns 180 ns 4101120 bytes_per_cycle=0.231884/s bytes_per_second=331.713M/s items_per_second=5.55957M/s __llvm_libc::memcpy,memcpy Google S
| BM_Memcpy/7/0 195 ns 195 ns 3906560 bytes_per_cycle=0.232951/s bytes_per_second=333.239M/s items_per_second=5.1217M/s __llvm_libc::memcpy,memcpy Google U
| BM_Memcpy/8/0 152 ns 152 ns 4789248 bytes_per_cycle=0.227507/s bytes_per_second=325.452M/s items_per_second=6.58187M/s __llvm_libc::memcpy,memcpy Google W
| BM_Memcpy/9/0 6036 ns 6033 ns 118784 bytes_per_cycle=0.249158/s bytes_per_second=356.423M/s items_per_second=165.75k/s __llvm_libc::memcpy,uniform 384 to 4096
| ```
|
| After:
| ```
| BM_Memcpy/0/0 126 ns 126 ns 5770240 bytes_per_cycle=1.04707/s bytes_per_second=1.46273G/s items_per_second=7.9385M/s __llvm_libc::memcpy,memcpy Google A
| BM_Memcpy/1/0 75.1 ns 75.0 ns 10204160 bytes_per_cycle=0.691143/s bytes_per_second=988.687M/s items_per_second=13.3289M/s __llvm_libc::memcpy,memcpy Google B
| BM_Memcpy/2/0 333 ns 333 ns 2174976 bytes_per_cycle=1.39297/s bytes_per_second=1.94596G/s items_per_second=3.00002M/s __llvm_libc::memcpy,memcpy Google D
| BM_Memcpy/3/0 49.6 ns 49.5 ns 16092160 bytes_per_cycle=0.710161/s bytes_per_second=1015.89M/s items_per_second=20.1844M/s __llvm_libc::memcpy,memcpy Google L
| BM_Memcpy/4/0 57.7 ns 57.7 ns 11213824 bytes_per_cycle=0.561557/s bytes_per_second=803.314M/s items_per_second=17.3228M/s __llvm_libc::memcpy,memcpy Google M
| BM_Memcpy/5/0 48.0 ns 47.9 ns 16437248 bytes_per_cycle=0.346708/s bytes_per_second=495.97M/s items_per_second=20.8571M/s __llvm_libc::memcpy,memcpy Google Q
| BM_Memcpy/6/0 67.5 ns 67.5 ns 10616832 bytes_per_cycle=0.614173/s bytes_per_second=878.582M/s items_per_second=14.8142M/s __llvm_libc::memcpy,memcpy Google S
| BM_Memcpy/7/0 84.7 ns 84.6 ns 10480640 bytes_per_cycle=0.819077/s bytes_per_second=1.14424G/s items_per_second=11.8174M/s __llvm_libc::memcpy,memcpy Google U
| BM_Memcpy/8/0 61.7 ns 61.6 ns 11191296 bytes_per_cycle=0.550078/s bytes_per_second=786.893M/s items_per_second=16.2279M/s __llvm_libc::memcpy,memcpy Google W
| BM_Memcpy/9/0 981 ns 981 ns 703488 bytes_per_cycle=1.52333/s bytes_per_second=2.12807G/s items_per_second=1019.81k/s __llvm_libc::memcpy,uniform 384 to 4096
| ```
|
| It is not as good as glibc for now so there's room for improvement. I suspect a path pumping 16 bytes at once given the doubled numbers for large copies.
| ```
| BM_Memcpy/0/1 146 ns 82.5 ns 8576000 bytes_per_cycle=1.35236/s bytes_per_second=1.88922G/s items_per_second=12.1169M/s glibc memcpy,memcpy Google A
| BM_Memcpy/1/1 112 ns 63.7 ns 10634240 bytes_per_cycle=0.628018/s bytes_per_second=898.387M/s items_per_second=15.702M/s glibc memcpy,memcpy Google B
| BM_Memcpy/2/1 315 ns 180 ns 4079616 bytes_per_cycle=2.65229/s bytes_per_second=3.7052G/s items_per_second=5.54764M/s glibc memcpy,memcpy Google D
| BM_Memcpy/3/1 85.3 ns 43.1 ns 15854592 bytes_per_cycle=0.774164/s bytes_per_second=1107.45M/s items_per_second=23.2249M/s glibc memcpy,memcpy Google L
| BM_Memcpy/4/1 105 ns 54.3 ns 13427712 bytes_per_cycle=0.7793/s bytes_per_second=1114.8M/s items_per_second=18.4109M/s glibc memcpy,memcpy Google M
| BM_Memcpy/5/1 77.1 ns 43.2 ns 16476160 bytes_per_cycle=0.279808/s bytes_per_second=400.269M/s items_per_second=23.1428M/s glibc memcpy,memcpy Google Q
| BM_Memcpy/6/1 112 ns 62.7 ns 11236352 bytes_per_cycle=0.676078/s bytes_per_second=967.137M/s items_per_second=15.9387M/s glibc memcpy,memcpy Google S
| BM_Memcpy/7/1 131 ns 65.5 ns 11751424 bytes_per_cycle=0.965616/s bytes_per_second=1.34895G/s items_per_second=15.2762M/s glibc memcpy,memcpy Google U
| BM_Memcpy/8/1 104 ns 55.0 ns 12314624 bytes_per_cycle=0.583336/s bytes_per_second=834.468M/s items_per_second=18.1937M/s glibc memcpy,memcpy Google W
| BM_Memcpy/9/1 932 ns 466 ns 1480704 bytes_per_cycle=3.17342/s bytes_per_second=4.43321G/s items_per_second=2.14679M/s glibc memcpy,uniform 384 to 4096
| ```
|
| Reviewed By: sivachandra
|
| Differential Revision: https://reviews.llvm.org/D150202
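To illustrate the idea of a memcpy that never issues an unaligned wide access, here is a minimal sketch. It is not the code from D150202: the name `memcpy_aligned_sketch`, the choice of `std::uintptr_t` as the word type, and the fallback to byte copies when source and destination are misaligned relative to each other are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical word type for the sketch; one machine word per access.
using word_t = std::uintptr_t;

// Returns true when p sits on a word boundary.
inline bool is_word_aligned(const void *p) {
  return reinterpret_cast<std::uintptr_t>(p) % sizeof(word_t) == 0;
}

// Illustrative only: copies n bytes using word accesses solely on
// word-aligned addresses, and single bytes everywhere else.
void *memcpy_aligned_sketch(void *dst, const void *src, std::size_t n) {
  assert(n == 0 || (dst != nullptr && src != nullptr));
  auto *d = static_cast<unsigned char *>(dst);
  const auto *s = static_cast<const unsigned char *>(src);

  // Head: byte copies until the destination reaches a word boundary.
  while (n > 0 && !is_word_aligned(d)) {
    *d++ = *s++;
    --n;
  }

  // Body: only when the source is aligned as well, move whole words.
  // The fixed-size std::memcpy calls express a word load and store
  // without violating aliasing rules.
  if (is_word_aligned(s)) {
    while (n >= sizeof(word_t)) {
      word_t w;
      std::memcpy(&w, s, sizeof(w));
      std::memcpy(d, &w, sizeof(w));
      s += sizeof(w);
      d += sizeof(w);
      n -= sizeof(w);
    }
  }

  // Tail, or the whole remainder when the pointers are mutually misaligned.
  while (n > 0) {
    *d++ = *s++;
    --n;
  }
  return dst;
}
```

A production version would also handle the mutually misaligned case with aligned loads and shifts rather than byte copies; the sketch leaves that out to stay short.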
* [libc][NFC] Simplify string-table generation internals.Siva Chandra Reddy2023-05-083-26/+20
| | | | | | Reviewed By: michaelrj Differential Revision: https://reviews.llvm.org/D150088
* [libc][rpc][nfc] Encapsulate access to outbox pointerJon Chesterfield2023-05-081-7/+21
| | | | | | Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D150065
* [libc] Make the opcode parameter a compile time constantJoseph Huber2023-05-084-11/+10
| | | | | | | | | | | | Currently the opcode is only valid if it is the same between all of the ports. This invariant can be violated if the opcode is placed into a memory location and then read in a non-uniform manner by the warp / wavefront. Moving this to a compile time constant makes it impossible to break this invariant. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D150115
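The effect of making the opcode a compile time constant can be sketched with a template parameter. The names `PortSketch`, `open_port`, and `open_port_runtime` are hypothetical and do not reflect the actual RPC interface; the sketch only shows why a value fixed at compile time cannot differ between lanes.

```cpp
#include <cstdint>

// Hypothetical stand-in for an RPC port; not the llvm-libc type.
struct PortSketch {
  std::uint16_t opcode;
};

// The opcode is part of the function's type, so every thread that calls
// open_port<N>() necessarily sees the same value.
template <std::uint16_t Opcode> constexpr PortSketch open_port() {
  return PortSketch{Opcode};
}

// For contrast: a runtime argument could be loaded from memory and end up
// different per lane, which is the failure mode described above.
constexpr PortSketch open_port_runtime(std::uint16_t opcode) {
  return PortSketch{opcode};
}
```

A call such as `open_port<7>()` yields a port whose opcode is 7 on every lane, with no memory read involved.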
* [libc] Use Linux errno and signal strings for FuchsiaRoland McGrath2023-05-072-14/+28
| | | | | | | | | | | | The exact set of supported values is determined by the <errno.h> and <signal.h> headers, which don't (yet) come from llvm-libc on Fuchsia. The mappings of SIG* and E* codes to psignal/strsignal and perror/strerror text used in Fuchsia libc today are the same as for Linux. Reviewed By: abrachet Differential Revision: https://reviews.llvm.org/D150026
* [libc] Fix typos in documentationKazu Hirata2023-05-064-4/+4
|
* Revert "Reland "[CMake] Bumps minimum version to 3.20.0.""Mark de Wever2023-05-062-2/+2
| | | | | | Unfortunately not all buildbots are updated. This reverts commit ffb807ab5375b3f78df198dc5d4302b3b552242f.
* Reland "[CMake] Bumps minimum version to 3.20.0."Mark de Wever2023-05-062-2/+2
| | | | | | All build bots should be updated now. This reverts commit 44d38022ab29a3156349602733b3459df5beef93.
* [libc][docs] Fix incorrect CMake argument in GPU documentationJoseph Huber2023-05-051-2/+2
| | | | | | | Summary: This was changed a long time ago to drop the `LLVM_` prefix. Differential Revision: https://reviews.llvm.org/D150012
* [libc] Make the RPC interfaces move onlyJoseph Huber2023-05-051-9/+13
| | | | | | | | | | | | | This patch uses the changed interface in D149972 to make these classes move-only. The `Port` class could be further refined to be construct-only in a future patch, but for now this makes it more difficult to misuse the interface. Depends on D149972 Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D149974
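As a minimal sketch of what "move-only" means for such a class: copy operations are deleted and move operations are kept. `ClientSketch` and its `port_id` member are hypothetical names for illustration, not the types touched by D149974.

```cpp
#include <type_traits>

// Hypothetical move-only handle; not the llvm-libc RPC class.
struct ClientSketch {
  ClientSketch() = default;

  // Copying is rejected at compile time, so two owners can never
  // accidentally share the same underlying state.
  ClientSketch(const ClientSketch &) = delete;
  ClientSketch &operator=(const ClientSketch &) = delete;

  // Moving stays available so the handle can still be returned or stored.
  ClientSketch(ClientSketch &&) = default;
  ClientSketch &operator=(ClientSketch &&) = default;

  int port_id = 0; // illustrative payload only
};

static_assert(!std::is_copy_constructible<ClientSketch>::value,
              "copies are rejected");
static_assert(std::is_move_constructible<ClientSketch>::value,
              "moves are allowed");
```

A construct-only type, as suggested for `Port`, would additionally delete the move operations.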
* [libc] Rework 'cpp:optional' to support move constructionJoseph Huber2023-05-051-43/+81
| | | | | | | | | | | This patch replaces the existing `cpp::optional` type with a newer version that has more features. This class is heavily inspired by the old `llvm::Optional` class. Currently the limitation of this class is that it only handles types with trivial constructors or operators. Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D149972
* [libc] Support concurrent RPC port access on the GPUJoseph Huber2023-05-057-122/+160
| | | | | | | | | | | | | | | | | | | | | | | | | Previously we used a single port to implement the RPC. This was sufficient for single threaded tests but can potentially cause deadlocks when using multiple threads. The reason for this is that GPUs make no forward progress guarantees. Therefore one group of threads waiting on another group of threads can spin forever because there is no guarantee that the other threads will continue executing. The typical workaround for this is to allocate enough memory that a sufficiently large number of work groups can make progress. As long as this number is somewhat close to the amount of total concurrency we can obtain reliable execution around a shared resource. This patch enables using multiple ports by widening the arrays to a predetermined size and indexing into them. Empty ports are currently obtained via a trivial linear scan. This should be improved in the future for performance reasons. Portions of D148191 were applied to achieve parallel support. Depends on D149581 Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D149598
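The port lookup described above can be sketched as follows, reading the "scan" as a linear walk over a fixed-size array of per-port lock flags (an assumption). `PortTableSketch`, `kPortCount`, `try_open`, and `close` are hypothetical names for this illustration and are not the interface added by D149598.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical, predetermined number of ports.
constexpr std::size_t kPortCount = 4;

// Illustrative table of ports, each guarded by one atomic flag.
struct PortTableSketch {
  std::array<std::atomic<bool>, kPortCount> in_use{};

  // Walks the array front to back and claims the first free slot.
  // exchange() returns the previous value, so `false` means we won it.
  std::optional<std::size_t> try_open() {
    for (std::size_t i = 0; i < kPortCount; ++i)
      if (!in_use[i].exchange(true, std::memory_order_acquire))
        return i;
    return std::nullopt; // every port is busy; the caller may retry
  }

  // Releases a previously claimed slot.
  void close(std::size_t i) {
    assert(i < kPortCount);
    in_use[i].store(false, std::memory_order_release);
  }
};
```

The scan costs time proportional to the number of ports in the worst case, which is consistent with the commit's note that the lookup should be revisited for performance.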
* [libc] Fix hanging test on NVPTX due to lack of warp syncJoseph Huber2023-05-041-4/+0
| | | | | | | | | Previously this wasn't implemented because it's effectively a no-op. However, this should be safe to emit on sm_60 architectures. It's important because it carries semantic importance for whether or not something can be moved. So we should always emit this intrinsic. Differential Revision: https://reviews.llvm.org/D149923
* [libc] Change GPU startup and loader to use multiple kernelsJoseph Huber2023-05-045-267/+242
| | | | | | | | | | | | | | | | | | | The GPU has a different execution model to standard `_start` implementations. On the GPU, all threads are active at the start of a kernel. In order to correctly initialize and call the constructors we want single threaded semantics. Previously, this was done using a makeshift global barrier with atomics. However, it should be easier to simply put the portions of the code that must be single threaded in separate kernels and then call those with only one thread. Generally, mixing global state between kernel launches makes optimizations more difficult, similarly to calling a function outside of the TU, but for testing it is better to be correct. Depends on D149527 D148943 Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D149581
* [libc] Enable multiple threads to use RPC on the GPUJoseph Huber2023-05-0411-42/+184
| | | | | | | | | | | | | | | | | | | The execution model of the GPU expects that groups of threads will execute in lock-step in SIMD fashion. It's important for both performance and correctness that we treat this as the smallest possible granularity for an RPC operation. Thus, we map multiple threads to a single larger buffer and ship that across the wire. This patch makes the necessary changes to support executing the RPC on the GPU with multiple threads. This requires some workarounds to mimic the model when handling the protocol from the CPU. I'm not completely happy with some of the workarounds required, but I think it should work. Uses some of the implementation details from D148191. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D148943
* [libc] Enable linux directory entries syscalls in riscv64Mikhail R. Gadelha2023-05-046-6/+15
| | | | | | | | | | This patch updates the struct dirent to be on par with glibc (by adding a missing d_type member) and updates the readdir call to use SYS_getdents64 instead of SYS_getdents. Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D147738
* [libc][rpc] Update locking to work on voltaJon Chesterfield2023-05-046-5/+81
| | | | | | | | | | | | | Carefully work around not knowing the thread mask that nvptx intrinsic functions require. If the warp is converged when calling try_lock, a single rpc call will handle all lanes within it. Otherwise more than one rpc call with thread masks that compose to the unknown one will occur. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D149897
* Revert "[libc][rpc] Update locking to work on volta"Jon Chesterfield2023-05-046-71/+5
| | | | This reverts commit b1323738649e96aac943f3773ec7336df110eea5.
* [libc][rpc] Update locking to work on voltaJon Chesterfield2023-05-046-5/+71
| | | | | | | | | | | | | Carefully work around not knowing the thread mask that nvptx intrinsic functions require. If the warp is converged when calling try_lock, a single rpc call will handle all lanes within it. Otherwise more than one rpc call with thread masks that compose to the unknown one will occur. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D149897
* Revert "[libc][rpc] Land helpers from D148943"Jon Chesterfield2023-05-041-10/+0
| | | | This reverts commit 09ceb4729f1ca8781718d41b7876b68820baadba.
* [libc][rpc] Land helpers from D148943Jon Chesterfield2023-05-041-0/+10
|
* [libc] Remove support for atomic test due to failing on sm_60Joseph Huber2023-05-041-9/+13
| | | | | | | This test fails on sm_60 because of the atomics codegen. We test atomics indirectly with the `rpc` so we still have coverage. Differential Revision: https://reviews.llvm.org/D149887