author | Rafael Cardoso Fernandes Sousa <rafaelcfsousa@ibm.com> | 2022-02-15 19:07:58 -0600 |
---|---|---|
committer | Rafael Cardoso Fernandes Sousa <rafaelcfsousa@ibm.com> | 2022-04-15 22:42:38 -0500 |
commit | a14d04752036c9f1b4eb000d079b27da3bacedf2 (patch) | |
tree | cb918438e3bff523fbff923eed84c9b51686f171 /numpy/core/setup.py | |
parent | 1ab7e8fbf90ac4a81d2ffdde7d78ec464dccb02e (diff) | |
ENH,SIMD: Vectorize modulo/divide using the universal intrinsics
This commit optimizes the operations below:
- fmod (signed/unsigned integers)
- remainder (signed/unsigned integers)
- divmod (signed/unsigned integers)
- floor_divide (signed integers)
using the VSX4/Power10 integer vector division/modulo instructions.
See the improvements below (maximum speedup):
- numpy.fmod
  - arr OP arr: signed (1.17x), unsigned (1.13x)
  - arr OP scalar: signed (1.34x), unsigned (1.29x)
- numpy.remainder
  - arr OP arr: signed (4.19x), unsigned (1.17x)
  - arr OP scalar: signed (4.87x), unsigned (1.29x)
- numpy.divmod
  - arr OP arr: signed (4.73x), unsigned (1.23x)
  - arr OP scalar: signed (5.05x), unsigned (1.31x)
- numpy.floor_divide
  - arr OP arr: signed (4.44x)
The times above were collected using the benchmark tool available in NumPy.
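For readers unfamiliar with how the four operations differ on signed integers, the following is a minimal pure-Python sketch of their scalar semantics. It is not the VSX4 code from this commit. The helper names (`fmod_int`, `remainder_int`, `floor_divide_int`, `divmod_int`) are hypothetical and chosen only for illustration, and the sketch assumes a nonzero divisor.

```python
def fmod_int(a: int, b: int) -> int:
    """Truncated remainder: the result takes the sign of the dividend (as numpy.fmod does)."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


def remainder_int(a: int, b: int) -> int:
    """Floored remainder: the result takes the sign of the divisor (as numpy.remainder does)."""
    return a % b


def floor_divide_int(a: int, b: int) -> int:
    """Quotient rounded toward negative infinity (as numpy.floor_divide does)."""
    return a // b


def divmod_int(a: int, b: int) -> tuple[int, int]:
    """Floored quotient and floored remainder together (as numpy.divmod does)."""
    return floor_divide_int(a, b), remainder_int(a, b)


# The operations agree for operands of the same sign and differ for mixed signs.
print(fmod_int(-7, 3), remainder_int(-7, 3), floor_divide_int(-7, 3), divmod_int(-7, 3))
```

For unsigned operands the truncated and floored results coincide, so `fmod_int` and `remainder_int` return the same value whenever both arguments are non-negative.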
Diffstat (limited to 'numpy/core/setup.py')
-rw-r--r-- | numpy/core/setup.py | 1 |
1 files changed, 1 insertions, 0 deletions
diff --git a/numpy/core/setup.py b/numpy/core/setup.py
index f6b31075d..fe52fde0d 100644
--- a/numpy/core/setup.py
+++ b/numpy/core/setup.py
@@ -1014,6 +1014,7 @@ def configuration(parent_package='',top_path=None):
             join('src', 'umath', 'loops_umath_fp.dispatch.c.src'),
             join('src', 'umath', 'loops_exponent_log.dispatch.c.src'),
             join('src', 'umath', 'loops_hyperbolic.dispatch.c.src'),
+            join('src', 'umath', 'loops_modulo.dispatch.c.src'),
             join('src', 'umath', 'matmul.h.src'),
             join('src', 'umath', 'matmul.c.src'),
             join('src', 'umath', 'clip.h'),