opcache: Patch SSE based fast_memcpy() implementation

Use _mm_store_si128() instead of _mm_stream_si128(). This keeps the copied data in the data cache, which helps because the interpreter starts using that data right away and does not have to go back to memory for it. The _mm_stream* intrinsics are intended for stores that should bypass the cache and avoid polluting it; in our scenario, keeping the data in cache appears to have a positive impact.

Tests on WordPress 4.1 show a ~1% performance increase with fast_memcpy() in place versus standard memcpy() when running php-cgi -T10000 wordpress/index.php.

I also added software prefetching for the target memory, but its contribution is almost negligible: the prefetched address is used within a couple of cycles (at the next iteration), while the data from memory only becomes available after >100 cycles.
author: Bogdan Andone <bogdan.andone@intel.com> 2015-07-29 14:19:04 +0300
committer: Bogdan Andone <bogdan.andone@intel.com> 2015-07-29 14:51:57 +0300
commit: 68185bafbe2c7ab025703917d259c4c19ce456eb (patch)
tree: 7925372b64ce2f9ac0ff88079384cbd76b625569
parent: 4e66cce87ce0e57a7394486412e61abcfc5f3520 (diff)
download: php-git-68185bafbe2c7ab025703917d259c4c19ce456eb.tar.gz
1 files changed, 5 insertions, 4 deletions
diff --git a/ext/opcache/zend_accelerator_util_funcs.c b/ext/opcache/zend_accelerator_util_funcs.c
index e20f3d16f6..cfb03a00e4 100644
--- a/ext/opcache/zend_accelerator_util_funcs.c
+++ b/ext/opcache/zend_accelerator_util_funcs.c
@@ -658,16 +658,17 @@ static zend_always_inline void fast_memcpy(void *dest, const void *src, size_t s
 
 	do {
 		_mm_prefetch(dqsrc + 4, _MM_HINT_NTA);
+		_mm_prefetch(dqdest + 4, _MM_HINT_T0);
 
 		__m128i xmm0 = _mm_load_si128(dqsrc + 0);
 		__m128i xmm1 = _mm_load_si128(dqsrc + 1);
 		__m128i xmm2 = _mm_load_si128(dqsrc + 2);
 		__m128i xmm3 = _mm_load_si128(dqsrc + 3);
 		dqsrc  += 4;
-		_mm_stream_si128(dqdest + 0, xmm0);
-		_mm_stream_si128(dqdest + 1, xmm1);
-		_mm_stream_si128(dqdest + 2, xmm2);
-		_mm_stream_si128(dqdest	+ 3, xmm3);
+		_mm_store_si128(dqdest + 0, xmm0);
+		_mm_store_si128(dqdest + 1, xmm1);
+		_mm_store_si128(dqdest + 2, xmm2);
+		_mm_store_si128(dqdest + 3, xmm3);
 		dqdest += 4;
 	} while (dqsrc != end);
 }
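
To compare the two store variants outside of opcache, here is a minimal standalone sketch. It is not the opcache code itself. The function names copy64_cached() and copy64_streamed() are hypothetical. The preconditions (16-byte-aligned pointers, size a nonzero multiple of 64) are assumptions, because the part of fast_memcpy() that sets up dqsrc, dqdest and end is not shown in the hunk above. The _mm_sfence() after the streaming loop is also an addition of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* _mm_prefetch, _mm_sfence */
#include <emmintrin.h>  /* __m128i, _mm_load_si128, _mm_store_si128, _mm_stream_si128 */

/* Hypothetical helper: regular stores, so the destination lines stay in cache
 * (the behaviour the patch switches to). */
static void copy64_cached(void *dest, const void *src, size_t size)
{
	/* Assumed preconditions of this sketch, not taken from the patch. */
	assert(size > 0 && size % 64 == 0);
	assert((uintptr_t)dest % 16 == 0 && (uintptr_t)src % 16 == 0);

	__m128i *dqdest = (__m128i *)dest;
	const __m128i *dqsrc = (const __m128i *)src;
	const __m128i *end = dqsrc + size / 16;

	do {
		/* Prefetch the 64 bytes that the next iteration will touch. */
		_mm_prefetch((const char *)(dqsrc + 4), _MM_HINT_NTA);
		_mm_prefetch((const char *)(dqdest + 4), _MM_HINT_T0);

		__m128i xmm0 = _mm_load_si128(dqsrc + 0);
		__m128i xmm1 = _mm_load_si128(dqsrc + 1);
		__m128i xmm2 = _mm_load_si128(dqsrc + 2);
		__m128i xmm3 = _mm_load_si128(dqsrc + 3);
		dqsrc += 4;
		_mm_store_si128(dqdest + 0, xmm0);
		_mm_store_si128(dqdest + 1, xmm1);
		_mm_store_si128(dqdest + 2, xmm2);
		_mm_store_si128(dqdest + 3, xmm3);
		dqdest += 4;
	} while (dqsrc != end);
}

/* Hypothetical helper: non-temporal stores that bypass the cache
 * (the behaviour before the patch). */
static void copy64_streamed(void *dest, const void *src, size_t size)
{
	assert(size > 0 && size % 64 == 0);
	assert((uintptr_t)dest % 16 == 0 && (uintptr_t)src % 16 == 0);

	__m128i *dqdest = (__m128i *)dest;
	const __m128i *dqsrc = (const __m128i *)src;
	const __m128i *end = dqsrc + size / 16;

	do {
		_mm_prefetch((const char *)(dqsrc + 4), _MM_HINT_NTA);

		__m128i xmm0 = _mm_load_si128(dqsrc + 0);
		__m128i xmm1 = _mm_load_si128(dqsrc + 1);
		__m128i xmm2 = _mm_load_si128(dqsrc + 2);
		__m128i xmm3 = _mm_load_si128(dqsrc + 3);
		dqsrc += 4;
		_mm_stream_si128(dqdest + 0, xmm0);
		_mm_stream_si128(dqdest + 1, xmm1);
		_mm_stream_si128(dqdest + 2, xmm2);
		_mm_stream_si128(dqdest + 3, xmm3);
		dqdest += 4;
	} while (dqsrc != end);

	/* Added in this sketch: order the streaming stores before later accesses. */
	_mm_sfence();
}
```

Both helpers produce the same bytes in the destination. They differ only in whether the written lines end up in the data cache, so any timing difference has to be measured on the workload that reads the copy afterwards, as the commit message does with php-cgi.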