Hi Guys,
My name is Bogdan Andone and I work for Intel in the area of SW performance analysis and
optimizations.
We would like to actively contribute to the Zend PHP project and get involved in finding new
performance improvement opportunities based on available and/or new hardware features.
I am still in the source-code digesting phase, but I had a look at the fast_memcpy() implementation
in the opcache extension, which uses SSE intrinsics.
If I am not mistaken, the fast_memcpy() function is not currently used, as I didn't find the
"-msse4.2" gcc flag in the Makefile. I assume you didn't see any performance benefit and therefore
kept the generic memcpy().
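A minimal sketch of why the SSE path can end up unused, assuming the guard is a preprocessor check on the compiler-defined __SSE4_2__ macro. GCC defines this macro only when -msse4.2 (or a -march that implies it) is passed. The names copy_simd(), guarded_copy() and copy_path() are hypothetical and are not the opcache code. The SIMD path assumes 16-byte aligned pointers and a size that is a multiple of 16.

```c
#include <stddef.h>
#include <string.h>

#if defined(__SSE4_2__)
#include <emmintrin.h> /* SSE2 intrinsics, always available when SSE4.2 is enabled */

/* Hypothetical SIMD path in the style of the current code: 16 bytes per
 * iteration with non-temporal stores. Assumes 16-byte aligned pointers
 * and size % 16 == 0. */
static void copy_simd(void *dest, const void *src, size_t size)
{
    __m128i *d = (__m128i *)dest;
    const __m128i *s = (const __m128i *)src;

    for (size_t i = 0; i < size / 16; i++)
        _mm_stream_si128(d + i, _mm_load_si128(s + i));
    _mm_sfence(); /* order the weakly ordered streaming stores before later stores */
}
#endif

/* Hypothetical wrapper: without -msse4.2, __SSE4_2__ is undefined, the SIMD
 * path is compiled out, and every call falls back to the generic memcpy(). */
static void guarded_copy(void *dest, const void *src, size_t size)
{
#if defined(__SSE4_2__)
    copy_simd(dest, src, size);
#else
    memcpy(dest, src, size);
#endif
}

/* Reports which path this translation unit was built with. */
static const char *copy_path(void)
{
#if defined(__SSE4_2__)
    return "sse4.2";
#else
    return "memcpy";
#endif
}
```

Building the same file with and without -msse4.2 changes only what copy_path() reports; the copied bytes are identical either way.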
I would like to propose a slightly different implementation that uses _mm_store_si128() instead of
_mm_stream_si128(). This keeps the copied memory in the data cache, which is desirable because the
interpreter will start using this data right away, without having to go back to memory one more time.
The _mm_stream_si128() used in the current implementation is a non-temporal store, intended for
cases where we want to avoid reading the destination into the cache and polluting it; in the opcache
scenario, it seems that keeping the data in the cache has a positive impact.
Running php-cgi -T10000 on WordPress4.1/index.php, I see a ~1% performance increase for the new
version of fast_memcpy() compared with the generic memcpy(). I get the same result with a full load
test using http_load on an 18-core Haswell EP.
Here is the proposed pull request: https://github.com/php/php-src/pull/1446
Regarding the SW prefetching instructions in fast_memcpy(): they are not really useful in this
place. Their benefit is almost negligible, because the address requested for prefetch will be needed
in the next iteration (a few cycles later), while fetching data from RAM usually takes more than 100
cycles. Nevertheless, they don't hurt and still seem to have a very small benefit, so I preserved
the original instruction and added a new prefetch request for the destination pointer.
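A rough way to see why a one-iteration look-ahead is too short: a prefetch only hides the latency if it is issued about ceil(latency / cycles_per_iteration) iterations ahead. With ~100 cycles of RAM latency and, as an assumption, ~4 cycles per 64-byte iteration, that is 25 iterations rather than 1. The sketch below uses hypothetical names, prefetch_distance() and copy_prefetch(). It prefetches both source and destination `dist` iterations ahead. The _MM_HINT_T0 hint is an assumption, not necessarily what the patch uses. It also assumes x86-64, 16-byte aligned pointers, and size % 64 == 0.

```c
#include <emmintrin.h> /* SSE2 loads/stores; also pulls in _mm_prefetch */
#include <stddef.h>

/* Look-ahead, in iterations, needed for a prefetch to complete before its
 * data is used: ceil(latency_cycles / cycles_per_iter). */
static size_t prefetch_distance(size_t latency_cycles, size_t cycles_per_iter)
{
    return (latency_cycles + cycles_per_iter - 1) / cycles_per_iter;
}

/* Hypothetical copy, 64 bytes per iteration, prefetching source and
 * destination `dist` iterations ahead (dist == 1 means "next iteration"). */
static void copy_prefetch(void *dest, const void *src, size_t size, size_t dist)
{
    __m128i *d = (__m128i *)dest;
    const __m128i *s = (const __m128i *)src;
    size_t n = size / 64; /* number of 64-byte blocks */

    for (size_t i = 0; i < n; i++) {
        if (i + dist < n) { /* only form pointers inside the buffers */
            _mm_prefetch((const char *)(s + 4 * (i + dist)), _MM_HINT_T0);
            _mm_prefetch((const char *)(d + 4 * (i + dist)), _MM_HINT_T0);
        }
        __m128i x0 = _mm_load_si128(s + 4 * i + 0);
        __m128i x1 = _mm_load_si128(s + 4 * i + 1);
        __m128i x2 = _mm_load_si128(s + 4 * i + 2);
        __m128i x3 = _mm_load_si128(s + 4 * i + 3);
        _mm_store_si128(d + 4 * i + 0, x0);
        _mm_store_si128(d + 4 * i + 1, x1);
        _mm_store_si128(d + 4 * i + 2, x2);
        _mm_store_si128(d + 4 * i + 3, x3);
    }
}
```

For example, prefetch_distance(100, 4) gives 25. A buffer shorter than 25 blocks would therefore never benefit from a correctly distanced prefetch at all, which is consistent with the small effect observed.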
Hope it helps,
Bogdan