diff options
author | Daniel Dragan <bulk88@hotmail.com> | 2012-10-24 16:15:51 -0400 |
---|---|---|
committer | Father Chrysostomos <sprout@cpan.org> | 2012-10-25 20:02:55 -0700 |
commit | f019c49e380f764c1ead36fe3602184804292711 (patch) | |
tree | a4a94ed46a4110e81d72daab995701db6efa1e6d /handy.h | |
parent | 3ed356df9354193bbcc5202f066f3c07ae84b443 (diff) | |
download | perl-f019c49e380f764c1ead36fe3602184804292711.tar.gz |
optimize memory wrap croaks, often used in MEM_WRAP_CHECK
Most perls are built with PERL_MALLOC_WRAP. This causes MEM_WRAP_CHECK
macro to perform some checks on the requested allocation size in macro
Newx. The checks are performed at the caller, not in the callee (for me
on Win32 perl the callee in Newx is Perl_safesysmalloc) of Newx.
If the check fails a "Perl_croak_nocontext("%s",PL_memory_wrap)" is done.
In x86 machine code,
"if(bad_alloc) Perl_croak_nocontext("%s",PL_memory_wrap); will be written
as "cond jmp ahead ~15 bytes", "push const pointer", "push const pointer",
"call const pointer". For each Newx where the allocation amount was not a
constant (constant folding would remove the croak memory wrap branch
compleatly), the branch takes 15-19 bytes depending on x86 compiler. There
are about 80 Newx'es in the interp (win32 dynamic linking perl) that do
the memory wrap check and have a
"Perl_croak_nocontext("%s",PL_memory_wrap)" in them after all optimizations
by the compiler.
This patch reduces the memory wrap branch from 15-19 to
5 bytes on x86. Since croak_memory_wrap is a static and a noreturn, a
compiler with IPO may optimize the whole branch to "cond jmp 32 bits
relative" at each callsite. A less optimal complier may do "cond jmp 8 bits
relative (jump past the "call S_croak_memory_wrap" instruction),
then "call S_croak_memory_wrap". Both ways are better than the current
situation. The reason why croak_memory_wrap is a static and not an export
is that the compiler has more opportunity to optimize/reduce the impact of
the memory wrap branch at the call site if the target is in the same image
rather than in a different image, which would require using the platform
specific dynamic linking mechanism/export table/etc, which often requires
a new stack frame per ABI of the platform. If a dynamic linked XS module
does not use S_croak_memory_wrap it will be removed from the image by the
C compiler. If it is included in the XS image, it is a very small block
of code and a 3 byte string litteral. A CPU cache line is typically
32 or 64 bytes and a memory read is typically 16. Cutting the
instructions by 10 to 16 bytes out of "hot code" (10 of the ~80 call
sites are pp_*) is a worthy goal. In a few places the memory wrap croak is
used explictly, not from a MEM_WRAP_CHECK, this patch converts those to use
the static. If PERL_MALLOC_WRAP is undef, there are still a couple uses of
croak memory wrap, so do not keep S_croak_memory_wrap in a ifdef
PERL_MALLOC_WRAP. Also see
http://www.nntp.perl.org/group/perl.perl5.porters/2012/10/msg194383.html
and [perl #115456].
Diffstat (limited to 'handy.h')
-rw-r--r-- | handy.h | 6 |
1 files changed, 3 insertions, 3 deletions
@@ -1141,13 +1141,13 @@ PoisonWith(0xEF) for catching access to freed memory. * overly eager compilers that will bleat about e.g. * (U16)n > (size_t)~0/sizeof(U16) always being false. */ #ifdef PERL_MALLOC_WRAP -#define MEM_WRAP_CHECK(n,t) MEM_WRAP_CHECK_1(n,t,PL_memory_wrap) +#define MEM_WRAP_CHECK(n,t) \ + (void)(sizeof(t) > 1 && ((MEM_SIZE)(n)+0.0) > MEM_SIZE_MAX/sizeof(t) && (croak_memory_wrap(),0)) #define MEM_WRAP_CHECK_1(n,t,a) \ (void)(sizeof(t) > 1 && ((MEM_SIZE)(n)+0.0) > MEM_SIZE_MAX/sizeof(t) && (Perl_croak_nocontext("%s",(a)),0)) #define MEM_WRAP_CHECK_(n,t) MEM_WRAP_CHECK(n,t), -#define PERL_STRLEN_ROUNDUP(n) ((void)(((n) > MEM_SIZE_MAX - 2 * PERL_STRLEN_ROUNDUP_QUANTUM) ? (Perl_croak_nocontext("%s",PL_memory_wrap),0):0),((n-1+PERL_STRLEN_ROUNDUP_QUANTUM)&~((MEM_SIZE)PERL_STRLEN_ROUNDUP_QUANTUM-1))) - +#define PERL_STRLEN_ROUNDUP(n) ((void)(((n) > MEM_SIZE_MAX - 2 * PERL_STRLEN_ROUNDUP_QUANTUM) ? (croak_memory_wrap(),0):0),((n-1+PERL_STRLEN_ROUNDUP_QUANTUM)&~((MEM_SIZE)PERL_STRLEN_ROUNDUP_QUANTUM-1))) #else #define MEM_WRAP_CHECK(n,t) |