| Commit message | Author | Age | Files | Lines |
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After finding a working TLS storage class specifier, configure was
continuing to test other candidates. This caused it to prefer
__declspec(thread) over __thread. However, __declspec(thread) is
ignored with a warning by mingw-w64 [1] and silently ignored by clang [2].
The resulting binary behaved as if PIXMAN_NO_TLS was defined.
Bug introduced by a069da6c.
[1] https://bugs.freedesktop.org/show_bug.cgi?id=57591
[2] http://lists.freedesktop.org/archives/pixman/2012-October/002320.html
|
|
|
|
|
|
|
|
|
| |
MinGW-w64 uses the GNU compiler and does not define _MSC_VER.
Nevertheless, it provides xmmintrin.h and must be handled
here like the MS compiler. Otherwise compilation fails due to
conflicting declarations.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
|
|
|
|
| |
Fixes bug 56889.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
add_8_8 - add_8888_8888
Performance numbers before/after on MIPS-74kc @ 1GHz:
lowlevel-blt-bench results
Referent (before):
add_8888_8888_8888 = L1: 17.55 L2: 13.35 M: 8.13 ( 93.95%) HT: 6.60 VT: 6.64 R: 6.45 RT: 3.47 ( 26Kops/s)
add_8_8 = L1: 86.07 L2: 84.89 M: 62.36 ( 90.11%) HT: 36.36 VT: 34.74 R: 29.56 RT: 11.56 ( 52Kops/s)
add_8888_8888 = L1: 95.59 L2: 73.05 M: 17.62 (101.84%) HT: 15.46 VT: 15.01 R: 13.94 RT: 6.71 ( 42Kops/s)
Optimized:
add_8888_8888_8888 = L1: 41.52 L2: 33.21 M: 11.97 (138.45%) HT: 10.47 VT: 10.19 R: 9.42 RT: 4.86 ( 32Kops/s)
add_8_8 = L1: 135.06 L2: 104.82 M: 57.13 ( 82.58%) HT: 34.79 VT: 36.60 R: 28.28 RT: 10.54 ( 51Kops/s)
add_8888_8888 = L1: 176.36 L2: 67.82 M: 17.48 (101.06%) HT: 15.16 VT: 14.62 R: 13.88 RT: 8.05 ( 45Kops/s)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
add_8888_8_8888 - add_8888_n_8888
Performance numbers before/after on MIPS-74kc @ 1GHz:
lowlevel-blt-bench results
Referent (before):
add_0565_8_0565 = L1: 8.89 L2: 8.37 M: 7.35 ( 29.22%) HT: 5.90 VT: 5.85 R: 5.67 RT: 3.31 ( 26Kops/s)
add_8888_8_8888 = L1: 17.22 L2: 14.17 M: 9.89 ( 65.56%) HT: 7.57 VT: 7.50 R: 7.36 RT: 4.10 ( 30Kops/s)
add_8888_n_8888 = L1: 17.79 L2: 14.87 M: 10.35 ( 54.89%) HT: 5.19 VT: 4.93 R: 4.92 RT: 1.90 ( 19Kops/s)
Optimized:
add_0565_8_0565 = L1: 21.72 L2: 20.01 M: 14.96 ( 59.54%) HT: 12.03 VT: 11.81 R: 11.26 RT: 6.33 ( 37Kops/s)
add_8888_8_8888 = L1: 47.42 L2: 38.64 M: 15.90 (105.48%) HT: 13.34 VT: 13.03 R: 11.84 RT: 6.63 ( 38Kops/s)
add_8888_n_8888 = L1: 54.83 L2: 42.66 M: 17.36 ( 92.11%) HT: 15.20 VT: 14.82 R: 13.66 RT: 7.83 ( 41Kops/s)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- add_8_8_8
Performance numbers before/after on MIPS-74kc @ 1GHz:
lowlevel-blt-bench results
Referent (before):
add_n_8_8 = L1: 41.37 L2: 37.83 M: 30.38 ( 60.45%) HT: 23.70 VT: 22.85 R: 21.51 RT: 10.32 ( 45Kops/s)
add_n_8_8888 = L1: 16.01 L2: 14.46 M: 11.64 ( 46.32%) HT: 5.50 VT: 5.18 R: 5.06 RT: 1.89 ( 18Kops/s)
add_8_8_8 = L1: 13.26 L2: 12.47 M: 11.16 ( 29.61%) HT: 8.09 VT: 8.04 R: 7.68 RT: 3.90 ( 29Kops/s)
Optimized:
add_n_8_8 = L1: 96.03 L2: 79.37 M: 51.89 (103.31%) HT: 32.59 VT: 31.29 R: 28.52 RT: 11.08 ( 46Kops/s)
add_n_8_8888 = L1: 53.61 L2: 46.92 M: 23.78 ( 94.70%) HT: 19.06 VT: 18.64 R: 17.30 RT: 9.15 ( 43Kops/s)
add_8_8_8 = L1: 89.65 L2: 66.82 M: 37.10 ( 98.48%) HT: 22.10 VT: 21.74 R: 20.12 RT: 8.12 ( 41Kops/s)
|
|
|
|
|
| |
GCC 4.6 has problems with force_inline, so just use normal inline instead.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=55630
|
|
|
|
|
|
|
|
|
|
|
| |
pixman_composite_trapezoids() is supposed to composite across the
entire destination, but it actually only composites across the extent
of the trapezoids. For operators such as ADD or OVER this doesn't
matter since a zero source has no effect on the destination. But for
operators such as SRC or IN, it does matter.
So for such operators where a zero source has an effect, don't clip to
the trap extents.
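The distinction drawn above can be sketched as a small predicate. The enum and function names here are hypothetical and are not pixman's API; only the classification of ADD, OVER, SRC and IN comes from the message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical operator subset for illustration; not pixman's pixman_op_t. */
typedef enum { OP_SRC, OP_IN, OP_OVER, OP_ADD } op_t;

/* Returns true when compositing a zero (fully transparent) source leaves
 * the destination unchanged, so clipping to the trap extents is safe. */
static bool
zero_source_is_noop (op_t op)
{
    switch (op)
    {
    case OP_OVER: /* dest = 0 + dest * (1 - 0) */
    case OP_ADD:  /* dest = 0 + dest */
        return true;
    case OP_SRC:  /* dest = 0 */
    case OP_IN:   /* dest = 0 * dest alpha = 0 */
        return false;
    }
    assert (!"unknown operator");
    return false;
}
```

A caller would clip to the trapezoid extents only when the predicate returns true, and composite across the whole destination otherwise.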
|
|
|
|
|
| |
The computation of the extents rectangle is moved to its own
function.
|
|
|
|
|
|
|
|
|
|
|
| |
When the pixman_image_create_bits() function is given NULL for bits, it
will allocate a new buffer and initialize it to zero. However, in some
cases, only a small region of the image is actually used; in that case
it is wasteful to touch all of the memory.
The new pixman_image_create_bits_no_clear() works exactly like
_create_bits() except that it doesn't initialize any newly allocated
memory.
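The difference between the two entry points comes down to whether the allocation is zeroed. A minimal sketch, assuming 32 bit pixels and a hypothetical helper alloc_pixels() that is not part of pixman:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not pixman's internal allocator.
 * clear != 0 mimics _create_bits(), clear == 0 mimics _create_bits_no_clear(). */
static uint32_t *
alloc_pixels (size_t width, size_t height, int clear)
{
    /* Reject empty images and sizes whose byte count would overflow. */
    if (width == 0 || height == 0 ||
        width > SIZE_MAX / sizeof (uint32_t) / height)
    {
        return NULL;
    }

    if (clear)
        return calloc (width * height, sizeof (uint32_t)); /* zero-filled */

    return malloc (width * height * sizeof (uint32_t));    /* contents indeterminate */
}

static void
example (void)
{
    uint32_t *p = alloc_pixels (4, 4, 1);

    assert (p != NULL && p[0] == 0 && p[15] == 0);
    free (p);
}
```

With the uncleared variant the caller is responsible for writing every pixel it later reads.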
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(fixes bug #52101)
On MirBSD, the compiler produces a (harmless) warning when the compiler
is called without the standard CFLAGS:
foo.c:0: note: someone does not honour COPTS correctly, passed 0 times
However, PIXMAN_LINK_WITH_ENV considers _any_ output on stderr as an
error, even if the exit status of the compiler is 0. Furthermore, it
resets CFLAGS and LDFLAGS at the start. On MirBSD, this will lead to a
warning in each test, making all such tests fail. In particular, the
pthread_setspecific test fails, thus pixman is compiled without thread
support. This leads to compile errors later on, or at least it did when
I tried this on pkgsrc. Re-adding the saved CFLAGS, LDFLAGS and LIBS
before the test makes it work.
The second hunk inverts the order of the pthread flag checks. On BSD
systems (this is true at least on OpenBSD and MirBSD), both -lpthread
and -pthread work but the latter is "preferred", whatever this means.
|
| |
|
|
|
|
|
|
|
|
| |
This provides a way to enable MIPS DSP ASE optimizations if running
under qemu-user (where /proc/cpuinfo contains information about the
host processor instead of the emulated one). It can be used to run the
pixman test suite in qemu-user when no real MIPS hardware is
available.
|
|
|
|
|
| |
This is used to compute whether the regions in question overlap, but
nothing makes use of this information, so it can be removed.
|
|
|
|
|
|
| |
The while part of a do/while loop was formatted as if it were a while
loop with an empty body. Probably some indent tool misinterpreted the
code at some point.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order for a src/mask pair to be considered a pixbuf, they have to
have identical transformations, but we don't check for that. Since the
only fast paths we have for pixbufs require identity transformations,
it suffices to check that both source and mask are
untransformed.
This is also the reason that this bug can't be triggered by any test
code - if the source and mask had different transformations, we would
consider them a pixbuf, but then wouldn't take the fast path because
at least one of the transformations would be different from the
identity.
|
|
|
|
|
| |
pixman-combine32.[ch] were the only built sources, so BUILT_SOURCES
can now be removed.
|
|
|
|
|
|
|
| |
GCC doesn't move the divisions out of the loop, so do it manually by
looking up the four (1.0f / mask) values in a table. Table lookups are
used under the theory that one L2 hit plus three L1 hits is preferable
to four floating point divisions.
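The idea can be sketched as follows, assuming that "mask" is the all-ones value of an n bit channel and that the table is indexed by bit width. The table name, its size and the function are illustrative only and are not taken from the pixman source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* reciprocal[n] == 1.0f / ((1 << n) - 1); entry 0 is unused. */
static const float reciprocal[9] = {
    0.0f,
    1.0f / 1,  1.0f / 3,  1.0f / 7,   1.0f / 15,
    1.0f / 31, 1.0f / 63, 1.0f / 127, 1.0f / 255
};

/* Expand n values of a 'bits' wide channel to floats in [0, 1]. */
static void
expand_channel (const uint8_t *in, float *out, size_t n, int bits)
{
    float mul;
    size_t i;

    assert (bits >= 1 && bits <= 8);
    mul = reciprocal[bits]; /* one lookup, hoisted out of the loop */

    for (i = 0; i < n; ++i)
        out[i] = in[i] * mul; /* a multiply instead of a divide per pixel */
}
```

An argb pixel needs four such multipliers, one per channel, which is where the four lookups in the message come from.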
|
|
|
|
|
|
|
| |
Since pixman-combine64.[ch] are not used anymore, there is no point
generating these files from pixman-combine.[ch].template.
Also get rid of dependency on perl in configure.ac.
|
|
|
|
|
|
|
|
|
|
| |
The 64 bit pipeline is not used anymore, so it can now be removed.
Don't generate pixman-combine64.[ch] anymore. Don't generate the
pixman-srgb.c anymore. Delete all the 64 bit fetchers in
pixman-access.c, all the 64 bit iterator functions in
pixman-bits-image.c and all the functions that expand from 8 to 16
bits.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In pixman-bits-image.c, remove bits_image_fetch_untransformed_64() and
add bits_image_fetch_untransformed_float(); change
dest_get_scanline_wide() to produce a floating point buffer.
In the gradients, change *_get_scanline_wide() to call
pixman_expand_to_float() instead of pixman_expand().
In pixman-general.c change the wide Bpp to 16 instead of 8, and
initialize the buffers to 0 to prevent NaNs from causing trouble.
In pixman-noop.c make the wide solid iterator generate floating point
pixels.
In pixman-solid-fill.c, cache a floating point pixel, and make the
wide iterator generate floating point pixels.
Bug fix in bits_image_fetch_untransformed_repeat_normal
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Three new function pointer fields are added to bits_image_t:
fetch_scanline_float
fetch_pixel_float
store_scanline_float
similar to the existing 32 and 64 bit accessors. The fetcher_info_t
struct in pixman_access similarly gets a new get_scanline_float field.
For most formats, the new get_scanline_float field is set to a new
function fetch_scanline_generic_float() that first calls the 32 bit
scanline fetcher and then expands these pixels to floating point.
For the 10 bpc formats, new floating point accessors are added that
use pixman_unorm_to_float() and pixman_float_to_unorm() to convert
back and forth.
The PIXMAN_a8r8g8b8_sRGB format is handled with a 256-entry table that
maps 8 bit sRGB channels to linear single precision floating point
numbers. The sRGB->linear direction can then be done with a simple
table lookup.
The other direction is currently done with a 4096-entry table which
works fine for 16 bit integers, but not so great for floating
point. So instead this patch uses a binary search in the sRGB->linear
table. The existing 32 bit accessors for the sRGB format are also
converted to use this method.
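The reverse lookup described above can be sketched as a binary search over any increasing table. The function below is a hypothetical illustration, not the code from the patch; it returns the index whose table entry is nearest to the given value, which for a 256-entry sRGB->linear table would be the 8 bit sRGB channel.

```c
#include <assert.h>

/* Return the index of the entry in an increasing table nearest to value. */
static int
nearest_index (const float *table, int n, float value)
{
    int lo = 0, hi = n - 1;

    assert (n > 0);

    if (value <= table[0])
        return 0;
    if (value >= table[n - 1])
        return n - 1;

    /* Invariant: table[lo] <= value < table[hi]. */
    while (hi - lo > 1)
    {
        int mid = lo + (hi - lo) / 2;

        if (table[mid] <= value)
            lo = mid;
        else
            hi = mid;
    }

    /* Pick whichever neighbour is closer. */
    return (value - table[lo] <= table[hi] - value) ? lo : hi;
}
```

For a 256-entry table the loop runs at most eight times, so the forward table serves both directions without a second, larger table.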
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A new struct argb_t containing a floating point pixel is added to
pixman-private.h and conversion routines are added to pixman-utils.c
to convert normalized integers to and from that struct.
New functions:
- pixman_expand_to_float()
Expands a buffer of integer pixels to a buffer of argb_t pixels
- pixman_contract_from_float()
Converts a buffer of argb_t pixels to a buffer of integer pixels
- pixman_float_to_unorm()
Converts a floating point number to an unsigned normalized integer
- pixman_unorm_to_float()
Converts an unsigned normalized integer to a floating point number
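A minimal sketch of what the two scalar conversions might look like, assuming round-to-nearest and clamping to [0, 1]. The _sketch names and the rounding choice are assumptions for illustration; the functions added by the patch may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Map an n_bits wide unsigned normalized integer to a float in [0, 1]. */
static float
unorm_to_float_sketch (uint16_t u, int n_bits)
{
    uint32_t max;

    assert (n_bits >= 1 && n_bits <= 16);
    max = (1u << n_bits) - 1;

    return (float) (u & max) / (float) max;
}

/* Map a float to an n_bits wide unsigned normalized integer. */
static uint16_t
float_to_unorm_sketch (float f, int n_bits)
{
    uint32_t max;

    assert (n_bits >= 1 && n_bits <= 16);
    max = (1u << n_bits) - 1;

    if (!(f > 0.0f)) /* clamps negatives and also catches NaN */
        f = 0.0f;
    if (f > 1.0f)
        f = 1.0f;

    return (uint16_t) (f * (float) max + 0.5f); /* round to nearest */
}
```

With these definitions an 8 bit value survives a round trip through floating point unchanged.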
|
|
|
|
|
|
|
|
|
| |
This test runs the new floating point combiners on random input with
divide-by-zero exceptions turned on.
With the floating point combiners the only thing we guarantee is that
divide-by-zero exceptions are not generated, so change
enable_fp_exceptions() to only enable those, and rename accordingly.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This file contains floating point implementations of combiners for all
pixman operators. These combiners operate on buffers containing single
precision floating point pixels stored in (a, r, g, b) order.
The combiners are added to the pixman_implementation_t struct, but
nothing uses them yet.
This commit incorporates a number of bug fixes contributed by Andrea
Canciani.
Some notes:
- The combiners are making sure to never divide by zero regardless of
input, so an application could enable divide-by-zero exceptions and
pixman wouldn't generate any.
- The operators are implemented according to the Render spec. I.e.,
- If the input pixels are between 0 and 1, then so is the output.
- The source and destination coefficients for the conjoint and
disjoint operators are clamped to [0, 1].
- The PDF operators are not described in the render spec, and the
implementation here doesn't do any clamping except in the final
conversion from floating point to destination format.
All of the above will need to be rethought if we add support for pixel
formats that can support negative and greater-than-one pixels. It is
in fact already the case in principle that convolution filters can
produce pixels with negative values, but since these go through the
broken "wide" path that narrows everything to 32 bits, these negative
values don't currently survive to the combiners.
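The first two notes can be illustrated with one coefficient of the kind used by the disjoint operators, min (1, (1 - sa) / da). The sketch below guards the division and clamps the result to [0, 1]. The helper names and the choice of 1.0 as the value for a zero denominator are assumptions for illustration, not a copy of the file's code.

```c
#include <assert.h>
#include <float.h>

/* Treat anything smaller in magnitude than FLT_MIN as zero. */
static int
float_is_zero (float f)
{
    return -FLT_MIN < f && f < FLT_MIN;
}

static float
clamp01 (float f)
{
    return f < 0.0f ? 0.0f : (f > 1.0f ? 1.0f : f);
}

/* (1 - sa) / da clamped to [0, 1], without ever dividing by zero. */
static float
one_minus_sa_over_da (float sa, float da)
{
    if (float_is_zero (da))
        return 1.0f; /* assumed limit value; no division is performed */

    return clamp01 ((1.0f - sa) / da);
}

static void
example (void)
{
    assert (one_minus_sa_over_da (0.5f, 0.0f) == 1.0f);
    assert (one_minus_sa_over_da (0.5f, 1.0f) == 0.5f);
}
```

Because the zero check comes before the division, enabling divide-by-zero exceptions does not trap on this path.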
|
|
|
|
|
| |
Comment out some formats in blitters-test that are going to rely on
floating point in some upcoming patches.
|
|
|
|
|
|
| |
In preparation for an upcoming change of the wide pipe to use floating
point, comment out some formats in glyph-test that are going to be
using floating point and update the CRC32 value to match.
|
|
|
|
|
|
|
|
| |
Add const to pointer arguments when the function doesn't change the
pointed-to data.
Also make 'white' in add_glyphs() in pixman-glyph.c static and
const.
|
|
|
|
|
|
| |
Definition was not present in <4.8.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55451
|
|
|
|
|
|
| |
Otherwise the test fails on big-endian.
Tested-by: Matt Turner <mattst88@gmail.com>
|
|
|
|
|
|
|
|
| |
Before this patch it was often faster to scale and repeat
in two passes because each pass used a fast path vs.
the slow path that the single pass approach takes. This
makes it so that the single pass approach has competitive
performance.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The infinite loop detected by "affine-test 212944861" is caused by an
overflow in this expression:
max_x = pixman_fixed_to_int (vx + (width - 1) * unit_x) + 1;
where (width - 1) * unit_x doesn't fit in a signed int. This causes
max_x to be too small so that this:
src_width = 0;
while (src_width < REPEAT_NORMAL_MIN_WIDTH && src_width <= max_x)
src_width += src_image->bits.width;
results in src_width being 0. Later on when src_width is used for
repeat calculations, we get the infinite loop.
By casting unit_x to int64_t, the expression no longer overflows and
affine-test 212944861 and infinite-loop no longer loop forever.
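The effect of the cast can be sketched in isolation. The type and function names below are stand-ins rather than pixman's own, and the sketch keeps the whole result in 64 bits so the wide value can be inspected.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fixed_16_16_t; /* stand-in for pixman_fixed_t (16.16) */

/* Compute max_x with the product widened to 64 bits. */
static int64_t
max_x_wide (fixed_16_16_t vx, int32_t width, fixed_16_16_t unit_x)
{
    int64_t last;

    assert (width > 0);

    /* Widen before multiplying so the product cannot overflow 32 bits. */
    last = (int64_t) vx + (int64_t) (width - 1) * unit_x;

    /* Drop the 16 fractional bits; assumes an arithmetic right shift
     * for negative values, as on common compilers. */
    return (last >> 16) + 1;
}
```

For example, with width = 1001 and unit_x equal to 100 in 16.16 fixed point, the product is 6553600000, which exceeds INT32_MAX, yet the widened computation still produces the intended max_x of 100001.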
|
|
|
|
|
|
|
|
|
| |
This test demonstrates a bug where a certain transformation matrix can
result in an infinite loop. It was extracted as a standalone version
of "affine-test 212944861".
If given the option -nf, the test program will not call fail_after()
and therefore potentially run forever.
|
|
|
|
|
|
|
|
| |
Printing out the translation and scale is a bit misleading because the
actual transformation matrix can be modified in various other ways.
Instead simply print the whole transformation matrix that is actually
used.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
over_8888_8888_8888
Performance numbers before/after on MIPS-74kc @ 1GHz:
lowlevel-blt-bench results
Referent (before):
over_8888_8888 = L1: 19.61 L2: 17.10 M: 11.16 ( 59.20%) HT: 16.47 VT: 15.81 R: 14.82 RT: 8.90 ( 50Kops/s)
over_8888_8888_8888 = L1: 13.56 L2: 11.22 M: 7.46 ( 79.18%) HT: 6.24 VT: 6.20 R: 6.11 RT: 3.95 ( 29Kops/s)
Optimized:
over_8888_8888 = L1: 46.42 L2: 36.70 M: 16.69 ( 88.57%) HT: 17.11 VT: 16.55 R: 15.31 RT: 9.48 ( 52Kops/s)
over_8888_8888_8888 = L1: 26.06 L2: 22.53 M: 11.49 (121.91%) HT: 9.93 VT: 9.62 R: 9.19 RT: 5.75 ( 36Kops/s)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
over_0565_8_0565
Performance numbers before/after on MIPS-74kc @ 1GHz:
lowlevel-blt-bench results
Referent (before):
over_0565_n_0565 = L1: 7.56 L2: 7.24 M: 6.16 ( 16.38%) HT: 4.01 VT: 3.84 R: 3.79 RT: 1.66 ( 18Kops/s)
over_0565_8_0565 = L1: 7.43 L2: 7.05 M: 5.98 ( 23.85%) HT: 5.27 VT: 5.23 R: 5.09 RT: 3.14 ( 28Kops/s)
Optimized:
over_0565_n_0565 = L1: 15.47 L2: 14.52 M: 12.30 ( 32.65%) HT: 10.76 VT: 10.57 R: 10.27 RT: 6.63 ( 46Kops/s)
over_0565_8_0565 = L1: 15.47 L2: 14.61 M: 11.78 ( 46.92%) HT: 10.00 VT: 9.84 R: 9.40 RT: 5.81 ( 43Kops/s)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
over_8888_8_0565
Performance numbers before/after on MIPS-74kc @ 1GHz:
lowlevel-blt-bench results
Referent (before):
over_8888_n_0565 = L1: 8.95 L2: 8.33 M: 6.95 ( 27.74%) HT: 4.27 VT: 4.07 R: 4.01 RT: 1.74 ( 19Kops/s)
over_8888_8_0565 = L1: 8.86 L2: 8.11 M: 6.72 ( 35.71%) HT: 5.68 VT: 5.62 R: 5.47 RT: 3.35 ( 30Kops/s)
Optimized:
over_8888_n_0565 = L1: 18.76 L2: 17.55 M: 13.11 ( 52.19%) HT: 11.35 VT: 11.10 R: 10.88 RT: 6.94 ( 47Kops/s)
over_8888_8_0565 = L1: 18.14 L2: 16.79 M: 12.10 ( 64.25%) HT: 10.24 VT: 9.98 R: 9.63 RT: 5.89 ( 43Kops/s)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
over_8888_8_8888
Performance numbers before/after on MIPS-74kc @ 1GHz:
lowlevel-blt-bench results
Referent (before):
over_8888_n_8888 = L1: 9.92 L2: 11.27 M: 8.50 ( 45.23%) HT: 4.70 VT: 4.45 R: 4.49 RT: 1.85 ( 20Kops/s)
over_8888_8_8888 = L1: 12.54 L2: 10.86 M: 8.18 ( 54.36%) HT: 6.53 VT: 6.45 R: 6.41 RT: 3.83 ( 33Kops/s)
Optimized:
over_8888_n_8888 = L1: 28.02 L2: 24.92 M: 14.72 ( 78.15%) HT: 13.03 VT: 12.65 R: 12.00 RT: 7.49 ( 49Kops/s)
over_8888_8_8888 = L1: 26.92 L2: 23.93 M: 13.65 ( 90.58%) HT: 11.68 VT: 11.29 R: 10.56 RT: 6.37 ( 45Kops/s)
|
|
|
|
|
| |
Various formatting fixes, and removal of some obsolete comments about
strength reduction of operators.
|
|
|
|
|
|
|
|
| |
In the checks for whether the transforms are rotation matrices "-1"
and "1" were used instead of the correct -pixman_fixed_1 and
pixman_fixed_1.
Fixes test suite failure for rotate-test.
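Why the literals matter can be shown with a 16.16 fixed point type, where the value one is represented as 65536. The type, macro and function below are illustrative stand-ins and not the code in pixman-image.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int32_t fixed_16_16_t;             /* stand-in for pixman_fixed_t */
#define FIXED_1 ((fixed_16_16_t) 1 << 16)  /* 1.0 in 16.16 fixed point */

/* Hypothetical check on the upper-left 2x2 part of a transform:
 * [ 0 -1 ]
 * [ 1  0 ] with the entries expressed in fixed point. */
static bool
is_rotate_90 (fixed_16_16_t m00, fixed_16_16_t m01,
              fixed_16_16_t m10, fixed_16_16_t m11)
{
    return m00 == 0 && m11 == 0 && m01 == -FIXED_1 && m10 == FIXED_1;
}

static void
example (void)
{
    /* A real rotation matrix uses fixed point ones... */
    assert (is_rotate_90 (0, -FIXED_1, FIXED_1, 0));
    /* ...whereas the integers -1 and 1 mean -1/65536 and 1/65536. */
    assert (!is_rotate_90 (0, -1, 1, 0));
}
```

Comparing against the bare integers therefore matches a nearly degenerate matrix rather than a quarter turn.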
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This program exercises a bug in pixman-image.c where "-1" and "1" were
used instead of the correct "- pixman_fixed_1" and "pixman_fixed_1".
With the fast implementation enabled:
% ./rotate-test
rotate test failed! (checksum=35A01AAB, expected 03A24D51)
Without it:
% env PIXMAN_DISABLE=fast ./rotate-test
pixman: Disabled fast implementation
rotate test passed (checksum=03A24D51)
V2: The first version didn't have lcg_srand (testnum) in test_transform().
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In general, the component alpha version of an operator is supposed to
do this:
- multiply source with mask in all channels
- multiply mask with source alpha in all channels
- compute the regular operator in all channels using the
mask value whenever source alpha is called for
The first two steps are usually accomplished with the function
combine_mask_ca(), but for operators where source alpha is not used,
such as SRC, ADD and OUT, the simpler function
combine_mask_value_ca(), which doesn't compute the new mask values,
can be used.
However, the PDF blend modes generally *do* make use of source alpha,
so they can't use combine_mask_value_ca() as they do now. They have to
use combine_mask_ca().
This patch fixes this in combine_multiply_ca() and the CA combiners
generated by PDF_SEPARABLE_BLEND_MODE.
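The difference between the two helpers can be sketched on floating point pixels. The struct and function names are hypothetical, and the real combiners work on packed 8 bit channels; only the two multiplication steps are taken from the description above.

```c
#include <assert.h>

typedef struct { float a, r, g, b; } pixel_t; /* illustrative pixel type */

/* Step 1 only: multiply source with mask in all channels. */
static void
mask_value_ca_sketch (pixel_t *src, const pixel_t *mask)
{
    src->a *= mask->a;
    src->r *= mask->r;
    src->g *= mask->g;
    src->b *= mask->b;
}

/* Steps 1 and 2: also multiply mask with the original source alpha. */
static void
mask_ca_sketch (pixel_t *src, pixel_t *mask)
{
    float sa = src->a; /* save alpha before the source is modified */

    mask_value_ca_sketch (src, mask);

    mask->a *= sa;
    mask->r *= sa;
    mask->g *= sa;
    mask->b *= sa;
}

static void
example (void)
{
    pixel_t s = { 0.5f, 0.5f, 0.5f, 0.5f };
    pixel_t m = { 1.0f, 0.5f, 0.0f, 1.0f };

    mask_ca_sketch (&s, &m);
    assert (s.r == 0.25f && m.r == 0.25f && m.a == 0.5f);
}
```

An operator that reads source alpha per channel needs the updated mask, which is why the simpler helper is not enough for the PDF blend modes.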
|
|
|
|
|
|
|
|
|
|
| |
The fast_composite_scaled_nearest() function can be called when the
format is x8b8g8r8. In that case pixels fetched in fetch_nearest()
need to have their alpha channel set to 0xff.
Fixes test suite failure in scaling-test.
Reviewed-by: Matt Turner <mattst88@gmail.com>
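The fix amounts to forcing the unused top byte to opaque when the format carries no alpha. A minimal sketch, using a hypothetical fetch helper rather than the actual fetch_nearest() code:

```c
#include <assert.h>
#include <stdint.h>

/* Fetch pixel x from a row of 32 bit pixels. For formats without an
 * alpha channel (such as x8b8g8r8) the top byte is undefined, so it is
 * replaced with 0xff to make the pixel opaque. */
static uint32_t
fetch_pixel_sketch (const uint32_t *row, int x, int format_has_alpha)
{
    uint32_t p;

    assert (row != 0 && x >= 0);
    p = row[x];

    return format_has_alpha ? p : (p | 0xff000000u);
}
```

Without the OR, whatever happens to be stored in the padding byte would be interpreted as alpha by the rest of the pipeline.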
|